James Edward Gray II, Scrappy Parsing:
The good news is that Elixir is the best language I have ever worked with for doing serious parsing. Let’s prove it. Let’s pull the data out of a SQLite database file using vanilla Elixir and some tricks from my Scrappy Programmer Livebook series.
Reader: he proves it. I enjoyed the heck out of reading this. James has still got it. (One should never have doubted it!)
James’ code reminded me of the many ways Elixir is really lovely. I’ve annotated his code with my own reflections on what makes Elixir great. All the code is James’ work; the comments are my, well, commentary and opinions.
# First off, Elixir is lovely and I'm a little sad I've
# never had the opportunity to work with it on a daily
# basis.
parse_page = fn bytes, i ->
  start = if i == 1, do: 100, else: 0
  # Reading binary data via pattern matching is one of the
  # best things about Erlang, so it's also a great thing
  # about Elixir.
  <<raw_type::1*8,
    _first_page_freeblock::2*8,
    cell_count::2*8,
    _raw_cell_content_start::2*8,
    _fragmented_free_bytes::1*8,
    rest::binary>> = binary_slice(bytes, start, 12)
  # "Plain old pattern matching", also lovely
  type =
    case raw_type do
      2 -> :interior_index
      5 -> :interior_table
      10 -> :leaf_index
      13 -> :leaf_table
    end
  right_most_pointer =
    if type in [:interior_index, :interior_table] do
      <<right_most_pointer::4*8>> = rest
      right_most_pointer
    else
      nil
    end
  %{
    index: i,
    start: start,
    type: type,
    cell_count: cell_count,
    right_most_pointer: right_most_pointer
  }
end
# Here we have a function defined for a very specific pattern match, including structural _and_ guard conditions. How many conditionals does this save in function bodies? I don't know how often this helps daily Elixir users, but I sure do love reading it.
read_page = fn %{page_count: last_page} = db, i when i > 0 and i <= last_page ->
  :file.position(db.file, (i - 1) * db.page_size)
  # Despite many attempts, Elixir's syntax design for function
  # pipelines will probably always eclipse the same design in JavaScript or Ruby.
  # Spoiler: it's because Elixir doesn't have to rearrange `self` and the threaded parameter. 🤷🏻‍♂️
  db.file
  |> IO.binread(db.page_size)
  |> parse_page.(i)
end
# I've tripped all over myself praising Elixir here, but I'd
# like to point out that Elixir has the same ungainly syntax
# for calling anonymous functions that Ruby does. 😆
open_db.(db_path, fn db ->
  Enum.map(1..3//1, fn i ->
    read_page.(db, i)
  end)
end)
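To poke at the binary-matching and guard ideas outside of James' code, here is a minimal sketch of my own (not from his post). The module name `HeaderSketch`, the hand-built sample bytes, and the `:source` key are all hypothetical. It only assumes what the code above already relies on: integer segments in a binary pattern default to unsigned big-endian.

```elixir
# Hypothetical module, for illustration only; not part of James' code.
defmodule HeaderSketch do
  # One clause does three jobs: it destructures the bytes, names the
  # fields, and (via the guard) only accepts the two leaf page types.
  # Any other input raises FunctionClauseError, with no `if` in the body.
  def parse(<<type::8, _freeblock::16, cell_count::16, _rest::binary>>)
      when type in [10, 13] do
    %{type: type, cell_count: cell_count}
  end
end

# Made-up 8-byte header: type 13, freeblock 0, 3 cells, 3 bytes of padding.
sample = <<13, 0::16, 3::16, 0::24>>

# The pipeline threads `sample` in as the first argument of each call.
result =
  sample
  |> HeaderSketch.parse()
  |> Map.put(:source, :sample)

IO.inspect(result)
```

Here `result.type` is `13` and `result.cell_count` is `3`. Passing a header whose first byte is, say, `5` fails at the function head rather than somewhere inside the body, which is the conditional-saving effect I was admiring above.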
(This is not recruiter bait, but I’m listening, if you’re out there. 😉)