Read papers, work tutorials, the learning will happen

(Previously: Building a language model from scratch, from a tutorial)

I started to get a little impatient transcribing linear algebra code from the tutorial into my version. In part, it’s tedious typing. More interestingly, GitHub Copilot’s autocomplete suggestions were sometimes right and sometimes the wrong idiom, and having to actively decide which was which compounded the tedium. This is a big lesson for our “humans-in-the-loop supervise generative LLMs” near-future. 😬

OTOH, when I got to the part where an attention head is implemented, it made way more sense for having previously read Attention Is All You Need. That feels like a big level-up: reading papers and working through implementations thereof brings it all together in a big learning moment. Success! 📈
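For anyone who hasn't seen one, here is a minimal sketch of what a single attention head computes: scaled dot-product attention with a causal mask. This is not the tutorial's code. The tutorial works in PyTorch, while this version uses plain Python lists so it runs with no dependencies. The names `softmax` and `attention_head` are my own, and I'm assuming the queries, keys, and values are handed in directly rather than produced by learned linear projections as they would be in a real model.

```python
import math


def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(queries, keys, values, causal=True):
    """Scaled dot-product attention for one head (illustrative sketch).

    queries, keys: one d_k-dimensional vector (list of floats) per position.
    values: one d_v-dimensional vector per position.
    Returns one d_v-dimensional output vector per query position.
    """
    d_k = len(keys[0])
    d_v = len(values[0])
    outputs = []
    for i, q in enumerate(queries):
        # With a causal mask, position i may only attend to positions 0..i.
        visible = range(i + 1) if causal else range(len(keys))
        # Similarity of this query to each visible key, scaled by sqrt(d_k).
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d_k)
            for j in visible
        ]
        weights = softmax(scores)
        # The output is the attention-weighted average of the visible values.
        outputs.append(
            [sum(w * values[j][c] for w, j in zip(weights, visible)) for c in range(d_v)]
        )
    return outputs


if __name__ == "__main__":
    q = k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0]]
    for row in attention_head(q, k, v):
        print(row)
```

The first position can only see itself, so its output is just its own value vector. The second position blends both value vectors, leaning toward the one whose key best matches its query. Multi-head attention runs several of these side by side and concatenates the results.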


Building a language model from scratch, from a tutorial

I’m working from Brian Kitano’s Llama from scratch (or how to implement a paper without crying). It’s very deep, so I probably won’t make it all the way through in the long weekend I’ve allocated for it.

I’ve skimmed the paper but didn’t pay extremely close attention to the implementation details. Fortunately, the tutorial provides breadcrumbs into the deeply math-y bits, so no problems here.

I noticed that there are Ruby bindings for most of these ML libraries and was tempted to try implementing it in the language I love. But I would rather not get mired in looking up docs or translating across languages/APIs. Also, I want to get more familiar with Python (after almost twenty years of not using it).

I started off trying to implement this like a Linux veteran would, as a basic CLI program. But I switched over to Jupyter, since it looks like part of building models is analyzing plots, and that’s not going to go well on a CLI. It also means I’m not swimming upstream so much.

Per an idea from Making Large Language Models work for you, I’m frequently using ChatGPT to quickly understand the PyTorch neural network APIs in context. Normally, I’d go on a time-consuming side-quest getting up to speed on an unfamiliar ecosystem. ChatGPT is reducing that from hours, and possibly a blocker, to a few minutes. I highly recommend reading those slides and trying a few of the ideas in your daily work.