(Previously: Building a language model from scratch, from a tutorial)
I started to get a little impatient transcribing linear algebra code from the tutorial into my version. In part it's simply tedious typing. More interesting: GitHub Copilot's autocompletions were sometimes right and sometimes the wrong idiom, and actively deciding which was which compounded the tedium. This is a big lesson for our “humans-in-the-loop supervise generative LLMs” near-future. 😬
OTOH, when I got to the part where an attention head was implemented, it made way more sense having read Attention is All You Need beforehand. That feels like a big level-up: reading a paper and then working through an implementation of it brings everything together in a real learning moment. Success! 📈
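For flavor, the core of an attention head really is just a few lines of linear algebra. This is my own NumPy sketch of scaled dot-product attention as described in the paper, not the tutorial's code; the names (`scaled_dot_product_attention`, `Wq`/`Wk`/`Wv`) are mine:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """One attention head: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # how much each query attends to each key
    scores -= scores.max(axis=-1, keepdims=True)   # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over the keys
    return weights @ V                             # weighted sum of the value vectors

# toy usage: 4 tokens, head dimension 8 (illustrative numbers, not from the tutorial)
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
out = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)
print(out.shape)  # (4, 8)
```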