Recently in pair programming with AI
My recent experience with GitHub Copilot Chat (non-autocomplete assistance) and Raycast’s ChatGPT-3.5 integrations leads me to think that prompting will be a crucial skill for most knowledge workers. (I’m hardly the first person to make this observation!) Not quite as arcane a skill as using a Unix shell and command-line programs is for programmers, but still a thing one will want to know how to get the most out of.
A few examples:
- Talking/prompting a model into doing permutations and set math. This one is still blowing my mind a little. How does a word predictor do symbolic math? Not so much a stochastic parrot in this case!
- Coaching a model to write like me, provide useful feedback, and not use the boring/hype-y voice of Twitter or whatever a model was trained on. Hitting “regenerate” a few times to get options on different prose to include. Using the model as an always-available writing partner. In particular for “help me keep going/writing here…”.
- Building up programs from scratch with the assistance of a Copilot. Incrementally improving. Asking it for references on how to understand the library in question. Asking it for ideas on how to troubleshoot the program. I’ve used this to generate the boilerplate to get started with CodeMirror, with some success.
Elsewhere
Simon Willison, Building and testing C extensions for SQLite with ChatGPT Code Interpreter:
One of the infuriating things about working with ChatGPT Code Interpreter is that it often denies abilities that you know it has.
A non-trivial share of the prompting here is to remind ChatGPT “this is C and `gcc`, you know this”. I am not sure whether to eye-roll or laugh. Maybe part of the system prompt is “occasionally, you’ll require a little extra encouragement, just like a less-experienced human.” Simon did manage to give ChatGPT enough courage to build the extension, though!
Despite needing to encourage the thing, this bit is promising:
Here’s the thing I enjoy most about using Code Interpreter for these kinds of prototypes: since the prompts are short, and there’s usually a delay of 30s+ between each prompt while it does its thing, I can do the whole thing on my phone while doing other things.
In this particular case I started out in bed, then got up, fed the dog, made coffee and pottered around the house for a bit—occasionally glancing back at my screen and poking it in a new direction with another prompt.
This almost doesn’t count as a project at all. It began as mild curiosity, and I only started taking it seriously when it became apparent that it was likely to produce a working result.
I only switched to my laptop right at the end, to try out the macOS compilation steps.
Ilia Choly, Semgrep: AutoFixes using LLMs:
Although the built-in autofix feature is powerful, it’s limited to simple AST transforms. I’m currently exploring the idea of fixing semgrep matches using a Large Language Model (LLM). More specifically, each match is individually fed into the LLM and replaced with the response. To make this possible, I’ve created a tool called semgrepx, which can be thought of as xargs for semgrep. I then use semgrepx to rewrite the matches using the fantastic llm tool.
I’ve yet to land a big automated refactoring generated solely by abstract syntax tree-powered refactoring tools. By extension, I definitely haven’t tried marrying AST-based mass refactoring with an LLM. But it sounds neat!
Build with language models via llm
`llm` (previously) is a tool Simon Willison is working on for interacting with large language models, running via API or locally.
I set out to use `llm` as the glue for prototyping tools to generate embeddings from one of my journals so that I could experiment with search and clustering on my writings. Approximately, what I’m building is an ETL workflow: extract/export data from my journals, transform/index as searchable vectors, load/query for “what docs are similar to or match this query”.
Extract and transform, approximately
Given a JSON export from DayOne, this turned out to be a matter of shell pipelines. After some iterations with prompting (via Raycast’s GPT3.5 integration), I came up with a simple script for extracting entries and loading them into a SQLite database containing embedding vectors:
#!/bin/sh
# extract-entries.sh
# $ ./extract-entries.sh Journals.json
file=$1

cat "$file" |
  jq '[.entries[] | {id: .uuid, content: .text}]' |
  llm embed-multi journals - \
    --format json \
    --model sentence-transformers/all-MiniLM-L6-v2 \
    --database journals.db \
    --store
A couple things to note here:
- The placement of the `-` parameter matters here. I’m used to placing it at the end of the parameter list, but that didn’t work. The `llm embed-multi` docs suggest that `--input` is equivalent, but I think that’s a docs bug (the parameter doesn’t seem to exist in the released code).
- I’m using a locally-run model to generate the embeddings. This is very cool!
In particular, `llm embed-multi` takes one JSON doc per line, expecting `id`/`content` keys, and “indexes” those into a database of document/embedding rows. (If you’re thinking “hey, it’s SQLite, that has full-text search, why not both?”: yes, me too, that’s what I’m hoping to accomplish next!)
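For illustration, here’s the shape that jq step hands to `llm embed-multi`, run against a tiny made-up export (the uuid and text below are invented, just to show the id/content structure):

```sh
# A made-up, single-entry DayOne-style export, to show the id/content shape
echo '{"entries": [{"uuid": "ABC-123", "text": "Fed the dog. Made coffee."}]}' |
  jq '[.entries[] | {id: .uuid, content: .text}]'
# => [ { "id": "ABC-123", "content": "Fed the dog. Made coffee." } ]
```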
I probably could have just built this by iterating on shell commands, but I like editing with a full-blown editor and don’t particularly want to practice using the zsh builtin editor. 🤷🏻‍♂️
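To double-check that the embeddings actually landed somewhere, I can poke at the database with plain sqlite3. This is only a sketch: the `embeddings` table name is my reading of llm’s default embeddings schema, so verify with `.tables` first if yours differs.

```sh
# Sanity-check the database produced by extract-entries.sh.
# The `embeddings` table name assumes llm's default embeddings schema.
sqlite3 journals.db '.tables'
sqlite3 journals.db 'select count(*) from embeddings;'
```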
Load, of a sort
Once that script finishes (it takes a few moments to generate all the embeddings), querying for documents similar to a query text is also straightforward:
# Query the embeddings and pretty display the results
# query.sh
# ./query.sh "What is good in life?"
query=$1
llm similar journals \
--number 3 \
--content "$query" |
jq -r -c '.content' | # [1]
mdcat # [2]
Of note, two things that probably should have been more obvious to me:
- I don’t need to write a for-loop in shell to handle the output of `llm similar`; `jq` basically has an option for that
- Pretty-printing Markdown to a terminal is trivial after `brew install mdcat`
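As an aside, a small variation on query.sh surfaces the similarity scores too. This is a sketch; it assumes `llm similar` emits one JSON object per line with `id` and `score` keys alongside the `content` key that query.sh already relies on:

```sh
# Variation on query.sh: print the id and similarity score for each match.
# Assumes each result line is a JSON object with id/score/content keys.
query=$1

llm similar journals \
  --number 3 \
  --content "$query" |
jq -c '{id, score}'
```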
I didn’t go too far into clustering, which also boils down to one command: `llm cluster journals 10`. I hit a hiccup wherein I couldn’t run a model like Llama 2, or an even smaller one, because of issues with my installation.
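For completeness, the clustering step I was aiming for looks roughly like this; as far as I can tell the `cluster` subcommand comes from the llm-cluster plugin, so install that first if the command is missing:

```sh
# Clustering sketch: `llm cluster` is provided by the llm-cluster plugin.
llm install llm-cluster
llm cluster journals 10
```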
Things I learned!
- `jq` is very good on its own!
  - and has been for years, probably!
  - using a copilot to help me take the first step with syntax using my own data is the epiphany here
- `llm` is quite good, doubly so with its growing ecosystem of plugins
  - if I were happier with using shells, I could have done all of this in a couple of relatively simple commands
  - it provides an adapter layer that makes it possible to start experimenting/developing against usage-priced APIs and switch to running models/APIs locally when you get serious
- it’s feasible to do some kinds of LLM work on your own computer
  - in particular, if you don’t mind trading your own time getting your installation right to gain independence from API vendors and usage-based pricing
Mission complete: I have a queryable index of document vectors I can experiment with for searching, clustering, and building applications on top of my journals.
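As a teaser for the “why not both” idea above, a hybrid search could bolt SQLite’s FTS5 onto the same database. This is only a sketch: the `embeddings` table and its `id`/`content` columns are my understanding of llm’s default schema (and `--store` is what put the content there), so check `.schema embeddings` before trying it.

```sh
# Sketch: add a full-text index over the stored journal content.
# Table/column names assume llm's default embeddings schema; verify first.
sqlite3 journals.db <<'SQL'
CREATE VIRTUAL TABLE IF NOT EXISTS journals_fts USING fts5(id, content);
INSERT INTO journals_fts (id, content)
  SELECT id, content FROM embeddings;
SELECT id FROM journals_fts WHERE journals_fts MATCH 'coffee' LIMIT 3;
SQL
```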
Building a language model from scratch, from a tutorial
I’m working from Brian Kitano’s Llama from scratch (or how to implement a paper without crying). It’s very deep, so I probably won’t make it all the way through in the long weekend I’ve allocated for it.
I’ve skimmed the paper but didn’t pay extremely close attention to the implementation details. The tutorial provides breadcrumbs into the deeply math-y bits, though, so no problems here.
I noticed that there are Ruby bindings for most of these ML libraries and was tempted to try implementing it in the language I love. But I would rather not get mired in looking up docs or translating across languages/APIs. And, I want to get more familiar with Python (after almost twenty years of not using it).
I started off trying to implement this like a Linux veteran would, as a basic CLI program. But I switched over to Jupyter, since it looks like part of building models is analyzing plots, and that’s not going to go well on a CLI. And so I’m not swimming upstream so much.
Per an idea from Making Large Language Models work for you, I’m frequently using ChatGPT to quickly understand the PyTorch neural network APIs in context. Normally, I’d go on a time-consuming side-quest getting up to speed on an unfamiliar ecosystem. ChatGPT is reducing that from hours (and possibly a blocker) to a few minutes. Highly recommend reading those slides and trying a few of the ideas in your daily work.