Things We Learned About LLMs in 2024
Simon Willison summarizes key learnings about LLMs in 2024.
Here are some things I found interesting:
LLMs can handle far more context, which we ought to be able to use to point them at problems and material outside of their training data.
Gemini 1.5 Pro also illustrated one of the key themes of 2024: increased context lengths. Last year most models accepted 4,096 or 8,192 tokens, with the notable exception of Claude 2.1 which accepted 200,000. Today every serious provider has a 100,000+ token model, and Google’s Gemini series accepts up to 2 million. Longer inputs dramatically increase the scope of problems that can be solved with an LLM: you can now throw in an entire book and ask questions about its contents, but more importantly you can feed in a lot of example code to help the model correctly solve a coding problem. LLM use-cases that involve long inputs are far more interesting to me than short prompts that rely purely on the information already baked into the model weights.
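To make that long-context workflow concrete, here's a minimal sketch using the OpenAI Python client: read a whole document off disk and ask a question about it in one prompt. The model name, file name, and question are placeholders of mine, and any provider with a 100,000+ token window would work much the same way.

```python
# Minimal long-context sketch: stuff an entire document into one prompt and
# ask a question about it. Model name and file path are placeholders.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

with open("the_whole_book.txt") as f:
    book = f.read()

response = client.chat.completions.create(
    model="gpt-4o",  # any model with a 100k+ token context would work here
    messages=[
        {"role": "system", "content": "Answer questions using only the provided text."},
        {"role": "user", "content": f"{book}\n\nQuestion: Who is the narrator?"},
    ],
)
print(response.choices[0].message.content)
```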
LLMs gained more multi-modal capabilities, being able to process video and audio in near real time.
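As a rough illustration of multi-modal input (an image here, rather than streaming audio or video, which go through dedicated real-time APIs), here's a sketch of sending an image alongside a text prompt with the OpenAI Python client; the URL and model name are placeholders.

```python
# Multi-modal sketch: mix text and an image in a single message.
# The image URL and model name below are placeholders.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is happening in this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/frame.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```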
The cost of using LLMs has gone down, which may reflect increased efficiency and competition in the market. (Remember to renegotiate any contracts that may include prices per token.)
In December 2023 […] OpenAI were charging $30/million input tokens for GPT-4 […] Today […] GPT-4o is $2.50 (12x cheaper than GPT-4) and GPT-4o mini is $0.15/mTok—nearly 7x cheaper than GPT-3.5 and massively more capable.
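To put those per-token prices in perspective, here's a tiny back-of-envelope script; the one-million-token input is my own arbitrary example, not a figure from Simon's post.

```python
# Back-of-envelope cost comparison using the per-million-token ("mTok")
# input prices quoted above. The input size is arbitrary, for illustration.
PRICES_PER_MTOK = {
    "GPT-4 (Dec 2023)": 30.00,
    "GPT-4o": 2.50,
    "GPT-4o mini": 0.15,
}

input_tokens = 1_000_000  # roughly a few novels' worth of text

for model, price in PRICES_PER_MTOK.items():
    cost = input_tokens / 1_000_000 * price
    print(f"{model}: ${cost:.2f}")

# GPT-4 (Dec 2023): $30.00
# GPT-4o: $2.50
# GPT-4o mini: $0.15
```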
Training costs have also gone down:
DeepSeek v3 is a huge 685B parameter model—one of the largest openly licensed models currently available, significantly bigger than the largest of Meta’s Llama series, Llama 3.1 405B. […] The really impressive thing about DeepSeek v3 is the training cost. The model was trained on 2,788,000 H800 GPU hours at an estimated cost of $5,576,000. Llama 3.1 405B trained 30,840,000 GPU hours—11x that used by DeepSeek v3, for a model that benchmarks slightly worse.
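Running the quoted figures through a quick sanity check shows the implied cost per GPU-hour and the GPU-hour ratio; this is just arithmetic on the numbers above.

```python
# Sanity check on the DeepSeek v3 training figures quoted above:
# implied cost per H800 GPU-hour, and the GPU-hour ratio versus Llama 3.1 405B.
deepseek_gpu_hours = 2_788_000
deepseek_cost_usd = 5_576_000
llama_gpu_hours = 30_840_000

print(f"Implied cost: ${deepseek_cost_usd / deepseek_gpu_hours:.2f} per GPU-hour")
print(f"Llama 3.1 405B used {llama_gpu_hours / deepseek_gpu_hours:.1f}x the GPU-hours")

# Implied cost: $2.00 per GPU-hour
# Llama 3.1 405B used 11.1x the GPU-hours
```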
Although efficiency improvements reduce the “OpEx” environmental impacts of LLMs, there are huge fixed “CapEx” environmental impacts incurred by big tech companies:
Companies like Google, Meta, Microsoft and Amazon are all spending billions of dollars rolling out new datacenters, with a very material impact on the electricity grid and the environment.
On the question of whether LLM advances will slow as there’s limited new training data:
The idea is seductive: as the internet floods with AI-generated slop the models themselves will degenerate, feeding on their own output in a way that leads to their inevitable demise! That’s clearly not happening. Instead, we are seeing AI labs increasingly train on synthetic content—deliberately creating artificial data to help steer their models in the right way.
Simon’s post has a lot more in it, and he’s worth a follow.