A few weeks ago, I watched “Deep Dive into LLMs like ChatGPT” by Andrej Karpathy.
I found it EXTREMELY valuable. It’s the most concise material I’ve seen so far on this topic.
If you have some knowledge of software and want to learn about LLMs, this video is the one to watch. It explains all the main concepts and fundamentals of this new tech.
Interestingly enough, I got the feeling that Andrej also tried to make this video helpful for people outside the software industry. Even if you are a baker who has tried ChatGPT at least once, this video can definitely teach you some aspects of how LLMs work. However, I think that not knowing the basics of software development could leave many parts of the video feeling like a mystery. So I would not call it a pop-science video; it's more of an educational video for programmers (in broad terms).
What I personally learned from this video:
- Resources
  - Tiktokenizer - to see the results of tokenization
  - LLM Visualization - an interactive tool by Brendan Bycroft that visualizes OpenAI's early, small models
  - Hyperbolic - a platform for hosting and training open-source LLMs
  - Together AI - another inference provider
  - LM Arena - a leaderboard for comparing LLMs
  - AI News - an AI-gathered and summarized news website, posted daily
  - LM Studio - a tool for running small LLMs locally
- A demonstration of how a model is taught to use tools: it is given training samples containing special tokens that indicate tool usage, like `Assistant:<SEARCH_START>Who is character X?<SEARCH_END>`. I don't yet know how exactly LLMs are taught to use custom tools, but I can imagine it happens similarly. For instance, a sample could list the tools available to the model, paired with a right answer that uses the appropriate tool for the question (or uses no tool when none is appropriate).
- An explanation of why LLMs struggle with math problems, showing how an LLM spends its compute by passing through the network to generate the result. Doing only a few passes through the network means using very little compute, since only a few tokens are generated. Generating a single token can be seen as one unit of the model's "thought process." The results get more accurate when the model is asked to explain the solution, which makes it pass through the network more times and increases the chances of getting the right answer. Other examples of challenging problems:
  - Counting the number of dots in "……."
  - String operations like "print every second character in the word 'ubiquitous'"
  - Which is bigger: 9.11 or 9.9?
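The counting and string tasks above are hard for LLMs largely because the model never sees individual characters, only tokens. Here is a minimal toy sketch of that effect, using a greedy longest-match tokenizer over a made-up vocabulary (real tokenizers like GPT's use byte-pair encoding, but the chunking effect is similar; plain ASCII dots stand in for the dots in the example):

```python
# Toy greedy longest-match tokenizer. The vocabulary is a hypothetical
# illustration, not any real model's vocabulary.
VOCAB = ["......", "...", ".", "ubi", "quit", "ous"]

def tokenize(text: str) -> list[str]:
    tokens = []
    i = 0
    while i < len(text):
        # Pick the longest vocabulary entry that matches at position i,
        # falling back to the raw character if nothing matches.
        match = max(
            (tok for tok in VOCAB if text.startswith(tok, i)),
            key=len,
            default=text[i],
        )
        tokens.append(match)
        i += len(match)
    return tokens

print(tokenize("......."))    # → ['......', '.'] — 7 dots become just 2 tokens
print(tokenize("ubiquitous")) # → ['ubi', 'quit', 'ous'] — no per-character view
```

From the model's perspective, "count the dots" means reasoning about what is inside opaque multi-character chunks, which is why spelling-level tasks are disproportionately hard.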
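The tool-use training samples mentioned above could plausibly look something like the sketch below. This is purely my guess at the format, modeled on the `<SEARCH_START>…<SEARCH_END>` example from the video; the token names and sample layout are assumptions, not any real lab's training data:

```python
# Hypothetical shape of a supervised tool-use training sample.
# All special token names here are illustrative assumptions.

def make_sample(question: str, tool_query: str, tool_result: str, answer: str) -> str:
    # The model is trained to emit the special tokens itself; at inference
    # time the runtime would detect them, run the search, and paste the
    # result back into the context before generation continues.
    return (
        f"Human: {question}\n"
        f"Assistant: <SEARCH_START>{tool_query}<SEARCH_END>\n"
        f"<RESULT_START>{tool_result}<RESULT_END>\n"
        f"Assistant: {answer}"
    )

sample = make_sample(
    question="Who is character X?",
    tool_query="character X",
    tool_result="Character X is the protagonist of the story.",
    answer="Character X is the protagonist of the story.",
)
print(sample)
```

Enough samples of this shape would teach the model *when* to emit the search tokens, which is the behavior the video demonstrates.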
- A phenomenal section about reinforcement learning. Previously, I had only a general understanding of what it is, but this video explained a lot: great examples, a solid overview of the method's challenges and limitations, and a clear explanation of how it creates "reasoning" in models.
- Move 37 - I had read about AlphaGo but never paid attention to why move 37 is important. What makes it special is that human players considered it a bad move, yet it turned out to be a crucial one in the game. The network came up with it by playing against itself rather than against humans, which explains why professional players found it so surprising. It's interesting how Andrej mentions that observing a "move 37" in other domains could be a breakthrough.
All in all, I’m very glad that I watched this video. Highly recommend it, especially to those who want to learn more about the LLM domain but are at the beginning of their journey.