Sunday
The Short Case for Nvidia Stock — I spent less than an hour reading a substantial portion of this article, and it is so good that I will need to allocate time to read it again. The entire article, and especially its DeepSeek portion, is highly recommended even if one is not interested in investing: it is a detailed outlook on the entire AI industry.
On my second read, I noticed that the article covered tech I have been following quite closely as well:
- It mentioned how Cerebras solved its yield problem, and I have separately read about Cerebras's CePO test-time compute strategy;
- It mentioned Groq, and I have tried its excellent and speedy inference service with a free account;
- It mentioned George Hotz’s Tiny Corp. and its tinygrad, which I have been closely following on X. Back in the day, George Hotz was famous for jailbreaking the original iPhone as a teenager;
- It mentioned MLX, which, as the article said, provides a PyTorch-like API that can run efficiently on Apple Silicon, showing how abstraction layers can enable AI workloads to run on completely different architectures (a toy sketch of that API follows this list). MLX is particularly interesting because it supports distributed computation, both training and inference, across multiple Macs. Its main contributor, Awni Hannun, mentioned today that DeepSeek R1 can run with 4-bit quantization across three 192 GB M2 Ultra Mac Studios at 12 tokens per second, requiring a minimum of 450 GB of GPU memory;
- And of course, it covered DeepSeek R1 in sufficient technical detail.
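To give a feel for that PyTorch-like API, here is a minimal MLX sketch. It is my own toy example, not code from the article, and assumes the mlx package is installed (it runs only on Apple Silicon):

```python
import mlx.core as mx
import mlx.nn as nn

class MLP(nn.Module):
    """A tiny two-layer network, written much as one would in PyTorch."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(32, 64)
        self.fc2 = nn.Linear(64, 1)

    def __call__(self, x):  # MLX modules use __call__ instead of forward()
        return self.fc2(nn.relu(self.fc1(x)))

model = MLP()
x = mx.random.normal((8, 32))  # lives in unified memory; no .to(device)
y = model(x)                   # builds a lazy compute graph
mx.eval(y)                     # evaluation runs on the Apple Silicon GPU
print(y.shape)                 # (8, 1)
```

The lack of device shuffling is the point: arrays live in Apple's unified memory, which is part of what makes the multi-Mac distributed setup above appealing.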
Wow, what a gem as a long-form read!
P.S.
- Chamath Palihapitiya also thought the article was very good:
With R1, DeepSeek essentially cracked one of the holy grails of AI: getting models to reason step-by-step without relying on massive supervised datasets. Their DeepSeek-R1-Zero experiment showed something remarkable: using pure reinforcement learning with carefully crafted reward functions, they managed to get models to develop sophisticated reasoning capabilities completely autonomously. This wasn’t just about solving problems — the model organically learned to generate long chains of thought, self-verify its work, and allocate more computation time to harder problems.
The technical breakthrough here was their novel approach to reward modeling. Rather than using complex neural reward models that can lead to “reward hacking” (where the model finds bogus ways to boost its rewards that don’t actually lead to better real-world model performance), they developed a clever rule-based system that combines accuracy rewards (verifying final answers) with format rewards (encouraging structured thinking). This simpler approach turned out to be more robust and scalable than the process-based reward models that others have tried.
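To make that reward recipe concrete, here is a minimal sketch of what such a rule-based reward could look like. The tag names, exact-match check, and equal weighting are my illustrative assumptions, not DeepSeek's actual implementation:

```python
import re

def format_reward(completion: str) -> float:
    # Hypothetical format reward: 1 if the model wraps its reasoning in
    # <think>...</think> followed by a final <answer>...</answer>.
    pattern = r"^<think>.+?</think>\s*<answer>.+?</answer>$"
    return 1.0 if re.match(pattern, completion, re.DOTALL) else 0.0

def accuracy_reward(completion: str, reference: str) -> float:
    # Hypothetical accuracy reward: 1 if the extracted final answer
    # exactly matches a verifiable reference (e.g. a math result).
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    answer = m.group(1).strip() if m else ""
    return 1.0 if answer == reference.strip() else 0.0

def rule_based_reward(completion: str, reference: str) -> float:
    # No neural reward model anywhere, just verifiable rules, which is
    # what makes this scheme hard to reward-hack.
    return accuracy_reward(completion, reference) + format_reward(completion)
```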
- With extraordinary prescience on the article's part, Nvidia stock was down over 14% around 11 a.m. the morning after this entry was written, and the tech-heavy Nasdaq Composite fell 2.5%.
- Simon Willison likes it too, calling it “Long, excellent piece by Jeffrey Emanuel capturing the current state of the AI/LLM industry”:
The real joy of this article is the way it describes technical details of modern LLMs in a relatively accessible manner. I love this description of the inference-scaling tricks used by O1 and R1, compared to traditional transformers.
7B Model and 8K Examples: Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient — Interesting. DeepSeek R1’s RL training techniques can be successfully applied to smaller models as well, at least for simple math datasets.
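For context, the RL algorithm DeepSeek used for R1 is GRPO (Group Relative Policy Optimization), whose core trick, scoring each sampled completion against the statistics of its own group rather than against a learned critic, fits in a few lines. A minimal sketch of just that advantage computation, my illustration rather than code from this paper:

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray) -> np.ndarray:
    """Group-relative advantages as in GRPO: each completion's reward is
    normalized by the mean and std of all completions sampled for the
    same prompt, removing the need for a separate value/critic model."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# Example: 8 completions sampled for one math prompt, rewarded 1 if the
# final answer was verified correct, 0 otherwise.
rewards = np.array([1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0])
print(grpo_advantages(rewards))  # correct answers get positive advantage
```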
DeepSeek R1 for Everyone and DeepSeek V3 101 — After a brief skim, these look like promising, accessible reads for understanding some of the technical details of DeepSeek R1 and V3.
“Agents” still haven’t really happened yet — “If you tell me that you are building ‘agents’, you’ve conveyed almost no information to me at all. Without reading your mind I have no way of telling which of the dozens of possible definitions you are talking about.”