Nvidia, DeepSeek, and RL Reasoning: Long-Form Analysis Notes

TL;DR

Although it’s quite long, The Short Case for Nvidia Stock is a fascinating read. Also, agents are not happening yet.


The Short Case for Nvidia Stock — In under an hour I got through a substantial portion of this article, and it’s good enough that I will set aside time to read it again in full. The entire article, and especially the DeepSeek portion of it, is highly recommended even if one is not interested in investing: it’s a detailed outlook on the entire AI industry.

On the second read, I noticed the article also covers tech that I have been following quite closely.

Wow, what a gem as a long-form read!

P.S.

With R1, DeepSeek essentially cracked one of the holy grails of AI: getting models to reason step-by-step without relying on massive supervised datasets. Their DeepSeek-R1-Zero experiment showed something remarkable: using pure reinforcement learning with carefully crafted reward functions, they managed to get models to develop sophisticated reasoning capabilities completely autonomously. This wasn’t just about solving problems: the model organically learned to generate long chains of thought, self-verify its work, and allocate more computation time to harder problems.
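From what I understand of the R1 paper, the RL algorithm behind this is GRPO (Group Relative Policy Optimization), which drops PPO’s learned value network and instead scores each sampled completion against the other completions drawn for the same prompt. A minimal sketch of that group-relative advantage step, assuming a `(num_prompts, group_size)` tensor of scalar rewards (the names here are mine, not DeepSeek’s):

```python
import torch

def group_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """Normalize rewards within each group of completions for one prompt.

    rewards: shape (num_prompts, group_size), one scalar reward per sampled
    completion. GRPO uses these normalized scores as advantages instead of
    training a separate value network the way PPO does.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Toy example: 1 prompt, 4 sampled completions, 0/1 accuracy rewards.
r = torch.tensor([[1.0, 0.0, 0.0, 1.0]])
print(group_advantages(r))  # correct completions get positive advantage
```

The appeal is that the baseline comes for free from the group statistics, so there is no second model to train alongside the policy.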

The technical breakthrough here was their novel approach to reward modeling. Rather than using complex neural reward models that invite “reward hacking” (where the model finds bogus ways to boost its reward that don’t actually lead to better real-world performance), they developed a clever rule-based system that combines accuracy rewards (verifying final answers) with format rewards (encouraging structured thinking). This simpler approach turned out to be more robust and scalable than the process-based reward models others have tried.
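To make the accuracy-plus-format idea concrete, here’s a toy sketch of what such a rule-based reward could look like. The `<think>`/`<answer>` tags match the template described in the R1 paper, but the weights and the plain string comparison are my own simplifications; the real checks (running code against test cases, comparing boxed math answers) are more involved:

```python
import re

TEMPLATE = re.compile(
    r"^<think>.*?</think>\s*<answer>(.*?)</answer>\s*$", re.DOTALL
)

def rule_based_reward(completion: str, ground_truth: str) -> float:
    """Toy rule-based reward: format reward plus accuracy reward.

    No learned reward model to hack: the completion either follows the
    <think>...</think><answer>...</answer> template and matches the known
    answer, or it doesn't.
    """
    reward = 0.0
    match = TEMPLATE.match(completion.strip())
    if match:
        reward += 0.1  # format reward (weight is an assumption)
        if match.group(1).strip() == ground_truth.strip():
            reward += 1.0  # accuracy reward: verifiably correct final answer
    return reward

print(rule_based_reward("<think>2+2=4</think><answer>4</answer>", "4"))  # 1.1
```

Because every component is a deterministic rule, there is nothing for the policy to exploit the way it can exploit a learned reward model, which is exactly the robustness argument the article makes.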

The real joy of this article is the way it describes the technical details of modern LLMs in a relatively accessible manner. I love its description of the inference-scaling tricks used by o1 and R1, compared to traditional transformer inference.


7B Model and 8K Examples: Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient — Interesting. DeepSeek R1’s RL training techniques can be successfully applied to smaller models as well, at least for simple math datasets.


DeepSeek R1 for Everyone and DeepSeek V3 101 — From a quick skim, both look like promising, accessible ways to understand some of the technical details of DeepSeek R1 and V3.


“Agents” still haven’t really happened yet — “If you tell me that you are building ‘agents’, you’ve conveyed almost no information to me at all. Without reading your mind I have no way of telling which of the dozens of possible definitions you are talking about.”