Wednesday, Chinese New Year

On DeepSeek and Export Controls — Dario Amodei, Anthropic’s CEO, wrote a fairly long editorial on DeepSeek. However, it never mentions that DeepSeek’s models are open-weight and released under a permissive MIT license, while Anthropic’s and OpenAI’s models remain closed-weight, with no transparency about the techniques used for either training or inference. At one point, Amodei notes that both DeepSeek and OpenAI’s o1 use RL, implying that DeepSeek’s use of RL to train R1-Zero is not especially innovative. But we don’t know how OpenAI used RL to train o1, except that o1 “uses a chain of thought when attempting to solve a problem,” and that reinforcement learning was used to train it1. It could well be that DeepSeek’s use of RL for train-time compute differs substantially from o1’s, and the fact that the accompanying technical report describes GRPO in enough detail to make it fully reproducible is much more noteworthy.
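
To give a sense of that level of detail: the R1 report, like the earlier DeepSeekMath paper discussed in the next item, writes the GRPO surrogate objective out explicitly. My rough transcription, which may gloss over some notation, is

$$
\mathcal{J}_{\mathrm{GRPO}}(\theta)=\mathbb{E}\left[\frac{1}{G}\sum_{i=1}^{G}\frac{1}{|o_i|}\sum_{t=1}^{|o_i|}\left(\min\Big(r_{i,t}(\theta)\,\hat{A}_{i,t},\ \mathrm{clip}\big(r_{i,t}(\theta),\,1-\varepsilon,\,1+\varepsilon\big)\,\hat{A}_{i,t}\Big)-\beta\,\mathbb{D}_{\mathrm{KL}}\big[\pi_\theta\,\|\,\pi_{\mathrm{ref}}\big]\right)\right]
$$

where $r_{i,t}(\theta)$ is the per-token probability ratio against the old policy and, for outcome rewards, $\hat{A}_{i,t}$ is simply the reward of completion $o_i$ normalized by the mean and standard deviation of the rewards of the $G$ completions sampled for the same question. No value network is needed, which is the practical point.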


DeepSeekMath Paper Explained — Yannic Kilcher gave this one-hour explanation of the DeepSeekMath paper. I watched the first five minutes, then from minute 30 onward, where he covers GRPO. His explanations of GRPO are top-notch. The final five minutes, on Section 5.2.2 (“Why RL Works”), are insightful and worth tuning in for.
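
For a concrete sense of the group-relative idea at the heart of GRPO, here is a minimal Python sketch. It is not DeepSeek’s code; the reward values, group size, and function name are invented for illustration, and it shows only the advantage computation, not the full clipped surrogate objective or the KL penalty.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize a group of outcome rewards to zero mean and unit std.

    GRPO's simplification over PPO: instead of a learned value-function
    baseline, each sampled completion is scored relative to the other
    completions drawn for the same prompt.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Hypothetical example: 8 completions sampled for one math prompt,
# rewarded 1.0 if the final answer checks out and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(rewards))
# Correct completions get a positive advantage, incorrect ones a negative one.
```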


Complete hardware for the full DeepSeek R1 at Q8 quantization, at $6000 — The fact that this CPU-only server can generate 6-8 tokens per second, roughly human reading speed, shows the very substantial advantage of Mixture-of-Experts (MoE) models on CPU-only home servers compared to dense models such as Llama 3.1 405B: only a small fraction of the model’s parameters is activated for each token, so far less memory bandwidth is needed per token, as the rough estimate below illustrates. Assembling such a server is non-trivial and not for the faint of heart, but it has certainly been shown to be possible.
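
A memory-bandwidth-bound estimate makes the gap concrete. This is a sketch under my own assumptions (about 1 byte per weight at Q8 and a guessed ~250 GB/s of sustained RAM bandwidth for a dual-socket DDR5 server; neither number comes from the build write-up), using the published figure of roughly 37B activated parameters out of 671B for DeepSeek R1:

```python
# Back-of-the-envelope decode speed when generation is limited by how fast
# the active weights can be read from RAM. Assumed numbers, not measured.
BANDWIDTH_GB_S = 250      # assumed sustained memory bandwidth of the server
BYTES_PER_WEIGHT = 1.0    # Q8 quantization, roughly 1 byte per weight

def tokens_per_second(active_params_billion):
    gb_read_per_token = active_params_billion * BYTES_PER_WEIGHT
    return BANDWIDTH_GB_S / gb_read_per_token

print(f"DeepSeek R1 (MoE, ~37B active per token): {tokens_per_second(37):.1f} tok/s")
print(f"Llama 3.1 405B (dense, all 405B active):  {tokens_per_second(405):.1f} tok/s")
```

Under these assumptions the MoE model lands at roughly 6-7 tokens per second, in line with what the build reports, while a dense 405B model on the same hardware would manage well under one.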

Footnotes

  1. Learning to reason with LLMs, OpenAI, September 12, 2024.