Monday
GRPO will soon be added to Apple MLX — The PR now works, using about 32 GB of memory when training Qwen2.5-0.5B.
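For context, GRPO (Group Relative Policy Optimization, introduced in DeepSeekMath and used to train DeepSeek-R1) drops PPO's value network and instead normalizes each completion's reward against its own sampling group. Here is a minimal PyTorch sketch of that objective; the MLX PR would express the same thing in MLX arrays, and the function names and default hyperparameters below are mine, not the PR's:

```python
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Group-relative advantages: normalize each completion's reward against
    the mean/std of its own group of G samples (no value network needed).
    rewards: shape (num_prompts, group_size)."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + 1e-8)

def grpo_loss(logp, logp_old, logp_ref, advantages, clip_eps=0.2, kl_coef=0.04):
    """PPO-style clipped surrogate plus a KL penalty toward a frozen
    reference model. logp / logp_old / logp_ref are per-token log probs
    of the sampled completions; advantages broadcasts over tokens."""
    ratio = torch.exp(logp - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # Unbiased k3 KL estimator: exp(ref - pi) - (ref - pi) - 1
    kl = torch.exp(logp_ref - logp) - (logp_ref - logp) - 1
    return -(torch.min(unclipped, clipped) - kl_coef * kl).mean()
```

Because the advantage comes from within-group comparison rather than a learned critic, the trainer only needs the policy and a frozen reference model in memory, which is why a 0.5B model fits in roughly 32 GB.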
Minimal-R1 — Another excellent reproduction of DeepSeek R1 with GRPO, using only an 8xH100 server. It addresses the scalability issue Hugging Face’s Open-R1 runs into when generating long completions. What makes it stand out is that it doesn’t depend on TRL and ships its own GRPO implementation. It dedicates one GPU to vLLM generation and one to the reference model.
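Based on the repo's description, here is a minimal sketch of how that two-GPU split might look; this is my own illustration, not Minimal-R1's actual code, and the model name and hyperparameters are placeholders:

```python
# vLLM owns cuda:0 for fast rollouts, while a frozen copy of the model
# sits on cuda:1 to supply reference log probs for GRPO's KL penalty.
import torch
from transformers import AutoModelForCausalLM
from vllm import LLM, SamplingParams

MODEL = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder; swap in the actual base model

# GPU 0: rollout generation. vLLM claims cuda:0 by default in this process.
generator = LLM(model=MODEL, gpu_memory_utilization=0.85)
sampling = SamplingParams(n=8, temperature=1.0, max_tokens=1024)  # G samples/prompt

# GPU 1: frozen reference model, used only for the KL term.
ref_model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.bfloat16, device_map={"": "cuda:1"}
).eval()

@torch.no_grad()
def ref_logprobs(token_ids: torch.Tensor) -> torch.Tensor:
    """Per-token log probs of sampled sequences under the reference model."""
    ids = token_ids.to("cuda:1")
    logits = ref_model(ids).logits[:, :-1]  # position t predicts token t+1
    targets = ids[:, 1:]
    logps = torch.log_softmax(logits.float(), dim=-1)
    return logps.gather(-1, targets.unsqueeze(-1)).squeeze(-1)

outputs = generator.generate(["Solve: 2 + 2 = ?"], sampling)  # runs on cuda:0
```

Keeping generation and reference scoring on separate devices means long-completion rollouts never compete with the reference model for memory, which is exactly the bottleneck the repo says Open-R1 hits.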
Kevin Bryan shares his view on OpenAI Deep Research — Kevin Bryan of the University of Toronto describes his early experiences with OpenAI’s Deep Research. He is extremely upbeat about it, even sharing a paper that Deep Research (i.e., the o3 model with web search capabilities) wrote in 15 minutes, as well as another, more theoretical paper.
Here is an interesting exchange from Prof. Bryan’s thread:
Nick Pretnar asks: Can it simultaneously write a paper + model code, estimate/calibrate such model, discern which results are relevant to discuss then present such results in a way humans can understand?
Kevin Bryan: That’s beyond current capabilities. But the proof of concept is pretty clear. At this point, it’s by far most useful as a complement — you should be writing your code with Cursor plus frontier models, having AI supplement and check analysis, having AI check proof accuracy, etc.
This is what I would call a human-in-the-loop approach to academic research. But of course, if abused, it could flood the academic literature with mediocre AI-generated papers in the near future.
WTF happened in 1971? — 1971 is indeed a special year: it was when Elon Musk, Marc Andreessen, Ma Huateng, Liu Yunhao, and I were born.