tiny-llm and Practical PyTorch Learning Prerequisites
tiny-llm — Exactly what I wished for. It also contains links to two existing PyTorch-related machine learning courses from Carnegie Mellon University.
The Ultra-Scale Playbook: Training LLMs on GPU Clusters — Amazing: we finally have a 100-page open-source online book on how models are trained across multiple GPUs.
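For orientation, the simplest of the parallelism strategies the playbook covers is plain data parallelism: every GPU holds a full model replica, processes its own slice of the batch, and the gradients are all-reduced so the replicas stay in sync. A minimal PyTorch DDP sketch (my own illustration, not code from the book):

```python
# Minimal data-parallel training sketch (illustrative only).
# Launch with: torchrun --nproc_per_node=<num_gpus> ddp_sketch.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")             # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):
        x = torch.randn(32, 1024, device=local_rank)
        loss = model(x).pow(2).mean()           # stand-in for a real LM loss
        loss.backward()                         # gradients all-reduced across ranks
        opt.step()
        opt.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```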
Unsloth.ai’s GRPO — It seems that Unsloth’s GRPO implementation uses less GPU memory than other implementations, and it supports both QLoRA and LoRA.
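For readers who want to try it, here is a rough sketch of how GRPO with a LoRA adapter is typically wired up through Unsloth and TRL; the model name, reward function, and hyperparameters are my own placeholders and may not match the official Unsloth notebooks.

```python
# Rough GRPO + LoRA sketch via Unsloth and TRL; values are illustrative.
from unsloth import FastLanguageModel
from trl import GRPOConfig, GRPOTrainer
from datasets import load_dataset

# Load the base model in 4-bit (QLoRA-style) to keep GPU memory low.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-0.5B-Instruct",
    max_seq_length=1024,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small number of weights are trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# GRPO only needs a scalar reward per sampled completion -- no separate
# value model, which is a big part of why it is memory-friendly.
def reward_len(completions, **kwargs):
    return [-abs(len(c) - 200) for c in completions]

dataset = load_dataset("trl-lib/tldr", split="train")

trainer = GRPOTrainer(
    model=model,
    reward_funcs=reward_len,
    args=GRPOConfig(
        output_dir="grpo-lora-sketch",
        num_generations=4,
        max_completion_length=128,
    ),
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```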
DOGE: Make AI Conferences Great Again — Zeyuan Allen-Zhu wrote a very interesting piece on using LLMs as arbitrators in reviewer-author discussions.
GRPO will soon be added to Apple MLX — The PR now works, using about 32 GB of memory when training Qwen2.5-0.5B.
On DeepSeek and Export Controls — Dario Amodei, Anthropic’s CEO, wrote a fairly long editorial on DeepSeek.
Qwen 2.5 7B 1M — I have just tried Qwen’s latest model, the 7B with a 1M-token context window, locally in LM Studio 0.3.8 (Build 4). I loaded an entire PhD thesis into the model, and LM Studio gleefully chose inject-full-content as its content injection strategy.
The Short Case for Nvidia Stock — Although it’s quite long, it is a fascinating read. Also, agents are not happening yet.
Open-R1 — Hugging Face started to reproduce DeepSeek R1 in the open, and discussed the R1 technical report in a recorded YouTube video.