Wednesday
The Ultra-Scale Playbook: Training LLMs on GPU Clusters — Amazing: at last, a roughly 100-page open-source online book on how models are trained across multiple GPUs, with reproducible source code.
Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch — The latest paper from DeepMind on efficient, geographically distributed training that overlaps communication with computation.