Baochun’s Notes

Wednesday, February 19, 2025

The Ultra-Scale Playbook: Training LLMs on GPU Clusters — Amazing! We finally have a 100-page open-source online book on how models are trained across multiple GPUs, complete with reproducible source code.


Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch — The latest paper from DeepMind on efficient, geographically distributed training that overlaps communication with computation.


Baochun Li © 2025