Thursday
How to fine-tune open LLMs in 2025 with Hugging Face — Philipp Schmid, a Technical Lead at Hugging Face, posted this article on fine-tuning LLMs with Hugging Face tooling alone, without using the Unsloth API. I find it comprehensive and will need to give it a try myself.
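For reference, here is a minimal sketch of what fine-tuning with Hugging Face's TRL library looks like (SFTTrainer plus a LoRA adapter); the model, dataset, and hyperparameters are my own placeholders, not necessarily the article's setup.

```python
# Hedged sketch of supervised fine-tuning with TRL's SFTTrainer and a LoRA adapter.
# Model, dataset, and hyperparameters are illustrative placeholders.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Conversational dataset in the "messages" format; TRL applies the chat template.
dataset = load_dataset("trl-lib/Capybara", split="train")

peft_config = LoraConfig(
    r=16,                          # LoRA rank
    lora_alpha=32,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

training_args = SFTConfig(
    output_dir="sft-sketch",
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # small placeholder model
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```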
Mini-R1 — Philipp Schmid also posted this interesting reproduction of DeepSeek R1’s RL training. Similar to TinyZero, it used the Countdown Game as the task, but the article is much better written.
Mini-R1 used Hugging Face’s own TRL, a library for training transformer language models with RL in the post-training phase, which Hugging Face introduces in its smol course. For multi-GPU training it used DeepSpeed. In contrast, TinyZero used ByteDance’s veRL for both RL and distributed training; veRL has neither TRL nor DeepSpeed among its dependencies.
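To give a sense of the TRL side, below is a hedged sketch of RL post-training with TRL's GRPOTrainer (GRPO being the RL algorithm DeepSeek used for R1). The prompts, model, and reward function here are toy placeholders; Mini-R1 builds Countdown Game prompts and uses rule-based format and correctness rewards instead.

```python
# Hedged sketch of GRPO-style RL post-training with TRL's GRPOTrainer.
# Prompts, model, and reward are toy placeholders, not Mini-R1's actual setup.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Toy prompt dataset; Mini-R1 generates Countdown Game prompts here instead.
dataset = Dataset.from_dict({"prompt": ["What is 2 + 2?", "What is 3 * 7?"] * 32})

def toy_reward(completions, **kwargs):
    # Placeholder: reward shorter completions. Mini-R1 uses rule-based rewards
    # for output format and equation correctness on the Countdown task.
    return [-float(len(c)) for c in completions]

training_args = GRPOConfig(
    output_dir="grpo-sketch",
    per_device_train_batch_size=8,   # global batch must be divisible by num_generations
    num_generations=8,               # completions sampled per prompt for the group baseline
    max_completion_length=256,
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # small placeholder model
    reward_funcs=toy_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

On multiple GPUs a script like this would typically be launched through accelerate with a DeepSpeed config, something like accelerate launch --config_file deepspeed_zero3.yaml grpo_sketch.py (file names assumed), which is the kind of setup the Mini-R1 write-up describes.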
veRL is based on HybridFlow, a paper from ByteDance and the University of Hong Kong published at EuroSys 2025, co-authored by Prof. Chuan Wu. I will allocate some time to study this paper in greater detail.
Microsoft added DeepSeek R1 to GitHub Models — I tried it with a simple question: not only was inference astonishingly slow, it also errored out before completing the answer. It is unusable at this point.