<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Baochun Li — blog</title><description>What I’ve Been Reading</description><link>https://baochun.org/</link><language>en-us</language><item><title>Defensible Moat and OpenAI</title><link>https://baochun.org/2026-04-08/</link><guid isPermaLink="true">https://baochun.org/2026-04-08/</guid><description>Does OpenAI have a moat, and is it defensible?</description><pubDate>Wed, 08 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I have read, in its entirety and with interest, an article written more than a year ago in November 2024, titled “&lt;a href=&quot;https://calpaterson.com/porter.html&quot;&gt;Building LLMs is probably not going be a brilliant business&lt;/a&gt;,” by Cal Paterson. Paterson argued that, just like airlines, OpenAI may not have the moat it needs to justify its $800+ billion valuation, not to mention the kind of &lt;a href=&quot;https://finance.yahoo.com/news/warren-buffett-explains-moat-principle-164442359.html&quot;&gt;defensible moat&lt;/a&gt; that Buffett was looking for. Apparently, John Gruber of Daring Fireball &lt;a href=&quot;https://daringfireball.net/linked/2024/11/29/cal-paterson-llms-as-businesses&quot;&gt;agreed&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;And Google agreed, too. In fact, in its &lt;a href=&quot;https://newsletter.semianalysis.com/p/google-we-have-no-moat-and-neither&quot;&gt;leaked internal document&lt;/a&gt;, Google claimed that open source AI would outcompete OpenAI (and Google itself). Written in May 2023, the document holds quite a bit of truth today, given how &lt;a href=&quot;https://z.ai/blog/glm-5.1&quot;&gt;Z.AI’s GLM 5.1&lt;/a&gt; performs compared to Opus 4.6 and GPT 5.4, as well as the fact that this 745B model can be &lt;a href=&quot;https://x.com/UnslothAI/status/2041552121259249850?s=20&quot;&gt;deployed locally&lt;/a&gt;.&lt;/p&gt;
</content:encoded><category>2026</category><author>Baochun Li</author></item><item><title>Autoresearch</title><link>https://baochun.org/2026-03-22/</link><guid isPermaLink="true">https://baochun.org/2026-03-22/</guid><description>Two weeks since Andrej Karpathy released Autoresearch, here are some noteworthy projects to keep an eye on.</description><pubDate>Sun, 22 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;It has been two weeks since Andrej Karpathy released &lt;a href=&quot;https://github.com/karpathy/autoresearch&quot;&gt;Autoresearch&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It has a simple idea: give an AI agent an environment where it knows which benchmark it should run and optimize for, and ask it to repeatedly take actions that optimize the project for this particular benchmark. Experiment runs on the benchmark decide whether each optimization is kept or discarded, and the optimizations that are kept accumulate over time.&lt;/p&gt;
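&lt;p&gt;In rough Python, the loop looks something like the sketch below. This is a hypothetical illustration of the idea rather than Karpathy’s implementation: &lt;code&gt;run_benchmark()&lt;/code&gt;, &lt;code&gt;agent_propose_change()&lt;/code&gt;, and the &lt;code&gt;bench.py&lt;/code&gt; script are placeholders for whatever benchmark and agent you plug in.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import shutil
import subprocess

def run_benchmark(workdir):
    # Placeholder: assume the project exposes a bench.py that prints a single score.
    out = subprocess.run([&apos;python&apos;, &apos;bench.py&apos;], cwd=workdir,
                         capture_output=True, text=True, check=True)
    return float(out.stdout.strip())

def agent_propose_change(workdir, history):
    # Placeholder: ask a coding agent to edit the files in workdir,
    # given the scores of all earlier attempts.
    raise NotImplementedError

best = run_benchmark(&apos;project&apos;)
history = []
for step in range(100):
    shutil.rmtree(&apos;attempt&apos;, ignore_errors=True)
    shutil.copytree(&apos;project&apos;, &apos;attempt&apos;)     # work on a scratch copy
    agent_propose_change(&apos;attempt&apos;, history)
    score = run_benchmark(&apos;attempt&apos;)
    if score &amp;gt; best:                          # keep optimizations that help...
        best = score
        shutil.rmtree(&apos;project&apos;)
        shutil.copytree(&apos;attempt&apos;, &apos;project&apos;)
    history.append((step, score, best))        # ...and accumulate them over time
&lt;/code&gt;&lt;/pre&gt;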
&lt;p&gt;Surprisingly, such a simple idea turns out to be extremely effective. As Karpathy proclaimed:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;…in any case no one could tell if that’s right or wrong as the “code” is now a self-modifying binary that has grown beyond human comprehension.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The key is to define a precise benchmark that can be used to evaluate any solutions to a problem, so that an AI agent — or multiple collaborating agents — can run this benchmark to decide whether an idea should be kept or discarded. Naturally, since this requirement is not too exacting, quite a large number of projects have spun up, including my own experiments trying the idea on the &lt;a href=&quot;https://days.sh&quot;&gt;Days&lt;/a&gt; discrete-event network simulator, improving performance by over 25%. Autoresearch doesn’t really care about what you wish to optimize, as long as some precise benchmark is defined.&lt;/p&gt;
&lt;p&gt;This requirement, however, is not really satisfied by many academic research papers. Often, it is difficult, even reading between the lines, to see what a paper is trying to optimize for. A paper can go on for 10 pages, yet there is not a single prescribed benchmark that precisely captures the problem it wishes to solve and how the paper advances the state of the art on that benchmark. In my own words, these papers are &lt;em&gt;not autoresearch-friendly&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Here are some noteworthy autoresearch projects over the past two weeks:&lt;/p&gt;
&lt;p&gt;—&lt;/p&gt;
&lt;p&gt;Shopify’s CEO, Tobi Lütke, announced that David Cortés and he implemented Autoresearch as a &lt;a href=&quot;https://pi.dev&quot;&gt;Pi&lt;/a&gt; extension, &lt;a href=&quot;https://github.com/davebcn87/pi-autoresearch&quot;&gt;pi-autoresearch&lt;/a&gt;, in about 2500 lines of TypeScript code.&lt;/p&gt;
&lt;p&gt;My own experiments in &lt;a href=&quot;https://days.sh&quot;&gt;Days&lt;/a&gt; used this extension, and it worked extremely well. Without any prompts and with only &lt;code&gt;/autoresearch&lt;/code&gt;, it would automatically dig into the codebase to find the most suitable benchmark to optimize for. Once I provided a specific benchmark in an explicit prompt, it would switch to the one I asked for. For the initial benchmark that included a routing protocol implementation, the agent got a bit too eager and coded a custom routing implementation for FatTree topologies only, which broke the routing mechanism when the topology is not a FatTree. Overall, however, autoresearch improved runtime performance by about 25% on this particular benchmark, which is quite a bit given that the codebase has already gone through many rounds of optimizations in the past.&lt;/p&gt;
&lt;p&gt;Since its inception, the &lt;code&gt;pi-autoresearch&lt;/code&gt; extension has been evolving. Two noteworthy improvements have been added in the past three days: a &lt;a href=&quot;https://github.com/davebcn87/pi-autoresearch/pull/22&quot;&gt;confidence score&lt;/a&gt;, and additional &lt;a href=&quot;https://github.com/davebcn87/pi-autoresearch/pull/26&quot;&gt;actionable side information&lt;/a&gt; recording why an optimization is discarded.&lt;/p&gt;
&lt;p&gt;—&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://nousresearch.com/&quot;&gt;Nous Research&lt;/a&gt; used its open-source &lt;a href=&quot;https://hermes-agent.nousresearch.com/&quot;&gt;Hermes&lt;/a&gt; agent to &lt;a href=&quot;https://github.com/NousResearch/autonovel&quot;&gt;write a novel using autoresearch&lt;/a&gt;. The benchmark, in the context of autoresearch, is &lt;a href=&quot;https://github.com/NousResearch/autonovel/blob/master/reader_panel.py&quot;&gt;&lt;code&gt;reader_panel.py&lt;/code&gt;&lt;/a&gt;, which uses four different personas from Claude Opus 4.6 — the editor, the genre reader, the writer, and the first reader — to review the novel. It also runs &lt;a href=&quot;https://github.com/NousResearch/autonovel/blob/master/review.py&quot;&gt;&lt;code&gt;review.py&lt;/code&gt;&lt;/a&gt;, which also uses Claude Opus 4.6 with the following dual-persona prompt:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Read the below novel, “{title}”. Review it first as a literary critic (like a newspaper book review) and then as a professor of fiction. In the later review, give specific, actionable suggestions for any defects you find. Be fair but honest. You don’t &lt;em&gt;have&lt;/em&gt; to find defects.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I like the inclusion of &lt;em&gt;you don’t have to find defects&lt;/em&gt; in the prompt. Strictly speaking, these reviews are not really a &lt;em&gt;precise&lt;/em&gt; benchmark, as Karpathy &lt;a href=&quot;https://x.com/karpathy/status/2034770453219484078?s=20&quot;&gt;mentioned&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Not exactly verifiable but might still work quite well given some effort.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Though these reviews may be helpful for &lt;em&gt;editing&lt;/em&gt; the writeup, much as failing experiments are discarded, the loop can certainly iterate confidently towards mediocre results. Still, this is a worthy experiment towards writing &lt;em&gt;anything&lt;/em&gt;, not just fiction.&lt;/p&gt;
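&lt;p&gt;As a rough illustration, a single persona review in such a loop boils down to one model call and a score pulled out of the reply. The sketch below is hypothetical and is not the autonovel code: the persona wording, the SCORE convention, and the model name are placeholders, and it assumes the Anthropic Python SDK.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import re
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def persona_review(novel_text, persona):
    # Hypothetical persona prompt; each reviewer ends with a line such as SCORE: 7.
    prompt = (f&apos;You are {persona}. Review the novel below, give specific, &apos;
              f&apos;actionable suggestions for any defects you find, and end &apos;
              f&apos;with a final line of the form SCORE: n, from 1 to 10.\n\n&apos;
              f&apos;{novel_text}&apos;)
    reply = client.messages.create(
        model=&apos;claude-opus-4-6&apos;,   # placeholder model name
        max_tokens=2048,
        messages=[{&apos;role&apos;: &apos;user&apos;, &apos;content&apos;: prompt}],
    )
    text = reply.content[0].text
    match = re.search(r&apos;SCORE:\s*(\d+)&apos;, text)
    score = int(match.group(1)) if match else None
    return text, score
&lt;/code&gt;&lt;/pre&gt;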
&lt;p&gt;—&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://x.com/danveloper/status/2034353876753592372?s=20&quot;&gt;Autoresearching Apple’s LLM in a Flash to run Qwen 397B locally&lt;/a&gt;, by &lt;a href=&quot;https://x.com/danveloper&quot;&gt;Dan Woods&lt;/a&gt;, is a mind-boggling autoresearched advance towards running large models off SSDs on Macs. With freshly coded Objective-C, the AI agent can improve the performance of running a Qwen 3.5 397B MoE model on a MacBook Pro to around 6 tokens/second, which is extremely impressive. This also showcases the immense power of autoresearch and of AI agents in general, given the right context for them to get started working.&lt;/p&gt;
</content:encoded><category>2026</category><category>agents</category><category>workflows</category><author>Baochun Li</author></item><item><title>Converting PDFs with Apple Silicon GPU Acceleration</title><link>https://baochun.org/2026-03-21/</link><guid isPermaLink="true">https://baochun.org/2026-03-21/</guid><description>I discovered a better way of converting PDFs to Markdown, with all mathematical formulas converted to LaTeX, on Apple silicon.</description><pubDate>Sat, 21 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Typically, PDF to Markdown converters either do not do a very good job converting mathematical formulas to LaTeX, or require an NVIDIA GPU to run a Transformer model. After quite a bit of work, I have discovered a way of converting PDF files, with all mathematical formulas converted to LaTeX, using Apple silicon GPUs for acceleration.&lt;/p&gt;
&lt;p&gt;First, create a Python virtual environment and install &lt;code&gt;docling&lt;/code&gt; and &lt;code&gt;docling[vlm]&lt;/code&gt;. One way to do it is to quickly create a new file &lt;code&gt;pyproject.toml&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[project]
name = &amp;quot;pdf-convert&amp;quot;
version = &amp;quot;0.1.0&amp;quot;
description = &amp;quot;Setting up the virtual environment for converting PDFs with Apple Silicon GPUs.&amp;quot;
requires-python = &amp;quot;&amp;gt;=3.13&amp;quot;
dependencies = [
    &amp;quot;docling&amp;quot;,
    &amp;quot;docling[vlm]&amp;quot;,
]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and then run &lt;code&gt;uv sync&lt;/code&gt; and &lt;code&gt;source .venv/bin/activate&lt;/code&gt;. After setting up the environment, the launch command I used was:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;docling --enrich-formula --pipeline vlm --vlm-model granite_docling file.pdf
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This runs the &lt;a href=&quot;https://www.ibm.com/granite/docs/models/docling&quot;&gt;Granite Docling model&lt;/a&gt;, with 258M parameters, on the Apple Silicon GPUs with MLX. The conversion process may take a while, but the results look excellent. I have added my setup above to &lt;a href=&quot;https://github.com/baochunli/convert-pdf&quot;&gt;a git repository&lt;/a&gt; so that I can use it more easily.&lt;/p&gt;
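&lt;p&gt;If a script is preferable to the CLI, docling also exposes a Python API. Below is a minimal sketch of the same conversion using the default pipeline; the &lt;code&gt;--pipeline vlm&lt;/code&gt; and &lt;code&gt;--enrich-formula&lt;/code&gt; flags above map onto pipeline options that this sketch deliberately leaves out.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from docling.document_converter import DocumentConverter

# Default docling pipeline; the VLM pipeline and formula enrichment are
# configured separately through pipeline options and are not enabled here.
converter = DocumentConverter()
result = converter.convert(&apos;file.pdf&apos;)

with open(&apos;file.md&apos;, &apos;w&apos;, encoding=&apos;utf-8&apos;) as f:
    f.write(result.document.export_to_markdown())
&lt;/code&gt;&lt;/pre&gt;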
</content:encoded><category>2026</category><category>workflows</category><author>Baochun Li</author></item><item><title>Successfully running Qwen 3.5 27B on my NVIDIA RTX 4090 (using 21 GB of CUDA memory)</title><link>https://baochun.org/2026-03-20/</link><guid isPermaLink="true">https://baochun.org/2026-03-20/</guid><description>Running Qwen 3.5 27B Q4KM on an RTX 4090 with llama-server and Hermes.</description><pubDate>Fri, 20 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I have successfully started the Qwen 3.5 27B model, with 4-bit quantization, on my GPU server with an NVIDIA RTX 4090.&lt;/p&gt;
&lt;p&gt;The launch command I used was:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;./build/bin/llama-server -m Qwen3.5-27B-Q4_K_M.gguf -ngl 99 -c 262144 -np 1 -fa on --cache-type-k q4_0 --cache-type-v q4_0 --host 0.0.0.0
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To use the model, I am running it with the &lt;a href=&quot;https://hermes-agent.nousresearch.com/&quot;&gt;Hermes&lt;/a&gt; agent. My setup was inspired by &lt;a href=&quot;https://x.com/sudoingX/status/2035000411342659979?s=20&quot;&gt;Sudo Su’s blog&lt;/a&gt;, which showed how capable the Qwen 3.5 9B model is when paired with the Hermes agent.&lt;/p&gt;
&lt;p&gt;With the 27B model, I am using around 21 GB of CUDA memory. It is hard to imagine having access to GPT-5 level intelligence on such modest GPU hardware! I have switched my &lt;a href=&quot;https://baochun.org/2026-02-26&quot;&gt;email triage system&lt;/a&gt; to Hermes and Qwen 3.5 27B as well.&lt;/p&gt;
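&lt;p&gt;Beyond Hermes, anything that speaks the OpenAI API can use the same server, since llama-server exposes an OpenAI-compatible endpoint. Here is a minimal sketch with the openai Python package, assuming the default port 8080 and a placeholder hostname for my GPU server:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from openai import OpenAI

# llama-server listens on port 8080 by default and serves /v1 endpoints;
# the API key is unused locally, but the client library requires one.
client = OpenAI(base_url=&apos;http://gpu-server:8080/v1&apos;, api_key=&apos;none&apos;)

reply = client.chat.completions.create(
    model=&apos;Qwen3.5-27B-Q4_K_M&apos;,   # llama-server serves the single loaded model
    messages=[{&apos;role&apos;: &apos;user&apos;, &apos;content&apos;: &apos;Say hello in one short sentence.&apos;}],
)
print(reply.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;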
</content:encoded><category>2026</category><category>agents</category><author>Baochun Li</author></item><item><title>After the Prompt, Who Still Learns to Program?</title><link>https://baochun.org/2026-03-14/</link><guid isPermaLink="true">https://baochun.org/2026-03-14/</guid><description>A reflection on a New York Times Magazine story about AI coding tools, software labor, and what future programmers may stop learning by hand.</description><pubDate>Sat, 14 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;During my jog today, I listened to the New York Times Magazine article &lt;a href=&quot;https://www.nytimes.com/2026/03/12/magazine/ai-coding-programming-jobs-claude-chatgpt.html?unlocked_article_code=1.SlA.DBan.wbQDi-hptjj6&quot;&gt;Coding After Coders: The End of Computer Programming as We Know It&lt;/a&gt;. Read by James Patrick Cronin, it was an engaging 38-minute story about how coding evolved, from writing assembly to prompting Claude Code. Clive Thompson interviewed over 70 software developers, some optimistic, and some worried that A.I. is atomizing the work force.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Don’t get uppity at work — we could replace you with a bot.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I feel that A.I., as we know it, will forever change not only how software is to be developed, but also how computer science talent is to be educated. Fewer students will have any software development experience beyond prompting an A.I. agent, and when something is not optimally designed or breaks, no one will be able to fix it by hand.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://nicholas.carlini.com/writing/2026/how-to-win-a-best-paper-award.html&quot;&gt;How to win a best paper award&lt;/a&gt;, written by Nicholas Carlini, is worth a quick read. Most of the ideas here were great, for example:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The majority of my papers that have received best paper awards were rejected at least once before they got in. In one case, a paper of mine was rejected four times first.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And also:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Once you’ve read everything, the second step is to forget it all. The reason is simple: everything that’s already been done has already been done. If you constrain yourself to thinking only about what’s been done, you’ll never come up with something clever and new.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In a similar spirit, I recall Professor &lt;a href=&quot;https://en.wikipedia.org/wiki/Jane_Liu&quot;&gt;Jane W.S. Liu&lt;/a&gt; once said to her Ph.D. students: “Do not read more than 15 papers in your Ph.D. — you want to become a world-class researcher, not a mediocre one.”&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;If I were to briefly summarize the best writing advice I’ve received, it would be to listen to how your writing sounds spoken out loud, and try to make it understandable. I used to do this by reading my papers out loud to force myself to hear every word; I still do this sometimes, but now I also use text-to-speech systems to read the words back to me. You’ll notice things you’d never have caught yourself.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This reminds me of the NSF panel summary sessions, where panelists are required to read their panel summaries out loud to the entire panel. Apparently, many issues in writing can be caught by just listening to spoken words.&lt;/p&gt;
&lt;p&gt;But some of the other ideas seem to be quite dated in the A.I. era. For example, one doesn’t really need to proofread the work — the agents will gladly take over the job. Conducting many experiments is no longer harder than conducting a few: the agents will conduct them for you automatically.&lt;/p&gt;
</content:encoded><category>2026</category><category>agents</category><author>Baochun Li</author></item><item><title>My Multi-Agent Setup using Linear</title><link>https://baochun.org/2026-03-12/</link><guid isPermaLink="true">https://baochun.org/2026-03-12/</guid><description>I started to use Linear to track the tasks and their dependencies when I implemented new features with multiple agents in Codex.</description><pubDate>Thu, 12 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I installed OpenAI’s &lt;a href=&quot;https://github.com/openai/symphony&quot;&gt;Symphony&lt;/a&gt;, by simply directing Codex to its &lt;a href=&quot;https://github.com/openai/symphony/tree/main/elixir&quot;&gt;GitHub repository&lt;/a&gt;, as recommended by Symphony’s README:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Launch codex in your repo, give it the URL to the Symphony repo, and ask it to set things up for you.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Before running Symphony, I first needed to create the issues corresponding to the task I wanted to complete. I used the following prompt:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Right now, there is no UI for a reviewer/TPC member to enter a review (or a meta review or an Area Chair review). The review form can already be configured by the chair under administration, but it does not surface to the reviewers or TPC members or Area Chairs. Each entry in the table in their list of “My Assignments” does not really have a button that they can use to view (and to review) the paper.&lt;/p&gt;
&lt;p&gt;Use your frontend design skills, design such a UI for reviewers/Area Chairs/TPC members to enter reviews/meta reviews/Area Chair reviews, based on the configured templates from the ‘review form’ configuration under Administration.&lt;/p&gt;
&lt;p&gt;Write a comprehensive plan carefully and break this into issues in project ‘reviewsdue’ in Linear. Scope each issue to one atomic task that can be completed with one agent and scoped to one reviewable PR. Include acceptance criteria in each issue. For each issue, add a follow-up ‘review’ issue that has acceptance criteria like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;reviewer reads the full diff, not just the summary&lt;/li&gt;
&lt;li&gt;verifies behavior against the acceptance criteria of the issue&lt;/li&gt;
&lt;li&gt;checks tests are adequate and not just passing narrowly&lt;/li&gt;
&lt;li&gt;records concrete findings or explicitly states no findings&lt;/li&gt;
&lt;li&gt;only closes after review comments are addressed&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This provides a deliberate “implementation issue” followed by a separate “review issue” workflow, which is better than trusting the implementation issue alone. Set blocking relationships where order matters; later issues should be blocked by both their upstream implementation issue and its review gate. Use red/green test-driven development (TDD) when implementing each issue.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The first two paragraphs described the task I wished Codex to work on, and the later paragraphs instructed Codex to use Linear, and to create both implementation issues and their review gates.&lt;/p&gt;
&lt;p&gt;Symphony ran correctly, but I found that it used a lot of tokens for simple features, and did not allow me to see what was going on in each of the active sessions it launched. For one issue, it couldn’t finish running the agent after over an hour — something must have been broken inside the agent.&lt;/p&gt;
&lt;p&gt;So instead of depending on Symphony to pull the issues, I entered the following prompt:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Now that you have the Linear issues, pull each of these issues, respect dependency requirements, and then spawn subagents (in parallel, if needed) to resolve these issues. Once each agent is finished, commit with a detailed commit message following the PR instructions when pushing (but don’t need to PR, just commit). Respect instructions in &lt;code&gt;WORKFLOW.md&lt;/code&gt; when pulling Linear issues. When all the issues have been completed, PR the entire feature with a detailed description.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This did the trick: it worked in a faster and perhaps more token-efficient way than using Symphony; but even more importantly, there is more transparency: I can read what the agents did in their sessions.&lt;/p&gt;
&lt;p&gt;There is one major advantage of using this workflow and Linear: after each issue is completed, Codex would add results of running the agents as a &lt;em&gt;comment&lt;/em&gt; in the issue itself, providing me with a central repository to log the history of all the work completed by the agents in the project. I think this is more important than the speed of running multiple agents, since I can read these comments and get a sense of what’s going on, reducing the cognitive debt when managing the project.&lt;/p&gt;
</content:encoded><category>2026</category><category>agents</category><author>Baochun Li</author></item><item><title>Computer Engineering in the Next 10 Years</title><link>https://baochun.org/2026-03-09/</link><guid isPermaLink="true">https://baochun.org/2026-03-09/</guid><description>A short answer to a student&apos;s question about AI agents, hardware progress, and why software creativity still matters.</description><pubDate>Mon, 09 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;My student asked me a question over Telegram: “How would computer engineering evolve over the next 10 years?”&lt;/p&gt;
&lt;p&gt;Here is my answer, with thinking set to “low”:&lt;/p&gt;
&lt;p&gt;The immense power of AI agents, distributed on billions of user devices, will be a norm, not an exception. Personalizing these agents will become cheaper and more accessible than ever, making it feel like living in prehistoric times if these agents are suddenly unavailable. Hardware advances will make these agents faster than ever, in both the cloud and user devices. Agents will be used to improve themselves, and to improve all disciplines in computer engineering in general. Though these agents will still not be very creative, they can help us create new ideas with accelerated speeds. Software will still be as relevant as ever, given that its design requires a lot of creativity.&lt;/p&gt;
</content:encoded><category>2026</category><category>agents</category><author>Baochun Li</author></item><item><title>Pages, Window Chrome, and Liquid Glass</title><link>https://baochun.org/2026-03-08/</link><guid isPermaLink="true">https://baochun.org/2026-03-08/</guid><description>A short note on why Pages&apos; older, more colorful chrome still feels preferable, and why staying on macOS 14.8 is a useful guardrail.</description><pubDate>Sun, 08 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://pxlnv.com/blog/window-chrome-of-our-discontent/&quot;&gt;The Window Chrome of Our Discontent&lt;/a&gt; is a great, well-crafted piece. I have been using Pages since its inception (around 2005), and in terms of chrome, I prefer having some vibrant colours in the first and second design, rather than the plain, greyscale design in the later versions. The worst UI design is, of course, the &lt;em&gt;Liquid Glass&lt;/em&gt; design in macOS Tahoe, which I refuse to upgrade to. I am currently still on macOS 14.8. Interestingly, because the latest release of Pages requires at least macOS 15, I am not allowed to upgrade to it. I consider this a feature as it serves as a guardrail against inadvertent upgrades. Oh, and I love it when Nick Heer, the author of this article, used “A.I.” for artificial intelligence, rather than “AI”.&lt;/p&gt;
</content:encoded><category>2026</category><author>Baochun Li</author></item><item><title>Rechecking the Days Codebase with GPT 5.4 and GPT 5.4 Pro</title><link>https://baochun.org/2026-03-06/</link><guid isPermaLink="true">https://baochun.org/2026-03-06/</guid><description>I rechecked the Days codebase with GPT 5.4 xhigh and GPT 5.4 Pro, and the pair of models has found serious issues in one aspect that I asked it to focus on in the current implementation.</description><pubDate>Fri, 06 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Here’s my past experience: it is a good idea to ask a new model, such as the just-released GPT 5.4, to review the codebase again. So I fired up Codex, and asked it to review the ring all-reduce implementation in &lt;a href=&quot;https://days.sh&quot;&gt;Days&lt;/a&gt;, my Rust-powered discrete-event network simulator. It quickly wrote the following review:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Findings&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;High: ring flow IDs are reserved incorrectly during config parsing. The parser advances by the configured flow_count in src/flows/collective.rs:316 and src/flows/collective.rs:372, but runtime expansion emits 2 * n * (n - 1) flows and only then rewrites collective.flow_count in src/topos/topo.rs:676. A 4-rank ring reserves 4 IDs and later consumes 24, so any later flow or collective can collide.&lt;/li&gt;
&lt;li&gt;High: the dependency graph is not a real ring all-reduce schedule. Expansion only serializes each rank’s own outgoing steps in src/topos/topo.rs:605 and src/topos/topo.rs:643. But the chunk mapping in src/topos/topo.rs:1269 means later steps should depend on receiving a chunk from the previous rank, not on the same sender’s previous transmit finishing. The model captures local serialization, not cross-rank data availability.&lt;/li&gt;
&lt;li&gt;High: the TCP path does not actually propagate or reduce data around the ring. Each directed link gets its own fresh zero-filled AppDataSource in src/topos/topo.rs:1281 backed by src/flows/app_source.rs:275, and flows just read byte slices from that local buffer in src/topos/topo.rs:1289. No receive-side state feeds the next hop, and no reduction happens anywhere. This is not an all-reduce implementation; it is a pre-sliced traffic generator.&lt;/li&gt;
&lt;li&gt;High: handle-backed TCP ring flows silently drop any chunk smaller than the fixed 512-byte MSS. pull_from_appsource() can buffer a short chunk in src/flows/tcp_source.rs:321, but send_packet() only transmits when a full MSS fits in src/flows/tcp_source.rs:770 and src/flows/tcp_source.rs:804. stop_run() then treats that flow as complete in src/flows/source.rs:439. The bundled example at configs/collective_tcp.toml:39 uses size = 512 with 4 ranks, so each ring chunk is 128 bytes and every hop can complete without sending a packet.&lt;/li&gt;
&lt;li&gt;High: mixed broadcast + ring TCP collectives can lose the broadcast app-source actor because temporary ownership maps mix collective IDs and host IDs. Broadcast stores its source under &lt;code&gt;collective.id&lt;/code&gt; at src/topos/topo.rs:1246, ring later stores under src at src/topos/topo.rs:1307, and collective IDs start from zero in src/lib.rs:80. In the bundled config, the first broadcast and host 0 coexist in configs/collective_tcp.toml:21 and configs/collective_tcp.toml:39, so the broadcast source can be overwritten and dropped before registration.&lt;/li&gt;
&lt;li&gt;Medium: explicit ring paths are validated but ignored. Parsing accepts them in src/flows/collective.rs:201, but ring expansion hardcodes dst from sources in src/topos/topo.rs:609 and builds every ring flow with path: None in src/topos/topo.rs:619. So a user can believe they pinned ring routes while runtime silently falls back to normal routing.&lt;/li&gt;
&lt;li&gt;Medium: coverage is far too narrow for this feature. The only ring-specific test is src/topos/topo.rs:1434, and it only checks same-rank serialization. It does not exercise config parsing, ID allocation, runtime app-source wiring, chunk sizes below MSS, or explicit paths.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;And then summarized with the following verdict:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The main assumption behind the review is that RingAllReduce is intended to model an actual reduce-scatter + allgather collective, as the docs claim, rather
than just generate ring-shaped traffic. Under that assumption, the implementation is currently incorrect in both semantics and some concrete runtime behavior.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;That’s a pretty thorough and critical review! I then asked GPT 5.4 Pro whether it could confirm that these findings were correct and whether it had anything to add. GPT 5.4 Pro confirmed that everything was correct, and added three more suggestions. Then I asked Codex to fix these issues one by one, and reviewed the results.&lt;/p&gt;
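&lt;p&gt;For reference, the cross-rank dependency the review points at is easy to see in the textbook ring all-reduce schedule. The sketch below is a hypothetical illustration, not Days code: for each global step it lists which chunk each rank sends and which upstream receive that send depends on.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def ring_allreduce_schedule(n):
    # Textbook ring all-reduce: n - 1 reduce-scatter steps followed by
    # n - 1 allgather steps. Every rank sends one chunk per step, so the
    # collective emits 2 * n * (n - 1) point-to-point flows in total.
    flows = []
    for s in range(2 * (n - 1)):
        for r in range(n):
            flows.append({
                &apos;step&apos;: s,
                &apos;src&apos;: r,
                &apos;dst&apos;: (r + 1) % n,
                &apos;chunk&apos;: (r - s) % n,
                # A send at step s is gated on the chunk received from the
                # previous rank at step s - 1, not on this sender&apos;s own
                # previous transmit finishing.
                &apos;after_recv_from&apos;: (r - 1) % n if s &amp;gt; 0 else None,
            })
    return flows

print(len(ring_allreduce_schedule(4)))   # 24 flows for 4 ranks: 2 * 4 * 3
&lt;/code&gt;&lt;/pre&gt;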
</content:encoded><category>2026</category><category>workflows</category><author>Baochun Li</author></item><item><title>Harmony and Harness Engineering</title><link>https://baochun.org/2026-03-05/</link><guid isPermaLink="true">https://baochun.org/2026-03-05/</guid><description>A few weeks ago, OpenAI posted a blog post on harness engineering. Yesterday, it also released a component of its workflow as open-source, called Symphony.</description><pubDate>Thu, 05 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;On February 11, OpenAI posted a new blog post titled &lt;a href=&quot;https://openai.com/index/harness-engineering/&quot;&gt;Harness engineering: leveraging Codex in an agent-first world&lt;/a&gt;. I did not post it on this website at the time, because I felt that it was pretty difficult to reproduce.&lt;/p&gt;
&lt;p&gt;Yesterday, OpenAI open-sourced &lt;a href=&quot;https://github.com/openai/symphony&quot;&gt;Symphony&lt;/a&gt; — OpenAI must really love music, as it has another open-source repository called &lt;a href=&quot;https://github.com/openai/harmony&quot;&gt;Harmony&lt;/a&gt;. Symphony includes a reference implementation for managing the work that agents need to get done, implemented with &lt;a href=&quot;https://elixir-lang.org/&quot;&gt;Elixir&lt;/a&gt;, a dynamic, functional language for building scalable and maintainable applications.&lt;/p&gt;
&lt;p&gt;I should allocate a bit of time to study both the blog and Symphony.&lt;/p&gt;
</content:encoded><category>2026</category><category>workflows</category><author>Baochun Li</author></item><item><title>Donald Knuth on Claude</title><link>https://baochun.org/2026-03-04/</link><guid isPermaLink="true">https://baochun.org/2026-03-04/</guid><description>Prof. Donald Knuth, at age 88, said: “Shock! Shock! I learned yesterday that an open problem I’d been working on for several weeks had just been solved by Claude Opus 4.6 — Anthropic’s hybrid reasoning model that had been released three weeks earlier! It seems that I’ll have to revise my opinions about “generative AI” one of these days.”</description><pubDate>Wed, 04 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;In his article titled &lt;a href=&quot;https://www-cs-faculty.stanford.edu/~knuth/papers/claude-cycles.pdf&quot;&gt;Claude’s Cycles&lt;/a&gt;, Prof. Donald Knuth, at age 88, has carefully documented how Claude Opus 4.6 solved a problem that he had been working on for several weeks.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Shock! Shock! I learned yesterday that an open problem I’d been working on for several weeks had just
been solved by Claude Opus 4.6 — Anthropic’s hybrid reasoning model that had been released three weeks
earlier! It seems that I’ll have to revise my opinions about “generative AI” one of these days. What a joy
it is to learn not only that my conjecture has a nice solution but also to celebrate this dramatic advance in
automatic deduction and creative problem solving.&lt;/p&gt;
&lt;/blockquote&gt;
</content:encoded><category>2026</category><category>agents</category><author>Baochun Li</author></item><item><title>Open-source PDF to Markdown with Marker</title><link>https://baochun.org/2026-03-02/</link><guid isPermaLink="true">https://baochun.org/2026-03-02/</guid><description>GPU-accelerated PDF-to-Markdown workflow with Marker that produces high-quality output quickly on an RTX 4090.</description><pubDate>Mon, 02 Mar 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I discovered a best-in-class open-source PDF to Markdown converter: &lt;a href=&quot;https://github.com/datalab-to/marker&quot;&gt;Marker&lt;/a&gt;. On my NVIDIA RTX 4090 server, it converts a PDF in about 30 seconds to a minute, and the results are spectacular. I used &lt;code&gt;uv venv&lt;/code&gt; to create a virtual environment, activated it, and installed Marker within the environment using &lt;code&gt;pip install&lt;/code&gt;. It does require an NVIDIA GPU server, but the output quality is worth it.&lt;/p&gt;
</content:encoded><category>2026</category><category>CLI</category><author>Baochun Li</author></item><item><title>Email Triage System with Codex</title><link>https://baochun.org/2026-02-26/</link><guid isPermaLink="true">https://baochun.org/2026-02-26/</guid><description>An email triage system for Fastmail that auto-sorts messages by priority and drafts replies for high-priority emails.</description><pubDate>Thu, 26 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I just used Codex to implement a new email triage system. It uses JMAP to access my Fastmail account via an API token, automatically triages inbound emails into high, medium, or low priority levels, and archives the medium- and low-priority emails. For high-priority emails, it will also automatically use Codex to draft responses. It runs on my Linux server every 15 minutes, and everything is configurable.&lt;/p&gt;
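&lt;p&gt;The JMAP side of it is straightforward. Below is a minimal, hypothetical sketch of the access pattern with the requests library and a Fastmail API token; the actual triage script, the priority rules, and the Codex prompts are more involved.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import os
import requests

TOKEN = os.environ[&apos;FASTMAIL_API_TOKEN&apos;]
HEADERS = {&apos;Authorization&apos;: f&apos;Bearer {TOKEN}&apos;}

# The JMAP session object reveals the API endpoint and the mail account ID.
session = requests.get(&apos;https://api.fastmail.com/jmap/session&apos;, headers=HEADERS).json()
api_url = session[&apos;apiUrl&apos;]
account_id = session[&apos;primaryAccounts&apos;][&apos;urn:ietf:params:jmap:mail&apos;]

# Fetch the newest messages (subject, sender, preview) to hand to the triage prompt.
resp = requests.post(api_url, headers=HEADERS, json={
    &apos;using&apos;: [&apos;urn:ietf:params:jmap:core&apos;, &apos;urn:ietf:params:jmap:mail&apos;],
    &apos;methodCalls&apos;: [
        [&apos;Email/query&apos;, {&apos;accountId&apos;: account_id,
                         &apos;sort&apos;: [{&apos;property&apos;: &apos;receivedAt&apos;, &apos;isAscending&apos;: False}],
                         &apos;limit&apos;: 20}, &apos;q&apos;],
        [&apos;Email/get&apos;, {&apos;accountId&apos;: account_id,
                       &apos;#ids&apos;: {&apos;resultOf&apos;: &apos;q&apos;, &apos;name&apos;: &apos;Email/query&apos;, &apos;path&apos;: &apos;/ids&apos;},
                       &apos;properties&apos;: [&apos;subject&apos;, &apos;from&apos;, &apos;preview&apos;]}, &apos;g&apos;],
    ],
}).json()

for email in resp[&apos;methodResponses&apos;][1][1][&apos;list&apos;]:
    print(email[&apos;subject&apos;])
&lt;/code&gt;&lt;/pre&gt;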
&lt;hr&gt;
&lt;p&gt;I find the following prompt useful for reviewing a large codebase:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I want you to sort of randomly explore the code files in this project, choosing code files to deeply investigate and understand and trace their functionality and execution flows through the related code files which they import or which they are imported by. Once you understand the purpose of the code in the larger context of the workflows, I want you to do a super careful, methodical, and critical check with “fresh eyes” to find any obvious bugs, problems, errors, issues, silly mistakes, etc. and then systematically and meticulously and intelligently correct them. Be sure to comply with ALL rules in &lt;code&gt;AGENTS.md&lt;/code&gt; and ensure that any code you write or revise conforms to the best practice guides referenced in the &lt;code&gt;AGENTS.md&lt;/code&gt; file.&lt;/p&gt;
&lt;/blockquote&gt;
</content:encoded><category>2026</category><category>agents</category><author>Baochun Li</author></item><item><title>Building a Linear Walkthrough of a Codebase</title><link>https://baochun.org/2026-02-24/</link><guid isPermaLink="true">https://baochun.org/2026-02-24/</guid><description>I tried Simon Willison&apos;s prompt to build a linear walkthrough of Nextmini. Codex unsurprisingly launched several subagents as scouts to explore different parts of the codebase.</description><pubDate>Tue, 24 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I used the following prompt excerpted from Simon Willison’s &lt;a href=&quot;https://simonwillison.net/guides/agentic-engineering-patterns/linear-walkthroughs/&quot;&gt;excellent commentary&lt;/a&gt; to build a linear walkthrough of &lt;a href=&quot;https://nextmini.org&quot;&gt;Nextmini&lt;/a&gt;, a fairly complex codebase built in Rust. Unsurprisingly, Codex launched several subagents to scout different parts of the codebase without any additional hints on spawning subagents.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Read the source and then plan a linear walkthrough of the code that explains how it all works in detail.&lt;/p&gt;
&lt;p&gt;Then run “uvx showboat --help” to learn showboat - use showboat to create a &lt;code&gt;walkthrough.md&lt;/code&gt; file in the repo and build the walkthrough in there, using showboat note for commentary and showboat exec plus sed or grep or cat or whatever you need to include snippets of code you are talking about.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The command-line utility, &lt;code&gt;showboat&lt;/code&gt;, doesn’t need a skill, since &lt;code&gt;showboat --help&lt;/code&gt; is so comprehensive that the agent can understand how to use it just by reading the help.&lt;/p&gt;
&lt;p&gt;The entire session took 9 minutes 41 seconds to complete with GPT 5.3 codex xhigh, which is surprisingly fast to me.&lt;/p&gt;
</content:encoded><category>2026</category><category>agents</category><author>Baochun Li</author></item><item><title>Agentic Engineering Patterns</title><link>https://baochun.org/2026-02-23/</link><guid isPermaLink="true">https://baochun.org/2026-02-23/</guid><description>I have read Simon Willison&apos;s Agentic Engineering Patterns, and red/green TDD, which I had not previously heard of, seems to be so effective that I must give it a try.</description><pubDate>Mon, 23 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I have read Simon Willison’s &lt;a href=&quot;https://simonwillison.net/2026/Feb/23/agentic-engineering-patterns/&quot;&gt;Agentic Engineering Patterns&lt;/a&gt;, and started to wonder why good content like this can be read for free over the Internet. I had never previously heard of red/green test-driven development (TDD), but it feels so powerful that I must give it a try soon on one of my open-source projects.&lt;/p&gt;
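&lt;p&gt;For those who, like me, have not tried it: the pattern is to write a failing test first (red), confirm that it fails, and only then write the minimal code that makes it pass (green). A tiny, hypothetical example in Python:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Red: write the test first and run pytest; it fails because slugify() does not exist yet.
def test_slugify_lowercases_and_hyphenates():
    assert slugify(&apos;Agentic Engineering Patterns&apos;) == &apos;agentic-engineering-patterns&apos;

# Green: add the minimal implementation, re-run pytest, and the test passes.
def slugify(title):
    return &apos;-&apos;.join(title.lower().split())
&lt;/code&gt;&lt;/pre&gt;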
</content:encoded><category>2026</category><category>agents</category><author>Baochun Li</author></item><item><title>My Own Extension for the Pi Coding Agent</title><link>https://baochun.org/2026-02-17/</link><guid isPermaLink="true">https://baochun.org/2026-02-17/</guid><description>I wrote my own extension for the Pi coding agent to allow me to start multiple agents that collaborate with one another by sending and receiving messages.</description><pubDate>Tue, 17 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Today, I wrote &lt;a href=&quot;https://github.com/baochunli/pi-collaborating-agents&quot;&gt;my own extension&lt;/a&gt;, called &lt;em&gt;collaborating agents&lt;/em&gt;, for the &lt;a href=&quot;https://pi.dev/&quot;&gt;Pi coding agent&lt;/a&gt; — which the famous (or infamous) &lt;a href=&quot;https://openclaw.ai/&quot;&gt;OpenClaw&lt;/a&gt; is based upon. It allows me to easily work with multiple agents that can collaborate with one another by sending and receiving messages, and to allow an orchestrator agent to spawn multiple subagents.&lt;/p&gt;
&lt;p&gt;My idea is inspired by Jeffrey Emanuel’s &lt;a href=&quot;https://github.com/Dicklesworthstone&quot;&gt;Agentic Coding Flywheel&lt;/a&gt;, and in particular its &lt;a href=&quot;https://github.com/Dicklesworthstone/mcp_agent_mail&quot;&gt;MCP Agent Mail&lt;/a&gt;. Spawning multiple agents is definitely helpful from a context engineering point of view, but they need to be able to communicate with one another, and to reserve and release files so that conflicts can be avoided.&lt;/p&gt;
&lt;p&gt;Jeffrey Emanuel’s system can be effective, but it is way too complex for me to use. In contrast, my new Pi extension is designed for agents to be easily spawned and to talk to one another, but with only around 3400 lines of TypeScript code. I wrote the extension in less than a day of work, working with Pi itself. It may need a bit more fine-tuning to be battle-tested, but it is very usable already.&lt;/p&gt;
&lt;p&gt;To install this &lt;em&gt;collaborating agents&lt;/em&gt; extension and its included skill, install Pi first:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;bun install -g @mariozechner/pi-coding-agent
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And then choose one of the installation options:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;To install the extension as an &lt;a href=&quot;https://npmjs.com&quot;&gt;npm&lt;/a&gt; package, run:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;pi install npm:@baochunli/pi-collaborating-agents
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;To install it from the git repository, run:&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;pi install https://github.com/baochunli/pi-collaborating-agents
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To update the extension and skill to the latest release:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;pi update
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To remove the extension and skill:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;pi remove npm:@baochunli/pi-collaborating-agents
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;or (if installed from a git repository):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;pi remove https://github.com/baochunli/pi-collaborating-agents
&lt;/code&gt;&lt;/pre&gt;
</content:encoded><category>2026</category><category>agents</category><category>CLI</category><author>Baochun Li</author></item><item><title>Token Anxiety and Cognitive Debt</title><link>https://baochun.org/2026-02-16/</link><guid isPermaLink="true">https://baochun.org/2026-02-16/</guid><description>More of us are replacing Netflix with Codex and spinning up a new agentic session before falling asleep.</description><pubDate>Mon, 16 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I read two pieces today that somehow connect: &lt;a href=&quot;https://writing.nikunjk.com/p/token-anxiety&quot;&gt;token anxiety&lt;/a&gt; and &lt;a href=&quot;https://simonwillison.net/2026/Feb/15/cognitive-debt/&quot;&gt;cognitive debt&lt;/a&gt;. I enjoyed reading the following paragraph:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I replaced Netflix with Claude Code. I lie in bed thinking about what I can spin up before I fall asleep, what can run while I’m unconscious. Reading a novel feels indulgent now. Watching a movie without a laptop open feels wasteful. This voice in my head that says “something could be running right now” just doesn’t shut off. I’m not even building a company. I’m just addicted to building my random ideas.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;As tokens become less expensive ($1 an hour for the new &lt;a href=&quot;https://www.minimax.io/news/minimax-m25&quot;&gt;MiniMax 2.5&lt;/a&gt;), I think this will become more and more addictive, and cognitive debt will become widespread — not only in code, but also in research papers, as more and more of them are written by AI.&lt;/p&gt;
</content:encoded><category>2026</category><category>agents</category><category>CLI</category><author>Baochun Li</author></item><item><title>Codex is finally able to draw reasonably well</title><link>https://baochun.org/2026-02-15/</link><guid isPermaLink="true">https://baochun.org/2026-02-15/</guid><description>I have been looking for a way to get Codex to draw figures reasonably well. I think I finally found a way.</description><pubDate>Sun, 15 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I discovered an effective tool called &lt;a href=&quot;https://github.com/yctimlin/mcp_excalidraw&quot;&gt;mcp_excalidraw&lt;/a&gt; — which combines the powers of an MCP server and a skill — to get codex to draw figures using &lt;a href=&quot;https://excalidraw.com&quot;&gt;Excalidraw&lt;/a&gt; reasonably well.&lt;/p&gt;
&lt;p&gt;The difference between this tool and other alternatives is that Codex can see the figures it draws by capturing screenshots through MCP calls. It is a bit of additional work to set up, but well worth it. I followed these steps.&lt;/p&gt;
&lt;p&gt;First, I cloned the git repo:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;git clone git@github.com:yctimlin/mcp_excalidraw.git
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then I copied the skill to my own skills folder:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;cp -R mcp_excalidraw/skills/excalidraw-skill ~/.agents/skills
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then I installed the MCP server:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;codex mcp add excalidraw \
  --env EXPRESS_SERVER_URL=http://localhost:3000 \
  --env ENABLE_CANVAS_SYNC=true \
  -- node /Users/bli/Playground/mcp_excalidraw/dist/index.js
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Finally, I started the local web server (which the git repo did not mention):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;cd mcp_excalidraw
npm install
npm run build
HOST=0.0.0.0 PORT=3000 npm run canvas
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In Codex, I just need to say: “Use the Excalidraw skill, draw…”. &lt;a href=&quot;https://nextmini.org/docs/design/architecture&quot;&gt;Here&lt;/a&gt; is an example figure drawn by Codex (with only minor adjustments by me).&lt;/p&gt;
&lt;p&gt;When I don’t need to draw figures (which is most of the time), I would just remove it for better efficiency:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;codex mcp remove excalidraw
&lt;/code&gt;&lt;/pre&gt;
</content:encoded><category>2026</category><category>agents</category><category>CLI</category><author>Baochun Li</author></item><item><title>Migrated Days and Nextmini websites to TanStack Start</title><link>https://baochun.org/2026-02-14/</link><guid isPermaLink="true">https://baochun.org/2026-02-14/</guid><description>It is surprisingly straightforward to migrate a website from Next.js to TanStack Start.</description><pubDate>Sat, 14 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Today, I have migrated both &lt;a href=&quot;https://days.sh&quot;&gt;Days&lt;/a&gt; and &lt;a href=&quot;https://nextmini.org&quot;&gt;Nextmini&lt;/a&gt; websites from &lt;a href=&quot;https://nextjs.org/&quot;&gt;Next.js&lt;/a&gt; to &lt;a href=&quot;https://tanstack.com/&quot;&gt;TanStack Start&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It was a surprise that Codex needed only a few minutes to migrate each project. Since their respective documentation websites are already using &lt;a href=&quot;https://www.fumadocs.dev/&quot;&gt;Fumadocs&lt;/a&gt; and TanStack Start, this is a natural transition and a more seamless fit. As a new web framework, &lt;a href=&quot;https://tanstack.com/&quot;&gt;TanStack Start&lt;/a&gt; feels simpler and much faster than &lt;a href=&quot;https://nextjs.org/&quot;&gt;Next.js&lt;/a&gt;, and will be my choice for new projects going forward.&lt;/p&gt;
</content:encoded><category>2026</category><category>agents</category><category>CLI</category><author>Baochun Li</author></item><item><title>Subagent-Friendly Planning Rules</title><link>https://baochun.org/2026-02-13/</link><guid isPermaLink="true">https://baochun.org/2026-02-13/</guid><description>A handy AGENTS.md addition that makes sure that codex writes better plans and uses subagents proactively.</description><pubDate>Fri, 13 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Following &lt;a href=&quot;https://x.com/LLMJunky/status/2021422988969799879?s=20&quot;&gt;this suggestion&lt;/a&gt;, I added the following sections to &lt;code&gt;AGENTS.md&lt;/code&gt; to make sure that codex always writes subagent-friendly plans and uses subagents more proactively. They worked well.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-markdown&quot;&gt;## Additional Agent Operating Rules

### Context7

- ALWAYS proactively use Context7 when I need library/API documentation, code generation, setup or configuration steps without me having to explicitly ask.
- External libraries/docs/frameworks should be guided by Context7.

### Planning

- All plans MUST include a dependency graph.
- Every task in a plan must declare `depends_on: []` using explicit task IDs such as `T1`, `T2`.

### Execution

- Complete all tasks from a plan without stopping for permission between steps. Use best judgment, keep moving.
- Only stop to ask when a step is destructive/irreversible or there is a genuine blocker.

### Subagents

- Spawn subagents automatically when:
  - Parallelizable work (e.g., install + verify, npm test + typecheck, unblocked tasks from plan)
  - Long-running or blocking tasks where a worker can run independently.
  - Isolation for risky changes or checks
  - Code review would be helpful
- If you&apos;re launching subagents for parallelization, add this robust context to your prompt:
  - **Context**: Share plan file location and info if available
  - **Dependencies**: What work/files are completed? Any dependencies?
  - **Related tasks**: Any adjacent tasks, files, or agents?
  - **Exact task**: Description, file paths/names, acceptance criteria
  - **Validation**: How to validate work if possible.
  - **Constraints**: Risks, gotchas, things to avoid
  - **Be thorough**: Provide ANY/ALL context that will aid success.
- ALWAYS wait for all subagents to complete before yielding.

### Bugs

- Add a regression test when it is appropriate for bug-related changes.
&lt;/code&gt;&lt;/pre&gt;
</content:encoded><category>2026</category><category>agents</category><category>CLI</category><author>Baochun Li</author></item><item><title>iOS Codex Workflow with Moshi and GPT 5.3 Codex Spark</title><link>https://baochun.org/2026-02-12/</link><guid isPermaLink="true">https://baochun.org/2026-02-12/</guid><description>The iOS codex workflow has been streamlined again: now with the Moshi iOS app to ssh into my computer via the Tailscale network. Also, GPT 5.3 Codex Spark is super fast.</description><pubDate>Thu, 12 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Only after a day, I discovered a better way to use codex from my iPhone: use the &lt;a href=&quot;https://apps.apple.com/us/app/moshi-ssh-mosh-terminal/id6757859949&quot;&gt;Moshi&lt;/a&gt; app to connect into my computer via the Tailscale network. It is simpler, faster, and more secure than &lt;a href=&quot;https://github.com/gbasin/agentboard&quot;&gt;Agentboard&lt;/a&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://openai.com/index/introducing-gpt-5-3-codex-spark/&quot;&gt;GPT 5.3 Codex Spark&lt;/a&gt; has been released and it is super fast — much faster than the regular 5.3 codex. This will be good enough for, say, fixing the “must-fix” items quickly after a code review.&lt;/p&gt;
</content:encoded><category>2026</category><category>agents</category><author>Baochun Li</author></item><item><title>Oracle Testing, Something Big, and Something Small</title><link>https://baochun.org/2026-02-11/</link><guid isPermaLink="true">https://baochun.org/2026-02-11/</guid><description>Electric&apos;s Configurancy argues that when code is cheap, specs and oracle testing matter more than unit tests alone. And something big is happening.</description><pubDate>Wed, 11 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://electric-sql.com/blog/2026/02/02/configurancy&quot;&gt;Configurancy&lt;/a&gt;, written by Electric, emphasized what we should do when writing code is cheap.&lt;/p&gt;
&lt;p&gt;What I find interesting is the concept of &lt;em&gt;oracle testing&lt;/em&gt;, where the oracle (Postgres in their example) is the spec that the codebase needs to satisfy. The moral of the story is that we need specs and conformance suites, not just simple unit test cases.&lt;/p&gt;
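&lt;p&gt;The idea is easy to state in code: feed the same inputs to the implementation under test and to the oracle, and assert that the outputs agree. Below is a generic, hypothetical sketch; Electric’s oracle is Postgres, while here the oracle is just a trusted reference function.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import random

def oracle_sort(xs):
    # The trusted reference implementation stands in for the oracle
    # (Postgres, in Electric&apos;s example).
    return sorted(xs)

def my_sort(xs):
    # The implementation under test; imagine a hand-rolled quicksort here.
    return sorted(xs)

def test_against_oracle(trials=1000):
    for _ in range(trials):
        xs = [random.randint(-100, 100) for _ in range(random.randint(0, 50))]
        assert my_sort(xs) == oracle_sort(xs), f&apos;diverged on {xs}&apos;

test_against_oracle()
&lt;/code&gt;&lt;/pre&gt;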
&lt;hr&gt;
&lt;h2&gt;Something Big Is Happening&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://shumer.dev/something-big-is-happening&quot;&gt;Something Big Is Happening&lt;/a&gt;, written by Matt Shumer, is fascinating and long, but a must read.&lt;/p&gt;
&lt;p&gt;The era of manual coding is over, and perhaps soon, “vibe coding” will become “vibe research” in general, where not only programming, but also research, will be produced with almost 100% assistance by AI agents. This will allow us to try out new ideas, and to find out which works and which doesn’t, with unprecedented velocity.&lt;/p&gt;
&lt;p&gt;I like one piece of advice from this article:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“Spend one hour a day experimenting with AI. Not passively reading about it. Using it.”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h3&gt;243-Line MicroGPT by Karpathy&lt;/h3&gt;
&lt;p&gt;In sharp contrast with his &lt;a href=&quot;https://github.com/karpathy/nanochat&quot;&gt;NanoChat&lt;/a&gt; project and its &lt;a href=&quot;https://deepwiki.com/karpathy/nanochat&quot;&gt;DeepWiki&lt;/a&gt; documentation, the 243 lines of pure, dependency-free Python, &lt;a href=&quot;https://gist.github.com/karpathy/8627fe009c40f57531cb18360106ce95&quot;&gt;MicroGPT&lt;/a&gt;, reminds me of his online videos on building an autograd engine. Surely someone will post a detailed tutorial soon, explaining these 243 lines line-by-line.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;On a side note, the &lt;a href=&quot;https://docs.devin.ai/work-with-devin/deepwiki-mcp&quot;&gt;DeepWiki MCP&lt;/a&gt; definitely sounds very interesting and may be better than Context7. I have added it to my codex MCP setup with the command:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;codex mcp add deepwiki --url https://mcp.deepwiki.com/mcp
&lt;/code&gt;&lt;/pre&gt;
&lt;/blockquote&gt;
</content:encoded><category>2026</category><category>agents</category><author>Baochun Li</author></item><item><title>A Language for Agents</title><link>https://baochun.org/2026-02-10/</link><guid isPermaLink="true">https://baochun.org/2026-02-10/</guid><description>A quick iOS Codex access tip with Agentboard, plus a strong Rust-over-Python essay for agentic programming.</description><pubDate>Tue, 10 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Quick tip on how to get the current model served by codex:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;RUST_LOG=&apos;codex_api::sse::responses=trace&apos; codex exec --sandbox read-only --model gpt-5.3-codex &apos;ping&apos; 2&amp;gt;&amp;amp;1 | grep -m1 &apos;SSE event: {&amp;quot;type&amp;quot;:&amp;quot;response.created&amp;quot;&apos; | sed &apos;s/^.*SSE event: //&apos; | jq -r &apos;.response.model&apos;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Before running this command, I need to enter a trusted directory.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;Agentboard and iOS Codex Access&lt;/h3&gt;
&lt;p&gt;There are quite a few projects that are designed to help access agentic CLI tools, such as codex, from an iOS device. I have researched several of them, and the best is &lt;a href=&quot;https://github.com/gbasin/agentboard&quot;&gt;Agentboard&lt;/a&gt;. It uses Tailscale to seamlessly reach a home computer or a server from just a web browser on the phone, and to connect to any live codex sessions there. So far, &lt;a href=&quot;https://github.com/gbasin/agentboard&quot;&gt;Agentboard&lt;/a&gt; offers the best experience of connecting to a codex session from my phone, and is much better than other alternatives, such as &lt;a href=&quot;https://happy.engineering/&quot;&gt;happy.engineering&lt;/a&gt;, &lt;a href=&quot;https://github.com/antirez/tgterm&quot;&gt;tgterm&lt;/a&gt;, or directly using &lt;a href=&quot;https://openclaw.ai/&quot;&gt;Openclaw&lt;/a&gt; with Telegram.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;A Language for Agents&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://lucumr.pocoo.org/2026/2/9/a-language-for-agents/&quot;&gt;A Language for Agents&lt;/a&gt;, by Armin Ronacher, is an excellent (but long) essay about the future of programming languages in an agentic world.&lt;/p&gt;
&lt;p&gt;Though it lobbied for a new language only for agents, it also made a strong case for using Rust and TypeScript (and perhaps also Go) as “the agent’s language,” but not nearly as much for Python, which is not statically typed.&lt;/p&gt;
&lt;p&gt;In my personal opinion, we can just settle for TypeScript and Rust as the programming languages of choice going forward when starting greenfield projects, and only use Python for machine learning. To reduce the cognitive load, we shouldn’t be learning programming languages beyond TypeScript (as the first language) and Rust (as the advanced, performance-oriented alternative) for general-purpose programming, and perhaps Python for its ecosystem in machine learning.&lt;/p&gt;
</content:encoded><category>2026</category><category>workflows</category><category>CLI</category><author>Baochun Li</author></item><item><title>Redesigned Personal Website with a Minimal Writing Workflow</title><link>https://baochun.org/2026-02-08/</link><guid isPermaLink="true">https://baochun.org/2026-02-08/</guid><description>I redesigned my personal website, featuring not only a simple, minimalist design, but also a streamlined process of writing and publishing new entries via CLI tools.</description><pubDate>Sun, 08 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I redesigned my personal website on the flight from Doha back to Toronto. Last year’s design took me a day and used a content management framework called &lt;a href=&quot;https://quartz.jzhao.xyz/&quot;&gt;Quartz&lt;/a&gt;, while the new design took me only about two hours in codex, and used only &lt;a href=&quot;https://astro.build/&quot;&gt;Astro&lt;/a&gt; as a lightweight framework. I used the plan mode in codex to produce a plan first before implementation. I have to say, codex completely changed how a website is to be designed: the old days of manually coding websites are gone.&lt;/p&gt;
&lt;p&gt;The new website allows me to create a new post with a CLI tool, &lt;code&gt;post&lt;/code&gt;, for which I asked codex to create a skill for its own use. To post a new entry, all I need to do is activate the skill and tell codex what I wish to say.&lt;/p&gt;
</content:encoded><category>2026</category><category>workflows</category><category>CLI</category><author>Baochun Li</author></item><item><title>tiny-llm and Practical PyTorch Learning Prerequisites</title><link>https://baochun.org/2025-04-29/</link><guid isPermaLink="true">https://baochun.org/2025-04-29/</guid><description>tiny-llm is exactly what I wished for. It also contains links to two existing PyTorch related courses to machine learning from Carnegie Mellon University.</description><pubDate>Tue, 29 Apr 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://github.com/skyzh/tiny-llm&quot;&gt;tiny-llm&lt;/a&gt; — Exactly what I wished for. It also links to two existing PyTorch-related machine learning courses from Carnegie Mellon University, to be used as prerequisites for this course.&lt;/p&gt;
</content:encoded><category>2025</category><category>frameworks</category><category>papers</category><author>Baochun Li</author></item><item><title>Arc Browser and the Modern shadcn/ui Tooling Stack</title><link>https://baochun.org/2025-04-02/</link><guid isPermaLink="true">https://baochun.org/2025-04-02/</guid><description>Arc — My new browser of choice. I love the fact that bookmarks are organized on the side panel, rather than clustered at the top of the window.</description><pubDate>Wed, 02 Apr 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://thebrowser.company/&quot;&gt;Arc&lt;/a&gt; — My new browser of choice. I love the fact that bookmarks are organized on the side panel, rather than clustered at the top of the window. Split windows and spaces are also quite nice, and it’s cool to read about using Swift to build the UI. The “Little Arc” windows are kind-of cute, too.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://vaul.emilkowal.ski/&quot;&gt;Vaul&lt;/a&gt; — What a beautiful and simple design! Enjoyed reading the designer &lt;a href=&quot;https://emilkowal.ski/&quot;&gt;Emil Kowalski&lt;/a&gt;’s website. His other creation, &lt;a href=&quot;https://sonner.emilkowal.ski/&quot;&gt;Sonner&lt;/a&gt;, is also something I use — its &lt;a href=&quot;https://sonner.emilkowal.ski/getting-started&quot;&gt;documentation&lt;/a&gt; is a thing of beauty, including its use of the marvellous &lt;a href=&quot;https://usgraphics.com/products/berkeley-mono&quot;&gt;Berkeley Mono&lt;/a&gt; typeface.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://ui.shadcn.com/&quot;&gt;shadcn/ui&lt;/a&gt; — The best UI component distribution mechanism and library out there. It’s fully compatible with &lt;a href=&quot;https://tailwindcss.com/&quot;&gt;Tailwind CSS&lt;/a&gt; 4.1, and &lt;a href=&quot;https://tweakcn.com/&quot;&gt;tweakcn&lt;/a&gt; can be used to customize it.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://ui.bazza.dev/&quot;&gt;bazza/ui&lt;/a&gt; — Best data table filters, based on shadcn/ui.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/ln-dev7/circle&quot;&gt;Circle&lt;/a&gt; — A dashboard template based on shadcn/ui, that allows components to be dragged and dropped across columns, as in &lt;a href=&quot;https://www.diceui.com/docs/components/kanban&quot;&gt;Kanban&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://shadcnuikit.com/dashboard/default&quot;&gt;Dashboards in shadcn UI Kit&lt;/a&gt; — A pretty good dashboard, but not without minor issues when adapting to narrower windows.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.shadcnblocks.com/&quot;&gt;shadcnblocks&lt;/a&gt; — Hundreds of useful blocks (for a flat fee). Very useful to have such a large selection.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://supabase.com/ui&quot;&gt;Supabase UI Library&lt;/a&gt; — Distributed using the shadcn CLI.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/sadmann7/diceui&quot;&gt;Dice UI&lt;/a&gt; — Unstyled UI component library based on the latest Tailwind CSS 4 and distributed using the shadcn CLI. It includes a nice &lt;a href=&quot;https://linear.app/homepage&quot;&gt;Linear&lt;/a&gt;-like &lt;a href=&quot;https://github.com/sadmann7/shadcn-table&quot;&gt;table filter and sorting&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://silkhq.co/&quot;&gt;Silk&lt;/a&gt; — Native‑like swipeable sheets on the web, and more advanced than &lt;a href=&quot;https://vaul.emilkowal.ski/&quot;&gt;Vaul&lt;/a&gt;. 299 Euro for small businesses with fewer than 5 employees.&lt;/p&gt;
</content:encoded><category>2025</category><category>frameworks</category><category>workflows</category><category>CLI</category><author>Baochun Li</author></item><item><title>Evaluating Eleventy as a Lightweight Static Site Generator</title><link>https://baochun.org/2025-04-01/</link><guid isPermaLink="true">https://baochun.org/2025-04-01/</guid><description>Eleventy appears to be a pretty simple static website generator that is worth exploring. A competitor to Hugo.</description><pubDate>Tue, 01 Apr 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://www.11ty.dev&quot;&gt;Eleventy&lt;/a&gt; appears to be a pretty simple static website generator that is worth exploring. A competitor to Hugo.&lt;/p&gt;
</content:encoded><category>2025</category><category>frameworks</category><author>Baochun Li</author></item><item><title>How I Use LLMs: Key Notes from Andrej Karpathy</title><link>https://baochun.org/2025-03-11/</link><guid isPermaLink="true">https://baochun.org/2025-03-11/</guid><description>How I use LLMs by Andrej Karpathy — A must watch.</description><pubDate>Tue, 11 Mar 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=EWvNQjAaOHw&quot;&gt;How I use LLMs&lt;/a&gt; by Andrej Karpathy — A must watch.&lt;/p&gt;
</content:encoded><category>2025</category><category>workflows</category><category>agents</category><author>Baochun Li</author></item><item><title>Panasonic S1R II and Early Claude Code Impressions</title><link>https://baochun.org/2025-02-25/</link><guid isPermaLink="true">https://baochun.org/2025-02-25/</guid><description>Panasonic S1R II — With the Sigma 28-105 f/2.8, this would be my dream camera. It is just slightly heavier than my Panasonic S5 IIx (1.57 lb vs. 1.45 lb body only).</description><pubDate>Tue, 25 Feb 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://www.thephoblographer.com/2025/02/25/panasonic-s1r-ii-review-its-time-to-get-excited/&quot;&gt;Panasonic S1R II&lt;/a&gt; — With the Sigma 28-105 f/2.8, this would be my dream camera. It is just slightly heavier than my Panasonic S5 IIx (1.57 lb vs. 1.45 lb body only).&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://ai-claude.net/code/&quot;&gt;Claude Code&lt;/a&gt; — I joined the waitlist last night and received the invitation today.&lt;/p&gt;
&lt;p&gt;I gave it a try on one of my ongoing projects and it was pretty costly ($0.40 simply to set up a basic understanding in &lt;code&gt;CLAUDE.md&lt;/code&gt;). I also tried &lt;a href=&quot;https://www.codebuff.com/&quot;&gt;CodeBuff&lt;/a&gt;, and it does seem to fare better, but it did not magically solve the issues I experienced.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Moral of the story:&lt;/em&gt; Know what you are doing and do not try to use AI blindly. It can mess up the codebase to the point where rescuing the code takes longer than working on the project manually, without AI, from the beginning.&lt;/p&gt;
</content:encoded><category>2025</category><category>agents</category><category>CLI</category><author>Baochun Li</author></item><item><title>Ultra-Scale LLM Training Playbook and Streaming DiLoCo</title><link>https://baochun.org/2025-02-19/</link><guid isPermaLink="true">https://baochun.org/2025-02-19/</guid><description>The Ultra-Scale Playbook: Training LLMs on GPU Clusters — Amazing, and finally we have a 100-page open-source online book on how models are trained with multiple GPUs.</description><pubDate>Wed, 19 Feb 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://huggingface.co/spaces/nanotron/ultrascale-playbook&quot;&gt;The Ultra-Scale Playbook: Training LLMs on GPU Clusters&lt;/a&gt; — Amazing, and finally we have a 100-page open-source online book on how models are trained with multiple GPUs, with reproducible source code.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://arxiv.org/abs/2501.18512v1&quot;&gt;Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch&lt;/a&gt; — Latest paper from DeepMind about efficient geographically distributed training with overlapped communication.&lt;/p&gt;
</content:encoded><category>2025</category><category>papers</category><author>Baochun Li</author></item><item><title>Crafted UI, Fumadocs, and Design System References</title><link>https://baochun.org/2025-02-18/</link><guid isPermaLink="true">https://baochun.org/2025-02-18/</guid><description>Crafted — What a great looking set of open-source, hand-crafted UI templates based on shadcn/ui!</description><pubDate>Tue, 18 Feb 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://crafted.is/&quot;&gt;Crafted&lt;/a&gt; — What a great looking set of open-source, hand-crafted UI templates based on  &lt;a href=&quot;https://ui.shadcn.com/&quot;&gt;shadcn/ui&lt;/a&gt;!&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://fumadocs.vercel.app/&quot;&gt;Fumadocs&lt;/a&gt; — &lt;a href=&quot;https://better-auth.com&quot;&gt;Better Auth&lt;/a&gt;’s &lt;a href=&quot;https://docs.better-auth.com&quot;&gt;documentation&lt;/a&gt; is built with this excellent documentation framework based on Next.js.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://gwern.net/&quot;&gt;The website of Gwern Branwen&lt;/a&gt; — Beautiful &lt;a href=&quot;https://gwern.net/design&quot;&gt;design&lt;/a&gt;, with &lt;a href=&quot;https://github.com/adobe-fonts/source-serif&quot;&gt;Adobe Source Serif Pro&lt;/a&gt; as the main serif font choice. I couldn’t believe that the entire site infrastructure is &lt;a href=&quot;https://github.com/gwern/gwern.net&quot;&gt;open source&lt;/a&gt;, and constantly being updated.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://tailwindui.com/templates/syntax&quot;&gt;Syntax: Tailwind’s documentation template&lt;/a&gt; — A bit pricy, but good quality.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://remixicon.com/&quot;&gt;Remix Icons&lt;/a&gt; — Used by the &lt;a href=&quot;https://console.x.ai/&quot;&gt;Grok&lt;/a&gt; website.&lt;/p&gt;
</content:encoded><category>2025</category><category>frameworks</category><author>Baochun Li</author></item><item><title>Better Auth, Origin UI, and Open Research Data Tools</title><link>https://baochun.org/2025-02-16/</link><guid isPermaLink="true">https://baochun.org/2025-02-16/</guid><description>Better Auth — A new authentication library that is feature-complete and easy-to-use. Compared to Lucia, which advocates a copy-and-paste approach.</description><pubDate>Sun, 16 Feb 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://better-auth.com/&quot;&gt;Better Auth&lt;/a&gt; — A new authentication library that is feature-complete and easy to use. Compared to Lucia, which advocates a copy-and-paste approach, this library requires less intimate knowledge about authentication, and its plug-in system means it doesn’t sacrifice extensibility. It feels more like an automatic with manual overrides, rather than a manual transmission. My choice going forward.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://originui.com&quot;&gt;Origin UI&lt;/a&gt; — What an excellent set of UI components based on &lt;a href=&quot;https://ui.shadcn.com/&quot;&gt;shadcn/ui&lt;/a&gt;!  The number of variants for each UI category is mind-boggling.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://resend.com/blog/top-10-email-deliverability-tips&quot;&gt;Top 10 Email Deliverability Tips&lt;/a&gt; — Resend’s tips on improving the deliverability of outbound emails.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://simonwillison.net/2025/Feb/15/llm-mlx/&quot;&gt;Simon Willison’s take on MLX&lt;/a&gt; — Simon Willison (finally) added MLX as a new plugin to his LLM CLI utility, &lt;code&gt;llm&lt;/code&gt;. His experiences with MLX were very positive:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This is &lt;em&gt;really&lt;/em&gt; good software. This small team at Apple appear to be almost single-handedly giving NVIDIA’s CUDA a run for their money!&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://www.semanticscholar.org/&quot;&gt;Semantic Scholar&lt;/a&gt; — Unlike Google Scholar, Semantic Scholar provides an open REST API to obtain metadata about papers and their authors, forming an &lt;em&gt;academic graph&lt;/em&gt;. Pretty cool and I didn’t know about it before.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://zed.dev/blog/edit-prediction&quot;&gt;Zed supports edit prediction with its open-source Zeta model&lt;/a&gt; — The blog post that introduces Zeta is pretty comprehensive and covers a lot of ground, including their deployment on &lt;a href=&quot;https://www.baseten.co/&quot;&gt;Baseten&lt;/a&gt; to minimize latency.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://docs.openalex.org/&quot;&gt;OpenAlex&lt;/a&gt; — A fully open catalog of the global research system. The world becomes a better place with the dedication and hard work of people behind these efforts at &lt;a href=&quot;https://x.com/Dorialexander/status/1889299316780519462&quot;&gt;OurResearch&lt;/a&gt;. It is also part of the recently released &lt;a href=&quot;https://x.com/Dorialexander/status/1889299316780519462&quot;&gt;Common Corpus 2&lt;/a&gt;, a second version of the &lt;a href=&quot;https://simonwillison.net/2024/Nov/14/releasing-the-largest-multilingual-open-pretraining-dataset/&quot;&gt;Common Corpus&lt;/a&gt;.&lt;/p&gt;
</content:encoded><category>2025</category><category>frameworks</category><author>Baochun Li</author></item><item><title>A Minimal GRPO Implementation from First Principles</title><link>https://baochun.org/2025-02-15/</link><guid isPermaLink="true">https://baochun.org/2025-02-15/</guid><description>Andriy Burkov’s minimalist implementation of GRPO from scratch — Rather than using a library such as Hugging Face’s TRL.</description><pubDate>Sat, 15 Feb 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://x.com/burkov/status/1890566690058170708&quot;&gt;Andriy Burkov’s minimalist implementation of GRPO from scratch&lt;/a&gt; — Rather than using a library such as Hugging Face’s TRL, it would always be a good idea to read a minimalist, back-to-square-one implementation of the GRPO reinforcement learning algorithm.&lt;/p&gt;
</content:encoded><category>2025</category><category>workflows</category><author>Baochun Li</author></item><item><title>Transformer Lab: MLX Fine-Tuning Workspace on Mac</title><link>https://baochun.org/2025-02-14/</link><guid isPermaLink="true">https://baochun.org/2025-02-14/</guid><description>Transformer Lab — a free, open-source LLM workspace that prepares a custom dataset and fine-tunes a model using MLX on the Mac.</description><pubDate>Fri, 14 Feb 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://transformerlab.ai/&quot;&gt;Transformer Lab&lt;/a&gt; — a free, open-source LLM workspace that prepares a custom dataset and fine-tunes a model using MLX on the Mac (or, of course, using a GPU-powered computer or the cloud). Deep Gandhi offered a &lt;a href=&quot;https://x.com/deepgandhi_07/status/1890465271934034266&quot;&gt;quick step-by-step guide&lt;/a&gt; for using MLX to fine-tune a model. It’s open source under the MIT license, and the tech stack for its UI appears to be Electron and React. This is the first UI I have found that can fine-tune models using MLX.&lt;/p&gt;
</content:encoded><category>2025</category><category>workflows</category><category>frameworks</category><author>Baochun Li</author></item><item><title>Lucia’s New Authentication Design and Practical Tradeoffs</title><link>https://baochun.org/2025-02-11/</link><guid isPermaLink="true">https://baochun.org/2025-02-11/</guid><description>Lucia — Lucia, the authentication library, has adopted the design of cutting and pasting code, just like shadcn/ui, rather than implementing a library.</description><pubDate>Tue, 11 Feb 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://lucia-auth.com/&quot;&gt;Lucia&lt;/a&gt; — Lucia, the authentication library, has adopted the design of cutting and pasting code, just like &lt;a href=&quot;https://ui.shadcn.com/&quot;&gt;shadcn/ui&lt;/a&gt;, rather than implementing a library to encapsulate the details. This should work well with authentication, and reflects the design principle of working with simpler libraries rather than all-in-ones. In this case, the new Lucia design uses &lt;a href=&quot;https://arcticjs.dev/&quot;&gt;Arctic&lt;/a&gt; and &lt;a href=&quot;https://oslojs.dev/&quot;&gt;Oslo&lt;/a&gt;, but all session and cookie management code needs to be written (cut and pasted).&lt;/p&gt;
</content:encoded><category>2025</category><category>frameworks</category><author>Baochun Li</author></item><item><title>From 0 to Production: Notes on Theo’s Modern React Tutorial</title><link>https://baochun.org/2025-02-09/</link><guid isPermaLink="true">https://baochun.org/2025-02-09/</guid><description>From 0 to Production — The Modern React Tutorial — Theo released it last year, and I always wanted to learn from this marathon tutorial.</description><pubDate>Sun, 09 Feb 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=d5x0JCZbAJs&quot;&gt;From 0 to Production — The Modern React Tutorial&lt;/a&gt; — Theo released it last year, and I always wanted to learn from this marathon tutorial. It covers all of the modern frameworks, &lt;a href=&quot;https://nextjs.org/&quot;&gt;Next.js&lt;/a&gt;, &lt;a href=&quot;https://ui.shadcn.com/&quot;&gt;shadcn/ui&lt;/a&gt;, and &lt;a href=&quot;https://www.typescriptlang.org/&quot;&gt;TypeScript&lt;/a&gt;. I will find some time to finish it.&lt;/p&gt;
</content:encoded><category>2025</category><category>workflows</category><category>frameworks</category><author>Baochun Li</author></item><item><title>Unsloth GRPO, S1-Style Scaling, and RL Learning Resources</title><link>https://baochun.org/2025-02-08/</link><guid isPermaLink="true">https://baochun.org/2025-02-08/</guid><description>Unsloth.ai’s GRPO — it seems that the Unsloth implementation of GRPO uses less GPU memory, and it supports QLoRA and LoRA.</description><pubDate>Sat, 08 Feb 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://unsloth.ai/blog/r1-reasoning&quot;&gt;Unsloth.ai’s GRPO&lt;/a&gt; — it seems that the Unsloth implementation of GRPO uses less GPU memory, and it supports QLoRA and LoRA.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://gist.github.com/awni/9d8b35ef9c983563cfaad449f867c0f1&quot;&gt;S1-style test-time scaling with MLX&lt;/a&gt; — Awni Hannun, the primary architect of MLX, posted a simple implementation of &lt;a href=&quot;https://arxiv.org/abs/2501.19393&quot;&gt;S1&lt;/a&gt;-style test-time scaling using DeepSeek R1 distilled models locally, with only 138 lines of Python code. Simplicity at its best.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://spinningup.openai.com/en/latest/&quot;&gt;Spinning Up in Deep RL&lt;/a&gt; — Excellent introduction to deep reinforcement learning, with a sufficient amount of math but skips unnecessary formalism. It comes with PyTorch implementations for the algorithms. As it stated in its introduction:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;However, while there are many resources to help people quickly ramp up on deep learning, deep reinforcement learning is more challenging to break into. To begin with, a student of deep RL needs to have some background in math, coding, and regular deep learning. Beyond that, they need both a high-level view of the field—an awareness of what topics are studied in it, why they matter, and what’s been done already—and careful instruction on how to connect algorithm theory to algorithm code.&lt;/p&gt;
&lt;p&gt;The high-level view is hard to come by because of how new the field is. There is not yet a standard deep RL textbook, so most of the knowledge is locked up in either papers or lecture series, which can take a long time to parse and digest. And learning to implement deep RL algorithms is typically painful, because either&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the paper that publishes an algorithm omits or inadvertently obscures key design details,&lt;/li&gt;
&lt;li&gt;or widely-public implementations of an algorithm are hard to read, hiding how the code lines up with the algorithm.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;Connecting algorithm theory to algorithm code is what’s sorely missing in many other online books and resources, especially in reinforcement learning. Many use Jupyter notebooks, which make for a horrible way of learning from source code.&lt;/p&gt;
</content:encoded><category>2025</category><category>workflows</category><category>papers</category><author>Baochun Li</author></item><item><title>AI Peer Review with LLMs and S1 Test-Time Scaling</title><link>https://baochun.org/2025-02-06/</link><guid isPermaLink="true">https://baochun.org/2025-02-06/</guid><description>DOGE: Make AI Conferences Great Again — Zeyuan (Allen) Zhu wrote a very interesting piece on using LLMs as arbitrators in the reviewer-author discussions.</description><pubDate>Thu, 06 Feb 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://doge.allen-zhu.com/&quot;&gt;DOGE: Make AI Conferences Great Again&lt;/a&gt; — Zeyuan (Allen) Zhu wrote a very interesting piece on using LLMs as arbitrators in the reviewer-author discussions and the paper review process. Zhu is one of the co-authors of the 2021 LoRA paper, which with over 11000 citations became the &lt;em&gt;de facto&lt;/em&gt; standard in parameter-efficient fine-tuning, and widely used throughout the entire machine learning community.&lt;/p&gt;
&lt;p&gt;One surprising fact mentioned in the piece is that the 2021 LoRA paper was initially rejected by NeurIPS 2021, even after the author rebuttal. I believe this is clear evidence that the paper review system is broken, at least in the ML/AI community, which makes Zhu’s proposal of using LLMs to improve the fairness of the review process all the more interesting.&lt;/p&gt;
&lt;p&gt;P.S. It &lt;a href=&quot;https://x.com/OriolVinyalsML/status/1887594344183701814&quot;&gt;looks like&lt;/a&gt; the widely cited paper, “&lt;a href=&quot;https://arxiv.org/pdf/1503.02531&quot;&gt;Distilling the Knowledge in a Neural Network&lt;/a&gt;”, co-authored by Geoffrey Hinton and Jeff Dean, was also rejected by NeurIPS 2014, and later appeared in the NeurIPS 2014 Deep Learning Workshop. It has since received over 23000 citations.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://arxiv.org/abs/2501.19393&quot;&gt;s1: Simple test-time scaling&lt;/a&gt; — Stanford University showed in this paper that, by fine-tuning the &lt;code&gt;Qwen2.5-32B-Instruct&lt;/code&gt; model with a curated high-quality dataset of only 1,000 samples, and by appending &lt;code&gt;wait&lt;/code&gt; at test time to force the model to think longer, a 32B model can perform as well as o1-preview. It is perhaps the simplest way to scale test-time compute over the total number of thinking tokens, yet it appears to work well.&lt;/p&gt;
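&lt;p&gt;To make the mechanism concrete, here is a minimal sketch of this kind of “budget forcing” at decode time (my own paraphrase of the idea, not the paper’s code). It assumes a &lt;code&gt;generate()&lt;/code&gt; callable that decodes until a stop string, and a chat template whose thinking section ends with a delimiter; both are placeholders.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;END_THINK = &apos;&amp;lt;/think&amp;gt;&apos;  # hypothetical delimiter; s1 uses its own template

def think_with_budget(generate, prompt, min_thinking_tokens=2000):
    thinking = &apos;&apos;
    while True:
        # Decode until the model tries to close its thinking section.
        thinking += generate(prompt + thinking, stop=END_THINK)
        # Crude word count as a stand-in for the token count.
        if len(thinking.split()) &amp;gt;= min_thinking_tokens:
            break
        # Budget not spent yet: suppress the end-of-thinking delimiter and
        # append &apos;Wait&apos;, nudging the model to keep reasoning.
        thinking += &apos; Wait&apos;
    return thinking + END_THINK
&lt;/code&gt;&lt;/pre&gt;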
&lt;p&gt;Interestingly, the paper stated:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The concurrently released r1-32B shows stronger performance than s1-32B while also only using SFT (DeepSeek-AI et al., 2025). However, it is trained on 800 × more reasoning samples. It is an open question whether one can achieve their performance with just 1,000 samples.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;While DeepSeek r1-32B was indeed trained with far more reasoning samples, once the training is complete it doesn’t need to perform extra test-time compute, which is what degrades the user experience in terms of waiting time.&lt;/p&gt;
</content:encoded><category>2025</category><category>papers</category><author>Baochun Li</author></item><item><title>Karpathy’s LLM Deep Dive and MLX Rust Ecosystem Links</title><link>https://baochun.org/2025-02-05/</link><guid isPermaLink="true">https://baochun.org/2025-02-05/</guid><description>Deep Dive into LLMs like ChatGPT — Andrej Karpathy continues his top-notch hours-long education on large language models with a new episode today.</description><pubDate>Wed, 05 Feb 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://youtu.be/7xTGNNLPyMI&quot;&gt;Deep Dive into LLMs like ChatGPT&lt;/a&gt; — &lt;a href=&quot;https://www.youtube.com/@AndrejKarpathy&quot;&gt;Andrej Karpathy&lt;/a&gt; continues his top-notch hours-long education on large language models with a new episode today. I am also keeping an eye on his new venture, &lt;a href=&quot;https://eurekalabs.ai/&quot;&gt;Eureka Labs&lt;/a&gt;, which hopefully will eventually arrive with genuinely helpful educational content on all things machine learning.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/playlist?list=PLgPbN3w-ia_PeT1_c5jiLW3RJdR7853b9&quot;&gt;Deep Learning&lt;/a&gt; — a long list of 26 whiteboard lectures on deep learning, taught by &lt;a href=&quot;https://www.youtube.com/@csprof&quot;&gt;Professor Bryce&lt;/a&gt; of Davidson College.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/oxideai/mlx-rs&quot;&gt;mlx-rs&lt;/a&gt; — Rust bindings for Apple’s MLX machine learning library on Apple silicon. Two of my favourite technologies are Rust and MLX, and this one has a bit of both.&lt;/p&gt;
</content:encoded><category>2025</category><category>frameworks</category><author>Baochun Li</author></item><item><title>GRPO on Apple MLX and Minimal-R1 Scaling Insights</title><link>https://baochun.org/2025-02-03/</link><guid isPermaLink="true">https://baochun.org/2025-02-03/</guid><description>GRPO will soon be added to Apple MLX — The PR now works, using about 32 GB of memory when training Qwen2.5-0.5B.</description><pubDate>Mon, 03 Feb 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://github.com/ml-explore/mlx-examples/pull/1233&quot;&gt;GRPO will soon be added to Apple MLX&lt;/a&gt; — The PR now works, using about 32 GB of memory when training &lt;code&gt;Qwen2.5-0.5B&lt;/code&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/SeungyounShin/minimal-r1&quot;&gt;Minimal-R1&lt;/a&gt; — Another excellent reproduction of DeepSeek R1 with GRPO, using only an 8xH100 server. It addresses the &lt;a href=&quot;https://github.com/huggingface/open-r1/issues/65&quot;&gt;issue of scalability&lt;/a&gt; in Hugging Face’s Open-R1 when generating long completions. What makes it stand out is that it doesn’t depend on TRL, and has its own GRPO implementation. It dedicates one GPU to vLLM generation and one GPU to the reference model.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://x.com/Afinetheorem/status/1886206439582015870&quot;&gt;Kevin Bryan shares his view on OpenAI Deep Research&lt;/a&gt; — Kevin Bryan from the University of Toronto shares his early experiences with OpenAI’s Deep Research. He is extremely upbeat about it, even sharing &lt;a href=&quot;https://kevinbryanecon.com/o3WhatCanWeDo.pdf&quot;&gt;a paper&lt;/a&gt; that Deep Research (a.k.a. the o3 model with web search capabilities) &lt;a href=&quot;https://x.com/Afinetheorem/status/1886245511046271194&quot;&gt;wrote in 15 minutes&lt;/a&gt;, as well as &lt;a href=&quot;https://kevinbryanecon.com/o3InnovationTheory.pdf&quot;&gt;another paper&lt;/a&gt; that is more theoretical.&lt;/p&gt;
&lt;p&gt;Here are some interesting quotes from what Prof. Bryan said:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Nick Pretnar asks:&lt;/em&gt; Can it simultaneously write a paper + model code, estimate/calibrate such model, discern which results are relevant to discuss then present such results in a way humans can understand?&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Kevin Bryan:&lt;/em&gt; That’s beyond current capabilities. But the proof of concept is pretty clear. At this point, it’s by far most useful as a complement — you should be writing your code with Cursor plus frontier models, having AI supplement and check analysis, having AI check proof accuracy, etc.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This is what I would call a &lt;em&gt;human-in-the-loop&lt;/em&gt; approach to academic research. But of course, when abused, it could flood the landscape of academic research papers with mediocre AI-generated content in the near future.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://wtfhappenedin1971.com/&quot;&gt;WTF happened in 1971?&lt;/a&gt; — 1971 is indeed a special year: it was when Elon Musk, Marc Andreessen, Ma Huateng, Liu Yunhao, and I were born.&lt;/p&gt;
</content:encoded><category>2025</category><category>workflows</category><category>frameworks</category><category>papers</category><author>Baochun Li</author></item><item><title>Simple GRPO Implementations and DeepSeek FAQ Highlights</title><link>https://baochun.org/2025-02-02/</link><guid isPermaLink="true">https://baochun.org/2025-02-02/</guid><description>Another simple DeepSeek R1 reproduction — This reproduction of GRPO has one distinct feature: it is exceedingly simple and quite elegant.</description><pubDate>Sun, 02 Feb 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://github.com/emailtovamos/DeepSeekR1Zero/&quot;&gt;Another simple DeepSeek R1 reproduction&lt;/a&gt; — This reproduction of GRPO has one distinct feature: it is exceedingly simple and quite elegant. To run it on the Mac, I only need to make a few minor changes, such as removing the &lt;code&gt;bitsandbytes&lt;/code&gt; quantization, which only works with CUDA. I also used the following &lt;code&gt;pyproject.toml&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-toml&quot;&gt;[project]
name = &amp;quot;grpo&amp;quot;
version = &amp;quot;0.1.0&amp;quot;
description = &amp;quot;DeepSeek R1 reproduction using small models&amp;quot;
readme = &amp;quot;README.md&amp;quot;
requires-python = &amp;quot;&amp;gt;3.11, &amp;lt;=3.12&amp;quot;
dependencies = [
    &amp;quot;torch&amp;quot;,
    &amp;quot;accelerate&amp;quot;,
    &amp;quot;transformers&amp;quot;,
    &amp;quot;datasets&amp;quot;,
    &amp;quot;tqdm&amp;quot;,
    &amp;quot;wandb&amp;quot;
]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and the following command:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sh&quot;&gt;uv run R1ZeroTrain.py
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Out of the several DeepSeek R1 reproductions, this is my favourite. Not only is it simple and free of dependencies on any external RL library (such as &lt;code&gt;TRL&lt;/code&gt; and &lt;code&gt;veRL&lt;/code&gt;), it also shows off some of the nice features of GRPO. Obviously, due to its simplicity, its GRPO implementation is not complete and may need more work. But this is an educational codebase, and the author even posted &lt;a href=&quot;https://www.youtube.com/watch?v=hRSzhn_lDd8&quot;&gt;a YouTube video&lt;/a&gt;, which I will try to find some time to watch.&lt;/p&gt;
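&lt;p&gt;To make that last point concrete: the defining feature of GRPO is that it scores a group of sampled completions for the same prompt and normalizes each reward against the group’s mean and standard deviation, so no separate value model (critic) is needed. A minimal sketch of that group-relative advantage computation (my own illustration, not code from this repo):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import torch

def group_relative_advantages(rewards, eps=1e-4):
    # rewards has shape (num_prompts, group_size): one scalar reward per
    # sampled completion. Each reward is normalized against its own group,
    # which takes the place of the learned critic used in PPO.
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# One prompt, four sampled completions scored by a rule-based reward:
print(group_relative_advantages(torch.tensor([[1.0, 0.0, 0.0, 1.0]])))
&lt;/code&gt;&lt;/pre&gt;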
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://openai.com/index/introducing-deep-research/&quot;&gt;OpenAI releases Deep Research&lt;/a&gt; — ChatGPT Pro users who pay $200 a month get 100 Deep Research questions per month. No coding examples in the introduction.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/baochunli/mini-r1/blob/main/mac/train.py&quot;&gt;DeepSeek R1 reproduction now runs on my Mac&lt;/a&gt; — With a slight modification to &lt;code&gt;train.py&lt;/code&gt; to turn off flash attention 2 (sketched at the end of this entry), I got &lt;a href=&quot;https://github.com/Mohammadjafari80/GSM8K-RLVR&quot;&gt;the DeepSeek R1’s GRPO reproduction on small models with GSM8K&lt;/a&gt; running on my Mac, with the following &lt;code&gt;pyproject.toml&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-toml&quot;&gt;[project]
name = &amp;quot;grpo&amp;quot;
version = &amp;quot;0.1.0&amp;quot;
description = &amp;quot;DeepSeek R1 reproduction using small models&amp;quot;
readme = &amp;quot;README.md&amp;quot;
requires-python = &amp;quot;&amp;gt;=3.12&amp;quot;
dependencies = [
  &amp;quot;peft&amp;gt;=0.14.0&amp;quot;,
  &amp;quot;torch&amp;gt;=2.6.0&amp;quot;,
  &amp;quot;torchvision&amp;gt;=0.21.0&amp;quot;,
  &amp;quot;transformers&amp;gt;=4.48.2&amp;quot;,
  &amp;quot;trl&amp;gt;=0.14.0&amp;quot;,
  &amp;quot;wandb&amp;gt;=0.19.5&amp;quot;,
]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;and the command:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sh&quot;&gt;uv run train.py
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;On my late-2021 M1 Max 64GB MacBook Pro, it runs around 8.6 times slower than the NVIDIA RTX 4090, completing each RL step in about 403 seconds rather than 47 seconds on the 4090. Memory usage peaks at 58 GB.&lt;/p&gt;
&lt;p&gt;Interestingly, on my server with 3 NVIDIA RTX A4500 GPUs (each with 20 GB of CUDA memory), each step takes around 193 seconds, about 4x slower than the 4090. Out of a total of 60 GB CUDA memory, 23 GB is utilized[^1]. At least for this training session, the M1 Max (without using flash attention 2) is only roughly 2x slower than 3 A4500s.&lt;/p&gt;
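&lt;p&gt;As for the modification itself: turning off flash attention 2 usually comes down to the &lt;code&gt;attn_implementation&lt;/code&gt; argument when the model is loaded. A minimal sketch, assuming &lt;code&gt;train.py&lt;/code&gt; loads the model through Hugging Face transformers (the model name here is only a placeholder, and the exact code in the repo may differ):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import torch
from transformers import AutoModelForCausalLM

# Flash attention 2 is CUDA-only, so select it only when CUDA is available
# and fall back to PyTorch&apos;s built-in scaled dot-product attention on the Mac.
attn = &apos;flash_attention_2&apos; if torch.cuda.is_available() else &apos;sdpa&apos;

model = AutoModelForCausalLM.from_pretrained(
    &apos;Qwen/Qwen2.5-0.5B&apos;,  # placeholder model id, for illustration only
    torch_dtype=torch.bfloat16,
    attn_implementation=attn,
)
&lt;/code&gt;&lt;/pre&gt;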
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://stratechery.com/2025/deepseek-faq/&quot;&gt;DeepSeek FAQ&lt;/a&gt; — I have long admired the clarity of Ben Thompson’s writing, and this article on DeepSeek is no exception. It is indeed a long read, but it is worth the time. I enjoyed reading about DeepSeek V2, which very few others mentioned:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Let’s work backwards: what was the V2 model, and why was it important?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The DeepSeek-&lt;code&gt;V2&lt;/code&gt; model introduced two important breakthroughs: DeepSeekMoE and DeepSeekMLA. The “MoE” in DeepSeekMoE refers to “mixture of experts”. Some models, like GPT-3.5, activate the entire model during both training and inference; it turns out, however, that not every part of the model is necessary for the topic at hand. MoE splits the model into multiple “experts” and only activates the ones that are necessary; GPT-4 was a MoE model that was believed to have 16 experts with approximately 110 billion parameters each.&lt;/p&gt;
&lt;p&gt;DeepSeekMoE, as implemented in &lt;code&gt;V2&lt;/code&gt;, introduced important innovations on this concept, including differentiating between more finely-grained specialized experts, and shared experts with more generalized capabilities. Critically, DeepSeekMoE also introduced new approaches to load-balancing and routing during training; traditionally MoE increased communications overhead in training in exchange for efficient inference, but DeepSeek’s approach made training more efficient as well.&lt;/p&gt;
&lt;p&gt;DeepSeekMLA was an even bigger breakthrough. One of the biggest limitations on inference is the sheer amount of memory required: you both need to load the model into memory and also load the entire context window. Context windows are particularly expensive in terms of memory, as every token requires both a key and corresponding value; DeepSeekMLA, or multi-head latent attention, makes it possible to compress the key-value store, dramatically decreasing memory usage during inference.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;And it concludes with an upbeat note on competition:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;China is also a big winner, in ways that I suspect will only become apparent over time. Not only does the country have access to DeepSeek, but I suspect that DeepSeek’s relative success to America’s leading AI labs will result in a further unleashing of Chinese innovation as they realize they can compete.&lt;/p&gt;
&lt;p&gt;That leaves America, and a choice we have to make. We could, for very logical reasons, double down on defensive measures, like massively expanding the chip ban and imposing a permission-based regulatory regime on chips and semiconductor equipment that mirrors the E.U.’s approach to tech; alternatively, we could realize that we have real competition, and actually give ourself permission to compete. Stop wringing our hands, stop campaigning for regulations — indeed, go the other way, and cut out all of the cruft in our companies that has nothing to do with winning. If we choose to compete we can still win, and, if we do, we will have a Chinese company to thank.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;[^1]: This is with vLLM turned off. With it turned on, the server with 3 A4500s always ran out of CUDA memory, for reasons that are, at this point, still unknown to me.&lt;/p&gt;
</content:encoded><category>2025</category><category>workflows</category><category>CLI</category><category>frameworks</category><author>Baochun Li</author></item><item><title>Reproducing DeepSeek R1 GRPO on Consumer Hardware</title><link>https://baochun.org/2025-02-01/</link><guid isPermaLink="true">https://baochun.org/2025-02-01/</guid><description>Fourth attempt on reproducing DeepSeek R1’s GRPO on small models — The third fourth time is the charm. I can successfully run this repo, without activating vLLM.</description><pubDate>Sat, 01 Feb 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://github.com/Mohammadjafari80/GSM8K-RLVR&quot;&gt;Fourth attempt on reproducing DeepSeek R1’s GRPO on small models&lt;/a&gt; — The &lt;s&gt;third&lt;/s&gt; fourth time is the charm. I can successfully run this repo, without activating vLLM (keep &lt;code&gt;vllm=true&lt;/code&gt; uncommented in the source code), on a single NVIDIA RTX 4090 with 24 GB CUDA memory, training the &lt;code&gt;Qwen2.5-Math-1.5B&lt;/code&gt; model with the &lt;code&gt;gsm8k&lt;/code&gt; dataset.&lt;/p&gt;
&lt;p&gt;I used the following &lt;code&gt;pyproject.toml&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-toml&quot;&gt;[project]
name = &amp;quot;grpo&amp;quot;
version = &amp;quot;0.1.0&amp;quot;
description = &amp;quot;DeepSeek R1 reproduction using small models&amp;quot;
readme = &amp;quot;README.md&amp;quot;
requires-python = &amp;quot;&amp;gt;=3.11, &amp;lt;=3.12&amp;quot;
dependencies = [
    &amp;quot;torch&amp;quot;,
    &amp;quot;transformers&amp;quot;,
    &amp;quot;datasets&amp;quot;,
    &amp;quot;peft&amp;quot;,
    &amp;quot;wandb&amp;quot;,
    &amp;quot;vllm&amp;quot;,
    &amp;quot;trl&amp;quot;,
    &amp;quot;flash-attn&amp;quot;,
]

[tool.uv]
no-build-isolation-package = [&amp;quot;flash-attn&amp;quot;]

[tool.uv.sources]
torch = [
  { index = &amp;quot;pytorch-cu121&amp;quot;, marker = &amp;quot;sys_platform == &apos;linux&apos; or sys_platform == &apos;win32&apos;&amp;quot; },
]
torchvision = [
  { index = &amp;quot;pytorch-cu121&amp;quot;, marker = &amp;quot;sys_platform == &apos;linux&apos; or sys_platform == &apos;win32&apos;&amp;quot; },
]

[[tool.uv.index]]
name = &amp;quot;pytorch-cu121&amp;quot;
url = &amp;quot;https://download.pytorch.org/whl/cu121&amp;quot;
explicit = true
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And the following command to run the repo:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-sh&quot;&gt;uv run train.py
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I obtained the following result after around 6 hours and over 450 steps:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/images/archive/February%201%202025.png&quot; alt=&quot;Training result from February 1, 2025&quot;&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://gist.github.com/willccbb/4676755236bb08cab5f4e54a0475d6fb&quot;&gt;Third attempt on reproducing DeepSeek R1’s GRPO on small models&lt;/a&gt; — Will Brown’s GRPO reproduction uses the &lt;code&gt;openai/gsm8k&lt;/code&gt; dataset with 7470 samples, rather than the Countdown Game dataset in the two previous attempts — &lt;a href=&quot;https://github.com/Jiayi-Pan/TinyZero&quot;&gt;TinyZero&lt;/a&gt; and &lt;a href=&quot;https://www.philschmid.de/mini-deepseek-r1&quot;&gt;Mini-R1&lt;/a&gt; — which is much more meaningful. It has been shown by others that even the small &lt;code&gt;Qwen2.5-0.5B&lt;/code&gt; model can be trained from 41.6% to 51% on the &lt;code&gt;gsm8k&lt;/code&gt; test set. I will try to reproduce this result some time, but for now it ran out of CUDA memory on a single NVIDIA RTX A4500 with 20 GB of CUDA memory, even when training the &lt;code&gt;Qwen2.5-0.5B&lt;/code&gt; model.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://digitalspaceport.com/how-to-run-deepseek-r1-671b-fully-locally-on-2000-epyc-rig/&quot;&gt;Home server at $2000 for DeepSeek R1 at 4-bit quantization&lt;/a&gt; — $2000 home server, running the DeepSeek R1 671b model at 4-bit quantization and 3.5-4 tokens per second.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://build.nvidia.com/deepseek-ai/deepseek-r1&quot;&gt;NVIDIA hosts DeepSeek R1&lt;/a&gt; — much slower than &lt;a href=&quot;https://lambda.chat&quot;&gt;Lambda Labs&lt;/a&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://openai.com/index/openai-o3-mini/&quot;&gt;OpenAI o3-mini&lt;/a&gt; — On ChatGPT Plus, the rate limits are 150 messages per day for &lt;code&gt;o3-mini-medium&lt;/code&gt;, and 50 messages per week for &lt;code&gt;o3-mini-high&lt;/code&gt;. The latter is designed to be the strongest model on coding.&lt;/p&gt;
</content:encoded><category>2025</category><category>workflows</category><category>CLI</category><category>frameworks</category><author>Baochun Li</author></item><item><title>Running DeepSeek R1 on Lambda Labs and Notes on Ghostty</title><link>https://baochun.org/2025-01-31/</link><guid isPermaLink="true">https://baochun.org/2025-01-31/</guid><description>Lambda Labs hosts DeepSeek R1 — the dashboard is simple, nice to look at, free to use, and pretty fast when generating tokens. Overall, an excellent user experience.</description><pubDate>Fri, 31 Jan 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://lambda.chat/&quot;&gt;Lambda Labs hosts DeepSeek R1&lt;/a&gt; — the dashboard is simple, nice to look at, free to use, and pretty fast when generating tokens. Overall, an excellent user experience. The DeepSeek Llama 3.3 70B is also available, and it is much faster: reasoning is done in 9 seconds for my question &lt;em&gt;What are the axioms of probability theory?&lt;/em&gt;, as opposed to 69 seconds with DeepSeek R1 671B.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://ghostty.org/&quot;&gt;Ghostty&lt;/a&gt; — &lt;a href=&quot;https://ghostty.org/docs/install/release-notes/1-1-0&quot;&gt;version 1.1.0&lt;/a&gt; is available with lots of updates and bug fixes. The best terminal emulator becomes even better.&lt;/p&gt;
</content:encoded><category>2025</category><author>Baochun Li</author></item><item><title>Fine-Tuning Open LLMs in 2025 with Hugging Face and Mini-R1</title><link>https://baochun.org/2025-01-30/</link><guid isPermaLink="true">https://baochun.org/2025-01-30/</guid><description>How to fine-tune open LLMs in 2025 with Hugging Face — Philipp Schmid a Technical Lead at Hugging Face, posted this article on fine-tuning LLMs using Hugging Face.</description><pubDate>Thu, 30 Jan 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://www.philschmid.de/fine-tune-llms-in-2025&quot;&gt;How to fine-tune open LLMs in 2025 with Hugging Face&lt;/a&gt; — &lt;a href=&quot;https://www.philschmid.de/&quot;&gt;Philipp Schmid&lt;/a&gt;, a Technical Lead at Hugging Face, posted this article on fine-tuning LLMs using Hugging Face tools, without using the &lt;a href=&quot;https://unsloth.ai/&quot;&gt;Unsloth&lt;/a&gt; API. I find it comprehensive and will need to give it a try myself.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://www.philschmid.de/mini-deepseek-r1&quot;&gt;Mini-R1&lt;/a&gt; — &lt;a href=&quot;https://www.philschmid.de/&quot;&gt;Philipp Schmid&lt;/a&gt; also posted this interesting reproduction of DeepSeek R1’s RL training. Similar to &lt;a href=&quot;https://github.com/Jiayi-Pan/TinyZero&quot;&gt;TinyZero&lt;/a&gt;, it used the Countdown Game as the task, but the article is much better written.&lt;/p&gt;
&lt;p&gt;Mini-R1 used Hugging Face’s own &lt;a href=&quot;https://huggingface.co/docs/trl/index&quot;&gt;TRL&lt;/a&gt;, designed to train transformer language models with RL in the post-training phase, which Hugging Face introduced in &lt;a href=&quot;https://github.com/huggingface/smol-course/tree/main/2_preference_alignment&quot;&gt;its smol course&lt;/a&gt;. To support multi-GPU training, it used &lt;a href=&quot;https://github.com/microsoft/DeepSpeed&quot;&gt;DeepSpeed&lt;/a&gt;. In contrast, TinyZero used ByteDance’s &lt;a href=&quot;https://github.com/volcengine/verl&quot;&gt;veRL&lt;/a&gt; for both RL and distributed training, which doesn’t have either TRL or DeepSpeed in its &lt;a href=&quot;https://github.com/volcengine/verl/blob/main/pyproject.toml&quot;&gt;dependencies&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;veRL is based on &lt;a href=&quot;https://arxiv.org/abs/2409.19256v2&quot;&gt;HybridFlow&lt;/a&gt;, a University of Hong Kong/ByteDance paper published in EuroSys 2025, co-authored by Prof. &lt;a href=&quot;https://i.cs.hku.hk/~cwu/index.html&quot;&gt;Chuan Wu&lt;/a&gt; from the University of Hong Kong. I will allocate some time to study this paper in greater detail.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/marketplace/models/&quot;&gt;Microsoft added DeepSeek R1 to GitHub Models&lt;/a&gt; — I tried it with a simple question: not only was the inference speed astonishingly low, but errors also occurred before the answer was complete. It is unusable at this point.&lt;/p&gt;
</content:encoded><category>2025</category><category>workflows</category><category>frameworks</category><author>Baochun Li</author></item><item><title>DeepSeek, Export Controls, and Open-Weight AI Debates</title><link>https://baochun.org/2025-01-29/</link><guid isPermaLink="true">https://baochun.org/2025-01-29/</guid><description>On DeepSeek and Export Controls — Dario Amodei, Anthropic&apos;s CEO, wrote a fairly long editorial on DeepSeek.</description><pubDate>Wed, 29 Jan 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://darioamodei.com/on-deepseek-and-export-controls&quot;&gt;On DeepSeek and Export Controls&lt;/a&gt; — Dario Amodei, Anthropic’s CEO, wrote a fairly long editorial on DeepSeek. However, it doesn’t mention at all the fact that DeepSeek’s models are open-weight models under a permissive MIT license, while Anthropic’s and OpenAI’s models remain closed-weight, with no transparency on the technologies used for either training or inference. At one point, Amodei mentioned that both DeepSeek and OpenAI o1 used RL, and used this to imply that DeepSeek’s use of RL to train R1-Zero is not so innovative. But we don’t know &lt;em&gt;how&lt;/em&gt; OpenAI used RL to train o1, except that o1 &lt;em&gt;“uses a chain of thought when attempting to solve a problem,”&lt;/em&gt; and that reinforcement learning has been used to train it[^1]. It could be the case that DeepSeek’s use of RL for train-time compute is very different from o1’s, and the fact that its &lt;a href=&quot;https://arxiv.org/pdf/2402.03300&quot;&gt;affiliated technical report&lt;/a&gt; goes into enough technical detail on GRPO to make it &lt;a href=&quot;https://github.com/Mohammadjafari80/GSM8K-RLVR&quot;&gt;fully reproducible&lt;/a&gt; is much more noteworthy.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=bAWV_yrqx4w&quot;&gt;DeepSeekMath Paper Explained&lt;/a&gt; — Yannic Kilcher gave this one-hour explanation of the &lt;a href=&quot;https://arxiv.org/pdf/2402.03300&quot;&gt;DeepSeekMath paper&lt;/a&gt;. I watched the first five minutes, and from minute 30 onward for the coverage of GRPO. His explanations of GRPO are top-notch. The final five minutes, on Section 5.2.2 (“Why RL Works”), are insightful and worth tuning into.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://x.com/carrigmat/status/1884244369907278106&quot;&gt;Complete hardware for the full DeepSeek R1 at Q8 quantization, at $6000&lt;/a&gt; — The fact that this CPU-only server can generate at 6-8 tokens per second — the same as human reading speed — shows the very substantial advantage of Mixture-of-Experts (MoE) models when running on CPU-only home servers, as compared to dense models such as Llama 3.1 405B. Assembling such a server is non-trivial and not for the faint of heart, but it has certainly been proven possible.&lt;/p&gt;
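&lt;p&gt;To see why MoE helps so much here, a rough back-of-the-envelope sketch (my own numbers, not from the linked post): with Q8 weights, each generated token only needs the &lt;em&gt;active&lt;/em&gt; parameters streamed from RAM, and DeepSeek R1 activates roughly 37B of its 671B parameters per token, while a dense model must stream all of its weights.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Back-of-the-envelope: tokens/second is roughly memory bandwidth divided by
# the bytes of weights read per token (Q8 is about 1 byte per parameter).
# The bandwidth figure is an assumption for a dual-socket EPYC build.
BANDWIDTH_BYTES_S = 300e9      # assumed effective RAM bandwidth (300 GB/s)
ACTIVE_PARAMS_R1 = 37e9        # DeepSeek R1 (MoE): ~37B active of 671B total
PARAMS_LLAMA_405B = 405e9      # Llama 3.1 405B (dense): all parameters active

for name, params in [(&apos;DeepSeek R1 (MoE)&apos;, ACTIVE_PARAMS_R1),
                     (&apos;Llama 3.1 405B (dense)&apos;, PARAMS_LLAMA_405B)]:
    print(f&apos;{name}: ~{BANDWIDTH_BYTES_S / params:.1f} tokens/s&apos;)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Under these assumptions the MoE model lands at roughly 8 tokens per second, right in line with the 6-8 tokens per second reported for the build, while the dense 405B model would manage less than one.&lt;/p&gt;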
&lt;p&gt;[^1]: &lt;a href=&quot;https://openai.com/index/learning-to-reason-with-llms/&quot;&gt;Learning to reason with LLMs&lt;/a&gt;, OpenAI, September 12, 2024.&lt;/p&gt;
</content:encoded><category>2025</category><category>papers</category><author>Baochun Li</author></item><item><title>The Illustrated DeepSeek-R1: A Clear Visual Walkthrough</title><link>https://baochun.org/2025-01-28/</link><guid isPermaLink="true">https://baochun.org/2025-01-28/</guid><description>The Illustrated DeepSeek-R1 — Jay Alammar, the author of O&apos;Reilly’s Hands-On Large Language Models, wrote a short piece on explaining DeepSeek R1 at a high level.</description><pubDate>Tue, 28 Jan 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://newsletter.languagemodels.co/p/the-illustrated-deepseek-r1&quot;&gt;The Illustrated DeepSeek-R1&lt;/a&gt; — Jay Alammar, the author of O’Reilly’s &lt;a href=&quot;https://www.llm-book.com/&quot;&gt;Hands-On Large Language Models&lt;/a&gt;, wrote a short piece on explaining DeepSeek R1 at a high level. I found it easy to read and the illustrations are pleasing to the eye.&lt;/p&gt;
</content:encoded><category>2025</category><author>Baochun Li</author></item><item><title>Qwen 2.5 7B 1M Local Testing and RL Survey Notes</title><link>https://baochun.org/2025-01-27/</link><guid isPermaLink="true">https://baochun.org/2025-01-27/</guid><description>Qwen 2.5 7B 1M — I have just tried Qwen&apos;s latest local model, the 7B 1M, locally in LM Studio 0.3.8 (Build 4). I loaded an entire PhD thesis into the model, and LM Studio gleefully chose inject-full-content as its content injection strategy.</description><pubDate>Mon, 27 Jan 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://qwenlm.github.io/blog/qwen2.5-1m/&quot;&gt;Qwen 2.5 7B 1M&lt;/a&gt; — I have just tried Qwen’s latest local model, the 7B 1M, locally in &lt;a href=&quot;https://lmstudio.ai/&quot;&gt;LM Studio&lt;/a&gt; 0.3.8 (Build 4). I loaded an entire PhD thesis into the model, and LM Studio gleefully chose &lt;code&gt;inject-full-content&lt;/code&gt; as its content injection strategy, rather than &lt;code&gt;retrieval&lt;/code&gt;, which uses — the notoriously useless, in my humble opinion — RAG. This was not feasible before using a previous model, such as the DeepSeek R1 Distill Qwen 7B, with a context length of 128K.&lt;/p&gt;
&lt;p&gt;It took 38 minutes to inject the PhD thesis (166 pages), with the fans in my MacBook Pro (M1 Max, 64 GB of memory) blowing at full speed. Once the content is injected, the model generates 2 output tokens per second, and the next question needs only 21 seconds to the first token. So asking this model to read an entire PhD thesis works on a local Mac, but one would have to be a bit more patient. LM Studio reports that 20 GB of RAM is used after the model is loaded, with the context length set to 256K.&lt;/p&gt;
&lt;p&gt;Overall, this is indeed a very useful model for local use.&lt;/p&gt;
&lt;p&gt;P.S. Of course, if data privacy is not a concern, one can also use the 14B 1M model available on &lt;a href=&quot;https://chat.qwenlm.ai/&quot;&gt;Qwen chat&lt;/a&gt;. I tried it and it takes about 2 minutes to inject the entire PhD thesis and answer the first question. It’s interesting to observe that the time to first token for the second question is not much faster, taking about a minute and a half. The quality of the summaries is quite solid, but the language is not much easier to understand than the original thesis. This implies that if the original document is not well written, the summaries will not be too helpful either.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://arxiv.org/pdf/2501.09686&quot;&gt;Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models&lt;/a&gt; — a recently updated (v3) survey of reinforced reasoning with LLMs from Tsinghua University. After a quick read, I felt it was already somewhat out of date, despite having been last updated only a few days ago. The &lt;a href=&quot;https://arxiv.org/abs/2501.12948&quot;&gt;DeepSeek R1 technical report&lt;/a&gt; has not been cited yet, for example. The paper spends quite a bit of space on the Process Reward Model (PRM):&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Process Reward Model (PRM) based Reinforcement Learning represents a significant advancement in LLM reasoning, emphasizing the evaluation of intermediate steps rather than solely focusing on end-state outcomes.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;While discussing PRMs, it did include a brief mention of GRPO, with a citation to the &lt;a href=&quot;https://arxiv.org/abs/2402.03300&quot;&gt;DeepSeekMath&lt;/a&gt; paper that originally introduced it back in February 2024. The paper also spends considerable space discussing the use of Monte Carlo Tree Search (MCTS).&lt;/p&gt;
&lt;p&gt;However, the &lt;a href=&quot;https://arxiv.org/abs/2501.12948&quot;&gt;DeepSeek R1 technical report&lt;/a&gt; found both PRM and MCTS to be unsuccessful, at least in DeepSeek’s own attempts:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;PRM is a reasonable method to guide the model toward better approaches for solving reasoning tasks (Lightman et al., 2023; Uesato et al., 2022; Wang et al., 2023). However, in practice, PRM has three main limitations that may hinder its ultimate success. First, it is challenging to explicitly define a fine-grain step in general reasoning. Second, determining whether the current intermediate step is correct is a challenging task. Automated annotation using models may not yield satisfactory results, while manual annotation is not conducive to scaling up. Third, once a model-based PRM is introduced, it inevitably leads to reward hacking (Gao et al., 2022), and retraining the reward model needs additional training resources and it complicates the whole training pipeline. In conclusion, while PRM demonstrates a good ability to rerank the top-N responses generated by the model or assist in guided search (Snell et al., 2024), its advantages are limited compared to the additional computational overhead it introduces during the large-scale reinforcement learning process in our experiments.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;Inspired by AlphaGo (Silver et al., 2017b) and AlphaZero (Silver et al., 2017a), we explored using Monte Carlo Tree Search (MCTS) to enhance test-time compute scalability. This approach involves breaking answers into smaller parts to allow the model to explore the solution space systematically. To facilitate this, we prompt the model to generate multiple tags that correspond to specific reasoning steps necessary for the search. For training, we first use collected prompts to find answers via MCTS guided by a pre-trained value model. Subsequently, we use the resulting question-answer pairs to train both the actor model and the value model, iteratively refining the process.&lt;/p&gt;
&lt;p&gt;However, this approach encounters several challenges when scaling up the training. First, unlike chess, where the search space is relatively well-defined, token generation presents an exponentially larger search space. To address this, we set a maximum extension limit for each node, but this can lead to the model getting stuck in local optima. Second, the value model directly influences the quality of generation since it guides each step of the search process. Training a fine-grained value model is inherently difficult, which makes it challenging for the model to iteratively improve. While AlphaGo’s core success relied on training a value model to progressively enhance its performance, this principle proves difficult to replicate in our setup due to the complexities of token generation.&lt;/p&gt;
&lt;/blockquote&gt;
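&lt;p&gt;To make the MCTS passage a bit more concrete, here is a heavily simplified sketch of tree search over reasoning steps with a per-node extension limit, the workaround the report mentions for the exploding token search space. This is my own toy illustration with a stubbed value model and stubbed step proposals, not DeepSeek’s code:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import math, random

MAX_EXTENSIONS = 4      # cap on how many children any single node may spawn

class Node:
    def __init__(self, step, parent=None):
        self.step, self.parent, self.children = step, parent, []
        self.visits, self.value_sum = 0, 0.0

def value_model(step):
    return random.random()          # stub for a trained value model

def propose_step(node):
    return f"{node.step}.{len(node.children)}"    # stub for sampling the next reasoning step

def select(node):
    # UCB-style choice among the children of a fully-extended node
    def ucb(c):
        exploit = c.value_sum / (c.visits + 1e-9)
        explore = math.sqrt(2 * math.log(node.visits + 1) / (c.visits + 1e-9))
        return exploit + explore
    return max(node.children, key=ucb)

def search(root, iterations=200):
    for _ in range(iterations):
        node = root
        # descend only through nodes that have already hit their extension limit
        while node.children and len(node.children) &amp;gt;= MAX_EXTENSIONS:
            node = select(node)
        if len(node.children) &amp;lt; MAX_EXTENSIONS:
            child = Node(propose_step(node), parent=node)
            node.children.append(child)
            node = child
        reward = value_model(node.step)
        while node is not None:     # back-propagate the value estimate to the root
            node.visits += 1
            node.value_sum += reward
            node = node.parent
    return max(root.children, key=lambda c: c.visits)

best = search(Node("root"))
print(best.step, best.visits)
&lt;/code&gt;&lt;/pre&gt;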
</content:encoded><category>2025</category><category>papers</category><category>frameworks</category><author>Baochun Li</author></item><item><title>Nvidia, DeepSeek, and RL Reasoning: Long-Form Analysis Notes</title><link>https://baochun.org/2025-01-26/</link><guid isPermaLink="true">https://baochun.org/2025-01-26/</guid><description>Although it’s quite long, The Short Case for Nvidia Stock is a fascinating read. Also, agents are not happening yet.</description><pubDate>Sun, 26 Jan 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://youtubetranscriptoptimizer.com/blog/05_the_short_case_for_nvda&quot;&gt;The Short Case for Nvidia Stock&lt;/a&gt; — I spent less than an hour reading a pretty substantial portion of this article. It’s so good that I will need to allocate some time to read it again. The entire article, and especially the DeepSeek portion of it, is highly recommended, even if one is not interested in investing. It’s a detailed outlook for the entire AI industry.&lt;/p&gt;
&lt;p&gt;Reading it a second time, I noticed that the article covers tech I have been following quite closely as well:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It mentioned how &lt;a href=&quot;https://cerebras.ai/blog/100x-defect-tolerance-how-cerebras-solved-the-yield-problem&quot;&gt;Cerebras&lt;/a&gt; solved its yield problem; I had previously read about its &lt;a href=&quot;https://cerebras.ai/blog/cepo&quot;&gt;CePO&lt;/a&gt; test-time compute strategy;&lt;/li&gt;
&lt;li&gt;It mentioned &lt;a href=&quot;https://groq.com/&quot;&gt;Groq&lt;/a&gt;, and I have tried its excellent and speedy inference service with a free account;&lt;/li&gt;
&lt;li&gt;It mentioned George Hotz’s Tiny Corp. and its &lt;a href=&quot;https://tinygrad.org/&quot;&gt;tinygrad&lt;/a&gt;, which I have been closely following on &lt;a href=&quot;https://x.com/__tinygrad__&quot;&gt;X&lt;/a&gt;. Back in the day, George Hotz was famous for jailbreaking the original iPhone as a teenager;&lt;/li&gt;
&lt;li&gt;It mentioned &lt;a href=&quot;https://ml-explore.github.io/mlx/build/html/index.html&quot;&gt;MLX&lt;/a&gt;, which, as the article said, provides a PyTorch-like API that can run efficiently on Apple Silicon (a small sketch of this API follows after this list), showing how abstraction layers can enable AI workloads to run on completely different architectures. MLX is particularly interesting as it supports distributed computation — both training and inference — across multiple Macs. And its main contributor, &lt;a href=&quot;https://x.com/awnihannun&quot;&gt;Awni Hannun&lt;/a&gt;, mentioned today that &lt;a href=&quot;https://x.com/awnihannun/status/1883276535643455790&quot;&gt;DeepSeek R1 can run with 4-bit quantization across three 192 GB M2 Ultra Mac Studios&lt;/a&gt; at 12 tokens per second, requiring a minimum of 450 GB GPU memory;&lt;/li&gt;
&lt;li&gt;And of course, it covered DeepSeek R1 in sufficient technical detail.&lt;/li&gt;
&lt;/ul&gt;
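&lt;p&gt;Since MLX came up, here is a toy example of what its PyTorch-like, lazily evaluated API looks like: a few lines of gradient descent on a least-squares problem. This is my own illustration of the programming model only, and has nothing to do with the distributed DeepSeek R1 setup mentioned above:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import mlx.core as mx

def loss(w, x, y):
    # least-squares loss; operations are recorded lazily
    return mx.mean((x @ w - y) ** 2)

x = mx.random.normal((256, 16))
w_true = mx.random.normal((16,))
y = x @ w_true

w = mx.zeros((16,))
grad_fn = mx.grad(loss)      # differentiates with respect to the first argument
for _ in range(500):
    w = w - 0.1 * grad_fn(w, x, y)

mx.eval(w)                   # force the lazy computation graph to evaluate
print(mx.mean(mx.abs(w - w_true)))
&lt;/code&gt;&lt;/pre&gt;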
&lt;p&gt;Wow, what a gem as a long-form read!&lt;/p&gt;
&lt;p&gt;P.S.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Chamath Palihapitiya &lt;a href=&quot;https://x.com/chamath/status/1883579259769462819?s=46&amp;amp;t=A2DgT1wxhfYAPII40irQMw&quot;&gt;also thought the article was very good&lt;/a&gt;:&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;With R1, DeepSeek essentially cracked one of the holy grails of AI: getting models to reason step-by-step without relying on massive supervised datasets. Their DeepSeek-R1-Zero experiment showed something remarkable: using pure reinforcement learning with carefully crafted reward functions, they managed to get models to develop sophisticated reasoning capabilities completely autonomously. This wasn’t just about solving problems— the model organically learned to generate long chains of thought, self-verify its work, and allocate more computation time to harder problems.&lt;/p&gt;
&lt;p&gt;The technical breakthrough here was their novel approach to reward modeling. Rather than using complex neural reward models that can lead to “reward hacking” (where the model finds bogus ways to boost their rewards that don’t actually lead to better real-world model performance), they developed a clever rule-based system that combines accuracy rewards (verifying final answers) with format rewards (encouraging structured thinking). This simpler approach turned out to be more robust and scalable than the process-based reward models that others have tried.&lt;/p&gt;
&lt;/blockquote&gt;
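&lt;p&gt;The “clever rule-based system” in that quote is easy to picture. Here is a toy sketch of a reward that combines an accuracy check on the final answer with a format check for structured reasoning; the tag names and the weighting are my own made-up illustration, not DeepSeek’s actual implementation:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import re

# Toy rule-based reward (illustration only): an accuracy term that verifies
# the final answer plus a format term that checks for a reasoning block.
THINK_ANSWER = re.compile(
    r"&amp;lt;think&amp;gt;.+?&amp;lt;/think&amp;gt;\s*&amp;lt;answer&amp;gt;(.+?)&amp;lt;/answer&amp;gt;", re.DOTALL)

def rule_based_reward(response, reference_answer):
    match = THINK_ANSWER.search(response)
    format_reward = 1.0 if match else 0.0
    predicted = match.group(1).strip() if match else ""
    accuracy_reward = 1.0 if predicted == reference_answer.strip() else 0.0
    return accuracy_reward + 0.5 * format_reward

print(rule_based_reward("&amp;lt;think&amp;gt;2 + 2 = 4&amp;lt;/think&amp;gt;&amp;lt;answer&amp;gt;4&amp;lt;/answer&amp;gt;", "4"))   # 1.5
&lt;/code&gt;&lt;/pre&gt;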
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The article turned out to be extraordinarily prescient: Nvidia stock was &lt;a href=&quot;https://www.wsj.com/livecoverage/stock-market-today-dow-sp500-nasdaq-live-01-27-2025/card/nvidia-stock-is-down-more-than-10-here-s-why--sZmsM8tvQFTS3iUBASHa&quot;&gt;down by over 14%&lt;/a&gt; around 11 a.m. the next morning, after this article was written, and the tech-heavy Nasdaq Composite fell 2.5%.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://simonwillison.net/2025/Jan/27/deepseek-nvidia/&quot;&gt;Simon Willison&lt;/a&gt; likes it too, calling it &lt;em&gt;“Long, excellent piece by Jeffrey Emanuel capturing the current state of the AI/LLM industry.”&lt;/em&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;The real joy of this article is the way it describes technical details of modern LLMs in a relatively accessible manner. I love this description of the inference-scaling tricks used by O1 and R1, compared to traditional transformers.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://hkust-nlp.notion.site/simplerl-reason&quot;&gt;7B Model and 8K Examples: Emerging Reasoning with Reinforcement Learning is Both Effective and Efficient&lt;/a&gt; — Interesting. DeepSeek R1’s RL training techniques can be successfully applied to smaller models as well, at least for simple math datasets.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://trite-song-d6a.notion.site/Deepseek-R1-for-Everyone-1860af77bef3806c9db5e5c2a256577d&quot;&gt;DeepSeek R1 for Everyone&lt;/a&gt; and &lt;a href=&quot;https://lunar-joke-35b.notion.site/Deepseek-v3-101-169ba4b6a3fa8090a7aacaee1a1cefaa?pvs=24&quot;&gt;DeepSeek V3 101&lt;/a&gt; — From a brief skim, these look promising as accessible explanations of some of the technical details of DeepSeek R1 and V3.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://simonwillison.net/2024/Dec/31/llms-in-2024/#-agents-still-haven-t-really-happened-yet&quot;&gt;“Agents” still haven’t really happened yet&lt;/a&gt; — “If you tell me that you are building “agents”, you’ve conveyed almost no information to me at all. Without reading your mind I have no way of telling which of the dozens of possible definitions you are talking about.”&lt;/p&gt;
</content:encoded><category>2025</category><category>papers</category><author>Baochun Li</author></item><item><title>Open-R1 and TinyZero: Early DeepSeek R1 Reproductions</title><link>https://baochun.org/2025-01-25/</link><guid isPermaLink="true">https://baochun.org/2025-01-25/</guid><description>Open-R1 — Hugging Face started to reproduce DeepSeek R1 in the open, and discussed the R1 technical report in a recorded YouTube video.</description><pubDate>Sat, 25 Jan 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://github.com/huggingface/open-r1&quot;&gt;Open-R1&lt;/a&gt; — Hugging Face started to reproduce DeepSeek R1 in the open, and discussed the &lt;a href=&quot;https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf&quot;&gt;R1 technical report&lt;/a&gt; in a recorded &lt;a href=&quot;https://www.youtube.com/watch?v=1xDVbu-WaFo&quot;&gt;YouTube video&lt;/a&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/Jiayi-Pan/TinyZero&quot;&gt;TinyZero&lt;/a&gt; — a reproduction of DeepSeek R1 Zero in countdown and multiplication tasks.&lt;/p&gt;
</content:encoded><category>2025</category><category>papers</category><author>Baochun Li</author></item><item><title>What I’ve Been Reading</title><link>https://baochun.org/2025-01-24/</link><guid isPermaLink="true">https://baochun.org/2025-01-24/</guid><description>This website is a space for storing — and sharing, if anyone cares about these — some of the websites, code repositories, and tweets that I have read.</description><pubDate>Fri, 24 Jan 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;This website is a space for storing — and sharing, if anyone cares about these — some of the websites, code repositories, and tweets that I have read. They are mostly about technology, but not necessarily tech that is currently in the spotlight. They are stored here because I thought they are worthy of preserving for a longer period of time.&lt;/p&gt;
&lt;p&gt;I have two relatively quick ways of preserving quality content. I can bookmark a link in a web browser or app; it is painless, but those bookmarks are too easy to misplace or forget about, and pretty difficult to search for after accumulating a larger number over time. For longevity, I can also place the link in a personal note (such as iOS Notes or &lt;a href=&quot;https://obsidian.md/&quot;&gt;Obsidian&lt;/a&gt;), which takes more effort and time, and thus the motivation for doing so in the long run is a bit questionable: why would anyone diligently copy and paste links to a personal note every time something interesting comes up?&lt;/p&gt;
&lt;p&gt;The beauty of sharing links publicly, besides the nature of sharing itself, is to add a slice of motivation to the cocktail: it &lt;em&gt;motivates&lt;/em&gt; me to do the work of copying and pasting. It also motivates me to add a bit of commentary, which records what I was thinking while reading the linked content. Some call such a publicly shared website of links and commentary a &lt;em&gt;digital garden&lt;/em&gt; or a &lt;em&gt;microblog&lt;/em&gt;, terms that I don’t quite like. &lt;a href=&quot;https://simonwillison.net/2024/Dec/22/link-blog/&quot;&gt;Simon Willison&lt;/a&gt; and &lt;a href=&quot;https://daringfireball.net/linked/2025/01/02/willisons-approach-to-running-a-link-blog&quot;&gt;John Gruber&lt;/a&gt; called such a style a &lt;em&gt;link blog&lt;/em&gt;. I will simply call it “What I’ve been reading.”&lt;/p&gt;
</content:encoded><category>2025</category><author>Baochun Li</author></item></channel></rss>