Autoresearch
Two weeks after Andrej Karpathy released Autoresearch, here are some noteworthy projects to keep an eye on.
I discovered a better way of converting PDFs to Markdown on Apple silicon, with all mathematical formulas converted to LaTeX.
Running Qwen 3.5 27B Q4_K_M on an RTX 4090 with llama-server and Hermes.
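For reference, a minimal sketch of querying such a llama-server instance from Python through its OpenAI-compatible endpoint; the port and model name below are placeholders, not the exact setup from the post:

```python
# Query a local llama-server instance through its OpenAI-compatible API.
# Assumes llama-server is already running on localhost:8080 with a Qwen
# GGUF loaded; llama-server typically ignores the model field and uses
# whatever model it has loaded.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # llama-server's default port
    api_key="not-needed",  # no API key required by default
)

response = client.chat.completions.create(
    model="qwen-local",  # placeholder name
    messages=[{"role": "user", "content": "Summarize llama.cpp in one sentence."}],
    temperature=0.7,
)
print(response.choices[0].message.content)
```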
A reflection on a New York Times Magazine story about AI coding tools, software labor, and what future programmers may stop learning by hand.
I started using Linear to track tasks and their dependencies when implementing new features with multiple agents in Codex.
A short answer to a student’s question about AI agents, hardware progress, and why software creativity still matters.
A short note on why Pages’ older, more colorful chrome still feels preferable, and why staying on macOS 14.8 is a useful guardrail.
I rechecked the Days codebase with GPT 5.4 xhigh and GPT 5.4 Pro, and the pair of models found serious issues in the one aspect of the current implementation that I had asked them to focus on.
A few weeks ago, OpenAI posted a blog post on harness engineering. Yesterday, it also released a component of its workflow, called Symphony, as open source.
Prof. Donald Knuth, at age 88, said: “Shock! Shock! I learned yesterday that an open problem I’d been working on for several weeks had just been solved by Claude Opus 4.6 — Anthropic’s hybrid reasoning model that had been released three weeks earlier! It seems that I’ll have to revise my opinions about ‘generative AI’ one of these days.”
GPU-accelerated PDF-to-Markdown workflow with Marker that produces high-quality output quickly on an RTX 4090.
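As a starting point, a hedged sketch of driving Marker from Python via its marker_single CLI entry point; the exact flags vary across marker-pdf versions, so treat the options below as assumptions and check `marker_single --help` for your install:

```python
# Convert a PDF to Markdown with Marker's CLI from Python.
# Assumes the marker-pdf package is installed (pip install marker-pdf),
# which provides the marker_single entry point. The input file name and
# the --output_dir flag are assumptions; flags differ between versions.
import subprocess

subprocess.run(
    ["marker_single", "paper.pdf", "--output_dir", "out"],
    check=True,  # raise if the conversion fails
)
```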
An email triage system for Fastmail that auto-sorts messages by priority and drafts replies for high-priority emails.
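The fetch side of such a system can be sketched against Fastmail's JMAP API. This is a minimal, illustrative version that only lists unread message ids; the priority scoring and reply drafting are omitted, and FASTMAIL_TOKEN is assumed to hold an API token:

```python
# Fetch recent unread messages from Fastmail over JMAP, the first step
# of a triage pipeline. The scoring and drafting steps are omitted.
import os
import requests

TOKEN = os.environ["FASTMAIL_TOKEN"]
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

# The session object tells us the API endpoint and our mail account id.
session = requests.get(
    "https://api.fastmail.com/jmap/session", headers=HEADERS
).json()
account_id = session["primaryAccounts"]["urn:ietf:params:jmap:mail"]

# Ask for the ids of unread emails, newest first.
body = {
    "using": ["urn:ietf:params:jmap:core", "urn:ietf:params:jmap:mail"],
    "methodCalls": [
        ["Email/query", {
            "accountId": account_id,
            "filter": {"notKeyword": "$seen"},
            "sort": [{"property": "receivedAt", "isAscending": False}],
            "limit": 20,
        }, "0"],
    ],
}
result = requests.post(session["apiUrl"], headers=HEADERS, json=body).json()
print(result["methodResponses"][0][1]["ids"])
```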
I tried Simon Willison’s prompt to build a linear walkthrough of Nextmini. Codex unsurprisingly launched several subagents as scouts to explore different parts of the codebase.
I have read Simon Willison’s Agentic Engineering Patterns, and red/green TDD, which I had not previously heard of, seems so effective that I must give it a try.
I wrote my own extension for the Pi coding agent to allow me to start multiple agents that collaborate with one another by sending and receiving messages.
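Pi’s actual extension API is not reproduced here, but the core of the extension is just a message bus between agent tasks. A minimal sketch of the idea, with entirely hypothetical names:

```python
# A minimal message bus that lets several agent tasks exchange messages.
# This sketches the idea behind the extension; it does not use Pi's real
# extension API, and all names here are hypothetical.
import asyncio

class Mailroom:
    """Routes messages between named agents via per-agent queues."""
    def __init__(self):
        self.inboxes: dict[str, asyncio.Queue] = {}

    def register(self, name: str) -> None:
        self.inboxes[name] = asyncio.Queue()

    async def send(self, to: str, sender: str, text: str) -> None:
        await self.inboxes[to].put((sender, text))

async def agent(name: str, room: Mailroom, peer: str, kickoff: bool) -> None:
    inbox = room.inboxes[name]
    if kickoff:
        await room.send(peer, name, "plan drafted, please review")
    sender, text = await inbox.get()
    print(f"{name} received from {sender}: {text}")
    if not kickoff:
        await room.send(peer, name, "review done, looks good")

async def main() -> None:
    room = Mailroom()
    room.register("planner")
    room.register("reviewer")
    await asyncio.gather(
        agent("planner", room, "reviewer", kickoff=True),
        agent("reviewer", room, "planner", kickoff=False),
    )

asyncio.run(main())
```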
More of us are replacing Netflix with Codex and spinning up a new agentic session before falling asleep.
I have been looking for a way to get Codex to draw figures reasonably well. I think I finally found a way.
It is surprisingly straightforward to migrate a website from Next.js to TanStack Start.
A handy AGENTS.md addition that makes sure Codex writes better plans and uses subagents proactively.
The iOS Codex workflow has been streamlined again: now with the Moshi iOS app to SSH into my computer over the Tailscale network. Also, GPT 5.3 Codex Spark is super fast.
Electric’s Configurancy argues that when code is cheap, specs and oracle testing matter more than unit tests alone. And something big is happening.
A quick iOS Codex access tip with Agentboard, plus a strong Rust-over-Python essay for agentic programming.
I redesigned my personal website, featuring not only a simple, minimalist design, but also a streamlined process of writing and publishing new entries via CLI tools.
tiny-llm is exactly what I wished for. It also contains links to two existing PyTorch-related machine learning courses from Carnegie Mellon University.
Arc — My new browser of choice. I love the fact that bookmarks are organized on the side panel, rather than clustered at the top of the window.
Eleventy appears to be a pretty simple static website generator that is worth exploring. A competitor to Hugo.
How I use LLMs by Andrej Karpathy — A must watch.
Panasonic S1R II — With the Sigma 28-105 f/2.8, this would be my dream camera. It is just slightly heavier than my Panasonic S5 IIx (1.57 lb vs. 1.45 lb body only).
The Ultra-Scale Playbook: Training LLMs on GPU Clusters — Amazing, and finally we have a 100-page open-source online book on how models are trained with multiple GPUs.
Crafted — What a great looking set of open-source, hand-crafted UI templates based on shadcn/ui!
Better Auth — A new authentication library that is feature-complete and easy to use, in contrast to Lucia, which advocates a copy-and-paste approach.
Andriy Burkov’s minimalist implementation of GRPO from scratch — Rather than using a library such as Hugging Face’s TRL.
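The core of any from-scratch GRPO implementation is the group-relative advantage: sample a group of completions per prompt, then normalize each completion’s reward against its group. A minimal sketch of that formula (the standard DeepSeekMath formulation, not Burkov’s exact code):

```python
# Group-relative advantages as used in GRPO: each completion's reward is
# normalized against the other completions sampled for the same prompt.
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """rewards: (num_prompts, group_size) scalar rewards per completion."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)  # eps guards against zero variance

# Example: 2 prompts, a group of 4 sampled completions each.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 0.0, 1.0]])
print(grpo_advantages(rewards))
```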
Transformer Lab — A free, open-source LLM workspace that prepares a custom dataset and fine-tunes a model using MLX on the Mac.
Lucia — The authentication library has adopted a copy-and-paste design, just like shadcn/ui, rather than shipping as a traditional library.
From 0 to Production — The Modern React Tutorial — Theo released it last year, and I always wanted to learn from this marathon tutorial.
Unsloth.ai’s GRPO — It seems that Unsloth’s implementation of GRPO uses less GPU memory, and it supports both QLoRA and LoRA.
DOGE: Make AI Conferences Great Again — Zeyuan (Allen) Zhu wrote a very interesting piece on using LLMs as arbitrators in the reviewer-author discussions.
Deep Dive into LLMs like ChatGPT — Andrej Karpathy continues his top-notch hours-long education on large language models with a new episode today.
GRPO will soon be added to Apple MLX — The PR now works, using about 32 GB of memory when training Qwen2.5-0.5B.
Another simple DeepSeek R1 reproduction — This reproduction of GRPO has one distinct feature: it is exceedingly simple and quite elegant.
Fourth attempt at reproducing DeepSeek R1’s GRPO on small models — The fourth time is the charm. I can successfully run this repo without activating vLLM.
Lambda Labs hosts DeepSeek R1 — the dashboard is simple, nice to look at, free to use, and pretty fast when generating tokens. Overall, an excellent user experience.
How to fine-tune open LLMs in 2025 with Hugging Face — Philipp Schmid, a Technical Lead at Hugging Face, posted this article on fine-tuning LLMs using Hugging Face.
On DeepSeek and Export Controls — Dario Amodei, Anthropic’s CEO, wrote a fairly long editorial on DeepSeek.
The Illustrated DeepSeek-R1 — Jay Alammar, the author of O’Reilly’s Hands-On Large Language Models, wrote a short piece explaining DeepSeek R1 at a high level.
Qwen 2.5 7B 1M — I have just tried Qwen’s latest local model, the 7B 1M, locally in LM Studio 0.3.8 (Build 4). I loaded an entire PhD thesis into the model, and LM Studio gleefully chose inject-full-content as its content injection strategy.
Although it’s quite long, The Short Case for Nvidia Stock is a fascinating read. Also, agents are not happening yet.
Open-R1 — Hugging Face started to reproduce DeepSeek R1 in the open, and discussed the R1 technical report in a recorded YouTube video.
This website is a space for storing — and sharing, if anyone cares — some of the websites, code repositories, and tweets that I have read.