Autoresearch

An open-source project by Andrej Karpathy that turns an AI coding agent into a fully autonomous ML researcher.

How It Works

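Point the agent at a small LLM training setup, go to sleep, and wake up to ~100 experiments and (hopefully) a better model. At 5 minutes per run that is about 12 experiments per hour, so a night of about 8 hours gives roughly 100.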

The Loop

Agent reads all files for context, runs baseline training
Proposes an idea, edits train.py, commits
Trains for 5 minutes, then reads the metric (val_bpb, validation bits per byte; lower is better)
If val_bpb improved → keep the commit. If not → git reset back to the previous commit
Repeat indefinitely — no human in the loop
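
The loop above can be sketched as a greedy keep-or-revert search. This is a minimal illustration, not the project's actual code: the function name research_loop and the three callables are hypothetical stand-ins for the agent's edit-and-commit step, the 5-minute training run, and the git reset. The real loop runs indefinitely; max_experiments is added here only so the sketch terminates.

```python
from __future__ import annotations

from typing import Callable


def research_loop(
    baseline: float,
    propose_and_commit: Callable[[int], None],  # hypothetical: agent edits train.py, commits
    train_and_eval: Callable[[], float],        # hypothetical: fixed-budget run, returns val_bpb
    revert: Callable[[], None],                 # hypothetical: e.g. git reset to previous commit
    max_experiments: int,                       # assumption: cap so the sketch terminates
) -> tuple[float, list[tuple[int, float, bool]]]:
    """Greedy keep-or-revert loop. Lower metric is better."""
    best = baseline
    history: list[tuple[int, float, bool]] = []
    for i in range(max_experiments):
        propose_and_commit(i)
        score = train_and_eval()
        kept = score < best
        if kept:
            best = score      # the commit stays and becomes the new baseline
        else:
            revert()          # discard the change, return to the last good state
        history.append((i, score, kept))
    return best, history


if __name__ == "__main__":
    # Made-up scores purely for illustration; no real training happens here.
    fake_scores = iter([1.02, 0.98, 0.99, 0.95])
    reverts: list[int] = []
    best, history = research_loop(
        baseline=1.00,
        propose_and_commit=lambda i: None,
        train_and_eval=lambda: next(fake_scores),
        revert=lambda: reverts.append(1),
        max_experiments=4,
    )
    print(best, len(reverts))
```

Because a change is kept only when it beats the best score so far, the kept commits form a chain of strict improvements, and every rejected idea leaves the repository exactly where it was.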

Three Files

prepare.py — Read-only infrastructure (data, tokenizer, eval). Agent cannot touch this
train.py — The single file the agent edits. Contains full GPT model, optimizer, training loop
program.md — The “research program” written by the human. Instructions, constraints, methodology
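
The note does not say how the read-only rule is enforced. One way a harness could check it, shown purely as an assumption, is to compare the files changed by an experiment against an allow-list before accepting the commit. The names ALLOWED_FILES and disallowed_changes are hypothetical; the list of changed files could come from something like git diff --name-only.

```python
from __future__ import annotations

from typing import Iterable

# Assumption: train.py is the only file the agent may edit.
ALLOWED_FILES = frozenset({"train.py"})


def disallowed_changes(
    changed_files: Iterable[str],
    allowed: Iterable[str] = ALLOWED_FILES,
) -> list[str]:
    """Return the changed files that fall outside the allow-list, sorted."""
    return sorted(set(changed_files) - set(allowed))


if __name__ == "__main__":
    # An experiment that also touched prepare.py would be rejected.
    print(disallowed_changes(["train.py", "prepare.py"]))
```

An empty result means the experiment stayed inside its sandbox; anything else is a reason to revert without even training.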

Key Design Choices

Fixed 5-minute wall-clock budget makes all experiments directly comparable
Single GPU, single file, single metric — intentionally minimal
Human role shifts: from writing Python to writing program.md — “programming the researcher”
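
A fixed wall-clock budget can be sketched as a training loop that stops on elapsed time rather than on a step count. This is an illustrative assumption about the mechanism, not the project's code: train_for_budget, step, and the injectable clock are hypothetical names, and the clock parameter exists only so the loop can be tested without waiting.

```python
from __future__ import annotations

import time
from typing import Callable


def train_for_budget(
    step: Callable[[], None],                      # hypothetical: one optimizer step
    budget_seconds: float,                         # e.g. 5 * 60 for a 5-minute run
    clock: Callable[[], float] = time.monotonic,   # injectable for testing
) -> int:
    """Call `step` repeatedly until the budget is spent; return the step count."""
    start = clock()
    steps = 0
    while clock() - start < budget_seconds:
        step()
        steps += 1
    return steps


if __name__ == "__main__":
    # Tiny budget so the demo finishes quickly; the step does no real work.
    done = train_for_budget(lambda: time.sleep(0.01), budget_seconds=0.05)
    print(f"completed {done} steps")
```

Under this scheme a change that makes each step faster gets more steps within the same budget, so speed and quality improvements are both reflected in the single final metric.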

Relevance to Kulify

The program.md pattern is directly analogous to how we use CLAUDE.md as the schema for the Second Brain. The human defines the program, the AI executes and iterates.

Could inspire automated knowledge curation: an “auto-lint” that continuously improves the Second Brain.