concept

Autoresearch

created 2026-04-19 ai · research · automation · ml

An open-source project by Andrej Karpathy that turns an AI coding agent into a fully autonomous ML researcher.

How It Works

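Point the agent at a small LLM training setup, go to sleep, and wake up to ~100 experiments and (hopefully) a better model. At 5 minutes of training per experiment, that is at most 12 runs per hour, or roughly 100 over an eight-hour night.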

The Loop

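  1. The agent reads all files for context and runs a baseline training job
  2. It proposes an idea, edits train.py, and commits the change
  3. It trains for 5 minutes and reads the metric, val_bpb (validation bits per byte; lower is better)
  4. If val_bpb improved, the commit is kept; if not, the agent uses git reset to return to the previous commit
  5. Repeat indefinitely, with no human in the loop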
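
The loop above amounts to greedy hill-climbing on a lower-is-better metric. A minimal sketch in Python, assuming a hypothetical run_experiment callable that stands in for "edit train.py, commit, train for 5 minutes, read val_bpb"; the names research_loop and run_experiment and the metric values are illustrative only and are not taken from the project:

```python
from typing import Callable, List, Tuple


def research_loop(
    baseline_bpb: float,
    run_experiment: Callable[[int], float],
    n_experiments: int,
) -> Tuple[float, List[Tuple[int, float, bool]]]:
    """Greedy keep-or-revert loop over a lower-is-better metric.

    run_experiment is a hypothetical stand-in for one full experiment;
    it returns the val_bpb measured for experiment number i.
    """
    best = baseline_bpb
    history = []
    for i in range(n_experiments):
        bpb = run_experiment(i)
        kept = bpb < best  # only a strict improvement counts
        if kept:
            best = bpb  # a real agent would keep its commit here
        # otherwise a real agent would undo the commit (for example with git reset)
        history.append((i, bpb, kept))
    return best, history


if __name__ == "__main__":
    # Made-up metric values, for illustration only.
    fake_results = [1.10, 0.98, 1.02, 0.95]
    best, history = research_loop(1.00, lambda i: fake_results[i], len(fake_results))
    for i, bpb, kept in history:
        print(f"experiment {i}: val_bpb={bpb:.2f} {'kept' if kept else 'reverted'}")
    print(f"best val_bpb: {best:.2f}")
```

Because a change is kept only when it beats the current best, the recorded best never gets worse, which is what makes it safe to leave the loop unattended.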

Three Files

  • prepare.py: read-only infrastructure (data, tokenizer, eval). The agent cannot modify it
  • train.py: the single file the agent edits. Contains the full GPT model, optimizer, and training loop
  • program.md: the “research program” written by the human. Holds instructions, constraints, and methodology
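
This note does not say how the "agent edits only train.py" rule is enforced. One way to check it mechanically, sketched here as an assumption rather than as the project's mechanism, is to compare the list of changed files (for instance, the output of git diff --name-only) against an allowlist. EDITABLE_FILES and disallowed_changes are hypothetical names:

```python
from typing import Iterable, List

# Assumption for illustration: the only file the agent may change.
EDITABLE_FILES = {"train.py"}


def disallowed_changes(changed_files: Iterable[str]) -> List[str]:
    """Return the changed paths that fall outside the editable set, sorted."""
    return sorted(path for path in changed_files if path not in EDITABLE_FILES)


if __name__ == "__main__":
    # An experiment that also touched the read-only files would be flagged.
    print(disallowed_changes(["train.py", "prepare.py", "program.md"]))
```

An empty result means the experiment stayed within bounds; anything else could be treated as a reason to revert.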

Key Design Choices

  • A fixed 5-minute wall-clock budget per run makes all experiments directly comparable: every change is judged on what it achieves in the same training time
  • Single GPU, single file, single metric: intentionally minimal
  • The human's role shifts from writing Python to writing program.md, in effect “programming the researcher”
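
A fixed wall-clock budget can be sketched as a loop that keeps taking training steps until a deadline passes. This is an illustrative sketch, not the project's code: train_with_budget, step, and the injectable clock parameter are hypothetical names, and a real 5-minute run would pass budget_seconds=300.

```python
import time
from typing import Callable


def train_with_budget(
    step: Callable[[], None],
    budget_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Call step() repeatedly until the wall-clock budget is used up.

    Returns the number of steps completed. The deadline is checked only
    between steps, so the final step may overrun the budget slightly.
    """
    deadline = clock() + budget_seconds
    steps = 0
    while clock() < deadline:
        step()
        steps += 1
    return steps


if __name__ == "__main__":
    # Tiny demo budget; each "step" just sleeps briefly.
    done = train_with_budget(lambda: time.sleep(0.01), budget_seconds=0.1)
    print(f"completed {done} steps")
```

Budgeting by time rather than by step count means a change that makes each step slower pays for it with fewer steps, so the comparison reflects overall efficiency and not just per-step quality.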

Relevance to Kulify

The program.md pattern is directly analogous to how we use CLAUDE.md as the schema for the Second Brain: the human defines the program, and the AI executes and iterates.

It could also inspire automated knowledge curation: an “auto-lint” that continuously improves the Second Brain.