Backtest Engines

Architectural pattern for simulating strategies against historical data, drawn from vibe-trading’s 7-engine setup. The interesting part is not the finance math but the composition pattern: simulating heterogeneous domains against a shared resource pool.

Per-Market Engine Layout

Each market gets its own engine because cost models, trading hours, position sizing, and instrument quirks differ:

Engine	Market	Quirks
AShareEngine	China A-shares	T+1 rule (no same-day selling), ST/*ST risk, ±10% daily price limits on the main board
USEquityEngine	US equities	T+1 settlement (T+2 before May 2024), fractional shares, after-hours sessions
HKEquityEngine	Hong Kong	Lot sizes, stamp duty, Connect rules
CryptoEngine	Crypto spot	24/7, sub-cent precision, exchange fees
ChinaFuturesEngine	CN futures	Margin, daily settlement, contract roll
GlobalFuturesEngine	Global futures	Same mechanics as CN futures (margin, daily settlement, contract roll), plus cross-border quirks
OptionsEngine	Options	Greeks, IV, early exercise

The Composite Engine Pattern

The interesting move: a CompositeEngine that delegates to per-market engines but shares one capital pool.

            ┌─ AShareEngine    (cash: shared pool)
            ├─ HKEquityEngine  (cash: shared pool)
Composite ──┼─ USEquityEngine  (cash: shared pool)
            ├─ CryptoEngine    (cash: shared pool)
            └─ ...

Each child engine handles per-market rules; the composite enforces global cash constraints, currency conversion, and cross-market position sizing.
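
A minimal sketch of that split, under stated assumptions: `CapitalPool` and `MarketEngine` are hypothetical names, the `buy` interface is invented for illustration, and the lot-size, fee, and FX fields stand in for whatever per-market rules a real engine carries. It is not vibe-trading's actual API. The point is only that children price an order under their own rules while a single pool decides whether it can be funded.

```python
from dataclasses import dataclass, field


class CapitalPool:
    """Single cash balance, in a base currency, shared by every child engine."""

    def __init__(self, cash: float) -> None:
        self.cash = cash

    def reserve(self, amount: float) -> bool:
        """Deduct `amount` if available; refuse rather than go negative."""
        if amount > self.cash:
            return False
        self.cash -= amount
        return True


@dataclass
class MarketEngine:
    """Hypothetical child engine: knows per-market rules, holds no cash."""
    name: str
    lot_size: int = 1        # e.g. board lots
    fee_rate: float = 0.0    # proportional cost model (assumed)
    fx_to_base: float = 1.0  # conversion rate into the pool's currency
    positions: dict = field(default_factory=dict)

    def cost_in_base(self, qty: int, price: float) -> float:
        if qty % self.lot_size != 0:
            raise ValueError(f"{self.name}: qty must be a multiple of {self.lot_size}")
        return qty * price * (1 + self.fee_rate) * self.fx_to_base


class CompositeEngine:
    """Delegates pricing to the child engine, enforces the global cash limit."""

    def __init__(self, pool: CapitalPool, engines: list) -> None:
        self.pool = pool
        self.engines = {e.name: e for e in engines}

    def buy(self, market: str, symbol: str, qty: int, price: float) -> bool:
        engine = self.engines[market]
        cost = engine.cost_in_base(qty, price)  # per-market rules
        if not self.pool.reserve(cost):         # global cash constraint
            return False
        engine.positions[symbol] = engine.positions.get(symbol, 0) + qty
        return True


pool = CapitalPool(10_000.0)
composite = CompositeEngine(
    pool,
    [MarketEngine("US"), MarketEngine("HK", lot_size=100, fx_to_base=0.125)],
)
ok_us = composite.buy("US", "AAA", 50, 100.0)   # uses half the pool
ok_hk = composite.buy("HK", "BBB", 500, 100.0)  # rejected: pool cannot fund it
```

The second order is valid under Hong Kong's own rules (a whole number of lots) but still fails, because the US order already consumed part of the shared cash. That cross-market interaction is what separate, independently funded engines would miss.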

Why it matters as a pattern: this isn’t trading-specific. Same shape applies anywhere you need to simulate heterogeneous sub-systems against a shared global resource — supply chain (warehouses with shared inventory), networks (services with shared bandwidth budget), AI agents (subagents with shared token budget).

Statistical Validation Layer

On top of raw backtest output, three validation passes are standard:

Monte Carlo — resample trades to estimate the distribution of outcomes, not just a point estimate.
Bootstrap CI — confidence intervals on Sharpe, drawdown, win rate.
Walk-Forward — split history into train/test windows, walk forward in time. Catches overfit strategies that a single train/test split misses.
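
The first two passes can share one mechanism: resample the trade returns with replacement many times (the Monte Carlo step) and read a percentile interval off the resulting distribution (the bootstrap CI). The sketch below assumes a per-period Sharpe ratio with a zero risk-free rate and uses the plain percentile method; the function names and sample returns are illustrative, not taken from any particular engine.

```python
import random
import statistics


def sharpe(returns):
    """Per-period Sharpe ratio: mean / sample stdev (risk-free rate assumed 0)."""
    sd = statistics.stdev(returns)
    return statistics.mean(returns) / sd if sd > 0 else 0.0


def bootstrap_ci(returns, stat=sharpe, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `stat` over resampled returns."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    n = len(returns)
    stats = sorted(stat(rng.choices(returns, k=n)) for _ in range(n_resamples))
    k = int((alpha / 2) * n_resamples)  # observations trimmed from each tail
    return stats[k], stats[-k - 1]


# Illustrative per-trade returns (made-up numbers).
trade_returns = [0.012, -0.004, 0.020, 0.007, -0.011, 0.015, 0.003, -0.002, 0.009, 0.006]
low, high = bootstrap_ci(trade_returns)
```

Passing a different `stat` (max drawdown, win rate) reuses the same resampling loop. Note that resampling individual trades treats them as independent; strategies with strongly autocorrelated returns would need a block bootstrap instead.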

Without these, a backtest is just a story; with them, it’s a probabilistic claim.
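
The walk-forward pass described above comes down to generating rolling train/test index windows. A minimal sketch, assuming fixed-size windows that advance by one test window at a time (anchored or expanding training windows are a common alternative); the function name is hypothetical:

```python
def walk_forward_windows(n_obs, train_size, test_size):
    """Yield (train, test) index ranges that roll forward through history."""
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # step by one test window so test sets never overlap


windows = list(walk_forward_windows(n_obs=10, train_size=4, test_size=2))
for train, test in windows:
    print(f"fit on {train.start}..{train.stop - 1}, score on {test.start}..{test.stop - 1}")
```

Each test window lies strictly after its training window, so a strategy is never scored on data it was fitted to.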

Optimizer Layer

Four optimizer types ride on top of the engine:

Grid search (baseline, slow, exhaustive)
Random search
Bayesian (e.g. scikit-optimize’s gp_minimize, or Optuna)
Genetic / evolutionary

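The key question for any optimizer is whether it overfits to the validation set. Walk-forward validation is the main guard: parameters are tuned on each training window and scored only on the window that follows.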
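
As a sketch of how the two simplest optimizers plug into the same objective interface: both functions below take a callable that maps a parameter dict to a score. The `toy_score` function is a stand-in for "run a backtest and return a metric", and the `fast`/`slow` parameter names are assumptions for illustration; in practice the objective would be an out-of-sample walk-forward score.

```python
import itertools
import random


def grid_search(objective, grid):
    """Exhaustively score every combination in `grid` (name -> list of values)."""
    names = list(grid)
    candidates = (
        dict(zip(names, combo)) for combo in itertools.product(*grid.values())
    )
    return max(candidates, key=objective)


def random_search(objective, bounds, n_trials=200, seed=0):
    """Score `n_trials` uniform samples from `bounds` (name -> (low, high))."""
    rng = random.Random(seed)
    candidates = (
        {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
        for _ in range(n_trials)
    )
    return max(candidates, key=objective)


def toy_score(params):
    """Stand-in objective with a known optimum at fast=10, slow=50."""
    return -((params["fast"] - 10) ** 2) - (params["slow"] - 50) ** 2


best_grid = grid_search(toy_score, {"fast": [5, 10, 20], "slow": [30, 50, 100]})
best_random = random_search(toy_score, {"fast": (5, 20), "slow": (30, 100)})
```

Grid search cost grows multiplicatively with each added parameter, while random search keeps a fixed trial budget. That is the usual reason to prefer random search once there are more than a few parameters.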

Benchmark Comparison

Strategies compare against benchmarks (SPY, CSI 300, BTC, etc.) on:

Total return
Excess return
Information ratio
Tracking error
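
One way to compute these four metrics from aligned per-period return series. The conventions here are assumptions, since definitions vary between tools: excess return is the difference in compounded total return, tracking error is the annualized sample standard deviation of active returns, and information ratio is annualized mean active return divided by tracking error. The function names and the 252-period year are illustrative.

```python
import math
import statistics


def total_return(returns):
    """Compound per-period simple returns into one cumulative return."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def benchmark_report(strategy, benchmark, periods_per_year=252):
    """Compare two aligned per-period return series."""
    if len(strategy) != len(benchmark):
        raise ValueError("return series must be aligned and equal in length")
    active = [s - b for s, b in zip(strategy, benchmark)]
    tracking_error = statistics.stdev(active) * math.sqrt(periods_per_year)
    annual_active = statistics.mean(active) * periods_per_year
    return {
        "total_return": total_return(strategy),
        "excess_return": total_return(strategy) - total_return(benchmark),
        "tracking_error": tracking_error,
        "information_ratio": (
            annual_active / tracking_error if tracking_error else float("nan")
        ),
    }


# Made-up daily returns for illustration only.
strategy_returns = [0.02, -0.01, 0.03, 0.00]
benchmark_returns = [0.01, 0.00, 0.01, 0.01]
report = benchmark_report(strategy_returns, benchmark_returns)
```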

Benchmark tickers are resolved automatically via yfinance.

Application Beyond Finance

The engine + composite + statistical-validation + optimizer + benchmark stack maps onto any “simulate, then validate, then tune, then compare” loop:

AI evals: per-task evaluators → composite eval over a benchmark set → bootstrap CI on accuracy → walk-forward across model versions → benchmark vs prior model.
Performance budgets: per-component synthetic load → composite end-to-end load → CI on latency → walk-forward across releases → vs SLA.

The pattern is more general than the domain it ships in.

vibe-trading — the implementation
Multi-Agent Finance Workflows — the agents that use the backtest engines
AI Agent Architectures — agents-with-tools view (engines are just deterministic tools)