concept

Backtest Engines

created 2026-05-04 finance · backtesting · simulation · architecture · patterns

Architectural pattern for simulating strategies against historical data, drawn from vibe-trading’s 7-engine setup. The finance math is secondary here; the part worth studying is the composition pattern for simulating across heterogeneous domains with a shared resource pool.

Per-Market Engine Layout

Each market gets its own engine because cost models, trading hours, position sizing, and instrument quirks differ:

Engine               Market          Quirks
AShareEngine         China A-shares  T+1 settlement, ST/*ST risk, 10% daily price limits
USEquityEngine       US equities     T+1 settlement (T+2 before May 2024), fractional shares, after-hours trading
HKEquityEngine       Hong Kong       Lot sizes, stamp duty, Stock Connect rules
CryptoEngine         Crypto spot     24/7 trading, sub-cent precision, exchange fees
ChinaFuturesEngine   CN futures      Margin, daily settlement, contract roll
GlobalFuturesEngine  Global futures  Same as CN futures, plus cross-border quirks
OptionsEngine        Options         Greeks, IV, early exercise
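
One way to picture why each market needs its own engine is to write a few of the quirks from the table down as data. The sketch below is an assumption for illustration only: `MarketRules`, its fields, and the fee rates are hypothetical and not taken from vibe-trading, and the sell lock counts calendar days where a real engine would consult a trading calendar.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MarketRules:
    """Hypothetical per-market rule set (illustrative values only)."""
    name: str
    lot_size: int        # shares per board lot; 1 means any whole share
    sell_lock_days: int  # 1 = shares bought today cannot be sold today
    fee_rate: float      # proportional commission, an assumed flat rate

    def round_to_lot(self, quantity: int) -> int:
        """Round a desired quantity down to a whole number of lots."""
        return (quantity // self.lot_size) * self.lot_size

    def can_sell(self, bought_on: date, today: date) -> bool:
        """Apply the sell lock, measured in calendar days for simplicity."""
        return (today - bought_on).days >= self.sell_lock_days

    def cost(self, quantity: int, price: float) -> float:
        """Cash needed to buy `quantity` at `price`, including fees."""
        return quantity * price * (1.0 + self.fee_rate)


# Assumed example values; fractional US shares are ignored in this sketch.
A_SHARE = MarketRules("a_share", lot_size=100, sell_lock_days=1, fee_rate=0.0003)
US_EQUITY = MarketRules("us_equity", lot_size=1, sell_lock_days=0, fee_rate=0.0)

print(A_SHARE.round_to_lot(250), US_EQUITY.round_to_lot(250))
```

Once the rules diverge beyond a handful of fields (margin, contract roll, Greeks), a flat rule set like this stops being enough, which is presumably why the layout uses a separate engine per market.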

The Composite Engine Pattern

The interesting move: a CompositeEngine that delegates to per-market engines but shares one capital pool.

        ┌─ AShareEngine    (cash: shared pool)
        ├─ HKEquityEngine  (cash: shared pool)
Composite─ USEquityEngine  (cash: shared pool)
        ├─ CryptoEngine    (cash: shared pool)
        └─ ...

Each child engine handles per-market rules; the composite enforces global cash constraints, currency conversion, and cross-market position sizing.
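
A minimal sketch of the delegation described above, under stated assumptions: the class names other than `CompositeEngine`, the method names, the static FX table, and all numbers are hypothetical and not vibe-trading's actual API. The child computes the market-local cost; the composite converts it to the base currency and checks the shared pool before any position changes.

```python
from dataclasses import dataclass, field


class InsufficientCash(Exception):
    """Raised when an order would overdraw the shared pool."""


@dataclass
class CapitalPool:
    """Single cash balance, in one base currency, shared by all child engines."""
    cash: float

    def reserve(self, amount: float) -> None:
        if amount > self.cash:
            raise InsufficientCash(f"need {amount:.2f}, have {self.cash:.2f}")
        self.cash -= amount


@dataclass
class ChildEngine:
    """Hypothetical per-market engine: knows its local fees and currency."""
    market: str
    currency: str
    fee_rate: float
    positions: dict = field(default_factory=dict)

    def local_cost(self, quantity: float, price: float) -> float:
        return quantity * price * (1.0 + self.fee_rate)

    def fill(self, symbol: str, quantity: float) -> None:
        self.positions[symbol] = self.positions.get(symbol, 0.0) + quantity


@dataclass
class CompositeEngine:
    pool: CapitalPool
    engines: dict     # market name -> ChildEngine
    fx_to_base: dict  # currency -> base-currency rate (assumed static here)

    def buy(self, market: str, symbol: str, quantity: float, price: float) -> float:
        engine = self.engines[market]
        base_cost = engine.local_cost(quantity, price) * self.fx_to_base[engine.currency]
        self.pool.reserve(base_cost)  # raises before any position is touched
        engine.fill(symbol, quantity)
        return base_cost


composite = CompositeEngine(
    pool=CapitalPool(cash=10_000.0),
    engines={
        "us": ChildEngine("us", "USD", fee_rate=0.0),
        "hk": ChildEngine("hk", "HKD", fee_rate=0.001),
    },
    fx_to_base={"USD": 1.0, "HKD": 0.125},  # illustrative rate, not market data
)
composite.buy("us", "SPY", 10, 500.0)
composite.buy("hk", "0700.HK", 100, 300.0)
print(f"cash left in shared pool: {composite.pool.cash:.2f}")
```

The design choice worth noting is that children never hold cash themselves. They only price orders and track positions, so the global constraint lives in exactly one place.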

Why it matters as a pattern: this isn’t trading-specific. Same shape applies anywhere you need to simulate heterogeneous sub-systems against a shared global resource — supply chain (warehouses with shared inventory), networks (services with shared bandwidth budget), AI agents (subagents with shared token budget).

Statistical Validation Layer

On top of raw backtest output, three validation passes are standard:

  1. Monte Carlo — resample trades to estimate the distribution of outcomes, not just a point estimate.
  2. Bootstrap CI — confidence intervals on Sharpe, drawdown, win rate.
  3. Walk-Forward — split history into train/test windows, walk forward in time. Catches overfit strategies that a single train/test split misses.

Without these, a backtest is just a story; with them, it’s a probabilistic claim.
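
The first two passes can be sketched with the standard library alone. This is an illustration under assumptions, not the source's implementation: the function names are hypothetical, the trade returns are synthetic, the Sharpe ratio is per-trade and not annualized, the confidence interval uses the simple percentile bootstrap, and the Monte Carlo pass shuffles trade order (one of several possible resampling schemes).

```python
import random
import statistics


def sharpe(returns):
    """Per-trade Sharpe ratio (mean / sample stdev), not annualized."""
    sd = statistics.stdev(returns)
    return statistics.mean(returns) / sd if sd > 0 else 0.0


def max_drawdown(returns):
    """Largest peak-to-trough drop of the compounded equity curve, as a fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def bootstrap_ci(returns, stat, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample trades with replacement, recompute `stat`."""
    rng = random.Random(seed)
    n = len(returns)
    samples = sorted(stat(rng.choices(returns, k=n)) for _ in range(n_resamples))
    lo = samples[int((alpha / 2) * n_resamples)]
    hi = samples[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


def monte_carlo_drawdowns(returns, n_paths=1000, seed=0):
    """Shuffle trade order to see how much the drawdown depends on sequencing."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_paths):
        path = returns[:]
        rng.shuffle(path)
        out.append(max_drawdown(path))
    return sorted(out)


rng = random.Random(42)
trade_returns = [rng.gauss(0.002, 0.02) for _ in range(200)]  # synthetic, not real results
lo, hi = bootstrap_ci(trade_returns, sharpe)
dd = monte_carlo_drawdowns(trade_returns)
print(f"Sharpe 95% CI: [{lo:.3f}, {hi:.3f}]")
print(f"median shuffled max drawdown: {dd[len(dd) // 2]:.3f}")
```

If the Sharpe interval straddles zero, the point estimate from the raw backtest says little on its own.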

Optimizer Layer

Four optimizer types ride on top of the engine:

  • Grid search (baseline, slow, exhaustive)
  • Random search
  • Bayesian (scikit-optimize’s gp_minimize, or Optuna)
  • Genetic / evolutionary

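The key question for any optimizer is whether the tuned parameters overfit the data they were scored on. Walk-forward evaluation is the main guard: parameters are chosen on each training window and scored only on the unseen window that follows it.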
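
To make the walk-forward guard concrete, here is a small sketch that wraps a grid search in rolling train/test windows. Everything in it is an assumption for illustration: the function names, the toy momentum strategy, and the synthetic return series (pure noise, so any in-sample edge is overfit by construction).

```python
import random


def walk_forward_windows(n, train, test):
    """Yield (train_range, test_range) index pairs that roll forward by `test`."""
    start = 0
    while start + train + test <= n:
        yield range(start, start + train), range(start + train, start + train + test)
        start += test


def walk_forward_optimize(data, params, score, train, test):
    """Pick the best param on each train window, then score it on the next test window."""
    results = []
    for tr, te in walk_forward_windows(len(data), train, test):
        train_slice = [data[i] for i in tr]
        test_slice = [data[i] for i in te]
        best = max(params, key=lambda p: score(train_slice, p))  # grid search
        results.append((best, score(train_slice, best), score(test_slice, best)))
    return results


def momentum_pnl(returns, lookback):
    """Toy strategy: go long or short by the sign of the trailing `lookback` sum."""
    pnl = 0.0
    for t in range(lookback, len(returns)):
        signal = sum(returns[t - lookback:t])
        pnl += returns[t] if signal > 0 else -returns[t]
    return pnl


rng = random.Random(7)
noise = [rng.gauss(0.0, 0.01) for _ in range(300)]  # synthetic returns with no real edge
results = walk_forward_optimize(noise, params=[2, 5, 10, 20], score=momentum_pnl, train=100, test=50)
for best, in_sample, out_of_sample in results:
    print(f"lookback={best:2d}  train pnl={in_sample:+.3f}  test pnl={out_of_sample:+.3f}")
```

The train column is the number an optimizer reports; the test column is the number worth trusting. Swapping the `max(...)` line for random, Bayesian, or genetic search leaves the windowing unchanged.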

Benchmark Comparison

Strategies compare against benchmarks (SPY, CSI 300, BTC, etc.) on:

  • Total return
  • Excess return
  • Information ratio
  • Tracking error

Benchmark tickers are resolved automatically through yfinance.
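
The four metrics above can be computed from two aligned return series. The sketch below assumes specific conventions that the source does not state: excess return is the difference of cumulative returns, tracking error is the annualized sample standard deviation of active returns, and information ratio is annualized mean active return over tracking error. The function names and sample numbers are hypothetical, and no data is downloaded.

```python
import math
import statistics


def total_return(returns):
    """Compound periodic returns into one cumulative return."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def benchmark_report(strategy, benchmark, periods_per_year=252):
    """Compare aligned periodic return series for a strategy and a benchmark."""
    if len(strategy) != len(benchmark):
        raise ValueError("series must be aligned to the same dates")
    active = [s - b for s, b in zip(strategy, benchmark)]
    tracking_error = statistics.stdev(active) * math.sqrt(periods_per_year)
    annual_active = statistics.mean(active) * periods_per_year
    return {
        "total_return": total_return(strategy),
        "excess_return": total_return(strategy) - total_return(benchmark),
        "tracking_error": tracking_error,
        "information_ratio": (
            annual_active / tracking_error if tracking_error > 0 else float("nan")
        ),
    }


# Made-up daily returns, for illustration only.
strategy = [0.010, -0.020, 0.030, 0.000, 0.015]
benchmark = [0.000, -0.010, 0.020, 0.010, 0.005]
for name, value in benchmark_report(strategy, benchmark).items():
    print(f"{name:18s} {value:+.4f}")
```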

Application Beyond Finance

The engine + composite + statistical-validation + optimizer + benchmark stack maps onto any “simulate, then validate, then tune, then compare” loop:

  • AI evals: per-task evaluators → composite eval over a benchmark set → bootstrap CI on accuracy → walk-forward across model versions → benchmark vs prior model.
  • Performance budgets: per-component synthetic load → composite end-to-end load → CI on latency → walk-forward across releases → vs SLA.

The pattern is more general than the domain it ships in.