concept
Backtest Engines
Architectural pattern for simulating strategies against historical data, drawn from vibe-trading’s 7-engine setup. The interesting part is not the finance math but the composition pattern: simulating across heterogeneous domains with a shared resource pool.
Per-Market Engine Layout
Each market gets its own engine because cost models, trading hours, position sizing, and instrument quirks differ:
| Engine | Market | Quirks |
|---|---|---|
| AShareEngine | China A-shares | T+1 settlement, ST/*ST risk warnings, ±10% daily price limits (main board) |
| USEquityEngine | US equities | T+1 settlement (T+2 before May 2024), fractional shares, after-hours trading |
| HKEquityEngine | Hong Kong | Lot sizes, stamp duty, Connect rules |
| CryptoEngine | Crypto spot | 24/7, sub-cent precision, exchange fees |
| ChinaFuturesEngine | CN futures | Margin, daily settlement, contract roll |
| GlobalFuturesEngine | Global futures | Same margin, settlement, and roll mechanics as CN futures, plus cross-border quirks |
| OptionsEngine | Options | Greeks, IV, early exercise |
The Composite Engine Pattern
The interesting move: a CompositeEngine that delegates to per-market engines but shares one capital pool.
          ┌─ AShareEngine    (cash: shared pool)
          ├─ HKEquityEngine  (cash: shared pool)
Composite─┼─ USEquityEngine  (cash: shared pool)
          ├─ CryptoEngine    (cash: shared pool)
          └─ ...
Each child engine handles per-market rules; the composite enforces global cash constraints, currency conversion, and cross-market position sizing.
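A minimal sketch of that delegation, not vibe-trading’s actual code: `CapitalPool`, `MarketEngine`, the `fee_rate` cost model, and the constant `fx_to_base` rate are all hypothetical names and simplifications chosen for illustration. The point it shows is that every child holds a reference to the same pool object, so an order in one market reduces the cash available to all the others.

```python
from dataclasses import dataclass, field


@dataclass
class CapitalPool:
    """Single cash balance in one base currency, shared by every child engine."""
    cash: float

    def reserve(self, amount: float) -> bool:
        # Refuse the order if the shared pool cannot cover it.
        if amount > self.cash:
            return False
        self.cash -= amount
        return True


@dataclass
class MarketEngine:
    """Hypothetical child engine: applies per-market rules, draws cash from the pool."""
    name: str
    pool: CapitalPool
    fee_rate: float           # proportional trading cost (assumed cost model)
    fx_to_base: float = 1.0   # local -> base currency rate (assumed constant)
    positions: dict = field(default_factory=dict)

    def buy(self, symbol: str, qty: float, price: float) -> bool:
        # Per-market rule: fees and currency conversion are applied here.
        cost = qty * price * (1 + self.fee_rate) * self.fx_to_base
        if not self.pool.reserve(cost):
            return False
        self.positions[symbol] = self.positions.get(symbol, 0.0) + qty
        return True


class CompositeEngine:
    """Routes orders to the matching child; all children share one CapitalPool."""

    def __init__(self, cash: float):
        self.pool = CapitalPool(cash)
        self.engines = {}

    def add_engine(self, name: str, fee_rate: float, fx_to_base: float = 1.0) -> None:
        self.engines[name] = MarketEngine(name, self.pool, fee_rate, fx_to_base)

    def buy(self, market: str, symbol: str, qty: float, price: float) -> bool:
        return self.engines[market].buy(symbol, qty, price)


composite = CompositeEngine(cash=10_000.0)
composite.add_engine("us", fee_rate=0.0)
composite.add_engine("crypto", fee_rate=0.001)

# The US order spends half the pool, so the crypto order no longer fits.
ok_us = composite.buy("us", "SPY", qty=10, price=500.0)
ok_btc = composite.buy("crypto", "BTC", qty=0.1, price=60_000.0)
print(ok_us, ok_btc, composite.pool.cash)
```

Passing the pool by reference keeps the global constraint in one place; a child engine never needs to know which other markets exist.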
Why it matters as a pattern: this isn’t trading-specific. The same shape applies anywhere you need to simulate heterogeneous sub-systems against a shared global resource: supply chains (warehouses with shared inventory), networks (services with a shared bandwidth budget), AI agents (subagents with a shared token budget).
Statistical Validation Layer
On top of raw backtest output, three validation passes are standard:
- Monte Carlo — resample trades to estimate the distribution of outcomes, not just a point estimate.
- Bootstrap CI — confidence intervals on Sharpe, drawdown, win rate.
- Walk-Forward — split history into train/test windows, walk forward in time. Catches overfit strategies that a single train/test split misses.
Without these, a backtest is just a story; with them, it’s a probabilistic claim.
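As an illustration of the resampling idea behind the Monte Carlo and bootstrap passes, here is a sketch of a percentile bootstrap confidence interval on the Sharpe ratio. It assumes a zero risk-free rate, 252 trading periods per year, and i.i.d. resampling of daily returns (which ignores autocorrelation; a block bootstrap would be one alternative). The function names and the synthetic return series are illustrative, not taken from vibe-trading.

```python
import numpy as np


def sharpe(returns: np.ndarray, periods_per_year: int = 252) -> float:
    """Annualised Sharpe ratio, assuming a zero risk-free rate."""
    return float(np.sqrt(periods_per_year) * returns.mean() / returns.std(ddof=1))


def bootstrap_ci(returns, stat=sharpe, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample with replacement, recompute the statistic."""
    rng = np.random.default_rng(seed)
    n = len(returns)
    stats = np.empty(n_resamples)
    for i in range(n_resamples):
        sample = rng.choice(returns, size=n, replace=True)
        stats[i] = stat(sample)
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


# Synthetic daily returns stand in for real backtest output.
rng = np.random.default_rng(42)
daily_returns = rng.normal(loc=0.0005, scale=0.01, size=750)

point = sharpe(daily_returns)
lo, hi = bootstrap_ci(daily_returns)
print(f"Sharpe {point:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Passing a different `stat` callable (max drawdown, win rate) reuses the same resampling loop for the other metrics.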
Optimizer Layer
Four optimizer types ride on top of the engine:
- Grid search (baseline, slow, exhaustive)
- Random search
- Bayesian (scikit-optimize’s gp_minimize / Optuna)
- Genetic / evolutionary
The interesting question for any optimizer: does it overfit to the validation set? Walk-forward is the antidote, because each parameter choice is scored only on data the optimizer has not yet seen.
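A sketch of how a grid search can be wrapped in walk-forward evaluation, under stated assumptions: the strategy is a toy momentum rule with a single `lookback` parameter, the selection metric is mean return on the training window, and windows roll forward by one test length. All names here are hypothetical and the return series is synthetic.

```python
import numpy as np


def walk_forward_windows(n, train_size, test_size):
    """Yield (train, test) index slices that roll forward through time."""
    start = 0
    while start + train_size + test_size <= n:
        train = slice(start, start + train_size)
        test = slice(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size


def momentum_returns(returns, lookback):
    """Toy strategy: hold the asset on day t only if the trailing sum of the
    previous `lookback` daily returns (ending at t-1) is positive."""
    trailing = np.convolve(returns, np.ones(lookback), mode="full")[: len(returns)]
    signal = np.zeros(len(returns))
    signal[1:] = trailing[:-1] > 0   # shift by one day to avoid look-ahead
    return signal * returns


def walk_forward_grid(returns, lookbacks, train_size, test_size):
    """Pick the best lookback on each train window, score it on the next test window."""
    per_param = {lb: momentum_returns(returns, lb) for lb in lookbacks}
    chosen, out_of_sample = [], []
    for train, test in walk_forward_windows(len(returns), train_size, test_size):
        best = max(lookbacks, key=lambda lb: per_param[lb][train].mean())
        chosen.append(best)
        out_of_sample.append(per_param[best][test])
    return chosen, np.concatenate(out_of_sample)


rng = np.random.default_rng(0)
returns = rng.normal(0.0003, 0.01, size=1000)   # synthetic daily returns
chosen, oos = walk_forward_grid(returns, [5, 20, 60], train_size=250, test_size=50)
print(len(chosen), "windows; mean out-of-sample daily return:", oos.mean())
```

The reported performance comes only from the concatenated test windows, so a parameter that merely fit noise in one training window gets no credit for it.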
Benchmark Comparison
Strategies compare against benchmarks (SPY, CSI 300, BTC, etc.) on:
- Total return
- Excess return
- Information ratio
- Tracking error
yfinance resolves benchmark tickers automatically.
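The four metrics can be sketched from two aligned series of periodic returns. This is one common set of conventions rather than a confirmed description of vibe-trading: excess return is taken as the difference in compounded total return, tracking error as the annualised standard deviation of active returns, and information ratio as annualised mean active return divided by tracking error. The `benchmark_stats` helper and the sample numbers are hypothetical.

```python
import numpy as np


def benchmark_stats(strategy, benchmark, periods_per_year=252):
    """Compare periodic strategy returns against benchmark returns of equal length."""
    strategy = np.asarray(strategy, dtype=float)
    benchmark = np.asarray(benchmark, dtype=float)

    # Compounded return over the whole period.
    total_return = float(np.prod(1 + strategy) - 1)
    benchmark_total = float(np.prod(1 + benchmark) - 1)

    # Active return: per-period difference between strategy and benchmark.
    active = strategy - benchmark
    tracking_error = float(active.std(ddof=1) * np.sqrt(periods_per_year))
    information_ratio = float(active.mean() * periods_per_year / tracking_error)

    return {
        "total_return": total_return,
        "excess_return": total_return - benchmark_total,
        "tracking_error": tracking_error,
        "information_ratio": information_ratio,
    }


stats = benchmark_stats(
    strategy=[0.01, 0.02, -0.01, 0.03],
    benchmark=[0.00, 0.01, 0.00, 0.01],
)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```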
Application Beyond Finance
The engine + composite + statistical-validation + optimizer + benchmark stack maps onto any “simulate, then validate, then tune, then compare” loop:
- AI evals: per-task evaluators → composite eval over a benchmark set → bootstrap CI on accuracy → walk-forward across model versions → benchmark vs prior model.
- Performance budgets: per-component synthetic load → composite end-to-end load → CI on latency → walk-forward across releases → vs SLA.
The pattern is more general than the domain it ships in.
Related
- vibe-trading — the implementation
- Multi-Agent Finance Workflows — the agents that use the backtest engines
- AI Agent Architectures — agents-with-tools view (engines are just deterministic tools)