We rebuilt the backtesting engine with point-in-time cursors and separate ingestion timestamps, collapsing the backtest-to-live delta from 37 percentage points to 1.4 points.
37→1.4pp: Backtest-to-live delta (biased → clean)
32.8%: Live win rate (was 10%)
380ms: 5-year backtest query time
7: Surviving strategies (of 24)
CHAPTER 01
The original Apex backtesting framework reported a 47% win rate on early experiments. Live trading produced 10%. The gap was not a strategy failure. It was an infrastructure failure: the backtest was contaminated with look-ahead bias at multiple layers.
Three specific violations were found in the audit. First, daily bar data was queried using the bar's close timestamp as the available-as-of date, when in practice that bar was not ingested until the daily downloader ran hours later. Second, signal features used a 20-day rolling mean computed across the entire dataset before the backtest loop started, meaning the mean at day 5 of the backtest included data from day 300. Third, macro regime labels were computed once over the full history using a hidden Markov model fit on the complete dataset. The regime label at any historical time step therefore incorporated information from months that had not yet occurred.
CHAPTER 02
The redesign was organized around a single primitive: the point_in_time_cursor. Every data access during a backtest accepted a sim_clock parameter and returned only rows where ingested_at <= sim_clock. The ingested_at column was distinct from ts (the event time of the data) and reflected when the row actually landed in the database.
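The cursor idea can be sketched in a few lines of Rust. This is an illustrative in-memory version under stated assumptions, not the engine's actual code: the `Bar` struct, the `PointInTimeCursor` type, the `visible_at` method, and the use of Unix seconds for both timestamps are hypothetical. Only the `ts` / `ingested_at` / `sim_clock` names and the `ingested_at <= sim_clock` rule come from the design described above.

```rust
/// One stored row. `ts` is the event time of the data; `ingested_at` is when
/// the row actually landed in the database. Unix seconds are an assumption.
#[derive(Debug, Clone, PartialEq)]
pub struct Bar {
    pub ts: i64,
    pub ingested_at: i64,
    pub close: f64,
}

/// Hypothetical cursor that only exposes rows visible at a given sim clock.
pub struct PointInTimeCursor<'a> {
    rows: &'a [Bar],
}

impl<'a> PointInTimeCursor<'a> {
    pub fn new(rows: &'a [Bar]) -> Self {
        Self { rows }
    }

    /// Return rows with `ingested_at <= sim_clock`. The event time `ts` is
    /// deliberately ignored: a bar that closed before `sim_clock` but was
    /// ingested after it stays invisible.
    pub fn visible_at(&self, sim_clock: i64) -> Vec<&'a Bar> {
        self.rows
            .iter()
            .filter(|bar| bar.ingested_at <= sim_clock)
            .collect()
    }
}
```

With a bar whose `ts` is 200 and whose `ingested_at` is 230, a query at `sim_clock = 210` does not return it, even though the bar had already closed. That is the case the original close-timestamp query got wrong.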
Historical rows that predated this schema change had no recorded ingestion time, so ingested_at was backfilled with a conservative estimate: ts + INTERVAL 8 HOUR, encoding the assumption that end-of-day bars arrived eight hours after market close. The estimate was deliberately worst-case rather than optimistic.
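The backfill rule amounts to a single fallback. The sketch below assumes timestamps in Unix seconds and represents a legacy row's missing ingestion time as `None`. The function name `backfill_ingested_at` and the constant name are hypothetical; the eight-hour offset is the one stated above.

```rust
/// Conservative delay applied to legacy rows: eight hours, in seconds.
const BACKFILL_DELAY_SECS: i64 = 8 * 60 * 60;

/// Return the ingestion time to store for a row.
/// - Rows written after the schema change carry a recorded value, kept as-is.
/// - Legacy rows (`None`) fall back to `ts + 8 hours`.
pub fn backfill_ingested_at(ts: i64, recorded: Option<i64>) -> i64 {
    recorded.unwrap_or(ts + BACKFILL_DELAY_SECS)
}
```

Erring late is the safe direction: a row that becomes visible later than it really did can only hide information from the backtest, never leak it.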
Macro regime labels were recomputed using a walk-forward approach: the HMM was trained on data up to each quarter boundary, and a separate regime label set was stored per training window in a regime_labels_walk_forward table.
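One way to read walk-forward labels without leaking future fits is to pick, at each sim clock, the latest label set whose training cutoff is not in the future. The sketch below assumes that lookup rule, which the write-up does not spell out. The `WalkForwardLabels` type, its methods, the `u8` regime encoding, and Unix-second timestamps are hypothetical, and the HMM fit itself is out of scope here.

```rust
use std::collections::BTreeMap;

/// Regime label sets keyed by the training cutoff (Unix seconds) of the fit
/// that produced them. Each set holds (ts, regime) pairs.
pub struct WalkForwardLabels {
    by_cutoff: BTreeMap<i64, Vec<(i64, u8)>>,
}

impl WalkForwardLabels {
    pub fn new() -> Self {
        Self { by_cutoff: BTreeMap::new() }
    }

    /// Store the labels produced by a model trained on data up to `cutoff`.
    pub fn insert_window(&mut self, cutoff: i64, labels: Vec<(i64, u8)>) {
        self.by_cutoff.insert(cutoff, labels);
    }

    /// Labels from the most recent fit whose cutoff is at or before
    /// `sim_clock`. Returns `None` before the first training window exists.
    pub fn labels_at(&self, sim_clock: i64) -> Option<&[(i64, u8)]> {
        self.by_cutoff
            .range(..=sim_clock)
            .next_back()
            .map(|(_, labels)| labels.as_slice())
    }
}
```

Note that the same historical timestamp can carry different regime labels in different windows. That is expected: each window reflects only what a model trained up to its cutoff could have known.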
ARCHITECTURE OVERVIEW
INGEST: Rust 1.84 (apex-backtest)
FEATURES: ClickHouse 26.3 (point-in-time cursor)
TRAIN: rand_chacha 0.3 (deterministic RNG)
v1 / v2 / v3
SERVE: LightGBM
Production predictions feed back into the training set on a continuous retraining cadence.
CHAPTER 03
Feature computation changed from batch-over-history to rolling-at-time. Rolling means, volatility estimates, and momentum indicators were computed using only rows with ingested_at <= sim_clock. This was slower by roughly 8x on the initial implementation.
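A naive rolling-at-time feature looks like the sketch below: filter to the rows visible at `sim_clock`, then average the trailing window. Recomputing this at every step is the kind of work that made the first implementation slow. The `Obs` struct and `rolling_mean_at` function are hypothetical, timestamps are assumed to be Unix seconds, and `rows` is assumed to be sorted by event time.

```rust
/// One observation: when it became visible and its value.
pub struct Obs {
    pub ingested_at: i64,
    pub value: f64,
}

/// Mean of the last `window` observations visible at `sim_clock`.
/// Returns `None` if nothing is visible yet or `window` is zero.
/// Fewer than `window` visible rows yield the mean of what is available.
pub fn rolling_mean_at(rows: &[Obs], sim_clock: i64, window: usize) -> Option<f64> {
    // Point-in-time filter: rows ingested after the sim clock do not exist yet.
    let visible: Vec<f64> = rows
        .iter()
        .filter(|r| r.ingested_at <= sim_clock)
        .map(|r| r.value)
        .collect();
    if visible.is_empty() || window == 0 {
        return None;
    }
    let start = visible.len().saturating_sub(window);
    let tail = &visible[start..];
    Some(tail.iter().sum::<f64>() / tail.len() as f64)
}
```

The contrast with the contaminated version is the filter: a mean computed once over the full dataset has no `sim_clock` argument at all, so nothing stops later rows from entering earlier values.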
Feature computation was accelerated by precomputing and caching rolling windows up to the current sim_clock in 1-day increments and storing the intermediate state in a BTreeMap keyed by date. The simulation advanced in 1-day steps. At each step, the feature cache extended by one day rather than recomputing the full window, bringing rolling-mean computation back to approximately the same speed as the original contaminated batch approach.
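The incremental update can be sketched as follows. This is a minimal illustration, assuming a running-sum rolling mean and an integer day index as the date key; the `RollingMeanCache` type and its methods are hypothetical. Only the BTreeMap keyed by date and the one-day extension step come from the description above.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Rolling-mean cache that is extended one day at a time.
pub struct RollingMeanCache {
    window: usize,
    values: VecDeque<f64>,     // values currently inside the window
    sum: f64,                  // running sum of `values`
    by_day: BTreeMap<u32, f64>, // day index -> rolling mean as of that day
}

impl RollingMeanCache {
    pub fn new(window: usize) -> Self {
        assert!(window > 0, "window must be at least one day");
        Self {
            window,
            values: VecDeque::with_capacity(window + 1),
            sum: 0.0,
            by_day: BTreeMap::new(),
        }
    }

    /// Extend the cache by one day: add the new value, evict the value that
    /// fell out of the window, and store the resulting mean under `day`.
    pub fn push_day(&mut self, day: u32, value: f64) -> f64 {
        self.values.push_back(value);
        self.sum += value;
        if self.values.len() > self.window {
            if let Some(oldest) = self.values.pop_front() {
                self.sum -= oldest;
            }
        }
        let mean = self.sum / self.values.len() as f64;
        self.by_day.insert(day, mean);
        mean
    }

    /// Cached mean as of `day`, if the simulation has reached it.
    pub fn mean_on(&self, day: u32) -> Option<f64> {
        self.by_day.get(&day).copied()
    }
}
```

Each step costs O(1) for the sum update plus an O(log n) map insert, instead of re-reading the full window. One trade-off of a running sum is that floating-point error can accumulate over very long runs, so a periodic full recompute may be worth considering.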
A second architectural requirement was determinism. The same strategy parameters applied to the same date range had to produce identical results across runs. All randomness in the engine was seeded from a fixed seed derived from (strategy_id, start_date, end_date) using a hash function. Random number generation used the rand_chacha crate for reproducible ChaCha20 output.
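Seed derivation of this kind can be sketched with a small, stable hash. The write-up does not name the hash function, so the 64-bit FNV-1a used here is an assumption, as are the `derive_seed` name, the string representation of the dates, and the field separator. A fixed, hand-written hash is used because a standard library hasher's output is not guaranteed to stay the same across Rust releases.

```rust
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Fold `bytes` into an existing 64-bit FNV-1a state.
fn fnv1a(mut hash: u64, bytes: &[u8]) -> u64 {
    for &byte in bytes {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Derive a deterministic seed from (strategy_id, start_date, end_date).
/// The same triple always yields the same seed, on any machine and run.
pub fn derive_seed(strategy_id: &str, start_date: &str, end_date: &str) -> u64 {
    let mut hash = FNV_OFFSET;
    for field in &[strategy_id, start_date, end_date] {
        hash = fnv1a(hash, field.as_bytes());
        // Unit-separator byte keeps ("ab", "c") distinct from ("a", "bc").
        hash = fnv1a(hash, &[0x1f]);
    }
    hash
}
```

A `u64` seed like this could then be handed to a seedable ChaCha20 generator, for example rand_chacha's `ChaCha20Rng` through `SeedableRng::seed_from_u64`. That wiring is not shown here so the sketch stays free of external crates.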
CHAPTER 04
After the point-in-time rebuild, the backtest win rate dropped from 47.0% to 34.2%, and the live win rate rose from 10.0% to 32.8%. The backtest-to-live delta collapsed from 37 percentage points (47.0 minus 10.0) to 1.4 percentage points (34.2 minus 32.8). Strategy selection narrowed 24 candidate strategies to the 7 that passed a minimum Sharpe of 0.5 on clean backtests. The 7 survivors ran in live paper trading for 4 weeks and produced win rates between 30% and 38%, consistent with the clean backtest projections.
Strategies that showed a 15 to 40% PnL reduction under empirical slippage modeling were correctly eliminated. A strategy that requires perfect execution to be profitable is not a strategy.
CHAPTER 05
DECISION · 01
Look-ahead bias is additive across layers. The original system had three independent violations, and any one of them was sufficient to corrupt results. Fixing only two would still have left a gap between backtest and live. All three had to be fixed together before the delta closed.
DECISION · 02
Separate event time from ingestion time at the schema level. The ts vs ingested_at distinction was the core fix. Without schema support, the point-in-time constraint cannot be enforced mechanically, and every new query becomes a potential new violation.
DECISION · 03
Backfill conservatively, not optimistically. Assigning ingested_at = ts + INTERVAL 8 HOUR for historical rows was a deliberately conservative choice. An optimistic backfill would have re-introduced mild look-ahead bias. A conservative backfill slightly underestimates performance on historical data, which is acceptable.