AI / Machine Learning · STATUS: PRODUCTION · SHIPPED IN 2 WEEKS

Apex Backtesting Engine: Eliminating Look-Ahead Bias at 425M Rows

We rebuilt the backtesting engine with point-in-time cursors and separate ingestion timestamps, collapsing the backtest-to-live delta from 37 percentage points to 1.4 points.

37→1.4pp

Backtest-to-live delta (biased → clean)

32.8%

Live win rate (was 10%)

380ms

5-year backtest query time

7

Surviving strategies (of 24)

Rust 1.84 (apex-backtest) · ClickHouse 26.3 (point-in-time cursor) · rand_chacha 0.3 (deterministic RNG) · LightGBM · Python 3.12

CHAPTER 01

Problem & Context

The original Apex backtesting framework reported a 47% win rate on early experiments. Live trading produced 10%. The gap was not a strategy failure. It was an infrastructure failure: the backtest was contaminated with look-ahead bias at multiple layers.

Three specific violations were found in the audit. First, daily bar data was queried using the bar's close timestamp as the available-as-of date, when in practice that bar was not ingested until the daily downloader ran hours later. Second, signal features used a 20-day rolling mean computed across the entire dataset before the backtest loop started, meaning the mean at day 5 of the backtest included data from day 300. Third, macro regime labels were computed once over the full history using a hidden Markov model fit on the complete dataset. The regime label at any historical time step therefore incorporated information from months that had not yet occurred.

CHAPTER 02

Approach & Architecture

The redesign was organized around a single primitive: the point_in_time_cursor. Every data access during a backtest accepted a sim_clock parameter and returned only rows where ingested_at <= sim_clock. The ingested_at column was distinct from ts (the event time of the data) and reflected when the row actually landed in the database.
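The visibility rule can be illustrated with a minimal sketch. It is an illustration only, not the engine's code: the `Row` type, the `close` field, and the in-memory list are hypothetical, and in the real system the filter presumably runs inside the ClickHouse query rather than in Python.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Row:
    ts: datetime           # event time of the data
    ingested_at: datetime  # when the row actually landed in the database
    close: float           # hypothetical payload field


def point_in_time_cursor(rows, sim_clock):
    """Return only the rows that were already ingested at sim_clock."""
    return [r for r in rows if r.ingested_at <= sim_clock]


rows = [
    Row(datetime(2024, 1, 2, 21), datetime(2024, 1, 3, 5), 101.0),
    Row(datetime(2024, 1, 3, 21), datetime(2024, 1, 4, 5), 102.5),
]

# At 22:00 on Jan 3 the second bar has closed (its ts has passed) but it has
# not been ingested yet, so only the first bar is visible to the backtest.
visible = point_in_time_cursor(rows, datetime(2024, 1, 3, 22))
```

Filtering on `ts` instead of `ingested_at` would have returned both rows here, which is the first violation described in the audit.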

Historical rows that predated this schema change were assigned backfilled ingested_at values using a conservative estimate: ts + INTERVAL 8 HOUR, which encodes the assumption that end-of-day bars arrived eight hours after market close. The estimate was deliberately worst-case rather than optimistic.
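As a rough sketch of the backfill rule, assuming rows that already carry a recorded ingestion time keep it and only the missing values are estimated (the function name and that preservation behavior are assumptions, not details from the project):

```python
from datetime import datetime, timedelta

BACKFILL_DELAY = timedelta(hours=8)  # from the text: ts + INTERVAL 8 HOUR


def backfill_ingested_at(ts, ingested_at=None):
    """Keep a recorded ingestion time; otherwise assume ts + 8 hours."""
    return ingested_at if ingested_at is not None else ts + BACKFILL_DELAY


# A bar with event time 20:00 is treated as visible from 04:00 the next day.
close_ts = datetime(2019, 6, 3, 20, 0)
estimated = backfill_ingested_at(close_ts)
```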

Macro regime labels were recomputed using a walk-forward approach: the HMM was trained on data up to each quarter boundary, and a separate regime label set was stored per training window in a regime_labels_walk_forward table.
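One way to picture the lookup side of the walk-forward scheme is below. This is a sketch under stated assumptions: quarter boundaries are taken to be the first calendar day of each quarter, each model is assumed to be fit only on data strictly before its boundary, and the helper names are hypothetical. The HMM fit itself is omitted.

```python
from bisect import bisect_right
from datetime import date


def quarter_boundaries(start_year, end_year):
    """First day of every quarter from start_year through end_year."""
    return [
        date(year, month, 1)
        for year in range(start_year, end_year + 1)
        for month in (1, 4, 7, 10)
    ]


def training_cutoff(boundaries, sim_date):
    """Latest quarter boundary at or before sim_date, or None if there is none."""
    i = bisect_right(boundaries, sim_date)
    return boundaries[i - 1] if i else None


boundaries = quarter_boundaries(2020, 2021)

# A backtest step on 2020-08-15 may only read the label set produced by the
# model whose training window ended at the 2020-07-01 boundary.
cutoff = training_cutoff(boundaries, date(2020, 8, 15))
```

The cutoff would then serve as the key for selecting the matching label set from regime_labels_walk_forward, so a simulated date never sees labels from a model trained on later data.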

ARCHITECTURE OVERVIEW

INGEST: Rust 1.84 (apex-backtest)

FEATURES: ClickHouse 26.3 (point-in-time cursor)

TRAIN: rand_chacha 0.3 (deterministic RNG), v1 / v2 / v3, eval loop

SERVE: LightGBM

Production predictions feed back into the training set on a continuous retraining cadence.

CHAPTER 03

Implementation

Feature computation changed from batch-over-history to rolling-at-time. Rolling means, volatility estimates, and momentum indicators were computed using only rows with ingested_at <= sim_clock. The initial implementation of this approach was roughly 8x slower than the batch version.

Feature computation was accelerated by precomputing and caching rolling windows up to the current sim_clock in 1-day increments and storing the intermediate state in a BTreeMap keyed by date. The simulation advanced in 1-day steps. At each step, the feature cache extended by one day rather than recomputing the full window, bringing rolling-mean computation back to approximately the same speed as the original contaminated batch approach.
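The incremental update can be sketched as follows. This is a simplified illustration rather than the engine's Rust code: a plain Python dict stands in for the BTreeMap keyed by date, the class and method names are hypothetical, and only the rolling mean is shown.

```python
from collections import deque
from datetime import date, timedelta


class RollingMeanCache:
    """Extends a trailing-window mean one day at a time instead of recomputing it."""

    def __init__(self, window):
        self.window = window
        self.values = deque()  # values currently inside the window
        self.total = 0.0       # running sum of self.values
        self.by_date = {}      # date -> mean as of that date (BTreeMap stand-in)

    def advance(self, day, value):
        """Fold in the one value that became visible on `day` and cache the mean."""
        self.values.append(value)
        self.total += value
        if len(self.values) > self.window:
            self.total -= self.values.popleft()  # drop the value that left the window
        self.by_date[day] = self.total / len(self.values)
        return self.by_date[day]


cache = RollingMeanCache(window=3)
start = date(2024, 1, 1)
for offset, close in enumerate([10.0, 20.0, 30.0, 40.0]):
    cache.advance(start + timedelta(days=offset), close)
```

Each step does a constant amount of work regardless of window length, which is where the speedup over recomputing the full window comes from. A long-running sum like this can accumulate floating-point error, so a production version might periodically recompute the total from the window contents.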

A second architectural requirement was determinism. The same strategy parameters applied to the same date range had to produce identical results across runs. All randomness in the engine was seeded from a fixed seed derived from (strategy_id, start_date, end_date) using a hash function. Random number generation used the rand_chacha crate for reproducible ChaCha20 output.
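A minimal sketch of the seeding scheme, with two substitutions that should be read as assumptions: SHA-256 stands in for the unnamed hash function, and Python's `random.Random` (a Mersenne Twister) stands in for the ChaCha20 generator from rand_chacha. The strategy id and function names are hypothetical.

```python
import hashlib
import random


def derive_seed(strategy_id, start_date, end_date):
    """Hash the run identity into a 64-bit seed that is stable across processes."""
    key = f"{strategy_id}|{start_date}|{end_date}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def make_rng(strategy_id, start_date, end_date):
    # random.Random is a Mersenne Twister, used here only as a stand-in for
    # the ChaCha20 generator the text describes.
    return random.Random(derive_seed(strategy_id, start_date, end_date))


# Two runs with the same parameters draw the same sequence.
a = make_rng("mean_rev_01", "2019-01-01", "2023-12-31")
b = make_rng("mean_rev_01", "2019-01-01", "2023-12-31")
same_draws = [a.random() for _ in range(3)] == [b.random() for _ in range(3)]
```

A cryptographic digest is used in this sketch because Python's built-in `hash()` on strings is salted per process by default and would not give the same seed across runs.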

CHAPTER 04

Results & Metrics

After the point-in-time rebuild, the backtest win rate dropped from 47.0% to 34.2%, and the live win rate rose from 10.0% to 32.8%. The backtest-to-live delta collapsed from 37 percentage points to 1.4 percentage points. Strategy selection narrowed 24 candidate strategies to the 7 that passed a minimum Sharpe ratio of 0.5 on clean backtests. The 7 survivors ran in live paper trading for 4 weeks and produced win rates between 30% and 38%, consistent with the clean backtest projections.
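The two delta figures follow directly from the win rates quoted above. A short check, using only those reported numbers (the helper name is hypothetical):

```python
def delta_pp(backtest_win_rate, live_win_rate):
    """Backtest-to-live gap in percentage points, rounded to one decimal."""
    return round(backtest_win_rate - live_win_rate, 1)


biased_delta = delta_pp(47.0, 10.0)  # original engine
clean_delta = delta_pp(34.2, 32.8)   # after the point-in-time rebuild
```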

Strategies that showed a 15 to 40% PnL reduction under empirical slippage modeling were correctly eliminated. A strategy that requires perfect execution to be profitable is not a strategy.

CHAPTER 05

Lessons & Technical Decisions

DECISION · 01

Look-ahead bias is additive across layers. The original system had three independent violations, and any one of them was sufficient to corrupt results. Fixing only two would still have left a gap between backtest and live results. The delta closed only once all three were fixed together.

DECISION · 02

Separate event time from ingestion time at the schema level. The ts vs ingested_at distinction was the core fix. Without schema support, the point-in-time constraint cannot be enforced mechanically, and every new query becomes a potential new violation.

DECISION · 03

Backfill conservatively, not optimistically. Assigning ingested_at = ts + INTERVAL 8 HOUR to historical rows was a deliberate choice. An optimistic backfill would have reintroduced mild look-ahead bias. A conservative backfill slightly underestimates performance on historical data, which is acceptable.
