We replaced fixed 50/50 email A/B splits with Thompson sampling over Beta distributions, cutting median sends-to-convergence to 147 where fixed splits had never converged at all, and raising open rate from 13.7% to 19.2%.
19.2%
Open rate (Thompson vs 13.7% fixed)
4.1%
Click rate (Thompson vs 2.9% fixed)
147
Sends to convergence (median)
31%
Regret reduction (offline sim)
CHAPTER 01
A/B testing email copy at fixed 50/50 splits wastes sends on losing variants. If variant A is clearly underperforming after 30 sends, half of the next 220 sends still go to a known loser under an even split. The regret is real: every send to a losing variant is a send not going to the best-known variant.
The Avo Engine outbound system needed to select among copy variants automatically, route sends to better-performing variants faster than static splits, and promote winners without manual review on every campaign. The secondary problem was data sparsity. Cold email campaigns generate thin feedback loops: a good campaign might hit a 5 to 10% open rate, meaning 90 to 95% of sends return no signal at all. The algorithm needed to function at low observation counts.
CHAPTER 02
Thompson sampling over a Beta distribution was chosen over epsilon-greedy and UCB1 for three reasons. First, Beta-Bernoulli conjugacy made posterior updates exact and cheap: an open event updates the alpha parameter, a send without an open updates the beta parameter. Second, Thompson sampling performs well under low observation counts: UCB1 requires careful tuning of the exploration coefficient at low N, while Thompson sampling automatically reduces exploration as posteriors sharpen. Third, the system needed to support CTR as a secondary reward signal: a two-stage Beta model with independent posteriors allowed compositional scoring as open_prior times ctr_prior.
Prior initialization used Beta(1, 1) (uniform) for new variants. For variants with prior campaign history on similar audience segments, a weakly informative prior of Beta(2, 18) was used, encoding the background expectation of a roughly 10% open rate.
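The bookkeeping this implies is small enough to show in full. A minimal sketch of the conjugate updates, priors, and two-stage composite score; the struct fields mirror the variant_stats columns described in Chapter 03, and all names here are illustrative rather than the production code:

```rust
// Sketch of the Beta-Bernoulli bookkeeping; names are illustrative.
struct VariantPosterior {
    alpha: f64,     // opens + prior alpha
    beta: f64,      // non-opens + prior beta
    alpha_ctr: f64, // clicks + prior alpha
    beta_ctr: f64,  // non-clicks + prior beta
}

impl VariantPosterior {
    // Beta(1, 1): uniform prior for brand-new variants.
    fn uninformed() -> Self {
        Self { alpha: 1.0, beta: 1.0, alpha_ctr: 1.0, beta_ctr: 1.0 }
    }

    // Beta(2, 18): weakly informative open prior, mean 2 / (2 + 18) = 10%,
    // for variants with history on similar audience segments.
    fn informed() -> Self {
        Self { alpha: 2.0, beta: 18.0, alpha_ctr: 1.0, beta_ctr: 1.0 }
    }

    // Exact conjugate update: an open increments alpha,
    // a send without an open increments beta.
    fn record_open_outcome(&mut self, opened: bool) {
        if opened { self.alpha += 1.0 } else { self.beta += 1.0 }
    }

    // Two-stage composite: posterior mean open rate times posterior mean CTR.
    fn composite_mean(&self) -> f64 {
        (self.alpha / (self.alpha + self.beta))
            * (self.alpha_ctr / (self.alpha_ctr + self.beta_ctr))
    }
}
```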
ARCHITECTURE OVERVIEW
[Pipeline diagram: ingest → features → train → serve, built on Rust 1.84 (statrs 0.17), PostgreSQL (variant_stats), and a Redis webhook queue; variants v1 / v2 / v3 compared with a chi-squared test (Rust implementation); production outcomes feed back into the posteriors on a continuous cadence.]
CHAPTER 03
The bandit state was stored in a Postgres table with columns variant_id, campaign_id, alpha FLOAT, beta FLOAT, alpha_ctr FLOAT, beta_ctr FLOAT, sends INT, opens INT, clicks INT, updated_at TIMESTAMPTZ. No external ML library. Posterior parameters were maintained as running floats, updated in a single UPDATE statement on each event.
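The write-up does not show the statement itself. One plausible shape for the two single-statement updates, treating each send as a provisional non-open that is reclassified when the open webhook arrives; the SQL below is an assumption, and only the column names come from the schema above:

```rust
// Assumed update statements; column names are from the variant_stats schema.
// A send is provisionally counted as a non-open (beta + 1), then moved to a
// success (alpha + 1, beta - 1) if the open webhook lands later.
const RECORD_SEND_SQL: &str = "
    UPDATE variant_stats
       SET beta = beta + 1, sends = sends + 1, updated_at = now()
     WHERE variant_id = $1 AND campaign_id = $2";

const RECORD_OPEN_SQL: &str = "
    UPDATE variant_stats
       SET alpha = alpha + 1, beta = beta - 1, opens = opens + 1,
           updated_at = now()
     WHERE variant_id = $1 AND campaign_id = $2";
```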
At send time, the send-selector drew one sample per active variant from Beta(alpha, beta) via the Rust statrs crate. The variant with the highest sample value won the send slot. Variants with wide posteriors win occasionally due to random sampling, providing natural exploration.
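A sketch of that draw, assuming statrs 0.17 with rand 0.8; the Variant struct and function names are illustrative:

```rust
use rand::distributions::Distribution;
use statrs::distribution::Beta;

// Illustrative row shape; alpha/beta are the stored posterior parameters.
struct Variant {
    id: i64,
    alpha: f64,
    beta: f64,
}

// One Thompson draw per active variant; the highest sample wins the slot.
fn pick_variant(variants: &[Variant]) -> Option<i64> {
    let mut rng = rand::thread_rng();
    variants
        .iter()
        .filter_map(|v| {
            // Beta::new rejects non-positive parameters; skip malformed rows.
            Beta::new(v.alpha, v.beta)
                .ok()
                .map(|dist| (v.id, dist.sample(&mut rng)))
        })
        .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap_or(std::cmp::Ordering::Equal))
        .map(|(id, _)| id)
}
```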
Auto-promotion fired when any variant's sends counter crossed the N=50 threshold and its posterior expected reward exceeded that of every other variant by a margin of 0.03 (3 percentage points). Below that margin, sampling continued. The margin prevented premature promotion when variants were within noise.
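The promotion check reduces to a few comparisons over posterior means. A sketch under the same illustrative names as above:

```rust
// Illustrative promotion check: a variant is promoted once it has at least
// 50 sends and its posterior mean beats every rival's by the 0.03 margin.
struct VariantStats {
    id: i64,
    sends: u32,
    alpha: f64,
    beta: f64,
}

fn posterior_mean(v: &VariantStats) -> f64 {
    v.alpha / (v.alpha + v.beta)
}

fn promotion_candidate(variants: &[VariantStats]) -> Option<i64> {
    const MIN_SENDS: u32 = 50;
    const MARGIN: f64 = 0.03;
    variants.iter().find_map(|v| {
        let clears_margin = variants
            .iter()
            .filter(|other| other.id != v.id)
            .all(|other| posterior_mean(v) - posterior_mean(other) >= MARGIN);
        (v.sends >= MIN_SENDS && clears_margin).then(|| v.id)
    })
}
```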
One postmortem issue: the first implementation recalculated posteriors from raw event counts on every draw rather than storing running parameters. This hit Postgres read contention under concurrent sends. Switching to stored alpha/beta floats updated incrementally eliminated the contention. A second issue: webhook delivery for open events was not exactly-once. A dedup table keyed on (message_id, event_type) with a unique index prevented double-counting.
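The dedup gate sits in front of every posterior update. A plausible shape for it; the table name and SQL are assumptions, and only the (message_id, event_type) key and unique index come from the postmortem:

```rust
// Assumed dedup insert: the unique index on (message_id, event_type) makes
// the INSERT a no-op for replayed webhooks. A rows-affected count of zero
// means the event was already seen and the posterior update is skipped.
const DEDUP_SQL: &str = "
    INSERT INTO webhook_dedup (message_id, event_type)
    VALUES ($1, $2)
    ON CONFLICT (message_id, event_type) DO NOTHING";
```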
CHAPTER 04
Offline evaluation ran on 10,486 historical sends from the prior rule-based system. Thompson sampling reduced simulated cumulative regret by 31% relative to round-robin over the first 200 sends per campaign. By send 150, it had converged to the best variant in 89% of simulated campaigns. Round-robin had not converged by send 200 in any campaign.
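The shape of that comparison is easy to reproduce on synthetic data. A toy replay with hypothetical per-variant open rates (not the 10,486-send historical set), again assuming statrs 0.17 with rand 0.8:

```rust
use rand::distributions::Distribution;
use rand::Rng;
use statrs::distribution::Beta;

// Per-send regret is the gap between the best variant's true open rate and
// the chosen variant's true open rate, accumulated over the campaign.
fn cumulative_regret(true_rates: &[f64], sends: usize, thompson: bool) -> f64 {
    let mut rng = rand::thread_rng();
    let best = true_rates.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let mut alpha = vec![1.0; true_rates.len()];
    let mut beta = vec![1.0; true_rates.len()];
    let mut regret = 0.0;
    for t in 0..sends {
        let pick = if thompson {
            // One Beta draw per variant; highest sample wins the send.
            (0..true_rates.len())
                .map(|i| (i, Beta::new(alpha[i], beta[i]).unwrap().sample(&mut rng)))
                .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
                .unwrap()
                .0
        } else {
            t % true_rates.len() // round-robin baseline
        };
        let opened = rng.gen_bool(true_rates[pick]);
        if opened { alpha[pick] += 1.0 } else { beta[pick] += 1.0 }
        regret += best - true_rates[pick];
    }
    regret
}

fn main() {
    let rates = [0.10, 0.14, 0.19]; // hypothetical open rates per variant
    let ts = (0..500).map(|_| cumulative_regret(&rates, 200, true)).sum::<f64>() / 500.0;
    let rr = (0..500).map(|_| cumulative_regret(&rates, 200, false)).sum::<f64>() / 500.0;
    println!("Thompson {ts:.1} vs round-robin {rr:.1} mean regret over 200 sends");
}
```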
Online results from the first three live campaigns: mean open rate on Thompson-selected allocations was 19.2% versus 13.7% on the fixed 50/50 holdout. The auto-promotion path eliminated manual review on routine campaigns; the remaining 12% of campaigns, where the margin between the top two variants stayed below 0.03 at N=200, were escalated for human review.
CHAPTER 05
DECISION · 01
Posterior storage over recomputation. Storing running alpha/beta floats and updating them incrementally cost one additional column write per event but eliminated read contention entirely.
DECISION · 02
Dedup is not optional. Webhook-delivered events are not exactly-once. Any system that uses open or click signals as posterior updates must dedup at the store layer. Inflated alphas cause premature exploitation.
DECISION · 03
Beta-Bernoulli conjugacy scales. For binary outcomes at the scale of cold email, the conjugate model was sufficient. No deep learning required. The infrastructure cost for the entire bandit subsystem was two Postgres columns and one Rust crate.