We replaced fixed 50/50 email A/B splits with Thompson sampling over Beta distributions, cutting median sends-to-convergence to 147 where fixed splits had never converged at all, and raising open rate from 13.7% to 19.2%.
19.2%
Open rate (Thompson vs 13.7% fixed)
4.1%
Click rate (Thompson vs 2.9% fixed)
147
Sends to convergence (median)
31%
Regret reduction (offline sim)
CHAPTER 01
A/B testing email copy at fixed 50/50 splits wastes sends on losing variants. If variant A is clearly underperforming after 30 sends, half of the next 220 sends still go to a known loser under an even split. The regret is real: every send to a losing variant is a send not going to the best-known variant.
The Avo Engine outbound system needed to select among copy variants automatically, route sends to better-performing variants faster than static splits, and promote winners without manual review on every campaign. The secondary problem was data sparsity. Cold email campaigns generate thin feedback loops: a good campaign might hit a 5 to 10% open rate, meaning 90 to 95% of sends return no signal at all. The algorithm needed to function at low observation counts.
CHAPTER 02
Thompson sampling over a Beta distribution was chosen over epsilon-greedy and UCB1 for three reasons. First, Beta-Bernoulli conjugacy made posterior updates exact and cheap: an open event updates the alpha parameter, a send without an open updates the beta parameter. Second, Thompson sampling performs well under low observation counts: UCB1 requires careful tuning of the exploration coefficient at low N, while Thompson sampling automatically reduces exploration as posteriors sharpen. Third, the system needed to support CTR as a secondary reward signal: a two-stage Beta model with independent posteriors allowed compositional scoring as open_prior times ctr_prior.
Prior initialization used Beta(1, 1) (uniform) for new variants. For variants with prior campaign history on similar audience segments, a weakly informative prior of Beta(2, 18) was used, encoding the background expectation of a roughly 10% open rate.
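The bookkeeping this implies is small enough to show in full. A minimal sketch of the conjugate updates, priors, and two-stage composite score; the struct fields mirror the variant_stats columns described in Chapter 03, and all names here are illustrative rather than the production code:

```rust
// Sketch of the Beta-Bernoulli bookkeeping; names are illustrative.
struct VariantPosterior {
    alpha: f64,     // opens + prior alpha
    beta: f64,      // non-opens + prior beta
    alpha_ctr: f64, // clicks + prior alpha
    beta_ctr: f64,  // non-clicks + prior beta
}

impl VariantPosterior {
    // Beta(1, 1): uniform prior for brand-new variants.
    fn uninformed() -> Self {
        Self { alpha: 1.0, beta: 1.0, alpha_ctr: 1.0, beta_ctr: 1.0 }
    }

    // Beta(2, 18): weakly informative open prior, mean 2 / (2 + 18) = 10%,
    // for variants with history on similar audience segments.
    fn informed() -> Self {
        Self { alpha: 2.0, beta: 18.0, alpha_ctr: 1.0, beta_ctr: 1.0 }
    }

    // Exact conjugate update: an open increments alpha,
    // a send without an open increments beta.
    fn record_open_outcome(&mut self, opened: bool) {
        if opened { self.alpha += 1.0 } else { self.beta += 1.0 }
    }

    // Two-stage composite: posterior mean open rate times posterior mean CTR.
    fn composite_mean(&self) -> f64 {
        (self.alpha / (self.alpha + self.beta))
            * (self.alpha_ctr / (self.alpha_ctr + self.beta_ctr))
    }
}
```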
ARCHITECTURE OVERVIEW
[Pipeline diagram: ingest → features → train → serve, built on Rust 1.84 (statrs 0.17), PostgreSQL (variant_stats), and a Redis webhook queue; variants v1 / v2 / v3 compared with a chi-squared test (Rust implementation); production outcomes feed back into the posteriors on a continuous cadence.]
CHAPTER 03
The bandit state was stored in a Postgres table with columns variant_id, campaign_id, alpha FLOAT, beta FLOAT, alpha_ctr FLOAT, beta_ctr FLOAT, sends INT, opens INT, clicks INT, updated_at TIMESTAMPTZ. No external ML library. Posterior parameters were maintained as running floats, updated in a single UPDATE statement on each event.
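The write-up does not show the statement itself. One plausible shape for the two single-statement updates, treating each send as a provisional non-open that is reclassified when the open webhook arrives; the SQL below is an assumption, and only the column names come from the schema above:

```rust
// Assumed update statements; column names are from the variant_stats schema.
// A send is provisionally counted as a non-open (beta + 1), then moved to a
// success (alpha + 1, beta - 1) if the open webhook lands later.
const RECORD_SEND_SQL: &str = "
    UPDATE variant_stats
       SET beta = beta + 1, sends = sends + 1, updated_at = now()
     WHERE variant_id = $1 AND campaign_id = $2";

const RECORD_OPEN_SQL: &str = "
    UPDATE variant_stats
       SET alpha = alpha + 1, beta = beta - 1, opens = opens + 1,
           updated_at = now()
     WHERE variant_id = $1 AND campaign_id = $2";
```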
At send time, the send-selector drew one sample per active variant from Beta(alpha, beta) via the Rust statrs crate. The variant with the highest sample value won the send slot. Variants with wide posteriors win occasionally due to random sampling, providing natural exploration.
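A sketch of that draw, assuming statrs 0.17 with rand 0.8; the Variant struct and function names are illustrative:

```rust
use rand::distributions::Distribution;
use statrs::distribution::Beta;

// Illustrative row shape; alpha/beta are the stored posterior parameters.
struct Variant {
    id: i64,
    alpha: f64,
    beta: f64,
}

// One Thompson draw per active variant; the highest sample wins the slot.
fn pick_variant(variants: &[Variant]) -> Option<i64> {
    let mut rng = rand::thread_rng();
    variants
        .iter()
        .filter_map(|v| {
            // Beta::new rejects non-positive parameters; skip malformed rows.
            Beta::new(v.alpha, v.beta)
                .ok()
                .map(|dist| (v.id, dist.sample(&mut rng)))
        })
        .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap_or(std::cmp::Ordering::Equal))
        .map(|(id, _)| id)
}
```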
Auto-promotion fired when any variant's sends counter crossed the N=50 threshold and its posterior expected reward exceeded that of every other variant by a margin of 0.03 (3 percentage points). Below that margin, sampling continued. The margin prevented premature promotion when variants were within noise.
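The promotion check reduces to a few comparisons over posterior means. A sketch under the same illustrative names as above:

```rust
// Illustrative promotion check: a variant is promoted once it has at least
// 50 sends and its posterior mean beats every rival's by the 0.03 margin.
struct VariantStats {
    id: i64,
    sends: u32,
    alpha: f64,
    beta: f64,
}

fn posterior_mean(v: &VariantStats) -> f64 {
    v.alpha / (v.alpha + v.beta)
}

fn promotion_candidate(variants: &[VariantStats]) -> Option<i64> {
    const MIN_SENDS: u32 = 50;
    const MARGIN: f64 = 0.03;
    variants.iter().find_map(|v| {
        let clears_margin = variants
            .iter()
            .filter(|other| other.id != v.id)
            .all(|other| posterior_mean(v) - posterior_mean(other) >= MARGIN);
        (v.sends >= MIN_SENDS && clears_margin).then(|| v.id)
    })
}
```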
One postmortem issue: the first implementation recalculated posteriors from raw event counts on every draw rather than storing running parameters. This hit Postgres read contention under concurrent sends. Switching to stored alpha/beta floats updated incrementally eliminated the contention. A second issue: webhook delivery for open events was not exactly-once. A dedup table keyed on (message_id, event_type) with a unique index prevented double-counting.
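The dedup gate sits in front of every posterior update. A plausible shape for it; the table name and SQL are assumptions, and only the (message_id, event_type) key and unique index come from the postmortem:

```rust
// Assumed dedup insert: the unique index on (message_id, event_type) makes
// the INSERT a no-op for replayed webhooks. A rows-affected count of zero
// means the event was already seen and the posterior update is skipped.
const DEDUP_SQL: &str = "
    INSERT INTO webhook_dedup (message_id, event_type)
    VALUES ($1, $2)
    ON CONFLICT (message_id, event_type) DO NOTHING";
```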
CHAPTER 04
Offline evaluation ran on 10,486 historical sends from the prior rule-based system. Thompson sampling reduced simulated cumulative regret by 31% relative to round-robin over the first 200 sends per campaign. By send 150, it had converged to the best variant in 89% of simulated campaigns. Round-robin had not converged by send 200 in any campaign.
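The shape of that comparison is easy to reproduce on synthetic data. A toy replay with hypothetical per-variant open rates (not the 10,486-send historical set), again assuming statrs 0.17 with rand 0.8:

```rust
use rand::distributions::Distribution;
use rand::Rng;
use statrs::distribution::Beta;

// Per-send regret is the gap between the best variant's true open rate and
// the chosen variant's true open rate, accumulated over the campaign.
fn cumulative_regret(true_rates: &[f64], sends: usize, thompson: bool) -> f64 {
    let mut rng = rand::thread_rng();
    let best = true_rates.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let mut alpha = vec![1.0; true_rates.len()];
    let mut beta = vec![1.0; true_rates.len()];
    let mut regret = 0.0;
    for t in 0..sends {
        let pick = if thompson {
            // One Beta draw per variant; highest sample wins the send.
            (0..true_rates.len())
                .map(|i| (i, Beta::new(alpha[i], beta[i]).unwrap().sample(&mut rng)))
                .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
                .unwrap()
                .0
        } else {
            t % true_rates.len() // round-robin baseline
        };
        let opened = rng.gen_bool(true_rates[pick]);
        if opened { alpha[pick] += 1.0 } else { beta[pick] += 1.0 }
        regret += best - true_rates[pick];
    }
    regret
}

fn main() {
    let rates = [0.10, 0.14, 0.19]; // hypothetical open rates per variant
    let ts = (0..500).map(|_| cumulative_regret(&rates, 200, true)).sum::<f64>() / 500.0;
    let rr = (0..500).map(|_| cumulative_regret(&rates, 200, false)).sum::<f64>() / 500.0;
    println!("Thompson {ts:.1} vs round-robin {rr:.1} mean regret over 200 sends");
}
```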
Online results from the first three live campaigns: mean open rate on Thompson-selected allocations was 19.2% versus 13.7% on the fixed 50/50 holdout. The auto-promotion path eliminated manual review on routine campaigns; the remaining 12% of campaigns, where the margin between the top two variants stayed below 0.03 at N=200, were escalated for human review.
CHAPTER 05
DECISION · 01
Posterior storage over recomputation. Storing running alpha/beta floats and updating them incrementally cost one additional column write per event but eliminated read contention entirely.
DECISION · 02
Dedup is not optional. Webhook-delivered events are not exactly-once. Any system that uses open or click signals as posterior updates must dedup at the store layer. Inflated alphas cause premature exploitation.
DECISION · 03
Beta-Bernoulli conjugacy scales. For binary outcomes at the scale of cold email, the conjugate model was sufficient. No deep learning required. The infrastructure cost for the entire bandit subsystem was two Postgres columns and one Rust crate.