We replaced a Python fan-in that dropped ticks under load with a Rust multi-task aggregator handling 80,000 ticks per second across 10 exchanges at 3.1% CPU.
80K tick/s
Peak throughput
10
Exchanges simultaneous
0 rows
Tick loss on reconnect
3.1%
CPU aggregate (market open)
CHAPTER 01
Ten crypto and equity exchanges each publish a different WebSocket protocol. Binance sends order book deltas with U/u sequence number bounds and expects the client to maintain local book state. OKX drops connections silently every 30 seconds unless it receives a proprietary text-frame ping, not a WebSocket protocol-level ping frame. KuCoin rotates its endpoint tokens every 30 seconds and requires re-subscribing all channels after each rotation. Kraken uses a channel subscription model with numeric channel IDs that change per connection.
Our first attempt used Python's asyncio and the websockets library. At three exchanges the fan-in worked. At seven, JSON parsing overhead began causing event-loop lag spikes during high-volatility sessions: we measured 80ms to 600ms tick delays during the August 2025 VIX spike, caused by the GIL preventing JSON decoding from being parallelized across CPU cores while a single event loop also managed 70+ concurrent coroutines. At 10 exchanges the system went from delaying ticks to dropping them, because the Redis write queue fell behind.
The hard constraint: missing ticks is not acceptable for minute-bar construction. A 1-minute bar is wrong if even one high or low tick is absent. For the backtestable dataset we needed 100% completeness over the ingest window, which meant any architecture that could silently drop ticks under load was disqualified.
CHAPTER 02
The final design runs each exchange connector as an independent Tokio async task compiled into a single Rust binary. Independence is the critical property: one exchange's failure, disconnect, or rate-limit event cannot affect any other connector's state or progress. Each task owns its own WebSocket connection, its own local sequence state, and its own Redis connection from the pool.
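A minimal sketch of that layout, assuming a Tokio `JoinSet` supervising one long-lived task per exchange. The enum, restart policy, and helper names are illustrative, not the production code:

```rust
use std::time::Duration;
use tokio::task::JoinSet;

type BoxErr = Box<dyn std::error::Error + Send + Sync>;

#[derive(Clone, Copy, Debug)]
enum Exchange {
    Binance,
    Okx,
    Kucoin,
    Kraken,
    // ...remaining six exchanges
}

/// One supervisor loop per exchange: any error tears down only this
/// connector, which then reconnects with a short backoff.
async fn run_connector(exchange: Exchange) {
    loop {
        if let Err(e) = connect_and_stream(exchange).await {
            eprintln!("{exchange:?} connector error: {e}; reconnecting");
            tokio::time::sleep(Duration::from_secs(1)).await;
        }
    }
}

/// Placeholder: in the real system this blocks on the WebSocket read loop
/// (connect, subscribe, decode, normalize, XADD) until something fails.
async fn connect_and_stream(_exchange: Exchange) -> Result<(), BoxErr> {
    Ok(())
}

#[tokio::main]
async fn main() {
    let mut tasks = JoinSet::new();
    for ex in [Exchange::Binance, Exchange::Okx, Exchange::Kucoin, Exchange::Kraken] {
        tasks.spawn(run_connector(ex));
    }
    // A task ending or panicking is logged; the other connectors keep running.
    while let Some(res) = tasks.join_next().await {
        if let Err(e) = res {
            eprintln!("connector task ended abnormally: {e}");
        }
    }
}
```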
Data flow: each connector task reads raw wire frames, decodes the exchange-specific payload, normalizes to a canonical struct, and appends to a Redis Stream keyed by exchange:symbol. The canonical tick struct carries symbol, exchange, open, high, low, close, volume, trade count, timestamp at microsecond precision, and a source field. Every downstream consumer reads from this single normalized interface.
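A plausible shape for that canonical struct, assuming serde for serialization. The field list comes from the text above; the exact names and types are an illustration:

```rust
use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct NormalizedTick {
    pub symbol: String,   // canonical symbol, e.g. "BTC-USDT"
    pub exchange: String, // source exchange identifier
    pub open: f64,
    pub high: f64,
    pub low: f64,
    pub close: f64,
    pub volume: f64,
    pub trade_count: u32,
    pub ts_us: i64,       // event timestamp, microseconds since epoch
    pub source: String,   // feed/channel the tick came from
}

/// Stream key convention from the text: one Redis Stream per exchange:symbol.
pub fn stream_key(tick: &NormalizedTick) -> String {
    format!("{}:{}", tick.exchange, tick.symbol)
}
```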
The Redis Streams consumer group model gives us one important property that pub/sub does not: consumers can acknowledge messages, so if the persistence worker crashes mid-batch, it re-reads unacknowledged messages from its last committed position. We do not lose ticks; we replay them.
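A sketch of that read-and-acknowledge loop, assuming the `redis` crate with its `streams` and Tokio features enabled; the case study names Redis Streams but not a client library, and the stream, group, and consumer names here are illustrative:

```rust
use redis::streams::{StreamReadOptions, StreamReadReply};
use redis::AsyncCommands;

const STREAM: &str = "binance:BTC-USDT"; // illustrative exchange:symbol key
const GROUP: &str = "persistence";
const CONSUMER: &str = "worker-1";

async fn consume_loop(mut con: redis::aio::MultiplexedConnection) -> redis::RedisResult<()> {
    // Create the group once; "$" starts from new messages. Ignore the error
    // if the group already exists.
    let _: Result<(), _> = con.xgroup_create_mkstream(STREAM, GROUP, "$").await;

    loop {
        // ">" asks for entries never delivered to this group. After a crash,
        // re-reading from "0" instead returns delivered-but-unacknowledged
        // entries, which is the replay path described above.
        let opts = StreamReadOptions::default()
            .group(GROUP, CONSUMER)
            .count(500)
            .block(5_000);
        let reply: StreamReadReply = con.xread_options(&[STREAM], &[">"], &opts).await?;

        for key in reply.keys {
            for entry in key.ids {
                // ...write the tick to ClickHouse here...
                // Acknowledge only after the write is durable.
                let _: i64 = con.xack(STREAM, GROUP, &[&entry.id]).await?;
            }
        }
    }
}
```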
ARCHITECTURE OVERVIEW
SOURCES: Rust 1.84, Tokio 1.40
TRANSFORM: tungstenite 0.21, validate + dedup
STORE: Redis 7.2 Streams, partitioned
QUERY: ClickHouse 26.3 + cache
CHAPTER 03
The OKX heartbeat bug took the longest to isolate. OKX sends a server-side text message every 30 seconds, and the client must respond within 5 seconds or the server drops the connection without sending a WebSocket close frame. The tungstenite library's built-in ping/pong machinery handles protocol-level pings automatically, but OKX's heartbeat is an application-layer ping disguised as a text message. We found it by instrumenting per-message timestamps on the receive loop: during quiet after-hours sessions, the OKX feed went silent exactly 30 seconds after the previous message. During high-volatility sessions, messages arrived frequently enough that the 30-second timer reset implicitly and the disconnect never occurred, which masked the bug.
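A sketch of a receive loop covering both failure modes: answer the application-layer text heartbeat, and treat prolonged silence as a dead connection. The "ping"/"pong" payloads and the 35-second idle budget are illustrative assumptions; the case study describes the behavior, not the exact frames:

```rust
use futures_util::{SinkExt, StreamExt};
use std::time::Duration;
use tokio::net::TcpStream;
use tokio_tungstenite::{tungstenite::Message, MaybeTlsStream, WebSocketStream};

type BoxErr = Box<dyn std::error::Error + Send + Sync>;
type Ws = WebSocketStream<MaybeTlsStream<TcpStream>>;

async fn okx_recv_loop(mut ws: Ws) -> Result<(), BoxErr> {
    loop {
        // 30s heartbeat interval plus grace: if nothing arrives in 35s, assume
        // the server silently dropped us and let the caller reconnect.
        match tokio::time::timeout(Duration::from_secs(35), ws.next()).await {
            Err(_elapsed) => return Err("okx: no frames for 35s, forcing reconnect".into()),
            Ok(None) => return Err("okx: stream closed by server".into()),
            Ok(Some(frame)) => match frame? {
                // Application-layer heartbeat: reply with a text frame, not a
                // WebSocket protocol pong (tungstenite already handles those).
                Message::Text(t) if t == "ping" => {
                    ws.send(Message::Text("pong".into())).await?;
                }
                Message::Text(t) => handle_payload(&t),
                _ => {} // binary/ping/pong/close frames handled elsewhere
            },
        }
    }
}

/// Placeholder for decode -> normalize -> XADD.
fn handle_payload(_raw: &str) {}
```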
Binance sequence tracking required persisting a per-symbol last_update_id in Redis. Gap recovery discards the local book state, requests a fresh depth snapshot from the REST endpoint, waits for the snapshot response, then replays deltas from that point. The process takes roughly 1.2 seconds per symbol per gap event. In production, 3 to 5 gaps per symbol per trading day are normal during the opening 15 minutes.
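The gap check itself reduces to comparing each delta's U/u bounds against the persisted last_update_id. A sketch, where the decision enum and struct layout are illustrative:

```rust
use serde::Deserialize;

#[derive(Deserialize)]
struct DepthDelta {
    #[serde(rename = "U")]
    first_update_id: u64, // first update ID covered by this delta
    #[serde(rename = "u")]
    final_update_id: u64, // last update ID covered by this delta
    // bids/asks omitted for brevity
}

enum Action {
    Apply,       // contiguous: apply the delta, persist final_update_id
    AlreadySeen, // stale delta, skip it
    Resync,      // gap detected: discard book, fetch REST snapshot, replay
}

fn check_sequence(last_update_id: u64, delta: &DepthDelta) -> Action {
    if delta.final_update_id <= last_update_id {
        Action::AlreadySeen
    } else if delta.first_update_id <= last_update_id + 1 {
        // Delta overlaps or continues what we have already applied.
        Action::Apply
    } else {
        // At least one delta was missed between last_update_id and U.
        Action::Resync
    }
}
```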
KuCoin's token rotation required requesting a new token before expiry and re-subscribing on a new connection while the old connection was still live, then closing the old one only after confirming subscription success. The overlap window prevents a subscription gap during the cutover.
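A sketch of that make-before-break ordering. Every type and helper here is a placeholder standing in for the real connector, since only the sequencing is the point:

```rust
type BoxErr = Box<dyn std::error::Error + Send + Sync>;

/// Placeholder connector; in the real system this owns the WebSocket and
/// the channel subscription state.
struct Connector;

impl Connector {
    async fn connect(_token: &str) -> Result<Connector, BoxErr> { Ok(Connector) }
    async fn subscribe_all_channels(&mut self) -> Result<(), BoxErr> { Ok(()) }
    async fn await_subscription_acks(&mut self) -> Result<(), BoxErr> { Ok(()) }
    async fn close(self) {}
}

/// Placeholder for the REST call that issues a fresh connection token.
async fn request_ws_token() -> Result<String, BoxErr> {
    Ok("fresh-token".to_string())
}

/// Make-before-break: the old connection closes only after the new one has
/// confirmed every channel subscription, so there is no cutover gap.
async fn rotate_connection(old: Connector) -> Result<Connector, BoxErr> {
    let token = request_ws_token().await?;
    let mut fresh = Connector::connect(&token).await?;
    fresh.subscribe_all_channels().await?;
    fresh.await_subscription_acks().await?;
    old.close().await; // the overlap window ends only here
    Ok(fresh)
}
```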
CHAPTER 04
Results were measured over a 14-day window with all 10 exchanges running simultaneously. The window included 47 disconnects, distributed across OKX (19), Binance (8), KuCoin (12), and Kraken (8). Zero resulted in data gaps in the ClickHouse bars_1m table, verified by comparing the expected bar count per symbol per hour against the actual stored count for every hour in the measurement period.
The Python fan-in implementation peaked at 18,000 ticks per second across 7 exchanges before tick loss appeared. The Rust implementation handles 80,000 ticks per second across 10 exchanges at 3.1% CPU. The rough efficiency factor is 14x throughput per CPU point, attributable to Tokio's zero-cost async model, Rust's JSON parsing speed, and the absence of GIL contention.
CHAPTER 05
DECISION · 01
Chose per-exchange task isolation over a shared event loop. The tradeoff: more code to write. What it gave us: during the period when we were debugging KuCoin token rotation, that connector was crashing every 30 seconds while all other connectors ran normally. In the Python fan-in, that instability would have blocked the shared event loop.
DECISION · 02
Chose Redis Streams over Kafka. The tradeoff accepted: no persistent log replay beyond the MAXLEN window of roughly 12 to 15 minutes of peak-throughput history. If we needed multi-hour replay durability or multi-datacenter consumption, Kafka would be the right choice.
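The retention bound comes from capping every append. A sketch using an approximate MAXLEN on XADD, again assuming the `redis` crate; the per-stream cap below is an illustrative number, not the production value, which was sized to hold roughly 12 to 15 minutes of history at peak:

```rust
use redis::streams::StreamMaxlen;
use redis::AsyncCommands;

async fn append_tick(
    con: &mut redis::aio::MultiplexedConnection,
    key: &str,     // exchange:symbol stream key
    payload: &str, // serialized NormalizedTick
) -> redis::RedisResult<String> {
    con.xadd_maxlen(
        key,
        StreamMaxlen::Approx(1_000_000), // "~" trimming: cheap, slightly over-keeps
        "*",                             // let Redis assign the entry ID
        &[("tick", payload)],
    )
    .await
}
```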
DECISION · 03
The sequence gap detection logic is the most maintenance-intensive part of the system. Binance changes its delta message schema approximately twice per year. The abstraction boundary that saved us is the normalized tick struct: downstream consumers never see exchange-specific fields, so all eight downstream consumers needed zero changes across two Binance API updates since deployment.