Argus needed a unified market data foundation across the major crypto exchanges.
~186/min
Sustained throughput (per exchange)
93.64M
Rows ingested (OKX alone)
3.7 MB
Resident memory (OKX)
840
Reconnect cycles / 14 hrs (KuCoin)
CHAPTER 01
Argus needed a unified market data foundation across the major crypto exchanges. Each exchange publishes real-time order book and trade data over WebSocket, but every venue runs a different authentication model, a different subscription protocol, and a different message envelope. OKX uses a channel-based subscribe/unsubscribe model with token rotation. KuCoin requires a server-issued token before the WebSocket handshake can begin. Binance uses topic strings with a limit of 1,024 subscriptions per connection. Deribit delivers option chain data as incremental deltas against a running snapshot.
Sourcing data from a single venue was straightforward. Sourcing from ten simultaneously introduced three compounding problems. First, upstream connections failed unpredictably: KuCoin's infrastructure would close sockets mid-stream, and Binance connections would go silent without emitting any close frame. Second, each exchange delivered OHLCV and tick data in a different message structure, requiring per-source deserialization into a shared normalized type before a common insert path could be used. Third, ClickHouse insert throughput had a ceiling: bulk inserts are efficient, but per-row async writes at sustained volume would saturate the async insert buffer unless backpressure was coordinated across all ten feeds.
The broader goal was not just to collect data. The 1,400-feature Argus engine (argus-features, compiled with AVX2 SIMD) and the 5-regime detection pipeline (argus-regime, 8 timeframes) depended on consistent, low-latency bar data. A stale feed silently corrupted downstream signal quality.
---
CHAPTER 02
Ten separate Rust binaries were compiled: argus-ingest-okx, argus-ingest-binance, argus-ingest-kraken, argus-ingest-bybit, argus-ingest-kucoin, argus-ingest-coinbase, argus-ingest-deribit, argus-ingest-gateio, argus-ingest-bitget, and argus-ingest-hyperliquid. Each binary owned exactly one exchange. This decision had three advantages: process-level isolation meant one exchange crash did not bring down the others; per-binary resource profiles could be tuned independently (Binance required 96.7 MB resident due to high pair count; OKX and Kraken ran at 3.7 MB and 4.9 MB respectively); and systemd supervision gave each process an independent restart policy.
Each binary followed the same internal structure. A connection manager task opened the WebSocket, handled the exchange-specific handshake, and emitted raw message bytes into an unbounded Tokio channel. A normalization task consumed from that channel, parsed the exchange-specific JSON envelope, and emitted a typed Bar struct into a second channel. A writer task batched Bar structs and flushed to ClickHouse via HTTP INSERT using the clickhouse-rs crate against port 8123.
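A minimal sketch of that three-task layout, assuming hypothetical type and helper names and a bounded writer-side channel (the source does not specify field names or the channel capacity):

```rust
use tokio::sync::mpsc;

/// Normalized 1-minute bar shared by every adapter. Field names are
/// assumptions; the composite key (symbol, source, ts) follows Chapter 03.
#[derive(Debug, Clone)]
struct Bar {
    symbol: String,
    source: String,
    ts: i64, // minute-aligned epoch millis (assumed resolution)
    open: f64,
    high: f64,
    low: f64,
    close: f64,
    volume: f64,
}

fn parse_exchange_frame(_frame: &[u8]) -> Option<Bar> {
    // Per-exchange serde deserialization of the JSON envelope lives here.
    None
}

#[tokio::main]
async fn main() {
    // Raw frames: unbounded, so the socket reader never stalls on a slow consumer.
    let (raw_tx, mut raw_rx) = mpsc::unbounded_channel::<Vec<u8>>();
    // Normalized bars: bounded handoff to the writer (capacity is an assumption).
    let (bar_tx, mut bar_rx) = mpsc::channel::<Bar>(10_000);

    // Connection manager: exchange-specific handshake + subscribe, then forward
    // raw message bytes. The WebSocket client itself is omitted in this sketch.
    tokio::spawn(async move {
        let _ = raw_tx.send(br#"{"example":"frame"}"#.to_vec());
    });

    // Normalizer: parse the envelope into typed bars.
    tokio::spawn(async move {
        while let Some(frame) = raw_rx.recv().await {
            if let Some(bar) = parse_exchange_frame(&frame) {
                if bar_tx.send(bar).await.is_err() {
                    break; // writer gone
                }
            }
        }
    });

    // Writer: batch and flush to ClickHouse over HTTP :8123 (expanded in Chapter 03).
    while let Some(_bar) = bar_rx.recv().await {
        // batching + INSERT happens here
    }
}
```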
The ClickHouse schema for argus.bars_1m used the Gorilla codec on all price columns, DoubleDelta on the timestamp column, and LowCardinality on the symbol string, with ZSTD(3) applied as the final compression layer. This combination delivered a 5 to 10x compression improvement over the previous QuestDB store while keeping p99 query latency in the 20 to 100 ms range, versus 100 to 500 ms before.
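The combination is easiest to see as DDL. Below is a plausible reconstruction that also folds in the ReplacingMergeTree dedup described in Chapter 03; the column names, types, ordering key, and the codecs on the source and volume columns are assumptions, not the actual schema:

```rust
/// Plausible reconstruction of the argus.bars_1m table definition. The Gorilla
/// and DoubleDelta codecs, ZSTD(3) layer, LowCardinality symbol, and the
/// ReplacingMergeTree engine come from the text; everything else is assumed.
const BARS_1M_DDL: &str = r#"
CREATE TABLE IF NOT EXISTS argus.bars_1m
(
    symbol LowCardinality(String),
    source LowCardinality(String),
    ts     DateTime64(3)  CODEC(DoubleDelta, ZSTD(3)),
    open   Float64        CODEC(Gorilla, ZSTD(3)),
    high   Float64        CODEC(Gorilla, ZSTD(3)),
    low    Float64        CODEC(Gorilla, ZSTD(3)),
    close  Float64        CODEC(Gorilla, ZSTD(3)),
    volume Float64        CODEC(Gorilla, ZSTD(3))
)
ENGINE = ReplacingMergeTree
ORDER BY (symbol, source, ts)
"#;
```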
---
ARCHITECTURE OVERVIEW
[Architecture diagram: WebSocket (RFC 6455) producers built on Rust 1.84 / Tokio 1.40 feed an ordered, durable stream into a ClickHouse 24.3 sink (storage, ack + replay), which downstream consumers read from.]
CHAPTER 03
Backpressure strategy. The unbounded Tokio channel between the WebSocket receiver and the normalizer acted as a pressure valve during burst ingestion: the socket was always drained promptly, and bursts accumulated in the channel instead of stalling the connection. The normalizer applied a configurable batch window (50 ms default) before forwarding to the writer. The writer blocked on ClickHouse INSERT acknowledgment before accepting the next batch, providing backpressure from the database back up to the normalizer. If the writer fell behind due to ClickHouse load, the normalizer slowed its consumption of the raw channel and the backlog accumulated there until the writer caught up. This avoided out-of-memory pressure in the writer path during exchange reconnect floods, when buffered messages arrived in large bursts.
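A sketch of that writer loop, expanding the stub in the Chapter 02 sketch; the flush function stands in for the actual clickhouse-rs INSERT call, and the error handling is illustrative:

```rust
use std::time::Duration;
use tokio::sync::mpsc::Receiver;
use tokio::time::{sleep_until, Instant};

struct Bar; // stand-in for the Bar struct from the pipeline sketch

/// Writer task: collect bars inside the batch window (50 ms default), then
/// block on the INSERT acknowledgment before pulling more. Because the channel
/// feeding this task is bounded, a slow ClickHouse slows the normalizer
/// rather than growing writer-side memory.
async fn run_writer(mut rx: Receiver<Bar>, window: Duration) {
    loop {
        // Wait for the first bar of the next batch.
        let Some(first) = rx.recv().await else { break };
        let mut batch = vec![first];
        let deadline = Instant::now() + window;

        // Drain until the batch window elapses or upstream closes.
        loop {
            tokio::select! {
                _ = sleep_until(deadline) => break,
                maybe = rx.recv() => match maybe {
                    Some(bar) => batch.push(bar),
                    None => break,
                },
            }
        }

        // Blocking on the acknowledgment here is the end-to-end backpressure point.
        if let Err(err) = flush_to_clickhouse(&batch).await {
            // Real code retries and alerts; a write timeout that silently drops
            // batches is the failure mode described in Chapter 04.
            eprintln!("INSERT failed: {err}");
        }
    }
}

/// Stand-in for the HTTP INSERT issued through the clickhouse-rs crate on :8123.
async fn flush_to_clickhouse(_rows: &[Bar]) -> Result<(), std::io::Error> {
    Ok(())
}
```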
At-least-once semantics and idempotency. The ingest layer operated under at-least-once delivery. Each Bar struct carried a composite key of (symbol, source, ts). ClickHouse's ReplacingMergeTree engine deduplicated rows with identical keys during background merges. On reconnect after a socket drop, the binary re-subscribed from the beginning of the current minute window, which could produce duplicate bars for already-ingested timestamps. The deduplication merge handled this without operator intervention.
Replay capability. For historical backfill, a separate argus-backfill binary was written using the same normalization pipeline, but sourced from the exchanges' REST APIs (seven venues) rather than live sockets. The same Bar struct and INSERT path were reused, meaning backfill and live ingest wrote to the same table with no schema divergence. This allowed the system to backfill gaps (for example, the 25-day Binance staleness window discovered in the April 25 audit) by replaying the REST endpoint for that period without touching any other code path.
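A sketch of what reusing the live path for backfill can look like; the REST client, pagination, and bar granularity here are assumptions, not the argus-backfill internals:

```rust
use tokio::sync::mpsc::Sender;

struct Bar {
    ts: i64, // minute-aligned epoch millis (assumed)
}

/// Hypothetical backfill driver: the same Bar type and the same writer path as
/// the live ingest, fed from a paginated REST endpoint instead of a socket.
async fn backfill_range(bar_tx: Sender<Bar>, start_ms: i64, end_ms: i64) {
    let mut cursor = start_ms;
    while cursor < end_ms {
        // Stand-in for the per-exchange REST kline/candle client.
        let bars = fetch_rest_klines(cursor, end_ms).await;
        let Some(last) = bars.last() else { break };
        cursor = last.ts + 60_000; // advance one bar past the last page

        for bar in bars {
            // Same channel, same writer, same table: no schema divergence, and
            // ReplacingMergeTree absorbs any overlap with live data.
            if bar_tx.send(bar).await.is_err() {
                return; // writer closed
            }
        }
    }
}

async fn fetch_rest_klines(_from_ms: i64, _to_ms: i64) -> Vec<Bar> {
    Vec::new() // REST pagination omitted in this sketch
}
```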
Lag monitoring. A dedicated argus-health binary queried ClickHouse every 60 seconds for per-source data age.
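The query itself is not reproduced in this write-up; a plausible per-source freshness check, assuming the column names from the schema sketch above, looks like this:

```rust
/// Plausible reconstruction of the freshness query run by argus-health. The
/// actual SQL is not shown in the source. argus-health compares lag_seconds
/// against the 300 s threshold and hands violations to argus-alert-dispatcher.
const LAG_QUERY: &str = r#"
SELECT
    source,
    max(ts)                             AS latest_bar,
    dateDiff('second', max(ts), now())  AS lag_seconds
FROM argus.bars_1m
GROUP BY source
ORDER BY lag_seconds DESC
"#;
```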
Any source with lag exceeding 300 seconds triggered a structured alert via argus-alert-dispatcher. During normal operation, OKX sustained a 4 to 5 minute data age and Kraken roughly 1 minute. KuCoin and Coinbase exhibited a recurring pattern: 60-second reconnect loops (visible in logs as ERROR: WebSocket closed by server; INFO: Reconnecting in 60s) that produced zero writes despite active process uptime, indicating the connection manager was restoring the socket but the subscription confirmation was failing silently.
Consumer group model. The live ingest binaries did not use a message queue. They wrote directly to ClickHouse. The downstream regime and feature pipelines polled ClickHouse on their own cadence, making the storage layer the implicit broker. This avoided the operational overhead of a Kafka or Pulsar deployment at the cost of some latency precision. The regime pipeline polled every 30 seconds; the feature engine polled every 60 seconds.
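A sketch of that implicit-broker polling loop; the query helper and high-water-mark handling are assumptions about how a consumer might do it, not the regime or feature code itself:

```rust
use std::time::Duration;

struct Bar {
    ts: i64, // epoch millis (assumed)
}

/// Poll-the-storage-layer pattern used by the downstream pipelines: track a
/// high-water mark and query only newer rows on each tick (30 s for the regime
/// pipeline, 60 s for the feature engine).
async fn poll_new_bars(poll_interval: Duration) {
    let mut last_seen_ts: i64 = 0; // persisted across restarts in real code
    let mut ticker = tokio::time::interval(poll_interval);
    loop {
        ticker.tick().await;
        // Stand-in for SELECT ... WHERE ts > last_seen_ts over the same HTTP interface.
        let bars = fetch_bars_since(last_seen_ts).await;
        if let Some(max_ts) = bars.iter().map(|b| b.ts).max() {
            last_seen_ts = max_ts;
        }
        // hand `bars` to the regime / feature computation here
    }
}

async fn fetch_bars_since(_ts: i64) -> Vec<Bar> {
    Vec::new()
}
```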
---
CHAPTER 04
- Sustained throughput: the OKX feed alone produced approximately 186 rows per minute across 1,200 pairs during active market hours, or roughly 3.1 rows per second. All ten feeds combined delivered an estimated 800 to 1,200 rows per minute during peak overlap of the US and Asia trading sessions.
- Row accumulation: OKX reached 93.64 million rows in argus.bars_1m from the live feed. Kraken accumulated 1.08 million rows. The bars_1m table reached 714.4 million rows in total, including historical backfill.
- Process memory: the OKX ingest ran at 3.7 MB resident, Kraken at 4.9 MB, Coinbase at 8.2 MB. Binance ran at 96.7 MB due to subscribing across 1,000+ pairs in a single process, which makes it the primary candidate for per-pair channel sharding in a future revision.
- Lag during incidents: during the KuCoin 14-hour staleness window, the process logged 840 reconnect cycles. Each cycle successfully re-established the WebSocket and subscription confirmation, but a database write timeout in the writer task caused all batches to be silently dropped. The lag monitor fired after 5 minutes and was visible in the health dashboard, but no automated remediation was in place at audit time.
- p99 ClickHouse insert latency: the async insert buffer flushed every 10 MB or every 200 ms. Under sustained load, p99 flush acknowledgment from ClickHouse was measured at under 80 ms.
---
CHAPTER 05
DECISION · 01
Silent process death is worse than noisy process death. Binance and Bybit exhibited the most damaging failure mode: processes reported as running via ps aux but produced zero log output and zero database writes for 25 and 35 days respectively. A healthy process should emit structured log lines at least every 60 seconds, even if they are only heartbeat records. Adding a mandatory periodic write (a synthetic keep-alive row or a health table ping) would have surfaced these failures within minutes rather than weeks.
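A minimal sketch of the recommended heartbeat, assuming a shared counter the writer bumps on each acknowledged INSERT (the health-table ping is noted in comments only):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::time::Duration;

/// Heartbeat task: emit a structured log line (and, ideally, a keep-alive row
/// into a health table) every 60 seconds regardless of market activity, so a
/// silently dead feed shows up as a flatlined heartbeat within minutes.
async fn heartbeat(source: &'static str, rows_written: Arc<AtomicU64>) {
    let mut ticker = tokio::time::interval(Duration::from_secs(60));
    loop {
        ticker.tick().await;
        let written = rows_written.swap(0, Ordering::Relaxed);
        // The writer increments rows_written after each acknowledged INSERT, so
        // `written == 0` for several consecutive ticks is itself an alertable signal.
        println!(r#"{{"level":"INFO","source":"{source}","rows_last_60s":{written}}}"#);
    }
}
```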
DECISION · 02
Exchange authentication diversity is an operational risk. KuCoin's token-based WebSocket authentication required a pre-connection REST call to obtain a short-lived token. When that REST call failed due to rate limiting, the entire binary restarted its connection loop without any data written. Separating the auth renewal lifecycle from the message processing lifecycle would prevent auth failures from blocking ingestion.
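One way to decouple the two lifecycles, sketched with a tokio watch channel; the renewal cadence, backoff, and token endpoint details are assumptions:

```rust
use std::time::Duration;
use tokio::sync::watch;

/// Token renewal on its own loop: publish the latest token through a watch
/// channel and let the connection manager read the current value whenever it
/// (re)connects. A failed or rate-limited renewal backs off and keeps the last
/// good token instead of restarting the whole connection loop.
async fn token_renewer(tx: watch::Sender<Option<String>>) {
    loop {
        match fetch_ws_token().await {
            Ok(token) => {
                let _ = tx.send(Some(token));
                // Renew well before expiry; the cadence here is an assumption.
                tokio::time::sleep(Duration::from_secs(300)).await;
            }
            Err(_) => {
                // Rate-limited or failed: back off, keep the last good token.
                tokio::time::sleep(Duration::from_secs(10)).await;
            }
        }
    }
}

/// Stand-in for the pre-connection REST call that issues the short-lived
/// WebSocket token.
async fn fetch_ws_token() -> Result<String, Box<dyn std::error::Error + Send + Sync>> {
    Ok("token".to_string())
}
```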
DECISION · 03
One binary per exchange was the right call. The audit confirmed that OKX and Kraken ran fresh (under 5 minutes lag) while Binance and Bybit were stale for weeks. If all exchanges had shared a single process, the debugging surface would have been much larger and a crash in one exchange's handler could have disrupted all others. Process isolation made the healthy and broken feeds immediately apparent.
DECISION · 04
ClickHouse codec selection matters at this scale. The Gorilla codec on OHLCV columns, combined with DoubleDelta on timestamps, delivered approximately 6x compression on the bars_1m table compared to raw storage. At 714 million rows, this represented the difference between roughly 140 GB and over 800 GB of storage on the same hardware.