We built a revision-aware FRED pipeline with a 90-day revision lookback, growing coverage from 32 to 63 macro series in one sprint.
63 · FRED series tracked (from 32)
100,422 · Total macro_data rows
38 sec · Run time (full 63-series)
14 · Revisions detected (15 days)
CHAPTER 01
Regime classification requires macroeconomic ground truth. A signal fired into a rising-rate environment where the yield curve is inverted and unemployment is at a 52-year low means something categorically different from the same signal fired during an easing cycle with credit spreads widening. Without that context, signal confidence scores are noise-adjusted price patterns with no structural anchor.
FRED publishes most major series with a 1-day to 30-day lag, but it also publishes revisions to historical values that can change a prior month's reading by 10 to 30 basis points. A pipeline that only ingests new observations and ignores revisions will, over time, hold stale historical readings that diverge from the official series. That divergence silently biases any regime model trained on the stored data.
We started with a Python pandas-datareader approach. It had no revision detection, no backfill logic for gaps, no per-series freshness tracking, and no rate-limit handling. When FRED returned a 503 during high-traffic periods around major economic releases like payroll Fridays, the pandas pipeline failed silently and the stale data persisted indefinitely. Coverage at audit time was 32 of 63 critical FRED series (50.8%). The 31 missing series included GS1, GS10, TB3MS, BAA10Y, UNRATESA, and ISMNMI.
CHAPTER 02
The production pipeline is a cron-triggered Rust binary that runs daily at 06:00 UTC, two hours after FRED's typical overnight publication window. It reads a priority-ordered config list of 63 series IDs, determines which need updates, fetches only the necessary observations, detects revisions to historical values, and writes to ClickHouse.
Priority tiers determine behavior during partial outages. Tier 1 (Fed rates, employment depth, yield curve: 8 series) fetches synchronously on every run with a 30-second timeout. If a Tier 1 series fails, the run reports an error and retries immediately. Tier 2 (business surveys, housing, disposable income: 10 series) fetches synchronously but failures are tolerated for up to 24 hours before alerting. Tier 3 (regional variants, NIPA sub-components: 45 series) fetches with low priority and failures are not alerted for up to 72 hours.
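A minimal sketch of how that tier policy could be encoded. The Tier enum, the TierPolicy struct, and the 30-second timeouts on Tiers 2 and 3 are illustrative assumptions (only the Tier 1 timeout is specified above); this is not the production type layout.

```rust
// Sketch of the priority-tier policy. Only the Tier 1 timeout (30 s)
// is from the case study; the shared timeout on Tiers 2 and 3 is assumed.
use std::time::Duration;

#[derive(Clone, Copy, Debug)]
enum Tier {
    One,   // Fed rates, employment depth, yield curve (8 series)
    Two,   // business surveys, housing, disposable income (10 series)
    Three, // regional variants, NIPA sub-components (45 series)
}

#[derive(Debug)]
struct TierPolicy {
    fetch_timeout: Duration,
    /// How long failures are tolerated before alerting.
    /// None means report the error and retry immediately.
    alert_after: Option<Duration>,
}

fn policy(tier: Tier) -> TierPolicy {
    match tier {
        Tier::One => TierPolicy {
            fetch_timeout: Duration::from_secs(30),
            alert_after: None, // error + immediate retry
        },
        Tier::Two => TierPolicy {
            fetch_timeout: Duration::from_secs(30), // assumed
            alert_after: Some(Duration::from_secs(24 * 3600)),
        },
        Tier::Three => TierPolicy {
            fetch_timeout: Duration::from_secs(30), // assumed
            alert_after: Some(Duration::from_secs(72 * 3600)),
        },
    }
}
```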
On subsequent runs we fetch from the last stored observation date minus 90 days. The 90-day lookback is the revision window: FRED typically publishes revisions within 60 days of the initial release, and 90 days gives us a safety margin. A SHA-256 revision hash of the last 12 observation values lets us skip series entirely on days when no data changed, reducing the run from 63 API calls to approximately 5 calls on weekends.
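A sketch of the hash-skip check under stated assumptions: we assume a cheap fetch of just the most recent observations supplies `latest`, and that the previous run's digest is cached (in Redis, per the architecture below). The sha2 and hex crates and all function names are illustrative.

```rust
// Sketch of the revision-hash skip (sha2 and hex crates assumed).
// Hash the last 12 observation values; if the digest matches the one
// cached on the previous run, the series is skipped entirely.
use sha2::{Digest, Sha256};

fn revision_hash(values: &[Option<f64>]) -> String {
    let mut hasher = Sha256::new();
    for v in values.iter().rev().take(12) {
        match v {
            Some(x) => hasher.update(x.to_le_bytes()),
            None => hasher.update(b"."), // FRED's missing marker stays hashable
        }
    }
    hex::encode(hasher.finalize())
}

fn needs_fetch(latest: &[Option<f64>], cached_digest: Option<&str>) -> bool {
    match cached_digest {
        Some(d) => revision_hash(latest) != d,
        None => true, // no cached digest yet: always fetch the full window
    }
}
```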
ARCHITECTURE OVERVIEW
SOURCES: FRED API v2, pulled by the Rust 1.84 binary on a systemd timer
TRANSFORM: validate + dedup
STORE: ClickHouse 26.3 (partitioned), with Redis 7.2 as the revision cache
QUERY: ClickHouse + cache
CHAPTER 03
The revision detection bug that cost us two days: FRED sometimes publishes an observation with the value "." (a literal period) to indicate missing data. Our initial parser treated "." as a parse error and skipped the observation entirely. The consequence was that series with sparse publication schedules had entire years of placeholder values silently omitted, creating gaps in ClickHouse that looked like ingestion failures but were actually FRED's explicit not-available markers.
The fix stored "." as a NULL in the ClickHouse value column (Nullable(Float64)) and tracked it as a valid observation. The vintage_date is a second timestamp we did not originally capture. FRED's API returns a realtime_start and realtime_end for each observation, representing when that specific value was first published and when it was superseded. For backtesting, vintage dating matters: a model trained in 2024 would have seen the first-vintage employment number, not the final revised number. We added vintage tracking to the schema but have not yet backfilled historical vintages.
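A minimal sketch of the fixed parse path, assuming a hypothetical Observation struct; the field names mirror the FRED response fields named above, and the Option value maps to the Nullable(Float64) column.

```rust
// Sketch of the fixed observation parser: FRED's "." marker becomes
// None (stored as NULL) instead of a parse error that drops the row.
// The Observation struct and its field names are illustrative.
#[derive(Debug)]
struct Observation {
    date: String,           // observation date, e.g. "2024-01-01"
    realtime_start: String, // vintage: when this value was first published
    realtime_end: String,   // vintage: when this value was superseded
    value: Option<f64>,     // None = FRED's "." not-available marker
}

fn parse_value(raw: &str) -> Result<Option<f64>, std::num::ParseFloatError> {
    if raw == "." {
        Ok(None) // explicit not-available marker: a valid observation
    } else {
        raw.parse::<f64>().map(Some)
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    #[test]
    fn dot_is_missing_not_error() {
        assert_eq!(parse_value(".").unwrap(), None);
        assert_eq!(parse_value("3.75").unwrap(), Some(3.75));
    }
}
```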
The ISMNMI 18-hour staleness incident was caused by FRED returning a 503 on the day ISM published its services (non-manufacturing) survey results. Our retry logic of three attempts at 5-minute intervals could not absorb an 18-hour API outage. The fix added jittered retries with backoff up to 4 hours for non-critical series on release days.
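A sketch of the jittered backoff consistent with that fix: the 5-minute base and 4-hour ceiling come from the incident above, while the full-jitter strategy and the rand crate usage are assumptions.

```rust
// Sketch of jittered exponential backoff (rand crate assumed).
// Delays grow from a 5-minute base and are capped at 4 hours, with
// full jitter so retries don't align with every other client hitting
// FRED after an outage.
use rand::Rng;
use std::time::Duration;

fn backoff_delay(attempt: u32) -> Duration {
    let base = Duration::from_secs(300);     // 5-minute base interval
    let cap = Duration::from_secs(4 * 3600); // 4-hour ceiling
    let exp = base.saturating_mul(2u32.saturating_pow(attempt));
    let capped = exp.min(cap);
    // Full jitter: sleep a uniform random duration in [0, capped].
    let jitter_secs = rand::thread_rng().gen_range(0..=capped.as_secs());
    Duration::from_secs(jitter_secs)
}
```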
CHAPTER 04
Measured over a 15-day window after the pipeline shipped. FRED series tracked grew from 32 to 63. Total macro_data rows reached 100,422. A full 63-series run takes 38 seconds median, dropping to 4 seconds on hash-skip days. We detected and applied 14 revisions over the window: six BLS employment revisions, four BEA GDP revisions, and four PCE component revisions. In all 14 cases the new value was within 0.2 percentage points of the prior value, consistent with typical BLS/BEA revision magnitudes. Tier 1 staleness incidents: zero. Tier 2 staleness incidents: one, the 18-hour ISMNMI delay.
CHAPTER 05
DECISION · 01
We chose a 90-day revision lookback window. The tradeoff: every run fetches up to 90 days of overlap per series, even when no revisions exist. The extra FRED API calls are cheap. What skipping it would cost: undetected historical revisions silently corrupting the regime model's training data.
DECISION · 02
The revision hash in Redis is a performance optimization, not a correctness mechanism. If Redis is unavailable the pipeline falls back to fetching the full 90-day window for every series. Correctness is maintained. The 38-second run time would increase to roughly 90 seconds.
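A sketch of that fallback, assuming the redis crate and an illustrative key scheme: a cache miss and a Redis error both degrade to "no digest", which forces the full 90-day fetch rather than failing the run.

```rust
// Sketch of the Redis fallback (redis crate assumed; key name illustrative).
// Returning None here means "fetch the full 90-day window", so Redis is
// never load-bearing for correctness, only for run time.
use redis::Commands;

fn cached_digest(conn: &mut redis::Connection, series_id: &str) -> Option<String> {
    conn.get::<_, Option<String>>(format!("revhash:{series_id}"))
        .ok()      // Redis down or timed out: treat as a cache miss
        .flatten() // key absent: also a cache miss
}
```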
DECISION · 03
What we would do differently: add vintage tracking from the start. Retrofitting the schema is straightforward; backfilling the historical vintages requires several thousand API calls for a monthly series with 70 years of history. The backfill will be built, but it was not scoped into the initial build.