Project: Yellowstone-Vixen (Real-time Indexer)

Indexer Subsystem – Production Architecture & Deep Specification

Executive Summary: The Solana Indexer subsystem is a reorg-safe, low-latency, high-throughput ingestion and query layer for real-time token trades, balances, and market data. It is the single source of truth for the trading terminal, accepting Firehose-streamed blocks, parsing SPL Token, Pump.fun, Raydium, and Meteora protocols, persisting normalized events to Postgres, maintaining real-time view caches in Redis, and exposing HTTP/WebSocket APIs via Axum.
Core Invariants:
  • Exactly-once processing per (signature, ix_index) tuple (Postgres UNIQUE + ON CONFLICT; see the sketch after this list)
  • Reorg safety within 64-slot Solana confirmation window
  • Sub-150ms p99 latency for WS subscribers
  • Sustained 3,000+ TPS single-node throughput
  • Idempotent parser re-execution on restarts, with dead-letter queues for malformed instructions
  • Sub-1s staleness in terminal views (write-through Redis, 5s TTLs, slot-based invalidation)
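As an illustration of the first invariant, here is a minimal sketch of the idempotent write path, assuming a token_transfers table with UNIQUE (signature, ix_index); the table and column names are placeholders, not the actual schema:

```rust
// Sketch only: table/column names are assumptions, not the real schema in db.rs.
use sqlx::PgPool;

pub async fn insert_transfer(
    pool: &PgPool,
    signature: &str,
    ix_index: i32,
    slot: i64,
    mint: &str,
    amount: i64,
) -> sqlx::Result<()> {
    // UNIQUE (signature, ix_index) makes reprocessing the same block a no-op,
    // which is what lets the writer loop be re-run safely after a restart.
    sqlx::query(
        "INSERT INTO token_transfers (signature, ix_index, slot, mint, amount)
         VALUES ($1, $2, $3, $4, $5)
         ON CONFLICT (signature, ix_index) DO NOTHING",
    )
    .bind(signature)
    .bind(ix_index)
    .bind(slot)
    .bind(mint)
    .bind(amount)
    .execute(pool)
    .await?;
    Ok(())
}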
High-Level Architecture:
graph LR
  subgraph Firehose["Firehose / Jetstreamer"]
    FH["gRPC Blocks Stream (400ms slots)"]
  end
  subgraph IndexerBin["indexer-bin (Ingest Loop)"]
    FC["FirehoseClient (reconnect, backoff)"]
    BP["BlockProcessor (mpsc channel)"]
    PL["Parser Loop (SPL, Pump, Raydium, Meteora)"]
  end
  subgraph IndexerCore["indexer-core (DB + Cache)"]
    DB["Postgres (sqlx, migrations)"]
    RD["Redis (streams, pub/sub)"]
  end
  subgraph IndexerAPI["indexer-api (Axum Server)"]
    HTTP["REST Endpoints (/transfers/{mint}, /holders/{mint}, /trades/{mint})"]
    WS["WebSocket (real-time subs)"]
  end
  subgraph Client["Terminal / Client"]
    APP["Trading Terminal"]
  end
  FH --> FC --> BP
  BP --> PL
  PL --> DB
  PL --> RD
  DB --> HTTP
  RD --> WS
  HTTP --> APP
  WS --> APP
Component Boundaries & Responsibilities

indexer-bin (Entry Point & Event Loop)

  • Responsibility: Firehose connection management, block-by-block orchestration, parser coordination
  • Concurrency model: 2-task Tokio executor (see the sketch after this list)
      Task 1: FirehoseClient::stream_blocks() — indefinite reconnect loop with exponential backoff; sends raw BlockRef into a bounded MPSC channel (capacity: 1024)
      Task 2: Writer loop — consumes blocks, calls parser functions in sequence, batches inserts to Postgres, emits events to Redis
  • Failure modes:
      Firehose disconnect → recover within 30s (max backoff)
      Parser error → log, skip block, increment dead-letter counter
      DB write failure → log, retry on next iteration (at-most-once per attempt; idempotent re-execution restores exactly-once overall)
  • Trade-off: Synchronous parser invocation (simpler error handling) vs. parallel parsing (not needed; CPU-bound parsing < 10ms/block, I/O bottleneck dominates)
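A minimal sketch of the two-task ingest loop, assuming tokio; BlockRef, stream_blocks, and process_block are stand-ins for the real types and functions in indexer-bin/indexer-core:

```rust
// Sketch only: BlockRef, stream_blocks, and process_block are placeholders.
use std::time::Duration;
use tokio::sync::mpsc;

#[derive(Debug)]
struct BlockRef {
    slot: u64, // raw block payload elided
}

#[tokio::main]
async fn main() {
    // Bounded channel: if the writer falls behind, the Firehose task blocks on send(),
    // which is the backpressure mechanism described in the backpressure section below.
    let (tx, mut rx) = mpsc::channel::<BlockRef>(1024);

    // Task 1: stream blocks, reconnecting with exponential backoff on disconnect.
    tokio::spawn(async move {
        let mut backoff = Duration::from_millis(500);
        loop {
            match stream_blocks(&tx).await {
                Ok(()) => backoff = Duration::from_millis(500),
                Err(e) => {
                    eprintln!("firehose disconnect: {e}; retrying in {backoff:?}");
                    tokio::time::sleep(backoff).await;
                    backoff = (backoff * 2).min(Duration::from_secs(30)); // cap at 30s
                }
            }
        }
    });

    // Task 2: writer loop — parse synchronously, batch-insert to Postgres, publish to Redis.
    while let Some(block) = rx.recv().await {
        if let Err(e) = process_block(&block).await {
            // parser/DB error: log, skip the block, rely on idempotent re-execution later
            eprintln!("slot {}: parse/write error: {e}", block.slot);
        }
    }
}

async fn stream_blocks(_tx: &mpsc::Sender<BlockRef>) -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    // real implementation: gRPC stream from Firehose, tx.send(block).await per block
    Ok(())
}

async fn process_block(_block: &BlockRef) -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    // real implementation: SPL/Pump/Raydium/Meteora parsers, batch insert, Redis publish
    Ok(())
}
```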

indexer-core (Logic & Data Access)

  • Responsibility: Instruction parsing, model serialization, database schema management, Redis client initialization
  • Modules:
      lib.rs — public API surface
      models.rs — sqlx-derived structs (Mint, TokenTransfer, BondingCurveTrade, Candle, Balance)
      spl_parser.rs — SPL Token instruction discriminators + byte-level parsing
      bonding_parser.rs — Pump.fun Anchor IDL interpretation
      raydium_parser.rs — Raydium AMM v3/v4 swap layout
      meteora_parser.rs — Meteora DLMM v1/v2 swap layout
      db.rs — sqlx prepared statements, batch insert functions, migration runner
      redis.rs — publish_trade(), publish_transfer() functions
      config.rs — serde config deserialization with env override
      firehose.rs — Firehose gRPC connector (stub for now; tonic integration planned)
  • Invariants: All parser functions are pure (no side effects); all DB operations are idempotent via ON CONFLICT; no global state (thread-safe by construction).
  • Testing: Unit tests per parser against hardcoded block fixtures (see the sketch after this list)
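A hedged illustration of what "pure parser + fixture test" looks like in this layout; the ParsedTransfer type and return shape are simplified assumptions rather than the real models.rs structs (SPL Token's Transfer instruction is tag byte 3 followed by a little-endian u64 amount):

```rust
// Sketch of a pure parser function plus a fixture-based unit test.
// ParsedTransfer and the omitted account handling are illustrative.

#[derive(Debug, PartialEq)]
pub struct ParsedTransfer {
    pub amount: u64,
}

/// Pure function: bytes in, Option<ParsedTransfer> out. No I/O, no global state,
/// so it can be re-run on the same block any number of times.
pub fn parse_spl_transfer(data: &[u8]) -> Option<ParsedTransfer> {
    // SPL Token `Transfer` starts with discriminator byte 3, then a little-endian u64 amount.
    match data {
        [3, rest @ ..] if rest.len() >= 8 => {
            let amount = u64::from_le_bytes(rest[..8].try_into().ok()?);
            Some(ParsedTransfer { amount })
        }
        _ => None,
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn parses_hardcoded_transfer_fixture() {
        // discriminator 3 + amount 1_000_000 encoded little-endian
        let mut fixture = vec![3u8];
        fixture.extend_from_slice(&1_000_000u64.to_le_bytes());
        assert_eq!(
            parse_spl_transfer(&fixture),
            Some(ParsedTransfer { amount: 1_000_000 })
        );
    }

    #[test]
    fn rejects_unknown_discriminator() {
        assert_eq!(parse_spl_transfer(&[9, 0, 0]), None);
    }
}
```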

indexer-api (Query & Subscription Facade)

  • Responsibility: HTTP REST queries, WebSocket real-time subscriptions, metrics export
  • Routes:
      GET /health — 200 OK (Kubernetes liveness)
      GET /metrics — JSON counters (token_transfers_count, bonding_trades_count, last_processed_slot, total_mints)
      GET /transfers/:mint — recent transfers for a mint (query: limit=100, before_slot)
      GET /holders/:mint — token holders (query: limit=100, offset=0)
      GET /trades/:mint — bonding trades for a mint (query: limit=100, before_slot)
      GET /portfolio/:wallet — aggregated balances + recent transfers (query: limit)
      WS /subscribe/{mint} — real-time transfer + trade events (via Postgres LISTEN/NOTIFY → Redis Pub/Sub fanout OR direct WebSocket broadcast)
  • Concurrency: Fully async (async/await) via Axum on the Tokio runtime; each WS connection spawns an independent task; a broadcast channel provides multi-subscriber fanout (see the sketch after this list)
  • Failure semantics: Missing mint → 404; DB unavailable → 503; WS disconnect → client reconnect (exponential backoff in terminal)
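A minimal sketch of the Axum wiring, assuming axum 0.7-style routing and tokio; AppState, handler bodies, and the event payload are placeholders for the real indexer-api code:

```rust
// Sketch only: AppState, handler bodies, and the event payload are illustrative.
use axum::{
    extract::{ws::{Message, WebSocket, WebSocketUpgrade}, Path, State},
    routing::get,
    Json, Router,
};
use tokio::sync::broadcast;

#[derive(Clone)]
struct AppState {
    // single broadcast channel for multi-subscriber fanout; real code keys this per mint
    events: broadcast::Sender<String>,
}

async fn health() -> &'static str {
    "OK"
}

async fn transfers(Path(mint): Path<String>, State(_state): State<AppState>) -> Json<Vec<String>> {
    // real handler: SELECT ... FROM token_transfers WHERE mint = $1 ORDER BY slot DESC LIMIT $2
    Json(vec![format!("transfers for {mint} would be queried here")])
}

async fn subscribe(
    ws: WebSocketUpgrade,
    Path(_mint): Path<String>,
    State(state): State<AppState>,
) -> axum::response::Response {
    // each accepted WS connection runs in its own task via on_upgrade
    ws.on_upgrade(move |socket| fanout(socket, state.events.subscribe()))
}

async fn fanout(mut socket: WebSocket, mut rx: broadcast::Receiver<String>) {
    // a lagging receiver returns RecvError::Lagged, which ends this loop:
    // the slow client is dropped and the terminal reconnects with backoff
    while let Ok(event) = rx.recv().await {
        if socket.send(Message::Text(event)).await.is_err() {
            break; // client disconnected
        }
    }
}

#[tokio::main]
async fn main() {
    let (events, _) = broadcast::channel(1024);
    let app = Router::new()
        .route("/health", get(health))
        .route("/transfers/:mint", get(transfers))
        .route("/subscribe/:mint", get(subscribe))
        .with_state(AppState { events });

    let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```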
Detailed Data Flow Paths
sequenceDiagram
  participant FH as Firehose (gRPC)
  participant BIN as indexer-bin (event loop)
  participant CSPK as SPL Parser
  participant CPMP as Pump Parser
  participant CRAY as Raydium Parser
  participant CMET as Meteora Parser
  participant DB as Postgres
  participant CACHE as Redis
  participant API as indexer-api
  participant WS as Terminal WS
  FH->>BIN: BlockRef slot 12345
  par Parsers
    BIN->>CSPK: extract transfers
    CSPK-->>BIN: transfers list
  and
    BIN->>CPMP: extract pump trades
    CPMP-->>BIN: trades list
  and
    BIN->>CRAY: extract raydium trades
    CRAY-->>BIN: trades list
  and
    BIN->>CMET: extract meteora trades
    CMET-->>BIN: trades list
  end
  par DB Write
    BIN->>DB: insert transfers
    DB-->>BIN: ok
    BIN->>DB: update balances
    DB-->>BIN: ok
    BIN->>DB: insert trades
    DB-->>BIN: ok
  end
  BIN->>DB: update last slot
  DB-->>BIN: ok
  par Fanout
    BIN->>CACHE: publish transfers
    CACHE-->>BIN: ok
    BIN->>CACHE: publish trades
    CACHE-->>BIN: ok
    BIN->>DB: notify
    DB-->>API: event
  end
  API->>CACHE: read events
  CACHE-->>API: events
  API->>WS: send event
  WS->>WS: render
Latency breakdown (typical):
  • Firehose gRPC delivery: ~0ms (in-band with the block stream)
  • Parser execution (CPU): ~5–10ms (rayon + SIMD opportunities not yet exploited)
  • Postgres batch insert: ~20–50ms, depending on row count (COPY would bring this to ~3–5ms for thousands of rows; see the multi-row insert sketch after this list)
  • Redis stream write: ~1–3ms
  • Axum HTTP handler: ~2–5ms
  • WebSocket push: ~1–5ms
  • Total p99: ~100–150ms (to be confirmed via distributed tracing in Phase 3)
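A hedged sketch of one way to keep the insert step in the 20–50ms band without moving all the way to COPY: collapse per-row round trips into a single multi-row INSERT via UNNEST (table and column names remain illustrative):

```rust
// Sketch: one round trip for the whole parser output, still idempotent via ON CONFLICT.
// Table/column names are illustrative; the real batch helpers live in db.rs.
use sqlx::PgPool;

pub struct TransferRow {
    pub signature: String,
    pub ix_index: i32,
    pub slot: i64,
    pub mint: String,
    pub amount: i64,
}

pub async fn insert_transfer_batch(pool: &PgPool, rows: &[TransferRow]) -> sqlx::Result<u64> {
    // Column-wise vectors let Postgres expand the batch server-side via UNNEST.
    let signatures: Vec<String> = rows.iter().map(|r| r.signature.clone()).collect();
    let ix_indexes: Vec<i32> = rows.iter().map(|r| r.ix_index).collect();
    let slots: Vec<i64> = rows.iter().map(|r| r.slot).collect();
    let mints: Vec<String> = rows.iter().map(|r| r.mint.clone()).collect();
    let amounts: Vec<i64> = rows.iter().map(|r| r.amount).collect();

    let result = sqlx::query(
        "INSERT INTO token_transfers (signature, ix_index, slot, mint, amount)
         SELECT * FROM UNNEST($1::text[], $2::int4[], $3::int8[], $4::text[], $5::int8[])
         ON CONFLICT (signature, ix_index) DO NOTHING",
    )
    .bind(signatures)
    .bind(ix_indexes)
    .bind(slots)
    .bind(mints)
    .bind(amounts)
    .execute(pool)
    .await?;

    Ok(result.rows_affected())
}
```

A true COPY path would shave this further, but COPY bypasses ON CONFLICT, so it typically needs a staging table plus a merge step; the UNNEST form keeps the exactly-once invariant intact.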
Reorg Flow: Rollback on Confirmed → Finalized Fork
sequenceDiagram
  participant FH as Firehose
  participant BIN as indexer-bin
  participant DB as Postgres
  participant CACHE as Redis
  FH->>BIN: block 12345
  BIN->>DB: insert transfers
  DB-->>BIN: ok
  BIN->>CACHE: publish trades
  FH->>BIN: block 12346 reorg
  Note over BIN: detect hash mismatch
  BIN->>DB: check stored hash
  DB-->>BIN: mismatch detected
  BIN->>DB: rollback to slot 12344
  DB-->>BIN: ok
  BIN->>CACHE: clear cache
  CACHE-->>BIN: ok
  BIN->>BIN: restart from slot 12345
  Note over FH,BIN: recovery under 1 second
Invariants enforced
  • Slot + signature uniqueness: UNIQUE (signature, ix_index) ensures duplicate processing fails gracefully (ON CONFLICT DO NOTHING)
  • Blockhash comparison: stored in `indexer_events.payload` (JSONB) or a dedicated slot_hash table (future Phase 3)
  • Cascade invalidation: all derived views (balances, candles) are recomputed from transfers; Redis TTLs (5s) ensure stale cache entries expire
  • Checkpoint atomicity: last_processed_slot is updated within the same transaction as the data, so the checkpoint can never diverge from what is actually persisted (see the rollback sketch after this list)
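A sketch of the rollback step; the table names and the presence of an indexer_state checkpoint row are assumptions (the slot_hash comparison itself is the Phase 3 piece noted above):

```rust
// Sketch: delete everything above the safe slot and move the checkpoint in one
// transaction, so the checkpoint can never point past rows that no longer exist.
// Table names are illustrative.
use sqlx::PgPool;

pub async fn rollback_to_slot(pool: &PgPool, safe_slot: i64) -> sqlx::Result<()> {
    let mut tx = pool.begin().await?;

    sqlx::query("DELETE FROM token_transfers WHERE slot > $1")
        .bind(safe_slot)
        .execute(&mut *tx)
        .await?;

    sqlx::query("DELETE FROM bonding_curve_trades WHERE slot > $1")
        .bind(safe_slot)
        .execute(&mut *tx)
        .await?;

    // Checkpoint atomicity: last_processed_slot moves in the same transaction as the
    // deletes, so a crash mid-rollback leaves both checkpoint and data untouched.
    sqlx::query("UPDATE indexer_state SET last_processed_slot = $1")
        .bind(safe_slot)
        .execute(&mut *tx)
        .await?;

    tx.commit().await
}
```

Derived Redis entries for the affected mints are either invalidated explicitly or simply left to expire via the 5s TTL before ingestion resumes from the safe slot.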
Backpressure & Congestion: Handling TPS Bursts >3,000

Scenario: the network delivers a sustained 5,000 TPS; at the 400ms slot duration that is roughly 2,000 transactions per slot. Parser CPU utilization approaches 80% and Postgres write throughput saturates.

graph TD
  A[Firehose gRPC stream] --> B[MPSC channel capacity 1024]
  B --> C[Block processor loop]
  C --> D[Postgres pool max 10 connections]
  D --> E[Redis stream publish]
  E --> F[WebSocket broadcast]
  F --> G[Terminal clients]
  • MPSC channel backpressure: the Firehose sender task blocks once the channel is full (1024 blocks ≈ 400s of data). Recovery: exponential-backoff reconnect.
  • Batch coalescing: parser output is batched in memory before the DB insert; if a batch exceeds 1,000 transfers it is flushed early (adaptive; see the sketch after this list).
  • DB connection pool: fixed at 10 connections; additional block-processor work queues on the pool (Tokio's fair scheduler). At sustained 100% pool utilization, more block-processor tasks can be added (future autoscaling).
  • Redis pub/sub fanout: non-blocking; WS subscribers that lag are dropped client-side (TCP buffer exhaustion) and the terminal reconnects.
  • Metrics & alerting: monitor `parser_queue_depth`, `db_write_latency_p99`, `ws_subscribers_dropped_count`.
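A sketch of the adaptive batch-flush behavior from the "Batch coalescing" item; the threshold, timeout, item type, and flush signature are all illustrative:

```rust
// Sketch: coalesce parsed items and flush either when the batch is large enough
// or when a short timeout expires, whichever comes first.
use std::time::Duration;
use tokio::{sync::mpsc, time};

const MAX_BATCH: usize = 1_000;
const MAX_WAIT: Duration = Duration::from_millis(100);

pub async fn batch_writer(mut rx: mpsc::Receiver<u64>, flush: impl Fn(&[u64])) {
    let mut batch: Vec<u64> = Vec::with_capacity(MAX_BATCH);
    loop {
        // Wait for the next item, but never hold a partial batch longer than MAX_WAIT.
        match time::timeout(MAX_WAIT, rx.recv()).await {
            Ok(Some(item)) => {
                batch.push(item);
                if batch.len() >= MAX_BATCH {
                    flush(&batch); // early flush under burst load
                    batch.clear();
                }
            }
            Ok(None) => {
                // channel closed: flush whatever is left and exit
                if !batch.is_empty() {
                    flush(&batch);
                }
                return;
            }
            Err(_) => {
                // timeout: flush the partial batch so latency stays bounded
                if !batch.is_empty() {
                    flush(&batch);
                    batch.clear();
                }
            }
        }
    }
}
```

Flushing on either size or elapsed time bounds the DB round-trip count under burst load while keeping added latency small under light load.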
Roadmap & Extensibility:
| Phase | Component | Status | Acceptance |
|-------|-----------|--------|------------|
| 1–2 | Firehose ingest, SPL/Pump/Raydium/Meteora parsers, Postgres persistence, Redis real-time | LIVE | 3,000+ TPS, p99 <150ms WS, exactly-once per (sig, ix) |
| 3 | OHLCV aggregation (1m/5m/15m/1h/4h candles), continuous aggregates on trades | PLANNED | <100ms candle query, 24h retention minimum |
| 4 | Whale alerts (balance changes > threshold), portfolio aggregation | PLANNED | <2s alert latency for top 10 wallets per token |
| 5 | Terminal swap flow (simulation, state sync, pending broadcasts) | PLANNED | Swap builder + balance refresh <500ms round trip |