Real-time Data Pipeline
snapitanalytics.com
Architecture
Data Sources → Kinesis Streams → Lambda Processors
↓
DynamoDB (hot) + S3 (cold)
↓
React Dashboard + WebSockets
SnapIt Analytics processes web analytics events in real time. Kinesis handles ingestion, Lambda processes and aggregates the events, DynamoDB stores hot data, and S3 archives everything for long-term analysis.
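The Lambda stage receives Kinesis records in batches, each carrying a base64-encoded payload under `Records[].kinesis.data`. Below is a minimal sketch of how a processor might decode such a batch. It assumes producers write each event as a JSON object, and its skip-and-count policy for malformed records is a hypothetical choice, not necessarily what the pipeline does.

```javascript
// Sketch of a Kinesis-triggered Lambda processor (Node.js).
// Assumption: each record's payload is a UTF-8 JSON object describing one event.

/**
 * Decode the base64 payloads of a Lambda Kinesis event into plain objects.
 * Hypothetical policy: records that fail to decode or parse are skipped
 * and counted instead of failing (and retrying) the whole batch.
 */
function decodeKinesisEvent(event) {
  const events = [];
  let malformed = 0;
  for (const record of event.Records || []) {
    try {
      const json = Buffer.from(record.kinesis.data, 'base64').toString('utf8');
      events.push(JSON.parse(json));
    } catch (err) {
      malformed += 1;
    }
  }
  return { events, malformed };
}

// Hypothetical handler: decode, then hand off to aggregation and storage.
const handler = async (event) => {
  const { events, malformed } = decodeKinesisEvent(event);
  console.log(`decoded ${events.length} events, skipped ${malformed}`);
  // ...increment DynamoDB counters and forward raw events to cold storage here
  return { processed: events.length, malformed };
};

module.exports = { handler, decodeKinesisEvent };
```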
Real-time Aggregation
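Instead of querying raw events for dashboards, we pre-aggregate metrics into hourly buckets with DynamoDB atomic counters: each incoming event increments the counter for its event type and hour. A dashboard then reads one item per event type per hour, so query cost stays flat as event volume grows.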
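```javascript
// AWS SDK for JavaScript v2; DocumentClient marshals plain JS values
const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB.DocumentClient();

// Atomic counter increment - no read-before-write needed.
// ADD creates the item and starts the counter at 0 if neither exists yet.
// Runs inside an async handler; eventType and hour come from the current event.
await dynamodb.update({
  TableName: 'metrics_hourly',
  Key: { eventType_hour: `${eventType}_${hour}` },
  UpdateExpression: 'ADD #count :inc',
  ExpressionAttributeNames: { '#count': 'count' }, // "count" is a DynamoDB reserved word
  ExpressionAttributeValues: { ':inc': 1 }
}).promise();
```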
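The counter snippet leaves `eventType` and `hour` to the caller. Below is a minimal sketch of one way to fill them in. It assumes each event has a hypothetical `timestamp` field in epoch milliseconds and that `hour` is a UTC string truncated to the hour, such as `2024-05-01T13`. It also coalesces a Lambda batch so each hourly bucket receives a single `ADD` carrying the batch's count rather than one write per event. That is a possible optimization, not something the text confirms the pipeline does. `buildUpdateParams` returns the same parameter shape as the snippet, ready for `dynamodb.update(params).promise()`. The hypothetical `recentKeys` helper lists the items a dashboard would read for the last few hours.

```javascript
// Sketch: derive hourly bucket keys and coalesce a batch before writing.
// Assumptions: events carry `eventType` and a hypothetical `timestamp`
// (epoch ms); `hour` is a UTC string truncated to the hour.

function hourBucket(timestampMs) {
  // toISOString() is always UTC: "2024-05-01T13:45:10.123Z" -> "2024-05-01T13"
  return new Date(timestampMs).toISOString().slice(0, 13);
}

// Count events per `${eventType}_${hour}` key within one batch.
function coalesce(events) {
  const counts = new Map();
  for (const e of events) {
    const key = `${e.eventType}_${hourBucket(e.timestamp)}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Same UpdateItem parameters as the atomic-counter snippet, with :inc = batch count.
function buildUpdateParams(key, inc) {
  return {
    TableName: 'metrics_hourly',
    Key: { eventType_hour: key },
    UpdateExpression: 'ADD #count :inc',
    ExpressionAttributeNames: { '#count': 'count' },
    ExpressionAttributeValues: { ':inc': inc }
  };
}

// Hypothetical read-side helper: bucket keys for the last `n` hours, newest first.
function recentKeys(eventType, nowMs, n) {
  const keys = [];
  for (let i = 0; i < n; i++) {
    keys.push(`${eventType}_${hourBucket(nowMs - i * 3600 * 1000)}`);
  }
  return keys;
}
```

Coalescing cuts write requests when many events in a batch share an hour. One caveat applies either way: `ADD` is not idempotent, so if Lambda retries a Kinesis batch after a failure, increments already applied by the first attempt are applied again.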
Hot/Cold Storage
Recent events live in DynamoDB with a 30-day TTL for fast queries. Everything also streams to S3 in JSON format for long-term retention and batch analysis with tools like Athena or QuickSight.
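DynamoDB's TTL feature reads a Number attribute holding a Unix epoch time in seconds and deletes expired items in the background, not at the exact moment of expiry. Below is a minimal sketch of preparing events for both tiers. It assumes a hypothetical TTL attribute named `expiresAt`, a hypothetical `timestamp` field in epoch milliseconds, and a Hive-style `year=/month=/day=/hour=` S3 prefix that lets Athena prune partitions. The text above does not say how S3 objects are named or how they reach S3.

```javascript
// Sketch: shape events for hot (DynamoDB) and cold (S3) storage.
// Assumptions: hypothetical TTL attribute `expiresAt`, hypothetical
// `timestamp` field (epoch ms), and a Hive-style S3 prefix for Athena.

const TTL_DAYS = 30;
const SECONDS_PER_DAY = 24 * 60 * 60;

// DynamoDB TTL reads a Number attribute in epoch *seconds*, not milliseconds.
// Assumption: expiry is 30 days after the event's own timestamp.
function hotItem(event) {
  const epochSeconds = Math.floor(event.timestamp / 1000);
  return { ...event, expiresAt: epochSeconds + TTL_DAYS * SECONDS_PER_DAY };
}

// One S3 object per processed batch, partitioned by the batch's UTC hour.
function coldKey(timestampMs, batchId, prefix = 'events') {
  const iso = new Date(timestampMs).toISOString(); // e.g. "2024-05-01T13:45:00.000Z"
  const y = iso.slice(0, 4);
  const m = iso.slice(5, 7);
  const d = iso.slice(8, 10);
  const h = iso.slice(11, 13);
  return `${prefix}/year=${y}/month=${m}/day=${d}/hour=${h}/${batchId}.json`;
}

// Newline-delimited JSON: one event per line, the layout Athena's JSON SerDes expect.
function coldBody(events) {
  return events.map((e) => JSON.stringify(e)).join('\n') + '\n';
}
```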
Performance
- Throughput: 100K+ events/second via Kinesis
- Dashboard latency: <100ms for real-time metrics
- Cost: ~$0.001 per 1K events
- Retention: 30 days hot (DynamoDB), indefinite cold (S3)
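As a rough check on the throughput and cost lines: Kinesis Data Streams documents a per-shard write limit of 1,000 records or 1 MiB per second in provisioned mode. A sustained 100K events/second therefore needs at least 100 shards, and more once the average event exceeds about 1 KiB. At ~$0.001 per 1K events, the same sustained rate would cost about $0.10 per second, roughly $8,640 per day. The sketch below turns that arithmetic into two helpers. The event sizes in the example calls are hypothetical inputs, not measurements.

```javascript
// Back-of-the-envelope sizing from the figures above.
// Assumptions: provisioned-mode per-shard write limits of 1,000 records/s
// and 1 MiB/s; average event size is a caller-supplied estimate.

const RECORDS_PER_SHARD = 1000;
const BYTES_PER_SHARD = 1024 * 1024;

// Shards needed = whichever limit (record count or bytes) binds first.
function shardsNeeded(eventsPerSecond, avgEventBytes) {
  const byRecords = Math.ceil(eventsPerSecond / RECORDS_PER_SHARD);
  const byBytes = Math.ceil((eventsPerSecond * avgEventBytes) / BYTES_PER_SHARD);
  return Math.max(byRecords, byBytes);
}

// Daily cost at a sustained rate, using the ~$0.001 per 1K events figure.
function dailyCost(eventsPerSecond, costPer1kEvents = 0.001) {
  const eventsPerDay = eventsPerSecond * 86400;
  return (eventsPerDay / 1000) * costPer1kEvents;
}

console.log(shardsNeeded(100000, 500));    // 100 (record limit binds)
console.log(shardsNeeded(100000, 2048));   // 196 (byte limit binds)
console.log(dailyCost(100000).toFixed(2)); // 8640.00
```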