Real-time Data Pipeline

3 min · snapitanalytics.com

Architecture

Data Sources → Kinesis Streams → Lambda Processors
                                          ↓
                              DynamoDB (hot) + S3 (cold)
                                          ↓
                              React Dashboard + WebSockets

SnapIt Analytics processes web analytics events in real time. Kinesis handles ingestion, Lambda processes and aggregates events, DynamoDB stores hot data, and S3 archives everything for long-term analysis.
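
To make the Kinesis-to-Lambda step concrete, here is a minimal sketch of how a processor might unpack a batch. Lambda delivers each Kinesis payload base64-encoded under `record.kinesis.data`. The function name `decodeKinesisRecords` and the event shape (`eventType`, `timestamp`) are assumptions for illustration, not the project's actual code.

```javascript
// Decode the records of one Kinesis batch into plain event objects.
// The event shape ({ eventType, timestamp }) is assumed for illustration.
function decodeKinesisRecords(kinesisEvent) {
    const events = [];
    for (const record of kinesisEvent.Records) {
        // Each Kinesis payload arrives base64-encoded in record.kinesis.data
        const json = Buffer.from(record.kinesis.data, 'base64').toString('utf8');
        try {
            events.push(JSON.parse(json));
        } catch (err) {
            // Skip malformed payloads so one bad record does not fail the batch
            console.warn('Skipping malformed record', record.kinesis.sequenceNumber);
        }
    }
    return events;
}
```

A handler would call this first and then pass the decoded events on to aggregation and archiving. Skipping bad records instead of throwing is a design choice: a thrown error makes Lambda retry the whole batch.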

Real-time Aggregation

Instead of querying raw events for dashboards, we pre-aggregate metrics into hourly buckets using DynamoDB atomic counters. A dashboard query then reads one small item per event type per hour, so it stays fast regardless of event volume.

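// Sketch assuming the AWS SDK for JavaScript v2; incrementHourlyCount is an illustrative name.
const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB.DocumentClient();

// Atomic counter increment - no read-before-write needed.
// ADD creates the attribute on first use, then increments it in place.
async function incrementHourlyCount(eventType, hour) {
    await dynamodb.update({
        TableName: 'metrics_hourly',
        Key: { eventType_hour: `${eventType}_${hour}` },
        UpdateExpression: 'ADD #count :inc',
        // 'count' is a DynamoDB reserved word, so it needs the #count alias
        ExpressionAttributeNames: { '#count': 'count' },
        ExpressionAttributeValues: { ':inc': 1 }
    }).promise();
}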
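
The snippet above leaves the hour value and batching unspecified. As a rough sketch, a processor could derive a UTC hour bucket from each event's timestamp and collapse a batch into one increment per key, so a batch of many events costs only a few writes. The bucket format (`2024-01-15T13`), the helper names, and the event shape are assumptions, not the project's confirmed implementation.

```javascript
// Format a millisecond timestamp as a UTC hour bucket, e.g. '2024-01-15T13'.
// The bucket format is an assumption; any stable per-hour string works.
function hourBucket(timestampMs) {
    return new Date(timestampMs).toISOString().slice(0, 13);
}

// Collapse a batch of events into one count per (eventType, hour) key.
function countByKey(events) {
    const counts = new Map();
    for (const { eventType, timestamp } of events) {
        const key = `${eventType}_${hourBucket(timestamp)}`;
        counts.set(key, (counts.get(key) || 0) + 1);
    }
    return counts;
}

// Build one update request per key, with :inc set to the batch count.
function toUpdateParams(counts, tableName = 'metrics_hourly') {
    return [...counts].map(([key, n]) => ({
        TableName: tableName,
        Key: { eventType_hour: key },
        UpdateExpression: 'ADD #count :inc',
        ExpressionAttributeNames: { '#count': 'count' },
        ExpressionAttributeValues: { ':inc': n }
    }));
}
```

Each object returned by `toUpdateParams` can be passed to a DocumentClient `update` call. One caveat: Kinesis-triggered Lambdas deliver at least once, so a retried batch would increment the counters again unless the processor deduplicates.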

Hot/Cold Storage

Recent events live in DynamoDB with a 30-day TTL for fast queries. Everything also streams to S3 in JSON format for long-term retention and batch analysis with tools like Athena or QuickSight.
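
As an illustration of the two storage paths, the sketch below stamps each hot item with an expiry time and builds an archive object for S3. DynamoDB TTL reads a Number attribute holding epoch seconds. The attribute name `expiresAt`, the Hive-style key layout, and the newline-delimited JSON body are assumptions chosen to work well with Athena, not details confirmed by this write-up.

```javascript
const THIRTY_DAYS_SECONDS = 30 * 24 * 60 * 60;

// DynamoDB TTL expects a Number attribute holding an epoch time in seconds.
// 'expiresAt' is a hypothetical attribute name.
function withTtl(event, nowMs = Date.now()) {
    return { ...event, expiresAt: Math.floor(nowMs / 1000) + THIRTY_DAYS_SECONDS };
}

// Hive-style partitioned S3 key (an assumed layout) so Athena can prune by date.
function archiveKey(timestampMs, batchId) {
    const [date, time] = new Date(timestampMs).toISOString().split('T');
    const [year, month, day] = date.split('-');
    const hour = time.slice(0, 2);
    return `events/year=${year}/month=${month}/day=${day}/hour=${hour}/${batchId}.json`;
}

// Newline-delimited JSON: one event per line, as Athena's JSON SerDes expect.
function toNdjson(events) {
    return events.map((e) => JSON.stringify(e)).join('\n') + '\n';
}
```

The processor would write `withTtl(event)` to the hot table and upload `toNdjson(events)` under `archiveKey(...)`. DynamoDB deletes expired items in the background rather than at the exact expiry second, so queries that need a strict 30-day window should also filter on `expiresAt`.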

Performance