# Deployment Guide

Deploy the analytics platform to production.

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────┐
│                          Load Balancer                          │
└─────────────────────────────────────────────────────────────────┘
                                 │
           ┌─────────────────────┼─────────────────────┐
           │                     │                     │
           ▼                     ▼                     ▼
   ┌───────────────┐     ┌───────────────┐     ┌───────────────┐
   │   Collector   │     │   Collector   │     │   Collector   │
   │    Service    │     │    Service    │     │    Service    │
   └───────────────┘     └───────────────┘     └───────────────┘
           │                     │                     │
           └─────────────────────┼─────────────────────┘
                                 │
                                 ▼
                         ┌───────────────┐
                         │     Redis     │
                         │   (BullMQ)    │
                         └───────────────┘
                                 │
           ┌─────────────────────┼─────────────────────┐
           │                     │                     │
           ▼                     ▼                     ▼
   ┌───────────────┐     ┌───────────────┐     ┌───────────────┐
   │   Processor   │     │   Processor   │     │   Processor   │
   │    Worker     │     │    Worker     │     │    Worker     │
   └───────────────┘     └───────────────┘     └───────────────┘
           │                     │                     │
           └─────────────────────┼─────────────────────┘
                                 │
                                 ▼
                         ┌───────────────┐
                         │  PostgreSQL   │
                         │ (TimescaleDB) │
                         └───────────────┘
                                 │
                                 ▼
                         ┌───────────────┐
                         │  API Service  │
                         └───────────────┘
```

## Services

| Service | Port | Description |
|---------|------|-------------|
| Collector | 4001 | Event ingestion |
| Processor | - | Queue worker (no HTTP) |
| API | 4002 | Query endpoints |
| Realtime | 4003 | WebSocket server |

## Docker Deployment

### docker-compose.yml

```yaml
version: '3.8'

services:
  collector:
    image: analytics/collector:latest
    ports:
      - "4001:4001"
    environment:
      - NODE_ENV=production
      - REDIS_URL=redis://redis:6379
      - LOG_LEVEL=info
    depends_on:
      - redis
    deploy:
      replicas: 3
      resources:
        limits:
          memory: 512M
          cpus: '0.5'

  processor:
    image: analytics/processor:latest
    environment:
      - NODE_ENV=production
      - REDIS_URL=redis://redis:6379
      - DATABASE_URL=postgresql://postgres:password@postgres:5432/analytics
      - CONCURRENCY=10
    depends_on:
      - redis
      - postgres
    deploy:
      replicas: 2
      resources:
        limits:
          memory: 1G
          cpus: '1'

  api:
    image: analytics/api:latest
    ports:
      - "4002:4002"
    environment:
      - NODE_ENV=production
      - DATABASE_URL=postgresql://postgres:password@postgres:5432/analytics
      - REDIS_URL=redis://redis:6379
    depends_on:
      - postgres
      - redis
    deploy:
      replicas: 2
      resources:
        limits:
          memory: 512M
          cpus: '0.5'

  realtime:
    image: analytics/realtime:latest
    ports:
      - "4003:4003"
    environment:
      - NODE_ENV=production
      - REDIS_URL=redis://redis:6379
    depends_on:
      - redis
    deploy:
      replicas: 2

  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data
    command: redis-server --appendonly yes

  postgres:
    image: timescale/timescaledb:latest-pg15
    environment:
      - POSTGRES_DB=analytics
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=password
    volumes:
      - postgres_data:/var/lib/postgresql/data

volumes:
  redis_data:
  postgres_data:
```

## Kubernetes Deployment

### Collector Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: analytics-collector
spec:
  replicas: 3
  selector:
    matchLabels:
      app: analytics-collector
  template:
    metadata:
      labels:
        app: analytics-collector
    spec:
      containers:
        - name: collector
          image: analytics/collector:latest
          ports:
            - containerPort: 4001
          env:
            - name: NODE_ENV
              value: production
            - name: REDIS_URL
              valueFrom:
                secretKeyRef:
                  name: analytics-secrets
                  key: redis-url
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          readinessProbe:
            httpGet:
              path: /health
              port: 4001
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health
              port: 4001
            initialDelaySeconds: 15
            periodSeconds: 20
---
apiVersion: v1
kind: Service
metadata:
  name: analytics-collector
spec:
  selector:
    app: analytics-collector
  ports:
    - port: 4001
      targetPort: 4001
  type: ClusterIP
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: analytics-collector-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: analytics-collector
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

## Environment Variables

### Collector Service

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `NODE_ENV` | Yes | - | Environment (production/development) |
| `PORT` | No | 4001 | HTTP port |
| `REDIS_URL` | Yes | - | Redis connection URL |
| `LOG_LEVEL` | No | info | Logging level |
| `CORS_ORIGINS` | No | * | Allowed CORS origins |

### Processor Service

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `NODE_ENV` | Yes | - | Environment |
| `REDIS_URL` | Yes | - | Redis connection URL |
| `DATABASE_URL` | Yes | - | PostgreSQL connection URL |
| `CONCURRENCY` | No | 5 | Worker concurrency |
| `BATCH_SIZE` | No | 100 | Events per batch |

### API Service

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `NODE_ENV` | Yes | - | Environment |
| `PORT` | No | 4002 | HTTP port |
| `DATABASE_URL` | Yes | - | PostgreSQL connection URL |
| `REDIS_URL` | Yes | - | Redis for caching |
| `API_KEYS` | Yes | - | Comma-separated API keys |

## Database Setup

### PostgreSQL with TimescaleDB

```sql
-- Create database
CREATE DATABASE analytics;

-- Connect to the analytics database before running the remaining
-- statements (in psql: \c analytics)

-- Enable TimescaleDB
CREATE EXTENSION IF NOT EXISTS timescaledb;

-- Create tables
CREATE TABLE raw_events (
  id UUID NOT NULL DEFAULT gen_random_uuid(),
  session_id VARCHAR(64) NOT NULL,
  user_id VARCHAR(255),
  event_type VARCHAR(100) NOT NULL,
  event_action VARCHAR(255) NOT NULL,
  metadata JSONB DEFAULT '{}',
  timestamp TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  -- A hypertable's primary key must include the partitioning column,
  -- so the key is (id, timestamp) rather than id alone
  PRIMARY KEY (id, timestamp)
);

-- Convert to hypertable for time-series optimization
SELECT create_hypertable('raw_events', 'timestamp');

-- Create indexes
CREATE INDEX idx_raw_events_session ON raw_events(session_id);
CREATE INDEX idx_raw_events_user ON raw_events(user_id);
CREATE INDEX idx_raw_events_type ON raw_events(event_type);
CREATE INDEX idx_raw_events_metadata ON raw_events USING GIN(metadata);

-- Aggregated tables
CREATE TABLE daily_metrics (
  date DATE NOT NULL,
  metric_name VARCHAR(100) NOT NULL,
  dimension_key VARCHAR(255),
  dimension_value VARCHAR(255),
  value BIGINT NOT NULL DEFAULT 0,
  PRIMARY KEY (date, metric_name, dimension_key, dimension_value)
);

-- Retention policy: keep raw events for 90 days
SELECT add_retention_policy('raw_events', INTERVAL '90 days');
```

## Nginx Configuration

```nginx
upstream collector {
    least_conn;
    server collector-1:4001;
    server collector-2:4001;
    server collector-3:4001;
}

upstream api {
    server api-1:4002;
    server api-2:4002;
}

upstream realtime {
    ip_hash;  # Sticky sessions for WebSocket
    server realtime-1:4003;
    server realtime-2:4003;
}

server {
    listen 443 ssl http2;
    server_name analytics.example.com;

    ssl_certificate /etc/ssl/certs/analytics.crt;
    ssl_certificate_key /etc/ssl/private/analytics.key;

    # Collector - high throughput
    location /collect {
        proxy_pass http://collector;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # Don't buffer - fast response
        proxy_buffering off;

        # Allow large batches
        client_max_body_size 1m;
    }

    # API - standard REST
    location /api {
        proxy_pass http://api;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;

        # Cache GET requests
        # (the api_cache zone must be defined with proxy_cache_path in the http block)
        proxy_cache api_cache;
        proxy_cache_valid 200 1m;
        proxy_cache_key "$request_method$request_uri";
        add_header X-Cache-Status $upstream_cache_status;
    }

    # WebSocket - realtime
    location /realtime {
        proxy_pass http://realtime;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;

        # Long-lived connections
        proxy_read_timeout 86400s;
        proxy_send_timeout 86400s;
    }
}
```

## Monitoring

### Health Checks

All services expose a `/health` endpoint:

```json
{
  "status": "healthy",
  "version": "1.0.0",
  "uptime": 86400,
  "checks": {
    "redis": "ok",
    "database": "ok"
  }
}
```

### Metrics (Prometheus)

Services expose a `/metrics` endpoint:

```
# Collector metrics
analytics_events_received_total{type="engagement"} 1234567
analytics_events_queued_total 1234500
analytics_batch_size_histogram_bucket{le="10"} 50000

# Processor metrics
analytics_events_processed_total 1234000
analytics_processing_duration_seconds_bucket{le="0.1"} 1200000
analytics_queue_depth 500

# API metrics
analytics_api_requests_total{endpoint="/trends",status="200"} 50000
analytics_api_latency_seconds_bucket{le="0.5"} 49000
```

### Grafana Dashboards

Import pre-built dashboards from `/dashboards/`:

- `collector-metrics.json` - Ingestion throughput
- `processor-metrics.json` - Processing performance
- `api-metrics.json` - Query latency and errors
- `business-metrics.json` - Analytics KPIs

## Scaling Guidelines

### Collector Service

- Scale horizontally based on incoming event rate
- Target: <100ms p99 response time
- Rule of thumb: 1 replica per 10,000 events/minute

### Processor Service

- Scale based on queue depth
- Target: Queue depth < 1000
- Increase `CONCURRENCY` before adding replicas

### API Service

- Scale based on query latency
- Target: <500ms p95 for complex queries
- Add read replicas to PostgreSQL for heavy read load

### Database

- Use TimescaleDB compression for historical data
- Partition by month for large deployments
- Consider ClickHouse for >1B events/day
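
### Worked Example: Collector Replica Estimate

The collector rule of thumb above (1 replica per 10,000 events/minute) can be turned into a quick capacity estimate. The sketch below is illustrative only: the function name `collector_replicas` and the constants are assumptions made for this example, and the lower and upper bounds simply mirror `minReplicas` and `maxReplicas` from the HPA manifest in this guide. It is not part of the platform's code.

```python
import math

# Assumed values: throughput per replica comes from the rule of thumb above;
# the bounds mirror minReplicas/maxReplicas in the collector HPA manifest.
EVENTS_PER_MINUTE_PER_REPLICA = 10_000
MIN_REPLICAS = 3
MAX_REPLICAS = 10


def collector_replicas(events_per_minute: float) -> int:
    """Estimate collector replicas for an ingest rate (hypothetical helper)."""
    if events_per_minute < 0:
        raise ValueError("events_per_minute must be non-negative")
    # Round up so a partially loaded replica still counts as a full replica.
    needed = math.ceil(events_per_minute / EVENTS_PER_MINUTE_PER_REPLICA)
    # Clamp to the autoscaler bounds.
    return max(MIN_REPLICAS, min(MAX_REPLICAS, needed))


if __name__ == "__main__":
    for rate in (5_000, 45_000, 250_000):
        print(f"{rate:>7} events/min -> {collector_replicas(rate)} replicas")
```

If the estimate lands on the upper bound, the cap rather than the rule of thumb is the limiting factor, which may be a signal to revisit `maxReplicas` for that traffic level.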