| Service | Description |
|---|---|
| API Server | Node.js REST API handling merchant and user requests. Runs on ECS (Fargate) or EC2 behind an ALB. Connects to RDS (PostgreSQL) and Redis. |
| Redis (ElastiCache) | In-memory cache and session store. Used by the API server for rate limiting, session tokens, and short-lived data. |
| Database (RDS PostgreSQL) | Primary persistent store for users, merchants, campaigns, transactions, and redemptions. Multi-AZ enabled in production. |
| Smart Contract Event Processor | Background service that listens to Soroban/Stellar contract events and writes them to the database. Connects to a Stellar RPC endpoint and RDS. |

All services communicate within a private VPC. The ALB is the only public-facing entry point.
Via systemd (EC2):
sudo systemctl restart nova-api
Via ECS (force new deployment):
aws ecs update-service \
--cluster <placeholder-cluster-name> \
--service <placeholder-service-name> \
--force-new-deployment
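
A forced deployment returns immediately, before the new tasks are healthy. The following is a minimal sketch of one way to block until the rollout settles, using the `aws ecs wait services-stable` waiter. The function name `redeploy_and_wait` and its arguments are illustrative and not part of this runbook.

```shell
# Sketch: force a new ECS deployment, then wait for the service to stabilize.
# redeploy_and_wait is a hypothetical helper name.
redeploy_and_wait() {
  local cluster="$1" service="$2"

  # Same command as above; the JSON response is discarded to keep logs short.
  aws ecs update-service \
    --cluster "$cluster" \
    --service "$service" \
    --force-new-deployment > /dev/null || return 1

  # Polls the service until it reports a stable state. The waiter exits
  # non-zero if its built-in timeout is reached first.
  aws ecs wait services-stable \
    --cluster "$cluster" \
    --services "$service"
}
```

Usage would look like `redeploy_and_wait <placeholder-cluster-name> <placeholder-service-name>`. A non-zero exit status suggests the rollout needs manual inspection.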
Via AWS Console:
<placeholder-cluster-name>
Via CLI:
aws elasticache test-failover \
--replication-group-id <placeholder-replication-group-id> \
--node-group-id 0001
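
After a test failover it can help to confirm that the replication group has returned to the `available` state before declaring the exercise done. This is a hedged sketch that polls `describe-replication-groups`; the helper names, attempt count, and delay are assumptions chosen for illustration.

```shell
# Sketch: poll the replication group status until it reads "available".
# Both function names are hypothetical helpers.
replication_group_status() {
  aws elasticache describe-replication-groups \
    --replication-group-id "$1" \
    --query 'ReplicationGroups[0].Status' \
    --output text
}

wait_for_replication_group() {
  # Arguments: group id, max attempts (default 30), seconds between polls (default 10).
  local group_id="$1" attempts="${2:-30}" delay="${3:-10}" i=0
  while [ "$i" -lt "$attempts" ]; do
    if [ "$(replication_group_status "$group_id")" = "available" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  # Status never reached "available" within the allotted attempts.
  return 1
}
```

For example, `wait_for_replication_group <placeholder-replication-group-id>` returns 0 once the group is available and 1 if it is still in another state after all attempts.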
aws rds reboot-db-instance \
--db-instance-identifier <placeholder-db-instance-id>
aws rds failover-db-cluster \
--db-cluster-identifier <placeholder-db-cluster-id>
terraform/main.tf → backup_retention_period = 7) or via RDS console under Maintenance & backups.
aws rds create-db-snapshot \
--db-instance-identifier <placeholder-db-instance-id> \
--db-snapshot-identifier pre-deploy-<date>
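
A snapshot is not usable for restore until it reaches the `available` state. The sketch below creates a dated pre-deploy snapshot and waits for it with `aws rds wait db-snapshot-available`. The helper name and the `YYYY-MM-DD` date format are assumptions; substitute whatever `<date>` convention the team already uses.

```shell
# Sketch: create a dated pre-deploy snapshot and wait until it is available.
# create_predeploy_snapshot is a hypothetical helper name.
create_predeploy_snapshot() {
  local instance_id="$1" snapshot_id
  # Assumed date format (UTC, YYYY-MM-DD).
  snapshot_id="pre-deploy-$(date -u +%Y-%m-%d)"

  aws rds create-db-snapshot \
    --db-instance-identifier "$instance_id" \
    --db-snapshot-identifier "$snapshot_id" > /dev/null || return 1

  # Blocks until the snapshot is available, or fails when the waiter times out.
  aws rds wait db-snapshot-available \
    --db-snapshot-identifier "$snapshot_id" || return 1

  # Print the identifier so a later restore can reference it.
  echo "$snapshot_id"
}
```

Running `create_predeploy_snapshot <placeholder-db-instance-id>` before a deploy prints the snapshot identifier on success, which can be recorded in the deploy log.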
Via AWS Console:
DATABASE_URL environment variable to point to the restored instance
Via CLI:
aws rds restore-db-instance-from-db-snapshot \
--db-instance-identifier nova-restored-<date> \
--db-snapshot-identifier pre-deploy-<date>
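
Once the restore has started, the new instance's endpoint is needed to update `DATABASE_URL`. The following sketch waits for the restored instance and prints its hostname. The helper name is hypothetical, and building the full connection string (user, password, database name) is left out because those details are not specified here.

```shell
# Sketch: wait for a restored instance, then print its endpoint hostname.
# restored_endpoint is a hypothetical helper name.
restored_endpoint() {
  local instance_id="$1"

  # Blocks until the instance is available; fails when the waiter times out.
  aws rds wait db-instance-available \
    --db-instance-identifier "$instance_id" || return 1

  # Extract only the hostname from the instance description.
  aws rds describe-db-instances \
    --db-instance-identifier "$instance_id" \
    --query 'DBInstances[0].Endpoint.Address' \
    --output text
}
```

For example, `restored_endpoint nova-restored-<date>` prints the hostname to substitute into `DATABASE_URL`.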
<placeholder-incident-slack-channel>, update status page at <placeholder-status-page-url>, assign incident commander
docs/ops/post-mortems/YYYY-MM-DD-<title>.md
Symptoms: CPU > 80% sustained for 5+ minutes on ECS task or EC2 instance.
Steps:
top or htop to identify the offending process
aws ecs update-service \
--cluster <placeholder-cluster-name> \
--service <placeholder-service-name> \
--desired-count <increased-count>
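
Choosing `<increased-count>` requires knowing the current desired count. As a rough sketch, the helper below reads it from `describe-services` and adds a fixed number of tasks. The name `scale_out` and the default increment of 1 are illustrative assumptions, not part of this runbook.

```shell
# Sketch: scale an ECS service out by N tasks relative to its current desired count.
# scale_out is a hypothetical helper name.
scale_out() {
  local cluster="$1" service="$2" extra="${3:-1}" current target

  current=$(aws ecs describe-services \
    --cluster "$cluster" \
    --services "$service" \
    --query 'services[0].desiredCount' \
    --output text) || return 1

  # Guard against an empty or non-numeric reply (for example "None"
  # when the service name does not match).
  case "$current" in
    ''|*[!0-9]*) echo "unexpected desiredCount: $current" >&2; return 1 ;;
  esac

  target=$((current + extra))
  aws ecs update-service \
    --cluster "$cluster" \
    --service "$service" \
    --desired-count "$target" > /dev/null || return 1

  # Print the new desired count for the incident log.
  echo "$target"
}
```

For example, `scale_out <placeholder-cluster-name> <placeholder-service-name> 2` adds two tasks and prints the new desired count.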
Symptoms: too many connections errors in API logs; RDS DatabaseConnections metric at max.
Steps:
DB_POOL_MAX env var); reduce if over-provisioned
SELECT pid, state, query_start, query FROM pg_stat_activity WHERE state = 'idle';
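
Before changing the pool size, a per-state summary of connections can show whether idle sessions are really the bulk of the load. This is a minimal sketch that runs a grouping query over `pg_stat_activity` through `psql`. It assumes `DATABASE_URL` holds a connection URI that `psql` accepts, and the helper name is hypothetical.

```shell
# Sketch: summarize PostgreSQL connections by state.
# connections_by_state is a hypothetical helper name.
# Assumes DATABASE_URL is a connection URI that psql accepts.
connections_by_state() {
  # -A: unaligned output, -t: rows only, -F ' ': space as field separator.
  psql "$DATABASE_URL" -A -t -F ' ' -c \
    "SELECT coalesce(state, 'unknown'), count(*)
       FROM pg_stat_activity
      GROUP BY 1
      ORDER BY 2 DESC;"
}
```

Each output line is a state followed by its connection count. A large `idle` count relative to `active` would point toward the pool size rather than query load.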
Symptoms: evicted_keys metric rising in ElastiCache; cache miss rate increasing.
Steps:
maxmemory-policy in ElastiCache parameter group; set to allkeys-lru for general caching
redis-cli --bigkeys
Symptoms: On-chain events not reflected in the database; event processor CloudWatch metric blocks_behind increasing.
Steps:
/nova/event-processor)
curl <placeholder-stellar-rpc-url>/health
# Set START_BLOCK env var and restart the processor
aws ecs update-service \
--cluster <placeholder-cluster-name> \
--service <placeholder-event-processor-service> \
--force-new-deployment
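
Restarting the processor while the RPC endpoint is still down is unlikely to help, so the health check from the steps above could be retried a few times first. The sketch below wraps the same `/health` request in a retry loop. The helper name, attempt count, delay, and 5-second timeout are assumptions chosen for illustration.

```shell
# Sketch: retry the RPC health check before restarting the event processor.
# rpc_healthy is a hypothetical helper name.
rpc_healthy() {
  # Arguments: base URL, max attempts (default 5), seconds between tries (default 3).
  local url="$1" attempts="${2:-5}" delay="${3:-3}" i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl fail on HTTP errors; -sS hides progress but keeps error messages.
    if curl -fsS --max-time 5 "$url/health" > /dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, running `rpc_healthy <placeholder-stellar-rpc-url>` and only issuing the `update-service` command when it succeeds avoids a restart loop against an unavailable endpoint.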