Cloud Architecture: Designing for Scale on AWS
Technical deep-dive into cloud-native core banking architecture using AWS services, including event sourcing, CQRS, multi-tenancy, and high availability design.
Architecture Principles
Building a core banking platform requires architectural decisions that will shape the product for years. The following principles guide the design of modern, scalable banking systems:
- Cloud-Native: Built for AWS from the ground up, leveraging managed services
- Microservices: Independent, loosely-coupled services that scale independently
- Event-Driven: Asynchronous communication via events for loose coupling and resilience
- API-First: All functionality exposed through well-documented RESTful APIs
- Multi-Tenant: True multi-tenancy with row-level security and data isolation
- Security by Design: Defense in depth with encryption, authentication, and audit logging
Each principle addresses a specific banking requirement: cloud-native design enables cost efficiency; microservices enable independent scaling during peak loads; event-driven communication produces a durable record of events that supports audit trails for compliance; API-first enables ecosystem integration; multi-tenancy enables SaaS economics; security by design helps meet regulatory expectations.
Architecture Layers
| Layer | Components | AWS Services |
|---|---|---|
| Presentation | Web Portal, Mobile Apps, Admin Console | CloudFront, S3, Amplify |
| API Gateway | REST APIs, Authentication, Rate Limiting | API Gateway, WAF, Cognito |
| Application | Microservices, Business Logic, Workflows | ECS Fargate, Lambda, Step Functions |
| Integration | Event Bus, Message Queues, Streaming | EventBridge, SQS, MSK (Kafka) |
| Data | Databases, Cache, Search, Data Lake | RDS, DynamoDB, ElastiCache, OpenSearch |
| Analytics | Data Warehouse, BI, ML Models | Redshift, QuickSight, SageMaker |
| Infrastructure | Networking, Security, Monitoring | VPC, IAM, CloudWatch, X-Ray |
The Ledger Engine: Event Sourcing
The ledger is the heart of any banking system. A modern approach uses event sourcing—where every state change is captured as an immutable event.
Why Event Sourcing for Banking
| Concept | Description | Banking Benefit |
|---|---|---|
| Event Store | Append-only log of all state changes | Complete audit trail, regulatory compliance |
| Event Replay | Reconstruct state by replaying events | Point-in-time balances, debugging, recovery |
| Immutability | Events cannot be modified or deleted | Tamper-proof records, legal evidence |
| Temporal Queries | Query state at any historical point | Month-end reporting, dispute resolution |
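The replay and temporal-query rows above can be illustrated with a minimal Python sketch. It assumes an in-memory list as a stand-in for the event store; the `EventStore` class, the `balance_at` helper, and the `WithdrawalMade` event type are hypothetical names introduced for illustration, not part of any specific product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)  # frozen: an event cannot be modified after creation
class LedgerEvent:
    event_type: str
    account_id: str
    timestamp: datetime
    amount: Decimal = Decimal("0")


class EventStore:
    """In-memory stand-in for an append-only event log (illustrative only)."""

    def __init__(self) -> None:
        self._events: list[LedgerEvent] = []

    def append(self, event: LedgerEvent) -> None:
        # The only write operation: no update or delete is offered.
        self._events.append(event)

    def events_for(self, account_id: str) -> list[LedgerEvent]:
        return [e for e in self._events if e.account_id == account_id]


def balance_at(store: EventStore, account_id: str,
               as_of: Optional[datetime] = None) -> Decimal:
    """Replay events up to `as_of` (inclusive) to reconstruct the balance."""
    balance = Decimal("0")
    for event in store.events_for(account_id):
        if as_of is not None and event.timestamp > as_of:
            continue
        if event.event_type == "DepositReceived":
            balance += event.amount
        elif event.event_type == "WithdrawalMade":  # hypothetical event type
            balance -= event.amount
    return balance


utc = timezone.utc
store = EventStore()
store.append(LedgerEvent("AccountOpened", "acc_123456",
                         datetime(2026, 1, 15, 10, 30, tzinfo=utc)))
store.append(LedgerEvent("DepositReceived", "acc_123456",
                         datetime(2026, 1, 15, 11, 0, tzinfo=utc), Decimal("1000.00")))
store.append(LedgerEvent("WithdrawalMade", "acc_123456",
                         datetime(2026, 2, 1, 9, 0, tzinfo=utc), Decimal("250.00")))

current = balance_at(store, "acc_123456")  # replays all three events
month_end = balance_at(store, "acc_123456",
                       datetime(2026, 1, 31, 23, 59, 59, tzinfo=utc))  # ignores February
```

Here `current` reflects every event, while `month_end` answers a month-end reporting question by replaying only the events recorded up to 31 January.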
CQRS (Command Query Responsibility Segregation)
CQRS separates read and write operations so that each side can be optimized and scaled independently:
- Command Side (Write): Processes transactions, validates business rules, emits events
- Query Side (Read): Optimized projections for fast balance queries, statements, reports
- Event Bus: Asynchronous propagation from write to read models
- Multiple Projections: Different views for different use cases (real-time, reporting, analytics)
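The split described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the bus is synchronous and in-process (a real deployment would propagate events asynchronously), the event store append is omitted, and the class names `EventBus`, `BalanceProjection`, and `DepositCommandHandler` are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Callable

Event = dict  # events are plain dicts in this sketch


class EventBus:
    """Synchronous stand-in for an asynchronous event bus."""

    def __init__(self) -> None:
        self._subscribers: list[Callable[[Event], None]] = []

    def subscribe(self, handler: Callable[[Event], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, event: Event) -> None:
        for handler in self._subscribers:
            handler(event)


class BalanceProjection:
    """Query side: a pre-computed read model, updated as events arrive."""

    def __init__(self) -> None:
        self._balances: dict[str, Decimal] = defaultdict(Decimal)

    def apply(self, event: Event) -> None:
        if event["eventType"] == "DepositReceived":
            self._balances[event["accountId"]] += event["amount"]

    def balance(self, account_id: str) -> Decimal:
        # A dictionary lookup: cost does not depend on account history length.
        return self._balances[account_id]


class DepositCommandHandler:
    """Command side: validates business rules, then emits an event."""

    def __init__(self, bus: EventBus) -> None:
        self._bus = bus

    def handle(self, account_id: str, amount: Decimal) -> None:
        if amount <= 0:
            raise ValueError("deposit amount must be positive")
        self._bus.publish(
            {"eventType": "DepositReceived", "accountId": account_id, "amount": amount}
        )


bus = EventBus()
projection = BalanceProjection()
bus.subscribe(projection.apply)  # further projections could subscribe here

handler = DepositCommandHandler(bus)
handler.handle("acc_123456", Decimal("1000.00"))
handler.handle("acc_123456", Decimal("250.50"))
```

Because the projection is updated as each event is published, reading a balance never touches the write path. With an asynchronous bus the read model would be eventually consistent, which is a trade-off to weigh for balance-critical reads.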
Ledger Event Example
// Example event types for a banking ledger
{
  "eventType": "AccountOpened",
  "accountId": "acc_123456",
  "tenantId": "tenant_abc",
  "timestamp": "2026-01-15T10:30:00Z",
  "data": {
    "accountType": "CURRENT",
    "currency": "EUR",
    "ownerId": "cust_789"
  }
}

{
  "eventType": "DepositReceived",
  "accountId": "acc_123456",
  "tenantId": "tenant_abc",
  "timestamp": "2026-01-15T11:00:00Z",
  "data": {
    "amount": 1000.00,
    "currency": "EUR",
    "reference": "SALARY-JAN"
  }
}
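Event sourcing with CQRS supports the 1M+ TPS target by separating the write path (event append) from the read path (pre-computed projections). Because balances are read from projections rather than recomputed from the event history, real-time balance queries can return in under 10ms regardless of account history length.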
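One practical detail when consuming events like `DepositReceived`: the `amount` field is a JSON number, and parsing it as a binary float can introduce rounding errors in monetary arithmetic. The sketch below shows one way to avoid that in Python, using the standard `json` module's `parse_float` hook. It is an illustration of a common precaution, not a statement of how any particular ledger handles amounts.

```python
import json
from decimal import Decimal

raw_event = """
{
  "eventType": "DepositReceived",
  "accountId": "acc_123456",
  "tenantId": "tenant_abc",
  "timestamp": "2026-01-15T11:00:00Z",
  "data": {"amount": 1000.00, "currency": "EUR", "reference": "SALARY-JAN"}
}
"""

# parse_float=Decimal makes json build Decimal values from the literal text
# of each JSON number instead of converting it to a binary float first.
event = json.loads(raw_event, parse_float=Decimal)
amount = event["data"]["amount"]

# Decimal arithmetic is exact for decimal fractions; float arithmetic is not.
exact_sum = Decimal("0.10") + Decimal("0.20")
float_sum = 0.10 + 0.20
```

An alternative with the same goal is to transmit amounts as strings or as integer minor units (cents), which sidesteps number parsing entirely.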
Multi-Tenant Architecture
Row-Level Security (RLS)
True multi-tenancy uses database-level tenant isolation:
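-- PostgreSQL Row-Level Security example
-- Policies have no effect until RLS is enabled on the table
ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;
-- Apply policies to the table owner as well (owners bypass RLS by default)
ALTER TABLE accounts FORCE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON accounts
    USING (tenant_id = current_setting('app.current_tenant')::uuid);
-- The application must set app.current_tenant to the tenant's UUID before querying
-- Every query is then filtered by tenant automatically
SELECT * FROM accounts WHERE account_type = 'CURRENT';
-- Effectively becomes: SELECT * FROM accounts
--   WHERE account_type = 'CURRENT'
--   AND tenant_id = current_setting('app.current_tenant')::uuid;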
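The policy only works if the application sets `app.current_tenant` for each request. A minimal Python sketch of that step follows, assuming a DB-API 2.0 connection to PostgreSQL that uses `%s` placeholders (as psycopg does). The `tenant_scope` helper is a hypothetical name; `set_config` is the PostgreSQL function for setting a configuration parameter, and its third argument makes the setting local to the current transaction.

```python
from contextlib import contextmanager
from typing import Any, Iterator
from uuid import UUID


@contextmanager
def tenant_scope(conn: Any, tenant_id: str) -> Iterator[Any]:
    """Run statements inside one transaction scoped to a single tenant.

    `conn` is assumed to be a DB-API 2.0 connection to PostgreSQL;
    it is created elsewhere and is not managed by this helper.
    """
    UUID(tenant_id)  # fail fast if the id is not a valid UUID
    cur = conn.cursor()
    try:
        # is_local=true behaves like SET LOCAL: the setting is discarded when
        # the transaction ends, so a pooled connection cannot carry one
        # tenant's context into the next request.
        cur.execute("SELECT set_config('app.current_tenant', %s, true)", (tenant_id,))
        yield cur
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()


# Usage (requires a live connection; the UUID is a placeholder value):
# with tenant_scope(conn, "123e4567-e89b-12d3-a456-426614174000") as cur:
#     cur.execute("SELECT * FROM accounts WHERE account_type = 'CURRENT'")
#     rows = cur.fetchall()
```

Scoping the setting to the transaction matters with connection pooling: a session-level setting could otherwise persist on a reused connection.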
Multi-Tenancy Benefits
| Approach | Cost per Customer | Deployment Speed | Upgrade Complexity |
|---|---|---|---|
| Single-Tenant | High (dedicated infra) | Weeks-months | Per-customer upgrades |
| True Multi-Tenant | Low (shared infra) | Hours-days | Single upgrade for all |
AWS Services Architecture
Compute
- ECS Fargate: Containerized microservices, serverless, auto-scaling
- Lambda: Event handlers, integrations, scheduled tasks
- Step Functions: Orchestration for complex workflows (loan origination, KYC)
Data
- RDS PostgreSQL: Primary transactional database with RLS
- DynamoDB: High-throughput NoSQL for session data, rate limiting
- ElastiCache Redis: Session cache, rate limiting, real-time analytics
- OpenSearch: Full-text search, log analytics, transaction search
Security
- WAF: Web application firewall, OWASP rules, rate limiting
- KMS: Key management, customer-managed keys option
- Cognito: User authentication, OAuth 2.0, MFA
- GuardDuty: Threat detection, anomaly monitoring
Performance Specifications
| Metric | Target | Measurement Method |
|---|---|---|
| Peak Throughput | 1M+ TPS | Sustained load test, 1 hour |
| API Latency (p99) | Under 200ms | End-to-end response time |
| Availability | 99.99% | Uptime measured annually (about 4.3 min downtime per 30-day month, 52.6 min per year) |
| Recovery Time (RTO) | Under 15 minutes | Full system recovery |
| Recovery Point (RPO) | Under 5 minutes | Maximum data loss window |
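The downtime figure behind the availability target is simple arithmetic, shown here as a small Python sketch. The function name `downtime_budget_minutes` is introduced for illustration.

```python
def downtime_budget_minutes(availability_percent: float, period_days: float) -> float:
    """Minutes of allowed downtime in a period at the given availability."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - availability_percent / 100)


per_month = downtime_budget_minutes(99.99, 30)   # 30-day month
per_year = downtime_budget_minutes(99.99, 365)   # non-leap year

print(f"{per_month:.2f} min/month, {per_year:.2f} min/year")
# prints "4.32 min/month, 52.56 min/year"
```

Note that the 15-minute RTO target is longer than the monthly downtime budget, so a single full recovery event would consume several months of allowance at 99.99%.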
Security Architecture
Banking-grade security requires defense in depth:
| Layer | Controls | AWS Services |
|---|---|---|
| Network | VPC isolation, security groups, NACLs | VPC, WAF, Shield |
| Transport | TLS 1.3 encryption, certificate management | ACM, CloudFront |
| Application | OAuth 2.0, JWT tokens, rate limiting | Cognito, API Gateway |
| Data | AES-256 encryption at rest, field-level encryption | KMS, RDS encryption |
| Audit | Immutable logs, tamper detection | CloudTrail, CloudWatch |
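To make the "tamper detection" control in the Audit row concrete, here is a minimal hash-chain sketch in Python. It illustrates the general idea only and is not a description of how CloudTrail or any AWS service implements integrity checks; the function names and log layout are assumptions made for this example.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so equal payloads hash equally.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: list[dict], payload: dict) -> None:
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    log.append({"payload": payload, "prev": prev_hash,
                "hash": entry_hash(prev_hash, payload)})


def verify(log: list[dict]) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


audit_log: list[dict] = []
append_entry(audit_log, {"action": "login", "user": "cust_789"})
append_entry(audit_log, {"action": "transfer", "amount": "250.00"})
intact = verify(audit_log)

audit_log[0]["payload"]["user"] = "someone_else"  # simulate tampering
tamper_detected = not verify(audit_log)
```

Each entry commits to the hash of the one before it, so changing any historical record invalidates every hash that follows. A chain like this detects tampering but does not prevent it; the latest hash still has to be stored or signed somewhere an attacker cannot rewrite.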
High Availability Design
99.99% availability requires eliminating single points of failure:
- Multi-AZ Deployment: All services deployed across 3 availability zones
- Database Replication: Synchronous replication with automatic failover
- Load Balancing: Application load balancers distribute traffic across healthy instances
- Auto-Scaling: Automatically add/remove capacity based on demand
- Circuit Breakers: Prevent cascade failures when dependencies fail
- Health Checks: Continuous monitoring with automatic instance replacement
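Of the mechanisms above, the circuit breaker is the one usually implemented in application code. The following Python sketch shows the basic closed, open, and half-open behavior. The class and its parameters are hypothetical, it is not thread-safe, and production services would more likely use an established resilience library or a service mesh feature.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._failure_threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock  # injectable so tests can control time
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, func: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout:
                # Open: fail fast instead of waiting on a failing dependency.
                raise CircuitOpenError("dependency unavailable, failing fast")
            # Timeout elapsed: half-open, so this call goes through as a trial.
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self._failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap each outbound dependency call, for example `breaker.call(lambda: fetch_rates())` with a hypothetical `fetch_rates` function, and treat `CircuitOpenError` as a signal to return a fallback or a fast error rather than queueing more work behind a failing service.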
Banking regulations require robust disaster recovery (DR) capabilities. The targets are an RTO (Recovery Time Objective) under 15 minutes and an RPO (Recovery Point Objective) under 5 minutes, achieved through a multi-region active-passive setup with automated failover.
Event sourcing is ideal for banking. Immutable audit trails, point-in-time queries, and regulatory compliance make event sourcing the natural choice for ledger design.
True multi-tenancy enables SaaS economics. Row-level security isolates each tenant's data at the database level while sharing infrastructure, enabling a 40-50% cost advantage over single-tenant competitors.
AWS provides banking-grade infrastructure. Multi-AZ deployment, managed services, and compliance certifications (SOC 2, ISO 27001) accelerate time-to-market while meeting regulatory requirements.