This content originally appeared on DEV Community and was authored by Wakeup Flower
1. Multi-Site (Active-Active)
- Description: Full production runs simultaneously in multiple regions.
- Real-Life Example:
  - Global banking system: Customers in New York, London, and Tokyo need access to their accounts at all times.
  - If the New York data center fails, London or Tokyo handles transactions without downtime.
RTO/RPO: Near zero (instant failover)
Cost: Very high (you pay for multiple full environments)
Use Case: Mission-critical apps with zero tolerance for downtime, e.g., stock trading platforms, airline reservation systems.
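To make the failover behaviour concrete, here is a minimal Python sketch of active-active routing. It assumes any healthy region can serve any request because every region runs the full stack. The `Region` class, the region names, and the latency figures are hypothetical. In practice this decision is usually made by a DNS or global load-balancing layer rather than by application code.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """A production region; every region runs the full stack (hypothetical model)."""
    name: str
    healthy: bool = True
    # Hypothetical measured latency (ms) from each client location to this region.
    latency_ms: dict[str, float] = field(default_factory=dict)


def route(client_location: str, regions: list[Region]) -> str:
    """Pick the lowest-latency healthy region for a client.

    Because all regions are active, failover is simply excluding the failed one.
    """
    healthy = [r for r in regions if r.healthy]
    if not healthy:
        raise RuntimeError("no healthy region available")
    return min(healthy, key=lambda r: r.latency_ms[client_location]).name


if __name__ == "__main__":
    regions = [
        Region("new-york", latency_ms={"nyc": 10, "paris": 80}),
        Region("london", latency_ms={"nyc": 75, "paris": 10}),
        Region("tokyo", latency_ms={"nyc": 180, "paris": 220}),
    ]
    print(route("nyc", regions))  # new-york
    regions[0].healthy = False    # simulate the New York outage
    print(route("nyc", regions))  # london
```

No region has to be started or scaled during the outage; the router just stops choosing the failed one, which is why RTO stays near zero.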
2. Warm Standby
- Description: A smaller-scale version of your environment is running in another region. It can scale up when needed.
- Real-Life Example:
  - E-commerce website: Main site in US-East; warm standby in US-West with a minimal set of servers and database replicas.
  - During a disaster in US-East, US-West scales up to handle full traffic.
RTO/RPO: Medium (typically minutes, because the standby is already running and only needs to scale up)
Cost: Medium (you pay for the scaled-down standby resources, not a second full production environment)
Use Case: Apps that are critical but can tolerate brief downtime, e.g., online stores, internal enterprise applications.
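The key step in a warm standby failover is working out how far the standby must grow. Below is a minimal sketch that assumes load is measured in requests per second and that each instance handles a fixed, known throughput (both simplifications; the function name and numbers are hypothetical). On AWS, the resulting count could be applied as an Auto Scaling group's desired capacity.

```python
import math


def instances_to_launch(peak_rps: float, rps_per_instance: float, standby_instances: int) -> int:
    """Return how many extra instances the standby region needs to carry full traffic.

    Assumes linear scaling: each instance serves a fixed number of requests per second.
    """
    if rps_per_instance <= 0:
        raise ValueError("rps_per_instance must be positive")
    required = math.ceil(peak_rps / rps_per_instance)
    return max(0, required - standby_instances)


if __name__ == "__main__":
    # Hypothetical numbers: US-East peaks at 12,000 rps, each instance handles 500 rps,
    # and US-West keeps 4 instances warm.
    print(instances_to_launch(12_000, 500, 4))  # 20
```

The size of the warm pool is the trade-off: more standby instances cost more but leave less to launch during the disaster, which shortens recovery.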
3. Pilot Light
- Description: Minimal critical resources are running; the rest of the environment is off but can be launched on demand.
- Real-Life Example:
  - SaaS analytics platform: Only the database is running in a secondary region.
  - During a disaster, application servers, load balancers, and other services are launched quickly.
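RTO/RPO: Medium-High RTO (application servers and other services must be launched before they can serve traffic), Low-Medium RPO (the database is already running in the secondary region)
Cost: Low-Medium (only the critical core, here the database, runs continuously)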
Use Case: Apps where cost savings are important but faster recovery than backup/restore is needed, e.g., SaaS reporting tools, business intelligence dashboards.
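Recovery in a pilot light setup is mostly about starting the dormant components in the right order. Here is a minimal sketch using Python's standard `graphlib` module (Python 3.9+). It assumes a hypothetical dependency map in which only the database is already running; the component names are illustrative, not a prescribed architecture.

```python
from graphlib import TopologicalSorter

# Hypothetical components: each maps to the set of components it depends on.
DEPENDENCIES = {
    "database": set(),                  # the "pilot light": always running
    "app_servers": {"database"},
    "load_balancer": {"app_servers"},
    "dns_switchover": {"load_balancer"},
}
ALREADY_RUNNING = {"database"}


def recovery_plan(dependencies: dict[str, set[str]], running: set[str]) -> list[str]:
    """Return the components to launch, dependencies first, skipping ones already up."""
    order = TopologicalSorter(dependencies).static_order()
    return [component for component in order if component not in running]


if __name__ == "__main__":
    for step, component in enumerate(recovery_plan(DEPENDENCIES, ALREADY_RUNNING), start=1):
        print(step, component)
```

Every component that is not already running adds launch time to the plan, which is why pilot light recovers more slowly than warm standby.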
4. Backup & Restore
- Description: Data is backed up; the environment is built only when needed.
- Real-Life Example:
  - Archival video content: Stored in S3 with snapshots.
  - If the primary site is lost, you restore the content to a new environment, which may take hours or days.
RTO/RPO: High (hours to days; data loss depends on backup frequency)
Cost: Low (you pay only for storage, not running instances)
Use Case: Non-critical workloads or infrequently accessed content, e.g., backups, dev/test environments, archival systems.
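The RTO/RPO line for backup and restore can be turned into back-of-the-envelope arithmetic. The sketch below assumes backups run at a fixed interval and that a restore proceeds at a steady throughput after a fixed provisioning delay. The function name and all figures in the usage example are hypothetical.

```python
def backup_restore_estimate(
    backup_interval_hours: float,
    data_tb: float,
    restore_tb_per_hour: float,
    provisioning_hours: float,
) -> tuple[float, float]:
    """Return (worst-case RPO, estimated RTO) in hours for a backup & restore strategy.

    Worst-case RPO: a failure just before the next backup loses a full interval of data.
    RTO: time to provision the new environment plus time to copy the data back.
    """
    if restore_tb_per_hour <= 0:
        raise ValueError("restore_tb_per_hour must be positive")
    worst_case_rpo = backup_interval_hours
    rto = provisioning_hours + data_tb / restore_tb_per_hour
    return worst_case_rpo, rto


if __name__ == "__main__":
    # Hypothetical: daily backups, 10 TB of video, 0.5 TB/h restore, 2 h to rebuild infrastructure.
    rpo, rto = backup_restore_estimate(24, 10, 0.5, 2)
    print(f"worst-case RPO: {rpo} h, estimated RTO: {rto} h")  # worst-case RPO: 24 h, estimated RTO: 22.0 h
```

In this model restore time grows linearly with data volume, which is how a large archive pushes RTO from hours into days.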
| DR Strategy | RTO | RPO | Cost | Complexity | Best For |
|---|---|---|---|---|---|
| Multi-Site | Very low | Near zero | High | High | Mission-critical apps, zero downtime |
| Warm Standby | Medium | Low-Medium | Medium | Medium | Critical apps with moderate downtime tolerance |
| Pilot Light | Medium-High | Low-Medium | Low-Medium | Medium | Cost-conscious apps needing faster recovery than backup |
| Backup & Restore | High | Depends on backup frequency | Low | Low | Non-critical workloads, archival data |
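One way to use the comparison table is as a lookup: pick the cheapest strategy whose worst-case RTO and RPO fit your targets. The minute values below are illustrative assumptions only, loosely following the hours, tens of minutes, minutes, and near-real-time progression AWS describes for these four patterns. Pilot Light and Warm Standby share an RPO value because the table rates both Low-Medium. Replace the numbers with figures from your own recovery tests.

```python
# Illustrative worst-case (RTO, RPO) in minutes; these numbers are assumptions, not benchmarks.
# Ordered from lowest to highest cost, matching the Cost column.
STRATEGIES: list[tuple[str, int, int]] = [
    ("Backup & Restore", 24 * 60, 24 * 60),
    ("Pilot Light", 60, 5),
    ("Warm Standby", 15, 5),
    ("Multi-Site", 1, 0),
]


def cheapest_strategy(rto_target_min: int, rpo_target_min: int) -> str:
    """Return the lowest-cost strategy whose assumed RTO and RPO meet both targets."""
    for name, rto, rpo in STRATEGIES:
        if rto <= rto_target_min and rpo <= rpo_target_min:
            return name
    raise ValueError("no strategy meets these targets")


if __name__ == "__main__":
    print(cheapest_strategy(rto_target_min=120, rpo_target_min=30))  # Pilot Light
    print(cheapest_strategy(rto_target_min=30, rpo_target_min=10))   # Warm Standby
```

Because the list is ordered by cost, the first match is the cheapest option that still meets both targets.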