Four Disaster Recovery (DR) strategies in AWS explained



This content originally appeared on DEV Community and was authored by Wakeup Flower

1. Multi-Site (Active-Active)

  • Description: A full production environment runs simultaneously in multiple regions, and every region serves live traffic.
  • Real-Life Example:

    • Global banking system: Customers in New York, London, and Tokyo need access to their accounts at all times.
    • If the New York data center fails, London or Tokyo handles transactions without downtime.
  • RTO/RPO: Near zero (instant failover)

  • Cost: Very high (you pay for multiple full environments)

  • Use Case: Mission-critical apps with zero tolerance for downtime, e.g., stock trading platforms, airline reservation systems.
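
To make the failover behaviour in the banking example concrete, here is a minimal Python sketch of active-active routing. It is only an illustration: the region names, the preference order, and the `route_request` helper are hypothetical, and in a real AWS setup this decision would normally be made by DNS or a global load balancer rather than by application code.

```python
from typing import Dict, List

# Hypothetical preference order per customer location (illustrative only).
REGION_PREFERENCE: Dict[str, List[str]] = {
    "new-york": ["new-york", "london", "tokyo"],
    "london": ["london", "new-york", "tokyo"],
    "tokyo": ["tokyo", "london", "new-york"],
}


def route_request(customer_region: str, healthy: Dict[str, bool]) -> str:
    """Return the first healthy region in the customer's preference order."""
    for region in REGION_PREFERENCE[customer_region]:
        # A region missing from the health map is treated as unhealthy.
        if healthy.get(region, False):
            return region
    raise RuntimeError("no healthy region available")


if __name__ == "__main__":
    # Simulate the New York site failing while the other two stay up.
    health = {"new-york": False, "london": True, "tokyo": True}
    print(route_request("new-york", health))  # london
```

Because every region already runs the full stack, the only thing that changes during a failure is where requests are sent, which is why RTO and RPO can stay near zero.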

2. Warm Standby

  • Description: A smaller-scale version of your environment is running in another region. It can scale up when needed.
  • Real-Life Example:

    • E-commerce website: Main site in US-East, warm standby in US-West with minimum servers and database replicas.
    • During a disaster in US-East, US-West scales up to handle full traffic.
  • RTO/RPO: Medium — typically minutes to a few hours

  • Cost: Medium — you pay for standby resources, not full production

  • Use Case: Apps that are critical but can tolerate brief downtime, e.g., online stores, internal enterprise applications.
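
The scale-up step in the e-commerce example comes down to simple capacity arithmetic. The sketch below assumes a hypothetical peak load and per-server throughput (none of these numbers come from AWS or from the example above) and computes how many servers the standby region would have to add to carry full traffic.

```python
import math


def instances_to_add(peak_rps: float, rps_per_instance: float,
                     standby_instances: int) -> int:
    """Servers the standby region must add to serve peak traffic.

    All inputs are assumptions supplied by the caller.
    """
    required = math.ceil(peak_rps / rps_per_instance)
    # Never return a negative number if the standby is already large enough.
    return max(0, required - standby_instances)


if __name__ == "__main__":
    # Hypothetical numbers: 4500 requests/s at peak, 500 requests/s per
    # server, and 2 servers kept running in the standby region.
    print(instances_to_add(4500, 500, 2))  # 7
```

The smaller the standby fleet, the lower the running cost, but the more servers have to start before the region can take full traffic.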

3. Pilot Light

  • Description: Only the minimal critical resources are running; the rest of the environment is switched off but can be launched on demand.
  • Real-Life Example:

    • SaaS analytics platform: Only the database is running in a secondary region.
    • During a disaster, application servers, load balancers, and other services are launched quickly.
  • RTO/RPO: Medium-High (application servers, load balancers, and other services must be launched before traffic can be served)

  • Cost: Low-Medium — only the critical part runs continuously

  • Use Case: Apps where cost savings are important but faster recovery than backup/restore is needed, e.g., SaaS reporting tools, business intelligence dashboards.
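
One rough way to reason about pilot light recovery time is to list the launch steps and add up how long each is expected to take. The step names and durations below are hypothetical placeholders, and the sketch assumes the steps run one after another; real timings depend on the workload and should be measured in a DR drill.

```python
from typing import List, Tuple

# Hypothetical recovery runbook: (step, estimated minutes).
RECOVERY_STEPS: List[Tuple[str, int]] = [
    ("promote database replica", 5),
    ("launch application servers", 15),
    ("create load balancer", 5),
    ("switch DNS to secondary region", 5),
]


def estimated_rto_minutes(steps: List[Tuple[str, int]]) -> int:
    """Sum the step durations, assuming the steps run sequentially."""
    return sum(minutes for _, minutes in steps)


if __name__ == "__main__":
    print(estimated_rto_minutes(RECOVERY_STEPS))  # 30
```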

4. Backup & Restore

  • Description: Data is backed up regularly; the environment is built only when needed.
  • Real-Life Example:

    • Archival video content: Stored in S3, with backup copies (for example, versioned objects or copies in a second region) kept for recovery.
    • If the primary site is lost, you restore the content to a new environment, which may take hours or days.
  • RTO/RPO: High — hours to days; data loss depends on backup frequency

  • Cost: Low — you only pay for storage, not running instances

  • Use Case: Non-critical workloads or infrequently accessed content, e.g., backups, dev/test environments, archival systems.
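
The statement that data loss depends on backup frequency can be shown with a small worked example. The sketch below uses a hypothetical daily backup schedule and failure time; the `data_loss_window` helper is illustrative and not part of any AWS tool.

```python
from datetime import datetime, timedelta
from typing import List


def data_loss_window(backup_times: List[datetime],
                     failure_time: datetime) -> timedelta:
    """Time between the last backup before the failure and the failure itself.

    Everything written during this window is lost on restore.
    """
    earlier = [t for t in backup_times if t <= failure_time]
    if not earlier:
        raise ValueError("no backup exists before the failure")
    return failure_time - max(earlier)


if __name__ == "__main__":
    # Hypothetical schedule: one backup per day at 02:00.
    backups = [datetime(2024, 1, 1, 2, 0), datetime(2024, 1, 2, 2, 0)]
    failure = datetime(2024, 1, 2, 23, 30)
    print(data_loss_window(backups, failure))  # 21:30:00
```

With daily backups the worst case approaches 24 hours of lost data; backing up every hour would cut that worst case to about one hour.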

| DR Strategy | RTO | RPO | Cost | Complexity | Best For |
| --- | --- | --- | --- | --- | --- |
| Multi-Site | Very low | Near zero | High | High | Mission-critical apps, zero downtime |
| Backup & Restore | High | Depends on backup | Low | Low | Non-critical workloads, archival data |
| Warm Standby | Medium | Low-Medium | Medium | Medium | Critical apps with moderate downtime tolerance |
| Pilot Light | Medium-High | Low-Medium | Low-Medium | Medium | Cost-conscious apps needing faster recovery than backup |

