This content originally appeared on DEV Community and was authored by Matheus Gomes
What Is System Design and Why It’s Valuable
System design is the process of planning how different parts of a software system work together: the architecture, components, data flow, and how everything scales or recovers from failure.
It aims to make sure your system:
Works correctly (meets functional requirements)
Performs efficiently and reliably (meets non-functional requirements like scalability, latency, and fault tolerance)
Why It’s Valuable
Team Growth: Clear boundaries let multiple teams develop without interfering.
Traffic Growth: Plan for scaling so your app doesn’t crash under load.
Risk Reduction: Identify and eliminate bottlenecks or single points of failure.
Cost Efficiency: Optimize infrastructure to save money at scale.
Reliability: Design for uptime—your users expect it.
Separating Out the Database
When you begin, you might have your app and database all on one machine.
But soon, as users grow, you’ll need to separate them.
Example
Imagine a simple blog app:
Your code runs on a web server (for example, Node.js or Python/Django).
It stores posts in a database (e.g., PostgreSQL).
By running the database separately, you can:
Scale your web servers independently.
Back up the database securely.
Use different database technologies for different needs.
In production, databases often run on their own managed services, like Amazon RDS or Google Cloud SQL.
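Once the database lives on another machine, the app just needs to know where to find it. A minimal sketch of that decoupling, assuming a `DATABASE_URL` environment variable (the variable name and DSN below are illustrative, not from the original article):

```python
import os

# Read the database location from the environment so the web tier and the
# database can live on different machines (or a managed service like RDS).
# DATABASE_URL is an assumed variable name; any config mechanism works.
DEFAULT_URL = "postgresql://app:secret@localhost:5432/blog"

def database_url() -> str:
    """Return the DSN of the (possibly remote) database server."""
    return os.environ.get("DATABASE_URL", DEFAULT_URL)

print(database_url())
```

Pointing the app at a different `DATABASE_URL` is then a deployment change, not a code change.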
Vertical Scaling (Scaling Up)
Vertical scaling means upgrading your current machine, adding more CPU, memory, or faster SSDs.
Example
You start with:
t2.micro: 1 vCPU, 1 GiB RAM
Traffic grows, so you upgrade to:
t2.xlarge: 4 vCPUs, 16 GiB RAM
Pros
Simple to implement, often no code changes required.
Low latency and fast in-memory performance.
Cons
Costs rise quickly.
Machine size has physical limits.
One failure can take down the whole system.
Use vertical scaling when:
You’re starting out.
Your app doesn’t yet need multiple servers.
Horizontal Scaling (Scaling Out)
Horizontal scaling means adding more machines instead of upgrading one.
It’s like adding more waiters to a busy restaurant instead of hiring one superhuman waiter.
Example
You start with:
- 1 web server handling all requests.
When traffic increases:
- Add more servers.
A load balancer will distribute requests among them.
Load Balancer
A Load Balancer (LB) spreads requests evenly across several servers.
How It Works
Client → LB
LB → Routes the request to a server (round-robin, least connections, etc.)
Server responds → LB → Client
LB Responsibilities
Distribute traffic
Check server health
Terminate SSL/TLS
Remove bad servers from rotation
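The first and last responsibilities above can be sketched in a few lines. This is a toy round-robin balancer, not how ELB or NGINX are implemented; the server names are placeholders:

```python
class LoadBalancer:
    """Toy round-robin load balancer: distributes requests across servers
    and drops any server that fails a health check from the rotation."""

    def __init__(self, servers):
        self.servers = list(servers)  # servers currently in rotation
        self._next = 0                # index of the next server to use

    def pick(self):
        """Return the next server in round-robin order."""
        if not self.servers:
            raise RuntimeError("no healthy servers")
        server = self.servers[self._next % len(self.servers)]
        self._next += 1
        return server

    def mark_unhealthy(self, server):
        """Health check failed: remove the server from rotation."""
        if server in self.servers:
            self.servers.remove(server)

lb = LoadBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print([lb.pick() for _ in range(4)])  # cycles through the pool
```

Real load balancers add health-check probes, connection draining, and TLS termination on top of this basic rotation.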
Example
AWS users might use Elastic Load Balancing (ELB).
In local setups, you might try NGINX or HAProxy.
Benefits
Seamless scaling by adding/removing servers.
Zero-downtime updates using rolling deployments.
Stateless Services
A stateless service doesn’t remember anything between requests.
All data or sessions are stored elsewhere (like a database or cache).
Example
Imagine a shopping cart:
Stateful: Stored in web server memory. If that server dies, cart is gone.
Stateless: Cart stored in a database or Redis. Any server can respond.
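The stateless version can be sketched like this. For a runnable example, a plain dict stands in for Redis or a database; in production you would swap in a real shared store:

```python
# A stateless request handler: the cart lives in an external store,
# so ANY server instance can serve ANY user's next request.
# The dict below is a stand-in for Redis or a database.

def add_to_cart(store, user_id, item):
    """Handle a request without keeping any per-user state on the server."""
    cart = store.setdefault(user_id, [])
    cart.append(item)
    return cart

shared_store = {}                              # in production: Redis, a DB, etc.
add_to_cart(shared_store, "user-42", "book")
add_to_cart(shared_store, "user-42", "pen")    # could run on a different server
print(shared_store["user-42"])
```

Because the handler itself holds no state, killing or adding server instances never loses a cart.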
Benefits
Easy to scale horizontally.
Increased fault tolerance.
Updates and deployments are simpler.
Serverless
Serverless computing means you write functions, not servers.
Cloud providers run them on demand.
Example
You upload a photo → this triggers a Lambda function that stores it in S3 and updates a database.
You don’t manage infrastructure; you pay only per execution.
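The photo-upload flow might look like the sketch below. The event shape mirrors an S3 notification, and `save_metadata` is a hypothetical stand-in for the database write:

```python
# Sketch of a Lambda-style handler for the photo-upload example.
# save_metadata is a hypothetical helper; in a real function it would
# write to DynamoDB or RDS via the AWS SDK.

def save_metadata(bucket, key):
    # Stand-in for a database write.
    return {"bucket": bucket, "key": key}

def handler(event, context=None):
    """Runs on demand per upload; no servers for you to manage."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]
    return save_metadata(bucket, key)

# A minimal S3-notification-shaped event for local testing:
sample_event = {"Records": [{"s3": {"bucket": {"name": "photos"},
                                    "object": {"key": "cat.jpg"}}}]}
print(handler(sample_event))
```

The provider invokes `handler` for each event; you never provision or scale the machines it runs on.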
Pros
Zero infrastructure management.
Scales instantly.
You pay only when your code runs.
Cons
Startup delay (cold starts).
Harder debugging and monitoring.
Time and memory limits.
Serverless is ideal for:
Event-driven apps.
APIs with unpredictable traffic.
Lightweight background jobs (e.g., sending emails).
Scaling the Databases
Databases are often the hardest to scale, since they hold state.
Strategies
1. Read Replicas
Use additional servers for read operations, so the main database focuses on writes.
Example:
A news website can serve millions of readers using read replicas, while journalists write only to the primary database.
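One common way to use replicas is a tiny router in the data layer: writes go to the primary, reads fan out across replicas. A sketch with string placeholders for real connections:

```python
import random

# Minimal read/write router. The connection names are placeholders;
# a real implementation would hold actual DB connections.
PRIMARY = "primary-db"
REPLICAS = ["replica-1", "replica-2", "replica-3"]

def route(sql: str) -> str:
    """Send writes to the primary and reads to a random replica."""
    is_read = sql.lstrip().upper().startswith("SELECT")
    return random.choice(REPLICAS) if is_read else PRIMARY

print(route("SELECT * FROM articles"))    # one of the replicas
print(route("INSERT INTO articles ..."))  # primary-db
```

Note that replication lag means a replica may briefly serve slightly stale data; read-your-own-writes flows sometimes need to be pinned to the primary.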
2. Caching
Store frequently accessed data in memory.
This reduces database load.
Example:
Instead of repeatedly querying `SELECT * FROM product WHERE id=123`, cache it for 10 minutes.
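A cache-aside pattern with a 10-minute TTL can be sketched in plain Python (a dict stands in for Redis or Memcached; the function names are illustrative):

```python
import time

_cache = {}          # key -> (value, expiry_timestamp); stand-in for Redis
TTL_SECONDS = 600    # "cache it for 10 minutes"

def get_product(product_id, fetch_from_db):
    """Return a cached row, hitting the DB only on a miss or expiry."""
    key = f"product:{product_id}"
    hit = _cache.get(key)
    if hit and hit[1] > time.time():
        return hit[0]                              # cache hit
    value = fetch_from_db(product_id)              # the expensive SELECT
    _cache[key] = (value, time.time() + TTL_SECONDS)
    return value

calls = []
def fake_db(pid):
    calls.append(pid)
    return {"id": pid, "name": "Widget"}

get_product(123, fake_db)
get_product(123, fake_db)   # served from cache; no second DB hit
print(len(calls))           # 1
```

The trade-off is staleness: within the TTL window, readers may see data up to 10 minutes old.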
3. Sharding (Partitioning)
Split large datasets into smaller parts by a chosen key.
Example:
Shard 1: Users 1–1,000,000
Shard 2: Users 1,000,001–2,000,000
Benefits:
Boosts throughput and storage.
Avoids single DB bottlenecks.
Challenges:
Harder migrations.
Managing cross-shard queries.
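The range-based split above boils down to one routing function (numbers match the example; shards are numbered from 0 here):

```python
SHARD_SIZE = 1_000_000  # users per shard, as in the range example above

def shard_for_user(user_id: int) -> int:
    """Range-based sharding: users 1..1M -> shard 0, 1M+1..2M -> shard 1."""
    return (user_id - 1) // SHARD_SIZE

print(shard_for_user(42))          # shard 0
print(shard_for_user(1_500_000))   # shard 1
```

Hash-based sharding (`hash(user_id) % num_shards`) is a common alternative that spreads hot ranges more evenly, at the cost of making range scans harder.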
4. Connection Pooling
Limit DB connections by having a shared pool (e.g., PgBouncer).
This avoids a DB overload when many app servers connect at once.
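The core idea of a bounded pool fits in a short sketch (string "connections" stand in for real ones; this is not how PgBouncer itself is built):

```python
import queue

class ConnectionPool:
    """Bounded pool: at most max_size connections ever exist, so the
    database can't be flooded no matter how many app servers ask."""

    def __init__(self, max_size=5):
        self._pool = queue.Queue()
        for i in range(max_size):
            self._pool.put(f"conn-{i}")   # placeholder for a real connection

    def acquire(self, timeout=1.0):
        # Blocks (up to timeout) if every connection is already in use.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)

pool = ConnectionPool(max_size=2)
c1 = pool.acquire()
c2 = pool.acquire()
pool.release(c1)   # a third caller could now proceed
```

Callers queue up instead of opening new connections, which keeps the database's connection count flat under load spikes.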
5. CQRS (Command Query Responsibility Segregation)
Separate read and write operations into different models:
Commands: Insert, update.
Queries: Fetch data, often denormalized.
This enables independent optimization and scaling.
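A minimal CQRS sketch, with illustrative names: commands append to a normalized write model, and each write also updates a denormalized read model so queries need no aggregation:

```python
# Write model: normalized event-style list of orders.
orders = []
# Read model: denormalized totals, kept in sync on every write.
totals_by_user = {}

def place_order(user, amount):
    """Command: mutate the write model and update the read model."""
    orders.append({"user": user, "amount": amount})
    totals_by_user[user] = totals_by_user.get(user, 0) + amount

def get_user_total(user):
    """Query: a plain lookup, no joins or aggregation at read time."""
    return totals_by_user.get(user, 0)

place_order("ana", 30)
place_order("ana", 20)
print(get_user_total("ana"))  # 50
```

Because the two models are separate, each side can be stored, indexed, and scaled on its own (e.g., writes in PostgreSQL, reads in Redis or Elasticsearch).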
6. Multi‑Region Setup
Replicate data across regions to reduce latency and improve resilience.
Example:
Users in Brazil read/write from the São Paulo region, while users in Germany use Frankfurt.
Failover Strategies
When something fails (and it will), your system must recover automatically.
Below are standard failover patterns, from cheapest to most resilient:
Cold Standby
Backup system exists but is turned off.
Restored manually from backups.
RTO: Hours
Cost: Low
Example: Archive systems or staging environments.
Warm Standby
Partially active backup that receives continuous data updates.
Scaled up on demand during failure.
RTO: Minutes
Cost: Medium
Example: E-commerce store backups.
Hot Standby
Fully provisioned clone, continuously updated and ready to take traffic.
RTO: Seconds
Cost: High
Example: Critical financial or healthcare systems.
Multi‑Primary (Active‑Active)
Multiple regions serve traffic simultaneously.
Requires bidirectional replication and conflict handling.
Fastest recovery and lowest latency
Hardest to manage due to data conflicts
Example:
A global chat app — EU users connect to the EU data center, US users to the US, both stay synchronized.
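One simple conflict-handling strategy for active-active replication is last-write-wins: when two regions update the same record, keep the version with the newest timestamp. A sketch (real systems often use vector clocks or CRDTs instead, since wall-clock LWW can silently drop writes):

```python
def merge(record_a, record_b):
    """Last-write-wins: each record is (value, timestamp); newest wins."""
    return record_a if record_a[1] >= record_b[1] else record_b

# Concurrent writes to the same chat message from two regions:
eu_write = ("hello", 1700000000.0)
us_write = ("hi",    1700000005.2)   # written 5.2 seconds later

print(merge(eu_write, us_write)[0])  # "hi"
```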
Putting It All Together (A Growth Journey)
| Stage | What You Add | Purpose |
| --- | --- | --- |
| 1 | Single server, vertical scaling | Simple and low-cost setup |
| 2 | Separate database, stateless app | Better reliability and maintainability |
| 3 | Load balancer with multiple servers | Handles more traffic |
| 4 | Caching, read replicas, sharding | Reduces load on the main database |
| 5 | Failover mechanisms, automation | Increases uptime and resilience |
| 6 | Multi-region deployment, global monitoring | Supports global traffic and quick recovery |
Key Takeaways
System design = trade‑offs under constraints.
Start small, evolve realistically — don’t over‑engineer early on.
Stateless design + separate databases unlock horizontal scaling.
Database scaling = replicas + caching + sharding + pooling.
Failover design ensures reliability during disasters.
Evolve incrementally — track performance, failure rates, and cost.