This content originally appeared on DEV Community and was authored by Matheus Gomes
What Is System Design and Why It’s Valuable
System design is the process of planning how different parts of a software system work together: the architecture, components, data flow, and how everything scales or recovers from failure.
It aims to make sure your system:
Works correctly (meets functional requirements)
Performs efficiently and reliably (meets non-functional requirements like scalability, latency, and fault tolerance)
Why It’s Valuable
Team Growth: Clear boundaries let multiple teams develop without interfering.
Traffic Growth: Plan for scaling so your app doesn’t crash under load.
Risk Reduction: Identify and eliminate bottlenecks or single points of failure.
Cost Efficiency: Optimize infrastructure to save money at scale.
Reliability: Design for uptime—your users expect it.
Separating Out the Database
When you begin, you might have your app and database all on one machine.
But soon, as users grow, you’ll need to separate them.
Example
Imagine a simple blog app:
Your code runs on a web server (for example, Node.js or Python/Django).
It stores posts in a database (e.g., PostgreSQL).
By running the database separately, you can:
Scale your web servers independently.
Back up the database securely.
Use different database technologies for different needs.
In production, databases often run on their own managed services, like Amazon RDS or Google Cloud SQL.
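Once the database lives on another machine, the app just needs to know where to find it. A minimal sketch of that decoupling, assuming a `DATABASE_URL` environment variable (the variable name and DSN below are illustrative, not from the original article):

```python
import os

# Read the database location from the environment so the web tier and the
# database can live on different machines (or a managed service like RDS).
# DATABASE_URL is an assumed variable name; any config mechanism works.
DEFAULT_URL = "postgresql://app:secret@localhost:5432/blog"

def database_url() -> str:
    """Return the DSN of the (possibly remote) database server."""
    return os.environ.get("DATABASE_URL", DEFAULT_URL)

print(database_url())
```

Pointing the app at a different `DATABASE_URL` is then a deployment change, not a code change.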
Vertical Scaling (Scaling Up)
Vertical scaling means upgrading your current machine, adding more CPU, memory, or faster SSDs.
Example
You start with:
t2.micro: 1 vCPU, 1 GiB RAM
Traffic grows, so you upgrade to:
t2.xlarge: 4 vCPUs, 16 GiB RAM
Pros
Simple to implement, often no code changes required.
Low latency and fast in-memory performance.
Cons
Costs rise quickly.
Machine size has physical limits.
One failure can take down the whole system.
Use vertical scaling when:
You’re starting out.
Your app doesn’t yet need multiple servers.
Horizontal Scaling (Scaling Out)
Horizontal scaling means adding more machines instead of upgrading one.
It’s like adding more waiters to a busy restaurant instead of hiring one superhuman waiter.
Example
You start with:
- 1 web server handling all requests.
When traffic increases:
- Add more servers.
A load balancer will distribute requests among them.
Load Balancer
A Load Balancer (LB) spreads requests evenly across several servers.
How It Works
Client → LB
LB → Routes the request to a server (round-robin, least connections, etc.)
Server responds → LB → Client
LB Responsibilities
Distribute traffic
Check server health
Terminate SSL/TLS
Remove bad servers from rotation
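The first and last responsibilities above can be sketched in a few lines. This is a toy round-robin balancer, not how ELB or NGINX are implemented; the server names are placeholders:

```python
class LoadBalancer:
    """Toy round-robin load balancer: distributes requests across servers
    and drops any server that fails a health check from the rotation."""

    def __init__(self, servers):
        self.servers = list(servers)  # servers currently in rotation
        self._next = 0                # index of the next server to use

    def pick(self):
        """Return the next server in round-robin order."""
        if not self.servers:
            raise RuntimeError("no healthy servers")
        server = self.servers[self._next % len(self.servers)]
        self._next += 1
        return server

    def mark_unhealthy(self, server):
        """Health check failed: remove the server from rotation."""
        if server in self.servers:
            self.servers.remove(server)

lb = LoadBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print([lb.pick() for _ in range(4)])  # cycles through the pool
```

Real load balancers add health-check probes, connection draining, and TLS termination on top of this basic rotation.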
Example
AWS users might use Elastic Load Balancing (ELB).
In local setups, you might try NGINX or HAProxy.
Benefits
Seamless scaling by adding/removing servers.
Zero-downtime updates using rolling deployments.
Stateless Services
A stateless service doesn’t remember anything between requests.
All data or sessions are stored elsewhere (like a database or cache).
Example
Imagine a shopping cart:
Stateful: Stored in web server memory. If that server dies, cart is gone.
Stateless: Cart stored in a database or Redis. Any server can respond.
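The stateless version can be sketched like this. For a runnable example, a plain dict stands in for Redis or a database; in production you would swap in a real shared store:

```python
# A stateless request handler: the cart lives in an external store,
# so ANY server instance can serve ANY user's next request.
# The dict below is a stand-in for Redis or a database.

def add_to_cart(store, user_id, item):
    """Handle a request without keeping any per-user state on the server."""
    cart = store.setdefault(user_id, [])
    cart.append(item)
    return cart

shared_store = {}                              # in production: Redis, a DB, etc.
add_to_cart(shared_store, "user-42", "book")
add_to_cart(shared_store, "user-42", "pen")    # could run on a different server
print(shared_store["user-42"])
```

Because the handler itself holds no state, killing or adding server instances never loses a cart.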
Benefits
Easy to scale horizontally.
Increased fault tolerance.
Updates and deployments are simpler.
Serverless
Serverless computing means you write functions, not servers.
Cloud providers run them on demand.
Example
You upload a photo → this triggers a Lambda function that stores it in S3 and updates a database.
You don’t manage infrastructure; you pay only per execution.
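The photo-upload flow might look like the sketch below. The event shape mirrors an S3 notification, and `save_metadata` is a hypothetical stand-in for the database write:

```python
# Sketch of a Lambda-style handler for the photo-upload example.
# save_metadata is a hypothetical helper; in a real function it would
# write to DynamoDB or RDS via the AWS SDK.

def save_metadata(bucket, key):
    # Stand-in for a database write.
    return {"bucket": bucket, "key": key}

def handler(event, context=None):
    """Runs on demand per upload; no servers for you to manage."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]
    return save_metadata(bucket, key)

# A minimal S3-notification-shaped event for local testing:
sample_event = {"Records": [{"s3": {"bucket": {"name": "photos"},
                                    "object": {"key": "cat.jpg"}}}]}
print(handler(sample_event))
```

The provider invokes `handler` for each event; you never provision or scale the machines it runs on.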
Pros
Zero infrastructure management.
Scales instantly.
You pay only when your code runs.
Cons
Startup delay (cold starts).
Harder debugging and monitoring.
Time and memory limits.
Serverless is ideal for:
Event-driven apps.
APIs with unpredictable traffic.
Lightweight background jobs (e.g., sending emails).
Scaling the Databases
Databases are often the hardest to scale, since they hold state.
Strategies
1. Read Replicas
Use additional servers for read operations, so the main database focuses on writes.
Example:
A news website can serve millions of readers using read replicas, while journalists write only to the primary database.
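One common way to use replicas is a tiny router in the data layer: writes go to the primary, reads fan out across replicas. A sketch with string placeholders for real connections:

```python
import random

# Minimal read/write router. The connection names are placeholders;
# a real implementation would hold actual DB connections.
PRIMARY = "primary-db"
REPLICAS = ["replica-1", "replica-2", "replica-3"]

def route(sql: str) -> str:
    """Send writes to the primary and reads to a random replica."""
    is_read = sql.lstrip().upper().startswith("SELECT")
    return random.choice(REPLICAS) if is_read else PRIMARY

print(route("SELECT * FROM articles"))    # one of the replicas
print(route("INSERT INTO articles ..."))  # primary-db
```

Note that replication lag means a replica may briefly serve slightly stale data; read-your-own-writes flows sometimes need to be pinned to the primary.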
2. Caching
Store frequently accessed data in memory.
This reduces database load.
Example:
Instead of repeatedly querying `SELECT * FROM product WHERE id=123`, cache it for 10 minutes.
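A cache-aside pattern with a 10-minute TTL can be sketched in plain Python (a dict stands in for Redis or Memcached; the function names are illustrative):

```python
import time

_cache = {}          # key -> (value, expiry_timestamp); stand-in for Redis
TTL_SECONDS = 600    # "cache it for 10 minutes"

def get_product(product_id, fetch_from_db):
    """Return a cached row, hitting the DB only on a miss or expiry."""
    key = f"product:{product_id}"
    hit = _cache.get(key)
    if hit and hit[1] > time.time():
        return hit[0]                              # cache hit
    value = fetch_from_db(product_id)              # the expensive SELECT
    _cache[key] = (value, time.time() + TTL_SECONDS)
    return value

calls = []
def fake_db(pid):
    calls.append(pid)
    return {"id": pid, "name": "Widget"}

get_product(123, fake_db)
get_product(123, fake_db)   # served from cache; no second DB hit
print(len(calls))           # 1
```

The trade-off is staleness: within the TTL window, readers may see data up to 10 minutes old.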
3. Sharding (Partitioning)
Split large datasets into smaller parts by a chosen key.
Example:
Shard 1: Users 1–1,000,000
Shard 2: Users 1,000,001–2,000,000
Benefits:
Boosts throughput and storage.
Avoids single DB bottlenecks.
Challenges:
Harder migrations.
Managing cross-shard queries.
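The range-based split above boils down to one routing function (numbers match the example; shards are numbered from 0 here):

```python
SHARD_SIZE = 1_000_000  # users per shard, as in the range example above

def shard_for_user(user_id: int) -> int:
    """Range-based sharding: users 1..1M -> shard 0, 1M+1..2M -> shard 1."""
    return (user_id - 1) // SHARD_SIZE

print(shard_for_user(42))          # shard 0
print(shard_for_user(1_500_000))   # shard 1
```

Hash-based sharding (`hash(user_id) % num_shards`) is a common alternative that spreads hot ranges more evenly, at the cost of making range scans harder.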
4. Connection Pooling
Limit DB connections by having a shared pool (e.g., PgBouncer).
This avoids a DB overload when many app servers connect at once.
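The core idea of a bounded pool fits in a short sketch (string "connections" stand in for real ones; this is not how PgBouncer itself is built):

```python
import queue

class ConnectionPool:
    """Bounded pool: at most max_size connections ever exist, so the
    database can't be flooded no matter how many app servers ask."""

    def __init__(self, max_size=5):
        self._pool = queue.Queue()
        for i in range(max_size):
            self._pool.put(f"conn-{i}")   # placeholder for a real connection

    def acquire(self, timeout=1.0):
        # Blocks (up to timeout) if every connection is already in use.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)

pool = ConnectionPool(max_size=2)
c1 = pool.acquire()
c2 = pool.acquire()
pool.release(c1)   # a third caller could now proceed
```

Callers queue up instead of opening new connections, which keeps the database's connection count flat under load spikes.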
5. CQRS (Command Query Responsibility Segregation)
Separate read and write operations into different models:
Commands: Insert, update.
Queries: Fetch data, often denormalized.
This enables independent optimization and scaling.
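A minimal CQRS sketch, with illustrative names: commands append to a normalized write model, and each write also updates a denormalized read model so queries need no aggregation:

```python
# Write model: normalized event-style list of orders.
orders = []
# Read model: denormalized totals, kept in sync on every write.
totals_by_user = {}

def place_order(user, amount):
    """Command: mutate the write model and update the read model."""
    orders.append({"user": user, "amount": amount})
    totals_by_user[user] = totals_by_user.get(user, 0) + amount

def get_user_total(user):
    """Query: a plain lookup, no joins or aggregation at read time."""
    return totals_by_user.get(user, 0)

place_order("ana", 30)
place_order("ana", 20)
print(get_user_total("ana"))  # 50
```

Because the two models are separate, each side can be stored, indexed, and scaled on its own (e.g., writes in PostgreSQL, reads in Redis or Elasticsearch).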
6. Multi‑Region Setup
Replicate data across regions to reduce latency and improve resilience.
Example:
Users in Brazil read/write from the São Paulo region, while users in Germany use Frankfurt.
Failover Strategies
When something fails (and it will), your system must recover automatically.
Below are standard failover patterns, from cheapest to most resilient:
Cold Standby
Backup system exists but is turned off.
Restored manually from backups.
RTO: Hours
Cost: Low
Example: Archive systems or staging environments.
Warm Standby
Partially active backup that receives continuous data updates.
Scaled up on demand during failure.
RTO: Minutes
Cost: Medium
Example: E-commerce store backups.
Hot Standby
Fully provisioned clone, continuously updated and ready to take traffic.
RTO: Seconds
Cost: High
Example: Critical financial or healthcare systems.
Multi‑Primary (Active‑Active)
Multiple regions serve traffic simultaneously.
Requires bidirectional replication and conflict handling.
Fastest recovery and lowest latency
Hardest to manage due to data conflicts
Example:
A global chat app — EU users connect to the EU data center, US users to the US, both stay synchronized.
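One simple conflict-handling strategy for active-active replication is last-write-wins: when two regions update the same record, keep the version with the newest timestamp. A sketch (real systems often use vector clocks or CRDTs instead, since wall-clock LWW can silently drop writes):

```python
def merge(record_a, record_b):
    """Last-write-wins: each record is (value, timestamp); newest wins."""
    return record_a if record_a[1] >= record_b[1] else record_b

# Concurrent writes to the same chat message from two regions:
eu_write = ("hello", 1700000000.0)
us_write = ("hi",    1700000005.2)   # written 5.2 seconds later

print(merge(eu_write, us_write)[0])  # "hi"
```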
Putting It All Together (A Growth Journey)
| Stage | What You Add | Purpose |
| --- | --- | --- |
| 1 | Single server, vertical scaling | Simple and low-cost setup |
| 2 | Separate database, stateless app | Better reliability and maintainability |
| 3 | Load balancer with multiple servers | Handles more traffic |
| 4 | Caching, read replicas, sharding | Reduces load on the main database |
| 5 | Failover mechanisms, automation | Increases uptime and resilience |
| 6 | Multi-region deployment, global monitoring | Supports global traffic and quick recovery |
Key Takeaways
System design = trade‑offs under constraints.
Start small, evolve realistically — don’t over‑engineer early on.
Stateless design + separate databases unlock horizontal scaling.
Database scaling = replicas + caching + sharding + pooling.
Failover design ensures reliability during disasters.
Evolve incrementally — track performance, failure rates, and cost.