This content originally appeared on DEV Community and was authored by Arijit Ghosh
Migrating a production database is like replacing the engine of a plane mid-flight: users keep using your app while you swap out the core infrastructure beneath it. Get it wrong and you risk downtime, data corruption, or lost revenue.
The gold standard: zero downtime migration (ZDM).
This post covers all major migration scenarios:
- SQL → NoSQL
- NoSQL → SQL
- SQL → SQL
- On-Premises → Cloud
- Homogeneous Upgrades
- Database Consolidation
We’ll break down:
- Why you’d migrate
- Phased strategy
- Tools, patterns, and failure handling
- Code + config examples
- Operational playbook (monitoring, rollback, validation)
Phases of a Zero-Downtime Migration
Every migration (regardless of DB type) follows the same 5-phase flow:
- Preparation & Planning
- Schema mapping (source vs target).
- Choosing tools (CDC, replication, ETL).
- Rollback plan.
- SLA definition: how much latency is acceptable?
- Bulk Load (Historical Data)
- Snapshot existing data into the target DB.
- Use ETL tools (pgloader, Spark, Dataflow).
- Validate counts & checksums.
- Change Data Capture (CDC)
- Stream live changes (inserts, updates, deletes) to the new DB.
- Keep old and new in sync.
- Tools: Debezium, AWS DMS, GoldenGate, Kafka Connect.
- Dual Writes (Optional but safer)
- Application writes to both old & new DBs.
- Protects against CDC lag/missed events.
- Usually controlled via feature flags.
```python
def insert_user(user):
    # Old DB (PostgreSQL) -- still the source of truth
    pg_conn.execute("INSERT INTO users (id, name) VALUES (%s, %s)", (user.id, user.name))
    # New DB (Cassandra) -- shadow write kept in sync
    cassandra_conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (user.id, user.name))
```
- Cutover & Verification
- Switch read traffic to new DB.
- Run validation jobs (row counts, checksums, business queries).
- Keep old DB in sync until rollback window closes.
SQL → NoSQL (PostgreSQL → Cassandra)
Why
Scalability → Handle billions of records across commodity servers.
High write throughput → IoT sensors, clickstreams, logging.
Flexible schema → User profiles, metadata that evolves.
Migration Architecture
- Bulk Load
- Export from PostgreSQL → load into Cassandra.
- Example: Spark job reading from JDBC + writing to Cassandra.
```python
from pyspark.sql import SparkSession

# Requires the PostgreSQL JDBC driver and the spark-cassandra-connector on the classpath.
spark = SparkSession.builder.appName("pg2cassandra").getOrCreate()

pg_df = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:postgresql://pg-host/mydb") \
    .option("dbtable", "users") \
    .option("user", "postgres") \
    .option("password", "secret") \
    .load()

pg_df.write \
    .format("org.apache.spark.sql.cassandra") \
    .mode("append") \
    .options(table="users", keyspace="app") \
    .save()
```
- CDC with Debezium + Kafka → Cassandra Sink
- PostgreSQL WAL logs → Debezium → Kafka topic → Cassandra Sink Connector.
- Ensures all updates flow in real-time.
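As a sketch (hostnames, credentials, and the topic prefix are placeholders, and exact property names vary between Debezium versions), the PostgreSQL connector can be registered against the Kafka Connect REST API like this:

```python
import json
import requests  # assumes the requests library is available

# Hypothetical hosts/credentials; adjust for your environment.
connector = {
    "name": "pg-users-connector",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "plugin.name": "pgoutput",             # logical decoding output plugin
        "database.hostname": "pg-host",
        "database.port": "5432",
        "database.user": "postgres",
        "database.password": "secret",
        "database.dbname": "mydb",
        "topic.prefix": "app",                 # change topics become app.public.users, ...
        "table.include.list": "public.users",
    },
}

# Register the connector with the Kafka Connect REST API.
resp = requests.post("http://kafka-connect:8083/connectors", json=connector)
resp.raise_for_status()
```

A Cassandra sink connector (for example, the DataStax Apache Kafka Connector) then consumes those topics and applies the changes to the target keyspace.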
- Dual Writes
- New API writes to both databases.
- Read traffic still on PostgreSQL.
- Read Cutover
- Gradually shift reads to Cassandra behind a feature flag.
- Use A/B rollout → 5% traffic, 20%, 50%, 100%.
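A minimal routing sketch for that rollout, assuming a hypothetical flag value and the same connection objects used earlier:

```python
import random

# Hypothetical flag value -- in practice read from LaunchDarkly, Unleash, or a config service.
CASSANDRA_READ_PERCENT = 5  # raise to 20, 50, 100 as confidence grows

def get_user(user_id):
    # Route a slice of read traffic to Cassandra; everything else stays on PostgreSQL.
    if random.randint(1, 100) <= CASSANDRA_READ_PERCENT:
        return cassandra_conn.execute(
            "SELECT id, name FROM users WHERE id = ?", (user_id,)
        ).one()
    return pg_conn.execute(
        "SELECT id, name FROM users WHERE id = %s", (user_id,)
    ).fetchone()
```

Dialing the percentage back to 0 doubles as the instant rollback path described under failure handling below.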
- Validation
- Count check:

```sql
SELECT COUNT(*) FROM users; -- PostgreSQL
```

```cql
SELECT COUNT(*) FROM users; -- Cassandra
```

- Sampled data comparison with hash functions.
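One way to do the sampled comparison (the connection objects and the column list are assumptions) is to hash identical projections of the same rows on both sides and diff the results:

```python
import hashlib

def row_hash(row):
    # Hash a canonical string form of the row so the two databases can be compared.
    return hashlib.sha256("|".join(str(v) for v in row).encode()).hexdigest()

def compare_sample(user_ids):
    mismatches = []
    for uid in user_ids:
        pg_row = pg_conn.execute(
            "SELECT id, name FROM users WHERE id = %s", (uid,)
        ).fetchone()
        c_row = cassandra_conn.execute(
            "SELECT id, name FROM users WHERE id = ?", (uid,)
        ).one()
        if pg_row is None or c_row is None or row_hash(pg_row) != row_hash(tuple(c_row)):
            mismatches.append(uid)
    return mismatches
```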
Failure Handling
- If Cassandra lags behind (CDC has not caught up), keep PostgreSQL as the source of truth until the backlog drains.
- If data corruption is detected, roll reads back to PostgreSQL instantly (via feature flag).
NoSQL → SQL (MongoDB → MySQL)
Why
Transactions (ACID) → Payments, banking.
Complex queries → JOINs, reporting, analytics.
Data integrity → Strong schemas, constraints.
Migration Architecture
- Schema Design
- MongoDB nested docs → multiple relational tables.
- Example:

```json
{
  "user_id": 1,
  "name": "Alice",
  "orders": [
    { "id": 101, "amount": 50 },
    { "id": 102, "amount": 75 }
  ]
}
```

maps to

```sql
CREATE TABLE users (
  user_id INT PRIMARY KEY,
  name    VARCHAR(255)
);

CREATE TABLE orders (
  id      INT PRIMARY KEY,
  user_id INT,
  amount  DECIMAL(10,2)
);
```
- Bulk Migration
- Use `mongoexport` + `mysqlimport`, or a Spark ETL job. For example:

```bash
mongoexport --db app --collection users --out users.json
```

```python
import json, pymysql

conn = pymysql.connect(host="mysql", user="root", password="secret", db="app")
cursor = conn.cursor()

with open("users.json") as f:
    for line in f:  # mongoexport writes one JSON document per line
        doc = json.loads(line)
        cursor.execute(
            "INSERT INTO users (user_id, name) VALUES (%s, %s)",
            (doc["user_id"], doc["name"]),
        )
conn.commit()
```
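The nested orders array from the sample document above needs the same treatment; continuing the script (reusing its `conn` and `cursor`, and the field names from the example) might look like:

```python
# Second pass (could also be folded into the loop above): flatten embedded orders
# into their own rows in the relational orders table.
with open("users.json") as f:
    for line in f:
        doc = json.loads(line)
        for order in doc.get("orders", []):
            cursor.execute(
                "INSERT INTO orders (id, user_id, amount) VALUES (%s, %s, %s)",
                (order["id"], doc["user_id"], order["amount"]),
            )
conn.commit()
```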
- Dual Writes
- Application writes to MongoDB and MySQL.
- Ensure transaction ordering is preserved.
- Read Cutover & Validation
- Gradually switch analytics/reporting → MySQL.
- Eventually shift entire app.
Failure Handling
- Mongo → SQL mismatches (nulls, missing keys).
- Use strict ETL validation: enforce schema during migration.
SQL → SQL (Oracle → PostgreSQL)
Why
Cost reduction → Oracle licenses are expensive.
Modern features → JSONB, extensions in PostgreSQL.
Vendor freedom → Avoid lock-in.
Migration Architecture
- Schema Conversion
- Use AWS Schema Conversion Tool (SCT) or `ora2pg`:

```bash
ora2pg -c config/ora2pg.conf -t TABLE -o schema.sql
```
- Bulk Migration
- pgloader is the usual one-liner when the source is MySQL, SQLite, or MS SQL Server, but it has no Oracle driver; for an Oracle source, handle the data copy with `ora2pg` as well (e.g. `ora2pg -c config/ora2pg.conf -t COPY -o data.sql`, loaded into PostgreSQL with `psql -f data.sql`) or with AWS DMS.
- CDC / Replication
- Oracle GoldenGate → Kafka → PostgreSQL.
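If the CDC tool publishes JSON change events to Kafka, the Kafka-to-PostgreSQL leg can be a small consumer; the topic name and event shape here are purely illustrative assumptions, not GoldenGate's actual output format:

```python
import json
import psycopg2
from confluent_kafka import Consumer

pg = psycopg2.connect(host="pg-db", dbname="mydb", user="user", password="pass")
consumer = Consumer({
    "bootstrap.servers": "kafka:9092",
    "group.id": "oracle-to-pg-apply",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["oracle.mydb.users"])  # hypothetical topic name

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    event = json.loads(msg.value())  # assumed shape: {"op": "...", "id": ..., "name": ...}
    with pg.cursor() as cur:
        if event["op"] in ("insert", "update"):
            cur.execute(
                "INSERT INTO users (id, name) VALUES (%s, %s) "
                "ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name",
                (event["id"], event["name"]),
            )
        elif event["op"] == "delete":
            cur.execute("DELETE FROM users WHERE id = %s", (event["id"],))
    pg.commit()
```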
- Cutover
- DNS/application config switch after lag = 0.
Failure Handling
- Schema incompatibility (e.g., Oracle NUMBER vs PostgreSQL NUMERIC).
- Performance regressions (query plans differ).
- Fix with query rewrites, indexes, or caching.
On-Prem → Cloud (Postgres → RDS)
Why
Elasticity → scale up/down instantly.
Managed services → backups, patching handled by AWS/GCP.
Opex over Capex → Pay-as-you-go vs hardware.
Migration Architecture
- Snapshot Load
- Dump the on-prem DB and upload it to S3:

```bash
pg_dump -Fc mydb > mydb.dump
aws s3 cp mydb.dump s3://migration-bucket/
```

- Restore into RDS:

```bash
pg_restore -h rds-host -U postgres -d mydb mydb.dump
```

- Logical Replication (CDC)
- Set up the `wal2json` plugin and stream changes into RDS, or use AWS DMS for ongoing replication.
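Where the target allows subscriptions, PostgreSQL's built-in logical replication (pgoutput, an alternative to a wal2json pipeline) can drive this step; hostnames and credentials below are placeholders:

```python
import psycopg2

# On the on-prem source: publish the tables to migrate.
src = psycopg2.connect(host="onprem-host", dbname="mydb", user="postgres", password="secret")
src.autocommit = True
with src.cursor() as cur:
    cur.execute("CREATE PUBLICATION migration_pub FOR ALL TABLES;")

# On the RDS target: subscribing copies existing rows, then streams ongoing changes.
# CREATE SUBSCRIPTION cannot run inside a transaction block, hence autocommit.
dst = psycopg2.connect(host="rds-host", dbname="mydb", user="postgres", password="secret")
dst.autocommit = True
with dst.cursor() as cur:
    cur.execute(
        "CREATE SUBSCRIPTION migration_sub "
        "CONNECTION 'host=onprem-host dbname=mydb user=repl password=secret' "
        "PUBLICATION migration_pub;"
    )
```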
- Cutover
- Point application DB connection string to RDS.
- Use DNS switch for instant failover.
Homogeneous Upgrade (MySQL 5.7 → 8.0)
Why
Security patches.
Performance improvements.
New features (CTEs, window functions, JSON ops).
Migration Architecture
- In-Place Upgrade (if supported).
- Replication Upgrade (safer):
- Configure MySQL 8.0 as replica.
- Sync binlogs.
- Promote replica → new primary.
```sql
CHANGE MASTER TO
  MASTER_HOST='mysql57',
  MASTER_USER='repl',
  MASTER_PASSWORD='secret',
  MASTER_LOG_FILE='binlog.000001',
  MASTER_LOG_POS=154;

START SLAVE;
```
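Before promoting the 8.0 replica, confirm it has fully caught up; a small polling sketch (host and credentials are placeholders) against SHOW SLAVE STATUS:

```python
import time
import pymysql

replica = pymysql.connect(host="mysql80", user="root", password="secret",
                          cursorclass=pymysql.cursors.DictCursor)

def replication_lag_seconds():
    with replica.cursor() as cur:
        cur.execute("SHOW SLAVE STATUS")
        status = cur.fetchone()
    # Seconds_Behind_Master is None when replication is stopped, so only 0 counts as caught up.
    return status["Seconds_Behind_Master"]

while replication_lag_seconds() != 0:
    time.sleep(5)

print("Replica caught up -- safe to promote")
```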
Database Consolidation
Why
Merge systems after acquisitions.
Single source of truth.
Simplify operations.
Migration Architecture
- Canonical Schema Design
- Define golden schema across merged DBs.
- Data Cleaning & Deduplication
- ETL pipeline (Airflow, Spark, dbt).
- Resolve conflicts (e.g., duplicate customer IDs).
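As a toy illustration of the dedup step (pandas, the CSV extracts, and the "latest update wins" rule are all assumptions):

```python
import pandas as pd

# Hypothetical extracts from the two systems being consolidated.
customers_a = pd.read_csv("crm_a_customers.csv")
customers_b = pd.read_csv("crm_b_customers.csv")

merged = pd.concat([customers_a, customers_b], ignore_index=True)

# Conflict rule (an assumption): identical email means the same customer,
# and the most recently updated record wins.
merged["updated_at"] = pd.to_datetime(merged["updated_at"])
golden = (
    merged.sort_values("updated_at")
          .drop_duplicates(subset=["email"], keep="last")
)

golden.to_csv("golden_customers.csv", index=False)
```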
- Cutover
- Redirect all applications to new consolidated DB.
Observability & Validation
No migration succeeds without visibility. You need:
- Lag monitoring → replication delay, CDC offsets.
- Data validation jobs → nightly row counts, hash comparisons.
- Application metrics → error rates, query latency.
- User-level canaries → test user accounts to verify correctness.
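For the lag-monitoring piece on a PostgreSQL source, replication delay is exposed in pg_stat_replication; a sketch with an arbitrary alert threshold and assumed connection details:

```python
import psycopg2

conn = psycopg2.connect(host="pg-host", dbname="mydb", user="postgres", password="secret")

with conn.cursor() as cur:
    # replay_lag exists on PostgreSQL 10+; it is NULL when there is nothing left to replay.
    cur.execute(
        "SELECT application_name, COALESCE(EXTRACT(EPOCH FROM replay_lag), 0) "
        "FROM pg_stat_replication"
    )
    for name, lag_seconds in cur.fetchall():
        if lag_seconds > 30:  # arbitrary threshold -- tune to your SLA
            print(f"ALERT: replica {name} is {lag_seconds:.0f}s behind")
```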
Rollback Strategies
Always assume failure. Options:
- Read-Only Freeze → stop new writes, replay missed events, retry.
- Feature-Flag Flip → route reads/writes back to old DB instantly.
- Dual DB Window → keep old DB warm for X days before decommission.
Final Thoughts
Zero-downtime migration is not a one-night project. It’s an iterative engineering effort involving:
- Bulk + incremental sync (ETL + CDC).
- Dual writes for safety.
- Feature flags for cutover.
- Continuous validation.
- A tested rollback plan.
The actual migration isn’t the hard part — the hard part is:
- Handling edge cases (schema mismatch, nulls, encoding issues).
- Keeping the business running while you operate two DBs at once.
- Convincing management to invest in testing and observability.
But done right → your users won’t even notice that your backend just swapped its engine out mid-flight.