Zero-Downtime Database Migration: The Definitive Guide



This content originally appeared on DEV Community and was authored by Arijit Ghosh

Migrating a production database is like replacing the engine of a plane mid-flight — users keep using your app while you swap the core infrastructure beneath it. If you mess up, downtime, data corruption, or revenue loss are almost guaranteed.

The gold standard: zero downtime migration (ZDM).

This post covers all major migration scenarios:

  • SQL → NoSQL
  • NoSQL → SQL
  • SQL → SQL
  • On-Premises → Cloud
  • Homogeneous Upgrades
  • Database Consolidation

We’ll break down:

  1. Why you’d migrate
  2. Phased strategy
  3. Tools, patterns, and failure handling
  4. Code + config examples
  5. Operational playbook (monitoring, rollback, validation)

🔑 Phases of a Zero-Downtime Migration

Every migration (regardless of DB type) follows the same 5-phase flow:

  1. Preparation & Planning
  • Schema mapping (source vs target).
  • Choosing tools (CDC, replication, ETL).
  • Rollback plan.
  • SLA definition: how much replication lag (and added read/write latency) is acceptable during the migration?
  2. Bulk Load (Historical Data)
  • Snapshot existing data into the target DB.
  • Use ETL tools (pgloader, Spark, Dataflow).
  • Validate counts & checksums.
  3. Change Data Capture (CDC)
  • Stream live changes (inserts, updates, deletes) to the new DB.
  • Keep old and new in sync.
  • Tools: Debezium, AWS DMS, GoldenGate, Kafka Connect.
  4. Dual Writes (Optional but safer)
  • Application writes to both old & new DBs.
  • Protects against CDC lag/missed events.
  • Usually controlled via feature flags (see the sketch after this list).
    def insert_user(user):
        # Old DB (psycopg2): execute on a cursor, then commit
        with pg_conn.cursor() as cur:
            cur.execute("INSERT INTO users (id, name) VALUES (%s, %s)", (user.id, user.name))
        pg_conn.commit()

        # New DB (Cassandra driver session): simple statements use %s placeholders, not ?
        cassandra_conn.execute("INSERT INTO users (id, name) VALUES (%s, %s)", (user.id, user.name))
  5. Cutover & Verification
  • Switch read traffic to new DB.
  • Run validation jobs (row counts, checksums, business queries).
  • Keep old DB in sync until rollback window closes.
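To make phases 4 and 5 concrete, here is a minimal sketch of a data-access layer that dual-writes behind a flag and shifts reads by percentage. It assumes an open psycopg2 connection (`pg_conn`), a Cassandra driver session (`cassandra_conn`), and a hypothetical in-memory `flags` store; in practice the flag values would come from your feature-flag service.

```python
import hashlib
import logging

# Hypothetical flag store; swap in LaunchDarkly, Unleash, a config table, etc.
flags = {"dual_write": True, "read_new_pct": 5}

def write_user(user):
    # The old DB stays the source of truth until cutover completes.
    with pg_conn.cursor() as cur:
        cur.execute("INSERT INTO users (id, name) VALUES (%s, %s)", (user.id, user.name))
    pg_conn.commit()

    if flags["dual_write"]:
        try:
            cassandra_conn.execute(
                "INSERT INTO users (id, name) VALUES (%s, %s)", (user.id, user.name)
            )
        except Exception:
            # Never fail the request because the *new* DB write failed;
            # log it and let CDC / a reconciliation job repair the gap.
            logging.exception("dual write to Cassandra failed for user %s", user.id)

def read_user(user_id):
    # Deterministic bucketing so the same user always hits the same backend.
    bucket = int(hashlib.md5(str(user_id).encode()).hexdigest(), 16) % 100
    if bucket < flags["read_new_pct"]:
        return cassandra_conn.execute(
            "SELECT id, name FROM users WHERE id = %s", (user_id,)
        ).one()
    with pg_conn.cursor() as cur:
        cur.execute("SELECT id, name FROM users WHERE id = %s", (user_id,))
        return cur.fetchone()
```

Raising `read_new_pct` from 5 toward 100 is the cutover; dropping it back to 0 is the rollback.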

SQL → NoSQL (PostgreSQL → Cassandra)

Why

  • 🚀 Scalability → Handle billions of records across commodity servers.
  • ⚡ High write throughput → IoT sensors, clickstreams, logging.
  • 🧩 Flexible schema → User profiles, metadata that evolves.

Migration Architecture

  1. Bulk Load
  • Export from PostgreSQL → load into Cassandra.
  • Example: Spark job reading from JDBC + writing to Cassandra.
    from pyspark.sql import SparkSession

    # Requires the Spark Cassandra Connector package on the Spark classpath
    spark = SparkSession.builder.appName("pg2cassandra").getOrCreate()

    # Read the users table from PostgreSQL over JDBC
    pg_df = spark.read \
        .format("jdbc") \
        .option("url", "jdbc:postgresql://pg-host/mydb") \
        .option("dbtable", "users") \
        .option("user", "postgres") \
        .option("password", "secret") \
        .load()

    # Write the snapshot into the Cassandra keyspace "app"
    pg_df.write \
        .format("org.apache.spark.sql.cassandra") \
        .options(table="users", keyspace="app") \
        .save()
  2. CDC with Debezium + Kafka → Cassandra Sink
  • PostgreSQL WAL → Debezium → Kafka topic → Cassandra Sink Connector (a connector-registration sketch follows this list).
  • Ensures all updates flow in near real time.
  3. Dual Writes
  • New API writes to both databases.
  • Read traffic still on PostgreSQL.
  4. Read Cutover
  • Gradually shift reads to Cassandra behind a feature flag.
  • Use A/B rollout → 5% traffic, 20%, 50%, 100%.
  5. Validation
  • Count check:

      SELECT COUNT(*) FROM users; -- PostgreSQL
      SELECT COUNT(*) FROM users; -- Cassandra (CQL)

  • Sampled data comparison with hash functions (important for Cassandra, where a full-table COUNT(*) can time out on large tables).
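To make step 2 less hand-wavy, here is a hedged sketch of registering a Debezium PostgreSQL source connector through the Kafka Connect REST API. Host names, credentials, and the connector name are placeholders; the property names follow the Debezium PostgreSQL connector docs, but verify them against the version you deploy (Debezium 1.x uses `database.server.name` where 2.x uses `topic.prefix`).

```python
import json
import requests

# Placeholder Kafka Connect endpoint; adjust to your deployment.
CONNECT_URL = "http://connect:8083/connectors"

connector = {
    "name": "pg-users-source",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "plugin.name": "pgoutput",            # PostgreSQL logical decoding plugin
        "database.hostname": "pg-host",
        "database.port": "5432",
        "database.user": "debezium",
        "database.password": "secret",
        "database.dbname": "mydb",
        "table.include.list": "public.users",
        "topic.prefix": "pg",                 # Debezium 2.x naming
    },
}

resp = requests.post(CONNECT_URL, headers={"Content-Type": "application/json"},
                     data=json.dumps(connector))
resp.raise_for_status()
print("Created connector:", resp.json()["name"])
```

The Cassandra sink connector is registered the same way with its own `config` block, so the whole pipeline can be stood up (and torn down) from a script.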

Failure Handling

  • If Cassandra lags (CDC not caught up), keep PostgreSQL as the source of truth until it catches up.
  • If data corruption is detected → roll back reads to PostgreSQL instantly (via feature flag).

NoSQL → SQL (MongoDB → MySQL)

Why

  • 💰 Transactions (ACID) → Payments, banking.
  • 🔍 Complex queries → JOINs, reporting, analytics.
  • 🛡 Data integrity → Strong schemas, constraints.

Migration Architecture

  1. Schema Design
  • MongoDB nested docs → multiple relational tables.
  • Example:

     {
       "user_id": 1,
       "name": "Alice",
       "orders": [
         { "id": 101, "amount": 50 },
         { "id": 102, "amount": 75 }
       ]
     }
    


      CREATE TABLE users (user_id INT PRIMARY KEY, name VARCHAR(255));
      CREATE TABLE orders (id INT PRIMARY KEY, user_id INT, amount DECIMAL(10,2),
                           FOREIGN KEY (user_id) REFERENCES users(user_id));
    
  2. Bulk Migration
  • Use mongoexport + mysqlimport, or a Spark ETL job (a sketch that also flattens the nested orders follows this list).
   mongoexport --db app --collection users --out users.json
    import json
    import pymysql

    conn = pymysql.connect(host="mysql", user="root", password="secret", db="app")

    with open("users.json") as f, conn.cursor() as cur:
        for line in f:  # mongoexport writes one JSON document per line
            doc = json.loads(line)
            cur.execute("INSERT INTO users (user_id, name) VALUES (%s, %s)",
                        (doc["user_id"], doc["name"]))

    conn.commit()
    conn.close()
  3. Dual Writes
  • Application writes to MongoDB and MySQL.
  • Ensure transaction ordering is preserved.
  4. Read Cutover & Validation
  • Gradually switch analytics/reporting → MySQL.
  • Eventually shift entire app.
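The bulk script above only copies the top-level user fields. Below is a sketch of the flattening step that also explodes the embedded `orders` array into the `orders` table; credentials and field names mirror the earlier example and are placeholders.

```python
import json
import pymysql

conn = pymysql.connect(host="mysql", user="root", password="secret", db="app")

with open("users.json") as f, conn.cursor() as cur:
    for line in f:
        doc = json.loads(line)

        # Parent row: one users row per document.
        cur.execute("INSERT INTO users (user_id, name) VALUES (%s, %s)",
                    (doc["user_id"], doc["name"]))

        # Child rows: one orders row per element of the embedded array.
        for order in doc.get("orders", []):
            cur.execute("INSERT INTO orders (id, user_id, amount) VALUES (%s, %s, %s)",
                        (order["id"], doc["user_id"], order["amount"]))

conn.commit()
conn.close()
```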

Failure Handling

  • Expect Mongo → SQL mismatches: nulls, missing keys, inconsistent types.
  • Use strict ETL validation: enforce the target schema during migration (see the sketch below).
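A sketch of what that strict validation can look like: check each exported document against the relational schema's expectations before it reaches the insert path, and divert bad rows to a dead-letter file for review. The required-field rules here are illustrative assumptions derived from the earlier `users` table.

```python
import json

# Expectations derived from the target MySQL schema (illustrative).
REQUIRED_FIELDS = {"user_id": int, "name": str}

def validate(doc):
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if doc.get(field) is None:
            errors.append(f"missing or null field: {field}")
        elif not isinstance(doc[field], expected_type):
            errors.append(f"wrong type for {field}: {type(doc[field]).__name__}")
    return errors

valid_docs = []
with open("users.json") as src, open("users.rejected.jsonl", "w") as dead_letter:
    for line in src:
        doc = json.loads(line)
        problems = validate(doc)
        if problems:
            # Divert the bad document instead of silently loading garbage.
            dead_letter.write(json.dumps({"doc": doc, "errors": problems}) + "\n")
        else:
            valid_docs.append(doc)  # feed these into the insert path shown earlier
```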

SQL → SQL (Oracle → PostgreSQL)

Why

  • 💸 Cost reduction → Oracle licenses are $$$.
  • ⚡ Modern features → JSONB, extensions in PostgreSQL.
  • 🔓 Vendor freedom → Avoid lock-in.

Migration Architecture

  1. Schema Conversion
  • Use AWS Schema Conversion Tool (SCT) or ora2pg.
   ora2pg -c config/ora2pg.conf -t TABLE -o schema.sql
  2. Bulk Migration
  • pgloader does not support Oracle as a source (it covers MySQL, SQLite, and MS SQL Server), so export the data with ora2pg (or use AWS DMS):

      ora2pg -c config/ora2pg.conf -t COPY -o data.sql
      psql -h pg-db -U user -d mydb -f data.sql
  3. CDC / Replication
  • Oracle GoldenGate → Kafka → PostgreSQL.
  4. Cutover
  • Switch DNS/application config once replication lag reaches zero, then validate (see the row-count sketch below).
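As a quick post-cutover sanity check, here is a sketch that compares row counts between the two systems using the python-oracledb and psycopg2 drivers; credentials and the table list are placeholders, and real validation should also cover checksums and business queries.

```python
import oracledb
import psycopg2

TABLES = ["users", "orders"]  # tables in scope for the migration

ora = oracledb.connect(user="app", password="secret", dsn="oracle-db/mydb")
pg = psycopg2.connect(host="pg-db", dbname="mydb", user="app", password="secret")

for table in TABLES:
    with ora.cursor() as oc:
        oc.execute(f"SELECT COUNT(*) FROM {table}")
        ora_count = oc.fetchone()[0]
    with pg.cursor() as pc:
        pc.execute(f"SELECT COUNT(*) FROM {table}")
        pg_count = pc.fetchone()[0]
    status = "OK" if ora_count == pg_count else "MISMATCH"
    print(f"{table}: oracle={ora_count} postgres={pg_count} -> {status}")
```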

Failure Handling

  • Schema incompatibility (e.g., Oracle NUMBER vs PostgreSQL NUMERIC).
  • Performance regressions (query plans differ).
  • Fix with query rewrites, indexes, or caching.

On-Prem → Cloud (Postgres → RDS)

Why

  • ☁ Elasticity → scale up/down instantly.
  • 🔧 Managed services → backups, patching handled by AWS/GCP.
  • 💰 Opex over Capex → Pay-as-you-go vs hardware.

Migration Architecture

  1. Snapshot Load
  • Dump on-prem DB:

     pg_dump -Fc mydb > mydb.dump
     aws s3 cp mydb.dump s3://migration-bucket/
    
  • Restore into RDS:

     pg_restore -h rds-host -U postgres -d mydb mydb.dump
    
  2. Logical Replication (CDC)
  • Use native logical replication (or a logical-decoding plugin such as wal2json) to stream ongoing changes into RDS.
  • Or use AWS DMS for ongoing replication.
  3. Cutover
  • Point the application's DB connection string to RDS once replication lag reaches zero (see the lag-check sketch below).
  • Use a DNS switch for near-instant failover.
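Before flipping DNS you want evidence that replication has caught up. A minimal sketch, assuming native logical replication and psycopg2, that polls pg_stat_replication on the on-prem primary (with AWS DMS you would watch the task's CDC latency metrics instead):

```python
import time
import psycopg2

# Connect to the on-prem primary; downstream subscribers show up in pg_stat_replication.
conn = psycopg2.connect(host="onprem-pg", dbname="mydb", user="postgres", password="secret")

def replication_lag_bytes():
    with conn.cursor() as cur:
        cur.execute("""
            SELECT application_name,
                   pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
            FROM pg_stat_replication
        """)
        return cur.fetchall()

# Poll until every subscriber has (approximately) caught up, then cut over.
while True:
    rows = replication_lag_bytes()
    print(rows)
    if rows and all(lag is not None and lag < 1024 for _, lag in rows):
        print("Replication caught up; safe to switch the connection string / DNS.")
        break
    time.sleep(5)
```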

Homogeneous Upgrade (MySQL 5.7 → 8.0)

Why

  • 🔒 Security patches.
  • 🚀 Performance improvements.
  • 🆕 New features (CTEs, window functions, JSON ops).

Migration Architecture

  1. In-Place Upgrade (if supported).
  2. Replication Upgrade (safer):
  • Configure MySQL 8.0 as replica.
  • Sync binlogs.
  • Promote the replica to be the new primary once it has fully caught up (see the lag-check sketch below).
    -- On newer MySQL 8.0 releases, the preferred syntax is
    -- CHANGE REPLICATION SOURCE TO ... / START REPLICA
    CHANGE MASTER TO
      MASTER_HOST='mysql57',
      MASTER_USER='repl',
      MASTER_PASSWORD='secret',
      MASTER_LOG_FILE='binlog.000001',
      MASTER_LOG_POS=154;
    START SLAVE;
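Before promoting the 8.0 replica, confirm it has applied everything from the 5.7 primary. A rough sketch using pymysql; host and credentials are placeholders, and on newer 8.0 releases the statement is SHOW REPLICA STATUS:

```python
import time
import pymysql

replica = pymysql.connect(host="mysql80", user="root", password="secret",
                          cursorclass=pymysql.cursors.DictCursor)

# Wait until the replica reports zero lag before stopping writes and promoting it.
while True:
    with replica.cursor() as cur:
        cur.execute("SHOW SLAVE STATUS")  # SHOW REPLICA STATUS on newer 8.0 releases
        status = cur.fetchone()
    lag = status["Seconds_Behind_Master"] if status else None
    print("Seconds_Behind_Master:", lag)
    if lag == 0:
        print("Replica caught up; stop writes on 5.7, then promote 8.0.")
        break
    time.sleep(5)
```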

Database Consolidation

Why

  • 🔄 Merge systems after acquisitions.
  • 📊 Single source of truth.
  • 🛠 Simplify operations.

Migration Architecture

  1. Canonical Schema Design
  • Define golden schema across merged DBs.
  2. Data Cleaning & Deduplication
  • ETL pipeline (Airflow, Spark, dbt).
  • Resolve conflicts (e.g., duplicate customer IDs); a toy deduplication sketch follows this list.
  3. Cutover
  • Redirect all applications to new consolidated DB.
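Deduplication is usually where consolidations get messy. A toy sketch of merging customer records from two source systems, keyed on a normalized email address, with a "latest update wins" rule; the field names and the rule itself are illustrative assumptions:

```python
# Toy conflict-resolution pass: merge customers from two source extracts,
# keyed on normalized email, keeping the most recently updated record
# while preserving the first canonical ID seen for that customer.
def merge_customers(source_a, source_b):
    merged = {}
    for record in list(source_a) + list(source_b):
        key = record["email"].strip().lower()
        existing = merged.get(key)
        if existing is None:
            merged[key] = dict(record, canonical_id=record["id"])
        elif record["updated_at"] > existing["updated_at"]:
            merged[key] = dict(record, canonical_id=existing["canonical_id"])
    return list(merged.values())

customers_a = [{"id": "A-1", "email": "Alice@example.com", "updated_at": "2024-01-05"}]
customers_b = [{"id": "B-9", "email": "alice@example.com", "updated_at": "2024-03-01"}]

for row in merge_customers(customers_a, customers_b):
    print(row["canonical_id"], row["email"], row["updated_at"])
```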

🔍 Observability & Validation

No migration succeeds without visibility. You need:

  • Lag monitoring → replication delay, CDC offsets.
  • Data validation jobs → nightly row counts, hash comparisons (see the sketch after this list).
  • Application metrics → error rates, query latency.
  • User-level canaries → test user accounts to verify correctness.
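As one concrete form of the validation-job bullet above, a sketch that samples primary keys and compares a column hash between the old and new databases; `old_lookup` and `new_lookup` are stand-in fetch functions for whatever drivers your stack uses:

```python
import hashlib
import random

def row_fingerprint(row):
    # Hash a normalized projection of the columns that must match.
    material = "|".join(str(row[col]) for col in ("id", "name", "email"))
    return hashlib.sha256(material.encode()).hexdigest()

def validate_sample(all_ids, old_lookup, new_lookup, sample_size=1000):
    mismatches = []
    for row_id in random.sample(all_ids, min(sample_size, len(all_ids))):
        old_row, new_row = old_lookup(row_id), new_lookup(row_id)
        if new_row is None or row_fingerprint(old_row) != row_fingerprint(new_row):
            mismatches.append(row_id)
    return mismatches

# Usage idea: run this nightly and alert if the mismatch rate exceeds a small threshold.
```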

🛡 Rollback Strategies

Always assume failure. Options:

  1. Read-Only Freeze → stop new writes, replay missed events, retry.
  2. Feature-Flag Flip → route reads/writes back to old DB instantly.
  3. Dual DB Window → keep old DB warm for X days before decommission.

🎯 Final Thoughts

Zero-downtime migration is not a one-night project. It’s an iterative engineering effort involving:

  • Bulk + incremental sync (ETL + CDC).
  • Dual writes for safety.
  • Feature flags for cutover.
  • Continuous validation.
  • A tested rollback plan.

The actual migration isn’t the hard part — the hard part is:

  • Handling edge cases (schema mismatch, nulls, encoding issues).
  • Keeping the business running while you operate two DBs at once.
  • Convincing management to invest in testing and observability.

But done right → your users won’t even notice that your backend just swapped its engine out mid-flight.

