MongoDB Basics: Python CRUD, Docker Setup, and DevOps Essentials

This content originally appeared on Level Up Coding – Medium and was authored by Akash Mahale

MongoDB is a popular, source-available NoSQL database that stores data in a flexible, JSON-like binary format called BSON. Unlike traditional relational databases (RDBMS), MongoDB doesn’t require a fixed schema, which makes it a good fit for applications whose data structures evolve rapidly. As a NoSQL database, it handles unstructured or semi-structured data well, scales horizontally, and supports high-throughput workloads. Typical use cases include storing millions of user profiles for social media platforms, managing product catalogs for large e-commerce sites, storing sensor data in IoT applications, and serving location-based content in real-time mapping apps.

In this tutorial, we will start by understanding how MongoDB stores and retrieves data, touch upon ACID and the CAP theorem, and explore its NoSQL-oriented design. We will then set up MongoDB in a Docker container, expose it on localhost, and connect to it using Python, implementing full CRUD (Create, Read, Update, Delete) operations. Finally, we will go beyond CRUD and touch on practical DevOps considerations such as backups, monitoring, and scaling, covering both a developer’s and a DevOps engineer’s perspective on running MongoDB in production.

MongoDB Primer

How data is stored in MongoDB:

Data is stored as documents inside collections (collection ≈ tables, document ≈ rows in RDBMS).
Document format: key-value pairs, nested objects, arrays.


{
  "name": "Virtus",
  "manufacturer": "Volkswagen",
  "specs": {
    "top_speed": 200,
    "fuel": "petrol"
  },
  "features": ["sedan", "5-star safety"]
}
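
As a rough illustration, the same document can be modeled in Python as a plain dict, which is also the shape pymongo works with. The helper get_path is hypothetical (it is not part of any MongoDB driver); it only mimics how dot notation such as specs.top_speed addresses a nested field.

```python
import json

# The car document from above as a Python dict.
car = {
    "name": "Virtus",
    "manufacturer": "Volkswagen",
    "specs": {"top_speed": 200, "fuel": "petrol"},
    "features": ["sedan", "5-star safety"],
}


def get_path(document, dotted_path):
    """Follow a dotted path like 'specs.top_speed' through nested dicts."""
    value = document
    for key in dotted_path.split("."):
        value = value[key]
    return value


print(get_path(car, "specs.top_speed"))  # 200
print(get_path(car, "features")[0])      # sedan
print(json.dumps(car["specs"]))          # {"top_speed": 200, "fuel": "petrol"}
```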

ACID and CAP theorem:

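ACID is an acronym that stands for atomicity, consistency, isolation, and durability. Together, ACID properties ensure that a set of database operations (grouped in a transaction) leaves the database in a valid state, even in the event of unexpected errors. In MongoDB, writes to a single document are always atomic, and multi-document ACID transactions are supported on replica sets since version 4.0 and on sharded clusters since version 4.2.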
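
To make the idea of a transaction concrete, here is a minimal sketch of a multi-document transaction with pymongo. The function name mark_car_sold, the cardb database, the sales collection, and the sold field are all assumptions made for illustration. Note that transactions need a replica set or sharded cluster, so this would not run against a single standalone container.

```python
def mark_car_sold(client, car_name, buyer):
    """Mark a car as sold and record the sale in one transaction.

    `client` is expected to be a pymongo MongoClient connected to a
    replica set. Collection and field names are hypothetical.
    """
    db = client["cardb"]
    with client.start_session() as session:
        # The transaction commits when the block exits normally and
        # aborts if an exception is raised inside it.
        with session.start_transaction():
            result = db.cars.update_one(
                {"name": car_name, "sold": {"$ne": True}},
                {"$set": {"sold": True}},
                session=session,
            )
            if result.modified_count != 1:
                raise ValueError(f"{car_name} is missing or already sold")
            db.sales.insert_one(
                {"car": car_name, "buyer": buyer}, session=session
            )
```

Either both writes become visible or neither does, which is the atomicity part of ACID.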

The CAP theorem says a distributed system can only guarantee two of the following three:
1. Consistency — All clients see the same data at the same time.
2. Availability — Every request gets a response (it might not be the latest data).
3. Partition tolerance — The system works even if network partitions occur between nodes.

CAP Theorem in MongoDB:

A MongoDB replica set is usually classified as a CP system: it favors Consistency and Partition tolerance over Availability.
1. Consistency: With default settings, all reads and writes go to the primary node, so clients see the latest writes.
2. Partition Tolerance: Even if some nodes can’t communicate, only the side that still holds a majority of voting members can have a primary and accept writes.
3. Availability: This is the tradeoff; if the primary node goes down and an election is in progress, writes are paused for a short time.

Example:
1. Imagine a MongoDB replica set storing a cars collection.
2. The primary node records that a car named Virtus is sold.
3. A network issue isolates one secondary node, which still holds the stale “unsold” state. Because reads go to the primary by default, clients do not see that stale data.
4. The partitioned secondary only serves outdated data if you explicitly allow reads from secondaries. Once the network heals, the secondary automatically catches up on the changes it missed. This is how the replica set tolerates the partition.
5. If the partition isolates the primary instead, writes pause for a few seconds until a new primary is elected, so availability drops briefly, but you never risk selling the same car twice.
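
The majority rule behind this behaviour can be sketched with a toy model. This is not MongoDB's election protocol, only the arithmetic it relies on: a side of a partition can have a primary only if it reaches a strict majority of voting members. The function names are made up for this example.

```python
def can_elect_primary(reachable_voters, total_voters):
    """A primary needs a strict majority of the voting members."""
    return reachable_voters > total_voters // 2


def partition_outcome(total_voters, side_a_size):
    """Report which side of a two-way partition can accept writes."""
    side_b_size = total_voters - side_a_size
    return {
        "side_a_writable": can_elect_primary(side_a_size, total_voters),
        "side_b_writable": can_elect_primary(side_b_size, total_voters),
    }


# 3-node replica set, one secondary cut off: the 2-node side keeps a primary.
print(partition_outcome(3, 2))
# {'side_a_writable': True, 'side_b_writable': False}

# 4 voters split evenly: neither side has a majority, so writes stop.
print(partition_outcome(4, 2))
# {'side_a_writable': False, 'side_b_writable': False}
```

At most one side can ever be writable, which is why the system gives up some availability rather than accept conflicting writes.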

Let us now start with the Python tutorial.

MongoDB CRUD operations with Docker and Python:

1. Pull the MongoDB Image

Pull the MongoDB Docker image from Docker Hub:

docker pull mongo:6.0

Output:
6.0: Pulling from library/mongo
a3be5d4ce401: Pull complete
bfbb7983a832: Pull complete
11cfdf60aef1: Pull complete
a1f82b251ca3: Pull complete
d24acb38257d: Pull complete
98fc5242ccda: Pull complete
e0463f09bf7c: Pull complete
db0e1dc18f7d: Pull complete
Digest: sha256:363f5fab76d1616de3ce3b0228159126939f7ceb213c357f53cc2c0611e46cc5
Status: Downloaded newer image for mongo:6.0
docker.io/library/mongo:6.0

2. Run MongoDB Container on localhost

Run the Docker image as a container on localhost (127.0.0.1). The default port of MongoDB is 27017.

docker run -d \
  --name mongo-tutorial \
  -p 27017:27017 \
  -e MONGO_INITDB_ROOT_USERNAME=admin \
  -e MONGO_INITDB_ROOT_PASSWORD=secret \
  mongo:6.0

Output:
e5ac514ff33c3540b2f1d748a7e7baf8dc1e93db54107688214960c4b0f01e20

docker ps -a

Output:
CONTAINER ID   IMAGE       COMMAND                  CREATED          STATUS          PORTS                      NAMES
e5ac514ff33c   mongo:6.0   "docker-entrypoint.s…"   31 seconds ago   Up 30 seconds   0.0.0.0:27017->27017/tcp   mongo-tutorial
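
Before moving on to pymongo, it can help to confirm that the published port is reachable from the host. The following is a small sketch using only the Python standard library; port_is_open is a hypothetical helper, and it only checks that something accepts TCP connections on the port, not that it is MongoDB or that the credentials work.

```python
import socket


def port_is_open(host="127.0.0.1", port=27017, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("MongoDB port reachable:", port_is_open())
```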

3. Python code for CRUD operation on MongoDB

A simple Python script, mongocrud.py, covers the CRUD operations and an aggregation on MongoDB. It uses pymongo's MongoClient to connect to the MongoDB container on localhost and then runs each operation in turn.
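
The script itself is not included on this page. The following is a minimal sketch reconstructed to match the printed output shown further below. The database name cardb, the collection name cars, the new top speed of 210, and all function names are assumptions, not the author's original code.

```python
"""mongocrud.py: sketch of the CRUD walkthrough (names are assumptions)."""

MONGO_URI = "mongodb://admin:secret@localhost:27017/"  # credentials from step 2
DB_NAME = "cardb"          # assumed database name
COLLECTION_NAME = "cars"   # assumed collection name

SAMPLE_CAR = {
    "name": "Virtus",
    "manufacturer": "Volkswagen",
    "specs": {"top_speed": 200, "fuel": "petrol"},
    "features": ["sedan", "5-star safety"],
}

# Group cars by fuel type and count the documents in each group.
FUEL_PIPELINE = [{"$group": {"_id": "$specs.fuel", "count": {"$sum": 1}}}]


def create_car(cars, car):
    result = cars.insert_one(car)
    print(f"Car inserted with ID: {result.inserted_id}")
    return result.inserted_id


def read_cars(cars):
    count = cars.count_documents({})
    print(f"Found {count} cars")
    print("All cars:", list(cars.find()))
    # Projection: return only name and manufacturer (_id is included by default).
    projection = {"name": 1, "manufacturer": 1}
    print("Projected cars:", list(cars.find({}, projection)))


def update_car(cars, name, new_top_speed):
    # Dot notation updates one nested field without replacing the sub-document.
    result = cars.update_one(
        {"name": name}, {"$set": {"specs.top_speed": new_top_speed}}
    )
    print(f"Matched: {result.matched_count}, Modified: {result.modified_count}")


def count_by_fuel(cars):
    print("Cars by fuel type:", list(cars.aggregate(FUEL_PIPELINE)))


def delete_car(cars, name):
    result = cars.delete_one({"name": name})
    print(f"Deleted: {result.deleted_count}")


def main():
    # Imported here so the helpers above can be loaded without the driver.
    from pymongo import MongoClient

    client = MongoClient(MONGO_URI)
    cars = client[DB_NAME][COLLECTION_NAME]
    create_car(cars, dict(SAMPLE_CAR))  # copy: insert_one adds an _id key
    read_cars(cars)
    update_car(cars, "Virtus", 210)
    count_by_fuel(cars)
    delete_car(cars, "Virtus")
    client.close()


# To execute against the container from step 2, end the file with:
#     if __name__ == "__main__":
#         main()
```

Each helper takes the collection as an argument, so individual steps can be skipped by commenting out the matching call in main().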

Install the pymongo Python package before executing the code:

pip3 install pymongo

Run the script, commenting out any function calls you do not want to execute:

python3 mongocrud.py

Output:
Car inserted with ID: 689f1ec151b576aeed0ccc46
Found 1 cars
All cars: [{'_id': ObjectId('689f1ec151b576aeed0ccc46'), 'name': 'Virtus', 'manufacturer': 'Volkswagen', 'specs': {'top_speed': 200, 'fuel': 'petrol'}, 'features': ['sedan', '5-star safety']}]
Projected cars: [{'_id': ObjectId('689f1ec151b576aeed0ccc46'), 'name': 'Virtus', 'manufacturer': 'Volkswagen'}]
Matched: 1, Modified: 1
Cars by fuel type: [{'_id': 'petrol', 'count': 1}]
Deleted: 1

DevOps Best Practices for MongoDB

1. Disk & Storage Best Practices

Use SSDs for low latency and high IOPS. HDDs will choke under write-heavy workloads.
Put data and logs on separate disks if possible (/data/db and /var/log/mongodb).
Disk Capacity Rule: Keep disk usage < 70% to avoid performance degradation and allow for replication lag recovery.
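
The capacity rule is easy to check from a script. Here is a minimal sketch using the standard library; the 70% limit comes from the rule above, while the function names and the path passed in are placeholders to adapt to your data volume.

```python
import shutil

DISK_USAGE_LIMIT_PERCENT = 70.0  # threshold taken from the rule above


def disk_usage_percent(path):
    """Return the used share of the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.used / usage.total * 100


def disk_needs_attention(path, limit=DISK_USAGE_LIMIT_PERCENT):
    return disk_usage_percent(path) >= limit


if __name__ == "__main__":
    # "/" is a placeholder; point this at the MongoDB data volume, e.g. /data/db.
    percent = disk_usage_percent("/")
    print(f"{percent:.1f}% used, needs attention: {percent >= DISK_USAGE_LIMIT_PERCENT}")
```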

2. Backups

MongoDB hosted on Virtual Machines:
1. Use mongodump for logical backups and push the dump to a storage bucket.
2. Take disk snapshots periodically; restoring a snapshot is usually faster than replaying a logical dump, but it only brings you back to the moment the snapshot was taken.
MongoDB hosted on Kubernetes:
1. Take periodic Persistent Volume snapshots and store them in a storage bucket.
2. Restore by creating a new Persistent Volume Claim (PVC) from that snapshot.
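
The mongodump step can be scripted along these lines. This is a sketch under stated assumptions: mongodump must be installed on the machine running it, the archive naming scheme is invented for this example, and the upload to the storage bucket is left out because it depends on the cloud provider.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def build_mongodump_command(uri, archive_path):
    """Build a mongodump call that writes one gzip-compressed archive file."""
    return ["mongodump", f"--uri={uri}", f"--archive={archive_path}", "--gzip"]


def run_backup(uri, backup_dir):
    """Run mongodump and return the path of the archive it wrote."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    archive = Path(backup_dir) / f"mongodump-{stamp}.archive.gz"
    # check=True raises CalledProcessError if mongodump exits non-zero.
    subprocess.run(build_mongodump_command(uri, archive), check=True)
    return archive
```

A call such as run_backup("mongodb://admin:secret@localhost:27017/", "/backups") would then produce a timestamped archive. In a real setup, keep the credentials out of the command line and the source code.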

3. Horizontal Scaling

Sharding:
1. Distribute data across shards based on a shard key (e.g., manufacturer for the cars database).
2. Use a hashed shard key for write-heavy workloads to avoid hot spots.
3. Place shards across cloud availability zones for resilience.
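
Why a hashed shard key avoids hot spots can be shown with a toy simulation. This is not how MongoDB routes documents internally (it uses its own hash function and chunk ranges); the shard boundaries and the SHA-256 hash below are stand-ins chosen for illustration. The scenario is 1,000 new inserts with a steadily increasing key, such as a timestamp or counter.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 3


def ranged_shard(key, boundaries=(500, 1000)):
    """Range-based placement: shard 0 holds keys < 500, shard 1 < 1000, shard 2 the rest."""
    for shard, upper in enumerate(boundaries):
        if key < upper:
            return shard
    return len(boundaries)


def hashed_shard(key, num_shards=NUM_SHARDS):
    """Hash-based placement: the hash of the key decides the shard."""
    digest = hashlib.sha256(str(key).encode()).hexdigest()
    return int(digest, 16) % num_shards


new_keys = range(1000, 2000)  # monotonically increasing keys
ranged = Counter(ranged_shard(k) for k in new_keys)
hashed = Counter(hashed_shard(k) for k in new_keys)

print("ranged:", dict(ranged))  # every insert lands on the last shard
print("hashed:", dict(hashed))  # inserts are spread across all shards
```

With range-based placement, all new writes hit one shard. Hashing spreads them out, at the cost of making range queries on the key touch every shard.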
Replica Sets for HA:
1. Keep a minimum of 3 nodes (1 primary, 2 secondaries).
2. Secondary reads can offload simple queries, provided the application tolerates slightly stale data.

4. Monitoring Key Performance Indicators (KPIs)

To know when to scale or troubleshoot issues, we need to keep an eye on the following KPIs.

KPIs for MongoDB monitoring
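
As one way to collect such numbers, the sketch below pulls a few values out of the document returned by MongoDB's serverStatus command. The selection of fields is an illustrative assumption, not the article's KPI list, and extract_kpis is a hypothetical helper.

```python
def extract_kpis(server_status):
    """Pick a few health indicators from a serverStatus document."""
    cache = server_status["wiredTiger"]["cache"]
    return {
        "connections_current": server_status["connections"]["current"],
        "resident_memory_mb": server_status["mem"]["resident"],
        "cache_fill_ratio": (
            cache["bytes currently in the cache"]
            / cache["maximum bytes configured"]
        ),
        "opcounters": dict(server_status["opcounters"]),
    }
```

With pymongo, the input would come from client.admin.command("serverStatus"). Sampling it at a fixed interval and comparing consecutive opcounters values gives operations per second.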

5. IOPS Bottlenecks

Symptoms: Slow queries, replication lag, and high disk queue depth.

Causes:
1. Missing or poorly chosen indexes force full collection scans.
2. Write-heavy workloads hit a single shard/partition.
3. Disks are nearly full; HDD performance in particular degrades as they fill up.

Fixes:
1. Use appropriate indexes for queries.
2. Distribute writes via sharding.
3. Move to optimally sized SSDs.
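
To check whether a query is served by an index, you can inspect its explain output. The helper below is a hypothetical sketch that walks the winning plan and looks for a COLLSCAN stage, which indicates a full collection scan. It searches the plan recursively because the exact nesting differs between MongoDB versions.

```python
def plan_stages(plan):
    """Yield every 'stage' name found anywhere inside an explain plan."""
    if isinstance(plan, dict):
        if "stage" in plan:
            yield plan["stage"]
        for value in plan.values():
            yield from plan_stages(value)
    elif isinstance(plan, list):
        for item in plan:
            yield from plan_stages(item)


def uses_collection_scan(explain_output):
    """Return True if the winning plan contains a full collection scan."""
    winning_plan = explain_output["queryPlanner"]["winningPlan"]
    return "COLLSCAN" in plan_stages(winning_plan)
```

With pymongo, the input could come from cars.find({"manufacturer": "Volkswagen"}).explain(). If the helper returns True, creating an index with cars.create_index("manufacturer") and re-running the check should show an IXSCAN stage instead.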

6. Scaling MongoDB

Consider scaling MongoDB when:

CPU consistently > 70–75%.
Working set (indexes and data volumes) exceeds available RAM (causing high disk reads).
Disk IOPS are at 75-80% capacity.
Replication lag is growing under normal workloads.
Query response times are increasing despite query optimization.
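
These thresholds can be encoded as a simple check. The sketch below uses the lower bound of each range from the list above (70% CPU, 75% IOPS); the function name and parameters are assumptions, and the inputs are expected to be averages over a sustained period, not momentary spikes.

```python
def scaling_signals(cpu_percent, working_set_gb, ram_gb,
                    iops_used, iops_capacity, lag_growing=False):
    """Return the reasons, if any, to consider scaling the cluster."""
    signals = []
    if cpu_percent > 70:
        signals.append("CPU above 70%")
    if working_set_gb > ram_gb:
        signals.append("working set exceeds RAM")
    if iops_capacity > 0 and iops_used / iops_capacity >= 0.75:
        signals.append("disk IOPS at or above 75% of capacity")
    if lag_growing:
        signals.append("replication lag is growing")
    return signals


print(scaling_signals(82, 48, 32, 2400, 3000))
# ['CPU above 70%', 'working set exceeds RAM', 'disk IOPS at or above 75% of capacity']
```

An empty list means none of the listed conditions is met.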

7. Using MongoDB Managed Services

Managed MongoDB services like MongoDB Atlas or Azure Cosmos DB for MongoDB have significant advantages over running a MongoDB cluster yourself on VMs or Kubernetes.
They have built-in security features like TLS, encryption at rest, and firewall rules for network whitelisting.
They also support automated backups and point-in-time restore, and come with premium support and SLA-backed performance.

Conclusion:

In this blog, we covered what MongoDB is, how it stores data, and its use cases. We explored ACID and the CAP theorem with a simple example. We walked through a hands-on tutorial, running MongoDB locally in Docker and building Python CRUD operations from scratch. On the DevOps side, we discussed backups, scaling strategies, Kubernetes hosting, and key metrics for monitoring database performance. With the right setup and operational best practices, MongoDB can scale while staying reliable. That's it from this blog of mine. Suggestions are most welcome.
