Taming the Whale: What I Learned Debugging Docker for the First Time

This content originally appeared on DEV Community and was authored by Kenechukwu Anoliefo

“It runs on my machine.”

It is the classic developer meme, and for good reason. I recently finished a Machine Learning project—a Water Quality Classifier—that worked perfectly in my Jupyter Notebook. I felt great. I had high accuracy, a tuned Random Forest model, and clean code.

Then, I tried to deploy it with Docker.

What followed was a weekend of “File Not Found” errors, timeouts, and confused staring at the terminal. If you are a data scientist trying to move your models into containers, here are the hard-fought lessons I learned from the trenches.

Lesson 1: The Case of the Invisible Extension

My first hurdle happened before I even started the build. I ran the standard command:

docker build -t water-quality-api .

The Error:
failed to read dockerfile: open Dockerfile: no such file or directory

I stared at my folder. The file was right there! It was named Dockerfile. I created it myself!

The Fix:
It turns out, Windows was “helpfully” hiding the file extension. To my eyes, it looked like Dockerfile. To the computer, it was actually Dockerfile.txt. Docker is strict: by default, it looks for a file named exactly Dockerfile, with no extension at all.

I had to use the command line to rename it and strip the extension.
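On Windows, that rename looks something like this (the exact command below is a reconstruction; the stray .txt was the culprit in my case):

REM Command Prompt: confirm what the file is really called, then strip the extension
dir
ren Dockerfile.txt Dockerfile

PowerShell users can do the same with Rename-Item Dockerfile.txt Dockerfile.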

  • Takeaway: Never trust your file explorer. Always run ls or dir to see what your files are actually named.

Lesson 2: When pip Gives Up (The Timeout Issue)

Once Docker actually found the file, it started installing my dependencies. NumPy, Pandas, Scikit-Learn… these are heavy libraries.

Suddenly, the build crashed with red text everywhere.

The Error:
ReadTimeoutError: HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out.

My internet connection fluctuated for a second, and pip (the Python installer) decided to quit. Out of the box, pip waits only about 15 seconds for the server to respond, and a single stalled download fails the whole build.

The Fix:
I learned a new flag that saved my life: --default-timeout. I updated my Dockerfile to tell pip to be patient:

RUN pip install --default-timeout=1000 --no-cache-dir -r requirements.txt

The value is in seconds, so pip will now tolerate over 16 minutes of silence on a connection before giving up on a download, instead of the default 15 seconds.
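If your connection drops outright instead of merely stalling, pip also has a --retries flag (it retries five times by default) that pairs nicely with the longer timeout. A variant worth trying:

RUN pip install --default-timeout=1000 --retries 10 --no-cache-dir -r requirements.txt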

Lesson 3: Docker “Save Points” (Layering)

Even with the timeout fix, my builds were agonizing. I had all my libraries listed in one requirements.txt file.

This meant that if the installation failed on the last library (Matplotlib), Docker would scrap the entire process. When I tried again, it had to re-download NumPy and Pandas from scratch. It was like playing a video game with no save points.

The Fix:
I restructured my Dockerfile to install libraries one by one (or in small groups).

# Installing libraries individually creates "cached layers"
RUN pip install --no-cache-dir numpy==1.26.4
RUN pip install --no-cache-dir pandas==2.2.2
RUN pip install --no-cache-dir scikit-learn==1.5.0

In Docker, every instruction in the Dockerfile (each RUN, COPY, and so on) produces a cached “layer.” If the build fails on Scikit-Learn, Docker remembers that NumPy and Pandas are already done. When I ran the build command again, it zoomed past the first two steps and resumed exactly where it left off.
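Put together, the restructured Dockerfile looked roughly like this. Treat it as a sketch: the base image, working directory, and app.py entry point are stand-ins, not my exact file.

# Sketch of the layered Dockerfile (base image, paths, and entry point are illustrative)
FROM python:3.11-slim
WORKDIR /app

# Heavy libraries first, one cached layer each
RUN pip install --no-cache-dir --default-timeout=1000 numpy==1.26.4
RUN pip install --no-cache-dir --default-timeout=1000 pandas==2.2.2
RUN pip install --no-cache-dir --default-timeout=1000 scikit-learn==1.5.0

# Application code goes in last, so code edits never invalidate the library layers
COPY . .
CMD ["python", "app.py"]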

The Victory Lap

After fixing the filename, increasing the timeout, and layering the build, I finally saw the magic word: FINISHED.

I started the container and sent a curl request to my local port.
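The run-and-test step looked roughly like this; the port, endpoint name, and input fields are placeholders rather than my exact API:

# Publish the container's port on the host, then hit it from another terminal
# (8000, /predict, and the feature names below are illustrative)
docker run -p 8000:8000 water-quality-api

curl -X POST http://localhost:8000/predict -H "Content-Type: application/json" -d '{"ph": 7.0, "turbidity": 3.5}'

Back came the response: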

{
  "water_quality_prediction": 2
}

It wasn’t just a number. It was proof that my model was no longer trapped in a notebook on my laptop. It was a live service, running in an isolated container, ready for the real world.

The Bottom Line:
Building the model is only 50% of Data Science. The other 50% is Engineering—making that model robust, portable, and runnable. It was frustrating, but taming the Docker whale was absolutely worth it.

