Why Is My AI Docker Image So Big? A Deep Dive with the ‘dive’ Tool to Find the Bloat
Key Takeaways
- A Docker image isn’t just a single file, but a stack of immutable layers — where each layer corresponds to a Dockerfile instruction.
- Large AI Docker images are usually bloated by heavy AI library installations and large base OS layers.
- Use tools like docker history (to view layer sizes) and dive (to interactively explore image contents) to pinpoint where the bloat comes from.
- Once you identify the big layers or unnecessary files, you can make targeted changes and shrink the image for faster builds, cheaper storage, and better security.
Introduction
There are two bars a Docker image in an AI project can clear:
- The image works — it reliably runs your model or service.
- The image is well-crafted — it’s lean, builds quickly, and deploys efficiently.
In the demanding world of AI development and DevOps, a huge image is more than an inconvenience: a 5 GB image that takes minutes to build and deploy becomes a drag on your team’s velocity and your cloud bill.
Before we can optimise, we must diagnose. We need to become Docker image detectives: peeling back the layers, understanding how our image is built, and identifying where the waste lies.
Why Optimize?
Let’s refresh the motivations:
Slower Development Lifecycle
A toy image clocked in at 2.54 GB and took ~56 seconds to build. In a production environment, builds of that size slow down iteration, hinder developer feedback loops, and cost time.
Inefficient CI/CD Pipelines
Every image push/pull in CI/CD costs time and bandwidth. If your image grows to 5-10 GB, the cost and delay multiply across several builds and deployments per team per day.
Higher Cloud Costs & Larger Attack Surface
Large images take more registry storage and more data transfer, and they often contain more OS packages and utilities than necessary, increasing the vulnerability footprint.
Our Specimen: The “Naive” BERT Classifier
Here’s the example image under investigation: a simple text-classification app using the bert-base-uncased model and a Python stack.
requirements.txt (naive flavour)
transformers==4.52.3
torch==2.7.0
torchvision==0.22.0
torchaudio==2.7.0
flask==2.3.3
pandas
numpy==1.26.4
requests==2.32.3
pillow
scikit-learn
pytest
jupyter
ipython
matplotlib
seaborn
black
flake8
mypy
Problematic Dockerfile
FROM python:3.10
RUN apt-get update && apt-get install -y curl
WORKDIR /app
COPY naive_image/requirements.txt ./requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
COPY naive_image/app/ ./app/
COPY naive_image/sample_data/ ./sample_data/
RUN echo "Build complete" > /app/build_status.txt
CMD ["python", "app/predictor.py", "sample_data/sample_text.txt"]
When built, the image size came out to ~2.54 GB.
The Diagnostic Toolkit: Peeling Back the Layers
Inspect total size
docker image ls bert-classifier-naive
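Illustrative output (the image ID and timestamp will differ on your machine; the size matches our build):
REPOSITORY              TAG       IMAGE ID       CREATED         SIZE
bert-classifier-naive   latest    <image id>     2 minutes ago   2.54GB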
This raises a flag — the image is very large for a lightweight demo.
View layer sizes
docker history bert-classifier-naive
Typical output shows:
RUN pip install --no-cache-dir -r requirements.txt   1.51 GB
FROM python:3.10 (base image layers)                  560 MB
RUN apt-get update && apt-get install -y curl         19.4 MB
Voila — two heavy hitters: the Python dependencies layer (1.51 GB) and the base OS layer (560 MB).
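If the default table truncates the long RUN commands, the full instruction behind each layer can be printed with Docker’s Go-template formatting (both flags are standard docker history options):
docker history --no-trunc --format "table {{.Size}}\t{{.CreatedBy}}" bert-classifier-naive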
Dive into contents
dive bert-classifier-naive
With dive, you can explore each layer:
- See huge directories (e.g., /usr/local/lib/python3.10/site-packages/torch/…)
- Spot leftover package manager caches (e.g., /var/lib/apt/lists/)
- Detect that maybe a .dockerignore was missing and unnecessary data got copied.
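Beyond the interactive view, dive also ships a non-interactive CI mode that scores the image and fails the build when too much space is wasted; a minimal sketch (flag names as of recent dive releases, so check dive --help for your version):
dive --ci bert-classifier-naive --lowestEfficiency=0.95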
What We Learned
- The 1.51 GB layer came from installing heavy AI libraries.
- The 560 MB base image shows the cost of picking a full Python OS image rather than a slim or alpine variant.
- The 19.4 MB curl installation included ~9.5 MB of waste (cache files).
- Using COPY . . without a .dockerignore can quietly import tons of unnecessary files (data, logs, venvs).
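One quick way to confirm where the pip layer’s bulk lives is to measure site-packages from inside the image (the path assumes the default layout of the official python:3.10 image, the same directory dive surfaced above):
docker run --rm bert-classifier-naive sh -c "du -sh /usr/local/lib/python3.10/site-packages/* | sort -rh | head"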
How to Fix It (Optimisation Techniques)
Use a slimmer base image
Instead of python:3.10, consider python:3.10-slim. (python:3.10-alpine is smaller still, but Alpine uses musl libc and many scientific Python wheels, including PyTorch, are not published for it, so slim is usually the safer choice for AI workloads.)
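To see the difference locally before committing to the change, pull the slim tag and compare it with the full image the naive build already downloaded:
docker pull python:3.10-slim
docker image ls python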
Clean up package caches
RUN apt-get update && apt-get install -y curl \
    && rm -rf /var/lib/apt/lists/*
Note that the cleanup must happen in the same RUN instruction as the install: layers are immutable, so deleting the apt lists in a later instruction would hide the files but not shrink the earlier layer.
Separate build vs runtime (multi-stage builds)
Install the dependencies into a throwaway prefix in the builder stage, then copy only that prefix (plus the application code) into a clean runtime stage:
FROM python:3.10-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

FROM python:3.10-slim
WORKDIR /app
COPY --from=builder /install /usr/local
COPY app/ ./app/
CMD ["python", "app/predictor.py"]
Add .dockerignore
Exclude unnecessary files:
.git/
venv/
__pycache__/
*.ipynb
Be mindful of dependencies
Ask: “Do I really need the full Torch / Transformers stack in this container?” The naive requirements.txt also pulls in development-only tools (jupyter, pytest, black, flake8, mypy, matplotlib, seaborn) that don’t belong in a runtime image; split those into a dev-only requirements file. For inference, a trimmed runtime (CPU-only wheels or an exported ONNX model) can shave off a large fraction of that 1.51 GB dependency layer.
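For example, if the container only does CPU inference, PyTorch’s CPU-only wheels skip the CUDA libraries that account for most of the default torch install (the index URL below is PyTorch’s official CPU wheel index; check the PyTorch installation matrix for the exact command for your torch version):
pip install --no-cache-dir torch==2.7.0 --index-url https://download.pytorch.org/whl/cpu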
TL;DR
- Big Docker images = heavy base OS + bulky dependencies + leftover caches + unwanted files.
- Use docker history to find large layers, and dive to inspect them.
- Once you know where the waste is, you can take targeted actions: slim base image, clean caches, multi-stage builds, and exclude junk.
By applying those diagnostics and fixes, your builds become faster, deployments quicker, registry costs lower, and the resulting images more secure.
Your Turn to Dive In
Now it’s your turn: build your image, run docker history, fire up dive, and explore.
Ask “Why is it so big?” — and then fix it.
🌟 Thanks for reading! If this post added value, a like ❤️, follow, or share would encourage me to keep creating more content.
— Latchu | Senior DevOps & Cloud Engineer
☁️ AWS | GCP | ☸️ Kubernetes | 🔐 Security | ⚡ Automation
📌 Sharing hands-on guides, best practices & real-world cloud solutions