Why Is My AI Docker Image So Big? A Deep Dive with the ‘dive’ Tool to Find the Bloat
Key Takeaways
- A Docker image isn’t just a single file, but a stack of immutable layers — where each layer corresponds to a Dockerfile instruction.
- Large AI Docker images are usually bloated by heavy AI library installations and large base OS layers.
- Use tools like docker history (to view layer sizes) and dive (to interactively explore image contents) to pinpoint where the bloat comes from.
- Once you identify the big layers or unnecessary files, you can make targeted changes and shrink the image for faster builds, cheaper storage, and better security.
Introduction
There are two bars a Docker image in an AI project can clear:
- The image works — it reliably runs your model or service.
- The image is well-crafted — it’s lean, builds quickly, and deploys efficiently.
In the demanding world of AI development and DevOps, a huge image is more than an inconvenience: a 5 GB image that takes minutes to build and deploy becomes a drag on your team’s velocity and your cloud bill.
Before we can optimise, we must diagnose. We need to become Docker image detectives: peeling back the layers, understanding how our image is built, and identifying where the waste lies.
Why Optimize?
Let’s refresh the motivations:
Slower Development Lifecycle
A toy image clocked in at 2.54 GB and took ~56 seconds to build. In a production environment, builds of that size slow down iteration, hinder developer feedback loops, and cost time.
Inefficient CI/CD Pipelines
Every image push/pull in CI/CD costs time and bandwidth. If your image grows to 5-10 GB, the cost and delay multiply across several builds and deployments per team per day.
Higher Cloud Costs & Larger Attack Surface
Large images take more registry storage and more data transfer, and they often contain more OS packages and utilities than necessary, increasing the vulnerability footprint.
Our Specimen: The “Naive” BERT Classifier
Here’s the example image under investigation: a simple text-classification app using the bert-base-uncased model and a Python stack.
requirements.txt (naive flavour)
transformers==4.52.3
torch==2.7.0
torchvision==0.22.0
torchaudio==2.7.0
flask==2.3.3
pandas
numpy==1.26.4
requests==2.32.3
pillow
scikit-learn
pytest
jupyter
ipython
matplotlib
seaborn
black
flake8
mypy
Problematic Dockerfile
FROM python:3.10
RUN apt-get update && apt-get install -y curl
WORKDIR /app
COPY naive_image/requirements.txt ./requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
COPY naive_image/app/ ./app/
COPY naive_image/sample_data/ ./sample_data/
RUN echo "Build complete" > /app/build_status.txt
CMD ["python", "app/predictor.py", "sample_data/sample_text.txt"]
When built, the image size came out to ~2.54 GB.
The Diagnostic Toolkit: Peeling Back the Layers
Inspect total size
docker image ls bert-classifier-naive
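Illustrative output (the image ID and timestamp will differ on your machine; the size matches our build):
REPOSITORY              TAG       IMAGE ID       CREATED         SIZE
bert-classifier-naive   latest    <image id>     2 minutes ago   2.54GB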
This raises a flag — the image is very large for a lightweight demo.
View layer sizes
docker history bert-classifier-naive
Typical output shows:
RUN pip install --no-cache-dir -r requirements.txt   1.51 GB
FROM python:3.10 (base image layers)                  560 MB
RUN apt-get update && apt-get install -y curl         19.4 MB
Voila — two heavy hitters: the Python dependencies layer (1.51 GB) and the base OS layer (560 MB).
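If the default table truncates the long RUN commands, the full instruction behind each layer can be printed with Docker’s Go-template formatting (both flags are standard docker history options):
docker history --no-trunc --format "table {{.Size}}\t{{.CreatedBy}}" bert-classifier-naive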
Dive into contents
dive bert-classifier-naive
With dive, you can explore each layer:
- See huge directories (e.g., /usr/local/lib/python3.10/site-packages/torch/…)
- Spot leftover package manager caches (e.g., /var/lib/apt/lists/)
- Detect that maybe a .dockerignore was missing and unnecessary data got copied.
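Beyond the interactive view, dive also ships a non-interactive CI mode that scores the image and fails the build when too much space is wasted; a minimal sketch (flag names as of recent dive releases, so check dive --help for your version):
dive --ci bert-classifier-naive --lowestEfficiency=0.95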
What We Learned
- The 1.51 GB layer came from installing heavy AI libraries.
- The 560 MB base image shows the cost of picking a full Python OS image rather than a slim or alpine variant.
- The 19.4 MB curl installation included ~9.5 MB of waste (cache files).
- Using COPY . . without a .dockerignore can quietly import tons of unnecessary files (data, logs, venvs).
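One quick way to confirm where the pip layer’s bulk lives is to measure site-packages from inside the image (the path assumes the default layout of the official python:3.10 image, the same directory dive surfaced above):
docker run --rm bert-classifier-naive sh -c "du -sh /usr/local/lib/python3.10/site-packages/* | sort -rh | head"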
How to Fix It (Optimisation Techniques)
Use a slimmer base image
Instead of python:3.10, consider python:3.10-slim. (python:3.10-alpine is smaller still, but Alpine uses musl libc and many scientific Python wheels, including PyTorch, are not published for it, so slim is usually the safer choice for AI workloads.)
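To see the difference locally before committing to the change, pull the slim tag and compare it with the full image the naive build already downloaded:
docker pull python:3.10-slim
docker image ls python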
Clean up package caches
RUN apt-get update && apt-get install -y curl \
    && rm -rf /var/lib/apt/lists/*
Note that the cleanup must happen in the same RUN instruction as the install: layers are immutable, so deleting the apt lists in a later instruction would hide the files but not shrink the earlier layer.
Separate build vs runtime (multi-stage builds)
Install the dependencies into a throwaway prefix in the builder stage, then copy only that prefix (plus the application code) into a clean runtime stage:
FROM python:3.10-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

FROM python:3.10-slim
WORKDIR /app
COPY --from=builder /install /usr/local
COPY app/ ./app/
CMD ["python", "app/predictor.py"]
Add .dockerignore
Exclude unnecessary files:
.git/
venv/
__pycache__/
*.ipynb
Be mindful of dependencies
Ask: “Do I really need the full Torch / Transformers stack in this container?” The naive requirements.txt also pulls in development-only tools (jupyter, pytest, black, flake8, mypy, matplotlib, seaborn) that don’t belong in a runtime image; split those into a dev-only requirements file. For inference, a trimmed runtime (CPU-only wheels or an exported ONNX model) can shave off a large fraction of that 1.51 GB dependency layer.
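For example, if the container only does CPU inference, PyTorch’s CPU-only wheels skip the CUDA libraries that account for most of the default torch install (the index URL below is PyTorch’s official CPU wheel index; check the PyTorch installation matrix for the exact command for your torch version):
pip install --no-cache-dir torch==2.7.0 --index-url https://download.pytorch.org/whl/cpu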
TL;DR
- Big Docker images = heavy base OS + bulky dependencies + leftover caches + unwanted files.
- Use docker history to find large layers, and dive to inspect them.
- Once you know where the waste is, you can take targeted actions: slim base image, clean caches, multi-stage builds, and exclude junk.
By applying those diagnostics and fixes, your builds become faster, deployments quicker, registry costs lower, and the resulting images more secure.
Your Turn to Dive In
Now it’s your turn: build your image, run docker history, fire up dive, and explore.
Ask “Why is it so big?” — and then fix it.
🌟 Thanks for reading! If this post added value, a like ❤️, follow, or share would encourage me to keep creating more content.
— Latchu | Senior DevOps & Cloud Engineer
☁️ AWS | GCP | ☸️ Kubernetes | 🔐 Security | ⚡ Automation
📌 Sharing hands-on guides, best practices & real-world cloud solutions