# One Dockerfile, Two Stages: A 50% Size Reduction Story

## The Power of Simple Optimizations

Sometimes the most impactful improvements come from stepping back and rethinking your approach. A recent pull request demonstrates this perfectly: 32 lines added, 23 removed, and a Docker image that’s half the size with 72% fewer security vulnerabilities.

Let’s break down exactly what changed and why it matters.

## The Numbers

Before:

  • Image size: 2.04 GB
  • Security vulnerabilities: 290+
  • Build approach: Single-stage

After:

  • Image size: ~900 MB (50% reduction)
  • Security vulnerabilities: ~80 (72% reduction)
  • Build approach: Multi-stage

Code changed: 1 file, net +9 lines
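
As a quick sanity check, the percentages can be worked out from the figures above. This assumes 1 GB = 1000 MB and takes "290+" and "~80" at face value; both are rounded, so the results are approximate:

$$
\text{size reduction} = 1 - \frac{900\ \text{MB}}{2040\ \text{MB}} \approx 0.56,
\qquad
\text{vulnerability reduction} = 1 - \frac{80}{290} \approx 0.72
$$

Under those assumptions the headline 50% is, if anything, a conservative rounding of the size figure.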

---

## The Five Critical Insights

### 1. The Slim Base Image Difference

**Before:**

```dockerfile
FROM python:3.11.2
```

**After:**

```dockerfile
# Stage 1: Build
FROM python:3.11.2 AS builder

# Stage 2: Runtime
FROM python:3.11.2-slim
```

The Learning: The difference between python:3.11.2 and python:3.11.2-slim is substantial. The full image includes compilers, build tools, and system utilities you don’t need at runtime. The slim variant strips these away.

Impact: This single change likely contributed 40–50% of the total size reduction.
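
One way to keep the two stages on the same Python version is a build argument declared before the first FROM. This is a minimal sketch, not taken from the pull request; the argument name `PYTHON_VERSION` and the stage name `runtime` are illustrative assumptions:

```dockerfile
# Hypothetical build argument; an ARG declared before the first FROM
# can be referenced in every FROM line that follows
ARG PYTHON_VERSION=3.11.2

# Stage 1: full image with compilers and headers
FROM python:${PYTHON_VERSION} AS builder

# Stage 2: slim image with the same interpreter version
FROM python:${PYTHON_VERSION}-slim AS runtime
```

Matching versions matters when files are copied from the builder into the runtime stage, since compiled extensions and the site-packages path are tied to the interpreter's minor version.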

### 2. Separate Build and Runtime Concerns

**Before:** Everything in one stage

```dockerfile
RUN pip install poetry \
    && poetry config virtualenvs.create false \
    && poetry install
# ...poetry stays in final image
```

**After:** Two distinct stages

```dockerfile
# Stage 1: Install and build
FROM python:3.11.2 AS builder
# Install into the system interpreter (no virtualenv) so the packages
# land in the site-packages directory that Stage 2 copies
RUN pip install poetry \
    && poetry config virtualenvs.create false \
    && poetry install

# Stage 2: Copy only what's needed
FROM python:3.11.2-slim
COPY --from=builder /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages
```


The Learning: Build tools don’t need to ship to production. Poetry, compilers, and development headers are essential for installation but deadweight at runtime.

Impact: Eliminates build dependencies from the final image while keeping all functionality.
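
A common variation, sketched here under assumptions rather than taken from the pull request, installs into a dedicated virtual environment so that a single directory moves between stages. The path `/opt/venv` and the `requirements.txt` file (for example, one exported from the Poetry lock file) are hypothetical:

```dockerfile
# Stage 1: build a self-contained virtual environment
FROM python:3.11.2 AS builder
RUN python -m venv /opt/venv
# Put the venv first on PATH so pip installs into it
ENV PATH="/opt/venv/bin:$PATH"
# Assumption: requirements.txt exists in the build context
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Stage 2: copy only the finished environment
FROM python:3.11.2-slim
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
```

Because both stages use the same official Python image version, the interpreter path recorded inside the virtual environment should remain valid in the runtime stage.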

### 3. Layer Consolidation Matters

**Before:** Multiple separate apt-get update calls

```dockerfile
RUN apt-get update && \
    apt-get install -y postgresql-client libpq-dev && \
    rm -rf /var/lib/apt/lists/*

RUN apt-get update && apt-get install -y \
    libmemcached11 \
    libmemcachedutil2 \
    libmemcached-dev \
    libz-dev

RUN apt-get update && apt-get install -y dos2unix
```

**After:** Consolidated installation

```dockerfile
RUN apt-get update && \
    apt-get install -y postgresql-client libpq-dev \
    libmemcached11 libmemcachedutil2 libmemcached-dev libz-dev \
    dos2unix && \
    rm -rf /var/lib/apt/lists/*
```


The Learning: Each RUN command creates a new layer. Multiple apt-get update calls mean multiple cached package lists. Consolidating reduces both layer count and duplicate data.

Impact: Smaller image size, fewer layers, and only one package-index download per uncached build.
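
Cleanup belongs in the same RUN because a later layer cannot shrink an earlier one. A minimal illustration, reusing the dos2unix package from the example above; the two stage names are hypothetical and exist only so the variants can be built and compared separately:

```dockerfile
# Variant A: the package lists are already stored in the first RUN layer,
# so deleting them in a second layer does not reduce the image size
FROM python:3.11.2-slim AS cleanup-too-late
RUN apt-get update && apt-get install -y dos2unix
RUN rm -rf /var/lib/apt/lists/*

# Variant B: the lists are removed before the layer is committed,
# so they never become part of the image
FROM python:3.11.2-slim AS cleanup-in-place
RUN apt-get update && \
    apt-get install -y dos2unix && \
    rm -rf /var/lib/apt/lists/*
```

Building each variant with `docker build --target <stage-name>` and comparing the resulting image sizes should show the difference.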

### 4. Copy Strategy Optimization

**Before:** Copy everything early

```dockerfile
WORKDIR /blt
COPY . /blt
# ...then install dependencies
```

**After:** Copy dependencies first, code last

```dockerfile
# Stage 1
COPY pyproject.toml poetry.lock* ./
RUN poetry install

# Stage 2
COPY . /blt
```


The Learning: Docker caches layers. If you copy your entire application before installing dependencies, every code change invalidates the dependency installation cache. Copying dependency files first means dependencies only reinstall when they actually change.

Impact: Faster builds during development and CI/CD.
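
The ordering can be annotated with what invalidates each layer. This is a sketch under assumptions: the `--no-root` flag (which tells Poetry not to install the project itself, since its source has not been copied yet) is added for illustration and is not known to be what the pull request uses:

```dockerfile
FROM python:3.11.2 AS builder
WORKDIR /blt

# Rebuilt only when this instruction itself changes
RUN pip install poetry && poetry config virtualenvs.create false

# The cache key is the content of these two files
COPY pyproject.toml poetry.lock* ./

# Reused from cache until the COPY above is invalidated
# (--no-root is an assumption, see the note above)
RUN poetry install --no-root

# Invalidated by any source edit, but nothing expensive follows it
COPY . /blt
```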

### 5. Runtime vs. Build Dependencies

**Before:** All dependencies in final image

```dockerfile
# Shipped in the final image:
#   libmemcached-dev  (development headers)
#   libz-dev          (development headers)
```

**After:** Only runtime dependencies in final stage

```dockerfile
# Builder: libmemcached-dev libz-dev
# Runtime: libmemcached11 libmemcachedutil2
```


The Learning: Development headers (-dev packages) are needed to compile extensions but not to run them. The runtime only needs the shared libraries.

Impact: Smaller attack surface and fewer vulnerabilities, contributing to the overall 72% reduction.
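
Put together, the package split might look like the following sketch. The package names come from the examples in this post (libpq5 is the runtime counterpart of libpq-dev); whether the application runs with exactly this runtime set is an assumption that would need testing:

```dockerfile
# Stage 1: headers needed to compile C extensions
FROM python:3.11.2 AS builder
RUN apt-get update && \
    apt-get install -y libpq-dev libmemcached-dev libz-dev && \
    rm -rf /var/lib/apt/lists/*

# Stage 2: shared libraries and client tools only, no -dev packages
FROM python:3.11.2-slim
RUN apt-get update && \
    apt-get install -y postgresql-client libpq5 \
    libmemcached11 libmemcachedutil2 && \
    rm -rf /var/lib/apt/lists/*
```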

## The Security Dimension

Reducing vulnerabilities from 290+ to roughly 80 wasn’t just about using a slim base image; it was about shipping less code.

Every package is a potential vulnerability:

  • Build tools add vulnerabilities in code you never run (you’re not building in production)
  • Development headers expose attack surfaces you don’t need
  • Unused system utilities are just risk without reward

The math is simple: Fewer packages = fewer CVEs = better security posture.

## The Practical Takeaways

### For Your Next Dockerfile

  • Always use multi-stage builds for compiled/installed dependencies
    • Stage 1: Build and install
    • Stage 2: Runtime with minimal base
  • Choose the smallest base image that works
    • python:3.11-slim over python:3.11
    • Alpine if you can handle the musl differences
  • Consolidate RUN commands
    • Combine related operations
    • Clean up in the same layer
  • Copy smartly
    • Dependencies first
    • Application code last
    • Leverage layer caching
  • Distinguish -dev from runtime packages
    • Build stage: libpq-dev, gcc, make
    • Runtime stage: libpq5, binaries only

### The ROI

  • Development: Faster builds, better caching
  • Deployment: Faster pulls, quicker rollbacks
  • Security: Smaller attack surface, fewer CVEs
  • Costs: Less storage, less bandwidth

## Conclusion: Small Changes, Big Impact

This pull request is a reminder that optimization doesn’t always require complex refactoring. Sometimes it’s about:

  • Understanding what you’re shipping
  • Separating what you need from what you needed
  • Applying fundamental best practices

One file changed. Fifty-five lines modified. Image size cut in half. Vulnerabilities reduced by 72%.

That’s the power of thoughtful Docker optimization.

## Before You Go

Action Item: Pull up your most recent Dockerfile and ask yourself:

  • Am I shipping build tools to production?
  • Could I use a slimmer base image?
  • Are my RUN commands consolidated?
  • Is my COPY strategy cache-friendly?

The answers might just cut your image size in half.

Open source PR: https://github.com/OWASP-BLT/BLT/pull/3072
