Multistage Build Pitfalls That Sabotage Production

Why Multistage Builds Fail in Production—and How to Recover

Multistage Dockerfiles are a staple in modern CI/CD pipelines, promising smaller images, faster builds, and improved security by separating build-time dependencies from runtime artifacts. Yet many teams find that their carefully crafted multistage builds still produce bloated, slow, or insecure images. This guide explains the most common pitfalls—from copying unnecessary files to relying on outdated base images—and provides actionable fixes. We focus on practical, production-tested strategies that you can apply today.

The Promise vs. The Reality

In theory, multistage builds let you compile code in a heavy builder stage and copy only the final binary into a slim runtime stage. In practice, subtle mistakes undermine this efficiency. For example, a team might copy the entire /usr/lib directory instead of only required libraries, or forget to exclude test dependencies. These oversights can balloon image size by hundreds of megabytes, increase attack surface, and slow down deployments.

Why This Matters for Production

Production environments demand minimal attack surfaces, fast cold starts, and deterministic builds. A bloated image not only consumes more storage and bandwidth but also ships more packages that can contain vulnerabilities. Moreover, inconsistent build caching can lead to flaky deployments where a build that works on one machine fails on another.

To set the stage, consider a typical Node.js microservice. A naive Dockerfile might install all dev dependencies, run tests, and then copy everything into a final image. A multistage refactor can trim that image from 1.2 GB to 150 MB—but only if done correctly. In the sections that follow, we'll explore specific pitfalls and their solutions, drawing from real-world patterns observed in open-source projects and enterprise pipelines.

Pitfall #1: Including Build Tools in the Final Image

The most fundamental mistake is failing to exclude compilers, package managers, and development headers from the runtime stage. This not only bloats the image but also exposes tools that attackers could exploit. For instance, including apt, pip, or npm in a production image is a security risk because these tools can install arbitrary packages if an attacker gains access.

The Fix: Use Two Distinct Stages

Create a builder stage that installs all build dependencies and compiles code, then a runtime stage that copies only the compiled artifact and its minimal runtime dependencies. Use COPY --from=builder to transfer exactly what's needed. For example, a Go application might copy only the binary, while a Node.js app might copy only the node_modules containing production packages.
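
As a minimal sketch of this two-stage layout for a Go service: the image tag, the /src and /out paths, and the binary name app are illustrative assumptions, and the project is assumed to have both go.mod and go.sum.

```dockerfile
# syntax=docker/dockerfile:1

# Builder stage: carries the full Go toolchain and is never shipped.
FROM golang:1.22 AS builder
WORKDIR /src
# Copy the module manifests first so dependency downloads are cached.
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# CGO_ENABLED=0 produces a statically linked binary that needs no libc.
RUN CGO_ENABLED=0 go build -o /out/app .

# Runtime stage: only the compiled binary crosses the stage boundary.
FROM scratch AS runtime
COPY --from=builder /out/app /app
ENTRYPOINT ["/app"]
```

Note that scratch contains nothing at all, so an application that makes TLS calls or needs time zone data would also have to copy CA certificates and zoneinfo from the builder, or use a distroless static base instead.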

Common Oversights

Teams often forget to remove apt package lists or temporary build artifacts. Always run rm -rf /var/lib/apt/lists/* after apt-get install in the same RUN command so the lists never persist in a layer. Similarly, for Node.js, use npm ci --omit=dev (npm ci --only=production on older npm versions) in the runtime stage or, better, install dependencies in the builder stage and copy only the production node_modules.

Another subtle issue: copying the entire node_modules from builder to runtime may include native addons compiled against system libraries. If the runtime base image lacks those libraries, the app crashes. Fix this by building in a stage with a base image that matches the runtime (e.g., using the same Debian version).
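
One way to keep the builder and runtime aligned is to pick two variants of the same distribution release. The sketch below assumes a project with a build script that writes to dist and an entry point at dist/index.js; both names are hypothetical.

```dockerfile
# syntax=docker/dockerfile:1

# Builder and runtime share one Debian release (bookworm), so native addons
# compiled here link against library versions the runtime release also ships.
FROM node:18-bookworm AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
# Build with dev dependencies present, then drop them before the copy.
RUN npm run build && npm prune --omit=dev

FROM node:18-bookworm-slim AS runtime
WORKDIR /app
ENV NODE_ENV=production
COPY --from=builder /app/package.json ./package.json
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
# The official Node images ship an unprivileged "node" user.
USER node
CMD ["node", "dist/index.js"]
```

Matching the release keeps libc and common libraries compatible, but the slim variant omits many packages. If an addon dynamically links against a library installed with apt in the builder, that library still has to be installed in the runtime stage as well.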

In summary, the golden rule is: the final image should contain only what is necessary to run the application. Every extra tool is a liability. Audit your final image with docker history and a vulnerability scanner such as docker scout cves (the successor to the retired docker scan command) or Trivy to verify.

Pitfall #2: Copying the Wrong Files Between Stages

Even when build tools are excluded, copying the wrong files can reintroduce bloat or break the application. The classic mistake is a blanket COPY . . in a stage, which copies the whole build context, including .git, node_modules from the host, and other local artifacts. The same happens between stages when COPY --from=builder /app . pulls across the entire build directory. This not only increases image size but can also overwrite files that were correctly installed in the image, such as node_modules built inside the builder.

The Fix: Explicit COPY Instructions

Always specify exact source and destination paths. For example, instead of COPY --from=builder /app ., use COPY --from=builder /app/dist /app/dist and COPY --from=builder /app/node_modules /app/node_modules. This prevents accidental inclusion of unnecessary files and makes the Dockerfile self-documenting.

Leverage .dockerignore

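A .dockerignore file is essential to exclude files like .git, Dockerfile, README.md, and local environment configs from the build context. Even if you use explicit COPY, the context matters on every build: the legacy builder sends all of it to the Docker daemon, and BuildKit, although it transfers only the files the build actually uses, still pays for a cluttered context whenever a COPY instruction is broad. Excluding large unnecessary files therefore speeds up builds. However, note that .dockerignore does not affect COPY --from=builder. It only filters what COPY and ADD can read from the build context, not what is copied between stages.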

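A real-world example: a team was copying node_modules from the builder stage but never removed devDependencies after the builder's npm ci. The runtime image ended up containing testing frameworks like Mocha and Chai, increasing size by 50 MB. The fix was to make sure only production packages were left before the copy: either run npm ci --omit=dev (--only=production on older npm versions) in a stage that does not need dev tooling, or run npm prune --omit=dev after the build step, and then copy the resulting node_modules.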

Another scenario: a Python project copied the entire /usr/local/lib/python3.9/site-packages from the builder, which included pip and setuptools. The solution was to install packages with pip install --user and copy only the ~/.local directory, or use a virtual environment and copy that.
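
The pip install --user variant could look roughly like the following. The app/ package directory and the python -m app entry point are placeholders, and the container is assumed to run as root so that the user site lives under /root/.local.

```dockerfile
# syntax=docker/dockerfile:1

FROM python:3.11-slim AS builder
COPY requirements.txt .
# --user installs into /root/.local instead of the global site-packages,
# so the result can be copied as one self-contained directory.
RUN pip install --no-cache-dir --user -r requirements.txt

FROM python:3.11-slim AS runtime
# Only the packages installed above cross the stage boundary.
COPY --from=builder /root/.local /root/.local
# Console scripts installed by pip land in /root/.local/bin.
ENV PATH="/root/.local/bin:$PATH"
WORKDIR /app
COPY app/ ./app/
CMD ["python", "-m", "app"]
```

Using the same base image tag in both stages keeps the Python minor version identical, which matters because the user site path includes the version number.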

Pitfall #3: Not Using BuildKit's Cache Mounts

Standard multistage builds often rebuild dependencies from scratch on every CI run, even when only source code changes. This wastes time and bandwidth. Docker BuildKit provides cache mounts that persist package caches across builds, drastically speeding up repeated steps.

The Fix: Mount Cache Directories

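Replace RUN npm install with RUN --mount=type=cache,target=/root/.npm npm install. For apt, use --mount=type=cache,target=/var/cache/apt (and usually /var/lib/apt as well), with sharing=locked so parallel builds do not corrupt the cache. This keeps downloaded packages between builds, so subsequent runs only fetch new or changed packages. Note that cache mounts never become part of an image layer: they live in the builder's own storage, outside the image, and are mounted only while the RUN command executes.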
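
For apt, a sketch along the lines of the pattern in Docker's documentation is shown here. The base image tag and the build-essential package are only examples.

```dockerfile
# syntax=docker/dockerfile:1
FROM ubuntu:22.04 AS builder
# Official Debian/Ubuntu images delete downloaded .deb files after install;
# undo that so the cache mount actually has something to keep.
RUN rm -f /etc/apt/apt.conf.d/docker-clean && \
    echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' \
      > /etc/apt/apt.conf.d/keep-cache
# sharing=locked serializes access, since apt needs exclusive use of its data.
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update && \
    apt-get install -y --no-install-recommends build-essential
```

Because both directories are cache mounts, the package lists and archives never land in a layer, so no rm -rf /var/lib/apt/lists/* is needed in this step.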

Common Mistakes with Cache Mounts

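Mounting a cache in a stage that is only used for testing and then discarded is fine: the cache belongs to the builder, not to the stage, so it is still available for later builds. Cache mounts are also independent of the layer cache. When a base image changes, the layers built on top of it are invalidated and the RUN command re-executes, but the contents of the cache mount are still there to speed it up. The reverse caveat is that a cache mount can be emptied at any time, for example by docker builder prune or garbage collection, so a build must never depend on the cache being populated. Another issue: a cache mount is created owned by root by default, so a RUN command executed as a non-root user can hit permission errors when it writes to the cache directory. Always ensure the cache target is owned by the user running the command, for example with the mount's uid and gid options.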
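
A sketch of the ownership fix for a build that runs as the unprivileged node user follows. It assumes the official Node image, where that user has UID and GID 1000 and its npm cache defaults to /home/node/.npm.

```dockerfile
# syntax=docker/dockerfile:1
FROM node:18-slim AS builder
WORKDIR /app
# WORKDIR creates /app as root; hand it over before dropping privileges.
RUN chown node:node /app
USER node
COPY --chown=node:node package*.json ./
# uid/gid make the cache directory writable for the node user
# instead of the default root ownership.
RUN --mount=type=cache,target=/home/node/.npm,uid=1000,gid=1000 \
    npm ci
```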

For example, in a Node.js project, you might do:

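FROM node:18 AS builder
WORKDIR /app
# Copy only the manifests first so the install layer is reused until they change.
COPY package*.json ./
# Install all dependencies, including dev ones, because the build step needs them.
RUN --mount=type=cache,target=/root/.npm \
 npm ci
COPY . .
RUN npm run build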

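With this layout, the npm ci layer is reused as long as the manifests are unchanged, so a source-only change re-runs just the final COPY and the build step. When the lock file does change, npm ci re-runs but pulls most packages from the mounted cache instead of the network. Teams report build time reductions of 50-70% for dependency-heavy projects.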

For apt-based images, cache mounts can also speed up apt-get install, but be careful to not cache sensitive data. BuildKit's cache mounts are host-specific and not shared across CI runners, so they work best on persistent build agents.

Pitfall #4: Leaking Secrets into the Final Image

A common security flaw is embedding API keys, SSH keys, or registry credentials in the image, often through ARG or ENV instructions that survive into the runtime stage. Even if you delete the secret in a later RUN command, it remains in the image layers.

The Fix: Use BuildKit's Secret Mounts

Instead of passing secrets as build args, use --mount=type=secret to mount files only during build steps. For example:

# Mount the secret directly where npm reads its config; it exists only during this RUN.
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc \
 npm ci

The secret file is never stored in any layer. You pass it at build time with --secret id=npmrc,src=.npmrc.

What About ARG?

Build args are visible in docker history, so they should never contain secrets. If you must use ARG for non-sensitive configuration, ensure it is not persisted in the final image. For example, use ARG BUILD_VERSION and then ENV VERSION=$BUILD_VERSION only in the runtime stage—but note that ENV values are visible in the running container, so avoid secrets.
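
The scoping rule is easy to miss: an ARG is visible only in the stage that declares it. A minimal sketch, with a made-up default value and version number:

```dockerfile
# syntax=docker/dockerfile:1
FROM node:18-slim AS builder
# This declaration is scoped to the builder stage only.
ARG BUILD_VERSION=dev
RUN echo "building version ${BUILD_VERSION}"

FROM node:18-slim AS runtime
# Redeclare the arg here, then persist the non-sensitive value as ENV.
ARG BUILD_VERSION=dev
ENV VERSION=${BUILD_VERSION}
# Build with: docker build --build-arg BUILD_VERSION=1.4.2 .
```

Without the second ARG line, ${BUILD_VERSION} would expand to an empty string in the runtime stage.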

A real-world incident: a team used ARG NPM_TOKEN and then RUN echo //registry.npmjs.org/:_authToken=${NPM_TOKEN} > .npmrc. The token was embedded in the layer even after deletion, because each RUN command creates a new layer. An attacker with access to the image could extract the token via docker history. The fix was to use secret mounts.

Another approach for private packages: use docker buildx with SSH mounts to forward the host's SSH agent during build, avoiding any token in the image.
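
An SSH mount could be wired up roughly as follows. The sketch assumes the private dependencies are fetched from github.com over SSH and that an SSH agent holding a suitable key is running on the build host.

```dockerfile
# syntax=docker/dockerfile:1
# The full (non-slim) Node image already includes git and an SSH client.
FROM node:18-bookworm AS builder
WORKDIR /app
# Record the Git host's key so the non-interactive connection is not rejected.
RUN mkdir -p -m 0700 ~/.ssh && ssh-keyscan github.com >> ~/.ssh/known_hosts
COPY package*.json ./
# The agent socket is available only during this RUN; no key reaches a layer.
RUN --mount=type=ssh npm ci
# Build with: docker build --ssh default .
```

ssh-keyscan trusts whatever key the host presents at build time. A stricter setup would copy a known_hosts file with verified keys into the stage instead.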

Always audit your image layers with docker history --no-trunc to check for unintended secrets. Tools like Dive can also inspect layers interactively.

Pitfall #5: Ignoring Layer Ordering and Caching

Docker builds each instruction as a layer and caches layers if nothing changed. Poor ordering can invalidate the cache frequently, forcing rebuilds of expensive steps. For example, copying source code before installing dependencies means any source change invalidates the dependency layer.

The Fix: Order Instructions from Least to Most Frequently Changing

First copy only the dependency manifest files (e.g., package.json, requirements.txt, go.mod), install dependencies, then copy the rest of the source code. This way, dependency layers are cached as long as the manifest doesn't change.

Example:

COPY package*.json ./
RUN npm ci
COPY . .

If you have multiple manifests (e.g., frontend and backend), copy each before installing their respective dependencies.
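
For a repository with frontend and backend directories (hypothetical names), the ordering might look like the sketch below. It assumes a .dockerignore that excludes node_modules so the later source copies do not overwrite the installed packages.

```dockerfile
# syntax=docker/dockerfile:1
FROM node:18-slim AS builder
WORKDIR /app

# Frontend manifests and install: reused until frontend/package*.json changes.
COPY frontend/package*.json ./frontend/
RUN cd frontend && npm ci

# Backend manifests and install: a backend change leaves the layers above cached.
COPY backend/package*.json ./backend/
RUN cd backend && npm ci

# Source changes invalidate only the layers from here down.
COPY frontend/ ./frontend/
COPY backend/ ./backend/
```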

Advanced: Using BuildKit's Cache Mounts with Ordering

Combine optimal ordering with cache mounts for maximum speed. For instance, the cache mount for npm persists the download cache, while the layered caching of npm ci itself is preserved if package-lock.json hasn't changed. However, note that npm ci uses the lock file, so lock file changes invalidate that layer.

A common mistake is to copy all files in one layer, then run a script that conditionally installs dependencies. This destroys caching because the source layer changes often. Instead, split into multiple COPY instructions: one for manifest files, one for shared libraries, and finally one for application code.

In monorepo setups, you may need multiple manifest files. Use COPY --from=builder carefully to avoid copying entire build contexts from earlier stages. Consider using Docker's --target option to build only specific stages during development.

Cache invalidation also occurs when base images are updated. Pinning base image tags to a specific digest (e.g., node:18@sha256:abc123) prevents unexpected cache busts but requires manual updates for security patches.

Pitfall #6: Using Bloated Base Images

Base images are the foundation of your container. Starting with a full Ubuntu or Debian image when a slim, Alpine, or distroless image would suffice can add hundreds of megabytes and many more packages that may carry known vulnerabilities. Even within the same family, different tags vary widely in size and contents.

The Fix: Choose the Right Base Image for Each Stage

For builder stages, you may need a full image with compilers and headers. For runtime stages, prefer minimal images. Common options:

Alpine: ~5 MB, uses musl libc, may cause compatibility issues with some native modules.
Distroless: from about 2 MB for the static variant to tens of megabytes for language-runtime variants (e.g., Java, Python, Node); no shell or package manager.
Scratch: 0 MB, used for statically compiled binaries (e.g., Go, Rust).

Comparison Table

Base Image	Size	Security Surface	Best For
Ubuntu 22.04	~77 MB	Large (many tools)	Builder stages needing apt
Alpine 3.18	~5 MB	Small (musl, few packages)	Runtime if no native deps
Distroless Node	~30 MB	Minimal (no shell)	Production Node.js apps
Scratch	0 MB	None (only binary)	Go/Rust static binaries

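However, Alpine's musl libc can cause issues with packages that ship native code, such as Python's cryptography or Node's sharp: when no musl-compatible prebuilt binary is available for your version and architecture, the package must be compiled from source or fails to install. In such cases, use a slim Debian image (e.g., python:3.11-slim) instead of Alpine.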

Another pitfall: using latest tags. Always pin to a specific version or digest to ensure reproducibility. For example, node:18.17.0-slim instead of node:latest.

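Finally, make sure the base image is published for the architecture you actually run on. If it is not, or if you force a different one (e.g., --platform linux/amd64 on an ARM64 host), the image runs under emulation, which can be much slower.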

Pitfall #7: Forgetting to Clean Up Temporary Files

Each RUN command creates a new layer, and files deleted in later layers still exist in previous layers, contributing to image size. Temporary files like apt cache, downloaded archives, and build artifacts must be cleaned in the same RUN command that creates them.

The Fix: Combine Commands and Clean in the Same Layer

Instead of:

RUN apt-get update
RUN apt-get install -y curl
RUN rm -rf /var/lib/apt/lists/*

Do:

RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*

This ensures the apt cache is removed in the same layer, so it doesn't persist.

Common Examples

npm: run npm cache clean --force after install in the same RUN.
pip: use pip install --no-cache-dir to avoid caching.
apt: always clean lists and any downloaded .deb files.
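
Put together, the three rules could look like this sketch. The two stages are independent and only illustrate the commands; gcc stands in for whatever system package a build might need.

```dockerfile
# syntax=docker/dockerfile:1

FROM python:3.11-slim AS python-deps
# Install and remove the package lists in one layer.
RUN apt-get update && \
    apt-get install -y --no-install-recommends gcc && \
    rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
# --no-cache-dir stops pip from keeping downloaded wheels in the layer.
RUN pip install --no-cache-dir -r requirements.txt

FROM node:18-slim AS node-deps
WORKDIR /app
COPY package*.json ./
# Clear npm's download cache in the same RUN that populated it.
RUN npm ci && npm cache clean --force
```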

A real-world example: a Python image built with pip install was 200 MB, but after adding --no-cache-dir and removing __pycache__ directories, it dropped to 120 MB. The key is to clean up in whichever stage produces the files that end up in the final image: do it in the builder before COPY --from carries them across, not only in the runtime stage.

Also, consider using multi-stage to discard the entire builder layer. Even if you clean in the builder, the layer still exists in the builder image—but since you copy only the final artifact, the runtime image stays small. This is one of the main benefits of multistage builds.

For advanced cleanup, use tools like strip on binaries to remove debug symbols. In Go, compile with -ldflags='-w -s' to strip debug info.

Pitfall #8: Not Using .dockerignore Effectively

The .dockerignore file excludes files from the build context, but many teams either omit it or include only obvious things like .git. This leads to large build contexts that slow down the initial build and can accidentally include secrets or large data files.

The Fix: A Comprehensive .dockerignore

At minimum, include:

node_modules
npm-debug.log
.git
.gitignore
Dockerfile
docker-compose*
.env
.env.*
*.md
*.log
dist
.cache

But also consider project-specific files like test, docs, coverage, and large assets that are not needed at runtime.

Impact on Build Speed

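A large build context (e.g., including a node_modules folder of 500 MB) is expensive on every build. The legacy builder sends the whole context to the Docker daemon even if the Dockerfile never uses it; BuildKit transfers only what the build uses, but a broad COPY . . still drags everything along. This adds seconds to minutes to each build. For CI, this can be a major bottleneck.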

One team I consulted had a monorepo with multiple services; their build context was 2 GB because they didn't ignore node_modules from sibling projects. After adding proper ignore rules, the context shrank to 50 MB, and build times dropped from 8 minutes to 2.

Note that .dockerignore does not affect COPY --from=builder; it only filters the build context that COPY and ADD read from. It still determines how much of the project directory a build has to process, so it remains critical for performance.

Another tip: use docker build -f to specify a different Dockerfile path, but the build context is still the directory passed as the last argument. Keep the context as small as possible.

Pitfall #9: Ignoring Multi-Architecture Builds

In a heterogeneous environment (e.g., mixing AMD64 and ARM64 nodes), building only for one architecture can cause runtime failures. Many teams build on AMD64 CI runners but deploy to ARM64 servers, leading to crashes due to missing emulation or incompatible binaries.

The Fix: Use Buildx for Multi-Architecture Builds

Docker Buildx supports building for multiple platforms in one command. Set up a builder instance that supports cross-compilation or emulation via QEMU. Then build with --platform linux/amd64,linux/arm64.

However, this introduces new pitfalls:

Base images must be available for all target architectures.
Some package managers (e.g., apt) may install different versions on different architectures.
Native extensions (e.g., C++ addons in Node) must be compiled for each architecture. You can either cross-compile or use emulated builds (slower).

Example Workflow

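For a Node.js app, you can use node:18-slim, which is published for both AMD64 and ARM64. Native modules must be compiled for the target platform, so run npm ci or npm rebuild in a builder stage that itself runs as the target platform (under emulation if necessary). RUN has no --platform flag; the platform is selected per stage with FROM --platform=..., for example FROM --platform=$BUILDPLATFORM for a stage that should always run natively on the build machine and cross-compile.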

A common mistake is to hardcode architecture-specific paths or dependencies. For example, installing chromium for testing might have different package names on ARM64. Use conditional logic in the Dockerfile with build args like TARGETARCH to handle differences. BuildKit sets TARGETARCH automatically, but a stage must declare it with ARG TARGETARCH before using it.
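
The platform build args are easiest to see in a language with built-in cross-compilation, so this sketch uses Go rather than Node. The image tag, paths, and the example/app image name are assumptions.

```dockerfile
# syntax=docker/dockerfile:1
# Run the builder on the build machine's native platform to avoid emulation.
FROM --platform=$BUILDPLATFORM golang:1.22 AS builder
# BuildKit provides these values, but the stage must declare them to use them.
ARG TARGETOS
ARG TARGETARCH
WORKDIR /src
COPY . .
# Cross-compile for whichever platform is currently being built.
RUN CGO_ENABLED=0 GOOS=$TARGETOS GOARCH=$TARGETARCH go build -o /out/app .

FROM scratch
COPY --from=builder /out/app /app
ENTRYPOINT ["/app"]
# Build with: docker buildx build --platform linux/amd64,linux/arm64 -t example/app .
```

The same ARG TARGETARCH declaration can drive a shell case statement in a RUN command when package names or download paths differ per architecture.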

Finally, test your multi-architecture image by running it on an actual ARM64 device, not just emulation. Emulation can hide subtle bugs.

Pitfall #10: Neglecting Security Scanning in the Pipeline

Even a well-structured multistage build can contain vulnerabilities if base images or dependencies have known flaws. Many teams skip scanning until after deployment, when it's harder to fix.

The Fix: Integrate Scanning into CI/CD

Use tools like Docker Scout, Trivy, or Snyk to scan images as part of the build pipeline. Fail the build if critical vulnerabilities are found. For multistage builds, scan both the builder and runtime images, but focus on the runtime image since it's what runs in production.
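
One way to make the scan part of the build itself is a dedicated scan stage. The following is a sketch rather than a recommended setup: it assumes the aquasec/trivy image from Docker Hub, network access during the build so Trivy can download its vulnerability database, and a placeholder runtime stage.

```dockerfile
# syntax=docker/dockerfile:1
FROM node:18-slim AS runtime
WORKDIR /app
COPY . .

# Scan stage: the RUN fails, and with it the build, on critical findings.
# Pin a specific Trivy version in real pipelines instead of latest.
FROM aquasec/trivy:latest AS scan
# Copy the runtime stage's entire filesystem in for inspection.
COPY --from=runtime / /rootfs
RUN trivy rootfs --exit-code 1 --severity CRITICAL --no-progress /rootfs
# Run the gate with: docker build --target scan .
```

BuildKit skips stages that the requested target does not depend on, so the scan only runs when the scan stage is targeted explicitly. Many pipelines instead scan the pushed image as a separate CI step.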

What to Scan For

OS packages: apt, yum, apk packages with known CVEs.
Application dependencies: npm, pip, gem packages with vulnerabilities.
Secret leakage: check for hardcoded credentials (e.g., using TruffleHog).

A real-world example: a team used node:18 (full image) for their runtime stage; scanning revealed 50+ high-severity CVEs in system packages. After switching to node:18-slim, the count dropped to 5, and after further trimming to distroless, it was 0.

Note that scanning only at build time is not enough—base images update, and new CVEs are discovered. Schedule regular rescans of images in your registry. Use tools that support continuous monitoring.

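Also, consider generating a software bill of materials (SBOM) for each image, for example with docker scout sbom (or the older experimental docker sbom plugin), or by attaching an SBOM attestation at build time with docker buildx build --sbom=true. An SBOM helps in vulnerability tracking and compliance.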

Frequently Asked Questions About Multistage Builds

How many stages should I use?

There's no fixed number, but most use 2-4: a builder stage, a test stage (optional), a runtime stage, and sometimes a development stage. Avoid excessive stages as they can complicate the Dockerfile and slow down builds due to increased image export time. A good rule of thumb: if you find yourself copying from multiple stages, consider merging some steps.

Can I use a different base image in each stage?

Yes, and in fact that's common. The builder stage may use a full image with compilers, while the runtime stage uses a minimal image. However, ensure that the compiled binaries are compatible with the runtime base (e.g., linking against the same libc). Fully static binaries, such as Go built with CGO_ENABLED=0 or Rust built for a musl target, can run on scratch regardless of the builder.

How do I handle private packages in multistage builds?

Use BuildKit's secret mounts to pass authentication tokens without embedding them. For npm, mount a .npmrc file containing the token. For pip, use --mount=type=secret,id=pip.conf. Alternatively, use SSH mounts to forward the host's SSH agent for git-based dependencies.

What about development vs. production Dockerfiles?

Many projects maintain separate Dockerfiles (e.g., Dockerfile.dev and Dockerfile.prod) or use multi-stage with a dev target. You can use docker build --target development to stop at a stage that includes dev dependencies and hot-reload tooling. For production, build the final stage that excludes those.
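
A single Dockerfile with both targets might be structured like this sketch. The dev script, the index.js entry point, and the example/app image name are hypothetical.

```dockerfile
# syntax=docker/dockerfile:1
FROM node:18-slim AS base
WORKDIR /app
COPY package*.json ./

# Build with: docker build --target development -t example/app:dev .
FROM base AS development
RUN npm ci
COPY . .
# Assumes a "dev" script (for example a file watcher) in package.json.
CMD ["npm", "run", "dev"]

# Build with: docker build -t example/app .
# The last stage is the default target when --target is omitted.
FROM base AS production
ENV NODE_ENV=production
RUN npm ci --omit=dev
COPY . .
USER node
CMD ["node", "index.js"]
```

Because both targets branch from the same base stage, the manifest copy is shared and only the install step differs between them.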

Why is my image still large after multistage?

Check for these issues: copying unnecessary files (use explicit COPY), not cleaning temporary files in the same layer, using a bloated base image, or including build tools in the final stage. Also, check for large static assets (videos, images) that should be served externally.

Conclusion: Fix Your Multistage Builds for Good

Multistage builds are a powerful technique, but they require discipline to realize their full benefits. By avoiding the ten pitfalls covered in this guide (shipping build tools, copying the wrong files, skipping cache mounts, leaking secrets, poor layer ordering, bloated base images, failing to clean up, an ineffective .dockerignore, ignoring multi-arch, and skipping security scanning), you can produce images that are lean, fast, and secure.

Start by auditing your current Dockerfile: run docker history to see layer sizes, use Dive to inspect, and scan with Trivy. Then apply the fixes incrementally. Focus on the highest-impact changes first: switch to a minimal runtime base, use secret mounts, and optimize layer ordering. Over time, you'll build a culture of container excellence that pays off in reduced deployment times, lower storage costs, and fewer security incidents.

Remember that containerization is an ongoing practice. As base images and dependencies evolve, revisit your Dockerfiles regularly. Set up automated scanning and rebuild processes to keep images current. With the strategies in this guide, you can turn multistage builds from a source of frustration into a reliable foundation for production.

About the Author

This article was prepared by the editorial team for this publication. We focus on practical explanations and update articles when major practices change.

Last reviewed: May 2026
