Multi-Stage Build Pitfalls

Misplaced Dependencies: How Incorrect Build Stage Order Sabotages Your Container's Integrity

This guide explains a critical yet often overlooked pitfall in containerized development: how the sequence of operations in your Dockerfile's build stages directly determines the security, size, and reliability of your final image. We move beyond basic syntax to explore the architectural consequences of misplaced dependencies, where a seemingly minor ordering mistake can introduce vulnerabilities, bloat, and non-deterministic builds. Using a problem-solution framework, we dissect common mistake patterns and show how to restructure your builds to avoid them.

The Silent Saboteur: Why Build Stage Order Isn't Just About Speed

In the rush to containerize applications, many teams focus on the triumphant "it runs" moment, overlooking the subtle engineering decisions that dictate long-term integrity. The order of commands in a Dockerfile, especially within multi-stage builds, is frequently treated as an afterthought—a matter of syntactic correctness rather than architectural consequence. This guide addresses that critical oversight. We will demonstrate how misplaced dependencies—installing a package in the wrong layer, copying files at an inopportune time, or cleaning up artifacts prematurely—systematically undermine your container's security, reproducibility, and efficiency. It's a problem of causality: every command creates a layer, and every layer inherits the state and baggage of all preceding layers. An incorrect sequence doesn't just make the build slower; it can bake in secrets, leave behind vulnerable tooling, or create a bloated, inconsistent artifact that behaves unpredictably across environments. Understanding this is the first step from treating containers as mere packaging to engineering them as robust, immutable units of deployment.

The Core Mechanism: Layers, Cache, and Immutable History

To understand why order matters, you must visualize the Docker build process as constructing a stack of transparent slides. Each RUN, COPY, or ADD instruction creates a new slide on top of the stack. Crucially, Docker's build cache uses the content of these slides (their layer hashes) to determine if a step can be skipped. If you change a slide early in the stack, every subsequent slide must be recreated. This is why placing frequently changing application code before installing stable system dependencies destroys cache efficiency. More insidiously, everything placed on a slide remains in the history of that build stage. Even if you delete a file in a later RUN command, a record of that file's existence—and often its contents—persists in the intermediate layer's history, which can be a security liability. The order dictates what is exposed, what is hidden, and what becomes a permanent part of your image's DNA.
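To make the "immutable history" point concrete, here is a minimal, hypothetical sketch (the file path and token are invented for illustration) where a later deletion hides a file from the final filesystem without erasing it from the layer that created it:

```dockerfile
FROM debian:12-slim
# Layer N: the file is committed into this layer's content.
RUN echo "s3cr3t-token" > /tmp/credentials
# Layer N+1: the file is hidden from the final filesystem, not erased.
RUN rm /tmp/credentials
# `docker history` still lists both instructions, and exporting the image
# with `docker save` lets anyone inspect layer N's tarball and recover
# the "deleted" file.
```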

A Composite Scenario: The Leaky Build Tool

Consider a typical project building a Go application. A common, flawed pattern is to start the final stage with COPY --from=builder /app/my-binary . and consider the job done. This seems clean. However, what if the builder stage also installed curl, wget, git, and a compiler to fetch dependencies? Those tools are not copied into the final image, true, but the builder stage's layers persist in the build cache on the build host, and in any cache you export (for example, to a registry to speed up CI). An attacker with access to that cache, or with the ability to execute commands in a compromised container during the build, could dissect the builder stage to find vulnerabilities in those tools or exploit them directly. The mistake was not isolating the build toolchain effectively: the dependency (the full build environment) was placed in a location that, while absent from the final filesystem, remains part of the build's accessible context. The solution involves stricter separation and, where the risk warrants it, separate, disposable build containers.

This scenario illustrates that the problem extends beyond final image size. It's about attack surface and compliance. A security scanner that only analyzes the final layer might give a false sense of security, while the actual risk lurks in the ancestry. Therefore, structuring stages isn't just a performance optimization; it's a fundamental security control. The order must be planned to minimize persistent exposure, not just to streamline the build. Every tool you install, even temporarily, must be accounted for in your dependency graph, with a clear path for its removal or containment before the image is finalized.

Anatomy of a Flawed Dockerfile: Common Mistake Patterns

To diagnose and fix order-related issues, we must first recognize their symptoms. These mistakes often arise from incremental development, where instructions are added to solve an immediate problem without considering the layer stack's cumulative effect. Let's categorize the most prevalent anti-patterns that sabotage container integrity. Each represents a misunderstanding of the layer-caching mechanism or a misplacement of a logical dependency. By naming these patterns, teams can conduct more effective code reviews and preemptively avoid the pitfalls that lead to bloated, insecure, or unreliable images. The goal is to shift from reactive debugging to proactive design, where the Dockerfile structure is a deliberate blueprint rather than a historical log of commands.

Mistake 1: The Monolithic RUN Command

A single, sprawling RUN instruction that updates apt, installs packages, downloads source code, configures the environment, and cleans up in one line is a major red flag. While it may reduce layer count, it destroys cache efficiency and debugging clarity. If any part of that command changes (like adding one new package), the entire expensive operation—including the apt update and cleanup—is invalidated and rerun. Furthermore, if the cleanup step fails mid-script, the layer may be committed with partially cleaned state. The dependency graph here is opaque; there's no logical separation between the dependency fetch, installation, and cleanup phases. This pattern makes it impossible for Docker to reuse the stable portion of your system setup, leading to longer build times and wasted bandwidth.
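The following sketch contrasts the monolithic anti-pattern with a phase-per-RUN layout. The source URL and make targets are hypothetical placeholders, not a real project:

```dockerfile
# Anti-pattern (shown commented out): one RUN mixes OS setup, source fetch,
# build, and cleanup. Adding a single package invalidates the whole layer.
#
# RUN apt-get update && apt-get install -y build-essential curl \
#  && curl -fsSL https://example.com/src.tar.gz | tar xz \
#  && make -C src install && rm -rf src /var/lib/apt/lists/*

# Better: one logical phase per RUN. The stable OS layer stays cached
# while the application fetch and build layers change independently.
FROM debian:12-slim
RUN apt-get update \
 && apt-get install -y --no-install-recommends build-essential ca-certificates curl \
 && rm -rf /var/lib/apt/lists/*
# Hypothetical source archive, for illustration only.
RUN curl -fsSL https://example.com/src.tar.gz | tar xz
RUN make -C src install && rm -rf src
```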

Mistake 2: Copy Early, Update Late

This pattern looks like this: COPY . /app followed later by RUN npm install or RUN go mod download. The application code (which changes with every developer commit) is placed before the dependency installation. This ensures that the dependency installation layer cache is busted on every code change, even if package.json or go.mod remains unaltered. The logical dependency is reversed: the static dependency list should be copied and installed first, as it changes less frequently. Only then should the volatile application code be copied in. This simple reordering can cut build times by 80% or more for incremental builds, as the expensive npm install step is cached until its actual inputs change.
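A before-and-after sketch of this reordering, assuming a standard Node.js layout:

```dockerfile
# Flawed order (commented out): every source change busts the install cache.
#
# FROM node:18
# WORKDIR /app
# COPY . .
# RUN npm install

# Corrected order: manifests first, install, then the volatile code.
FROM node:18
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci          # cached until the manifests above actually change
COPY . .            # only this layer and later ones rebuild on a code edit
```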

Mistake 3: Leaving the Kitchen Sink in the Final Image

In multi-stage builds, a frequent error is incomplete cleanup in intermediate stages or copying unnecessary artifacts between stages. For example, a builder stage might compile a binary but also generate debug symbols, test reports, or source code. If the final COPY --from=builder instruction uses a wildcard or copies an entire directory, it may pull these extraneous files into the slim final image, negating the benefit of multi-stage builds. The dependency on these intermediate artifacts is misplaced because they are not required for runtime. The solution is to be surgical: copy only the strict runtime necessities, like the compiled binary, and ensure the builder stage's working directory doesn't accumulate cruft. This requires thinking of stages as independent, with explicit contracts on what they produce.
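A sketch of the surgical approach for a Go project (the package path ./cmd/server is an assumed example):

```dockerfile
FROM golang:1.22 AS builder
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# Hypothetical package path; static binary so it runs on a minimal base.
RUN CGO_ENABLED=0 go build -o /out/server ./cmd/server

FROM gcr.io/distroless/static-debian12
# Surgical: one binary. A broad `COPY --from=builder /src .` would drag in
# source code, test files, and build cruft.
COPY --from=builder /out/server /server
ENTRYPOINT ["/server"]
```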

Recognizing these patterns is the diagnostic phase. The next step is to adopt a principled framework for ordering instructions that aligns with the true dependency graph of your application, prioritizing immutability, cacheability, and minimalism at each step. This involves a mindset shift from writing sequential commands to designing a directed acyclic graph (DAG) of build stages, where each node has a clear purpose and a defined output, and edges represent explicit, minimal transfers of data.

Designing for Integrity: A Framework for Correct Stage Order

Building a container with integrity requires a proactive design philosophy. We propose a framework centered on the concept of a "Dependency-First, Stateful-Last" ordering. This means structuring your Dockerfile to satisfy stable, external dependencies first, manage application dependencies next, incorporate volatile code last, and rigorously clean state between logical phases. The framework treats the build as a pipeline with clear checkpoints where the cache can be most effective and security risks can be shed. It's not a rigid template but a set of priorities that guide your instruction sequence. By applying this framework, you create images that are not only smaller and faster to build but also more predictable and secure because each layer's purpose and content are intentional.

Priority 1: Base Image and System Dependencies

Start with the most stable foundations. This includes selecting a minimal, specific base image (e.g., debian:12-slim instead of just latest) and installing system-level packages. The key here is to chain related commands to form a single, cacheable layer that represents your "configured OS." A best-practice pattern is: RUN apt-get update && apt-get install -y package1 package2 && rm -rf /var/lib/apt/lists/*. This chains the update, install, and cleanup in one RUN to prevent the apt/lists directory from persisting in a separate layer. This layer changes only when your package list changes, making it highly cacheable. This step establishes the immutable platform upon which everything else depends.
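Written out as a Dockerfile fragment (package1 and package2 are placeholders, as in the text above):

```dockerfile
FROM debian:12-slim
# One cacheable "configured OS" layer: update, install, and cleanup are
# chained so the apt lists never persist in a separate layer.
RUN apt-get update \
 && apt-get install -y --no-install-recommends package1 package2 \
 && rm -rf /var/lib/apt/lists/*
```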

Priority 2: Application Dependencies and Tooling

With the OS prepared, the next layer should install runtime dependencies specific to your language or framework. Copy the dependency manifest file (package.json, requirements.txt, go.mod, etc.) independently and run the install command. For example: COPY package.json package-lock.json ./ followed by RUN npm ci --only=production. This layer is cached independently of your source code. It will only invalidate when the dependency files change. This separation is the single most impactful optimization for developer iteration speed. For build-time tooling (compilers, linters), this is where multi-stage design becomes critical: these should be installed in a separate, earlier builder stage, not in the final runtime image.
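A minimal sketch of this dependency layer for Node.js; note the layer's cache key is derived only from the two manifest files:

```dockerfile
FROM node:18-slim
WORKDIR /app
# Only the manifests are copied, so this layer survives source edits.
COPY package.json package-lock.json ./
# Newer npm versions spell this flag --omit=dev instead of --only=production.
RUN npm ci --only=production
```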

Priority 3: Application Code and Assets

Only after the environment and dependencies are settled should you copy the rest of your application source code. This includes all files that change frequently during development. The instruction is typically a simple COPY . . or a more targeted copy. Because this layer sits atop the stable dependency layers, a code change only triggers a rebuild from this point forward, leaving the expensive setup layers cached. This order respects the true dependency graph: the code depends on the libraries, which depend on the system, not the other way around.

This framework must be adapted for multi-stage builds. The principle extends to designing stages as independent, functional units. A builder stage might follow Priorities 1-3 to create a compiled artifact. The final stage then resets, often starting with a fresh, even more minimal base image (Priority 1 again), and then copies only the artifact from the builder, satisfying Priority 2 (now just the binary) and skipping Priority 3 (source code) entirely. This layered, priority-driven approach systematically eliminates misplaced dependencies by making the contract between stages explicit and minimal.

Multi-Stage Build Strategies: A Comparative Analysis

Multi-stage builds are the most powerful tool for combating image bloat and leakage, but they introduce their own ordering complexities. Choosing the right strategy depends on your application's build process, language, and security requirements. Below, we compare three common architectural patterns for multi-stage builds, evaluating their pros, cons, and ideal use cases. This comparison will help you decide not just how to use multiple stages, but how to sequence operations within and between them to maximize integrity.

Strategy: Simple Builder Pattern
How it works: Two stages: a full-featured "builder" stage with compilers/tooling produces an artifact, which is copied into a clean, minimal runtime stage.
Pros: Straightforward, great for compiled languages (Go, Rust, Java). Dramatically reduces final image size. Isolates build tools.
Cons: Builder stage layers may still persist in the build cache. Can be overkill for interpreted languages where the runtime needs many packages.
Best for: Go binaries, Rust applications, any project where the build requires significant extra tooling.

Strategy: Dedicated Dependency Stage
How it works: Three or more stages: Stage 1 installs system deps. Stage 2 (builder) uses Stage 1 as its base, adds dev tools, and compiles. Stage 3 (runtime) uses Stage 1 as its base and copies the artifact.
Pros: Maximizes layer reuse. The system dependency layer is shared between builder and runtime and cached efficiently. Very clean separation.
Cons: More complex Dockerfile. Requires careful management of paths and artifacts between stages.
Best for: Complex applications with shared system dependencies between build and runtime (e.g., Python with C extensions, Node.js with native modules).

Strategy: Scratch or Distroless Finale
How it works: Builder stage as before, but the final stage uses FROM scratch or a distroless base (like gcr.io/distroless/base) and copies only the binary and perhaps CA certificates.
Pros: Maximum security minimalism. Almost zero attack surface. Ideal for compliance-sensitive environments.
Cons: Debugging is difficult (no shell). Requires static linking or careful assembly of all runtime dependencies (libc, etc.).
Best for: Extremely security-conscious deployments of statically linked binaries. Kubernetes-native applications.

The choice of strategy dictates the dependency flow. In the Simple Builder pattern, the critical order is within the builder stage (follow the framework) and ensuring the final copy is minimal. In the Dedicated Dependency Stage, the order of defining the shared base stage is paramount—it must contain all common, stable dependencies. For Scratch/Distroless, the entire focus is on the builder stage's output completeness; the final stage order is trivial (just copy), but the dependency on a perfectly self-contained artifact is absolute. Each strategy manages the risk of misplaced dependencies differently, by either isolating, sharing, or eliminating entire classes of components.

Step-by-Step Guide: Refactoring a Problematic Dockerfile

Let's apply our framework and understanding to a concrete remediation process. Suppose we have a Dockerfile for a Node.js API that suffers from the "Copy Early, Update Late" and "Monolithic RUN" mistakes. Our goal is to refactor it for cache efficiency, smaller size, and better security. We'll proceed step-by-step, explaining the rationale for each change. This process is universally applicable and can be used as a checklist for reviewing any Dockerfile.

Step 1: Analyze the Original Structure

First, examine the existing Dockerfile to identify order-related flaws. A typical flawed example might start with FROM node:18, then immediately COPY . ., followed by a large RUN npm install && npm run build && rm -rf /root/.npm. Finally, it runs CMD ["node", "dist/index.js"]. The problems are evident: the code copy invalidates the cache for npm install on every change, and the monolithic RUN mixes concerns. Our analysis should list all dependencies: the OS/base image, Node.js runtime, npm packages (production only), source code, and build artifacts.

Step 2: Implement a Multi-Stage Design

We'll split into two stages: a builder and a runner. Start the Dockerfile with the builder stage: FROM node:18 AS builder. In this stage, we will install dependencies and build the application. The key is to order instructions for optimal caching. First, copy only the dependency manifests: COPY package.json package-lock.json ./. Then, install dependencies: RUN npm ci. Now, copy the rest of the source code: COPY . .. Finally, run the build: RUN npm run build. This sequence ensures the expensive npm ci is cached unless package.json changes.
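The builder stage from this step, assembled (the /app working directory and dist output path are assumptions about the project layout):

```dockerfile
FROM node:18 AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci            # cached unless the manifests above change
COPY . .
RUN npm run build     # assumed to emit the compiled app into /app/dist
```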

Step 3: Construct the Lean Final Stage

Now, define the final runtime stage. For a Node.js app, we can still use a slim Node image, but we'll use the -alpine variant for size: FROM node:18-alpine. Set the working directory. Next, install only the production dependencies. This is more precise than copying the entire node_modules from the builder, which would include devDependencies. The cleaner method is: COPY --from=builder /app/package.json /app/package-lock.json ./ followed by RUN npm ci --only=production (note that npm ci requires the lock file, and newer npm versions spell this flag --omit=dev). This ensures only production dependencies are installed, avoiding any devDependencies that were present in the builder.

Step 4: Copy the Application Artifact and Set Runtime

The final step is to copy the built application from the builder stage, not the source code. For example: COPY --from=builder /app/dist ./dist. This is the only other thing we copy. The final stage now contains a minimal OS, Node.js runtime, production node_modules, and the built artifact—nothing else. No source code, no dev tools, no test files. End with CMD ["node", "dist/index.js"]. This step-by-step refactoring transforms a bloated, cache-inefficient image into a streamlined, secure, and performant container. The process underscores the importance of ordering within stages and the selective copying between them.
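The runtime stage from Steps 3 and 4, assembled (same assumed project layout as the builder stage):

```dockerfile
FROM node:18-alpine
WORKDIR /app
# Manifests only; npm ci needs both files.
COPY --from=builder /app/package.json /app/package-lock.json ./
RUN npm ci --only=production
# The built artifact, not the source tree.
COPY --from=builder /app/dist ./dist
CMD ["node", "dist/index.js"]
```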

Beyond the Dockerfile: Tooling and Cultural Shifts

Fixing Dockerfile order is a technical necessity, but sustaining container integrity requires broader tooling and team culture. A perfectly ordered Dockerfile can still be subverted by a careless COPY or a hidden dependency in a shell script. Therefore, we must embed checks and practices that make integrity the default. This involves integrating linters into CI/CD pipelines, adopting image-scanning tools that understand layers, and fostering a review culture that scrutinizes Docker changes as critically as application logic. The goal is to move from manual, expert-driven optimization to automated, team-wide guardrails.

Integrating Linters and Static Analysis

Tools like Hadolint (a Dockerfile linter) can be integrated into your pre-commit hooks or CI pipeline. Hadolint checks for dozens of best practices, including rules that bear on ordering and layering, such as DL3045 (COPY to a relative destination without a WORKDIR set) and DL3059 (multiple consecutive RUN instructions that should be consolidated). By making these checks automatic, you catch common ordering mistakes before they reach the repository. This shifts the burden from human memory to automated policy, ensuring even new team members produce optimized Dockerfiles. The linter acts as a persistent, objective reviewer of your dependency graph's expression in code.

Implementing Regular Image Scanning

Static analysis of the Dockerfile is not enough; you must also analyze the resulting image. Tools like Trivy, Grype, or Docker Scout can scan your built images for vulnerabilities, but crucially, they can also show you which layer introduced a particular vulnerability. This feedback is invaluable for diagnosing misplaced dependencies. If a scan reveals a high-severity CVE in a tool like curl in your final image, you can trace it back to the RUN apt-get install layer. If curl isn't needed at runtime, this is a clear signal that your multi-stage separation failed or that the package was a misplaced dependency in the final stage. Regular scanning turns abstract principles into concrete, actionable security findings.

Cultivating a Review Mindset

Finally, teams must treat Dockerfile changes with the same rigor as application code. In pull requests, reviewers should ask questions like: "Does this new package need to be in the final stage or just the builder?" "Will this COPY command break the cache for the dependency layer?" "Can we combine these RUN commands to clean up within the same layer?" This cultural shift elevates container construction from an operational task to a core engineering discipline. It encourages developers to think in terms of layers and dependencies, not just commands. Over time, this mindset produces a codebase of high-integrity Dockerfiles that are maintainable, secure, and efficient by design, not by accident.

Common Questions and Concerns (FAQ)

As teams implement these practices, several recurring questions and concerns arise. Addressing these helps solidify understanding and overcome practical hurdles. This section tackles the nuances and edge cases that aren't always covered in basic tutorials, providing the deeper judgment needed for real-world scenarios.

Does layer order really impact security, or just size?

Absolutely, it impacts security profoundly. A larger image has a larger attack surface—more packages mean more potential vulnerabilities. More critically, the order determines what is exposed in the final image and what remains in the historical layers. Secrets copied in an early layer and "deleted" later are still recoverable. Build tools left in a final layer provide exploitation vectors. Security scanners analyze layers, so a vulnerability in an early, deleted layer might be missed by a shallow scan but exploitable by someone with access to the image registry or build cache. Correct ordering minimizes persistent risk.

How do I handle dependencies that need OS packages AND language packages?

This is a complex dependency chain. The best approach is often the "Dedicated Dependency Stage" pattern. Create a base stage that installs the required OS packages (e.g., libpq-dev for PostgreSQL client libraries). Use this as the base for both your builder stage (where you install Python, pip, and dev tools) and your final runtime stage. In the final stage, you start FROM base-stage, then install Python runtime dependencies via pip from a requirements.txt file. This ensures the OS-level dependency is satisfied and cached in a shared layer, avoiding duplication and ensuring consistency.
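A condensed sketch of this shared-base pattern for a Python app (package names, install prefix, and entrypoint are illustrative assumptions):

```dockerfile
# Shared base: the OS-level dependency lives in one cached layer.
FROM debian:12-slim AS base
RUN apt-get update && apt-get install -y --no-install-recommends libpq-dev python3 \
 && rm -rf /var/lib/apt/lists/*

# Builder: adds dev tooling on top of the shared base.
FROM base AS builder
RUN apt-get update && apt-get install -y --no-install-recommends python3-pip build-essential \
 && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt ./
RUN pip3 install --prefix=/install -r requirements.txt

# Runtime: same base, no dev tools, just the installed packages.
FROM base AS runtime
COPY --from=builder /install /usr/local
CMD ["python3", "-m", "app"]   # hypothetical entrypoint
```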

What if my build is non-deterministic despite good order?

Non-deterministic builds often stem from dependencies outside the explicit dependency graph. Examples include downloading the "latest" tag of a base image, using package managers without version pinning (apt-get install some-package without a version), or scripts that fetch resources from the internet during build. The solution is to tighten your dependencies: use specific digest-tagged base images (FROM debian:12-slim@sha256:...), pin all package versions in your manifest files, and avoid network fetches during the RUN commands. If you must fetch, ensure you validate checksums. Order provides structure, but determinism requires complete and versioned inputs.
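A sketch of fully pinned inputs; the digest is an obvious placeholder (a real one comes from `docker buildx imagetools inspect` or your registry), and the package name and version are invented:

```dockerfile
# Placeholder digest -- substitute the real digest of the base image you vetted.
FROM debian:12-slim@sha256:0000000000000000000000000000000000000000000000000000000000000000
# Pinned package version so a rebuild months later installs the same bits.
RUN apt-get update \
 && apt-get install -y --no-install-recommends some-package=1.2.3-4 \
 && rm -rf /var/lib/apt/lists/*
```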

Is the extra complexity of multi-stage builds worth it for small apps?

This is a trade-off. For a tiny internal tool, the complexity overhead might outweigh the benefits. However, consider the long-term trajectory and security posture. A multi-stage build enforces a discipline that pays dividends as the app grows. It also future-proofs the deployment against sudden security scanning requirements. A good rule of thumb: if your final image would contain compilers, SDKs, or test frameworks that are not needed for runtime, a multi-stage build is warranted. The initial investment in a properly ordered Dockerfile, even with multiple stages, reduces operational debt and prevents a costly refactor later.

Conclusion: Building with Intention, Not Accident

The integrity of your containerized application is not a happy accident; it is the direct result of intentional design choices, with the order of build stages being one of the most significant. Misplaced dependencies—whether they are packages, files, or tools—create a cascade of problems: bloated images, slow builds, hidden vulnerabilities, and non-reproducible artifacts. By adopting a "Dependency-First, Stateful-Last" framework, leveraging multi-stage strategies appropriately, and integrating validation tooling into your workflow, you transform your Dockerfile from a fragile script into a robust blueprint. This guide has provided the problem-solution lens and concrete patterns to help you avoid common mistakes. The goal is to shift your team's mindset, to see each layer not as a step in a procedure, but as a deliberate, cache-optimized, security-hardened component in a final, immutable artifact. Start by auditing your most critical images, refactor using the step-by-step guide, and embed the practices that make integrity the default, not the exception.

About the Author

This article was prepared by the editorial team for this publication. We focus on practical explanations and update articles when major practices change.

Last reviewed: April 2026
