Why we usually choose Pulumi over Terraform
Pulumi is our default when infrastructure starts behaving like software. Existing Terraform estates can still be the better decision when the migration cost is higher than the operational gain.
The default
For new infrastructure, or for infrastructure that’s still evolving in real ways, we usually choose Pulumi over Terraform. That’s the default. Not because Terraform is unusable, and not because tool choice needs to become a personality trait. It’s because once infrastructure starts repeating, branching, and carrying business rules, we’d rather use a real language than keep working around the limits of a narrower one. It also helps that we can often work in a language the team already knows instead of forcing infrastructure into its own isolated dialect.
Existing Terraform estates change the decision fast. If a client already has substantial Terraform code, mature modules, dependable pipelines, policy controls, and a team that can operate the system without daily friction, then the question is no longer what we’d choose from zero. The question is whether a migration would create enough operational gain to justify the cost, risk, and distraction. A lot of the time, it wouldn’t. That’s not inconsistency. That’s the difference between choosing a tool for a new system and replacing one that’s already paying for itself.
That judgment also depends on how the team handles state, ownership boundaries, apply paths, and refactors. A Terraform estate is not weak just because it uses Terraform. A lot depends on whether the operating model around it is actually sound.
Where Terraform is strong
Terraform is good at a very specific class of problem, and it’s worth saying that plainly. It’s widely understood, has broad provider support, and gives teams a familiar declarative workflow that works well when the infrastructure is relatively stable and the main job is to keep extending an already-settled shape. If a team wants a standard industry model, is comfortable with HCL, and the estate isn’t under much abstraction pressure, Terraform can do the job without much drama.
Not every system needs more expressive power. Some systems really are just a set of resources, a few conventions, and a delivery pipeline that mostly needs to stay boring. In that context, Terraform’s constraints aren’t a problem. Sometimes they’re useful, because they reduce the temptation to turn a simple stack into an abstraction project it never needed.
The problem is that a lot of real systems don’t stay that simple for very long.
Where the drag usually starts
The problem isn’t that Terraform can’t express logic. It can. The problem starts when the logic stops being incidental and starts shaping the system. That’s the point where the structure of the code matters more than whether the provider blocks technically work.
The same failure modes show up over and over. A platform pattern needs to be reused, but not always in exactly the same way, so teams start building layers of modules with slightly different inputs and slightly different behavior. Environment-specific logic creeps in, but instead of being stated directly, it gets spread across locals, variable defaults, naming rules, and a pile of “this is just how we do it” explanations that live outside the code.
Review quality starts dropping because understanding the real behavior means tracing values, conventions, and indirection instead of just reading the logic. The stack still works, but it gets harder to reason about and more expensive to change. That’s usually the stage where nobody wants to touch it unless they absolutely have to.
At that point, provider coverage isn’t the issue anymore. The issue is maintainability. The infrastructure has started behaving like software, while the tooling still wants to be treated like static configuration. That’s where most of the drag comes from.
Why Pulumi is usually the better fit
What matters most to us is whether the infrastructure stays legible as it grows. That’s usually where the real pain shows up. The first version of the stack is rarely the problem. The fifth one is. That’s when the original shortcuts are still there, the exceptions have multiplied, and nobody wants to touch the parts that “technically work.”
Pulumi usually fits better once the infrastructure actually needs logic. Reuse gets easier because it’s just code. Branching gets easier because you can state it directly instead of hiding it behind conventions, locals, and indirection. Refactoring starts feeling normal again, which matters a lot once the stack is long-lived and business-critical. Matching the team’s language helps too. A lot of teams already have at least some TypeScript around. If not, Pulumi also supports Python, Go, and .NET languages, among others, which makes it easier to keep the infrastructure layer aligned with the rest of the engineering environment.
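To make "state it directly" concrete, here is a minimal sketch in plain TypeScript. It deliberately avoids any Pulumi API: the names `Environment`, `ServiceSettings`, and `settingsFor`, and every value in them, are hypothetical and only illustrate the shape. In a real Pulumi program, the returned settings would presumably be passed into resource arguments.

```typescript
// Hypothetical types and values, for illustration only.
type Environment = "dev" | "staging" | "prod";

interface ServiceSettings {
  minInstances: number;
  maxInstances: number;
  deletionProtection: boolean;
}

// Shared starting point for non-production environments.
const baseline: ServiceSettings = {
  minInstances: 0,
  maxInstances: 2,
  deletionProtection: false,
};

// The environment rules live in one function, so a reviewer reads the
// branching itself instead of reconstructing it from defaults and
// naming conventions.
function settingsFor(env: Environment, customerFacing: boolean): ServiceSettings {
  if (env === "prod") {
    return {
      // Customer-facing production services keep extra warm capacity.
      minInstances: customerFacing ? 2 : 1,
      maxInstances: 20,
      deletionProtection: true,
    };
  }
  if (env === "staging") {
    return { ...baseline, minInstances: 1, maxInstances: 5 };
  }
  // "dev" falls through to a copy of the baseline.
  return { ...baseline };
}
```

The point is not these particular numbers. It is that the exception ("customer-facing production services get more warm capacity") is a visible line of code that can be reviewed, tested, and refactored like any other.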
That’s why we default to Pulumi. Not because it’s newer, and not because we need infrastructure to feel fashionable. Once the stack is large enough, infrastructure is just software in another form, and we’d rather treat it that way than keep building around the limits of a narrower tool.
What this buys us in practice
The biggest gain isn’t extra expressiveness for its own sake. It’s clarity. Once the system needs shared building blocks, policy enforcement, conditional behavior, and choices like directory per environment versus shared stacks, a real language makes it easier to keep that logic explicit. Review gets cleaner, reuse gets more honest, and the odds of two environments drifting apart drop because the contract between them is actually encoded instead of implied.
It also makes it easier to centralize the rules that usually decay first in infrastructure code. Naming, IAM patterns, environment wiring, service defaults, secret handling, and resource relationships are all easier to standardize when they live in normal abstractions instead of being enforced through habit, comments, and memory. You still need discipline, obviously. Pulumi doesn’t save teams from bad engineering. But it does give good teams a better medium once the stack reaches the point where normal software engineering starts to matter.
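As a rough illustration of rules living in normal abstractions, the sketch below centralizes a naming convention and a set of required labels in plain TypeScript. Everything here is an assumption made for the example: `StackContext`, `resourceName`, `standardLabels`, the name pattern, and the label keys are hypothetical and would need to match whatever constraints the target cloud provider actually imposes.

```typescript
// Hypothetical context shared by every resource in a stack.
interface StackContext {
  project: string;
  env: string;
  owner: string;
}

// One place that decides how resources are named. The pattern below is
// an assumed convention, not a provider requirement.
function resourceName(ctx: StackContext, component: string): string {
  const name = `${ctx.project}-${ctx.env}-${component}`.toLowerCase();
  if (!/^[a-z][a-z0-9-]{0,62}$/.test(name)) {
    throw new Error(`invalid resource name: ${name}`);
  }
  return name;
}

// Required labels are spread last, so caller-supplied extras cannot
// override them.
function standardLabels(
  ctx: StackContext,
  extra: Record<string, string> = {},
): Record<string, string> {
  return {
    ...extra,
    project: ctx.project,
    env: ctx.env,
    owner: ctx.owner,
    "managed-by": "pulumi",
  };
}

// Usage: every resource definition asks the helpers instead of
// re-deriving the convention by hand.
const ctx: StackContext = { project: "Billing", env: "prod", owner: "platform" };
const apiName = resourceName(ctx, "api");
const apiLabels = standardLabels(ctx, { tier: "backend" });
```

With helpers like these, a naming or labeling rule changes in one function rather than in every place someone remembered to follow it.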
Once the infrastructure is getting more complex and the team needs more than repetition with variable substitution, the tradeoff shifts in Pulumi’s favor.
The tradeoff is real
None of this is free, and pretending otherwise would make the whole decision useless. Pulumi increases surface area. It asks more from the team. It gives people more room to build bad abstractions if they have poor taste or weak boundaries. It also narrows the pool of people who can step into the stack comfortably if the team is optimized around a very standard Terraform workflow and nothing else.
That cost is worth paying when the estate is complex enough that the gain is obvious. It isn’t worth paying on every stack. If the infrastructure is small, stable, and unlikely to develop much branching logic, then more language power may just be unnecessary. If the delivery model is intentionally narrow and the team values convention more than expressiveness, Terraform may still be the more practical fit. That’s why we treat Pulumi as a default, not a doctrine.
The goal isn’t to maximize cleverness. It’s to keep the system maintainable without paying for power that never gets used.
When we stay with Terraform
We stay with Terraform when the client already has a real Terraform investment that’s paying for itself. That means more than “there is some code in the repo.” It means the system is operable, the modules are reasonably mature, the delivery pipeline is dependable, policy controls exist where they need to be, and the team understands the stack well enough that it isn’t creating daily friction.
A lot depends on how the team handles Terraform state: clear ownership, controlled apply paths, real locking, and refactors that can happen without panic.
In that situation, migrating for philosophical reasons is usually a bad idea. Even if Pulumi would have been the better greenfield choice, migration is still work, still risk, and still organizational overhead. It competes with everything else that could be improved. If the existing Terraform estate is already producing solid outcomes, continuity usually matters more than purity.
Existing systems have inertia for a reason. Sometimes that’s bureaucracy. Sometimes it’s stability that’s already doing its job. Those aren’t the same thing, and it’s expensive to confuse them.
The decision rule
The tool should match the shape of the problem. If the infrastructure is mostly static and the organization is already operating Terraform well, then staying with Terraform is often the adult decision. If the infrastructure is growing into a real software system with shared logic, branching behavior, and reuse pressure, then a real language usually makes more sense than a narrower declarative one.
That’s why Pulumi is usually the default for us, but not the answer to every question. We’re not trying to win a tooling argument. We’re trying to keep infrastructure calm while the system around it gets more complicated.
If Pulumi makes the stack more legible, easier to evolve, and cheaper to maintain, we choose it. If Terraform is already in place and already paying for itself, we leave it alone.
Everything else is branding.