Cloud Run scaling from zero is a feature until it isn't
Scale to zero is a good default for request-driven services, until startup delay, warm-capacity needs, or instance caps turn it into user-visible reliability behavior instead of a pricing feature.
The default
Scale to zero is a feature.
For the right workload, it’s one of the best parts of Cloud Run. Idle capacity disappears. Costs stay closer to actual use. Small internal systems can exist without carrying warm infrastructure all day just in case somebody clicks a button.
The mistake is treating that feature like a law of nature instead of a workload decision.
The real question isn’t just cold starts
People talk about scale to zero as if the whole issue were cold-start latency on the first request. That’s only part of it.
The more important question is whether the service can honestly tolerate request-driven wake-up behavior at all. Cloud Run can only scale from zero because a request arrives. If the workload needs to keep making progress while the system is otherwise idle, or if useful work depends on background activity that has no incoming request to wake it up, then the scaling model is already part of the architecture.
At that point, scale to zero isn’t just a pricing feature. It’s a runtime boundary.
Not every cheap default is a good fit
For plenty of services, scale to zero is exactly right.
Internal APIs. Admin surfaces. Event handlers. Lightweight automation entry points. Small tools with uneven usage. Those all tend to benefit from not paying for warm instances nobody needs most of the day.
But some workloads start asking for more than that model wants to give. They need predictable startup behavior. They need warm capacity because latency is visible. They need to do work even when nobody’s actively calling them. Or they need enough burst headroom that “we’ll wake up when traffic arrives” stops sounding like a serious plan.
That isn’t Cloud Run failing. It’s the workload telling you what shape it actually has.
Request-driven wake-up is part of the contract
Scale to zero works because Cloud Run is willing to let a service sit at nothing and wake it back up only when a request comes in.
It’s a very good trade when the request really is the unit of work. It’s a much worse trade when the request is only the trigger for work that needs to keep going after the caller disappears, or when the system needs standing capacity to feel responsive enough in practice.
That link to request timeouts is part of the same runtime question. Startup behavior, request lifetime, and retry semantics all belong to the workload shape. They aren’t separate tuning checklists.
Minimum instances are sometimes the honest answer
There is nothing morally superior about running at zero.
If the service can’t tolerate startup delay, or if it needs CPU even while it isn’t actively handling requests, then keeping at least one instance warm is usually the honest configuration. That isn’t waste. That’s just paying for the behavior the workload already requires.
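As a sketch of what that honest configuration looks like (service name and region are placeholders), warm capacity plus always-allocated CPU can be set with the gcloud CLI:

```shell
# Keep at least one instance warm so requests never wait on a cold start,
# and keep CPU allocated outside of request handling so background work
# can make progress while the service is otherwise idle.
gcloud run services update my-service \
  --region=europe-west1 \
  --min-instances=1 \
  --no-cpu-throttling
```

Note that `--no-cpu-throttling` changes the billing model for that instance: you pay for allocated CPU continuously, which is exactly the point. The workload already required that behavior; this just states it in the configuration.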
A lot of teams resist this longer than they should because “serverless” sounds cleaner when it implies nothing is running. Fine. The bill doesn’t care about branding. If the service wants warm capacity, pretending otherwise usually just pushes the cost into latency, retries, and user annoyance.
Maximum instances are not just a cost setting
Maximum instances are also not a harmless little cap you set and forget.
Once you cap how far the service can scale, you’re choosing what happens when traffic wants more than the service is allowed to provide. Cloud Run will queue pending requests for a short window. After that, they can fail. That isn’t an internal implementation detail. That’s the user-visible behavior you selected when you decided the service should stop scaling past a certain point.
So instance caps aren’t just about cost control or protecting a backing service. They’re also part of your queueing and rejection policy, whether you wrote that policy down anywhere or not.
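If you set a cap, set it as a deliberate policy rather than a leftover default. A minimal sketch (values illustrative, not a recommendation):

```shell
# Cap scaling to protect a backing service. Past this cap, Cloud Run
# queues pending requests briefly and can then reject the excess
# (HTTP 429), so the cap is part of the rejection policy, not just
# the cost ceiling. Concurrency and max instances together define
# the total request capacity you are willing to serve.
gcloud run services update my-service \
  --region=europe-west1 \
  --max-instances=10 \
  --concurrency=80
```

With these values the service tops out around 800 concurrent requests. Whatever arrives beyond that is the behavior you chose, so it belongs in the same document as your retry and timeout decisions.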
Idle time isn’t always really idle
Another thing that often gets overlooked: scale to zero is not the same as the platform instantly killing idle capacity.
Cloud Run may keep idle instances around for a while to soften cold starts. That helps, but it doesn’t change the underlying model. It just means the platform sometimes gives you a short grace period before dropping back down. Useful, yes. Something to build guarantees on, no.
If the workload needs warm capacity as part of normal behavior, set it explicitly. Relying on the platform maybe keeping an instance around for a bit is how people end up acting surprised by behavior the platform never promised them.
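If you deploy declaratively, the same decisions can live in the service manifest instead of CLI flags. A sketch using the Knative-style annotations Cloud Run accepts (name and values are placeholders):

```yaml
# Cloud Run service spec (Knative-style). The revision template
# annotations make scaling behavior an explicit, reviewable part
# of the deployment instead of a platform grace period you hope for.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-service
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "1"       # explicit warm capacity
        autoscaling.knative.dev/maxScale: "10"      # explicit scaling ceiling
        run.googleapis.com/cpu-throttling: "false"  # CPU allocated while idle
```

The advantage of the declarative form is that the scaling contract shows up in code review, next to everything else about the service.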
This is still usually a very good trade
For a lot of SME systems, the trade is still excellent.
Pay-per-use scaling with optional warm instances covers a large share of real internal platform workloads without dragging in a heavier runtime too early. That's a big part of why Cloud Run is the default. The platform stays small until the workload proves it wants something more expensive, more opinionated, or more continuously alive.
That’s exactly how it should be.
When the workload stops matching the runtime
If the service now wants stable warm capacity, more involved topology, continuous background activity, or workload controls that no longer fit comfortably inside the Cloud Run model, the clean answer may be a different runtime shape altogether.
Kubernetes via GKE Autopilot starts to make sense here. Not because Cloud Run failed, but because the workload stopped matching the conditions that made Cloud Run such a good default in the first place.
The point
Scale to zero is an optimization, not a law of nature.
Use it when the workload can honestly tolerate request-driven wake-up, variable startup behavior, and the scaling limits you put around it. Stop pretending it’s free when the system has already told you it wants warm capacity or a different execution model.