Why Cloud Run + Postgres needs a connection budget
Cloud Run and Postgres get fragile when connection growth is left implicit. We treat connections as a finite runtime budget, not as plumbing the app can multiply without consequence.
Cloud Run makes it easy to think in terms of requests, latency, and cold starts. Postgres does not care about any of that until the service has translated it into sessions, transactions, locks, and queries. That translation is where a lot of otherwise reasonable systems get into trouble. The app tier is elastic. The database is not. A service can add execution capacity far faster than Postgres can absorb connection demand, and most of the time nothing in the rollout will say so plainly enough until the incident is already underway.
Connection budgeting sits at the front of the cluster because it is not a pooling page and it is not a tuning page. It is the boundary under both. If database connections are treated as a local detail inside one service, Cloud Run will eventually turn that local optimism into fleet-wide pressure. A pool size that looked harmless on two warm instances stops looking harmless during a burst, a rollout, or a retry storm when ten new instances all wake up and make the same claim on the database at once.
The usual diagnosis is “Postgres ran out of connections.” That is true and not especially helpful. What failed was the runtime contract. The service had no clear answer to a simple question: how much database pressure is this thing allowed to create when Cloud Run behaves exactly as configured and scales out? A connection budget is just that answer written down.
Once the budget exists, a lot of later discussions stop being vague. Questions around safe scaling defaults, request timeout behavior, or managed pooling are no longer isolated knobs. They become different ways of honoring or breaking the same constraint.
Connections are runtime economics, not free plumbing
It is very easy to talk about a database connection like it is just a socket with authentication on top. That mental model is too cheap for Postgres. A session is memory, scheduling, transaction state, lock participation, and one more backend that can sit blocked, idle, or hold resources longer than anyone intended. One connection is not expensive in isolation. A service architecture that can multiply them casually is.
In Cloud Run, serverless elasticity makes local assumptions contagious. If one instance believes it deserves a pool of ten, the platform is happy to repeat that assumption across as many instances as your scaling settings allow. If a second service shares the same database, or two revisions overlap during rollout, or a worker has a different execution shape than the API but still points at the same cluster, the multiplication gets harder to see without becoming any less real.
Raising max_connections can buy time. Sometimes that is the right tactical move. It does not change the economics underneath. It just allows the service to spread a vague contract across more backends for a little longer. If the application still has no real limit on how much pressure it can create, a higher ceiling is usually just a way to fail later and more expensively.
What we care about instead is bounded pressure. That means the service cannot multiply database demand indefinitely just because Cloud Run can multiply containers. Waiting has to start somewhere. Rejection has to start somewhere. Queueing has to happen somewhere. If those boundaries are not explicit, the database becomes the place where the whole system finds them out.
The numbers that keep getting collapsed into one
A lot of bad math comes from treating several different numbers as if they all meant “capacity.”
Request concurrency is how many HTTP requests one Cloud Run instance is willing to accept at once. Worker count is how much parallel work the runtime creates inside that instance, whether through threads, async tasks, multiple processes, or framework-specific workers. Pool size is how many database sessions one instance is allowed to hold or wait for. Backend pressure is what Postgres actually sees after all of that has been multiplied across the fleet.
| metric | what it actually means |
| --- | --- |
| request concurrency | HTTP work one instance accepts |
| worker count | internal parallel work the process creates |
| pool max | DB sessions one instance can claim |
| backend pressure | aggregate sessions Postgres must carry |

These values do not move together. A service with concurrency 80 can be safe if few requests touch Postgres and the pool is small. A service with concurrency 8 can still be reckless if each instance quietly owns a pool of 20 and holds sessions through long-running business logic. A worker endpoint with almost no HTTP traffic can create more consistent database pressure than a busy API because its internal parallelism is higher and less visible in Cloud Run metrics.
“We run at concurrency 20” is not a database capacity statement. It tells us almost nothing by itself. It does not say how many DB-bound code paths can run at once, whether the app opens one pool per process, or whether background work is sharing the same instance contract. Without those answers, the number is mostly theater.
The budget should be simple enough to use during an incident
We do not need an elegant model. We need one someone can use during design review and then use again at 2 a.m. when the service is scaling out and nobody wants a lecture.
A budget sketch like this is usually enough:
```yaml
service:
  max_instances: 8
  container_concurrency: 16
  app_workers_per_instance: 1
  request_timeout_seconds: 30
database:
  app_pool_max: 4
  app_pool_min: 0
  reserved_headroom_for_admin_and_migrations: 10
  reserved_headroom_for_other_services: 20
budget:
  worst_case_service_connections: 32
  total_planned_pressure: 62
```

The model is deliberately boring. It forces the design to state how many instances the service may create, how many sessions each instance may claim, and how much room must remain for everything else. It also forces a more useful question than "will the service usually be fine?" When traffic rises, does the app queue and shed load before Postgres collapses, or does it expect the database to negotiate new capacity on the fly?
A smaller table often makes the risk easier to see:
| service | api |
| --- | --- |
| max instances | 8 |
| pool max per instance | 4 |
| possible sessions | 32 |
| shared DB headroom reserved | 30 |
| planned DB pressure | 62 |

It is not full database sizing. It is the minimum viable version. Without it, services get defended with lines like "the pool is only six" or "the service rarely hits that many instances" as if either one were a contract. They are just anecdotes unless the upper bound is part of the design.
The bad version of this math is rarely absurd, which is why it survives.
```yaml
api:
  max_instances: 30
  container_concurrency: 80
  app_pool_max: 10
```

Individually, each choice can sound defensible. Together, the service is asserting a worst case of 300 sessions before workers, migrations, admin access, or neighboring services have even entered the conversation. Staging often misses this because staging proves that one or two instances can talk to Postgres. It does not prove the fleet has a sane contract once Cloud Run behaves like production.
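The fleet-side half of the contract lives in the service definition, not in application code. A sketch of where those caps would sit in a Cloud Run service manifest, with illustrative values; the field names (`autoscaling.knative.dev/maxScale`, `containerConcurrency`, `timeoutSeconds`) are standard Cloud Run YAML, the service name is made up:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: api
spec:
  template:
    metadata:
      annotations:
        # Hard cap on fleet width; the other half of the budget.
        autoscaling.knative.dev/maxScale: "8"
    spec:
      containerConcurrency: 16   # HTTP work one instance accepts
      timeoutSeconds: 30         # request timeout, not DB backpressure
```

If these values are not reviewed alongside the pool settings, the budget only exists on one side of the multiplication.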
Where the story usually goes soft
The first lie is that scaling from zero is protection. It is not. Scaling from zero changes the idle story, not the wake-up story. A service can be cheap at rest and still hit the database brutally when traffic returns.
The second lie is that pool size is self-enforcing. It is only per-instance self-enforcing. A pool of four feels disciplined until twelve instances all wake up and make the same claim. If nobody ever multiplies the pool across the fleet, the team is not reasoning about database pressure. It is reasoning about one container and hoping production stays polite.
The third lie is that request timeout is backpressure. It is not. A timed-out request may already have acquired a session, started a transaction, or kicked off work that keeps running after the caller has gone away. Request timeout behavior keeps showing up in the same postmortems as connection exhaustion. A timeout is a transport boundary, not a guarantee that the database work stopped.
The fourth lie is that a pooler makes budgeting optional. A pooler can help when the real problem is churn and burst fan-out. It changes how sessions are shared. It does not turn unbounded scale into bounded scale. If the service has no connection contract, a pooler just moves the ambiguity somewhere else.
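Even when a pooler is the right tool, it carries its own version of the same contract. A sketch with illustrative values; the keys are standard `pgbouncer.ini` settings, the numbers are not a recommendation:

```ini
; pgbouncer.ini -- the pooler has a budget too, it does not remove yours
[pgbouncer]
pool_mode = transaction      ; server sessions shared per transaction
max_client_conn = 400        ; ceiling on app-side client connections
default_pool_size = 20       ; server sessions per database/user pair
```

Notice that every line is still a written-down limit. A pooler deployed without these numbers being derived from the budget just relocates the ambiguity.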
The fifth lie is usually the most expensive one. Operators assume the database is the scarce thing because the alert came from Postgres. Often the real problem is higher up. The service may be holding transactions across downstream HTTP calls, retrying too aggressively, opening one pool per worker process, or doing work in the request path that should have crossed an async boundary long ago. Postgres sees the pressure, but the application is often where the dishonesty started.
Bounded pressure is the real operating property
Once the budget exists, the next question is what respecting it looks like. It takes more than “use smaller numbers.” Overload needs to show up somewhere teams can actually understand.
Bounded pressure means a service can only increase database demand in a narrow, predictable way. Waiting for a connection is brief and visible. Requests fail honestly when the pool is exhausted instead of pretending to make progress for thirty seconds while the database gets slower and more opaque. The service queues or rejects before the whole estate turns into a fleet of polite liars all waiting on the same finite backend.
In practice, that usually means modest per-instance pools, explicit max instance caps, acquisition timeouts that fail fast enough to matter, and async boundaries when the request path has clearly become the wrong home for the work.
```
DB_POOL_MAX=4
DB_POOL_MIN=0
DB_POOL_ACQUIRE_TIMEOUT_MS=1000
DB_STATEMENT_TIMEOUT_MS=5000
```

Those settings are not magic. They are posture. The app is saying it will not wait forever to claim a scarce resource, and it will not behave as if a busy database is an invitation to accumulate more indefinite work.
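That posture can be seen in miniature without a real driver. A stdlib-only sketch of bounded, fail-fast acquisition; `BoundedPool` and `PoolExhausted` are illustrative names, not a library API, and the numbers mirror the env vars above:

```python
import threading

class PoolExhausted(Exception):
    """Raised instead of queueing indefinitely for a session."""

class BoundedPool:
    # Toy pool: a DB_POOL_MAX-style cap plus a short acquire timeout.
    def __init__(self, max_size: int, acquire_timeout_s: float):
        self._slots = threading.Semaphore(max_size)
        self._timeout = acquire_timeout_s

    def acquire(self):
        # Wait briefly and visibly; reject honestly when exhausted.
        if not self._slots.acquire(timeout=self._timeout):
            raise PoolExhausted("no session within budget; shed this request")
        return object()  # stand-in for a real connection

    def release(self, conn) -> None:
        self._slots.release()

pool = BoundedPool(max_size=4, acquire_timeout_s=1.0)
held = [pool.acquire() for _ in range(4)]   # budget fully claimed
try:
    pool.acquire()                          # fifth claim fails fast
except PoolExhausted as exc:
    print("rejected:", exc)
for conn in held:
    pool.release(conn)
```

The failure is loud, fast, and local to the service that overspent, which is exactly where it should surface.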
What we optimize for is predictable degradation. That sounds dull until you have watched a Cloud Run service scale itself into a Postgres incident. Then it starts sounding like mercy. Predictable degradation means the system slows or rejects in ways teams can explain. It means the pool goes empty before Postgres turns into a pile of blocked backends. It means someone can look at the scale cap and the pool size and know roughly how bad the blast radius can get before opening the first query log.
When the budget says the current shape is wrong
Sometimes the exercise shows the problem is not tuning. The service may simply have the wrong runtime shape. If the work needs to outlive the caller, or can be buffered and processed asynchronously, then continuing to negotiate scale and pool settings is often the wrong move. The honest fix is to move the work behind a task, queue, or job boundary and stop pretending the request path owns durable execution.
Sometimes the budget says the service is too wide. Then max instances need to come down. Sometimes it says each instance is too greedy. Then pool size or concurrency needs to come down. Sometimes it says the API and the worker should not share the same database rights or the same headroom assumptions. Then they need separate budgets.
And sometimes it shows the database product is not the first thing to change. Teams often reach for a bigger managed database as soon as pressure becomes visible. If the current system still has no credible connection contract, changing products is often just a more expensive way to avoid the same design problem. The choice between Cloud SQL and AlloyDB only gets interesting after the app has admitted the database is finite and designed accordingly.
Cloud Run and Postgres work well together when the service treats connections as a scarce runtime resource instead of as plumbing it can multiply without consequence. Write the budget down. Separate request concurrency, worker count, pool size, and backend pressure instead of collapsing them into one comforting number. Put queueing, rejection, or async handoff in front of the database rather than after it. If the only thing standing between a traffic spike and a connection incident is hope that the app will stay polite, there is no budget yet. There is just deferred pain.