Why Cloud Run + Postgres needs a connection budget
Cloud Run and Postgres get fragile when connection growth is left implicit. We treat connections as a finite runtime budget, not as plumbing the app can multiply without consequence.
Cloud Run makes it easy to think in terms of requests, latency, and cold starts. Postgres does not care about any of that until the service has translated it into sessions, transactions, locks, and queries. That translation is where a lot of otherwise reasonable systems get into trouble. The app tier is elastic. The database is not. A service can add execution capacity far faster than Postgres can absorb connection demand, and most of the time nothing in the rollout will say so plainly enough until the incident is already underway.
Connection budgeting sits at the front of the cluster because it is not a pooling page and it is not a tuning page. It is the boundary under both. If database connections are treated as a local detail inside one service, Cloud Run will eventually turn that local optimism into fleet-wide pressure. A pool size that looked harmless on two warm instances stops looking harmless during a burst, a rollout, or a retry storm when ten new instances all wake up and make the same claim on the database at once.
The usual diagnosis is “Postgres ran out of connections.” That is true and not especially helpful. What failed was the runtime contract. The service had no clear answer to a simple question: how much database pressure is this thing allowed to create when Cloud Run behaves exactly as configured and scales out? A connection budget is just that answer written down.
Once the budget exists, a lot of later discussions stop being vague. Questions around safe scaling defaults, request timeout behavior, or managed pooling are no longer isolated knobs. They become different ways of honoring or breaking the same constraint.
Connections are runtime economics, not free plumbing
It is very easy to talk about a database connection like it is just a socket with authentication on top. That mental model is too cheap for Postgres. A session is memory, scheduling, transaction state, lock participation, and one more backend that can sit blocked, idle, or hold resources longer than anyone intended. One connection is not expensive in isolation. A service architecture that can multiply them casually is.
In Cloud Run, serverless elasticity makes local assumptions contagious. If one instance believes it deserves a pool of ten, the platform is happy to repeat that assumption across as many instances as your scaling settings allow. If a second service shares the same database, or two revisions overlap during rollout, or a worker has a different execution shape than the API but still points at the same cluster, the multiplication gets harder to see without becoming any less real.
Raising max_connections can buy time. Sometimes that is the right tactical move. It does not change the economics underneath. It just allows the service to spread a vague contract across more backends for a little longer. If the application still has no real limit on how much pressure it can create, a higher ceiling is usually just a way to fail later and more expensively.
What we care about instead is bounded pressure. That means the service cannot multiply database demand indefinitely just because Cloud Run can multiply containers. Waiting has to start somewhere. Rejection has to start somewhere. Queueing has to happen somewhere. If those boundaries are not explicit, the database becomes the place where the whole system finds them out.
The numbers that keep getting collapsed into one
A lot of bad math comes from treating several different numbers as if they all meant “capacity.”
Request concurrency is how many HTTP requests one Cloud Run instance is willing to accept at once. Worker count is how much parallel work the runtime creates inside that instance, whether through threads, async tasks, multiple processes, or framework-specific workers. Pool size is how many database sessions one instance is allowed to hold or wait for. Backend pressure is what Postgres actually sees after all of that has been multiplied across the fleet.
| metric | what it actually means |
| --- | --- |
| request concurrency | HTTP work one instance accepts |
| worker count | internal parallel work the process creates |
| pool max | DB sessions one instance can claim |
| backend pressure | aggregate sessions Postgres must carry |

These values do not move together. A service with concurrency 80 can be safe if few requests touch Postgres and the pool is small. A service with concurrency 8 can still be reckless if each instance quietly owns a pool of 20 and holds sessions through long-running business logic. A worker endpoint with almost no HTTP traffic can create more consistent database pressure than a busy API because its internal parallelism is higher and less visible in Cloud Run metrics.
“We run at concurrency 20” is not a database capacity statement. It tells us almost nothing by itself. It does not say how many DB-bound code paths can run at once, whether the app opens one pool per process, or whether background work is sharing the same instance contract. Without those answers, the number is mostly theater.
The budget should be simple enough to use during an incident
We do not need an elegant model. We need one someone can use during design review and then use again at 2 a.m. when the service is scaling out and nobody wants a lecture.
A budget sketch like this is usually enough:
```yaml
service:
  max_instances: 8
  container_concurrency: 16
  app_workers_per_instance: 1
  request_timeout_seconds: 30
database:
  app_pool_max: 4
  app_pool_min: 0
  reserved_headroom_for_admin_and_migrations: 10
  reserved_headroom_for_other_services: 20
budget:
  worst_case_service_connections: 32
  total_planned_pressure: 62
```

The model is deliberately boring. It forces the design to state how many instances the service may create, how many sessions each instance may claim, and how much room must remain for everything else. It also forces a more useful question than "will the service usually be fine?" When traffic rises, does the app queue and shed load before Postgres collapses, or does it expect the database to negotiate new capacity on the fly?
A smaller table often makes the risk easier to see:
| service | api |
| --- | --- |
| max instances | 8 |
| pool max per instance | 4 |
| possible sessions | 32 |
| shared DB headroom reserved | 30 |
| planned DB pressure | 62 |

It is not full database sizing. It is the minimum viable version. Without it, services get defended with lines like "the pool is only six" or "the service rarely hits that many instances" as if either one were a contract. They are just anecdotes unless the upper bound is part of the design.
The bad version of this math is rarely absurd, which is why it survives.
```yaml
api:
  max_instances: 30
  container_concurrency: 80
  app_pool_max: 10
```

Individually, each choice can sound defensible. Together, the service is asserting a worst case of 300 sessions before workers, migrations, admin access, or neighboring services have even entered the conversation. Staging often misses this because staging proves that one or two instances can talk to Postgres. It does not prove the fleet has a sane contract once Cloud Run behaves like production.
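The fleet-side half of the contract lives in the service definition, not in application code. A sketch of where those caps would sit in a Cloud Run service manifest, with illustrative values; the field names (`autoscaling.knative.dev/maxScale`, `containerConcurrency`, `timeoutSeconds`) are standard Cloud Run YAML, the service name is made up:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: api
spec:
  template:
    metadata:
      annotations:
        # Hard cap on fleet width; the other half of the budget.
        autoscaling.knative.dev/maxScale: "8"
    spec:
      containerConcurrency: 16   # HTTP work one instance accepts
      timeoutSeconds: 30         # request timeout, not DB backpressure
```

If these values are not reviewed alongside the pool settings, the budget only exists on one side of the multiplication.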
Where the story usually goes soft
The first lie is that scaling from zero is protection. It is not. Scaling from zero changes the idle story, not the wake-up story. A service can be cheap at rest and still hit the database brutally when traffic returns.
The second lie is that pool size is self-enforcing. It is only per-instance self-enforcing. A pool of four feels disciplined until twelve instances all wake up and make the same claim. If nobody ever multiplies the pool across the fleet, the team is not reasoning about database pressure. It is reasoning about one container and hoping production stays polite.
The third lie is that request timeout is backpressure. It is not. A timed-out request may already have acquired a session, started a transaction, or kicked off work that keeps running after the caller has gone away. Request timeout behavior keeps showing up in the same postmortems as connection exhaustion. A timeout is a transport boundary, not a guarantee that the database work stopped.
The fourth lie is that a pooler makes budgeting optional. A pooler can help when the real problem is churn and burst fan-out. It changes how sessions are shared. It does not turn unbounded scale into bounded scale. If the service has no connection contract, a pooler just moves the ambiguity somewhere else.
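Even when a pooler is the right tool, it carries its own version of the same contract. A sketch with illustrative values; the keys are standard `pgbouncer.ini` settings, the numbers are not a recommendation:

```ini
; pgbouncer.ini -- the pooler has a budget too, it does not remove yours
[pgbouncer]
pool_mode = transaction      ; server sessions shared per transaction
max_client_conn = 400        ; ceiling on app-side client connections
default_pool_size = 20       ; server sessions per database/user pair
```

Notice that every line is still a written-down limit. A pooler deployed without these numbers being derived from the budget just relocates the ambiguity.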
The fifth lie is usually the most expensive one. Operators assume the database is the scarce thing because the alert came from Postgres. Often the real problem is higher up. The service may be holding transactions across downstream HTTP calls, retrying too aggressively, opening one pool per worker process, or doing work in the request path that should have crossed an async boundary long ago. Postgres sees the pressure, but the application is often where the dishonesty started.
Bounded pressure is the real operating property
Once the budget exists, the next question is what respecting it looks like. It takes more than “use smaller numbers.” Overload needs to show up somewhere teams can actually understand.
Bounded pressure means a service can only increase database demand in a narrow, predictable way. Waiting for a connection is brief and visible. Requests fail honestly when the pool is exhausted instead of pretending to make progress for thirty seconds while the database gets slower and more opaque. The service queues or rejects before the whole estate turns into a fleet of polite liars all waiting on the same finite backend.
In practice, that usually means modest per-instance pools, explicit max instance caps, acquisition timeouts that fail fast enough to matter, and async boundaries when the request path has clearly become the wrong home for the work.
```
DB_POOL_MAX=4
DB_POOL_MIN=0
DB_POOL_ACQUIRE_TIMEOUT_MS=1000
DB_STATEMENT_TIMEOUT_MS=5000
```

Those settings are not magic. They are posture. The app is saying it will not wait forever to claim a scarce resource, and it will not behave as if a busy database is an invitation to accumulate more indefinite work.
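That posture can be seen in miniature without a real driver. A stdlib-only sketch of bounded, fail-fast acquisition; `BoundedPool` and `PoolExhausted` are illustrative names, not a library API, and the numbers mirror the env vars above:

```python
import threading

class PoolExhausted(Exception):
    """Raised instead of queueing indefinitely for a session."""

class BoundedPool:
    # Toy pool: a DB_POOL_MAX-style cap plus a short acquire timeout.
    def __init__(self, max_size: int, acquire_timeout_s: float):
        self._slots = threading.Semaphore(max_size)
        self._timeout = acquire_timeout_s

    def acquire(self):
        # Wait briefly and visibly; reject honestly when exhausted.
        if not self._slots.acquire(timeout=self._timeout):
            raise PoolExhausted("no session within budget; shed this request")
        return object()  # stand-in for a real connection

    def release(self, conn) -> None:
        self._slots.release()

pool = BoundedPool(max_size=4, acquire_timeout_s=1.0)
held = [pool.acquire() for _ in range(4)]   # budget fully claimed
try:
    pool.acquire()                          # fifth claim fails fast
except PoolExhausted as exc:
    print("rejected:", exc)
for conn in held:
    pool.release(conn)
```

The failure is loud, fast, and local to the service that overspent, which is exactly where it should surface.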
What we optimize for is predictable degradation. That sounds dull until you have watched a Cloud Run service scale itself into a Postgres incident. Then it starts sounding like mercy. Predictable degradation means the system slows or rejects in ways teams can explain. It means the pool goes empty before Postgres turns into a pile of blocked backends. It means someone can look at the scale cap and the pool size and know roughly how bad the blast radius can get before opening the first query log.
When the budget says the current shape is wrong
Sometimes the exercise shows the problem is not tuning. The service may simply have the wrong runtime shape. If the work needs to outlive the caller, or can be buffered and processed asynchronously, then continuing to negotiate scale and pool settings is often the wrong move. The honest fix is to move the work behind a task, queue, or job boundary and stop pretending the request path owns durable execution.
Sometimes the budget says the service is too wide. Then max instances need to come down. Sometimes it says each instance is too greedy. Then pool size or concurrency needs to come down. Sometimes it says the API and the worker should not share the same database rights or the same headroom assumptions. Then they need separate budgets.
And sometimes it shows the database product is not the first thing to change. Teams often reach for a bigger managed database as soon as pressure becomes visible. If the current system still has no credible connection contract, changing products is often just a more expensive way to avoid the same design problem. The choice between Cloud SQL and AlloyDB only gets interesting after the app has admitted the database is finite and designed accordingly.
Cloud Run and Postgres work well together when the service treats connections as a scarce runtime resource instead of as plumbing it can multiply without consequence. Write the budget down. Separate request concurrency, worker count, pool size, and backend pressure instead of collapsing them into one comforting number. Put queueing, rejection, or async handoff in front of the database rather than after it. If the only thing standing between a traffic spike and a connection incident is hope that the app will stay polite, there is no budget yet. There is just deferred pain.