
Safe scaling defaults for Cloud Run + Postgres

Cloud Run autoscaling is not a database strategy. Safe defaults keep the application from scaling itself into a Postgres incident before the team understands the workload.

By Ivan Richter

Last updated: Apr 4, 2026



Cloud Run makes it very easy to pretend elasticity solved a database problem. It didn’t. It solved the part where the app tier can create more workers quickly. Postgres is still finite, still stateful, and still perfectly capable of becoming the queue for a service that scales more confidently than it should. Scaling defaults matter. They are not performance cosmetics. They are the line between “traffic rose and the service degraded in a readable way” and “traffic rose and the database got dogpiled by software doing exactly what it was told.”

This page sits below the connection budget for a reason. The budget explains the fleet-wide math; this page is about the concrete settings that decide whether the service respects that math or crashes straight through it. Max instances, concurrency, request timeout, queueing behavior, and per-instance pool size are not independent tuning choices. They are one contract. If one part says “scale freely” and another part says “claim database sessions aggressively,” the contract is already broken.

The safe default is usually a little conservative on purpose. That is not fear. It is just refusing to let the runtime behave heroically before the workload has earned that kind of trust.

The settings only matter as a group

Operators often tune these settings one at a time as if each knob owns a different problem. That is how they end up with a service that looks reasonable in pieces and reckless in aggregate.

Concurrency decides how much request work one instance can absorb before Cloud Run widens the fleet. Max instances decide how wide that fleet is allowed to become. Request timeout decides how long request-shaped work is allowed to occupy the runtime. Pool size decides how much of the database one instance may claim. Acquire timeout decides whether the app admits overload quickly or disguises it as patient waiting.

knob                    what it changes
container concurrency   per-instance request fan-in
max instances           fleet-wide expansion ceiling
request timeout         lifetime of request-shaped work
pool max                per-instance DB claim
acquire timeout         whether overload waits or fails

The interaction between those settings matters more than any single value. Low concurrency does not protect the database if max scale is loose and each instance can still open a greedy pool. A modest max scale does not help much if one instance is allowed to hold too many sessions. Long request timeouts are not innocent if those requests also hold database connections while work drifts through the process. Safe defaults have to be chosen as a posture, not as a pile of separate optimizations.
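The multiplication behind that posture can be written down. This is an illustrative helper, not anything Cloud Run provides: the worst-case fan-in and the worst-case Postgres claim are just products of the knobs above.

```python
def worst_case(max_instances: int, concurrency: int, pool_max: int) -> dict:
    """Upper bounds implied by a scaling profile, before any traffic arrives."""
    return {
        # Most requests one deploy of this service can hold in flight at once.
        "in_flight_requests": max_instances * concurrency,
        # Most Postgres sessions the fleet can claim at full width.
        "db_sessions": max_instances * pool_max,
    }

# Low concurrency alone is no protection: a loose max scale and a greedy
# pool still multiply into a large claim on the database.
print(worst_case(max_instances=5, concurrency=10, pool_max=4))
# → {'in_flight_requests': 50, 'db_sessions': 20}
```

The point of writing it as one function is the contract framing: change any single knob and both bounds move, which is why the settings have to be reviewed together.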

The default should be boring

For a typical internal API backed by Postgres, we usually start with a profile that assumes the database is worth protecting before the app has proven it can be trusted with more freedom.

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: api
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/maxScale: '5'
    spec:
      containerConcurrency: 10
      timeoutSeconds: 30

# App-side pool settings, delivered as environment variables:
DB_POOL_MAX=4
DB_POOL_MIN=0
DB_POOL_ACQUIRE_TIMEOUT_MS=1000
DB_STATEMENT_TIMEOUT_MS=5000

Those numbers are not sacred. The posture is. The service is saying a few simple things at once. It will not widen without a ceiling. One instance will not claim much of the database. Request-shaped work should finish or fail within a bounded window. Waiting for a session should become visible quickly instead of turning into a soft backlog buried inside the app and the database.
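The env fragment above still has to land in whatever pool library the app uses, and node-postgres, SQLAlchemy, and psycopg_pool all spell these settings differently. A hypothetical Python-flavored sketch of the translation, with variable names taken from the fragment above and everything else illustrative:

```python
import os

def pool_settings(env=os.environ) -> dict:
    """Translate the scaling profile's env vars into the settings most
    pool libraries accept under some spelling: max/min size, how long a
    request may wait for a session, and a per-statement budget."""
    return {
        "pool_max": int(env.get("DB_POOL_MAX", "4")),
        "pool_min": int(env.get("DB_POOL_MIN", "0")),
        # Acquire timeout in seconds: how long overload stays invisible.
        "acquire_timeout_s": int(env.get("DB_POOL_ACQUIRE_TIMEOUT_MS", "1000")) / 1000,
        # Applied server-side as Postgres statement_timeout.
        "statement_timeout_ms": int(env.get("DB_STATEMENT_TIMEOUT_MS", "5000")),
    }

print(pool_settings({}))  # falls back to the profile's defaults
```

Keeping the defaults in one translation function means the conservative posture survives even when a deploy forgets to set the env vars.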

The profile is deliberately less generous than many framework defaults or benchmark-minded examples. Good. The job is not to look ambitious on paper. The job is to survive an ordinary traffic spike without teaching Postgres a lesson in human optimism.

A weak default usually looks like generous everything. High concurrency, wide max scale, long request timeout, and a pool size that felt harmless in one container on a quiet afternoon.

# bad shape
maxScale: '50'
containerConcurrency: 80
timeoutSeconds: 300
# app pool max 15

The profile does not need huge traffic to get ugly. It only needs one traffic shape nobody rehearsed, one rollout that widens too quickly, or one request path that holds sessions longer than anyone admitted.
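The bad shape fails on arithmetic alone. Stock Postgres defaults to max_connections = 100; managed tiers vary, but the ceiling is always finite, and the generous profile blows through it before traffic gets interesting.

```python
# Assumption: a stock Postgres max_connections of 100. Check your tier;
# the exact number changes, the conclusion usually does not.
max_connections = 100

bad_fleet_sessions = 50 * 15   # maxScale x app pool max from the bad shape
print(bad_fleet_sessions)      # 750 potential sessions
print(bad_fleet_sessions > max_connections)  # True: the contract is broken
```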

Concurrency, max scale, and pool size solve different problems

These three settings often get lumped together under the word “capacity,” which is how the wrong knob keeps getting blamed.

Lower concurrency when one instance is doing too much database-heavy work at once. If each request tends to borrow a connection and hold it through meaningful work, high concurrency lets one container turn into a small attack on Postgres all by itself.

Lower max instances when one instance is individually reasonable but the fleet as a whole can still make a reckless claim. This is the usual Cloud Run multiplication problem. The per-instance shape looks fine. The service-wide shape is where the math goes bad.

Lower pool size when the instance itself has become too entitled. If the service expects ten or fifteen sessions per instance for an ordinary workload, it is usually asking for more database than it has earned.

symptom                                 first lever to examine
one instance overwhelms DB              lower concurrency or pool size
fleet expands into DB pain              lower max instances
requests wait forever for a session     shorten acquire timeout
slow queries hold sessions too long     fix query/transaction shape first

There is no magic number that fixes “Cloud Run + Postgres.” The right move depends on whether the service is too deep per instance, too wide as a fleet, or too dishonest about how long it is willing to wait.

Queueing is often healthier than more scale

Queueing gets talked about like it means the platform failed. For a Postgres-backed service, a short and visible queue is often a sign that the system still understands its limits.

Cloud Run already has a queueing story while instances are being brought online. That does not settle the design question. It just means the platform accepts that some waiting is normal. The real question is where that waiting should happen: in the app layer, where it is visible and bounded, or in the database, where it turns into blocked sessions, slow checkout, and a much uglier incident.

Most of the time, for a finite database, bounded queueing is healthier than unconstrained widening. A small wait near the caller is easier to reason about than a new wave of instances all arriving at the database at once. This is the less glamorous side of scaling from zero. The feature is great right up until the wake-up pattern becomes the incident pattern.

A bad design lets the database become the queue. Requests stack on checkout, then on blocked queries, then on slow transactions, all while the service keeps widening because the runtime still thinks more workers must be the answer. The queue exists either way. One version keeps it visible. The other buries it in the least pleasant part of the system.

The healthier pattern is brief waiting, then an honest failure or a handoff to an async path. That keeps pressure legible instead of letting the database absorb it until every dependent path starts feeling sick.
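One way to keep the wait near the caller is a bounded gate in front of the pool. A minimal sketch, assuming the profile's pool of 4 and 1 s acquire timeout; `BoundedGate` is illustrative, not a library API:

```python
import threading

class BoundedGate:
    """Sketch: a short, visible wait for a DB session near the caller.
    If no session frees up within the acquire timeout, fail honestly
    instead of letting the wait sink into Postgres."""

    def __init__(self, pool_max: int, acquire_timeout_s: float):
        self._slots = threading.BoundedSemaphore(pool_max)
        self._timeout = acquire_timeout_s

    def run(self, work):
        # Brief, bounded wait; overload becomes a fast, legible error
        # the caller can turn into a 503 or an async handoff.
        if not self._slots.acquire(timeout=self._timeout):
            raise TimeoutError("no DB session within budget; shed or go async")
        try:
            return work()
        finally:
            self._slots.release()

gate = BoundedGate(pool_max=4, acquire_timeout_s=1.0)
print(gate.run(lambda: "ok"))  # → ok
```

The queue exists either way; this version just keeps it in the layer where it can be measured and capped.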

Request timeout is part of the database contract

Request timeout gets discussed like a user-experience setting. In a Postgres-backed service it also decides how long request-shaped work is allowed to hold the runtime open, and sometimes how long that work is allowed to hold a database session while it does so.

Request timeout keeps showing up in the same conversations as “too many connections” incidents and Cloud Run’s own request deadline. Long timeouts are not automatically bad, but they should force a harder question: is the request path still the right home for the work? If a request can keep the process alive for a long time while also keeping a session busy, then the timeout is no longer just about the caller’s patience. It is part of the service’s pressure budget.

Shorter timeouts do not fix a bad contract on their own. They do make it harder for the service to pretend the request boundary was honest when it wasn’t. That is often a useful kind of pressure.
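One concrete way to make the request timeout part of the database contract is to derive each statement's budget from the request's remaining deadline, so a query can never outlive the request that asked for it. A sketch, assuming the result is then applied as Postgres statement_timeout; `statement_budget_ms` is a hypothetical helper:

```python
import time

def statement_budget_ms(request_deadline: float,
                        floor_ms: int = 50, cap_ms: int = 5000) -> int:
    """Bound DB work by the remaining request budget, clamped between a
    small floor and the service's per-statement cap."""
    remaining_ms = int((request_deadline - time.monotonic()) * 1000)
    return max(floor_ms, min(remaining_ms, cap_ms))

# A handler under a 30 s Cloud Run timeout records its deadline at entry...
deadline = time.monotonic() + 30.0
# ...and each query gets at most the capped per-statement budget.
print(statement_budget_ms(deadline))  # → 5000 (capped, plenty of time left)
```

The floor matters: when the request is nearly out of budget, a tiny statement timeout fails the query quickly instead of letting it run past the caller's patience.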

Different service shapes deserve different defaults

Not every Postgres-backed Cloud Run service should get the same profile. An internal admin API, a user-facing API, and a worker-style endpoint can all use the same database and deserve different scaling behavior.

An admin surface that gets occasional manual traffic usually does not need much width. Low max scale, modest timeout, and a small pool are normally enough because throughput is not the goal. A user-facing API may justify somewhat higher concurrency, but it still should not be allowed to claim much of the database per instance unless the workload has proved that most requests are light. A worker endpoint is often the most dangerous place to be casual because each request tends to do more real work and hold onto resources longer, which usually means lower concurrency and tighter scale caps.

service shape          default posture
admin tool             low maxScale, low pool, modest timeout
public-ish API         moderate concurrency, modest scale cap, small pool
worker endpoint        low concurrency, explicit scale cap, tiny pool per instance
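The postures in the table can be pinned down as numbers. These are illustrative starting values in the spirit of this page, not anything Cloud Run or Postgres prescribes, and every workload should adjust them with evidence:

```python
# Assumed starting profiles per service shape; all values illustrative.
PROFILES = {
    "admin_tool":      {"max_scale": 2, "concurrency": 5,  "timeout_s": 30, "pool_max": 2},
    "public_api":      {"max_scale": 5, "concurrency": 10, "timeout_s": 30, "pool_max": 4},
    "worker_endpoint": {"max_scale": 3, "concurrency": 1,  "timeout_s": 60, "pool_max": 2},
}

# The review question for each shape is the same multiplication as always:
# how many Postgres sessions can this profile claim at full width?
for name, p in PROFILES.items():
    print(name, "worst-case sessions:", p["max_scale"] * p["pool_max"])
```

Writing the profiles down as data also makes the review cheap: a deploy check can reject any profile whose worst-case session claim exceeds the service's share of the connection budget.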

Connection method can influence the margins here. The behavior of connectors, the Auth Proxy, and private IP can affect startup and reconnect shape. That is worth accounting for. It does not remove the need for a basic scale contract.

The same few mistakes keep showing up

The first mistake is treating autoscaling as if it were a replacement for database capacity planning. It isn’t. Cloud Run can widen quickly. Postgres still has a ceiling.

The second mistake is tuning for hypothetical throughput before there is real evidence about the request mix, query mix, and pressure shape. That mindset produces profiles that look aggressive in a review and collapse the first time traffic behaves badly.

The third mistake is combining retries, long request timeouts, generous pool waits, and permissive scaling into one soft failure mode. The app looks busy because it keeps trying. The database looks dead because it is carrying too many waiting or stalled sessions. The healthier system would have failed faster and closer to the caller.

The fourth mistake is forgetting that the safest scaling change may be to stop doing the work synchronously. If the request is really a job, more scale knobs do not make it a better request.

Sometimes the right tuning is an async boundary

Some services should stop trying to be synchronous before they spend one more hour debating concurrency. If the work can be acknowledged, queued, and handled later, then the safest scaling profile may be to stop making the request own the full unit of work.

Moving the work behind an async boundary is often the first honest design move. A request that no longer owns the whole job no longer needs a long runtime, a wide fleet, or a persistent claim on Postgres while downstream steps unfold. When the work has clearly outgrown a request-shaped boundary, giving the request more room to misbehave is usually the wrong response.
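A sketch of that boundary, using an in-process bounded queue purely for illustration; a real deployment would hand off to something like Cloud Tasks or Pub/Sub so the backlog survives instance churn. All names here are hypothetical:

```python
import queue

# Bounded on purpose: a full queue is an honest 429, not a widening fleet.
jobs: "queue.Queue[dict]" = queue.Queue(maxsize=100)

def handle_request(payload: dict) -> tuple[int, dict]:
    """The request acknowledges and hands off; it never touches Postgres
    and never needs a long timeout or a wide scale cap."""
    try:
        jobs.put_nowait(payload)
    except queue.Full:
        return 429, {"error": "backlogged, retry later"}
    return 202, {"status": "accepted"}

def worker() -> None:
    # A separate consumer owns the DB session and drains the backlog at
    # a pace Postgres can absorb; here it just marks jobs done.
    while True:
        job = jobs.get()
        jobs.task_done()

print(handle_request({"order": 1}))  # → (202, {'status': 'accepted'})
```

Once the request only acknowledges, the scaling knobs for the request path can stay conservative while the worker's pace, not fleet width, controls pressure on the database.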

Looser settings have to be earned

We loosen defaults with evidence, not hope. That means production-like traffic or a disciplined load test that actually reflects the request mix, query mix, rollout shape, and burst pattern the service will face. We want to know whether most requests are truly light on the database, whether acquire waits stay boring during bursts, and whether fleet growth remains inside the intended budget during deploys as well as steady traffic.

If the evidence shows that one instance does little database work and rarely holds sessions for long, concurrency can go up. If the evidence shows that the fleet still respects the connection budget under burst, max scale can widen. What we do not do is loosen the profile because one test felt fast or because a framework default sounded professionally optimistic.

What we optimize for

The goal is boring overload behavior. Small claims per instance. Clear limits on widening. Short waits. Visible pressure. Settings that can be explained from first principles instead of inherited from a benchmark or a default nobody remembers choosing.

The posture is not anti-scale. It is anti-surprise. We loosen the profile after the workload proves it is lighter, more tolerant, or less database-bound than expected. Until then, it is better to make the service slightly conservative than to let it discover the database boundary at production speed.

Safe scaling defaults for Cloud Run + Postgres exist to stop the app from scaling itself into a database incident. Start with bounded growth, modest concurrency, short waits, and small per-instance claims on Postgres. Prefer a short queue or an honest failure over a wider fleet rushing the same finite database. Then loosen the defaults only after the workload has actually earned that freedom.
