
Safe scaling defaults for Cloud Run + Postgres

Cloud Run autoscaling is not a database strategy. Safe defaults keep the application from scaling itself into a Postgres incident before the team understands the workload.

By Ivan Richter

Last updated: Apr 4, 2026



Cloud Run makes it very easy to pretend elasticity solved a database problem. It didn’t. It solved the part where the app tier can create more workers quickly. Postgres is still finite, still stateful, and still perfectly capable of becoming the queue for a service that scales more confidently than it should. Scaling defaults matter. They are not performance cosmetics. They are the line between “traffic rose and the service degraded in a readable way” and “traffic rose and the database got dogpiled by software doing exactly what it was told.”

This page sits below the connection budget for a reason. The budget explains the fleet-wide math; this page is about the concrete settings that decide whether the service respects that math or crashes straight through it. Max instances, concurrency, request timeout, queueing behavior, and per-instance pool size are not independent tuning choices. They are one contract. If one part says “scale freely” and another part says “claim database sessions aggressively,” the contract is already broken.

The safe default is usually a little conservative on purpose. That is not fear. It is just refusing to let the runtime behave heroically before the workload has earned that kind of trust.

The settings only matter as a group

Operators often tune these settings one at a time as if each knob owns a different problem. That is how they end up with a service that looks reasonable in pieces and reckless in aggregate.

Concurrency decides how much request work one instance can absorb before Cloud Run widens the fleet. Max instances decide how wide that fleet is allowed to become. Request timeout decides how long request-shaped work is allowed to occupy the runtime. Pool size decides how much of the database one instance may claim. Acquire timeout decides whether the app admits overload quickly or disguises it as patient waiting.

knob                    what it changes
container concurrency   per-instance request fan-in
max instances           fleet-wide expansion ceiling
request timeout         lifetime of request-shaped work
pool max                per-instance DB claim
acquire timeout         whether overload waits or fails

The interaction between those settings matters more than any single value. Low concurrency does not protect the database if max scale is loose and each instance can still open a greedy pool. A modest max scale does not help much if one instance is allowed to hold too many sessions. Long request timeouts are not innocent if those requests also hold database connections while work drifts through the process. Safe defaults have to be chosen as a posture, not as a pile of separate optimizations.
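The multiplication behind that posture can be written down. This is an illustrative helper, not anything Cloud Run provides: the worst-case fan-in and the worst-case Postgres claim are just products of the knobs above.

```python
def worst_case(max_instances: int, concurrency: int, pool_max: int) -> dict:
    """Upper bounds implied by a scaling profile, before any traffic arrives."""
    return {
        # Most requests one deploy of this service can hold in flight at once.
        "in_flight_requests": max_instances * concurrency,
        # Most Postgres sessions the fleet can claim at full width.
        "db_sessions": max_instances * pool_max,
    }

# Low concurrency alone is no protection: a loose max scale and a greedy
# pool still multiply into a large claim on the database.
print(worst_case(max_instances=5, concurrency=10, pool_max=4))
# → {'in_flight_requests': 50, 'db_sessions': 20}
```

The point of writing it as one function is the contract framing: change any single knob and both bounds move, which is why the settings have to be reviewed together.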

The default should be boring

For a typical internal API backed by Postgres, we usually start with a profile that assumes the database is worth protecting before the app has proven it can be trusted with more freedom.

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: api
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/maxScale: '5'
    spec:
      containerConcurrency: 10
      timeoutSeconds: 30

# App-side pool settings, delivered as environment variables:
DB_POOL_MAX=4
DB_POOL_MIN=0
DB_POOL_ACQUIRE_TIMEOUT_MS=1000
DB_STATEMENT_TIMEOUT_MS=5000

Those numbers are not sacred. The posture is. The service is saying a few simple things at once. It will not widen without a ceiling. One instance will not claim much of the database. Request-shaped work should finish or fail within a bounded window. Waiting for a session should become visible quickly instead of turning into a soft backlog buried inside the app and the database.
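The env fragment above still has to land in whatever pool library the app uses, and node-postgres, SQLAlchemy, and psycopg_pool all spell these settings differently. A hypothetical Python-flavored sketch of the translation, with variable names taken from the fragment above and everything else illustrative:

```python
import os

def pool_settings(env=os.environ) -> dict:
    """Translate the scaling profile's env vars into the settings most
    pool libraries accept under some spelling: max/min size, how long a
    request may wait for a session, and a per-statement budget."""
    return {
        "pool_max": int(env.get("DB_POOL_MAX", "4")),
        "pool_min": int(env.get("DB_POOL_MIN", "0")),
        # Acquire timeout in seconds: how long overload stays invisible.
        "acquire_timeout_s": int(env.get("DB_POOL_ACQUIRE_TIMEOUT_MS", "1000")) / 1000,
        # Applied server-side as Postgres statement_timeout.
        "statement_timeout_ms": int(env.get("DB_STATEMENT_TIMEOUT_MS", "5000")),
    }

print(pool_settings({}))  # falls back to the profile's defaults
```

Keeping the defaults in one translation function means the conservative posture survives even when a deploy forgets to set the env vars.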

The profile is deliberately less generous than many framework defaults or benchmark-minded examples. Good. The job is not to look ambitious on paper. The job is to survive an ordinary traffic spike without teaching Postgres a lesson in human optimism.

A weak default usually looks like generous everything. High concurrency, wide max scale, long request timeout, and a pool size that felt harmless in one container on a quiet afternoon.

# bad shape
maxScale: '50'
containerConcurrency: 80
timeoutSeconds: 300
# app pool max 15

The profile does not need huge traffic to get ugly. It only needs one traffic shape nobody rehearsed, one rollout that widens too quickly, or one request path that holds sessions longer than anyone admitted.
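The bad shape fails on arithmetic alone. Stock Postgres defaults to max_connections = 100; managed tiers vary, but the ceiling is always finite, and the generous profile blows through it before traffic gets interesting.

```python
# Assumption: a stock Postgres max_connections of 100. Check your tier;
# the exact number changes, the conclusion usually does not.
max_connections = 100

bad_fleet_sessions = 50 * 15   # maxScale x app pool max from the bad shape
print(bad_fleet_sessions)      # 750 potential sessions
print(bad_fleet_sessions > max_connections)  # True: the contract is broken
```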

Concurrency, max scale, and pool size solve different problems

These three settings often get lumped together under the word “capacity,” which is how the wrong knob keeps getting blamed.

Lower concurrency when one instance is doing too much database-heavy work at once. If each request tends to borrow a connection and hold it through meaningful work, high concurrency lets one container turn into a small attack on Postgres all by itself.

Lower max instances when one instance is individually reasonable but the fleet as a whole can still make a reckless claim. This is the usual Cloud Run multiplication problem. The per-instance shape looks fine. The service-wide shape is where the math goes bad.

Lower pool size when the instance itself has become too entitled. If the service expects ten or fifteen sessions per instance for an ordinary workload, it is usually asking for more database than it has earned.

symptom                                 first lever to examine
one instance overwhelms DB              lower concurrency or pool size
fleet expands into DB pain              lower max instances
requests wait forever for a session     shorten acquire timeout
slow queries hold sessions too long     fix query/transaction shape first

There is no magic number that fixes “Cloud Run + Postgres.” The right move depends on whether the service is too deep per instance, too wide as a fleet, or too dishonest about how long it is willing to wait.

Queueing is often healthier than more scale

Queueing gets talked about like it means the platform failed. For a Postgres-backed service, a short and visible queue is often a sign that the system still understands its limits.

Cloud Run already has a queueing story while instances are being brought online. That does not settle the design question. It just means the platform accepts that some waiting is normal. The real question is where that waiting should happen: in the app layer, where it is visible and bounded, or in the database, where it turns into blocked sessions, slow checkout, and a much uglier incident.

Most of the time, for a finite database, bounded queueing is healthier than unconstrained widening. A small wait near the caller is easier to reason about than a new wave of instances all arriving at the database at once. This is the less glamorous side of scaling from zero. The feature is great right up until the wake-up pattern becomes the incident pattern.

A bad design lets the database become the queue. Requests stack on checkout, then on blocked queries, then on slow transactions, all while the service keeps widening because the runtime still thinks more workers must be the answer. The queue exists either way. One version keeps it visible. The other buries it in the least pleasant part of the system.

The healthier pattern is brief waiting, then an honest failure or a handoff to an async path. That keeps pressure legible instead of letting the database absorb it until every dependent path starts feeling sick.
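One way to keep the wait near the caller is a bounded gate in front of the pool. A minimal sketch, assuming the profile's pool of 4 and 1 s acquire timeout; `BoundedGate` is illustrative, not a library API:

```python
import threading

class BoundedGate:
    """Sketch: a short, visible wait for a DB session near the caller.
    If no session frees up within the acquire timeout, fail honestly
    instead of letting the wait sink into Postgres."""

    def __init__(self, pool_max: int, acquire_timeout_s: float):
        self._slots = threading.BoundedSemaphore(pool_max)
        self._timeout = acquire_timeout_s

    def run(self, work):
        # Brief, bounded wait; overload becomes a fast, legible error
        # the caller can turn into a 503 or an async handoff.
        if not self._slots.acquire(timeout=self._timeout):
            raise TimeoutError("no DB session within budget; shed or go async")
        try:
            return work()
        finally:
            self._slots.release()

gate = BoundedGate(pool_max=4, acquire_timeout_s=1.0)
print(gate.run(lambda: "ok"))  # → ok
```

The queue exists either way; this version just keeps it in the layer where it can be measured and capped.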

Request timeout is part of the database contract

Request timeout gets discussed like a user-experience setting. In a Postgres-backed service it also decides how long request-shaped work is allowed to hold the runtime open, and sometimes how long that work is allowed to hold a database session while it does so.

Request timeout keeps showing up in the same conversations as “too many connections” incidents and Cloud Run’s own request deadline. Long timeouts are not automatically bad, but they should force a harder question: is the request path still the right home for the work? If a request can keep the process alive for a long time while also keeping a session busy, then the timeout is no longer just about the caller’s patience. It is part of the service’s pressure budget.

Shorter timeouts do not fix a bad contract on their own. They do make it harder for the service to pretend the request boundary was honest when it wasn’t. That is often a useful kind of pressure.
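One concrete way to make the request timeout part of the database contract is to derive each statement's budget from the request's remaining deadline, so a query can never outlive the request that asked for it. A sketch, assuming the result is then applied as Postgres statement_timeout; `statement_budget_ms` is a hypothetical helper:

```python
import time

def statement_budget_ms(request_deadline: float,
                        floor_ms: int = 50, cap_ms: int = 5000) -> int:
    """Bound DB work by the remaining request budget, clamped between a
    small floor and the service's per-statement cap."""
    remaining_ms = int((request_deadline - time.monotonic()) * 1000)
    return max(floor_ms, min(remaining_ms, cap_ms))

# A handler under a 30 s Cloud Run timeout records its deadline at entry...
deadline = time.monotonic() + 30.0
# ...and each query gets at most the capped per-statement budget.
print(statement_budget_ms(deadline))  # → 5000 (capped, plenty of time left)
```

The floor matters: when the request is nearly out of budget, a tiny statement timeout fails the query quickly instead of letting it run past the caller's patience.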

Different service shapes deserve different defaults

Not every Postgres-backed Cloud Run service should get the same profile. An internal admin API, a user-facing API, and a worker-style endpoint can all use the same database and deserve different scaling behavior.

An admin surface that gets occasional manual traffic usually does not need much width. Low max scale, modest timeout, and a small pool are normally enough because throughput is not the goal. A user-facing API may justify somewhat higher concurrency, but it still should not be allowed to claim much of the database per instance unless the workload has proved that most requests are light. A worker endpoint is often the most dangerous place to be casual because each request tends to do more real work and hold onto resources longer, which usually means lower concurrency and tighter scale caps.

service shape          default posture
admin tool             low maxScale, low pool, modest timeout
public-ish API         moderate concurrency, modest scale cap, small pool
worker endpoint        low concurrency, explicit scale cap, tiny pool per instance
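The postures in the table can be pinned down as numbers. These are illustrative starting values in the spirit of this page, not anything Cloud Run or Postgres prescribes, and every workload should adjust them with evidence:

```python
# Assumed starting profiles per service shape; all values illustrative.
PROFILES = {
    "admin_tool":      {"max_scale": 2, "concurrency": 5,  "timeout_s": 30, "pool_max": 2},
    "public_api":      {"max_scale": 5, "concurrency": 10, "timeout_s": 30, "pool_max": 4},
    "worker_endpoint": {"max_scale": 3, "concurrency": 1,  "timeout_s": 60, "pool_max": 2},
}

# The review question for each shape is the same multiplication as always:
# how many Postgres sessions can this profile claim at full width?
for name, p in PROFILES.items():
    print(name, "worst-case sessions:", p["max_scale"] * p["pool_max"])
```

Writing the profiles down as data also makes the review cheap: a deploy check can reject any profile whose worst-case session claim exceeds the service's share of the connection budget.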

Connection method can influence the margins here. The behavior of connectors, the Auth Proxy, and private IP can affect startup and reconnect shape. That is worth accounting for. It does not remove the need for a basic scale contract.

The same few mistakes keep showing up

The first mistake is treating autoscaling as if it were a replacement for database capacity planning. It isn’t. Cloud Run can widen quickly. Postgres still has a ceiling.

The second mistake is tuning for hypothetical throughput before there is real evidence about the request mix, query mix, and pressure shape. That mindset produces profiles that look aggressive in a review and collapse the first time traffic behaves badly.

The third mistake is combining retries, long request timeouts, generous pool waits, and permissive scaling into one soft failure mode. The app looks busy because it keeps trying. The database looks dead because it is carrying too many waiting or stalled sessions. The healthier system would have failed faster and closer to the caller.

The fourth mistake is forgetting that the safest scaling change may be to stop doing the work synchronously. If the request is really a job, more scale knobs do not make it a better request.

Sometimes the right tuning is an async boundary

Some services should stop trying to be synchronous before they spend one more hour debating concurrency. If the work can be acknowledged, queued, and handled later, then the safest scaling profile may be to stop making the request own the full unit of work.

Moving the work behind an async boundary is often the first honest design move. A request that no longer owns the whole job no longer needs a long runtime, a wide fleet, or a persistent claim on Postgres while downstream steps unfold. When the work has clearly outgrown a request-shaped boundary, giving the request more room to misbehave is usually the wrong response.
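A sketch of that boundary, using an in-process bounded queue purely for illustration; a real deployment would hand off to something like Cloud Tasks or Pub/Sub so the backlog survives instance churn. All names here are hypothetical:

```python
import queue

# Bounded on purpose: a full queue is an honest 429, not a widening fleet.
jobs: "queue.Queue[dict]" = queue.Queue(maxsize=100)

def handle_request(payload: dict) -> tuple[int, dict]:
    """The request acknowledges and hands off; it never touches Postgres
    and never needs a long timeout or a wide scale cap."""
    try:
        jobs.put_nowait(payload)
    except queue.Full:
        return 429, {"error": "backlogged, retry later"}
    return 202, {"status": "accepted"}

def worker() -> None:
    # A separate consumer owns the DB session and drains the backlog at
    # a pace Postgres can absorb; here it just marks jobs done.
    while True:
        job = jobs.get()
        jobs.task_done()

print(handle_request({"order": 1}))  # → (202, {'status': 'accepted'})
```

Once the request only acknowledges, the scaling knobs for the request path can stay conservative while the worker's pace, not fleet width, controls pressure on the database.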

Looser settings have to be earned

We loosen defaults with evidence, not hope. That means production-like traffic or a disciplined load test that actually reflects the request mix, query mix, rollout shape, and burst pattern the service will face. We want to know whether most requests are truly light on the database, whether acquire waits stay boring during bursts, and whether fleet growth remains inside the intended budget during deploys as well as steady traffic.

If the evidence shows that one instance does little database work and rarely holds sessions for long, concurrency can go up. If the evidence shows that the fleet still respects the connection budget under burst, max scale can widen. What we do not do is loosen the profile because one test felt fast or because a framework default sounded professionally optimistic.

What we optimize for

The goal is boring overload behavior. Small claims per instance. Clear limits on widening. Short waits. Visible pressure. Settings that can be explained from first principles instead of inherited from a benchmark or a default nobody remembers choosing.

The posture is not anti-scale. It is anti-surprise. We loosen the profile after the workload proves it is lighter, more tolerant, or less database-bound than expected. Until then, it is better to make the service slightly conservative than to let it discover the database boundary at production speed.

Safe scaling defaults for Cloud Run + Postgres exist to stop the app from scaling itself into a database incident. Start with bounded growth, modest concurrency, short waits, and small per-instance claims on Postgres. Prefer a short queue or an honest failure over a wider fleet rushing the same finite database. Then loosen the defaults only after the workload has actually earned that freedom.
