
Cloud Run request timeouts don't kill your code (so your architecture has to)

A Cloud Run request timeout ends the request, not necessarily the work. If the operation can outlive its caller, the system needs explicit job semantics instead of hope.

By Ivan Richter

Last updated: Mar 25, 2026

6 min read


The trap

A Cloud Run request timeout isn’t a safe stop signal.

When a request exceeds the configured timeout, Cloud Run closes the connection and returns a 504. That’s the part the caller sees. What matters more is what the platform doesn’t promise. The container instance that handled the request isn’t guaranteed to be terminated just because the request timed out. The code may keep running after the caller has already been told it failed.

That’s where systems get themselves into trouble. If the architecture treats “request timed out” as if it means “work stopped,” it’s working from a false boundary.
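The gap is easy to see outside Cloud Run. Here is a minimal Python sketch (not Cloud Run itself, just a simulation of the same race): a "caller" stops waiting after a short timeout, while the "handler" keeps running and lands its side effect anyway.

```python
import threading
import time

def demo_timeout_vs_work():
    """Simulate a caller that gives up before the handler finishes.

    The handler keeps running after the caller has already seen a
    timeout -- the same gap a Cloud Run 504 leaves open.
    """
    result = {"done": False}

    def handler():
        time.sleep(0.3)          # slow work, e.g. a multi-system update
        result["done"] = True    # side effect lands after the "504"

    worker = threading.Thread(target=handler)
    worker.start()
    worker.join(timeout=0.1)     # the caller's patience runs out here

    timed_out = worker.is_alive()   # caller saw a failure...
    worker.join()                   # ...but the work continued anyway
    return timed_out, result["done"]

timed_out, work_finished = demo_timeout_vs_work()
# Both end up True: the request failed from the caller's view,
# yet the side effect still happened.
```

The caller and the handler each tell a true story; they just stop being the same story the moment the timeout fires.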

The request ended. The work may not have.

Timeouts describe how long the platform will wait for a response. They don’t guarantee the application rolled back cleanly, noticed the disconnect, or abandoned the operation halfway through.

A request timeout also isn’t the same thing as instance shutdown. Cloud Run can send SIGTERM, and eventually SIGKILL, when it’s actually shutting an instance down. That’s a separate lifecycle event. A timed-out request only tells you the caller stopped waiting.

That matters any time the request does something with real side effects. Charging a card. Sending an email. Publishing a message. Updating multiple systems. Writing partial state. Kicking off follow-up work that the caller will now retry because it thinks nothing happened.

The question isn’t whether the handler returned a 504. The question is whether the system can tolerate the code continuing after that. If the answer is no, the request was never a safe owner for that work.

A bigger timeout doesn’t fix ownership

Cloud Run lets you increase the request timeout, and sometimes that’s the right operational move. But it doesn’t solve the design problem.

A longer timeout only means the platform is willing to wait longer before giving up on the response. It doesn’t turn the request into a durable execution contract. The client can still disconnect. A retry can still arrive. A network path can still break. A human can still hit refresh and send the same intent again.

So timeout tuning can reduce pressure. It can’t be the thing that makes fragile work safe. If the operation breaks the moment the caller and the worker stop sharing the same timeline, the problem isn’t the timeout value. The problem is ownership.

Request-shaped work and work that only started from a request

Some work genuinely fits the request lifecycle.

Validate input. Read data. Apply a small mutation. Return a response. The request comes in, the service does the thing, and the result goes back to the caller. Clean enough.
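As a sketch of that shape, here is a hypothetical handler (the in-memory `store` dict stands in for a database) where every step completes inside the call, so the request lifecycle really does own the work:

```python
def handle_rename(store: dict, user_id: str, new_name: str) -> dict:
    """A request-shaped operation: validate, read, small mutation, respond.

    Everything finishes before the response goes back, so a timeout or
    disconnect leaves nothing in flight.
    """
    if not new_name.strip():                       # validate input
        return {"status": 400, "error": "empty name"}
    user = store.get(user_id)                      # read data
    if user is None:
        return {"status": 404, "error": "no such user"}
    user["name"] = new_name                        # apply a small mutation
    return {"status": 200, "user": user}           # return a response
```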

Other work only happens to enter the system through HTTP. The request is just the front door. The real workload is queue work, batch work, state-machine work, or longer processing that needs to keep going after the caller is gone.

Once that’s true, the request is no longer a safe place to anchor correctness. It’s only the trigger. Treating it like the owner of the work is how systems end up with retries, duplicate side effects, and state that nobody can explain cleanly.

Cloud Run scaling to zero matters here too. A runtime that wakes on requests works well when the request really is the unit of work. It fits a lot worse when the real unit of work needs to survive the request that started it.

Acknowledged is not completed

One of the more expensive confusions in cloud systems is treating acceptance like completion.

The request was received. The handler started. Maybe it even wrote something before the timeout. None of that means the operation finished in a way the rest of the system can reason about safely.

That’s why long or fragile work needs explicit state. Pending. Running. Succeeded. Failed. Retrying. Cancelled. Anything less and the system ends up inferring truth from transport behavior, which is how you get duplicate execution, partial side effects, and operator folklore instead of real runtime guarantees.
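Explicit state can be as small as a transition table. A hypothetical sketch using the states above, where every move is either allowed or loudly rejected:

```python
# Legal transitions for a job, using the states named above.
ALLOWED = {
    "pending":   {"running", "cancelled"},
    "running":   {"succeeded", "failed", "cancelled"},
    "failed":    {"retrying"},
    "retrying":  {"running"},
    "succeeded": set(),   # terminal
    "cancelled": set(),   # terminal
}

class Job:
    def __init__(self):
        self.state = "pending"

    def transition(self, target: str) -> None:
        """Move to a new state, or fail fast on an illegal jump."""
        if target not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {target}")
        self.state = target
```

The point of the table isn't sophistication. It's that "what state is this job in?" now has an answer that doesn't depend on interpreting a 504.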

A 504 isn’t domain state. It’s just a failed conversation between caller and service.

Give the work a real owner

When the operation can outlive the caller, we want explicit work semantics.

Sometimes that means enqueue and acknowledge. Persist the intent, return quickly, and let another worker own the execution path. Sometimes it means task-driven processing with retries and idempotency. Sometimes it means a Cloud Run job, where the unit of work is meant to run to completion instead of pretending to be an HTTP response. Sometimes it means checkpoint-and-resume behavior with explicit state transitions so the system can recover without guessing what happened last.
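The enqueue-and-acknowledge shape can be sketched in a few lines. This is a toy, with an in-memory dict and deque standing in for a durable store and a real task queue; the names are assumptions, not an API:

```python
import uuid
from collections import deque

jobs: dict = {}        # stand-in for a durable job store
queue: deque = deque() # stand-in for a real task queue

def accept(payload: dict) -> dict:
    """Front door: persist the intent, acknowledge, return fast."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"state": "pending", "payload": payload}
    queue.append(job_id)
    return {"status": 202, "job_id": job_id}   # accepted, not completed

def work_once() -> None:
    """Separate owner: survives the request that enqueued the job."""
    job_id = queue.popleft()
    job = jobs[job_id]
    job["state"] = "running"
    job["result"] = job["payload"]["n"] * 2    # the actual work
    job["state"] = "succeeded"
```

The handler's only promise is "your intent is recorded." Completion belongs to `work_once`, which can run in a worker service, a Cloud Run job, or anywhere else that outlives the original request.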

The pattern isn’t “use queues for everything.” The pattern is simpler. Give the work an owner that survives the request.

Idempotency is the minimum, not the bonus

Once request and execution can drift apart, retries stop being optional theory and start becoming normal behavior.

The client retries because it saw a timeout. The platform retries because a task failed. An operator retries because the status is unclear. If the operation can’t tolerate being attempted more than once, the system is brittle before traffic even shows up.

That doesn’t mean every action becomes perfectly repeatable. Some work has side effects that need explicit deduplication or a stronger state machine around them. Fine. The point is still the same. Once a timeout can leave execution in flight, idempotency stops being nice to have. It becomes part of basic correctness.
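The deduplication version can be sketched with an idempotency key. Another toy, with an in-memory dict standing in for durable storage, and "charging" reduced to appending to a list:

```python
processed: dict = {}   # idempotency key -> recorded result (durable in practice)
charges: list = []     # the side effect we must not duplicate

def charge(idempotency_key: str, amount: int) -> str:
    """Run the side effect at most once per key.

    A retry with the same key returns the recorded result instead of
    charging again.
    """
    if idempotency_key in processed:
        return processed[idempotency_key]      # replay: no second charge
    charges.append(amount)                     # the real side effect
    result = f"charged {amount}"
    processed[idempotency_key] = result
    return result
```

Every retry path from the paragraph above (client, platform, operator) now converges on the same recorded outcome instead of a second charge.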

Cloud Run is still usually the right default

None of this makes Cloud Run a bad runtime.

For SME internal systems, it’s still usually the right place to start because it removes a lot of ownership burden while covering a large share of sane workload shapes. That broader case lives in Cloud Run as the default.

The boundary isn’t “never do long work on Cloud Run.” The boundary is “do not pretend a fragile request lifecycle is a durable work model.”

Cloud Run can serve the front door just fine. It just shouldn’t be asked to fake semantics the request boundary doesn’t actually provide.

When the runtime shape stops fitting

Sometimes the problem isn’t the handler. The problem is the workload.

If the system needs more Kubernetes-native control, more involved service topology, or workload patterns that no longer fit comfortably inside request-driven services and jobs, Cloud Run may stop being the clean choice. At that point, adding more patches around the request model is usually a sign the workload wants a different home.

That’s when GKE Autopilot starts to make sense. Not because Cloud Run failed, but because the workload stopped matching the runtime shape that made Cloud Run such a good default in the first place.

The point

A request timeout is a transport boundary, not an execution guarantee.

If timing out a caller can leave the system in a bad state, the answer isn’t just a bigger timeout. The answer is to give the work a safer owner than the request itself.
