Why declarative data models scale better than script-driven pipelines
Declarative modeling scales better because it keeps business shape, dependencies, and reviewable intent visible as the platform and team both grow.
The default
Once the warehouse starts behaving like a shared software system, we prefer declarative data models over script-driven pipelines.
This isn’t because scripts are forbidden or because every data stack needs to become a belief system. It’s because shared transformation logic ages badly when the important behavior is spread across procedural steps, runtime flags, and helper code instead of living in named models with visible contracts.
A quick script can be fine when the work is local, temporary, and not carrying much meaning. That’s not where the trouble starts. The trouble starts when the same logic becomes part of the platform and still doesn’t have a proper home.
Intent stays visible
The biggest gain isn’t elegance. It’s visibility.
A declarative model tells the reader what the table is, what it depends on, and how the transformation is shaped. A script-driven pipeline often tells the reader only what sequence of steps happened to run. That’s a very different kind of information. One helps you understand the model. The other helps you replay the process and hope the meaning falls out.
That’s why declarative transformations and reviewability fit together so naturally. The same structure that makes models easier to review also makes them easier to trust.
When intent stays visible, change gets cheaper. A reviewer can look at a model and understand what it does without reconstructing a miniature runtime from scattered clues.
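As a sketch of what that visibility looks like, here is a Dataform-style SQLX model (model and column names hypothetical) that carries its identity, dependencies, and shape in one readable file:

```sqlx
-- definitions/orders_daily.sqlx (hypothetical model)
config {
  type: "table",
  description: "One row per order date per customer segment.",
  columns: {
    order_date: "Calendar date the order was placed",
    segment: "Customer segment at the time of the order"
  }
}

select
  o.order_date,
  c.segment,
  count(*) as order_count,
  sum(o.amount) as revenue
-- Dependencies are declared inline; the DAG falls out of ref() calls
from ${ref("stg_orders")} o
join ${ref("stg_customers")} c using (customer_id)
group by 1, 2
```

A reviewer can see what the table is and what it depends on without replaying a single execution path.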
Boundaries hold up better
Declarative systems don’t remove the need for boundaries, but they do make those boundaries easier to keep intact.
Model semantics can stay in the model. Helper logic can stay in code when it truly belongs there. Workflow tools can stay focused on coordination instead of quietly becoming the place where business logic ends up living by accident.
That’s the practical value behind layer boundaries. Once those responsibilities get blurred, the platform becomes readable mostly to people who already know its history. Everyone else is left following control flow and trying to infer meaning from side effects.
That is not scale. That’s institutionalized tribal memory.
Script-driven pipelines leak meaning into the wrong places
A pipeline made of scripts often starts out looking flexible. Later it starts collecting meaning in the wrong layers.
A workflow branch decides which cleanup path is real. A helper quietly carries a business rule that should’ve lived in the model. A runtime flag changes what gets materialized. Now the repo still runs, but understanding behavior means tracing execution instead of reading a data model.
That’s why orchestration boundaries matter so much. Workflow glue is useful. It’s just a terrible long-term home for the semantics of the platform.
Once meaning leaks into workflow and helper layers, review gets slower, incidents get murkier, and small changes start needing local folklore to feel safe.
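One hedged sketch of the fix, again assuming a Dataform-style layer and hypothetical names: instead of a scheduler flag deciding which rows get rebuilt, the incremental rule lives in the model itself:

```sqlx
-- definitions/events_clean.sqlx (hypothetical model)
config {
  type: "incremental",
  uniqueKey: ["event_id"]
}

select
  event_id,
  user_id,
  event_ts,
  lower(event_name) as event_name
from ${ref("raw_events")}

-- The change-detection rule is part of the model, not a runtime
-- flag passed in by the orchestrator.
${when(incremental(), `where event_ts > (select max(event_ts) from ${self()})`)}
```

The orchestrator still decides when this runs. It no longer decides what the model means.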
Shared change is the real pressure
The real scaling problem isn’t row count. It’s shared change.
A system can be technically fast and still be structurally weak if every meaningful change requires someone to explain how the moving parts fit together. That’s where declarative models win. They keep more of the important behavior attached to named objects with visible dependencies, instead of scattering it across code paths that only feel obvious to the person who wrote them.
That matters more as the team grows, but it also matters the moment a second person has to review a change without a guided tour. If the model can’t explain itself well enough to survive that handoff, it isn’t really scaling. It’s just accumulating surface area.
This isn’t unique to data
The same tradeoff shows up in IaC work too.
Once a system starts carrying shared logic, branching behavior, and real abstraction pressure, constrained formats and ad hoc glue create drag. That’s one reason Pulumi over Terraform is often the cleaner choice for us.
And even inside a more expressive system, abstraction only helps when it keeps the behavior readable. That’s the same test behind earned abstraction in Pulumi. Different layer, same standard. If the structure hides the real logic, it isn’t helping.
What this changes in practice
We want named models instead of loose script sequences. Explicit dependencies instead of implied ordering. Incremental behavior attached to the model instead of smuggled through scheduler inputs. Assertions close to the transformation they protect. Helper code only where it genuinely makes the model easier to understand.
None of that guarantees a perfect platform. It just gives the system a better chance of staying legible once it starts growing under shared ownership.
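In a Dataform-style setup (names hypothetical), assertions sit right next to the transformation they protect, so a failure points at a named model rather than a step number in a script:

```sqlx
-- definitions/payments.sqlx (hypothetical model)
config {
  type: "table",
  assertions: {
    uniqueKey: ["payment_id"],
    nonNull: ["payment_id", "order_id"],
    rowConditions: ["amount >= 0"]
  }
}

select
  payment_id,
  order_id,
  amount,
  paid_at
from ${ref("stg_payments")}
where status = 'settled'
```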
The decision rule
We choose declarative data models when the transformation needs to stay legible under shared change.
If a quick script is truly local and disposable, fine. If the logic is turning into platform behavior, we want named models, explicit dependencies, and boundaries that keep the meaning visible.
More in this domain: Data
BigQuery cost guardrails that won't break your teams
BigQuery cost control works when guardrails are designed around workload shape and blast radius, not around shaming whoever happened to run the last expensive query.
Constraints without enforcement: still worth it?
Non-enforced constraints are useful when they tell the truth. They act as semantic contracts and optimizer hints, but they become actively dangerous the moment the warehouse is asked to trust a lie.
On-demand vs slots: the SME decision boundary
For SMEs, the question is not which BigQuery pricing model is more sophisticated. The question is when workload classes have become distinct enough to deserve different compute lanes.
Partitioning defaults for event tables that don't lie
Partitioning is not just a performance tweak. It is one of the cheapest ways to control scan blast radius, but only if the partition contract matches how the table is actually queried.
Physical vs logical storage: a dataset classification rule for SMEs
Physical versus logical storage billing is not a warehouse philosophy debate. It is a dataset classification choice based on change rate, retention behavior, and how much storage churn the table creates.
Related patterns
Why we model around decision boundaries, not source cleanup
We shape analytical models around the business decision or entity they need to represent, not around the temporary cleanup steps needed to tame source data on the way in.
Dataform vs. script piles: how we keep transformations reviewable
We prefer a declarative transformation layer over ad hoc script piles once warehouse logic becomes shared, incremental, and worth reviewing as a system.
Incremental models are only safe when change detection is explicit
Incremental models are trustworthy only when they can deliberately identify which records need another pass after late or changed upstream data shows up.
How we decide whether a transformation belongs in SQLX, code, or orchestration
We keep transformations in SQLX by default, move to code when the logic truly stops being legible in SQL, and keep orchestration for sequencing rather than business meaning.