Data & System Architecture, from the ground up Lesson 52 / 80

CD for data: deployment patterns for batch and streaming

Blue-green, canary, dark launch. Why streaming jobs need different deploy patterns than web services, and how batch jobs deploy through their schedule.

Continuous delivery for web services is well-understood. You roll out a new version of an API, you watch error rates and latencies, and if anything looks wrong you roll back. The whole industry has converged on a small set of patterns: blue-green, canary, rolling deploys, feature flags. Books have been written about it, the SRE community has matured around it, and most modern platforms (Kubernetes, ECS, Cloud Run, Lambda) ship the primitives in the box.

Data pipelines are different. Not because the principles change, but because the shape of the workload changes. A web service handles millions of independent short requests; a bad deploy produces a brief spike of 5xxs that you can roll back in seconds. A pipeline either produces large slices of derived data on a schedule or runs continuously and holds state. A bad deploy can write wrong rows to the warehouse for hours before anyone notices, or corrupt the state of a streaming job that has been running for weeks. The blast radius is bigger; the discipline must match.

This lesson walks through the deployment patterns and how each maps onto batch and streaming. The big idea is that the patterns are the same, but the mechanics are different enough that you cannot just copy the web-services playbook.

What CD looks like for a web service

To set the contrast, here is the standard web-service flow. You merge to main, CI builds an image, the image gets pushed to a registry, the deploy system rolls it out. Rolling deploy means new pods come up with the new image while old ones drain; at any moment some traffic hits old, some hits new. Canary means you route a small percentage (5 percent, 10 percent) to the new version and watch metrics before promoting. Blue-green means you stand up an entirely fresh environment with the new version, smoke test it, and flip traffic over with a load-balancer switch.

The unifying property: a request is a small, isolated unit. If the new version is bad, you stop sending it requests, and the bad behaviour stops. Nothing persists across deploys except the database, which both versions share.

That last sentence is where the data world starts to diverge. For pipelines, the deploy unit is a job, not a request. The job has state. The job writes to outputs that other jobs and humans depend on. And the things you can roll back vary depending on whether the job is batch or streaming.

Batch pipelines: deploy through the schedule

Batch is the gentler of the two. A batch job runs on a schedule (hourly, daily, whenever the upstream data lands). When you deploy new code, the next scheduled run picks it up. There is no in-flight long-running process to wrestle with.

The default deploy flow for a batch pipeline looks like this:

  1. Merge a code change to the repo.
  2. CI builds the artefact (a Docker image, a wheel, a JAR) and uploads it to wherever the orchestrator pulls from.
  3. The next time the job is triggered (next 02:00 run, next manual trigger, next sensor fire), it uses the new code.

Rollback is symmetric: revert the commit, redeploy the artefact, and the next run goes back to the old behaviour. If you want to be paranoid, kill any run still in flight before rolling back, so that it does not finish under the new code and leave partially written outputs behind.

Idempotency, from lesson 38, is what makes this clean. If the job is safely re-runnable, you can roll back and replay the affected dates with no fuss. If it is not, you have to chase down whatever wrong output the bad version wrote and clean it up by hand.
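One common way to get that re-runnable property is to have each run replace its whole output partition rather than append to it. The sketch below is a minimal illustration using an in-memory SQLite database; the table `daily_counts`, its columns, and the function `write_partition` are hypothetical names chosen for the example, not anything prescribed by this lesson.

```python
import sqlite3

def write_partition(conn, run_date, rows):
    """Replace the whole partition for run_date in a single transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM daily_counts WHERE run_date = ?", (run_date,))
        conn.executemany(
            "INSERT INTO daily_counts (run_date, user_id, events) VALUES (?, ?, ?)",
            [(run_date, user_id, events) for user_id, events in rows],
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_counts (run_date TEXT, user_id TEXT, events INTEGER)")

# A bad deploy writes doubled counts for one date...
write_partition(conn, "2024-01-01", [("u1", 6), ("u2", 10)])
# ...and replaying that date after the rollback replaces them.
write_partition(conn, "2024-01-01", [("u1", 3), ("u2", 5)])
# Running the same date again changes nothing.
write_partition(conn, "2024-01-01", [("u1", 3), ("u2", 5)])

result = conn.execute(
    "SELECT user_id, events FROM daily_counts ORDER BY user_id"
).fetchall()
```

Because the delete and the insert share a transaction, a run that fails halfway leaves the previous contents of the partition in place instead of a partial write.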

The catch is schema migrations. If the new code expects a new column that the old code did not write, rolling back the code without rolling back the schema breaks reads. Worse, if you rolled back because the new schema turned out to be wrong, you might have outputs in the new shape that the old code cannot read at all.

The discipline that makes this manageable is the same as for online services: keep schema changes backwards-compatible. Adding a column is safe; dropping or renaming a column is not. Deploys that need a schema change should be split into two deploys: first the additive migration, then the code that uses it. If you ever need the destructive change (drop a column, narrow a type), you do it as a separate explicit step long after the code that depended on the old shape is gone.
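The additive-versus-destructive rule is simple enough to check mechanically, for example in CI before a migration is allowed to ship. The sketch below assumes schemas are available as plain dicts of column name to type name; the function `breaking_changes` and the example columns are hypothetical. It is deliberately conservative and flags every type change, even though some widenings may be safe in a given warehouse.

```python
def breaking_changes(old_schema, new_schema):
    """Return the changes that would break code written against old_schema.

    Schemas are plain dicts of column name -> type name.
    """
    problems = []
    for column, old_type in old_schema.items():
        if column not in new_schema:
            # A rename looks like a drop plus an add, so it is caught here too.
            problems.append(f"dropped or renamed column: {column}")
        elif new_schema[column] != old_type:
            problems.append(
                f"type change on {column}: {old_type} -> {new_schema[column]}"
            )
    return problems

old = {"user_id": "STRING", "amount": "DOUBLE"}
additive = {"user_id": "STRING", "amount": "DOUBLE", "currency": "STRING"}
destructive = {"user_id": "STRING", "amount_cents": "BIGINT"}

additive_problems = breaking_changes(old, additive)
destructive_problems = breaking_changes(old, destructive)
```

An empty list means the migration only adds columns and can ship ahead of the code that uses them; a non-empty list marks a change that belongs in the later, explicit destructive step.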

The upshot: batch deploys feel like web deploys but slowed down to the cadence of the schedule. The pattern that matters most is keeping migrations and code releases independent.

Streaming pipelines: the long-running job problem

Streaming is where the deploy story gets interesting. A Flink or Spark Structured Streaming job is a long-running process. It has been running for weeks, holding state (windows, joins, deduplication tables, machine-learning model state) that took non-trivial time to accumulate. You cannot just roll out new code the way you would for a stateless API.

The standard pattern for stateful stream processors is the savepoint. Flink, the canonical example, lets you trigger a savepoint at any moment: a snapshot of the entire job state written to durable storage. The deploy flow becomes:

  1. Build the new code.
  2. Trigger a savepoint on the running job.
  3. Stop the job once the savepoint completes.
  4. Start the new job from the savepoint, using the new code.

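The new job picks up where the old one left off, with the state intact. Kafka offsets are part of the savepoint, so the new job resumes from the offsets recorded there. If the job stops at the moment the savepoint is taken (Flink's stop-with-savepoint does both in one step), there is no reprocessing and no gap. If the old job keeps running for a while after the savepoint, the records it handled in that window are processed again on restore.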
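As a rough sketch of how the flow above might be scripted, the helper below builds the two Flink CLI invocations without executing them. It uses `flink stop --savepointPath`, which takes the savepoint and stops the job in one action (steps 2 and 3 combined), and `flink run -s` to start from a savepoint. The job ID, bucket path, and JAR name are hypothetical placeholders, and the exact flags should be checked against the Flink version in use.

```python
def savepoint_deploy_commands(job_id, savepoint_dir, new_jar):
    """Build the Flink CLI invocations for a savepoint-and-restart deploy.

    Nothing is executed here; the caller decides how to run the commands.
    """
    # Take a savepoint and stop the running job as one action.
    stop_cmd = ["flink", "stop", "--savepointPath", savepoint_dir, job_id]

    def start_cmd(savepoint_path):
        # savepoint_path is the concrete path reported by the stop command.
        # -d submits in detached mode, -s restores from the given savepoint.
        return ["flink", "run", "-d", "-s", savepoint_path, new_jar]

    return stop_cmd, start_cmd

stop_cmd, start_cmd = savepoint_deploy_commands(
    job_id="a1b2c3",                                 # hypothetical job ID
    savepoint_dir="s3://example-bucket/savepoints",  # hypothetical location
    new_jar="pipeline-v2.jar",                       # hypothetical artefact
)
restart = start_cmd("s3://example-bucket/savepoints/savepoint-a1b2c3-0001")
```

Returning the commands as argument lists keeps the sketch testable and leaves the choice of runner (a CI step, `subprocess.run`, an operator) to the deploy system.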

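This works as long as the new code can read the old state. If you renamed a state field or changed its type, you need a state migration step. Changing the set of operators is subtler: Flink maps saved state to operators by operator ID, so give every stateful operator an explicit uid. With stable IDs, a newly added operator simply starts with empty state, while state belonging to a removed operator has to be explicitly allowed to be dropped on restore. Flink supports schema evolution for state if you stick to compatible serializers (POJO, Avro), but the moment you start storing arbitrary Java objects with no schema you are on your own.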

Stateless streaming jobs (a simple filter, an enrichment that keeps nothing between records) are much easier. There is no state to migrate, so you can do a rolling restart: start the new instance, let it catch up, kill the old one. Same pattern as web services.

The diagram below shows the contrast.

flowchart TB
    subgraph BATCH["Batch deploy"]
        B1[Merge code] --> B2[CI builds artefact]
        B2 --> B3[Next scheduled run]
        B3 --> B4[Uses new code]
    end
    subgraph STREAM["Streaming deploy with savepoint"]
        S1[Merge code] --> S2[CI builds artefact]
        S2 --> S3[Trigger savepoint on running job]
        S3 --> S4[Stop old job]
        S4 --> S5[Start new job from savepoint]
    end

Blue-green, canary, dark launch for data

The web-service patterns translate to data pipelines, with adjustments.

Blue-green for data. Maintain two complete pipelines, blue and green. Blue is what consumers read from. Deploy the new version as green, run it alongside blue, and once you have confidence, flip the consumer pointer (a view, a table alias, a downstream job’s input config) to green. Blue stays around as the rollback target.

The cost is real: you are running two copies of the pipeline and storing two copies of the output. For a small pipeline this is fine. For a petabyte lakehouse it is prohibitive. The pattern is most useful for the critical paths where you cannot tolerate a wrong-output incident, and you accept the doubled cost as insurance.
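When the consumer pointer is a view, the flip described above can be a single transaction that repoints the view. The sketch below uses an in-memory SQLite database to illustrate the idea; the table names `orders_blue` and `orders_green`, the view `orders`, and the function `point_consumers_at` are hypothetical, and a real warehouse will have its own syntax for swapping a view or alias.

```python
import sqlite3

ALLOWED_TARGETS = {"orders_blue", "orders_green"}

def point_consumers_at(conn, target_table):
    """Repoint the consumer-facing view at target_table in one transaction."""
    if target_table not in ALLOWED_TARGETS:
        raise ValueError(f"unknown target: {target_table}")
    conn.execute("BEGIN")
    try:
        conn.execute("DROP VIEW IF EXISTS orders")
        # The table name is checked against ALLOWED_TARGETS above.
        conn.execute(f"CREATE VIEW orders AS SELECT * FROM {target_table}")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise

# isolation_level=None: transactions are managed explicitly with BEGIN/COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE orders_blue (id INTEGER, total REAL)")
conn.execute("CREATE TABLE orders_green (id INTEGER, total REAL)")
conn.execute("INSERT INTO orders_blue VALUES (1, 10.0)")
conn.execute("INSERT INTO orders_green VALUES (1, 12.5)")

point_consumers_at(conn, "orders_blue")
before = conn.execute("SELECT total FROM orders").fetchall()
point_consumers_at(conn, "orders_green")  # cutover
after = conn.execute("SELECT total FROM orders").fetchall()
point_consumers_at(conn, "orders_blue")   # rollback
rolled_back = conn.execute("SELECT total FROM orders").fetchall()
```

Consumers only ever query `orders`, so cutover and rollback are the same cheap operation in opposite directions, which is the property that justifies keeping blue around.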

Canary for data. Deploy the new version to a slice of the work. For a Kafka-fed streaming job, that might be a subset of partitions or a subset of topics. For a batch job partitioned by tenant, a single tenant. Watch the outputs, compare against what the old version would have produced, promote to the rest if everything looks right.
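Picking the slice can be as simple as naming one tenant by hand. When a percentage-based slice is wanted instead, one possible approach (an assumption of this sketch, not something the patterns above require) is to hash each tenant or partition key into a stable bucket. The function name `in_canary` and the tenant names are hypothetical.

```python
import hashlib

def in_canary(key, percent):
    """Deterministically place `key` in the canary slice for `percent` of keys."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in 0..99
    return bucket < percent

tenants = [f"tenant_{i}" for i in range(200)]
canary_tenants = [t for t in tenants if in_canary(t, 10)]
wider_canary = [t for t in tenants if in_canary(t, 50)]
```

Because the bucket depends only on the key, the same tenants stay in the canary across runs, and widening the percentage only ever adds tenants to the slice.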

The hard part is comparison. You need both versions to produce comparable outputs that you can diff, which usually means writing the new version’s output to a side table and computing diffs on a sample. Cheap if you set up the tooling once; tedious if you do it from scratch every time.
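A minimal version of that diff tooling might look like the sketch below, which compares two outputs keyed by a primary key and reports what is missing, extra, or changed. It assumes both outputs fit in memory as lists of dicts, which in practice means running it on a sample; the function `diff_outputs` and the example rows are hypothetical.

```python
def diff_outputs(old_rows, new_rows, key):
    """Compare two outputs row by row, matching rows on the `key` column."""
    old_by_key = {row[key]: row for row in old_rows}
    new_by_key = {row[key]: row for row in new_rows}
    shared = old_by_key.keys() & new_by_key.keys()
    return {
        "missing_in_new": sorted(old_by_key.keys() - new_by_key.keys()),
        "extra_in_new": sorted(new_by_key.keys() - old_by_key.keys()),
        "changed": sorted(k for k in shared if old_by_key[k] != new_by_key[k]),
    }

old_output = [
    {"id": 1, "total": 10},
    {"id": 2, "total": 20},
    {"id": 3, "total": 30},
]
side_table = [
    {"id": 1, "total": 10},
    {"id": 2, "total": 25},
    {"id": 4, "total": 40},
]

report = diff_outputs(old_output, side_table, key="id")
```

A report with three empty lists is the signal to promote. Anything else is either an intended change that should be explainable row by row or a bug caught before consumers saw it.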

Dark launches. Run the new version in shadow mode: it processes the same inputs as the old version but writes to a separate output. Nothing downstream reads it. Compute diffs against the old output until you trust the new version, then cut over. This is the canary pattern with the production risk removed, because nothing consumes the new output yet.

For a stateful streaming job, dark-launching is expensive because you are running two copies of the pipeline holding two copies of the state. For a batch job, dark-launching is cheap: the new version just runs once a day to a different table.
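The property that matters in shadow mode is isolation: a failure in the new version must never affect the output consumers read. The sketch below shows one way to structure that for a per-record transform. The function `run_with_shadow` and both example transforms are hypothetical, and a real dark launch would more likely run the candidate as a separate job writing to its own table.

```python
def run_with_shadow(records, current_fn, candidate_fn):
    """Run both versions on the same input; only current_fn feeds consumers."""
    primary_out, shadow_out, shadow_errors = [], [], []
    for record in records:
        primary_out.append(current_fn(record))
        try:
            shadow_out.append(candidate_fn(record))
        except Exception as exc:
            # A shadow failure is recorded for review, never propagated.
            shadow_errors.append((record, repr(exc)))
    return primary_out, shadow_out, shadow_errors

def current_version(record):
    # Existing behaviour: treat a missing amount as zero.
    return (record["amount"] or 0) * 100

def candidate_version(record):
    # New behaviour under test: raises TypeError when amount is None.
    return int(record["amount"] * 100)

records = [{"amount": 1.5}, {"amount": None}]
primary, shadow, errors = run_with_shadow(records, current_version, candidate_version)
```

Here the candidate fails on the second record, the failure lands in `errors`, and the primary output is complete and unaffected.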

Feature flags for pipelines. Less common than in web services but useful. The pipeline reads a runtime flag and branches its logic on it. You can flip the flag for a subset of partitions, tenants, or environments. The advantage is that a flag flip does not require a redeploy; the disadvantage is that the pipeline code carries dead branches and the testing matrix grows.
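A flag-driven branch can be sketched as follows. The flag name `strict_dedupe`, the shape of the `flags` dict, and the tenant names are all hypothetical; in a real pipeline the flags would be read at run time from a config store rather than passed in as a literal.

```python
def dedupe_enabled(flags, tenant):
    """Return True if the hypothetical strict_dedupe flag is on for this tenant."""
    rollout = flags.get("strict_dedupe", {})
    return rollout.get("all", False) or tenant in rollout.get("tenants", ())

def process(batch, tenant, flags):
    if dedupe_enabled(flags, tenant):
        # New branch: drop duplicates while keeping first-seen order.
        return list(dict.fromkeys(batch))
    # Old branch: pass the batch through unchanged.
    return list(batch)

flags = {"strict_dedupe": {"tenants": {"tenant_a"}}}
batch = ["a", "a", "b"]

flagged = process(batch, "tenant_a", flags)
unflagged = process(batch, "tenant_b", flags)
```

Both branches live in the code until the flag is retired, which is the dead-branch and test-matrix cost mentioned above: each flag doubles the number of behaviours that need coverage.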

The data-is-forever problem

A bad deploy in a web service causes a brief spike of 5xxs. Users see errors for a few minutes, you roll back, and life goes on. The bad effect is bounded in time and in scope.

A bad deploy in a pipeline writes wrong data. The wrong data sits in a table that downstream pipelines read, dashboards display, and machine-learning models train on. By the time someone notices that yesterday’s user counts are off by a factor of two, the bad data has been consumed by a dozen downstream systems and propagated into a few derived datasets. Rolling back the deploy stops the bleeding but does not undo the damage.

This is why the discipline around CD for data has to be tighter than for online services, even though the consequences feel less immediate. A web 500 is loud and visible; a wrong row is quiet and contagious. The deployment patterns that look like overkill (dark launches, side-by-side comparison, blue-green on critical paths) are the ones that pay for themselves the first time they catch a bad deploy before downstream consumers see it.

What this means in practice

For a small team starting out, the realistic flow is:

  1. Batch jobs deploy via the orchestrator: merge to main, CI ships a new image, next scheduled run picks it up. Idempotency makes rollback safe.
  2. Streaming jobs deploy via savepoint-and-restart for stateful work, rolling restart for stateless. Savepoint discipline matters more than any framework feature.
  3. Schema migrations are a separate deploy step, additive by default, never coupled to a code release.
  4. The most critical pipelines (revenue, regulatory reporting, anything user-facing in the product) earn either dark-launches or blue-green. Lesser pipelines do not.
  5. Anything that can be made idempotent should be. Idempotency is the property that lets every other deployment pattern work.

The rest is tooling: orchestrator, CI, registry, observability, alerting on output diffs. Lesson 53 covers the infrastructure-as-code piece, which is the layer that makes the orchestrator and the pipelines themselves reproducible. Lesson 54 covers the container piece, which is the unit most modern data jobs ship as.

Citations

  • “What Is Continuous Delivery?” (https://continuousdelivery.com/, retrieved 2026-05-01).
  • Apache Flink documentation, “Savepoints” (https://nightlies.apache.org/flink/flink-docs-stable/docs/ops/state/savepoints/, retrieved 2026-05-01).
  • “Schema Evolution and State Migration” in the Flink documentation (https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/datastream/fault-tolerance/schema_evolution/, retrieved 2026-05-01).
  • Google SRE Book, chapter on release engineering (https://sre.google/sre-book/release-engineering/, retrieved 2026-05-01).