Real case: Stripe's deployment pipeline

Module 7 has built up a picture of how teams turn architecture into running systems. Branching strategies in lesson 49, trunk-based development in lesson 50, CI for data pipelines in lesson 51, CD patterns in lesson 52, infrastructure as code in lesson 53, containers in lesson 54, Kubernetes in lesson 55. Each lesson is a layer; each layer interacts with the others. The point of this closing case study is to look at one company that has put all those layers together at scale and published enough about it that the picture is concrete.

Stripe is the canonical “high-stakes engineering with high deploy frequency” story. They run financial infrastructure that processes more than a trillion dollars per year. The system has to be available (a payment that fails because Stripe is down is a real lost transaction for a real merchant) and safe to deploy (a buggy payment processor is worse than a slow one). Those two requirements pull in opposite directions for most teams. Stripe’s published practices show how the same team can hit both, and what they invested in to make that possible.

The information in this lesson comes from Stripe’s engineering blog and a handful of public talks. The citations at the end give the primary sources. The framing is mine; the practices are theirs.

The shape of the problem

Stripe sits in the payment path of a large fraction of internet commerce. When a buyer clicks “pay” on a Shopify store, an Uber ride, a SaaS subscription, the request often hits Stripe’s API. The latency budget is short (a payment that takes too long looks broken to the buyer), the correctness budget is shorter (a wrongly charged card or a missed authorization is a customer-service incident at minimum and a regulatory issue at worst), and the volume is enormous.

The team that runs this system also has the normal pressures of any growing tech company: new features, new payment methods, new countries, new regulations, new fraud patterns. They have to ship constantly. The deploys cannot stop, the bugs cannot ship, and the system cannot go down. The interesting question is what they built to make all three possible at once.

Monorepo with services

Most Stripe code lives in a single repository. This is not the same as a monolith. The repository contains many services, deployed independently, with independent on-call rotations. What the monorepo gives is a single source of truth for the code, a single dependency graph, a single CI configuration to maintain.

The benefits Stripe has called out publicly:

Cross-service refactors are tractable. Renaming an API in one service and updating every caller in twelve other services is a single pull request. In a multi-repo setup the same change is a coordinated rollout across thirteen repos, which in practice means it does not happen and the API stays misnamed for years.

Internal API and contract testing is straightforward. Service A’s tests can spin up service B from the same checkout. The contracts between services are tested at every commit, not at integration time.

Tooling investment is leveraged. The CI infrastructure, the build cache, the test selection logic, the static analysis, the linters: each of them is built once and used across every service. The cost of writing a new check is amortised over the whole company.

The trade is real. A monorepo at Stripe’s scale has its own engineering effort behind it. Build systems (Bazel-style or custom), test selection (only run tests affected by the change), CI parallelism, code review tooling: all of it has to be tuned to the size. For a small team this would be wildly disproportionate. For a company that has grown into the scale, the investment is what makes the monorepo workable.
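To make “only run tests affected by the change” concrete, here is a minimal sketch of test selection over a dependency graph. The target names and the `DEPS` mapping are hypothetical, and this is not Stripe’s build system; it only shows the core idea of walking from a changed target to everything that depends on it.

```python
from collections import defaultdict, deque

# Hypothetical dependency graph: each target lists the targets it depends on.
DEPS = {
    "payments_api": ["ledger", "fraud"],
    "ledger": ["money"],
    "fraud": ["money"],
    "dashboard": ["payments_api"],
    "docs_site": [],
    "money": [],
}


def affected_targets(changed, deps):
    """Return the changed targets plus everything that depends on them, directly or transitively."""
    # Invert the graph so we can walk from a dependency to its dependents.
    dependents = defaultdict(set)
    for target, requirements in deps.items():
        for requirement in requirements:
            dependents[requirement].add(target)

    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


if __name__ == "__main__":
    # A change to "ledger" should trigger tests for its dependents only.
    print(sorted(affected_targets({"ledger"}, DEPS)))
```

A change to a leaf such as `docs_site` selects only its own tests, while a change to a widely shared library such as `money` selects almost everything. That asymmetry is why shared code in a monorepo needs the most careful review.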

The lesson for smaller teams is not “use a monorepo”. It is “the boundaries of your repos shape the kinds of changes you can make easily”. A team that can do cross-service refactors trivially will have a cleaner architecture three years in than a team that cannot.

Sorbet: types as an architectural choice

Stripe’s primary backend language has long been Ruby. Ruby is dynamically typed by default, which is fine at small scale and gets expensive at large scale: a typo in a method name is not caught until production, a refactor that misses a caller is not visible until the caller crashes, a change to a data structure has no automatic check that downstream code still works.

Stripe’s response was to build Sorbet, a gradual type checker for Ruby, and open-source it. The investment is significant: a multi-year effort by a dedicated team, integrated deeply into the development workflow. The payoff is a class of bugs caught at edit-time rather than in production.

The framing matters for this module. Type systems are sometimes presented as a coding-style preference: some people like them, some people do not. At Stripe’s scale, type systems are an architectural choice. They enable refactors that would otherwise be unsafe. They make the code more legible to reviewers. They catch the kind of regression that an automated test would miss, because the bug is in code that has no test.

The lesson generalises. Static analysis, type checking, linters, contract checks: the more of them run at edit-time and CI-time, the more the team can move fast without breaking things. Stripe built Sorbet because they could not buy what they needed; for most teams, the off-the-shelf tools (TypeScript, mypy, Go’s compiler) cover the same ground.
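The “typo in a method name” failure is easy to show with one of the off-the-shelf tools. The sketch below uses Python type hints rather than Sorbet, and the `Charge` class is hypothetical. The second function contains a deliberate attribute typo: a static checker such as mypy is expected to flag it before the code runs, whereas plain Python only fails when the line executes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Charge:
    amount_cents: int
    currency: str


def total_cents(charges: list[Charge]) -> int:
    """Sum the amounts of a list of charges."""
    return sum(charge.amount_cents for charge in charges)


def total_cents_with_typo(charges: list[Charge]) -> int:
    # Deliberate bug: Charge has no attribute "amount_cent".
    # A static type checker can report this at edit-time or in CI.
    # Without one, the error only appears when this line is executed.
    return sum(charge.amount_cent for charge in charges)
```

If `total_cents_with_typo` sits on a rarely exercised path with no test, the dynamic version of this bug can survive until production. That is the class of regression the section describes.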

Rapid iteration with safety

The deploy frequency at Stripe is high. Many deploys per day per service is normal. The cultural piece is that each deploy is small and reversible.

Small deploys are easier to review and faster to roll back. If a deploy contains a single feature flag flip and a small bug fix, and something goes wrong, the cause is in one of two places. If a deploy contains forty merged pull requests, the cause is in one of forty places, and the rollback reverts all forty changes, most of which were fine and now have to be shipped again.

Reversible deploys are the property that lets the team treat a deploy as a non-event. A deploy that cannot be rolled back is a deploy that has to be perfect, and a deploy that has to be perfect produces a culture of fear around deploys. The investment in reversibility (feature flags, blue-green, canaries, careful schema migrations) is what removes that fear.

The connection to lesson 50 is direct. Trunk-based development with small commits, gated by CI on trunk, with feature flags for incomplete work: that is the pattern that produces small, reversible deploys at scale.
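As an illustration of how a feature flag makes a deploy reversible, here is a minimal sketch of a percentage rollout. The flag name, the hashing scheme, and the `checkout` function are assumptions made for the example, not a description of Stripe’s flag system.

```python
import hashlib


def bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag_name: str, user_id: str, rollout_percent: int) -> bool:
    """Enable the flag for users whose bucket falls below the rollout percentage."""
    return bucket(flag_name, user_id) < rollout_percent


def checkout(user_id: str, rollout_percent: int) -> str:
    # Both code paths are deployed; the flag decides which one runs.
    if is_enabled("new_checkout_flow", user_id, rollout_percent):
        return "new"
    return "old"
```

Because the bucket is derived from a hash, a given user stays on the same side of the flag as the percentage grows. Setting the percentage back to zero restores the old behaviour without a new deploy.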

Online migrations: the hardest part

Stripe’s published “Online Migrations at Scale” blog post is, in my opinion, the canonical reference for “how to change a critical schema without downtime”. It is worth reading in full; the structure of the playbook is teachable and applies far beyond Stripe.

The pattern, as laid out here, has five phases:

Backwards-compatible additive change. The first deploy adds the new column or the new table. Nothing reads from it yet. The old code path continues to work. This deploy can roll back trivially because nothing depends on the new shape.

Dual writes. The second deploy makes the application write to both the old and the new shape. The new shape is being populated for new rows; the old shape is still authoritative for reads. If anything goes wrong, the team turns off the dual write and the system is back to the old steady state.

Backfill. The historical rows have to be migrated to the new shape. This is a job that runs offline, in batches, with rate limiting so it does not overwhelm the database. The job is idempotent (lesson 38 territory), so it can be paused, resumed, and restarted without producing wrong data.

Cutover. Once the new shape is fully populated and dual-writes are confirmed correct, the application starts reading from the new shape. This is the highest-risk step; it is gated by feature flags and rolled out gradually. The old shape is still being written to in case of rollback.

Cleanup. Eventually, after weeks of confidence-building, the dual write is removed and the old shape is dropped.
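The phases above can be sketched as a repository whose read and write behaviour depends on the current phase. The `SubscriptionRepo` class, the in-memory dictionaries standing in for the old and new shapes, and the `Phase` enum are all hypothetical; a real system would drive the phase from feature flags and real storage.

```python
from enum import IntEnum


class Phase(IntEnum):
    ADDITIVE = 1    # new shape exists, nothing uses it yet
    DUAL_WRITE = 2  # writes go to both shapes, reads come from the old one
    BACKFILL = 3    # same as DUAL_WRITE while history is copied across
    CUTOVER = 4     # reads come from the new shape, writes still go to both
    CLEANUP = 5     # the old shape is no longer written


class SubscriptionRepo:
    """Hypothetical repository wrapping an old and a new storage shape."""

    def __init__(self, phase: Phase) -> None:
        self.phase = phase
        self.old_store: dict = {}
        self.new_store: dict = {}

    def write(self, key, value) -> None:
        # Keep writing the old shape until cleanup, so rollback stays possible.
        if self.phase < Phase.CLEANUP:
            self.old_store[key] = value
        if self.phase >= Phase.DUAL_WRITE:
            self.new_store[key] = value

    def read(self, key):
        if self.phase >= Phase.CUTOVER:
            return self.new_store.get(key)
        return self.old_store.get(key)

    def backfill(self) -> int:
        """Copy rows that predate dual writes; safe to run repeatedly."""
        copied = 0
        for key, value in self.old_store.items():
            if key not in self.new_store:
                self.new_store[key] = value
                copied += 1
        return copied
```

The point of the sketch is that every phase change is a small, reversible switch. Moving the phase back one step returns the system to a state that was already running in production.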

The whole process can take months for a single migration. That is not a bug. The migration is changing the shape of data that is critical to a financial system. The slowness is the price of doing it without downtime and without data loss.

The lesson for any team handling persistent data is that schema migrations are the hardest part of CI/CD. Most teams under-invest here, and most outages that originate in deploys are migration-shaped: a column rename that broke reads, a NOT NULL added before the backfill finished, a foreign key added with rows that violate it. The Stripe playbook is teachable, generalisable, and worth treating as a default template.
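To show why “NOT NULL added before the backfill finished” is a trap, here is a small sketch of an additive column plus a batched, idempotent backfill, using SQLite from the Python standard library. The `users` table, its columns, and the batch size are assumptions for illustration; a production job would also throttle itself between batches.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a small in-memory table that already has the new nullable column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO users (first_name, last_name) VALUES (?, ?)",
        [("Ada", "Lovelace"), ("Alan", "Turing"), ("Grace", "Hopper"),
         ("Edsger", "Dijkstra"), ("Barbara", "Liskov")],
    )
    # Additive change: the column is nullable, so existing writers keep working.
    conn.execute("ALTER TABLE users ADD COLUMN full_name TEXT")
    conn.commit()
    return conn


def backfill_batch(conn: sqlite3.Connection, batch_size: int) -> int:
    """Fill full_name for up to batch_size rows that still lack it; return rows updated."""
    cursor = conn.execute(
        """
        UPDATE users
           SET full_name = first_name || ' ' || last_name
         WHERE id IN (
               SELECT id FROM users WHERE full_name IS NULL ORDER BY id LIMIT ?
         )
        """,
        (batch_size,),
    )
    conn.commit()
    return cursor.rowcount


def run_backfill(conn: sqlite3.Connection, batch_size: int = 2) -> int:
    """Run batches until no rows remain; re-running after completion updates nothing."""
    total = 0
    while True:
        updated = backfill_batch(conn, batch_size)
        if updated == 0:
            # Only at this point would a NOT NULL constraint be safe to add.
            return total
        total += updated
        # A real job would pause or check database load here (rate limiting).
```

Each batch only touches rows where the new column is still empty, so the job can be stopped and restarted at any point without rewriting rows it has already handled.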

```mermaid
flowchart TB
    OLD[Old schema in use] --> ADD[Phase 1: Add new shape]
    ADD --> DUAL[Phase 2: Dual writes]
    DUAL --> BACKFILL[Phase 3: Backfill historical]
    BACKFILL --> CUT[Phase 4: Cutover reads]
    CUT --> CLEAN[Phase 5: Drop old shape]
```

Observability investment

A team deploying many times per day has to know within seconds whether the deploy made things worse. That is an observability problem.

Stripe has invested heavily here. They built Veneur, a high-volume statsd implementation, when off-the-shelf tools could not keep up with their metric volume. They have published on tracing infrastructure, structured logging, and the discipline of treating observability as a first-class engineering concern.

The deploy-time use case is the most direct. When a new version rolls out, dashboards show error rates, latencies, and business metrics for that specific version. If anything regresses against the old version, the rollout pauses or rolls back automatically. The team that did the deploy gets paged, but the system is already back to a known-good state.
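The automatic pause-or-rollback decision can be sketched as a comparison between the old version and the new one. The thresholds, the `VersionStats` type, and the three-way result are assumptions chosen for the example. Real systems typically compare several metrics and apply statistical tests rather than a fixed margin.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionStats:
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        # Avoid dividing by zero before the version has served any traffic.
        return self.errors / self.requests if self.requests else 0.0


def canary_decision(
    baseline: VersionStats,
    canary: VersionStats,
    min_requests: int = 500,
    max_rate_increase: float = 0.005,
) -> str:
    """Return "wait", "rollback", or "promote" for the new version."""
    if canary.requests < min_requests:
        # Not enough traffic yet to judge the new version.
        return "wait"
    if canary.error_rate > baseline.error_rate + max_rate_increase:
        # The new version is measurably worse than the old one.
        return "rollback"
    return "promote"
```

The `wait` branch matters as much as the other two: a handful of early errors on a canary with little traffic is noise, and rolling back on noise teaches the team to distrust the automation.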

The deeper use case is reading the system in production. A payment that took too long has a trace that shows every service it touched, every database query it made, every external API it called. Debugging is reading the trace, not guessing.

The lesson generalises. The investment in observability is what makes high-deploy-frequency safe. Without it, the team is flying blind, and “deploy as non-event” turns into “deploy as occasional disaster”. With it, the deploy is just another data point on the dashboards, and the dashboards make problems visible before users notice them.

Deploy as non-event

The cultural piece is the hardest to copy and the most important to understand. At Stripe, deploys are not celebrated and not feared. They happen continuously. The engineer who shipped a feature does not stand around watching the deploy; the system handles it, the metrics confirm it, and the engineer is already on the next thing.

This is a state a team has to earn. It comes from years of investment in tooling, testing, observability, and rollback infrastructure. It also comes from a culture that treats deploys as the normal way the system stays alive, not as risky events that require ceremony.

Teams that do not have this culture often produce its opposite: deploys are rare, scary, and come with elaborate change-control rituals. Each deploy is a big batch of accumulated changes, which makes it more likely to break, which produces more ceremony, which makes deploys rarer, which makes the next deploy bigger. The vicious cycle is real.
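A rough calculation shows why bigger batches are more likely to break. The sketch assumes each change independently carries the same small chance of causing an incident. That is a simplification (real changes interact), and the 1% figure is made up for illustration.

```python
def batch_failure_probability(per_change_risk: float, changes: int) -> float:
    """Probability that at least one change in a batch causes an incident,
    assuming every change fails independently with the same probability."""
    return 1.0 - (1.0 - per_change_risk) ** changes


if __name__ == "__main__":
    for batch_size in (1, 5, 40):
        probability = batch_failure_probability(0.01, batch_size)
        print(f"{batch_size:>2} changes: {probability:.1%} chance of a bad deploy")
```

Under these assumptions a single-change deploy goes wrong about 1% of the time, while a forty-change deploy goes wrong about a third of the time, and the failing change is harder to find.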

The way out is the trunk-based development pattern from lesson 50, plus the CI/CD investment from lessons 51 and 52, plus the observability investment to know whether things are working. There is no shortcut. There is also no alternative if the team wants to ship fast and not break things.

What this means for the rest of us

Stripe is operating at a scale where their solutions are sometimes overbuilt for what most teams need. Sorbet is a good example: the right tool for a Ruby codebase of Stripe’s size, overkill for a thirty-person startup whose Ruby codebase is fifty thousand lines. The lesson is not “build Sorbet”. The lesson is “invest in the tooling layer that catches your most expensive class of bugs at edit-time”.

Several patterns generalise more directly:

Investment in tooling pays off when the team’s deploy frequency is bottlenecked by manual work. If every deploy needs a human to babysit it, the team’s deploy frequency is capped by human attention. Automating away the human reveals the next bottleneck.

Type systems and static analysis are an architectural choice, not just a coding-style preference. They prevent classes of bugs before they hit prod. The decision to use a typed language or to add types to a dynamic language is a decision about how the team scales.

Schema migrations are the hardest part of CI/CD. The Stripe online-migrations playbook is the canonical reference. Teams that adopt the pattern early have far fewer migration-shaped incidents than teams that improvise each migration.

Trust in the test suite enables velocity. If CI is reliable, the team trusts a green build and ships. If CI is flaky, the team learns to ignore it, and the safety net stops working. Investment in CI reliability is not glamorous, but it is high-leverage.

Culture matters as much as tooling. Deploys are events because the team makes them events. The cultural shift to “deploys are routine” requires both the tooling to make them safe and the social agreement to treat them as routine.
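On the point about CI trust, one cheap first step is to find which tests are flaky. The sketch below flags any test that both passed and failed on the same commit, since the code did not change between those runs. The run-history format and the test names are hypothetical.

```python
from collections import defaultdict


def find_flaky_tests(runs):
    """Return the names of tests that both passed and failed on the same commit.

    `runs` is an iterable of (commit, test_name, passed) tuples.
    """
    outcomes = defaultdict(set)
    for commit, test_name, passed in runs:
        outcomes[(commit, test_name)].add(passed)
    # Seeing both True and False for one commit means the result is not deterministic.
    return sorted(
        {test_name for (_, test_name), seen in outcomes.items() if seen == {True, False}}
    )


if __name__ == "__main__":
    history = [
        ("abc123", "test_refund", True),
        ("abc123", "test_refund", False),  # same commit, different result
        ("abc123", "test_charge", True),
        ("def456", "test_charge", False),  # different commit: a real failure, not a flake
    ]
    print(find_flaky_tests(history))
```

A test that fails on one commit and passes on another is not flagged, because that pattern is what a genuine regression followed by a fix looks like.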

Module 7 wrap

The seven lessons of this module add up to one operational picture. Branching strategies and trunk-based development decide how the team coordinates around the codebase. CI catches bugs before merge. CD limits the blast radius of bugs that slip through. Infrastructure as code makes the environments reproducible and reviewable. Containers and Kubernetes are the runtime substrate that everything else lives on. Stripe’s case study shows what it looks like when all seven layers are invested in together, in a context where the cost of getting it wrong is high.

Module 8 starts with orchestration: the deeper layer of how the jobs in a data platform actually get scheduled, how dependencies between them are tracked, and how the team operates the platform once the deploy patterns are in place. It covers Airflow, Dagster, Prefect, and the patterns that work across all of them.

Citations

“Online Migrations at Scale” on the Stripe engineering blog (https://stripe.com/blog/online-migrations, retrieved 2026-05-01).
“Sorbet: Stripe’s type checker for Ruby” on the Stripe engineering blog (https://stripe.com/blog/sorbet-stripes-type-checker-for-ruby, retrieved 2026-05-01).
Sorbet project documentation (https://sorbet.org/, retrieved 2026-05-01).
“Veneur: a high-performance, distributed statsd” on the Stripe engineering blog (referenced via https://stripe.com/blog/engineering, retrieved 2026-05-01).
Stripe engineering blog index (https://stripe.com/blog/engineering, retrieved 2026-05-01).
Trunk Based Development site, referenced from lesson 50 (https://trunkbaseddevelopment.com/, retrieved 2026-05-01).