Data & System Architecture, from the ground up
Lesson 49 / 80

Git for engineering teams: branching strategies that work

Trunk-based, GitHub flow, gitflow. The realities at small vs large team scale, when each fits, and the patterns that survived 15 years of practice.

Modules 5 and 6 covered the data-platform half of the course: batch in Module 5, streaming in Module 6, and two case studies (Netflix and Uber) that exercised every primitive along the way. Module 7 starts a different conversation. The architectural designs of the previous modules have to land in production somehow, and the path from “we wrote the design” to “it runs” goes through code, version control, code review, automated tests, deployment pipelines, and the operational tooling that keeps the lights on. That is what Module 7 is about.

The opener picks the most universal piece of that toolchain, git, and a question every engineering team answers one way or another: how should the team use branches? The answer matters more than it looks. Branching strategy shapes review cadence, deployment frequency, how often merges conflict, and the team’s cultural posture toward shipping. A team using gitflow ships differently from a team using trunk-based development, and the difference is not just process: it is the lived experience of the engineers.

This lesson covers the three strategies that account for almost all professional practice: gitflow, GitHub flow, and trunk-based development. The next lesson zooms in on trunk-based development, because that is the strategy most modern teams converge on and the one that most rewards a deeper look. Lesson 51 then turns to CI for data pipelines, where the abstract software-engineering practices meet the specific shape of the data work the rest of the course has been building toward.

The three strategies

Three patterns dominate. They differ on branch lifetime, merge complexity, release cadence, and what tooling they require to work. They are not the only patterns. They are the ones most teams actually use, and the ones most public material is written about.

Gitflow

Vincent Driessen’s 2010 blog post “A successful Git branching model” introduced gitflow (https://nvie.com/posts/a-successful-git-branching-model/, retrieved 2026-05-01). The model is hierarchical. Two long-lived branches anchor the repo: main (sometimes called master in older conventions) holds released, production code, and develop holds the integration line where in-progress work converges. Around those two, three short-lived branch families appear and disappear: feature branches off develop, release branches off develop when a release is being prepared, and hotfix branches off main when a production bug needs a fast patch.

A typical lifecycle. A developer creates feature/checkout-redesign from develop, works for a week, opens a pull request, gets reviewed, merges back into develop. Several feature branches accumulate on develop over a few weeks. When the team is ready to ship, a release/2.7.0 branch comes off develop. The release branch only accepts bug fixes; new feature work continues on develop. When the release branch is stable, it merges into main, gets tagged v2.7.0, and merges back into develop so any release-time fixes flow forward. A production bug found later spawns a hotfix/2.7.1 off main, gets fixed, merges into both main and develop, and the cycle continues.
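The same lifecycle can be written down as the git commands each step boils down to. What follows is a minimal sketch that only builds the command lists and executes nothing. The `release/X.Y.Z`, `hotfix/X.Y.Z`, and `vX.Y.Z` names mirror the example above rather than a required convention, and `git merge --no-ff` (which always records an explicit merge commit) follows the recommendation in Driessen’s post.

```python
def release_commands(version: str) -> list[str]:
    """Cut a release branch off develop, ship it on main, merge it back."""
    branch = f"release/{version}"
    return [
        f"git switch -c {branch} develop",
        # ... bug-fix-only commits land on the release branch here ...
        "git switch main",
        f"git merge --no-ff {branch}",             # the release reaches main
        f"git tag -a v{version} -m 'Release {version}'",
        "git switch develop",
        f"git merge --no-ff {branch}",             # release-time fixes flow forward
        f"git branch -d {branch}",
    ]


def hotfix_commands(version: str) -> list[str]:
    """Branch a hotfix off main, then merge it into both main and develop."""
    branch = f"hotfix/{version}"
    return [
        f"git switch -c {branch} main",
        # ... the fix is committed on the hotfix branch here ...
        "git switch main",
        f"git merge --no-ff {branch}",
        f"git tag -a v{version} -m 'Hotfix {version}'",
        "git switch develop",
        f"git merge --no-ff {branch}",
        f"git branch -d {branch}",
    ]


if __name__ == "__main__":
    for cmd in release_commands("2.7.0") + hotfix_commands("2.7.1"):
        print(cmd)
```

Every release and every hotfix touches both long-lived branches, which is where the ceremony comes from.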

It is a lot of ceremony. The ceremony exists because gitflow was designed for products with formal versioned release cycles: shrink-wrapped software, mobile apps that ship through review, on-prem enterprise software, libraries with semver. In those contexts, separating “what is in development” from “what we have shipped” is a real architectural concern, and the multiple branches let the team work on the next version while supporting the current one.

GitHub flow

GitHub’s documentation describes a much simpler pattern (https://docs.github.com/en/get-started/using-github/github-flow, retrieved 2026-05-01). One long-lived branch (main) holds production-ready code. Every change, however small, happens on a short-lived feature branch off main. The work lives on the branch for hours or a few days. A pull request opens, CI runs the test suite, a reviewer approves, the branch merges into main. The merge often triggers an automatic deploy.

The cadence is continuous. There is no “release branch” because there is no formal release: every merge to main is a release, in the sense that it is what is currently running. The pattern fits SaaS and most web applications naturally, because the deployment surface is one running system that you continuously update. It also fits most internal tools and most data pipelines, where the deploy target is a service that runs the latest version of the code.

GitHub flow is arguably the most common pattern in open source and in most modern engineering organisations below a few hundred engineers. It is also the easiest to explain to someone new to git: “make a branch, do your work, open a PR, merge it, delete the branch.”
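That one-sentence description maps onto a handful of commands. A minimal sketch, again only building the list: the PR steps assume the GitHub CLI (`gh`) is installed and authenticated, though the same steps work through the web UI, and the branch name and commit message are hypothetical.

```python
import shlex


def github_flow_commands(branch: str, message: str) -> list[str]:
    """Branch off main, commit, open a PR, merge it, delete the branch."""
    return [
        "git switch main",
        "git pull --ff-only",                      # start from the current main
        f"git switch -c {shlex.quote(branch)}",    # short-lived branch off main
        "git add -A",
        f"git commit -m {shlex.quote(message)}",
        f"git push -u origin {shlex.quote(branch)}",
        "gh pr create --fill",                     # CI runs against the PR
        "gh pr merge --squash --delete-branch",    # after approval; often triggers a deploy
    ]


if __name__ == "__main__":
    print("\n".join(github_flow_commands("feature/login", "Add login form")))
```

Squash-merging is one common choice here; GitHub also offers merge commits and rebase merges, and the flow is the same either way.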

Trunk-based development

The third pattern goes one step further than GitHub flow. Everyone works against main and integrates into it continuously. Where branches exist at all, they exist for code review and live for hours, not days. Work that is not yet ready to be exposed to users hides behind feature flags: the code is on main, deployed, and tested, but a runtime switch keeps it from running until the team is ready to flip it on.
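A minimal sketch of that runtime switch, assuming an in-memory table of defaults with an environment-variable override. Production systems usually read flags from a dedicated flag service or config store instead, and the `new_checkout` flag and `FLAG_` prefix are hypothetical. Both code paths are merged and deployed; only the flag decides which one runs.

```python
import os


class FeatureFlags:
    def __init__(self, defaults: dict[str, bool]):
        self._defaults = dict(defaults)

    def is_enabled(self, name: str) -> bool:
        # An env var such as FLAG_NEW_CHECKOUT=1 overrides the default, so the
        # switch is flipped in configuration rather than in code.
        override = os.environ.get(f"FLAG_{name.upper()}")
        if override is not None:
            return override.strip().lower() in {"1", "true", "on"}
        return self._defaults.get(name, False)  # unknown flags stay off


flags = FeatureFlags({"new_checkout": False})


def checkout(cart_total: float) -> str:
    if flags.is_enabled("new_checkout"):
        # Merged to main and deployed, but dark until the flag is flipped.
        return f"new checkout: {cart_total:.2f}"
    return f"old checkout: {cart_total:.2f}"
```

Defaulting unknown flags to off is the conservative choice: a typo in a flag name hides the new path instead of exposing it.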

Paul Hammant and others have documented the pattern at https://trunkbaseddevelopment.com (retrieved 2026-05-01), drawing on the practices of large-scale engineering organisations. Google, Facebook, and Microsoft (for Azure DevOps and parts of the Office stack) have publicly described running this model on massive monorepos. Etsy and Spotify are mid-scale examples that have written about it. The common thread is that at sufficient scale, the alternatives stop working: branches that live for a week conflict with everything around them, and merge resolution starts consuming a real fraction of engineering time.

Trunk-based development requires more infrastructure than GitHub flow. Feature flags become essential, not optional. CI has to be fast enough that every commit gets validated within minutes. Code review has to operate on small changes, because the changes land directly on the integration line. Lesson 50 covers all of that in depth.

The comparison axes

Stepping back, the three strategies separate cleanly along four axes.

Branch lifetime. Gitflow has long-lived branches: develop is permanent, feature branches often live weeks, release branches days. GitHub flow has medium-lived branches: feature branches live hours to a few days. Trunk-based has short-lived branches: hours, sometimes minutes. The shorter the branch lifetime, the less divergence accumulates between what you are working on and what main looks like.
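Branch lifetime is easy to measure. A sketch, assuming you have collected each branch’s fork point as a Unix timestamp (one way is `git merge-base main <branch>` followed by `git show -s --format=%ct <sha>`). The thresholds that map ages onto the three strategies are illustrative assumptions, not a standard.

```python
HOUR = 3600
DAY = 24 * HOUR


def divergence_report(forks: dict[str, int], now: int,
                      warn_after: int = 3 * DAY) -> dict[str, str]:
    """Label each branch by how long it has been diverging from main."""
    report = {}
    for branch, fork_ts in forks.items():
        age = now - fork_ts
        if age < DAY:
            label = "hours (trunk-based range)"
        elif age <= warn_after:
            label = "days (GitHub flow range)"
        else:
            label = f"STALE: {age // DAY} days, expect merge pain"
        report[branch] = label
    return report
```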

Merge complexity. This follows directly from branch lifetime. Long-lived branches conflict with everything around them, so merging them is a real operation. Short-lived branches barely diverge from main, so merges are usually fast-forwards or trivial three-way merges. Teams that hate merging are usually teams with branches that live too long.
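Git will report the divergence directly: `git rev-list --left-right --count main...feature/x` prints two tab-separated counts, the commits only on main (how far the branch is behind) and the commits only on the branch (how far it is ahead). A small sketch that turns that output into the kind of merge to expect. A branch that is zero commits behind main can be fast-forwarded, assuming the team’s merge settings allow it.

```python
def classify_merge(rev_list_output: str) -> str:
    """Interpret `git rev-list --left-right --count main...<branch>` output."""
    behind, ahead = (int(n) for n in rev_list_output.split())
    if ahead == 0:
        return "nothing to merge"
    if behind == 0:
        # main is an ancestor of the branch: no divergence, no conflicts.
        return f"fast-forward ({ahead} commits)"
    # Both sides moved; the larger 'behind' is, the more room for conflicts.
    return f"three-way merge ({ahead} ahead, {behind} behind main)"
```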

Release cadence. Gitflow assumes scheduled releases: the team prepares a release branch, stabilises it, ships it, repeats. GitHub flow assumes continuous releases: every merge to main is a release. Trunk-based assumes continuous deployment: every merge can be deployed automatically, often is, and the question of “what is in this release” becomes “what was on main at deploy time”.

Tooling required. Gitflow needs the least tooling: the model works with bare git and a CI system. GitHub flow needs CI on every PR and ideally automated deploys. Trunk-based needs all of that plus feature-flag infrastructure, plus a CI pipeline fast and reliable enough that committing directly to main is not terrifying. The tooling cost of trunk-based is real, and it is why small teams often start with GitHub flow and move to trunk-based as scale demands it.

flowchart LR
    subgraph main ["main branch"]
        direction LR
        m1["init"] --> m2["small fix"] --> m3["merge login<br/>deploy"] --> m4["merge api<br/>deploy"] --> m5["merge dashboard<br/>deploy"]
    end

    subgraph login ["feature/login"]
        direction LR
        l1["login UI"] --> l2["login tests"]
    end

    subgraph api ["feature/api"]
        direction LR
        a1["api endpoint"]
    end

    subgraph dash ["feature/dashboard"]
        direction LR
        d1["dashboard"] --> d2["dashboard tests"]
    end

    m2 -.branch.-> l1
    l2 -.merge.-> m3
    m2 -.branch.-> a1
    a1 -.merge.-> m4
    m3 -.branch.-> d1
    d2 -.merge.-> m5

    classDef mainNode fill:#0d9488,stroke:#0d9488,color:#ffffff
    classDef branchNode fill:#1f2933,stroke:#52606d,color:#e8edf1
    classDef boundary fill:transparent,stroke:#0d9488,stroke-dasharray: 5 5
    class m1,m2,m3,m4,m5 mainNode
    class l1,l2,a1,d1,d2 branchNode
    class main,login,api,dash boundary

Diagram: GitHub flow, drawn as a flowchart. main is the central spine, three short-lived feature branches come off it and merge back in over a few days, and each merge commit on main is marked as a deploy. The visual point is that main is always shippable, branches are short, and merges happen continuously.

When each strategy fits

The strategies are tools, not religions. The fit depends on what the team is shipping.

Gitflow fits products with versioned releases: mobile apps where App Store review is the deploy gate, libraries with semver where users pin to specific versions, on-prem enterprise software shipped quarterly, embedded firmware. The common feature is that there is a meaningful difference between “what we have released” and “what we are working on”, and the branching model makes that difference explicit. Gitflow’s ceremony is appropriate when the underlying release process is itself ceremonious.
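For a semver library, the gitflow branch names follow from the version being shipped. A sketch assuming plain `MAJOR.MINOR.PATCH` strings with no pre-release or build suffixes, and the common mapping where a hotfix bumps PATCH and a planned release bumps MINOR (or MAJOR when it breaks compatibility). The `kind` values are hypothetical labels.

```python
def next_version(current: str, kind: str) -> str:
    """Bump a MAJOR.MINOR.PATCH version for a hotfix or a planned release."""
    major, minor, patch = (int(p) for p in current.split("."))
    if kind == "hotfix":
        return f"{major}.{minor}.{patch + 1}"
    if kind == "release":
        return f"{major}.{minor + 1}.0"
    if kind == "breaking-release":
        return f"{major + 1}.0.0"
    raise ValueError(f"unknown kind: {kind}")


def branch_name(current: str, kind: str) -> str:
    """Gitflow branch for the next version: hotfix/... or release/..."""
    prefix = "hotfix" if kind == "hotfix" else "release"
    return f"{prefix}/{next_version(current, kind)}"
```

Users pinned to `2.7.x` pick up hotfixes automatically, which is why hotfixes branch off main: they patch exactly what was shipped.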

GitHub flow fits most SaaS, most internal tools, and most data pipelines. The deploy target is a single running system. Releases are continuous. The team is small enough that long-lived branches are unnecessary. Feature-flag infrastructure may exist but is not a hard prerequisite. This is where the bulk of professional engineering happens, and it is the safe default for a team that does not have a specific reason to pick something else.

Trunk-based fits very large teams or teams committed to continuous deployment. Two distinct populations end up here. The first is teams large enough that branching at all is painful: hundreds of engineers committing to a monorepo cannot afford week-long branches because the merge cost dominates. The second is teams with strong feature-flag culture who want the discipline of “everything on main, gated by a flag” because it eliminates an entire class of integration bugs. Both populations gain real benefits from the pattern; both pay the tooling tax that the pattern requires.

The boundary between GitHub flow and trunk-based is fuzzy in practice. A small team using GitHub flow with one-day branches and a fast CI loop is almost doing trunk-based. The cultural difference is whether the team thinks of main as “the integration branch we merge to” (GitHub flow) or “the place we work, with branches as a review formality” (trunk-based). The mechanical difference is whether feature flags are a routine part of the workflow or a special case.

The data-engineering wrinkle

Data pipelines complicate the picture in one specific way: schema migrations and stateful systems do not roll forward as cleanly as stateless code does.

A web service can be deployed twice a day, ten times a day, on every commit. A pipeline that writes to a warehouse table cannot always be treated that casually. If the pipeline change includes a schema migration (a new column, a renamed column, a changed type), the migration runs once, against a real warehouse, and rolling back means undoing the migration, which is sometimes impossible. The branching strategy interacts with this in two ways.
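Not every schema change carries the same risk, and sorting them is a useful first check before a merge. A sketch under simplifying assumptions: schemas are plain `{column: type}` dicts, and a rename shows up as one column dropped plus one added, because a diff like this cannot tell a rename from a drop-and-add. Real migration tools track far more than this.

```python
def classify_schema_change(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Split a schema diff into additive and hard-to-reverse changes."""
    added = sorted(set(new) - set(old))
    dropped = sorted(set(old) - set(new))
    retyped = sorted(c for c in set(old) & set(new) if old[c] != new[c])
    return {
        # New columns: code reading the old shape keeps working.
        "additive": added,
        # Dropped, renamed, or retyped columns: readers of the old shape break,
        # and the original values may not come back without a backfill.
        "destructive": dropped + retyped,
    }


def is_additive_only(old: dict[str, str], new: dict[str, str]) -> bool:
    return not classify_schema_change(old, new)["destructive"]
```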

First, long-lived branches are riskier for pipeline code than for service code. If the feature/customer-360 branch lives for three weeks while the team works on a new dimension model, and main ships a different schema change in week two, merging the branch becomes a forensic exercise: which schema is correct, which migration runs first, what does the warehouse look like at each point. The shorter the branch, the smaller this problem.
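Part of that forensic work can be automated before the merge. A sketch under a hypothetical convention: migration files are named `NNNN_description.sql` and applied in numeric order. It reports branch migrations whose number is already taken on main, and branch migrations that sort before main’s newest one, which is exactly where “which migration runs first” becomes ambiguous.

```python
import re

_MIGRATION = re.compile(r"^(\d+)_[\w-]+\.sql$")


def migration_number(filename: str) -> int:
    match = _MIGRATION.match(filename)
    if match is None:
        raise ValueError(f"not a migration file: {filename}")
    return int(match.group(1))


def check_migrations(main_files: list[str], branch_only_files: list[str]) -> list[str]:
    """Report ordering problems between main's migrations and a branch's new ones."""
    main_nums = {migration_number(f) for f in main_files}
    latest_on_main = max(main_nums, default=0)
    problems = []
    for f in sorted(branch_only_files):
        n = migration_number(f)
        if n in main_nums:
            problems.append(f"{f}: number {n} already used on main")
        elif n < latest_on_main:
            problems.append(f"{f}: sorts before main's latest ({latest_on_main})")
    return problems
```

The longer the branch lives, the more migrations main accumulates in the meantime, and the more likely this check fails.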

Second, feature flags are awkward for batch pipelines. A flag that gates a code path in a request handler is cheap; the flag is checked, the path is taken, the request returns. A flag that gates a transformation in a nightly batch job is the same idea, but the consequences of a wrong flag value are persistent: the data has been written, in one shape or another, and you cannot un-write it without a backfill. Trunk-based development for data pipelines requires an investment in idempotent jobs (Lesson 38) and reliable backfill machinery (Lesson 39) so that “wrong” outputs can be regenerated.
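What idempotency buys here can be sketched in a few lines. The assumptions are loud: the “warehouse” is a dict of date-keyed partitions, and the transformation and flag are hypothetical. Each run overwrites its whole partition instead of appending, so if a day ran with the wrong flag value, re-running it with the right one replaces the bad output rather than duplicating it.

```python
from datetime import date


def transform(rows: list[dict], use_new_logic: bool) -> list[dict]:
    if use_new_logic:
        # New path: merged to main but dark until the flag is on.
        return [{**r, "amount_usd": round(r["amount_cents"] / 100, 2)} for r in rows]
    # Old path: truncates to whole dollars.
    return [{**r, "amount_usd": r["amount_cents"] // 100} for r in rows]


def run_partition(warehouse: dict, day: date, rows: list[dict], use_new_logic: bool) -> None:
    # Overwrite the whole partition, never append: re-running a day
    # replaces its output instead of duplicating it.
    warehouse[day] = transform(rows, use_new_logic)


def backfill(warehouse: dict, days: list[date], source: dict, use_new_logic: bool) -> None:
    # Regenerate each partition from source with the corrected flag value.
    for day in days:
        run_partition(warehouse, day, source[day], use_new_logic)
```

Had `run_partition` appended, the backfill would leave both the old and the new rows for the day, and the “wrong” output would never really go away.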

The practical recommendation for most data teams is GitHub flow with disciplined short branches, CI that runs the pipeline on sample data (Lesson 51), and migrations gated behind explicit deploy steps that the team approves separately from code merges. Trunk-based for data pipelines is achievable, and Module 7’s later lessons describe the patterns, but the prerequisites are real.
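The “approved separately” part can be made mechanical. A sketch under hypothetical conventions: migrations live under `migrations/`, and approvals are a set of migration paths that someone signed off on outside the code-review merge. Code deploys on merge as usual; unapproved migrations are held back until their own approval arrives.

```python
from dataclasses import dataclass, field


@dataclass
class DeployPlan:
    deploy_code: list[str] = field(default_factory=list)
    run_migrations: list[str] = field(default_factory=list)
    held_migrations: list[str] = field(default_factory=list)


def plan_deploy(changed_files: list[str], approved: set[str]) -> DeployPlan:
    """Split a merge's changes into code to deploy and migrations to run or hold.

    Assumes the deployed code tolerates a held migration not having run yet.
    """
    plan = DeployPlan()
    for path in sorted(changed_files):
        if path.startswith("migrations/"):
            target = plan.run_migrations if path in approved else plan.held_migrations
            target.append(path)
        else:
            plan.deploy_code.append(path)
    return plan
```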

Where this leaves us

The branching strategy is the surface choice. Underneath, the deeper question is how the team relates to the integration line: is main a destination you arrive at after preparation, or a workspace you live on continuously? Gitflow treats main as the destination. GitHub flow treats main as both the workspace and the deploy line. Trunk-based treats main as the workspace and bets on feature flags to keep the deploy line safe.

Lesson 50 takes the trunk-based pattern apart in detail: why the largest engineering organisations converged on it, what the prerequisites are, how feature flags work, and the cultural shift the pattern demands of developers. Lesson 51 then turns to CI for data pipelines, which is the testing discipline that makes any of these branching strategies safe in a data-engineering context.

Citations and further reading

  • Vincent Driessen, “A successful Git branching model”, 2010, https://nvie.com/posts/a-successful-git-branching-model/ (retrieved 2026-05-01). The original gitflow proposal, with the diagrams that became the standard reference.
  • GitHub Docs, “GitHub flow”, https://docs.github.com/en/get-started/using-github/github-flow (retrieved 2026-05-01). The canonical description of GitHub flow as practised across the platform.
  • Paul Hammant and contributors, “Trunk-Based Development”, https://trunkbaseddevelopment.com (retrieved 2026-05-01). The reference site for trunk-based development, including the case studies from large-scale engineering organisations.
  • Atlassian, “Comparing workflows”, https://www.atlassian.com/git/tutorials/comparing-workflows (retrieved 2026-05-01). A side-by-side comparison of the major patterns from Atlassian’s Git tutorials, useful as a teaching reference.