Lesson 73 picked the shape of the services. Lesson 74 picked the shape of the conversations between them. This lesson picks the shape of the geography. Once the architecture is decided and the workflows are settled, the next question that lands on the team is “what happens when one of our regions goes down?” or “what happens when our European users complain that the app is slow?”. Both questions point at the same answer: multi-region. And both come with a price tag the team has to look at honestly before signing.
This is the lesson that closes Module 10’s opener. After this, the rest of the module turns to the cross-cutting concerns (security, cost, sustainability, AI integration) that sit on top of whatever shape the team has settled on. Geography is the last big architectural decision before those concerns take over.
Why teams go multi-region
The four reasons, in roughly the order that pushes most teams over the line:
Latency. Users on the other side of the world pay for distance: the speed of light sets a floor, and real networks are slower than that floor. A request from Sydney to a US-East data centre takes around 200ms round-trip before the application does anything. For a single page that might be tolerable; for a chatty app or anything real-time it is not. Serving Sydney users from a Sydney region brings the round-trip back down to single-digit milliseconds.
Disaster recovery. A region-wide outage at any of the major clouds happens. AWS US-East-1 has had several multi-hour outages in the last decade. Azure has had region-wide DNS and storage incidents. GCP has had control-plane failures. When the entire region your product lives in is down, “it will come back when AWS comes back” is not a satisfying answer to give the customer paying for an SLA. Multi-region with the ability to fail over is the defence.
Compliance. GDPR (EU), CCPA (California), LGPD (Brazil), the Personal Information Protection Law (China), and a growing list of regional regulations constrain where personal data can be stored and processed, or under what conditions it can leave the jurisdiction. The same product that ships globally often has to keep European user data in European data centres, sometimes Chinese user data in Chinese data centres, and sometimes Russian user data in Russian data centres. Compliance is the most absolute of the reasons: it is not about performance or availability, it is about whether you are allowed to operate.
Capacity. A small minority of products outgrow the capacity of one region. The cloud regions are large; this case is rare for most teams. When it happens, the second region is added because the first one cannot hold the load.
The first three drive most of the multi-region adoption. The fourth is a real but uncommon trigger.
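The latency reason can be sanity-checked from distance alone. The sketch below assumes light in optical fibre travels at about 200,000 km/s (roughly two-thirds of its vacuum speed) and uses approximate coordinates for Sydney and Northern Virginia. The function names and coordinates are illustrative assumptions, and real cable routes are longer than the great-circle path.

```python
import math

FIBRE_KM_PER_S = 200_000  # assumed: light in fibre travels at ~2/3 of c
EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def min_rtt_ms(distance_km):
    """Best-case round trip over a straight fibre run, ignoring routing and queuing."""
    return 2 * distance_km / FIBRE_KM_PER_S * 1000


# Approximate coordinates (assumptions for illustration).
SYDNEY = (-33.87, 151.21)
N_VIRGINIA = (39.04, -77.49)

distance = great_circle_km(*SYDNEY, *N_VIRGINIA)
print(f"distance ~ {distance:,.0f} km, RTT floor ~ {min_rtt_ms(distance):.0f} ms")
```

The physical floor comes out above 150ms round-trip. No amount of application tuning gets below it; only moving the serving region closer to the user does.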
The three deployment shapes
Once the team has decided to go multi-region, three shapes are the choices on the menu.
flowchart TD
subgraph AP[Active-Passive]
AP1[Primary Region<br/>serves all traffic]
AP2[Secondary Region<br/>standby, replicating]
AP1 -->|async replication| AP2
end
subgraph AA[Active-Active]
AA1[Region A<br/>serves traffic]
AA2[Region B<br/>serves traffic]
AA1 <-->|bidirectional replication| AA2
end
subgraph FS[Follow-the-Sun]
FS1[APAC Region<br/>serves during APAC hours]
FS2[EMEA Region<br/>serves during EMEA hours]
FS3[Americas Region<br/>serves during US hours]
FS1 -->|handoff| FS2
FS2 -->|handoff| FS3
FS3 -->|handoff| FS1
end
Active-passive
The primary region serves all traffic. The secondary region runs the same stack, holds a replica of the data, and is otherwise idle. When the primary region fails, the team (or an automated process) flips DNS and the secondary becomes the primary. This is the traditional disaster-recovery shape, used since long before “multi-region” was the trendy phrase for it.
The strengths:
- Operationally simpler. There is one region serving traffic at a time. Conflicts between regions cannot happen. The data model in the secondary is whatever async replication delivered, applied in order.
- Lower cost. The secondary region’s compute can run at a fraction of the primary’s capacity, ramped up only on failover.
- Easier consistency. There is one writer; replication is one-directional.
The weaknesses:
- Failover is a real operation. It takes minutes to hours, not seconds. Cached DNS records linger until their TTLs expire. Connections drain. The team has to practice the failover, ideally in regular game days, or the runbook will not work when it is needed.
- Recovery time objective (RTO) is bounded by failover time. If the team has not practiced and not invested in automation, the RTO is “however long the on-call takes to get the runbook out and execute it”. Realistic active-passive RTOs in 2026 range from minutes for well-prepared teams to hours for teams discovering the runbook for the first time mid-incident.
- The secondary region is wasted capacity most of the time. Most teams put it to use as a read replica, which helps with cost but adds operational complexity.
- Recovery point objective (RPO) is bounded by replication lag. Async replication means some recent writes are lost on failover. For some workloads this is a deal-breaker.
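The RTO and RPO bounds above can be turned into a back-of-the-envelope budget. This is a minimal sketch: the `FailoverBudget` fields and every number in it are hypothetical, chosen only to contrast an automated failover with a manual one.

```python
from dataclasses import dataclass


@dataclass
class FailoverBudget:
    detection_s: float   # time for health checks to declare the primary down
    decision_s: float    # human or automated decision to fail over
    promotion_s: float   # promoting the replica so it accepts writes
    dns_ttl_s: float     # worst-case wait for cached DNS answers to expire

    def rto_seconds(self) -> float:
        """Worst-case recovery time, assuming the steps run one after another."""
        return self.detection_s + self.decision_s + self.promotion_s + self.dns_ttl_s


def writes_at_risk(replication_lag_s: float, writes_per_s: float) -> float:
    """Rough count of acknowledged writes the secondary has not yet received."""
    return replication_lag_s * writes_per_s


# Hypothetical figures for illustration only.
automated = FailoverBudget(detection_s=30, decision_s=0, promotion_s=60, dns_ttl_s=60)
manual = FailoverBudget(detection_s=120, decision_s=900, promotion_s=300, dns_ttl_s=300)

print(automated.rto_seconds() / 60, "minutes")  # 2.5 minutes
print(manual.rto_seconds() / 60, "minutes")     # 27.0 minutes
print(writes_at_risk(replication_lag_s=2, writes_per_s=500), "writes at risk")
```

The decision step dominates the manual case, which is why practice and automation move the RTO more than any infrastructure change does.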
Active-active
All regions serve traffic simultaneously. A user request hits the nearest region, the request is processed there, and the data is replicated to the other regions in the background. There is no failover because there is no single primary; if a region goes down, traffic shifts to the others.
The strengths:
- No failover. If one region goes down, the others continue. RTO is the time it takes for DNS or the load balancer to detect the outage and stop routing to the failed region (seconds to minutes).
- Latency. Each user is served from the nearest region.
- Capacity is fully used. Every region is doing real work all the time.
The weaknesses:
- Conflicts. Two regions can accept conflicting writes for the same record. A user updates their email in Frankfurt; an admin updates the same user’s email in Virginia. Both writes succeed locally. Replication delivers each to the other side, and the system has to reconcile.
- Replication is harder. Bidirectional replication has more failure modes than one-directional.
- Stronger application requirements. The application has to be designed for eventual consistency, idempotent operations, and conflict resolution. Retrofitting an existing application is expensive.
- Higher cost. All regions are running at production scale all the time.
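The idempotency requirement is the easiest of these to illustrate. In the sketch below, a hypothetical `Ledger` remembers the identifier of every operation it has applied, so a replication retry that delivers the same operation twice has no additional effect. The class and the operation identifiers are assumptions for illustration.

```python
class Ledger:
    """Applies each operation at most once, keyed by an idempotency key."""

    def __init__(self):
        self.balance = 0
        self.applied = set()  # idempotency keys already processed

    def apply(self, op_id: str, amount: int) -> bool:
        if op_id in self.applied:
            return False      # duplicate delivery: ignore it
        self.applied.add(op_id)
        self.balance += amount
        return True


ledger = Ledger()
ledger.apply("op-1", 50)
ledger.apply("op-1", 50)   # a replication retry delivers the same operation again
ledger.apply("op-2", -20)
print(ledger.balance)      # 30
```

A real system would also have to bound the size of the applied-key set, which this sketch leaves out.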
The conflict-resolution problem is the hard part of active-active. The strategies in 2026:
Last-write-wins. Each write has a timestamp; the latest one wins. Simple, but loses data. Two valid concurrent writes become one, and the loser is gone.
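A minimal sketch of last-write-wins, assuming each write carries a wall-clock timestamp and that region names break ties so every replica picks the same winner. The `Versioned` type, the region names, and the email addresses are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: float  # wall-clock time of the write (assumes roughly synced clocks)
    region: str       # tie-breaker so both replicas pick the same winner


def lww_merge(a: Versioned, b: Versioned) -> Versioned:
    """Keep the write with the later timestamp; break ties by region name."""
    return max(a, b, key=lambda v: (v.timestamp, v.region))


frankfurt = Versioned("user@new-domain.example", timestamp=1000.20, region="eu-central")
virginia = Versioned("user@corp.example", timestamp=1000.35, region="us-east")

winner = lww_merge(frankfurt, virginia)
print(winner.value)  # user@corp.example: the Frankfurt write is silently discarded
```

Both replicas converge on the same value regardless of merge order, which is the property that makes the strategy attractive, and the Frankfurt write disappears without anyone being told, which is the cost.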
Application-level merge. When the system detects a conflict, it surfaces it to the application or to the user. “Two versions of this document; which do you want?” Works for documents, doesn’t work for counters.
CRDTs (Conflict-free Replicated Data Types). Data structures designed so that any two replicas can be merged without coordination, by construction. A CRDT counter keeps a separate count per replica, merges by taking the larger count for each replica, and reads as the sum of those counts. A CRDT set merges by union. The cost is that the data structures are more complex than their non-CRDT equivalents, and not every domain fits a CRDT shape.
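The grow-only counter (G-Counter) is one of the simplest CRDTs and shows the idea in a few lines. Each replica advances only its own slot, a merge takes the per-slot maximum, and the counter's value is the sum of all slots. The class below is an illustrative sketch, not any particular library's API.

```python
class GCounter:
    """Grow-only counter: one slot per replica, merged by per-slot maximum."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> increments seen from that replica

    def increment(self, n: int = 1) -> None:
        # A replica only ever advances its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other: "GCounter") -> None:
        # Per-slot max makes merge idempotent, commutative, and associative.
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)

    def value(self) -> int:
        return sum(self.counts.values())


frankfurt, virginia = GCounter("eu"), GCounter("us")
frankfurt.increment(3)
virginia.increment(2)

frankfurt.merge(virginia)
virginia.merge(frankfurt)
frankfurt.merge(virginia)  # re-delivering the same state changes nothing

print(frankfurt.value(), virginia.value())  # 5 5
```

Because merging the same state twice is harmless, replication can retry freely, and no increment is lost even though neither region coordinated with the other before accepting writes.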
Sharding by region. The simplest workaround: each user’s home region is fixed, and writes for that user always go to the home region. Conflicts cannot happen because there is only one writer per record. The cost is that some users get higher latency when they travel.
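A sketch of the home-region rule, assuming a hypothetical `HOME_REGION` directory that maps each user to a fixed region. Serving reads from the local region is an additional assumption made here for illustration; a stricter design would send reads home as well.

```python
HOME_REGION = {  # hypothetical user -> home region directory
    "alice": "eu-central",
    "bob": "us-east",
}


def route(user_id: str, local_region: str, is_write: bool) -> str:
    """Return the region that should handle the request."""
    home = HOME_REGION[user_id]
    if is_write:
        return home       # single writer per record, so no conflicts
    return local_region   # reads served locally from a possibly stale replica


# Alice travels to the US: reads stay local, writes cross the Atlantic.
print(route("alice", local_region="us-east", is_write=False))  # us-east
print(route("alice", local_region="us-east", is_write=True))   # eu-central
```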
The choice depends on the data shape. Counters: CRDT or sharded. Documents: application-level merge or last-write-wins. User profiles: sharded by region. Most production active-active systems use a mix.
Follow-the-sun
A variant of active-active for global services with strong daily patterns. Traffic routes to the region whose users are awake. APAC traffic peaks during APAC business hours; EMEA traffic peaks during EMEA hours; Americas traffic peaks during US hours. The system can shift workload across regions to follow the demand.
The most-cited use case is operational: support staff and on-call rotations follow the sun, with each region’s team handling incidents during their local business hours. This is more about ops than architecture, but it shows up in the architecture when the team builds tooling to keep the on-call team’s region as the active one for sensitive workflows.
The architectural use case is cost optimisation. Compute that is idle during the local night can be scaled down or used for batch jobs. Some teams run their analytics workloads in whichever region is currently in low demand for serving traffic.
Follow-the-sun is rarely the primary shape. It is a refinement on top of active-active for teams that have specific patterns to exploit.
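The handoff schedule can be sketched as a lookup on the current UTC hour. The three eight-hour windows below are hypothetical; a real deployment would derive them from observed traffic rather than fixed boundaries.

```python
from datetime import datetime, timezone

# Hypothetical business-hours windows in UTC, as [start, end) hours.
WINDOWS = [
    ("apac", 0, 8),
    ("emea", 8, 16),
    ("americas", 16, 24),
]


def active_region(now: datetime) -> str:
    """Pick the region whose window contains the current UTC hour."""
    hour = now.astimezone(timezone.utc).hour
    for region, start, end in WINDOWS:
        if start <= hour < end:
            return region
    raise ValueError(f"no window covers hour {hour}")


print(active_region(datetime(2026, 5, 1, 3, 0, tzinfo=timezone.utc)))   # apac
print(active_region(datetime(2026, 5, 1, 10, 0, tzinfo=timezone.utc)))  # emea
print(active_region(datetime(2026, 5, 1, 20, 0, tzinfo=timezone.utc)))  # americas
```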
The hard problems
The reasons multi-region costs more than it first appears:
Data replication
Cross-region replication is high-latency. The fastest cross-Atlantic round-trip is around 70ms; cross-Pacific is around 130ms; the long ones are over 200ms. Synchronous replication across regions kills write latency: every write has to wait for at least one cross-region round-trip. Async replication is the pragmatic choice for most workloads, but it means the secondary region is always behind, and failover loses the most recent writes.
The PACELC framing from lesson 11 is exactly the trade-off here: in the absence of partitions, the system chooses between latency (async, low write latency, weaker consistency) and consistency (sync, higher write latency, stronger consistency). Multi-region forces the team to make this choice for every dataset.
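The latency side of that trade-off is simple arithmetic. The sketch below uses the round-trip figures quoted above and an assumed 5ms local commit time; the function name and the commit time are illustrative assumptions.

```python
def write_latency_ms(local_commit_ms: float, cross_region_rtt_ms: float, synchronous: bool) -> float:
    """Latency the client sees for one write.

    Synchronous replication waits for a remote acknowledgement (one round trip);
    asynchronous replication acknowledges after the local commit.
    """
    return local_commit_ms + (cross_region_rtt_ms if synchronous else 0.0)


LOCAL_COMMIT_MS = 5.0  # assumed local commit time
for name, rtt in [("cross-Atlantic", 70.0), ("cross-Pacific", 130.0)]:
    sync = write_latency_ms(LOCAL_COMMIT_MS, rtt, synchronous=True)
    async_ = write_latency_ms(LOCAL_COMMIT_MS, rtt, synchronous=False)
    print(f"{name}: sync {sync:.0f} ms, async {async_:.0f} ms")
```

Under these assumptions a synchronous cross-Atlantic write is fifteen times slower than its asynchronous equivalent, which is why the choice has to be made per dataset rather than once for the whole system.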
The 2026 toolset has converged on a few patterns. Aurora Global Database, Spanner, Cosmos DB multi-region, and YugabyteDB all offer different points on the latency-consistency curve. The cloud-managed databases hide most of the operational complexity but not the conceptual one. The team still has to know which dataset is replicated which way and what RPO each one carries.
Cross-region network cost
The network between regions is not free: the cloud providers bill cross-region traffic per gigabyte, on top of whatever it costs to run each region. AWS inter-region data transfer in 2026 is on the order of 0.02 to 0.09 USD per GB depending on the regions involved; GCP and Azure are in similar ranges. A multi-region application that replicates every write to every region pays for the bandwidth on every byte.
The numbers are small per-byte and large per-month. A workload pushing a few terabytes per day of cross-region replication burns through a meaningful budget line. Active-active is more expensive than active-passive, partly because of duplicate compute and storage and partly because of the chatty replication.
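A worked example makes the "small per-byte, large per-month" point concrete. The 3 TB/day volume is an assumed figure for illustration, priced at both ends of the range quoted above for a single destination region.

```python
def monthly_transfer_cost_usd(tb_per_day: float, usd_per_gb: float, days: int = 30) -> float:
    """Cross-region transfer bill for one destination region, using 1 TB = 1000 GB."""
    return tb_per_day * 1000 * usd_per_gb * days


for rate in (0.02, 0.09):
    cost = monthly_transfer_cost_usd(tb_per_day=3, usd_per_gb=rate)
    print(f"3 TB/day at ${rate:.2f}/GB -> ${cost:,.0f} per month")
```

That is roughly 1,800 to 8,100 USD per month for one replication target, and each additional region that receives the same stream adds the same amount again.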
The mitigations: replicate only what needs to be replicated (cold data can stay in one region); compress aggressively; batch where the consistency model allows; choose region pairs that have lower transfer pricing.
DNS-level traffic steering
The system that decides which region a user hits is a layer above the regions themselves. The choices in 2026:
GeoDNS. DNS responses depend on the source IP’s geography. Users in Europe get the Frankfurt IP; users in the US get the Virginia IP. Simple, but slow to respond to outages because DNS TTLs are typically minutes.
Latency-based routing. AWS Route 53, Cloudflare, and Akamai estimate latency between the user’s network and each region and route to the lowest-latency one. Better than GeoDNS for users near region borders.
Anycast. Global load balancers and edge networks (AWS Global Accelerator, GCP’s global load balancer, Cloudflare’s network) advertise the same IP from many locations. The internet’s BGP routing delivers each user to the closest one. Failover is fast when a region goes down, typically seconds rather than minutes, because traffic is rerouted in the network instead of waiting for DNS caches to expire.
The 2026 default for serious multi-region deployments is anycast. GeoDNS is the legacy fallback.
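Whatever the mechanism, the steering decision reduces to "the nearest region that is still healthy". The sketch below shows that decision in isolation; it is not how any particular provider implements it, and the region names and latency figures are hypothetical.

```python
def pick_region(latency_ms: dict, healthy: set) -> str:
    """Return the healthy region with the lowest measured latency."""
    candidates = {r: ms for r, ms in latency_ms.items() if r in healthy}
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=candidates.get)


# Hypothetical measurements for one user near Frankfurt.
measured = {"eu-central": 12.0, "us-east": 95.0, "ap-southeast": 240.0}

print(pick_region(measured, healthy={"eu-central", "us-east", "ap-southeast"}))  # eu-central
print(pick_region(measured, healthy={"us-east", "ap-southeast"}))                # us-east
```

The options above differ mainly in how quickly the `healthy` set takes effect for real users: after DNS caches expire for GeoDNS and latency-based routing, or after routes reconverge for anycast.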
When not to go multi-region
The honest list of cases where multi-region is the wrong choice:
Small product without latency or DR requirements. A single-region deployment is dramatically simpler and cheaper. Most products do not need multi-region until they have product-market fit, real customers, and a real revenue model that demands it.
Single-country product. A product that serves only one country can usually be served from one or two regions in that country with no cross-region replication.
Regional regulations do not apply. If your product has no users in regions with data-residency rules, the compliance reason is moot.
You haven’t outgrown one region’s capacity. Cloud regions are large. Most teams that worry about capacity are nowhere near the actual limit.
The team cannot operate it. Multi-region is not a configuration flag. It is a substantial increase in operational complexity, including practiced failover, conflict-resolution code paths, cross-region monitoring, and replication-lag alerting. A team that is struggling with single-region operations will struggle worse with multi-region.
The progression most products follow: start single-region. Stay single-region for as long as possible. Add a passive secondary when DR or compliance forces it. Add active-active only when latency or capacity forces it. Each step is a real architectural commitment that the team has to invest in for years.
The cross-references close the module’s opening. The CAP theorem from lesson 10 and PACELC from lesson 11 give the formal vocabulary for the consistency trade-offs that multi-region forces. The replication-lag lesson (26) covers the operational reality of async replication that this lesson assumes. Lesson 73’s microservices framing applies inside each region; lesson 74’s event-driven framing applies across regions when the workloads tolerate eventual consistency.
The rest of the module turns to the concerns that sit above the architecture: security, cost, sustainability, and the integration of AI into the platform. Each of those is a cross-cutting layer on top of whatever shape the architecture has settled on. The shape questions, after this lesson, are settled enough to stop asking.
Citations
- AWS Multi-Region Application Architecture documentation, https://docs.aws.amazon.com/whitepapers/latest/aws-multi-region-fundamentals/, retrieved 2026-05-01.
- Azure Architecture Center, “Multi-region deployments”, https://learn.microsoft.com/azure/architecture/reliability/regions-paired, retrieved 2026-05-01.
- Google Cloud, “Multi-region patterns and practices”, https://cloud.google.com/architecture/disaster-recovery, retrieved 2026-05-01.
- Marc Shapiro et al., “Conflict-free Replicated Data Types”, INRIA Technical Report, 2011. The foundational CRDT paper.
- AWS Aurora Global Database documentation, https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-global-database.html, retrieved 2026-05-01.
- Google Cloud Spanner documentation, https://cloud.google.com/spanner/docs/instances, retrieved 2026-05-01.
- Cloudflare, “How Anycast works”, https://www.cloudflare.com/learning/cdn/glossary/anycast-network/, retrieved 2026-05-01.