Scaling 10x: what breaks, what survives

The 10x exercise is one of the most useful thought experiments in capacity planning. Take the system as it runs today, in production, with current traffic. Now imagine the load is ten times what it is. Ten times the requests per second, ten times the daily volume, ten times the connected users, ten times the writes. Walk through the architecture component by component and ask: what breaks first?

The exercise has a property that makes it more useful than precise capacity modelling. It forces the team to confront the components nobody is currently watching, because at 1x they cope and nobody has reason to look at them. At 10x they stop coping. The exercise is a flashlight pointed at the corners of the architecture where the team has not yet had reason to think.

This lesson is about the shape of those corners. Which kinds of components scale gracefully with more horsepower; which ones hit walls; which architectural patterns survive an order-of-magnitude jump and which patterns have to be redesigned. The framing comes from a long line of scalability writing, with Michael Nygard’s “Release It!” and Martin Kleppmann’s “Designing Data-Intensive Applications” as the two most cited sources in this corner of the field.

The pieces that scale linearly

Some components are happy with more load, in the sense that doubling the load and roughly doubling the capacity gets you the same per-request behaviour. These are the parts of the architecture that survive 10x with money rather than redesign.

Stateless web tiers. A service that holds no per-request state in memory between requests can be horizontally scaled by adding more identical instances behind a load balancer. The math is approximately linear: 10x traffic, 10x instances, same per-request latency. The qualifier is “approximately”, because load balancers, DNS, and connection-establishment costs introduce small inefficiencies, but the shape holds across two orders of magnitude in most realistic deployments.

Read replicas. A read-heavy database workload can be scaled by adding read replicas, up to a point that depends on the database engine and the replication topology. Postgres comfortably handles a small number of streaming replicas; well-tuned setups run dozens. The replication lag from lesson 26 becomes a constraint at the upper end, but the linear-scaling regime is real for the first 10x of read growth.

Object storage. S3, GCS, and Azure Blob Storage are designed to be effectively infinitely scalable from a customer’s point of view. The provider absorbs the scaling problem. A team writing a terabyte a day can write ten terabytes a day with no architectural change. The cost is linear in volume; the throughput is, for practical purposes, unbounded.

CDNs. A content delivery network caches content close to users and serves from edge locations that the provider has already provisioned for the long tail of traffic spikes. CloudFront, Cloudflare, and Fastly all advertise capacity at terabit-per-second scales. A team that adds a CDN in front of static or semi-static content removes the scaling concern from that layer entirely.

Message queues, in the basic case. Kafka, SQS, Pub/Sub, and similar systems are designed to scale by adding partitions or shards. The producers and consumers can scale independently. Within the bounds of a well-designed topic structure, going from 1x to 10x is a partition-count change rather than a redesign.

The common shape of the linearly-scalable components is statelessness, sharding, or “the provider handles it for me”. They are the cheap parts of the 10x exercise. The expensive parts are the components that have one of something.

The pieces that hit walls

The trouble starts at any component that has a hard upper bound, a single point through which all traffic flows, or a coordination problem that gets worse with concurrency.

Write-heavy single-leader databases. Postgres, MySQL, and most other relational databases concentrate writes on a single leader by default. Read replicas help with reads; they do not help with writes. The leader’s capacity is whatever the largest available instance can sustain, and that ceiling is finite. A workload at 70% of the leader’s write capacity at 1x is at 700% of the leader’s capacity at 10x, which is to say it does not work. The redesign options are read-write splitting (which only helps if the read fraction grows), application-level sharding, or moving the write-heavy table to a horizontally-scalable system (DynamoDB, Cassandra, Spanner). All three are major projects.

Single-threaded background workers. A worker that processes one item at a time on one thread has a capacity equal to one CPU core’s worth of throughput. At 10x load, the queue depth grows linearly, the latency grows linearly, and at some point the SLO breaks. The fix is concurrent workers, which exposes a new problem: any per-worker shared state (global locks, shared in-memory caches, sequential ID generators) becomes the next bottleneck.

Anything with global locks. A piece of code that takes an exclusive lock on a row, a table, or a queue serialises every concurrent caller. At 1x with five concurrent callers the contention is unobservable. At 10x with fifty concurrent callers, the lock is the system. The architectural fix is to remove the contention: per-key locks instead of global locks, optimistic concurrency control, lock-free queues, sharded counters. None of these are cheap to retrofit.

Cross-shard queries. Lesson 31 covered this directly. Queries that need to fan out to multiple shards have a cost that grows with the shard count, and the latency is bounded below by the slowest shard. At 10x scale the team has likely added shards, so cross-shard queries are slower, and there are more of them. The result is that the queries that were fine at 1x become unworkable at 10x, and the application has to be redesigned to avoid them.

Synchronous fan-out. A request handler that calls N downstream services in series has a latency that is the sum of all N latencies, and a throughput that drops as the slowest of the N gets slow. At 10x, all N are likely under more load and slower, so the fan-out request gets slower than the sum of its parts would predict. The throughput multiplier is also brutal: each external request producing N internal requests means the internal network is at 10 * N times its 1x rate.

Anything with a single point in the critical path. A single cache instance, a single DNS lookup, a single load balancer, a single auth service. At 1x the single thing is fine; at 10x the single thing is the bottleneck. The architectural answer is replication or sharding of the single thing, which is a redesign rather than a configuration change.

The classic 10x failures

Walking through a real architecture under the 10x lens, the same handful of failure modes show up everywhere. They are worth recognising by name because the diagnosis at 03:00 is faster when the pattern is familiar.

Connection pool exhaustion. Each application instance opens a pool of connections to the database. At 1x with ten application instances and a pool size of fifty, the database sees five hundred connections. At 10x with a hundred application instances and the same per-instance pool, it sees five thousand connections, which exceeds Postgres’s default max_connections and exhausts memory on the database server. The fix is connection pooling at the database side (pgbouncer, RDS Proxy) or aggressive per-instance pool tuning, but the surprise comes from the multiplicative effect of the pool times the instance count.

Hot keys on the partition. A sharded system with a key like user_id is balanced when traffic is uniform across users. At 10x traffic, the variance amplifies: the most popular user, the most popular product, the most popular subreddit takes a disproportionate share of the total. The shard hosting that key becomes the system’s bottleneck while the others sit idle. Lesson 28 covered the rebalancing patterns. Recognise the symptom: one shard’s CPU at 100%, the others at 10%, average utilisation a misleading 20%.

The missing index. A query that scans the table at 1x with a million rows is fine; a query that scans the table at 10x with ten million rows is several seconds. Worse, query time scales superlinearly because the working set no longer fits in the buffer pool, and the disk seeks dominate. A query that took 50ms at 1x can take 5 seconds at 10x, and the fix is the index that should have been there from the start. Postgres’s pg_stat_statements and similar tools surface the offenders.

Synchronous fan-out amplification. The pattern above, but with a particular failure mode: when a downstream service slows down, the fan-out callers wait, hold their connections, and starve the upstream tier of available workers. The upstream tier looks healthy on CPU but unable to accept new requests because all its workers are blocked on the slow downstream. Lesson 30’s split-brain framing has a cousin here: the system is not broken in any one place, but the cumulative behaviour is broken everywhere.

The single-flight failure. When a hot cache key expires, every concurrent reader misses simultaneously, every reader hits the database, and the database falls over. This is the cache-stampede problem; lesson 70 covers the mitigations. Recognise the symptom: a flat load on the database for hours, then a brief spike that takes everything down at the same instant the cache TTL aligns.

The patterns that survive

The architectural patterns that hold up under 10x are largely the ones the rest of this course has already introduced. The 10x exercise is a useful retrospective filter on the choices made earlier.

Stateless services with horizontal load balancing. The first principle. State lives in databases or caches, not in service memory. Adding capacity is adding instances.

Sharded data with sharded compute. Lesson 29 covered this; lesson 32 (Discord) showed the practice. A workload divided across many independent shards, each handling a manageable slice, scales by adding more shards rather than by making each shard larger.

Asynchronous, event-driven communication. A producer that emits an event into a topic and walks away does not wait for the consumers. The consumers process at their own pace, and a slow consumer cannot back-pressure the producer. The pattern decouples the failure modes: a downstream slowdown produces queue lag, not upstream timeout cascades.

Caching at the right layers. The lesson 70 territory in detail. CDNs at the edge, application caches for hot data, database query caches for repeated reads. Each layer absorbs traffic the next layer would otherwise have to handle, and the multiplicative effect is what makes 10x affordable.

Backpressure and circuit breakers. Michael Nygard’s “Release It!” laid these out as cornerstone patterns. Backpressure: when a downstream is overwhelmed, the upstream slows down or sheds load deliberately rather than piling on. Circuit breakers: when a downstream is failing, the upstream stops calling it for a cooling-off window, lets the system recover, and resumes carefully. Without these, a slowdown anywhere becomes an outage everywhere.

The “redesign at every order of magnitude” rule

A heuristic that has aged well across decades of scaling stories: most systems have to be redesigned at every order of magnitude in load. The architecture that fits 1000 users does not fit 10000. The architecture that fits 10000 does not fit 100000. The architecture that fits a million does not fit ten million.

The reason is the same in each step. At each scale, the components that were cheap at the previous scale become the bottleneck. The single database that was fine at a thousand users needs read replicas at ten thousand and sharding at a hundred thousand. The single application server that was fine at ten thousand needs horizontal scaling at a hundred thousand and a CDN at a million. The constants change; the shapes change.

The corollary is that designing the 1000x architecture at 1x is over-engineering. A system that needs sharding for the load it actually has is fundamentally different from a system that has been sharded “in case” for load it will not see for years, and the second one is harder to operate without the corresponding payoff. The Stripe deployment case (lesson 56) and the Netflix batch case (lesson 40) both make this point: pick the architecture for the next order of magnitude, not three orders out, and revisit when you cross the threshold.

What the 10x exercise gives you is the next-step view. It surfaces what would break at 10x, ranks the bottlenecks, and informs the next two quarters of capacity planning. Doing it once a year is a reasonable cadence; doing it before any major launch is mandatory.

The bottleneck progression

A typical web-and-data architecture has a recognisable bottleneck progression as load grows. The shape transfers across many published architectures.

flowchart LR
    LB[Load balancer] --> WEB[Web tier]
    WEB --> APP[Application tier]
    APP --> CACHE[Cache layer]
    APP --> DB[(Primary database)]
    APP --> EXT[External services]
    DB --> REPLICA[(Read replicas)]

    classDef linear fill:#d4edda,stroke:#155724
    classDef wall fill:#f8d7da,stroke:#721c24

    class LB,WEB,APP,CACHE,REPLICA linear
    class DB,EXT wall

The green nodes scale linearly with horsepower in the typical case. The red nodes hit walls and need redesign at successive orders of magnitude. The progression of bottlenecks under 10x growth is usually:

The web tier saturates on CPU or memory; add more instances.
The connection pool exhausts the database; add a connection pooler.
The single database leader saturates on writes; add read replicas, then shard or migrate.
The cache becomes a hot spot; add cache replicas or move to a sharded cache.
External service rate limits become the limiting factor; introduce circuit breakers and retry budgets.
The fan-out cost dominates; redesign for fewer, larger calls or asynchronous processing.

Diagram to create: a bottleneck-progression panel showing four stages of the same architecture (1x, 10x, 100x, 1000x), with the red “wall-hit” component at each stage labelled. The visual point is that the bottleneck moves: the database is the bottleneck at 10x, the cache becomes one at 100x, the cross-region replication becomes one at 1000x. Each redesign moves the wall further out.

Running the exercise

The mechanics of doing the 10x exercise on a real architecture take a focused half-day with the platform team, an architecture diagram, and current load metrics. The ritual:

Print or draw the current architecture, with each component annotated with its current load (RPS, CPU, memory, connection count, daily volume).
For each component, ask: what is the headroom? Is it at 10% utilisation, 50%, 90%? At 90% it breaks at less than 10x; at 10% it might survive 10x with no change.
For each component below 50% headroom, ask: how does it scale? Linearly with instances, with a manual sharding step, with a redesign? What is the operational cost of each?
Identify the three components that break first. Those are the next two quarters of platform-engineering work.

The output is a prioritised list of architectural risks, each with a rough cost to mitigate. The list is more honest than any other capacity-planning artefact, because it is grounded in the components actually in production rather than in a hypothetical model.

What the next lesson covers

This lesson identified caching as one of the survival patterns and pointed at the cache-stampede problem as a classic 10x failure. Lesson 70 dives into caching specifically: the three tiers (CDN, application, database), the four canonical cache patterns, the invalidation problem, and the stampede mitigations. The shape of caching changes how aggressively the rest of the architecture has to scale, and a team thinking carefully about caching often finds that the 10x exercise is less daunting than it first appeared.

Citations and further reading

Michael Nygard, “Release It! Design and Deploy Production-Ready Software”, second edition (Pragmatic Bookshelf, 2018). The canonical reference for stability patterns: timeouts, circuit breakers, bulkheads, backpressure. Every chapter is relevant to the 10x exercise.
Martin Kleppmann, “Designing Data-Intensive Applications” (O’Reilly, 2017). Chapter 1 frames reliability, scalability, and maintainability as the three concerns; the rest of the book covers each in depth. The scalability discussion in chapter 1 is one of the clearest written.
Pat Helland, “Life Beyond Distributed Transactions” (ACM Queue, 2007 and reprints). The argument that scaling forces the team to abandon comfortable abstractions; an old paper that aged well.
The High Scalability blog, http://highscalability.com/ (retrieved 2026-05-01). A long-running collection of scaling case studies that document the order-of-magnitude transitions in real architectures.
Stripe Engineering, “Online migrations at scale”, https://stripe.com/blog/online-migrations (retrieved 2026-05-01). A practical write-up of running the redesign at the 10x boundary while live traffic flows.