Caching is the architectural lever with the highest leverage and the highest tax. The leverage is enormous: a well-placed cache layer can absorb 90 to 99 percent of the read traffic that would otherwise hit the database, and the cost of the cache hardware is a fraction of the cost of the database it protects. The tax is the complexity it introduces: every cached value is a copy that can disagree with the source, every TTL is a guess about how stale is acceptable, every invalidation is a coordination problem, and the day a hot key expires is the day the database falls over.
This lesson covers caching as a system-design problem: the three tiers where caches live, the four canonical patterns for keeping the cache in sync with the source of truth, the invalidation strategies that sit along the simplicity-versus-correctness axis, and the cache-stampede problem that produces the most common cache-related outage. The specifics are Redis, CloudFront, and Memcached because those are the dominant tools in 2026, but the patterns are tool-agnostic.
The three caching tiers
A reasonably-architected system has caches at three logical layers, each absorbing a different class of traffic.
The CDN tier. Content delivery networks (CloudFront, Cloudflare, Fastly, Akamai) cache responses at edge locations close to users. The cached content is anything mostly-static: HTML pages, images, JavaScript bundles, CSS, video segments, font files, API responses with long TTLs. CDN cache hits never touch the origin servers; they are served from a node geographically near the user, in tens of milliseconds rather than hundreds. The CDN tier is the fastest cache because it is the closest to the consumer.
The economics of the CDN tier are favourable. As lesson 68 covered, CDN egress is typically cheaper than direct origin egress, and the cache-hit ratio further reduces origin load. A team putting a CDN in front of static assets typically sees both lower bills and lower latency, with no consistency cost (the assets are immutable or have well-understood revalidation rules).
The application cache tier. Redis or Memcached, sitting between the application and the database. The application checks the cache first; on a hit, it returns the cached value; on a miss, it queries the database and (in most patterns) populates the cache for the next caller. The application cache is the workhorse: it caches expensive query results, computed values, session data, feature flags, rate-limit counters, leaderboards, and the long tail of “things the application reads more often than it writes”.
Redis dominates this tier in 2026. Memcached is still around, simpler and faster for the pure-key-value case, but Redis’s richer data structures (sorted sets, streams, hyperloglogs, pub/sub) and its persistence options have made it the default choice. Managed offerings (ElastiCache, MemoryDB, Upstash, Redis Cloud) handle the operational tax of running it.
The database cache tier. This one is often invisible. Postgres caches query plans for prepared statements (it has no built-in query result cache); its shared buffers cache recently-read pages; materialised views store pre-computed results. MySQL has its own buffer pool and had a query cache until it was removed in 8.0. The database cache tier is the closest to the data and the most limited in size, but for the workload the database itself is handling, these caches are doing real work without the application team having to think about them.
The three tiers form a hierarchy. A request that hits the CDN never reaches the application; a request that hits the application cache never reaches the database; a request that hits the database buffer pool never reaches the disk. Each layer absorbs traffic the next would otherwise have to handle. The architectural job is choosing what lives at which layer and how each layer stays consistent with the source.
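The arithmetic of the hierarchy can be sketched in a few lines. The hit ratios and request rate below are hypothetical figures chosen for illustration, not measurements, and `residual_load` is a name invented for this sketch.

```python
def residual_load(requests_per_second: float, hit_ratios: list[float]) -> list[float]:
    """Return the request rate that gets past each tier, outermost tier first."""
    surviving = []
    rate = requests_per_second
    for hit_ratio in hit_ratios:
        rate *= 1.0 - hit_ratio  # only the misses continue to the next tier
        surviving.append(rate)
    return surviving


# Hypothetical figures: 10,000 requests per second, a CDN absorbing 60%,
# an application cache absorbing 95% of the remainder, a buffer pool 90%.
past_cdn, past_app_cache, past_buffer_pool = residual_load(
    10_000, [0.60, 0.95, 0.90]
)
print(f"origin: {past_cdn:.0f}/s, database: {past_app_cache:.0f}/s, "
      f"disk: {past_buffer_pool:.0f}/s")
```

Under these assumed ratios, roughly 4,000 requests per second reach the origin, about 200 reach the database, and about 20 reach the disk. The multiplication is the point: each tier's miss ratio compounds with the ones in front of it.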
The four canonical patterns
Once a cache exists, the question is how to keep the cached values consistent with the underlying source. Four patterns are the textbook canon. Each makes a different trade-off between simplicity, consistency, and write throughput.
Cache-aside, also called lazy loading. The application is responsible for both reads and writes. On a read, the application checks the cache first. On a hit, it returns the cached value. On a miss, it queries the database, populates the cache with the result, and returns. On a write, the application writes to the database and either invalidates the cache entry or updates it. This is the most common pattern in real systems and the one Redis examples most often demonstrate.
The advantages: the cache is a separate, optional layer; the application code is in control; the failure mode is graceful (a cache outage means slower reads, not broken reads). The disadvantages: there is a window between the database write and the cache invalidation where a stale value can be read; the application code has the cache logic threaded through it; cache misses pay the full database cost.
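A minimal sketch of the cache-aside read and write paths follows. It assumes plain dictionaries as stand-ins for Redis and the database so that it runs without any services; `CacheAsideStore` and its method names are hypothetical, not part of any library.

```python
from typing import Optional


class CacheAsideStore:
    """Cache-aside with in-memory stand-ins for the cache and the database."""

    def __init__(self, database: dict[str, str]) -> None:
        self.database = database          # stand-in for the source of truth
        self.cache: dict[str, str] = {}   # stand-in for Redis or Memcached
        self.db_reads = 0                 # counts how often the database is hit

    def read(self, key: str) -> Optional[str]:
        if key in self.cache:
            return self.cache[key]        # hit: the database is not touched
        self.db_reads += 1                # miss: pay the full database cost
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value       # populate for the next caller
        return value

    def write(self, key: str, value: str) -> None:
        self.database[key] = value        # the source of truth is written first
        self.cache.pop(key, None)         # then the cached copy is invalidated
```

The stale window described above sits between the two statements in `write`: a reader that hits the cache after the database write but before the `pop` still sees the old value.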
Read-through. The cache fetches from the database transparently. The application asks the cache for a key; the cache, on a miss, fetches from the database, populates itself, and returns. The application code does not see the database call directly; the cache abstracts it.
The advantage: less cache-handling code in the application. The disadvantage: tighter coupling between cache and database; the cache needs to know how to read from every data source it caches; failure modes are murkier (a cache failure becomes a database failure from the application’s view). Read-through is more common in caching libraries that sit at the data-access layer (some ORMs, some sidecar caches) than in raw Redis deployments.
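The difference from cache-aside is mostly about where the loading code lives. In the sketch below, which assumes an in-memory dictionary and a caller-supplied `loader` function (both hypothetical), the application only ever calls `get` and never sees the database call.

```python
from typing import Callable


class ReadThroughCache:
    """The cache owns the loader, so callers never query the source directly."""

    def __init__(self, loader: Callable[[str], object]) -> None:
        self._loader = loader                  # knows how to read the source
        self._entries: dict[str, object] = {}

    def get(self, key: str) -> object:
        if key not in self._entries:
            # Miss: the cache itself fetches from the source and stores the result.
            self._entries[key] = self._loader(key)
        return self._entries[key]
```

A loader that raises propagates straight to the caller, which is one concrete form of the murkier failure mode mentioned above: from the application's side, a source failure looks like a cache failure.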
Write-through. The application writes to the cache, and the cache writes to the database synchronously. Both are updated before the write is acknowledged. Read-after-write consistency is preserved as long as the two updates succeed or fail together: a read after an acknowledged write sees the new value either in cache or in database, never the old one.
The advantage: consistency. The disadvantage: write latency is the sum of cache write and database write, slower than either alone. Write-through is appropriate for read-heavy workloads where consistency is critical and write throughput is not the bottleneck.
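A sketch of the write path, assuming dictionaries as stand-ins for both stores. The class name is hypothetical, and the choice to update the database before the cached copy inside `put` is one reasonable ordering, not the only one.

```python
from typing import Optional


class WriteThroughCache:
    """Every write updates the database and the cache before returning."""

    def __init__(self, database: dict[str, str]) -> None:
        self._database = database              # stand-in for the source of truth
        self._entries: dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        # Database first: if this step raises, the cache is left unchanged.
        self._database[key] = value
        self._entries[key] = value
        # Returning here is the acknowledgement; both copies already agree.

    def get(self, key: str) -> Optional[str]:
        if key in self._entries:
            return self._entries[key]
        return self._database.get(key)
```

The caller of `put` waits for both assignments, which is the latency cost described above.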
Write-behind, also called write-back. The application writes to the cache; the cache acknowledges immediately and asynchronously flushes to the database. The fastest write pattern, because the application sees only the cache latency.
The advantage: write throughput. The disadvantage: if the cache crashes before the flush completes, data is lost. Write-behind is appropriate for workloads that tolerate occasional data loss (counter increments, analytics events, click tracking) and inappropriate for anything where a missing write is a correctness problem.
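The data-loss window is easiest to see in code. This sketch assumes in-memory dictionaries and an explicit `flush` call standing in for the asynchronous flusher; all names are hypothetical.

```python
class WriteBehindCache:
    """Writes are acknowledged from the cache and flushed to the database later."""

    def __init__(self, database: dict[str, int]) -> None:
        self._database = database              # stand-in for the source of truth
        self._entries: dict[str, int] = {}
        self._dirty: dict[str, int] = {}       # writes not yet in the database

    def put(self, key: str, value: int) -> None:
        self._entries[key] = value
        self._dirty[key] = value               # acknowledged, but not yet durable

    def flush(self) -> int:
        """Stand-in for the background flusher; returns how many keys it wrote."""
        pending, self._dirty = self._dirty, {}
        self._database.update(pending)
        return len(pending)
```

Anything sitting in `_dirty` when the process dies never reaches the database. Repeated writes to one key also collapse into a single database write, which is part of where the throughput gain comes from.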
A practical caveat: real systems mix these. A given Redis deployment might cache one set of keys cache-aside, another set of keys write-through, and a third set as a write-behind buffer for an event log. The patterns are not exclusive choices for the whole system; they are tools per use case.
The invalidation problem
Phil Karlton’s quip that “there are only two hard things in computer science: cache invalidation and naming things” has become a meme, but the cache-invalidation half is genuinely true. The problem is that the cache holds a copy of data the source might change, and there has to be a mechanism that brings the cache back in sync. The mechanisms exist on a spectrum from “easy and lossy” to “hard and correct”.
TTL-based expiry. Each cached value is stored with a time-to-live (60 seconds, 5 minutes, 1 hour). When the TTL elapses, the cache discards the value, and the next read repopulates from the source. The simplest pattern, and almost always the one to start with.
The trade-off is acceptable staleness: between a database write and the next TTL expiry, the cache serves a stale value. For many use cases (a leaderboard, a category list, a homepage feed) this is fine. For others (an account balance, a permission check) it is not.
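A minimal TTL cache can be sketched as follows. It assumes an in-memory dictionary and an injectable clock (so expiry can be exercised without sleeping); `TTLCache` is a hypothetical name, and a real Redis deployment would instead set the expiry on the key itself.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Each entry carries its own expiry time and is dropped once that passes."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[object, float]] = {}

    def set(self, key: str, value: object) -> None:
        self._entries[key] = (value, self._clock() + self._ttl)

    def get(self, key: str) -> Optional[object]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]   # expired: the caller must go to the source
            return None
        return value
```

Nothing in `get` knows whether the source changed in the meantime, so the TTL is the upper bound on how long a stale value can be served.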
Explicit invalidation. When the application writes to the database, it also tells the cache that a particular key is no longer valid. The cache evicts the entry; the next read repopulates. The mechanism is either a direct cache delete call or a publish-subscribe message that any caching layer subscribes to.
The advantage is correctness: a write is followed by an invalidation, and subsequent reads see the new value. The disadvantage is that every code path that writes to the database has to remember to invalidate, and the surface area for bugs is large. A single forgotten call is a stale-cache bug that can persist for the lifetime of the cached entry.
Cache versioning. Instead of invalidating individual keys, bump a version key. Each cached entry has a version stamp; on read, the cache checks whether the entry’s version matches the current version key; if not, the entry is treated as a miss. To “invalidate everything tagged with category 5”, increment the version key for category 5; all cached entries from before the increment are now stale.
The pattern is useful for invalidating groups of entries that are hard to enumerate. The cost is a version-key read on every cache read, which is cheap if the version keys themselves are cached locally.
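The version check can be sketched like this, assuming in-memory dictionaries for both the entries and the version keys. The class, the `group` argument, and the key names are hypothetical.

```python
from typing import Optional


class VersionedCache:
    """Entries are stamped with their group's version when they are written."""

    def __init__(self) -> None:
        self._versions: dict[str, int] = {}                # group -> current version
        self._entries: dict[str, tuple[int, object]] = {}  # key -> (stamp, value)

    def _current(self, group: str) -> int:
        return self._versions.get(group, 0)

    def set(self, group: str, key: str, value: object) -> None:
        self._entries[key] = (self._current(group), value)

    def get(self, group: str, key: str) -> Optional[object]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stamp, value = entry
        if stamp != self._current(group):
            return None              # written under an older version: a miss
        return value

    def invalidate_group(self, group: str) -> None:
        # One increment makes every entry stamped with the old version stale.
        self._versions[group] = self._current(group) + 1
```

Stale entries are never deleted here; in practice they would age out through TTL or eviction.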
Event-driven invalidation. A change-data-capture stream from the database (lesson 46’s territory) feeds a Kafka topic; cache services subscribe and invalidate the affected keys when a relevant change arrives. This is the most architecturally clean pattern: the source of truth emits change events, every consumer that cares (caches, search indices, downstream replicas) listens, and the invalidation is automatic.
The cost is the infrastructure: a CDC pipeline, a topic, the consumer plumbing. For a system already running CDC for other reasons, the marginal cost of feeding the cache is low. For a system without CDC, building it specifically for cache invalidation is rarely worth it.
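On the consumer side, the invalidation logic itself is small. The sketch below assumes a made-up event shape (`table`, `op`, `id`) and a `table:id` cache-key convention; real CDC tools emit richer payloads, and the Kafka consumer loop is omitted.

```python
from typing import Iterable


def cache_key_for(event: dict) -> str:
    """Map a change event to the cache key of the row it touched (assumed convention)."""
    return f"{event['table']}:{event['id']}"


def apply_change_events(cache: dict, events: Iterable[dict]) -> int:
    """Evict the cached copy of every updated or deleted row; return the eviction count."""
    evicted = 0
    for event in events:
        if event["op"] not in {"update", "delete"}:
            continue                 # inserts have no cached copy to invalidate
        key = cache_key_for(event)
        if key in cache:
            del cache[key]
            evicted += 1
    return evicted
```

The writers never call this code. That separation is what removes the "forgotten invalidation" class of bug from the application's write paths.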
The pragmatic recommendation is layered. Use TTL as the default baseline (it eventually corrects any inconsistency). Add explicit invalidation for the keys whose staleness is genuinely harmful. Reach for CDC-driven invalidation when the team already has the infrastructure and the workload’s correctness needs justify it.
The cache stampede
The single most common cache-related outage in production systems is the cache stampede, also called the dogpile or the thundering herd. The pattern is consistent enough across companies that it has its own folklore.
The setup. A hot key (the homepage feed, the popular product listing, the global counter) is cached with a TTL. The TTL elapses. At that exact moment, hundreds or thousands of concurrent requests are reading the key. Each request misses the cache. Each request queries the database. The database, which was sized for the load that remains after the cache has absorbed its share, sees a sudden spike of identical queries from every concurrent reader simultaneously. It saturates. Latency goes to seconds. Connections back up. The application tier times out. The cache eventually gets repopulated, but often only after the on-call engineer has been paged.
The root cause is a coordination failure. There is no protocol that says “if many readers miss simultaneously, only one should refill”. The default behaviour is “everyone refills independently”.
The mitigations are well-known and worth implementing for any cache key with serious traffic.
Single-flight refresh, also called request coalescing. When a cache miss occurs, a lock is taken on the key. The reader holding the lock fetches from the database and populates the cache. Other readers that miss the same key wait briefly for the lock to release, then read from the now-populated cache. Only one database query happens per miss event regardless of concurrency. Implementations exist for Redis (using SET NX as the lock primitive), for application frameworks (Go’s singleflight package, Java’s Caffeine), and for caching libraries broadly.
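A minimal in-process sketch of single-flight, assuming Python threads and a caller-supplied `loader`. It coalesces misses within one process only; the Redis `SET NX` variant mentioned above is what extends the idea across processes. The class name is hypothetical, and per-key locks are never cleaned up here.

```python
import threading
from typing import Callable


class SingleFlightCache:
    """On a miss, one thread loads the value; concurrent readers wait and reuse it."""

    def __init__(self, loader: Callable[[str], object]) -> None:
        self._loader = loader
        self._entries: dict[str, object] = {}
        self._key_locks: dict[str, threading.Lock] = {}
        self._guard = threading.Lock()         # protects the two dicts above

    def get(self, key: str) -> object:
        with self._guard:
            if key in self._entries:
                return self._entries[key]      # hit
            key_lock = self._key_locks.setdefault(key, threading.Lock())
        with key_lock:                         # only one loader per key at a time
            with self._guard:
                if key in self._entries:
                    # Another thread filled the entry while this one was waiting.
                    return self._entries[key]
            value = self._loader(key)          # the single database query
            with self._guard:
                self._entries[key] = value
            return value
```

The second lookup inside the per-key lock is what makes this single-flight rather than merely serialised: waiters find the populated entry and skip the loader entirely.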
Probabilistic early refresh. Before the TTL fully elapses, a probabilistic check refreshes the value. The probability increases as the TTL approaches expiry. With this pattern, the cache repopulates on a few unlucky reads near expiry rather than waiting for everyone to miss simultaneously. The original paper is Vattani et al., “Optimal Probabilistic Cache Stampede Prevention” (PVLDB 2015).
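The refresh decision from that paper fits in one function. The sketch below follows the paper's rule (refresh when `now - delta * beta * log(rand())` reaches the expiry time), where `delta` is how long the last recomputation took; the function and parameter names are this sketch's own, and the injectable `rng` exists only to make the behaviour testable.

```python
import math
import random
from typing import Callable


def should_refresh_early(now: float, expiry: float, recompute_seconds: float,
                         beta: float = 1.0,
                         rng: Callable[[], float] = random.random) -> bool:
    """Decide whether this reader should refresh the value before it expires."""
    # random.random() lies in [0, 1), so 1 - rng() lies in (0, 1] and log() is defined.
    draw = 1.0 - rng()
    # log(draw) <= 0, so the subtracted term pushes "now" forward by a random amount.
    return now - recompute_seconds * beta * math.log(draw) >= expiry
```

With a uniform draw, the chance that a given read triggers a refresh works out to `exp(-(expiry - now) / (recompute_seconds * beta))`: negligible while the expiry is far away, and certain once it has been reached. A `beta` above 1 makes refreshes happen earlier; below 1, later.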
Background refresh. A scheduled job refreshes the cached value before the TTL elapses, regardless of whether anyone is asking for it. Hot keys never expire from the readers’ point of view. The pattern is appropriate for predictable hot keys (the homepage, the leaderboard, the daily aggregate) and inappropriate for the long tail (millions of per-user keys would be impractical to refresh on a schedule).
Stale-while-revalidate. A cache entry past its TTL is returned to the reader anyway, while a background process refreshes it. The reader gets a slightly-stale value; the cache stays warm. CDNs implement this natively (CloudFront’s stale-while-revalidate directive, Fastly’s similar feature). Application caches can implement it manually.
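A manual application-tier version might look like the sketch below. It assumes an in-memory dictionary, an injectable clock, and an explicit `run_pending_refreshes` call standing in for the background worker; all names are hypothetical.

```python
import time
from typing import Callable


class StaleWhileRevalidateCache:
    """Expired entries are still served while a refresh is queued in the background."""

    def __init__(self, loader: Callable[[str], object], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[object, float]] = {}
        self._pending: list[str] = []   # keys waiting for a background refresh

    def _load(self, key: str) -> object:
        value = self._loader(key)
        self._entries[key] = (value, self._clock() + self._ttl)
        return value

    def get(self, key: str) -> object:
        entry = self._entries.get(key)
        if entry is None:
            return self._load(key)      # cold miss: there is nothing stale to serve
        value, expires_at = entry
        if self._clock() >= expires_at and key not in self._pending:
            self._pending.append(key)   # queue one refresh, not one per reader
        return value                    # possibly stale, returned without waiting

    def run_pending_refreshes(self) -> int:
        """Stand-in for the background worker; returns the number of refreshes."""
        refreshed = 0
        while self._pending:
            self._load(self._pending.pop(0))
            refreshed += 1
        return refreshed
```

This sketch serves stale values indefinitely if the worker never runs. HTTP's directive bounds that window with a maximum staleness, and an application version would usually want the same limit.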
flowchart TD
R1[Reader 1] --> C{Cache lookup}
R2[Reader 2] --> C
R3[Reader 3] --> C
C -->|hit| HIT[Return cached value]
C -->|miss| LOCK{Acquire single-flight lock}
LOCK -->|got lock| DB[(Database query)]
LOCK -->|waiting| WAIT[Wait briefly]
DB --> POP[Populate cache, release lock]
POP --> HIT
WAIT --> C
Diagram to create: an animation-friendly version showing the stampede scenario on the left (every reader hits the database simultaneously when the TTL expires) and the single-flight pattern on the right (one reader fills the cache while the others wait). The visual point is the asymmetry between the two: the database load on the left peaks proportionally to the number of concurrent readers; on the right, it is constant regardless of concurrency.
Cache sizing and eviction
A cache is by definition smaller than the source it caches. When the cache is full, something has to give. The eviction policy decides what.
LRU (least recently used). Evict the entry that has not been read for the longest time. The default for most caches; matches the common case where recent reads predict near-future reads. Implemented natively in Redis (maxmemory-policy allkeys-lru) and Memcached.
LFU (least frequently used). Evict the entry with the fewest accesses. Better for workloads with hot keys that should stay regardless of recent access. Redis’s allkeys-lfu is the option.
Random. Evict a randomly-chosen entry. Surprisingly competitive with LRU in some workloads, and cheap to implement. Redis’s allkeys-random.
TTL-based. Evict whichever entry expires soonest. Useful when entries have meaningful TTLs and the cache should be kept fresh. Redis’s volatile-ttl is the option, and it considers only keys that have an expiry set.
The choice rarely matters in the steady state, but the wrong choice produces correlated evictions that look like cache failures. A workload with hot keys evicted under naive LRU because of a brief flood of cold-key reads is a recurring source of “the cache is degraded” alerts.
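The hot-key failure is easy to reproduce with a textbook LRU. The sketch below uses `collections.OrderedDict` as an exact LRU, which is an assumption made for clarity; Redis approximates LRU by sampling, so its behaviour differs in detail.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Exact LRU: the entry that was touched least recently is evicted first."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._entries: OrderedDict[str, object] = OrderedDict()

    def get(self, key: str) -> Optional[object]:
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)          # a read counts as recent use
        return self._entries[key]

    def set(self, key: str, value: object) -> None:
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)   # drop the least recently used


# A flood of cold keys pushes out a hot key that was not read during the flood.
cache = LRUCache(capacity=3)
cache.set("hot", "homepage")
for cold_key in ("cold-1", "cold-2", "cold-3"):
    cache.set(cold_key, "rarely read")
hot_survived = cache.get("hot") is not None     # False: the hot key was evicted
```

An LFU policy would have kept `hot` if its access count had been higher than the cold keys' counts. That is the case the paragraph above describes.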
A worked example
A typical e-commerce product page illustrates how the layers come together.
The HTML scaffold (header, footer, navigation, JavaScript bundles) is served by CloudFront from edge locations, with a TTL of an hour and an explicit invalidation issued when a deployment ships.
The product data (name, description, image URLs, base price) is fetched by the application from Redis, which holds the product row keyed by product ID with a 60-second TTL. On a cache miss, the application queries Postgres, populates Redis under a single-flight lock, and returns. When a merchandiser updates the product description, the publishing pipeline writes to Postgres and explicitly invalidates the Redis key.
The pricing (which depends on the user’s location, currency, and active promotions) is too dynamic to cache the result, but the inputs (the promotion list, the currency rate) are themselves cached at the application tier with shorter TTLs and explicit invalidation when a promotion is activated.
The reviews (long tail of read traffic, slowly changing content) are cached at both the application tier and the CDN tier, with the CDN serving the bulk of the traffic and the application tier acting as a backstop for cache misses.
The inventory (count of remaining units) is read directly from Postgres without caching, because the staleness cost is too high (overselling is more expensive than the database load) and the read pattern is moderate. A different team might cache it with a 1-second TTL, accepting brief inaccuracy as a price for reduced load.
The shape of the example is the shape of every real caching architecture: different keys at different layers with different policies, chosen per use case based on the staleness cost, the read load, and the cost of a miss.
What this lesson sets up
Module 9 is about cost optimisation, and caching is one of its largest levers. A team that caches well pays for less database capacity, less compute, less network egress, and less latency-sensitive infrastructure overall. The next lessons in Module 9 cover storage layout, query optimisation, and the FinOps discipline that makes cost a first-class engineering concern. Caching is the first of those levers, the most architecturally visible, and the one most likely to compound across years of operation.
Citations and further reading
- Andrei Vattani, Flavio Chierichetti, Keegan Lowenstein, “Optimal Probabilistic Cache Stampede Prevention”, PVLDB 2015, http://www.vldb.org/pvldb/vol8/p886-vattani.pdf (retrieved 2026-05-01). The paper that formalised the probabilistic early-refresh approach.
- Redis documentation, “Eviction policies”, https://redis.io/docs/latest/operate/oss_and_stack/management/config/ (retrieved 2026-05-01). The configuration reference for the policies discussed above.
- Redis documentation, “Distributed locks with Redis”, https://redis.io/docs/latest/develop/use/patterns/distributed-locks/ (retrieved 2026-05-01). The reference for the single-flight pattern at the Redis level.
- Memcached wiki, https://github.com/memcached/memcached/wiki (retrieved 2026-05-01). For the simpler cousin to Redis and the still-relevant baseline for raw key-value caching.
- AWS, “ElastiCache for Redis caching strategies”, https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/Strategies.html (retrieved 2026-05-01). A practical walkthrough of the four canonical patterns in an AWS-managed context.
- Cloudflare blog, “Cache stampede protection”, https://blog.cloudflare.com/ (retrieved 2026-05-01). Vendor-perspective writing on stale-while-revalidate and edge-cache patterns.
- Phil Karlton’s “two hard things” quote is widely attributed; the original Netscape-era context is documented in the c2 wiki and Martin Fowler’s bliki, https://www.martinfowler.com/bliki/TwoHardThings.html (retrieved 2026-05-01).
- Michael Nygard, “Release It!”, second edition (Pragmatic Bookshelf, 2018). The stampede pattern is covered under the broader “stability patterns” treatment, alongside the bulkhead and circuit-breaker patterns relevant to lesson 69.