Data & System Architecture, from the ground up Lesson 26 / 80

Replication lag and read-after-write consistency

The user-saw-stale-data bug. Why it happens with async replication, and the patterns to prevent it: read-your-writes, sticky sessions, monotonic reads.

The previous lesson described leader/follower replication as the default for most production systems. It also mentioned that asynchronous replication is the default within that default: the leader does not wait for followers to confirm before acknowledging a write. The latency cost of waiting is too high for most workloads, and a few seconds of replication lag is usually invisible.

Usually. This lesson is about the times it is not invisible, the user-perceived bugs that result, and the patterns that fix them. The phenomenon has a name (replication lag) and the bug class has a name (read-after-write inconsistency, or more generally, the consistency anomalies the user sees when their reads land on a follower that is behind the leader). Both names describe the same thing from different angles. The lesson is small, well-known, and the place where a lot of teams discover that “eventual consistency” is not a slogan but a daily operational concern.

The bug, told as a story

You have a leader/follower Postgres setup. One leader, three read replicas, all asynchronous. The application reads from any replica via a load balancer, and writes go directly to the leader. Replication lag is normally under a hundred milliseconds.

A user posts a comment. The application sends the INSERT to the leader, which returns success in a few milliseconds. The application redirects the user to the comment thread. The page-load happens to land on a read replica that has not yet received the change from the leader. The page renders without the user’s comment.

The user is confused. Did the comment go through? They click “post” again. This time the page-load happens to hit a replica that has caught up, and now they see two copies of the same comment. Or three. Or, the second INSERT lands on the leader before the first replica caught up, and the user posts twice and sees once, and this is even more confusing.

This is replication lag, made visible. The system is behaving exactly as designed: the write is durable, the read replicas are working, the load balancer is doing its job. The user is the one who sees an inconsistency, and the user does not care about CAP, PACELC (lesson 11), or the consistency models from lesson 12. They care that they cannot see what they just wrote.

This is the bug. It is one of four user-visible anomalies that asynchronous replication can produce.

Four classes of consistency anomaly

Read-your-writes violation. The bug above. The user makes a write, then immediately reads, and does not see their own write. This is the most directly user-perceivable failure mode. People notice when the thing they just typed has vanished. They do not notice when a row they have never seen is a few hundred milliseconds out of date.

Monotonic-reads violation. The user reads, sees X. They refresh. They see an older state, X minus one. This happens when two consecutive reads land on different replicas, and the second one is further behind the first. From the user’s perspective, time appears to run backwards. Not as immediately visible as the first anomaly, but disorienting when it happens, and especially bad in dashboards or activity feeds where stale-after-fresh looks like a glitch.

Causal violation. A user posts a question, and a friend posts an answer. Another user, watching the thread, sees the answer before they see the question. This happens because the question and answer were written at different times, replicated independently, and reached the watcher’s read replica out of order. Causality is broken: B appears to precede A, even though A caused B. Common in messaging, comment threads, and collaborative documents.

Stale data persisting at scale. A read replica falls minutes behind because of a long-running query, a network blip, or a load spike on the leader. Every user routed to that replica gets stale data for the duration. The application is not visibly broken (no error), but every read is wrong. The team finds out from a bug report or, worse, from a metrics dashboard that started lying to them.

The anomalies are listed in roughly increasing order of subtlety and decreasing order of how often the application gets blamed for them. Read-your-writes is the one users complain about. The other three are the ones that quietly erode trust in the system.

Read-your-writes guarantee

The simplest fix for the first anomaly is the bluntest. After a user writes, route their reads to the leader for some window of time. The leader has the write by definition; reads from it always see it.

sequenceDiagram
    participant U as User
    participant LB as Load balancer
    participant L as Leader
    participant F as Follower

    Note over U,F: Without read-your-writes (the bug)
    U->>LB: POST /comment
    LB->>L: INSERT comment
    L-->>LB: success
    LB-->>U: 200 OK
    U->>LB: GET /thread
    LB->>F: SELECT comments
    F-->>LB: stale list (no new comment)
    LB-->>U: missing comment

    Note over U,F: With read-your-writes (the fix)
    U->>LB: POST /comment
    LB->>L: INSERT comment
    L-->>LB: success
    LB-->>U: 200 OK (set sticky flag)
    U->>LB: GET /thread (sticky flag)
    LB->>L: SELECT comments
    L-->>LB: fresh list (with new comment)
    LB-->>U: comment present

The “window of time” question is what the implementation hinges on. Three flavours.

Time-based window. After a write, route reads to the leader for the next thirty seconds (or whatever value comfortably exceeds the worst-case replication lag). After the window, fall back to followers. Easy to implement, but every replica failure or slow link extends the worst-case lag, and your window has to be set conservatively.

Tracked-replica window. Each user’s session stores a token: the log sequence number, GTID, or timestamp of their most recent write. On each read, the application picks a replica that has caught up to at least that token, falling back to the leader if no follower has. Postgres exposes this through pg_last_wal_replay_lsn() and clients can pass Sync requests; MySQL has WAIT_FOR_EXECUTED_GTID_SET. More precise than time-based, more complex to implement, and worth it for systems where read load on the leader is a real cost.

Per-user-and-resource window. Track on a per-resource basis: the user only needs read-your-writes consistency on the things they wrote. Reading other people’s data tolerates the normal replication lag. This requires the application to know which reads are post-write and which are not, which usually maps to “reads from the same controller action that just did a write are routed to the leader, other reads are not”. Pragmatic and effective.

Sticky sessions

If a user is consistently routed to the same replica throughout their session, they get monotonic reads for free: the replica only moves forward in time, so consecutive reads from it cannot go backwards. This is “sticky sessions”: the load balancer or the application pins a user to one replica.

The trade-off is clear. Pinning to one replica breaks load balancing: if a popular user is pinned to a small replica, that replica gets overloaded. If a replica fails, every user pinned to it has to be re-routed, possibly to a replica that is further behind, possibly producing the monotonic-read violation you were trying to avoid. Sticky sessions also do not help across logout/login, across devices, or across long enough sessions that a single replica’s lag drifts.

In practice, sticky sessions are often combined with a tracked-replica token: pin where possible, and when re-routing is necessary, only re-route to a replica that has caught up to the user’s last-seen log position.

Monotonic-read guarantee

If you cannot or do not want to pin to a single replica, you can still get monotonic reads by ensuring each read is served from a replica at least as up-to-date as the previous one. The session tracks the most-recent log position the user has seen on any read; subsequent reads only go to replicas that have caught up to that position. If no replica has, the read goes to the leader.

This is a generalisation of the tracked-replica read-your-writes pattern: instead of tracking only “what the user wrote,” it tracks “what the user has seen”. The implementation cost is similar; the consistency property is stronger.

Causal consistency

Read-your-writes and monotonic reads each address one anomaly. Causal consistency addresses the third: ensuring that if write A causally precedes write B, every reader sees A before B. The mechanism is more involved.

The classical implementation is vector clocks: each replica tracks a vector of counters, one per other replica, that records how many writes from each replica it has applied. A write from one replica carries a vector clock; another replica only applies the write once its own vector dominates the dependencies. This guarantees causal order across the whole system but is expensive in metadata: every record carries a vector that grows with the number of replicas.

Most production systems do something cheaper. They track causal dependencies only within a session (the reads and writes one user has seen), pass them as a token, and ensure each read sees all causally-prior writes from that session. This is the pattern in MongoDB causal-consistent reads (where the client passes a clusterTime token between operations), in CockroachDB, and in similar systems. It is weaker than full causal consistency (cross-session causal links can still be violated), but it covers most practical cases for a fraction of the cost.

The cost of every guarantee

Each consistency property you add buys back some user-visible correctness at a measurable cost.

Read-your-writes sends some reads to the leader, reducing the read-scaling benefit of replicas. If a popular page does a write-then-read pattern, that page’s reads all hit the leader, and you have effectively unscaled them.

Sticky sessions break the load balancer’s ability to even out load and complicate replica failure handling. They also do not survive cross-device sessions cleanly.

Monotonic-read tokens add complexity to every read path: the application has to know the token, the load balancer has to filter replicas by it, and the data store has to expose log positions. Workable, but it is more moving parts.

Causal-consistency tokens add metadata to requests and responses, which costs bandwidth. They also require the application to thread the token through every call, including across services if the architecture is more than one service deep.

You do not pay all these costs everywhere. The reasonable engineering practice is to pay them on the paths where the bug would be most visible, and accept the default lag on the paths where it would not.

What “good enough” looks like in 2026

Most applications can tolerate a few seconds of replication lag for most reads. The user is not refreshing fast enough to notice. The data is not changing fast enough to be ambiguous. The application is implicitly relying on this, and that is fine.

The places where you should not rely on it are the ones where the user has a mental model that the system is supposed to satisfy. Three useful heuristics.

The user just saved their profile and the next page loads their profile? Read from the leader. The user just posted a comment and the comment thread loads? Read from the leader. The user just placed an order and is being shown the order summary? Read from the leader. These are all the read-your-writes pattern, applied selectively to the highest-visibility paths.

The user is browsing a feed of other people’s content? Read from a follower. A few seconds of staleness is invisible. The user is on a dashboard that updates every minute? Read from a follower. Staleness is part of the contract. Internal-only reads, like an admin tool that runs reports, can tolerate seconds or minutes of lag without anyone noticing.

The thing that hurts is user-perceived staleness: when the user has a model of what the data should be, and the system fails to match it. The thing that does not hurt is invisible staleness: when nobody is looking closely enough at any specific value to notice it is a second or two behind.

Engineering replication-lag mitigations is mostly about figuring out which reads are user-perceived and which are not, and applying the costlier patterns only to the former. Get this distinction right and the system feels strongly consistent for free, on the paths the user cares about, while keeping the read-scaling benefits of replicas everywhere else.

What the next lesson covers

Replication keeps multiple copies of the same data on different machines. Partitioning, the topic of lesson 27, splits the data so each machine holds a different subset. Most real systems do both: each partition is replicated, each replica is a leader or follower for its partition’s data. The patterns from this lesson and the next compose, and the resulting four-quadrant design (replicated and partitioned) is the default for any database that operates at significant scale.

Citations and further reading

  • Martin Kleppmann, Designing Data-Intensive Applications (O’Reilly, 2017), Chapter 5, “Problems with Replication Lag”. The reference treatment, with the same anomaly taxonomy used here.
  • Postgres documentation, “Hot Standby” and pg_last_wal_replay_lsn(), https://www.postgresql.org/docs/current/hot-standby.html (retrieved 2026-05-01). The mechanics of streaming replication and how to track replica catch-up.
  • MongoDB documentation, “Causal Consistency and Read and Write Concerns”, https://www.mongodb.com/docs/manual/core/causal-consistency-read-write-concerns/ (retrieved 2026-05-01). A worked example of causal-consistency tokens at the client/session level.
  • AWS RDS documentation, “Working with Read Replicas”, https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ReadRepl.html (retrieved 2026-05-01). The operational view of read replicas, replication lag metrics, and failover.
  • Doug Terry et al., “Session Guarantees for Weakly Consistent Replicated Data”, IEEE PDIS 1994. The original paper that named read-your-writes, monotonic reads, monotonic writes, and writes-follow-reads.
Search