Data & System Architecture, from the ground up Lesson 18 / 80

Key-value stores: Redis, DynamoDB, when they win

Pure speed, pure simplicity. The use cases where a key-value store is the right answer: caching, sessions, rate limits, leaderboards.

The previous lesson made the case for Postgres as the boring correct default. This lesson is about the family of stores that beats Postgres at one specific game and loses every other game on the table: the key-value store. If you understand exactly what they trade, you will know when to reach for one and when reaching for one is a mistake you will pay for later.

The pitch in one sentence: a key-value store gives you O(1) lookups by a primary key, very high throughput, and very low operational complexity, in exchange for giving up joins, ad-hoc queries by non-key columns, and most of what makes SQL useful. When that trade matches your workload, the result is genuinely magical. When it does not, you will reinvent SQL on top of it, badly.

The data model

A key-value store maps keys to values. That is the whole model. Everything else is implementation detail.

Keys are usually strings, sometimes binary blobs, occasionally compound (a partition key plus a sort key, in DynamoDB’s vocabulary). Values are where engines diverge. Some treat values as opaque bytes (Memcached, the original Dynamo). Others give the value internal structure: DynamoDB items are sets of typed attributes, and Redis values can be strings, integers, lists, sets, sorted sets, hashes, streams, bitmaps, hyperloglogs, geospatial indexes. The richer the value, the more useful patterns you can express without round-tripping data to the application.

What you cannot do, in any of them: query by anything other than the key (no WHERE status = 'pending' unless you separately maintain an index keyed on status); join two keys (you stitch in the application); run ad-hoc queries you did not anticipate (access patterns are baked into the schema); run multi-key transactions across the dataset, except in narrow forms.

What you get in return: very low latency at very high throughput (Redis typically answers in well under a millisecond and comfortably handles hundreds of thousands of ops per second per node; DynamoDB holds single-digit-millisecond latency at any scale you can pay for); operational simplicity (the constrained data model means few moving parts); and linear horizontal scaling, in many cases automatic, because there are no cross-shard joins to worry about.

Redis: the in-memory swiss army knife

Redis (REmote DIctionary Server) was released in 2009 by Salvatore Sanfilippo. It is, mostly, an in-memory data structure server. Persistence is optional and configurable: snapshot-based RDB, append-only AOF, or both. The trade is that you keep the working set in RAM and pay for the RAM, in exchange for the speed of not touching disk.

The killer feature is the value-side richness. Redis is not just a key-value store; it is a key-to-data-structure store, and the data structures are the ones you actually want.

Caching is the canonical use case. The cache-aside pattern (lesson 70 covers it properly) puts Redis in front of Postgres: the application asks Redis first, falls back to Postgres on a miss, populates Redis on the way back. A well-tuned cache routinely takes 95% or more of the read traffic, multiplying the effective capacity of your Postgres instance by an order of magnitude. The hard parts are TTLs, invalidation, and key shape.
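The read path described above can be sketched in a few lines. This is a minimal illustration, not a production client: TTLCache is a hypothetical in-memory stand-in for the Redis GET, SET with EX, and DEL commands, and cache_aside_get and load_from_db are names invented for this example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical in-memory stand-in for Redis GET / SET ... EX / DEL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expire lazily on read
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._data[key] = (value, self._clock() + ttl_seconds)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


def cache_aside_get(
    cache: TTLCache,
    key: str,
    load_from_db: Callable[[str], str],
    ttl_seconds: float = 60.0,
) -> str:
    """Ask the cache first, fall back to the database, populate on the way back."""
    cached = cache.get(key)
    if cached is not None:
        return cached                      # hit: the database is never touched
    value = load_from_db(key)              # miss: go to the system of record
    cache.set(key, value, ttl_seconds)     # populate so the next read is a hit
    return value
```

On a write, the application would update the database and then call cache.delete(key) so the next read repopulates the entry. The arithmetic behind the capacity claim is simple: at a 95% hit rate only 1 read in 20 reaches the database.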

Session storage lets multiple stateless web servers serve a logged-in user without sticky load balancing. The session is a Redis hash keyed on a session ID; the application reads and writes attributes in O(1). Most modern web frameworks ship a Redis session backend out of the box.

Rate limiting is one counter per user per window. The fixed-window pattern is INCR rate:user:42:minute:1714521600 followed by EXPIRE. Sliding-window uses a sorted set with timestamp scores. Token-bucket uses a small atomic Lua script. Rate-limiting libraries for every popular language ship with a Redis backend.
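A minimal sketch of the fixed-window variant, assuming a 60-second window and using a plain dictionary in place of Redis. The class name FixedWindowLimiter and its methods are hypothetical; the key shape follows the rate:user:42:minute:1714521600 example above.

```python
import time
from typing import Callable, Dict

WINDOW_SECONDS = 60  # assumption: one-minute windows, as in the key name


class FixedWindowLimiter:
    """Models INCR on one counter per user per window (in-memory stand-in)."""

    def __init__(self, limit: int, clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self._clock = clock
        self._counters: Dict[str, int] = {}

    def key_for(self, user_id: str) -> str:
        now = int(self._clock())
        window_start = now - (now % WINDOW_SECONDS)  # round down to the minute
        return f"rate:user:{user_id}:minute:{window_start}"

    def allow(self, user_id: str) -> bool:
        key = self.key_for(user_id)
        count = self._counters.get(key, 0) + 1  # the INCR step
        self._counters[key] = count
        # With Redis you would also EXPIRE the key so old windows are
        # reclaimed; this sketch never deletes old counters.
        return count <= self.limit
```

The known weakness of fixed windows is the boundary: a client can spend its full allowance at the end of one window and again at the start of the next, which is the motivation for the sliding-window and token-bucket variants.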

Leaderboards are the application that sold sorted sets to a generation of game developers. A sorted set is unique members with numeric scores, kept sorted, with O(log N) insertion and O(log N + M) range reads. ZADD leaderboard:weekly 1234 player_42 and ZREVRANGE leaderboard:weekly 0 9 give you the top ten without scanning anything (on Redis 6.2 and later, ZREVRANGE is deprecated in favor of ZRANGE leaderboard:weekly 0 9 REV). No SQL solution comes close on throughput.
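To make the sorted-set semantics concrete, here is a small in-memory model. It is a sketch of the behaviour only: the SortedSet class is hypothetical, it keeps a sorted Python list, so insertion is O(N) rather than the O(log N) Redis achieves, and negative indexes are not supported.

```python
import bisect
from typing import Dict, List, Tuple


class SortedSet:
    """Unique members with numeric scores, kept sorted (models ZADD / ZREVRANGE)."""

    def __init__(self) -> None:
        self._scores: Dict[str, float] = {}
        self._ordered: List[Tuple[float, str]] = []  # ascending (score, member)

    def zadd(self, member: str, score: float) -> None:
        old = self._scores.get(member)
        if old is not None:
            # Members are unique: drop the previous entry before re-inserting.
            index = bisect.bisect_left(self._ordered, (old, member))
            del self._ordered[index]
        self._scores[member] = score
        bisect.insort(self._ordered, (score, member))

    def zrevrange(self, start: int, stop: int) -> List[Tuple[str, float]]:
        """Highest scores first; start and stop are inclusive ranks."""
        n = len(self._ordered)
        high = n - start
        if high <= 0:
            return []
        low = max(n - 1 - stop, 0)
        return [(member, score) for score, member in reversed(self._ordered[low:high])]


board = SortedSet()
board.zadd("player_42", 1234)
board.zadd("player_7", 900)
board.zadd("player_13", 2000)
top_two = board.zrevrange(0, 1)  # ranks 0 and 1, highest score first
```

Re-adding an existing member replaces its score instead of creating a second entry, which is what makes the structure suitable for a leaderboard.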

Pub/sub messaging is built in. PUBLISH channel "message" and any subscriber on that channel gets it. Fire-and-forget, no persistence: subscribers offline at publish time miss the message. Fine for low-volume in-system notifications (cache invalidations, lightweight event listeners between services). For durability, ordering, or replay, use Kafka or a real broker (Module 4).
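The fire-and-forget behaviour is easy to underestimate, so here is a toy model of it. The PubSub class is a hypothetical in-memory stand-in, not a Redis client; like the real PUBLISH command, publish returns the number of subscribers that received the message.

```python
from typing import Callable, Dict, List

Handler = Callable[[str], None]


class PubSub:
    """Toy model of fire-and-forget delivery: nothing is stored for later."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = {}

    def subscribe(self, channel: str, handler: Handler) -> None:
        self._subscribers.setdefault(channel, []).append(handler)

    def publish(self, channel: str, message: str) -> int:
        handlers = self._subscribers.get(channel, [])
        for handler in list(handlers):
            handler(message)  # delivered only to whoever is subscribed right now
        return len(handlers)


bus = PubSub()
received: List[str] = []

delivered_before = bus.publish("cache-invalidation", "user:42")  # nobody listening
bus.subscribe("cache-invalidation", received.append)
delivered_after = bus.publish("cache-invalidation", "user:43")
```

After this runs, received holds only the second message. The first was published to zero subscribers and is gone, which is exactly the case that a durable broker exists to cover.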

Distributed locks are possible, with caveats. The simple pattern (SET lock:thing value NX EX 30) is a single-node mutex with a timeout. Redlock extends this across several independent Redis nodes, treating the lock as held once a majority of them grant it, though Martin Kleppmann’s 2016 critique argues it is not safe under all failure modes. Practical advice: do not use Redis locks for correctness-critical exclusion (use a real consensus system); they are fine for “let’s not have two cron jobs run at once.”
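The single-node pattern can be modelled as follows. This is a sketch under stated assumptions: SingleNodeLock is a hypothetical in-memory stand-in for SET ... NX EX, and the random token with a compare-before-delete release is the commonly recommended refinement (done atomically with a small Lua script in Redis), not something the SET command gives you by itself.

```python
import time
import uuid
from typing import Callable, Dict, Optional, Tuple


class SingleNodeLock:
    """Models SET name token NX EX ttl plus a token-checked release."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._locks: Dict[str, Tuple[str, float]] = {}  # name -> (token, expires_at)

    def acquire(self, name: str, ttl_seconds: float) -> Optional[str]:
        now = self._clock()
        current = self._locks.get(name)
        if current is not None and current[1] > now:
            return None  # NX: the key exists and has not expired
        token = uuid.uuid4().hex  # random value identifying this holder
        self._locks[name] = (token, now + ttl_seconds)
        return token

    def release(self, name: str, token: str) -> bool:
        current = self._locks.get(name)
        if current is None or current[1] <= self._clock():
            return False  # nothing held, or our lock already timed out
        if current[0] != token:
            return False  # someone else holds it now: do not delete their lock
        del self._locks[name]
        return True
```

The token check matters because of the timeout: if a job runs longer than the TTL, the lock expires, another worker acquires it, and an unconditional delete by the slow job would release a lock it no longer owns. Note that the slow job still ran concurrently with the new holder, which is why this is unsuitable for correctness-critical exclusion.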

Streams (Redis 5+) are an append-only log resembling a poor person’s Kafka, with consumer groups and at-least-once semantics. Reasonable for low-to-medium-volume eventing inside a system.

The Redis weakness is the in-memory part: your dataset has to fit in RAM, or in the aggregate RAM of a cluster. If your hot working set is under a few hundred GB, fine. Multiple terabytes, wrong tool.

DynamoDB: the cloud-native key-value store

Amazon released DynamoDB in 2012 as a managed service, but the lineage goes back to the 2007 Dynamo paper by Giuseppe DeCandia, Werner Vogels, and colleagues at Amazon. The paper described an internal Amazon system designed for the shopping-cart workload: very high availability, single-digit-millisecond latency, partition tolerance through consistent hashing, conflict resolution at read time. The published DynamoDB product is the descendant.

The data model is a table of items. Each item is a document (a set of named attributes) addressed by a primary key. The primary key is either a single partition key, or a partition key plus a sort key. Hash on the partition key, sort within the partition by the sort key. Within a single partition you can do range queries on the sort key, which is the one place DynamoDB lets you do something more than O(1) lookup without explicit indexes.

The pricing model shapes how you use it. You pay per request (or per provisioned read/write capacity unit if you choose that mode) and per GB stored. Reads can be eventually consistent (cheaper) or strongly consistent (twice the read capacity). Writes offer no such choice: a write is durable once acknowledged, and the consistency setting only controls how soon a read is guaranteed to see it. The pricing makes you care about access patterns in a way that Postgres pricing does not: every “let me just scan the table” query is a real bill.
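A rough way to see why scans hurt is to count read units. The sketch below assumes the rule as I understand it from the DynamoDB Developer Guide: one read unit covers a strongly consistent read of up to 4 KB, an eventually consistent read costs half as much, and a Query or Scan is charged on the total bytes it reads, rounded up to the next 4 KB. The function name and the example sizes are invented; check the current documentation before using these numbers for budgeting.

```python
import math

READ_UNIT_BYTES = 4 * 1024  # assumption: 4 KB per read unit


def read_units(total_bytes_read: int, strongly_consistent: bool) -> float:
    """Read units consumed by one Query or Scan that reads total_bytes_read."""
    units = math.ceil(total_bytes_read / READ_UNIT_BYTES)
    return float(units) if strongly_consistent else units / 2


ITEM_BYTES = 1024  # hypothetical 1 KB items

# Query: fetch one user's 20 items from a single partition.
query_cost = read_units(20 * ITEM_BYTES, strongly_consistent=False)

# Scan: read all 1,000,000 items to find the same 20.
scan_cost = read_units(1_000_000 * ITEM_BYTES, strongly_consistent=False)
```

Under these assumptions the query costs 2.5 read units and the scan costs 125,000, a factor of 50,000 for the same 20 items. A filter on a scan does not help, because the charge is for data read, not data returned.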

The famous pattern in the DynamoDB world is single-table design, popularized by Rick Houlihan and Alex DeBrie. The idea is that one table holds many entity types, with a denormalized schema where the partition key and sort key are designed to make all of your access patterns into single queries. A user’s profile, that user’s orders, the line items of those orders, all in one table, all retrievable with a single query. The schema looks bizarre to anyone trained on SQL (the partition key is something like USER#42 and the sort key is PROFILE or ORDER#2024-11-03), but it works because the access patterns were designed up front and the schema was reverse-engineered from them.
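The shape is easier to see with a toy model. The sketch below is an in-memory illustration, not the DynamoDB API: SingleTable, put_item, and query are hypothetical names, the PK and SK attribute names are a common convention rather than a requirement, and the prefix match stands in for a begins_with condition on the sort key.

```python
import bisect
from typing import Any, Dict, List, Tuple

Item = Dict[str, Any]


class SingleTable:
    """One table, many entity types: hash on PK, keep items sorted by SK."""

    def __init__(self) -> None:
        self._partitions: Dict[str, List[Tuple[str, Item]]] = {}

    def put_item(self, pk: str, sk: str, attributes: Item) -> None:
        rows = self._partitions.setdefault(pk, [])
        sort_keys = [existing_sk for existing_sk, _ in rows]
        index = bisect.bisect_left(sort_keys, sk)
        item = {"PK": pk, "SK": sk, **attributes}
        if index < len(rows) and rows[index][0] == sk:
            rows[index] = (sk, item)       # same primary key: overwrite
        else:
            rows.insert(index, (sk, item))  # keep the partition sorted by SK

    def query(self, pk: str, sk_prefix: str = "") -> List[Item]:
        """All items in one partition whose sort key starts with sk_prefix."""
        rows = self._partitions.get(pk, [])
        return [item for sk, item in rows if sk.startswith(sk_prefix)]


table = SingleTable()
table.put_item("USER#42", "PROFILE", {"name": "Ada"})
table.put_item("USER#42", "ORDER#2024-11-03", {"total": 30})
table.put_item("USER#42", "ORDER#2024-10-01", {"total": 12})
table.put_item("USER#7", "PROFILE", {"name": "Bob"})

everything_for_user = table.query("USER#42")       # profile and orders together
orders_only = table.query("USER#42", "ORDER#")     # just the orders, by date
```

Because ISO dates sort lexicographically, putting the date in the sort key returns orders in chronological order with no extra work. The same trick is why every access pattern has to be known before the keys are chosen.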

Single-table design is also where DynamoDB hurts the most when your access patterns change. Adding a new way to query the data usually means adding a Global Secondary Index (a separate copy of the data with a different partition key), which costs storage and write throughput, or it means a one-time backfill into a new shape. The flexibility you get for free in SQL (“just write a new query”) is paid for as a separate item on the bill in DynamoDB.

DynamoDB genuinely wins for workloads with very high write throughput and well-known access patterns (where Postgres-with-sharding would mean building your own sharding layer), for globally distributed applications (Global Tables give you multi-region active-active replication as a managed feature, though you pay for the replicated writes), for serverless stacks (Lambda plus API Gateway plus DynamoDB compose well, and per-request pricing matches spiky workloads), and for small teams where database operations are the bottleneck.

It is the wrong choice when access patterns will evolve (today’s single-table design is wrong for next year’s product, and migration is painful), for reporting and analytics (DynamoDB is not built for joins or scans; most teams export to a warehouse), or when the team has Postgres expertise and the workload would fit on Postgres anyway.

Other notable key-value stores

The KV space is wider than Redis and DynamoDB suggest. Memcached (2003) is the simpler, older counterpart to Redis: pure in-memory cache, no persistence, no rich data structures, still in heavy use at Facebook and Wikipedia. etcd is a consensus-backed (Raft) KV store, used by Kubernetes for cluster state, optimized for small strongly-consistent configuration data. Valkey is the post-fork Redis successor, started in 2024 after the Redis license change moved Redis away from open source; backed by AWS, Google, Oracle, and the Linux Foundation, it is the most likely future home of the open-source Redis tradition. RocksDB is an embedded KV engine used as the storage layer of many other databases.

When key-value stores are not enough

KV stores are great until you need a query they were not designed for. Queries by non-key fields (“find me all orders with status pending”) force you to maintain a separate index keyed on status, or to scan the dataset. Joins (“user details for everyone who placed an order in the last hour”) become “fetch orders, then fetch each user one at a time, then assemble”, an N+1 problem at the architecture level. Transactions across many keys are limited or unavailable: DynamoDB supports up to 100 items per transaction; in Redis Cluster, MULTI/EXEC only works when every key involved hashes to the same slot, and therefore lives on the same shard. Aggregations (“sum of orders per region per month”) force you to maintain pre-computed totals or stream the data elsewhere for analytics.
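The first workaround, a hand-maintained index, looks like this in miniature. The OrderStore class is a hypothetical in-memory sketch: the primary mapping plays the role of the key-value store, and the status index plays the role of a Redis set per status (something like orders:status:pending, a key name invented for this example).

```python
from typing import Dict, List, Set


class OrderStore:
    """Primary data keyed by order ID, plus a manual index keyed on status."""

    def __init__(self) -> None:
        self._orders: Dict[str, Dict[str, str]] = {}  # order_id -> record
        self._by_status: Dict[str, Set[str]] = {}     # status -> order IDs

    def put(self, order_id: str, user_id: str, status: str) -> None:
        previous = self._orders.get(order_id)
        if previous is not None:
            # The index is the application's job: remove the stale entry first.
            self._by_status[previous["status"]].discard(order_id)
        self._orders[order_id] = {"user_id": user_id, "status": status}
        self._by_status.setdefault(status, set()).add(order_id)

    def ids_with_status(self, status: str) -> List[str]:
        return sorted(self._by_status.get(status, set()))


store = OrderStore()
store.put("order:1", "user:42", "pending")
store.put("order:2", "user:7", "pending")
store.put("order:1", "user:42", "shipped")  # status change must update the index
pending = store.ids_with_status("pending")
```

Every write now touches two structures. In a real store those are separate keys, so the update either needs a transaction or risks an index that disagrees with the data, and each additional query shape repeats the exercise.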

Each of these can be worked around, but each workaround pushes complexity into the application. At some threshold, the pile of workarounds exceeds the complexity you were trying to avoid by not using SQL.

The cache-vs-primary distinction

The most common deployment shape in 2026 is Postgres as the system of record, Redis as a cache in front of it. Most teams do not need DynamoDB at all. The architecture looks like this:

flowchart LR
    Clients[Clients] --> API[API service]
    API -->|1 read| Redis[(Redis cache)]
    Redis -.->|2 miss| API
    API -->|3 fallback| PG[(Postgres)]
    PG -.->|4 row| API
    API -->|5 populate| Redis
    API -.->|6 response| Clients

The cache absorbs the read load. Postgres handles writes and cache misses. The cache has a TTL (seconds to minutes for hot data, longer for cold data), and the application invalidates entries on known changes. At no point in this shape is Redis the system of record: if Redis blows up, the application slows down but does not lose data; if Postgres blows up, you have a real outage.

The other shape (DynamoDB or Redis as the primary store) is a deliberate choice with deliberate tradeoffs: the workload genuinely fits the KV access pattern, operational simplicity is worth the loss of SQL, and the access patterns are stable enough that single-table design will not bite next quarter. Coherent, but less common than the discourse suggests.

Where this lesson lands

A key-value store is not a smaller relational database. It is a different shape, with different strengths, suitable for different workloads. Reach for one when the access pattern is genuinely “given this key, give me this value, fast, at high throughput, and I do not need to query it any other way.” Use Redis as a cache in almost every system; use DynamoDB as a primary store when its specific shape (cloud-native, single-table-design, well-known access patterns) is a real fit; and remember that “just put it in Redis” is not a complete architecture if you have not thought about persistence, consistency, and what happens when the cache is empty.

The next lesson covers the third major data-storage family: document stores, with MongoDB as the canonical example and a small history lesson about how the model fell out of fashion and then quietly came back.

Citations and further reading

  • Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, et al., “Dynamo: Amazon’s Highly Available Key-value Store”, SOSP 2007. The original Dynamo paper. Worth reading once for the consistency-vs-availability section, which is more honest than most modern marketing copy.
  • The Redis documentation, https://redis.io/docs/ (retrieved 2026-05-01). Especially the “Data types” pages and the patterns reference.
  • The DynamoDB Developer Guide, https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ (retrieved 2026-05-01). The “Best Practices” section is a master class in why single-table design works.
  • Alex DeBrie, “The DynamoDB Book” (2020). The reference text on DynamoDB modeling, including single-table design with worked examples.
  • Martin Kleppmann, “How to do distributed locking”, https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html (retrieved 2026-05-01). The Redlock critique. Read alongside Sanfilippo’s response for the full debate.