Module 6 has been building up the streaming model: the engines (lesson 41), the topologies (lesson 42), the state stores (lesson 43). All of that machinery was about how a stream processor moves and remembers data. This lesson is about a problem that the machinery alone does not solve, the one that derails most first attempts at a windowed aggregation: time.
There are two times in any streaming system, and they are not the same time. If you do not separate them in your head, the rest of streaming will quietly betray you. The bug is not loud. Your dashboard does not crash. It just shows numbers that look reasonable and are subtly wrong, and you find out three months later when finance reconciles them against the source of truth and asks polite questions.
The two times
Event time is when the event happened in the world. A user clicked a button at 10:00:00. A sensor read a temperature at 14:23:17. A payment cleared at 02:11:09. The timestamp is a property of the event itself, set by the device or service that produced it, embedded in the message payload.
Processing time is when the stream processor saw the event. The same click that happened at 10:00:00 might arrive at the processor at 10:00:03 because of normal network latency, at 10:00:30 because the broker had a small backlog, at 11:30:00 because the user’s phone was offline on the metro and the SDK buffered events for ninety minutes, or, in the genuinely pathological case, at 02:00 the next morning because a regional outage held a queue for hours and it drained when the alert finally fired.
The gap between event time and processing time is called skew. In a calm system the skew is seconds. In a real system, with mobile clients, retries, partition rebalancing, and the occasional mid-day broker incident, the skew has a long tail. The 99th percentile is minutes. The 99.9th is sometimes hours.
A windowed aggregation has to pick one of these times to bucket on, and the choice changes everything.
Why processing-time windows are tempting and wrong
Windowing by processing time is the easy default. Every event is bucketed by the wall clock at the moment the processor sees it. There is no late data, by definition: a window closes when the wall clock advances past its end, and any event that arrives after that is just put into a later window. The implementation is a few lines.
It is also wrong for any business-correct aggregation, and the reason is the skew. Imagine a “clicks per minute” dashboard. At 10:00:30 a regional CDN node hiccups and 20% of events are delayed by 90 seconds. They arrive at the processor between 10:02:00 and 10:02:15. In a processing-time window, those events are now counted in the 10:02 bucket, even though they happened in 10:00. The 10:00 bucket undercounts. The 10:02 bucket overcounts. Neither matches what users actually did.
For a dashboard that is roughly indicative, you might tolerate this. For billing, fraud detection, anomaly thresholds, A/B test outcomes, or anything you compare to a database aggregate, you cannot. The aggregates have to be in event time, because event time is the only time that means anything outside the streaming system.
Event-time windows and the completeness problem
Window by event time and the bucketing is correct: the click at 10:00:00 is always in the 10:00 bucket, no matter when it arrives. The problem moves elsewhere. When can you emit the result for the 10:00 bucket?
If you wait forever, you can be sure no late event will ever change the answer, but you never emit. If you emit at 10:01 the moment the wall clock leaves the window, you might be right and you might be missing 20% of the events because of that CDN hiccup. You need a notion of “we have probably seen all events with event time up to T” so that windows up to T can be closed with confidence.
That notion is a watermark.
Watermarks
A watermark is the streaming engine’s estimate of “the event-time clock has advanced to at least T.” It is not a guarantee. It is a heuristic, computed from the events flowing through the pipeline, that says: based on what we have seen, we believe no event with event time earlier than T will arrive from now on. Windows that ended before T can be safely closed and emitted.
The trade-off is direct. A tighter watermark, one that advances aggressively close to the latest event time seen, lets you emit results quickly. It also means more genuinely late events fall outside the watermark by the time they arrive, and have to be either dropped or routed to a special path. A looser watermark, one that lags behind by some configured tolerance, catches more late events inside the windows where they belong, at the cost of higher emit latency.
There is no universal right answer. A real-time fraud signal needs results in seconds and is willing to drop some late events. A nightly billing aggregate can lag by an hour and capture nearly everything. The watermark policy is a per-pipeline choice, and the right one depends on the consumer.
The major engines compute watermarks differently. Flink lets sources emit per-partition watermarks that the runtime combines: the operator’s effective watermark is the minimum of its inputs’ watermarks, so a slow partition holds the whole job back, which is correct but sometimes painful. Spark Structured Streaming uses withWatermark("event_time", "10 minutes"), a per-stream policy that says “the watermark lags 10 minutes behind the maximum event time seen.” The PySpark course in Module 9 covers the Spark-specific patterns in detail. Kafka Streams maintains a per-task timestamp derived from the records each task has consumed, and uses that to drive window emission and state cleanup.
The mechanics differ. The contract is the same: the watermark is the engine’s best guess at “we have probably seen everything up to here,” and you, the developer, configure how aggressive that guess is.
Late events and what to do with them
No matter how the watermark is configured, some events will arrive after their window has closed. There are three ways to handle them, and you usually combine.
Drop them. The simplest option. The engine logs that a late event was dropped, optionally exposes a metric, and moves on. This is correct when the use case can tolerate a small loss tail (real-time monitoring, telemetry that already has a batch reconciliation behind it) and incorrect when it cannot.
Side output. Most engines let you route late events to a separate stream rather than dropping them. Flink calls this a “side output.” Spark exposes it through a custom sink. The late stream can be persisted, batched, and processed by a slower pipeline that does not have to be real-time. This is the right pattern when late events are rare but important: a billing event that arrived three hours late still has to land in the right invoice, just not in the real-time dashboard.
Allowed lateness. The window stays open longer than the watermark would suggest. When a late event arrives, it joins the window’s state, and the engine emits an updated result. Downstream has to be ready to handle updates rather than a single final emission, and the state store has to keep the window’s data around for the allowed-lateness duration, so this option is the most expensive in memory and the most demanding for downstream consumers.
The choice falls out of two questions. How important is it that late events are reflected? How willing is the downstream sink to update results that it has already received? If the answers are “very” and “yes,” allowed lateness. If “very” and “no,” side output and a reconciling batch job. If “not very,” drop and metric.
A worked example
A click stream with a 1-minute tumbling window in event time. The watermark is configured to lag 30 seconds behind the maximum event time seen. Allowed lateness is zero: late events go to a side output.
A click happens at event time 10:00:15 and arrives at the processor at processing time 10:00:18. The processor places it in the 10:00 window. The watermark, which is at 10:00:15 minus 30 seconds, is at 09:59:45. The 10:00 window does not yet close.
Events for the rest of the 10:00 minute arrive normally. The maximum event time seen reaches 10:00:58 at processing time 10:01:01. The watermark is now at 10:00:28. Still inside the 10:00 window.
By processing time 10:01:32, events with event time up to 10:01:02 have been seen. The watermark is at 10:00:32. The 10:00 window’s end (exclusive) is 10:01:00, so the watermark has now passed it. The window emits its result and closes.
A click that happened at event time 10:00:45, but was buffered on a mobile client and arrives at processing time 10:01:35, lands after the watermark has passed 10:00:32 by three seconds. The 10:00 window has closed. The event is routed to the late side output.
A click with event time 10:00:50 that arrives at processing time 10:06:00, after a five-minute mobile-network outage, also lands in the side output. By that point the watermark is well past 10:00. The dashboard never sees this click. The side-output processor, running on a slower cadence, will eventually fold it into a reconciled total.
sequenceDiagram
participant Source
participant Processor
participant Window10 as Window 10:00 to 10:01
participant SideOut as Late side output
Source->>Processor: click(et=10:00:15) at pt=10:00:18
Processor->>Window10: add to state
Source->>Processor: click(et=10:00:58) at pt=10:01:01
Processor->>Window10: add to state
Note over Processor: watermark = max_et - 30s = 10:00:28
Source->>Processor: click(et=10:01:02) at pt=10:01:32
Note over Processor: watermark = 10:00:32 > 10:01:00? not yet
Source->>Processor: click(et=10:01:08) at pt=10:01:38
Note over Processor: watermark = 10:00:38, still less than 10:01:00
Source->>Processor: click(et=10:01:31) at pt=10:02:01
Note over Processor: watermark = 10:01:01 > 10:01:00, close 10:00 window
Processor->>Window10: emit result, close
Source->>Processor: click(et=10:00:45) at pt=10:01:35
Note over Processor: 10:00 window already closed
Processor->>SideOut: late event(et=10:00:45)
The discipline
Two rules cover most of the trouble.
The first: always include the event timestamp in the message. Every producer, in every service, embeds the time the event occurred. Not when the producer sends it. Not when the broker receives it. When the user clicked, the sensor read, the trade happened. If you control the producer, this is a one-line change. If you are reading from a third-party source that does not include event time, treat the ingestion timestamp as event time and document the limitation; downstream will see whatever skew the source had.
The second: decide the watermark policy before you build the dashboard, not after. Watermark choice is a product question. How fast do results need to appear? How tolerant is the consumer to late updates? What fraction of events arrive late, and how late? The answers determine whether you drop, side-output, or update, and they are easier to ask up front than to retrofit when the first inconsistency complaint arrives.
The lesson on exactly-once is next, and it shares a piece of DNA with this one: streaming is honest about the failure modes that batch hides. Watermarks expose the cost of taking time seriously. Exactly-once exposes the cost of taking duplicates seriously. Both are work. Both are unavoidable in any pipeline whose answers people use to make decisions.
Citations and further reading
- Tyler Akidau, Slava Chernyak, Reuven Lax, “Streaming Systems” (O’Reilly, 2018). The reference text on event time, watermarks, and the conceptual model behind Apache Beam. Retrieval 2026-05-01.
- Apache Flink documentation, “Event Time and Watermarks”,
https://nightlies.apache.org/flink/flink-docs-stable/docs/concepts/time/(retrieved 2026-05-01). - Spark documentation, “Structured Streaming Programming Guide: Handling Late Data and Watermarking”,
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html(retrieved 2026-05-01). - Tyler Akidau, “The world beyond batch: Streaming 101” and “Streaming 102”, O’Reilly Radar (retrieved 2026-05-01). The two essays that introduced the event-time and watermark vocabulary now used across the industry.