Python, from the ground up Lesson 4 / 60

Iterators, generators, comprehensions — pulling them apart

Three related concepts that most Python writers blur together. The differences matter when memory matters and when laziness matters.

If you’ve written Python for more than a week, you’ve used all three of these. You’ve written for x in something:. You’ve written [x*2 for x in xs]. You’ve maybe even written yield. And if someone asked you “what’s the difference between an iterable, an iterator, a generator, and a comprehension,” you’d probably do what most people do — squint, mumble something about laziness, and change the subject.

That’s fine for casual code. It stops being fine the day you have to stream a 50 GB CSV through a memory-constrained container, or debug a generator that mysteriously runs twice with no results the second time. The four words mean four different things. Today we pull them apart.

Iterable vs iterator: the two-sentence version

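An iterable is anything you can put after in in a for loop: lists, tuples, dicts, sets, strings, ranges, custom classes that implement __iter__. Containers like these don’t track position themselves; you can loop over them as many times as you want. (Files are iterable too, but a file object is its own iterator, so it behaves like the single-use thing described next.)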

An iterator is the stateful, single-use thing that actually walks through an iterable. It tracks where it is. When it runs out, it raises StopIteration. Once exhausted, it’s done — restart it by getting a fresh one.

xs: list[int] = [10, 20, 30]      # iterable

it = iter(xs)                      # iterator, freshly born
print(next(it))                    # 10
print(next(it))                    # 20
print(next(it))                    # 30
print(next(it))                    # raises StopIteration

for x in xs: is sugar for “call iter(xs) once, then call next() on it in a loop until StopIteration, then stop.” The list xs is the iterable. The thing returned by iter(xs) is the iterator. They are not the same object, and the distinction is what lets you loop over the same list twice without it being “used up.”
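
Spelled out by hand, that sugar looks roughly like the loop below. This is a sketch of the behaviour, not the interpreter’s actual implementation; the name seen is only there so there is something to inspect afterwards.

```python
xs = [10, 20, 30]

# What `for x in xs: seen.append(x)` does, written out by hand.
seen = []
it = iter(xs)                # ask the iterable for a fresh iterator
while True:
    try:
        x = next(it)         # pull the next value
    except StopIteration:    # the iterator is exhausted
        break
    seen.append(x)

print(seen)                  # [10, 20, 30]
print(it is xs)              # False: the iterator is a separate object
```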

A subtle gotcha: an iterator is also an iterable (it has __iter__, which returns itself). That’s why for x in some_generator: works. But you can only walk it once:

g = (x*2 for x in [1, 2, 3])       # generator expression
print(list(g))                     # [2, 4, 6]
print(list(g))                     # [] — already exhausted

If you’ve ever written a function that returns a generator and then tried to iterate over the result twice, this is the bug.
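
Here is a small reproduction of that bug, assuming a made-up helper called doubled that returns a generator. The second pass finds nothing, and nothing warns you about it.

```python
def doubled(xs):
    # Hypothetical helper: returns a generator, not a list.
    return (x * 2 for x in xs)


result = doubled([1, 2, 3])
print(iter(result) is result)          # True: a generator is its own iterator

total = sum(result)                    # first pass consumes every value
largest = max(result, default=None)    # second pass finds nothing left
print(total, largest)                  # 12 None

# Two ways out: materialise once, or ask for a fresh generator.
values = list(doubled([1, 2, 3]))
print(sum(values), max(values))        # 12 6
```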

The iterator protocol on custom classes

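If you want your own class to work in a for loop, it needs an __iter__ method that returns an iterator, and that iterator needs a __next__ method (plus an __iter__ of its own that returns self):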

from typing import Iterator


class CountDown:
    def __init__(self, start: int) -> None:
        self.start = start

    def __iter__(self) -> Iterator[int]:
        # Return a fresh iterator each time — this is what makes
        # CountDown itself an *iterable*, not an iterator.
        return CountDownIterator(self.start)


class CountDownIterator:
    def __init__(self, current: int) -> None:
        self.current = current

    def __iter__(self) -> "CountDownIterator":
        return self

    def __next__(self) -> int:
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


for n in CountDown(3):
    print(n)
# 3, 2, 1

That is a lot of ceremony for “count down.” Reading it, you can see why yield exists.

Generators: iterators without the boilerplate

A generator is an iterator built from a function with yield in it. The interpreter does the __iter__/__next__/StopIteration plumbing for you.

from typing import Iterator


def count_down(start: int) -> Iterator[int]:
    while start > 0:
        yield start
        start -= 1


for n in count_down(3):
    print(n)
# 3, 2, 1

Same behaviour as the class above. About one-fifth the code. This is the reason iterator classes are rare in modern Python — you reach for one only when the state is genuinely complex (e.g., a tree traversal that needs to resume mid-recursion, or an iterator that has methods other than __next__).

When the function hits yield, it pauses. Local variables, the program counter, the stack — all frozen. The next next() call resumes from that exact point. When the function returns (or runs off the end), the generator raises StopIteration automatically.
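
To watch the pausing happen, you can record what the function body does between calls to next(). The generator two_steps and the events list are invented for this sketch.

```python
from typing import Iterator

events: list[str] = []


def two_steps() -> Iterator[int]:
    events.append("started")
    yield 1
    events.append("resumed after first yield")
    yield 2
    events.append("finished")


g = two_steps()
print(events)            # []: calling the function runs none of its body
print(next(g))           # 1
print(events)            # ['started']
print(next(g))           # 2
print(next(g, "done"))   # done: the default replaces StopIteration
print(events)            # ['started', 'resumed after first yield', 'finished']
```

Note that nothing runs until the first next(): creating the generator only sets up the frozen frame.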

yield from lets one generator delegate to another:

from typing import Iterator


def numbers() -> Iterator[int]:
    yield from range(3)        # 0, 1, 2
    yield from [10, 20, 30]    # 10, 20, 30
    yield 99                   # 99


print(list(numbers()))
# [0, 1, 2, 10, 20, 30, 99]

Without yield from you’d write for x in range(3): yield x — fine, but less direct.
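
Where yield from tends to earn its keep is recursion, because a generator can delegate to a fresh call of itself. A minimal sketch, with flatten as an illustrative name and nested lists as the assumed input:

```python
from typing import Any, Iterator


def flatten(items: list[Any]) -> Iterator[Any]:
    for item in items:
        if isinstance(item, list):
            yield from flatten(item)   # delegate to the recursive call
        else:
            yield item


flat = list(flatten([1, [2, [3, 4]], 5]))
print(flat)
# [1, 2, 3, 4, 5]
```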

Comprehensions: syntactic sugar for building things

A list comprehension builds a list with a compact loop:

xs: list[int] = [1, 2, 3, 4, 5]

doubled = [x * 2 for x in xs]
# [2, 4, 6, 8, 10]

evens_doubled = [x * 2 for x in xs if x % 2 == 0]
# [4, 8]

Equivalent to writing the loop and .append() calls by hand, but shorter and usually slightly faster, because CPython compiles the append into a dedicated bytecode instead of looking up and calling .append on every pass. The same syntax exists for sets and dicts:

unique_lengths = {len(s) for s in ["hi", "hello", "hey"]}
# {2, 3, 5} (sets are unordered, so the display order may differ)

word_lengths = {s: len(s) for s in ["hi", "hello", "hey"]}
# {'hi': 2, 'hello': 5, 'hey': 3}
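
For comparison, here is the hand-written loop that the evens_doubled list comprehension replaces. The name evens_doubled_loop is just for this side-by-side.

```python
xs = [1, 2, 3, 4, 5]

# Explicit loop with .append()
evens_doubled_loop = []
for x in xs:
    if x % 2 == 0:
        evens_doubled_loop.append(x * 2)

# The comprehension that says the same thing
evens_doubled = [x * 2 for x in xs if x % 2 == 0]

print(evens_doubled_loop == evens_doubled)   # True
```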

There is no tuple comprehension. The syntax (x*2 for x in xs) looks like one but isn’t — it’s a generator expression. To build a tuple from a comprehension you write tuple(x*2 for x in xs).
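
A quick check you can run to convince yourself, using types.GeneratorType from the standard library:

```python
import types

xs = [1, 2, 3]

gen = (x * 2 for x in xs)
print(isinstance(gen, types.GeneratorType))   # True
print(isinstance(gen, tuple))                 # False

as_tuple = tuple(x * 2 for x in xs)
print(as_tuple)                               # (2, 4, 6)
```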

The memory difference: list vs generator expression

This is the one practical thing that matters most.

# List comprehension — builds the whole list in memory
squares_list = [x * x for x in range(10_000_000)]
# Memory: roughly 80 MB for the list's 10 million pointers alone, plus the
# int objects they point to (a few hundred MB more in CPython). Allocated up front.

# Generator expression — builds nothing, yields on demand
squares_gen = (x * x for x in range(10_000_000))
# Memory: ~200 bytes. Just the generator object.

If you’re going to consume every element and want random access later, the list is fine. If you’re going to consume them once, in order, the generator is almost always the right call.
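
If you want to see the gap on your own machine, sys.getsizeof gives a rough picture. This sketch uses a smaller n so it runs instantly; the exact numbers depend on your Python version and platform, so no expected output is shown.

```python
import sys

n = 100_000  # smaller than the 10 million above so it runs instantly

squares_list = [x * x for x in range(n)]
squares_gen = (x * x for x in range(n))

# getsizeof reports the container only: for the list that is the
# array of pointers, not the int objects those pointers refer to.
list_bytes = sys.getsizeof(squares_list)
item_bytes = sum(sys.getsizeof(v) for v in squares_list)
gen_bytes = sys.getsizeof(squares_gen)

print(f"list container: {list_bytes:,} bytes")
print(f"int objects:    {item_bytes:,} bytes")
print(f"generator:      {gen_bytes:,} bytes")
```

The generator’s size stays the same whatever n is, because it holds only its frozen frame, not any results.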

The classic example: streaming a huge file.

from typing import Iterator


def numeric_columns(path: str, col: int) -> Iterator[float]:
    with open(path, encoding="utf-8") as f:
        next(f, None)                    # skip header (the default keeps an empty file from raising)
        for line in f:                   # files are iterators of lines
            parts = line.rstrip("\n").split(",")
            yield float(parts[col])


total = 0.0
count = 0
for value in numeric_columns("orders_50gb.csv", col=4):
    total += value
    count += 1

print(total / count if count else 0.0)

That program computes the average of one column across a 50 GB file while holding only one line (plus a small read buffer) in memory at a time. The file object itself is an iterator: for line in f: reads one line at a time. The generator function pipes those lines through a transformation, one at a time. Nothing materialises. This is the shape of every memory-bounded data pipeline you’ll ever write in Python.

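The same code with a list comprehension would try to hold every value from that column in one Python list before computing anything, and f.readlines() or f.read() would try to hold the whole 50 GB. In a memory-constrained container, either one is likely to end in an OOM kill long before the file is finished.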
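
The same shape scales to several stages, each a small generator that feeds the next. The sketch below is an assumption-laden toy: io.StringIO stands in for the big file, and the stage names (data_lines, column, non_empty_floats) are invented for illustration.

```python
import io
from typing import Iterable, Iterator

# Stand-in for a large file; any iterable of lines would do.
fake_file = io.StringIO("id,amount\n1,10.0\n2,\n3,30.0\n")


def data_lines(lines: Iterable[str]) -> Iterator[str]:
    it = iter(lines)
    next(it, None)                 # skip the header, if there is one
    yield from it


def column(lines: Iterable[str], col: int) -> Iterator[str]:
    for line in lines:
        yield line.rstrip("\n").split(",")[col]


def non_empty_floats(cells: Iterable[str]) -> Iterator[float]:
    for cell in cells:
        if cell:                   # drop blank cells
            yield float(cell)


amounts = non_empty_floats(column(data_lines(fake_file), col=1))
result = list(amounts)             # nothing is read until this line
print(result)                      # [10.0, 30.0]
```

Each value travels through all three stages before the next line is read, so adding a stage adds no buffering.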

itertools: the standard-library toolkit

itertools is a module of lazy, iterator-returning building blocks. A few I use weekly:

import itertools

# chain: concatenate iterables lazily
combined = itertools.chain([1, 2], [3, 4], [5])
# 1, 2, 3, 4, 5 — without ever building a combined list

# islice: slice an iterator without converting to a list
first_ten = list(itertools.islice(numeric_columns("huge.csv", 4), 10))
# Pulls exactly 10 values (the header plus 10 data lines), then stops.

# tee: split one iterator into N independent iterators
a, b = itertools.tee(numeric_columns("huge.csv", 4), 2)
# a and b can each be consumed independently — but tee buffers the
# values neither has consumed yet. If a races far ahead of b, you're
# back to holding most of the data in memory.

# groupby, accumulate, pairwise (3.10+), batched (3.12+)...

I won’t enumerate the whole module; help(itertools) is what you want when you need a tool. But once you see the pattern (everything is a lazy iterator, everything composes), you stop writing manual loops for things like “iterate in pairs” or “take every nth element.”
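
Those two examples can be sketched in a few lines each. The helper names pairs and every_nth are mine, not part of itertools; pairs follows the well-known tee-and-zip recipe, which itertools.pairwise replaces on Python 3.10+.

```python
import itertools
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def pairs(iterable: Iterable[T]) -> Iterator[Tuple[T, T]]:
    # Overlapping pairs: (a, b), (b, c), ... built from tee and zip.
    a, b = itertools.tee(iterable)
    next(b, None)                  # advance the second copy by one
    return zip(a, b)


def every_nth(iterable: Iterable[T], n: int) -> Iterator[T]:
    # islice with a step takes items 0, n, 2n, ...
    return itertools.islice(iterable, 0, None, n)


print(list(pairs([1, 2, 3, 4])))        # [(1, 2), (2, 3), (3, 4)]
print(list(every_nth(range(10), 3)))    # [0, 3, 6, 9]
```

Here tee is safe on memory because the two copies never drift more than one item apart.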

When to use which

A rough decision tree:

  • Need a finite collection you’ll index into or iterate multiple times? Use a list or a list comprehension.
  • Need to transform-and-iterate-once over something potentially huge? Generator expression or generator function.
  • Need behaviour beyond __next__ — say, an iterator with a reset() method or extra state? Write a class.
  • Composing standard transformations? itertools first, custom code only when nothing fits.

The mistake to avoid: writing [x for x in big_thing if cond] and then immediately iterating over it once. That’s a generator expression in disguise. Swap the square brackets for parentheses and save the memory.
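
A small before-and-after, with a made-up words list standing in for big_thing:

```python
words = ["alpha", "beta", "gamma", "delta"]

# Builds a throwaway list, then sums it.
total_with_list = sum([len(w) for w in words if "a" in w])

# Same result, no intermediate list. When the generator expression is
# the only argument to a call, the extra parentheses can be omitted.
total_lazy = sum(len(w) for w in words if "a" in w)

print(total_with_list, total_lazy)   # 19 19
```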

Common gotchas

Generators are single-use. If you need to walk the same data twice, either store it in a list or call the generator function twice (which gives you a fresh generator each time).
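
A third option, offered here as a common idiom rather than something the lesson’s earlier CountDown requires: write a class whose __iter__ is itself a generator function. Every loop then gets a fresh generator without you having to remember to call anything twice.

```python
from typing import Iterator


class CountDown:
    def __init__(self, start: int) -> None:
        self.start = start

    def __iter__(self) -> Iterator[int]:
        # Calling __iter__ creates a new generator, so every loop
        # starts again from self.start.
        n = self.start
        while n > 0:
            yield n
            n -= 1


cd = CountDown(3)
first = list(cd)
second = list(cd)
print(first)    # [3, 2, 1]
print(second)   # [3, 2, 1] again: CountDown is a reusable iterable
```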

Late binding in generator expressions. A list comprehension runs to completion on the spot, so it sees each value of its loop variable as it goes. A generator expression evaluates only its outermost iterable at creation time; everything else, including any name it borrows from an enclosing scope, is looked up when the generator actually runs. The classic trap:

gens = [(x * i for x in range(3)) for i in range(3)]
# Each generator captures `i` by reference. By the time you iterate,
# `i` is 2. All three generators yield the same multiples of 2.

If you actually want each generator to capture the current value of i, build it inside a factory function that takes i as a parameter (or a lambda with i=i as a default argument), so each generator gets its own binding.
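
Side by side, the trap and one possible repair look like this. The factory name multiples is invented for the sketch.

```python
from typing import Iterator

# The trap: every generator looks up `i` only when it finally runs.
gens = [(x * i for x in range(3)) for i in range(3)]
trapped = [list(g) for g in gens]
print(trapped)
# [[0, 2, 4], [0, 2, 4], [0, 2, 4]]


def multiples(i: int) -> Iterator[int]:
    # `i` is now a parameter, so each call gets its own binding.
    return (x * i for x in range(3))


fixed = [list(g) for g in [multiples(i) for i in range(3)]]
print(fixed)
# [[0, 0, 0], [0, 1, 2], [0, 2, 4]]
```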

StopIteration inside a generator no longer silently ends it. Under PEP 479 (opt-in via from __future__ import generator_stop in Python 3.5, the default since 3.7), a StopIteration that escapes from inside a generator body is converted to RuntimeError instead of quietly ending the iteration. Good change, but worth knowing if you read older code that relied on the old behaviour.
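
A minimal sketch of how this bites, assuming Python 3.7 or later. The function first_of_each is made up; the culprit is the bare next() call on a row that turns out to be empty.

```python
from typing import Iterator


def first_of_each(rows: list[list[int]]) -> Iterator[int]:
    for row in rows:
        # next() on an empty iterator raises StopIteration *inside*
        # the generator body.
        yield next(iter(row))


try:
    result = list(first_of_each([[1, 2], [], [3]]))
except RuntimeError as exc:
    result = None
    print(f"RuntimeError: {exc}")
```

The usual repair is to give next() a default value, or to catch StopIteration yourself and return explicitly.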

That’s iterators, generators, and comprehensions, no longer blurred together. Next lesson: decorators. The pattern that wraps half the third-party Python code you import.

