If you’ve written Python for more than a week, you’ve used all three of these. You’ve written for x in something:. You’ve written [x*2 for x in xs]. You’ve maybe even written yield. And if someone asked you “what’s the difference between an iterable, an iterator, a generator, and a comprehension,” you’d probably do what most people do — squint, mumble something about laziness, and change the subject.
That’s fine for casual code. It stops being fine the day you have to stream a 50 GB CSV through a memory-constrained container, or debug a generator that mysteriously runs twice with no results the second time. The four words mean four different things. Today we pull them apart.
Iterable vs iterator: the two-sentence version
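An iterable is anything you can put after the in keyword of a for loop: lists, tuples, dicts, sets, strings, files, ranges, and custom classes that implement __iter__. A container such as a list doesn’t track position itself, so you can loop over it as many times as you want. (Files are the exception in that list: a file object is its own iterator, so one pass uses it up.)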
An iterator is the stateful, single-use thing that actually walks through an iterable. It tracks where it is. When it runs out, it raises StopIteration. Once exhausted, it’s done — restart it by getting a fresh one.
xs: list[int] = [10, 20, 30] # iterable
it = iter(xs) # iterator, freshly born
print(next(it)) # 10
print(next(it)) # 20
print(next(it)) # 30
print(next(it)) # raises StopIteration
for x in xs: is sugar for “call iter(xs) once, then call next() on it in a loop until StopIteration, then stop.” The list xs is the iterable. The thing returned by iter(xs) is the iterator. They are not the same object, and the distinction is what lets you loop over the same list twice without it being “used up.”
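Spelled out by hand, that sugar looks roughly like the loop below. It’s a sketch of the idea rather than the interpreter’s literal implementation, and seen is just a name made up here to collect the values.

```python
xs = [10, 20, 30]
seen = []

it = iter(xs)          # the for statement calls iter() exactly once
while True:
    try:
        x = next(it)   # ...then next() for every pass through the loop
    except StopIteration:
        break          # exhaustion ends the loop; the exception never reaches you
    seen.append(x)     # the loop body goes here

print(seen)  # [10, 20, 30]
```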
A subtle gotcha: an iterator is also an iterable (it has __iter__, which returns itself). That’s why for x in some_generator: works. But you can only walk it once:
g = (x*2 for x in [1, 2, 3]) # generator expression
print(list(g)) # [2, 4, 6]
print(list(g)) # [] — already exhausted
If you’ve ever written a function that returns a generator and then tried to iterate over the result twice, this is the bug.
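Here is a minimal sketch of that bug and the two usual ways out. The helper doubled is hypothetical, invented only for this illustration.

```python
from typing import Iterator


def doubled(xs: list[int]) -> Iterator[int]:
    # Hypothetical helper: returns a generator, not a list.
    return (x * 2 for x in xs)


result = doubled([1, 2, 3])
assert iter(result) is result      # an iterator hands back itself

first_pass = list(result)          # [2, 4, 6]
second_pass = list(result)         # [] because the iterator is already spent

# Fix 1: materialise once, if the data fits in memory.
stored = list(doubled([1, 2, 3]))
# Fix 2: call the function again to get a fresh iterator.
fresh = list(doubled([1, 2, 3]))

print(first_pass, second_pass, stored, fresh)
```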
The iterator protocol on custom classes
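If you want your own class to work in a for loop, it needs __iter__, and the iterator that __iter__ returns needs __next__ (plus an __iter__ of its own that returns self):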
from typing import Iterator

class CountDown:
    def __init__(self, start: int) -> None:
        self.start = start

    def __iter__(self) -> Iterator[int]:
        # Return a fresh iterator each time; this is what makes
        # CountDown itself an *iterable*, not an iterator.
        return CountDownIterator(self.start)

class CountDownIterator:
    def __init__(self, current: int) -> None:
        self.current = current

    def __iter__(self) -> "CountDownIterator":
        return self

    def __next__(self) -> int:
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

for n in CountDown(3):
    print(n)
# 3, 2, 1
That is a lot of ceremony for “count down.” Reading it, you can see why yield exists.
Generators: iterators without the boilerplate
A generator is an iterator built from a function with yield in it. The interpreter does the __iter__/__next__/StopIteration plumbing for you.
from typing import Iterator

def count_down(start: int) -> Iterator[int]:
    while start > 0:
        yield start
        start -= 1

for n in count_down(3):
    print(n)
# 3, 2, 1
Same behaviour as the class above. About one-fifth the code. This is the reason iterator classes are rare in modern Python — you reach for one only when the state is genuinely complex (e.g., a tree traversal that needs to resume mid-recursion, or an iterator that has methods other than __next__).
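There is also a middle ground when you want a reusable iterable object but not the second class: make __iter__ itself a generator method. The sketch below is one common way to do it, not the only one; it reuses the CountDown name purely for comparison.

```python
from typing import Iterator


class CountDown:
    """Re-iterable countdown; a hypothetical variant of the class version."""

    def __init__(self, start: int) -> None:
        self.start = start

    def __iter__(self) -> Iterator[int]:
        # Because this method contains yield, every call returns a new
        # generator, so each for loop gets its own independent iterator.
        n = self.start
        while n > 0:
            yield n
            n -= 1


cd = CountDown(3)
print(list(cd))  # [3, 2, 1]
print(list(cd))  # [3, 2, 1] again: the iterable is not used up
```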
When the function hits yield, it pauses. Local variables, the program counter, the stack — all frozen. The next next() call resumes from that exact point. When the function returns (or runs off the end), the generator raises StopIteration automatically.
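You can watch the pausing happen by recording what runs and when. In this small sketch, steps and log are names made up for the demonstration; note that calling the function runs none of its body until the first next().

```python
from typing import Iterator

log: list[str] = []


def steps() -> Iterator[int]:
    log.append("started")
    yield 1
    log.append("resumed after first yield")
    yield 2
    log.append("finishing")


gen = steps()
assert log == []                 # calling the function runs none of its body
first = next(gen)                # runs up to the first yield
assert log == ["started"]
second = next(gen)               # picks up right after that yield
assert log == ["started", "resumed after first yield"]

try:
    next(gen)                    # the body runs off the end
except StopIteration:
    log.append("StopIteration raised")

print(first, second, log)
```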
yield from lets one generator delegate to another:
from typing import Iterator

def numbers() -> Iterator[int]:
    yield from range(3)       # 0, 1, 2
    yield from [10, 20, 30]   # 10, 20, 30
    yield 99                  # 99

print(list(numbers()))
# [0, 1, 2, 10, 20, 30, 99]
Without yield from you’d write for x in range(3): yield x — fine, but less direct.
Comprehensions: syntactic sugar for building things
A list comprehension builds a list with a compact loop:
xs: list[int] = [1, 2, 3, 4, 5]
doubled = [x * 2 for x in xs]
# [2, 4, 6, 8, 10]
evens_doubled = [x * 2 for x in xs if x % 2 == 0]
# [4, 8]
Equivalent to writing the loop and .append() calls by hand, but shorter and slightly faster (the interpreter optimises it). The same syntax exists for sets and dicts:
unique_lengths = {len(s) for s in ["hi", "hello", "hey"]}
# {2, 3, 5}  (a set, so the order is not guaranteed)
word_lengths = {s: len(s) for s in ["hi", "hello", "hey"]}
# {'hi': 2, 'hello': 5, 'hey': 3}
There is no tuple comprehension. The syntax (x*2 for x in xs) looks like one but isn’t — it’s a generator expression. To build a tuple from a comprehension you write tuple(x*2 for x in xs).
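A quick check makes the distinction concrete. This is only a sketch; gen and as_tuple are illustrative names.

```python
import types

xs = [1, 2, 3]

gen = (x * 2 for x in xs)            # parentheses give a generator expression
assert isinstance(gen, types.GeneratorType)

as_tuple = tuple(x * 2 for x in xs)  # the call's own parentheses are enough
print(as_tuple)  # (2, 4, 6)
```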
The memory difference: list vs generator expression
This is the one practical thing that matters most.
# List comprehension: builds the whole list in memory
squares_list = [x * x for x in range(10_000_000)]
# Memory: ~80 MB for the list's 10 million pointers on 64-bit CPython, plus
# the int objects they point to (roughly 30 bytes each), so several hundred
# MB in total. All of it exists before the first element is used.
# Generator expression: builds nothing, yields on demand
squares_gen = (x * x for x in range(10_000_000))
# Memory: a couple of hundred bytes. Just the generator object.
If you’re going to consume every element and want random access later, the list is fine. If you’re going to consume them once, in order, the generator is almost always the right call.
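If you’d rather measure than take the numbers on trust, sys.getsizeof gives a rough view. The sketch below uses a smaller N (an arbitrary choice, so it runs quickly), and exact byte counts vary by Python version and platform.

```python
import sys

N = 100_000  # smaller than the 10 million above so the sketch runs quickly

squares_list = [x * x for x in range(N)]
squares_gen = (x * x for x in range(N))

# getsizeof counts only the container itself: the list's pointer array,
# not the int objects those pointers refer to.
list_bytes = sys.getsizeof(squares_list)
gen_bytes = sys.getsizeof(squares_gen)

print(f"list: {list_bytes:,} bytes, generator: {gen_bytes:,} bytes")

# The generator still produces every value when asked.
assert sum(squares_gen) == sum(squares_list)
```

The list’s figure grows with N; the generator’s does not.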
The classic example: streaming a huge file.
from typing import Iterator

def numeric_columns(path: str, col: int) -> Iterator[float]:
    with open(path, encoding="utf-8") as f:
        next(f)  # skip header
        for line in f:  # files are iterators of lines
            parts = line.rstrip("\n").split(",")
            yield float(parts[col])

total = 0.0
count = 0
for value in numeric_columns("orders_50gb.csv", col=4):
    total += value
    count += 1
print(total / count if count else 0.0)
That program computes the average of one column across a 50 GB file using a few kilobytes of RAM. The file object itself is an iterator — for line in f: reads one line at a time. The generator function pipes those lines through a transformation, one at a time. Nothing materialises. This is the shape of every memory-bounded data pipeline you’ll ever write in Python.
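A version that first collected the lines or values into a list (f.readlines(), or a list comprehension over the file) would need memory proportional to the size of the file, and a memory-constrained container would be OOM-killed long before the end.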
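The same shape extends to multi-stage pipelines. The sketch below is a variant, not the function from the example: numeric_column and positive are hypothetical names, it takes any iterable of lines so it can run against an in-memory sample, and it swaps the naive split(",") for csv.reader, which copes with quoted commas and is just as lazy.

```python
import csv
import io
from typing import Iterable, Iterator


def numeric_column(lines: Iterable[str], col: int) -> Iterator[float]:
    # Takes any iterable of lines (an open file works) and lets
    # csv.reader handle quoting. Rows are still parsed one at a time.
    reader = csv.reader(lines)
    next(reader, None)                 # skip header; tolerate empty input
    for row in reader:
        yield float(row[col])


def positive(values: Iterable[float]) -> Iterator[float]:
    # Hypothetical second stage: a lazy filter.
    return (v for v in values if v > 0)


# In-memory stand-in for a real file, so the sketch is self-contained.
sample = io.StringIO('id,note,amount\n1,"a, b",10.5\n2,refund,-3.0\n3,ok,4.5\n')

total = sum(positive(numeric_column(sample, col=2)))
print(total)  # 15.0
```

Each stage pulls one item from the stage before it, so peak memory stays flat no matter how long the input is.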
itertools: the standard-library toolkit
itertools is a module of lazy iterator building blocks. A few I use weekly:
import itertools
# chain: concatenate iterables lazily
combined = itertools.chain([1, 2], [3, 4], [5])
# 1, 2, 3, 4, 5 — without ever building a combined list
# islice: slice an iterator without converting to a list
first_ten = list(itertools.islice(numeric_columns("huge.csv", 4), 10))
# Consumes the header and 10 data rows, then stops; the rest of the file is never parsed.
# tee: split one iterator into N independent iterators
a, b = itertools.tee(numeric_columns("huge.csv", 4), 2)
# a and b can each be consumed independently — but tee buffers the
# values neither has consumed yet. If a races far ahead of b, you're
# back to holding most of the data in memory.
# groupby, accumulate, pairwise (3.10+), batched (3.12+)...
I won’t enumerate the whole module; help(itertools) is what you want when you need a tool. But once you see the pattern (everything is a lazy iterator, everything composes), you stop writing manual loops for things like “iterate in pairs” or “take every nth element.”
When to use which
A rough decision tree:
- Need a finite collection you’ll index into or iterate multiple times? Use a list or a list comprehension.
- Need to transform-and-iterate-once over something potentially huge? Generator expression or generator function.
- Need behaviour beyond __next__, say an iterator with a reset() method or extra state? Write a class.
- Composing standard transformations? itertools first, custom code only when nothing fits.
The mistake to avoid: writing [x for x in big_thing if cond] and then immediately iterating over it once. That’s a generator expression in disguise — drop the brackets, save the memory.
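A sketch of what dropping the brackets buys you. Here big_thing is a hypothetical stand-in for a large source, and stats exists only to count how many items actually get requested.

```python
from typing import Iterator

stats = {"pulled": 0}


def big_thing() -> Iterator[int]:
    # Hypothetical large source; counts every item a consumer asks for.
    for n in range(100_000):
        stats["pulled"] += 1
        yield n


# With brackets, a full list would be built just to be summed once.
# Without them, sum() pulls one value at a time:
total_even = sum(x for x in big_thing() if x % 2 == 0)

# Short-circuiting consumers stop the pipeline early.
stats["pulled"] = 0
found = any(x > 10 for x in big_thing())
print(total_even, found, stats["pulled"])
```

any() stops at the first match, so only a dozen items are ever produced in the second run; the bracketed version would have produced all of them first.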
Common gotchas
Generators are single-use. If you need to walk the same data twice, either store it in a list or call the generator function twice (which gives you a fresh generator each time).
Late binding inside generator expressions. A generator expression evaluates only its outermost iterable when it is created; the element expression and any conditions run later, each time a value is requested. Any other variable it mentions is looked up at that later point, not when the generator was built. A list comprehension runs to completion immediately, so it never sees a stale value. The classic trap:
gens = [(x * i for x in range(3)) for i in range(3)]
# Each generator captures `i` by reference. By the time you iterate,
# `i` is 2. All three generators yield the same multiples of 2.
If you actually want each generator to capture the current value of i, build it inside a factory function that takes i as a parameter (or bind i as a default argument of a def or lambda that returns the generator).
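The trap and the factory-function fix side by side, as a sketch. The name multiples is hypothetical.

```python
from typing import Iterator

# The trap: every generator reads i when it is consumed, after the loop ended.
late = [(x * i for x in range(3)) for i in range(3)]
print([list(g) for g in late])   # [[0, 2, 4], [0, 2, 4], [0, 2, 4]]


def multiples(i: int) -> Iterator[int]:
    # Hypothetical factory: i is a parameter, so each call gets its own binding.
    return (x * i for x in range(3))


bound = [multiples(i) for i in range(3)]
print([list(g) for g in bound])  # [[0, 0, 0], [0, 1, 2], [0, 2, 4]]
```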
An escaping StopIteration no longer silently ends a generator. Since PEP 479 (the default from Python 3.7), a StopIteration that escapes from inside a generator body is converted to RuntimeError instead of quietly ending the iteration. Good change, but worth knowing if you read older code that relied on the old behaviour.
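The usual way to trip over this is a bare next() call inside a generator. In the sketch below, both function names and the sample data are invented for illustration.

```python
from typing import Iterable, Iterator


def first_of_each(groups: Iterable[Iterable[int]]) -> Iterator[int]:
    # Bug: next() on an empty group raises StopIteration inside the generator.
    for group in groups:
        yield next(iter(group))


def first_of_each_safe(groups: Iterable[Iterable[int]]) -> Iterator[int]:
    # Giving next() a default makes it return instead of raise.
    missing = object()
    for group in groups:
        head = next(iter(group), missing)
        if head is not missing:
            yield head


data = [[1, 2], [], [3]]

try:
    list(first_of_each(data))
    outcome = "finished"
except RuntimeError as exc:
    outcome = f"RuntimeError: {exc}"

print(outcome)
print(list(first_of_each_safe(data)))  # [1, 3]
```

Before PEP 479, the first version would have quietly stopped after the first group and returned [1], which is exactly the kind of silent truncation the change was meant to surface.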
That’s iterators, generators, and comprehensions, no longer blurred together. Next lesson: decorators. The pattern that wraps half the third-party Python code you import.
Citations (retrieved 2026-05-01):
- Python Language Reference, “The for statement”: https://docs.python.org/3/reference/compound_stmts.html#the-for-statement
- itertools module documentation: https://docs.python.org/3/library/itertools.html
- PEP 234, “Iterators”: https://peps.python.org/pep-0234/
- PEP 255, “Simple Generators”: https://peps.python.org/pep-0255/
- PEP 380, “Syntax for Delegating to a Subgenerator” (yield from): https://peps.python.org/pep-0380/
- PEP 479, “Change StopIteration handling inside generators”: https://peps.python.org/pep-0479/