NumPy: arrays, broadcasting, the foundation of scientific Python

We’ve spent the last block of lessons inside DataFrames: pandas, Polars, the tabular world. Underneath all of that, at the bottom of every numerical Python stack you’ll ever touch, is NumPy. Pandas stores most of its columns as NumPy arrays. scikit-learn takes NumPy arrays as input. PyTorch and JAX both offer NumPy-style APIs on top of their tensors so people don’t have to relearn the basics. Even Polars, which uses Apache Arrow internally, gives you .to_numpy() because everyone downstream expects it.

So before we open Module 8 properly with plotting and SciPy, we need a clean lesson on NumPy itself. Not a deep dive — that would be its own course — but enough to read code, write the operations you’ll actually need, and understand why broadcasting is the conceptual move that makes the whole library worth learning.

The ndarray

NumPy’s central object is the ndarray — an n-dimensional array. From outside it looks like a Python list of numbers. Inside it’s something completely different:

Contiguous memory. All the elements live next to each other in one block of RAM, the way a C array does. Python lists are arrays of pointers to scattered objects; NumPy arrays are flat slabs of bytes.
Fixed dtype. Every element is the same type — float64, int32, bool, whatever — and the type is stamped on the array, not on each element.
Shape and strides. The same 1-D block of memory can be interpreted as a 1-D vector, a 2-D matrix, or a higher-dimensional tensor by changing how NumPy walks through it.

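That layout is the whole reason NumPy is fast. When you write arr * 2, NumPy doesn’t iterate in Python. It dispatches to a compiled C loop that runs over the contiguous bytes with no per-element interpreter overhead, and for many operations that loop uses SIMD instructions. Doing the same work on a Python list (which means a list comprehension, since list * 2 repeats the list instead of doubling its values) is commonly tens of times slower on large arrays. On very small arrays the gap narrows, because NumPy’s fixed per-call overhead starts to dominate.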
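A rough way to see the gap on your own machine is to time both versions with the standard-library timeit module. This is a sketch: the array size and repeat count are arbitrary choices, and the ratio you get depends on hardware and NumPy version, so no particular number is promised here.

```python
import timeit

import numpy as np

n = 1_000_000
values = list(range(n))   # a Python list of separate int objects
arr = np.arange(n)        # one contiguous block of integers

# Same result both ways; only the execution model differs.
list_result = [v * 2 for v in values]   # the interpreter touches each element
arr_result = arr * 2                    # one call into a compiled loop

t_list = timeit.timeit(lambda: [v * 2 for v in values], number=5)
t_numpy = timeit.timeit(lambda: arr * 2, number=5)
print(f"list: {t_list:.3f}s  numpy: {t_numpy:.3f}s  ratio: {t_list / t_numpy:.0f}x")
```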

import numpy as np

a = np.array([1, 2, 3, 4, 5])
print(a.dtype)    # int64 on most platforms
print(a.shape)    # (5,)
print(a.ndim)     # 1
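The dtype, shape, and strides described above are all visible as array attributes. A small sketch, using an explicit int64 dtype so the byte counts don’t depend on the platform default:

```python
import numpy as np

a = np.arange(12, dtype=np.int64)
m = a.reshape((3, 4))       # the same 96 bytes, read as a 3x4 grid

print(a.itemsize)   # 8: bytes per int64 element
print(a.nbytes)     # 96: 12 elements * 8 bytes
print(a.strides)    # (8,): step 8 bytes to reach the next element
print(m.strides)    # (32, 8): 32 bytes to the next row, 8 to the next column
```

Only the shape and strides differ between a and m; the bytes underneath are shared.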

Creating arrays

Five constructors cover most of what you’ll do:

np.array([1, 2, 3])              # from a list
np.zeros((3, 4))                 # 3x4 of zeros, dtype float64
np.ones((2, 2), dtype=np.int32)  # 2x2 of ones, integer
np.arange(0, 10, 2)              # [0, 2, 4, 6, 8] — like range()
np.linspace(0, 1, 5)             # [0., 0.25, 0.5, 0.75, 1.] — N evenly-spaced points

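arange takes a step size and, like range(), excludes the stop value; linspace takes a count and includes the endpoint by default. With a float step, prefer linspace: floating-point rounding can make arange return one element more or fewer than you expect. For random data, np.random.default_rng(seed=42).normal(size=(1000, 3)) is the modern API; the legacy np.random.randn style still works, but the docs now recommend the Generator returned by default_rng.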
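A quick check of the endpoint behaviour and of seeded reproducibility. The step 0.25 is chosen because it is exactly representable in binary, so arange has no rounding surprise here.

```python
import numpy as np

steps = np.arange(0, 1, 0.25)    # step size given; stop value 1 is excluded
points = np.linspace(0, 1, 5)    # count given; endpoint 1 is included
print(steps)    # [0.   0.25 0.5  0.75]
print(points)   # [0.   0.25 0.5  0.75 1.  ]

# Two generators built from the same seed produce the same stream.
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)
samples = rng_a.normal(size=(1000, 3))
print(samples.shape)                                            # (1000, 3)
print(np.array_equal(samples, rng_b.normal(size=(1000, 3))))    # True
```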

Reshaping reinterprets the same memory with a different shape. Whenever it can, reshape returns a view, so no data is copied:

a = np.arange(12)              # shape (12,)
b = a.reshape((3, 4))          # shape (3, 4), same data
c = a.reshape((2, 2, 3))       # shape (2, 2, 3), same data

-1 in a reshape means “infer this dimension”: a.reshape((3, -1)) on a 12-element array gives you (3, 4). Use this constantly.
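A short sketch of -1 inference, of reshape returning a view, and of what happens when the sizes don’t divide:

```python
import numpy as np

a = np.arange(12)
b = a.reshape((3, -1))       # -1 is inferred as 4
c = a.reshape((-1, 2, 3))    # -1 is inferred as 2
print(b.shape, c.shape)      # (3, 4) (2, 2, 3)

# reshape returned a view here, so writing through b changes a (and c).
b[0, 0] = 100
print(a[0])                  # 100

# Only one -1 is allowed, and the total size must still divide evenly.
failed = False
try:
    a.reshape((5, -1))       # 12 elements cannot fill rows of length 12/5
except ValueError as err:
    failed = True
    print("ValueError:", err)
```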

Vectorization is the point

The first thing to internalize: you don’t write loops over NumPy arrays. Element-wise operations like these are already loops, written in C and exposed as universal functions (ufuncs); reductions such as sum are built on the same machinery:

prices = np.array([10.0, 20.0, 35.5, 7.99])
with_vat = prices * 1.22          # element-wise multiplication
total = prices.sum()              # scalar reduction
log_prices = np.log(prices)       # element-wise log

No for. No list comprehension. Just operators and named functions, applied to the whole array at once. This is what people mean by “vectorized code.” When you find yourself writing a Python loop over the elements of an ndarray, stop and look for the vectorized equivalent — it almost always exists.
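Here is the loop you should not write next to the one-line replacement, plus a check that the functions involved really are ufuncs. np.add.reduce is the reduction method every binary ufunc carries; here it is only compared against prices.sum() for equal results.

```python
import numpy as np

prices = np.array([10.0, 20.0, 35.5, 7.99])

# Loop version (what to avoid): one Python-level multiply per element.
with_vat_loop = np.empty_like(prices)
for i in range(len(prices)):
    with_vat_loop[i] = prices[i] * 1.22

# Vectorized version: a single call that loops in C.
with_vat = prices * 1.22

print(np.allclose(with_vat_loop, with_vat))        # True
print(isinstance(np.log, np.ufunc))                # True
print(isinstance(np.multiply, np.ufunc))           # True
print(np.isclose(np.add.reduce(prices), prices.sum()))   # True
```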

Broadcasting

Broadcasting is the rule that says: when you do an operation on two arrays of different shapes, NumPy tries to stretch them to compatible shapes before doing the element-wise op. It’s the feature that turns “I need to subtract this 1-D vector from every row of a matrix” from a loop into a single line.

The rules, right-aligned (compare dimensions starting from the right):

If one array has fewer dimensions, pad its shape on the left with 1s until both have the same number of dimensions.
Two dimensions are compatible if they’re equal, or if one of them is 1.
A dimension of size 1 is stretched to match the other.
If a dimension is incompatible, you get a ValueError.
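The rules can be tried on shapes alone with np.broadcast_shapes (available since NumPy 1.20), and on real arrays to see the error. The (8, 1, 6, 1) case is a standard illustration of several stretches at once.

```python
import numpy as np

# np.broadcast_shapes applies the rules to shapes alone, without any data.
print(np.broadcast_shapes((3, 3), (3,)))              # (3, 3)
print(np.broadcast_shapes((3, 1), (1, 4)))            # (3, 4)
print(np.broadcast_shapes((8, 1, 6, 1), (7, 1, 5)))   # (8, 7, 6, 5)

# Trailing dimensions 4 and 3 are neither equal nor 1, so this fails.
failed = False
try:
    np.ones((3, 4)) + np.ones(3)
except ValueError as err:
    failed = True
    print("ValueError:", err)
```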

The classic example: subtract the column means from a 2-D matrix.

X = np.array([
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
])

col_means = X.mean(axis=0)        # shape (3,) — [4., 5., 6.]
centered = X - col_means          # shape (3, 3) - shape (3,) → broadcasts

X is (3, 3). col_means is (3,). Right-align the shapes:

X:          3 x 3
col_means:      3

The trailing dimensions both equal 3, so they’re compatible. The missing leading dimension on col_means is treated as 1, then stretched to 3. The result is the same as if you’d repeated col_means three times along the row axis, but that repeated copy is never materialized: NumPy walks the stretched axis with a stride of 0, reusing the same three numbers. Only the result, centered, is a new array.
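np.broadcast_to makes the stretched operand visible as an explicit read-only view, which is a convenient way to check the zero-stride claim (the (0, 8) strides assume float64, 8 bytes per element):

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])
col_means = X.mean(axis=0)                      # [4., 5., 6.]

# broadcast_to builds the stretched (3, 3) view explicitly.
stretched = np.broadcast_to(col_means, X.shape)
print(stretched.strides)                        # (0, 8): moving down a row moves 0 bytes
print(np.shares_memory(stretched, col_means))   # True: still the original 3 floats
print(stretched.flags.writeable)                # False: a read-only view

centered = X - col_means
print(np.array_equal(centered, X - stretched))  # True
```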

If you want to subtract row means instead, you need to keep the row axis:

row_means = X.mean(axis=1, keepdims=True)   # shape (3, 1) instead of (3,)
centered_rows = X - row_means

(3, 1) aligns with (3, 3): the 1 stretches to 3 along columns. Without keepdims=True, you’d get (3,) and broadcasting would try to align it against the columns of X instead — silently doing the wrong thing if your matrix is square. This is the broadcasting bug everyone hits once.
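The silent bug is easy to reproduce on the same square matrix. The newaxis variant is an alternative to keepdims shown for comparison:

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])

flat = X.mean(axis=1)                    # shape (3,): [2., 5., 8.]
kept = X.mean(axis=1, keepdims=True)     # shape (3, 1)

wrong = X - flat    # (3,) lines up with columns: subtracts [2, 5, 8] from every row
right = X - kept    # (3, 1) stretches across columns: each row minus its own mean
print(wrong)        # no error, just wrong numbers
print(right)        # every row becomes [-1.  0.  1.]

# Equivalent fix without keepdims: add the row axis back by hand.
also_right = X - flat[:, np.newaxis]
```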

Slicing and boolean indexing

NumPy slicing looks like Python list slicing extended to multiple dimensions, with one important difference: slices are views, not copies. Modifying the slice modifies the original.

arr = np.arange(20).reshape((4, 5))

arr[1:3, 0]        # rows 1 and 2, column 0 — shape (2,)
arr[:, :3]         # all rows, first three columns — shape (4, 3)
arr[-1, :]         # last row — shape (5,)
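The shapes in those comments can be checked directly. One detail worth seeing next to them: an integer index drops its axis, while a length-1 slice keeps it.

```python
import numpy as np

arr = np.arange(20).reshape((4, 5))

print(arr[1:3, 0].shape)   # (2,)
print(arr[:, :3].shape)    # (4, 3)
print(arr[-1, :].shape)    # (5,)

# Integer index versus length-1 slice on the same row.
print(arr[1].shape)        # (5,)
print(arr[1:2].shape)      # (1, 5)
```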

Boolean indexing — picking elements where a condition is true — is the workhorse:

arr = np.array([1, -2, 3, -4, 5, -6])
arr[arr > 0]                      # array([1, 3, 5])
arr[arr > 0] = 0                  # zero out positives in place

Combined with np.where for conditional replacement:

np.where(arr < 0, 0, arr)         # replace negatives with 0, leave the rest
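Masks combine with & and | (not Python’s and/or), and np.where returns a new array rather than modifying its input. A sketch on a fresh copy of the example array:

```python
import numpy as np

arr = np.array([1, -2, 3, -4, 5, -6])

# The parentheses are required: & binds more tightly than > and <.
mask = (arr > -5) & (arr < 4)
print(arr[mask])                    # [ 1 -2  3 -4]

# np.where builds a new array; arr itself is left untouched.
replaced = np.where(arr < 0, 0, arr)
print(replaced)                     # [1 0 3 0 5 0]
print(arr)                          # [ 1 -2  3 -4  5 -6]
```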

The axis parameter

Every reduction (sum, mean, min, max, std, argmax, …) takes an axis argument. This is the second thing people get wrong, after broadcasting.

X = np.arange(12).reshape((3, 4))
X.sum()              # 66 — reduce over everything, scalar
X.sum(axis=0)        # shape (4,) — sum down each column
X.sum(axis=1)        # shape (3,) — sum across each row

The mnemonic that finally made it stick for me: axis= is the dimension that disappears. axis=0 collapses the row axis, leaving you with one number per column. axis=1 collapses the column axis, leaving one number per row. Same for any higher-dimensional case.
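The same mnemonic checked on the 3x4 example and on a 3-D array, where each listed axis vanishes from the result’s shape:

```python
import numpy as np

X = np.arange(12).reshape((3, 4))
print(X.sum(axis=0))     # [12 15 18 21]: one number per column
print(X.sum(axis=1))     # [ 6 22 38]: one number per row

T = np.arange(24).reshape((2, 3, 4))
print(T.sum(axis=0).shape)        # (3, 4): axis 0 disappears
print(T.sum(axis=1).shape)        # (2, 4): axis 1 disappears
print(T.sum(axis=2).shape)        # (2, 3): axis 2 disappears
print(T.sum(axis=(0, 2)).shape)   # (3,): both listed axes disappear
```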

A few more functions worth knowing

np.concatenate([a, b], axis=0) — glue arrays along an existing axis.
np.stack([a, b], axis=0) — glue arrays along a new axis (creates a new dimension).
np.unique(arr, return_counts=True) — distinct values and how often each appears.
arr.astype(np.float32) — change dtype.
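np.allclose(a, b) — True only if every pair of elements is approximately equal within tolerance; the right way to compare floats. np.isclose(a, b) gives the element-wise boolean array instead.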
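A compact tour of those functions. The input arrays are made-up examples; the point is the resulting shapes and values.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(np.concatenate([a, b], axis=0).shape)   # (6,): the existing axis grows
print(np.stack([a, b], axis=0).shape)         # (2, 3): a new leading axis
print(np.stack([a, b], axis=1).shape)         # (3, 2): a new trailing axis

values, counts = np.unique(np.array([3, 1, 3, 2, 3, 1]), return_counts=True)
print(values, counts)                         # [1 2 3] [2 1 3]

f = a.astype(np.float32)                      # returns a new array; a is unchanged
print(f.dtype)                                # float32

print(0.1 + 0.2 == 0.3)                             # False: binary rounding error
print(np.allclose([0.1 + 0.2, 1.0], [0.3, 1.0]))    # True: one bool for everything
print(np.isclose([0.1 + 0.2, 1.0], [0.3, 1.1]))     # [ True False]: one bool per element
```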

Memory layout (briefly)

Two facts to keep in your back pocket. NumPy arrays are by default C-contiguous: rows live next to each other in memory, the way C lays out a 2-D array. Fortran-contiguous is the other layout (columns next to each other), what MATLAB and Fortran use natively. You almost never need to think about this, except in two places: when you pass arrays to a C extension or to PyTorch, where some kernels demand contiguous input (np.ascontiguousarray(arr) fixes it on the NumPy side, tensor.contiguous() on the PyTorch side), and when you’re squeezing the last bit of performance out of a column-major operation.
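The layout is reported in each array’s flags. A transpose is the easiest way to get a non-C-contiguous array, because it only swaps the strides of a view:

```python
import numpy as np

X = np.arange(12).reshape((3, 4))
print(X.flags["C_CONTIGUOUS"])     # True: the default row-major layout

Xt = X.T                           # a view with swapped strides, no copy
print(Xt.flags["C_CONTIGUOUS"])    # False
print(Xt.flags["F_CONTIGUOUS"])    # True: the same bytes, read column-major

Xc = np.ascontiguousarray(Xt)      # copies into a fresh row-major buffer
print(Xc.flags["C_CONTIGUOUS"])    # True
print(np.shares_memory(Xc, Xt))    # False: a copy was made
print(np.array_equal(Xc, Xt))      # True: same values
```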

NumPy 2.x is the current world

NumPy 2.0 shipped in mid-2024 and the 2.x line is where everything lives in 2026. The big change was a cleanup of the dtype subsystem: a new variable-length string dtype (StringDType), promotion rules that no longer depend on the value of a Python scalar (NEP 50; for example, np.float32(3) + 3.0 now stays float32 instead of becoming float64), and a lot of deprecated APIs finally removed. In older code you may see things like np.int (removed back in 1.24; use int or np.int64) or np.product (removed in 2.0; use np.prod). The deprecation warnings of the 1.20-1.26 era told everyone this was coming; 2.0 finished the job.
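A small sketch of the replacements and of the promotion change. The promotion result depends on which NumPy is installed, so the code prints the version next to it rather than assuming one.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
print(np.prod(x))                 # 24: the replacement for the removed np.product
y = x.astype(np.int64)            # an explicit width instead of the removed np.int

major = int(np.__version__.split(".")[0])
z = np.float32(3) + 3.0           # Python float is a "weak" scalar under NEP 50
print(np.__version__, z.dtype)    # float32 on NumPy 2.x; 1.x releases gave float64
```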

For new code, don’t think about it. Require numpy>=2 in your pyproject.toml dependencies, install with uv, move on.

Views, copies, and the bug everyone hits

One more thing before we wrap: NumPy distinguishes carefully between views and copies. Slicing returns a view — same memory, different shape descriptor. Boolean indexing and fancy indexing (passing an array of indices) return copies. This matters when you start mutating:

arr = np.arange(10)
view = arr[2:5]
view[0] = 999
print(arr)            # [  0   1 999   3   4   5   6   7   8   9]  original was modified

arr2 = np.arange(10)
copy = arr2[arr2 > 3]
copy[0] = 999
print(arr2)           # [0 1 2 3 4 5 6 7 8 9]  original unchanged

The rule of thumb: if in doubt, call .copy() explicitly. The performance cost is usually negligible compared to the cost of debugging an aliasing bug at 11pm before a deadline.

np.shares_memory(a, b) will tell you whether two arrays overlap in memory, for example because one is a view of the other. Useful when you’re tracking down “why did changing X also change Y?”
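A sketch running np.shares_memory over each kind of indexing from this section, plus .base, which a view uses to keep a reference to the array that owns the data. (For very large or oddly strided arrays, np.may_share_memory is a cheaper but conservative alternative.)

```python
import numpy as np

arr = np.arange(10)
sliced = arr[2:5]            # basic slice: a view
fancy = arr[[2, 3, 4]]       # index array: a copy
masked = arr[arr > 3]        # boolean mask: a copy
safe = arr[2:5].copy()       # explicit copy of a slice

print(np.shares_memory(arr, sliced))   # True
print(np.shares_memory(arr, fancy))    # False
print(np.shares_memory(arr, masked))   # False
print(np.shares_memory(arr, safe))     # False
print(sliced.base is arr)              # True: the view points back at its owner
```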

When NOT to use NumPy directly

Here’s the awkward truth at the end of a NumPy lesson: most data work in 2026 doesn’t actually start with np.array(...). If your data is tabular (columns with names, mixed types, missing values), pandas, Polars, or PyArrow give you named columns, per-column types, and missing-value handling, plus better string and datetime support and better I/O. Reach for NumPy when you have genuinely numerical, homogeneous, n-dimensional data: an image, a feature matrix for a model, a simulation grid, a time series of measurements.

The other awkward truth is that for many ML workloads the array library you actually want is PyTorch (or JAX). Both offer a NumPy-style API on top of their tensor types (jax.numpy mirrors NumPy especially closely), and both run on GPUs. torch.from_numpy(arr) wraps a NumPy array as a CPU tensor without copying, and tensor.numpy() goes back the same way, so the workflow of “preprocess in NumPy, then move to torch for the model” is what most pipelines actually look like. The good news: everything you learned in this lesson (broadcasting, axes, reshapes, slicing) transfers directly. NumPy and PyTorch diverged in some API details early on and have been quietly converging since.

The next two lessons assume your data is already in arrays like these. Lesson 44 plots them; lesson 45 runs statistics, optimization, and signal processing on them. NumPy is the substrate underneath both.

Reference: NumPy documentation, retrieved 2026-05-01. NumPy 2.x release notes for the dtype changes and removed APIs.