pathlib: filesystem paths the right way

If you learned Python before about 2015, every script that touched the filesystem started with import os and built paths by smashing strings together with os.path.join. It worked. It still works. But every line of it is a string. The path doesn’t know it’s a path. You can pass it anywhere a string is expected, including places that have no business taking a path. Cross-platform separator handling is a function call. Reading a file is open(os.path.join(base, "data", "events.json"), "r", encoding="utf-8").read() — eight tokens of ceremony for one operation.

pathlib has been in the standard library since Python 3.4, and as of 3.13 the rough edges are gone. Path objects are the modern way. This lesson is the small set of methods that cover almost everything you’ll do with files in a real script.

The basics: what a Path is

from pathlib import Path

p: Path = Path("/tmp/foo.txt")          # POSIX absolute
q: Path = Path("data/raw/events.json")  # relative, OS-agnostic
home: Path = Path.home()                # /home/narcis or C:\Users\narcis
here: Path = Path.cwd()                 # current working directory

Path is a class. On Linux you get a PosixPath, on Windows a WindowsPath. You don’t usually care; both share the same interface. The constructor accepts strings, other Path objects, and anything implementing __fspath__(). It does not touch the filesystem. Just creating a Path doesn’t check that the file exists, isn’t I/O, doesn’t raise.

Joining paths: the `/` operator

This is the headline feature.

base: Path = Path("data")
events: Path = base / "raw" / "events.json"
# Path('data/raw/events.json') on POSIX
# WindowsPath('data\\raw\\events.json') on Windows

The / operator joins path segments with the OS-appropriate separator. No more os.path.join(a, b, c). No more wondering what happens when one of the segments has a trailing slash. The right-hand side can be a string or another Path.

Three things to know:

If a segment is an absolute path, joining resets to that absolute path. Path("a") / "/b" is Path("/b"). This bites people occasionally; if you’re concatenating user input, watch for it.
/ works in either direction as long as one side is a Path. Path("a") / "b" and "a" / Path("b") both work.
There’s no + operator. Use /.

The properties you’ll use constantly

p: Path = Path("/var/log/app/access.log.gz")

p.name        # 'access.log.gz'    — final component
p.stem        # 'access.log'       — name minus the last suffix
p.suffix      # '.gz'              — last extension only
p.suffixes    # ['.log', '.gz']    — all of them
p.parent      # Path('/var/log/app')
p.parents[1]  # Path('/var/log')   — chain upward
p.parts       # ('/', 'var', 'log', 'app', 'access.log.gz')
p.is_absolute()  # True

stem and suffix are the two you’ll reach for in 80% of file-processing scripts. Want to swap an extension?

csv: Path = Path("report.xlsx").with_suffix(".csv")
# Path('report.csv')

Want to rename the file but keep the extension?

renamed: Path = p.with_name("error.log.gz")
renamed = p.with_stem("error")  # 3.9+: change stem only, keep suffix

These return new Path objects. The original is unchanged. Path is immutable, like a string.

Existence and type checks

p: Path = Path("data/raw/events.json")

p.exists()    # True if anything is at that path
p.is_file()   # True if it's a regular file
p.is_dir()    # True if it's a directory
p.is_symlink()

These hit the filesystem. They follow symlinks by default. They return False (rather than raising) if the path doesn’t exist, which is usually what you want for a guard.

config: Path = Path("config.toml")
if not config.is_file():
    raise SystemExit(f"Missing config: {config}")

Reading and writing

The convenience methods that replace four lines of open() boilerplate:

text: str = Path("notes.md").read_text(encoding="utf-8")
data: bytes = Path("image.png").read_bytes()

Path("output.txt").write_text("hello\n", encoding="utf-8")
Path("output.bin").write_bytes(b"\x00\x01\x02")

Every read_text should specify encoding="utf-8" explicitly. The default depends on locale on older interpreters and has caused enough cross-platform headaches that 3.10 added a warning, 3.15 will make UTF-8 the locale default, but be explicit anyway. Future you will thank present you.

For large files, don’t use read_text — it loads the whole thing. Use open():

with Path("huge.log").open("r", encoding="utf-8") as f:
    for line in f:
        process(line)

Path.open() works exactly like the built-in open(), just with the path baked in. The context manager closes the file. Streaming through a big file uses constant memory.

Globbing: finding files by pattern

data_dir: Path = Path("data")

# Files in this directory only
csvs: list[Path] = list(data_dir.glob("*.csv"))

# Recursive — every Python file under src/
py_files: list[Path] = list(Path("src").rglob("*.py"))

# Same as rglob but with explicit ** wildcard
also_py: list[Path] = list(Path("src").glob("**/*.py"))

glob returns a generator, not a list. Wrap with list() if you need to count or reuse. Iterate directly if you’re processing each one and don’t care about the total.

for log in Path("/var/log").rglob("*.log"):
    if log.stat().st_size > 1_000_000:
        print(f"big: {log}")

rglob recurses; glob does not. The ** wildcard in glob("**/*.py") does the same thing as rglob("*.py") — pick whichever reads better to you.

Listing a directory

for child in Path("data").iterdir():
    print(child.name, "is dir" if child.is_dir() else "is file")

iterdir() yields Path objects for everything in a directory — files, subdirectories, symlinks, the lot. No order guarantee; sort if you need consistency.

sorted_files: list[Path] = sorted(Path("data").iterdir())

Creating directories

out: Path = Path("artifacts/2026/05")
out.mkdir(parents=True, exist_ok=True)

The two arguments you’ll always pass:

parents=True — create intermediate directories. Without it, mkdir fails if a parent doesn’t exist (like the old mkdir vs mkdir -p distinction).
exist_ok=True — don’t raise if the directory already exists. Without it, mkdir fails on the second run of your script.

Together: idempotent, recursive directory creation. Set both. Move on.

Deleting things

Path("temp.txt").unlink()             # delete a file
Path("temp.txt").unlink(missing_ok=True)  # don't raise if it's gone
Path("empty_dir").rmdir()             # delete an empty directory

import shutil
shutil.rmtree(Path("build"))          # delete a directory and everything in it

shutil.rmtree is the only common file operation that doesn’t have a Path method, because it’s destructive enough that the explicit import is a feature, not a bug. It accepts a Path directly.

Resolving and comparing paths

Two paths can refer to the same file but look different as strings. data/raw/../raw/events.json and data/raw/events.json are the same. So is data/raw/events.json and the absolute path that contains it. To compare reliably, normalize first.

p: Path = Path("data/raw/../raw/events.json")
p.resolve()
# Path('/home/narcis/project/data/raw/events.json') — absolute, symlinks followed, .. collapsed

resolve() makes a path absolute, follows symlinks, and removes . and .. segments. It hits the filesystem. Use it before storing a path somewhere or comparing to another. Two different-looking paths that refer to the same file will compare equal after resolve():

a: Path = Path("notes.md").resolve()
b: Path = Path("./subdir/../notes.md").resolve()
a == b  # True

A close cousin that doesn’t hit the filesystem is Path.absolute() — it just prepends the current working directory if the path is relative. It doesn’t collapse .. segments and doesn’t resolve symlinks. For pure path math without I/O, absolute() is the right tool. For “is this the same file as that file,” it’s resolve().

There’s also samefile():

Path("notes.md").samefile(Path("./subdir/../notes.md"))
# True — same inode

Useful for “is this the file I think it is” without manually normalizing.

Renaming and moving

src: Path = Path("draft.txt")
src.rename(Path("final.txt"))         # rename in place

# Across directories — same operation
src.rename(Path("archive/final.txt"))

# Replace, even if the destination exists
src.replace(Path("final.txt"))

rename will fail on some platforms if the destination exists; replace overwrites. For copies (preserving the original), use shutil.copy2 (preserves metadata) or shutil.copy (just contents):

import shutil
shutil.copy2(Path("source.csv"), Path("backup/source.csv"))

All of these accept Path objects directly. shutil is path-aware throughout.

PurePath vs Path: the distinction nobody explains

Path does I/O. PurePath doesn’t. PurePath is the same path-manipulation API minus anything that touches the filesystem.

from pathlib import PurePath, PurePosixPath, PureWindowsPath

p: PurePosixPath = PurePosixPath("/etc/hosts")
p.parent      # works
p.suffix      # works
p.exists()    # AttributeError — no filesystem access

When is this useful? Two cases:

Tests. You want to verify your function builds the right path without setting up a fake filesystem. Pass it a PurePath.
Cross-platform manipulation. You’re on Linux but parsing paths from a Windows log file. PureWindowsPath understands Windows separators and drive letters even on a POSIX host.

Most code uses plain Path. Mention PurePath exists, move on.

Rewriting an os.path script in pathlib

Here’s a chunk of code in the old style:

import os
import os.path

def collect_logs(root: str) -> list[str]:
    out: list[str] = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".log"):
                full = os.path.join(dirpath, name)
                if os.path.getsize(full) > 0:
                    out.append(full)
    return out

archive_dir = os.path.join(root, "archive")
if not os.path.isdir(archive_dir):
    os.makedirs(archive_dir)

for path in collect_logs("/var/log"):
    base = os.path.basename(path)
    dest = os.path.join(archive_dir, base + ".processed")
    with open(path, "r", encoding="utf-8") as src, open(dest, "w", encoding="utf-8") as dst:
        dst.write(src.read())

Same logic in pathlib:

from pathlib import Path

def collect_logs(root: Path) -> list[Path]:
    return [p for p in root.rglob("*.log") if p.stat().st_size > 0]

archive_dir: Path = root / "archive"
archive_dir.mkdir(parents=True, exist_ok=True)

for path in collect_logs(Path("/var/log")):
    dest: Path = archive_dir / (path.name + ".processed")
    dest.write_text(path.read_text(encoding="utf-8"), encoding="utf-8")

Half the lines. No string-juggling. Type signatures that say “this is a path,” not “this is some string that happens to be a path.” When you read the second version a year from now, you can see at a glance what’s going on.

Stat info: size, mtime, mode

When you need file metadata, stat() returns a os.stat_result with everything the OS knows about the file:

info = Path("events.json").stat()
info.st_size      # bytes
info.st_mtime     # last modification, seconds since epoch (float)
info.st_ctime     # creation time on Windows, metadata-change on POSIX
info.st_mode      # permission bits and file type

For human-friendly modification times, convert with datetime:

from datetime import datetime, timezone

mtime: datetime = datetime.fromtimestamp(
    Path("events.json").stat().st_mtime,
    tz=timezone.utc,
)

Path.stat() follows symlinks; Path.lstat() doesn’t. If you want to know whether a path is a symlink to a file or the file itself, use lstat() and check the mode.

Working with the current working directory

Path.cwd()                     # absolute path of where the script is running
Path("data/raw").is_absolute()  # False, relative
Path("data/raw").absolute()     # prepend cwd, no symlink resolution

A relative Path is resolved against cwd whenever you actually do I/O on it. This means a script that uses Path("data/raw/events.json") will read different files depending on where you run it from. If you want a path relative to the script itself, anchor it with __file__:

HERE: Path = Path(__file__).resolve().parent
DATA: Path = HERE / "data" / "raw"

Now DATA points to the same place no matter where the script is invoked from. This is the right pattern for any script that bundles its own data files.

The handful of methods worth memorizing

If you remember nothing else from this lesson:

Path("...") / "..." / "..." for joining
.parent, .name, .stem, .suffix for parts
.exists(), .is_file(), .is_dir() for checks
.read_text(encoding="utf-8"), .write_text(...) for small files
.glob("*.csv"), .rglob("**/*.py") for finding
.mkdir(parents=True, exist_ok=True) for creating
.unlink(missing_ok=True) and shutil.rmtree(...) for deleting

That’s the 95%. The full API has another fifty methods for symlinks, permissions, ownership, hardlinks, resolution — they exist when you need them. Most scripts don’t.

Next lesson: the timezone swamp. Naive vs aware datetimes, why pytz is finally retired in favor of stdlib zoneinfo, and the DST trap that breaks “1 day later” arithmetic twice a year.

References: pathlib — Object-oriented filesystem paths, shutil — High-level file operations. Retrieved 2026-05-01.