AI-assisted Python development: the workflow that works

In 2026, AI assistants are a normal part of the Python toolchain. Copilot is in the editor, Claude is one keystroke away in Cursor, Aider lives in the terminal, Codex has its own CLI, Continue plugs into VS Code and JetBrains alike. Pretending they don’t exist is a productivity tax. Treating them as oracles is a quality tax. The interesting question is the middle: where they pay back the time you spend prompting them, and where they cost you more than they save.

This lesson is the workflow I’ve landed on. Five categories of help that genuinely work, the failure modes that bite, and the prompt patterns that turn an AI assistant from a guessing machine into a competent junior pair.

Five categories where AI assistants earn their keep

1. Boilerplate generation

This is the gimme. Anything that’s structural, repetitive, and follows a template, AI does near-perfectly:

A click or typer CLI scaffold from a one-line description.
A @dataclass from a JSON sample.
A pytest.mark.parametrize table from “here are six cases”.
Test stubs from a function signature.
A pyproject.toml skeleton.

A real example. You paste a JSON blob and say “dataclass for this, frozen, with slots”:

from dataclasses import dataclass

@dataclass(frozen=True, slots=True)
class Customer:
    id: int
    name: str
    email: str
    signup_date: str
    is_active: bool

That’s thirty seconds saved, twenty times a day. The category is broad: any time you’re typing something the keyboard already knows the shape of, an AI assistant will type it faster than you and rarely get it wrong.

2. API recall

“How do I use pandas resample to bucket by week starting Monday?” This used to mean a tab to the docs and three minutes of skimming. Now it’s a question and an answer with a working code block:

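import pandas as pd

# Assumes df has a datetime column "ts". "W-MON" bins close on the right by
# default, which would push a Monday row into the previous week;
# closed="left" makes each bin run from Monday through Sunday.
weekly = df.resample("W-MON", on="ts", closed="left", label="left").sum()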

For mainstream libraries — pandas, numpy, requests, sqlalchemy, fastapi, polars — AI recall is faster and more focused than docs lookup. The caveat: it’s tuned for the common case. For obscure APIs, niche libraries, or anything released in the last six months, it confidently fabricates. Verify by running the code, not by reading it.
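Running a suggested call tells you most when you have an independent expectation to compare it with. Below is a stdlib-only sketch of such a reference for the weekly-bucket question. The helper names and sample rows are hypothetical, and the boundary days (a Sunday and the Monday after it) are the cases worth including.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(day: date) -> date:
    """Return the Monday that starts the week containing `day`."""
    # date.weekday() is 0 for Monday, so subtracting it lands on Monday.
    return day - timedelta(days=day.weekday())


def weekly_totals(rows: list[tuple[date, int]]) -> dict[date, int]:
    """Sum values into buckets keyed by the Monday of each week."""
    totals: dict[date, int] = defaultdict(int)
    for day, value in rows:
        totals[week_start(day)] += value
    return dict(totals)


# Hypothetical sample rows chosen to sit on either side of a week boundary.
rows = [
    (date(2024, 1, 7), 1),     # Sunday, last day of the week of 2024-01-01
    (date(2024, 1, 8), 10),    # Monday, first day of the week of 2024-01-08
    (date(2024, 1, 10), 100),  # Wednesday, same week as that Monday
]
expected = weekly_totals(rows)
```

If the library call puts the Monday row in a different bucket than this reference does, the suggestion has the bin edges wrong, however plausible it looked.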

3. Refactor proposals

Cursor and Aider both shine here, because they have your whole file (or repo) in scope. “Rename process to process_invoice everywhere it refers to the invoice flow, leaving the unrelated process in payments.py alone.” That’s a five-minute manual job, sixty seconds with a tool that can read all the call sites.

The bigger refactors — “split this 300-line class into three smaller ones” — require more babysitting, but the AI is excellent at the mechanical part: extracting methods, threading parameters, updating imports. You provide the architectural decision; it does the typing.

4. Code review

Surprisingly good. Paste a function and ask “any issues?” and you’ll routinely get back:

Off-by-one errors in slicing.
Type mismatches the linter missed.
Mutable default arguments.
Missing exception handling on a known-flaky call.
“This regex has catastrophic backtracking on input X.”

It’s not a replacement for a human reviewer — it doesn’t know your business rules, your performance budget, or your domain — but for the mechanical layer of review, AI is a strong second pair of eyes. I now run my own pull requests through Claude before posting them and routinely fix two or three things before any human sees the diff.

5. Explanation

“Why is this slow?” “What does this regex match?” “What’s this __init_subclass__ doing?” For inherited code or unfamiliar libraries, an AI assistant’s explanation is usually correct, complete, and faster than reading the source. For onboarding to a new codebase, it’s transformative.

The caveat is the same as category 2: verify. An explanation that sounds right and an explanation that is right are two different things.

Where AI assistants hurt

The damage is real, and it’s worth naming explicitly.

Plausible-looking but wrong code in unfamiliar territory. When the AI doesn’t know the answer, it doesn’t say so. It produces code that compiles, looks idiomatic, and is wrong in a way that takes you longer to debug than writing it yourself would have. This is most painful in obscure libraries, recent language features, and anything where the training data is thin.

Over-engineering. Ask for a function that fetches a URL, get back a function with retries, exponential backoff, a cache layer, structured logging, and a Protocol for the HTTP client. None of that was asked for. None of it is wrong, exactly, but it’s noise you didn’t want, and reading it takes longer than writing the plain version would have.

Ignoring project conventions. Your codebase uses httpx; the AI gives you requests. Your codebase uses loguru; the AI uses logging. Your team forbids try/except Exception; the AI sprinkles them generously. Without context, the AI defaults to a generic “Python code on the internet” style, not your style.

Stale syntax for newer features. AI assistants often produce Optional[X] and List[int] instead of X | None and list[int], because their training data leans on the older style. They add from typing import Dict where the built-in dict already works as a generic. Modern Python defaults need to be enforced.

The Tab-Tab-Tab trap. The single biggest failure mode in 2026 is people accepting AI suggestions without reading them. The autocomplete is fast, the suggestions look right, and you ship code you’ve never actually read. This is how you get bugs that nobody on the team understands, because nobody on the team wrote them.

The workflow that actually works

A few habits that turn the AI from liability into asset.

Type-hint everything. AI suggestions get dramatically better when there are types in scope. The model has more to work with, the suggestions become more specific, and the typical outcome shifts from “made up an attribute” to “called the right method”. This is the single highest-leverage thing you can do.

Show the AI your conventions. A short CONVENTIONS.md (or STYLE.md, or whatever) pinned in the AI’s context tells it which logging library, which HTTP client, which testing patterns, which error handling style. Cursor and Aider both let you pin files; Copilot picks up .github/copilot-instructions.md; Claude in Cursor reads CLAUDE.md. Spend an hour on this file once and every prompt afterward gets better.

# Conventions

- HTTP: httpx, async by default.
- Logging: loguru, no f-strings in log messages, structured kwargs.
- Errors: domain exceptions in `errors.py`. No bare `except:`.
- Tests: pytest, parametrize over loops, fixtures in `conftest.py`.
- Style: ruff with our `pyproject.toml` config.
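As a sketch of what the “Errors” line asks for, here is a function that catches only the exceptions it expects and re-raises them as a domain exception. ConfigError and load_port are hypothetical; under the conventions above, the exception class would live in errors.py.

```python
import json


class ConfigError(Exception):
    """Hypothetical domain exception for invalid configuration."""


def load_port(raw: str) -> int:
    """Parse a JSON config string and return its "port" value."""
    try:
        config = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Name the specific failure instead of using a bare `except:`.
        raise ConfigError("config is not valid JSON") from exc
    try:
        return int(config["port"])
    except (KeyError, TypeError, ValueError) as exc:
        raise ConfigError("config needs an integer 'port'") from exc
```

Chaining with `from exc` keeps the original traceback attached, so the domain exception doesn’t hide the root cause.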

Review every line. Read the diff. The fact that you didn’t type it doesn’t matter; you’re shipping it. The mental shift is from “writer” to “editor”, and editing means actually reading.

Pin the assistant to small steps. “Implement this whole feature” produces a sprawling, hard-to-review patch. “Implement just the parser” produces a focused diff you can read in a minute. The smaller the unit, the better the output.

Prompt patterns that work

A few patterns I use daily, and the ones I’ve stopped using.

Patterns that work:

“Write a pytest parametrize table for these N cases.” Almost always perfect.
“Add type hints to this function.” Reliable, low risk.
“Refactor this to use early returns.” Specific, mechanical, easy to verify.
“Explain what this code does, line by line.” Great for onboarding.
“Does this have any bugs? Look for off-by-one, type mismatches, and mutable defaults.” Specific review prompt — better than “review this”.
“Convert this unittest to pytest.” Mechanical, well-defined.
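To show why the early-returns prompt is easy to verify, here is a hypothetical before and after. The shipping rules and names are invented for illustration.

```python
def shipping_cost_nested(weight_g: int, express: bool) -> float:
    # Before: each condition adds a level of nesting.
    if weight_g > 0:
        if weight_g <= 20_000:
            if express:
                return 15.0
            else:
                return 5.0
        else:
            raise ValueError("parcel too heavy")
    else:
        raise ValueError("weight must be positive")


def shipping_cost(weight_g: int, express: bool) -> float:
    # After: guard clauses reject bad input first, so the happy path is flat.
    if weight_g <= 0:
        raise ValueError("weight must be positive")
    if weight_g > 20_000:
        raise ValueError("parcel too heavy")
    return 15.0 if express else 5.0
```

Because the refactor should not change behaviour, the check is mechanical: run both versions on the same inputs and compare the results.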

Patterns that produce variable results:

“Write a function that does X.” Without context, X is interpreted generously and the result is over-engineered.
“Optimise this.” Optimise for what? Latency? Memory? Readability?
“Make this more Pythonic.” Vague. The AI guesses, sometimes well, sometimes by adding unwanted abstractions.

Patterns to retire:

“Write the whole module.” You’ll spend more time fixing it than writing it.
“Generate tests for this code.” Tends to produce tests that mirror the implementation, not the behaviour. Better: write the test names yourself, ask the AI to fill in the bodies.

A real human-AI-review-ship cycle

A small, real-world example. I have a function I want to write: round a price to the nearest cent, but with banker’s rounding.

I start with the signature and a docstring:

def round_price(amount: float) -> float:
    """Round to the nearest cent using banker's rounding (round half to even)."""

I ask the AI to fill it in. It produces:

from decimal import Decimal, ROUND_HALF_EVEN

def round_price(amount: float) -> float:
    """Round to the nearest cent using banker's rounding (round half to even)."""
    return float(Decimal(str(amount)).quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN))

I read it. The str(amount) conversion is suspicious — what if amount is already a Decimal? But the signature is float, so it’s fine. I’d also rather return Decimal than float, but the signature says float. I accept it.
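The str() step does matter here. This small sketch compares converting a float to Decimal directly with going through str(). The value 2.675 is my own illustrative choice, not one of the cases in this lesson.

```python
from decimal import ROUND_HALF_EVEN, Decimal

CENT = Decimal("0.01")

# 2.675 has no exact binary representation; the stored double sits slightly
# below the tie, so converting the float directly rounds down.
direct = Decimal(2.675).quantize(CENT, rounding=ROUND_HALF_EVEN)

# str() yields the shortest repr, "2.675", which is an exact decimal tie,
# so round-half-to-even moves it to the even neighbour.
via_str = Decimal(str(2.675)).quantize(CENT, rounding=ROUND_HALF_EVEN)

print(direct)   # 2.67
print(via_str)  # 2.68
```

Going through str() means the rounding sees the digits a person would have typed. A float signature forces that compromise; a Decimal signature would avoid it.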

Then I ask: “Write a parametrize table covering: 1.005, 1.015, 1.025, -1.005, 0.0, large numbers.” It produces:

import pytest

@pytest.mark.parametrize("amount,expected", [
    (1.005, 1.00),
    (1.015, 1.02),
    (1.025, 1.02),
    (-1.005, -1.00),
    (0.0, 0.0),
    (1_000_000.555, 1_000_000.56),
])
def test_round_price(amount: float, expected: float) -> None:
    assert round_price(amount) == expected

I run them. They pass. I ship.

Total time: under two minutes. Hand-written, this would have been ten minutes of typing and a quick detour to the decimal docs to remind myself of the constant name. The AI didn’t do anything I couldn’t have done; it did things I would have done, faster.

The 2026 tool tradeoffs, briefly

Copilot — best inline autocomplete, weak on multi-file refactors. The default if you live in VS Code and don’t want to think about it.
Cursor — VS Code fork with first-class repo-level context and an excellent chat panel. Worth the switch if you want the AI to see more than one file at a time.
Claude in Cursor / Continue — same shell, smarter model for harder tasks. The combination I use for non-trivial refactors.
Aider — terminal-based, git-aware, makes commits for you. Great for batch refactors and headless workflows.
Codex CLI — OpenAI’s terminal agent, similar territory to Aider with different ergonomics.

The tools converge over time; the workflow doesn’t. Type-hint, document conventions, review every line, prompt small. Do that and any of them work. Skip those and none of them do.