Reading an execution plan without crying

Your query takes 45 seconds. Your colleague writes what looks like the same logic and it runs in two. Yours looks simpler. You copy theirs. It still takes 45 seconds in your code. The universe, as always, is laughing at you.

The thing that’s different is the execution plan: the actual set of steps SQL Server chose to run your query. Learning to read it is like learning to read a map. Once you can, you stop getting lost, and you stop copy-pasting queries from Stack Overflow that “worked for somebody once.”

What an execution plan actually is

When you submit a query, SQL Server’s query optimizer looks at it, looks at statistics about the tables, looks at the available indexes, and picks what it thinks is the cheapest way to get your answer. That plan is a tree of operators — small specialized pieces like “seek this index,” “hash these two inputs together,” “sort the result.” Each operator takes some rows as input and produces some rows as output.

Two important quirks:

The optimizer guesses. It doesn’t actually know how many rows your WHERE clause will match. It estimates using statistics. Bad estimates produce bad plans.
The optimizer caches. Once it has a plan, it reuses it for the next query that looks the same. This is usually great and occasionally a disaster. (See the parameter sniffing section below.)

How to see the plan

In SSMS:

Estimated plan — Ctrl+L or “Display Estimated Execution Plan” button. Shows what the optimizer thinks it will do, without running the query. Useful for massive queries you don’t want to wait on.
Actual plan — Ctrl+M, then run the query. Shows what actually happened, including real row counts. This is the one you want 95% of the time. If you’re not already turning this on out of habit, start today.

In Azure Data Studio, it’s the “Explain” button. In DBeaver, it’s “Execute SQL Statement and Show Plan.” Similar idea, different UI.

To see plans for queries that ran earlier and already left the building:

SELECT TOP 20
    qs.execution_count,
    qs.total_elapsed_time / qs.execution_count / 1000 AS avg_ms,
    SUBSTRING(st.text,
              qs.statement_start_offset / 2 + 1,
              (CASE qs.statement_end_offset
                   WHEN -1 THEN DATALENGTH(st.text)
                   ELSE qs.statement_end_offset
               END - qs.statement_start_offset) / 2 + 1) AS statement_text,
    qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle)   AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
ORDER BY qs.total_elapsed_time DESC;

Click the query_plan column in the result grid and SSMS will open the plan in a new tab. This is how you find out what the production server has been doing while you weren’t looking.

Read plans right to left, bottom to top

Execution plans are trees that flow right to left. The rightmost operators produce data — scans, seeks, constant rows. The leftmost operator is the final SELECT handing results back to you. Arrows carry rows between operators.

Every operator has three numbers you should look at before anything else:

Cost percent — relative cost of this operator inside this query. An operator at 85% is where your time is going. (Caveat: this percentage is itself an estimate. Trust it as a pointer, not as gospel.)
Estimated rows vs actual rows — if these differ by even 10×, the optimizer is probably making a bad strategic call based on the wrong assumption. Differ by 100× and it’s almost certainly the source of your pain.
Arrow thickness — the thickness of the arrow between operators represents data volume. A thick arrow feeding a thin operator is a filter doing its job. A thin arrow feeding a thick one is a nested loop multiplying rows out of control.

If you only ever look at those three, you will solve a lot of problems on your own without ever learning what “Eager Spool” means.

The three operators that explain ninety percent of slow queries

SQL Server has about fifty operators. You do not need to learn them all. Learn these three and come back for the rest later.

1. Clustered Index Scan (aka “scanning the whole table”)

A scan reads every leaf page of the index. For a clustered index, that means every row in the table. For a million-row table that’s a million rows, even if you only wanted ten.

Scans aren’t inherently evil. If you genuinely need most of the table — a nightly aggregate, an export, a COUNT(*) of the whole thing — a scan is the cheapest option. But if your query has a selective WHERE clause and the plan still shows a scan, it almost always means: no index exists, or the index isn’t usable because of the way the filter is written.

Things that kill indexability:

-- Function on the column: index is useless
WHERE YEAR(OrderDate) = 2024

-- Rewrite as a range: index can seek
WHERE OrderDate >= '2024-01-01' AND OrderDate < '2025-01-01'

-- Implicit conversion: index is useless
WHERE CustomerId = '42'    -- CustomerId is INT, string gets converted

-- Rewrite: index can seek
WHERE CustomerId = 42

This is a deceptively big category. “I have an index and it’s still doing a scan” is almost always one of these patterns. When in doubt, copy the query into a new window, select the column in the WHERE clause, and check its data type in the table definition. If the literal you’re comparing against isn’t the same type, you have an implicit conversion and it’s hiding a scan.

2. Key Lookup (the sneaky expensive one)

You have a non-clustered index that finds the rows, but the query wants columns that aren’t in that index. So SQL Server does a seek in the non-clustered index, gets a pointer back to the clustered index, and fetches the missing columns one row at a time. That per-row round-trip is the Key Lookup.

For five rows, fine. For fifty thousand rows, your query just turned into fifty thousand tiny random reads and your fan is spinning up.

The fix is to make the non-clustered index covering — include the missing columns in the INCLUDE list so no round-trip is needed:

CREATE NONCLUSTERED INDEX IX_Orders_Status
    ON dbo.Orders (Status)
    INCLUDE (CustomerName, OrderTotal);

Now the seek returns everything the query needs. Key Lookup disappears from the plan. Your query runs 50× faster. Somebody gives you a promotion. (Step three is optional.)

3. Hash Match and Sort — not bad, but telling

These appear when SQL Server is joining or aggregating data. They’re not inherently a problem. They are information:

Nested Loops — great when one input is tiny. For every row on the outer input, the engine scans the inner input. Small outer + indexed inner = very fast. Large outer = catastrophic.
Hash Match — great when inputs are large and unsorted. Builds a hash table of one side in memory, then probes it with the other. Uses memory.
Merge Join — great when both inputs are already sorted on the join key. Most efficient option if you can get it.

The optimizer picks based on row estimates. When the estimates are badly wrong, it picks badly. If you see a Hash Match eating 70% of your query cost and the row estimate is off by two orders of magnitude, you probably have stale statistics. Update them:

UPDATE STATISTICS dbo.Orders;
-- or for everything:
EXEC sp_updatestats;

If that fixes it, schedule a weekly job. If it doesn’t fix it, you’ve got a more interesting problem, and lesson 26 (statistics and cardinality estimation) is going to be your friend.

The three plan shapes you should recognize on sight

After a while, you start reading plans the way a mechanic reads the sound of an engine. These are the three shapes I want you to be able to name in under a second:

Shape A — the clean seek. Index Seek → SELECT. One operator doing the work, a thin arrow, low cost. This is what a healthy query looks like. Admire it.

Shape B — the seek-plus-lookup. Index Seek (NonClustered) → Nested Loops → Key Lookup (Clustered) → SELECT. The Key Lookup is the problem. Fix with a covering index.

Shape C — the scan feeding a giant hash join. Clustered Index Scan → Hash Match → Hash Match → SELECT, with arrows that get progressively thicker. Somebody forgot an index, statistics are stale, or the query is asking for something that fundamentally can’t be seek-answered. Start with stats. Then indexes. Then rewrite the query.

Parameter sniffing: the invisible villain

SQL Server caches the plan for a query the first time it runs, optimized for whatever parameter values it saw that first time. Next time the same query runs with different values, it reuses the plan. Sometimes the old plan is terrible for the new values. This is called parameter sniffing and it’s the reason behind almost every “it’s fast in SSMS but slow from the application” ticket.

In SSMS you ran it with fresh parameters and got a fresh plan optimized for you. The application hit the cache and got yesterday’s plan optimized for someone else’s parameters. Your query and the application’s are textually identical. The plans aren’t.

Quick fixes, in increasing order of “things will go wrong” risk:

-- Safest: compile a fresh plan for this one call
EXEC dbo.GetOrdersByCustomer @CustomerId = 123 WITH RECOMPILE;

-- Inside the stored procedure itself: generic plan, always
CREATE PROCEDURE dbo.GetOrdersByCustomer @CustomerId INT
AS
BEGIN
    SELECT * FROM dbo.Orders
    WHERE CustomerId = @CustomerId
    OPTION (OPTIMIZE FOR UNKNOWN);
END;

-- Nuclear: flush the single cached plan for this query
DBCC FREEPROCCACHE(plan_handle);

A real fix usually involves Query Store plan forcing, which we’ll cover in lesson 28. For now: if you’ve been chasing a “runs fast in SSMS, slow in the app” ghost, parameter sniffing is the first hypothesis to test.

The 80/20 reading strategy

You do not need to understand every operator. You need a checklist you can run on any plan in under a minute:

Find the fattest arrow. That’s where the most data is moving.
Find the highest cost percentage. That’s where the most time is going.
Compare estimated rows vs actual rows at the expensive operator. Off by more than 10× means either stale statistics or parameter sniffing.
Scan where a seek should be? Missing index, implicit conversion, or a function on the column.
Key Lookup on the expensive branch? Missing INCLUDE columns.
Nested Loops on a huge outer input? Stats are lying to the optimizer about how many rows it would get.

These six checks solve about 90% of slow queries. The remaining 10% is where the course gets interesting (lessons 25 through 28 live in this territory) and where tools like sp_BlitzCache from lesson 38 start paying for themselves.

Run this on your own machine

Here’s a minimal playground you can run in tempdb and see all three operators in one afternoon:

USE tempdb;

CREATE TABLE dbo.LessonOrders (
    OrderId      BIGINT IDENTITY(1,1) PRIMARY KEY,
    CustomerId   INT NOT NULL,
    Status       TINYINT NOT NULL,
    CustomerName NVARCHAR(100) NOT NULL,
    OrderTotal   DECIMAL(18,2) NOT NULL
);

-- Fill it with some data
INSERT INTO dbo.LessonOrders (CustomerId, Status, CustomerName, OrderTotal)
SELECT TOP (100000)
       ABS(CHECKSUM(NEWID())) % 1000,
       ABS(CHECKSUM(NEWID())) % 5,
       N'Customer ' + CAST(ROW_NUMBER() OVER (ORDER BY a.object_id) AS NVARCHAR(10)),
       ABS(CHECKSUM(NEWID())) % 100000 / 100.0
FROM sys.all_objects a CROSS JOIN sys.all_objects b;

-- Turn on actual execution plans: Ctrl+M
-- 1. Scan
SELECT CustomerName, OrderTotal
FROM dbo.LessonOrders
WHERE Status = 2;
-- Plan: Clustered Index Scan. Slow.

-- Add a helpful-looking index
CREATE NONCLUSTERED INDEX IX_Status ON dbo.LessonOrders (Status);

-- 2. Seek + Key Lookup
SELECT CustomerName, OrderTotal
FROM dbo.LessonOrders
WHERE Status = 2;
-- Plan: Index Seek → Key Lookup. Still slow.

-- Make the index covering
DROP INDEX IX_Status ON dbo.LessonOrders;
CREATE NONCLUSTERED INDEX IX_Status
    ON dbo.LessonOrders (Status)
    INCLUDE (CustomerName, OrderTotal);

-- 3. Clean seek
SELECT CustomerName, OrderTotal
FROM dbo.LessonOrders
WHERE Status = 2;
-- Plan: Index Seek. Fast.

Three queries, three shapes. Scan. Seek plus lookup. Clean seek. Once you can see these three at a glance, most of your “it’s slow and I don’t know why” problems are going to look really different.

Next lesson: statistics and cardinality estimation — what actually drives the optimizer’s guessing game, and how to tell when its guesses have gone off the rails.