You start writing a simple query at Runehold: “top-10 customers by lifetime value.” Then marketing asks “only customers in Germany.” Then “and only those whose LTV is above the country average.” Then “show both last quarter and this quarter, side-by-side.” Suddenly your simple query is 60 lines, nested three deep, and nobody can read it.
This lesson is about keeping those queries readable. We’ll cover subqueries (queries inside queries), derived tables (subqueries in the FROM), correlated subqueries (subqueries that reference the outer row), and CTEs (Common Table Expressions, aka WITH clauses) — the tool that lets you decompose a big query into named, readable steps.
The four flavours of subquery
1. Scalar subquery: returns a single value
SELECT o.OrderId,
o.Total,
(SELECT AVG(Total) FROM Sales.Orders) AS overall_average
FROM Sales.Orders AS o
WHERE o.Total > (SELECT AVG(Total) FROM Sales.Orders);
The inner query returns a single value (one row, one column). You can use it anywhere a literal value would go: in SELECT, in WHERE, in ORDER BY.
Gotcha: if the scalar subquery returns more than one row at runtime, you get an error. SQL Server doesn’t know which value to use. Write scalar subqueries so they provably return one row.
2. Multi-row subquery with IN or EXISTS
-- All orders from customers in Italy
SELECT OrderId, Total
FROM Sales.Orders
WHERE CustomerId IN (SELECT CustomerId FROM Sales.Customer WHERE Country = 'IT');
-- Equivalent with EXISTS (often clearer for complex conditions)
SELECT o.OrderId, o.Total
FROM Sales.Orders AS o
WHERE EXISTS (
SELECT 1 FROM Sales.Customer c
WHERE c.CustomerId = o.CustomerId AND c.Country = 'IT'
);
IN is fine with a simple column. EXISTS is better when the subquery has multiple conditions or needs to reference the outer row. Both produce nearly identical plans in modern SQL Server; pick based on readability.
3. Derived table: subquery in the FROM
-- Per-country AOV, filtered to countries with at least 10 orders
SELECT country, orders, aov
FROM (
SELECT CountryCode AS country,
COUNT(*) AS orders,
AVG(Total) AS aov
FROM Sales.Orders
GROUP BY CountryCode
) AS country_stats
WHERE orders >= 10
ORDER BY aov DESC;
The subquery (SELECT ... GROUP BY CountryCode) is a derived table. It has to be aliased (here, country_stats), and the outer query treats it like any other table. Useful when you need to aggregate once and then filter/join on the aggregated result.
4. Correlated subquery: references the outer row
-- For each customer, show their latest order's date
SELECT c.CustomerId,
c.Name,
(SELECT MAX(o.OrderDate)
FROM Sales.Orders o
WHERE o.CustomerId = c.CustomerId) AS last_order
FROM Sales.Customer AS c;
The inner query references c.CustomerId, which comes from the outer query. It runs conceptually once per outer row. This is called a correlated subquery.
Correlated subqueries look slow but the optimizer usually rewrites them into a join or a semi-join. Still, prefer explicit joins when possible — they express the intent more clearly.
CTEs: the WITH clause
A CTE (Common Table Expression) is a named subquery you define at the top of the statement and reference later:
WITH country_stats AS (
SELECT CountryCode AS country,
COUNT(*) AS orders,
AVG(Total) AS aov,
SUM(Total) AS revenue
FROM Sales.Orders
GROUP BY CountryCode
),
top_customers AS (
SELECT TOP (10) c.Country, c.Name, SUM(o.Total) AS ltv
FROM Sales.Customer AS c
JOIN Sales.Orders AS o ON o.CustomerId = c.CustomerId
GROUP BY c.Country, c.Name
ORDER BY ltv DESC
)
SELECT cs.country, cs.revenue, cs.aov, tc.Name AS top_customer, tc.ltv
FROM country_stats AS cs
LEFT JOIN top_customers AS tc ON tc.Country = cs.country
ORDER BY cs.revenue DESC;
Compare this with the same logic written as nested subqueries — it would be 30 lines of nesting, unreadable after a week. CTEs turn big queries into named steps, each one a self-contained thought.
CTE rules:
- Multiple CTEs are comma-separated.
- Each CTE can reference earlier CTEs in the same
WITH. - The CTE only exists for this one statement. Not a view. Not stored.
- CTEs are usually inlined by the optimizer — no performance penalty versus a derived table.
When to reach for a CTE:
- Any query that would nest subqueries two or more levels.
- Any query you’d want to comment with “step 1”, “step 2.”
- Recursive queries (see next section).
The “is it the same plan” myth
You’ll hear people say “CTEs are the same as derived tables, same plan, same performance.” That’s mostly true in SQL Server: the optimizer usually inlines a non-recursive CTE and treats it the same as a derived table.
Not always. A CTE referenced multiple times isn’t materialized; the optimizer inlines the definition into each reference. For expensive CTEs used more than once, this means the expensive operation runs repeatedly. If you see that pattern and the CTE is genuinely expensive, the workaround is to dump the CTE’s results into a temp table and join against that.
-- If country_stats is very expensive and we reference it twice:
DROP TABLE IF EXISTS #country_stats;
SELECT CountryCode AS country,
COUNT(*) AS orders,
AVG(Total) AS aov
INTO #country_stats
FROM Sales.Orders
GROUP BY CountryCode;
-- Then the outer queries use #country_stats, computed once.
SELECT * FROM #country_stats WHERE orders > 100;
SELECT * FROM #country_stats WHERE aov < 50;
Temp tables (the # prefix) live for the life of your session. They materialize; the query optimizer sees their statistics; they can even be indexed. For truly expensive intermediate computations, they win over CTEs. For everything else, CTEs are cleaner.
Recursive CTEs: less scary than they sound
A recursive CTE references itself. Classic use: walking a hierarchy, like Runehold’s org chart (employees → managers → directors → VPs).
WITH org_tree AS (
-- Anchor: the CEO, the top of the tree
SELECT EmployeeId, FullName, ManagerId, 0 AS level
FROM HR.Employee
WHERE ManagerId IS NULL
UNION ALL
-- Recursive part: add rows one level below what we already have
SELECT e.EmployeeId, e.FullName, e.ManagerId, t.level + 1
FROM HR.Employee AS e
JOIN org_tree AS t ON t.EmployeeId = e.ManagerId
)
SELECT level, REPLICATE(' ', level) + FullName AS indented_name
FROM org_tree
ORDER BY level, FullName;
Structure:
- Anchor — the starting point. “Rows where ManagerId is NULL” = the CEO.
UNION ALL— mandatory; notUNION.- Recursive part — a query that joins the CTE to itself, adding one more layer each iteration.
The engine runs the anchor first, then the recursive part repeatedly, adding rows until the recursive part returns zero new rows. Fails safely with a default limit of 100 recursion levels; override with OPTION (MAXRECURSION 500) if you have a genuinely deep tree.
Uses of recursive CTEs:
- Org charts, folder trees, comment threads, category nesting.
- Bill-of-materials (product → parts → sub-parts).
- Generating date ranges: “give me a row for every day between X and Y.”
- Graph traversal.
For anything reachable from one “anchor,” a recursive CTE is the right tool. It’s not magic; it’s just repeated self-joining with a termination condition.
Refactoring a monster query
Here’s a before-and-after. The “before” is based on an actual production query I rewrote last year.
Before — nested, 55 lines, impossible to follow:
SELECT
c.Name,
c.Country,
(SELECT COUNT(*) FROM Sales.Orders o
WHERE o.CustomerId = c.CustomerId) AS order_count,
(SELECT SUM(o.Total) FROM Sales.Orders o
WHERE o.CustomerId = c.CustomerId) AS lifetime_value,
(SELECT MAX(o.OrderDate) FROM Sales.Orders o
WHERE o.CustomerId = c.CustomerId) AS last_order_date,
CASE
WHEN (SELECT COUNT(*) FROM Sales.Orders o
WHERE o.CustomerId = c.CustomerId) = 0 THEN 'Never ordered'
WHEN (SELECT MAX(o.OrderDate) FROM Sales.Orders o
WHERE o.CustomerId = c.CustomerId) < DATEADD(MONTH, -6, GETDATE()) THEN 'Dormant'
ELSE 'Active'
END AS status
FROM Sales.Customer c
WHERE c.Country IN ('IT', 'NL', 'DE');
Four correlated subqueries, each running per customer. The CASE repeats two of them. Slow, unreadable, brittle.
After — using a CTE:
WITH customer_orders AS (
SELECT o.CustomerId,
COUNT(*) AS order_count,
SUM(o.Total) AS lifetime_value,
MAX(o.OrderDate) AS last_order_date
FROM Sales.Orders AS o
GROUP BY o.CustomerId
)
SELECT
c.Name,
c.Country,
COALESCE(co.order_count, 0) AS order_count,
COALESCE(co.lifetime_value, 0) AS lifetime_value,
co.last_order_date,
CASE
WHEN co.order_count IS NULL OR co.order_count = 0 THEN 'Never ordered'
WHEN co.last_order_date < DATEADD(MONTH, -6, GETDATE()) THEN 'Dormant'
ELSE 'Active'
END AS status
FROM Sales.Customer AS c
LEFT JOIN customer_orders AS co ON co.CustomerId = c.CustomerId
WHERE c.Country IN ('IT', 'NL', 'DE');
One aggregation, joined once, used four times. Same result. Faster. Readable. Easier to add a new column.
This refactor is the single most common one I do in real code. If you see a query with the same scalar subquery five times, reach for a CTE immediately.
Run this on your own machine
USE Runehold;
GO
-- 1. Scalar subquery: orders above the overall average
SELECT o.OrderId, o.Total,
(SELECT AVG(Total) FROM Sales.Orders) AS overall_aov
FROM Sales.Orders AS o
WHERE o.Total > (SELECT AVG(Total) FROM Sales.Orders)
ORDER BY o.Total DESC;
-- 2. Derived table: countries with at least 2 orders, with their AOV
SELECT country, orders, aov
FROM (
SELECT CountryCode AS country, COUNT(*) AS orders, AVG(Total) AS aov
FROM Sales.Orders
GROUP BY CountryCode
) AS s
WHERE orders >= 2
ORDER BY aov DESC;
-- 3. CTE version of the same
WITH country_stats AS (
SELECT CountryCode, COUNT(*) AS orders, AVG(Total) AS aov
FROM Sales.Orders
GROUP BY CountryCode
)
SELECT CountryCode, orders, aov
FROM country_stats
WHERE orders >= 2
ORDER BY aov DESC;
-- 4. Multi-CTE: customer classification
WITH customer_orders AS (
SELECT CustomerId,
COUNT(*) AS order_count,
SUM(Total) AS ltv,
MAX(OrderDate) AS last_order
FROM Sales.Orders
GROUP BY CustomerId
),
classified AS (
SELECT c.CustomerId, c.Name, c.Country,
COALESCE(co.ltv, 0) AS ltv,
CASE
WHEN co.order_count IS NULL THEN 'Never ordered'
WHEN co.last_order < DATEADD(MONTH, -6, GETDATE()) THEN 'Dormant'
ELSE 'Active'
END AS status
FROM Sales.Customer AS c
LEFT JOIN customer_orders AS co ON co.CustomerId = c.CustomerId
)
SELECT * FROM classified ORDER BY ltv DESC;
-- 5. Recursive CTE: generate every day between two dates
WITH days AS (
SELECT CAST('2026-04-01' AS DATE) AS d
UNION ALL
SELECT DATEADD(DAY, 1, d)
FROM days
WHERE d < '2026-04-10'
)
SELECT d, DATENAME(WEEKDAY, d) AS day_of_week
FROM days
OPTION (MAXRECURSION 1000);
The recursive one is especially powerful: it gives you a row per date without a date-dimension table. Useful for filling gaps in a time series report.
Next lesson: window functions. ROW_NUMBER, LAG, LEAD, running totals. The thing that makes “top N per group” trivial and replaces half the cases where people would reach for a self-join.