SQL Server, from the ground up Lesson 23 / 40

Filtered indexes and columnstore: the niche heroes

When rowstore isn't enough. Filtered indexes for soft-delete + sparse data. Columnstore for warehouse queries. How to pick and when.

Rowstore indexes — the B-tree we’ve been discussing for the past two lessons — are the default and the right choice for 90% of queries. Today is about the other 10%: filtered indexes for predicates that keep coming back, and columnstore indexes for reporting queries over tens of millions of rows.

Runehold uses both. Filtered indexes on Status (soft-delete flag) and on IsActive (subscription state) speed up the most common customer service queries. Columnstore lives on the finance warehouse star schema, where “revenue by country by channel by month” aggregates over 200 million rows in under a second.

Filtered indexes

A filtered index is a non-clustered index with a WHERE clause. It only indexes rows matching that predicate. Small, cheap, focused.

-- Index only the small fraction of orders that are currently in flight
CREATE NONCLUSTERED INDEX IX_Orders_Open
    ON Sales.Orders (CustomerId, OrderDate DESC)
    INCLUDE (Total, Status)
    WHERE Status IN (1, 2);   -- 1 = new, 2 = processing

Compared to an unfiltered equivalent: this index has maybe 1,000 rows instead of 100 million. Seeks are faster. Maintenance is cheaper. The operations dashboard query that filters by Status IN (1, 2) seeks this small index. Queries that don’t match the filter ignore it.

Common uses

Soft-delete / flag columns:

CREATE NONCLUSTERED INDEX IX_Customer_Active
    ON Sales.Customer (Email)
    WHERE IsDeleted = 0;
-- Only indexes live customers

Sparse columns where most values are NULL:

-- Only 1% of customers have a priority support flag
CREATE NONCLUSTERED INDEX IX_Customer_VIP
    ON Sales.Customer (VipTier)
    WHERE VipTier IS NOT NULL;

Unique among a subset:

-- Email must be unique among active customers; deleted customers can reuse
CREATE UNIQUE NONCLUSTERED INDEX UX_Customer_Email_Active
    ON Sales.Customer (Email)
    WHERE IsDeleted = 0;

That last one is an underused trick. SQL Server won’t let you create a UNIQUE constraint on a column that has duplicates, but a filtered UNIQUE index lets you enforce uniqueness only among rows that matter.

Gotchas

Filtered indexes have picky rules:

  • The query’s WHERE must include the filter predicate for the optimizer to use the index.
  • Some parameterization patterns prevent matches. A literal WHERE Status IN (1, 2) matches; a parameterized WHERE Status = @s doesn’t, even when @s is 1. Workaround: recompile, or use a stored procedure written for the filter.
  • Filtered indexes don’t participate in certain types of statistics usage, occasionally leading to weird estimates.
  • Creating a filtered index on a large, always-changing set can update-storm on maintenance.

For well-defined “status = active” style queries, they’re nearly magic. For anything else, test thoroughly.

Columnstore indexes

A columnstore index stores data by column instead of by row. Each column is compressed separately. For analytics queries that read many rows but few columns — “total revenue per country per month over five years” — columnstore is 10-100× faster than rowstore.

Clustered columnstore

Replaces the clustered index entirely:

CREATE CLUSTERED COLUMNSTORE INDEX CCI_OrderFact
    ON Fact.OrderLine;

The whole table is now stored column-by-column, compressed heavily (typically 5-10× reduction from rowstore size). You write to it with INSERT, UPDATE, DELETE; reads on a few columns over many rows are very fast; reads of single rows are slower than a B-tree.

Best for: fact tables in a data warehouse, event logs queried aggregate-style, historical data.

Bad for: OLTP — high-frequency single-row lookups are much slower than rowstore.

Non-clustered columnstore

Added alongside an existing rowstore table. Readers who want analytics use the columnstore; OLTP writes go through the B-tree as normal.

CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Orders_Analytics
    ON Sales.Orders (OrderDate, CustomerId, CountryCode, Status, Total);

Hybrid workload: OLTP writes hit both structures; analytics queries hit the columnstore. Slower writes than rowstore-only, but readers get both worlds.

Batch-mode execution

Columnstore unlocks batch-mode execution — the engine processes 900 rows at a time in tight, vectorized loops instead of row-by-row. Dramatic CPU savings on aggregate queries. Your query goes from 60 seconds to 0.8.

SQL Server 2019+ also supports batch mode on rowstore for some queries (“batch mode on rowstore”), which you get for free on an Enterprise-licensed instance. Useful when you can’t add a columnstore but want some of the benefit.

Rules of thumb

  • Fact tables with 5M+ rows: columnstore is usually worth it.
  • Tables smaller than 100K rows: skip columnstore; a rowstore with the right indexes is fine.
  • Hot OLTP tables: don’t use clustered columnstore. Consider non-clustered columnstore if you need to run analytics against live data.

When to pick what

Decision flow for Runehold’s engineering team:

  1. Start with rowstore and a covering non-clustered index. 90% done.
  2. Is the predicate always the same, narrow, and well-known? Add a filtered index.
  3. Is this a fact table that grows, and are most queries aggregating? Add a columnstore.
  4. Do you have a hybrid workload — OLTP writes + analytics reads on the same table? Non-clustered columnstore on top of the rowstore.

Runehold’s Sales.Orders table: rowstore with covering non-clustered indexes for customer-facing lookups, plus a non-clustered columnstore for the finance reporting layer. Best of both worlds.

Index hints (use sparingly)

Sometimes the optimizer picks a bad plan. You can force an index with a hint:

SELECT * FROM Sales.Orders WITH (INDEX(IX_Orders_Open)) WHERE Status IN (1, 2);

Do this rarely. Every index hint is a bet that the plan you pick today will still be best in six months. Upgrades, statistics changes, data growth can make the hint wrong. Prefer updating statistics, fixing the index, or using Query Store plan forcing (lesson 28) over hardcoded hints.

Run this on your own machine

USE Runehold;
GO

-- Demo: filtered index
CREATE NONCLUSTERED INDEX IX_Orders_Open_Customer
    ON Sales.Orders (CustomerId, OrderDate DESC)
    INCLUDE (Total)
    WHERE Status IN (1, 2);

-- Runs fast IF the WHERE includes the filter condition
SELECT CustomerId, OrderDate, Total
FROM Sales.Orders
WHERE Status IN (1, 2)
  AND CustomerId = 42;

-- Check the plan: should show an Index Seek on IX_Orders_Open_Customer.

-- If you parameterize differently, the filtered index may NOT be used:
DECLARE @s TINYINT = 1;
SELECT CustomerId, OrderDate, Total
FROM Sales.Orders
WHERE Status = @s
  AND CustomerId = 42;
-- This may not match the filtered index. Check the plan.

-- Demo: clustered columnstore on a small fact table
IF OBJECT_ID('Reporting.OrderFact', 'U') IS NULL
BEGIN
    IF SCHEMA_ID('Reporting') IS NULL EXEC('CREATE SCHEMA Reporting AUTHORIZATION dbo');
    CREATE TABLE Reporting.OrderFact (
        OrderId     BIGINT,
        OrderDate   DATE,
        CustomerId  INT,
        CountryCode CHAR(2),
        Channel     VARCHAR(20),
        Total       DECIMAL(19, 4)
    );
END;

-- Fill 1M rows for the demo
INSERT INTO Reporting.OrderFact (OrderId, OrderDate, CustomerId, CountryCode, Channel, Total)
SELECT TOP (1000000)
    ROW_NUMBER() OVER (ORDER BY (SELECT 1)),
    DATEADD(DAY, -ABS(CHECKSUM(NEWID())) % 730, GETDATE()),
    ABS(CHECKSUM(NEWID())) % 5000,
    CHAR(65 + ABS(CHECKSUM(NEWID())) % 5) + CHAR(65 + ABS(CHECKSUM(NEWID())) % 5),
    CASE ABS(CHECKSUM(NEWID())) % 3 WHEN 0 THEN 'Web' WHEN 1 THEN 'Retail' ELSE 'Marketplace' END,
    ABS(CHECKSUM(NEWID())) % 10000 / 10.0
FROM sys.all_objects a CROSS JOIN sys.all_objects b;

-- Baseline rowstore
SET STATISTICS IO, TIME ON;
SELECT CountryCode, Channel, SUM(Total) AS revenue, COUNT(*) AS orders
FROM Reporting.OrderFact
GROUP BY CountryCode, Channel;

-- Add a columnstore
CREATE CLUSTERED COLUMNSTORE INDEX CCI_OrderFact ON Reporting.OrderFact;

-- Same query, much faster
SELECT CountryCode, Channel, SUM(Total) AS revenue, COUNT(*) AS orders
FROM Reporting.OrderFact
GROUP BY CountryCode, Channel;
SET STATISTICS IO, TIME OFF;

-- Cleanup
DROP TABLE Reporting.OrderFact;

Run both queries before and after adding the columnstore. Note the elapsed time and reads. On a million rows the difference is already visible; on 100 million it’s dramatic.

Next lesson: fragmentation, rebuilds, reorgs, and the myth of 30% — what Paul Randal actually said, why the default maintenance plan is wrong for modern SSDs, and how to build a sensible maintenance strategy.

Search