Significance of order of columns in an index
I have been delivering a certified course in MS SQL Server 2012 recently and I was asked a very interesting question "Does the order of the column in an index matter?"
Let's give some basic definitions first and make some analogies.
A single column index is straightforward. Think of it as an index in the back of the book.
Let’s say you want to learn about “DBCC FREEPROCCACHE” . You look for that command in the index of the book. The index does not have the information. It has a pointer to the page where the command is described.You turn to that page and read about it. This is a good analogy for a single column,non-clustered index
In SQL Server you can also create an index that contains more than one column.This is known as the composite index. A good analogy for a composite index is the telephone book.
A telephone book lists everyone who has publicly available a telephone number in an area.It is organised not by one column but by two:last name and first name.To look up someone in the telephone book , we first navigate to the last name and then the first name. To find John Smith you first locate Smiths and then John. Composite indexes contain more than 1 column and can reference up to 16 columns from a single table or view.
Back to our question.I have answered this question with the demo below.
I have installed SQL Server 2012 Enterprise edition in my machine but you can use the SQL Server 2012/2008 Express edition as well.
I am connecting to my local instance through Windows Authentication and in a new query window I type (you can copy paste)
USE tempdb
GO
--create a sample table
CREATE TABLE Customers
(
Customer_ID INT NOT NULL IDENTITY(1,1),
Last_Name VARCHAR(20) NOT NULL,
First_Name VARCHAR(20) NOT NULL,
Email_Address VARCHAR(50) NULL
)
--create a clustered index on Customer_ID and
--a non-clustered composite index on the Last_Name and First_Name
CREATE CLUSTERED INDEX ix_Customer_ID ON Customers(Customer_ID)
CREATE NONCLUSTERED INDEX ix_Customer_Name ON Customers(Last_Name,First_Name)
--when we issue a query to SQL Server that
--retrieves data from the Customers table, the
--SQL Server query optimiser will consider
--the various retrieval methods at its disposal
--and select one it deems most appropriate
-- insert test row
INSERT INTO customerS VALUES('Smith','John','[email protected]')
--use SQL Data Generator to create sample data 10000 rows
SELECT * FROM customers
-- DBCC DROPCLEANBUFFERS removes all
-- buffers from the buffer pool
-- DBCC freeproccache removes all
-- entries from the procedure cache
DBCC dropcleanbuffers
DBCC freeproccache
-- when we run this query and see the execution plan
-- we have an index seek on the nonclustered index
-- to locate the rows selected by the query
-- we also have a key lookup to find the values for the email
-- to retrieve the non-indexed columns
SELECT last_name ,
first_name ,
email_address
FROM customers
WHERE Last_Name = 'Smith'
AND First_Name = 'John'
-- what happens in this case?
--where we have the WHERE statement in different order than the index order?
-- it will use the same execution plan
SELECT last_name ,
first_name ,
email_address
FROM customers
WHERE First_Name = 'John'
AND Last_Name = 'Smith'
-- and what happens in this case?
--where we use only last name?
--the same plan is used
SELECT last_name ,
first_name ,
email_address
FROM customers
WHERE Last_Name = 'Smith'
-- what happens in this case?
-- when we use only first_name
--the index cannot be used
-- an index cannot be used to seek rows of data when the first column
-- of the index is not specified in the WHERE clause
INSERT INTO customers VALUES ('kantzelis','nikos','[email protected]')
SELECT last_name ,
first_name ,
email_address
FROM customers
WHERE First_Name = 'nikos'
SET STATISTICS IO on
DBCC dropcleanbuffers
DBCC freeproccache
SELECT last_name ,
first_name ,
email_address
FROM customers
WHERE Last_Name = 'kantzelis'
-- now lets drop the index to see what happens
DROP INDEX ix_Customer_Name ON customers
--and rerun the query
--we see a huge increase in logical
-- reads without the index
--we have in this case, in the
-- absence of the index a clustered index scan
--which means that each row of the table had to be read
SELECT last_name ,
first_name ,
email_address
FROM customers
WHERE Last_Name = 'kantzelis'
Let me explain what I am doing in this bit of code
- I make the tempdb the current database
- I create a new table called Customers
- I create a clustered index on the table ix_Customer_ID on Customer_ID
- I insert a test row to the table
- Then I use a third party generator tool to create 10.000 records
- Then I just use a simple statement to make sure all the rows were inserted
- Then I remove all buffers from the buffer pool and all entries from the procedure cache
- Then (I activate the actual execution plan) and execute the Select query in line 43.
- We have an index seek on the non clustered index and a key lookup to get the values for the email column.
- Then I execute the Select statement in lines 54. The order of the columns in the WHERE clause are different than the index order.
- Still the same execution plan is used
- Then I execute the Select statement in lines 65. We have just the last_name column in the WHERE clause.The same plan is used
- Then in line 76 I insert a new value in the table
- Then I execute another Select statement (lines 78) where just the first_name column is in the WHERE clause.In this case the index cannot be used. An index cannot be used to seek rows of data when the first column of the index is not specified in the WHERE clause.
- Then in line 84 I set statistics IO on so i can investigate the logical reads when having or not the index.
- In line 89 I run a Select query again and make a note of the logical reads , which is 4 in my case.
- Then I drop the index and rerun the query (line 103). This time the logical reads are 77. So you can see that the performance of the query without the index has been dramatically decreased.
Hope it helps!!!