Copy
Buckle up: time to learn more about SQL Server,
or whatever I'm obsessed with this week.

Free Fundamentals of Index Tuning Week: Part 1, Indexing for the WHERE Clause

I’m teaching my Mastering Index Tuning pre-con at SQLBits and SQL Saturday Iceland, and part of the prerequisites is that you’re already familiar with my How to Think Like the Engine class and my Fundamentals of Index Tuning class.

I was going to email the class attendees a free copy of Fundamentals of Index Tuning so they could prep, but then I thought…hey, why not just give it out to the public for one week only? So this week, I’m going to put the class modules right here on the blog. But follow along, because on April 1, they’re going to disappear. Let’s get started!

Each day’s post will have:

  • A video of my lecture
  • The demo scripts I use in the lecture
  • A hands-on lab for you to do
  • A video of me doing the same lab (but don’t watch that lab until you do it first!)

To follow along with the demos, download a Stack Overflow database. I’ll be using the medium-sized 50GB StackOverflow2013 database, but if you’ve got a different one handy (either the smaller 10GB or larger 300+GB one), that’s fine too – the same concepts will still apply. If you’re on a slow desktop/laptop, feel free to use the small 10GB StackOverflow2010 database.

Attach it to any supported version of SQL Server, can be Standard, Enterprise, or Developer. (Sorry, SQL Server 2008 and Express Edition won’t cut it here.) If you want a free development or evaluation version of SQL Server, hit up SQLServerUpdates.com.

Let’s get started!

Lecture: Indexing for the WHERE Clause (33 min)

It sounds easy at first: just look at the list of fields in the WHERE clause, and put them in the index keys. But what order should you put them in?

There’s a common misconception that the most selective (or unique) fields should go first. For example, if you have a column with a million distinct values, and another column with only a hundred, then the million distinct values must be more selective. That’s true – but only if you’re doing an equality search.

We’ll look at equality searches, range searches, and combinations to come up with rules for key ordering.

Invalid Vimeo Video Id: 377113839

 

Lecture Scripts

If you want to follow along with me during the class, this will make it a little easier. I often go off-script and tweak demos based on attendee questions – this is just my starting point:

/*
Fundamentals of Index Tuning: WHERE Clause

v1.2 - 2019-09-04

https://www.BrentOzar.com/go/indexfund


This demo requires:
* Any supported version of SQL Server
* Any Stack Overflow database: https://www.BrentOzar.com/go/querystack

This first RAISERROR is just to make sure you don't accidentally hit F5 and
run the entire script. You don't need to run this:
*/
RAISERROR(N'Oops! No, don''t just hit F5. Run these demos one at a time.', 20, 1) WITH LOG;
GO


/* I'm using the 50GB medium Stack database: */
USE StackOverflow2013;
GO
/* And this stored procedure drops all nonclustered indexes: */
DropIndexes;
GO
/* It leaves clustered indexes in place though. */


SET STATISTICS IO ON;

/* Visualizing index contents: */
CREATE INDEX IX_LastAccessDate_Id
  ON dbo.Users(LastAccessDate, Id);
GO
/* Becomes: */
SELECT TOP 1000 LastAccessDate, Id
  FROM dbo.Users
  ORDER BY LastAccessDate, Id;
GO


EXEC DropIndexes;
GO


CREATE INDEX IX_LastAccessDate_Id_DisplayName_Age
  ON dbo.Users(LastAccessDate, Id, DisplayName, Age);
GO
/* Becomes: */
SELECT LastAccessDate, Id, DisplayName, Age
  FROM dbo.Users
  ORDER BY LastAccessDate, Id, DisplayName, Age;
GO


EXEC DropIndexes;
GO



CREATE INDEX IX_LastAccessDate_Id_Includes
  ON dbo.Users(LastAccessDate, Id)
  INCLUDE (DisplayName, Age);
GO
/* Becomes: */
SELECT LastAccessDate, Id, DisplayName, Age
  FROM dbo.Users
  ORDER BY LastAccessDate, Id  /* DisplayName, Age;  These aren't sorted */
GO


EXEC DropIndexes;
GO


/* Now you try: write a SELECT query to visualize each of these: */

CREATE INDEX IX_Reputation_Location_Includes
  ON dbo.Users(Reputation, Location)
  INCLUDE (DisplayName);
GO


CREATE INDEX IX_CreationDate_Views
  ON dbo.Users(CreationDate, Views)
  INCLUDE (DownVotes, UpVotes);
GO


CREATE INDEX IX_Age_Reputation_Location
  ON dbo.Users(Age, Reputation, Location);
GO



EXEC DropIndexes;
GO



/* Design an index for this: */
SELECT Id, DisplayName, Location
  FROM dbo.Users
  WHERE DisplayName = 'alex';

CREATE INDEX IX_DisplayName_Includes
  ON dbo.Users(DisplayName)
  INCLUDE (Location);

SELECT DisplayName, Location, Id
  FROM dbo.Users
  ORDER BY DisplayName;

SELECT Id, DisplayName, Location
  FROM dbo.Users
  WHERE DisplayName = 'alex';


/* Design an index for this: */
SELECT Id, DisplayName, Location
  FROM dbo.Users
  WHERE DisplayName = 'alex'
    AND Location = 'Seattle, WA';

/* But we already have this.... */
CREATE INDEX IX_DisplayName_Includes
  ON dbo.Users(DisplayName)
  INCLUDE (Location);

/* Visualize the index: */
SELECT DisplayName, Location
  FROM dbo.Users
  ORDER BY DisplayName;

/* Visualize the index contents: */
SELECT DisplayName, Location
  FROM dbo.Users
  WHERE DisplayName = 'alex'
  ORDER BY DisplayName;

/* Create a couple of index options: */
CREATE INDEX IX_DisplayName_Location
  ON dbo.Users(DisplayName, Location);

CREATE INDEX IX_Location_DisplayName
  ON dbo.Users(Location, DisplayName);
GO

/* Test 'em with index hints: */
SET STATISTICS IO ON;
GO
SELECT Id, DisplayName, Location
  FROM dbo.Users WITH (INDEX = 1) /* Clustered index scan */
  WHERE DisplayName = N'alex'
    AND Location = N'Seattle, WA';

SELECT Id, DisplayName, Location
  FROM dbo.Users WITH (INDEX = IX_DisplayName_Includes)
  WHERE DisplayName = N'alex'
    AND Location = N'Seattle, WA';

SELECT Id, DisplayName, Location
  FROM dbo.Users WITH (INDEX = IX_DisplayName_Location)
  WHERE DisplayName = N'alex'
    AND Location = N'Seattle, WA';

SELECT Id, DisplayName, Location
  FROM dbo.Users WITH (INDEX = IX_Location_DisplayName)
  WHERE DisplayName = N'alex'
    AND Location = N'Seattle, WA';
GO

/* Which one does SQL Server pick? */
SELECT Id, DisplayName, Location
  FROM dbo.Users
  WHERE DisplayName = N'alex'
    AND Location = N'Seattle, WA';
GO



SET STATISTICS IO ON;
GO
SELECT Id, DisplayName, Location
  FROM dbo.Users WITH (INDEX = 1) /* Clustered index scan */
  WHERE DisplayName = N'alex'
    AND Location <> N'Seattle, WA';

SELECT Id, DisplayName, Location
  FROM dbo.Users WITH (INDEX = IX_DisplayName_Includes)
  WHERE DisplayName = N'alex'
    AND Location <> N'Seattle, WA';

SELECT Id, DisplayName, Location
  FROM dbo.Users WITH (INDEX = IX_DisplayName_Location)
  WHERE DisplayName = N'alex'
    AND Location <> N'Seattle, WA';

SELECT Id, DisplayName, Location
  FROM dbo.Users WITH (INDEX = IX_Location_DisplayName)
  WHERE DisplayName = N'alex'
    AND Location <> N'Seattle, WA';
GO

/* Showing the total pages in each index: */
SELECT COUNT(*)
  FROM dbo.Users WITH (INDEX = 1) /* Clustered index scan */

SELECT COUNT(*)
  FROM dbo.Users WITH (INDEX = IX_DisplayName_Includes)

SELECT COUNT(*)
  FROM dbo.Users WITH (INDEX = IX_DisplayName_Location)

SELECT COUNT(*)
  FROM dbo.Users WITH (INDEX = IX_Location_DisplayName)
GO

/* Which one does SQL Server pick? */
SELECT Id, DisplayName, Location
  FROM dbo.Users
  WHERE DisplayName = N'alex'
    AND Location <> N'Seattle, WA';
GO






/*
License: Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)
More info: https://creativecommons.org/licenses/by-sa/3.0/

You are free to:
* Share - copy and redistribute the material in any medium or format
* Adapt - remix, transform, and build upon the material for any purpose, even 
  commercially

Under the following terms:
* Attribution - You must give appropriate credit, provide a link to the license,
  and indicate if changes were made.
* ShareAlike - If you remix, transform, or build upon the material, you must
  distribute your contributions under the same license as the original.
*/

Hands-on demo: now it’s your turn.

Now, it’s your turn: in the Stack Overflow database (any size is fine, including the small 10GB one), take the below queries individually and design the right nonclustered indexes to make them dramatically faster.

As you work through the list:

  • Try different combinations of key and include fields for each query
  • Use index hints to measure the logical reads of each index option
  • Try to keep the number of indexes on the Users table to 5 or less

I’ll get you started by walking you through the beginning of it in this 13-minute video:

Invalid Vimeo Video Id: 377121492

And now here are the scripts you’ll be working with:

/*
Fundamentals of Index Tuning: WHERE Lab

v1.1 - 2019-06-03

https://www.BrentOzar.com/go/indexfund


This demo requires:
* Any supported version of SQL Server
* Any Stack Overflow database: https://www.BrentOzar.com/go/querystack

This first RAISERROR is just to make sure you don't accidentally hit F5 and
run the entire script. You don't need to run this:
*/
RAISERROR(N'Oops! No, don''t just hit F5. Run these demos one at a time.', 20, 1) WITH LOG;
GO


/* This stored procedure drops all nonclustered indexes: */
DropIndexes;
GO
/* It leaves clustered indexes in place though. */





/* ****************************************************************************
FIRST LAB CHALLENGE: design the right index for this:
*/
SELECT DisplayName, Id
  FROM dbo.Users
  WHERE WebsiteUrl = 'http://127.0.0.1'
    AND Location = 'United States';
GO

/* 
Which field should go first in the WHERE clause?
Which filter is more selective?
*/
SELECT COUNT(*) AS NumberOfRowsInTheTable FROM dbo.Users;

SELECT COUNT(*) /* DisplayName, Id */
  FROM dbo.Users
  WHERE WebsiteUrl = 'http://127.0.0.1'
    /* AND Location = 'United States'; */

SELECT COUNT(*) /* DisplayName, Id */
  FROM dbo.Users
  WHERE /* WebsiteUrl = 'http://127.0.0.1'
    AND */ Location = 'United States';


/* 
Looks like our WebsiteUrl is more selective for these parameters.
(If we change the parameter values, that might not be the case anymore.)

This one seems best:
*/

CREATE INDEX IX_WebsiteUrl_Location_Includes 
  ON dbo.Users(WebsiteUrl, Location) 
  INCLUDE (DisplayName); /* The fields from our SELECT */

/* For testing purposes, we'll also create the opposite index: */
CREATE INDEX IX_Location_WebsiteUrl_Includes 
  ON dbo.Users(Location, WebsiteUrl) 
  INCLUDE (DisplayName); /* The fields from our SELECT */
GO



/* Now take your original query, and run it with hints to test which index
gives you the lowest logical reads: */

SET STATISTICS IO ON;
GO
SELECT DisplayName, Id
  FROM dbo.Users WITH (INDEX = 1) /* The original clustered index */
  WHERE WebsiteUrl = 'http://127.0.0.1'
    AND Location = 'United States';

SELECT DisplayName, Id
  FROM dbo.Users WITH (INDEX = IX_WebsiteUrl_Location_Includes)
  WHERE WebsiteUrl = 'http://127.0.0.1'
    AND Location = 'United States';

SELECT DisplayName, Id
  FROM dbo.Users WITH (INDEX = IX_Location_WebsiteUrl_Includes)
  WHERE WebsiteUrl = 'http://127.0.0.1'
    AND Location = 'United States';
GO


/* It's a dead heat! They're both good for these parameters.

Drop the index that we decide not to use, and then keep going. Remember, as you
work through the exercises, I want you to aim for 5 or less indexes per table,
with around 5 or less fields per index.

Alright, you're up next. */






/* ****************************************************************************
NEXT EXERCISE: design the right index to find the nicest people:
*/
SELECT DisplayName, Location, UpVotes, Id
  FROM dbo.Users
  WHERE DownVotes = 0
    AND UpVotes > 100;
GO






/* ****************************************************************************
NEXT EXERCISE: find German people with a high reputation:
*/
SELECT DisplayName, Location, Reputation, Id
  FROM dbo.Users
  WHERE Location LIKE '%Germany%'
    AND Reputation > 100000;
GO



/* ****************************************************************************
LET'S MIX THINGS UP: You've created a few indexes so far. Pick one of them,
and write 3 queries:

* One that will scan the index (and only that index, not touching any others)
* Write a query that will do an index seek (but again, not touching any others)
* Write a query that will use that index, but then get a residual predicate
  (Reminder: that's a query that uses the index to do a seek, but then has to
  do an additional filter, like maybe going over to the clustered index to do a
  key lookup, and do additional filtering there)
*/




/* ****************************************************************************
NEXT UP: find people who match an unusual filter:
*/
SELECT DisplayName, Location, Reputation, Id
  FROM dbo.Users
  WHERE Location = 'Moscow, Russia'
     OR DisplayName LIKE 'Dmitry%';
GO






/* ****************************************************************************
NEXT QUESTION: design the right index to find all of the people who created an
account, but then never accessed the system again:
*/
SELECT CreationDate, LastAccessDate, DisplayName, Id
  FROM dbo.Users
  WHERE CreationDate = LastAccessDate;
GO





/* ****************************************************************************
TRICKY BONUS QUESTION: design the right index for this.
*/

SELECT CreationDate, DisplayName, Location
  FROM dbo.Users
  WHERE CreationDate >= '2009-01-01'
    AND CreationDate < '2009-01-02'
    AND Reputation = 1;
GO
/* 
It has two fields in the WHERE clause. Figure out which one should go first
using the same methods we've been using so far, but after you have both indexes
in place, test them.

Which part of the where clause is more selective?
Which index has less logical reads?

Here's the tricky part: why?
*/




/*
License: Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0)
More info: https://creativecommons.org/licenses/by-sa/3.0/

You are free to:
* Share - copy and redistribute the material in any medium or format
* Adapt - remix, transform, and build upon the material for any purpose, even 
  commercially

Under the following terms:
* Attribution - You must give appropriate credit, provide a link to the license,
  and indicate if changes were made.
* ShareAlike - If you remix, transform, or build upon the material, you must
  distribute your contributions under the same license as the original.
*/

Think you’ve solved it? Check your answers against mine. (41m)

Invalid Vimeo Video Id: 377118099

Tomorrow, we’ll dig into indexing for ORDER BY, then Wednesday we’ll cover joins, and then Thursday we’ll finish up by analyzing Clippy’s index recommendations.

If you enjoy this class, you’re gonna love my Mastering Index Tuning pre-con at SQLBits and SQL Saturday Iceland. If you can’t join me in person there, I’m running a sale on my Live Class Season Pass this month: just $995 while these blog posts are up. See you in class!

This week's sponsor: don't knock it til you try it. free database performance monitoring tool, Spotlight Cloud.


 

Agree? Disagree? Leave a comment.

 
sfs_icon_twitter.png
sfs_icon_forward.png
icon_feed.png
Copyright © 2020 Brent Ozar Unlimited®, All rights reserved.