Pages

Wednesday, September 28, 2011

Select me first please

The order of columns in an index - second blog in my series "From newbie to DBA"


When writing this new blog and really start thinking "oh there is a long way to I can call my self a real DBA". There are still a lot of things I need to learn. But as long as I keep learning I'm happy :)

I started this blog as an inspiration toward SQL PASS Summit 2011 in Seattle, which I really looking forward to. My goal is to put a lots of tips and tricks from one newbie to another trying to use terms and descriptions that are easy to understand and implement. So if anyone finds this useful then please comments.

Yesterday I talked about clusters and how the order of the cluster has an impact on inserting/deleting on heavy tables. Back when I did this my self I found out an other thing which I wasn't aware of but really is quit logical when you thing about it. It is selectivity which can be categorized as high or low.

Just a little background info on index before I get to the real point.

Imaging you have a group of 1000 people standing i front of you and from that crowd you should select all women which has brown hair and at the age of 40 - like 3 fields: Sex, HairColor and Age. You have to run around in the crow and ask each one about the 3 parameters and that can take time. So you can put up and index just like having the crowd standing in a certain order. So if I made my index: Sex, HairColor and Age then I could go to the middle of the crowd and ask are you a women (and then I would get slapped in the face :)). If she was I could then ask for the HairColor but if not I could to the middle of the middle and keep doing that until I found a women. I know that they are order Man and Women and Black, Blond, Brown for hair and so on. That way using the index I don't need to ask everybody to select the people I want.

But I can do even better. With the order I have made in the above "index" I only de-select around half namely the men. So when I come to hair color I still have to find them between 500 women. The selectivity on the sex field is low. But if I started with the hair color and let say we distinguish 5 different colors then I only have to find women between 200 people assuming that there are equally as many with each color.

And I can do better yet as it turns out that the selectivity is high for age. Potentially will only 10 (age 1-100) of then be 40 years old and of those 10 will only 2 have the color brown and only 1 be a women :) Well I know this is unlikely but the point been that by using the high selectivity field Age first in my index and HairColor second the SQL Server will faster find into the fields I want. So the index for this crowd should be Age, HairColor and Sex - uhm I guess my wife is pretty effective then - she always have sex as the last choice as well :)

BTW: I think that the statistics that the SQL Server gathers is used to figure out this as well. I have to look into that. See it is helpful to write down your knowledge - it gets you thinking.

When I first learn about selectivity and the order of fields in an index I wonder if I had to switch the order of the fields in my select as well. But no as long as your where part only uses equal (= sign) the SQL Server figures this out.

But if you uses a non-equal like <> then its another story.Because it has to check all the fields that are remain if they are different from the <> field.

Lets go back to our example. If the index is Age, HairColor and Sex then I would be fine writing:
select * from crowd where Age = 40 and HairColor = 'Brown' and Sex <> 'Man'. Using our index and our easy assumption about the distribution of the crowd,  when we have found the age and haircolor there are only to people to ask.

But if I with the same index ask for select * from crowd where Age <> 40 and HairColor = 'Brown' and Sex = 'Women' then the index is not very helpfull and the SQL Server has to ask (scan) everybody what there age is and then ask the remainer about color and sex.

So if this is a select statement you use a lot then a HairColor, Sex and Age index would be better.

So lesson learn by me was when designing cluster and indexes consider first 3 things. How does data arrive - is it like prices for today like in the previous blog. How do I use the data? The selects that our systems uses to get data. And finally what are the selectivity on the different fields that I consider for cluster and/or indexes. This is actually very simple steps to better performance.

I think in my next post, I might tell you a little about how I found out that the execute plan can do wonders telling where the bottlenecks are. It will also give you a hint if your index needs to be looked at.

BTW only 10 days until I will leave for the SQL PASS Summit :)

No comments:

Post a Comment