
Friday, September 30, 2011

SQL PASS Summit plans

I think I have my execution plan for the summit ready. My co-worker and I will leave Denmark on Saturday the 8th of October around noon - yes, that's only a week from now :) Yeahhhhhhh
We have to fly to Amsterdam and from there to Seattle, where we will arrive a couple of hours later - not counting the time difference :) So it will only be around 5pm when we land at the SEA airport.

We are staying at The Paramount, within walking distance of the summit - even after a few drinks.

I was hoping to catch a Seahawks game that weekend, but unfortunately they play away. I'm a big fan of American football (called that here since we call soccer football :)), so it would have been nice to see a live game. Even though I'm a Vikings fan - that would have been great. Anybody know if there is a college football game nearby that weekend? The Washington Huskies aren't playing the weekend we arrive.

Well, I guess we will just take care of the jet lag on Saturday and maybe get a steak :) I love steaks. On Sunday it will be time to see Seattle. We both know another great Dane who is coming to the summit, and I would love to do some stuff with him on Sunday. Maybe a look at one of my troubled SQL Servers over drinks :) See, that would be a nice day!

As this is my first trip to the summit I don't really know what I (we) will be doing in the evenings. But I like to mingle, so I will figure out something to do. Hopefully I'll find someone who wants to talk SQL over a couple of whiskeys - my two top choices. You too? Then we have a play date lined up.

From Monday it's all about the summit. My current plan looks like this:


Monday, October 10, 2011
8:30 AM - 4:30 PM  [AD-498-P] Fixing Queries With Advanced T-SQL Constructs  [RM: 609-610]
Tuesday, October 11, 2011
8:30 AM - 4:30 PM  [DBA-399-P] No More Guessing! An Enlightened Approach to Performance Troubleshooting  [RM: 4C1-2]
Wednesday, October 12, 2011
10:15 AM - 11:45 AM  [DBA-311-S] Bad plan! Sit!  [RM: 6E]
1:30 PM - 2:45 PM  [DBA-406] Tips for handling large data volumes  [RM: 4C1-2]
4:45 PM - 6:00 PM  [DBA-320] Why are we Waiting..  [RM: 606-607]
Thursday, October 13, 2011
10:15 AM - 11:30 AM  [DBA-403] Advanced SQL Server 2008 Troubleshooting  [RM: 6E]
1:30 PM - 2:45 PM  [DBA-307] 100% Availability without 100% Uptime  [RM: 2AB]
3:00 PM - 4:15 PM  [DBA-319-C] All the Magic Knobs  [RM: 2AB]
5:00 PM - 6:15 PM  [DBA-317] Discovering the Plan Cache  [RM: 602-604]
Friday, October 14, 2011
10:15 AM - 11:30 AM  [DBA-305] Inside the SQL Server Transaction Log  [RM: 602-604]
1:00 PM - 2:15 PM  [AD-305] Rewrite Your T-SQL for Great Good!  [RM: 615-617]
2:30 PM - 3:45 PM  [DBA-207] Baseline Basics: Where to Start, What You Need  [RM: 4C1-2]
4:15 PM - 5:45 PM  [AD-308-S] T-SQL Awesomeness - 3 Ways to Write Cool SQL  [RM: 6E]


That's my priority list right at this moment, anyway.

And on Saturday we go back :( at lunch time. I guess they plan on moving Europe farther away during that week, because the flight back to Denmark takes almost 24 hours - we won't be home before Sunday afternoon :) Well, I might have to learn something about time zones.

Thursday, September 29, 2011

Let the little ones join up first

The order of tables in a join - third blog in my series "From newbie to DBA"


In my last blog I promised to look at a simple way to use the execution plan. That will have to wait, as I just remembered another important tip I found some time ago to easily boost performance when handling large amounts of data. And it leads up to one of the things I found to look for in the execution plan.

It's kind of in the same ballpark as my last post. It's all about giving SQL Server the best information to work with.

When you join a large table with a smaller one, the order of the tables does matter. If I for instance write this query:

select * from BigTable a
inner join SmallTable b on a.Key1 = b.Key1 and a.Key2 = b.Key2

then everything is driven from BigTable. Back to the selectivity I talked about in the last post: you start out with a large amount of data and then you filter it based on the small amount. You end up with a large amount of data to match up.

But if you instead use SmallTable as the base:

select * from SmallTable a
inner join BigTable b on a.Key1 = b.Key1 and a.Key2 = b.Key2

In this case you only have a small amount of data to match up. And you have already reduced the amount of data SQL Server has to go through in order to give you the result.

It's all about high selectivity. And you can see right away in the execution plan whether your query uses small over large. More on that later. But you should always keep in mind that the fastest solution is high selectivity.
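If you want to see the difference on your own tables, here is a minimal sketch (using the made-up BigTable/SmallTable names from above) that runs both versions with the IO statistics turned on, so you can compare the reads and the actual plans:

set statistics io on

/* Large table as the base */
select * from BigTable a
inner join SmallTable b on a.Key1 = b.Key1 and a.Key2 = b.Key2

/* Small table as the base */
select * from SmallTable a
inner join BigTable b on a.Key1 = b.Key1 and a.Key2 = b.Key2

set statistics io off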

Decimal reduction with ease

A co-worker of mine asked me today what the best way is to output some percentages with just two decimals - like 0.00 up to 100.00.

Well, the simple answer is just casting the output to decimal(5,2). This decimal type can hold five digits in total, where two of them will be after the decimal point.

So select cast(100 as decimal(5,2)) will output 100.00

A simple example is here:


set nocount on
/* Make a demo table holding the numbers */
declare @numbers table (num float)
/* Counter and a stop number */
declare @num float = 1, @maxnum float = 10000
/* Insert the counter and count it up by something until we reach the stop number */
while @num < @maxnum
begin
    insert @numbers select @num
    set @num = @num + 12.3456789
    print @num
end
/* Select the output and show the casted decimal(5,2) number */
select num as Number,
       num / @maxnum * 100 as FloatNumber,
       cast(num / @maxnum * 100 as decimal(5,2)) as DecimalNumber
from @numbers
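One thing to be aware of with this cast (just a quick sketch to show the behaviour): it rounds rather than truncates, and a value that doesn't fit in decimal(5,2) - anything above 999.99 - will give you an arithmetic overflow error.

select cast(12.345 as decimal(5,2))  /* 12.35 - rounded, not truncated */
select cast(99.994 as decimal(5,2))  /* 99.99 */
/* select cast(1000 as decimal(5,2)) would fail with an arithmetic overflow error */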

Wednesday, September 28, 2011

Select me first please

The order of columns in an index - second blog in my series "From newbie to DBA"


Writing this new blog really gets me thinking "oh, there is a long way to go before I can call myself a real DBA". There are still a lot of things I need to learn. But as long as I keep learning I'm happy :)

I started this blog as inspiration ahead of SQL PASS Summit 2011 in Seattle, which I'm really looking forward to. My goal is to put up a lot of tips and tricks from one newbie to another, trying to use terms and descriptions that are easy to understand and implement. So if anyone finds this useful, then please comment.

Yesterday I talked about clusters and how the order of the cluster has an impact on inserting/deleting on heavy tables. Back when I did this myself I found out another thing which I wasn't aware of, but which really is quite logical when you think about it. It is selectivity, which can be categorized as high or low.

Just a little background info on indexes before I get to the real point.

Imagine you have a crowd of 1000 people standing in front of you, and from that crowd you should select all women with brown hair at the age of 40 - so 3 fields: Sex, HairColor and Age. You could run around in the crowd and ask each person about the 3 parameters, but that takes time. So you can put up an index, which is just like having the crowd stand in a certain order. If I made my index Sex, HairColor and Age, I could go to the middle of the crowd and ask "are you a woman?" (and then I would get slapped in the face :)). If she was, I could then ask for the HairColor; if not, I could go to the middle of that half and keep doing that until I found a woman. I know that they are ordered Men then Women, and Black, Blond, Brown for hair and so on. That way, using the index, I don't need to ask everybody in order to select the people I want.

But I can do even better. With the order I have made in the above "index" I only de-select around half of the crowd, namely the men. So when I come to hair color I still have to search among 500 women. The selectivity of the Sex field is low. But if I started with the hair color, and let's say we distinguish 5 different colors, then I only have to search among 200 people, assuming there are equally many with each color.

And I can do better yet, as it turns out that the selectivity is high for age. Potentially only 10 of them (ages 1-100) will be 40 years old, and of those 10 only 2 will have brown hair and only 1 will be a woman :) Well, I know this is unlikely, but the point being that by using the high-selectivity field Age first in my index and HairColor second, SQL Server will find the rows I want faster. So the index for this crowd should be Age, HairColor and Sex - uhm, I guess my wife is pretty effective then - she always has sex as the last choice as well :)

BTW: I think that the statistics SQL Server gathers are used to figure this out as well. I have to look into that. See, it is helpful to write down your knowledge - it gets you thinking.
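Just as a pointer for when I do look into it: you can peek at the statistics behind an index with DBCC SHOW_STATISTICS. The sketch below uses made-up table and index names (the crowd example as a table); the "All density" column gives a feel for selectivity - the lower the density, the more unique values, the higher the selectivity.

/* Sketch with made-up names: inspect the statistics on an index */
dbcc show_statistics ('dbo.Crowd', 'IX_Crowd_Age_HairColor_Sex')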

When I first learned about selectivity and the order of fields in an index, I wondered if I had to switch the order of the fields in my select as well. But no - as long as your where clause only uses equals (the = sign), SQL Server figures this out.

But if you use a non-equal operator like <> then it's another story, because SQL Server can't seek straight to the rows on that field - it has to check all the remaining rows to see if they are different from the <> value.

Let's go back to our example. If the index is Age, HairColor and Sex, then I would be fine writing:
select * from crowd where Age = 40 and HairColor = 'Brown' and Sex <> 'Man'. Using our index and our easy assumption about the distribution of the crowd, when we have found the age and hair color there are only two people left to ask.

But if I with the same index ask for select * from crowd where Age <> 40 and HairColor = 'Brown' and Sex = 'Women', then the index is not very helpful and SQL Server has to ask (scan) everybody what their age is and then ask the remainder about hair color and sex.

So if this is a select statement you use a lot, then a HairColor, Sex and Age index would be better.
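To make it concrete, here is a sketch of what the crowd could look like as a table, with the two index orders discussed above (all the names are made up for the example):

/* Made-up Crowd table for the example above */
create table dbo.Crowd
(
    PersonId  int identity(1,1) primary key,
    Age       tinyint     not null,
    HairColor varchar(20) not null,
    Sex       varchar(10) not null
)
go
/* Good for: where Age = 40 and HairColor = 'Brown' and Sex <> 'Man' */
create index IX_Crowd_Age_HairColor_Sex on dbo.Crowd (Age, HairColor, Sex)
go
/* Better for: where Age <> 40 and HairColor = 'Brown' and Sex = 'Women' */
create index IX_Crowd_HairColor_Sex_Age on dbo.Crowd (HairColor, Sex, Age)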

So the lesson I learned was: when designing clusters and indexes, consider 3 things first. How does the data arrive - is it like prices for today, as in the previous blog? How do I use the data - what selects do our systems use to get data? And finally, what is the selectivity of the different fields that I consider for the cluster and/or indexes? These are actually very simple steps to better performance.

I think in my next post I might tell you a little about how I found out that the execution plan can do wonders telling you where the bottlenecks are. It will also give you a hint if your index needs to be looked at.

BTW only 10 days until I leave for the SQL PASS Summit :)

What a clusterfuck

Clustered index on large tables - first blog in my series "from newbie to DBA"

Well, I guess it's not nice to start your blog by swearing :) But as a former soldier, clusterfuck is the correct term for my use of clusters on the SQL Server.

Up until a year ago I didn't really know what the cluster did or was. But I was about to find out.

As mentioned in my first blog, our data has grown fast during the last couple of years. I'm managing a stack of SQL Servers used to collect data from stock exchanges. My main problem was the table holding the prices.

This table had close to 900,000,000 records. 8 columns in the table are key fields. The order of those keys was nearly random, as they were set up in logical reading order. And I couldn't understand why inserting around 1,000,000 new records every day took so long - until I understood the concept of clusters.

So if you have the same problem this is where you put the smile back on your face. If you don't have that problem then the smile should already be there :)

When you create a table in SQL Management Studio (MS) using the design editor, you actually don't really pay attention to what happens behind the scenes. And as you can see from my CV - in most cases you actually don't need to. I made a good living letting MS do the hard work. But when the complexity increases and you need to find that DBA hat - you are a bit lost.

This was the first time I learned the hard way to use scripts for everything. I still often use MS, also for designing my tables, but instead of pressing the OK button I always select the little script button. And I always make sure I totally understand what the script does.

But back to the cluster. So what happens when you design a table, select some keys and press the OK button? Let's take a simple version of my price table. For those prices I need an id (called ISIN), a trade date and a value. So for me the logical order is just as I described. Then I select ISIN and TradeDate as keys and I'm done. The table is created just like that, and because I have selected the two keys, the table gets a clustered index in the order ISIN, TradeDate. See the scripts at the bottom of the post.

So when I select * from Prices where ISIN = 'DK00000111' and TradeDate = '20110927', it matches right up with the index, and because it's a clustered index the data is physically stored in order by ISIN, TradeDate, so to speak. That's all very nice and quick.

Until.... you have a lot of data. See what happens when I store today's prices each day. I have around 500,000 different ISINs and let's say one TradeDate. And because the cluster says the data should be ordered by ISIN, TradeDate, SQL Server does a lot of moving around to get those 500K prices in order.

The solution is simple: I should decide the order of the cluster myself. And if I switch the two keys around and use the order TradeDate, ISIN instead, what would that mean? Well, then I don't move data around but insert directly in ISIN order for today's prices. And now, instead of spending an hour on a few million records, I just spend minutes. And it does not affect my select * from Prices where ISIN = 'DK00000111' and TradeDate = '20110927', as it can still use the clustered index. See the changed script at the bottom of the post.

So what's important when designing tables that will hold large amounts of data is to design the clustered index to strike the best balance between inserts and selects for your job. Especially if it's a detail table, which often has more than one key - the natural order of columns is almost never the natural order of the cluster. And about the selectivity of the index: I'll write a blog soon about that, as I just found out a few weeks ago that this can be optimized using a certain order of the columns in the index :) More about that later.

As I mentioned, in my real live table for prices I have 8 columns which are key fields. And sadly enough my prices don't come just for one day, as I get prices from all over the world. So some prices may be 5-10 days old when I get them. So there was still a lot of moving going on.

That led me to think about why I had the keys on at all - and why the cluster? For this table we rarely do selects, as all the prices we use for calculations are in an upfront table which only holds the last week of prices for quick processing. So I only used the clustered index to ensure that records were unique and that a rare select would not take ages.

That means I didn't really need the cluster. I could ensure the uniqueness and the selects with a normal unique non-clustered index. So I removed the cluster and just made a non-clustered index. And my insertion of new prices is faster than ever, with no data moving or ordering going on. Just straight into the table and away we go. And the selects were still fast, as the cluster didn't matter for them anyway. The many columns in the cluster and the nature of the data meant that SQL Server couldn't just spool data in from the order of the cluster - it had to go digging all the same.

I guess I could have made a non-clustered unique constraint to ensure the uniqueness, but I figured that the unique non-clustered index did the same trick and gave me an index all in one.
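For completeness, here is a sketch of that "no cluster" variant using the same simple sample table as the scripts at the bottom - a heap with a unique non-clustered index to keep the rows unique and still selectable (index name made up):

/* Sketch: the simple Prices table as a heap (no clustered index)
   with a unique non-clustered index for uniqueness and selects */
CREATE TABLE dbo.Prices
(
ISIN varchar(12) NOT NULL,
TradeDate datetime NOT NULL,
Value float(53) NULL
)
GO
CREATE UNIQUE NONCLUSTERED INDEX IX_Prices_TradeDate_ISIN
ON dbo.Prices (TradeDate, ISIN)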

This is a very simple explanation of clusters and what to do. But the best lesson this gave me was: always go the script way. Below you see the sample table created the MS way and the way I would create it.

MS Way


CREATE TABLE dbo.Prices
(
ISIN varchar(12) NOT NULL,
TradeDate datetime NOT NULL,
Value float(53) NULL
)  ON [PRIMARY]
GO
ALTER TABLE dbo.Prices ADD CONSTRAINT
PK_Prices PRIMARY KEY CLUSTERED
(
ISIN,
TradeDate
) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]


My way 


CREATE TABLE dbo.Prices
(
ISIN varchar(12) NOT NULL,
TradeDate datetime NOT NULL,
Value float(53) NULL
)
GO
ALTER TABLE dbo.Prices ADD CONSTRAINT
PK_Prices PRIMARY KEY CLUSTERED
(
TradeDate,
ISIN
)

Tuesday, September 27, 2011

From newbie to DBA


I was originally a software engineer who specialized in data acquisition. I have written programs for collecting everything from radar data in the military to prices from stock exchanges.
When catching large quantities of live data you need fast and safe data storage, and for that I have always - since 1994 anyway - used MS SQL Server. A SQL server (of any brand) gives you the possibility of throwing a lot of data at it, and you don't need to worry about the rest.
When the databases got bigger, we just bought a bigger server. It was cheaper than me spending time optimizing and understanding how to really utilize the true power of the SQL Server.
But about a year ago the data caught up with me. On the project I have been working on since 2006, the amount of data was now so big that my time was cheaper than a new server. I figured a raise would be in order to leverage this but my boss didn't :)
So now I needed to figure out how to really use the SQL Server - after having abused it for 16-17 years. I needed to become a DBA - I think that's the right term :)
In this blog I will try to describe the steps I went through, and still am going through, in order to get better performance and gain the knowledge to become a DBA.

In 11 days I'm going to the SQL PASS Summit in Seattle, and this trip really put action behind an old dream of writing a blog. I need a place to collect my ideas and troubles anyway, so why not share it with you :)