Contributing to Open Source

I’ve always been a fan of giving back wherever possible.

I work for a software vendor where each customer hosts their own SQL Server on site, which means that I indirectly support about 400 servers around the world, all with different hardware specifications and their own quirks to think about. A lot of what I do is jumping onto customers’ sick SQL Servers, working out what’s wrong and finding a solution. I use a number of tools so that I can gather information automatically rather than pulling it manually each time.

A major set of tools that I use is the Brent Ozar First Responder Kit. This is an excellent set of tools that gives a great range of stats about your SQL Server, with links to read further if you’re unsure what a stat means. This is great for getting people with less experience up to speed, and it’s still extremely useful for more experienced users.

I’ve got a .NET tool which runs these scripts (among others) and exports the results to Excel, which is such a time saver. I added extra checks to my local versions for things specific to what I wanted that weren’t in the main scripts (I still do for certain things that may be unsuitable for inclusion in the main scripts; see my recent post on xp_regread).

In June 2016 the team open sourced these scripts and put them up on GitHub. After browsing the issue list I decided to help out a little in order to give back and make these scripts even more useful for others. The changes I’ve made aren’t large ones, but there is definitely a sense of accomplishment when those pull requests get accepted.

This kit gets released once a month, and they credit everyone who has contributed to the release; it’s nice seeing your name in their release notes (e.g. here and here). It won’t be a bad thing for the old CV either: you get your own profile which details what you’ve been contributing to, and you even get a cool little graph showing the volume of your contributions.

If you’ve ever seen an open source project you thought it might be fun to contribute to, I urge you to go ahead and give it a go.

xp_regread – Undocumented Stored Procedure

I came across an interesting undocumented stored proc today and thought I’d share what I found out.

Situation: I currently have a set of scripts to run when I jump on a server to bring me up to speed with what’s happening. They check all manner of things so that I don’t have to check everything manually. They’re extremely useful and save me a lot of time.

Problem: One thing customers do is run multiple instances of SQL Server on the same machine (not virtualised). This is bad because it’s extremely difficult to work out which instance is responsible for certain statistics. It’s like a crowded lift: you have no idea who farted.

Solution: xp_regread can tell you via T-SQL how many instances of SQL Server are installed on the box.

EXECUTE xp_regread
@rootkey = 'HKEY_LOCAL_MACHINE',
@key = 'SOFTWARE\Microsoft\Microsoft SQL Server',
@value_name = 'InstalledInstances'

The output will look something like this:

Value                          Data
InstalledInstances – Item #1   MSSQLSERVER
InstalledInstances – Item #2   INSTANCE2

Issues: As a software vendor we always connect with an admin account, so this always works for us. You do need a certain level of permissions for this to work; if you’re only in a basic group then you’re probably out of luck, I’m afraid. Time to buy the DBA team a bottle of whisky and ask really nicely for your permissions to be elevated.

SQL Server Memory Utilisation

OK, let’s say this straight off the bat: SQL Server is a bully. It will sit on your server and use as much RAM as it can get its hands on if you let it. This is great for the performance of your SQL queries, but if it’s fighting other processes for those resources then you may start running into issues.

But I’m running other processes on my SQL box, how do I throttle SQL?

Simple: this is a SQL box, not a crowded nightclub where everyone is knocking into each other and shouting for the bartender’s attention. You should be using dedicated hardware for your SQL Server; move your other processes onto a separate server. SQL licences are expensive, so why would you restrict its ability to do its job?

OK, I’ve got it on dedicated hardware, do I need to do anything?

Now this all depends on how your SQL instance is behaving. SQL will attempt to load as much of its data into RAM as it can. Accessing data from there is much faster than having to go and retrieve it from disk (even if you’ve got it on lovely SSDs). If you’ve got 5GB of data on a box with 12GB of RAM then I doubt you’re seeing an issue, whereas if you’ve got 15GB of data on that same box then you’re likely to start running into problems (much more data than that and you’re going to want to think about investing in more RAM; the cost pales in comparison to the cost of SQL licences).
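
If you want a rough idea of how much of each database is actually sitting in memory right now, a query along these lines against sys.dm_os_buffer_descriptors will tell you (it counts 8KB pages per database; this is a common community pattern, not something from the kit):

-- Approximate buffer pool usage per database (8KB pages converted to MB)
SELECT DB_NAME(database_id) AS database_name,
       COUNT(*) * 8 / 1024 AS buffer_pool_mb
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY buffer_pool_mb DESC;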

OK, so I might have a RAM issue, what do I do about it?

SQL does have a setting that lets you cap the amount of memory it can use, which can ease contention with things like the operating system or other background tasks. A good starting point is to leave 4GB available for the OS and limit SQL to the rest: try setting your max server memory to 4GB less than the total on the box and go from there. This does depend upon what else you have running on the box, so it may require tweaking. Monitor your memory usage and see if you can give SQL that extra bit more memory.
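
As a sketch, assuming a box with 16GB of RAM, capping SQL at 12GB looks like this (the value is in MB):

-- Leave ~4GB for the OS on a 16GB box by capping SQL Server at 12GB
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 12288;
RECONFIGURE;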

How to change SQL memory settings

I’ve restricted SQL to less memory, is this going to cause me a problem?

One of the main statistics that you can get out of your SQL instance is the Page Life Expectancy. This is quite a big topic so I’ll be following this up with its own post later.
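
In the meantime, if you’re curious, PLE is exposed as a performance counter and you can peek at it with a standard DMV query:

-- Current Page Life Expectancy (in seconds) from the Buffer Manager counters
SELECT [object_name], cntr_value AS page_life_expectancy_seconds
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page life expectancy'
  AND [object_name] LIKE '%Buffer Manager%';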

SQL Server Parallelism Explained

What is Parallelism and why should I care?

You know that wait stat you sometimes see, ‘CXPACKET’? You know all of that blocking you’re getting on your server? You know those times when a user is waiting on the system and there’s nothing obvious causing it? That’s why you should care. Now, parallelism isn’t a silver bullet that will fix all of your problems, but the default settings for it are so ancient that they could be causing you major headaches.

First of all, a quick pop quiz

In SSMS, right click on your server and go to Properties. Open the ‘Advanced’ tab and scroll to the bottom. Take a look at Cost Threshold for Parallelism and Max Degree of Parallelism. I bet they’re set to 5 and 0, right? If not, good for you, but do you understand these settings? If you are seeing a 5 and a 0, don’t worry, these are the defaults; you should be aware that they’re rubbish, though, and you’ll want to change them.
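
If you’d rather not click through SSMS, the same values are visible from T-SQL in sys.configurations:

-- Check the current parallelism settings without the GUI
SELECT name, value_in_use
FROM sys.configurations
WHERE name IN ('cost threshold for parallelism', 'max degree of parallelism');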

Parallelism Basics

Big queries are hungry: they take a lot of CPU to process. It would be great if we could split up the work they do so they run faster, right? Yep, it’s a great idea. Instead of using a single core of that quad core processor your server is running on, wouldn’t it be great to use more and run our query in parallel? Yep, but there are some gotchas. Let’s talk about these two settings before we think about changing them.

Cost Threshold for Parallelism

Let’s get the first question out of the way: what does this number stand for? What exactly is the default of 5? Is it seconds?

  • Short answer: Yes but no.
  • Long answer: it was a count of seconds a long time ago in a land far far away (Microsoft HQ, Washington). Now it’s just a number that we can use to compare the relative size of queries.

This number is calculated by the optimizer when a query is compiled. Statistics are compared and SQL decides how much it thinks the query is going to cost. If that estimated cost is over the threshold then it’s going to try to run the query in parallel. But how many cores is it going to use?
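
If you want to see the cost the optimizer has assigned to one of your queries, ask for the estimated plan; the figure shows up as the Estimated Subtree Cost on the plan’s root operator. Something like this will pull the XML plan without running the query (the table name is just a placeholder, swap in your own query):

-- Return the estimated XML plan (including StatementSubTreeCost) without executing
SET SHOWPLAN_XML ON;
GO
SELECT * FROM dbo.SomeBigTable; -- hypothetical table for illustration
GO
SET SHOWPLAN_XML OFF;
GO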

Max Degree of Parallelism (MAXDOP)

The optimizer has decided that the query it’s going to run is big enough to go parallel; how parallel it goes is based upon this setting. The SQL default is 0, which for this setting means unlimited. That massive query of ours is going to use as many cores as it likes: on that quad core processor it’s going to split itself into 4 and use all of the CPU so that it can run as fast as it can. Sounds great, right, this big query running really fast? Not in the real world.
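
As an aside, the server-wide setting can be overridden for a single query with a hint; something like this (a hypothetical query) caps just that statement at two cores:

-- Limit this one query to two cores, whatever the server-wide MAXDOP says
SELECT OrderID, SUM(Quantity) AS total_quantity
FROM dbo.OrderLines -- hypothetical table for illustration
GROUP BY OrderID
OPTION (MAXDOP 2);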

Why are the defaults rubbish?

Technology has come quite a way since SQL was designed. What used to be a large query is, comparatively, tiny by today’s standards. Plenty of queries are likely to have a cost of 5 or more, so chances are that relatively small queries are going parallel. When they do, your MAXDOP being set to 0 (unlimited) means that the majority of your queries are taking up the whole CPU every time they run. They’re going to be fighting over your resources, which is going to kill your performance and is likely to be a cause of all of those blocks you’re seeing.

Damn, how do I fix it?

The great thing is, this can be a quick win, but it might take a bit of tinkering to get the best possible settings for your application.

  • A good place to start for the Cost Threshold setting is 50. You may feel that you want to reduce or increase it as time moves on, but give it a go.
  • MAXDOP will require a bit of working out for you, but only a bit! How many cores does your server have on the CPU? 4? 8? Try setting your MAXDOP to half of the number of available cores, up to a maximum of 4 without hyper-threading and 8 with hyper-threading turned on. Both can be changed with sp_configure, as shown below.
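
As a sketch, applying those starting points on a hypothetical 8 core box (adjust the MAXDOP number to your own hardware) looks like this:

-- Starting points: cost threshold 50, MAXDOP 4 (half of 8 cores)
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'cost threshold for parallelism', 50;
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;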

Don’t leave these settings and walk away. They’ll be fine, but what if a little tweak helps further? Continue to monitor your blocking to see if this alleviates the problem.