Install PRAW with SQL Server and Python

So you’ve installed SQL Server with Python and you’ve also installed pip onto your SQL/Python Instance? Great! Let’s go ahead and get the PRAW package installed.

PRAW stands for Python Reddit API Wrapper and it can be used for creating bots as well as scraping data directly from Reddit. We’ll be using pip to install as it’s the recommended method of installing PRAW.

First of all, find your Python Services Scripts folder. On a standard installation of SQL Server 2019 it’s in this location

C:\Program Files\Microsoft SQL Server\MSSQL15.SQL2019PYTHON\PYTHON_SERVICES\Scripts

Open the command prompt at this file location and run
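The command in question is pip's standard install (pip is the recommended way to install PRAW):

```shell
pip install praw
```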

It’s going to go ahead and download/install PRAW for you

To check PRAW is installed go ahead and run the following command
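A quick one-liner sketch of that check (you could equally launch python and type import praw at the prompt):

```shell
python -c "import praw"
```

If this returns silently, the import worked.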

If you do not have PRAW installed you'll get an error message similar to this. Notice the line ImportError: No module named 'praw'.

Whereas with a successful install you won’t see the error message

Congrats, you’ve installed PRAW. Well done!

Installing pip on SQL Server with Python

One of the easiest ways of getting packages into your Python install is to install pip.

First thing you’ll want to do is to make sure that you already have SQL Server installed with Python (check my post about that here). Once you’ve done that we’re ready to begin.

Let’s go ahead and download the pip file. Go here https://bootstrap.pypa.io/get-pip.py and save the page to your local machine. I’m going to save it to my Python folder.

Because I’ve used SQL Server 2019 (named SQL2019PYTHON) the location is below. You will want to find your version of this folder. Go ahead and make sure your get-pip.py file is in here.

C:\Program Files\Microsoft SQL Server\MSSQL15.SQL2019PYTHON\PYTHON_SERVICES\

To test you’ve got the right folder, open the command prompt for this location and type ‘python’. You should see what version of python is installed on your machine.
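Typing python drops you into the interpreter and prints the version banner; as a sketch, python --version gives the same information without starting a session:

```shell
cd "C:\Program Files\Microsoft SQL Server\MSSQL15.SQL2019PYTHON\PYTHON_SERVICES"
python --version
```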

Once you’re sure you’ve got the right folder you want to go ahead and run
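The command here is the standard get-pip invocation, run from the folder you saved the file into:

```shell
python get-pip.py
```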

This is going to go ahead and install pip on your server.

To confirm your pip has installed correctly you'll want to go into the Scripts subfolder

C:\Program Files\Microsoft SQL Server\MSSQL15.SQL2019PYTHON\PYTHON_SERVICES\Scripts

And in the command line run the following command
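The version check is pip's own flag:

```shell
pip --version
```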

You should see the specific version you have running.

And that’s it. You’ve got pip installed, you can now use this to get the packages you want.

Installing Python with SQL Server

First things first, you’ll want to choose your version of SQL Server. Python is available on 2017 and greater. For this demo I’ll be using SQL Server 2019 Developer Edition (CTP 2.2).

With 2019 CTP 2.2 they've increased the OS requirements too. In my example I had a spare VM with Windows Server 2012 lying around, but I needed to upgrade it to Server 2016. Check the relevant documentation for the version you're using.

Once you’ve got your install file ready, go ahead and get started with the installation.

When selecting your installation type, you want to choose a custom install

Give it somewhere to save your media

After selecting Install it’s going to go ahead and download your initial installation files. This should only take a couple of minutes based upon your download speeds.

Once downloaded feel free to read through the documentation.

Finished? Great. Now we can go ahead and get SQL installed. We're going to go to the installation tab and choose a fresh install.

Because this instance is not going to be doing any sort of production workload we’re going to go ahead and choose Developer Edition. This edition is free and comes with all enterprise features but is only for development work (check your licence terms if needed)

Once you’ve read the T&C’s you can decide to allow SQL to use Microsoft Update to check for patches. This is totally your call.

The installation is going to check for any inconsistencies against its internal rule list. You'll have to fix any failures here before moving on. I've just got a warning about Windows Firewall but I'm going to leave that for now

The next step we’re going to choose here is to install the database engine but also choose to install Machine Learning Services with the Python option.

Note: you can install Machine Learning Server as a standalone feature. We do not want to do this.

I’ve left the instance directories as standard. Feel free to change these if you would prefer them elsewhere.

Your next choice will be leaving it as the default instance or giving your instance a name. I'm going to name mine SQL2019Python.

Next, choose which services you want to start automatically at boot. I like the agent starting so I'll set that to automatic.

Notice the little check box underneath that talks about Instant File Initialisation. You want this ticked. If you’re not sure what it does then just trust me, you’ll thank me later.

Next, you'll choose your Authentication modes and give your SA account its password. I'd also suggest clicking the "Add Current User" button just in case.

Next, read and consent to the Python Install.

You’ll see a summary of your installation options. Have a read and make sure you’re happy with everything. Once you are then go ahead and click Install.

Download is going to depend on your download speeds again so go make a cuppa while you wait, you deserve it.

Eventually you’ll be greeted by a lovely screen that tells you that everything is installed.

The first way to check that your instance is up and running is to go to your Services app and check that you have your SQL instance running.

Congratulations, SQL and Python are installed!

To double check that Python is working you’ll want to open SQL Server Management Studio (SSMS) (if you don’t have it check this post out) and connect to your instance.

Once connected, run the following script;
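As a sketch, the check here is sp_configure on the 'external scripts enabled' option (it's an advanced option, so it needs to be visible first):

```sql
-- Show the current state of the 'external scripts enabled' setting
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'external scripts enabled';
```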

By default, this option is not enabled. To allow it you’ll need to run the following command.
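The enabling command is a sketch along these lines (on some builds the SQL service needs a restart before it takes effect):

```sql
EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE WITH OVERRIDE;
```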

Now run the first script and you’ll see that it’s now enabled.

Now to confirm Python is working go ahead and run a very simple Python script;
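A minimal sketch of such a script, just printing from Python via sp_execute_external_script:

```sql
EXEC sp_execute_external_script
     @language = N'Python',
     @script   = N'print("Hello from Python")';
```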

You’ll see the output in the messages tab

Congratulations, you have SQL Server installed with Python and it works!

Installing SQL Server Management Studio (SSMS)

SSMS is the way that most people query their SQL Server instances so let’s go ahead and show how to install it.

First of all, head over to Microsoft’s download page for the latest version of SSMS and go ahead and download the latest version.

Once the file is downloaded go ahead and put it on the machine that you’ll be installing SSMS (this doesn’t have to be your SQL Server, any machine that has network connectivity to that machine is totally fine).

Open the file and you will be met with the splash screen.

You should obviously read the licence terms. Once you’ve done this go ahead and start the installation.

It’s going to do a lot of work without your input, just sit and wait through the process.

Once it's done its thing you'll need a quick reboot

Once you've restarted you're good; it's all installed and ready to use.

Query Store Missing Intervals

This was an interesting one to narrow down.

When messing about with (developing) things that are based on the Query Store I came across an unusual phenomenon. We were missing Query Store intervals. It's not that data had been deleted, but rather that it was never created in the first place. Take a look at this;


I’d set the query store to 5 minute intervals for testing purposes and left it. Notice that there is no data between 15:20 and 15:25. It’s not that the data has been deleted, we know that because the runtime_stats_interval_id is still creating increments correctly.
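If you want to check for gaps like this yourself, a sketch of the sort of query involved (run in the user database, against the Query Store interval catalog view) is:

```sql
-- List the captured intervals in order; look for missing time ranges
SELECT runtime_stats_interval_id,
       start_time,
       end_time
FROM sys.query_store_runtime_stats_interval
ORDER BY start_time;
```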

Turns out that if there is no activity on your user database for any given interval then the Query Store will just decide that it doesn't need to capture anything at all for that interval.

This makes sense from a Query Store perspective, but, just don’t rely on those intervals being there if you’re joining this data to other performance metrics like I was…

Presenting at SQL Saturday Manchester

So, some excellent and surprising news, I’ve just been chosen to speak at SQL Saturday 730 in Manchester this July. Now, it wasn’t entirely a surprise as I designed and submitted three presentations (very deliberately) to their website before the deadline. I suppose you could say that it’s at least partially my own fault.

It was an interesting thing to find out about as a relatively new speaker in the community. First of all, there were 164 sessions submitted for a total of 30 available slots, which doesn't put the odds in favour of an unknown name in the community. It says a lot for the SQL Saturday selection process that they judge on the merits of the presentation rather than the reputation of the speaker. Great news for people who are considering embarking on a speaking career in the SQL Server community.

This presentation is one I’ve given before so I’m pretty comfortable with it, I’ve tweaked it since I gave it last but there’s a good foundation there. I’ve moved it over to the SQL Saturday powerpoint template and made sure that my demos are up to date. That’s pretty much the hard work done.

I’ll post updates before the day if I have any, otherwise I’ll make sure to give an update after it’s done and my heart rate has returned to normal.

Now to do the usual things, check floor plans for the venue and try to work out which room you’re going to be presenting in, how many people it seats and how empty it’ll be on the day…

T-SQL Tuesday #102 – Giving Back

This month's T-SQL Tuesday has been chosen by Riley Major and invites people to give back to the community.

To give you a little background on me currently. I began working at a startup a couple of years ago. It’s a software house with a product that’s based on SQL Server on the back end. They were getting to a point where they’d never had a proper development DBA before and couldn’t put off having one for much longer. Enter the sandman.

One thing I’ve done here is to train people on the depths of SQL Server and to give people the tools to be able to write faster and more reliable code. I realised this year that I’ve spent a lot of time with people 1 on 1 which has been very effective but the downside to this is that it’s very time consuming on my part and it can be difficult for people to find some of my time to be able to train them.

I decided to commit to run training courses for one lunch time a month for at least a full year (bolded to hold me to this). I’ll then make them available to anybody who thinks they’d benefit from the training. We’re now part of a much larger company (as the startup got bought out) and have colleagues across the globe so making this training available to them was going to be essential. The first session was to a full room and the second (tomorrow) is looking to be just as busy so that’s good news. I’ve also made sure the recordings are available on our document management system so that remote employees or people who just couldn’t make it that day can watch it at their convenience.

I've got an idea of some future topics, but I'm also making sure to take note of feedback and designing future sessions around what people think will be interesting.

I’ve also spoken at a couple of small events and I’m planning to use the ideas taken from the internal talks and submit sessions to local events (SQL Saturdays, User Groups etc) and I’ll update here when I find out whether my talks have been successful or not.

I also need to blog more, but that’s easier said than done….

T-SQL Tuesday #101 – Building your Emergency Toolkit

This month’s T-SQL Tuesday is being hosted by Jens Vestergaard. The idea is to speak about essential SQL Server tools and how you use them in your environment.

This is a subject that fascinates me. Over time I've built a set of tools that makes analysis of servers much easier and quicker. These are mostly community tools, or tools inspired by others' blog posts or Stack Overflow answers. I'm looking forward to checking out everybody else's posts on this subject.

I won’t re-hash what I’ve said previously but here’s the presentation I gave about this subject on GroupBy.org

And here’s a link to the page about it.

T-SQL Tuesday #98 – My Technical Challenge – Clustered Index Conversion

This month’s T-SQL Tuesday subject has been selected by Arun Sirpal and is titled Your Technical Challenges Conquered. For my post I’m going to talk about the challenge of changing a Non-Clustered Primary Key into a Clustered one. This sounds relatively simple but gets more complicated as you delve into it.

First of all, what are the reasons for doing this? A table without a clustered index is called a heap. Now, heap tables do have their place, but they're not appropriate for most workloads in a relational database. They're great if you need to insert a lot of data into a table quickly, but reading from a heap is notoriously slow in most instances. Heap pages are stored wherever SQL Server finds space, which can be scattered around your data files, whereas a clustered index keeps the data logically ordered on its key with everything stored together. You can monitor how many forwarded record fetches you're getting from heap tables to see whether this particular issue is a problem for you.
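SQL Server tracks those fetches in the index operational stats DMV; as a sketch, a query to see forwarded fetch counts per heap in the current database:

```sql
-- Forwarded record fetches per heap (index_id = 0 means heap)
SELECT OBJECT_NAME(ios.object_id) AS table_name,
       ios.forwarded_fetch_count
FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS ios
JOIN sys.indexes AS i
  ON i.object_id = ios.object_id
 AND i.index_id  = ios.index_id
WHERE i.index_id = 0
ORDER BY ios.forwarded_fetch_count DESC;
```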

Let's assume that it is an issue and we want to add a clustered index to our table. A quick fix would be to just add a clustered index alongside any other indexes we have on this table. This approach is a little messy though and isn't the nicest way to fix this particular issue. There are exceptions, but generally if you have a primary key on a table then this will be your clustering key too (use cases may vary).

Unfortunately there isn’t a simple way to alter a table in order to turn a non clustered PK into a clustered one, this is because it’s such a fundamental change to how the data is stored on disk. In order to make this change you need to drop your existing primary key and then recreate it as a clustered primary key. Sounds simple right? Well, it can be, but it’s not as simple as that.

One thing about Primary Keys is that they can be used by Foreign Keys from other tables. You can’t drop a PK if there are FKs currently using it. Great, we’ll have to script all of these out, drop them, then recreate them after we’ve altered our table. We’re also going to assume that people have had the ability to alter our tables from whatever source control we have so we’ll have to do this dynamically.
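My actual scripts are in the GitHub repo linked below, but as a sketch, the foreign keys referencing a given table can be pulled from the catalog views (using the PKTest_Main table from the demo scripts as an example):

```sql
-- Foreign keys that reference a given table's key
SELECT fk.name AS foreign_key_name,
       OBJECT_NAME(fk.parent_object_id) AS referencing_table
FROM sys.foreign_keys AS fk
WHERE fk.referenced_object_id = OBJECT_ID('dbo.PKTest_Main');
```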

One thing we wanted to check for was whether somebody had created a clustered index on the table but it was not on the primary key. In this use case I needed the clustered index to be on the primary key so we store the details of the other clustered index and drop that also.

You can get the sample scripts from my GitHub repo.

Script One – We’re just going to make some sample tables so that we can understand how this process works. Do not create these tables in a database that you care about (hands off production!).

Script Two – This shows the current status of our keys and how the tables relate to each other. Run this once you’ve run script one so you can see how it looks. Keep it around and we’ll run it after script three too. You should see something like this;

Take a look at the Foreign keys based on our main table and the clustered/nonclustered indexes. Check script one for more details on these.

Script Three – This is our script that’s going to actually do the work. Have a read through this script to get an idea of what it does. You’ll notice that I’ve commented out the execution statements. This is so that you’ll see what it’s planning to do in your messages tab in SSMS before it actually does anything. If you run it, you should see something like this;

As you can see, what it wants to do is to drop the two foreign keys first, it’s then going to drop the existing clustered index on our PKTest_Main table. It’s then going to recreate the primary key as a clustered one, once it’s done this it’s going to reapply the foreign keys. In this instance it’s also not going to reapply our old clustered index as it’s not necessary.

If you copy these messages into a new window and run them (or un-comment the exec statements in script three) it’s going to actually make the alterations above. Once this is done, the results from script two should look something like this;

You’ll notice that our primary key is now clustered and the unnecessary previous clustered index is no longer present. Lovely.

In production (and in change control) you'd probably want this as a stored procedure with pass-through parameters such as @TableName = '' and (probably) @Execute = 1 (default 0) so that you don't fire this proc by accident.
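A rough, illustrative skeleton of that wrapper (the procedure name is made up, and the statement-generation logic from script three is elided):

```sql
CREATE PROCEDURE dbo.ConvertPKToClustered   -- illustrative name only
     @TableName sysname
    ,@Execute   bit = 0   -- default 0: print the generated statements, run nothing
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @sql nvarchar(max);
    -- ... build @sql here using the drop/recreate logic from script three ...

    IF @Execute = 1
        EXEC sys.sp_executesql @sql;
    ELSE
        PRINT @sql;
END
```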

Let me know your thoughts on this, if you’d make any changes to any of the scripts.


Visualising the Marvel Cinematic Universe in T-SQL

This post is inspired by a blog post I saw here which showed an interesting way to visualise data. The example was not valid syntax for SQL Server so I thought I’d make a version that does work in T-SQL.

Let’s make some sample data. We’re going to use a list of all title characters (heroes with their own independent movies) in the MCU and which films they’ve played a role in.


IF OBJECT_ID('tempdb..#MCUData') IS NOT NULL DROP TABLE #MCUData
CREATE TABLE #MCUData (Hero_Name nvarchar(40), Movie_Name nvarchar(40), Release_Year int)
INSERT INTO #MCUData (Hero_Name, Movie_Name, Release_Year)
VALUES
 ('Iron Man', 'Iron Man', 2008)
,('The Hulk', 'The Incredible Hulk',2008)
,('Iron Man', 'Iron Man 2', 2010)
,('Thor', 'Thor', 2011)
,('Captain America', 'Captain America: The First Avenger', 2011)
,('Iron Man','Marvels The Avengers',2012)
,('The Hulk','Marvels The Avengers',2012)
,('Thor','Marvels The Avengers',2012)
,('Captain America','Marvels The Avengers',2012)
,('Iron Man', 'Iron Man 3', 2013)
,('Thor', 'Thor: The Dark World',2013)
,('Captain America', 'Captain America: The Winter Soldier',2014)
,('Star Lord', 'Guardians of the Galaxy',2014)
,('Iron Man','Avengers: Age of Ultron',2015)
,('The Hulk','Avengers: Age of Ultron',2015)
,('Thor','Avengers: Age of Ultron',2015)
,('Captain America','Avengers: Age of Ultron',2015)
,('Ant Man','Ant Man',2015)
,('Captain America','Captain America: Civil War',2016)
,('Iron Man','Captain America: Civil War',2016)
,('Thor','Captain America: Civil War',2016)
,('The Hulk','Captain America: Civil War',2016)
,('Ant Man','Captain America: Civil War',2016)
,('Spider-Man','Captain America: Civil War',2016)
,('Doctor Strange','Doctor Strange',2016)
,('Star Lord','Guardians of the Galaxy Vol. 2',2017)
,('Spider-Man','Spider-Man: Homecoming',2017)
,('Thor','Thor: Ragnarok',2017)
,('The Hulk','Thor: Ragnarok',2017)

Now, I wonder which character has appeared in the most movies. Let's work that out. The nifty thing here is that we're going to render the results in SSMS as a text-based bar chart, try this;

SELECT 
     mcu.Hero_Name
    ,REPLICATE('|',(COUNT(*)/CONVERT(decimal,m.Movies))*100) Graph
FROM #MCUData mcu
CROSS JOIN (SELECT COUNT(DISTINCT Movie_Name) Movies FROM #MCUData) m
GROUP BY mcu.Hero_Name, m.Movies
ORDER BY mcu.Hero_Name

Take a look at the results pane; you can see that Iron Man and Thor are the most recurring Avengers, each appearing in 35.3% of the movies.


Please feel free to point out all of the reasons the list of characters & movies is wrong in the comments section below. I doubt I'd be able to stop you 🙂