Generate Test Data with Faker & Python within SQL Server

Make sure you’ve done these steps first

  1. You’ve installed SQL Server with Python
  2. You’ve then installed pip
  3. You’ve also installed Pandas using pip

Then let’s get started

We’re going to use a Python library called Faker, which is designed to generate test data. You’ll need to open a command prompt in the folder where pip is installed. In my standard installation of SQL Server 2019 it’s here (adjust for your own installation);

C:\Program Files\Microsoft SQL Server\MSSQL15.SQL2019PYTHON\PYTHON_SERVICES\Scripts

From here you want to run the following command to install Faker;

pip install Faker

Once that’s installed, we can open SSMS and get started with our test data.

We’re going to start with the sample code from the official Faker documentation, but we have to add a print statement to see our results because we’re using SSMS;

If you run this in SSMS you’ll see the output in the messages window

This guy loves quality legwear

Now we know that works, let’s put this into a useable format within SQL Server.

This is going to be our block of Python;

For the purposes of this example, we’re going to make a temp table to store the data and view what we’ve done. Wrapping this Python script in T-SQL will give us an output like so;

Go ahead and run it and you should see a sample of 100 names and addresses that are now stored in your temp table;

There are far more options when using Faker. The official documentation lists the different data types you can generate, as well as options such as region-specific data.

Go have fun trying this; it’s a small amount of setup for a large amount of time saved.