SQL Server Graph Database of US Capitals

SQL Server 2017 introduced graph database functionality. Graphs are a core concept in computer science and I was excited to hear of this new feature.

Robert Sheldon has a great series on SQL Server’s graph database here. If you’re interested in learning about graphs in SQL Server, it’s a great place to start.

While there’s countless relational databases out there for practice, there’s not much in the way of graph databases. It is my intent to share my graph databases with the world in hopes that it removes the friction associated with your learning.

US Capitals is a popular data set for working with graphs. Nodes identify a state capital. An edge connects a capital in one state with the capital of a neighboring state. Only the lower 48 states are present. While the data is readily available, I was unable to find TSQL scripts to create the graph using SQL Server 2017 graph database. I created those scripts and have made them readily available on GitHub.

Create a SQL Server database and execute these scripts in order:

1_CreateGraphNode_Capitals.sql

2_CreateGraphEdge_Distances.sql

3_InsertNode_Capitals.sql

4_InsertEdge_Distances.sql

There are two sets of INSERT statements in order to create a bi-directional graph. Without both edges, the graph could only be traversed in one direction.

Once you have a usable graph database, you can begin programming against it. In my next post, I’ll go over my PowerShell implementation of Dijkstra’s algorithm, which is a classic shortest path algorithm.

-Cheers

Parallel Ola Hallengren Backups

When a SQL Server instance has a large number of databases (100s or 1000s), it can be very challenging to complete backups in a timely manner. Using common methods such as Ola Hallengren scripts or native maintenance plans often take a long time due to serial processing (1 processor). So the question becomes, how do we perform backups in parallel using multiple processors?

With any backup automation, there’s a few core requirements that we’ll keep in mind while developing this solution:
* Ensure newly created databases are backed up
* Ensure deleted databases are not backed up
* Ensure we don’t backup a database multiple times

I’ve come up with a method to run Ola Hallengren’s famous maintenance scripts in parallel. Using dynamic SQL we can generate a new database list at runtime, and backup just the databases in that list. There’s no need to maintain a list of databases and there’s no chance a database will be missed. We will create multiple SQL Agent jobs that each refer to a separate list of databases. These jobs can then be executed in parallel.

Decide On The Number of Jobs

First step to this process is to decide on the number of parallel jobs you want to create. For this example, I will use 4 parallel jobs. It would be wise to consider how many processors you have on the system and the load incurred by the server during this process.

Dividing Up The Databases

Every database is given a unique integer ID upon creation. This database_id can be found in the sys.databases system table. Using the modulo function, we can divide up the databases based on database_id and the number of jobs we decided upon (in my case, 4). We can view the modulo value via:

SELECT name, database_id, database_id % 4 AS [mod] FROM sys.databases

Unfortunately, Ola Hallengren’s scripts don’t run on database_id, they run on database name. Let’s convert our script to return a list of database names, remove the hard coded number, and make the script easier to update:

DECLARE @NumberOfJobs INT
DECLARE @ThisJob INT

SET @NumberOfJobs = 4
SET @ThisJob = 0

DECLARE @DATABASES NVARCHAR(MAX)
SELECT @DATABASES = COALESCE(@DATABASES + ',', '') + name
FROM sys.databases
WHERE database_id % @NumberOfJobs = @ThisJob

Notice the new variable @ThisJob. This variable will be different for each SQL Server Agent job you create. Use values 0 through the number of jobs minus one.

Also notice the COALESCE call which includes a comma. This is critical as it will generate a comma separated list of database names which we can then pass into Ola Hallengren using the @DATABASES variable.

Dynamically Call Ola Hallengren

Looking at the default parameters for Ola Hallengren’s DatabaseBackup procedure, we can see that all databases are backed up via the @Databases = 'USER_DATABASES' parameter value. Let’s convert a DatabaseBackup call to dynamic SQL:

DECLARE @SQL NVARCHAR(MAX)

SELECT @SQL =  
'EXECUTE [dbo].[DatabaseBackup] 
@Databases = ''USER_DATABASES'',  
@Directory = N''C:\SQLBackups'', 
@BackupType = ''FULL'', 
@Verify = ''Y'', 
@CleanupTime = 48, 
@CheckSum = ''Y'', 
@LogToTable = ''Y'''

--print @SQL
EXEC (@SQL)

And now, let’s combine the modulo script and the dynamic SQL script. This will be our final product that we can put directly into a SQL Agent job.

DECLARE @NumberOfJobs INT
DECLARE @ThisJob INT

SET @NumberOfJobs = 4
SET @ThisJob = 0

DECLARE @DATABASES NVARCHAR(MAX)
SELECT @DATABASES = COALESCE(@DATABASES + ',', '') + name
FROM sys.databases
WHERE database_id % @NumberOfJobs = @ThisJob

DECLARE @SQL NVARCHAR(MAX)

SELECT @SQL =  
'EXECUTE [dbo].[DatabaseBackup] 
@Databases = ''' + @DATABASES + ''', 
@Directory = N''C:\SQLBackups'', 
@BackupType = ''FULL'', 
@Verify = ''Y'', 
@CleanupTime = 48, 
@CheckSum = ''Y'', 
@LogToTable = ''Y'''

--print @SQL
EXEC (@SQL)

Pay attention to where the @Databases parameter is set to our modulo generated list of databases. By calling the modulo script everytime, we guarantee that new databases will be backed up and deleted databases will not.

You Must Create Multiple Jobs

To ensure you backup every database on the instance, you must create multiple agent jobs. How many is determined by the value you specified in @NumberOfJobs. If you leave a value out, then all the databases with that database_id modulo result will not be backed up.

In practice, I name my jobs using the default job names with a suffix that corresponds to which modulo result the job uses. I script one of the jobs out, then modify the suffix and the @ThisJob variable. This gives me a tidy set of agent jobs:

ParallelOlaJobs

There’s definitely room for more automation in this solution. I still need to automate the deployment of all jobs, whichout the need to modify the suffix or @ThisJob. But that’s a simple matter of PowerShell string concatenation. Stay tuned for a future post with that solution.

-Cheers