Dijkstra’s Algorithm in PowerShell and SQL Server Graph

In my previous post I shared a SQL Server 2017 graph database of US capitals. Graphs are a computer science core competency and present some interesting challenges for programmers. Most notable of these challenges is finding the shortest path between nodes. Dijkstra’s algorithm is a commonly taught algorithm for finding shortest path. Dijkstra’s is often asked about during entry level developer interviews and it is a great algorithm to implement when learning a new language since it requires utilizing loops, logic, and data structures.

Here’s my implementation of Dijkstra’s algorithm using PowerShell, traversing a graph of US capitals. Rather than manage our own graph nodes and edges, we’ll utilize graph tables and queries in SQL Server. There’s a lot of different ways to implement this in PowerShell, my first cut of this ended up using a hash table so I could perform random access. There’s a give-and-take with custom PowerShell objects, which sacrifice random access for some other benefits.

# Replace these values with your instance and database
$SQLInstance    = 'sql2017'
$GraphDB        = 'graph'

# The city we wish to find the shortest paths from
$CurrentCity    = 'Salem'

Next, I’ll create a hash table, query SQL Server for nodes, and populate the hash table.

# Initialize and populate our graph data structure.
$Cities = @{}

$AllCities = Invoke-SQLCMD -ServerInstance $SQLInstance -Database $GraphDB -Query 'SELECT name FROM capitals'

# Give each city a high mileage and mark them un-visited.
foreach ($City in $AllCities) {
    $Record = "" | Select-Object miles, visited
    $Record.miles       = [int]10000
    $Record.visited     = $false

    $Cities.Add($City.name, $Record)
}

Next, mark your current city as having 0 miles, and store the number of un-visited cities, and start a loop to process all un-visited cities.

$Cities[$CurrentCity].miles = 0

$UnvisitedCities = ($Cities.GetEnumerator() | Where-Object {$_.Value.visited -eq $false} | Measure-Object).Count

while ($UnvisitedCities -gt 0) {

Query SQL Server for edge information from your current city. This query will return all neighboring capitals and their distances. If you’re not familiar with the MATCH component or querying graphs, see Robert Sheldon’s post.

    $NeighborsQuery   = "SELECT cityTo.name AS toName, distances.miles
                        FROM capitals AS cityFrom, distances, capitals AS cityTo
                        WHERE MATCH (cityFrom-(distances)->cityTo)
                        AND cityFrom.name = '$CurrentCity'"
    $Neighbors = Invoke-SQLCMD -ServerInstance $SQLInstance -Database $GraphDB -Query $NeighborsQuery

Now, for each neighbor, determine if you have found a shorter route through the current node. If you have, update the distance.

    foreach ($Neighbor in $Neighbors) {

        # If the neighbor has been visited, don't process it
        if ($Cities[$Neighbor.toName].visited -ne $true) {
            # Update the distance to the neighbor through the current city
            $CurrentMileage = $Cities[$CurrentCity].miles + $Neighbor.miles

            # If the current path is shorter than the neighbors current shortest path, update it
            if ($CurrentMileage -lt $Cities[$Neighbor.toName].miles) {
                $Cities[$Neighbor.toName].miles = $CurrentMileage
            }
        }
    }

Once we’ve processed all neighboring nodes, we can mark the node as visited.

    # Mark the city visited
    $Cities[$CurrentCity].visited = $true

Because I’m using a hash table, we need to loop to find the next city that we want to visit. Dijkstra’s dictates we visit the un-visited city with the shortest distance.

    # Find the next un-visited city to visit.
    $Shortest = 100000
    foreach ($City in $Cities.GetEnumerator() | Where-Object {$_.Value.visited -eq $false}) {
        if ($City.Value.miles -lt $Shortest) {
            $Shortest       = $City.Value.miles
            $CurrentCity    = $City.Name 
        }
    }

Once we’ve found it, update the un-visited cities count and close the loop.

<br />    # Update the un-visited cities count
    $UnvisitedCities = ($Cities.GetEnumerator() | Where-Object {$_.Value.visited -eq $false} | Measure-Object).Count
}

Finally, spit out the results by returning the hash table. The hash table will now have updated distances which reflect the shortest distance from your starting city.

# Return the results
$Cities

Here’s the entire script:

# Replace these values with your instance and database
$SQLInstance    = 'sql2017'
$GraphDB        = 'graph'

# The city we wish to find the shortest paths from
$CurrentCity    = 'Salem'

# Initialize and populate our graph data structure.
$Cities = @{}

$AllCities = Invoke-SQLCMD -ServerInstance $SQLInstance -Database $GraphDB -Query 'SELECT name FROM capitals'

foreach ($City in $AllCities) {
    $Record = "" | Select-Object miles, visited
    $Record.miles       = [int]10000
    $Record.visited    = $false

    $Cities.Add($City.name, $Record)
}

$Cities[$CurrentCity].miles = 0

$UnvisitedCities = ($Cities.GetEnumerator() | Where-Object {$_.Value.visited -eq $false} | Measure-Object).Count

while ($UnvisitedCities -gt 0) {

    $NeighborsQuery   = "SELECT cityTo.name AS toName, distances.miles
                        FROM capitals AS cityFrom, distances, capitals AS cityTo
                        WHERE MATCH (cityFrom-(distances)->cityTo)
                        AND cityFrom.name = '$CurrentCity'"
    $Neighbors = Invoke-SQLCMD -ServerInstance $SQLInstance -Database $GraphDB -Query $NeighborsQuery

    foreach ($Neighbor in $Neighbors) {

        #if the neighbor has been visited, don't process it
        if ($Cities[$Neighbor.toName].visited -ne $true) {
            # Update the distance to the neighbor through the current city
            $CurrentMileage = $Cities[$CurrentCity].miles + $Neighbor.miles

            # If the current path is shorter than the neighbors current shortest path, update it
            if ($CurrentMileage -lt $Cities[$Neighbor.toName].miles) {
                $Cities[$Neighbor.toName].miles = $CurrentMileage
            }
        }
    }

    # Mark the city visited
    $Cities[$CurrentCity].visited = $true

    # Find the next un-visited city to visit.
    $Shortest = 100000
    foreach ($City in $Cities.GetEnumerator() | Where-Object {$_.Value.visited -eq $false}) {
        if ($City.Value.miles -lt $Shortest) {
            $Shortest       = $City.Value.miles
            $CurrentCity    = $City.Name 
        }
    }

    # Update the unvisited cities count
    $UnvisitedCities = ($Cities.GetEnumerator() | Where-Object {$_.Value.visited -eq $false} | Measure-Object).Count
}

# Return the results
$Cities

Whew.

-Cheers

SQL Server Graph Database of US Capitals

SQL Server 2017 introduced graph database functionality. Graphs are a core concept in computer science and I was excited to hear of this new feature.

Robert Sheldon has a great series on SQL Server’s graph database here. If you’re interested in learning about graphs in SQL Server, it’s a great place to start.

While there’s countless relational databases out there for practice, there’s not much in the way of graph databases. It is my intent to share my graph databases with the world in hopes that it removes the friction associated with your learning.

US Capitals is a popular data set for working with graphs. Nodes identify a state capital. An edge connects a capital in one state with the capital of a neighboring state. Only the lower 48 states are present. While the data is readily available, I was unable to find TSQL scripts to create the graph using SQL Server 2017 graph database. I created those scripts and have made them readily available on GitHub.

Create a SQL Server database and execute these scripts in order:

1_CreateGraphNode_Capitals.sql

2_CreateGraphEdge_Distances.sql

3_InsertNode_Capitals.sql

4_InsertEdge_Distances.sql

There are two sets of INSERT statements in order to create a bi-directional graph. Without both edges, the graph could only be traversed in one direction.

Once you have a usable graph database, you can begin programming against it. In my next post, I’ll go over my PowerShell implementation of Dijkstra’s algorithm, which is a classic shortest path algorithm.

-Cheers

Parallel Ola Hallengren Backups

When a SQL Server instance has a large number of databases (100s or 1000s), it can be very challenging to complete backups in a timely manner. Using common methods such as Ola Hallengren scripts or native maintenance plans often take a long time due to serial processing (1 processor). So the question becomes, how do we perform backups in parallel using multiple processors?

With any backup automation, there’s a few core requirements that we’ll keep in mind while developing this solution:
* Ensure newly created databases are backed up
* Ensure deleted databases are not backed up
* Ensure we don’t backup a database multiple times

I’ve come up with a method to run Ola Hallengren’s famous maintenance scripts in parallel. Using dynamic SQL we can generate a new database list at runtime, and backup just the databases in that list. There’s no need to maintain a list of databases and there’s no chance a database will be missed. We will create multiple SQL Agent jobs that each refer to a separate list of databases. These jobs can then be executed in parallel.

Decide On The Number of Jobs

First step to this process is to decide on the number of parallel jobs you want to create. For this example, I will use 4 parallel jobs. It would be wise to consider how many processors you have on the system and the load incurred by the server during this process.

Dividing Up The Databases

Every database is given a unique integer ID upon creation. This database_id can be found in the sys.databases system table. Using the modulo function, we can divide up the databases based on database_id and the number of jobs we decided upon (in my case, 4). We can view the modulo value via:

SELECT name, database_id, database_id % 4 AS [mod] FROM sys.databases

Unfortunately, Ola Hallengren’s scripts don’t run on database_id, they run on database name. Let’s convert our script to return a list of database names, remove the hard coded number, and make the script easier to update:

DECLARE @NumberOfJobs INT
DECLARE @ThisJob INT

SET @NumberOfJobs = 4
SET @ThisJob = 0

DECLARE @DATABASES NVARCHAR(MAX)
SELECT @DATABASES = COALESCE(@DATABASES + ',', '') + name
FROM sys.databases
WHERE database_id % @NumberOfJobs = @ThisJob

Notice the new variable @ThisJob. This variable will be different for each SQL Server Agent job you create. Use values 0 through the number of jobs minus one.

Also notice the COALESCE call which includes a comma. This is critical as it will generate a comma separated list of database names which we can then pass into Ola Hallengren using the @DATABASES variable.

Dynamically Call Ola Hallengren

Looking at the default parameters for Ola Hallengren’s DatabaseBackup procedure, we can see that all databases are backed up via the @Databases = 'USER_DATABASES' parameter value. Let’s convert a DatabaseBackup call to dynamic SQL:

DECLARE @SQL NVARCHAR(MAX)

SELECT @SQL =  
'EXECUTE [dbo].[DatabaseBackup] 
@Databases = ''USER_DATABASES'',  
@Directory = N''C:\SQLBackups'', 
@BackupType = ''FULL'', 
@Verify = ''Y'', 
@CleanupTime = 48, 
@CheckSum = ''Y'', 
@LogToTable = ''Y'''

--print @SQL
EXEC (@SQL)

And now, let’s combine the modulo script and the dynamic SQL script. This will be our final product that we can put directly into a SQL Agent job.

DECLARE @NumberOfJobs INT
DECLARE @ThisJob INT

SET @NumberOfJobs = 4
SET @ThisJob = 0

DECLARE @DATABASES NVARCHAR(MAX)
SELECT @DATABASES = COALESCE(@DATABASES + ',', '') + name
FROM sys.databases
WHERE database_id % @NumberOfJobs = @ThisJob

DECLARE @SQL NVARCHAR(MAX)

SELECT @SQL =  
'EXECUTE [dbo].[DatabaseBackup] 
@Databases = ''' + @DATABASES + ''', 
@Directory = N''C:\SQLBackups'', 
@BackupType = ''FULL'', 
@Verify = ''Y'', 
@CleanupTime = 48, 
@CheckSum = ''Y'', 
@LogToTable = ''Y'''

--print @SQL
EXEC (@SQL)

Pay attention to where the @Databases parameter is set to our modulo generated list of databases. By calling the modulo script everytime, we guarantee that new databases will be backed up and deleted databases will not.

You Must Create Multiple Jobs

To ensure you backup every database on the instance, you must create multiple agent jobs. How many is determined by the value you specified in @NumberOfJobs. If you leave a value out, then all the databases with that database_id modulo result will not be backed up.

In practice, I name my jobs using the default job names with a suffix that corresponds to which modulo result the job uses. I script one of the jobs out, then modify the suffix and the @ThisJob variable. This gives me a tidy set of agent jobs:

ParallelOlaJobs

There’s definitely room for more automation in this solution. I still need to automate the deployment of all jobs, whichout the need to modify the suffix or @ThisJob. But that’s a simple matter of PowerShell string concatenation. Stay tuned for a future post with that solution.

-Cheers