Redirecting Standard In With PowerShell

Command Line Interface (CLI) tools can be very useful for interacting with certain applications. However, some CLIs do not let a user pass in parameters, which makes them difficult to automate. Instead, they lock the user into an interactive session and force them to enter commands. Fortunately, some programming languages allow for redirection of the standard input, output, and error streams of a running process. A developer co-worker of mine has been very successful doing this in Java, which got me thinking… And it turns out that, using the .NET libraries, we can implement this functionality for any CLI in PowerShell.

Two .NET classes from the System.Diagnostics namespace are needed for this task:
ProcessStartInfo – This object stores the metadata for our process, such as the executable path and arguments.
Process – This object corresponds to the running process on our system.

First, let’s create a new ProcessStartInfo, assign an executable, and assign some arguments. In this example, I am using SQLCMD as our executable (yes, I know SQLCMD accepts parameters; this is just an example). The arguments here are identical to what you would pass to the executable if you were executing from a shell. I’m passing my server (localhost\sql2017), a database (master), and setting the flag to use integrated authentication.

$ProcInfo = New-Object System.Diagnostics.ProcessStartInfo
$ProcInfo.FileName = 'C:\Program Files (x86)\Microsoft SQL Server\Client SDK\ODBC\130\Tools\Binn\SQLCMD.EXE'
$ProcInfo.Arguments = '-S localhost\sql2017 -d master -E'

Some additional properties need to be set to allow our process to redirect input. Luckily, these are all booleans and fairly straightforward.

# These two are required to redirect any of the streams
$ProcInfo.UseShellExecute           = $false
$ProcInfo.ErrorDialog               = $false
# Allow redirection
$ProcInfo.RedirectStandardError     = $true
$ProcInfo.RedirectStandardInput     = $true
$ProcInfo.RedirectStandardOutput    = $true

With our ProcessStartInfo object instantiated, we can now move on to instantiating our Process object. We will create the object and then assign its StartInfo property to our newly created ProcessStartInfo.

$Process            = New-Object System.Diagnostics.Process
$Process.StartInfo  = $ProcInfo

Next, we will start our process and access its streams. Start returns a boolean which can be used for error handling. For clarity, I’ve assigned the streams to new variables, but this is not necessary.

$Process.Start()

$STDIn      = $Process.StandardInput
$STDOut     = $Process.StandardOutput
$STDError   = $Process.StandardError
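Since Start returns a boolean, you could guard the call with something like this (a minimal sketch; the instance method returns $false if a new process was not actually started, and the error message is my own):

# Throw if a new process was not started
if (-not $Process.Start()) {
    throw "Failed to start process: $($ProcInfo.FileName)"
}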

With our streams redirected, we can now write to the standard input stream programmatically. In this example, I am hard-coding strings, but in production this could be a script file or even another process.

$STDIn.WriteLine('SELECT @@SERVERNAME')
$STDIn.WriteLine('GO')
# If you don't call "quit", SQLCMD will not return
$STDIn.WriteLine('quit')

Finally, we can capture all the output from our process using the ReadToEnd method.

Write-Host $STDOut.ReadToEnd()

There are several other methods available to make your script more flexible. For example, if your process runs for a long time, you can loop until it completes using the HasExited boolean. Or, if your process has multiple points where you need to read from standard out, the ReadLine method is available. All of these are common in more traditional programming languages, and it’s incredibly powerful to have them available to us in PowerShell.
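As a rough sketch of the ReadLine approach (assuming the same $STDOut and $Process variables from above; ReadLine blocks until a line arrives and returns $null once the stream closes):

# Stream the output one line at a time instead of all at once
while ($null -ne ($Line = $STDOut.ReadLine())) {
    Write-Host $Line
}
$Process.WaitForExit()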

Here’s the entire script.

# Specify our executable and parameters.
$ProcInfo           = New-Object System.Diagnostics.ProcessStartInfo
$ProcInfo.FileName  = 'C:\Program Files (x86)\Microsoft SQL Server\Client SDK\ODBC\130\Tools\Binn\SQLCMD.EXE'
$ProcInfo.Arguments = '-S localhost\sql2017 -d master -E'

# Configure
$ProcInfo.UseShellExecute           = $false
$ProcInfo.ErrorDialog               = $false
$ProcInfo.RedirectStandardError     = $true
$ProcInfo.RedirectStandardInput     = $true
$ProcInfo.RedirectStandardOutput    = $true

# Create a process
$Process            = New-Object System.Diagnostics.Process
$Process.StartInfo  = $ProcInfo

# Start the process
$Process.Start()

# Redirect input, output, and error streams
$STDIn      = $Process.StandardInput
$STDOut     = $Process.StandardOutput
$STDError   = $Process.StandardError

# Write some strings to stdin
$STDIn.WriteLine('SELECT @@SERVERNAME')
$STDIn.WriteLine('GO')
$STDIn.WriteLine('quit')

# Read all the output
Write-Host $STDOut.ReadToEnd()

-Cheers

Installing SQLCMD on Linux

One challenge with SQL Server database automation is that eventually you’re going to have to execute a native TSQL statement. There are some very trendy tools out there (e.g., Puppet, Ansible) and it can be a challenge to accomplish everything using their native frameworks. Have you ever seen a linked server resource type in Puppet? SQL Server is so feature-rich that implementing every possible object and setting would take years. So, it often becomes practical to re-use existing TSQL scripts and find a way to execute them from the command line.

I’ve recently gotten more exposure to Ansible, and one of its drawbacks is that the Ansible control node only runs on Linux. So, as the title suggests, how do we install SQLCMD on Linux so that we can expand our capabilities with Ansible?

Oh, I almost forgot: there are a million distros of Linux. The version I’m using is RHEL 7. Also worth noting, I’m a total Linux newbie.

Microsoft has decent instructions on installing SQLCMD on Linux. My current company blocks access to external yum repositories, so I followed the “Offline” installation. There are two packages to install:

  1. msodbcsql – aka Microsoft’s ODBC driver for SQL Server
  2. mssql-tools – SQLCMD and BCP

sudo yum install msodbcsql-13.1.6.0-1.x86_64.rpm
sudo yum install mssql-tools-14.0.5.0-1.x86_64.rpm

Microsoft is a little misleading in parts of their instructions: you do have to install the ODBC package prior to SQLCMD.

ODBC will be installed in /opt/microsoft and mssql-tools in /opt/mssql-tools. You’ll find both SQLCMD and BCP in /opt/mssql-tools/bin.

I then added the SQLCMD path to my bash profile to save me some typing:

PATH=$PATH:/opt/mssql-tools/bin

At this point, you should be able to execute SQLCMD. Note: I capitalize SQLCMD, but Linux is case sensitive, so executions must use the lowercase sqlcmd.

Executing sqlcmd with no parameters will give you a brief overview of the available parameters. If you’ve used SQLCMD before, you’ll be happy to see that there’s not much difference between Windows and Linux parameters:
(Screenshot: sqlcmd parameter help)
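As a quick smoke test, something like this should work (a sketch with placeholder server and credentials; SQL authentication shown, since integrated authentication takes extra setup on Linux):

sqlcmd -S myserver -d master -U myuser -P 'MyPassword' -Q 'SELECT @@SERVERNAME'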

My next step here is really to explore Ansible further. In future posts I’ll show how to run TSQL scripts using SQLCMD on Linux via Ansible.

-Cheers

Dijkstra’s Algorithm in PowerShell and SQL Server Graph

In my previous post I shared a SQL Server 2017 graph database of US capitals. Graphs are a core computer science concept and present some interesting challenges for programmers. The most notable of these challenges is finding the shortest path between nodes. Dijkstra’s algorithm is a commonly taught algorithm for finding the shortest path. Dijkstra’s is often asked about during entry-level developer interviews, and it is a great algorithm to implement when learning a new language, since it requires loops, logic, and data structures.

Here’s my implementation of Dijkstra’s algorithm in PowerShell, traversing a graph of US capitals. Rather than manage our own graph nodes and edges, we’ll utilize graph tables and queries in SQL Server. There are a lot of different ways to implement this in PowerShell; my first cut ended up using a hash table so I could perform random access by city name. (There’s a give-and-take with custom PowerShell objects, which sacrifice random access for some other benefits.) First, define the connection values and the starting city:

# Replace these values with your instance and database
$SQLInstance    = 'sql2017'
$GraphDB        = 'graph'

# The city we wish to find the shortest paths from
$CurrentCity    = 'Salem'

Next, I’ll create a hash table, query SQL Server for nodes, and populate the hash table.

# Initialize and populate our graph data structure.
$Cities = @{}

$AllCities = Invoke-SQLCMD -ServerInstance $SQLInstance -Database $GraphDB -Query 'SELECT name FROM capitals'

# Give each city a high mileage and mark them un-visited.
foreach ($City in $AllCities) {
    $Record = "" | Select-Object miles, visited
    $Record.miles       = [int]10000
    $Record.visited     = $false

    $Cities.Add($City.name, $Record)
}

Next, mark your current city as having 0 miles, store the number of un-visited cities, and start a loop to process all un-visited cities.

$Cities[$CurrentCity].miles = 0

$UnvisitedCities = ($Cities.GetEnumerator() | Where-Object {$_.Value.visited -eq $false} | Measure-Object).Count

while ($UnvisitedCities -gt 0) {

Query SQL Server for edge information from your current city. This query will return all neighboring capitals and their distances. If you’re not familiar with the MATCH clause or querying graphs, see Robert Sheldon’s post.

    $NeighborsQuery   = "SELECT cityTo.name AS toName, distances.miles
                        FROM capitals AS cityFrom, distances, capitals AS cityTo
                        WHERE MATCH (cityFrom-(distances)->cityTo)
                        AND cityFrom.name = '$CurrentCity'"
    $Neighbors = Invoke-SQLCMD -ServerInstance $SQLInstance -Database $GraphDB -Query $NeighborsQuery

Now, for each neighbor, determine if you have found a shorter route through the current node. If you have, update the distance.

    foreach ($Neighbor in $Neighbors) {

        # If the neighbor has been visited, don't process it
        if ($Cities[$Neighbor.toName].visited -ne $true) {
            # Update the distance to the neighbor through the current city
            $CurrentMileage = $Cities[$CurrentCity].miles + $Neighbor.miles

            # If the current path is shorter than the neighbor's current shortest path, update it
            if ($CurrentMileage -lt $Cities[$Neighbor.toName].miles) {
                $Cities[$Neighbor.toName].miles = $CurrentMileage
            }
        }
    }

Once we’ve processed all neighboring nodes, we can mark the node as visited.

    # Mark the city visited
    $Cities[$CurrentCity].visited = $true

Because I’m using a hash table, we need to loop to find the next city to visit. Dijkstra’s dictates we visit the un-visited city with the shortest distance.

    # Find the next un-visited city to visit (seed higher than the 10000 default mileage)
    $Shortest = 100000
    foreach ($City in $Cities.GetEnumerator() | Where-Object {$_.Value.visited -eq $false}) {
        if ($City.Value.miles -lt $Shortest) {
            $Shortest       = $City.Value.miles
            $CurrentCity    = $City.Name 
        }
    }

Once we’ve found it, update the un-visited cities count and close the loop.

    # Update the un-visited cities count
    $UnvisitedCities = ($Cities.GetEnumerator() | Where-Object {$_.Value.visited -eq $false} | Measure-Object).Count
}

Finally, spit out the results by returning the hash table. It now holds the shortest distance from your starting city to every other capital.

# Return the results
$Cities
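To make the output easier to read, you could sort the results by distance (a small sketch using the same $Cities hash table):

# List cities from nearest to farthest from the starting city
$Cities.GetEnumerator() |
    Sort-Object { $_.Value.miles } |
    Select-Object Name, @{ Name = 'Miles'; Expression = { $_.Value.miles } }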

Here’s the entire script:

# Replace these values with your instance and database
$SQLInstance    = 'sql2017'
$GraphDB        = 'graph'

# The city we wish to find the shortest paths from
$CurrentCity    = 'Salem'

# Initialize and populate our graph data structure.
$Cities = @{}

$AllCities = Invoke-SQLCMD -ServerInstance $SQLInstance -Database $GraphDB -Query 'SELECT name FROM capitals'

foreach ($City in $AllCities) {
    $Record = "" | Select-Object miles, visited
    $Record.miles       = [int]10000
    $Record.visited     = $false

    $Cities.Add($City.name, $Record)
}

$Cities[$CurrentCity].miles = 0

$UnvisitedCities = ($Cities.GetEnumerator() | Where-Object {$_.Value.visited -eq $false} | Measure-Object).Count

while ($UnvisitedCities -gt 0) {

    $NeighborsQuery   = "SELECT cityTo.name AS toName, distances.miles
                        FROM capitals AS cityFrom, distances, capitals AS cityTo
                        WHERE MATCH (cityFrom-(distances)->cityTo)
                        AND cityFrom.name = '$CurrentCity'"
    $Neighbors = Invoke-SQLCMD -ServerInstance $SQLInstance -Database $GraphDB -Query $NeighborsQuery

    foreach ($Neighbor in $Neighbors) {

        # If the neighbor has been visited, don't process it
        if ($Cities[$Neighbor.toName].visited -ne $true) {
            # Update the distance to the neighbor through the current city
            $CurrentMileage = $Cities[$CurrentCity].miles + $Neighbor.miles

            # If the current path is shorter than the neighbor's current shortest path, update it
            if ($CurrentMileage -lt $Cities[$Neighbor.toName].miles) {
                $Cities[$Neighbor.toName].miles = $CurrentMileage
            }
        }
    }

    # Mark the city visited
    $Cities[$CurrentCity].visited = $true

    # Find the next un-visited city to visit (seed higher than the 10000 default mileage)
    $Shortest = 100000
    foreach ($City in $Cities.GetEnumerator() | Where-Object {$_.Value.visited -eq $false}) {
        if ($City.Value.miles -lt $Shortest) {
            $Shortest       = $City.Value.miles
            $CurrentCity    = $City.Name 
        }
    }

    # Update the unvisited cities count
    $UnvisitedCities = ($Cities.GetEnumerator() | Where-Object {$_.Value.visited -eq $false} | Measure-Object).Count
}

# Return the results
$Cities

Whew.

-Cheers

SQL Server Graph Database of US Capitals

SQL Server 2017 introduced graph database functionality. Graphs are a core concept in computer science and I was excited to hear of this new feature.

Robert Sheldon has a great series on SQL Server’s graph database here. If you’re interested in learning about graphs in SQL Server, it’s a great place to start.

While there are countless relational databases out there for practice, there’s not much in the way of graph databases. My intent is to share my graph databases with the world in hopes of removing some of the friction from your learning.

US Capitals is a popular data set for working with graphs. Each node identifies a state capital. An edge connects the capital of one state with the capital of a neighboring state. Only the lower 48 states are present. While the data is readily available, I was unable to find TSQL scripts to create the graph using SQL Server 2017 graph database, so I created those scripts and have made them available on GitHub.

Create a SQL Server database and execute these scripts in order:

1_CreateGraphNode_Capitals.sql

2_CreateGraphEdge_Distances.sql

3_InsertNode_Capitals.sql

4_InsertEdge_Distances.sql

There are two sets of INSERT statements in order to create a bi-directional graph. Without both edges, the graph could only be traversed in one direction.
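Once the scripts have run, a quick sanity check is to pull the neighbors of a single capital. This uses the capitals node table and distances edge table created by the scripts above:

SELECT cityTo.name, distances.miles
FROM capitals AS cityFrom, distances, capitals AS cityTo
WHERE MATCH (cityFrom-(distances)->cityTo)
AND cityFrom.name = 'Salem'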

Once you have a usable graph database, you can begin programming against it. In my next post, I’ll go over my PowerShell implementation of Dijkstra’s algorithm, which is a classic shortest path algorithm.

-Cheers

Parallel Ola Hallengren Backups

When a SQL Server instance has a large number of databases (100s or 1000s), it can be very challenging to complete backups in a timely manner. Common methods such as Ola Hallengren’s scripts or native maintenance plans often take a long time because they process databases serially. So the question becomes: how do we perform backups in parallel, using multiple processors?

With any backup automation, there’s a few core requirements that we’ll keep in mind while developing this solution:
* Ensure newly created databases are backed up
* Ensure deleted databases are not backed up
* Ensure we don’t back up a database multiple times

I’ve come up with a method to run Ola Hallengren’s famous maintenance scripts in parallel. Using dynamic SQL, we can generate a new database list at runtime and back up just the databases in that list. There’s no need to maintain a list of databases, and there’s no chance a database will be missed. We will create multiple SQL Agent jobs that each refer to a separate list of databases. These jobs can then be executed in parallel.

Decide On The Number of Jobs

The first step in this process is to decide on the number of parallel jobs you want to create. For this example, I will use 4 parallel jobs. It would be wise to consider how many processors you have on the system and the load the server will incur during this process.

Dividing Up The Databases

Every database is given a unique integer ID upon creation. This database_id can be found in the sys.databases system view. Using the modulo operator, we can divide up the databases based on database_id and the number of jobs we decided upon (in my case, 4). We can view the modulo value via:

SELECT name, database_id, database_id % 4 AS [mod] FROM sys.databases

Unfortunately, Ola Hallengren’s scripts don’t operate on database_id; they operate on database names. Let’s convert our script to return a list of database names, remove the hard-coded number, and make the script easier to update:

DECLARE @NumberOfJobs INT
DECLARE @ThisJob INT

SET @NumberOfJobs = 4
SET @ThisJob = 0

DECLARE @DATABASES NVARCHAR(MAX)
SELECT @DATABASES = COALESCE(@DATABASES + ',', '') + name
FROM sys.databases
WHERE database_id % @NumberOfJobs = @ThisJob

Notice the new variable @ThisJob. This variable will be different for each SQL Server Agent job you create; use values 0 through the number of jobs minus one.

Also notice the COALESCE call, which includes a comma. This is critical, as it generates a comma-separated list of database names that we can then pass to Ola Hallengren’s procedure via the @DATABASES variable.

Dynamically Call Ola Hallengren

Looking at the default parameters for Ola Hallengren’s DatabaseBackup procedure, we can see that all user databases are backed up via the @Databases = 'USER_DATABASES' parameter value. Let’s convert a DatabaseBackup call to dynamic SQL:

DECLARE @SQL NVARCHAR(MAX)

SELECT @SQL =  
'EXECUTE [dbo].[DatabaseBackup] 
@Databases = ''USER_DATABASES'',  
@Directory = N''C:\SQLBackups'', 
@BackupType = ''FULL'', 
@Verify = ''Y'', 
@CleanupTime = 48, 
@CheckSum = ''Y'', 
@LogToTable = ''Y'''

--print @SQL
EXEC (@SQL)

And now, let’s combine the modulo script and the dynamic SQL script. This will be our final product that we can put directly into a SQL Agent job.

DECLARE @NumberOfJobs INT
DECLARE @ThisJob INT

SET @NumberOfJobs = 4
SET @ThisJob = 0

DECLARE @DATABASES NVARCHAR(MAX)
SELECT @DATABASES = COALESCE(@DATABASES + ',', '') + name
FROM sys.databases
WHERE database_id % @NumberOfJobs = @ThisJob

DECLARE @SQL NVARCHAR(MAX)

SELECT @SQL =  
'EXECUTE [dbo].[DatabaseBackup] 
@Databases = ''' + @DATABASES + ''', 
@Directory = N''C:\SQLBackups'', 
@BackupType = ''FULL'', 
@Verify = ''Y'', 
@CleanupTime = 48, 
@CheckSum = ''Y'', 
@LogToTable = ''Y'''

--print @SQL
EXEC (@SQL)

Pay attention to where the @Databases parameter is set to our modulo-generated list of databases. By running the modulo script every time, we guarantee that new databases will be backed up and deleted databases will not.

You Must Create Multiple Jobs

To ensure you back up every database on the instance, you must create multiple Agent jobs; how many is determined by the value you specified in @NumberOfJobs. If you leave a value out, then all the databases with that database_id modulo result will not be backed up.

In practice, I name my jobs using the default job names with a suffix that corresponds to which modulo result the job uses. I script one of the jobs out, then modify the suffix and the @ThisJob variable. This gives me a tidy set of agent jobs:

(Screenshot: the set of parallel backup jobs)

There’s definitely room for more automation in this solution. I still need to automate the deployment of all the jobs, without the need to modify the suffix or @ThisJob by hand. But that’s a simple matter of PowerShell string concatenation. Stay tuned for a future post with that solution.
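To give a rough idea of the shape of that automation (a sketch only; $JobScript is a hypothetical variable holding the T-SQL above, and the actual job creation is left open):

# Stamp out one job definition per modulo bucket
$NumberOfJobs = 4
for ($i = 0; $i -lt $NumberOfJobs; $i++) {
    $JobName    = "DatabaseBackup - USER_DATABASES - FULL - Mod$i"   # hypothetical naming scheme
    $JobCommand = $JobScript -replace 'SET @ThisJob = 0', "SET @ThisJob = $i"
    # Create the Agent job here, e.g. via msdb.dbo.sp_add_job / sp_add_jobstep
}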

-Cheers

Logon Trigger Woes

(Image: T-SQL Tuesday logo)

This month’s T-SQL Tuesday is hosted by Steve Jones on the topic of triggers!

In a large enough environment, eventually the question will be asked: “is this database still being used?” There are a few technologies available to help you answer this question, and by far the worst is logon triggers.

I work in a large mixed environment (SQL Server, Oracle, MySQL, YourSQL, etc…) and logon triggers have been used here for a very long time. Specifically on the Oracle servers, logon triggers are used to track who is connecting to the database from where. To a lesser extent, they’re also used to control where connections come from. So the idea of using logon triggers to identify if a SQL Server was still being used seemed very natural to the team.

Logon triggers were something I learned for a 2008 certification exam but I have never seen them put to use in a SQL Server environment.

Setting up a Logon Trigger

First, I created a database with a table to capture login information: columns for the login, the timestamp, and the host they rode in on:

CREATE DATABASE [DBAInfo]

USE [DBAInfo]
CREATE TABLE [dbo].[LoginInfo](
    [Login_Name] [nvarchar](256) NULL,
    [Login_Time] [datetime] NULL,
    [Host_Name] [nvarchar](128) NULL
) ON [PRIMARY]

I wrote my logon trigger to delete any old login\host records before inserting the new record, the idea being that I really only care about the most recent login. I also don’t want this table to cause any growth concerns, and I don’t want to put a purge job in place. Having the DELETE in the trigger takes care of all those concerns (but adds new ones; more on that below):

CREATE TRIGGER [LastLogon]
ON ALL SERVER WITH EXECUTE AS 'sa'
FOR LOGON
AS
BEGIN
    DECLARE @hostname NVARCHAR(128)

    -- Capture the hostname
    SELECT @hostname = EVENTDATA().value('(/EVENT_INSTANCE/ClientHost)[1]','NVARCHAR(128)')

    -- Remove old login\hostname records
    DELETE FROM DBAInfo.dbo.LoginInfo
    WHERE [Login_Name] = ORIGINAL_LOGIN()
    AND [Host_Name] = @hostname

    -- Insert the new record, reusing the hostname captured above
    INSERT INTO DBAInfo.dbo.LoginInfo
    SELECT ORIGINAL_LOGIN(), GETDATE(), @hostname
END
GO

The trigger works as expected, recording all logins. As a DBA, I like having a table to query and sort to my liking.

Ooops

But what happens when the trigger fails? My first brush with this was when I dropped the database containing the logon trigger’s table. This causes an error in the trigger and fails the logon! No users or application accounts can log in, including myself.

The error message provided is “Logon failed for login due to trigger execution” and there are two ways to fix it:
  1. Log on with the dedicated administrator connection (DAC) and disable\drop the trigger.
  2. Restart SQL Server in single user mode and disable\drop the trigger.
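For example, the DAC route might look something like this ('myserver' is a placeholder; sqlcmd’s -A flag requests the dedicated administrator connection):

sqlcmd -S myserver -A -Q "DISABLE TRIGGER [LastLogon] ON ALL SERVER"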

While I may be able to go the rest of my career without breaking this trigger, another DBA might not. Mistakes happen to all of us, and having a server’s availability hinge on a single table or database strikes me as very risky.

Deadlocks

I implemented this trigger on a test system that was being monitored by Idera SQL Diagnostic Manager. Idera was creating so many connections that they began deadlocking due to the logon trigger. Both failed logons and deadlocks are logged in the SQL Server error log:
(Screenshot: deadlock entries in the SQL Server error log)

Summary

My logon trigger, as written, cannot go into production. If you are forced to use a logon trigger in production, here are some thoughts:

  1. Make sure you have error handling in place (see the sketch after this list). The trigger can never fail, or you will lock users out of the instance.
  2. Make sure you have remote DAC configured (and know how to use it!) for cases when the trigger does fail.
  3. Limit the transactions in the trigger to avoid deadlocking.
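Here’s a rough sketch of what that error handling could look like inside the trigger body (an assumption-laden sketch, not production code: the INSERT is wrapped in dynamic SQL because some errors, such as a dropped DBAInfo database, surface at statement compile time and can’t be caught by TRY...CATCH in the same scope):

DECLARE @hostname NVARCHAR(128)
SELECT @hostname = EVENTDATA().value('(/EVENT_INSTANCE/ClientHost)[1]','NVARCHAR(128)')

BEGIN TRY
    -- Dynamic SQL pushes the error into a child scope, where it is catchable
    EXEC sp_executesql
        N'INSERT INTO DBAInfo.dbo.LoginInfo SELECT ORIGINAL_LOGIN(), GETDATE(), @host',
        N'@host NVARCHAR(128)',
        @host = @hostname
END TRY
BEGIN CATCH
    -- Deliberately swallow the error: a logging failure must never block logons
END CATCH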

If you’re like me and need to get a grip on who’s logging into your server, a better solution is to use an audit to capture logins (post coming in the future on that), a trace, or a third-party tool. There are many ways this trigger could be re-written to prevent these issues, but IMO logon triggers are too risky for critical environments.

SUSER_SNAME() Returning the Wrong Login

If users are logging into your SQL Server with their Active Directory accounts, you may run into an issue if their login changes. This issue is usually associated with someone changing their account name to reflect a change in their legal name. If the user executes the function SUSER_SNAME(), it may return their old login.

There is some caching that takes place on the server, which holds onto the “key” associated with that account. When a user has their account name changed, Active Directory will not update this “key”, only the “name”. Any server that has cached the account may retain the old account name, so when SQL Server processes the SUSER_SNAME() call, the old account name is returned.

Rebooting the server has been shown to resolve this issue, but there may be a better way. If your users obtain access through an Active Directory group (which they should), simply creating a login for that specific user and then dropping it will resolve the issue.

Creating the login requires the server to validate the account “name” against Active Directory, during which the cached record is overwritten. Dropping the login returns the instance configuration to its original state, where the user obtains access via their group.
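In practice, the fix is just two statements (DOMAIN\RenamedUser is a placeholder for the affected account):

CREATE LOGIN [DOMAIN\RenamedUser] FROM WINDOWS
DROP LOGIN [DOMAIN\RenamedUser]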