Press "Enter" to skip to content

Author: Kevin Feasel

Attaching Databases Via Dockerfile

Andrew Pruski shares a better technique for attaching database files held outside of a Docker container:

Now this works a treat. It waits ten seconds for the SQL instance to come up within the container and then runs the sqlcmd script below.
The problem with this is, it’s a bit of a hack. The HEALTHCHECK command isn’t designed to run once, it’ll carry on running every 10 seconds once the container comes up…not great.
So, what’s the better way of doing it?

Andrew gives us a clear explanation of what’s going on and gives a shout out to Bob Ward’s SQL Server on Linux book.

Comments closed

Disabling SQL Agent Jobs For Maintenance Periods

Jon Shaulis shows us a way to disable SQL Agent jobs with T-SQL:

A user had a unique issue where their system would have dynamically changing job names and schedules, but they need to disable and re-enable them during maintenance. Obviously, this is a huge headache.
I made a recommendation that they should ultimately create a list of currently enabled jobs that had a schedule using a system query.

SELECT *
FROM MSDB.dbo.sysschedules ss
     INNER JOIN msdb.dbo.sysjobschedules jss
ON jss.schedule_id = ss.schedule_id
WHERE ss.enabled = 1;
The code above returns all schedules that are paired to a job that are enabled. The enabled = 1 flag and the inner join to the sysjobschedules table are what dictate those filters.

Read on for more details about what’s going on and some caveats.

Comments closed

Nested Loop Joins

Bert Wagner walks us through nested loop joins:

Nested loops joins work like this: SQL Server takes the first value from our first table (our “outer” table – by default SQL Server decides for us which table of the two this will be), and compares it to every value in our second “inner” table to see if they match. 
Once every inner value has been checked, SQL Server moves to the next value in the outer table and the process repeats until every value from our outer table has been compared to every value in our inner table.

This description is a worst case example of the performance of a nested loop join.

Read the whole thing.  Understanding physical join operators is a key to figuring out if your data retrieval is as fast as it should be. 

Comments closed

Agent Job Cmdlets In dbatools

Garry Bargsley has a series looking at different cmdlets in dbatools.  Today’s focus is Agent jobs:

Agent Jobs are the heartbeat to most SQL Server environments.  Jobs are how we move data, purge data, automate processes, backup databases, perform database maintenance and many other functions.  Managing one SQL Server and the Agent Jobs can be a daunting task, much less managing 10, 20, 100 or more.  Luckily dbatools has you covered!!!  There are a wide range of commands that help you with Agent Jobs.  There are commands to Copy jobs, Find Jobs, Get Jobs, Create New Jobs, Remove Jobs, Set Jobs Settings, Start Jobs and Stop Jobs.  That is a huge range of functionality, so lets get started on the fourth day goodness.

Click through for descriptions and examples.

Comments closed

Working With Images In Spark 2.4

Tomas Nykodym and Weichen Xu give us an update on working with images in the most recent version of Apache Spark:

An image data source addresses many of these problems by providing the standard representation you can code against and abstracts from the details of a particular image representation.
Apache Spark 2.3 provided the ImageSchema.readImages API (see Microsoft’s post Image Data Support in Apache Spark), which was originally developed in the MMLSpark library. In Apache Spark 2.4, it’s much easier to use because it is now a built-in data source. Using the image data source, you can load images from directories and get a DataFrame with a single image column.
This blog post describes what an image data source is and demonstrates its use in Deep Learning Pipelines on the Databricks Unified Analytics Platform.

If you’re interested in working with convolutional neural networks or otherwise need to analyze image data, check it out.

Comments closed

Comparing Streaming Engines

George Vetticaden compares Spark Streaming, Storm, and Kafka Streams:

Before the addition of Kafka Streams support, HDP and HDF supported two stream processing engines:  Spark Structured Streaming and Streaming Analytics Manager (SAM) with Storm. So naturally, this begets the following question:
Why add a third stream processing engine to the platform?
With the choice of using Spark structured streaming or SAM with Storm support, customers had the choice to pick the right stream processing engine based on their non- functional requirements and use cases. However, neither of these engines addressed the following types of requirements that we saw from our customers:

And this doesn’t even include Samza or Flink, two other popular streaming engines.

My biased answer is, forget Storm.  If you have a legacy implementation of it, that’s fine, but I wouldn’t recommend new streaming implementations based off of it.  After that, you can compare the two competitors (as well as Samza and Flink) to see which fits your environment better.  I don’t think either of these has many scenarios where you completely regret going with, say, Kafka Streams instead of Spark Streaming.  Each has its advantages, but they’re not so radically different.

Comments closed

Reporting On Unit Tests In R With covrpage

Maelle Salmon recaps Locke Data’s involvement with the covrpage package:

To read more about getting started with covrpage in your own package in a few lines of code only, we recommend checking out the “get started” vignette. It explains more how to setup the Travis deploy, mentions which functions power the covrpage report, and gives more motivation for using covrpage.
And to learn how the information provided by covrpage should be read, read the “How to read the covrpage report” vignette.

Check it out.

Comments closed

Tee-Object In Powershell

Adam Bertram has a paean in honor of Tee-Object:

What does Tee-Object  do anyway? Tee-Object basically kills two birds with one stone. This cmdlet redirects output from a PowerShell command and either saves it to a file or to a variable while also returning the output to the pipeline. It allows a scripter to save the output and send the output across the pipeline all in one shot.
Let’s say a scripter wants to send some text to a file and then perform kind of task on that same text afterward. Perhaps we’re pulling a list of Windows services from a computer and saving that output to a text file to create a snapshot of the before state before we attempt to start the services. This task can be done in two separate steps or with a single one with Tee-Object .

I’ve used it several times in Powershell as well as the tee command in Linux.  It’s great when you need to do several things with the same data but don’t want to break out of your pipeline.

Comments closed

A Graph Database Of US Capitals

James Livingston has a graph database to share:

While there’s countless relational databases out there for practice, there’s not much in the way of graph databases. It is my intent to share my graph databases with the world in hopes that it removes the friction associated with your learning.
US Capitals is a popular data set for working with graphs. Nodes identify a state capital. An edge connects a capital in one state with the capital of a neighboring state. Only the lower 48 states are present. While the data is readily available, I was unable to find TSQL scripts to create the graph using SQL Server 2017 graph database. I created those scripts and have made them readily available on GitHub.

I’m interested in the forthcoming post on Dijkstra’s algorithm; I think the last time I saw that was my undergrad days.

Comments closed

Tips For Migrating SSISDB

Kenneth Fisher shares some thoughts on SSISDB:

We’ve been doing a lot of upgrading recently and at one point had to move an instance from one 2016 server to another. In the process, we found out (the hard way) that it’s not that easy to move SSISDB (the SSIS Catalog that may or may not be named SSISDB). I mean it’s not hard, but it’s definitely not a basic backup/restore. The full BOL instructions on how to do this are here. That said, here are the elements that are involved.

Read on for the list as well as an order of operations.

Comments closed