Author: Kevin Feasel

The databases that I store in my container image are updated on a weekly basis and currently, the process to update our containers is manual. Once the updated image has been created, the existing running containers are dropped and new ones created from the updated image.

But what if we could automatically refresh our containers with the updated image? If we could do that then the only process that’s manual is updating the image. We would no longer have to worry about any containers running SQL instances with databases that are out of date.

Luckily, there’s a way to do this and it’s accessible via an image on the Docker Hub called Watchtower. What Watchtower does is monitor the Docker Hub and if there’s an update to an image it will automatically refresh all running containers that are on the same host.

Read on for a step-by-step solution.

Comments closed

SQLOS Task Origins

Published 2017-07-26 by Kevin Feasel

Ewald Cress explains how SQLOS tasks come into being:

The above system tasks wear their purpose on their sleeves, because the function pointer in the SOS_Task::Param is, well, to the point. The tasks that run user queries are more abstract, because the I/O completion port listener can’t be bothered with understanding much beyond the rudiments of reading network packets – it certainly can’t be mucking about with fluent TDS skills, SQL parsing and compilation, or permission checks.

So what it does is to enqueue a task that speaks TDS well, pointing it to a bunch of bytes, and sending it on its way. Here is an example of such a dispatch, which shows the “input” side of the WorkDispatcher:

Read the whole thing.

Comments closed

Cycling The SSMS Clipboard Ring

Published 2017-07-26 by Kevin Feasel

Bert Wagner shows how you can use the clipboard ring in SQL Server Management Studio:

Copying and pasting is a feature that’s available in nearly every text editor (“nearly” — anyone remember the days before iOS had a clipboard?).

However, SQL Server Management Studio goes above and beyond the regular copy and paste feature set — it has a clipboard ring.

What’s a clipboard ring you ask?

The clipboard ring let’s you cycle through the last 20 things you copied onto your clipboard when you go to paste in SSMS.

Read on to learn how to use the clipboard ring. When I remember that it exists, it’s quite useful.

Comments closed

Top Jobs Report

Published 2017-07-26 by Kevin Feasel

Kenneth Fisher only looks at the best SQL Agent jobs:

But sometimes what you want is an overall history. Maybe you want to see your longest running jobs? Or the most frequent jobs? Or jobs that are failing? There is a great pre-built report just for that.

Now, be warned. The report and the history view both pull from the system views and they only hold so much history. You need to make s

With that warning stated, this is a good report if you just want to get some idea of what’s happening on your server.

Comments closed

Prioritizing Rows In A Union

Published 2017-07-26 by Kevin Feasel

Daniel Hutmacher shows several ways to combine data from multiple sources, prioritized by source:

For the sake of simplicity, I’ll assume that our example data has a clustered index on the “primary key” that we want to use to determine which rows have already been included in the set – in our example, the primary key is (FirstName, LastName). The data I’m using is a fictional example, but here’s the jist:

#Employees has about 33 000 rows.

#Customers has about 44 000 rows.

#Passengers has about 500 000 rows.

The data is constructed in a way that these queries should return 530 000 rows, so we’re looking at some overlap but far from totally overlapping rows.

Example #492,806 that T-SQL is not a true 4th Generation Language, that how you write the query can greatly matter for performance.

Comments closed

Kafka Streams Basics

Published 2017-07-25 by Kevin Feasel

Anuj Saxena walks through Kafka Streams and provides a quick example:

The features provided by Kafka Streams:

Highly scalable, elastic, distributed, and fault-tolerant application.
Stateful and stateless processing.
Event-time processing with windowing, joins, and aggregations.
We can use the already-defined most common transformation operation using Kafka Streams DSL or the lower-level processor API, which allow us to define and connect custom processors.
Low barrier to entry, which means it does not take much configuration and setup to run a small scale trial of stream processing; the rest depends on your use case.
No separate cluster requirements for processing (integrated with Kafka).
Employs one-record-at-a-time processing to achieve millisecond processing latency, and supports event-time based windowing operations with the late arrival of records.
Supports Kafka Connect to connect to different applications and databases.

Read on for more details as well as a sample script to get started.

Comments closed

Random Forests In R

Published 2017-07-25 by Kevin Feasel

Anish Sing Walia explains the basics of random forests and provides sample code in R:

Random Forests are similar to a famous Ensemble technique called Bagging but have a different tweak in it. In Random Forests the idea is to decorrelate the several trees which are generated on the different bootstrapped samples from training Data.And then we simply reduce the Variance in the Trees by averaging them.
Averaging the Trees helps us to reduce the variance and also improve the Perfomance of Decision Trees on Test Set and eventually avoid Overfitting.

The idea is to build lots of Trees in such a way to make the Correlation between the Trees smaller.

Random forests frequently give a good answer to classification problems, enough so as to make them a nice starting point.

Comments closed

Using seplyr Instead Of dplyr

Published 2017-07-25 by Kevin Feasel

John Mount explains seplyr and why it can be better for certain use cases than dplyr:

seplyr is a dplyr adapter layer that prefers “slightly clunkier” standard interfaces (or referentially transparent interfaces), which are actually very powerful and can be used to some advantage.

The above description and comparisons can come off as needlessly broad and painfully abstract. Things are much clearer if we move away from theory and return to our practical example.

Click through for a great example, and also read John’s comment on the Pascal-style assignment operator he uses.

Comments closed

Data Lake Analysis With Excel And Power BI

Published 2017-07-25 by Kevin Feasel

Sachin C Sheth announces support for Azure Data Lake Store within Excel and Power BI:

Until now, if you had to analyze data stored in ADLS with Excel, you would have to copy it into a relational data store like Azure SQL Data Warehouse or download the data onto a machine, and then use Excel to analyze that data. This was rather cumbersome involving additional cost and time. With this new support, you can now access files stored in ADLS with Excel in-place, without having to copy them to other stores or locations. You can quickly get advanced insights into raw or prepared data. Models and queries you have created using Excel that ran against local data, can be run seamlessly against data stored in ADLS.

Security capabilities of ADLS allow administrators to control access to the data stored in ADLS in a discretionary manner. With this you can limit the access that Excel users have for the data in ADLS. In this manner, data in the ADLS-based data lake continues to be the single source of truth with no redundant copies and can be analyzed by analytics tools of your own choice .

Click through for a demo video.

Comments closed

Diving Into The Data Lake

Published 2017-07-25 by Kevin Feasel

Jesse Gorter explains the data lake metaphor:

A data lake is a concept that opposes the idea of a data mart. Where a data mart is a silo with structured and cleansed data, a data lake is a huge data collection that is unstructured and raw. You could also say that a data mart is a bottle of clean water whereas the data lake is the lake with (not so clean) water. 🙂

Now why would you want a data lake? Imagine you are generating huge logfiles, for example in airplanes. Machines that track air pressure, temperature etc. If something goes wrong, you definitely want to be alerted. That is event-driven: “if A and B happen, alert pilot, or do C” and there are tools for dealing with that kind of streaming data. But what if the plane landed safely? What do you do with all that data? You do not need it anymore right?

Well, some people would say: “Wrong”. You might need that data later for reasons you do not know today. Google, Microsoft and Facebook are all hoarding data. Also data they are not sure they might need someday. This data could later prove to be valuable for AI, machine learning or for something else.

Read the whole thing. The data lake concept is powerful, but it requires at least as much data governance as prior models. Just because you can dump a bunch of files without thinking about it doesn’t mean you’ll get back something useful later.

Comments closed

M	T	W	T	F	S	S
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30	31