Kafka As A Backbone

Ben Stopford explains how to use Kafka as a backbone for a microservices architecture:

Taking a log-structured approach has an interesting side effect. Both reads and writes are sequential operations. This makes them sympathetic to the underlying media, leveraging pre-fetch, the various layers of caching and naturally batching operations together. This makes them efficient. In fact, when you read messages from Kafka, the server doesn’t even import them into the JVM. Data is copied directly from the disk buffer to the network buffer. An opportunity afforded by the simplicity of both the contract and the underlying data structure.

So batched, sequential operations help with overall performance. They also make the system well suited to storing messages longer term. Most traditional message brokers are built using index structures, hash tables or B-trees, used to manage acknowledgements, filter message headers, and remove messages when they have been read. But the downside is that these indexes must be maintained. This comes at a cost. They must be kept in memory to get good performance, limiting retention significantly. But the log is O(1) when either reading or writing messages to a partition, so whether the data is on disk or cached in memory matters far less.

This is a higher-level look and helps explain why I like Kafka so much as a message broker.

R Services Internals

Niels Berglund has an excellent series on R Services internals.  Here’s the latest post:

This post is the ninth post about Microsoft SQL Server R Services, and the eight post that drills down into the internal of how it works.

So far in this series we have been looking at what happens in SQL Server as well as the launchpad service when we execute sp_execute_external_script, and we have still no real “clue” to where the R engine comes into play.

Well, hopefully that will change (at least a little bit) with this post, as we here will look at what happens when we leave the launchpad service.

This series is like candy to me.  It’s the best write-up I’ve seen so far about what’s really happening when you run SQL Server R Services.

Powershell Parameter VAlidation

Adam Bertram shows how to write safer code by validating parameters:

Simply adding [Parameter()] to the parameter block makes this function “advanced.” Once we have this in place, we can then add in some parameter validation code. In our case, we need to ensure that the file provided with FilePath actually exists before proceeding. To do this, we’ll use the ValidateScript attribute which allows us to run any code we want. As long as it returns $true, it will allow the function to run.

function Set-File {
param(
[Parameter()]
[ValidateScript({ Test-Path -Path $_ -PathType Leaf})]
[string]$FilePath
)
}

Notice that I used the pipeline variable $_ in this example. This represents whatever value the parameter is. If the user provides a file path that does not exist, they will be notified.

Read the whole thing.

Multiple Data Sets In External Scripts

Tomaz Kastrun shows a workaround to the “one data set” limit in sp_execute_external_script:

Some of the  arguments of the procedure sp_execute_external_script are enumerated. This is valid for the inputting dataset and as the name of argument @input_data_1 suggests, one can easily (and this is valid doubt) think, there can also be @input_data_2 argument, and so on. Unfortunately, this is not true.  External procedure can hold only one T-SQL dataset, inserted through this parameter.

There are many reasons for that, one would be the cost of sending several datasets to external process and back, so inadvertently, this forces user to rethink and pre-prepare the dataset (meaning, do all the data munging beforehand), prior to sending it into external procedure.

But there are workarounds on how to pass additional query/queries to sp_execute_external_script. I am not advocating this, and I strongly disagree with such usage, but here it is.

It does feel like a hinky solution, but sometimes you just need to get two data sets in.

Using Watchtower To Refresh SQL Server Containers

Andrew Pruski explains how to use Watchtower to keep your SQL Server Docker containers up to date:

The databases that I store in my container image are updated on a weekly basis and currently, the process to update our containers is manual. Once the updated image has been created, the existing running containers are dropped and new ones created from the updated image.

But what if we could automatically refresh our containers with the updated image? If we could do that then the only process that’s manual is updating the image. We would no longer have to worry about any containers running SQL instances with databases that are out of date.

Luckily, there’s a way to do this and it’s accessible via an image on the Docker Hub called Watchtower. What Watchtower does is monitor the Docker Hub and if there’s an update to an image it will automatically refresh all running containers that are on the same host.

Read on for a step-by-step solution.

SQLOS Task Origins

Ewald Cress explains how SQLOS tasks come into being:

The above system tasks wear their purpose on their sleeves, because the function pointer in the SOS_Task::Param is, well, to the point. The tasks that run user queries are more abstract, because the I/O completion port listener can’t be bothered with understanding much beyond the rudiments of reading network packets – it certainly can’t be mucking about with fluent TDS skills, SQL parsing and compilation, or permission checks.

So what it does is to enqueue a task that speaks TDS well, pointing it to a bunch of bytes, and sending it on its way. Here is an example of such a dispatch, which shows the “input” side of the WorkDispatcher:

Read the whole thing.

Cycling The SSMS Clipboard Ring

Bert Wagner shows how you can use the clipboard ring in SQL Server Management Studio:

Copying and pasting is a feature that’s available in nearly every text editor (“nearly” — anyone remember the days before iOS had a clipboard?).

However, SQL Server Management Studio goes above and beyond the regular copy and paste feature set — it has a clipboard ring.

What’s a clipboard ring you ask?

The clipboard ring let’s you cycle through the last 20 things you copied onto your clipboard when you go to paste in SSMS.

Read on to learn how to use the clipboard ring.  When I remember that it exists, it’s quite useful.

Top Jobs Report

Kenneth Fisher only looks at the best SQL Agent jobs:

But sometimes what you want is an overall history. Maybe you want to see your longest running jobs? Or the most frequent jobs? Or jobs that are failing? There is a great pre-built report just for that.

Now, be warned. The report and the history view both pull from the system views and they only hold so much history. You need to make s

With that warning stated, this is a good report if you just want to get some idea of what’s happening on your server.

Prioritizing Rows In A Union

Daniel Hutmacher shows several ways to combine data from multiple sources, prioritized by source:

For the sake of simplicity, I’ll assume that our example data has a clustered index on the “primary key” that we want to use to determine which rows have already been included in the set – in our example, the primary key is (FirstName, LastName). The data I’m using is a fictional example, but here’s the jist:

  • #Employees has about 33 000 rows.
  • #Customers has about 44 000 rows.
  • #Passengers has about 500 000 rows.

The data is constructed in a way that these queries should return 530 000 rows, so we’re looking at some overlap but far from totally overlapping rows.

Example #492,806 that T-SQL is not a true 4th Generation Language, that how you write the query can greatly matter for performance.

Categories

July 2017
MTWTFSS
« Jun Aug »
 12
3456789
10111213141516
17181920212223
24252627282930
31