Press "Enter" to skip to content

Curated SQL Posts

Filtering On Kafka Streams

Robin Moffatt has a new series showing how to use Kafka Streams for dealing with syslog data:

syslog is one of those ubiquitous standards on which much of modern computing runs. Built into operating systems such as Linux, it’s also commonplace in networking and IoT devices like IP cameras. It provides a way for streaming log messages, along with metadata such as the source host, severity of the message, and so on. Sometimes the target is simply a local logfile, but more often it’s a centralised syslog server which in turn may log or process the messages further.

As a high-performance, distributed streaming platform, Apache Kafka® is a great tool for centralised ingestion of syslog data. Since Apache Kafka also persists data and supports native stream processing we don’t need to land it elsewhere before we can utilise the data. You can stream syslog data into Kafka in a variety of ways, including through Kafka Connect for which there is a dedicated syslog plugin.

In this post, we’re going to see how KSQL can be used to process syslog messages as they arrive in realtime.

Check it out.

Comments closed

Configuring Azure SQL Analytics

Esat Erkec has a guide showing how to configure and use Azure SQL Analytics on Azure SQL Database:

The most important and challenging responsibility of a database administrator is monitoring performance metrics. Because monitoring performance and troubleshooting performance issues are considered to be difficult. For this reason, we need diagnostic and monitoring tools to measure performance counters and metrics. For Azure SQL there is a tool which is named SQL Analytics. With this tool, we can measure and monitor Azure SQL databases and elastic pools. At the same time, we can create alerts for notifications. SQL Analytics offers performance metrics in graphical form. In this article, we will learn how to enable Azure SQL Analytics.

This is a long and screenshot-filled post, which is helpful if you’re getting started.

Comments closed

Powershell Splats And Quotes

Shane O’Neill shows when you need to use single quotes when splatting in Powershell:

If you’ve checked the examples in that post – and I recomment that you do – then you’ll see that it takes the syntax of Parameter = 'Value'.

Notice the Parameter portion is not in quotes? It also works perfectly well if you have the Parameter name in quotes e.g. 'Parameter' = 'Value' (double quotes works too).

Why would you use one instead of the other?

There is a special circle in the Inferno for people who put spaces in their parameter names.

Comments closed

Managing Central Management Server

Warren Estes explains how he keeps track of his servers using a Central Management Server:

We use a CMS server for each domain and I can’t imagine life without it. Kind of like when I discovered Amazon prime, or bought my first memory foam mattress.

The real magic of a CMS comes from being able to push jobs, or evaluate policies, on any server (targets) you want.

You can also execute T-SQL against all, or a subset of servers with either registered servers or CMS.

There are some caveats to look out for like collation differences and version specific DMVs when running queries across instances. Also security needs to be addressed. However, that is outside the scope of this post. You can find that information in the links in the first section.

I liked CMS when I had to deal with a dozen instances.  With hundreds of instances, I wouldn’t want to administer anything without one.

Comments closed

Powershell: CSVs And NoTypeInformation

Dan Blank lays down some information about saving headerless CSV files in Powershell:

Anyway, a question was posted recently using the #pshelp hashtag on Twitter.  How do you replace the header line of multiple csv files at once?

I saw this and maybe cockily thought to myself, “that’s got to be an easy 2 liner, bet I can boost my ego and quickly write out a solution”.

Spoilers:  it turned out to be slightly more complex.

Comments closed

Early Stats From SQLConstantCare Customers

Brent Ozar has some interesting and mostly disheartening statistics:

One of my favorite things about my job is being able to take the pulse of how real companies are managing their databases. I don’t wanna be locked in an ivory tower, preaching down to you, dear reader, about how you should be doing five million things a day – when in reality, you’re struggling to get an hour of real work done because you’ve got so many meetings.

But sometimes I wonder – am I out of touch? When I was a DBA, I remember struggling with backups and corruption checking – has that gotten easier? Have today’s DBAs started using more automation and tools to protect their estate? Is Transparent Data Encryption catching on? Did DBAs start using Extended Events for monitoring when I wasn’t looking?

And it’s important because I wanna build the right training material and scripts for our customers. I see a problem trending, I want to be able to give people the right information to fix the problem, fast.

Standard disclaimers about potential bias in samples apply, but it’s an interesting slice of the population.

Comments closed

Bucketing Tables By Size

Bill Fellows has an interesting approach to bucketing tables into groups of similar size:

You need to do something to all of the tables in SQL Server. That something can be anything: reindex/reorg, export the data, perform some other maintenance—it really doesn’t matter. What does matter is that you’d like to get it done sooner rather than later. If time is no consideration, then you’d likely just do one table at a time until you’ve done them all. Sometimes, a maximum degree of parallelization of one is less than ideal. You’re paying for more than one processor core, you might as well use it. The devil in splitting a workload out can be ensuring the tasks are well balanced. When I’m staging data in SSIS, I often use a row count as an approximation for a time cost. It’s not perfect – a million row table 430 columns wide might actually take longer than the 250 million row key-value table.

Click through for the script.  For the R version, this Stack Overflow post shows how to do it with cumulative sums and the cut function.

Comments closed

Visualizing Logistic Regression In Action

Sebastian Sauer shows using ggplot2 visuals what happens when there are interaction effects in a logistic regression:

Of course, probabilities greater 1 do not make sense. That’s the reason why we prefer a “bended” graph, such as the s-type ogive in logistic regression. Let’s plot that instead.

First, we need to get the survival probabilities:

d %>% 
  mutate(pred_prob = predict(glm1, type = "response")) -> d

Notice that type = "response gives you the probabilities of survival (ie., of the modeled event).

Read the whole thing.

Comments closed

Line Drawing And The Traveling Salesman Problem

Antonio Sanchez Chinchon builds a shortest-path portrait generator:

In this experiment I apply an heuristic algorithm to solve the TSP to draw a portrait. The idea is pretty simple:

  • Load a photo

  • Convert it to black and white

  • Choose a sample of black points

  • Solve the TSP to calculate a route among the points

  • Plot the route

Click through for the code.  This is an interesting application of the traveling salesman problem.

Comments closed

DTUs Or vCores For Azure SQL DB

Denny Cherry looks at a new Azure SQL Database announcement:

Today Microsoft has announced that there is a new way to buy Azure SQL DB. If DTUs aren’t making sense to you, you’ll be happy to know that you can now simply select how many vCores you want for your SQL DB workload.  Now this will still require that you have an understanding on your workload to use this new vCore based way to buy Azure SQL DB, but Cores are a concept that is easy for people to talk about and wrap their heads around. Now this new model is only in preview at the moment, but I’m guessing that it’ll be around for a while in preview, then it’ll go GA as this new model makes sense.

I definitely prefer this model, as IT departments already understand the idea, whereas DTUs were nebulous at best.

Comments closed