Press "Enter" to skip to content

Day: May 14, 2020

Bot-Building with ksqlDB

Robin Moffatt has an interesting project for us:

But what if you didn’t need any datastore other than Kafka itself? What if you could ingest, filter, enrich, aggregate, and query data with just Kafka? With ksqlDB we can do just this, and I want to show you exactly how.

We’re going to build a simple system that captures Wi-Fi packets, processes them, and serves up on-demand information about the devices connecting to Wi-Fi. The “secret sauce” here is ksqlDB’s ability to build stateful aggregates that can be directly accessed using pull queries. This is going to power a very simple bot for the messaging platform Telegram, which takes a unique device name as input and returns statistics about its Wi-Fi probe activities to the user:

Click through for the tutorial.

Comments closed

Spark UDFs and Error Handling

Bipin Patwardhan takes us through an error-handling scenario when writing a Spark User-Defined Function:

A couple of weeks ago, at my work place, I wrote a metadata-driven data validation framework for Spark. After the initial euphoria of having created the framework in Scala/Spark and Python/Spark, I started reviewing the framework. During the review, I noted that the User Defined Functions (UDF) I had written were prone to throw an error in certain situations.

I then explored various options to make the UDFs fail-safe.

These are like any other code: you want it to be as robust to failure as you can get it (or at least robust enough at the margin).

Comments closed

Taking the Pulse of Your Azure VM

Mark Broadbent has put together a quick Powershell script to figure out what’s going on with an Azure VM:

I was recently given the task of identifying the state of an Azure VM so that an automation script using the az vm run-command invoke would not fail if the VM was down or under a reboot.

I initially thought the task would be really easy and a simple query of the VM state using Get-AzVM would provide us with a running state property of the VM, but as it happens the state is a little abstracted.

Click through to see how Mark solved the challenge.

Comments closed

Finding the SQL Server Service Account with T-SQL

Andy Mallon gives us a way in SQL to find the service account used for SQL Server:

SQL Server 2005 introduced Dynamic Management Views(DMVs). Technically, there are both views & functions–DMVs & DMFs, but colloquially they’re often simply referred to as just “DMVs”. DMVs simply return information about the state of the server or database and can be used for monitoring and troubleshooting of server/database health & performance.

New DMVs are added all the time. Today, let’s take a look at one of those useful DMVs.

This won’t cover all of the services—for example, it does not include the PolyBase engine or data movement service accounts—but it will give you some of the most important ones.

Comments closed

Notes on Derived Tables

Itzik Ben-Gan continues a series on table expressions:

The term derived table is used in SQL and T-SQL with more than one meaning. So first I want to make it clear which one I’m referring to in this article. I’m referring to a specific language construct that you define typically, but not only, in the FROM clause of an outer query. I’ll provide the syntax for this construct shortly.

The more general use of the term derived table in SQL is the counterpart to a derived relation from relational theory. A derived relation is a result relation that is derived from one or more input base relations, by applying relational operators from relational algebra like projection, intersection and others to those base relations. Similarly, in the general sense, a derived table in SQL is a result table that is derived from one or more base tables, by evaluating expressions against those input base tables.

There’s a lot to digest in this post, so check it out.

Comments closed

Who and What is Using Your tempdb Space?

Guy Glantser helps us troubleshoot who is using tempdb space and for what purpose:

In the past two weeks I encountered 3 different scenarios in which tempdb has grown very large, so I decided to write about troubleshooting such scenarios.

Before I describe the methods for troubleshooting tempdb space usage, let’s begin with an overview of the types of objects that consume space in tempdb.

The answer, of course, was Professor Plum in the Aviary with the eager spool.

Comments closed

Never Forget a WHERE Clause Again

If you frequently forget to include the WHERE clause when you run queries, Rob Farley has an answer for you:

You meant to only update a few rows, but you weren’t paying attention to what you’d selected before hitting the Execute button. The WHERE clause got missed from what you executed, and you’ve updated every row. Oops.

Now, I totally hear you that you could’ve put this in a transaction, checked your results and then only done a COMMIT when you were sure. You could even have put that COMMIT inside an IF block that checked @@ROWCOUNT… yes, yes, yes. You could have used an SSMS add-in to let you execute the whole query that you’re currently on. You could’ve used a different tool entirely, such as Azure Data Studio. There are plenty of ways to avoid this. But I’m going to show you something that I find works for me, especially when I’m giving the script to someone else to run.

It’s more effort and makes the code harder to read, but if you have a real concern about somebody (and that somebody can include you) goofing this up, it does the job.

Comments closed

Building Queues in the Database

Erik Darling has created a series on queue tables in the database. Part one builds out a table:

A little explanation of the table: this is good for a reusable, ordered queue, that’ll give us some feedback on how long things took. It could also be used is there were a pending element, but we’d probably wanna change the indexing so that we could find either the last start time, or the last end time efficiently.

The thing that probably needs the most explanation here is the indexing and constraints. Because I know you, and you’re staring at the way I have my primary key, and you’re getting an itch and a twitch. That’s okay, it’s you’ve been conditioned for years to put the most selective column first. Unfortunately, that won’t work here.

Part two takes us through querying the queue:

For the table, our main priorities were indexing to make sure we can find work easily, and not allowing duplicate items.

For our worker proc, the main goals are going to be

– Looping until we run out of work
– Finding and reserving work with minimal locking
– Making that process atomic

Thankfully, this is a lot easier than it sounds. The table design does a lot of the work for us.

Erik’s design is not one I typically reach for, though my constraints are a bit different from his—typically, I’m using queue tables to run on periodic schedules and grab batches of records which finish processing before the next timed batch begins, and processes are idempotent, so if a queued item re-runs on occasion, it’s okay. But this is a really good technique if you need a more robust solution.

Comments closed