Press "Enter" to skip to content

Author: Kevin Feasel

Watermarking in Spark Structured Streaming

Max Fisher takes us through an important feature for Spark streaming:

When building real-time pipelines, one of the realities that teams have to work with is that distributed data ingestion is inherently unordered. Additionally, in the context of stateful streaming operations, teams need to be able to properly track event time progress in the stream of data they are ingesting for the proper calculation of time-window aggregations and other stateful operations. We can solve for all of this using Structured Streaming.

For example, let’s say we are a team working on building a pipeline to help our company do proactive maintenance on our mining machines that we lease to our customers. These machines always need to be running in top condition so we monitor them in real-time. We will need to perform stateful aggregations on the streaming data to understand and identify problems in the machines.

This is where we need to leverage Structured Streaming and Watermarking to produce the necessary stateful aggregations that will help inform decisions around predictive maintenance and more for these machines.

Read on to see how watermarking works in various scenarios, including when you join together streams.

Comments closed

Automating Bring-Your-Own-Key Rotation for TDE in Azure SQL DB

Shoham Dasgupta announces a new preview program:

Transparent data encryption (TDE) in Azure SQL Database and Managed Instance helps protect against the threat of malicious offline activity by encrypting data at rest.  TDE with Customer-Managed Key (CMK) enables Bring Your Own Key (BYOK) scenario for data protection at rest, by allowing a key stored in a customer-owned and customer-managed Azure Key Vault to be used as the TDE Protector on the server or managed instance.

When using TDE with Customer-Managed Key, one of the important responsibilities that customers need to perform on a regular basis is key rotation, that is, rotating the TDE Protector on the server by switching to a new key (or new version of the earlier key) from Azure Key Vault. Key rotation is a critical activity for an organization that is required to meet security and compliance objectives.

Automated key rotation for Azure SQL Database and Managed Instance is now available in preview, simplifying key management responsibilities for customers.

Click through to see how this works.

Comments closed

Data Type Conversions in SQL Server

Kenneth Fisher talks types:

Did you know that 1/2=0? If you didn’t you could be in for some interesting surprises.

1/2 returns 0 because 1 and 2 are both integers. When you divide two integers you get back an integer. Now, we, as human beings, know that what you probably wanted was to get back 0.5. Unfortunately, computers really aren’t all that smart and we have to be very explicit with them. 

Read on for more details, as well as a chart of which conversions may be implicit and which must be explicit.

Comments closed

Sources of Query Plan Duplication

Eitan Blumin has a script for us:

Running the above script with the parameter @RCA set to 0 will output a list of the top query hashes (sorted based on their size in bytes so that you’d see the most impactful queries first) that have multiple different query plan hashes cached for them.

Running the script with the parameter @RCA set to 1 will output the same list, with additional columns that will help you do deeper root cause analysis.

Unlike the similar script provided by Brent, my script counts the number of query plan hashes rather than the query plan handles per each query hash. I found that this results in far fewer “false positives”. But you can still set the parameter @CountByPlanHandleInsteadOfPlanHash to 1 if you want it to count based on query plan handles instead.

Click through for the script, details on how it works, and various classes of reason why you might get to this outcome.

Comments closed

Wrapping Get-Credential in Powershell

Stuart Ainsworth writes a script:

Just a quick blog; am working on a script that requires credentials to run against a REST API, and a developer wanted to convert that script to use command-line parameters. I built this script (and quick test) to show that the command-line parameters create the same object as the Get-Credential object.

The one thing that has me on edge a little is that the password comes in as plaintext. I do understand that sometimes this is the best route, though I’d put the pressure on devs to make sure there isn’t a route in which plaintext passwords get encrypted before transmission, even in a local environment like this.

Comments closed

A Critique of Parameter Sensitive Plan Optimization

Brent Ozar is not amused:

Parameter sniffing is a notorious problem for Microsoft SQL Server because it tries to reuse execution plans, which doesn’t work out well for widely varying parameters. Here’s a primer for the basics about how it happens.

SQL Server 2022 introduces a new feature called Parameter Sensitive Plan optimization. I’m not really sure why Microsoft capitalized the first three letters sometimes and not others, or why they only abbreviate it as “PSP optimization” – as long as we’re abbreviating, why not call it PSPO? Optimization is the word in there with the most syllables – why the hell would we leave that one as the one that isn’t abbreviated?

Because I make things go faster for a living, we’re gonna call it PSPO. You’ll see how I pronounce it later, and why.

Erik Darling has also spent a lot of time talking about this. And at this point, I don’t know that how many (if any) of Brent’s critiques get fixed before RTM.

Comments closed

Max Server Memory and AWS

Andrea Allred runs into a weird issue on AWS RDS:

We tracked down every job that was touching the server and started to tune it, thinking that was just pushing us over the edge. We worked with AWS and finally one of our engineers noticed that our MAX Server Memory Setting was back at the SQL default. You know that insane default? Yes, it was there. But we had properly set that…three months ago when this stack was put in place.

Click through for the entire story, including symptoms and resolution.

Comments closed

DAX Studio 3 Released

Marco Russo has an announcement:

DAX Studio 3 was released today.

This new, major version offers a new user interface and user experience, providing a modern look and feel without losing any of the features available in previous versions. On the contrary, in DAX Studio 3, many features that were harder to reach in the past are now more visible and easier to use.

The cost of implementing a complete overhaul for a product’s user interface is typically high, but at the same time, this overhaul does not provide new features compared to previous versions. So why bother about the user interface while the product is free and open-source?

Read on to see what those changes look like.

Comments closed

Trying out Shiny Python

Jamie Owen kicks the tires on Py-shiny:

We would posit (see what we did there) that R-{shiny} has been a boon for data science practitioners using the R language over the last decade. We know that in our Python work, we have certainly been clamouring for something of the same ilk. And whilst there are other frameworks that we also like, streamlit and dash to name a couple, neither of them has filled us with the same excitement and confidence that shiny did in R to build both simple and complex bespoke web applications. With RStudio Posit conf in action the big news from July 27th was the alpha release of Py-{shiny} which was a source of great interest for us, so we couldn’t resist installing and starting to build.

If you are familiar with R-shiny already, then much of the py-shiny package will feel familiar to you (albeit with a couple of things having been renamed). However we will approach the rest of this post assuming that a reader does not have that prior experience and take you through building a simple shiny application to display plots on subsets of a dataset.

I’m curious how much take-up there will be for the library, given that there are several good competitors on Python.

Comments closed

Installing SQL Server on CentOS

Nisarg Upadhyay has a walkthrough for us:

In this article, we will understand how we can manage a SQL Database in CentOS 8.0. This article is the first article on the topic Manage SQL Server on CentOS.

1. Step-by-step Installation of SQL Server and client tools in CentOS

2. Create and connect to the SQL Database

I have created a virtual machine on a workstation and installed CentOS 8.0. First, let us install SQL Server on CentOS.

One quick note is that CentOS is no longer officially supported as a platform for SQL Server on Linux. It’ll still work fine, though if you use it in production and run into a problem, there’s a risk that you wouldn’t be able to get timely support. Still, if you’re just trying it out on a personal device and you like the Red Hat-like experience versus Ubuntu-based distributions, getting Microsoft support isn’t all that relevant.

Comments closed