Press "Enter" to skip to content

Curated SQL Posts

Deploying a Storage Solution to a Kubernetes Cluster

Chris Adkin continues a series:

Before we dive into deploying a storage solution to our Kubernetes cluster, we need to understand the basics of storage in the world of Kubernetes, which can appear to be both exotic and mysterious to the uninitiated. To dispel some confusion around Kubernetes and storage, the storage IO path is exactly the same as that with common garden vanilla variety Unix or Linux. The Kubernetes storage ecosystem introduces two extra things we need to concern ourselves with above and beyond conventional Unix/Linux storage, firstly there are some extra layers of abstraction between the physical storage and filesystems that pods use, what I like to refer to as . . .

Read the whole thing. And that was a particularly mean cut-off point on my part, if I do say so.

Comments closed

Error Messages on SSDT Database Project Deployments

Chris Johnson has some advice if you’re hitting an error when deploying a SQL Server Data Tools database project:

Today I’d like to talk about three error messages you might see when deploying an SSDT database project, either through Visual Studio or via a dacpac and script. I’m going to focus here on what you see from inside of Visual Studio, but you will see similar errors returned when you deploy using a script and the reasons behind them will be the same.

Read on for Chris’s findings. These errors definitely aren’t a complete survey of possible messages, but they do hit some of the less obvious cases.

Comments closed

Finding Query Performance Issues with Query Store

Andrea Allred has a primer on Query Store:

So if I can’t modify or add indexes and I can’t change code, how do I get my query times to drop? Query Store to the rescue.

I love to tune queries. I feel so satisfied to see the times dropping on my server as I tune things. Recently, I have been tracking my Batch Requests per Second and my Instance Waits to see if I am making improvements when I tune. It has been awesome!

What I am going to show you today is how I dig into my query store to find those misbehaving queries and make their performance better.

Click through for some high-level tips on how to use Query Store.

Comments closed

Limitations with Control Flows in Azure Data Factory

Meagan Longoria has a list:

If you’ve been using Azure Data Factory for a while, you might have hit some limitations that don’t exist in tools like SSIS or Databricks. Knowing these limitations up front can help you design better pipelines, so I’m listing a few here of which you’ll want to be aware.

1. You cannot nest For Each activities.
Within a pipeline, you cannot place a For Each activity inside of another For Each activity. If you need to iterate through two datasets you have two main options. You can combine the two datasets before you iterate over them. Or you can use a parent/child pipeline design where you move the inner For Each activity into the child pipeline. Fun fact: currently the Data Factory UI won’t stop you from nesting For Each activities. You won’t find out until you try to execute the pipeline.

Click through for several other limitations and workarounds.

Comments closed

An Introduction to Azure Purview Studio

Rahul Mehta takes a look at Azure Purview’s current user interface:

Metrics in Azure Purview are integrated and reported using Azure Monitor. Click on the Metrics section and it would have a link to Azure Monitor, which would open in a new tab as shown below. From this interface, we can filter and split different data metrics, plot the same on different types of graphs, as well as build custom dashboards to visualize the metrics to evaluate performance. The available metrics depend on the features enabled in the Azure Purview account. Metrics may have different levels of granularity and this may lead to a significant amount of data. The data generated can be aggregated in these dashboards using the available aggregation functions.

Click through for a quick walkthrough.

Comments closed

Last Observation Carried Forward in R

Nathan Eastwood shows how to perform Last Observation Carried Forward in R:

Real life data is often riddled with missing values – or NAs – where no data value are stored for the variable in observation. Missing data such as this can have a significant effect on the conclusions which can be drawn from the data. For example individuals dropping out of a study or subjects not properly reporting responses. A common solution to this problem is to fill those NA values with the most recent non-NA value prior to it – this is called the Last Observation Carried Forward (LOCF) method.

This blog post will look at how to implement this particular solution using a combination of {dplyr}, {dbplyr} and {sparklyr}.

Click through for the solution.

Comments closed

Generating Random Numbers in R

Holger von Jouanne-Diedrich brings the noise:

In data science, we try to find, sometimes well-hidden, patterns (= signal) in often seemingly random data (= noise). Pseudo-Random Number Generators (PRNG) try to do the opposite: hiding a deterministic data generating process (= signal) by making it look like randomness (= noise). If you want to understand some basics behind the scenes of this fascinating topic, read on!

Click through for an explanation of the process.

Comments closed

Managing tempdb Growth

Monica Rathbun shares part two in a series:

As you learned in part one of this Mastering TempDB series, TempDB is a global resource used for many operations within SQL Server. You create and allocate temporary user objects such as temporary tables and table variables, rebuild indexes with SORT_IN_TEMPDB=ON, use it for version stores (RCSI), internal objects (worktables, spools, group by, order by) and even DBCC CHECKDB just to name a few. All these operations require space to be allocated in the TempDB database. At times, these operations can result in TempDB growing rapidly, which, in turn, can fill up the file system and cause failures. In this article, you will learn how to fix an overgrown TempDB file that has resulted in it running out of space.

It beats my answer, which is to rip the server out of the rack and chuck it in the ocean.

Comments closed

Drift Monitoring with Azure Machine Learning

I take a look at dataset drift monitoring in Azure Machine Learning:

One of the things I like to say about machine learning model is, “shift happens.” By that, I mean that models lose effectiveness over time due to changes in underlying circumstances. Relationships between variables that used to hold no longer do, and so our model quality degrades. This means that we sometimes need to retrain models.

But there’s a cost to retraining models—that work can be computationally expensive and time-consuming. This concern is particularly salient if you’re in a cloud, as you pay directly for everything there. This means that we don’t want to retrain models unless we need to. But when do we know if we should retrain the model? We can watch for model degradation, but there’s another method: drift detection in your datasets.

Read on for a demonstration of how the product works and a couple of things to keep in mind.

Comments closed