Press "Enter" to skip to content

Curated SQL Posts

Global Parameters in SSIS Framework

Andy Leonard has an update for us:

I’m happy to announce the latest version of our SSIS Framework includes global parameters! I can hear some of you thinking, …

“What Are Global Parameters, Andy?”

I’m so glad you asked! SSIS ships with package-scoped and project-scoped parameters. Project-scoped parameters may be used in any SSIS package in the project; package-scoped parameters are only available within the context of a single SSIS package. This functionality reduces repetition in SSIS package development and execution configuration.

Global parameters allow our SSIS Framework customers to set parameters and values that apply to the entire SSIS Catalog.

Now that you know what they are, Andy has an example of them in action. Global parameters aren’t part of the community edition, but they do look interesting.
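
Though global parameters themselves are Framework-only, the built-in scoping is visible in the SSIS Catalog. As a hedged sketch (the folder, project, and parameter names here are made up), this is how you would set a project-scoped parameter with the documented catalog.set_object_parameter_value procedure, so that every package in the project picks up the value:

    -- Set a project-scoped parameter; the names are hypothetical,
    -- but the procedure ships with SSISDB.
    EXEC SSISDB.catalog.set_object_parameter_value
        @object_type     = 20,            -- 20 = project scope, 30 = package scope
        @folder_name     = N'ETL',
        @project_name    = N'LoadWarehouse',
        @parameter_name  = N'ServerName',
        @parameter_value = N'SQLPROD01';

A Framework global parameter pushes the same idea one level higher, from a single project to the entire Catalog.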


Defining Simple Regression

Vincent Granville has a new algorithm for us:

The model-free, data-driven technique discussed here is so basic that it can easily be implemented in Excel, and we actually provide an Excel implementation. It is surprising that this technique does not pre-date standard linear regression, and is rarely if ever used by statisticians and data scientists. It is related to kriging and nearest neighbor interpolation, and was apparently first mentioned in 1965 by Harvard scientists working on GIS (geographic information systems). It was referred to back then as Shepard’s method or inverse distance weighting, and used for multivariate interpolation on non-regular grids (see here and here). We call this technique simple regression.

Read on to learn more about simple regression, including how to implement it and how it performs.
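
To make the idea concrete, here is a minimal T-SQL sketch of one-dimensional inverse distance weighting. The table and values are invented, and this is generic Shepard's method rather than Granville's exact implementation:

    -- Predict y at @x0 as a weighted average of observed points,
    -- weighting each point by 1 / distance^@p (Shepard's method).
    DECLARE @x0 float = 2.5;
    DECLARE @p  float = 2.0;  -- power parameter; larger values weight nearer points more

    DECLARE @Points table (x float, y float);
    INSERT INTO @Points VALUES (1, 3.1), (2, 4.9), (3, 7.2), (4, 8.8);

    SELECT SUM(y / POWER(ABS(x - @x0), @p))
         / SUM(1.0 / POWER(ABS(x - @x0), @p)) AS y_hat
    FROM @Points
    WHERE x <> @x0;  -- an exact match would divide by zero; return its y directly instead

No coefficients are estimated, which is what makes the technique model-free: the observations themselves are the model.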


Automating Database Deployments: Why Not?

Grant Fritchey asks a question:

Building out processes and mechanisms for automated code deployments and testing can be quite a lot of work and isn’t easy. Now, try the same thing with data, and the challenges just shot through the roof. Anything from the simple fact that you must maintain the persistence of the data to data size to up time, and you have real problems in front of you.

However, adopting database deployment automation and testing has enormous benefits. Faster, safer production deployment enhances the protection built around your production systems. Whether we want to use the loaded term of DevOps or not, the benefits of this style of development and deployment are easily documented and measured.

So, why are so few people doing it?

Grant gives some of the outline and lays out one response. I am seeing a lot more automation over time, but one underappreciated facet is a lack of human trust in automated processes. I think a good percentage of DBAs don’t trust that the automated process will get things correct, especially when dealing with complex chains of dependencies. An automated process may be less likely to make a mistake in a step, but it will also be unable to reason through an ambiguity and could perform an undesirable action in the event of unexpected circumstances. That’s a pretty big risk for DBAs who are concerned about their data. I can see a few other reasons as well, but this is one which I don’t hear often enough in these discussions.

Also, Grant asks people to fill out the State of Database DevOps survey, especially those people who are not automating database deployments.


Creating Custom T-SQL Snippets for SSMS

Dave Mason follows up on yesterday’s post:

I have a number of scripts and queries I’ve written and curated over the years. They’re not organized that well, scattered across different folders. Some are named poorly or grouped in a questionable manner. There are a handful that I tend to use the most. And yet even with that small number, I sometimes have difficulty quickly finding a particular script (if I can find it at all), or I spend too many mouse clicks getting to it. It dawned on me recently to make use of code snippets.

Code snippets may have been intended primarily to aid code writing, but they can assist with administrative tasks too. 

These can provide a considerable benefit for data platform specialists.
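
For reference, an SSMS snippet is an XML file using the Visual Studio snippet schema. A minimal sketch, with a placeholder title, author, and query body:

    <?xml version="1.0" encoding="utf-8"?>
    <CodeSnippets xmlns="http://schemas.microsoft.com/VisualStudio/2005/CodeSnippet">
      <CodeSnippet Format="1.0.0">
        <Header>
          <Title>Active Requests</Title>
          <Description>Currently executing requests</Description>
          <Author>Your Name</Author>
          <SnippetTypes>
            <SnippetType>Expansion</SnippetType>
          </SnippetTypes>
        </Header>
        <Snippet>
          <Code Language="SQL">
            <![CDATA[SELECT session_id, status, command, wait_type FROM sys.dm_exec_requests WHERE session_id > 50;]]>
          </Code>
        </Snippet>
      </CodeSnippet>
    </CodeSnippets>

Save it with a .snippet extension, import it through Tools > Code Snippets Manager, and insert it in a query window with Ctrl+K, Ctrl+X.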


Making Life Harder for Filter Operators

Erik Darling has a how-not-to guide for us:

We looked at a couple examples of when SQL Server might need to filter out rows later in the plan than we’d like, and why that can cause performance issues.

Now it’s time to look at a few more examples, because a lot of people find them surprising.

Read on for several examples. Also, because the bribes came through, I don’t mind shilling for Erik: check out 25 hours of recorded content for $100. I think Erik’s knowledge is worth at least $4 an hour. Maybe even $5.
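
A classic example of the pattern (not necessarily one of Erik's, and the table here is hypothetical): a predicate on a window function cannot be pushed down to the scan, so SQL Server numbers every row first and only then throws rows away in a late Filter operator:

    -- The rn = 1 predicate can't be applied until ROW_NUMBER has been
    -- computed, so it shows up as a Filter late in the plan.
    WITH NumberedOrders AS
    (
        SELECT o.OrderID, o.CustomerID,
               ROW_NUMBER() OVER (PARTITION BY o.CustomerID
                                  ORDER BY o.OrderDate DESC) AS rn
        FROM dbo.Orders AS o
    )
    SELECT OrderID, CustomerID
    FROM NumberedOrders
    WHERE rn = 1;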


RANKX on Multiple Columns in DAX

Alberto Ferrari walks us through ranking based on multiple columns:

DAX offers the RANKX function to compute ranking over a table, based on measures or columns. One limitation of RANKX is that it is only capable of ranking using a single expression. Oftentimes it is necessary to use multiple columns to obtain a ranking, either because the business requirement dictates it, or because you want to rank ties with different criteria.

As a demonstration, we rank customers based on their purchase volume. To artificially introduce ties, we use the Rounded Sales measure, which rounds the sales amount to the nearest multiple of one thousand. Using Rounded Sales, several customers show the same amount of 10,000.00. Because they are ties, their ranking must now be defined by alphabetical order based on their names.

Read on for two methods to solve this problem.
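
For contrast, the same requirement in T-SQL is a single window function with a multi-column ORDER BY, which is precisely what DAX has to emulate by folding both criteria into the one expression RANKX accepts. A sketch against a hypothetical table:

    -- Ties on RoundedSales are broken alphabetically by name.
    SELECT CustomerName,
           RoundedSales,
           RANK() OVER (ORDER BY RoundedSales DESC, CustomerName ASC) AS CustomerRank
    FROM dbo.CustomerSales;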


Building a Backup Plan

Greg Larsen takes us through the steps of developing out a backup plan:

You might be wondering why you need to develop a backup plan. Can’t a DBA just implement a daily backup of each database and call it good? Well, that might work, but it doesn’t consider how an application uses a database. If you have a database that is only updated with a nightly batch process, then having a daily backup of the database right after the nightly update process might be all that you need. But what if you had a database that was updated all day long from some online internet application? If you have only one backup daily for a database that gets updated all day online, then you might lose up to a day’s worth of online transactions if it were to fail right before the next daily backup. Losing a day’s worth of transactions most likely would be unacceptable. Therefore, to ensure minimal data loss occurs when restoring a database, the backup and recovery requirements should be identified first before building a backup solution for a database.

The biggest non-secret here is that backup plans are ultimately business decisions rather than technical decisions. Greg then outlines several cases and provides considerations for each.
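
For the all-day OLTP case Greg describes, the usual answer is a nightly full backup plus frequent log backups, which caps potential data loss at the log backup interval. A sketch, assuming the FULL recovery model and made-up names and paths:

    -- Nightly full backup
    BACKUP DATABASE SalesDB
    TO DISK = N'X:\Backups\SalesDB_Full.bak'
    WITH COMPRESSION, CHECKSUM;

    -- Log backup, scheduled (e.g. via SQL Agent) every 15 minutes,
    -- so at most 15 minutes of transactions are at risk
    BACKUP LOG SalesDB
    TO DISK = N'X:\Backups\SalesDB_Log.trn'
    WITH COMPRESSION, CHECKSUM;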


DAX Formatter for Power BI Desktop

Phil Seamark has a new tool for us:

Last week I was honoured to take part in the latest edition of the Power BI Dev Camp, which is run by my colleague Ted Pattison. It was a fun session which I enjoyed.

As part of the Dev Camp, I walked through some of my recent Visual Studio Code-based blog posts on how to perform various tasks against models hosted in Power BI Desktop.

While preparing for the session, Ted and I agreed that it might be helpful to create a small external tool that could automatically format all DAX expressions in a Power BI model. The idea is to leverage the excellent DAX Formatter API provided by the good folks at SQLBI. This API is the same endpoint used when you format your DAX using DAX Studio.

Read on for more details.


Delta Versus Lambda Architectures

Hector Leano compares the delta and lambda architectures:

Generally, a simple data architecture is preferable to a complex one. Code complexity increases points of failure, requires more compute to run jobs, adds latency, and increases the need for support. As a result, data pipeline performance degrades over time, increasing costs while decreasing productivity as your data engineers spend more time troubleshooting and downstream users wait longer for data refreshes.

Complexity was perceived as a necessary evil for the automated data pipelines feeding business reporting, SQL analytics, and data science because the traditional approach for bringing together batch and streaming data required a lambda architecture. While a lambda architecture can handle large volumes of batch and streaming data, it increases complexity by requiring different code bases for batch and streaming, along with its tendency to cause data loss and corruption. In response to these data reliability issues, the traditional data pipeline architecture adds even more complexity by adding steps like validation, reprocessing for job failures, and manual update & merge.

On the one hand, lambda was always intended to be a compromise architecture based on the tools of the time. On the other hand, take this with as many grains of salt as you need given that the post comes from the primary company responsible for delta.


Scaling ksqlDB, with Animations

Michael Drogalis walks us through scaling models with ksqlDB:

Software engineering memes are in vogue, and nothing is more fashionable than joking about how complicated distributed systems can be. Despite the ribbing, many people adopt them. Why? Distributed systems give us two things their single node counterparts cannot: scale and fault tolerance.

ksqlDB, the event streaming database, is built with a client/server architecture. You can run it with a single server, or you can cluster many servers together. Part 1 and part 2 in this series explained how a single server executes stateless and stateful operations. This post is about how these work when ksqlDB is deployed with many servers, and more importantly how it linearly scales the work it is performing—even in the presence of faults.

If you like, you can follow along by executing the example code yourself. ksqlDB’s quickstart makes it easy to get up and running.

Click through for well-animated examples.
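
As a rough illustration of the work being scaled (the schema and topic are invented for the example), a stateful persistent query like this one builds per-key aggregates, and a ksqlDB cluster spreads that state across servers by partitioning on the grouping key:

    -- Source stream over a three-partition Kafka topic
    CREATE STREAM readings (sensor VARCHAR, reading DOUBLE)
      WITH (KAFKA_TOPIC = 'readings', VALUE_FORMAT = 'JSON', PARTITIONS = 3);

    -- Each server owns a subset of the partitions, so the
    -- aggregation state (and its work) is divided across the cluster.
    CREATE TABLE avg_readings AS
      SELECT sensor, AVG(reading) AS avg_reading
      FROM readings
      GROUP BY sensor
      EMIT CHANGES;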
