Author: Kevin Feasel

I Remember Halloween

Published 2020-05-01 by Kevin Feasel

Paul White talks about the Halloween Problem:

Much has been written over the years about understanding and optimizing SELECT queries, but rather less about data modification. This series looks at an issue that is specific to INSERT, UPDATE, DELETE and MERGE queries – the Halloween Problem.
The phrase “Halloween Problem” was originally coined with reference to a SQL UPDATE query that was supposed to give a 10% raise to every employee who earned less than $25,000. The problem was that the query kept giving 10% raises until everyone earned at least $25,000.
We will see later on in this series that the underlying issue also applies to INSERT, DELETE and MERGE queries, but for this first entry, it will be helpful to examine the UPDATE problem in a bit of detail.

This is a classic problem in data management and has led to a good bit of confusion over time about why database updates can perform worse than you’d expect.
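
For a rough feel of the mechanics, here is a minimal Python sketch of my own, not code from Paul's series. It assumes the update reads rows through an index ordered by salary and rounds salaries to whole dollars; the function names and the sample data are hypothetical. The first function updates the index while scanning it, so a row that moves forward gets picked up again. The second reads every qualifying row before writing anything, which is roughly the kind of phase separation a database engine has to pay for.

```python
import bisect


def raise_while_scanning(salaries, threshold=25_000):
    """Give 10% raises while walking a salary-ordered index that we also modify."""
    # The "index": (salary, employee) pairs kept in sorted order.
    index = sorted((pay, emp) for emp, pay in salaries.items())
    raises = dict.fromkeys(salaries, 0)
    cursor = (-1, "")  # sits just before the first entry (assumes salaries >= 0)
    while True:
        # Move to the first index entry after the last key we read.
        i = bisect.bisect_right(index, cursor)
        if i == len(index) or index[i][0] >= threshold:
            break
        pay, emp = index[i]
        cursor = (pay, emp)
        # The update relocates the row to a later spot in the same index,
        # so the scan can meet it again.
        del index[i]
        bisect.insort(index, (pay * 110 // 100, emp))
        raises[emp] += 1
    return {emp: pay for pay, emp in index}, raises


def raise_from_snapshot(salaries, threshold=25_000):
    """Read all qualifying rows first, then write: each row is raised once."""
    to_update = [emp for emp, pay in salaries.items() if pay < threshold]
    result = dict(salaries)
    for emp in to_update:
        result[emp] = result[emp] * 110 // 100
    return result


staff = {"ann": 10_000, "bob": 24_000, "cat": 30_000}
final_pay, raise_counts = raise_while_scanning(staff)
print(final_pay, raise_counts)
print(raise_from_snapshot(staff))
```

In the first version, nobody ends up below the threshold and the lowest-paid employee is raised many times; in the second, each qualifying employee is raised exactly once.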

Visualizing Ranking Data

Published 2020-04-30 by Kevin Feasel

Stephanie Evergreen gives us a few techniques for visualizing ranking data:

And any time your data could be visualized in a bar chart, you can always take a jump to a dot plot or lollipop chart. You got this.
Any of these variations will be a perfectly fine visual to show rank data at a single point in time. If you have rank over time OR rank comparison across multiple groups, try a Bump Chart.

I was going to recommend a Cleveland dot plot, myself.

Using PowerShell to Configure Database Mail and SQL Agent Alerts

Published 2020-04-30 by Kevin Feasel

Eric Cobb shows us how to use PowerShell to set up database mail and SQL Agent alerts:

As a DBA, you need to know when there’s a problem on your SQL Servers. And while I highly recommend you use a full-fledged monitoring system, there are also some things you can set up on your SQL Servers so that they will tell you when certain things go wrong. This doesn’t replace a full monitoring system, but setting up the below alerts will give you notification when SQL Server encounters things like corruption or resource issues.

Even with a full-fledged monitoring system, there are places where you can still make use of mail and side alerts.

Color Band by Group in Power BI

Published 2020-04-30 by Kevin Feasel

Marco Russo and Alberto Ferrari show how we can alternate background colors from group to group instead of from row to row:

The background color of the rows depends on Sales[Order Number]. The background color switches between white and light gray every time the order number changes, so all the rows of the same order have the same background color and can be easily identified. You cannot obtain this visualization by only using a Power BI style, because the coloring of a row depends on the actual data in it. You can achieve this goal by using the conditional formatting feature in Power BI. You can set the background color of a cell according to the value of a measure. Therefore, you need a DAX formula that returns two values: one for the white rows and one for the gray rows. The value returned by the measure must alternate between those two values with each consecutive order number.

Read on for an example of how you can do this.
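
The article itself does this with a DAX measure, which I won't try to reproduce here. As a language-neutral illustration of the alternation logic only, here is a small Python sketch under my own assumptions: the function name is hypothetical, and I use the parity of each order number's dense rank so that gaps in the numbering do not break the alternation.

```python
def band_by_group(order_numbers):
    """Return 0 or 1 per row, flipping each time the order number changes."""
    # Dense rank of each distinct order number: 0, 1, 2, ...
    ranks = {order: rank for rank, order in enumerate(sorted(set(order_numbers)))}
    # Even ranks get one color, odd ranks the other.
    return [ranks[order] % 2 for order in order_numbers]


rows = [101, 103, 103, 104]
print(band_by_group(rows))
```

Using the raw parity of the order number would put 101 and 103 in the same band; ranking the distinct values first keeps consecutive groups in alternating colors.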

Foreign Keys and Non-Changing Updates

Published 2020-04-30 by Kevin Feasel

Brent Ozar has a warning for us:

If you update a row without actually changing its contents, does it still hurt?
Paul White wrote in detail about the impact of non-updating updates, proving that SQL Server works hard to avoid doing extra work where it can. That’s a great post, and you should read it.
But foreign keys add another level of complexity. If you have foreign keys, and you tell SQL Server that you’re updating the contents of those related columns, SQL Server will check ’em even when the data isn’t changing.

Click through for the demonstration. I don’t think I agree with Brent’s dichotomy as laid out at the end of the post—the back-and-forth about removing keys would only make sense if you’re on the edge of the database equivalent of the production possibility frontier and expecting to move well beyond that point very soon. I’m not sure how well that describes the average company, but it’s a side quibble.

Progressive Disclosure in Power BI

Published 2020-04-30 by Kevin Feasel

Prathy Kamasani takes us through the implementation of a design idea in Power BI:

In the above example, I used a pattern to show details using action from the Card. When a user clicks on a card, the report will show details related to Card. It sounds straightforward, but it involves a lot of work using Power BI Functionalities: Buttons, Bookmarks, Sections, Grouping and Page Size.
There are few aesthetics I paid attention in this Report Page which are key for any landing page. Usually, a Landing page helps users to navigate around the Power BI Model, so it is important to highlight those navigation steps. In the above model, I used Buttons, labels and Images for navigation hints.

I like this for some uses, like giving analysts a chance to dive into the data. For an operational dashboard, I don’t like it very much unless the cards at the top alone provide me enough information to know whether I need to take an action; otherwise, it loses one of the most important concepts of a dashboard, glanceability.

When a Non-Clustered Index on Clustered Columns Makes Sense

Published 2020-04-30 by Kevin Feasel

Allen White gives us a scenario where adding a non-clustered index with the same key as the clustered index can make sense:

Recently I was asked about adding a non-clustered index to a table (let’s call it Images) with just one column. It had been added in the development database and it improved performance dramatically. I looked at it and it had the same key as the clustered index on that table.
In reviewing the query I saw that Images was joined to the other tables in the query, but none of the columns were used, so Images was joined to ensure that values from the other tables had rows in Images. The query plan shows a significantly higher number of reads against Images without the new NCI (non-clustered index) than when it’s present.

I do agree that this can help—as we obviously see. The backseat query tuner in me wonders if maybe there’s another way to write the query to prevent the scan by using CROSS APPLY, but that’d only help if they were getting a small percentage of rows from the parent table expression built from the combination of the clustered index scan and index seek in the second example.

Using SQL Server Scalar Functions with Power Query

Published 2020-04-30 by Kevin Feasel

Erik Svensen shows that you can call user-defined scalar functions in SQL Server from Power Query:

Currently I am working with a project where we extract data from a SQL server – some of the business logic is built into scalar value functions (documentation).
Now the magic of PowerQuery enables us to reuse these functions within PowerQuery and Query Folding is supported – more about this at the end of this post.

My initial reaction is “That way lies madness” but in moderate doses, I could see this as a valuable second-best option for teams pulling data into Power BI.

Developing for Databricks with VS Code

Published 2020-04-29 by Kevin Feasel

Gerhard Brueckl tells us what comes after notebooks for users with development backgrounds:

For those users Databricks has developed Databricks Connect (Azure docs) which allows you to work with your local IDE of choice (Jupyter, PyCharm, RStudio, IntelliJ, Eclipse or Visual Studio Code) but execute the code on a Databricks cluster. This is awesome and provides a lot of advantages compared to the standard notebook UI. The two most important ones are probably the proper integration into source control / git and the ability to extend your IDE with tools like automatic formatters, linters, custom syntax highlighting, …
While Databricks Connect solves the problem of local execution and debugging, there was still a gap when it came to pushing your local changes back to Databricks to be executed as part of a regular ETL or ML pipeline. So far you had to either “deploy” your changes by manually uploading them via the Databricks UI again or write a script that uploads it via the REST API (Azure docs).

Gerhard has a nice extension for Visual Studio Code which helps with this. I’m also a huge fan of the DatabricksPS module, so I’ll happily plug that here.

Avoiding Diagonal Axis Labels

Published 2020-04-29 by Kevin Feasel

Cole Nussbaumer Knaflic gives us two good alternatives for avoiding diagonal labels in data visualizations:

There is one common phenomenon in graphs that I recommend actively avoiding: diagonal axis labels. They are often observed on the x-axes of graphs, where many tools automatically rotate text when the labels become too long to fit horizontally. While this might seem like a kind favor, there are usually better options. Beyond looking messy, diagonally rotated text is slower to read. In this short post, I’ll highlight two common scenarios that lead to diagonal x-axis labels—long category names on bar charts and long date labels on line graphs—and a couple ideas to try instead.

Diagonal labels aren’t the worst on printed visuals (as you can tilt the paper to read those labels clearly), but they’re not great. When combined with screens—especially screens which change their rotation as you tilt them, like on phones—that leads to a lot of unnecessary dissatisfaction.
