November 2022 – Curated SQL

Redshift Query Editor v2

Published 2022-11-30 by Kevin Feasel

Anusha Challa, et al., announce a new version of the Redshift query editor:

Amazon Redshift is a fast, fully managed, petabyte-scale cloud data warehouse. You have the flexibility to choose from provisioned and serverless compute modes. You can start loading and querying large datasets conveniently in Amazon Redshift using Amazon Redshift Query Editor v2, a web-based SQL client application.

It’s worth a try if you’re a Redshift user, though I’d imagine that frequent Redshift users have already sorted out their IDEs of choice.


Percentiles in KQL

Published 2022-11-30 by Kevin Feasel

Robert Cain continues a series on KQL:

Often we want to get data that is relative to other data. For example, we want a list of computers that have free space that is greater than the free space of other computers. We need to set a threshold, for example we want to return results where the free space is greater than 95% of the free space on other computers.

To do this, Kusto provides the percentile operator, along with its variants percentiles and percentiles_array.

Read on to see how it works. I do like the way that KQL handles percentile operations.
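To make the threshold idea concrete outside of Kusto, here is a minimal Python sketch using the nearest-rank method. This is not how Kusto computes percentiles (its percentile functions are approximations over large datasets), and the computer names and free-space figures are made up for illustration.

```python
import math


def nearest_rank_percentile(values, pct):
    """Return the nearest-rank percentile of values (pct in (0, 100])."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(values)
    # Nearest rank: the smallest rank covering at least pct percent of the data.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical readings: 20 computers with 10 GB, 20 GB, ..., 200 GB free.
free_space_gb = {f"pc{i:02d}": i * 10 for i in range(1, 21)}

# The 95th percentile acts as the threshold described in the excerpt.
threshold = nearest_rank_percentile(list(free_space_gb.values()), 95)

# Keep only the computers whose free space exceeds that threshold.
above_threshold = sorted(
    name for name, gb in free_space_gb.items() if gb > threshold
)
```

With this sample data the threshold lands on the 19th of 20 sorted readings, so only the machine with the most free space clears it.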


Text Concatenation and Dataset Refresh Errors

Published 2022-11-30 by Kevin Feasel

Chris Webb follows up on a prior post:

One of the most popular topics I’ve ever written about here is the dataset refresh errors you get with dynamic data sources and the Web.Contents function in Power BI (see here and here for example). Recently I learned about an interesting new detail regarding this problem that I thought was worth blogging about, and which may help someone trying to troubleshoot it.

Read on for the scenario, as well as what you can do to avoid an issue.


Find and Take “Missed” Backups

Published 2022-11-30 by Kevin Feasel

Tracy Boggiano checks the lost-and-found:

I blogged a few years ago about my configuration tables for Ola Hallengren’s maintenance solution that allows me to pull the configuration parameters from tables so I don’t have to edit SQL Agent jobs and allows me to query tables to see how all my jobs are configured with Ola’s maintenance solution. I’ve updated these as I’ve implemented them at my new job. But I was talking to another DBA about another trick I have implemented that I didn’t originally publish and it’s how I make sure I get a full or differential backup for all databases even when things fail without a bunch of manual intervention.

With Ola’s script if you default to passing in USER_DATABASES for your @Databases parameter you can query the msdb for your time period, in my case a week (168 hours), all the databases that have been backed up and return those with a minus sign concatenated together so you can exclude those from being backed up, and append those to @Databases parameter where it has USER_DATABASES already using the below code with the magic of XML and the STUFF command and pass that back to the DatabaseBackup procedure:

Read on for the code.
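Tracy's actual solution is T-SQL against msdb, using STUFF and XML for the concatenation. As a rough illustration of just the string-building logic, here is a Python sketch. The function name, the sample backup history, and the database names are all hypothetical, and the real work of querying backup history is left out.

```python
from datetime import datetime, timedelta


def build_databases_parameter(backup_history, now, window_hours=168):
    """Build a @Databases value that skips recently backed-up databases.

    backup_history is an iterable of (database_name, backup_finish_time)
    pairs, standing in for rows that would come from msdb.
    """
    cutoff = now - timedelta(hours=window_hours)
    # Databases with at least one backup inside the window get excluded.
    recently_backed_up = sorted(
        {name for name, finished in backup_history if finished >= cutoff}
    )
    # A leading minus sign marks a database to exclude from the selection.
    exclusions = "".join(f",-{name}" for name in recently_backed_up)
    return "USER_DATABASES" + exclusions


# Hypothetical history: HR has had no backup in the last week.
now = datetime(2022, 11, 30, 12, 0)
history = [
    ("Sales", now - timedelta(days=2)),
    ("Sales", now - timedelta(days=5)),
    ("HR", now - timedelta(days=10)),
    ("Inventory", now - timedelta(hours=1)),
]
databases_parameter = build_databases_parameter(history, now)
```

Passing the resulting value to the backup procedure would then leave only the databases that missed their backup in the window, which is HR in this example.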


Encryption by Default

Published 2022-11-30 by Kevin Feasel

Matthew McGiffen lays down a stake:

You can see that a small majority of DBAs who follow Brent on Twitter and respond to polls use some form of at-rest encryption. I guess that means that in the larger population the majority aren’t encrypting their data at all.

There are a number of reasons why that might be the case, the only good one is that your database doesn’t hold any personally identifiable information (PII) or sensitive data which will certainly be true in some cases but not in most.

Read on for more of Matthew’s thoughts and do check out his new book. My copy’s supposed to arrive today.


Power BI Row-Level Security for Language Selection

Published 2022-11-30 by Kevin Feasel

Kristyna Hughes has an interesting use case for row-level security:

Making multilingual reports in Power BI requires a lot of different elements. Translations can be added to PBIX files to translate column names, visual titles, etc. but these translations will not impact the data itself. One way to translate the dimensions themselves (star schema introduction) is to have a translation table that contains a display language and value column for each ID of the dimension. In the example below, the translation table would replace the original table and we could apply RLS to the Language column.

I think this is a clever use case for row-level security.


Schema Comparison in Azure Data Studio

Published 2022-11-30 by Kevin Feasel

Chad Callihan tries out schema comparison in Azure Data Studio:

Do you need a tool for comparing database schemas? Azure Data Studio can be your answer. Let’s walk through using Azure Data Studio’s Schema Compare extension.

Read on for an overview of the product.


Storing Semi-Additive Facts as Timespans

Published 2022-11-30 by Kevin Feasel

Timo Zishiri gives a new spin to a common warehousing problem:

In these cases, the measure may be aggregated across dates by averaging over the number of periods, e.g., average daily inventory levels. Measures can also be aggregated across dates by taking the maximum/minimum for the time interval.

More specifically, this blog focuses on an alternative approach to providing end users with the ability to do point-in-time analysis, so-called trend analysis.

Click through to see how a timespan table would work.
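As a rough sketch of the general idea, and not Timo's actual design, the Python below stores an inventory level once per span of dates rather than once per day. It then answers a point-in-time question and computes the average and maximum across a date range. The span layout (inclusive start, exclusive end) and the sample quantities are assumptions made for illustration.

```python
from datetime import date, timedelta

# Hypothetical timespan rows: (valid_from inclusive, valid_to exclusive, quantity).
inventory_spans = [
    (date(2022, 11, 1), date(2022, 11, 11), 100),
    (date(2022, 11, 11), date(2022, 11, 21), 40),
    (date(2022, 11, 21), date(2022, 12, 1), 70),
]


def quantity_on(spans, day):
    """Point-in-time lookup: the quantity in effect on the given day."""
    for valid_from, valid_to, quantity in spans:
        if valid_from <= day < valid_to:
            return quantity
    raise ValueError(f"no span covers {day}")


def daily_quantities(spans, first_day, last_day):
    """Expand the spans into one quantity per day in the inclusive range."""
    day_count = (last_day - first_day).days + 1
    return [
        quantity_on(spans, first_day + timedelta(days=offset))
        for offset in range(day_count)
    ]


november = daily_quantities(inventory_spans, date(2022, 11, 1), date(2022, 11, 30))

# Semi-additive aggregation across dates: average or maximum, never a sum.
average_level = sum(november) / len(november)
peak_level = max(november)
mid_month_level = quantity_on(inventory_spans, date(2022, 11, 15))
```

Three rows cover the whole month here, where a daily snapshot table would need thirty.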


Extracting JSON from a Spark DataFrame

Published 2022-11-29 by Kevin Feasel

Unmesha Sreeveni digs into some JSON:

Let’s see how we can extract a Json object from a spark dataframe column

This is an example data frame

Unmesha takes it one step at a time, breaking down each element of the DataFrame and showing how it all works.


Unity Catalog in Azure Databricks

Published 2022-11-29 by Kevin Feasel

Meagan Longoria makes a recommendation:

Unity Catalog in Databricks provides a single place to create and manage data access policies that apply across all workspaces and users in an organization. It also provides a simple data catalog for users to explore. So when a client wanted to create a place for statisticians and data scientists to explore the data in their data lake using a web interface, I suggested we use Databricks with Unity Catalog.

Read on to learn more about what the Unity Catalog does.


