Press "Enter" to skip to content

Curated SQL Posts

Text Concatenation and Dataset Refresh Errors

Chris Webb follows up on a prior post:

One of the most popular topics I’ve ever written about here is the dataset refresh errors you get with dynamic data sources and the Web.Contents function in Power BI (see here and here for example). Recently I learned about an interesting new detail regarding this problem that I thought was worth blogging about, and which may help someone trying to troubleshoot it.

Read on for the scenario, as well as what you can do to avoid the issue.


Find and Take “Missed” Backups

Tracy Boggiano checks the lost-and-found:

I blogged a few years ago about my configuration tables for Ola Hallengren’s maintenance solution, which let me pull the configuration parameters from tables so I don’t have to edit SQL Agent jobs, and let me query those tables to see how all my jobs are configured with Ola’s solution. I’ve updated these as I’ve implemented them at my new job. But I was talking to another DBA about another trick I have implemented that I didn’t originally publish: how I make sure I get a full or differential backup for all databases, even when things fail, without a bunch of manual intervention.

With Ola’s script, if you default to passing in USER_DATABASES for your @Databases parameter, you can query msdb for all the databases that have been backed up within your time period (in my case a week, or 168 hours), return them concatenated together with a minus sign in front of each so they can be excluded from being backed up, and append that list to the @Databases parameter alongside USER_DATABASES. The code below does this with the magic of XML and the STUFF command, then passes the result back to the DatabaseBackup procedure:

Read on for the code.
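
As a rough illustration of the pattern Tracy describes (the variable names, the 168-hour window, and the DatabaseBackup parameters below are my own assumptions, not her code):

-- Sketch only: build a minus-prefixed exclusion list of databases that already
-- have a full backup in the last 168 hours, then append it to USER_DATABASES.
DECLARE @Databases  nvarchar(max) = N'USER_DATABASES';
DECLARE @Exclusions nvarchar(max);

SELECT @Exclusions =
    STUFF((SELECT N',-' + d.name
           FROM sys.databases AS d
           WHERE d.database_id > 4  -- skip system databases
             AND EXISTS (SELECT 1
                         FROM msdb.dbo.backupset AS bs
                         WHERE bs.database_name = d.name
                           AND bs.type = 'D'  -- full backups
                           AND bs.backup_finish_date >= DATEADD(HOUR, -168, GETDATE()))
           FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'), 1, 1, N'');

IF @Exclusions IS NOT NULL
    SET @Databases = @Databases + N',' + @Exclusions;

-- Hand the combined list back to Ola Hallengren's procedure.
EXEC dbo.DatabaseBackup
     @Databases  = @Databases,
     @Directory  = N'\\backupserver\sql',  -- placeholder path
     @BackupType = 'FULL';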


Encryption by Default

Matthew McGiffen lays down a stake:

You can see that a small majority of DBAs who follow Brent on Twitter and respond to polls use some form of at-rest encryption. I guess that means that in the larger population the majority aren’t encrypting their data at all.

There are a number of reasons why that might be the case; the only good one is that your database doesn’t hold any personally identifiable information (PII) or sensitive data, which will certainly be true in some cases but not in most.

Read on for more of Matthew’s thoughts and do check out his new book. My copy’s supposed to arrive today.


Power BI Row-Level Security for Language Selection

Kristyna Hughes has an interesting use case for row-level security:

Making multilingual reports in Power BI requires a lot of different elements. Translations can be added to PBIX files to translate column names, visual titles, etc., but these translations will not impact the data itself. One way to translate the dimensions themselves (star schema introduction) is to have a translation table that contains a display language and a value column for each ID of the dimension. In the example below, the translation table would replace the original table and we could apply RLS to the Language column.

I think this is a clever use case for row-level security.
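
As a sketch of what such a translation table might look like (the table and column names here are illustrative, not from Kristyna's post):

-- One row per dimension member per supported display language.
CREATE TABLE dbo.DimProductTranslation
(
    ProductKey   int           NOT NULL,  -- key of the dimension member
    LanguageCode nvarchar(10)  NOT NULL,  -- e.g. 'en-US', 'fr-FR'
    ProductName  nvarchar(200) NOT NULL,  -- display value in that language
    CONSTRAINT PK_DimProductTranslation PRIMARY KEY (ProductKey, LanguageCode)
);

INSERT INTO dbo.DimProductTranslation (ProductKey, LanguageCode, ProductName)
VALUES (1, N'en-US', N'Mountain Bike'),
       (1, N'fr-FR', N'Vélo de montagne'),
       (1, N'es-ES', N'Bicicleta de montaña');

In the Power BI model, the row-level security role then filters this table on its language column, so each user only ever sees the display values for their own language.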


Storing Semi-Additive Facts as Timespans

Timo Zishiri puts a new spin on a common warehousing problem:

In these cases, the measure may be aggregated across dates by averaging over the number of periods, e.g., average daily inventory levels. Measures can also be aggregated across dates by taking the maximum/minimum for the time interval.

More specifically, this blog focuses on an alternative approach to providing end users with the ability to do point-in-time analysis, so-called trend analysis.

Click through to see how a timespan table would work.
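
As a minimal sketch of the idea (the column names are my assumptions, not Timo's): each timespan row records the interval during which a value was in effect, and a point-in-time query simply finds the row whose interval contains the requested moment.

CREATE TABLE dbo.FactInventoryTimespan
(
    ProductKey     int       NOT NULL,
    WarehouseKey   int       NOT NULL,
    QuantityOnHand int       NOT NULL,
    ValidFrom      datetime2 NOT NULL,  -- when this quantity took effect
    ValidTo        datetime2 NOT NULL   -- exclusive end; open rows can carry a far-future date
);

-- Inventory level as of a given moment.
DECLARE @AsOf datetime2 = '2023-06-30';

SELECT ProductKey, WarehouseKey, QuantityOnHand
FROM dbo.FactInventoryTimespan
WHERE ValidFrom <= @AsOf
  AND ValidTo   >  @AsOf;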


Unity Catalog in Azure Databricks

Meagan Longoria makes a recommendation:

Unity Catalog in Databricks provides a single place to create and manage data access policies that apply across all workspaces and users in an organization. It also provides a simple data catalog for users to explore. So when a client wanted to create a place for statisticians and data scientists to explore the data in their data lake using a web interface, I suggested we use Databricks with Unity Catalog.

Read on to learn more about what the Unity Catalog does.
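
To give a flavor of what those centralized policies look like, here is a sketch in Databricks SQL (the catalog, schema, and group names are made up); grants defined this way apply across every workspace attached to the metastore:

-- Let a data science group browse and query one schema.
GRANT USE CATALOG ON CATALOG analytics TO `data-scientists`;
GRANT USE SCHEMA  ON SCHEMA  analytics.curated TO `data-scientists`;
GRANT SELECT      ON SCHEMA  analytics.curated TO `data-scientists`;

-- Review what the group can currently do on that schema.
SHOW GRANTS `data-scientists` ON SCHEMA analytics.curated;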


Optimizing Async Sinks in Flink

Hong Liang Teoh speeds things up:

When designing a Flink data processing job, one of the key concerns is maximising job throughput. Sink throughput is a crucial factor because it can determine the entire job’s throughput. We generally want the highest possible write rate in the sink without overloading the destination. However, since the factors impacting a destination’s performance are variable over the job’s lifetime, the sink needs to adjust its write rate dynamically. Depending on the sink’s destination, it helps to tune the write rate using a different RateLimitingStrategy.

This post explains how you can optimise sink throughput by configuring a custom RateLimitingStrategy on a connector that builds on the AsyncSinkBase (FLIP-171). In the sections below, we cover the design logic behind the AsyncSinkBase and the RateLimitingStrategy, then we take you through two example implementations of rate limiting strategies, specifically the CongestionControlRateLimitingStrategy and TokenBucketRateLimitingStrategy.

Read on for some tips on creating a rate limiting strategy for a sink.


The Importance of Proper Data Modeling in Power BI

Paul Turley avoids “big, wide tables”:

Power BI is architected to consume data in a dimensional model, with narrow fact tables and related dimensions. Introducing a big, wide table in a tabular model is extremely inefficient. It takes up space and memory resources, impacts performance, and complicates measure coding. Flattening records into one wide table is one of the worst things you can do in Power BI and a common mistake made by novice Power BI users.

This is a conversation I’ve had with many customers. We want our cake, and we want to eat it too. We want to have all the analytic capabilities, interactivity, and high performance, but we also want the ability to drill down to a lot of details. What if we have a legitimate need to report on transaction details and/or a large table with many columns? It is well known that the ideal shape is a star schema, but what if we need to shape data for detail reporting? The answer is that you can have it both ways, just not in one table.

Read on for a better model design (hint: the Kimball style) as well as several tips and tricks.
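
To make the contrast concrete, here is an illustrative sketch (the names are mine, not Paul's) of the same sales data as one wide, flattened table versus a narrow fact with separate dimensions:

-- The "big, wide table" anti-pattern: every descriptive attribute repeated on every row.
CREATE TABLE dbo.SalesFlat
(
    OrderDate       date,
    CustomerName    nvarchar(100),
    CustomerCity    nvarchar(100),
    ProductName     nvarchar(100),
    ProductCategory nvarchar(50),
    Quantity        int,
    SalesAmount     decimal(18, 2)
);

-- Star-schema shape: a narrow fact table of keys and measures, with the
-- descriptive attributes moved out to dimension tables.
CREATE TABLE dbo.FactSales
(
    OrderDateKey int,
    CustomerKey  int,
    ProductKey   int,
    Quantity     int,
    SalesAmount  decimal(18, 2)
);

CREATE TABLE dbo.DimCustomer (CustomerKey int, CustomerName nvarchar(100), CustomerCity nvarchar(100));
CREATE TABLE dbo.DimProduct  (ProductKey int, ProductName nvarchar(100), ProductCategory nvarchar(50));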
