Press "Enter" to skip to content

Curated SQL Posts

Comparing Power BI Files

Imke Feldmann shows off a new Power BI file comparison tool:

What’s not covered?
Nothing. The comparison includes everything from the pbit-files: So beneath your M and DAX code, you’ll see all about your visual definitions (incl. filters set !), row level security and much, much more. Actually, I found some information a bit noisy (like many date fields, telling you when which changes happened). So I filtered them out in Excel. I’d recommend to check it out and play a bit with it to find the most suitable settings for you.

This looks quite useful.

Comments closed

Modeling Semi-Additive Measures

Paul Poco shows a couple techniques for modeling semi-additive measures in Analysis Services and Power BI:

As mentioned earlier, the most commonly encountered approach is Option 2, the snapshot fact table. The main drawback of this approach is that the fact table’s size will grow extremely fast. For example, if you want to calculate the headcount in a company with 10,000 employees on average, and you want 5 years of historical data, you will add 10,000 rows per day to your fact table – that gives you (10,000 * 365 * 5 =) 18,250,000 rows after 5 years.  

If you used the first approach, Option 1, the fact table would be (10,000 * 5 =) 50,000 rows after 5 years, assuming your employees change position or quit the company once a year, on average. 

The snapshot fact table (Option 2) is (18,250,000 / 50,000 =) 365 times bigger. On the bright side, as the data is very repetitive, you might get a very good compression ratio on these tables.  

Check it out. Semi-additive measures are not as common as additive measures, but you’re liable to have a couple of them in your data model.

Comments closed

Icon Maps in R

Laura Ellis shows how you can build maps full of little icons:

That was ok, but we should try to make the images more aesthetically pleasing using the magick package. We make each image transparent with the image_transparent() function. We can also make the resulting image a specific color with image_colorize().

I then saved the images using the image_write() function. I manually re-uploaded them to GH.

This was a great example of where laying icons on a map works.

Comments closed

Derivative Event Sourcing

Anna McDonald explains the concept of derivative event sourcing:

If you happen to be the proud owner of a single order service, then you are all set to begin.

But what if you have more than one order service?

Something that tends to happen at companies that have been around for more than a sprint is the accumulation of technical debt. Sometimes that debt takes the form of duplicate applications. Mergers happen and you adopt other applications that, for reasons beyond your control, cannot be retired or rewritten right away. In other words, sometimes you end up with more than one order service—enter derivative event sourcing!

This is a nice article for real-life scenarios where you don’t get to build nice, well-designed services from scratch.

Comments closed

Tracking Who Changed Data

Bert Wagner is on a quest to find out who moved his cheese:

Have you ever wondered who was the last person (or process) to modify a piece of data in your database?

SQL Server offers plenty of system views and functions that provide insight into how your server is running and how your queries are performing. However, it doesn’t offer much information about who last modified your data.

There are a few workarounds, though they aren’t great.

Comments closed

Prior Year to a Specific Date in DAX

Alberto Ferrari lets us compare up to specific dates between years:

Unfortunately, the calculation is not perfect. At the year level, it compares the full previous year against an incomplete current year – in this example there are no sales after September 5th in the current year.

Besides, the problem appears not only at the year level, but also at the month level. Indeed, in September the Previous Year measure returns sales for the entire month of September in the previous year. The comparison is unfair, as there are only five days’ worth of sales in September of the current year.

Read on for a better technique.

Comments closed

Hot Patching Azure SQL Database

Hans Olav Norheim has an interesting paper on a technique Microsoft uses to release SQL Server patches for Azure SQL Database while minimizing downtime:

The SQL Engine we are running in Azure SQL Database is the very latest version of the same engine customers run on their own servers, except we manage and update it. To update SQL Server or the underlying infrastructure (i.e. Service Fabric or the operating system), we must stop the SQL Server process. If that process hosts the primary database replica, we move the replica to another machine (requiring a failover).
 
During failover, the database may be offline for a second and still meet our 99.995% SLA. However,  failover of the primary replica impacts workload because it aborts in-flight queries and transactions. We built features such as resumable index (re)build and accelerated database recovery to address these situations, but not all running operations are automatically resumable. It may be expensive to restart complex queries or transactions that were aborted due to an upgrade. So even though failovers are quick, we want to avoid them.

Read on to see how they do it. There’s no on-prem analogue yet, though perhaps that will come in time.

Comments closed

Immediate Streaming and the Power BI Data Gateway

Gilbert Quevauvilliers doesn’t have time to wait:

Currently as far as I understand it the On-Premise Data Gateway will wait and buffer some data before sending it through to the Power BI Service. By changing the setting below in the On-Premise Data Gateway, it will start streaming the data almost immediately.

I am fortunate enough to be really good mates with Phil Seamark who so part of the Power BI CAT team and he gave me a little nugget of gold that I would like to share with you.

Read on to see how to configure the gateway to stream immediately.

Comments closed

LSTM in Databricks

Vedant Jain shows us an example of solving a multivariate time series forecasting problem using LSTM networks:

LSTM is a type of Recurrent Neural Network (RNN) that allows the network to retain long-term dependencies at a given time from many timesteps before. RNNs were designed to that effect using a simple feedback approach for neurons where the output sequence of data serves as one of the inputs. However, long term dependencies can make the network untrainable due to the vanishing gradient problem. LSTM is designed precisely to solve that problem.

Sometimes accurate time series predictions depend on a combination of both bits of old and recent data. We have to efficiently learn even what to pay attention to, accepting that there may be a long history of data to learn from. LSTMs combine simple DNN architectures with clever mechanisms to learn what parts of history to ‘remember’ and what to ‘forget’ over long periods. The ability of LSTM to learn patterns in data over long sequences makes them suitable for time series forecasting.

This is a nice overview and as a bonus, there’s a notebook as well where you can try it on your own.

Comments closed