Press "Enter" to skip to content

Author: Kevin Feasel

DAX Variables in Iterators

Kasper de Jonge explains how you can use a variable in the middle of an iterator:

As explained in the blog post, the SUMX in this calculation will iterate over each row in the fact table, which will probably have multiple currencies with different values for each date. The Min(FactExchangeRate[Factor]) will be evaluated for each currency and date and will get the right value.

Now, those of you who have seen any of my sessions will ask why I am not using a variable, as I always tell everyone to do.

Click through for the answer and an example of where you can use a variable within an iterator.
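
As a taste of the pattern, here is a minimal DAX sketch, assuming a hypothetical FactSales table and SalesAmount column (FactExchangeRate[Factor] comes from the quote above): declaring the variable inside the iterator's row expression means it is evaluated once per row, and the CALCULATE forces context transition so the MIN picks up the current row's currency and date.

    SalesInLocalCurrency :=
    SUMX (
        FactSales,
        -- VAR sits inside the row expression, so it is evaluated
        -- for each row of FactSales rather than once per query.
        VAR ExchangeRate =
            CALCULATE ( MIN ( FactExchangeRate[Factor] ) )
        RETURN
            FactSales[SalesAmount] * ExchangeRate
    )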

Suspect Pages in msdb

Max Vernon explains what the suspect_pages table is in msdb:

When SQL Server detects corruption in a database, it reports that corruption immediately to the client who requested the data. But did you know SQL Server also stores the details about which pages have experienced corruption in the msdb database, in the suspect_pages table?

Read on to see the information you can get from this table, including a listing of what each event type means.
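
As a quick reference, a query along these lines surfaces the recorded pages (a minimal sketch; note that suspect_pages is capped at 1,000 rows, so it benefits from periodic cleanup):

    -- Pages SQL Server has flagged as suspect, most recent first.
    SELECT  DB_NAME(sp.database_id) AS database_name,
            sp.file_id,
            sp.page_id,
            sp.event_type,   -- e.g., 1 = 823/824 error, 2 = bad checksum, 3 = torn page
            sp.error_count,
            sp.last_update_date
    FROM msdb.dbo.suspect_pages AS sp
    ORDER BY sp.last_update_date DESC;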

Testing ETL Pipelines

Ed Elliott has started a new series on testing ETL pipelines:

We test in production; this means we have monitoring and do things like phased roll-outs using feature flags, or we roll out to select customers first, prove it, then roll it out to everyone else. Testing in production doesn’t mean hacking around to get some process to work. We don’t test “on production” (hacking); we test “in production” – while we are in production we are continually testing, and if anything goes wrong, we have alerts and can deal with it.

Testing pipelines feels difficult because there are so many moving pieces, but if you design for testability (e.g., being able to tee off samples of data, send test records through, etc.), things get easier.
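
As one concrete way to tee off data in SQL Server, the OUTPUT clause can copy whatever rows a load step touches into a side table you can inspect or assert against. A minimal sketch; all table names here are hypothetical:

    -- Load step that also tees the inserted rows into an audit table
    -- for later inspection or automated checks.
    INSERT INTO dbo.Sales
        (OrderId, Amount)
    OUTPUT
        inserted.OrderId, inserted.Amount, SYSUTCDATETIME()
        INTO dbo.SalesLoadAudit (OrderId, Amount, LoadedAtUtc)
    SELECT OrderId, Amount
    FROM dbo.SalesStage;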

Don’t Truncate Facts and Dimensions when Loading Data

Meagan Longoria explains why a truncate-and-reload strategy for data warehouses isn’t a good look:

Every once in a while, I come across a data warehouse where the data load uses a full truncate-and-reload pattern to populate a fact or dimension. While it may not be the end of the world for a small table, it does concern me, and I usually recommend redesigning the load. My thoughts below on why this is an anti-pattern are true for using the actual TRUNCATE TABLE statement as well as for executing a DELETE statement with no WHERE clause.

Read on for some great advice, including an exception to the rule.
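
For contrast, the usual alternative is an incremental load, roughly like the following (a minimal upsert sketch with hypothetical table names, not the specific design from the post):

    -- Update rows that changed, then insert rows that are new,
    -- leaving existing history in place.
    UPDATE d
    SET    d.CustomerName = s.CustomerName
    FROM dbo.DimCustomer AS d
    JOIN dbo.StageCustomer AS s
        ON s.CustomerKey = d.CustomerKey
    WHERE s.CustomerName <> d.CustomerName;

    INSERT INTO dbo.DimCustomer (CustomerKey, CustomerName)
    SELECT s.CustomerKey, s.CustomerName
    FROM dbo.StageCustomer AS s
    WHERE NOT EXISTS (SELECT 1 FROM dbo.DimCustomer AS d
                      WHERE d.CustomerKey = s.CustomerKey);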

Equidistant Points and Missing Data in Excel

Stephanie Evergreen shows how you can bring in missing data points in Excel to ensure the axis is accurate:

Excel automatically spaces your intervals and labels equidistant from one another but it is assuming that your intervals actually are equidistant. In this graph, that’s not the case. We are missing the months of March, April, July, and August, when either no one was enrolled in the study or we have some missing data. But we can’t just gloss over those months. It isn’t truthful and it distorts the data display.

Click through for the solution.

IDEs and Cloudera Data Science Workbench

Bethann Noble walks us through some of the options available for IDEs operating against Cloudera Data Science Workbench:

Other coders on the team, including ML and DevOps engineers, often work in local IDEs such as PyCharm. These applications run locally on the user’s computer and connect to CDSW remotely over SSH for code completion and execution. They must be configured per user and are not associated at the project level in CDSW. The documentation provides sample instructions for the Professional Edition of PyCharm v2019.1.

They support both browser-based and local IDEs.

Why Transaction Logs are Zero-Initialized

Paul Randal explains why the transaction log needs to be zero-initialized before SQL Server starts up:

It’s all to do with crash recovery. SQL Server knows where crash recovery has to start for a database, but not where it ends – i.e. SQL Server does not persist the ‘most recent LSN’ for a database anywhere. This means it has to work out where the end of the log is (and by end, I mean the most recent log record persisted on disk, not the physical end of the log file).

Read on for the detailed explanation.
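
If you want to watch the zeroing happen on a test instance, trace flags 3004 and 3605 (well known in the community, including from Paul's own writing) echo file-initialization messages to the error log. A minimal sketch; the database name is hypothetical:

    -- Test instance only: surface zero-initialization messages.
    DBCC TRACEON (3004, 3605, -1);
    CREATE DATABASE ZeroInitDemo;
    EXEC sp_readerrorlog;   -- look for "Zeroing ... ZeroInitDemo_log"
    DBCC TRACEOFF (3004, 3605, -1);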

Test Those Restores

Bob Pusateri wants you to test your restores, probably right now:

I had a client that was hit by a ransomware virus, encrypting several of their systems including the database server. Not to worry, though, they had “full backups” of all the affected machines, done by a third-party backup utility. After taking a day to cleanse their network, they restored these backups onto their servers. Now it was just a simple matter of bringing all the applications back online, right? Well, not exactly…

Just because a backup completes with no error code doesn’t mean it’s really a successful backup.
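
For native SQL Server backups, the only real proof is a restore you actually run. A minimal sketch, assuming hypothetical paths, database, and logical file names:

    -- VERIFYONLY checks the backup media, not the database inside it.
    RESTORE VERIFYONLY FROM DISK = N'X:\Backups\Sales.bak';

    -- The real test: restore a copy and check its integrity.
    RESTORE DATABASE Sales_RestoreTest
    FROM DISK = N'X:\Backups\Sales.bak'
    WITH MOVE N'Sales' TO N'X:\RestoreTest\Sales.mdf',
         MOVE N'Sales_log' TO N'X:\RestoreTest\Sales_log.ldf';

    DBCC CHECKDB (Sales_RestoreTest) WITH NO_INFOMSGS;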

Getting the Largest Value Per Group

Erik Darling has a series on getting the highest value using CROSS APPLY. Part 1 covers the no-index route:

Let’s say you wanna get the highest thing. That’s easy enough as a concept.

Now let’s say you need to get the highest thing per user. That’s also easy enough to visualize.

There are a bunch of different ways to choose from to write it.

Part 2 covers the yes-index route:

In this round, row number had a tougher time than other ways to express the logic.

It just goes to show you, not every query is created equal in the eyes of the optimizer.

I don’t think I’m spoiling too much by saying that you really want a good index in place when using CROSS APPLY in this manner.
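
For reference, the general shape of the pattern looks like this (a minimal sketch with hypothetical tables; Erik's posts use the Stack Overflow database):

    -- Highest-scoring post per user. An index on
    -- dbo.Posts (OwnerUserId, Score DESC) is what makes this efficient.
    SELECT u.Id AS UserId,
           x.PostId,
           x.Score
    FROM dbo.Users AS u
    CROSS APPLY (SELECT TOP (1)
                        p.Id AS PostId,
                        p.Score
                 FROM dbo.Posts AS p
                 WHERE p.OwnerUserId = u.Id
                 ORDER BY p.Score DESC) AS x;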
