Press "Enter" to skip to content

Author: Kevin Feasel

Approximate Distinct Count with DAX

Gilbert Quevauvilliers runs some performance tests against the approximate distinct count formula in DAX:

I am currently running SQL Server Analysis Services (SSAS) 2019 Enterprise Edition. (This can also be applied to Power BI)

My Fact table has got roughly 950 Million rows stored in

And as mentioned previously it has got over 64 Million distinct users.

The data is queried from SQL Server into SSAS.

Gilbert first checks how close these are and then how much faster the approximate count is.

Comments closed

Setting Up a SQL Server Lab with AutomatedLab

Jess Pomfret looks at a very interesting Powershell module:

There is a fantastic PowerShell module called AutomatedLab that can enable you to easily build out a lab for the  specific scenario you need to test. Even better is the module comes with 70 sample scripts that you can start with and adapt to meet your needs.

The module gives you the option to work with Hyper-V or VMWare. I will say most of the examples are using Hyper-V, and that is what I’ll be using also.

For my lab I want a SQL Server 2019 instance joined to a domain, and a separate client machine that I can manage the SQL Server from. On the client I would need to be able to connect to the internet as I want to be able to download PowerShell modules from the gallery easily.

It’s about time for me to rebuild my lab, so I’ll need to check that out.

Comments closed

Incremental Imports with Sqoop

Jon Morisi continues a series on Sqoop:

In my last two blog posts I walked through how to use Sqoop to perform full imports.  Nightly full imports with overwrite has it’s place for small tables like dimension tables.  However, in real-world scenarios you’re also going to want a way to import only the delta values since the last time an import was run.  Sqoop offers two ways to perform incremental imports: append and lastmodified.

Both incremental imports can be run manually or created as job using the “sqoop job” command.  When running incremental imports manually from the command line the “–last-value” arg is used to specify the reference value for the check-column.  Alternately sqoop jobs track the “check-column” in the job and the value of the check-column is used for subsequent job runs as the where predicate in the SQL statement.  I.E. select columns from table where check-column > (last-max-check-column-value).

This is where Sqoop starts to break down for me, and Jon lists some of the issues in the post.

Comments closed

Pasting an R Plot into Word

Eran Raviv takes us through converting a plot in R to work with Microsoft Word:

In this post you will learn how to properly paste an R plot\chart\image to a word file. There are few typical problems that occur when people try to do that. Below you can find a simple, clean and repeatable solution.
When you google how to paste a plot from R to a word file you find that there are some solutions. But they are not satisfactory. For example, stackoverflow highest-ranking reply offers to use the Rstudio button to export your plot as an Enhanced Metafile (EMF) format. Couple of things wrong with it: the first is that you need to start messing around with the device scaling, because the export remembers the port dimensions. The second is that the word file is often not the final version. For better readability\representation we often convert the word to a pdf format before sending\publishing. But then you get something funny which you may have seen before, and drove some people insane consumed much of some people’s time, myself included:

Also check out the linked blog post for additional insights into why this happened.

Comments closed

LAST_VALUE() and Windows

Jeet Kainth explains the importance of specifying your window when using LAST_VALUE():

To return the actual last value, add the additional clause ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. This ensures that for each row the LAST_VALUE() function looks at all rows; from the very first row to the very last row.

Click through for the example. Remember that the default is going to the current row, not the entire data set.

Comments closed

Building a Cartesian Product with Power Query

Reza Rad shows how to build a Cartesian product using Power Query:

Sometimes, you need to create a multiplication of all sets of all pairs from two different data tables in Power BI. This action is different from Merge (Join) because there is no matching key columns and no way to relate the two tables together. However, you want to create a multiplication as a flatten table. If this process is needed to be done in Power Query, then there is a simple trick to do it, In this article, I’ll explain how you can do it.

Read on for that trick.

Comments closed

Dirty Deeds in SQL: Identity-Based Looping Edition

Nate Johnson has a confession to make:

I’ve done some things I’m not proud of. We all do, in IT, typically when we’re under-the-gun for a deadline or when the systems and frameworks in which we work have some sort of nuance or limitation that we just cannot get around, past, or over. And so we hack. We write code we’re not happy with. We even write code that we despise with every fiber of our well-intentioned being. But it has to be done. Because there’s no other choice.

Read on for the story. And if you want a much less ugly way to find gaps, I know a guy.

Comments closed

Multi-Statement TVPs and Time Logged

Erik Darling turns the seconds into minutes:

I’ve posted quite a bit about how cached plans can be misleading.

I’m gonna switch that up and talk about how an actual plan can be misleading, too.

In plans that include calling a muti-statement table valued function, no operator logs the time spent in the function. I’ve got a User Voice item for it here.

Click through for the demonstration. If that sounds like something you’d like fixed, vote up the User Voice item.

Comments closed

Altering an Indexed View Drops All of Its Indexes

Kendra Little has some thoughts on indexed views:

When I first heard about this behavior, it sounded like a bug to me: why should an alteration like adding a column to a view remove all the indexes?

A colleague of mine at Redgate wondered: is the behavior the same with the new CREATE OR ALTER syntax as it is with just plain ALTER? (Spoiler: I tested and it is the same: CREATE OR ALTER also drops all indexes on the view.)

Click through for the answer.

Comments closed

Capturing Query Errors with Extended Events

Jack Vamvas shows how to capture query errors using Extended Events:

If you’re troubleshooting SQL Server query errors , you’ll already know Extended Events are highly useful and very effective method to capture SQL Server errors.

To use the script you’ll need appropriate privileges to create the Extended Event. You will also need to have some space available on the disk to store output files.

If you want something a little less permanent, you can use the ring buffer target. I put together something like this a long time ago and enjoyed IM-ing coworkers and saying “You forgot the join criteria” with no other context. Freaked them out the first couple of times…

Comments closed