Press "Enter" to skip to content

Month: November 2018

Tracking Errors In Power BI

Reza Rad has a lengthy post covering how you can track errors in Power Query:

To build a robust BI system, you need to cater for errors and handle errors carefully. If you build a reporting solution that the refresh of that fails everytime an error occurs, it is not a robust system. Errors can happen by many reasons, In this post, I’ll show you a way to catch potential errors in Power Query and how to build an exception report page to visualize the error rows for further investigation. The method that you learn here, will save your model from failing at the time of refresh. Means you get the dataset updated, and you can catch any rows caused the error in an exception report page. To learn more about Power BI, read Power BI book from Rookie to Rock Star.

There’s a lot of work, but also a lot of value in doing that work.

Comments closed

Using Power BI DMVs

Kasper de Jonge shows off a few Power BI dynamic management views:

Today a quick one that I came across while writing a different blog post that I will blog later. I know we have talked about it again and again but a good best practice is to remove any high carnality columns you don’t necessary need. This trick is not new and has been blogged about before in different places but I wanted to emphasize it again due to the importance.

I was finishing up my next blog post and wanted to upload the sample file. While doing that I noticed the file was 150MB large. That is rather large for such a simple file. The largest table has 500,000 rows and none of them are unique. What is going on?

There are some interesting DMVs in Kasper’s post, including one which shows cardinality by column.

Comments closed

Using Query Store To Force Plans With Plan Guides On Them Already

Grant Fritchey creates a plan guide and then forces the plan in Query Store:

If I look at the plan that is stored in Query Store, I’ll see the identical plan up above, including the PlanGuideDB and PlanGuideName properties.

So, let’s force the plan using the values returned from the query above:

Now, when we run the query, we’ll see both the plan guide in use and that the plan is forced (see this earlier blog post explaining this behavior). This is all expected behavior.

Check it out to see how SQL Server behaves.

Comments closed

Working With Strings In Base R

Jozef Hajnala shows us that you don’t need stringr to do cool things with strings in R:

This post is aimed to serve as an overview of functionality provided by base R to work with strings. Note that the term “string” is used somewhat loosely and refers to character vectors and character strings. In R documentation, references to character string, refer to character vectors of length 1.

Also since this is an overview, we will not examine the details of the functions, but rather list examples with simple, intuitive explanations trading off technical precision.

As much as I like the tidyverse for its data platform professional-friendly approach to R, it is good to know the base libraries (and other alternatives) as well.  H/T R-Bloggers

Comments closed

Quick Geospatial Data Plots In R And Python

Harry McLellan shows us how we can use R and Python to generate quick-and-dirty plots of geospatial data:

Now R has some useful packages like ggmap, mapdata and ggplot2 which allow you to source you map satellite images directly from google maps, but this does require a free google API key to source from the cloud. These packages can also plot the map around the data as I am currently trimming the map to fit the data. But for a fair test I also used a simplistic pre-built map in R. This was from the package rworldmap, which allows plotting at a country level with defined borders. Axes can be scaled to act like a zoom function but without a higher resolutions map or raster satellite image map it is pointless to go past a country level.

There’s a lot more you can do with both languages, but when you just want a plot in a few lines of code, both are up to the task.

Comments closed

Using K-Means Clustering For Anomaly Detection

Anais Dotis-Georgiou gives us an interesting use case of using k-means clustering along with InfluxDB (a time-series database) to detect anomalies in EKG data:

If you read Part Two, then you know these are the steps I used for anomaly detection with K-means:

  1. Segmentation – the process of splitting your time series data into small segments with a horizontal translation.

  2. Windowing – the action of multiplying your segmented data by a windowing function to truncate the dataset before and after the window. The term windowing gets its name from its functionality: it allows you to only see the data in the window range since everything before and after (or outside the window) is multiplied by zero. Windowing allows you to seamlessly stitch your reconstructed data together.

  3. Clustering – the task of grouping similar windowed segments and finding the centroids in the clusters. A centroid is at the center of a cluster. Mathematically, it is defined by the arithmetic mean position of all the points in the cluster.

  4. Reconstruction – the process of rebuilding your time series data. Essentially, you are matching your normal time series data to the closest centroid (the predicted centroid) and stitching those centroids together to produce the reconstructed data.

  5. Normal Error – The purpose of the Reconstruction is to calculate the normal error associated with the output of your time series prediction.

  6. Anomaly Detection – Since you know what the normal error for reconstruction is, you can now use it as a threshold for anomaly detection. Any reconstruction error above that normal error can be considered an anomaly.

Read the whole thing.  This is a really cool use case of a set of technologies along with a venerable (if sometimes troublesome) algorithm.

Comments closed

Running Windows On Kubernetes

Chris Adkin shows us how you can run the Windows OS on Kubernetes (with some limitations):

With the caveat that this information is correct at the time of writing, the following points should be noted:

  • The control plane and worker nodes should always be on the same release of Kubernetes.
  • The control plane can only run on Linux
  • The minimum version of Windows 2016 RTM is required for worker nodes, but version 1709 is preferred.

A full list of restriction can be found here.

But here is something particularly significant for anyone wishing to deploy highly available SQL Server infrastructures to Kubernetes via availability groups:

Read on for that particularly significant limitation; it’s a doozy.

Comments closed

Using Power BI Dataflows To Create Common Data Sets

Alexander Arvidsson shares an interesting use case for Power BI Dataflows:

There are several more use cases for a dataflow, but one that is very useful is the ability to share a dataset between apps. Previously we had to duplicate the dataset to each and every app that needed to use it, increasing the risk that one dataset was ignored, not refreshed properly or otherwise out of sync with reality. By using dataflows we can have several apps rely on the same dataflow (via a dataset), and thus it is quite possible to have a “master dataset”.

Click through for a walkthrough, as well as an understanding of the process’s limitations.

Comments closed

No Laptop For You: A Case Of Machine Learning Failure

Ginger Grant walks us through a scenario where Lenovo refused to sell her a laptop four times:

Buying a laptop from Lenovo reminded me of an episode of Seinfeld when Elaine was trying to buy soup.  For some unknown reason, when I placed an order on their website and gave them my money, Lenovo gave me a Cancellation Notice, the email equivalent of “No Soup for you!”  After placing an order, about 15 minutes later, I received a cancellation notice.  I called customer service.  They looked at the order and advised me the systemincorrectly cancelled the order.  I was told to place the order again as they had resolved the problem.  I created a new order, and just like the last time, I received the No Laptop for You cancellation email.  I called back. This time I was told that the system thinks I am a fraud. Now I have no laptop and I have been insulted.

In all the talk of ML running the future, one thing that gets forgotten is that models, being simplifications of reality, necessarily make mistakes.  Failing to have some sort of manual override means, in this case, throwing away money for no good reason.

Comments closed

Join Estimation: Details On A Cardinality Estimator Change

Paul White shares some detail on a change to the Cardinality Estimator in SQL Server 2014:

The question is now how to perform a coarse estimation of the equijoin cardinality of the highlighted steps, using the information available.

The original cardinality estimator would have performed a fine-grained step-by-step histogram alignment using linear interpolation, assessed the join contribution of each step (much as we did for the minimum step value before), and summed each step contribution to acquire a full join estimate. While this procedure makes a lot of intuitive sense, practical experience was that this fine-grained approach added computational overhead and could produce results of variable quality.

The original estimator had another way to estimate join cardinality when histogram information was either not available, or heuristically assessed to be inferior. This is known as a frequency-based estimation[…]

It’s an interesting dive into one of the changes in 2014’s CE.  The post is a little math-heavy but Paul does a great job keeping it interesting.

Comments closed