
Author: Kevin Feasel

Building A Gantt Chart With Plotly

Ellen Talbot shows us how to embrace our inner micromanagers:

Something a little different today for a quick chat about my latest project and why I’m finding the plotly package so helpful!

Are you like me and physically can’t function unless you’ve got a to-do list in front of you? Well, even if you’re not, imagine my pain while I’m wearing my non-Locke Data hat and trying to plan out the final year of my PhD thesis!

I needed something that updated easily, something visual and something to keep my supervisors in the know. I’ve previously made Gantt charts using LaTeX but found it ridiculously clunky to get working and decided there had to be a better way. And if I could include interactivity then all the better, which is how I discovered plotly.

Admittedly, I like Gantt charts more than almost any developer I’ve ever met. They always look so pretty and are wonderful depictions of a world which will never be.
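
Ellen works in R, but plotly’s Python bindings make the same kind of chart just as easy; here’s a minimal sketch using plotly.express (the tasks and dates are made up for illustration):

import pandas as pd
import plotly.express as px

# Hypothetical thesis plan; swap in your own tasks and dates.
tasks = pd.DataFrame([
    {"Task": "Literature review", "Start": "2018-10-01", "Finish": "2018-12-15", "Status": "Done"},
    {"Task": "Data collection",   "Start": "2018-11-01", "Finish": "2019-02-28", "Status": "In progress"},
    {"Task": "Analysis",          "Start": "2019-02-01", "Finish": "2019-05-31", "Status": "Planned"},
    {"Task": "Write-up",          "Start": "2019-04-01", "Finish": "2019-08-31", "Status": "Planned"},
])
tasks["Start"] = pd.to_datetime(tasks["Start"])
tasks["Finish"] = pd.to_datetime(tasks["Finish"])

# px.timeline draws a horizontal bar from Start to Finish per task, which is
# all a basic Gantt chart needs, and the result is interactive out of the box.
fig = px.timeline(tasks, x_start="Start", x_end="Finish", y="Task", color="Status")
fig.update_yaxes(autorange="reversed")  # list tasks top to bottom
fig.show()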


Using Azure ML To Approve Expenses Automatically

Isabelle Van Campenhoudt walks us through a scenario of using Azure ML to find expense reports which should automatically be approved, reducing the workload for approvers:

My partner in crime Serge Luca, aka Doctor Flow, is the author of a nice and complex expenses approval system in Microsoft Flow.
One year ago, he asked me to add analytics to his Flow. This year he had the interesting idea to add machine learning-based approval to his flow and suggested I work on it. The idea is the following: since we have a lot of approvals in our system, can a machine learn and find some decision pattern to apply automatically to each expense request?
I decided to use Microsoft Azure Machine Learning Studio. In this tool you can build experiments and use some of the most common and useful machine learning algorithms. It was amazing to see how easy it is to create and consume machine learning.

This contrasts with Ginger Grant’s nightmare scenario pretty well:  instead of trying to get the ML process to do all of the work, create a process which takes care of the really easy stuff and leave harder tasks to specialists with a deeper understanding of the rules.  That way they don’t have to spend their time on trivialities.
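
Their solution lives in Azure ML Studio and Flow, but the underlying pattern (train on past human decisions, auto-approve only when the model is very confident, route everything else to a person) can be sketched with scikit-learn; the file names, feature columns and threshold below are purely illustrative:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical history of expense reports with the human decision recorded.
history = pd.read_csv("expense_history.csv")
features = ["amount", "category_code", "employee_grade", "days_to_submit"]

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(history[features], history["approved"])

# Only auto-approve when the model is confident; everything else goes to a
# human approver, which is exactly the "easy cases only" split described above.
pending = pd.read_csv("pending_requests.csv")
# Column 1 of predict_proba is the probability of the positive ("approved") class.
approve_proba = model.predict_proba(pending[features])[:, 1]
pending["decision"] = "manual review"
pending.loc[approve_proba >= 0.95, "decision"] = "auto-approve"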


Tracking Errors In Power BI

Reza Rad has a lengthy post covering how you can track errors in Power Query:

To build a robust BI system, you need to cater for errors and handle them carefully. If you build a reporting solution whose refresh fails every time an error occurs, it is not a robust system. Errors can happen for many reasons. In this post, I’ll show you a way to catch potential errors in Power Query and how to build an exception report page to visualize the error rows for further investigation. The method you learn here will save your model from failing at refresh time: you get the dataset updated, and you can catch any rows that caused errors in an exception report page. To learn more about Power BI, read the Power BI from Rookie to Rock Star book.

There’s a lot of work, but also a lot of value in doing that work.
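
Reza does this with Power Query’s try ... otherwise; as a rough analogue of the same pattern in pandas rather than M, you can keep bad rows from failing the whole load by splitting them into their own exception table (the file and column names are placeholders):

import pandas as pd

raw = pd.read_csv("sales.csv", dtype=str)

# Convert without raising: rows that fail become NaT/NaN instead of errors.
converted = raw.assign(
    OrderDate=pd.to_datetime(raw["OrderDate"], errors="coerce"),
    Amount=pd.to_numeric(raw["Amount"], errors="coerce"),
)

# Failed rows go to an exception table for investigation; the clean rows
# carry on, so the refresh of the main dataset never falls over.
failed = converted["OrderDate"].isna() | converted["Amount"].isna()
exceptions = raw[failed]
clean = converted[~failed]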


Using Power BI DMVs

Kasper de Jonge shows off a few Power BI dynamic management views:

Today a quick one that I came across while writing a different blog post that I will publish later. I know we have talked about it again and again, but a good best practice is to remove any high cardinality columns you don’t necessarily need. This trick is not new and has been blogged about before in different places, but I wanted to emphasize it again due to its importance.

I was finishing up my next blog post and wanted to upload the sample file. While doing that I noticed the file was 150MB large. That is rather large for such a simple file. The largest table has 500,000 rows and none of them are unique. What is going on?

There are some interesting DMVs in Kasper’s post, including one which shows cardinality by column.
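
Kasper’s queries run against the model’s $SYSTEM DMVs; if you just want a quick read on cardinality from the source side before the data ever reaches Power BI, a couple of lines of pandas give a similar picture (the file name is a placeholder):

import pandas as pd

df = pd.read_csv("fact_sales.csv")

# Distinct values per column: high-cardinality columns (IDs, timestamps,
# free text) are the ones that inflate the compressed model the most.
cardinality = df.nunique().sort_values(ascending=False)
print(cardinality.head(10))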


Using Query Store To Force Plans With Plan Guides On Them Already

Grant Fritchey creates a plan guide and then forces the plan in Query Store:

If I look at the plan that is stored in Query Store, I’ll see the identical plan up above, including the PlanGuideDB and PlanGuideName properties.

So, let’s force the plan using the values returned from the query above:

Now, when we run the query, we’ll see both the plan guide in use and that the plan is forced (see this earlier blog post explaining this behavior). This is all expected behavior.

Check it out to see how SQL Server behaves.


Working With Strings In Base R

Jozef Hajnala shows us that you don’t need stringr to do cool things with strings in R:

This post aims to serve as an overview of the functionality provided by base R to work with strings. Note that the term "string" is used somewhat loosely and refers to character vectors and character strings. In R documentation, references to a character string refer to a character vector of length 1.

Also since this is an overview, we will not examine the details of the functions, but rather list examples with simple, intuitive explanations trading off technical precision.

As much as I like the tidyverse for its approach to R, which is friendly to data platform professionals, it is good to know the base libraries (and other alternatives) as well. H/T R-Bloggers


Quick Geospatial Data Plots In R And Python

Harry McLellan shows us how we can use R and Python to generate quick-and-dirty plots of geospatial data:

Now R has some useful packages like ggmap, mapdata and ggplot2 which allow you to source your map satellite images directly from Google Maps, but this does require a free Google API key to source from the cloud. These packages can also plot the map around the data as I am currently trimming the map to fit the data. But for a fair test I also used a simplistic pre-built map in R. This was from the package rworldmap, which allows plotting at a country level with defined borders. Axes can be scaled to act like a zoom function, but without a higher resolution map or raster satellite image map it is pointless to go past a country level.

There’s a lot more you can do with both languages, but when you just want a plot in a few lines of code, both are up to the task.
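
For the "few lines of code" case on the Python side, even a bare matplotlib scatter of longitude against latitude will do; the CSV and column names below are placeholders:

import pandas as pd
import matplotlib.pyplot as plt

points = pd.read_csv("locations.csv")  # expects 'longitude' and 'latitude' columns

# No basemap, no API key: just plot where the points fall. Often that is
# all a quick-and-dirty sanity check of geospatial data needs.
plt.scatter(points["longitude"], points["latitude"], s=5, alpha=0.5)
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.title("Quick look at point locations")
plt.show()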


Using K-Means Clustering For Anomaly Detection

Anais Dotis-Georgiou gives us an interesting use case of using k-means clustering along with InfluxDB (a time-series database) to detect anomalies in EKG data:

If you read Part Two, then you know these are the steps I used for anomaly detection with K-means:

  1. Segmentation – the process of splitting your time series data into small segments with a horizontal translation.

  2. Windowing – the action of multiplying your segmented data by a windowing function to truncate the dataset before and after the window. The term windowing gets its name from its functionality: it allows you to only see the data in the window range since everything before and after (or outside the window) is multiplied by zero. Windowing allows you to seamlessly stitch your reconstructed data together.

  3. Clustering – the task of grouping similar windowed segments and finding the centroids in the clusters. A centroid is at the center of a cluster. Mathematically, it is defined by the arithmetic mean position of all the points in the cluster.

  4. Reconstruction – the process of rebuilding your time series data. Essentially, you are matching your normal time series data to the closest centroid (the predicted centroid) and stitching those centroids together to produce the reconstructed data.

  5. Normal Error – The purpose of the Reconstruction is to calculate the normal error associated with the output of your time series prediction.

  6. Anomaly Detection – Since you know what the normal error for reconstruction is, you can now use it as a threshold for anomaly detection. Any reconstruction error above that normal error can be considered an anomaly.

Read the whole thing.  This is a really cool use case of a set of technologies along with a venerable (if sometimes troublesome) algorithm.
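
Those steps map fairly directly onto scikit-learn. The sketch below is a simplified version of the approach (it skips the windowing function and uses made-up segment width, cluster count and file names), but it shows how segmentation, clustering, reconstruction error and the anomaly threshold fit together:

import numpy as np
from sklearn.cluster import KMeans

def segment(series, width, step):
    # Segmentation: split a 1-D series into overlapping windows.
    return np.array([series[i:i + width]
                     for i in range(0, len(series) - width + 1, step)])

normal = np.load("normal_ecg.npy")      # signal known to be healthy
incoming = np.load("incoming_ecg.npy")  # signal to score

width, step = 32, 4
train_segments = segment(normal, width, step)

# Clustering: the centroids summarise the normal waveform shapes.
km = KMeans(n_clusters=30, random_state=0).fit(train_segments)

def reconstruction_error(segments, km):
    # Reconstruction: replace each segment with its nearest centroid and
    # measure how far the original segment sits from it.
    nearest = km.cluster_centers_[km.predict(segments)]
    return np.linalg.norm(segments - nearest, axis=1)

# Normal error: the worst reconstruction error seen on healthy data.
threshold = reconstruction_error(train_segments, km).max()

# Anomaly detection: anything that reconstructs worse than normal is flagged.
test_error = reconstruction_error(segment(incoming, width, step), km)
anomalies = np.where(test_error > threshold)[0]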


Running Windows On Kubernetes

Chris Adkin shows us how you can run the Windows OS on Kubernetes (with some limitations):

With the caveat that this information is correct at the time of writing, the following points should be noted:

  • The control plane and worker nodes should always be on the same release of Kubernetes.
  • The control plane can only run on Linux.
  • Windows Server 2016 RTM is the minimum version required for worker nodes, but version 1709 is preferred.

A full list of restrictions can be found here.

But here is something particularly significant for anyone wishing to deploy highly available SQL Server infrastructures to Kubernetes via availability groups:

Read on for that particularly significant limitation; it’s a doozy.


Using Power BI Dataflows To Create Common Data Sets

Alexander Arvidsson shares an interesting use case for Power BI Dataflows:

There are several more use cases for a dataflow, but one that is very useful is the ability to share a dataset between apps. Previously we had to duplicate the dataset to each and every app that needed to use it, increasing the risk that one dataset was ignored, not refreshed properly or otherwise out of sync with reality. By using dataflows we can have several apps rely on the same dataflow (via a dataset), and thus it is quite possible to have a “master dataset”.

Click through for a walkthrough, as well as an understanding of the process’s limitations.
