Press "Enter" to skip to content

Curated SQL Posts

Building A Convolutional Neural Network With TensorFlow

Anirudh Rao walks us through Convolutional Neural Networks in TensorFlow:

What Are Convolutional Neural Networks?

Convolutional Neural Networks, like neural networks, are made up of neurons with learnable weights and biases. Each neuron receives several inputs, takes a weighted sum over them, pass it through an activation function and responds with an output.

The whole network has a loss function and all the tips and tricks that we developed for neural networks still apply on Convolutional Neural Networks.

Pretty straightforward, right?

Neural networks, as its name suggests, is a machine learning technique which is modeled after the brain structure. It comprises of a network of learning units called neurons.

These neurons learn how to convert input signals (e.g. picture of a cat) into corresponding output signals (e.g. the label “cat”), forming the basis of automated recognition.

Let’s take the example of automatic image recognition. The process of determining whether a picture contains a cat involves an activation function. If the picture resembles prior cat images the neurons have seen before, the label “cat” would be activated.

Hence, the more labeled images the neurons are exposed to, the better it learns how to recognize other unlabelled images. We call this the process of training neurons.

I (finally) finished chapter 5 of Deep Learning in R, which is all about CNNs.  It’s interesting just how open CNNs are for post hoc understanding, totally at odds with the classic neural network reputation for being a black box full of dark magic.

Comments closed

Building A Gantt Chart With Plotly

Ellen Talbot shows us how to embrace our inner micromanagers:

Something a little different today for a quick chat about my latest project and why I’m finding the plotly package so helpful!

Are you like me and physically can’t function unless you’ve got a to do list in front of you? Well even if you’re not, imagine my pain while I’m wearing my non – Locke Data hat and trying to plan out the final year of my PhD thesis!

I needed something that updated easily, something visual and something to keep my supervisors in the know. I’ve previously made gantt charts using LaTeX but found it ridiculously clunky to get working and decided there had to be a better way. And if I could include interactivity then all the better, which is how I discovered plotly.

Admittedly, I like gantt charts more than almost any developer I’ve ever met.  They always look so pretty and are wonderful depictions of a world which will never be.

Comments closed

Using Azure ML To Approve Expenses Automatically

Isabelle Van Campenhoudt walks us through a scenario of using Azure ML to find expense reports which should automatically be approved, reducing the workload for approvers:

My partner in crime Serge Luca aka Doctor Flow is the author of a nice and complex expenses approval system in Microsoft Flow .
One year ago, he asked me to add analytics to his Flow.  This year he has the interesting idea to add a machine-learning based approval in his flow and suggest me to work on it. The idea is the following: Since we have a lot of approvals in our system, can a machine learn and found some decision pattern to apply automatically to each expenses request ?
I decided to use the Microsoft Azure Machine Learning Studio. In this tool you can build experiments and use some of the most common and useful machine learning algorithms. It was amazing to see how easy it is to create and consume machine learning .

This contrasts with Ginger Grant’s nightmare scenario pretty well:  instead of trying to get the ML process to do all of the work, create a process which takes care of the really easy stuff and leave harder tasks to specialists with a deeper understanding of the rules.  That way they don’t have to spend their time on trivialities.

Comments closed

Tracking Errors In Power BI

Reza Rad has a lengthy post covering how you can track errors in Power Query:

To build a robust BI system, you need to cater for errors and handle errors carefully. If you build a reporting solution that the refresh of that fails everytime an error occurs, it is not a robust system. Errors can happen by many reasons, In this post, I’ll show you a way to catch potential errors in Power Query and how to build an exception report page to visualize the error rows for further investigation. The method that you learn here, will save your model from failing at the time of refresh. Means you get the dataset updated, and you can catch any rows caused the error in an exception report page. To learn more about Power BI, read Power BI book from Rookie to Rock Star.

There’s a lot of work, but also a lot of value in doing that work.

Comments closed

Using Power BI DMVs

Kasper de Jonge shows off a few Power BI dynamic management views:

Today a quick one that I came across while writing a different blog post that I will blog later. I know we have talked about it again and again but a good best practice is to remove any high carnality columns you don’t necessary need. This trick is not new and has been blogged about before in different places but I wanted to emphasize it again due to the importance.

I was finishing up my next blog post and wanted to upload the sample file. While doing that I noticed the file was 150MB large. That is rather large for such a simple file. The largest table has 500,000 rows and none of them are unique. What is going on?

There are some interesting DMVs in Kasper’s post, including one which shows cardinality by column.

Comments closed

Using Query Store To Force Plans With Plan Guides On Them Already

Grant Fritchey creates a plan guide and then forces the plan in Query Store:

If I look at the plan that is stored in Query Store, I’ll see the identical plan up above, including the PlanGuideDB and PlanGuideName properties.

So, let’s force the plan using the values returned from the query above:

Now, when we run the query, we’ll see both the plan guide in use and that the plan is forced (see this earlier blog post explaining this behavior). This is all expected behavior.

Check it out to see how SQL Server behaves.

Comments closed

Working With Strings In Base R

Jozef Hajnala shows us that you don’t need stringr to do cool things with strings in R:

This post is aimed to serve as an overview of functionality provided by base R to work with strings. Note that the term “string” is used somewhat loosely and refers to character vectors and character strings. In R documentation, references to character string, refer to character vectors of length 1.

Also since this is an overview, we will not examine the details of the functions, but rather list examples with simple, intuitive explanations trading off technical precision.

As much as I like the tidyverse for its data platform professional-friendly approach to R, it is good to know the base libraries (and other alternatives) as well.  H/T R-Bloggers

Comments closed

Quick Geospatial Data Plots In R And Python

Harry McLellan shows us how we can use R and Python to generate quick-and-dirty plots of geospatial data:

Now R has some useful packages like ggmap, mapdata and ggplot2 which allow you to source you map satellite images directly from google maps, but this does require a free google API key to source from the cloud. These packages can also plot the map around the data as I am currently trimming the map to fit the data. But for a fair test I also used a simplistic pre-built map in R. This was from the package rworldmap, which allows plotting at a country level with defined borders. Axes can be scaled to act like a zoom function but without a higher resolutions map or raster satellite image map it is pointless to go past a country level.

There’s a lot more you can do with both languages, but when you just want a plot in a few lines of code, both are up to the task.

Comments closed

Using K-Means Clustering For Anomaly Detection

Anais Dotis-Georgiou gives us an interesting use case of using k-means clustering along with InfluxDB (a time-series database) to detect anomalies in EKG data:

If you read Part Two, then you know these are the steps I used for anomaly detection with K-means:

  1. Segmentation – the process of splitting your time series data into small segments with a horizontal translation.

  2. Windowing – the action of multiplying your segmented data by a windowing function to truncate the dataset before and after the window. The term windowing gets its name from its functionality: it allows you to only see the data in the window range since everything before and after (or outside the window) is multiplied by zero. Windowing allows you to seamlessly stitch your reconstructed data together.

  3. Clustering – the task of grouping similar windowed segments and finding the centroids in the clusters. A centroid is at the center of a cluster. Mathematically, it is defined by the arithmetic mean position of all the points in the cluster.

  4. Reconstruction – the process of rebuilding your time series data. Essentially, you are matching your normal time series data to the closest centroid (the predicted centroid) and stitching those centroids together to produce the reconstructed data.

  5. Normal Error – The purpose of the Reconstruction is to calculate the normal error associated with the output of your time series prediction.

  6. Anomaly Detection – Since you know what the normal error for reconstruction is, you can now use it as a threshold for anomaly detection. Any reconstruction error above that normal error can be considered an anomaly.

Read the whole thing.  This is a really cool use case of a set of technologies along with a venerable (if sometimes troublesome) algorithm.

Comments closed

Running Windows On Kubernetes

Chris Adkin shows us how you can run the Windows OS on Kubernetes (with some limitations):

With the caveat that this information is correct at the time of writing, the following points should be noted:

  • The control plane and worker nodes should always be on the same release of Kubernetes.
  • The control plane can only run on Linux
  • The minimum version of Windows 2016 RTM is required for worker nodes, but version 1709 is preferred.

A full list of restriction can be found here.

But here is something particularly significant for anyone wishing to deploy highly available SQL Server infrastructures to Kubernetes via availability groups:

Read on for that particularly significant limitation; it’s a doozy.

Comments closed