2018-11-27 – Curated SQL

Spark MLflow 0.8.0 Released

Published 2018-11-27 by Kevin Feasel

Aaron Davidson and Jules Damji announce MLflow 0.8.0 on the Spark platform:

Improved MLflow UI Experience

Compact Display for Metrics and Parameters: To avoid clutter and an explosion of columns for each metric or parameter, now we group them together in a single tabular column by default. That way, each runs’ parameters and metrics are listed nearby. Users can still click each parameter or metric to display it in a separate column or sort by it and customize their view this way.
Nesting Runs: For nested MLflow runs, which are common in hyperparameter search or multi-step workflows, the UI will display a collapsible tree underneath each parent run. This makes it much easier to organize and visualize multi-step workflows.
Labeling Runs: While MLflow gives each run a UUID by default, you can also now assign each run a name through the API. These names can also be edited in the UI.
UI Persistence: The MLflow UI now remembers your filters, sorting and column setup in browser local storage so you no longer need to reconfigure the view each time.

Looks like there are some nice additions here.

Comments closed

Disaster Recovery With Kafka Deployments

Published 2018-11-27 by Kevin Feasel

Yeva Byzek walks us through a disaster recovery scenario when running Apache Kafka:

Imagine:

Disaster strikes—catastrophic hardware failure, software failure, power outage, denial of service attack or some other event causes one datacenter with an Apache Kafka^® cluster to completely fail. Yet Kafka continues running in another datacenter, and it already has a copy of the data from the original datacenter, replicated to and from the same topic names. Client applications switch from the failed cluster to the running cluster and automatically resume data consumption in the new datacenter based on where it left off in the original datacenter. The business has minimized downtime and data loss resulting from the disaster, and continues to run its mission critical applications.

Ultimately, enabling the business to continue running is what disaster recovery planning is all about, as datacenter downtime and data loss can result in businesses losing revenue or entirely halting operations. To minimize the downtime and data loss resulting from a disaster, enterprises should create business continuity plans and disaster recovery strategies.

Distributed data sources can still succumb to disaster and many of the same policies that people learn when working with relational databases apply to things like Kafka as well.

Comments closed

Building A Convolutional Neural Network With TensorFlow

Published 2018-11-27 by Kevin Feasel

Anirudh Rao walks us through Convolutional Neural Networks in TensorFlow:

What Are Convolutional Neural Networks?

Convolutional Neural Networks, like neural networks, are made up of neurons with learnable weights and biases. Each neuron receives several inputs, takes a weighted sum over them, pass it through an activation function and responds with an output.

The whole network has a loss function and all the tips and tricks that we developed for neural networks still apply on Convolutional Neural Networks.

Pretty straightforward, right?

Neural networks, as its name suggests, is a machine learning technique which is modeled after the brain structure. It comprises of a network of learning units called neurons.

These neurons learn how to convert input signals (e.g. picture of a cat) into corresponding output signals (e.g. the label “cat”), forming the basis of automated recognition.

Let’s take the example of automatic image recognition. The process of determining whether a picture contains a cat involves an activation function. If the picture resembles prior cat images the neurons have seen before, the label “cat” would be activated.

Hence, the more labeled images the neurons are exposed to, the better it learns how to recognize other unlabelled images. We call this the process of training neurons.

I (finally) finished chapter 5 of Deep Learning in R, which is all about CNNs. It’s interesting just how open CNNs are for post hoc understanding, totally at odds with the classic neural network reputation for being a black box full of dark magic.

Comments closed

Building A Gantt Chart With Plotly

Published 2018-11-27 by Kevin Feasel

Ellen Talbot shows us how to embrace our inner micromanagers:

Something a little different today for a quick chat about my latest project and why I’m finding the plotly package so helpful!

Are you like me and physically can’t function unless you’ve got a to do list in front of you? Well even if you’re not, imagine my pain while I’m wearing my non – Locke Data hat and trying to plan out the final year of my PhD thesis!

I needed something that updated easily, something visual and something to keep my supervisors in the know. I’ve previously made gantt charts using LaTeX but found it ridiculously clunky to get working and decided there had to be a better way. And if I could include interactivity then all the better, which is how I discovered plotly.

Admittedly, I like gantt charts more than almost any developer I’ve ever met. They always look so pretty and are wonderful depictions of a world which will never be.

Comments closed

Using Azure ML To Approve Expenses Automatically

Published 2018-11-27 by Kevin Feasel

Isabelle Van Campenhoudt walks us through a scenario of using Azure ML to find expense reports which should automatically be approved, reducing the workload for approvers:

My partner in crime Serge Luca aka Doctor Flow is the author of a nice and complex expenses approval system in Microsoft Flow .
One year ago, he asked me to add analytics to his Flow. This year he has the interesting idea to add a machine-learning based approval in his flow and suggest me to work on it. The idea is the following: Since we have a lot of approvals in our system, can a machine learn and found some decision pattern to apply automatically to each expenses request ?
I decided to use the Microsoft Azure Machine Learning Studio. In this tool you can build experiments and use some of the most common and useful machine learning algorithms. It was amazing to see how easy it is to create and consume machine learning .

This contrasts with Ginger Grant’s nightmare scenario pretty well: instead of trying to get the ML process to do all of the work, create a process which takes care of the really easy stuff and leave harder tasks to specialists with a deeper understanding of the rules. That way they don’t have to spend their time on trivialities.

Comments closed

Tracking Errors In Power BI

Published 2018-11-27 by Kevin Feasel

Reza Rad has a lengthy post covering how you can track errors in Power Query:

To build a robust BI system, you need to cater for errors and handle errors carefully. If you build a reporting solution that the refresh of that fails everytime an error occurs, it is not a robust system. Errors can happen by many reasons, In this post, I’ll show you a way to catch potential errors in Power Query and how to build an exception report page to visualize the error rows for further investigation. The method that you learn here, will save your model from failing at the time of refresh. Means you get the dataset updated, and you can catch any rows caused the error in an exception report page. To learn more about Power BI, read Power BI book from Rookie to Rock Star.

There’s a lot of work, but also a lot of value in doing that work.

Comments closed

Using Power BI DMVs

Published 2018-11-27 by Kevin Feasel

Kasper de Jonge shows off a few Power BI dynamic management views:

Today a quick one that I came across while writing a different blog post that I will blog later. I know we have talked about it again and again but a good best practice is to remove any high carnality columns you don’t necessary need. This trick is not new and has been blogged about before in different places but I wanted to emphasize it again due to the importance.

I was finishing up my next blog post and wanted to upload the sample file. While doing that I noticed the file was 150MB large. That is rather large for such a simple file. The largest table has 500,000 rows and none of them are unique. What is going on?

There are some interesting DMVs in Kasper’s post, including one which shows cardinality by column.

Comments closed

Using Query Store To Force Plans With Plan Guides On Them Already

Published 2018-11-27 by Kevin Feasel

Grant Fritchey creates a plan guide and then forces the plan in Query Store:

If I look at the plan that is stored in Query Store, I’ll see the identical plan up above, including the PlanGuideDB and PlanGuideName properties.

So, let’s force the plan using the values returned from the query above:

1

EXEC sys.sp_query_store_force_plan 6,7;

Now, when we run the query, we’ll see both the plan guide in use and that the plan is forced (see this earlier blog post explaining this behavior). This is all expected behavior.

Check it out to see how SQL Server behaves.

Comments closed

M	T	W	T	F	S	S
			1	2	3	4
5	6	7	8	9	10	11
12	13	14	15	16	17	18
19	20	21	22	23	24	25
26	27	28	29	30

Day: November 27, 2018

Improved MLflow UI Experience

What Are Convolutional Neural Networks?