
Month: November 2018

Using R In Power BI For More Than Displaying Visuals

Patrick Mahoney shows us that you can do more with the R Visual component in Power BI than display visuals:

If you really like a certain R visual, you can also package it as a pbiviz file to share with others. Once you set up the foundation to create the first pbiviz, it is easy to crank out many more just by replacing the R code and repackaging it (into a different pbiviz file). See instructions here.

But this post isn’t about making charts. It turns out you can hijack the R visual to do lots of other things too. Below are a few examples:

Note: I am no R expert. The examples below are relatively simple and cobbled together from similar examples online. They may be a little clunky, but are worth it, in my opinion, to be able to dynamically leverage many more of R's capabilities through Power BI.

Read on for some interesting examples.
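
The trick rests on how script visuals work: Power BI hands the script the filtered data as a dataframe named dataset, and the script merely has to render something. Below is a hedged sketch of that "hijack" pattern using Power BI's Python script visual rather than the post's R visual; the export path is a hypothetical placeholder.

```python
# Sketch of a Power BI Python script visual that does more than chart.
# Power BI injects the visual's fields as a pandas DataFrame named `dataset`,
# so this will not run standalone; the export path is made up.
import matplotlib.pyplot as plt

# Side effect: dump the currently filtered rows to disk.
dataset.to_csv(r"C:\exports\filtered_rows.csv", index=False)

# The visual still has to render something, so plot a status message.
fig, ax = plt.subplots()
ax.text(0.5, 0.5, f"Exported {len(dataset)} rows",
        ha="center", va="center", fontsize=14)
ax.axis("off")
plt.show()
```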


When Structured Data Makes Sense In A Data Lake

James Serra ponders when it makes sense to add structured data to a data lake:

I would not say it’s common place to load structured data into the data lake, but I do see it frequently.

In most cases it is not necessary to copy relational source data into the data lake first and then into the data warehouse, especially when you keep in mind the effort of migrating existing ETL jobs that already copy source data into the data warehouse. But there are some good use cases for doing just that:

There are some good reasons in here, so check them out.


Fixing Issues Related To Filtered Indexes

Kevin Chant looks at a few problems that can pop up when using filtered indexes:

In a past post here I did an overview of different index types. I said in that post that I think filtered indexes could be more popular. In this post I will cover fixing some of the problems caused when you first introduce rowstore filtered indexes to a SQL Server database.

Some of you have probably been there already. You’ve put your first filtered index on a database, only to find that an issue appears. I’ve witnessed these issues at a few places. This will hopefully reduce the pain.

I’ve definitely experienced the third issue (which also pops up when using parameterized queries, so the optimizer doesn’t know that it can use the filtered index), but never the first two.
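
To make that third issue concrete, here is a minimal sketch via pyodbc; the table, filtered index, and connection string are all hypothetical. The literal predicate can be matched against the index filter at compile time, while the parameterized plan must be valid for any parameter value, so the optimizer cannot commit to the filtered index.

```python
# Hypothetical setup, assuming this filtered index exists on dbo.Orders:
#   CREATE INDEX IX_Orders_Open ON dbo.Orders (OrderDate)
#   WHERE Status = 'Open';
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=localhost;Database=Sales;Trusted_Connection=yes;"
)
cur = conn.cursor()

# Literal predicate: the optimizer sees it matches the filter,
# so the filtered index is a candidate.
cur.execute("SELECT OrderDate FROM dbo.Orders WHERE Status = 'Open'")

# Parameterized predicate: the cached plan must work for any value,
# so the filtered index generally cannot be used.
cur.execute("SELECT OrderDate FROM dbo.Orders WHERE Status = ?", "Open")
```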


Migrating Queries To Power BI Dataflows

Matt Allington shows us how to move a query from Power BI desktop into a Power BI Dataflow:

Power Query is a user-friendly ETL tool (Extract, Transform and Load).  Traditionally ETL has been done using more complicated tools (such as SQL Server Integration Services – SSIS) and the resulting data is stored in a data mart or data warehouse for consumption by anyone that needs a standard view of the data.  Power BI Desktop can consume tables directly from a data warehouse and simply load the table into Power BI – dead easy.  But Power Query is also a powerful ETL tool in its own right, and it can be used to transform and reshape the source data directly inside Power BI Desktop (and then PowerBI.com).  This is very useful if:

  1. You don’t have a data warehouse and/or
  2. You need some variation of what is in your data warehouse.
  3. You have other data sources that are not in a data warehouse but are still important to you.

Taking this approach (manipulate in Power Query) is perfectly fine if you have a single workbook, but what if you have 10 similar workbooks all needing the same transformation?  Worse still, what if you are one of many people in a company all doing the same thing with multiple workbooks?

Read on for the solution.


Creating Custom Power BI Themes

Cathrine Wilhelmsen shows us how to create a custom theme for Power BI dashboards:

Power BI comes with several built-in themes and a whole gallery full of custom themes available for download. But what if you still can’t find the perfect look for your reports? No problem! Just create your own custom Power BI themes 🙂

…sounds simple enough, right? It only takes a few minutes to create a custom Power BI theme with a color palette of your choice. Whoosh – instant custom branding!

But if you are like me, simple color changes might not be enough. Maybe you want finer control of borders, fonts, labels, or other visual elements. Or maybe you just don’t want to keep changing the same settings over and over and over again in multiple visualizations and reports. (Please don’t do that.)

You can control all of these things in custom Power BI themes. It is, however, not quite as simple as creating a color palette… yet. (You never know when the Power BI product team will blow your mind with a new update!) But for now, we need to define custom themes in JSON files.

Click through to learn how to do some of these changes through the power of editing JSON files.
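
As a hedged sketch of the idea, a theme is just a JSON document, and you can even generate one programmatically. The top-level properties below (name, dataColors, background, foreground, tableAccent) and the visualStyles wildcard structure follow the documented theme format, but every color and setting here is invented.

```python
# Generate a minimal (hypothetical) Power BI theme file.
import json

theme = {
    "name": "Company Brand",                          # theme name shown in Power BI
    "dataColors": ["#1F77B4", "#FF7F0E", "#2CA02C"],  # chart color palette
    "background": "#FFFFFF",
    "foreground": "#252423",
    "tableAccent": "#1F77B4",
    # Finer control: visualStyles sets properties per visual type;
    # "*" is a wildcard meaning "all visuals" / "all style variants".
    "visualStyles": {
        "*": {"*": {"title": [{"fontSize": 12}]}}
    },
}

with open("company-theme.json", "w") as f:
    json.dump(theme, f, indent=2)
```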


Spark MLflow 0.8.0 Released

Aaron Davidson and Jules Damji announce MLflow 0.8.0 on the Spark platform:

Improved MLflow UI Experience

  1. Compact Display for Metrics and Parameters: To avoid clutter and an explosion of columns for each metric or parameter, we now group them together in a single tabular column by default. That way, each run's parameters and metrics are listed nearby. Users can still click each parameter or metric to display it in a separate column, or sort by it, and customize their view this way.

  2. Nesting Runs: For nested MLflow runs, which are common in hyperparameter search or multi-step workflows, the UI will display a collapsible tree underneath each parent run. This makes it much easier to organize and visualize multi-step workflows.

  3. Labeling Runs: While MLflow gives each run a UUID by default, you can also now assign each run a name through the API. These names can also be edited in the UI.

  4. UI Persistence: The MLflow UI now remembers your filters, sorting and column setup in browser local storage so you no longer need to reconfigure the view each time.

Looks like there are some nice additions here.
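
A minimal sketch of the run naming and nesting described above, assuming a recent version of the MLflow Python API:

```python
import mlflow

# Parent run with a human-readable name instead of just a UUID.
with mlflow.start_run(run_name="hyperparameter-search"):
    for lr in (0.01, 0.1):
        # Child runs nest under the parent and collapse in the UI tree.
        with mlflow.start_run(run_name=f"lr={lr}", nested=True):
            mlflow.log_param("learning_rate", lr)
            mlflow.log_metric("accuracy", 0.90)  # placeholder metric value
```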


Disaster Recovery With Kafka Deployments

Yeva Byzek walks us through a disaster recovery scenario when running Apache Kafka:

Imagine:

Disaster strikes—catastrophic hardware failure, software failure, power outage, denial of service attack or some other event causes one datacenter with an Apache Kafka® cluster to completely fail. Yet Kafka continues running in another datacenter, and it already has a copy of the data from the original datacenter, replicated to and from the same topic names. Client applications switch from the failed cluster to the running cluster and automatically resume data consumption in the new datacenter based on where it left off in the original datacenter. The business has minimized downtime and data loss resulting from the disaster, and continues to run its mission critical applications.

Ultimately, enabling the business to continue running is what disaster recovery planning is all about, as datacenter downtime and data loss can result in businesses losing revenue or entirely halting operations. To minimize the downtime and data loss resulting from a disaster, enterprises should create business continuity plans and disaster recovery strategies.

Distributed data sources can still succumb to disaster, and many of the same policies people learn when working with relational databases apply to systems like Kafka as well.
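
As a rough sketch of that client-side failover, assuming the confluent-kafka Python client and that the consumer group's committed offsets were already replicated to the secondary cluster; the addresses, topic, and handler below are hypothetical:

```python
from confluent_kafka import Consumer, KafkaException

PRIMARY = "kafka-dc1.example.com:9092"    # hypothetical cluster addresses
SECONDARY = "kafka-dc2.example.com:9092"

def process(msg):
    print(msg.topic(), msg.value())       # application-specific handling

def consume_from(bootstrap_servers):
    consumer = Consumer({
        "bootstrap.servers": bootstrap_servers,
        "group.id": "orders-app",         # same group id in both datacenters
        "auto.offset.reset": "earliest",  # fallback if no committed offset
    })
    consumer.subscribe(["orders"])
    try:
        while True:
            msg = consumer.poll(1.0)
            if msg is None:
                continue
            if msg.error():
                raise KafkaException(msg.error())
            process(msg)
    finally:
        consumer.close()

try:
    consume_from(PRIMARY)
except KafkaException:
    # Disaster in DC1: switch to DC2 and resume from replicated offsets.
    consume_from(SECONDARY)
```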


Building A Convolutional Neural Network With TensorFlow

Anirudh Rao walks us through Convolutional Neural Networks in TensorFlow:

What Are Convolutional Neural Networks?

Convolutional Neural Networks, like neural networks, are made up of neurons with learnable weights and biases. Each neuron receives several inputs, takes a weighted sum over them, passes it through an activation function, and responds with an output.

The whole network has a loss function, and all the tips and tricks that we developed for neural networks still apply to Convolutional Neural Networks.

Pretty straightforward, right?

Neural networks, as the name suggests, are a machine learning technique modeled after the structure of the brain. They comprise a network of learning units called neurons.

These neurons learn how to convert input signals (e.g. picture of a cat) into corresponding output signals (e.g. the label “cat”), forming the basis of automated recognition.

Let’s take the example of automatic image recognition. The process of determining whether a picture contains a cat involves an activation function. If the picture resembles cat images the neurons have seen before, the label “cat” would be activated.

Hence, the more labeled images the neurons are exposed to, the better they learn how to recognize other, unlabeled images. We call this the process of training neurons.

I (finally) finished chapter 5 of Deep Learning in R, which is all about CNNs.  It’s interesting just how open CNNs are for post hoc understanding, totally at odds with the classic neural network reputation for being a black box full of dark magic.
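
For a concrete feel for the layer structure, here is a minimal CNN in tf.keras; the input shape and class count are hypothetical placeholders rather than the article's exact network:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    # Convolution + pooling blocks learn local image features.
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    # Dense layers map the extracted features to class scores.
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),  # one output per label, e.g. "cat"
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```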


Building A Gantt Chart With Plotly

Ellen Talbot shows us how to embrace our inner micromanagers:

Something a little different today for a quick chat about my latest project and why I’m finding the plotly package so helpful!

Are you like me and physically can’t function unless you’ve got a to-do list in front of you? Well even if you’re not, imagine my pain while I’m wearing my non-Locke Data hat and trying to plan out the final year of my PhD thesis!

I needed something that updated easily, something visual and something to keep my supervisors in the know. I’ve previously made Gantt charts using LaTeX but found it ridiculously clunky to get working and decided there had to be a better way. And if I could include interactivity then all the better, which is how I discovered plotly.

Admittedly, I like Gantt charts more than almost any developer I’ve ever met. They always look so pretty and are wonderful depictions of a world which will never be.
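
The post works in the R plotly package, but the same idea sketched in Python's plotly (version 4 or later) looks roughly like this; the tasks and dates are invented:

```python
import plotly.figure_factory as ff

# Each task needs a name plus start and finish dates.
tasks = [
    dict(Task="Literature review", Start="2018-11-01", Finish="2018-12-15"),
    dict(Task="Experiments",       Start="2018-12-01", Finish="2019-03-31"),
    dict(Task="Write thesis",      Start="2019-02-01", Finish="2019-06-30"),
]

fig = ff.create_gantt(tasks)  # builds an interactive Gantt chart
fig.show()
```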


Using Azure ML To Approve Expenses Automatically

Isabelle Van Campenhoudt walks us through a scenario of using Azure ML to find expense reports which should automatically be approved, reducing the workload for approvers:

My partner in crime Serge Luca, aka Doctor Flow, is the author of a nice and complex expense approval system in Microsoft Flow.
One year ago, he asked me to add analytics to his Flow. This year he had the interesting idea of adding machine-learning-based approval to his flow and suggested I work on it. The idea is the following: since we have a lot of approvals in our system, can a machine learn and find a decision pattern to apply automatically to each expense request?
I decided to use Microsoft Azure Machine Learning Studio. In this tool you can build experiments and use some of the most common and useful machine learning algorithms. It was amazing to see how easy it is to create and consume machine learning.

This contrasts with Ginger Grant’s nightmare scenario pretty well:  instead of trying to get the ML process to do all of the work, create a process which takes care of the really easy stuff and leave harder tasks to specialists with a deeper understanding of the rules.  That way they don’t have to spend their time on trivialities.
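
Once a Studio experiment is published as a web service, consuming it from code is one HTTP call. Below is a hedged sketch of the classic request-response pattern; the endpoint URL, API key, and input columns are all hypothetical placeholders:

```python
import requests

SCORING_URL = (
    "https://services.azureml.example.com/workspaces/myws/"
    "services/expense-approval/execute?api-version=2.0"
)  # hypothetical endpoint
API_KEY = "my-api-key"  # hypothetical key

payload = {
    "Inputs": {
        "input1": {
            "ColumnNames": ["Amount", "Category", "Requester"],
            "Values": [["125.40", "Travel", "sluca"]],
        }
    },
    "GlobalParameters": {},
}

resp = requests.post(SCORING_URL, json=payload,
                     headers={"Authorization": f"Bearer {API_KEY}"})
print(resp.json())  # includes the predicted approval decision
```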
