Press "Enter" to skip to content

Day: December 29, 2020

Apache Spark Performance Tuning

Tomaz Kastrun provides a few hints when performance tuning Apache Spark code:

DataFrame versus Datasets versus SQL versus RDD is another choice, yet it is fairly easy. DataFrames, Datasets and SQL objects are all equal in performance and stability (at least from Spar 2.3 and above), meaning that if you are using DataFrames in any language, performance will be the same. Again, when writing custom objects of functions (UDF), there will be some performance degradation with both R or Python, so switching to Scala or Java might be a optimisation.

Read on for the details. My version is “When performance matters the most, be willing to switch to Scala.” It’s not always correct, but is rarely outright bad advice.

Comments closed

The Intuition Behind Averaging

The Stats Guy takes a look at averages:

In this diagram, there are a bunch of numbers and a single question mark. Behind the question, is also a number. The known numbers are the same as in our friend v above.

Our task is as follows:

– Make a guess on what that mystery number could be. And,
– If we can’t get it right, then reduce, as much as possible, the error we incur on our guess.

This is a well-written explanation of an important concept. H/T R-Bloggers

Comments closed

Naive Bayes and Continuous Predictor Variables

Akhila takes us through the intuition of how Naive Bayes works:

Usually we use the e1071 package to build a Naive Bayes classifier in R. And then using this classifier, we make some predictions on the training data.

So probability for these predictions can be directly calculated based on frequency of occurrences if the features are categorical.
But what if, there are features with continuous values? What the Naive Bayes classifier is actually doing behind the scenes to predict the probabilities of continuous data?

Click through for the answer. Also, Naive Bayes isn’t Bayesian, but that’s not important.

Comments closed

Power BI Composite Model V2 Demo

Wolfgang Strasser gives us a walkthrough of DirectQuery for Power BI datasets:

With the December 2020 release of Power BI Desktop, this approach changed. You are now able to change a live connection to a Power BI dataset (or an Azure Analysis Services connection) to DirectQuery mode. Which allows us, to enhance the remote model with new columns, tables, additional datasources and create relationships between the datasources.

Let’s dive deeper into this and look at the story together with a sample.

I’ve seen and linked to several posts talking about the idea, but Wolfgang has a demo going, which makes it easier to follow.

Comments closed

Working with Excel in Powershell

Mikey Bronowski has a festive post:

This blog post is part of the Festive Tech Calendar.

If you want to practice the whole thing I have prepared an interactive notebook for you that could be opened with Azure Data Studio for example (link to the notebook). For more things about the PowerShell module check this post out.

I would like to invite you to the world of magic!

Click through for an image-rich and extremely detailed post.

Comments closed

Inlining KQL in Power Query

Chris Webb shows you how you can include KQL query fragments in Power Query:

If the title wasn’t enough to warn you, this post is only going to be of interest to M ultra-geeks and people using Power BI with Azure Data Explorer – and I know there aren’t many people in either group. However I thought the feature I’m going to show you in this post is so cool I couldn’t resist blogging about it.

Limited in its utility, but still quite interesting.

Comments closed

DirectQuery for Power BI Datasets

James Serra takes us through a new Power BI feature:

Announced last week is a major new feature for Power BI: you can now use DirectQuery to connect to Azure Analysis Services or Power BI Datasets and combine it with other DirectQuery datasets and/or imported datasets. This is a HUGE improvement that has the Power BI community buzzing! Think of it as the next generation of composite models. Note this requires the December version of Power BI Desktop, and you must go to Options -> Preview features and select “DirectQuery for Power BI datasets and Analysis Services”.

Read on for more details.

Comments closed