
Author: Kevin Feasel

Chaining with DirectQuery for Power BI Datasets

Wolfgang Strasser explains the notion of chaining when working with Power BI datasets:

In my last blog post I introduced the new concept of DirectQuery for Power BI datasets. This feature allows you to extend and modify a (remote) published Power BI dataset with the help of a local model.

The local model does not contain a copy of the remote dataset but a reference to it. You, as a Power BI developer, are able to extend the referenced model with new data sources (like the Excel file I used in my previous example) and/or extend the model with new measures, columns and so on. In the new data model, relationships between the two data islands can be created.

Read on for examples of how this can be useful and what the current limitations look like.


Using the Open Source R or Python Runtime with Machine Learning Services

Niels Berglund walks us through using the open source extensibility framework to install R or Python:

When Java became a supported language in SQL Server 2019, Microsoft mentioned that communication between ExternalHost and the language extension should be based on an API, regardless of the external language. The API is the Extensibility Framework API for SQL Server. Having an API ensures simplicity and ease of use for the extension developer.

From the paragraph above, one can assume that Microsoft would like to see 3rd party development of language extensions. That assumption turned out to be accurate: as mentioned above, Microsoft open-sourced the Java language extension, together with the include files for the extension API, in September 2020! This means that anyone interested can now create a language extension for their own favorite language!

However, open sourcing the Java extension was not the only thing Microsoft did. They also created and open-sourced language extensions for R and Python!

Click through for more detail and a walkthrough on installation of Python.


Adding a Database Project to GitHub

Elizabeth Noble shows how you can get your brand new Azure Data Studio project into GitHub:

Once you have the database project created, you’ll want to get your database project added to source control so that you (and others) can modify and manage your database code. This next step is the beginning of allowing you and others to work on the same databases and minimize the risk of overwriting someone else’s work or deploying the wrong code to Production.

Tools like GitHub Desktop and SourceTree have definitely made things easier, especially for the happy path scenarios.


External Table Not Accessible because Content of Directory Cannot be Listed

Liliam Leme troubleshoots an error when working with a serverless SQL pool in Azure Synapse Analytics:

Following this lab: Lab: Serverless Synapse – From Spark to SQL On Demand – Microsoft Tech Community

You may experience this message: 

Failed to execute the query because content of directory cannot be listed

This is due to an extra step required to enable the AAD to pass through the firewall on the storage.

Click through for the solution.


Apache Spark Performance Tuning

Tomaz Kastrun provides a few hints when performance tuning Apache Spark code:

DataFrame versus Datasets versus SQL versus RDD is another choice, yet it is fairly easy. DataFrames, Datasets and SQL objects are all equal in performance and stability (at least from Spark 2.3 and above), meaning that if you are using DataFrames in any language, performance will be the same. Again, when writing custom objects or functions (UDFs), there will be some performance degradation with either R or Python, so switching to Scala or Java might be an optimisation.

Read on for the details. My version is “When performance matters the most, be willing to switch to Scala.” It’s not always correct, but is rarely outright bad advice.


The Intuition Behind Averaging

The Stats Guy takes a look at averages:

In this diagram, there are a bunch of numbers and a single question mark. Behind the question mark is also a number. The known numbers are the same as in our friend v above.

Our task is as follows:

– Make a guess on what that mystery number could be. And,
– If we can’t get it right, then reduce, as much as possible, the error we incur on our guess.

This is a well-written explanation of an important concept. H/T R-Bloggers
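As a rough illustration of the guessing task in the excerpt, here is a minimal sketch that assumes the error is measured as total squared distance from the known numbers (the excerpt does not say which error measure the original post uses). The sample values and the helper names `total_squared_error` and `best_guess_on_grid` are hypothetical. It searches a grid of candidate guesses and compares the winner against the arithmetic mean.

```python
from statistics import mean


def total_squared_error(guess, values):
    """Sum of squared differences between one guess and every known value."""
    return sum((v - guess) ** 2 for v in values)


def best_guess_on_grid(values, step=0.01):
    """Try evenly spaced guesses between min and max; keep the lowest-error one."""
    lo, hi = min(values), max(values)
    n_steps = int(round((hi - lo) / step))
    candidates = [lo + i * step for i in range(n_steps + 1)]
    return min(candidates, key=lambda g: total_squared_error(g, values))


# Hypothetical known numbers; the original post's vector v is not reproduced here.
known = [2.0, 3.0, 5.0, 10.0]
grid_best = best_guess_on_grid(known)

print("grid search guess:", grid_best)
print("arithmetic mean:  ", mean(known))
```

Under squared error the grid search should land on, or right next to, the mean. A different error measure would favour a different guess; absolute error, for example, favours the median.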


Naive Bayes and Continuous Predictor Variables

Akhila takes us through the intuition of how Naive Bayes works:

Usually we use the e1071 package to build a Naive Bayes classifier in R. And then using this classifier, we make some predictions on the training data.

So the probabilities for these predictions can be calculated directly from the frequency of occurrences if the features are categorical.
But what if there are features with continuous values? What is the Naive Bayes classifier actually doing behind the scenes to predict the probabilities of continuous data?

Click through for the answer. Also, Naive Bayes isn’t Bayesian, but that’s not important.
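One common way to handle continuous predictors, and the approach the e1071 documentation describes for metric predictors, is to assume each feature follows a Gaussian distribution within each class. The following is a minimal single-feature sketch of that idea, not a reproduction of the linked post or of e1071's code. The data and the helper names `fit` and `posterior` are hypothetical.

```python
import math
from statistics import mean, stdev


def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma at x."""
    coeff = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))


def fit(rows):
    """rows: list of (feature_value, label). Returns {label: (prior, mean, sd)}."""
    by_class = {}
    for x, label in rows:
        by_class.setdefault(label, []).append(x)
    n = len(rows)
    # Sample standard deviation, so each class needs at least two rows.
    return {
        label: (len(xs) / n, mean(xs), stdev(xs))
        for label, xs in by_class.items()
    }


def posterior(model, x):
    """Class probabilities for x: prior times Gaussian likelihood, normalized."""
    scores = {
        label: prior * gaussian_pdf(x, mu, sigma)
        for label, (prior, mu, sigma) in model.items()
    }
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}


# Hypothetical training data: one continuous feature, two classes.
rows = [(1.0, "a"), (1.2, "a"), (0.8, "a"), (3.0, "b"), (3.3, "b"), (2.7, "b")]
model = fit(rows)
probs = posterior(model, 1.1)
print(probs)
```

With several features, the "naive" independence assumption means the per-feature densities are multiplied together for each class before normalizing.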


Working with Excel in PowerShell

Mikey Bronowski has a festive post:

This blog post is part of the Festive Tech Calendar.

If you want to practice the whole thing, I have prepared an interactive notebook for you that can be opened with Azure Data Studio, for example (link to the notebook). For more about the PowerShell module, check this post out.

I would like to invite you to the world of magic!

Click through for an image-rich and extremely detailed post.


Power BI Composite Model V2 Demo

Wolfgang Strasser gives us a walkthrough of DirectQuery for Power BI datasets:

With the December 2020 release of Power BI Desktop, this approach changed. You are now able to change a live connection to a Power BI dataset (or an Azure Analysis Services connection) to DirectQuery mode. This allows us to enhance the remote model with new columns, tables, and additional data sources, and to create relationships between the data sources.

Let’s dive deeper into this and look at the story together with a sample.

I’ve seen and linked to several posts talking about the idea, but Wolfgang has a demo going, which makes it easier to follow.


DirectQuery for Power BI Datasets

James Serra takes us through a new Power BI feature:

Announced last week is a major new feature for Power BI: you can now use DirectQuery to connect to Azure Analysis Services or Power BI Datasets and combine them with other DirectQuery datasets and/or imported datasets. This is a HUGE improvement that has the Power BI community buzzing! Think of it as the next generation of composite models. Note this requires the December version of Power BI Desktop, and you must go to Options -> Preview features and select “DirectQuery for Power BI datasets and Analysis Services”.

Read on for more details.
