
Month: June 2020

Azure Active Directory and the DatabricksPS Library

Gerhard Brueckl has updated the DatabricksPS library:

Databricks recently announced that it is now also supporting Azure Active Directory Authentication for the REST API which is now in public preview. This may not sound super exciting but is actually a very important feature when it comes to Continuous Integration/Continuous Delivery pipelines in Azure DevOps or any other CI/CD tool. Previously, whenever you wanted to deploy content to a new Databricks workspace, you first needed to manually create a user-bound API access token. As you can imagine, manual steps are also bad for otherwise automated processes like a CI/CD pipeline. With Databricks REST API finally supporting Azure Active Directory Authentication of regular users and service principals, this last manual step is finally also gone!

If you do use Databricks and haven’t tried out DatabricksPS, I highly recommend it. I think it’s a much nicer experience than hitting the REST API directly, particularly because it deals with continuation tokens and making multiple calls to get your results.
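
The new piece is that the REST API accepts an Azure AD bearer token instead of a hand-made personal access token. Here is a rough Python sketch of the client-credentials request a pipeline's service principal could make, assuming the well-known Azure Databricks resource application ID and the v2.0 Microsoft identity platform token endpoint. The function names are just for illustration; DatabricksPS wraps this for you.

```python
from typing import Dict, Tuple
from urllib.parse import urlencode

# Application ID that Azure AD uses for the Azure Databricks resource.
AZURE_DATABRICKS_RESOURCE_ID = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> Tuple[str, str]:
    """Return the URL and form-encoded body for a client-credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # "/.default" requests the permissions already granted to this app registration.
        "scope": f"{AZURE_DATABRICKS_RESOURCE_ID}/.default",
    })
    return url, body


def databricks_headers(access_token: str) -> Dict[str, str]:
    """Headers for a Databricks REST API call using the AAD access token."""
    return {"Authorization": f"Bearer {access_token}"}
```

POST the body to the URL with a Content-Type of application/x-www-form-urlencoded, read access_token from the JSON response, and send databricks_headers(token) with each REST call against the workspace URL. The service principal also needs access to the target workspace, which this sketch leaves out.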

Returning Multiple Values in Power BI with ConcatenateX

Nick Edwards shows how you can use the ConcatenateX DAX function to combine values:

In this blog post we’ll take a quick look at using ConcatenateX function to view a concatenated string of dates where the max daily sales occurred for a given month.

I came across this function whilst going through the excellent “Mastering DAX 2nd Edition Video Course” by the guys from SQLBI.com. So credit to Marco and Alberto for sharing this.

So how does it work? If we had a list of dates ranging from 01/01/2020 to 31/12/2020 and wanted to see on which days we achieved maximum sales for each month of the year, we could use the ConcatenateX function to return these dates in a single row per month.

Click through for the demo.
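
The logic maps onto plain code fairly directly. As a rough analogue (Python rather than DAX, with made-up data and a hypothetical function name), this groups daily sales by month, finds each month's maximum, and joins every date that ties for it into one delimited string, which is the job CONCATENATEX does in the DAX version.

```python
from collections import defaultdict
from datetime import date


def max_sales_dates_by_month(daily_sales: dict[date, float]) -> dict[str, str]:
    """For each month, join every date that hit that month's maximum daily sales."""
    by_month = defaultdict(list)
    for day, amount in daily_sales.items():
        by_month[(day.year, day.month)].append((day, amount))

    result = {}
    for (year, month), rows in sorted(by_month.items()):
        top = max(amount for _, amount in rows)
        # Keep all ties, in date order, so one row per month can show several dates.
        tied = sorted(day for day, amount in rows if amount == top)
        result[f"{year}-{month:02d}"] = ", ".join(d.strftime("%d/%m/%Y") for d in tied)
    return result


sales = {
    date(2020, 1, 1): 100.0,
    date(2020, 1, 2): 250.0,
    date(2020, 1, 3): 250.0,
    date(2020, 2, 1): 80.0,
}
print(max_sales_dates_by_month(sales))
# {'2020-01': '02/01/2020, 03/01/2020', '2020-02': '01/02/2020'}
```

The tie handling is the interesting part: a plain MAX would give you one number per month, while concatenating the matching dates shows every day that reached it.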

Quick PowerShell Tips

Shane O’Neill has a few PowerShell tips for you:

If you spend a lot of time in a PowerShell console, it’s not rash to presume that you’re going to be running some of the same commands over and over again.

That’s where PowerShell’s history comes into play.

By using the command Get-History or even its alias h, you can see the commands that you’ve run before:

Click through to see how it works, as well as a few other tips.

Diagram Visualization with Graphviz

Mikey Bronowski walks through an introduction to the Graphviz diagramming language:

I came across Graphviz, which is open-source graph visualization software initiated by AT&T Labs Research. It can process graphs written in the DOT language.

What is the DOT language?

In short, it is a graph description language with a few keywords, such as graph, digraph, node, and edge. You can’t miss that it has something to do with graphs.

I’ve used the R implementation of this as well. It doesn’t create beautiful diagrams, but it is fast, easy, and the output makes sense.
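
To make the keyword list concrete, here is a small Python sketch that writes DOT source for a list of edges. The function name and the styling attributes are just for illustration; the output is ordinary DOT text that you can render with the Graphviz dot command-line tool, for example dot -Tpng graph.dot -o graph.png.

```python
def to_dot(name: str, edges: list[tuple[str, str]], directed: bool = True) -> str:
    """Render a list of (source, target) edges as DOT source text."""
    keyword = "digraph" if directed else "graph"  # directed vs. undirected graph
    arrow = "->" if directed else "--"            # DOT uses different edge operators
    lines = [
        f"{keyword} {name} {{",
        "  node [shape=box];",    # default attributes for every node
        "  edge [color=gray];",   # default attributes for every edge
    ]
    for src, dst in edges:
        lines.append(f'  "{src}" {arrow} "{dst}";')
    lines.append("}")
    return "\n".join(lines)


print(to_dot("Pipeline", [("extract", "transform"), ("transform", "load")]))
```

Generating the text and handing it to Graphviz is the same division of labor the R implementation uses: you describe the structure, and Graphviz does the layout.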

Alternatives to Circling Elements on a Page

Cole Nussbaumer Knaflic has some alternatives to circling an item you want people to notice:

You’ve seen it before: a circle on a slide or graph that is meant to highlight something of note. People tend to be surprised when I express admiration towards this approach. I love that it means someone took the time to consider the data and the viewer and thought, “I’d like people to look here” or “I want to make sure my audience doesn’t miss this.” Then they took an action—adding the circle—to help ensure it.

That said, the circle is a blunt tool. It’s better than nothing: if you are facing such a time constraint that you don’t have a minute to spare for anything beyond quickly adding a circle, do it. If you do have more than a minute, however, there are other eloquent solutions you can employ. This will typically involve making changes to how you design the way the data or supporting elements are formatted.

Cole then lists out several alternatives. When I circle (or wrap with a rectangle), it’s usually one of two scenarios: either I’ve just grabbed a screenshot (or have frozen the screen in ZoomIt) and that’s my primary tool available, or I’m working with a pre-generated image and can’t change it. But when you have a chance to alter the base graph or image, Cole has several excellent techniques to make certain items stand out in contrast to others.

Installing TensorFlow and Keras for R on SQL Server 2019 ML Services

I have a post on using TensorFlow and Keras in R on SQL Server 2019 Machine Learning Services:

What I’m doing is building a new virtual environment named r-reticulate, which is what the reticulate package in R desires. Inside that virtual environment, I’m installing the latest versions of tensorflow-probability, tensorflow, and keras. I had DLL loading problems with TensorFlow 2.1 on Windows, so if you run into those, the proper solution is to ensure that you have the appropriate Visual C++ redistributables installed on your server.

Then, I switched back to the base virtual environment and installed the same packages. My thinking here is that I’ll probably need them for other stuff as well (and don’t tell anybody, but I’m not very good with Python environments).

Please continue not to tell anybody that I’m not very good with Python environments. I tend to dump things in the base environment, forget which one I’m in, and all kinds of other bad practices. I think I’m secretly undermining myself in Python, but I don’t have enough proof yet.
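
For anybody else who loses track of which environment is which, a quick sanity check helps. This small Python script (not from the original post) reports which of the three packages the current interpreter can import; run it once with the python executable from r-reticulate and once from base. Note that the pip package tensorflow-probability is imported as tensorflow_probability.

```python
import importlib.util
import sys

# Import names, not pip names: "tensorflow-probability" imports as "tensorflow_probability".
PACKAGES = ["tensorflow", "tensorflow_probability", "keras"]


def check_packages(names=PACKAGES) -> dict[str, bool]:
    """Report which top-level packages are importable in the current environment."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    print(f"Environment: {sys.prefix}")  # shows which environment this interpreter belongs to
    for name, present in check_packages().items():
        print(f"{name}: {'installed' if present else 'missing'}")
```

Using find_spec rather than actually importing keeps the check fast and avoids triggering the DLL loading errors mentioned above; it only tells you the package is present, not that it loads cleanly.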

E-Mail Alerting in ADF.procfwk

Paul Andrew has an update to the Azure Data Factory Procedural Framework:

The primary goal of this release was to implement email alerting within the existing processing framework, using existing metadata-driven practices to deliver it in an easy-to-control, flexible, and granular way. That said, the following statements have been met in terms of alerting capabilities and design.

Read on for the full change list.
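
Paul’s post describes the framework’s actual tables and procedures; the sketch below is not that. It is a hypothetical Python illustration of what metadata-driven, granular alerting can look like: each recipient subscribes per pipeline to a combination of outcomes stored as bit flags, and a lookup decides who gets an email when a pipeline finishes. The outcome names, bit values, pipeline names, and addresses are all made up.

```python
from enum import IntFlag


class Outcome(IntFlag):
    """Hypothetical pipeline outcomes; each is a distinct bit so they can be combined."""
    SUCCESS = 1
    FAILED = 2
    CANCELLED = 4
    UNKNOWN = 8


# Hypothetical alerting metadata: who wants which outcomes for which pipeline.
SUBSCRIPTIONS = [
    {"pipeline": "Load Sales", "email": "ops@example.com", "outcomes": Outcome.FAILED | Outcome.CANCELLED},
    {"pipeline": "Load Sales", "email": "dev@example.com",
     "outcomes": Outcome.SUCCESS | Outcome.FAILED | Outcome.CANCELLED | Outcome.UNKNOWN},
    {"pipeline": "Load Sales", "email": "manager@example.com", "outcomes": Outcome.SUCCESS},
]


def recipients_for(pipeline: str, outcome: Outcome, subscriptions=SUBSCRIPTIONS) -> list[str]:
    """Return the sorted addresses subscribed to this outcome for this pipeline."""
    return sorted(
        s["email"]
        for s in subscriptions
        # A bitwise AND is non-zero only when the subscription includes this outcome.
        if s["pipeline"] == pipeline and s["outcomes"] & outcome
    )


print(recipients_for("Load Sales", Outcome.FAILED))
```

Keeping the rules in data rather than in pipeline logic is what makes this kind of alerting easy to change: adding a recipient or narrowing what they receive is a metadata update, not a redeployment.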

Power BI Best Practice Tips

Lazaros Viastikopoulos continues a series on Power BI tips, switching from performance to best practices:

Tip 2) Organise Measures by Grouping

Tip number two goes hand in hand with the tip explained above, as after we generate explicit measures, what should we do with all those leftover implicit measures? Surely they will confuse the report authors if they are left visible. Furthermore, if we structure our data model as a Star Schema, every fact table will contain some foreign keys to establish a relationship with the primary key in the dimension (lookup) table. Should these columns remain visible for everyone to use?

Read on to learn how, as well as details for the other four tips.

Actual I/O Statistics in Execution Plans

Hugo Kornelis talks about a fairly recent property in execution plans:

There are two operators that read from the SalesOrderDetail table (or from indexes on that table). The top left operator is an Index Seek on one of the nonclustered indexes on SalesOrderDetail, and on the bottom input of the Nested Loops operator is a Clustered Index Scan that scans the clustered index on the same table.

So, now what? Which of the two is the problem in this case? Is each doing exactly 625 logical reads? Is one doing 50 and the other 1200? For the longest time, there was no way to find out. Sometimes you could make an educated guess by looking at the rest of the execution plan. Sometimes you could get an idea by running other queries with similar plans and checking their logical reads (in this case, for example, you could run the subquery by itself). But none of these methods is really satisfactory.

Read on to see how the SQL Server team has addressed this.
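
For anyone who wants to pull these numbers out programmatically, here is a rough Python sketch that reads a saved actual execution plan (the XML behind a .sqlplan file) and totals the ActualLogicalReads attribute on each operator’s RunTimeCountersPerThread elements. It assumes the standard showplan XML namespace, sums across threads so a parallel operator reports one number, and skips operators that carry no such attribute; the function name is just for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"sp": "http://schemas.microsoft.com/sqlserver/2004/07/showplan"}


def logical_reads_per_operator(plan_xml: str) -> list[tuple[int, str, int]]:
    """Return (NodeId, PhysicalOp, total ActualLogicalReads) for each operator that reports them."""
    root = ET.fromstring(plan_xml)
    results = []
    # iter() walks every RelOp in document order, however deeply nested.
    for relop in root.iter(f"{{{NS['sp']}}}RelOp"):
        runtime = relop.find("sp:RunTimeInformation", NS)  # direct child of this RelOp only
        if runtime is None:
            continue
        reads = [
            int(counter.get("ActualLogicalReads"))
            for counter in runtime.findall("sp:RunTimeCountersPerThread", NS)
            if counter.get("ActualLogicalReads") is not None
        ]
        if reads:  # operators without I/O counters (e.g. joins) are skipped
            results.append((int(relop.get("NodeId")), relop.get("PhysicalOp"), sum(reads)))
    return results


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], encoding="utf-8") as f:  # path to a saved actual plan
        for node_id, op, reads in logical_reads_per_operator(f.read()):
            print(f"Node {node_id} {op}: {reads} logical reads")
```

Looking only at the direct RunTimeInformation child matters here: a parent operator’s counters should not pick up the reads of the operators nested beneath it.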
