Press "Enter" to skip to content

Author: Kevin Feasel

Managed Self-Service BI in Power BI

Gogula Aryalingam has started a series on managed self-service BI. Part 1 provides an overview of the topic:

When putting together a business intelligence strategy using Power BI, Microsoft recommends three primary strategies that an organization can adopt. Out of these, the one that I tend to go with is managed self-service BI, which brings forth the concept of discipline at the core, flexibility at the edge. This concept is the dominant strategy used for BI at Microsoft itself, explained very nicely in this article. It’s my personal favorite because I find it an effective means of onboarding customers once the core platform is built with the required standards (discipline), and then helping them adopt the solution from the edge, thus providing them with the best of both worlds.

Part 2 takes us to the edge:

Now, what happens when an analyst, for instance, has a set of sales target spreadsheets and wants to compare the figures with sales metrics so that salespeople’s performance can be measured? That certainly needs a new dataset. However, flexibility at the edge has to prevail in the right way. This post will look at how we can go about this while keeping to discipline at the core, flexibility at the edge.

Note: The analyst’s requirement is currently local to their group or department. It has not yet been made an organizational requirement. That’s how most requirements start out: a requirement at the departmental level, and then, when enough people start reaping the benefits within and outside of the department, it can get absorbed into the core.

Part 3 returns to the core:

One problem that we may have overlooked when building a bunch of core datasets in that post is that certain dimensions tend to duplicate across the datasets. Imagine a scenario where the single master data source of a managed self-service setup is a data warehouse, which sources all the required dimensions. When you have, for example, core reseller sales, internet sales, and finance datasets, each one will have its own copy of the calendar dimension and a few others. This is not ideal if you think about the extent of the duplication and effort that is required.

This is where, once again, DirectQuery for Power BI datasets and Analysis Services comes into play: you could draw up a layered core dataset architecture. If we take the example of AdventureWorks’ fact tables in the data warehouse (the single master data source), you can figure out what the business processes are.

Read on for Gogula’s thoughts. I think there’s a lot going for this particular strategy, especially in a large organization with hundreds (or thousands) of people actively using Power BI. At that point, doing everything through a central IT organization doesn’t scale very well.


Join Types in Spark SQL

Rituraj Khare makes some connections:

In Apache Spark, we can use the following types of joins in SQL:

Inner join: An inner join in Apache Spark is a type of join that returns only the rows that match a given predicate in both tables. To perform an inner join in Spark using Scala, we can use the join method on a DataFrame.

The set of options is the same as you’d see in a relational database: inner, left outer, right outer, full outer, and cross. The examples here are in Scala, though they would apply just as easily to PySpark and, of course, to writing classic SQL statements.
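To make that concrete, here is a minimal PySpark sketch of the same idea; the DataFrames and column names are made up for illustration.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("join-types").getOrCreate()

# Two small illustrative DataFrames.
emp = spark.createDataFrame(
    [(1, "Alice", 10), (2, "Bob", 20), (3, "Carol", 99)],
    ["emp_id", "name", "dept_id"],
)
dept = spark.createDataFrame(
    [(10, "Engineering"), (20, "Finance"), (30, "Marketing")],
    ["dept_id", "dept_name"],
)

# Inner join: only rows whose dept_id matches in both DataFrames.
emp.join(dept, on="dept_id", how="inner").show()

# The other types use the same method with a different "how" argument:
# "left_outer", "right_outer", "full_outer".
emp.join(dept, on="dept_id", how="left_outer").show()

# Cross joins have a dedicated method.
emp.crossJoin(dept).show()
```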


External Objects in Databricks Unity Catalog

Meagan Longoria adds external tables and views to an Azure Databricks Unity Catalog:

I’ve been busy defining objects in my Unity Catalog metastore to create a secure exploratory environment for analysts and data scientists. I’ve found a lack of examples for doing this in Azure with file types other than delta (maybe you’re reading this in the future and this is no longer a problem, but it was when I wrote this). So I wanted to get some more examples out there in case it helps others.

I’m not storing any data in Databricks – I’m leaving my data in the data lake and using Unity Catalog to put a tabular schema on top of it (hence the use of external tables vs managed tables). In order to reference an ADLS account, you need to define a storage credential and an external location.

Read on for examples of what you can do with this.
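For flavor, here’s a minimal sketch of the kind of DDL involved (not Meagan’s exact examples), run from a Databricks Python notebook where the spark session is predefined. The storage credential, ADLS paths, and catalog/schema/table names below are all hypothetical, and the credential is assumed to exist already.

```python
# Define an external location that points at a container/path in ADLS,
# secured by an existing storage credential (names are placeholders).
spark.sql("""
    CREATE EXTERNAL LOCATION IF NOT EXISTS sales_landing
    URL 'abfss://landing@mydatalake.dfs.core.windows.net/sales'
    WITH (STORAGE CREDENTIAL my_storage_credential)
""")

# An external table over CSV files: Unity Catalog stores only the schema and
# the pointer to the files; the data itself stays in the data lake.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.bronze.sales_ext (
        sale_id   INT,
        sale_date DATE,
        amount    DECIMAL(10, 2)
    )
    USING CSV
    OPTIONS (header 'true')
    LOCATION 'abfss://landing@mydatalake.dfs.core.windows.net/sales/csv/'
""")
```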


Trying out FLAML

Gavita Regunath provides an overview of FLAML:

FLAML is short for Fast and Lightweight Automated Machine Learning library. It is an open-source Python library created by Microsoft researchers in 2021 for automated machine learning (AutoML). It is designed to be fast, efficient, and user-friendly, making it ideal for a wide range of applications.

Click through to learn more and to give it a spin with a pair of notebooks.
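As a sense of how little code an AutoML run takes, here is a quick sketch; the dataset, 60-second time budget, and task type are purely illustrative, and it assumes scikit-learn is installed alongside FLAML.

```python
from flaml import AutoML
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# A toy dataset just to exercise the API.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

automl = AutoML()
automl.fit(X_train, y_train, task="classification", time_budget=60)

print(automl.best_estimator)  # name of the best learner found, e.g. "lgbm"
print(accuracy_score(y_test, automl.predict(X_test)))
```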


SQL ConstantCare Population Analysis

Brent Ozar looks at the ConstantCare customer base:

Companies are leapfrogging right past SQL Server 2017. I’m going to hazard a guess that SQL Server 2017 came out too quickly after 2016, and didn’t offer enough features to justify upgrades from 2016.

Does that offer us any lessons for SQL Server 2022? Is 2022 going to be a 2017-style release that people just leapfrog over? Well, as I write this, it’s late December 2022, and I’m not seeing the widespread early adoption that I saw for 2019 where people had it in development environments ahead of the release, learning how to use it.

It’s an interesting analysis of a particular market segment: users of the ConstantCare service.


Importing Excel Files from SharePoint into Power BI

Gogula Aryalingam gets some data:

I’ve been asked the question many times, especially when I teach Dashboard in a Day: How do you import data from a bunch of Excel files stored on a SharePoint folder?

It’s a fairly simple process. It’s just that you need to know that you first connect to the SharePoint site (which could have quite a large number of folders and files, and several document libraries). You will need to know the path of your files, and then you keep filtering till you get to your content. I do this in two steps:

Read on for those steps.


Frames and Tiles in mapBliss (R)

Benjamin Smith updates an R package:

The mapBliss package is an R package I developed that allows users to make custom souvenir-quality maps of their flights, road trips and favorite cities by utilizing the power of leaflet and other R packages (for a full list, see the GitHub README here). The goal of the package is to imitate the visualization and print-ability of maps produced by businesses like Atlas.co (my original inspiration), TheLittlePen, Mapiful and MaptracksMe (among many other such businesses).

It’s an interesting-looking package.


Closures in Scala (and All FP Languages)

Pallav Gupta explains what a closure is:

Objects are more flexible for certain use cases because they carry both data members and member functions, whereas a function does not have data members.

So if there is a requirement to pass data members along with functions, how will we achieve it in functional programming?

The answer is that we can achieve it using a closure and a free variable.

Read on for an example.
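The post’s examples are in Scala, but the same idea is easy to see in a quick Python sketch: the inner function closes over a free variable, so the returned function carries data along with behavior. The names here are made up for illustration.

```python
def make_taxer(rate):
    # "rate" is a free variable: it is defined in the enclosing scope,
    # and the inner function closes over it.
    def apply_tax(amount):
        return amount * (1 + rate)
    return apply_tax

vat = make_taxer(0.2)
print(vat(100))  # 120.0 -- the closure remembers rate=0.2 after make_taxer returns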


Finding Blockers in Azure SQL DB + MI

Jose Manuel Jurado Diaz writes a program:

Today, I worked on a service request where our customer is looking for all blocking issues happening in their database. We have many articles about it, such as Lesson Learned #22: How to identify blocking issues? – Microsoft Community Hub, and in Diagnostic Settings and QDS we can collect this information, but it all points to the fact that there is no easy way to see both the T-SQL that is blocking and the T-SQL command that is blocked. The following script, which I share as an example, is one way to take this on.

First of all, please remember that blocking is normal and fundamental for any RDBMS. This script is basically for when you need to understand what is happening in order to improve or reduce it.

I was a little surprised the answer wasn’t to use Extended Events, though this does work if you simply need something to run in ad hoc scenarios.
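For those ad hoc scenarios, the underlying idea is to query the blocking-related DMVs. Here is a rough sketch (not Jose’s script) wrapped in Python with pyodbc; the connection string is a placeholder, and the query only lists requests that are currently blocked.

```python
import pyodbc

# Placeholder connection string -- swap in your own server, database, and credentials.
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=myserver.database.windows.net;Database=mydb;"
    "Uid=myuser;Pwd=<password>;Encrypt=yes;"
)

# Requests with a non-zero blocking_session_id are waiting on another session.
query = """
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time,
       t.text AS blocked_sql
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;
"""

for row in conn.cursor().execute(query):
    print(row.session_id, row.blocking_session_id, row.wait_type, row.blocked_sql)
```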
