Press "Enter" to skip to content

Month: December 2023

Looping through Lakehouses in Microsoft Fabric Spark Jobs

Dennes Torres builds a loop:

I have published videos and articles before about Lakehouse maintenance. In this article I want to address a missing point for a lot of Fabric administrators: How to do maintenance on multiple lakehouses that are located in different workspaces.

One of the videos I have published explains the maintenance of multiple lakehouses, but only addresses maintenance in a single workspace. Is it a good idea to keep multiple lakehouses in the same workspace? Probably not.

Click through for the process.
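In skeleton form, the loop looks something like the following, a minimal sketch rather than Dennes's actual code. The workspace and lakehouse names are hypothetical, the OneLake path layout is my assumption, and `spark` is the Fabric notebook's ambient session:

```python
# Minimal sketch: run Delta maintenance across lakehouses in several
# workspaces from one Fabric Spark job. Names are hypothetical; the
# OneLake path convention is an assumption, not from the article.
from notebookutils import mssparkutils  # built into Fabric Spark notebooks

targets = [
    ("SalesWorkspace", "SalesLakehouse"),
    ("FinanceWorkspace", "FinanceLakehouse"),
]

for workspace, lakehouse in targets:
    tables_root = (
        f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/"
        f"{lakehouse}.Lakehouse/Tables"
    )
    # each folder under /Tables is a Delta table
    for entry in mssparkutils.fs.ls(tables_root):
        spark.sql(f"OPTIMIZE delta.`{entry.path}`")  # compact small files
        spark.sql(f"VACUUM delta.`{entry.path}` RETAIN 168 HOURS")  # purge old snapshots
```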


Explaining Odds Ratios

Steven Sanderson explains the concept of an odds ratio:

Imagine a loan officer flipping a coin to decide whether to approve your loan. Odds ratios tell you how much more likely one factor (like your income) makes the “heads” (approval) side appear compared to another (like your student status).

In logistic regression, odds ratios compare the odds of an event (loan default, in our case) for two groups defined by a specific variable. They’re like multipliers: greater than 1 means something increases the chances of default, while less than 1 means it decreases them.

As for why, we use odds ratios because it’s hard to track and interpret changes in probabilities directly, at least when you’re working with small numbers.
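To make that concrete, here is a quick Python sketch on synthetic loan data (Sanderson's own examples are in R, and all the names and numbers below are made up): fit a logistic regression, then exponentiate the coefficients to read them as odds ratios.

```python
# Sketch with synthetic data: fit a logistic regression for loan default,
# then exponentiate the coefficients to read them as odds ratios.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 1000
income = rng.normal(50, 10, n)     # hypothetical predictor
student = rng.integers(0, 2, n)    # hypothetical 0/1 predictor
log_odds = -1 + 0.05 * income - 0.8 * student
default = (rng.random(n) < 1 / (1 + np.exp(-log_odds))).astype(int)

X = sm.add_constant(np.column_stack([income, student]))
fit = sm.Logit(default, X).fit(disp=0)

# exp(coef) is the odds ratio: > 1 raises the odds of default, < 1 lowers them
print(np.exp(fit.params))
```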


GPS Data in PostGIS

Ryan Lambert clues us in:

One of the key elements to using PostGIS is having spatial data to work with! Lucky for us, one big difference today compared to the not-so-distant past is that essentially everyone is carrying a GPS unit with them nearly everywhere. This makes it easy to create your own GPS data that you can then load into PostGIS! This post explores some basics of loading GPS data to PostGIS and cleaning it for use. It turns out, GPS data from nearly any GPS-enabled device comes with some… character. Getting from the raw input to usable spatial data takes a bit of effort.

This post starts with using ogr2ogr to load the .gpx data to PostGIS. Once the data is in PostGIS, we actually want to do something with it. Before the data is completely usable, though, we should spend some time cleaning it. Technically you can start querying the data right away; however, I have found there is always data cleanup and processing involved first to make the data truly useful.

Click through for an example of how it all fits together.
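If you want to script the loading step, a hedged Python wrapper around ogr2ogr might look like this. The connection details and target table are placeholders; track_points is the layer name GDAL's GPX driver exposes for raw GPS fixes:

```python
# Sketch: load the track_points layer of a .gpx file into PostGIS with
# ogr2ogr. Connection details and the target table are placeholders.
import subprocess

subprocess.run(
    [
        "ogr2ogr",
        "-f", "PostgreSQL",
        "PG:host=localhost dbname=gps user=postgres",
        "ride.gpx",                       # source GPX file
        "track_points",                   # GPX layer with the raw fixes
        "-nln", "staging.track_points",   # target schema.table in PostGIS
    ],
    check=True,
)
```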


Generating Fabric Delta Tables from Power BI Semantic Models

Nikola Ilic is excited:

A few days ago, while preparing materials for the customer training on Microsoft Fabric, I stumbled upon a very interesting article at Microsoft Learn. The article describes how to integrate Power BI semantic models (aka datasets) into OneLake.

At first glance, this doesn’t sound like something epic, but when I started thinking more and more about it, I realized that this really might be a huge thing in many different scenarios. First of all, at the moment of writing, this feature is still in preview – this means it can change to some extent in the coming months before eventually becoming GA. Nevertheless, I decided to take a shot and explore what can be done with OneLake integration for semantic models.

Read on to learn more about what this is doing and what you can do with it.
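The payoff: once the integration has exported a model's tables to OneLake as Delta tables, any Fabric Spark session can read them. A minimal sketch, where the workspace, model, and table names are made up and the exact path layout is my assumption (check the real path in the OneLake file explorer):

```python
# Sketch: read a Delta table that OneLake integration exported from a
# Power BI semantic model. All names are hypothetical; verify the path
# layout against what the OneLake file explorer shows you.
path = (
    "abfss://SalesWorkspace@onelake.dfs.fabric.microsoft.com/"
    "SalesModel.SemanticModel/Tables/DimCustomer"
)
df = spark.read.format("delta").load(path)  # spark: the notebook's built-in session
df.show(5)
```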


100 Estimated Rows

Deborah Melkin finds the optimizer in a “Dunno, here’s a guess” scenario:

A little while back, my coworker was having trouble with a query and asked me about a strange thing they were seeing in the execution plan: The estimated rows for the query was showing 100.

100 felt like an awfully specific number. And there were two scenarios I knew of where the SQL Server cardinality estimator immediately used that value – table variables, and multi-statement table-valued functions at SQL Server 2016 compatibility level or earlier, or used as part of an APPLY (where interleaved execution doesn’t apply) instead of a JOIN.

The statement in question didn’t use either so what was the issue?

Read on for the solution.


Troubleshooting the Automation of a Process

Reitse Eskens tells a tale of woe:

When I got the job to restore a bacpac file, I didn’t think much of it. Because it’s quite easy. Get the bacpac and restore it with Azure Data Studio or SQL Server Management Studio. Get coffee, done. Then I got these requirements:

  • The bacpac needs to be downloaded daily from an sFTP server, without certificate
  • The bacpac file is in a zip file
  • The zip file is password protected
  • Everything must be done serverless (Azure Automation, Logic App, Function App, Data Factory and/or Synapse Analytics)
  • It’s a daily process that needs to be done without human supervision

Welcome to cloud development: Part A is easy, Part B is easy, Part C is mildly challenging, and combining A with B and C is a total nightmare because it turns out that A and B aren’t compatible, so by the end, you’re dealing with A″ and b and D (because C, C′, C″, etc. wouldn’t work, and c and c′ would work but had severe limitations preventing you from using them in this scenario).
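For the first two requirements on their own, a hedged Python sketch might look like the following. The host, credentials, and file names are placeholders, and pyzipper is my stand-in for handling AES-encrypted zips, not anything Reitse's serverless solution necessarily uses:

```python
# Sketch of the first two requirements only: pull a zip from sFTP and
# extract a password-protected bacpac. All names and secrets are placeholders.
import paramiko
import pyzipper

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())  # no host key on file
client.connect("sftp.example.com", username="svc_restore", password="placeholder")
sftp = client.open_sftp()
sftp.get("/exports/daily.zip", "daily.zip")
sftp.close()
client.close()

# pyzipper handles AES-encrypted zips; the stdlib zipfile only does legacy ZipCrypto
with pyzipper.AESZipFile("daily.zip") as zf:
    zf.setpassword(b"zip-password-placeholder")
    zf.extractall("unzipped")  # yields the .bacpac for the restore step
```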


Join Functions in DAX

Marco Russo and Alberto Ferrari join datasets together:

Readers with knowledge of SQL know that the join operation is widespread in SQL queries, as it is the standard way to combine data stored in different tables. It is however uncommon to explicitly join tables in DAX because the relationships in the data model provide enough information to allow many DAX functions to work without an explicit join operation. Most of the time, the join between tables is implicit and automatic.

However, DAX has two explicit join functions: NATURALLEFTOUTERJOIN and NATURALINNERJOIN. On the surface, these functions correspond to the behavior of LEFT OUTER JOIN and INNER JOIN in SQL; however, they differ from SQL in how you specify the join condition. This article shows how these functions can be used in DAX with practical examples. If you need a more introductory article about the syntax of these functions, read From SQL to DAX: Joining Tables, where we compare the SQL syntax with similar DAX functions.

Click through to learn more about how these functions work and what their limitations are.
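The "natural" part means the join keys are implied by shared column names rather than spelled out. As a loose analogy in pandas rather than DAX (tables and values below are mine): merge with no explicit keys joins on the columns the two frames share, though DAX additionally requires matching data lineage, a concept pandas does not have.

```python
# Loose pandas analogy for the DAX natural join functions: with no keys
# specified, merge() joins on the columns common to both frames.
import pandas as pd

sales = pd.DataFrame({"ProductKey": [1, 2, 3], "Amount": [10, 20, 30]})
product = pd.DataFrame({"ProductKey": [1, 2], "Color": ["Red", "Blue"]})

left_outer = pd.merge(sales, product, how="left")   # NATURALLEFTOUTERJOIN analog
inner = pd.merge(sales, product, how="inner")       # NATURALINNERJOIN analog
print(left_outer)
```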


Trying to Load a Table in Microsoft Fabric

Eugene Meidinger walks onto a field of rakes:

Last week, I struggled to load the data into Fabric, but finally got it into a Lakehouse. I was starting to run into a lot of frustration, and so it seemed like a good time to back up and get more oriented about the different pieces of Fabric and how they fit together. In my experience, it’s often most effective to try to do something, review some learning, and alternate. Without a particular pain point, it’s hard for the information to stick.

Read on for some thoughts on andragogy, learning paths, and travails loading data.


Warehousing and Power BI in Microsoft Fabric

Tomaz Kastrun continues a series on Microsoft Fabric. Day 15 covers building a warehouse:

I have named mine “Advent2023_DWH”.

You can create a warehouse using T-SQL scripts, from Dataflow Gen2, from data pipelines, and from the sample data. Let’s select the sample data and grab a coffee.

Day 16 looks at data pipelines:

With the Fabric warehouse created and explored, let’s see how we can use pipelines to get the data into the Fabric warehouse.

In the existing data warehouse, we will introduce new data. By clicking “new data”, two options will be available: pipelines and dataflows. Select pipelines and give the pipeline a name.
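Once a pipeline run completes, one hedged way to confirm the load from outside the portal is to query the warehouse's SQL endpoint. The endpoint and table below are placeholders, and pyodbc is my choice rather than anything from Kastrun's posts:

```python
# Sketch: confirm a pipeline load by querying the warehouse's SQL endpoint.
# Endpoint and table are placeholders; find yours in the warehouse settings.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=your-endpoint.datawarehouse.fabric.microsoft.com;"
    "Database=Advent2023_DWH;"
    "Authentication=ActiveDirectoryInteractive;"
)
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM dbo.fact_sale;")  # hypothetical table
print(cursor.fetchone()[0])
conn.close()
```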

And Day 17 provides a primer on how Power BI can read Fabric assets:

Within Power BI in Fabric, you will find many of the components that can be used to create a final report. Here are the components:


Fighting Heteroskedasticity in Regression Problems

Steven Sanderson deals with my favorite failure of BLUE (mainly because I love the name):

Tired of your least-squares regression model giving wonky results because some data points shout louder than others? Meet Weighted Least Squares (WLS), the superhero of regression, ready to tackle unequal variance (heteroscedasticity) and give your model the justice it deserves! Today, we’ll dive into the world of WLS in R, using base functions for maximum transparency. Buckle up, data warriors!

Read on to see how Weighted Least Squares helps in data analysis when you have heteroskedasticity.
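Sanderson works in base R; as a rough Python equivalent on synthetic data (statsmodels instead of base-R functions, so treat it as a sketch), the key move is giving each observation a weight inversely proportional to its error variance:

```python
# Sketch on synthetic data: noise grows with x, so plain OLS over-trusts the
# noisy points; WLS down-weights them with weights = 1 / variance.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(1, 10, 200)
y = 2 + 3 * x + rng.normal(0, x)        # heteroskedastic noise: sd grows with x

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                    # treats every observation equally
wls = sm.WLS(y, X, weights=1 / x**2).fit()  # variance grows as x², so weight = 1/x²

print(ols.params, wls.params)
```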
