Press "Enter" to skip to content

Month: January 2024

Parallelizing Notebook Runs in Microsoft Fabric via Python

Sandeep Pawar kicks off multiple notebooks at once:

The notebook class in mssparkutils has two methods to run notebooks – run and runMultiple . run allows you to trigger a notebook run for one single notebook. Mim wrote a nice blog to show how to use it and its usefulness.

runMultiple , on the other hand, allows you to create a Direct Acyclic Graph (DAG) of notebooks to execute notebooks in parallel and in specified order, similar to a pipeline run except in a notebook.

Read on to learn more about the advantages of this latter approach as well as how you can do it.

Comments closed

Replacing DISTINCT with EXISTS

Andy Brownsword makes a switch:

The DISTINCT clause in a query can help us quickly remove duplicates from our results. Sometimes it can be beneficial to stop and ask why. Why do we need to use the clause, why are we receiving duplicates from our data?

I see this typically due to a JOIN being used where we don’t really want all of those results. This could be a ‘does something exist’ check such as if a customer has ever ordered before. The issue comes when there are multiple rows returned like a frequent customer in this example.

As an alternative to this, Andy shows how you can use the EXISTS clause to find records matching some criterion.

Comments closed

Finding the Earliest Date in R

Steven Sanderson puts on the archaeologist’s fedora and bullwhip:

Greetings, fellow data enthusiasts! Today, we embark on a quest to uncover the earliest date lurking within a column of dates using the power of R. Whether you’re a seasoned R programmer or a curious newcomer, fear not, for we shall navigate through this journey step by step, unraveling the mysteries of date manipulation along the way.

Imagine you have a dataset filled with dates, and you’re tasked with finding the earliest one among them. How would you tackle this challenge? Fear not, for R comes to our rescue with its arsenal of functions and packages.

Click through to see how, keeping those pernicious missing values in mind.

Comments closed

Finding the Cake Dataset’s Original Source

Rasmus Baath has done a good deed for all:

In statistics, there are a number of classic datasets that pop up in examples, tutorials, etc. There’s the infamous iris dataset (just type iris in your nearest R prompt), the Palmer penguins (the modern iris replacement), the titanic dataset(s) (I hope you’re not a guy in 3rd class!), etc. While looking for a dataset to illustrate a simple hierarchical model I stumbled upon another one: The cake dataset in the lme4 package which is described as containing “data on the breakage angle of chocolate cakes made with three different recipes and baked at six different temperatures [as] presented in Cook (1938)1”. For me, this raised a lot of questions: Why measure the breakage angle of chocolate cakes? Why was this data collected? And what were the recipes?

Read on as Rasmus unravels the mysteries of the cake dataset with the help of several others. H/T R-Bloggers.

Comments closed

Power BI Themes Gallery

Seth Bauer shares a tip:

Welcome to today’s tutorial where we’ll explore the Power BI Tips+ Theme Generator and its incredible features designed to streamline your Power BI report building experience. In this walkthrough, we’ll guide you through the process of getting started with the Power BI Tips+ Gallery, focusing on the Sunset theme. By the end of this tutorial, you’ll be able to effortlessly integrate our pre-configured Gallery Projects into your Power BI reports. It doesn’t get easier than this!

Click through for the process, including a video on how to do it.

Comments closed

A Review of Data Platform Trends

Brendan Tierney checks the lists:

Each January I take a little time to look back on the Database market over the previous calendar year. This year I’ll have a look at 2023 (obviously!) and how things have changed and evolved.

In my post from last year (click here) I mentioned the behaviour of some vendors and how they like to be-little other vendors. That kind of behaviour is not really acceptable, but they kept on doing it during 2023 up to a point. That point occurred during the Autumn of 2023. It was during this period there was some degree of consolidation in the IT industry with staff reductions through redundancies, contracts not being renewed, and so on. These changes seemed to have an impact on the messages these companies were putting out and everything seemed to calm down. These staff reductions have continued into 2024.

Click through for Brendan’s thoughts on the new SQL:2023 standard and common estimations of data platform product utilization. I’ve historically focused on DB-Engines rather than TOPDB, but they’re both pointing in similar directions at the top.

Comments closed

Stack Overflows in Power Query

Chris Webb hits an error:

If you’re writing your own M code in Power Query in Power BI or Excel you may run into the following error:

Expression.Error: Evaluation resulted in a stack overflow and cannot continue.

If you’re a programmer you probably know what a stack overflow is; if you’re not you might search for the term, find this Wikipedia article and still have no clue what has happened. Either way it may still be difficult to understand why you’re running into it in Power Query. To explain let’s see some examples.

Read on to learn what a stack overflow is, how you can create one, and circumstances in which one might arise in Power Query and M. Now Chris has me thinking about tail call recursion this early in the morning.

Comments closed

Microsoft Fabric Free Trial Capacities

Soheil Bakhshi digs into the fine print:

If you are evaluating Microsoft Fabric and do not currently own a Premium Capacity, chances are you’re using Microsoft Fabric Trial Capacities. All Power BI users within an organisation or specific security groups given the rights can opt into Fabric Trial Capacities. Therefore, you may already have several Trial Fabric Capacities in your tenant. Your Fabric Administrators can specifically control who can opt into the Fabric Trial capacities within the Fabric Admin Portal, on the Help and support settings section, and enabling the Users can try Microsoft Fabric paid features setting as shown in the following image:

Read on for a lot more detail, including several common issues you might find along the way.

Comments closed

Being Smart about Missing Index Requests

Erik Darling doesn’t trust the system:

SQL Server’s missing index requests (and, by extension, automatic index management) are about 70/30 when it comes to being useful, and useful is the low number.

The number of times I’ve seen missing indexes implemented to little or no effect, or worse, disastrous effect… is about 70% of all the missing index requests I’ve seen implemented.

This has been pretty close to my experience as well. Click through for a much better approach to the task.

Comments closed