

Visualizing Power BI Import Dependencies as a Graph

Chris Webb builds graphs, though not those types of graphs; rather, the other type of graphs:

A few years ago a new pair of Profiler events was added for Power BI Import mode datasets (and indeed AAS models): the Job Graph events. I blogged about them here but they never got used by anyone because it was extremely difficult to extract useful data from them – you had to run a Profiler trace, save the trace file, run a Python script to generate a .dgml file, then open that file in Visual Studio – which was a shame because they contain a lot of really interesting, useful information. The good news is that with the release of Semantic Link in Fabric and the ability to run Profiler traces from a Fabric notebook it’s now much easier to access Job Graph data and in this blog post I’ll show you how.

Read on to see an example of it in action.
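
For a sense of what this looks like, here's a minimal sketch of running a trace from a Fabric notebook with Semantic Link. The dataset name is made up, and the Job Graph event class and column names in the schema are placeholders rather than the exact event schema, which Chris covers in the post.

```python
# Minimal sketch: trace a refresh from a Fabric notebook via Semantic Link.
# The dataset name, event class, and column names are placeholders; see
# Chris's post for the actual Job Graph event schema.
import time
import sempy.fabric as fabric

dataset = "My Import Dataset"  # hypothetical dataset name

# Event classes to capture, mapped to the columns you want (names assumed)
event_schema = {"JobGraph": ["TextData", "IntegerData", "EventSubclass"]}

trace_connection = fabric.create_trace_connection(dataset)
with trace_connection.create_trace(event_schema) as trace:
    trace.start()
    fabric.refresh_dataset(dataset, refresh_type="full")  # trigger a refresh
    time.sleep(60)  # crude wait; poll the refresh status in real code
    events = trace.stop()  # pandas DataFrame of the captured trace events

events.head()
```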


Optimized Locking in Azure SQL DB

Aaron Bertrand tries out a new feature:

In a sentence: Instead of locking individual rows and pages for the life of the transaction, a single lock is held at the transaction level, and row and page locks are taken and released as needed.

This is made possible by previous investments in Accelerated Database Recovery and its persistent version store. A modification can evaluate the predicate against the latest committed version, bypassing the need for a lock until it is ready to update (this is called lock after qualification, or LAQ). There’s a lot more to it than that, and I’m not going to dive deep today, but the result is simple: long-running transactions will lead to fewer lock escalations and will do a lot less standing in the way of the rest of your workload. Locks held for shorter periods of time will naturally help reduce blocking, update conflicts, and deadlocks. And with fewer locks being held at any given time, this will help improve concurrency and reduce overall lock memory.

Read on to learn more about how it works and Aaron’s initial thoughts on the feature.
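
As a quick aside (my sketch, not Aaron's code): optimized locking requires Accelerated Database Recovery, and both are easy to check from any client. The connection details below are placeholders.

```python
# Check whether optimized locking and its ADR prerequisite are enabled.
# Connection string values are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myserver.database.windows.net;DATABASE=mydb;"  # hypothetical
    "UID=myuser;PWD=mypassword;Encrypt=yes;"
)
cursor = conn.cursor()

# Documented database property for the feature itself (1 = on)
cursor.execute("SELECT DATABASEPROPERTYEX(DB_NAME(), 'IsOptimizedLockingOn')")
print("Optimized locking:", cursor.fetchone()[0])

# ADR must be enabled for optimized locking to function
cursor.execute("""SELECT is_accelerated_database_recovery_on
                  FROM sys.databases WHERE name = DB_NAME()""")
print("Accelerated Database Recovery:", cursor.fetchone()[0])
```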


Explaining Models with Classic Methods and SHAP

Michael Mayer has some ‘splainin to do:

Let’s explain a {tidymodels} random forest by classic explainability methods (permutation importance, partial dependence plots (PDP), Friedman’s H statistics), and also fancy SHAP.

Disclaimer: {hstats}, {kernelshap} and {shapviz} are three of my own packages.

What I really appreciate is that Michael includes the classic methods here. It can be easy to say “Oh, this is old and therefore no longer relevant.” But that would be quite wrong.
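
Michael works in R with {tidymodels}, {hstats}, {kernelshap}, and {shapviz}. For readers on the Python side, a rough analogue of the classic-methods-plus-SHAP workflow (not his code) looks something like this:

```python
# Rough Python analogue of classic explainability methods plus SHAP.
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import PartialDependenceDisplay, permutation_importance

X, y = load_diabetes(return_X_y=True, as_frame=True)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Classic method 1: permutation importance
result = permutation_importance(rf, X, y, n_repeats=10, random_state=0)
print(dict(zip(X.columns, result.importances_mean.round(3))))

# Classic method 2: a partial dependence plot for one feature
PartialDependenceDisplay.from_estimator(rf, X, ["bmi"])

# And the fancy part: SHAP values for the tree ensemble
explainer = shap.TreeExplainer(rf)
shap_values = explainer(X.iloc[:200])
shap.plots.beeswarm(shap_values)
```

(Friedman's H statistic has no scikit-learn equivalent, which is where {hstats} earns its keep.)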


Digging into Execution Plans for LAG() and LEAD()

Hugo Kornelis looks at a pair of useful window functions:

LAG and LEAD were introduced in SQL Server 2012. They require an OVER clause, but it can only specify PARTITION BY and ORDER BY. No ROWS / RANGE specification for a window frame. Which makes them stand out as unusual in this series.

By default, they return a value from the last row before the current row, or from the first row after the current row, based on the specified sort order and while observing the specified partition boundaries. But there are two optional parameters: an offset, to specify that you want, for instance, the third-last row or the second-next row; and a default, which specifies a value to be used instead of NULL when the indicated row falls outside of the partition.

Click through to see what the plans look like, as well as how some very welcome (though potentially performance-impacting) changes in SQL Server 2022 have affected this.
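
To make the offset and default parameters concrete, here's a small self-contained example. It runs through DuckDB from Python purely so you can execute it anywhere; the LAG/LEAD syntax is the same shape in T-SQL, and the named WINDOW clause used here is itself one of the SQL Server 2022 additions.

```python
# LAG/LEAD with the optional offset and default arguments, run via DuckDB
# for a self-contained demo; the function syntax mirrors T-SQL.
import duckdb

duckdb.sql("""
    WITH sales(region, mth, amount) AS (
        VALUES ('East', 1, 100), ('East', 2, 120), ('East', 3, 90),
               ('West', 1, 200), ('West', 2, 210)
    )
    SELECT region, mth, amount,
           LAG(amount)         OVER w AS prev_amount,       -- offset defaults to 1
           LAG(amount, 2, 0)   OVER w AS two_back_or_zero,  -- offset 2, default 0
           LEAD(amount, 1, -1) OVER w AS next_or_minus_one  -- default -1 at the edge
    FROM sales
    WINDOW w AS (PARTITION BY region ORDER BY mth)
    ORDER BY region, mth
""").show()
```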


Against Publishing Power BI Model Changes from PBI Desktop

Soheil Bakhshi has some thoughts:

In a previous post, I shared a comprehensive guide on implementing Incremental Data Refresh in Power BI Desktop. We covered essential concepts such as truncation and load versus incremental load, understanding historical and incremental ranges, and the significant benefits of adopting incremental refresh for large tables. If you missed that post, I highly recommend giving it a read to get a solid foundation on the topic.

Now, let’s dive into Part 2 of this series where we will explore tips and tricks for implementing Incremental Data Refresh in more complex scenarios. This blog follows up on the insights provided in the first part, offering a deeper understanding of how Incremental Data Refresh works in Power BI. Whether you’re a seasoned Power BI user or just getting started, this post will provide valuable information on optimising your data refresh strategies. So, let’s begin.

Read on for plenty of detail, including your available options and how to use them.


Dynamic Parameters in PowerShell

Laerte Junior explains how dynamic parameters work in PowerShell:

Have you ever been in a situation where you want to call a cmdlet or a function with a parameter that depends on conditional criteria available as a list? In this article I will show a technique where you can use PowerShell Dynamic Parameters to assist the user with parameter values.

In the documentation of Dynamic Parameters, found at about_Functions_Advanced_Parameters in get-help, they are defined as “parameters of a cmdlet, function, or script that are available only under certain conditions,” and they can be created so that they appear “only when another parameter is used in the function command or when another parameter has a certain value.” So, we can say that a PowerShell Dynamic Parameter's availability depends on a previous parameter.

Click through for examples.
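
The article is all PowerShell, of course. If you're curious what the concept maps to elsewhere, a loose Python analogue (a deliberate swap, not Laerte's technique) is a two-pass argparse setup where one argument only exists when another has a certain value:

```python
# Loose Python analogue of a dynamic parameter: --connection-string is
# only defined when --source database was supplied.
import argparse

base = argparse.ArgumentParser(add_help=False)
base.add_argument("--source", choices=["file", "database"], required=True)
known, _ = base.parse_known_args()

parser = argparse.ArgumentParser(parents=[base])
if known.source == "database":
    # This argument appears only under a certain condition, much like a
    # PowerShell dynamic parameter gated on another parameter's value.
    parser.add_argument("--connection-string", required=True)

args = parser.parse_args()
print(args)
```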


Stored Procedure Wrapup

Erik Darling wraps up a series on stored procedures. First, cursors and loops:

You will, for better or worse, run into occasions in your database career that necessitate the use of loops and cursors.

While I do spend a goodly amount of time reworking this sort of code to not use loops and cursors, there are plenty of reasonable uses for them.

I do think we push the “don’t use cursors or loops” thing a little too hard in the SQL Server world, but I also think that in a majority of cases in which you’re doing something in a loop, you should be doing it in code outside of SQL Server.
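
On that last point, “in code outside of SQL Server” often just means letting the application batch its parameters instead of opening a cursor. A sketch of the idea (mine, not Erik's; the table and connection details are placeholders):

```python
# Loop in application code rather than in a T-SQL cursor: one parameterized
# statement, many parameter sets. dbo.Orders and the DSN are hypothetical.
import pyodbc

conn = pyodbc.connect("DSN=mydb")  # placeholder connection
cursor = conn.cursor()
cursor.fast_executemany = True  # send the parameter sets in bulk

ids_to_archive = [3, 17, 42]
cursor.executemany(
    "UPDATE dbo.Orders SET archived = 1 WHERE order_id = ?",
    [(i,) for i in ids_to_archive],
)
conn.commit()
```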

Erik then wraps things up for real:

The general idea of the series was to teach developers about the types of things I always seem to be fixing and adjusting, so that I can hopefully fix really interesting problems in the future.

Of course, that all depends on folks finding these and reading them. If that were the general sway of the world, I’d probably never have been in business in the first place.

Click through for a listing of all of the posts in the series.


LOWESS Smoothing in R

Steven Sanderson had me thinking of LOESS but then, bam!, snuck this in on me:

Locally Weighted Scatterplot Smoothing, or Lowess, is a powerful technique for capturing trends in noisy data. It’s particularly useful when dealing with datasets that exhibit complex patterns that might be missed by other methods. So, let’s get our hands dirty and start coding!

Read on for an example of LOWESS smoothing, which actually is a little different from LOESS. If you’re interested in learning more about the differences between LOESS and LOWESS, this Stack Exchange question and answer page is really good.
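
If you'd like the Python flavor of the same idea (not Steven's code, which is in R), statsmodels ships a lowess function where `frac` controls the span of the local regressions:

```python
# LOWESS in Python via statsmodels; frac is the smoothing span.
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(42)
x = np.linspace(0, 10, 200)
y = np.sin(x) + rng.normal(scale=0.4, size=x.size)

smoothed = lowess(y, x, frac=0.3)  # returns sorted (x, fitted) pairs

plt.scatter(x, y, s=10, alpha=0.5, label="noisy data")
plt.plot(smoothed[:, 0], smoothed[:, 1], color="red", label="LOWESS fit")
plt.legend()
plt.show()
```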


Using Schema Registry for Data Quality in Apache Kafka

Kai Waehner talks data quality:

Good data quality is one of the most critical requirements in decoupled architectures, like microservices or data mesh. Apache Kafka became the de facto standard for these architectures. But Kafka is a dumb broker that only stores byte arrays. The Schema Registry enforces message structures. This blog post looks at enhancements that leverage data contracts for policies and rules to enforce good data quality at the field level, as well as advanced use cases like routing malicious messages to a dead letter queue.

Click through to learn more about the topic. This focuses a lot on the “why” and “what” but does have an example of “how” in there as well.
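
For the “how” in miniature, here's a sketch of producer-side enforcement with Confluent's Python client; the broker and registry URLs, topic name, and schema are all placeholders.

```python
# Producer-side schema enforcement via Schema Registry (URLs, topic, and
# schema are placeholders).
from confluent_kafka import Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import SerializationContext, MessageField

schema_str = """
{"type": "record", "name": "Order", "fields": [
    {"name": "id",     "type": "string"},
    {"name": "amount", "type": "double"}
]}
"""

registry = SchemaRegistryClient({"url": "http://localhost:8081"})
serializer = AvroSerializer(registry, schema_str)
producer = Producer({"bootstrap.servers": "localhost:9092"})

# A record that violates the schema fails here, at the producer, instead
# of reaching downstream consumers as malformed bytes.
value = serializer(
    {"id": "o-123", "amount": 9.99},
    SerializationContext("orders", MessageField.VALUE),
)
producer.produce("orders", value=value)
producer.flush()
```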
