Press "Enter" to skip to content

Day: November 13, 2023

Comparing Permutation SHAP and Kernel SHAP

Michael Mayer lays some groundwork:

SHAP is the predominant way to interpret black-box ML models, especially for tree-based models with the blazingly fast TreeSHAP algorithm.

For general models, two slower SHAP algorithms exist:

  1. Permutation SHAP (Štrumbelj and Kononenko, 2010)
  2. Kernel SHAP (Lundberg and Lee, 2017)

Read on to understand more about these two forms of SHAP, as well as how they compare in two datasets of differing levels of difficulty.

Comments closed

Creating a User in Postgres

Daniel Calbimonte adds a new user two separate ways:

Open PGAdmin and connect to the server where you want to create the new user.

In the Object Browser, expand the Server and go to Login/Group Roles tree, and right-click on the folder. Select Create>Login/Group Role option from the context menu.

In addition to adding a user via PGAdmin, there’s also a script to add users via pgSQL, and that will look a lot more familiar to SQL Server administrators.

Comments closed

Fixing a Parallelism Problem Together

Reitse Eskens phones a friend:

In my previous blogpost (Click here to read) I wrote about a query that just wouldn’t go parallel. This sparked some discussion and interest from a few people who were very kind and helpful with their suggestions and even deep dives into the query plans, execution statistics etc.

To make one thing very clear, this blog is 99% their work, only the typing and rephrasing is mine. This also means that mistakes are mine as I’m trying to join the different inputs together into a logical story. So let me introduce you to the heroes, order by first name ascending.

Read on to see what people suggested and the effect that had.

Comments closed

Why Batch Mode Sort Spills are Slow

Paul White unravels a mystery:

Batch mode sorting was added to SQL Server in the 2016 release under compatibility level 130. Most of the time, a batch mode sort will be much faster than the row mode equivalent.

This post is about an important exception to this rule, as recently reported by Erik Darling (video).

No doubt you’ll visit both links before reading on, but to summarize, the issue is that batch mode sorts are very slow when they spill—much slower than an equivalent row mode sort.

Read the whole thing. Paul does a great job illuminating us.

Comments closed

Comparing Tableau and Power BI

Rob Collie throws down:

On the surface, when comparing Power BI vs Tableau, they appear very similar. Both are business intelligence tools. Both create slick dashboards and offer more than one data visualization tool. In fact, most people can’t distinguish between a dashboard created in Tableau and one created in Power BI.

Each of them has its strengths. However, their core difference lies in their approach to data integration, transformation, and analysis. Now, let’s break this down.

Read on for Rob’s take on the matter.

Comments closed

Things to Ignore: SQL Server I/O Affinity

Sean Gallardy recommends you not touch this:

I honestly have no idea how or why people tend to use this configuration option, if you know please drop me a line and let me know or put a comment below, I’m genuinely curious. When I ask people why this is set when I see/find it, I normally get a “well that’s how the last server was” or “I don’t know”. Awesome. I always like to just change random settings for no particular reason. Some days you wake up decide you’re going to go change a bunch of settings on your computer because why not, it’ll be fun.

There probably is a reason, though Sean’s speculation of “so the benchmark scores for SQL Server testing would be higher” is just as likely the cause as anything else. My fallback alternative is “one very large customer threw a lot of money Microsoft’s way to add a setting that works for them but nobody else.” There are a couple of those in the product, too.

Comments closed

Differentiating Physical and Logical Reads in SQL Server

Jose Manuel Jurado Diaz explains a concept:

In the realm of Azure SQL Database, query performance is a paramount concern for database administrators and developers alike. A critical aspect of this performance is understanding how SQL Server interacts with data, particularly through physical and logical reads. This article delves into these two fundamental concepts, providing insights into their impact on database performance and a practical lab to observe these metrics in action.

Read on for the difference, as well as a demonstration. With slow disks and insufficient RAM, it’s really important to know this difference. But as you have more RAM and move to formats like NVMe for storage, I’d argue that it’s less of an issue. The additional RAM, in particular, is important because the idea is that data access frequently will remain in the buffer pool for longer, so you’re more likely to see logical reads in action. Of course, poor indexing and bad decisions can ruin that idea, so don’t do that, okay?

Comments closed