Press "Enter" to skip to content

Curated SQL Posts

Fabric Workload Items in the Scanner API

Gilbert Quevauvilliers checks out the latest changes to the Scanner API:

All Fabric Workload Items are now available from the Scanner API

I was working with a customer and was looking for some information in the Scanner API.

For a change I went into Power Query and expanded the workspaces item.

Read on for more information. And if you’d like to learn more about the Scanner API, here’s a sample application showing how to use it.
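
If you’d like to poke at the raw Scanner API output yourself before diving into the sample application, here is a minimal Python sketch of the admin API flow: kick off a scan, poll for completion, then pull the result. The endpoint paths are the Power BI admin Scanner API endpoints; the access token and workspace ID are placeholders you would supply yourself.

import time
import requests

# Assumptions: you already have an Azure AD access token with tenant admin
# (Tenant.Read.All) rights, plus the ID of a workspace you want to scan.
ACCESS_TOKEN = "<your-access-token>"
WORKSPACE_ID = "<your-workspace-id>"
BASE = "https://api.powerbi.com/v1.0/myorg/admin/workspaces"
headers = {"Authorization": f"Bearer {ACCESS_TOKEN}"}

# 1. Kick off a scan of the workspace.
scan = requests.post(
    f"{BASE}/getInfo?lineage=True&datasourceDetails=True",
    headers=headers,
    json={"workspaces": [WORKSPACE_ID]},
).json()
scan_id = scan["id"]

# 2. Poll until the scan finishes.
while requests.get(f"{BASE}/scanStatus/{scan_id}", headers=headers).json()["status"] != "Succeeded":
    time.sleep(5)

# 3. Pull the scan result, which now includes all Fabric workload items.
result = requests.get(f"{BASE}/scanResult/{scan_id}", headers=headers).json()
for ws in result["workspaces"]:
    print(ws.get("name"), sorted(ws.keys()))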

Central Management Servers and SSMS 20

Greg Low works around an issue:

I’ve recently been doing work with a site that makes extensive use of Central Management Servers. And that’s an issue if you upgrade past v19.3 of SSMS.

Here’s my counter-argument: how common is it to find an organization that has enough SQL Server instances to make a Central Management Server worthwhile and yet has no certificate management process at all?

And more importantly, why don’t they have certificate management processes in place for SQL Server? This isn’t 2008 anymore—everybody (for some slight exaggeration of the term “everybody”) has certificate management in place for websites. It’s incredibly rare to find websites without TLS certificates, so somebody in your organization is managing certificates somehow. Why are these people not also managing certificates for SQL Server? Because once you have proper certificates in place rather than self-signed certs, there is no SSMS problem.

And if money is the issue, money is not the issue. Note that Daniel’s post is over 6 years old (and here’s me self-linking for street cred), meaning any company without the budget for proper certificates could have put this into place anytime over the past 6 years.

Self-signed certificates are okay for debugging purposes on personal machines. But they should not be acceptable for connecting to SQL Server in any environment. Certificate-driven encryption is a critical part of securing data movement over the wire, and a trusted certificate chain is critical for ensuring attackers cannot sit in the middle of that connection and read the data.
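
To make that concrete, here is a small sketch (my own illustration, not anything from Greg’s post) of what a properly validated connection looks like from Python with pyodbc and ODBC Driver 18, assuming a CA-issued certificate on the instance. The strict connection string fails against a self-signed certificate, which is exactly the man-in-the-middle protection described above.

import pyodbc

# Assumptions: ODBC Driver 18 for SQL Server is installed; the server and
# database names below are placeholders.
server = "sql01.contoso.com"
database = "master"

# With a proper, CA-issued certificate on the instance you can require both
# encryption and certificate validation; a self-signed cert fails here.
strict = (
    "Driver={ODBC Driver 18 for SQL Server};"
    f"Server={server};Database={database};"
    "Trusted_Connection=yes;"
    "Encrypt=yes;TrustServerCertificate=no;"
)

# TrustServerCertificate=yes is the "make the error go away" setting: the
# channel is encrypted, but you have not verified who is on the other end.
lax = strict.replace("TrustServerCertificate=no", "TrustServerCertificate=yes")

with pyodbc.connect(strict) as conn:
    row = conn.execute(
        "SELECT encrypt_option FROM sys.dm_exec_connections WHERE session_id = @@SPID;"
    ).fetchone()
    print("Connection encrypted:", row.encrypt_option)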

Quantile Normalization with TidyDensity

Steven Sanderson achieves normality:

In data analysis, especially when dealing with multiple samples or distributions, ensuring comparability and removing biases is crucial. One powerful technique for achieving this is quantile normalization. This method aligns the distributions of values across different samples, making them more similar in terms of their statistical properties.

Read on to see how you can use the TidyDensity package to pull this off.
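
TidyDensity is an R package, so the post itself stays in R. Purely as an illustration of what quantile normalization does, here is a small numpy sketch of the classic approach: sort each column, average across columns at each rank, then hand those averages back to each column according to its own ranks (ties are broken arbitrarily here).

import numpy as np

def quantile_normalize(x: np.ndarray) -> np.ndarray:
    # Rank of each value within its own column (0 = smallest).
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)
    # Average value at each rank across all columns: the shared target distribution.
    mean_quantiles = np.sort(x, axis=0).mean(axis=1)
    # Assign the shared values back according to each column's ranks.
    return mean_quantiles[ranks]

rng = np.random.default_rng(42)
samples = np.column_stack([
    rng.normal(0, 1, 1000),       # three samples with very different
    rng.normal(5, 3, 1000),       # locations, scales, and shapes
    rng.exponential(2, 1000),
])
normalized = quantile_normalize(samples)
print(normalized.mean(axis=0))    # columns now share the same distribution
print(normalized.std(axis=0))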

Filtering a Visual by a Measure via a Slicer in Power BI

Meagan Longoria solves a problem:

Have you ever wanted to filter a visual by selecting a range of values for a measure? You may have found that you cannot populate a slicer with a measure. But you can do this another way.

I have a report that shows project expenses and budgets. I want users to be able to filter the list of projects to only those which have expenses within my selected range. I also have 2 other slicers for project budget and percent of budget used, but let’s just focus on the expense amount slicer.

Read on to see how.

The Challenge of Developing PostgreSQL Features

Robert Haas talks about a development challenge:

Hacking on PostgreSQL is really hard. I think a lot of people would agree with this statement, not all for the same reasons. Some might point to the character of discourse on the mailing list, others to the shortage of patch reviewers, and others still to the difficulty of getting the attention of a committer, or of feeling like a hostage to some committer’s whimsy. All of these are problems, but today I want to focus on the purely technical aspect of the problem: the extreme difficulty of writing reasonably correct patches.

Read on for Robert’s experience developing incremental backups in Postgres. In fairness, I think this is true of any complex system which becomes mission-critical. It’s really easy to develop in low-risk, limited-code, greenfield environments. As you change each of those properties, development gets considerably more challenging, even if people are doing the right things the right way and checking ego at the door.

Visualizing a Spark Execution Plan

Gerhard Brueckl builds a very helpful tool:

I recently found myself in a situation where I had to optimize a Spark query. Coming from a SQL world originally, I knew how valuable a visual representation of an execution plan can be when it comes to performance tuning. Soon I realized that there is no easy-to-use tool or snippet which would allow me to do that. There are tools like DataFlint, the ubiquitous Spark monitoring UI, or the Spark explain() function, but they are either hard to use or hard to get up and running, especially as I was looking for something that works in both of my two favorite Spark engines, Databricks and Microsoft Fabric.

Read on for Gerhard’s answer, including an example of it in action.
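
For reference, the built-in starting point Gerhard mentions is explain(). A quick PySpark sketch (my own example, not from the post) shows the raw text plan you would otherwise read by hand, which is exactly the output his tool turns into a visual.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("plan-demo").getOrCreate()

orders = spark.range(1_000_000).withColumn("customer_id", F.col("id") % 1000)
customers = spark.range(1000).withColumnRenamed("id", "customer_id")

joined = (
    orders.join(customers, "customer_id")
          .groupBy("customer_id")
          .agg(F.count("*").alias("order_count"))
)

# "formatted" prints the physical plan as an indented tree with per-node details;
# other modes include "simple", "extended", "codegen", and "cost".
joined.explain(mode="formatted")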

Classification Concepts and CART in Action

I have a new video series:

In this video, I explain some core concepts behind classification and introduce the first classification algorithm we will look at in CART.

CART, by the way, stands for Classification and Regression Trees, and is one of the easiest classification algorithms to understand as a concept: it’s a decision tree (aka, a series of if-else statements) where each terminal node is an outcome, either a class for classification or a value for regression.
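
The video isn’t tied to any particular library, but if you want to see a CART-style tree in code, scikit-learn’s DecisionTreeClassifier is a convenient stand-in (my choice for illustration, not necessarily what the series uses):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# A shallow tree keeps the "series of if-else statements" readable.
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)
print("Test accuracy:", tree.score(X_test, y_test))

# Each terminal node ends in a predicted class, exactly as described above.
print(export_text(tree, feature_names=load_iris().feature_names))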
