Press "Enter" to skip to content

Curated SQL Posts

Decision Trees and Non-Tabular Data

Ivan Palomares Carrascosa explains that you can use more than standard structured data against decision trees:

Versatile, interpretable, and effective for a variety of use cases, decision trees have been among the most well-established machine learning techniques for decades, widely used for classification and regression tasks. Yet, they are still widely used — whether as standalone models or as components of more powerful ensemble methods like random forests and gradient boosting machines.

And there is one more attractive feature that pushes the boundaries of their versatility even further: they can accommodate data in diverse formats, beyond just fully structured, tabular data. This article examines this facet of decision trees from a balanced theoretical and practical approach.

Click through for an example.

Leave a Comment

Event Notification via LISTEN/NOTIFY in PostgreSQL Doesn’t Scale

Elliot Levin takes us through a performance issue:

We love Postgres and it lives at the heart of our service! But this extremely concurrent, write-heavy workload resulted in a stalled-out Postgres. This is the story of what happened, how we ended up discovering a bottleneck in the LISTEN/NOTIFY feature of Postgres (the event notifier that runs based on triggers when something changes in a row), and what we ended up doing about it.

Click through for details, as well as what the team there did to migrate away from this feature.

Leave a Comment

Replication Types and Modes in PostgreSQL

Semab Tariq lays out three types of replication available in PostgreSQL:

PostgreSQL also supports replication to keep standby servers in sync with the primary server using Write-Ahead Log (WAL) files. Every change made to the database is first recorded in these WAL files on the primary server. These logs are then continuously streamed to the standby server, which applies them to stay up to date. This method ensures that all standby servers stay in sync with the primary and are ready to be promoted in case the primary server fails.

In this blog, we will explore the different types and modes of replication available in PostgreSQL to help you understand which option best fits your business needs.

Click through for the list, as well as an overview of each replication type.

Leave a Comment

Capture Long-Running Queries via Extended Events

Tom Collins has another extended events session for us:

A SQL Server Extended Event to track SQL queries taking longer than 100 seconds to complete. Adjust accoring to your requriements.

There is also a query below to extract the column details from the xel file 

Click through for the code. This kind of extended events session is rather useful for performance tuning and finding issues before customers e-mail.

Leave a Comment

Information Disclosure Vulnerability in SQL Server

Mike Walsh takes us through a recent CVE entry:

On patch Tuesday this week, Microsoft released an Important severity security update (a CVSS base score of 7.5)

The details of this 0-day exploit are available to read at the NIST site, and the Microsoft security update site.

In short, the exploit that Microsoft has discovered and subsequently fixed can allow information disclosure.

The NIST entry is kind of a joke right now, and the Microsoft security update info is basically what they submitted to NIST plus links to download the patches. Still, this is worth patching and it’s an issue that goes back at least to 2016—probably earlier, but 2016 is the last version of SQL Server that still gets security updates.

Leave a Comment

Accessing Delta Lake Tables as Iceberg Data

Matthew Hicks makes an announcement:

We’re thrilled to announce an exciting new Preview capability in OneLake: you can now automatically read Delta Lake tables using Apache Iceberg compatible readers, with no need for migration, copying, or manual conversion. This enhancement gives data engineers and analytics teams unprecedented flexibility in how they access and interact with their data.

This is pretty neat, given that Iceberg is the other popular format for data lakes.

Leave a Comment

The Through-the-Door Problem in Credit Risk Modeling

Richard Vale takes us through a data challenge:

In credit risk modelling, you want to calculate the probability that a loan will default. Since different financial institutions gather different data and offer different products, there is no one-size-fits-all approach to doing this. Therefore, credit risk models are usually built using the institution’s own data. For example, if I’m building a credit risk model for XYZ Bank, I look at loans which XYZ bank has previously granted, and try to estimate the probability that a future loan will default based on principal, tenor, the borrower’s credit rating, and so on.

For those who haven’t heard of the through-the-door problem before, this is a good moment to pause and think about what is wrong with this. Why does this process contain a huge pitfall?

Click through for the answer, as well as an example of the problem and one way to get around this. H/T R-Bloggers.

Leave a Comment

Troubleshooting Permissions in SQL Server

Jon Russell takes a look at several ways to gauge user permissions:

Understanding and troubleshooting SQL Server permissions can be challenging, especially when direct grants, role inheritance, ownership chains, and explicit denies all interact. The six scenarios that follow show how the engine decides who can do what, then demonstrate the diagnostic steps that reveal why it made that decision. Each section provides a setup script you can run in a dedicated test database, followed by a diagnostic query and a short explanation of the result.

Click through for six methods. I do wish that, instead of sys.fn_my_permissions() there was some sys.fn_user_permissions(@UserName) option. I realize that you can execute as a specific user and then run the function, but I had dreamed for years about having a way to track effective user permission changes, and sys.fn_my_permissions() requires not only that you have the authority to execute as a specific user, but also that you know all of the relevant users.

Leave a Comment

Goodbye Default Contributor Role in Fabric Workspace Identities

Varun Jain makes a security announcement:

Fabric workspace identity is an automatically managed service principal that can be associated with a Fabric workspace. Fabric workspaces with a workspace identity can securely read or write to firewall-enabled Azure Data Lake Storage Gen2 accounts through trusted workspace access for OneLake shortcuts. Fabric items can use the identity when connecting to resources that support Microsoft Entra authentication. Fabric uses workspace identities to obtain Microsoft Entra tokens without the customer having to manage any credentials. 

Previously, a workspace identity was automatically assigned the workspace contributor role and had access to workspace items.  

Read on to see what’s changing, why, and what you can do instead.

Leave a Comment

Building a Pareto Chart in Power BI

Boniface Muchendu creates a Pareto chart:

Creating a Pareto chart in Power BI is a powerful way to visualize the 80/20 rule in action. This type of chart helps you quickly identify the top contributors to your business metrics—whether you’re analyzing sales, categories, or customer segments. In this guide, you’ll learn how to build a dynamic Pareto chart using DAX, customize it, and apply it across different data dimensions.

Read on for the instructions.

Leave a Comment