
Author: Kevin Feasel

Classification Concepts and CART in Action

I have a new video series:

In this video, I explain some core concepts behind classification and introduce the first classification algorithm we will look at: CART.

CART, by the way, stands for Classification and Regression Trees, and is one of the easiest classification algorithms to understand as a concept: it’s a decision tree (aka, a series of if-else statements) where each terminal node is an outcome: either a class for classification or a value for regression.
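To make that "series of if-else statements" idea concrete, here is a minimal hand-rolled sketch of what a fitted CART classifier reduces to. The feature names and split thresholds are invented for illustration (loosely inspired by the classic iris dataset), not the output of any actual training run:

```python
# A minimal sketch of what a fitted CART classifier reduces to:
# nested if-else splits, with each terminal node returning a class label.
# The feature names and split thresholds below are invented for illustration.

def predict_iris_like(petal_length: float, petal_width: float) -> str:
    if petal_length < 2.5:           # first split
        return "setosa"              # terminal node: a class label
    else:
        if petal_width < 1.75:       # second split
            return "versicolor"
        else:
            return "virginica"

print(predict_iris_like(1.4, 0.2))  # short petals
print(predict_iris_like(5.1, 2.0))  # long, wide petals
```

For regression, the only difference is that each terminal node would return a numeric value instead of a class label.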


Visualizing a Spark Execution Plan

Gerhard Brueckl builds a very helpful tool:

I recently found myself in a situation where I had to optimize a Spark query. Coming originally from a SQL world, I knew how valuable a visual representation of an execution plan can be for performance tuning. Soon I realized there is no easy-to-use tool or snippet that would allow me to do that. There are tools like DataFlint, the ubiquitous Spark monitoring UI, and the Spark explain() function, but they are either hard to use or hard to get up and running, especially as I was looking for something that works in both of my favorite Spark engines: Databricks and Microsoft Fabric.

Read on for Gerhard’s answer, including an example of it in action.


Finding Object Dependencies in SQL Server

Andy Brownsword looks for references:

When looking to migrate, consolidate or deprovision parts of a SQL solution it’s key to understand the dependencies on the objects inside.

Identifying dependencies can be challenging and I wanted to demonstrate one way to approach this. We’ll start with some objects across a couple of databases:

Read on for a pair of queries that get you on your way. Reference detection is surprisingly difficult in SQL Server, especially if you have cross-server queries. Even cross-database queries may not work the way you expect.

Another option is to use sys.dm_sql_referencing_entities and sys.dm_sql_referenced_entities. I wrote a blog post on the topic a long while back and included some of the caveats around these two DMFs.


Logging and Auditing in PostgreSQL

Muhammad Ali checks the logs:

In PostgreSQL, managing logs serves as a vital tool for identifying and resolving issues within your application and database. However, navigating through logs can be overwhelming due to the volume of information they contain. To address this, it’s essential to implement a well-defined logs management strategy.

Customizing PostgreSQL logs involves adjusting various parameters to suit your specific needs. Each organization may have unique requirements for logging, depending on factors such as the type of data stored and compliance standards.

In this article, we will explain the parameters used to customize logs in PostgreSQL. Furthermore, we describe how to record queries in PostgreSQL and finally recommend a tool for managing PostgreSQL logs at a granular level.

Read on to learn how to enable logs in Postgres, some notes on log management, and even a bit on auditing via pgaudit.
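For a sense of what log customization looks like in practice, most of it happens through parameters in postgresql.conf (or via ALTER SYSTEM). These are real PostgreSQL parameters, but the specific values below are illustrative examples, not recommendations:

```ini
# postgresql.conf — illustrative logging settings (values are examples only)
logging_collector = on                 # capture stderr output into log files
log_destination = 'stderr'             # also supports csvlog, syslog, jsonlog (PG 15+)
log_min_duration_statement = 500       # log statements running longer than 500 ms
log_statement = 'ddl'                  # none | ddl | mod | all
log_line_prefix = '%m [%p] %u@%d '     # timestamp, PID, user@database
log_connections = on
log_disconnections = on
```

Dialing log_statement or log_min_duration_statement too aggressively on a busy system can generate a lot of log volume, which is exactly why a management strategy matters.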


Darling Data Stored Procedure Updates

Erik Darling takes on the Royal We to announce updates:

We here at Darling Data strive to get things right the first time, but sometimes late nights and tired eyes conspire against us.

The nice thing about using these on a wide variety of SQL Servers in various states of disrepair is that bugs get spotted and sorted pretty quickly.

You can download all of the main SQL Server troubleshooting procedures I use in one convenient file.

Here’s a breakdown of changes you can find in the most recent releases!

Click through for quick changelogs for sp_QuickieStore, sp_PressureDetector, and sp_HumanEventsBlockViewer.


SQL Server Connection Strings and Power Apps

Deborah Melkin works through a pain point in Power Apps:

Power Platform is part of the Microsoft universe of products, for lack of a better phrase. But the one thing I find interesting is that the default connectors for data in the Power Platform sphere are Dataverse and SharePoint. At least, when I see people talking about Power Apps, they're connecting to one of those two connectors for their data. (Fun fact: Power Apps solutions use Dataverse to store configurations.) One would think that SQL Server, wherever it may live, Azure or on-prem, would be part of that combination, but it's actually considered a Premium connector.

I’ve mentioned that the PowerApp I’m building is using data in a SQL Server database. This matters because the type of connector you use makes a difference as you move apps from one environment to another.

Read on for more information around environment variables, why they won’t work, and one alternative solution.


Using strsplit() with Multiple Delimiters in R

Steven Sanderson shows off some more complex string splitting scenarios in R:

In data preprocessing and text manipulation tasks, the strsplit() function in R is incredibly useful for splitting strings based on specific delimiters. However, what if you need to split a string using multiple delimiters? This is where strsplit() can really shine by allowing you to specify a regular expression that defines these delimiters. In this blog post, we’ll dive into how you can use strsplit() effectively with multiple delimiters to parse strings in your data.

Read on for two examples of complex scenarios.
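The multi-delimiter trick translates directly to other languages. As a point of comparison (this is Python's re.split rather than R's strsplit, and the delimiters and sample string are arbitrary examples), a character class in the regular expression lists every delimiter at once:

```python
import re

# Split on any of comma, semicolon, or pipe, with optional trailing spaces.
# The rough R equivalent would be strsplit(text, "[,;|]\\s*");
# here we use Python's re.split as an analogous example.
text = "apples,oranges; bananas|grapes"
parts = re.split(r"[,;|]\s*", text)
print(parts)  # ['apples', 'oranges', 'bananas', 'grapes']
```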


Adding GIFs to Power BI Reports

Riqo Chaar adds a bit of motion to cards:

This article will describe the process behind adding GIFs to card visuals in Power BI. The GIFs we will create in this article will be as follows: animated arrows, looping only once, displaying the direction of movement relating to a particular value between the current period and the previous period. These GIFs work extremely well as a visual aid, highlighting key information quickly to users, without any overstimulating effect due to a single loop being used.

This article was inspired by a video from the YouTube channel, How to Power BI.

Click through for the article. I’m pretty well on the fence about this: adding GIFs is not something I would think to do, primarily because of the distraction factor. Even so, it’s still good to know that it’s possible.


A Primer on Transactional Replication

Steve Stedman talks transactional replication:

Ensuring that your databases are synchronized across different locations with minimal delay is not just a convenience—it’s a necessity. This is where transactional replication in SQL Server shines, making it a pivotal strategy for systems that require real-time data replication with high consistency. Our latest video, “Transactional Replication in SQL Server”, dives deep into this topic, offering insights and visual walkthroughs that are invaluable for database administrators and developers.

Click through for the video and how the pieces fit together for transactional replication at a high level.


Understanding the Delta Lake Format

Reza Rad has a new post and video combo:

Please don’t get lost in the terminology pit regarding analytics. You have probably heard of Lake Structure, Data Lake, Lakehouse, Delta Tables, and Delta Lake. They all sound the same! Of course, I am not here to talk about all of them; I am here to explain what Delta Lake is.

Delta Lake is an open-source standard for Apache Spark workloads (and a few others). It is not specific to Microsoft; other vendors are using it, too. This open-source standard format stores table data in a way that can be beneficial for many purposes.

In other words, when you create a table in a Lakehouse in Fabric, the underlying structure of files and folders for that table is stored in a structure (or we can call it format) called Delta Lake.

Read on to learn more about this open standard and how it all fits together with Microsoft Fabric.
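To make that "structure of files and folders" concrete, a Delta Lake table on disk is essentially Parquet data files plus a _delta_log folder of numbered JSON commit records. The file names below are illustrative:

```text
my_table/
├── _delta_log/
│   ├── 00000000000000000000.json     # commit 0: initial table creation
│   └── 00000000000000000001.json     # commit 1: e.g., an append or update
├── part-00000-<uuid>.snappy.parquet  # data files in Parquet format
└── part-00001-<uuid>.snappy.parquet
```

The transaction log is what layers ACID guarantees and time travel on top of otherwise plain Parquet files.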
