Press "Enter" to skip to content

Month: August 2024

Power BI Data Type Optimization

Nikola Ilic shows how important it can be to choose the right data types:

For demo purposes, I’ll be using a fact table that contains the data about chats performed by a customer support department of the fictitious company Customer First. This table includes approximately 9 million rows, which is not considered a large table in the context of Power BI and analytical workloads. For the sake of simplicity, let’s pretend that our model consists of only this single table. Finally, a semantic model is configured as an Import mode model. If you want to learn how your data is stored in Power BI, I suggest you start by reading this article first.

Data was loaded into Power BI from the underlying data source (SQL Server database) as-is, without any additional optimizations applied.

Nikola walks through the process of finding the most expensive columns in terms of data size and switching to the least precise acceptable data type. One other thing I commonly see is identity columns or other keys on fact tables. Those are very rarely necessary, because the point of a fact table is typically to aggregate it in some fashion. And because these keys are unique by design, they won't compress very well and will take up a lot of space. Looking at Nikola's example, my next question would be: knowing that the table is named factChat, does chatID tie to some chat dimension? If not, is it actually necessary for reporting? Again, if not, removing it could shave another 60 MB or so off the data model.
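To make that concrete, here is a minimal sketch of the kind of source-side change this implies. Aside from chatID, the column names and types are invented for illustration; the real least-precise-acceptable types depend on your data.

```sql
-- Hypothetical import view over the fact table: cast each column to the
-- least precise type reporting actually needs, and leave the unique
-- surrogate key out entirely, since unique values compress poorly.
CREATE VIEW dbo.factChatForImport
AS
SELECT
    CAST(ChatStartTime AS datetime2(0))  AS ChatStartTime,    -- drop sub-second precision
    CAST(WaitTimeSeconds AS smallint)    AS WaitTimeSeconds,  -- small range fits a smaller type
    CAST(SatisfactionScore AS tinyint)   AS SatisfactionScore, -- e.g., a 1-5 scale fits in a byte
    AgentKey,
    CustomerKey
    -- chatID intentionally omitted: unique by design, so it bloats the model
FROM dbo.factChat;
```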


Finding Columns in Memory in Power BI Direct Lake Mode

Chris Webb goes searching:

As you probably know, in Power BI Direct Lake mode column data is only loaded into memory when it is needed by a query. I gave a few examples of this – and how to monitor it using DMVs – in this blog post from last year. But which columns are loaded into memory in which circumstances? I was thinking about this recently and realised I didn’t know for sure, so I decided to do some tests. Some of the results were obvious, some were a surprise.

Read on for the answer.
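For reference, this is the general shape of the DMV check Chris describes, run against the semantic model through the XMLA endpoint (for example, from SSMS or DAX Studio). I'm assuming the Direct Lake-era columns ISRESIDENT, TEMPERATURE, and LAST_ACCESSED are available on this rowset.

```sql
-- Which column segments are currently resident in memory, hottest first.
SELECT
    TABLE_ID,
    COLUMN_ID,
    SEGMENT_NUMBER,
    ISRESIDENT,     -- whether the segment is currently paged into memory
    TEMPERATURE,    -- rises with query use, decays over time
    LAST_ACCESSED
FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS
ORDER BY TEMPERATURE DESC;
```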


Troubleshooting Non-Editable Power Query Parameters in Microsoft Fabric

Soheil Bakhshi digs into a problem:

Power Query is a powerful tool within the Microsoft Fabric environment, enabling users to manage data sources and transform data efficiently. However, a common issue you may face is that after publishing the Semantic Model, the Power Query parameters either do not appear or are greyed out, making them non-editable. In this post and its accompanying YouTube video, I’ll walk you through the steps to diagnose and fix these problems, ensuring that your parameters work as expected in your published semantic models.

Click through for the video and a pair of common reasons.


Auditing a SQL Server Database Configuration

Ben Johnston continues a series on auditing:

This continues my series on auditing SQL Server. The first parts covered discovery and documentation, server-level hardware audits, and SQL Server engine-level audits. This section examines database configuration audits. As with the previous audit sections, the boundaries for the audit can be blurry. I try to stick to configuration items only, but I also discuss some code smells and items that can impact performance or might be covered in a code review. You will need to determine the scope of your audits and how much you want to cover in this portion of the audit.

This follows the pattern of the previous audits, starting with a list of items to validate, followed by key points to examine for each of those items, and ending with scripts or tactics to gather the actual audit results. As with previous audits, there are multiple methods to examine each item, but I generally prefer scripts due to their repeatability, especially when they need to be run by another team.

Read on for a general template, followed by details on each section.
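As a taste of the scripted approach Ben favors, a query along these lines covers several of the usual database-level configuration checks. It's a starting point, not Ben's actual audit script.

```sql
-- Common database-level configuration items, one row per database.
SELECT
    name,
    compatibility_level,
    recovery_model_desc,
    page_verify_option_desc,           -- CHECKSUM is the usual expectation
    is_auto_close_on,                  -- generally should be off
    is_auto_shrink_on,                 -- generally should be off
    is_auto_create_stats_on,
    is_auto_update_stats_on,
    snapshot_isolation_state_desc,
    is_read_committed_snapshot_on
FROM sys.databases
ORDER BY name;
```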


Analyzing the Game Wingspan

Dan Oehm builds a meta:

Wingspan is a great game even though I’ve only played it a few times. The mechanics are great, there are lots of bird variations, and a bunch of different strategies to try. There are 170 birds, and I’ve probably only seen 30 of them. So, true to form, I’ve dabbled in a bit of data analysis to get a view of all the different types of cards in the game.

Open source wins again since the {wingspan} R package exists. It contains the details of each bird in the core, European, Oceania, and swift-start sets. I’ll only be using the core set for this analysis since that’s the only one I’m semi-familiar with.

Even though I haven’t played the game before, Dan’s visuals drew me in. There’s also a regression analysis and a discussion of the trade-off between in-game power and victory points. H/T R-Bloggers.


A Primer on One-Hot Encoding

Vinod Chugani does a bit of data modeling:

Preparing categorical data correctly is a fundamental step in machine learning, particularly when using linear models. One Hot Encoding stands out as a key technique, enabling the transformation of categorical variables into a machine-understandable format. This post tells you why you cannot use a categorical variable directly and demonstrates the use of One Hot Encoding in our search for the most predictive categorical features for linear regression.

Read the whole thing.
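Vinod works in Python, but the transformation itself is easy to see in SQL terms: each category level becomes its own 0/1 indicator column. The table and column names here are made up for illustration.

```sql
-- One-hot encoding a categorical Color column by hand.
-- For linear regression you would typically drop one level (the baseline)
-- to avoid perfect multicollinearity among the indicators.
SELECT
    ProductID,
    CASE WHEN Color = 'Red'   THEN 1 ELSE 0 END AS IsRed,
    CASE WHEN Color = 'Green' THEN 1 ELSE 0 END AS IsGreen,
    CASE WHEN Color = 'Blue'  THEN 1 ELSE 0 END AS IsBlue
FROM dbo.Products;
```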


T-SQL Tuesday 177 Roundup

Mala Mahadevan gives us the low-down on database code management:

I was privileged to host yet another T-SQL Tuesday, for the month of August 2024. My topic was on Managing database code. I was worried about getting responses, given that most of the community is no longer on X/Twitter and my invite didn’t seem to get much attention. I was wrong. People seem to keep track of this by other means, and the response was excellent. A summary is below.

Read on for 11 takes on the topic.


Table Partitioning in PostgreSQL

Lee Asher talks partitions:

Partitioning in PostgreSQL was for many years a feature almost entirely ignored, and even today is often thought of only as a means to increase performance on large tables. While it’s certainly useful for this, it’s also an invaluable tool for solving common data maintenance problems. In this article, we’ll examine the feature from this perspective.

Let’s consider a common situation: a table continually ingesting chronologically ordered data: purchase orders, forum posts, time-series measurements, whatever. Most systems contain more than one table like this. And because we live in a world of Big Data, these tables can contain hundreds of millions, even billions of rows. Business rules require the data to remain active for some fixed period of time, then be deleted (or archived first, then deleted). Commonly this is a monthly process: every month, copy data that has “aged out” to another location, then delete the rows from the live table.

It’s interesting that people consider partitioning a performance-improving technique in the PostgreSQL world, but in the SQL Server world, partitioning is primarily for ease of maintenance and we warn against thinking about it as a performance-improving technique.
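To see why partitioning turns that monthly retention chore into a maintenance win, here's a minimal sketch using declarative range partitioning (PostgreSQL 10 and later); the table and column names are invented.

```sql
-- Parent table partitioned by month on the ingestion timestamp.
CREATE TABLE purchase_order (
    order_id   bigint        NOT NULL,
    created_at timestamptz   NOT NULL,
    amount     numeric(10,2)
) PARTITION BY RANGE (created_at);

CREATE TABLE purchase_order_2024_08 PARTITION OF purchase_order
    FOR VALUES FROM ('2024-08-01') TO ('2024-09-01');

-- Monthly maintenance becomes a metadata operation instead of a huge DELETE:
ALTER TABLE purchase_order DETACH PARTITION purchase_order_2024_08;  -- keep for archiving
-- or simply: DROP TABLE purchase_order_2024_08;
```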


Checking SQL Server Backup Status with sp_CheckBackup

Jeff Iannucci announces a new tool:

Whether you are backing up your SQL Server databases with Ola Hallengren’s scripts, Maintenance Plans, or some third-party software, your backups are your lifeline for recovering your data in case of a disaster.

So…are your databases recoverable? Do you know WHERE your most recent backups were written? Do you know WHEN they were written? Did you verify the backups for recoverability? Have you backed up any encryption certificates associated with your backups?

And if you think you know the answers to these questions…are you sure? And can you confirm your Recovery Point Objective (RPO)?

Read on to learn more about the procedure and grab a copy from the GitHub repo.
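If you want a sense of what such a procedure automates, the raw material lives in msdb. A hand-rolled check for the most recent full backup per database might look something like this (sp_CheckBackup itself goes much further):

```sql
-- Latest full backup per database, with when and where it was written.
SELECT
    d.name AS database_name,
    b.backup_finish_date,
    mf.physical_device_name
FROM sys.databases AS d
LEFT JOIN msdb.dbo.backupset AS b
    ON  b.database_name = d.name
    AND b.type = 'D'   -- full backups
    AND b.backup_finish_date =
        (SELECT MAX(b2.backup_finish_date)
         FROM msdb.dbo.backupset AS b2
         WHERE b2.database_name = d.name
           AND b2.type = 'D')
LEFT JOIN msdb.dbo.backupmediafamily AS mf
    ON mf.media_set_id = b.media_set_id
WHERE d.database_id <> 2  -- tempdb is never backed up
ORDER BY d.name;
```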


SQL Server on Linux for Disaster Recovery

Mika Sutinen shares a use case:

SQL Server on Linux has been an option for quite a while already, but I have to admit that, personally, I have not previously been very excited about the idea. This is mostly due to some complexities and limitations of SQL Server on Linux. However, the recent CrowdStrike incident made it quite obvious that there’s a great use case for it. And that use case is Disaster Recovery.

The idea of having some operating system variety makes sense, though read the whole thing, as some companies rely on components that SQL Server doesn’t offer on Linux.

And if you want a deep dive into using SQL Server on Linux, I’ll do a bit of self-promotion and reference my video series on the topic.
