Day: March 5, 2024

An Overview of Data Partitioning Strategies

thanhdoancong (there are spaces in there somewhere but I’d probably guess wrong) talks partitions:

Data partitioning is the magic wand that divides your massive dataset into smaller, organized subsets called partitions. These partitions are based on specific criteria, like date ranges, customer segments, or product categories.

It’s like organizing your overflowing closet by color, season, or type of clothing. Each section becomes easier to browse and manage, making life (and data analysis) much easier.

Read on for a few varieties of partitioning and how they could improve your data estate. There’s no guarantee that partitioning will improve performance (and in SQL Server’s case, the partitioning feature often does not improve performance at all because that isn’t its intent), but this is a good read to get an idea of what strategies are available.
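To make the date-range idea concrete, here is a minimal Python sketch of partitioning by month. It is not taken from the linked post: the `orders` rows, the `order_date` field, and both function names are hypothetical, and a real engine does this at the storage layer rather than in application code.

```python
from collections import defaultdict
from datetime import date


def partition_by_month(rows, date_key="order_date"):
    """Group rows into partitions keyed by the (year, month) of a date field."""
    partitions = defaultdict(list)
    for row in rows:
        d = row[date_key]
        partitions[(d.year, d.month)].append(row)
    return dict(partitions)


def rows_for_month(partitions, year, month):
    """Read only the one partition that can contain the requested month."""
    return partitions.get((year, month), [])


# Hypothetical sample data, for illustration only.
orders = [
    {"order_id": 1, "order_date": date(2024, 1, 15)},
    {"order_id": 2, "order_date": date(2024, 2, 3)},
    {"order_id": 3, "order_date": date(2024, 2, 20)},
]

partitions = partition_by_month(orders)
february = rows_for_month(partitions, 2024, 2)
```

The payoff, when there is one, comes from the second function: a query that filters on the partitioning column only has to touch the matching partition instead of scanning everything.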

Implicit Conversion Works in Both Directions

Deborah Melkin shares a lesson:

I’ve been working on a project where I have to do a lot of interesting data manipulation. Unfortunately, dealing with implicit conversions has been part of the norm. So naturally, I managed to run across an interesting scenario that had me stumped, thinking that I ran into a strange edge case.

The tl;dr version is that this wasn’t an edge case or some undocumented issue or a bug or anything other than the db engine doing its job. I was looking at one scenario and missed the rest of the clues. But it’s worth sharing how these things are easily missed.

Click through for the full story.

SSMS 20 and Default Security

Brent Ozar notes a change:

SQL Server Management Studio 20 Preview 1 is out, and the new connection dialog has a big change:

When you click Connect, you’re likely going to get an error:

Read on for the quick-and-easy solution, which brings behavior back to the pre-SSMS 20 default, as well as the long-term solution to prevent it from being an issue at all.

This brings SSMS in line with Azure Data Studio, which has defaulted to requiring certificates for quite some time. Note that you will need to select “Trust server certificate” if you are using a self-signed cert, though self-signed certs remove one of the two benefits of using certificates in the first place. The first benefit is that certificates allow for encrypting the Tabular Data Stream (TDS) packets SQL Server sends over the network. Self-signed certs do just as good a job of that task as certificates you get from a trusted authority.

The second benefit of certificates, however, is ensuring that this is definitely the machine and service you intend to connect to. If an attacker takes over the machine and swaps out the certificate with their own, your client should panic a bit because that’s your early-warning indicator that something is wrong.
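The same two knobs show up outside of SSMS as well. As a rough sketch, assuming Microsoft’s ODBC Driver 18 for SQL Server and its `Encrypt` and `TrustServerCertificate` connection string keywords, this is what the choice looks like from client code. The server and database names are hypothetical, and the helper function is my own illustration rather than anything from the linked post.

```python
def build_connection_string(server, database, trust_server_certificate=False):
    """Build an ODBC connection string that always asks for encryption.

    trust_server_certificate=True skips certificate validation, which is
    what a self-signed certificate needs. Traffic is still encrypted, but
    the client no longer verifies which server it is talking to.
    """
    parts = {
        "Driver": "{ODBC Driver 18 for SQL Server}",
        "Server": server,
        "Database": database,
        "Trusted_Connection": "yes",
        "Encrypt": "yes",
        "TrustServerCertificate": "yes" if trust_server_certificate else "no",
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


# Hypothetical server names, for illustration only.
validated = build_connection_string("sql01.example.com", "Sales")
self_signed = build_connection_string(
    "devbox", "Sales", trust_server_certificate=True
)
```

Both strings encrypt the connection; only the first one gets you the identity check.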

Measuring Query Times in Power BI DirectQuery Mode

Chris Webb breaks out the stopwatch:

If you’re tuning a DirectQuery semantic model in Power BI, one of the most important things you need to measure is the total amount of time spent querying your data source(s). Now that the queries Power BI generates to get data from your source can be run in parallel, it means you can’t just sum up the durations of the individual queries sent to get the end-to-end duration. The good news is that there is a new trace event available in Log Analytics (though not in Profiler at the time of writing) which solves this problem.

Read on to learn more about this event.
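To see why summing falls apart once queries overlap, here is a small Python sketch that compares the naive sum with the time during which at least one query was running. The timings are hypothetical (start and end offsets in milliseconds), and this is only an illustration of the overlap problem, not a description of how the trace event computes its value.

```python
def summed_duration(intervals):
    """Naive total: add up every query's individual duration."""
    return sum(end - start for start, end in intervals)


def wall_clock_duration(intervals):
    """Time during which at least one query was running.

    Sorts the intervals by start time and merges the ones that overlap,
    so time shared by parallel queries is counted only once.
    """
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Gap before this query: close out the previous merged block.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlaps the current block: extend it if this query ends later.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


# Hypothetical (start_ms, end_ms) pairs; the first three queries overlap.
queries = [(0, 400), (100, 500), (450, 700), (900, 1000)]

naive_total = summed_duration(queries)       # 1150
elapsed_total = wall_clock_duration(queries)  # 800
```

The naive sum overstates the time spent by 350 ms here because the overlapping stretches get counted more than once.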

Measuring Write Speeds in SQL Server

Vlad Drumea performs a test:

In this post I cover a script I’ve put together for measuring storage write speeds in SQL Server, namely against database data files.

This is meant to help get an idea of how the underlying storage performs when SQL Server is writing 1GB of data to a database.

At this point, you might be asking yourself: “Why not use CrystalDiskMark instead?”.
The answer is simple: you might not always be able to install/run additional software in an environment. Even more so if you work with external customers or you’re a consultant. It’s a lot simpler to ask a customer to run a script and send you the output, than it is to ask them to install and run some 3rd party software.

Click through for the script, what it does, and how to run it, as well as a note on limitations and an example based on three drives.
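If you only have the elapsed time for the 1GB write, turning it into a throughput figure is simple arithmetic. This Python sketch assumes 1GB means 1024³ bytes and uses a made-up elapsed time; it is not the script’s output or any measured result.

```python
ONE_GB = 1024 ** 3  # assuming 1GB means 1 GiB (1024^3 bytes)


def write_throughput_mb_per_s(bytes_written, elapsed_seconds):
    """Convert bytes written and elapsed time into MB/s (1 MB = 1024^2 bytes)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return bytes_written / (1024 * 1024) / elapsed_seconds


# Hypothetical timing: 1GB written in 4 seconds.
throughput = write_throughput_mb_per_s(ONE_GB, 4.0)  # 256.0
```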
