Press "Enter" to skip to content

Curated SQL Posts

Performance Tuning Tables with Filters in Power BI

Chris Webb doesn’t want to wait:

There are four columns: Date, Town and two measures. One measure called [Fast Measure] is, as the name suggests, very quick to execute; the other measure, called [Slow Measure], is very complex and slow. The definitions are irrelevant here. Notice that there is a filter on this table visual so only the rows where [Fast Measure] is greater than 1 are shown.

If I measure the amount of time to render this table in Performance Analyzer, it takes around 17.5 seconds to run. However, if I remove the filter on [Fast Measure], the table only takes 8 seconds to run. Why? The filter is on the fast measure and surely more rows are returned without the filter, so wouldn’t the slow measure be evaluated more?

Click through for the answer.

Comments closed

Defining Production-Grade Code

Deb Melkin puzzles on a topic:

To me, the most obvious part is that code going out into production can pass code review and testing. You don’t want to send out code that has obvious bugs – like having incorrect joins or inaccurate calculations or poor query plans, etc. Code should also go through reviews, not just to make sure the obvious bugs are there but other basic development standards, like naming conventions and proper commenting, have been followed.

There’s also an interesting bit about proofs of concept that I wanted to expand upon a little. The terms “pilot” and “proof of concept” sound interchangeable (and people switch the terms around quite often) but mean rather different things. A proof of concept is an attempt to see if some concept actually works: for example, can I have my .NET-based web app send 5,000 messages per second to an Azure Event Hub without things falling apart? Proofs of concept are intended to be quick-and-dirty code which do not take into account security, robustness, or code quality. This is throw-away code and should never go into production.

By contrast, a pilot is intended to be production-ready but serves as a scaled-down version of a product. A pilot data warehouse might only include data for one department in the organization but the design accommodates future growth and the people developing it do everything they would do to make the warehouse robust and production-ready.

The really tricky part is that companies will often ask for a proof of concept and then treat it like a pilot, which is where they go wrong and end up spending years on awful maintenance of an awful product.

Comments closed

A Checklist for Code Quality

Deepthi Goguri has a list and is checking it twice:

3. Check for the execution times based on the DEV, TEST data. I have seen scenarios where the data size in the DEV and TEST environments are not same as production data. This can cause the difference in the execution times from lower environments to production. It is always suggested to test your code in lower environments having same amount of data as production.

Read on for the full list.

Comments closed

Commenting on Actions in Production

Andy Leonard wants to know whodunnit:

While “Commands completed successfully” is helpful to know – and definitely better than an error message – it does not inform the person executing the command precisely what just happened. Rather, “Commands completed successfully” informs the user that “something happened,” and as far as SQL Server Management Studio is concerned, whatever it was succeeded.

I prefer more details.

Andy also casually drops the concept of idempotence, which I define as saying that the end state is the same no matter how many times you run a specific process. My phrasing of the term is a little different though we get to the same point. Which is itself kind of an explanation of idempotence if you squint a little bit.

Comments closed

Practical Results of a ZooKeeper-less Kafka

Paul Brebner does the math:

The Kafka cluster meta-data is now only stored in the Kafka cluster itself, making meta-data update operations faster and more scalable. The meta-data is also replicated to all the brokers, making failover from failure faster too. Finally, the active Kafka controller is now the Quorum Leader, using Raft for leader election.

The motivation for giving Kafka a “brain transplant” (replacing ZooKeeper with KRaft) was to fix scalability and performance issues, enable more topics and partitions, and eliminate the need to run an Apache ZooKeeper cluster alongside every Kafka cluster.

Read on for some initial testing of KRaft versus ZooKeeper.

Comments closed

Diagnosing Kafka Message Throughput Reductions

Danica Fine and Nikoleta Verbeck troubleshoot an issue:

One of the greatest advantages of Kafka is its ability to maintain high throughput of data. Unsurprisingly, high throughput starts with the producers. Prior to sending messages off to the brokers, individual records destined for the same topic-partition are batched together as a single compressed collection of bytes. These batches are then further aggregated before being sent to the destination broker.

Batching is a great thing, and we (generally) want it. But how do you know when it’s working well and when it’s not?

This first post covers message throughput but there will be several other topics in the series as well.

Comments closed

Defining Production-Grade Code

Chad Callihan takes a detour:

You understand what you’re thinking when you write code but will you remember it in a few weeks or months? What about the new associate that has to work with your code? Will they be able to decipher what you were thinking based on code alone?

If code isn’t documented, it can make work unnecessarily difficult. For production-grade code, I would expect it to be well documented so even someone with minimal knowledge can get an idea of what the code is doing. It’s not necessary to write a novel for documentation but having something is better than nothing.

Read on for more thoughts of what makes code production-grade.

Comments closed

Conditional Formatting from Text in Power BI

Mara Pereira shows us a trick:

Have you ever wondered if you can apply conditional formatting based on a text field/measure instead of a numeric field/measure?

If your answer is yes, then this trick is for you!

The other day I was working with a customer who asked something that I had no idea how to build.

They wanted to apply conditional formatting over some of their visuals, but they wanted the conditional formatting applied over a text field and not over a numeric field or a measure.

Read on to see how.

Comments closed

Qualities of Production-Grade Code

Aaron Bertrand pulls out the list:

In a lot of programming languages, efficiency is almost always the guidepost. Sometimes, minimizing character count or line count is a “fool’s gold” measure of the quality of the code itself. Other times, unfortunately, engineers are judged not by quality at all, but rather the sheer number of lines of code produced – where more is, somehow, obviously better. Over my career, “how much code there is” has never been a very meaningful measure in any language.

But I’m here to talk about T-SQL, where certainly efficiency is a good thing to measure – though there are some caveats to that:

Read on for those caveats and what Aaron considers to be the hallmarks of high-quality code.

Comments closed

Horizontal Fusion in DAX

Marco Russo and Alberto Ferrari put on their lab coats:

Fusion is a DAX optimization that reduces the number of storage engine queries when the engine detects that multiple calculations can be merged together in a single query. There are two types of fusions: vertical fusion and horizontal fusion.

Vertical fusion occurs when multiple measures – or calculations in general – need to be computed in the same filter context. For example, the following query requires the calculation of two measures: Sales Amount and Margin:

Read on to see how horizontal fusion differs and when it can be most useful.

Comments closed