Press "Enter" to skip to content

Curated SQL Posts

Better Reports with the 3-30-300 Rule

Kurt Buhler comes up with a good rule:

Effective reports and dashboards should enable users to quickly answer their data questions so that they can focus on their primary business tasks and responsibilities. To help you design effective reports, we introduce the 3-30-300 rule for information design. The 3-30-300 rule is a straightforward and practical approach for you to produce efficient report layouts by structuring reports in a functionally hierarchical way. This rule concisely paraphrases the visual information-seeking mantra from Ben Shneiderman (1996). To make it easier to understand for Power BI developers, we express this rule with respect to approximately how long it should take users to get certain information or perform certain tasks in a report.

It’s a clever mnemonic and Kurt does a good job of showing how you could implement it.

Granting Developers Query Store Access

Josephine Bush wants to allow developers to solve their own problems:

Let’s have devs look at their own query performance. Yes, please, sign me up for that! Sometimes, it’s hard for me to know the best course of action, especially when they are using Entity Framework, but it’s a great start for them to use Query Store to see how impactful their queries are. I’m happy to help them decipher results if they are confused, but I really like performance tuning being a team sport. I was giving them a list of queries with, for example, high CPU usage, but it was even better when they could go in there and use Query Store for themselves on a regular basis.

The actual granting of rights takes a couple of lines of T-SQL, and Josephine also provides an overview of Query Store along the way. Erik Darling’s sp_QuickieStore plays a prominent role in this post, and I agree that it’s extremely helpful. I’d also be remiss not to bring up QDS Toolbox, as it’s a rather good solution in its own right.
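
For reference, here is a minimal sketch of what that grant can look like. The database and role names below are hypothetical, and newer versions of SQL Server offer narrower permissions you might prefer, but VIEW DATABASE STATE is what exposes the sys.query_store_* catalog views that tools like sp_QuickieStore read:

    -- Run in the database whose Query Store the developers should see.
    USE YourDatabase;
    GO
    -- A role for the development team (hypothetical name).
    CREATE ROLE QueryStoreReaders;
    GO
    -- VIEW DATABASE STATE exposes the Query Store catalog views (sys.query_store_*).
    GRANT VIEW DATABASE STATE TO QueryStoreReaders;
    GO
    -- Add each developer (or a Windows/AD group) to the role.
    ALTER ROLE QueryStoreReaders ADD MEMBER [YourDomain\DevGroup];
    GO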

Dynamic Cursors in SQL Server

Hugo Kornelis continues a series on cursors:

We’re already at part 31 of the plansplaining series. And this is also the third part in my discussion of execution plans for cursors. After explaining the basics, and after diving into static cursors, it is now time to investigate dynamic cursors. As a quick reminder, recall that a static cursor presents data as it was when the cursor was opened (and does so by simply saving a snapshot of that data in tempdb), whereas a dynamic cursor is supposed to see all changes that are committed while the cursor is open. Let’s see how this change in semantics affects the execution plan.

Read on as Hugo gives it the college try and also admits he might be missing something in the explanation.
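
If you want to see the semantic difference Hugo describes without digging into the plans, a quick sketch (the table and cursor names are mine) is to open a static and a dynamic cursor over the same data and then commit a change underneath them:

    -- Hypothetical demo table.
    CREATE TABLE dbo.Orders (OrderID int PRIMARY KEY, Amount decimal(10, 2));
    INSERT INTO dbo.Orders (OrderID, Amount) VALUES (1, 10.00), (2, 20.00);

    -- A static cursor snapshots the result set into tempdb at OPEN time;
    -- a dynamic cursor re-reads the data on every fetch.
    DECLARE static_cur  CURSOR STATIC  FOR SELECT OrderID, Amount FROM dbo.Orders;
    DECLARE dynamic_cur CURSOR DYNAMIC FOR SELECT OrderID, Amount FROM dbo.Orders;

    OPEN static_cur;
    OPEN dynamic_cur;

    -- A change committed after OPEN is only visible to the dynamic cursor.
    UPDATE dbo.Orders SET Amount = 99.00 WHERE OrderID = 2;

    FETCH NEXT FROM static_cur;   -- OrderID 1
    FETCH NEXT FROM static_cur;   -- OrderID 2: Amount = 20.00 (the snapshot value)
    FETCH NEXT FROM dynamic_cur;  -- OrderID 1
    FETCH NEXT FROM dynamic_cur;  -- OrderID 2: Amount = 99.00 (sees the committed update)

    CLOSE static_cur;  DEALLOCATE static_cur;
    CLOSE dynamic_cur; DEALLOCATE dynamic_cur;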

GRANT Operations in Postgres

Shaun Thomas takes us through GRANT operations and roles in Postgres:

Not every database-backed application needs to be locked down like Fort Knox. Sometimes there are even roles that leverage blanket access to large swathes of available data, if not every table, simply for auditing or monitoring purposes. Normally this would require quite a bit of preparation or ongoing privilege management, but Postgres came up with a unique solution starting with version 14: predefined roles.

This topic comes up relatively frequently in Postgres chats like Discord, Slack, and IRC. Usually it’s along the lines of: “We have a low security application but have separated read and write access from the table owner to avoid accidents. That user should still be able to read or write any table in the database. What do I do?”

This is an area where Postgres and SQL Server are using the same terms but aren’t quite speaking the same language.
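
For context, the predefined roles Shaun covers make that “read or write any table” request a one-liner on Postgres 14 and later. A minimal sketch, with a made-up login role:

    -- pg_read_all_data and pg_write_all_data are predefined roles (Postgres 14+).
    -- Membership grants blanket read (or write) access to every table, view, and
    -- sequence in the database, regardless of who owns the objects.
    CREATE ROLE reporting_app LOGIN PASSWORD 'change-me';
    GRANT pg_read_all_data TO reporting_app;

    -- Only if the role should also be able to modify data:
    GRANT pg_write_all_data TO reporting_app;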

Power BI Command Memory Limit

Chris Webb is overdrawn at the memory bank:

Continuing my series on Power BI model memory errors (see part 1 and part 2), in this post I will look at the Command Memory Limit, which restricts the amount of memory that XMLA commands like Create, Alter and, most importantly, Refresh can use.

If you’ve ever been told that your semantic model should consume less than half the amount of memory available to it because memory consumption can double during a full refresh, then that is because of the Command Memory Limit. 

Read on to learn more about the Command Memory Limit and why this advice exists.

Reviewing Experimental Results in the Process

John Cook talks philosophy of statistics:

Suppose you’re running an A/B test to determine whether a web page produces more sales with one graphic versus another. You plan to randomly assign image A or B to 1,000 visitors to the page, but after only randomizing 500 visitors you want to look at the data. Is this OK or not?

John also has a follow-up article:

Suppose you design an experiment, an A/B test of two page designs, randomizing visitors to Design A or Design B. You planned to run the test for 800 visitors and you calculated some confidence level α for your experiment.

You decide to take a peek at the data after only 300 randomizations, even though your statistician warned you in no uncertain terms not to do that. Something about alpha spending.

You can’t unsee what you’ve seen. Now what?

Read on for a very interesting discussion of the topic. I’m definitely in the Bayesian camp: learn quickly, update frequently, particularly early on when you have little information on the topic and the marginal value of learning one additional piece of information is so high.
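
To make that concrete (my sketch, not John’s): if you put a Beta prior on a page’s conversion rate and observe s conversions out of n visitors, the posterior is

    \theta \mid \text{data} \sim \mathrm{Beta}(\alpha + s,\ \beta + n - s)

and it depends only on the data actually observed, not on how many times you peeked along the way, so updating after 300 visitors and again after 800 costs nothing from the Bayesian point of view.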

Generating Data in SQL Server based on Distributions

Rick Dobson builds some data:

I support a data science team that often asks for datasets with different distribution values in uniform, normal, or lognormal shapes. Please present and demonstrate the T-SQL code for populating datasets with random values from each distribution type. I also seek graphical and statistical techniques for assessing how a random sample corresponds to a distribution type.

This is an interesting article, though if you want a set-based version of generating data according to a normal distribution, I have a blog post where I translated the RBAR version into something that performs a bit better. Converting to log-normal form also makes a lot of intuitive sense.
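
As a flavor of what a set-based version can look like, here is a minimal sketch using the Box-Muller transform; the sample size, mean, and standard deviation are placeholders, and this is not necessarily the same formulation Rick or my older post uses:

    -- Set-based generation of normally distributed values via the Box-Muller transform.
    DECLARE @mean  float = 100.0,
            @stdev float = 15.0;

    WITH n AS (
        -- Generate 10,000 rows without a loop.
        SELECT TOP (10000)
               ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
        FROM sys.all_columns AS a
        CROSS JOIN sys.all_columns AS b
    ),
    u AS (
        -- Two independent uniform(0, 1) draws per row.
        SELECT rn,
               RAND(CHECKSUM(NEWID())) AS u1,
               RAND(CHECKSUM(NEWID())) AS u2
        FROM n
    )
    SELECT rn,
           -- Standard normal draw, scaled and shifted to the target distribution.
           @mean + @stdev * SQRT(-2.0 * LOG(u1)) * COS(2.0 * PI() * u2) AS normal_value,
           -- EXP() of a standard normal draw gives a lognormal value.
           EXP(SQRT(-2.0 * LOG(u1)) * COS(2.0 * PI() * u2)) AS lognormal_value
    FROM u;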

Comparing Microsoft Fabric Warehouse and Lakehouse Performance

Reitse Eskens busts out the stopwatch:

I just can’t seem to stop doing this, checking the limits of Microsoft Fabric. In this instalment I’ll try and find some limits on the data warehouse experience and compare them with the Lakehouse experience. The data warehouse is a bit different compared to the Lakehouse, so I’ll be digging into that one first. Then I’m going to load data into the warehouse with a copy data pipeline followed by some big queries to test performance. The Fabric Capacity App will be used to check out the capacity necessary (or used for that matter).

As usual, I’m using the F2 capacity as it’s the one that should break the easiest. It’s also the cheapest one to run tests against and, as the capacity calculation isn’t dependent on the SKU (Stock Keeping Unit), you can easily translate to find out which capacity SKU will fit the workload. Remember that your workload will differ from the one shown in this blog. These tests are a comparison between the different offerings, something you could do for yourself. These blogs are a bit of a happy place as every option will get a good chance. In your work, your skills (and those of your co-workers) will be a major driver towards an option. Even if this offers the chance to learn something new!

Reitse focuses on ingesting and transforming data, and the results are quite interesting.

Invoking a Fabric Data Factory Pipeline via REST API

Andy Leonard makes a call:

This post is current as of 30 May 2024. There are other posts by fantastic bloggers about how to use the Fabric REST API. Fabric development is progressing so fast, some of those posts are less up-to-date. Make no mistake, this post will most likely not age well, and for the very same reason. That’s ok. We bloggers live to serve. I, like all the rest, will endeavor to persevere – and we will all write more posts, Lord willing.

In this post, I share one way to invoke Fabric Data Factory pipelines using the REST API.
I will be using the web version of Postman to call REST API methods.
You can sign up for a free Postman account. Since it’s free, I encourage you to check the box to receive news and offers from them. As I mentioned in an earlier post, you can always unsubscribe if the messages are unhelpful or if they get too “chatty.”

Read on for that way.

Using PostGIS in the Terminal

Dian M. Fay talks turkey about terminals:

Of late, I’ve been falling down a bunch of geospatial rabbit holes. One thing has remained true in each of them: it’s really hard to debug what you can’t see.

There are ways to visualize these. Some more-integrated SQL development environments like pgAdmin recognize and plot columns of geometry type. There’s also the option of standing up a webserver to render out raster and/or vector tiles with something like Leaflet. Unfortunately, I don’t love either solution. I like psql, vim, and the shell, and I don’t want to do some query testing here and copy others into and out of pgAdmin over and over; I’m actually using Leaflet and vector tiles already, but restarting the whole server just to start debugging a modified query is a bit much in feedback loop time.

Read on for Dian’s recommendations.
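
In the meantime, one low-tech option in the same spirit (not necessarily what Dian lands on) is to dump geometries as text or GeoJSON right in psql, assuming a hypothetical parcels table with a geom column:

    -- Quick terminal-friendly look at a geometry column, no GUI required.
    SELECT id,
           ST_AsText(geom)    AS wkt,      -- well-known text, human readable
           ST_AsGeoJSON(geom) AS geojson   -- paste into geojson.io or similar
    FROM parcels
    LIMIT 5;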
