Press "Enter" to skip to content

Curated SQL Posts

Enforcing Join Order in Postgres

Hans-Juergen Schoenig demands order:

What is the important observation here? Let us take a look at planning time. PostgreSQL needs 0.297 milliseconds to find the best execution plan (= execution strategy) to run the query. The question arising is: where does the planner spend this time? The thing is: even when using explicit joins as shown above, PostgreSQL will still join those tables implicitly and decide on the best join order. What does that mean in real life?

Read on to see what this means in practice and how you can control join order in Postgres. SQL Server, by contrast, has various join hints that will force a specific join order. As for the why: there are circumstances in which you have more information than the optimizer and can come up with a superior way of joining tables together, especially as queries get more complicated. One of my favorite query tuning books is Dan Tow's SQL Tuning, which is 20 years old at this point but still lays out a great way of thinking about how to attack the process of tuning a query. In that book, Dan uses several criteria to determine the table from which you want to drive a particular query, using factors like filters and the existence of foreign key constraints. From there, you have a somewhat deterministic way of defining the most efficient path for connecting the rest of the tables together. For most queries, especially in OLTP systems, this doesn't matter very much in practice. But for warehouses, it can make a world of difference.
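To make that concrete, here is a minimal sketch of the lever Hans-Juergen's post centers on: setting join_collapse_limit to 1 tells the planner to keep the join order exactly as you wrote it, trading planning time for the risk of a worse plan. The table names here are hypothetical.

```sql
-- Force the planner to honor the join order as written.
SET join_collapse_limit = 1;

EXPLAIN ANALYZE
SELECT o.order_id, c.customer_name, r.region_name
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
JOIN regions   r ON r.region_id   = c.region_id;

-- Restore the default (8) so other queries in the session are unaffected.
RESET join_collapse_limit;
```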


Incremental Refresh on Large Power BI Semantic Models

Soheil Bakhshi needs to refresh a lot of data:

Implementing incremental refresh in Power BI is usually straightforward if we carefully follow the implementation steps. However, in some real-world scenarios, following the implementation steps is not enough. In different parts of my latest book, Expert Data Modeling with Power BI, 2nd Edition, I emphasize the fact that understanding business requirements is the key to every single development project, and data modelling is no different. Let me explain it further in the context of implementing incremental data refresh.

Read on for that explanation, as well as a few tips to make things work a bit more smoothly.
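For reference, the mechanics boil down to a filter step like this minimal sketch, assuming a hypothetical Orders table with an OrderDate column plus the RangeStart and RangeEnd datetime parameters Power BI requires for incremental refresh:

```m
let
    // Hypothetical SQL Server source
    Source = Sql.Database("myserver", "SalesDB"),
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
    // Keep one boundary exclusive so a row never lands in two partitions
    Filtered = Table.SelectRows(
        Orders,
        each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd
    )
in
    Filtered
```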


Durability and Hekaton

Rob Farley ponders a pair of potential performance improvements and their effects on durability:

Durability in SQL is handled by making sure that data changes are written to disk before the calling application is informed that the transaction is complete. We don't walk out of a shop with our goods before the cashier has confirmed that the credit card payment has worked. If we did, and the payment bounced, the cashier would be calling us back pretty quickly. In SQL this is about confirming that the transaction log entry has been written, and it's why you shouldn't use disks with write-cache for databases.

And yet the in-memory features of SQL Server, commonly called "Hekaton," handle transactions without writing to disk. The durability is delayed. This month, Todd Kleinhans invites us to write about Hekaton.

In-Memory OLTP is one of those features that I wish worked better for most use cases, or at least didn't have as many limitations, such as only working within the context of a single database. In practice, instead of using In-Memory OLTP for most tables, you're usually better off just jamming more RAM into the box and limiting how many scans of large tables flush your buffer pool.
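To make the durability trade-off concrete, here is a minimal sketch (the table is hypothetical and the required memory-optimized filegroup setup is omitted): SCHEMA_ONLY skips logging entirely and loses its contents on restart, whereas SCHEMA_AND_DATA keeps full durability.

```sql
-- Non-durable: very fast, but the contents vanish on restart.
CREATE TABLE dbo.SessionCache
(
    SessionId INT NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    Payload   NVARCHAR(4000) NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);

-- The durable alternative is DURABILITY = SCHEMA_AND_DATA.
-- Delayed durability is a separate, database-level knob:
ALTER DATABASE CURRENT SET DELAYED_DURABILITY = ALLOWED;
```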


Tips for Using Powershell in Azure

Paul Harrison shares a few tips with us:

When I’m working with a new object in Azure, I often don’t know where the information I care about is actually found in the output. PowerShell makes it easy to navigate through objects; however, it isn’t easy to get an overview of all available properties when they’re nested five levels deep. I like to use ConvertTo-Json to help me get a general understanding of a new object: which properties are available and how to find them.

Read on for more information about converting to JSON and four other tips.
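As a quick sketch of the technique (the resource names are hypothetical and it assumes the Az module): note that ConvertTo-Json defaults to a depth of 2, so deeply nested properties get truncated unless you raise it.

```powershell
# Pull an object and dump its full property tree as JSON.
$vm = Get-AzVM -ResourceGroupName 'rg-demo' -Name 'vm-demo'

# The default -Depth of 2 truncates nested properties; go deeper.
$vm | ConvertTo-Json -Depth 6
```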


Proper Logging of SQL Agent Job Outputs

Aaron Bertrand hits on an annoyance of mine:

If you haven’t migrated your workloads to a managed database platform yet, you’re probably still relying on SQL Server Agent for various maintenance and other scheduled tasks. Most of the time, these processes just work. But when it’s time to troubleshoot, it can be cumbersome to get to the root of some problems. In this post, I’ll share some ideas to help you minimize the level of annoyance and tedium when you have to figure out what went wrong with the execution of a job.

Recently, I was investigating a job that ran a stored procedure against multiple databases in sequence, and ultimately failed. Here was the output of the job step as seen in Management Studio’s View Job History dialog:

Read on for a few tips to help with SQL Agent job logging.
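One of the simplest improvements, sketched here with hypothetical job and file names, is to point the step's output at a file so you get the complete output rather than the truncated message in job history:

```sql
-- Route a job step's complete output to a log file.
EXEC msdb.dbo.sp_update_jobstep
    @job_name         = N'Nightly Maintenance',  -- hypothetical job
    @step_id          = 1,
    @output_file_name = N'L:\AgentLogs\NightlyMaintenance_Step1.log',
    @flags            = 2;  -- 2 = append to the file instead of overwriting
```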


VISUAL SHAPE and Visual Calculations

Marco Russo and Alberto Ferrari fit the square pegs into the square holes:

Visual calculations, introduced as a preview feature with the February 2024 release of Power BI, aim to simplify the creation of calculations tied to a specific visual. Using visual calculations for simple calculations is straightforward.

However, as soon as developers create more complex calculations, they should understand the technical details of visual calculation implementation. This requires understanding the hierarchical structure of the virtual table, the new visual context, the semantics of ROWS and COLUMNS, the behavior of CALCULATE, and the new visual context modifiers EXPAND and COLLAPSE.

In this first article about visual calculations, we introduce VISUAL SHAPE and the basics of visual calculation implementation, leaving the remaining topics to future articles. A complete whitepaper with a detailed explanation of all these topics will be available soon to SQLBI+ subscribers.

Even without a complete whitepaper, this serves as a useful primer on the topic.
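As a small taste of the topic, here is a minimal sketch of a visual calculation, assuming a visual that already contains a [Sales Amount] field: it computes each row's share of its parent level by collapsing over the ROWS axis.

```dax
% of Parent =
DIVIDE (
    [Sales Amount],
    COLLAPSE ( [Sales Amount], ROWS )
)
```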


Dropping Extra tempdb Files

Vlad Drumea doesn’t want to restart SQL Server:

This is a brief post containing a script you can use to delete extra tempdb data files and avoid the “file not empty” (Msg 5042) error.

I’ve occasionally run into situations where an instance is configured with more than the recommended number of tempdb data files.

If you’re dealing with more than a couple of extra files, it can get annoying, especially on a busy instance.

Read on for the script and how you can use it on busy and not-so-busy instances.
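The core of the manual approach looks like this sketch (the logical file name is hypothetical); on a busy instance, the removal can keep failing with Msg 5042 while pages are still allocated in the file, which is what Vlad's script works around.

```sql
USE tempdb;

-- Move any allocated pages out of the extra file...
DBCC SHRINKFILE (N'tempdev5', EMPTYFILE);

-- ...then drop it.
ALTER DATABASE tempdb REMOVE FILE tempdev5;
```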


Saving Money on Azure Storage

Rahul Miglani claws back some cash:

In today’s digital landscape, businesses are increasingly turning to cloud storage solutions to manage their data effectively. Microsoft Azure offers a wide range of storage options tailored to meet diverse business needs while optimizing costs. In this blog post, we’ll explore how organizations can leverage Azure storage options to achieve significant cost savings without compromising performance or reliability.

Read on for ten tips. A lot of it boils down to keeping just enough data and putting it in the right tier, but there’s a bit more to the story.
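The "right tier" part is typically automated with a lifecycle management policy. Here is a minimal sketch of one (the rule name and thresholds are arbitrary): move blobs to cool after 30 days, archive after 180, and delete after a year.

```json
{
  "rules": [
    {
      "name": "tier-and-expire",
      "enabled": true,
      "type": "Lifecycle",
      "definition": {
        "filters": { "blobTypes": [ "blockBlob" ] },
        "actions": {
          "baseBlob": {
            "tierToCool":    { "daysAfterModificationGreaterThan": 30 },
            "tierToArchive": { "daysAfterModificationGreaterThan": 180 },
            "delete":        { "daysAfterModificationGreaterThan": 365 }
          }
        }
      }
    }
  ]
}
```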


Relationship Columns and Power BI DirectQuery Mode

Chris Webb builds a relationship:

Many Power BI connectors for relational databases, such as the SQL Server connector, have an advanced option to control whether relationship columns are returned or not. By default this option is on. Returning these relationship columns adds a small overhead to the time taken to open a connection to a data source and so, for Power BI DirectQuery semantic models, turning this option off can improve report performance slightly.

Read on to learn what these relationship columns are and how you can remove them. Chris also provides a first-order approach to estimating the performance cost of including them.
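If you write the M yourself, the option corresponds to the SQL Server connector's CreateNavigationProperties flag. A minimal sketch, with hypothetical server and database names:

```m
let
    // Turn off relationship (navigation) columns at the source.
    Source = Sql.Database(
        "myserver.database.windows.net",
        "SalesDB",
        [CreateNavigationProperties = false]
    )
in
    Source
```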
