Press "Enter" to skip to content

Curated SQL Posts

DATE_BUCKET and DATETRUNC in SQL Server 2022

Itzik Ben-Gan shows a good use of a pair of new T-SQL functions:

Time-based grouping and aggregation are common in analyzing data using T-SQL—for example, grouping sales orders by year or by week and computing order counts per group. When you apply time-based grouping, you often group the data by expressions that manipulate date and time columns with functions such as YEAR, MONTH, and DATEPART. Such manipulation typically inhibits the optimizer’s ability to rely on index order. Before SQL Server 2022, there was a workaround that enabled relying on index order, but besides being quite ugly, it had its cost, and the tradeoff wasn’t always acceptable.
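
As a quick taste of the two functions before you read Itzik’s post, here is a minimal sketch of my own (not Itzik’s code), using a made-up @Orders table variable and requiring SQL Server 2022 or later:

```sql
-- Hypothetical data standing in for a real orders table
DECLARE @Orders TABLE (orderid int, orderdate date);
INSERT INTO @Orders VALUES (1, '20220103'), (2, '20220105'), (3, '20220112');

-- DATETRUNC truncates each date to the start of its calendar period
SELECT DATETRUNC(week, orderdate) AS weekstart, COUNT(*) AS numorders
FROM @Orders
GROUP BY DATETRUNC(week, orderdate);

-- DATE_BUCKET assigns dates to fixed-width buckets counted from an origin date
SELECT DATE_BUCKET(week, 2, orderdate, CAST('20220103' AS date)) AS bucketstart,
       COUNT(*) AS numorders
FROM @Orders
GROUP BY DATE_BUCKET(week, 2, orderdate, CAST('20220103' AS date));
```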

Comments closed

Finding the Real Error in a DAX Measure

Teo Lachev builds on a prior post:

This blog builds upon my previous “Resolving Tabular Conversion Errors” and applies to Analysis Services in all flavors (Power BI, MD, and Tabular). In the scenario I described in the previous blog, the server at least told us the name of the offending measure in the error description. But sometimes you might not be that lucky. For example, recently I got this error when running a DAX query requesting many measures: “Microsoft OLE DB Provider for Analysis Services.” Hresult: 0x80004005 Description: “MdxScript(Model) (2000, 133) Failed to resolve name ‘SYNTAXERROR’. It is not a valid table, variable, or function name.” All we know is that there is a syntax error in some measure but good luck finding it if you have hundreds of measures in the query and your model. However, the (2000,133) section references the line number and column number in the MDX script (Yeap, MDX even if you use Tabular), so if we can get the script, we might be able to correlate the error.

Read on to see how you can capture that script and get the error. The technique is definitely not intuitive.
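
As a hedged aside, and not necessarily Teo’s exact capture technique (read the post for that): Analysis Services exposes schema rowset DMVs you can query from an MDX or DAX window in SSMS, and dumping every measure’s expression at least gives you searchable text to hunt through:

```sql
-- MDSCHEMA_MEASURES lists each measure with its underlying expression
SELECT MEASURE_NAME, EXPRESSION
FROM $SYSTEM.MDSCHEMA_MEASURES
WHERE CUBE_NAME = 'Model';   -- Tabular models typically surface as 'Model'
```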

Comments closed

Granular Billing for Azure Data Factory

Chenye Charlie Zhu announces a new feature:

By default, Azure Data Factory reports lump sum charges for billing, meaning that at the factory level, we add up charges across all pipelines within a factory, and tell you how much you have spent on these pipelines. In many cases, these aggregate numbers should suffice. But in others, these numbers lack the clarity and transparency that we strive to provide customers. For instance, if you are running data pipelines for multiple teams, you may want to determine the cost for each pipeline, for proper bookkeeping and/or chargebacks.

Now, Azure Data Factory will help you with this endeavor, with built-in per pipeline detailed billing view. Moreover, we built the feature on top of the Azure Billing and Cost Analysis platform, allowing you to stay with the cost and budget management tool that you are familiar with to identify spending trends and spot where overspending might have occurred.

Great if you have half a dozen pipelines. Probably less great if you have 500.

Comments closed

Database Lineage with Extended Properties

Garry Bargsley shows a use for extended properties on databases:

Databases listed with 1, 2, 3, 4? What do those databases represent? Where did those databases come from?

Would you believe me if I told you there was a way to identify where the databases were sourced from?

There is a setting in SQL Server that would allow the person who built these databases, or restored them from another system, to add a note.

Extended properties are very useful but also very easy to forget and difficult to manage. I’ve seen a few vendor products make great use of them, but in-house development tends to ignore them.
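
If you want to try this yourself, here is a minimal sketch; the property name and value are invented:

```sql
-- With no level0/1/2 arguments, the property attaches to the current database
EXEC sys.sp_addextendedproperty
     @name  = N'SourceEnvironment',
     @value = N'Restored from the PROD backup taken 2022-10-01';

-- Reading it back: class 0 means database-level properties
SELECT name, value
FROM sys.extended_properties
WHERE class = 0;
```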

Comments closed

Tracking Lineage in Power BI

Gilbert Quevauvilliers reads the chain of custody documents:

As often happens, blog posts originate from a customer requesting something I have not figured out before.

In this example, the requirement was to be able to determine which data sources were being used by which tables, which were then associated with one or many Power BI datasets.

While I was working through this, I figured out I could take it one step further and, if required, have the actual Power Query as part of the report.

Read on to see what Gilbert came up with.

Comments closed

Rewriting Tricky Functions in SQL Server

Erik Darling fights dragons:

Far and away, some of the trickiest situations I run into when helping clients involve rewriting scalar functions that have WHILE loops in them.

This sort of procedural code is often difficult, but not impossible, to replace with set-based logic.

Erik improves a function in this post, though often, the best way to improve a function is not to play the game at all.
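
To make the pattern concrete, here is a trivial sketch of my own (not Erik’s example): a looping scalar UDF next to a set-based inline table-valued rewrite, the latter using GENERATE_SERIES from SQL Server 2022:

```sql
-- Procedural version: a scalar UDF that loops to sum the integers 1 through @n
CREATE OR ALTER FUNCTION dbo.SumToN_Loop (@n int)
RETURNS bigint
AS
BEGIN
    DECLARE @i int = 1, @total bigint = 0;
    WHILE @i <= @n
    BEGIN
        SET @total += @i;
        SET @i += 1;
    END;
    RETURN @total;
END;
GO

-- Set-based rewrite: generate the numbers as a set and aggregate once
CREATE OR ALTER FUNCTION dbo.SumToN_Set (@n int)
RETURNS TABLE
AS
RETURN
    SELECT SUM(CAST(value AS bigint)) AS total
    FROM GENERATE_SERIES(1, @n);   -- requires SQL Server 2022
GO

SELECT total FROM dbo.SumToN_Set(1000);
```

The inline version can also be applied per row with CROSS APPLY, which is where the wins over scalar UDFs usually show up.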

Comments closed

IsNull and IsEmpty in KQL

Robert Cain’s fuel gauge is running on E:

In writing queries, it is not uncommon to get results where a column has missing values. This can cause concerns or questions from your users: “Why is this blank?”, “There must be something wrong with your query, it’s missing data!”.

To avoid this, Kusto provides two functions to check for missing values: isnull and isempty. You can combine this with the iif function (covered in the Fun With KQL – IIF post) to provide clarifying text to the end user.

Check out the examples of how to use these two functions in Robert’s post.
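
Robert’s post is all KQL, but for the T-SQL-minded, the analogous guard distinguishes NULL from empty string in much the same way; a minimal made-up example:

```sql
DECLARE @Events TABLE (id int, details varchar(50));
INSERT INTO @Events VALUES (1, 'ok'), (2, NULL), (3, '');

-- IS NULL plays the role of isnull; = '' plays the role of isempty
SELECT id,
       IIF(details IS NULL, 'No value recorded',
           IIF(details = '', 'Empty value', details)) AS details_display
FROM @Events;
```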

Comments closed

Solving the CanSum Problem in R

Tomaz Kastrun knows if you can sum those together:

The CanSum problem: given an array of integers (nums) and a target integer (target), return a boolean (TRUE/FALSE) indicating whether the target integer can be calculated using two or more numbers from the nums array.

You may assume that each integer from the array can be used multiple times. You can also assume that any integer (in nums or in target) is from 0 to +1e9 and that the length of the nums array is from 2 to 1000 elements.

Click through for an example of one brute-force solution, followed by a much faster solution.
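
Tomaz solves it in R; just for fun, here is a brute-force equivalent in T-SQL using a recursive CTE (my own sketch, not from the post):

```sql
DECLARE @target int = 7;
DECLARE @nums TABLE (n int);
INSERT INTO @nums VALUES (5), (3), (4);

WITH sums AS
(
    -- Anchor: each number by itself is a reachable total of one addend
    SELECT n AS total, 1 AS addends FROM @nums
    UNION ALL
    -- Recurse: extend every reachable total by one more number, pruning past @target
    SELECT s.total + v.n, s.addends + 1
    FROM sums AS s
    JOIN @nums AS v ON s.total + v.n <= @target
)
SELECT CASE WHEN EXISTS (SELECT 1 FROM sums WHERE total = @target AND addends >= 2)
            THEN 'TRUE' ELSE 'FALSE' END AS CanSum
OPTION (MAXRECURSION 0);   -- lift the default 100-level recursion cap
```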

Comments closed

Creating a SQL Server Assessment Dashboard

Robert Blackburn builds a dashboard:

We must periodically evaluate the state of our databases. Luckily for SQL Server, Microsoft provides us with a customizable assessment through their SQL Assessment API Repo and API Documentation. You can change the rules per database and output the results to a database to track history.

However, that will take more than an hour. Let’s create a dashboard with the default rules in under an hour. We will use Azure Data Studio (ADS) and Power BI Desktop (PBI). If you are not familiar with them, both are free. Azure Data Studio is automatically installed with SSMS 18.7 and higher. You can also install them individually.

Read on to see how this works. Granted, it will not auto-update, but unless the assessment output format changes between runs, at least you wouldn’t need to modify Power BI and could just refresh the data.
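
If you do go the extra mile and persist results to a database (for example, via the SqlServer PowerShell module’s Invoke-SqlAssessment cmdlet piped to Write-SqlTableData), trend queries become plain T-SQL. The schema below is invented purely for illustration:

```sql
-- Hypothetical table for accumulated assessment output
CREATE TABLE dbo.AssessmentResults
(
    CollectedAt  datetime2     NOT NULL DEFAULT SYSUTCDATETIME(),
    TargetObject sysname       NOT NULL,
    CheckId      nvarchar(128) NOT NULL,
    Message      nvarchar(max) NOT NULL
);

-- Which checks keep firing across collection runs?
SELECT TargetObject, CheckId, COUNT(*) AS Occurrences
FROM dbo.AssessmentResults
GROUP BY TargetObject, CheckId
ORDER BY Occurrences DESC;
```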

Comments closed

Finding Faulty Rows in Tabular Server Errors

Teo Lachev goes error-hunting:

A scheduled SSIS job that executes a massive DAX query to an on-prem Tabular server (Power BI can also generate this error) one day decided to throw an error “Source: “Microsoft OLE DB Provider for Analysis Services.” Hresult: 0x80004005 Description: “MdxScript(Model) (2020, 98) Calculation error in measure ‘Account Snapshot'[Average utilisation % of all CR active current accounts last 3 months]: The result of a conversion or arithmetic operation is either too large or too small.” At least we know the offending measure, but which row is causing the error? The query requests some 300+ measures for 120 million customers, so I thought someone might find the troubleshooting technique useful. Let’s ignore what the measure does for now except mentioning that it performs a division of two other measures.

Click through for the technique.
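
Not necessarily Teo’s method (click through for that), but the generic way to corner a row like this is bisection: run the failing arithmetic over half the key range at a time and chase whichever half still errors. A toy repro, with an invented table variable whose third row overflows:

```sql
DECLARE @snap TABLE (accountid int, utilisation decimal(18, 6));
INSERT INTO @snap VALUES (1, 0.42), (2, 0.77), (3, 99999.0);

SELECT accountid,
       CAST(utilisation * 100 AS tinyint) AS pct   -- row 3 overflows tinyint
FROM @snap
WHERE accountid BETWEEN 1 AND 2;   -- widen the range until the error reappears
```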

Comments closed