Press "Enter" to skip to content

Curated SQL Posts

Granular Billing for Azure Data Factory

Chenye Charlie Zhu announces a new feature:

By default, Azure Data Factory reports lump sum charges for billing, meaning that at the factory level, we add up charges across all pipelines within a factory, and tell you how much you have spent on these pipelines. In many cases, these aggregate numbers should suffice. But in others, these numbers lack the clarity and transparency that we thrive to provide customers. For instance, if you are running data pipelines for multiple teams, you may want to determine the cost for each pipeline, for proper book-keeping and/or charge backs.

Now, Azure Data Factory will help you with this endeavor, with built-in per pipeline detailed billing view. Moreover, we built the feature on top of the Azure Billing and Cost Analysis platform, allowing you to stay with the cost and budget management tool that you are familiar with to identify spending trends and spot where overspending might have occurred.

Great if you have half a dozen pipelines. Probably less great if you have 500.

Comments closed

Database Lineage with Extended Properties

Garry Bargsley shows a use for extended properties on databases:

Databases listed with 1, 2, 3, 4? What do those databases represent? Where did those databases come from?

Would you believe me if I told you there was a way to identify where the databases were sourced from?

A setting that can be used in SQL Server would allow the person who built these databases or restored them from another system to add a note.

Extended properties are very useful but also very easy to forget and difficult to manage. I’ve seen a few vendor products make great use of them but in-house development tends to ignore them.

Comments closed

Tracking Lineage in Power BI

Gilbert Quevauvilliers reads the chain of custody documents:

As often happens blog posts originate from a customer requesting something I have not figured out before.

In this example the requirement was to be able to determine which data sources were being used by which tables, which were then associated to one or many Power BI datsets.

While I was working through this I figured out I could take it one step further and also if required have the actual Power Query as part of the report.

Read on to see what Gilbert came up with.

Comments closed

Rewriting Tricky Functions in SQL Server

Erik Darling fights dragons:

Far and away, some of the trickiest situations I run into when helping clients is rewriting scalar functions that have WHILE loops in them.

This sort of procedural code is often difficult, but not impossible, to replace with set-based logic.

Erik improves a function in this post, though often, the best way to improve a function is not to play the game at all.

Comments closed

IsNull and IsEmpty in KQL

Robert Cain’s fuel gauge is running on E:

In writing queries, it is not uncommon to get results where a column has missing values. This can cause concerns or questions from your users. “Why is this blank?”, “There must be something wrong with your query its missing data!”.

To avoid this, Kusto provides two functions to check for missing values: isnull and isempty. You can combine this with the iif function (covered in the Fun With KQL – IIF post) to provide clarifying text to the end user.

Check out the examples of how to use these two functions in Robert’s post.

Comments closed

Solving the CanSum Problem in R

Tomaz Kastrun knows if you can sum those together:

CanSum problem is a problem where a given array of integers (nums) and a target integer (target), return boolean (TRUE \ FALSE) indicating, that the target integer can be calculated using two or more numbers in array nums.

You may assume that each integer from the array can be used multiple times. You can also assume, that any integer (in nums or in target) is from 0 to +1e9 and the length of the nums array is from 2 to 1000 elements.

Click through for an example of one brute-force solution, followed by a much faster solution.

Comments closed

Creating a SQL Server Assessment Dashboard

Robert Blackburn builds a dashboard:

We must periodically evaluate the state of our databases. Luckily for SQL Server, Microsoft provides us with a customizable assessment through their SQL Assessment API Repo and API Documentation. You can change the rules per database and output the results to a database to track history.

However, that will take more than an hour. Let’s create a dashboard with the default rules in under an hour. We will use Azure Data Studio (ADS) and Power BI Desktop (PBI). If you are not familiar with them, both are free. Azure Data Studio is automatically installed with SSMS 18.7 and higher. You can also install them individually.

Read on to see how this works. Granted, it will not auto-update but unless the assessment output format changes between runs, at least you wouldn’t need to modify Power BI and could just refresh the data.

Comments closed

Finding Faulty Rows in Tabular Server Errors

Teo Lachev goes error-hunting:

A scheduled SSIS job that executes a massive DAX query to an on-prem Tabular server (Power BI can also generate this error) one day decided to throw an error “Source: “Microsoft OLE DB Provider for Analysis Services.” Hresult: 0x80004005 Description: “MdxScript(Model) (2020, 98) Calculation error in measure ‘Account Snapshot'[Average utilisation % of all CR active current accounts last 3 months]: The result of a conversion or arithmetic operation is either too large or too small.” At least we know the offending measure, but which row is causing the error? The query requests some 300+ measures for 120 million customers, so I thought someone might find the troubleshooting technique useful. Let’s ignore what the measure does for now except mentioning that it performs a division of two other measures.

Click through for the technique.

Comments closed

Auto Partitioning Recommendations for Oracle

Brendan Tierney checks out some recommendations:

In a previous blog post I gave an overview of the DBMS_AUTO_PARTITION package in Oracle Autonomous Database. This looked at how you can get started and to setup Auto Partitioning and to allow it to automatically implement partitioning.

This might not be something the DBAs will want to happen for lots of different reasons. An alternative is to use DBMS_AUTO_PARTITION to make recommendations for tables where partitioning will have a performance improvement. The DBA can inspect these recommendations and decide which of these to implement.

Read on to see how you can run the recommender, as well as what a recommendation looks like.

Comments closed

Geo-Zone Redundant Storage for SQL MI Backups

Niko Neugebauer moves the backups pretty far away:

The new Geo-Zone Redundant Storage (GZRS) backup storage option combines the best of two worlds – Geo-Redundant and Zone-Redundant storage, keeping backups safe from both regional (Geo-Redundant) and Data Center (Zone-Redundant) failures. It provides the highest availability for storage currently offered on Azure, improving recovery speed and enabling Point-In-Time Restore (PITR) of backups in the event of a zone failure.  

Geo-Zone Redundant Storage for Azure SQL Managed Instance backups provides 3 synchronous copies in different availability zones within the same primary region, plus an additional asynchronous copy within a single availability zone in the paired secondary region, as shown on the following picture: 

Click through for that picture and what it does for expected availability. Basically, a whole bunch of data centers would need to fail before you lose a backup. Or someone messes up DNS and makes everything unavailable for a day, not that that’s ever happened before with a large cloud service provider…

Comments closed