Press "Enter" to skip to content

Curated SQL Posts

Data Governance in Databricks with Unity Catalog

Paul Roome, et al, announce the upcoming GA for Databricks Unity Catalog:

Today we are excited to announce that Unity Catalog, a unified governance solution for all data assets on the Lakehouse, will be generally available on AWS and Azure in the upcoming weeks. Currently, you can apply for a public preview or reach out to a member of your Databricks account team.

In a previous blog, we set out our vision for a governed lakehouse and how Unity Catalog can help customers simplify governance at scale. This blog will explore the most recent updates to Unity Catalog and our growing partner ecosystem.

Click through for those updates and to sign up for the public preview if so inclined.

Power BI Smart Narratives

Gauri Mahajan shapes the narrative:

To make it easier for the end-user, this job may be done by report or business analysts who pre-analyze the reports and manually form textual narratives that summarize the key highlights. While that solves the challenge in question, it opens the possibility of analysts’ bias being introduced into the report, and the end-user may or may not agree with the narrative. Some systems solve this issue by employing complex machine learning, natural language processing, or other artificial intelligence-based mechanisms to auto-generate smart textual narratives that summarize the key highlights of the data. Though this approach works, it requires significant resources and hard-to-find skills, which puts it outside the bounds of a normal end-user who wants to use a reporting tool in a self-service manner and build a dashboard.

Modern reporting solutions like Tableau, AWS QuickSight, Microsoft Power BI, and others in a similar league have been offering a feature to generate key insights using built-in AI/ML in the reporting tool, which enables an end-user to extract insights and enables a report developer to have a smart visual that auto-updates the insights as the data changes.

In practice, this ends up being more of a fun toy than a really practical solution. Part of the issue is that decent analysis is hard, even more so when you have to develop something before even seeing the data or having any priors around feature importance.

Query Store in SQL Server 2022

Melody Zacharias gives us a heads up on what’s new with Query Store:

The SQL Server team has again improved Query Store for SQL Server 2022. Query Store was originally introduced as a flight recorder for your queries: it gathers query performance data and gives you insights into your workloads over time. In 2022 it is being used to build and expand new capabilities in intelligent query processing. To allow this to work well and be accurate, Query Store is now enabled by default for new databases. In addition to providing hinting support, it will facilitate the ability to build new intelligent query processing scenarios and improve performance.

Read on for a list of improvements you’ll see in the product.
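If you want to see where Query Store stands on your existing databases before (or after) an upgrade, a minimal T-SQL sketch along these lines works; the database name below is hypothetical:

```sql
-- Check whether Query Store is already on for each database.
SELECT name, is_query_store_on
FROM sys.databases;

-- Explicitly enable it on an existing database; SQL Server 2022 turns it on
-- by default for new databases, but existing ones keep their current setting.
ALTER DATABASE [YourDatabase]
SET QUERY_STORE = ON (OPERATION_MODE = READ_WRITE);
```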

An Overview of Parameter-Sensitive Plan Optimization

Erik Darling is diving into what we currently know about Parameter-Sensitive Plan Optimization, starting with an overview:

The way this feature works is, rather than caching a single query plan for every other execution to use, it creates what’s called a Dispatcher plan (if your query qualifies).

You’ll see something like this in the properties of the root node of your query plan, and your query will have some additional funny business at the end of it.

Read on to see what information is available to us and current feature limitations.
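For reference, a minimal sketch of getting the feature into play and then peeking at the variant plans Query Store records; the database name is hypothetical, and the catalog view reflects my understanding of the SQL Server 2022 additions:

```sql
-- Parameter-Sensitive Plan optimization requires database compatibility level 160.
ALTER DATABASE [YourDatabase] SET COMPATIBILITY_LEVEL = 160;

-- When a query qualifies and gets a dispatcher plan, Query Store tracks the
-- per-variant queries and ties each one back to its parent query and dispatcher.
SELECT qv.query_variant_query_id,
       qv.parent_query_id,
       qv.dispatcher_plan_id
FROM sys.query_store_query_variant AS qv;
```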

Resolving Implicit Conversion on a Join

Andrea Allred has a process:

As I was troubleshooting a performance issue, I noticed that there was an implicit conversion (SQL Server automatically converts the data from one data type to another) happening in my join. The join was on a column that was named the same in both tables, but one was datatype INT (integer) and the other was a datatype of VARCHAR(50) (variable character up to 50 places).

Read on for one way to resolve this issue…so long as no other calling code expects a string on a call.
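To make the scenario concrete, here is a small hypothetical repro of the mismatch and the kind of fix Andrea describes; the table and column names are made up:

```sql
-- Orders.CustomerID is INT; LegacyCustomers.CustomerID is VARCHAR(50).
-- Joining them as-is forces a CONVERT_IMPLICIT on every row, which can
-- turn an index seek into a scan on the converted side.
SELECT o.OrderID, l.CustomerName
FROM dbo.Orders AS o
    INNER JOIN dbo.LegacyCustomers AS l
        ON o.CustomerID = l.CustomerID;  -- INT compared to VARCHAR(50)

-- One fix: align the data types at the source, provided nothing else
-- still expects the string version of the column.
ALTER TABLE dbo.LegacyCustomers
ALTER COLUMN CustomerID INT NOT NULL;
```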

Triggering a Power BI Dataset Refresh from Synapse

Nick Edwards updates a dataset:

Log in to powerbi.com and, in the top right-hand corner, locate “Settings” and then “Admin portal”.

Under “Tenant settings”, locate “Developer settings” and then “Allow service principals to use Power BI APIs”.

Set this setting to “Enabled” using the toggle. Next, under the heading “Apply to:”, select “Specific security groups (Recommended)”. Then add the newly created security group “AzureSynapsePowerBIIntegration” and click Apply.

Click through for the full process.

Reading from and Writing to Excel with R

Benjamin Smith needs to modify an Excel file:

I was recently asked, as part of a larger task, to combine multiple sheets from an Excel workbook into a single sheet. When approached about the problem, I was immediately asked if I was going to use VBA to do it. While I know my way around VBA, it does not have a native way to undo its operations, so I was uncomfortable with the potential hazard VBA would pose if a mistake were made or something went wrong.

In this blog I share how it’s possible to combine and format sheets using the openxlsx package and base R. Since I’m limiting myself to one library and base R, I will be employing base R’s pipe operator – |> – instead of the superior magrittr pipe – %>% (my opinion only, don’t take it too seriously).

Can confirm, the magrittr “default” pipe is better.

SharePoint Lists Showing 100 Items in Logic Apps

Koen Verbeeck needs more than 100 results:

I was reading a SharePoint List using the “Get Items” activity in an Azure Logic App. I explain how you can create such a Logic App in the blog post Reading a SharePoint List with Azure Logic App.

It all worked fine for a while, but recently the list grew larger than 100 items. Suddenly, I started getting complaints that some items didn’t make it into the data warehouse. What’s going on? I ran the Logic App and I could see only 100 items were inserted into the SQL Server table:

Read on to see how you can bump that number past 100.

Using a Tree Map as a Legend in Power BI

Prathy Kamasani makes clever use of a tree map:

I recently worked on two projects where the client wanted to show multiple metrics sliced by the same categorical data. For example, seeing how various metrics are performing over different regions or different product groups. A use case like this can be achieved in many ways; probably the best approach is to use the small multiples functionality or, to keep it simple, five of the same visual with different metrics.

Let’s look into it with energy consumption data. Here, I want to show metrics 1 to 5 on different income levels over the years.

I like this solution when you have multiple graphs off of the same base data, like in the small multiples scenario Prathy shows us.
