
Day: June 29, 2022

Customer Segmentation via Databricks Solution Accelerator

Gavita Regunath discovers customer segments in a dataset:

We will be using the German Credit dataset, a publicly available dataset provided by Dr. Hans Hofmann of the University of Hamburg. The German Credit dataset contains features describing 1000 loan applicants who have taken credit from the bank. Using this dataset, our aim will be to answer the following question: “How should the bank personalise its products for its customers?”

Click through to see an example of clustering to generate customer segments.
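
To give a feel for what clustering does here, below is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not the accelerator's code, and the six applicants are made up for illustration (age in years, credit amount in thousands); they are not rows from the German Credit dataset. In practice you would scale the features and use a library implementation.

```python
import random
from math import dist


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


# Hypothetical applicants: (age, credit amount in thousands).
applicants = [(22, 1.5), (25, 2.0), (27, 1.8), (48, 9.5), (52, 11.0), (55, 10.2)]
centroids, labels = kmeans(applicants, k=2)

for applicant, label in zip(applicants, labels):
    print(f"segment {label}: {applicant}")
```

Each resulting segment can then be profiled (average age, typical loan size, and so on) to decide which products to offer it.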


Data Governance in Databricks with Unity Catalog

Paul Roome, et al., announce the upcoming GA for Databricks Unity Catalog:

Today we are excited to announce that Unity Catalog, a unified governance solution for all data assets on the Lakehouse, will be generally available on AWS and Azure in the upcoming weeks. Currently, you can apply for a public preview or reach out to a member of your Databricks account team.

In a previous blog, we set out our vision for a governed lakehouse and how Unity Catalog can help customers simplify governance at scale. This blog will explore the most recent updates to Unity Catalog and our growing partner ecosystem.

Click through for those updates and to sign up for the public preview if so inclined.


Power BI Smart Narratives

Gauri Mahajan shapes the narrative:

To make it easier for the end-user, this job may be done by report or business analysts who pre-analyze the reports and manually form textual narratives that summarize the key highlights in the report. While this solves the challenge in question, it opens the possibility of analysts’ bias being introduced into the report, and the end-user may or may not agree with the narrative. Some systems solve this issue by employing complex machine learning, natural language processing, or other artificial intelligence-based mechanisms to auto-generate smart textual narratives that summarize the key highlights of the data. Though this approach works, it requires significant resources and hard-to-find skills, which are outside the bounds of a normal end-user who may want to use a reporting tool in a self-service manner and build a dashboard.

Modern reporting solutions like Tableau, AWS QuickSight, Microsoft Power BI, and others in a similar league offer a feature that generates key insights using AI/ML built into the reporting tool. This enables an end-user to extract insights and enables a report developer to have a smart visual that auto-updates the insights as the data changes.

In practice, this ends up being more of a fun toy than a really practical solution. Part of the issue is that decent analysis is hard, even more so when you have to develop something before even seeing the data or having any priors around feature importance.


Query Store in SQL Server 2022

Melody Zacharias gives us a heads up on what’s new with Query Store:

The SQL Server team has again made some great improvements to Query Store for SQL Server 2022. Query Store was originally introduced as a flight recorder for your queries. It uses a system that gathers query performance data and gives you insights into your workloads over time. In 2022 it is being used to build and expand new capabilities in intelligent query processing. To allow this to work well and be accurate, Query Store is now enabled by default for new databases. In addition to providing hinting support, it will facilitate the ability to build new intelligent query processing scenarios and improve performance.

Read on for a list of improvements you’ll see in the product.


An Overview of Parameter-Sensitive Plan Optimization

Erik Darling is diving into what we currently know about Parameter-Sensitive Plan Optimization, starting with an overview:

The way this feature works is, rather than caching a single query plan for every other execution to use, it creates what’s called a Dispatcher plan (if your query qualifies).

You’ll see something like this in the properties of the root node of your query plan, and your query will have some additional funny business at the end of it.

Read on to see what information is available to us and current feature limitations.
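
As a rough mental model of the dispatcher idea, here is a toy sketch in Python. It is not how SQL Server implements the feature: the class name, the row-estimate lookup, and the bucket thresholds are all hypothetical, chosen only to show "one cached plan per range of estimated rows" instead of "one cached plan for everything."

```python
from bisect import bisect_right


class DispatcherPlan:
    """Toy model: route each execution to a per-bucket plan variant."""

    def __init__(self, row_estimates, boundaries, compile_plan):
        self.row_estimates = row_estimates  # parameter value -> estimated rows (stand-in for statistics)
        self.boundaries = boundaries        # hypothetical row-count thresholds between buckets
        self.compile_plan = compile_plan    # callback that "compiles" a plan for a bucket
        self.variants = {}                  # bucket index -> cached plan variant

    def plan_for(self, value):
        rows = self.row_estimates.get(value, 1)
        bucket = bisect_right(self.boundaries, rows)  # 0 = low, 1 = medium, 2 = high
        if bucket not in self.variants:
            # First execution in this bucket: compile and cache a variant.
            self.variants[bucket] = self.compile_plan(bucket, value)
        return self.variants[bucket]


compiled = []  # records which parameter values triggered a compilation


def compile_plan(bucket, value):
    compiled.append(value)
    return f"variant {bucket} (compiled for {value!r})"


dispatcher = DispatcherPlan(
    row_estimates={"rare": 5, "common": 50_000, "huge": 2_000_000},
    boundaries=[100, 1_000_000],
    compile_plan=compile_plan,
)

plans = [dispatcher.plan_for(v) for v in ["rare", "common", "rare", "huge"]]
for plan in plans:
    print(plan)
```

The second call with "rare" reuses the variant cached by the first, while "common" and "huge" each get their own, so a plan built for a handful of rows is never reused for millions.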


Resolving Implicit Conversion on a Join

Andrea Allred has a process:

As I was troubleshooting a performance issue, I noticed that there was an implicit conversion (SQL Server automatically converts the data from one data type to another) happening in my join. The join was on a column that was named the same in both tables, but one was datatype INT (integer) and the other was a datatype of VARCHAR(50) (variable character up to 50 places).

Read on for one way to resolve this issue…so long as no other calling code expects a string on a call.


Triggering a Power BI Dataset Refresh from Synapse

Nick Edwards updates a dataset:

Log in to powerbi.com and in the top right-hand corner locate “Settings” and then “Admin portal”.

Under “Tenant settings” locate “Developer Settings” and then “Allow service principals to use Power BI APIs”.

Set this setting to “Enabled” using the toggle. Next, under the heading “Apply to:”, select “Specific security groups (Recommended)”. Then add the newly created security group “AzureSynapsePowerBIIntegration” and click “Apply”.

Click through for the full process.
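
Once a service principal is allowed to use the Power BI APIs, the refresh itself is a single REST call. The sketch below is not the article's Synapse pipeline; it builds the equivalent requests in Python with the standard library so you can see their shape. The tenant, client, workspace, and dataset IDs are placeholder values, the function names are my own, and nothing is sent over the network unless you uncomment the last lines.

```python
import json
import urllib.parse
import urllib.request

AUTHORITY = "https://login.microsoftonline.com"
POWER_BI_SCOPE = "https://analysis.windows.net/powerbi/api/.default"
POWER_BI_API = "https://api.powerbi.com/v1.0/myorg"


def build_token_request(tenant_id, client_id, client_secret):
    """OAuth2 client-credentials request for the service principal."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": POWER_BI_SCOPE,
    }).encode("utf-8")
    url = f"{AUTHORITY}/{tenant_id}/oauth2/v2.0/token"
    return urllib.request.Request(url, data=body, method="POST")


def build_refresh_request(access_token, workspace_id, dataset_id):
    """POST that asks the Power BI service to refresh one dataset."""
    url = f"{POWER_BI_API}/groups/{workspace_id}/datasets/{dataset_id}/refreshes"
    body = json.dumps({"notifyOption": "NoNotification"}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Placeholder IDs for illustration only; substitute your own values.
PLACEHOLDER_ID = "00000000-0000-0000-0000-000000000000"

token_request = build_token_request(PLACEHOLDER_ID, PLACEHOLDER_ID, "not-a-real-secret")
refresh_request = build_refresh_request("not-a-real-token", PLACEHOLDER_ID, PLACEHOLDER_ID)
print(refresh_request.get_method(), refresh_request.full_url)

# To actually trigger the refresh, obtain a real token first, then:
# with urllib.request.urlopen(refresh_request) as response:
#     print(response.status)
```

The service principal must also be a member of the security group enabled above and have access to the workspace that holds the dataset; otherwise the call is rejected even with a valid token.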
