Press "Enter" to skip to content

Month: June 2022

Data Lakehouse Cleanrooms in Databricks

Matei Zaharia, et al, announce an interesting idea:

We are excited to announce data cleanrooms for the Lakehouse, allowing businesses to easily collaborate with their customers and partners on any cloud in a privacy-safe way. Participants in the data cleanrooms can share and join their existing data, and run complex workloads in any language – Python, R, SQL, Java, and Scala – on the data while maintaining data privacy.

With the demand for external data greater than ever, organizations are looking for ways to securely exchange their data and consume external data to foster data-driven innovations. Historically, organizations have leveraged data sharing solutions to share data with their partners and relied on mutual trust to preserve data privacy. But the organizations relinquish control over the data once it is shared and have little to no visibility into how data is consumed by their partners across various platforms. This exposes potential data misuse and data privacy breaches. With stringent data privacy regulations, it is imperative for organizations to have control and visibility into how their sensitive data is consumed. As a result, organizations need a secure, controlled and private way to collaborate on data, and this is where data cleanrooms come into the picture.

Read on to learn more about how this all works. It’s definitely a lot better than sending off a bunch of CSVs…

Database Audit Specifications Creating Users

Kenneth Fisher asks who audits the auditors:

I love database audits. They are simple, easy to use, effective, not overly resource intensive, and can be turned on and off at need once created. That said, they do have a few gotchas. If you want every user, put public as the principal. And if you don’t, and you put in an AD user, be aware that that user will be created (along with a matching schema) when you create the Database Audit Specification.

Read on for Kenneth’s experience and a way to clean up these potentially-added users.
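One quick way to spot a user that an audit specification quietly added is to list the Windows users in the database along with their creation dates and default schemas. Below is a minimal Python sketch assuming pyodbc and a placeholder connection string; the sys.database_principals view it queries is standard, but the database name and connection details are illustrative.

```python
# Minimal sketch: list Windows users in a database so you can spot one that a
# Database Audit Specification may have created. Assumes pyodbc is installed
# and that the connection string is adjusted for your environment.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=localhost;Database=YourDatabase;"
    "Trusted_Connection=yes;Encrypt=no;"
)

query = """
SELECT name, type_desc, create_date, default_schema_name
FROM sys.database_principals
WHERE type = 'U'              -- Windows users only
ORDER BY create_date DESC;
"""

for row in conn.execute(query):
    print(row.name, row.type_desc, row.create_date, row.default_schema_name)
```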

Finding Key Influencers with Power BI

Gauri Mahajan looks at the key influencers visual in Power BI:

Once the Key Influencers visual is added to the Power BI report, it would look as shown below. The visual would be empty by default. The key areas required to make this visual work are the Analyze section and the Explain By section. The Analyze section is used to point to the variables or attributes that we intend to analyze. The Explain By section is used to point to the variables or attributes that may be influencing the attributes specified in the Analyze section.

I’ve found this visual to be pretty interesting if you have a good dataset.

Azure Synapse Analytics June 2022 Updates

Ryan Majidimehr has some updates for us:

Fuzzy matching with a sliding similarity score option has been added to the Join transformation in Mapping Data Flows. You can create inner and outer joins on data values that are similar rather than exact matches! Previously, you would have had to use an exact match. The sliding scale value goes from 60% to 100%, making it easy to adjust the similarity threshold of the match. 

Read on for the full list of updates.
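To get a feel for what a sliding similarity threshold does, here is a rough Python illustration of the idea, using difflib as a stand-in similarity measure rather than the actual Mapping Data Flows implementation: only pairs whose score clears the threshold get joined, and sliding the threshold toward 1.0 keeps progressively stricter matches.

```python
# Illustration only: fuzzy-join two small lists of names on a similarity score.
# difflib.SequenceMatcher stands in for whatever similarity measure Mapping
# Data Flows uses; the 0.60-1.00 threshold mirrors the 60%-100% sliding scale.
from difflib import SequenceMatcher

left = ["Contoso Ltd", "Fabrikam Inc", "Adventure Works"]
right = ["Contoso Limited", "Fabricam Incorporated", "Northwind Traders"]

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

threshold = 0.60  # slide this toward 1.0 for stricter matching

matches = [
    (l, r, round(similarity(l, r), 2))
    for l in left
    for r in right
    if similarity(l, r) >= threshold
]

for l, r, score in matches:
    print(f"{l!r} ~ {r!r} (score {score})")
```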

Power BI Desktop External Tools Not Opening

Gilbert Quevauvilliers ran into a problem:

I recently got a new laptop and I had to install all my programs again. Everything was going as expected, except when I went to use ALM Toolkit, the program would not open.

I would click on ALM Toolkit, see it open for a few seconds in Task Manager, and then it would disappear.

That led me down a few rabbit holes: I thought, could it be Windows Defender, could it be the anti-virus, or could it have been installed incorrectly?

It turns out that none of those was the problem. Read on to learn what the issue was and how Gilbert corrected it.

Unicode Character Generation in Power Query

Meagan Longoria needs more Unicode:

You may have used the UNICHAR() function in DAX to return Unicode characters in DAX measures. If you haven’t yet read Chris Webb’s blog post on the topic, I recommend you do. But did you know there is a Power Query function that can return Unicode characters? This can be useful in cases when you want to assign a Unicode character to a categorical value.

Click through to see how this works.
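Independent of Power Query, the underlying idea is simply converting a code point into a character and attaching it to a categorical value. A rough Python equivalent, with a made-up category-to-code-point mapping for illustration, looks like this:

```python
# Illustration of the concept: turn a code point into a character and attach it
# to a categorical value. DAX's UNICHAR() does the same code-point-to-character
# step; chr() is Python's version. The trend-to-symbol mapping is hypothetical.
trend_symbols = {
    "up": 0x25B2,     # ▲ black up-pointing triangle
    "flat": 0x25AC,   # ▬ black rectangle
    "down": 0x25BC,   # ▼ black down-pointing triangle
}

rows = [("Widgets", "up"), ("Gadgets", "flat"), ("Gizmos", "down")]

for product, trend in rows:
    print(f"{product}: {chr(trend_symbols[trend])} ({trend})")
```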

Customer Segmentation via Databricks Solution Accelerator

Gavita Regunath discovers customer segments in a dataset:

We will be using the German Credit dataset, a publicly available dataset provided by Dr. Hans Hofmann of the University of Hamburg. The German Credit dataset contains features describing 1000 loan applicants who have taken credit from the bank. Using this dataset, our aim will be to understand the following: “How should the bank personalise its products for its customers?”

Click through to see an example of clustering to generate customer segments.
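For a flavor of what that clustering step looks like, here is a minimal scikit-learn sketch. This is not the accelerator's actual code; the column names, random data, and cluster count are stand-ins for the real German Credit features.

```python
# Minimal segmentation sketch: scale a few numeric applicant features and run
# k-means to assign each applicant to a segment. Columns and cluster count are
# illustrative; the accelerator works against the actual German Credit dataset.
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
applicants = pd.DataFrame({
    "credit_amount": rng.integers(500, 15000, size=1000),
    "duration_months": rng.integers(6, 72, size=1000),
    "age": rng.integers(19, 75, size=1000),
})

features = StandardScaler().fit_transform(applicants)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42).fit(features)
applicants["segment"] = kmeans.labels_

# Inspect the average profile of each segment
print(applicants.groupby("segment").mean().round(1))
```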

Data Governance in Databricks with Unity Catalog

Paul Roome, et al, announce the upcoming GA for Databricks Unity Catalog:

Today we are excited to announce that Unity Catalog, a unified governance solution for all data assets on the Lakehouse, will be generally available on AWS and Azure in the upcoming weeks. Currently, you can apply for a public preview or reach out to a member of your Databricks account team.

In a previous blog, we set out our vision for a governed lakehouse and how Unity Catalog can help customers simplify governance at scale. This blog will explore the most recent updates to Unity Catalog and our growing partner ecosystem.

Click through for those updates and to sign up for the public preview if so inclined.

Power BI Smart Narratives

Gauri Mahajan shapes the narrative:

To make it easier for the end-user, this job may be done by report or business analysts, who pre-analyze the reports and manually form textual narratives that summarize the key highlights in the report. While it solves the challenge in question, it opens the possibility of analysts’ bias getting introduced in the report, and the end-user may or may not agree with the narrative. Some systems solve this issue by employing complex machine learning / natural language processing / other artificial intelligence-based mechanisms to auto-generate smart textual narratives that summarize the key highlights of the data. Though this approach works, it requires a significant amount of resources and hard-to-find skills, which are outside the bounds of a normal end-user who may want to use a reporting tool in a self-service manner and build a dashboard.

Modern reporting solutions like Tableau, AWS QuickSight, Microsoft Power BI, and others in a similar league have been offering a feature to generate key insights using built-in AI/ML in the reporting tool, which enables an end-user to extract insights and enables a report developer to have a smart visual that auto-updates the insights based on changes in the data.

In practice, this ends up being more of a fun toy than a really practical solution. Part of the issue is that decent analysis is hard, even more so when you have to develop something before even seeing the data or having any priors around feature importance.
