Month: October 2023

Handling Source System Deletions in a Warehouse

Rayis Imayev deletes some rows:

When something important disappears, it’s natural to start asking questions and looking for answers, especially when that missing piece has had a significant impact on your life.

Similarly, when data that used to exist in your sourcing system suddenly vanishes without any trace, you’re likely to react in a similar way. You might find yourself reaching out to higher authorities to understand why the existing data management system design allowed this to happen. Your colleagues might wonder if better ways to handle such data-related issues exist. Ultimately, you’ll embark on a quest to question yourself about what could have been done differently to avoid the complete loss of that crucial data.

Kimball-style data warehousing already has the idea of type-2 slowly changing dimensions, which let you track the deletion of dimensional data by assigning an end date to the current row and not inserting a new record with the next start date. It’s a little harder to deal with fact data deletions in that way, though, as historically there has been no concept of slowly changing facts.
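
As a minimal sketch of the type-2 idea described above (not Rayis's approach, which the linked post covers), the following Python closes out dimension rows whose business key no longer appears in the source. The `DimRow` class, the `close_deleted_rows` function, and the sample keys are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DimRow:
    """One version of a dimension member (hypothetical structure)."""
    business_key: str
    attributes: dict
    start_date: date
    end_date: Optional[date] = None  # None means the row is still current


def close_deleted_rows(dimension, source_keys, as_of):
    """End-date current rows whose business key is missing from the source.

    No replacement row is inserted, so the member simply stops being
    current as of the given date.
    """
    closed = []
    for row in dimension:
        if row.end_date is None and row.business_key not in source_keys:
            row.end_date = as_of
            closed.append(row.business_key)
    return closed


# Sample data: customer C2 has disappeared from the source system.
dim = [
    DimRow("C1", {"name": "Ann"}, date(2023, 1, 1)),
    DimRow("C2", {"name": "Bob"}, date(2023, 1, 1)),
]
closed = close_deleted_rows(dim, source_keys={"C1"}, as_of=date(2023, 10, 1))
```

After the call, the row for `C2` carries an end date and the row for `C1` stays open. A fact table has no equivalent pair of validity dates by default, which is the gap the post discusses.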

Read on for some thoughts on the topic from Rayis.

Excel Data Analysis with Python

Chris Webb takes us through a new add-in for Excel:

In the Power BI/Fabric community everyone is excited about the recent release of Semantic Link: the ability to analyse Power BI data easily using Python in Fabric notebooks. Sandeep Pawar has an excellent blog post here explaining what this is and why it’s so cool. Meanwhile in the Excel community, everyone is excited about the new integration of Python into Excel. But can you analyse Power BI data in Excel using Python? Yes you can – so as my teenage daughter would say, it’s time for a crossover episode.

Click through for an example of it in action.

Creating Horizontal Legends in R

Steven Sanderson flattens the legend:

Creating a horizontal legend in base R can be a useful skill when you want to label multiple categories in a plot without taking up too much vertical space. In this blog post, we’ll explore various methods to create horizontal legends in R and provide examples with clear explanations.

Read on for two demos, one with a single legend and one which creates two legends. I’m not so sure about how valuable the latter is (because you’re splitting valuable information into two places, losing some of the glanceability of a chart along the way), but it is interesting that you can do it.

Killing a Running Apache Spark Application

The Big Data in Real World team pulls the plug on an application:

Apache Spark is a powerful open-source distributed computing system used for big data processing. However, sometimes you may need to kill a running Spark application for various reasons, such as if the application is stuck, consuming too many resources, or taking too long to complete. In this post, we will discuss how to kill a running Spark application.

Click through to see how you can do this.
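
As one illustration, assuming the application runs on YARN (the excerpt does not say which cluster managers the post covers), the `yarn application -kill` command stops an application by its ID. The helper below is a hypothetical sketch that only builds that command after checking the ID format; the application ID shown is made up.

```python
import re


def build_yarn_kill_command(application_id):
    """Return the argument list for killing a YARN application.

    Hypothetical helper: it validates the ID shape
    (application_<clusterTimestamp>_<sequence>) and builds the command
    without running it.
    """
    if not re.fullmatch(r"application_\d+_\d+", application_id):
        raise ValueError(f"not a YARN application id: {application_id!r}")
    return ["yarn", "application", "-kill", application_id]


# Made-up application ID for illustration.
cmd = build_yarn_kill_command("application_1696000000000_0001")

# To actually run it on a machine with the YARN client installed:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Building the command separately from running it makes it easy to log or review before anything is killed.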

Connection Pooling in Postgres

Semab Tariq shows off a tool for Postgres:

PgBouncer is a lightweight yet powerful connection pooling tool for PostgreSQL. It efficiently manages and reuses database connections, reducing the load on the server and improving performance. It acts as an intermediary between applications and the PostgreSQL database, optimizing connection usage and enhancing scalability.

This is a bit different from SQL Server, where connection pooling is built in. Read on to see how it works.
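
To make the general idea of connection pooling concrete, here is a rough Python sketch. It is not how PgBouncer is implemented (PgBouncer is a separate proxy process sitting between the application and PostgreSQL); `FakeConnection` and `ConnectionPool` are hypothetical classes that only show a fixed set of connections being reused instead of opened per request.

```python
import queue
from contextlib import contextmanager


class FakeConnection:
    """Stand-in for a real database connection (hypothetical)."""
    opened = 0  # counts how many connections were ever created

    def __init__(self):
        FakeConnection.opened += 1
        self.id = FakeConnection.opened


class ConnectionPool:
    """Hands out a fixed number of pre-opened connections."""

    def __init__(self, size, factory=FakeConnection):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def connection(self, timeout=None):
        # Blocks when every connection is already checked out.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Return the connection to the pool instead of closing it.
            self._idle.put(conn)


pool = ConnectionPool(size=2)
used_ids = []
for _ in range(5):
    with pool.connection() as conn:
        used_ids.append(conn.id)
```

Five units of work are served by only two connections, which is the saving a pooler aims for when opening a connection is expensive.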

Microsoft Fabric and Dataverse

Jose Mendes lets us know what’s going on with Dataverse:

If, like me, you’ve been keeping tabs on what Microsoft has been up to in the Power Platform world, you would have noticed that there are two concepts that are regularly referenced in their architectures and generally associated with each other: Azure Data Lake Storage (ADLS) Gen 2 and Common Data Model (CDM).

As Francesco mentioned in his blog, Microsoft’s ultimate vision is for the CDM to be the de facto standard data model. However, although there is a fair amount of resources talking about the capabilities and features, it can be a bit confusing to understand how you can actually store your data in the CDM format in ADLS and use it to run data analytics such as data warehousing, Power BI reporting and Machine Learning.

Read on for more of what’s happening on that front. I will admit that Dataverse tends to be way down on my list of priorities, but that’s because I’m a relational database snob.

Data Activator in Microsoft Fabric

Toby Smith looks at the current state of Data Activator in Microsoft Fabric:

Fabric is the newest all-in-one analytics solution from Microsoft. It combines multiple components (some existing, some new) into a single integrated environment. One of these new components is Data Activator. As Data Activator is still in development, there is still more functionality to be added. This blog shares some of the current abilities and uses for Data Activator, along with ideas for how you can use it in your own business situations.

One of the biggest challenges with big data is understanding it. With tools like Power BI, we are now able to understand and analyse data better than ever before. But when do we act on it? Do we have to manually look at these reports daily just to check everything is going ok? This is where Data Activator comes in. Data Activator is a no-code tool that automatically takes actions when certain conditions are met in the data. These actions can vary from alerts in Microsoft Teams, calling stored procedures, triggering other Fabric items like a pipeline, or even retraining AI models.

This is a feature which has enormous potential for near-real-time alerting and automating workflows. But do read on to learn about some of the limitations currently in the product.

Microsoft Fabric Roadmap

James Serra shares some thoughts on the Microsoft Fabric roadmap:

Just released was the Microsoft Fabric roadmap that you can check out at https://aka.ms/FabricRoadmap. It’s great to see Microsoft be transparent on what features they are working on and when they will be available.

Here are my top 18 features on the roadmap that I am most excited about (in the order found in the roadmap):

It seems that about half of what James is looking forward to is slated for release in Q4 2023, and the other half in mid-2024.

Updates to Power BI Field Finder

Stephanie Bruno has an update for us:

The Power BI Field Finder is a standalone .pbix file you can download and hook up to your reports and data model. The Field Finder helps you visually analyze where fields are used in reports.

I’ve used this to great effect on a prior project where I had to figure out what was going on in a report with about 20-25 pages that other people had put together.
