Press "Enter" to skip to content

Day: February 13, 2023

Automated Delta Lake Maintenance in Synapse

Shalu Ganotra Chadha, et al, explain how to keep your Delta Lake tidy:

The useful features of Delta Lake come at the cost of requiring regular maintenance. Delta Lake requires periodic cleanup as it accumulates files over time with each upsert and retains previous snapshots of the data. They can quickly convert a small dataset (in MBs) to several GBs of storage. This is because deleted data is not really removed but retained as an older snapshot of the Delta Lake dataset.

Click through for two operations you can perform on a Delta Lake, as well as some recommendations on when to do what via the Genie Delta Lake Auto Maintenance scripts they provide.

Comments closed

Organizing Power BI Workspaces

Reza Rad brings order to things:

The question I often get in my Power BI architecture consulting sessions is; “How should we organize our workspaces? should we have one workspace with all the reports in it? Or multiple? Should we split it based on each report? Business unit? or something else?” In this article and video, I’ll explain a guide on how to organize and set up workspaces in your organization.

Click through for a video and a helpful article.

Comments closed

The Power of Metadata-Driven Development

Koen Verbeeck lays out a recommendation:

In this blog post I’ll talk about another of those rules/mantras/patterns/maxims:

build once, add metadata

I’m not sure if I’m using the right words, I heard something similar in a session by Spark enthusiast Simon Whiteley. He said you should only write code once, but make it flexible and parameterized, so you can add functionality just by adding metadata somewhere. A good example of this pattern can be found in Azure Data Factory; by using parameterized datasets, you can build one flexible pipeline that can copy for example any flat file, doesn’t matter which columns it has. I have blogged about this:

Click through to learn more about the concept, as well as some tips on how you’d do that in various data movement products (e.g., SSIS, ADF, Logic Apps).

Comments closed

Data Cleanup in Query Store

Grant Fritchey does some housekeeping:

The most important thing to understand here is that Query Store won’t just keep collecting data forever, filling your hard drive. There is a hard limit to how much data Query Store contains. By default, prior to 2019, that was 100mb. After 2019, it’s 1,000mb. You can, of course, adjust this up, or down, as needed on your systems. It’s a database-by-database setting (as so much of Query Store is). You can change this through SSMS:

Even with that limit, there are still several options available to you for when to clean up old Query Store data, whatever the definition of “old” (i.e., time-based or at the user’s whim).

Comments closed

Dynamic What-Ifs in Power BI

Chris Webb breaks the limits:

My recent post on passing any value to a Power BI dynamic M parameter from a report generated quite a lot of interest but only among people who use DirectQuery mode and who want more flexibility when filtering. In this post, however, I’ll show you an application for this technique that is useful for anyone, regardless of whether they are using DirectQuery mode or Import mode, and which is nothing to do with filtering.

You’re probably familiar with Power BI’s what-if parameter feature which allows end users to pass values into calculations. It’s great but it has one big limitation: all the values that can be used need to be hard-coded in a calculated table at design time. What if (haha) you could let your users enter the values they wanted to use in the report instead? Let’s see a simple example of how to do this.

There’s a little bit of trickery but this solves a problem that I’ve had with What-If scenarios for a while: by asking the question, you naturally invite additional scenario ideas and people want to try them out right then and there.

Comments closed

Handling Balance Sheets in Power BI

Imke Feldmann performs a balancing act:

This article presents a method to automatically plug or balance your balance sheet in Power BI. You can use it if your bookkeeping system doesn’t contain the Profit and Loss (PnL) bookings yet. Or for planning or financial modelling purposes when your calculated balance sheet figures don’t add up.

Click through for the example, which pertains to including a current year profit/loss on the reporting breakdown.

Comments closed

Building Custom Lineage in Purview

Alex Crampton writes some Python code:

The aim of this blog is to explain how to create custom Purview processes, enabling you to add lineage from processes that are not tracked out of the box.

As covered in this blogAzure Purview can help with understanding the lineage of your data, offering visibility of how and where data is moving within your data estate.

Lineage can only be tracked out of the box when using tools such as Data Factory, Power BI, and Azure Data Share. Lineage is lost when using other tools like Azure Functions, Databricks notebooks, or SQL stored procedures.

Read on to see the code, as well as what you can do.

Comments closed