Press "Enter" to skip to content

Month: January 2024

Data Vault 2.0 Models in Microsoft Fabric

Michael Olschimke and Dmytro Polishchuk continue a series:

The last article in this blog series discussed the basic entity types in Data Vault 2.0: hubs, links and satellites. While it would be theoretically possible to limit a model to just these three basic entity types, the resulting Data Vault model would be inefficient: it would most likely consume too much storage, perform poorly due to the many joins, and require a number of grain shifts during information delivery. This is due to certain characteristics in the data that require special treatment.

For these characteristics, Data Vault 2.0 provides special entity types. This article focuses on two of them: the non-historized link, which is used to capture transactions and events, and the multi-active satellite, which is used to model multiple active descriptions for the same parent hub or link in the same load.

Read on for an example of how to implement this in a Microsoft Fabric warehouse.

Using the Spark Connect GRPC API

Ed Elliott digs into API details:

In the first two posts, we looked at how to run some Spark code, firstly against a local Spark Connect server and then against a Databricks cluster. In this post, we will look more at the actual gRPC API itself, namely ExecutePlan, Config, and AddArtifacts/ArtifactsStatus.

Click through to see how it all works, with plenty of C# code to guide you along the way.
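
Ed works at the raw gRPC level in C#; for a sense of what those RPCs accomplish end to end, here is a minimal sketch using the high-level Spark Connect client that ships with PySpark (3.4+ with the connect extras), which wraps the same calls under the hood. The sc://localhost:15002 endpoint assumes a locally running Spark Connect server; adjust for your environment.

```python
# Minimal sketch of the high-level Spark Connect client in PySpark
# (pyspark >= 3.4 with the "connect" extras installed). Under the hood,
# the builder issues Config requests and each action sends an
# ExecutePlan request over the same gRPC channel the post examines.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .remote("sc://localhost:15002")  # gRPC endpoint of the Spark Connect server
    .getOrCreate()
)

# This action is translated into a logical plan, sent via ExecutePlan,
# and the results stream back to the client.
df = spark.range(10).selectExpr("id", "id * 2 AS doubled")
df.show()
```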

Loading Data from Statistics Denmark into Power BI

Erik Svensen goes over an oldie:

It turns out that the blog post I wrote 10 years ago about getting data from Statistics Denmark into Power BI with Power Query is still being used – link.

But as the API has changed a bit since then, I was asked to update the blog post – so here is how you can get the population of Denmark imported into Power BI with Power Query.

Read on to see the right way to do it today.
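
Erik does this in Power Query; purely as a companion sketch, the same StatBank API can also be called directly. The endpoint, table id (FOLK1A, the quarterly population table), and payload shape below are my reading of Statistics Denmark's public API documentation, not details taken from the post, so verify them against https://api.statbank.dk before relying on them.

```python
# Hedged sketch of pulling population data from Statistics Denmark's
# public StatBank API with plain Python. Table id and payload shape
# are assumptions based on the public API docs, not on Erik's post.
import requests

payload = {
    "table": "FOLK1A",   # assumed: quarterly population table
    "format": "CSV",
    "variables": [
        {"code": "Tid", "values": ["*"]},  # all time periods
    ],
}

resp = requests.post("https://api.statbank.dk/v1/data", json=payload, timeout=30)
resp.raise_for_status()
print(resp.text[:500])  # first few rows of the returned CSV
```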

Embracing the Boring Part of Data Governance

Nikki Kelly shares some thoughts on data governance:

Data Governance – you have heard the term a million times and not once has it driven excitement into your heart. I’d like to spend the next few minutes changing that.

Data Governance is formally defined as “… a system of decision rights and accountabilities for information-related processes, executed according to agreed-upon models which describe who can take what actions with what information, and when, under what circumstances, using what methods.”

Boring.

Nikki makes a great point that the process may feel boring but the net results are critical.

Calculating Date Differences in Month with R

Steven Sanderson has ways to track months:

Greetings, fellow R enthusiasts! Today, let’s dive into the fascinating world of date calculations. Whether you’re a data scientist, analyst, or just someone who loves coding in R, understanding how to calculate the number of months between dates is a valuable skill. In this blog post, we’ll explore two approaches using both base R and the lubridate package, ensuring you have the tools to tackle any date-related challenge that comes your way.

Read on to see how to do this in base R as well as the lubridate package.
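
Steven's examples are in base R and lubridate; for comparison only, here is the same whole-month arithmetic sketched in Python with dateutil. This is an analogue I'm supplying, not anything from the post: relativedelta normalizes the gap into years, months, and days, much like lubridate's interval arithmetic.

```python
# Whole months between two dates, sketched with dateutil.
from datetime import date
from dateutil.relativedelta import relativedelta

def months_between(start: date, end: date) -> int:
    """Number of complete months from start to end."""
    delta = relativedelta(end, start)
    return delta.years * 12 + delta.months

print(months_between(date(2023, 1, 15), date(2024, 1, 14)))  # 11 (one day short)
print(months_between(date(2023, 1, 15), date(2024, 1, 15)))  # 12
```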

DBCC SHRINKFILE and tempdb

Tom Collins answers a question:

Question: I’m trying to delete a tempdb ndf file from the tempdb file definitions. It is no longer required, but I’m getting an error message:

DBCC SHRINKFILE: Page xxxxxxxx could not be moved because it is a work table page.

How can I get around this problem? There is no activity on the server.

Read on for the answer.
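
I won't spoil Tom's answer, but one commonly cited workaround for pinned work-table pages, offered here as a hedged sketch rather than as his solution, is to clear the server caches that back those work tables and then retry the shrink and file removal. The connection string and the logical file name tempdev2 are placeholders for your environment, and these DBCC commands flush server-wide caches, so run this only on a quiet server.

```python
# Hedged sketch: clear the caches that keep work tables alive in
# tempdb, empty the file, then remove it. "tempdev2" is a placeholder
# logical file name, not from the post.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=localhost;"
    "DATABASE=tempdb;Trusted_Connection=yes;TrustServerCertificate=yes;",
    autocommit=True,  # DBCC and ALTER DATABASE need autocommit
)
cur = conn.cursor()

for cmd in (
    "DBCC FREEPROCCACHE;",                      # drop cached plans
    "DBCC DROPCLEANBUFFERS;",                   # drop clean buffer pool pages
    "DBCC FREESYSTEMCACHE ('ALL');",            # drop system caches backing work tables
    "DBCC SHRINKFILE (tempdev2, EMPTYFILE);",   # move remaining pages off the file
    "ALTER DATABASE tempdb REMOVE FILE tempdev2;",  # then drop the empty file
):
    cur.execute(cmd)

conn.close()
```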

The Impact of Auto-Close on Performance

Steve Stedman explains why Auto-Close should almost never be on for your database:

When the AutoClose setting is enabled, SQL Server will shut down the database after the last user disconnects. This means that every time a new connection is made, SQL Server must go through the entire process of starting the database again. This includes reading the database file, allocating memory, and performing any necessary recovery processes. This overhead can cause a noticeable delay for users as they connect, especially if the database is large or complex.

Read on for several other factors affecting performance. I will say that the best use case for Auto-Close is when you have a dev instance—especially on a local machine—with a large number of databases and a very limited amount of RAM available. Otherwise, if this is a server, I’m turning Auto-Close off. Even today, I’d rather just buy enough RAM for my developers than flip this switch.
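
If you want to audit for this setting, here is a minimal sketch, assuming pyodbc and sufficient permissions, that finds databases with Auto-Close enabled and switches it off. The connection string is a placeholder.

```python
# Find databases with AUTO_CLOSE on and disable it.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=localhost;"
    "Trusted_Connection=yes;TrustServerCertificate=yes;",
    autocommit=True,
)
cur = conn.cursor()

# sys.databases exposes the setting as is_auto_close_on
cur.execute("SELECT name FROM sys.databases WHERE is_auto_close_on = 1;")
for (name,) in cur.fetchall():
    print(f"Disabling AUTO_CLOSE on {name}")
    cur.execute(f"ALTER DATABASE [{name}] SET AUTO_CLOSE OFF;")

conn.close()
```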

Notebooks versus Dataflow Gen2 in Microsoft Fabric

Gilbert Quevauvilliers takes us through a comparison:

In this blog post, I am going to compare Dataflow Gen2 vs. notebooks in terms of how much the workload costs. I will also compare usability, as Dataflow Gen2 currently has a lot of built-in features which make it easier to use.

The goal of this blog post is to understand which of the two, in my opinion, is cheaper and easier to use. That option will then be the focus of future blog posts covering what I’ve learned along the way, which will hopefully assist you too.

To compare the two workloads, I am going to use the same source file and perform the same transformations, so both produce the same output.

Read on for a surprising difference in cost.
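
For flavor, here is the kind of minimal notebook-side load the comparison implies, assuming the ambient spark session in a Fabric notebook. The file path, column name, and table name are hypothetical, not taken from Gilbert's post.

```python
# Sketch of a notebook doing what a Dataflow Gen2 would otherwise do:
# read a CSV from the lakehouse and persist it as a Delta table.
# `spark` is pre-defined in a Fabric notebook session.
df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("Files/raw/sales.csv")  # hypothetical lakehouse path
)

# Example transformation, then write as a managed Delta table
df.filter(df["Quantity"] > 0).write.mode("overwrite").saveAsTable("sales_clean")
```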

Reviewing the Data Activator in Microsoft Fabric

Ginger Grant takes us through the Data Activator:

With the GA release of Fabric in November 2023, I am dedicating several posts to new features which you will not find in Power BI or Azure Synapse, and the latest one I want to talk about is Data Activator. Data Activator is an interesting tool to include inside of Fabric because it is not reporting or ETL; rather, it is a way to manage actions when the data hits defined targets. It is a management system for data stored in Fabric or streamed in Azure using IoT or Event Hubs. You can use Data Activator to monitor the current state or to define actions to occur when certain conditions occur in the data. Data Activator is still in preview, but you can evaluate it now.

Read on to see how to enable it and what you can currently do with it.
