Press "Enter" to skip to content

Category: Warehousing

Tracking Task History in Snowflake

Kevin Wilkie is interested in just one thing:

By the table function name, you’re probably wondering “Sherpa, why on Earth do I really need to look at query history based on the session? I have Query_History_By_User which gives me a broader look at my data. Who really cares?”

Great question, my friend! You’re right. This new Query_History_By_Session table function isn’t going to give you a lot of data that you can’t get easier – and more helpfully – with Query_History_By_User. Why did Snowflake provide me this “useless” function?

Read on for the value of this function and an example of querying task history.

Comments closed

Security Options in Microsoft Fabric Warehouses

Koen Verbeeck locks things down:

We are implementing a data analytics solution in Microsoft Fabric. A warehouse is used for the gold layer, and we want to give users access to the data. However, by sharing the warehouse, they can read all the data in all the tables. Some data is sensitive, and only users with the correct permissions should be able to view it. Is it possible to implement more granular access control to the data?

Read on for the answer, as well as an important note on how users might be able to circumvent your permissions settings.

Comments closed

Type 2 SCDs in Microsoft Fabric

Reza Rad has changes to make:

In the previous article, I explained SCD (Slowly Changing Dimension) and its different types. In this article, I’ll show you how to implement SCD Type 2 (one of the most common types) using Microsoft Fabric and Power BI. This article includes using Lakehouse, Dataflow, Warehouse, Data Pipeline, SQL Stored Procedures, Power BI Semantic model, and report in Microsoft Fabric.

Click through to learn more about the structure of a type-2 slowly changing dimension, as well as how you can store and load this information to track changes over time.

Comments closed

An Overview of Slowly Changing Dimensions

Reza Rad talks slowly changing dimensions:

If you want to use Power BI, Microsoft Fabric, or any other data analytics tools, one of the key concepts to understand when working with a data warehouse system is the SCD (Slowly Changing Dimension). I will do this in a series of at least two articles. The first one (this one) will be on the concept of what SCD is, its meaning, and its different types. Then, the next one will discuss how to implement SCD types (such as Type 2) using Microsoft Fabric and Power BI.

Reza focuses on SCD types 0-4 but does briefly touch on types 5-7 (of which, I’d never heard of SCD type 7).

Comments closed

Finding Query History in Snowflake

Kevin Wilkie digs into the history:

If you’re running an audit process in SQL Server, you can do this fairly easily. But how many of us work in a place that requires this or has enough space to do this? Sadly, very few of us…

If you have Query Store running (and why wouldn’t you?) you can find a specific query you were running. But you have to have a specific something you can search for…

But can you get just a list of what you specifically ran in order? Not that I can find….

My recollection is that, if you’re using Azure SQL Database, that auditing is on by default (or you can turn it on with a toggle switch or radio buttion). For Azure Synapse Analytics dedicated SQL pools, there are a few DMVs that cover query operations. But yeah, in general, it’s something you’d need to enable first.

Comments closed

Microsoft Fabric Warehouse Access Control

Koen Verbeeck talks permissions:

We are starting a new analytics project in Microsoft Fabric, and our data will land in a warehouse. This is the first time we’re using Fabric, and we are wondering about the different options for sharing access to a warehouse we developed in a workspace.

Click through for more information on providing and limiting access to data in a Microsoft Fabric warehouse.

Comments closed

Microsoft Fabric: Lakehouse or Warehouse?

Koen Verbeeck helps us choose:

This doesn’t mean no code has to be written. On the contrary, in this article we’re going to focus on two services of Fabric: the lakehouse and the warehouse. The first one is part of the Data Engineering experience in Fabric, while the latter is part of the Data Warehousing experience. Both require code to be written to create any sort of artefact. In the warehouse we can use T-SQL to create tables, load data into them and do any kind of transformation. In the lakehouse, we use notebooks to work with data, typically in languages such as PySpark or Spark SQL.

Read on for the comparison. I tend to go more for the lakehouse experience rather than warehouse, but Koen provides a lot of the info you’d need in order to make the right decision for yourself.

Comments closed

Choosing between Data Warehouses, Lakes, and Lakehouses

Den Smyrnov talks architecture:

Historically, the two most popular approaches to storing and managing data are Data Warehouse and Data Lake. The choice between them usually depends on business objectives and needs. While Data Lakes are ideal for preserving large volumes of diverse data, warehouses are more favorable for business intelligence and reporting. Sometimes, organizations try to have the best of both worlds and mix Data Lake & Data Warehouse architectures. This, however, can be a time and cost-consuming process.

Against this backdrop, a new hybrid approach—Data Lakehouse—has emerged. It combines features of a Data Lake and a Data Warehouse, allowing companies to store and analyze data in the same repository and eliminating the Data Warehouse vs. Data Lake dilemma. Data Lakehouse mixes the scalability and flexibility of a Data Lake with the ability to extract insights from data easily. Ever so compelling, this approach still has certain limitations. It should not be treated as a “one-size-fits-all” solution.

Read on for an explanation of each of these three styles, including their pros and cons.

Comments closed

Time Travel in the Microsoft Fabric Warehouse

Reza Rad hops in the Delorean:

Data changes throughout time, especially in the world of BI and data warehousing systems; the data gets updated through ETL processes frequently. This means that the data you see in the warehouse today might differ from yesterday and the day before, and so on. Some parts of this data can be retrieved on a timely basis. You can, for example, query the sales amount from the sales table where the date has been the 2nd of April. That would give you the sales amount for the 2nd of April, even if you are querying it on the 23rd of May.

However, what if some of the sales transactions on the 2nd of April got updated? The sales amount you see would likely be the updated amount, but not the original amount. It is sometimes useful to be able to see what was that original amount, or in other words, travel in time and see what that value was.

Click through for a combination video and article. The syntax isn’t quite the same as with temporal tables in SQL Server, though it’s close enough to follow along if that’s your relevant experience.

Comments closed

Modern Data Warehousing with Data Lake Storage and Azure Data Factory

Josephine Bush continues a series on modern data warehousing:

In today’s data-driven world, having the right tools to manage and process large datasets is crucial. That’s where Azure Data Lake Storage (ADLS) and Azure Data Factory (ADF) come in handy, making it easier than ever to store and transform your data. In this post, I’ll show you how to set up ADLS to store your Parquet files and configure ADF to manage your data flows efficiently.

Read on for an overview of both technologies.

Comments closed