Warehousing – Page 2

Here’s the problem: while users have always had the ability to create whatever password they wanted in Cassandra–from straightforward to incredibly complex and everything in between–this ultimately created a noticeable security vulnerability.

While organizations might have internal processes for generating secure passwords that adhere to their own security policies, Cassandra itself did not have the means to enforce these standards. To make the security vulnerability worse, if a password initially met internal security guidelines, users could later downgrade their password to a less secure option simply by using “ALTER ROLE” statements.

Read on to see how CEP-24 helps with this. It looks like CEP-24 will be released in Apache Cassandra 5.1.

Comments closed

Snowflake Query Tags in Power BI

Published 2025-04-14 by Kevin Feasel

Chris Webb takes some of the shine off of things:

Since the November 2024 Power BI release blog post announced that queries sent to Snowflake by Power BI include a query tag I’ve had a lot of questions from people who couldn’t see this happening or wanted to know what the query tags contained, so in this blog I thought I would outline the current status.

It turns out that the query tag isn’t as far along as the blog post indicated, and there are some pretty big limitations in the cases in which there actually is tagging.

Comments closed

Loading Data from Pandas into Snowflake

Published 2025-04-04 by Kevin Feasel

Anil Kumar Moka loads some data:

Loading data into Snowflake is a common need. Using Python and pandas is a common go-to solution for data professionals. Whether you’re pulling data from a relational database, wrangling a CSV file, or prototyping a new pipeline, this combination leverages pandas’ intuitive data manipulation and Snowflake’s cloud-native scalability. But let’s be real—data loading isn’t always a simple task.

Files go missing, connections drop, and type mismatches pop up when you least expect them. That’s why robust error handling isn’t just nice-to-have; it’s essential for anything you’d trust in production. In this guide, we’ll walk through the fundamentals of getting data into Snowflake, explore practical examples with pandas and SQLAlchemy, and equip you with the tools to build a dependable, real-world-ready pipeline. Let’s dive in and make your data loading process as smooth as possible!

Read on for a quick primer around data loading and some of the sanity checking we should be doing along the way.

Comments closed

Foreign Key Relationships in Microsoft Fabric Data Warehouses

Published 2025-03-19 by Kevin Feasel

Jared Westover looks at key constraints:

In late 2024, I noticed a comment on the Microsoft Learn site stating that foreign keys could improve query performance on tables in a Fabric warehouse. That claim immediately caught my attention. I wanted to answer a simple question: Do relationships help, hurt, or have no effect when added to tables in a Fabric warehouse?

Let’s get more specific—do foreign keys improve query performance when reading data (not loading)? In other words, do they make queries run faster?

Sadly, the answer is not as promising as with SQL Server. But this also makes sense considering the distributed nature of Fabric data warehouses.

Comments closed

Asynchronous SQL Statement Execution in Snowflake

Published 2025-03-18 by Kevin Feasel

Koen Verbeeck doesn’t want to wait for an answer:

It’s been a while since I blogged about Snowflake, but a recent LinkedIn post caught my attention: the ability to add asynchronous execution of SQL statements in a stored procedure. In other words: parallel execution of SQL statements. This got me excited, because in my opinion this is something that has been missing in T-SQL since forever. Every time you want to do something in parallel, you need to use external tools to accomplish this in SQL Server (or Azure SQL DB, or Fabric Warehouse, or Fabric SQL DB, or … you get the point). You needed to use SQL Server Agent Jobs, or SSIS packages, or Azure Data Factory and so on.

Snowflake introduces the ASYNC and AWAIT keywords, which can be used to trigger asynchronous execution.

Read on for a very simple example and some thoughts from Koen. Aside from possibly making data modifications faster (assuming there are no constraint checks), I’m not quite sure what the major benefit to this is. I’d generally use asynchronous calls to support UI operations, letting a calling application respond to user input while some background thread processes data. But I’m not positive what you get from pushing async/await logic into the database itself.

Comments closed

Spatial Queries in Fabric Data Warehouse

Published 2025-03-14 by Kevin Feasel

Jovan Popvic reads a map:

Spatial data has become increasingly important in various fields, from urban planning and environmental monitoring to transportation and logistics. Fabric Data Warehouse offers spatial functionalities that enable you to query and analyze spatial data efficiently.

In this blog post, we will delve into the spatial capabilities in the Fabric Data Warehouse and demonstrate how to use the spatial functions in your queries.

This looks a bit like the way we perform spatial operations in SQL Server. Jovan shows off some examples of functionality, so check that out.

Comments closed

Data Security in Snowflake

Published 2025-03-05 by Kevin Feasel

Anil Kumar Moka locks down a Snowflake instance:

In this practical guide, we’ll explore techniques to help you secure your use of Snowflake:

Foundational Security Setup

Secure Views and Their Critical Role

Row-Level Security Implementation Methods

Dynamic Data Masking Strategies

Encryption and Data Protection

Best Practices and Common Pitfalls

Read on for the full article.

Comments closed

Notes on Change Tracking for Warehouse Incremental Loads

Published 2025-01-31 by Kevin Feasel

Meagan Longoria shares some hard-earned experience:

I have a few clients that incrementally load tables from a SQL Server source into their data warehouse or lakehouse by using change tracking. Lately, they encountered some issues with changes to the configuration and the data in the source database, so I decided to share some things you can check before using change tracking as part of your ETL load or when troubleshooting your data load.

Click through for three common issues you may run into while using change tracking.

Comments closed

Creating a Microsoft Fabric Warehouse with Service Principal

Published 2025-01-16 by Kevin Feasel

Gilbert Quevauvilliers sets up a new warehouse:

In this blog post I am going to show you how to create a Microsoft Fabric Warehouse, where the owner will be the Service Principal.

As mentioned in the blog post here are some of the advantages of having the Service Principal as the Warehouse Owner.

Using a Service Principal to create the warehouse avoids issue where the person who created the warehouse leaves the organization and issues arise when the users account is deleted from Entra ID.

You avoid the painful logging in with the user account to ensure the password remains updated.

The organization now owns the warehouse and not an individual user.

I will show you how I created a Warehouse with the owner being a Service Principal this using a Microsoft Fabric Notebook

Click through for the notebook and additional commentary.

Comments closed

Custom SCD2 with PySpark

Published 2025-01-14 by Kevin Feasel

Abhishek Trehan creates a type-2 slowly changing dimension:

A Slowly Changing Dimension (SCD) is a dimension that stores and manages both current and historical data over time in a data warehouse. It is considered and implemented as one of the most critical ETL tasks in tracking the history of dimension records.

SCD2 is a dimension that stores and manages current and historical data over time in a data warehouse. The purpose of an SCD2 is to preserve the history of changes. If a customer changes their address, for example, or any other attribute, an SCD2 allows analysts to link facts back to the customer and their attributes in the state they were at the time of the fact event.

Read on for an implementation in Python.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Category: Warehousing

Behind the Scenes in Developing a Cassandra Password Validator

Snowflake Query Tags in Power BI

Loading Data from Pandas into Snowflake

Foreign Key Relationships in Microsoft Fabric Data Warehouses

Asynchronous SQL Statement Execution in Snowflake

Spatial Queries in Fabric Data Warehouse

Data Security in Snowflake

Notes on Change Tracking for Warehouse Incremental Loads

Creating a Microsoft Fabric Warehouse with Service Principal

Custom SCD2 with PySpark