Press "Enter" to skip to content

Category: Warehousing

Real-Time Data Streaming in Snowflake

Anil Kumar Moka streams some data:

Real-time data ingestion has become essential for modern analytics and operational intelligence. Organizations across industries need to process data streams from IoT sensors, financial transactions, and application events with minimal latency. Snowflake offers two robust approaches to meet these real-time data needs: Snowpipe for near-real-time file-based streaming and Direct Streaming via Snowpark API for true real-time data integration.

This guide explores both options in depth, providing detailed implementations with explanation of code parameters, performance comparisons, and practical recommendations to help you choose the right approach for your specific use case.

Click through to see how it works. I’ll only make one semi-snarky comment that ‘real-time’ doesn’t mean “takes several seconds” but I realize I’m the one tilting at windmills here.

Comments closed

Choosing a Warehousing Data Architecture

James Serra compares and contrasts OLAP architectures:

As discussed in my blog and book “Deciphering Data Architectures: Choosing Between a Modern Data Warehouse, Data Fabric, Data Lakehouse, and Data Mesh” (Amazon), organizations are often challenged with choosing the right data architecture to meet their business goals—especially as AI and data-driven decision-making take center stage. To help clarify, here’s a quick review of the four core architectures, followed by guidance on when to use each. Each architecture includes five stages of data movement – ingest, store, transform, model, and visualize (described here).

Click through for James’s take on how each of them works and when you might choose one over the other.

Comments closed

Temp Table Bugs in Microsoft Fabric Warehouses

Jared Westover runs into a wall:

I was excited when Microsoft announced the ability to create session-scoped temporary tables in a Fabric warehouse. However, after using Microsoft Fabric temporary tables, I quickly felt disappointed. When will they be ready for prime time, and in the meantime, what other options are available?

Click through for Jared’s experience, although it might already be fixed.

Comments closed

Comparing Microsoft Fabric to Snowflake

Evanjalin Joseph lays out a comparison:

Take ShopSmart, a global retail chain that operates both online and offline. The company wants to combine its sales, inventory, and customer data in order to facilitate real-time reporting and predictive analytics. Two top platforms are being assessed by the IT team for this change.

Azure, Power BI, and Microsoft 365 are already widely used by ShopSmart, which is in line with Fabric’s integrated ecosystem. The alternative, however, provides more multi-cloud flexibility and strong performance on structured data. The group has to choose between selecting a more specialized warehousing solution with more deployment options or making use of its current Microsoft investments.

Let’s examine the differences between the two platforms.

Click through for an overview of each platform and how they stack up against one another.

Comments closed

Creating a Dashboard in Snowflake

Kevin Wilkie doesn’t just do e-mails:

If you haven’t had a chance to mess around in Snowflake, there are internal dashboards that you can provide to your users (or just use yourself) so that you can get the latest data.

Today, I want to quickly show a use case for these Dashboards and one way that you can get data quickly and easily in a nice format.

And right above Dashboards is Streamlit, one of my favorite libraries for building rapid application prototypes and even small data applications.

Comments closed

Behind the Scenes in Developing a Cassandra Password Validator

Stefan Miklosovic walks us through a new feature in Apache Cassandra:

Here’s the problem: while users have always had the ability to create whatever password they wanted in Cassandra–from straightforward to incredibly complex and everything in between–this ultimately created a noticeable security vulnerability.

While organizations might have internal processes for generating secure passwords that adhere to their own security policies, Cassandra itself did not have the means to enforce these standards. To make the security vulnerability worse, if a password initially met internal security guidelines, users could later downgrade their password to a less secure option simply by using “ALTER ROLE” statements.

Read on to see how CEP-24 helps with this. It looks like CEP-24 will be released in Apache Cassandra 5.1.

Comments closed

Snowflake Query Tags in Power BI

Chris Webb takes some of the shine off of things:

Since the November 2024 Power BI release blog post announced that queries sent to Snowflake by Power BI include a query tag I’ve had a lot of questions from people who couldn’t see this happening or wanted to know what the query tags contained, so in this blog I thought I would outline the current status.

It turns out that the query tag isn’t as far along as the blog post indicated, and there are some pretty big limitations in the cases in which there actually is tagging.

Comments closed

Loading Data from Pandas into Snowflake

Anil Kumar Moka loads some data:

Loading data into Snowflake is a common need. Using Python and pandas is a common go-to solution for data professionals. Whether you’re pulling data from a relational database, wrangling a CSV file, or prototyping a new pipeline, this combination leverages pandas’ intuitive data manipulation and Snowflake’s cloud-native scalability. But let’s be real—data loading isn’t always a simple task.

Files go missing, connections drop, and type mismatches pop up when you least expect them. That’s why robust error handling isn’t just nice-to-have; it’s essential for anything you’d trust in production. In this guide, we’ll walk through the fundamentals of getting data into Snowflake, explore practical examples with pandas and SQLAlchemy, and equip you with the tools to build a dependable, real-world-ready pipeline. Let’s dive in and make your data loading process as smooth as possible!

Read on for a quick primer around data loading and some of the sanity checking we should be doing along the way.

Comments closed

Foreign Key Relationships in Microsoft Fabric Data Warehouses

Jared Westover looks at key constraints:

In late 2024, I noticed a comment on the Microsoft Learn site stating that foreign keys could improve query performance on tables in a Fabric warehouse. That claim immediately caught my attention. I wanted to answer a simple question: Do relationships help, hurt, or have no effect when added to tables in a Fabric warehouse?

Let’s get more specific—do foreign keys improve query performance when reading data (not loading)? In other words, do they make queries run faster?

Sadly, the answer is not as promising as with SQL Server. But this also makes sense considering the distributed nature of Fabric data warehouses.

Comments closed

Asynchronous SQL Statement Execution in Snowflake

Koen Verbeeck doesn’t want to wait for an answer:

It’s been a while since I blogged about Snowflake, but a recent LinkedIn post caught my attention: the ability to add asynchronous execution of SQL statements in a stored procedure. In other words: parallel execution of SQL statements. This got me excited, because in my opinion this is something that has been missing in T-SQL since forever. Every time you want to do something in parallel, you need to use external tools to accomplish this in SQL Server (or Azure SQL DB, or Fabric Warehouse, or Fabric SQL DB, or … you get the point). You needed to use SQL Server Agent Jobs, or SSIS packages, or Azure Data Factory and so on.

Snowflake introduces the ASYNC and AWAIT keywords, which can be used to trigger asynchronous execution. 

Read on for a very simple example and some thoughts from Koen. Aside from possibly making data modifications faster (assuming there are no constraint checks), I’m not quite sure what the major benefit to this is. I’d generally use asynchronous calls to support UI operations, letting a calling application respond to user input while some background thread processes data. But I’m not positive what you get from pushing async/await logic into the database itself.

Comments closed