Press "Enter" to skip to content

Category: Warehousing

Data Lakes, Warehouses, and Lakehouses

Noa Shavit disambiguates three terms:

data warehouse is a repository and platform for storing, querying, and manipulating data. Warehouses are particularly suited for structured data used for decision support and business intelligence. Modern data warehouses have become more efficient, flexible, and scalable (particularly in the context of massively parallel processing and distributed computation), but they still bear the mark of their early development in the previous century.

The data warehouse concept dates back to data marts in the 1970s. After a long incubation period, the idea began to bear fruit commercially at IBM in the late 1980s and early 1990s. Data warehousing improved on the inefficiency of data marts, siloed data stores maintained by individual departments. 

Click through to learn more about each of the three concepts and how they relate.

Comments closed

Cloning Schemas in Snowflake

Kevin Wilkie makes a smaller clone:

One table at a time. Now I know what you’re thinking “Copying some of my databases would take forever…”

Well, don’t fret, my friends! Snowflake has a way!

Click through to see how you can clone an entire schema, including relevant tables, views, functions, procedures, etc. Kevin also specifies what types of objects do not copy, so check that out.

Comments closed

Table Cloning in Snowflake

Kevin Wilkie creates a clone:

In this coding scenario, I am copying everything from TableA and pushing it into a new table called TableB in the same database and schema.

If you check the size of the database before and after you clone a table, it will be the same size – no matter the size of TableA. This is because, at this point in time, TableB exists only as a “pointer” to the data that constitutes TableA. It is not until something changes in one of the tables – say adding a row to TableA, that it stops being a “pointer” and is artificially constituted.

Read on to learn more about how this works.

Comments closed

E-Mailing Query Results in Snowflake

Kevin Wilkie gussies up an e-mail:

In our last post, we discussed the most basic of all ways how in Snowflake you can send out a basic email. It was pretty simple, straight text – nothing to really grab the attention of our readers – which we know is the way to craft an email, right?

To do this, we’re going to have some fun in Python. Yes, delve deep into your bag of Python tricks as we get up to some shenanigans with Snowflake and Python.

Read on for a procedure to e-mail the prior result set in HTML format.

Comments closed

Hyperthreading and Warehouses

Joey D’Antoni covers the impact of hyperthreading on warehouse performance:

Database performance, especially in the cloud, is always a big concern. Traditionally, cloud storage has not performed as well as high-end on-premises storage. The introduction of faster cloud storage (in Azure Premium V2 and Ultra Disk, in AWS IO1-3) and VMs with more available storage bandwidth has improved overall storage performance. However, storage is only sometimes the complete picture.

Read on for Joey’s tests and findings.

Comments closed

SCD Types in Microsoft Fabric

Kenneth Omorodion reminds us that the Kimball model is still quite valuable:

In modern data warehousing, how we handle updates to dimension tables is crucial. There are several approaches; but the decision often comes down to two primary strategies: Slowly Changing Dimensions (SCD) Type 2 and overwriting tables. Each has its own benefits, use cases, and trade-offs. This tip will explore the two methods and why SCD Type 2 is often a better option in many data warehouse scenarios.

Read on for this overview of the benefits of type-2 slowly changing dimensions, as well as a little bit of coverage of several other types of slowly changing dimensions.

Comments closed

Create a Case Insensitive Warehouse in Microsoft Fabric

Gilbert Quevauvilliers is speaking my language:

This is a quick blog post to show you how to use a Microsoft Fabric Notebook to quickly and easily create a Case Insensitive Warehouse.

Just a quick note when I talk about a Case Insensitive Warehouse, what that means is that the upper casing and lower casing of column names and text are ignored. By default, Warehouses and Lakehouse’s are case sensitive in Microsoft Fabric.

Case sensitivity is a trap, so I applaud Gilbert’s commitment to excellence here.

Comments closed

Sending E-Mail from SQL Server and Snowflake

Kevin Wilkie says you’ve got mail:

For the many other parameters you can use with sp_sendmail, please refer to the Microsoft Documentation found here.

After running this stored procedure, SQL Server queues the email to be sent. From there, you should check often to see if it was actually sent. (And yes, it can get stuck there forever.)

With, Snowflake, it’s actually fairly close to the same basic principle as SQL Server.

Click through for the syntax for both systems.

Comments closed

Comparing Snowflake vs SQL Server E-Mail Configuration

Kevin Wilkie sends two e-mails:

Today, I want to talk about all the effort that goes into setting up the ability to email in SQL Server and Snowflake.

First is our old friend – SQL Server. I’ll leave this one to the experts at Microsoft. As has been the case over the last few years, they have some great documentation at Learn.Microsoft.com – especially when it comes to SQL Server.

I don’t know anything about sending e-mails via Snowflake (other than what Kevin mentions here), though I imagine a lot of the difference in complexity is that SQL Server allows arbitrary SMTP selection and requires an existing SMTP server.

Comments closed

A Primer on Medallion Architecture in Microsoft Fabric

Kenneth Omorodion builds a warehouse:

Data warehouses are essential components of modern analytics systems, offering optimized storage and processing capabilities for large volumes of data. When integrated with a Lakehouse architecture, you can combine the best of both worlds—structured, schema-enforced data storage with the flexibility and scalability of data lakes. Microsoft Fabric provides an excellent environment for implementing the Medallion Architecture, a design pattern for building efficient data processing pipelines by layering data into bronze, silver, and gold zones.

Click through for the process.

Comments closed