Press "Enter" to skip to content

Category: Warehousing

Retrieving Redshift Query History

Koen Verbeeck wants to see what you did last summer:

Because my Windows machine apparently decides to install updates over night (and thus reboot my machine), it has happened that I lost the query that I was writing for Redshift in the tool DBeaver. When you work with SQL Server Management Studio (SSMS), you typically don’t have this issue as a temporary copy is always saved. Close down SSMS, restart it and the queries are still there.

Click through to see what you can do.

Comments closed

Checklist for a Snowflake Migration

Sandeep Arora has a checklist for us:

We have broken our Snowflake Migration Checklist into nine phases to help plan and execute an end-to-end migration of the existing traditional data platform to Snowflake. These phases will help align migration resources and efforts; however, this doesn’t necessarily mean that all steps should be executed sequentially. Some phases, like “Train Users,” can be executed parallel to other phases.

At a high level, the process isn’t Snowflake-specific—really, 6 of the 9 steps are generic supporting steps which would apply to any major project. This makes the checklist not only a good starting point for a Snowflake migration, but also any major migration project.

Comments closed

Tracking Change Events in Snowflake

Kevin Wilkie shows off an interesting window function:

Notice that it has the OVER operator, you can order the data, and even partition the data as needed (Not seen in this example)!

But, as usual with Snowflake, there are even more functions we can work with! Sometimes, you just need to know when items are changed. Enter the CONDITIONAL_CHANGE_EVENT windowing function!

Click through for an example of how CONDITIONAL_CHANGE_EVENT() works.

Comments closed

Join Operations in BigQuery

Rathish Kumar joins a few tables together:

SQL joins are used to combine columns from multiple tables to get desired result set. In a typical Relational model we use normalized tables, each table represents an entity (example: employee, department, etc) and its relationships and when we need to get data from more than one tables, for example employee name and employee department, we use joins to combine employee name column from employee table, department name column from department table based on employee number key column, which is available on both the tables.

Similarly, typical data warehouse setup follows Star or Snowflake schema consisting of a primary fact table and satellite dimension tables. Fact tables represents events (example: orders table in a ecommerce business) and dimension table represents attributes and slowly changing information (example: customer, product tables).

The syntax is rather similar to most database engines, though there are a few physical join operators which differ from typical relational database management systems. Also, I’ll take this moment to say thank you to Rathish for not using Venn diagrams to show joins and instead using a proper technique.

Comments closed

Landing Zone Layouts for Modern Data Warehouses

Paul Hernandez builds out a landing zone for a warehouse:

In this article I want to discuss some different layout options for a landing zone in a modern cloud data warehouse architecture. With landing zone, I mean a storage account where raw data lands directly from its source system (not to be confused with a landing zone to move a system or application into the cloud).

One of the things I appreciate a lot about this post is that it covers the history, showing us how we got to where we are. Paul’s well-versed in each step along the way and lays things out clearly.

Comments closed

End of Month in Snowflake and SQL Server

Kevin Wilkie is ready for that end-of-month paycheck:

When you work with data, you’ll probably need to work with dates at least once a month. That is the nature of the beast. Today, let’s compare working with them in SQL Server and Snowflake. I want to focus only on adding and subtracting months when provided with a specific day.

Along the way, I would also push for a calendar table, so that you can remove some of the more difficult (or even most common) date calculations.

Comments closed

Object Tagging in Snowflake

Warner Chaves tags a table:

A tag is a user-defined label that can be attached to a Snowflake object, such as a database, table, or column. Tags can categorize objects based on any criteria that you choose, such as sensitivity, business unit, project, or owner. Once tags have been applied, you can use them to control access to the tagged objects, track usage and costs, and apply policies and rules.

Now let’s apply tagging to a specific use case: identifying sensitive customer data. For example, let’s assume that you have a table in Snowflake called “customers” that contains customer information, including their addresses. We want to categorize the “address” column as sensitive so that we can apply data protection policies and controls.

Click through for a few examples of how to create tags, apply tags to database objects, and review tagged objects.

Comments closed

Type 2 Dimension Loading in Redshift

Vaidy Kalpathy takes us through a bit of dimensional modeling in AWS Redshift:

Populating an SCD dimension table involves merging data from multiple source tables, which are usually normalized. SCD tables contain a pair of date columns (effective and expiry dates) that represent the record’s validity date range. Changes are inserted as new active records effective from the date of data loading, while simultaneously expiring the current active record on a previous day. During each data load, incoming change records are matched against existing active records, comparing each attribute value to determine whether existing records have changed or were deleted or are new records coming in.

Click through for the article.

Comments closed