Data Warehouse Automation

Koos van Strien provides some thoughts on data warehouse automation tools:

Currently, I think there are two main approaches to Data Warehouse Automation

  1. Data Warehouse Generation: You provide sources, mappings, datatype mappings etc.. The tool generates code (or artifacts).
  2. Data Warehouse Automation (DWA): The tool not only generates code / artifacts, but also manages the existing Data Warehouse, by offering continuous insight in data flows, actual lineage, row numbers, etc..

The difference might seem small, but IMHO is visible most clearly whenever changes occur in the Data Warehouse – the second class of tools can handle those changes (while preserving history). With the first class of tools provide you with the new structures, but you need to handle the preservation of history yourself (as you would’ve without DWA).

Read on for a contrast of these two approaches.

Related Posts

Thoughts On Snowflake DB

Koen Verbeeck shares some thoughts after working with Snowflake DB for a few months: Let’s start with the positive. Snowflake is a really scalable database. Storage is virtually limitless, since the data is stored on blob storage (S3 on AWS and Blob Storage on Azure). The compute layer (called warehouses) is completely separated from the storage […]

Read More

HDP 3.0 Updates To Hive And Druid

Nishant Bangarwa has some updates to Apache Druid in HDP 3.0: There are numerous improvements that went into HDP 3.0 and the performance improvements shown are an aggregate result of all of them. Here are some of the more noteworthy improvements related to Druid-Hive integration : Druid Expressions Support – HIVE-18893/ CALCITE-2170   added support for Druid expressions in Hive. […]

Read More

Categories

July 2017
MTWTFSS
« Jun Aug »
 12
3456789
10111213141516
17181920212223
24252627282930
31