Virtualize Data Or Move It?

James Serra contrasts data virtualization with the traditional ETL approach of moving data into a warehouse:

Data virtualization integrates data from disparate sources, locations and formats, without replicating or moving the data, to create a single “virtual” data layer that delivers unified data services to support multiple applications and users.

Data movement is the process of extracting data from source systems and bringing it into the data warehouse and is commonly called ETL, which stands for extraction, transformation, and loading.

If you are building a data warehouse, should you move all the source data into the data warehouse, or should you create a virtualization layer on top of the source data and keep it where it is?

Read on for James’s thoughts.
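
To make the contrast concrete, here is a minimal sketch of the two approaches. It is not taken from James's post: the in-memory SQLite databases stand in for real source systems, and the names `make_source`, `virtual_orders`, `etl_load`, `crm`, and `erp` are hypothetical. A real virtualization product does far more (query pushdown, caching, security), so treat this only as an illustration of where the data lives in each case.

```python
import sqlite3

def make_source(rows):
    """Create an in-memory SQLite database standing in for a source system."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany("INSERT INTO orders (id, amount) VALUES (?, ?)", rows)
    conn.commit()
    return conn

def virtual_orders(sources):
    """Virtualization: read from every source at query time; nothing is copied."""
    rows = []
    for name, conn in sources.items():
        for order_id, amount in conn.execute("SELECT id, amount FROM orders"):
            rows.append((name, order_id, amount))
    return rows

def etl_load(sources, warehouse):
    """Data movement: extract from each source, transform, load into the warehouse."""
    warehouse.execute("DROP TABLE IF EXISTS orders")
    warehouse.execute(
        "CREATE TABLE orders (source TEXT, id INTEGER, amount_cents INTEGER)"
    )
    for name, conn in sources.items():
        extracted = conn.execute("SELECT id, amount FROM orders").fetchall()
        # Example transformation: tag the source and store amounts as whole cents.
        transformed = [(name, i, round(a * 100)) for i, a in extracted]
        warehouse.executemany("INSERT INTO orders VALUES (?, ?, ?)", transformed)
    warehouse.commit()

# Two hypothetical source systems and an empty warehouse.
sources = {"crm": make_source([(1, 10.0)]), "erp": make_source([(1, 25.5)])}
warehouse = sqlite3.connect(":memory:")
etl_load(sources, warehouse)

# A new order arrives in one source after the warehouse load has run.
sources["crm"].execute("INSERT INTO orders (id, amount) VALUES (2, 7.25)")
sources["crm"].commit()

virtual_count = len(virtual_orders(sources))
warehouse_count = warehouse.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(virtual_count, warehouse_count)
```

In this sketch the virtual layer sees the new order immediately because it queries the sources on every read, while the warehouse copy stays unchanged until `etl_load` runs again. The flip side is that every virtual query puts load on the source systems, whereas the warehouse answers from its own copy.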
