John Miner shares some notes. Part 1 looks at getting started and tables, both managed and unmanaged:
The architectural diagram shows how information flows from a source system into a Delta lakehouse, is transformed by programs, and is consumed by end users. To get source data into the lake, we can use any of three methods to retrieve the data as files:

- pipelines – traditional Azure Data Factory components
- dataflows – wrangling data flows based on Power Query
- shortcuts – the ability to link external storage to the lake

Once the data is in the lake, two types of programs can transform the data files: Spark notebooks and dataflows.
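Since Part 1 contrasts managed and unmanaged tables, a minimal PySpark sketch may help show the difference. It assumes a Fabric notebook attached to a lakehouse (where a Spark session is already available), and the paths and table names (Files/raw_sales, sales_managed, sales_unmanaged) are hypothetical placeholders, not from John's article:

```python
from pyspark.sql import SparkSession

# In a Fabric notebook a SparkSession already exists; getOrCreate() picks it up.
spark = SparkSession.builder.getOrCreate()

# Read staged files from the lake (assume CSV files landed under Files/raw_sales).
df = spark.read.option("header", "true").csv("Files/raw_sales")

# Managed table: Spark owns both metadata and data; DROP TABLE removes the files too.
df.write.format("delta").mode("overwrite").saveAsTable("sales_managed")

# Unmanaged (external) table: the data lives at a path you control; DROP TABLE
# removes only the metadata and leaves the Delta files in place.
df.write.format("delta").mode("overwrite").save("Files/sales_unmanaged")
spark.sql(
    "CREATE TABLE IF NOT EXISTS sales_unmanaged USING DELTA "
    "LOCATION 'Files/sales_unmanaged'"
)
```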
Part 2 covers file and folder management:
In practice, I have seen an additional quality zone called raw used to stage files in their native format before they are converted to the Delta file format. Please note that the lakehouse uses either shortcuts or pipelines to get files into the lake. We will talk more about the bronze, silver, and gold zones when I cover full and incremental loading later in this article.
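A rough PySpark sketch of that raw-to-bronze hop: the zone folders (Files/raw, Files/bronze) and the dataset name are assumptions for illustration, not John's exact layout.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Raw zone: files staged in their native format (CSV here), left untouched.
raw_df = (
    spark.read
    .option("header", "true")
    .csv("Files/raw/customers")
)

# Bronze zone: the same data converted to the Delta file format.
# A full load would overwrite; an incremental load would append instead.
(
    raw_df.write
    .format("delta")
    .mode("overwrite")
    .save("Files/bronze/customers")
)
```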
Read on for John’s thoughts.