Data Lakes

Jen Stirrup has a great primer on data lakes and factors to consider before you jump into the idea:

The organization will need to take a step back to better understand its existing status. Is it just starting out? Are other departments doing the same thing, perhaps in the local organization or somewhere else in the world? Once the organization understands its state better, it can start to broadly work out the strategy that the Data Lake is intended to provide.

As part of this understanding, the objective of the Data Lake will need to be identified. Is it for data science? Or, for example, is the Data Lake simply to store data in a holding pattern for data discovery? Identifying the objective will help align the vision and the goals, and set the scene for communication to move forward.

I would like to popularize the term Data Swamp for “that place you store a whole bunch of data of dubious origin and value.”  It’s the place where you promise management that of course you can get the data back…as long as they never actually ask for it or are okay with reading terabytes of flat files from backup tapes.  The Data Swamp is the Aristotelian counterpart to the Data Lake, Goofus to its Gallant.  It will also, by my estimate, be the more common version.
