Data Lakes And Data Swamps

Randolph West talks about data lakes:

Internet companies including search engines (Google, Bing), social media companies (Facebook, Twitter), and email providers (Yahoo!, Outlook.com) are managing data stores measured in petabytes. On a daily basis these organizations handle all sorts of structured and unstructured data.

Assuming they put all their data in one repository, that could technically be thought of as a data lake. These organizations have adapted existing tools, and even created new technologies, to manage data of this magnitude in a field called big data.

The short version: big data is not a 100 GB SQL Server database or data warehouse. Big data is a relatively new field that came about because traditional data management tools are simply unable to deal with such large volumes of data. Even so, a single SQL Server database can allegedly be more than 500 petabytes in size, but Michael J. Swart warns usif you’re using over 10% of what SQL Server restricts you to, you’re doing it wrong.

Incidentally, I’ll note that the term data swamp has a storied history here at Curated SQL.

Related Posts

On Whether Relational Data Belongs In A Data Lake

Melissa Coates debates whether relational data really belongs in a data lake: For certain types of data, writing it to the data lake really is frequently the best choice. This is often true for low latency IoT data, semi-structured data like logs, and varying structures such as social media data. However, the handling of structured […]

Read More

The Value Of Power BI Dataflows

Matt Allington gets to the core benefits of Power BI Dataflows: Dataflows are: An online service provided by Microsoft as part of Power BI (software as a service, or SaaS). In effect dataflows are an online data collection and storage tool. Collection:  It uses Power Query to connect to the data at the source and transform that data as […]

Read More

Categories

July 2018
MTWTFSS
« Jun Aug »
 1
2345678
9101112131415
16171819202122
23242526272829
3031