AWS Data Lake

Nick Corbett announces that Amazon is rolling out their own data lake solution:

Separating storage from processing can also help to reduce the cost of your data lake. Until you choose to analyze your data, you need to pay only for S3 storage. This model also makes it easier to attribute costs to individual projects. With the correct tagging policy in place, you can allocate the costs to each of your analytical projects based on the infrastructure that they consume. In turn, this makes it easy to work out which projects provide most value to your organization.

The data lake stores metadata in both DynamoDB and Amazon ES. DynamoDB is used as the system of record. Each change of metadata that you make is saved, so you have a complete audit trail of how your package has changed over time. You can see this on the data lake console by choosing History in the package view:

Having a competitor in the data lake space is a good thing for us, though based on this intro post, it seems that Amazon and Microsoft are taking different approaches to the data lake, where Microsoft wants you to stay in the data lake (e.g., writing U-SQL or Python statements to query the data lake) and Amazon wants you to shop the data lake and check out the specific S3 buckets and files for your own processing.

Related Posts

The Forgotten Infrastructure Below Azure BI Architecture Diagrams

Meagan Longoria reminds us that there are several products which Azure BI projects need but which we tend to forget when building architectural diagrams: Let’s start with Azure Active Directory (AAD). In order to provision the resources in the diagram, your Azure subscription must already be associated with an Active Directory. AAD is Microsoft’s cloud-based […]

Read More

AzureR Packages In Cran

David Smith points out that the Azure packages for R are now in CRAN: The suite of AzureR packages for interfacing with Azure services from R is now available on CRAN. If you missed the earlier announcements, this means you can now use the install.packages function in R to install these packages, rather than having to install from the […]

Read More

Categories

December 2016
MTWTFSS
« Nov Jan »
 1234
567891011
12131415161718
19202122232425
262728293031