Serverless Lambda Architecture

Laith Al-Saadoon shows off a new Amazon Web Services product, AWS Glue, which allows you to build a data processing system on the Lambda architecture without directly provisioning any EC2 instances:

With the launch of AWS Glue, AWS provides a portfolio of services to architect a Big Data platform without managing any servers or clusters. AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics. You can create and run an ETL job with a few clicks in the AWS Management Console. You simply point AWS Glue to your data stored on AWS, and AWS Glue discovers your data and stores the associated metadata (for example, the table definition and schema) in the AWS Glue Data Catalog. After it’s cataloged, your data is immediately searchable, queryable, and available for ETL.

AWS Glue generates the code to execute your data transformations and data loading processes. Furthermore, AWS Glue provides a managed Spark execution environment to run ETL jobs against a data lake in Amazon S3. In short, you can now run a Lambda Architecture in AWS in a completely 100% serverless fashion!

“Serverless” applications allow you to build and run applications without thinking about servers. What this means is that you can now stream data in real-time, process huge volumes of data in S3, and run SQL queries and visualizations against that data without managing server provisioning, installation, patching, or capacity scaling. This frees up your users to spend more time interpreting the data and deriving business value for your organization.

Laith has a working demo of the process available as well.

Related Posts

Keep That Data Raw

Archana Madhavan argues that you should retain your raw data: When your pipeline already has to read every line of your data, it’s tempting to make it perform some fancy transformations. But you should steer clear of these add-ons so that you: Avoid flawed calculations. If you have thousands of machines running your pipeline in real-time, […]

Read More

Long-Term Storage In Kafka

Jay Kreps shows us that you can use Kafka as a primary data store: The short answer is that it’s not insane, people do this all the time, and Kafka was actually designed for this type of usage. But first, why might you want to do this? There are actually a number of use cases, […]

Read More

Leave a Reply

Your email address will not be published. Required fields are marked *

Categories

September 2017
MTWTFSS
« Aug  
 123
45678910
11121314151617
18192021222324
252627282930