Serverless Lambda Architecture

Laith Al-Saadoon shows off a new Amazon Web Services product, AWS Glue, which allows you to build a data processing system on the Lambda architecture without directly provisioning any EC2 instances:

With the launch of AWS Glue, AWS provides a portfolio of services to architect a Big Data platform without managing any servers or clusters. AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics. You can create and run an ETL job with a few clicks in the AWS Management Console. You simply point AWS Glue to your data stored on AWS, and AWS Glue discovers your data and stores the associated metadata (for example, the table definition and schema) in the AWS Glue Data Catalog. After it’s cataloged, your data is immediately searchable, queryable, and available for ETL.

AWS Glue generates the code to execute your data transformations and data loading processes. Furthermore, AWS Glue provides a managed Spark execution environment to run ETL jobs against a data lake in Amazon S3. In short, you can now run a Lambda Architecture in AWS in a completely 100% serverless fashion!

“Serverless” applications allow you to build and run applications without thinking about servers. What this means is that you can now stream data in real-time, process huge volumes of data in S3, and run SQL queries and visualizations against that data without managing server provisioning, installation, patching, or capacity scaling. This frees up your users to spend more time interpreting the data and deriving business value for your organization.

Laith has a working demo of the process available as well.

Related Posts

Switching To Managed Disks In Azure

Chris Seferlis walks us through an easy method to convert unmanaged disks to managed disks in Azure: First off, why would you want a managed disk over an unmanaged one? Greater scalability due to much higher IOPs and storage limits. There’s no longer the need to add additional storage accounts when you’re adding disk space, […]

Read More

Flint: Time Series With Spark

Li Jin and Kevin Rasmussen cover the concepts of Flint, a time-series library built on Apache Spark: Time series analysis has two components: time series manipulation and time series modeling. Time series manipulation is the process of manipulating and transforming data into features for training a model. Time series manipulation is used for tasks like data […]

Read More

Categories

September 2017
MTWTFSS
« Aug Oct »
 123
45678910
11121314151617
18192021222324
252627282930