An Apache Flume Overview

Daniel Berman takes us through an overview of Apache Flume:

Apache Flume was developed by Cloudera to provide a way to quickly and reliably stream large volumes of log files generated by web servers into Hadoop. There, applications can perform further analysis on data in a distributed environment. Initially, Apache Flume was developed to handle only log data. Later, it was equipped to handle event data as well.

Click through to get a code-free, high-level understanding of Flume and where it can work for you.

Related Posts

Performance Tuning Neural Network Training

Sean Owen takes us through a few techniques for speeding up neural network model training: Step #2: Use Early StoppingKeras (and other frameworks) have built-in support for stopping when further training appears to be making the model worse. In Keras, it’s the EarlyStopping callback. Using it means passing the validation data to the training process for evaluation […]

Read More

Machine Learning and Delta Lake

Brenner Heintz and Denny Lee walk us through solving data engineering problems with Delta Lake: As a result, companies tend to have a lot of raw, unstructured data that they’ve collected from various sources sitting stagnant in data lakes. Without a way to reliably combine historical data with real-time streaming data, and add structure to […]

Read More

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Categories

August 2019
MTWTFSS
« Jul  
 1234
567891011
12131415161718
19202122232425
262728293031