Rolling A Log Analytics System

Michael Sun and Jeff Shmain put together a log analytics sytem using several technologies:

This is an example of tiered system design. Tiered system is a system design pattern where data is categorized and stored in different data stores that match best to each category. It can both improve performance and lower the cost of a system. One of the most famous tiered system designs is computer memory hierarchy.  In the log analytics use case, analysts mostly search for logs in recent months, but often run batch jobs to get long term trends from logs in recent years. Therefore, recent logs are indexed and stored in Solr for search, while years of logs are stored in HBase for batch processing. As such, the index in Solr is small, which both improves performance and reduces cost, among other benefits.

Although only months of logs are stored in Solr, the logs before that period are stored in HBase and can be indexed on demand for further analysis.

Now that we have covered a high level architecture of a log analytics system, we will dive into more details of individual components.

This looks like a solid architecture for a logging system and can apply to other cases as well.

Related Posts

Flint: Time Series With Spark

Li Jin and Kevin Rasmussen cover the concepts of Flint, a time-series library built on Apache Spark: Time series analysis has two components: time series manipulation and time series modeling. Time series manipulation is the process of manipulating and transforming data into features for training a model. Time series manipulation is used for tasks like data […]

Read More

ElasticMapReduce And RStudio

Tanzir Musabbir demonstrates how to set up Amazon ElasticMapReduce to include an RStudio edge node: RStudio Server provides a browser-based interface for R and a popular tool among data scientists. Data scientist use Apache Spark cluster running on  Amazon EMR to perform distributed training. In a previous blog post, the author showed how you can install RStudio Server on Amazon […]

Read More

Categories

April 2017
MTWTFSS
« Mar May »
 12
3456789
10111213141516
17181920212223
24252627282930