Michael Sun and Jeff Shmain put together a log analytics system using several technologies, including Solr and HBase.
This is an example of tiered system design. A tiered system is a design pattern in which data is categorized and each category is stored in the data store best suited to it. This can both improve performance and lower the cost of a system. One of the best-known tiered designs is the computer memory hierarchy. In the log analytics use case, analysts mostly search logs from recent months, but they often run batch jobs to extract long-term trends from logs spanning recent years. Therefore, recent logs are indexed and stored in Solr for search, while years of logs are stored in HBase for batch processing. As a result, the Solr index stays small, which improves performance and reduces cost, among other benefits.
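To make the tiering rule concrete, here is a minimal sketch of how records and queries might be routed between the two tiers. It is not the authors' implementation. The 90-day `SEARCH_RETENTION` window, the `LogRecord` type, and the `tiers_for` and `route_query` helpers are hypothetical. The source only says "recent months." The tier names "search" and "archive" stand in for Solr and HBase, and no real Solr or HBase client is used.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical retention window for the search tier (Solr in the article).
# The source only says "recent months"; 90 days is an illustrative assumption.
SEARCH_RETENTION = timedelta(days=90)


@dataclass
class LogRecord:
    timestamp: datetime
    message: str


def tiers_for(record: LogRecord, now: datetime) -> list:
    """Return the tiers a record belongs in.

    Every record goes to the archive tier (HBase in the article); only records
    inside the retention window are also indexed in the search tier (Solr).
    """
    tiers = ["archive"]
    if now - record.timestamp <= SEARCH_RETENTION:
        tiers.append("search")
    return tiers


def route_query(start: datetime, now: datetime) -> str:
    """Choose a tier for a query whose time range begins at `start`.

    If the whole range fits inside the search tier's window, serve it as an
    interactive search; otherwise run it as a batch job over the archive.
    """
    if now - start <= SEARCH_RETENTION:
        return "search"
    return "archive"


if __name__ == "__main__":
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    recent = LogRecord(now - timedelta(days=10), "GET /index.html 200")
    old = LogRecord(now - timedelta(days=400), "GET /old.html 404")
    print(tiers_for(recent, now))  # ['archive', 'search']
    print(tiers_for(old, now))     # ['archive']
    print(route_query(now - timedelta(days=3 * 365), now))  # archive
```

Writing every record to the archive tier, and only a time-bounded subset to the search tier, keeps the search index small. The archive remains the complete record for long-term batch jobs.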
Although Solr holds only the most recent months of logs, older logs remain in HBase and can be indexed on demand for further analysis.
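On-demand indexing can be pictured as a backfill: scan the archived logs for a requested time range and add them to the search index. The sketch below is a minimal illustration under stated assumptions. A plain dict keyed by timestamp stands in for HBase, a term-to-timestamps dict stands in for a Solr index, and `index_on_demand` and `search` are hypothetical helpers, not part of either product's API.

```python
from datetime import datetime


def index_on_demand(archive: dict, index: dict,
                    start: datetime, end: datetime) -> int:
    """Backfill the search index from the archive for start <= ts < end.

    `archive` is a hypothetical stand-in for HBase (timestamp -> message);
    `index` is a hypothetical stand-in for a Solr index (term -> timestamps).
    Returns the number of archived records that were indexed.
    """
    count = 0
    # Iterate in timestamp order, loosely mirroring a scan over sorted row keys.
    for ts in sorted(archive):
        if start <= ts < end:
            for term in archive[ts].lower().split():
                index.setdefault(term, set()).add(ts)
            count += 1
    return count


def search(index: dict, term: str) -> list:
    """Return the sorted timestamps of indexed records containing `term`."""
    return sorted(index.get(term.lower(), set()))


if __name__ == "__main__":
    archive = {
        datetime(2022, 1, 5): "ERROR disk full on node-7",
        datetime(2022, 3, 9): "INFO backup completed",
        datetime(2023, 8, 1): "ERROR timeout calling billing",
    }
    index = {}
    print(index_on_demand(archive, index, datetime(2022, 1, 1), datetime(2023, 1, 1)))  # 2
    print(search(index, "error"))    # [datetime.datetime(2022, 1, 5, 0, 0)]
    print(search(index, "timeout"))  # [] because 2023 was not backfilled
```

This sketch filters the whole archive for simplicity. A real deployment would more likely limit the read to the requested range, for example with an HBase scan over time-ordered row keys. That row-key design is an assumption, not something the source describes.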
Now that we have covered the high-level architecture of a log analytics system, we will dive into the details of its individual components.
This looks like a solid architecture for a logging system, and the same tiered approach could apply to other use cases as well.