Apache Druid Concepts

Jatin Demla takes us through some of the key concepts behind Apache Druid:

Apache Druid is a distributed, high-performance columnar store for real-time analytics on a large dataset. Druid core design combines the OLAP analytics, time series database and search system to create a single operational analysis. Druid is most suitable for data with high cardinality column or queries having higher aggregation or group by.

Druid has very specific use cases. If you don’t fit one of the use cases, it’s not a good solution at all; but if you do fit one of the use cases, it’s excellent.

Related Posts

Comparing Performance: HBase1 vs HBase2

Surbhi Kochhar takes us through performance improvements between HBase version 1 and HBase version 2: We are loading the YCSB dataset with 1000,000,000 records with each record 1KB in size, creating total 1TB of data. After loading, we wait for all compaction operations to finish before starting workload test. Each workload tested was run 3 […]

Read More

The Transaction Log in Delta Tables

Burak Yavuz, et al, explain how the transaction log works with Delta Tables in Apache Spark: When a user creates a Delta Lake table, that table’s transaction log is automatically created in the _delta_log subdirectory. As he or she makes changes to that table, those changes are recorded as ordered, atomic commits in the transaction log. Each commit […]

Read More

Categories

March 2019
MTWTFSS
« Feb Apr »
 123
45678910
11121314151617
18192021222324
25262728293031