Apache Kudu

Kevin Feasel

2016-09-27

Hadoop

Greg Rahn discusses Apache Kudu:

At a high level, Kudu is a new storage manager that enables durable single-record inserts, updates, and deletes, as well as fast and efficient columnar scans due to its in-memory row format and on-disk columnar format.  This architecture makes Kudu very attractive for data that arrives as a single record at a time or that may need to be modified at a later time.

Today, many users try to solve this challenge via a Lambda architecture, which presents inherent challenges by requiring different code bases and storage for the necessary batch and real-time components. Using Kudu and Impala together completely avoids this problematic complexity by easily and immediately making data inserted into Kudu available for querying and analytics via Impala. (For more technical details on how Impala and Kudu work together for analytical workloads, see this post.)

I’d jokingly say “Someday, somebody’s going to reinvent the relational database inside of Hadoop.”  But it seems like that’s less of a joke than a medium-term prediction.

Related Posts

Flink’s State Processor API

Seth Wiesman and Fabian Hueske show off Apache Flink’s State Processor API: The State Processor API that comes with Flink 1.9 is a true game-changer in how you can work with application state! In a nutshell, it extends the DataSet API with Input and OutputFormats to read and write savepoint or checkpoint data. Due to […]

Read More

Derivative Event Sourcing

Anna McDonald explains the concept of derivative event sourcing: If you happen to be the proud owner of a single order service, then you are all set to begin. But what if you have more than one order service? Something that tends to happen at companies that have been around for more than a sprint […]

Read More

Categories

September 2016
MTWTFSS
« Aug Oct »
 1234
567891011
12131415161718
19202122232425
2627282930