History Of Apache Storm

Kevin Feasel

2016-05-20

Hadoop

Taylor Goetz gives a history of Storm up to release 1.0:

Storm was originally created by Nathan Marz while he was at BackType (later acquired by Twitter) working on analytics products based on historical and real-time analysis of the Twitter firehose. Nathan envisioned Storm as a replacement for the real-time component that was based on a cumbersome and brittle system of distributed queues and workers. Storm introduced the concept of the “stream” as a distributed abstraction for data in motion, as well as a fault tolerance and reliability model that was difficult, if not impossible, to achieve with a traditional queues and workers architecture.

Nathan open sourced Storm to GitHub on September 19th, 2011 during his talk at Strange Loop, and it quickly became the most watched JVM project on GitHub. Production deployments soon followed, and the Storm development community rapidly expanded.

Storm is an exciting technology because it's a key driver in making Hadoop more than just a batch processing framework.
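
For readers who haven't seen Storm's programming model, here is a minimal sketch of the spout-and-bolt abstraction the quote alludes to: a spout emits a stream of tuples, a bolt consumes it, and the framework handles the distribution and replay that a hand-built queues-and-workers system would have to manage itself. This assumes the post-1.0 org.apache.storm package names (pre-1.0 releases used backtype.storm), and HelloStormTopology, WordSpout, and PrintBolt are hypothetical names used purely for illustration.

import java.util.Map;

import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

public class HelloStormTopology {

    // A spout is the source of a stream: this hypothetical one emits words forever.
    public static class WordSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;
        private final String[] words = {"storm", "stream", "tuple"};
        private int index = 0;

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void nextTuple() {
            Utils.sleep(100);
            collector.emit(new Values(words[index++ % words.length]));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }
    }

    // A bolt consumes a stream tuple by tuple; this one just prints each word.
    public static class PrintBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            System.out.println(tuple.getStringByField("word"));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // Emits nothing downstream.
        }
    }

    public static void main(String[] args) throws Exception {
        // Wire the spout's stream into the bolt; shuffleGrouping spreads tuples
        // evenly across bolt instances when the bolt runs with parallelism > 1.
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("words", new WordSpout());
        builder.setBolt("printer", new PrintBolt()).shuffleGrouping("words");

        // Run in-process for a few seconds; a real deployment submits to a cluster.
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("hello-storm", new Config(), builder.createTopology());
        Utils.sleep(5000);
        cluster.shutdown();
    }
}

The point of the sketch is the division of labor: the topology only declares sources, per-tuple logic, and how streams connect, while the queueing, routing, and retry behavior that the quote's "queues and workers" systems handled by hand is left to the framework.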
