Apache Pulsar 2.0 Released

Kevin Feasel

2018-06-08

Hadoop

George Leopold reports on a new version of Apache Pulsar:

The startup’s Apache Pulsar 2.0, released on Wednesday (June 6), adds new functionality designed to move data users “beyond batch” processing. Among the additions is a “stream-native” processing capability called Pulsar Functions, designed to apply analytics to data as it flows through the Pulsar platform. Processing functions can be written in either Java or Python, the company said.

Functions debuted earlier this year as a preview feature, and Streamlio announced its general availability this week as part of the 2.0 release.
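To make the Functions idea more concrete, here is a minimal sketch in Python, one of the two languages mentioned above. It assumes Pulsar's "language-native" function style, in which a plain Python function named `process` receives one input message and returns the output message. The exclamation-mark logic and the sample messages are purely illustrative, and the script only simulates the function locally rather than deploying it to a Pulsar cluster.

```python
# A language-native Python function (assumed style): Pulsar would call
# process() once per incoming message and publish the return value.
def process(input):
    """Append an exclamation mark to each incoming message (illustrative logic)."""
    return "{}!".format(input)


if __name__ == "__main__":
    # Local simulation only: feed a few sample messages through the
    # function instead of deploying it to a Pulsar cluster.
    for message in ["hello", "pulsar"]:
        print(process(message))
```

Because the function holds no state and has no Pulsar imports, the same code can be unit-tested on its own before it is ever attached to a topic.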

Another is a Pulsar enhancement developed in conjunction with Apache BookKeeper, a scalable storage system. Streamlio said the new feature, called Topic Compaction, delivers streaming data storage designed to improve the performance of applications consuming data from Pulsar. With it, the Pulsar “broker” builds a snapshot of the latest value for each topic key, the startup said.
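Topic compaction is easier to picture with a toy model. The Python sketch below is not Pulsar's implementation; it is a conceptual illustration of what "a snapshot of the latest value for each topic key" means, assuming a topic is simply a list of `(key, payload)` pairs. It also assumes that a message with an empty payload deletes its key, which is how Pulsar's documentation describes compaction, but treat that detail as an assumption here. The `compact` function and the sample keys are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical message model: (key, payload); None or b"" means "no payload".
Message = Tuple[str, Optional[bytes]]


def compact(messages: Iterable[Message]) -> Dict[str, bytes]:
    """Keep only the latest value per key, in order of each key's last write."""
    latest: Dict[str, bytes] = {}
    for key, payload in messages:
        # Drop any older value so dict order reflects the most recent write.
        latest.pop(key, None)
        if payload:
            latest[key] = payload
        # Assumption: an empty payload acts as a delete marker, so the key
        # stays removed from the snapshot.
    return latest


if __name__ == "__main__":
    topic = [
        ("user-1", b"v1"),
        ("user-2", b"v1"),
        ("user-1", b"v2"),
        ("user-3", b"v1"),
        ("user-2", None),
    ]
    print(compact(topic))
```

A consumer reading such a snapshot sees one value per surviving key instead of replaying every historical message, which is the kind of performance benefit the article attributes to the feature.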

Read the whole thing.
