Reynold Xin announces Apache Spark 2.1:
-
Structured Streaming
Introduced in Spark 2.0, Structured Streaming is a high-level API for building continuous applications. The main goal is to make it easier to build end-to-end streaming applications, which integrate with storage, serving systems, and batch jobs in a consistent and fault-tolerant way.
-
Event-time watermarks: This change lets applications hint to the system when events are considered “too late” and allows the system to bound internal state tracking late events.
-
Support for all file-based formats and all file-based features: With these improvements, Structured Streaming can read and write all file-based formats, e.g. JSON, text, Avro, CSV. In addition, all file-based features—e.g. partitioned files and bucketing—are supported on all formats.
-
Apache Kafka 0.10: This adds native support for Kafka 0.10, including manual assignment of starting offsets and rate limiting.
-
This is a pretty hefty release. Click through to read the whole thing.