Spark 2.1

Reynold Xin announces Apache Spark 2.1:

  • Structured Streaming

    Introduced in Spark 2.0, Structured Streaming is a high-level API for building continuous applications. The main goal is to make it easier to build end-to-end streaming applications, which integrate with storage, serving systems, and batch jobs in a consistent and fault-tolerant way.

    • Event-time watermarks: This change lets applications hint to the system when events are considered “too late” and allows the system to bound internal state tracking late events.

    • Support for all file-based formats and all file-based features: With these improvements, Structured Streaming can read and write all file-based formats, e.g. JSON, text, Avro, CSV. In addition, all file-based features—e.g. partitioned files and bucketing—are supported on all formats.

    • Apache Kafka 0.10: This adds native support for Kafka 0.10, including manual assignment of starting offsets and rate limiting.

This is a pretty hefty release.  Click through to read the whole thing.

Related Posts

Changes in R 3.6.0

David Smith lays out the major changes in R 3.6.0: A major update to the open-source R language, R 3.6.0, was released on April 26 and is now available for download for Windows, Mac and Linux. As a major update, it has many new features, user-visible changes and bug fixes. You can read the details in the release announcement, […]

Read More

Azure SQL Database Edge

Randolph West gives us some quick info on Azure SQL Database Edge: I was first made aware of this edition at the MVP Summit earlier this year, and I need to clear some things up for folks who might be confused about the name, and who it’s for. Firstly, recall that Azure means “hybrid” now, so […]

Read More

Categories

January 2017
MTWTFSS
« Dec Feb »
 1
2345678
9101112131415
16171819202122
23242526272829
3031