Breeze: Mathematics In Scala

Nitin Aggarwal introduces the mathematics library behind Spark’s machine learning library, MLlib:

In simple terms, Breeze is a Scala library that extends the Scala collection library to provide support for vectors and matrices in addition to providing a whole bunch of functions that support their manipulation. We could safely compare Breeze to NumPy in Python terms. Breeze forms the foundation of MLlib—the Machine Learning library in Spark

Breeze comprises four libraries:

  • breeze-math: Numerics and Linear Algebra. Fast linear algebra backed by native libraries (via JBlas) where appropriate.

  • breeze-process: Tools for tokenizing, processing, and massaging data, especially textual data. Includes stemmers, tokenizers, and stop word filtering, among other features.

  • breeze-learn: Optimization and Machine Learning. Contains state-of-the-art routines for convex optimization, sampling distributions, several classifiers, and DSLs for Linear Programming and Belief Propagation.

  • breeze-viz: (Very alpha) Basic support for plotting, using JFreeChart.

Read on for samples and basic usage.

Related Posts

Leveraging Hive In Pyspark

Fisseha Berhane shows how to use Spark to connect Python to Hive: If we are using earlier Spark versions, we have to use HiveContext which is variant of Spark SQL that integrates with data stored in Hive. Even when we do not have an existing Hive deployment, we can still enable Hive support. In this […]

Read More

Unit Testing Spark Streaming DStreams

Anuj Saxena shows how to create unit tests for DStreams in Spark Streaming: The method ‘ testOperation ‘ takes the output of the operation performed on the ‘inputPair’ and check whether it is equal to the ‘outputPair’ and just like this, we can test our business logic. This short snippet lets you test your business logic without […]

Read More

Categories

December 2017
MTWTFSS
« Nov Jan »
 123
45678910
11121314151617
18192021222324
25262728293031