Nitin Aggarwal introduces the mathematics library behind Spark’s machine learning library, MLlib:

In simple terms, Breeze is a Scala library that extends the Scala collection library to provide support for vectors and matrices in addition to providing a whole bunch of functions that support their manipulation. We could safely compare Breeze to NumPy in Python terms. Breeze forms the foundation of MLlib, the Machine Learning library in Spark.

Breeze comprises four libraries:

- **breeze-math:** Numerics and Linear Algebra. Fast linear algebra backed by native libraries (via JBlas) where appropriate.
- **breeze-process:** Tools for tokenizing, processing, and massaging data, especially textual data. Includes stemmers, tokenizers, and stop word filtering, among other features.
- **breeze-learn:** Optimization and Machine Learning. Contains state-of-the-art routines for convex optimization, sampling distributions, several classifiers, and DSLs for Linear Programming and Belief Propagation.
- **breeze-viz:** (Very alpha) Basic support for plotting, using JFreeChart.

Read on for samples and basic usage.

Kevin Feasel

2017-12-28

Data Science, Misc Languages, Spark