Microsoft ML For Park

Xiaoyong Zhu announces that the Microsoft Machine Learning library is now available for Spark:

We’ve learned a lot by working with customers using SparkML, both internal and external to Microsoft. Customers have found Spark to be a powerful platform for building scalable ML models. However, they struggle with low-level APIs, for example to index strings, assemble feature vectors and coerce data into a layout expected by machine learning algorithms. Microsoft Machine Learning for Apache Spark (MMLSpark) simplifies many of these common tasks for building models in PySpark, making you more productive and letting you focus on the data science.

The library provides simplified consistent APIs for handling different types of data such as text or categoricals. Consider, for example, a DataFrame that contains strings and numeric values from the Adult Census Income dataset, where “income” is the prediction target.

It’s an open source project as well, so that barrier to entry is lowered significantly.

Related Posts

Machine Learning and Delta Lake

Brenner Heintz and Denny Lee walk us through solving data engineering problems with Delta Lake: As a result, companies tend to have a lot of raw, unstructured data that they’ve collected from various sources sitting stagnant in data lakes. Without a way to reliably combine historical data with real-time streaming data, and add structure to […]

Read More

The Databricks File System

Brad Llewellyn takes us through the Azure Databricks File System: Today, we’re going to talk about the Databricks File System (DBFS) in Azure Databricks.  If you haven’t read the previous posts in this series, Introduction, Cluster Creation and Notebooks, they may provide some useful context.  You can find the files from this post in our GitHub Repository.  Let’s move on […]

Read More

Categories

June 2017
MTWTFSS
« May Jul »
 1234
567891011
12131415161718
19202122232425
2627282930