Microsoft ML For Spark

Xiaoyong Zhu announces that the Microsoft Machine Learning library is now available for Spark:

We’ve learned a lot by working with customers using SparkML, both internal and external to Microsoft. Customers have found Spark to be a powerful platform for building scalable ML models. However, they struggle with low-level APIs, for example, to index strings, assemble feature vectors and coerce data into a layout expected by machine learning algorithms. Microsoft Machine Learning for Apache Spark (MMLSpark) simplifies many of these common tasks for building models in PySpark, making you more productive and letting you focus on the data science.

The library provides simplified, consistent APIs for handling different types of data such as text or categoricals. Consider, for example, a DataFrame that contains strings and numeric values from the Adult Census Income dataset, where “income” is the prediction target.
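To make the "low-level" chores concrete, here is a minimal plain-Python sketch of what indexing strings and assembling feature vectors amounts to for a table like the Adult Census Income data. It is not Spark ML or MMLSpark code. The helpers `string_index` and `assemble` are hypothetical names, and the rows are made-up values that only borrow the dataset's column names. The ordering mimics the default frequency-descending behavior of Spark ML's `StringIndexer`; breaking ties alphabetically is an assumption of this sketch.

```python
from collections import Counter


def string_index(values):
    """Map each distinct string to an integer, most frequent first.

    Hypothetical helper: mimics StringIndexer's default frequency ordering;
    alphabetical tie-breaking is an assumption of this sketch.
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {v: i for i, v in enumerate(ordered)}


def assemble(rows, numeric_cols, categorical_cols, label_col):
    """Hypothetical helper: turn dict rows into (feature_vector, label) pairs.

    Numeric columns are copied as floats, categorical columns are replaced by
    their string index, and the label column is indexed separately.
    """
    indexers = {c: string_index([r[c] for r in rows]) for c in categorical_cols}
    label_index = string_index([r[label_col] for r in rows])
    assembled = []
    for r in rows:
        features = [float(r[c]) for c in numeric_cols]
        features += [float(indexers[c][r[c]]) for c in categorical_cols]
        assembled.append((features, label_index[r[label_col]]))
    return assembled, label_index


# Made-up rows shaped like the Adult Census Income data.
rows = [
    {"age": 39, "workclass": "State-gov", "education": "Bachelors", "income": "<=50K"},
    {"age": 50, "workclass": "Self-emp-not-inc", "education": "Bachelors", "income": "<=50K"},
    {"age": 52, "workclass": "Private", "education": "HS-grad", "income": ">50K"},
]

data, labels = assemble(rows, ["age"], ["workclass", "education"], "income")
for features, label in data:
    print(features, label)
```

Every categorical column needs its own indexer, and the label needs one too, before any model can see the data. Real pipelines usually add one-hot encoding on top of this, which is exactly the kind of bookkeeping MMLSpark aims to take off your hands.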

It’s an open source project as well, which lowers the barrier to entry significantly.

June 2017