Logistic Regression With Apache Spark

Manoj Gautam shows how to perform a logistic regression with Apache Spark:

Since we are going to try algorithms like Logistic Regression, we will have to convert the categorical variables in the dataset into numeric variables. There are 2 ways we can do this.

  1. Category Indexing
  2. One-Hot Encoding

Here, we will use a combination of StringIndexer and OneHotEncoderEstimator to convert the categorical variables. The OneHotEncoderEstimator will return a SparseVector.

Click through for the code and explanation.

Related Posts

Monte Carlo Simulation in Python

Kristian Larsen has a couple of posts on Monte Carlo style simulation in Python. First up is a post which covers how to generate data from different distributions: One method that is very useful for data scientist/data analysts in order to validate methods or data is Monte Carlo simulation. In this article, you learn how […]

Read More

Hyperparameter Tuning with MLflow

Joseph Bradley shows how you can perform hyperparameter tuning of an MLlib model with MLflow: Apache Spark MLlib users often tune hyperparameters using MLlib’s built-in tools CrossValidator and TrainValidationSplit.  These use grid search to try out a user-specified set of hyperparameter values; see the Spark docs on tuning for more info. Databricks Runtime 5.3 and 5.3 ML and above support […]

Read More


November 2018
« Oct Dec »