Manoj Gautam shows how to perform a logistic regression with Apache Spark:
Since we are going to try algorithms like Logistic Regression, we will have to convert the categorical variables in the dataset into numeric variables. There are 2 ways we can do this.
- Category Indexing
- One-Hot Encoding
Here, we will use a combination of StringIndexer and OneHotEncoderEstimator to convert the categorical variables. The
OneHotEncoderEstimator
will return a SparseVector.
Click through for the code and explanation.