Using MLflow For Binary Classification In Keras

Jules Damji walks us through classifying movie reviews as positive or negative, building a neural network via Keras on MLflow along the way:

François’s code example employs this Keras network architectural choice for binary classification. It comprises three Dense layers: one input layer (16 units), one hidden layer (16 units), and one output layer (1 unit), as shown in the diagram. “A hidden unit is a dimension in the representation space of the layer,” Chollet writes, where 16 is adequate for this problem space; for complex problems, like image classification, we can always bump up the units or add hidden layers to experiment and observe their effect on accuracy and loss metrics (which we shall do in the experiments below).

While the input and hidden layers use relu as an activation function, the final output layer uses sigmoid to squash its results into probabilities in the range [0, 1]. Values closer to 1 suggest a positive review, while values below 0.5 indicate a negative one.
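To make the squashing step concrete, here is a minimal sketch in plain Python. The function names and the 0.5 cutoff as a hard decision rule are illustrative assumptions; Keras applies the sigmoid inside the output layer for you.

```python
import math


def sigmoid(z: float) -> float:
    """Squash a real-valued score into the range between 0 and 1."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def label(probability: float, threshold: float = 0.5) -> str:
    """Turn a probability into a sentiment label (threshold is an assumption)."""
    return "positive" if probability >= threshold else "negative"


if __name__ == "__main__":
    for score in (-2.0, 0.0, 2.0):
        p = sigmoid(score)
        print(f"score={score:+.1f} probability={p:.3f} label={label(p)}")
```

A raw score of 0 maps to exactly 0.5, which is why 0.5 is the natural dividing line between the two classes.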

With this recommended baseline architecture, we train our base model and log all the parameters, metrics, and artifacts. This code snippet, from module models_nn.py, creates a stack of dense layers as depicted in the diagram above.

The overall accuracy is pretty good. I ran through a sample of 2K reviews from the set with Naive Bayes last night for a presentation and got 81% accuracy, so the neural network getting 93% isn’t too surprising. Seeing the confusion matrix in this demo would have been a nice addition.
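For reference, a confusion matrix for a binary classifier is just a count of the four true/predicted label combinations. The sketch below uses a handful of made-up labels purely for illustration; they are not results from the demo, and the helper names are hypothetical.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    counts = Counter(zip(y_true, y_pred))
    return {
        "tn": counts[(0, 0)],  # negative review, predicted negative
        "fp": counts[(0, 1)],  # negative review, predicted positive
        "fn": counts[(1, 0)],  # positive review, predicted negative
        "tp": counts[(1, 1)],  # positive review, predicted positive
    }


def accuracy(cm):
    """Share of all predictions that were correct."""
    return (cm["tp"] + cm["tn"]) / sum(cm.values())


if __name__ == "__main__":
    # Toy labels, invented for this example only.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
    cm = confusion_matrix(y_true, y_pred)
    print(cm, "accuracy:", accuracy(cm))
```

Unlike a single accuracy figure, the four counts show whether the errors lean toward false positives or false negatives.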
