Preparing Text Data For Natural Language Processing

Shirin Glander takes us through the process of preparing natural language data for machine learning using Keras:

As with any neural network, we need to convert our data into a numeric format; in Keras and TensorFlow we work with tensors. The IMDB example data from the keras package has been preprocessed to a list of integers, where every integer corresponds to a word arranged by descending word frequency.

So, how do we make it from raw text to such a list of integers? Luckily, Keras offers a few convenience functions that make our lives much easier.

This is a very nice tutorial if you’re new to the process.

Related Posts

From Excel to R: Three Examples

Abdul Majed Raja has a few examples of things which are easy to do in Excel and how you can do them in R: Create a difference variable between the current value and the next valueThis is also known as lead and lag – especially in a time series dataset this varaible becomes very important in feature engineering. In […]

Read More

Calculating AUC in R

Andrew Treadway shows how you can calculate Area Under the Curve in R: AUC is an important metric in machine learning for classification. It is often used as a measure of a model’s performance. In effect, AUC is a measure between 0 and 1 of a model’s performance that rank-orders predictions from a model. For […]

Read More

Categories

January 2019
MTWTFSS
« Dec Feb »
 123456
78910111213
14151617181920
21222324252627
28293031