Using Spark MLlib For Categorization

Taras Matyashovskyy uses Apache Spark MLlib to categorize songs into different genres:

The roadmap for implementation was pretty straightforward:

  • Collect the raw data set of the lyrics (~65k sentences in total):

    • Black Sabbath, In Flames, Iron Maiden, Metallica, Moonspell, Nightwish, Sentenced, etc.
    • Abba, Ace of Base, Backstreet Boys, Britney Spears, Christina Aguilera, Madonna, etc.
  • Create the training set, i.e. a label (0 for metal, 1 for pop) plus features (represented as vectors of doubles)

  • Train a logistic regression model, the obvious choice for this kind of binary classification
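The "label + double vector" step can be sketched in plain Python. This is not Spark MLlib code and not the author's implementation; it is a minimal stand-in, assuming a simple bag-of-words representation where each token is hashed into a fixed number of buckets (similar in spirit to Spark's HashingTF). The helper names (`tokenize`, `hashed_features`, `make_training_set`) and the bucket count of 32 are hypothetical choices for illustration.

```python
import re
import zlib

# Labels follow the post's convention: 0.0 for metal, 1.0 for pop.
METAL, POP = 0.0, 1.0


def tokenize(sentence):
    """Lower-case a sentence and split it into word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def hashed_features(sentence, num_features=32):
    """Map a sentence to a fixed-length vector of term counts (doubles).

    Each token is hashed into one of num_features buckets (the "hashing
    trick"). crc32 is used so the mapping is stable across runs, since
    Python's built-in hash() is salted per process.
    """
    vec = [0.0] * num_features
    for token in tokenize(sentence):
        vec[zlib.crc32(token.encode("utf-8")) % num_features] += 1.0
    return vec


def make_training_set(metal_sentences, pop_sentences, num_features=32):
    """Return (label, features) pairs for both genres."""
    data = [(METAL, hashed_features(s, num_features)) for s in metal_sentences]
    data += [(POP, hashed_features(s, num_features)) for s in pop_sentences]
    return data


# Hypothetical example sentences, not taken from the real data set.
training = make_training_set(
    ["fire and darkness rise"],
    ["baby dance with me tonight"],
)
```

Hashing keeps every vector the same length regardless of vocabulary size, at the cost of occasional collisions between unrelated words.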

This is a supervised learning problem, and it is a fun one to walk through.
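Because every training sentence already carries a genre label, fitting the classifier is ordinary supervised learning. The sketch below trains a binary logistic regression by batch gradient descent on log loss in plain Python. It is an illustrative substitute for MLlib's LogisticRegression estimator, not its API or the author's code; the function names, learning rate, epoch count, and the two-feature toy data set are all hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(data, epochs=500, learning_rate=0.5):
    """Fit weights and bias by batch gradient descent on log loss.

    data is a list of (label, features) pairs with labels 0.0 or 1.0.
    """
    n_features = len(data[0][1])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for label, x in data:
            # Gradient of log loss w.r.t. the linear score is (p - label).
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - label
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict(weights, bias, x, threshold=0.5):
    """Return 1.0 (pop) if P(pop | x) >= threshold, else 0.0 (metal)."""
    p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return 1.0 if p >= threshold else 0.0


# Hypothetical toy features: [count of "dark" words, count of "love" words].
toy_data = [
    (0.0, [3.0, 0.0]),
    (0.0, [2.0, 1.0]),
    (1.0, [0.0, 3.0]),
    (1.0, [1.0, 2.0]),
]
weights, bias = train_logistic_regression(toy_data)
```

On this toy data the first weight ends up negative and the second positive, so sentences dominated by "dark" words are classified as metal (0.0) and those dominated by "love" words as pop (1.0).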
