Handwriting Character Recognition

Tomaz Kastrun compares a few different libraries in terms of handwritten numeric character recognition:

Recently, I did a session at local user group in Ljubljana, Slovenija, where I introduced the new algorithms that are available with MicrosoftML package for Microsoft R Server 9.0.3.

For dataset, I have used two from (still currently) running sessions from Kaggle. In the last part, I did image detection and prediction of MNIST dataset and compared the performance and accuracy between.

MNIST Handwritten digit database is available here.

Tomaz has all of the code available as well.

Twitter Sentiment Analysis Using doc2vec

Sergey Bryl uses word2vec and doc2vec to perform Twitter sentiment analysis in R:

But doc2vec is a deep learning algorithm that draws context from phrases. It’s currently one of the best ways of sentiment classification for movie reviews. You can use the following method to analyze feedbacks, reviews, comments, and so on. And you can expect better results comparing to tweets analysis because they usually include lots of misspelling.

We’ll use tweets for this example because it’s pretty easy to get them via Twitter API. We only need to create an app on https://dev.twitter.com (My apps menu) and find an API Key, API secret, Access Token and Access Token Secret on Keys and Access Tokens menu tab.

Click through for more details, including code samples.

R Tools For Visual Studio

Matt Willis has a two-parter on R Tools for Visual Studio.  First, an introduction:

Once all the prerequisites have been installed it is time to move onto the fun stuff! Open up Visual Studio 2015 and add an R Project: File > Add > New Project and select R. You will be presented with the screen below, name the project AutomobileRegression and select OK.

Microsoft have done a fantastic job realising that the settings and toolbar required in R is very different to those required when using Visual Studio, so they have split them out and made it very easy to switch between the two. To switch to the settings designed for using R go to R Tools > Data Science Settings you’ll be presented with two pop ups select Yes on both to proceed. This will now allow you to use all those nifty shortcuts you have learnt to use in RStudio. Anytime you want to go back to the original settings you can do so by going to Tools > Import/Export Settings.

Next is executing an Azure Machine Learning web service within RTVS:

Whilst in R you can implement very complex Machine Learning algorithms, for anyone new to Machine Learning I personally believe Azure Machine Learning is a more suitable tool for being introduced to the concepts.

Please refer to this blog where I have described how to create the Azure Machine Learning web service I will be using in the next section of this blog. You can either use your own web service or follow my other blog, which has been especially written to allow you to follow along with this blog.

Coming back to RTVS we want to execute the web service we have created.

RTVS has grown on me.  It’s still not R Studio and may never be, but they’ve come a long way in a few months.


Simon Jackson introduces pipelearner, a tool to help with creating machine learning pipelines:

This post will demonstrate some examples of what pipeleaner can currently do. For example, the Figure below plots the results of a model fitted to 10% to 100% (in 10% increments) of training data in 50 cross-validation pairs. Fitting all of these models takes about four lines of code in pipelearner.

Click through for some very interesting examples.

Monitoring Car Data With Spark And Kafka

Carol McDonald builds a model to determine where Uber cars are clustered:

Uber trip data is published to a MapR Streams topic using the Kafka API. A Spark streaming application, subscribed to the topic, enriches the data with the cluster Id corresponding to the location using a k-means model, and publishes the results in JSON format to another topic. A Spark streaming application subscribed to the second topic analyzes the JSON messages in real time.

This is a fairly detailed post, well worth the read.

Understanding Naive Bayes

Ahmet Taspinar explains the Naive Bayes classificiation algorithm and writes Python code to implement it:

Within Machine Learning many tasks are – or can be reformulated as – classification tasks.

In classification tasks we are trying to produce a model which can give the correlation between the input data $X$ and the class $C$ each input belongs to. This model is formed with the feature-values of the input-data. For example, the dataset contains datapoints belonging to the classes ApplesPears and Oranges and based on the features of the datapoints (weight, color, size etc) we are trying to predict the class.

Ahmet has his entire post saved as a Jupyter notebook.

Calling Cognitive Services With R

David Smith has written a go-to guide for connecting to Azure Cognitive Services using R:

There’s no official R package (yet!) for calling Cognitive Services APIs. But since every Cognitive Service API is just a standard REST API, we can use the httr package to call the API. Input and output is standard JSON, which we can create and extract using the jsonlite package.

(There’s also an independent R interface to the text APIs. And there are already Python SDKs for many of the services, including the Face API.)

This is also useful for other REST APIs for times when there isn’t already a pre-built package to do most of the translation work for you.

Machine Learning With R Q&A

Ginger Grant answers a series of questions about R and machine learning:

Question: Is it possible to run R processes in diffrent boxes other than SQL Server itself for scalability reasons?

You have the option of installing the R Server on another server. Just keep in mind that you do have to account for the additional overhead of moving all the data over the network, which needs to weigh in on your decision to move processing to a different server.

Click through for plenty more questions and answers.

Using Spark MLlib For Categorization

Taras Matyashovskyy uses Apache Spark MLlib to categorize songs in different genres:

The roadmap for implementation was pretty straightforward:

  • Collect the raw data set of the lyrics (~65k sentences in total):

    • Black Sabbath, In Flames, Iron Maiden, Metallica, Moonspell, Nightwish, Sentenced, etc.
    • Abba, Ace of Base, Backstreet Boys, Britney Spears, Christina Aguilera, Madonna, etc.
  • Create training set, i.e. label (0 for metal | 1 for pop) + features (represented as double vectors)

  • Train logistic regression that is the obvious selection for the classification

This is a supervised learning problem, and is pretty fun to walk through.

Machine Learning Algorithms In R

Ginger Grant has a list of machine learning algorithms and their implementations in R:

Often times determining which algorithm to use can take a while.  Here is a pretty good flowchart for determining which algorithm should be used given some examples of what the desired outcomes and data contain. The diagram lists the algorithms, which are implemented in Azure ML.  The same algorithms can be implemented in R.  In R there are libraries to help with nearly every task.  Here’s a list of libraries and their accompanying links which can be used in Machine Learning.  This list is no means comprehensive as there are libraries and functions other than the ones listed here, but if you are trying to write a Machine Learning Experiment in R, and are looking at the flowchart, these R functions and Libraries will provide the tools to do the types of Machine Learning Analysis listed.

I think algorithm determination is one of the most difficult parts of machine learning.  Even if you don’t mean to go there, the garden of forking paths is dangerous.


February 2017
« Jan