
Category: Machine Learning

When Image Classifiers Look At Unknown Objects

Pete Warden explains that image classifiers aren’t magic:

As people, we’re used to being able to classify anything we see in the world around us, and we naturally expect machines to have the same ability. Most models are only trained to recognize a very limited set of objects though, such as the 1,000 categories of the original ImageNet competition. Crucially, the training process makes the assumption that every example the model sees is one of those objects, and the prediction must be within that set. There’s no option for the model to say “I don’t know”, and there’s no training data to help it learn that response. This is a simplification that makes sense within a research setting, but causes problems when we try to use the resulting models in the real world.

Back when I was at Jetpac, we had a lot of trouble convincing people that the ground-breaking AlexNet model was a big leap forward because every time we handed over a demo phone running the network, they would point it at their faces and it would predict something like “Oxygen mask” or “Seat belt”. This was because the ImageNet competition categories didn’t include any labels for people, but most of the photos with mask and seatbelt labels included faces along with the objects. Another embarrassing mistake came when they would point it at a plate and it would predict “Toilet seat”! This was because there were no plates in the original categories, and the closest white circular object in appearance was a toilet.
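
The "no option to say I don't know" point can be illustrated with a minimal Python sketch (not from Warden's post). A softmax output layer spreads all of its probability over the known labels, so some label always wins. The labels, logits, and the `threshold` parameter here are hypothetical, and rejecting low-confidence predictions is only a crude heuristic, not a fix the post proposes.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that always sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits, labels, threshold=None):
    """Return (label, probability); label is "unknown" if a threshold is set and not met."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if threshold is not None and probs[best] < threshold:
        return "unknown", probs[best]
    return labels[best], probs[best]

# Hypothetical labels and logits for a photo of a dinner plate,
# a class this imaginary model was never trained on.
labels = ["toilet seat", "oxygen mask", "seat belt"]
plate_logits = [1.2, 0.9, 0.8]

print(classify(plate_logits, labels))                 # a known label always wins
print(classify(plate_logits, labels, threshold=0.8))  # falls back to "unknown"
```

Networks can also be confidently wrong on inputs unlike their training data, so a threshold like this reduces embarrassing answers without eliminating them.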

Read the whole thing.


Building Recurrent Neural Networks Using TensorFlow

Ahmet Taspinar walks us through creating a recurrent neural network topology using TensorFlow:

As we have also seen in the previous blog posts, our Neural Network consists of a tf.Graph() and a tf.Session(). The tf.Graph() contains all of the computational steps required for the Neural Network, and the tf.Session is used to execute these steps.

The computational steps defined in the tf.Graph can be divided into four main parts:

  1. We initialize placeholders which are filled with batches of training data during the run.

  2. We define the RNN model and use it to calculate the output values (logits).

  3. The logits are used to calculate a loss value.

  4. The loss value is used in an Optimizer to optimize the weights of the RNN.
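
The first three steps can be sketched without TensorFlow at all. What follows is a rough plain-Python illustration, not Taspinar's code: a vanilla RNN with made-up sizes and random weights, a hand-written batch standing in for what a placeholder would be fed, and a softmax cross-entropy loss. All names (`rnn_logits`, `cross_entropy`, the size constants) are assumptions for illustration.

```python
import math
import random

random.seed(0)

INPUT_SIZE, HIDDEN_SIZE, NUM_CLASSES = 3, 4, 2

def rand_matrix(rows, cols):
    """Small random weights, as a stand-in for a framework initializer."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

# Step 2 weights: input-to-hidden, hidden-to-hidden, hidden-to-output.
W_xh = rand_matrix(HIDDEN_SIZE, INPUT_SIZE)
W_hh = rand_matrix(HIDDEN_SIZE, HIDDEN_SIZE)
W_hy = rand_matrix(NUM_CLASSES, HIDDEN_SIZE)

def rnn_logits(sequence):
    """Step 2: run a vanilla RNN over one sequence; logits come from the last hidden state."""
    h = [0.0] * HIDDEN_SIZE
    for x in sequence:
        pre = [a + b for a, b in zip(matvec(W_xh, x), matvec(W_hh, h))]
        h = [math.tanh(p) for p in pre]
    return matvec(W_hy, h)

def cross_entropy(logits, target):
    """Step 3: softmax cross-entropy loss for one example."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

# Step 1: a tiny batch of (sequence, label) pairs, standing in for placeholder input.
batch = [
    ([[0.1, 0.2, 0.3], [0.0, 0.5, 0.1]], 0),
    ([[0.9, 0.1, 0.4], [0.3, 0.3, 0.3]], 1),
]

losses = [cross_entropy(rnn_logits(seq), label) for seq, label in batch]
mean_loss = sum(losses) / len(losses)
print(f"mean loss: {mean_loss:.4f}")
# Step 4 (not shown): an optimizer would use gradients of mean_loss to update the weights.
```

Step 4 is deliberately left out: computing those gradients by hand is exactly the work that TensorFlow's optimizers automate.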

As a lazy casual, I’ll probably stick with letting Keras do most of the heavy lifting.


Machine Learning With F#

Diogo Souza gives us an introduction to using Accord.NET in F#:

F# is a scripting as well as a REPL language. REPL comes from Read-Eval-Print Loop, which means that the language processes single steps one at a time like reading the user inputs (usually expressions), evaluating their values and, in the end, returning the result to the same user. All that happens in a loop until the loop ends. Visual Studio provides a great F# Interactive view that runs the scripts in REPL mode and shows the results. Take the following Hello World example:

This code just creates a single variable (let keyword) and assigns a string value to it. When you run this code (select all the code text and press Alt + Enter), you’ll see the following result in the F# Interactive window (Figure 5):

You can also use C# with Accord.NET, but there’s a strong bias toward F# among people in the .NET space who work with ML, for the same reason that there’s a bias toward Scala over Java for Spark developers: the functional programming paradigm works extremely well with mathematical concepts. In addition to Accord.NET, you might want to check out Math.NET. My experience has been that Math.NET tends to be a bit faster than Accord.NET.


Using LIME To Explain Keras Models

Shirin Glander shows us how to use the LIME package to explain image recognition models built from Keras:

The segmentation of an image into superpixels is an important step in generating explanations for image models. It is important both that the segmentation is correct and follows meaningful patterns in the picture, and that the size/number of superpixels is appropriate. If the important features in the image are chopped into too many segments, the permutations will probably damage the picture beyond recognition in almost all cases, leading to a poor or failing explanation model. Because the size of the object of interest varies, it is impossible to set up hard rules for the number of superpixels to segment into – the larger the object is relative to the size of the image, the fewer superpixels should be generated. Using plot_superpixels it is possible to evaluate the superpixel parameters before starting the time-consuming explanation function.
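
To show why the superpixel count matters, here is a toy Python sketch of the perturbation idea. It is not the lime package's implementation: a real explainer segments the image with a proper segmentation algorithm, whereas the hypothetical `grid_superpixels` helper just cuts the image into rectangular blocks, and the "image" is a small grid of ones.

```python
import random

def grid_superpixels(height, width, n_rows, n_cols):
    """Assign each pixel to a rectangular block; returns a 2D list of segment ids."""
    return [
        [(r * n_rows // height) * n_cols + (c * n_cols // width) for c in range(width)]
        for r in range(height)
    ]

def perturb(image, segments, keep, fill=0):
    """Blank out every pixel whose segment id is not in `keep`."""
    return [
        [px if seg in keep else fill for px, seg in zip(img_row, seg_row)]
        for img_row, seg_row in zip(image, segments)
    ]

random.seed(1)
height, width = 6, 6
image = [[1] * width for _ in range(height)]  # toy all-ones "image"

for n_rows, n_cols in [(2, 2), (3, 3)]:
    segments = grid_superpixels(height, width, n_rows, n_cols)
    n_segments = n_rows * n_cols
    # Keep each superpixel with probability 0.5, as one random perturbation.
    keep = {s for s in range(n_segments) if random.random() < 0.5}
    sample = perturb(image, segments, keep)
    visible = sum(map(sum, sample))
    print(f"{n_segments} superpixels, kept {len(keep)}, visible pixels: {visible}")
```

With more and smaller superpixels, each perturbed sample hides the object in finer fragments, which is the failure mode the excerpt warns about.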

Fun stuff.  I’m glad that there’s a lot of work going into explaining neural networks rather than hand-waving them off as magic.


Neural Topic Models On Amazon SageMaker

David Ping, et al., show off topic modeling on Amazon SageMaker:

Topic Modeling is used to organize a corpus of documents into “topics” which is a grouping based on a statistical distribution of words within the documents themselves. Amazon Comprehend, our fully managed text analytics service, provides a pre-configured topic modeling API that is best suited for the most popular use cases like organizing customer feedback, support incidents or workgroup documents. Amazon Comprehend is the suggested topic modeling choice for customers as it removes a lot of the most routine steps associated with topic modeling like tokenization, training a model and adjusting parameters. Amazon SageMaker’s Neural Topic Model (NTM) caters to the use cases where a finer control of the training, optimization, and/or hosting of a topic model is required, such as training models on a text corpus of a particular writing style or domain, or hosting topic models as part of a web application. While Amazon SageMaker NTM provides a starting point of state-of-the-art topic modeling, customers have the flexibility to modify the network architecture as well as hyperparameters to accommodate the idiosyncrasies of their data sets as well as to tune the trade-off between a multitude of metrics such as document modeling accuracy, human interpretability and granularity of the learned topics, based on their applications. In addition, Amazon SageMaker NTM leverages the full power of the Amazon SageMaker platform: easily configurable training and hosting infrastructure, automatic hyperparameter optimization, and fully-managed hosting with auto-scaling.

They walk through the entire topic modeling process, so check it out.


Solving A Problem In TensorFlow Using SoftMax

Kiran Gutha gives us a fairly simple solution to the MNIST digit data set using the SoftMax algorithm:

In this tutorial, we will train a machine learning model for predicting numbers in pictures. Our goal is not to design a world-class complex model (although we will give you the source code to implement first-rate predictive models later). Rather, this tutorial is to introduce how to use TensorFlow. So, we start here with a very simple mathematical model called Softmax Regression.

The implementation code for this tutorial is short, and the really interesting content is contained in only three lines of code. However, it is very important to understand the design ideas contained in this code: the basic concepts of TensorFlow workflow and machine learning. Therefore, this tutorial will explain the implementation of this code in detail.
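
For readers who want to see the model itself rather than the TensorFlow workflow, here is a small plain-Python sketch of softmax regression trained with stochastic gradient descent. It is not Gutha's code and does not touch MNIST: the six-point dataset, the learning rate, and the epoch count are all made up so that the example runs instantly.

```python
import math

def softmax(z):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def predict_proba(W, b, x):
    """Softmax regression: probabilities = softmax(W x + b)."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)])

def train(data, n_features, n_classes, lr=0.5, epochs=200):
    """Fit weights with per-example gradient descent on the cross-entropy loss."""
    W = [[0.0] * n_features for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in data:
            p = predict_proba(W, b, x)
            for k in range(n_classes):
                # Gradient of cross-entropy w.r.t. logit k is (p_k - 1[y == k]).
                g = p[k] - (1.0 if k == y else 0.0)
                for j in range(n_features):
                    W[k][j] -= lr * g * x[j]
                b[k] -= lr * g
    return W, b

# Toy stand-in for MNIST: 2 features, 3 linearly separable classes.
data = [
    ([1.0, 0.0], 0), ([0.9, 0.1], 0),
    ([0.0, 1.0], 1), ([0.1, 0.9], 1),
    ([-1.0, -1.0], 2), ([-0.9, -0.8], 2),
]
W, b = train(data, n_features=2, n_classes=3)
predictions = [max(range(3), key=lambda k: predict_proba(W, b, x)[k]) for x, _ in data]
print(predictions)
```

For MNIST the same model would take 784 pixel features and 10 classes; only the data loading and the sizes change.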

This is about as easy as it gets with neural networks, but easy doesn’t mean wrong.


Neural Networks Are Polynomial Regression

Norman Matloff announces a new paper:

A summary of the paper is:

  • We present a very simple, informal mathematical argument that neural networks (NNs) are in essence polynomial regression (PR). We refer to this as NNAEPR.

  • NNAEPR implies that we can use our knowledge of the “old-fashioned” method of PR to gain insight into how NNs — widely viewed somewhat warily as a “black box” — work inside.

  • One such insight is that the outputs of an NN layer will be prone to multicollinearity, with the problem becoming worse with each successive layer. This in turn may explain why convergence issues often develop in NNs. It also suggests that NN users tend to use overly large networks.

  • NNAEPR suggests that one may abandon using NNs altogether, and simply use PR instead.

  • We investigated this on a wide variety of datasets, and found that in every case PR did as well as, and often better than, NNs.

  • We have developed a feature-rich R package, polyreg, to facilitate using PR in multivariate settings.
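
As a reminder of what the "old-fashioned" method looks like, here is a minimal single-variable polynomial regression in plain Python. It is only a sketch: it is not the polyreg package (which is written in R and handles multivariate settings), and the `polyfit` helper and the noiseless quadratic data are made up for illustration.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (X^T X) c = X^T y."""
    n = degree + 1
    # Build the normal equations as an augmented matrix [X^T X | X^T y].
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] +
         [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (A[i][n] - rest) / A[i][i]
    return coeffs  # coeffs[k] multiplies x**k

xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]  # noiseless quadratic: 1 + 2x + 3x^2
coeffs = polyfit(xs, ys, degree=2)
print([round(c, 6) for c in coeffs])
```

The fit recovers the generating coefficients up to floating-point error. The normal equations become ill-conditioned at higher degrees, which is a reason to use a dedicated package on real data.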

The paper and presentation slides are ungated, so check it out.  H/T R-bloggers


Using DALEX To Explain Black-Box Models

Przemyslaw Biecek explains that there’s more than LIME for explaining black-box models:

I’ve heard about a number of consulting companies that decided to use a simple linear model instead of a black-box model with higher performance, because “the client wants to understand the factors that drive the prediction”.
And usually the discussion goes as follows: “We have tried LIME for our black-box model, it is great, but it is not working in our case”, “Have you tried other explainers?”, “What other explainers?”

So here you have a map of different visual explanations for black-box models.

Check out DALEX, which includes a Jupyter notebook example.  H/T R-Bloggers


Comparing Keras In Python Versus R

Dmitry Kisler performs image classification using Keras in both Python and R:

From the plots above, one can see that:

  • the accuracy of your model doesn’t depend on the language you use to build and train it (the plot shows only train accuracy, but the model doesn’t have high variance and the bias accuracy is around 99% as well).

  • even though 10 measurements may not be convincing, Python would reduce (by up to 15%) the time required to train your CNN model. This is somewhat expected because R uses Python under the hood when it executes Keras functions.

This is just one example, but the results are about what I’d expect.


Auto-Encoders And KernelML

Rohan Kotwani gives us an example where KernelML might be better than TensorFlow or PyTorch:

So what’s the point of using KernelML?

1. The parameters in each layer can be non-linear
2. Each parameter can be sampled from a different random distribution
3. The parameters can be transformed to meet certain constraints
4. Network combinations are defined in terms of numpy operations
5. Parameters are probabilistically updated
6. Each parameter update samples the loss function around a local or global minimum

KernelML Specs

KernelML is a brute-force optimizer that can be used to train machine learning algorithms. The package uses a combination of machine learning and Monte Carlo simulations to optimize a parameter vector with a user-defined loss function. Using KernelML creates a high computational cost for large, complex networks because it samples the loss function using a subspace for each parameter in the parameter vector, which requires many random simulations. The computational cost was reduced by enabling parallel computations with ipyparallel. The decision to use this package was made because it effectively utilizes the cores on a machine.
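
The general idea of optimizing a parameter vector by sampling a user-defined loss function can be sketched in a few lines of Python. This is a generic random search, not KernelML's actual algorithm or API: the `random_search` function, its parameters, and the toy linear-fit loss are all assumptions for illustration.

```python
import random

def random_search(loss, start, n_iters=2000, scale=0.5, seed=0):
    """Repeatedly sample near the best-known parameters; keep any improvement."""
    rng = random.Random(seed)
    best = list(start)
    best_loss = loss(best)
    for _ in range(n_iters):
        candidate = [p + rng.gauss(0.0, scale) for p in best]
        cand_loss = loss(candidate)
        if cand_loss < best_loss:
            best, best_loss = candidate, cand_loss
    return best, best_loss

def squared_error(params):
    """User-defined loss: fit y = w * x + b to three points on y = 2x + 1."""
    w, b = params
    data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
    return sum((w * x + b - y) ** 2 for x, y in data)

start = [0.0, 0.0]
best, best_loss = random_search(squared_error, start)
print(best, best_loss)
```

Because nothing here needs gradients, the loss can be any Python function. The cost is the large number of evaluations, and each batch of candidate evaluations is independent, which is why this style of optimizer lends itself to parallel execution.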

It’s an interesting use case, though I would have liked to have seen a direct comparison to other frameworks.
