Split and compare quantiles
This parameter is the easiest to sell to the C-level guys. “Did you know that with this model, if we chop the worst 20% of leads we would have avoided 60% of the frauds and only lose 8% of our sales?” That’s what this plot will give you.
The math behind the plot might be a bit foggy for some readers so let me try and explain further: if you sort from the lowest to the highest score all your observations / people / leads, then you can literally, for instance, select the top 5 or bottom 15% or so. What we do now is split all those “ranked” rows into similar-sized-buckets to get the best bucket, the second best one, and so on. Then, if you split all the “Goods” and the “Bads” into two columns, keeping their buckets’ colours, we still have it sorted and separated, right? To conclude, if you’d say that the worst 20% cases (all from the same worst colour and bucket) were to take an action, then how many of each label would that represent on your test set? There you go!
Read on to see what else he uses and how you can build it yourself.
Our example in the video is a simple Keras network, modified from Keras Model Examples, that creates a simple multi-layer binary classification model with a couple of hidden and dropout layers and respective activation functions. Binary classification is a common machine learning task applied widely to classify images or text into two classes. For example, an image is a cat or dog; or a tweet is positive or negative in sentiment; and whether mail is spam or not spam.
But the point here is not so much to demonstrate a complex neural network model as to show the ease with which you can develop with Keras and TensorFlow, log an MLflow run, and experiment—all within PyCharm on your laptop.
Click through for the video and explanation of the process.
As people, we’re used to being able to classify anything we see in the world around us, and we naturally expect machines to have the same ability. Most models are only trained to recognize a very limited set of objects though, such as the 1,000 categories of the original ImageNet competition. Crucially, the training process makes the assumption that every example the model sees is one of those objects, and the prediction must be within that set. There’s no option for the model to say “I don’t know”, and there’s no training data to help it learn that response. This is a simplification that makes sense within a research setting, but causes problems when we try to use the resulting models in the real world.
Back when I was at Jetpac, we had a lot of trouble convincing people that the ground-breaking AlexNet model was a big leap forward because every time we handed over a demo phone running the network, they would point it at their faces and it would predict something like “Oxygen mask” or “Seat belt”. This was because the ImageNet competition categories didn’t include any labels for people, but most of the photos with mask and seatbelt labels included faces along with the objects. Another embarrassing mistake came when they would point it at a plate and it would predict “Toilet seat”! This was because there were no plates in the original categories, and the closest white circular object in appearance was a toilet.
Read the whole thing.
As we have also seen in the previous blog posts, our Neural Network consists of a
tf.Graph()contains all of the computational steps required for the Neural Network, and the
tf.Sessionis used to execute these steps.
The computational steps defined in the
tf.Graphcan be divided into four main parts;
We initialize placeholders which are filled with batches of training data during the run.
We define the RNN model and to calculate the output values (logits)
The logits are used to calculate a loss value, which then
is used in an Optimizer to optimize the weights of the RNN.
As a lazy casual, I’ll probably stick with letting Keras do most of the heavy lifting.
F# is a scripting as well as a REPL language. REPL comes from Read-Eval-Print Loop, which means that the language processes single steps one at a time like reading the user inputs (usually expressions), evaluating their values and, in the end, returning the result to the same user. All that happens in a loop until the loop ends. Visual Studio provides a great F# Interactive view that runs the scripts in REPL mode and shows the results. Take the following Hello World example:
let hello = “Hello World”
This code just creates a single variable (let keyword) and assigns a string value to it. When you run this code (select all the code text and press Alt + Enter), you’ll see the following result in the F# Interactive window (Figure 5):
You can also use C# with Accord.NET, but there’s a strong bias toward F# among people in the .NET space who work with ML, for the same reason that there’s a bias toward Scala over Java for Spark developers: the functional programming paradigm works extremely well with mathematical concepts. Also, in addition to Accord.NET, you might also want to check out Math.NET. My experience has been that this package tends to be a bit faster than Accord.
The segmentation of an image into superpixels are an important step in generating explanations for image models. It is both important that the segmentation is correct and follows meaningful patterns in the picture, but also that the size/number of superpixels are appropriate. If the important features in the image are chopped into too many segments the permutations will probably damage the picture beyond recognition in almost all cases leading to a poor or failing explanation model. As the size of the object of interest is varying it is impossible to set up hard rules for the number of superpixels to segment into – the larger the object is relative to the size of the image, the fewer superpixels should be generated. Using plot_superpixels it is possible to evaluate the superpixel parameters before starting the time-consuming explanation function.
Fun stuff. I’m glad that there’s a lot of work going into explaining neural networks rather than hand-waving them off as magic.
Topic Modeling is used to organize a corpus of documents into “topics” which is a grouping based on a statistical distribution of words within the documents themselves. Amazon Comprehend, our fully managed text analytics service, provides a pre-configured topic modeling API that is best suited for the most popular use cases like organizing customer feedback, support incidents or workgroup documents. Amazon Comprehend is the suggested topic modeling choice for customers as it removes a lot of the most routine steps associated with topic modeling like tokenization, training a model and adjusting parameters. Amazon SageMaker’s Neural Topic Model (NTM) caters to the use cases where a finer control of the training, optimization, and/or hosting of a topic model is required, such as training models on text corpus of particular writing style or domain, or hosting topic models as part of a web application. While Amazon SageMaker NTM provides a starting point of state-of-the-art topic modeling, customers have the flexibility to modify the network architecture as well as hyperparameters to accommodate the idiosyncrasies of their data sets as well as to tune the trade-off between a multitude of metrics such as document modeling accuracy, human interpretability and granularity of the learned topics, based on their applications. In addition, Amazon SageMaker NTM leverages the full power of the Amazon SageMaker platform: easily configurable training and hosting infrastructure, automatic hyperparameter optimization, and fully-managed hosting with auto-scaling.
They walk through the entire topic modeling process, so check it out.
In this tutorial, we will train a machine learning model for predicting numbers in pictures. Our goal is not to design a world-class complex model (although we will give you the source code to implement first-rate predictive models later). Rather, this tutorial is to introduce how to use TensorFlow. So, we start here with a very simple mathematical model called Softmax Regression.
The implementation code for this tutorial is short, and the really interesting content is only contained in three lines of code. However, it is very important to understand the design ideas contained in these codes: the basic concepts of TensorFlow workflow and machine learning. Therefore, this tutorial will explain in detail the implementation of these codes.
This is about as easy as it gets with neural networks, but easy doesn’t mean wrong.
A summary of the paper is:
We present a very simple, informal mathematical argument that neural networks (NNs) are in essence polynomial regression (PR). We refer to this as NNAEPR.
NNAEPR implies that we can use our knowledge of the “old-fashioned” method of PR to gain insight into how NNs — widely viewed somewhat warily as a “black box” — work inside.
One such insight is that the outputs of an NN layer will be prone to multicollinearity, with the problem becoming worse with each successive layer. This in turn may explain why convergence issues often develop in NNs. It also suggests that NN users tend to use overly large networks.
NNAEPR suggests that one may abandon using NNs altogether, and simply use PR instead.
We investigated this on a wide variety of datasets, and found that in every case PR did as well as, and often better than, NNs.
We have developed a feature-rich R package, polyreg, to facilitate using PR in multivariate settings.
The paper and presentation slides are ungated, so check it out. H/T R-bloggers
I’ve heard about a number of consulting companies, that decided to use simple linear model instead of a black box model with higher performance, because ,,client wants to understand factors that drive the prediction’’.
And usually the discussion goes as following: ,,We have tried LIME for our black-box model, it is great, but it is not working in our case’’, ,,Have you tried other explainers?’’, ,,What other explainers’’?
So here you have a map of different visual explanations for black-box models.