Press "Enter" to skip to content

Category: Data Science

Comparing Gradient Descent to the Normal Equation for Small Data Sets

Pushkara Sharma compares two techniques for regression:

In this article, we will see the actual difference between gradient descent and the normal equation in a practical approach. Most of the newbie machine learning enthusiasts learn about gradient descent during the linear regression and move further without even knowing about the most underestimated Normal Equation that is far less complex and provides very good results for small to medium size datasets.

If you are new to machine learning, or not familiar with a normal equation or gradient descent, don’t worry I’ll try my best to explain these in layman’s terms. So, I will start by explaining a little about the regression problem.

I was surprised by the results.

Comments closed

Understanding the Bayesian Nature of Kalman Filters

Holger von Jouanne-Diedrich gives us an interesting interpretation of Kalman filters:

The Kalman filter is a very powerful algorithm to optimally include uncertain information from a dynamically changing system to come up with the best educated guess about the current state of the system. Applications include (car) navigation and stock forecasting. If you want to understand how a Kalman filter works and build a toy example in R, read on!

The following post is based on the post “Das Kalman-Filter einfach erklärt” which is written in German and uses Matlab code (so basically two languages nobody is interested in any more 😉 ). This post is itself based on an online course “Artificial Intelligence for Robotics” by my colleague Professor Sebastian Thrun of Standford University.

In fairness, I regret only one thing about learning German: that I’ve forgotten so much over the years.

Comments closed

The Basics of Autoregressive Models

Holger von Jouanne-Diedrich explains some of the principels of autoregressive models through a demonstration:

Well, this seems to be good news for the sales team: rising sales! Yet, how does this model arrive at those numbers? To understand what is going on we will now rebuild the model. Basically, everything is in the name already: auto-regressive, i.e. a (linear) regression on (a delayed copy of) itself (auto from Ancient Greek self)!

So, what we are going to do is create a delayed copy of the time series and run a linear regression on it. We will use the lm() function from base R for that (see also Learning Data Science: Modelling Basics).

Read on for some additional understanding.

Comments closed

Using INLA for Spatial Regression in R

Lionel Hertzog continues a series on spatial regression:

INLA is a package that allows to fit a broad range of model, it uses Laplace approximation to fit Bayesian models much, much faster than algorithms such as MCMC. INLA allows for fitting geostatistical models via stochastic partial differential equation (SPDE), a good place for more background informations on this are these two gitbooks: spde-gitbook and inla-gitbook.

This is not the gentlest introduction, so if you’re new to the concept go back and read part 1.

Comments closed

The Basics of A/B Testing with R

Holger von Jouanne-Diedrich walks us through a simple example of A/B testing and analysis using R:

The bad news is, that you have to understand a little bit about statistical hypothesis testing, the good news is that if you read the following post, you have everything you need (plus, as an added bonus R has all the tools you need already at hand!): From Coin Tosses to p-Hacking: Make Statistics Significant Again! (ok, reading it would make it over one minute…).

Check out that article and the example in the blog post as well. R makes it really easy to perform this sort of analysis.

Comments closed

Using Random Cut Forests for Anomaly Detection

Chris Swierczewski and Lai Jiang have an example of using Random Cut Forests to perform anomaly detection against a dataset stored in Amazon Elasticsearch Service:

Based on these constraints and performance results from internal and publicly available benchmarks across many data domains, we chose the RCF algorithm for computing anomaly scores in data streams.

But this begs the question: How large of an anomaly score is large enough to declare the corresponding data point as an anomaly? The anomaly detector uses a thresholding model to answer this question. This thresholding model combines information from the anomaly scores observed thus far and certain mathematical properties of RCFs. This hybrid information approach allows the model to make anomaly predictions with a low false positive rate when relatively little data has been observed, and effectively adapts to the data in the long run. The model constructs an efficient sketch of the anomaly score distribution using the KLL Quantile Sketch algorithm. For more information, see Optimal Quantile Approximation in Streams.

The linked post is more of an explanation of process than a tutorial, but it’s interesting in seeing how different approaches can find anomalies at different rates.

Comments closed

Portfolio Optimization with SAS and Python

Sophia Rowland shows off the sastopypackage:

I started by declaring my parameters and sets, including my risk threshold, my stock portfolio, the expected return of my stock portfolio, and covariance matrix estimated using the shrinkage estimator of Ledoit and Wolf(2003). I will use these pieces of information in my objective function and constraints. Now I will need SWAT, sasoptpy, and my optimization model object.

Read on for a demo.

Comments closed

Avoiding Overfitting and Underfitting in Neural Networks

Manas Narkar provides some advice on optimizing neural network models:

Adding Dropout

Dropout is considered as one of the most effective regularization methods. Dropout is basically randomly zero-ing or dropping out features from your layer during the training process, or introducing some noise in the samples. The key thing to note is that this is only applied at training time. At test time, no values are dropped out. Instead, they are scaled. The typical dropout rate is between 0.2 to 0.5.

Click through for a demo on dropout, as well as coverage of several other techniques.

Comments closed

Fun with Randomness

Holger von Jouanne-Diedrich takes us through some random thoughts:

Many a student thinks that the first pic was created by some underlying pattern (because of its points clumping together in some areas while leaving others empty) and that the second one is “more” random. The truth is that technically both are not random (but only pseudo-random) but the first resembles “true” randomness more closely while the second is a low-discrepancy sequence.

While coming to the point of pseudo-randomness in a moment “true” randomness may appear to have a tendency to occur in clusters or clumps (technically called Poisson clumping). This is the effect seen (or shall I say heard) in the iPod shuffling function. Apple changed it to a more regular behaviour (in the spirit of the second picture)… which was then perceived to be more random (as with my students)!

Read the whole thing.

Comments closed