Analyzing Clickstream Data With Markov Chains

Eleni Markou shows one method of analyzing clickstream data:

We chose to use the third-order Markov Chain on the above-produced data, as:

  • The number of parameters needed for the chain’s representation remains manageable. As the order increases, the parameters necessary for the representation increase exponentially and thus managing them requires significant computational power.
  • As a rule of thumb, we would like at least half of the clickstreams to consist of as many clicks as the order of the Markov Chain that should be fitted. There is no point in selecting a third-order chain if the majority of the clickstream consists of two states and so there is no state three steps behind to take into consideration.
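To put rough numbers on those two points, here is a minimal sketch. It assumes a full (unrestricted) higher-order chain, in which every combination of the previous pages gets its own row of transition probabilities; the function names and the toy clickstreams are made up for illustration and do not come from Eleni's post.

```python
def full_chain_parameters(n_states, order):
    """Free parameters in an unrestricted chain of the given order.

    There are n_states ** order possible histories, and each history needs
    n_states - 1 free probabilities (the last one is fixed because each
    row sums to 1).
    """
    return n_states ** order * (n_states - 1)


def share_long_enough(clickstreams, order):
    """Fraction of clickstreams with at least `order` clicks."""
    return sum(len(s) >= order for s in clickstreams) / len(clickstreams)


# Hypothetical data: four short sessions over a handful of pages.
toy_streams = [
    ["home", "search"],
    ["home", "search", "product"],
    ["home"],
    ["home", "search", "product", "cart"],
]

for order in (1, 2, 3):
    print(order, full_chain_parameters(10, order), share_long_enough(toy_streams, order))
```

With ten pages, the count goes from 90 to 900 to 9,000 free parameters as the order goes from one to three, which is the exponential growth the first bullet warns about. The second function is one way to check the rule of thumb before picking an order.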

Fitting the Markov Chain model gives us transition probabilities matrices and the lambda parameters of the chain for each one of the three lags, along with the start and end probabilities.
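One common way to read "a transition matrix and a lambda per lag" is as a weighted mixture: the next-page distribution is the lambda-weighted sum of the lag-k transition rows for the page seen k clicks ago. The sketch below illustrates that reading only. It estimates each lag matrix from simple row-normalized counts and takes the lambdas as given, whereas a real fit estimates them from the data; the helper names, the toy clickstreams, and the lambda values are all hypothetical.

```python
from collections import Counter, defaultdict


def lag_transition_probs(clickstreams, lag):
    """Row-normalized counts of moving from a page to the page `lag` clicks later."""
    counts = defaultdict(Counter)
    for stream in clickstreams:
        for earlier, later in zip(stream, stream[lag:]):
            counts[earlier][later] += 1
    return {
        page: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for page, row in counts.items()
    }


def next_page_distribution(history, lag_probs, lambdas):
    """Mix the lag-k rows for the last k pages, weighting lag k by lambdas[k - 1]."""
    mixed = Counter()
    for k, (probs, lam) in enumerate(zip(lag_probs, lambdas), start=1):
        if k > len(history):
            break  # no page that far back in this history
        for nxt, p in probs.get(history[-k], {}).items():
            mixed[nxt] += lam * p
    return dict(mixed)


# Hypothetical clickstreams and lambda weights, for illustration only.
streams = [
    ["home", "search", "product", "cart"],
    ["home", "product", "cart"],
    ["search", "product", "home"],
]
lag_probs = [lag_transition_probs(streams, k) for k in (1, 2, 3)]
lambdas = [0.6, 0.3, 0.1]  # assumed weights; they sum to 1

dist = next_page_distribution(["home", "search", "product"], lag_probs, lambdas)
print(dist)
```

For a user who has gone from home to search to product, this toy setup puts roughly 65% on cart and 35% on home as the next page. The start and end probabilities mentioned above are left out to keep the sketch short.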

This analysis tries to predict which page (if any) a user will visit next from a given page. Eleni also uses additional techniques like k-means clustering to segment out particular groups of users. It is a very interesting analysis.


