We chose to fit a third-order Markov chain to the data produced above, because:
- The number of parameters needed to represent the chain remains manageable. As the order increases, the number of parameters grows exponentially, and estimating and storing them requires significant computational power.
- As a rule of thumb, we would like at least half of the clickstreams to contain at least as many clicks as the order of the Markov chain being fitted. There is no point in selecting a third-order chain if the majority of the clickstreams consist of only two states, because then there is no state three steps back to take into consideration.
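Both criteria can be checked with a few lines of code. The sketch below is illustrative only: the clickstreams and the helper names (`full_chain_parameters`, `share_long_enough`, `max_supported_order`) are hypothetical, and the parameter count assumes an unrestricted chain over k states, which has k^n · (k − 1) free parameters at order n.

```python
def full_chain_parameters(n_states: int, order: int) -> int:
    """Free parameters of an unrestricted chain of the given order.

    Each of the n_states ** order possible histories needs a probability
    distribution over n_states next states (n_states - 1 free values).
    """
    return n_states ** order * (n_states - 1)


def share_long_enough(clickstreams, order: int) -> float:
    """Fraction of clickstreams that contain at least `order` clicks."""
    return sum(len(s) >= order for s in clickstreams) / len(clickstreams)


def max_supported_order(clickstreams, min_share: float = 0.5) -> int:
    """Highest order for which at least `min_share` of the clickstreams
    are long enough (the rule of thumb above, with min_share = 0.5)."""
    order = 0
    while share_long_enough(clickstreams, order + 1) >= min_share:
        order += 1
    return order


# Hypothetical sessions, not the data used in the analysis.
clickstreams = [
    ["home", "product", "cart", "checkout"],
    ["home", "search", "product"],
    ["home", "product"],
    ["search", "product", "cart"],
    ["home"],
]

for order in (1, 2, 3):
    print(order, full_chain_parameters(5, order), share_long_enough(clickstreams, order))
print("highest supported order:", max_supported_order(clickstreams))
```

With five states the unrestricted parameter count grows from 20 to 100 to 500 as the order goes from one to three, which illustrates why the order is kept low.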
Fitting the Markov chain model gives us one transition probability matrix and one lambda parameter for each of the three lags, along with the start and end probabilities.
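To make these outputs concrete, here is a minimal sketch of how per-lag transition probabilities and lambda weights could be combined into a next-page prediction. It is not the fitting routine used in the analysis: the transition probabilities are estimated by simple counting, the lambda values are hypothetical placeholders rather than fitted parameters, the sessions are invented, and start and end probabilities are left out.

```python
from collections import defaultdict


def lag_transition_probs(clickstreams, lag):
    """Estimate P(next page = b | page `lag` steps back = a) by counting."""
    counts = defaultdict(lambda: defaultdict(int))
    for stream in clickstreams:
        for t in range(lag, len(stream)):
            counts[stream[t - lag]][stream[t]] += 1
    probs = {}
    for a, row in counts.items():
        total = sum(row.values())
        probs[a] = {b: c / total for b, c in row.items()}
    return probs


def predict_next(history, lag_probs, lambdas):
    """Weight each lag's distribution by its lambda and add them up.

    lag_probs[i] holds the probabilities for lag i + 1; `lambdas` are
    assumed to be non-negative and to sum to 1.
    """
    scores = defaultdict(float)
    for lag, weight in enumerate(lambdas, start=1):
        if lag > len(history):
            break  # the history is too short to look this far back
        prev = history[-lag]
        for state, p in lag_probs[lag - 1].get(prev, {}).items():
            scores[state] += weight * p
    return dict(scores)


# Hypothetical sessions and lambda weights, for illustration only.
clickstreams = [
    ["home", "product", "cart", "checkout"],
    ["home", "product", "cart", "checkout"],
    ["home", "search", "product", "cart"],
]
lag_probs = [lag_transition_probs(clickstreams, lag) for lag in (1, 2, 3)]
lambdas = [0.6, 0.3, 0.1]

prediction = predict_next(["home", "product", "cart"], lag_probs, lambdas)
print(max(prediction, key=prediction.get), prediction)
```

In this sketch the most recent page carries the largest weight, but in a fitted model the lambdas are estimated from the data and need not decrease with the lag.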
This particular analysis tries to understand which page (if any) a user will go to next from the page they are currently on. Eleni also uses additional techniques, such as k-means clustering, to segment out particular groups of users. It is a very interesting analysis.