Using word2vec To Model User Behavior

Nishan Subedi walks us through an Etsy project to model user journeys via semantic embedding techniques:

We initially started training the embeddings as a Skip-gram model with negative sampling (NEG as outlined in the original word2vec paper) method. The Skip-gram model performs better than the Continuous Bag Of Words (CBOW) model for larger vocabularies. It models the context given a target token and attempts to maximize the average likelihood of seeing any of the context tokens given a target token. The negative sampling draws a negative token from the entire corpus with a frequency that is directly proportional to the frequency of the token appearing in the corpus.

Training a Skip-gram model on only randomly selected negatives, however, ignores implicit contextual signals that we have found to be indicative of user preference in other contexts. For example, if a user clicks on the second item for a search query, the user most likely saw, but did not like, the first item that showed up in the search results. We extend the Skip-gram loss function by appending these implicit negative signals to the Skip-gram loss directly.

Similarly, we consider the purchased item in a particular session to be a global contextual token that applies to the entire sequence of user interactions. The intuition behind this is that there are many touch points on the user’s journey that help them come to the final purchase decision, and so we want to share the purchase intent across all the different actions that they took. This is also referred to as the linear multi-touch attribution model.

This is a very interesting article, and their attempt at getting around the problem of unexpected explosive growth in demand.

Related Posts

Dealing With Multicollinearity With R

Chaitanya Sagar explains the concept of multicollinearity in linear regressions and how we can mitigate this issue in R: Perfect multicollinearity occurs when one independent variable is an exact linear combination of other variables. For example, you already have X and Y as independent variables and you add another variable, Z = a*X + b*Y, […]

Read More

Principal Component Analysis With Faces

Mic at The Beginner Programmer shows us how to creepy PCA diagrams with human faces: PCA looks for a new the reference system to describe your data. This new reference system is designed in such a way to maximize the variance of the data across the new axis. The first principal component accounts for as […]

Read More

Categories

July 2018
MTWTFSS
« Jun Aug »
 1
2345678
9101112131415
16171819202122
23242526272829
3031