Data Science At Stack Overflow

David Robinson discusses his role as a data scientist at Stack Overflow:

The most prominent example of where machine learning is used in our product is Providence, our system for matching users to jobs they’ll be interested in. (For example, if you visit mostly Python and JavaScript questions on Stack Overflow, you’ll end up getting Python web development jobs as advertisements). I work with engineers on the Data team (Kevin Montrose, Jason Punyon, and Nick Larsen) to design, improve and implement these machine learning algorithms. (Here’s some more about the architecture of the system, built before I joined). For example, we’ve worked to get the balance right between jobs that are close to a user geographically and jobs that are well-matched in terms of technology, and to ensure that users get a variety of jobs rather than seeing the same ones over and over.

A lot of this process involves designing and analyzing A/B tests, particularly about changing our targeting algorithms, ad design, and other factors to improve clickthrough rate (CTR). This process is more statistically interesting than I’d expected, in some cases letting me find new uses for methods I’d used to analyze biological experiments, and in other cases encouraging me to learn new statistical tools. In fact, much of my series on applying Bayesian methods to baseball batting statistics is actually a thinly-veiled version of methods I’ve used to analyze CTR across ad campaigns.
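To give a feel for what a Bayesian analysis of CTR in an A/B test can look like, here is a minimal Python sketch. It is not Stack Overflow's method or code: the Beta prior, the click and impression counts, and the function names (`posterior`, `posterior_mean`, `prob_b_beats_a`) are all assumptions made up for illustration. It treats each variant's CTR as a Beta-distributed quantity, updates it with observed clicks, and estimates by simulation how likely one variant is to beat the other.

```python
import random


def posterior(clicks, impressions, prior_alpha=1.0, prior_beta=1.0):
    """Return Beta posterior parameters for a CTR.

    A Beta(prior_alpha, prior_beta) prior combined with binomial click
    data gives a Beta(prior_alpha + clicks, prior_beta + misses) posterior.
    """
    misses = impressions - clicks
    return prior_alpha + clicks, prior_beta + misses


def posterior_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution: the shrunken CTR estimate."""
    return alpha / (alpha + beta)


def prob_b_beats_a(a, b, draws=20000, seed=0):
    """Monte Carlo estimate of P(CTR_B > CTR_A) from two Beta posteriors."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    wins = sum(rng.betavariate(*b) > rng.betavariate(*a) for _ in range(draws))
    return wins / draws


# Hypothetical data: a prior centered on a 1% CTR, then two ad variants.
variant_a = posterior(clicks=120, impressions=10000, prior_alpha=2, prior_beta=198)
variant_b = posterior(clicks=150, impressions=10000, prior_alpha=2, prior_beta=198)

print("Estimated CTR A:", posterior_mean(*variant_a))
print("Estimated CTR B:", posterior_mean(*variant_b))
print("P(B beats A):", prob_b_beats_a(variant_a, variant_b))
```

The prior pulls estimates for low-traffic campaigns toward the overall average, which is the same idea as shrinking a batting average for a player with few at-bats.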

Sounds like a fun place to be.
