Kaggle Data Science Report For 2017

Mark McDonald rounds up a few notebooks covering a recent Kaggle survey:

In 2017 we conducted our first ever extra-large, industry-wide survey to captured the state of data science and machine learning.

As the data science field booms, so has our community. In 2017 we hit a new milestone of reaching over 1M registered data scientists from almost every country in the world. Representing many different backgrounds, skill levels, and professions, we were excited to ask our community a wide range of questions about themselves, their skills, and their path to data science. We asked them everything from “what’s your yearly salary?” to “what’s your favorite data science podcasts?” to “what barriers are faced at work?”, letting us piece together key insights about the people and the trends behind the machine learning models.

Without further ado, we’d love to share everything with you. Over 16,000 responses surveys were submitted, with over 6 full months of aggregated time spent completing it (an average response time of more than 16 minutes).

Click through for a few reports.  Something interesting to me is that the top languages/tools were, in order, Python, R, and SQL.  For the particular market niche that Kaggle competitions fit, that makes a lot of sense:  I tend to like R more for data exploration and data cleansing, but much of that work is already done by the time you get the dataset.

Related Posts

The Intuition Behind Principal Component Analysis

Holger von Jouanne-Diedrich gives us an intuition behind how principal component analysis (PCA) works: Principal component analysis (PCA) is a dimension-reduction method that can be used to reduce a large set of (often correlated) variables into a smaller set of (uncorrelated) variables, called principal components, which still contain most of the information.PCA is a concept […]

Read More

Plotting Diagrams In R With nest() And map()

Sebastian Sauer shows how to display multiple ggplot2 diagrams together using facets as well as a combination of the nest() and map() functions: One simple way is to plot several facets according to the grouping variable: d %>% ggplot() + aes(x = hp, y = mpg) + geom_point() + facet_wrap(~ cyl) Faceting is great, but it’s good to know […]

Read More


October 2017
« Sep Nov »