Data Science Languages

Alessandro Piva provides preliminary metrics on language usage among self-described data scientists:

Programming is one of the five main competence areas at the base of the skill set for a Data Scientist, even if is not the most relevant in term of expertise (see What is the right mix of competences for Data Scientists?). Considering the results of the survey, that involved more than 200 Data Scientist worldwide until today, there isn’t a prevailing choice among the programming languages used during the data science’s activities. However, the choice appears to be addressed mainly to a limited set of alternatives: almost 96% of respondents affirm to use at least one of R, SQL or Python.

These results don’t surprise me much.  R has slightly more traction than Python, but the percentage of people using both is likely to increase.  SQL, meanwhile, is vital for getting data, and as we’re seeing in the Hadoop space, as data platform products get more mature, they tend to gravitate toward a SQL or SQL-like language.  Cf. Hive, Spark SQL, Phoenix, etc.

Related Posts

Python versus R (Again)

Alex Woodie looks at whether Python is dominating R in the data science space: There is some evidence that Python’s popularity is hurting R usage. According to the TIOBE Index, Python is currently the third most popular language in the world, behind perennial heavyweights Java and C. From August 2018 to August 2019, Python usage surged […]

Read More

Z-Tests vs T-Tests

Stephanie Glen has a picture which explains the difference between a Z-test and a T-test: The following picture shows the differences between the Z Test and T Test. Not sure which one to use? Find out more here: T-Score vs. Z-Score. Click through for the picture.

Read More


December 2016
« Nov Jan »