Data Science Languages

Alessandro Piva provides preliminary metrics on language usage among self-described data scientists:

Programming is one of the five main competence areas at the base of the skill set for a Data Scientist, even if it is not the most relevant in terms of expertise (see What is the right mix of competences for Data Scientists?). Considering the results of the survey, which has involved more than 200 Data Scientists worldwide to date, there isn’t a prevailing choice among the programming languages used in data science activities. However, the choice appears to be concentrated on a limited set of alternatives: almost 96% of respondents state that they use at least one of R, SQL or Python.

These results don’t surprise me much. R has slightly more traction than Python, but the percentage of people using both is likely to increase. SQL, meanwhile, is vital for getting data, and as we’re seeing in the Hadoop space, as data platform products mature, they tend to gravitate toward SQL or a SQL-like language. Cf. Hive, Spark SQL, Phoenix, etc.
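To make the "at least one of R, SQL or Python" figure concrete, here is a minimal sketch of how such a share, and the R-and-Python overlap, could be computed using SQL itself (through Python's built-in sqlite3 module). The table layout, the function name `language_shares`, and the responses are hypothetical toy values for illustration, not Piva's survey data.

```python
import sqlite3

# Hypothetical toy responses: one row per (respondent, language) pair.
# These are made-up values for illustration, not the survey's data.
RESPONSES = [
    (1, "R"), (1, "SQL"),
    (2, "Python"),
    (3, "R"), (3, "Python"), (3, "SQL"),
    (4, "Julia"),
]


def language_shares(rows):
    """Return (share using at least one of R/SQL/Python, share using both R and Python)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE usage (respondent INTEGER, language TEXT)")
    conn.executemany("INSERT INTO usage VALUES (?, ?)", rows)

    # Denominator: every distinct respondent, whatever languages they use.
    total = conn.execute(
        "SELECT COUNT(DISTINCT respondent) FROM usage"
    ).fetchone()[0]

    # Respondents who list at least one of the three core languages.
    any_core = conn.execute(
        "SELECT COUNT(DISTINCT respondent) FROM usage "
        "WHERE language IN ('R', 'SQL', 'Python')"
    ).fetchone()[0]

    # Respondents who list both R and Python.
    both = conn.execute(
        "SELECT COUNT(*) FROM ("
        "  SELECT respondent FROM usage WHERE language IN ('R', 'Python')"
        "  GROUP BY respondent HAVING COUNT(DISTINCT language) = 2"
        ")"
    ).fetchone()[0]

    conn.close()
    return any_core / total, both / total


if __name__ == "__main__":
    at_least_one, r_and_python = language_shares(RESPONSES)
    print(f"{at_least_one:.0%} use at least one of R, SQL, Python")  # 75%
    print(f"{r_and_python:.0%} use both R and Python")  # 25%
```

With real survey data, the same `GROUP BY ... HAVING` pattern would also track whether the share of people using both R and Python grows over time.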

December 2016