StackLite Dataset

David Robinson reports on a new Stack Exchange data set available to the public:

For each Stack Overflow question asked since the beginning of the site, the dataset includes:

  • Question ID
  • Creation date
  • Closed date, if applicable
  • Deletion date, if applicable
  • Score
  • Owner user ID (except for deleted questions)
  • Number of answers
  • Tags

This is ideal for performing analyses such as:

  • The increase or decrease in questions in each tag over time

  • Correlations among tags on questions

  • Which tags tend to get higher or lower scores

  • Which tags tend to be asked on weekends vs weekdays

  • Rates of question closure or deletion over time

  • The speed at which questions are closed or deleted

This is pretty exciting.  Getting good, high-quality data sets for demonstration and pedagogical purposes is time-consuming, so the fact that the Stack Exchange people are tossing one out our way could be a major time-saver.

Related Posts

Tidy Anomaly Detection With Anomalize

Abdul Majed Raja walks us through an example using the anomalize package: One of the important things to do with Time Series data before starting with Time Series forecasting or Modelling is Time Series Decomposition where the Time series data is decomposed into Seasonal, Trend and remainder components. anomalize has got a function time_decompose() to perform the same. […]

Read More

Uploading Data Sets To Azure ML From R

Leila Etaati continues her series on the Azure ML R package by showing how to upload a data set: There is a function in AzureML package name “workspace” that creates a reference to an AzureML Studio workspace by getting the authentication token and workspace id as below: 1 ws <– workspace( id , auth  ) to […]

Read More

Categories

July 2016
MTWTFSS
« Jun Aug »
 123
45678910
11121314151617
18192021222324
25262728293031