David Robinson reports on a new Stack Exchange data set available to the public:
For each Stack Overflow question asked since the beginning of the site, the dataset includes:
- Question ID
- Creation date
- Closed date, if applicable
- Deletion date, if applicable
- Score
- Owner user ID (except for deleted questions)
- Number of answers
- Tags
This is ideal for performing analyses such as:
- The increase or decrease in questions in each tag over time
- Correlations among tags on questions
- Which tags tend to get higher or lower scores
- Which tags tend to be asked on weekends vs. weekdays
- Rates of question closure or deletion over time
- The speed at which questions are closed or deleted
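As a rough illustration of a few of these analyses, here is a minimal Python sketch. It assumes the data has already been loaded into in-memory records. The `Question` class, its field names, the helper functions, and the sample rows are all hypothetical; the real data set's file format and column names may differ.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime
from statistics import median
from typing import Optional


@dataclass
class Question:
    # Hypothetical record mirroring the fields listed above;
    # the real data set's column names and formats may differ.
    question_id: int
    creation_date: datetime
    closed_date: Optional[datetime]
    deletion_date: Optional[datetime]
    score: int
    owner_user_id: Optional[int]  # None for deleted questions
    answer_count: int
    tags: list[str] = field(default_factory=list)


def questions_per_tag_per_year(questions):
    """Count questions per (tag, year) pair, the raw material for tag trends."""
    counts = Counter()
    for q in questions:
        for tag in q.tags:
            counts[(tag, q.creation_date.year)] += 1
    return counts


def weekend_share_by_tag(questions):
    """Fraction of each tag's questions created on a Saturday or Sunday."""
    total, weekend = Counter(), Counter()
    for q in questions:
        is_weekend = q.creation_date.weekday() >= 5  # Monday == 0, Sunday == 6
        for tag in q.tags:
            total[tag] += 1
            if is_weekend:
                weekend[tag] += 1
    return {tag: weekend[tag] / total[tag] for tag in total}


def median_hours_to_close(questions):
    """Median hours from creation to closure, over closed questions only."""
    hours = [
        (q.closed_date - q.creation_date).total_seconds() / 3600
        for q in questions
        if q.closed_date is not None
    ]
    return median(hours) if hours else None


# Made-up rows for illustration only, not drawn from the real data set.
sample = [
    # Asked on Saturday 2012-03-03 and closed two hours later.
    Question(1, datetime(2012, 3, 3, 10), datetime(2012, 3, 3, 12), None,
             3, 101, 1, ["python", "pandas"]),
    # Asked on Monday 2012-03-05, later deleted, so no owner ID.
    Question(2, datetime(2012, 3, 5, 9), None, datetime(2012, 3, 6, 9),
             -1, None, 0, ["python"]),
    # Asked on Monday 2013-01-07 and closed six hours later.
    Question(3, datetime(2013, 1, 7, 9), datetime(2013, 1, 7, 15), None,
             0, 102, 2, ["r"]),
]

print(questions_per_tag_per_year(sample))
print(weekend_share_by_tag(sample))   # {'python': 0.5, 'pandas': 1.0, 'r': 0.0}
print(median_hours_to_close(sample))  # 4.0
```

With the full data set, the same per-tag counting pattern extends naturally to tag co-occurrence or per-tag score averages.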
This is pretty exciting. Finding good, clean data sets for demonstration and teaching is time-consuming, so the Stack Exchange folks tossing one our way could be a major time-saver.