David Robinson reports on a new Stack Exchange data set available to the public:
For each Stack Overflow question asked since the beginning of the site, the dataset includes:
- Question ID
- Creation date
- Closed date, if applicable
- Deletion date, if applicable
- Score
- Owner user ID (except for deleted questions)
- Number of answers
- Tags
This is ideal for performing analyses such as:
- The increase or decrease in questions in each tag over time
- Correlations among tags on questions
- Which tags tend to get higher or lower scores
- Which tags tend to be asked on weekends vs weekdays
- Rates of question closure or deletion over time
- The speed at which questions are closed or deleted
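To make a couple of these analyses concrete, here is a minimal Python sketch using only the standard library. It is an illustration, not the dataset's actual schema or Robinson's method: the `Question` record, its field names, and the sample rows are all made up, and the real files may be laid out differently (for example, with tags stored separately from questions).

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass
class Question:
    # Hypothetical record covering a subset of the fields listed above;
    # the field names are assumptions, not the dataset's real column names.
    question_id: int
    creation_date: datetime
    closed_date: Optional[datetime]
    score: int
    tags: Tuple[str, ...]


def questions_per_tag_per_year(questions):
    """Count questions for each (tag, year) pair."""
    counts = Counter()
    for q in questions:
        for tag in q.tags:
            counts[(tag, q.creation_date.year)] += 1
    return counts


def weekend_share_by_tag(questions):
    """Fraction of each tag's questions created on a Saturday or Sunday."""
    totals = Counter()
    weekend = Counter()
    for q in questions:
        # datetime.weekday() returns 0 for Monday through 6 for Sunday
        is_weekend = q.creation_date.weekday() >= 5
        for tag in q.tags:
            totals[tag] += 1
            if is_weekend:
                weekend[tag] += 1
    return {tag: weekend[tag] / totals[tag] for tag in totals}


# Made-up sample rows, purely for illustration
sample = [
    Question(1, datetime(2015, 1, 3), None, 5, ("r", "ggplot2")),      # Saturday
    Question(2, datetime(2015, 1, 5), None, 2, ("r",)),                # Monday
    Question(3, datetime(2016, 1, 4), datetime(2016, 1, 4), -1, ("python",)),  # Monday
    Question(4, datetime(2016, 1, 2), None, 7, ("python", "pandas")),  # Saturday
]

per_year = questions_per_tag_per_year(sample)
weekend_share = weekend_share_by_tag(sample)
```

With real data, the same two functions would run over rows parsed from the published files; for a dataset covering every question on the site, a data-frame library would likely be a better fit than plain loops.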
This is pretty exciting. Getting good, high-quality data sets for demonstration and pedagogical purposes is time-consuming, so the fact that the Stack Exchange people are tossing one out our way could be a major time-saver.