David Smith notes that Microsoft Research has made several data sets available:
Other data sets of note include:
A collection of 38M tweets related to the 2012 US election
3-D capture data from individuals performing a variety of hand gestures
Infer.NET, a framework for running Bayesian inference in graphical models
Images of 1 million celebrities and associated tags
MS MARCO, a new large-scale dataset for reading comprehension and question answering
Click through for more information, and then check out the data sets.