Well, not just the Times: scientists are also digging through Wikipedia, among many other sites.
Researchers at Microsoft and the Technion-Israel Institute of Technology are creating software that analyzes 22 years of New York Times archives, Wikipedia and about 90 other web resources to predict future disease outbreaks, riots and deaths — and hopefully prevent them.
The new research is the latest in a number of similar initiatives that seek to mine web data to predict all kinds of events. Recorded Future, for instance, analyzes news, blogs and social media to “help identify predictive signals” for a variety of industries, including financial services and defense. Researchers are also using Twitter and Google to track flu outbreaks.
Technology Review outlines how it can work.
The system provides striking results when tested on historical data. For example, reports of droughts in Angola in 2006 triggered a warning about possible cholera outbreaks in the country, because previous events had taught the system that cholera outbreaks were more likely in years following droughts. A second warning about cholera in Angola was triggered by news reports of large storms in Africa in early 2007; less than a week later, reports appeared that cholera had become established. In similar tests involving forecasts of disease, violence, and a significant number of deaths, the system’s warnings were correct between 70 and 90 percent of the time.
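To make the drought-then-cholera idea concrete, here is a deliberately tiny sketch of the general pattern: count how often one kind of reported event was followed by another in the same place a year later, and warn when that rate is high. This is not the researchers' method, which the excerpt above does not spell out. The records, the function names `follow_rate` and `should_warn`, the one-year lag and the 0.5 threshold are all assumptions made up for illustration.

```python
# Hypothetical toy history of (region, year, event) records.
# These are invented placeholders, not data from the study.
HISTORY = [
    ("A", 2000, "drought"), ("A", 2001, "cholera"),
    ("A", 2003, "drought"), ("A", 2004, "cholera"),
    ("B", 2001, "drought"),
    ("B", 2005, "storm"),
    ("C", 2002, "drought"), ("C", 2003, "cholera"),
    ("C", 2006, "cholera"),
]


def follow_rate(history, precursor, outcome, lag=1):
    """Fraction of `precursor` events followed by `outcome` in the
    same region exactly `lag` years later (None if no precursors)."""
    seen = set(history)
    precursors = [(region, year) for region, year, event in history
                  if event == precursor]
    if not precursors:
        return None
    hits = sum((region, year + lag, outcome) in seen
               for region, year in precursors)
    return hits / len(precursors)


def should_warn(history, precursor, outcome, threshold=0.5):
    """Warn when the historical follow rate meets an assumed threshold."""
    rate = follow_rate(history, precursor, outcome)
    return rate is not None and rate >= threshold


if __name__ == "__main__":
    print(follow_rate(HISTORY, "drought", "cholera"))
    print(should_warn(HISTORY, "drought", "cholera"))
    print(should_warn(HISTORY, "storm", "cholera"))
```

A real system would have to extract events from free text, handle far messier timing, and guard against coincidental correlations, none of which this sketch attempts.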
See Kira Radinsky and Eric Horvitz, Mining the Web to Predict Future Events (PDF).