The New York Times to revamp online archive
TimesMachine, a treasure trove of NY Times papers published between 1851 and 1922, just got a makeover. The news giant recently released the prototype with six issues, and gave details on the technology behind it (via NY Times):
In order to build the new TimesMachine, we repurposed technology and techniques from an unlikely quarter: geographic information systems. Every scanned issue of The Times is essentially one very large digital image. For instance, our scan of the June 20, 1969 issue is a 13.2 gigapixel image that weighs in at over 200 megabytes. Since it is impractical to transmit such an image to every interested user, we needed to find a way to send only those parts of the scanned paper that a user was actually interested in viewing. To solve this conundrum we turned to tiling, a solution often used to display online maps. With tiling, a large image is broken down into small tiles that are computed at several different zoom levels. When a user wishes to view the tiled image in a browser, only the tiles required to display the visible portion are downloaded. This approach dramatically reduces bandwidth requirements and has the further advantage of allowing users to zoom and drag the larger image.
As developing systems for the generation and display of tiled images from scratch would have been cost prohibitive, we are quite fortunate that there are a number of excellent open source libraries for just these purposes. For processing and tiling the scanned newspapers we relied on both GDAL and ImageMagick. For the in-browser display of our tiled images we relied on the Leaflet mapping library. In addition to great software, we received much valuable guidance from the great people of both Geo NYC and CartoDB.
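FJP: For readers curious how tiling keeps downloads small, here is a rough Python sketch of the general idea, not the Times's actual code. It assumes 256-pixel square tiles and a pyramid in which each zoom level halves the resolution of the one above it; the function names and the example image dimensions are hypothetical, chosen only for illustration.

```python
import math

TILE_SIZE = 256  # assumed tile edge in pixels (illustrative, not from the Times)


def top_zoom(width, height, tile_size=TILE_SIZE):
    """Lowest zoom level at which the image is shown at full resolution."""
    zoom = 0
    while tile_size * (2 ** zoom) < max(width, height):
        zoom += 1
    return zoom


def level_size(width, height, zoom, full_zoom):
    """Pixel dimensions of the image at a given zoom level.

    Each step below the full-resolution level halves both dimensions.
    """
    scale = 2 ** (full_zoom - zoom)
    return math.ceil(width / scale), math.ceil(height / scale)


def visible_tiles(view_x, view_y, view_w, view_h, level_w, level_h,
                  tile_size=TILE_SIZE):
    """Return (column, row) indices of the tiles that overlap the viewport.

    The viewport is given in pixel coordinates of the current zoom level
    and is clamped to the image bounds before tiles are selected.
    """
    x0, y0 = max(0, view_x), max(0, view_y)
    x1 = min(level_w, view_x + view_w)
    y1 = min(level_h, view_y + view_h)
    if x0 >= x1 or y0 >= y1:
        return []  # viewport lies entirely outside the image
    first_col, last_col = x0 // tile_size, (x1 - 1) // tile_size
    first_row, last_row = y0 // tile_size, (y1 - 1) // tile_size
    return [(col, row)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]


if __name__ == "__main__":
    # Hypothetical 4096 x 2048 scan viewed through a 1024 x 768 window.
    width, height = 4096, 2048
    full = top_zoom(width, height)
    level_w, level_h = level_size(width, height, full, full)
    total = math.ceil(level_w / TILE_SIZE) * math.ceil(level_h / TILE_SIZE)
    needed = visible_tiles(300, 200, 1024, 768, level_w, level_h)
    print(f"zoom {full}: {len(needed)} of {total} tiles needed")
```

In this made-up case the browser would fetch only the handful of tiles under the viewport rather than the whole scan, which is where the bandwidth savings described above come from.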
Images: Top - September 18, 1851 issue on current TimesMachine / Bottom - July 20, 1969 issue on prototype.
FJP: The Times calls it a “work-in-progress” and welcomes suggestions at email@example.com.