The New York Times to revamp online archive

TimesMachine, a treasure trove of NY Times papers published between 1851 and 1922, just got a makeover. The news giant recently released the prototype with six issues, and gave details on the technology behind it (via NY Times):

In order to build the new TimesMachine, we repurposed technology and techniques from an unlikely quarter: geographic information systems. Every scanned issue of The Times is essentially one very large digital image. For instance, our scan of the June 20, 1969 issue is a 13.2 gigapixel image that weighs in at over 200 megabytes. Since it is impractical to transmit such an image to every interested user, we needed to find a way to send only those parts of the scanned paper that a user was actually interested in viewing. To solve this conundrum we turned to tiling, a solution often used to display online maps. With tiling, a large image is broken down into small tiles that are computed at several different zoom levels. When a user wishes to view the tiled image in a browser, only the tiles required to display the visible portion are downloaded. This approach dramatically reduces bandwidth requirements and has the further advantage of allowing users to zoom and drag the larger image.

As developing systems for the generation and display of tiled images from scratch would have been cost prohibitive, we are quite fortunate that there are a number of excellent open source libraries for just these purposes. For processing and tiling the scanned newspapers we relied on both GDAL and ImageMagick. For the in-browser display of our tiled images we relied on the Leaflet mapping library. In addition to great software, we received much valuable guidance from the great people of both Geo NYC and CartoDB.
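
FJP: To make the tiling idea concrete, here is a minimal Python sketch of the arithmetic a tiled viewer has to do: work out how many zoom levels a scan needs, then list only the tiles that overlap what the reader is looking at. This is not The Times's code. The 256-pixel tile size, the 40,000 x 60,000 pixel scan and the function names are all assumptions chosen for illustration.

```python
import math

TILE_SIZE = 256  # assumed tile edge in pixels; a common choice for web maps


def max_zoom(width, height, tile_size=TILE_SIZE):
    """Zoom level of the full-resolution image, where level 0 fits in one tile."""
    longest = max(width, height)
    zoom = 0
    while tile_size * (2 ** zoom) < longest:
        zoom += 1
    return zoom


def level_size(width, height, zoom, top_zoom):
    """Pixel dimensions of the image after downscaling to the given zoom level."""
    scale = 2 ** (top_zoom - zoom)
    return math.ceil(width / scale), math.ceil(height / scale)


def visible_tiles(view_x, view_y, view_w, view_h, level_w, level_h,
                  tile_size=TILE_SIZE):
    """Return (col, row) for every tile that overlaps the viewport.

    The viewport is given in pixel coordinates of the current zoom level.
    """
    # Clamp the viewport to the image so we never request tiles that don't exist.
    x0, y0 = max(0, view_x), max(0, view_y)
    x1, y1 = min(level_w, view_x + view_w), min(level_h, view_y + view_h)
    if x0 >= x1 or y0 >= y1:
        return []  # viewport lies entirely outside the image
    first_col, last_col = x0 // tile_size, (x1 - 1) // tile_size
    first_row, last_row = y0 // tile_size, (y1 - 1) // tile_size
    return [(col, row)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]


if __name__ == "__main__":
    scan_w, scan_h = 40_000, 60_000          # hypothetical scan dimensions
    top = max_zoom(scan_w, scan_h)
    level_w, level_h = level_size(scan_w, scan_h, top, top)
    needed = visible_tiles(1000, 2000, 1024, 768, level_w, level_h)
    total = math.ceil(level_w / TILE_SIZE) * math.ceil(level_h / TILE_SIZE)
    print(f"zoom levels: 0..{top}")
    print(f"tiles needed for this view: {len(needed)} of {total}")
```

Under these assumptions, a 1024 x 768 browser window at full zoom touches only a couple of dozen tiles out of tens of thousands, which is the bandwidth saving the quote describes.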

Images: Top - September 18, 1851 issue on current TimesMachine / Bottom - July 20, 1969 issue on prototype.

FJP: The Times calls it a “work-in-progress” and welcomes suggestions at timesmachine@nytimes.com
