Making Data Sausage

New York Times Senior Software Architect Jacob Harris takes a deep look at how to work with data to tell a narrative. He does so by going step by step through his own analysis of US food safety from data sets of food recalls taken from the US Department of Agriculture.

While Harris describes himself as a computer scientist rather than a journalist, he says the reporting process is much the same for both once he starts looking at data. Specifically, he breaks the work into three steps:

  1. Gathering the data we need to tell a story
  2. “Interviewing” the data to find its strengths and limitations
  3. Finding the specific narratives in the data we want to share and can support with data

It’s this second step, the interviewing, that I find most interesting. I also like his word choice. It’s much less martial than the “interrogation” many use to describe the process.
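
To make the “interview” idea concrete, here is a minimal Python sketch of the kind of questions one might put to a data set before looking for a story: how many rows it has, which fields have gaps, what dates it covers, and which values dominate. This is not Harris’s code (his examples are in Ruby). The `interview` function, the field names, and the sample rows are all invented for illustration and are not real USDA records.

```python
from collections import Counter
from datetime import date

# Made-up sample rows for illustration only; these are NOT real USDA recall records.
SAMPLE_RECALLS = [
    {"date": date(2012, 1, 5), "product": "ground beef", "reason": "E. coli", "pounds": 1200},
    {"date": date(2012, 2, 11), "product": "chicken salad", "reason": "Listeria", "pounds": None},
    {"date": date(2012, 2, 20), "product": "pork sausage", "reason": "undeclared allergen", "pounds": 300},
    {"date": None, "product": "beef patties", "reason": "E. coli", "pounds": 4500},
]


def interview(rows):
    """Ask a data set a few basic questions about its strengths and limits."""
    # Collect every field name that appears in any row.
    fields = sorted({key for row in rows for key in row})
    # Count missing values per field: gaps limit what the data can support.
    missing = {f: sum(1 for row in rows if row.get(f) is None) for f in fields}
    # Find the time span covered, skipping rows with no date.
    dates = [row["date"] for row in rows if row.get("date") is not None]
    # Tally the most common recall reasons.
    reasons = Counter(row["reason"] for row in rows if row.get("reason"))
    return {
        "row_count": len(rows),
        "missing": missing,
        "date_range": (min(dates), max(dates)) if dates else None,
        "top_reasons": reasons.most_common(3),
    }


if __name__ == "__main__":
    for question, answer in interview(SAMPLE_RECALLS).items():
        print(f"{question}: {answer}")
```

Even on four toy rows, the answers hint at what a story could and could not claim: a total weight of recalled product, for instance, would be shaky while some rows have no weight recorded.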

Before jumping into his case study, Harris writes:

What do I mean by narrative? Narrative is what makes it data journalism. We could just put a large PDF or SQL dump online, but that’s not very informative to anyone but experts. The art is finding the stories in the data the way a sculptor finds a statue in the marble.

For the data-curious, give Harris a read. He moves from high-level strategy for analyzing data, including what types of questions to ask during “the interview,” to getting down and dirty with Ruby on Rails examples of how to actually work with the data once it has been scraped. In other words, there’s fun for the whole family here.

Related: Our interview with Bitly data chief Hilary Mason about her methodology for working with data.

Somewhat Related: Alex Williams, TechCrunch. Data Is Not Killing Creativity, It’s Just Changing How We Tell Stories.

Image: Sausage Making via Wikimedia Commons.
