Making Data Sausage
New York Times Senior Software Architect Jacob Harris takes a deep look at how to work with data to tell a narrative. He walks step by step through his own analysis of US food safety, using data sets of food recalls from the US Department of Agriculture.
Although Harris describes himself as a computer scientist rather than a journalist, he says that once he starts looking at data, the reporting process is much the same for both. As he sets out, he breaks it into three steps:
- Gathering the data we need to tell a story
- “Interviewing” the data to find its strengths and limitations
- Finding the specific narratives in the data we want to share and can support with data
It’s this second step, the interviewing, that I find most interesting. I also like his word choice: it’s much less martial than the “interrogation” many use to describe the process.
Before jumping into his case study, Harris writes:
What do I mean by narrative? Narrative is what makes it data journalism. We could just put a large PDF or SQL dump online, but that’s not very informative to anyone but experts. The art is finding the stories in the data the way a sculptor finds a statue in the marble.
For the data-curious, give Harris a read. He moves from high-level strategy and an understanding of how to analyze data, including what types of questions to ask of it during “the interview,” to getting down and dirty with Ruby on Rails examples of how to actually work with the data once it has been scraped. In other words, there’s fun for the whole family here.
Related: Our interview with Bitly data chief Hilary Mason about her methodology for working with data.
Somewhat Related: Alex Williams, TechCrunch. Data Is Not Killing Creativity, It’s Just Changing How We Tell Stories.
Image: Sausage Making via Wikimedia Commons.