Words and phrases are fundamental building blocks of language and culture, much as genes and cells are to the biology of life. And words are how we express ideas, so tracing their origin, development and spread is not merely an academic pursuit but a window into a society’s intellectual evolution.
In the report, Twitter said that, worldwide, it received 1,858 requests from governments for information about users in 2012, as well as 6,646 reports of copyright violations, and 48 demands from governments that content they deem illegal be removed.
I say that news organizations should become advocates for open information, demanding that government not only make more of it available but also put it in standard formats so it can be searched, visualized, analyzed, and distributed. What the value of that information is to society is not up to the gatekeepers — officials or journalists — to decide. It is up to the public.
Jeff Jarvis, BuzzMachine. Public is public… except in journalism?
While the above quote may stand on its own, a little context: not everyone liked the map of gun permit owners that was published in the aftermath of the Sandy Hook shooting. Jarvis believes that the decision of whether or not the map is morally sound belongs to the public — not to journalists.
Other media thinkers have said otherwise. The Times’ David Carr argued yesterday that the map, which showed the addresses of gun permit owners in New York’s Westchester and Rockland counties, isn’t journalism.
Well, is it?
While Twitter’s Turks will help bring much-needed context to the platform, they’re not journalists who verify whether something is true. As we’ve seen with the shootings in Newtown, Connecticut and Superstorm Sandy, Twitter rumors ran rampant. Some rumors turned out to be true, but many were inaccurate or even malicious. Some were important, others were trivial. At Breaking News, we rely on experienced journalists (that’s one of them, Stephanie Clary, above) to verify real-time reports and prioritize their importance. We also add context, associating reports with ongoing stories, topics and locations. But accuracy and importance — along with speed — are the essence of breaking news for any news organization.
The Breaking News team to Twitter: Your Mechanical Turk team can’t compete with our actual journalists. (via shortformblog)
FJP: Some Background — The Twitter Engineering blog posted yesterday about how it uses real people alongside its search algorithms to determine the “meaning” of trending terms. It does this with both in-house evaluators and Amazon’s Mechanical Turk, a crowdsourced marketplace for accomplishing (relatively) small tasks.
The goal is to contextualize and understand, for example, that something like #BindersFullOfWomen is related to politics.
Here’s what Twitter has to say about what happens when topics begin to trend:
As soon as we discover a new popular search query, we send it to our human evaluators, who are asked a variety of questions about the query… For example: as soon as we notice “Big Bird” spiking, we may ask judges on Mechanical Turk to categorize the query, or provide other information (e.g., whether there are likely to be interesting pictures of the query, or whether the query is about a person or an event) that helps us serve relevant Tweets and ads.
We need, in short, to pay attention to the materiality of algorithmic processes. By that, I do not simply mean the materiality of the algorithmic processing (the circuits, server farms, internet cables, super-computers, and so on) but to the materiality of the procedural inputs. To the stuff that the algorithm mashes up, rearranges, and spits out.
CW Anderson, Culture Daily. The Materiality of Algorithms.
In what reads like a starting point for more posts on the subject, CUNY Prof Chris Anderson discusses what documents journalists may want to design algorithms for, and just how hard that task will be.
Algorithms doing magic inside massive data sets and search engines, while not mathematically simple, are generally easy to conceptualize: the algorithm and its data sit in the computer, the algorithm sifts through the Excel sheet in the background and bam! you have something.
But if you’re working with poorly organized documents, it’s difficult to simply plug them in.
Chris writes that the work required to include any document in a set will shape the algorithm that makes sense of the whole bunch. This will be a problem for journalists who want to examine documents made without much forethought, which is to say: government documents, phone records from different companies and countries, eyewitness reports, police sketches, mugshots, bank statements, tax forms, and hundreds of other things worth investigating.
The recovered text [from these documents] is a mess, because these documents are just about the worst possible case for OCR [optical character recognition]: many of these documents are forms with a complex layout, and the pages have been photocopied multiple times, redacted, scribbled on, stamped and smudged. But large blocks of text come through pretty well, and this command extracts what text there is into one file per page.
To read the rest of Jonathan Stray’s account, see his Overview Project.
And to see more with Chris Anderson, see our recent video interviews with him.
The [New York] Times does not release traffic figures, but a spokesperson said yesterday that [Nate] Silver’s blog provided a significant—and significantly growing, over the past year—percentage of Times pageviews. This fall, visits to the Times’ political coverage (including FiveThirtyEight) have increased, both absolutely and as a percentage of site visits. But FiveThirtyEight’s growth is staggering: where earlier this year, somewhere between 10 and 20 percent of politics visits included a stop at FiveThirtyEight, last week that figure was 71 percent.
But Silver’s blog has buoyed more than just the politics coverage, becoming a significant traffic-driver for the site as a whole. Earlier this year, approximately 1 percent of visits to the New York Times included FiveThirtyEight. Last week, that number was 13 percent. Yesterday, it was 20 percent. That is, one in five visitors to the sixth-most-trafficked U.S. news site took a look at Silver’s blog.
Marc Tracy, The New Republic. Nate Silver Is a One-Man Traffic Machine for the Times.
Takeaway: Stat nerds have clout.