Posts tagged with ‘algorithms’

What Happens When You Like Everything?

Journalists can be a masochistic lot.

Take Mat Honan over at Wired who decided to like everything in his Facebook News Feed:

Or at least I did, for 48 hours. Literally everything Facebook sent my way, I liked — even if I hated it. I decided to embark on a campaign of conscious liking, to see how it would affect what Facebook showed me…

…Relateds quickly became a problem, because as soon as you like one, Facebook replaces it with another. So as soon as I liked the four relateds below a story, it immediately gave me four more. And then four more. And then four more. And then four more. I quickly realized I’d be stuck in a related loop for eternity if I kept this up. So I settled on a new rule: I would like the first four relateds Facebook shows me, but no more.

So how did Facebook’s algorithm respond?

My News Feed took on an entirely new character in a surprisingly short amount of time. After checking in and liking a bunch of stuff over the course of an hour, there were no human beings in my feed anymore. It became about brands and messaging, rather than humans with messages…

…While I expected that what I saw might change, what I never expected was the impact my behavior would have on my friends’ feeds. I kept thinking Facebook would rate-limit me, but instead it grew increasingly ravenous. My feed became a cavalcade of brands and politics and as I interacted with them, Facebook dutifully reported this to all my friends and followers.

After 48 hours he gives up “because it was just too awful.”

Over at The Atlantic, Caleb Garling plays with Facebook’s algorithm as well. Instead of liking, though, he tries to hack the system to see what he needs to do so that friends and followers see what he posts:

Part of the impetus was that Facebook had frustrated me. That morning I’d posted a story I’d written about the hunt for electric bacteria that might someday power remote sensors. After a few hours, the story had garnered just one like. I surmised that Facebook had decided that, for whatever reason, what I’d submitted to the blue ether wasn’t what people wanted, and kept it hidden.

A little grumpy at the idea, I wanted to see if I could trick Facebook into believing I’d had one of those big life updates that always hang out at the top of the feed. People tend to word those things roughly the same way and Facebook does smart things with pattern matching and sentiment analysis. Let’s see if I can fabricate some social love.

I posted: “Hey everyone, big news!! I’ve accepted a position trying to make Facebook believe this is an important post about my life! I’m so excited to begin this small experiment into how the Facebook algorithms processes language and really appreciate all of your support!”

And the likes poured in: “After 90 minutes, the post had 57 likes and 25 commenters.”
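
Garling’s hunch is that Facebook pattern-matches on the way people word big life updates. Nobody outside the company knows what those signals actually are, so the sketch below is purely speculative: a toy scorer that flags posts phrased like life events, with cue phrases and a threshold invented for illustration.

```python
import re

# Hypothetical cues for "big life update" posts. The signals Facebook actually
# uses are unknown; these phrases and the threshold are invented for illustration.
LIFE_EVENT_CUES = [
    r"\bbig news\b",
    r"\bI(?:'ve| have) accepted a position\b",
    r"\bso excited\b",
    r"\bengaged\b|\bmarried\b|\bnew job\b",
]

def looks_like_life_event(post: str) -> bool:
    """Return True if a post matches enough 'life update' cues."""
    hits = sum(bool(re.search(p, post, re.IGNORECASE)) for p in LIFE_EVENT_CUES)
    return hits >= 2  # arbitrary threshold for the sketch

post = ("Hey everyone, big news!! I've accepted a position trying to make "
        "Facebook believe this is an important post about my life!")
print(looks_like_life_event(post))  # True, which is roughly what Garling exploited
```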

So can you game the Facebook algorithm? Not really, thinks Garling. Not while the code remains invisible.

At best, he writes, we might be able to intuit a “feeble correlation.”

Which might be something to like.

A human author simply decides an interesting emotional path for the story, and the computer does the rest.

Margaret Sarlej, PhD candidate at the University of New South Wales, to Phys.org. Computer writes its own fables.

We’ve written before about robots writing the news; now they’re writing fables.

Sarlej has written an application that takes 22 identified emotions used in fables, mixes and matches them with a plot, and pops out a written story.

Easier said than done. 

Via The Guardian:

Breaking stories down for a computer “involves not only encoding story elements like characters, events, and plot, but also the ‘common sense’ people take for granted”, said Sarlej. Telling a story is simple enough for a child to do, but stories are actually “incredibly complex”.

"For example, if Bob gives Alice an apple, Alice will have the apple, and Bob will not. To a person, that’s obvious, and doesn’t require explanation. If Bob punches Carl, people would generally assume Carl will be unhappy about it, but a computer doesn’t have the ‘common sense’ to make such an inference. In a computer programme, details like this must be explicitly spelled out," she said.

Current results are fairly rudimentary but, according to Sarlej’s supervisor, computers “will be making interesting and meaningful contributions to literature within the next decade.”

What Writer's Block? Swedish Man and His Bot Have Authored 2.7 Million Wikipedia Articles →

Via The Wall Street Journal:

Sverker Johansson could be the most prolific author you’ve never heard of.

Volunteering his time over the past seven years publishing to Wikipedia, the 53-year-old Swede can take credit for 2.7 million articles, or 8.5% of the entire collection, according to Wikimedia analytics, which measures the site’s traffic. His stats far outpace any other user, the group says.

He has been particularly prolific cataloging obscure animal species, including butterflies and beetles, and is proud of his work highlighting towns in the Philippines. About one-third of his entries are uploaded to the Swedish language version of Wikipedia, and the rest are composed in two versions of Filipino, one of which is his wife’s native tongue.

An administrator holding degrees in linguistics, civil engineering, economics and particle physics, he says he has long been interested in “the origin of things, oh, everything.”

It isn’t uncommon, however, for Wikipedia purists to complain about his method. That is because the bulk of his entries have been created by a computer software program—known as a bot. Critics say bots crowd out the creativity only humans can generate.

Mr. Johansson’s program scrubs databases and other digital sources for information, and then packages it into an article. On a good day, he says his “Lsjbot” creates up to 10,000 new entries.
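
The Journal doesn’t publish Lsjbot’s code, but “scrub a database, package it into an article” is, at heart, template filling over structured records. A rough sketch of the idea, with a made-up species record and wording:

```python
# Sketch of bot-written stub articles from structured records. The record,
# fields and wording are invented; Lsjbot's real pipeline is more involved.
species_records = [
    {"name": "Example beetle", "family": "Carabidae",
     "described_by": "Smith", "year": 1901, "region": "the Philippines"},
]

TEMPLATE = ("{name} is a species of beetle in the family {family}. "
            "It was described by {described_by} in {year} and is found in {region}.")

def write_stub(record: dict) -> str:
    return TEMPLATE.format(**record)

for rec in species_records:
    print(write_stub(rec))
```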

That’s one way to go about it. Some Wikipedia editors aren’t happy about it, though.

The Robots are Coming, Part 132

First, some background, via Kevin Roose at New York Magazine:

Earlier this week, one of my business-beat colleagues got assigned to recap the quarterly earnings of Alcoa, the giant metals company, for the Associated Press. The reporter’s story began: “Alcoa Inc. (AA) on Tuesday reported a second-quarter profit of $138 million, reversing a year-ago loss, and the results beat analysts’ expectation. The company reported strong results in its engineered-products business, which makes parts for industrial customers, while looking to cut costs in its aluminum-smelting segment.”

It may not have been the most artful start to a story, but it got the point across, with just enough background information for a casual reader to make sense of it. Not bad. The most impressive part, though, was how long the story took to produce: less than a second.
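
Roose doesn’t reproduce the AP’s templates, and Automated Insights hasn’t published them, but the fill-in-the-blank character of a recap like that Alcoa lede is easy to imagine. A hedged sketch, with field names, thresholds, and year-ago figures invented for the example:

```python
# Toy earnings-recap generator. The field names, thresholds and year-ago figure
# are invented for the example; this is not Automated Insights' template.
def earnings_recap(d: dict) -> str:
    direction = ("reversing a year-ago loss" if d["prior_profit_m"] < 0
                 else "up from a year ago")
    beat = "beat" if d["eps"] > d["eps_estimate"] else "missed"
    return (f"{d['company']} ({d['ticker']}) on {d['day']} reported a "
            f"{d['quarter']} profit of ${d['profit_m']} million, {direction}, "
            f"and the results {beat} analysts' expectations.")

print(earnings_recap({
    "company": "Alcoa Inc.", "ticker": "AA", "day": "Tuesday",
    "quarter": "second-quarter", "profit_m": 138,
    "prior_profit_m": -100,             # illustrative year-ago loss
    "eps": 0.18, "eps_estimate": 0.12,  # illustrative per-share figures
}))
```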

If you’re into robots and algorithms writing the news, the article’s worth the read. It’s optimistic, asserting that in contexts like earnings reports, sports roundups and the like, automation frees journalists for more mindful work, such as analyzing what those earnings reports actually mean.

With 300 million robot-driven stories produced last year – more than all media outlets in the world combined, according to Roose – and an estimated billion stories in store for 2014, that’s a lot of freed-up time to cast our minds elsewhere.

Besides, as Roose explains, “The stories that today’s robots can write are, frankly, the kinds of stories that humans hate writing anyway.”

More interesting, and more troubling, are the ethics behind algorithmically driven articles. Slate’s Nicholas Diakopoulos tried to tackle this question in April when he asked how we can incorporate robots into our news gathering with the level of transparency expected in today’s media environment. Part of his solution is understanding what he calls the “tuning criteria,” or the inherent biases, that are used to make editorial decisions when algorithms direct the news.

Here’s something else to chew on. Back to Roose:

Robot-generated stories aren’t all fill-in-the-blank jobs; the more advanced algorithms use things like perspective, tone, and humor to tailor a story to its audience. A robot recapping a basketball game, for example, might be able to produce two versions of a story using the same data: one upbeat story that reads as if a fan of the winning team had written it; and another glum version written from the loser’s perspective.
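
Roose’s two-versions-from-one-box-score example is just as easy to sketch. Nothing below reflects any vendor’s actual system; the teams, score and wording are invented.

```python
# Two recaps from one (invented) box score, tailored to each fan base.
game = {"winner": "Springfield", "loser": "Shelbyville", "score": "101-94"}

def recap(g: dict, perspective: str) -> str:
    if perspective == g["winner"]:
        return (f"{g['winner']} rolled past {g['loser']} {g['score']}, "
                "and the home crowd loved every minute of it.")
    return f"{g['loser']} let another one slip away, falling {g['score']} to {g['winner']}."

print(recap(game, "Springfield"))   # the upbeat version
print(recap(game, "Shelbyville"))   # the glum version
```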

Apply this concept to a holy grail of startups and legacy organizations alike: customizing and personalizing the news just for you. Will future robots feed us a feel-good, meat-and-potatoes partisan diet of news based on the same sort of behavioral tracking the ad industry uses to deliver advertising? With the time and cost of producing multiple stories from the same data sets approaching zero, it’s not difficult to imagine a news site deciding to serve different versions of the same story based on perceived political affiliations.

That’s a conundrum. And one far more worth exploring than whether an algorithm can give us a few paragraphs on who’s nominated for the next awards show.

Want more robots? Visit our Robots Tag.

Image: Twitter post, via @hanelly.

‘Robot’ to write 1 billion stories in 2014 but will you know it when you see it? | Poynter. →

If you’re a human reporter quaking in your boots this week over news of a Los Angeles Times algorithm that wrote the newspaper’s initial story about an earthquake, you might want to cover your ears for this fact:

Software from Automated Insights will generate about 1 billion stories this year — up from 350 million last year, CEO and founder Robbie Allen told Poynter via phone.

FJP: Here’s a ponderable for you.

A few weeks ago, the New York Post reported that Quinton Ross died. Ross, a former Brooklyn Nets basketball player, didn’t know he was dead and soon let people know he was just fine.

"A couple (relatives) already heard it," Ross told the Associated Press. “They were crying. I mean, it was a tough day, man, mostly for my family and friends… My phone was going crazy. I checked Facebook. Finally, I went on the Internet, and they were saying I was dead. I just couldn’t believe it.”

The original reporter on the story? A robot. Specifically, Wikipedia Live Monitor, created by Google engineer Thomas Steiner.

Slate explains how it happened:

Wikipedia Live Monitor is a news bot designed to detect breaking news events. It does this by listening to the velocity and concurrent edits across 287 language versions of Wikipedia. The theory is that if lots of people are editing Wikipedia pages in different languages about the same event and at the same time, then chances are something big and breaking is going on.

At 3:09 p.m. the bot recognized the apparent death of Quinton Ross (the basketball player) as a breaking news event—there had been eight edits by five editors in three languages. The bot sent a tweet. Twelve minutes later, the page’s information was corrected. But the bot remained silent. No correction. It had shared what it thought was breaking news, and that was that. Like any journalist, these bots can make mistakes.
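
Steiner’s bot is a real, running system; the snippet below is only a schematic reconstruction of the rule Slate describes: flag an event when enough edits, by enough editors, in enough language editions land inside a short window. The thresholds are guesses, not his actual parameters.

```python
from collections import defaultdict
from datetime import timedelta

# Schematic of the heuristic Slate describes for Wikipedia Live Monitor:
# concurrent edits, several editors, several language editions, short window.
# The thresholds are guesses, not Steiner's actual parameters.
WINDOW = timedelta(minutes=15)
MIN_EDITS, MIN_EDITORS, MIN_LANGUAGES = 8, 5, 3

def breaking_articles(edits, now):
    """edits: dicts with 'article', 'editor', 'lang', 'time' (datetime)."""
    recent = [e for e in edits if now - e["time"] <= WINDOW]
    by_article = defaultdict(list)
    for e in recent:
        by_article[e["article"]].append(e)
    alerts = []
    for article, es in by_article.items():
        if (len(es) >= MIN_EDITS
                and len({e["editor"] for e in es}) >= MIN_EDITORS
                and len({e["lang"] for e in es}) >= MIN_LANGUAGES):
            alerts.append(article)  # in the real bot, this is when the tweet goes out
    return alerts
```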

Quick takeaway: Robots, like the humans that program them, are fallible.

Slower, existential takeaway: “How can we instill journalistic ethics in robot reporters?”

As Nicholas Diakopoulos explains in Slate, code transparency is only part of the answer. More important is understanding what he calls the “tuning criteria,” or the inherent biases, that are used to make editorial decisions when algorithms direct the news.

Read through for his excellent take.

(Source: futurescope, via emergentfutures)

Where Robots Create Your Weekly Paper →

Via Nieman Lab:

The Guardian is experimenting in the craft newspaper business and getting some help from robots.

That may sound odd, given that the company prints a daily paper read throughout Britain. A paper staffed by humans. But the company is tinkering with something smaller and more algorithm-driven.

The Guardian has partnered with The Newspaper Club, a company that produces small-run DIY newspapers, to print The Long Good Read, a weekly print product that collects a handful of The Guardian’s best longform stories from the previous seven days. The Newspaper Club runs off a limited number of copies, which are then distributed at another Guardian experiment: a coffee shop in East London. That’s where, on Monday mornings, you’ll find a 24-page tabloid with a simple layout available for free.

On the surface, The Long Good Read has the appeal of being a kind of analog Instapaper for all things Guardian. But the interesting thing is how the paper is produced: robots. Okay, algorithms if you want to be technical — algorithms and programs that both select the paper’s stories and lay them out on the page.

Jemima Kiss, head of technology for The Guardian, said The Long Good Read is another attempt at finding ways to give stories new life beyond the day they’re published: “It’s just a way of reusing that content in a more imaginative way and not getting too hung up on the fact it’s a newspaper.”

Read through to see how it’s done.

Medium is the Message: The Perils of Algorithmic Curation

In an interview on his newest project (the just-over-one-year-old long-form platform Medium), Twitter co-founder Evan Williams shared a few thoughts on the uselessness of general news, and the need for a platform to highlight ideas of lasting import.

TechCrunch reports:

Williams is taking aim squarely at the news industry’s most embarrassing vulnerability: the incessant need to trump up mundane happenings in order to habituate readers into needing news like a daily drug fix.

“News in general doesn’t matter most of the time, and most people would be far better off if they spent their time consuming less news and more ideas that have more lasting import,” he tells me during our interview inside a temporary Market Street office space that’s housing Medium, until the top two floors are ready for his growing team. “Even if it’s fiction, it’s probably better most of the time.”

[…] Instead, Williams argues, citizens should re-calibrate their ravenous appetite for information towards more awe-inspiring content. “Published written ideas and stories are life-changing,” he gushes, recalling his early childhood fascination with books as the motivation to take on the media establishment. The Internet “was freeing that up, that excitement about knowledge that’s inside of books–multiplied and freed and unlocked for the world; and, the world would be better in every way.”

In Williams’s grand vision, the public reads for enlightenment; news takes a backseat directly in proportion to how often it leaves us more informed and inspired.

This is a valid and noble ambition, one that resonates with more than a few people. In a letter to a young journalist, Pulitzer-winning writer Lane DeGregory looks back on her career and says she wishes she had “read more short stories and fewer newspaper articles.”

It also echoes what Maria Popova has been aiming to do with her curatorial interestingness project, Brain Pickings, for years now. Last week, she wrote a must-read piece on tech writer Clive Thompson’s new book, which pushes past “painfully familiar and trite-by-overuse notions like distraction and information overload,” to deeply examine the impact of digital tools. She writes:

Several decades after Vannevar Bush’s now-legendary meditation on how technology will impact our thinking, Thompson reaches even further into the fringes of our cultural sensibility — past the cheap techno-dystopia, past the pollyannaish techno-utopia, and into that intricate and ever-evolving intersection of technology and psychology.

The Problem: Though I’ve been excited about Medium and its potential, I’m inclined to file Williams’s vision for it into the “pollyannaish techno-utopia” bucket that Popova mentions. The impulse behind it (the desire for an antidote to our ravenous appetite for tidbits of useless information) is something I wholeheartedly agree with, but algorithmic curation worries me.

How Medium works:

Traditional news editors stake their reputations on having an intuition for what drives eyeballs to their sites. Editors don’t, however, know whether readers leave more informed.

Williams thinks Medium has an answer: an intelligent algorithm that suggests stories, primarily based on how long users spend reading certain articles (which he’s discussing publicly for the first time). Like Pandora did for music discovery, Medium’s new intelligent curator aims to improve the ol’ human-powered system of manually scrolling through the Internet and asking others what to read.

In the algorithm itself, Medium prioritizes time spent on an article, rather than simple page views. “Time spent is not actually a value in itself, but in a world where people have infinite choices, it’s a pretty good measure if people are getting value,” explains Williams.
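
Williams doesn’t detail the algorithm beyond “time spent,” so the snippet below is only a guess at its general shape: score each story by how much of its expected reading time readers actually give it, rather than by raw views. The 250-words-per-minute figure and the normalization are assumptions of this sketch, not Medium’s.

```python
# Toy version of "time spent" ranking. Medium's real model is not public;
# normalizing by expected read time is an assumption of this sketch.
def rank_stories(stories):
    """stories: dicts with 'title', 'word_count', 'total_seconds_read', 'views'."""
    def score(s):
        expected = s["word_count"] / 250 * 60        # ~250 wpm, in seconds
        per_view = s["total_seconds_read"] / max(s["views"], 1)
        return per_view / expected                   # fraction of the piece actually read
    return sorted(stories, key=score, reverse=True)

stories = [
    {"title": "Long essay", "word_count": 4000, "total_seconds_read": 90000, "views": 120},
    {"title": "Listicle",   "word_count": 600,  "total_seconds_read": 9000,  "views": 800},
]
for s in rank_stories(stories):
    print(s["title"])   # the long essay outranks the heavily viewed listicle
```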

"Time spent" seems like a questionable way to measure value, if "enlightening" content is what Medium wants to put on the screens of readers. As a content-neutral long-form discovery platform, sure, it makes sense. And there isn’t really anything wrong with it either. But touting itself as a solution to our appetite for endless streams of meaningless information seems troubling to me. Here’s why:

A key aspect of Thompson’s argument on the good the internet has done for our brains is that it has given us unprecedented access to one another’s memory stores, which means that our ability to indiscriminately discover information, and to understand the world through it, has expanded infinitely. To oversimplify it: we don’t have to remember as much by ourselves—we simply need to remember where information is stored and how to access it quickly. While the benefits are obvious, the issue is that this hampers creative thought and our ability to make connections.

In light of platforms like Medium, longer isn’t better, especially when the discovery of value is left to machines. Popova excerpts a portion of Thompson’s book in which he explains how an algorithm’s biases exist, but are almost impossible to identify:

The real challenge of using machines for transactive memory lies in the inscrutability of their mechanics. Transactive memory works best when you have a sense of how your partners’ minds work — where they’re strong, where they’re weak, where their biases lie. I can judge that for people close to me. But it’s harder with digital tools, particularly search engines. You can certainly learn how they work and develop a mental model of Google’s biases. … But search companies are for-profit firms. They guard their algorithms like crown jewels. This makes them different from previous forms of outboard memory. A public library keeps no intentional secrets about its mechanisms; a search engine keeps many. On top of this inscrutability, it’s hard to know what to trust in a world of self-publishing. To rely on networked digital knowledge, you need to look with skeptical eyes. It’s a skill that should be taught with the same urgency we devote to teaching math and writing.

Popova explains that without a mental pool of resources from which we can connect existing ideas into new combinations—and I’d add, thereby access, retain, and be “enlightened” by information—our capacity to do so is deflated.

TL;DR: Popova’s piece doesn’t directly address or assess discovery platforms like Medium, but I think it’s worth considering them together. Longer-form writing isn’t an antidote to short bites of information, and ideas of lasting value can’t be judged by time spent consuming them. The point here is that for content platforms that truly seek to give people access to more ideas with more lasting import, a lot more work has to be done, namely: (1) the limitations of algorithmic curation need to be transparent, and talked about, and (2) readers need to be taught how to critically consume self-published writing that they receive through digitally networked knowledge. —Jihii

Can Robots Tell the Truth?

Hi, I am a student in journalism and am preparing an article about robots (like the Washington Post’s Truth Teller) validating facts instead of journalists. I am curious to know the Future Journalism Project’s point of view about this. What are the consequences for journalists, journalism and for democracy? — Melanié Robert

Hi Melanié,

Many thanks for this fascinating question and my apologies for the delay in getting back to you. Here’s what happened:

I started thinking about this, and then I started writing about it. And then I started thinking that what I really needed to do was some reporting. You know, journalism.

I didn’t know much about the Washington Post’s Truth Teller project. For others that don’t, it’s an attempt to create an algorithm that can fact-check political speeches in real time.
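
For the curious, here is one deliberately naive picture of what “fact checking in real time” can mean mechanically: take a transcript as it streams in, look for checkable claims, and match them against a database of statements that have already been verified. This is a sketch of the general idea, not the Post’s Truth Teller code; every claim, verdict and source in it is invented.

```python
# Hypothetical real-time fact-check loop; not the Post's actual Truth Teller code.
# The claims, verdicts and sources below are all invented.
FACT_CHECKS = {
    "unemployment has doubled": ("false", "Bureau of Labor Statistics"),
    "the deficit is shrinking": ("true", "CBO"),
}

def check_sentence(sentence: str):
    s = sentence.lower()
    for claim, (verdict, source) in FACT_CHECKS.items():
        if claim in s:
            return f"CLAIM: '{claim}' -> {verdict.upper()} (per {source})"
    return None

for line in ["They say unemployment has doubled since I took office.",
             "Thank you all for coming tonight."]:
    result = check_sentence(line)
    if result:
        print(result)
```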

Since I didn’t know much about it, I got in touch and interviewed the two project leads: Steven Ginsberg, the Post’s National Political Editor, and Cory Haik, the Post’s Executive Producer for Digital News.

They gave me background on Truth Teller and how it came about, and then where they hope it leads. 

But that doesn’t really get to the sociocultural and philosophical questions you pose. So I called upon someone else. His name is Damon Horowitz.

Damon’s spent his career in both artificial intelligence and philosophy. He’s currently Google’s In-House Philosopher (seriously, it’s on his business card) and Director of Engineering. He also teaches philosophy at Columbia University.

So, after talking to these people and thinking about it some more, I wrote a fair bit.

You can find your answer at theFJP.org and I hope it answers some of what you’re looking for. — Michael

Have a question? Ask Away.

Image: Marvin the Paranoid Android, Hitchhiker’s Guide to the Galaxy.

Miami Herald

The Problem With Contextual Advertising

Image: Screenshot, Miami Herald Home Page, December 16, 2012.

Even robots have biases.

Any decision process, whether human or algorithm, about what to include, exclude, or emphasize — processes of which Google News has many — has the potential to introduce bias. What’s interesting in terms of algorithms though is that the decision criteria available to the algorithm may appear innocuous while at the same time resulting in output that is perceived as biased.

Nick Diakopoulos, Nieman Lab. Understanding bias in computational news media.

Whether the cause is ideological or systematic, the outcome is, for now, the same: algorithms appear to be as biased as editors in their sorting of news. Click the link to read on, and see more about its author here.
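
One way to make Diakopoulos’s point concrete: a criterion as bland as “rank stories by how many outlets are covering them” carries no ideology on its face, yet it will reliably bury original reporting from small outlets under wire copy. A toy illustration, with invented data:

```python
# An innocuous-looking criterion with a skewed outcome: ranking by coverage
# volume reliably favors wire stories over small-outlet originals. Data invented.
stories = [
    {"headline": "Wire story on celebrity verdict", "outlets_covering": 240},
    {"headline": "Local investigation into water contracts", "outlets_covering": 3},
]
for s in sorted(stories, key=lambda s: s["outlets_covering"], reverse=True):
    print(s["outlets_covering"], s["headline"])
```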

We need, in short, to pay attention to the materiality of algorithmic processes. By that, I do not simply mean the materiality of the algorithmic processing (the circuits, server farms, internet cables, super-computers, and so on) but to the materiality of the procedural inputs. To the stuff that the algorithm mashes up, rearranges, and spits out.

CW Anderson, Culture Daily. The Materiality of Algorithms.

In what reads like a starting point for more posts on the subject, CUNY Prof Chris Anderson discusses what documents journalists may want to design algorithms for, and just how hard that task will be.

Algorithms doing magic inside massive data sets and search engines, while not mathematically simple, are generally easy to conceptualize — the algorithm and its data are sitting in the computer, the algorithm sifts through the Excel sheet in the background, and bam! you have something.

But if you’re working with poorly organized documents, it’s difficult to simply plug them in.

Chris writes that the work required to include any document in a set will shape the algorithm that makes sense of the whole bunch. This will be a problem for journalists who want to examine any documents made without much forethought, which is to say: government documents, phone records from different companies and countries, eye witness reports, police sketches, mugshots, bank statements, tax forms, and hundreds of other things worth investigating.

Chris quotes Jonathan Stray’s trouble preparing 4,500 documents on Iraqi security contractors:

The recovered text [from these documents] is a mess, because these documents are just about the worst possible case for OCR [optical character recognition]: many of these documents are forms with a complex layout, and the pages have been photocopied multiple times, redacted, scribbled on, stamped and smudged. But large blocks of text come through pretty well, and this command extracts what text there is into one file per page.
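
Stray’s actual command isn’t reproduced in the excerpt, and the sketch below isn’t it: it is just a generic per-page OCR pass using the pdf2image and pytesseract packages, the kind of step his description implies. Both packages, plus a local Tesseract install, are assumptions of this sketch.

```python
# Generic per-page OCR extraction, an illustration of the step Stray describes
# rather than his actual command. Requires the pdf2image and pytesseract
# packages and a local Tesseract install.
from pathlib import Path

from pdf2image import convert_from_path
import pytesseract

def extract_pages(pdf_path: str, out_dir: str) -> None:
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i, page in enumerate(convert_from_path(pdf_path, dpi=300), start=1):
        text = pytesseract.image_to_string(page)  # messy scans come through imperfectly
        (out / f"page_{i:04d}.txt").write_text(text)

# extract_pages("contractor_reports.pdf", "extracted_text/")  # hypothetical filenames
```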

To read the rest of Stray’s account, see his Overview Project.

And to see more with Chris Anderson, see our recent video interviews with him.

Google News at 10: How the Algorithm Won Over the News Industry →

There is, on the one hand, an incredibly simple explanation for the shift in news organizations’ attitude toward Google: clicks. Google News was founded 10 years ago — September 22, 2002 — and has since functioned not merely as an aggregator of news, but also as a source of traffic to news sites. Google News, its executives tell me, now “algorithmically harvests” articles from more than 50,000 news sources across 72 editions and 30 languages. And Google News-powered results, Google says, are viewed by about 1 billion unique users a week. (Yep, that’s billion with a b.) Which translates, for news outlets overall, to more than 4 billion clicks each month: 1 billion from Google News itself and an additional 3 billion from web search.

As a Google representative put it, “That’s about 100,000 business opportunities we provide publishers every minute.”

Google News automation fail of the day week month ever.

For the unfortunate story, see here.

That the presidency ages people quickly is well documented. In a recent CNN article, Dr. Michael Roizen of the Cleveland Clinic says presidents age twice as fast while in office.

What’s new, as Google’s automated news algorithm illustrates here, is that the presidency also appears to be able to turn a black man into a white man.

Learning something new every day.