Posts tagged algorithms

‘Robot’ to write 1 billion stories in 2014 but will you know it when you see it? | Poynter.

If you’re a human reporter quaking in your boots this week over news of a Los Angeles Times algorithm that wrote the newspaper’s initial story about an earthquake, you might want to cover your ears for this fact:

Software from Automated Insights will generate about 1 billion stories this year — up from 350 million last year, CEO and founder Robbie Allen told Poynter via phone.

FJP: Here’s a ponderable for you.

A few weeks ago, the New York Post reported that Quinton Ross died. Ross, a former Brooklyn Nets basketball player, didn’t know he was dead and soon let people know he was just fine.

“A couple (relatives) already heard it,” Ross told the Associated Press. “They were crying. I mean, it was a tough day, man, mostly for my family and friends… My phone was going crazy. I checked Facebook. Finally, I went on the Internet, and they were saying I was dead. I just couldn’t believe it.”

The original reporter on the story? A robot. Specifically, Wikipedia Live Monitor, created by Google engineer Thomas Steiner.

Slate explains how it happened:

Wikipedia Live Monitor is a news bot designed to detect breaking news events. It does this by listening to the velocity and concurrent edits across 287 language versions of Wikipedia. The theory is that if lots of people are editing Wikipedia pages in different languages about the same event and at the same time, then chances are something big and breaking is going on.

At 3:09 p.m. the bot recognized the apparent death of Quinton Ross (the basketball player) as a breaking news event—there had been eight edits by five editors in three languages. The bot sent a tweet. Twelve minutes later, the page’s information was corrected. But the bot remained silent. No correction. It had shared what it thought was breaking news, and that was that. Like any journalist, these bots can make mistakes.
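The detection logic Slate describes can be sketched in a few lines of Python. This is a toy reconstruction of the idea, not Steiner’s actual code; the threshold values (eight edits, five editors, three languages) are simply lifted from the Ross anecdote above:

```python
from collections import defaultdict

def detect_breaking(edits, min_edits=8, min_editors=5, min_languages=3):
    """Flag a topic as breaking news when concurrent Wikipedia edits
    cross thresholds across language editions. `edits` is a list of
    (topic, editor, language) tuples observed in a time window."""
    by_topic = defaultdict(list)
    for topic, editor, lang in edits:
        by_topic[topic].append((editor, lang))

    breaking = []
    for topic, activity in by_topic.items():
        editors = {e for e, _ in activity}
        languages = {l for _, l in activity}
        # All three thresholds must be met at once: volume, distinct
        # editors, and distinct language editions.
        if (len(activity) >= min_edits
                and len(editors) >= min_editors
                and len(languages) >= min_languages):
            breaking.append(topic)
    return breaking
```

As the Ross story shows, nothing in this logic distinguishes a true event from a coordinated mistake — the bot only measures editing activity, not accuracy.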

Quick takeaway: Robots, like the humans that program them, are fallible.

Slower, existential takeaway: How can we instill journalistic ethics in robot reporters?

As Nicholas Diakopoulos explains in Slate, code transparency is, on its own, an inadequate answer. More important is understanding what he calls the “tuning criteria,” or the inherent biases, that are used to make editorial decisions when algorithms direct the news.

Read through for his excellent take.

Where Robots Create Your Weekly Paper

Via Nieman Lab:

The Guardian is experimenting in the craft newspaper business and getting some help from robots.

That may sound odd, given that the company prints a daily paper read throughout Britain. A paper staffed by humans. But the company is tinkering with something smaller and more algorithm-driven.

The Guardian has partnered with The Newspaper Club, a company that produces small-run DIY newspapers, to print The Long Good Read, a weekly print product that collects a handful of The Guardian’s best longform stories from the previous seven days. The Newspaper Club runs off a limited number of copies, which are then distributed at another Guardian experiment: a coffee shop in East London. That’s where, on Monday mornings, you’ll find a 24-page tabloid with a simple layout available for free.

On the surface, The Long Good Read has the appeal of being a kind of analog Instapaper for all things Guardian. But the interesting thing is how the paper is produced: robots. Okay, algorithms if you want to be technical — algorithms and programs that both select the paper’s stories and lay them out on the page.

Jemima Kiss, head of technology for The Guardian, said The Long Good Read is another attempt at finding ways to give stories new life beyond the day they’re published: “It’s just a way of reusing that content in a more imaginative way and not getting too hung up on the fact it’s a newspaper.”

Read through to see how it’s done.

Medium is the Message: The Perils of Algorithmic Curation

In an interview on his newest project (the just-over-one-year-old long-form platform Medium), Twitter co-founder Evan Williams shared a few thoughts on the uselessness of general news and the need for a platform to highlight ideas of lasting import.

TechCrunch reports:

Williams is taking aim squarely at the news industry’s most embarrassing vulnerability: the incessant need to trump up mundane happenings in order to habituate readers into needing news like a daily drug fix.

“News in general doesn’t matter most of the time, and most people would be far better off if they spent their time consuming less news and more ideas that have more lasting import,” he tells me during our interview inside a temporary Market Street office space that’s housing Medium, until the top two floors are ready for his growing team. “Even if it’s fiction, it’s probably better most of the time.”

[…] Instead, Williams argues, citizens should re-calibrate their ravenous appetite for information towards more awe-inspiring content. “Published written ideas and stories are life-changing,” he gushes, recalling his early childhood fascination with books as the motivation to take on the media establishment. The Internet “was freeing that up, that excitement about knowledge that’s inside of books–multiplied and freed and unlocked for the world; and, the world would be better in every way.”

In Williams’s grand vision, the public reads for enlightenment; news takes a backseat directly in proportion to how often it leaves us more informed and inspired.

This is a valid, and noble, ambition that resonates with more than a few people. In a letter to a young journalist, Pulitzer-winning writer Lane DeGregory looks back on her career and says she wishes she had “read more short stories and fewer newspaper articles.”

It also echoes what Maria Popova has been aiming to do with her curatorial interestingness project, Brain Pickings, for years now. Last week, she wrote a must-read piece on tech writer Clive Thompson’s new book, which pushes past “painfully familiar and trite-by-overuse notions like distraction and information overload,” to deeply examine the impact of digital tools. She writes:

Several decades after Vannevar Bush’s now-legendary meditation on how technology will impact our thinking, Thompson reaches even further into the fringes of our cultural sensibility — past the cheap techno-dystopia, past the pollyannaish techno-utopia, and into that intricate and ever-evolving intersection of technology and psychology.

The Problem: Though I’ve been excited about Medium and its potential, I’m inclined to file Williams’ vision for it into the “pollyannaish techno-utopia” bucket that Popova mentions. The impulse behind it (the desire for an antidote to our ravenous appetite for tidbits of useless information) is something I wholeheartedly agree with. But algorithmic curation worries me.

How Medium works:

Traditional news editors stake their reputations on having an intuition for what drives eyeballs to their sites. Editors don’t, however, know whether readers leave more informed.

Williams thinks Medium has an answer: an intelligent algorithm that suggests stories, primarily based on how long users spend reading certain articles (which he’s discussing publicly for the first time). Like Pandora did for music discovery, Medium’s new intelligent curator aims to improve the ol’ human-powered system of manually scrolling through the Internet and asking others what to read.

In the algorithm itself, Medium prioritizes time spent on an article, rather than simple page views. “Time spent is not actually a value in itself, but in a world where people have infinite choices, it’s a pretty good measure if people are getting value,” explains Williams.
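For contrast, here is a minimal sketch of what ranking by time spent rather than page views might look like. Medium has not published its actual algorithm, so the scoring function and the notion of an “expected” read time below are assumptions for illustration only:

```python
def engagement_score(read_seconds, expected_seconds):
    """Score one reading session by how much of the piece was plausibly
    read, capped at 1.0 so skimming past the end doesn't inflate it.
    A guess at the spirit of the metric, not Medium's formula."""
    return min(read_seconds / expected_seconds, 1.0)

def rank_stories(stories):
    """Rank stories by average engagement across sessions.
    `stories` is a list of (title, session_read_times, expected_seconds)."""
    scored = []
    for title, reads, expected in stories:
        avg = sum(engagement_score(r, expected) for r in reads) / len(reads)
        scored.append((avg, title))
    return [title for _, title in sorted(scored, reverse=True)]
```

Note what this optimizes for: completion, not enlightenment — which is exactly the objection raised below.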

“Time spent” seems like a questionable way to measure value if “enlightening” content is what Medium wants to put on the screens of readers. As a content-neutral long-form discovery platform, sure, it makes sense, and there isn’t really anything wrong with it either. But touting itself as a solution to our appetite for endless streams of meaningless information seems troubling to me. Here’s why:

A key aspect of Thompson’s argument about the good the internet has done for our brains is that it has given us unprecedented access to one another’s memory stores, which means that our ability to indiscriminately discover information, and to understand the world through it, has expanded infinitely. To oversimplify: we don’t have to remember as much by ourselves; we simply need to remember where information is stored and how to access it quickly. While the benefits are obvious, the issue is that this hampers creative thought and our ability to make connections.

In light of platforms like Medium, longer isn’t better, especially when the discovery of value is left to machines. Popova excerpts a portion of Thompson’s book in which he explains how an algorithm’s biases exist, but are almost impossible to identify:

The real challenge of using machines for transactive memory lies in the inscrutability of their mechanics. Transactive memory works best when you have a sense of how your partners’ minds work — where they’re strong, where they’re weak, where their biases lie. I can judge that for people close to me. But it’s harder with digital tools, particularly search engines. You can certainly learn how they work and develop a mental model of Google’s biases. … But search companies are for-profit firms. They guard their algorithms like crown jewels. This makes them different from previous forms of outboard memory. A public library keeps no intentional secrets about its mechanisms; a search engine keeps many. On top of this inscrutability, it’s hard to know what to trust in a world of self-publishing. To rely on networked digital knowledge, you need to look with skeptical eyes. It’s a skill that should be taught with the same urgency we devote to teaching math and writing.

Popova explains that without a mental pool of resources from which we can connect existing ideas into new combinations—and I’d add, thereby access, retain, and be “enlightened” by information—our capacity to do so is deflated.

TL;DR: Popova’s piece doesn’t directly address or assess discovery platforms like Medium, but I think it’s worth considering them together. Longer-form writing isn’t an antidote to short bites of information, and ideas of lasting value can’t be judged by time spent consuming them. The point here is that for content platforms that truly seek to give people access to more ideas of lasting import, a lot more work has to be done, namely: (1) the limitations of algorithmic curation need to be transparent, and talked about; and (2) readers need to be taught how to critically consume self-published writing received through digitally networked knowledge. —Jihii

Can Robots Tell the Truth?

Hi, I am a student in journalism and am preparing an article about robots (like the Washington Post’s Truth Teller) validating facts instead of journalists. I am curious to know the Future Journalism Project’s point of view about this. What are the consequences for journalists, journalism and for democracy? — Melanié Robert

Hi Melanié,

Many thanks for this fascinating question and my apologies for the delay in getting back to you. Here’s what happened:

I started thinking about this, and then I started writing about it. And then I started thinking that what I really needed to do was some reporting. You know, journalism.

I didn’t know much about the Washington Post’s Truth Teller project. For others who don’t, it’s an attempt to create an algorithm that can fact-check political speeches in real time.

Since I didn’t know much about it, I got in touch and interviewed the two project leads: Steven Ginsberg, the Post’s National Political Editor, and Cory Haik, the Post’s Executive Producer for Digital News.

They gave me background on Truth Teller, how it came about, and where they hope it leads.

But that doesn’t really get to the sociocultural and philosophical questions you pose. So I called upon someone else. His name is Damon Horowitz.

Damon’s spent his career in both artificial intelligence and philosophy. He’s currently Google’s In-House Philosopher (seriously, it’s on his business card) and Director of Engineering. He also teaches philosophy at Columbia University.

So, after talking to these people, and thinking about it some more, I wrote a fair bit.

You can find your answer at theFJP.org and I hope it answers some of what you’re looking for. — Michael

Have a question? Ask Away.

Image: Marvin the Paranoid Android, Hitchhiker’s Guide to the Galaxy.


Miami Herald
The Problem With Contextual Advertising
Image: Screenshot, Miami Herald Home Page, December 16, 2012.

Even robots have biases.

Any decision process, whether human or algorithm, about what to include, exclude, or emphasize — processes of which Google News has many — has the potential to introduce bias. What’s interesting in terms of algorithms though is that the decision criteria available to the algorithm may appear innocuous while at the same time resulting in output that is perceived as biased.

Nick Diakopoulos, Nieman Lab. Understanding bias in computational news media.

Whether the cause is ideological or systematic, the outcome is, for now, the same: algorithms appear to be as biased as editors in their sorting of news. Click the link to read on, and see more about its author here.
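Diakopoulos’s point — that individually innocuous criteria can add up to biased output — is easy to demonstrate with a toy ranker. Each weight below looks neutral in isolation, yet the popularity term systematically buries smaller outlets covering the same story. The field names and weights here are invented for the example, not drawn from Google News:

```python
def rank_articles(articles, recency_weight=0.5, popularity_weight=0.5):
    """Rank articles by a blend of freshness and outlet traffic
    (both normalized to [0, 1]). Neither criterion mentions outlet
    size, but the popularity term encodes it anyway."""
    def score(article):
        return (recency_weight * article["freshness"]
                + popularity_weight * article["outlet_traffic"])
    return sorted(articles, key=score, reverse=True)
```

Run it on a fresher local scoop versus a staler rewrite from a high-traffic outlet, and the big outlet wins — an output readers may perceive as bias even though no single decision criterion looks biased.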

We need, in short, to pay attention to the materiality of algorithmic processes. By that, I do not simply mean the materiality of the algorithmic processing (the circuits, server farms, internet cables, super-computers, and so on) but to the materiality of the procedural inputs. To the stuff that the algorithm mashes up, rearranges, and spits out.

CW Anderson, Culture Daily. The Materiality of Algorithms.

In what reads like a starting point for more posts on the subject, CUNY Prof Chris Anderson discusses what documents journalists may want to design algorithms for, and just how hard that task will be.

Algorithms doing magic inside massive data sets and search engines, while not mathematically simple, are generally easy to conceptualize — the algorithm and its data sit in the computer, the algorithm sifts through the spreadsheet in the background, and bam! You have something.

But if you’re working with poorly organized documents, it’s difficult to simply plug them in.

Chris writes that the work required to include any document in a set will shape the algorithm that makes sense of the whole bunch. This will be a problem for journalists who want to examine documents made without much forethought, which is to say: government documents, phone records from different companies and countries, eyewitness reports, police sketches, mugshots, bank statements, tax forms, and hundreds of other things worth investigating.

Chris quotes Jonathan Stray on his trouble preparing 4,500 documents on Iraqi security contractors:

The recovered text [from these documents] is a mess, because these documents are just about the worst possible case for OCR [optical character recognition]: many of these documents are forms with a complex layout, and the pages have been photocopied multiple times, redacted, scribbled on, stamped and smudged. But large blocks of text come through pretty well, and this command extracts what text there is into one file per page.

To read the rest of Stray’s account, see his Overview Project.
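One concrete flavor of the triage Stray describes — keeping the blocks that “come through pretty well” and discarding OCR garbage — can be approximated with a crude ratio filter. The thresholds below are arbitrary placeholders for illustration, not anything from Stray’s actual pipeline:

```python
def keep_recoverable(ocr_text, min_ratio=0.7, min_length=20):
    """Keep only lines of raw OCR output that look like clean prose:
    long enough to matter and mostly letters, digits, and spaces.
    Smudge-and-redaction noise tends to OCR into short runs of
    punctuation, which this filter drops."""
    kept = []
    for line in ocr_text.splitlines():
        stripped = line.strip()
        if len(stripped) < min_length:
            continue  # fragments too short to be useful prose
        clean = sum(c.isalnum() or c.isspace() for c in stripped)
        if clean / len(stripped) >= min_ratio:
            kept.append(stripped)
    return "\n".join(kept)
```

A filter like this throws away real content along with the noise, which is precisely Anderson’s point: the cleanup work itself shapes what the downstream algorithm can see.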

And to see more with Chris Anderson, see our recent video interviews with him.

Google News at 10: How the Algorithm Won Over the News Industry

There is, on the one hand, an incredibly simple explanation for the shift in news organizations’ attitude toward Google: clicks. Google News was founded 10 years ago — September 22, 2002 — and has since functioned not merely as an aggregator of news, but also as a source of traffic to news sites. Google News, its executives tell me, now “algorithmically harvests” articles from more than 50,000 news sources across 72 editions and 30 languages. And Google News-powered results, Google says, are viewed by about 1 billion unique users a week. (Yep, that’s billion with a b.) Which translates, for news outlets overall, to more than 4 billion clicks each month: 1 billion from Google News itself and an additional 3 billion from web search.

As a Google representative put it, “That’s about 100,000 business opportunities we provide publishers every minute.”

Google News automation fail of the day… week… month… ever.

For the unfortunate story, see here.

That the presidency ages people quickly is well documented. In a recent CNN article, Dr. Michael Roizen of the Cleveland Clinic says presidents age twice as fast while in office.

What’s new, as Google’s automated news algorithm illustrates here, is that the presidency also appears to be able to turn a black man into a white man.

Learning something new every day.

For content farms, the Panda doesn’t play nice.

You can’t mess with Google forever. In February, the corporation concocted what it concocts best: an algorithm. The algorithm, called Panda, affects some 12 percent of searches, and it has — slowly and imperfectly — been improving things. Just a short time ago, the Web seemed ungovernable; bad content was driving out good. But Google asserted itself, and credit is due: Panda represents good cyber-governance. It has allowed Google to send untrustworthy, repetitive and unsatisfying content to the back of the class. No more A’s for cheaters.

The $23,698,655.93 Book

A post-doctoral student at UC Berkeley wanted to buy a copy of Peter Lawrence’s The Making of a Fly, so he did what a lot of people do and went to Amazon.

There he found copies on sale for anywhere from $35.54 to $1,730,045.91 (plus shipping).

What gives?

Michael Eisen explains:

On the day we discovered the million dollar prices, the copy offered by bordeebook was 1.270589 times the price of the copy offered by profnath. And now the bordeebook copy was 1.270589 times profnath again. So clearly at least one of the sellers was setting their price algorithmically in response to changes in the other’s price. I continued to watch carefully and the full pattern emerged.

Once a day profnath set their price to be 0.9983 times bordeebook’s price. The prices would remain close for several hours, until bordeebook “noticed” profnath’s change and elevated their price to 1.270589 times profnath’s higher price. The pattern continued perfectly for the next week…

…As I amusedly watched the price rise every day, I learned that Amazon retailers are increasingly using algorithmic pricing (something Amazon itself does on a large scale), with a number of companies offering pricing algorithms/services to retailers. Both profnath and bordeebook were clearly using automatic pricing – employing algorithms that didn’t have a built-in sanity check on the prices they produced. But the two retailers were clearly employing different strategies.

Closing price before the “mistake” was found: $23,698,655.93. 
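The two pricing rules Eisen reverse-engineered are simple enough to replay. Because the combined daily multiplier (0.9983 × 1.270589 ≈ 1.268) is greater than one, the prices compound upward with no sanity check, exactly as he observed. The starting prices below are illustrative:

```python
def simulate_pricing(days, profnath=35.54, bordeebook=45.16):
    """Replay the feedback loop: once a day profnath undercuts
    bordeebook by a hair (x0.9983), then bordeebook marks up
    profnath's new price (x1.270589). Neither rule checks whether
    the resulting price makes any sense."""
    history = []
    for _ in range(days):
        profnath = 0.9983 * bordeebook
        bordeebook = 1.270589 * profnath
        history.append((round(profnath, 2), round(bordeebook, 2)))
    return history
```

Run it for a couple of months of simulated days and a $45 textbook is priced in the tens of millions — a reminder that "algorithmic" does not mean "sane" unless someone codes the sanity in.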

Bloomberg: Personalizing the News for 20 Million People - GigaOm

The approach that most traditional news services take, Krim said — in which editors select and present the news that they think matters most to a generic reader — “doesn’t really scale very well.” But by using analytical tools on the data about those web visitors and their reading patterns and usage, Krim said that Bloomberg can “present 20 million different views of that information.” The company is also trying to take into account the differences in how users want to receive their news during the day, including whether they want content as text they can read on their laptop or mobile, or video they can watch, and so on.

The company now collects over 100 data points for every page a reader loads, based on what they interact with, what time of day it is, etc. — more than a terabyte of data every day in aggregate, Krim said — and the team has 15 different algorithms running in parallel to make recommendations for what that reader might want to see next.

“We started studying the behavior of decision makers who come to our site,” Krim said, “and we noticed that there are a number of different usage curves of news… TV is kind of a U-shape during the day, web usage is like an arc, mobile is like an oscillating curve, magazines ramp up during the day, and newspapers obviously ramp down during the day.” What Bloomberg Digital is trying to do, he said, is to understand how people move from one to the other, and then present information to them in the way that they want.
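The “15 different algorithms running in parallel” pattern Krim describes is, at its simplest, a weighted blend of independent recommenders scored over the same reader context. The sketch below is a generic illustration of that pattern, not Bloomberg’s system; the recommender functions and weights are assumptions:

```python
def blend_recommendations(recommenders, context, weights=None):
    """Run several recommenders over the same reader context and
    merge their scored suggestions into one ranked list. Each
    recommender maps a context dict to {item: score}."""
    weights = weights or [1.0] * len(recommenders)
    combined = {}
    for recommend, weight in zip(recommenders, weights):
        for item, score in recommend(context).items():
            combined[item] = combined.get(item, 0.0) + weight * score
    # Highest combined score first.
    return sorted(combined, key=combined.get, reverse=True)
```

In a real system the weights themselves would be tuned from the behavioral data Krim mentions — which is where the editorial judgment quietly moves into the algorithm.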

LinkedIn Gets Newsy

LinkedIn launched a new product yesterday called LinkedIn Today.

Much like Paper.li and The Tweeted Times, the service extracts links submitted to it (in this case via Twitter and LinkedIn) and lays out articles in an eye-pleasing manner.

The benefit for users is great: you can drill down deeply into the day’s news in the industries that interest you.

For LinkedIn? They do have an IPO coming up. Looks like they’re adding layers of new tools to make their offering more attractive. If they nail it, the hope is that the site becomes a daily stop for business users.

Via Fast Company:

At the moment, LinkedIn Today only has 22 industries to choose from, but the product will expand to include more of the service’s 115 listed industries. One example: “The agriculture industry is not sharing enough content to build a compelling product, but we hope over time it will,” explains Liz Walker, Product Manager of LinkedIn Today. Eventually, users will also have the option of searching by different cuts of data—i.e. what CEOs in the Bay Area are reading, or what Product Managers at LinkedIn are looking at.
