Posts tagged with ‘algorithms’
Margaret Sarlej, PhD candidate at the University of New South Wales, to Phys.org: Computer writes its own fables.
We’ve written before about robots writing the news; now they’re writing fables.
Sarlej has written an application that takes 22 identified emotions used in fables, mixes and matches them with a plot, and pops out a written story.
Easier said than done.
Via The Guardian:
Breaking stories down for a computer “involves not only encoding story elements like characters, events, and plot, but also the ‘common sense’ people take for granted”, said Sarlej. Telling a story is simple enough for a child to do, but stories are actually “incredibly complex”.
"For example, if Bob gives Alice an apple, Alice will have the apple, and Bob will not. To a person, that’s obvious, and doesn’t require explanation. If Bob punches Carl, people would generally assume Carl will be unhappy about it, but a computer doesn’t have the ‘common sense’ to make such an inference. In a computer programme, details like this must be explicitly spelled out," she said.
Current results are fairly rudimentary but, according to Sarlej’s supervisor, computers “will be making interesting and meaningful contributions to literature within the next decade.”
Sverker Johansson could be the most prolific author you’ve never heard of.
Volunteering his time over the past seven years publishing to Wikipedia, the 53-year-old Swede can take credit for 2.7 million articles, or 8.5% of the entire collection, according to Wikimedia analytics, which measures the site’s traffic. His stats far outpace any other user, the group says.
He has been particularly prolific cataloging obscure animal species, including butterflies and beetles, and is proud of his work highlighting towns in the Philippines. About one-third of his entries are uploaded to the Swedish language version of Wikipedia, and the rest are composed in two versions of Filipino, one of which is his wife’s native tongue.
An administrator holding degrees in linguistics, civil engineering, economics and particle physics, he says he has long been interested in “the origin of things, oh, everything.”
It isn’t uncommon, however, for Wikipedia purists to complain about his method. That is because the bulk of his entries have been created by a computer software program—known as a bot. Critics say bots crowd out the creativity only humans can generate.
Mr. Johansson’s program scrubs databases and other digital sources for information, and then packages it into an article. On a good day, he says his “Lsjbot” creates up to 10,000 new entries.
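In outline, a bot like this is a pipeline from structured records to templated sentences. Here’s a rough sketch of the approach; the records, field names, and template are invented placeholders, not Lsjbot’s actual code.

```python
# A rough sketch of a database-to-stub bot: pull structured records and
# pour each one into a sentence template. All data below is placeholder.

species_records = [
    {"name": "Examplea prima", "family": "Exampleidae",
     "described_by": "A. Naturalist", "year": 1901},
    {"name": "Examplea secunda", "family": "Exampleidae",
     "described_by": "B. Naturalist", "year": 1913},
]

def make_stub(record):
    """Render one database record as a one-sentence article stub."""
    return (
        f"{record['name']} is a species of beetle in the family "
        f"{record['family']}, first described by {record['described_by']} "
        f"in {record['year']}."
    )

for record in species_records:
    print(make_stub(record))
```

Run against a large enough taxonomy database, a template this simple really can produce thousands of formally correct stubs a day, which is precisely what the critics object to.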
That’s one way to go about it. Some Wikipedia editors aren’t happy about it, though.
If you’re a human reporter quaking in your boots this week over news of a Los Angeles Times algorithm that wrote the newspaper’s initial story about an earthquake, you might want to cover your ears for this fact:
Software from Automated Insights will generate about 1 billion stories this year — up from 350 million last year, CEO and founder Robbie Allen told Poynter via phone.
FJP: Here’s a ponderable for you.
A few weeks ago, the New York Post reported that Quinton Ross died. Ross, a former Brooklyn Nets basketball player, didn’t know he was dead and soon let people know he was just fine.
"A couple (relatives) already heard it," Ross told the Associated Press. “They were crying. I mean, it was a tough day, man, mostly for my family and friends… My phone was going crazy. I checked Facebook. Finally, I went on the Internet, and they were saying I was dead. I just couldn’t believe it.”
The original reporter on the story? A robot. Specifically, Wikipedia Live Monitor, created by Google engineer Thomas Steiner.
Slate explains how it happened:
Wikipedia Live Monitor is a news bot designed to detect breaking news events. It does this by listening to the velocity and concurrent edits across 287 language versions of Wikipedia. The theory is that if lots of people are editing Wikipedia pages in different languages about the same event and at the same time, then chances are something big and breaking is going on.
At 3:09 p.m. the bot recognized the apparent death of Quinton Ross (the basketball player) as a breaking news event—there had been eight edits by five editors in three languages. The bot sent a tweet. Twelve minutes later, the page’s information was corrected. But the bot remained silent. No correction. It had shared what it thought was breaking news, and that was that. Like any journalist, these bots can make mistakes.
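The core heuristic Slate describes is easy to sketch: flag a story as breaking when enough distinct editors make enough edits across enough language editions in a short window. The thresholds below are illustrative guesses, not Steiner’s real values.

```python
# A toy version of the monitor's heuristic as Slate describes it.
# Thresholds are illustrative; the real system's values may differ.

def looks_like_breaking_news(edits, min_edits=5, min_editors=3, min_languages=2):
    """edits: list of (editor, language) tuples seen in the current window."""
    editors = {editor for editor, _ in edits}
    languages = {lang for _, lang in edits}
    return (
        len(edits) >= min_edits
        and len(editors) >= min_editors
        and len(languages) >= min_languages
    )

# The Quinton Ross case: eight edits by five editors in three languages.
window = [
    ("ed1", "en"), ("ed2", "en"), ("ed3", "en"), ("ed1", "en"),
    ("ed4", "es"), ("ed5", "es"), ("ed4", "fr"), ("ed5", "fr"),
]
print(looks_like_breaking_news(window))  # True -- and still possibly wrong
```

Note what the heuristic measures: agreement among editors, not truth. Five people repeating the same error looks identical to five people confirming a fact, which is how the bot came to report a living man dead.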
Quick takeaway: Robots, like the humans who program them, are fallible.
Slower, existential takeaway: “How can we instill journalistic ethics in robot reporters?”
As Nicholas Diakopoulos explains in Slate, code transparency is only part of the answer, and an inadequate one on its own. More important is understanding what he calls the “tuning criteria,” or inherent biases, that shape editorial decisions when algorithms direct the news.
Read through for his excellent take.
Via Nieman Lab:
The Guardian is experimenting in the craft newspaper business and getting some help from robots.
That may sound odd, given that the company prints a daily paper read throughout Britain. A paper staffed by humans. But the company is tinkering with something smaller and more algorithm-driven.
The Guardian has partnered with The Newspaper Club, a company that produces small-run DIY newspapers, to print The Long Good Read, a weekly print product that collects a handful of The Guardian’s best longform stories from the previous seven days. The Newspaper Club runs off a limited number of copies, which are then distributed at another Guardian experiment: a coffee shop in East London. That’s where, on Monday mornings, you’ll find a 24-page tabloid with a simple layout available for free.
On the surface, The Long Good Read has the appeal of being a kind of analog Instapaper for all things Guardian. But the interesting thing is how the paper is produced: robots. Okay, algorithms if you want to be technical — algorithms and programs that both select the paper’s stories and lay them out on the page.
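The selection step might look something like the sketch below: from the week’s pieces, keep the longform candidates and rank them to fill a small tabloid. Every field and threshold here is invented for illustration; The Guardian hasn’t published its pipeline in this detail.

```python
# A hypothetical story-selection step for a weekly longform digest.
# Fields and thresholds are invented, not The Guardian's actual pipeline.

def pick_stories(stories, issue_size=8, min_words=2000):
    longform = [s for s in stories if s["words"] >= min_words]
    # Favor the pieces readers actually stayed with, longest read first.
    longform.sort(key=lambda s: s["avg_read_seconds"], reverse=True)
    return longform[:issue_size]

week = [
    {"title": "A", "words": 4200, "avg_read_seconds": 610},
    {"title": "B", "words": 900,  "avg_read_seconds": 45},
    {"title": "C", "words": 3100, "avg_read_seconds": 540},
]
print([s["title"] for s in pick_stories(week)])  # ['A', 'C']
```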
Jemima Kiss, head of technology for The Guardian, said The Long Good Read is another attempt at finding ways to give stories new life beyond the day they’re published: “It’s just a way of reusing that content in a more imaginative way and not getting too hung up on the fact it’s a newspaper.”
In an interview about his newest project, the just-over-a-year-old longform platform Medium, Twitter co-founder Evan Williams shared a few thoughts on the uselessness of general news and the need for a platform that highlights ideas of lasting import.
Williams is taking aim squarely at the news industry’s most embarrassing vulnerability: the incessant need to trump up mundane happenings in order to habituate readers into needing news like a daily drug fix.
“News in general doesn’t matter most of the time, and most people would be far better off if they spent their time consuming less news and more ideas that have more lasting import,” he tells me during our interview inside a temporary Market Street office space that’s housing Medium, until the top two floors are ready for his growing team. “Even if it’s fiction, it’s probably better most of the time.”
[…] Instead, Williams argues, citizens should re-calibrate their ravenous appetite for information towards more awe-inspiring content. “Published written ideas and stories are life-changing,” he gushes, recalling his early childhood fascination with books as the motivation to take on the media establishment. The Internet “was freeing that up, that excitement about knowledge that’s inside of books–multiplied and freed and unlocked for the world; and, the world would be better in every way.”
In Williams’s grand vision, the public reads for enlightenment; news takes a backseat directly in proportion to how often it leaves us more informed and inspired.
This is a valid and genuinely noble ambition, one that resonates with more than a few people. In a letter to a young journalist, Pulitzer Prize-winning writer Lane DeGregory looks back on her career and says she wishes she had “read more short stories and fewer newspaper articles.”
It also echoes what Maria Popova has been aiming to do with her curatorial interestingness project, Brain Pickings, for years now. Last week, she wrote a must-read piece on tech writer Clive Thompson’s new book, which pushes past “painfully familiar and trite-by-overuse notions like distraction and information overload,” to deeply examine the impact of digital tools. She writes:
Several decades after Vannevar Bush’s now-legendary meditation on how technology will impact our thinking, Thompson reaches even further into the fringes of our cultural sensibility — past the cheap techno-dystopia, past the pollyannaish techno-utopia, and into that intricate and ever-evolving intersection of technology and psychology.
The Problem: Though I’ve been excited about Medium and its potential, I’m inclined to file Williams’ vision for it into the “pollyannaish techno-utopia” bucket that Popova mentions. The impulse behind it, the desire for an antidote to our ravenous appetite for tidbits of useless information, is something I wholeheartedly agree with. But algorithmic curation worries me.
Traditional news editors stake their reputations on having an intuition for what drives eyeballs to their sites. Editors don’t, however, know whether readers leave more informed.
Williams thinks Medium has an answer: an intelligent algorithm that suggests stories, primarily based on how long users spend reading certain articles (which he’s discussing publicly for the first time). Like Pandora did for music discovery, Medium’s new intelligent curator aims to improve the ol’ human-powered system of manually scrolling through the Internet and asking others what to read.
In the algorithm itself, Medium prioritizes time spent on an article, rather than simple page views. “Time spent is not actually a value in itself, but in a world where people have infinite choices, it’s a pretty good measure if people are getting value,” explains Williams.
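As a sketch of the idea Williams describes, here’s what ranking by engaged time rather than raw clicks might look like. The normalization is my guess at the concept, not Medium’s actual algorithm: reward stories people finish, not stories people merely open.

```python
# Scoring by engaged time instead of page views, in the spirit of what
# Williams describes. This is a guess at the idea, not Medium's algorithm.

def engagement_score(total_read_seconds, views, expected_read_seconds):
    if views == 0:
        return 0.0
    avg_read = total_read_seconds / views
    # A story skimmed for 10 seconds scores low even with huge traffic;
    # a story read nearly to completion scores high with modest traffic.
    return min(avg_read / expected_read_seconds, 1.0)

clickbait = engagement_score(total_read_seconds=2_000_000, views=200_000,
                             expected_read_seconds=420)
longread = engagement_score(total_read_seconds=190_000, views=500,
                            expected_read_seconds=420)
print(clickbait, longread)  # ~0.024 vs ~0.905
```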
"Time spent" seems like a questionable way to measure value, if "enlightening" content is what Medium wants to put on the screens of readers. As a content-neutral long-form discovery platform, sure, it makes sense. And there isn’t really anything wrong with it either. But touting itself as a solution to our appetite for endless streams of meaningless information seems troubling to me. Here’s why:
A key aspect of Thompson’s argument about the good the internet has done for our brains is that it has given us unprecedented access to one another’s memory stores, which means our ability to indiscriminately discover information, and to understand the world through it, has expanded enormously. To oversimplify: we don’t have to remember as much by ourselves; we simply need to remember where information is stored and how to access it quickly. While the benefits are obvious, the issue is that this hampers creative thought and our ability to make connections.
On platforms like Medium, longer isn’t necessarily better, especially when judgments of value are left to machines. Popova excerpts a portion of Thompson’s book in which he explains how an algorithm’s biases exist but are almost impossible to identify:
The real challenge of using machines for transactive memory lies in the inscrutability of their mechanics. Transactive memory works best when you have a sense of how your partners’ minds work — where they’re strong, where they’re weak, where their biases lie. I can judge that for people close to me. But it’s harder with digital tools, particularly search engines. You can certainly learn how they work and develop a mental model of Google’s biases. … But search companies are for-profit firms. They guard their algorithms like crown jewels. This makes them different from previous forms of outboard memory. A public library keeps no intentional secrets about its mechanisms; a search engine keeps many. On top of this inscrutability, it’s hard to know what to trust in a world of self-publishing. To rely on networked digital knowledge, you need to look with skeptical eyes. It’s a skill that should be taught with the same urgency we devote to teaching math and writing.
Popova explains that without a mental pool of resources from which we can connect existing ideas into new combinations—and, I’d add, thereby access, retain, and be “enlightened” by information—our capacity to do so is diminished.
TL;DR: Popova’s piece doesn’t directly address or assess discovery platforms like Medium, but I think it’s worth considering them together. Longer-form writing isn’t an antidote to short bites of information, and ideas of lasting value can’t be judged by the time spent consuming them. The point here is that content platforms that truly seek to give people access to more ideas of lasting import have a lot more work to do, namely: (1) the limitations of algorithmic curation need to be transparent, and talked about, and (2) readers need to be taught how to critically consume self-published writing that they receive through digitally networked knowledge. —Jihii
Any decision process, whether human or algorithm, about what to include, exclude, or emphasize — processes of which Google News has many — has the potential to introduce bias. What’s interesting in terms of algorithms though is that the decision criteria available to the algorithm may appear innocuous while at the same time resulting in output that is perceived as biased.
CW Anderson, Culture Digitally. The Materiality of Algorithms.
In what reads like a starting point for more posts on the subject, CUNY professor Chris Anderson discusses which documents journalists may want to design algorithms for, and just how hard that task will be.
Algorithms doing magic inside massive data sets and search engines, while not mathematically simple, are generally easy to conceptualize: the algorithm and its data sit in the computer, the algorithm sifts through the spreadsheet in the background, and bam! you have something.
But if you’re working with poorly organized documents, it’s difficult to simply plug them in.
Chris writes that the work required to include any document in a set will shape the algorithm that makes sense of the whole bunch. This will be a problem for journalists who want to examine documents made without much forethought, which is to say: government documents, phone records from different companies and countries, eyewitness reports, police sketches, mugshots, bank statements, tax forms, and hundreds of other things worth investigating.
The recovered text [from these documents] is a mess, because these documents are just about the worst possible case for OCR [optical character recognition]: many of these documents are forms with a complex layout, and the pages have been photocopied multiple times, redacted, scribbled on, stamped and smudged. But large blocks of text come through pretty well, and this command extracts what text there is into one file per page.
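For a sense of what that extraction step looks like, here’s a minimal sketch using the open-source Tesseract engine via the pytesseract library. It’s an illustration of the approach, not Stray’s actual command, and the directory layout is assumed.

```python
# A minimal sketch of page-by-page OCR extraction: run Tesseract over
# scanned pages and write one text file per page. Illustrative only;
# not Stray's actual command. Assumes page images live in pages/*.png.

import glob
from PIL import Image
import pytesseract

for path in sorted(glob.glob("pages/*.png")):
    text = pytesseract.image_to_string(Image.open(path))
    # Complex forms, smudges, and redactions will garble much of this,
    # but large clean blocks of text usually come through.
    with open(path.replace(".png", ".txt"), "w") as out:
        out.write(text)
```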
To read the rest of Stray’s account, see his Overview Project.
And to see more with Chris Anderson, see our recent video interviews with him.
There is, on the one hand, an incredibly simple explanation for the shift in news organizations’ attitude toward Google: clicks. Google News was founded 10 years ago — September 22, 2002 — and has since functioned not merely as an aggregator of news, but also as a source of traffic to news sites. Google News, its executives tell me, now “algorithmically harvests” articles from more than 50,000 news sources across 72 editions and 30 languages. And Google News-powered results, Google says, are viewed by about 1 billion unique users a week. (Yep, that’s billion with a b.) Which translates, for news outlets overall, to more than 4 billion clicks each month: 1 billion from Google News itself and an additional 3 billion from web search.
As a Google representative put it, “That’s about 100,000 business opportunities we provide publishers every minute.”
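The arithmetic behind that soundbite checks out, assuming a 30-day month:

```python
# Checking the representative's claim: 4 billion clicks a month is
# roughly 100,000 per minute (assuming a 30-day month).

clicks_per_month = 4_000_000_000
minutes_per_month = 30 * 24 * 60             # 43,200
print(clicks_per_month / minutes_per_month)  # ~92,593, i.e. about 100,000
```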