Posts tagged data

Fortunately, my polling place is around the corner from my apartment.

Not quite sure where yours is? There’s a Web site for that.

Geeky stuff: Fun(ny) design aside, the site pulls data from the Google Civic Information API.
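
For the curious, here is a minimal sketch of what a polling-place lookup against the Civic Information API might look like. It is not the site's actual code. The endpoint URL, the `address` and `key` parameters and the `pollingLocations` response shape are assumptions based on the API's public voter-info lookup, and the sample response is made up. No request is sent; the sketch only builds the URL and formats a decoded response.

```python
from urllib.parse import urlencode

# Assumed endpoint for the Civic Information API's voter-info lookup.
VOTERINFO_URL = "https://www.googleapis.com/civicinfo/v2/voterinfo"


def build_voterinfo_url(address, api_key):
    """Return the lookup URL for a street address (no request is sent)."""
    query = urlencode({"address": address, "key": api_key})
    return f"{VOTERINFO_URL}?{query}"


def polling_place_lines(response):
    """Turn a decoded JSON response into one readable line per polling place."""
    lines = []
    for place in response.get("pollingLocations", []):
        addr = place.get("address", {})
        # Assumed field names; missing fields are simply skipped.
        parts = [addr.get(k) for k in ("locationName", "line1", "city", "state", "zip")]
        lines.append(", ".join(p for p in parts if p))
    return lines


# Hypothetical response fragment, for illustration only.
sample = {
    "pollingLocations": [
        {"address": {"locationName": "Example School", "line1": "1 Main St",
                     "city": "Springfield", "state": "NY", "zip": "00000"}}
    ]
}

print(build_voterinfo_url("1 Main St, Springfield NY", "YOUR_API_KEY"))
print(polling_place_lines(sample))
```

Keeping the URL building and the response formatting in separate functions means either half can be swapped out if the real API differs from these assumptions.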

Nate Silver on the Colbert Report

The New York Times’s Nate Silver, creator of the influential 538 election forecasting blog, talks pundits versus statistics, and how probability drives his forecasting methodology. 

He has no love for pundits, and says that given the choice between them and Ebola, he’d go with Ebola.

Bonus: Want more on electoral polling? Jihii has a great piece on what it all means, and where it can go so wrong.

Gendered News

From entertainment to finance to politics to sports, the Guardian Datablog explores how women and men are published in leading UK news sources, and how often articles by gender are shared across social networks.

In the interactive they’ve produced, you can sort across different criteria as well as drill deeper into specific publications and their sections.

At a macro level, UK news publishing is much like what we see in the United States: it’s dominated by men with less than 30% of news articles published by women across the Daily Mail, Telegraph and Guardian.

Drill down a bit, or look at gender participation by subject area, and you see women dominating topics like “lifestyle” and “entertainment” and men dominating, well, most everything else.

But the Datablog isn’t just looking at who gets published, but who gets heard.

You would think it’s one and the same but with the decline of the newspaper front page — and the Web site home page — as a conversation driver, it’s the social ecosystem of readers and their sharing habits that drives audience engagement and interaction.

Via the Guardian:

Online, who gets heard is determined by an ecosystem of actors: individuals sharing on Facebook and Twitter, link-sharing communities, personal algorithms on Google News, and citizen media curators. Newspapers only offer part of the information supply; we readers decide who’s heard every time we click, share or use our own voice…

…Of course, the reach of an article is much more complicated than likes and shares. What gets seen is often dependent on the time of day and the influence of who shares a link.

The definition of likes and shares also changes. Since our measurements in early August, Facebook’s counters have been changed to track links sent within private messages. This year, newsrooms experimented with Facebook social readers and tablet apps to grow their audiences. Bernhard Rieder’s network diagram of the Guardian’s Facebook page illustrates yet another social channel for news. Publishers sometimes can’t agree on what their own data means.

Despite these limitations, data on likes and shares offer the best outside picture of audience interest in women’s writing in the news.

Read through for analysis and more about the methodology and tools used to suss out the data. As usual, the Guardian also lets you download the data so you can work with it yourself.

Image: Screenshot, UK News Gender Ranking: What They Publish vs What Readers Share, via The Guardian. Select to embiggen.

Nulpunt to Give Freedom of Information Some Digital Grunt

Every good design project starts with a problem, and one of the biggest is how to find the key facts in a sea of data. 

A design studio in Amsterdam called Metahaven is developing a product called Nulpunt to do two things: Firstly, it tells journalists and activists when their government has published a document holding information they care about, and secondly it lets users highlight, annotate and share the important sections.

Metahaven say that Nulpunt will integrate with the new freedom of information laws the Netherlands is drafting. The new legislation will demand the publication of vastly more documents produced by government, the public service or private companies working on publicly funded projects.

It’s great for transparency in theory, but assuming the laws pass and aren’t hobbled on the way through, the FOI “problem” will no longer be about scarcity. It’ll be about abundance: how to organize and sift through a vast sea of data. That’s the problem Metahaven is aiming to solve with Nulpunt, using two key digital characteristics: personalization and socialization.

They’re not the only people attacking the problem space: if you’ve got yourself a huge document dump you can use DocumentCloud to automatically ‘read’ the files for key facts, subjects and dates, or turn to The Overview Project to get a kind of visual table of contents.

The point of difference for Nulpunt, assuming it gets a release, seems to be that it’s designed to integrate with a specific source of information, namely the Dutch government. Metahaven are keen to launch Nulpunt in more countries, although they have also said Nulpunt will not always be non-profit and commercial-free, which is a tough business model to scale.

There’s more on the product at FastCompany Design and The Verge.

Mapping Gender Income Inequality

A collaboration between Slate and the New America Foundation. The interactive visualization was created using MapBox.

Via Slate:

Women in Utah have it the worst. There, the average working woman makes 55 cents for every dollar the average working man makes. The state is followed closely by Wyoming, at 56 cents; Louisiana, at 59 cents; North Dakota, at 62 cents; and Michigan, at 62 cents. The best states for income equality are Hawaii, Florida, Nevada, Maryland, and North Carolina. In each, women make about three-fourths of what men make.

County-level data illustrate the best cities for pay equality: Washington, D.C. and Dallas lead, followed by San Francisco, Los Angeles, Austin, Santa Fe, New York, and Boston. In each, women make at least 80 cents per dollar that men make. In most other major cities, they make about 70 cents.

For a biggie version, see Slate, Map Shows the Worst State for Women To Make Money.
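
The “cents on the dollar” figures boil down to a simple ratio. Here is a minimal sketch of that arithmetic. The function name and the incomes are made up for illustration, and the map’s exact method isn’t spelled out here.

```python
def cents_on_the_dollar(women_income, men_income):
    """Women's income expressed as cents per dollar of men's income."""
    if men_income <= 0:
        raise ValueError("men_income must be positive")
    return round(100 * women_income / men_income)


# Hypothetical incomes chosen for illustration; not figures from the Slate map.
print(cents_on_the_dollar(27_500, 50_000))  # 55
print(cents_on_the_dollar(30_000, 40_000))  # 75
```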

Mapping Conflict

Conflict History maps the world’s wars and skirmishes over the millennia. Users control the map with a timeline scrubber or by entering search terms. Data is pulled from Freebase and shown on Google Maps.

Image: Screenshot, Conflict History 1998-2007.

H/T: Infosthetics.

Bidding on Your Personal Browser History

Proclivity Media and others are working very hard to find out what you want to buy, and they’re getting to know you very well along the way.

Here’s the backstory: one particularly savvy way of advertising has begun receiving a lot of attention lately. It’s called re-targeting, and it relies on personal browser history to figure out what users may want to buy.

Automated software bids on the ad space an individual user sees, based on that user’s search history along with more traditional consumer reports and retailer records. One-time ads sell for several hundred dollars a pop.

Via Internet Retailer:

Proclivity uses its Consumer Valuation Platform to place cookies in consumers’ web browsers to monitor their browsing behavior around the Internet and tracks their specific interactions on a client retailer’s site using tiny pieces of embedded software code in site content. Proclivity adds data from the retailer, including the merchant’s own web analytics on shoppers’ click activity, and information on sales, merchandizing campaigns and product pricing, then scores it to determine when each customer is likely to buy and at what price point.

This is very similar to Facebook Exchange, which has been running, cautiously but well, since June.

Here’s the Wall Street Journal:

Facebook is using its data trove to study the links between Facebook ads and members’ shopping habits at brick-and-mortar stores, part of an effort to prove the effectiveness of its $3.7 billion annual ad business to marketers.

FJP: This is big data at work — for many businesses, there’s a lot to find when comparing data sets that follow consumer behavior online and in stores.

I Love Messing with Data

The Journalist’s Resource, a project that curates media scholarship, created a great reading list on the social, cultural and political issues and possibilities surrounding big data.

Like much in today’s digital world, the promise and hope of using huge data sets to solve significant issues are all too tempered by the threats that same data can have depending on whose hands it is in and what they plan to do with it.

What follows are abstracts from just some of the articles the Journalist’s Resource has pulled together. Read through for more and to access links back to the originals.

danah boyd and Kate Crawford
Will large-scale analysis of DNA help cure diseases? Or will it usher in a new wave of medical inequality? Will data analytics help make people’s access to information more efficient and effective? Or will it be used to track protesters in the streets of major cities? Will it transform how we study human communication and culture, or narrow the palette of research options and alter what ‘research’ means? Some or all of the above?… Given the rise of Big Data as both a phenomenon and a methodological persuasion, we believe that it is time to start critically interrogating this phenomenon, its assumptions and its biases.

Vivek Kundra
If … data isn’t sliced, diced and cubed to separate signal from noise, it can be useless. But, when made available to the public and combined with the network effect — defined by Reed’s Law, which asserts that the utility of large networks, particularly social networks, can scale exponentially with the size of the network — society has the potential to drive massive social, political and economic change.

David M. Berry
In cutting up the world [into data chunks], information about the world necessarily has to be discarded in order to store a representation within the computer. In other words, a computer requires that everything is transformed from the continuous flow of our everyday reality into a grid of numbers that can be stored as a representation of reality which can then be manipulated using algorithms. These subtractive methods of understanding reality (episteme) produce new knowledges and methods for the control of reality (techne). They do so through a digital mediation, which the digital humanities are starting to take seriously as they’re problematic.

Bert-Jaap Koops
Big Data involves not only individuals’ digital footprints (data they themselves leave behind) but, perhaps more importantly, also individuals’ data shadows (information about them generated by others). And contrary to physical footprints and shadows, their digital counterparts are not ephemeral but persistent. This presents particular challenges for the right to be forgotten, which are discussed in the form of three key questions. Against whom can the right be invoked? When and why can the right be invoked? And how can the right be effected?

Janna Anderson and Lee Rainie
While enthusiasts see great potential for using Big Data, privacy advocates are worried as more and more data is collected about people — both as they knowingly disclose such things as their postings through social media and as they unknowingly share digital details about themselves as they march through life. Not only do the advocates worry about profiling, they also worry that those who crunch Big Data with algorithms might draw the wrong conclusions about who someone is, how she might behave in the future, and how to apply the correlations that will emerge in the data analysis.

Image: Calvin and Hobbes.

Imagine if your whole life you’ve looked through one eye, only seeing through one eye and suddenly, scientists can give you the ability to open up a second eye. So what you would see is not just more data but it’s a whole different way of seeing.

Said photojournalist Rick Smolan today, telling the audience at a Human Face of Big Data event the same thing he told his son when, at 2 a.m., the little boy climbed out of bed, snuck into the kitchen and asked him why he stayed up late every night on the phone talking about “big data.” Smolan continued:

My son, who again wanted to stay up as late as he could before I sent him back to bed, said: could scientists and computers, like, let us open up a third eye and a fourth and a fifth? And I said yes.

See the group’s phone app, its upcoming book and more here.

New York Times, Washington Post developers team up to create Open Elections database

shaneguiter:

Senior developers from The New York Times and The Washington Post are looking for volunteers to help collect more than 10 years of federal elections data from each state. With their help — and $200,000 in Knight News Challenge funding — Serdar Tumgoren and Derek Willis are working on creating a free, comprehensive source of official U.S. election results.

The goal is to end up with electoral data that can then be linked to different types of data sets — campaign finance, voter demographics, legislative histories, and so on — in ways that previously haven’t been possible on this scale.

Tumgoren, of The Washington Post, says the idea for Open Elections came from “mutual frustration that there is no single, free source of data — and more importantly, nicely standardized data.” Soothing this frustration isn’t necessarily going to be pretty. The task of finding state elections data — at least some of which will be a godawful, inextricable mess — will require some “brute-forcing,” Tumgoren says.
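
FJP: To picture what “nicely standardized data” might mean in practice, here is a minimal sketch. It is not the Open Elections code. The column names, the shared schema and the sample rows are all hypothetical. The idea is that each state’s results arrive with their own column names and formatting, and a per-state mapping renames them into one common shape.

```python
def standardize(row, state, column_map):
    """Rename one raw row's columns to a shared schema and clean the values."""
    record = {"state": state}
    for raw_name, standard_name in column_map.items():
        record[standard_name] = row[raw_name].strip()
    # Vote counts often arrive as strings with thousands separators.
    record["votes"] = int(record["votes"].replace(",", ""))
    return record


# Hypothetical raw rows from two states with different column names.
raw_a = {"Candidate Name": " Jane Doe ", "Office": "U.S. House", "Total Votes": "12,345"}
map_a = {"Candidate Name": "candidate", "Office": "office", "Total Votes": "votes"}

raw_b = {"cand": "John Roe", "race": "U.S. Senate", "vote_ct": "987"}
map_b = {"cand": "candidate", "race": "office", "vote_ct": "votes"}

print(standardize(raw_a, "AA", map_a))
print(standardize(raw_b, "BB", map_b))
```

Once every state’s rows share the same keys, linking them to other data sets becomes an ordinary join. The “brute-forcing” is in writing and checking a mapping for each messy source.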

Access to Full Twitter Archive of Public Posts Now Available

Gnip, a social data delivery company that offers the full Twitter firehose, announced the release of Historical PowerTrack, a tool for accessing Twitter’s complete public history.

Via Gnip:

This level of access has never been available and we know it is really going to accelerate the rate of innovation going forward. We think there are new products and businesses that will now be possible with access to a “social layer” of historical data. We frequently ask ourselves “If you could know what the world was saying at any moment in time about any topic, what could you build?”

We very much look forward to seeing how that question is answered.

NASA Animation of Temperature Data from 1880-2011

Via The Climate Desk, “a journalistic collaboration dedicated to exploring the impact—human, environmental, economic, political—of a changing climate. The partners are The Atlantic, Center for Investigative Reporting, Grist, The Guardian, Mother Jones, Slate, Wired, and PBS’s new public-affairs show Need To Know.” 

Let a thousand Jon Stewarts bloom.

Brewster Kahle, founder, Internet Archive, to the New York Times. All the TV News Since 2009, on One Web Site.

The News: Archive.org has recorded every news program from 20 US news sources since 2009. Today they release 350,000 broadcasts to the world. You can start your remixing here.

fjp-latinamerica:

La Nación gives Tableau a try

Argentinian newspaper La Nación has been experimenting with the Seattle-based Tableau software and the result is impeccable: a good-looking, interactive data-built map with a list of local transparency laws or applicable regulations. 

Internal insight, via Nación DATA blog:

This collaborative project consists of an interactive map about transparency and public information in Argentina. The final version includes different provisions, ordinances, laws and resolutions on transparency sorted by political jurisdiction.

It took many months to be finally finished. We have no doubt that this map will be useful not only for those who advocate a more transparent government, but also for journalists, code developers, and activists of all sorts.

Image: Partial screenshot of the Nación DATA blog, via LaNación.com

FJP Fun Fact: Pat Hanrahan, one of Tableau’s founders, was also a founding employee at Pixar.