Posts tagged data

Mapping Gender Income Inequality

A collaboration between Slate and the New America Foundation. The interactive visualization was created using MapBox.

Via Slate:

Women in Utah have it the worst. There, the average working woman makes 55 cents for every dollar the average working man makes. The state is followed closely by Wyoming, at 56 cents; Louisiana, at 59 cents; North Dakota, at 62 cents; and Michigan, at 62 cents. The best states for income equality are Hawaii, Florida, Nevada, Maryland, and North Carolina. In each, women make about three-fourths of what men make.

County-level data illustrate the best cities for pay equality: Washington, D.C. and Dallas lead, followed by San Francisco, Los Angeles, Austin, Santa Fe, New York, and Boston. In each, women make at least 80 cents per dollar that men make. In most other major cities, they make about 70 cents.

For a biggie version, see Slate, Map Shows the Worst State for Women To Make Money.

Mapping Conflict

Conflict History maps the world’s wars and skirmishes over the millennia. Users control the map with a timeline scrubber or by entering search terms. Data is pulled from Freebase and shown on Google Maps.

Image: Screenshot, Conflict History 1998-2007

H/T: Infosthetics.

Bidding on Your Personal Browser History

Proclivity Media and others are working very hard to find out what you want to buy, and they’re getting to know you very well along the way.

Here’s the backstory: one particularly savvy way of advertising has begun receiving a lot of attention lately. It’s called re-targeting, and it relies on personal browser history to figure out what users may want to buy.

Automated software bids on the ad space each individual user sees, drawing on that user’s personal search history, more traditional consumer reports and retailer records. One-time ads sell for several hundred dollars a pop.

via Internet Retailer:

Proclivity uses its Consumer Valuation Platform to place cookies in consumers’ web browsers to monitor their browsing behavior around the Internet and tracks their specific interactions on a client retailer’s site using tiny pieces of embedded software code in site content. Proclivity adds data from the retailer, including the merchant’s own web analytics on shoppers’ click activity, and information on sales, merchandizing campaigns and product pricing, then scores it to determine when each customer is likely to buy and at what price point.

This is very similar to Facebook Exchange, which has been running, cautiously but well, since June.

Here’s the Wall Street Journal:

Facebook is using its data trove to study the links between Facebook ads and members’ shopping habits at brick-and-mortar stores, part of an effort to prove the effectiveness of its $3.7 billion annual ad business to marketers.

FJP: This is big data at work — for many businesses, there’s a lot to find when comparing data sets that follow consumer behavior online and in stores.

I Love Messing with Data

The Journalist’s Resource, a project that curates media scholarship, created a great reading list on the social, cultural and political issues and possibilities surrounding big data.

Like much in today’s digital world, the promise and hope of using huge data sets to solve significant issues are all too tempered by the threats that same data can pose, depending on whose hands it is in and what they plan to do with it.

What follows are abstracts from just some of the articles the Journalist’s Resource has pulled together. Read through for more and to access links back to the originals.

danah boyd and Kate Crawford
Will large-scale analysis of DNA help cure diseases? Or will it usher in a new wave of medical inequality? Will data analytics help make people’s access to information more efficient and effective? Or will it be used to track protesters in the streets of major cities? Will it transform how we study human communication and culture, or narrow the palette of research options and alter what ‘research’ means? Some or all of the above?… Given the rise of Big Data as both a phenomenon and a methodological persuasion, we believe that it is time to start critically interrogating this phenomenon, its assumptions and its biases.

Vivek Kundra
If … data isn’t sliced, diced and cubed to separate signal from noise, it can be useless. But, when made available to the public and combined with the network effect — defined by Reed’s Law, which asserts that the utility of large networks, particularly social networks, can scale exponentially with the size of the network — society has the potential to drive massive social, political and economic change.

David M. Berry
In cutting up the world [into data chunks], information about the world necessarily has to be discarded in order to store a representation within the computer. In other words, a computer requires that everything is transformed from the continuous flow of our everyday reality into a grid of numbers that can be stored as a representation of reality which can then be manipulated using algorithms. These subtractive methods of understanding reality (episteme) produce new knowledges and methods for the control of reality (techne). They do so through a digital mediation, which the digital humanities are starting to take seriously as they’re problematic.

Bert-Jaap Koops
Big Data involves not only individuals’ digital footprints (data they themselves leave behind) but, perhaps more importantly, also individuals’ data shadows (information about them generated by others). And contrary to physical footprints and shadows, their digital counterparts are not ephemeral but persistent. This presents particular challenges for the right to be forgotten, which are discussed in the form of three key questions. Against whom can the right be invoked? When and why can the right be invoked? And how can the right be effected?

Janna Anderson and Lee Rainie
While enthusiasts see great potential for using Big Data, privacy advocates are worried as more and more data is collected about people — both as they knowingly disclose such things as their postings through social media and as they unknowingly share digital details about themselves as they march through life. Not only do the advocates worry about profiling, they also worry that those who crunch Big Data with algorithms might draw the wrong conclusions about who someone is, how she might behave in the future, and how to apply the correlations that will emerge in the data analysis.

Image: Calvin and Hobbes.

Imagine if your whole life you’ve looked through one eye, only seeing through one eye and suddenly, scientists can give you the ability to open up a second eye. So what you would see is not just more data but it’s a whole different way of seeing.

Said photojournalist Rick Smolan today, telling the audience at a Human Face of Big Data event the same thing he told his son when, at 2am, the little boy climbed out of bed, snuck into the kitchen and asked him why he stayed up late every night on the phone talking about “big data.” Smolan continued:

My son, who again wanted to stay up as late as he could before I sent him back to bed, said: could scientists and computers, like, let us open up a third eye and a fourth and a fifth? And I said yes.

See the group’s phone app, its upcoming book and more here.

New York Times, Washington Post developers team up to create Open Elections database

shaneguiter:

Senior developers from The New York Times and The Washington Post are looking for volunteers to help collect more than 10 years of federal elections data from each state. With their help — and $200,000 in Knight News Challenge funding — Serdar Tumgoren and Derek Willis are working on creating a free, comprehensive source of official U.S. election results.

The goal is to end up with electoral data that can then be linked to different types of data sets — campaign finance, voter demographics, legislative histories, and so on — in ways that previously haven’t been possible on this scale.

Tumgoren, of The Washington Post, says the idea for Open Elections came from “mutual frustration that there is no single, free source of data — and more importantly, nicely standardized data.” Soothing this frustration isn’t necessarily going to be pretty. The task of finding state elections data — at least some of which will be a godawful, inextricable mess — will require some “brute-forcing,” Tumgoren says.

Access to Full Twitter Archive of Public Posts Now Available

Gnip, a social data delivery company that offers the full Twitter firehose, announced the release of Historical PowerTrack, a tool for accessing Twitter’s complete public history.

Via Gnip:

This level of access has never been available and we know it is really going to accelerate the rate of innovation going forward. We think there are new products and businesses that will now be possible with access to a “social layer” of historical data. We frequently ask ourselves “If you could know what the world was saying at any moment in time about any topic, what could you build?”

We very much look forward to seeing how that question is answered.

NASA Animation of Temperature Data from 1880-2011

Via The Climate Desk, “a journalistic collaboration dedicated to exploring the impact—human, environmental, economic, political—of a changing climate. The partners are The Atlantic, Center for Investigative Reporting, Grist, The Guardian, Mother Jones, Slate, Wired, and PBS’s new public-affairs show Need To Know.” 

Let a thousand Jon Stewarts bloom.

Brewster Kahle, founder, Internet Archive, to the New York Times. All the TV News Since 2009, on One Web Site.

The News: Archive.org has recorded every news program from 20 US news sources since 2009. Today they release 350,000 broadcasts to the world. You can start your remixing here.

fjp-latinamerica:

La Nación gives Tableau a try

Argentinian newspaper La Nación has been experimenting with the Seattle-based Tableau software and the result is impeccable: a good-looking, interactive data-built map with a list of local transparency laws or applicable regulations. 

Internal insight, via Nación DATA blog:

This collaborative project consists of an interactive map about transparency and public information in Argentina. The final version includes different provisions, ordinances, laws and resolutions on transparency sorted by political jurisdiction.

It took many months to be finally finished. We have no doubt that this map will be useful not only for those who advocate a more transparent government, but also for journalists, code developers, and activists of all sorts.

Image: Partial screenshot of the Nación DATA blog, via LaNación.com

FJP Fun Fact: Pat Hanrahan, one of Tableau’s founders, was also a founding employee at Pixar. 

Forest of Advocacy

A collaboration between LazerLAB and Northeastern University’s Centers for Computational Social Science and Digital Humanities is releasing weekly political data visualizations between now and the 2012 election.

Here’s what they say about their first:

Our first family of visualizations is the “Forest of Advocacy.”

These visualizations provide a dynamic look at the partisan tilt of giving within organizations. For each organization, individuals are characterized as points sketching out a line over time. The X axis is time, and the Y axis represents the net partisan tilt of contributions over the preceding 6 months. Over the decades, one sees lines sketched out, reflecting the partisanship of individuals over time. For each organization, we also provide the net contributions of the entire organization, and the names of biggest Democratic, Republican, and “bipartisan” contributors (the individual with the highest product of Democratic and Republican contributions).

The visualizations are broken down by organization (e.g., the ACLU, Goldman Sachs) with a video like the one above and static visualizations accompanying each.
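
For readers who want to see how the two measures in the quoted description might be computed, here is a minimal Python sketch. It is not the researchers' code. The sample records, the names `party_totals`, `net_tilt`, `bipartisan_score` and `most_bipartisan`, the 183-day stand-in for "6 months", and the tilt formula (Democratic minus Republican dollars, divided by the total) are all assumptions made for illustration. Only the "product of Democratic and Republican contributions" rule comes from the description itself.

```python
from datetime import date

# Hypothetical records: (contributor, date, party, amount in dollars).
contributions = [
    ("Alice", date(2012, 1, 15), "D", 500.0),
    ("Alice", date(2012, 3, 1), "R", 100.0),
    ("Bob", date(2012, 2, 10), "R", 1000.0),
    ("Carol", date(2012, 2, 20), "D", 300.0),
    ("Carol", date(2012, 4, 5), "R", 300.0),
    ("Alice", date(2011, 5, 1), "R", 900.0),  # older than the window used below
]

WINDOW_DAYS = 183  # assumption: "preceding 6 months" approximated in days


def party_totals(records, person, as_of=None):
    """Sum one person's Democratic and Republican giving.

    If as_of is given, only count gifts made in the WINDOW_DAYS before it.
    """
    dem = rep = 0.0
    for name, when, party, amount in records:
        if name != person:
            continue
        if as_of is not None and not (0 <= (as_of - when).days <= WINDOW_DAYS):
            continue
        if party == "D":
            dem += amount
        elif party == "R":
            rep += amount
    return dem, rep


def net_tilt(records, person, as_of):
    """Assumed tilt formula: +1.0 is all Democratic, -1.0 is all Republican."""
    dem, rep = party_totals(records, person, as_of)
    total = dem + rep
    return None if total == 0 else (dem - rep) / total


def bipartisan_score(records, person):
    """Product of all-time Democratic and Republican giving, per the description."""
    dem, rep = party_totals(records, person)
    return dem * rep


def most_bipartisan(records):
    people = sorted({record[0] for record in records})
    return max(people, key=lambda person: bipartisan_score(records, person))
```

Calling `net_tilt(contributions, "Carol", date(2012, 4, 30))` on this made-up data gives a tilt of zero, since her giving in the window is split evenly. The product rule rewards people who give heavily to both parties: someone who gives to only one side scores zero no matter how large the gifts.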

H/T: Information Aesthetics.

Google Just Produced a MAD Visualization

Mapping Arms Data, that is. It visualizes the imports and exports of small arms, light weapons, and ammunition across 250 states and territories between 1992 and 2010. Specifically:

• Military weapons include artillery, mortars, machine guns (sub, light, and heavy), assault rifles, combat shotguns, and machine pistols.

• Civilian arms consist of pistols, revolvers, sporting shotguns, sporting rifles (anything not rated as a military item including fully automatic weaponry).

• Ammunition includes shotgun shells and small caliber ammo (anything below 14.5mm which isn’t fired from a shotgun).

It was produced as part of the Google Ideas INFO (Illicit Networks, Forces in Opposition) Summit. Read more about it here.

FJP: We particularly like its use of WebGL, which is a JavaScript API for rendering interactive graphics. It’s alarming, illuminating, and pretty mesmerizing to play with.

Image: Screenshot from the visualization.

Murder in America

The Wall Street Journal takes FBI data from 2000 to 2010 to analyze the who, what, where, why, how and when of murders across America.

All 165,068 in the decade analyzed.

The interactive they’ve created lets users sort and explore “why” a murder occurred (e.g., Lover’s Triangle, Gang Killing and a large bucket of “Other”), who was killed and by whom (by race, sex and relationship), what weapon was used (e.g., gun, knife, blunt object), when murders occurred (by year) and where they occurred (by state).

Needless to say, guns top the weapons category. While unlikely, getting pushed or thrown out a window has occurred 35 times.

Most often the relationship between the victim and killer is unknown (in over 70,000 cases). How or why this doesn’t become known goes unexplained but acquaintances accounted for over 27,000 murders, strangers for over 25,000.

In the good to know but it goes against our folk history category: the least likely to commit murder are stepmothers with 57 killings attributed to them in the decade analyzed.

The WSJ notes in their methodology that the data they’re working with has many holes in it. For example:

The FBI collects this data from the states, except for Florida. Florida doesn’t use the FBI’s guidelines when reporting additional information about homicides. The FBI data don’t capture all homicides. The states’ reporting is voluntary, and the country’s thousands of police agencies aren’t consistent in how they report. Some states, including New York, reported no justifiable homicides at all for some years. In recording the circumstances of a murder, the information recorded in the FBI data may capture only the relationship of the killer to one of the victims — but not other victims — in a given situation. Because of the unlimited number of scenarios in which a homicide can occur, the coding used in the FBI database may not explain the full set of circumstances involved.

That said, it’s an interesting data set and interactive, but view it as a big-picture account of murder in America.

Image: Detail, Murder in America, by the Wall Street Journal.

(In)tolerance

A poll released today by the Arab American Institute explores attitudes Americans have toward Arabs and Muslims.

“The data extracted,” the Institute writes, “indicates that anti-Arab and anti-Muslim political rhetoric has taken a toll on American public opinion, especially along age and party lines.”

Takeaways from the report:

1. Arabs, Muslims, Arab Americans, and American Muslims have the lowest favorable/highest unfavorable ratings among the groups covered.

2. Muslims were the only group with a net unfavorable rating.

3. Note that one in five Americans were either unfamiliar with or not sure of their attitudes toward these communities.

4. Sikhs and Mormons also fare poorly, but in the case of Sikhs, one in four Americans are “unfamiliar” or “not sure”.

5. There is a deep generational divide, which is reflected in a partisan divide.

6. Younger Americans (18-25) rate Arabs and Muslims up to 17 points higher than the older generation. They also rate Arab Americans and American Muslims higher as well.

7. Younger Americans rate Catholics and the various Protestant denominations covered in the survey almost 20 points lower than do older Americans (65+). The younger group also rates Mormons 15 points lower.

8. This is reflected in a deep partisan divide and even more so in a division between those who describe themselves as Obama or Romney voters. For example, note how the ratings given to Arabs and Muslims by Obama and Romney voters are mirror reflections of each other. While Obama voters give Arabs a net 51%/29% favorable rating and Muslims a net 53%/29% rating; Romney voters give Arabs a 30%/50% net unfavorable rating and Muslims a 25%/57% unfavorable rating.

9. Democrats and Obama voters give no group a net negative rating. Republicans and Romney voters only give strong negative ratings to Arabs, Muslims, Arab Americans, and American Muslims.
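
The "mirror reflections" in takeaway 8 are easier to see as net figures. The short Python sketch below takes the four favorable/unfavorable pairs quoted there and subtracts one from the other. Treating "net" as favorable minus unfavorable is an assumption about how the report uses the term, and the names `ratings` and `net_rating` are made up for this example.

```python
# (favorable %, unfavorable %) pairs as quoted in takeaway 8.
ratings = {
    ("Obama voters", "Arabs"): (51, 29),
    ("Obama voters", "Muslims"): (53, 29),
    ("Romney voters", "Arabs"): (30, 50),
    ("Romney voters", "Muslims"): (25, 57),
}


def net_rating(favorable, unfavorable):
    """Assumed definition: percentage points favorable minus unfavorable."""
    return favorable - unfavorable


net = {key: net_rating(*pair) for key, pair in ratings.items()}

for (voters, group), value in net.items():
    print(f"{voters} on {group}: {value:+d} points")
```

Under that definition, Obama voters come out at +22 points for Arabs and +24 for Muslims, while Romney voters come out at -20 and -32.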

Image: Detail from The American Divide: How We View Arabs and Muslims.