Posts tagged with ‘data’

“I would love it if transparency truly allayed anxiety in an informed, nonexplosive way,” Mr. Rudder told me. But in practice, he said, “it might increase anxiety.”

Natasha Singer, writing in the Times about Christian Rudder, president of OkCupid and the guy who wrote this, whose new book, Dataclysm: Who We Are (When We Think No One’s Looking), just came out.

Singer writes:

Mr. Rudder says he carefully considers the potential risks of OkCupid’s observational and product research on its members. But his dual role as the approver of company research and its chief interpreter is complicated.

“The people who are making the minimal risk decisions are the same people conducting the experiments,” he acknowledges. “It is a conflict of interest.”

Now Mr. Rudder is weighing the possibility of even greater research transparency. His book certainly urges companies to share more of their behavioral research findings with the public. But the outcry over his recent blog post suggests that many consumers are not aware of the extent to which companies already scrutinize and manipulate their online activities.

FJP: While the issues around social manipulation and the need for transparency are pretty apparent, Rudder’s comment, quoted above, points to something perplexing. Maybe greater transparency about both data-gathering practices and the interpretations of that data (especially on networks where people have social and emotional investments) would increase our anxiety about what the data says about us. It echoes (in sentiment) The Anxieties of Big Data, a piece Kate Crawford wrote in The New Inquiry in May that’s also really worth reading. In it, she deconstructs the myth that more data means greater accuracy and also points this out:

If we take these twinned anxieties — those of the surveillers and the surveilled — and push them to their natural extension, we reach an epistemological end point: on one hand, the fear that there can never be enough data, and on the other, the fear that one is standing out in the data.

More Reading: A pretty interesting long-form profile on Rudder. A fantastic essay from danah boyd on ethics and oversight in data manipulation. And, from the FJP archives, a reading list on the social, cultural and political issues/possibilities surrounding big data.

We Are Nomads

Navigate the News
The Upshot, a new, data-driven venture from the New York Times, launches tomorrow. It will cover politics, policy and economic analysis, Quartz reported in March, and added:

David Leonhardt, the Times’ former Washington bureau chief, who is in charge of The Upshot, told Quartz that the new venture will have a dedicated staff of 15, including three full-time graphic journalists, and is on track for a launch this spring. “The idea behind the name is, we are trying to help readers get to the essence of issues and understand them in a contextual and conversational way,” Leonhardt says. “Obviously, we will be using data a lot to do that, not because data is some secret code, but because it’s a particularly effective way, when used in moderate doses, of explaining reality to people.”

Today, Leonhardt explained the why of it on Facebook:

You have no shortage of excellent news sources — sources that expertly report and analyze news as it happens. Like you, those of us at The Upshot rely on those sources every day. So why are we starting a new site to help people understand the news?…
…One, we believe many people don’t understand the news as well as they would like. They want to grasp big, complicated stories — Obamacare, inequality, political campaigns, the real-estate and stock markets — so well that they can explain the whys and hows of those stories to their friends, relatives and colleagues.
We believe we can help readers get to that level of understanding by writing in a direct, plain-spoken way, the same voice we might use when writing an email to a friend. We’ll be conversational without being dumbed down. We will build on the excellent journalism The New York Times is already producing, by helping readers make connections among different stories and understand how those stories fit together.

Image: @UpshotNYT announces its launch.

The Internet is a Series of Tubes
Mark Graham and Stefano De Sabbata from the Oxford Internet Institute map the world’s submarine fibre-optic cables to look like London’s Tube Map (PDF). But they also go a few steps further.
Via Information Geographies

For the sake of simplicity, many short links have been excluded from the visualization. For instance, it doesn’t show the intricate network of cables under the waters of the Gulf of Mexico, the South and East China Sea, the North Sea, and the Mediterranean Sea. The map instead aims to provide a global overview of the network, and a general sense of how information traverses our planet. (The findings reported below, however, are based on two analysis of the full submarine fibre-optic cable network, and not just the simplified representation shown in the illustration.)
The map also includes symbols referring to countries listed as “Enemies of the Internet” in the 2014 report of Reporters Without Borders. The centrality of the nodes within the network has been calculated using the PageRank algorithm. The rank is important as it highlights those geographical places where the network is most influenced by power (e.g., potential data surveillance) and weakness (e.g., potential service disruption).

Image: Internet Tube, by Mark Graham and Stefano De Sabbata.
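
FJP: For the curious, here is a minimal sketch of how a PageRank-style centrality can be computed on a cable network. It is not the Oxford team’s code. The node names and cables are made up for illustration, each cable is assumed to be a two-way link, and the damping factor of 0.85 is the conventional default rather than a value taken from their analysis.

```python
from typing import Dict, List, Set, Tuple


def pagerank(edges: List[Tuple[str, str]], damping: float = 0.85,
             iterations: int = 100) -> Dict[str, float]:
    """Power-iteration PageRank on an undirected graph given as edge pairs."""
    # Build an adjacency list; each cable is treated as a two-way link.
    neighbors: Dict[str, Set[str]] = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    nodes = sorted(neighbors)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every node receives a small baseline share of rank...
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        # ...plus an equal slice of the rank of each node it is linked to.
        for node in nodes:
            share = damping * rank[node] / len(neighbors[node])
            for other in neighbors[node]:
                new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical landing points and cables, for illustration only.
toy_cables = [("Hub", "A"), ("Hub", "B"), ("Hub", "C"), ("C", "D")]
ranks = pagerank(toy_cables)
top_node = max(ranks, key=ranks.get)
print(top_node, round(ranks[top_node], 3))
```

In this toy network the best-connected landing point ends up with the highest rank, which is the intuition behind using the score to flag places where surveillance or disruption would matter most.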

All About the #Selfies
Via Wired:

Right now, there are more than 79 million photos on Instagram that fall under #selfie. This is not counting #selfies (7 million photos), #selfienation (1 million photos), #selfiesfordays (400,000 photos) or the countless number of photos with no hashtag at all. You might be thinking: “Finally, we’ve reached peak #selfie!” But according to a new study, only 3-5 percent of photos on Instagram fall into the category…
…In its short lifespan, the selfie has gone from pop culture phenomenon to academic lab rat. For obvious reasons, these photos are a psychological research goldmine, but there’s been little done in the way of objectively looking at the photos’ content to see how it might reflect the actual world we live in. Selfiecity looks at the trend through a window, not a microscope. Instead of zeroing in on a single narrow element, the Selfiecity project is broken down into a few broad areas: main findings, contextual essays and interactive data visualizations. “We wanted to look at this phenomena from different perspectives,” Manovich explains.
Selfiecity analyzes Instagram data for visual cues like head position, emotional expression, gender and age, in order to get a clearer picture of how (and how often) people actually take selfies in different cultures. “The idea was to confront the generalizations about selfies, which are not based on data, with actual data,” says Manovich. “We wanted to look at what the actual patterns are.”

So, check out Selfiecity. It’s mesmerizing.
And then, perhaps, check out #SELFIE (Official Music Video), a techno ode to all things selfie, crowdsourced “from so many amazing and funny ppl.”
Image: Selfies in New York, via Wired.

Facebook v Princeton: Who You Got?
Princeton punches first, via The Guardian:

Facebook has spread like an infectious disease but we are slowly becoming immune to its attractions, and the platform will be largely abandoned by 2017, say researchers at Princeton University.
The forecast of Facebook’s impending doom was made by comparing the growth curve of epidemics to those of online social networks. Scientists argue that, like bubonic plague, Facebook will eventually die out.
The social network, which celebrates its 10th birthday on 4 February, has survived longer than rivals such as Myspace and Bebo, but the Princeton forecast says it will lose 80% of its peak user base within the next three years…
…“Ideas, like diseases, have been shown to spread infectiously between people before eventually dying out, and have been successfully described with epidemiological models,” the authors claim in a paper entitled Epidemiological modelling of online social network dynamics.

Facebook punches back:

In keeping with the scientific principle “correlation equals causation,” our research unequivocally demonstrated that Princeton may be in danger of disappearing entirely…
…[Trends suggest] that Princeton will have only half its current enrollment by 2018, and by 2021 it will have no students at all, agreeing with the previous graph of scholarly scholarliness. Based on our robust scientific analysis, future generations will only be able to imagine this now-rubble institution that once walked this earth.

Read through for Facebook’s assorted charts and graphs to back its claims.
The Princeton paper is available via arXiv (PDF)
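
FJP: To see what “comparing the growth curve of epidemics” means in practice, here is a rough sketch of the textbook SIR (susceptible, infected, recovered) model, read as non-users, active users and ex-users. This is the classic model, not necessarily the variant the Princeton authors fit, and every number below (population, rates, time span) is invented for illustration.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, dt=0.1):
    """Integrate the classic SIR equations with simple Euler steps.

    beta  : adoption ("infection") rate per day  (assumed value)
    gamma : abandonment ("recovery") rate per day (assumed value)
    """
    s = float(population - initial_infected)  # not yet users
    i = float(initial_infected)               # active users
    r = 0.0                                   # people who have left
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        new_infections = beta * s * i / population * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((step * dt, s, i, r))
    return history


# Made-up parameters: 1,000 people, one early adopter.
history = simulate_sir(population=1000, initial_infected=1,
                       beta=0.5, gamma=0.1, days=160)
peak_time, _, peak_infected, _ = max(history, key=lambda row: row[2])
final_infected = history[-1][2]
print(f"peak of {peak_infected:.0f} active users around day {peak_time:.0f}")
```

The curve of active users rises, peaks and then decays toward zero. That rise-and-fall shape is what lets a forecaster extrapolate a decline from the early part of a curve, and also what Facebook is poking fun at.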

If you use Netflix, you’ve probably wondered about the specific genres that it suggests to you. Some of them just seem so specific that it’s absurd. Emotional Fight-the-System Documentaries? Period Pieces About Royalty Based on Real Life? Foreign Satanic Stories from the 1980s?

If Netflix can show such tiny slices of cinema to any given user, and they have 40 million users, how vast did their set of “personalized genres” need to be to describe the entire Hollywood universe?

This idle wonder turned to rabid fascination when I realized that I could capture each and every microgenre that Netflix’s algorithm has ever created.

Through a combination of elbow grease and spam-level repetition, we discovered that Netflix possesses not several hundred genres, or even several thousand, but 76,897 unique ways to describe types of movies…

…What emerged from the work is this conclusion: Netflix has meticulously analyzed and tagged every movie and TV show imaginable. They possess a stockpile of data about Hollywood entertainment that is absolutely unprecedented. The genres that I scraped and that we caricature above are just the surface manifestation of this deeper database.

— Alexis C. Madrigal, The Atlantic. How Netflix Reverse Engineered Hollywood.

Reporting Immigration, Population and the US Census
The US population grew to just over 316 million in 2013, according to the Census Bureau, which released its population numbers Monday. This is up from 313.8 million in 2012.
With the release of the data, news organizations are giving things a local spin. Take, for example, Florida closing in on New York as the country’s third most populous state; Utah as the country’s second fastest growing state; or Pennsylvania as one of the country’s slowest growing states.
In New York City, the talk is about how diverse the population is, with 37.2% of the population foreign-born.
Via WNYC:

The city’s foreign-born population has crossed the 3 million mark, a figure without precedent in municipal history and indicative of a decades-long metamorphosis of New York’s character.

If you have a few minutes, listen to this segment from the Brian Lehrer Show on New New Yorkers.
If you like to play with data, you can download the Census information here.
Image: Screenshot, Top 10 Immigrant Groups in Woodside, Queens, via NYC.gov’s Where are New York City’s Immigrants/Top Groups Living?

Evolution, or Lack Thereof
Via Pew Research Center, Public Views on Human Evolution.

What Surveillance Valley knows about you →

Via PandoDaily:

No source of information is sacred: transaction records are bought in bulk from stores, retailers and merchants; magazine subscriptions are recorded; food and restaurant preferences are noted; public records and social networks are scoured and scraped. What kind of prescription drugs did you buy? What kind of books are you interested in? Are you a registered voter? To what non-profits do you donate? What movies do you watch? Political documentaries? Hunting reality TV shows?

That info is combined and kept up to date with address, payroll information, phone numbers, email accounts, social security numbers, vehicle registration and financial history. And all that is sliced, isolated, analyzed and mined for data about you and your habits in a million different ways…

…Take MEDbase200, a boutique for-profit intel outfit that specializes in selling health-related consumer data. Well, until last week, the company offered its clients a list of rape victims (or “rape sufferers,” as the company calls them) at the low price of $79.00 per thousand. The company claims to have segmented this data set into hundreds of different categories, including stuff like the ailments they suffer, prescription drugs they take and their ethnicity…

…[I]f lists of rape victims aren’t your thing, MEDbase can sell dossiers on people suffering from anorexia, substance abuse, AIDS and HIV, Alzheimer’s Disease, Asperger Disorder, Attention Deficit Hyperactivity Disorder, Bedwetting (Enuresis), Binge Eating Disorder, Depression, Fetal Alcohol Syndrome, Genital Herpes, Genital Warts, Gonorrhea, Homelessness, Infertility, Syphilis… the list goes on and on and on and on.

PandoDaily reports that some 4,000 data mining companies generate about $200 billion annually. 

Census Bureau Releases Mapping Tool
The US Census Bureau today released an updated set of statistics based on its nationwide 2008-2012 American Community Survey. Along with it, the Bureau has created an interactive map that lets users visually explore communities across the country.
Via the US Census Bureau:

The new application allows users to map out different social, economic and housing characteristics of their state, county or census tract, and to see how these areas have changed since the 1990 and 2000 censuses. The mapping tool is powered by American Community Survey statistics from the Census Bureau’s API, an application programming interface that allows developers to take data sets and reuse them to create online and mobile apps.

Site visitors can explore eight core statistics (e.g., median household income, total population and education levels) via the map.
Those with coding chops can hit up the Census Bureau’s API to develop creations of their own. The API gives access to 40 social, economic and housing topics.
Image: Screenshot, Census Explorer.
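
FJP: For those with coding chops, a small sketch of what working with this kind of API tends to look like. Treat the details as assumptions and check the Bureau’s developer documentation before relying on them: the base URL below is a placeholder, the `get`/`for` query parameters and the variable code `B01003_001E` (total population) are written from memory, and the sample response is invented to show the header-row-plus-data-rows shape we assume the API returns.

```python
from urllib.parse import urlencode


def build_query_url(base_url, variables, geography):
    """Assemble a query URL; the parameter names are assumed, not verified."""
    query = urlencode({"get": ",".join(variables), "for": geography},
                      safe=",:*")
    return f"{base_url}?{query}"


def rows_to_records(payload):
    """Turn a [header, row, row, ...] table into a list of dicts."""
    header, *rows = payload
    return [dict(zip(header, row)) for row in rows]


# Placeholder endpoint; swap in the real one from the official docs.
url = build_query_url("https://api.example.invalid/acs5",
                      ["NAME", "B01003_001E"], "state:*")

# Invented sample response, for illustration only.
sample_payload = [
    ["NAME", "B01003_001E", "state"],
    ["Example State", "12345", "00"],
]
records = rows_to_records(sample_payload)
print(url)
print(records[0]["NAME"], records[0]["B01003_001E"])
```

Converting the rows to dictionaries keyed by the header makes it easier to hand the data to a charting or mapping library afterwards.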

Internet Populations
Cartograms are interesting. Instead of displaying political boundaries, they show data boundaries: for example, mapping the world across social and economic indicators.
Here, though, is Internet penetration, via the Oxford Internet Institute. It represents who’s online and where.
Via The Atlantic

The map, created as part of the Information Geographies project at the Oxford Internet Institute, has two layers of information: the absolute size of the online population by country (rendered in geographical space) and the percent of the overall population that represents (rendered by color). Thus, Canada, with a relatively small number of people takes up little space, but is colored dark red, because more than 80 percent of people are online. China, by contrast, is huge, with more than half a billion people online, but relatively lightly shaded, since more than half the population is not online. Lightly colored countries that have large populations, such as China, India, and Indonesia, are where the Internet will grow the most in the years ahead.

And, via the Oxford Internet Institute’s Mark Graham and Stefano De Sabbata, some trends:

First, the rise of Asia as the main contributor to the world’s Internet population; 42% of the world’s Internet users live in Asia, and China, India, and Japan alone host more Internet users than Europe and North America combined…
…The map also reveals interesting patterns in some of the world’s poorest countries. Most Latin American countries now can count over 40% of their citizens as Internet users. Because of this, Latin America as a whole now hosts almost as many Internet users as the United States.
Some African countries have seen staggering growth, whereas others have seen little change since we last mapped Internet use globally in 2008. In the last three years, almost all North African countries doubled their population of Internet users (Algeria being a notable exception). Kenya, Nigeria, and South Africa, also saw massive growth. However, it remains that over half of Sub-Saharan African countries have an Internet penetration of less than 10%, and have seen very little growth in recent years.
It is therefore important to remember that despite the massive impacts that the Internet has on everyday life for many people, most people on our planet remain entirely disconnected. Only one third of the world’s population has access to the Internet.

FJP: Global mobile penetration? At 6.8 billion mobile subscribers, that’s another story. So, disconnected in a sense. But being mobile can be very connected.
Image: Internet Population and Penetration, via the Oxford Internet Institute. Select to embiggen.
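
FJP: The map’s two layers can be sketched in a few lines. This is our own illustration, not the Information Geographies code: the countries and figures are hypothetical, and the 20-point color bins are an assumption chosen only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass
class Country:
    name: str
    population: int
    internet_users: int


def encode(country, bins=(0.2, 0.4, 0.6, 0.8)):
    """Return the two visual layers: drawn area and color shade."""
    share = country.internet_users / country.population
    # Shade 0 is the lightest and 4 the darkest; bin edges are assumed.
    shade = sum(share >= edge for edge in bins)
    return {
        "name": country.name,
        "area": country.internet_users,  # size on the map tracks users online
        "penetration": share,            # color tracks the share online
        "shade": shade,
    }


# Hypothetical countries, loosely shaped like the small-but-dark and
# huge-but-light cases described in the quote above.
small = encode(Country("Country A", 35_000_000, 30_000_000))
large = encode(Country("Country B", 1_300_000_000, 500_000_000))
print(small["shade"], large["shade"])
```

The small country draws a small shape in the darkest shade, while the large one draws a huge shape in a light shade, which is why lightly colored giants signal where growth is still to come.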

laughingsquid:

Privacy Opinions by xkcd

FJP: We’re with the sage.

If you can’t present your ideas to at least a modestly larger audience, then it’s not going to do you very much good. Einstein supposedly said that I don’t trust any physics theory that can’t be explained to a 10-year-old. A lot of times the intuitions behind things aren’t really all that complicated.

Nate Silver in a Q&A with Harvard Business Review on how to get into data science as a newbie (student, professional, or otherwise).