Posts tagged data

The Internet is a Series of Tubes
Mark Graham and Stefano De Sabbata from the Oxford Internet Institute map the world’s submarine fibre-optic cables in the style of the London Tube map (PDF). But they also go a few steps further.
Via Information Geographies

For the sake of simplicity, many short links have been excluded from the visualization. For instance, it doesn’t show the intricate network of cables under the waters of the Gulf of Mexico, the South and East China Sea, the North Sea, and the Mediterranean Sea. The map instead aims to provide a global overview of the network, and a general sense of how information traverses our planet. (The findings reported below, however, are based on an analysis of the full submarine fibre-optic cable network, and not just the simplified representation shown in the illustration.)
The map also includes symbols referring to countries listed as “Enemies of the Internet” in the 2014 report of Reporters Without Borders. The centrality of the nodes within the network has been calculated using the PageRank algorithm. The rank is important as it highlights those geographical places where the network is most influenced by power (e.g., potential data surveillance) and weakness (e.g., potential service disruption).

Image: Internet Tube, by Mark Graham and Stefano De Sabbata.
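
The excerpt above says node centrality was calculated with PageRank. As a rough illustration of that idea only, here is a minimal power-iteration sketch in Python. The four-node network, the node names, the damping factor and the iteration count are all assumptions made up for this example; none of them come from the Oxford Internet Institute's data or method.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Estimate PageRank by power iteration.

    links maps each node to the list of nodes it connects to. An undirected
    cable is represented by listing it in both directions.
    """
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the same small "teleport" share first.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, neighbours in links.items():
            if neighbours:
                share = damping * rank[node] / len(neighbours)
                for neighbour in neighbours:
                    new_rank[neighbour] += share
            else:
                # A node with no links spreads its rank evenly over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy network: "A" is a hub, "B" hangs off a single cable.
cables = {
    "A": ["B", "C", "D"],
    "B": ["A"],
    "C": ["A", "D"],
    "D": ["A", "C"],
}

ranks = pagerank(cables)
for node in sorted(ranks, key=ranks.get, reverse=True):
    print(node, round(ranks[node], 3))
```

In this toy network the hub "A" comes out on top and the single-cable node "B" at the bottom, which is the kind of ordering the map uses to flag places where the network is concentrated.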

All About the #Selfies
Via Wired:

Right now, there are more than 79 million photos on Instagram that fall under #selfie. This is not counting #selfies (7 million photos), #selfienation (1 million photos), #selfiesfordays (400,000 photos) or the countless number of photos with no hashtag at all. You might be thinking: “Finally, we’ve reached peak #selfie!” But according to a new study, only 3-5 percent of photos on Instagram fall into the category…
…In its short lifespan, the selfie has gone from pop culture phenomenon to academic lab rat. For obvious reasons, these photos are a psychological research goldmine, but there’s been little done in the way of objectively looking at the photos’ content to see how it might reflect the actual world we live in. Selfiecity looks at the trend through a window, not a microscope. Instead of zeroing in on a single narrow element, the Selfiecity project is broken down into a few broad areas: main findings, contextual essays and interactive data visualizations. “We wanted to look at this phenomena from different perspectives,” Manovich explains.
Selfiecity analyzes Instagram data for visual cues like head position, emotional expression, gender and age, in order to get a clearer picture of how (and how often) people actually take selfies in different cultures. “The idea was to confront the generalizations about selfies, which are not based on data, with actual data,” says Manovich. “We wanted to look at what the actual patterns are.”

So, check Selfiecity, it’s mesmerizing.
And then, perhaps, check #SELFIE (Official Music Video), a techno ode to all things selfie, crowdsourced “from so many amazing and funny ppl.”
Image: Selfies in New York, via Wired.

Facebook v Princeton: Who You Got?
Princeton punches first, via The Guardian:

Facebook has spread like an infectious disease but we are slowly becoming immune to its attractions, and the platform will be largely abandoned by 2017, say researchers at Princeton University.
The forecast of Facebook’s impending doom was made by comparing the growth curve of epidemics to those of online social networks. Scientists argue that, like bubonic plague, Facebook will eventually die out.
The social network, which celebrates its 10th birthday on 4 February, has survived longer than rivals such as Myspace and Bebo, but the Princeton forecast says it will lose 80% of its peak user base within the next three years…
…“Ideas, like diseases, have been shown to spread infectiously between people before eventually dying out, and have been successfully described with epidemiological models,” the authors claim in a paper entitled Epidemiological modelling of online social network dynamics.

Facebook punches back:

In keeping with the scientific principle “correlation equals causation,” our research unequivocally demonstrated that Princeton may be in danger of disappearing entirely…
…[Trends suggest] that Princeton will have only half its current enrollment by 2018, and by 2021 it will have no students at all, agreeing with the previous graph of scholarly scholarliness. Based on our robust scientific analysis, future generations will only be able to imagine this now-rubble institution that once walked this earth.

Read through for Facebook’s assorted charts and graphs to back its claims.
The Princeton paper is available via arXiv (PDF).
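
For readers wondering what an epidemiological model looks like in practice, below is a minimal sketch of the textbook SIR (susceptible, infected, recovered) model, stepped forward with simple Euler integration. This is the classic model, not necessarily the exact variant used in the Princeton paper, and every number here (rates, population, time span) is an assumption chosen only to make the rise-and-fall curve visible. Read "infected" as "active user" and "recovered" as "has left the network."

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate the classic SIR equations with fixed Euler steps.

    beta  : adoption (infection) rate per contact
    gamma : abandonment (recovery) rate
    Returns a list of (time, susceptible, infected, recovered) tuples.
    """
    s, i, r = s0, i0, r0
    n = s + i + r
    history = [(0.0, s, i, r)]
    steps = round(days / dt)
    for step in range(1, steps + 1):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((step * dt, s, i, r))
    return history


# Made-up parameters: 1,000 people, one early adopter, 200 time units.
history = simulate_sir(beta=0.5, gamma=0.1, s0=999.0, i0=1.0, r0=0.0, days=200)
peak = max(history, key=lambda row: row[2])
print("peak active users:", round(peak[2]), "at t =", round(peak[0], 1))
print("active users at the end:", round(history[-1][2], 4))
```

The curve climbs, peaks, and then decays toward zero. That shape is what a forecast of this kind extrapolates from, and it is the extrapolation Facebook's rebuttal pokes fun at.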

If you use Netflix, you’ve probably wondered about the specific genres that it suggests to you. Some of them just seem so specific that it’s absurd. Emotional Fight-the-System Documentaries? Period Pieces About Royalty Based on Real Life? Foreign Satanic Stories from the 1980s?

If Netflix can show such tiny slices of cinema to any given user, and they have 40 million users, how vast did their set of “personalized genres” need to be to describe the entire Hollywood universe?

This idle wonder turned to rabid fascination when I realized that I could capture each and every microgenre that Netflix’s algorithm has ever created.

Through a combination of elbow grease and spam-level repetition, we discovered that Netflix possesses not several hundred genres, or even several thousand, but 76,897 unique ways to describe types of movies…

…What emerged from the work is this conclusion: Netflix has meticulously analyzed and tagged every movie and TV show imaginable. They possess a stockpile of data about Hollywood entertainment that is absolutely unprecedented. The genres that I scraped and that we caricature above are just the surface manifestation of this deeper database.
Alexis C. Madrigal, The Atlantic. How Netflix Reverse Engineered Hollywood.
Reporting Immigration, Population and the US Census
The US population grew to just over 316 million in 2013, according to the Census Bureau, which released its population numbers Monday. This is up from 313.8 million in 2012.
With the release of the data, news organizations are giving things a local spin. Take, for example, Florida closing in on New York as the country’s third most populous state; Utah as the country’s second fastest growing state; or Pennsylvania as one of the country’s slowest growing states.
In New York City, the talk is about how diverse the population is, with 37.2% of the population foreign-born.
Via WNYC:

The city’s foreign-born population has crossed the 3 million mark, a figure without precedent in municipal history and indicative of a decades-long metamorphosis of New York’s character.

If you have a few minutes, listen to this segment from the Brian Lehrer Show on New New Yorkers.
If you like to play with data, you can download the Census information here.
Image: Screenshot, Top 10 Immigrant Groups in Woodside, Queens, via NYC.gov’s Where are New York City’s Immigrants/Top Groups Living?

Evolution, or Lack Thereof
Via Pew Research Center, Public Views on Human Evolution.

What Surveillance Valley knows about you

Via PandoDaily:

No source of information is sacred: transaction records are bought in bulk from stores, retailers and merchants; magazine subscriptions are recorded; food and restaurant preferences are noted; public records and social networks are scoured and scraped. What kind of prescription drugs did you buy? What kind of books are you interested in? Are you a registered voter? To what non-profits do you donate? What movies do you watch? Political documentaries? Hunting reality TV shows?

That info is combined and kept up to date with address, payroll information, phone numbers, email accounts, social security numbers, vehicle registration and financial history. And all that is sliced, isolated, analyzed and mined for data about you and your habits in a million different ways…

…Take MEDbase200, a boutique for-profit intel outfit that specializes in selling health-related consumer data. Well, until last week, the company offered its clients a list of rape victims (or “rape sufferers,” as the company calls them) at the low price of $79.00 per thousand. The company claims to have segmented this data set into hundreds of different categories, including stuff like the ailments they suffer, prescription drugs they take and their ethnicity…

…[I]f lists of rape victims aren’t your thing, MEDbase can sell dossiers on people suffering from anorexia, substance abuse, AIDS and HIV, Alzheimer’s Disease, Asperger Disorder, Attention Deficit Hyperactivity Disorder, Bedwetting (Enuresis), Binge Eating Disorder, Depression, Fetal Alcohol Syndrome, Genital Herpes, Genital Warts, Gonorrhea, Homelessness, Infertility, Syphilis… the list goes on and on and on and on.

PandoDaily reports that some 4,000 data mining companies generate about $200 billion annually. 

Census Bureau Releases Mapping Tool
The US Census Bureau today released an updated set of statistics based on its nationwide 2008-2012 American Community Survey. Along with it, the Bureau has created an interactive map that lets users visually explore communities across the country.
Via the US Census Bureau:

The new application allows users to map out different social, economic and housing characteristics of their state, county or census tract, and to see how these areas have changed since the 1990 and 2000 censuses. The mapping tool is powered by American Community Survey statistics from the Census Bureau’s API, an application programming interface that allows developers to take data sets and reuse them to create online and mobile apps.

Site visitors can explore eight core statistics (e.g., median household income, total population and education levels) via the map.
Those with coding chops can hit up the Census Bureau’s API to develop creations of their own. The API gives access to 40 social, economic and housing topics.
Image: Screenshot, Census Explorer.
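
For those with the coding chops mentioned above, here is a hedged sketch of what a first request to the Census Bureau's API might look like in Python. The base URL, the dataset name "acs5", the variable code "B01003_001E" (which we take to mean total population) and the header-row-first response layout are assumptions to check against the Bureau's own API documentation. The sample response is invented so the snippet runs without a network connection or an API key.

```python
from urllib.parse import urlencode

# Assumed base URL; confirm against the Census Bureau's API documentation.
BASE_URL = "https://api.census.gov/data"


def build_query(year, dataset, variables, geography, key=None):
    """Build a request URL for one dataset, variable list and geography."""
    params = {"get": ",".join(variables), "for": geography}
    if key:
        params["key"] = key
    # Keep commas, colons and asterisks readable instead of percent-encoding them.
    return f"{BASE_URL}/{year}/{dataset}?{urlencode(params, safe=',:*')}"


def rows_to_records(rows):
    """Turn a list-of-lists response (header row first) into dictionaries."""
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


url = build_query(2012, "acs5", ["NAME", "B01003_001E"], "state:*")
print(url)

# Invented placeholder response, not real Census figures.
sample_response = [
    ["NAME", "B01003_001E", "state"],
    ["Example State", "12345", "99"],
]
records = rows_to_records(sample_response)
print(records[0]["NAME"], records[0]["B01003_001E"])
```

From there, fetching the URL with any HTTP client and passing the decoded JSON to rows_to_records would give you one dictionary per state to map or chart.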

Internet Populations
Cartograms are interesting. Instead of displaying political boundaries, they show data boundaries. So, for example, mapping the world across social and economic indicators.
Here, though, is Internet penetration, via the Oxford Internet Institute. It represents who’s online and where.
Via The Atlantic

The map, created as part of the Information Geographies project at the Oxford Internet Institute, has two layers of information: the absolute size of the online population by country (rendered in geographical space) and the percent of the overall population that represents (rendered by color). Thus, Canada, with a relatively small number of people takes up little space, but is colored dark red, because more than 80 percent of people are online. China, by contrast, is huge, with more than half a billion people online, but relatively lightly shaded, since more than half the population is not online. Lightly colored countries that have large populations, such as China, India, and Indonesia, are where the Internet will grow the most in the years ahead.

And, via the Oxford Institute’s Mark Graham and Stefano De Sabbata, some trends:

First, the rise of Asia as the main contributor to the world’s Internet population; 42% of the world’s Internet users live in Asia, and China, India, and Japan alone host more Internet users than Europe and North America combined…
…The map also reveals interesting patterns in some of the world’s poorest countries. Most Latin American countries now can count over 40% of their citizens as Internet users. Because of this, Latin America as a whole now hosts almost as many Internet users as the United States.
Some African countries have seen staggering growth, whereas others have seen little change since we last mapped Internet use globally in 2008. In the last three years, almost all North African countries doubled their population of Internet users (Algeria being a notable exception). Kenya, Nigeria, and South Africa also saw massive growth. However, it remains that over half of Sub-Saharan African countries have an Internet penetration of less than 10%, and have seen very little growth in recent years.
It is therefore important to remember that despite the massive impacts that the Internet has on everyday life for many people, most people on our planet remain entirely disconnected. Only one third of the world’s population has access to the Internet.

FJP: Global mobile penetration? At 6.8 billion mobile subscribers, that’s another story. So, disconnected in a sense. But being mobile can be very connected.
Image: Internet Population and Penetration, via the Oxford Internet Institute. Select to embiggen.

laughingsquid:

Privacy Opinions by xkcd

FJP: We’re with the sage.

If you can’t present your ideas to at least a modestly larger audience, then it’s not going to do you very much good. Einstein supposedly said that I don’t trust any physics theory that can’t be explained to a 10-year-old. A lot of times the intuitions behind things aren’t really all that complicated.
Nate Silver in a Q&A with Harvard Business Review on how to get into data science as a newbie (student, professional, or otherwise).
Laughing at those who read about Miley Cyrus is America’s second-favorite pastime, right after reading about Miley Cyrus.

New York Magazine, Final Tally: Americans Were 12 Times More Interested in Miley Cyrus Than Syria.

Background: Outbrain, the content discovery platform, crunched numbers across its network of publishers to compare reader interest in stories about Syria versus those about Miley Cyrus:

Globally, there were almost 2.5 times as many available stories on Syria as there were on Miley Cyrus. Yet consumption of those Miley stories outpaced Syria by a factor of 8-to-1. And in the United States? 12-to-1!

Before those outside the States start casting their serious news stones, take stock: “Interest in the starlet significantly outpaced Syria in England, Australia, France, Germany, and every other nation in Outbrain’s analysis — except Israel and Russia.”

We just happen to fetishize her a bit more.

Visualizing the Bible

Top: textual cross-references within the Bible via Chris Harrison:

The bar graph that runs along the bottom represents all of the chapters in the Bible. Books alternate in color between white and light gray. The length of each bar denotes the number of verses in the chapter. Each of the 63,779 cross references found in the Bible is depicted by a single arc - the color corresponds to the distance between the two chapters, creating a rainbow-like effect.

Bottom: applying sentiment analysis in the Bible, via OpenBible:

This visualization explores the ups and downs of the Bible narrative, using sentiment analysis to quantify when positive and negative events are happening…

Things start off well with creation, turn negative with Job and the patriarchs, improve again with Moses, dip with the period of the judges, recover with David, and have a mixed record (especially negative when Samaria is around) during the monarchy. The exilic period isn’t as negative as you might expect, nor the return period as positive. In the New Testament, things start off fine with Jesus, then quickly turn negative as opposition to his message grows. The story of the early church, especially in the epistles, is largely positive.

For more examples of biblical visualizations, visit The Guardian.

Images: Visiting the source links above gives you biggie versions. Alternatively, select to embiggen.
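
The sentiment chart described above rests on scoring passages as positive or negative and then smoothing the scores so the broad ups and downs show through. The excerpt doesn't spell out OpenBible's actual method, so what follows is only a toy word-counting sketch in Python. The word lists and the three sentences are invented for illustration and are not Bible quotations.

```python
# Tiny made-up lexicons; a real analysis would use a far larger word list.
POSITIVE = {"good", "blessed", "joy", "love", "peace"}
NEGATIVE = {"evil", "wrath", "death", "war", "curse"}


def score(text):
    """Count positive words minus negative words in one passage."""
    words = [word.strip(".,;:!?").lower() for word in text.split()]
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


def moving_average(values, window):
    """Smooth a series by averaging each value with up to window - 1 earlier ones."""
    smoothed = []
    for idx in range(len(values)):
        start = max(0, idx - window + 1)
        chunk = values[start:idx + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Invented example passages, in narrative order.
passages = [
    "And it was good, full of joy and peace.",
    "Then came war and death.",
    "Love returned.",
]

scores = [score(passage) for passage in passages]
smoothed = moving_average(scores, window=2)
print(scores)    # [3, -2, 1]
print(smoothed)  # [3.0, 0.5, -0.5]
```

Plot the smoothed values in order and you get a miniature version of the rise-and-dip line in the chart.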

The ‘Mood Graph’: How Our Emotions Are Taking Over the Web

Wired’s Evan Selinger describes what he sees as a new direction of the Internet, wherein platforms now focus on tracking and categorizing how users feel about the content they consume:

The point is that all these interfaces are now focusing on the emotional aspects of our information diets. To put this development in a broader context: the mood graph has arrived, taking its place alongside the social graph (most commonly associated with Facebook), citation-link graph and knowledge graph (associated with Google), work graph (LinkedIn and others), and interest graph (Pinterest and others).

Like all these other graphs, the mood graph will enable relevance, customization, targeting; search, discovery, structuring; advertising, purchasing behaviors, and more. It also signals an important shift in computer-mediated communication.

Several aspects of this “mood graph” concern Selinger, including the potential of the “pre-fabricated symbols” of digital emotional communication (emoji, emoticons, and so on) to simplify the range and complexity of our feelings as well as the monetization of emotional tracking by companies like Facebook into advertising revenue.

FJP: Selinger cites Bitly for feelings and methods of user-reported emotional expression in his piece, but other applications attempt to track mood using “raw” data, like the MoodScope, which analyzes smartphone data with an algorithm that takes into account sites visited (both physically and online), apps used, friends contacted, etc. Biofeedback technologies, such as Affectiva, collect data like facial expression, skin conductance, and heart rate to measure emotional state. These extensions of the Quantified Self movement have the potential to provide a more nuanced measure of our feelings than tracking premeditated verbal communication.

I also just want to mention that Tumblr culture seems to have developed its own language conventions (purposeful capitalization, lack of punctuation, etc.) to facilitate emotive expression (see this great Tumblr meta-discussion for more thoughts on that). So there is a way for language to accommodate tone and emotion to more closely mimic “IRL” interaction. And we might already be seeing that shift in mainstream language use. Shining