Posts tagged with ‘data’

Laughing at those who read about Miley Cyrus is America’s second-favorite pastime, right after reading about Miley Cyrus.

New York Magazine, Final Tally: Americans Were 12 Times More Interested in Miley Cyrus Than Syria.

Background: Outbrain, the content discovery platform, crunched numbers across its network of publishers to compare reader interest in stories about Syria versus those about Miley Cyrus:

Globally, there were almost 2.5 times as many available stories on Syria as there were on Miley Cyrus. Yet consumption of those Miley stories outpaced Syria by a factor of 8-to-1. And in the United States? 12-to-1!

Before those outside the States start casting their serious news stones, take stock: “Interest in the starlet significantly outpaced Syria in England, Australia, France, Germany, and every other nation in Outbrain’s analysis — except Israel and Russia.”

We just happen to fetishize her a bit more.
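
FJP: A rough back-of-the-envelope sketch of what those two figures imply per story. It assumes the 8-to-1 figure describes total consumption and the 2.5 figure describes available stories; the function and variable names are ours, not Outbrain's.

```python
def per_story_ratio(consumption_ratio: float, supply_ratio: float) -> float:
    """Interest per available story for topic A relative to topic B.

    consumption_ratio: total reads of A divided by total reads of B
    supply_ratio: available stories on B divided by available stories on A
    """
    # (reads_A / stories_A) / (reads_B / stories_B)
    #   = (reads_A / reads_B) * (stories_B / stories_A)
    return consumption_ratio * supply_ratio


# Global figures quoted above: 8-to-1 consumption, roughly 2.5x more Syria stories.
global_ratio = per_story_ratio(consumption_ratio=8.0, supply_ratio=2.5)
print(global_ratio)  # 20.0
```

Under those assumptions, each available Miley story drew roughly 20 times the readership of each available Syria story worldwide.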

Visualizing the Bible

Top: textual cross-references within the Bible via Chris Harrison:

The bar graph that runs along the bottom represents all of the chapters in the Bible. Books alternate in color between white and light gray. The length of each bar denotes the number of verses in the chapter. Each of the 63,779 cross references found in the Bible is depicted by a single arc - the color corresponds to the distance between the two chapters, creating a rainbow-like effect.

Bottom: applying sentiment analysis in the Bible, via OpenBible:

This visualization explores the ups and downs of the Bible narrative, using sentiment analysis to quantify when positive and negative events are happening…

Things start off well with creation, turn negative with Job and the patriarchs, improve again with Moses, dip with the period of the judges, recover with David, and have a mixed record (especially negative when Samaria is around) during the monarchy. The exilic period isn’t as negative as you might expect, nor the return period as positive. In the New Testament, things start off fine with Jesus, then quickly turn negative as opposition to his message grows. The story of the early church, especially in the epistles, is largely positive.
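
For the curious, here is a minimal sketch of how a sentiment curve like this could be built. It is not OpenBible's method: the tiny word lists, the sample passages, and the window size are all hypothetical, chosen only to show the idea of scoring each passage and then smoothing the scores.

```python
# Hypothetical mini-lexicons; a real analysis would use a far larger one.
POSITIVE = {"good", "blessed", "joy", "love", "peace"}
NEGATIVE = {"evil", "death", "wrath", "curse", "destroy"}


def score(text: str) -> int:
    """Count positive words minus negative words in a passage."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def moving_average(values, window):
    """Trailing moving average, so the curve shows trends rather than spikes."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Illustrative passages, not actual verses.
passages = [
    "The people saw that the harvest was good.",
    "Wrath and death came upon the city.",
    "Love and peace returned to the land.",
]
scores = [score(p) for p in passages]
smoothed = moving_average(scores, window=2)
print(scores)    # [1, -2, 2]
print(smoothed)  # [1.0, -0.5, 0.0]
```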

For more examples of biblical visualizations, visit The Guardian.

Images: Visiting the source links above gives you biggie versions. Alternatively, select to embiggen.

The ‘Mood Graph’: How Our Emotions Are Taking Over the Web →

Wired’s Evan Selinger describes what he sees as a new direction of the Internet, wherein platforms now focus on tracking and categorizing how users feel about the content they consume:

The point is that all these interfaces are now focusing on the emotional aspects of our information diets. To put this development in a broader context: the mood graph has arrived, taking its place alongside the social graph (most commonly associated with Facebook), citation-link graph and knowledge graph (associated with Google), work graph (LinkedIn and others), and interest graph (Pinterest and others).

Like all these other graphs, the mood graph will enable relevance, customization, targeting; search, discovery, structuring; advertising, purchasing behaviors, and more. It also signals an important shift in computer-mediated communication.

Several aspects of this “mood graph” concern Selinger, including the potential of the “pre-fabricated symbols” of digital emotional communication (emoji, emoticons, and so on) to simplify the range and complexity of our feelings as well as the monetization of emotional tracking by companies like Facebook into advertising revenue.

FJP: Selinger cites Bitly for Feelings and methods of user-reported emotional expression in his piece, but other applications attempt to track mood using “raw” data, like the MoodScope, which analyzes smartphone data with an algorithm that takes into account sites visited (both physically and online), apps used, friends contacted, etc. Biofeedback technologies, such as Affectiva, collect data like facial expression, skin conductance, and heart rate to measure emotional state. These extensions of the Quantified Self movement have the potential to provide a more nuanced measure of our feelings than tracking premeditated verbal communication.

I also just want to mention that Tumblr culture seems to have developed its own language conventions (purposeful capitalization, lack of punctuation, etc.) to facilitate emotive expression (see this great Tumblr meta-discussion for more thoughts on that). So there is a way for language to accommodate tone and emotion to more closely mimic “IRL” interaction. And we might already be seeing that shift in mainstream language use. Shining

Learn Data-Driven Journalism with the Knight Center!

Itching to learn about data-driven journalism and its increasing role in the American newsroom? It’s your lucky day!

Starting August 12 (this Monday!), The Knight Center for Journalism in the Americas at UT Austin will host a free, five-week Massive Open Online Course (MOOC) called “Data-Driven Journalism: The Basics.”

The course brings together five experts in the field – from publications that include The New York Times, ProPublica, NPR and the Houston Chronicle – to explore “how data is used in the media industry today, where to locate data, how to clean and analyze it critically, and how to optimize the presentation of information for maximum readability and interactivity.” 

Sign up here or learn more about the course here and here!

FJP: It’s open to anyone with an internet connection and a disciplined desire to learn, and I have at least one of these things. Also teaching innovative journalism via an innovative education platform seems meta in a good way to me. Hope to see some of you in the “classroom”!  Shining

Video: YouTube, Data-Driven Journalism: The Basics - MOOC Course with Knight Center

Compare Cities with Mapping Tool Urban Observatory

via Next City:

The cities of the world have a communication problem, and Richard Saul Wurman wants to solve it.

“They don’t collect their information the same way. They don’t describe themselves with the same legend,” says Wurman, an architect, graphic designer and founder of the TED conferences. “One city might have five different patterns of industrial types of land use and another might have one. One city might call an airport ‘transportation’ and another might call it ‘commercial.’ They call everything by different names.”

[…] Wurman has partnered with GIS mapping firm Esri and production company Radical Media to create a digital, data-rich city comparison tool called Urban Observatory.

Wurman unveiled the project Monday at the Esri International User Conference, currently underway in San Diego. (About five years ago he tried to launch a similar project called 19.20.21, which sought to compare the largest cities in the world, though that effort fizzled.)

This updated project places 16 different world cities next to each other and maps out various pieces of public data. Urban Observatory’s website places maps of three cities side by side and enables users to select which of the 15 data variables to display. Users can compare housing density in Tokyo, Paris and New York, or traffic in Singapore, Rotterdam and Delhi, or how much open space there is in London versus Mumbai versus Auckland. The maps can be dragged and zoomed and, crucially, always display at the same scale, making it easy to see how each city compares to the other.

“Every city has plans to try to reduce their crime, improve their education, reduce their pollution, reduce flooding, have better utilities,” Wurman says. “Wouldn’t it be nice if they could learn from the successes and failures of other places?”

Images: Top: physical exhibition of project at the Esri conference, Urban Observatory’s Flickr. Bottom: Urban Observatory comparison maps, ArchDaily.

The World of Verified Twitter Users

Twitter constructed a nifty visualization map of the mutual follows between 50,000 users with verified accounts. The map categorized the users by color: news (blue), government and politics (purple), music (red), sports (yellow) and TV (green). Twitter found some interesting trends:

One of the many fascinating things about this diagram is that it shows which accounts tend to follow those outside their category. For example, the reason that blue and purple almost seem to merge into one another is that journalists tend to follow politicians, and vice versa. The same is true of TV and music, down in the bottom right, with musicians and TV stars following each other often.

We can even see how usage varies by country. For instance, on the left you have a purple swath of government users following yellow sports users — it turns out these are largely UK politicians following prominent athletes. In the top middle, a line of Spanish-language pop stars, TV companies, sportspeople and government bodies. The purple outcrop at around two o’clock is Japanese politics; the red island below it is Japanese music.

Images: Twitter Media Blog, interactive map (bottom is zoomed-in image)

The Plural of Anecdote is Data

Via @kissane.

Selling Data, Taking Things in Your Hands Edition

A common truism says that if it’s free and on the Web, you’re not the customer but the product being sold. Also common is the following reaction: What can I do about that? The less common reaction: How can I get in on that?

Try this one on as a thought experiment.

Via Slate:

In a world of privacy-invading smartphone apps and government-grade spyware, keeping personal data personal online can seem like a difficult task. But could you make money by choosing to give away logs of your most intimate data?

Federico Zannier is trying to find out. Emails, chat logs, location data, browser history, screenshots—you name it, the New York-based software developer is selling it all. With a Kickstarter campaign launched earlier this month, Zannier, a 28-year-old Italian-born master’s student at NYU, is offering to hand over a day’s digital footprint for a measly $2. He says he “violated his own privacy” starting back in February for about 50 days straight, recording screenshots and webcam snaps of himself every 30 seconds and tracking his every footstep using GPS technology. He logged the address of each Web page he visited—storing some 3 million lines of text—and accumulated a massive trove of 21,124 webcam photos and 19,920 screen shots.

Zannier’s aim, somewhat paradoxically, is to take ownership of his own data by selling it. He points out that we often hand over our private data unwittingly, given that few people take the time to read the terms and conditions of apps and online services. Companies rake in millions of dollars selling our information to marketing firms while we receive little in return. But Zannier’s Kickstarter is not just out to make a statement about online privacy—he plans to use the funds to create a browser extension and a smartphone app that he says will help others sell their own data. “If more people do the same, I’m thinking marketers could just pay us directly for our data,” he writes on his Kickstarter page. “It might sound crazy, but so is giving all our data away for free.”

So, just as the Web often disrupts, let’s cut out the middleman.

Image: It’s Free, But They Sell Your Information, via Telco 2.0.

#OpenGov

President Obama signs an Executive Order: Making Open and Machine Readable the New Default for Government Information.

To promote continued job growth, Government efficiency, and the social good that can be gained from opening Government data to the public, the default state of new and modernized Government information resources shall be open and machine readable. Government information shall be managed as an asset throughout its life cycle to promote interoperability and openness, and, wherever possible and legally permissible, to ensure that data are released to the public in ways that make the data easy to find, accessible, and usable.

Image: Twitter post from Luke Fretwell.

The Periodic Table of Star Wars, Episodes IV, V and VI

While they don’t claim to have every character in the original trilogy, they do have the major ones.

Via etckt:

The first thing we had to think about when designing this new table of elements was the data that was to be contained on the tile. Naturally, there is the Element ID and name but what else could we include. Working through some thumbnails, we settled on the cast order, episode number and the actor’s initials.

When working through the first drafts, it was starting to look good, but wasn’t entirely what the original concept we had hoped for delivering. After much research, we were able to find one of the alphabets used in the films, Aurebesh, and decided to use that for some of the ancillary data on the tile.

The coloring of the elements comes from variations on Luke and Darth Vader’s light sabers.

FJP: Be still, nerd hearts. Be still.

The Geography of a Tweet

A team of researchers led by GDELT co-creator Kalev Leetaru gained access to the Twitter decahose last October and November and examined 1.5 billion tweets from 71 million users.

Among the many things they parsed from the two terabytes of data was the average physical distance between an original tweet and its retweet: some 749 miles (1205 km).

For @ mentions, the average distance between one user referencing another when exact geolocation is known is 744 miles (1197 km).
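
For readers wondering how a figure like this might be computed, here is a minimal sketch using the haversine great-circle formula. We are assuming this approach for illustration; the paper may measure distance differently, and the coordinate pairs below are hypothetical, not taken from the study.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius
KM_PER_MILE = 1.609344


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def average_distance_miles(pairs):
    """Mean distance over (origin, destination) coordinate pairs."""
    distances = [haversine_miles(*origin, *dest) for origin, dest in pairs]
    return sum(distances) / len(distances)


# Hypothetical (tweet location, retweet location) pairs.
pairs = [
    ((40.7128, -74.0060), (34.0522, -118.2437)),  # New York -> Los Angeles
    ((51.5074, -0.1278), (48.8566, 2.3522)),      # London -> Paris
]
mean_miles = average_distance_miles(pairs)
print(round(mean_miles), "miles,", round(mean_miles * KM_PER_MILE), "km")
```

The same conversion factor reproduces the kilometer figures quoted in this post: 749 miles rounds to 1205 km and 744 miles to 1197 km.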

The paper, Mapping the Global Twitter Heartbeat: The Geography of Twitter, also includes the geographic difference between mainstream news media and news items from Twitter:

Mainstream media appears to have significantly less coverage of Latin America and vastly greater coverage of Africa. It also covers China and Iran much more strongly, given their bans on Twitter, as well as having enhanced coverage of India and the Western half of the United States. Overall, mainstream media appears to have more even coverage, with less clustering around major cities.

Image: Detail, Network map showing locations of users retweeting other users (geocoded Twitter Decahose tweets 23 October 2012 to 30 November 2012), via FirstMonday.org. Select to embiggen.

Data Journalism: From the Inbox

any recommendations for training/workshops in data journalism? (also, i love this blog) — aliciee

Hi there. We love that you love this blog. Here goes:

Since I don’t know where you actually are I’m going to stick to mostly online resources.

One place I’d start is Lynda.com, which is an online training site with video-based courses that range from desktop applications like Photoshop to programming languages like Ruby. It’s subscription-based but you can pay by the month ($25) and drop it at any time. Two courses that might be of interest are Interactive Data Visualization with Processing and Up and Running with R. Also, if you’re still in school, see if it’s available to you for free. Jihii has free access to it at Columbia.

One of the hard things about answering this question though is that there are various moving parts, not least of which is what tools you want to be working with. I mentioned R and Processing above, but there are also tools like Google’s Fusion Tables, Hadoop and Gephi, not to mention a whole host of others.

Which, come to think of it, is probably why you’re asking about training and workshops. Figuring out where to start can be confusing.

So here are some places to start:

Go through the Data Journalism Handbook.
Review DataVisualization’s inspiration on tools you can use.
Hit up Reddit, and head to subreddits such as this one on visualization. Ask questions.
Go to Perugia, Italy. There’s a data journalism conference going on there April 24-28… We can fantasize, right?
In the offline world, take a look at Meetup and Eventbrite for events and workshops. They pop up all the time. For example, here are upcoming workshops in New York City and here are NYC Meetup groups that focus on data.

So, with apologies for not being more specific on actual workshops, that’s what I got for you. Hope it helps. — Michael

Have a question? Ask away.

Image: Using Google Earth to visualize marine and coastal data. Via OpenEarth.

Making Data Sausage

New York Times Senior Software Architect Jacob Harris takes a deep look at how to work with data to tell a narrative. He does so by going step by step through his own analysis of US food safety from data sets of food recalls taken from the US Department of Agriculture.

While Harris defines himself as a computer scientist, rather than a journalist, he says the reporting process for each is much the same once he starts looking at data. Specifically, as he sets out he works on:

  1. Gathering the data we need to tell a story
  2. “Interviewing” the data to find its strengths and limitations
  3. Finding the specific narratives in the data we want to share and can support with data
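
To make the second step concrete, here is a minimal sketch of what “interviewing” a data set could look like. Harris works in Ruby; this is our own Python illustration, and the sample rows, column names, and `interview` function are hypothetical rather than drawn from his USDA data.

```python
import csv
import io
from collections import Counter

# Hypothetical recall records, including a duplicate row and missing values.
SAMPLE = """recall_id,date,product,reason
001,2012-01-05,ground beef,E. coli
002,2012-02-11,chicken salad,Listeria
002,2012-02-11,chicken salad,Listeria
003,,sausage,
"""


def interview(rows):
    """Ask basic questions of the data before trusting it with a story."""
    rows = list(rows)
    fields = list(rows[0].keys()) if rows else []
    id_counts = Counter(r["recall_id"] for r in rows)
    dates = sorted(r["date"] for r in rows if r["date"])
    return {
        "row_count": len(rows),
        # IDs that appear more than once may be double-counted recalls.
        "duplicate_ids": sorted(i for i, n in id_counts.items() if n > 1),
        # How many rows leave each field blank?
        "missing": {f: sum(1 for r in rows if not r[f]) for f in fields},
        # ISO dates sort correctly as strings.
        "date_range": (dates[0], dates[-1]) if dates else None,
        "reasons": Counter(r["reason"] for r in rows if r["reason"]),
    }


report = interview(csv.DictReader(io.StringIO(SAMPLE)))
for question, answer in report.items():
    print(question, "->", answer)
```

Checks like these surface the limitations (duplicates, blanks, gaps in the date range) before any narrative gets built on top of the numbers.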

It’s this second step, the interviewing, I find most interesting. I also like his word choice. It’s much less martial than the “interrogation” many use to describe the process.

Before jumping into his case study, Harris writes:

What do I mean by narrative? Narrative is what makes it data journalism. We could just put a large PDF or SQL dump online, but that’s not very informative to anyone but experts. The art is finding the stories in the data the way a sculptor finds a statue in the marble.

For the data-curious, give Harris a read. He moves from high-level strategizing and understanding of how to analyze data, including what type of questions to ask of it during “the interview,” to getting down and dirty with Ruby on Rails examples of how to actually work with the data once scraped. In other words, there’s fun for the whole family here.

Related: Our interview with Bitly data chief Hilary Mason about her methodology for working with data.

Somewhat Related: Alex Williams, TechCrunch. Data Is Not Killing Creativity, It’s Just Changing How We Tell Stories.

Image: Sausage Making via Wikimedia Commons.

Visualizing George Takei Photo Sharing

When George Takei posts an image on Facebook it generally generates a lot of shares. For example, this image of Marvin the Martian, which Takei cleverly posted as “The first image has now been received from Curiosity on Mars,” has seen over 311,000 shares.

Stamen Design has looked at a few of Takei’s photo posts and visualized how they spread through the social network:

Called “Photo-sharing Explosions,” these visualizations look at the different ways that photos shared on George Takei’s Facebook page go viral once he’s posted them.

Each visualization is made up of a series of branches, starting from George. As each branch grows, re-shares split off onto their own arcs. Sometimes, these re-shares spawn a new generation of re-shares, and sometimes they explode in short-lived bursts of activity. The two different colors show gender, and each successive generation becomes lighter as time goes by. And the curves are just for snazz.

Visit Facebook Stories to see Stamen’s other Takei visualizations.