Posts tagged Wikipedia

In less than a decade, Wikipedia has grown from a frequently ridiculed experiment to one of the world’s most popular websites. The online encyclopedia has reached near-ubiquity among Internet users and is often invoked as a synecdoche for user-generated content communities, crowdsourcing, peer production, and Web 2.0. As such, it is hardly surprising that a number of high-impact statistics demonstrating the project’s unexpected success are frequently mentioned in the public sphere. As of April 2012, there have been 528 million edits made to the English-language version and a total of 1.29 billion edits across all language versions. Other commentators describe the project in terms of its article content, not the amount of work put into those articles, and such figures are equally daunting: 19 million encyclopedia articles contain 8 billion words in 270 languages, and the English-language Wikipedia alone has 3.9 million articles containing 2.5 billion words.

While most of these and other statistics are backed up by a substantial amount of empirical research, estimations of the total number of labor-hours contributed to Wikipedia are one notable exception. However, this has not stopped champions of the project from stating with more and less certainty that Wikipedia is one of the largest projects in human history…

…[A] well-documented and often-repeated labor hour estimation is that of the Empire State Building, which took 3,000 laborers a total of 7 million labor-hours to construct. Figures for the construction of the Channel Tunnel report a total 170 million labor-hours, while estimations of the Great Pyramid at Giza range from 880 million to 3.5 billion labor-hours. The first edition of the Encyclopedia Britannica was written and published by 3 employees authoring 24 pages a week for 100 weeks, which is around 12,000 labor-hours assuming 40 hour work week…

…Summing the duration of all continuous editing sessions and single edit sessions, we identified 41,018,804 total labor-hours expended in the English-language version of Wikipedia… Extrapolating to all language version of Wikipedia based on the total number of edits made to each project, we estimate that 61,706,883 total labor-hours have been contributed in edit sessions for non-English language Wikipedias, for a total of 102,673,683 total labor-hours to all Wikipedia versions.

R. Stuart Geiger and Aaron Halfaker, Using Edit Sessions to Measure Participation in Wikipedia (PDF).

FJP: That’s approximately 11,720 years of peer production. 
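A quick sketch of that conversion, assuming a "year" here means 365 days of round-the-clock work (our reading of the arithmetic, not something the paper states). The 40-hour-week variant is included only for comparison, borrowing the assumption the excerpt uses for Britannica.

```python
# Total labor-hours across all Wikipedia versions, as quoted above.
TOTAL_LABOR_HOURS = 102_673_683

# Assumption: a "year" is 365 days of nonstop, 24-hour effort.
HOURS_PER_CONTINUOUS_YEAR = 24 * 365

# Alternative assumption: a working year of 52 weeks at 40 hours each.
HOURS_PER_WORKING_YEAR = 40 * 52


def labor_years(hours: int, hours_per_year: int) -> float:
    """Convert a labor-hour total into years under a chosen definition of a year."""
    return hours / hours_per_year


continuous_years = labor_years(TOTAL_LABOR_HOURS, HOURS_PER_CONTINUOUS_YEAR)
working_years = labor_years(TOTAL_LABOR_HOURS, HOURS_PER_WORKING_YEAR)

print(f"Continuous years: {continuous_years:,.1f}")
print(f"40-hour-week years: {working_years:,.1f}")
```

Which definition of a year you pick changes the headline number by a factor of about four, so it is worth stating alongside any such figure.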

Editing Wikipedia: Gender Disparity Edition
via The Atlantic Wire:

It’s long been known that the ranks of Wikipedia editors are mainly male. But now illustrator Santiago Ortiz has created an interactive looking at the proportion of edits on individual Wikipedia articles made by men vs. women, and it turns out that the gender divide on “the free encyclopedia that anyone can edit” is even starker than we thought.

Read About It.
Image: Screenshot of the most edited articles (green) and most visited articles (blue) by gender on Wikipedia.

Graphing the Influence of Thinkers and Ideas Throughout History

Brendan Griffen has graphed a network of everyone on Wikipedia with recorded influences, linking each person to those they influenced and those who influenced them.

Via Griff’s Graphs:

For those new to this type of thing: the node size represents the number of connections. In short, I used a database version of Wikipedia to extract all people with known influences and made this map. The bigger the node, the bigger influence that person had on the rest of the network. Nietzsche, Kant, Hegel, Hemingway, Shakespeare, Plato, Aristotle, Kafka, and Lovecraft all, as one would expect, appear as the largest nodes. Around these nodes, cluster other personalities who are affiliated (depends on distance). Highlighting communities by colour reveals sub-networks within the total structure. You’ll notice common themes amongst similarly coloured authors.

Griffen’s own influence is Simon Raper, who recently graphed the history of philosophy.

The tools used are similar too:

First I queried Snorql and retrieved every person who had a registered ‘influence’ or registered ‘influenced by’ value (restricted to people only so if they were influenced by ‘anime’, they were excluded).

I then decoded these using a neat little URL decoder and imported them into Microsoft Excel for further processing (removing things like ‘(Musician)’ and other annoying syntax).

I then exported these as a csv and imported into Gephi and proceeded as usual. Fruchterman-Reingold algorithm followed by Force Atlas 2. I then identified communities using ‘Modularity’ and edited the rest in Preview. Due to the size, I’ve had to zoom up and take snapshots on regions of interest.

The csv file containing all of the data can be obtained here so you can make your own maps.
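The decode-and-clean steps Griffen describes can be sketched roughly as below. This is not his actual workflow (he used a URL decoder and Excel); the sample rows, function names, and the choice of pointing each edge from influencer to influenced are all our assumptions for illustration. The `Source,Target` header is the edge-list layout Gephi's CSV import recognizes.

```python
import csv
import io
import re
from urllib.parse import unquote

# Hypothetical sample rows of (person, influenced_by), URL-encoded the way
# resource names often are. Illustrative only; not taken from Griffen's data.
RAW_PAIRS = [
    ("Friedrich_Nietzsche", "Arthur_Schopenhauer"),
    ("S%C3%B8ren_Kierkegaard", "Georg_Wilhelm_Friedrich_Hegel"),
    ("Prince_%28musician%29", "James_Brown"),
]


def clean_name(raw: str) -> str:
    """Decode percent-escapes, swap underscores for spaces, drop '(…)' suffixes."""
    name = unquote(raw).replace("_", " ")
    name = re.sub(r"\s*\([^)]*\)", "", name)
    return name.strip()


def to_edge_csv(pairs) -> str:
    """Write an edge list with one row per influence link."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    for person, influenced_by in pairs:
        # Assumption: edges point from the influencer to the person influenced.
        writer.writerow([clean_name(influenced_by), clean_name(person)])
    return buffer.getvalue()


print(to_edge_csv(RAW_PAIRS))
```

From there the layout, community detection and styling all happen inside Gephi, as described above.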

And yes, as Griffen notes, the information and visualization are biased towards Western ideas and cultures since Wikipedia skews heavily toward English speakers.

Meantime, we’re absolutely gobsmacked.

Read Griffen’s post on the project. Check out the zoomable version. Get yourself a pretty print.

Images: Partial screenshots of Graphing Every* Idea in History, by Brendan Griffen. Select to embiggen.

H/T: Flowing Data.

Taking Wikipedia’s Pulse, Musically

What do changes to Wikipedia sound like? Well, if you track all edits (currently running at about 400 per minute) and map them through Open Sound Control, Pure Data and wikibeat, you come out with some modernist beats.

Watch the above screencast by wikibeat creator Dan Chudnov as he does just this. The audio kicks in about a minute into the video.

As Chudnov describes it, “wikibeat sonifies changes to wikipedia as they happen. it uses Ed Summers’ wikipulse, which monitors changes to each language-specific wikipedia and displays their rates of change as gauges, and creates a series of audible beats based on these change rates. it does this by sending the change rate information to a Pure Data application over OSC.”
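For the curious, here is a minimal sketch of what "sending the change rate over OSC" could look like, using only the Python standard library. It is not wikibeat's code: the `/wikipulse/en` address, the port, and the one-beat-per-edit mapping are hypothetical choices of ours. The byte layout follows the OSC 1.0 message format (null-terminated strings padded to four bytes, then a big-endian 32-bit float).

```python
import socket
import struct


def osc_string(text: str) -> bytes:
    """Encode an OSC string: ASCII, null-terminated, padded to a multiple of 4 bytes."""
    data = text.encode("ascii") + b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def osc_float_message(address: str, value: float) -> bytes:
    """Build an OSC message carrying a single 32-bit big-endian float argument."""
    return osc_string(address) + osc_string(",f") + struct.pack(">f", value)


def beat_interval_seconds(edits_per_minute: float) -> float:
    """Assumed mapping: one beat per edit, so a higher rate means a shorter gap."""
    return 60.0 / edits_per_minute


def send_rate(rate: float, host: str = "127.0.0.1", port: int = 9000) -> None:
    """Send the rate over UDP to a listener such as a Pure Data patch.

    The address '/wikipulse/en' and port 9000 are placeholders, not wikibeat's.
    """
    message = osc_float_message("/wikipulse/en", rate)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (host, port))


if __name__ == "__main__":
    print(f"Beat every {beat_interval_seconds(400):.2f} s at 400 edits per minute")
```

A Pure Data patch listening on the matching port would then turn each incoming float into tempo or pitch however its author sees fit.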

Read through for links to source code to try it on your own.

H/T: Dario Taraborelli, Senior Research Analyst at the Wikimedia Foundation, whom we interviewed in January (podcast).

Wikipedia: Goodbye Google Maps, Hello OpenStreetMap

Wikipedia joins a growing list of high-profile organizations leaving Google Maps for the open-source OpenStreetMap. The move comes after Google announced in March that it would begin charging Web sites that receive more than 25,000 requests per month for use of its maps.

Via Wikipedia:

Previous versions of our application used Google Maps for the nearby view. This has now been replaced with OpenStreetMap - an open and free source of Map Data that has been referred to as ‘Wikipedia for Maps.’ This closely aligns with our goal of making knowledge available in a free and open manner to everyone. This also means we no longer have to use proprietary Google APIs in our code, which helps it run on the millions of cheap Android handsets that are purely open source and do not have the proprietary Google applications.

OpenStreetMap is used in both iOS and Android, thanks to the amazing Leaflet.js library. We are currently using Mapquest’s map tiles for our application, but plan on switching to our own tile servers in the near future.

Also, via Techspot, a look at mapping economics:

In March, Google announced it would be charging high-volume users for its once gratis Google Maps service. Developer accounts which pull in fewer than 25,000 requests per month are not considered high-volume and thus have remained free. However, for accounts that exceed 25,000 views, developers must pay between $4 to $10 for every additional 1,000 views generated. For popular websites and apps that rely on Google Maps APIs, this can add up pretty quickly…

…Although some may be quick to call out Google for its decision to charge a premium, Google Maps has really been the only mapping service to offer its product to everyone without cost. Traditionally, companies like NavTeq and TeleNav have always licensed their map data to third parties. It costs a lot of money to put together accurate maps and Google took some risk offering theirs free of charge. As a result, Google Maps has become the go-to place for many companies and users alike. In fact, comScore found that over 71% of Americans had used Google Maps in February.
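To see how quickly that adds up, here is a back-of-the-envelope sketch using only the figures Techspot quotes: a 25,000-view free allowance and $4 to $10 per additional 1,000 views. The function name, the sample traffic levels, and the pro-rata billing of partial thousands are our assumptions, not Google's published terms.

```python
# Free allowance, as quoted by Techspot above.
FREE_VIEWS = 25_000


def overage_cost(views: int, price_per_thousand: float) -> float:
    """Cost of views beyond the free allowance.

    Assumption: partial thousands are billed pro rata rather than rounded up.
    """
    extra_views = max(0, views - FREE_VIEWS)
    return extra_views / 1000 * price_per_thousand


# Hypothetical traffic levels, priced at both ends of the quoted range.
for views in (20_000, 100_000, 1_000_000):
    low = overage_cost(views, 4.0)
    high = overage_cost(views, 10.0)
    print(f"{views:>9,} views: ${low:,.0f} to ${high:,.0f}")
```

Under these assumptions, a site with a million map views in a billing period would owe somewhere between $3,900 and $9,750 for it.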

Mapping Wikipedia
Via Tracemedia:

Mapping Wikipedia is a groundbreaking visualisation of the world mapped according to articles in 7 different languages. The map displays both the global patterns and the vast number of geo-located items. The dataset was produced by the Oxford Internet Institute as part of a project that examines Wikipedia in the Middle East and North Africa…
…The project was developed using the excellent Open Layers. To display the large number of articles we wrote a subclass of the Open Layers Canvas renderer, and optimised for point plotting. As a fallback for browsers that don’t support canvas we included the FlashCanvas shim. 
The Google basemap was produced using the Styled Map Wizard
To glue everything together we used jQuery.

Image: English, by Wordcount in Europe. Via Tracemedia.
H/T: Flowing Data

Print will survive. Books will survive even longer. It’s print as a marker of prestige that’s dying.

Tim Carmody, in his piece, Wikipedia Didn’t Kill Britannica. Windows Did.

Carmody argues that Britannica began to die long before Wikipedia was around, likely around the advent of Microsoft’s Encarta CD-ROMs. He’s not too sad about it, though, because he sees Britannica as more a marker of prestige than an actually valuable, usable information source.

In short, Britannica was the 18th/19th equivalent of a shelf full of SAT prep guides. Or later, a family computer.

“I suspect almost no one ever opened their Britannicas,” says Appelbaum. “Britannica’s own market research showed that the typical encyclopedia owner opened his or her volumes less than once a year,” say Greenstein and Devereux.

“It’s not that Encarta made knowledge cheaper,” adds Appelbaum, “it’s that technology supplanted its role as a purchasable ‘edge’ for over-anxious parents. They bought junior a new PC instead of a Britannica.”

In another piece on the subject (this one in Slate), Farhad Manjoo also argues that Britannica is fairly useless as an information source and that Wikipedia is something of a savior.

My advice is to make the wiser, cheaper choice, one that will prove more helpful to your kids in the long run: Pay nothing to Britannica and teach your young ones to use Google and Wikipedia. While there are many legitimate complaints to be leveled at Wikipedia (rarely, it gets things wrong; sometimes, its entries are vandalized), the free, crowdsourced encyclopedia is better than Britannica in every way. It’s cheaper, it’s bigger, it’s more accessible, it’s more inclusive of differing viewpoints and subjects beyond traditional academic scholarship, its entries tend to include more references, and it is more up to date.

Most importantly, learning to navigate Google and Wikipedia prepares you for the real world, while learning to use Britannica teaches you nothing beyond whatever subject you’re investigating at the moment.

In that regard, Manjoo seems to have a point. Learning to fact-check is a key media literacy skill, useful not only for journalists and writers but for news consumers as well.

Don’t buy what Britannica’s selling. Its reliance on expert authority may yield mostly accurate information, but it teaches kids to believe everything they read. If you pay for this service, you’re building a cocoon of truth around students who’ll one day enter a world where everyone claims to be an expert—and where a lot of those people are lying. If you want to learn to suss out the liars, there’s no better training than Wikipedia.

Thoughts?

After 244 Years, Encyclopaedia Britannica Stops the Presses
From the NY Times:

After 244 years, the Encyclopaedia Britannica is going out of print.
Those coolly authoritative, gold-lettered sets of reference books that were once sold door to door by a fleet of traveling salesmen and displayed as proud fixtures in American homes will be discontinued, the company is expected to announce on Wednesday.
In a nod to the realities of the digital age — and, in particular, the competition from the hugely popular Wikipedia — Encyclopaedia Britannica will focus primarily on its online encyclopedias and educational curriculum for schools, company executives said.
The last edition of the encyclopedia will be the 2010 edition, a 32-volume set that weighs in at 129 pounds and includes new entries on global warming and the Human Genome Project.
“It’s a rite of passage in this new era,” Jorge Cauz, the president of Encyclopaedia Britannica Inc., a Chicago-based company, said in an interview. “Some people will feel sad about it and nostalgic about it. But we have a better tool now. The Web site is continuously updated, it’s much more expansive and it has multimedia.”

What Does Wikipedia Have Against Soap?
Image: Screenshot of Twitter reactions to Wikipedia blackout, via @herpderpedia

Data and Peer Production with Wikimedia’s Dario Taraborelli

I spoke with Dario Taraborelli, Senior Research Analyst at the Wikimedia Foundation, a few days ago. What interested me is a new open data and research infrastructure initiative Wikimedia is pursuing in order to put data in the hands of a wider audience.

What also interests me is how Wikimedia is implementing it: namely, by creating an online space for data consultations in order to really hear from data wranglers and journalists about what they’re looking for in an open data platform.

Dario talks about a number of initiatives Wikimedia is pursuing and resources it’s providing to do so. Here’s a hit list of sites he mentions if you’d like to explore:

Semantic Metadata

Geolocation Data

Pageview Data, Trending Topics, Real-time Edit Data

Wikimedia Research Hub

And, most importantly: the Wikimedia Foundation’s open data consultation.

Run Time: ~25:00

5 Ways to Get Your Own Copy of Wikipedia

thenextweb:

The apps don’t require you to be online to view the pages, and you’ll be able to reference Wikipedia no matter where you are, even when it goes dark on Wednesday.

FJP: The English language version of Wikipedia will be offline this Wednesday in protest of the SOPA/PIPA bills currently in the US Congress. So, if you can’t do without, check out these apps.

Happy 11th Birthday, Wikipedia!
Via Singularity Hub:

[I]t’s doing more than subsisting, it’s thriving. Wikimedia Foundation’s annual fund drive raised $4.5 million in 2008, $8.7 million in 2009, $15 million in 2010, and now $20 million in 2011. The drive is also getting faster (dropping from 67 days to 50 from 2009-2010), and broader, as seen in the increased number of donors. Besides Wikipedia, there are ten sister projects: Wiktionary, Wikibooks, Wikimedia Commons (aka Wikicommons), Wikispecies, Wikiquote, Wikisource, Wikiversity, Wikinews, MediaWiki, Wikimedia Incubator, and Wikimedia Metawiki. Each has its own dedicated user base and corps of volunteers. WMF has sites in almost every country and in 282 different languages.
The 2011-2012 Foundation Plan calls for expanding the sites further every year. The 2011-2012 budget is actually $28.3 million, with missing funds to be met by grants from institutions like the Sloan Foundation. (This drive and grant combination is the norm, and it seemingly works well.) Wikimedia has increased its hires, bringing the company from 50 to 78 in the past fiscal year and aiming to further increase staff by as many as 35 more hires. Wikimedia Foundation has plenty of money to spend as well, they run a high level of reserves ($13 million or so), and they continue to exceed their expectations in revenue. (Revenue was up 50% or so in 2010). To balance that boon, spending is going to increase by 24% in 2012 to invest in better harnessing the crowd.

Pazzi Italiani

As the Italian government considers a law that would require Web sites to remove any content any person finds libelous, Wikipedia has shut down the Italian version of its site.

The law requires publishers to remove content “within 48 hours of the request and, without any comment, a correction of any content that the applicant deems detrimental to his/her image.”

Via a notice currently up at it.wikipedia.org:

Unfortunately, the law does not require an evaluation of the claim by an impartial third judge - the opinion of the person allegedly injured is all that is required, in order to impose such correction to any website.

Hence, anyone who feels offended by any content published on a blog, an online newspaper and, most likely, even on Wikipedia can directly request to publish a “corrected” version, aimed to contradict and disprove the allegedly harmful contents, regardless of the truthfulness of the information deemed as offensive, and its sources…

…The obligation to publish on our site the correction as is, provided by the named paragraph 29, without even the right to discuss and verify the claim, is an unacceptable restriction of the freedom and independence of Wikipedia, to the point of distorting the principles on which the Free Encyclopedia is based and this would bring to a paralysis of the “horizontal” method of access and editing, putting - in fact - an end to its existence as we have known until today.

 
Tumblr is on a tear, now clocking more monthly pageviews than Wikipedia:
 

To get a sense of how much of an outlier Tumblr is when it comes to pageviews, let’s take a deeper look at the comScore numbers. (ComScore’s estimates are lower than Quantcast’s, but show the same general trends). In August, Tumblr entered the top 100 sites comScore tracks (at No. 99) with an estimated 41 million unique visitors. But it’s estimated 6.5 billion pageviews a month places it at No. 21 among all sites ranked by that metric. By comparison, the Wikimedia Foundation sites (which includes Wikipedia.org) get 423 million unique visitors a month who generate 5.6 billion pageviews.
Tumblr is also bigger than Twitter.com in pageviews (but not in unique visitors), and is about half the size of AOL and Craigslist. Below is a snapshot of where Tumblr ranks in Pageviews compared to other select sites in the ComScore Top 100.
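One way to make the "outlier" point concrete is pageviews per unique visitor, sketched below from the comScore estimates quoted above. The metric and the names in the code are our own framing, not something the excerpt computes.

```python
# comScore estimates quoted above: (unique visitors, monthly pageviews).
SITES = {
    "Tumblr": (41_000_000, 6_500_000_000),
    "Wikimedia Foundation sites": (423_000_000, 5_600_000_000),
}


def pageviews_per_visitor(visitors: int, pageviews: int) -> float:
    """Average monthly pageviews generated by each unique visitor."""
    return pageviews / visitors


ratios = {
    name: pageviews_per_visitor(visitors, pageviews)
    for name, (visitors, pageviews) in SITES.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f} pageviews per visitor")
```

On these numbers the average Tumblr visitor racks up roughly a dozen times as many pageviews as the average Wikimedia visitor.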
