Posts tagged with ‘data’
If Netflix can show such tiny slices of cinema to any given user, and they have 40 million users, how vast did their set of “personalized genres” need to be to describe the entire Hollywood universe?
This idle wonder turned to rabid fascination when I realized that I could capture each and every microgenre that Netflix’s algorithm has ever created.
Through a combination of elbow grease and spam-level repetition, we discovered that Netflix possesses not several hundred genres, or even several thousand, but 76,897 unique ways to describe types of movies…
…What emerged from the work is this conclusion: Netflix has meticulously analyzed and tagged every movie and TV show imaginable. They possess a stockpile of data about Hollywood entertainment that is absolutely unprecedented. The genres that I scraped and that we caricature above are just the surface manifestation of this deeper database.
— Alexis C. Madrigal, The Atlantic. How Netflix Reverse Engineered Hollywood.
No source of information is sacred: transaction records are bought in bulk from stores, retailers and merchants; magazine subscriptions are recorded; food and restaurant preferences are noted; public records and social networks are scoured and scraped. What kind of prescription drugs did you buy? What kind of books are you interested in? Are you a registered voter? To what non-profits do you donate? What movies do you watch? Political documentaries? Hunting reality TV shows?
That info is combined and kept up to date with address, payroll information, phone numbers, email accounts, social security numbers, vehicle registration and financial history. And all that is sliced, isolated, analyzed and mined for data about you and your habits in a million different ways…
…Take MEDbase200, a boutique for-profit intel outfit that specializes in selling health-related consumer data. Well, until last week, the company offered its clients a list of rape victims (or “rape sufferers,” as the company calls them) at the low price of $79.00 per thousand. The company claims to have segmented this data set into hundreds of different categories, including stuff like the ailments they suffer, prescription drugs they take and their ethnicity…
…[I]f lists of rape victims aren’t your thing, MEDbase can sell dossiers on people suffering from anorexia, substance abuse, AIDS and HIV, Alzheimer’s Disease, Asperger Disorder, Attention Deficit Hyperactivity Disorder, Bedwetting (Enuresis), Binge Eating Disorder, Depression, Fetal Alcohol Syndrome, Genital Herpes, Genital Warts, Gonorrhea, Homelessness, Infertility, Syphilis… the list goes on and on and on and on.
PandoDaily reports that some 4,000 data mining companies generate about $200 billion annually.
— Nate Silver in a Q&A with Harvard Business Review on how to get into data science as a newbie (student, professional, or otherwise).
New York Magazine, Final Tally: Americans Were 12 Times More Interested in Miley Cyrus Than Syria.
Background: Outbrain, the content discovery platform, crunched numbers across its network of publishers to compare reader interest in stories about Syria versus those about Miley Cyrus:
Globally, there were almost 2.5 times as many available stories on Syria as there were on Miley Cyrus. Yet consumption of those Miley stories outpaced Syria by a factor of 8-to-1. And in the United States? 12-to-1!
Before those outside the States start casting their serious news stones, take stock: “Interest in the starlet significantly outpaced Syria in England, Australia, France, Germany, and every other nation in Outbrain’s analysis — except Israel and Russia.”
We just happen to fetishize her a bit more.
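The Outbrain figures above can be combined into a rough per-story comparison. This is only a sketch: it assumes "consumption" means total reads across all stories on each topic, and it treats "almost 2.5 times" as exactly 2.5. The function name and variables are illustrative and do not come from Outbrain's analysis.

```python
def per_story_ratio(consumption_ratio, supply_ratio):
    """Return how many times more reads an average Miley story drew
    than an average Syria story.

    consumption_ratio: total Miley reads / total Syria reads
    supply_ratio:      Syria story count / Miley story count
    """
    # (M_reads / M_stories) / (S_reads / S_stories)
    #   = (M_reads / S_reads) * (S_stories / M_stories)
    return consumption_ratio * supply_ratio


# Global figures quoted above: 8-to-1 consumption, about 2.5x more Syria stories.
# The 2.5 is an approximation ("almost 2.5 times" in the source).
global_per_story = per_story_ratio(8, 2.5)
print(global_per_story)
```

Under those assumptions, an average Miley Cyrus story drew roughly 20 times the readership of an average Syria story worldwide. The same calculation cannot be done for the United States, because the quoted passage gives the 12-to-1 consumption ratio but no US-only count of available stories.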