Hadoop, Say What?

With an ecosystem of components with names like Pig, Oozie, Sqoop and ZooKeeper, it can be difficult to understand what exactly the Hadoop software framework is.

Fortunately, Edd Dumbill at O’Reilly Radar gives a great explainer:

Apache Hadoop has been the driving force behind the growth of the big data industry. You’ll hear it mentioned often, along with associated technologies such as Hive and Pig…

…Hadoop brings the ability to cheaply process large amounts of data, regardless of its structure. By large, we mean from 10-100 gigabytes and above. How is this different from what went before?

Dumbill goes on to cover the core components, MapReduce and HDFS, and then explains others that improve programmability, data access, coordination and workflow, management and deployment, and machine learning.

Interested in more? Here are some tutorials to get you started:
