Hadoop is a parallel data processing framework based upon the designs Google published in its MapReduce and Google File System papers. It consists of MapReduce and HDFS, and is used to batch process massive amounts of data across many nodes.
Getting it running on Ubuntu Linux is easy and will be documented here shortly.
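To give a feel for the MapReduce half of that description, here is a minimal in-memory sketch of the map, shuffle and reduce steps as a word count. It is plain Python rather than the Hadoop API, and the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration; on a real cluster the framework performs the shuffle and spreads the work across nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, standing in for what the framework does
    between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce over an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    sample = [
        "hadoop stores data in hdfs",
        "hadoop processes data with mapreduce",
    ]
    for word, count in sorted(word_count(sample).items()):
        print(word, count)
```

The point of the split is that each call to map_phase or reduce_phase only needs its own slice of the data, which is what lets Hadoop run them on different machines.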
Overview (Single Node)
- Download Hadoop 0.20.2
- Download Pig
- Configure environment
Overview (Multi Node Cluster)
- Same as for the single node setup, but configure the relevant config files
- This tutorial helped me get off the ground much more quickly: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
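As a rough idea of what "the relevant config files" look like on Hadoop 0.20.x, the sketch below renders the three site files with the properties that usually need changing for a cluster. The hostname "master", the ports 54310 and 54311 and the replication factor of 2 are assumptions picked for the example, and the helper names are hypothetical, so adjust them to your own setup.

```python
from xml.sax.saxutils import escape


def render_site_xml(properties):
    """Render a dict of property names and values as a Hadoop *-site.xml document."""
    lines = ['<?xml version="1.0"?>', "<configuration>"]
    for name, value in properties.items():
        lines += [
            "  <property>",
            f"    <name>{escape(name)}</name>",
            f"    <value>{escape(str(value))}</value>",
            "  </property>",
        ]
    lines.append("</configuration>")
    return "\n".join(lines) + "\n"


def cluster_configs(master="master", replication=2):
    """Return the per-file properties for a small cluster.

    The hostname, ports and replication factor are example values (assumptions).
    """
    return {
        "core-site.xml": {"fs.default.name": f"hdfs://{master}:54310"},
        "hdfs-site.xml": {"dfs.replication": replication},
        "mapred-site.xml": {"mapred.job.tracker": f"{master}:54311"},
    }


if __name__ == "__main__":
    # Print each file instead of writing it, so nothing on disk is touched.
    for filename, properties in cluster_configs().items():
        print(f"--- conf/{filename} ---")
        print(render_site_xml(properties))
```

The same files go into the conf directory on every node. In 0.20.x the conf/masters and conf/slaves files on the master also need to list the cluster's hostnames.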