Hadoop and Pig on Ubuntu 10.04 Lucid Lynx

Hadoop is a parallel data processing framework modelled on Google's MapReduce and Google File System designs. It consists of a MapReduce engine and the HDFS distributed file system, and is used to batch process massive amounts of data across many nodes.

Getting it running on Ubuntu Linux is easy and will be documented here shortly.

Overview (Single Node)

  • Download Hadoop 0.20.2
  • Download Pig
  • Configure environment
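
As a rough sketch of the "configure environment" step, the variables below are the ones a Hadoop 0.20.x and Pig setup typically needs. The install directories and the JDK path are assumptions for illustration, not paths taken from this post, so adjust them to wherever the tarballs were actually unpacked.

```shell
# Assumed (hypothetical) locations of the unpacked Hadoop and Pig tarballs.
export HADOOP_HOME="$HOME/hadoop-0.20.2"
export PIG_HOME="$HOME/pig"

# Assumed JDK location; check /usr/lib/jvm on your machine for the real one.
export JAVA_HOME="/usr/lib/jvm/java-6-openjdk"

# Put the hadoop and pig launcher scripts on the PATH.
export PATH="$PATH:$HADOOP_HOME/bin:$PIG_HOME/bin"

# Point Pig at the Hadoop configuration directory so that it can find
# the cluster settings when running in MapReduce mode.
export PIG_CLASSPATH="$HADOOP_HOME/conf"
```

With these lines added to ~/.bashrc (or sourced in the current shell), `hadoop version` should print the installed release, and `pig -x local` should start Pig in local mode without needing a running cluster.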

Overview (Multi Node Cluster)

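  • As for the single-node setup, plus edits to the relevant configuration files (in Hadoop 0.20.x these live under conf/, for example core-site.xml, hdfs-site.xml, mapred-site.xml, masters and slaves)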

Thanks
