Hadoop, Pig, Apache and Squid Log Processing

I’ve been experimenting with Hadoop to help process the gigabytes of logs generated from Apache and Squid where I work. Currently, we’ve a very small proof of concept cluster comprising of 5 nodes that is churning through Squid logs hourly to produce GnuPlot graphs of traffic over the last hour and last day.

I’ll not cover Hadoop in any detail here (there are many places to look for this – for example Yahoo! and Cloudera) but I’ll document the scripts used to get Hadoop processing Squid and Apache logs here.

Advertisements

One comment on “Hadoop, Pig, Apache and Squid Log Processing

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s