Mod_rpaf: Extract real-IP from behind reverse proxy/load balancer

There are many reasons why you’d want to put your Apache-based site behind a reverse proxy, but one of the sacrifices you make for the gains in speed and security is that, by default, you lose a sensible way to extract the real IP of the client making the request (or at least to have a very good chance of doing so). This is because in many situations your reverse proxy or load balancer becomes the client, so Apache only ever sees the proxy’s IP (the internal IP if it’s local to your Apache server farm). That is useless if you are doing analytics on the Apache logs or making decisions about the request based on the IP (for example geo decisions, or ModSecurity).
To circumvent this, you’d want to use mod_rpaf (http://stderr.net/apache/rpaf/). This module is easily integrated into your Apache environment using the following method (Apache 2.x):

sudo apxs -i -c -n mod_rpaf-2.0.so mod_rpaf-2.0.c

And then configuring Apache to pick up this module:

# if DSO load module first:
LoadModule rpaf_module modules/mod_rpaf-2.0.so
RPAFenable On
RPAFsethostname Off
RPAFproxy_ips 10.10.10.1
RPAFheader X-Forwarded-For

This says: when a request arrives from one of the IPs listed on the RPAFproxy_ips line, look in the X-Forwarded-For header and pick up the client IP from there.
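
A minimal Python sketch of that lookup may help. It is only a model of the unpatched behaviour, assuming the header is consulted solely when the connecting address is one of the RPAFproxy_ips and that the last entry is taken; the function name stock_rpaf_ip and the sample addresses are hypothetical.

```python
def stock_rpaf_ip(peer_ip, forwarded_for, proxy_ips):
    """Model of unpatched mod_rpaf: return the IP Apache would end up using."""
    # Only trust the header when the connection comes from a listed proxy.
    if peer_ip not in proxy_ips:
        return peer_ip
    # Split "a, b, c" into its entries, ignoring surrounding whitespace.
    entries = [e.strip() for e in forwarded_for.split(",") if e.strip()]
    # No usable header: keep the proxy's own address.
    if not entries:
        return peer_ip
    # Stock behaviour: take the last entry, whatever it is.
    return entries[-1]


print(stock_rpaf_ip("10.10.10.1", "203.0.113.7", ["10.10.10.1"]))  # 203.0.113.7
print(stock_rpaf_ip("10.10.10.9", "203.0.113.7", ["10.10.10.1"]))  # 10.10.10.9
```

The second call shows why the proxy address has to be known in advance: a request from an unlisted address is left untouched.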

Unfortunately in practice this doesn’t work so well for a couple of reasons:

1. When auto-scaling in an environment such as Amazon’s, behind something like ELB, you won’t know which internal IP the ELB will use to talk to Apache, so you can’t auto-configure mod_rpaf at install or run time.
2. It makes an assumption about the external IP: it just takes the last IP seen on the X-Forwarded-For line.

Number 2 caused me issues because it is possible to have an internal IP address as the last entry on the X-Forwarded-For line, with the external IP sitting somewhere earlier on that line. To get around this, and to gain more flexibility in the mod_rpaf config so that the real-IP code is invoked whenever an internal address is seen, I created the following patch:

--- mod_rpaf-2.0.c 2011-06-23 13:51:53.000000000 +0100
+++ mod_rpaf-2.0.c.new 2011-06-24 16:08:18.000000000 +0100
@@ -71,6 +71,7 @@
 #include "http_protocol.h"
 #include "http_vhost.h"
 #include "apr_strings.h"
+#include "string.h"
 
 module AP_MODULE_DECLARE_DATA rpaf_module;
 
@@ -136,10 +137,14 @@
 }
 
 static int is_in_array(const char *remote_ip, apr_array_header_t *proxy_ips) {
-    int i;
+    int i,len;
+    char tmp[16];
     char **list = (char**)proxy_ips->elts;
     for (i = 0; i < proxy_ips->nelts; i++) {
-        if (strcmp(remote_ip, list[i]) == 0)
+        len=strlen(list[i]);
+        strncpy(tmp,remote_ip,len);
+        tmp[len]='\0';
+        if (strcmp(tmp, list[i]) == 0)
             return 1;
     }
     return 0;
@@ -155,6 +160,7 @@
 static int change_remote_ip(request_rec *r) {
     const char *fwdvalue;
     char *val;
+    int i;
     rpaf_server_cfg *cfg = (rpaf_server_cfg *)ap_get_module_config(r->server->module_config,
                                                                    &rpaf_module);
 
@@ -183,7 +189,10 @@
             rcr->old_ip = apr_pstrdup(r->connection->pool, r->connection->remote_ip);
             rcr->r = r;
             apr_pool_cleanup_register(r->pool, (void *)rcr, rpaf_cleanup, apr_pool_cleanup_null);
-            r->connection->remote_ip = apr_pstrdup(r->connection->pool, ((char **)arr->elts)[((arr->nelts)-1)]);
+            for(i=arr->nelts-1; i >= 0; i--) {
+                if (is_in_array( apr_pstrdup(r->connection->pool, ((char **)arr->elts)[i]), cfg->proxy_ips) == 0 )
+                    r->connection->remote_ip = apr_pstrdup(r->connection->pool, ((char **)arr->elts)[i]);
+            }
             r->connection->remote_addr->sa.sin.sin_addr.s_addr = apr_inet_addr(r->connection->remote_ip);
             if (cfg->sethostname) {
                 const char *hostvalue;

Patch and compile/install as follows:

patch -p1 < patch_mod_rpaf
sudo apxs -i -c -n mod_rpaf-2.0.so mod_rpaf-2.0.c

With the patch applied, the entries on the RPAFproxy_ips line are matched as address prefixes, so you can run Apache behind, say, ELB and have it extract the last seen external IP (i.e. one that does not start with 10., 172. or 192.168.). Of course, edit to suit your particular environment:

# if DSO load module first:
LoadModule rpaf_module modules/mod_rpaf-2.0.so
RPAFenable On
RPAFsethostname Off
RPAFproxy_ips 10. 172. 192.168.
RPAFheader X-Forwarded-For

When it sees a line like

X-Forwarded-For: 192.168.100.227, 209.88.21.195, 10.58.59.219, 192.168.123.123

it will pick out 209.88.21.195 as the real IP.
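
The selection can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the module’s code: the helper names is_internal and last_external_ip are hypothetical, and it stops at the first non-matching entry when walking from the right (the "last seen external IP" described above), whereas the loop in the patch as printed keeps going and would end on the left-most such entry. The two agree for the header shown here.

```python
def is_internal(ip, prefixes):
    """Prefix match, mirroring what the patched is_in_array() does."""
    return any(ip.startswith(p) for p in prefixes)


def last_external_ip(forwarded_for, prefixes):
    """Return the right-most header entry that matches no prefix, else None."""
    entries = [e.strip() for e in forwarded_for.split(",") if e.strip()]
    # Walk from the end of the header back towards the start.
    for ip in reversed(entries):
        if not is_internal(ip, prefixes):
            return ip
    return None


header = "192.168.100.227, 209.88.21.195, 10.58.59.219, 192.168.123.123"
print(last_external_ip(header, ["10.", "172.", "192.168."]))  # 209.88.21.195
```

Note that a bare 172. prefix also matches public addresses outside the private 172.16.0.0/12 range, so a tighter list may be appropriate in some environments.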

Apache, FancyIndexing and PHP 5 (mod_autoindex)

Introduction
The default Directory Listing in Apache is pretty much awful, but I had a need to present some files through a web browser. Rather than produce something with PHP alone I decided to enhance the Apache FancyIndexing option as it is designed for exactly this purpose.
I came across a nice PHP enhancement (update to include link and credit) to FancyIndexing that adds guided navigation to the directory listing and improves the default font and general styling through effective use of CSS.

Instructions

1. Edit httpd.conf and add or modify the following

AccessFileName .htaccess
<Files ~ "^\.ht">
Order allow,deny
Deny from all
</Files>

<Directory /your/directory>
AllowOverride all
</Directory>

2. In the directory you want listed, add the following .htaccess file

Options +Indexes +FollowSymlinks
IndexOptions FancyIndexing HTMLTable FoldersFirst SuppressRules SuppressDescription SuppressHTMLPreamble Charset=UTF-8
#
# AddIcon* directives tell the server which icon to show for different
# files or filename extensions.  These are only displayed for
# FancyIndexed directories.
#
AddIcon /autoindex/icons/application.png .exe .app
AddIcon /autoindex/icons/type_binary.png .bin .hqx .uu
AddIcon /autoindex/icons/type_box.png .tar .tgz .tbz .tbz2 bundle .rar
AddIcon /autoindex/icons/type_code.png .html .htm .htx .htmls .dhtml .phtml .shtml .inc .ssi .c .cc .css .h .rb .js .rb .pl .py .sh .shar .csh .ksh .tcl .as
AddIcon /autoindex/icons/type_database.png .db .sqlite .dat
AddIcon /autoindex/icons/type_disc.png .iso .image
AddIcon /autoindex/icons/type_document.png .ttf
AddIcon /autoindex/icons/type_excel.png .xlsx .xls .xlm .xlt .xla .xlb .xld .xlk .xll .xlv .xlw
AddIcon /autoindex/icons/type_flash.png .flv
AddIcon /autoindex/icons/type_illustrator.png .ai .eps .epsf .epsi
AddIcon /autoindex/icons/type_pdf.png .pdf
AddIcon /autoindex/icons/type_php.png .php .phps .php5 .php3 .php4 .phtm
AddIcon /autoindex/icons/type_photoshop.png .psd
AddIcon /autoindex/icons/monitor.png .ps
AddIcon /autoindex/icons/type_powerpoint.png .ppt .pptx .ppz .pot .pwz .ppa .pps .pow
AddIcon /autoindex/icons/type_swf.png .swf
AddIcon /autoindex/icons/type_text.png .tex .dvi
AddIcon /autoindex/icons/type_vcf.png .vcf .vcard
AddIcon /autoindex/icons/type_word.png .doc .docx
AddIcon /autoindex/icons/type_zip.png .Z .z .tgz .gz .zip
AddIcon /autoindex/icons/globe.png .wrl .wrl.gz .vrm .vrml .iv
AddIcon /autoindex/icons/vector.png .plot

AddIconByType (TXT,/autoindex/icons/type_text.png) text/*
AddIconByType (IMG,/autoindex/icons/type_image.png) image/*
AddIconByType (SND,/autoindex/icons/type_audio.png) audio/*
AddIconByType (VID,/autoindex/icons/type_video.png) video/*
AddIconByEncoding (CMP,/autoindex/icons/type_box.png) x-compress x-gzip
AddIcon /autoindex/icons/back.png ..
AddIcon /autoindex/icons/information.png README INSTALL
AddIcon /autoindex/icons/type_folder.png ^^DIRECTORY^^
AddIcon /autoindex/icons/blank.png ^^BLANKICON^^

#
# DefaultIcon is which icon to show for files which do not have an icon
# explicitly set.
#
DefaultIcon /autoindex/icons/type_document.png
#
# Enables PHP to be used in our header file
# 
AddHandler application/x-httpd-php .php
AddType text/html .php .html
#
# ReadmeName is the name of the README file the server will look for by
# default, and append to directory listings.
#
# HeaderName is the name of a file which should be prepended to
# directory indexes.
ReadmeName /autoindex/footer.php
HeaderName /autoindex/header.php
#
# IndexIgnore is a set of filenames which directory indexing should ignore
# and not include in the listing.  Shell-style wildcarding is permitted.
#
IndexIgnore autoindex .??* *~ *# RCS CVS *,v *,t *.dat ..

IndexOptions +NameWidth=42
AddDescription "PNG images" *.png
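
To get a feel for which names the IndexIgnore line hides, here is a small Python sketch. Python’s fnmatch module stands in for Apache’s own matcher, which is an assumption: the two may differ in edge cases such as character classes, and the sample file names are made up for illustration.

```python
from fnmatch import fnmatchcase

# Patterns copied from the IndexIgnore line in the .htaccess listing.
PATTERNS = ["autoindex", ".??*", "*~", "*#", "RCS", "CVS", "*,v", "*,t", "*.dat", ".."]


def is_ignored(name, patterns=PATTERNS):
    """True if any pattern matches the whole file name (case-sensitive)."""
    return any(fnmatchcase(name, p) for p in patterns)


# Hypothetical directory entries, checked one by one.
for name in ["autoindex", ".htaccess", "notes.txt~", "stats.dat", "report.pdf"]:
    print(name, is_ignored(name))
```

Only report.pdf survives in this sample; the autoindex directory holding the icons and header/footer scripts is hidden from the listing it decorates.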

Warning for PHP 5.3 and higher

I originally had this running with PHP 5.1 and it was working great.  I upgraded to PHP 5.3.3 (latest at the time of writing) and it refused to parse the PHP, despite the PHP working if I called the Header and Footer PHP pages directly.

It turned out to be the XHTML keyword in the IndexOptions line (it has already been removed from the listing above).  Remove it and the PHP will parse.  The mod_autoindex documentation for XHTML says:

The XHTML keyword forces mod_autoindex to emit XHTML 1.0 code instead of HTML 3.2.

Whereas the same page says that a HeaderName/ReadmeName filename “must resolve to a document with a major content type of text/* (e.g., text/html, text/plain, etc.).”

Building Apache 2.2, PHP 5 with GD and MySQLi support from source

1. Download the following
Apache 2.2 from http://httpd.apache.org/download.cgi [2.2.17]
PHP 5.3.3 from http://www.php.net/downloads.php [5.3.3]
Expat from http://sourceforge.net/projects/expat/ [2.0.1]
JPEG from http://www.ijg.org/ [v8b]
PNG from http://sourceforge.net/projects/libpng/files/ [1.4.4]

2. Apache

./configure --enable-so --enable-modules=most --enable-proxy --with-mpm=worker --disable-imap --enable-deflate
make
sudo make install

3. Expat XML Parser

./configure
make
sudo make install

4. JPEG

./configure
make
sudo make install

5. PNG

./configure
make
sudo make install

6. PHP

./configure --disable-cli --enable-embedded-mysqli --with-zlib --enable-shared --with-apxs2=/usr/local/apache2/bin/apxs --with-gd
make
sudo make install

Hadoop, Pig, Apache and Squid Log Processing

I’ve been experimenting with Hadoop to help process the gigabytes of logs generated by Apache and Squid where I work. Currently, we have a very small proof-of-concept cluster of 5 nodes that churns through Squid logs hourly to produce GnuPlot graphs of traffic over the last hour and last day.

I won’t cover Hadoop in any detail here (there are many places to look for that, for example Yahoo! and Cloudera), but I will document the scripts used to get Hadoop processing Squid and Apache logs.