My response to “OpenStack is overstretched” – The Register

What do we get from writing an opinion online? Notoriety? Fame? Publicity? I’m doing it because sometimes I need to fill in the blank space between the last post and this one.
I’ve just finished reading a Register article entitled “OpenStack is overstretched” in which Enrico Signoretti writes that, due to the popularity of OpenStack, it seems like everyone needs to stick their oar in (good and bad) and the outcome is too many cooks.
Whilst I don’t question that juggling a large number of vendors’ interests in a popular Open Source Cloud “Operating System” is a large burden, isn’t this where Open Source projects shine? Sub-groups work on particular code that may have only a small group of people’s interests at heart, but the source code is available to do exactly that, right? What makes a feature get into the final cut? Does it matter, so long as the core platform exists to serve the general public?
There is no doubt that OpenStack is a challenging product to use and implement, and equally one that must keep the release manager and the community as a whole busy. But their interest is in making a fantastic Open Source product available to all who want to take it in any direction they choose.
We live in a world where we expect products to have one-click installs. In fact, part of my job is to get to this stage in all aspects by automating everything. Sometimes it isn’t possible, but we can make it easier and certainly strive to get there. What we can do is contribute back, rather than sit back and watch a product mature enough to then add your own SKU that you can resell to keep your shareholders happy.
It sounds like people are frustrated because OpenStack has the potential to be big but it’s not ready to shine yet. There are companies using it, there are companies investigating it and there are companies that are contributing to it.

This is Open Source. You get the choice of what you want to do with it.

Ubuntu 11.10 Oneiric Ocelot on the desktop – my thoughts (and it’s not good)

So Ubuntu 11.10, aka Oneiric Ocelot, has been out for a short while now and so far it has been nothing but pain for people upgrading from an earlier release. Not only are the bugs racking up (and some are showstoppers, as my post regarding “Waiting for network configuration” shows), but the move to Unity seems disastrous and is losing people’s allegiance to what was once the admired desktop Linux of choice for many.
Has Ubuntu lost its way here? Ubuntu’s parental backers, Canonical, are concentrating their efforts on their Ubuntu Cloud Infrastructure project and Ubuntu 11.10 on a server is great, even bringing with it an easier way to get OpenStack installed.
For me, Unity is a mistake. It made sense on my netbook and it might make sense on a touchscreen, but it doesn’t make sense on my desktop. Integration with even the most basic apps is causing problems (Gvim anyone? Empathy?) and it’s sluggish (Gwibber status updates take..an..age..to..input..).

Overall I’ve lost my faith in Ubuntu on the desktop, which is a shame as it was on the way to making adoption of an Open Source desktop possible.

sshfs – SSH FileSystem, replacement to NFS over wireless?

sshfs (SSH FileSystem) is a FUSE based file system that allows mounting of remote directories using SSH (or more specifically, sftp).
I have all my Linux machines mounting remote directories from a central NAS on my home network. Some machines are connecting at 100Mb LAN (though reduced as I actually run PowerLine adapters between the machines and the NAS), but the more frequently used machines (laptops/netbooks) connect via wireless. One connects at 802.11n, another connects at 802.11g speeds.

Recently, NFS performance has been poor on the UNR Acer Aspire 1 (A110) netbook, which is the most used device in the home. To improve this without opting to add on an 802.11n USB dongle I recently started to look at sshfs. SSH is stable: I’ve never had issues scping files between NAS and netbook. NFS hooks into the kernel at lower levels, and when NFS hangs, the netbook becomes unstable enough to warrant a reboot. “Turn it off and on again” just isn’t an option for something that “should just work”.

sshfs uses SFTP to present the filesystem on the local machine so make sure the sftp subsystem is enabled in your sshd_config on the server:

Subsystem       sftp    /usr/libexec/sftp-server
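
Before mounting, it can be worth confirming that the server really has the sftp subsystem switched on. The following is only a sketch: it assumes you have shell access to the server and that its config lives at /etc/ssh/sshd_config (on a Qnap the path may well differ), and has_sftp_subsystem is a name made up for this example.

```shell
#!/bin/sh
# Succeeds (exit 0) if an uncommented "Subsystem sftp ..." line is present.
# $1: path to sshd_config; the default path is an assumption, adjust as needed.
has_sftp_subsystem() {
    grep -Eqi '^[[:space:]]*Subsystem[[:space:]]+sftp[[:space:]]' \
        "${1:-/etc/ssh/sshd_config}"
}
```

Used as `has_sftp_subsystem && echo "sftp enabled"`, it prints the message only when the line is present and not commented out.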

The great thing about sshfs being based on sftp and FUSE is that you can run it as a regular user (as shown here initially).

On my Qnap NAS I have a number of NFS exports. To test the performance I used an NFS export containing lots of photos.

NFS

The NFS export is the following

"/share/NFS/test" *(rw,nohide,async,no_root_squash)

This is mounted with the following options in fstab

nas:/share/NFS/test /media/test nfs rw,bg,tcp,wsize=32768,rsize=32768,vers=3

sshfs

The general syntax for sshfs is

sshfs user@host:/directory /mountpoint

After a number of tests, the following options gave me good performance along with reliability

sshfs -o idmap=user -o uid=1000 -o gid=1000 -o workaround=nodelaysrv:buflimit -o no_check_root -o kernel_cache -o auto_cache admin@nas:/share/MD0_DATA/test /media/sshfs

Note that when you execute this command it will ask for the password of the username specified. Normal SSH rules apply here: access using ssh keys is the way to provide secure, seamless, unprompted access [or prompted if you protect your key with a passphrase, of course].

So this gives me two areas on my wireless laptop:

/media/test is the NFS mount point of /share/MD0_DATA/test on ‘nas’

/media/sshfs is the sshfs mount point of /share/MD0_DATA/test on ‘nas’

SSH Timeout

It is crucial that you add the following to your SSH client config, which is located at ~/.ssh/config (or system-wide in /etc/ssh/ssh_config)

ServerAliveInterval 15
ServerAliveCountMax 3

This is to avoid the SSHFS mount hanging after a timeout – which is quite messy to clean up.

Performance sshfs vs NFS

Performance tests were rough and ready, but I needed them to represent the real world. I timed directory listings/finds and also judged performance visually in Gnome, since fixing sluggishness on the netbook desktop is the crux of the issue.

The test area had 4,501 photos of various formats and file sizes.

sshfs

time find /media/sshfs

real 0m6.254s
user 0m0.010s
sys 0m0.110s

NFS

time find /media/test

real 0m3.738s
user 0m0.020s
sys 0m0.110s
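
Since single timings can be skewed by caching (the kernel_cache and auto_cache options above, and the NFS client cache), repeating each run a few times gives a fairer picture. The following is a rough sketch rather than the method used for the numbers above: time_find is a made-up helper name, and it assumes GNU date (for %N nanoseconds) and awk are available.

```shell
#!/bin/sh
# Time several runs of `find` over a directory and print seconds per run.
# Usage: time_find DIR [RUNS]   (RUNS defaults to 5)
time_find() {
    dir=$1
    runs=${2:-5}
    i=1
    while [ "$i" -le "$runs" ]; do
        start=$(date +%s.%N)          # GNU date: seconds.nanoseconds
        find "$dir" > /dev/null       # discard the listing, we only want the time
        end=$(date +%s.%N)
        awk -v s="$start" -v e="$end" -v n="$i" \
            'BEGIN { printf "run %d: %.3f s\n", n, e - s }'
        i=$((i + 1))
    done
}
```

Running `time_find /media/sshfs 5` followed by `time_find /media/test 5` prints one line per run for each mount, which makes it easier to see whether the first (uncached) run is the outlier.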

Conclusion

I can repeat those tests over and over and I get NFS consistently quicker than sshfs. Visually I see Gnome creating thumbnails slower under sshfs than I do under NFS – but it is still acceptable.  The reason I’m looking at improving performance of the remote filesystems currently mounted under NFS is because of the instability witnessed using NFS – although there is a caveat to this stability…

So, NFS is quicker than sshfs, but is it enough to not use it? I think sshfs is a great idea and it will certainly be used for some parts of my home network. It will easily work its way into the enterprise too as a replacement for the age-old habit of using scp, especially when used with an automount set up and given that niggling question of “do I really want to run NFS on that server just to access some files?”.

Will I use this completely at home as opposed to NFS? I’m not so sure, the jury’s still out. I’m currently using UNR on the netbook and I’ve tracked down another issue with NFS over wireless: the latest kernels seem to be the cause of the instability with my ath5k wireless driver. It appears that Ubuntu kernels 2.6.32-23 and later are causing my issue. I’m currently running an older 2.6.32-21 kernel and all is well… for now.

First foray with Drupal

I’ve recently started to use Drupal for a community intranet portal system as I see it as a good fit for what the project wants to achieve.

I decided to dive straight into the latest 7 alpha release as it has the features and design I expect of a modern CMS portal system.

Over the next few months I’ll be documenting my trials and tribulations of using Drupal!

How to solve a problem like scraping

It’s been a while since I last blogged but it doesn’t mean I’ve disappeared. See it as me being deep in thought.

I work for a large web site operating in EMEA that has lots of invaluable data available to the public. This is great, but other people want to take that data wholesale without going through the proper authorised channels. This is known as scraping – effectively “Site Content Raping” to coin a not so nice phrase.

Scraping is very easy to do. There are tools out there that, in a few clicks, will spider your site and download the content. After all, the data is public and the hyperlinks are designed to take you through the data. The web search engine bots effectively scrape our site, but the difference is that they report back the links. Scraping involves downloading the data itself, and that is what causes legal issues. In truth, scraping is a legal issue, but the legal route to stopping it is hit by two problems: it’s a lengthy process, and it needs evidence of the scraping activity to show it’s breaking the terms and conditions of your site.

The problem with scraping is being able to identify it in the first place. Some scrapes are relatively benign and easy to spot. Ironically, they’re not usually an issue unless the lazy way they’ve implemented their scrape causes site capacity issues. But well designed site infrastructure should be able to cope with any surge in demand however it is presented. Most scraping activity remains under the radar though. Spotting the trail involves understanding how the site can be scraped in the first place and the methods used to evade detection. The hardest challenge in all this is distinguishing it from the millions of legitimate requests hitting the site at the same time.

There are a number of ways to tackle the problem:

- Employ a 3rd party to monitor and report on the scraping activity in real-time on your behalf as part of a monthly service operational expense
- Implement ingress filtering of your data to report on activities in real-time using equipment maintained and set up by teams internally
- Implement log analysis after the event

I’m looking at log analysis to tackle the issue. This involves processing large data sets using Hadoop and custom scripts to slice and dice the information, helping to form conclusions and write the reports that evidence scraping activity.
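
To give a flavour of the “slice and dice” step, here is a minimal sketch of counting requests per client in map/reduce style. It is an illustration, not the scripts I’m actually running: it assumes an access log where the client IP is the first whitespace-separated field (as in the common Apache log formats), and the function names map_ip, reduce_sum and top_clients are made up for this example. The same stdin-to-stdout shape is what Hadoop Streaming expects of a mapper and reducer, though here everything runs locally in a pipe.

```shell
#!/bin/sh
# map: emit "ip<TAB>1" for every log line (client IP assumed to be field 1)
map_ip() {
    awk '{ print $1 "\t1" }'
}

# reduce: sum the counts for each key; awk aggregates in memory,
# so the input does not need to be sorted first
reduce_sum() {
    awk -F '\t' '{ count[$1] += $2 }
                 END { for (ip in count) print ip "\t" count[ip] }'
}

# Read an access log on stdin and print the N busiest clients (default 10)
top_clients() {
    map_ip | reduce_sum | sort -t "$(printf '\t')" -k2,2nr | head -n "${1:-10}"
}
```

Something like `top_clients 20 < access.log` then lists the twenty noisiest addresses. Raw request counts alone won’t catch a scraper that spreads itself across many addresses, so treat this as a starting point for further slicing (by user agent, URL pattern, time window and so on).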

Over the next few months I aim to track my successes and failures in combating this problem.

Ubuntu Lucid Lynx 10.04 Beta Announced

The Register has a great short article introducing you to the delights of the first Ubuntu Lucid Lynx Beta, announced today. You can grab the Beta version here.

The release sees a rebranding which has gotten rid of the Human theme in favour of a more professional looking desktop, in an attempt to make Linux desktops look less Linux-like. The release also ventures into the world of online music with the imminent launch of Ubuntu’s U1 music store, which ties Rhythmbox to MP3 purchases.

Try it out.

I gave it a go on VirtualBox and the video drivers from the guest additions kept stack tracing. Given my current stable Karmic Koala setup, I don’t particularly want to run this Beta on real hardware just yet.

Syncing your iPhone in Ubuntu + RhythmBox

So you’ve got your shiny new iPhone. You’ve attached it to your Ubuntu Karmic enabled PC and all you can see is some USB storage. This is fine if you want to use your iPhone as a £450 USB drive, but if you want to manage your tunes in RhythmBox then follow this guide below:

http://maketecheasier.com/sync-iphone-with-rhythmbox/2010/02/13

It goes into detail of removing packages, setting up new repositories, etc. but in reality I did the following:

sudo add-apt-repository ppa:pmcenery/ppa
sudo apt-get update
sudo apt-get dist-upgrade
sudo apt-get install gvfs gvfs-backends gvfs-bin gvfs-fuse libgvfscommon0 ifuse libgpod-dev libgpod-common libimobiledevice-utils libimobiledevice0 libimobiledevice-dev libplist++1 libplist-utils python-plist libusb-1.0-0 libusb-1.0-0-dev libusbmuxd1 usbmuxd

I then added my user to the ‘fuse’ group. Log out and then back in again and voila – your iPhone should be recognised as an iPhone. Loading up RhythmBox and enabling the Portable Players – iPod plugin (Edit… Plugins) should allow you to control your iPhone through RhythmBox and copy music to it.

Don’t forget to include MP3 (gstreamer-ugly) support to RhythmBox too – although, helpfully this prompts you when trying to run proprietary codecs.

Qnap TS-210: Home NAS

I recently purchased a home network storage device to consolidate the myriad of external hard drives I had. After a good few hours of googling and asking for recommendations I took delivery of a Qnap TS-210, a 2 drive bay SATA-2 NAS enclosure. This device is awesome! After populating it with a couple of fast, quiet Western Digital Blue drives I now have RAID1 shared storage connected to my wireless hub so my data is available across all my home machines.

This is all well and good but what sets this apart from the rest of the NAS devices out there for under £250 is that this device is not just great for serving my photos over NFS…

The Qnap TS-210 sports the following:

- NFS, SMB/CIFS
- iSCSI target
- Apache with PHP, SSL and WebDAV
- MySQL Server
- TwonkyServer UPnP multimedia server
- BitTorrent Client
- Web Cam Surveillance server

And the list doesn’t stop there. The device runs Linux and it doesn’t hide access to its internals. Enable the Optware plugin and you get access to an apt/yum like repository. From here you can install a wide range of tools and services such as Squid.

Administration of the NAS is through a polished Ajax interface.
Setup is straightforward although there are a couple of gotchas which I came across in the 3.2 version firmware:

Do not run EXT4 if you expect a stable NFS server. On this firmware it’s not production ready; after a frustrating few days of copying data from hard drive to hard drive I settled on EXT3, which is rock solid.

If, like me, you start with a single drive expecting to easily mirror it later on, then think again. Copying the data off and back again after setting up a new mirror is the only option on this device.

More positives though come in the shape of power management and green features. The device consumes only 14W when in use and 7W when idle. Hard drives are powered down when not in use after a set period and you can set times when you want the device off and when to power up again. Handy for a home or small office device that doesn’t need to be on 24 hours a day, when you don’t want the hassle of remembering to power it on each day.

Google Wave

So I got invited to join the new internet phenomenon – Google Wave and with only 3 contacts I don’t see the point. Guess this one is for the viral developers to make some use of it before it gets released to the masses.
I think I get the point of it: it’s about integration and collaboration. I can see it replacing my email interface @ Google Mail, which makes sense, but for chatting? I’m not so sure. For status updates? Still not sure. For sharing documents? Possibly, but outside of the workplace… we’ll see.
So far it’s as useful as a chocolate fireguard.