OpenStack troubleshooting basics

If you’ve been running OpenStack from the developer trunk (as per my blog), you will occasionally come across bugs. That is the nature of the beast when running bleeding-edge code.
So how do you track down a solution for them?

Step 1. Check the logs

The first place to look is /var/log/nova, where you will find the logs related to OpenStack.
Some bugs are caused by changes in the software, so the fix may be an extra config line in /etc/nova/nova.conf.
For example, you may have seen this in /var/log/nova/nova-network.log:

2011-03-15 17:33:35,732 CRITICAL nova [-] failed to create /usr/lib/pymodules/python2.6/cloud2.MainThread-18360
(nova): TRACE: Traceback (most recent call last):
(nova): TRACE:   File "/usr/bin/nova-network", line 48, in <module>
(nova): TRACE:     service.serve()
(nova): TRACE:   File "/usr/lib/pymodules/python2.6/nova/service.py", line 284, in serve
(nova): TRACE:     x.start()
(nova): TRACE:   File "/usr/lib/pymodules/python2.6/nova/service.py", line 84, in start
(nova): TRACE:     self.manager.init_host()
(nova): TRACE:   File "/usr/lib/pymodules/python2.6/nova/network/manager.py", line 489, in init_host
(nova): TRACE:     super(VlanManager, self).init_host()
(nova): TRACE:   File "/usr/lib/pymodules/python2.6/nova/network/manager.py", line 125, in init_host
(nova): TRACE:     self.driver.init_host()
(nova): TRACE:   File "/usr/lib/pymodules/python2.6/nova/network/linux_net.py", line 394, in init_host
(nova): TRACE:     iptables_manager.apply()
(nova): TRACE:   File "/usr/lib/pymodules/python2.6/nova/utils.py", line 523, in inner
(nova): TRACE:     with lock:
(nova): TRACE:   File "/usr/lib/pymodules/python2.6/lockfile.py", line 223, in __enter__
(nova): TRACE:     self.acquire()
(nova): TRACE:   File "/usr/lib/pymodules/python2.6/lockfile.py", line 239, in acquire
(nova): TRACE:     raise LockFailed("failed to create %s" % self.unique_name)
(nova): TRACE: LockFailed: failed to create /usr/lib/pymodules/python2.6/cloud2.MainThread-18360
(nova): TRACE:

A change between releases means the following lines must now be present in /etc/nova/nova.conf to solve this:

--state_path=/var/lib/nova
--lock_path=/var/lock/nova
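
If you want to check a config for those two flags before restarting services, here is a minimal Python sketch. It assumes nova.conf is a flag file with one --name=value entry per line, as in the lines above. The function name missing_flags and the REQUIRED_FLAGS list are my own for illustration and are not part of Nova.

```python
from pathlib import Path

# Flag names taken from the post; extend this tuple for your own setup.
REQUIRED_FLAGS = ("state_path", "lock_path")


def missing_flags(conf_text, required=REQUIRED_FLAGS):
    """Return the required flag names that have no --name=value line."""
    present = set()
    for line in conf_text.splitlines():
        line = line.strip()
        # Only consider lines shaped like --name=value; skip everything else.
        if not line.startswith("--") or "=" not in line:
            continue
        name = line[2:].split("=", 1)[0].strip()
        present.add(name)
    return [flag for flag in required if flag not in present]


if __name__ == "__main__":
    conf = Path("/etc/nova/nova.conf")
    if conf.exists():
        for flag in missing_flags(conf.read_text()):
            print("missing --%s in %s" % (flag, conf))
    else:
        print("%s not found" % conf)
```

It only tells you that a flag is absent. Whether the directory it points to exists and is writable by the nova user is a separate check.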

Step 2. Check https://bugs.launchpad.net/nova and https://bugs.launchpad.net/swift

A recent one I came across this morning was the following:

2011-03-17 08:49:19,160 ERROR nova.api [GXEJM3P1HVP7N53IGI5J admin myproject] Unexpected error raised: invalid literal for int() with base 16: 'ami-jqxvgtmd'
(nova.api): TRACE: Traceback (most recent call last):
(nova.api): TRACE:   File "/usr/lib/pymodules/python2.6/nova/api/ec2/__init__.py", line 318, in __call__
(nova.api): TRACE:     result = api_request.invoke(context)
(nova.api): TRACE:   File "/usr/lib/pymodules/python2.6/nova/api/ec2/apirequest.py", line 150, in invoke
(nova.api): TRACE:     result = method(context, **args)
(nova.api): TRACE:   File "/usr/lib/pymodules/python2.6/nova/api/ec2/cloud.py", line 906, in describe_images
(nova.api): TRACE:     images = self.image_service.detail(context)
(nova.api): TRACE:   File "/usr/lib/pymodules/python2.6/nova/image/s3.py", line 76, in detail
(nova.api): TRACE:     images = self.service.detail(context)
(nova.api): TRACE:   File "/usr/lib/pymodules/python2.6/nova/image/local.py", line 58, in detail
(nova.api): TRACE:     for image_id in self._ids():
(nova.api): TRACE:   File "/usr/lib/pymodules/python2.6/nova/image/local.py", line 50, in _ids
(nova.api): TRACE:     return [int(i, 16) for i in os.listdir(self._path)]
(nova.api): TRACE: ValueError: invalid literal for int() with base 16: 'ami-jqxvgtmd'
(nova.api): TRACE:

I found a related bug, https://bugs.launchpad.net/nova/+bug/735641, by searching the bug database for the error message. In this case the solution is to remove my images from my objectstore and re-upload them, because the way images are stored and retrieved has changed.
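
The most useful thing to paste into the bug search is usually the last line of the traceback, the one that names the exception. A rough Python sketch for pulling those lines out of a log follows. It assumes the "(logger): TRACE: ..." layout shown in the excerpts above, and the helper name exception_lines is made up for this example.

```python
import re

# Matches "(nova.api): TRACE: <text>" where <text> starts right after one space.
# Indented frame and source lines have extra spaces, so they do not match.
TRACE_LINE = re.compile(r"^\([\w.]+\): TRACE: (\S.*)$")


def exception_lines(log_text):
    """Return the unindented TRACE lines, minus the 'Traceback' headers."""
    found = []
    for line in log_text.splitlines():
        match = TRACE_LINE.match(line.rstrip())
        if match and not match.group(1).startswith("Traceback"):
            found.append(match.group(1))
    return found


if __name__ == "__main__":
    sample = "\n".join([
        "(nova.api): TRACE: Traceback (most recent call last):",
        "(nova.api): TRACE:     return [int(i, 16) for i in os.listdir(self._path)]",
        "(nova.api): TRACE: ValueError: invalid literal for int() with base 16: 'ami-jqxvgtmd'",
        "(nova.api): TRACE:",
    ])
    for text in exception_lines(sample):
        print(text)
```

When searching, drop the parts that are specific to your machine (here the image name) so the query matches other people's reports.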

Step 3. Ask on IRC @ freenode.net

IRC is always a good place to go. Connect to freenode.net (http://webchat.freenode.net/) and join #openstack, where the developers and contributors will answer your questions. Have patience though, they do have work to do.

Step 4. You can also ask questions on Launchpad: https://answers.launchpad.net/nova/+addquestion (and similarly for swift).

I also find it’s handy not to be too vague. Describe your setup: saying “I launch an instance and it’s stuck on ‘Scheduling’, can you help?” doesn’t give anyone any detail about why that could be the case. As you can imagine, the cause could be anything from user error, hardware errors or software errors to a misconfigured environment, each requiring a different way to troubleshoot, so help yourself by being more specific.
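
One easy way to be specific is to include the tail of each log with your question. The Python sketch below gathers the last few lines of every *.log file in a directory into one block of text. The names tail_lines and build_report are hypothetical, and the default of 20 lines is an arbitrary choice.

```python
from collections import deque
from pathlib import Path


def tail_lines(path, count=20):
    """Return the last `count` lines of a text file, without newlines."""
    with open(path, errors="replace") as handle:
        # A bounded deque keeps only the final `count` lines in memory.
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]


def build_report(log_dir, count=20):
    """Join the tail of every *.log file in log_dir under a small header."""
    sections = []
    for log_file in sorted(Path(log_dir).glob("*.log")):
        sections.append("== %s ==" % log_file.name)
        sections.extend(tail_lines(log_file, count))
    return "\n".join(sections)


if __name__ == "__main__":
    print(build_report("/var/log/nova"))
```

Read through the output before posting it, since logs can contain credentials or hostnames you may not want to share.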

For more information, check out the OpenStack Wiki on contributing to the project: http://wiki.openstack.org/HowToContribute and the support and troubleshooting information in the docs: http://docs.openstack.org/openstack-compute/admin/content/ch08.html
