It seems like I’m going to be haunted by my disastrous server upgrade for some time to come.

The server has gitosis installed, which is what I use for serving my private Git repos. Now, when trying to clone a repo (from another computer), I suddenly got a stack trace:

/usr/bin/gitosis-serve:5: UserWarning: Unbuilt egg for setuptools [unknown version] (/usr/lib/python2.6/dist-packages)
  from pkg_resources import load_entry_point
Traceback (most recent call last):
  File "/usr/bin/gitosis-serve", line 5, in <module>
    from pkg_resources import load_entry_point
  File "/usr/lib/python2.6/dist-packages/pkg_resources.py", line 2655, in <module>
    working_set.require(__requires__)
  File "/usr/lib/python2.6/dist-packages/pkg_resources.py", line 648, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/lib/python2.6/dist-packages/pkg_resources.py", line 546, in resolve
    raise DistributionNotFound(req)
pkg_resources.DistributionNotFound: gitosis==0.2
fatal: The remote end hung up unexpectedly

It turned out that this was because Python had been upgraded to 2.6. Fixing it was extremely easy: re-install gitosis with the current version of Python:

git clone git://eagain.net/gitosis
cd gitosis
sudo python setup.py install
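
To check that the fresh install is actually picked up before trying another clone, something like this should stay silent instead of raising DistributionNotFound (the version pin is just the one from the traceback above):

python -c "import pkg_resources; pkg_resources.require('gitosis==0.2')"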

Now I just need to figure out how to fix the charset encoding in my MoinMoin wiki…

Thinking like a sysadmin

August 24, 2010

I’ll admit right away: Sitting under our staircase, hunched over my laptop, is not really my idea of a great weekend.

I spent a significant part of last weekend in deep frustration. My home server had crashed, and it was absolutely my own fault. When this server is down, it means no internet connection in the house. Plus, a mailing list, my wiki, redirection to my blog and a few other services are down. So obviously this needed to be fixed, and fast.

The trouble began when I tried to create a RAID array last week. After some hassle with a cheap SATA controller “with Raid”, I decided that I would just go for a software RAID. Now, in an attempt to be responsible and actually prepare this operation, I googled a bit and found various posts about how to go about doing this in Ubuntu, caveats to avoid and so on. Among other interesting facts, I discovered that the kernel I was running was known to cause some trouble with RAIDs.

So, time to upgrade to the newest release, a leap of 3 or 4 Ubuntu releases. Unfortunately, my patience was running low at this point, so I ignored the HUGE warning and started the upgrade over my SSH connection, because I couldn’t be bothered to pull the server out from under the staircase and attach a keyboard and monitor to it.

As you will have guessed by now, this was an enormous mistake, one of the stupidest things I have yet done in my career as an amateur sysadmin. Because of course, somewhere along the way, something went wrong and broke the network connection, along with DNS, DHCP, internet connectivity and everything else we rely on this machine for.

After a reboot, I was still unable to contact the machine. So I had to pull it out and attach an actual monitor to it. To my horror, it turned out that the boot process died instantly, because it was unable to mount the root filesystem. WTF?

Let me just cut to the chase here, because too many hours were spent trying out solutions that just didn’t work.

The solution was slightly interesting, though. Not having an internet connection (except on my HTC Hero) made it pretty hard to google for possible solutions, look up GRUB docs etc., so I ended up taking a different approach. I had an Ubuntu 10.04 installer CD lying around, so I did a fresh install using some free space on the hard drive. Then I compared the GRUB configurations of the broken and the fresh installations. It turned out that the upgrade had been interrupted before it had created an initrd for the new kernel! I was a bit surprised that it had still updated the GRUB config to a half-baked state: the entry in menu.lst contained a kernel line but no initrd line.

So I stole a fresh kernel and initrd from the new installation, and voilà: my system was back to normal (except for lots of unconfigured packages, etc.).
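
In practice that amounted to something like the following, done from the fresh installation (the device, mount point and kernel version here are illustrative, not the exact ones on my box):

sudo mkdir -p /mnt/broken
sudo mount /dev/sda1 /mnt/broken    # the broken installation's root partition

# Copy the working kernel and initrd across
sudo cp /boot/vmlinuz-2.6.32-24-generic /mnt/broken/boot/
sudo cp /boot/initrd.img-2.6.32-24-generic /mnt/broken/boot/

# ...and then edit /mnt/broken/boot/grub/menu.lst so the boot entry has both a
# kernel line and a matching initrd line pointing at the copied files.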

Lesson learned: As a software developer, I’m enthusiastic, optimistic and sometimes downright reckless. As a sysadmin, pretty much the same. A good sysadmin needs to be conservative and cautious, and I understand why they sometimes get annoyed with us developers.

Keep this in mind the next time your local sysadmin doesn’t “just update that shit” right away.

 

RAID trouble

August 20, 2010

I had heard about software RAID and hardware RAID, but I had not heard about fakeRAID. And how typical it is that I learn about this on the Ubuntu wiki after an evening spent on:

1) Squeezing myself into the tight space under our staircase to install a cheap SATA/RAID controller in my server

2) Setting up keyboard and monitor to get into the BIOS of that controller

3) Creating a RAID0 set and discovering that this did not “just work” under Ubuntu. Ubuntu still sees two distinct drives…WTF?

4) Lots of Googling and reading through forum threads

5) Getting uncomfortable again to uninstall the cheap controller card that I now hate!

It turns out that some controller chip vendors, Silicon Image included, produce some inexpensive “RAID” controllers that are really just SATA controllers with a BIOS on them to assist in setting up the RAID arrays. I’ll admit that I don’t understand (or care about) the details of this, but these controllers seem to offer better performance than pure software RAID, plus they allow most OSs to boot from the RAID array.

However, not being a real HW RAID means some OS interaction is required. The array is not presented to the OS as a single, large drive. So the advantages of fakeRAID are uninteresting to me, because what I wanted was easier setup.
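
For completeness: as far as I can tell from the Ubuntu wiki (I haven’t actually tried this), the “OS interaction” means letting the dmraid tool read the controller’s metadata and assemble the array in software, roughly like so:

sudo apt-get install dmraid
sudo dmraid -r    # list the RAID sets described by the controller's on-disk metadata
sudo dmraid -ay   # activate them as /dev/mapper/* devices for the OS to use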

So I have decided to go with an ordinary Linux SW RAID, which by all accounts is very reliable and relatively easy to set up. If I run into any interesting “learning opportunities”, I’ll blog about it.
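
For my own reference, the mdadm recipe appears to boil down to something like this (untested on my part so far; the device names, RAID level and mount point are just placeholders):

sudo apt-get install mdadm

# Create the array -- level 0 matches what I tried in the controller BIOS,
# and the device names are only examples (check with fdisk -l first!)
sudo mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb /dev/sdc

# Put a filesystem on it and mount it somewhere
sudo mkfs.ext4 /dev/md0
sudo mkdir -p /mnt/raid
sudo mount /dev/md0 /mnt/raid

# Record the array so it gets assembled automatically at boot
sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf
sudo update-initramfs -u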

By the way, the controller in question here is a DeLock branded 4-port SATA controller. The chipset on it is a SiI3114. Here it is on the website of my local parts shop.
