TL;DR

The backup gem contains a bug that causes it to crash when the HOME environment variable is not set. Notably, this is the case when jobs are started by ‘god’.

I have patched this and will soon send a pull request to the authors of the gem.

Read on to get a few hints on how to track down issues like this one.

The story of a painful bug hunt

0. Background

On several projects I’m using god to start and monitor daemon processes, especially resque workers. On one particular project, this just stopped working recently and as a workaround I had to start the resque workers manually under tmux.

Today I finally took the time to figure out what was going on.

1. Symptoms

In my god.conf I have a ‘resque’ watch configured which starts five workers. Here is the relevant snippet from god.conf:

num_workers = rails_env == 'production' ? 5 : 2
num_workers.times do |num|
  God.watch do |w|
    w.name     = "resque-#{num}"
    w.group    = 'resque'
    w.interval = 60.seconds
    w.env      = {"QUEUE"=>"*", "RAILS_ENV"=>rails_env}
    w.start    = "bundle exec bootup_rake -f #{rails_root}/Rakefile environment resque:work"
    w.dir      = "#{rails_root}"
...

When doing a “god start resque”, god tries to start up the workers but they crash almost immediately. If I do “ps aux | grep resque” within a few seconds I see the processes but then they just disappear.

2. What is god saying?

Finding the scroll that collects the utterings of god is easy:

/var/log/god.log

However, it quicly becomes apparent that god has nothing interesting to say:

I [2011-11-15 16:24:57]  INFO: Using pid file directory: /var/run/god
I [2011-11-15 16:30:55]  INFO: Syslog enabled.
I [2011-11-15 16:30:55]  INFO: Using pid file directory: /var/run/god
I [2011-11-15 16:39:30]  INFO: Syslog enabled.
I [2011-11-15 16:39:30]  INFO: Using pid file directory: /var/run/god
...

A bit of googling suggests that I try to run god in the foreground to see what is going on.

3. Non-daemonized god

So let’s see how god is being invoked:

root@foo:~# ps aux | grep god
root     27956  4.5  0.2 179304 19132 pts/1    Sl   10:12   \
   0:00 /usr/local/rvm/rubies/ruby-1.9.2-p290/bin/ruby      \
   /usr/local/rvm/gems/ruby-1.9.2-p290/bin/god -P /var/run/god.pid -l /var/log/god.log

We need to make a few tweaks:

  • don’t send output to a log file (remove -l)
  • load the config file with -c <config>
  • keep it in the foreground with -D

So our command becomes:

root@foo:~# /usr/local/rvm/rubies/ruby-1.9.2-p290/bin/ruby \ 
   /usr/local/rvm/gems/ruby-1.9.2-p290/bin/god             \
   -P /var/run/god.pid -c /etc/god.conf -D

This yields slightly more interesting output. Here are the lines for the worker ‘resque-1′:

I [2011-11-16 10:17:13]  INFO: resque-1 move 'unmonitored' to 'init'
I [2011-11-16 10:17:13]  INFO: resque-1 moved 'unmonitored' to 'init'
I [2011-11-16 10:17:13]  INFO: resque-1 [trigger] process is not running (ProcessRunning)
I [2011-11-16 10:17:13]  INFO: resque-1 move 'init' to 'start'
I [2011-11-16 10:17:13]  INFO: resque-1 start: bundle exec bootup_rake \ 
   -f /var/www/foo/current/Rakefile environment resque:work
I [2011-11-16 10:17:13]  INFO: resque-1 moved 'init' to 'start'
I [2011-11-16 10:17:13]  INFO: resque-1 [trigger] process is running (ProcessRunning)
I [2011-11-16 10:17:13]  INFO: resque-1 move 'start' to 'up'
I [2011-11-16 10:17:13]  INFO: resque-1 moved 'start' to 'up'
I [2011-11-16 10:17:13]  INFO: resque-1 [ok] memory within bounds [8952kb] (MemoryUsage)
I [2011-11-16 10:17:14]  INFO: resque-1 [ok] process is running (ProcessRunning)

OK, maybe that’s not so interesting after all. God starts the process and is satisfied with that.

What IS interesting is the fact that this actually works! The resque workers keep running and start processing jobs.

4 WTF?

So when I start god non-daemonized, it works. When god is started with ‘service god start’, it does not.

I need a way to see the log output from god when it is running in daemonized mode. This turns out to be as simple as:

god log resque

By default this gives me the output from the first worker in the group. I won’t repeat it here, but it is exactly the same as when running god in the foreground.

It seems to be safe to assume that god is behaving the same way whether started by me or via /etc/init.d/god. To be sure I also tried starting it by executing the exact two lines from this file. Same result.

5. Inspecting the environment

I have suspected for a while that this issue has to do with the environment in which the rake command is being run. This seems to be the only major source of difference between the working and the non-working experiments.

So I need to know the environment in which the failing rake invocations are being run. I know that for all running processes, the environment is available in the file /proc/<pid>/environ. But how can I get to this file when the process dies within seconds?

My solution is to run the following command immediately after god tries to start the workers:

less -f /proc/`ps aux | grep resque | head -1 | cut -d ' ' -f 4`/environ

What this does is

  • ps aux | grep resque finds the process info for the about-to-die resque workers
  • head -1 grabs the first one
  • cut -d ‘ ‘ -f 4 cuts out the PID (fourth field when using a space as the delimiter)
  • This PID is then used as part of the path passed to the outer less command

The environ file is in some kind of binary format (which is why I need -f on the less command) the details of which I haven’t bothered to look up.

I don’t really need to, because after running the above command I see this in my iTerm:

GEM_HOME=/usr/local/rvm/gems/ruby-1.9.2-p290^@TERM=xterm^@IRBRC=/usr/local/rvm/r\
ubies/ruby-1.9.2-p290/.irbrc^@MY_RUBY_HOME=/usr/local/rvm/rubies/ruby-1.9.2-p290\
^@QUEUE=*^@rvm_path=/usr/local/rvm^@PATH=/usr/local/rvm/gems/ruby-1.9.2-p290/bin\
:/usr/local/rvm/gems/ruby-1.9.2-p290@global/bin:/usr/local/rvm/rubies/ruby-1.9.2\
-p290/bin:/usr/local/rvm/bin:/var/www/foo/shared/bundle/ruby/1.9.1/bin:/usr/loca\
l/rvm/gems/ruby-1.9.2-p290/bin:/usr/local/rvm/gems/ruby-1.9.2-p290@global/bin:/u\
sr/local/rvm/rubies/ruby-1.9.2-p290/bin:/usr/local/rvm/bin:/usr/local/sbin:/usr/\
local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games^@PWD=/var/www/foo/releases/20\
111115153743^@LANG=en_US.UTF-8^@SHLVL=1^@BUNDLE_GEMFILE=/var/www/foo/releases/20\
111115153743/Gemfile^@RAILS_ENV=production^@rvm_ruby_string=ruby-1.9.2-p290^@GEM\
_PATH=/usr/local/rvm/gems/ruby-1.9.2-p290:/usr/local/rvm/gems/ruby-1.9.2-p290@gl\
obal^@RUBYOPT=-I/usr/local/rvm/gems/ruby-1.9.2-p290/gems/bundler-1.0.18/lib -rbu\
ndler/setup^@BUNDLE_BIN_PATH=/usr/local/rvm/gems/ruby-1.9.2-p290/gems/bundler-1.\
0.18/bin/bundle^@RUBY_VERSION=ruby-1.9.2-p290^@

Notice that the variables are separated by some byte that less renders as “^@”. Good!

6. Reproducing the bug

Off to Emacs to transform this into something useful. I won’t bore you with the details, but a quick keyboard macro transforms the above into:

export GEM_HOME=/usr/local/rvm/gems/ruby-1.9.2-p290
export TERM=xterm
export IRBRC=/usr/local/rvm/rubies/ruby-1.9.2-p290/.irbrc
export MY_RUBY_HOME=/usr/local/rvm/rubies/ruby-1.9.2-p290
export QUEUE=*
export rvm_path=/usr/local/rvm
export PATH=/usr/local/rvm/gems/ruby-1.9.2-p290/bin:/usr/local/rvm/gems/ruby-1.9.2-p290@global/bin:\
   /usr/local/rvm/rubies/ruby-1.9.2-p290/bin:/usr/local/rvm/bin:/var/www/foo/shared
export PWD=/var/www/foo/releases/20111115153743
export LANG=en_US.UTF-8
export SHLVL=1
export BUNDLE_GEMFILE=/var/www/foo/releases/20111115153743/Gemfile
export RAILS_ENV=production
export rvm_ruby_string=ruby-1.9.2-p290
export GEM_PATH=/usr/local/rvm/gems/ruby-1.9.2-p290:/usr/local/rvm/gems/ruby-1.9.2-p290@global
export RUBYOPT=-I/usr/local/rvm/gems/ruby-1.9.2-p290/gems/bundler-1.0.18/lib -rbundler/setup
export BUNDLE_BIN_PATH=/usr/local/rvm/gems/ruby-1.9.2-p290/gems/bundler-1.0.18/bin/bundle
export RUBY_VERSION=ruby-1.9.2-p2

Back on the server, I clear all existing environment vars with a snippet I found here:

unset  $ ( /usr/bin/env | egrep '^(\w+)=(.*)$' | \
  egrep -vw 'PWD|USER|LANG' | /usr/bin/cut -d= -f1);

Next, I paste all the exports above into iTerm and finally I am ready to try starting the worker again.

Voila!! It crashes:

rake aborted!
can't convert nil into String
/var/www/foo/shared/bundle/ruby/1.9.1/gems/backup-3.0.16/lib/backup.rb:40:in `join'
/var/www/foo/shared/bundle/ruby/1.9.1/gems/backup-3.0.16/lib/backup.rb:40:in `<module:Backup>'
/var/www/foo/shared/bundle/ruby/1.9.1/gems/backup-3.0.16/lib/backup.rb:8:in `<top (required)>'
...

Here are the lines around line 40 of backup.rb:

##                                                                                                                                                                                                                                                                                                                                                                      
# Backup's Environment paths                                                                                                                                                                                                                                                                                                                                            
PATH               = File.join(ENV['HOME'], 'Backup')
DATA_PATH          = File.join(ENV['HOME'], 'Backup', 'data')

Ah, so the backup gem assumes that the HOME environment variable is set. If not, it crashes.

7. Done

Fortunately the source for “backup” is on Github, so I could easily fork the project and apply the very simple patch that was needed.

If you need to run scheduled jobs in your Rails app (!= background jobs as in Resque), you should use the excellent whenever gem.

However, one detail that is really easy to miss is which shell environment the cron job actually runs under.

bundle: command not found

And an easy way to confuse yourself even further is by trying to debug issues in this area by:

  1. Logging into your  server as the deployment user
  2. Doing crontab -l to see which entries whenever installed
  3. Testing an entry by cut’n’pasting it to the shell prompt
The outcome will not be reliable, because you’re running it from a login shell and cron is not. So for example, your rbenv shims may be in the path while testing, but cron does not load your .bashrc or .bash_profile files.

On a project I work on, the staging and production machines are currently quite different. In production we use RVM (globally installed) while staging has rbenv locally installed for the deployment user.

So in staging I need to assume that no binaries are available and use the full path to bundler, in my case /home/passenger/.rbenv/versions/1.9.2-p290/bin/bundle.

(I know, symlinking this to eg. /usr/local/bin/bundle would probably be more elegant, but I actually like the reminder of where my ruby is actually installed. I don’t spend much time on the servers, so I tend to forget.)

Running stuff only on certain stages

EDIT: See Markham’s comment for a better way to do achieve the same result.

Another issue I struggled with for a while is how to schedule jobs only in production. For example, I don’t want to run S3 backups from staging.

I use capistrano/multistage, so I needed to inspect the value of the current “stage” from schedule.rb. It turns out that by including this:

require 'capistrano/ext/multistage'

set :whenever_environment, defer { stage }
require "whenever/capistrano"

…the stage is available in schedule.rb as @environment. Very simple, but it took some googling to find.

Example schedule.rb

Below is my anonymized schedule.rb. It should be pretty self explanatory, post questions in the comments if not.

def production?
 @environment == 'production'
end

set :bundler, production? ? "gp_bundle" : "/home/passenger/.rbenv/versions/1.9.2-p290/bin/bundle"

job_type :gp_rake, "cd :path && RAILS_ENV=:environment :bundler exec rake :task --silent :output"
job_type :gp_runner, "cd :path && :bundler exec rails runner -e :environment ':task' :output"
job_type :gp_bundle_exec, "cd :path && RAILS_ENV=:environment :bundler exec :task"

every 5.minutes do
 gp_rake "thinking_sphinx:index"
end

every 30.minutes do
 gp_runner "FooBar.update"
end

every 1.day, :at => "4:00am" do
 gp_runner "Baz.purge_inactive"
end

if production?
 every 1.day, :at => "4:00am" do
   gp_bundle_exec 'backup perform --trigger my_backup --config-file config/backup.rb --log-path log'
 end
end

EDIT: Jakob Skjerning posted this snippet in the comments:

job_type :rake, “cd /var/www/appname/#{environment}/current && /home/appname/.rvm/gems/ree-1.8.7-2010.02@appname-#{environment}/bin/bundle exec /home/appname/.rvm/wrappers/ree-1.8.7-2010.02@appname-#{environment}/rake –silent RAILS_ENV=#{environment} :task :output”

I use my Xoom mainly for reading eBooks, and I really like Amazon’s Kindle app.

Unfortunately, it seems to have no support at all for syncing books that are not bought through Amazon (not surprisingly perhaps). This is a problem for me because I often buy from the Pragmatic Programmers, and I don’t want to copy those .mobi files to my tablet manually (I know that PragProg offers to sync the titles via Whispernet, but that only works for physical Kindle devices).

What I found out recently is that there is a pretty easy fix for this:

  1. Store your books in some folder in your Dropbox, say “/kindle”
  2. Install the app FolderSync Lite on your Android device
  3. Set up your Dropbox account in FolderSync. The interface is not very intuitive so pay attention.
  4. Set up a “folderpair” in FolderSync, mapping /kindle in your Dropbox to /mnt/sdcard/kindle on your device. Make sure the sync type is “to local folder”.
  5. Schedule this folderpair to sync as often as you like.
If you got the options in FolderSync right, you can now put your non-Amazon purchases into /kindle in your Dropbox and they will automatically appear in the Kindle app on your Android device.

You may be wondering if this conflicts with the books you bought through Amazon – after all, they’re also stored in /mnt/sdcard/kindle. The trick is that FolderSync does not take ownership of this folder, it simply copies files from the remote (Dropbox) folder into it. So I assume that it doesn’t even clean up deleted files, although I haven’t tried.

On a project, I have a “Revision” resource and URL’s like

http://foobar.dev/revisions/1

Now suddenly the client wants to rename “revision” to “edition”, so all URLs should be changed to the format

http://foobar.dev/editions/1

Unless I’m missing something in the Rails docs, there is no option to change the base part of the path without renaming the resource itself. Actions can be renamed, eg. “nuevo” instead of “new”, but I don’t think the path segment for the resource name can be changed (please correct me if I’m wrong).

I guess the proper way to go would be to rename everything: The AR model, all helper calls, controllers, tests etc. Even with lots of automation, this would be a really annoying task and I’m not inclined to spend time on this right now. Who knows if the client will change his mind again?

So instead, I ended up changes the resource from

  resources :revisions

to

  resources :editions, :controller => "revisions", :as => "revisions"

I’m changing the name of the resource, but the :controller option makes sure the right controller is still called, and the :as option controls the generation of path helpers (so I can keep using eg. edit_revision_path).

Remember to make the same change for nested resources.

EDIT: As Rasmus and Jakob point out, the :path option sets the actual path, not a prefix. I have made some minor updates to the Rails docs to correct this (most instances were already fixed).

It turns out that lots of MBP models have a well-known issue with the headphone port, and this can cause the internal speakers to become disabled. Actually, they’re disabled as a side-effect of digital output being enabled. The headphone port is used for both electrical and optical connections, and the internal switch managing this can sometimes enter the “optical connector present” state for no apparent reason.

Here’s one thread describing this issue, and Google turns out many more.

However, the reason that I’m writing this is that my machine (2010 17″ MBP) seems to suffer from a variation of this problem. All descriptions of the issue I found were rather old, and in all cases it was reported that the red light (ie. the optical signal stream) was turned on when the internal speakers were disabled. On my machine, the red light was visible only some of the time.

Doing a software update, a reboot and even an SMC reset did not help. I eventually fixed the issue by moving the headphone jack slowly in and out of the port while wiggling it a little bit. It took maybe 15 tries before it worked…

To monitor your (lack of) progress, open the System Preferences, go to the Sound pane and select the Output tab.

  • In the broken state, the topmost item will read “Digital Output” and a red light may or may not be visible from the headphone socket
  • When the headphone jack is inserted, this should change to “Headphones”
  • The non-broken state is “Internal Speakers”. This is when you have managed to clear the switch (HW or SW, I don’t know) that is stuck in the “optical connector present” state.

My oldish Acer Aspire One has been running Ubuntu Netbook Remix for a year or so, and I have become increasingly annoyed with how slow it is. I remember it as being much faster when I installed it, so either a) I have become faster, b) The performance of the installation has degraded, or c) My memory is incorrect yet again.

I’m actually inclined to go with b). The problem with the Aspire One (some models) is its extremely slow 8GB SSD. I know, those things are supposed to be fast, and they have done wonders for my last two laptops. But on the AA1, especially the write speed really is terrible. So I’m suspecting that something has caused the Ubuntu installation to do more writes over time…whether this is disk fragmentation, the browser cache or whatever, I don’t know.

For the reinstall, I decided to try out Joli OS and so far it looks like this was a good choice! What I basically need is for this machine to be 100% quiet, reasonably responsive and able to render web pages and PDF files. That’s all, because I use it almost exclusively for reading in bed.

The entire OS is built very much like I would expect Chrome OS (which I haven’t tried) to be built. Everything revolves around the web, and it seems that most of the OS is written in HTML5. Joli OS is in fact based on Ubuntu Netbook Remix, but simplified (even further) and integrating Jolicloud, which is pretty cool. In fact Jolicloud is at the core of the user experience where it replaces the launcher found in UNR.

The idea with Jolicloud is to have a web-based, portable desktop containing all the webapps that you use all the time. Facebook, Twitter, Gmail, Flickr, YouTube, just to name a few of the ones that are added by default.

I’m not sure exactly what Joli OS adds to this (because the OS is all I’ve tried), but a feature I found especially nice is the ability to link my device with Dropbox and Google Docs. I use Dropbox for storing a lot of PDF documents that I want to have available anywhere I go. But I also use it for lots of other purposes, and actually I think this is one of the reasons my UNR installation finally gave up – because I was naive enough to have Dropbox installed and syncing everything.

With Joli, I can browse my Dropbox and open individual documents. Browsing is of course slightly slower than on my physical disk, but I tend to keep these documents open for weeks, so this is not an issue. And of course I could just download any files I want to access offline.

So far, I have done very little customization of the OS. The AA1 has a very noisy fan, but fixing this problem with the acerhdf module is a  simple matter of editing a few files. I also want Caps Lock to be an additional Ctrl, and I want both Danish and US keyboard layouts installed. Both wishes were granted after a few minutes of poking around in the settings (these are just the Gnome settings, of course).

I’m really surprised to discover that I seem to fit exactly into the target group for an OS like this. I have installed Emacs (mostly to edit org files), but apart from that, it looks like I’ll be fine with the built-in selection of programs. Which is just Chromium and the Gnome stuff.

Finally, it looks like there’s a whole social networking thing built into Joli which I haven’t explored at all yet. But I have already earned the badge “Recycler” just by installing Joli and recycling a computer that was destined to be replaced by a tablet before long :-)

Here is a very simple tip that has been really useful to me over the last few months.

The way I work in Emacs is by opening lots and lots of files and then quickly selecting the buffer I want with ido-mode.

However, this breaks down if I have more than one project loaded in my Emacs session. For example, when working on Rails projects, I want to be able to find the routes.rb file with just a few keystrokes. To avoid restarting Emacs whenever I switch project, I need multiple instances running simultaneously, which is not “supported” in OSX (or rather, it’s just not the OSX way).

To workaround this, I use a really simple hack: Copying Emacs

So to create Emacs2 and Emacs3 in addition to just Emacs, do

cp -r /Applications/Emacs.app /Applications/Emacs2.app
cp -r /Applications/Emacs.app /Applications/Emacs3.app

That’s it! Now you can fire up three Emacs instances at the same time. Next step will be to replace the icons of the clones to make it easier to spot them when switching applications.

[EDIT]

As discussed in the comments below, I can just start new instances with “open -n -a Emacs.app”. Thanks tali713 for bringing this to my attention!

Follow

Get every new post delivered to your Inbox.