TL;DR

The backup gem contains a bug that causes it to crash when the HOME environment variable is not set. Notably, this is the case when jobs are started by ‘god’.

I have patched this and will soon send a pull request to the authors of the gem.

Read on to get a few hints on how to track down issues like this one.

The story of a painful bug hunt

0. Background

On several projects I’m using god to start and monitor daemon processes, especially resque workers. On one particular project, this just stopped working recently and as a workaround I had to start the resque workers manually under tmux.

Today I finally took the time to figure out what was going on.

1. Symptoms

In my god.conf I have a ‘resque’ watch configured which starts five workers. Here is the relevant snippet from god.conf:

num_workers = rails_env == 'production' ? 5 : 2
num_workers.times do |num|
  God.watch do |w|
    w.name     = "resque-#{num}"
    w.group    = 'resque'
    w.interval = 60.seconds
    w.env      = {"QUEUE"=>"*", "RAILS_ENV"=>rails_env}
    w.start    = "bundle exec bootup_rake -f #{rails_root}/Rakefile environment resque:work"
    w.dir      = "#{rails_root}"
...

When doing a “god start resque”, god tries to start up the workers but they crash almost immediately. If I do “ps aux | grep resque” within a few seconds I see the processes but then they just disappear.

2. What is god saying?

Finding the scroll that collects the utterings of god is easy:

/var/log/god.log

However, it quicly becomes apparent that god has nothing interesting to say:

I [2011-11-15 16:24:57]  INFO: Using pid file directory: /var/run/god
I [2011-11-15 16:30:55]  INFO: Syslog enabled.
I [2011-11-15 16:30:55]  INFO: Using pid file directory: /var/run/god
I [2011-11-15 16:39:30]  INFO: Syslog enabled.
I [2011-11-15 16:39:30]  INFO: Using pid file directory: /var/run/god
...

A bit of googling suggests that I try to run god in the foreground to see what is going on.

3. Non-daemonized god

So let’s see how god is being invoked:

root@foo:~# ps aux | grep god
root     27956  4.5  0.2 179304 19132 pts/1    Sl   10:12   \
   0:00 /usr/local/rvm/rubies/ruby-1.9.2-p290/bin/ruby      \
   /usr/local/rvm/gems/ruby-1.9.2-p290/bin/god -P /var/run/god.pid -l /var/log/god.log

We need to make a few tweaks:

  • don’t send output to a log file (remove -l)
  • load the config file with -c <config>
  • keep it in the foreground with -D

So our command becomes:

root@foo:~# /usr/local/rvm/rubies/ruby-1.9.2-p290/bin/ruby \ 
   /usr/local/rvm/gems/ruby-1.9.2-p290/bin/god             \
   -P /var/run/god.pid -c /etc/god.conf -D

This yields slightly more interesting output. Here are the lines for the worker ‘resque-1′:

I [2011-11-16 10:17:13]  INFO: resque-1 move 'unmonitored' to 'init'
I [2011-11-16 10:17:13]  INFO: resque-1 moved 'unmonitored' to 'init'
I [2011-11-16 10:17:13]  INFO: resque-1 [trigger] process is not running (ProcessRunning)
I [2011-11-16 10:17:13]  INFO: resque-1 move 'init' to 'start'
I [2011-11-16 10:17:13]  INFO: resque-1 start: bundle exec bootup_rake \ 
   -f /var/www/foo/current/Rakefile environment resque:work
I [2011-11-16 10:17:13]  INFO: resque-1 moved 'init' to 'start'
I [2011-11-16 10:17:13]  INFO: resque-1 [trigger] process is running (ProcessRunning)
I [2011-11-16 10:17:13]  INFO: resque-1 move 'start' to 'up'
I [2011-11-16 10:17:13]  INFO: resque-1 moved 'start' to 'up'
I [2011-11-16 10:17:13]  INFO: resque-1 [ok] memory within bounds [8952kb] (MemoryUsage)
I [2011-11-16 10:17:14]  INFO: resque-1 [ok] process is running (ProcessRunning)

OK, maybe that’s not so interesting after all. God starts the process and is satisfied with that.

What IS interesting is the fact that this actually works! The resque workers keep running and start processing jobs.

4 WTF?

So when I start god non-daemonized, it works. When god is started with ‘service god start’, it does not.

I need a way to see the log output from god when it is running in daemonized mode. This turns out to be as simple as:

god log resque

By default this gives me the output from the first worker in the group. I won’t repeat it here, but it is exactly the same as when running god in the foreground.

It seems to be safe to assume that god is behaving the same way whether started by me or via /etc/init.d/god. To be sure I also tried starting it by executing the exact two lines from this file. Same result.

5. Inspecting the environment

I have suspected for a while that this issue has to do with the environment in which the rake command is being run. This seems to be the only major source of difference between the working and the non-working experiments.

So I need to know the environment in which the failing rake invocations are being run. I know that for all running processes, the environment is available in the file /proc/<pid>/environ. But how can I get to this file when the process dies within seconds?

My solution is to run the following command immediately after god tries to start the workers:

less -f /proc/`ps aux | grep resque | head -1 | cut -d ' ' -f 4`/environ

What this does is

  • ps aux | grep resque finds the process info for the about-to-die resque workers
  • head -1 grabs the first one
  • cut -d ‘ ‘ -f 4 cuts out the PID (fourth field when using a space as the delimiter)
  • This PID is then used as part of the path passed to the outer less command

The environ file is in some kind of binary format (which is why I need -f on the less command) the details of which I haven’t bothered to look up.

I don’t really need to, because after running the above command I see this in my iTerm:

GEM_HOME=/usr/local/rvm/gems/ruby-1.9.2-p290^@TERM=xterm^@IRBRC=/usr/local/rvm/r\
ubies/ruby-1.9.2-p290/.irbrc^@MY_RUBY_HOME=/usr/local/rvm/rubies/ruby-1.9.2-p290\
^@QUEUE=*^@rvm_path=/usr/local/rvm^@PATH=/usr/local/rvm/gems/ruby-1.9.2-p290/bin\
:/usr/local/rvm/gems/ruby-1.9.2-p290@global/bin:/usr/local/rvm/rubies/ruby-1.9.2\
-p290/bin:/usr/local/rvm/bin:/var/www/foo/shared/bundle/ruby/1.9.1/bin:/usr/loca\
l/rvm/gems/ruby-1.9.2-p290/bin:/usr/local/rvm/gems/ruby-1.9.2-p290@global/bin:/u\
sr/local/rvm/rubies/ruby-1.9.2-p290/bin:/usr/local/rvm/bin:/usr/local/sbin:/usr/\
local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games^@PWD=/var/www/foo/releases/20\
111115153743^@LANG=en_US.UTF-8^@SHLVL=1^@BUNDLE_GEMFILE=/var/www/foo/releases/20\
111115153743/Gemfile^@RAILS_ENV=production^@rvm_ruby_string=ruby-1.9.2-p290^@GEM\
_PATH=/usr/local/rvm/gems/ruby-1.9.2-p290:/usr/local/rvm/gems/ruby-1.9.2-p290@gl\
obal^@RUBYOPT=-I/usr/local/rvm/gems/ruby-1.9.2-p290/gems/bundler-1.0.18/lib -rbu\
ndler/setup^@BUNDLE_BIN_PATH=/usr/local/rvm/gems/ruby-1.9.2-p290/gems/bundler-1.\
0.18/bin/bundle^@RUBY_VERSION=ruby-1.9.2-p290^@

Notice that the variables are separated by some byte that less renders as “^@”. Good!

6. Reproducing the bug

Off to Emacs to transform this into something useful. I won’t bore you with the details, but a quick keyboard macro transforms the above into:

export GEM_HOME=/usr/local/rvm/gems/ruby-1.9.2-p290
export TERM=xterm
export IRBRC=/usr/local/rvm/rubies/ruby-1.9.2-p290/.irbrc
export MY_RUBY_HOME=/usr/local/rvm/rubies/ruby-1.9.2-p290
export QUEUE=*
export rvm_path=/usr/local/rvm
export PATH=/usr/local/rvm/gems/ruby-1.9.2-p290/bin:/usr/local/rvm/gems/ruby-1.9.2-p290@global/bin:\
   /usr/local/rvm/rubies/ruby-1.9.2-p290/bin:/usr/local/rvm/bin:/var/www/foo/shared
export PWD=/var/www/foo/releases/20111115153743
export LANG=en_US.UTF-8
export SHLVL=1
export BUNDLE_GEMFILE=/var/www/foo/releases/20111115153743/Gemfile
export RAILS_ENV=production
export rvm_ruby_string=ruby-1.9.2-p290
export GEM_PATH=/usr/local/rvm/gems/ruby-1.9.2-p290:/usr/local/rvm/gems/ruby-1.9.2-p290@global
export RUBYOPT=-I/usr/local/rvm/gems/ruby-1.9.2-p290/gems/bundler-1.0.18/lib -rbundler/setup
export BUNDLE_BIN_PATH=/usr/local/rvm/gems/ruby-1.9.2-p290/gems/bundler-1.0.18/bin/bundle
export RUBY_VERSION=ruby-1.9.2-p2

Back on the server, I clear all existing environment vars with a snippet I found here:

unset  $ ( /usr/bin/env | egrep '^(\w+)=(.*)$' | \
  egrep -vw 'PWD|USER|LANG' | /usr/bin/cut -d= -f1);

Next, I paste all the exports above into iTerm and finally I am ready to try starting the worker again.

Voila!! It crashes:

rake aborted!
can't convert nil into String
/var/www/foo/shared/bundle/ruby/1.9.1/gems/backup-3.0.16/lib/backup.rb:40:in `join'
/var/www/foo/shared/bundle/ruby/1.9.1/gems/backup-3.0.16/lib/backup.rb:40:in `<module:Backup>'
/var/www/foo/shared/bundle/ruby/1.9.1/gems/backup-3.0.16/lib/backup.rb:8:in `<top (required)>'
...

Here are the lines around line 40 of backup.rb:

##                                                                                                                                                                                                                                                                                                                                                                      
# Backup's Environment paths                                                                                                                                                                                                                                                                                                                                            
PATH               = File.join(ENV['HOME'], 'Backup')
DATA_PATH          = File.join(ENV['HOME'], 'Backup', 'data')

Ah, so the backup gem assumes that the HOME environment variable is set. If not, it crashes.

7. Done

Fortunately the source for “backup” is on Github, so I could easily fork the project and apply the very simple patch that was needed.

If you need to run scheduled jobs in your Rails app (!= background jobs as in Resque), you should use the excellent whenever gem.

However, one detail that is really easy to miss is which shell environment the cron job actually runs under.

bundle: command not found

And an easy way to confuse yourself even further is by trying to debug issues in this area by:

  1. Logging into your  server as the deployment user
  2. Doing crontab -l to see which entries whenever installed
  3. Testing an entry by cut’n’pasting it to the shell prompt
The outcome will not be reliable, because you’re running it from a login shell and cron is not. So for example, your rbenv shims may be in the path while testing, but cron does not load your .bashrc or .bash_profile files.

On a project I work on, the staging and production machines are currently quite different. In production we use RVM (globally installed) while staging has rbenv locally installed for the deployment user.

So in staging I need to assume that no binaries are available and use the full path to bundler, in my case /home/passenger/.rbenv/versions/1.9.2-p290/bin/bundle.

(I know, symlinking this to eg. /usr/local/bin/bundle would probably be more elegant, but I actually like the reminder of where my ruby is actually installed. I don’t spend much time on the servers, so I tend to forget.)

Running stuff only on certain stages

EDIT: See Markham’s comment for a better way to do achieve the same result.

Another issue I struggled with for a while is how to schedule jobs only in production. For example, I don’t want to run S3 backups from staging.

I use capistrano/multistage, so I needed to inspect the value of the current “stage” from schedule.rb. It turns out that by including this:

require 'capistrano/ext/multistage'

set :whenever_environment, defer { stage }
require "whenever/capistrano"

…the stage is available in schedule.rb as @environment. Very simple, but it took some googling to find.

Example schedule.rb

Below is my anonymized schedule.rb. It should be pretty self explanatory, post questions in the comments if not.

def production?
 @environment == 'production'
end

set :bundler, production? ? "gp_bundle" : "/home/passenger/.rbenv/versions/1.9.2-p290/bin/bundle"

job_type :gp_rake, "cd :path && RAILS_ENV=:environment :bundler exec rake :task --silent :output"
job_type :gp_runner, "cd :path && :bundler exec rails runner -e :environment ':task' :output"
job_type :gp_bundle_exec, "cd :path && RAILS_ENV=:environment :bundler exec :task"

every 5.minutes do
 gp_rake "thinking_sphinx:index"
end

every 30.minutes do
 gp_runner "FooBar.update"
end

every 1.day, :at => "4:00am" do
 gp_runner "Baz.purge_inactive"
end

if production?
 every 1.day, :at => "4:00am" do
   gp_bundle_exec 'backup perform --trigger my_backup --config-file config/backup.rb --log-path log'
 end
end

EDIT: Jakob Skjerning posted this snippet in the comments:

job_type :rake, “cd /var/www/appname/#{environment}/current && /home/appname/.rvm/gems/ree-1.8.7-2010.02@appname-#{environment}/bin/bundle exec /home/appname/.rvm/wrappers/ree-1.8.7-2010.02@appname-#{environment}/rake –silent RAILS_ENV=#{environment} :task :output”

My oldish Acer Aspire One has been running Ubuntu Netbook Remix for a year or so, and I have become increasingly annoyed with how slow it is. I remember it as being much faster when I installed it, so either a) I have become faster, b) The performance of the installation has degraded, or c) My memory is incorrect yet again.

I’m actually inclined to go with b). The problem with the Aspire One (some models) is its extremely slow 8GB SSD. I know, those things are supposed to be fast, and they have done wonders for my last two laptops. But on the AA1, especially the write speed really is terrible. So I’m suspecting that something has caused the Ubuntu installation to do more writes over time…whether this is disk fragmentation, the browser cache or whatever, I don’t know.

For the reinstall, I decided to try out Joli OS and so far it looks like this was a good choice! What I basically need is for this machine to be 100% quiet, reasonably responsive and able to render web pages and PDF files. That’s all, because I use it almost exclusively for reading in bed.

The entire OS is built very much like I would expect Chrome OS (which I haven’t tried) to be built. Everything revolves around the web, and it seems that most of the OS is written in HTML5. Joli OS is in fact based on Ubuntu Netbook Remix, but simplified (even further) and integrating Jolicloud, which is pretty cool. In fact Jolicloud is at the core of the user experience where it replaces the launcher found in UNR.

The idea with Jolicloud is to have a web-based, portable desktop containing all the webapps that you use all the time. Facebook, Twitter, Gmail, Flickr, YouTube, just to name a few of the ones that are added by default.

I’m not sure exactly what Joli OS adds to this (because the OS is all I’ve tried), but a feature I found especially nice is the ability to link my device with Dropbox and Google Docs. I use Dropbox for storing a lot of PDF documents that I want to have available anywhere I go. But I also use it for lots of other purposes, and actually I think this is one of the reasons my UNR installation finally gave up – because I was naive enough to have Dropbox installed and syncing everything.

With Joli, I can browse my Dropbox and open individual documents. Browsing is of course slightly slower than on my physical disk, but I tend to keep these documents open for weeks, so this is not an issue. And of course I could just download any files I want to access offline.

So far, I have done very little customization of the OS. The AA1 has a very noisy fan, but fixing this problem with the acerhdf module is a  simple matter of editing a few files. I also want Caps Lock to be an additional Ctrl, and I want both Danish and US keyboard layouts installed. Both wishes were granted after a few minutes of poking around in the settings (these are just the Gnome settings, of course).

I’m really surprised to discover that I seem to fit exactly into the target group for an OS like this. I have installed Emacs (mostly to edit org files), but apart from that, it looks like I’ll be fine with the built-in selection of programs. Which is just Chromium and the Gnome stuff.

Finally, it looks like there’s a whole social networking thing built into Joli which I haven’t explored at all yet. But I have already earned the badge “Recycler” just by installing Joli and recycling a computer that was destined to be replaced by a tablet before long :-)

Today, I found a bug in Rails.

On a project, I need to mail PDF files to users, and I would like to use PDFKit for that. The contents of the PDF are generated by rendering a view, but I don’t really want PDFKit to hit the server to get the content. Both because this breaks down in unit tests where the server may not be running, but also because there’s just no reason to go through the entire stack.

So I decided to generate the HTML with render_to_string, then convert it to PDF with PDFkit, and finally mail it to the user.

However, it turned out that this doesn’t work. Here is a minimal example:

class NoteMailer < ActionMailer::Base
  default :from => "from@example.com"

  def foo
    @notes = []
    render_to_string(:template => "notes/index.html.erb")
    # self.instance_variable_set(:@lookup_context, nil) # <-- Workaround
    mail(:to => "example@example.com")  
  end
end

It turns out that the call to render_to_string initializes the @lookup_context of the controller (== mailer in this case). The lookup context is the object that is responsible for locating templates to render, so it contains (among other things) a list of template extensions to look for.

The call to mail() renders the actual mail, but the mailer now reuses the lookup context that was created by render_to_string. This prevents it from finding the mail template, which is (in this example) named “foo.text.erb”, because the lookup context doesn’t recognize the “.text.erb” extension.

A workaround is to manually clear the lookup context on the mailer after the call to render_to_string, but obviously this is a really bad solution.

Git autopush

January 10, 2011

Some of my Git repos are very, very simple. The commits are all on master, no branches in play at all. And the workflow is just as simple as back in the bad old Subversion days: Pull from the repo, commit some stuff, push it.

A good example is the repo I use to store my Project Euler solutions. It is only used by myself and I always want to push after committing.

So I set out to create (and complete within 10 minutes) my least ambitious project so far: A bash script to add “git push” as a post-commit hook in the current Git repo. You can find it on Github, but the business part of it is right here:

#!/bin/sh

HOOKS_FOLDER=.git/hooks
POST_COMMIT=$HOOKS_FOLDER/post-commit

if [ -d $HOOKS_FOLDER ]; then
    if [ -f $POST_COMMIT ]; then
        echo "Post commit hook already exits, please add 'git push' manually in .git/hooks/post-commit"
        exit -1
    fi
    echo "git push" > $POST_COMMIT
    chmod 755 $POST_COMMIT
    exit 0
else
    echo "This command must be run in the root of a Git repository."
    exit -1
fi

Suggestions welcome for improving the script. I think this is the first multiline Bash script I ever wrote, so I expect it to be flawed.

Ah yes, and before you comment: I know that this is not the shiniest of Git workflows, I know that branches are cheap etc. I love all the cool features of Git but sometimes I just don’t need them.

However I do not know if Git already has this functionality built in via some configuration option?

JRuby/Clojure integration

December 21, 2010

I recently did a demo at an aarhus.rb meeting, showing how to call Clojure code from a Rails app. Specifically, I wanted to try using the very impressive Incanter library which is written in Clojure. This is not just a contrived example – Incanter is something you’d want to integrate with if you needed to do something statistics related.

This post is a walkthrough of how I got a very simple demo app running. It displays a histogram and allows the user to enter the size of the normal distribution sample.

NOTE: The finished Rails app can be found on Github.

1. Set up the basics

As with any other project, I am using RVM for managing multiple (J)Ruby implementations and keeping gemsets project-specific.

So first, create a directory for the Rails app. Inside that directory, create a .rvmrc file containing the following line:

rvm --create jruby-1.5.2@jruby-clojure-demo

Now do a “cd .” to have RVM pick up this file. This switches to jruby-1.5.2 (which must be installed, see RVM docs) and a newly created “jruby-clojure-demo” gemset.

We obviously need Rails for this demo:

gem install rails

This installs Rails 3.0.x as of this writing. Since this takes a little while to complete, fire up another terminal and let’s move on.

2. Fetch all the JARs we need with Leiningen

Leiningen is a build/dependency management tool for Clojure. I’m not using it to build anything for this demo, but it is useful for pulling down all the JARs we need.

Install Leningen, then create a new project with “lein new”, eg. in ~/tmp. Modify the project.clj file to look something like this:

(defproject incanter-test "1.0.0-SNAPSHOT"
  :description "FIXME: write"
  :repositories {"incanter" "http://repo.incanter.org"}
  :dependencies [[org.clojure/clojure "1.1.0"]
                 [org.clojure/clojure-contrib "1.1.0"]
                 [org.incanter/incanter-full "1.0.0"]]
  :dev-dependencies [[swank-clojure "1.2.1"]])

Finally run “lein deps” to have Leiningen download all required JAR files into lib/. This also takes a while, but now our Rails gem is installed and we can move on.

3. Create the Rails app

We use -O to skip activerecord:

rails new . -O

Next, add jrclj to the Gemfile. This is a JRuby/Clojure bridge created by Kyle Burton, and it’s the heart of this integration.

gem 'jrclj'

Do a “bundle install” to install all gems required by our new Rails project.

4. Enabling Clojure integration in the Rails project

To allow us to call Clojure code from the Rails project, we need to load all the JARs we downloaded in step 2.

First, create a “deps” folder inside the Rails app and copy all the JARs into it. Then, insert the following before the Bundler#require call in application.rb:

require 'java' 
Dir["#{File.dirname(__FILE__)}/../deps/*.jar"].each do |jar|
  require jar
end

This code needs to be executed before the Bundler call because the imported JARs are needed for the JRClj bridge to initialize properly.

5. Creating the histogram view

It’s time to call some Clojure code to render our view. I’m not showing how to create the actual Rails controller, the view or the simple form that collects the size attribute. See the Github repo for a working example.

To create the histogram, we use the method incanter.charts/histogram. The input to this method is the result from incanter.stats/sample-normal, and we save its output with incanter.core/save.

The output of incanter.core/save is a stream of PNG data. It should have been possible to simply save this data into a new ByteArrayOutputStream (yes, with JRuby you can just instantiate that class), but because of what looks like a glitch in the JRuby/Clojure bridge, this did not work. It looks like the return types may be wrong in some cases, but I haven’t found time to investigate yet.

So I resorted to a dirty hack: Dropping the PNG data into a file in public/images and simply inserting an image_tag into the view to render it. Thread-safe? I don’t think so :-)

    clj = JRClj.new
    %w{core stats charts}.each { |l| clj._import("incanter.#{l}") }
    clj.eval clj.read_string "(incanter.core/save \
                                (incanter.charts/histogram \
                                   (incanter.stats/sample-normal #{@size.to_i})) \
                                 \"#{::Rails.root.to_s}/public/images/out.png\")"

So what I’m doing here shows one of the two ways you can use JRClj: Evaluating a string containing Clojure code. The clj.eval call above evaluates to the value of the call to incanter.core/save, ie. the outermost sexp in the string.

The other, arguably cleaner and more Ruby-like mode to go, is to instantiate a JRClj object (like above), then call the Clojure functions as if they were Ruby function, with the difference that dashes in method names are replaced with underscores:

jruby-1.5.2 > clj = JRClj.new
=> (truncated...)
jruby-1.5.2 > clj._import("incanter.stats")
=> (truncated...)
jruby-1.5.2 > clj.sample_normal(1000).take(5)  
=> [-1.67885083887929, 0.332869879174951, -0.178234653916897, -0.382876843067349, 1.16325259920326]

Appendix: A Clojure-enabled IRB

I think most Ruby programmers agree that the IRB is a very useful thing, and it becomes even more useful when playing with stuff like JRClj.

Included in the JRClj documentation is a small script that loads the necessary JAR files and fires up an IRB session. I used this while working on the demo, and the script can be found as “jrepl” in the root of the Github repo.

MoinMoin encoding issue

September 4, 2010

As mentioned in my previous post, the encoding in my MoinMoin wiki was somehow broken. When entering Danish characters on a page, they were turned into garbage (which persisted when switching to edit view). Surprisingly, I fixed this by accident today when making a change to improve the performance of the wiki.

It turns out that the preferred way to run a MoinMoin instance on Apache2 these days is with mod_wsgi. I’m not even sure what that is, except it has to do with running Python code in an Apache server via some daemons.

To install mod_wsgi:

sudo aptitude install libapache2-mod-wsgi

The only change needed in my Apache2 configuration was replacing this line:

ScriptAlias /w "/usr/share/moin/server/moin.cgi"  

with these lines:

WSGIScriptAlias /w /var/www/chopwiki/moin.wsgi

WSGIDaemonProcess chopwiki-daemon user=www-data group=www-data home=/var/www processes=5 threads=10 maximum-requests=1000 umask=0007

WSGIProcessGroup chopwiki-daemon

 

The “moin.wsgi” referenced above was copied from /usr/share/moin/server into /var/www/chopwiki but not modified. 

After making this change, the character encoding issue was gone as well! I have no idea why, and I’m not going to spend time finding out.

Follow

Get every new post delivered to your Inbox.