Friday, March 27, 2009

Thoughts on GPS data analysis.

Notes to self:
  1. I'm collecting craploads of GPS data, delimited by 'event' (i.e. a run/ride/ski).
  2. I want to correlate events so I can compare the metadata attached to the GPS data (heart rate/pace).
  3. I want to do this in a non n^2 fashion.

Some thoughts:
  1. Similar 'events' usually center around the same GPS range. What is the margin of error for that range?
  2. If I can quickly get similar events grouped by center, then I can chunk them up by distance. Every event has a starting point, and even though GPS data will not be identical, distance would be a good way to divide events into similar subsections.
  3. Once I get similar event subsections, I can start comparing (i.e. graphing/analyzing) HR/pace data.
  4. Of course, all of this assumes that I've got a system that displays basic statistics for each event, and I'm not there yet.
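
A rough Ruby sketch of thoughts 1-3, just to make the idea concrete. Everything in it is an assumption: the event format (an array of [lat, lon] points), the names centroid, grid_key, group_by_center and chunk_by_distance, and the 0.01 degree cell size (roughly 1 km of latitude) standing in for the margin of error I don't know yet. Bucketing by a rounded centroid is a single pass over the events, so there is no n^2 comparison. Two similar events that straddle a cell boundary would land in different buckets, which this sketch ignores.

```ruby
EARTH_RADIUS_KM = 6371.0

# Great-circle distance in km between two [lat, lon] points (haversine).
def haversine_km(a, b)
  lat1, lon1 = a.map { |d| d * Math::PI / 180 }
  lat2, lon2 = b.map { |d| d * Math::PI / 180 }
  h = Math.sin((lat2 - lat1) / 2)**2 +
      Math.cos(lat1) * Math.cos(lat2) * Math.sin((lon2 - lon1) / 2)**2
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h))
end

# Mean lat/lon of an event's points; good enough for events a few km across.
def centroid(points)
  n = points.size.to_f
  [points.sum { |p| p[0] } / n, points.sum { |p| p[1] } / n]
end

# Snap a centroid to a grid cell; the cell size is the assumed margin of error.
def grid_key(center, cell_deg = 0.01)
  center.map { |c| (c / cell_deg).floor }
end

# One pass over all events: O(n), no pairwise comparison.
def group_by_center(events, cell_deg = 0.01)
  events.group_by { |points| grid_key(centroid(points), cell_deg) }
end

# Split one event into subsections of roughly chunk_km each, measured
# by cumulative distance from the starting point.
def chunk_by_distance(points, chunk_km = 1.0)
  total = 0.0
  chunks = [[points.first]]
  points.each_cons(2) do |a, b|
    total += haversine_km(a, b)
    index = (total / chunk_km).floor
    (chunks[index] ||= []) << b
  end
  chunks.compact # drop empty slots left by a jump longer than chunk_km
end
```

With events grouped this way, subsection N of one run can be lined up against subsection N of another run in the same bucket, and the HR/pace comparison only happens within a bucket.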

Monday, March 16, 2009

On the Couch(DB): Part 1: setup on Mac OS X Leopard

Why CouchDB? I'm running into situations where a traditional database is giving me headaches:

(1) At work, as the data being persisted starts to get into the TB range, I'm seeing people spend more and more time maintaining the database and tweaking the schema behind an ETL application that normalizes input data into a standard format.
(2) At home I want to write an app storing my GPS data from my runs and rides, and I have no idea what the metadata around that data is going to be.

The common denominator for both of these situations is that a traditional database schema, with very explicitly defined tables and relations, is a liability, either eventually, as in case 1, or initially, as in case 2. In either case I'm more likely than not to be wrong with any schema design choices that I make.

In both situations, a more flexible approach would be to treat each discrete set of data as a bag of attributes (key value pairs) with a couple of unique keys that I can search on. I could see creating a Lucene index or a BDB store with those keys and a pointer to a file containing the rest of the data. This would work pretty well, until it was time to go to production or scale, or both. When I go to production, I want the data to be highly available. When I scale, I want the data to be partitionable. To do either I would have to extend my naive hashing scheme.

CouchDB is a document based database -- each document is an attribute bag identified by 1..N keys -- that is fault tolerant and distributable, with incremental replication. In other words, it is the extension my naive hashing scheme needs to be a real boy instead of a puppet.
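
To make 'attribute bag' concrete, here is a minimal sketch of one GPS event as a schema-free JSON document, which is the form CouchDB stores. The id and the field names (type, started_at, points, avg_hr) are made up for illustration; only _id is a CouchDB convention. It uses nothing but Ruby's json standard library and does not talk to a server.

```ruby
require 'json'

# A hypothetical run event. No schema: any key can be added later.
event = {
  '_id'        => 'run-2009-03-15', # CouchDB's document key
  'type'       => 'run',
  'started_at' => '2009-03-15T06:00:00',
  'points'     => [[47.5741, -122.2251], [47.5751, -122.2251]]
}

# Metadata I didn't know I'd need can simply be bolted on.
event['avg_hr'] = 152

json = JSON.generate(event)  # the text that would be sent to the database
restored = JSON.parse(json)  # what would come back out
```

The point is that adding avg_hr required no schema change, which is exactly the case (2) problem above.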

Enough hype and bloviation...I need to install couchdb on a macbook pro running leopard. I followed the instructions found here, and quickly discovered I had neglected to install Xcode when I had upgraded to Leopard. Doh!

btw, if you ever do this, you'll find that macports has downloaded and staged components into /opt/local/var/macports/build, and may not be able to build them successfully after failure due to lack of compilers, etc. Removing the staged directories for the components that were downloaded is the best way to 'reset' macports. Trying to configure, build, and install macported components yourself is not :)

I did run into a minor bump: tcl was installed at 8.4.14_0 on my box, and I needed to upgrade to 8.5.6_0 because sudo crapped out trying to build tk. I did this by hand:

sudo port install tcl --version 8.5
sudo port deactivate tcl @8.4.14_0
sudo port activate tcl @8.5.6_0

(I used sudo port list | grep tcl to get the exact version numbers.)

OK, with Xcode and dependencies installed, I then followed the rest of the instructions, which, beyond building and installing couchdb, show you how to create the required couchdb system account, install it as a service, and launch it at startup. All Leopard specific, but all highly convenient.

I also found this one step install -- good for getting up and going, but I think in the long run I want to build and understand couchdb a little more than this approach lets me.

Next up: simple couchdb access.

Sunday, March 15, 2009

Intervals -- why am I doing this again?

Today I got bitch slapped by reality, or actually the weather, which is one variant of reality. I was planning to do a long run of 16-18 miles, mixed in with ten 1-mile intervals. It was raining slightly, or 'misting' as we like to say around here (in Seattle we have many different terms for rain), nothing too hard, but it was 35 degrees, so I put on a shell instead of a tech tee, strapped my headlamp on (thanks to Daylight Savings, it is now pitch black at 6:00 AM), and headed out, full of ambition.

I had downloaded the day's workout into my Garmin. 10x1 mile at 8:40. I felt a little slow going out, but chalked it up to the disoriented feeling I was getting running in the rain, in the dark. It is fairly surreal -- my headlamp lights up a small cone of streaking raindrops, and very little else, so I start to zone out (and slow down).

When I started the intervals, I knew something was wrong. My get up and go had got up and went. And it was raining harder. No more mist, more like pseudo hail. Then snow, big fat flakes that chilled my face to numb as they fell and stuck. Still, I soldiered on, trying to maintain a pace that seemed pretty achievable from the glow of my laptop that morning, but was proving to be a painful challenge on the hilly Mercer Island loop.

A younger and less intelligent me would have ground myself down into a little nub trying to do the intervals, but I'm older, and less inclined to hurt for no reason, so I quit. I turned around, changed my plans from 18 miles to 11, and packed it in. I ran a couple of intervals on the way back, but by then my legs were so frozen that it felt like I was running through (very cold) molasses.

Later, in the shower, I analyzed the workout. Here's the thing. I know I'm fairly motivated. I know that I'm working hard. My heart rate told me so. And that ended up being the key. Why am I trying to maintain some arbitrary pace up and down a hilly course? I should be running these intervals by heart rate -- by setting a target to stay above -- or below, depending on the goal of the interval -- I'll run at a specific effort, and the pace will be what it is. This feels a lot more real than trying to do intervals at X:30/mile 'because so and so said to', or because X:30 means I'll run a marathon in Y:45. Those are completely arbitrary, completely false goals, and I will crash and burn trying to make them work. On the other hand, I firmly believe that if I can run to a specific heart rate, I will see improvements in speed, because I will see improvements in efficiency.

Friday, March 13, 2009

Old Guy Training

This whole marathon training thing has brought me face to face with my advancing years.
The last time I seriously trained for running was when I was 30, training for my first 1/2 Marathon. I had never run 13 miles before and as a result I prepared as well as I could, which meant that I went out, ran as fast as I could for as long as I could, took the next day off, and did it again.

This time around that approach really doesn't work. I just can't recover between efforts, even with a day off. Maybe it's the kids, maybe it's the job, maybe it's the additional 10 years. But after a long run or a speed workout I'm cooked for at least a day if not two. However, I'm running much longer than I ever used to, and at least as fast.

Which reminds me of my favorite 'old guy' story. Actually it's an 'old bull' story. There are two bulls, one young, one old, sitting at the top of a hill, overlooking a pasture, staring down at a bunch of attractive (to bulls, anyway) cows. The young bull says to the older one: "Hey Pops, let's run down the hill as fast as we can and fuck a cow!". The old bull turns, looks at the younger one and says "Hey kid, let's walk down and fuck em all." I wish I had been this intelligent about training earlier, when I actually had the ability to recover quickly from a hard workout. But youth is definitely wasted on the young, and I can't turn back the clock.

These days I'm running one long day, of which 1/2 to 3/4 is at marathon pace, and one fast day, consisting of mile splits at 1-2 minutes faster than marathon pace. Both runs include stretches of 'Gallow-walking', as recommended by Jeff Galloway. I usually walk until my pulse drops to <> 15 miles, the other one will be around 8-9 miles, and the easy runs all under 5.

I'm hoping that the speed work as well as the marathon pace runs on the longer days will effectively get me faster. I definitely feel like speed is the last thing to come around. The other stuff, like stretching, climbing, pullups and squats, really helps balance out the pounding from the running. As the days get longer I'll probably skip one of the runs and get in a medium length bike ride.

One benefit of being 40 is simply 10 more years of experience. In the last 10 years I have grown a lot as a person. I've become a father, I've lost my father to cancer, my entire perspective has shifted from 'what can I do for fun next?' to 'what can I do for my family?'. The additional responsibility gives me the perspective to not take training so seriously. At the same time I am able to fully commit to training when I am doing it, because my time is so limited. Any pretensions of actually being 'elite' have disappeared completely. I'm doing this because I like it, not because I'm good at it. I can honestly say that I enjoy running more than I ever did before, because it gives me 2-3+ hours of silence in which I can work things out.

Last week my long run was a relatively flat 18 miles -- in freezing hail and rain. I froze parts of my body I never want to freeze again. However the run itself was a great confidence booster -- mostly due to the weather. Training has been a constant adventure, and pushing these distances that I've never done before has been a lot of fun.

The marathon course was published online yesterday, and it actually looks kind of hilly. So that, plus the fact that we're going to Disneyland next week, makes me want to 'burn one down' on Sunday, maybe a good 18 miles with hills. An easy run on Monday, climbing on Tuesday, and speedwork on Wednesday prior to leaving for CA will probably set me up just right to take it easy and recover while in the land of Mickey Mouse.

Thursday, March 12, 2009

Migrating an RoR app from Ubuntu Feisty to CentOS 5.2 Part 4: Trying not to get impaled by Vlad

This is a continuation from Part 3:
  1. I've installed mod_rails
  2. I've built and migrated the database (and enabled script/console)
  3. I've built and installed RRDTools.
In this part I get Vlad the Deployer working.

Local Machine Changes
(1) In my local /config/deploy.rb I changed the :domain value to point to my new box:

set :domain, "xen-5.evri.corp"

(2) I tried to run vlad:setup. My remote install commands, run via SSH, were not being found.

Server Machine Changes

The problem with ssh seemed to be that my user environment was not being established in a non interactive session, even after I modified /etc/profile, files in /etc/profile.d, and ~/.bashrc, which is how I understood non interactive terminals get their env variables. The real problem was that the default sshd_config does not allow user environment variables to be set via sshd. So in /etc/ssh/sshd_config, change

# PermitUserEnvironment no

to

PermitUserEnvironment yes

I reran vlad:setup

rake vlad:setup
rake full_vlad

(I've modified my rakefile as follows: task :full_vlad => ['vlad:update', 'vlad:migrate' ...all other tasks I need to do ])
As a final step, I asked OPS to map the old server name "dashboard" to the new server.

Are We There Yet?

I run a set of ruby cron jobs on the machine (TODO: migrate these to use DaemonSpawn??). I migrated the cron settings over by

crontab -l > cron            # on the old machine
scp cron xen-5.evri.corp:    # copy the file to the new machine
crontab cron                 # on the new machine

Now I run rake full_vlad, and the server deploys after a full source update and database migration.

Unfortunately, my graphs are not showing any data. wtf?

I call rrdtool via IO.popen to extract my data from rrd files. On the centos box, IO.popen is not returning a readable IO object. I cannot repro with script/console from the production box (i.e. that gives me valid IO).

Some sleep and investigation reveals that the environment variables for my user are not being set when Ruby spawns a process. This, again, is unique to centos. So I hardcode the path of rrdtool in my IO.popen call, and am now receiving data.
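
A minimal sketch of the workaround, assuming nothing about my app beyond IO.popen. The helper name run_tool is made up, /bin/echo stands in for the rrdtool binary so the snippet runs anywhere, and the arguments are only illustrative. Calling the tool by absolute path means the spawned process does not depend on PATH being set, and checking the path up front turns a silent empty result into an obvious error.

```ruby
# Run an external tool by absolute path and return its stdout.
# Raises early if the path is wrong instead of quietly returning nothing.
def run_tool(path, *args)
  raise ArgumentError, "not executable: #{path}" unless File.executable?(path)
  # Array form: no shell is involved, args are passed through as-is.
  IO.popen([path, *args]) { |io| io.read }
end

# /bin/echo is a stand-in; in the app this would be the full rrdtool path.
output = run_tool('/bin/echo', 'fetch', 'cpu.rrd', 'AVERAGE')
```

Hardcoding the path works but is brittle; reading it from a config setting would be the obvious next step.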

Migrating an RoR app from Ubuntu Feisty to CentOS 5.2 Part 3: Installing RRD

Continued from Part 2:
I've installed RRD before. I followed these instructions, and took the following additional steps to install development headers for pango and libxml2 (missing from default install of centos)

sudo yum install libxml2-devel

sudo yum install pango-devel

In the instructions above, I downloaded rrdtool source. Next I built and installed rrdtool:
(1) ran make in rrdtool dir
(2) sudo make install to install in /usr/local/rrdtool-1.3.6
(3) modify the path in /etc/profile to include /usr/local/rrdtool-1.3.6/bin

Next up, the home stretch, Impaling myself with Vlad.

Migrating an RoR app from Ubuntu Feisty to CentOS 5.2 Part 2: Database Migration

Part 2 in a series of 4

In Part 1, I got Phusion Passenger up and running. Now I needed to migrate my database schema across. I use Postgres 8.3, which was not installed on the centos box.

Installing Postgres
I need to make sure postgres is (a) installed and (b) configured.

(a) installing postgres: following steps outlined here.
(b) configuring my database and user: my database settings from database.yml are:

development:
  adapter: postgresql
  host: localhost
  database: deploy_monitor_development
  username: deploy_monitor
  password: deploy_monitor

test:
  adapter: postgresql
  host: localhost
  database: deploy_monitor_test
  username: deploy_monitor
  password: deploy_monitor

production:
  adapter: postgresql
  host: localhost
  database: deploy_monitor
  username: deploy_monitor
  password: deploy_monitor

Because this is a production box, I only need to configure the production database, meaning I only need to add the production user deploy_monitor and the database deploy_monitor.

I followed the instructions from the postgresql site to add a user and create a database.
I also needed to install the following gems:
  • postgres (
  • postgres-pr (0.5.1)
to enable Rails to connect to Postgres.

Console Access
A lot of times I need to log into the box and check something via script/console, i.e. to run an ActiveRecord query.
In order to run console, I need to build readline:

(1) Install readline and readline-devel

yum install readline
yum install readline-devel

(2) build/install ruby readline bindings

cd {ruby src}/ext/readline
ruby extconf.rb
sudo make install

Continued: Part 3, Installing RRD

Tuesday, March 3, 2009

Migrating an RoR app from Ubuntu Feisty to CentOS 5.2 Part 1: Setting up mod_rails

Notes from migrating an app from an overloaded Ubuntu server box to a new, virgin centOS box. The app is a Rails app, with a postgres db, served up via mod_rails (passenger). The destination box has apache2 and Ruby 1.8.7 installed.

My app uses phusion passenger, which is fairly easy to install.

(1) sudo gem install passenger
(2) I then needed to generate the apache mod for passenger.

sudo passenger-install-apache-2-module

This step told me the following:

* To install GNU C++ compiler: Please run yum install gcc-c++ as root.
* To install OpenSSL support for Ruby: Please (re)install Ruby with OpenSSL support by downloading it from
* To install Apache 2 development headers: Please run yum install httpd-devel as root.

Installing gcc-c++

Running sudo yum install gcc-c++ was painless. Yum is installed by default on the version of CentOS I was migrating to (5.2).

Enabling Ruby with Open SSL

Unfortunately the company approved version of Ruby 1.8.7 did not include ext/openssl, so I needed to download, build, and install on my own.

(1) download source


(2) untar, cd to extracted dir, and run ./configure
(3) make

At this point, you usually sudo make install, but I didn't want to re-install ruby, I just wanted to add openssl to an existing installation.
(4) cd extracted dir/ext/openssl
(5) ruby extconf.rb
(6) make
(7) sudo make install

I re-ran sudo passenger-install-apache-2-module, which completed successfully. As instructed, I pasted the following into /etc/httpd/conf.d/passenger.conf, because all files in conf.d are loaded by httpd.conf:

LoadModule passenger_module /evri/ruby/lib/ruby/gems/1.8/gems/passenger-2.0.6/ext/apache2/
PassengerRoot /evri/ruby/lib/ruby/gems/1.8/gems/passenger-2.0.6
PassengerRuby /evri/ruby/bin/ruby

I then put in the server directives to map my public directory as well as a specified VirtualHost to my app.

Options FollowSymLinks
AllowOverride none
Order allow,deny
Allow from all

ServerAdmin webmaster@localhost
ServerName xen-5
ServerAlias xen-5.local xen-5.evri.corp
DocumentRoot /var/www/rails/dashboard/current/public
ErrorLog /var/www/rails/dashboard/current/log/server.log

# Possible values include: debug, info, notice, warn, error, crit,
# alert, emerg.
LogLevel debug

Phusion Passenger aka mod_rails was configured.

Next: Database Migration