Tuesday, December 30, 2008

Bulk Resource Uploads via ActiveResource

Background:
I recently had to reduce the number of over-the-wire trips for the monitoring app I had hastily thrown together, because the time spent serializing and deserializing individual resources was beginning to affect monitoring performance. The Second Fallacy of Distributed Computing ("latency is zero") was beginning to rear its ugly Putinesque head.

I knew that this was coming, but premature optimization has never worked out for me, so I went with the default ActiveResource approach -- everything is a resource, and a CRUD operation on a resource maps to the corresponding http 'verb' -- until smoke started pouring out of my servers.

My basic requirements:
  1. Create a web service that can store data for hundreds of individual datapoints at 5 minute intervals.
  2. Those datapoints can come and go.
  3. The implementor of the statistics gathering code really doesn't need to know the on-the-wire details of how their data gets to my web service.
Implied in these requirements is the need for efficiency:
  • I shouldn't have to perform individual CRUD ops on each statistic every 5 minutes.
  • I shouldn't have to make an over the wire request for data every time I want to read that data.
From those implications I arrived at the following distilled technical requirements:
  1. I need to bulk upload statistics, and create/update them in one transaction in order to reduce the need for individual CRUD ops. At this point I'm going to choose to 'fail fast', aborting if a single create/update fails, so that I know if something is wrong.
  2. I need to keep a client side cache of those statistics around, only updating them when they've changed (important aside: because this is a monitoring application, it is assumed that each statistic belongs to a single client, so there is no need for out of band updates).
The Juicy Bits
I'd love to go into a long digression about how I explored every which way to do this, but I'll summarize by saying that my final solution had the following advantages:
  • Uses the existing ActiveResource custom method infrastructure
  • No custom routes need to be defined
  • Complexity is hidden from the user, restricted to the client-side upload_statistics call and the server-side POST handler method.
  • The priesthood of High REST will not need to crucify me at the side of the road.
ActiveResource extension:

I needed to extend the default ActiveResource behavior. By default, AR is not aware of data model relationships. For example, invoking the to_xml method on an AR class only shows its attributes, even if you specify other classes to include, like this:

ARDerivedClass.to_xml(:include=>[childClass])

This limitation makes being smart about bulk updates pretty hard. I needed to introduce the notion of a client side cache, initialized and synchronized as needed.

My data model looks roughly like this:

Monitor=>has many=>Statistics

The default AR implementation of this looks like:

class Statistic < ActiveResource::Base
end

I've extended as follows:
  • added an add_statistic method to Monitor that caches Statistic objects locally (sketched just below)
  • added an upload_statistics method to Monitor that serializes the client local statistics and then sends them to the server
  • modified the default POST handler for Statistic objects to handle bulk creates/updates
  • initially loaded the statistics cache on the client side
  • lazily synced the cache to the server side, updating on find and delete requests
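
add_statistic is the simplest of these; since it didn't make it into the post, here is a minimal sketch (it assumes, consistent with the initialize code below, that the cache is a hash keyed by statistic name):

def add_statistic(stat)
  # cache locally only -- nothing goes over the wire until upload_statistics
  @statistics[stat.name] = stat
end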


Client and Server code by Operation

I want to point out a couple of things in this code:

(1) Cache loading is done in Monitor.initialize(). That way it gets called whether the client is retrieving or creating a Monitor.

def initialize(attributes = {}, logger = Logger.new(STDOUT))
  # share a single logger across Monitor instances
  if @@logger == nil
    @@logger = logger
  end

  @statistics = {}

  # prime the client side cache with any statistics the server sent back
  if attributes["statistics"] != nil
    attributes["statistics"].each do |single_stat_attributes|
      @@logger.debug("loading #{single_stat_attributes["name"]}")
      @statistics[single_stat_attributes["name"]] = Statistic.new(single_stat_attributes)
    end
  end

  super(attributes)
end

This required the following modification on the Monitor controller (server) side:


def index
  if params[:name] == nil
    @monitor_instances = Monitor.find(:all)
  else
    @monitor_instances = Monitor.find_all_by_name(params[:name])
  end

  respond_to do |format|
    format.html # index.html.erb
    format.xml { render :xml => @monitor_instances.to_xml(:include => [:statistics]) }
    format.json { render :json => @monitor_instances.to_json(:include => [:statistics]) }
  end
end


I needed to make sure that returned monitor instances included child statistics in order to load the client side cache.
(2) get_statistic and delete_statistic synchronize with the server side.
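
Neither of those two is shown here, so here's a minimal sketch, assuming statistics are addressed by name and using the same ActiveResource custom method calls as upload_statistics below:

def get_statistic(name)
  # refresh the cached copy from the server on every read
  attrs = self.get(:statistics, :name => name)
  @statistics[name] = Statistic.new(attrs.first) unless attrs.empty?
  @statistics[name]
end

def delete_statistic(name)
  # delete on the server first, then evict from the local cache
  self.delete(:statistics, :name => name)
  @statistics.delete(name)
end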
(3) I've added a new upload_statistics method. I wanted to override save, but what I found at runtime is that the ActiveResource save method calls update, which loads statistics as attributes. This won't work for us because some of those attributes may not exist on the server side, so an 'update' operation is invalid. In upload_statistics, a custom AR method posts the client side cache of statistics to the StatisticsController on the server side:

def upload_statistics
  if @statistics.length > 0
    data = @statistics.to_xml
    self.post(:statistics, {:bulk => "true"}, data)
  end
end


Note that the first parameter is the method name, the second is the param options, and the third is the actual post data (the serialized client side map of statistics). The actual path this POST gets sent to is /monitor_instances/:id/statistics.xml.

On the server, I don't have to create any new routes, but I do need to make sure that the default POST handler checks for the bulk parameter and handles it accordingly.



# POST /statistics
# POST /statistics.xml
def create
  if params[:bulk] == nil
    # handle a single update
  else
    # handle a bulk update
  end
end

Unmarshalling and Saving stats on the Server side.

In the StatisticsController create handler, I need to unmarshal the xml into statistics. There are instructions out there for extending ActiveRecord via the standard lib/extensions.rb mechanism, but they won't work for me because I'm serializing a hash, not an array of Statistic objects. So I need to deserialize and create/update objects by 'hand', which actually isn't that hard:



cmd = request.raw_post
monitor_instance = MonitorInstance.find(params[:monitor_instance_id])
logger.debug(cmd)
hash = Hash.from_xml(cmd)

hash["hash"].values.each do |options|
  stat = Statistic.find(:first,
    :conditions => ["monitor_instance_id = ? and name = ?",
                    params[:monitor_instance_id], options["name"]])

  if stat == nil
    # create a new Statistic object
    stat = Statistic.new(options)
    stat.monitor_instance_id = monitor_instance.id
    stat.save!
  else
    # update existing statistic object
    stat.update_attributes!(options)
  end
end

respond_to do |format|
  statistics = Statistic.find(:all,
    :conditions => ["monitor_instance_id = ?", params[:monitor_instance_id]])
  format.xml { render :xml => statistics.to_xml, :status => :created, :location => monitor_instance_path(monitor_instance) }
end

In the code above, I deserialize the xml payload using Hash.from_xml, which wraps the hash encoded in the xml data in an outer hash keyed by the root element, "hash".
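
Concretely, the deserialized payload comes out shaped roughly like this (the statistic names and attributes here are made up):

{
  "hash" => {
    "cpu_load"  => { "name" => "cpu_load",  "value" => "0.75" },
    "disk_free" => { "name" => "disk_free", "value" => "42" }
  }
}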

To get to the original hash of statistics options, I had to extract them from the encoded hash:

hash = Hash.from_xml(cmd)
hash["hash"].values.each do |options|
  # create / update the stat that corresponds to options["name"] under the monitor
end


Summary
This took a lot longer than expected, because I ran into issues trying to use standard methods (i.e. save) that I still don't understand. However, I now know a lot more about AR and how to extend it to do more intelligent sub-resource handling.


Sunday, December 21, 2008

Best advertainment webisode ever.

This one made me laugh so hard I pulled something.

http://bewareofthedoghouse.com/VideoPage.aspx

If this is the future of ads, I'm hooked!

Thursday, December 18, 2008

Sinatra, my new favorite prototype playground

About a week ago I was trying to get something ready for the first annual Evri Hack-a-thon, a concentrated 2 day affair where we focused on putting together cool apps with the new Evri API. The event was a blast; I, for one, rediscovered how fun writing code for code's sake really is.

I was implementing a 'music browser' mostly in javascript, and needed a proxy server to make calls out to those services that didn't have JSONP support.

A slight digression here: JSONP is the coolest thing since sliced bread. I say that as someone who loves bread, even more so when it is sliced. The ability to retrieve data w/o a backend is so powerful I _almost_ understand why it's been seen as a Terrible, Horrible, No Good Hack. But not really, because it makes life as a developer so much easier.

I wanted to spend most of my time in the JavaScript, not futzing with the backend server. Because I've been mostly coding in Ruby for the last year, that ruled out rolling up a quick Java Servlet -- I didn't want to spend any time installing Tomcat/Jetty and associated jars, and having to remember how that world worked. I also didn't want to write a Rails app -- seemed ridiculous when I didn't have a data model.

I looked around at a couple of lightweight Ruby frameworks, like Camping and Merb. Camping would have required me to downgrade Ruby to 1.8.5, and Merb overwhelmed me with the volume of configuration choices. In other words, my ideal proxy server had to be stone cold simple because I simply didn't have the time for anything else.

Enter Sinatra. Elegant, concise, and witty, just like its namesake. Here is how you configure a path to /json/getjswidgets in Sinatra:

get "/json/getjswidgets" do
  cb = params[:callback]
  href = params[:href]
  ...
end

A couple of things to note in the example above:
(1) params are retrieved with the params hash, just like in Rails. So this method was actually called as:

/json/getjswidgets?callback={temp callback name}&href={some value}

(2) all paths are handled with the same 'get...do...end' syntax. It's that simple.

Another example:

get "/json/artists/:name/album" do
  cb = params[:callback]
  name = params[:name]
  ....
end


Note that the name parameter is embedded in the path, just like in the standard routes.rb file in Rails.

Once you get past the path routing (which takes about as long as it does to read this sentence), Sinatra continues to be blissfully easy by allowing you to render the view via erb, builder, haml, and sass. You can render the view inline, or modularize it by putting the files in a view directory.
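
For example, rendering an erb view (the route and view name here are made up) is a one-liner:

get "/artists/:name" do
  @name = params[:name]
  erb :artist # renders views/artist.erb; @name is available in the template
end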

Helper methods are defined in a helpers block:

helpers do
  def helper_method
    ...
  end

  ...
end

Static assets are kept in a public directory -- again, Sinatra takes an "if it ain't broke..." approach that really minimizes the learning curve. Normally, I loathe the whole "But Ours Go To Eleven!" mindset that I see in frameworks, because it means that I have to once again learn another unique set of concepts to get anything done. Sinatra does the exact opposite: it leverages a well known, well used, well understood set of conventions/concepts from Rails, while stripping the concept of a framework down to that which is as simple as possible, but not simpler. Sinatra, you're my new BFF!





Sunday, December 7, 2008

Converting to a Single Speed


Why Single Speed? A combination of 'luck' and timing has led me to re-rig my commuter bike as a single speed. The 'luck' part was a pulley on my circa 1995 rear derailleur exploding. The timing part is the rise of single speeds in general; as a bike commuter, I've been noticing more and more of them over the last couple of years. They look so....simple and maintenance free!

I'm riding a 14 year old Kona Kula, once my singletrack steed, now my urban commuting stalwart. The key thing about converting a bike with vertical dropouts into a single speed is that you can't slide the rear wheel back and forth to get the perfect chain tension. You need a chain tensioner. A chain tensioner is like a derailleur-lite that pulls the chain taut. There are several brands out there, all of which make a simple, bullet proof device.

The other key thing about converting a standard bike into a singlespeed is what to do with your rear cluster. There are a number of freehub to singlespeed conversion kits out there that provide spacers and cogs to replace your cassette.

I ended up choosing the Forte Singlespeed conversion kit, made by the Performance Bicycle house brand. It was the only kit I found that offered the freehub spacers and cogs as well as the chain tensioner, and at by far the cheapest price -- for $25 I got everything, including 3 cogs to experiment with. Compare that to the Surly solution: $50 for the chain tensioner, $30 for the spacers, and $10+ for the cog.

It also had what I considered to be a key feature: it allowed me to adjust the horizontal placement of the tensioner. This was important because I had no idea where I would be placing the cog to line up with the chainring.

I also wanted to try using an original cog and chainring, since I had replaced them a year ago and they weren't completely beat down yet. The preferred way to go is to do a clean replacement, but that would require a new chain and front chainring, and I wasn't sure that I could find a replacement front chainring without a special order.

Installation was easy, and gave me a chance to clean my bike for the first time in 6 years!

Tools required:
  1. a cassette lockring tool and chain whip for cassette removal.
  2. an allen wrench for the usual.
  3. a crank puller to remove the inner chainring on the triple.
  4. a chain tool to break and resize the chain.
Step 1: remove the chainrings. The only way to get the inner chainring out is to remove the crank from the bike. The optimal position for the new chainring is the middle position of the triple crank. But the chainring I wanted to use was a 44 tooth, too big for the middle position -- it rubbed the chainstay -- so I had to keep it in the outer position:


This is where the horizontal adjustability of the Forte chain tensioner became really useful. It let me slide the tensioner cog over to the outside with an allen wrench.



Step 2: remove the cassette using the chain whip and the lockring tool.
Step 3: install the chain tensioner where the rear derailleur used to be.
Step 4: position the singlespeed cog -- using spacers to fill up the freehub around that cog -- and the chain tensioner cog so that they are inline with the chainring. This is important. If you don't line things up, the chain will derail. In the picture below, note the spacers around the cog. Because I installed my chainring on the outermost position, I've had to position the cog at the outer end of the freehub (with only one spacer between it and the cassette lockring).


Step 5: whip out the chain tool and resize the chain so that the chain tensioner is engaged (i.e. it has tension).



I used a 16 tooth cog from my old freewheel, and my existing chain. This may not work because that cog was designed to be 'shiftable', and the ramps on the cog body may derail the chain. However, I wanted to give this a try before buying a new chain and front chainring.


Wednesday, November 26, 2008

Basic Auth over HTTP using Ruby, Net::HTTP

I'm writing this one down because it took way too long for me to stumble around it. The Net::HTTP class provides http transport level page access. Most of the time I use open-uri, which treats web pages like files, because that is, as the kids say, one hella fine way to roll.

Too bad it doesn't work with Basic Auth.

I've got a service at http://db-import that listens on 8080. It requires valid credentials. I want to get some entity type data from it and parse it with Hpricot. Normally I would do this like so:

doc = open("http://db-import:8080/rest/entityTypes.xml") { |f| Hpricot(f) }

However, the requirement of basic auth makes me do this:

ENTITY_TYPES_REQUEST = "http://db-import:8080/importapi/rest/entityTypes.xml"
.....
uri = URI.parse(ENTITY_TYPES_REQUEST)
# Net::HTTP.start returns the block's value, so capture the response here
response = Net::HTTP.start(uri.host, uri.port) do |http|
  req = Net::HTTP::Get.new(uri.path)
  req.basic_auth user, pass
  http.request(req)
end

doc = Hpricot(response.body)


A couple of things to note (that got me):
  1. I needed to specify the hostname w/o the transport. Instead of "http://db-import", specify "db-import". Yeah, that's kind of obvious after the fact :). I let URI.parse split the URL into host, port, and path rather than doing it by hand.
  2. HTTP.start only opens the connection; the user then makes all requests and processes all responses within the connection block. So in the code above I first configure the request object with basic auth and then use it to make the request.

Not terribly hard, but I do tend to trip up on details and wanted to spare myself some pain the next time around.

Monday, November 24, 2008

I don't like running (data) naked

Yesterday morning, at 6:45, I was on semi autopilot, stepping out the door for my morning run. I grabbed my trusty Garmin 305, walked out the door, and hit the on button. And waited. And tried again. I figured my gloves were a little too thick, so I took one off and then pressed again. And pressed harder. Nothing.
After almost 2 years of pretty much day in day out use, in rain, wind, and snow, through heat and cold, thick and thin, my little friend had left the building. Operating on pure reflex, I plugged it back into the recharging cradle and went back outside, bereft.

This was a truly sad moment for me. There I was, in the dark and cold, trying to get excited about going running without second by second updates on heart rate, pace, altitude, and distance covered. At that moment I realized that I was being ridiculous, even diva-like. I mean, wasn't running for running's sake enough? Would I have even had this mental conversation 2 years ago?

Well....no. Not at 0-dark-45 in the morning. When I'd rather be in bed, warm and comfortable, dozing in and out of consciousness. Instead, I'm standing in a slight drizzle with my headlight strapped on, bundled up from head to toe in waterproof yet breathable and oh-so-reflective winter running gear. It would be different if, say, I was running on the beach at Kauai, wearing shorts and a t-shirt. I don't think I would need motivation coming from my wrist-top computer.

Then again, maybe I would. I mean the coolest thing about the GPS/HRM is that it tells a story, of where I've been and - literally - what I've done, and then persists it for later recall. When I upload my run to the computer, I get to see how slow or fast I went, the hills on the route, the overall distance, and I get to remember how I felt at specific points in the run. And if I don't remember, my heart rate tells me. It's sort of like a data photo album, where the mix of lat/long, altitude, and heart rate combine to give me a snapshot of how I felt at every point in the run.

I took off on the run anyway, shamed by my dependence on data, determined to experience 'pure' running without instrumentation. And I actually did. I couldn't refer to my data feed, so I started to pay attention to my form, my breathing, my stride, my forward lean. I knew the mileage of the route I was running (6.23 to be exact), but didn't know exactly how far I had gone, or how far I had left. And although I knew that I was somewhere between 125 and 145 bpm, I had to pace myself by how I felt at that moment, not how my watch was telling me how I felt.

So, yeah, I enjoyed it, a little. And I was actually resigned to a month of 'naked' running while I sent my little buddy back to Garmin to be refurbed. It is, after all, the middle of winter, and I'm not training for anything in particular, more doing long runs to justify eating all those XMas sugar cookies.

I had just convinced myself that this whole zen running thing was good, really good. But when I went down to the garage to pack the HRM up so I could ship it back to Garmin for refurb, I noticed that it was on, telling me that it was fully charged. Slowly, disbelieving, I turned it back on, and watched it search fruitlessly for a satellite connection. "Are you indoors?" it asked me. It seemed a little irritated. I turned it off, put it back in the cradle, and went back upstairs -- all of a sudden tomorrow's early morning run is looking a lot more fun.


Tuesday, November 18, 2008

Notes from the (Javascript) Noob: conditionally enable console debugging

Today I ran into a problem when the primary user of my monitoring app wanted to know why graphs weren't rendering for him. I checked the site from my machine and all looked good. I checked the site from another dev's machine, and again, everything was rendering. At this point I was confused.

I knew it had to be something in the javascript rendering, so I had the user install firebug. Instead of a JS error (or 10), the page loaded fine. Hmmm. I then wanted to see if the requests I was firing from the page to the Google Charts API were actually going through. We tabbed to the FB net tab, which was disabled. When I had him enable that, plus the console, the graphs rendered.

Doh! I was using console.log to check a value, and forgot that not everyone in the known universe runs with FB enabled. In order to continue to log, I've done this:

function log(str) {
  var c = window.console;
  if (c) {
    c.log(str);
  }
}

I'm kind of surprised that (a) I wasn't getting a 'console object not defined' error in the naive install of FB (which evaluated the JS, but had console/net logging turned off), and (b) that, given console was present as (a) implies, logging didn't degrade gracefully. But the code above works, and I'll take that over sheer speculation.



Sunday, November 9, 2008

Kiran and Leela and Pork n Beans

This morning, hopped up on (whole wheat) pancakes and (lite) syrup, the kids and I rocked out to Weezer. In this age of Rock Band and Guitar Hero, it might seem lame to jam with tennis rackets, but we're old school. Kiran, Leela, consider yourselves blackmailed :)

Friday, November 7, 2008

I think I've got a Soccer Problem

When I was seven, all I liked to do was read. Read read read. My mom was and is a very wise woman and decided that being a wimpy, nerdy bookworm was the fast track to many beatdowns, and signed me up for AYSO soccer.

I hated the first season, didn't really understand what the hell was going on, and wanted to quit. I'm not sure why I didn't, but by the end of the second season I really loved the game. I loved the smell of the field, the oranges at halftime, and the feeling of being part of something bigger than just me. I loved playing, touching the ball, and would dribble and shoot on an imaginary goal framed by trees for hours and hours after school.

Note that love doesn't imply ability. I'm not overly coordinated, and that, coupled with a serious vision problem (brought on by all that reading), and my reluctance to wear glasses on the field, washed me out of soccer by high school. I really missed playing and got back into it when I turned 30.

People that play soccer when they're older tend to fall into two camps. There are the ex college/high school studs/studettes, who have amazing touch and vision and ability. They know exactly where they are, where everyone else is, and what is going to happen next. Then there are the rest of us, hacks who occasionally get a good touch or light up a good run and feel that all too brief moment of being connected to the worlds most amazing game.

I'm a spaz, occasionally doing something nice, sometimes having great games, sometimes having terrible games, most of the time having randomly great and terrible moments in the same game. My only real gifts are speed and endurance, both of which are slowly disappearing as I get older. I can pass OK, and have decent field vision at times, but my first touch is more accidental than deliberate, I have no air game, and I have a pathetically wimpy shot.

I've been on the same team for about six years. It's a great group of men and women, most of whom are much better than I am, and very patient. One thing I've noticed over the years is that we've started to focus less on the actual games and more on the beers after the game. It's just as fun to give each other crap after the game as it is to play. Sometimes more fun.

Every season I swear it will be my last. In tonight's game I was trying to move the ball across the field with a defender at my hip. I tried to reverse on him when all of a sudden I found myself flat on the ground with a really bad calf cramp. I made it clear to the ref that the defender had nothing to do with me ending up on the ground, and limped off the field to enjoy the rest of the game as a spectator. I don't know why my body chose that moment to betray me, but it was enough to end my night.

I'm not sure why I keep coming back. As mentioned above, my speed is no longer keeping me in the game. Seattle's dirt fields play havoc on my knees and ankles. Pacific Northwest weather in the late fall/early spring is the opposite of warm and dry. Guys in their 20s are starting to burn by me, making me feel like a slow old man. And lately, more often than not, I find myself in the middle of the game with no anticipation, consistently a 1/2 second too late to the ball, and (even though I now see 20/15 thanks to lasik), completely tunnel visioned.

But there are those moments, really brief ones, where occasionally I get a glimpse of what it is like to really play the beautiful game. Tonight I got a pass, touched it to my inside, and moved the ball up the field. I could see everything, and it felt like I had all the time in the world. I drew a defender to me and flicked the ball to an open space right in front of my wing, who touched it once and lofted a beautiful high shot over the goalie's outstretched arms. It was textbook, it was beautiful, and for that brief moment I was not a spaz, I was a player. It's an elusive high that keeps me coming back looking for more.

I'll take 400mg of ibu and walk off that leg cramp now. It hurts, but I think not playing would hurt worse. Maybe I'll quit next season.

Monday, November 3, 2008

Crontarded (sigh)

Note to self: sometimes mistakes are painful. Sometimes they are funny. Sometimes they are both, and sometimes they are painful, but funny in retrospect. In any case, the best approach is to document it, so that it _never_happens_again. Here is an email I sent earlier today:

Subject: HI! I'm an idiot!

Hey Adam, you know when you came up to me and told me db-import was getting pegged every 6 hours? And Gil, you know when you were asking me one day around noon why you were handling requests every 1 minute?

well, in the crontab I was running a job like this:

* */6 * * *
which is really a great way of saying: run this task every minute, during every 6th hour. You see, I knew that, I just didn't _know_ that.

I've amended the offending crontab entry to :

00 */6 * * *

so that the job can run once and only once, every 6 hours, like God intended.

So, I'm sorry. You both can join the long list of people that (a) should punch me or (b) should get a free beer from me. The way I'm f*cking up today, that line will shortly be stretching around the block.

-- Arun

Those of you keeping score at home will know that the score now reads:
Compilers and OS's (not including windoze): 12,000
Arun: 0

Thursday, October 30, 2008

Logrotate: a tale of two config locations

I was trying to make sure my logfiles didn't grow disproportionately large by rotating them via logrotate.d.

Logrotate has two entry points:

/etc/logrotate.d -- this directory contains per-application config files for the logfiles you want to rotate.

/etc/logrotate.conf -- the general config file; you can specify application specific log rotate settings here as well.

I'm writing this down now because I had already configured logrotate for one of my applications in /etc/logrotate.d, forgotten about it, and then modified the general config file logrotate.conf as well. When I tried to simulate log rotation by running with the -d parameter:

logrotate -d /etc/logrotate.conf

I received a 'duplicate entry' error, which led me to (re)discover the application config files in /etc/logrotate.d.

In general, I think it's a much better idea to do application level logrotate configuration in /etc/logrotate.d. It keeps files manageable and readable.

Here is a sample logrotate config file:

/var/www/rails/dashboard/current/log/*.log {
  # rotate once a week, keeping 10 copies
  weekly
  rotate 10
  # keep the original file handle open (but truncate the file)
  copytruncate
  # compress, but delay compression until the next rotation
  delaycompress
  compress
  # do nothing if the log is empty, and don't complain if it's missing
  notifempty
  missingok
}

Generating a public key for those automated remote scripts

I've been using vlad to deploy apps lately, which makes deployment a breeze. However, for a complicated deploy, I'm usually asked at least 10 times to re-enter my password.

I was originally too lazy to set up a public key, but after doing 10 deploys in one day (another story), I reconsidered. After all, typing the same thing over and over again violates all kinds of fairly basic principles -- DRY, for one.

Here is what I did to set up a public key on my deployment server.

(1) on my client box:
ssh-keygen -t dsa

ssh-keygen will ask for a passphrase. Don't enter one -- that defeats the whole point of using a public key to automate ssh/scp actions!

ssh-keygen generates two files in ~/.ssh:
id_dsa -- your private key, used in ssh authentication.
id_dsa.pub -- the public key you can spray out to machines you want to copy things to.

Then do the following:

scp ~/.ssh/id_dsa.pub machine:~/.ssh/authorized_keys2

now your logins should be password (and pain!) free.

Tuesday, October 21, 2008

Creating Objects on the fly in Ruby

I'm writing a file parser where I want to plug in different parsing modules depending on the kind of file I need to parse.

In order to do this without having to change code, I'm storing the configuration in a YAML file, like this:

home_page_total_clicks:
  id: 2
  match_string: "home_page"
  measurement: "total_clicks"
  clazz_name: "DummyParser"
  date_comparision: 1

I'm specifying that for pages categorized as 'home_page_total_clicks', I want to instantiate a class named "DummyParser". I'm thinking that in the future I could allow someone to attach an arbitrary parser to an arbitrary file type.

The way Ruby allows you to instantiate classes from strings relies on the fact that class names are constants, which you can retrieve:

clazz = Kernel.const_get(file_process_data.clazz_name)
processor = clazz.new(file_process_data)

and tada! a new processor -- note how I've assumed that a processor takes a file_process_data as an input. This will fail for processors whose initialize methods don't take a file_process_data argument.
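
It's also worth guarding against a clazz_name that doesn't resolve to a class; a minimal sketch (the rescue handling here is assumed, not from my actual parser):

begin
  clazz = Kernel.const_get(file_process_data.clazz_name)
  processor = clazz.new(file_process_data)
rescue NameError
  # clazz_name doesn't name a class Ruby knows about -- skip this file type
  processor = nil
end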


Tuesday, October 14, 2008

Swimming Breakthrough this AM

Some history: when I was twelve, my mom decided that the one way to 'drown proof' me and my sister was to get us on a swim team. My summers, and then my winters, became filled with laps and yards and intervals. Only one problem. I was a terrible swimmer.

I had the technique of a drowning man (hence the title of my blog), coupled with low body fat -- in the 6% range -- and the upper body of a Kenyan marathoner. Eventually, in high school, I started to dread swimming so much I would actually have nightmares about going to practice. I dropped out to spare myself any more sleepless nights, and proceeded to apply any lung capacity I had built up swimming to inhaling bong hits, which eventually led to more sleepless nights, but that's another story.

Fast forward to now, and I'm quickly closing in on 40. I've been an incredibly average bike racer, run some 1/2 marathons, and while my genetics don't point to world class anything, I do actually enjoy running and biking. So triathlons would seem like a natural next step, especially since soccer is becoming a beer and ibuprofen aided affair, and climbing takes too much time away from the kids right now.

But the thought of swimming, and the indelible imprints of suffering through thousands of yards very slowly and painfully, kept me focused on other things. Until now. I decided to actually take the time to learn how to swim, via Total Immersion. Another positive factor: my body fat has doubled, so I'm not quite the sinker I used to be.

Total Immersion teaches you how to swim better via a series of progressive drills. These drills start out very basic, e.g. floating on your back and kicking. They build up from there, but the keys are:
  • swimming "downhill" by keeping your head down instead of looking forward.
  • swimming on your side, and pivoting from side to side.
  • driving that pivot from your core
  • pushing your chest down into the water because the air in your lungs will help you float.
  • barely kicking
This was completely counter to the way I swam, which was with my head looking forward, using my arms to drag myself through the water until my shoulders hurt, kicking spasmodically to try and float as effortlessly as the much better swimmers around me.

After a month of working on these drills, I was swimming with much less effort than I ever had before, but I still felt that something was missing. I still felt that I was expending a lot of energy, that it was hard to breathe, and that I was struggling to swim downhill.

After reading and re-reading the drills section of the Total Immersion book, I decided to try using FistGloves. They are what they sound like: rubber gloves that force your hands closed. What that does is dramatically reduce the amount of surface area that you have to work with. The idea is that by reducing your hand area, you will be forced to concentrate on balance as well as stroke.

Again, this is counter to what I had been taught. To work on stroke, our coaches used to give us paddles and pull buoys. The paddles increased the surface area during the pull, increasing the workload on the shoulders. The buoys were used to let us concentrate on pulling. The result was supposed to be increased strength that resulted in increased speed, but I always felt fast until I took the paddles off, and then I felt slow. And heavy, especially since I had to put the buoy away.

This morning was my first go with the FistGloves. I put them on, then pushed off of the wall. I immediately slowed to a snail's pace. The bottom of the pool didn't glide by at all, and I felt completely out of balance. I finished the first 25 yards and had to rest.

I pushed off again and adjusted my balance, keeping my head down, focusing on driving from my hips (there was no other way to generate force). To breathe, I couldn't lift my head because I couldn't generate enough force with my arms. I had to roll to the side -- the way that I was supposed to. I had to pick my elbows up high, and drive my fist into the water by my goggles, and continue driving forward with that fist as I pivoted through to float on my side -- again, just like the Total Immersion drills.

Most importantly, I could feel massive changes in the amount of water I was pushing back with my forearms, just by keeping my elbows raised. I noticed -- for the first time ever -- that I dropped my leading elbow when breathing, and that my corresponding 'push' on the water completely disappeared. Removing my hands from the equation made any and all deficiencies completely stand out.

Maybe 300 yards later, I almost felt like I was moving normally. So I took the gloves off and pushed off. WOW. It was like someone had strapped dinner plates on my hands. I was gliding. Effortlessly. Flip turns, which I normally loathe because they require me to hold my breath, were fun because I was flying into the wall. Bilateral breathing, which used to destroy my form, became easy once I rolled to my side instead of lifting my head. I concentrated on keeping my elbows high both in and out of the water, and extending my leading hand out for as long as I could.

This is the biggest breakthrough I've had swimming, ever. It was definitely one of those flow experiences. I'm having fun, in the pool, which 25 years ago was more of a source of torture than anything else. Now I'm actually excited to do a triathlon because I'll enjoy all 3 legs. I can't wait to get back in the pool!

Monday, October 13, 2008

rake and vlad and mod_rails

I'm using Vlad to deploy my application, because it's so damn simple. Actually, I tried Capistrano, which also seemed pretty simple, until it just didn't work.

I'm not entirely sure why, and I didn't delve into the details, because a friend walked by and said "Capistrano? Really? All the Kool Kidz are using Vlad. Don't be a dork!". Actually, he said "life is too short to read documentation. Vlad only requires a couple of variables in your deploy.rb, and no capify. And it works."

The one thing I _do_ like about vlad is that it builds on top of rake, which means there is one less thing for me to keep track of.

Here are my notes about using Vlad with mod_rails:

(0) my deploy.rb contains the essentials:

set :application, "myapp"
set :domain, "machinename.foo.corp"
set :deploy_to, "/var/www/deploylocation"
set :repository, "http://svn/svn/proj_name/trunk/"
set :user, "arun"


(1) mod_rails detects restarts by scanning RAILS_ROOT/tmp/restart.txt. So a restart is super simple:
namespace :vlad do
  remote_task :restart do
    run "touch #{deploy_to}/current/tmp/restart.txt"
  end
end


Note that vlad doesn't ship a vlad:restart task; when I added this, mod_rails support wasn't yet in mainline vlad.
(2) running remote tasks is insanely trivial:

remote_task :link_dirs do
  run("ln -sf #{deploy_to}/data #{deploy_to}/current/public/data")
  run("ln -sf #{deploy_to}/graphs #{deploy_to}/current/public/graphs")
  run("ln -sf #{deploy_to}/current/doc #{deploy_to}/current/public/doc")
end

remote_task :chmod_cron do
  run "chmod 4755 #{deploy_to}/current/sh/deploy_monitor.sh"
end


(3) Sometimes I've got to run just one of these. Usually, I've got to run all of them, in addition to the standard vlad update and migrate tasks. Rake specifies ordered dependencies like this:

task :full_vlad => ['vlad:update','vlad:migrate','vlad:link_dirs','vlad:chmod_cron','vlad:restart']

Note the way that the vlad namespace is specified in the dependency array; running rake full_vlad then executes all five tasks in order.

Sunday, October 12, 2008

Google Charts API

I've been putting together a monitoring app using Rails as the app framework and RRD as the data storage / graphing engine. Early on, I was the one driving a lot of the requirements. Now that the app is in production, I'm getting a lot of reasonable requests that I didn't think of.

One of these is to display some specific data in a pie chart. For some of the statistics I was collecting, a pie chart made much more sense than the standard 1..n dataset in a time series RRD graph. When I first heard the requirement, my instinct was to use one of the well known ruby graphing engines, like Gruff or Scruffy. Scruffy, in particular, looks really useful. However I wanted to get a quick prototype out in front of my users to make sure they really wanted a pie graph, and for that the Google Chart API was the tool for the job. By reducing graph construction to an API call, Google makes it this easy:



Basically, I've embedded the following call in an image tag:

http://chart.apis.google.com/chart?cht=chart_type&chd=t:data1,data2,..,dataN&chs=widthxheight&chl=label1|label2|..|labelN
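
For my pie chart case, the template fills in like this (the data and labels here are made up; cht=p is the basic pie chart type):

http://chart.apis.google.com/chart?cht=p&chd=t:60,25,15&chs=400x200&chl=hits|misses|errors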

While I may end up switching this out for a more full featured server side graphing engine -- the one at PullMonkey looks pretty sweet even if it does require flash -- the point is that I'm leveraging an API to get my app off the ground very quickly. Life is good when app construction is this fast.



Tuesday, October 7, 2008

Dealing with HTTP Timeouts in Ruby

The standard ruby open-uri library makes connecting to local and remote resources completely transparent, which is great most of the time; it lets me do things like this:

stats = Hpricot(open(interface_url))

and I can easily replace the url with a file at unit test time.

However, yesterday I was trying to consume some XML from a web service that is on a machine that is...pegged, like pegged with a load of 30, because it's a data mining box in the middle of a huge run. Also, producing the XML requires database queries, which exacerbates the load problem. So it took about 2 minutes to get back to me -- 1 minute more than the standard open-uri timeout.

Open-uri calls Net::HTTP to do its HTTP protocol based file opens, and Net::HTTP is a lot more like a standard HTTP library in another language -- i.e. it lets you set timeouts. Unfortunately with flexibility comes some complexity, but it's nothing worth crying about:

@site_url = URI.parse(site_url)

http = Net::HTTP.new(@site_url.host, @site_url.port)
http.read_timeout = 360 # timeout in seconds. Yeah, that's 6 minutes.
req = Net::HTTP::Get.new(@site_url.path)

res = http.start { |web|
  web.request(req)
}

stats = Hpricot(res.body)

Of course, real production code would wrap this in a begin..rescue block and retry.
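
Something like this sketch (the retry count and rescued exceptions are illustrative):

attempts = 0
begin
  res = http.start { |web| web.request(req) }
rescue Timeout::Error, Errno::ECONNREFUSED
  attempts += 1
  retry if attempts < 3 # give the pegged box a few more chances
  raise
end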

Friday, October 3, 2008

Moving from Wordpress

So I moved from wordpress. Mainly so I could take advantage of the Evri Widget. It's kind of sad to have worked on something so hard and not be able to take advantage of a widget that showcases what we do at Evri.

I've found blogging to be a great way to file away information that I have found useful in the past -- mostly technical stuff. I also use it to document my less embarrassing (the real embarrassing ones will just have to live on in my mind) experiences and aspirations. Lately life has been moving so fast that documenting it -- even if in snippets -- is much more useful than sitting around repeating history. Or, to use my favorite Joe Biden quote from last night: "Past is Prologue".