Tuesday, December 30, 2008

Bulk Resource Uploads via ActiveResource

I recently had to reduce the number of over-the-wire round trips made by the monitoring app I had hastily thrown together. The time spent making those trips, and serializing and deserializing individual resources along the way, was beginning to affect monitoring performance. The Second Fallacy of Distributed Computing (latency is zero) was beginning to rear its ugly Putinesque head.

I knew that this was coming, but premature optimization has never worked out for me, so I went with the default ActiveResource approach -- everything is a resource, and a CRUD operation on a resource maps to the corresponding HTTP verb -- until smoke started pouring out of my servers.

My basic requirements:
  1. Create a web service that can store data for hundreds of individual datapoints at 5-minute intervals.
  2. Those datapoints can come and go.
  3. The implementer of the statistics-gathering code really doesn't need to know the over-the-wire details of how their data gets to my web service.
Implied in these requirements is the need for efficiency:
  • I shouldn't have to perform individual CRUD ops on each statistic every 5 minutes.
  • I shouldn't have to make an over-the-wire request for data every time I want to read that data.
From those implications I arrived at the following distilled technical requirements:
  1. I need to bulk upload statistics, and create/update them in one transaction in order to reduce the need for individual CRUD ops. At this point I'm going to choose to 'fail fast', aborting if a single create/update fails, so that I know if something is wrong.
  2. I need to keep a client side cache of those statistics around, only updating them when they've changed (important aside: because this is a monitoring application, it is assumed that each statistic belongs to a single client, so there is no need for out of band updates).
The Juicy Bits
I'd love to go into a long digression about how I explored every which way to do this, but I'll summarize by saying that my final solution had the following advantages:
  • Uses the existing ActiveResource custom method infrastructure
  • No custom routes need to be defined
  • Complexity is hidden from the user, confined to the client side upload_statistics call and the server side POST handler method.
  • The priesthood of High REST will not need to crucify me at the side of the road.
ActiveResource extension:

I needed to extend the default ActiveResource. By default, AR is not aware of data model relationships. For example, invoking the to_xml method on an AR class only shows its attributes, even if you specify other classes to include, as in to_xml(:include => [:statistics]).

This limitation makes being smart about bulk updates pretty hard. I needed to introduce the notion of a client side cache, initialized and synchronized as needed.

My data model looks roughly like this:

Monitor=>has many=>Statistics

The default AR implementation of this looks like:

class Statistic < ActiveResource::Base
end

I've extended as follows:
  • Implemented an add_statistic method on Monitor that caches Statistic objects locally.
  • Added an upload_statistics method to Monitor that serializes the client local statistics and then sends them to the server.
  • Modified the default POST handler for Statistic objects to handle bulk creates/updates.
  • Initially loaded the statistics cache on the client side.
  • Lazily synced the cache to the server side, updating on find and delete requests.
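To make the cache behavior concrete, here is a minimal, framework-free sketch in plain Ruby, so it runs without ActiveResource. `StatisticCache`, its method signatures, and the `server` object it talks to are hypothetical stand-ins: in the real client the cache lives inside `Monitor`, and the server calls go through ActiveResource. The sketch also remembers which entries changed, so an upload only sends dirty statistics. That is one possible way to honor the "only updating them when they've changed" requirement, not necessarily how the original code does it.

```ruby
# Hypothetical plain-Ruby model of the client-side statistics cache.
# `server` is any object responding to fetch(name), delete(name) and
# bulk_upload(hash); it stands in for the ActiveResource calls.
class StatisticCache
  def initialize(server)
    @server = server
    @statistics = {}   # name => attributes hash
    @dirty = []        # names changed since the last upload
  end

  # Cache locally and remember that this entry must be uploaded.
  def add_statistic(name, attributes)
    @statistics[name] = attributes.merge("name" => name)
    @dirty << name unless @dirty.include?(name)
  end

  # Lazy sync: only go over the wire on a cache miss.
  def get_statistic(name)
    @statistics[name] ||= @server.fetch(name)
  end

  # Remove locally, forget any pending upload, and tell the server.
  def delete_statistic(name)
    @statistics.delete(name)
    @dirty.delete(name)
    @server.delete(name)
  end

  # One request for all changed statistics instead of one per statistic.
  # Returns the number of statistics sent.
  def upload_statistics
    return 0 if @dirty.empty?
    payload = @dirty.each_with_object({}) { |name, h| h[name] = @statistics[name] }
    @server.bulk_upload(payload)
    @dirty.clear
    payload.size
  end
end
```

Keeping the dirty list separate from the cache means repeated reads never trigger writes, and a second upload with nothing changed costs no request at all.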

Client and Server code by Operation

I want to point out a couple of things in this code:

(1) Cache loading is done in Monitor#initialize. That way it gets called whether the client is retrieving or creating a Monitor.

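def initialize(attributes = {}, logger = Logger.new(STDOUT))
  # @@logger is assumed to be declared as @@logger = nil in the class body
  @@logger = logger if @@logger.nil?

  # client side cache of Statistic objects, keyed by statistic name
  @statistics = {}

  if attributes["statistics"]
    attributes["statistics"].each do |single_stat_attributes|
      @@logger.debug("loading #{single_stat_attributes["name"]}")
      @statistics[single_stat_attributes["name"]] = Statistic.new(single_stat_attributes)
    end
  end

  # let ActiveResource load the attributes as usual
  super(attributes)
end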


This required the following modification on the Monitor controller (server) side:

def index
  if params[:name].nil?
    @monitor_instances = MonitorInstance.find(:all)
  else
    @monitor_instances = MonitorInstance.find_all_by_name(params[:name])
  end

  respond_to do |format|
    format.html # index.html.erb
    # include child statistics so the client can load its cache
    format.xml  { render :xml => @monitor_instances.to_xml(:include => [:statistics]) }
    format.json { render :json => @monitor_instances.to_json(:include => [:statistics]) }
  end
end

I needed to make sure that returned monitor instances included child statistics in order to load the client side cache.
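To see why the :include matters on the client, here is a sketch of turning a nested index response into the name-keyed cache that Monitor#initialize builds. It uses JSON rather than XML only because Ruby's standard library parses JSON without ActiveSupport. The payload shape (a flat list of monitors, each with an optional `statistics` array) is a simplified assumption about what the JSON render produces, and `load_statistic_caches` is a hypothetical helper.

```ruby
require "json"

# Hypothetical helper: build one name => attributes cache per monitor from a
# nested index response, mirroring what Monitor#initialize does per instance.
def load_statistic_caches(json_body)
  JSON.parse(json_body).each_with_object({}) do |monitor, caches|
    statistics = monitor["statistics"] || [] # absent if :include was left off
    caches[monitor["name"]] = statistics.each_with_object({}) do |stat, cache|
      cache[stat["name"]] = stat
    end
  end
end
```

Without the :include, every monitor comes back with an empty cache, and each statistic would have to be fetched with its own request.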
(2) get_statistic and delete_statistic synchronize with the server side.
(3) I've added a new upload_statistics method. I wanted to override save, but what I found at runtime is that ActiveResource's save method calls update, which loads statistics as attributes. This won't work for us because some of those attributes may not exist on the server side, so an 'update' operation is invalid. In upload_statistics, a custom AR method posts the client side cache of statistics to the StatisticsController on the server side:

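def upload_statistics
  if @statistics.length > 0
    data = @statistics.to_xml
    # custom method POST: method name, param options, post body
    post(:statistics, {:bulk => true}, data)
  end
end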


Note that the first parameter is the method name, the second is the param options, and the third is the actual post data (which contains the serialized client side map of the statistics). The actual path that this POST gets sent to is /monitor_instances/:id/statistics.xml.
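The path in the note above follows a simple pattern: collection name, record id, custom method name, format extension, then the param options as a query string. The plain-Ruby sketch below only illustrates that composition. `custom_method_path` is a hypothetical helper, not ActiveResource's own code, and the `bulk=true` query value is an assumption (the server only needs the bulk parameter to be present).

```ruby
require "uri"

# Hypothetical helper: compose the URL an instance-level custom POST targets,
# e.g. /monitor_instances/42/statistics.xml?bulk=true
def custom_method_path(collection_name, id, method_name, options = {}, format = "xml")
  path = "/#{collection_name}/#{id}/#{method_name}.#{format}"
  return path if options.empty?
  "#{path}?#{URI.encode_www_form(options)}"
end
```

Because the method name lands after the record id, the request reaches the existing nested statistics resource, which is why no custom route is needed on the server.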

On the server, I do not have to add/create any new routes, but I do need to make sure that the default POST handler checks for the bulk parameter and handles it accordingly.

# POST /statistics
# POST /statistics.xml
def create
  if params[:bulk].nil?
    # handle a single update
  else
    # handle a bulk update
  end
end

Marshalling and Saving stats on the Server side.

In the StatisticsController#create handler, I need to unmarshal the XML into statistics. There are instructions out there for extending ActiveRecord via the standard lib/extensions.rb mechanism, but they won't work for me because I'm serializing a hash, not an array of Statistic objects. So I need to deserialize and create/update the objects by hand, which actually isn't that hard:

cmd = request.raw_post
monitor_instance = MonitorInstance.find(params[:monitor_instance_id])
hash = Hash.from_xml(cmd)

hash["hash"].values.each do |options|
  # bind values with ? placeholders instead of interpolating them into the SQL
  stat = Statistic.find(:first,
    :conditions => ["monitor_instance_id = ? and name = ?",
                    params[:monitor_instance_id], options["name"]])

  if stat.nil?
    # create a new Statistic object
  else
    # update existing statistic object
  end
end

respond_to do |format|
  statistics = Statistic.find(:all,
    :conditions => ["monitor_instance_id = ?", params[:monitor_instance_id]])
  format.xml { render :xml => statistics.to_xml, :status => :created,
               :location => monitor_instance_path(monitor_instance) }
end

In the code above, I deserialize the XML payload using Hash.from_xml, which wraps the serialized hash in an outer hash keyed by the XML root element ("hash", the default root that Hash#to_xml uses).

To get to the original hash of statistics options, I had to extract them from the encoded hash:

hash = Hash.from_xml(cmd)
hash["hash"].values.each do |options|
  # create / update the stat that corresponds to options["name"] under the monitor
end
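The create/update loop, combined with the 'fail fast' requirement from earlier, can be modeled in plain Ruby. The sketch below works on an in-memory store instead of the database. It unwraps the "hash" root key, validates every entry first, and only writes if all of them pass, so one bad statistic aborts the whole batch. In the Rails controller, the usual way to get the same all-or-nothing behavior would be to wrap the loop in `Statistic.transaction do ... end` and use `save!`, which raises on failure and rolls the transaction back. `bulk_upsert`, `BulkUploadError`, and the validation rule (each entry needs a string name and a numeric value) are hypothetical.

```ruby
# Hypothetical in-memory model of the bulk create/update handler.
# store:  { [monitor_id, name] => attributes }
# parsed: what Hash.from_xml returns for a serialized hash, i.e. the
#         statistics hash wrapped under a "hash" root key.
class BulkUploadError < StandardError; end

def bulk_upsert(store, monitor_id, parsed)
  entries = parsed.fetch("hash").values

  # Fail fast: validate everything before touching the store.
  entries.each do |options|
    unless options["name"].is_a?(String) && options["value"].is_a?(Numeric)
      raise BulkUploadError, "invalid statistic: #{options.inspect}"
    end
  end

  created = 0
  updated = 0
  entries.each do |options|
    key = [monitor_id, options["name"]]
    if store.key?(key)
      store[key] = store[key].merge(options) # update existing statistic
      updated += 1
    else
      store[key] = options                   # create a new statistic
      created += 1
    end
  end
  { :created => created, :updated => updated }
end
```

Validating before writing is what keeps a failed batch from leaving half the statistics updated; a database transaction gives the same guarantee even when failures only show up during the save itself.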

This took a lot longer than expected, because I ran into issues trying to use standard methods (e.g. save) that I still don't understand. However, I now know a lot more about AR and how to extend it to do more intelligent sub-resource handling.


Sunday, December 21, 2008

Best advertainment webisode ever.

This one made me laugh so hard I pulled something.


If this is the future of ads, I'm hooked!

Thursday, December 18, 2008

Sinatra, my new favorite prototype playground

About a week ago I was trying to get something ready for the first annual Evri Hack-a-thon, a concentrated two-day affair where we focused on putting together cool apps with the new Evri API. The event was a blast; I, for one, rediscovered how much fun writing code for code's sake really is.

I was implementing a 'music browser' mostly in javascript, and needed a proxy server to make calls out to those services that didn't have JSONP support.

A slight digression here: JSONP is the coolest thing since sliced bread. I say that as someone who loves bread, even more so when it is sliced. The ability to retrieve data without a backend is so powerful that I _almost_ understand why it's been seen as a Terrible, Horrible, No Good Hack. But not really, because it makes life as a developer so much easier.

I wanted to spend most of my time in the JavaScript, not futzing with the backend server. Because I've been coding mostly in Ruby for the last year, that ruled out rolling a quick Java servlet -- I didn't want to spend any time installing Tomcat/Jetty and the associated jars, and having to remember how that world worked. I also didn't want to write a Rails app, which seemed ridiculous when I didn't have a data model.

I looked around at a couple of lightweight Ruby frameworks, like Camping and Merb. Camping would have required me to downgrade to Ruby 1.8.5, and Merb overwhelmed me with the volume of configuration choices. My ideal proxy server had to be stone cold simple, because I simply didn't have time for anything else.

Enter Sinatra. Elegant, concise, and witty, just like its namesake. Here is how you configure a path to /json/getjswidgets in Sinatra:

get "/json/getjswidgets" do
  cb = params[:callback]
  href = params[:href]
  # ... handler body
end

A couple of things to note in the example above:
(1) params are retrieved with the params hash, just like in Rails. So this method was actually called as:

/json/getjswidgets?callback={temp callback name}&href={some value}

(2) all paths are handled with the same 'get...do...end' syntax. It's that simple.
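Because the proxy's job is to hand data back to the browser as JSONP, the core of each handler is wrapping a JSON payload in the caller-supplied callback name. Here is a plain-Ruby sketch of that step. `jsonp` is a hypothetical helper, not something Sinatra provides. The identifier check on the callback is an extra precaution that is not in the original handlers; it keeps a crafted callback parameter from injecting arbitrary script.

```ruby
require "json"

# Only plain JavaScript identifiers (optionally dotted, like jQuery.fn) are
# accepted as callback names.
CALLBACK_PATTERN = /\A[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*\z/

# Hypothetical helper: wrap data as a JSONP response body, e.g. cb({"a":1});
def jsonp(callback, data)
  unless callback.to_s.match?(CALLBACK_PATTERN)
    raise ArgumentError, "invalid callback: #{callback.inspect}"
  end
  "#{callback}(#{JSON.generate(data)});"
end
```

In a Sinatra route, the block's return value becomes the response body, so a handler like the one above could end with something like jsonp(cb, result).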

Another example:

get "/json/artists/:name/album" do
  cb = params[:callback]
  name = params[:name]
end

Note that the name parameter is embedded in the path, just like in the standard routes.rb file in Rails.
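The placeholder idea fits in a few lines: turn each :segment of the pattern into a named capture, match the request path against it, and the captures become params. This plain-Ruby sketch is a rough illustration only, not Sinatra's actual routing code, and `compile_route` and `route_params` are hypothetical helpers (no percent-decoding or splat support).

```ruby
# Hypothetical: compile "/json/artists/:name/album" into a regexp with one
# named capture per :placeholder segment.
def compile_route(pattern)
  source = pattern.split("/", -1).map do |segment|
    if segment.start_with?(":")
      "(?<#{segment[1..-1]}>[^/?#]+)" # placeholder matches one path segment
    else
      Regexp.escape(segment)          # literal segment must match exactly
    end
  end.join("/")
  Regexp.new("\\A#{source}\\z")
end

# Returns the captured params as a hash, or nil if the path doesn't match.
def route_params(pattern, path)
  match = compile_route(pattern).match(path)
  match && match.names.zip(match.captures).to_h
end
```

For example, route_params("/json/artists/:name/album", "/json/artists/radiohead/album") yields {"name" => "radiohead"}.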

Once you get past the path routing (which takes about as long as it does to read this sentence), Sinatra continues to be blissfully easy by letting you render the view via ERB, Builder, Haml, or Sass. You can render the view inline, or modularize it by putting the template files in a views directory.
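An inline template is just a string run through a template engine. With Ruby's standard ERB library, the idea looks like the sketch below. The `render_inline` helper and the album template are hypothetical, and `result_with_hash` needs Ruby 2.5 or later.

```ruby
require "erb"

# Hypothetical helper: render an inline ERB template string, exposing the
# given hash entries as local variables inside the template.
def render_inline(template, locals)
  ERB.new(template).result_with_hash(locals)
end

ALBUMS_TEMPLATE = "<ul><% albums.each do |album| %><li><%= album %></li><% end %></ul>"
```

render_inline(ALBUMS_TEMPLATE, albums: ["OK Computer", "Kid A"]) produces a single-line list. Plain ERB does not HTML-escape output, so untrusted values would need ERB::Util.html_escape.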

Helper methods are defined in a helpers block:

helpers do
  def helper_method
    # ...
  end
end
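The effect of a helpers block is that the methods defined inside it can be called from any route block. The toy plain-Ruby model below shows one way that can work: route blocks run against an app instance, and the helpers block defines instance methods on the app class. `TinyApp` and `MusicProxy` are hypothetical and are not how Sinatra is implemented internally.

```ruby
# Toy model: route blocks are evaluated in the context of an app instance,
# and `helpers` defines instance methods on the app class.
class TinyApp
  def self.routes
    @routes ||= {}
  end

  def self.get(path, &handler)
    routes[path] = handler
  end

  def self.helpers(&block)
    class_eval(&block) # defs in the block become instance methods
  end

  attr_reader :params

  def call(path, params = {})
    @params = params
    handler = self.class.routes.fetch(path)
    instance_exec(&handler) # the handler can see helpers and params
  end
end

class MusicProxy < TinyApp
  helpers do
    def shout(text)
      text.upcase
    end
  end

  get "/hello" do
    shout("hello #{params[:name]}")
  end
end
```

MusicProxy.new.call("/hello", { :name => "frank" }) runs the route block with access to shout and returns "HELLO FRANK".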


Static assets are kept in a public directory -- again, Sinatra takes an "if it ain't broke..." approach that really minimizes the learning curve. Normally, I loathe the whole "But Ours Go To Eleven!" mindset that I see in frameworks, because it means that I have to once again learn another unique set of concepts to get anything done. Sinatra does the exact opposite, leveraging a well-known, well-used, well-understood set of conventions and concepts from Rails while stripping the concept of a framework down to that which is as simple as possible, but not simpler. Sinatra, you're my new BFF!

Sunday, December 7, 2008

Converting to a Single Speed

Why Single Speed? A combination of 'luck' and timing have led me to re-rig my commuter bike as a single speed. The 'luck' part was a pulley on my circa 1995 rear derailleur exploding. The timing part is the rise of single speeds in general. I've been noticing the rise in single speeds in the last couple of years as a bike commuter. They look so....simple and maintenance free!

I'm riding a 14-year-old Kona Kula, once my singletrack steed, now my urban commuting stalwart. The key thing about converting a bike with vertical dropouts into a single speed is that you can't slide the rear wheel back and forth to get the perfect chain tension. You need a chain tensioner, which is like a derailleur-lite that pulls the chain taut. There are several brands out there, all of which make a simple, bulletproof device.

The other key thing about converting a standard bike into a singlespeed is what to do with your rear cluster. There are a number of freehub-to-singlespeed conversion kits out there that provide spacers and cogs to replace your cassette.

I ended up choosing the Forte Singlespeed conversion kit, made by Performance Bicycle's house brand. This was the only kit I found that offered the freehub spacers and cogs as well as the chain tensioner, and at by far the cheapest price: for $25 I got everything, including 3 cogs to experiment with. Compare that to the Surly solution, which was going to cost $50 for the chain tensioner, $30 for the spacers, and $10+ for the cog, over $90 in total.

It also had what I considered to be a key feature: it allowed me to adjust the horizontal placement of the tensioner. This was important because I had no idea where I would be placing the cog to line up with the chainring.

I also wanted to try using an original cog and chainring, since I had replaced them a year ago and they weren't completely beat down yet. The preferred way to go is to do a clean replacement, but that would require a new chain and front chainring, and I wasn't sure that I could find a replacement front chainring without a special order.

Installation was easy, and gave me a chance to clean my bike for the first time in 6 years!

Tools required:
  1. a cassette removal tool and chainwhip for cassette removal.
  2. an allen wrench for the usual.
  3. a crank puller to remove the inner chainring on the triple.
  4. a chain tool to break and resize the chain.
Step 1: remove the chainrings. The only way to get the inner chainring off is to remove the crank from the bike. The optimal position for the new chainring is the middle position of the triple crank. But the chainring I wanted was a 44-tooth and too big to use in the middle position (it rubbed the chainstay), so I had to keep it in the outer position.

This is where the horizontal adjustability of the Forte chain tensioner became really useful: it let me slide the tensioner cog over to the outside with an allen wrench.

Step 2: remove the cassette using the chainwhip and the cassette tool.
Step 3: install the chain tensioner where the rear derailleur used to be.
Step 4: position the singlespeed cog (using spacers to fill up the freehub around it) and the chain tensioner cog so that they are in line with the chainring. This is important: if you don't line things up, the chain will derail. In the picture below, note the spacers around the cog. Because I installed my chainring in the outermost position, I've had to position the cog at the outer end of the freehub (with only one spacer between it and the cassette lockring).

Step 5: whip out the chain tool and resize the chain so that the chain tensioner is engaged (i.e. it has tension).

I used a 16-tooth cog from my old cassette, and my existing chain. This may not work, because that cog was designed to be 'shiftable', and the shift ramps on the cog body may derail the chain. However, I wanted to give this a try before buying a new chain and front chainring.
