Waving Not Drowning: April 2010

Wednesday, April 21, 2010

Synchronization Redux

In my last post, I discussed how I had refactored the synchronization around some data structures to reduce the need to lock those data structures except when necessary. My solution was cleaner in that the synchronization was localized, but the actual synchronization was still a little bit hairy in that it required a recheck inside the lock:

int newVersion = specialFooCache.getCacheVersion();



 // we only want one thread to refresh the cache, and we 

 // want the other ones to keep using the old data structures.



 if(newVersion != cacheVersion && inProgress == false) {

  synchronized(lock) {

   // first thread is in b/c cacheVersion has not been updated yet.

   // All other threads    evaluate to false.        

   int checkVersion = flexibleStrategyCache.getCacheVersion();

   if(checkVersion != cacheVersion) {

}
}
}

But I was (a) stuck and (b) had other code to refactor under the gun. So I didn't think about it much until today in the code review. I'm really glad I code reviewed with John, because he is a relentless simplifier whose lives by the 'less code' motto.

Right away I could tell that the whole recheck thing wasn't sitting well with him. The more I explained, the more concerned he looked, until he spotted a potential race condition between threads that might have piled up outside the synchronize when the someCache had incremented while it was being reloaded. These threads would immediately reload even when they didn't have to, which didn't affect the integrity of the in memory data structures, but was definitely a bad use of cpu. He also was insistent that there had to be an easier way, and under his direction this is that easier way:



 int newVersion = 0;

 // lock is shorter

 synchronized(lock) {

  newVersion = specialFooCache.getCacheVersion();

       

  // leave right away if in progress

  if(cacheVersion == newVersion || inProgress == true) {

   return;

 }

       

  inProgress = true;

 }

                

           

 try {

  // load from cache

 

 } catch (Exception e) {

 ...

 } finally {

  synchronized(lock) {

   // set the version. 

   cacheVersion = newVersion;

   // we are done here.

   inProgress = false;

           

  }

 }

A couple of things to note:
(1) This solution is definitely easier and more performant in addition to being correct. The lock is not held during the reload, and when it is released other threads will exit the function immediately if a cache rewrite is in progress.
(2) The key thing that I learned from watching John solve the problem is that he wasn't happy until it was dead simple. When I don't take this approach, or abandon it under pressure, there are some potentially costly ramifications. The original solution had a complexity smell around it that I chose to ignore. The problem with ignoring that smell is that it comes back to haunt after you've forgotten why you wrote it in the first place. The solution is, of course, to write code that is simple enough to re-understand months from now. Or simple enough for anyone not familiar with your code to understand.
(3) the lock at the bottom is not actually necessary since (a) the variables being set are volatile and (b) only one thread will get this far. However, since only one thread is getting this far, it is not a performance issue, and actually lends some clarity to the resetting of test conditions.
(4) I still feel like this is somewhat a work in progress...I'm not convinced that (3) is really required (even though we both signed off on it). But I do feel a lot better about the current algorithm. That said,
(5) I love code reviews. The knowledge transfer is huge. I'm always looking for ways to strip my code down, and the insight I get from a good reviewer (like John) raises my game tremendously.

Thursday, April 15, 2010

Synchronization Requirements

Recently I was lucky enough to pull the short straw and refactor the dreaded FooManager (name changed to protect the innocent). FooManager was a horrendously complex, overwrought class that suffered from at least two God Complexes, and my team had been playing hot potato with it for the last couple of months, since we learned we had inherited it.

My mission was to make FooManager understandable/maintainable without requiring heavy ingestion of psychotropic substances or becoming suicidal. After doing the usual de-Godification and Cleverectomy procedures, i.e. choosing decent abstractions, removing static methods, etc, I was left with a fundamental synchronization dilemma. It was far from the most annoying part of FooManager, but it was definitely the most interesting to refactor.

FooManager received updates for each specific Foo subclass from different caches. It registered for those updates using a callback interface. The cache notified FooManager whenever it changed. This seemed like an innocent enough pub/sub pattern. But it made managing the datastructures in FooManager -- the ones that held pre-computed indexes of Foos by name, Foos by id, Foos by other id, etc -- tricky.

All access to these data structures were done within read/write locks. I think the motivation behind using a read/write lock instead of a more generic synchronization mechanism was to optimize reads at the expense of writes. This made sense, given that writes were infrequent and driven by periodic cache changes, and reads did not need immediate access to new data. But it meant that any time you wanted to access the structures, you did so within the confines of a read/write lock. Even when all you wanted to do was read the data, you needed to acquire the read lock on it.

The event-callback and the read/write lock had a Design-Pattern-as-a-Hammer smell to it, and as I thought about it some more, I realized why.

(1) The data structures under read/write locks did not have to be immediately consistent with the cache, they had to be eventually consistent with the cache. That meant that a cache change did not require immediate and total synchronization.
(2) The read/write lock was overkill, but necessary because of the asynchronous nature of the cache refill.
(3) The loading performance of the cache was really minimal, since the cache data structures were in memory by the time the notification occurred.
(4) What we really wanted to do was only block when a thread was updating the data structures. We didn't want other threads re-updating the data structures. But they shouldn't be blocked from accessing the current data structures while the update was going on.

The eventual consistency and low reload overhead of the system made it possible to do away with the event callback interface and poll the cache for an update by checking a cache version number via a synchronized method. Heres how it worked:

Every request would check the cache version.
The first thread that got an updated version number would enter a lock and update a set of temporary data structures.
All other threads would block on the lock, enter once the original writer method had exited, make the version check, get the same version that they had entered with, and not attempt a reload.



synchronized protected int checkExpired(int currentVersion) throws Exception{



  int newVersion = specialFooCache.getCacheVersion();

  if(newVersion != currentVersion) {

   // update temp data structures.

   ...

   // assign temps to member data structures

   ...

  }

  return newVersion;



}

Prior to every request, I checked the cache version and reloaded if necessary:



 // cacheVersion is a member variable of FooManager

 cacheVersion = checkExpired(cacheVersion);

 // now access structure w/o locks.

This was, well, OK. It was definitely more simple, but now all reader threads were blocked while waiting for the first one to update the data structures.

Thinking back to the original conditions, I remembered that as long as the data structures were eventually updated, all non writing threads would be just as happy using the original structures in an unblocked manner. In other words, we only needed to make sure that one thread reloaded the original structures.

This required synchronization at a more granular level: I removed synchronization on the checkExpired() method, and inside of it, blocked the rewrite code using a conditional statement followed by a lock, and allowed all threads that had gotten past the conditional and stuck on the lock to exit once (a) the lock was released and (b) they realized that they didn't need to do the cache reload.



protected void checkExpired() {



 int newVersion = specialFooCache.getCacheVersion();



 // we only want one thread to refresh the cache, and we 

 // want the other ones to keep using the old data structures.



 if(newVersion != cacheVersion && inProgress == false) {

  synchronized(lock) {

   // first thread is in b/c cacheVersion has not been updated yet.

   // All other threads    evaluate to false.        

   int checkVersion = specialFooCache.getCacheVersion();

   if(checkVersion != cacheVersion) {   



    inProgress = true;

    // reload from cache

    .....

    // set data structures so that subsequent threads get

    // the new data

    ....

    // set the version. 

    cacheVersion = newVersion;

    // we are done here.

    inProgress = false;

   } // end version recheck

  } // end lock. 

 } // end version and inProgress check

Note that in order to prevent thread local caching and allow all threads to get immediate access to the data structures after they were reassigned, I declared the data structures that I was updating as volatile.

The simplification of the cache update process made it a whole lot easier to understand, I think that the conditions -- specifically the eventual consistency -- allowed us some latitude in when reader threads actually got the data. There isn't a specific by the numbers approach to solving synchronization issues, but it always helps to understand what you are synchronizing, and more importantly, what you don't have to synchronize.

Thursday, April 1, 2010

Using Google Maps API v3 + JQuery.

My side project to display and do some detailed data mining of my GPS exercise data, has been languishing for the better part of a year while my day job(s) have been taking most of my day and night time. Garmin has a pretty decent Mac desktop program that provides graphing, graph zooming, and stats by mile, but I'd rather have all of that functionality (and more!) via a web UI. I'd also like to integrate the results of the data mining into that UI in a useful way, i.e. mining relative effort over similar terrain to track actual fitness over time.

I decided that this project is going to be about having fun and nothing more, and as such decided to write the UI first, because all of my work these days is server side, Java, so 'fun' for me involves more dynamic languages, i.e. JavaScript and Ruby.

I wanted to display my gps data as a route on a map, which meant getting up to speed on the Google Maps API, and writing a quick dummy server that could dump out some route data for me.

I decided to use the latest Google Maps API v3 , and of course JQuery for the front end work, and mocked up a quick backend server using Sinatra. I can always redo that backend in something more robust once I want to actually deploy, but for now getting the data to the page is more important than how fast that data is retrieved, or machine resources consumed by serving that data.

Part 1: Displaying The Map

I needed to include the google maps api v3 js:

http://maps.google.com/maps/api/js?sensor=false
and the latest JQuery:
http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js

Then in the ready function, I created a map by pointing it to a div and passing in my options.



$(document).ready(function() {

    var myLatlng = new google.maps.LatLng(47.5615, -122.2168);

    var myOptions = {

          zoom: 12,

          center: myLatlng,

          mapTypeId: google.maps.MapTypeId.TERRAIN

    };

    var map = new google.maps.Map($("#map_canvas")[0], myOptions);

Note above that I'm assigning the map to a div with id = "map_canvas".

Part 2: Parsing the GPS data

Eventually I'd like to upload data directly from my device, but for now I'm going to skip that part and 'pretend' I've already done it. GPS data is exported in the TCX format, which is Garmin-proprietary, but is the easiest to use right now. My current desktop program has the ability to export 1 to N days worth of data into tcx.

At some point parsing tcx using a DOM based parser was going to start hurting, so I decided to use a SAX based parser from the start. My usual choice for quick n dirty XML/HTML parsing, hpricot, option was therefore not an option. I investigated nokogiri, but eventually settled on libxml, mostly because the rdoc on sax parsing was very clear, and it was much faster for sax parsing.

I mainly wanted to parse lat-long data out of the tcx file and dump the coordinates into another file in JSON format. Here is my 5 minute hacked together code:



class PostCallbacks

include XML::SaxParser::Callbacks



def initialize(write_file) 

  @state="unset"

  @write_file = File.open(write_file,"w")

  @buffer = "{\"data\" : ["



end



def on_start_element_ns(element, attributes, prefix, uri, namespaces)



  if element == 'LatitudeDegrees'

   @state = "in_lat"

  elsif element == 'LongitudeDegrees'

   @state = "in_long"

  end

end



def on_characters(chars) 

  if(@state=="in_lat") 

   @buffer += "{\"lat\": #{chars}"

  elsif(@state == "in_long")

   @buffer +=  ", \"long\": #{chars}},"

  end

end



def on_end_element_ns(element,prefix,uri)  



  @state="unset"

end



def on_end_document() 



  @buffer = @buffer.slice(0,@buffer.length-1)

  @buffer += ("]}")

  @write_file.puts(@buffer)

  @write_file.close()

end

end



parser = XML::SaxParser.file(ARGV[0])

parser.callbacks = PostCallbacks.new(ARGV[1])

parser.parse

Part 3: Serving The GPS Data Up

My goal here was to basically dump that generated file as a response to an AJAX request. Sinatra is perfect for delivering quick services like this. I use Sinatra's DSL to handle a GET request as follows:



require 'rubygems'

require 'sinatra'

require 'open-uri'

require 'json'



  get '/sample_path.json' do

   content_type :json

   File.open("../output/out.json") do | file | 

   file.gets

   end



  end

The ../output/out.json is where I parsed the lat-long data from the tcx file into.
I ended up spending a lot of time debugging my get method (http://localhost:7000/sample_path.json). In FF I was unable to get a response back from my GET request. I googled around and apparently there are some compatibility issues with firefox 3.6.2 and JQuery. I was however able get the code to work in Safari, and I'm considering downgrading to FF 3.5 because I haven't seen those kind of problems with that browser, and Firebug is an essential part of my debugging library.

Part 4: Drawing The Data On the Map
The JS code that made the request loads the results into google.maps.MVCArray, which it then uses to create a polyline superimposed on the map:



var url = "http://localhost:4567/sample_path.json";

$.ajax({

  type: "GET",

      url: url,

      beforeSend: function(x) {

        if(x && x.overrideMimeType) {

           x.overrideMimeType("application/json;charset=UTF-8");

       }

      },

      dataType: "json",

      success: function(data,success){



         var latLongArr = data['data'];

        var pathCoordinates = new google.maps.MVCArray();

        for(i = 0; i < latLongArr.length; i++) {

     // each coordinate is put into a LatLng.
     var latlng = new google.maps.LatLng(latLongArr[i]['lat'],
        latLongArr[i]['long']);
     pathCoordinates.insertAt(i,latlng);
   }
   // and this is where we actually draw it.
   var polyOptions = {
     path: pathCoordinates,
     strokeColor: '#ff0000',
     strokeOpacity: 1.0,
     strokeWeight: 1
   };

   poly = new google.maps.Polyline(polyOptions); poly.setMap(map);
   }
});

The Result

Not super impressive, but a good start!