Friday, August 27, 2010

Beware the Emerging God Object

Most God Objects are pretty obvious. Bloated, side effect filled, their existence guaranteed by collective fear about refactoring them to a state where their previous functionality is not reproducible.

The God Object that derailed me for the last couple of days was not bloated, it's functions (for the most part) did not have side effects. It did a lot, but did not seem to be overwrought.  In fact, I'm pretty sure it only had a minor God Complex until I started refactoring it!

As a software engineer, my goal is to produce the most effective, robust software with the least amount of effort. So, in an attempt to not repeat last week's thrash-fest, I'm going to catalog the warning signs, the resolution, and the corrected approach, because I'm pretty sure I'm going to be in this situation again.

The Task:
We need to make an existing protoype table driven. The prototype functionality configured and ran A/B tests. Its configuration data was currently hardcoded into an enum.

Warning Signs Before We Even Started:
  1. The core logic of the A/B test configuration  resided in a single object.
  2. Despite having several distinct (logical) sub components, this object was responsible for serializing itself and those sub objects.
  3. This object existed in both our configuration time and runtime environments, which have significant differences, the biggest one being that at configuration time, all objects are read/written to a relational database, and at runtime, they are read from a BDB store. Most conceptual objects in the system have config time and runtime implementations.
  4. We could not get rid of this object because it was still being used in production. We would have to implement a phased approach to upgrading it. 
  5. I did not know the codebase well. 
  6. Our sprints are a week, so we felt compelled to fit in an end to end task into a week. 
The first four warning signs are simply facts. The last two are things that I should pay more attention to. If I don't know the codebase well, and I'm trying to make significant improvements in a week, my chances of success go way down. I chose to ignore that reality.

Warning Signs Once We Started:

  1. The table driven logic did not mesh well with the existing logic. The developer who had written the code had serialized all sub objects into a JSON string. The object was then serialized into a BDB file as a set of attributes and the JSON string, not an object graph. Trying to work with the new table data (an object graph) mean that:
  2. Any change I made to the code had side effects. This is because I was trying to preserve the original signature of the object to maintain backward compatibility with the rest of the system. This became obvious when:
  3. The original code, fairly straightforward, became conditional-heavy. Instead of acknowledging the additional complexity, 
  4. I was choosing to 'hack' more because I knew that we would have time to rework the code. I knew I was hacking because
  5. The developer I was working with started to get more confused with every checkin I made. 

Do Over!
When his brow started cramping up from being so furrowed, I realized that things were wrong. We grabbed a whiteboard and went to work, depicting the original workflow, how adding table driven configurations changed that workflow, and how to change the existing workflow as little as possible to reduce the amount of rework that we would have to do in the runtime portion of the code.  Most of the good ideas came from our dev lead, who had been silently (and then not so silently) observing us trying to make sense of it all. I mention that because he was operating at a distance from a lot of the complexity, and his solution avoided most of it. I felt that he out of all of us was able to make the necessary leap out of the stew of constraints and details, and reframe the problem in a way that made it a much less complex problem. That's the genius of software. That's what I do, but only sometimes. That's why I'm writing this down.

The Problems and Their Solutions: 

  1. The old functionality contained both runtime behavior and configuration and behavior. We chose to separate the two, resulting in a POJO configuration data object used at configuration time, and the 'action' object that knew how to initialize given one of those data objects and contained runtime behavior.
  2. More separation of concerns: The old way of storing data in a JSON string  and the new table driven way of storing data did not belong together in the same object.  Even though we separated the old object into a data object and an 'action' object, we chose to keep the old data separate from the new data, and access both through getters/setters, which insulated us from the underlying implementations. 
  3. Composition: because we didn't have enough time to rewrite the infrastructure that wrote the configuration object into a BDB, we needed the new data to be persisted into  configuration object as part of the JSON string. Fortunately, once we separated out the data from the configuration object, we could compose the JSON string outside of the configuration object using the POJOs, then set it prior to serializing the configuration object to the BDB file. 

The warning signs before start played out into more significant problems before they were fully resolved. And, when I think back on the process, I realize I was uncomfortable with the changes I was making, but I chose to push on. Because time was relatively compressed, I found myself making 'poor choices', which ultimately backfired. Next time (these are notes to myself, the 'you' I'm taking to is 'me' :)

  1. Acknowledge risks up front. Not a lot of time to get it done? That's a risk. Don't know the codebase? That's another risk (assuming the codebase is non trivial). 
  2. Examine the old code and determine if there are concerns that should be separated before anything else is done. 
  3. Determine the scope of those changes to see if they can be met in the desired time frame.
  4. If not, break the project into smaller chunks. 
  5. If it feels easy it's right. If it feels hard, it's less right. Be more right. 
  6. Do not hack. When you find yourself hacking, stop. Back up and revisit initial assumptions. Step away from the problem.
  7. Discuss ideas often, with others. If someone else can understand it easily, it can't be that bad. If, on the other hand, their brow starts to wrinkle, and your blood pressure starts to go up because there are not words in the English language to describe what you need to describe, it might just be worse than you think. 
More Conclusions: 

  1. God objects never started out as God Objects. They started out as high level concepts -- "oh, I need something that does X", and mutate because of bolted on additional functionality. 
  2. Most developers don't set out to create a God Object. In this specific case I was trying to save effort by reducing the amount of change to the original code. However, the additional effort to explain and understand my 'bolt ons' was costing real time and effort. 
  3. The earlier you de-factor complexity out of a God Object, the better. Decomposing layered functionality into discrete objects makes it really easy for other people to recognize what those objects are for, and not overlay them with functionality that addresses other concerns. 
  4. Conversely, the longer you wait, the harder it is. God Objects tend to have code with lots of side-effects and implicit assumptions. Refactoring those successfully is hard, even with comprehensive unit tests.