Monday, March 16, 2009

On the Couch(DB): Part 1: Setup on Mac OS X Leopard

Why CouchDB? I'm running into situations where a traditional database is giving me headaches:

(1) At work, people are spending more and more time maintaining the database and tweaking the schema behind an ETL application that normalizes input data into a standard format, now that the persisted data is getting into the TB range.
(2) At home I want to write an app storing my GPS data from my runs and rides, and I have no idea what the metadata around that data is going to be.

The common denominator for both of these situations is that a traditional database schema, with very explicitly defined tables and relations, is a liability, either eventually, as in case 1, or initially, as in case 2. In either case I'm more likely than not to be wrong with any schema design choices that I make.

In both situations, a more flexible approach would be to treat each discrete set of data as a bag of attributes (key-value pairs) with a couple of unique keys that I can search on. I could see creating a Lucene index or a BDB (Berkeley DB) store holding those keys and a pointer to a file containing the rest of the data. This would work pretty well until it was time to go to production or scale, or both. When I go to production, I want the data to be highly available. When I scale, I want the data to be partitionable. To do either, I would have to extend my naive hashing scheme.
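To make the naive scheme concrete, here is a minimal Python sketch of it, not anything I have actually built: an in-memory dict stands in for the Lucene/BDB index, and each attribute bag is written to its own JSON file. The class name, file naming, and sample data are all made up for illustration.

```python
import json
import os
import tempfile


class NaiveBagStore:
    """Hypothetical sketch: a key index that points at files holding full attribute bags."""

    def __init__(self, root):
        self.root = root
        self.index = {}  # (key name, key value) -> path of the file with the full bag
        self.count = 0   # used only to give each bag file a unique name

    def put(self, bag, key_names):
        # Write the whole attribute bag to its own file.
        path = os.path.join(self.root, "bag-%d.json" % self.count)
        self.count += 1
        with open(path, "w") as f:
            json.dump(bag, f)
        # Index only the searchable keys; each entry is a pointer to the file.
        for name in key_names:
            self.index[(name, bag[name])] = path
        return path

    def get(self, name, value):
        # Look up the pointer, then load the rest of the data from disk.
        path = self.index.get((name, value))
        if path is None:
            return None
        with open(path) as f:
            return json.load(f)


# Made-up sample data: two bags with different attributes, searchable by date.
store = NaiveBagStore(tempfile.mkdtemp())
store.put({"activity": "run", "date": "2009-03-14", "miles": 6.2}, key_names=["date"])
store.put({"activity": "ride", "date": "2009-03-15", "bike": "road"}, key_names=["date"])
print(store.get("date", "2009-03-15"))
```

Everything lives on one box in one process, which is exactly why this falls over once availability and partitioning enter the picture.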

CouchDB is a document-based database -- each document is an attribute bag identified by 1..N keys -- that is fault tolerant and distributable, with incremental replication. In other words, it is the extension my naive hashing scheme needs to be a real boy instead of a puppet.
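As a rough illustration of the "attribute bag" idea, here is how two of my GPS documents might look as plain Python dicts that serialize to JSON, the format CouchDB stores documents in. Only the _id field is a CouchDB convention; every other field name and value below is an assumption made up for this sketch.

```python
import json

# Two hypothetical documents; every field except _id is invented for illustration.
run = {
    "_id": "run-2009-03-14",
    "type": "run",
    "points": [[40.0, -105.0], [40.01, -105.01]],
}
ride = {
    "_id": "ride-2009-03-15",
    "type": "ride",
    "bike": "road",
    "avg_heart_rate": 142,
}

# No shared schema is required: each document carries its own set of attributes.
for doc in (run, ride):
    print(doc["_id"], sorted(k for k in doc if k != "_id"))

# The JSON text is what would be sent to the database as the document body.
body = json.dumps(ride)
```

The point is that the ride can grow a heart-rate field without the run needing one, and neither requires a schema change.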

Enough hype and bloviation... I need to install CouchDB on a MacBook Pro running Leopard. I followed the instructions found here, and quickly discovered I had neglected to install Xcode when I upgraded to Leopard. Doh!

btw, if you ever do this, you'll find that MacPorts has downloaded and staged components into /opt/local/var/macports/build, and may not be able to build them successfully once the first attempt has failed for lack of compilers, etc. Removing the staged directories for the components that were downloaded is the best way to 'reset' MacPorts. Trying to configure, build, and install MacPorts components yourself is not :)

I did run into a minor bump: tcl was installed at 8.4.14_0 on my box, and I needed to upgrade to 8.5.6_0 because the sudo'd port build crapped out trying to build tk. I did this by hand:

sudo port install tcl --version 8.5
sudo port deactivate tcl @8.4.14_0
sudo port activate tcl @8.5.6_0

(I used sudo port list | grep tcl to get the exact version numbers.)


OK, with Xcode and the dependencies installed, I followed the rest of the instructions, which, beyond building and installing CouchDB, show you how to create the required couchdb system account, install it as a service, and launch it at startup. All Leopard specific, but all highly convenient.

I also found this one-step install -- good for getting up and going, but in the long run I want to build and understand CouchDB a little more than this approach lets me.

Next up: simple CouchDB access.
