CouchDB and Me

Thursday, October 8th, 2009 1:20 am

So, on the advice of my friend Yuce, I have begun blogging about various types of software, technology, and whatever else I see fit to project onto the masses. I decided to start with a recent “crush”: CouchDB.

Relational databases are simple. Anyone with experience using spreadsheet software understands the basic idea: a series of rows conforming to a specific, predefined set of columns and data types. A lot of data can easily be organized in this manner, and most data can be roughly shoved into it. However, ubiquitous does not necessarily mean appropriate.

CouchDB is just one example of a document store. As opposed to a relational database, a document store lets you insert an object with almost any structure and wildly disparate data types. These objects are inserted into a single “database” (which should really be renamed to something more akin to ‘documents with some sort of unified meaning’) and can be retrieved in different ways: you can grab all of the documents, group documents by certain attributes, or extract data from various objects and correlate it into entirely new data structures on the fly.
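To make the idea concrete, here is a small in-memory sketch in Python. It is not CouchDB and not its API: a plain dictionary stands in for the single “database”, documents of very different shapes are stored side by side under an `_id` field (mirroring the field name CouchDB uses), and two hypothetical helpers, `all_docs` and `group_by`, show the "grab everything" and "group by an attribute" styles of retrieval.

```python
import json

# Hypothetical stand-in for one CouchDB "database": schema-free documents keyed by _id.
# CouchDB itself persists documents and serves them over HTTP; this is only an illustration.
database = {}

def insert(doc):
    """Store a document of any shape under its _id, round-tripping it through JSON."""
    database[doc["_id"]] = json.loads(json.dumps(doc))

# Two documents with wildly different structures live in the same collection.
insert({"_id": "post-1", "type": "post", "title": "CouchDB and Me",
        "tags": ["couchdb", "json"]})
insert({"_id": "comment-1", "type": "comment", "post": "post-1",
        "author": {"name": "Yuce"}, "likes": 3})

def all_docs():
    """Return every document, ordered by _id (loosely like CouchDB's _all_docs)."""
    return [database[key] for key in sorted(database)]

def group_by(attribute):
    """Group document ids by one attribute's value, skipping docs without it.

    The attribute's values must be hashable (strings, numbers), not lists or dicts.
    """
    groups = {}
    for doc in all_docs():
        if attribute in doc:
            groups.setdefault(doc[attribute], []).append(doc["_id"])
    return groups

print([d["_id"] for d in all_docs()])
print(group_by("type"))
```

Nothing forces the two documents to share fields: `group_by("likes")` simply skips the post, which has no `likes` at all.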

This is accomplished with the Map / Reduce paradigm, where (in brief) you Map (or group) document data by a key or attribute, and then optionally Reduce it (correlate, extract, summarize, etc.) with a second function that combines the values collected for each key. This is wildly different from the simple data abstraction and union operations of traditional relational databases, and it provides an almost unwieldy amount of freedom.
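The Map / Reduce flow can be modeled in a few lines of Python. This is a simplified sketch, not CouchDB's view engine: `map_fn` plays the role of a view's map function by yielding `(key, value)` pairs (in CouchDB, map functions are JavaScript and call `emit(key, value)`), the resulting rows are sorted by key as a view's rows are, and `reduce_fn` collapses the values for each distinct key. Real CouchDB reduce functions also receive the keys and a `rereduce` flag, which this sketch ignores. The names `run_view` and `tags_map` and the sample documents are hypothetical.

```python
from itertools import groupby

def run_view(docs, map_fn, reduce_fn=None):
    """Map each document to (key, value) rows, sort by key, and optionally reduce per key."""
    rows = []
    for doc in docs:
        # map_fn yields zero or more (key, value) pairs, much like emit() in a view
        rows.extend(map_fn(doc))
    # Sort on the key only; in Python the keys must be mutually comparable
    rows.sort(key=lambda row: row[0])
    if reduce_fn is None:
        return rows
    # One reduced value per distinct key (similar in spirit to querying with group=true)
    return [(key, reduce_fn([value for _, value in group]))
            for key, group in groupby(rows, key=lambda row: row[0])]

# Hypothetical sample documents with different shapes
docs = [
    {"_id": "a", "type": "post", "tags": ["couchdb", "json"]},
    {"_id": "b", "type": "post", "tags": ["couchdb"]},
    {"_id": "c", "type": "comment", "text": "Nice!"},
]

def tags_map(doc):
    """Map step: emit (tag, 1) for every tag on every post; ignore other documents."""
    if doc.get("type") == "post":
        for tag in doc.get("tags", []):
            yield (tag, 1)

print(run_view(docs, tags_map))       # [('couchdb', 1), ('couchdb', 1), ('json', 1)]
print(run_view(docs, tags_map, sum))  # [('couchdb', 2), ('json', 1)]
```

The map step alone already produces a new structure (an index of posts by tag) that no document contains; the reduce step then correlates those rows into per-tag counts.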

What makes CouchDB interesting is the confluence of four factors: access interface, storage notation, retrieval language, and replication process. It uses a simple RESTful interface, meaning that data can be retrieved with plain URLs and inserted with simple POST calls. The data itself is created and stored in JSON, something nearly every web developer worth his mouse has worked with. As such, there is almost no need for a data-to-object mapper: many languages (PHP, Python, and obviously JavaScript) can convert JSON into native objects, either by default or with simple additional libraries. The Map / Reduce logic is written in JavaScript and stored in the database itself. Finally, replication is a single HTTP request naming a source and a destination (each of which can be local or remote), and it runs at a fairly rapid rate.
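Here is a rough sketch, using only Python's standard library, of what those requests look like. It builds `urllib.request.Request` objects without sending them. It assumes a local server on CouchDB's default port, 5984, and a hypothetical database named `blog`; the design document `blog`, the view `by_tag`, and the backup host are likewise made up. The paths (`POST /{db}`, `GET /{db}/_all_docs`, `/{db}/_design/{ddoc}/_view/{view}`, and `POST /_replicate` with a `source` and `target`) follow CouchDB's HTTP API, though details can vary between versions.

```python
import json
from urllib.request import Request

COUCH = "http://localhost:5984"  # assumption: a local CouchDB on its default port
DB = "blog"                      # hypothetical database name

def json_request(method, path, body=None):
    """Build (but do not send) an HTTP request with an optional JSON body."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return Request(COUCH + path, data=data, headers=headers, method=method)

# Insert a document: POST the JSON object to the database URL (the server assigns an _id).
create = json_request("POST", "/" + DB, {"type": "post", "title": "CouchDB and Me"})

# Fetch every document: a plain GET against a URL.
fetch_all = json_request("GET", "/" + DB + "/_all_docs")

# Query a view whose map/reduce functions live in a (hypothetical) design document.
view = json_request("GET", "/" + DB + "/_design/blog/_view/by_tag?group=true")

# Replicate: one request naming a source and a target, each local or remote.
replicate = json_request("POST", "/_replicate",
                         {"source": DB, "target": "http://backup.example.com:5984/blog"})

for req in (create, fetch_all, view, replicate):
    print(req.get_method(), req.full_url)
```

Passing any of these to `urllib.request.urlopen` would actually send it, and calling `json.load` on the response turns the JSON reply straight into Python dictionaries and lists, which is why a separate data-to-object mapper is rarely needed.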

These factors combine to create a system that is a) accessible to a wide range of web developers, b) trivial to use from a variety of languages, and c) easily replicated, transported, and backed up. Any web developer should be able to hack together a JSON object and write the simpler Map / Reduce logic, and in doing so move storage from an overly fragmented table design to a simple, extensible document paradigm.

In future blog posts, I’ll write about compiling CouchDB on different systems, the potential pitfalls this new approach creates, and my own personal adventure with the system. Already I’m rediscovering that age-old technology problem: when you really like your new, shiny hammer, every problem starts to look like a nail.

2 Responses to “CouchDB and Me”

  1. Yuce Tekol says:

    Josh, this is a terrific article, thanks for it.
    Looking forward to the next one!

  2. [...] Pondering the Obvious Josh Marshall posting about technology and other tools of destruction. « CouchDB and Me [...]
