Hadi speaks in one of the scariest rooms ever. It’s built at the top of this sports arena, way, way up, with a ridiculously steep angle down to the stage. Butterflies in my stomach!
Prelude
Specifications today are written in the language of the client, but most of the data modeling is done in a relational model. Hadi talks about foreign keys and the idea that an invoice has to have a customer because the relational model requires a foreign key, along with a bunch of other constraints. This makes some changes inflexible, such as a client wanting to invoice an anonymous customer.
We have an impedance mismatch between how we look at the data and how the client looks at the data. When the client talks about invoices they see the actual invoice; when we see invoices we see the relational model.
In Domain Driven Design, et al., we model the client’s perspective using objects and then use an ORM to map between the business perspective and the relational model. In Hadi’s opinion, this brings a heap of other problems when it comes to responding to change and implementing features.
Hadi’s session will show how to work with document databases using CouchDB.
The Tool
CouchDB is a document database that stores complete documents as JSON. It is ACID compliant at the level of each individual document and uses REST to access the stored documents. The initial engine was built in C++ but was ported to Erlang because of its strength in concurrent operations, which is really what reading and writing data is about. It works on a lot of platforms (Windows, Linux, Android, Maemo and BrowserCouch on HTML5 storage), and CouchDB is able to synchronize data between any number of instances on any type of platform.
CouchDB doesn’t have a schema. Well, there is a small one: every document has an ID (_id) and a revision (_rev). Beyond that you can store documents in any format you want and change that format during the document’s lifetime.
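To make the schema-free point concrete, here is a small sketch of two invoice documents that could sit side by side in the same database, one of them the anonymous-customer case from the prelude. The field names, IDs and the total() helper are hypothetical; only _id (and _rev, once the server has stored a document) carry special meaning to CouchDB.

```python
import json

# Two hypothetical documents in the same database. Only "_id" is special here;
# CouchDB adds "_rev" when a document is stored. Everything else is free-form.
invoice_with_customer = {
    "_id": "invoice-1001",
    "type": "invoice",
    "customer": {"name": "Contoso AB", "email": "billing@contoso.example"},
    "lines": [{"item": "Consulting", "amount": 1200}],
}

# The anonymous customer: the field is simply absent, so there is no
# foreign-key rule to relax.
anonymous_invoice = {
    "_id": "invoice-1002",
    "type": "invoice",
    "lines": [{"item": "Workshop seat", "amount": 300}],
}


def total(invoice):
    """Sum the invoice lines; works for either document shape."""
    return sum(line["amount"] for line in invoice["lines"])


for doc in (invoice_with_customer, anonymous_invoice):
    # CouchDB stores documents as JSON, so json.dumps shows roughly what goes over the wire.
    print(json.dumps(doc, sort_keys=True))
    print(doc.get("customer", {}).get("name", "<anonymous>"), total(doc))
```

Changing the shape later (say, adding a "discount" field to new invoices) needs no migration step. The application just has to cope with documents that lack the field, as total() and the .get() calls do here.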
Since CouchDB is a REST-based database, everything is done through HTTP (GET, POST, PUT and DELETE) and JSON. That makes it very easy to access, either by issuing your own HTTP requests or by using the browser-based admin app, Futon.
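As a rough sketch of what "everything is HTTP" means in practice, the snippet below only builds (and does not send) a few typical requests with Python’s standard library. It assumes a local server on CouchDB’s default port 5984 and a hypothetical database named "invoices". The revision string "1-abc" is a made-up placeholder, not a real revision.

```python
import json
import urllib.request

# Assumptions: a local CouchDB on its default port and a hypothetical "invoices" database.
BASE = "http://localhost:5984"
DB = "invoices"


def couch_request(method, path, body=None):
    """Build (but do not send) an HTTP request against the CouchDB REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(f"{BASE}/{path}", data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


reqs = [
    couch_request("PUT", DB),                                     # create the database
    couch_request("PUT", f"{DB}/invoice-1001", {"amount": 1200}), # create a document with a chosen ID
    couch_request("POST", DB, {"amount": 300}),                   # create a document, server picks the ID
    couch_request("GET", f"{DB}/invoice-1001"),                   # read a document by ID
    couch_request("DELETE", f"{DB}/invoice-1001?rev=1-abc"),      # delete: needs the current revision (placeholder)
]

for r in reqs:
    print(r.get_method(), r.full_url)

# Sending one requires a running server:
# with urllib.request.urlopen(reqs[3]) as resp:
#     print(json.load(resp))
```

Futon does nothing more magical than this; it is a browser front end that issues the same kind of requests.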
Writing and versions
CouchDB uses an insert-only approach, so each write to a document creates a new version of that document. This means you can track changes through the document’s lifetime, but it also means that there are no locks, and data consistency follows the Eventual Consistency paradigm.
CouchDB enforces optimistic locking by requiring that you pass the document’s current revision number when you update it.
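The revision rule is easiest to see with two clients racing on the same document. The class below is a toy in-memory store written only to illustrate the idea. It is not how CouchDB is implemented. The Conflict exception stands in for CouchDB’s HTTP 409 response, and the revision strings merely imitate CouchDB’s "N-hash" shape.

```python
import uuid


class Conflict(Exception):
    """Stands in for CouchDB's HTTP 409 Conflict response."""


class ToyStore:
    """Toy in-memory store mimicking insert-only, revision-checked writes (illustration only)."""

    def __init__(self):
        self.history = {}  # doc id -> list of stored versions; old versions are never overwritten

    def put(self, doc_id, doc):
        versions = self.history.setdefault(doc_id, [])
        current_rev = versions[-1]["_rev"] if versions else None
        if doc.get("_rev") != current_rev:
            # The caller based its change on a stale (or missing) revision.
            raise Conflict(f"{doc_id}: expected _rev {current_rev!r}, got {doc.get('_rev')!r}")
        generation = len(versions) + 1
        stored = dict(doc, _id=doc_id, _rev=f"{generation}-{uuid.uuid4().hex}")
        versions.append(stored)  # insert-only: a new version is appended
        return stored["_rev"]

    def get(self, doc_id):
        return dict(self.history[doc_id][-1])


store = ToyStore()
rev1 = store.put("invoice-1001", {"amount": 1200})

alice = store.get("invoice-1001")  # both clients read revision 1
bob = store.get("invoice-1001")

alice["amount"] = 1100
store.put("invoice-1001", alice)   # succeeds and creates revision 2

bob["amount"] = 900
conflicted = False
try:
    store.put("invoice-1001", bob)  # bob still holds revision 1
except Conflict as err:
    conflicted = True
    print("conflict:", err)
```

Nobody ever waits on a lock. The losing writer is told its copy is stale and has to re-read, merge and retry with the newer revision.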
Since CouchDB is designed to be distributed, you can have many versions of a document spread across your ecosystem until synchronization occurs. This makes it scalable, but it changes how we have to think about consistency.
When creating new documents, the engine wants you to specify an ID. Anything can be an ID, but the database uses UUIDs by default and lets you generate them easily.
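Here is a small sketch of the two ways to pick an ID. Generating a UUID on the client with uuid4 is our own choice and not necessarily the algorithm the server uses; it just produces the same 32-character hex shape. The invoice_id() naming scheme is hypothetical. CouchDB can also hand out UUIDs itself through its /_uuids endpoint.

```python
import uuid


def new_doc_id():
    """Client-side ID: 32 lowercase hex characters (uuid4 chosen here as an assumption)."""
    return uuid.uuid4().hex


def invoice_id(number):
    """Any string can be an ID, so a natural key works too (hypothetical format)."""
    return f"invoice-{number}"


# Server-side alternative (needs a running server):
#   GET http://localhost:5984/_uuids?count=3  ->  {"uuids": [...]}
ids = [new_doc_id() for _ in range(3)]
print(ids, invoice_id(1001))
```

A meaningful ID such as "invoice-1001" saves you a view when you already know the key you will look documents up by. Random UUIDs avoid collisions when many replicas create documents independently.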
Queries
Since the REST API only allows you to query on IDs, you can’t do this:
select * from customers where email = ''
This means you have to use MapReduce: you create a map (a view, stored as another document), which is a JavaScript function that runs against the documents and returns the data you need.
So all queries have to be created in advance. There are no ad-hoc queries in CouchDB; you have to think through all of your queries up front.
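Here is a rough sketch of how the email query above turns into a view. The design document holds the map function as JavaScript source, which is the form CouchDB expects. The Python functions then simulate what the resulting index does: run the map over every document, keep the emitted rows sorted by key, and look them up by key. The names "customers", "by_email" and the database "shop" are hypothetical, and plain Python string ordering is a simplification of CouchDB’s real key collation.

```python
import json
from urllib.parse import quote

# Design document as stored in CouchDB: the view's map function is JavaScript source.
design_doc = {
    "_id": "_design/customers",
    "views": {
        "by_email": {
            "map": "function (doc) { if (doc.type === 'customer' && doc.email) { emit(doc.email, doc.name); } }"
        }
    },
}


def by_email_map(doc):
    """Python stand-in for the JavaScript map function: yields (key, value) rows."""
    if doc.get("type") == "customer" and doc.get("email"):
        yield doc["email"], doc.get("name")


def build_view(docs, map_fn):
    """Simulate the view index: map every document, keep rows sorted by (key, doc id)."""
    rows = [{"id": d["_id"], "key": k, "value": v} for d in docs for k, v in map_fn(d)]
    return sorted(rows, key=lambda r: (r["key"], r["id"]))


def query(rows, key):
    """Rough equivalent of ?key=... on the view: exact match on the emitted key."""
    return [r for r in rows if r["key"] == key]


docs = [
    {"_id": "c1", "type": "customer", "name": "Ada", "email": "ada@example.com"},
    {"_id": "c2", "type": "customer", "name": "Linus", "email": "linus@example.com"},
    {"_id": "i1", "type": "invoice", "amount": 300},  # not a customer, so it emits nothing
]

index = build_view(docs, by_email_map)
print(query(index, "ada@example.com"))

# Against a real server the lookup is a GET on the view; the key must be JSON-encoded.
url = ("http://localhost:5984/shop/_design/customers/_view/by_email?key="
       + quote(json.dumps("ada@example.com")))
print(url)
```

This is why the queries have to be planned: each question you want to ask becomes its own emit() in some view, and the index is built ahead of the request rather than at query time.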
The perspective
NoSQL is a curious thing, and CouchDB falls into that category. There are great advantages to using document databases if documents are all you want. It won’t support things like reporting or BI.
CouchDB will probably find documents by ID really quickly, but I’m still skeptical about performance when it comes to more advanced queries, especially if you need queries based on cross-references.
The architecture of CouchDB makes it simple to replicate and scale out instances across the internet, but I’d love to see some numbers on hardware vs. performance. JavaScript and JSON are not quick, so maybe it needs more hardware to achieve the same performance as other options.
All in all, I’ll probably pick up the tech and play around with it a bit to find good scenarios where it is a perfect match.
#1 by Fredrik Normén on June 16th, 2010
Check out MongoDB if you haven’t, and of course the most interesting DB at the moment, Neo4j, a graph database.
#2 by Jonas Elfström on June 16th, 2010
Make sure to also check out MongoDB. It’s kind of a hybrid. I’ve played with it some and it’s extremely fast and easy to work with. I really like how the impedance between the object model and the relational model almost goes away. As a .NET guy you might also want to check out RavenDB.
#3 by Patrik Löwendahl on June 16th, 2010
Thanks for the tips. MongoDB was used by Rob Conery in a later talk, but he didn’t really “show” it.
I’ll figure out a nice little scenario to try out.