NDC2010: Rob Conery – I can’t hear you, there is an ORM in my ear

In this session Rob Conery tried to show off the new shiny toy, “NoSQL”, in as many forms as possible, discussing the pros and some of the cons. He based much of the talk on his experiences building TekPub and the requirements they had there. He tried to illustrate that this isn’t a magical kingdom and refrained from the usual snake-oil selling techniques. All in all a good presentation; content and comments below.


The talk
In his talk Rob showed two types of “NoSQL” databases: the graph (object) kind and the document kind. Rob described document databases as storage where JSON is serialized to BSON and stored as key/value pairs. For TekPub they were using MongoDB, and that’s the tool he used in his recorded (o.O) demos. He had some neat .NET code that created Session-like containers for MongoDB, and it looked simple to communicate with.

Graph (or object) databases Rob described as storage for serialized binary blobs, and he used DB4O to demonstrate.

Both demos had abstractions that hid how you actually communicate with the databases. Rob talked about the strengths and low friction of the model, but it was hard to see that from his demos because of the level of abstraction and the simple examples.

The conflict
A bit into the talk we hit a conflict. What about reporting? Can we do that with NoSQL databases? With most of them you can’t, and Rob showed a strategy of storing copies of the data in a relational database to use as a reporting store. This made it very clear that most NoSQL databases are built for specific purposes, while RDBMSs are general purpose.
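A minimal sketch of the report-store idea, assuming a plain Python dict as a stand-in for the document database and an in-memory SQLite database as the relational report store. The document shape, table name and copy_to_report_store function are hypothetical, not taken from Rob's demo:

```python
import sqlite3

# Hypothetical document store: a dict of JSON-like documents keyed by ID,
# standing in for something like MongoDB.
documents = {
    "order-1": {"customer": "alice", "items": [{"sku": "A", "price": 10.0}, {"sku": "B", "price": 5.0}]},
    "order-2": {"customer": "bob", "items": [{"sku": "A", "price": 10.0}]},
    "order-3": {"customer": "alice", "items": [{"sku": "C", "price": 2.5}]},
}


def copy_to_report_store(docs, conn):
    """Flatten every document into relational rows so reports can use plain SQL."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS order_lines (order_id TEXT, customer TEXT, sku TEXT, price REAL)"
    )
    conn.execute("DELETE FROM order_lines")  # simple full refresh of the copy
    for order_id, doc in docs.items():
        for item in doc["items"]:
            conn.execute(
                "INSERT INTO order_lines VALUES (?, ?, ?, ?)",
                (order_id, doc["customer"], item["sku"], item["price"]),
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
copy_to_report_store(documents, conn)

# An ad-hoc aggregate query, answered by the relational copy.
report = conn.execute(
    "SELECT customer, SUM(price) FROM order_lines GROUP BY customer ORDER BY customer"
).fetchall()
print(report)  # [('alice', 17.5), ('bob', 10.0)]
```

Because the copy runs after the fact, the report store is only as fresh as the last sync, which is where Eventual Consistency comes in.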

As a developer you also need to think differently and accept Eventual Consistency.

The discussion
I think Rob did a good job of introducing NoSQL to those who haven’t seen it, and he also raised one of the problems with the technology today: reporting. Like most developers, I like that there is new tech to solve my problems. NoSQL is interesting and, as I said before, I’ll play around with the tech. But as with most hypes, I’d be really careful about declaring the old king dead and the new king on the throne.

The NoSQL databases we see today are good for very specific scenarios: they scale well and have been serving large websites very well. They are immature, though; tooling, infrastructure support and the overall ecosystem are far behind the RDBMS products we have out there.

I like the idea of using “NoSQL” to feed websites, similar to caching data in memory or keeping “read copies” for the presentation layer, while still pushing data into better-suited models for BI and for supporting business processes.

NDC2010: Hadi Hariri – CouchDB For .NET Developers

Hadi speaks in one of the scariest rooms ever. It’s built on top of the sports arena, way, way up, with a ridiculously steep angle down to the stage. Butterflies in my stomach.


Prelude
Specifications today are written in the language of the client, but most of the data modeling is done in a relational model. Hadi talks about foreign keys and the idea that an invoice has to have a customer because the relational model requires a foreign key, plus a bunch of other things. This makes some changes hard to accommodate, like a client wanting to invoice an anonymous customer.
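To make the invoice example concrete, here is a small sketch using SQLite from Python. The tables, columns and try_insert helper are hypothetical; the point is that a typical schema makes the customer foreign key mandatory (NOT NULL), so invoicing an anonymous customer needs a schema change, while a document can simply leave the customer out:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when this is on
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE invoices ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " total REAL NOT NULL)"
)
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Acme')")


def try_insert(customer_id):
    """Attempt to store an invoice and report whether the schema accepted it."""
    try:
        conn.execute("INSERT INTO invoices (customer_id, total) VALUES (?, 50.0)", (customer_id,))
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


print(try_insert(1))     # known customer: accepted
print(try_insert(None))  # anonymous customer: violates NOT NULL
print(try_insert(42))    # unknown customer: violates the foreign key

# The document version of an anonymous invoice just has no customer part.
anonymous_invoice = {"type": "invoice", "total": 50.0, "lines": [{"text": "Consulting", "amount": 50.0}]}
```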

We have an impedance mismatch between how we look at the data and how the client looks at the data. When the client talks about invoices they see the actual invoice; when we see invoices we see the relational model.

In Domain-Driven Design et al. we model the client’s perspective using objects and then use an ORM to map between the business perspective and the relational model. In Hadi’s opinion this brings a heap of other problems when it comes to responding flexibly to change and implementing features.

Hadi’s session will show how to work with document databases using CouchDB.

The Tool
CouchDB is a document database that stores complete documents as JSON. It is ACID compliant at the level of each document and uses REST to access the stored documents. The initial engine was built in C++ but was ported to Erlang because of Erlang’s strength in concurrent operations, which is really what reading and writing data is about. It runs on a lot of platforms (Windows, Linux, Android, Maemo, Browser Couch with HTML 5 storage), and CouchDB can synchronize data between any number of instances on any type of platform.

CouchDB doesn’t have a schema. Well, there is a small one: every document has an ID (_id) and a revision (_rev). Beyond that you can store documents in any format you want and change that format during a document’s lifetime.
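A sketch of what that looks like, using two versions of the same hypothetical invoice document. The _rev values are placeholders (real ones are generated by CouchDB), and user_fields is a made-up helper rather than part of any CouchDB client library:

```python
import json

# First version of a hypothetical invoice. CouchDB itself only cares about _id and _rev.
invoice_v1 = {"_id": "invoice-1001", "_rev": "1-abc", "customer": "Acme", "total": 100.0}

# A later version with a different shape: no customer, invoice lines added.
# No schema migration is involved; the new revision simply has new fields.
invoice_v2 = {
    "_id": "invoice-1001",
    "_rev": "2-def",
    "lines": [{"text": "Consulting", "amount": 100.0}],
    "total": 100.0,
}


def user_fields(doc):
    """Hypothetical helper: the application's own fields, i.e. everything except
    the underscore-prefixed fields that CouchDB reserves for itself."""
    return {key: value for key, value in doc.items() if not key.startswith("_")}


print(json.dumps(user_fields(invoice_v1), sort_keys=True))
print(json.dumps(user_fields(invoice_v2), sort_keys=True))
```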

Since CouchDB is built as a REST-based database, everything is done through HTTP (GET, PUT, POST and DELETE) and JSON, which makes it very easy to access, either by sending your own HTTP requests or by using the browser-based admin app Futon.
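As an illustration of how thin the protocol is, the sketch below only builds the HTTP requests with Python's standard library and does not send them. It assumes a local server on CouchDB's default port 5984 and a hypothetical invoices database; couch_request is a made-up helper and the revision in the DELETE URL is a placeholder:

```python
import json
import urllib.request

BASE_URL = "http://localhost:5984"  # assumption: a local CouchDB on its default port


def couch_request(method, path, body=None):
    """Build (but do not send) a JSON-over-HTTP request against the CouchDB API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(BASE_URL + path, data=data, method=method)
    request.add_header("Content-Type", "application/json")
    return request


examples = [
    couch_request("PUT", "/invoices"),  # create the (hypothetical) invoices database
    couch_request("PUT", "/invoices/invoice-1001", {"total": 100.0}),  # create a doc with our own ID
    couch_request("POST", "/invoices", {"total": 50.0}),  # let CouchDB generate the ID
    couch_request("GET", "/invoices/invoice-1001"),  # read a document back
    couch_request("DELETE", "/invoices/invoice-1001?rev=1-abc"),  # delete needs the current rev
]

for request in examples:
    print(request.get_method(), request.full_url)

# With a server running, urllib.request.urlopen(examples[0]) would actually send one.
```

Updating is another PUT to the document's URL with the current _rev included in the body. Futon is served by the same server, under /_utils.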

Writing and versions
CouchDB uses an insert-only approach, so each write to a document creates a new revision of that document. This means you can track changes through the document’s lifetime, but it also means that there are no locks, and thus data consistency follows the Eventual Consistency paradigm.

CouchDB enforces optimistic locking by requiring that you pass the current revision number when updating a document.
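The write model can be sketched as a toy in-memory store: every write appends a new revision instead of overwriting, and an update is rejected unless it names the latest revision, much like CouchDB answering 409 Conflict. This is an illustration under simplifying assumptions (single process, linear history, a made-up revision format), not CouchDB's implementation; AppendOnlyStore and ConflictError are hypothetical names:

```python
import hashlib
import json


class ConflictError(Exception):
    """Raised when an update is based on a stale revision."""


class AppendOnlyStore:
    """Toy model of insert-only writes with optimistic locking.

    Revision IDs here are '<generation>-<md5 of body>', which mimics the shape
    of CouchDB's _rev values but not the exact way CouchDB computes them.
    """

    def __init__(self):
        self.history = {}  # doc_id -> list of (rev, body), oldest first

    def current_rev(self, doc_id):
        revs = self.history.get(doc_id)
        return revs[-1][0] if revs else None

    def put(self, doc_id, body, rev=None):
        # Optimistic check: the caller must name the latest revision (None for a new doc).
        if rev != self.current_rev(doc_id):
            raise ConflictError(f"{doc_id}: expected rev {self.current_rev(doc_id)}, got {rev}")
        generation = len(self.history.get(doc_id, [])) + 1
        digest = hashlib.md5(json.dumps(body, sort_keys=True).encode()).hexdigest()
        new_rev = f"{generation}-{digest}"
        self.history.setdefault(doc_id, []).append((new_rev, body))  # append, never overwrite
        return new_rev


store = AppendOnlyStore()
r1 = store.put("invoice-1001", {"total": 100.0})
r2 = store.put("invoice-1001", {"total": 120.0}, rev=r1)  # based on the latest rev: accepted

try:
    store.put("invoice-1001", {"total": 90.0}, rev=r1)  # stale rev: someone else wrote first
except ConflictError as exc:
    print("conflict:", exc)

print([rev.split("-")[0] for rev, _ in store.history["invoice-1001"]])  # ['1', '2']
```

In a real CouchDB, old revision bodies are dropped when the database is compacted, so the revisions serve concurrency control and replication rather than acting as a permanent audit log.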

Since CouchDB is distributed by nature, you can have many versions of a document spread out across your ecosystem until synchronization occurs. This makes it scalable but changes how we have to think about consistency.
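A simplified sketch of why replicas can disagree until they sync: each replica keeps a linear list of revision IDs per document, and a sync either fast-forwards or reports a conflict when both sides changed the same base revision. The sync function, replica names and revision labels are hypothetical. Real CouchDB keeps a revision tree, picks a deterministic winning revision and leaves the losing ones as conflicts for the application to resolve:

```python
def sync(local, remote):
    """Pull remote revision histories into local (both: doc_id -> list of revs, oldest first).

    Simplified model: a conflict is when neither history is a prefix of the
    other, i.e. both replicas edited the same base revision independently.
    """
    conflicts = []
    for doc_id, remote_revs in remote.items():
        local_revs = local.get(doc_id, [])
        if remote_revs[: len(local_revs)] == local_revs:
            local[doc_id] = list(remote_revs)  # remote is ahead or equal: fast-forward
        elif local_revs[: len(remote_revs)] == remote_revs:
            pass  # local is already ahead: nothing to pull
        else:
            conflicts.append(doc_id)  # divergent edits: needs resolution
    return conflicts


replica_oslo = {"invoice-1001": ["1-a", "2-oslo"], "invoice-1002": ["1-x"]}
replica_london = {"invoice-1001": ["1-a", "2-london"], "invoice-1002": ["1-x", "2-y"]}

conflicts = sync(replica_oslo, replica_london)
print(conflicts)                     # ['invoice-1001']
print(replica_oslo["invoice-1002"])  # ['1-x', '2-y']
```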

When creating new documents, the engine wants you to specify an ID. Anything can be an ID, but the database uses UUIDs by default and lets you generate them easily.
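A small sketch of the two options, assuming IDs are generated client-side in Python. new_doc_id is a hypothetical helper; uuid4().hex yields 32 lowercase hex characters, the same textual shape as the UUIDs CouchDB hands out from its /_uuids endpoint:

```python
import uuid


def new_doc_id(natural_key=None):
    """Hypothetical helper: use a meaningful key if there is one, else a random UUID."""
    if natural_key is not None:
        return natural_key
    return uuid.uuid4().hex  # 32 lowercase hex characters, random on every call


print(new_doc_id("invoice-1001"))  # invoice-1001
print(new_doc_id())                # a fresh random ID
```

Generating the ID up front and creating the document with PUT also means a retried request hits a conflict instead of creating a second copy, which can happen when retrying a POST that lets the server pick the ID.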

Queries
Since the REST API only lets you fetch documents by ID, you can’t do this:

select * from customers where email = 'foo@bar.com'

Instead you have to use Map/Reduce: you create a view, stored in a special document of its own (a design document), whose map function is a JavaScript function that runs against the documents and emits the data you need.

This means that all queries have to be defined in advance. There are no ad-hoc queries in CouchDB; you have to think about all of your queries up front.
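Here is roughly what the email lookup becomes. The design document follows CouchDB's view format, with the map function written in JavaScript and kept as a string; the design document name, view name and sample data are hypothetical. The map step and the key lookup are then emulated in Python to show how a pre-built, sorted index answers the query (simplified: plain Python string ordering instead of CouchDB's collation rules):

```python
import bisect

# Design document as it could be stored in CouchDB; the map function is JavaScript.
design_doc = {
    "_id": "_design/customers",
    "views": {
        "by_email": {
            "map": "function (doc) { if (doc.email) { emit(doc.email, doc.name); } }"
        }
    },
}
# Queried over HTTP as: GET /<db>/_design/customers/_view/by_email?key="foo@bar.com"


def map_by_email(doc):
    """Python stand-in for the JavaScript map function above."""
    if doc.get("email"):
        yield (doc["email"], doc.get("name"))


def build_view(docs, map_fn):
    """Run the map function over every document ahead of time; keep rows sorted by key."""
    rows = [(key, value, doc["_id"]) for doc in docs for key, value in map_fn(doc)]
    rows.sort(key=lambda row: row[0])
    return rows


def query(rows, key):
    """Exact-key lookup by binary search over the pre-built, sorted index."""
    keys = [row[0] for row in rows]
    lo, hi = bisect.bisect_left(keys, key), bisect.bisect_right(keys, key)
    return [{"id": doc_id, "key": k, "value": v} for k, v, doc_id in rows[lo:hi]]


docs = [
    {"_id": "c1", "name": "Foo", "email": "foo@bar.com"},
    {"_id": "c2", "name": "Baz", "email": "baz@bar.com"},
    {"_id": "c3", "name": "No email"},
]
view = build_view(docs, map_by_email)
print(query(view, "foo@bar.com"))  # [{'id': 'c1', 'key': 'foo@bar.com', 'value': 'Foo'}]
```

A question the view was not built for (say, customers by name) needs a new view, which is the "think about your queries in advance" point.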

The perspective

NoSQL is a curious thing, and CouchDB falls into that category. There are great advantages to using document databases if documents are all you want, but CouchDB won’t support things like reporting or BI.

CouchDB will probably find documents by ID really quickly, but I’m still skeptical about performance when it comes to more advanced queries, especially queries based on cross-references.

The architecture of CouchDB makes it simple to replicate and scale out instances across the internet, but I’d love to see some numbers on hardware versus performance. JavaScript and JSON are not fast, so it may need more hardware to achieve the same performance as other options.

All in all, I’ll probably pick up the tech and play around with it a bit to find good scenarios where it is a perfect match.