NDC2010: Rob Conery – I can’t hear you, there is an ORM in my ear

In this session Rob Conery tried to show off the shiny new toy “NoSQL” in as many forms as possible, discussing the pros and some of the cons. He based much of the talk on his experience building TekPub and the requirements they had there. He tried to make clear that this isn’t a magical kingdom and refrained from the usual snake-oil sales techniques. All in all a good presentation; content and comments below:


The talk
In his talk Rob discussed and demonstrated two types of “NoSQL” databases: the graph (object) kind and the document kind. Rob characterized a document database as storage where JSON is serialized to BSON and stored as key/value pairs. For TekPub they were using MongoDB, and that’s the tool he used in his recorded (o.O) demos. He had some neat .NET code that created Session-like containers for MongoDB, and communicating with the database looked simple.

Graph (or object) databases Rob characterized as storage of serialized binary blobs, and he used DB4O to demonstrate.
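To make Rob’s two categories a bit more concrete, here is a toy Python sketch (not MongoDB’s or DB4O’s API) in which plain dictionaries play the role of the database: the document-style store keeps each record as JSON text, while the object-style store keeps the serialized object itself as a binary blob via `pickle`. The `Invoice` class is hypothetical, and MongoDB actually stores BSON, a binary JSON-like format, rather than JSON text.

```python
import json
import pickle
from dataclasses import dataclass


@dataclass
class Invoice:
    """Hypothetical domain object used only for this illustration."""
    number: str
    total: float


# Document-style: the stored value is self-describing JSON text.
document_store = {}
document_store["invoice-1"] = json.dumps({"number": "invoice-1", "total": 99.0})

# Object-style: the stored value is the serialized object as an opaque binary blob.
object_store = {}
object_store["invoice-1"] = pickle.dumps(Invoice("invoice-1", 99.0))

print(type(document_store["invoice-1"]).__name__)  # str
print(type(object_store["invoice-1"]).__name__)    # bytes
print(json.loads(document_store["invoice-1"])["total"])  # 99.0
print(pickle.loads(object_store["invoice-1"]))  # the Invoice object comes back whole
```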

Both demos had abstractions that hid how you actually communicated with the databases. Rob talked more about the strengths and low friction of the model, but that was hard to see in his demos because of the level of abstraction and the simple examples.

The conflict
A bit into the talk we hit a conflict. What about reporting? Can we do that with NoSQL databases? With most of them you can’t, and Rob showed a strategy of storing copies of the data in a relational database to use as a reporting store. This made it very clear that most NoSQL databases are built for specific purposes, while an RDBMS is general purpose.
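A minimal sketch of that reporting strategy, assuming hypothetical order documents and using Python’s built-in `sqlite3` as a stand-in for the relational report store: documents are flattened into rows so that ordinary SQL aggregation works on them. This is not Rob’s TekPub code, only an illustration of the copy-to-relational idea.

```python
import sqlite3

# Hypothetical order documents as they might sit in a document database.
orders = [
    {"_id": "o1", "customer": "alice", "lines": [{"sku": "A", "qty": 2, "price": 10.0}]},
    {"_id": "o2", "customer": "bob", "lines": [{"sku": "A", "qty": 1, "price": 10.0},
                                                {"sku": "B", "qty": 3, "price": 5.0}]},
]


def copy_to_report_store(documents, conn):
    """Flatten nested documents into a relational table that reporting tools can query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS order_lines "
        "(order_id TEXT, customer TEXT, sku TEXT, qty INTEGER, price REAL)"
    )
    rows = [
        (doc["_id"], doc["customer"], line["sku"], line["qty"], line["price"])
        for doc in documents
        for line in doc["lines"]
    ]
    conn.executemany("INSERT INTO order_lines VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
copy_to_report_store(orders, conn)

# An ad hoc aggregate report, written in plain SQL against the copy.
report = conn.execute(
    "SELECT customer, SUM(qty * price) FROM order_lines GROUP BY customer ORDER BY customer"
).fetchall()
print(report)  # [('alice', 20.0), ('bob', 25.0)]
```

In a real setup the copy would be refreshed on a schedule or on each write, which means the report store is itself only eventually consistent with the document store.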

As a developer you also need to think differently and accept Eventual Consistency.

The discussion
I think Rob did a good job of introducing NoSQL to those who haven’t seen it, and he also raised one of the problems with the technology today: reporting. Like most developers I like that there is new tech to solve my problems; NoSQL is interesting and, as I said before, I’ll play around with it. But as with most hypes I’d be really careful about declaring the old king dead and the new king on the throne.

The NoSQL databases we see today are good for very specific scenarios; they scale well and have been serving large websites very well. They are immature though: tooling, infrastructure support and the overall ecosystem are far behind the RDBMS products we have out there.

I like the idea of using “NoSQL” to feed websites, similar to having data cached in memory or “read copies” for presentation, while still pushing data to better models for BI or for supporting business processes.

NDC2010: Hadi Hariri – CouchDB For .NET Developers

Hadi spoke in one of the scariest rooms ever. It’s built on top of the sports arena, way, way up, with a ridiculously steep angle down to the stage. Butterflies in my stomach.


Prelude
Specifications today are written in the language of the client, but most of the data modeling is done in a relational model. Hadi talked about foreign keys and the idea that an invoice has to have a customer because the relational model requires a foreign key, along with a bunch of other things. This makes some changes hard, like a client wanting to invoice an anonymous customer.
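The anonymous-customer problem can be reproduced in a few lines. This sketch assumes a hypothetical `customers`/`invoices` schema in SQLite, where `customer_id` is `NOT NULL` and references `customers`; the insert without a customer is rejected by the schema, while the document version of the same invoice simply omits the field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when this is on
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE invoices ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " total REAL NOT NULL)"
)


def invoice_anonymous_customer(conn, total):
    """Try to store an invoice without a customer; returns False if the schema refuses."""
    try:
        conn.execute("INSERT INTO invoices (customer_id, total) VALUES (NULL, ?)", (total,))
        return True
    except sqlite3.IntegrityError:  # NOT NULL constraint failed: invoices.customer_id
        return False


relational_ok = invoice_anonymous_customer(conn, 99.0)

# The same invoice as a document: no customer, and nothing forces one to exist.
invoice_document = {"_id": "invoice-1", "total": 99.0,
                    "lines": [{"text": "Consulting", "amount": 99.0}]}

print(relational_ok)                   # False
print("customer" in invoice_document)  # False
```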

We have an impedance mismatch between how we look at the data and how the client looks at the data. When the client talks about invoices they see the actual invoice; when we see invoices we see the relational model.

In Domain Driven Design, et al., we model the client’s perspective using objects and then use an ORM to map between the business perspective and the relational model. In Hadi’s opinion this brings a heap of other problems when it comes to responding to change and implementing features.

Hadi’s session would show how to work with document databases using CouchDB.

The Tool
CouchDB is a document database that stores complete documents as JSON. It is ACID compliant at the level of each document and uses REST to access the stored documents. The initial engine was built in C++ but was ported to Erlang because of its strength in concurrent operations, which is really what reading and writing data is about. It runs on a lot of platforms: Windows, Linux, Android, Maemo and Browser Couch (HTML 5 storage), and CouchDB can synchronize data between any number of instances on any type of platform.

CouchDB doesn’t have a schema. Well, there is a small one: an ID (_id) and a revision (_rev). This allows you to store documents in any format you want and change that format during the document’s lifetime.

Since CouchDB is built as a REST-based database, everything is done through HTTP (GET, POST, PUT and DELETE) and JSON, which makes it very easy to access, either by making your own HTTP requests or by using the browser-based admin app Futon.
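As a rough illustration of the REST style, the sketch below builds (without sending) the HTTP requests for the basic document operations using Python’s `urllib.request`. It assumes a CouchDB instance at `http://localhost:5984` (the default port) and a hypothetical `invoices` database; the `_rev` values `1-abc` and `2-def` are placeholders, since real ones are returned by the server on each write.

```python
import json
import urllib.request

COUCH = "http://localhost:5984"  # assumption: a local CouchDB on its default port


def couch_request(method, path, body=None):
    """Build (but do not send) a request against CouchDB's HTTP API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(COUCH + path, data=data, headers=headers, method=method)


planned = [
    couch_request("PUT", "/invoices"),                             # create the database
    couch_request("PUT", "/invoices/invoice-1", {"total": 99.0}),  # create a document with our own id
    couch_request("POST", "/invoices", {"total": 10.0}),           # create one and let the server pick the id
    couch_request("GET", "/invoices/invoice-1"),                   # read it back, including _id and _rev
    couch_request("PUT", "/invoices/invoice-1",
                  {"_rev": "1-abc", "total": 120.0}),             # update: body carries the current _rev (placeholder)
    couch_request("DELETE", "/invoices/invoice-1?rev=2-def"),      # delete: current rev in the query (placeholder)
]
for request in planned:
    print(request.get_method(), request.full_url)
    # To actually send one against a running server: urllib.request.urlopen(request)
```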

Writing and versions
CouchDB uses an insert-only approach, so each write to a document creates a new version of that document. This means you can track changes through the document’s lifetime, but it also means that there are no locks, and thus data consistency follows the Eventual Consistency paradigm.

CouchDB enforces optimistic locking through the rule that you have to pass the current revision number when updating a document.
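The revision rule can be mimicked in memory. This is a toy `ToyRevisionStore` (hypothetical, not CouchDB’s implementation): every save must carry the `_rev` it was based on, a stale `_rev` raises a conflict (CouchDB answers such a write with HTTP 409 Conflict), and a missing `_id` is filled in with a random UUID. The hash used in the revision string is only for illustration, and the toy keeps just the latest version of each document.

```python
import hashlib
import json
import uuid


class ConflictError(Exception):
    """Raised when a write is based on a stale revision."""


class ToyRevisionStore:
    """In-memory imitation of CouchDB-style optimistic locking; not CouchDB's real algorithm."""

    def __init__(self):
        self._docs = {}  # doc id -> latest version only

    def save(self, doc):
        doc = dict(doc)
        doc_id = doc.setdefault("_id", uuid.uuid4().hex)  # no id given: generate a UUID
        current = self._docs.get(doc_id)
        if current is not None and doc.get("_rev") != current["_rev"]:
            raise ConflictError(f"{doc_id}: based on {doc.get('_rev')}, latest is {current['_rev']}")
        generation = int(current["_rev"].split("-")[0]) + 1 if current else 1
        body = {k: v for k, v in doc.items() if k != "_rev"}
        digest = hashlib.md5(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()
        doc["_rev"] = f"{generation}-{digest}"  # "<generation>-<hash>", illustrative only
        self._docs[doc_id] = doc
        return doc_id, doc["_rev"]

    def get(self, doc_id):
        return dict(self._docs[doc_id])


store = ToyRevisionStore()
doc_id, rev1 = store.save({"_id": "invoice-1", "total": 99.0})
_, rev2 = store.save({"_id": doc_id, "_rev": rev1, "total": 120.0})  # based on the latest rev: accepted
try:
    store.save({"_id": doc_id, "_rev": rev1, "total": 150.0})          # based on a stale rev: rejected
    stale_rejected = False
except ConflictError:
    stale_rejected = True
print(rev2.split("-")[0], stale_rejected)  # 2 True
```

The caller that lost the race has to re-read the document, reapply its change on top of the latest revision and try again; nothing was locked while it worked.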

Since CouchDB is distributed by nature, you can have a lot of versions spread out across your ecosystem until synchronization occurs. This makes it scalable but changes how we have to think about consistency.
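Eventual consistency between replicas can be sketched the same way. In this toy Python model (hypothetical `Replica` and `replicate`, not CouchDB’s replication protocol) a write lands on one node first, a read on another node is stale until a one-way sync runs, and the newer generation wins. Real CouchDB replication also has to handle conflicting edits made on both sides, which this sketch ignores.

```python
class Replica:
    """A toy node that holds its own copy of every document it has seen."""

    def __init__(self, name):
        self.name = name
        self.docs = {}  # doc id -> (generation, body)

    def write(self, doc_id, body):
        generation = self.docs.get(doc_id, (0, None))[0] + 1
        self.docs[doc_id] = (generation, body)

    def read(self, doc_id):
        entry = self.docs.get(doc_id)
        return entry[1] if entry else None


def replicate(source, target):
    """One-way sync: copy every document the target lacks or holds at an older generation."""
    for doc_id, (generation, body) in source.docs.items():
        if generation > target.docs.get(doc_id, (0, None))[0]:
            target.docs[doc_id] = (generation, body)


oslo, london = Replica("oslo"), Replica("london")
oslo.write("invoice-1", {"total": 99.0})
before_sync = london.read("invoice-1")  # None: the write has not reached this replica yet
replicate(oslo, london)
after_sync = london.read("invoice-1")   # consistent once synchronization has run
print(before_sync, after_sync)
```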

When creating new documents, the engine wants you to specify an ID. Anything can be an ID, but the database uses UUIDs by default and lets you generate them easily.

Queries
Since the REST API only lets you query on IDs, you can’t do this:

select * from customers where email = 'foo@bar.com'

This means you have to use Map/Reduce: you create a map (a view, stored as another document), which is a JavaScript function that executes against the documents and returns the data you need.

This means that all queries have to be created up front. There are no ad hoc queries in CouchDB; you have to think about all of your queries in advance.
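To show what pre-created queries mean in practice, here is a Python stand-in for a CouchDB view. In CouchDB the map function is JavaScript and calls `emit(key, value)`; here a generator yields `(key, value)` pairs instead, and the view is rebuilt in memory rather than maintained as a persisted index. The documents and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical documents of two "types" living side by side in one database.
docs = [
    {"_id": "c1", "type": "customer", "email": "foo@bar.com", "name": "Foo"},
    {"_id": "c2", "type": "customer", "email": "baz@bar.com", "name": "Baz"},
    {"_id": "i1", "type": "invoice", "customer_id": "c1", "total": 99.0},
    {"_id": "i2", "type": "invoice", "customer_id": "c1", "total": 1.0},
    {"_id": "i3", "type": "invoice", "customer_id": "c2", "total": 50.0},
]


def by_email(doc):
    """Map: index customers by email (in CouchDB: emit(doc.email, doc.name))."""
    if doc.get("type") == "customer":
        yield doc["email"], doc["name"]


def invoice_totals(doc):
    """Map: emit each invoice total under its customer id."""
    if doc.get("type") == "invoice":
        yield doc["customer_id"], doc["total"]


def build_view(docs, map_fn):
    """Run the map over every document up front; rows are sorted by key, then document id."""
    rows = [(key, value, doc["_id"]) for doc in docs for key, value in map_fn(doc)]
    return sorted(rows, key=lambda row: (row[0], row[2]))


def query(view, key):
    """Exact-key lookup (CouchDB views also support key ranges, omitted here)."""
    return [row for row in view if row[0] == key]


def reduce_sum_grouped(view):
    """Reduce: sum the values per key, similar in spirit to a grouped _sum reduce."""
    totals = defaultdict(float)
    for key, value, _ in view:
        totals[key] += value
    return dict(totals)


email_view = build_view(docs, by_email)
totals_view = build_view(docs, invoice_totals)
print(query(email_view, "foo@bar.com"))  # [('foo@bar.com', 'Foo', 'c1')]
print(reduce_sum_grouped(totals_view))   # {'c1': 100.0, 'c2': 50.0}
```

The email lookup from the SQL example only works because `by_email` was written first; a question nobody anticipated needs a new view.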

The perspective

NoSQL is a curious thing and CouchDB falls into that category. There are great advantages in using document databases if documents are all you want. It won’t support things like reporting or BI.

CouchDB will probably find documents really quickly by ID, but I’m still skeptical about performance when it comes to more advanced queries, especially if you need queries based on cross-references.

The architecture of CouchDB lets you simply replicate and scale out instances across the internet, but I’d love to see some numbers on hardware vs. performance. JavaScript and JSON are not quick, so it may need more hardware to achieve the same performance as other options.

All in all, I’ll probably pick up the tech and play around with it a bit to find good scenarios where it is a perfect match.

NDC2010: Chris Sells on Data


The first session is over: Chris Sells on data. He mostly stated the obvious and tried to sell the idea that reports of M and Oslo’s death are greatly exaggerated and that they will be the next big thing. It’s over, let it go.

Chris’s position on data
Chris started off by stating his position on data: it can be saved in many forms (graphs, trees or tables), but it seems we as an industry more often than not revert to tables, since they bring the most utility of the three for general purposes. Later in the talk he spoke about NoSQL and how a lot of these technologies solve interesting problems, often around scale, but warned the audience (being the engineers they are) against thinking that the new shiny toy comes without flaws or drawbacks. Every tool has its place in the ecosystem.

Data for everyone, really everyone
An interesting point he made, though, one that sends chills down my spine, is that the availability of and access to data is changing. It’s not the change itself that gives me the creeps, it’s how he and the team he works for envision the change. Chris drew a parallel with Excel and how good it was, how it allowed everyone to be a “programmer”; his vision was that with Microsoft’s OData and things like Excel Power Pivot, everyone will be able to query data and put it in their programs. As if there isn’t enough Excel mess to clean up in the world?! But hey, at least it’s consultant friendly.

Perspective
Chris concluded his talk by saying that how we think about data is changing, how we expose and are exposed to data is changing, and that no matter what we do, data will be what’s important (he also said that behavior was “the 90’s”, meh?). I agree that data is important, how we store data is important and how we access data is important. But data isn’t just there to be entered, read or drawn diagrams from. A huge portion of data is used to support business processes and make them easier. Excel doesn’t help with that, neither does OData (and certainly not M). So even with all these shiny new toys Microsoft will be putting out, we’ll still build our software as we used to, just with more options.

Performance differences between LINQ To SQL and NHibernate

In my current project one of the actions I’ve taken is to move the project and team away from Linq To Sql to NHibernate. There were a multitude of issues behind this move; some of the main reasons are outlined in my post “Top 10 reasons to think twice about using Linq To Sql in your project”, but there were also others, like the inability to tweak Linq To Sql to perform in different scenarios, which often led to stored procedures. NHibernate has a multitude of buttons to push and lets you tweak almost every aspect of data access, which gives us more options.

This post is not about the lack of optimization options in LTS, nor about the options in NHibernate. It’s just to illustrate a simple truth: NHibernate is more mature and has had time to optimize core functionality like object creation (materialization). Compare these two profiler measurements:

Linq To Sql: materialization 703 ms, DB access 7 ms

NHibernate: materialization 159 ms, DB access 7 ms
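The same two measurements as plain arithmetic: materialization alone is roughly 4.4 times slower with Linq To Sql, and in both cases almost all of the measured time (materialization plus DB access) is spent materializing objects rather than in the database.

```latex
\frac{703\,\text{ms}}{159\,\text{ms}} \approx 4.4
\qquad
\frac{703}{703 + 7} \approx 99\,\% \;(\text{LTS})
\qquad
\frac{159}{159 + 7} \approx 96\,\% \;(\text{NHibernate})
```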

Unfortunately I can’t show the real model that we loaded, but it was an aggregate with a couple of lists nested 3 levels deep.

Another thing to note here: Linq To Sql needed a stored procedure to load this (because load spans work really, really badly at depths deeper than 2 levels; read more here: LINQ To SQL: Support for Eager Loading, Really?). NHibernate used a couple of batched select queries that were built using NHibernate Criteria and the Future feature.
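The “one select per level” idea can be sketched with Python’s `sqlite3`. This is not NHibernate’s Criteria or Future API; it is a hypothetical three-level `orders`/`lines`/`shipments` model loaded with one query per level (an `IN (...)` over the parent ids) and stitched together in memory, which avoids both one query per child row (N+1) and a single wide join that repeats parent data. NHibernate’s Futures additionally send such queued queries to the database in a single round trip, which this sketch does not attempt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders    (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE lines     (id INTEGER PRIMARY KEY, order_id INTEGER, sku TEXT);
    CREATE TABLE shipments (id INTEGER PRIMARY KEY, line_id INTEGER, carrier TEXT);
    INSERT INTO orders    VALUES (1, 'alice');
    INSERT INTO lines     VALUES (10, 1, 'A'), (11, 1, 'B');
    INSERT INTO shipments VALUES (100, 10, 'DHL'), (101, 10, 'UPS'), (102, 11, 'DHL');
""")


def select_in(conn, sql, ids):
    """One SELECT for a whole level: WHERE <col> IN (?, ?, ...) over all parent ids."""
    if not ids:
        return []
    placeholders = ", ".join("?" for _ in ids)  # only placeholders are formatted in; values stay bound
    return conn.execute(sql.format(placeholders), list(ids)).fetchall()


def load_order(conn, order_id):
    """Load a three-level aggregate with exactly one query per level."""
    oid, customer = conn.execute(
        "SELECT id, customer FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    lines = select_in(conn, "SELECT id, sku FROM lines WHERE order_id IN ({}) ORDER BY id", [oid])
    shipments = select_in(
        conn,
        "SELECT line_id, carrier FROM shipments WHERE line_id IN ({}) ORDER BY id",
        [line_id for line_id, _ in lines],
    )
    # Stitch the flat result sets back into an object graph in memory.
    carriers_by_line = {}
    for line_id, carrier in shipments:
        carriers_by_line.setdefault(line_id, []).append(carrier)
    return {
        "id": oid,
        "customer": customer,
        "lines": [{"sku": sku, "shipments": carriers_by_line.get(line_id, [])} for line_id, sku in lines],
    }


aggregate = load_order(conn, 1)
print(aggregate)
```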

Look at the database execution times: both are 7 ms. So can we please kill the “stored procedures are faster than ORM-generated queries” debate now? Each of these scenarios was run on the same database with the same indexes, and both databases were equally warmed up before the runs.