CouchDB - A use case - Kore Nordmann - PHP / Projects / Politics

By Kore Nordmann, first published at Thu, 28 Aug 2008 22:49:08 +0200

CouchDB - A use case

This article gives an overview of how CouchDB can be used to map application data to a data store, and how this differs fundamentally from relational databases. I introduce the basic concepts and then show a more advanced use case.

The application

The problem I try to solve is the typical example of group-based user permission management. Let's take a look at how this is commonly solved in an RDBMS (relational database management system):

    +------+           +-------+           +------------+
    | user |           | group |           | permission |
    +------+           +-------+           +------------+
    | id   | <- n:m -> | id    | <- n:m -> | id         |
    | name |           | name  |           | name       |
    | ...  |           | ...   |           | ...        |
    +------+           +-------+           +------------+

We basically have a list of permissions associated with each group, and each user is associated with any number of groups. In the database you of course need to create the five required tables for this (the three tables shown plus one join table for each n:m relation), and then you can fetch the data with a simple query using some JOINs. We have been doing this for years, and it can be considered basic craftsmanship. So, how can this be done with CouchDB?
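To make the relational baseline concrete, here is a minimal in-memory sketch in JavaScript of the five tables and the lookup the JOINs perform. The join table names `userGroup` and `groupPermission`, the second group and all sample rows are assumptions made up for illustration; the schema above only names the three entity tables.

```javascript
// Hypothetical sample data: three entity tables plus two n:m join tables.
const users = [{ id: 1, name: "kore" }, { id: 2, name: "john" }];
// The group table is listed for completeness; the lookup only needs its IDs.
const groups = [{ id: 10, name: "Users" }, { id: 11, name: "Admins" }];
const permissions = [
    { id: 100, name: "wiki-read" },
    { id: 101, name: "wiki-write" },
    { id: 102, name: "wiki-admin" },
];
const userGroup = [
    { userId: 1, groupId: 10 },
    { userId: 1, groupId: 11 },
    { userId: 2, groupId: 10 },
];
const groupPermission = [
    { groupId: 10, permissionId: 100 },
    { groupId: 10, permissionId: 101 },
    { groupId: 11, permissionId: 100 },
    { groupId: 11, permissionId: 102 },
];

// Roughly what a query joining user, the two join tables and permission
// would return, with DISTINCT applied to the permission names.
function permissionsOfUser(userName) {
    const user = users.find((u) => u.name === userName);
    if (!user) {
        return [];
    }
    const groupIds = userGroup
        .filter((ug) => ug.userId === user.id)
        .map((ug) => ug.groupId);
    const permissionIds = groupPermission
        .filter((gp) => groupIds.includes(gp.groupId))
        .map((gp) => gp.permissionId);
    const names = permissions
        .filter((p) => permissionIds.includes(p.id))
        .map((p) => p.name);
    return [...new Set(names)];
}

console.log(permissionsOfUser("kore"));
```

The point of the sketch is only the amount of indirection: resolving one user's permissions touches four of the five tables.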

The basic documents

In CouchDB we do not talk about rows or database entries - we talk about documents. We have a database which contains all documents required for our application, and the database itself puts no constraints on the data contained in those documents. The documents themselves are just arbitrary JSON objects - with any number of attributes and any depth of nesting of objects and arrays in the properties.
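As an illustration, a user document might look like the following JavaScript object literal. Apart from `_id`, which every CouchDB document has, all property names and values here are made-up assumptions; nothing in CouchDB prescribes them.

```javascript
// A hypothetical user document. CouchDB stores it as JSON; the remaining
// properties besides "_id" are entirely up to the application.
const userDocument = {
    _id: "user-kore",
    type: "user",
    username: "kore",
    confirmed: true,
    // Nested objects and arrays of any depth are allowed.
    contact: {
        emails: ["kore@example.org"],
    },
};

// Round-trip through JSON to show the document is plain, serializable data.
const serialized = JSON.stringify(userDocument);
const restored = JSON.parse(serialized);
console.log(restored.contact.emails[0]);
```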

In the tables above, the table columns define which data can be stored in each table, and thus which data can be stored in the database at all. If additional fields should be stored, in the user table for example, the schema needs to be changed by adding columns or tables, with all the well-known problems such schema changes bring.

CouchDB is accessed through HTTP, which is not really relevant here, as this is just the method used to connect to the server. You will notice it though, because documents and views are addressed by URLs in this article, instead of by plain table or database names as in an RDBMS.

Document constraints

Of course you still want to apply constraints to the data in your database, for example when you want to query all users without getting all the other documents.

In CouchDB you can do so by specifying views. Views are basically functions, which are called for each document in the database and decide whether they want to index the current document by some key. By default the functions are written in ECMAScript (JavaScript), but you can also use PHP or any other language for this.

Views are also documents in the CouchDB database, located in a special folder of the database. Each view document can contain any number of actual view functions, which can be considered as a method to group view functions. Commonly the views for documents which fulfill a similar set of constraints are contained in one view document.

A practical example

Consider the above-mentioned example of querying all users in a database. The constraint we apply to user documents may be that the documents are required to have a property called "username", or a property named "type" which contains the string "user", or whatever else you can imagine that fits your use case. When you use the PHPillow document classes, each document has such a type property, which serves as a good basic constraint in the view functions. So I will use it here in my basic example:

    function ( doc ) {
        if ( doc.type == 'user' ) {
            emit( doc._id, doc._id );
        }
    }

This is a really trivial example of a view function, which just ensures the document has the value "user" in its type property and then calls the special CouchDB emit() function. The emit function takes a key and a value, and CouchDB creates an index from the provided keys. If emit() is not called for a document, the document will not be added to the respective index, as is the case here for all documents with type != 'user'.

The key can be of any type, not only an ID string or integer as in this case, but this is not yet relevant and we will get back to it later. The same is true for the value, but for a user listing the IDs are enough, since they enable us to fetch the entire document later by its ID.
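To see what emit() does, the indexing step can be imitated in plain JavaScript outside of CouchDB. The `buildIndex` helper below is a hypothetical stand-in for the view server, not a CouchDB API: it calls a map function for every document, collects whatever is emitted, and sorts the rows by key. It only handles string keys, and the sample documents are invented.

```javascript
// Hypothetical stand-in for CouchDB's view server: runs mapFn over all
// documents and returns the emitted rows sorted by key (string keys only).
function buildIndex(docs, mapFn) {
    const rows = [];
    for (const doc of docs) {
        // The map function receives emit explicitly here; inside CouchDB
        // it is a global function provided by the view server.
        mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
    }
    return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Same logic as the view function above, with emit passed as a parameter.
function allUsers(doc, emit) {
    if (doc.type == "user") {
        emit(doc._id, doc._id);
    }
}

// Made-up sample documents.
const docs = [
    { _id: "user-kore", type: "user" },
    { _id: "group-users", type: "group" },
    { _id: "user-john", type: "user" },
];

const userIndex = buildIndex(docs, allUsers);
console.log(userIndex.map((row) => row.key));
```

The group document never reaches the index because the map function does not call emit() for it.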

More complex view

Besides the very basic view shown above, a Turing-complete language lets you implement far more complex constraints on the documents. The following example shows a still simple variant, which only returns users with confirmed accounts, indicated by a property confirmed set to true. View functions are contained in just another document in the database. A complete view definition document with both view functions, stored at the path /database/_design/users, would then be a JSON structure like:

    {
        "language": "javascript",
        "views": {
            "all": {
                "map": "function ( doc ) { if ( doc.type == 'user' ) { emit( doc._id, doc._id ); } }"
            },
            "registered": {
                "map": "function ( doc ) { if ( ( doc.type == 'user' ) && ( doc.confirmed === true ) ) { emit( doc._id, doc._id ); } }"
            }
        }
    }

We have now defined two view functions, which give us different subsets of the documents based on the constraints in the view functions. The view itself can be queried by accessing the path /database/_view/users/registered, which lists the IDs of all registered users in the database. The view result can be influenced with skip and count (like LIMIT), or by keys, which is described in further detail in the CouchDB documentation and is not relevant yet.
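Because the view functions are stored as strings inside a JSON document, they can also be tried out locally before uploading them. The sketch below is an imitation under stated assumptions, not how CouchDB itself evaluates views: `compileMap` and `queryView` are hypothetical helpers, the sample documents are invented, and skip and count are emulated with `Array.prototype.slice`.

```javascript
// The view definition document from above, as a JavaScript object.
const designDoc = {
    language: "javascript",
    views: {
        all: {
            map: "function ( doc ) { if ( doc.type == 'user' ) { emit( doc._id, doc._id ); } }",
        },
        registered: {
            map: "function ( doc ) { if ( ( doc.type == 'user' ) && ( doc.confirmed === true ) ) { emit( doc._id, doc._id ); } }",
        },
    },
};

// Hypothetical helper: turns the source string into a function whose free
// variable "emit" is bound to the callback we pass in.
function compileMap(source, emit) {
    return new Function("emit", "return (" + source + ");")(emit);
}

// Hypothetical helper: runs one view over docs and emulates skip / count.
function queryView(docs, viewName, { skip = 0, count = Infinity } = {}) {
    const rows = [];
    const map = compileMap(designDoc.views[viewName].map, (key, value) =>
        rows.push({ key, value })
    );
    docs.forEach((doc) => map(doc));
    rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
    return rows.slice(skip, skip + count);
}

// Made-up sample documents.
const sampleDocs = [
    { _id: "user-kore", type: "user", confirmed: true },
    { _id: "user-john", type: "user", confirmed: false },
    { _id: "user-jane", type: "user", confirmed: true },
    { _id: "group-users", type: "group" },
];

console.log(queryView(sampleDocs, "all").map((r) => r.key));
console.log(queryView(sampleDocs, "registered", { skip: 1, count: 1 }).map((r) => r.key));
```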

Adding groups and permissions

As in the model above, we need to add groups and permissions to the database. We do this in group documents, which contain lists of the associated users and of the actual permissions. Such a document could look like:

    {
        "_id": "group-users",
        "_rev": "7984327592",
        "name": "Users",
        "permissions": [ "wiki-read", "wiki-write" ],
        "users": [ "user-kore", "user-john" ],
        "type": "group"
    }

It just has a name, id (like every document in a CouchDB database), the common type property and the two arrays with permission identifiers and user IDs. This is exactly the data we represented in the five RDBMS tables shown above in two different document types. But how to query such data?

The most common query is probably to request the permissions of a user, for example when they log into the system. Let's write a CouchDB view function for this:

    function( doc ) {
        if ( doc.type == "group" ) {
            for ( var i = 0; i < doc.users.length; ++i ) {
                emit( doc.users[i], doc.permissions );
            }
        }
    }

In this function we emit the permissions for each user associated with the current group. CouchDB works perfectly well with multiple values emitted for one key and will return all of them when the data for this key is requested. If we now query the view with the key "user-kore" we get a result set like:

    {
        "rows": [
            {
                "key": "user-kore",
                "value": [ "wiki-read", "wiki-write" ]
            }
        ]
    }
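The effect of emitting one row per group member can be reproduced locally. In this sketch the second group `group-editors`, its permissions and the `rowsForKey` helper are invented for illustration; it shows that a user in two groups gets two rows, with `wiki-read` appearing in both.

```javascript
// Map function from above, with emit passed in so it runs outside CouchDB.
function userPermissionsMap(doc, emit) {
    if (doc.type == "group") {
        for (var i = 0; i < doc.users.length; ++i) {
            emit(doc.users[i], doc.permissions);
        }
    }
}

// Hypothetical helper: collects all rows emitted for a single key.
function rowsForKey(docs, key) {
    const rows = [];
    for (const doc of docs) {
        userPermissionsMap(doc, (k, value) => {
            if (k === key) {
                rows.push({ key: k, value });
            }
        });
    }
    return rows;
}

const groupDocs = [
    {
        _id: "group-users",
        type: "group",
        permissions: ["wiki-read", "wiki-write"],
        users: ["user-kore", "user-john"],
    },
    {
        // Made-up second group to produce overlapping permissions.
        _id: "group-editors",
        type: "group",
        permissions: ["wiki-read", "wiki-delete"],
        users: ["user-kore"],
    },
];

console.log(JSON.stringify(rowsForKey(groupDocs, "user-kore")));
```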

If multiple groups with overlapping permissions are associated with one user, those permissions may of course appear multiple times in the view result. But what is map without reduce? Of course CouchDB can do this for you. This is why all those view functions were defined in a property called map. Let's have a look at such a map reduce of user permissions and its result:

    {
        "language": "javascript",
        "views": {
            "user_permissions": {
                "map": "function( doc ) { if ( doc.type == 'group' ) { for ( var i = 0; i < doc.users.length; ++i ) { emit( doc.users[i], doc.permissions ); } } }",
                "reduce": "function( keys, values ) { var permissions = []; for ( var i = 0; i < values.length; ++i ) { for ( var j = 0; j < values[i].length; ++j ) { if ( permissions.indexOf( values[i][j] ) == -1 ) { permissions.push( values[i][j] ); } } } return permissions; }"
            }
        }
    }

The reduce function is called with the rows emitted by the map function and receives two arrays: the keys and the values of those rows. With this information we can build up a JSON structure, which should match the structure of the values returned from the map function, because CouchDB may feed the results of earlier reduce calls back into the reduce function. This enables partial reduces spread over multiple nodes, or vertical splitting of the reduce result calculation.

In this case we just aggregate all permission arrays passed to the reduce function into one permission array. When called for all users, the array will contain all permissions of all users; queried for just one user, it will contain all permissions of that user:

    {
        "rows": [
            {
                "key": null,
                "value": [ "wiki-read", "wiki-write" ]
            }
        ]
    }
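The requirement that the reduce output has the same shape as the map values can be checked with a small experiment. The function below is a sketch of a duplicate-free merge, under the assumption that every value is an array of permission strings; the sample arrays are invented. Reducing two partial results again gives the same answer as reducing everything at once, which is what makes partial reduces possible.

```javascript
// Sketch of a reduce function: merges arrays of permission names into one
// array without duplicates. Works on map values and on earlier reduce
// results alike, because both are arrays of strings.
function mergePermissions(keys, values) {
    var permissions = [];
    for (var i = 0; i < values.length; ++i) {
        for (var j = 0; j < values[i].length; ++j) {
            if (permissions.indexOf(values[i][j]) == -1) {
                permissions.push(values[i][j]);
            }
        }
    }
    return permissions;
}

// Made-up values as the map function might emit them for one user.
const emitted = [
    ["wiki-read", "wiki-write"],
    ["wiki-read", "wiki-delete"],
    ["wiki-write", "blog-read"],
];

// Reduce everything in one go ...
const direct = mergePermissions(null, emitted);

// ... or reduce two chunks separately and then reduce the partial results.
const partialA = mergePermissions(null, emitted.slice(0, 2));
const partialB = mergePermissions(null, emitted.slice(2));
const combined = mergePermissions(null, [partialA, partialB]);

console.log(direct);
console.log(combined);
```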

We can of course also query this data for one user. This always returns a unique list of permissions for each user, with a quite simple structure in our CouchDB database.

Additional features like grouping of view results allow us to query the permissions for each grouped key (that is, for each user) in just one query.
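Grouping can be pictured as running the reduce once per distinct key. The sketch below imitates this locally; `groupReduce` is a hypothetical helper rather than a CouchDB function, the rows are invented, and it assumes the rows arrive sorted by key, as they would from a view index.

```javascript
// Merge arrays of permission names into one duplicate-free array.
function mergePermissions(keys, values) {
    const merged = [];
    for (const list of values) {
        for (const permission of list) {
            if (!merged.includes(permission)) {
                merged.push(permission);
            }
        }
    }
    return merged;
}

// Hypothetical helper: reduces each run of rows sharing the same key.
// Assumes rows are sorted by key.
function groupReduce(rows, reduceFn) {
    const groups = [];
    let current = null;
    for (const row of rows) {
        if (current === null || current.key !== row.key) {
            current = { key: row.key, values: [] };
            groups.push(current);
        }
        current.values.push(row.value);
    }
    return groups.map((group) => ({
        key: group.key,
        value: reduceFn(group.values.map(() => group.key), group.values),
    }));
}

// Made-up rows as emitted by the user_permissions map function.
const emittedRows = [
    { key: "user-john", value: ["wiki-read", "wiki-write"] },
    { key: "user-kore", value: ["wiki-read", "wiki-write"] },
    { key: "user-kore", value: ["wiki-read", "wiki-delete"] },
];

const grouped = groupReduce(emittedRows, mergePermissions);
console.log(JSON.stringify(grouped));
```

Each user ends up with exactly one row holding a duplicate-free permission list.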

Advanced interesting features

Attachments

Each document may have any number of files attached. They are directly accessible using the proper URL. The documents returned then have a new property which contains a list of the attached files.

View collations

As mentioned above, the keys emitted by the emit() function may be any JSON structure. Together with the sorting of keys, this enables you to implement - for example - tree structures of your data trivially. Think of hierarchical forums, navigation structures, etc. - for more details take a look at view collations in the wiki.
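As a rough illustration of the tree idea, suppose every forum post emits its path from the root as an array key. Sorting those array keys element by element, with a shorter array placed before any longer array that starts with it, lists each post directly before its replies. The comparator below is a simplified assumption that only handles arrays of strings; CouchDB's actual collation rules cover all JSON types. The `path` property and the sample posts are invented.

```javascript
// Simplified comparator for array keys made of strings: compare element by
// element; if one array is a prefix of the other, the shorter sorts first.
function compareArrayKeys(a, b) {
    const shared = Math.min(a.length, b.length);
    for (let i = 0; i < shared; ++i) {
        if (a[i] < b[i]) return -1;
        if (a[i] > b[i]) return 1;
    }
    return a.length - b.length;
}

// Hypothetical map function: the key is the post's path from the root.
function threadMap(doc, emit) {
    if (doc.type == "post") {
        emit(doc.path, doc.title);
    }
}

// Made-up posts, deliberately out of order.
const posts = [
    { type: "post", path: ["p1", "p3"], title: "Reply to first" },
    { type: "post", path: ["p2"], title: "Second thread" },
    { type: "post", path: ["p1"], title: "First thread" },
    { type: "post", path: ["p1", "p3", "p4"], title: "Nested reply" },
];

const threadRows = [];
posts.forEach((doc) =>
    threadMap(doc, (key, value) => threadRows.push({ key, value }))
);
threadRows.sort((x, y) => compareArrayKeys(x.key, y.key));

// Indent each title by its depth to print the tree.
const outline = threadRows.map(
    (row) => "  ".repeat(row.key.length - 1) + row.value
);
console.log(outline.join("\n"));
```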

PHPillow

Documents and view documents for the application described above are delivered with the default distribution of PHPillow. Information about the usage can be found in the tutorial and the test cases of PHPillow.
