CouchDB - A use case
First published at Thursday 28 August 2008
Warning: This blog post is more than 16 years old – read and use with care.
This article gives an overview of how CouchDB can be used to map data to a data store, and how this differs completely from relational databases. I introduce the basic concepts and then show a more advanced use case.
The application
The problem I want to solve is the typical example of group-based user permission management. Let's take a look at how this would commonly be solved in an RDBMS (relational database management system):
+------+           +-------+           +------------+
| user |           | group |           | permission |
+------+           +-------+           +------------+
| id   | <- n:m -> | id    | <- n:m -> | id         |
| name |           | name  |           | name       |
| ...  |           | ...   |           | ...        |
+------+           +-------+           +------------+
We basically have a list of permissions associated with each group, and each user is associated with any number of groups. In the database you of course need to create the five required tables for this (the three tables shown above plus one link table for each n:m relation), and then you can fetch the data with a simple query using some JOINs. We have done this for years, and it can be considered basic craftsmanship. So, how can this be done with CouchDB?
The basic documents
In CouchDB we do not talk about rows or database entries; we talk about documents. There is one database which contains all documents required for our application, and the database itself places no constraints on the data contained in those documents. The documents themselves are just arbitrary JSON objects, with any number of attributes and any depth of nesting of objects and arrays in the properties.
In the tables above, the table columns define which data can be stored in each table, and thus which data can be stored in the database at all. If additional fields should be stored in the user table, for example, the schema needs to be changed by adding columns or tables, with all the well-known problems this brings.
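To make the difference concrete, here is a small sketch of two documents that could sit side by side in the same database. The IDs and the `contact` property are made up for illustration; only `_id` and the `type` convention used later in this article come from the text.

```javascript
// Two documents of the same "type" with different sets of properties.
// Storing the second one requires no schema change.
const documents = [
    { _id: "user-kore", type: "user", name: "Kore" },
    {
        _id: "user-john",
        type: "user",
        name: "John",
        // Nested data only this document carries (hypothetical fields)
        contact: { email: "john@example.org", homepage: "http://example.org/" }
    }
];

// Documents are plain JSON, so they survive a round trip through a JSON string
const stored = JSON.parse( JSON.stringify( documents ) );

console.log( stored[1].contact.email );
```

The application, not the database, decides what a "user" document has to look like.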
CouchDB is accessed through HTTP, which is not really relevant here, as this is just the method used to connect to the server. You will notice it though, because documents and views are addressed by URLs in this article instead of the plain table or database names you know from an RDBMS.
Document constraints
You of course still want to apply constraints to the data in your database, for example when you want to query all users without getting all the other documents.
In CouchDB you can do so by specifying views. Views are basically functions in any language, which are called for each document in the database and decide whether to index the current document under some key. By default the functions are written in ECMAScript (JavaScript), but you can also use PHP or any other language for this.
Views are also documents in the CouchDB database, located in a special folder of the database. Each view document can contain any number of actual view functions, which can be considered as a method to group view functions. Commonly the views for documents which fulfill a similar set of constraints are contained in one view document.
A practical example
Consider the above mentioned example of querying all users in a database. The constraint we apply for user documents may be, that the documents are required to have a property called "username", or a property of the name "type", which contains the string "user" or whatever you can imagine and fits your use case. When you use the PHPillow document classes each document has such a type property, which then serves as a good basic constraint in the view functions. So I will use it here in my basic example:
function ( doc )
{
    if ( doc.type == 'user' )
    {
        emit( doc._id, doc._id );
    }
}
This is a really trivial example of a view function. It just ensures that the document has the value "user" in its type property and then calls the special CouchDB emit() function. The emit function takes a key and a value, and CouchDB creates an index from the provided keys. If emit() is not called for a document, the document will not be added to the respective index, as is the case here for all documents with a type other than 'user'.
The key can be of any type, not only an ID string or integer as in this case, but this is not yet relevant and we will get back to it later. The same is true for the value, but for a user listing the IDs are enough, since they enable us to fetch the entire document later.
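The following sketch mimics what happens when such a view is built: the map function runs once per document and every emit() call becomes one row of the index. It only illustrates the idea and is not how CouchDB is implemented; `buildIndex` and the sample documents are hypothetical, and emit() is passed in as a second argument here, whereas CouchDB provides it as a global function.

```javascript
// Hypothetical sample documents; only the "type" property matters here.
const docs = [
    { _id: "user-kore", type: "user" },
    { _id: "group-users", type: "group" },
    { _id: "user-john", type: "user" }
];

// Runs a map function over all documents and collects what it emits.
function buildIndex( docs, map )
{
    const rows = [];
    for ( const doc of docs )
    {
        map( doc, ( key, value ) => rows.push( { id: doc._id, key: key, value: value } ) );
    }

    // View results are ordered by key (plain string comparison in this sketch)
    return rows.sort( ( a, b ) => ( a.key < b.key ? -1 : ( a.key > b.key ? 1 : 0 ) ) );
}

const allUsers = buildIndex( docs, function ( doc, emit )
{
    if ( doc.type == 'user' )
    {
        emit( doc._id, doc._id );
    }
} );

console.log( allUsers.map( ( row ) => row.key ) );
```

The group document never calls emit(), so it simply does not show up in the resulting index.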
More complex view
Besides the very basic view shown above, a Turing-complete language lets you implement far more complex constraints on the documents. The following example shows a still simple variant, which only returns users with confirmed accounts, indicated by a property confirmed set to true. View functions are themselves contained in just another document in the database. A complete view definition document with both view functions, stored at the path /database/_design/users, would then be a JSON structure like the following (the function source is wrapped across several lines here for readability; in the stored document each function is a single JSON string):
{
    "language": "javascript",
    "views": {
        "all": {
            "map": "function ( doc )
{
    if ( doc.type == 'user' )
    {
        emit( doc._id, doc._id );
    }
}"
        },
        "registered": {
            "map": "function ( doc )
{
    if ( ( doc.type == 'user' ) &&
         ( doc.confirmed === true ) )
    {
        emit( doc._id, doc._id );
    }
}"
        }
    }
}
We have now defined two view functions, which give us different subsets of the documents based on the constraints they check. The view can be queried by accessing the path /database/_view/users/registered, which lists the IDs of all registered users in the database. The result can be narrowed with the skip and count parameters (comparable to LIMIT in SQL) or by keys; this is described in more detail in the CouchDB documentation and is not relevant yet.
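As a rough sketch of how such a query URL could be put together: the helper `buildViewUrl` and the host are assumptions for illustration, while the path layout and the skip and count parameters follow this article. Later CouchDB releases changed both the path layout and the parameter names, so check the documentation of the version you run.

```javascript
// Builds the URL for a view query, using the path layout from this article.
// "host" and the helper itself are hypothetical.
function buildViewUrl( host, database, design, view, options = {} )
{
    const path = [ database, "_view", design, view ]
        .map( ( part ) => encodeURIComponent( part ) )
        .join( "/" );

    // Parameter values are sent as JSON, so a string key ends up quoted
    const query = Object.keys( options )
        .map( ( name ) => name + "=" + encodeURIComponent( JSON.stringify( options[name] ) ) )
        .join( "&" );

    return host + "/" + path + ( query ? "?" + query : "" );
}

const pagedUrl = buildViewUrl(
    "http://localhost:5984", "database", "users", "registered",
    { skip: 10, count: 5 }
);
const keyUrl = buildViewUrl(
    "http://localhost:5984", "database", "users", "all",
    { key: "user-kore" }
);

console.log( pagedUrl );
console.log( keyUrl );
```

Any HTTP client can then send a GET request to the resulting URL.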
Adding groups and permissions
Like in the model above we need to add groups and permissions to the database. We do this in group documents, containing lists of associated users and the actual permissions. Such a document could look like:
{
    "_id" : "group-users",
    "_rev" : "7984327592",
    "name" : "Users",
    "permissions": [
        "wiki-read",
        "wiki-write"
    ],
    "users": [
        "user-kore",
        "user-john"
    ],
    "type": "group"
}
It just has a name, an ID (like every document in a CouchDB database), the common type property and the two arrays with permission identifiers and user IDs. Together with the user documents, this is exactly the data we represented in the five RDBMS tables above, now held in just two document types. But how do we query such data?
The most common query is probably to request the permissions of a user, for example when they log into the system. Let's write a CouchDB view function for this:
function( doc )
{
    if ( doc.type == "group" )
    {
        for ( var i = 0; i < doc.users.length; ++i )
        {
            emit( doc.users[i], doc.permissions );
        }
    }
}
In this function we emit the permissions for each user associated with the current group. CouchDB works perfectly well with multiple values emitted for one key and will return all of them when the data for this key is requested. If we now query the view with the key "user-kore" we get a result set like:
{
    "rows": [
        {
            "key": "user-kore",
            "value": [
                "wiki-read",
                "wiki-write"
            ]
        }
    ]
}
If a user is associated with multiple groups that have overlapping permissions, the same permission may of course show up multiple times in the view result. But what is map without reduce? Of course CouchDB can do this for you. This is why all those view functions were defined in a property called map. Let's have a look at such a map reduce of user permissions and its result:
{
    "language": "javascript",
    "views": {
        "user_permissions": {
            "map": "function( doc )
{
    if ( doc.type == 'group' )
    {
        for ( var i = 0; i < doc.users.length; ++i )
        {
            emit( doc.users[i], doc.permissions );
        }
    }
}",
            "reduce": "function( keys, values )
{
    // Each value is an array of permissions, so merge all of
    // them into one list without duplicates.
    var permissions = [];
    for ( var i = 0; i < values.length; ++i )
    {
        for ( var j = 0; j < values[i].length; ++j )
        {
            if ( permissions.indexOf( values[i][j] ) == -1 )
            {
                permissions.push( values[i][j] );
            }
        }
    }
    return permissions;
}"
        }
    }
}
The reduce function is called with the rows produced by the map function and receives two arrays: the keys and the values of those rows. From this information we build a JSON structure, which should match the structure of the values emitted by the map function, because the output of one reduce call may be passed as input to another. This enables partial reduces spread over multiple nodes, or vertical splitting of the reduce result calculation.
In this case we just aggregate all permission arrays passed to the reduce function into one permission array. When called for all users, the array contains the permissions of all users; queried for just one user, it contains all permissions of that user:
{
    "rows": [
        {
            "key": null,
            "value": [
                "wiki-read",
                "wiki-write"
            ]
        }
    ]
}
We can of course also query this data for a single user. This always returns a unique list of permissions for that user, with a quite simple structure in our CouchDB database.
The additional features like grouping of view results would allow us to query the permissions for each grouped key (for each user), in just one query.
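The whole map, reduce and grouping cycle can be tried out without a server. The sketch below is only a simulation under assumptions: the second group document and the helper `reduceGrouped` are made up, emit() is passed in as an argument, and the reduce function flattens each permission array so that the result is a plain list of permission names.

```javascript
// Hypothetical group documents with overlapping permissions
const groups = [
    { _id: "group-users", type: "group",
      permissions: [ "wiki-read", "wiki-write" ], users: [ "user-kore", "user-john" ] },
    { _id: "group-admins", type: "group",
      permissions: [ "wiki-write", "wiki-delete" ], users: [ "user-kore" ] }
];

// Map step: one row per user and group
function map( doc, emit )
{
    if ( doc.type == "group" )
    {
        for ( var i = 0; i < doc.users.length; ++i )
        {
            emit( doc.users[i], doc.permissions );
        }
    }
}

// Reduce step: merge permission arrays into one duplicate-free array.
// The result has the same shape as a single input value, so the function
// can also be applied to its own partial results.
function reduce( keys, values )
{
    var permissions = [];
    for ( var i = 0; i < values.length; ++i )
    {
        for ( var j = 0; j < values[i].length; ++j )
        {
            if ( permissions.indexOf( values[i][j] ) == -1 )
            {
                permissions.push( values[i][j] );
            }
        }
    }
    return permissions;
}

// Grouped reduce: collect the values per distinct key, then reduce each set
function reduceGrouped( rows )
{
    const byKey = new Map();
    for ( const row of rows )
    {
        if ( !byKey.has( row.key ) )
        {
            byKey.set( row.key, [] );
        }
        byKey.get( row.key ).push( row.value );
    }

    return Array.from( byKey.keys() ).sort().map(
        ( key ) => ( { key: key, value: reduce( null, byKey.get( key ) ) } )
    );
}

const rows = [];
groups.forEach( ( doc ) => map( doc, ( key, value ) => rows.push( { key: key, value: value } ) ) );

const perUser = reduceGrouped( rows );
console.log( JSON.stringify( perUser ) );
```

Here "wiki-write" is emitted twice for "user-kore", once per group, but appears only once in the reduced result.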
Advanced interesting features
Attachments
Each document may have any number of files attached. They are directly accessible using the proper URL. Documents with attachments are returned with an additional property which contains a list of the attached files.
View collations
As mentioned above, the keys emitted by the emit() function may be of any JSON structure. Combined with the sorting of keys, this enables you to implement, for example, tree structures of your data trivially. Think of hierarchical forums, navigation structures, etc. For more details take a look at view collations in the wiki.
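A minimal sketch of the tree idea, assuming each post stores the list of IDs leading from the thread root down to itself in a hypothetical `path` property. The comparison function is a simplified stand-in for the real collation rules and only handles arrays of numbers: it compares element by element and sorts a shorter array before a longer one that starts with it.

```javascript
// Hypothetical forum posts, deliberately listed out of order
const posts = [
    { _id: "post-3", type: "post", path: [ 1, 3 ] },
    { _id: "post-1", type: "post", path: [ 1 ] },
    { _id: "post-4", type: "post", path: [ 1, 2, 4 ] },
    { _id: "post-2", type: "post", path: [ 1, 2 ] }
];

// Simplified ordering for array keys that contain only numbers
function compareKeys( a, b )
{
    const length = Math.min( a.length, b.length );
    for ( let i = 0; i < length; ++i )
    {
        if ( a[i] !== b[i] )
        {
            return a[i] < b[i] ? -1 : 1;
        }
    }

    // All shared elements are equal: the shorter key sorts first
    return a.length - b.length;
}

// Map step: emit the path as key, collected into a plain array here
const treeRows = [];
posts.forEach( function ( doc )
{
    if ( doc.type == 'post' )
    {
        treeRows.push( { key: doc.path, value: doc._id } );
    }
} );
treeRows.sort( ( a, b ) => compareKeys( a.key, b.key ) );

console.log( treeRows.map( ( row ) => row.value ) );
```

Sorted this way, every post directly follows its parent and siblings stay together, so the rows come back in the order needed to render the tree.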
PHPillow
Documents and view documents for the application described above are delivered with the default distribution of PHPillow. Information about the usage can be found in the tutorial and the test cases of PHPillow.
Comments
Lukas at Saturday, 30.8. 2008
I am still waiting for the day when I have time to play with CouchDB. I realize the convenience, but I cannot help but worry about exactly this as well. Like who makes sure that some developer does not screw up all my queries because he starts using some "special" property that I use in my map/reduce calls to filter documents?
Also what happens if user X leaves the company and I need to transfer all documents to someone else? I guess I could write a map/reduce call to change the username foo to something else everywhere. Well that could also cause issues, since I might end up modifying my "audit trail" document.
I guess my main worry is, if things get any complex, who/how to keep an overview of the properties and what they mean? Since there is no hierarchy, I cannot really easily manage responsibility.
John Martin at Saturday, 6.12. 2008
Great document, stumbled across it when I began (yesterday) to look at CouchDB as an up-and-coming web DB solution. It's about damn time a DB solution without being tied to RDBMS presented itself.
Also your site is a great resource for any PHP developer; I've forwarded links out to many of your articles and code snippets to almost every Open Source web developer I know.
Thanks for a very well written and put-together site. Looking forward to future articles and code snippets!
John Martin
Charles Romestant at Monday, 29.12. 2008
Great article, thank you. Still not sure this is going to replace everything RDBMS but it does have some nice possibilities as a cache intermediary ( maybe something to replace xml caches for dynamic web pages).
Sam at Saturday, 24.10. 2009
How to make a login changeable and unique? Use display_name? How to make display_name unique too?
CouchDB is an awesome technology, but unfortunately it is still missing some basic functionality.
Etranger at Sunday, 25.10. 2009
Sam,
documents in CouchDB have unique IDs, and they also have revisions (for the purpose of collision detection). That said, you can always locate existing usernames and take action based on that. This applies both to ensuring that no duplicate values exist in a filled database and to making sure that no duplicates enter it.
Relaxed schema requirements are not a replacement for rdbms, as already said in the post, but a neat addition, and should be used accordingly.
rbriank at Monday, 26.10. 2009
Great article! Thanks
Andy at Monday, 15.11. 2010
Bad example of a use for couchdb. It's these kinds of posts that are confusing users and pushing them to use a tool (couchdb) that is not right for the job.
Silver Knight at Monday, 27.12. 2010
@Andy: If it's such a bad example of a use for couchdb, and it's confusing so many users, your comment might have been more useful and constructive if it had contained a GOOD example, or at least a LINK to a good example. Complaint without any suggestion for an alternate option or improvement is pointless and helps nobody.
@Kore: Thank you for the post. I found that the examples have helped me wrap my brain around couchdb a bit better.
Judith at Friday, 17.2. 2012
Thankyou. I found here what I needed. ^^