Kore Nordmann - PHP / Projects / Politics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
:Author: Kore Nordmann
:Date: Sat, 06 Dec 2008 10:12:07 +0100
:Revision: 8
:Copyright: CC by-sa
====================
CouchDB - A use case
====================
:Description:
This article tries to give you an overview on how CouchDB can be used to map
data to a data storage and how it is completely different from relational
databases. I try to introduce the basic concepts and show an advanced use
case.
.. contents:: Table of Contents
:depth: 3
This article gives you an overview of how CouchDB__ can be used to map data
to a data store and how this differs fundamentally from relational
databases. I introduce the basic concepts and show an advanced use case.
__ http://incubator.apache.org/couchdb/
The application
===============
The problem I want to solve is the typical example of group-based user
permission management. Let's take a look at how this would commonly be solved
in an RDBMS (relational database management system)::

    +------+             +-------+             +------------+
    | user |             | group |             | permission |
    +------+             +-------+             +------------+
    | id   |  <- n:m ->  | id    |  <- n:m ->  | id         |
    | name |             | name  |             | name       |
    | ...  |             | ...   |             | ...        |
    +------+             +-------+             +------------+

We basically have a list of permissions associated with each group, and each
user is associated with any number of groups. In the database you of course
need to create the five required tables for this (the three shown above plus
one join table for each n:m relation), and then you can fetch the data with a
simple query containing some JOINs. We have been doing this for years, and it
can be considered basic craftsmanship. So, how can this be done with CouchDB?
The basic documents
===================
In CouchDB we do not talk about rows or database entries; we talk about
documents. We have one database containing all documents required for our
application, and the database itself puts no constraints on the data contained
in the documents. The documents themselves are just arbitrary JSON objects,
with any number of properties and any depth of nesting of objects and arrays
in those properties.
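To make this a bit more concrete, here is a minimal sketch of two documents
that could live side by side in the same database, written as JavaScript
object literals. Only ``_id``, ``type`` and ``confirmed`` show up elsewhere in
this article; the ``name``, ``email``, ``title`` and ``tags`` properties and
the ``wiki-start`` document are assumptions made up for illustration.

```javascript
// Two hypothetical documents in one database. Nothing forces them to
// share a schema.
const userDoc = {
    _id: "user-kore",
    type: "user",
    confirmed: true,
    name: "Kore",                      // assumed property
    email: {                           // assumed property, nested object
        address: "kore@example.org",
        verified: false
    }
};

const pageDoc = {
    _id: "wiki-start",                 // assumed document
    type: "wiki-page",
    title: "Start",
    tags: ["intro", "couchdb"]         // nested array
};

// Both serialize to plain JSON, which is the format CouchDB stores.
const stored = [userDoc, pageDoc].map((doc) => JSON.stringify(doc));
console.log(stored.join("\n"));
```

The two objects have nothing in common except ``_id`` and ``type``, and the
database would accept both.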
In the tables above, the columns define which data can be stored in each
table, and thus which data can be stored in the database at all. If
additional fields should be stored, in the user table for example, the schema
needs to be changed by adding columns or tables, with the well-known problems
that come with such changes.
CouchDB is accessed through HTTP. This is not really relevant here, as it is
just the method used to connect to the server. You will notice it though,
because documents and views in this article are addressed through URLs instead
of the plain table or database names used with an RDBMS.
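As a small sketch of what "addressed through URLs" means: a document lives at
the path ``/<database>/<document id>``. The server address and the helper name
``documentUrl`` below are assumptions for illustration (5984 is CouchDB's
default port).

```javascript
// Assumed server address; adjust to wherever your CouchDB is running.
const server = "http://localhost:5984";

// Hypothetical helper: builds the URL of a single document.
function documentUrl(database, id) {
    return server + "/" + encodeURIComponent(database) +
        "/" + encodeURIComponent(id);
}

console.log(documentUrl("database", "user-kore"));
// http://localhost:5984/database/user-kore
```

A plain HTTP GET on such a URL returns the document as JSON.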
Document constraints
--------------------
Of course you still want to apply constraints to the data in your database,
for example when you want to query all users from your database without all
the other documents.

In CouchDB you can do so by specifying views. Views are basically functions
which are called for each document in the database and decide whether they
want to index the current document under some key. By default the functions
are written in ECMAScript (JavaScript), but you can also use PHP or any other
language for which a view server is available.
Views are also documents in the CouchDB database, located in a special
``_design`` namespace of the database. Each view document can contain any
number of actual view functions, so it also serves as a way to group view
functions. Commonly, the views for documents which fulfill a similar set of
constraints are kept in one view document.
A practical example
-------------------
Consider the example mentioned above of querying all users in a database. The
constraint we apply for user documents may be that the documents are required
to have a property called "username", or a property named "type" which
contains the string "user", or whatever else you can imagine and fits your use
case. When you use the PHPillow document classes, each document has such a
type property, which then serves as a good basic constraint in the view
functions. So I will use it here in my basic example::

    function ( doc )
    {
        if ( doc.type == 'user' )
        {
            emit( doc._id, doc._id );
        }
    }

This is a trivial example of a view function, which just ensures that the
document has the value "user" in its type property and then calls the special
CouchDB emit() function. The emit function takes a key and a value and adds an
entry to an index built from the provided keys. If emit() is not called for a
document, the document will not be added to the respective index, as is the
case here for all documents with a type other than 'user'.

The key can be of any type, not only an ID string or integer as in this case,
but this is not yet relevant, and we will get back to it later. The same is
true for the value, but for a user listing the IDs are enough, since they
enable us to fetch the entire document later.
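To get a feeling for what happens with the emitted keys, here is a rough
simulation in plain JavaScript. It is only a sketch: the ``emit()`` stub and
the ``buildIndex()`` helper are my own stand-ins and not part of CouchDB, and
the sorting uses a plain string comparison instead of CouchDB's real collation
rules.

```javascript
// Rows collected by emit(); a stand-in for CouchDB's view index.
let rows = [];

// Simplified emit(): records one key/value pair.
function emit(key, value) {
    rows.push({ key: key, value: value });
}

// The view function from above.
function allUsers(doc) {
    if (doc.type == 'user') {
        emit(doc._id, doc._id);
    }
}

// Hypothetical helper: calls the map function for every document and
// returns the emitted rows sorted by key.
function buildIndex(docs, mapFn) {
    rows = [];
    docs.forEach((doc) => mapFn(doc));
    return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

const docs = [
    { _id: "user-kore", type: "user" },
    { _id: "group-users", type: "group" },
    { _id: "user-john", type: "user" }
];

const index = buildIndex(docs, allUsers);
console.log(index.map((row) => row.key)); // [ 'user-john', 'user-kore' ]
```

The group document never reaches the index, because the view function does not
call ``emit()`` for it.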
More complex view
-----------------
Besides the very basic view shown above, a Turing-complete language lets you
implement far more complex constraints on the documents. The following
example shows a still simple variant, which only returns users with confirmed
accounts, indicated by a property ``confirmed`` set to true. View functions
are contained in just another document in the database. A complete view
definition document with both view functions, stored at the path
``/database/_design/users``, would then be a JSON structure like this (the
line breaks inside the function strings are only there for readability; valid
JSON requires them to be escaped)::

    {
        "language": "javascript",
        "views": {
            "all": {
                "map": "function ( doc )
                {
                    if ( doc.type == 'user' )
                    {
                        emit( doc._id, doc._id );
                    }
                }"
            },
            "registered": {
                "map": "function ( doc )
                {
                    if ( ( doc.type == 'user' ) &&
                         ( doc.confirmed === true ) )
                    {
                        emit( doc._id, doc._id );
                    }
                }"
            }
        }
    }

We have now defined two view functions, which give us different subsets of the
documents based on the constraints in the view functions. The view itself can
be queried by accessing the path ``/database/_view/users/registered``, which
lists the IDs of all registered users in the database (newer CouchDB releases
expose the same view at ``/database/_design/users/_view/registered``). The
view result can be narrowed down with skip and count (like LIMIT; the count
parameter was later renamed to ``limit``), or by keys, which is described in
further detail in the CouchDB documentation and is not relevant yet.
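Writing functions into JSON strings by hand gets tedious quickly. One possible
way around this, shown here only as a sketch and not as the way PHPillow does
it, is to write the view functions as ordinary JavaScript and let
``toString()`` and ``JSON.stringify()`` take care of the escaping. The
variable names are my own.

```javascript
// The two view functions from the design document, as ordinary functions.
const all = function (doc) {
    if (doc.type == 'user') {
        emit(doc._id, doc._id);
    }
};

const registered = function (doc) {
    if ((doc.type == 'user') && (doc.confirmed === true)) {
        emit(doc._id, doc._id);
    }
};

// toString() yields the source text of each function; JSON.stringify()
// then escapes quotes and line breaks, so the output is valid JSON.
const designDoc = {
    language: "javascript",
    views: {
        all: { map: all.toString() },
        registered: { map: registered.toString() }
    }
};

const json = JSON.stringify(designDoc, null, 4);
console.log(json);
```

The resulting string is what would be stored at the design document's path.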
Adding groups and permissions
=============================
As in the model above, we need to add groups and permissions to the database.
We do this in group documents, which contain lists of the associated users and
the actual permissions. Such a document could look like::

    {
        "_id" : "group-users",
        "_rev" : "7984327592",
        "name" : "Users",
        "permissions": [
            "wiki-read",
            "wiki-write"
        ],
        "users": [
            "user-kore",
            "user-john"
        ],
        "type": "group"
    }

It just has a name, an ID (like every document in a CouchDB database), the
common type property and the two arrays with permission identifiers and user
IDs. Together with the user documents, these two document types hold exactly
the data we represented in the five RDBMS tables shown above. But how do we
query such data?

The most common query is probably to request the permissions of a user, for
example when they log into the system. Let's write a CouchDB view function for
this::

    function( doc )
    {
        if ( doc.type == "group" )
        {
            for ( var i = 0; i < doc.users.length; ++i )
            {
                emit( doc.users[i], doc.permissions );
            }
        }
    }

In this function we emit the permissions once for each user associated with
the current group. CouchDB works perfectly well with multiple values emitted
for one key and will return all of them if the data for this key is requested.
If we now query the view with the key "user-kore", we get a result set like::

    {
        "rows": [
            {
                "key": "user-kore",
                "value": [
                    "wiki-read",
                    "wiki-write"
                ]
            }
        ]
    }

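What happens once a user belongs to more than one group can be tried out with
a small simulation. This is only a sketch: the ``emit()`` stub is my own
stand-in for the real CouchDB function, and the ``group-admins`` document with
its ``wiki-delete`` permission is an assumption added for illustration.

```javascript
// Rows collected by emit(); a stand-in for CouchDB's view index.
let rows = [];
function emit(key, value) {
    rows.push({ key: key, value: value });
}

// The view function from above.
function userPermissions(doc) {
    if (doc.type == "group") {
        for (var i = 0; i < doc.users.length; ++i) {
            emit(doc.users[i], doc.permissions);
        }
    }
}

const groups = [
    {
        _id: "group-users", type: "group",
        permissions: ["wiki-read", "wiki-write"],
        users: ["user-kore", "user-john"]
    },
    {
        _id: "group-admins", type: "group",   // assumed second group
        permissions: ["wiki-write", "wiki-delete"],
        users: ["user-kore"]
    }
];

groups.forEach((doc) => userPermissions(doc));

// Querying by key returns every row emitted for that key.
const result = rows.filter((row) => row.key === "user-kore");
console.log(JSON.stringify(result, null, 4));
```

The result holds one row per group for "user-kore", and "wiki-write" shows up
in both of them.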
If a user is associated with multiple groups with overlapping permissions, the
same permission may of course show up multiple times in the view result. But
what is map without reduce? Of course CouchDB can do this for you, which is
why all those view functions were defined in a property called ``map``. Let's
have a look at such a map/reduce view for user permissions and its result::

    {
        "language": "javascript",
        "views": {
            "user_permissions": {
                "map": "function( doc )
                {
                    if ( doc.type == 'group' )
                    {
                        for ( var i = 0; i < doc.users.length; ++i )
                        {
                            emit( doc.users[i], doc.permissions );
                        }
                    }
                }",
                "reduce": "function( keys, values )
                {
                    // Each value is an array of permissions; merge them
                    // into one list without duplicates.
                    var permissions = [];
                    for ( var i = 0; i < values.length; ++i )
                    {
                        for ( var j = 0; j < values[i].length; ++j )
                        {
                            if ( permissions.indexOf( values[i][j] ) == -1 )
                            {
                                permissions.push( values[i][j] );
                            }
                        }
                    }
                    return permissions;
                }"
            }
        }
    }

The reduce function is called with the rows produced by the map function and
receives two arrays: the keys and the values of those rows. From this
information we build a JSON structure which should match the structure of the
values emitted by the map function, so that the output of one reduce call can
be fed into another one. This enables partial reduces spread over multiple
nodes, or vertical splitting of the reduce result calculation. Current CouchDB
versions signal such a repeated call through a third argument, ``rereduce``.

In this case we just merge all permission arrays passed to the reduce function
into one permission array. When called for all users, the array will contain
the permissions of all users; when queried for just one user, it will contain
only the permissions of that user::

    {
        "rows": [
            {
                "key": null,
                "value": [
                    "wiki-read",
                    "wiki-write"
                ]
            }
        ]
    }

We can of course also query this data for a single user. This always returns a
unique list of permissions for that user, with a quite simple structure in our
CouchDB database.

Additional features like grouping of view results allow us to query the
permissions for each distinct key (that is, for each user) in just one query.
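Both ideas, grouping by key and partial reduces, can be sketched in plain
JavaScript. Treat this as an illustration under assumptions: the reduce
function here is a variant which loops over every permission in every emitted
array, ``reduceGrouped()`` is a hypothetical helper and not a CouchDB API, and
the sample rows are made up.

```javascript
// Reduce variant: merges arrays of permissions into one unique list.
// The keys argument is not used here.
function reducePermissions(keys, values) {
    var permissions = [];
    for (var i = 0; i < values.length; ++i) {
        for (var j = 0; j < values[i].length; ++j) {
            if (permissions.indexOf(values[i][j]) == -1) {
                permissions.push(values[i][j]);
            }
        }
    }
    return permissions;
}

// Rows as a map function could have emitted them (made-up sample data).
const rows = [
    { key: "user-john", value: ["wiki-read", "wiki-write"] },
    { key: "user-kore", value: ["wiki-read", "wiki-write"] },
    { key: "user-kore", value: ["wiki-write", "wiki-delete"] }
];

// Hypothetical helper imitating a grouped query: one reduce per distinct key.
function reduceGrouped(rows, reduceFn) {
    const byKey = new Map();
    for (const row of rows) {
        if (!byKey.has(row.key)) {
            byKey.set(row.key, []);
        }
        byKey.get(row.key).push(row.value);
    }
    return Array.from(byKey, ([key, values]) => {
        return { key: key, value: reduceFn(null, values) };
    });
}

const grouped = reduceGrouped(rows, reducePermissions);
console.log(JSON.stringify(grouped));

// Partial reduce: reduce two halves separately, then reduce the results.
const partialA = reducePermissions(null, [rows[1].value]);
const partialB = reducePermissions(null, [rows[2].value]);
const combined = reducePermissions(null, [partialA, partialB]);
console.log(combined); // [ 'wiki-read', 'wiki-write', 'wiki-delete' ]
```

Since the reduce output has the same shape as a single emitted value, it can
be passed through the same function again without any special handling.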
Advanced interesting features
=============================
Attachments
-----------
Each document may have any number of files attached. They are directly
accessible using the proper URL. The documents returned then have a new
property which contains a list with the `attached documents`__.
__ http://wiki.apache.org/couchdb/HttpDocumentApi#head-c5c629663a9847055332932f633e7b9022d3218a
View collations
---------------
As mentioned above, the keys emitted by the emit() function may be of any JSON
structure. Combined with the sorting of keys, this enables you to implement,
for example, tree structures of your data trivially. Think of hierarchical
forums, navigation structures, etc. For more details take a look at `view
collations`__ in the wiki__.
__ http://wiki.apache.org/couchdb/ViewCollation
__ http://wiki.apache.org/couchdb/
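A rough sketch of the tree idea: if every post emits the path of its ancestors
as an array key, sorting the keys puts each reply directly below its parent.
Everything here is an assumption for illustration: the ``path`` property, the
sample posts, the ``emit()`` stub, and ``compareKeys()``, which only imitates
the array part of CouchDB's collation for string elements.

```javascript
// Simplified comparison of array keys: element by element; a shorter
// array sorts before a longer one starting with the same elements.
function compareKeys(a, b) {
    const length = Math.min(a.length, b.length);
    for (let i = 0; i < length; ++i) {
        if (a[i] < b[i]) return -1;
        if (a[i] > b[i]) return 1;
    }
    return a.length - b.length;
}

// Rows collected by emit(); a stand-in for CouchDB's view index.
let rows = [];
function emit(key, value) {
    rows.push({ key: key, value: value });
}

// View function: the whole path array becomes the key.
function tree(doc) {
    if (doc.type == 'post') {
        emit(doc.path, doc._id);
    }
}

// Hypothetical forum posts; "path" lists all ancestors plus the post itself.
const posts = [
    { _id: "p3", type: "post", path: ["p1", "p3"] },
    { _id: "p2", type: "post", path: ["p2"] },
    { _id: "p4", type: "post", path: ["p1", "p3", "p4"] },
    { _id: "p1", type: "post", path: ["p1"] }
];

posts.forEach((doc) => tree(doc));
rows.sort((x, y) => compareKeys(x.key, y.key));
console.log(rows.map((row) => row.value)); // [ 'p1', 'p3', 'p4', 'p2' ]
```

The sorted view result is the thread in display order, with the nesting depth
given by the length of each key.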
PHPillow
========
Documents and view documents for the application described above are delivered
with the default distribution of PHPillow__. Information about the usage can
be found in the tutorial__ and the test cases of PHPillow.
__ http://kore-nordmann.de/projects/phpillow/index.html
__ http://kore-nordmann.de/projects/phpillow/tutorial.html
Comments
========
- Lukas at Sat, 30 Aug 2008 12:40:02 +0200
I am still waiting for the day when I have time to play with CouchDB. I
realize the convenience but I cannot help but worry about exactly this as
well. Like who makes sure that some developer does not screw up all my
queries because he starts using some "special" property that I use in my
map/reduce calls to filter documents?
Also what happens if user X leaves the company and I need to transfer all
documents to someone else? I guess I could write a map/reduce call to change
the username foo to something else everywhere. Well that could also cause
issues, since I might end up modifying my "audit trail" document.
I guess my main worry is, if things get any complex, who/how to keep an
overview of the properties and what they mean? Since there is no hierarchy,
I cannot really easily manage responsibility.
- John Martin at Sat, 06 Dec 2008 02:38:51 +0100
Great document, stumbled across it when I began (yesterday) to look at
CouchDB as an up-and-coming web DB solution. It's about damn time a DB
solution without being tied to RDBMS presented itself.
Also your site is a great resource for any PHP developer; I've forwarded
links out to many of your articles and code snippets to almost every Open
Source web developer I know.
Thanks for a very well written and put-together site. Looking forward to
future articles and code snippets!
John Martin
- Charles Romestant at Mon, 29 Dec 2008 05:38:43 +0100
Great article, thank you. Still not sure this is going to replace everything
RDBMS but it does have some nice possibilities as a cache intermediary (
maybe something to replace xml caches for dynamic web pages).
- Sam at Sun, 25 Oct 2009 01:42:21 +0200
How to make a login changeable and unique? Use display_name? How to make
display_name unique too?
CouchDB is an awesome technology, but unfortunately it is still missing some
basic functionality.
- Etranger at Sun, 25 Oct 2009 13:23:28 +0100
Sam,
documents in couchdb have unique id's, and they also have revisions (for the
purpose of collision detection). That said, you can always locate existing
usernames, and take action based on that. This applies both to ensuring that
no duplicate values exist in a filled database and to making sure that
duplicates do not enter it.
Relaxed schema requirements are not a replacement for rdbms, as already said
in the post, but a neat addition, and should be used accordingly.
- rbriank at Mon, 26 Oct 2009 14:44:01 +0100
Great article! Thanks
- Andy at Mon, 15 Nov 2010 19:04:05 +0100
Bad example of a use for couchdb. It's these kinds of posts that are
confusing users and pushing them to use a tool (couchdb) that is not right
for the job.
- Silver Knight at Mon, 27 Dec 2010 23:04:38 +0100
@Andy: If it's such a bad example of a use for couchdb, and it's confusing
so many users, your comment might have been more useful and constructive if
it had contained a GOOD example, or at least a LINK to a good example.
Complaint without any suggestion for an alternate option or improvement is
pointless and helps nobody.
@Kore: Thank you for the post. I found that the examples have helped me
wrap my brain around couchdb a bit better.