CMIS and Atom/AtomPub

As has been widely blogged on by Craig Randall, Chuck Hollis, Andrew Chapman, John Newton, Ethan Gur-esh, David Nuescheler and many analysts including IDC, Burton Group, CMS Watch, 451 Group and more, EMC, IBM and Microsoft today announced that they have worked together to create a draft specification for a web services based (and I use the term “web services” more generically than as SOAP-based services) standard for content management – Content Management Interoperability Services (CMIS). Other than to point out a couple of highlights, I won’t repeat what many others have already said – but those highlights that I must repeat are:

  • This is a programming language agnostic interface. No matter what language you are using to implement your client, chances are there is adequate support for the things that are needed to invoke CMIS services (uh, HTTP – ubiquitous).
  • The spec defines a domain model and services in the abstract (i.e. not mapped to any concrete binding – we agreed on the core semantics first) as well as both SOAP-based and REST-based bindings.
  • Wow – EMC, IBM and Microsoft all agreed on the contents of this draft!! (and there are not lots and lots of optional features specified which would make assessing compliance against the standard a nightmare and the spec much less valuable)
  • The spec was designed to be layered over existing repositories – that is, no reengineering of the repository implementations is required. This presents the real possibility that interoperability can be achieved over the repositories in existence today, not just the repositories of tomorrow.

But the thing I want to say a bit more about is that the REST binding is specified as an extension to Atom and AtomPub. You may notice that more recent posts on my, as of late, slightly less neglected blog touch upon the Atom technologies – my interest in Atom does not stem just from CMIS; I see Atom’s applicability to many other use cases beyond content management. I would say that the relative simplicity, core usefulness and significant uptake of Atom greatly influenced the choice to create the RESTful CMIS binding as an extension of Atom. There is enough in that CMIS binding to generate dozens of interesting dialogs; let me just touch upon a couple of things to start.

First, Atom’s applicability to content management is a natural. When we started to look at generating bindings for the abstract CMIS model, it was immediately apparent that it was very easy to create Atom Format representations for the core CMIS objects; also, many of the CMIS services deal with sets (I’m intentionally avoiding the term “collection” here because, particularly in the context of Atom discussions, that term is already heavily overloaded) of these objects. Yeah, we are talking about things that are easily represented as entries and feeds. And from a client perspective, the types of things that we want to do with our corresponding entry and feed representations are similar to what standard Atom clients already do – show the lists of objects and expose some of the attributes for each.

The CMIS domain model has a bit more complexity, for example, the notion of hierarchy. Folders (one of the core CMIS object types) can contain other folders as well as documents (another CMIS object type). There are lots of different ways that hierarchies can be represented, of course: a flat list with pointers to ids/URIs/keys, etc. What the current CMIS draft does is include the children of a folder (a folder is represented as an Atom entry) as nested entries.
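To make the nesting idea a little more concrete, here is a minimal sketch in Python. It only shows the shape: a folder entry that carries its children as nested entries. The ids, titles and helper names are made up for illustration, required Atom elements such as updated and author are left out, and the exact markup in the CMIS draft may well differ.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("atom", ATOM_NS)


def atom(tag):
    """Return the namespace-qualified name for an Atom element."""
    return f"{{{ATOM_NS}}}{tag}"


def make_entry(title, entry_id, children=()):
    """Build a bare-bones entry; child entries are nested directly inside it.

    Hypothetical helper: nesting atom:entry inside atom:entry is foreign
    markup as far as Atom is concerned, which is the point of the sketch.
    """
    entry = ET.Element(atom("entry"))
    ET.SubElement(entry, atom("id")).text = entry_id
    ET.SubElement(entry, atom("title")).text = title
    for child in children:
        entry.append(child)
    return entry


def child_titles(entry):
    """List the titles of the entries nested one level below this entry."""
    return [child.findtext(atom("title")) for child in entry.findall(atom("entry"))]


if __name__ == "__main__":
    # Illustrative data only: a folder holding one document and one sub-folder.
    folder = make_entry(
        "projects",
        "urn:example:folder-1",
        [
            make_entry("report.doc", "urn:example:doc-1"),
            make_entry("archive", "urn:example:folder-2"),
        ],
    )
    print(ET.tostring(folder, encoding="unicode"))
    print(child_titles(folder))
```

A standard Atom client that does not know about the nesting would still see a valid outer entry and simply ignore the inner ones.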
The simple and powerful notion of foreign markup allows for this, and there are a number of other ways that CMIS takes advantage of it. The Atom community has talked about nested collections before – CMIS offers an opportunity for a renewed dialog on that subject. Is it proof that entries needn’t be nested or is it a catalyst for inclusion? (In order to keep this initial post a bit on the less-long side I’ll address some of the other foreign markup that CMIS defines in future posts.)

So on to AtomPub – this is where things get really interesting. You’ll notice that the REST binding starts off by defining the resource model for CMIS. It defines the folder, document, relationship and policy resources as well as many collections including children (of a particular folder), descendants (of a particular folder – this is where the hierarchy I talked about above comes in), checked out documents (ooh, now things are getting interesting), and so on. (Okay, so you’ll notice that I use the term “collections” here – I’ll admit that not all of what we call collections in CMIS follow the rules that I am being a stickler about here. It’s a draft – we’re still working on it.) Then, as good disciples of Richardson and Ruby, we define which of the basic HTTP operations are supported against each. It’s pretty straightforward for many of the resources – GET on a document returns the metadata for that document (an entry), GET on the document media URL gets the document contents, GET on the folder children resource returns a feed containing an entry for each document or folder contained therein, … – you get the picture.

So what about the very core content management service of checkout? It’s tempting to think about the document that we want to check out as the resource that we want to manipulate – but then what operation do we apply? It surely ain’t GET or DELETE. PUT is kinda tempting – maybe I can PUT a representation that includes an attribute – true?
If someone is really interested, I can dedicate a whole post to why this isn’t a good idea, but the short of it is that it is generally not a good idea to model semantics such as these as a side effect of changing some state. So for now, believe me that this PUT approach is not good. What about POST? Well, POSTing is usually reserved for adding entries to collection resources, and the document I want to check out isn’t a collection. So do we need a new verb – CHECKOUT? Hmmm, what was it that Richardson and Ruby warned us to do when we were tempted to create new verbs? And what was that we were just saying about POST and collections? AH HA – that’s it!! That is where the collection of checked out documents comes in. To check out a document using the CMIS REST, AtomPub-based binding you issue a POST of the document to the checked out documents collection!! Now that is cool. Ah, but I also have to acknowledge that being posted to that new collection changes the state of the document. Gotta think on that a bit.

So as I reflect on what I’ve written so far I realize that I have once again gone quickly down into the technical weeds – sorry, can’t help myself, I find the weeds rather fun. But in the interest of closing out this post while it is still announcement day in some timezones, I’ll save some of those technical details for future posts. My aims today really were, first, to celebrate the milestone that has been reached with CMIS and, second, to pique interest, particularly in the RESTful binding. I’m very interested in feedback from members of the REST and Atom communities – oh, forgot to mention that, as said in the press release, the draft spec will go to OASIS for ratification – watch for announcements from OASIS on when the initial meeting will be. Until the OASIS forums are established I look forward to discussions on existing mailing lists and blogs.
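For anyone who wants to poke at the checkout idea before those future posts, here is a toy, in-memory sketch in Python of "check out by POSTing to the checked out collection". Everything in it is an assumption made for illustration: the class, the `/checkedout` path and the 404/409 responses are mine, not the draft’s. The only piece borrowed from AtomPub is answering a successful POST to a collection with 201 Created.

```python
class ToyRepository:
    """In-memory stand-in for a repository (hypothetical, not a CMIS API)."""

    CHECKED_OUT = "/checkedout"  # assumed path of the checked out documents collection

    def __init__(self, document_ids):
        self.documents = set(document_ids)
        self.checked_out = []  # members of the checked out collection, in POST order

    def post(self, collection_path, document_id):
        """Handle a POST of a document to a collection; return an HTTP-style status."""
        if collection_path != self.CHECKED_OUT:
            return 404  # this sketch knows only one collection
        if document_id not in self.documents:
            return 404  # assumption: unknown documents are reported as not found
        if document_id in self.checked_out:
            return 409  # assumption: a second checkout is reported as a conflict
        self.checked_out.append(document_id)
        return 201  # AtomPub answers a successful POST to a collection with 201 Created

    def get(self, collection_path):
        """Handle a GET of the checked out collection; return its member ids."""
        if collection_path != self.CHECKED_OUT:
            return None
        return list(self.checked_out)

    def is_checked_out(self, document_id):
        """The document's checked out state is simply membership in the collection."""
        return document_id in self.checked_out


if __name__ == "__main__":
    repo = ToyRepository(["doc-1", "doc-2"])
    print(repo.post(ToyRepository.CHECKED_OUT, "doc-1"))
    print(repo.get(ToyRepository.CHECKED_OUT))
    print(repo.is_checked_out("doc-2"))
```

Notice that `is_checked_out` has no flag of its own to consult: the state change I am still chewing on is nothing more than the document showing up in that collection.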
(Note, for lack of having an automated solution for keeping spam out of my comments I currently moderate all of them – so any comments posted here won’t show up until I approve them. I’m not on vacation so there shouldn’t be too much of a delay.)

To paraphrase John Newton, congratulations to EMC, IBM and Microsoft for putting aside their differences for the benefit of customers and the industry as a whole!

3 Comments

  1. Hi, are you still working on CMIS and JCR? I’m a search person looking at it and they look like good standards but I’m not sure how far they’ll get. What’s your take? Avi
