Why do some XML-oriented users avoid Cocoon 2.2?

[24 January 2011]

Over time, I’ve become aware that I’m not the only user of Cocoon 2.1 who has not yet moved to Cocoon 2.2.

In my case, the basic story is simple. When I first considered installing Cocoon 2.2, I expected installation to be very similar to installation of Cocoon 2.1; it is after all a decimal-point release. So I looked for a zip or tgz file to download and couldn’t find one. A little puzzled, I read the getting-started documentation, which informed me to my dismay that the first thing I had to do was install a particular Java build manager and start building an extension to Cocoon. (I have nothing whatever against Java build managers in general or Maven [the build manager in question] in particular. It’s just that the large majority of lines of code I’ve written in the last ten years are in XSLT, with Prolog and XQuery a distant second and third, with Common Lisp, Emacs Lisp, C, Java, Rexx, and other languages bringing up the rear like the peloton in a bicycle race. So no, I didn’t have Maven installed and didn’t have it on my list of things to do sometime soon or ever.) Now, I like Cocoon a fair bit and moving to 2.2 still seemed like a good idea, so I got as far as downloading Maven and working through its five-minute introduction before I went back to the Cocoon 2.2 intro and learned that the first thing I was to do was develop a Java extension to Cocoon. That was when I lost patience, said “This is nuts” and re-installed Cocoon using an old Cocoon 2.1 .war file.

Some months later I thought about it again and decided I should give it another try. I did. And essentially the same thing happened again.

In the time since then I’ve encountered two or three other XML-oriented people who have told me similar stories as explanations of why they are still using Cocoon 2.1.

Recently I’ve come to believe that the problem here (if it’s a problem — and as a Cocoon 2.1 user, I think it is) is simple: the Cocoon 2.2 documentation is written (I guess) by people who think of themselves in some primary sense as Java programmers, and they have written it (not surprisingly, if in this case perhaps not quite reasonably) for people much like themselves: i.e. people who want to use Cocoon as a framework to write and deploy Java code and/or to extend Cocoon. There is no highly visible documentation for Cocoon 2.2 aimed at people who want to use Cocoon’s out-of-the-box features to create XML-based web sites where all the heavy lifting is handled by XSLT transformations under the control of Cocoon pipelines, and who are more likely to be interested in writing XSLT than Java. Me, I got interested in Cocoon precisely because I could do nice things without writing Java code for it. I am happy to know, and occasionally to be reminded, that I can extend Cocoon if I ever need to by writing Java code; I’ve done that in the past and I expect to do it in the future.

I think Cocoon 2.2 could get better uptake among XML-oriented users if there were some highly visible documentation aimed at that demographic. It might also help if there were a document on “Cocoon 2.2 installation for Cocoon 2.1 users” to explain that while Maven is indeed targeted at Java developers, you really don’t need to be a Java developer to find it useful: you can just think of it as a Java-specific package and dependency manager, or as a much smarter FTP client specialized for downloading and installing Cocoon in just the way you want it.

More on this topic later.

XForms class 14-15 February 2011, Rockville, Maryland

[5 January 2011; typo corrected 24 Jan 2011]

Black Mesa Technologies has scheduled a two-day hands-on class on the basics of XForms, to be taught 14-15 February 2011 in Rockville, Maryland, in the training facility of Mulberry Technologies (to whom thanks for the hospitality).

The course will cover the XForms processing model, the treatment of simple values and the creation of simple structures, repetitions of the same element, sequences of heterogeneous elements, and techniques for using XForms for complex forms, dynamic interfaces, and multilingual interfaces. It’s based on the one-and-a-half day course given last November at the TEI Members Meeting in Zadar, Croatia, which (judging by the participants’ evaluations) was a success.

XForms has great potential for individuals, projects, and organizations using XML seriously: XForms is based on the model / view / controller idiom, and the model in question is represented by a set of XML documents. That means that you can use XForms to create specialized editing interfaces for XML documents, which exploit the styling and interface capabilities of the host language (typically XHTML) and can also exploit your knowledge of your own data and requirements.

Some people have built more or less general-purpose XML editors for specific vocabularies using XForms. That works, more or less, though in many cases I think you’ll get better results acquiring a good XML editor and learning to use it. Where XForms really shines, I think, is in the creation of ad hoc special-purpose editors for the performance of specialized tasks.

In many projects, XML documents are created and refined in multiple specialized passes. In a historical documentary edition, the document will be transcribed, then proofread and corrected in multiple passes performed by different people, or pairs of people. Another pass over the document will mark places where annotations are needed and note the information needed to write those annotations. (“Who is this person mentioned here? Need a short biographical note.”) And so on.

In a language corpus, an automated process may have attempted to mark sentence boundaries, and a human reviewer may be assigned to correct them; the only things that reviewer is supposed to do are open the document, split all of the s elements where the software missed a sentence boundary, join adjacent s elements wherever the software was wrong in thinking it had found a sentence boundary, save the document, and quit. If you undertake this task in a full XML editor, and you get bored and lose concentration, there is essentially no limit to the damage you could do to the data by mistake. What is needed for situations like this is what Henry Thompson of the University of Edinburgh calls ‘padded-cell editors’: editors in which you cannot do all that much damage, precisely because they are not full-featured general-purpose editors. Because they allow the user to do only a few things, padded-cell editors can have simpler user interfaces and be easier to learn than general-purpose editors.
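To make the idea of a padded cell a bit more concrete, here is a minimal sketch, in Python rather than XForms, of the editing rules such a sentence-boundary editor would enforce: the reviewer can split an s element or join two adjacent ones, and nothing else. It models only the rules, not the user interface. The class name SentenceBoundaryEditor and its methods are hypothetical, and the sketch assumes that the s elements sit directly inside a single container element and contain only text (no child markup); a real project's documents will usually be messier than that.

```python
import xml.etree.ElementTree as ET


class SentenceBoundaryEditor:
    """Hypothetical 'padded-cell' editor for one sentence-boundary review pass.

    The only edits offered are splitting one <s> element in two and joining
    two adjacent <s> elements. Assumes the container element holds nothing
    but text-only <s> children.
    """

    def __init__(self, xml_text: str) -> None:
        self.root = ET.fromstring(xml_text)
        # Refuse any document this sketch cannot edit safely.
        for child in self.root:
            if child.tag != "s" or len(child) > 0:
                raise ValueError("expected only text-only <s> children")

    def sentences(self) -> list[str]:
        """Return the sentence texts, for display to the reviewer."""
        return [s.text or "" for s in self.root]

    def split(self, index: int, offset: int) -> None:
        """Split sentence `index` at character `offset` (a missed boundary)."""
        s = self.root[index]
        text = s.text or ""
        left, right = text[:offset], text[offset:]
        # Never allow a split that would leave an empty sentence.
        if not left.strip() or not right.strip():
            raise ValueError("split would leave an empty sentence")
        # Whitespace at the cut point moves into the gap between the elements.
        gap = left[len(left.rstrip()):] + right[: len(right) - len(right.lstrip())]
        new = ET.Element("s")
        new.text = right.lstrip()
        new.tail = s.tail
        s.text = left.rstrip()
        s.tail = gap or " "
        self.root.insert(index + 1, new)

    def join(self, index: int) -> None:
        """Join sentence `index` with the next one (a false boundary)."""
        if not 0 <= index < len(self.root) - 1:
            raise IndexError("no following sentence to join with")
        a, b = self.root[index], self.root[index + 1]
        # Keep the whitespace that separated the two sentences.
        a.text = (a.text or "") + (a.tail or "") + (b.text or "")
        a.tail = b.tail
        self.root.remove(b)

    def to_xml(self) -> str:
        """Serialize the edited container element."""
        return ET.tostring(self.root, encoding="unicode")


if __name__ == "__main__":
    ed = SentenceBoundaryEditor(
        "<p><s>It rained. We left.</s> <s>Then it</s> <s>stopped.</s></p>"
    )
    ed.split(0, 10)  # the software missed the boundary after "It rained."
    ed.join(2)       # "Then it" / "stopped." was a false boundary
    print(ed.to_xml())
    # <p><s>It rained.</s> <s>We left.</s> <s>Then it stopped.</s></p>
```

The point of the design is what is missing: there is no method for renaming elements, editing text, or deleting content, so a bored reviewer simply has no way to do those things by accident.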

The construction of padded-cell editors has always been a complicated and expensive task; it’s going to take thousands, or tens of thousands, of lines of Java or Objective-C or Python to build one, even if you have a reasonably good library to use. With XForms, the high-level abstractions and the declarative nature of the specification make it possible to do roughly the same work with much less code: a few hundred lines of XHTML, CSS, and XForms-specific markup.

This is why I think XForms has a place in the toolkit of any project or organization making serious use of XML. And, coincidentally, it may be a reason you, dear reader, or someone you know may want to attend this XForms course.

(Oh, yes, one more thing: we have set up an email announcement list for people who want to receive email notification of this and other courses organized or taught by Black Mesa Technologies; a sign-up page is available.)

What constitutes successful format conversion?

[31 December 2010]

I wrote about the International data curation conference earlier this month, but did not provide a pointer to my own talk.

My slides are on the Web on this site; they may give some idea of the general thrust of my talk. (On slide 4, “IANAPL” expands to “I am not a preservation librarian”. On slide 20, the quotation is from an anonymous review of my paper.)

Over time, I have become more and more convinced that formal proofs of correctness are important for things we care about. The other day, for example (29 December to be exact), I saw a front-page article in the New York Times about radiation overdoses resulting from both hardware and software shortcomings in the devices used to administer radiotherapy. I found it impossible not to think that formal proofs of correctness could help prevent such errors. (Among other things, formal proofs of correctness force those responsible to say with some precision what correct behavior is for the software in question, which is likely to lead to more explicit consideration of things like error modes than might otherwise happen.)

Formal specification of the meaning of markup languages is only a small part of making formal proofs of system correctness possible. But it’s a step, I think.