Pierazzo on digital diplomatic editions

[2 January 2012]

A new issue of Literary and Linguistic Computing arrived not long ago. I’ve been meaning to note that it contains an article I hope will become standard reading for anyone involved with the digitization of texts and manuscripts: Elena Pierazzo’s thoughtful essay under the title “A rationale of digital documentary editions”. (LLC 26.4 (Dec 2011): 463-477.)

Dr. Pierazzo is a member of the team responsible for the Jane Austen’s Fiction Manuscripts Digital Edition, and she has used her experience with that edition to reconsider the question “are digital editions different from printed ones? … Do they represent an advancement of textual scholarship or just a translation of the same scholarship into a new medium?” My own inclination is almost always to stress the continuity of scholarly concerns here as more important than the discontinuities of the technologies pressed into the service of scholarship, but Dr. Pierazzo reaches the different conclusion that digital editions are “substantially different” from print editions. She begins her argument by defining her terms and discussing some central questions (“What is a diplomatic edition?” “What does a diplomatic edition contain?” “Once you start identifying features to encode, where do you stop?”). She takes a firm stand against the view that it’s possible for an edition to encode everything about a source document (so what does an edition contain? A selection of the available information). She concludes her survey of recent thinking about the topic with the remark that

It is only with the advent of digital editions that we have started to understand that what we need is scholarly guidance …, that we need to rethink the reasons why we make our transcriptions, and that this approach should apply to print and digital editions alike.

She then proceeds to make a start on the task she has identified, listing a number of features (or rather, classes of features) which may be recorded, or elided, in a digital edition, and proposing five criteria for deciding whether to include or exclude them:

  • the purpose of the edition
  • the needs of prospective readers
  • the nature of the document
  • the capabilities of the publishing medium
  • cost

In the section on the purpose of the edition, she uses the decisions made by the Austen project to illustrate the kinds of considerations that arise, and devotes a useful couple of paragraphs to “What was not encoded”. As the last two items in her list of criteria suggest, while she argues that we need scholarly guidance about what information about a manuscript can usefully be recorded, she also takes a pragmatic view of the limitations imposed on any project by finite resources. The essay is, in the nature of things, overtly concerned only with text-bearing objects like manuscripts. But many of the concerns addressed will bear also on other forms of cultural artefact.

A wonderful piece; no one engaged in digitization of cultural heritage materials should miss it.

What constitutes successful format conversion?

[31 December 2010]

I wrote about the International data curation conference earlier this month, but did not provide a pointer to my own talk.

My slides are on the Web on this site; they may give some idea of the general thrust of my talk. (On slide 4, “IANAPL” expands to “I am not a preservation librarian”. On slide 20, the quotation is from an anonymous review of my paper.)

Over time, I become more and more convinced that formal proofs of correctness are important for things we care about. The other day, for example (29 December to be exact), I saw a front-page article in the New York Times about radiation overdoses resulting from both hardware and software shortcomings in the device used to administer radiotherapy. I found it impossible not to think that formal proofs of correctness could help prevent such errors. (Among other things, formal proofs of correctness force those responsible to say with some precision what correct behavior is, for the software in question, which is likely to lead to more explicit consideration of things like error modes than might otherwise happen.)

Formal specification of the meaning of markup languages is only a small part of making possible formal proofs of system correctness. But it’s a step, I think.