<!DOCTYPE TEI.2 PUBLIC '-//C. M. Sperberg-McQueen//DTD
          TEI Lite 1.0 plus SWeb (XML)//EN'
          '../../../../lib/swebxml.dtd' [
<!ATTLIST list type CDATA 'bullets' >
<!ATTLIST seg  rend CDATA 'incremental' >
<!ATTLIST xref href CDATA '' >

<!ENTITY date.last.touched '9 November 2009'>

<!ENTITY ntilde  "&#241;" ><!-- small n, tilde -->

]>
<?xml-stylesheet type="text/xsl" href="../../../../lib/swebtohtml.xsl"?> 
<TEI.2>
<teiHeader>
<fileDesc>
<titleStmt>
<title>Thutmose I MT:  MARCXML to TEI Header translation</title>
</titleStmt>
<publicationStmt>
<pubPlace>Espa&ntilde;ola, New Mexico</pubPlace>
<publisher>Black Mesa Technologies LLC</publisher>
<date>2009</date>
</publicationStmt>
<sourceDesc>
<p>No source; created in electronic form.</p>
</sourceDesc>
</fileDesc>
</teiHeader>
<text>
<front>
<titlePage>
<docTitle>
<titlePart>Thutmose I MT: MARCXML to TEI header translation</titlePart>
</docTitle>
<docAuthor>C. M. Sperberg-McQueen, Black Mesa Technologies LLC</docAuthor>
<docDate>26 October 2009</docDate>
<docDate>rev. &date.last.touched;</docDate>
</titlePage>
</front>
<body>
<p>Thutmose I translates from MARCXML records to TEI headers.  The
<q>I</q> indicates that this is the first, <soCalled>level 1</soCalled>,
version of the program; it handles the most important MARC fields,
but does not handle all of the mappings described in the 
TEI best practices document.</p>

<div id="usage">
<head>Usage</head>
<p>This document assumes you have an XSLT 1.0 processor and know how
to invoke it on your MARCXML data, and how to pass parameters to
the stylesheet processor.</p>
<div id="parms">
<head>Run-time parameters</head>
<p>The most important parameters are these:
<list type="glossary">

<label><code>result</code></label>
<item>Specifies which result element to use for each MARC record in the
input:<list>
<item><code>TEI</code> means to produce a complete <soCalled>tadpole</soCalled>
TEI document, containing a header with a rudimentary text structure.</item>
<item><code>teiHeader</code> means to produce just the header, without
the TEI wrapper element and rudimentary text.</item>
</list>
If the input contains more than one MARC record, the set of resulting
elements (whether TEI elements or teiHeader elements) will be wrapped
in a <ident>teiCorpus</ident> element regardless of the value of 
<code>result</code>.
</item>

<label><code>marcitem</code></label>
<item>Specifies whether the MARC records in the input describe 
the source of the TEI document or the TEI document itself:<list>
<item><code>source</code> means the MARC record describes
the source, not the TEI document itself.  
</item>
<item><code>tei-from-source</code> means the MARC record describes
the TEI document itself, and was prepared with the source of 
the TEI document in hand.
</item>
<item><code>tei-from-digital</code> means the MARC record describes
the TEI document itself, and was prepared without the source of 
the TEI document in hand, only the digital object (page scans, etc.) 
itself.
</item>
</list>
The value of <code>marcitem</code> determines whether certain information in
the MARC record (for example, the edition statement) is copied
into the TEI header as a child of <ident>fileDesc</ident> or
as a descendant of <ident>sourceDesc</ident>.
</item>

</list>
</p>
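<p>As a minimal sketch, assuming the <code>xsltproc</code> command-line
processor (any XSLT 1.0 processor should serve equally well, with its
own syntax for passing parameters) and a hypothetical input file named
<code>records.xml</code>, the two parameters above might be
set like this:
<eg>
xsltproc --stringparam result teiHeader \
         --stringparam marcitem source \
         --output headers.xml \
         mt-1.xsl records.xml
</eg>
The output file name <code>headers.xml</code> is likewise only an
illustration.  Under the rules described above, if <code>records.xml</code>
contains more than one MARC record, the result should be a
<ident>teiCorpus</ident> element wrapping one <ident>teiHeader</ident> per
record.</p>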
</div>
<div id="alsoparms">
<head>Additional parameters</head>
<p>Additional parameters include:
<list type="glossary">

<label><code>idno</code></label>
<item>Use the value specified both as the value of the <att>id</att>
attribute on the <gi>TEI</gi> element (if one is generated) and
as the value of an <gi>idno</gi> element in the publication 
statement.</item>

<label><code>usertext</code></label>
<item>Supplies the name of a file containing various bits of
user-supplied text to be inserted at appropriate places in
the output.  (For example, when the MARC record describes the
source, not the TEI document, the publication statement for the
TEI document will typically identify the project which is
creating the TEI document; appropriate XML should be provided in
the document named in the <code>usertext</code> parameter.)
Defaults to <q><code>user-supplied-text.xml</code></q>, which
(as shipped) contains data used by Indiana University.
</item>

<label><code>beautification</code></label>
<item>Specifies whether trailing punctuation should be stripped
from certain values before the values are copied to the 
output.  <list>
<item><code>1</code> means yes, attempt to strip the non-significant
punctuation.</item>
<item><code>0</code> means no, do not attempt to strip non-significant
punctuation, just copy things as they are.</item>
</list>
Because MARC records vary so much in their punctuation practice,
and because sometimes the trailing punctuation really does belong
to the value (e.g. a title ending in the word <q>etc.</q>), 
value beautification is at best an approximation:  sometimes
it strips characters that shouldn't be stripped, and sometimes
it leaves characters that should be stripped.
Turning it off, on the other hand, will leave a large number of extraneous commas, 
slashes, and full stops in personal names
and work titles.  Either way, ideally the header should be
cleaned up by hand.</item>

<label><code>sourceDesc</code></label>
<item>Specifies how the source description will be tagged.
Known values are:<list>
<item><code>biblFull</code></item>
</list>
Later, <code>biblStruct</code> and possibly other values
will also be supported.
</item>

</list>
</p>
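<p>A sketch of how these additional parameters might be passed, again
assuming <code>xsltproc</code>; the identifier
<code>demo-0001</code> and the file names <code>my-project-text.xml</code>,
<code>records.xml</code>, and <code>tei.xml</code> are hypothetical 
placeholders, not names used by the stylesheet:
<eg>
xsltproc --stringparam result TEI \
         --stringparam idno demo-0001 \
         --stringparam usertext my-project-text.xml \
         --stringparam beautification 0 \
         --output tei.xml \
         mt-1.xsl records.xml
</eg>
Here <code>beautification</code> is set to <code>0</code>, so values are
copied with their trailing punctuation intact.  Because the 
<code>idno</code> value is also used as an <att>id</att> attribute, 
it is safest to choose a value that is a legal XML name (for example,
one that does not begin with a digit).</p>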
</div>
</div>

<div id="getit">
<head>Getting Thutmose I</head>
<p>Just download what you need:<list>
<item><xref href="mt-1.xsl">mt-1.xsl</xref>, the main stylesheet</item>
<item><xref href="mt-1-pull.xsl">mt-1-pull.xsl</xref>, the named <soCalled>pull</soCalled> templates</item>
<item><xref href="mt-1-push.xsl">mt-1-push.xsl</xref>, the <soCalled>push</soCalled> templates</item>
<item><xref href="mt-1-utils.xsl">mt-1-utils.xsl</xref>, miscellaneous utility templates</item>
<item><xref href="user-supplied-text.xml">user-supplied-text.xml</xref>, sample text (from Indiana) for user-supplied information</item>
<item><xref href="userdoc.xml">userdoc.xml</xref>, this document (in TEI Lite)</item>
<item><xref href="progdoc.xml">progdoc.xml</xref>, a document aimed at people who want to modify or maintain the stylesheet (in TEI Lite)</item>
</list>
</p>
<p>
Many XSLT processors will be able to retrieve the stylesheets from the
Web, so you may not need to download them at all.
If you do want to download things, e.g.
to run the stylesheet locally, you'll need all the XSL files.  
</p>
</div>

<div id="plans">
<head>Plans for the future</head>
<p>This is the first version of Thutmose.</p>
<p>Follow-on projects are expected to:<list>
<item>provide more complete mappings from MARC to TEI headers</item>
<item>improve the data beautification options</item>
<item>provide better guesses about what to do, when the same
field may map to more than one TEI element, or when the
same information may be redundantly present in more than one
MARC field</item>
<item>provide some record of changes made in the course of 
beautification, so that it can be reviewed and the data can 
be edited if necessary; changes will be recorded either 
inline (in processing instructions) or in a log emitted as
a side effect of transformation</item>
<item>map from TEI headers to MARC (currently this has lower priority)</item>
</list>
</p>
</div>
<div id="gaps">
<head>Known gaps, bugs, and shortcomings</head>
<p><list>
<item>The punctuation-stripping routine used when <code>beautification</code> is set to <code>1</code> 
does not always do the right thing.  In the data used for testing, it is
not always clear what the right thing is.</item>
<item>The <code>tei-from-source</code> and <code>tei-from-digital</code>
options are not yet fully worked out.</item>
<item>Fields currently mapped to appropriate elements in the TEI header
include: <list>
<item>245 (title)</item>
<item>100, 110, 111 (various forms of author)</item>
<item>250 $a and $b (editionStmt)</item>
<item>300 $a, $b, and $c (extent)</item>
<item>260 $a, $b, $c (publicationStmt)</item>
<item>440, 490, and 830 $a (seriesStmt)</item>
<item>500, 546 (notesStmt)</item>
<item>600, 610, 611, 650, 651, 655 (profileDesc/textClass/keywords); 
currently assumes, without checking, that values are from LCSH</item>
</list>
Not yet included:
<list>
<item>040 $b (teiHeader/@lang)</item>
<item>130, 240, 246 (other forms of title)</item>
<item>533 and 534 various subfields (e.g. $a author of source, $t title, 
$b edition, $e extent, $c publication statement, etc.)</item>
<item>700, 710, 711 (added tracings for author, editor, other responsibles)</item>
<item>500 (respStmt, editorialDecl, projectDesc, ...)</item>
<item>other 5XX fields</item>
<item>028 5_, 099, 766 $w idno</item>
<item>050-099 classDecl</item>
<item>6XX second indicator (classDecl/taxonomy)</item>
<item>6XX _7 $2 (classDecl/taxonomy)</item>
<item>008/35-37 langUsage</item>
<item>041, 546 langUsage/language</item>
</list>
</item>
</list>
</p>
</div>
</body>
</text>
</TEI.2>
<!-- Keep this comment at the end of the file
Local variables:
mode: xml
sgml-default-dtd-file:"/Library/SGML/Public/Emacs/sweb.ced"
sgml-omittag:t
sgml-shorttag:t
End:
-->
