<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE TEI.2 PUBLIC '-//C. M. Sperberg-McQueen//DTD
          TEI Lite 1.0 plus SWeb (XML)//EN'
          '../../lib/swebxml.dtd' [
<!ATTLIST list type CDATA 'bullets' >
<!ATTLIST seg  rend CDATA 'incremental' >
<!ATTLIST xref href CDATA '' >

<!ATTLIST div id ID #IMPLIED >
<!ATTLIST item id ID #IMPLIED >

<!ENTITY date.last.touched '15 February 2012'>
<!ENTITY date.original '14 February 2012'>

<!ENTITY aacute  "&#225;" ><!-- small a, acute accent -->
<!ENTITY agrave  "&#224;" ><!-- small a, grave accent -->   
<!ENTITY ntilde  "&#241;" ><!-- small n, tilde -->
<!ENTITY ouml    "&#246;" ><!-- small o, dieresis or umlaut mark -->

<!ENTITY lsquo  "&#x2018;" ><!--=single quotation mark, left-->
<!ENTITY rsquo  "&#x2019;" ><!--=single quotation mark, right-->
<!ENTITY ldquo  "&#x201C;" ><!--=double quotation mark, left-->
<!ENTITY rdquo  "&#x201D;" ><!--=double quotation mark, right-->

]>
<?xml-stylesheet type="text/xsl" href="../../lib/bmtdocs.xsl"?> 
<TEI.2>
<teiHeader>
<fileDesc>
<titleStmt>
<title type="main">SXT emulator</title>
<title type="sub">Streaming XML Transducers in XML</title>
</titleStmt>
<publicationStmt>
<pubPlace>Espa&ntilde;ola, New Mexico</pubPlace>
<publisher>Black Mesa Technologies LLC</publisher>
<date>2012</date>
</publicationStmt>
<sourceDesc>
<p>No source; created in electronic form.</p>
</sourceDesc>
</fileDesc>
</teiHeader>
<text>
<front>
<titlePage>
<docTitle>
<titlePart>SXT emulator</titlePart>
<titlePart>Streaming XML Transducers in XML</titlePart>
</docTitle>
<docAuthor>C. M. Sperberg-McQueen</docAuthor>
<docDate>&date.original;</docDate>
<docDate>rev. &date.last.touched;</docDate>
</titlePage>

<div id="navbar" type="navbar">
<head>Nearby documents</head>
<divGen type="toc"/>
<div>
<list>
<item id="jd">
<xref
href="http://www.informatica.si/PDF/32-4/13_Dvorakova%20-%20Automatic%20Streaming%20Processing%20of%20XSLT...pdf"
>JD's paper defining SXT</xref></item>
<!--*
<item id="xslt-streaming">
<xref
href="https://www.w3.org/XML/Group/2010/01/xslt-models/"
>XSLT WG work on formalization of streaming</xref></item>
<item id="msm-on-sxt">
<xref
href="https://www.w3.org/XML/Group/2010/01/xslt-models/dvorakova.sxt.html"
>MSM attempt to describe SXT in XSLT terms</xref></item>
*-->
<item id="siteroot"><xref href="../..">Home</xref></item>
</list>
</div>
</div>
</front>
<body>

<p>This page describes a simple emulator for
the Streaming XML Transducer automata described
by the computer scientist Jana Dvo&#x0159;&aacute;kov&aacute;
in her article &ldquo;Automatic Streaming 
Processing of XSLT Transformations Based 
on Tree Transducers&rdquo;, 
<emph>Informatica</emph> 32 (2008): 373-382 
(available on the Web from the <xref href=
"http://www.informatica.si/PDF/32-4/13_Dvorakova%20-%20Automatic%20Streaming%20Processing%20of%20XSLT...pdf">Informatica journal web site</xref>).
</p>
<p>The emulator described and made available here was
created as a toy in the interests of understanding SXT
automata better.  It is not intended or suitable for
production use. The author thanks Dimitre Novatchev and
Michael Kay for spurring him to the work, but they should
not be held accountable for its shortcomings.
</p>

<div id="sxt">
<head>SXT</head>
<p>Streaming XML Transducer automata walk through
the input XML document in a depth-first traversal that is both
pre-order and post-order.  Each node is visited twice,
once going down and once going up; for a leaf node the two visits
follow each other directly.  On each visit, a matching
rule fires.  Along the way, an output tree is built up, which
is also traversed in depth-first order.
</p>
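<p>As a small illustration, worked out by hand rather than
produced by the emulator, consider the input
<code>&lt;a&gt;&lt;b/&gt;&lt;c&gt;text&lt;/c&gt;&lt;/a&gt;</code>
(the element names are arbitrary).  The visits occur in the
order shown below.  The moves named in the last column are the
ones described in the <xref href="#input">input format</xref>
section.</p>
<eg><![CDATA[  Visit  Node  Direction  Next move
    1    a     down       a has children: go to its first child, b
    2    b     down       b is a leaf: switch to an up mode in place
    3    b     up         b has a right sibling: go right, to c
    4    c     down       c has children: go to its first child (the text)
    5    text  down       leaf: switch to an up mode in place
    6    text  up         last child of c: go up to c
    7    c     up         last child of a: go up to a
    8    a     up         back at the top: the traversal is complete
]]></eg>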
<p>For further details see the original paper.</p>
</div>
<div id="source">
<head>The emulator</head>
<p>The <xref href="sxt.xsl">XSLT 1.0 (sic) source 
for the emulator</xref> is available on this site.
Example transformations are linked to <xref
href="#exx">below</xref>.
</p>
<p>The emulator can be run with any XSLT 1.0 processor.</p>
<list>
<item>The SXT automaton description should be the main input
document.</item>
<item>The URI of the input XML document should be given
in the <code>input-uri</code> parameter.</item>
<item>Varying levels of execution tracing can be obtained by
specifying the <code>msg-level</code> parameter with 
any of the values <code>terse</code>,
<code>verbose</code>,
<code>trace</code>,
<code>debug</code>, or 
<code>tmi</code>.  (Note, however, that at
the moment [15 February] the tracing messages are oriented
more toward debugging the emulator than toward tracing
what the emulator is doing.  That will change soon, I hope.)</item>
</list>
<p>The output of the processor should be the XML encoding of the
output tree.</p>
</div>
<div id="input">
<head>Input format</head>
<p>The input for this stylesheet is an XML description of
an SXT automaton.  The namespace 
<xref href=
"http://blackmesatech.com/2012/sxt/">http://blackmesatech.com/2012/sxt/</xref>
is used for the input; in this description the prefix
<code>sxt</code> is used for that namespace.
<!--* I almost used 
<xref href=
"http://www.informatica.si/PDF/32-4/13_Dvorakova%20-%20Automatic%20Streaming%20Processing%20of%20XSLT...pdf">http://www.informatica.si/PDF/32-4/13_Dvorakova%20-%20Automatic%20Streaming%20Processing%20of%20XSLT...pdf</xref>
but then remembered that I'm only supposed to create namespaces in
domains I own, regardless of whose conceptual apparatus the
namespace is intended to represent.
*-->
<list>
<item><p>The <code>sxt:sxt</code> element contains the description of an
SXT transducer; its content is a sequence of
<code>sxt:rule</code> elements.</p>
<p>The <code>sxt:sxt</code> element carries a <code>start-mode</code>
attribute which indicates the starting mode of the automaton.</p></item>
<item><p>The <code>sxt:rule</code> element contains one rule for
the transducer.</p>
<p>It can carry three attributes:<list>
<item><code>match</code> resembles the <code>match</code> attribute
of <code>xsl:template</code>:
it indicates which nodes are matched by the rule.  The match
pattern must be a QName or one of the standard node tests
<code>text()</code>,
<code>comment()</code>, or
<code>processing-instruction()</code>.
</item>
<item><code>mode</code> resembles the <code>mode</code> attribute
of <code>xsl:template</code>,
but the emulator requires that the name begin with 
one of the characters <code>d</code>, <code>D</code>, 
<code>u</code>, or <code>U</code>.  Mode names beginning
with <code>d</code> or <code>D</code> are names of down modes;
names beginning with <code>u</code> or <code>U</code> name
up modes.  All modes are down modes or up modes.
</item>
<item><code>pos</code> indicates a constraint on the
position of the node matched.  It can take any of four
values:
<list>
<item><code>leaf</code> indicates that the rule matches only
leaf nodes (nodes with no children)</item>
<item><code>not-leaf</code> indicates that the rule matches only
non-leaf nodes (nodes with at least one child)</item>
<item><code>last</code> indicates that the rule matches only
final nodes (nodes with no right sibling)</item>
<item><code>not-last</code> indicates that the rule matches only
non-last nodes (nodes with a right sibling)</item>
</list>
</item>
</list>
</p>
<p>An <code>sxt:rule</code> element contains either
a sequence of result tree fragments, with exactly one
SXT movement instruction (see below), or else zero or more 
<code>sxt:close</code> elements followed by one SXT movement
instruction.  (In principle, the SXT movement instruction
should be the rightmost leaf node in the rule, but this 
is not currently enforced.  However, any nodes located to the right
of the movement instruction will be ignored.)</p>
<p>Result tree fragments are specified literally as
elements, text nodes, comments, and processing instructions.  
In the current version (v1) of the emulator, attributes are
not supported.
</p>
<p>For example, the following rule fires for 
non-leaf <code>h1</code> elements in <code>down</code> mode:
</p>
<eg><![CDATA[  <!--* h1 *-->
  <sxt:rule match="h1" mode="down" pos="not-leaf">
    <div>
      <head><sxt:down mode="down-2"/></head>
    </div>
  </sxt:rule>
]]></eg>
<p>It means, roughly:  add a <code>div</code> element to the
current location in the output tree, and within that <code>div</code>
add a <code>head</code> element.  Then move to the first
child of the input <code>h1</code> element, in mode <code>down-2</code>.</p>
<p>An example of a &lsquo;closing rule&rsquo; is the following,
which fires on <code>h1</code> elements in mode <code>up</code>,
when the <code>h1</code> is the last child of its parent:</p>
<eg><![CDATA[
  <sxt:rule match="h1" mode="up" pos="last">
    <sxt:close name="head"/>
    <sxt:close name="div"/>
    <sxt:up mode="up"/>
  </sxt:rule>]]></eg>
<p>This rule moves the write head up two levels, closing first
the <code>head</code> element and then the <code>div</code> element,
and then moves to the parent of the <code>h1</code> element
in mode <code>up</code>.</p>
<p>The closing rule illustrates two of the ways in which the
emulator goes beyond SXT automata as defined by Jana
Dvo&#x0159;&aacute;kov&aacute;.  First, the emulator allows
more than one <code>sxt:close</code> instruction in a rule, thus
moving up more than one level in the output;
pure SXT closing rules move up just one level.  Moving up multiple
levels in pure SXT is possible but requires additional states.  Second, 
the <code>sxt:close</code> element accepts an optional
<code>name</code> attribute to specify what element the
transformation author expects to be closing, and the emulator
signals an error if the name doesn't match; the pure SXT
automaton does not cater to human frailty in this way.</p>
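<p>For comparison, here is a sketch of how the same effect could be
obtained under the one-close-per-rule restriction of pure SXT, by
adding an extra up mode.  The name <code>up-close-div</code> is
made up for this illustration.  The sketch is written in the
emulator's notation but without the optional <code>name</code>
attributes, which pure SXT lacks, and it has not been run through
the emulator.</p>
<eg><![CDATA[  <!--* step 1: close head, then revisit the same h1
      * in the extra up mode *-->
  <sxt:rule match="h1" mode="up" pos="last">
    <sxt:close/>
    <sxt:self mode="up-close-div"/>
  </sxt:rule>

  <!--* step 2: close div, then move to the parent *-->
  <sxt:rule match="h1" mode="up-close-div" pos="last">
    <sxt:close/>
    <sxt:up mode="up"/>
  </sxt:rule>
]]></eg>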
</item>
<item><p>The SXT movement instructions signal that
at that point in the rule, the SXT automaton should
move to the node indicated.  They are roughly equivalent
to <code>xsl:apply-templates</code> with specific
<code>select</code> values.  Each can also carry
a <code>mode</code> attribute, which has essentially the
same meaning and behavior as in XSLT.
</p>
<list>
<item><p>The <code>sxt:down</code> element signals that
at that point in the rule, the SXT automaton should
move to the first child of the current node (roughly
<code>&lt;xsl:apply-templates select="child::node()[1]"/&gt;</code>)
and find a matching rule in the mode indicated.
The mode indicated must be a down mode.
</p>
</item>
<item><p>The <code>sxt:self</code> element signals that at that point
in the rule, the SXT automaton should find a matching rule in the mode
indicated for the current node (roughly
<code>&lt;xsl:apply-templates
select="."/&gt;</code>).  There is no
movement in the input tree.
If the current rule is a down-mode rule matching a leaf node,
the mode indicated may be an up mode.
Otherwise, 
the mode indicated must match the direction of the
rule's mode (a down-mode rule can switch to another
down mode for the same node, and an up-mode rule can
switch to another up mode, but no switching from up modes
to down modes is allowed, or vice versa).
</p>
</item>
<item><p>The <code>sxt:right</code> element signals that
at that point in the rule, the SXT automaton should
move to the immediate right sibling of the current node (roughly
<code>&lt;xsl:apply-templates select="following-sibling::node()[1]"/&gt;</code>)
and find a matching rule in the mode indicated,
which must be a down mode.</p>
</item>
<item><p>The <code>sxt:up</code> element signals that
at that point in the rule, the SXT automaton should
move to the parent of the current node (roughly
<code>&lt;xsl:apply-templates select="parent::*"/&gt;</code>
and find a matching rule in the mode indicated, which
must be an up mode.</p>
</item>
</list>
<p>The allowed movements are subject to constraints 
which ensure that every node is visited in
the prescribed order.  Every rule must satisfy one
of the following descriptions:<list>
<item>In a down-mode rule matching a non-leaf node, 
the movement instruction is <code>sxt:down</code> 
in a down mode.</item>
<item>In a down-mode rule matching a leaf node, 
the movement instruction is <code>sxt:self</code> 
in an up mode.</item>
<item>In an up-mode rule matching a non-last node, 
the movement instruction is <code>sxt:right</code> 
in a down mode.</item>
<item>In an up-mode rule matching a last node, 
the movement instruction is <code>sxt:up</code> 
in an up mode.</item>
<item>In a down-mode rule,
the movement instruction is <code>sxt:self</code> 
in a down mode.</item>
<item>In an up-mode rule,
the movement instruction is <code>sxt:self</code> 
in an up mode.</item>
</list>
</p>
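<p>To make the four positional cases concrete, here is a sketch
(written for this page, not taken from the example files, and not
tested) of rules that would copy a <code>p</code> element unchanged.
There is one rule for each combination of direction and position.
It assumes the simple two-mode scheme <code>down</code>/<code>up</code>
of the identity-transform example.</p>
<eg><![CDATA[  <!--* down, not a leaf: open p, descend to the first child *-->
  <sxt:rule match="p" mode="down" pos="not-leaf">
    <p><sxt:down mode="down"/></p>
  </sxt:rule>

  <!--* down, leaf: open p, switch to up mode in place *-->
  <sxt:rule match="p" mode="down" pos="leaf">
    <p><sxt:self mode="up"/></p>
  </sxt:rule>

  <!--* up, not last: close p, move to the right sibling *-->
  <sxt:rule match="p" mode="up" pos="not-last">
    <sxt:close name="p"/>
    <sxt:right mode="down"/>
  </sxt:rule>

  <!--* up, last: close p, move to the parent *-->
  <sxt:rule match="p" mode="up" pos="last">
    <sxt:close name="p"/>
    <sxt:up mode="up"/>
  </sxt:rule>
]]></eg>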
</item>
<item><p>The <code>sxt:close</code> element signals that
at that point in a rule, the write head should be moved
up one level in the output tree.  (Unreconstructed
imperative programmers may prefer to think of this in
imperative terms as meaning &ldquo;Emit a close tag for
the element on the top of the output stack, and pop
that stack.&rdquo;)</p>
<p>The <code>sxt:close</code> element may optionally carry
a <code>name</code> attribute, which records the
generic identifier of the element the programmer expects
to be closing with the close instruction.  The emulator
will signal an error if the name doesn't match the name
on the top of the output stack.</p></item>
<item><p>The <code>sxt:copy</code> element makes sense only
in a rule matching a text node, comment node, or processing
instruction;
it signals that the node should be copied to the output.</p>
<p>The <code>sxt:copy</code> instruction does not correspond to
any primitive operation in SXT automata as defined; it was
added to make it possible to write more interesting 
transformations as exercises.</p></item>
<item><p>The <code>sxt:stop</code> element indicates that the
automaton should stop.</p>
<p>Like <code>sxt:copy</code>, this instruction does not
correspond to anything in the formal definition of SXT automata.
It may in fact be completely unnecessary;
the paper defining SXT is a bit vague about halting conditions (or at
least I have failed to understand clearly what the exact halting
conditions are).  The justification is wholly pragmatic:  the first
versions of the emulator had a little trouble knowing when to stop,
which went away when this explicit stop instruction was
added.</p></item>
</list>
</p>
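<p>Putting these pieces together, here is an untested sketch of a
complete automaton description for a hypothetical vocabulary whose
only element, <code>doc</code>, contains at most a single text node.
Several details are assumptions, because the description above does
not pin them down.  The sketch assumes that processing starts at the
document element in the <code>start-mode</code>.  It assumes that
<code>pos</code> may be omitted on the <code>text()</code> rule.  It
assumes that <code>sxt:copy</code> may precede the movement
instruction like a result tree fragment.  Finally, it assumes that
<code>sxt:stop</code> takes the place of the movement instruction in
the last rule.</p>
<eg><![CDATA[<sxt:sxt xmlns:sxt="http://blackmesatech.com/2012/sxt/"
         start-mode="down">

  <!--* doc with content: open doc, descend *-->
  <sxt:rule match="doc" mode="down" pos="not-leaf">
    <doc><sxt:down mode="down"/></doc>
  </sxt:rule>

  <!--* empty doc: open doc, switch to up mode in place *-->
  <sxt:rule match="doc" mode="down" pos="leaf">
    <doc><sxt:self mode="up"/></doc>
  </sxt:rule>

  <!--* text: copy it, switch to up mode in place *-->
  <sxt:rule match="text()" mode="down">
    <sxt:copy/>
    <sxt:self mode="up"/>
  </sxt:rule>

  <!--* text going up: it is the only child, so go to the parent *-->
  <sxt:rule match="text()" mode="up" pos="last">
    <sxt:up mode="up"/>
  </sxt:rule>

  <!--* doc going up: close it and halt (assumed use of sxt:stop) *-->
  <sxt:rule match="doc" mode="up" pos="last">
    <sxt:close name="doc"/>
    <sxt:stop/>
  </sxt:rule>

</sxt:sxt>]]></eg>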
</div>
<div id="exx">
<head>Examples</head>
<p>The following simple exercises may serve to
illustrate the behavior of SXT automata.
Most of these small exercises were posed in
roughly their current form by Michael Kay,
and others by Dimitre Novatchev.</p>

<div id="identity-transform">
<head>Identity transform</head>

<p>The identity transform is not particularly
difficult or complicated, but precisely for that
reason it can be helpful to look at how the identity
transform is written in any transformation language.
</p>

<p>The current version of the SXT emulator requires
rules to be labeled with explicit match patterns (no
wildcards), so the identity transform is vocabulary-specific.
Here is one for the vocabulary used in the next example:
<code>doc</code>, 
<code>body</code>, 
<code>h1</code>, 
<code>p</code>, 
text nodes, comments, and processing instructions.
<list>
<item><xref href="exx/h1-p.identity.sxt.xml">SXT transform</xref></item>
<item><xref href="exx/h1-p.input.xml">sample input</xref></item>
<item><xref href="exx/output.h1-p.identity.xml">sample output</xref></item>
</list>
Examination of the input and output shows some whitespace-related
disturbances (white space in the automaton description being copied
to the output); XSLT users who care about whitespace will be 
familiar with this class of problem.
</p>
</div>
<div id="div-wrapping">
<head>Div-wrapping</head>
<p>Consider the following simple exercise.</p>
<p>The input document contains a sequence of
<code>h1</code> and <code>p</code> elements; the
job of the transform is to wrap a <code>div</code>
element around each sequence of an <code>h1</code>
element and the following <code>p</code> elements.
Any leading <code>p</code> elements should be left
unwrapped.
</p>
<p>So input of the form
<eg><![CDATA[<body><h1/><p/><p/>...<h1/><p/><h1/><p/>...</body>]]></eg>
should produce output of the form
<eg><![CDATA[  <body>
    <div><h1/><p/><p/>...</div>
    <div><h1/><p/></div>
    <div><h1/><p/>...</div>
  </body>]]></eg>
</p>
<p>An SXT solution for this exercise:
<list>
<item><xref href="exx/h1-p.sxt.xml">SXT transform</xref></item>
<item><xref href="exx/h1-p.input.xml">sample input</xref></item>
<item><xref href="exx/output.h1-p.input.xml">sample output</xref></item>
</list>
In processing <code>h1</code> and <code>p</code> elements, the
transform must keep track of whether a <code>div</code> has
or has not already been opened in the output: if it has,
an <code>h1</code> must close that <code>div</code> before
opening a new one, and a final <code>p</code> must close 
the <code>div</code> as well as the <code>p</code> element
before traversing to the enclosing <code>body</code> element.
This is one bit of information (div opened? yes or no); it is
tracked in this solution by doubling the number of modes which
would otherwise be needed:  <code>down</code> and <code>up</code>
cover the simple case of the identity transform,
while <code>down-2</code> and <code>up-2</code> are used
in the special case that one more element has been opened
than would be opened in the identity transform.  The basic
principle is:  each additional bit of information requires doubling
the number of states (modes) for all the elements which must
deal with the information or transmit it without loss.
</p>
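<p>As an illustration of that bookkeeping, two of the rules needed
might look like this.  They are a sketch written for this page, and
the rules actually used in <xref href="exx/h1-p.sxt.xml">the SXT
transform</xref> may differ.  In the current emulator a rule cannot
mix <code>sxt:close</code> with result tree fragments.  So an
<code>h1</code> met while a <code>div</code> is open first closes the
<code>div</code> and then reconsiders itself in mode <code>down</code>.</p>
<eg><![CDATA[  <!--* h1 while a div is already open (mode down-2):
      * close the old div, then revisit the same h1 in mode down *-->
  <sxt:rule match="h1" mode="down-2">
    <sxt:close name="div"/>
    <sxt:self mode="down"/>
  </sxt:rule>

  <!--* last p while a div is open (mode up-2):
      * close the p and the div, then go up to body in mode up *-->
  <sxt:rule match="p" mode="up-2" pos="last">
    <sxt:close name="p"/>
    <sxt:close name="div"/>
    <sxt:up mode="up"/>
  </sxt:rule>
]]></eg>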
</div>

<div id="employee-wrapping">
<head>Employee wrapping</head>
<p>The input document contains a sequence of
<code>d</code> (department) elements.  Some
<code>d</code> elements contain sequences of
<code>emp</code> elements, others are empty.
The transform is to copy empty <code>d</code>
elements unchanged to the output, and wrap
any sequences of <code>emp</code> elements inside
<code>ol</code> elements.
</p>

<p>So input of the form
<eg><![CDATA[<d/><d/><d><emp/><emp/><emp/></d>]]></eg>
should produce output of the form
<eg><![CDATA[<d/><d/><d><ol><emp/><emp/><emp/></ol></d>]]></eg>
</p>

<p>SXT solution:<list>
<item><xref href="exx/d-emp.sxt.xml">SXT transform</xref></item>
<item><xref href="exx/d-emp.input.xml">sample input</xref></item>
<item><xref href="exx/output.d-emp.input.xml">sample output</xref></item>
<item><xref href="exx/d-emp.2.input.xml">sample input 2</xref></item>
<item><xref href="exx/output.d-emp.2.input.xml">sample output 2</xref></item>
</list></p>
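<p>One way to handle the wrapping, shown here as an untested sketch
that may differ from the posted solution, is to let a non-empty
<code>d</code> open the <code>ol</code> and let the last
<code>emp</code> close it.  The <code>d</code> rules for going up can
then be the same for empty and non-empty departments.  The sketch
assumes that <code>d</code> contains nothing but <code>emp</code>
elements (no whitespace-only text nodes).  The remaining rules would
be as in an identity transform.</p>
<eg><![CDATA[  <!--* non-empty d: open d and ol, descend to the first emp *-->
  <sxt:rule match="d" mode="down" pos="not-leaf">
    <d><ol><sxt:down mode="down"/></ol></d>
  </sxt:rule>

  <!--* empty d: open d, switch to up mode in place *-->
  <sxt:rule match="d" mode="down" pos="leaf">
    <d><sxt:self mode="up"/></d>
  </sxt:rule>

  <!--* last emp: close emp and the ol, go up to d *-->
  <sxt:rule match="emp" mode="up" pos="last">
    <sxt:close name="emp"/>
    <sxt:close name="ol"/>
    <sxt:up mode="up"/>
  </sxt:rule>
]]></eg>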

</div>

<div id="item-wrapping">
<head>Item wrapping</head>
<p>The input document contains a sequence of
<code>p</code> and <code>item</code> elements intermingled.  
Pass the <code>p</code> elements through unchanged;
wrap contiguous sequences of <code>item</code> elements
in <code>list</code> elements.
</p>

<p>So input of the form
<eg><![CDATA[<p/><p/><p/><item/><item/><p/>]]></eg>
should produce output of the form
<eg><![CDATA[<p/><p/><p/><list><item/><item/></list><p/>]]></eg>
</p>
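<p>No solution for this exercise is posted yet.  The following
untested sketch shows how the key rules might look.  It uses the same
two-state bookkeeping as the div-wrapping solution: modes
<code>down</code> and <code>up</code> mean that no <code>list</code>
is open, while <code>down-2</code> and <code>up-2</code> mean that
one is.  It assumes that <code>item</code> elements are empty (as in
the example) and that the parent contains no whitespace-only text
nodes.  The rules for <code>p</code> outside a list would be as in
the identity transform.</p>
<eg><![CDATA[  <!--* item with no list open: open list and item,
      * then continue in the list-open modes *-->
  <sxt:rule match="item" mode="down" pos="leaf">
    <list><item><sxt:self mode="up-2"/></item></list>
  </sxt:rule>

  <!--* item with a list already open: just open the item *-->
  <sxt:rule match="item" mode="down-2" pos="leaf">
    <item><sxt:self mode="up-2"/></item>
  </sxt:rule>

  <!--* item going up, more siblings follow: stay list-open *-->
  <sxt:rule match="item" mode="up-2" pos="not-last">
    <sxt:close name="item"/>
    <sxt:right mode="down-2"/>
  </sxt:rule>

  <!--* last item: close item and list, go up to the parent *-->
  <sxt:rule match="item" mode="up-2" pos="last">
    <sxt:close name="item"/>
    <sxt:close name="list"/>
    <sxt:up mode="up"/>
  </sxt:rule>

  <!--* p while a list is open: close the list, then
      * revisit the same p in mode down *-->
  <sxt:rule match="p" mode="down-2">
    <sxt:close name="list"/>
    <sxt:self mode="down"/>
  </sxt:rule>
]]></eg>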
<!--*
<p>SXT solution:<list>
<item><xref href="exx/h1-p.sxt.xml">SXT transform</xref></item>
<item><xref href="exx/h1-p.input.xml">sample input</xref></item>
<item><xref href="exx/output.h1-p.input.xml">sample output</xref></item>
</list></p>
*-->
</div>

<div id="inventory-summary">
<head>Calculating totals</head>
<p>The input document contains an <code>order</code> element with a
sequence of <code>item</code> elements. For example:
<eg><![CDATA[  <order> ...
    <item id="..." count="1" price="59.95">
      Men's Allegheny loafers, tan
    </item>
    <item id="..." count="2" price="64.95">
      Men's v-neck sweater, size 38, black
    </item>
    <item id="..." count="13" price="24.48">
      something something something
    </item>
    <item id="..." count="1" price="159.95">
      Cut-glass goblets, 1 pr
    </item>
    ...
  </order>]]></eg></p>

<p>For each item, multiply the count by the price,
and sum the resulting products.  If the order contained only the
four items shown, the result should be
<eg><![CDATA[<total>668.04</total>]]></eg>
</p>
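<p>The expected total can be checked by hand.  The products and their
sum for the four items shown are:</p>
<eg><![CDATA[   1 ×  59.95 =  59.95
   2 ×  64.95 = 129.90
  13 ×  24.48 = 318.24
   1 × 159.95 = 159.95
                ------
  total         668.04
]]></eg>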
<!--*
<p>SXT solution:<list>
<item><xref href="exx/h1-p.sxt.xml">SXT transform</xref></item>
<item><xref href="exx/h1-p.input.xml">sample input</xref></item>
<item><xref href="exx/output.h1-p.input.xml">sample output</xref></item>
</list></p>
*-->
</div>

<div id="inventory-bis">
<head>Inserting totals</head>
<p>This is exactly like the preceding exercise, but also perform an identity 
transform on all items while calculating the total.
Insert the <code>total</code> element
before the end-tag for <code>order</code>.
</p>

<!--*
<p>SXT solution:<list>
<item><xref href="exx/h1-p.sxt.xml">SXT transform</xref></item>
<item><xref href="exx/h1-p.input.xml">sample input</xref></item>
<item><xref href="exx/output.h1-p.input.xml">sample output</xref></item>
</list></p>
*-->
</div>
<div id="gi-list">
<head>Element lists</head>
<p>This exercise has several variants.</p>
<p>List the names of all elements in the input document,
in document order, one name in the output for each element
instance in the input. If more than one instance of an
element type occurs in the input, the name will occur
more than once in the output.</p>
<p>List the names without duplicates, either in document
order (of first occurrence) or in sorted order.</p>
<p>List the names and the number of element instances for each name,
in descending frequency order.</p>
<!--*
<p>So input of the form
<eg><![CDATA[<d/><d/><d><emp/><emp/><emp/></d>]]></eg>
should produce output of the form
<eg><![CDATA[<d/><d/><d><ol><emp/><emp/><emp/></ol></d>]]></eg>
</p>
*-->
<!--*
<p>SXT solution:<list>
<item><xref href="exx/h1-p.sxt.xml">SXT transform</xref></item>
<item><xref href="exx/h1-p.input.xml">sample input</xref></item>
<item><xref href="exx/output.h1-p.input.xml">sample output</xref></item>
</list></p>
*-->
</div>
</div>

<div id="tasks">
<head>Task list</head>
<div id="todo">
<head>To do</head>
<list>
<item>Add an <code>sxt:open</code> instruction (mirror
image of <code>sxt:close</code>) and simplify the prescribed
structure for rules: <list>
<item>Either a sequence of zero or more <code>sxt:close</code>
instructions followed by a movement instruction,</item>
<item>or a sequence of zero or more <code>sxt:open</code> instructions
and literal result elements, intermingled (or optionally 
just <code>sxt:open</code>,
<code>sxt:close</code>, and
<code>sxt:text</code> elements, intermixed subject to
the rule that at every point in the sequence the number
of <code>sxt:close</code> elements before that point
must not exceed the number of <code>sxt:open</code> elements
before that point), followed by an SXT movement instruction.
</item>
</list>
</item>
<item>Write and test solutions to more examples.</item>
<item>Add pointers to examples.</item>
<item>Allow simple wildcards in match patterns, to make
near-identity transforms less mind-numbingly repetitive to
write.  Also make <code>sxt:copy</code> behave, for 
elements, in a way roughly similar to <code>xsl:copy</code>.</item>
<item>Change debugging messages to use
message level <code>debug</code>, leaving
message level <code>trace</code> for messages
tracing the execution of the SXT automaton.</item>
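<item>Check the input automaton and enforce the rules described above
(for example, that the movement instruction is the rightmost leaf of
each rule).</item>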
<item>Make Web interface to allow user to run
transformations in their Web browser (in the
style of the <xref href="http://blackmesatech.com/2009/04/vc.html"
>XSD 1.1 vc:* filter</xref>)?</item>
<item>Make XForms interface to animate the execution 
of the automaton (in the style of the <xref href="../../2011/12/pl0/index.xhtml"
>PL/0 virtual machine emulator</xref>)?</item>
<item>Extend emulator to allow passing and retrieval of
numeric and string-valued parameters.</item>
<item>Add <code>sxt:deep-copy</code> and 
<code>sxt:skip</code> instructions?</item>
</list>
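<p>Purely to illustrate the first item above, here is how the
<code>h1</code> rule from the input-format section might look under
the simplified structure.  The <code>sxt:open</code> instruction does
not exist yet.  Its <code>name</code> attribute is an assumption,
mirroring the optional <code>name</code> on <code>sxt:close</code>.</p>
<eg><![CDATA[  <!--* hypothetical: sxt:open is proposed, not implemented *-->
  <sxt:rule match="h1" mode="down" pos="not-leaf">
    <sxt:open name="div"/>
    <sxt:open name="head"/>
    <sxt:down mode="down-2"/>
  </sxt:rule>
]]></eg>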
</div>
<div id="done">
<head>Done</head>
<list>

<item>[15 February 2012] Describe SXT briefly.</item>
<item>[15 February 2012] Describe input format.</item>
<item>[15 February 2012] Trivial extension:  allow (or require) QName on
<code>sxt:close</code> for sanity checking.</item>
<item>[15 February 2012] Write first version of emulator.
<list>
<item>Emit output.</item>
<item>On demand, emit trace information.</item>
</list></item>
<item>[14 February 2012] Begin this web page.</item>
</list>
</div>
</div>

</body>
</text>
</TEI.2>
<!-- Keep this comment at the end of the file
Local variables:
mode: xml
sgml-default-dtd-file:"/Library/SGML/Public/Emacs/sweb.ced"
sgml-omittag:t
sgml-shorttag:t
End:
-->
