Paper announcement: on tree automata for XML Schema validation

Dear all, 

We would like to announce the following research report:

"XML Schema, Tree Logic and Sheaves Automata."
By Silvano Dal-Zilio and Denis Lugiez.

Some recent works have studied statically typed languages for XML
processing based on Document Type Definition (DTD), a simple standard
for defining documents validity. In this context, several algorithms are
based on regular tree automata. 

This paper is concerned with an extension of tree automata tailored to
the manipulation of XML Schema, a more advanced standard. The paper is
available from: http://www.cmi.univ-mrs.fr/~dalzilio/RR-4631.ps


Silvano Dal-Zilio
Denis Lugiez



We describe a new class of tree automata and a related logic on trees,
with applications to the processing of XML documents and XML schemas.

XML documents, and other forms of semi-structured data, may be roughly
described as edge labelled trees. Therefore it is natural to use tree
automata to reason on them and try to apply the classical connection
between automata, logic and query languages. This approach has been
followed by various researchers and has given some notable results,
especially when dealing with Document Type Definition (DTD), the
simplest standard for defining XML documents validity. But additional
work is needed to take into account XML schema, a more advanced
standard, for which regular tree automata are not satisfactory. A major
reason for this inadequacy is the presence of an associative-commutative
operator in the schema language, inherited from the &-operator of SGML,
and the inherent limitations of regular tree automata in dealing with
associative-commutative algebras.

The class of automata considered here, called sheaves automata, is a
tailored version of automata for unranked trees with both associative
and associative-commutative symbols already proposed by the authors. In
order to handle both ordered and unordered operators, we combine the
transition relation of regular tree automaton with regular word
expression and counting constraints. This extension appears quite
natural since, when no counting constraints occurs, we obtain hedge
automata, a simple model for XML schemata, and when no constraints
occur, we obtain regular tree automata.

Silvano DAL ZILIO	           CNRS / LIF Marseille / Equipe MOVE
http://www.cmi.univ-mrs.fr/~dalzilio  Tel: +33 491 113 625  Fax: -602
    CMI, 39 rue F. Joliot-Curie, 13453 Marseille Cedex 13, FRANCE