#d2d 2.0 text using mtdocpage : webpage
#title #tdom, a Generator for Typed XML Models
#htmlTitle bandm metatools Tdom, Strictly TYPED XML Models
#lang en
#tableOfContents
// -------------------------------------------------------
#h1 #title Principles of #tdom #label principles
// -------------------------------------------------------
#p #tdom is a tool for generating #emph!typed! data models of an XML text body according to a definition given as an XML DTD #cite xml . "Typed" model means that (a) the validity of the model w.r.t. the DTD is guaranteed by all creation and modification methods,#footnote But see the discussion on unsafe API methods in #ref txt_dtdguarant below #/footnote and (b) that this can be proved at compile time.#footnote The only exception: when taken seriously, the constraints on id/idref/idrefs attributes imposed by #cite xml are nearly impossible to maintain! And they would be very expensive to check automatically with every model update. Therefore they are only evaluated on demand, controlled by the user. See #ref txt_idAttributes for details. #/footnote
#p The #tdom generated model behaves "partly algebraically", since each node behaves like an algebraic expression and knows nothing about the context(s) it appears in. So, in contrast to W3C DOM (#cite w3cDom), you can employ #emph!sharing!, even between different "documents". Nodes exist #emph!independently! of a global document object, and can be created, processed and stored in a freely compositional and local way. This is a fundamental requirement for a "functional style" of programming.
#nl #tdom nodes do #emph!not! behave algebraically insofar as they can be treated as #emph!mutable! (but think twice whether this is really necessary for your purposes: you lose sharing !-), and they do not support algebraic #src!equals()!.
#nl (This could be added in some later version !?!)
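#p The sharing property can be illustrated by a small Java sketch. This is not code generated by #tdom: the classes #src!Node! and #src!SharingDemo! are hypothetical stand-ins, assumed only for this illustration. Because a node refers to its children but neither to a parent nor to an owning document, one and the same sub-tree can be a child of two different trees.

```java
import java.util.List;

/** Hypothetical stand-in for a parent-less tree node; NOT a tdom class. */
final class Node {
    final String tag;
    final List<Node> children; // no parent pointer, no owner document

    Node(String tag, Node... children) {
        this.tag = tag;
        this.children = List.of(children);
    }

    /** Counts this node and all nodes below it. */
    int size() {
        int n = 1;
        for (Node c : children) {
            n += c.size();
        }
        return n;
    }
}

final class SharingDemo {
    /** Builds two "documents" bottom-up which share one sub-tree. */
    static Node[] build() {
        Node shared = new Node("p", new Node("em"));
        Node docA = new Node("body", shared);
        Node docB = new Node("div", shared, new Node("hr"));
        return new Node[] { docA, docB };
    }

    public static void main(String[] args) {
        Node[] docs = build();
        // The very same object is a child of both trees.
        System.out.println(docs[0].children.get(0) == docs[1].children.get(0)); // true
        System.out.println(docs[0].size() + " " + docs[1].size());              // 3 4
    }
}
```

#p With a parent pointer in #src!Node!, the second insertion of the shared sub-tree would have to copy it or detach it from the first tree. This is the price of a context-aware model which the paragraph above refers to.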
#p The fact that they "do not know" their parent and their siblings makes #tdom nodes behave more like nodes in a tree in the mathematical sense. Software architects used to W3C DOM and similar APIs may consider this restriction to be a drawback. Processing and creating trees is of course fundamentally different in this paradigm: creating goes most naturally bottom-up, processing goes most naturally by visiting top-down (see #ref txt_visitor) and memorizing all required context information "on the fly".
#nl (For all our applications, we found this a most convenient, safe and easy-to-debug way of coding !-)
#p Applying the #tdom compiler to a DTD yields a collection of Java source files, forming a single #src package#/. This package will be processed by a Java compiler. It relies on the presence of a collection of base classes in the package #link 2/eu/bandm/tools/tdom/runtime/package-tree.html #text !/tdom/runtime#/link, see #ref txt_runtimelasses#/ref . The generated collection provides (at least) one #java class definition for each type of node defined by the DTD. This includes #ldots
#list
#i one or two classes for each #src!ELEMENT! declaration,
#i one class for each attribute declaration,
#i and one class for each structured sub-content of an #src!ELEMENT!'s content.
#/list
All these Java classes are called "node classes" in the following. A #tdom model of a certain text corpus is realized by a structured collection of instances of these node classes. Each such instance represents a certain part of the document, and each node class represents a certain type of these document parts.
#p All generated node classes provide #ldots
#list
#i constructor definitions,
#i methods for retrieving sub-nodes of a given node,
#i methods for updating sub-nodes of a given node while preserving the correctness w.r.t.
the DTD,#nl (For all these topics see #ref txt_dtd2java and #ref txt_tdomconstruction) #i parsing methods for creating documents from SAX streams or W3C-DOMs (see #ref txt_autoconst), #i methods for translating a #tdom model into a SAX stream, #i methods for a compressed serialization and de-serialization (for these topics see #ref txt_serial). #/list #p Additionally the generated package contains #ldots #list #i a visitor base class #src!Visitor! and a visitor template #src!VisitorTemplate! for declarative processing of the models, as described in #ref txt_visitor, #i a class derived from the run-time class #src!TypedDTD!, which contains a DTD model. Please note that this class additionally does need the #emph!textual! representation of the DTD as a (error-free, parseable) runtime resource, accessible via the relative path "#src!./original.dtd!" from the DTD class. #/list #p After these classes have been compiled by a #java compiler, you can create a #tdom text model #ldots #list #i #ldots either "manually", i.e. by explicitly calling constructors for node classes, thereby explicitly creating the document tree #emph!bottom-up!, see #ref txt_dtd2java #i or by creating a whole #src!Document! with the generated parsing methods: Either with the validating SAX receiver from a SAX event stream, or with the validation DOM interpreter from a W3C-DOM (e.g. #cite xercesj), see #ref txt_autoconst. #/list #p Each #tdom model, or each fragment thereof, can then #ldots #list #i#ldots be analysed by visitor based code, see #ref txt_visitor, #i#ldots be modified by applying the update methods of those Java standard libraries which make up the model, see #ref txt_dtd2java, #i#ldots and finally be written out by different serialization methods, see #ref txt_serial #/list #p #label public_sources_using_tdom Some of the public examples on page #link getit.html #text download & licences#/link make extensive use of #tdom/. #nl See esp. 
the "BandM booking" bookkeeping software, where a dedicated DTD models the business objects, and the #ddd based "Wiki" where type correct XHTML 1.0 is constructed in small pieces bottom-up.
// FIXME replace EXAMPLE, -- better: dtd-renderer or musicXml-generator.
#p#kind src #label mt_sources_using_tdom In the sources of #mt themselves you find instructive examples. #link 2/eu/bandm/tools/option/package-tree.html #text !/option #/link shows a little stand-alone application:
#nl The hand-coded DTD file #link 3/eu/bandm/tools/option/absy/option.dtd #text !/option/absy/option.dtd#/link defines the structure of the input texts.
#nl These texts are read in and parsed into a #tdom model. This model is then traversed by different visitors and directly translated to #java code, -- all in the one compact file #link 3/eu/bandm/tools/option/Compiler.java#text !/option/Compiler.java#/link
#p#kind src #link 2/eu/bandm/tools/formatfrontends/DynamicFormatter.html #text !/formatfrontends/DynamicFormatter#/ is also stand-alone, but more complex. It can be read as an instructive example of how to solve small tasks by interpreting a #tdom model directly.
#p#kind src A larger stand-alone application is found in #link 3/eu/bandm/tools/doctypes/xhtml/xhtml1-strict.dtd #text !/doctypes/xhtml/xhtml1-strict.dtd #/.
#p#kind src #src!eu/bandm/tools/jul! and #src!eu/bandm/bpm/booking! are examples of rather small DTDs, which are not hand-written, but #emph!exported! from a #src!d2d! module definition. In these projects #src!d2d! is used as an input front-end to generate SAX events which construct a #tdom model.
#p#kind src #label mt_sources_using_xantlr_tdom But most applications of #tdom operate together with #xantlr, taking an "abstract syntax tree" and reducing it to some internal (semantic) model. This chain can be observed in the packages #src!formatfrontends!, #src!umod!, #src!muli!
and #src!d2d!, (not to speak of our applications #src!hr!, #src!bpmn!, #src!score!, #ldots ????)
#p#kind src In more detail, the processing chains are #ldots
#table
#tr #td #link 3/eu/bandm/tools/umod/parser/umod.g #td --> #link 3/eu/bandm/tools/umod/parser/UModParser.dtd #td --> #link 2/eu/bandm/tools/umod/absy #td --> #link 3/eu/bandm/tools/umod/Reducer.java
#tr #td #link 3/eu/bandm/tools/d2d/parser/d2d.g #td --> #link 3/eu/bandm/tools/d2d/parser/D2dParser.dtd #td --> #link 2/eu/bandm/tools/d2d/absy #td --> #link 3/eu/bandm/tools/d2d/base/Reducer.java
#/table
// -------------------------------------------------------
#h1 #title Mapping from DTD to #java #label txt_dtd2java
// -------------------------------------------------------
#p In the following simplified examples, let "#src!pkg!" be the name of the generated package, as given to the #tdom tool by the command line parameter #src!--pkgname!, see #ref txt_calltdom.
// -------------------------------------------------------
#h2 #title Relevant Information content of a DTD
#p The #tdom/ tool processes one single DTD file and generates one package of Java source files. (The contents of further DTD files, which are included directly or indirectly in this file by the famous "external parameter entity" mechanism, are of course also considered and are processed as if contained directly in the top file.)
#nl From the DTD it uses #ldots
#list
#i#ldots all element definitions, their (expanded) content models and their attribute definitions.
#i#ldots process instructions of the form "#src!!"
#i#ldots entity definitions.
#/list
In most cases, entity definitions are only used implicitly. #tdom uses the #mt component #link dtd.html #text#src!dtd!#/link, expanding entity references in a transparent way.
// -------------------------------------------------------
#h3 #title Process instructions #label txt_pi
/* OVERALL:
 first column:  DOC at the beginning, overview in section #ref txt_pi (2.1.1 or similar)
 second column: DOC in detail, in a section of its own
 ( )( ) package FIXME ( ) used in mt/examples/tdom/extend/*.dtd
 (x)(x) doc
 (X)( ) 5 import
 (x)(X) 2.11 attribute
 (x)(X) 2.11 attribute-entity
 ( )( ) content
 ( )( ) content-entity
 (X)(?) 2.3 private 3.2 reference is WRONG and points to 2.1.1 FIXME
 (X)(?) 2.3 public
 (X)(?) 2.3 default private
 (X)(?) 2.3 default public
 (x)(X) 2.3.1 abstract (5 "" ) FIXME ellipsis
 (x)(X) 2.3.1 abstract-entity
 (X)(X) xmlns
 (x)(X) xmlns:[pref]
 (X)(x) SYSTEM x y
 (X)(x) PUBLIC x
 ===
*/
#p The translation into Java code is controlled by a whole zoo of "process instructions" addressing #tdom, as defined in #cite xml. Here are the most important:
#list
#i#src!! #nl #src!! -- XML namespace definition, see #ref txt_nsmode.
#i#src!! -- defines the XML document id for this DTD file, see #ref txt_ownid.
#i#src!! -- defines which elements can be used as top of a model, as so-called "document elements", see #ref txt_elementclasses.
#i#src!! #nl #src!! -- generate #src!abstract! classes for alternatives in DTD content models, for leaner code, see #ref txt_abstractelements. Ending with an ellipsis it is also used in the expansion mechanism, see #ref txt_extension.
#i#src!! #nl #src!! -- generate a common class for an attribute used in different elements, for leaner code, see #ref txt_commonAttributes
#i#src!! -- import another #tdom model for extending it, see #ref txt_extension
#i#src!! -- define additional documentation text to integrate it into the generated API doc, see #ref txt_docTexts
#i#src!! -- ?? FIXME MORE (_)
#/list
// -------------------------------------------------------
#h2 #title XML Namespaces #label txt_nsmode
#p The names of all elements, entities and attributes are implemented as instances of #link 2/eu/bandm/tools/util/NamespaceName.html #text NamespaceName#/link.
#p As documented there, this class can represent names in "non-namespace-mode" and in "namespace-mode". In the first case the character "#src!:!" is treated in no way special. In the latter there must be at most one such character, and the prefix is mapped to a "namespace URI", as it is the standard way with XML namespaces, see #cite xml-ns. #p For #tdom, the namespace mode is activated by process instructions which exactly follow the standard XML syntax: #commentchar\ #source #/source #commentchar/ #p For the runtime namespace logic, the prefixes are ignored and "equals()" etc. is ruled only by the namespace URI, -- the usual way with namespace aware XML. In this concern, all prefixes must be different, but can be arbitrary. #p But for the code generation the prefixes are kept and only the colon "#src!:!" is replaced by an underscore "#src!_!". (There are more characters to be replaced, see the paragraph on name translation in #ref txt_elementclasses.) So the selected prefixes appear in the name of the generated Java classes and should be selected accordingly. // ------------------------------------------------------- #h2 #title The own XML Document Id #label txt_ownid #p With the PIs #ldots #commentchar\ #source --- or --- #/source #commentchar/ #p #ldots/ the #link 2/eu/bandm/tools/message/XMLDocumentIdentifier.html #text XMLDocumentIdentifier#/link of the dtd file itself is made known to #tdom/. // is stored in TypedDOMGenerator.dtd.storeDocumentID() It will be stored in generated DTD class and is accessible by the method #link 2/eu/bandm/tools/tdom/runtime/TypedDTD.html #loc getDocumentId-- #text getDocumentId()#/link. // ------------------------------------------------------- #h2 #title Pre-Defined Infra-Structure, Runtime Classes #label txt_runtimelasses #p The classes generated by #tdom to represent Elements, Attributes and sub-expressions of content models, all inherit from pre-defined runtime classes. 
These classes are contained in #src!eu.bandm.tools.tdom.runtime! and provide basic functionalities. #p The following figure indicates symbolically their inheritance tree, and the places where the generated classes are inserted: #nl (#emph Please note#/ that this graph is only symbolic and leaves out many details. For more details, please refer to the current API documentation contained in #link 2/eu/bandm/tools/tdom/runtime/package-tree.html #text !/eu/bandm/tools/tdom/runtime #/link !) #commentchar\ #source INTERFACES: eu.bandm.tools.tdom.runtime. TypedContent TypedElement.MixedContentContainer | TypedElement.PCDataContainer Visitable ImpliedAttribute Identifiable ... // etc CLASSES: eu.bandm.tools.tdom.runtime. TypedDTD | pkg.DTD <<<<< GENERATED once for each Tdom model TypedNode | TypedDocument | | TypedSubstantial | | TypedPCData IMPLEMENTS Matchable // FIXME visitable / matchable??? WAS GILT?? | | | | TypedElement IMPLEMENTS TypedContent | | | pkg.Element <<<<< GENERATED once for each Tdom model | | | | pkg.Element_ <<<<< GENERATED once for each ELEMENT declaration | | | | pkg.Element_ <<<<< " | | TypedEthereal | | TypedComment | | TypedProcessingInstruction | | TypedSubtree IMPLEMENTS TypedContent | | TypedChoice | | TypedAttribute | | CDataAttribute | | EnumerationAttribute pkg.Element_.Attr_ <<<<< GENERATED for each attribute declaration | | NmTokenAttribute | | | IdAttribute | | | IdRefAttribute pkg.Element_.Attr_ <<<<< GENERATED for each attribute declaration | | NmTokensAttribute | | | IdRefsAttribute pkg.Element_.Attr_ <<<<< GENERATED for each attribute declaration | ... | TypedElement.MixedContent IMPLEMENTS TypedContent | pkg.Element_.Content <<<<< GENERATED once for each mixed-content ELEMENT TypedElement.MixedContentFactory //... 
#/source
#commentchar/
// -------------------------------------------------------
#h2 #title Generated Java Classes for the top-level DTD #label txt_dtdgen
#p In each #tdom run, among others one top-level class #src!pkg.Element! is generated, from which all element classes are derived, and one prototype class #src!pkg.Visitor!. The #src!pkg.Element! class implements #src!tdom.runtime.Visitable!; these co-operate in visitor dispatching. See #ref txt_visitor below.
#p For a beginner, the different classes and instances called "dtd" or similar may be confusing:
#list
#i In most cases, the DTD to translate into Java source files is given to the implementation of the #tdom compiler as a DTD text file, as it is defined in #cite xml. See #ref txt_calltdom below.
#i In the generated package "#src!pkg!", a class called "#src!pkg.DTD!" will be generated. This is the central means for retrieving all kinds of reflective information, when later running the generated code.
#i For this purpose, it inherits from #link 2/eu/bandm/tools/tdom/runtime/TypedDTD.html #text !/tdom/tdom/runtime/TypedDTD #/link.
#i The method #link 2/eu/bandm/tools/tdom/runtime/TypedDTD.html #loc getInterfaceInfo-- #text !/tdom/tdom/runtime/TypedDTD/getInterfaceInfo #/link delivers different class objects, and an instance of #link 2/eu/bandm/tools/tdom/runtime/TypedDTD.AbstractElementInfo.html #text !/tdom/tdom/runtime/TypedDTD.AbstractElementInfo #/link.
#i In particular, the generated #src!pkg.DTD! provides a public static field called "#src!dtd!", which gives access to one instance of itself.
#i The original DTD source text must be accessible to the package's initialization code in a #emph!text file named! "#src!original.dtd!".
#i This is parsed on initialization, and the resulting instance of #link 2/eu/bandm/tools/dtd/DTD.Dtd.html #text !/dtd/DTD.Dtd#/link, which is an #umod model, is made accessible by #link 2/eu/bandm/tools/tdom/runtime/TypedDTD.html #loc getDTD-- #text !/tdom/tdom/runtime/TypedDTD,getDTD()#/link.
#/list
// -------------------------------------------------------
#h2 #title Generated Java Classes for Element declarations. General Name Translation. #label txt_elementclasses
#p For each #src() declaration in the source DTD, a node class is generated in the generated package "#src!pkg!". Instances of this class will be used to represent the document's sub-trees corresponding to this element. Such a class is called "element class" in the following. Its name is #src!Element_!, where #src!! is the DTD name translated to a Java name.
#p #label txt_nametranslation The #bold!name translation from DTD to Java! is necessary for all kinds of names, such as element tags, attribute names, entity names etc. In all these cases, every single occurrence of a minus sign "#src!-!", a colon "#src!:!" or a dot "#src!.!" is replaced by a single underscore "#src!_!".
#p The #tdom tool does #bold!not check! whether ambiguities are created by this translation. Instead, you will get corresponding error messages from the subsequent Java compilation process. This will happen for an input like
#source
#/source
#p For #emph!those! #src() declarations which can serve as the #emph!top-level! node of a document, a further class named #src!Document_! is created. This inherits from #link 2/eu/bandm/tools/tdom/runtime/TypedDocument.html #text !.tdom.runtime.TypedDocument#/link and additionally contains the methods for parsing a document as a whole from some external source (SAX or W3C-DOM). The indication whether a given element is such a top-level one is encoded in the DTD by process instructions: The positive case (=yes, #src E #emph!can!
be top-level element) is indicated in the DTD by the process instruction #source #/source #p The negative case by #source #/source #p For all those element declarations which are not explicitly mentioned in this way the default is defined by #source -- or -- #/source #p As mentioned above, every element class may contain inner Java classes which realize complex sub-contents, and update and retrieve methods ("#src!set_<..>(..)!" and "#src!get_<..>(..)!") for all sub-contents. #p Please note that you can even #xemph!derive further hand-coded sub-classes! from generated element classes. This is possible because, whenever a typed element needs to know the identity of the class it is an instance of (e.g. for visiting or for serialization), it does #emph!not! use the #java language #src!.getClass()!-method, but the generated method #src!public int getTagIndex()!, which will be inherited by your derived classes and is specified in #link 2/eu/bandm/tools/tdom/runtime/TypedElement.html #loc getTagIndex-- #text TypedElement.getTagIndex()#/link. #p The tag name of a given element can be read statically, when the class is known, by #ldots #source Element_<>.TAG_NAME #/source #p #ldots due to the generated definition #ldots #source public static final String TAG_NAME ; #/source #p The dynamic way is defined in #link 2/eu/bandm/tools/tdom/runtime/TypedElement.html #text !/tdom/runtime/TypedElement#/ by #source NamespaceName el.getName() String el.getTagName() String el.getNamespaceURI() String el.getLocalName() #/source #p // ------------------------------------------------------- #h3 #title Abstract Java Classes as Realisations of DTD Content Model Alternatives #label txt_abstractelements #p The #tdom tool does support some #emph!abstraction! of isomorphic content definitions. This abstraction is rather limited, due to the nature of DTD, but nevertheless an important means for increasing re-usability, while preserving static type safety. 
#p Any content model which is an undecorated choice expression of element references can be translated into an #src!abstract! class, in the Java terminology. The benefit comes from the further consequences: (a) all element declarations referred to in this alternative will be translated into an element class which is derived from this abstract class, and (b) whenever this choice clause appears in a certain content model, it will be replaced by a simple reference to this abstract class. This happens independently of the sequential order of the alternatives, and also when the choice alternatives are a proper subset of the alternatives of a bigger choice clause.
#nl In the case of nested choices, the largest sub-expression of every choice which matches such an abstraction is replaced by the corresponding abstract class.
#nl (This statement must be refined for overlapping definitions, see below.)
#nl The single inheritance property of Java implies that each element may appear in at most one of these declarations.
#p This mechanism is controlled by a tdom process instruction like
#source
#/source
#p The content model must be a disjunction of plain element types. The PI declares that an abstract class "#src!Element_a!" will be generated. This will be the superclass of the classes which realize the elements in the disjunction, here: #src!Element_b!, #src!Element_c! and #src!Element_d!. We assume that the name "#src!a!" is #bold!fresh!, i.e. does not appear as an element or entity declaration in the rest of the DTD. Whenever this choice appears in a content model, the complicated interface for choices (see #ref txt_realizingchoices below) is replaced by the much simpler interface for a single element reference.
#source
==> yields a code interface containing
class Element_x {
  public Element_a[] getElems_1_a() {...}
  public void setElems_1_a(Element_a[]) {...}
  public Element_a getElem_2_a() {...}
  public Element_a setElem_2_a(Element_b e) {...}
  public Element_a setElem_2_a(Element_c e) {...}
  public Element_a setElem_2_a(Element_d e) {...}
}
#/source
#p
#p#kind missing WHAT ABOUT (PCDATA | b|c|d) ????????????
#p A more realistic example, simplified from our XHTML model:
#source
// is NOT used in xhtml_1_0.dtd WHY?? FIXME
#/source
#p The last example shows that these definitions may be nested, as long as no cycles result.
#p Whenever such a choice expression is defined by the contents of a DTD "parameter entity", this entity can be used directly. Its contents will define the subclasses, and its name will be used as the name of the abstract class. This is shown in the first line of the example above. The entity's expansion text may carry further decoration which will be stripped to get the alternative expressions, as in
#source
#/source
//#p
//The entity may only contain this alternative, plus additionally one
//decorating character after the closing parenthesis.
#p This abstraction mechanism significantly increases reusability and versatility: With the XHTML example, it is now possible to collect the very different sub-classes of #src!Element_block_content! into one single storage, e.g. an #src!ArrayList!, and later insert this sequence into an arbitrarily chosen instance of #src!Element_object!, #src!Element_map!, #src!Element_fieldset!, #src!Element_noscript!, #src!Element_body!, or #src!Element_blockquote!.
#p Without this abstraction, the first, collecting step would already require the wrapping of the elements into a certain alternative of a certain choice type of a certain hosting element class, and the collected sequence could not be used anywhere else, without "hacking" and losing static type safety.
#p The behaviour in case of #bold!overlapping!
choice expressions is not fully defined:
#commentchar\
#source
// not clear what will happen here:
#/source
#commentchar/
#p So please look into the generated code to find out, or avoid such definitions.
#p Up to here we assumed that the name of the abstract class is a fresh name.
#nl A different case is to declare an #bold!existing element declaration as abstract!. The preconditions and consequences are the same: the content model of the element must be an undecorated choice clause.
#p Additionally, the corresponding node is no longer represented in the generated model by an object instance of its own, but is represented indirectly, by an instance of one of its sub-classes, which corresponds to an element from its content model's choice expression. This instance transparently represents two(2) or even more nodes of the conceptual document tree, namely the chosen leaf element and the containing, abstract element. "Transparently" means that visitor code and attribute accessing methods are not affected.
#p FIXME IS THIS CORRECT??? EXAMPLE ??? The code in TypedDOMGenerator seems NOT to support a "" !?
#p#kind missing MORE PRECISELY! FIXME
#p This method is frequently employed in the #tdom /#xantlr co-operation, to eliminate unnecessary nodes which only present alternatives in the derivation tree. This is controlled by a "#src!options {xmlNodeType=abstract;}!" option in the #xantlr grammar file, which is translated to the #tdom PI automatically, see #ref txt_xantlrtdom below.
#p#kind missing DON'T KNOW how much is really CHECKED of these pre-conditions !?!?!
// ------------------------------------------------------- #h3 #title Name Mangling from DTD Elements' Contents and Sub-Contents to Java Classes #label txt_namemangling //#h2 #title Name Mangling from DTD #src ELEMENT s' Content and Sub-Content //to #java #src!class!es #label txt_namemangling #p (This section describes how Java class definitions and their naming are derived from the contents model of a certain element. Please note that the mere #emph!name translation! for eliminating those characters, which are valid identifier components in XML but not in Java, as described in #ref txt_elementclasses above, must happen anyhow and independently!) #p The mapping rules between DTD and #java class definitions act locally on each "#src$$" declaration. The structure of the generated #java classes and their naming convention immediately reflect the usage of #emph!parentheses! in the regular expression describing the element's contents, as given by the compiled DTD. #nl Please note that there is #bold no implicit normalization#/bold of DTD content models. For the name mangling purpose #emph!there is a difference! between the content models #source a, (b, c) #/source #p #ldots and #ldots #source a, b, c #/source #p Name mangling is basically defined on #emph!sequences!. #p Therefore, as a first step, the top-level of the element's content definition is always interpreted as such a sequence , possibly of length 1(one). #p Each sequence consists of #emph!content particles!. Each such content particle is #ldots #list #i either an element reference, #i or an embedded choice, #i or an embedded sub-sequence. #/list #p All content particles may be decorated with a quantification symbol "#src!?!", "#src!*!" or "#src!+!". The naming convention assigns #emph(position numbers separately) to (a) to all references to elements of a certain tag, (b) to all embedded choices and (c) to all embedded sub-sequences. #nl These numberings #emph[always start with the number one(1)]. 
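#p The numbering rule can be sketched in a few lines of Java. This is an illustration only, not the code of the #tdom generator: the types #src!Kind!, #src!Particle! and #src!ParticleNames! are hypothetical, and the name patterns are those of the examples in this section (#src!Elem_1_TA!, #src!Choice_1!, #src!Seq_1!). The helper #src!javaName()! applies only the three replacements named in the name translation rule of #ref txt_elementclasses. Counting the references per translated tag is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the three kinds of content particle; NOT tdom API. */
enum Kind { ELEMENT, CHOICE, SEQUENCE }

final class Particle {
    final Kind kind;
    final String dtdTag; // tag as written in the DTD; only used for ELEMENT

    Particle(Kind kind, String dtdTag) {
        this.kind = kind;
        this.dtdTag = dtdTag;
    }
}

final class ParticleNames {
    /** Replaces every '-', ':' and '.' by '_' (the rule named in the text). */
    static String javaName(String dtdName) {
        return dtdName.replaceAll("[-:.]", "_");
    }

    /** Names the particles of ONE sequence, from left to right. */
    static List<String> names(List<Particle> sequence) {
        Map<String, Integer> perTag = new HashMap<>(); // (a) one counter per tag
        int choices = 0;                               // (b) embedded choices
        int seqs = 0;                                  // (c) embedded sub-sequences
        List<String> result = new ArrayList<>();
        for (Particle p : sequence) {
            switch (p.kind) {
                case ELEMENT: {
                    String tag = javaName(p.dtdTag);
                    int n = perTag.merge(tag, 1, Integer::sum); // starts at 1
                    result.add("Elem_" + n + "_" + tag);
                    break;
                }
                case CHOICE:
                    choices++;
                    result.add("Choice_" + choices);
                    break;
                case SEQUENCE:
                    seqs++;
                    result.add("Seq_" + seqs);
                    break;
            }
        }
        return result;
    }
}
```

#p For a sequence consisting of a reference to #src!TA!, a reference to #src!TB!, a choice, a second reference to #src!TA! and two sub-sequences, #src!names()! delivers #src!Elem_1_TA!, #src!Elem_1_TB!, #src!Choice_1!, #src!Elem_2_TA!, #src!Seq_1! and #src!Seq_2!. Quantification symbols do not appear in the sketch at all, since they do not influence the names.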
//These numberings always start with the number one(1) ! ERROR "!" still end flag??
#p From these numbers (and the tag strings in case (a)) #emph!particle names! are generated.
#nl E.g. the sub-particles of the top-level in the following DTD content model are addressed by the particle names put beneath:
#source
| | | | | | | Seq_1 | | Choice_1 Seq_2 | | Elem_1_TB Elem_1_TA Elem_2_TA
#/source
#p The quantification symbols "#src!?!", "#src!*!" or "#src!+!" and all parentheses around singletons (i.e. not enclosing sub-sequences or alternatives) are ignored in the definition of particle names.
#p If the top-level content model as written in the DTD is an alternative, the top-level content for #tdom is considered as a singleton sequence.
#nl In this case we get a very simple top-level naming, like #ldots
#source
| Choice_1
// or
| Choice_1
#/source
// -------------------------------------------------------
#h3 #title Inner Classes Generated for Sub-Content
#p For each sub-sequence and each choice contained in the top-level sequence, an #emph!inner class! is defined in the class representing the element. The names of these classes are identical with the particle names, as defined above.
#p As with element classes, each such class provides update and retrieve methods ("#src!set_<..>!" and "#src!get_<..>!") for its sub-contents. An instance of such an inner class must be used as an argument for the constructors and for update methods, and is returned as result of a corresponding retrieve method.
#p #bold Please note#/ that in the current implementation there is #emph!no! algebraic equality defined on content models. Therefore in the example above the types of #src!Seq_1! and #src!Seq_2! are #emph!not compatible!, in spite of having the same contents definition. The same holds for embedded choices.
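#p What this incompatibility means in practice can be shown by a small sketch. The classes below are hand-written stand-ins which only mimic the shape of generated code. Their fields, their constructors and the helper #src!toSeq_2()! are assumptions made for this illustration, and static nested classes are used for brevity.

```java
/** Hand-written stand-ins mimicking generated classes; NOT tdom output. */
final class Element_TA { }

final class Element_TB { }

final class Element_NC {
    /** First embedded sequence, with the content (TA, TB). */
    static final class Seq_1 {
        final Element_TA a;
        final Element_TB b;

        Seq_1(Element_TA a, Element_TB b) {
            this.a = a;
            this.b = b;
        }
    }

    /** Second embedded sequence, with the very same content definition. */
    static final class Seq_2 {
        final Element_TA a;
        final Element_TB b;

        Seq_2(Element_TA a, Element_TB b) {
            this.a = a;
            this.b = b;
        }
    }

    /** Explicit conversion: the sub-nodes are shared, only the wrapper is new. */
    static Seq_2 toSeq_2(Seq_1 s) {
        return new Seq_2(s.a, s.b);
    }
}

final class NominalTypingDemo {
    public static void main(String[] args) {
        Element_NC.Seq_1 s1 =
            new Element_NC.Seq_1(new Element_TA(), new Element_TB());
        // Element_NC.Seq_2 s2 = s1;   // rejected by the Java compiler
        Element_NC.Seq_2 s2 = Element_NC.toSeq_2(s1);
        System.out.println(s2.a == s1.a); // true: the element node is shared
    }
}
```

#p Since the contained element nodes do not know their context, such a conversion only has to create a new wrapper object, while the sub-nodes themselves can be re-used.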
// ------------------------------------------------------- #h3 #title Retrieval, Update and Visit Methods #label txt_ruvmethods #p Built upon the particle names, the #java class generated for the DTD element "#src!NC!" provides methods for retrieving and updating the contents of a given instance. Which methods are generated is controlled by the quantification decoration of the content particle. #p Let #src!! be the particle name, and #src!! be its plural form (i.e. "#src!Elems_1_TA!" for "#src!Elem_1_TA!", "#src!Choices_2!" for "#src!Choice_2!" and "#src!Seqs_1!" for "#src!Seq_1!"). #nl Let #src!! be the class representing a sub-content (i.e. "#src!Element_TA!" for an Element reference, and "#src!Element_NC.Seq_1!" or "#src!Element_NC.Choice_2!" for embedded sub-content). #commentchar\ #p In case of undecorated particles the generated methods are #ldots #source public get(){..} // deliver current content // this is always != null public set( e){..} // update current content // if e==null, throw exception #/source #p If the modifier "#src!?!" is present, we get #ldots #source public get(){..} // deliver current content, // but may return null public set( e){..} // update current content // and accept null as argument public boolean has(){..} // return whether component is currently // contained in the higher-level content #/source #p If one of the modifiers "#src!*!" or "#src!+!" is present, we get #ldots #source public [] get(){..} // deliver whole sequence as an array public get(int pos){..} // deliver content at the given position // of the sequence public [] set([] e){..} // update current content totally // to a whole sequence public set(int pos, e){..} // update current content // at the given position public int count(){.. 
// return number of components currently
// contained in the higher-level content
public void visit(Visitor v){..} // apply visitor to all particles in the
// sub-content
#/source
#commentchar /
#p Please note that for your convenience every "#src!set<>()!"-method (including those described in the following sections!) always returns the old, overwritten value as its result.
#p The meaning of the method "#src!visit(Visitor c)!" will be explained in section #ref txt_visitorcall.
// -------------------------------------------------------
#h3 #title Unsafe retrieval methods and alternative checked list generation #label txt_dtdguarant
#p In the preceding list of methods the retrieval functions for plural sub-contents like "#src!get_Elems_1_TA()!" (or like "#src!getSeqs_2()!" and "#src!getChoices_1()!" as introduced in the next sections) deliver direct access to the Java "array" data object which realizes the contents of the Tdom model instance. Due to a weakness of the Java language, this array is #xemph!not protected against vandalism!, i.e. storing into it a forbidden #src!null! value.
#p The same holds for the plural update methods like "#src!set_Elems1_TA(TA[])!" (or like "#src!setSeqs_1(TN.Sequ_1[])!" and "#src!setChoices_1(TN.Choice_1[])!"), which additionally can violate the "#src!+!" specification when they are applied to zero-length arguments. #xemph So the type correctness w.r.t. the original DTD is only guaranteed if these API methods are not used.
#p An alternative is the usage of #xemph!checked lists! for the implementation. These lists are proxy classes above the Java list classes and prohibit the insertion of #src!null!, see #link 2/eu/bandm/tools/umod/runtime/CheckedList.html #text their api doc#/link. This mode is selected when running tdom with the command line switch "#src --generateLists", see #ref txt_calltdom.
#p If lists are selected then in the method signatures below and above all types "#src!x[]!"
(appearing as result types or parameters) must be replaced by "#src!CheckedList<x>!". In this mode, all code using tdom is always type correct w.r.t. the original DTD.

// -------------------------------------------------------
#h3 #title Inner Classes Generated for Embedded Sequences
#label txt_realizingsequences

#p
In case of embedded sequences, the whole top-level procedure (particle naming scheme, inner class definition for sub-structures and generation of methods) is simply applied recursively.

#p
A #bold difference#/bold in the implementation is that the classes for sub-content (i.e. embedded Sequences and Choices) are not nested inside the inner class representing the enclosing content, but reside as direct inner classes of the element's class.
#nl
The nesting is only represented by their name, which is a concatenation of the particle names of all levels, connected by an underscore "#src!_!".

#p
The following example shows some of the get methods and the resulting types (classes). The names of both are again constructed from the particle names:
#source
  |  |  |
  |  |  nc.getSeq_1().getSeq_1(3).getElem_1_TB()=>Element_TB
  |  |
  |  nc.getSeq_1().getSeq_1(3)=>NC.Seq_1_Seq_1
  |
  nc.getSeq_1()=>NC.Seq_1
#/source

// -------------------------------------------------------
#h3 #title Inner Classes Generated for Choices
#label txt_realizingchoices

#p
The inner classes generated for choices are sub-classes of the pre-defined runtime class #src!TypedChoice!. Additionally, for each alternative of a choice an inner class is generated, which is again a sub-class of this "typed choice class". The name of such an alternative class is the name of the choice class with the suffix "#src!_Alt_<n>!".

#p
This "#src!<n>!" used to identify an alternative is the position number w.r.t. the containing choice in the original DTD formula. #emph[This numbering starts with 1(one) !]
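#p
The naming rules of the last two sections can be restated in a few lines of plain #java. This is only an illustrative sketch: the class #src!NestedNameSketch! and its two helper methods are hypothetical and not part of #tdom or its runtime; they merely recompute, under the rules stated above, the flat names which a generated inner class would carry.

```java
import java.util.List;

// Hypothetical helper, NOT part of tdom: it only restates the naming rules.
class NestedNameSketch {

    /** Joins the particle names of all nesting levels with "_" and
     *  prefixes the name of the element's class, since all sub-content
     *  classes are direct inner classes of that class. */
    static String flatClassName(String elementClass, List<String> particlePath) {
        return elementClass + "." + String.join("_", particlePath);
    }

    /** Name of the class for one alternative of a choice.
     *  The position is counted from 1, as in the DTD formula. */
    static String altName(String choiceName, int position) {
        if (position < 1) {
            throw new IllegalArgumentException("alternatives are numbered from 1");
        }
        return choiceName + "_Alt_" + position;
    }

    public static void main(String[] args) {
        // a sequence embedded in a sequence
        System.out.println(flatClassName("NC", List.of("Seq_1", "Seq_1")));
        // a choice embedded in the third alternative of a choice
        System.out.println(flatClassName("NC", List.of(altName("Choice_1", 3), "Choice_1")));
    }
}
```

#p
Run as it stands, the sketch prints #src!NC.Seq_1_Seq_1! and #src!NC.Choice_1_Alt_3_Choice_1!. The point of the flat scheme is that the depth of the nesting shows up only in the length of the name, never in the depth of the #java class nesting.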
#p
In our example from above, the naming is #ldots
#source
  ||  |  |
  ||  |  Choice_1_Alt_3
  ||  Choice_1_Alt_2
  |Choice_1_Alt_1
  Choice_1
#/source

#p
The methods generated for the choice class are #ldots
#commentchar \
#source
public class Element_NC extends eu.bandm.tools.tdom.Element {
  ...
  public Choice_1 setChoice_1(Choice_1 e){...}  // change content accordingly.
  public Choice_1 getChoice_1(){...}            // deliver current content

  public abstract class Choice_1 extends TypedChoice {
    public int getAltIndex(){...}           // deliver the index of the
                                            // currently contained alternative
    public Choice_1_Alt_1 toAlt_1(){..}     // convert to the corresponding class,
    public Choice_1_Alt_2 toAlt_2(){..}     // if current content represents this
                                            // alternative. Otherwise, return null
    public boolean isAlt_1(){..}            // return true iff current content
    public boolean isAlt_2(){..}            // is of the mentioned alternative.
    ...
  }
  public class Choice_1_Alt_1 extends Choice_1 {
    ...
    // update/retrieve/visit methods like a top-level element/sequence class !!
  }
}
#/source
#commentchar /

#p
The contents of each #src!Choice_<m>_Alt_<n>! class are again treated as a sequence (possibly a singleton sequence), and the top-level naming and code generation scheme is applied recursively.
#nl
Again, no further nesting of inner classes will be applied, but the representing classes are direct inner classes of the element's class, and their names are created by concatenating the particle names along the nesting hierarchy.

#p
An example for retrieving:
#source
  |  |  |
  |  |  nc.getChoice_1().toAlt_3().getChoice_1(8)
  |  |     =>NC.Choice_1_Alt_3_Choice_1
  |  |
  |  nc.getChoice_1().toAlt_3()=>NC.Choice_1_Alt_3
  |
  nc.getChoice_1()=>NC.Choice_1
#/source

// -------------------------------------------------------------------------------
#h3 #title Text Content and Mixed Content

#p
Mixed content and plain character content are treated specially.
Mixed content could be considered a "choice-type with #src!*!-quantification", but in contrast to the standard implementation described above, the layer which explicitly addresses the choices is skipped for the sake of the user's convenience.

#p
Instead, a specialized #src!Content! class is defined in the element's implementing class, which can contain either character data, or one of the elements listed in the mixed content declaration.

#p
So the DTD definition #ldots
#suppressVerbatimCommandCharWarning 2
#source #/source

#p
#ldots is translated to #ldots
#commentchar \
#source
public class Element_NB extends Element implements TypedElement.TypedPCDataContainer ... {
  public static class Content extends TypedElement.MixedContent { ... }
  public List getContent(){..}    // returns the modifiable list of particles
  public String getPCData() {return getPCData(this);}   // convenience function
}
public class Element_NC extends Element implements TypedElement.MixedContentContainer ... {
  ...
  public static class Content extends TypedElement.MixedContent {
    public Content (Element_TA el){...}     // create the variant with element TA
    public boolean isElement_TA(){...}      // returns whether content particle is a TA
    public Element_TA toElement_TA(){...}   // returns casted content or null
    public Content (Element_TB el){...}     // create the variant with element TB
    public boolean isElement_TB(){...}      // returns whether content particle is a TB
    public Element_TB toElement_TB(){...}   // returns casted content or null
    // inherited from TypedElement.MixedContent :
    public Content (String s){...}          // create the variant with pcdata
    public Content (TypedPCData s){...}     // ditto.
    public boolean isPCData(){...}          // returns whether content particle is PCData
    public TypedPCData toPCData(){...}      // returns casted content or null
  }
  ...
  public List getContent(){..}    // returns the modifiable list of particles
}
// to get the character content of the pcdata particles, you additionally need:
public class TypedPCData extends TypedNode {
  ...
  public String getPCData(){}     // returns text content of this content particle
  \\??? REMOVE
  // .ignorable() missing FIXME
}
#/source
#commentchar /

#p
Let #src!Elx! be a generated element class, and #src!elx! a reference to an instance of it. Reading the character data of a given content particle is done as follows:
#source
  for (Elx.Content c : elx.getContent())
    if (c.isPCData()) {
      String charSeq = c.toPCData().getPCData();
      ...
    }
#/source

#p
This is rather tedious, of course.
#nl
The PCData objects themselves are algebraic: to change the text contents, you have to create a new instance and insert it into the list of #src!elx.getContent()!. For convenience there is a constructor which implies the #src!new PCData()!:
#source
  elx.getContent().add(new Elx.Content("text value"));
#/source

#p
All elements which are defined by the DTD wording #commandchar\ \src!(#PCDATA)! or \src!(#PCDATA)*!, i.e. which are pcdata ONLY, \commandchar# are realized as instances of #src!PCDataContainer!, a sub-class of #src!MixedContentContainer!.

#p
#bold!Please note! that also in this case you can never make any assumption about #emph.how many. content particles exist, the concatenation of which represents the plain text.

#p
Anyhow, processing should not happen on this technical level of representation. Additionally, for convenience, these objects offer #emph!directly! the method #src!getPCData()!, which concatenates all fragments into one string.
#nl
Setting the contents nevertheless requires creating the intermediate container level by executing #src!elx.setContent(new Elx.Content("newstringvalue"));!

#p
Besides this low-level treatment there is a general method
#source
  TypedElement { String getDeepPCData() ; }
#/source

#p
It descends the whole subtree rooted at the element and collects all character data recursively.
This corresponds to the notion of "string-value" in XPath [XPath 1.0/5.2], to XPath's "string()" function and to "xsl:value-of" in #cite xslt1_0.
#nl
(The implementation requires the instantiation of a Visitor. This code is specific for the model, and thus realized in the generated code for #src!Element!.)

#p
(The runtime class #src!TypedElement! offers both functionalities additionally wrapped into static function objects:
#nl #src!public static final Function getFlatPCData!,
#nl #src!public static final Function getDeepPCData!,
#nl )

/* =========================
#p Up to now the only example of mixed content in #mt itself is found in #link 1/src/eu/bandm/tools/tester/tdom/Element_invalid.java#blank#/!
#p #bold ml to all#~ WHY can't netscape display this file correctly ????
================================== */

// -------------------------------------------------------
#h2 #title Attributes
#label txt_attributes

#h3 #title Generated Classes for Attributes

#p
The definition of "attributes" in XML is rather awkward and impractical. E.g.
#list
#i technically, the scope of a concrete attribute definition is local to one certain #src!ELEMENT!. But the pragmatics of all attributes with the same name are in most cases defined globally, w.r.t. the DTD as a whole, --- which is not represented syntactically.
#i The granularity of their "type system" is rather unbalanced;
#i the semantics of "#src!ID!" type attributes are non-compositional w.r.t. the validity of the containing document;
#i the declaration of "#hh#src!FIXED!" values mixes the realms of type definition and of data;
#i all enumeration types (accidentally meeting in one element type) "should" have disjoint value sets,
#nl (#cite xml , last sentence of section 3.3.1 says
#nl "For interoperability, the same Nmtoken SHOULD NOT occur more than once in the enumerated attribute types of a single element type."
)
#i etc
#/list

#p
(Indeed we met well-experienced XSLT programmers who admitted that in their daily work the first step of every processing is the replacement of all attributes by additional #src!ELEMENT!s.)

#p
The #tdom support of attributes is as follows:
#nl
For every pair of #src!ELEMENT! declaration and attribute definition a new inner class is defined in the element's class, which is derived from #emph!that! subclass of #link 2/eu/bandm/tools/tdom/runtime/TypedAttribute.html #text !/tdom/runtime/TypedAttribute#/ which corresponds to the attribute's "type", see table below. (Only exception: Common classes for attributes of different elements as described in #ref txt_commonAttributes.)

#p
The naming convention for this inner class and for the retrieval/update methods is similar to that of content particles as described above: The attribute named "X" from the DTD is addressed as "#src!Attr_X'!" in the Java code, where X' is the mangled character sequence from X, as described for elements in #ref txt_elementclasses. This Java name is used directly for the inner class which implements the attribute, and in the names of the retrieval methods.

#p
The attribute objects serve as #emph!storages for values!, not as values: They are created with the element object automatically, but a value has to be assigned to them explicitly (by the user of the API or via the parsed XML source). The identity of the attribute objects related to a particular element instance is totally under the control of the #tdom code: No explicit assignment by the user is possible; initial #emph!sharing! is terminated automatically by write access; therefore references to attribute objects should not be cached.

#p
There are two retrieval methods:
#nl
#src!element.readAttr_X()! delivers the current attribute object. In case that this attribute has the default value, a common default object is returned.
In this case the attempt to set a new value will result in an #src UnsupportedOperationException("!mutable")#/src. But this is the better method to #emph!read! an attribute value, because default objects can be shared. This method is also applied by all generated visitor code.

#p
#src!element.getAttr_X()! delivers an individual object in any case. The value of this object may be read and written. This method should only be used when writing is indeed intended, because the common default object is replaced by a dedicated, writable copy.

#p
The retrieved Attribute object in turn provides methods for setting and getting the values, namely #src!V getValue()! and #src!setValue(V)!. #src!getValue()! returns null only for an attribute which has been declared #src!#hh IMPLIED! and currently has an "absent" value.
#nl
#src!String getStringValue()! and #src!static String getStringValue(V)! deliver the value as it would appear in an XML standard text serialization.
#nl
#src!String getTypeString()! delivers the text of the type declaration in the DTD.
#nl
#src!boolean isOptional() / isFixed() / isRequired()! deliver whether the value has been declared in the DTD as #src!#hh IMPLIED! / #src!#hh FIXED! / #src!#hh REQUIRED!.
#nl
#src!boolean isSpecified()! delivers whether the attribute has been set explicitly, when creating the containing element instance or afterwards. (For details see #ref txt-unsetting below.)
#nl
#src!V getDefaultValue()! delivers the default value as declared in the DTD. The value null represents "attribute value is absent" and corresponds to the declaration "#src!#hh IMPLIED!".

#p
(The #src!V getValue()!, #src!setValue(V)! and a few other methods can be realized directly in the generated code, or inherited from the corresponding base classes from #link 4/tdom/runtime/package-summary.html #text tdom.runtime#/link, so please have a look at that api doc and into the generated sources.)

#p
The type expression #src!V!
depends on the attribute's "type" as it appears in the DTD:
#table#border 1
#tr #td DTD attribute "type" #td realizing class derived from #td Java type #src!V!
#tr #td #src!NMTOKEN! #td#link 2/eu/bandm/tools/tdom/runtime/NmTokenAttribute.html #text NmTokenAttribute #td #src!String!
#tr #td #src!ID! #td#link 2/eu/bandm/tools/tdom/runtime/IdAttribute.html #text IdAttribute #td #src!String!
#tr #td #src!IDREF! #td#link 2/eu/bandm/tools/tdom/runtime/IdRefAttribute.html #text IdRefAttribute #td #src!String!
#tr #td #src!CDATA! #td#link 2/eu/bandm/tools/tdom/runtime/CDataAttribute.html #text CDataAttribute #td #src!String!
#tr #td Enumeration #td#link 2/eu/bandm/tools/tdom/runtime/EnumerationAttribute.html #text EnumerationAttribute #td #src!Enum!, dedicated type, generated for (and locally to) this attribute
#tr #td Enumeration, if enum values are all integers, see #ref txt_intatts. #td#link 2/eu/bandm/tools/tdom/runtime/SelectedIntegersAttribute.html #text SelectedIntegersAttribute #td #src!Integer!
#tr #td #src!NMTOKENS! #td#link 2/eu/bandm/tools/tdom/runtime/NmTokensAttribute.html #text NmTokensAttribute #td #src!List!
#tr #td #src!IDREFS! #td#link 2/eu/bandm/tools/tdom/runtime/IdRefsAttribute.html #text IdRefsAttribute #td #src!List!
#/table

#p
(For the inheritance relation between the different attribute classes see #link 2/eu/bandm/tools/tdom/runtime/package-tree.html #text the tdom runtime class tree#/link.)

#p
In case of enumeration type attributes, the value must be one item from the enumeration class. This is a public inner class of the generated Attribute's class and always has the name "#src!Value!". When "s" is the name of one particular enumeration value as written in the DTD, then the corresponding enumeration item has the name
#source
  "Value_" + (s.replaceAll("[-.:]", "_"))
#/source

#p
This translation is the same as described above, see #ref txt_nametranslation.
(Please note that name clashes may result!-)

#p
The enumeration items offer a method "#src!String getStringValue()!", which delivers the original DTD wording; the EnumerationAttribute's class offers a method "#src!Map getValueMap()!" for the inverse translation.

// -------------------------------------------------------
#h3 #title Checking Value Assignments

#p
The #src!setValue()! method executes validity tests on its parameters as follows:
#list
#i For an #src!EnumerationAttribute! there is a typed #src!setValue(V)! method, which only checks for null values. There is additionally a #src!setValue(String)! method, which fails if the conversion into #src!V! fails.
#i For a #src!SelectedIntegersAttribute! the typed #src!setValue(int)! method checks for allowed integer values. There is additionally a #src!setValue(String)! method, which fails if the conversion fails.
#i For #src!NMTOKEN!, #src!NMTOKENS!, #src!IDREF!, #src!IDREFS! and #src!ID! special syntax rules must be matched, see #link 2/eu/bandm/tools/tdom/runtime/NmTokenAttribute.html #loc checkNmToken-java.lang.String- #text #src!checkNmToken(..)!
#i For each attribute declared as "#src!#hh FIXED!" only that one declared character sequence is a valid attribute value. (Nevertheless, setting the value explicitly is not superfluous, because it changes the "isSet" flag, see #ref txt-unsetting.)
#i Only for attributes which are declared as "#src!#hh IMPLIED!" the value #src!null! may be supplied, representing "not-present".
#/list

#p
All these violations throw a #src!TdomAttributeSyntaxException! (or a subclass thereof). See #ref txt-tdomexception for the hierarchy of #src!TdomException!. So disallowed #src!null! values and violated #src!#hh FIXED! attributes are treated as special cases of failed syntax checks.
#nl
Therefore all generated #src!setValue(V)! methods are declared with "#src!throws TdomAttributeSyntaxException!", except the #src!setValue(V)! of an #src!EnumerationAttribute!
and of a #src!CDataAttribute!, #emph!if and only if! they have the default value #src!#hh IMPLIED!, which additionally allows #src!null!. Compare e.g. the setValue(V) methods for Attr_http_equiv, Attr_lang (common attribute, inherited method) and Attr_content (inherited method) in #link 3/eu/bandm/tools/doctypes/xhtml/Element_meta.java #text Element_meta#/link of the XHTML tdom.

// -------------------------------------------------------
#h3 #title Enumeration Attributes with Integer Tokens Only
#label txt_intatts

#p
In many standard DTDs one frequently finds enumeration attributes which only contain selected integer values. The standard implementation would require a chain of three redundant conversions when executing calculations (text representation to enumeration value to text representation to integer value). Therefore the attribute with name "#src!att1!" in element "#src!ele1!" and the shared attribute "#src!att2!" (see #ref txt_commonAttributes) are treated specially when declared in the DTD by the tdom PI
#source #/source

#p
This leads to the generation of a subclass of #link 2/eu/bandm/tools/tdom/runtime/SelectedIntegersAttribute.html #text SelectedIntegersAttribute#/link. This class implements storage, retrieval and validity check directly on Java "#src!int!" values.

// -------------------------------------------------------
#h3 #title Unsetting Attributes
#label txt-unsetting

#p
Attributes which are declared with a default value (including the special value #src!#hh IMPLIED!) can be "not present" in the textual representation of an XML document, or, similarly, when creating the document by constructor calls.
#nl
In #tdom, this fact is memorized by an internal flag. The dedicated method
#source
  class Attr_[XY] { ... public void clearValue(){..} }
#/source

#p
clears that flag and sets the value back to the default value. Afterwards, the attribute will not be written out when serializing the document model. This can be changed by executing e.g.
#source
  el.getAttr_XY().clearValue();
  el.getAttr_XY().setValue(el.readAttr_XY().getValue());
  -- or --
  el.getAttr_XY().setValue(el.readAttr_XY().getDefaultValue());
#/source

#p
After this, the value is still the default value, but the attribute is considered "set" and will be written out (unless the value is null in case of #src!#hh IMPLIED!).

// -------------------------------------------------------
#h3 #title Common Classes for Common Attributes
#label txt_commonAttributes

#p
So far, no two different attributes are ever assignment compatible, even if they carry the same name, type and default value. This corresponds to the definitions of DTDs, which do not impose any semantics on attributes besides the mere string value.

#p
To impose an abstraction on attributes, #tdom understands processing instructions like #ldots
#suppressVerbatimCommandCharWarning 1
#source #/source

#p
The meaning is that on top-level of the generated package (i.e. #emph!not! as a part of any #src!ELEMENT!'s code) a stand-alone attribute class is generated. This class is named and behaves like the "local" attribute classes described above.

#p
A local attribute class is still created in every Element's class, as described above. #emph!But! for each attribute which matches a "global" attribute, this class (1) uses the global class as its base class, (2) inherits its methods and most of its fields, and (3) will be recognized by the more abstract matching methods of the generated #src!Matcher! class. This last feature, as described in #ref txt_visitorcall, is the main purpose of this abstraction.

#p#kind missing MATCHER is not yet implemented !?!?!
#p#kind missing What about "Visitor" in this context!?!?!

#p
In many DTDs from practical use, common attributes are declared in #src!ENTITY!s, which are included in different #src!ATTLIST!s.
In this case the effect of creating common base classes can be achieved for all attributes defined in such an entity by the processing instruction #ldots
#source #/source

#p
In this case all entities "#src!entA!", "#src!entB!", "#src!entC!", must expand to complete attribute declarations (one or more), and the processing instruction is processed exactly as explained above, after expanding these entities.

#p
Some remarks are practically important:

#p
#bold First:#/ Currently only the "name" of the attributes is used for name mangling, so there can be only one common attribute class with a certain name. #tdom behaves like DTD (as ugly as it is !-) insofar as the #emph!first! definition wins over any subsequent attempt to re-define. A warning is issued in this case.

#p#kind missing WHERE must the PIs appear in the TEXT !? def-before-use or not?

#p
#bold Second:#/ A common attribute is only recognized if all three dimensions (name, "type", and initial value) are exactly identical. So the following two declarations do #emph!not! match:
#suppressVerbatimCommandCharWarning 2
#source #/source

#p
The #tdom tool will issue a #src!hint!, whenever a common attribute is #emph!not! recognized due to such minimal differences.

#p#kind missing is "FIXED" part of the default value, or what ???

#p
#bold Third:#/ The entity names themselves and the grouping of the attributes are in no way reflected; the entities are simply "unpacked" to a list of attributes, which are #emph!independently! compiled as common attributes, as described.

// -------------------------------------------------------
#h3 #title Attributes with Attribute Types "#src!ID!", "#src!IDREF!" and "#src!IDREFS!"
#label txt_idAttributes

#p
Attributes of "type ID, IDREF and IDREFS" are special because they are intended to model #emph!references! between sub-trees of a document.
An XML document is only "valid" if (a) there is at most one(1) attribute declaration of "type ID" in each element's attribute list, and (b) the string value of every instance of an IDREF attribute, and each single "NAME" token in the value of an IDREFS attribute, corresponds to exactly one(1) instance of an ID attribute carrying the same value (see #cite xml, "Validity constraint: One ID per Element Type" and "Validity constraint: IDREF")
#nl
Of course these conditions are not really sensible.
#nl
E.g. for changing the value of an ID attribute without violating these rules, first all referring IDREF/IDREFS tokens must be deleted. Then the attribute's value may be changed, and only after this may all referring attributes be visited a second time, to set them to the new value.

#p
Therefore #tdom does not check ID/IDREF/IDREFS attributes by default. Instead, this can be done explicitly, when a model is completely constructed, by the following methods:
#commentchar\
#source
// in the generated package:
class Document_ {
  /** @return the id-based map string->Element
   *  @throws tdom.runtime.HomonymousIdException if one(1) id is used for
   *     two(2) different elements
   *  @throws tdom.runtime.SynonymousIdException if two(2) ids are used for
   *     one(1) element
   */
  public ElementDictionary createDictionary() { }
}
// in package tdom.runtime :
/** Indicates the presence of an ID attribute. **/
public interface Identifiable {
  /** @return the current id, but does not supply an automatically generated one.*/
  String @Opt getId() ;
}
class ElementDictionary{
  /** @return the element with the given id, or null.*/
  public @Opt E get(String s){ }
}
class IdRefAttribute{
  /** @return the element with the given id, or null.*/
  public @Opt E getValue(ElementDictionary){ }
}
class IdRefsAttribute{
  /** @return a list of all elements with the ids, including "null" for failures.*/
  public java.util.List<@Opt E> getValues(ElementDictionary){ }
}
#/source
#commentchar/

#p
There are some more useful methods for handling the mappings explicitly. Please refer to the #link 2/eu/bandm/tools/tdom/runtime/package-tree.html #text api doc of the involved runtime classes#/link.

#p
Please note that "#src!SynonymousIdException!" cannot occur when only using generated code for filling the map: According to #cite xml, "Validity constraint: One ID per Element Type", each element definition may have only one attribute of "ID" flavour, and this is checked statically when translating the DTD and can cause an error message.

#p
The constraint #cite xml, "Validity constraint: IDREF / second phrase" is not checked automatically at all: every return value #src!==null! can be treated as an error by the caller explicitly, if appropriate.

#p#kind missing Abstraction of the checking process for all IDREF/IDREFS in a given Document ?

/* ===============================================================
#p#kind missing No attributes from two different #src!ELEMENT! declarations are ever compatible, even if they differ neither in name nor in "type" !
#nl
Currently we investigate two possible solutions by enhancing the code in #link 1/src2/eu/bandm/tools/dtd #blank text !/../tools/dtd#~:
#list
#i Either by (heuristically!)
identifying "nice entity definitions", which "normally" carry information about the "intended pragmatics" of attribute definitions,
#i and / or by re-analysing the fully expanded DTD-model for isomorphic attribute definitions.
#/list
Both methods aint no fun #src!;-(!.
================================================================== */

// -------------------------------------------------------
#h2 #title Auxiliary methods for numeric contents of elements and attributes

#p
In most DTDs, the contents of many attributes and of PCDATA-only elements are employed to encode numeric contents, i.e. integer or floating point numbers. For easy decoding of these data, the class #link 2/eu/bandm/tools/tdom/runtime/TypedNode.html #text TypedNode#/link offers some overloaded methods.

#p
Their names are "#src!asInt(..)!", "#src!asBigInteger(..)!", "#src!asDouble(..)!", "#src!asHexInt(..)!", etc. They behave robustly and deliver #src!null! in case of #src!null! input or conversion error. The overloading allows them to be applied to the appropriate attribute types and to element contents in a uniform way. (Therefore their location as static methods in "#src!TypedNode!" !)

#p
For details please refer to the #link 2/eu/bandm/tools/tdom/runtime/TypedNode.html #text api doc#/.

// -------------------------------------------------------
#h2 #title Additional Documentation Text
#label txt_docTexts

#p
The generated Java source contains automatically generated API documentation to explain the fundamental #emph!technical! aspects of handling the generated classes and methods. But of course, #tdom knows nothing about their intended meaning.
#nl
Therefore it is possible to attach explicit author's documentation text to
#list
#i the model as a whole
#i each element class
#i each single attribute of an element,
#i each "common" attribute, i.e. which is declared unrelated to a particular element, as described in #ref txt_commonAttributes.
#/list
The means are processing instructions in the DTD, as described in #ref txt_pi.

#p
For the above-mentioned categories this looks like #ldots
#source #/source

#p
At the end of the #tdom run, all those PIs which have not been processed, e.g. because they address a non-existing target due to a typo, are reported by one warning message each.

#p
More than one entry with the same documentation target may occur; their text will be concatenated in source order.

#p
The treatment of these "semantic" or "author's" documentation texts is similar to that in #link umod.html#text #umod#/link: During the normal rendering process (by #src!javadoc!) the resulting "doc comments" are rendered specially: included in "#src!
!" tags. They appear in green color, if the stylesheet "#src!bandmApiDoc.css!" is appended to the generated stylesheet. This shall clarify the difference between the mere technical documentation and the semantic level of meaning.

#p
Going even further is the use of
#nl #src!javadoc ... -doclet eu.bandm.tools.tdom.doclet.TdomUserDoc !#nl
#nl
In this case most technical fields and methods are totally omitted, and only those with the annotation "#src!@User!" are included in the documentation. The result is much leaner and more focused and may be more useful when programming "around" the tdom model.

// -------------------------------------------------------
#h2 #title Ethereals: Comments and Processing Instructions as Second Class Inhabitants#label txt_ethereals

#p
It is not self-evident that Processing Instructions and Comments are part of a "model" in the narrow sense, and originally #tdom did not support them.

#p
Anyhow, the requirements and application contexts are various, so it may be sensible to include them. We introduced them as "second class" inhabitants, which have to be attached to a "substantial" inhabitant for being stored and retrieved.

#p
Every Element and every PCData fragment as a "Substantial" has two "decorative" sequences of "Ethereals" (see the symbolic class tree in #ref txt_runtimelasses !), one "preceding" and one "following".

#p
Furthermore, every Element and Document has a "leading" and a "trailing" sequence, which can be used if no Substantials are contained. The access methods are
#source
  List TypedDocument.[get/read]LeadingEthereals()
  List TypedDocument.[get/read]TrailingEthereals()
  List TypedElement.[get/read]LeadingEthereals()
  List TypedElement.[get/read]TrailingEthereals()
  List TypedSubstantial.[get/read]PrecedingEthereals()
  List TypedSubstantial.[get/read]FollowingEthereals()
#/source

#p
The #src!..read..! variant delivers a read-only list (which can be shared iff empty), the #src!..get..! variant delivers a list the user can modify.
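#p
The division of labour between a #src!..read..! and a #src!..get..! variant can be illustrated by the following stand-alone sketch. It is written under loud assumptions: the class #src!EtherealHolderSketch! is hypothetical, plain strings stand in for the Ethereal nodes, and nothing here is taken from the #tdom runtime sources. It only shows one plausible way to obtain the behaviour described above, namely a shared read-only list as long as nothing is stored, and a private modifiable list as soon as the user asks for one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; NOT the tdom runtime implementation.
class EtherealHolderSketch {

    // One immutable empty list, shared by all holders without ethereals.
    private static final List<String> SHARED_EMPTY = Collections.emptyList();

    // Allocated lazily, on the first call to the "get" variant.
    private List<String> preceding = null;

    /** Read-only view; shared between all instances iff empty. */
    List<String> readPrecedingEthereals() {
        return preceding == null ? SHARED_EMPTY
                                 : Collections.unmodifiableList(preceding);
    }

    /** Modifiable list, private to this instance. */
    List<String> getPrecedingEthereals() {
        if (preceding == null) {
            preceding = new ArrayList<>();
        }
        return preceding;
    }

    public static void main(String[] args) {
        EtherealHolderSketch a = new EtherealHolderSketch();
        EtherealHolderSketch b = new EtherealHolderSketch();
        // both holders are still empty, so they share one list object
        System.out.println(a.readPrecedingEthereals() == b.readPrecedingEthereals()); // true
        a.getPrecedingEthereals().add("<!-- a comment -->");
        System.out.println(a.readPrecedingEthereals().size()); // 1
        System.out.println(b.readPrecedingEthereals().size()); // 0
    }
}
```

#p
In this sketch any attempt to modify a list delivered by the read variant fails with an #src!UnsupportedOperationException!, so code which only inspects Ethereals never forces the allocation of a list.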
#p
There are n+1 possibilities to store a sequence of n Ethereals w.r.t. the two neighbouring Substantials:
#source
  | IN OR OR OR
  | el.leading el.leading el.leading subel.preceding
  | el.leading el.leading subel.preceding subel.preceding
  | el.leading subel.preceding subel.preceding subel.preceding
  |
  | IN subel.following OR subel2.preceding
  |
  |
#/source

#p
Where an Ethereal is stored does not have any meaning a priori. A parser is allowed to choose any solution, arbitrarily. Of course, on the next conceptual level the user may define a "meta-syntax" of relations and meanings. (E.g., is a comment related to the follower, or to the element just opened? Is a comment related to a processing instruction?) In such a case the model must be traversed and these relations constructed explicitly, implemented by additional data.

// -------------------------------------------------------
#h1 #title Construction of #tdom models
#label txt_tdomconstruction

#p#kind missing MISSING: semi-parser
#nl required attributes ?
#nl undeclared (wild) attributes ?
#nl == BOTH DO NOT WORK !!??
#nl new Element_h1(titleString_2, new Element_a())
#nl new Element_h1(new Element_h1.Content(titleString_2), new Element_a())
#nl
#nl WOULD BE NICE: null instead of an empty list:#nl (new Element_table(null,
#nl new Element_table.Choice_1_Alt_1(new Element_col[0]),
#nl

// -------------------------------------------------------
#h2 #title Error Cases and Exception Hierarchy
#label txt-tdomexception

#p
According to the possible error conditions when constructing a #tdom instance, there is the following class hierarchy of #emph!checked! exceptions. (In Java parlance, "checked" means that they must be declared explicitly.)
#list#symbol bullet
#i #src!TdomException!
#list#symbol bullet
#i #src!TdomContentException! If the next element intended for the contents of a particular element under construction does not match its DTD declaration.
// #list
// #i TdomContentEndException
// If the sequence of intended content elements is exhausted before the DTD declaration is
// complete.
// #/list
#i #src!TdomAttributeException! If an intended attribute setting does not match the DTD declaration.
#list
#i #src!TdomAttributeSyntaxException! If the sequence of characters of an intended attribute value does not match its DTD declaration.
// #list
// #i TdomAttributeFixedException
// If the sequence of characters of an intended attribute value does not match the "#hh FIXED"
// value from the DTD declaration.
// #/list
#i #src!TdomAttributeMissingException! If an attribute is missing which is declared as "#hh REQUIRED".
#i #src!TdomAttributeUndefException! If an attempt is made to set an attribute which is not defined in the DTD.
#/list
#i #src!TdomXmlException! When lower-level (parser) code signals invalid XML input.
#/list
#/list
#p All these classes memorize the information about the offending value, attribute, element context and location, as far as known. There are further subtypes of these classes for particular cases, see #link 2/eu/bandm/tools/tdom/runtime/package-tree.html #text the api doc#/link.
#p For narrowing the scope of the necessary exception declarations, there is the class #src!TypedAttribute.SafeValues! with #link 2/eu/bandm/tools/tdom/runtime/TypedAttribute.html#loc safeValues #text its only instance#/link, which is used as a flag to distinguish between unsafe/safe methods and constructors, which throw/do not throw a #src!TdomAttributeException!. For details see #ref txt_newwithattributes.
// -------------------------------------------------------
#h2 #title Explicit Constructor Application for #src!Element!s and Sub-Content #label txt-bottomup
#p Two central design issues of #tdom are (a) that all existing models at every instant of their lifetime are #emph!type-correct! sub-trees w.r.t. the corresponding DTD, and (b) that this is checked statically, at compile time, as far as possible.
#nl Therefore most of the generated public constructors always require complete and type-correct contents as their argument. #nl As a consequence, a larger #tdom model must be constructed bottom-up, in a term-like fashion. This (at a first glance possibly annoying) strict discipline implies especially, that #tdom models are #emph!always finite! by construction.#footnote They can become cyclic when later calling #src!set_..()! methods: another reason for better treating model elements as immutable! #/footnote #p #bold Please note#/ that constructing a large #tdom model by explicit constructor calls is a tedious task. Explicit constructor calls only make sense as the back-end of some automated translation procedures. #nl For constructing a #tdom model from a pre-existent XML text file one can use the SAX interface or the w3c Dom interface. These are described in #ref txt_autoconst below. /* FIXME NICE EXAMPLES (using ops.Arrays.append() IN ../projekte2012/src/eu/bandm/wiki/base/HandleRq.java (noch anderswo !?!?) */ // ------------------------------------------------------- #h3 #title Creating Elements with Structured Contents, Statically Typed #p The basic structural element, to which the generated #java constructors correspond, is again the #emph!sequence!. So constructors are generated for top-level content regular expressions, considered as a sequence, and for all sub-sequences and alternatives, which are sequences again. Since the Java method signature corresponds to the DTD content model, no #src!TdomContentException! is thrown by the invocation of such a "statically typed" constructor. (That no #src!TdomAttributeSyntaxException! is thrown can be selected by supplying #src!safeValues!.) All variants are illustrated by the following example: #source --> leads to --> new Element_NC( Element_A[] elems_A_1, Element_B... 
 elems_B_1)
#/source
// -------------------------------------------------------
#h3 #title Creating Elements with Structured Contents, Dynamically Typed, by Semi-Parser #label txt_createByArrays #label txt-semiparser
#p Often it is more convenient to simply enumerate the sequence of Java objects which shall make up the contents of a newly created element. In this case, a simplified parsing process can be applied to the classes of these elements. We call it "semi-parser", because it parses only one layer of content and does not descend into the contents of sub-elements, as the full-fledged parsers described in #ref txt_autoconst do.
#p For this purpose there is an untyped constructor #ldots
#commentchar \
#source
new Element_NC (Element... elements)
    throws TdomContentException, TdomAttributeSyntaxException {..}
#/source
#commentchar /
#p ("#src!Element extends tdom.runtime.TypedElement!" is the top-level element class generated specifically for this particular model, so the method is "not completely" untyped !-)
#p A #src!TdomContentException! is thrown whenever the supplied sequence of Java objects cannot be mapped to the content model. #nl (Since exceptions must be caught or declared anyhow, there is no variant with "#src!safeValues!" preventing #src!TdomAttributeSyntaxException!s.)
#p Since the "vararg" arguments can be represented by an array, alternatives for content creation can also be defined by a #emph!pure expression!, using the concatenation operations defined in #link 2/eu/bandm/tools/ops/Arrays.html #text !/eu/bandm/tools/ops/Arrays #/link, as used in our "Dtd to Html renderer" #link 2/eu/bandm/tools/dtm/HtmlRenderer.html #text !/eu/bandm/tools/dtm/HtmlRenderer#/link according to this scheme:
#suppressVerbatimCommandCharWarning 1
#commentchar\
#source
import eu.bandm.tools.ops.Arrays ;
// ...
final Element_html el_html = new Element_html (new Element_head (new Element_head.Choice_1[0], new Element_head.Choice_2_Alt_1 (new Element_title("windowTitle")), new Element_head.Choice_2[0]), new Element_body (Arrays.append ((htmlIsDynamic) ?new Element_block_content[] {new Element_noscript (new Element_div (new Element_div.Content ("")) {@Override protected void initAttrs(){ getAttr_class().setValue(class_alert); }})} :new Element_block_content[0], new Element_block_content[] {new Element_pre(preItems.toArray (new Element_pre.Content[preItems.size()])), new Element_hr(), makeFooter(basicFileName, "http://bandm.eu/metatools/docs/usage/dtd.html#txt_dtd_tool") } ))); // ... Element_p makeFooter(String a, String b){...} List preItems = ... #/source #commentchar/ #p The header part of the created html element is constructed statically typed. The body part is dynamically typed, using Arrays.append and function calls to write case distinctions in a fully compositional way. #p (Please note that in our xhtml model "block_content" is an abstraction of different Element classes, controlled by a content model entity, as described in #ref txt_abstractelements.) // ------------------------------------------------------- #h3 #title Creating Elements with Mixed or Pure PCData Contents, Statically Typed #p In case of mixed content, e.g. a declaration like #ldots #suppressVerbatimCommandCharWarning 1 #source #/source #p #ldots the generated constructors are #ldots #source public Element_NM (Element_NM.Content... content) throws TdomAttributeSyntaxException {...} public Element_NM (SafeValues, Element_NM.Content... content){...} public Element_NM (String content) throws TdomAttributeSyntaxException {...} public Element_NM (SafeValues, String content){...} #/source #p The "#src!SafeValues!" flag has the same role as described above. #nl The last two constructors are short-cuts for the case of pure character content. 
#nl The canonical constructors are the first two, where all components have to be wrapped into the correct content class, like in #ldots #source new Element_NM (new Element_NM.Content("characters with embedded TA "), new Element_NM.Content(aTA), new Element_NM.Content(new TypedPCData(" followed by a TB ")), new Element_NM.Content(aTB) ) #/source #p The first argument in this example is possible because of the short-cut constructor #ldots #source public class TypedElement.MixedContent { ... public MixedContent (String data){ this(new TypedPCData(data)); } ... } #/source #p Of course, instead of "vararg"-parameters you can always supply an array, e.g. delivered by #src!Collection.toArray([])!. // ------------------------------------------------------- #h3 #title Creating Elements with Mixed or Pure PCData Contents, Dynamically Typed #p The dynamically typed constructor for elements with mixed contents has the signature #source new Element_NM (Object ...) throws TdomContentException, TdomAttributeSyntaxException ; #/source #p It behaves like all other semi-parsers, as described in #ref txt-semiparser: It throws a #src!TdomContentException! whenever the supplied sequence of Java objects cannot be mapped to the content model. #p The techniques for constructing the argument list as described for structured content in #ref txt_createByArrays can be used accordingly. #p#kind missing Currently not yet implemented, see BUG 177 // ------------------------------------------------------- #h3 #title Creating Elements with Attributes #label txt_newwithattributes #p When an element instance is constructed which has #src!#hh REQUIRED! attributes, then these must be set by the caller and checked by the constructor code for validity, before the constructor is allowed to return normally, meaning success. This is required by the #tdom philosophy of only producing type correct instances. 
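#p The rule that a constructor may only return normally once all required attributes are set can be pictured by the following stand-alone sketch. It does not use the #tdom runtime: the classes #src!SketchScript! and #src!SketchAttributeMissingException! and the attribute name "src" are hypothetical, and the hook method #src!initAttrs()! merely imitates the mechanism described in this section.

```java
/** Hypothetical checked exception; NOT the tdom runtime class of similar name. */
class SketchAttributeMissingException extends Exception {
    SketchAttributeMissingException(String attrName) {
        super("required attribute not set: " + attrName);
    }
}

/** Hypothetical element class with one REQUIRED attribute "src". */
class SketchScript {
    private String src = null;

    SketchScript() throws SketchAttributeMissingException {
        initAttrs();                 // user hook, overridden in an anonymous subclass
        if (src == null)             // completeness check before returning normally
            throw new SketchAttributeMissingException("src");
    }

    /** Default: sets nothing. */
    protected void initAttrs() {}

    protected final void setSrc(String value) { src = value; }
    final String getSrc() { return src; }
}
```

#p In this sketch an instance created as #src!new SketchScript(){ @Override protected void initAttrs(){ setSrc("a.js"); } }! passes the check, whereas a plain #src!new SketchScript()! ends with the checked exception, so that no incomplete instance ever becomes visible to the caller.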
#p Setting attributes in a constructor call is done by defining an anonymous inline class derived from the real element class. By overriding the methods #src!public void initAttrs() throws TdomAttributeSyntaxException! and #src!public void initAttrsSafe()! the caller can set arbitrary attribute values.
#p No #src!TdomAttributeSyntaxException! can leave the second variant, and only this one is called by the "safe constructor" (= the constructor with the #src!safeValues! flag). So basically two variants are possible for construction:
#source
  new Element_e (a, b, c) {
     public void initAttrs() throws TdomAttributeSyntaxException {
        getAttr_a1().setValue(v1);
     }
     public void initAttrsSafe() {
        getAttr_a2().setValue(v2);
     }
  }; // throws TdomAttributeSyntaxException

-- or --

  new Element_e (safeValues, a, b, c) {
     public void initAttrsSafe() {
        getAttr_a2().setValue(v2);
     }
  }; // does not throw TdomAttributeSyntaxException
#/source
#p Of course the safe constructor can (and should) always be used if no attribute at all is set.
#p #xemph!ATTENTION!: The safe constructor calls only #src!initAttrsSafe()!, and NOT #src!initAttrs()!. Putting initialization code into the latter while meaning the former is a hard-to-find error which #xemph!cannot be detected statically!.
#p If a #src!TdomAttributeSyntaxException!
 is possible, it must be caught and treated locally in order to use the safe variant, like in
#source
  new Element_e (safeValues, a, b, c) {
     public void initAttrsSafe() {
        try {
           getAttr_a1().setValue("myconst");
        } catch (final TdomAttributeSyntaxException e){
           throw new RuntimeException(" cannot happen, 'myconst' is a valid NMTOKEN.");
        }
     }
  }; // does not throw TdomAttributeSyntaxException
#/source
#p For convenience there is the method
#source
  import static eu.bandm.tools.tdom.runtime.TypedAttribute.assertSetAttrValid ;

  new Element_e (safeValues, a, b, c) {
     public void initAttrsSafe() {
        assertSetAttrValid(getAttr_a1(), "myconst");
     }
  };
#/source
#p which wraps the #src!TdomAttributeSyntaxException!, which is assumed never to happen, into an unchecked #src!AssertionError!.
#p After the execution of these user-defined initialization methods, the constructor checks for #emph!completeness of required attributes!. #nl If an attribute declared as #src!#hh REQUIRED! is not set explicitly, then a #src!TdomAttributeMissingException! is thrown. All constructors of elements which have such an attribute are thus declared like
#source
  package xhtml_1_0 ;
  public Element_script() throws TdomAttributeMissingException, TdomAttributeSyntaxException { ...}
#/source
#p#kind missing IMPLEMENTED only for one at a time !?!? Collect them !?!?!
#p Whether an attribute gets defined by a user-defined #src!initAttrs()! method cannot be checked statically, therefore this exception must be caught somewhere. Because a "clean functional style" of programming leads to deeply nested constructor calls, the catch clause would be far away from its cause. Therefore the class #link 2/eu/bandm/tools/tdom/runtime/TdomAttributeMissingSupplier.java #text#src TdomAttributeMissingSupplier#/link provides a wrapper method which can be used to translate all #src!TdomAttributeMissingExceptions! into an unchecked #src!AssertionError!, when they are known not to happen.
 This is a fragment from "#src!dtm/HtmlRenderer!", which builds a complete Html header element by one single expression:
#source
  (..., ... ,
   Element_head.Choice_2_Alt_1_Choice_1.alt
     (assertAttrsComplete(() ->
        new Element_script(safeValues, "￯")
          {@Override protected void initAttrsSafe(){
             getAttr_src().setValue(path_to_javascript);
             assertSetAttrValid(getAttr_type(), "text/javascript");
          }})
     )... )
#/source
#commentchar\
#p (( #nl By the way: a serious pitfall is trying an abstraction like #ldots
#source
  private Element_e makeIt (final String name){
    return new Element_e(){
      @Override public void initAttrs(){
        this.getAttr_name().setValue(name);
        //                           ^^^^ refers to local field of Element_e class
      }} ;
  }
#/source
#p The method's parameter is NOT addressed by "#src!name!", since the local field of the element's class is the narrower lexical scope! #nl ))
#commentchar/
// -------------------------------------------------------
#h2 #title Automated Construction of #src!Document!s and #src!Element!s #label txt_autoconst
#p As mentioned above, large #tdom models are constructed from given text files via the generated SAX parser or the generated W3C-DOM validator.
#p Both kinds of creation methods are only defined for the "#src!Document_!" classes, not for pure "#src!Element_!" classes. This is due to the fact that both construction methods possibly require #emph!global! information, like namespace mapping and collections of "#src!ID!"-type attribute values, things which do not exist with simple elements.
#p The creation methods are provided by the #java class implementing the DTD. Let the package containing #tdom generated classes (i.e. all element classes, document classes and the DTD class) be called "#src!!". Let "#src!!", "#src!!", etc. be the tags of those elements which can serve as the top-level element of a document, according to the processing instructions as described in #ref txt_pi.
#p Then you can create a #tdom #src!Document!
 object by calling one of the following methods:
#source
package ;
import eu.bandm.tools.util.SAXEventStream ;
import eu.bandm.tools.tdom.runtime.TdomAttributeException ;
import eu.bandm.tools.tdom.runtime.TdomContentException ;
import eu.bandm.tools.tdom.runtime.TdomXmlException ;
import eu.bandm.tools.tdom.runtime.TypedDTD ;

public final class DTD extends TypedDTD {
  createDocument_ (Element_ el) {...}
  createDocument_ (Element_ el) {...}
  ...
  createDocument_ (org.w3c.dom.Document document)
      throws TdomContentException, TdomAttributeException {...}
  createDocument_ (org.w3c.dom.Document document)
      throws TdomContentException, TdomAttributeException {...}
  createDocument_ (SAXEventStream s)
      throws TdomContentException, TdomAttributeException, TdomXmlException {...}
  createDocument_ (SAXEventStream s)
      throws TdomContentException, TdomAttributeException, TdomXmlException {...}
  createDocument_ (java.io.InputStream in) throws java.io.IOExcept {...}
  createDocument_ (java.io.InputStream in) throws java.io.IOExcept {...}
}
#/source
#p The first two methods only complete the "manual bottom-up creation" as described in #ref txt-bottomup: You create a document by first creating its top-level element and then passing it as an argument to the creation method.
#p For large documents the following methods are more convenient:
#p If the argument to "#src.create_Document()." is a W3C DOM, then this DOM object is validated against the DTD, and, in case of conformance, a #tdom model is returned. Otherwise a #src!TdomException! is thrown.
#p#kind missing ??? #nl This kind of validation #emph!can! even cope with non-LL(1) content declarations, like "#src!A A (A)*!".
#p If the argument to "#src.create_Document()." is a #src!SAXEventStream!, then the content models of the DTD must be LL(1), and the SAX events are consumed to construct the #tdom model. In case of non-conformance, a #src!TdomException! is thrown.
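#p Why a single event of look-ahead suffices for LL(1) content models can be shown with a stand-alone sketch. It is an illustration only: instead of a real #src!SAXEventStream! it uses a plain #src!java.util.Deque! of tag names, and the content model #src!(A, B*, C?)! as well as the class name #src!SketchLl1! are made up for this purpose.

```java
import java.util.Deque;

/** Hypothetical illustration; uses neither SAX nor the tdom runtime. */
class SketchLl1 {
    /** Checks a queue of tag names against the content model (A, B*, C?). */
    static boolean matches(Deque<String> events) {
        if (!"A".equals(events.poll())) return false;      // exactly one A is consumed
        while ("B".equals(events.peek())) events.poll();   // look-ahead decides: one more B?
        if ("C".equals(events.peek())) events.poll();      // look-ahead decides: optional C?
        return events.isEmpty();                           // nothing may be left over
    }
}
```

#p Every decision inspects only the next pending event via #src!peek()! before consuming it. A source that can only push events to a handler offers no such inspection, which is why the events have to be buffered first.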
#p A #link 2/eu/bandm/tools/util/SAXEventStream.html#text !/util/SAXEventStream#/ is an interface which provides access to a "frozen" sequence of SAX calls. This freezing is necessary because LL(1) parsing needs (what a surprise!) a look-ahead of depth 1, which is not provided when using the SAX interface directly.
#p The implementation currently provided is contained in #link 2/eu/bandm/tools/util/SAXEventQueue.html#text !/util/SAXEventQueue#/.
#p The W3C DOM and SAX based construction methods can throw #src!TdomContentException!s and #src!TdomAttributeException!s, as with the explicit constructor invocations above. Additionally the SAX based methods can throw #src!TdomXmlException! in case of erroneous XML input files.
#p The SAX interface's handling of attributes is rather complicated and expensive. Therefore, currently, we #bold!do not totally type-check the SAX event stream! as such! #nl Of course, when there is no value for an attribute which is "required" as described by the DTD (i.e. neither declared as #commandchar$ "$src!#IMPLIED!", nor having a default value), $commandchar# then a #src!TdomAttributeMissingException! is thrown. #nl But we do currently #emph!not! check for undefined attributes, i.e. attribute names which are not declared in the DTD and thus not represented in the model. (The foreseen #src!TdomAttributeUndefException! is not thrown.) #nl This is a violation of the "Validity constraint: Attribute Value Type" from #cite xml , which says "The attribute MUST have been declared". #nl The same holds for the even more primitive "Well-formedness constraint: Unique Att Spec", which says "An attribute name MUST NOT appear more than once in the same start-tag or empty-element tag."
#p The format of the SAX events would make both checks rather expensive.
#p The #bold!practical problem! is that these kinds of errors often result from a misspelled attribute name. But the absence of the attribute actually meant #bold!will not be signalled!
 iff it has a default value!
#p#kind missing This is currently covered in bug 168 "missing attribute name check"
#p#kind missing why is "checkRequiredAttrs()" public !?! (e.g. in xhtml/Element_a.java) !?!?
#p
// #p#kind type practicalTip
If you have to create large sub-structures of a #tdom model (e.g. starting with a top-level element #src!Element_!) out of your own program code, it may nevertheless be the method of choice to use the SAX interface to create a complete #src!Document_!: #nl Simply send SAX events to a #link 2/eu/bandm/tools/util/SAXEventQueue #text !/util/SAXEventQueue#/, the other side of which is consumed by the method #src!aDTD.createDocument_(SAXEventStream s)!. #nl Then extract the desired element by calling the (generated and therefore strongly typed) method #ldots
#commentchar\
#source
public class Document_ extends ... tdom.runtime.TypedDocument {
  public Element_ getDocumentElement() {...} // returns top-level element
}
#/source
#commentchar/
#p For this purpose it is necessary to declare beforehand as "public" all those element declarations in the DTD which are intended as the top element of such sub-trees. This is described in #ref txt_pi above, and tells #tdom to create the required #src!Document_<>! classes.
#p The last method (#src!aDTD.createDocument_(java.io.InputStream)!) is related to our own compression method, and explained in #ref txt_compression.
#p The class #link 4/xantlrtdom/TdomReader.html #text !/xantlrtdom/TdomReader#/ provides the glue code between a file input stream or similar source of text, and the construction of a tdom model. Its usage is demonstrated in #link ../../examples/doctypes/xhtml/Makefile#/.
// -------------------------------------------------------
#h1 #title Visitors and Patterns #label txt_visitor
// -------------------------------------------------------
#h2 #title The Generated Visitor Class and Deriving User Defined Visitors
#p As mentioned above, the most elegant way of processing a #tdom model to some other format is the application of #src!Visitor!s. #nl With every #tdom model the base class #src!Visitor.java! is generated, from which you can derive your processing tools. This class defines a "#src!visit(final node)!" method for #emph!each! node class generated by #tdom/. This includes element classes, classes representing sub-sequences, choices, alternatives and attributes. A user defines a transformation by deriving from this visitor class and overriding only those methods where he/she wants to extract some information or perform some update.
#p On the other side, the generated #src!Element! class (which is the top of all generated element classes) implements the interface #src!Visitable! and its method #src!host(Visitor)!. This method is the counterpart, which causes the visitor to call its visit method on #src!this!. (This method is needed to apply a visitor to any element without knowing its concrete class at compile time.)
#p#kind missing I did NOT FIND "implements Visitable" with the attribute classes !!??
#p The definition of a derived visitor is most conveniently done by editing a copy of the generated #src!VisitorTemplate.java!. This file contains method declarations for all #src!visit()! methods acting on #emph!element! classes. These empty method templates are preceded by a "Javadoc" comment which contains the corresponding content definition from the original DTD. #nl Please note that the method declarations for classes representing sub-content (e.g. "#src+visit(Element_TA.Choice_1_Alt_2 x)+") are #emph!not!
 included in #src!VisitorTemplate.java!, but have to be added manually, whenever required.#nl #nl ((It may be convenient to have a look at "#src!Visitor.java!" for doing "copy and paste" on some more complicated method declarations of this kind.))
// -------------------------------------------------------
#h2 #title Calling a User Defined Visitor #label txt_visitorcall
#p The #tdom visitors are of the simplest kind, compared to the more complex ones generated by #link umod.html #inframe #text #umod#/link. They only provide the above-mentioned single method per visited class, namely "#src.visit( x).". This method can be called from external ("hand-written") code for the initial invocation of the visitor. It is also used internally by the #src!visit()! code of the generated base visitor itself, for descending to the child nodes.
#p If the class of an object is known statically, this call is optimal w.r.t. performance. If the class is #emph!not! known, there is the method "#src!visit(generatedPackage.Element element)!", which does a #src!switch/case!-based look-up of the element's tag index.
#p All node classes support the method "#src!x.host(Visitor v)!". This method calls the most narrowly statically typed "#src!v.visit(x)!" method of the visitor #src!v!. This allows visiting sub-content which contains choices without the need to know in advance which alternatives are present in the concrete model data.
#p This "#src!x.host(Visitor v)!" method is also realized by the generated #src!Document_<>! classes.
#p Further, all node classes which have #emph!repeated! sub-content, like "#src!elems_1_A!", "#src!choices_1!" or "#src!seqs_1!", offer a method like "#src!visit_choices_1(Visitor v)!" which steps through the sub-contents automatically, as mentioned already in #ref txt_ruvmethods above.
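#p The interplay of #src!host()! and #src!visit()! described in this section can be reproduced in miniature. The following stand-alone sketch does not use generated code: all class names (#src!SketchElement!, #src!SketchA!, #src!SketchB!, #src!SketchVisitor!, #src!SketchTextCollector!) are hypothetical and only imitate the roles of the generated element classes, the generated base visitor and a user-defined visitor.

```java
/** Hypothetical miniature of the generated classes; all names are illustrative only. */
abstract class SketchElement {
    /** Dispatches on the dynamic class of "this". */
    abstract void host(SketchVisitor v);
}

class SketchA extends SketchElement {
    final SketchElement[] children;
    SketchA(SketchElement... children) { this.children = children; }
    @Override void host(SketchVisitor v) { v.visit(this); }   // statically resolved to visit(SketchA)
}

class SketchB extends SketchElement {
    final String text;
    SketchB(String text) { this.text = text; }
    @Override void host(SketchVisitor v) { v.visit(this); }   // statically resolved to visit(SketchB)
}

/** Base visitor: does nothing but descend, in depth-first order. */
class SketchVisitor {
    void visit(SketchA a) {
        for (SketchElement child : a.children) child.host(this);  // concrete class unknown here
    }
    void visit(SketchB b) { }
}

/** User-defined visitor: overrides only the method it is interested in. */
class SketchTextCollector extends SketchVisitor {
    final StringBuilder out = new StringBuilder();
    @Override void visit(SketchB b) { out.append(b.text); }
}
```

#p Applied via #src!root.host(new SketchTextCollector())!, the collector receives all #src!SketchB! nodes in textual order, although the base visitor never inspects the concrete class of a child.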
// -------------------------------------------------------
#h2 #title Default Visiting Strategy of Generated Visitors and User Defined Explicit Control
#p All these direct ways of calling (i.e. the skipping of a "#src!match()!" multiplexer as needed in the #link umod.html #text #umod#/link visitors) are possible because the structure of the model is almost completely #emph!statically! defined at compile time, and because there are #emph!no specialization relations! (= "inheritance") between distinct classes. #nl The only places where #emph!dynamic! decisions are required come (a) from alternatives (including #src!abstract! classes) and (b) from quantification decorations "#src!?!", "#src!*!" or "#src!+!" in the original DTD. #nl The high performance of the #tdom visitors results from the fact that in both cases only simple and constant #src!int! values need to be considered, --- e.g. the result of "#src.final int getAltIndex().", a method which is generated with every sub-class of #src!TypedAlt! and which returns the value of a #src!static final int! assigned to the generated class at compile time, or of "#src!<>.count()!" in case of repetitions.
#p The generated base visitor does nothing more than descend the document tree in depth-first textual order.
#p E.g. the DTD declaration #ldots
#source
#/source
#p#ldots generates a method like#footnote This code fragment is written for documentation purposes, using the access methods from the user interface, as described above. The real implementation, being in the same package as the element classes, of course uses direct access to protected variables for the sake of efficiency, e.g. "#src!a.elems_1_C.length!" instead of "#src!a.countElems_1_C()!" #/footnote #ldots
#source
package MySemiAst ;
public class Visitor {
  ...
  public void visit (Element_A el){
    visit (el.getElem_1_B());
    for (int i = 0, n = el.countElems_1_C() ; i < n ; i++)
      visit(el.getElem_1_C(i));
    if (el.hasElem_1_D())
      visit(el.getElem_1_D());
  }
  ...
}
#/source
#p Special processing of nodes of a certain class is implemented by deriving from this base class. If you want to descend into the sub-tree structure starting at the currently visited node #src!el!, you simply call "#src!super.visit(el)!", or you start with a new, specialized visitor:
#source
package transformations ;
import MySemiAst.* ;
public class Transform_1 extends Visitor {
  protected class SpecialTransformation extends Transform_1 { ... }

  public void visit (Element_A el){
    final int value = Integer.parseInt (el.getElem_1_B().getPCData());
    new SpecialTransformation().visit(el.getElems_1_C()); // <<<<< DOES NOT WORK !!?? :-(
    final int secondvalue = new Visitor(){
        protected int result = 0 ;
        public int process(Element el){visit(el); return result;}
        public void visit (Element_C el){
          result += Integer.parseInt (el.getPCData());
        }
      }.process(el);
    super.visit(el);
  }
}
#/source
#p The call graph for a content declaration "#src{}" can be symbolically sketched like #ldots
#source
visit(Element_A e) ---------> {for (i=0;i switch(c.getAltIndex()){
                                  case 0:visit(c.toAlt_1());
                                  case 1:visit(c.toAlt_2());
                                }
visit(Element_A.Choice_1_Alt_1 a) ---> visit(...)
#/source
#p #ldots and the scheme for deriving the transformation tools "#src!UserDefV!" like #ldots
#source
Visitor.visit(Element_A e) ---------> {..visit(e.getElem_1_B)..} ; visit(...)
                                           |
  +-----------------------------------------+
  V
UserDefV.visit(Element_B e) ------> /* own processing code, finally/firstly calling.. */ ---> super.visit(e)
                                                            |
  +--------------------------------------------------------+
  V
Visitor.visit(Element_B e) ----------> /* call visit() for sub-structure */
#/source
#p#kind missing MATCHER !!!!
/* ======================== ????????????????????????????????? FIXME
// -------------------------------------------------------------
#h2 #title ((Possible Extension ???))
#p#kind missing Perhaps "#src!Visitor.result!" and "#src!Visitor.process()!" (which indeed are needed quite frequently!)
can be provided with #java 1.5 via a type parameter, so that the example would be simplified to #ldots #source protected void visit (Element_A el){ ... final int secondvalue = new Visitor(){ {result=0;} public void visit (Element_C el){result += Integer.parse (el.getPCData());} }.process(el); ... } // needed ??? : // public void visit (Element_A el){result=0; super.visit(el);} #/source ================================================== */ // ------------------------------------------------------- #h2 #title Untyped Visitors #p Sometimes (e.g. for simulating a W3C DOM and for directly applying tpath expressions) // implementing the XSLT2 result-to-input feedback) an #emph!untyped view! to a Tdom instance is required. For this purpose, the class #link 2/eu/bandm/tools/tdom/runtime/UntypedVisitor.html #text UntypedVisitor#/link is provided. It co-operates with additional "hosting" methods in the generated node classes called "#src!__dumpElementSnapshot(List)!" and "#src!__getAllAttrs(List)!". #nl It is esp. suited for accessing and collecting Ethereals, or all attributes of a certain type, etc. It is created without any parameters and applied to any element or document by calling "#src!match(e)!". Its behaviour is defined, as usual, by deriving and overriding the diverse "#src!action(e)!" or "#src!descend_<..>(e)!" methods, see #link 2/eu/bandm/tools/tdom/runtime/UntypedVisitor.html #text the api doc#/link. #nl As appropriate for this untyped view, no action or match methods for structural sub-contents are provided. // ------------------------------------------------------- #h2 #title Generated Paisley patterns #p The command line option #src!--patterns! activates the generation of Paisley patterns, see the #link paisley.html #text paisley documentation#/link. 
#p FIXME MORE
// -------------------------------------------------------
#h1 #title The Extension Mechanism #label txt_extension
#p #tdom includes a mechanism for extending a model by one or more others. Its prime purpose is to support the reuse of visitor based code.
#p On the source level, an extension has to be declared in the DTD: A short example is contained in #link ../../examples/tdom/extend #text #src!metatools/examples/tdom/extend!#/link. Therein, the file #link ../../examples/tdom/extend/arith.dtd #text #src!arith.dtd!#/link, further simplified, reads as follows:
#source
// ... etc
#/source
#p The "#src!?tdom abstract!" processing instruction is terminated by the ellipsis "#src!...!", which tells the #tdom code generator to provide for the plugging-in of more element types. #nl The same is done on the genuine DTD level by defining an "#src!ENTITY!" each, which is by default of course empty.
#p The usage of the extension mechanism can be seen in the file #link ../../examples/tdom/extend/arith.dtd #text #src!logic.dtd!#/link, which, slightly simplified, reads as follows:
#suppressVerbatimCommandCharWarning 1
#source
%arith.dtd;
#/source
#p Again, the import is executed on the DTD level and on the #tdom level in parallel, requiring a seemingly redundant duplication.
#p FIXME EXAMPLE BROKEN ??? SOMETHING got LOST here !?!?!
#p#kind missing DOES NOT WORK AT PRESENT #nl cf Bug 146
#p#kind missing EXAMPLE for "extension from below" is missing, e.g. "log ? expr : expr" as an expression !?!!?
// -------------------------------------------------------
#h1 #title Serialization and Conversions #label txt_serial
// -------------------------------------------------------
#h2 #title Generating #src!SAX! Events
#p Among the generated classes there is always a #src!Dumper! class, extending the #src!Visitor! class.
 Its constructors are defined as #ldots
#source
  public Dumper(org.xml.sax.ContentHandler contentHandler) { ..}
  public Dumper(org.xml.sax.ContentHandler contentHandler,
                org.xml.sax.ext.LexicalHandler commentHandler) { ..}
#/source
#p Whenever #src!visit()! is called on any element of the model, the corresponding #src!SAX! events #cite Sax04 are generated for this element and its attributes, for any related Ethereals (see #ref txt_ethereals) and for all elements contained therein recursively.
#p The #src!LexicalHandler! is optional and is used to receive #src!TypedComment! objects. If the first constructor is called and the argument happens to support both interfaces, it will be used in both roles.
// -------------------------------------------------------
#h2 #title Visualization of a #tdom Model
#p You can easily print a whole model or an arbitrary sub-tree to a console or to a text file by combining the above-mentioned SAX event generation with our #link 4/util/ContentPrinter.html #text ContentPrinter#/link. E.g.:
#source
  new Dumper(new ContentPrinter(new PrintWriter_flushing(System.out), true, true)
            ).visit(myModel);
#/source
#p#kind missing ML to BT: #nl A SWING TREE interface would be nice for many purposes
// -------------------------------------------------------
#h2 #title Format Generation for a #tdom Model #label txt_format_frontend_language
#p The class #link 4/formatfrontends/Tdom2format.html #text formatfrontends.Tdom2format#/ contains an instantiation of the #link 4/formatfrontends/GenericCompiler.html #text generic format compiler#/.
#p The generated code is a specialization of the #link tdom.html #loc txt_visitor #text #src!Visitor! class#/ generated with the tdom code. It offers a public method "#src!toFormat()!" which translates a tdom model (or any sub-expression of it) into a format object. The outlines of such a converter called "myFormatter",
generated for some #tdom generated package "myModel" are:
#commentchar\
#source
import myModel.* ; // Element_A, Element_B, etc.

public class myFormatter extends myModel.Visitor {
  // public interface
  public Format toFormat (Visitable element){
    result = Format.empty;
    visit(element);
    return result;
  }
  public int default_indent = 2 ; // can be modified
  public Format default_delimiter = Format.empty ; // for debugging only

  // auxiliary functions
  protected Format __throwIt(){...}
  protected void visit (TypedPCData){...}
  protected Format toFormat_throwing(Visitable){...} // if ==null then throw!
  protected Collection toFormat_array(Visitable[]){...} // "map"

  // user defined visitor functions
  public void visit (Element_A element){
    result = // format generating function
  }
  // etc.
}
#/source
#commentchar/
#p A "#src!visit()!" method is added for a node class of the #tdom model if and only if the user gives a format description for it.
#p The default case is that the visitor reaches "#src!visit(typedPCData)!". This simply concatenates all character content into one big, unformatted #link 4/format/Format.Append.html #text Append#/ format. This is exactly what you want in most cases for the lowest layers of the structure definition.
#p (( There is a field "#src!public Format default_delimiter!" in the generated code, which is initialized to an empty format. For debugging it can be set to something like "---", indicating the borders of the different concatenated pcdata ranges. ))
#p The language for explicitly defining the format for a given #tdom class is an instance of the #link format.html #loc txt_format_frontend_language #text generic format definition language#/link. The following instantiations are specific to its application to #tdom/:
#cfRule DOMAIN_SPECIFIC_DATA_ADRESSING ::= element | choice | sequence | "$pcdata" | "$quoteDTDstyle" (blankCharacter)+ formatDescription ;
#cfRule element ::= tag ("$" nat)?
;
#cfRule choice ::= ("$C" | "$Choice") nat ;
#cfRule alt ::= ("$A" | "$Alt") nat ;
#cfRule sequence ::= ("$S" | "$Seq") nat ;
#cfInf nat #def a natural number #xemph!including! zero(0) ;
#p The reference "#src!$quoteDTDstyle! data" means a "DTD-like" quotation of the (possibly concatenated) string content of the "data" format, done by #link 4/format/Format.html #loc quoteDTDstyle(eu.bandm.tools.format.Format) #text Format.quoteDTDstyle(Format)#/link. This simply means framing the data with single quotes if it contains double quotes, and vice versa. If the data contains both, the result is unspecified #src$!-)$.
#p The reference "#src!$pcdata!" means the text content of the current tdom element, as it is delivered by the #emph!generated! #src!getPCData()! method.
#p When appearing in a format code, "#src!$!" means a reference to the format generated for "#src!getElem_()!"; "#src!!" without number defaults to "#src!getElem_1()!".#nl Analogously, "#src!$C!"/"#src!$Choice!" means a reference to the format generated for "#src!getChoice_()!" and "#src!$S!"/"#src!$Seq!" means "#src!getSeq_()!".
#p If the selected element/choice/sequence appears in the DTD under a list combinator ("#src!+!" or "#src!*!"), one of the list format descriptors #bold must be applied#/ ("#src![]!", "#src![,,/]!", "#src![;;/]{}!", etc.), as described #link format.html #loc __NONTERM_aggregateFormatDescription #text for the generic case#/link.
#p Please note that currently there is #bold no type-checking#/ between the format descriptors and the DTD of the model. So addressing singular instead of plural (i.e. leaving out a list format descriptor instead of using it, or #ital vice versa#/) will only fail when the generated code is compiled (there is either a "#src!getElems_1_A()!" or a "#src!getElem_1_A()!", as #link tdom.html #loc txt_ruvmethods #text described above#/link.)
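#p The DTD-style quotation behind "#src!$quoteDTDstyle!" can be sketched on plain strings as follows. This is only an illustration under stated assumptions: the class #src!DtdQuoteSketch! and its method are hypothetical names, the sketch works on #src!String! values rather than on Format objects, and both the rejection of data containing both kinds of quotes and the choice of double quotes for data containing neither are assumptions, since these cases are left open in the description above.

```java
public class DtdQuoteSketch {

    /**
     * Frames the data with the kind of quote it does not contain.
     * Assumption: data without any quote gets double quotes.
     * Assumption: data with both kinds of quotes is rejected.
     */
    public static String quoteDtdStyle(String data) {
        boolean hasDouble = data.indexOf('"') >= 0;
        boolean hasSingle = data.indexOf('\'') >= 0;
        if (hasDouble && hasSingle) {
            throw new IllegalArgumentException(
                "data contains both kinds of quotes: " + data);
        }
        // Use single quotes only if double quotes occur in the data.
        String quote = hasDouble ? "'" : "\"";
        return quote + data + quote;
    }

    public static void main(String[] args) {
        System.out.println(quoteDtdStyle("plain"));        // "plain"
        System.out.println(quoteDtdStyle("say \"hi\""));   // 'say "hi"'
        System.out.println(quoteDtdStyle("it's"));         // "it's"
    }
}
```

#p Throwing an exception for the mixed case keeps the sketch honest about what it cannot express; a real implementation may well choose a different strategy.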
#cfRule DOMAIN_SPECIFIC_SWITCH_SELECTOR ::= element | choice | sequence ;
#cfRule DOMAIN_SPECIFIC_CASE_LABEL ::= nat ;
#p In case the DOMAIN_SPECIFIC_SWITCH_SELECTOR refers to an element, this element must appear in the current content model with a "#src!?!" modifier. The selector tags may be "0" for absent and "1" for present.
#p In case the DOMAIN_SPECIFIC_SWITCH_SELECTOR refers to a choice, the selector tags correspond to the allowed values of "#src!altIndex!", i.e. the position number of the present alternative in the DTD choice construct.
#p If no default case is given in a switch, it defaults to "#src!$throw!".
#p#kind missing It is not checked whether the selector is under a "#src!?!" operator, so #src!$throw! may lead to a runtime exception. IS THIS WHAT WE WANT ??? #nl FIXME MISSING ( )
#p#kind missing Switch currently does not work for "#src!$S1!" ==> "(a,b)?" #nl FIXME MISSING ( )
#p#kind src An instructive and larger example is found in #link 3/eu/bandm/tools/metajava/format/GeneratedJavaFormatter.java#/link.
#p Currently there are three ways to write down these definitions: (1) in a stand-alone file, (2) as PIs in a DTD, or (3) by #src!option! values from an #xantlr source.
// -------------------------------------------------------
#h3 #title Stand-alone format description file
#p All format definitions in such a file must have the form #ldots
#cfRule formatRule ::= tag ( seq | choice alt )* (choice)? blankCharacter* "=" formatDescription "." ;
#p Please note the dot "#src!.!" as a delimiter at the end, which is required because in a #link format.html #loc __NONTERM_formatDescription#text formatDescription#/link all whitespace is significant. The description begins directly after the "#src!=!".
#p The expression to the left of the "#src!=!" is translated verbatim into a "#src!visit(Element_)!"
rule, where is the name of the inner class of the element class, as denoted by the sequence of selectors, which are translated the same way as in the context of the format description, as described above.
#p#kind missing The variant ending with a CHOICE is currently not supported. The code could ONLY BE A "#src!$switch $this {0:(...),1:(...),...}!", but #src!$this! is not yet supported !?!? #nl Furthermore, type checking would be hard !?!?!?
#p#kind src Currently the #link metajava.html#text metajava#/link unparser is written as such a file, namely #link 3/eu/bandm/tools/metajava/format/java.format#/link.
#p The call of the tool is controlled by these parameters:#nl
#cmdline_option_documentation ../../src/eu/bandm/tools/formatfrontends/tdom2format.options #lang en
#p #bold ATTENTION #/ Currently these parameters are not yet decoded as documented, but
#list
#i the parameters with numbers as abbreviations can only be given by position
#i the other option values may be specified by #src!System.getProperty("eu.bandm.tools.formatfrontends.tdom2format.")!.
// -------------------------------------------------------
#h3 #title Processing instructions in a DTD
#p The format of the declarations is exactly as above, but they are all wrapped into processing instructions, each of which holds one or more declarations:
#source
#/source
#p Please note again the dot "#src!.!" which ends the format descriptions.
#p The call of the tool is now #ldots
#source
${JAVA} eu.bandm.tools.formatfrontends.Tdom2format -- model.DTD ???
#/source
// -------------------------------------------------------
#h3 #title Options from an #xantlr Source
#p If the #tdom meta-model is an AST meta-model created from an #link xantlr.html #text #xantlr#/link grammar, then the format directives can be formulated directly in the grammar source file by a special kind of rule-wise #src!option!s.
#p A construct in the grammar file like #ldots
#source
public myNonterm options { format = "expr" ; } : grammarExpr ;
#/source
#p #ldots will be translated by #xantlr into a processing instruction in the generated DTD:
#source
#/source
#p Here the dot is added by this translation process, so the whole string constant in the "option" statement (including trailing whitespace!) is seen as the format description.
#p But you can add more format expressions to the one option statement, if you want to define format rules for sub-expressions like choices or sub-sequences. Simply terminate the "main" expression for the non-terminal with an explicit dot, and append the further declarations for these sub-expressions (or arbitrary unrelated nonterminals !-) as explicit rules.
#p Simply bear in mind that #xantlr prepends the name of the non-terminal and the equals sign, and appends one dot; then you can insert further rules arbitrarily.
#p So #ldots
#source
public myNonterm options { format = "expr. myN $C1=sub1. myN $C2$A1 = sub2" ; } :a(b|c)(d|e) ;
#/source
#p #ldots will be translated by #xantlr into a processing instruction with three format declarations in the generated DTD:
#source
#/source
#p#kind src A first attempt to use this feature is in #link 3/eub/bandm/tools/umod/parser/umod.g#/link, but it has never been really employed.
// -------------------------------------------------------
#h2 #title Creating a W3C (untyped) DOM Representation
#p A W3C DOM representation can be created via this #src!SAX! output: #mt provides a #emph!generic! translation module in #link 2/eu/bandm/tools/util/SAX2DOMConverter.html #text !util/SAX2DOMConverter #/link.
// NO #text !dom/runtime/SAX2DOMConverter #/link
It implements #src!org.xml.sax.ContentHandler! and can therefore be connected as a drain to the #src!Dumper! described above.
#p You have to plug in a W3C DOM #emph!implementation!, e.g. #src!Xerces-J!
#cite xercesj by calling the public method #ldots
#source
public void setDOMImplementation(org.w3c.dom.DOMImplementation domImpl){...}
#/source
#p After all #src!SAX! events have been sent completely, you can access the generated Document by calling #ldots
#source
public org.w3c.dom.Document getDocument()
#/source
#p#kind missing ML to BT: #nl Shouldn't SAX2DOMConverter rather be in #src!util! instead of #src!tdom/runtime!???
// -------------------------------------------------------
#h2 #title Compressed De-/Serialization #label txt_compression
#p Since the standard XML encoding (using opening and closing tags and many different layers of escaping and quoting) is very redundant, #tdom supports a compressed binary storage format, in which all tags are encoded in a minimal way, controlled by the DTD. So the tagging information is stored in a binary form, while the text contents are left unchanged (i.e. they are encoded using #src!UTF-8!).
#p The writing out of a document or element is initiated by a sequence like #ldots
#commentchar\
#source
import eu.bandm.tools.tdom.runtime.EncodingOutputStream ;

os = new EncodingOutputStream(anyOutputstream);
myElement.encode(os);
// OR
myDocument.encode(os);
#/source
#commentchar/
#p The encode methods are defined on the level of the #src!runtime! classes, on which the generated classes are built, so see #link 2/eu/bandm/tools/tdom/runtime/TypedDocument.html #text !/tdom/runtime/TypedDocument#/link or #link 2/eu/bandm/tools/tdom/runtime/TypedElement.html #text !/tdom/runtime/TypedElement#/link for details.
#p Reading back is only possible on the #src!Document! level, by calling the appropriate factory method defined with the DTD class, as already mentioned in #ref txt_autoconst above.
#source
createDocument_ (java.io.InputStream in) throws java.io.IOException {...}
#/source
#p#kind missing ML to BT: #nl Something is wrong with this documentation !#nl Symmetry ??
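#p The general idea of the compressed format, tags encoded as small numbers derived from the DTD and text kept as UTF-8, can be illustrated by the following sketch. It is explicitly #emph!not! the byte format written by #src!EncodingOutputStream!, which is not specified in this text. The class #src!TagIndexCodecSketch!, the fixed tag list standing in for a DTD, and the layout (one index byte, a four-byte length, then the text bytes) are all assumptions made for the illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class TagIndexCodecSketch {

    // Hypothetical stand-in for the element names declared by a DTD.
    static final List<String> TAGS = Arrays.asList("doc", "title", "para");

    /** Encodes one text-only element: tag index, text length, UTF-8 text. */
    public static byte[] encode(String tag, String text) {
        int index = TAGS.indexOf(tag);
        if (index < 0) {
            throw new IllegalArgumentException("tag not declared: " + tag);
        }
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(index);       // the tag costs a single byte
            out.writeInt(utf8.length);  // length prefix replaces the end tag
            out.write(utf8);            // text content stays plain UTF-8
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Decodes what encode() wrote; returns { tag, text }. */
    public static String[] decode(byte[] data) {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(data))) {
            String tag = TAGS.get(in.readUnsignedByte());
            byte[] utf8 = new byte[in.readInt()];
            in.readFully(utf8);
            return new String[] { tag, new String(utf8, StandardCharsets.UTF_8) };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] packed = encode("para", "Hello");
        int xmlSize = "<para>Hello</para>".getBytes(StandardCharsets.UTF_8).length;
        System.out.println(packed.length + " bytes instead of " + xmlSize);
        System.out.println(Arrays.toString(decode(packed)));
    }
}
```

#p Decoding needs the same tag table as encoding, which matches the statement above that reading back goes through a factory method defined with the DTD class.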
// -------------------------------------------------------
#h1 #title Using the #tdom Tool
#p
// -------------------------------------------------------
#h2 #title Calling the #tdom Tool #label txt_calltdom
#p The #tdom tool is called from the command line: #nl
#cmdline_option_documentation ../../src/eu/bandm/tools/tdom_withOptions/Options.xml #lang en
#p Currently the parameters #src!--commonContentClass!, #src!--baseClass! and #src!--noCompress! are not supported.
#p Additionally there is a macro in the metatools make system defined in #src!etc/calltools.mk!, which is called from a #src!Makefile! as #ldots
#source
$(call tdom, , , )
#/source
#p In particular, this macro takes care of the conversion of slashes and backslashes between a Unix and a Cygwin environment.
// -------------------------------------------------------
#h2 #title Outputs and Error Messages
#p As an output, #tdom generates one source file for each #src!! declaration, according to the naming conventions explained above in #ref txt_elementclasses.
#p Additionally, the following files are generated: #ldots
#list
#i A file named #src!sources!, listing all source files generated by #tdom, and included by the #src!Makefile!, esp. for realizing "#src!make clean!"
#i The abstract super classes #src!Element.java! and #src!Document.java!.
#i The binary version of the DTD, contained in #src!DTD.java!
#i #src!Visitor.java! and #src!VisitorTemplate.java!, as described above in #ref txt_visitor.
#/list
#p #label txt_tdom_ambig #tdom constructs a whole zoo of parsers and validators (from SAX, W3C-DOM, Java constructors, etc). In this context it is central to detect #emph!ambiguities! in the DTD grammar. In practice we often find badly written DTDs. But many of the ambiguities are two alternatives which both include "epsilon" in their language, e.g. #ldots
#source
... ( a* | b* ) ...
#/source
#p These ambiguities do #emph!not! violate the "LL(1)" requirement, as interpreted by W3C.
Nevertheless, it is important to have a close look at all points of ambiguity. The #tdom tool prints them in a precise format. E.g. when trying to translate "xhtml1-flat.dtd", you get the typical warning
#source
warning: table: conflict on [thead, tbody, tr, tfoot] between alts 0 and 1 in rule
[thead, col, tbody, tr, colgroup, tfoot] ->
  ( [thead, col, tbody, tr, tfoot] -> {[col] -> (col)* -> [thead, tbody, tr, tfoot]}
  | [thead, tbody, tr, colgroup, tfoot] -> {[colgroup] -> (colgroup)* -> [thead, tbody, tr, tfoot]} )
#/source
#p In the innermost nesting (in braces) you see the first and follow sets of some grammar expressions:
#source
{ [firstSet] -> grammarExpr -> [followSet] }
#/source
#p The conflict is here caused by a disjunction. On the next higher level you see the disjunction of two such constructs, each preceded by its own first set (which is identical to the internal first set, or to the union of the internal first and follow set if the grammar expression can produce epsilon, as we all have learned from the Dragon Book !-)
#source
( [firstSet_A] -> {...} | [firstSet_B] -> {...} )
#/source
#p Before this bracket, again, there is the first set of the disjunction as a whole (which is not very informative, it is just the union), but in the warning message you find the #emph!intersection! of these two sets, which is the cause of the ambiguity.
/* ============================= IF SOURCE REACHABLE
#p FIXME In the meta tool source repository there are many examples for package source generated by #tdom. Cf. #link XXXX
========================================= */
#p#kind missing WARNING when unreachable element ????
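#p The set arithmetic behind such a warning can be replayed by hand. The following is a minimal sketch and not the #tdom implementation: the class #src!AltConflictSketch! and its methods are hypothetical names, and the first and follow sets are simply copied from the warning quoted above. Each alternative "#src!(col)*!" and "#src!(colgroup)*!" can produce epsilon, so its effective first set is the union of its internal first set and its follow set, and the conflict is the intersection of the two effective first sets.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class AltConflictSketch {

    /** Convenience constructor for a sorted set of element names. */
    public static Set<String> setOf(String... names) {
        return new TreeSet<>(Arrays.asList(names));
    }

    /**
     * First set of an alternative as seen from outside: the internal
     * first set, plus the follow set if the alternative can be empty.
     */
    public static Set<String> effectiveFirst(Set<String> first,
                                             boolean canBeEmpty,
                                             Set<String> follow) {
        Set<String> result = new TreeSet<>(first);
        if (canBeEmpty) {
            result.addAll(follow);
        }
        return result;
    }

    /** Tokens on which two alternatives cannot be told apart. */
    public static Set<String> conflict(Set<String> alt0, Set<String> alt1) {
        Set<String> result = new TreeSet<>(alt0);
        result.retainAll(alt1);
        return result;
    }

    public static void main(String[] args) {
        Set<String> follow = setOf("thead", "tbody", "tr", "tfoot");
        Set<String> alt0 = effectiveFirst(setOf("col"), true, follow);
        Set<String> alt1 = effectiveFirst(setOf("colgroup"), true, follow);
        System.out.println("alt 0: " + alt0);
        System.out.println("alt 1: " + alt1);
        // prints: conflict on [tbody, tfoot, thead, tr]
        System.out.println("conflict on " + conflict(alt0, alt1));
    }
}
```

#p The sketch uses sorted sets, so the names appear in alphabetical order rather than in the order of the warning; the sets themselves are the same.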
// -------------------------------------------------------------
#h1 #title #xantlr/ and #tdom/: Special Issues of Their Co-Operation #label txt_xantlrtdom
// -------------------------------------------------------
// -------------------------------------------------------------
#h2 #title Information Interchange by Option Controlled DTD Generation
#p First of all, the grammar file fed into #xantlr must contain the global, parser-level option
#source
options { dtdMode = tdom ; }
#/source
#p This makes the generated DTD contain special "processing instructions" addressed to #tdom (as described above in #ref txt_pi), reflecting the settings in the grammar definition file:
#list
#i the #xantlr rule-level option "#src!xmlNodeType=abstract!" (cf. #link ./xantlr.html #loc txt_eventtypes #text xantlr sax event types#/) adds to the DTD something like "#src!!".
#i an #xantlr rule-level "#src!private!" or "#src!public!" modifier is translated to a processing instruction accordingly, cf. #ref txt_pi.
#/list
// -------------------------------------------------------
#h2 #title Different Layers of Ambiguity
#p When #xantlr and #tdom are plugged together, #emph!two! parsers are involved: First comes the #antlr parser, which consumes front-end characters and emits the standard #antlr error messages. The output of the #xantlr generated parser is a SAX event stream, which is fed into the different SAX receivers created by the #tdom compiler.
#p Already when translating the DTD, #tdom may issue error messages concerning this second level of parsing, mostly caused by ambiguities, i.e. violations of the LL(1) criterion. This kind of ambiguity should not be confused with the #antlr front-end ambiguities. Consider a definition (taken from the #ddd grammar) like #ldots
#source
definition ::= "list" "of" reference | "short" "for" reference | ...
#/source
#p The #antlr generated front-end parser has no problems with ambiguity, because there are terminal tokens guarding the alternatives. Because these tokens do #emph not#/ automatically contribute to the semi-AST, the generated DTD would correspond to
#source
definition ::= reference | reference | ... ;
#/source
#p This is ambiguous, and the following #tdom translation will issue an error message, as explained above in #ref txt_tdom_ambig.
#p As a remedy, you should either wrap one of the terminals in a non-terminal (with an empty DTD content model), like #ldots
#source
definition : LIST "of" reference | "short" "for" reference | ... ;
LIST : "list" ;
#/source
#p #ldots or wrap one of the alternatives as a whole in a non-terminal:
#source
definition : "list" "of" reference | shortcutdefinition | ... ;
shortcutdefinition : "short" "for" reference ;
#/source
#p Now the generated SAX events suffice to distinguish between the alternatives.
// -------------------------------------------------------
#h2 #title #src!XantlrTdom!, Glue Code and Error Messaging Issues
#p #mt provides some glue code for plugging together #xantlr and #tdom . The central class is #link 2/eu/bandm/tools/xantlrtdom/XantlrTdom.html #text !/xantlrtdom/XantlrTdom#/, which internally creates buffers, auxiliary message pipes, etc., and plugs them all together.
#p Let "XXX" be the name of your grammar, and "YYY" the top production, then the usage pattern is #ldots
#source
final XXX_Lexer lexer = new XXX_Lexer(stream);
lexer.setFilename(filename);
final XXX_Parser parser = new XXX_Parser(HistoryToken.chain(lexer));
tee = (tracing) ? new ContentPrinter(new PrintWriter_flushing(System.err), true, true) : null ;
final XantlrTdom link = XantlrTdom.link (parser, msg1, 1024, tee, DTD.dtd, msg2);
final Document_YYY document_module = link.parse("YYY", Document_YYY.class);
#/source
#p#kind nosrc Please refer to the #link 2/eu/bandm/tools/xantlrtdom/package-tree.html #text API doc.
#p#kind src Please refer to the many examples in the #mt code employing this pattern.
#p#kind missing ERROR behaviour ??
#eof