The Format Library

(related API documentation: package format,    package formatfrontends   )

1          Principles
2          Kinds of Formats and Their Explicit Construction.
2.1          Atomic Formats
2.2          Indentation
2.3          Compound Formats
2.4          Macros
2.5          Format Variables
2.6          Annotated Format
3          Processing Format Terms With Visitors and Rewriters
4          Output Methods
4.1          Output for Debugging
5          Pre-Defined Library of Formats for Java Source Texts
6          Front-Ends for Format Expressions
6.1          Dynamic Format Generation for Printing Java Data Objects
7          FormatRepository --- A Persistent Cache Store for Format Objects

1 Principles

The Format library is used to create well-formatted text output. The basic outline of the underlying concept was published by John Hughes in 1995 [Hug95]. Our implementation additionally offers parametrization and a variety of front-end notations.
(Please refer also to the API documentation.)

Pretty-printing is necessary for ergonomic reasons in many stages of the development of meta-programming software, and is used throughout meta_tools for producing Java sources.
Furthermore, it can be employed for creating any well-formatted text output on user and application level.

The basic way of using the format library is ...

  1. to create a tree of format objects in a term-like fashion,
  2. to instantiate a format term by supplying the arguments for all variables contained therein,
  3. and finally to emit a method call which prints the top-level format with all of its contents to a text file, given a certain numeric value for the maximal number of columns to use.

This output and formatting process is defined as follows:
After resolving all open variables (see Section 2.5 below), we distinguish atomic and compound formats. Atomic formats are the basic elements producing output text, while compound formats combine other formats in a specific manner.

Whenever the top-level format is printed, the output starts in the leftmost ("0th") column of the output text.
All formats contained in a compound format are positioned relative to the containing format. Their printing starts in the same column as the containing format, or further to the right.

For certain kinds of compound formats (Block, Line and Beside) the algorithm decides whether and where to insert line breaks between the sub-formats contained therein.
As an optional attribute, a format may have an integer value as its indentation. This value is added to the starting column of the containing format to determine the starting column for printing this sub-format, if and only if it is the first format in a newly opened line or the containing format is a Tabular.
Otherwise the value of the indentation is ignored.
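
The indentation rule can be restated as a small function. This is only a sketch of the rule as worded above, not the library's layout code; the class name, the method and all parameter names are hypothetical.

```java
final class StartColumnSketch {
    /**
     * Starting column of a sub-format according to the rule above.
     * All names are illustrative and do not exist in the Format library.
     */
    static int startColumn(int containerStart, int indentation,
                           boolean firstInLine, boolean containerIsTabular,
                           int currentColumn) {
        if (firstInLine || containerIsTabular) {
            // Indentation counts relative to the containing format.
            return containerStart + indentation;
        }
        // Otherwise the indentation is ignored and printing simply continues.
        return currentColumn;
    }

    public static void main(String[] args) {
        System.out.println(startColumn(4, 2, true, false, 17));   // first in a new line
        System.out.println(startColumn(4, 2, false, false, 17));  // in the midst of a line
        System.out.println(startColumn(4, 2, false, true, 17));   // inside a Tabular
    }
}
```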

The main properties of the algorithm are ...

  1. a sub-format is never printed further left than the starting column of the containing format,
  2. the algorithm tries not to print further to the right than given by the maximal column count supplied by the user,
  3. but it cannot guarantee that this number of columns is never exceeded.

2 Kinds of Formats and Their Explicit Construction.

Format objects are created as algebraic terms, i.e. bottom-up. This is done by calling static methods found in <METATOOLS>/.../Format.java.

All instances of Format carry algebraic semantics and cannot be modified. Therefore these factory methods may (and should!) employ sharing and caching of Format objects invisibly to the user, and user code may do the same. For transforming the algebraic structure of a format term there is (1) a rewriting mechanism defined with the Compound format, as described below, and (2) a special rewriter class.

2.1 Atomic Formats

The following list mentions all kinds of atomic formats. They are represented either by a publicly readable constant, or by a public factory method.
Additionally, each format can be given an indentation as described in Section 2.2.

  1. public Format empty;
    The empty format, useful e.g. as a parameter for functions which normally require a format object.
  2. public Format space(int width){...}
    Constructs a space of the given width. At the beginning of a line, the space will be ignored.
  3. public Format space;
    A constant holding a space of width=1.
  4. public Format literal(String s){...}
    A format which will output the given String value s. The string should not contain any newlines or other whitespace. Spaces contained in a Literal format will never be ignored, in contrast to space(), and explicit linefeed characters will confuse the formatting algorithm. While both kinds of character data could possibly be contained in a Literal object, the literal() factory method throws an IllegalArgumentException if its argument contains linefeed or tab characters.
  5. public Format markup(String s){...}
    public Format markupLeft(String s){...}
    public Format markupRight(String s){...}
    The resulting format behaves like literal(), but is treated as if "invisible", i.e. as if it had a length of 0 (zero).
    So mark-up (like the HTML "<span class=...>") can be inserted into a printed text without disturbing the layout of the visible result.

2.2 Indentation

Each format delivers a clone with a certain indentation by its method ...

  1. public Format indent(int n){...}

2.3 Compound Formats

As mentioned above, the compound formats differ w.r.t. how they arrange the sub-formats. They are realized by the following classes, and often identified by an instance of the enumeration type <METATOOLS>/format/CompoundConstructor. This enumeration offers the method public Format apply(Collection<Format> subs), which delivers a new instance of the corresponding sub-class of format.Compound.

These sub-classes are ...

  1. Format.Append append (Format... f){...}
    The "append" format, delivered by this factory call, starts the printing of all sub-formats immediately adjacent, without generating line breaks.
  2. Format.Block block(Format... f){...}
    The "block" format decides for each of its subformats individually whether a line break is inserted. This happens if the format will exceed the maximal column number, specified in the call to the pretty printer, or if the sub-format itself will contain more than one line. Otherwise the sub-format is "appended" to its preceding sibling without any insertions.
  3. Format.Line line(Format... f){...}
    The criteria whether to insert any break are the same as above, but this decision is made only once for all sub-formats: If any of those fulfills one of the criteria, all subformats will start in a new line. That means, either all subformats are printed horizontally in one single line, or all subformats start, vertically aligned, in the starting column of the containing Line format (plus their individual indentation!).
  4. Format.Beside beside(Format... f){...}
    This format behaves and decides like Block, but only breaks if the right column will be violated, i.e. it does not refrain from putting multi-line formats adjacently, without a line break.
  5. Format.Beneath beneath(Format... f){...}
    This format always prints the sub-formats vertically beneath each other, each starting in the starting column of the containing format, plus the possible indentation.
  6. Format.Tabular tabular(Format... f){...}
    This format describes one (1) single line of a tabular arrangement: All sub-formats contained therein will start in the column given by their indentation value added to the start of the Tabular format. Iff the blank gap (= distance minus one!) from this start column to the end of the preceding sub-format is negative or zero, a line break is inserted.
  7. Format.Prior prior(int level, int alternative, Format noparens, Format parens){...}
    If this format is (directly or indirectly) contained in another Prior format, and the value of level of the nearest enclosing such is higher than its own, then the parens format is printed.
    If it is equal, but the alternative value differs, or if the alternative value is negative, then also the parenthesized form is printed. (This allows different operators on the same precedence level to be put in parentheses in both ways of nesting, which is appropriate if there is no intuition to decide, e.g. with "|" and "&" in grammars.)
    Otherwise the noparens form is printed.

When using these construction methods it is important to consider that (except Append and Beneath) the formats do not behave in an associative way! (It can always happen that in the midst of a line or beside format, which is printed vertically, there exists a sub-format of the same class which is printed horizontally.)
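
The difference between the Line and the Block decision can be tried out with a strongly simplified model. The following sketch works on plain strings instead of Format objects, assumes that every item fits on one line, and ignores indentation, spaces and multi-line sub-formats. It is not the library's algorithm, and all names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class LineVersusBlockSketch {
    /** Line: either all items on one line, or every item on a line of its own. */
    static String line(List<String> items, int width) {
        String flat = String.join("", items);
        return flat.length() <= width ? flat : String.join("\n", items);
    }

    /** Block: decide for each item individually whether a break precedes it. */
    static String block(List<String> items, int width) {
        StringBuilder out = new StringBuilder();
        int column = 0;
        for (String item : items) {
            if (column > 0 && column + item.length() > width) {
                out.append('\n');   // the item would exceed the width
                column = 0;
            }
            out.append(item);       // otherwise appended without any insertion
            column += item.length();
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("alpha,", "beta,", "gamma,", "delta");
        System.out.println(line(items, 12));
        System.out.println("--");
        System.out.println(block(items, 12));
    }
}
```

With a width of 12 the Line variant puts each of the four items on its own line, while the Block variant fills two lines.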

Please note that (in contrast to all others) the Prior format is context-sensitive w.r.t. its output! For efficiency, the layout algorithm treats this only as an approximation. The effects will stay nearly invisible as long as the difference in character data between both versions is in the range of traditional parentheses usage.
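
The choice between the two variants of a Prior format, as described in the list above, can be written down as a predicate. This is a sketch of one reading of that description, not the library's code: it assumes that the test for a negative alternative applies to the inner format's own alternative and only when both levels are equal. All names are hypothetical.

```java
final class PriorSketch {
    /**
     * Returns true if the parenthesized variant is to be printed.
     * enclosingLevel is null if there is no enclosing Prior format.
     */
    static boolean useParens(Integer enclosingLevel, int enclosingAlternative,
                             int level, int alternative) {
        if (enclosingLevel == null) {
            return false;                       // not nested: no parentheses needed
        }
        if (enclosingLevel > level) {
            return true;                        // enclosing level is higher
        }
        if (enclosingLevel == level) {
            // Same level: parenthesize on differing or negative alternative.
            return alternative != enclosingAlternative || alternative < 0;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(useParens(2, 0, 1, 0));   // higher enclosing level
        System.out.println(useParens(1, 0, 1, 1));   // same level, other alternative
        System.out.println(useParens(1, 0, 1, 0));   // same level, same alternative
    }
}
```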

The common base class of all these sub-classes of Format provides a rewriting interface, i.e. a set of methods for changing the contents of existing format objects while preserving the correctness of the internal attribute caches. Please refer to the API documentation of Format.Compound for a description.

2.4 Macros

  1. Format.Append list (Format open, CompoundConstructor outer, Format separator, CompoundConstructor inner, Format close, Format... f){...}
  2. Format.Append list (Format open, Format separator, Format close, Format... f){...}
    Creates a compound format for the list of formats given by ...f : Each format but the last is put into an Append together with the separator; these items and the last format are joined by the inner CompoundConstructor, and the result is joined with the open and close formats by the outer combinator.
    If the list ...f is empty, then an Empty format is returned.
    The second form substitutes as defaults the line combinator as inner and the beside combinator as outer.

    This very special but frequently needed arrangement can be symbolically depicted like ...

   outer combinator
   |  |  |
   |  |  close
   |  | 
   |  inner comb.-----+---    .... ---+
   |  |               |               |
   |  Append          Append          f[last]
   |  |    |          |    |
   |  f[0] separator  f[1] separator
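
The nesting produced by the list macro can also be spelled out as a term. The sketch below builds only a textual rendering of that term from plain strings. It is an illustration of the arrangement described above, not the library's list() method, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ListMacroSketch {
    /** Renders the term structure which the list macro is described to build. */
    static String listTerm(String open, String outer, String separator,
                           String inner, String close, List<String> items) {
        if (items.isEmpty()) {
            return "empty";                     // empty list yields the Empty format
        }
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            boolean last = (i == items.size() - 1);
            // Every item but the last is appended to the separator.
            parts.add(last ? items.get(i)
                           : "append(" + items.get(i) + ", " + separator + ")");
        }
        String body = inner + "(" + String.join(", ", parts) + ")";
        return outer + "(" + open + ", " + body + ", " + close + ")";
    }

    public static void main(String[] args) {
        // Defaults of the second form: inner = line, outer = beside.
        System.out.println(listTerm("'('", "beside", "','", "line", "')'",
                                    Arrays.asList("a", "b", "c")));
    }
}
```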

  1. Format.Block text (String text){...}
    This function breaks the String argument into a sequence of Formats. These are put into one Block format. The sub-formats generated are ...
    1. one Literal for each consecutive sequence of non-blank characters,
    2. one Space(n) for each consecutive sequence of n whitespace characters,
    3. and one (!) Break for each occurrence of "^M", "^M/^J" or "^J", as used by the different operating systems as line separator. (FIXME: Break may no longer exist.)
  2. quoteDTDstyle(Format f){...}
    This method takes f as text data and returns an Append compound which realizes the quoting required when appearing in a DTD or in other "standards" from the XML zoo:
    1. If the text does not contain any single quote, then frame it with a pair of single quotes.
    2. Else, if the text does not contain any double quote, then frame it with a pair of double quotes.
    3. Otherwise the text contains both. Then use single quotes, after replacing all their occurrences in the text by the character entity &#x27;
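
The three quoting rules can be tried out on plain strings. The following sketch mirrors the rules as listed above but operates on String values instead of Format objects. It is not the library's quoteDTDstyle() method, and the names are hypothetical.

```java
final class DtdQuoteSketch {
    /** Applies the three quoting rules listed above to a plain string. */
    static String quote(String text) {
        if (text.indexOf('\'') < 0) {
            return "'" + text + "'";            // rule 1: no single quote contained
        }
        if (text.indexOf('"') < 0) {
            return "\"" + text + "\"";          // rule 2: no double quote contained
        }
        // Rule 3: both kinds occur, so escape the single quotes.
        return "'" + text.replace("'", "&#x27;") + "'";
    }

    public static void main(String[] args) {
        System.out.println(quote("plain"));
        System.out.println(quote("it's"));
        System.out.println(quote("say \"it's\""));
    }
}
```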

2.5 Format Variables

For the efficiency of any text generating process it is of central importance that format objects are parametrizable. So a complicated structure can be constructed and partially evaluated once, and subsequently be used more than once, in different variants.

We support two flavours of variables, which are identified by number or by name, and refer to values either by position or by an explicit map.

static Format variable(int) delivers a Format.BoundVariable, and static Format variable(String) delivers a Format.FreeVariable.

There are different ways for resolving variables:
Format.applyTo(Format...) returns a copy of "this". If it gets n format objects as arguments, then in this copy all occurrences of the first bound variables "0" to "n-1" are replaced by the argument at the corresponding position. All bound variables with a number >=n are re-numbered, starting with zero (0).
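
The substitution and re-numbering of bound variables can be illustrated on a toy term. In the sketch below a term is just a flat list in which Integer entries stand for bound variables and String entries for literals. This models only the behaviour described above and is not the library's applyTo(); all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BoundVariableSketch {
    /** Replaces variables 0..n-1 by the n arguments and re-numbers the rest. */
    static List<Object> applyTo(List<Object> body, String... args) {
        List<Object> result = new ArrayList<>();
        for (Object part : body) {
            if (part instanceof Integer) {
                int index = (Integer) part;
                if (index < args.length) {
                    result.add(args[index]);                        // substituted
                } else {
                    result.add(Integer.valueOf(index - args.length)); // re-numbered
                }
            } else {
                result.add(part);                                   // literal, unchanged
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Object> body = Arrays.<Object>asList("f(", 0, ", ", 1, ", ", 2, ")");
        // Two arguments resolve variables 0 and 1; variable 2 becomes variable 0.
        System.out.println(applyTo(body, "x", "y"));
    }
}
```

The copy returned by the first application can be applied again later to resolve the remaining, re-numbered variable.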

There is a second method, the arguments of which are not required to be formats themselves, but may be of class <METATOOLS>/format.Formattable. So they only need a method "format()" which delivers a Format as soon as such is required.

(This interface is implemented, e.g., by many classes from metajava. Therefore generated classes, fields etc. can easily be inserted into pre-defined formats which represent Java code. See the metajava user documentation.)

For both methods there is a static variant which takes the format body as an argument but performs the same evaluation, namely apply(Format,Format...) and apply(Format,Formattable...).

For free (=named) variables there are currently only static methods. The first is apply(Format,String,Format), which returns a copy of the first format in which all occurrences of the free variable with the given name are replaced by the second format.

The only way to substitute more than one variable in one pass is to construct a Format.Context, and then call eval(Context,boolean). (This is the same as static apply(Format,Context,boolean).) This will replace both kinds of variables, as contained in the context, and throw an AssertionError if "partial" is set to false but the result still contains unresolved variables.
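
The role of the "partial" flag can be illustrated with a toy evaluation. The sketch below handles named variables only, represents a term as a flat list, and uses a plain Map in place of Format.Context. It models the described behaviour and is not the library's eval(); all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ContextEvalSketch {
    /** Marker for a named (free) variable inside a flat toy term. */
    static final class Free {
        final String name;
        Free(String name) { this.name = name; }
    }

    /** Replaces all variables found in the context in one single pass. */
    static List<Object> eval(List<Object> body, Map<String, String> context,
                             boolean partial) {
        List<Object> result = new ArrayList<>();
        for (Object part : body) {
            if (!(part instanceof Free)) {
                result.add(part);
                continue;
            }
            String name = ((Free) part).name;
            String value = context.get(name);
            if (value != null) {
                result.add(value);
            } else if (partial) {
                result.add(part);               // variable stays open
            } else {
                throw new AssertionError("unresolved variable: " + name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> context = new HashMap<>();
        context.put("type", "int");
        List<Object> body = Arrays.<Object>asList(new Free("type"), " ", new Free("name"), ";");
        // Partial evaluation leaves the variable "name" in the result.
        System.out.println(eval(body, context, true).size());
    }
}
```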

2.6 Annotated Format

3 Processing Format Terms With Visitors and Rewriters

The "local" modification interface defined with the Compound class has already been described above.

The format package additionally defines a visitor class and a rewriter class. Their usage should be self-explanatory, since it closely follows the architecture realised in the umod-generated visitors.

4 Output Methods

Once a format expression is created, the following output methods calculate the final formatting and create the textual representation:

  1. public void printFormat(PrintWriter pw, int width)
    ...prints the given format on the given print writer, using the given maximal column width.
  2. public void printFormat(PrintWriter pw)
    ...calls the preceding method with a default value of 79.
  3. public void printNonFormat(PrintWriter pw)
    ...prints the format for a non-human reader, without any line feeds and formatting, just inserting one(1) space wherever necessary as a separator.
  4. public String toString(int width)
    ... converts a format to one single String value, using the specified maximal column width.
  5. public String toString()
    ...calls the preceding method with a default value of Integer.MAX_VALUE-1, i.e. probably not creating a single line break. Please note that "hard" linebreaks (represented by Beneath objects) are still translated into line feed characters.

4.1 Output for Debugging

The function static Format.showLn(Format, int depth) returns a string containing a symbolic representation of the format argument's term tree, showing only the top depth levels.

For larger structures it is more convenient to use static Format.showSwingTree(Format, String, boolean) which opens a swing window and shows a swing "JTree" representation of the format's term structure.

If needed in another context, this tree can be produced by using a Format.Forester explicitly, like in ...

    final JTree tree = new JTree(new Format.Forester().growRoot(myFormat)) ;

5 Pre-Defined Library of Formats for Java Source Texts

The library <METATOOLS>/format/java/Formatter.java contains all necessary data items (i.e. public static final Format constants and public static Format methods) for generating well-formatted Java source text.

In our own sources, the usage of this class has been replaced by the much more convenient (since parser-based) methods offered by metajava/FormatClosure.

Both approaches closely follow the style rules from [jCodeConv].

6 Front-Ends for Format Expressions

On top of the Format library there is a collection of compilers and interpreters for more compact and convenient definitions of format expressions. E.g. the metajava package includes a Java parser, which generates format terms representing Java expressions, statements and declarations.

Furthermore, there is a dedicated expression-based front-end language for compact descriptions of formats. It is based on a generic grammar and a generic compiler.

Derived from these, there is one instantiation for the umod tool and another instantiation for the tdom tool :
These are small compilers to generate format based code for the user-defined visualization of the user-defined data structures.

Further there is a reflection based version for any Java object, described in Section 6.1.

Syntax and semantics of the "lower" constructs, which deal with concrete data, are of course specific to the concrete instantiation of the generic language. In particular, these are

  1. the designators of atomic or aggregate values (called "DOMAIN_SPECIFIC_DATA_ADDRESSING" in the following),
  2. the boolean condition when to take an alternative (in the "?" expression),
  3. and selector values and case labels (in the "$switch" expression).

But the part of the language which controls the combinators and literals is common to all these tools. These productions and their semantics are ...

formatDescription ::= blankCharacter +
        | ' character + '
        | ( formatDescription )
        | ( nat ) ? > formatDescription
        | switch | prior | java | throw | tabular | text
        | compoundFormatDescription
        | aggregateFormatDescription
        | DOMAIN_SPECIFIC_DATA_ADDRESSING
switch ::= $switch ( DOMAIN_SPECIFIC_SWITCH_SELECTOR| $mode )
        { case ( , case ) * ( , append ) ? }
prior ::= $prior nat ( nat| ~ ) ? ( pattern ) ( pattern )
java ::= $java ' character + ' ( append )
throw ::= $throw
tabular ::= $tabular { append ( , append ) * }
text ::= $text dataItem

The meaning of these constructs is ...

  1. blankCharacter +
    White-space (i.e. a sequence of "blank" characters of length n ) is always significant and generates a space ( n ) format.
  2. ' character+ '
    A sequence of characters in single quotes creates a Literal format with the given contents. (The character string should not contain whitespace etc., cf. the description of the literal format above.)
  3. ( ... )
    Parentheses just indicate expression precedence.
  4. nat > formatDescription
    Indicates indentation with the given numeric value.
  5. > formatDescription
    Indicates indentation with a global common default value. This value must be supplied later, when starting the output process.
  6. $switch switchSelector { lab1 : val1 , ... labN : valN , defaultCase }
    ...realizes a switch statement: The result is determined by the first label which matches the selector, or possibly by the default value.
    A label can be a quoted string literal, an identifier or a (positive or zero) integer. The meaning of these labels and their matching relation to the switch selector are, of course, defined by the concrete instance of the language (e.g., the umod front-end uses true and false to match boolean fields of model objects).
  7. $switch $mode { 0: ... ; 1: ... }
    $mode is a special switch selector, not referring to any data. All generated formatter code has a publicly settable variable "int mode". It is initialized to zero (0) and has no visible effect unless referred to as a selector in such a switch statement. This allows formatters with (small) variations. If the current value of this variable does not appear as a case selector, such a switch statement evaluates to the empty format.
  8. $prior nat ( nat | ~ ) ? ( noparens ) ( parens )
    maps directly to the prior() factory method, as described above. ~ stands for -1.
  9. $java ' methodPath ' ( format )
    is "verbatim" translated into a java expression "methodPath(X)", where X is the compilation result of "format", and "methodPath" leads to a code of type "function from Format to Format".
    A typical use is "$java 'eu.bandm.tools.format.Format.quoteDTDstyle' (someTextDataFormat)"
  10. $throw
    If occurring anywhere to the left of a "?" operator (see the "if" expression below), it aborts the evaluation of the expression and instead returns the expression which stands to the right of this "?" (i.e. the "else clause") as the result. Of course this only makes sense when executed conditionally, e.g. controlled by a $switch expression or a further "?".
  11. $tabular{...}
    generates a tabular format, resulting in one (1) line of some table. Please note that, to be sensible, the components of the tabular format must all have an indentation, which will become the fixed starting column number, as described above. (Since the comma separator (for "block") has a high binding power, most other constructs need to be parenthesized when used as components in a tabular line.)
  12. $text dataItem
    Calls the text() macro described above on the given data value. See Section 2.4, list 2, list item 1.

The following list contains the combinators which create compound formats, ordered by decreasing binding power:

compoundFormatDescription ::= formatDescription formatDescription
        | formatDescription , formatDescription
        | formatDescription ; formatDescription
        | formatDescription "|" formatDescription
        | formatDescription / formatDescription
        | formatDescription ? ( formatDescription ) ?

The meaning of these constructs is ...

  1. formatDescription formatDescription
    Mere juxtaposition leads to an Append format containing the two (or more!) sub-formats. This may be especially relevant in combination with white-space, which is always significant (see above).
  2. formatDescription "," formatDescription
    The comma operator creates a Block format.
  3. formatDescription ";" formatDescription
    The semicolon operator creates a Line format.
  4. formatDescription "|" formatDescription
    The vertical bar creates a Beside format.
  5. formatDescription "/" formatDescription
    The slash operator creates a Beneath format.
  6. formatDescription "?" formatDescription
    This is a kind of "if" constructor: Whenever the generation of the left format "fails", the right format is constructed as the result. The right format description can be omitted, in which case the empty format is returned. The definition of the conditions which mean "fail" is specific to the instantiation (tdom/dtd, umod, etc.).
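
The interplay of the "?" operator with a failing left side (for instance caused by $throw) resembles exception handling. The sketch below models it with suppliers of plain strings and a private exception type. It is an analogy under that assumption, not the code generated by the front-end compilers, and all names are hypothetical.

```java
import java.util.function.Supplier;

final class ConditionalSketch {
    /** Signals that the generation of the left operand has failed. */
    static final class Fail extends RuntimeException { }

    /** Models "left ? right": the right side is built only if the left fails. */
    static String orElse(Supplier<String> left, Supplier<String> right) {
        try {
            return left.get();
        } catch (Fail failure) {
            return right.get();
        }
    }

    /** Models "left ?" with omitted right side, yielding the empty format. */
    static String orEmpty(Supplier<String> left) {
        return orElse(left, () -> "");
    }

    public static void main(String[] args) {
        System.out.println(orElse(() -> "title", () -> "(untitled)"));
        System.out.println(orElse(() -> { throw new Fail(); }, () -> "(untitled)"));
    }
}
```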

aggregateFormatDescription ::= name_of_an_aggregate [ foldOperator ]
        | name_of_an_aggregate [ f1 foldOp f2 foldOp f3 ]
        | name_of_an_aggregate [ f1 , f2 , f3 / fempty ]
        | name_of_an_aggregate { formatDescription } [ foldOperator ]
        | name_of_an_aggregate { formatDescription } [ f1 , f2 , f3 ]
        | name_of_an_aggregate { formatDescription } [ f1 , f2 , f3 / fempty ]
        | ...

For values which are of an aggregate type, code is generated (1) to recursively create formats for the components of the aggregate, and then (2) to combine these into one result format. Of course, the structure of these aggregates and the means for addressing them are again dependent on the instantiation. But the front-end syntax to specify the combination process is common:

  1. name_of_an_aggregate "[" foldOperator "]"
    the addressed value must be a list or a set or a collection. The elements contained therein are translated to formats recursively, and these are combined using the foldOperator.
  2. name_of_an_aggregate "[" f1 "," f2 "," f3 "]"
    the addressed value must be a collection, and its elements are treated as above.
    The sequence of sub-formats is combined as when calling the
    Format.list() factory method, ie. f1 and f3 are literal formats which will be printed at the beginning and the end of this list, and f2 is used as the separator.
  3. name_of_an_aggregate "[" f1 "," f2 "," f3 "/" fempty "]"
    like above, but fempty is returned in case the aggregate field is empty.
  4. name_of_an_aggregate "[" foldOperator "]" "{" formatDescription "}"
    name_of_an_aggregate "[" f1 "," f2 "," f3 "]" "{" formatDescription "}"
    name_of_an_aggregate "[" f1 "," f2 "," f3 "/" fempty "]" "{" formatDescription "}"
    All like above, but the format description to generate the formats for the single components is given explicitly by the expression in curly braces. So to say: the code in curly braces "is mapped over the aggregate", and the sequence of results is folded/combined as described in the preceding cases.
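
The "map, then fold" reading of these aggregate constructs can be sketched on plain strings. The example below corresponds to the form with f1, f2, f3 and fempty plus an explicit per-element description. It is an illustration only, not the code produced by GenericCompiler, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.function.Function;

final class AggregateSketch {
    /** Maps "each" over the elements and joins the results like a list. */
    static <T> String format(Collection<T> elements, Function<T, String> each,
                             String open, String separator, String close,
                             String ifEmpty) {
        if (elements.isEmpty()) {
            return ifEmpty;                     // the role of fempty
        }
        List<String> parts = new ArrayList<>();
        for (T element : elements) {
            parts.add(each.apply(element));     // the code in curly braces
        }
        return open + String.join(separator, parts) + close;
    }

    public static void main(String[] args) {
        System.out.println(format(Arrays.asList(1, 2, 3), n -> "#" + n,
                                  "[", ", ", "]", "none"));
    }
}
```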

The translation process for all these constructs is implemented in the code of <METATOOLS>/formatfrontends/GenericCompiler.

6.1 Dynamic Format Generation for Printing Java Data Objects

The class <METATOOLS>/formatfrontends/DynamicFormatter is used to generate format objects for Java data dynamically. It is used by a call like ...

    new formatfrontends.DynamicFormatter(aMessageReceiver)
           .format(model, formatcode)

The formatcode is parsed at run-time, and a format object for pretty-printing the given model is created accordingly.

The instantiation of the specific part of the format front-end language (as described above) is done by the (informally given) definitions ...

  1. DOMAIN_SPECIFIC_DATA_ADDRESSING is in most cases the name of any field of the current object.
  2. Contained in a code block "{...}" there is additionally the designator "$this", referring to the current value.
  3. Iff such a code block is mapped over a Map, then there are the designators "$from" and "$to", referring to both sides of the map entry.

All these object references can serve as a DOMAIN_SPECIFIC_SWITCH_SELECTOR. Corresponding to its class, the DOMAIN_SPECIFIC_CASE_LABELs must be either String constants (for char and String data) or integer constants (for integer or boolean data; boolean is encoded using "0" for false and "1" for true).

Additionally you have some format generating expressions which do not address objects.

  1. The designator "$super" includes the format for the current object which is defined on the level of its super class.
  2. The designators "$class" and "$CLASS" return literal formats containing the (unqualified, local) class name, either as written or converted to all uppercase.
  3. The expression "$is cls obj" can occur to the left of a "?" operator. If obj is not an instance of cls, then the evaluation of the left side is terminated and the right side (which may be empty, see above) is evaluated. Otherwise evaluation continues, and the expression itself returns an empty format.
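
Since this front-end is reflection based, the designators above correspond to standard Java reflection operations. The sketch below shows such operations from java.lang.reflect on a hypothetical model class. How DynamicFormatter actually accesses the data is not stated here, so this is an assumption for illustration only.

```java
import java.lang.reflect.Field;
import java.util.Locale;

final class DesignatorSketch {
    /** Hypothetical model class, only for this illustration. */
    static final class Point {
        int x = 3;
        int y = 4;
    }

    /** Unqualified class name, as "$class" is described to deliver. */
    static String className(Object o) {
        return o.getClass().getSimpleName();
    }

    /** The same name in uppercase, as described for "$CLASS". */
    static String classNameUpper(Object o) {
        return className(o).toUpperCase(Locale.ROOT);
    }

    /** Reads a field by name; only fields declared in the class itself are found. */
    static Object fieldValue(Object o, String fieldName) {
        try {
            Field field = o.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            return field.get(o);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("no readable field " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        Point p = new Point();
        System.out.println(className(p) + " " + classNameUpper(p)
                           + " x=" + fieldValue(p, "x"));
    }
}
```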

Additionally to the production for "aggregateFormatDescription" above, you have

aggregateFormatDescription ::= ... | name_of_a_map_field [ f1 , fmap , f2 , f3 ]
        | name_of_a_map_field [ f1 , fmap , f2 , f3 / fempty ]

These constructs (with the additional "fmap") must be used iff the field referred to is a map or a multimap. The format "fmap" is used as the separator between both sides of each map entry.
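
The effect of the four-part form on a map can be sketched with plain strings. The example assumes that f1, f2 and f3 play the same roles as in the list form (opening text, separator between entries, closing text). That assignment is an assumption, not confirmed by the text above, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class MapAggregateSketch {
    /** Joins key and value by fmap, and the entries like a list. */
    static <K, V> String format(Map<K, V> map, String open, String fmap,
                                String separator, String close, String ifEmpty) {
        if (map.isEmpty()) {
            return ifEmpty;
        }
        List<String> parts = new ArrayList<>();
        for (Map.Entry<K, V> entry : map.entrySet()) {
            // The roles of "$from" and "$to": both sides of the map entry.
            parts.add(entry.getKey() + fmap + entry.getValue());
        }
        return open + String.join(separator, parts) + close;
    }

    public static void main(String[] args) {
        Map<String, Integer> ages = new TreeMap<>();   // sorted, so the order is stable
        ages.put("ann", 31);
        ages.put("bob", 27);
        System.out.println(format(ages, "{", "->", ", ", "}", "{}"));
    }
}
```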

If you want to supply different formats for the same model, maybe even interactively, you can split the described process by explicitly calling its two constituent steps ...

 final DynamicFormatter myFormatter = new DynamicFormatter(myMessageReceiver);
 Format result = myFormatter.visit(myFormatter.parseHint(myFormatCode));

Please refer to the API documentation for further details.

7 FormatRepository --- A Persistent Cache Store for Format Objects

The class <METATOOLS>/format/FormatRepository provides means for caching dynamically generated format objects, as well as for writing them to disk and reloading them, for the sake of efficiency.

The usage is explained in the API documentation.

made 2018-12-30_10h52 by lepper on linux-q699.site

produced with eu.bandm.metatools.d2d and XSLT