xhtml Parsing & Objective Reporting Toolkit

Xport is a C++ toolkit library that lets users easily generate and parse xhtml documents and stylesheets. With Xport and some (x)html knowledge, users can create rich xhtml documents for reporting or any other purpose.

Xport can generate and parse the three most popular types of xhtml documents: xhtml strict, xhtml transitional, and xhtml frameset.

Of these three document types, xhtml strict is the default document type in Xport.

Xport consists entirely of class templates, and it supports both standard (narrow) and wide character types.

Xport's class templates fall into three categories: xhtml classes, stylesheet classes, and iterator classes.

Although every class in Xport is a class template, users of the library do not need to work with the templates directly, because Xport declares type aliases for every combination of class, document type, and character type. The three categories of interface types are discussed below.

xhtml classes

Xport's xhtml classes provide the functionality for creating and parsing xhtml documents. As mentioned above, Xport consists entirely of class templates, but type aliases are declared for all the interface classes for user convenience. The xhtml class templates are parameterized on two types: the document type and the character type. The table below lists all of Xport's xhtml interface class templates and the type aliases available for them.

| Class template | xhtml strict | xhtml strict (wide) | xhtml transitional | xhtml transitional (wide) | xhtml frameset | xhtml frameset (wide) |
|---|---|---|---|---|---|---|
| xhtml_doc | document | wdocument | tdocument | wtdocument | fdocument | wfdocument |
| xhtml_markup | markup | wmarkup | tmarkup | wtmarkup | fmarkup | wfmarkup |
| xhtml_element | element | welement | telement | wtelement | felement | wfelement |
| xhtml_pcdata | pcdata | wpcdata | tpcdata | wtpcdata | fpcdata | wfpcdata |
| xhtml_comment | comment | wcomment | tcomment | wtcomment | fcomment | wfcomment |
| xhtml_processing_instruction | procinstr | wprocinstr | tprocinstr | wtprocinstr | fprocinstr | wfprocinstr |
| xhtml_formatter | formatter | wformatter | tformatter | wtformatter | fformatter | wfformatter |
| xhtml_parser | parser | wparser | tparser | wtparser | fparser | wfparser |

Table 1: Xport's xhtml class templates and type aliases

Xport's default document type is xhtml strict, shown in the xhtml strict columns of the table above. Most of the example reports use this document type, so users of the library will normally need only the xhtml strict type aliases to create documents.
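Table 1 follows a regular naming scheme: no prefix for strict, a t prefix for transitional, an f prefix for frameset, and a leading w for wide characters. The sketch below shows how such aliases can be layered over a class template parameterized on document type and character type. The tag types and the template name xhtml_element_sketch are hypothetical stand-ins, not Xport's actual declarations.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical tag types standing in for the three xhtml document types.
struct strict_tag {};
struct transitional_tag {};
struct frameset_tag {};

// Hypothetical class template parameterized on document type and character type.
template <class DocType, class CharT>
class xhtml_element_sketch {
public:
    using doc_type = DocType;
    using string_type = std::basic_string<CharT>;

    explicit xhtml_element_sketch(const string_type& text = string_type())
        : text_(text) {}

    const string_type& text() const { return text_; }

private:
    string_type text_;
};

// Aliases following Table 1's naming scheme:
// no prefix = strict, "t" = transitional, "f" = frameset, leading "w" = wide characters.
using element   = xhtml_element_sketch<strict_tag, char>;
using welement  = xhtml_element_sketch<strict_tag, wchar_t>;
using telement  = xhtml_element_sketch<transitional_tag, char>;
using wtelement = xhtml_element_sketch<transitional_tag, wchar_t>;
using felement  = xhtml_element_sketch<frameset_tag, char>;
using wfelement = xhtml_element_sketch<frameset_tag, wchar_t>;
```

With these aliases, user code names only `element` or `welement` and never spells out the template arguments.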

A brief description of Xport's xhtml types is given below. The default type alias names will be used to describe the different xhtml types available.


document

In Xport, the document type encapsulates an xhtml document, and the document object is one of the most important objects you will use. The document object organizes its content in a tree structure, or document tree. A standard xhtml document normally contains the html, head, title, and body elements. Xport calls these the root elements, and a document that contains them is a root document. When you create a document object, you can optionally create it as a root document with those root elements included. The root elements, together with all elements contained within them, form the document tree. Once content has been added to the document, the document can be written to a file or stream. document objects can also parse xhtml files as well as html files; the results of parsing html files vary depending on the form of the html file.
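To make the idea of a root document concrete, here is a minimal sketch of a document tree whose root elements are html, head, title, and body. The node type and the make_root_document() and serialize() functions are hypothetical illustrations, not part of Xport; in Xport, the document object provides this structure when it is created as a root document.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree node; names are illustrative, not Xport's API.
struct node {
    std::string tag;
    std::vector<std::unique_ptr<node>> children;

    explicit node(std::string t) : tag(std::move(t)) {}

    // Appends a child element and returns a reference to it.
    node& add(const std::string& child_tag) {
        children.push_back(std::make_unique<node>(child_tag));
        return *children.back();
    }
};

// Builds a "root document": html containing head (with title) and body.
std::unique_ptr<node> make_root_document() {
    auto html = std::make_unique<node>("html");
    node& head = html->add("head");
    head.add("title");
    html->add("body");
    return html;
}

// Writes the tree as nested start and end tags.
std::string serialize(const node& n) {
    std::string out = "<" + n.tag + ">";
    for (const auto& c : n.children) out += serialize(*c);
    out += "</" + n.tag + ">";
    return out;
}
```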


markup

markup is the base type for element, comment, and procinstr. markup is used mainly with Xport's markup iterators, which are discussed further below. All markup iterators yield references and pointers to markup, which actually refer to objects of the derived types. The interface of markup is therefore very important: when working with markup iterators, it is the only means of accessing those derived objects.
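The following sketch shows why a base-class interface matters when iterating: code that holds only references to the base type still reaches each derived object's behavior through virtual functions. The classes markup_base, element_node, comment_node, and procinstr_node are hypothetical, and Xport's actual markup interface may expose different operations. The comment and processing-instruction delimiters follow standard xhtml syntax.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base class standing in for markup.
class markup_base {
public:
    virtual ~markup_base() = default;
    virtual std::string write() const = 0;  // serialize this object
    virtual bool is_element() const { return false; }
};

class element_node : public markup_base {
public:
    explicit element_node(std::string tag) : tag_(std::move(tag)) {}
    std::string write() const override { return "<" + tag_ + " />"; }
    bool is_element() const override { return true; }
private:
    std::string tag_;
};

class comment_node : public markup_base {
public:
    explicit comment_node(std::string text) : text_(std::move(text)) {}
    std::string write() const override { return "<!--" + text_ + "-->"; }
private:
    std::string text_;
};

class procinstr_node : public markup_base {
public:
    explicit procinstr_node(std::string body) : body_(std::move(body)) {}
    std::string write() const override { return "<?" + body_ + "?>"; }
private:
    std::string body_;
};

// Sees only markup_base, yet each derived type's write() is called.
std::string write_all(const std::vector<std::unique_ptr<markup_base>>& items) {
    std::string out;
    for (const auto& m : items) out += m->write();
    return out;
}
```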


element

Xport's element type encapsulates an xhtml element. element objects are used more frequently than any other object type when creating content for a document. An element is the only markup object that can contain other markup objects, so element objects make up the structure of the document tree. There are several ways to add and insert other markup objects into element objects, all detailed in the documentation and examples. Xport only allows an element to be inserted into another element when doing so does not violate the rules of the document type in use. For instance, Xport will not allow a p element to be inserted into another p element, because that is an xhtml violation for all document types. Each document type has specific rules, its document type definition, and Xport enforces them.

The element constructor accepts three arguments. The first argument is required and specifies the tag name of the element to create; in Xport, tag names are enumerated for convenience. The optional second argument specifies the id attribute of the element, and the optional third argument specifies its class attribute.
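The sketch below models the constructor described above (a required tag plus optional id and class values) and a single content rule that refuses to nest a p inside another p. The tag enumeration, element_sketch, and can_contain() are hypothetical names; a real document type definition contains many more rules than this one check.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tag enumeration for illustration only.
enum class tag { p, i, div };

class element_sketch {
public:
    // Tag is required; id and class attribute values are optional.
    explicit element_sketch(tag t, std::string id = "", std::string cls = "")
        : tag_(t), id_(std::move(id)), class_(std::move(cls)) {}

    // Toy content rule: a single check (no p inside p) stands in for a full DTD.
    bool can_contain(tag child) const {
        return !(tag_ == tag::p && child == tag::p);
    }

    // Rejects insertions that would violate the content rule.
    void insert(const element_sketch& child) {
        if (!can_contain(child.tag_))
            throw std::invalid_argument("element not allowed here");
        child_tags_.push_back(child.tag_);  // this sketch records only the child's tag
    }

    const std::string& id() const { return id_; }
    const std::string& cls() const { return class_; }
    std::size_t child_count() const { return child_tags_.size(); }

private:
    tag tag_;
    std::string id_, class_;
    std::vector<tag> child_tags_;
};
```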

An element can also be assigned attributes, which are also enumerated for convenience. The document type definitions specify which attributes can be assigned to which elements, and Xport enforces these rules as well. Xport also provides an easy way to assign styles to individual elements through style attributes; however, since Xport supports stylesheets, using stylesheets is encouraged over using style attributes.


pcdata

Xport's pcdata type encapsulates PCDATA (parsed character data) in an xhtml document. pcdata objects are mostly used implicitly in Xport: whenever text is inserted into an element, Xport places the text in a pcdata object. An element containing PCDATA therefore contains one or more pcdata objects holding that text, and how many depends on how the text was inserted into the element.

In Xport, all parsed character data is placed within pcdata objects. xhtml elements can also be placed in pcdata objects if they are included as part of the character data rather than as element objects. For instance, the following code snippet inserts the i element and its contents into a pcdata object along with its surrounding text.

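```cpp
element elem(p);  // the tag name argument is required
// The <i> markup is stored as text inside a single pcdata object.
elem.insert("The included <i>italic</i> element is placed within a pcdata object.");
```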

In the next code snippet, however, the italic element is separate from the pcdata, because it is inserted as an element object.

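```cpp
element elem(p);  // the tag name argument is required
// The i element is inserted as its own element object between two pcdata objects.
elem << "The included " << (element(i) << "italic") << " element will not be part of the pcdata object.";
```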

In the second snippet, the paragraph element contains three markup objects: a pcdata object containing the text "The included ", an italic element object containing the text "italic", and another pcdata object containing the remainder of the text.

Regardless of whether elements were originally represented as element objects or as part of a pcdata object, when Xport parses an (x)html document, the parser always produces element objects for elements, including inline elements, rather than including them in pcdata objects. For example, a paragraph element whose content mixes PCDATA and inline elements is parsed into several separate pcdata objects and element objects.
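As a rough illustration of how parsing splits mixed content, the toy function below turns a string such as "The included <i>italic</i> element." into separate text and element pieces. split_inline() and piece are hypothetical, and the function assumes well-formed, non-nested inline tags without attributes; it is not Xport's parser.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy result: either a run of text (pcdata) or an inline element with its text.
struct piece {
    bool is_element;
    std::string tag;   // empty for text runs
    std::string text;
};

// Assumes simple, non-nested inline tags without attributes, e.g. "<i>x</i>".
// Anything it cannot match is kept as plain text.
std::vector<piece> split_inline(const std::string& s) {
    std::vector<piece> out;
    std::size_t pos = 0;
    while (pos < s.size()) {
        std::size_t open = s.find('<', pos);
        if (open == std::string::npos) {
            out.push_back({false, "", s.substr(pos)});
            break;
        }
        if (open > pos) out.push_back({false, "", s.substr(pos, open - pos)});
        std::size_t gt = s.find('>', open);
        if (gt == std::string::npos) { out.push_back({false, "", s.substr(open)}); break; }
        std::string name = s.substr(open + 1, gt - open - 1);
        std::string close = "</" + name + ">";
        std::size_t end = s.find(close, gt + 1);
        if (end == std::string::npos) { out.push_back({false, "", s.substr(open)}); break; }
        out.push_back({true, name, s.substr(gt + 1, end - gt - 1)});
        pos = end + close.size();
    }
    return out;
}
```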


comment

Xport's comment is a very simple type that encapsulates an xhtml comment. Comments are not required in xhtml, but they can be useful for documenting the xhtml source. Like element, comment is derived from markup, so comments are also markup objects.


procinstr

Xport's procinstr encapsulates an xhtml processing instruction. Processing instructions take many forms, but all are delimited by <? and ?>. One of the more popular kinds is the PHP processing instruction. procinstr is also derived from markup and is therefore also a markup object.


formatter

Xport's formatter formats the output of documents, whether to a file or to a stream, and provides detailed control over the xhtml output. With the formatter, you can specify the layout style of any element in the document, the maximum line length, and the way entities are presented. Combined with Xport's parser, the formatter even lets you use the toolkit simply to reformat existing xhtml documents to your liking.
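A maximum line length setting implies some line-breaking strategy. The sketch below uses simple greedy word wrapping as one possible approach; wrap() is a hypothetical helper, and Xport's formatter may break lines differently.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Greedy word wrap: each line takes as many words as fit within max_len.
// A word longer than max_len is placed on its own line unbroken.
std::vector<std::string> wrap(const std::string& text, std::size_t max_len) {
    std::istringstream in(text);
    std::vector<std::string> lines;
    std::string word, line;
    while (in >> word) {
        if (line.empty()) {
            line = word;
        } else if (line.size() + 1 + word.size() <= max_len) {
            line += ' ';
            line += word;
        } else {
            lines.push_back(line);
            line = word;
        }
    }
    if (!line.empty()) lines.push_back(line);
    return lines;
}
```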


parser

Xport's parser parses xhtml and html documents from a file or stream into an Xport document object. If the xhtml is well formed, the parser object parses the document without problems. If the document is malformed, the parser object does its best to parse the document despite its errors. No matter how malformed the source, the resulting document object is always well formed, because Xport does not allow invalid xhtml. The parser object also lets users specify parsing options, controlling such things as newline preservation, entity transformations, and byte order mark preservation. The parser object can optionally generate a log of its progress.
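One common way to guarantee well-formed output from malformed input is to track open elements on a stack. The sketch below closes elements left open at the end, closes intervening elements when an outer end tag arrives, and drops end tags that match nothing. The token type and repair() function are hypothetical, and this is not necessarily the strategy Xport's parser uses.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

enum class kind { start, end, text };

struct token {
    kind k;
    std::string value;  // tag name for start/end tokens, character data for text
};

std::string repair(const std::vector<token>& input) {
    std::vector<std::string> open;  // stack of currently open element names
    std::string out;
    for (const token& t : input) {
        if (t.k == kind::text) {
            out += t.value;
        } else if (t.k == kind::start) {
            open.push_back(t.value);
            out += "<" + t.value + ">";
        } else if (std::find(open.begin(), open.end(), t.value) != open.end()) {
            // Close inner elements until the matching element is closed.
            std::string name;
            do {
                name = open.back();
                open.pop_back();
                out += "</" + name + ">";
            } while (name != t.value);
        }
        // Otherwise the end tag matches no open element and is dropped.
    }
    // Close anything still open at the end of the input.
    while (!open.empty()) {
        out += "</" + open.back() + ">";
        open.pop_back();
    }
    return out;
}
```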

stylesheet classes

Xport's stylesheet functionality is implemented in another set of class templates, which are parameterized only on the character type. As with the xhtml class templates, type aliases are declared for the stylesheet class templates to make them easier to work with. The table below lists the stylesheet class templates and their type aliases.

| Class template | Type alias (narrow character) | Type alias (wide character) |
|---|---|---|
| xhtml_stylesheet | stylesheet | wstylesheet |
| xhtml_stylesheet_rule | stylesheet_rule | wstylesheet_rule |
| xhtml_stylesheet_import | stylesheet_import | wstylesheet_import |
| xhtml_stylesheet_comment | stylesheet_comment | wstylesheet_comment |
| stylesheet_declaration | declaration | wdeclaration |

Table 2: Xport's stylesheet class templates and type aliases

Xport's default character type for the stylesheet type aliases is the standard narrow character type, so library users working with narrow characters need only the aliases in the narrow-character column of the table above. Xport's stylesheet object encapsulates a cascading style sheet. The stylesheet object can easily be written to a file or embedded in a document, and it can also parse existing stylesheets from a file or from a stream.

A brief description of Xport's stylesheet types is given below. The default type alias names will be used to describe the different stylesheet types available.


stylesheet

Xport's stylesheet encapsulates a cascading style sheet. Its interface is small but very important: three operations, add_item(), write(), and parse(), provide its main functionality. Three types of objects can be added to a stylesheet object: stylesheet_rule, stylesheet_comment, and stylesheet_import. This makes the stylesheet object simpler than the document object. Most of the work of creating a stylesheet for a document involves stylesheet_rule, detailed further below.
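The sketch below mirrors the shape of the stylesheet interface described above: add_item() accepts the three item kinds, and write() emits them in insertion order. The types rule_item, import_item, comment_item, and stylesheet_sketch are hypothetical, and parse() is omitted for brevity.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical item kinds mirroring the three types a stylesheet can hold.
struct rule_item {
    std::string selector;
    std::vector<std::pair<std::string, std::string>> declarations;  // property, value
};
struct import_item { std::string url; };
struct comment_item { std::string text; };

class stylesheet_sketch {
public:
    // One overload per item kind; each item is stored already rendered as CSS text.
    void add_item(const rule_item& r)    { entries_.push_back(render(r)); }
    void add_item(const import_item& i)  { entries_.push_back("@import url(\"" + i.url + "\");"); }
    void add_item(const comment_item& c) { entries_.push_back("/* " + c.text + " */"); }

    // Writes every item, one per line, in insertion order.
    void write(std::ostream& os) const {
        for (const auto& e : entries_) os << e << '\n';
    }

private:
    static std::string render(const rule_item& r) {
        std::string s = r.selector + " {";
        for (const auto& d : r.declarations) s += " " + d.first + ": " + d.second + ";";
        return s + " }";
    }
    std::vector<std::string> entries_;
};
```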


stylesheet_rule

Xport's stylesheet_rule encapsulates a CSS rule. When creating stylesheets in Xport, stylesheet_rule objects are used more than any other stylesheet type. Since a CSS rule always contains a selector specifying the parts of the document to which the rule applies, the stylesheet_rule constructor takes a mandatory string argument that specifies the selector. After creating a stylesheet_rule object, you add declarations to it with add_declaration(), the primary operation of stylesheet_rule, which comes in two forms. The first form accepts a declaration object, described below. The second form accepts two arguments: the CSS property, and that property's value. CSS property names are enumerated for convenience, and the property's value is given as a string.
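Here is a minimal model of a rule with both forms of add_declaration(): one taking a declaration object, and one taking an enumerated property plus a string value. The property enumeration, declaration_sketch, and rule_sketch are hypothetical names chosen for illustration; only the two-overload design comes from the description above.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical property enumeration for illustration only.
enum class property { color, margin, font_weight };

std::string property_name(property p) {
    switch (p) {
        case property::color:       return "color";
        case property::margin:      return "margin";
        case property::font_weight: return "font-weight";
    }
    return "";
}

// A declaration pairs a property with a string value.
struct declaration_sketch {
    property prop;
    std::string value;
};

class rule_sketch {
public:
    explicit rule_sketch(std::string selector) : selector_(std::move(selector)) {}

    // Form 1: add a ready-made declaration object.
    void add_declaration(const declaration_sketch& d) { decls_.push_back(d); }
    // Form 2: add a property and its value directly.
    void add_declaration(property p, const std::string& value) { decls_.push_back({p, value}); }

    // Renders the rule as CSS text.
    std::string str() const {
        std::string s = selector_ + " {";
        for (const auto& d : decls_) s += " " + property_name(d.prop) + ": " + d.value + ";";
        return s + " }";
    }

private:
    std::string selector_;
    std::vector<declaration_sketch> decls_;
};
```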


stylesheet_import

Xport's stylesheet_import has one basic purpose: importing another stylesheet into the current stylesheet.


stylesheet_comment

Xport's stylesheet_comment encapsulates a stylesheet comment, allowing users to add comments to stylesheets.


declaration

Xport's declaration encapsulates a CSS declaration. Once created, a declaration object can be added to a stylesheet_rule. The declaration constructor takes two mandatory arguments: the CSS property, and that property's value.

iterator classes

Xport's iterator functionality is available for both xhtml markup and stylesheets. Two types of iterators are available for markup: child markup iterators and descendant markup iterators. As the names imply, child iterators traverse the immediate children of an element or document, whereas descendant iterators traverse all descendants of an element or document.

The table below displays the types of iterators available in Xport.

| Iterator type | Non-const | Const |
|---|---|---|
| Child markup iterators | markup::iterator | markup::const_iterator |
| Reverse child markup iterators | markup::reverse_iterator | markup::const_reverse_iterator |
| Descendant markup iterators | markup::descendant_iterator | markup::const_descendant_iterator |
| Stylesheet iterators | stylesheet::iterator | stylesheet::const_iterator |
| Stylesheet rule iterators | stylesheet_item::iterator | stylesheet_item::const_iterator |

Table 3: Xport's iterator types and type aliases

A brief description of Xport's iterator types is given below.

markup::iterator and markup::const_iterator

Xport's child markup iterators, iterator and const_iterator, are very useful for both creating and parsing xhtml documents. Not only do these iterators allow traversal of child markup objects, they can also be used to insert, add, and erase child markup objects. The iterator in particular is so useful that an element's insert() operation returns one. The returned iterator can in turn be used to insert additional markup into the element to which it points. Users are encouraged to make heavy use of iterators when generating content in a document.
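The pattern of using the iterator returned by insert() to keep building inside the new element can be sketched with a std::list-based tree, whose iterators stay valid when siblings are added. The node type and its insert() member are hypothetical and only approximate Xport's interface.

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical node; std::list of an incomplete type as a member requires C++17.
struct node {
    std::string name;
    std::list<node> children;

    // Appends a child and returns an iterator to it, in the spirit of an
    // insert() that returns an iterator to the newly inserted markup.
    std::list<node>::iterator insert(const std::string& child) {
        return children.insert(children.end(), node{child, {}});
    }
};
```

Because list iterators are not invalidated by inserting siblings, an iterator returned earlier can still be used after more content is added to the parent.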

markup::descendant_iterator and markup::const_descendant_iterator

Xport's descendant markup iterators, descendant_iterator and const_descendant_iterator, are used when it is necessary to traverse the descendants of a document or of a particular element in the document. These iterators traverse in pre-order, visiting each markup object before its children. A descendant iterator of a document object can traverse every markup object in the whole document, which can be very handy. Like child iterators, descendant iterators can also be used to insert, add, and erase markup in the document.
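Pre-order traversal can be implemented without recursion by keeping a stack of pending nodes. The descendant_iter class below is a hypothetical, simplified stand-in for a descendant iterator: it visits every descendant of a root, each node before its children and siblings from left to right, but it is not Xport's implementation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical tree node; std::vector of an incomplete type as a member requires C++17.
struct tree_node {
    std::string name;
    std::vector<tree_node> children;
};

// Pre-order traversal over the descendants of a root, using an explicit stack.
class descendant_iter {
public:
    explicit descendant_iter(const tree_node* root) {
        if (root) push_children(*root);  // descendants only; the root itself is excluded
    }
    bool done() const { return stack_.empty(); }
    const tree_node& operator*() const { return *stack_.back(); }
    descendant_iter& operator++() {
        const tree_node* cur = stack_.back();
        stack_.pop_back();
        push_children(*cur);  // the children of cur are visited next
        return *this;
    }

private:
    void push_children(const tree_node& n) {
        // Push in reverse so the leftmost child is on top of the stack.
        for (auto it = n.children.rbegin(); it != n.children.rend(); ++it)
            stack_.push_back(&*it);
    }
    std::vector<const tree_node*> stack_;
};

// Collects descendant names in visiting order, separated by spaces.
std::string visit_all(const tree_node& root) {
    std::string out;
    for (descendant_iter it(&root); !it.done(); ++it) out += (*it).name + " ";
    return out;
}
```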

stylesheet::iterator and stylesheet::const_iterator

Xport's stylesheet iterators, iterator and const_iterator, are very useful for both creating and parsing stylesheets. Not only do these iterators allow traversal of stylesheet items, they can also be used to insert, add, and erase stylesheet items, which include stylesheet rules, import rules, and comments. The iterator in particular is so useful that stylesheet's insert() operation returns one. When a stylesheet rule is inserted, the returned iterator can in turn be used to insert declarations into the stylesheet_rule to which it points. Users are encouraged to make heavy use of iterators when generating stylesheet items in a stylesheet.
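A similar sketch applies to stylesheets: insert() returns an iterator to the new rule, and that iterator is then used to add declarations to it. The css_rule and sheet types are hypothetical, and declarations are stored as plain "property: value" strings to keep the example short.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <vector>

// Hypothetical rule: a selector plus "property: value" declaration strings.
struct css_rule {
    std::string selector;
    std::vector<std::string> declarations;
};

class sheet {
public:
    using iterator = std::list<css_rule>::iterator;

    // Inserts a new rule before pos and returns an iterator to it.
    iterator insert(iterator pos, const std::string& selector) {
        return rules_.insert(pos, css_rule{selector, {}});
    }
    iterator begin() { return rules_.begin(); }
    iterator end() { return rules_.end(); }

    // Renders all rules as CSS text, one rule per line.
    std::string str() const {
        std::string s;
        for (const auto& r : rules_) {
            s += r.selector + " {";
            for (const auto& d : r.declarations) s += " " + d + ";";
            s += " }\n";
        }
        return s;
    }

private:
    std::list<css_rule> rules_;  // list iterators survive later insertions
};
```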

stylesheet_item::iterator and stylesheet_item::const_iterator

Xport's stylesheet rule iterators, iterator and const_iterator, are very useful for both creating and parsing stylesheet rules. Although these iterators traverse the declarations contained in stylesheet_rule objects, they are declared and defined in stylesheet_rule's base class, stylesheet_item.

This concludes the discussion of Xport's interface types. You are encouraged to read Xport's documentation and inspect the numerous examples to get a better idea of how you can use Xport to generate and parse xhtml documents.