manakai's XML Conformance Checking

Abstract

...

Status of This Document

This section describes the status of this document at the time of its publication. Other documents might supersede this document.

This document is a working draft, produced as part of the Whatpm subproject of the manakai project. It might be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

The scope of this specification is explicitly limited to the Whatpm implementation. It is not the purpose of this specification to define a general guideline for parsing or checking XML documents. This specification does not try to define a new version of XML at all.

This version of the specification supports the fourth edition of XML 1.0 and the second edition of XML 1.1. The fifth edition of XML 1.0 might be supported in a later version. The XML namespaces specifications are expected to be supported in a later version of this specification.

Comments on this document are welcome and may be sent to the author.

Translations of this document might be available. The English version of the document is the only normative version.

Introduction

This section is non‐normative.

This specification defines how the parsing and the conformance checking of XML documents should be implemented in the Whatpm XML parser and conformance checker.

It is not the purpose of this specification to define, e.g., how to parse XML documents in general; its scope is explicitly limited to the Whatpm implementation.

...

Much of the parsing of invalid XML documents (whether well-formed or not), and of the conformance of XML documents and XML DOM trees, is left undefined; this document is therefore a guideline for conformance checkers.

Terminology

The key words MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY in this document are to be interpreted as described in RFC 2119 [KEYWORDS].

All examples and notes in this specification are non‐normative, as are all sections explicitly marked non‐normative. Everything else in this specification is normative.

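Algorithms are normative with respect to their results but not with respect to their individual steps: an implementation may use any procedure that produces equivalent results. In addition, the order in which errors are raised is undefined.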

This document sometimes cites parts of the XML 1.0 specification by hyperlinks. When the document being processed is an XML 1.1 document, however, the corresponding parts of the XML 1.1 specification should be consulted instead.

Processing Model

Conceptually, validation of an XML document is split into two stages for the purpose of this specification: the XML document parsing stage and the DOM XML conformance checking stage.

The input to the XML document parsing stage is a byte sequence representing the XML document being parsed (and any additional metadata). The output is a DOM tree representing the XML document, together with zero or more errors. The processor that implements this stage is called a parser. Requirements for a parser are defined in the section Parsing an XML Document.

The input to the DOM XML conformance checking stage is a DOM tree, and the output is zero or more errors. The processor that implements this stage is called a conformance checker. Requirements for a conformance checker are defined in the section Checking an XML DOM Tree.

Error Classification

An error is ...

If a Document node has no xml-well-formedness-error, entity-error, or unknown-error, then it is well-formed. If a well-formed Document node has no xml-validity-error, it is valid.

A well‐formed Document can be safely serialized into a well‐formed XML document. A valid Document can be easily serialized into a valid XML document.

To be a conforming validating XML processor, ...

Errors are classified into these error categories:

entity-error: @@

This algorithm does not support DOM trees containing EntityReference nodes. Every entity reference is expected to be expanded at parse time, and any entity reference that cannot be expanded is expected to raise a parse-time error. As a result, the DOM tree never contains EntityReference nodes.
round-trip-error: @@
round-trip-warning: A round-trip-warning is raised when the checker encounters a construct that might not be restored to the same construct after it is serialized and then re-parsed by a conforming processor.

For example, a round-trip-warning is raised for a Comment node, since XML processors are not required to report the text of comments to applications.
unknown-error?: @@
xml-misc-error: An XML error (XML 1.0 [XML10] error / XML 1.1 [XML11] error) that is not classified into any other error category.
xml-misc-fatal-error: An XML fatal error (XML 1.0 [XML10] fatal error / XML 1.1 [XML11] fatal error) that is not classified into any other error category. @@ What errors fall into this category?
xml-misc-recommendation: An xml-misc-recommendation is raised if a SHOULD‐level requirement in the XML specification is not met.
xml-validity-error: A violation of a validity constraint in the XML specification.
xml-well-formedness-error: If an xml-well-formedness-error is raised, it is not possible to generate an XML serialization that matches the appropriate production rule and violates no well‐formedness constraint in the XML specification [XML10, XML11].
misc-info: A misc-info is raised when status information about the parsing or checking process that is considered useful (for debugging, for example) is available. It by no means implies that the document is non-conforming.
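The following is a minimal Python sketch, not Whatpm's implementation. It shows how a checker might record the error categories listed above and derive the well-formed and valid predicates defined earlier in this section. The names ErrorCategory, is_well_formed, and is_valid are hypothetical.

```python
from enum import Enum


class ErrorCategory(Enum):
    """Hypothetical enumeration of the error categories listed above."""
    ENTITY_ERROR = "entity-error"
    ROUND_TRIP_ERROR = "round-trip-error"
    ROUND_TRIP_WARNING = "round-trip-warning"
    UNKNOWN_ERROR = "unknown-error"
    XML_MISC_ERROR = "xml-misc-error"
    XML_MISC_FATAL_ERROR = "xml-misc-fatal-error"
    XML_MISC_RECOMMENDATION = "xml-misc-recommendation"
    XML_VALIDITY_ERROR = "xml-validity-error"
    XML_WELL_FORMEDNESS_ERROR = "xml-well-formedness-error"
    MISC_INFO = "misc-info"


# Categories whose presence makes a Document node not well-formed.
_NOT_WELL_FORMED = {
    ErrorCategory.XML_WELL_FORMEDNESS_ERROR,
    ErrorCategory.ENTITY_ERROR,
    ErrorCategory.UNKNOWN_ERROR,
}


def is_well_formed(errors):
    """True if no xml-well-formedness-error, entity-error, or unknown-error was raised."""
    return not any(e in _NOT_WELL_FORMED for e in errors)


def is_valid(errors):
    """True if the Document is well-formed and no xml-validity-error was raised."""
    return is_well_formed(errors) and ErrorCategory.XML_VALIDITY_ERROR not in errors
```

Under these definitions, round-trip errors and warnings, misc-info, and the xml-misc-* categories affect neither predicate.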

@@ TODO: #dt-atuseroption at user option (MAY or MUST), #dt-compat for compatibility, #dt-interop for interoperability

TODO: XML 1.1, XML Namespace 1.0/1.1, xml:base, xml:id

TODO: XML "error"/"fatal error" is not always non-conforming (only when MUST or SHOULD).

Parsing an XML Document

When a byte stream that represents an XML document is given to a parser, it MUST create a DOM tree according to relevant specifications [XML10, XML11, XMLNAMES10, XMLNAMES11, DOM3CORE, WEBDOMCORE, DOMDTDEF, MANAKAIDOMEXT].

The parser MAY continue parsing the document even after it encounters a fatal error (as defined by the relevant specifications). This specification does not define how parsing ought to continue.

A future version of this specification might define the entire parser in terms of input stream preprocessor, tokenizer, and tree constructor.

In addition, the following requirements apply to the parser:

For each external entity (including the document entity and the external subset entity, if any)

If there is a byte sequence that is not legal in the encoding in use, then the parser MUST raise an xml-misc-error.

If it is the document entity or a general entity, then:

If the input byte sequence for the entity begins with the BOM, then the parser MUST set the BOM flag of the node corresponding to the entity (the Document node for the document entity or an Entity node for a general entity) to true. @@ flag must be checked later

If it is a parameter entity or the external subset entity, then:

If the character encoding of the entity is UTF-16 but the input byte stream for the entity does not begin with the BOM, then the parser MUST raise an xml-misc-error.
@@ encoding="" preferred name?
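The byte-level rules above can be sketched in Python as follows. This sketch makes several assumptions. The entity kind is passed as a hypothetical label, the encoding is a codec name Python recognizes, and "begins with the BOM" is checked only for UTF-8 and UTF-16.

```python
import codecs


def _norm(encoding):
    # Normalize labels such as "UTF-16" and "utf_16" to "utf16".
    return encoding.lower().replace("-", "").replace("_", "")


def _has_bom(data, encoding):
    """True if data begins with the BOM for encoding (UTF-8 or UTF-16 only)."""
    enc = _norm(encoding)
    if enc == "utf8":
        return data.startswith(codecs.BOM_UTF8)
    if enc == "utf16":
        return data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE))
    return False


def check_entity_bytes(data, encoding, kind):
    """Apply the per-entity byte rules to one external entity.

    kind is a hypothetical label: "document", "general", "parameter",
    or "external-subset". Returns (bom_flag, errors).
    """
    errors = []
    # A byte sequence that is not legal in the encoding: xml-misc-error.
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        errors.append("xml-misc-error")
    has_bom = _has_bom(data, encoding)
    bom_flag = False
    if kind in ("document", "general"):
        # To be stored as the BOM flag of the Document or Entity node.
        bom_flag = has_bom
    elif kind in ("parameter", "external-subset"):
        # UTF-16 without a BOM is an xml-misc-error for these entities.
        if _norm(encoding) == "utf16" and not has_bom:
            errors.append("xml-misc-error")
    return bom_flag, errors
```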

For the document

If the XML document does not begin with an XML declaration, then the parser MUST raise an xml-misc-recommendation.

If the document does not contain a document type declaration, or if it does but the document type definition lacks an entity declaration for one or more of amp, lt, gt, apos, and quot, then the parser MUST raise xml-misc-recommendation(s).

For the document type declaration

@@ read external entity

The entities attribute of the DocumentType node MUST contain a NamedNodeMap object whose first five items are as follows:

An Entity node whose nodeName attribute is amp. It contains a Text node whose data attribute is set to &.
An Entity node whose nodeName attribute is lt. It contains a Text node whose data attribute is set to <.
An Entity node whose nodeName attribute is gt. It contains a Text node whose data attribute is set to >.
An Entity node whose nodeName attribute is quot. It contains a Text node whose data attribute is set to ".
An Entity node whose nodeName attribute is apos. It contains a Text node whose data attribute is set to '.

For each internal general entity declaration being processed by the parser

If the EntityValue part of the general entity declaration contains a bare U+003C LESS-THAN SIGN (<) character, then the parser MUST raise an xml-misc-warning.

For each element type declaration being processed by the parser

If there is another processed element type declaration whose Name is equal to the Name of the element type declaration, then the parser MUST raise an xml-validity-error.

For each attribute definition list declaration being processed by the parser

If there is another processed attribute definition list declaration whose Name is equal to the Name of the attribute definition list declaration, then the parser MUST raise an xml-misc-warning.

For each attribute definition in the attribute definition list declaration, if there is another processed attribute definition whose Name is equal to the Name of the attribute definition (whether or not in the same attribute definition list declaration), then the parser MUST raise an xml-misc-warning.

For each entity declaration being processed by the parser

Handle as follows:

If the entity declaration declares a general entity, the following is applied:

If the Name is lt or amp
If the entity declaration does not declare an internal entity, or if the replacement text of the entity is not the escaped form of < (if lt) or & (if amp), then the parser MUST raise an xml-misc-error.

In other words, the character in the EntityValue has to be double-escaped.

If the Name is gt, quot, or apos
If the entity declaration does not declare an internal entity, or if the replacement text of the entity is neither equal to nor the escaped form of > (if gt), " (if quot), or ' (if apos), then the parser MUST raise an xml-misc-error.

In other words, the character in the EntityValue has to be single- or double-escaped.
If the entity declaration has to be ignored because an entity with the same Name has already been declared, then the parser MUST raise a misc-info and abort these steps.

The five predefined entities (amp, lt, gt, quot, and apos) are always declared implicitly. Therefore any declaration for such an entity always raises a misc-info.
If the entity declaration declares a parameter entity and the Name of the entity begins with the string xml (in any combination of upper- and lowercase letters), then the parser MUST raise an xml-misc-warning.
If the entity declaration contains the EntityValue, then for each occurrence of a reference to an unparsed entity in the EntityValue, the parser MUST raise an xml-misc-error.
If the entity declaration declares a general entity, then an Entity node MUST be created and appended to the NamedNodeMap object in the entities attribute of the DocumentType node.
Read the external entity
If the replacement text of the entity is read, then parse the replacement text as if it were referenced from the content of an element (with no namespace bindings). If no @@ parse error is raised by the parsing process, then the nodes generated by the parsing MUST be appended to the Entity node. The parse error MUST NOT be propagated to the entire parsing process. Other kinds of errors MUST be propagated. The first parse error MUST abort the internal parsing process. @@ better wording
@@ prop
Then, the Entity node and its descendants MUST be marked as read-only.
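The checks for declarations of the five predefined names can be sketched in Python as follows. This is a sketch under one labeled assumption: "escaped form" is taken to mean a decimal or hexadecimal character reference to the character (for example &#60; or &#x3C; for <). The function names are hypothetical.

```python
import re

# The five predefined entities and the character each one stands for.
PREDEFINED = {"lt": "<", "amp": "&", "gt": ">", "quot": '"', "apos": "'"}

# Assumption: "escaped form" means a character reference such as "&#60;" or "&#x3C;".
_CHAR_REF = re.compile(r"&#(?:([0-9]+)|x([0-9A-Fa-f]+));")


def _is_escaped_form(text, char):
    m = _CHAR_REF.fullmatch(text)
    if m is None:
        return False
    code = int(m.group(1)) if m.group(1) else int(m.group(2), 16)
    return code == ord(char)


def check_predefined_declaration(name, is_internal, replacement_text):
    """Return the errors for a general entity declaration of a predefined name.

    Only the five predefined names are handled; any other name yields [].
    """
    if name not in PREDEFINED:
        return []
    char = PREDEFINED[name]
    if not is_internal:
        ok = False
    elif name in ("lt", "amp"):
        # Only the escaped form is accepted (double escaping in the EntityValue).
        ok = _is_escaped_form(replacement_text, char)
    else:
        # The bare character or its escaped form is accepted.
        ok = replacement_text == char or _is_escaped_form(replacement_text, char)
    errors = [] if ok else ["xml-misc-error"]
    # The name is always declared implicitly, so this declaration is ignored.
    errors.append("misc-info")
    return errors
```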

For each notation declaration being processed by the parser

If there is another processed notation declaration whose Name is equal to the Name of the notation declaration, then the parser MUST raise an xml-validity-error.

For each empty-element tag

If the Name of the tag is not declared by a processed element type declaration as EMPTY content, then the parser MUST raise an xml-misc-recommendation.

For each start-tag

If the Name of the tag is declared by a processed element type declaration as EMPTY content, then the parser MUST raise an xml-misc-recommendation.
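The two tag rules above are symmetric. Either way, an xml-misc-recommendation is raised when the tag form disagrees with whether the element type is declared EMPTY. Here is a minimal Python sketch, where declared_empty is a hypothetical set of the names that processed element type declarations declare as EMPTY:

```python
def check_tag_form(name, is_empty_element_tag, declared_empty):
    """Return errors for one empty-element tag or start-tag.

    declared_empty: hypothetical set of element type names declared EMPTY.
    """
    # Recommendation raised when the tag form and the EMPTY declaration disagree.
    if is_empty_element_tag != (name in declared_empty):
        return ["xml-misc-recommendation"]
    return []
```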

For each attribute

The parser MUST set the normalized value of the attribute to the value attribute of the Attr node created for the attribute.

That is, any entity reference has to be expanded. Unexpanded entity references in attribute values are discarded.

For each xml:space attribute

The parser MUST set the normalized value of the xml:space attribute to the value attribute of the Attr node created for the attribute even if the normalized value is different from default or preserve.

For each parameter entity reference

Process as follows:

If the declaration for the entity is not processed, then:

If the document contains no external entity or if the document contains the standalone pseudo-attribute set to yes
The parser MUST raise an xml-well-formedness-error.
Otherwise
The parser MUST raise an xml-validity-error.
If the declaration for the entity is processed but the referenced entity cannot be retrieved, then the parser MUST raise an @@ ??-error.

In either of the two cases above, process as follows:

If the parameter entity reference is contained in a declaration, then the declaration MUST be ignored, except that any error before the parameter entity reference MUST be raised as usual.
If the parameter entity reference is contained in the status portion of a conditional section, then the conditional section MUST be processed as if it were an IGNOREd section.
The parser MUST NOT process any entity or attribute-list declaration after the parameter entity reference in the DTD except when the standalone pseudo-attribute of the XML declaration (if any) is set to yes.
If the parameter entity reference is the first reference to an entity that is not read, then the parser MUST raise an entity-error.
The allDeclarationsProcessed @@ ref attribute of the Document node MUST be set to false.
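The choice of error category and the effect on later declarations can be sketched in Python as follows. The parameters are hypothetical flags. The specification leaves the category for an unretrievable entity open ("@@ ??-error"), so the sketch returns a placeholder string for that case. General entity references below use the same classification of undeclared references.

```python
def handle_unread_parameter_reference(declared, retrievable,
                                      has_external_entity, standalone_yes):
    """Sketch of the error handling for one parameter entity reference.

    Returns (error_category, stop_processing_declarations). error_category
    is None when the entity was declared and retrieved.
    """
    if not declared:
        if not has_external_entity or standalone_yes:
            category = "xml-well-formedness-error"
        else:
            category = "xml-validity-error"
    elif not retrievable:
        # Placeholder: the specification leaves this category open.
        category = "unretrieved-entity-error"
    else:
        return None, False
    # Later entity and attribute-list declarations are not processed
    # unless standalone="yes".
    return category, not standalone_yes
```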

For each general entity reference in an attribute value or in the content of an element

Process as follows:

If the Name of the entity reference is one of amp, lt, gt, quot, and apos, then abort these steps.
If the declaration for the entity is not processed, then:

If the document contains no external entity or if the document contains the standalone pseudo-attribute set to yes
The parser MUST raise an xml-well-formedness-error.
Otherwise
The parser MUST raise an xml-validity-error.
If the declaration for the entity is processed but the referenced entity cannot be retrieved, then the parser MUST raise an @@ ??-error.

In either of the two cases above, process as follows:

If the general entity reference is the first reference to an entity that is not read, then the parser MUST raise an entity-error. @@ entity declared WFC?
An unexpanded entity reference node MUST be inserted into the current node.

For each comment outside of document type declaration

A Comment node MUST be created and inserted appropriately.

The parser MUST try to read every entity referenced by a general or parameter entity reference, as well as the external subset entity if the document type definition has one.

Well-formedness constraints. When the parser detects a violation of one of certain well-formedness constraints, it MUST raise an xml-well-formedness-error. These well-formedness constraints are as follows:

Validity constraints. When the parser detects a violation of one of certain validity constraints, it MUST raise an xml-validity-error. These validity constraints are as follows:

Other criteria. When the parser detects a violation of one of certain additional constraints, it MUST raise an xml-misc-recommendation. These constraints are as follows:

For interoperability, if a parameter-entity reference appears in a choice, seq, or Mixed construct, its replacement text SHOULD contain at least one non-blank character, and neither the first nor last non-blank character of the replacement text SHOULD be a connector (| or ,).
External parsed entities SHOULD each begin with a text declaration.

When reporting white space characters that appear in element content, the parser MUST act as if it were a validating XML processor (see Section 2.10 of the XML specification).

In other words, the isElementContentWhitespace attribute of Text nodes has to be set appropriately. Note that the value of the attribute will be set to false for any Text node in the content of an element whose declaration is not processed.

The parser MUST raise at least one xml-well-formedness-error if the entity it parses does not match the appropriate production rule in the XML specification. As an exception, the parser MAY choose not to raise such an error if the conformance checker will raise it when checking the Document object produced by the parser.

Checking an XML DOM Tree

The following algorithms and definitions apply to XML documents; in particular, they do not apply to HTML documents.

Definitions

The XML version of a node is the XML version of the document to which the node belongs. For a Document node, the XML version of the document is the value of the xmlVersion attribute of the node. For a DocumentType node whose ownerDocument attribute is set to null, the XML version of the document is 1.0. For any other node, the XML version of the document is that of the Document node contained in the ownerDocument attribute of the node.
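The definition above has three cases. The Python sketch below models them with a hypothetical minimal node class. In this model, node_type, xml_version, and owner_document stand in for the DOM nodeType, xmlVersion, and ownerDocument attributes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical minimal stand-in for a DOM node."""
    node_type: str                        # e.g. "Document", "DocumentType", "Element"
    xml_version: Optional[str] = None     # only meaningful for Document nodes
    owner_document: Optional["Node"] = None


def xml_version_of(node):
    """Return the XML version of node, following the definition above."""
    if node.node_type == "Document":
        return node.xml_version
    if node.node_type == "DocumentType" and node.owner_document is None:
        # A DocumentType node not owned by any document is treated as XML 1.0.
        return "1.0"
    return node.owner_document.xml_version
```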

Conformance Checking Algorithms for Components

To validate an XML string (s), the following algorithm MUST be used:

If s contains a character that is not in the character class Char10, then raise an xml-well-formedness-error.
If s contains a character that is in the character class CompatChar10, then raise an xml-misc-warning.
If s contains a character that is in the character class ControlChar10, then raise an xml-misc-warning.
@@ XML 1.1 support
If s contains a U+000D CARRIAGE RETURN character, then raise a round-trip-error. @@ We should not raise duplicate errors for U+000D in attribute values. In addition, we should support a mode where U+000D is serialized as a character reference, so that no round-trip-error is raised.
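The XML 1.0 part of the algorithm above can be sketched in Python as follows, using the Char10 and ControlChar10 classes defined under Character Classes. CompatChar10 is not yet defined by this specification, so the sketch skips that step. Each condition raises at most one error, which is an interpretation of "If s contains ...".

```python
def _in_char10(cp):
    # Char10: the Char production of XML 1.0.
    return (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)


def _in_control_char10(cp):
    # ControlChar10: U+007F..U+0084, U+0086..U+009F, U+FDD0..U+FDEF, and the
    # last two code points (xFFFE, xFFFF) of planes 1 through 16.
    return (0x7F <= cp <= 0x84 or 0x86 <= cp <= 0x9F or 0xFDD0 <= cp <= 0xFDEF
            or (cp >= 0x1FFFE and (cp & 0xFFFF) >= 0xFFFE))


def validate_xml_string(s):
    """Return the errors raised when validating s as an XML 1.0 string."""
    errors = []
    if any(not _in_char10(ord(c)) for c in s):
        errors.append("xml-well-formedness-error")
    # The CompatChar10 step is omitted: that class is not yet defined.
    if any(_in_control_char10(ord(c)) for c in s):
        errors.append("xml-misc-warning")
    if "\r" in s:
        errors.append("round-trip-error")
    return errors
```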

To validate a Name (s), the following algorithm MUST be used:

If s is an empty string, then raise an xml-well-formedness-error. Abort these steps.
Validate s as an XML string.
If the first character in s is a character that is not in the character class NameStartChar10, then raise an xml-well-formedness-error.
If a character other than the first character in s is a character that is not in the character class NameChar10, then raise an xml-well-formedness-error.
If s begins with the string xml (in any case combination), then raise an xml-misc-warning. @@ except for attribute names xml:lang, xml:space.
@@ XML 1.1 support
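The following is a Python sketch of the Name algorithm above. One assumption is stated plainly here and in the code. NameStartChar10 and NameChar10 are not yet listed by this specification, so the sketch borrows the NameStartChar and NameChar productions of XML 1.0 Fifth Edition. The Fourth Edition classes differ. The XML string step and the "@@" exception for xml:lang and xml:space are omitted.

```python
import re

# Stand-in assumption: XML 1.0 Fifth Edition NameStartChar and NameChar,
# used in place of the not-yet-listed NameStartChar10 and NameChar10.
_NAME_START = (":A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
               "\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
               "\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF")
_NAME_CHAR = _NAME_START + "\\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"
_NAME_START_RE = re.compile(f"[{_NAME_START}]")
_NAME_CHAR_RE = re.compile(f"[{_NAME_CHAR}]")


def validate_name(s):
    """Return the errors raised when validating s as a Name (XML string step omitted)."""
    if s == "":
        return ["xml-well-formedness-error"]
    errors = []
    if not _NAME_START_RE.fullmatch(s[0]):
        errors.append("xml-well-formedness-error")
    if any(not _NAME_CHAR_RE.fullmatch(c) for c in s[1:]):
        errors.append("xml-well-formedness-error")
    # Names beginning with "xml" in any case combination are reserved.
    if s[:3].lower() == "xml":
        errors.append("xml-misc-warning")
    return errors
```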

To validate an NCName (s), the following algorithm MUST be used:

Validate s as a Name.
@@

To validate a public identifier (pid), the following algorithm MUST be used:

If pid is null, abort these steps.
If pid contains a character that is not in the character class PubidChar, then raise an xml-well-formedness-error.
If pid contains any of the U+0009 CHARACTER TABULATION, U+000A LINE FEED, and U+000D CARRIAGE RETURN characters, if the first or last character of pid is a U+0020 SPACE character, or if pid contains a U+0020 SPACE character immediately followed by another U+0020 SPACE character, then raise a round-trip-error. Is this really a round-trip problem? The XML specification only defines how to match public identifiers; it defines no canonical form.
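The public identifier algorithm above can be sketched in Python as follows. The allowed character set follows the PubidChar production of XML 1.0, which does not include U+0009, so a tab raises both kinds of error here. validate_public_id is a hypothetical name.

```python
import string

# PubidChar of XML 1.0: #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]
PUBID_CHARS = frozenset(" \r\n" + string.ascii_letters + string.digits
                        + "-'()+,./:=?;!*#@$_%")


def validate_public_id(pid):
    """Return the errors raised when validating pid as a public identifier."""
    if pid is None:
        return []
    errors = []
    if any(c not in PUBID_CHARS for c in pid):
        errors.append("xml-well-formedness-error")
    # Forms that public identifier matching would normalize away.
    if (any(c in "\t\n\r" for c in pid) or pid.startswith(" ")
            or pid.endswith(" ") or "  " in pid):
        errors.append("round-trip-error")
    return errors
```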

To validate a system identifier (sid), the following algorithm MUST be used:

If sid is null, abort these steps.
Validate sid as an XML string.
If sid contains both U+0022 QUOTATION MARK (") and U+0027 APOSTROPHE (') characters, raise an xml-well-formedness-error.
If sid contains at least one U+0023 NUMBER SIGN (#) character, then raise an xml-misc-error.
@@ If sid cannot be converted to a URI reference, then raise a fact-level error (xml-misc-warning?).
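The following is a Python sketch of the system identifier checks above. It omits the XML string step and the unresolved ("@@") URI reference check. The quotation mark rule exists because a SystemLiteral is delimited by either " or ', so it cannot contain both.

```python
def validate_system_id(sid):
    """Return the errors raised when validating sid as a system identifier.

    The XML string step and the "@@" URI reference check are omitted.
    """
    if sid is None:
        return []
    errors = []
    # A SystemLiteral cannot contain both kinds of quotation marks.
    if '"' in sid and "'" in sid:
        errors.append("xml-well-formedness-error")
    # A fragment identifier is not allowed in a system identifier.
    if "#" in sid:
        errors.append("xml-misc-error")
    return errors
```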

Checking `Node`

The algorithm to check a node (n) is defined as follows:

If n is an Attr node

Validate the localName attribute value as an NCName.
If the prefix attribute value is different from null, then validate the prefix attribute value as an NCName.
For each node n_c in the childNodes list of n,
1. If n_c is not a Text or EntityReference node, then it is an xml-well-formedness-error.
2. Otherwise, if n_c is an EntityReference node, then it is an entity-error.
3. Otherwise, check n_c recursively.
If the nodeName attribute of n is xml:space (@@ or {xml namespace}:space?) and the value attribute of n is neither default nor preserve, then it is an xml-misc-error.
@@ xml:lang value is not a language tag [RFC 3066 or its successor] or an empty string, then xml-misc-warning (a "fact"-level error; not an XML error).
@@ specified, manakaiAttributeType (#ValueType Validity constraint: Attribute Value Type)
Let v be the value of the value attribute of n.
Validate n against its declared type as follows:
ID_ATTR
1. Validate v as a Name. If it fails, then raise an xml-validity-error.
2. If the ID v is already defined, then raise an xml-validity-error.
IDREF_ATTR
1. Validate v as a Name. If it fails, then raise an xml-validity-error.
2. If the ID v is not defined, then raise an xml-validity-error.
IDREFS_ATTR

@@

ENTITY_ATTR
1. Validate v as a Name. If it fails, then raise an xml-validity-error.
2. If the entity v is not declared, then raise an xml-validity-error.
ENTITIES_ATTR

@@

NMTOKEN_ATTR
1. Validate v as an Nmtoken. If it fails, then raise an xml-validity-error.
NMTOKENS_ATTR

@@

NOTATION_ATTR

v must be one of the enumerated values. If not, then raise an xml-validity-error.

ENUMERATED_ATTR

v must be one of the enumerated values. If not, then raise an xml-validity-error.

@@
If the declared type is ID and the default is neither #IMPLIED nor #REQUIRED, then raise an xml-validity-error.
@@ #FixedAttr Validity constraint: Fixed Attribute Default
@@ strict serialization error for U+000D, U+000A, and U+0009 characters, leading/trailing U+0020, and U+0020{2,} string?
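The per-type steps above can be sketched in Python as follows. This is not Whatpm's code. The sketch assumes the XML 1.0 Fifth Edition name character classes as a stand-in for the classes this specification does not yet list. The argument names are hypothetical. IDREFS, ENTITIES, and NMTOKENS are left open ("@@") above and are not handled.

```python
import re

# Stand-in assumption: XML 1.0 Fifth Edition NameStartChar / NameChar.
_START = (":A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
          "\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
          "\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF")
_CHAR = _START + "\\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"
NAME_RE = re.compile(f"[{_START}][{_CHAR}]*")
NMTOKEN_RE = re.compile(f"[{_CHAR}]+")


def check_attribute_value(v, declared_type, ids, entities=(), allowed=()):
    """Return xml-validity-errors for attribute value v of declared_type.

    ids: set of ID values defined so far (updated for ID attributes).
    entities: declared entity names; allowed: the enumerated values.
    """
    errors = []
    if declared_type in ("ID_ATTR", "IDREF_ATTR", "ENTITY_ATTR"):
        if not NAME_RE.fullmatch(v):
            errors.append("xml-validity-error")
    if declared_type == "ID_ATTR":
        if v in ids:
            errors.append("xml-validity-error")  # the ID is already defined
        ids.add(v)
    elif declared_type == "IDREF_ATTR" and v not in ids:
        errors.append("xml-validity-error")
    elif declared_type == "ENTITY_ATTR" and v not in entities:
        errors.append("xml-validity-error")
    elif declared_type == "NMTOKEN_ATTR" and not NMTOKEN_RE.fullmatch(v):
        errors.append("xml-validity-error")
    elif declared_type in ("NOTATION_ATTR", "ENUMERATED_ATTR") and v not in allowed:
        errors.append("xml-validity-error")
    return errors
```

Because an IDREF may point forward in the document, a real checker would likely collect every ID in a first pass before checking IDREF values.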

If n is an AttributeDefinition node

If the nodeName attribute of n is xml:space (@@ or {xml namespace}:space?) and its declared type is not one of (default|preserve), (preserve|default), (default), and (preserve), then raise an xml-misc-error.
For each node n_c in the childNodes list of n,
1. If n_c is not a Text or EntityReference node, then it is an xml-well-formedness-error.
2. Otherwise, if n_c is an EntityReference node, then it is an entity-error.
3. Otherwise, check n_c recursively.
If the declared type is NOTATION_ATTR, every notation name in the enumeration MUST be declared. If not, then raise an xml-validity-error.
If the declared type is NOTATION_ATTR or ENUMERATED_ATTR, the enumerated values MUST all be distinct. If not, then raise an xml-validity-error.
If the declared type is NOTATION_ATTR and the element type is declared EMPTY, then raise an xml-validity-error.
@@ #defattrvalid Validity constraint: Attribute Default Value Syntactically Correct
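The following Python sketch covers two of the AttributeDefinition rules above: the xml:space declared type and distinct enumerated values. The type labels and flattened arguments are hypothetical. The enumeration is compared in declaration order, so (default|preserve) and (preserve|default) are listed separately, as in the rule.

```python
# The four declared types accepted for xml:space, in declaration order.
_XML_SPACE_TYPES = [["default", "preserve"], ["preserve", "default"],
                    ["default"], ["preserve"]]


def check_attribute_definition(name, declared_type, values):
    """Return errors for one AttributeDefinition.

    declared_type: hypothetical label such as "ENUMERATED_ATTR" or "NOTATION_ATTR";
    values: the enumerated values, in declaration order.
    """
    errors = []
    if name == "xml:space" and not (declared_type == "ENUMERATED_ATTR"
                                    and list(values) in _XML_SPACE_TYPES):
        errors.append("xml-misc-error")
    if declared_type in ("NOTATION_ATTR", "ENUMERATED_ATTR"):
        # Enumerated values MUST all be distinct.
        if len(set(values)) != len(values):
            errors.append("xml-validity-error")
    return errors
```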

If n is a CDATASection node

Validate the data attribute value as XML character data.
If the data attribute value contains the string ]]>, then raise an xml-well-formedness-error.
If the childNodes list of n contains any nodes, then raise an xml-well-formedness-error.

If n is a Comment node

Raise a round-trip-warning.
Validate the data attribute value as XML character data.
If the data attribute value contains the string --, or if it ends with the character -, then raise an xml-well-formedness-error.
If the childNodes list of n contains any nodes, then raise an xml-well-formedness-error.
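The data checks for CDATASection and Comment nodes above exist because the data must survive serialization inside its delimiters. A ]]> in a CDATA section would end the section early. A -- or a trailing - in a comment would produce an ill-formed comment close. Here is a Python sketch with hypothetical names that omits the XML character data step:

```python
def check_character_data_node(node_type, data, has_children=False):
    """Return errors for a CDATASection or Comment node (character data step omitted)."""
    errors = []
    if node_type == "Comment":
        errors.append("round-trip-warning")
        if "--" in data or data.endswith("-"):
            errors.append("xml-well-formedness-error")
    elif node_type == "CDATASection":
        if "]]>" in data:
            errors.append("xml-well-formedness-error")
    if has_children:
        errors.append("xml-well-formedness-error")
    return errors
```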

If n is a Document node

If the XML version of n is neither 1.0 nor 1.1, then it is an unknown-error?.
If the xmlEncoding attribute value does not match [A-Za-z] ([A-Za-z0-9._] | '-')* @@ formal def, then it is an xml-well-formedness-error.
The childNodes list of n has to consist of zero or more Comment and/or ProcessingInstruction nodes, followed by an optional DocumentType node, followed by zero or more Comment and/or ProcessingInstruction nodes, followed by an Element node, followed by zero or more Comment and/or ProcessingInstruction nodes. Any violation of this is an xml-well-formedness-error.
For each node n_c in the childNodes list of n,
1. If n_c is not an EntityReference node, then check n_c recursively.
@@ allDeclarationsProcessed
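The child-ordering rule above is a regular language over node types. In the Python sketch below, each child is mapped to one letter and the resulting string is matched against a regular expression. The sketch assumes a null xmlEncoding is not checked. The function name and arguments are hypothetical.

```python
import re

_ENC_NAME = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")
# One letter per child: m = Comment or ProcessingInstruction,
# d = DocumentType, e = Element, x = any other node type.
_CHILDREN = re.compile(r"m*d?m*em*")
_CODES = {"Comment": "m", "ProcessingInstruction": "m",
          "DocumentType": "d", "Element": "e"}


def check_document(xml_version, xml_encoding, child_types):
    """Return errors for a Document node, given its child node types in order."""
    errors = []
    if xml_version not in ("1.0", "1.1"):
        errors.append("unknown-error")
    # Assumption: a null xmlEncoding is not checked.
    if xml_encoding is not None and not _ENC_NAME.fullmatch(xml_encoding):
        errors.append("xml-well-formedness-error")
    seq = "".join(_CODES.get(t, "x") for t in child_types)
    if not _CHILDREN.fullmatch(seq):
        errors.append("xml-well-formedness-error")
    return errors
```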

If n is a DocumentFragment node

For each node n_c in the childNodes list of n,
1. If n_c is not an Element, Text, CDATASection, Comment, ProcessingInstruction, or EntityReference node, then it is an xml-well-formedness-error.
2. Otherwise, if n_c is an EntityReference node, then it is an entity-error.
3. Otherwise, check n_c recursively.

If n is a DocumentType node

Validate the nodeName attribute value as an NCName.
Run the following substeps:
1. If the ownerDocument attribute of n is null, then abort these substeps.
2. If the documentElement attribute of the ownerDocument of n is null, then abort these substeps.
3. If the nodeName attribute of the documentElement of the ownerDocument of n is different from the nodeName attribute of n, then raise an xml-validity-error.
Validate the publicId attribute value as a public identifier.
Validate the systemId attribute value as a system identifier.
If the publicId attribute value of n is not null and the systemId attribute value of n is null, then raise an xml-well-formedness-error. @@ publicId == null? Or, publicId == ""
For each node n_c in the childNodes list of n,
1. If n_c is not a ProcessingInstruction node, then it is an xml-well-formedness-error. @@ ref to manakai's extensions
2. Otherwise, check n_c recursively.
For each node in the entities, notations, and elementTypes lists of n, check the node recursively.
@@ externally declared?
If the NamedNodeMap object in the entities attribute of n lacks an Entity node whose nodeName attribute is amp, lt, gt, apos, or quot, for one or more of these names, then raise xml-misc-recommendation(s).
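The root element name check and the predefined entity check above can be sketched in Python over flattened inputs. The sketch assumes one xml-misc-recommendation per missing predefined entity, which is one reading of "recommendation(s)". All names are hypothetical.

```python
PREDEFINED_NAMES = ("amp", "lt", "gt", "quot", "apos")


def check_doctype(doctype_name, document_element_name, entity_names):
    """Return errors for a DocumentType node.

    document_element_name is None when there is no ownerDocument or no
    documentElement; entity_names lists the nodeName of each Entity node.
    """
    errors = []
    if document_element_name is not None and document_element_name != doctype_name:
        errors.append("xml-validity-error")
    declared = set(entity_names)
    # Assumption: one recommendation per missing predefined entity.
    errors.extend("xml-misc-recommendation"
                  for name in PREDEFINED_NAMES if name not in declared)
    return errors
```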

If n is an Element node

Validate the localName attribute value as an NCName.
If the prefix attribute value is different from null, then validate the prefix attribute value as an NCName.
For each node n_c in the childNodes list of n,
1. If n_c is not an Element, Text, CDATASection, Comment, ProcessingInstruction, or EntityReference node, then it is an xml-well-formedness-error.
2. Otherwise, if n_c is an EntityReference node, then it is an entity-error.
3. Otherwise, check n_c recursively.
@@ #elementvalid Validity constraint: Element Valid
Let attrs be the value of the attributes attribute of n. Check the conformance of attrs as follows:
1. If attrs contains an Attr node whose nodeName attribute value is equal to that of another Attr node in attrs, then raise an xml-well-formedness-error.
2. @@ #RequiredAttr Validity constraint: Required Attribute
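The duplicate-attribute step above amounts to counting nodeName values. A Python sketch follows. The assumption that one error is raised per duplicated name, rather than per extra occurrence, is an interpretation.

```python
from collections import Counter


def check_duplicate_attributes(attr_names):
    """Return one xml-well-formedness-error per nodeName occurring more than once."""
    counts = Counter(attr_names)
    return ["xml-well-formedness-error" for _name, n in counts.items() if n > 1]
```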

If n is an ElementTypeDefinition node

If the childNodes list of n contains any nodes, then raise an xml-well-formedness-error.
@@ At user option, an XML processor MAY issue a warning when a declaration mentions an element type for which no declaration is provided, but this is not an error.
@@ For compatibility, it is an error if the content model allows an element to match more than one occurrence of an element type in the content model.
@@ #vc-MixedChildrenUnique Validity constraint: No Duplicate Types
@@ At user option, an XML processor MAY issue a warning if attributes are declared for an element type not itself declared, but this is not an error.
If there is more than one AttributeDefinition node with attribute type ID in the NamedNodeMap list contained in the attributeDefinitions attribute of n, then raise an xml-validity-error.
If there is more than one AttributeDefinition node with attribute type NOTATION in the NamedNodeMap list contained in the attributeDefinitions attribute of n, then raise an xml-validity-error.
"For interoperability, the same Nmtoken SHOULD NOT occur more than once in the enumerated attribute types of a single element type."

If n is an Entity node whose notationName attribute value is null (i.e. a parsed entity)

Raise an entity-error.
Validate the nodeName attribute value as an NCName.
Validate the publicId attribute value as a public identifier.
Validate the systemId attribute value as a system identifier.
If the publicId attribute value of n is not null and the systemId attribute value of n is null, then raise an xml-well-formedness-error.
For each node n_c in the childNodes list of n,
1. If n_c is not an Element, Text, CDATASection, Comment, ProcessingInstruction, or EntityReference node, then it is an xml-well-formedness-error.
2. Otherwise, if n_c is an EntityReference node, then it is an entity-error.
3. Otherwise, check n_c recursively.

If n is an Entity node whose notationName attribute value is not null (i.e. an unparsed entity)

Validate the nodeName attribute value as an NCName.
Validate the publicId attribute value as a public identifier.
Validate the systemId attribute value as a system identifier.
If the systemId attribute value of n is null, then raise an xml-well-formedness-error.
Validate the notationName attribute value of n as an NCName.
@@ #not-declared Validity constraint: Notation Declared
If the childNodes list of n contains any nodes, then raise an xml-well-formedness-error.

If n is an EntityReference node

Raise an entity-error.
Validate the nodeName attribute value as an NCName.
For each node n_c in the childNodes list of n,
1. If n_c is not an Element, Text, CDATASection, Comment, ProcessingInstruction, or EntityReference node, then it is an xml-well-formedness-error.
2. Otherwise, if n_c is an EntityReference node, then it is an entity-error.
3. Otherwise, check n_c recursively.

If n is a Notation node

Validate the nodeName attribute value as an NCName.
Validate the publicId attribute value as a public identifier.
Validate the systemId attribute value as a system identifier.
If the childNodes list of n contains any nodes, then raise an xml-well-formedness-error.

If n is a ProcessingInstruction node

If the target attribute value matches the string xml in any case combination, then raise an xml-well-formedness-error.
Otherwise, validate the target attribute value as an NCName.
Then, validate the data attribute value as XML character data.
If the data attribute value contains the string ?>, then raise an xml-well-formedness-error.
If the data attribute value starts with either U+0009 CHARACTER TABULATION, U+000A LINE FEED, U+000D CARRIAGE RETURN, or U+0020 SPACE character, then raise a round-trip-error.
If the childNodes list of n contains any nodes, then raise an xml-well-formedness-error.
@@ Warn if not declared
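The ProcessingInstruction checks above can be sketched in Python as follows. The NCName and character data steps are omitted, and the function name is hypothetical. Leading white space in data cannot round-trip because the white space that separates the target from the data is consumed by the PI production.

```python
def check_processing_instruction(target, data, has_children=False):
    """Return errors for a ProcessingInstruction node (NCName and character data steps omitted)."""
    errors = []
    # The target "xml" is reserved in any case combination.
    if target.lower() == "xml":
        errors.append("xml-well-formedness-error")
    if "?>" in data:
        errors.append("xml-well-formedness-error")
    # Leading white space would merge with the separator after the target.
    if data[:1] in ("\t", "\n", "\r", " "):
        errors.append("round-trip-error")
    if has_children:
        errors.append("xml-well-formedness-error")
    return errors
```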

If n is a Text node

Validate the data attribute value as XML character data.
If the childNodes list of n contains any nodes, then raise an xml-well-formedness-error.

Otherwise

xml-well-formedness-error? unknown-error?

Character Classes

This section defines several character classes, which are referred to by the algorithms specified above.

Character class Char10 contains the following characters:

U+0009 CHARACTER TABULATION
U+000A LINE FEED
U+000D CARRIAGE RETURN
U+0020 SPACE .. U+D7FF
U+E000 .. U+FFFD REPLACEMENT CHARACTER
U+10000 .. U+10FFFF

This character class contains all characters allowed in the production rule Char of XML 1.0 [XML10].

Character class CompatChar10 contains the following characters:

@@ Document authors are encouraged to avoid "compatibility characters", as defined in section 6.8 of [Unicode @@ Unicode 2.0 @@] (see also D21 in section 3.6 of [Unicode3]).

Character class ControlChar10 contains the following characters:

U+007F DELETE .. U+0084 INDEX
U+0086 START OF SELECTED AREA .. U+009F APPLICATION PROGRAM COMMAND
U+FDD0 .. U+FDEF
U+1FFFE .. U+1FFFF
U+2FFFE .. U+2FFFF
U+3FFFE .. U+3FFFF
U+4FFFE .. U+4FFFF
U+5FFFE .. U+5FFFF
U+6FFFE .. U+6FFFF
U+7FFFE .. U+7FFFF
U+8FFFE .. U+8FFFF
U+9FFFE .. U+9FFFF
U+AFFFE .. U+AFFFF
U+BFFFE .. U+BFFFF
U+CFFFE .. U+CFFFF
U+DFFFE .. U+DFFFF
U+EFFFE .. U+EFFFF
U+FFFFE .. U+FFFFF
U+10FFFE .. U+10FFFF

This character class contains the characters listed in the Note in Section 2.2 of XML 1.0 [XML10], as amended by errata.

The character class NameStartChar10 contains the following characters:

This character class contains all characters allowed as the first character of a string matching to the production rule Name of XML 1.0 [XML10].

The character class NameChar10 contains the following characters:

The characters in the character class NameStartChar10.

This character class contains all characters allowed as the second or any subsequent character of a string matching the production rule Name of XML 1.0 [XML10].

The character class PubidChar contains the following characters:

U+000A LINE FEED
U+000D CARRIAGE RETURN
U+0020 SPACE
U+0021 EXCLAMATION MARK (!)
U+0023 NUMBER SIGN (#)
U+0024 DOLLAR SIGN ($)
U+0025 PERCENT SIGN (%)
U+0027 APOSTROPHE (')
U+0028 LEFT PARENTHESIS (()
U+0029 RIGHT PARENTHESIS ())
U+002A ASTERISK (*)
U+002B PLUS SIGN (+)
U+002C COMMA (,)
U+002D HYPHEN-MINUS (-)
U+002E FULL STOP (.)
U+002F SOLIDUS (/)
U+0030 DIGIT ZERO (0) .. U+0039 DIGIT NINE (9)
U+003A COLON (:)
U+003B SEMICOLON (;)
U+003D EQUAL SIGN (=)
U+003F QUESTION MARK (?)
U+0040 COMMERCIAL AT (@)
U+0041 LATIN CAPITAL LETTER A (A) .. U+005A LATIN CAPITAL LETTER Z (Z)
U+005F LOW LINE (_)
U+0061 LATIN SMALL LETTER A (a) .. U+007A LATIN SMALL LETTER Z (z)

This character class contains all characters allowed in the production rule PubidChar of XML 1.0 [XML10].

References

Normative References

DOM3CORE: @@ W3C DOM Level 3 Core
DOMDTDEF: @@ manakai's extension to DOM for document type definitions
KEYWORDS: Key words for use in RFCs to Indicate Requirement Levels, IETF BCP 14, RFC 2119, March 1997. This version of the specification is referenced.
INFOSET: @@
XML10: Extensible Markup Language (XML) 1.0 (Fourth Edition), W3C Recommendation, 16 August 2006, edited in place 29 September 2006. The latest version of the specification is available at <http://www.w3.org/TR/xml>. This version of the specification is referenced.
XML11: @@
