manakai's XML Conformance Checking

Abstract

...

Status of This Document

This section describes the status of this document at the time of its publication. Other documents might supersede this document.

This document is a working draft, produced as part of the Whatpm subproject of the manakai project. It might be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

Comments on this document are welcome and may be sent to the author.

Translations of this document might be available. The English version of the document is the only normative version.

Introduction

This section is non‐normative.

...

Terminology

The key words MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY in this document are to be interpreted as described in RFC 2119 [KEYWORDS].

All examples and notes in this specification are non‐normative, as are all sections explicitly marked non‐normative. Everything else in this specification is normative.

The algorithms in this document are normative, but implementations may use any other algorithm that produces equivalent results. In addition, the order in which errors are raised is undefined.

Checking DOM

The following algorithms and definitions are applied to XML documents; in particular, they are not applied to HTML documents.

Error Classification

If a Document node has no xml-well-formedness-error, entity-error, or unknown-error, then it is well-formed. If a well-formed Document node has no xml-validity-error, it is valid.

A well‐formed Document can be safely serialized into a well‐formed XML document. A valid Document can be easily serialized into a valid XML document.
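The two definitions above can be sketched as a function over the set of error categories raised for a Document node. This is only an illustration: the function name, the constant, and the returned strings are assumptions and are not defined by this document.

```python
# Hypothetical helper (not part of this specification): derive the status
# of a Document node from the error categories raised while checking it.
NOT_WELL_FORMED = {"xml-well-formedness-error", "entity-error", "unknown-error"}


def document_status(raised_categories):
    """Return 'valid', 'well-formed', or 'not well-formed'."""
    raised = set(raised_categories)
    if raised & NOT_WELL_FORMED:
        return "not well-formed"
    if "xml-validity-error" in raised:
        # Well-formed, but at least one validity constraint is violated.
        return "well-formed"
    return "valid"
```

Under this reading, categories such as round-trip-error do not affect either status.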

Errors are classified into these error categories:

entity-error: @@

This algorithm does not support DOM trees that contain one or more EntityReference nodes. It is expected that entity references are expanded at parse time, and that any entity reference that cannot be expanded raises a parse-time error, so that parsing never results in a DOM tree with EntityReference nodes.
round-trip-error: @@
unknown-error?: @@
xml-misc-error: An XML error (XML 1.0 [XML10] error / XML 1.1 [XML11] error) that is not classified into any other error category.
xml-misc-fatal-error: An XML fatal error (XML 1.0 [XML10] fatal error / XML 1.1 [XML11] fatal error) that is not classified into any other error category. @@ What errors fall into this category?
xml-validity-error: A violation of an XML validity constraint.
xml-well-formedness-error: If an xml-well-formedness-error is raised, it is not possible to generate an XML serialization that matches the appropriate production rule and that does not violate any well‐formedness constraint in the XML specification [XML10, XML11].

@@ TODO: #dt-atuseroption at user option (MAY or MUST), #dt-compat for compatibility, #dt-interop for interoperability

Definitions

The XML version of a node is the XML version of the document to which the node belongs. For a Document node, the XML version of the document is the value of the xmlVersion attribute of the node. For a DocumentType node whose ownerDocument attribute is set to null, the XML version of the document is 1.0. For any other node, the XML version of the document is that of the Document node contained in the ownerDocument attribute of the node.
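The definition above can be sketched as follows. The Node class is a minimal stand-in for a DOM node invented for this example; only the numeric node type constants (9 for Document, 10 for DocumentType) come from DOM.

```python
from dataclasses import dataclass
from typing import Optional

DOCUMENT_NODE = 9        # DOM nodeType of a Document node
DOCUMENT_TYPE_NODE = 10  # DOM nodeType of a DocumentType node


@dataclass
class Node:
    """Illustrative stand-in for a DOM node (an assumption of this sketch)."""
    node_type: int
    owner_document: Optional["Node"] = None
    xml_version: Optional[str] = None  # meaningful on Document nodes only


def xml_version_of(node: Node) -> str:
    if node.node_type == DOCUMENT_NODE:
        return node.xml_version
    if node.node_type == DOCUMENT_TYPE_NODE and node.owner_document is None:
        return "1.0"
    # Any other node takes the version of its owner Document.
    return node.owner_document.xml_version
```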

Checking Components

The algorithm to validate an XML character data (s) is defined as follows:

If s contains a character that is not in the character class Char10, then raise an xml-well-formedness-error.
If s contains a character that is in the character class CompatChar10, then raise an xml-misc-warning.
If s contains a character that is in the character class ControlChar10, then raise an xml-misc-warning.
@@ XML 1.1 support
@@ If U+000D, round-trip-error
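The defined steps above might be implemented along the following lines. This is a sketch under assumptions: the function names are hypothetical, errors are represented as a list of category names, and the CompatChar10 step is omitted because this document does not yet enumerate that class. The ranges are those given for Char10 and ControlChar10 in the Character Classes section.

```python
import re

# Matches any character that is NOT in character class Char10.
NOT_CHAR10 = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def is_control_char10(ch):
    """True if ch is in character class ControlChar10."""
    cp = ord(ch)
    return (
        0x7F <= cp <= 0x84
        or 0x86 <= cp <= 0x9F
        or 0xFDD0 <= cp <= 0xFDEF
        # Last two code points of each of planes 1 to 16.
        or (cp >= 0x1FFFE and (cp & 0xFFFF) >= 0xFFFE)
    )


def validate_char_data(s):
    """Return the error categories raised for the string s."""
    errors = []
    if NOT_CHAR10.search(s):
        errors.append("xml-well-formedness-error")
    if any(is_control_char10(ch) for ch in s):
        errors.append("xml-misc-warning")
    return errors
```

The order of the returned categories carries no meaning, since the order in which errors are raised is undefined.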

The algorithm to validate a Name (name) is defined as follows:

@@

The algorithm to validate an NCName (name) is defined as follows:

@@

To validate a public identifier (pid), the algorithm below MUST be used:

If pid contains any character that is outside of the range of #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%], then it is an xml-well-formedness-error.
If pid contains a U+0009 CHARACTER TABULATION, U+000A LINE FEED, or U+000D CARRIAGE RETURN character, if the first character of pid is a U+0020 SPACE character, if the last character of pid is a U+0020 SPACE character, or if a U+0020 SPACE character is immediately followed by another U+0020 SPACE character in pid, then it is a round-trip-error.
@@ Should we check formal-public-identifierness?
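As an illustration, the two defined steps for public identifiers could look like the sketch below. The function name and the list-of-categories return value are assumptions of this example; the handling of a null publicId is not covered.

```python
import re

# Matches any character outside the set allowed in a public identifier.
NOT_PUBID_CHAR = re.compile(r"[^\x20\x0D\x0Aa-zA-Z0-9\-'()+,./:=?;!*#@$_%]")


def validate_public_id(pid):
    """Return the error categories raised for the public identifier pid."""
    errors = []
    if NOT_PUBID_CHAR.search(pid):
        errors.append("xml-well-formedness-error")
    if (
        re.search(r"[\x09\x0A\x0D]", pid)
        or pid.startswith(" ")
        or pid.endswith(" ")
        or "  " in pid  # two consecutive U+0020 SPACE characters
    ):
        errors.append("round-trip-error")
    return errors
```

For example, "-//W3C//DTD XHTML 1.0 Strict//EN" raises nothing, while an identifier containing a tab raises both categories, because U+0009 is also outside the allowed character set.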

Checking `Node`

The algorithm to check a node (n) is defined as follows:

If n is an Attr node

Validate the localName attribute value as an NCName.
If the prefix attribute value is different from null, then validate the prefix attribute value as an NCName.
For each node n_c in the childNodes list of n,
1. If n_c is not a Text or EntityReference node, then it is an xml-well-formedness-error.
2. Otherwise, if n_c is an EntityReference node, then it is an entity-error.
3. Otherwise, check n_c recursively.
@@ specified, manakaiAttributeType

If n is an AttributeDefinition node

For each node n_c in the childNodes list of n,
1. If n_c is not a Text or EntityReference node, then it is an xml-well-formedness-error.
2. Otherwise, if n_c is an EntityReference node, then it is an entity-error.
3. Otherwise, check n_c recursively.

If n is a CDATASection node

Validate the data attribute value as an XML character data.
If the data attribute value contains the string ]]>, then it is an xml-well-formedness-error.
If the childNodes list of n contains any nodes, it is an xml-well-formedness-error.

If n is a Comment node

Validate the data attribute value as an XML character data.
If the data attribute value contains the string --, or if it ends with a - character, then it is an xml-well-formedness-error.
If the childNodes list of n contains any nodes, it is an xml-well-formedness-error.
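The delimiter checks on the data of CDATASection and Comment nodes above are simple string tests. A minimal sketch, with hypothetical function names, and leaving the character data validation and the childNodes check aside:

```python
def cdata_section_data_ok(data):
    """False if the data would terminate the CDATA section early."""
    return "]]>" not in data


def comment_data_ok(data):
    """False if the data contains '--' or ends with '-'."""
    return "--" not in data and not data.endswith("-")
```

A False result corresponds to an xml-well-formedness-error in the steps above.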

If n is a Document node

If the XML version of n is neither 1.0 nor 1.1, then it is an unknown-error?.
If the xmlEncoding attribute value does not match [A-Za-z] ([A-Za-z0-9._] | '-')* @@ formal def, then it is an xml-well-formedness-error.
The childNodes list of n has to consist of zero or more Comment and/or ProcessingInstruction nodes, followed by an optional DocumentType node, followed by zero or more Comment and/or ProcessingInstruction nodes, followed by an Element node, followed by zero or more Comment and/or ProcessingInstruction nodes. Any violation of this is an xml-well-formedness-error.
For each node n_c in the childNodes list of n,
1. If n_c is not an EntityReference node, then check n_c recursively.
@@ allDeclarationsProcessed
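The xmlEncoding pattern and the child ordering requirement for Document nodes can both be expressed as regular expressions. The sketch below is one possible approach, not a required one: it maps each child to a one-letter code and matches the resulting string. The codes, the function names, and the use of node type names as strings are assumptions of this example, and a null xmlEncoding is not handled.

```python
import re

# One letter per child node type; any other type maps to "?".
CODES = {
    "Comment": "C",
    "ProcessingInstruction": "P",
    "DocumentType": "D",
    "Element": "E",
}
# (Comment | PI)* DocumentType? (Comment | PI)* Element (Comment | PI)*
DOCUMENT_CHILDREN = re.compile(r"[CP]*D?[CP]*E[CP]*")
# [A-Za-z] ([A-Za-z0-9._] | '-')*
ENC_NAME = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")


def document_children_ok(child_type_names):
    seq = "".join(CODES.get(name, "?") for name in child_type_names)
    return DOCUMENT_CHILDREN.fullmatch(seq) is not None


def xml_encoding_ok(encoding):
    return ENC_NAME.fullmatch(encoding) is not None
```

A False result from either function corresponds to an xml-well-formedness-error.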

If n is a DocumentFragment node

For each node n_c in the childNodes list of n,
1. If n_c is not an Element, Text, CDATASection, Comment, ProcessingInstruction, or EntityReference node, then it is an xml-well-formedness-error.
2. Otherwise, if n_c is an EntityReference node, then it is an entity-error.
3. Otherwise, check n_c recursively.

If n is a DocumentType node

Validate the nodeName attribute value as an NCName.
Validate the publicId attribute value as a public identifier.
If the systemId attribute value contains both " and ' characters, it is an xml-well-formedness-error.
For each node n_c in the childNodes list of n,
1. If n_c is not a ProcessingInstruction node, then it is an xml-well-formedness-error. @@ ref to manakai's extensions
2. Otherwise, check n_c recursively.
@@ entities, notations, elementTypes, externally declared?
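The systemId step reflects the fact that an XML system literal is delimited by either " or ' and cannot contain its own delimiter, so a value containing both characters has no serialization. A small sketch with hypothetical function names:

```python
def system_id_errors(system_id):
    """Return the error categories raised for a system identifier."""
    if '"' in system_id and "'" in system_id:
        return ["xml-well-formedness-error"]
    return []


def quote_system_id(system_id):
    """Serialize with a delimiter that does not occur in the value."""
    if system_id_errors(system_id):
        raise ValueError("system identifier cannot be serialized")
    quote = "'" if '"' in system_id else '"'
    return quote + system_id + quote
```

The second function is not required by this document; it only shows why the check matters to a serializer.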

If n is an Element node

Validate the localName attribute value as an NCName.
If the prefix attribute value is different from null, then validate the prefix attribute value as an NCName.
For each node n_c in the childNodes list of n,
1. If n_c is not an Element, Text, CDATASection, Comment, ProcessingInstruction, or EntityReference node, then it is an xml-well-formedness-error.
2. Otherwise, if n_c is an EntityReference node, then it is an entity-error.
3. Otherwise, check n_c recursively.
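The child loop for Element nodes, which also appears in the same form for several other node types, can be sketched as below. The representation of children as (type name, node) pairs and the check_node callback are assumptions made to keep the example self-contained.

```python
ALLOWED_ELEMENT_CHILDREN = {
    "Element", "Text", "CDATASection", "Comment",
    "ProcessingInstruction", "EntityReference",
}


def check_element_children(children, check_node):
    """children: iterable of (type_name, node) pairs.
    check_node: callable that checks a node recursively and returns
    a list of error categories."""
    errors = []
    for type_name, child in children:
        if type_name not in ALLOWED_ELEMENT_CHILDREN:
            errors.append("xml-well-formedness-error")
        elif type_name == "EntityReference":
            # Not checked further; see the entity-error category.
            errors.append("entity-error")
        else:
            errors.extend(check_node(child))
    return errors
```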

If n is an ElementTypeDefinition node

If the childNodes list of n contains any nodes, it is an xml-well-formedness-error.

If n is an Entity node

An entity-error @@ if !notationName.
Validate the nodeName attribute value as an NCName.
Validate the publicId attribute value as a public identifier.
If the systemId attribute value contains both " and ' characters, it is an xml-well-formedness-error.
@@ notationName
For each node n_c in the childNodes list of n,
1. If n_c is not an Element, Text, CDATASection, Comment, ProcessingInstruction, or EntityReference node, then it is an xml-well-formedness-error.
2. Otherwise, if n_c is an EntityReference node, then it is an entity-error.
3. Otherwise, check n_c recursively.

If n is an EntityReference node

An entity-error.
Validate the nodeName attribute value as an NCName.
For each node n_c in the childNodes list of n,
1. If n_c is not an Element, Text, CDATASection, Comment, ProcessingInstruction, or EntityReference node, then it is an xml-well-formedness-error.
2. Otherwise, if n_c is an EntityReference node, then it is an entity-error.
3. Otherwise, check n_c recursively.

If n is a Notation node

Validate the nodeName attribute value as an NCName.
Validate the publicId attribute value as a public identifier.
If the systemId attribute value contains both " and ' characters, it is an xml-well-formedness-error.
If the childNodes list of n contains any nodes, it is an xml-well-formedness-error.

If n is a ProcessingInstruction node

Validate the target attribute value as an NCName.
If the target attribute value matches XML case-insensitively, then it is an xml-well-formedness-error.
Validate the data attribute value as an XML character data.
If the data attribute value contains the string ?>, or starts with a U+0009, U+000A, U+000D, or U+0020 character, then it is a round-trip-error.
If the childNodes list of n contains any nodes, it is an xml-well-formedness-error.
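The target and data steps specific to ProcessingInstruction nodes can be sketched as follows. The function name is hypothetical, and the NCName validation, the character data validation, and the childNodes check are left out.

```python
def check_processing_instruction(target, data):
    """Return the error categories raised by the target and data steps."""
    errors = []
    if target.lower() == "xml":
        # The target "xml", in any letter case, is reserved.
        errors.append("xml-well-formedness-error")
    if "?>" in data or data[:1] in ("\t", "\n", "\r", " "):
        errors.append("round-trip-error")
    return errors
```

Using data[:1] rather than data[0] keeps the function safe for empty data.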

If n is a Text node

Validate the data attribute value as an XML character data.
If the childNodes list of n contains any nodes, it is an xml-well-formedness-error.

Otherwise

xml-well-formedness-error? unknown-error?

Character Classes

This section defines several character classes. These classes are referred to by the algorithms specified above.

Character class Char10 contains the following characters:

U+0009 CHARACTER TABULATION
U+000A LINE FEED
U+000D CARRIAGE RETURN
U+0020 SPACE .. U+D7FF
U+E000 .. U+FFFD REPLACEMENT CHARACTER
U+10000 .. U+10FFFF

This character class contains all characters allowed in the production rule Char of XML 1.0 [XML10].

Character class CompatChar10 contains the following characters:

@@ Document authors are encouraged to avoid "compatibility characters", as defined in section 6.8 of [Unicode @@ Unicode 2.0 @@] (see also D21 in section 3.6 of [Unicode3]).

Character class ControlChar10 contains the following characters:

U+007F DELETE .. U+0084 INDEX
U+0086 START OF SELECTED AREA .. U+009F APPLICATION PROGRAM COMMAND
U+FDD0 .. U+FDEF
U+1FFFE .. U+1FFFF
U+2FFFE .. U+2FFFF
U+3FFFE .. U+3FFFF
U+4FFFE .. U+4FFFF
U+5FFFE .. U+5FFFF
U+6FFFE .. U+6FFFF
U+7FFFE .. U+7FFFF
U+8FFFE .. U+8FFFF
U+9FFFE .. U+9FFFF
U+AFFFE .. U+AFFFF
U+BFFFE .. U+BFFFF
U+CFFFE .. U+CFFFF
U+DFFFE .. U+DFFFF
U+EFFFE .. U+EFFFF
U+FFFFE .. U+FFFFF
U+10FFFE .. U+10FFFF

This character class contains the characters listed in the Note in Section 2.2 of XML 1.0 [XML10], as amended by errata.

References

Normative References

DOM3CORE: @@ W3C DOM Level 3 Core
DOMDTDEF: @@ manakai's extension to DOM for document type definitions
KEYWORDS: Key words for use in RFCs to Indicate Requirement Levels, IETF BCP 14, RFC 2119, March 1997. This version of the specification is referenced.
INFOSET: @@
XML10: Extensible Markup Language (XML) 1.0 (Fourth Edition), W3C Recommendation, 16 August 2006, edited in place 29 September 2006. The latest version of the specification is available at <http://www.w3.org/TR/xml>. This version of the specification is referenced.
XML11: @@
