manakai's XML Conformance Checking

Working Draft

This Version
<http://suika.fam.cx/www/markup/xml/xmlcc/xmlcc-work>
Latest Version
<http://suika.fam.cx/www/markup/xml/xmlcc/xmlcc>
Latest Working Draft
<http://suika.fam.cx/www/markup/xml/xmlcc/xmlcc-work>
Version History
<http://suika.fam.cx/gate/cvs/markup/xml/xmlcc/xmlcc-work.en.html>
Author
<>

Abstract

...

Status of This Document

This section describes the status of this document at the time of its publication. Other documents might supersede this document.

This document is a working draft, produced as part of the Whatpm subproject of the manakai project. It might be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

Comments on this document are welcome and may be sent to the author.

Translations of this document might be available. The English version of this document is the only normative version.

Introduction

This section is non‐normative.

...

Terminology

The key words MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY in this document are to be interpreted as described in RFC 2119 [KEYWORDS].

All examples and notes in this specification are non‐normative, as are all sections explicitly marked non‐normative. Everything else in this specification is normative.

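The algorithms in this specification are normative, but the way they are described is not: an implementation may use any steps that produce the same results. In addition, the order in which errors are raised is undefined.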

Error Classification

If no xml-well-formedness-error, entity-error, or unknown-error is raised for a Document node, then the Document node is well-formed. If no xml-validity-error is raised for a well-formed Document node, then it is valid.

A well‐formed Document can be safely serialized into a well‐formed XML document. A valid Document can be easily serialized into a valid XML document.
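The well‐formedness and validity definitions above depend only on which error categories were raised. A minimal Python sketch, assuming the checker reports each raised error by its category name; the function name classify is hypothetical, and unknown-error is written without the "?" that marks it as tentative in this draft.

```python
# Categories whose presence makes a Document node not well-formed.
WELL_FORMEDNESS_CATEGORIES = frozenset(
    {"xml-well-formedness-error", "entity-error", "unknown-error"}
)
# Categories whose presence makes a well-formed Document node invalid.
VALIDITY_CATEGORIES = frozenset({"xml-validity-error"})


def classify(raised_categories):
    """Return (well_formed, valid) for the categories raised on a Document."""
    raised = set(raised_categories)
    well_formed = not (raised & WELL_FORMEDNESS_CATEGORIES)
    # Validity is only defined for well-formed documents.
    valid = well_formed and not (raised & VALIDITY_CATEGORIES)
    return well_formed, valid


print(classify(["round-trip-warning"]))   # (True, True)
print(classify(["xml-validity-error"]))   # (True, False)
print(classify(["entity-error"]))         # (False, False)
```

Warnings such as round-trip-warning affect neither property.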

To be a conforming validating XML processor, ...

Errors are classified into these error categories:

entity-error

@@

This algorithm does not support DOM trees that contain EntityReference nodes. Entity references are expected to be expanded at parse time, and any entity reference that cannot be expanded is expected to cause a parse‐time error, so that the parser never produces a DOM tree containing EntityReference nodes.

round-trip-error
@@
round-trip-warning

A round-trip-warning is raised when the checker encounters a construct that might not be restored to the same construct after it is serialized and then re-parsed by a conforming processor.

For example, a round-trip-warning is raised for every Comment node, since XML processors are not required to make the text of comments available to applications.

unknown-error?
@@
xml-misc-error
An XML error (XML 1.0 [XML10] error / XML 1.1 [XML11] error) that is not classified into any other error category.
xml-misc-fatal-error
An XML fatal error (XML 1.0 [XML10] fatal error / XML 1.1 [XML11] fatal error) that is not classified into any other error category. @@ What errors fall into this category?
xml-misc-recommendation

An xml-misc-recommendation is raised if a SHOULD‐level requirement of the XML specification [XML10, XML11] is not met.

xml-validity-error
A violation of a validity constraint of the XML specification [XML10, XML11].
xml-well-formedness-error

If an xml-well-formedness-error is raised, it is not possible to generate an XML serialization that matches the appropriate production rule and that does not violate any well‐formedness constraint of the XML specification [XML10, XML11].

@@ TODO: #dt-atuseroption at user option (MAY or MUST), #dt-compat for compatibility, #dt-interop for interoperability

Parsing XML Document

Checking DOM

The following algorithms and definitions apply to XML documents; in particular, they do not apply to HTML documents.

Definitions

The XML version of a node is the XML version of the document to which the node belongs. For a Document node, the XML version of the document is the value of the xmlVersion attribute of the node. For a DocumentType node whose ownerDocument attribute is null, the XML version of the document is 1.0. For any other node, the XML version of the document is that of the Document node in the ownerDocument attribute of the node.
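The definition of the XML version of a node can be sketched in Python as follows. The Node class and its fields are a hypothetical stand‐in for the DOM attributes named above (xmlVersion, ownerDocument), not manakai's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical minimal DOM stand-in: only what the definition needs.
    node_type: str                           # "Document", "DocumentType", "Element", ...
    xml_version: Optional[str] = None        # xmlVersion; meaningful on Document nodes only
    owner_document: Optional["Node"] = None  # ownerDocument


def xml_version_of(node: Node) -> Optional[str]:
    """Return the XML version of the document to which node belongs."""
    if node.node_type == "Document":
        return node.xml_version
    if node.node_type == "DocumentType" and node.owner_document is None:
        return "1.0"  # an orphan DocumentType defaults to XML 1.0
    return node.owner_document.xml_version
```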

Conformance Checking Algorithms for Components

To validate an XML string (s), the following algorithm MUST be used:

  1. If s contains a character that is not in the character class Char10, then raise an xml-well-formedness-error.
  2. If s contains a character that is in the character class CompatChar10, then raise an xml-misc-warning.
  3. If s contains a character that is in the character class ControlChar10, then raise an xml-misc-warning.
  4. @@ XML 1.1 support
  5. If s contains a U+000D CARRIAGE RETURN character, then raise a round-trip-error. @@ We should not raise duplicate errors for U+000D in attribute values. In addition, we should support a mode where U+000D is serialized as a character reference (so that no round-trip-error is raised).
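A minimal Python sketch of this algorithm, assuming errors are collected as category names in a list. CompatChar10 is not modeled because this draft does not yet enumerate it, and the ControlChar10 ranges are assumed to be those listed in the Note in Section 2.2 of XML 1.0. The function names are hypothetical. Each step raises at most one error, however many offending characters s contains.

```python
def in_char10(cp: int) -> bool:
    """Char of XML 1.0: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def in_control_char10(cp: int) -> bool:
    """Assumed ControlChar10 ranges (Note in XML 1.0 Section 2.2)."""
    if 0x7F <= cp <= 0x84 or 0x86 <= cp <= 0x9F or 0xFDD0 <= cp <= 0xFDEF:
        return True
    # [#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], ..., [#x10FFFE-#x10FFFF]:
    # the last two code points of every supplementary plane.
    return 0x1FFFE <= cp <= 0x10FFFF and (cp & 0xFFFF) >= 0xFFFE


def validate_xml_string(s: str) -> list:
    """Return the error categories raised for s, at most one per step."""
    errors = []
    if any(not in_char10(ord(c)) for c in s):
        errors.append("xml-well-formedness-error")  # step 1
    # Step 2 (CompatChar10) is not modeled: the class is not enumerated yet.
    if any(in_control_char10(ord(c)) for c in s):
        errors.append("xml-misc-warning")           # step 3
    # Step 4 (XML 1.1 support) is an open issue in this draft.
    if "\r" in s:
        errors.append("round-trip-error")           # step 5
    return errors
```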

To validate a Name (s), the following algorithm MUST be used:

  1. If s is an empty string, then raise an xml-well-formedness-error. Abort these steps.
  2. Validate s as an XML string.
  3. If the first character in s is a character that is not in the character class NameStartChar10, then raise an xml-well-formedness-error.
  4. If a character other than the first character in s is a character that is not in the character class NameChar10, then raise an xml-well-formedness-error.
  5. If s begins with the string xml (in any case combination), then raise an xml-misc-warning. @@ except for attribute names xml:lang, xml:space, xml:base, xml:id, xmlns, xmlns:*, pi name xml-stylesheet.
  6. @@ XML 1.1 support
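A Python sketch of the Name algorithm. Because NameStartChar10 and NameChar10 are not yet enumerated in this draft, the sketch substitutes the NameStartChar and NameChar productions of XML 1.0 (Fifth Edition), which differ from the Fourth Edition character tables; treat these ranges as an assumption. Step 2 is not repeated here, and the @@ exceptions for reserved names in step 5 are not applied. The function name validate_name is hypothetical.

```python
import re

# Stand-in for NameStartChar10: NameStartChar of XML 1.0 Fifth Edition (assumption).
_NAME_START = (
    ":A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    "\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
# Stand-in for NameChar10: NameStartChar plus "-", ".", digits, and combining ranges.
_NAME_CHAR = _NAME_START + "\\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

_NAME_START_RE = re.compile("[" + _NAME_START + "]")
_NAME_CHAR_RE = re.compile("[" + _NAME_CHAR + "]")


def validate_name(s: str) -> list:
    """Return the error categories raised for s by the Name algorithm."""
    errors = []
    if s == "":
        errors.append("xml-well-formedness-error")
        return errors                                   # step 1: abort
    # Step 2 (validate s as an XML string) is not repeated in this sketch.
    if not _NAME_START_RE.fullmatch(s[0]):
        errors.append("xml-well-formedness-error")      # step 3
    if any(not _NAME_CHAR_RE.fullmatch(c) for c in s[1:]):
        errors.append("xml-well-formedness-error")      # step 4
    if s[:3].lower() == "xml":
        errors.append("xml-misc-warning")               # step 5, without exceptions
    return errors
```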

To validate an NCName (s), the following algorithm MUST be used:

  1. Validate s as a Name.
  2. @@

To validate a public identifier (pid), the following algorithm MUST be used:

  1. If pid is null, abort these steps.
  2. If pid contains any character that does not match the production PubidChar (#x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]), then it is an xml-well-formedness-error.
  3. If pid contains any U+0009 CHARACTER TABULATION, U+000A LINE FEED, or U+000D CARRIAGE RETURN character, if the first or last character of pid is a U+0020 SPACE character, or if pid contains a U+0020 SPACE character immediately followed by another U+0020 SPACE character, then it is a round-trip-error. @@ Is this really a round-trip problem? The XML specification only defines how public identifiers are matched; it does not define a canonical form.
  4. @@ Should we check formal-public-identifierness?
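A Python sketch of steps 1 to 3 of the public identifier algorithm, with the PubidChar set written as a regular expression. The function name validate_public_id is hypothetical, null is represented by None, and the open question in step 4 is not modeled.

```python
import re

# PubidChar ::= #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]
_PUBID_CHARS = re.compile(r"[\x20\x0D\x0Aa-zA-Z0-9\-'()+,./:=?;!*#@$_%]*")


def validate_public_id(pid):
    """Return the error categories raised for a public identifier."""
    errors = []
    if pid is None:
        return errors                                   # step 1
    if not _PUBID_CHARS.fullmatch(pid):
        errors.append("xml-well-formedness-error")      # step 2
    # Step 3: anything that whitespace normalization would change.
    if (any(c in pid for c in "\t\n\r")
            or pid.startswith(" ") or pid.endswith(" ")
            or "  " in pid):
        errors.append("round-trip-error")
    return errors
```

A TAB character triggers both errors, since it is neither a PubidChar nor stable under normalization.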

To validate a system identifier (sid), the following algorithm MUST be used:

  1. If sid is null, abort these steps.
  2. Validate sid as an XML string.
  3. If sid contains both U+0022 QUOTATION MARK (") and U+0027 APOSTROPHE (') characters, raise an xml-well-formedness-error.
  4. If sid contains at least one U+0023 NUMBER SIGN (#) character, then raise an xml-misc-error.
  5. @@ If sid cannot be converted to a URI reference, then raise a fact-level error (xml-misc-warning?).
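A Python sketch of steps 1, 3, and 4 of the system identifier algorithm; validate_system_id is a hypothetical name and null is represented by None. Step 3 exists because a SystemLiteral is delimited either by quotation marks or by apostrophes and cannot contain its own delimiter, so an identifier containing both characters cannot be serialized at all.

```python
def validate_system_id(sid):
    """Return the error categories raised for a system identifier."""
    errors = []
    if sid is None:
        return errors                                   # step 1
    # Step 2 (XML string validation) and step 5 (URI reference check,
    # still an open issue) are not modeled in this sketch.
    if '"' in sid and "'" in sid:
        errors.append("xml-well-formedness-error")      # step 3: no usable delimiter
    if "#" in sid:
        errors.append("xml-misc-error")                 # step 4: fragment identifier
    return errors
```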

Checking Node

The algorithm to check a node (n) is defined as follows:

If n is an Attr node
  1. Validate the localName attribute value as an NCName.
  2. If the prefix attribute value is different from null, then validate the prefix attribute value as an NCName.
  3. For each node nc in the childNodes list of n,
    1. If nc is not a Text or EntityReference node, then it is an xml-well-formedness-error.
    2. Otherwise, if nc is an EntityReference node, then it is an entity-error.
    3. Otherwise, check nc recursively.
  4. If the nodeName attribute of n is xml:space (@@ or {xml namespace}:space?) and the value attribute of n is neither default nor preserve, then raise an xml-misc-error.
  5. @@ If the xml:lang value is neither a language tag [RFC 3066 or its successor] nor an empty string, then raise an xml-misc-warning (a "fact"-level error; not an XML error).
  6. @@ specified, manakaiAttributeType (#ValueType Validity constraint: Attribute Value Type)
  7. Let v be the value of the value attribute of n.
  8. Validate n against its declared type as follows:
    ID_ATTR
    1. Validate v as a Name. If it fails, then raise an xml-validity-error.
    2. If v is already defined as an ID by another attribute in the document, then raise an xml-validity-error.
    IDREF_ATTR
    1. Validate v as a Name. If it fails, then raise an xml-validity-error.
    2. If no ID v is defined in the document, then raise an xml-validity-error.
    IDREFS_ATTR
    @@
    ENTITY_ATTR
    1. Validate v as a Name. If it fails, then raise an xml-validity-error.
    2. If no unparsed entity named v is declared, then raise an xml-validity-error.
    ENTITIES_ATTR
    @@
    NMTOKEN_ATTR
    1. Validate v as an Nmtoken. If it fails, then raise an xml-validity-error.
    NMTOKENS_ATTR
    @@
    NOTATION_ATTR
    If v is not one of the enumerated values, then raise an xml-validity-error.
    ENUMERATED_ATTR
    If v is not one of the enumerated values, then raise an xml-validity-error.
    @@
  9. If the declared type is ID and the default declaration is neither #IMPLIED nor #REQUIRED, then raise an xml-validity-error.
  10. @@ #FixedAttr Validity constraint: Fixed Attribute Default
  11. @@ strict serialization error for U+000D, U+000A, and U+0009 characters, leading/trailing U+0020, and U+0020{2,} string?
If n is an AttributeDefinition node
  1. If the nodeName attribute of n is xml:space (@@ or {xml namespace}:space?) and its declared type is other than (default|preserve), (default), or (preserve), then raise an xml-misc-error.
  2. For each node nc in the childNodes list of n,
    1. If nc is not a Text or EntityReference node, then it is an xml-well-formedness-error.
    2. Otherwise, if nc is an EntityReference node, then it is an entity-error.
    3. Otherwise, check nc recursively.
  3. If the declared type of n is NOTATION_ATTR, all of the enumerated notation names MUST be declared. If not, then raise an xml-validity-error.
  4. If the declared type of n is NOTATION_ATTR or ENUMERATED_ATTR, the enumerated values MUST all be distinct. If not, then raise an xml-validity-error.
  5. If the declared type of n is NOTATION_ATTR and n is declared for an element type declared EMPTY, then raise an xml-validity-error.
  6. @@ #defattrvalid Validity constraint: Attribute Default Value Syntactically Correct
If n is a CDATASection node
  1. Validate the data attribute value as XML character data.
  2. If the data attribute value contains the string ]]>, then raise an xml-well-formedness-error.
  3. If the childNodes list of n contains any nodes, then raise an xml-well-formedness-error.
If n is a Comment node
  1. Raise a round-trip-warning.
  2. Validate the data attribute value as XML character data.
  3. If the data attribute value contains the string --, or if it ends with the character -, then raise an xml-well-formedness-error.
  4. If the childNodes list of n contains any nodes, then raise an xml-well-formedness-error.
If n is a Document node
  1. If XML version of n is different from 1.0 or 1.1, then it is an unknown-error?.
  2. If the xmlEncoding attribute value does not match [A-Za-z] ([A-Za-z0-9._] | '-')* @@ formal def, then it is an xml-well-formedness-error.
  3. The childNodes list of n has to consist of zero or more Comment and/or ProcessingInstruction nodes, followed by an optional DocumentType node, followed by zero or more Comment and/or ProcessingInstruction nodes, followed by an Element node, followed by zero or more Comment and/or ProcessingInstruction nodes. Any violation of this is an xml-well-formedness-error.
  4. For each node nc in the childNodes list of n,
    1. If nc is not an EntityReference node, then check nc recursively.
  5. @@ allDeclarationsProcessed
If n is a DocumentFragment node
  1. For each node nc in the childNodes list of n,
    1. If nc is not an Element, Text, CDATASection, Comment, ProcessingInstruction, or EntityReference node, then it is an xml-well-formedness-error.
    2. Otherwise, if nc is an EntityReference node, then it is an entity-error.
    3. Otherwise, check nc recursively.
If n is a DocumentType node
  1. Validate the nodeName attribute value as an NCName.
  2. Run the following substeps:
    1. If the ownerDocument attribute of n is null, then abort these substeps.
    2. If the documentElement attribute of the Document node in the ownerDocument attribute of n is null, then abort these substeps.
    3. If the nodeName attribute of that document element is different from the nodeName attribute of n, then raise an xml-validity-error.
  3. Validate the publicId attribute value as a public identifier.
  4. Validate the systemId attribute value as a system identifier.
  5. If the publicId attribute value of n is not null and the systemId attribute value of n is null, then raise an xml-well-formedness-error. @@ publicId == null? Or, publicId == ""
  6. For each node nc in the childNodes list of n,
    1. If nc is not a ProcessingInstruction node, then it is an xml-well-formedness-error. @@ ref to manakai's extensions
    2. Otherwise, check nc recursively.
  7. For each node in the entities, notations, and elementTypes lists of n, check the node recursively.
  8. @@ externally declared?
If n is an Element node
  1. Validate the localName attribute value as an NCName.
  2. If the prefix attribute value is different from null, then validate the prefix attribute value as an NCName.
  3. For each node nc in the childNodes list of n,
    1. If nc is not an Element, Text, CDATASection, Comment, ProcessingInstruction, or EntityReference node, then it is an xml-well-formedness-error.
    2. Otherwise, if nc is an EntityReference node, then it is an entity-error.
    3. Otherwise, check nc recursively.
  4. @@ #elementvalid Validity constraint: Element Valid
  5. Let attrs be the value of the attributes attribute of n. Check the conformance of attrs as follows:
    1. If attrs contains an Attr node whose nodeName attribute value is equal to that of another Attr node in attrs, then raise an xml-well-formedness-error.
    2. @@ #RequiredAttr Validity constraint: Required Attribute
If n is an ElementTypeDefinition node
  1. If the childNodes list of n contains any nodes, then raise an xml-well-formedness-error.
  2. @@ At user option, an XML processor MAY issue a warning when a declaration mentions an element type for which no declaration is provided, but this is not an error.
  3. @@ For compatibility, it is an error if the content model allows an element to match more than one occurrence of an element type in the content model.
  4. @@ #vc-MixedChildrenUnique Validity constraint: No Duplicate Types
  5. @@ At user option, an XML processor MAY issue a warning if attributes are declared for an element type not itself declared, but this is not an error.
  6. If there is more than one AttributeDefinition node with attribute type ID in the NamedNodeMap list contained in the attributeDefinitions attribute of n, then raise an xml-validity-error.
  7. If there is more than one AttributeDefinition node with attribute type NOTATION in the NamedNodeMap list contained in the attributeDefinitions attribute of n, then raise an xml-validity-error.
  8. "For interoperability, the same Nmtoken SHOULD NOT occur more than once in the enumerated attribute types of a single element type."
If n is an Entity node whose notationName attribute value is null (i.e. a parsed entity)
  1. Raise an entity-error.
  2. Validate the nodeName attribute value as an NCName.
  3. Validate the publicId attribute value as a public identifier.
  4. Validate the systemId attribute value as a system identifier.
  5. If the publicId attribute value of n is not null and the systemId attribute value of n is null, then raise an xml-well-formedness-error.
  6. For each node nc in the childNodes list of n,
    1. If nc is not an Element, Text, CDATASection, Comment, ProcessingInstruction, or EntityReference node, then it is an xml-well-formedness-error.
    2. Otherwise, if nc is an EntityReference node, then it is an entity-error.
    3. Otherwise, check nc recursively.
If n is an Entity node whose notationName attribute value is not null (i.e. an unparsed entity)
  1. Validate the nodeName attribute value as an NCName.
  2. Validate the publicId attribute value as a public identifier.
  3. Validate the systemId attribute value as a system identifier.
  4. If the systemId attribute value of n is null, then raise an xml-well-formedness-error.
  5. Validate the notationName attribute value of n as an NCName.
  6. @@ #not-declared Validity constraint: Notation Declared
  7. If the childNodes list of n contains any nodes, then raise an xml-well-formedness-error.
If n is an EntityReference node
  1. Raise an entity-error.
  2. Validate the nodeName attribute value as an NCName.
  3. For each node nc in the childNodes list of n,
    1. If nc is not an Element, Text, CDATASection, Comment, ProcessingInstruction, or EntityReference node, then it is an xml-well-formedness-error.
    2. Otherwise, if nc is an EntityReference node, then it is an entity-error.
    3. Otherwise, check nc recursively.
If n is a Notation node
  1. Validate the nodeName attribute value as an NCName.
  2. Validate the publicId attribute value as a public identifier.
  3. Validate the systemId attribute value as a system identifier.
  4. If the childNodes list of n contains any nodes, then raise an xml-well-formedness-error.
If n is a ProcessingInstruction node
  1. If the target attribute value matches the string xml in any case combination, then raise an xml-well-formedness-error.
  2. Otherwise, validate the target attribute value as an NCName.
  3. Then, validate the data attribute value as XML character data.
  4. If the data attribute value contains the string ?>, then raise an xml-well-formedness-error.
  5. If the data attribute value starts with a U+0009 CHARACTER TABULATION, U+000A LINE FEED, U+000D CARRIAGE RETURN, or U+0020 SPACE character, then raise a round-trip-error.
  6. If the childNodes list of n contains any nodes, then raise an xml-well-formedness-error.
If n is a Text node
  1. Validate the data attribute value as XML character data.
  2. If the childNodes list of n contains any nodes, then raise an xml-well-formedness-error.
Otherwise
xml-well-formedness-error? unknown-error?
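The node dispatch above can be sketched in Python for a subset of node types. The Node class and its fields are a hypothetical stand‐in for the DOM; only the child‐type rules, the Document child ordering (encoded as a regular expression over one letter per child), and the data checks for Comment, CDATASection, and ProcessingInstruction nodes are modeled. Name and character‐data validation and the remaining node types are omitted.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical minimal DOM stand-in; field names are illustrative only.
    node_type: str                  # "Document", "DocumentType", "Element", "Text", ...
    data: str = ""                  # data attribute (Text, CDATASection, Comment, PI)
    target: str = ""                # target attribute (ProcessingInstruction)
    children: list = field(default_factory=list)  # childNodes


CONTENT_TYPES = {"Element", "Text", "CDATASection", "Comment",
                 "ProcessingInstruction", "EntityReference"}
# One letter per Document child: C = Comment or PI, D = DocumentType, E = Element.
_DOC_CHILD_LETTER = {"Comment": "C", "ProcessingInstruction": "C",
                     "DocumentType": "D", "Element": "E"}
_DOC_CHILDREN_RE = re.compile(r"C*(?:DC*)?EC*")


def check_node(n: Node, errors: list) -> None:
    """Append the error categories raised for n and its descendants."""
    t = n.node_type
    if t == "Document":
        letters = "".join(_DOC_CHILD_LETTER.get(c.node_type, "?") for c in n.children)
        if not _DOC_CHILDREN_RE.fullmatch(letters):
            errors.append("xml-well-formedness-error")
        for nc in n.children:
            if nc.node_type != "EntityReference":
                check_node(nc, errors)
    elif t == "Element":
        for nc in n.children:
            if nc.node_type not in CONTENT_TYPES:
                errors.append("xml-well-formedness-error")
            elif nc.node_type == "EntityReference":
                errors.append("entity-error")
            else:
                check_node(nc, errors)
    elif t == "Comment":
        errors.append("round-trip-warning")
        if "--" in n.data or n.data.endswith("-"):
            errors.append("xml-well-formedness-error")
    elif t == "CDATASection":
        if "]]>" in n.data:
            errors.append("xml-well-formedness-error")
    elif t == "ProcessingInstruction":
        if n.target.lower() == "xml":
            errors.append("xml-well-formedness-error")
        if "?>" in n.data:
            errors.append("xml-well-formedness-error")
        if n.data[:1] in ("\t", "\n", "\r", " "):
            errors.append("round-trip-error")
    # Leaf nodes must not have children.
    if t in ("Text", "CDATASection", "Comment", "ProcessingInstruction") and n.children:
        errors.append("xml-well-formedness-error")
```

For example, a Document whose children are two Element nodes fails the ordering expression, because only one E is allowed.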

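The ID_ATTR and IDREF_ATTR steps can be read as requiring two passes over the document: XML allows an IDREF value to refer to an ID that appears later in the document, so every ID has to be collected before any IDREF is checked. A minimal Python sketch under that reading; the input shape (a list of declared type and value pairs in document order) and the function name are hypothetical, and Name validation of the values is omitted.

```python
def check_id_attributes(attrs):
    """attrs: (declared_type, value) pairs in document order (hypothetical shape).

    Returns (category, message) tuples for each violation found.
    """
    errors = []
    defined = set()
    # Pass 1: collect ID values, flagging any value defined twice.
    for attr_type, v in attrs:
        if attr_type == "ID":
            if v in defined:
                errors.append(("xml-validity-error", f"duplicate ID {v!r}"))
            defined.add(v)
    # Pass 2: each IDREF must match some ID, even one that appears later.
    for attr_type, v in attrs:
        if attr_type == "IDREF" and v not in defined:
            errors.append(("xml-validity-error", f"undefined ID {v!r}"))
    return errors
```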
Character Classes

This section defines several character classes, which are referred to by the algorithms specified above.

Character class Char10 contains the following characters:

This character class contains all characters allowed in the production rule Char of XML 1.0 [XML10].

Character class CompatChar10 contains the following characters:

Character class ControlChar10 contains the following characters:

This character class contains the characters listed in the Note in Section 2.2 of XML 1.0 [XML10], as amended by errata.

The character class NameStartChar10 contains the following characters:

The character class NameChar10 contains the following characters:

References

Normative References

DOM3CORE
@@ W3C DOM Level 3 Core
DOMDTDEF
@@ manakai's extension to DOM for document type definitions
KEYWORDS
Key words for use in RFCs to Indicate Requirement Levels, IETF BCP 14, RFC 2119. This version of the specification is referenced.
INFOSET
@@
XML10
Extensible Markup Language (XML) 1.0 (Fourth Edition), W3C Recommendation (edited in place). The latest version of the specification is available at <http://www.w3.org/TR/xml>. This version of the specification is referenced.
XML11
@@
