XML processing and DOM Document Type Definitions

Manakai Project Specification 18 September 2022

Latest version
<https://suika.suikawiki.org/www/markup/xml/domdtdef/domdtdef>
Version history
History
Author
<>

Abstract

This specification defines how various implementations of XML by the manakai project process XML documents, including: how to parse XML documents; how information in XML DTDs is represented in DOM; how base URLs are determined in XML documents; how namespace prefixes are serialized; and how conformance of XML documents is checked.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents might supersede this document.

This document is a specification produced as part of the manakai project. It is not intended to be implemented by any other DOM or XML implementation, although nothing prevents it from being implemented by any other party. It might be updated, replaced, or obsoleted by other documents at any time.

This specification was originally developed in the DOM3 era. Most parts of this document need to be rewritten to align with modern Web standard specifications.

Comments on this document are welcome and may be sent to the author.

Translations of this document might be available. The English version of the document is the only normative version.

Table of contents

  1 Introduction
    1.1 History
  2 Conformance requirements
    2.1 Terminology
    2.2 XML versions
  3 Parsing XML documents
    3.1 Character encodings
    3.2 Expansion of entities
      3.2.1 The external subset entity
      3.2.2 Parameter entities
      3.2.3 General entities
      3.2.4 Fetching and parsing external entities
  4 DOM Document Type Definitions
    4.1 Nodes
    4.2 Documents
    4.3 Document types
    4.4 Element types
    4.5 Attribute definitions
    4.6 General entities
    4.7 Notations
  5 Base URLs
  6 XML namespaces
    6.1 Namespace mappings
    6.2 Namespace fixup of an XML element for serialization
    6.3 Tests
  7 Conformance checking of XML documents
    7.1 Processing Model
      7.1.1 Error Classification
    7.2 Checking an XML DOM Tree
      7.2.1 Definitions
      7.2.2 Conformance Checking Algorithms for Components
      7.2.3 Checking Node
    7.3 Character Classes
  8 Obsolete features
    8.1 The DOM feature string for the DOM Document Type Definitions
    8.2 Obsolete attribute
  References
    Normative References
    Non-normative References

1 Introduction

This section is non-normative.

This specification defines how various implementations of XML by the manakai project process XML documents, including: how to parse XML documents; how information in XML DTDs is represented in DOM; how base URLs are determined in XML documents; how namespace prefixes are serialized; and how conformance of XML documents is checked.

In general, this specification does not intend to override requirements imposed by various XML specifications. It is not intended to be implemented by any other DOM or XML implementation, although nothing prevents it from being implemented by parties other than the manakai project. In particular, it is not intended to be implemented by Web browsers. In fact, some of the features in this specification (e.g. handling of external entities) are not compatible with Web content.

1.1 History

Most of this specification was originally published as DOM Document Type Definition. The First Edition of the DOM Document Type Definition specification is available at <https://suika.suikawiki.org/www/markup/xml/domdtdef/domdtdef-20071209>. After publication of the First Edition, the specification moved to the Living Standard model. The version history of that specification and of this one is available at <https://suika.suikawiki.org/gate/git/wi/markup/xmlexts.git/history/HEAD:/domdtdef/source.en.html>.

Part of this specification is derived from manakai's XML Conformance Checking specification. The revision history of that specification is available at <https://suika.suikawiki.org/gate/cvs/markup/xml/xmlcc/xmlcc-work.en.html>.

Part of this specification is derived from the Namespace fixup note, whose revision history is available at <https://suika.suikawiki.org/gate/git/wi/markup/xmlexts.git/history/HEAD:/nsfixup-source.en.html>.

2 Conformance requirements

2.1 Terminology

All examples and notes in this specification are non-normative, as are all sections explicitly marked non-normative.

Known issues are marked like this. They are non-normative.

Everything else in this document is normative.

The key words MUST, MUST NOT, and MAY in this document are to be interpreted as described in RFC 2119 [KEYWORDS].

Conformance requirements phrased as algorithms or specific steps MAY be implemented in any manner, so long as the end result is equivalent. Requirements phrased in the imperative as part of algorithms are to be interpreted with the meaning of the key word used in introducing the algorithm.

The following paragraph no longer applies, but strictErrorChecking has effect for certain methods (need to define them...)

Sometimes an exception is phrased as being required to be raised. As specified in the DOM Core specification [DOM3CORE], when the value of the strictErrorChecking attribute of the Document node is false, the exception does not have to be raised.

Some conformance requirements are phrased as requirements on interfaces, attributes, methods, or objects. Such requirements are applied to implementations.

The construction "a Foo object", where Foo is an interface, is used instead of the more accurate "an object implementing the interface Foo".

The construction "raise a Bar", where Bar is an exception code, is used instead of the more accurate "raise an exception implementing the interface DOMException [DOM3CORE] with its code attribute set to Bar".

The WebIDL fragments in this document are to be interpreted as described in the WebIDL specification [DOMBIND].

The term null is defined by the WebIDL specification [DOMBIND].

The terms, interfaces, attributes, and exceptions context object, child, parent, node or Node (which has nodeName, adopt, node document, clone, equals, base URL), document or Document (which has doctype), XML document, HTML document, element or Element (which has namespace, namespace prefix, local name, and attribute list), attribute or Attr (which has namespace, namespace prefix, local name, and value), doctype or DocumentType (which has doctype name and systemId), XML namespace, XMLNS namespace, in a document, collection, throw, DOMException, HierarchyRequestError, InvalidCharacterError, and NotFoundError are defined by the DOM Standard [DOM].

The terms and steps live, space character, XML MIME type, document base URL, the document's address, resolve a URL, Content-Type metadata, willful violation, XML parser, setting and getting of an attribute, event loop, task, queue a task, parser pause flag, input byte stream, character token, character reference in attribute value state, consume a character reference, additional allowed character, named character references, and stops parsing are defined in the HTML Standard [HTML].

The interfaces and attributes NamedNodeMap, entities, notations, Entity, and Notation are defined in the DOM3 Core specification [DOM3CORE].

The term owner element is defined in the Manakai DOM Extensions specification [MANAKAI].

The terms Name, internal entity, external entity, unparsed entity, general entity, parameter entity, internal subset, external subset, entity reference, XML declaration, text declaration, version pseudo-attribute, encoding pseudo-attribute, standalone pseudo-attribute, replacement text, and process are defined in the XML 1.0 specification [XML].

The terms parse error, input stream, tokenization, character token, end tag token, tree construction, phase, start phase, main phase, end phase, stack of open elements, current element, list of entities, list of parameter entities, append an entity, and entity flag are defined in the XML5 specification [XML5].

The predefined entity names are apos, amp, lt, gt, and quot.

The predefined public IDs are the public identifiers associated with the DTD containing declarations for the named character references, as specified in the HTML Standard.

The terms fetch, request, response, type, body, process response, process response body, and process response end-of-file are defined in the Fetch Standard [FETCH].

The following paragraph needs to be updated:

Some IDL fragments are presented as a patch to an IDL fragment defined in another document. They are to be interpreted as an instance of InterfaceMembers [DOMBIND]. If the IDL fragment to which the patch is applied has any instance of InterfaceMember with the same identifier part as that of an instance of InterfaceMember in the patch, then the former instance of InterfaceMember MUST be ignored.

2.2 XML versions

Implementations of this specification MUST support features defined in XML 1.0 [XML] and Namespaces in XML 1.0 [XMLNS] specifications.

Unless explicitly specified, features in this specification do not support XML 1.1.

This document sometimes cites parts of the XML 1.0 and Namespaces in XML 1.0 specifications by hyperlinks. If such a requirement is applied to XML 1.1, however, the citations should be read as references to the XML 1.1 [XML11] and Namespaces in XML 1.1 [XMLNS11] specifications.

XXX Should we drop XML 1.1 support entirely?

This specification does not specify which edition of relevant specifications has to be supported.

3 Parsing XML documents

This section defines additional requirements for an XML parser.

An XML parser MUST parse an XML document as specified in the XML5 [XML5] specification.

XXX If steps for an XML parser are not yet defined, similar steps for an HTML parser have to be used instead. Changes made to the HTML tokenizer after XML5 was specced must also be applied to the XML tokenizer, where possible. Even in such cases, if the input is not well-formed, it is a parse error. The tree construction stage has the before XML declaration phase, the before document type phase, and the document type phase. XML namespaces must also be supported. The XML parser has the entity body phase, which is initially set to the before document type phase. It is used as the next phase of the before XML declaration phase.

An XML tokenizer is a component of the XML parser performing tokenization. An input stream is associated with the tokenizer. Initially, an XML parser has an associated tokenizer and input stream. An XML parser has a stack of tokenizers, which is initially empty.

Unless explicitly specified, various states of the XML parser are shared among tokenizers associated with it.

The XML parser has the following flags: standalone document, expose DTD content, don't process, and expand external entities. Unless explicitly stated, these flags are unset.

DOM APIs such as innerHTML and DOMParser do not set these flags.

Each entry in the list of entities of an XML parser has an open flag, which is initially unset.

When the XML parser has to append an entity, it MUST do nothing in the following cases:

If the expose DTD content flag is set, the element, attribute list, unparsed entity, and notation declarations, as well as processing instructions in DTD, MUST be exposed to the DOM through the DocumentType object.

In the document type phase, everything must be ignored unless the previous phase is the start phase or the input stream currently being processed is an entity referenced by the document type declaration or by an entity reference (rather than the document entity).

An XML parser has two further flags: need predefined entity declarations and has character entity declarations. They are initially unset.

In the document type phase, the need predefined entity declarations flag MUST be set. It MUST be restored to the original value when the phase is set to another value.

3.1 Character encodings

The value of the encoding pseudo-attribute or the charset parameter MUST be interpreted as an encoding label defined by the Encoding Standard.

3.2 Expansion of entities

3.2.1 The external subset entity

Just before the phase is switched from the before document type phase to the start phase, the XML parser MUST run the following steps:

  1. Let doctype be the doctype of the Document.
  2. If doctype is null, abort these steps.
  3. Let public ID be the publicId of doctype.
  4. If public ID is one of predefined public IDs, set the has character entity declarations flag and abort these steps.
  5. Set the need predefined entity declarations flag.
  6. Let system ID be the systemId of doctype.
  7. If system ID is the empty string, abort these steps.
  8. Invoke the parse an entity steps with the following parameters:
    entity
    The external subset.
    url
    The result of resolving system ID against the base URL of doctype.
    original phase
    The phase of the XML parser.
    entity body phase
    The document type phase.

The external subset is parsed and processed after the internal subset, if any.
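
The steps above can be sketched roughly as follows. This is a non-normative illustration in Python, not part of any manakai implementation. The Doctype and ParserFlags classes, the function name, and the placeholder contents of PREDEFINED_PUBLIC_IDS are assumptions made for this sketch. urljoin only approximates the resolve a URL steps, and returning the URL stands in for invoking the parse an entity steps.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urljoin

# Placeholder only: the real set is the public identifiers given by the HTML Standard.
PREDEFINED_PUBLIC_IDS = {"-//EXAMPLE//DTD Placeholder//EN"}

@dataclass
class Doctype:
    public_id: str
    system_id: str
    base_url: str

@dataclass
class ParserFlags:
    has_character_entity_declarations: bool = False
    need_predefined_entity_declarations: bool = False

def external_subset_url(doctype: Optional[Doctype], flags: ParserFlags) -> Optional[str]:
    """Return the URL of the external subset to parse, or None if none is parsed."""
    if doctype is None:                                 # step 2
        return None
    if doctype.public_id in PREDEFINED_PUBLIC_IDS:      # step 4
        flags.has_character_entity_declarations = True
        return None
    flags.need_predefined_entity_declarations = True    # step 5
    if doctype.system_id == "":                         # step 7
        return None
    # Step 8: the caller would invoke "parse an entity" with this URL.
    return urljoin(doctype.base_url, doctype.system_id)
```

Note that a doctype with a predefined public ID never triggers a fetch in this sketch, regardless of its system ID.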

3.2.2 Parameter entities

To consume a parameter entity, the XML parser MUST run these steps:

  1. Consume the next input character while it is not one of the following characters:
    • A space character
    • A U+0022 QUOTATION MARK character (")
    • A U+0025 PERCENT SIGN character (%)
    • A U+0026 AMPERSAND character (&)
    • A U+0027 APOSTROPHE character (')
    • A U+003B SEMICOLON character (;)
    • A U+003C LESS-THAN SIGN character (<)
    • A U+003D EQUAL SIGN character (=)
    • A U+003E GREATER-THAN SIGN character (>)
    • A U+0060 GRAVE ACCENT character (`)
    • An implied EOF character
  2. Let name be the characters consumed by the previous step.
  3. If the next input character is a U+003B SEMICOLON character (;), consume the character and append the character to name.
  4. Otherwise, this is a parse error.
  5. Let entity be the parameter entity whose name is name in the list of parameter entities. If there is no such parameter entity:

    Names of parameter entities in the list of parameter entities always contain the ; character.

  6. If the don't process flag is set and the standalone document flag is not set, return nothing and abort these steps.
  7. If the open flag of the entry for entity in the list of parameter entities of the XML parser is set, this is a parse error. Return nothing and abort these steps.
  8. Let public ID be the public ID of entity.
  9. If public ID is one of predefined public IDs:
    If the state is the document type internal subset state
    Set the has character entity declarations flag and abort these steps.
    Otherwise
    Parse error. Abort these steps.
  10. Invoke the parse an entity steps with the following parameters:
    entity
    entity
    url
    The result of resolving the system ID of entity against the effective declaration base URL of entity, if entity is an external entity, or null.
    original phase
    The current phase of the tree construction stage.
    entity body phase
    The current phase of the tree construction stage.
    state
    If the state is one of document type entity value double quoted state, document type entity value single quoted state, or entity value state, the entity value state.
    in markup declaration flag
    Set if the state of the XML parser is not one of document type internal subset state, document type entity value double quoted state, document type entity value single quoted state, or entity value state, or unset otherwise.
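
Steps 1 to 4 of the algorithm above (scanning the name of the parameter entity) can be illustrated as follows. This is a non-normative Python sketch. The function name and its return shape are assumptions made for illustration, and the end of the string stands in for the implied EOF character.

```python
# Characters that end the name in step 1. The space characters are those
# of the HTML Standard: TAB, LF, FF, CR, and SPACE.
SPACE_CHARACTERS = "\t\n\x0c\r "
TERMINATORS = set(SPACE_CHARACTERS + "\"%&';<=>`")

def scan_parameter_entity_name(text, pos):
    """Scan a name starting at `pos`, just after the '%' character.

    Returns (name, new_pos, parse_error).
    """
    start = pos
    # Step 1: consume until a terminator; running off the end models the implied EOF.
    while pos < len(text) and text[pos] not in TERMINATORS:
        pos += 1
    name = text[start:pos]                       # step 2
    if pos < len(text) and text[pos] == ";":
        return name + ";", pos + 1, False        # step 3: ';' becomes part of the name
    return name, pos, True                       # step 4: a missing ';' is a parse error
```

Because the semicolon is appended to the name, a lookup in the list of parameter entities uses keys such as `foo;` rather than `foo`.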

3.2.3 General entities

When the steps to consume a character reference are applied to an XML parser, the requirements in this section apply.

The steps to consume a character reference are defined in the HTML Standard for the HTML parser. This section monkeypatches the steps to implement XML-specific rules.

If the stack of open elements is empty and the state is not the character reference in attribute value state when the steps to consume a character reference are invoked, this is a parse error.

Each entity in the list of entities MUST be appended to the table of the named character references as a row, where the first column is the name of the entity followed by a U+003B SEMICOLON character (;). If the table already has a row whose first column is equal to the first column of the new row, the existing row MUST be replaced by the new row.
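
As a non-normative illustration, the table update described above behaves like a dictionary merge keyed by the first column. The function name and the use of plain dictionaries are assumptions made for this Python sketch.

```python
def merge_entities(named_character_references, entities):
    """Return a new table with one row per entity, keyed by its name plus ';'.

    `named_character_references` maps the first column to the second column.
    `entities` maps entity names (without ';') to entity objects or values.
    """
    table = dict(named_character_references)    # leave the original table untouched
    for name, entity in entities.items():
        table[name + ";"] = entity              # replaces an existing row with the same key
    return table
```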

If all of the following conditions are met, this is a parse error:

If only the last condition is not met and the need predefined entity declarations flag is set, the XML parser MAY report a warning.

When character tokens are returned using the second column of the table, an XML parser MUST run these steps:

  1. Let entity be the entity of the selected row.
  2. If entity is an unparsed entity, this is a parse error. Return nothing and abort the consume a character reference steps.
  3. If the open flag of the entry for entity in the list of entities of the XML parser is set, this is a parse error. Return nothing and abort the consume a character reference steps.
  4. If these steps are invoked in the character reference in attribute value state:
    1. If entity is an external entity, this is a parse error. Return nothing and abort the consume a character reference steps.
    2. Let replacement text be the replacement text of entity.
    3. If replacement text contains a U+003C LESS-THAN SIGN character (<), this is a parse error. Return nothing and abort the consume a character reference steps.
    4. Set the open flag of the entry for entity in the list of entities of the XML parser.
    5. Let s be the empty string.
    6. Loop: If replacement text is the empty string, go to the step labeled end.
    7. If the first character in replacement text is not a U+0026 AMPERSAND character (&), append the character to s and remove the character from replacement text.
    8. If the first character in replacement text is a U+0026 AMPERSAND character (&):
      1. Run the consume a character reference steps recursively, with no additional allowed character. Those steps MUST behave as if the input stream were replacement text and remove the characters consumed from replacement text.
      2. If nothing is returned, append a U+0026 AMPERSAND character (&) to s and remove the first character from replacement text.
      3. Otherwise, append the characters represented by the returned character tokens to s.
    9. Go to the step labeled loop.
    10. End: Unset the open flag of the entry for entity in the list of entities of the XML parser.
    11. Return character tokens equivalent to characters in s and abort the consume a character reference steps.
  5. Otherwise:
    1. If the phase of the tree construction stage is not the main phase, return nothing and abort the consume a character reference steps.
    2. Invoke the parse an entity steps with the following parameters:
      entity
      entity
      url
      The result of resolving the system ID of entity against the effective declaration base URL of entity, if entity is an external entity, or null.
      original phase
      The main phase.
      entity body phase
      The main phase.
    3. Return zero character tokens and abort the consume a character reference steps.

The number of character tokens returned by these steps can be zero, which is different from returning nothing.
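
The attribute value branch (step 4 above) is essentially a recursive expansion guarded by the open flag. The following non-normative Python sketch illustrates it under simplifying assumptions: entities are a plain mapping from names to replacement text of internal entities, numeric character references and the HTML named character references are not handled, and None models "return nothing". All names are hypothetical.

```python
import re

_REFERENCE = re.compile(r"&([^\s&;<]+);")

def expand_in_attribute(name, entities, open_names, errors):
    """Return the expansion of internal entity `name`, or None for "return nothing"."""
    if name in open_names:                       # step 3: the open flag is set
        errors.append(f"recursive reference to {name}")
        return None
    text = entities[name]
    if "<" in text:                              # step 4.3
        errors.append(f"'<' in replacement text of {name}")
        return None
    open_names.add(name)                         # step 4.4
    out, pos = [], 0
    while pos < len(text):                       # steps 4.6 to 4.9
        match = _REFERENCE.match(text, pos) if text[pos] == "&" else None
        expansion = None
        if match and match.group(1) in entities:
            expansion = expand_in_attribute(match.group(1), entities, open_names, errors)
        if expansion is None:
            out.append(text[pos])                # copy one character, including a bare '&'
            pos += 1
        else:
            out.append(expansion)
            pos = match.end()
    open_names.discard(name)                     # step 4.10
    return "".join(out)
```

An empty expansion (the empty string) is distinct from None here, mirroring the distinction between zero character tokens and returning nothing.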

3.2.4 Fetching and parsing external entities

To parse an entity entity with url, original phase, entity body phase, state whose default is null, and in markup declaration flag whose default is unset, the XML parser MUST run these steps:

  1. If in markup declaration flag is set, act as if a U+0020 SPACE character were processed by the tokenizer.
  2. Set the open flag of the entry for entity in the list of entities of the XML parser if entity is a general entity.
  3. Push a marker to the stack of open elements. This marker is referenced later in this section, but is ignored for any other purposes. (Especially, the current element can never be a marker.)
  4. Set the parser pause flag of the tokenizer of the XML parser to true.
  5. Block the tokenizer of the XML parser, such that the event loop will not run tasks that invoke the tokenizer.
  6. Push the current tokenizer of the XML parser to the stack of tokenizers.
  7. If entity is an internal entity, if the expand external entities flag is unset, if url is null, if url is in error, or if there are more entity references than an implementation-specific limit, such that the entity reference that caused these steps to be invoked ought to be ignored:
    1. Let replacement be the replacement text of entity, if it is an internal entity, or the empty string.
    2. Set the tokenizer of the XML parser to a new tokenizer whose input stream is replacement.

    If the expand external entities flag is unset, a reference to the external entity is expanded to the empty string.

    There should be an implementation-specific limit on how many entity references are expanded, to defend against the billion laughs attack.

  8. Otherwise:
    1. Let referrer be XXX document's address of the Document of the XML parser, if referrer is enabled, or null.
    2. Let req be a request with url url and referrer referrer.
    3. Invoke the fetch steps using req. The tasks queued by the fetch algorithm MUST run these steps:
      If it is a task to process response
      If the type of the response of the fetch is not default, append the "EOF" character to the input byte stream of the current tokenizer of the XML parser.
      If it is a task to process response body
      If the type of the response of the fetch is default, append newly-arrived bytes in body of the response to the input byte stream of the current tokenizer of the XML parser.
      If it is a task to process response end-of-file
      Append the "EOF" character to the input byte stream of the current tokenizer of the XML parser.
    4. Set the tokenizer of the XML parser to a new tokenizer.
    5. Set the phase of the tree construction stage of the XML parser to before XML declaration phase and entity body phase to entity body phase.

      The before XML declaration phase is used to parse the text declaration in the external entity, if any.

    6. If state is not null, let original state be the state of the XML parser and set the state of the XML parser to state.

    When decoding the input byte stream, the character encoding given in the Content-Type metadata (e.g. the charset parameter) MUST be taken into account. However, the MIME type itself is ignored.

    An external general entity can be served with various MIME types, including but not limited to: application/xml-dtd, text/xml-parsed-entity, application/xml-parsed-entity, various XML MIME types, text/sgml, application/sgml, text/plain, and application/octet-stream.

    XXX Should the charset parameter be taken into account even when the charset parameter is not defined for the MIME type?

    XXX set some error flag if external entity is not expanded

    XXX interaction of standalone=yes and external entity reference??
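
The choice between step 7 (tokenize text directly) and step 8 (fetch) can be sketched as below. This is a non-normative Python illustration. The Entity class, the function name, and the default limit of 1000 are assumptions, and the "url is in error" case is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Entity:
    # Replacement text of an internal entity; None models an external entity.
    replacement_text: Optional[str] = None

def inline_replacement(entity, url, expand_external_entities, references_so_far, limit=1000):
    """Return the text to tokenize directly (step 7), or None if a fetch is needed (step 8).

    `limit` is an arbitrary stand-in for the implementation-specific limit.
    """
    is_internal = entity.replacement_text is not None
    if (is_internal or not expand_external_entities or url is None
            or references_so_far > limit):
        # Internal entities expand to their replacement text; everything
        # else that lands here expands to the empty string.
        return entity.replacement_text if is_internal else ""
    return None
```

With the expand external entities flag unset, every external entity reference yields the empty string, so no network access happens in that configuration.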

While the stack of tokenizers is not empty, for the purpose of the processing of an end tag token only, the XML parser MUST act as if the stack of open elements does not contain the marker and any other element added to the stack before the marker.

While the stack of tokenizers is not empty and the in markup declaration flag was set upon the last invocation of the parse an entity steps:

  • If the tokenizer consumes a U+003E GREATER-THAN SIGN character (>) and the state is changed to the document type internal subset state, it MUST be a parse error and the state MUST be changed to the bogus markup declaration state instead.
  • If the tokenizer consumes an implied EOF character, the steps to stop parsing MUST be run instead. If this is within an entity value, public literal, system literal, or content model group opened by the current entity, this is a parse error and the state MUST be changed to the bogus markup declaration state.

While the stack of tokenizers is not empty, instead of the stops parsing steps, the XML parser MUST run these steps:

  1. Pop the most recently added marker, as well as any element added after the marker, from the stack of open elements.
  2. If the last invocation of the parse an entity steps sets original state, set the state of the XML parser to original state.
  3. Set the phase of the tree construction stage of the XML parser to original phase for the last invocation of the parse an entity steps.
  4. Pop a tokenizer from the stack of tokenizers and set it as the tokenizer of the XML parser.
  5. Unblock the tokenizer of the XML parser, such that tasks that invoke the tokenizer can again be run.
  6. Set the parser pause flag of the tokenizer of the XML parser to false.
  7. Unset the open flag of the entry for entity in the list of entities of the XML parser if entity is a general entity.
  8. If in markup declaration flag is set upon the last invocation of the parse an entity steps, act as if a U+0020 SPACE character were processed by the tokenizer.
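
The way the stack of tokenizers and the marker on the stack of open elements bracket an entity can be illustrated with the following non-normative Python sketch. The Parser class and both function names are hypothetical. Storing the original phase alongside the saved tokenizer is a choice made for this sketch, and pausing, blocking, and the open flag are left out.

```python
MARKER = object()   # stands in for the marker pushed to the stack of open elements

class Parser:
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer
        self.tokenizer_stack = []
        self.open_elements = []
        self.phase = "main"

def enter_entity(parser, new_tokenizer, entity_body_phase):
    """Begin processing an entity: save the current tokenizer and phase."""
    parser.open_elements.append(MARKER)
    parser.tokenizer_stack.append((parser.tokenizer, parser.phase))
    parser.tokenizer = new_tokenizer
    parser.phase = entity_body_phase

def leave_entity(parser):
    """Run instead of "stops parsing" while the stack of tokenizers is not empty."""
    # Pop the most recent marker and every element added after it.
    while parser.open_elements.pop() is not MARKER:
        pass
    parser.tokenizer, parser.phase = parser.tokenizer_stack.pop()
```

Elements left open inside the entity are discarded when the entity ends, so they cannot be closed by end tags in the referencing entity.
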
For each external entity (including the document entity and the external subset entity, if any)
If there is a byte sequence that is not legal in the encoding in use, then the parser MUST raise an xml-misc-error.
If it is the document entity or a general entity, then:
  • If the input byte sequence for the entity begins with the BOM, then the parser MUST set the BOM flag of the node corresponding to the entity (the Document node for the document entity or an Entity node for a general entity) to true. @@ flag must be checked later
If it is a parameter entity or the external subset entity, then:
  • If the character encoding of the entity is UTF-16 but the input byte stream for the entity does not begin with the BOM, then the parser MUST raise an xml-misc-error.
  • @@ encoding="" preferred name?
For the document
If the XML document does not begin with an XML declaration, then the parser MUST raise an xml-misc-recommendation.
If the document does not contain the document type declaration, or if it does but the document type definition does not contain entity declaration for any of amp, lt, gt, apos, or quot, then the parser MUST raise xml-misc-recommendation(s).
For the document type declaration
@@ read external entity
The entities attribute of the DocumentType node MUST contain a NamedNodeMap object whose first five items are as follows:
  1. An Entity node whose nodeName attribute is amp. It contains a Text node whose data attribute is set to &.
  2. An Entity node whose nodeName attribute is lt. It contains a Text node whose data attribute is set to <.
  3. An Entity node whose nodeName attribute is gt. It contains a Text node whose data attribute is set to >.
  4. An Entity node whose nodeName attribute is quot. It contains a Text node whose data attribute is set to ".
  5. An Entity node whose nodeName attribute is apos. It contains a Text node whose data attribute is set to '.
For each internal general entity declaration being processed by the parser
If the EntityValue part of the general entity declaration contains a bare U+003C LESS-THAN SIGN (<) character, then the parser MUST raise an xml-misc-warning.
For each element type declaration being processed by the parser
If there is another processed element type declaration whose Name is equal to the Name of the element type declaration, then the parser MUST raise an xml-validity-error.
For each attribute definition list declaration being processed by the parser
If there is another processed attribute definition list declaration whose Name is equal to the Name of the attribute definition list declaration, then the parser MUST raise an xml-misc-warning.
For each attribute definition in the attribute definition list declaration, if there is another processed attribute definition whose Name is equal to the Name of the attribute definition (whether or not in the same attribute definition list declaration), then the parser MUST raise an xml-misc-warning.
For each entity declaration being processed by the parser
Handle as follows:
  1. If the entity declaration declares a general entity, the following is applied:

    If the Name is lt or amp

    If the entity declaration does not declare an internal entity, or if the replacement text of the entity is not the escaped form of < (if lt) or & (if amp), then the parser MUST raise an xml-misc-error.

    In other words, the character in the EntityValue has to be double-escaped.

    If the Name is gt, quot, or apos

    If the entity declaration does not declare an internal entity, or if the replacement text of the entity is not equal to or not the escaped form of > (if gt), " (if quot), or ' (if apos), then the parser MUST raise an xml-misc-error.

    In other words, the character in the EntityValue has to be single- or double-escaped.

  2. If the entity declaration has to be ignored because an entity with the same Name as the declaration has already been declared, then the parser MUST raise a misc-info and abort these steps.

    The five predefined entities, i.e. amp, lt, gt, quot, and apos, are always declared implicitly and therefore any declaration for such an entity always raises a misc-info.

  3. If the entity declaration declares a parameter entity and the Name of the entity begins with the string xml (in any combination of upper- and lowercase letters), then the parser MUST raise an xml-misc-warning.

  4. If the entity declaration contains the EntityValue, then for each occurrence of any reference to an unparsed entity in the EntityValue, the parser MUST raise an xml-misc-error.

  5. If the entity declaration declares a general entity, then an Entity node MUST be created and appended to the NamedNodeMap object in the entities attribute of the DocumentType node.

    Read the external entity

    If the replacement text of the entity is read, then parse the replacement text as if it were referenced from the content of an element (with no namespace bindings). If no @@ parse error is raised by the parsing process, then the nodes generated by the parsing MUST be appended to the Entity node. The parse error MUST NOT be propagated to the entire parsing process. Other kinds of errors MUST be propagated. The first parse error MUST abort the internal parsing process. @@ better wording

    @@ prop

    Then, the Entity node and its descendants MUST be marked as read-only.
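
The check in step 1 for declarations of the predefined entities can be illustrated as follows. This non-normative Python sketch assumes that "the escaped form" of a character means a single decimal or hexadecimal character reference to it in the replacement text. The function names are hypothetical, and None models an external entity.

```python
import re

_CHARACTERS = {"lt": "<", "amp": "&", "gt": ">", "quot": '"', "apos": "'"}
_CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));\Z")

def is_character_reference_to(text, char):
    """True if `text` is exactly one character reference to `char`."""
    match = _CHAR_REF.match(text)
    if not match:
        return False
    code = int(match.group(1), 16) if match.group(1) else int(match.group(2))
    return code == ord(char)

def predefined_declaration_ok(name, replacement_text):
    """Return False where step 1 would raise an xml-misc-error.

    `replacement_text` is None for an external entity.
    """
    if replacement_text is None:
        return False
    char = _CHARACTERS[name]
    if is_character_reference_to(replacement_text, char):
        return True
    # Only gt, quot, and apos may also expand to the literal character.
    return name in ("gt", "quot", "apos") and replacement_text == char
```

For example, `<!ENTITY lt "&#38;#60;">` has the replacement text `&#60;`, which passes, whereas a declaration whose replacement text is a literal `<` does not.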

For each notation declaration being processed by the parser
If there is another processed notation declaration whose Name is equal to the Name of the notation declaration, then the parser MUST raise an xml-validity-error.
For each empty-element tag
If the Name of the tag is not declared by a processed element type declaration as EMPTY content, then the parser MUST raise an xml-misc-recommendation.
For each start-tag
If the Name of the tag is declared by a processed element type declaration as EMPTY content, then the parser MUST raise an xml-misc-recommendation.
For each attribute

The parser MUST set the normalized value of the attribute to the value attribute of the Attr node created for the attribute.

That is, any entity reference has to be expanded. Unexpanded entity references in attribute values are discarded.

For each xml:space attribute
The parser MUST set the normalized value of the xml:space attribute to the value attribute of the Attr node created for the attribute even if the normalized value is different from default or preserve.
For each parameter entity reference

Process as follows:

  1. If the declaration for the entity is not processed, then:
    If the document contains no external entity or if the document contains the standalone pseudo-attribute set to yes
    The parser MUST raise an xml-well-formedness-error.
    Otherwise
    The parser MUST raise an xml-validity-error.
  2. If the declaration for the entity is processed but the referenced entity cannot be retrieved, then the parser MUST raise an @@ ??-error.

In either of the two cases above, process as follows:

  • If the parameter entity reference is contained in a declaration, then the declaration MUST be ignored except that any error before the parameter entity MUST be raised as usual.
  • If the parameter entity reference is contained in the status portion of a conditional section, then the conditional section MUST be processed as if it were an IGNOREd section.
  • The parser MUST NOT process any entity or attribute-list declaration after the parameter entity reference in the DTD except when the standalone pseudo-attribute of the XML declaration (if any) is set to yes.
  • If the parameter entity reference is the first reference to an entity that is not read, then the parser MUST raise an entity-error.
  • The allDeclarationsProcessed @@ ref attribute of the Document node MUST be set to false.
For each general entity reference in an attribute value or in the content of an element

Process as follows:

  1. If the Name of the entity reference is either amp, lt, gt, quot, or apos, then abort these steps.
  2. If the declaration for the entity is not processed, then:
    If the document contains no external entity or if the document contains the standalone pseudo-attribute set to yes
    The parser MUST raise an xml-well-formedness-error.
    Otherwise
    The parser MUST raise an xml-validity-error.
  3. If the declaration for the entity is processed but the referenced entity cannot be retrieved, then the parser MUST raise an @@ ??-error.

In either of the two cases above, process as follows:

  • If the general entity reference is the first reference to an entity that is not read, then the parser MUST raise an entity-error. @@ entity declared WFC?
  • An unexpanded entity reference node MUST be inserted into the current node.
For each comment outside of document type declaration
A Comment node MUST be created and inserted appropriately.
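
For a reference to an entity whose declaration is not processed, the choice between the two error types described above (the same rule is stated for parameter entity references and for general entity references) might be sketched as follows. This is only an illustration: the function name and its two boolean parameters are hypothetical and are not part of any manakai API.

```python
def error_for_unprocessed_declaration(has_external_entity: bool,
                                      standalone_yes: bool) -> str:
    """Pick the error type for a reference to an entity whose
    declaration is not processed (hypothetical helper)."""
    # No external entity at all, or standalone="yes": the missing
    # declaration is a well-formedness error.
    if not has_external_entity or standalone_yes:
        return "xml-well-formedness-error"
    # Otherwise the declaration might live in an entity that was not
    # read, so only a validity error is raised.
    return "xml-validity-error"
```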

The parser MUST try to read any entity referenced by general or parameter entity references, as well as the external subset entity, if any, in the document type definition.

Well-formedness constraints. When the parser detects a violation of one of certain well-formedness constraints, it MUST raise an xml-well-formedness-error. The list of such well-formedness constraints is as follows:

Validity constraints. When the parser detects a violation of one of certain validity constraints, it MUST raise an xml-validity-error. The list of such validity constraints is as follows:

Other criteria. If the parser detects a violation of one of certain additional constraints, it MUST raise an xml-misc-recommendation. The list of such constraints is as follows:

The parser MUST act as if it is a validating XML processor for the purpose of informing of white space characters appearing in element content (See Section 2.10 of the XML specification).

In other words, the isElementContentWhitespace attribute of Text nodes has to be set appropriately. Note that the value of the attribute will be set to false for any Text node in the content of an element whose declaration is not processed.

The parser MUST raise at least one xml-well-formedness-error if the entity it parses does not match the appropriate production rule in the XML specification. As an exception to this requirement, it MAY choose not to raise such an error if the error will be raised by the conformance checker when the conformance checker checks the Document object produced by the parser.

4 DOM Document Type Definitions

This section defines various extensions to DOM to represent definitions that can be contained in XML DTDs. They are referred to as DOM Document Type Definitions, which is a set of DOM interfaces, including both new interfaces and modifications to existing standard DOM interfaces.

The primary goal of these features is to make it possible to build an XML DTD validator on top of the extended DOM API.

Although these features are defined as extensions to the standard DOM interfaces, they are not expected to be implemented by Web browsers.

Interfaces defined in this section are partially modeled and inspired by early Working Drafts of DOM Level 1 [DOM1WD], DOM Level 3 Abstract Schemas draft [DOM3AS], and XML Schema API specification [XSAPI], but they are not compatible with any of them as a whole.

Features in this section are applied to both XML documents and HTML documents.

4.1 Nodes

This specification introduces two kinds of nodes: element types and attribute definitions. Also, this specification reuses two kinds of nodes defined by the DOM3 Core specification [DOM3CORE] (but obsoleted by the current DOM Standard [DOM]): entities and notations. Requirements on nodes are applied by the following rules:

Node types introduced by this specification (ElementTypeDefinition and AttributeDefinition)
If this specification explicitly states requirements for them, they have to be followed. Otherwise, requirements mentioned in the DOM Standard have to be followed.
DocumentType
If this specification explicitly states requirements for it, they have to be followed. Otherwise, requirements mentioned in the DOM Standard have to be followed.

This specification defines how nodes attached to the DocumentType (e.g. ElementTypeDefinition and Entity) affect attributes and methods of the nodes. When the DocumentType has no such attached node, they are expected to behave as specified by the DOM Standard.

Node types defined in DOM3 Core specification but obsoleted by DOM Standard (Entity and Notation)
If this specification explicitly states requirements for them, they have to be followed. Otherwise, requirements mentioned in the DOM3 Core specification have to be followed.
Other node types
Requirements in the DOM Standard have to be followed.

Nodes of types ElementTypeDefinition, AttributeDefinition, Entity, and Notation cannot contain, or cannot be contained by, any kind of node as child.

Historically, DOM3 Core specification and earlier versions of this specification have allowed AttributeDefinition and Entity nodes containing children.


partial interface Node {
  const unsigned short ELEMENT_TYPE_DEFINITION_NODE = 81001;
  const unsigned short ATTRIBUTE_DEFINITION_NODE = 81002;
};

The nodeType attribute MUST return the following, depending on the context object:

ElementTypeDefinition
ELEMENT_TYPE_DEFINITION_NODE (81001)
AttributeDefinition
ATTRIBUTE_DEFINITION_NODE (81002)

The nodeName attribute MUST return the name (element type name or attribute definition name) of the node.

If the node has the attributes named attributes, localName, namespaceURI, and prefix, they MUST return null.

The nodeValue attribute and the textContent attribute MUST return the following, depending on the context object:

AttributeDefinition
The default value of the context object.
Entity
The replacement text of the context object.

Setting the nodeValue attribute or the textContent attribute MUST do the following, depending on the context object:

AttributeDefinition
Set the default value of the context object to the new value.
Entity
Set the replacement text of the context object to the new value.

On setting, if the new value is null, the textContent attribute MUST act as if the new value were the empty string.

When a node is cloned, the following values MUST be copied:

DocumentType
Its element types, general entities, and notations.

The DOM Standard defines more values to copy.

ElementTypeDefinition
Its name and attribute definitions.
AttributeDefinition
Its name, declared type, allowed tokens, default type, and default value.
Entity
Its name, public ID, system ID, notation name, and replacement text.

The encoding is not copied.

Notation
Its name, public ID, and system ID.

For the equality of two nodes, equality of the following values MUST also be taken into account:

DocumentType
Its element types, general entities, and notations.
ElementTypeDefinition
Its name and attribute definitions.
AttributeDefinition
Its name, declared type, allowed tokens, default type, and default value.
Entity
Its name, public ID, system ID, notation name, and replacement text.
Notation
Its name, public ID, and system ID.

For the comparison of node sets, only the length of the sets and the equality of nodes with the same nodeName are significant. The order of the nodes is ignored for the purpose of the comparison. For the comparison of allowed tokens, the number of items and the existence of each value in both lists are taken into account.

If at least one of the two nodes compared by the compareDocumentPosition method (i.e. reference and other) is an ElementTypeDefinition, AttributeDefinition, Entity, or Notation, and they are different nodes, the following rules are applied:


A node of type ElementTypeDefinition, Entity, or Notation has an associated owner document type definition. The AttributeDefinition node has an associated owner element type definition. Their values are null unless otherwise specified. When not ambiguous, they are simply referred to as owner of the node.

The ownerDocumentTypeDefinition attribute of the ElementTypeDefinition, Entity, and Notation interfaces MUST return the owner document type definition of the context object. The ownerElementTypeDefinition attribute of the AttributeDefinition interface MUST return the owner element type definition of the context object.

The publicId attribute and the systemId attribute of the Entity and Notation interfaces MUST return the public ID and system ID of the context object, respectively.

On setting, the publicId attribute and the systemId attribute of the DocumentType, Entity, and Notation interfaces MUST set the public ID and system ID of the context object, respectively, to the new value.

In DOM Standard and DOM3 Core specifications, these attributes are read-only.

Setting an invalid identifier to these attributes might make the node unserializable in the XML syntax.


When a NamedNodeMap collection represents element types, general entities, notations, or attribute definitions, methods getNamedItemNS, setNamedItemNS, and removeNamedItemNS of the NamedNodeMap interface [DOM3CORE] MUST throw a "NotSupportedError" exception.

4.2 Documents

Several factory methods are added to the Document interface.

partial interface Document {
  DocumentType createDocumentTypeDefinition(DOMString name);
  ElementTypeDefinition createElementTypeDefinition(DOMString name);
  AttributeDefinition createAttributeDefinition(DOMString name);
  Entity createGeneralEntity(DOMString name);
  Notation createNotation(DOMString name);
};

Earlier versions of this specification called this WebIDL fragment the DocumentXDoctype interface.

The createDocumentTypeDefinition(name) method MUST run these steps:

  1. If name does not match the Name production, throw an "InvalidCharacterError" exception and terminate these steps.
  2. Return a new doctype, with name as its name and with its node document set to the context object.

The createElementTypeDefinition(name) method MUST run these steps:

  1. If name does not match the Name production, throw an "InvalidCharacterError" exception and terminate these steps.
  2. Return a new element type, with name as its name and with its node document set to the context object.

The createAttributeDefinition(name) method MUST run these steps:

  1. If name does not match the Name production, throw an "InvalidCharacterError" exception and terminate these steps.
  2. Return a new attribute definition, with name as its name and with its node document set to the context object.

The createGeneralEntity(name) method MUST run these steps:

  1. If name does not match the Name production, throw an "InvalidCharacterError" exception and terminate these steps.
  2. Return a new general entity, with name as its name and with its node document set to the context object.

The createNotation(name) method MUST run these steps:

  1. If name does not match the Name production, throw an "InvalidCharacterError" exception and terminate these steps.
  2. Return a new notation, with name as its name and with its node document set to the context object.

name does not have to be a namespace qualified name.
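
The five factory methods above share the same two steps, which might be sketched as follows. This is a minimal illustration, not the manakai implementation: the Python class and method names are hypothetical, the node is reduced to a plain record, and the Name production is assumed to be the one from XML 1.0 (Fifth Edition), which this section does not state explicitly.

```python
import re
from dataclasses import dataclass

# NameStartChar / NameChar ranges, assumed from XML 1.0 (Fifth Edition).
_NAME_START = (
    ":A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    "\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
_NAME_CHAR = _NAME_START + "\\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"
NAME_RE = re.compile("[" + _NAME_START + "][" + _NAME_CHAR + "]*")


class InvalidCharacterError(Exception):
    """Stand-in for the "InvalidCharacterError" DOM exception."""


@dataclass
class DefinitionNode:
    kind: str              # e.g. "element type", "notation"
    name: str
    node_document: object  # the Document that created the node


class Document:
    def _create(self, kind: str, name: str) -> DefinitionNode:
        # Step 1: reject names that do not match the Name production.
        if NAME_RE.fullmatch(name) is None:
            raise InvalidCharacterError(name)
        # Step 2: return a new node owned by this document.
        return DefinitionNode(kind, name, self)

    def create_element_type_definition(self, name: str) -> DefinitionNode:
        return self._create("element type", name)

    def create_notation(self, name: str) -> DefinitionNode:
        return self._create("notation", name)
```

Because only the Name production is checked, a name such as a:b:c is accepted even though it is not a namespace qualified name.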

4.3 Document types

Each doctype has three associated unordered sets of nodes: element types, general entities, and notations. Unless otherwise specified, they MUST be empty when the doctype is created.

The set element types can only contain element types.

The elementTypes attribute of the DocumentType interface MUST return a NamedNodeMap collection containing the nodes in the element types set of the context object, sorted by their nodeName's code point.

The general entities of the doctype are exposed by the entities attribute of the DocumentType interface [DOM3CORE]. The generalEntities attribute of the DocumentType interface MUST return the same object as the entities attribute. The nodes in the entities collection MUST be sorted by their nodeName's code point.

The notations of the doctype are exposed by the notations attribute of the DocumentType interface [DOM3CORE]. The nodes in the notations collection MUST be sorted by their nodeName's code point.
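
As an illustration of the ordering required for these collections, the following sketch sorts node names by code point. It assumes that "sorted by their nodeName's code point" means comparing the nodeName strings code point by code point, which is how Python compares str values; the helper name is hypothetical.

```python
def sort_by_node_name(node_names):
    """Return the names in code point order (hypothetical helper).

    Python compares str values code point by code point, so the
    built-in sorted() already gives the assumed ordering.
    """
    return sorted(node_names)
```

Under this assumption, uppercase ASCII names sort before lowercase ones, and non-ASCII names sort after both.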

Need to merge with manakai-allow-doctype-children configuration parameter spec...

A DocumentType node MAY contain zero or more ProcessingInstruction nodes in the NodeList object in the childNodes attribute of the DocumentType node.

If the DocumentType node is created during the process to create a DOM from an XML document, the NodeList object in the childNodes attribute MUST contain the ProcessingInstruction nodes representing the processing instructions in the document type definition of the document processed [XML, XML11] by the XML processor.

If a DocumentType node is created from a document type declaration information item [INFOSET], the NodeList object in the childNodes attribute of the node MUST contain the ProcessingInstruction nodes created from any processing instruction information items in the list in the [children] property of the document type declaration item in the same order.

If a DocumentType node is mapped to a document type declaration information item, the list in the [children] property MUST contain the processing instruction information items created from the ProcessingInstruction nodes in the NodeList object in the childNodes attribute of the DocumentType node.

partial interface DocumentType {
  [TreatNullAs=EmptyString] attribute DOMString publicId;
  [TreatNullAs=EmptyString] attribute DOMString systemId;
  attribute DOMString? declarationBaseURI;
  attribute DOMString? manakaiDeclarationBaseURI;

  readonly attribute NamedNodeMap elementTypes;
  readonly attribute NamedNodeMap generalEntities;

  ElementTypeDefinition? getElementTypeDefinitionNode(DOMString name);
  Entity? getGeneralEntityNode(DOMString name);
  Notation? getNotationNode(DOMString name);

  ElementTypeDefinition? setElementTypeDefinitionNode(ElementTypeDefinition node);
  Entity? setGeneralEntityNode(Entity node);
  Notation? setNotationNode(Notation node);
  ElementTypeDefinition removeElementTypeDefinitionNode(ElementTypeDefinition node);
  Entity removeGeneralEntityNode(Entity node);
  Notation removeNotationNode(Notation node);
};

Earlier versions of this specification named this WebIDL fragment the DocumentTypeDefinition interface.

If the DocumentType node is created during the process to create a DOM from an XML document, the following requirements are applied: The NamedNodeMap object in the elementTypes attribute MUST be so transformed that the object contains the ElementTypeDefinition nodes for the element types whose name appears as the Name of the element type or attribute definition list declarations processed [XML, XML11] by the XML processor. If there is more than one element type declaration for an element type, then the declarations other than the first one MUST be ignored for the purpose of constructing the NamedNodeMap object.

Entities declared in the document type definition contained in or referenced from the document entity might not all be exposed through this collection, depending on the information provided by the XML processor for the DOM implementation. In particular, it might not contain any entity if entity references are expanded at parse time. An implementation MUST NOT expose an Entity node whose nodeName is equal to the name of one of the five predefined general entities in XML through the collection as the result of parsing an XML document that has no error. Duplicate entity declarations are also discarded.

The attribute MUST return the NamedNodeMap object that contains all the Entity nodes representing general entities belonging to the node.

If the DocumentTypeDefinition node is created from an XML document, duplicate notation declarations, if any, in the DTD MUST NOT result in a node in the NamedNodeMap object, and only the first declaration MUST be made available as a Notation node.

This definition is based on the one for the notations attribute of the DocumentType interface in the DOM XML module. Since duplication is a violation of the validity constraint, XML parsers might vary on how notations are reported to the application. In particular, the [notations] property of the document information item in the XML Information Set is defined such that, if any notation is declared multiple times, the property has no value.

The attribute MUST return the NamedNodeMap object that contains all the Notation nodes representing notations belonging to the node.

getElementTypeDefinitionNode, method

Returns the ElementTypeDefinition node with the specified name.

The name parameter is the name of the element type.

When invoked, the method MUST return the ElementTypeDefinition node, whose nodeName is equal to name, in the NamedNodeMap object in the elementTypes attribute of the node. If there is no such node, it MUST return null.

getGeneralEntityNode, method

Returns the Entity node with the specified name.

The name parameter is the name of the general entity.

When invoked, the method MUST return the Entity node, whose nodeName is equal to name, in the NamedNodeMap object in the entities attribute of the node. If there is no such node, it MUST return null.

getNotationNode, method

Returns the Notation node with the specified name.

The name parameter is the name of the notation.

When invoked, the method MUST return the Notation node, whose nodeName is equal to name, in the NamedNodeMap object in the notations attribute of the node. If there is no such node, it MUST return null.

setElementTypeDefinitionNode, method
The setElementTypeDefinitionNode(node) method MUST return the result of setting node to the context object's element types.
setGeneralEntityNode, method
The setGeneralEntityNode(node) method MUST return the result of setting node to the context object's general entities.
setNotationNode, method
The setNotationNode(node) method MUST return the result of setting node to the context object's notations.
removeElementTypeDefinitionNode, method
The removeElementTypeDefinitionNode(node) method MUST return the result of removing node from the context object's element types.
removeGeneralEntityNode, method
The removeGeneralEntityNode(node) method MUST return the result of removing node from the context object's general entities.
removeNotationNode, method
The removeNotationNode(node) method MUST return the result of removing node from the context object's notations.

To remove a node node from the set of nodes set, run these steps:

  1. If set does not contain node, throw a "NotFoundError" exception and abort these steps.
  2. Remove node from set.
  3. Set the owner of node to null.
  4. Return node.

To set a node node to the set of nodes set associated to the node owner, run these steps:

  1. If the owner of node is not null and is not equal to owner, throw a "HierarchyRequestError" and abort these steps.
  2. Adopt node into the node document of owner.
  3. Let oldNode be null.
  4. If there is a node whose nodeName is equal to node's nodeName in set, let oldNode be the node and remove oldNode from set.
  5. Add node to set.
  6. Set the owner of node to owner.
  7. Return oldNode.
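
The "remove" and "set" algorithms above might be sketched as follows. This is a simplified model rather than the manakai implementation: each set is modeled as a dict keyed by nodeName, the "adopt" step is reduced to copying the owner's node document, and all class and function names are hypothetical.

```python
class NotFoundError(Exception):
    """Stand-in for the "NotFoundError" DOM exception."""


class HierarchyRequestError(Exception):
    """Stand-in for the "HierarchyRequestError" DOM exception."""


class Node:
    def __init__(self, node_name, node_document=None):
        self.node_name = node_name
        self.node_document = node_document
        self.owner = None  # owner doctype or owner element type


def remove_node(node, node_set):
    # 1. The set has to contain this very node.
    if node_set.get(node.node_name) is not node:
        raise NotFoundError(node.node_name)
    # 2-4. Remove it, clear its owner, and return it.
    del node_set[node.node_name]
    node.owner = None
    return node


def set_node(node, node_set, owner):
    # 1. A node that already belongs to another owner is rejected.
    if node.owner is not None and node.owner is not owner:
        raise HierarchyRequestError(node.node_name)
    # 2. Simplified stand-in for adopting node into owner's document.
    node.node_document = owner.node_document
    # 3-4. Remove any existing node with the same nodeName.
    old_node = node_set.pop(node.node_name, None)
    # 5-7. Add the new node, record its owner, return the old node.
    node_set[node.node_name] = node
    node.owner = owner
    return old_node
```

Note that, following the steps as written, the owner of the replaced node is left unchanged by the "set" algorithm.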

4.4 Element types

An ElementTypeDefinition object is simply known as an element type. It represents the definition of an element type, identified by its name.

Although the element type corresponds to the ELEMENT declaration in the XML DTD, the element type does not directly represent the ELEMENT declaration.

interface ElementTypeDefinition : Node {
  readonly attribute DocumentType? ownerDocumentTypeDefinition;

  readonly attribute NamedNodeMap attributeDefinitions;
  AttributeDefinition? getAttributeDefinitionNode(DOMString name);
  AttributeDefinition? setAttributeDefinitionNode(AttributeDefinition node);
  AttributeDefinition removeAttributeDefinitionNode(AttributeDefinition node);
};

Each element type has an associated unordered set of nodes, attribute definitions. Unless otherwise specified, it MUST be empty when the element type is created. The set attribute definitions can only contain attribute definitions.

The attributeDefinitions attribute of the ElementTypeDefinition interface MUST return a NamedNodeMap collection containing the nodes in the attribute definitions set of the context object, sorted by their nodeName's code point.

Even if there is more than one element type declaration for an element type in the DTD, the resulting DOM will contain only one ElementTypeDefinition node for that element type. In addition, if there are attribute definition declarations for an element type, even when there is no element type declaration for that element type, the DOM will contain an ElementTypeDefinition node for that element type.

If the ElementTypeDefinition node is created during the process to create a DOM from an XML document, the following requirements are applied: The NamedNodeMap object in the attributeDefinitions attribute MUST be so transformed that the object contains the AttributeDefinition nodes corresponding to the attribute definitions in the attribute definition list declarations processed [XML, XML11] by the XML processor and associated to the element type represented by the node. If there is more than one attribute definition for an attribute, then the definitions other than the first one MUST be ignored for the purpose of constructing the NamedNodeMap object.

getAttributeDefinitionNode, method

Return the AttributeDefinition node with the specified name.

The name parameter is the name of the attribute.

When invoked, the method MUST return the AttributeDefinition node, whose nodeName attribute value is equal to name, in the NamedNodeMap in the attributeDefinitions attribute of the node. If there is no such node, it MUST return null.

The setAttributeDefinitionNode(node) method of the ElementTypeDefinition object MUST return the result of setting node to the context object's attribute definitions.

The removeAttributeDefinitionNode(node) method of the ElementTypeDefinition object MUST return the result of removing node from the context object's attribute definitions.

A future version of the specification might define a set of attributes and methods for representing and accessing the content model of the element type.

partial interface ElementTypeDefinition {
  attribute DOMString? contentModelText;
};

4.5 Attribute definitions

A node of type ATTRIBUTE_DEFINITION_NODE represents an attribute definition. Such a node MUST implement the AttributeDefinition interface, which extends the Node interface.

Each attribute definition has an associated name.

Each attribute definition has associated declared type, allowed tokens, default type, and default value. Unless otherwise specified, they are initialized to their default values when the attribute definition is created. Their default values are NO_TYPE_ATTR, the empty list, UNKNOWN_DEFAULT, and the empty string, respectively.

An attribute definition represents a definition of the attribute associated to an element type. It corresponds to the attribute definition in the attribute list declaration in the DTD. However, an AttributeDefinition node does not represent the attribute definition in the DTD itself. Even if there is more than one attribute definition for an attribute of an element type in the DTD, the resulting DOM will contain only one AttributeDefinition node for that attribute.

interface AttributeDefinition : Node {
  // DefaultValueType
  const unsigned short UNKNOWN_DEFAULT = 0;
  const unsigned short FIXED_DEFAULT = 1;
  const unsigned short REQUIRED_DEFAULT = 2;
  const unsigned short IMPLIED_DEFAULT = 3;
  const unsigned short EXPLICIT_DEFAULT = 4;

  readonly attribute ElementTypeDefinition? ownerElementTypeDefinition;
  attribute unsigned short declaredType;
  attribute DOMString[] allowedTokens;
  attribute unsigned short defaultType;
};

[NoInterfaceObject]
interface AttrDeclaredValueType {
  // DeclaredValueType
  const unsigned short NO_TYPE_ATTR = 0;
  const unsigned short CDATA_ATTR = 1;
  const unsigned short ID_ATTR = 2;
  const unsigned short IDREF_ATTR = 3;
  const unsigned short IDREFS_ATTR = 4;
  const unsigned short ENTITY_ATTR = 5;
  const unsigned short ENTITIES_ATTR = 6;
  const unsigned short NMTOKEN_ATTR = 7;
  const unsigned short NMTOKENS_ATTR = 8;
  const unsigned short NOTATION_ATTR = 9;
  const unsigned short ENUMERATION_ATTR = 10;
  const unsigned short UNKNOWN_ATTR = 11;
};
AttributeDefinition implements AttrDeclaredValueType;
Attr implements AttrDeclaredValueType;
declaredType of type unsigned short

The declared type [XML, XML11] of the attribute. It is expected that this attribute contains a value from the definition group DeclaredValueType.

On getting, the attribute MUST return the value associated to this attribute.

On setting, it MUST set the specified value as the value associated to this attribute.

If the AttributeDefinition node is created during the process to create a DOM from an XML document, an appropriate value from the DeclaredValueType constant group MUST be set to the attribute.

The definition group DeclaredValueType contains integers indicating the declared type of attributes. The definition group contains the following constants:

Name              Value  Description
NO_TYPE_ATTR      0      No value [INFOSET].
CDATA_ATTR        1      CDATA [XML, XML11].
ID_ATTR           2      ID [XML, XML11].
IDREF_ATTR        3      IDREF [XML, XML11].
IDREFS_ATTR       4      IDREFS [XML, XML11].
ENTITY_ATTR       5      ENTITY [XML, XML11].
ENTITIES_ATTR     6      ENTITIES [XML, XML11].
NMTOKEN_ATTR      7      NMTOKEN [XML, XML11].
NMTOKENS_ATTR     8      NMTOKENS [XML, XML11].
NOTATION_ATTR     9      NOTATION [XML, XML11].
ENUMERATION_ATTR  10     Enumeration [XML, XML11].
UNKNOWN_ATTR      11     Unknown, because no declaration for the attribute has been read but the [all declarations processed] property [INFOSET] would be false.

If no attribute type information is available, or if the source of the information does not distinguish no value and unknown [INFOSET], then the value NO_TYPE_ATTR MUST be used.

An AttributeDefinition node created by the createAttributeDefinition method has its declaredType attribute set to NO_TYPE_ATTR.

If the source of the information does not distinguish no value and/or unknown [INFOSET] and CDATA [XML, XML11], then the value CDATA_ATTR MUST be used.

In Perl binding [DOMPERL], the Attr nodes MUST implement the DeclaredValueType definition group.
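
As an illustration of how a parser might choose a declaredType value, the following sketch maps the attribute type text of a DTD attribute definition to the constants above. The function name and its input format are assumptions made for this example; they are not defined by this specification.

```python
from enum import IntEnum
from typing import Optional


class DeclaredValueType(IntEnum):
    NO_TYPE_ATTR = 0
    CDATA_ATTR = 1
    ID_ATTR = 2
    IDREF_ATTR = 3
    IDREFS_ATTR = 4
    ENTITY_ATTR = 5
    ENTITIES_ATTR = 6
    NMTOKEN_ATTR = 7
    NMTOKENS_ATTR = 8
    NOTATION_ATTR = 9
    ENUMERATION_ATTR = 10
    UNKNOWN_ATTR = 11


_KEYWORDS = {
    "CDATA": DeclaredValueType.CDATA_ATTR,
    "ID": DeclaredValueType.ID_ATTR,
    "IDREF": DeclaredValueType.IDREF_ATTR,
    "IDREFS": DeclaredValueType.IDREFS_ATTR,
    "ENTITY": DeclaredValueType.ENTITY_ATTR,
    "ENTITIES": DeclaredValueType.ENTITIES_ATTR,
    "NMTOKEN": DeclaredValueType.NMTOKEN_ATTR,
    "NMTOKENS": DeclaredValueType.NMTOKENS_ATTR,
    "NOTATION": DeclaredValueType.NOTATION_ATTR,
}


def declared_type_from_dtd(att_type: Optional[str]) -> DeclaredValueType:
    """Map attribute type text such as 'ID', '(a|b)' or
    'NOTATION (gif|png)' to a constant (hypothetical helper)."""
    if att_type is None:
        # No type information is available at all.
        return DeclaredValueType.NO_TYPE_ATTR
    text = att_type.strip()
    if text.startswith("("):
        # A bare token group is an enumeration.
        return DeclaredValueType.ENUMERATION_ATTR
    # Otherwise the first word is the type keyword.
    return _KEYWORDS[text.split()[0]]
```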

allowedTokens of type DOMString[]

The list of allowed attribute values.

On getting, the attribute MUST return the DOMStringList object associated to this attribute. The object MAY contain zero or more ordered strings, each consisting of zero or more characters, possibly with duplications.

If the AttributeDefinition node is created during the process to create a DOM from an XML document, the object MUST contain the names or name tokens allowed for the attribute defined by the node. If the document is well-formed, the object will be empty unless the declaredType is ENUMERATION_ATTR or NOTATION_ATTR.

If the declaredType is different from ENUMERATION_ATTR or NOTATION_ATTR, this attribute MUST be ignored for the purpose of serializing into (part of) XML document.

When serializing the node it should be noted that the object might be empty, might contain duplications, and might contain strings that are not names or name tokens.

defaultType of type unsigned short

The type of the default for the attribute. It is expected that this attribute contains a value from the definition group DefaultValueType.

On getting, the attribute MUST return the value associated to this attribute.

On setting, it MUST set the specified value as the value associated to this attribute.

If the AttributeDefinition node is created during the process to create a DOM from an XML document, an appropriate value from the DefaultValueType definition group MUST be set to the attribute.

The definition group DefaultValueType contains integers indicating the type of the default for the attribute. The definition group contains the following constants:

Name              Value  Description
UNKNOWN_DEFAULT   0      Unknown.
FIXED_DEFAULT     1      Provided explicitly, and only that value is allowed [XML, XML11].
REQUIRED_DEFAULT  2      No default value, and the attribute has to be explicitly specified.
IMPLIED_DEFAULT   3      Implied [XML, XML11].
EXPLICIT_DEFAULT  4      Provided explicitly.

If the source of the default type does not distinguish implied and unknown default types, then the value IMPLIED_DEFAULT MUST be used.

An AttributeDefinition node created by the createAttributeDefinition method has its defaultType attribute set to UNKNOWN_DEFAULT.
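
Similarly, the following sketch shows how the default declaration of a DTD attribute definition might be mapped to a defaultType constant and a default value. The helper name and its input format are hypothetical, and the default value is taken as the literal text between the quotes, without the attribute value normalization a real parser would apply.

```python
from enum import IntEnum
from typing import Optional, Tuple


class DefaultValueType(IntEnum):
    UNKNOWN_DEFAULT = 0
    FIXED_DEFAULT = 1
    REQUIRED_DEFAULT = 2
    IMPLIED_DEFAULT = 3
    EXPLICIT_DEFAULT = 4


def default_from_dtd(default_decl: Optional[str]) -> Tuple[DefaultValueType, str]:
    """Map default declaration text such as '#REQUIRED' or
    '#FIXED "1.0"' to (default type, default value)."""
    if default_decl is None:
        # Nothing is known about the default.
        return DefaultValueType.UNKNOWN_DEFAULT, ""
    text = default_decl.strip()
    if text == "#REQUIRED":
        return DefaultValueType.REQUIRED_DEFAULT, ""
    if text == "#IMPLIED":
        return DefaultValueType.IMPLIED_DEFAULT, ""
    default_type = DefaultValueType.EXPLICIT_DEFAULT
    if text.startswith("#FIXED"):
        default_type = DefaultValueType.FIXED_DEFAULT
        text = text[len("#FIXED"):].strip()
    # What remains has to be a quoted attribute value literal.
    if len(text) >= 2 and text[0] in "\"'" and text[-1] == text[0]:
        return default_type, text[1:-1]
    raise ValueError("not a default declaration: " + default_decl)
```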

4.6 General entities

The Entity node is known as general entity, or when not ambiguous, simply entity.

Each entity has an associated public ID and system ID. Unless otherwise specified, their values are the empty string when the entity is created.

partial interface Entity {
  readonly attribute DocumentType? ownerDocumentTypeDefinition;

  [TreatNullAs=EmptyString] attribute DOMString publicId;
  [TreatNullAs=EmptyString] attribute DOMString systemId;
  attribute DOMString? declarationBaseURI;
  attribute DOMString? manakaiDeclarationBaseURI;
  attribute DOMString? notationName;
  readonly attribute DOMString hasReplacementTree;
  attribute DOMString? manakaiEntityURI;
  attribute DOMString? manakaiEntityBaseURI;
  attribute boolean isExternallyDeclared;
};

Each entity has an associated notation name. Unless otherwise specified, it is null when the entity is created.

The notationName attribute returns the notation name of the context object.

On setting, the notationName attribute of the Entity interface MUST set the notation name of the context object to the new value.

In DOM3 Core specification, this attribute was read-only.

Setting an invalid name to this attribute would make the node unserializable in the XML syntax.

Each entity has an associated replacement text. Unless otherwise specified, the replacement text is the empty string when an entity is created.

isExternallyDeclared of type boolean

Whether the entity is declared by an external entity declaration or not. If the value is true, the entity is declared in an entity declaration in the external subset entity or in an external parameter entity. If the value is false, the entity is declared in an entity declaration in the internal subset, or the node is created in memory.

On getting, the attribute MUST return the value associated to this attribute.

On setting, it MUST set the specified value as the value associated to this attribute.

If the Entity node is created during the process to create a DOM from an XML document, the following requirements apply: If the entity is an unparsed entity, then the attribute MUST be set to false. Otherwise (i.e. the entity is a parsed entity), the attribute MUST be set to true if the entity is declared by an external markup declaration, and to false otherwise.


An entity has an associated entity URL and entity base URL. Unless otherwise specified, their values are null.

On getting, the manakaiEntityURI attribute of the Entity interface MUST run these steps:

  1. If the entity URL of the context object is not null, return it and abort these steps.
  2. Otherwise, if the system ID of the context object is not the empty string, resolve it relative to the effective declaration base URL of the context object. If it succeeded, return the result and abort these steps.
  3. Return null.

On setting, the attribute MUST run these steps:

  1. If the new value is null, set the entity URL of the context object to null and abort these steps.
  2. Resolve the new value relative to the effective declaration base URL of the context object. Set the entity URL of the context object to the result if succeeded, or null otherwise.

It is expected that an XML parser supporting this specification sets the entity URL when an external entity is read.
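The getter and setter steps for manakaiEntityURI can be sketched as follows. This is a minimal illustration, not the manakai implementation: the class EntitySketch and its field names are hypothetical, URL resolution is approximated with Python's urllib.parse.urljoin, and a resolution failure is assumed to surface as a ValueError.

```python
from typing import Optional
from urllib.parse import urljoin


def resolve(base: str, reference: str) -> Optional[str]:
    """Resolve reference against base; None models a failed resolution."""
    try:
        return urljoin(base, reference)
    except ValueError:
        return None


class EntitySketch:
    """Hypothetical stand-in for an Entity node (not the manakai API)."""

    def __init__(self, base_url: str, system_id: str = "") -> None:
        self.base_url = base_url                      # base URL of the node
        self.system_id = system_id                    # empty string by default
        self.declaration_base_url: Optional[str] = None
        self._entity_url: Optional[str] = None        # null by default

    def effective_declaration_base_url(self) -> str:
        if self.declaration_base_url is not None:
            return self.declaration_base_url
        return self.base_url

    @property
    def entity_uri(self) -> Optional[str]:
        # Step 1: an explicitly set entity URL wins.
        if self._entity_url is not None:
            return self._entity_url
        # Step 2: otherwise derive it from a non-empty system ID.
        if self.system_id != "":
            return resolve(self.effective_declaration_base_url(), self.system_id)
        # Step 3: nothing to return.
        return None

    @entity_uri.setter
    def entity_uri(self, value: Optional[str]) -> None:
        if value is None:
            self._entity_url = None
        else:
            self._entity_url = resolve(self.effective_declaration_base_url(), value)
```

Setting the attribute to null does not necessarily make the getter return null: the getter falls back to the system ID whenever it is not the empty string.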

4.7 Notations

The Notation node is simply known as notation.

Each notation has an associated public ID and system ID. Unless otherwise specified, their values are the empty string when the notation is created.

partial interface Notation {
  readonly attribute DocumentType? ownerDocumentTypeDefinition;
  [TreatNullAs=EmptyString] attribute DOMString publicId;
  [TreatNullAs=EmptyString] attribute DOMString systemId;
  attribute DOMString? declarationBaseURI;
  attribute DOMString? manakaiDeclarationBaseURI;
};

5 Base URLs

To determine the base URL of a node node, these steps MUST be followed:

  1. If node is a Document, return the document base URL of node and abort these steps.
  2. If node is the base attribute in the XML namespace:
    If the owner element of node is null
    If the parent of the owner element of node is null
    Return the base URL of the node document of node and abort these steps.
    Otherwise
    Return the base URL of the parent of the owner element of node and abort these steps.
  3. Otherwise, if node is an attribute:
    If the owner element of node is null
    Return the base URL of the node document of node and abort these steps.
    Otherwise
    Return the base URL of the owner element of node and abort these steps.
  4. If node is an element and it has the base attribute in the XML namespace, resolve the value of the attribute relative to the base URL of the attribute. If it succeeded, return the result and abort these steps.
  5. If the parent of node is not null, return the base URL of the parent and abort these steps.
  6. Return the base URL of the node document of node and abort these steps.

These steps ignore external entity boundaries, which is a willful violation of the XML Base specification [XMLBASE].
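The steps to determine the base URL of a node can be sketched as below. The Node dataclass is a hypothetical, heavily reduced model (only documents, elements, and attributes), and urllib.parse.urljoin is used as an approximation of URL resolution, with ValueError assumed to signal a failed resolution.

```python
from dataclasses import dataclass, field
from typing import List, Optional
from urllib.parse import urljoin

XML_NS = "http://www.w3.org/XML/1998/namespace"


@dataclass
class Node:
    """Hypothetical reduced node model; kind is 'document', 'element' or 'attribute'."""
    kind: str
    document: Optional["Node"] = None     # node document
    parent: Optional["Node"] = None       # parent node (elements)
    owner: Optional["Node"] = None        # owner element (attributes)
    namespace: Optional[str] = None
    local_name: str = ""
    value: str = ""
    document_base_url: str = ""
    attributes: List["Node"] = field(default_factory=list)


def is_xml_base(attr: Node) -> bool:
    return attr.namespace == XML_NS and attr.local_name == "base"


def base_url(node: Node) -> str:
    # Step 1: a document returns its document base URL.
    if node.kind == "document":
        return node.document_base_url
    if node.kind == "attribute":
        if node.owner is None:
            return base_url(node.document)
        if is_xml_base(node):
            # Step 2: xml:base is resolved against the owner element's parent.
            if node.owner.parent is None:
                return base_url(node.document)
            return base_url(node.owner.parent)
        # Step 3: any other attribute uses its owner element's base URL.
        return base_url(node.owner)
    # Step 4: an element with xml:base resolves the attribute value.
    for attr in node.attributes:
        if is_xml_base(attr):
            try:
                return urljoin(base_url(attr), attr.value)
            except ValueError:
                break  # resolution failed; fall through to step 5
    # Steps 5 and 6: inherit from the parent, then from the node document.
    if node.parent is not None:
        return base_url(node.parent)
    return base_url(node.document)
```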


On getting, the declarationBaseURI and manakaiDeclarationBaseURI attributes of the DocumentType interface MUST return the base URL of the context object. On setting, they MUST do nothing.

Each entity or notation has an associated declaration base URL. Its default value is null.

On getting, the declarationBaseURI and manakaiDeclarationBaseURI attributes of the Entity or Notation interface MUST return the effective declaration base URL, i.e. the declaration base URL of the context object if it is not null, or the base URL of the context object otherwise.

On setting, they MUST run these steps:

  1. If the new value is null, set the declaration base URL of the context object to null and abort these steps.
  2. Otherwise, resolve the new value relative to the base URL of the context object. Set the declaration base URL of the context object to the result, if it succeeded, or null, otherwise.

partial interface Document {
  attribute DOMString? manakaiEntityBaseURI;
};

A document, like an entity, has an entity base URL, initially null.

On getting, the manakaiEntityBaseURI attribute of the Document or Entity interface MUST run these steps:

  1. If the entity base URL of the context object is not null, return it and abort these steps.
  2. Otherwise, if the context object is an Entity, return the manakaiEntityURI attribute value of the context object if it is not null.
  3. Otherwise, return the base URL of the context object.

On setting, the attribute MUST run these steps:

  1. If the new value is null, set the entity base URL of the context object to null and abort these steps.
  2. Resolve the new value relative to the manakaiEntityURI attribute value, if the context object is an Entity and it is not null, or the base URL otherwise, of the context object. Set the entity base URL of the context object to the result if succeeded, or null otherwise.

It is expected that an XML parser supporting this specification sets the entity base URL when a document or external entity is read. If the entity is retrieved using HTTP, the Content-Base: and Content-Location: header fields might affect the value. Note also that they do not affect the value of the document base URL of the document.

6 XML namespaces

This section is non-normative.

This section describes how to implement namespace fixup.

6.1 Namespace mappings

A namespace mapping is a data structure with the following fields:

Default namespace
Either a namespace URL, null, or missing. By default, its value is missing.
Prefix to URL
An ordered list of key/value pairs, where keys are namespace prefixes and values are namespace URLs or none. By default, the list contains two pairs: (xml, the XML namespace) and (xmlns, the XMLNS namespace).

There will be at most one pair with the same key in the prefix to URL list. However, there might be more than one pair with the same value in the list.

A default namespace declaration with namespace URL url is an attribute whose namespace is the XMLNS namespace, local name is xmlns, and value is url.

A namespace prefix declaration with prefix prefix and namespace URL url is an attribute whose namespace is the XMLNS namespace, namespace prefix is xmlns, local name is prefix, and value is url.
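One possible in-memory shape for these definitions is sketched below. All names (NamespaceMapping, MISSING, the helper functions, and the dictionary keys used for attributes) are illustrative assumptions. A Python dict is used for the prefix to URL list because it preserves insertion order and keeps keys unique; None stands for the value none.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

XML_NS = "http://www.w3.org/XML/1998/namespace"
XMLNS_NS = "http://www.w3.org/2000/xmlns/"

# Sentinel for the "missing" state of the default namespace,
# which is distinct from null (None).
MISSING = object()


def _initial_prefixes() -> Dict[str, Optional[str]]:
    return {"xml": XML_NS, "xmlns": XMLNS_NS}


@dataclass
class NamespaceMapping:
    default_namespace: object = MISSING
    prefix_to_url: Dict[str, Optional[str]] = field(default_factory=_initial_prefixes)


def default_namespace_declaration(url: str) -> dict:
    """Attribute in the XMLNS namespace with local name xmlns."""
    return {"namespace": XMLNS_NS, "prefix": None, "local_name": "xmlns", "value": url}


def namespace_prefix_declaration(prefix: str, url: str) -> dict:
    """Attribute in the XMLNS namespace with prefix xmlns and local name prefix."""
    return {"namespace": XMLNS_NS, "prefix": "xmlns", "local_name": prefix, "value": url}
```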

6.2 Namespace fixup of an XML element for serialization

The steps for namespace fixup of an XML element for serialization, with an element element and optional namespace mapping nsmap, are as follows:

  1. If nsmap is not specified, let nsmap be a new namespace mapping.
  2. Otherwise, let nsmap be a deep clone of nsmap.
  3. Let attributes be the list containing the attributes in the attribute list of element in the same order.
  4. For each attribute attr in attributes, in order:
    1. If the namespace of attr is not the XMLNS namespace, skip the rest of these substeps and continue with any next attribute.
    2. Let local name be the local name of attr.
    3. Let value be the value of attr.
    4. If local name is xml, or if value is the XML namespace or the XMLNS namespace, do nothing.
    5. Otherwise, if local name is xmlns:
      1. If value is the empty string, set the default namespace of nsmap to null.
      2. Otherwise, set the default namespace of nsmap to value.
    6. Otherwise:
      1. Delete the key/value pair whose key is equal to local name, if any, from the prefix to URL list of nsmap.
      2. If value is the empty string, append key/value pair (local name, none) to the prefix to URL list of nsmap.
      3. Otherwise, append key/value pair (local name, value) to the prefix to URL list of nsmap.
  5. Let tag name and new attr be the values returned by the steps to obtain a QName with the namespace, namespace prefix, and local name of element, attributes, and nsmap and with the is element flag.
  6. If new attr is not null, prepend it to attributes.
  7. Let attribute specifications be the empty list.
  8. For each attribute attr in attributes, in order:
    1. Let attr name and new attr be the values returned by the steps to obtain a QName with the namespace, namespace prefix, and local name of attr, attributes, and nsmap (without the is element flag).
    2. If new attr is not null, insert it into attributes just after attr.
    3. Append attribute name/value pair (attr name, the value of attr) to attribute specifications.
  9. Return tag name, attribute specifications, and nsmap.

These steps return three values. The first value, tag name, is used as the tag name in the start and end tag of the serialization of element. The second value, attribute specifications, contains zero or more pairs of attribute names and values, which are used in the start tag of the serialization of element. The third value, nsmap, is used as the input to the steps for serialization of the child elements of element.
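Steps 1 to 4 of the algorithm above, which clone the mapping and fold the element's existing namespace declarations into it, might look like the following sketch. The representation is an assumption made for illustration: attributes are (namespace, local name, value) tuples, nsmap is a plain dict with the hypothetical keys "default" and "prefixes", and MISSING is a sentinel for the missing default namespace. Steps 5 to 9 are not covered.

```python
XML_NS = "http://www.w3.org/XML/1998/namespace"
XMLNS_NS = "http://www.w3.org/2000/xmlns/"
MISSING = object()  # sentinel: default namespace is "missing"


def scan_declarations(attributes, nsmap=None):
    """Return a new mapping updated from the xmlns declarations in attributes."""
    if nsmap is None:
        # Step 1: start from a fresh namespace mapping.
        nsmap = {"default": MISSING, "prefixes": {"xml": XML_NS, "xmlns": XMLNS_NS}}
    else:
        # Step 2: clone, so that the caller's mapping is left untouched.
        nsmap = {"default": nsmap["default"], "prefixes": dict(nsmap["prefixes"])}
    for namespace, local_name, value in attributes:
        if namespace != XMLNS_NS:
            continue  # not a namespace declaration
        if local_name == "xml" or value in (XML_NS, XMLNS_NS):
            continue  # reserved bindings are ignored
        if local_name == "xmlns":
            nsmap["default"] = None if value == "" else value
        else:
            # Delete, then append: the pair moves to the end of the list.
            nsmap["prefixes"].pop(local_name, None)
            nsmap["prefixes"][local_name] = None if value == "" else value
    return nsmap
```

Because a redeclared prefix is removed before being appended, the most recent declaration is always the last pair, which matters when the steps to obtain a QName pick the last pair with a matching value.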

The steps to obtain a QName, with parameters namespace URL, prefix, local name, attributes, and nsmap, where attributes and nsmap are shared with the callee such that any modification to them is also reflected to those variables in the steps which invoked these steps, and the is element flag, are as follows:

  1. Let qname be local name.
  2. Let new attr be null.
  3. If namespace URL is null:
    1. If the is element flag is not set, do nothing.
    2. Otherwise, if the default namespace of nsmap is not null:
      1. Set the default namespace of nsmap to null.
      2. If attributes contains an attribute attr whose namespace is the XMLNS namespace and local name is xmlns:
        1. If the value of attr is not the empty string, replace attr in attributes by a new default namespace declaration with namespace URL the empty string, preserving the order.
      3. Otherwise, let new attr be a new default namespace declaration with namespace URL the empty string.
  4. Otherwise, if namespace URL is not null:
    1. If namespace URL is the XML namespace, let prefix be xml.
    2. Otherwise, if namespace URL is the XMLNS namespace, let prefix be xmlns.
    3. If namespace URL is the XMLNS namespace and qname is xmlns, do nothing.
    4. Otherwise, if the is element flag is set, prefix is null, and the default namespace of nsmap is missing:
      1. Set the default namespace of nsmap to namespace URL.
      2. Let new attr be a new default namespace declaration with namespace URL namespace URL.
    5. Otherwise, if the is element flag is set, prefix is null, and the default namespace of nsmap is equal to namespace URL, do nothing.
    6. Otherwise, if prefix is not null and there is the key/value pair whose key is prefix and value is namespace URL in the prefix to URL list of nsmap, prepend prefix followed by a U+003A COLON character (:) to qname.
    7. Otherwise, if prefix is not null and there is no key/value pair whose key is prefix in the prefix to URL list of nsmap:
      1. Append key/value pair (prefix, namespace URL) to the prefix to URL list of nsmap.
      2. Let new attr be a new namespace prefix declaration with prefix prefix and namespace URL namespace URL.
      3. Prepend prefix followed by a U+003A COLON character (:) to qname.
    8. Otherwise, if there is the key/value pair whose value is equal to namespace URL in the prefix to URL list of nsmap:
      1. Let prefix be the key of the last key/value pair whose value is equal to namespace URL in the prefix to URL list of nsmap.
      2. Prepend prefix followed by a U+003A COLON character (:) to qname.
    9. Otherwise, if the is element flag is set and the default namespace of nsmap is equal to namespace URL, do nothing.
    10. Otherwise, if the is element flag is set, prefix is null, and attributes does not contain an attribute whose namespace is the XMLNS namespace and local name is xmlns:
      1. Set the default namespace of nsmap to namespace URL.
      2. Let new attr be a new default namespace declaration with namespace URL namespace URL.
    11. Otherwise:
      1. Let prefix be the character a followed by the integer n (for example, a0), where n is the minimum non-negative integer such that the resulting string is not found as a key of the prefix to URL list of nsmap.
      2. Append key/value pair (prefix, namespace URL) to the prefix to URL list of nsmap.
      3. Let new attr be a new namespace prefix declaration with prefix prefix and namespace URL namespace URL.
      4. Prepend prefix followed by a U+003A COLON character (:) to qname.
  5. Return qname and new attr.

These steps are based on, but not equivalent to, Gecko's implementation at the time of writing.

These steps do not prevent an unserializable set of tag name and attributes from being returned when, for example, there is an xmlns attribute in null namespace, there is an xml attribute in the XMLNS namespace whose value is not equal to the XML namespace, or the given nsmap contains a key which is not a valid XML NCName.
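The steps to obtain a QName can be transcribed roughly as follows. This is a sketch under several assumptions rather than a reference implementation: attributes are (namespace, prefix, local name, value) tuples in a mutable list, nsmap is a dict with the hypothetical keys "default" and "prefixes", MISSING is a sentinel for the missing default namespace, and the condition "is not null" in step 3.2 is read literally, so a missing default namespace also counts as not null.

```python
XML_NS = "http://www.w3.org/XML/1998/namespace"
XMLNS_NS = "http://www.w3.org/2000/xmlns/"
MISSING = object()  # sentinel: default namespace is "missing"


def new_nsmap():
    return {"default": MISSING, "prefixes": {"xml": XML_NS, "xmlns": XMLNS_NS}}


def obtain_qname(ns, prefix, local_name, attributes, nsmap, is_element=False):
    """Return (qname, new_attr); attributes and nsmap may be modified in place."""
    prefixes = nsmap["prefixes"]
    qname, new_attr, use_prefix = local_name, None, False
    # Index of an existing xmlns="..." declaration in attributes, if any.
    xmlns_index = next((i for i, a in enumerate(attributes)
                        if a[0] == XMLNS_NS and a[2] == "xmlns"), None)
    if ns is None:
        # Assumption: MISSING is treated as "not null" (literal reading).
        if is_element and nsmap["default"] is not None:
            nsmap["default"] = None
            if xmlns_index is None:
                new_attr = (XMLNS_NS, None, "xmlns", "")
            elif attributes[xmlns_index][3] != "":
                attributes[xmlns_index] = (XMLNS_NS, None, "xmlns", "")
        return qname, new_attr
    if ns == XML_NS:
        prefix = "xml"
    elif ns == XMLNS_NS:
        prefix = "xmlns"
    if ns == XMLNS_NS and qname == "xmlns":
        pass                                       # step 4.3
    elif is_element and prefix is None and nsmap["default"] is MISSING:
        nsmap["default"] = ns                      # step 4.4
        new_attr = (XMLNS_NS, None, "xmlns", ns)
    elif is_element and prefix is None and nsmap["default"] == ns:
        pass                                       # step 4.5
    elif prefix is not None and prefixes.get(prefix) == ns:
        use_prefix = True                          # step 4.6
    elif prefix is not None and prefix not in prefixes:
        prefixes[prefix] = ns                      # step 4.7
        new_attr = (XMLNS_NS, "xmlns", prefix, ns)
        use_prefix = True
    elif ns in prefixes.values():
        # Step 4.8: reuse the last declared prefix bound to this URL.
        prefix = [k for k, v in prefixes.items() if v == ns][-1]
        use_prefix = True
    elif is_element and nsmap["default"] == ns:
        pass                                       # step 4.9
    elif is_element and prefix is None and xmlns_index is None:
        nsmap["default"] = ns                      # step 4.10
        new_attr = (XMLNS_NS, None, "xmlns", ns)
    else:
        n = 0                                      # step 4.11: generate a0, a1, ...
        while f"a{n}" in prefixes:
            n += 1
        prefix = f"a{n}"
        prefixes[prefix] = ns
        new_attr = (XMLNS_NS, "xmlns", prefix, ns)
        use_prefix = True
    if use_prefix:
        qname = f"{prefix}:{qname}"
    return qname, new_attr
```

For example, with a fresh mapping, an unprefixed element in some namespace yields its local name together with a new default namespace declaration, while an unprefixed attribute in an undeclared namespace receives a generated prefix such as a0.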

6.3 Tests

There is no comprehensive test suite of namespace fixup at the time of writing.

There are some tests at: <https://suika.suikawiki.org/~wakaba/test/web/dom/element/inner-html/> and <https://github.com/manakai/perl-web-markup/blob/master/t/modules/Web-XML-Serializer.t>.

7 Conformance checking of XML documents

If there is a parse error, the document is not well-formed.

...

Much of invalid (well-formed or not) XML document parsing and XML document / XML DOM conformance is left undefined so that this document provides a guideline for conformance checkers.

7.1 Processing Model

Conceptually, validation of an XML document is split into two stages for the purpose of this specification: the XML document parsing stage and the DOM XML conformance checking stage.

The input to the XML document parsing stage is a byte sequence representing the parsed XML document (and any additional metadata), and the outputs are a DOM tree representing the XML document and zero or more errors. The processor that implements this stage is called a parser. Requirements for a parser are defined in the section of Parsing an XML Document.

The input to the DOM XML conformance checking stage is a DOM tree, and the output is zero or more errors. The processor that implements this stage is called a conformance checker. Requirements for a conformance checker are defined in the section of Checking an XML DOM Tree.

7.1.1 Error Classification

An error is ...

If a Document node has no xml-well-formedness-error, entity-error, and unknown-error, then it is well-formed. If a well-formed Document node has no xml-validity-error, it is valid.
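As an illustration, the two definitions above reduce to a check over the set of error categories reported for a Document node. The function name classify and the representation of errors as category strings are assumptions of this sketch.

```python
# Categories that prevent a Document node from being well-formed.
NOT_WELL_FORMED = {"xml-well-formedness-error", "entity-error", "unknown-error"}


def classify(error_categories):
    """Return (well_formed, valid) for the categories raised on a Document node."""
    categories = set(error_categories)
    well_formed = not (categories & NOT_WELL_FORMED)
    # Validity is only defined for well-formed Document nodes.
    valid = well_formed and "xml-validity-error" not in categories
    return well_formed, valid
```

Other categories, such as round-trip-warning or misc-info, affect neither result.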

A well‐formed Document can be safely serialized into a well‐formed XML document. A valid Document can be easily serialized into a valid XML document.

To be a conforming validating XML processor, ...

Errors are classified into these error categories:

entity-error

@@

This algorithm does not support a DOM tree with one or more EntityReference nodes. It is expected that any entity references are expanded at parse time, and that any entity reference that cannot be expanded raises a parse-time error, so that parsing never results in a DOM tree with EntityReference nodes.

round-trip-error
@@
round-trip-warning

A round-trip-warning will be raised when a construct, which might not be restored to the same construct when it is serialized and then re-parsed by a conforming processor, is encountered.

For a Comment node a round-trip-warning will be raised, since XML processors are not required to report the text of comments to applications.

unknown-error?
@@
xml-misc-error
An XML error (XML 1.0 [XML] error / XML 1.1 [XML11] error) that is not classified to any other error category.
xml-misc-fatal-error
An XML fatal error (XML 1.0 [XML] fatal error / XML 1.1 [XML11] fatal error) that is not classified to any other error category. @@ What errors fall into this category?
xml-misc-recommendation

An xml-misc-recommendation will be raised if a SHOULD‐level requirement in XML specification is not met.

xml-validity-error
A violation of validity constraint in XML document.
xml-well-formedness-error

If an xml-well-formedness-error is raised, it would not be possible to generate an XML serialization that matches the appropriate production rule and that does not violate any well‐formedness constraint in the XML specification [XML, XML11].

misc-info

A misc-info is raised when some status information on the parsing or checking process that is considered useful for debugging and so on is available. It by no means implies the non-conformance of the document.

@@ TODO: #dt-atuseroption at user option (MAY or MUST), #dt-compat for compatibility, #dt-interop for interoperability

TODO: XML 1.1, XML Namespace 1.0/1.1, xml:base, xml:id

TODO: XML "error"/"fatal error" is not always non-conforming (only when MUST or SHOULD).

7.2 Checking an XML DOM Tree

The following algorithms and definitions are applied to XML documents; especially, they are not applied to HTML documents.

7.2.1 Definitions

The XML version of a node is the XML version of the document to which the node belongs. For a Document node, the XML version of the document is the value of the xmlVersion attribute of the node. For a DocumentType node whose ownerDocument attribute is set to null, the XML version of the document is 1.0. For any other node, the XML version of the document is that of the Document node contained in the ownerDocument attribute of the node.
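The definition can be read as a three-way case analysis. In the sketch below, NodeSketch is a hypothetical reduced node type carrying only the fields the definition mentions; nodes other than Document and DocumentType are assumed to always have an owner document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeSketch:
    """Hypothetical node model with only the fields used by the definition."""
    node_type: str                                   # e.g. "Document", "Element"
    owner_document: Optional["NodeSketch"] = None    # ownerDocument
    xml_version: Optional[str] = None                # xmlVersion (Document only)


def xml_version_of(node: NodeSketch) -> str:
    if node.node_type == "Document":
        return node.xml_version
    if node.node_type == "DocumentType" and node.owner_document is None:
        return "1.0"
    # Any other node inherits the version of its owner document.
    return node.owner_document.xml_version
```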

7.2.2 Conformance Checking Algorithms for Components

To validate an XML string (s), the following algorithm MUST be used:

  1. If s contains a character that is not in the character class Char10, then raise an xml-well-formedness-error.
  2. If s contains a character that is in the character class CompatChar10, then raise an xml-misc-warning.
  3. If s contains a character that is in the character class ControlChar10, then raise an xml-misc-warning.
  4. @@ XML 1.1 support
  5. If s contains a U+000D CARRIAGE RETURN character, then raise a round-trip-error. @@ We should not raise duplicate errors for U+000D in attribute values. In addition, we should support a mode where U+000D will be serialized as a character reference (so that no round-trip-error will be raised).
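A partial sketch of these steps follows. It covers steps 1, 3 and 5 only: the CompatChar10 check (step 2) is omitted because that class is not enumerated in this document, and XML 1.1 support (step 4) is not attempted. The function names, and the choice to return a list of error category names rather than raising, are assumptions.

```python
def in_char10(cp: int) -> bool:
    """Character class Char10, as listed in the Character Classes section."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def in_control_char10(cp: int) -> bool:
    """Character class ControlChar10, as listed in the Character Classes section."""
    return (0x7F <= cp <= 0x84
            or 0x86 <= cp <= 0x9F
            or 0xFDD0 <= cp <= 0xFDEF
            # Last two code points of planes 1 to 16 (U+1FFFE .. U+10FFFF).
            or (cp >= 0x1FFFE and (cp & 0xFFFF) >= 0xFFFE))


def validate_xml_string(s: str) -> list:
    """Return the error categories raised for s (steps 1, 3 and 5 only)."""
    errors = []
    code_points = [ord(ch) for ch in s]
    if any(not in_char10(cp) for cp in code_points):
        errors.append("xml-well-formedness-error")
    if any(in_control_char10(cp) for cp in code_points):
        errors.append("xml-misc-warning")
    if "\r" in s:
        errors.append("round-trip-error")
    return errors
```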

To validate a Name (s), the following algorithm MUST be used:

  1. If s is an empty string, then raise an xml-well-formedness-error. Abort these steps.
  2. Validate s as an XML string.
  3. If the first character in s is a character that is not in the character class NameStartChar10, then raise an xml-well-formedness-error.
  4. If a character other than the first character in s is a character that is not in the character class NameChar10, then raise an xml-well-formedness-error.
  5. If s begins with the string xml (in any case combination), then raise an xml-misc-warning. @@ except for attribute names xml:lang, xml:space.
  6. @@ XML 1.1 support

To validate an NCName (s), the following algorithm MUST be used:

  1. Validate s as a Name.
  2. @@

To validate a public identifier (pid), the following algorithm MUST be used:

  1. If pid is null, abort these steps.
  2. If pid contains a character that is not in the character class PubidChar, then raise an xml-well-formedness-error.
  3. If pid contains one of U+0009 CHARACTER TABULATION, U+000A LINE FEED, and U+000D CARRIAGE RETURN characters, if the first character of pid is U+0020 SPACE character, if the last character of pid is U+0020 SPACE character, or if there is a U+0020 SPACE character immediately followed by another U+0020 SPACE character in pid, then it is a round-trip-error. Is this really a round-trip problem? In fact, the XML specification only defines how to match public identifiers; it defines no canonical form.
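These three steps might be sketched as follows. The character set is taken from the PubidChar list in the Character Classes section of this document (which includes U+0009); the function name and the list-of-categories return value are assumptions.

```python
import re

# Characters of the PubidChar class as enumerated in this document.
_PUBID_ONLY = re.compile(r"[\t\n\r !#$%'()*+,\-./0-9:;=?@A-Z_a-z]*")
_TAB_OR_NEWLINE = re.compile(r"[\t\n\r]")


def validate_public_id(pid):
    """Return the error categories raised for public identifier pid."""
    if pid is None:
        return []                      # step 1: nothing to check
    errors = []
    if _PUBID_ONLY.fullmatch(pid) is None:
        errors.append("xml-well-formedness-error")      # step 2
    if (_TAB_OR_NEWLINE.search(pid)
            or pid.startswith(" ")
            or pid.endswith(" ")
            or "  " in pid):
        # Step 3: whitespace that normalization would not preserve.
        errors.append("round-trip-error")
    return errors
```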

To validate a system identifier (sid), the following algorithm MUST be used:

  1. If sid is null, abort these steps.
  2. Validate sid as an XML string.
  3. If sid contains both U+0022 QUOTATION MARK (") and U+0027 APOSTROPHE (') characters, raise an xml-well-formedness-error.
  4. If sid contains at least one U+0023 NUMBER SIGN (#) character, then raise an xml-misc-error.
  5. @@ If sid cannot be converted to a URI reference, then raise a fact-level error (xml-misc-warning?).
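Steps 1 to 4 can be sketched as below. To keep the example self-contained, step 2 is abbreviated to the Char10 membership test only, and the unresolved step 5 is left out. The function name and the list-of-categories return value are assumptions.

```python
def _in_char10(cp: int) -> bool:
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def validate_system_id(sid):
    """Return the error categories raised for system identifier sid."""
    if sid is None:
        return []                                   # step 1
    errors = []
    # Step 2 (abbreviated): only the Char10 test of XML string validation.
    if any(not _in_char10(ord(ch)) for ch in sid):
        errors.append("xml-well-formedness-error")
    # Step 3: no quoting style can delimit a literal containing both quotes.
    if '"' in sid and "'" in sid:
        errors.append("xml-well-formedness-error")
    # Step 4: a fragment identifier is an XML error.
    if "#" in sid:
        errors.append("xml-misc-error")
    return errors
```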

7.2.3 Checking Node

The algorithm to check a node (n) is defined as follows:

If n is an Attr node
  1. Validate the localName attribute value as an NCName.
  2. If the prefix attribute value is different from null, then validate the prefix attribute value as an NCName.
  3. For each node nc in the childNodes list of n,
    1. If nc is not a Text or EntityReference node, then it is an xml-well-formedness-error.
    2. Otherwise, if nc is an EntityReference node, then it is an entity-error.
    3. Otherwise, check nc recursively.
  4. If nodeName attribute of n is xml:space @@ or {xml namespace}:space ? and value attribute of n is neither default nor preserve, then it is an xml-misc-error.
  5. @@ xml:lang value is not a language tag [RFC 3066 or its successor] or an empty string, then xml-misc-warning (a "fact"-level error; not an XML error).
  6. @@ specified, manakaiAttributeType (#ValueType Validity constraint: Attribute Value Type)
  7. Let v be the value of the attribute value of n.
  8. Validate n against the declared type as follows:
    ID_ATTR
    1. Validate v as a Name. If it fails, then raise an xml-validity-error.
    2. If ID v is defined, then raise an xml-validity-error.
    IDREF_ATTR
    1. Validate v as a Name. If it fails, then raise an xml-validity-error.
    2. If ID v is NOT defined, then raise an xml-validity-error.
    IDREFS_ATTR
    @@
    ENTITY_ATTR
    1. Validate v as a Name. If it fails, then raise an xml-validity-error.
    2. If Entity v is NOT defined, then raise an xml-validity-error.
    ENTITIES_ATTR
    @@
    NMTOKEN_ATTR
    1. Validate v as an Nmtoken. If it fails, then raise an xml-validity-error.
    NMTOKENS_ATTR
    @@
    NOTATION_ATTR
    v must be one of enumerated values. If not, then raise an xml-validity-error.
    ENUMERATED_ATTR
    v must be one of enumerated values. If not, then raise an xml-validity-error.
    @@
  9. If the declared type is ID and the default type is neither #IMPLIED nor #REQUIRED, then raise an xml-validity-error.
  10. @@ #FixedAttr Validity constraint: Fixed Attribute Default
  11. @@ strict serialization error for U+000D, U+000A, and U+0009 characters, leading/trailing U+0020, and U+0020{2,} string?
If n is an AttributeDefinition node
  1. If nodeName attribute of n is xml:space @@ or {xml namespace}:space ? and its declared type is different from (default|preserve), (preserve|default), (default), or (preserve), then raise an xml-misc-error.
  2. For each node nc in the childNodes list of n,
    1. If nc is not a Text or EntityReference node, then it is an xml-well-formedness-error.
    2. Otherwise, if nc is an EntityReference node, then it is an entity-error.
    3. Otherwise, check nc recursively.
  3. If NOTATION_ATTR, enumerated values MUST be declared. If not, then raise an xml-validity-error.
  4. If NOTATION_ATTR or ENUMERATED_ATTR, values MUST all be distinct. If not, then raise an xml-validity-error.
  5. If NOTATION_ATTR on an EMPTY element, then raise an xml-validity-error.
  6. @@ #defattrvalid Validity constraint: Attribute Default Value Syntactically Correct
If n is a CDATASection node
  1. Validate the data attribute value as an XML character data.
  2. If the data attribute value contains a string ]]>, then raise an xml-well-formedness-error.
  3. If the childNodes list of n contains any nodes, then raise an xml-well-formedness-error.
If n is a Comment node
  1. Raise a round-trip-warning.
  2. Validate the data attribute value as an XML character data.
  3. If the data attribute value contains a string --, or if it ends with a character -, then raise an xml-well-formedness-error.
  4. If the childNodes list of n contains any nodes, then raise an xml-well-formedness-error.
If n is a Document node
  1. If the XML version of n is neither 1.0 nor 1.1, then it is an unknown-error?.
  2. If the xmlEncoding attribute value does not match to [A-Za-z] ([A-Za-z0-9._] | '-')* @@ formal def, then it is an xml-well-formedness-error.
  3. The childNodes list of n has to consist of zero or more Comment and/or ProcessingInstruction nodes, followed by an optional DocumentType node, followed by zero or more Comment and/or ProcessingInstruction nodes, followed by an Element node, followed by zero or more Comment and/or ProcessingInstruction nodes. Any violation of this is an xml-well-formedness-error.
  4. For each node nc in the childNodes list of n,
    1. If nc is not an EntityReference node, then check nc recursively.
  5. @@ allDeclarationsProcessed
If n is a DocumentFragment node
  1. For each node nc in the childNodes list of n,
    1. If nc is not an Element, Text, CDATASection, Comment, ProcessingInstruction, or EntityReference node, then it is an xml-well-formedness-error.
    2. Otherwise, if nc is an EntityReference node, then it is an entity-error.
    3. Otherwise, check nc recursively.
If n is a DocumentType node
  1. Validate the nodeName attribute value as an NCName.
  2. Follow the following substeps:
    1. If ownerDocument attribute of n is null, then abort these substeps.
    2. If documentElement attribute of the node set to ownerDocument attribute of n is null, then abort these substeps.
    3. If nodeName attribute of the node set to documentElement attribute of the node set to ownerDocument attribute of n is different from nodeName of n, then raise an xml-validity-error.
  3. Validate the publicId attribute value as a public identifier.
  4. Validate the systemId attribute value as a system identifier.
  5. If the publicId attribute value of n is not null and the systemId attribute value of n is null, then raise an xml-well-formedness-error. @@ publicId == null? Or, publicId == ""
  6. For each node nc in the childNodes list of n,
    1. If nc is not a ProcessingInstruction node, then it is an xml-well-formedness-error. @@ ref to manakai's extensions
    2. Otherwise, check nc recursively.
  7. For each node in the entities, notations, and elementTypes lists of n, check the node recursively.
  8. @@ externally declared?
  9. If the NamedNodeMap object in the entities attribute of n does not contain Entity nodes whose nodeName attribute are amp, lt, gt, apos, and quot then raise xml-misc-recommendation(s).
If n is an Element node
  1. Validate the localName attribute value as an NCName.
  2. If the prefix attribute value is different from null, then validate the prefix attribute value as an NCName.
  3. For each node nc in the childNodes list of n,
    1. If nc is not an Element, Text, CDATASection, Comment, ProcessingInstruction, or EntityReference node, then it is an xml-well-formedness-error.
    2. Otherwise, if nc is an EntityReference node, then it is an entity-error.
    3. Otherwise, check nc recursively.
  4. @@ #elementvalid Validity constraint: Element Valid
  5. Let attrs be the value of the attributes attribute of n. Check conformance of attrs as follows:
    1. If attrs contains an Attr node whose nodeName attribute value is equal to that of another Attr node in attrs, then raise an xml-well-formedness-error.
    2. @@ #RequiredAttr Validity constraint: Required Attribute
If n is an ElementTypeDefinition node
  1. If the childNodes list of n contains any nodes, then raise an xml-well-formedness-error.
  2. @@ At user option, an XML processor MAY issue a warning when a declaration mentions an element type for which no declaration is provided, but this is not an error.
  3. @@ For compatibility, it is an error if the content model allows an element to match more than one occurrence of an element type in the content model.
  4. @@ #vc-MixedChildrenUnique Validity constraint: No Duplicate Types
  5. @@ At user option, an XML processor MAY issue a warning if attributes are declared for an element type not itself declared, but this is not an error.
  6. If there is more than one AttributeDefinition node with attribute type ID in the NamedNodeMap list contained in the attributeDefinitions attribute of n, then raise an xml-validity-error.
  7. If there is more than one AttributeDefinition node with attribute type NOTATION in the NamedNodeMap list contained in the attributeDefinitions attribute of n, then raise an xml-validity-error.
  8. "For interoperability, the same Nmtoken SHOULD NOT occur more than once in the enumerated attribute types of a single element type."
If n is an Entity node whose notationName attribute value is null (i.e. a parsed entity)
  1. Raise an entity-error.
  2. Validate the nodeName attribute value as an NCName.
  3. Validate the publicId attribute value as a public identifier.
  4. Validate the systemId attribute value as a system identifier.
  5. If the publicId attribute value of n is not null and the systemId attribute value of n is null, then raise an xml-well-formedness-error.
  6. For each node nc in the childNodes list of n,
    1. If nc is not an Element, Text, CDATASection, Comment, ProcessingInstruction, or EntityReference node, then it is an xml-well-formedness-error.
    2. Otherwise, if nc is an EntityReference node, then it is an entity-error.
    3. Otherwise, check nc recursively.
If n is an Entity node whose notationName attribute value is not null (i.e. an unparsed entity)
  1. Validate the nodeName attribute value as an NCName.
  2. Validate the publicId attribute value as a public identifier.
  3. Validate the systemId attribute value as a system identifier.
  4. If the systemId attribute value of n is null, then raise an xml-well-formedness-error.
  5. Validate the notationName attribute value of n as an NCName.
  6. @@ #not-declared Validity constraint: Notation Declared
  7. If the childNodes list of n contains any nodes, then raise an xml-well-formedness-error.
If n is an EntityReference node
  1. Raise an entity-error.
  2. Validate the nodeName attribute value as an NCName.
  3. For each node nc in the childNodes list of n,
    1. If nc is not an Element, Text, CDATASection, Comment, ProcessingInstruction, or EntityReference node, then it is an xml-well-formedness-error.
    2. Otherwise, if nc is an EntityReference node, then it is an entity-error.
    3. Otherwise, check nc recursively.
If n is a Notation node
  1. Validate the nodeName attribute value as an NCName.
  2. Validate the publicId attribute value as a public identifier.
  3. Validate the systemId attribute value as a system identifier.
  4. If the childNodes list of n contains any nodes, then raise an xml-well-formedness-error.
If n is a ProcessingInstruction node
  1. If the target attribute value matches the string xml in any case combination, then raise an xml-well-formedness-error.
  2. Otherwise, validate the target attribute value as an NCName.
  3. Then, validate the data attribute value as an XML character data.
  4. If the data attribute value contains a string ?>, then raise an xml-well-formedness-error.
  5. If the data attribute value starts with either U+0009 CHARACTER TABULATION, U+000A LINE FEED, U+000D CARRIAGE RETURN, or U+0020 SPACE character, then raise a round-trip-error.
  6. If the childNodes list of n contains any nodes, then raise an xml-well-formedness-error.
  7. @@ Warn if not declared
If n is a Text node
  1. Validate the data attribute value as an XML character data.
  2. If the childNodes list of n contains any nodes, then raise an xml-well-formedness-error.
Otherwise
xml-well-formedness-error? unknown-error?
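As one example of the checks above, the child ordering requirement for a Document node (step 3 of that case) can be tested by mapping each child to a one-letter code and matching the resulting string against a regular expression. The function name and the use of node type names as plain strings are assumptions of this sketch.

```python
import re

# m = Comment or ProcessingInstruction, d = DocumentType, e = Element.
_CODES = {
    "Comment": "m",
    "ProcessingInstruction": "m",
    "DocumentType": "d",
    "Element": "e",
}
# misc*, optional doctype, misc*, exactly one element, misc*
_DOCUMENT_CHILDREN = re.compile(r"m*d?m*em*")


def document_children_ok(child_types) -> bool:
    """True if the sequence of child node types is allowed for a Document."""
    # Any other node type maps to "x", which the pattern never matches.
    codes = "".join(_CODES.get(t, "x") for t in child_types)
    return _DOCUMENT_CHILDREN.fullmatch(codes) is not None
```

A False result corresponds to raising an xml-well-formedness-error for the Document node.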

7.3 Character Classes

This section defines a couple of character classes. These classes are referred to by algorithms specified above.

Character class Char10 contains the following characters:

  • U+0009 CHARACTER TABULATION
  • U+000A LINE FEED
  • U+000D CARRIAGE RETURN
  • U+0020 SPACE .. U+D7FF
  • U+E000 .. U+FFFD REPLACEMENT CHARACTER
  • U+10000 .. U+10FFFF

This character class contains all characters allowed in the production rule Char of XML 1.0 [XML].

Character class CompatChar10 contains the following characters:

  • @@ Document authors are encouraged to avoid "compatibility characters", as defined in section 6.8 of [Unicode @@ Unicode 2.0 @@] (see also D21 in section 3.6 of [Unicode3]).

Character class ControlChar10 contains the following characters:

  • U+007F DELETE .. U+0084 INDEX
  • U+0086 START OF SELECTED AREA .. U+009F APPLICATION PROGRAM COMMAND
  • U+FDD0 .. U+FDEF
  • U+1FFFE .. U+1FFFF
  • U+2FFFE .. U+2FFFF
  • U+3FFFE .. U+3FFFF
  • U+4FFFE .. U+4FFFF
  • U+5FFFE .. U+5FFFF
  • U+6FFFE .. U+6FFFF
  • U+7FFFE .. U+7FFFF
  • U+8FFFE .. U+8FFFF
  • U+9FFFE .. U+9FFFF
  • U+AFFFE .. U+AFFFF
  • U+BFFFE .. U+BFFFF
  • U+CFFFE .. U+CFFFF
  • U+DFFFE .. U+DFFFF
  • U+EFFFE .. U+EFFFF
  • U+FFFFE .. U+FFFFF
  • U+10FFFE .. U+10FFFF

This character class contains the characters listed in the Note in Section 2.2 of XML 1.0 [XML], as amended by errata.
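
The ControlChar10 list has a regular shape that a short test can exploit: two C0/C1 control ranges, the U+FDD0 .. U+FDEF block, and the last two code points of each of planes 1 to 16. The following is a sketch under that reading of the list; the function name is hypothetical.

```python
def is_control_char10(ch):
    """Return True if the single character ch is in ControlChar10."""
    cp = ord(ch)
    # U+007F .. U+0084 and U+0086 .. U+009F (U+0085 is not in the class).
    if 0x7F <= cp <= 0x84 or 0x86 <= cp <= 0x9F:
        return True
    # U+FDD0 .. U+FDEF
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # U+nFFFE and U+nFFFF for planes 1 to 16; plane 0 (U+FFFE, U+FFFF)
    # is not in the list above and is excluded by the lower bound.
    return cp >= 0x1FFFE and (cp & 0xFFFF) >= 0xFFFE
```

The bit mask keeps only the low 16 bits of the code point, so the final test covers all sixteen supplementary-plane pairs without enumerating them.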

The character class NameStartChar10 contains the following characters:

This character class contains all characters allowed as the first character of a string matching the production rule Name of XML 1.0 [XML].

The character class NameChar10 contains the following characters:

This character class contains all characters allowed as the second or any subsequent character of a string matching the production rule Name of XML 1.0 [XML].

The character class PubidChar contains the following characters:

  • U+0009 CHARACTER TABULATION
  • U+000A LINE FEED
  • U+000D CARRIAGE RETURN
  • U+0020 SPACE
  • U+0021 EXCLAMATION MARK (!)
  • U+0023 NUMBER SIGN (#)
  • U+0024 DOLLAR SIGN ($)
  • U+0025 PERCENT SIGN (%)
  • U+0027 APOSTROPHE (')
  • U+0028 LEFT PARENTHESIS (()
  • U+0029 RIGHT PARENTHESIS ())
  • U+002A ASTERISK (*)
  • U+002B PLUS SIGN (+)
  • U+002C COMMA (,)
  • U+002D HYPHEN-MINUS (-)
  • U+002E FULL STOP (.)
  • U+002F SOLIDUS (/)
  • U+0030 DIGIT ZERO (0) .. U+0039 DIGIT NINE (9)
  • U+003A COLON (:)
  • U+003B SEMICOLON (;)
  • U+003D EQUAL SIGN (=)
  • U+003F QUESTION MARK (?)
  • U+0040 COMMERCIAL AT (@)
  • U+0041 LATIN CAPITAL LETTER A (A) .. U+005A LATIN CAPITAL LETTER Z (Z)
  • U+005F LOW LINE (_)
  • U+0061 LATIN SMALL LETTER A (a) .. U+007A LATIN SMALL LETTER Z (z)

This character class contains all characters allowed in the production rule PubidChar of XML 1.0 [XML].

8 Obsolete features

8.1 The DOM feature string for the DOM Document Type Definitions

Earlier versions of this specification defined the DOM features [DOM3CORE] http://suika.fam.cx/www/2006/feature/XDoctype and http://suika.suikawiki.org/www/2006/feature/XDoctype, version 3.0.

8.2 Obsolete attribute

The internalSubset attribute of the DocumentTypeDefinition interface has been removed.

The hasReplacementTree attribute of the Entity interface MUST return false.

The attribute was used to get or set whether the entity has the replacement subtree (as its childNodes) or not.

partial interface Element {
  readonly attribute DOMString? manakaiBaseURI;
};

partial interface ProcessingInstruction {
  readonly attribute DOMString? manakaiBaseURI;
};

The manakaiBaseURI attribute of the Element and ProcessingInstruction interfaces MUST return null.

References

Need to update references

Normative References

DOM
DOM Standard, WHATWG.
DOM3CORE
Document Object Model (DOM) Level 3 Core Specification, Version 1.0, W3C Recommendation. This version of the specification is referenced. The latest version of the specification is available at <http://www.w3.org/TR/DOM-Level-3-Core>.
DOMBIND
Language Bindings for DOM Specifications, W3C Working Draft. This version of the specification is referenced. The latest version of the specification is available at <http://www.w3.org/TR/DOM-Bindings/>.
DOMPERL
manakai's DOM Perl Binding, editor's draft. Work in progress.
FETCH
Fetch Standard, WHATWG.
HTML
HTML, WHATWG.
INFOSET
XML Information Set (Second Edition), W3C Recommendation. This version of the specification is referenced. The latest version of the specification is available at <http://www.w3.org/TR/xml-infoset>.
KEYWORDS
Key words for use in RFCs to Indicate Requirement Levels, IETF BCP 14, RFC 2119. This version of the specification is referenced.
MANAKAI
manakai's DOM Extensions, editor's draft. Work in progress.
XML
Extensible Markup Language (XML) 1.0, W3C.
XML11
Extensible Markup Language (XML) 1.1, W3C.
XMLNS
Namespaces in XML 1.0, W3C.
XMLNS11
Namespaces in XML 1.1, W3C.
XML5
XML-ER, Anne van Kesteren.

Non-normative References

DOM1WD
Document Object Model Specification, W3C Working Draft, .
Document Object Model Specification, W3C Working Draft, .
Document Object Model Specification, Version 1.0, W3C Working Draft, .
DOM3AS
Document Object Model (DOM) Level 3 Abstract Schemas Specification, Version 1.0, W3C Note. The latest version of the specification is available at <http://www.w3.org/TR/DOM-Level-3-AS>.
XMLBASE
XML Base, W3C.
XSAPI
XML Schema API, W3C Member Submission. The latest version of the specification is available at <http://www.w3.org/Submission/xmlschema-api/>.