<https://suika.suikawiki.org/www/markup/xml/domdtdef/domdtdef>
<wakaba@suikawiki.org>
To the extent possible under law, the editor has waived all copyright and related or neighboring rights to this work. In addition, as of 18 September 2022, the editor has made this specification available under the Open Web Foundation Agreement Version 1.0, which is available at http://www.openwebfoundation.org/legal/the-owf-1-0-agreements/owfa-1-0.
This specification defines how various implementations of XML by the manakai project process XML documents, including: how to parse XML documents; how information in XML DTDs is represented in DOM; how base URLs are determined in XML documents; how namespace prefixes are serialized; and how conformance of XML documents is checked.
This section describes the status of this document at the time of its publication. Other documents might supersede this document.
This document is a specification produced as part of the manakai project. It is not intended to be implemented by any other DOM or XML implementation, although nothing prevents it from being implemented by any other party. It might be updated, replaced, or obsoleted by other documents at any time.
This specification was originally developed in the DOM3 era. Most parts of this document need to be rewritten to align with modern Web standard specifications.
Comments on this document are welcome and may be sent to the author.
Translations of this document might be available. The English version of the document is the only normative version.
This section is non-normative.
This specification defines how various implementations of XML by the manakai project process XML documents, including: how to parse XML documents; how information in XML DTDs is represented in DOM; how base URLs are determined in XML documents; how namespace prefixes are serialized; and how conformance of XML documents is checked.
In general, this specification does not intend to override requirements imposed by various XML specifications. It is not intended to be implemented by any other DOM or XML implementation, although nothing prevents it from being implemented by any party other than the manakai project. In particular, it is not intended to be implemented by Web browsers. In fact, some of the features in this specification (e.g. handling of external entities) are not compatible with Web content.
This section is non-normative.
Most of this specification was originally published as DOM Document Type Definition. The First Edition of the DOM Document Type Definition specification is available at <https://suika.suikawiki.org/www/markup/xml/domdtdef/domdtdef-20071209>.
After the publication of the First Edition, the specification moved to the Living Standard model. The version history of that specification and of this specification is available at <https://suika.suikawiki.org/gate/git/wi/markup/xmlexts.git/history/HEAD:/domdtdef/source.en.html>.
Some parts of this specification are derived from manakai's XML Conformance Checking specification. The revision history of that specification is available at <https://suika.suikawiki.org/gate/cvs/markup/xml/xmlcc/xmlcc-work.en.html>.
Some parts of this specification are derived from the Namespace fixup note, whose revision history is available at <https://suika.suikawiki.org/gate/git/wi/markup/xmlexts.git/history/HEAD:/nsfixup-source.en.html>.
All examples and notes in this specification are non-normative, as are all sections explicitly marked non-normative.
Known issues are marked like this. They are non-normative.
Everything else in this document is normative.
The key words MUST, MUST NOT, and MAY in this document are to be interpreted as described in RFC 2119 [KEYWORDS].
Conformance requirements phrased as algorithms or specific steps MAY be implemented in any manner, so long as the end result is equivalent. Requirements phrased in the imperative as part of algorithms are to be interpreted with the meaning of the key word used in introducing the algorithm.
The following paragraph no longer applies, but strictErrorChecking has an effect on certain methods (they need to be defined...)
Sometimes an exception is phrased as being required to be raised. As specified in the DOM Core specification [DOM3CORE], when the value of the strictErrorChecking attribute of the Document node is false, the exception does not have to be raised.
Some conformance requirements are phrased as requirements on interfaces, attributes, methods, or objects. Such requirements are applied to implementations.
The construction "a Foo object", where Foo is an interface, is used instead of the more accurate "an object implementing the interface Foo".
The construction "raise a Bar", where Bar is an exception code, is used instead of the more accurate "raise an exception implementing the interface DOMException [DOM3CORE] with its code attribute set to Bar".
The WebIDL fragments in this document are to be interpreted as described in the WebIDL specification [DOMBIND].
The term null is defined by the WebIDL specification [DOMBIND].
The terms, interfaces, attributes, and exceptions context object, child, parent, node or Node (which has nodeName, adopt, node document, clone, equals, base URL), document or Document (which has doctype), XML document, HTML document, element or Element (which has namespace, namespace prefix, local name, and attribute list), attribute or Attr (which has namespace, namespace prefix, local name, and value), doctype or DocumentType (which has doctype name and systemId), XML namespace, XMLNS namespace, in a document, collection, throw, DOMException, HierarchyRequestError, InvalidCharacterError, and NotFoundError are defined by the DOM Standard [DOM].
The terms and steps live, space character, XML MIME type, document base URL, the document's address, resolve a URL, Content-Type metadata, willful violation, XML parser, setting and getting of an attribute, event loop, task, queue a task, parser pause flag, input byte stream, character token, character reference in attribute value state, consume a character reference, additional allowed character, named character references, and stops parsing are defined in the HTML Standard [HTML].
The interfaces and attributes NamedNodeMap, entities, notations, Entity, and Notation are defined in the DOM3 Core specification [DOM3CORE].
The term owner element is defined in the Manakai DOM Extensions specification [MANAKAI].
The terms Name, internal entity, external entity, unparsed entity, general entity, parameter entity, internal subset, external subset, entity reference, XML declaration, text declaration, version pseudo-attribute, encoding pseudo-attribute, standalone pseudo-attribute, replacement text, and process are defined in the XML 1.0 specification [XML].
The terms parse error, input stream, tokenization, character token, end tag token, tree construction, phase, start phase, main phase, end phase, stack of open elements, current element, list of entities, list of parameter entities, append an entity, and entity flag are defined in the XML5 specification [XML5].
The predefined entity names are apos, amp, lt, gt, and quot.
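As a rough illustration, the five predefined entity names can be modeled as a lookup table. The following Python sketch is not part of the manakai implementation; the names `PREDEFINED_ENTITIES` and `is_predefined_entity_name` are hypothetical, and the characters are taken from the Entity node descriptions later in this specification.

```python
# Hypothetical lookup table: predefined entity name -> the character it stands for.
PREDEFINED_ENTITIES = {
    "amp": "&",
    "lt": "<",
    "gt": ">",
    "quot": '"',
    "apos": "'",
}


def is_predefined_entity_name(name):
    """Return True if name is one of the five predefined entity names."""
    return name in PREDEFINED_ENTITIES
```

Entity names are case-sensitive in XML, so a plain dictionary lookup is sufficient here.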
The predefined public IDs are the public identifiers associated with the DTD containing declarations for the named character references, as specified in the HTML Standard.
The terms fetch, request, response, type, body, process response, process response body, and process response end-of-file are defined in the Fetch Standard [FETCH].
The following paragraph needs to be updated:
Some IDL fragments are presented as patches to an IDL fragment defined in another document. They are to be interpreted as instances of InterfaceMembers [DOMBIND]. If the IDL fragment to which the patch is applied has any instance of InterfaceMember with the same identifier part as that of an instance of InterfaceMember in the patch, then the former instance of InterfaceMember MUST be ignored.
Implementations of this specification MUST support features defined in XML 1.0 [XML] and Namespaces in XML 1.0 [XMLNS] specifications.
Unless explicitly specified, features in this specification do not support XML 1.1.
This document sometimes cites parts of XML 1.0 and Namespaces in XML 1.0 specifications by hyperlinks. If such a requirement is applied to XML 1.1, however, they should be read as references to XML 1.1 [XML11] and Namespaces in XML 1.1 [XMLNS11] specifications.
XXX Should we drop XML 1.1 support entirely?
This specification does not specify which edition of relevant specifications has to be supported.
This section defines additional requirements for an XML parser.
An XML parser MUST parse an XML document as specified in the XML5 [XML5] specification.
XXX If steps for an XML parser are not yet defined, similar steps for an HTML parser have to be used instead. Changes made to the HTML tokenizer after XML5 was specced must also be applied to the XML tokenizer, where possible. Even in such cases, if the input is not well-formed, it is a parse error. There are the before XML declaration phase, the before document type phase, and the document type phase in the tree construction stage. XML namespaces must also be supported. The XML parser has the entity body phase, which is initially set to the before document type phase. It is used as the next phase of the before XML declaration phase.
An XML tokenizer is a component of the XML parser performing tokenization. An input stream is associated with the tokenizer. Initially, an XML parser has an associated tokenizer and input stream. An XML parser has a stack of tokenizers, which is initially empty.
Unless explicitly specified, various states of the XML parser are shared among tokenizers associated with it.
The XML parser has standalone document, expose DTD content, don't process, and expand external entities flags. Unless explicitly stated, these flags are unset.
DOM APIs such as innerHTML and DOMParser do not set these flags.
Each entry in the list of entities of an XML parser has an open flag, which is initially unset.
When the XML parser has to append an entity, it MUST do nothing in the following cases:
If the expose DTD content flag is set, the element, attribute list, unparsed entity, and notation declarations, as well as processing instructions in the DTD, MUST be exposed to the DOM through the DocumentType object.
Everything must be ignored in the document type phase, unless the previous phase is the start phase or the input stream currently being processed is an entity referenced by the document type declaration or by an entity reference (rather than the document entity).
An XML parser has need predefined entity declarations and has character entity declarations flags. They are initially unset.
In the document type phase, the need predefined entity declarations flag MUST be set. It MUST be restored to the original value when the phase is set to another value.
The value of the encoding pseudo-attribute or the charset parameter MUST be interpreted as an encoding label defined by the Encoding Standard.
Just before the phase is switched from the before document type phase to the start phase, the XML parser MUST run the following steps:
doctype of the Document.
publicId of doctype.
systemId of doctype.
The external subset is parsed and processed after the internal subset, if any.
To consume a parameter entity, the XML parser MUST run these steps:
U+0022 QUOTATION MARK character (")
U+0025 PERCENT SIGN character (%)
U+0026 AMPERSAND character (&)
U+0027 APOSTROPHE character (')
U+003B SEMICOLON character (;)
U+003C LESS-THAN SIGN character (<)
U+003D EQUAL SIGN character (=)
U+003E GREATER-THAN SIGN character (>)
U+0060 GRAVE ACCENT character (`)
U+003B SEMICOLON character (;), consume the character and append the character to name.
Names of parameter entities in the list of parameter entities always contain the ; character.
When the steps to consume a character reference are applied to an XML parser, the following requirements in this section apply.
The steps to consume a character reference are defined in the HTML Standard for the HTML parser. This section monkeypatches the steps to implement XML-specific rules.
If the stack of open elements is empty and the state is not the character reference in attribute value state when the steps to consume a character reference are invoked, this is a parse error.
Each entity in the list of entities MUST be appended to the table of the named character references as a row, where the first column is the name of the entity followed by a U+003B SEMICOLON character (;). If the table already has a row whose first column is equal to the first column of the new row, the existing row MUST be replaced by the new row.
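The table update described above can be sketched in Python as follows. This is only an illustration under simplifying assumptions: the table is modeled as a dictionary from the first column (the name including the trailing `;`) to the second column, the list of entities is modeled as a dictionary from entity name to replacement text, and the function name `add_entities_to_table` is hypothetical.

```python
def add_entities_to_table(table, entities):
    """Return a copy of table extended with one row per declared entity.

    table:    dict mapping "name;" -> characters (the named character
              references table, simplified to two columns).
    entities: dict mapping entity name -> replacement text.
    """
    merged = dict(table)
    for name, replacement_text in entities.items():
        # The first column is the entity name followed by U+003B (;).
        # Assigning to an existing key replaces the existing row.
        merged[name + ";"] = replacement_text
    return merged
```

Returning a copy keeps the shared table of named character references untouched, which might matter when several parsers share one table; an in-place update would satisfy the same rule.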
If all of the following conditions are met, this is a parse error:
U+003B SEMICOLON character (;)
If only the last condition is not met and the need predefined entity declarations flag is set, the XML parser MAY report a warning.
When character tokens are returned using the second column of the table, an XML parser MUST run these steps:
U+003C LESS-THAN SIGN character (<), this is a parse error. Return nothing and abort the consume a character reference steps.
U+0026 AMPERSAND character (&), append the character to s and remove the character from replacement text.
U+0026 AMPERSAND character (&):
U+0026 AMPERSAND character (&) to s and remove the first character from replacement text.
The number of character tokens returned by these steps can be zero, which is different from returning nothing.
To parse an entity entity with url, original phase, entity body phase, state whose default is null, and in markup declaration flag whose default is unset, the XML parser MUST run these steps:
U+0020 SPACE character were processed by the tokenizer.
If the expand external entities flag is unset, a reference to the external entity is expanded to the empty string.
There should be an implementation-specific limit on how many entity references are expanded, to defend against the billion laughs attack.
Document of the XML parser, if referrer is enabled, or null.
The before XML declaration phase is used to parse the text declaration in the external entity, if any.
When decoding the input byte stream, the character encoding given in the Content-Type metadata (e.g. the charset parameter) MUST be taken into account. However, the MIME type itself is ignored.
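A minimal Python sketch of this rule is shown below: the charset parameter is extracted from a Content-Type value while the MIME type itself is discarded. The function name `charset_from_content_type` is hypothetical, and the sketch uses Python's standard `email.message` module rather than the header parsing rules of the Fetch Standard. Mapping the returned label to an actual encoding according to the Encoding Standard is not shown.

```python
from email.message import Message


def charset_from_content_type(content_type):
    """Return the lowercased charset parameter of a Content-Type value.

    Returns None when there is no charset parameter. The MIME type
    (e.g. text/plain or application/xml-dtd) is deliberately ignored.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()
```

For example, `charset_from_content_type("application/xml-dtd; charset=UTF-8")` and `charset_from_content_type("text/plain; charset=UTF-8")` yield the same label, because only the parameter is consulted.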
An external general entity can be served with various MIME types, including but not limited to: application/xml-dtd, text/xml-parsed-entity, application/xml-parsed-entity, various XML MIME types, text/sgml, application/sgml, text/plain, and application/octet-stream.
XXX Should the charset parameter be taken into account even when the charset parameter is not defined for the MIME type?
XXX set some error flag if external entity is not expanded
XXX interaction of standalone=yes and external entity reference??
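The effect of the expand external entities flag and of an expansion limit, as mentioned in the notes on the parse an entity steps, can be sketched as follows. This is a heavily simplified illustration, not the tokenizer-based algorithm of this specification: it expands references with a regular expression, and all names (`expand_entities`, `EntityExpansionLimitExceeded`, the default limit of 1000) are assumptions made for the example. Fetching external entities when the flag is set is not modeled.

```python
import re


class EntityExpansionLimitExceeded(Exception):
    """Raised when more references are expanded than the limit allows."""


# Simplified pattern for general entity references such as &name;
ENTITY_REF = re.compile(r"&([^&;\s]+);")


def expand_entities(text, internal, external_names,
                    expand_external=False, limit=1000):
    """Expand entity references in text.

    internal:       dict of internal entity name -> replacement text.
    external_names: set of names declared as external entities.
    """
    count = 0

    def expand(s, open_names):
        nonlocal count
        out = []
        pos = 0
        for m in ENTITY_REF.finditer(s):
            out.append(s[pos:m.start()])
            pos = m.end()
            name = m.group(1)
            count += 1
            if count > limit:
                raise EntityExpansionLimitExceeded(name)
            if name in external_names and not expand_external:
                # Flag unset: the reference expands to the empty string.
                out.append("")
            elif name in internal and name not in open_names:
                # Mark the entity as open while its text is expanded.
                out.append(expand(internal[name], open_names | {name}))
            else:
                # Unknown, recursive, or unfetched reference: left as is.
                out.append(m.group(0))
        out.append(s[pos:])
        return "".join(out)

    return expand(text, frozenset())
```

The counter is shared across nested expansions, so a small document that fans out exponentially still hits the limit early.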
While the stack of tokenizers is not empty, for the purpose of the processing of an end tag token only, the XML parser MUST act as if the stack of open elements does not contain the marker and any other element added to the stack before the marker.
While the stack of tokenizers is not empty and the in markup declaration flag was set upon the last invocation of the parse an entity steps:
U+003E GREATER-THAN SIGN character (>) and the state is changed to the document type internal subset state, it MUST be a parse error and the state MUST be changed to the bogus markup declaration state instead.
While the stack of tokenizers is not empty, instead of the stops parsing steps, the XML parser MUST run these steps:
U+0020 SPACE character were processed by the tokenizer.
xml-misc-error.
UTF-16 but the input byte stream for the entity does not begin with the BOM, then the parser MUST raise an xml-misc-error.
xml-misc-recommendation.
amp, lt, gt, apos, or quot, then the parser MUST raise xml-misc-recommendation(s).
entities attribute of the DocumentType node MUST contain a NamedNodeMap object whose first five items are as follows:
Entity node whose nodeName attribute is amp. It contains a Text node whose data attribute is set to &.
Entity node whose nodeName attribute is lt. It contains a Text node whose data attribute is set to <.
Entity node whose nodeName attribute is gt. It contains a Text node whose data attribute is set to >.
Entity node whose nodeName attribute is quot. It contains a Text node whose data attribute is set to ".
Entity node whose nodeName attribute is apos. It contains a Text node whose data attribute is set to '.
EntityValue part of the general entity declaration contains a bare U+003C LESS-THAN SIGN (<) character, then the parser MUST raise an xml-misc-warning.
Name is equal to the Name of the element type declaration, then the parser MUST raise an xml-validity-error.
Name is equal to the Name of the attribute definition list declaration, then the parser MUST raise an xml-misc-warning.
Name is equal to the Name of the attribute definition (whether or not in the same attribute definition list declaration), then the parser MUST raise an xml-misc-warning.
If the entity declaration declares a general entity, the following is applied:
Name is lt or amp
If the entity declaration does not declare an internal entity, or if the replacement text of the entity is not the escaped form of < (if lt) or & (if amp), then the parser MUST raise an xml-misc-error.
In other words, the character in the EntityValue has to be double-escaped.
Name is gt, quot, or apos
If the entity declaration does not declare an internal entity, or if the replacement text of the entity is neither equal to nor the escaped form of > (if gt), " (if quot), or ' (if apos), then the parser MUST raise an xml-misc-error.
In other words, the character in the EntityValue has to be single- or double-escaped.
If the entity declaration has to be ignored because an entity with the same Name as the declaration has already been declared, then the parser MUST raise a misc-info and abort these steps.
The five predefined entities, i.e. amp, lt, gt, quot, and apos, are always declared implicitly, and therefore any declaration for such an entity always raises a misc-info.
If the entity declaration declares a parameter entity and the Name of the entity begins with the string xml (in any combination of upper- and lowercase letters), then the parser MUST raise an xml-misc-warning.
If the entity declaration contains the EntityValue, then for each occurrence of a reference to an unparsed entity in the EntityValue, the parser MUST raise an xml-misc-error.
If the entity declaration declares a general entity, then an Entity node MUST be created and appended to the NamedNodeMap object in the entities attribute of the DocumentType node.
Read the external entity
If the replacement text of the entity is read, then parse the replacement text as if it were referenced from the content of an element (with no namespace bindings). If no @@ parse error is raised by the parsing process, then the nodes generated by the parsing MUST be appended to the Entity node. The parse error MUST NOT be propagated to the entire parsing process. Other kinds of errors MUST be propagated. The first parse error MUST abort the internal parsing process.
@@ better wording
@@ prop
Then, the Entity node and its descendants MUST be marked as read-only.
Name is equal to the Name of the notation declaration, then the parser MUST raise an xml-validity-error.
Name of the tag is not declared by a processed element type declaration as EMPTY content, then the parser MUST raise an xml-misc-recommendation.
Name of the tag is declared by a processed element type declaration as EMPTY content, then the parser MUST raise an xml-misc-recommendation.
The parser MUST set the normalized value of the attribute to the value attribute of the Attr node created for the attribute.
That is, any entity reference has to be expanded. Unexpanded entity references in attribute values are discarded.
xml:space attribute
xml:space attribute to the value attribute of the Attr node created for the attribute even if the normalized value is different from default or preserve.
Process as follows:
standalone pseudo-attribute set to yes
xml-well-formedness-error.
xml-validity-error.
In either of the two cases above, process as follows:
IGNOREd section.
standalone pseudo-attribute of the XML declaration (if any) is set to yes.
entity-error.
allDeclarationsProcessed @@ ref attribute of the Document node MUST be set to false.
Process as follows:
Name of the entity reference is either amp, lt, gt, quot, or apos, then abort these steps.
standalone pseudo-attribute set to yes
xml-well-formedness-error.
xml-validity-error.
In either of the two cases above, process as follows:
entity-error.
@@ entity declared WFC?
Comment node MUST be created and inserted appropriately.
The parser MUST try to read any entity referenced by general or parameter entity references and the external subset entity, if any in the document type definition.
Well-formedness constraints. When the parser detects a violation of one of certain well-formedness constraints, it MUST raise an xml-well-formedness-error. The list of such well-formedness constraints is as follows:
Validity constraints. When the parser detects a violation of one of certain validity constraints, it MUST raise an xml-validity-error. The list of such validity constraints is as follows:
Other criteria. If the parser detects a violation of one of certain additional constraints, it MUST raise an xml-misc-recommendation. The list of such constraints is as follows:
For interoperability, if a parameter-entity reference appears in a choice, seq, or Mixed construct, its replacement text SHOULD contain at least one non-blank character, and neither the first nor last non-blank character of the replacement text SHOULD be a connector (| or ,).
External parsed entities SHOULD each begin with a text declaration.
The parser MUST act as if it is a validating XML processor for the purpose of informing of white space characters appearing in element content (See Section 2.10 of the XML specification).
In other words, the isElementContentWhitespace attribute of Text nodes has to be set appropriately. Note that the value of the attribute will be set to false for any Text node in the content of an element whose declaration is not processed.
The parser MUST raise at least one xml-well-formedness-error if the entity it parses does not match the appropriate production rule in the XML specification. As an exception to this requirement, it MAY choose not to raise such an error if the error will be raised by the conformance checker when the conformance checker checks the Document object produced by the parser.
This section defines various extensions to DOM to represent definitions that can be contained in XML DTDs. They are referred to as DOM Document Type Definitions, which is a set of DOM interfaces, including both new interfaces and modifications to existing standard DOM interfaces.
The primary goal of those features is to make it possible to build an XML DTD validator on top of the extended DOM API.
Although these features are defined as extensions to the standard DOM interfaces, they are not expected to be implemented by Web browsers.
Interfaces defined in this section are partially modeled and inspired by early Working Drafts of DOM Level 1 [DOM1WD], DOM Level 3 Abstract Schemas draft [DOM3AS], and XML Schema API specification [XSAPI], but they are not compatible with any of them as a whole.
Features in this section are applied to both XML documents and HTML documents.
This specification introduces two kinds of nodes: element types and attribute definitions. Also, this specification reuses two kinds of nodes defined by DOM3 Core specification [DOM3CORE] (but obsoleted by current DOM Standard [DOM]): entities and notations. Requirements on nodes are applied by the following rules:
ElementTypeDefinition and AttributeDefinition)
DocumentType
This specification defines how nodes attached to the DocumentType (e.g. ElementTypeDefinition and Entity) affect the attributes and methods of the nodes. When the DocumentType has no such attached node, they are expected to behave as specified by the DOM Standard.
Entity and Notation)
Nodes of types ElementTypeDefinition, AttributeDefinition, Entity, and Notation cannot contain, or be contained by, any kind of node as a child.
Historically, the DOM3 Core specification and earlier versions of this specification allowed AttributeDefinition and Entity nodes to contain children.
partial interface Node {
const unsigned short ELEMENT_TYPE_DEFINITION_NODE = 81001;
const unsigned short ATTRIBUTE_DEFINITION_NODE = 81002;
};
The nodeType attribute MUST return the following, depending on the context object:
ElementTypeDefinition
ELEMENT_TYPE_DEFINITION_NODE (81001)
AttributeDefinition
ATTRIBUTE_DEFINITION_NODE (81002)
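The mapping from node kind to nodeType value can be sketched in Python as follows. The classes here are hypothetical stand-ins for the ElementTypeDefinition and AttributeDefinition interfaces and model only the two attributes needed for the illustration; the snake_case attribute names are an adaptation to Python conventions, not names defined by this specification.

```python
# Constants from the partial interface Node above.
ELEMENT_TYPE_DEFINITION_NODE = 81001
ATTRIBUTE_DEFINITION_NODE = 81002


class ElementTypeDefinition:
    """Hypothetical stand-in for an element type node."""

    def __init__(self, name):
        self.node_name = name  # the element type name

    @property
    def node_type(self):
        return ELEMENT_TYPE_DEFINITION_NODE


class AttributeDefinition:
    """Hypothetical stand-in for an attribute definition node."""

    def __init__(self, name):
        self.node_name = name  # the attribute definition name

    @property
    def node_type(self):
        return ATTRIBUTE_DEFINITION_NODE
```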
The nodeName attribute MUST return the name (element type name or attribute definition name) of the node.
If the node has attributes attributes, localName, namespaceURI, and prefix, they MUST return null.
The nodeValue attribute and the textContent attribute MUST return the following, depending on the context object:
AttributeDefinition
Entity
Setting the nodeValue attribute or the textContent attribute MUST do the following, depending on the context object:
AttributeDefinition
Entity
On setting, if the new value is null, the textContent attribute MUST act as if the new value were the empty string instead.
When a node is cloned, the following values MUST be copied:
DocumentType
The DOM Standard defines more values to copy.
ElementTypeDefinition
AttributeDefinition
Entity
The encoding is not copied.
Notation
For the equality of two nodes, equality of the following values MUST also be taken into account:
DocumentType
ElementTypeDefinition
AttributeDefinition
Entity
Notation
For the comparison of node sets, only the length and equality of nodes with the same nodeName in the set are significant. The order of the nodes is not significant for the purpose of the comparison. For the comparison of allowed tokens, the number of items and the existence of the values in both are taken into account.
If at least one of the two nodes compared by the compareDocumentPosition method (i.e. reference and other) is an ElementTypeDefinition, AttributeDefinition, Entity, or Notation and they are different nodes, the following rules are applied:
DOCUMENT_POSITION_PRECEDING to DOCUMENT_POSITION_IMPLEMENTATION_SPECIFIC.
DOCUMENT_POSITION_FOLLOWING to DOCUMENT_POSITION_IMPLEMENTATION_SPECIFIC.
nodeName is less than reference's nodeName in code point order, the method MUST return the result of adding DOCUMENT_POSITION_PRECEDING to DOCUMENT_POSITION_IMPLEMENTATION_SPECIFIC.
DOCUMENT_POSITION_FOLLOWING to DOCUMENT_POSITION_IMPLEMENTATION_SPECIFIC.
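The nodeName-based rule can be sketched in Python as follows. The sketch covers only the case where the result is decided by comparing nodeName values, and it assumes that the final rule in the list is the fallback when other's nodeName is not less than reference's; the conditions of the earlier rules are not modeled. The function name `compare_by_node_name` is hypothetical, and the constant values are those defined by the DOM Standard.

```python
# Bit values defined by the DOM Standard for compareDocumentPosition.
DOCUMENT_POSITION_PRECEDING = 0x02
DOCUMENT_POSITION_FOLLOWING = 0x04
DOCUMENT_POSITION_IMPLEMENTATION_SPECIFIC = 0x20


def compare_by_node_name(reference_name, other_name):
    """Return the position bitmask decided by nodeName comparison.

    Python compares str values code point by code point, which matches
    the code point order required here.
    """
    if other_name < reference_name:
        return (DOCUMENT_POSITION_PRECEDING
                + DOCUMENT_POSITION_IMPLEMENTATION_SPECIFIC)
    return (DOCUMENT_POSITION_FOLLOWING
            + DOCUMENT_POSITION_IMPLEMENTATION_SPECIFIC)
```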
A node of type ElementTypeDefinition, Entity, or Notation has an associated owner document type definition. The AttributeDefinition node has an associated owner element type definition. Their values are null unless otherwise specified. When not ambiguous, they are simply referred to as owner of the node.
The ownerDocumentTypeDefinition attribute of the ElementTypeDefinition, Entity, and Notation interfaces MUST return the owner document type definition of the context object. The ownerElementTypeDefinition attribute of the AttributeDefinition interface MUST return the owner element type definition of the context object.
The publicId attribute and the systemId attribute of the Entity and Notation interfaces MUST return the public ID and system ID of the context object, respectively.
On setting, the publicId attribute and the systemId attribute of the DocumentType, Entity, and Notation interfaces MUST set the public ID and system ID of the context object, respectively, to the new value.
In DOM Standard and DOM3 Core specifications, these attributes are read-only.
Setting an invalid identifier to these attributes might make the node unserializable in the XML syntax.
When a NamedNodeMap collection represents element types, general entities, notations, or attribute definitions, methods getNamedItemNS, setNamedItemNS, and removeNamedItemNS of the NamedNodeMap interface [DOM3CORE] MUST throw a "NotSupportedError" exception.
Several factory methods are added to the Document interface.
partial interface Document {
DocumentType createDocumentTypeDefinition(DOMString name);
ElementTypeDefinition createElementTypeDefinition(DOMString name);
AttributeDefinition createAttributeDefinition(DOMString name);
Entity createGeneralEntity(DOMString name);
Notation createNotation(DOMString name);
};
Earlier versions of this specification called this WebIDL fragment the DocumentXDoctype interface.
The createDocumentTypeDefinition(name) method MUST run these steps:
Name production, throw an "InvalidCharacterError" exception and terminate these steps.
The createElementTypeDefinition(name) method MUST run these steps:
Name production, throw an "InvalidCharacterError" exception and terminate these steps.
The createAttributeDefinition(name) method MUST run these steps:
Name production, throw an "InvalidCharacterError" exception and terminate these steps.
The createGeneralEntity(name) method MUST run these steps:
Name production, throw an "InvalidCharacterError" exception and terminate these steps.
The createNotation(name) method MUST run these steps:
Name production, throw an "InvalidCharacterError" exception and terminate these steps.
name does not have to be a namespace qualified name.
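The name check shared by these factory methods can be sketched in Python as follows. This is an illustration only: the character ranges are those of the Name production in XML 1.0 (Fifth Edition), which is an assumption because this specification does not say which edition applies, and the `InvalidCharacterError` class and `check_name` function are hypothetical stand-ins for the DOM exception and for the first step of each method.

```python
import re


class InvalidCharacterError(Exception):
    """Hypothetical stand-in for the DOM "InvalidCharacterError" exception."""


# NameStartChar and NameChar ranges from XML 1.0 (Fifth Edition).
_NAME_START = (
    r":A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    r"\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
    r"\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
_NAME_CHAR = _NAME_START + r"\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"
_NAME_RE = re.compile("[" + _NAME_START + "][" + _NAME_CHAR + "]*")


def check_name(name):
    """Return name if it matches the Name production, else raise."""
    if _NAME_RE.fullmatch(name) is None:
        raise InvalidCharacterError(name)
    return name
```

Because only the Name production is checked, a value such as `a:b:c` is accepted even though it is not a namespace qualified name.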
Each doctype has associated unordered sets of nodes element types, general entities, and notations. Unless otherwise specified, they MUST be empty when the doctype is created.
The set element types can only contain element types.
The elementTypes attribute of the DocumentType interface MUST return a NamedNodeMap collection containing the nodes in the element types set of the context object, sorted by their nodeName's code point.
The general entities of the doctype are exposed by the entities attribute of the DocumentType interface [DOM3CORE]. The generalEntities attribute of the DocumentType interface MUST return the same object as the entities attribute. The nodes in the entities collection MUST be sorted by their nodeName's code point.
The notations of the doctype are exposed by the notations attribute of the DocumentType interface [DOM3CORE]. The nodes in the notations collection MUST be sorted by their nodeName's code point.
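Sorting by nodeName in code point order can be sketched in Python as follows. The sketch models a collection as a plain list of nodeName strings, which is a simplification; the function name `sort_by_node_name` is hypothetical. Python compares `str` values code point by code point, so the built-in `sorted` already produces the required order.

```python
def sort_by_node_name(node_names):
    """Return the names sorted in code point order.

    Uppercase ASCII letters sort before lowercase ones, and non-ASCII
    characters sort after ASCII, with no locale-aware collation.
    """
    return sorted(node_names)
```

For example, `sort_by_node_name(["b", "a", "B", "é"])` returns `["B", "a", "b", "é"]`.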
Need to merge with manakai-allow-doctype-children configuration parameter spec...
A DocumentType node MAY contain zero or more ProcessingInstruction nodes in the NodeList object contained in the childNodes attribute of the DocumentType node.
If the DocumentType node is created during the process to create a DOM from an XML document, the NodeList object in the childNodes attribute MUST contain the ProcessingInstruction nodes representing the processing instructions in the document type definition of the document processed [XML, XML11] by the XML processor.
If a DocumentType node is created from a document type declaration information item [INFOSET], the NodeList object in the childNodes attribute of the node MUST contain the ProcessingInstruction nodes created from any processing instruction information items in the list in the [children] property of the document type declaration item in the same order.
If a DocumentType node is mapped to a document type declaration information item, the list in the [children] property MUST contain the processing instruction information items created from the ProcessingInstruction nodes in the NodeList object in the childNodes attribute of the DocumentType node.
partial interface DocumentType {
[TreatNullAs=EmptyString] attribute DOMString publicId;
[TreatNullAs=EmptyString] attribute DOMString systemId;
attribute DOMString? declarationBaseURI;
attribute DOMString? manakaiDeclarationBaseURI;
readonly attribute NamedNodeMap elementTypes;
readonly attribute NamedNodeMap generalEntities;
ElementTypeDefinition getElementTypeDefinitionNode
(in DOMString name);
Entity getGeneralEntityNode
(in DOMString name);
Notation getNotationNode
(in DOMString name);
ElementTypeDefinition? setElementTypeDefinitionNode(ElementTypeDefinition node);
Entity? setGeneralEntityNode(Entity node);
Notation? setNotationNode(Notation node);
ElementTypeDefinition removeElementTypeDefinitionNode(ElementTypeDefinition node);
Entity removeGeneralEntityNode(Entity node);
Notation removeNotationNode(Notation node);
};
Earlier versions of this specification named this WebIDL fragment
as the DocumentTypeDefinition
interface.
If the DocumentType
node is created during the process
to create a DOM from an XML document, the following
requirements are applied: The NamedNodeMap
object in the
elementTypes
attribute
MUST be so transformed that the object contains
the ElementTypeDefinition
nodes for the element types
whose name is presented as the Name
of the element type or
attribute definition list declarations processed
[XML,
XML11] by the XML
processor. If there is more than one element type declaration for an
element type, then the declarations other than the first one
MUST be ignored for the purpose of constructing the
NamedNodeMap
object.
Entities declared in the document type definition contained in or referenced from the document entity might not all be exposed through this collection, depending on the information provided by the XML processor for the DOM implementation. In particular, it might not contain any entity if entity references are expanded at parse time. An implementation MUST NOT expose an Entity node whose nodeName is equal to the name of one of the five predefined general entities in XML through the collection as the result of parsing an XML document that has no error. Duplicate entity declarations are also discarded.
The attribute MUST return the NamedNodeMap object that contains all the Entity nodes representing general entities belonging to the node.
If the DocumentTypeDefinition node is created from an XML document, duplicate notation declarations, if any, in the DTD MUST NOT result in a node in the NamedNodeMap object and only the first declaration MUST be made available as a Notation node.
This definition is based on the one for the notations attribute of the DocumentType interface in the DOM XML module. Since duplication is a violation of the Validity Constraint, XML parsers might vary on how notations are reported to the application. In particular, the [notations] property of the document information item in the XML Information Set is defined such that if any notation is declared multiple times then the property has no value.
The attribute MUST return the NamedNodeMap object that contains all the Notation nodes representing notations belonging to the node.
getElementTypeDefinitionNode, method
Returns the ElementTypeDefinition node with the specified name.
The name parameter is the name of the element type.
When invoked, the method MUST return the
ElementTypeDefinition
node,
whose nodeName
is equal to
name, in the
NamedNodeMap
object in the
elementTypes
attribute of the
node. If there is no such node, it MUST return
null
.
getGeneralEntityNode, method
Returns the Entity node with the specified name.
The name parameter is the name of the general entity.
When invoked, the method MUST return the
Entity
node,
whose nodeName
is equal to
name, in the
NamedNodeMap
object in the
entities
attribute of the
node. If there is no such node, it MUST return
null
.
getNotationNode, method
Returns the Notation node with the specified name.
The name parameter is the name of the notation.
When invoked, the method MUST return the
Notation
node,
whose nodeName
is equal to
name, in the
NamedNodeMap
object in the
notations
attribute of the
node. If there is no such node, it MUST return
null
.
setElementTypeDefinitionNode, method
The setElementTypeDefinitionNode(node) method MUST return the result of setting node to the context object's element types.
setGeneralEntityNode, method
The setGeneralEntityNode(node) method MUST return the result of setting node to the context object's general entities.
setNotationNode, method
The setNotationNode(node) method MUST return the result of setting node to the context object's notations.
removeElementTypeDefinitionNode, method
The removeElementTypeDefinitionNode(node) method MUST return the result of removing node from the context object's element types.
removeGeneralEntityNode, method
The removeGeneralEntityNode(node) method MUST return the result of removing node from the context object's general entities.
removeNotationNode, method
The removeNotationNode(node) method MUST return the result of removing node from the context object's notations.
To remove a node node from the set of nodes set, run these steps:
NotFoundError
" exception and
abort these steps.
To set a node node to the set of nodes set associated to the node owner, run these steps:
HierarchyRequestError
" and abort these steps.
nodeName
is equal to node's nodeName
in
set, let oldNode be the node and
remove oldNode
from set.
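The remove and set operations can be sketched in Python as follows. This is only an illustration under stated assumptions, not the normative algorithm: nodes are modeled as plain dicts with a nodeName key, the "NotFoundError" is assumed to be thrown when the node is not in the set, the return values (the removed node, and the replaced node or None) are guessed from the WebIDL return types, and the "HierarchyRequestError" check is left out because its condition is not visible in the text above.

```python
class NotFoundError(Exception):
    """Stand-in for the DOM "NotFoundError" exception."""


def remove_node(nodes, node):
    # Assumption: the exception is thrown when node is not in the set.
    if not any(n is node for n in nodes):
        raise NotFoundError(node["nodeName"])
    nodes[:] = [n for n in nodes if n is not node]
    return node  # assumption: the removed node is the result


def set_node(nodes, node):
    # The "HierarchyRequestError" check is deliberately omitted here.
    old_node = None
    for n in nodes:
        if n["nodeName"] == node["nodeName"]:
            old_node = n
            break
    if old_node is not None:
        remove_node(nodes, old_node)  # replace the node with the same nodeName
    nodes.append(node)
    return old_node  # assumption: None when nothing was replaced
```

Setting a second node with the same nodeName therefore replaces the first one rather than adding a duplicate.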
The ElementTypeDefinition
objects are simply known as
element types. An element type represents
the definition of an element, identified by its name.
Although the element type
corresponds to the ELEMENT
declaration in the XML
DTD, the element type does not
directly represent the ELEMENT
declaration.
interface ElementTypeDefinition : Node {
readonly attribute DocumentType? ownerDocumentTypeDefinition;
readonly attribute NamedNodeMap attributeDefinitions;
AttributeDefinition? getAttributeDefinitionNode(DOMString name);
AttributeDefinition? setAttributeDefinitionNode(AttributeDefinition node);
AttributeDefinition removeAttributeDefinitionNode(AttributeDefinition node);
};
Each element type has an associated unordered set of nodes, its attribute definitions. Unless otherwise specified, it MUST be empty when the element type is created. The attribute definitions set can only contain attribute definitions.
The attributeDefinitions
attribute of
the ElementTypeDefinition
interface MUST return a NamedNodeMap
collection containing the nodes in the
attribute
definitions set of the context object, sorted by
their nodeName
's code point.
Even if there is more than one element type declaration for an
element type in the DTD, the resulting DOM will
contain only one ElementTypeDefinition
node for that element type. In addition, if there are attribute
definition declarations for an element type, even when there is no
element type declaration for that element type, the DOM
will contain an ElementTypeDefinition
node for that element type.
If the
ElementTypeDefinition
node
is created during the process to create a DOM from an
XML document, the following requirements are applied: The
NamedNodeMap
object in the
attributeDefinitions
attribute MUST be so transformed that the object
contains the
AttributeDefinition
node
corresponding to the attribute definitions in the attribute definition
list declarations processed
[XML,
XML11] by the XML processor and
associated with the element type represented by the node. If there is more
than one attribute definition for an attribute, then the definitions other
than the first one MUST be ignored for the purpose
of constructing the NamedNodeMap
object.
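The "first declaration wins" rule above can be sketched in Python. The function name and the (name, definition) pair representation are hypothetical; the sketch only shows that later definitions of the same attribute name are ignored.

```python
def build_attribute_definitions(declared):
    """Collect attribute definitions in document order, keeping the first per name.

    declared is an iterable of (name, definition) pairs, a hypothetical stand-in
    for what an XML processor reports while reading attribute list declarations.
    """
    result = {}
    for name, definition in declared:
        result.setdefault(name, definition)  # later duplicates are ignored
    return result


definitions = build_attribute_definitions(
    [("id", "ID"), ("class", "CDATA"), ("id", "CDATA")]
)
```

The same pattern would apply to duplicate element type declarations when the elementTypes collection is built.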
getAttributeDefinitionNode, method
Returns the AttributeDefinition node with the specified name.
The name parameter is the name of the attribute.
When invoked, the method MUST return the
AttributeDefinition
node,
whose nodeName
attribute value is
equal to name,
in the NamedNodeMap
in the
attributeDefinitions
attribute of the node. If there is no such node, it
MUST return null
.
The setAttributeDefinitionNode(node)
method of the
ElementTypeDefinition
object MUST
return the result of setting node to the context object's attribute
definitions.
The removeAttributeDefinitionNode(node)
method of the
ElementTypeDefinition
object MUST
return the result of removing
node from the context object's attribute
definitions.
A future version of the specification might define a set of attributes and methods for representing and accessing the content model of the element type.
partial interface ElementTypeDefinition {
attribute DOMString? contentModelText;
};
A node of type ATTRIBUTE_DEFINITION_NODE
represents an attribute
definition. Such a node MUST implement
the AttributeDefinition
interface, which extends the Node
interface.
Each attribute definition has an associated name.
Each attribute
definition has associated declared type,
allowed
tokens, default type,
and default
value. Unless otherwise specified, they are initialized to
their default values when the attribute definition is
created. Their default values are NO_TYPE_ATTR
, the
empty list, UNKNOWN_DEFAULT
,
and the empty string, respectively.
An attribute definition represents a definition of the attribute
associated with an element type. It corresponds to the attribute definition
in the attribute list declaration in DTD. However, an
AttributeDefinition
node does
not represent the attribute definition in the DTD itself. Even if
there is more than one attribute definition for an attribute of an element
type in the DTD, the resulting DOM will contain only one
AttributeDefinition
node for
that attribute.
interface AttributeDefinition : Node {
// DefaultValueType
const unsigned short UNKNOWN_DEFAULT = 0;
const unsigned short FIXED_DEFAULT = 1;
const unsigned short REQUIRED_DEFAULT = 2;
const unsigned short IMPLIED_DEFAULT = 3;
const unsigned short EXPLICIT_DEFAULT = 4;
readonly attribute ElementTypeDefinition? ownerElementTypeDefinition;
attribute unsigned short declaredType;
attribute DOMString[] allowedTokens;
attribute unsigned short defaultType;
};
[NoInterfaceObject]
interface AttrDeclaredValueType {
// DeclaredValueType
const unsigned short NO_TYPE_ATTR = 0;
const unsigned short CDATA_ATTR = 1;
const unsigned short ID_ATTR = 2;
const unsigned short IDREF_ATTR = 3;
const unsigned short IDREFS_ATTR = 4;
const unsigned short ENTITY_ATTR = 5;
const unsigned short ENTITIES_ATTR = 6;
const unsigned short NMTOKEN_ATTR = 7;
const unsigned short NMTOKENS_ATTR = 8;
const unsigned short NOTATION_ATTR = 9;
const unsigned short ENUMERATION_ATTR = 10;
const unsigned short UNKNOWN_ATTR = 11;
};
AttributeDefinition implements AttrDeclaredValueType;
Attr implements AttrDeclaredValueType;
declaredType
of type
unsigned short
The declared type
[XML,
XML11] of the attribute. It is expected that
this attribute contains a value from the definition group
DeclaredValueType
.
On getting, the attribute MUST return the value associated to this attribute.
On setting, it MUST set the specified value as the value associated to this attribute.
If the AttributeDefinition
node is created during the process
to create a DOM from an XML document, an
appropriate value from the
DeclaredValueType
constant
group MUST be set to the attribute.
The definition group
DeclaredValueType
contains
integers indicating the declared type of attributes. The definition
group contains the following constants:
Name | Value | Description |
---|---|---|
NO_TYPE_ATTR | 0 | No value [INFOSET]. |
CDATA_ATTR | 1 | CDATA [XML, XML11]. |
ID_ATTR | 2 | ID [XML, XML11]. |
IDREF_ATTR | 3 | IDREF [XML, XML11]. |
IDREFS_ATTR | 4 | IDREFS [XML, XML11]. |
ENTITY_ATTR | 5 | ENTITY [XML, XML11]. |
ENTITIES_ATTR | 6 | ENTITIES [XML, XML11]. |
NMTOKEN_ATTR | 7 | NMTOKEN [XML, XML11]. |
NMTOKENS_ATTR | 8 | NMTOKENS [XML, XML11]. |
NOTATION_ATTR | 9 | NOTATION [XML, XML11]. |
ENUMERATION_ATTR | 10 | Enumeration [XML, XML11]. |
UNKNOWN_ATTR | 11 | Unknown, because no declaration for the attribute has been read but the [all declarations processed] property [INFOSET] would be false. |
If no attribute type information is available, or if the source
of the information does not distinguish
no
value and
unknown
[INFOSET],
then the value
NO_TYPE_ATTR
MUST be used.
An AttributeDefinition
node
created by the
createAttributeDefinition
method has its declaredType
attribute
set to NO_TYPE_ATTR
.
If the source of the information does not distinguish
no
value and/or
unknown
[INFOSET] and
CDATA
[XML,
XML11], then the value
CDATA_ATTR
MUST
be used.
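The mapping from a declared attribute type to the DeclaredValueType constants might look like the following Python sketch. The constant values are those of the table above; the function name, its parameters, and the idea of passing the source text of the attribute type are assumptions made for illustration.

```python
from enum import IntEnum


class DeclaredValueType(IntEnum):
    NO_TYPE_ATTR = 0
    CDATA_ATTR = 1
    ID_ATTR = 2
    IDREF_ATTR = 3
    IDREFS_ATTR = 4
    ENTITY_ATTR = 5
    ENTITIES_ATTR = 6
    NMTOKEN_ATTR = 7
    NMTOKENS_ATTR = 8
    NOTATION_ATTR = 9
    ENUMERATION_ATTR = 10
    UNKNOWN_ATTR = 11


def declared_type(att_type, all_declarations_processed=True):
    """Map the text of an attribute type (hypothetical input) to a constant.

    att_type is e.g. "CDATA", "NOTATION (gif|png)" or "(a|b)", or None when
    no declaration for the attribute has been read.
    """
    if att_type is None:
        if all_declarations_processed:
            return DeclaredValueType.NO_TYPE_ATTR
        return DeclaredValueType.UNKNOWN_ATTR
    text = att_type.strip()
    if text.startswith("("):
        return DeclaredValueType.ENUMERATION_ATTR
    keyword = text.split(None, 1)[0]  # first whitespace-delimited word
    return DeclaredValueType[keyword + "_ATTR"]  # KeyError for unknown keywords
```

A source that cannot tell "no value" from "unknown" would simply always pass all_declarations_processed=True, which yields NO_TYPE_ATTR as required above.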
In Perl binding
[DOMPERL], the
Attr
nodes MUST implement the
DeclaredValueType
definition group.
allowedTokens of type DOMString[]
The list of allowed attribute values.
On getting, the attribute MUST return the
DOMStringList
object associated to this attribute. The object
MAY contain zero or more ordered strings, each consisting of
zero or more characters, possibly with duplicates.
If the AttributeDefinition
node is created during the process to create a DOM from an
XML document, the object MUST contain
the names or name tokens allowed for the attribute defined by the node.
If the document is well-formed, the object will be empty unless
the declaredType
is
ENUMERATION_ATTR
or
NOTATION_ATTR
.
If the declaredType
is different
from ENUMERATION_ATTR
or
NOTATION_ATTR
, this attribute
MUST be ignored for the purpose of serializing
into (part of) XML document.
When serializing the node it should be noted that the object might be empty, might contain duplications, and might contain strings that are not names or name tokens.
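As a rough illustration of the serialization rule above, the following Python sketch emits the enumerated part of an attribute type only for the two declared types that use it. The function name is hypothetical, the output shape follows the Enumeration and NotationType productions of XML, and no validation of the tokens is attempted.

```python
NOTATION_ATTR = 9
ENUMERATION_ATTR = 10


def serialize_allowed_tokens(declared_type, allowed_tokens):
    """Return the token group of an attribute type, or "" when it is ignored."""
    if declared_type not in (ENUMERATION_ATTR, NOTATION_ATTR):
        return ""  # allowedTokens is ignored for other declared types
    group = "(" + "|".join(allowed_tokens) + ")"
    if declared_type == NOTATION_ATTR:
        return "NOTATION " + group
    return group
```

Because the list might be empty or contain strings that are not name tokens, a real serializer would need additional checks before emitting the result.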
defaultType
of
type unsigned short
The type of the default for the attribute. It is expected that this
attribute contains a value from the definition group
DefaultValueType
.
On getting, the attribute MUST return the value associated to this attribute.
On setting, it MUST set the specified value as the value associated to this attribute.
If the
AttributeDefinition
node is
created during the process to create a DOM from an
XML document, an appropriate value from the
DefaultValueType
definition
group MUST be set to the attribute.
The definition group
DefaultValueType
contains integers indicating the type of the default for the attribute.
The definition group contains the following constants:
Name | Value | Description |
---|---|---|
UNKNOWN_DEFAULT | 0 | Unknown. |
FIXED_DEFAULT | 1 | Provided explicitly, and only that value is allowed [XML, XML11]. |
REQUIRED_DEFAULT | 2 | No default value, and the attribute has to be explicitly specified. |
IMPLIED_DEFAULT | 3 | Implied [XML, XML11]. |
EXPLICIT_DEFAULT | 4 | Provided explicitly. |
If the source of the default type does not distinguish
implied and unknown default types, then the value
IMPLIED_DEFAULT
MUST be used.
An AttributeDefinition
node created by the
createAttributeDefinition
method has its defaultType
attribute set to
UNKNOWN_DEFAULT
.
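A minimal Python sketch of choosing a DefaultValueType constant from the default declaration of an attribute follows. The constant values come from the table above; the function name and the use of the declaration's source text as input are assumptions.

```python
UNKNOWN_DEFAULT = 0
FIXED_DEFAULT = 1
REQUIRED_DEFAULT = 2
IMPLIED_DEFAULT = 3
EXPLICIT_DEFAULT = 4


def default_type(default_decl):
    """Map the text of a default declaration (hypothetical input) to a constant.

    default_decl is e.g. '#REQUIRED', '#IMPLIED', '#FIXED "v"' or '"v"',
    or None when no information is available.
    """
    if default_decl is None:
        return UNKNOWN_DEFAULT
    text = default_decl.strip()
    if text == "#REQUIRED":
        return REQUIRED_DEFAULT
    if text == "#IMPLIED":
        return IMPLIED_DEFAULT
    if text.startswith("#FIXED"):
        return FIXED_DEFAULT
    return EXPLICIT_DEFAULT  # a bare quoted default value
```

A source that cannot distinguish implied from unknown defaults would report "#IMPLIED" here, giving IMPLIED_DEFAULT as required above.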
The Entity
node is known as general entity, or when not
ambiguous, simply entity.
Each entity has an associated public ID and system ID. Unless otherwise specified, their values are the empty string when the entity is created.
partial interface Entity {
readonly attribute DocumentType? ownerDocumentTypeDefinition;
[TreatNullAs=EmptyString] attribute DOMString publicId;
[TreatNullAs=EmptyString] attribute DOMString systemId;
attribute DOMString? declarationBaseURI;
attribute DOMString? manakaiDeclarationBaseURI;
attribute DOMString? notationName;
readonly attribute DOMString hasReplacementTree;
attribute DOMString? manakaiEntityURI;
attribute DOMString? manakaiEntityBaseURI;
attribute boolean isExternallyDeclared;
};
Each entity has an associated notation name. Unless otherwise specified, it is null when the entity is created.
The notationName
attribute returns the notation name of the
context object.
On setting, the notationName
attribute of the
Entity
interface MUST set the
notation name of the
context object to the new value.
In DOM3 Core specification, this attribute was read-only.
Setting an invalid name to this attribute would make the node unserializable in the XML syntax.
Each entity has an associated replacement text. Unless otherwise specified, the replacement text is the empty string when an entity is created.
isExternallyDeclared
of type boolean
Whether the entity is declared by an external entity declaration or
not. If the value is true
, the entity is declared in an entity
declaration in the external subset entity or in an external parameter entity.
If the value is false
, the entity is declared in an entity
declaration in the internal subset, or the node is created in memory.
On getting, the attribute MUST return the value associated to this attribute.
On setting, it MUST set the specified value as the value associated to this attribute.
If the Entity
node is created during
the process to create a DOM from an XML document,
the following requirements are applied: If the entity is an unparsed
entity, then the attribute MUST be set to
false
. Otherwise, i.e. the entity is a parsed entity, then the
attribute MUST be set to the value of whether the
entity is declared by an external markup declaration or not.
An entity has an associated entity URL and entity base URL. Unless otherwise specified, their values are null.
On getting, the manakaiEntityURI
attribute of the Entity
interface MUST run these steps:
On setting, the attribute MUST run these steps:
It is expected that an XML parser supporting this specification sets the entity URL when an external entity is read.
The Notation
node is simply known as notation.
Each notation has an associated public ID and system ID. Unless otherwise specified, their values are the empty string when the notation is created.
partial interface Notation {
readonly attribute DocumentType? ownerDocumentTypeDefinition;
[TreatNullAs=EmptyString] attribute DOMString publicId;
[TreatNullAs=EmptyString] attribute DOMString systemId;
attribute DOMString? declarationBaseURI;
attribute DOMString? manakaiDeclarationBaseURI;
};
To determine the base URL of a node node, these steps MUST be followed:
Document
, return
document base URL of node and abort
these steps.
base
attribute in the
XML namespace:
base
attribute in the XML namespace, resolve the value of the attribute relative to the base URL of the attribute. If it succeeded, return the
result and abort these steps.
These steps ignore external entity boundaries, which is a willful violation of the XML Base specification [XMLBASE].
On getting, the declarationBaseURI
and manakaiDeclarationBaseURI
attributes of the DocumentType
interface MUST return the base URL of the context
object. On setting, they MUST do
nothing.
Each entity or notation has an associated declaration base URL. Its default value is null.
On getting, the declarationBaseURI
and
manakaiDeclarationBaseURI
attributes of the Entity
or Notation
interface MUST return effective declaration
base URL, i.e. the declaration base URL of the
context object if it is not null, or the
base URL of the context
object otherwise.
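The getter behavior for entities and notations reduces to a simple fallback, sketched here in Python. The function and parameter names are hypothetical, and computing the base URL of the node is taken as given.

```python
def effective_declaration_base_url(declaration_base_url, base_url):
    """Return the declaration base URL if it is not null, else the node's base URL."""
    if declaration_base_url is not None:
        return declaration_base_url
    return base_url
```

For example, an entity with no stored declaration base URL reports the base URL of the node itself.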
On setting, they MUST run these steps:
partial interface Document {
attribute DOMString? manakaiEntityBaseURI;
};
The document, like entities, has the entity base URL, initially null.
On getting, the manakaiEntityBaseURI
attribute of the Document
or Entity
interface MUST run these steps:
Entity
, return the manakaiEntityURI
attribute
value of the context object if it is not
null.
On setting, the attribute MUST run these steps:
manakaiEntityURI
attribute value, if the context object is an
Entity
and it is not null, or the base URL otherwise, of the
context object. Set the entity base URL of the
context object to the result if succeeded, or
null otherwise.
It is expected that an XML parser supporting this
specification sets the entity base
URL when a document or external entity is read. If the entity
is retrieved using HTTP, the Content-Base:
and
Content-Location:
header fields might affect the value.
Note also that they do not affect the value of the document
base URL of the document.
This section is non-normative.
This section describes how to implement namespace fixup.
This section is non-normative.
A namespace mapping is a data structure with the following fields:
xml
, the XML
namespace) and (xmlns
, the XMLNS
namespace).
There will be at most one pair with the same key in the prefix to URL list. However, there might be more than one pair with the same value in the list.
A default
namespace declaration with namespace URL url is
an attribute whose namespace is the XMLNS
namespace, local
name is xmlns
, and value is url.
A namespace prefix
declaration with prefix prefix and namespace
URL url is an attribute whose namespace is the XMLNS
namespace, namespace prefix is
xmlns
, local
name is prefix, and value is url.
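The two kinds of declaration attributes defined above can be modeled as in the following Python sketch. The Attr tuple and the helper names are hypothetical; the XMLNS namespace URL is the one defined by the Namespaces in XML specifications.

```python
from typing import NamedTuple, Optional

XMLNS_NS = "http://www.w3.org/2000/xmlns/"


class Attr(NamedTuple):
    namespace: Optional[str]
    prefix: Optional[str]
    local_name: str
    value: str


def default_namespace_declaration(url):
    """xmlns="url": XMLNS namespace, local name xmlns, value url."""
    return Attr(XMLNS_NS, None, "xmlns", url)


def namespace_prefix_declaration(prefix, url):
    """xmlns:prefix="url": XMLNS namespace, prefix xmlns, local name prefix."""
    return Attr(XMLNS_NS, "xmlns", prefix, url)


def qualified_name(attr):
    """Join prefix and local name the way they appear in a start tag."""
    if attr.prefix is None:
        return attr.local_name
    return attr.prefix + ":" + attr.local_name
```

Both kinds of attribute live in the XMLNS namespace; they differ only in whether the name xmlns is the local name or the prefix.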
This section is non-normative.
The steps for namespace fixup of an XML element for serialization, with an element element and optional namespace mapping nsmap, are as follows:
xml
, or if value is the XML namespace or the
XMLNS namespace, do nothing.
xmlns
:
These steps return three values. The first value, tag name, is used as the tag name in the start and end tag of the serialization of element. The second value, attribute specifications, contains zero or more pairs of attribute names and values, which are used in the start tag of the serialization of element. The third value, nsmap, is used as the input to the steps for serialization of the child elements of element.
The steps to obtain a QName, with parameters namespace URL, prefix, local name, attributes, and nsmap, where attributes and nsmap are shared with the callee such that any modification to them is also reflected to those variables in the steps which invoked these steps, and the is element flag, are as follows:
xmlns
:
xml
.
xmlns
.
xmlns
,
do nothing.
U+003A
COLON
character
(:
) to qname.
U+003A
COLON
character
(:
) to qname.
U+003A
COLON
character
(:
) to qname.
xmlns
:
an
where n is the minimum non-negative integer such that
an (the letter a followed by n)
is not found as a key of the
prefix to URL list
of nsmap.
U+003A
COLON
character
(:
) to qname.
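The choice of a generated prefix in the steps above can be sketched in Python. This assumes the generated prefix is the letter a followed by the decimal digits of n, and that the prefix to URL list is modeled as a dict; the function name is hypothetical.

```python
def generate_prefix(prefix_to_url):
    """Return "a" + n for the smallest non-negative n not already used as a key."""
    n = 0
    while "a%d" % n in prefix_to_url:
        n += 1
    return "a%d" % n
```

With a0, a1 and a3 already in use, the sketch picks a2, the smallest free candidate.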
These steps are based on, but not equivalent to, Gecko's implementation at the time of writing.
These steps do not prevent an unserializable set of tag
name and attributes from being returned when,
for example, there is an xmlns
attribute in null namespace, there is
an xml
attribute in
the XMLNS namespace whose value is not equal to the
XML namespace, or the given nsmap
contains a key which is not a valid XML NCName
.
This section is non-normative.
There is no comprehensive test suite of namespace fixup at the time of writing.
There are some tests at: <https://suika.suikawiki.org/~wakaba/test/web/dom/element/inner-html/>
and <https://github.com/manakai/perl-web-markup/blob/master/t/modules/Web-XML-Serializer.t>
.
If there is a parse error, the document is not well-formed.
Much of the parsing of invalid (well-formed or not) XML documents, and of the conformance of XML documents and XML DOM trees, is left undefined; this document therefore provides a guideline for conformance checkers.
Conceptually, validation of an XML document is split into two stages for the purpose of this specification: the XML document parsing stage and the DOM XML conformance checking stage.
The input to the XML document parsing stage is a byte sequence representing the parsed XML document (and any additional metadata), and the output is a DOM tree representing the XML document and zero or more errors. The processor that implements this stage is called parser. Requirements for a parser are defined in the section of Parsing an XML Document.
The input to the DOM XML conformance checking stage is a DOM tree, and the output is zero or more errors. The processor that implements this stage is called conformance checker. Requirements for a conformance checker are defined in the section of Checking an XML DOM Tree.
An error is ...
If a Document
node has no
xml-well-formedness-error, entity-error, or unknown-error,
then it is well-formed. If a well-formed Document
node has no xml-validity-error, it is valid.
A well‐formed Document
can be safely serialized
into a well‐formed XML document. A valid Document
can be easily serialized into a valid XML document.
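The two definitions above can be expressed as a small Python sketch over the set of error categories reported for a Document node. The function name and the representation of errors as category strings are assumptions.

```python
WELL_FORMEDNESS_BLOCKERS = {
    "xml-well-formedness-error",
    "entity-error",
    "unknown-error",
}


def classify(error_categories):
    """Return (well_formed, valid) for the error categories of a Document node."""
    categories = set(error_categories)
    well_formed = not (categories & WELL_FORMEDNESS_BLOCKERS)
    valid = well_formed and "xml-validity-error" not in categories
    return well_formed, valid
```

Categories such as round-trip-warning or misc-info do not affect either result in this sketch.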
To be a conforming validating XML processor, ...
Errors are classified into these error categories:
entity-error
@@
This algorithm does not support a DOM tree with one or more EntityReference
nodes. It is expected that any entity references are expanded
at parse time and that any unexpandable entity reference
raises a parse-time error, so that parsing never results in a
DOM tree with EntityReference
nodes.
round-trip-error
round-trip-warning
A round-trip-warning
will be raised when a construct, which might not be restored to the same
construct when it is serialized and then re-parsed by a conforming
processor, is encountered.
For a Comment
node a
round-trip-warning
will
be raised, since XML processors are not required to
report the text of comments to applications.
unknown-error
?xml-misc-error
xml-misc-fatal-error
xml-misc-recommendation
An
xml-misc-recommendation
will be raised if a SHOULD‐level requirement
in XML specification is not met.
xml-validity-error
xml-well-formedness-error
If an xml-well-formedness-error
is raised,
it would not be possible to generate an XML serialization that
would match the appropriate production rule and that would not
violate any well‐formedness constraint in the XML
specification [XML,
XML11].
misc-info
A misc-info
is raised when status information
about the parsing or checking process that is considered useful for debugging
and so on is available. It by no means implies the non-conformance of
the document.
@@ TODO: #dt-atuseroption at user option (MAY or MUST), #dt-compat for compatibility, #dt-interop for interoperability
TODO: XML 1.1, XML Namespace 1.0/1.1, xml:base, xml:id
TODO: XML "error"/"fatal error" is not always non-conforming (only when MUST or SHOULD).
The following algorithms and definitions are applied to XML documents; especially, they are not applied to HTML documents.
The XML version of a node is
the XML version of the document to which the node belongs.
For a Document
node, the XML version
of the document is the value of the xmlVersion
attribute of the node. For a DocumentType
node whose
ownerDocument
attribute is set to null
,
the XML version of the document is 1.0
.
For any other node, the XML version of the document
is that of the Document
node contained in the
ownerDocument
attribute of the node.
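The definition of the XML version of a node can be sketched in Python as follows. Nodes are modeled as plain dicts with nodeType, xmlVersion and ownerDocument keys, which is an assumption made purely for illustration.

```python
def xml_version(node):
    """Return the XML version of the document to which node belongs."""
    if node["nodeType"] == "Document":
        return node["xmlVersion"]
    owner = node.get("ownerDocument")
    if node["nodeType"] == "DocumentType" and owner is None:
        return "1.0"
    # The text above does not cover other nodes without an owner document;
    # this sketch fails with a TypeError in that case.
    return owner["xmlVersion"]
```

For instance, an Element node owned by an XML 1.1 document reports "1.1", while a detached DocumentType node reports "1.0".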
To validate an XML string (s), the following algorithm MUST be used:
Char10, then raise an xml-well-formedness-error.
CompatChar10, then raise an xml-misc-warning.
ControlChar10, then raise an xml-misc-warning.
U+000D CARRIAGE RETURN character, then raise a round-trip-error.
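Part of the checks above can be sketched in Python. The sketch assumes that Char10 denotes the Char production of XML 1.0 and that each error is reported once per string; the CompatChar10 and ControlChar10 checks are omitted because their definitions are not given here. The function names are hypothetical.

```python
def is_xml10_char(code_point):
    """Char production of XML 1.0 (assumed to be what Char10 refers to)."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


def validate_xml_string(s):
    """Return the error categories raised for s (at most one of each kind)."""
    errors = []
    if any(not is_xml10_char(ord(c)) for c in s):
        errors.append("xml-well-formedness-error")
    if "\r" in s:
        # A carriage return does not survive a serialize and re-parse cycle.
        errors.append("round-trip-error")
    return errors
```

A string containing U+0000 is reported as not well-formed, while a string containing only an otherwise legal U+000D is reported as a round-trip problem.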
@@ We should not raise duplicate errors for U+000D in attribute values. In addition, we should support a mode where U+000D will be serialized as a character reference (so that no round-trip-error will be raised).
To validate a Name (s), the following algorithm MUST be used:
xml-well-formedness-error. Abort these steps.
NameStartChar10, then raise an xml-well-formedness-error.
NameChar10, then raise an xml-well-formedness-error.
xml (in any case combination), then raise an xml-misc-warning.
@@ except for attribute names xml:lang, xml:space.
To validate an NCName (s), the following algorithm MUST be used:
To validate a public identifier (pid), the following algorithm MUST be used:
null, abort these steps.
PubidChar, then raise an xml-well-formedness-error.
U+0009 CHARACTER TABULATION, U+000A LINE FEED, and U+000D CARRIAGE RETURN characters, if the first character of pid is a U+0020 SPACE character, if the last character of pid is a U+0020 SPACE character, or if there is a U+0020 SPACE character immediately followed by another U+0020 SPACE character in pid, then it is a round-trip-error.
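The public identifier checks can be sketched in Python as follows. The character class is the PubidChar production of XML; the function name and the reporting of errors as a list of category strings are assumptions.

```python
import re

# PubidChar ::= #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]
_PUBID_CHARS = re.compile(r"[ \r\na-zA-Z0-9\-'()+,./:=?;!*#@$_%]*")


def validate_public_identifier(pid):
    """Return the error categories raised for the public identifier pid."""
    errors = []
    if pid is None:
        return errors
    if _PUBID_CHARS.fullmatch(pid) is None:
        errors.append("xml-well-formedness-error")
    if (
        re.search(r"[\t\n\r]", pid)
        or pid.startswith(" ")
        or pid.endswith(" ")
        or "  " in pid
    ):
        # White space that would be changed by public identifier normalization.
        errors.append("round-trip-error")
    return errors
```

An identifier such as -//W3C//DTD XHTML 1.0 Strict//EN passes both checks, while one with doubled spaces only triggers the round-trip check.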
Is this really a roundtripness problem? In fact, the XML specification only defines how to match public identifiers, not a canonical form.
To validate a system identifier (sid), the following algorithm MUST be used:
null, abort these steps.
U+0022 QUOTATION MARK (") and U+0027 APOSTROPHE (') characters, raise an xml-well-formedness-error.
U+0023 NUMBER SIGN (#) character, then raise an xml-misc-error.
Node
The algorithm to check a node (n) is defined as follows:
Attr node
localName attribute value as an NCName.
prefix attribute value is different from null, then validate the prefix attribute value as an NCName.
childNodes list of n,
Text or EntityReference node, then it is an xml-well-formedness-error.
EntityReference node, then it is an entity-error.
nodeName attribute of n is xml:space @@ or {xml namespace}:space ? and value attribute of n is neither default nor preserve, then it is an xml-misc-error.
specified,
manakaiAttributeType (#ValueType Validity constraint: Attribute Value Type)
value of n.
ID_ATTR
Name. If it fails, then raise an xml-validity-error.
ID v is defined, then raise an xml-validity-error.
Name. If it fails, then raise an xml-validity-error.
ID v is NOT defined, then raise an xml-validity-error.
Name. If it fails, then raise an xml-validity-error.
Entity v is NOT defined, then raise an xml-validity-error.
Nmtoken. If it fails, then raise an xml-validity-error.
xml-validity-error.
xml-validity-error.
xml-validity-error.
AttributeDefinition
node
nodeName attribute of n is xml:space @@ or {xml namespace}:space ? and its declared type is different from (default|preserve), (preserve|default), (default), or (preserve), then raise an xml-misc-error.
childNodes list of n,
Text or EntityReference node, then it is an xml-well-formedness-error.
EntityReference node, then it is an entity-error.
NOTATION_ATTR, enumerated values MUST be declared. If not, then raise an xml-validity-error.
NOTATION_ATTR or ENUMERATION_ATTR, values MUST all be distinct. If not, then raise an xml-validity-error.
NOTATION_ATTR on an EMPTY element, then raise an xml-validity-error.
CDATASection
node
data attribute value as XML character data.
data attribute value contains a string ]]>, then raise an xml-well-formedness-error.
childNodes list of n contains any nodes, it is an xml-well-formedness-error.
Comment
node
round-trip-warning.
data attribute value as XML character data.
data attribute value contains a string --, or if it ends with a character -, then raise an xml-well-formedness-error.
childNodes list of n contains any nodes, it is an xml-well-formedness-error.
Document
node
1.0 or 1.1, then it is an unknown-error?.
xmlEncoding attribute value does not match [A-Za-z] ([A-Za-z0-9._] | '-')* @@ formal def, then it is an xml-well-formedness-error.
childNodes list of n has to consist of zero or more Comment and/or ProcessingInstruction nodes, followed by an optional DocumentType node, followed by zero or more Comment and/or ProcessingInstruction nodes, followed by an Element node, followed by zero or more Comment and/or ProcessingInstruction nodes. Any violation of this is an xml-well-formedness-error.
childNodes list of n,
EntityReference node, then check nc recursively.
allDeclarationsProcessed
DocumentFragment
node
childNodes list of n,
Element, Text, CDATASection, Comment, ProcessingInstruction, or EntityReference node, then it is an xml-well-formedness-error.
EntityReference node, then it is an entity-error.
DocumentType
nodenodeName
attribute value as an NCName
.ownerDocument
attribute of n is
null
, then abort these substeps.documentElement
attribute of the node
set to ownerDocument
attribute of n is
null
, then abort these substeps.nodeName
attribute of the node set to
documentElement
attribute of the node set to
ownerDocument
attribute of n is
different from nodeName
of n,
then raise an
xml-validity-error
.publicId
attribute value as a public identifier.systemId
attribute value as a system identifier.publicId
attribute value of n is
not null
and the systemId
attribute
value of n is null
, then raise an
xml-well-formedness-error
.
@@ publicId == null? Or, publicId == ""childNodes
list of n,
ProcessingInstruction
node, then it is an
xml-well-formedness-error
.
@@ ref to manakai's extensionsentities
, notations
,
and elementTypes
lists of n,
check the
node recursively.NamedNodeMap
object in the entities
attribute of n does not contain Entity
nodes
whose nodeName
attribute values are amp
,
lt
, gt
, apos
, and quot
then raise
xml-misc-recommendation
(s).
Element
nodelocalName
attribute value as an NCName
.prefix
attribute value is different from
null
, then validate
the prefix
attribute value as an NCName
.childNodes
list of n,
Element
,
Text
, CDATASection
, Comment
,
ProcessingInstruction
, or
EntityReference
node, then it is an
xml-well-formedness-error
.EntityReference
node, then it is an
entity-error
.attribute
attribute of n. Check conformance of attrs
as follows:
Attr
node whose
nodeName
attribute value is equal to that of another
Attr
node in attrs, then raise an
xml-well-formedness-error
.ElementTypeDefinition
nodechildNodes
list of n contains
any nodes, then it is an
xml-well-formedness-error
.
At user option, an XML processor MAY issue a warning when a declaration mentions an element type for which no declaration is provided, but this is not an error.
For compatibility, it is an error if the content model allows an element to match more than one occurrence of an element type in the content model.
At user option, an XML processor MAY issue a warning if attributes are declared for an element type not itself declared, but this is not an error.
AttributeDefinition
node
with attribute type ID
in the
NamedNodeMap
list contained in the
attributeDefinitions
attribute of n, then raise an
xml-validity-error
.AttributeDefinition
node
with attribute type NOTATION
in the
NamedNodeMap
list contained in the
attributeDefinitions
attribute of n, then raise an
xml-validity-error
.Entity
node whose
notationName
attribute value is null
(i.e. a
parsed entity)entity-error
.nodeName
attribute value as an NCName
.publicId
attribute value as a public identifier.systemId
attribute value as a system identifier.publicId
attribute value of n is
not null
and the systemId
attribute
value of n is null
, then raise an
xml-well-formedness-error
.childNodes
list of n,
Element
,
Text
, CDATASection
, Comment
,
ProcessingInstruction
, or EntityReference
node, then it is an
xml-well-formedness-error
.EntityReference
node, then it is an
entity-error
.Entity
node whose
notationName
attribute value is not null
(i.e. an unparsed entity)nodeName
attribute value as an NCName
.publicId
attribute value as a public identifier.systemId
attribute value as a system identifier.systemId
attribute value of n is
null
, then raise an
xml-well-formedness-error
.notationName
attribute value of n as an
NCName
.childNodes
list of n contains
any nodes, then it is an
xml-well-formedness-error
.EntityReference
nodeentity-error
.nodeName
attribute value as an NCName
.childNodes
list of n,
Element
,
Text
, CDATASection
, Comment
,
ProcessingInstruction
, or EntityReference
node, then it is an
xml-well-formedness-error
.EntityReference
node, then it is an
entity-error
.Notation
nodenodeName
attribute value as an NCName
.publicId
attribute value as a public identifier.systemId
attribute value as a system identifier.childNodes
list of n contains
any nodes, then it is an
xml-well-formedness-error
.ProcessingInstruction
nodetarget
attribute value matches the string
xml
in any case combination, then raise an
xml-well-formedness-error
.target
attribute value as an NCName
.data
attribute value as an XML character
data.data
attribute value contains a string
?>
, then raise an
xml-well-formedness-error
.data
attribute value starts with either
U+0009
CHARACTER
TABULATION
, U+000A
LINE
FEED
, U+000D
CARRIAGE
RETURN
, or U+0020
SPACE
character, then raise a
round-trip-error
.childNodes
list of n contains
any nodes, then raise an
xml-well-formedness-error
.Text
nodedata
attribute value as an XML character
data.childNodes
list of n contains
any nodes, then it is an
xml-well-formedness-error
.
This section defines a couple of character classes. These classes are referred to by algorithms specified above.
Character class Char10
contains the following characters:
U+0009
CHARACTER
TABULATION
U+000A
LINE FEED
U+000D
CARRIAGE
RETURN
U+0020
SPACE
.. U+D7FF
U+E000
.. U+FFFD
REPLACEMENT CHARACTER
U+10000
.. U+10FFFF
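As a non-normative illustration, membership in Char10 can be sketched as a range test over code points. The function names `is_char10` and `is_xml_character_data` are hypothetical helpers introduced only for this example; they are not part of any manakai API, and the second one assumes that "XML character data" here means a string made entirely of Char10 characters.

```python
def is_char10(ch: str) -> bool:
    """Return True if the single character ch is in the Char10 class."""
    cp = ord(ch)
    return (
        cp in (0x0009, 0x000A, 0x000D)   # TAB, LINE FEED, CARRIAGE RETURN
        or 0x0020 <= cp <= 0xD7FF        # SPACE .. U+D7FF
        or 0xE000 <= cp <= 0xFFFD        # U+E000 .. REPLACEMENT CHARACTER
        or 0x10000 <= cp <= 0x10FFFF     # supplementary planes
    )


def is_xml_character_data(value: str) -> bool:
    """Assumption: a string qualifies if every character is in Char10."""
    return all(is_char10(ch) for ch in value)
```

Under these assumptions, a string containing U+0000 or U+FFFE would fail the check, while ordinary text with tabs and line breaks would pass.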
Character class CompatChar10
contains the following characters:
Document authors are encouraged to avoid "compatibility characters", as defined in section 6.8 of [Unicode @@ Unicode 2.0 @@] (see also D21 in section 3.6 of [Unicode3]).
Character class ControlChar10
contains the following characters:
U+007F
DELETE
..
U+0084
INDEX
U+0086
START OF SELECTED
AREA
.. U+009F
APPLICATION
PROGRAM COMMAND
U+FDD0
.. U+FDEF
U+1FFFE
.. U+1FFFF
U+2FFFE
.. U+2FFFF
U+3FFFE
.. U+3FFFF
U+4FFFE
.. U+4FFFF
U+5FFFE
.. U+5FFFF
U+6FFFE
.. U+6FFFF
U+7FFFE
.. U+7FFFF
U+8FFFE
.. U+8FFFF
U+9FFFE
.. U+9FFFF
U+AFFFE
.. U+AFFFF
U+BFFFE
.. U+BFFFF
U+CFFFE
.. U+CFFFF
U+DFFFE
.. U+DFFFF
U+EFFFE
.. U+EFFFF
U+FFFFE
.. U+FFFFF
U+10FFFE
.. U+10FFFF
This character class contains the characters listed in the Note in Section 2.2 of XML 1.0 [XML], as amended by errata.
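As a non-normative illustration, the ranges listed above for ControlChar10 can be sketched as a small table, with the last two code points of planes 1 through 16 generated rather than written out. The names `CONTROL_CHAR10_RANGES` and `is_control_char10` are hypothetical and used only for this example.

```python
# Fixed ranges taken from the list above.
CONTROL_CHAR10_RANGES = [
    (0x007F, 0x0084),   # DELETE .. INDEX
    (0x0086, 0x009F),   # START OF SELECTED AREA .. APPLICATION PROGRAM COMMAND
    (0xFDD0, 0xFDEF),
]
# U+nFFFE .. U+nFFFF for planes 1 to 16 (U+1FFFE .. U+10FFFF).
CONTROL_CHAR10_RANGES += [
    ((plane << 16) | 0xFFFE, (plane << 16) | 0xFFFF)
    for plane in range(1, 0x11)
]


def is_control_char10(ch: str) -> bool:
    """Return True if the single character ch is in ControlChar10."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CONTROL_CHAR10_RANGES)
```

Note that U+0085 is deliberately absent from the first two ranges, matching the list above.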
The character class NameStartChar10
contains the following characters:
This character class contains all characters allowed as the first character
of a string matching the production rule
Name
of XML 1.0
[XML].
The character class NameChar10
contains the following characters:
This character class contains all characters allowed as the second or subsequent
character of a string matching the production rule
Name
of XML 1.0
[XML].
The character class PubidChar
contains the following characters:
U+0009
CHARACTER
TABULATION
U+000A
LINE FEED
U+000D
CARRIAGE
RETURN
U+0020
SPACE
U+0021
EXCLAMATION MARK
(!
)U+0023
NUMBER SIGN
(#
)U+0024
DOLLAR SIGN
($
)U+0025
PERCENT SIGN
(%
)U+0027
APOSTROPHE
('
)U+0028
LEFT PARENTHESIS
((
)U+0029
RIGHT
PARENTHESIS
()
)U+002A
ASTERISK
(*
)U+002B
PLUS SIGN
(+
)U+002C
COMMA
(,
)U+002D
HYPHEN-MINUS
(-
)U+002E
FULL STOP
(.
)U+002F
SOLIDUS
(/
)U+0030
DIGIT ZERO
(0
) .. U+0039
DIGIT NINE
(9
)U+003A
COLON
(:
)U+003B
SEMICOLON
(;
)U+003D
EQUAL SIGN
(=
)U+003F
QUESTION MARK
(?
)U+0040
COMMERCIAL AT
(@
)U+0041
LATIN CAPITAL LETTER
A
(A
) .. U+005A
LATIN CAPITAL LETTER Z
(Z
)U+005F
LOW LINE
(_
)U+0061
LATIN SMALL LETTER
A
(a
) .. U+007A
LATIN SMALL LETTER Z
(z
)
Earlier versions of this specification defined the DOM feature [DOM3CORE]
http://suika.fam.cx/www/2006/feature/XDoctype
and
http://suika.suikawiki.org/www/2006/feature/XDoctype
,
version 3.0
.
The internalSubset
attribute of the
DocumentTypeDefinition
interface has been removed.
The hasReplacementTree
attribute of the
Entity
interface MUST return
false.
The attribute was used to get or set whether the entity has the replacement subtree (as its
childNodes
) or not.
partial interface Element {
readonly attribute DOMString? manakaiBaseURI;
};
partial interface ProcessingInstruction {
readonly attribute DOMString? manakaiBaseURI;
};
The manakaiBaseURI
attribute of the Element
and
ProcessingInstruction
interfaces MUST return null.
Need to update references
<http://www.w3.org/TR/DOM-Level-3-Core>
.<http://www.w3.org/TR/DOM-Bindings/>
.<http://www.w3.org/TR/xml-infoset>
.<http://www.w3.org/TR/DOM-Level-3-AS>
.<http://www.w3.org/Submission/xmlschema-api/>
.