This document is a technical specification produced as part of the manakai project. It might be updated, replaced, or obsoleted by other documents at any time.
The scope of this specification is limited to the products within the manakai project. It is not intended to be implemented by multiple parties, although nothing prevents it from being implemented by other DOM implementations.
This section is non-normative.
This specification defines details of DOM tree validation not covered by other applicable specifications.
This specification's goals are to:
The obsvocab specification might be merged with this specification in due course.
This specification was originally published at <http://suika.suikawiki.org/www/markup/xml/validation-langs> as Handling of unknown namespaces in conformance checking since .
Some of the text in this specification was originally part of comments and documentation of the Perl modules developed by the manakai project.
Earlier versions of this specification had defined validation mode, which was an attempt to define the interaction of RDF/XML, RSS1, RSS2, and XSLT vocabularies with the rest of the platform in validation. The concept was abandoned because it introduced too much complexity into the specification and validators.
Earlier versions of this specification had contained non-normative descriptions of how to validate HTML script and style elements. As definitions of these elements were slightly simplified in the HTML Standard, these descriptions were removed.
Earlier versions of this specification had defined RDF/XML integration, which was dropped due to lack of interest.
This specification depends on the Infra Standard. The terms list, code point, concatenate, ASCII case-insensitive, HTML namespace, XML namespace, and XMLNS namespace are defined by the Infra Standard.
The term label is defined by the Encoding Standard.
The terms valid URL string, absolute-URL-with-fragment string, and relative-URL-with-fragment string are defined by the URL Standard.
The terms MIME type and valid MIME type are defined by the MIME Sniffing Standard.
The term valid language tag is defined by BCP 47.
For the purpose of this specification, user agents are conformance checkers, also known as validators.
User agents MUST implement the DOM Standard. The terms tree, child, parent, ancestor, descendant, node, node document, document, content type, document element, element, attribute list, attribute, value, namespace (of element or attribute), prefix (of element or attribute), local name (of element or attribute), comment (Comment), processing instruction (ProcessingInstruction), target, template content, child text content, shadow root, and shadow tree are defined by the DOM Standard.
User agents MUST implement the HTML Standard. The terms applicable specification, willful violation, document base URL, content model, inter-element whitespace, nothing, valid non-negative integer, valid floating-point number, valid global date and time string, valid e-mail address, parse a URL, HTML element, div, and parse error are defined by the HTML Standard.
An element or attribute is in no namespace if its namespace is null.
A non-null namespace is a namespace that is not null.
Unless otherwise specified, an element MAY have child comments and processing instructions. In addition, unless otherwise specified, an element whose content model is not text (with or without additional constraints) MAY have child inter-element whitespace. These children are ignored for the purpose of content model validation of the element.
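The filtering step above could be sketched in Python roughly as follows. The sketch assumes a `xml.dom.minidom` tree and a hypothetical caller-supplied flag, `content_model_is_text`, that says whether the element's content model is text. It drops comments and processing instructions. For non-text content models it also drops text nodes made only of ASCII whitespace, and it returns only the children that content model validation needs to look at.

```python
from xml.dom import Node, minidom

# ASCII whitespace as defined by the Infra Standard.
ASCII_WHITESPACE = " \t\n\f\r"


def content_model_children(element, content_model_is_text):
    """Return the children of `element` that take part in content model validation.

    `content_model_is_text` is a hypothetical flag: True when the element's
    content model is text (with or without additional constraints).
    """
    relevant = []
    for child in element.childNodes:
        # Comments and processing instructions are allowed and ignored.
        if child.nodeType in (Node.COMMENT_NODE, Node.PROCESSING_INSTRUCTION_NODE):
            continue
        # Inter-element whitespace is ignored unless the content model is text.
        if (
            not content_model_is_text
            and child.nodeType == Node.TEXT_NODE
            and child.data.strip(ASCII_WHITESPACE) == ""
        ):
            continue
        relevant.append(child)
    return relevant


doc = minidom.parseString("<list>\n  <!-- note -->\n  <?pi data?>\n  <item/>\n</list>")
print([c.nodeName for c in content_model_children(doc.documentElement, False)])
```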
A shadow root's content model is flow content.
This subsection defines how to interpret old unmaintained specifications for the purpose of validation.
These are sometimes willful violations of relevant specifications.
When a specification is written without clear statements of requirements, elements and attributes in the namespace and RSS2 elements defined by the specification MUST NOT conflict with the descriptions in that specification. Elements and attributes in the namespace and RSS2 elements not defined by any specification MUST NOT be used. Deprecated features SHOULD NOT be used.
Features that are "reserved" cannot be used unless they are defined by a later specification.
When a value has to match a production rule, it MUST also conform to other requirements for that construct.
When an element's content or an attribute is defined to be one of the following constructs, the element's child text content or the attribute value MUST instead conform to the corresponding construct:
A text.
If IRI and IRI reference are distinguished and it is identified as an IRI, it MUST also be an absolute-URL-with-fragment string.
To interpret a string, the rules to parse a URL MUST be used with the node document of the node in which the string appears.
A valid URL string that is a relative-URL-with-fragment string.
charset parameter should not be specified explicitly
A valid MIME type with no charset parameter.
X-
A label
A string in the RECOMMENDED format for e-mail addresses.
Same as the id attribute in no namespace of HTML elements
div element
When an element has to be an XHTML Modularization div element that SHOULD be suitable for handling as XHTML, it MUST be a div element in the HTML namespace.
When an element has HTML fragment content, setting the innerHTML of a new div element in the HTML namespace to the element's child text content MUST NOT generate a parse error or create a non-conforming tree.
Validators are expected to detect obsolete features and show errors and alternative features, if known, to improve authoring experiences.
When a specification is identified as obsolete, the features, including but not limited to the elements and attributes, defined by the specification MUST NOT be used unless otherwise specified by another specification. User agents MAY ignore requirements in the specification. The namespaces defined by an obsolete specification are obsolete unless otherwise specified by another specification.
When a namespace is identified as obsolete, the elements and attributes in the namespace, as well as attributes in no namespace, MUST NOT be used unless otherwise specified by another specification.
The following namespaces are obsolete:
http://www.w3.org/TR/WD-xsl
http://www.w3.org/1999/xlink
http://www.w3.org/2001/xml-events
http://www.w3.org/2002/06/xhtml2
http://www.w3.org/2002/06/xhtml2/
http://www.w3.org/2005/07/aaa
Though obsolete, some attributes in the http://www.w3.org/1999/xlink namespace are allowed to be specified for SVG elements.
These were once implemented by some Web user agents or used on the Web, but they are no longer considered part of the Web platform. Their use could indicate authoring errors.
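The obsolete-namespace rules could be encoded as a lookup table, as in the Python sketch below. The SVG namespace URL is the standard one and does not appear in this specification. `svg_allows_xlink` is a hypothetical predicate: this specification leaves the list of xlink attributes permitted on SVG elements to the SVG rules. Other "unless otherwise specified" exceptions are not modeled.

```python
# Namespaces listed as obsolete in this specification.
OBSOLETE_NAMESPACES = frozenset({
    "http://www.w3.org/TR/WD-xsl",
    "http://www.w3.org/1999/xlink",
    "http://www.w3.org/2001/xml-events",
    "http://www.w3.org/2002/06/xhtml2",
    "http://www.w3.org/2002/06/xhtml2/",
    "http://www.w3.org/2005/07/aaa",
})

SVG_NAMESPACE = "http://www.w3.org/2000/svg"
XLINK_NAMESPACE = "http://www.w3.org/1999/xlink"


def is_obsolete_element(namespace):
    return namespace in OBSOLETE_NAMESPACES


def is_obsolete_attribute(attr_namespace, attr_local_name, owner_namespace, svg_allows_xlink):
    """Return True if the attribute MUST NOT be used because it is obsolete.

    `svg_allows_xlink` is a hypothetical predicate supplied by the SVG rules.
    """
    if attr_namespace is None:
        # Attributes in no namespace of an element in an obsolete namespace.
        return owner_namespace in OBSOLETE_NAMESPACES
    if attr_namespace not in OBSOLETE_NAMESPACES:
        return False
    # Exception: some xlink attributes are still allowed on SVG elements.
    if (attr_namespace == XLINK_NAMESPACE and owner_namespace == SVG_NAMESPACE
            and svg_allows_xlink(attr_local_name)):
        return False
    return True
```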
There are many protocol or language features that are extensible. A user agent might or might not know how to validate a feature's instance. Therefore, the answer to a question "Is this instance conforming?" can be yes, no, or unfamiliar.
When a user agent is unfamiliar with a feature, it MUST report the validation result as unfamiliar.
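A validator might model the three possible answers as a tri-state result, as in this minimal Python sketch. The `checkers` registry is a hypothetical structure that maps feature kinds to checking functions. When no checker is registered, the validator is unfamiliar with the feature, so it reports unfamiliar rather than guessing yes or no.

```python
from enum import Enum


class Result(Enum):
    CONFORMING = "yes"
    NON_CONFORMING = "no"
    UNFAMILIAR = "unfamiliar"


def validate_feature(feature_kind, instance, checkers):
    """Validate `instance` with the checker registered for `feature_kind`.

    `checkers` is a hypothetical dict of functions returning True (conforming)
    or False (non-conforming).
    """
    checker = checkers.get(feature_kind)
    if checker is None:
        # No knowledge of this feature: report unfamiliar rather than guess.
        return Result.UNFAMILIAR
    return Result.CONFORMING if checker(instance) else Result.NON_CONFORMING


checkers = {"mime-type": lambda s: "/" in s}  # deliberately crude example checker
print(validate_feature("x-unknown", "", checkers))
```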
A user agent either fully supports, partially supports, or does not support a non-null namespace.
A non-null namespace is fully supported if all of its elements and attributes are defined by applicable specifications and all of them are implemented to some extent.
The HTML Standard defines that "Authors MUST NOT use elements, attributes, or attribute values that are not permitted by this specification or other applicable specifications". Use of elements or attributes in a fully supported namespace is not allowed unless they are specified in applicable specifications.
A non-null namespace is partially supported if its elements and attributes are defined by applicable specifications and some of them are implemented.
A non-null namespace is not supported if its elements and attributes are not implemented.
A user agent that implements RSS1 but does not implement RDF/XML other than as part of RSS1 support fully or partially supports the RSS namespace, but it does not support the RDF namespace.
Whether a namespace is fully supported, partially supported, or not supported is orthogonal to whether its elements and attributes are unknown. A user agent might not support the validation of the value of an attribute in a fully supported namespace.
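The three support levels could be approximated with a small Python function. This is a sketch under a hypothetical model. `defined_names` holds the namespace's elements and attributes defined by applicable specifications, as pairs such as `("element", "channel")`. `implemented_names` holds those the validator implements to some extent.

```python
def namespace_support(defined_names, implemented_names):
    """Classify support for a non-null namespace (hypothetical set-based model)."""
    defined = set(defined_names)
    implemented = set(implemented_names) & defined
    if not implemented:
        return "not supported"
    if implemented == defined:
        return "fully supported"
    return "partially supported"


rss = {("element", "channel"), ("element", "item"), ("attribute", "about")}
print(namespace_support(rss, {("element", "channel")}))
```

Representing names as `(kind, local name)` pairs keeps an element and an attribute with the same local name distinct.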
Unknown elements are the following elements:
Unknown attributes are the following attributes:
Unknown elements, attributes in no namespace of unknown elements, other than superglobal attributes, and unknown attributes are unfamiliar features.
An unknown element or unknown attribute MUST NOT be used anywhere except where they are explicitly allowed.
An unknown element MAY be used as the document element or as an orphan node. It MAY also be used where any kind of element is allowed.
An unknown element MAY have any attribute in no namespace or any unknown attribute.
In addition, any attribute allowed by an applicable specification is allowed to be specified for an unknown element.
An unknown element MAY have any kind of child.
An unknown attribute MAY have any value. An attribute in no namespace of an unknown element MAY have any value.
Validators are expected to report errors or warnings on unknown elements and attributes in ways that are useful for authors.
This specification is not intended to override any other specification's requirements.
For a public Web document, a non-standard element that is not defined by any applicable specification ought to be reported as an error.
In the following example, as the bookmark element in the http://mybookmark.example/ namespace is not defined by any applicable specification, this fragment is non-conforming:
<div xmlns="http://www.w3.org/1999/xhtml">
<bookmark xmlns="http://mybookmark.example/">Hello</bookmark>
</div>
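Here is a rough Python sketch of how a validator could flag the bookmark element in this example. It uses the standard `xml.etree.ElementTree` module and a hypothetical `KNOWN_NAMESPACES` set that stands in for "namespaces defined by applicable specifications". A real validator would consult its own definitions instead. Elements in no namespace are not flagged.

```python
import xml.etree.ElementTree as ET

# Hypothetical: namespaces for which this validator has definitions.
KNOWN_NAMESPACES = frozenset({"http://www.w3.org/1999/xhtml"})


def split_tag(tag):
    """Split ElementTree's "{namespace}local" notation into (namespace, local name)."""
    if tag.startswith("{"):
        namespace, _, local = tag[1:].partition("}")
        return namespace, local
    return None, tag  # element in no namespace


def elements_in_unknown_namespaces(source):
    root = ET.fromstring(source)
    found = []
    for element in root.iter():  # includes the root element itself
        namespace, local = split_tag(element.tag)
        if namespace is not None and namespace not in KNOWN_NAMESPACES:
            found.append((namespace, local))
    return found


example = """<div xmlns="http://www.w3.org/1999/xhtml">
<bookmark xmlns="http://mybookmark.example/">Hello</bookmark>
</div>"""
print(elements_in_unknown_namespaces(example))
```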
For XML data that is not expected to be shown directly to users (e.g. XML data retrieved via XMLHttpRequest), use of the null namespace or non-standard namespaces ought not to be an error.
For example, the following document fragment should not be considered non-conforming, even though none of the data and item elements and the name attribute is defined by any public standard:
<data>
<item name="x1"/>
<item name="x2"><p xmlns="http://www.w3.org/1999/xhtml">Hi!</p></item>
</data>
If the p element contained an item element in no namespace, it ought to be reported as an error, as no standard defines the item element in no namespace as phrasing content.
Unknown processing instructions are processing instructions whose target does not begin with xml- (ASCII case-insensitive). Unknown processing instructions are unfamiliar features.
Processing instructions whose target begins with xml- are reserved.
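Here is a minimal Python sketch of the target classification above. Only ASCII upper-case letters are folded, as ASCII case-insensitive matching requires. Python's `str.lower()` would also fold non-ASCII characters. The labels "reserved" and "unknown" are names chosen for this illustration.

```python
import string

# Fold only ASCII upper-case letters, as ASCII case-insensitive matching requires.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def classify_pi_target(target):
    """Return "reserved" for targets beginning with xml- (ASCII case-insensitive),
    otherwise "unknown" (an unfamiliar feature)."""
    if target[:4].translate(_ASCII_LOWER) == "xml-":
        return "reserved"
    return "unknown"


print(classify_pi_target("XML-Stylesheet"), classify_pi_target("php"))
```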
Several features are intended for limited use, i.e. they are not expected to be implemented universally. Such features still have legitimate uses, e.g. being used internally during preparation of public documents, being used as data formats of particular applications, serving as data to be processed by public applications, and so on. Use of limited use features is not invalid. However, use of such features in a public context could be problematic. At user option, user agents MAY report an error when an instance of a limited use feature is detected.
An unprocessed internal vocabulary might be erroneously exposed within a public document when the authoring pipeline is partially broken. A validator with the option enabled can detect this problem.
The following features are intended for limited use:
title, description, publisher, contributor, type, format, identifier, source, language, coverage, or rights in the Dublin Core namespace
rating, skipDays, day, skipHours, hour, textInput, or ttl
http://www.w3.org/2000/09/xmldsig#, http://purl.org/rss/1.0/modules/slash/, the Atom Threading namespace, http://purl.org/syndication/history/1.0, http://purl.org/atompub/tombstones/1.0, http://www.w3.org/2007/app, http://schemas.google.com/g/2005, and http://www.hatena.ne.jp/info/xmlns#
i-default
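The user option described above could look like this Python sketch. It covers only elements: the listed Dublin Core elements and elements in the listed namespaces. The Atom Threading namespace URL is taken from its definition in the Atom section of this specification. The rating/skipDays entry and i-default are left out, because their exact context is not modeled here. The `report_limited_use` flag is a hypothetical name for the user option.

```python
DUBLIN_CORE = "http://purl.org/dc/elements/1.1/"

LIMITED_USE_DC_ELEMENTS = frozenset({
    "title", "description", "publisher", "contributor", "type", "format",
    "identifier", "source", "language", "coverage", "rights",
})

LIMITED_USE_NAMESPACES = frozenset({
    "http://www.w3.org/2000/09/xmldsig#",
    "http://purl.org/rss/1.0/modules/slash/",
    "http://purl.org/syndication/thread/1.0",  # Atom Threading namespace
    "http://purl.org/syndication/history/1.0",
    "http://purl.org/atompub/tombstones/1.0",
    "http://www.w3.org/2007/app",
    "http://schemas.google.com/g/2005",
    "http://www.hatena.ne.jp/info/xmlns#",
})


def is_limited_use_element(namespace, local_name):
    if namespace == DUBLIN_CORE:
        return local_name in LIMITED_USE_DC_ELEMENTS
    return namespace in LIMITED_USE_NAMESPACES


def limited_use_errors(elements, report_limited_use):
    """`elements` is an iterable of (namespace, local name) pairs.

    Nothing is reported unless the user enabled the option, because use of
    limited use features is not itself invalid.
    """
    if not report_limited_use:
        return []
    return [(ns, name) for ns, name in elements if is_limited_use_element(ns, name)]
```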
If a user agent supports WebDAV, the elements in the namespaces defined by the WebDAV specifications would be marked as limited use, as they are not supported by general-purpose Web user agents and are not suitable for public Web documents.
Unless otherwise specified, the attributes class, id, and slot in no namespace MUST conform to the requirements for the attribute with the same name on an HTML element.
This does not mean that their use is conforming.
User agents MUST implement the XML specification and the XML Namespaces specification. The XML specification is Extensible Markup Language (XML) 1.0 and Errata in REC-xml-20081126. The xml:lang and xml:space attributes are defined by the XML specification. The XML Namespaces specification is Namespaces in XML 1.0 and Namespaces in XML 1.0 (Third Edition) Errata.
Since user agents interpret the xml:lang and xml:space attributes defined by the XML specification according to the XML Namespaces specification, they are treated as lang and space attributes in the XML namespace, not as xml:lang and xml:space attributes in no namespace.
The XML Base and xml:id specifications are obsolete.
Therefore, use of xml:base and xml:id attributes is non-conforming.
For the purpose of validation of the base element in the HTML namespace, attributes in the XMLNS namespace are not considered to take URLs. Base URLs are not applied to them.
Elements and attributes whose prefix or local name begins with xml (ASCII case-insensitive) MUST NOT be considered reserved, contrary to the requirement in the XML Namespaces specification.
This is a willful violation of the XML Namespaces specification for consistency with the XML specification.
If an element is a unique property element and the element's parent is not null, the element's parent MUST NOT have another child element with the same namespace and local name.
User agents MUST implement the RSS1 specification. The RSS1 specification is RDF Site Summary (RSS) 1.0. The RDF namespace is http://www.w3.org/1999/02/22-rdf-syntax-ns#.
The RSS namespace is http://purl.org/rss/1.0/.
The Dublin Core namespace is http://purl.org/dc/elements/1.1/.
User agents MUST implement the RSS1 content specification. The RSS1 content specification is RDF Site Summary 1.0 Modules: Content. The RSS content namespace is http://purl.org/rss/1.0/modules/content/.
The RSS1 content specification is obsolete except for the encoded element in the RSS content namespace.
The encoded element in the RSS content namespace, which is shown only as a "draft" in the specification, is de facto part of RSS1, while the other "formal" elements in that specification have not been used at all.
An element or attribute in the RDF namespace is either an RSS1 RDF element or attribute, or an RDF/XML RDF element or attribute. An RSS1 RDF element or attribute MUST be validated against the requirements for RSS1. An RDF/XML RDF element or attribute MUST be validated against the requirements for RDF/XML. If the user agent does not support RDF/XML, RDF/XML RDF elements and attributes are unknown RDF elements and attributes.
An element is an RSS 1.0 rdf:RDF element if:
RDF, and
application/rss+xml, or
rdf, it has an xmlns attribute in the XMLNS namespace whose value is the RSS namespace, and it has an rdf attribute in the XMLNS namespace whose value is the RDF namespace.
At user option, user agents can validate a document as an RSS1 document (ignoring its original context) by overriding its content type to application/rss+xml.
If an RDF element in the RDF namespace is an RSS 1.0 rdf:RDF element, it is an RSS1 RDF element.
If a Seq element in the RDF namespace is a child of an items element in the RSS namespace, it is an RSS1 RDF element.
If an li element in the RDF namespace is a child of a Seq element in the RDF namespace that is an RSS1 RDF element, it is an RSS1 RDF element.
If an about or resource attribute in the RDF namespace is specified for an RSS1 RDF element, it is an RSS1 RDF attribute. A user agent that does not support RDF/XML MAY treat other about and resource attributes in the RDF namespace as RSS1 RDF attributes.
All other elements and attributes in the RDF namespace are RDF/XML RDF elements and attributes.
Though the RSS1 specification references the RDF/XML specification, a user agent can implement the validation by just implementing the requirements in the RSS1 specification and this specification, ignoring the RDF/XML specification.
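The element classification rules above could be sketched in Python with `xml.etree.ElementTree`. `is_rss10_rdf_root` is a hypothetical predicate standing in for the "RSS 1.0 rdf:RDF element" conditions, which this sketch does not implement. Attributes are not classified. ElementTree has no parent pointers, so the sketch builds a child-to-parent map first.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"


def rss1_rdf_elements(root, is_rss10_rdf_root):
    """Return the set of elements in the RDF namespace that are RSS1 RDF elements.

    `is_rss10_rdf_root` is a hypothetical predicate for the rdf:RDF conditions.
    """
    parent = {child: p for p in root.iter() for child in p}
    result = set()
    # iter() walks in document order, so a Seq is classified before its li children.
    for el in root.iter():
        up = parent.get(el)
        if el.tag == f"{{{RDF}}}RDF":
            if is_rss10_rdf_root(el):
                result.add(el)
        elif el.tag == f"{{{RDF}}}Seq":
            if up is not None and up.tag == f"{{{RSS}}}items":
                result.add(el)
        elif el.tag == f"{{{RDF}}}li":
            if up is not None and up.tag == f"{{{RDF}}}Seq" and up in result:
                result.add(el)
    return result


doc = ('<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
       'xmlns="http://purl.org/rss/1.0/">'
       '<channel rdf:about="http://example.com/"><items><rdf:Seq>'
       '<rdf:li rdf:resource="http://example.com/1"/>'
       '</rdf:Seq></items></channel><rdf:Bag/></rdf:RDF>')
root = ET.fromstring(doc)
found = rss1_rdf_elements(root, lambda el: True)  # stand-in predicate
print(sorted(el.tag.rsplit("}", 1)[1] for el in found))
```

In this example, rdf:Bag falls under "all other elements" and would be validated as RDF/XML.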
A channel or item element in the RSS namespace MAY have child unknown elements as long as all of the following conditions are met:
null namespace.
null, its local name does not start with xml (ASCII case-insensitive).
xml (ASCII case-insensitive).
Child unknown elements of a channel or item element in the RSS namespace are unique property elements.
A description element in the RSS namespace whose parent is an item element in the RSS namespace has HTML fragment content.
An encoded element in the RSS content namespace has HTML fragment content.
A channel element in the RSS namespace MAY have child link elements in the Atom namespace.
User agents MUST implement the RSS2 specifications. The RSS2 specifications are RSS 2.0 and Really Simple Syndication Best Practices Profile.
A document is an RSS2 document if its document element is an rss element in no namespace.
RSS 0.91, 0.92, 0.93, and 0.94 documents are RSS2 documents.
An element is an RSS2 element if it is in no namespace and its node document is an RSS2 document. An RSS2 channel element is an RSS2 element whose local name is channel. An RSS2 item element is an RSS2 element whose local name is item. RSS2 elements MUST be validated against the requirements for RSS2.
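The RSS2 definitions could be expressed with `xml.etree.ElementTree` as in the Python sketch below. ElementTree writes names in no namespace without a `{...}` part, so an RSS2 document is one whose root tag is exactly `rss`. The labels returned by `rss2_element_kind` are names chosen for this illustration.

```python
import xml.etree.ElementTree as ET


def is_rss2_document(root):
    # The document element must be an rss element in no namespace.
    return root.tag == "rss"


def rss2_element_kind(element, root):
    """Return "channel", "item", or "other" for RSS2 elements, or None otherwise.

    `root` is the document element of the element's node document.
    """
    if not is_rss2_document(root) or element.tag.startswith("{"):
        return None
    if element.tag in ("channel", "item"):
        return element.tag
    return "other"


root = ET.fromstring(
    '<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<channel><item><dc:creator>A</dc:creator></item></channel></rss>"
)
print([rss2_element_kind(e, root) for e in root.iter()])
```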
An RSS2 item element MAY have at most one child updated element in the Atom namespace.
These are unique property elements:
creator elements in the Dublin Core namespace of an RSS2 channel element or an RSS2 item element.
encoded elements in the RSS content namespace of an RSS2 item element.
comments elements in the namespace http://purl.org/rss/1.0/modules/slash/ of an RSS2 item element.
An RSS2 item element or an RSS2 channel element MAY have child unknown elements.
This section applies to user agents implementing Media RSS.
The Media RSS namespace is http://search.yahoo.com/mrss/ or http://search.yahoo.com/mrss. These are synonyms.
The latter namespace is obsolete.
The namespace URL without the trailing slash has been used for historical reasons. It is non-conforming.
These are unique property elements:
starRating elements in the Media RSS namespace of a community element in the Media RSS namespace.
statistics elements in the Media RSS namespace of a community element in the Media RSS namespace.
tags elements in the Media RSS namespace of a community element in the Media RSS namespace.
The average attribute value of a starRating element in the Media RSS namespace MUST be a valid floating-point number.
The max attribute value of a starRating element in the Media RSS namespace MUST be a valid non-negative integer.
The min attribute value of a starRating element in the Media RSS namespace MUST be a valid non-negative integer.
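The three starRating rules could be checked with regular expressions that follow the HTML Standard microsyntaxes, as in this Python sketch. `[0-9]` is used instead of `\d`, because `\d` would also match non-ASCII digits. The function takes a plain dict of attribute values in no namespace. This sketch only validates attributes that are present and does not require them.

```python
import re

# Valid non-negative integer: one or more ASCII digits.
NON_NEGATIVE_INTEGER = re.compile(r"[0-9]+")
# Valid floating-point number: optional "-", digits and/or "." followed by digits,
# then an optional exponent ("e" or "E", optional sign, digits).
FLOATING_POINT_NUMBER = re.compile(r"-?(?:[0-9]+(?:\.[0-9]+)?|\.[0-9]+)(?:[eE][-+]?[0-9]+)?")


def star_rating_errors(attributes):
    """Return the names of starRating attributes whose values are invalid."""
    rules = {
        "average": FLOATING_POINT_NUMBER,
        "max": NON_NEGATIVE_INTEGER,
        "min": NON_NEGATIVE_INTEGER,
    }
    return [name for name, pattern in rules.items()
            if name in attributes and not pattern.fullmatch(attributes[name])]


print(star_rating_errors({"average": "3.", "max": "-1"}))
```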
This section applies to user agents implementing Podcast.
The iTunes namespace is http://www.itunes.com/dtds/podcast-1.0.dtd or http://www.itunes.com/DTDs/Podcast-1.0.dtd. These are synonyms. The latter namespace is obsolete.
Child elements in the iTunes namespace of an RSS2 channel element or an RSS2 item element are unique property elements.
The content model of the image element in the iTunes namespace is nothing.
User agents MUST implement the Atom specification. The Atom specification is RFC 4287.
Atom and its extension specifications allow extensions such that almost everything is allowed, which is not useful for validators. This specification defines stricter restrictions for the purpose of validation.
The Atom namespace is http://www.w3.org/2005/Atom.
The Atom 0.3 namespace is http://purl.org/atom/ns#.
The Atom Threading namespace is http://purl.org/syndication/thread/1.0.
The Atom Feed Paging and Archiving namespace is http://purl.org/syndication/history/1.0.
Atom family namespaces are the Atom 0.3 namespace, the Atom namespace, the Atom Threading namespace, the Atom Feed Paging and Archiving namespace, http://www.w3.org/2007/app, and http://purl.org/atompub/tombstones/1.0.
An Atom family element is an element in one of the Atom family namespaces.
The terms Date construct and Person construct are defined by the Atom 1.0 specification.
Elements and attributes MUST conform to the constraints expressed in the RELAX NG schema fragments in the applicable specifications.
For an Atom family element, an attribute or child that is not explicitly allowed by an applicable specification MUST NOT be used.
Atom extensible elements are the following elements:
extensionElement, or extensionSansTitleElement in content
deleted-entry elements in the namespace http://purl.org/atompub/tombstones/1.0
feed elements in the Atom 0.3 namespace
entry elements in the Atom 0.3 namespace
An Atom extensible element MAY have child unknown elements.
These are unique property elements:
complete elements in the Atom Feed Paging and Archiving namespace of a feed element in the Atom namespace.
archive elements in the Atom Feed Paging and Archiving namespace of a feed element in the Atom namespace.
A content element in the Atom 0.3 namespace with a type attribute whose value is text/html (ASCII case-insensitive) has HTML fragment content.
need to define <atom:content> content validation when type is a MIME type
An entry element in the Atom namespace MAY have child group and thumbnail elements in the Media RSS namespace.
Elements in the Atom 0.3 namespace MUST NOT be used.
Atom 0.3 is obsolete.
This section applies to user agents implementing the GData image element if there is no other specification specifying it.
The GData namespace is http://schemas.google.com/g/2005.
A Person construct MAY have at most one child image element in the GData namespace.
The image element in the GData namespace MUST have a rel attribute whose value is http://schemas.google.com/g/2005#thumbnail.
The image element in the GData namespace MAY have a width attribute whose value is a valid non-negative integer.
The image element in the GData namespace MAY have a height attribute whose value is a valid non-negative integer.
The image element in the GData namespace MUST have a src attribute whose value is a valid URL string.
The content model of the image element in the GData namespace is nothing.
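The GData image rules could be collected into a single checking function, as in this Python sketch using `xml.etree.ElementTree`. The src attribute is only checked for presence. Full "valid URL string" checking per the URL Standard is left out of the sketch. The content model "nothing" is approximated as no child elements and no text other than ASCII whitespace. The error message wording is made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

THUMBNAIL = "http://schemas.google.com/g/2005#thumbnail"
ASCII_WHITESPACE = " \t\n\f\r"


def gdata_image_errors(image):
    """Check an image element in the GData namespace (a sketch)."""
    errors = []
    if image.get("rel") != THUMBNAIL:
        errors.append("rel must be " + THUMBNAIL)
    for name in ("width", "height"):
        value = image.get(name)
        # Valid non-negative integer: one or more ASCII digits.
        if value is not None and not re.fullmatch(r"[0-9]+", value):
            errors.append(name + " must be a valid non-negative integer")
    # Presence only; full URL validation is out of scope for this sketch.
    if image.get("src") is None:
        errors.append("src is required")
    # Content model "nothing": no child elements, no non-whitespace text.
    if len(image) > 0 or (image.text or "").strip(ASCII_WHITESPACE):
        errors.append("content must be nothing")
    return errors


good = ET.fromstring(
    '<gd:image xmlns:gd="http://schemas.google.com/g/2005" '
    'rel="http://schemas.google.com/g/2005#thumbnail" '
    'src="https://example.com/a.png" width="32"/>'
)
bad = ET.fromstring(
    '<gd:image xmlns:gd="http://schemas.google.com/g/2005" rel="photo" height="big">x</gd:image>'
)
print(gdata_image_errors(good), gdata_image_errors(bad))
```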
A meta element in the HTML namespace MAY have a property attribute.
If the attribute is specified, the element MUST NOT have a name attribute or any attribute that cannot be used when a name attribute is specified.
The content attribute MUST be specified if a property attribute is specified. An HTML meta element with a property attribute is metadata content (it is not phrasing content).
The value of a property attribute MUST be an OGP property name. An OGP property name is a property value defined by an applicable specification or a prefixed property value. A prefixed property value is a property prefix followed by a U+003A COLON character (:) followed by one or more characters. A property prefix is a string of one or more characters, none of which is a U+003A COLON character (:), that is not used as a prefix by any property value defined by an applicable specification.
A property value MUST NOT be used unless an applicable specification defines it for the context in which it is used.
If the property attribute value is og:type, the content attribute value MUST be a value allowed as an og:type value or a prefixed property value.
An example of applicable specifications is The Open Graph protocol.
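The OGP property name rules could be sketched in Python as follows. `DEFINED_PROPERTIES` and `DEFINED_OG_TYPES` are small hypothetical subsets. A real validator would take the complete lists from applicable specifications such as The Open Graph protocol. The reserved prefixes are derived from the defined property values, which matches the rule that a property prefix must not be a prefix used by a defined property value.

```python
# Hypothetical subsets; real lists come from applicable specifications.
DEFINED_PROPERTIES = frozenset({"og:title", "og:type", "og:image", "og:url"})
DEFINED_OG_TYPES = frozenset({"website", "article", "book", "profile"})

# Prefixes used by defined property values ("og" here).
DEFINED_PREFIXES = frozenset(p.partition(":")[0] for p in DEFINED_PROPERTIES if ":" in p)


def is_prefixed_property_value(value):
    # Splitting at the first colon guarantees the prefix contains no colon.
    prefix, colon, rest = value.partition(":")
    return bool(prefix) and bool(colon) and bool(rest) and prefix not in DEFINED_PREFIXES


def is_ogp_property_name(value):
    return value in DEFINED_PROPERTIES or is_prefixed_property_value(value)


def is_valid_og_type_content(content):
    return content in DEFINED_OG_TYPES or is_prefixed_property_value(content)


print(is_ogp_property_name("og:title"), is_ogp_property_name("og:foo"), is_ogp_property_name("myapp:score"))
```

An undefined value with a defined prefix, such as og:foo, is rejected. The og prefix is reserved for property values defined by applicable specifications.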
Templates embedded in a document are not rendered and are often incomplete until they are actually used as part of the document. As such, they are sometimes exempted from the formal requirements of the specifications. However, whether templates are in error or not could be useful information for authors who want to ensure that generated trees are not broken because of poorly authored templates.
If a node is a template root, it MUST be validated in the template mode.
In the template mode, any violation of the requirements, except for those of the template specifications, is marked as in template. User agents SHOULD render errors in template and the other errors in different manners. At user option, user agents MAY hide errors in template.
This means that errors in the node itself, its attributes, its descendants, its template contents, and its shadow roots are distinguished from errors not in the template mode.
In the template mode, no other template root is recognized.
Template specifications of a template root are the specifications defining the template root's language. If not specified, there is no template specification.
Template contents are template roots.
HTML elements with a hidden attribute in no namespace are template roots.
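The marking of errors in template mode could be sketched in Python with `xml.etree.ElementTree`. Only the hidden-attribute template roots are modeled. ElementTree has no template contents or shadow roots. No template specification is defined for hidden, so every error in such a subtree is marked as in template. `check` is a hypothetical per-element checker that returns error messages.

```python
import xml.etree.ElementTree as ET

HTML_NAMESPACE = "http://www.w3.org/1999/xhtml"


def is_template_root(element):
    # An HTML element with a hidden attribute in no namespace.
    return element.tag.startswith("{" + HTML_NAMESPACE + "}") and "hidden" in element.attrib


def collect_errors(element, check, in_template=False, errors=None):
    """Return (message, in_template) pairs for `element` and its descendants.

    Template mode covers the template root itself. Once entered, it is never
    left, and no further template root is recognized inside it.
    """
    if errors is None:
        errors = []
    if not in_template and is_template_root(element):
        in_template = True
    for message in check(element):
        errors.append((message, in_template))
    for child in element:
        collect_errors(child, check, in_template, errors)
    return errors


def check(el):
    # Hypothetical checker: flags any element whose local name is "bad".
    return ["unexpected bad element"] if el.tag.endswith("}bad") else []


doc = ET.fromstring('<div xmlns="http://www.w3.org/1999/xhtml"><p hidden=""><bad/></p><bad/></div>')
print(collect_errors(doc, check))
```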
This section applies to user agents implementing XSLT1.
User agents supporting XSLT1 MUST implement the XSLT1 specifications. The XSLT1 specifications are DOM XPath and documents directly or indirectly referenced from it defining XPath1 and XSLT1, including XSLT Transformations (XSLT) and non-normative descriptions in the HTML Standard. The terms literal result element, template (of XSLT), attribute value template, extension element (of XSLT), and extension namespace (of XSLT) are defined by the XSLT1 specifications.
The XSLT namespace is http://www.w3.org/1999/XSL/Transform.
An element is an XSLT element if its namespace is the XSLT namespace.
A document is an XSLT stylesheet if at least one of the following conditions is true:
stylesheet element in the XSLT namespace,
transform element in the XSLT namespace,
version attribute in the XSLT namespace, or
application/xslt+xml or text/xsl,
At user option, user agents can validate a document as an XSLT stylesheet (ignoring its original context) by overriding its content type to application/xslt+xml.
For the purpose of validation, any child of the template content of a template element in the HTML namespace MUST be treated as if it were a child of the element when its node document is an XSLT stylesheet.
Attributes in a non-null namespace MUST NOT be specified for XSLT elements unless they are allowed by an applicable specification.
For example, attributes in the XMLNS namespace and unknown attributes are allowed. Attributes in the Atom Threading namespace are not.
The value of the following attributes MUST be 1.0:
version attribute in no namespace of stylesheet elements in the XSLT namespace,
version attribute in no namespace of transform elements in the XSLT namespace, and
version attribute in the XSLT namespace.
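The version rules above could be checked over an `xml.etree.ElementTree` tree, as in the Python sketch below. A missing version attribute on xsl:stylesheet or xsl:transform is also reported. XSLT 1.0 requires that attribute, and this sketch treats its absence as a value other than 1.0. The `(tag, attribute)` result format is a choice made for this illustration.

```python
import xml.etree.ElementTree as ET

XSLT = "http://www.w3.org/1999/XSL/Transform"


def xslt_version_errors(root):
    """Return (tag, attribute) pairs whose value is not exactly "1.0"."""
    errors = []
    for el in root.iter():
        # version in no namespace on xsl:stylesheet and xsl:transform.
        if el.tag in (f"{{{XSLT}}}stylesheet", f"{{{XSLT}}}transform"):
            if el.get("version") != "1.0":
                errors.append((el.tag, "version"))
        # version in the XSLT namespace, e.g. on literal result elements.
        xsl_version = el.get(f"{{{XSLT}}}version")
        if xsl_version is not None and xsl_version != "1.0":
            errors.append((el.tag, f"{{{XSLT}}}version"))
    return errors


root = ET.fromstring(
    '<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">'
    '<xsl:template match="/"><html xsl:version="1.0"/><p xsl:version="1"/></xsl:template>'
    "</xsl:stylesheet>"
)
print(xslt_version_errors(root))
```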
Elements other than XSLT elements MUST NOT be used as children of a stylesheet or transform element in the XSLT namespace unless they are allowed by applicable specifications.
The value of the method attribute in no namespace of an output element in the XSLT namespace, or of the data-type attribute in no namespace of a sort element in the XSLT namespace, MUST NOT be a QName unless it represents a value allowed by an applicable specification.
The version attribute in no namespace SHOULD NOT be specified for an output element in the XSLT namespace. If specified, its value MUST be 1.0.
The only known meaningful combinations of attributes are <xsl:output method="xml">, <xsl:output method="xml" version="1.0">, <xsl:output method="html">, and <xsl:output method="text">.
The extension namespaces specified by the extension-element-prefixes attribute in no namespace or in the XSLT namespace MUST be XSLT extension namespace candidates. The XSLT extension namespace candidates are the namespaces to which XSLT extension element candidates belong.
An element is an XSLT extension element candidate if its semantics as an extension element are defined by its specification.
Unknown elements whose namespace is not null are XSLT extension element candidates.
The null namespace, the XML namespace, the XMLNS namespace, and the XSLT namespace are not XSLT extension namespace candidates.
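The extension-element-prefixes check could be sketched in Python as below. Following XSLT 1.0, the value is a whitespace-separated list of prefixes, and the token #default designates the default namespace. `in_scope` (a mapping of in-scope prefixes, with None for the default namespace) and `candidate_namespaces` (the set of XSLT extension namespace candidates) are hypothetical inputs. A real validator would derive them from the tree and from its known extension elements.

```python
import re

XML_NAMESPACE = "http://www.w3.org/XML/1998/namespace"
XMLNS_NAMESPACE = "http://www.w3.org/2000/xmlns/"
XSLT_NAMESPACE = "http://www.w3.org/1999/XSL/Transform"

# Never XSLT extension namespace candidates (None stands for the null namespace).
NEVER_CANDIDATES = frozenset({None, XML_NAMESPACE, XMLNS_NAMESPACE, XSLT_NAMESPACE})


def extension_prefix_errors(value, in_scope, candidate_namespaces):
    """Return the tokens of an extension-element-prefixes value that are not allowed."""
    errors = []
    # Tokens are separated by XML whitespace (space, tab, CR, LF).
    for token in re.split(r"[ \t\r\n]+", value):
        if not token:
            continue
        # "#default" designates the default namespace.
        namespace = in_scope.get(None if token == "#default" else token)
        # An undeclared prefix resolves to None here and is rejected as well.
        if namespace in NEVER_CANDIDATES or namespace not in candidate_namespaces:
            errors.append(token)
    return errors


in_scope = {"ex": "http://example.com/ext", "xsl": XSLT_NAMESPACE, None: "http://www.w3.org/1999/xhtml"}
print(extension_prefix_errors("ex #default", in_scope, {"http://example.com/ext"}))
```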
Templates are template roots whose template specifications are the XSLT1 specifications and the specifications of the extension elements.
Various attributes in XSLT templates, including those of literal result elements, can contain attribute value templates, which makes validation complicated (or impossible). How to handle them is a quality-of-implementation issue.
This section is non-normative.
The data-web-defs repository contains some machine-readable data for definitions in this specification, in the following file:
The tests-web repository contains validation test data in these directories:
This document is written by Wakaba <wakaba@suikawiki.org> and is produced as part of the manakai project.
Per CC0, to the extent possible under law, the author has waived all copyright and related or neighboring rights to this work.