Whatpm::HTML::Dumper
use Whatpm::HTML::Dumper qw(dumptree);
warn dumptree $doc;
The Whatpm::HTML::Dumper module exports a function, dumptree, which serializes the given document into the format used in HTML parser tests.
The module exports a function:
$dumped = dumptree $doc
Dump the DOM tree. The argument must be a DOM document object (i.e. an instance of the Message::DOM::Document class). The function returns the dump of the document and its subtree.
The function serializes the DOM tree into the format used in HTML parser tests, as described in <http://wiki.whatwg.org/wiki/Parser_tests#Tree_Construction_Tests>, with the following exceptions:
Element type definition, entity, and notation nodes attached to a document type node are serialized as if they were children of the document type node. They are inserted before any children of the document type node, sorted by node type in the aforementioned order, then by code point order of their node names.
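The ordering rule above can be sketched in a few lines of Perl. This is only an illustration, not the module's implementation: the nodes are plain hashes, and the kind and name keys, the %kind_rank table, and sort_doctype_nodes are hypothetical names introduced here.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for DOM nodes: plain hashes with a kind and a name.
# The rank follows the order given in the text:
# element type definitions, then entities, then notations.
my %kind_rank = (element_type => 0, entity => 1, notation => 2);

sub sort_doctype_nodes {
    my @nodes = @_;
    # Sort by node kind first, then by node name.
    # Without "use locale", cmp compares strings by code point.
    return sort {
        $kind_rank{ $a->{kind} } <=> $kind_rank{ $b->{kind} }
            or $a->{name} cmp $b->{name}
    } @nodes;
}

my @sorted = sort_doctype_nodes(
    { kind => 'notation',     name => 'gif'  },
    { kind => 'entity',       name => 'logo' },
    { kind => 'element_type', name => 'p'    },
    { kind => 'entity',       name => 'copy' },
    { kind => 'element_type', name => 'a'    },
);

# Prints: element_type:a element_type:p entity:copy entity:logo notation:gif
print join(' ', map { "$_->{kind}:$_->{name}" } @sorted), "\n";
```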
Element type definition nodes are represented as <!ELEMENT, followed by a U+0020 SPACE character, followed by the node name, followed by a U+0020 SPACE character, followed by the contentModelText of the node, followed by >.
Entity nodes are represented as <!ENTITY, followed by a U+0020 SPACE character, followed by the node name, followed by a U+0020 SPACE character, followed by the list of the textContent, publicId, and systemId of the node (the empty string is used when the value is undef), where each item is enclosed by " characters and the items are separated by a U+0020 SPACE character, followed by a U+0020 SPACE character, followed by the notationName of the node, if it is not undef, followed by >.
Notation nodes are represented as <!NOTATION, followed by a U+0020 SPACE character, followed by the node name, followed by a U+0020 SPACE character, followed by the list of the publicId and systemId of the node (the empty string is used when the value is undef), where each item is enclosed by " characters and the items are separated by a U+0020 SPACE character, followed by >.
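As a rough illustration of the three declaration formats described above, the following sketch builds the strings from plain hashes. It reads the prose literally (including the space that precedes the optional notation name) and is not the module's own code: the hash keys and the functions quoted_list, dump_element_type, dump_entity, and dump_notation are hypothetical names used only for this example.

```perl
use strict;
use warnings;

# Enclose each value in " characters, using the empty string for undef,
# and separate the items with a single space.
sub quoted_list {
    return join ' ', map {
        my $value = defined $_ ? $_ : '';
        '"' . $value . '"';
    } @_;
}

sub dump_element_type {
    my ($node) = @_;
    return '<!ELEMENT ' . $node->{name} . ' ' . $node->{content_model_text} . '>';
}

sub dump_entity {
    my ($node) = @_;
    my $dump = '<!ENTITY ' . $node->{name} . ' '
        . quoted_list(@$node{qw(text_content public_id system_id)}) . ' ';
    # The notation name is appended only when it is defined.
    $dump .= $node->{notation_name} if defined $node->{notation_name};
    return $dump . '>';
}

sub dump_notation {
    my ($node) = @_;
    return '<!NOTATION ' . $node->{name} . ' '
        . quoted_list(@$node{qw(public_id system_id)}) . '>';
}

print dump_element_type({ name => 'p', content_model_text => '(#PCDATA)' }), "\n";
print dump_entity({ name => 'logo', system_id => 'logo.gif',
                    notation_name => 'gif' }), "\n";
print dump_notation({ name => 'gif', public_id => '-//X//GIF' }), "\n";
```

Under these assumptions, the three print statements produce <!ELEMENT p (#PCDATA)>, <!ENTITY logo "" "" "logo.gif" gif>, and <!NOTATION gif "-//X//GIF" "">.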
Attribute definition nodes attached to an element type definition node are serialized as if they were children of the element type definition node, sorted by code point order of their node names.
Attribute definition nodes are represented as the node name, followed by a U+0020 SPACE character, followed by the keyword represented by the declaredType of the node (or ENUMERATION if it represents the enumerated type), followed by a U+0020 SPACE character, followed by (, followed by the list of allowedTokens of the node separated by |, followed by ), followed by a U+0020 SPACE character, followed by the keyword represented by the defaultType of the node (or EXPLICIT if it represents the explicit default value), followed by a U+0020 SPACE character, followed by ", followed by the textContent of the node, followed by ".
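A minimal sketch of the attribute definition format follows, again using a plain hash in place of a DOM node. The keys and the function name dump_attribute_definition are hypothetical, the type keywords are assumed to have been resolved to strings already, and treating an undefined textContent as the empty string is an assumption the text above does not state.

```perl
use strict;
use warnings;

sub dump_attribute_definition {
    my ($node) = @_;
    # Allowed tokens are joined with "|" and wrapped in parentheses.
    my $tokens = join '|', @{ $node->{allowed_tokens} || [] };
    # Assumption: an undefined text content is dumped as the empty string.
    my $text = defined $node->{text_content} ? $node->{text_content} : '';
    return join ' ',
        $node->{name},
        $node->{declared_type},    # e.g. CDATA, ID, or ENUMERATION
        '(' . $tokens . ')',
        $node->{default_type},     # e.g. REQUIRED, IMPLIED, or EXPLICIT
        '"' . $text . '"';
}

# Prints: type ENUMERATION (disc|square|circle) EXPLICIT "disc"
print dump_attribute_definition({
    name           => 'type',
    declared_type  => 'ENUMERATION',
    allowed_tokens => [qw(disc square circle)],
    default_type   => 'EXPLICIT',
    text_content   => 'disc',
}), "\n";
```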
The namespace designator for the HTML namespace (http://www.w3.org/1999/xhtml) is html. While elements in the HTML namespace are serialized without the namespace designator, as in the original format, attributes in the HTML namespace are serialized with this namespace designator.
An application can define a custom namespace designator by adding a key-value pair to the %$Whatpm::HTML::Dumper::NamespaceMapping hash:
$Whatpm::HTML::Dumper::NamespaceMapping->{$url} = $prefix;
For example, if the application does:
$Whatpm::HTML::Dumper::NamespaceMapping
->{q<urn:x-suika-fam-cx:markup:suikawiki:0:9:>}
= 'sw';
... then an element named document in the SuikaWiki/0.9 namespace is serialized as sw document.
When no namespace designator is explicitly defined for a namespace, the namespace designator for the namespace is { followed by the namespace URL followed by }. If an element has no namespace, the namespace designator for the element is {}.
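The lookup rules for element namespace designators can be sketched as follows. This example deliberately uses its own %mapping hash rather than the module's NamespaceMapping, so that it runs without Whatpm installed; %mapping and namespace_designator are hypothetical names, and the sketch does not cover the special case that HTML elements are written without any designator.

```perl
use strict;
use warnings;

# A local stand-in for the namespace mapping, seeded with the HTML namespace.
my %mapping = ('http://www.w3.org/1999/xhtml' => 'html');

# An application-defined designator, as in the SuikaWiki example above.
$mapping{'urn:x-suika-fam-cx:markup:suikawiki:0:9:'} = 'sw';

sub namespace_designator {
    my ($url) = @_;
    # No namespace at all: the designator is "{}".
    return '{}' unless defined $url;
    # Explicitly mapped namespace: use the registered designator.
    return $mapping{$url} if exists $mapping{$url};
    # Otherwise fall back to the URL wrapped in braces.
    return '{' . $url . '}';
}

print namespace_designator('urn:x-suika-fam-cx:markup:suikawiki:0:9:'), "\n";
print namespace_designator('http://www.w3.org/2000/svg'), "\n";
print namespace_designator(undef), "\n";
```

With this mapping, the three calls print sw, {http://www.w3.org/2000/svg}, and {} respectively.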
Parser tests - WHATWG Wiki <http://wiki.whatwg.org/wiki/Parser_tests>.
Wakaba <w@suika.fam.cx>.
Copyright 2007-2012 Wakaba <w@suika.fam.cx>.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.