Whatpm::HTML::Tokenizer
The Whatpm::HTML::Tokenizer
module provides an implementation of HTML and XML tokenizer.
Unlike its name,
this module can be used for XML documents as well as HTML.
It is not intended to be used directly from general-purpose applications; instead it is used as part of HTML or XML parser,
such as Whatpm::HTML::Parser and Whatpm::XML::Parser.
The module is intended to be a conforming HTML tokenizer according to Web Applications 1.0 specification (though it is meaningless to discuess the conformance of the tokenizer standalone). By setting the XML flag, it can also tokenize XML documents in a way consistent with the HTML tokenization specification. You might consider it as an implementation of the XML5 tokenization algorithm as "patched" by later HTML5 development.
HTML Living Standard <http://www.whatwg.org/specs/web-apps/current-work/complete.html#tokenization>
.
XML 1.0 <http://www.w3.org/TR/xml/>
.
XML 1.1 <http://www.w3.org/TR/xml11/>
.
XML5.
See <https://suika.suikawiki.org/~wakaba/wiki/sw/n/XML5>
for references.
Wakaba <wakaba@suikawiki.org>.
Copyright 2007-2012 Wakaba <wakaba@suikawiki.org>.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.