manakai

Whatpm::HTML::Tokenizer

An HTML and XML tokenizer

DESCRIPTION

The Whatpm::HTML::Tokenizer module provides an implementation of an HTML and XML tokenizer. Despite its name, this module can be used for XML documents as well as HTML. It is not intended to be used directly by general-purpose applications; instead, it is used as part of an HTML or XML parser, such as Whatpm::HTML::Parser or Whatpm::XML::Parser.

The module is intended to be a conforming HTML tokenizer according to the Web Applications 1.0 specification, now the HTML Living Standard [HTML] (though it is meaningless to discuss the conformance of a tokenizer on its own). When the XML flag is set, it can also tokenize XML documents in a way consistent with the HTML tokenization specification. It can be regarded as an implementation of the XML5 tokenization algorithm [XML], "patched" by later HTML5 developments.

SEE ALSO

Whatpm::HTML::Parser, Whatpm::XML::Parser.

Whatpm::HTML::InputStream.

SPECIFICATIONS

[HTML]

HTML Living Standard <http://www.whatwg.org/specs/web-apps/current-work/complete.html#tokenization>.

[XML]

XML 1.0 <http://www.w3.org/TR/xml/>.

XML 1.1 <http://www.w3.org/TR/xml11/>.

XML5. See <https://suika.suikawiki.org/~wakaba/wiki/sw/n/XML5> for references.

AUTHOR

Wakaba <wakaba@suikawiki.org>.

LICENSE

Copyright 2007-2012 Wakaba <wakaba@suikawiki.org>.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.