1 |
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> |
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> |
2 |
<html xmlns="http://www.w3.org/1999/xhtml"> |
<html xmlns="http://www.w3.org/1999/xhtml"> |
3 |
<head> |
<head> |
4 |
<title>Whatpm::HTML - An HTML Parser</title> |
<title>Whatpm::HTML - An HTML Parser and Serializer</title> |
5 |
<link rel="stylesheet" href="http://suika.fam.cx/www/style/html/pod.css" type="text/css" /> |
<link rel="stylesheet" href="http://suika.fam.cx/www/style/html/pod.css" type="text/css" /> |
6 |
<link rev="made" href="mailto:admin@suika.fam.cx" /> |
<link rev="made" href="mailto:admin@suika.fam.cx" /> |
7 |
</head> |
</head> |
17 |
<li><a href="#synopsis">SYNOPSIS</a></li> |
<li><a href="#synopsis">SYNOPSIS</a></li> |
18 |
<li><a href="#description">DESCRIPTION</a></li> |
<li><a href="#description">DESCRIPTION</a></li> |
19 |
<li><a href="#methods">METHODS</a></li> |
<li><a href="#methods">METHODS</a></li> |
20 |
|
<li><a href="#lowlevel_interface">LOW-LEVEL INTERFACE</a></li> |
21 |
|
<ul> |
22 |
|
|
23 |
|
<li><a href="#application_cache_selection_algorithm_hook">Application Cache Selection Algorithm Hook</a></li> |
24 |
|
</ul> |
25 |
|
|
26 |
|
<li><a href="#error_reports">ERROR REPORTS</a></li> |
27 |
<li><a href="#to_do">TO DO</a></li> |
<li><a href="#to_do">TO DO</a></li> |
28 |
<li><a href="#see_also">SEE ALSO</a></li> |
<li><a href="#see_also">SEE ALSO</a></li> |
29 |
<li><a href="#author">AUTHOR</a></li> |
<li><a href="#author">AUTHOR</a></li> |
35 |
<p> |
<p> |
36 |
</p> |
</p> |
37 |
<h1><a name="name">NAME</a></h1> |
<h1><a name="name">NAME</a></h1> |
38 |
<p>Whatpm::HTML - An HTML Parser</p> |
<p>Whatpm::HTML - An HTML Parser and Serializer</p> |
39 |
<p> |
<p> |
40 |
</p> |
</p> |
41 |
<hr /> |
<hr /> |
46 |
my $s = q<<!DOCTYPE html><html>...</html>>; |
my $s = q<<!DOCTYPE html><html>...</html>>; |
47 |
# $doc = an empty DOM |Document| object |
# $doc = an empty DOM |Document| object |
48 |
my $on_error = sub { |
my $on_error = sub { |
49 |
my $error_code = shift; |
my %error = @_; |
50 |
warn $error_code, "\n"; |
warn $error{type}, "\n"; |
51 |
}; |
}; |
52 |
|
|
53 |
Whatpm::HTML->parse_string ($s => $doc, $on_error); |
Whatpm::HTML->parse_string ($s => $doc, $on_error); |
54 |
|
|
55 |
## Then, |$doc| is the DOM representation of |$s|.</pre> |
## Now, |$doc| is the DOM representation of |$s|.</pre> |
56 |
<p> |
<p> |
57 |
</p> |
</p> |
58 |
<hr /> |
<hr /> |
109 |
</dd> |
</dd> |
110 |
<dd> |
<dd> |
111 |
<p>Note that the <code>Whatpm::NanoDOM</code> module provides a non-conforming |
<p>Note that the <code>Whatpm::NanoDOM</code> module provides a non-conforming |
112 |
implementation of DOM that only implements the subset that |
implementation of DOM that only implements a subset that |
113 |
is necessary for the purpose of <code>Whatpm::HTML</code>'s parsing and |
is necessary for the purpose of <code>Whatpm::HTML</code>'s parsing and |
114 |
serializing. |
serializing. |
115 |
With this module, creating a new HTML <code>Document</code> object |
With this module, creating a new HTML <code>Document</code> object |
130 |
</dd> |
</dd> |
131 |
<dd> |
<dd> |
132 |
<p>The first argument, <em>$node</em>, MUST be a DOM <code>Document</code>, |
<p>The first argument, <em>$node</em>, MUST be a DOM <code>Document</code>, |
133 |
<code>Node</code>, or <code>DocumentFragment</code> object.</p> |
<code>Element</code>, or <code>DocumentFragment</code> node.</p> |
134 |
</dd> |
</dd> |
135 |
<dd> |
<dd> |
136 |
<p>The second argument, <em>$onerror</em>, MUST be a reference to the |
<p>The second argument, <em>$onerror</em>, MUST be a reference to the |
137 |
error handling code. This code will be invoked if a descendant |
error handling code. This code will be invoked if a descendant |
138 |
of <em>$node</em> is not of <code>Element</code>, <code>Text</code>, <code>CDATASection</code>, |
of <em>$node</em> is not an <code>Element</code>, <code>Text</code>, <code>CDATASection</code>, |
139 |
<code>Comment</code>, <code>DocumentType</code>, or <code>EntityReference</code> so |
<code>Comment</code>, <code>DocumentType</code>, or <code>EntityReference</code> node, so |
140 |
that <code>INVALID_STATE_ERR</code> MUST be thrown. |
that an <code>INVALID_STATE_ERR</code> exception MUST be thrown. |
141 |
The code will be invoked with an argument, which is the node |
The code will be invoked with an argument, which is the node |
142 |
whose type is invalid. |
whose type is invalid. |
143 |
This argument is optional; if missing, any such |
The argument <em>$onerror</em> is optional; if missing, any erroneous |
144 |
node is simply ignored.</p> |
node is simply ignored.</p> |
145 |
</dd> |
</dd> |
146 |
<dd> |
<dd> |
151 |
<p> |
<p> |
152 |
</p> |
</p> |
153 |
<hr /> |
<hr /> |
154 |
|
<h1><a name="lowlevel_interface">LOW-LEVEL INTERFACE</a></h1> |
155 |
|
<p>@@ TBW</p> |
156 |
|
<p> |
157 |
|
</p> |
158 |
|
<h2><a name="application_cache_selection_algorithm_hook">Application Cache Selection Algorithm Hook</a></h2> |
159 |
|
<p>Once a parser <em>$p</em> has been instantiated with the <code>new</code> method, |
160 |
|
a <code>CODE</code> reference can be assigned to <code>$p->{application_cache_selection}</code>. |
161 |
|
That <code>CODE</code> will be called back when the application cache selection |
162 |
|
algorithm MUST be run per HTML5. By default, |
163 |
|
<code>$p->{application_cache_selection}</code> is set to an empty subroutine.</p> |
164 |
|
<p>The subroutine will be invoked with an argument <em>manifest_uri</em>, |
165 |
|
which is set to the manifest URI when the algorithm MUST be invoked |
166 |
|
with a manifest URI, or is set to <code>undef</code> when the algorithm MUST |
167 |
|
be invoked with no manifest URI.</p> |
168 |
|
<p> |
169 |
|
</p> |
170 |
|
<hr /> |
171 |
|
<h1><a name="error_reports">ERROR REPORTS</a></h1> |
172 |
|
<p>@@ TBW</p> |
173 |
|
<p>The list of the error types is available in |
174 |
|
Whatpm Error Types <http://suika.fam.cx/gate/2005/sw/Whatpm%20Error%20Types>.</p> |
175 |
|
<p> |
176 |
|
</p> |
177 |
|
<hr /> |
178 |
<h1><a name="to_do">TO DO</a></h1> |
<h1><a name="to_do">TO DO</a></h1> |
179 |
<p>Tokenizer should emit a sequence of character tokens as one token |
<p>Tokenizer should emit a sequence of character tokens as one token |
180 |
to improve performance.</p> |
to improve performance.</p> |
181 |
<p>A method that accepts a byte stream as an input.</p> |
<p>A method that accepts a byte stream as an input.</p> |
182 |
<p>Charset detection algorithm.</p> |
<p>Charset detection algorithm.</p> |
183 |
<p>Setting inner_html.</p> |
<p>Documentation for the setter of inner_html.</p> |
184 |
<p>And there are many ``TODO''s and ``ISSUE''s in the source code.</p> |
<p>And there are many ``TODO''s and ``ISSUE''s in the source code.</p> |
185 |
<p> |
<p> |
186 |
</p> |
</p> |
187 |
<hr /> |
<hr /> |
188 |
<h1><a name="see_also">SEE ALSO</a></h1> |
<h1><a name="see_also">SEE ALSO</a></h1> |
189 |
<p>Whatpm |
<p>Whatpm <http://suika.fam.cx/www/markup/html/whatpm/readme>.</p> |
190 |
<http://suika.fam.cx/www/markup/html/whatpm/readme></p> |
<p>Whatpm Error Types |
191 |
<p>Web Applications 1.0 Working Draft (aka HTML5) |
<http://suika.fam.cx/gate/2005/sw/Whatpm%20Error%20Types>.</p> |
192 |
<http://whatwg.org/html5>. (Revision 792, 1 May 2007)</p> |
<p>HTML5 <http://whatwg.org/html5>.</p> |
193 |
<p><a href="../Whatpm/NanoDOM.html">the Whatpm::NanoDOM manpage</a></p> |
<p><a href="../Whatpm/NanoDOM.html">the Whatpm::NanoDOM manpage</a>.</p> |
194 |
|
<p><a href="../Whatpm/ContentChecker/HTML.html">the Whatpm::ContentChecker::HTML manpage</a>.</p> |
195 |
<p> |
<p> |
196 |
</p> |
</p> |
197 |
<hr /> |
<hr /> |