Contents of /markup/html/whatpm/Whatpm/HTML.html

Revision 1.6
Sun Nov 4 04:34:30 2007 UTC by wakaba
Branch: MAIN
Changes since 1.5: +25 -12 lines
File MIME type: text/html
++ whatpm/t/ChangeLog	4 Nov 2007 04:31:01 -0000
	* content-model-2.dat: Tests for |manifest=""| are added.

2007-11-04  Wakaba  <wakaba@suika.fam.cx>

++ whatpm/Whatpm/ContentChecker/ChangeLog	4 Nov 2007 04:32:03 -0000
	* HTML.pm: Allow <html manifest=URI>.

2007-11-04  Wakaba  <wakaba@suika.fam.cx>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Whatpm::HTML - An HTML Parser and Serializer</title>
<link rel="stylesheet" href="http://suika.fam.cx/www/style/html/pod.css" type="text/css" />
<link rev="made" href="mailto:admin@suika.fam.cx" />
</head>

<body>

<p><a name="__index__"></a></p>
<!-- INDEX BEGIN -->

<ul>

<li><a href="#name">NAME</a></li>
<li><a href="#synopsis">SYNOPSIS</a></li>
<li><a href="#description">DESCRIPTION</a></li>
<li><a href="#methods">METHODS</a></li>
<li><a href="#lowlevel_interface">LOW-LEVEL INTERFACE</a>
<ul>

<li><a href="#application_cache_selection_algorithm_hook">Application Cache Selection Algorithm Hook</a></li>
</ul>
</li>

<li><a href="#error_reports">ERROR REPORTS</a></li>
<li><a href="#to_do">TO DO</a></li>
<li><a href="#see_also">SEE ALSO</a></li>
<li><a href="#author">AUTHOR</a></li>
<li><a href="#license">LICENSE</a></li>
</ul>
<!-- INDEX END -->

<hr />
<p>
</p>
<h1><a name="name">NAME</a></h1>
<p>Whatpm::HTML - An HTML Parser and Serializer</p>
<p>
</p>
<hr />
<h1><a name="synopsis">SYNOPSIS</a></h1>
<pre>
  use Whatpm::HTML;

  my $s = q&lt;&lt;!DOCTYPE html&gt;&lt;html&gt;...&lt;/html&gt;&gt;;
  # $doc = an empty DOM |Document| object
  my $onerror = sub {
    my %error = @_;
    warn $error{type}, &quot;\n&quot;;
  };

  Whatpm::HTML-&gt;parse_string ($s =&gt; $doc, $onerror);

  ## Now, |$doc| is the DOM representation of |$s|.</pre>
<p>
</p>
<hr />
<h1><a name="description">DESCRIPTION</a></h1>
<p>The <code>Whatpm::HTML</code> module contains an HTML parser and serializer.</p>
<p>The HTML parser can be used to construct the DOM tree representation
of an HTML document. The parsing and tree construction are done
as described in the Web Applications 1.0 specification.</p>
<p>The HTML serializer can be used to obtain the HTML document representation
of a DOM tree (or a tree fragment thereof). The serialization
is performed as described in the Web Applications 1.0 specification
for the <code>innerHTML</code> DOM attribute.</p>
<p>This module is part of Whatpm - Perl Modules for
Web Hypertext Application Technologies.</p>
<p>
</p>
<hr />
<h1><a name="methods">METHODS</a></h1>
<dl>
<dt><strong><a name="item_parse_string">[<em>$doc</em> =] Whatpm::HTML-&gt;parse_string (<em>$s</em>, <em>$doc</em>[, <em>$onerror</em>]);</a></strong><br />
</dt>
<dd>
Parse a string <em>$s</em> as an HTML document.
</dd>
<dd>
<p>The first argument, <em>$s</em>, MUST be a string. It is parsed
as a sequence of characters representing an HTML document.</p>
</dd>
<dd>
<p>The second argument, <em>$doc</em>, MUST be an empty read-write
DOM <code>Document</code> object. The HTML DOM tree is constructed
onto this <code>Document</code> object.</p>
</dd>
<dd>
<p>The third argument, <em>$onerror</em>, MUST be a reference to
the error handler code. Whenever a parse error is detected,
this code is invoked with arguments that describe the error;
as shown in the SYNOPSIS, they can be read as name/value pairs,
in which <code>type</code> identifies the error type.
The code MAY throw an exception, so that the whole parsing
process aborts. Otherwise, the parser will continue to
process the input. The code MUST NOT modify <em>$s</em> or <em>$doc</em>.
If it does, then the result is undefined.
This argument is optional; if it is missing, each
parse error is reported by <code>warn</code>.</p>
</dd>
<dd>
<p><strong>NOTE</strong>: To be a conforming user agent, the code MUST either
abort the processing by throwing an exception at the first
invocation or continue the processing until the parser
stops.</p>
</dd>
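<dd>
<p>As a minimal sketch of such a handler (an illustration, not code from
the module), the following collects error types into an array instead of
warning about them. It assumes, as in the SYNOPSIS, that the handler
receives name/value pairs that include <code>type</code>. The error type
string is made up, and the handler is called by hand so that the snippet
runs without Whatpm installed:</p>
</dd>
<dd>
<pre>
  use strict;
  use warnings;

  # Collect error types instead of warning about each one.
  my @errors;
  my $onerror = sub {
    my %error = @_;  # assumes name/value pairs, as in the SYNOPSIS
    push @errors, $error{type};
  };

  # With Whatpm installed, the handler would be passed as:
  #   Whatpm::HTML-&gt;parse_string ($s =&gt; $doc, $onerror);
  # Here it is called by hand with a made-up error type.
  $onerror-&gt;(type =&gt; 'hypothetical-error-type');
  print scalar (@errors), &quot; error(s): @errors\n&quot;;</pre>
</dd>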
<dd>
<p>The method returns the DOM <code>Document</code> object (i.e. the second argument).</p>
</dd>
<dd>
<p>Note that the <code>Whatpm::NanoDOM</code> module provides a non-conforming
implementation of DOM that only implements the subset that
is necessary for <code>Whatpm::HTML</code>'s parsing and
serializing.
With this module, creating a new HTML <code>Document</code> object
from a string containing an HTML document might be coded as:</p>
</dd>
<dd>
<pre>
  use Whatpm::HTML;
  use Whatpm::NanoDOM;
  my $doc = Whatpm::HTML-&gt;parse_string
      ($s =&gt; Whatpm::NanoDOM::Document-&gt;new, $onerror);</pre>
</dd>
<dt><strong><a name="item_get_inner_html"><em>$s</em> = Whatpm::HTML-&gt;get_inner_html (<em>$node</em>[, <em>$onerror</em>]);</a></strong><br />
</dt>
<dd>
Return the HTML serialization of a DOM node <em>$node</em>.
</dd>
<dd>
<p>The first argument, <em>$node</em>, MUST be a DOM <code>Document</code>,
<code>Element</code>, or <code>DocumentFragment</code> node.</p>
</dd>
<dd>
<p>The second argument, <em>$onerror</em>, MUST be a reference to the
error handling code. This code will be invoked if a descendant
of <em>$node</em> is not an <code>Element</code>, <code>Text</code>, <code>CDATASection</code>,
<code>Comment</code>, <code>DocumentType</code>, or <code>EntityReference</code> node,
for which an <code>INVALID_STATE_ERR</code> exception MUST be thrown.
The code will be invoked with one argument, which is the node
whose type is invalid.
The argument <em>$onerror</em> is optional; if it is missing, any erroneous
node is simply ignored.</p>
</dd>
<dd>
<p>The method returns a reference to the <code>inner_html</code> attribute
value, i.e. the HTML serialization of <em>$node</em>.</p>
</dd>
</dl>
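<p>The two methods can be combined into a parse-and-serialize round trip.
The following is only a sketch under stated assumptions: the input string
is invented for illustration, the error handler is an empty subroutine, and
module loading is guarded with <code>eval</code> so that the snippet still
runs where Whatpm is not installed. Because <code>get_inner_html</code>
returns a reference, the result is dereferenced before use:</p>
<pre>
  use strict;
  use warnings;

  my $html;
  if (eval { require Whatpm::HTML; require Whatpm::NanoDOM; 1 }) {
    # Invented sample input.
    my $s = q&lt;&lt;!DOCTYPE html&gt;&lt;html&gt;&lt;body&gt;&lt;p&gt;Hello&lt;/p&gt;&lt;/body&gt;&lt;/html&gt;&gt;;
    my $doc = Whatpm::HTML-&gt;parse_string
        ($s =&gt; Whatpm::NanoDOM::Document-&gt;new, sub { });
    # get_inner_html returns a reference to the serialization.
    my $ref = Whatpm::HTML-&gt;get_inner_html ($doc);
    $html = $$ref;
    print $html, &quot;\n&quot;;
  } else {
    print &quot;Whatpm is not installed; skipping the round trip.\n&quot;;
  }</pre>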
<p>
</p>
<hr />
<h1><a name="lowlevel_interface">LOW-LEVEL INTERFACE</a></h1>
<p>@@ TBW</p>
<p>
</p>
<h2><a name="application_cache_selection_algorithm_hook">Application Cache Selection Algorithm Hook</a></h2>
<p>Once a parser <em>$p</em> is instantiated by the method <code>new</code>,
a <code>CODE</code> reference can be set to <code>$p-&gt;{application_cache_selection}</code>.
That <code>CODE</code> will be called back when the application cache selection
algorithm MUST be run per HTML5. By default,
<code>$p-&gt;{application_cache_selection}</code> is set to an empty subroutine.</p>
<p>The subroutine will be invoked with an argument <em>manifest_uri</em>,
which is set to the manifest URI when the algorithm MUST be invoked
with a manifest URI, or is set to <code>undef</code> when the algorithm MUST
be invoked without a manifest URI.</p>
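<p>A hook of this shape might look like the sketch below. Note the
assumptions: <em>$p</em> here is a plain hash reference standing in for a
real parser object, the manifest URI is an invented example, and the two
calls at the end simulate by hand what the parser would do, so that the
snippet runs on its own:</p>
<pre>
  use strict;
  use warnings;

  # Stand-in (an assumption) for a parser instantiated by |new|.
  my $p = {};

  # Record how the application cache selection algorithm is requested.
  my @requests;
  $p-&gt;{application_cache_selection} = sub {
    my $manifest_uri = shift;
    push @requests, defined $manifest_uri
        ? &quot;with manifest $manifest_uri&quot; : 'without manifest';
  };

  # Simulate the two kinds of invocation: with a URI and with |undef|.
  $p-&gt;{application_cache_selection}-&gt;('http://www.example.com/app.manifest');
  $p-&gt;{application_cache_selection}-&gt;(undef);
  print &quot;$_\n&quot; for @requests;</pre>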
<p>
</p>
<hr />
<h1><a name="error_reports">ERROR REPORTS</a></h1>
<p>@@ TBW</p>
<p>The list of error types is available at
Whatpm Error Types &lt;http://suika.fam.cx/gate/2005/sw/Whatpm%20Error%20Types&gt;.</p>
<p>
</p>
<hr />
<h1><a name="to_do">TO DO</a></h1>
<p>The tokenizer should emit a sequence of character tokens as one token
to improve performance.</p>
<p>A method that accepts a byte stream as an input.</p>
<p>Charset detection algorithm.</p>
<p>Documentation for the setter of inner_html.</p>
<p>And there are many ``TODO''s and ``ISSUE''s in the source code.</p>
<p>
</p>
<hr />
<h1><a name="see_also">SEE ALSO</a></h1>
<p>Whatpm &lt;http://suika.fam.cx/www/markup/html/whatpm/readme&gt;.</p>
<p>Whatpm Error Types
&lt;http://suika.fam.cx/gate/2005/sw/Whatpm%20Error%20Types&gt;.</p>
<p>HTML5 &lt;http://whatwg.org/html5&gt;.</p>
<p><a href="../Whatpm/NanoDOM.html">the Whatpm::NanoDOM manpage</a>.</p>
<p><a href="../Whatpm/ContentChecker/HTML.html">the Whatpm::ContentChecker::HTML manpage</a>.</p>
<p>
</p>
<hr />
<h1><a name="author">AUTHOR</a></h1>
<p>Wakaba &lt;<a href="mailto:w@suika.fam.cx">w@suika.fam.cx</a>&gt;.</p>
<p>
</p>
<hr />
<h1><a name="license">LICENSE</a></h1>
<p>Copyright 2007 Wakaba &lt;<a href="mailto:w@suika.fam.cx">w@suika.fam.cx</a>&gt;</p>
<p>This library is free software; you can redistribute it
and/or modify it under the same terms as Perl itself.</p>

</body>

</html>
