Contents of /markup/html/whatpm/Whatpm/HTML.html

Revision 1.5 - Sun Nov 4 04:15:06 2007 UTC by wakaba
Branch: MAIN
Changes since 1.4: +19 -0 lines
File MIME type: text/html

2007-11-04  Wakaba  <wakaba@suika.fam.cx>

	* HTML.pm.src: Support for application cache selection algorithm
	callback.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Whatpm::HTML - An HTML Parser</title>
<link rel="stylesheet" href="http://suika.fam.cx/www/style/html/pod.css" type="text/css" />
<link rev="made" href="mailto:admin@suika.fam.cx" />
</head>

<body>

<p><a name="__index__"></a></p>
<!-- INDEX BEGIN -->

<ul>

<li><a href="#name">NAME</a></li>
<li><a href="#synopsis">SYNOPSIS</a></li>
<li><a href="#description">DESCRIPTION</a></li>
<li><a href="#methods">METHODS</a></li>
<li><a href="#lowlevel_interface">LOW-LEVEL INTERFACE</a></li>
<ul>

<li><a href="#application_cache_selection_algorithm_hook">Application Cache Selection Algorithm Hook</a></li>
</ul>

<li><a href="#to_do">TO DO</a></li>
<li><a href="#see_also">SEE ALSO</a></li>
<li><a href="#author">AUTHOR</a></li>
<li><a href="#license">LICENSE</a></li>
</ul>
<!-- INDEX END -->

<hr />
<p>
</p>
<h1><a name="name">NAME</a></h1>
<p>Whatpm::HTML - An HTML Parser</p>
<p>
</p>
<hr />
<h1><a name="synopsis">SYNOPSIS</a></h1>
<pre>
use Whatpm::HTML;
use Whatpm::NanoDOM;

my $s = q&lt;&lt;!DOCTYPE html&gt;&lt;html&gt;...&lt;/html&gt;&gt;;
my $doc = Whatpm::NanoDOM::Document-&gt;new; # or any other empty DOM |Document| object
my $onerror = sub {
  my $error_code = shift;
  warn $error_code, &quot;\n&quot;;
};

Whatpm::HTML-&gt;parse_string ($s =&gt; $doc, $onerror);

## Then, |$doc| is the DOM representation of |$s|.</pre>
<p>
</p>
<hr />
<h1><a name="description">DESCRIPTION</a></h1>
<p>The <code>Whatpm::HTML</code> module contains an HTML parser and an HTML serializer.</p>
<p>The HTML parser can be used to construct the DOM tree representation
of an HTML document. The parsing and tree construction are done
as described in the Web Applications 1.0 specification.</p>
<p>The HTML serializer can be used to obtain the HTML representation
of a DOM tree (or a fragment thereof). The serialization
is performed as described in the Web Applications 1.0 specification
for the <code>innerHTML</code> DOM attribute.</p>
<p>This module is part of Whatpm - Perl Modules for
Web Hypertext Application Technologies.</p>
<p>
</p>
<hr />
<h1><a name="methods">METHODS</a></h1>
<dl>
<dt><strong><a name="item_parse_string">[<em>$doc</em> =] Whatpm::HTML-&gt;parse_string (<em>$s</em>, <em>$doc</em>[, <em>$onerror</em>]);</a></strong><br />
</dt>
<dd>
Parse a string <em>$s</em> as an HTML document.
</dd>
<dd>
<p>The first argument, <em>$s</em>, MUST be a string. It is parsed
as a sequence of characters representing an HTML document.</p>
</dd>
<dd>
<p>The second argument, <em>$doc</em>, MUST be an empty read-write
DOM <code>Document</code> object. The HTML DOM tree is constructed
onto this <code>Document</code> object.</p>
</dd>
<dd>
<p>The third argument, <em>$onerror</em>, MUST be a reference to
the error handler code. Whenever a parse error is detected,
this code is invoked with one argument, a string
that might describe what is wrong.
The code MAY throw an exception, in which case the whole parsing
process aborts. Otherwise, the parser continues to
process the input. The code MUST NOT modify <em>$s</em> or <em>$doc</em>;
if it does, the result is undefined.
This argument is optional; if it is omitted, the string
for each parse error is <code>warn</code>ed.</p>
</dd>
<dd>
<p><strong>NOTE</strong>: To be a conforming user agent, the code MUST either
abort the processing by throwing an exception at its first
invocation, or continue the processing until the parser
stops.</p>
</dd>
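<dd>
<p>The two conforming strategies described in the NOTE above can be
sketched in plain Perl as follows. The subroutine <code>fake_parse</code>
and the error strings are hypothetical stand-ins, not part of Whatpm:
<code>fake_parse</code> merely calls <em>$onerror</em> once per error,
as <code>parse_string</code> is documented to do, so that the sketch
runs without a DOM implementation. In real code,
<code>Whatpm::HTML-&gt;parse_string</code> would take its place.</p>
</dd>
<dd>
<pre>
use strict;
use warnings;

# Hypothetical stand-in for Whatpm::HTML-&gt;parse_string: it reports each
# error string in @$errors through $onerror, then returns normally.
sub fake_parse {
  my ($errors, $onerror) = @_;
  $onerror-&gt;($_) for @$errors;
  return 'parsed';
}

my @input_errors = ('unclosed-tag', 'bare-ampersand');   # hypothetical

# Strategy 1: abort at the first invocation by throwing an exception.
my $abort = sub { die &quot;parse error: $_[0]\n&quot; };
my $result = eval { fake_parse (\@input_errors, $abort) };
my $aborted_with = $@;

# Strategy 2: never throw; record every error and let the parser finish.
my @seen;
my $collect = sub { push @seen, $_[0] };
my $result2 = fake_parse (\@input_errors, $collect);

print defined $result ? &quot;finished\n&quot; : &quot;aborted: $aborted_with&quot;;
print &quot;$result2 after &quot;, scalar @seen, &quot; errors: @seen\n&quot;;</pre>
</dd>
<dd>
<p>Run as is, the sketch prints <code>aborted: parse error: unclosed-tag</code>
followed by <code>parsed after 2 errors: unclosed-tag bare-ampersand</code>.</p>
</dd>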
<dd>
<p>The method returns the DOM <code>Document</code> object (i.e. the second argument).</p>
</dd>
<dd>
<p>Note that the <code>Whatpm::NanoDOM</code> module provides a non-conforming
implementation of DOM that only implements the subset
necessary for <code>Whatpm::HTML</code>'s parsing and
serializing.
With this module, creating a new HTML <code>Document</code> object
from a string containing an HTML document might be coded as:</p>
</dd>
<dd>
<pre>
use Whatpm::HTML;
use Whatpm::NanoDOM;
my $doc = Whatpm::HTML-&gt;parse_string
    ($s =&gt; Whatpm::NanoDOM::Document-&gt;new, $onerror);</pre>
</dd>
<p></p>
<dt><strong><a name="item_get_inner_html"><em>$s</em> = Whatpm::HTML-&gt;get_inner_html (<em>$node</em>[, <em>$onerror</em>]);</a></strong><br />
</dt>
<dd>
Return the HTML serialization of a DOM node <em>$node</em>.
</dd>
<dd>
<p>The first argument, <em>$node</em>, MUST be a DOM <code>Document</code>,
<code>Element</code>, or <code>DocumentFragment</code> node.</p>
</dd>
<dd>
<p>The second argument, <em>$onerror</em>, MUST be a reference to the
error handling code. This code is invoked if a descendant
of <em>$node</em> is not an <code>Element</code>, <code>Text</code>, <code>CDATASection</code>,
<code>Comment</code>, <code>DocumentType</code>, or <code>EntityReference</code> node,
i.e. in the case where the specification requires an
<code>INVALID_STATE_ERR</code> exception to be thrown.
The code is invoked with one argument, the node
whose type is invalid.
The argument <em>$onerror</em> is optional; if it is omitted, any erroneous
node is simply ignored.</p>
</dd>
<dd>
<p>The method returns a reference to the <code>inner_html</code> attribute
value, i.e. a reference to the HTML serialization of <em>$node</em>.</p>
</dd>
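<dd>
<p>The serialization that <code>get_inner_html</code> performs can be
approximated by the simplified sketch below. It is not Whatpm's
implementation and does not use the DOM: nodes are plain hash references
with the hypothetical keys <code>type</code>, <code>name</code>,
<code>attrs</code>, <code>children</code>, and <code>data</code>; only
element, text, and comment nodes are serialized; the void element list is
abbreviated; and the elements whose text HTML5 leaves unescaped (such as
<code>script</code> and <code>style</code>) are not special-cased. As
documented above, a node of an unsupported type is passed to
<em>$onerror</em> (or skipped if there is no handler), and a reference to
the resulting string is returned.</p>
</dd>
<dd>
<pre>
use strict;
use warnings;

# Void elements get no end tag (a subset of HTML5's list, for illustration).
my %VOID = map { $_ =&gt; 1 } qw(area base br col embed hr img input link meta);

sub escape_text {
  my $s = shift;
  $s =~ s/&amp;/&amp;amp;/g;   # must run first so later entities are not re-escaped
  $s =~ s/\xA0/&amp;nbsp;/g;
  $s =~ s/&lt;/&amp;lt;/g;
  $s =~ s/&gt;/&amp;gt;/g;
  return $s;
}

sub escape_attr {
  my $s = shift;
  $s =~ s/&amp;/&amp;amp;/g;
  $s =~ s/\xA0/&amp;nbsp;/g;
  $s =~ s/&quot;/&amp;quot;/g;
  return $s;
}

# Serialize the children of $node and return a reference to the string,
# mirroring the documented return value of get_inner_html.
sub inner_html_sketch {
  my ($node, $onerror) = @_;
  my $out = '';
  for my $child (@{ $node-&gt;{children} || [] }) {
    my $type = $child-&gt;{type};
    if ($type eq 'element') {
      my $attrs = $child-&gt;{attrs} || {};
      $out .= '&lt;' . $child-&gt;{name};
      # Attributes are sorted only to make the output deterministic.
      $out .= ' ' . $_ . '=&quot;' . escape_attr ($attrs-&gt;{$_}) . '&quot;'
          for sort keys %$attrs;
      $out .= '&gt;';
      next if $VOID{$child-&gt;{name}};
      $out .= ${ inner_html_sketch ($child, $onerror) };
      $out .= '&lt;/' . $child-&gt;{name} . '&gt;';
    } elsif ($type eq 'text') {
      $out .= escape_text ($child-&gt;{data});
    } elsif ($type eq 'comment') {
      $out .= '&lt;!--' . $child-&gt;{data} . '--&gt;';
    } else {
      # Unsupported node type: report it if a handler was given, else skip it.
      $onerror-&gt;($child) if $onerror;
    }
  }
  return \$out;
}

my $tree = {type =&gt; 'element', name =&gt; 'p', children =&gt; [
  {type =&gt; 'text', data =&gt; &quot;a &lt; b &amp; c&quot;},
  {type =&gt; 'element', name =&gt; 'br', attrs =&gt; {class =&gt; 'x &quot;y&quot;'}},
  {type =&gt; 'comment', data =&gt; ' note '},
  {type =&gt; 'processing_instruction'},   # not supported by this sketch
]};
my @bad;
my $ref = inner_html_sketch ($tree, sub { push @bad, $_[0]{type} });
print &quot;${$ref}\n&quot;;</pre>
</dd>
<dd>
<p>The sketch prints <code>a &amp;lt; b &amp;amp; c&lt;br class=&quot;x &amp;quot;y&amp;quot;&quot;&gt;&lt;!-- note --&gt;</code>
and records <code>processing_instruction</code> in <code>@bad</code>.</p>
</dd>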
<p></p></dl>
<p>
</p>
<hr />
<h1><a name="lowlevel_interface">LOW-LEVEL INTERFACE</a></h1>
<p>@@ TBW</p>
<p>
</p>
<h2><a name="application_cache_selection_algorithm_hook">Application Cache Selection Algorithm Hook</a></h2>
<p>Once a parser <em>$p</em> has been instantiated by the method <code>new</code>,
a <code>CODE</code> reference can be set to <code>$p-&gt;{application_cache_selection}</code>.
That <code>CODE</code> is called back whenever HTML5 requires the
application cache selection algorithm to be run. By default,
<code>$p-&gt;{application_cache_selection}</code> is set to an empty subroutine.</p>
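<p>The hook is an ordinary hash slot holding a <code>CODE</code> reference,
so it can be set once the parser object exists. The following sketch
illustrates this pattern with a hypothetical class <code>Sketch::Parser</code>
rather than <code>Whatpm::HTML</code> itself, so that it runs on its own.
The point at which the hook fires (here, when the root <code>html</code>
start tag is processed) and the argument passed to it (here, the value of
the <code>manifest</code> attribute, or <code>undef</code>) are assumptions
of the sketch; this document does not specify the callback's arguments.</p>
<pre>
use strict;
use warnings;

package Sketch::Parser;   # hypothetical class, not Whatpm::HTML

sub new {
  my $class = shift;
  # As documented for Whatpm::HTML, the hook defaults to an empty subroutine.
  return bless {application_cache_selection =&gt; sub { }}, $class;
}

# Hypothetical stand-in for the step at which HTML5 requires the application
# cache selection algorithm to run; passing the manifest value is assumed.
sub process_html_start_tag {
  my ($self, $attrs) = @_;
  $self-&gt;{application_cache_selection}-&gt;($attrs-&gt;{manifest});
}

package main;

my $p = Sketch::Parser-&gt;new;
$p-&gt;process_html_start_tag ({});   # default hook: nothing happens

my @calls;
$p-&gt;{application_cache_selection} = sub {
  my $manifest = shift;
  push @calls, defined $manifest ? $manifest : '(no manifest)';
};
$p-&gt;process_html_start_tag ({manifest =&gt; 'app.manifest'});
$p-&gt;process_html_start_tag ({});
print join (', ', @calls), &quot;\n&quot;;   # app.manifest, (no manifest)</pre>
<p>Keeping the default as an empty subroutine lets the parser call the hook
unconditionally, without checking whether one was installed.</p>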
<p>
</p>
<hr />
<h1><a name="to_do">TO DO</a></h1>
<p>The tokenizer should emit a sequence of character tokens as one token
to improve performance.</p>
<p>A method that accepts a byte stream as input.</p>
<p>A charset detection algorithm.</p>
<p>Documentation for the setter of <code>inner_html</code>.</p>
<p>There are also many "TODO"s and "ISSUE"s in the source code.</p>
<p>
</p>
<hr />
<h1><a name="see_also">SEE ALSO</a></h1>
<p>Whatpm
&lt;http://suika.fam.cx/www/markup/html/whatpm/readme&gt;</p>
<p>Web Applications 1.0 Working Draft (aka HTML5)
&lt;http://whatwg.org/html5&gt; (Revision 792, 1 May 2007).</p>
<p><a href="../Whatpm/NanoDOM.html">the Whatpm::NanoDOM manpage</a></p>
<p>
</p>
<hr />
<h1><a name="author">AUTHOR</a></h1>
<p>Wakaba &lt;<a href="mailto:w@suika.fam.cx">w@suika.fam.cx</a>&gt;.</p>
<p>
</p>
<hr />
<h1><a name="license">LICENSE</a></h1>
<p>Copyright 2007 Wakaba &lt;<a href="mailto:w@suika.fam.cx">w@suika.fam.cx</a>&gt;</p>
<p>This library is free software; you can redistribute it
and/or modify it under the same terms as Perl itself.</p>

</body>

</html>
