
Contents of /markup/html/whatpm/Whatpm/HTML.html



Revision 1.8
Sun Nov 11 08:39:42 2007 UTC (16 years, 11 months ago) by wakaba
Branch: MAIN
CVS Tags: HEAD
Changes since 1.7: +13 -3 lines
File MIME type: text/html
++ whatpm/Whatpm/ChangeLog	11 Nov 2007 08:39:39 -0000
	* HTML.pm.src (parse_char_string): Set |inner_encoding|
	attribute if possible.

2007-11-11  Wakaba  <wakaba@suika.fam.cx>

1 wakaba 1.1 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
2     <html xmlns="http://www.w3.org/1999/xhtml">
3     <head>
4 wakaba 1.6 <title>Whatpm::HTML - An HTML Parser and Serializer</title>
5 wakaba 1.1 <link rel="stylesheet" href="http://suika.fam.cx/www/style/html/pod.css" type="text/css" />
6     <link rev="made" href="mailto:admin@suika.fam.cx" />
7     </head>
8    
9     <body>
10    
11     <p><a name="__index__"></a></p>
12     <!-- INDEX BEGIN -->
13    
14     <ul>
15    
16     <li><a href="#name">NAME</a></li>
17     <li><a href="#synopsis">SYNOPSIS</a></li>
18     <li><a href="#description">DESCRIPTION</a></li>
19     <li><a href="#methods">METHODS</a></li>
20 wakaba 1.5 <li><a href="#lowlevel_interface">LOW-LEVEL INTERFACE</a></li>
21     <ul>
22    
23     <li><a href="#application_cache_selection_algorithm_hook">Application Cache Selection Algorithm Hook</a></li>
24     </ul>
25    
26 wakaba 1.6 <li><a href="#error_reports">ERROR REPORTS</a></li>
27 wakaba 1.1 <li><a href="#to_do">TO DO</a></li>
28 wakaba 1.8 <li><a href="#dependency">DEPENDENCY</a></li>
29 wakaba 1.1 <li><a href="#see_also">SEE ALSO</a></li>
30     <li><a href="#author">AUTHOR</a></li>
31     <li><a href="#license">LICENSE</a></li>
32     </ul>
33     <!-- INDEX END -->
34    
35     <hr />
36     <p>
37     </p>
38     <h1><a name="name">NAME</a></h1>
39 wakaba 1.6 <p>Whatpm::HTML - An HTML Parser and Serializer</p>
40 wakaba 1.1 <p>
41     </p>
42     <hr />
43     <h1><a name="synopsis">SYNOPSIS</a></h1>
44     <pre>
45 wakaba 1.2 use Whatpm::HTML;
46 wakaba 1.1
47     my $s = q&lt;&lt;!DOCTYPE html&gt;&lt;html&gt;...&lt;/html&gt;&gt;;
48     # $doc = an empty DOM |Document| object
49     my $onerror = sub {
50 wakaba 1.6 my %error = @_;
51     warn $error{type}, &quot;\n&quot;;
52 wakaba 1.1 };
53    
54 wakaba 1.8 Whatpm::HTML-&gt;parse_char_string ($s =&gt; $doc, $onerror);
55 wakaba 1.1
56 wakaba 1.6 ## Now, |$doc| is the DOM representation of |$s|.</pre>
57 wakaba 1.1 <p>
58     </p>
59     <hr />
60     <h1><a name="description">DESCRIPTION</a></h1>
61 wakaba 1.2 <p>The <code>Whatpm::HTML</code> module contains an HTML parser and serializer.</p>
62 wakaba 1.1 <p>The HTML parser can be used to construct the DOM tree representation
63     from an HTML document. The parsing and tree construction are done
64     as described in the Web Applications 1.0 specification.</p>
65     <p>The HTML serializer can be used to obtain the HTML document representation
66     of a DOM tree (or a tree fragment thereof). The serialization
67     is performed as described in the Web Applications 1.0 specification
68     for the <code>innerHTML</code> DOM attribute.</p>
69 wakaba 1.2 <p>This module is part of Whatpm - Perl Modules for
70 wakaba 1.1 Web Hypertext Application Technologies.</p>
71     <p>
72     </p>
73     <hr />
74     <h1><a name="methods">METHODS</a></h1>
75     <dl>
76 wakaba 1.8 <dt><strong><a name="item_parse_char_string">[<em>$doc</em> =] Whatpm::HTML-&gt;parse_char_string (<em>$s</em>, <em>$doc</em>[, <em>$onerror</em>]);</a></strong><br />
77 wakaba 1.1 </dt>
78     <dd>
79     Parse a string <em>$s</em> as an HTML document.
80     </dd>
81     <dd>
82     <p>The first argument, <em>$s</em>, MUST be a string. It is parsed
83     as a sequence of characters representing an HTML document.</p>
84     </dd>
85     <dd>
86     <p>The second argument, <em>$doc</em>, MUST be an empty read-write
87     DOM <code>Document</code> object. The HTML DOM tree is constructed
88     onto this <code>Document</code> object.</p>
89     </dd>
90     <dd>
91     <p>The third argument, <em>$onerror</em>, MUST be a reference to
92     the error handler code. Whenever a parse error is detected,
93     this code is invoked with arguments describing the error
94     (the SYNOPSIS reads them as a hash with a <code>type</code> entry).
95     The code MAY throw an exception, so that the whole parsing
96     process aborts. Otherwise, the parser will continue to
97     process the input. The code MUST NOT modify <em>$s</em> or <em>$doc</em>.
98     If it does, then the result is undefined.
99     This argument is optional; if it is omitted, each
100     parse error is reported with <code>warn</code>.</p>
101     </dd>
102     <dd>
103 wakaba 1.3 <p><strong>NOTE</strong>: To be a conforming user agent, the code MUST either
104     abort the processing by throwing an exception at the first
105     invocation or MUST continue the processing until the parser
106     stops.</p>
107     </dd>
108     <dd>
109 wakaba 1.1 <p>The method returns the DOM <code>Document</code> object (i.e. the second argument).</p>
110     </dd>
111     <dd>
112 wakaba 1.2 <p>Note that the <code>Whatpm::NanoDOM</code> module provides a non-conforming
113 wakaba 1.4 implementation of DOM that only implements a subset that
114 wakaba 1.2 is necessary for the purpose of <code>Whatpm::HTML</code>'s parsing and
115 wakaba 1.1 serializing.
116     With this module, creating a new HTML <code>Document</code> object
117 wakaba 1.3 from a string containing an HTML document might be coded as:</p>
118 wakaba 1.1 </dd>
119     <dd>
120     <pre>
121 wakaba 1.2 use Whatpm::HTML;
122     use Whatpm::NanoDOM;
123 wakaba 1.8 my $doc = Whatpm::HTML-&gt;parse_char_string
124 wakaba 1.2 ($s =&gt; Whatpm::NanoDOM::Document-&gt;new, $onerror);</pre>
125 wakaba 1.1 </dd>
126     <p></p></dl>
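<p>As a minimal sketch of the error handler contract described above, the following stand-alone Perl collects reported errors instead of warning, and shows the two conforming behaviours from the NOTE: continue on every error, or abort at the first one by throwing an exception. It assumes, as in the SYNOPSIS, that the handler receives name-value pairs including <code>type</code>. The <code>report</code> helper and the error type strings are hypothetical stand-ins for the parser and are not part of <code>Whatpm::HTML</code>.</p>

```perl
use strict;
use warnings;

# Handler that records every error and lets parsing continue.
my @errors;
my $collect = sub {
  my %error = @_;            # assumed: name-value pairs, as in the SYNOPSIS
  push @errors, $error{type};
};

# Handler that aborts at the first error by throwing an exception.
my $abort = sub {
  my %error = @_;
  die "parse error: $error{type}\n";
};

# Hypothetical stand-in for the parser: reports each error type in turn.
sub report {
  my ($onerror, @types) = @_;
  $onerror->(type => $_) for @types;
  return scalar @types;
}

# The error type strings below are made up for illustration.
report ($collect, 'no DOCTYPE', 'unmatched end tag');

# eval traps the exception, so the caller can tell that parsing was aborted.
my $ok = eval { report ($abort, 'no DOCTYPE', 'unmatched end tag'); 1 };
my $message = $@;
```

<p>With the real module, either handler would be passed as the third argument of <code>parse_char_string</code>. Wrapping the call in <code>eval</code> is one way to notice that an aborting handler stopped the parse.</p>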
127 wakaba 1.5 <p>
128     </p>
129     <hr />
130     <h1><a name="lowlevel_interface">LOW-LEVEL INTERFACE</a></h1>
131     <p>@@ TBW</p>
132     <p>
133     </p>
134     <h2><a name="application_cache_selection_algorithm_hook">Application Cache Selection Algorithm Hook</a></h2>
135     <p>Once a parser <em>$p</em> is instantiated by the <code>new</code> method,
136 wakaba 1.6 a <code>CODE</code> reference can be set to <code>$p-&gt;{application_cache_selection}</code>.
137 wakaba 1.5 That <code>CODE</code> will be called back when the application cache selection
138     algorithm MUST be run per HTML5. By default,
139 wakaba 1.6 <code>$p-&gt;{application_cache_selection}</code> is set to an empty subroutine.</p>
140     <p>The subroutine will be invoked with an argument <em>manifest_uri</em>,
141     which is set to the manifest URI when the algorithm MUST be invoked
142     with a manifest URI, or is set to <code>undef</code> when the algorithm MUST
143     be invoked without a manifest URI.</p>
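<p>The callback contract can be sketched without the parser itself. In the stand-alone example below, a plain hash stands in for the parser object <em>$p</em> (an assumption for illustration; a real parser would come from <code>new</code>), and the hook is invoked by hand, once with a hypothetical manifest URI and once with <code>undef</code>.</p>

```perl
use strict;
use warnings;

# Plain hash standing in for the parser object $p (illustration only).
# The default hook is an empty subroutine, as described above.
my $p = {application_cache_selection => sub { }};

# Replace the default with a hook that records how it was called.
my @calls;
$p->{application_cache_selection} = sub {
  my $manifest_uri = shift;
  push @calls, defined $manifest_uri
      ? "manifest: $manifest_uri"
      : 'no manifest';
};

# What the parser is described as doing, simulated here by hand.
# The URI is a made-up example.
$p->{application_cache_selection}->('http://www.example.com/app.manifest');
$p->{application_cache_selection}->(undef);
```

<p>Testing the argument with <code>defined</code> separates the two cases cleanly, since <code>undef</code> is the documented signal that no manifest URI applies.</p>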
144     <p>
145     </p>
146     <hr />
147     <h1><a name="error_reports">ERROR REPORTS</a></h1>
148     <p>@@ TBW</p>
149     <p>The list of the error types is available in
150     Whatpm Error Types &lt;http://suika.fam.cx/gate/2005/sw/Whatpm%20Error%20Types&gt;.</p>
151 wakaba 1.1 <p>
152     </p>
153     <hr />
154     <h1><a name="to_do">TO DO</a></h1>
155 wakaba 1.8 <p>Documentation for parse_byte_string.</p>
156 wakaba 1.1 <p>Tokenizer should emit a sequence of character tokens as one token
157     to improve performance.</p>
158     <p>A method that accepts a byte stream as an input.</p>
159     <p>Charset detection algorithm.</p>
160 wakaba 1.4 <p>Documentation for the setter of inner_html.</p>
161 wakaba 1.1 <p>And there are many ``TODO''s and ``ISSUE''s in the source code.</p>
162 wakaba 1.8 <p>
163     </p>
164     <hr />
165     <h1><a name="dependency">DEPENDENCY</a></h1>
166     <p>This module requires <em>Error</em>. That module is available at CPAN
167     &lt;http://search.cpan.org/author/SHLOMIF/Error-0.17009/lib/Error.pm&gt;.
168     It is also part of the manakai-core distribution
169     &lt;http://suika.fam.cx/www/2006/manakai/&gt;.</p>
170 wakaba 1.1 <p>
171     </p>
172     <hr />
173     <h1><a name="see_also">SEE ALSO</a></h1>
174 wakaba 1.6 <p>Whatpm &lt;http://suika.fam.cx/www/markup/html/whatpm/readme&gt;.</p>
175     <p>Whatpm Error Types
176     &lt;http://suika.fam.cx/gate/2005/sw/Whatpm%20Error%20Types&gt;.</p>
177     <p>HTML5 &lt;http://whatwg.org/html5&gt;.</p>
178 wakaba 1.7 <p><a href="../Whatpm/HTML/Serializer.html">the Whatpm::HTML::Serializer manpage</a>.</p>
179 wakaba 1.6 <p><a href="../Whatpm/NanoDOM.html">the Whatpm::NanoDOM manpage</a>.</p>
180     <p><a href="../Whatpm/ContentChecker/HTML.html">the Whatpm::ContentChecker::HTML manpage</a>.</p>
181 wakaba 1.1 <p>
182     </p>
183     <hr />
184     <h1><a name="author">AUTHOR</a></h1>
185     <p>Wakaba &lt;<a href="mailto:w@suika.fam.cx">w@suika.fam.cx</a>&gt;.</p>
186     <p>
187     </p>
188     <hr />
189     <h1><a name="license">LICENSE</a></h1>
190     <p>Copyright 2007 Wakaba &lt;<a href="mailto:w@suika.fam.cx">w@suika.fam.cx</a>&gt;</p>
191     <p>This library is free software; you can redistribute it
192     and/or modify it under the same terms as Perl itself.</p>
193    
194     </body>
195    
196     </html>
