Diff of /markup/html/whatpm/Whatpm/HTML.html

revision 1.1 by wakaba, Tue May 1 10:36:06 2007 UTC revision 1.6 by wakaba, Sun Nov 4 04:34:30 2007 UTC
# Line 1  Line 1 
1  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
2  <html xmlns="http://www.w3.org/1999/xhtml">  <html xmlns="http://www.w3.org/1999/xhtml">
3  <head>  <head>
4  <title>What::HTML - An HTML Parser</title>  <title>Whatpm::HTML - An HTML Parser and Serializer</title>
5  <link rel="stylesheet" href="http://suika.fam.cx/www/style/html/pod.css" type="text/css" />  <link rel="stylesheet" href="http://suika.fam.cx/www/style/html/pod.css" type="text/css" />
6  <link rev="made" href="mailto:admin@suika.fam.cx" />  <link rev="made" href="mailto:admin@suika.fam.cx" />
7  </head>  </head>
# Line 17  Line 17 
17          <li><a href="#synopsis">SYNOPSIS</a></li>          <li><a href="#synopsis">SYNOPSIS</a></li>
18          <li><a href="#description">DESCRIPTION</a></li>          <li><a href="#description">DESCRIPTION</a></li>
19          <li><a href="#methods">METHODS</a></li>          <li><a href="#methods">METHODS</a></li>
20            <li><a href="#lowlevel_interface">LOW-LEVEL INTERFACE</a></li>
21            <ul>
22    
23                    <li><a href="#application_cache_selection_algorithm_hook">Application Cache Selection Algorithm Hook</a></li>
24            </ul>
25    
26            <li><a href="#error_reports">ERROR REPORTS</a></li>
27          <li><a href="#to_do">TO DO</a></li>          <li><a href="#to_do">TO DO</a></li>
28          <li><a href="#see_also">SEE ALSO</a></li>          <li><a href="#see_also">SEE ALSO</a></li>
29          <li><a href="#author">AUTHOR</a></li>          <li><a href="#author">AUTHOR</a></li>
# Line 28  Line 35 
35  <p>  <p>
36  </p>  </p>
37  <h1><a name="name">NAME</a></h1>  <h1><a name="name">NAME</a></h1>
38  <p>What::HTML - An HTML Parser</p>  <p>Whatpm::HTML - An HTML Parser and Serializer</p>
39  <p>  <p>
40  </p>  </p>
41  <hr />  <hr />
42  <h1><a name="synopsis">SYNOPSIS</a></h1>  <h1><a name="synopsis">SYNOPSIS</a></h1>
43  <pre>  <pre>
44    use What::HTML;    use Whatpm::HTML;
45        
46    my $s = q&lt;&lt;!DOCTYPE html&gt;&lt;html&gt;...&lt;/html&gt;&gt;;    my $s = q&lt;&lt;!DOCTYPE html&gt;&lt;html&gt;...&lt;/html&gt;&gt;;
47    # $doc = an empty DOM |Document| object    # $doc = an empty DOM |Document| object
48    my $onerror = sub {    my $onerror = sub {
49      my $error_code = shift;      my %error = @_;
50      warn $error_code, &quot;\n&quot;;      warn $error{type}, &quot;\n&quot;;
51    };    };
52        
53    What::HTML-&gt;parse_string ($s =&gt; $doc, $onerror);    Whatpm::HTML-&gt;parse_string ($s =&gt; $doc, $onerror);
54        
55    ## Then, |$doc| is the DOM representation of |$s|.</pre>    ## Now, |$doc| is the DOM representation of |$s|.</pre>
56  <p>  <p>
57  </p>  </p>
58  <hr />  <hr />
59  <h1><a name="description">DESCRIPTION</a></h1>  <h1><a name="description">DESCRIPTION</a></h1>
60  <p>The <code>What::HTML</code> module contains an HTML parser and serializer.</p>  <p>The <code>Whatpm::HTML</code> module contains an HTML parser and serializer.</p>
61  <p>The HTML parser can be used to construct the DOM tree representation  <p>The HTML parser can be used to construct the DOM tree representation
62  from an HTML document.  The parsing and tree construction are done  from an HTML document.  The parsing and tree construction are done
63  as described in the Web Application 1.0 specification.</p>  as described in the Web Application 1.0 specification.</p>
# Line 58  as described in the Web Application 1.0 Line 65  as described in the Web Application 1.0
65  of a DOM tree (or a tree fragment thereof).  The serialization  of a DOM tree (or a tree fragment thereof).  The serialization
66  is performed as described in the Web Applications 1.0 specification  is performed as described in the Web Applications 1.0 specification
67  for <code>innerHTML</code> DOM attribute.</p>  for <code>innerHTML</code> DOM attribute.</p>
68  <p>This module is part of WHAT.pm - Perl Modules for  <p>This module is part of Whatpm - Perl Modules for
69  Web Hypertext Application Technologies.</p>  Web Hypertext Application Technologies.</p>
70  <p>  <p>
71  </p>  </p>
72  <hr />  <hr />
73  <h1><a name="methods">METHODS</a></h1>  <h1><a name="methods">METHODS</a></h1>
74  <dl>  <dl>
75  <dt><strong><a name="item_parse_string">[<em>$doc</em> =] What::HTML-&gt;parse_string (<em>$s</em>, <em>$doc</em>[, <em>$onerror</em>]);</a></strong><br />  <dt><strong><a name="item_parse_string">[<em>$doc</em> =] Whatpm::HTML-&gt;parse_string (<em>$s</em>, <em>$doc</em>[, <em>$onerror</em>]);</a></strong><br />
76  </dt>  </dt>
77  <dd>  <dd>
78  Parse a string <em>$s</em> as an HTML document.  Parse a string <em>$s</em> as an HTML document.
# Line 92  This argument is optional; if missing, a Line 99  This argument is optional; if missing, a
99  parse error causes that string to be <code>warn</code>ed.</p>  parse error causes that string to be <code>warn</code>ed.</p>
100  </dd>  </dd>
101  <dd>  <dd>
102    <p><strong>NOTE</strong>: To be a conforming user agent, the code MUST either
103    abort the processing by throwing an exception at the first
104    invocation or MUST continue the processing until the parser
105    stops.</p>
106    </dd>
107    <dd>
108  <p>The method returns the DOM <code>Document</code> object (i.e. the second argument).</p>  <p>The method returns the DOM <code>Document</code> object (i.e. the second argument).</p>
109  </dd>  </dd>
110  <dd>  <dd>
111  <p>Note that the <code>What::NanoDOM</code> module provides a non-conforming  <p>Note that the <code>Whatpm::NanoDOM</code> module provides a non-conforming
112  implementation of DOM that only implements the subset that  implementation of DOM that only implements a subset that
113  is necessary for the purpose of <code>What::HTML</code>'s parsing and  is necessary for the purpose of <code>Whatpm::HTML</code>'s parsing and
114  serializing.  serializing.
115  With this module, creating a new HTML <code>Document</code> object  With this module, creating a new HTML <code>Document</code> object
116  from a string containing an HTML document can be coded as:</p>  from a string containing an HTML document might be coded as:</p>
117  </dd>  </dd>
118  <dd>  <dd>
119  <pre>  <pre>
120    use What::HTML;    use Whatpm::HTML;
121    use What::NanoDOM;    use Whatpm::NanoDOM;
122    my $doc = What::HTML-&gt;parse_string ($s =&gt; What::NanoDOM-&gt;new, $onerror);</pre>    my $doc = Whatpm::HTML-&gt;parse_string
123          ($s =&gt; Whatpm::NanoDOM::Document-&gt;new, $onerror);</pre>
124  </dd>  </dd>
125  <p></p>  <p></p>
126  <dt><strong><a name="item_get_inner_html"><em>$s</em> = What::HTML-&gt;get_inner_html (<em>$node</em>[, <em>$onerror</em>]);</a></strong><br />  <dt><strong><a name="item_get_inner_html"><em>$s</em> = Whatpm::HTML-&gt;get_inner_html (<em>$node</em>[, <em>$onerror</em>]);</a></strong><br />
127  </dt>  </dt>
128  <dd>  <dd>
129  Return the HTML serialization of a DOM node <em>$node</em>.  Return the HTML serialization of a DOM node <em>$node</em>.
130  </dd>  </dd>
131  <dd>  <dd>
132  <p>The first argument, <em>$node</em>, MUST be a DOM <code>Document</code>,  <p>The first argument, <em>$node</em>, MUST be a DOM <code>Document</code>,
133  <code>Node</code>, or <code>DocumentFragment</code> object.</p>  <code>Element</code>, or <code>DocumentFragment</code> node.</p>
134  </dd>  </dd>
135  <dd>  <dd>
136  <p>The second argument, <em>$onerror</em>, MUST be a reference to the  <p>The second argument, <em>$onerror</em>, MUST be a reference to the
137  error handling code.  This code will be invoked if a descendant  error handling code.  This code will be invoked if a descendant
138  of <code>$node</code> is not of <code>Element</code>, <code>Text</code>, <code>CDATASection</code>,  of <em>$node</em> is neither of <code>Element</code>, <code>Text</code>, <code>CDATASection</code>,
139  <code>Comment</code>, <code>DocumentType</code>, or <code>EntityReference</code> so  <code>Comment</code>, <code>DocumentType</code>, nor <code>EntityReference</code>, so
140  that <code>INVALID_STATE_ERR</code> MUST be thrown.  that an <code>INVALID_STATE_ERR</code> exception MUST be thrown.
141  The code will be invoked with an argument, which is the node  The code will be invoked with an argument, which is the node
142  whose type is invalid.    whose type is invalid.  
143  This argument is optional; if missing, any such  The argument <em>$onerror</em> is optional; if missing, any erroneous
144  node is simply ignored.</p>  node is simply ignored.</p>
145  </dd>  </dd>
146  <dd>  <dd>
147  <p>The method returns the <code>inner_html</code> attribute  <p>The method returns a reference to the <code>inner_html</code> attribute
148  value, i.e. the HTML serialization of the <code>$node</code>.</p>  value, i.e. the HTML serialization of the <em>$node</em>.</p>
149  </dd>  </dd>
150  <p></p></dl>  <p></p></dl>
151  <p>  <p>
152  </p>  </p>
153  <hr />  <hr />
154    <h1><a name="lowlevel_interface">LOW-LEVEL INTERFACE</a></h1>
155    <p>@@ TBW</p>
156    <p>
157    </p>
158    <h2><a name="application_cache_selection_algorithm_hook">Application Cache Selection Algorithm Hook</a></h2>
159    <p>Once a parser <em>$p</em> is instantiated by method <code>new</code>,
160    a <code>CODE</code> reference can be set to <code>$p-&gt;{application_cache_selection}</code>.
161    That <code>CODE</code> will be called back when the application cache selection
162    algorithm MUST be run per HTML5.  By default,
163    <code>$p-&gt;{application_cache_selection}</code> is set to an empty subroutine.</p>
164    <p>The subroutine will be invoked with an argument <em>manifest_uri</em>,
165    which is set to the manifest URI when the algorithm MUST be invoked
166    with a manifest URI, or is set to <code>undef</code> when the algorithm MUST
167    be invoked with no manifest URI.</p>
168    <p>
169    </p>
170    <hr />
171    <h1><a name="error_reports">ERROR REPORTS</a></h1>
172    <p>@@ TBW</p>
173    <p>The list of the error types is available in
174    Whatpm Error Types &lt;http://suika.fam.cx/gate/2005/sw/Whatpm%20Error%20Types&gt;.</p>
175    <p>
176    </p>
177    <hr />
178  <h1><a name="to_do">TO DO</a></h1>  <h1><a name="to_do">TO DO</a></h1>
179  <p>Tokenizer should emit a sequence of character tokens as one token  <p>Tokenizer should emit a sequence of character tokens as one token
180  to improve performance.</p>  to improve performance.</p>
181  <p>A method that accepts a byte stream as an input.</p>  <p>A method that accepts a byte stream as an input.</p>
182  <p>Charset detection algorithm.</p>  <p>Charset detection algorithm.</p>
183  <p>Setting inner_html.</p>  <p>Documentation for the setter of inner_html.</p>
184  <p>And there are many ``TODO''s and ``ISSUE''s in the source code.</p>  <p>And there are many ``TODO''s and ``ISSUE''s in the source code.</p>
185  <p>  <p>
186  </p>  </p>
187  <hr />  <hr />
188  <h1><a name="see_also">SEE ALSO</a></h1>  <h1><a name="see_also">SEE ALSO</a></h1>
189  <p>Web Applications 1.0 Working Draft (aka HTML5)  <p>Whatpm &lt;http://suika.fam.cx/www/markup/html/whatpm/readme&gt;.</p>
190  &lt;http://whatwg.org/html5&gt;.  (Revision 792, 1 May 2007)</p>  <p>Whatpm Error Types
191  <p><a href="../What/NanoDOM.html">the What::NanoDOM manpage</a></p>  &lt;http://suika.fam.cx/gate/2005/sw/Whatpm%20Error%20Types&gt;.</p>
192    <p>HTML5 &lt;http://whatwg.org/html5&gt;.</p>
193    <p><a href="../Whatpm/NanoDOM.html">the Whatpm::NanoDOM manpage</a>.</p>
194    <p><a href="../Whatpm/ContentChecker/HTML.html">the Whatpm::ContentChecker::HTML manpage</a>.</p>
195  <p>  <p>
196  </p>  </p>
197  <hr />  <hr />
