Contents of /markup/html/whatpm/Whatpm/HTML.html

Revision 1.3
Wed May 2 13:44:33 2007 UTC (17 years, 6 months ago) by wakaba
Branch: MAIN
Changes since 1.2: +11 -3 lines
File MIME type: text/html
++ ChangeLog	2 May 2007 13:37:34 -0000
2007-05-02  Wakaba  <wakaba@suika.fam.cx>

	* readme.en.html: TODO section is added.

++ whatpm/t/ChangeLog	2 May 2007 13:44:02 -0000
2007-05-02  Wakaba  <wakaba@suika.fam.cx>

	* .cvsignore: Result files are added.

	* HTML-tree.t: Support for document fragment tests.

	* Makefile: Generate test result files.

	* tokenizer-test-1.test: A new test to ensure that
	characters after end tag are preserved in RCDATA or CDATA
	case.

++ whatpm/Whatpm/ChangeLog	2 May 2007 13:42:17 -0000
2007-05-02  Wakaba  <wakaba@suika.fam.cx>

	* NanoDOM.pm (DOMImplementation): New class.
	(append_child): Weaken the |parent_node| reference.
	(create_element_ns, Element new): Set the |owner_document|
	reference.
	(implementation): New attribute.
	(owner_document, local_name, namespace_uri): New attributes.

	* HTML.pm.src (parse_string): Line and column numbers
	are now provided to error handler.
	(!!!parse-error): Short descriptions are added.
	(_construct_tree): Split into three methods; support
	for innerHTML mode.
	(set_inner_html): New method.

1 wakaba 1.1 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
2     <html xmlns="http://www.w3.org/1999/xhtml">
3     <head>
4 wakaba 1.2 <title>Whatpm::HTML - An HTML Parser</title>
5 wakaba 1.1 <link rel="stylesheet" href="http://suika.fam.cx/www/style/html/pod.css" type="text/css" />
6     <link rev="made" href="mailto:admin@suika.fam.cx" />
7     </head>
8    
9     <body>
10    
11     <p><a name="__index__"></a></p>
12     <!-- INDEX BEGIN -->
13    
14     <ul>
15    
16     <li><a href="#name">NAME</a></li>
17     <li><a href="#synopsis">SYNOPSIS</a></li>
18     <li><a href="#description">DESCRIPTION</a></li>
19     <li><a href="#methods">METHODS</a></li>
20     <li><a href="#to_do">TO DO</a></li>
21     <li><a href="#see_also">SEE ALSO</a></li>
22     <li><a href="#author">AUTHOR</a></li>
23     <li><a href="#license">LICENSE</a></li>
24     </ul>
25     <!-- INDEX END -->
26    
27     <hr />
28     <p>
29     </p>
30     <h1><a name="name">NAME</a></h1>
31 wakaba 1.2 <p>Whatpm::HTML - An HTML Parser</p>
32 wakaba 1.1 <p>
33     </p>
34     <hr />
35     <h1><a name="synopsis">SYNOPSIS</a></h1>
36     <pre>
37 wakaba 1.2 use Whatpm::HTML;
38 wakaba 1.1
39     my $s = q&lt;&lt;!DOCTYPE html&gt;&lt;html&gt;...&lt;/html&gt;&gt;;
40     # $doc = an empty DOM |Document| object
41     my $onerror = sub {
42     my $error_code = shift;
43     warn $error_code, &quot;\n&quot;;
44     };
45    
46 wakaba 1.2 Whatpm::HTML-&gt;parse_string ($s =&gt; $doc, $onerror);
47 wakaba 1.1
48     ## Then, |$doc| is the DOM representation of |$s|.</pre>
49     <p>
50     </p>
51     <hr />
52     <h1><a name="description">DESCRIPTION</a></h1>
53 wakaba 1.2 <p>The <code>Whatpm::HTML</code> module contains an HTML parser and an HTML serializer.</p>
54 wakaba 1.1 <p>The HTML parser can be used to construct the DOM tree representation
55     from an HTML document. The parsing and tree construction are done
56     as described in the Web Applications 1.0 specification.</p>
57     <p>The HTML serializer can be used to obtain the HTML document representation
58     of a DOM tree (or a tree fragment thereof). The serialization
59     is performed as described in the Web Applications 1.0 specification
60     for the <code>innerHTML</code> DOM attribute.</p>
61 wakaba 1.2 <p>This module is part of Whatpm - Perl Modules for
62 wakaba 1.1 Web Hypertext Application Technologies.</p>
63     <p>
64     </p>
65     <hr />
66     <h1><a name="methods">METHODS</a></h1>
67     <dl>
68 wakaba 1.2 <dt><strong><a name="item_parse_string">[<em>$doc</em> =] Whatpm::HTML-&gt;parse_string (<em>$s</em>, <em>$doc</em>[, <em>$onerror</em>]);</a></strong><br />
69 wakaba 1.1 </dt>
70     <dd>
71     Parse a string <em>$s</em> as an HTML document.
72     </dd>
73     <dd>
74     <p>The first argument, <em>$s</em>, MUST be a string. It is parsed
75     as a sequence of characters representing an HTML document.</p>
76     </dd>
77     <dd>
78     <p>The second argument, <em>$doc</em>, MUST be an empty read-write
79     DOM <code>Document</code> object. The HTML DOM tree is constructed
80     onto this <code>Document</code> object.</p>
81     </dd>
82     <dd>
83     <p>The third argument, <em>$onerror</em>, MUST be a reference to
84     the error handler code. Whenever a parse error is detected,
85     this code is invoked with an argument that contains a
86     string that might describe what is wrong.
87     The code MAY throw an exception, so that the whole parsing
88     process aborts. Otherwise, the parser will continue to
89     process the input. The code MUST NOT modify <em>$s</em> or <em>$doc</em>.
90     If it does, then the result is undefined.
91     This argument is optional; if it is omitted, the string for
92     each parse error is <code>warn</code>ed.</p>
93     </dd>
94     <dd>
95 wakaba 1.3 <p><strong>NOTE</strong>: To be a conforming user agent, the code MUST either
96     abort the processing by throwing an exception at the first
97     invocation or MUST continue the processing until the parser
98     stops.</p>
99     </dd>
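<dd>
<p>The two behaviors allowed for the error handler (continue to the end,
or abort at the first invocation) might be sketched as follows. This is
an illustration only: the variable names and the error strings are
made up, and the handlers are invoked by hand so that the sketch runs
without the parser.</p>
</dd>
<dd>
<pre>
use strict;
use warnings;

# Handler that records every reported error and lets parsing continue.
my @errors;
my $collect = sub {
  my $error = shift;    # the string passed by the parser
  push @errors, $error;
};

# Handler that aborts the processing at the first reported error.
my $abort = sub {
  my $error = shift;
  die &quot;Parse error: $error\n&quot;;
};

# With a string |$s| and an empty |Document| |$doc| at hand, either
# handler would be passed as the third argument:
#   Whatpm::HTML-&gt;parse_string ($s =&gt; $doc, $collect);

# Stand-in invocations (hypothetical error strings).
$collect-&gt;($_) for 'hypothetical error 1', 'hypothetical error 2';
my $aborted = eval { $abort-&gt;('hypothetical error 1'); 1 } ? 0 : 1;
print scalar (@errors), &quot; error(s) collected; aborted: $aborted\n&quot;;</pre>
</dd>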
100     <dd>
101 wakaba 1.1 <p>The method returns the DOM <code>Document</code> object (i.e. the second argument).</p>
102     </dd>
103     <dd>
104 wakaba 1.2 <p>Note that the <code>Whatpm::NanoDOM</code> module provides a non-conforming
105 wakaba 1.1 implementation of DOM that only implements the subset that
106 wakaba 1.2 is necessary for the purpose of <code>Whatpm::HTML</code>'s parsing and
107 wakaba 1.1 serializing.
108     With this module, creating a new HTML <code>Document</code> object
109 wakaba 1.3 from a string containing an HTML document might be coded as:</p>
110 wakaba 1.1 </dd>
111     <dd>
112     <pre>
113 wakaba 1.2 use Whatpm::HTML;
114     use Whatpm::NanoDOM;
115     my $doc = Whatpm::HTML-&gt;parse_string
116     ($s =&gt; Whatpm::NanoDOM::Document-&gt;new, $onerror);</pre>
117 wakaba 1.1 </dd>
118     <p></p>
119 wakaba 1.2 <dt><strong><a name="item_get_inner_html"><em>$s</em> = Whatpm::HTML-&gt;get_inner_html (<em>$node</em>[, <em>$onerror</em>]);</a></strong><br />
120 wakaba 1.1 </dt>
121     <dd>
122     Return the HTML serialization of a DOM node <em>$node</em>.
123     </dd>
124     <dd>
125     <p>The first argument, <em>$node</em>, MUST be a DOM <code>Document</code>,
126     <code>Node</code>, or <code>DocumentFragment</code> object.</p>
127     </dd>
128     <dd>
129     <p>The second argument, <em>$onerror</em>, MUST be a reference to the
130     error handling code. This code will be invoked if a descendant
131 wakaba 1.3 of <em>$node</em> is not an <code>Element</code>, <code>Text</code>, <code>CDATASection</code>,
132 wakaba 1.1 <code>Comment</code>, <code>DocumentType</code>, or <code>EntityReference</code> node,
133     i.e. in the case where <code>INVALID_STATE_ERR</code> MUST be thrown.
134     The code will be invoked with one argument, which is the node
135     whose type is invalid.
136     This argument is optional; if it is omitted, any such
137     node is simply ignored.</p>
138     </dd>
139     <dd>
140 wakaba 1.2 <p>The method returns a reference to the <code>inner_html</code> attribute
141 wakaba 1.3 value, i.e. the HTML serialization of the <em>$node</em>.</p>
142 wakaba 1.1 </dd>
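<dd>
<p>Since the return value is a reference, it has to be dereferenced
before use. A minimal sketch, using only the calls described in this
document, might look like the following. The input string and the
variable names are assumptions made for illustration, and the sketch
does nothing when the modules are not installed.</p>
</dd>
<dd>
<pre>
use strict;
use warnings;

# Do nothing when the modules are not available.
my $available = eval { require Whatpm::HTML; require Whatpm::NanoDOM; 1 };

my $serialized;
if ($available) {
  # Parse a sample document; the empty handler discards parse errors.
  my $doc = Whatpm::HTML-&gt;parse_string
      ('&lt;!DOCTYPE html&gt;&lt;html&gt;&lt;p&gt;Hello&lt;/p&gt;&lt;/html&gt;'
       =&gt; Whatpm::NanoDOM::Document-&gt;new, sub { });

  # Collect nodes that cannot be serialized instead of ignoring them.
  my @invalid;
  my $html_ref = Whatpm::HTML-&gt;get_inner_html ($doc, sub {
    push @invalid, shift;
  });

  $serialized = $$html_ref;    # dereference to obtain the string
  print $serialized, &quot;\n&quot;;
  warn scalar (@invalid), &quot; node(s) could not be serialized\n&quot; if @invalid;
}</pre>
</dd>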
143     <p></p></dl>
144     <p>
145     </p>
146     <hr />
147     <h1><a name="to_do">TO DO</a></h1>
148     <p>Tokenizer should emit a sequence of character tokens as one token
149     to improve performance.</p>
150     <p>A method that accepts a byte stream as an input.</p>
151     <p>Charset detection algorithm.</p>
152     <p>Setting inner_html.</p>
153     <p>And there are many ``TODO''s and ``ISSUE''s in the source code.</p>
154     <p>
155     </p>
156     <hr />
157     <h1><a name="see_also">SEE ALSO</a></h1>
158 wakaba 1.3 <p>Whatpm
159     &lt;http://suika.fam.cx/www/markup/html/whatpm/readme&gt;</p>
160 wakaba 1.1 <p>Web Applications 1.0 Working Draft (aka HTML5)
161     &lt;http://whatwg.org/html5&gt;. (Revision 792, 1 May 2007)</p>
162 wakaba 1.2 <p><a href="../Whatpm/NanoDOM.html">the Whatpm::NanoDOM manpage</a></p>
163 wakaba 1.1 <p>
164     </p>
165     <hr />
166     <h1><a name="author">AUTHOR</a></h1>
167     <p>Wakaba &lt;<a href="mailto:w@suika.fam.cx">w@suika.fam.cx</a>&gt;.</p>
168     <p>
169     </p>
170     <hr />
171     <h1><a name="license">LICENSE</a></h1>
172     <p>Copyright 2007 Wakaba &lt;<a href="mailto:w@suika.fam.cx">w@suika.fam.cx</a>&gt;</p>
173     <p>This library is free software; you can redistribute it
174     and/or modify it under the same terms as Perl itself.</p>
175    
176     </body>
177    
178     </html>