Contents of /markup/html/whatpm/What/HTML.pod

Revision 1.1
Tue May 1 07:46:42 2007 UTC by wakaba
Branch: MAIN
++ whatpm/What/ChangeLog	1 May 2007 07:44:59 -0000
	* HTML.pm.src (parse_string): New method.
	(get_inner_html): Renamed from |inner_html|.

	* Makefile: A rule for |HTML.html| is added.

	* HTML.pod: New documentation.

2007-05-01  Wakaba  <wakaba@suika.fam.cx>

++ whatpm/t/ChangeLog	1 May 2007 07:46:34 -0000
	* HTML-tree.t: Use new |What::HTML->parse_string| method
	so that this test doesn't have to know the parser class's
	internals.

	* tree-test-1.dat: New tests for "(<head> stuffs) in body"
	and "(<head> stuffs) after body" are added.

2007-05-01  Wakaba  <wakaba@suika.fam.cx>

=head1 NAME

What::HTML - An HTML Parser

=head1 SYNOPSIS

  use What::HTML;
  use What::NanoDOM;

  my $s = q<<!DOCTYPE html><html>...</html>>;
  my $doc = What::NanoDOM->new; # any empty DOM |Document| object
  my $onerror = sub {
    my $error_code = shift;
    warn $error_code, "\n";
  };

  What::HTML->parse_string ($s => $doc, $onerror);

  ## Then, |$doc| is the DOM representation of |$s|.

=head1 DESCRIPTION

The C<What::HTML> module is an experimental implementation
of the HTML5 parsing specification.

=head1 METHODS

=over 4

=item [I<$doc> =] What::HTML->parse_string (I<$s>, I<$doc>[, I<$onerror>]);

Parse a string I<$s> as an HTML document.

The first argument, I<$s>, MUST be a string. It is parsed
as a sequence of characters representing an HTML document.

The second argument, I<$doc>, MUST be an empty read-write
DOM C<Document> object. The HTML DOM tree is constructed
onto this C<Document> object.

The third argument, I<$onerror>, MUST be a reference to
the error handler code. Whenever a parse error is detected,
this code is invoked with an argument that contains a
useless string that might describe what is wrong.
The code MAY throw an exception, so that the whole parsing
process aborts. Otherwise, the parser will continue to
process the input. The code MUST NOT modify I<$s> or I<$doc>.
If it does, then the result is undefined.
This argument is optional; if it is missing, the string for
any parse error is C<warn>ed.

The method returns the DOM C<Document> object (i.e. the second argument).

Note that the C<What::NanoDOM> module provides a non-conforming
implementation of DOM that only implements the subset that
is necessary for the purpose of C<What::HTML>'s parsing and
serializing.
With this module, creating a new HTML C<Document> object
from a string containing an HTML document can be coded as:

  use What::HTML;
  use What::NanoDOM;
  my $doc = What::HTML->parse_string ($s => What::NanoDOM->new, $onerror);

=item I<$s> = What::HTML->get_inner_html (I<$node>[, I<$onerror>]);

Return the HTML serialization of a DOM node I<$node>.

The first argument, I<$node>, MUST be a DOM C<Document>,
C<Node>, or C<DocumentFragment> object.

The second argument, I<$onerror>, MUST be a reference to the
error handling code. This code will be invoked if a descendant
of C<$node> is not an C<Element>, C<Text>, C<CDATASection>,
C<Comment>, C<DocumentType>, or C<EntityReference> node, i.e.
in the case where C<INVALID_STATE_ERR> MUST be thrown.
The code will be invoked with an argument, which is the node
whose type is invalid.
This argument is optional; if it is missing, any such
node is simply ignored.

The method returns the C<inner_html> attribute
value, i.e. the HTML serialization of the C<$node>.

=back

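The error handler contract described for C<parse_string> above (the
handler is called once per error; it may throw to abort, otherwise
processing continues) can be sketched roughly as follows. This is only
an illustration: C<fake_parse> and the error strings are hypothetical
stand-ins invented for this sketch and are not part of C<What::HTML>.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parser (NOT part of What::HTML): it reports
# each "error" to $onerror and then keeps going, mirroring the documented
# behaviour of the third argument of parse_string.
sub fake_parse {
  my ($errors, $onerror) = @_;
  # Documented default when no handler is given: warn the string.
  $onerror ||= sub { warn $_[0], "\n" };
  my $processed = 0;
  for my $error (@$errors) {
    $onerror->($error);   # may die, which aborts the whole loop
    $processed++;
  }
  return $processed;
}

# Handler 1: collect the error strings and let processing continue.
my @collected;
my $collect = sub { push @collected, shift };
my $count = fake_parse (['error-a', 'error-b'], $collect);

# Handler 2: throw an exception so that the whole process aborts
# at the first error.
my $abort = sub { die "parse error: $_[0]\n" };
my $result = eval { fake_parse (['error-a', 'error-b'], $abort); 1 };
my $exception = $@;
```

With the real module, the same two code references would presumably be
passed as the I<$onerror> argument; neither of them touches I<$s> or
I<$doc>, as the contract requires.
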
=head1 TODO

Tokenizer should emit a sequence of character tokens as one token
to improve performance.

A method that accepts a byte stream as an input.

Charset detection algorithm.

Setting inner_html.

And there are many "TODO"s and "ISSUE"s in the source code.

=head1 SEE ALSO

Web Applications 1.0 Working Draft (aka HTML5)
<http://whatwg.org/html5>. (Revision 792, 1 May 2007)

L<What::NanoDOM>

=head1 AUTHOR

Wakaba <w@suika.fam.cx>.

=head1 LICENSE

Copyright 2007 Wakaba <w@suika.fam.cx>

This library is free software; you can redistribute it
and/or modify it under the same terms as Perl itself.

=cut

# $Date: 2007/04/28 14:31:34 $
