Contents of /markup/html/whatpm/Whatpm/HTML.pod

Revision 1.6
Sun Nov 4 04:34:30 2007 UTC by wakaba
Branch: MAIN
Changes since 1.5: +27 -12 lines
++ whatpm/t/ChangeLog	4 Nov 2007 04:31:01 -0000
	* content-model-2.dat: Tests for |manifest=""| are added.

2007-11-04  Wakaba  <wakaba@suika.fam.cx>

++ whatpm/Whatpm/ContentChecker/ChangeLog	4 Nov 2007 04:32:03 -0000
	* HTML.pm: Allow <html manifest=URI>.

2007-11-04  Wakaba  <wakaba@suika.fam.cx>

=head1 NAME

Whatpm::HTML - An HTML Parser and Serializer

=head1 SYNOPSIS

  use Whatpm::HTML;

  my $s = q<<!DOCTYPE html><html>...</html>>;
  # $doc = an empty DOM |Document| object
  my $onerror = sub {
    my %error = @_;
    warn $error{type}, "\n";
  };

  Whatpm::HTML->parse_string ($s => $doc, $onerror);

  ## Now, |$doc| is the DOM representation of |$s|.

=head1 DESCRIPTION

The C<Whatpm::HTML> module contains an HTML parser and serializer.

The HTML parser can be used to construct the DOM tree representation
of an HTML document. The parsing and tree construction are done
as described in the Web Applications 1.0 specification.

The HTML serializer can be used to obtain the HTML document representation
of a DOM tree (or a tree fragment thereof). The serialization
is performed as described in the Web Applications 1.0 specification
for the C<innerHTML> DOM attribute.

This module is part of Whatpm - Perl Modules for
Web Hypertext Application Technologies.

=head1 METHODS

=over 4

=item [I<$doc> =] Whatpm::HTML->parse_string (I<$s>, I<$doc>[, I<$onerror>]);

Parse a string I<$s> as an HTML document.

The first argument, I<$s>, MUST be a string. It is parsed
as a sequence of characters representing an HTML document.

The second argument, I<$doc>, MUST be an empty read-write
DOM C<Document> object. The HTML DOM tree is constructed
onto this C<Document> object.

The third argument, I<$onerror>, MUST be a reference to
the error handler code. Whenever a parse error is detected,
this code is invoked with arguments that describe the error;
as the L</SYNOPSIS> shows, they can be read as a hash whose
C<type> value names the error type.
The code MAY throw an exception, so that the whole parsing
process aborts. Otherwise, the parser will continue to
process the input. The code MUST NOT modify I<$s> or I<$doc>.
If it does, then the result is undefined.
This argument is optional; if it is omitted, each
parse error is reported by C<warn>.

B<NOTE>: To be a conforming user agent, the code MUST either
abort the processing by throwing an exception at the first
invocation or MUST continue the processing until the parser
stops.

The method returns the DOM C<Document> object (i.e. the second argument).

Note that the C<Whatpm::NanoDOM> module provides a non-conforming
implementation of DOM that only implements the subset that
is necessary for the purpose of C<Whatpm::HTML>'s parsing and
serializing.
With this module, creating a new HTML C<Document> object
from a string containing an HTML document might be coded as:

  use Whatpm::HTML;
  use Whatpm::NanoDOM;
  my $doc = Whatpm::HTML->parse_string
      ($s => Whatpm::NanoDOM::Document->new, $onerror);

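As an illustration of the two behaviours allowed for the error handler,
the following is a minimal sketch of two possible I<$onerror> handlers:
one records each error and lets the parser continue, the other aborts
by throwing an exception. It assumes only that the handler receives
key-value pairs with a C<type> key, as in the L</SYNOPSIS>. The names
C<$collect> and C<$abort> and the error type C<example-error> are made up
for this sketch, and the handlers are called by hand here, not by the parser.

```perl
use strict;
use warnings;

my @errors;

# Handler 1: record every error type and let parsing continue.
my $collect = sub {
  my %error = @_;
  push @errors, $error{type};
};

# Handler 2: abort on the first error by throwing an exception.
my $abort = sub {
  my %error = @_;
  die "parse error: $error{type}\n";
};

# Stand-in for the parser invoking the handler.  'example-error' is a
# hypothetical type name, not one taken from the Whatpm error list.
$collect->(type => 'example-error');
my $aborted = !eval { $abort->(type => 'example-error'); 1 };

print scalar @errors, " error(s) recorded\n";
print "second handler aborted\n" if $aborted;
```

Either code reference could then be passed as the third argument
of C<parse_string>.
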
=item I<$s> = Whatpm::HTML->get_inner_html (I<$node>[, I<$onerror>]);

Return the HTML serialization of a DOM node I<$node>.

The first argument, I<$node>, MUST be a DOM C<Document>,
C<Element>, or C<DocumentFragment> node.

The second argument, I<$onerror>, MUST be a reference to the
error handling code. This code will be invoked if a descendant
of I<$node> is none of C<Element>, C<Text>, C<CDATASection>,
C<Comment>, C<DocumentType>, or C<EntityReference>, that is,
where an C<INVALID_STATE_ERR> exception MUST be thrown.
The code will be invoked with an argument, which is the node
whose type is invalid.
The argument I<$onerror> is optional; if it is omitted, any erroneous
node is simply ignored.

The method returns a reference to the C<inner_html> attribute
value, i.e. the HTML serialization of the I<$node>.

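Because the return value is a reference, it has to be dereferenced
before it is used as a string. The following sketch shows this together
with an I<$onerror> handler that collects the invalid nodes. The literal
string stands in for a real return value so that the snippet runs without
a DOM tree, and C<@invalid_nodes> is a name chosen for this sketch.

```perl
use strict;
use warnings;

# Collect the nodes reported as invalid instead of ignoring them.
my @invalid_nodes;
my $onerror = sub {
  my $node = shift;
  push @invalid_nodes, $node;
};

# With a real node this would be:
#   my $ref = Whatpm::HTML->get_inner_html ($node, $onerror);
# Here a literal stands in for the returned scalar reference.
my $ref = \ '<p>Hello</p>';

my $html = $$ref;   # dereference to get the serialization itself
print $html, "\n";
```
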
=back

=head1 LOW-LEVEL INTERFACE

@@ TBW

=head2 Application Cache Selection Algorithm Hook

Once a parser I<$p> is instantiated by the method C<new>,
a C<CODE> reference can be set to C<< I<$p>->{application_cache_selection} >>.
That C<CODE> will be called back when the application cache selection
algorithm MUST be run per HTML5. By default,
C<< I<$p>->{application_cache_selection} >> is set to an empty subroutine.

The subroutine will be invoked with an argument I<manifest_uri>,
which is set to the manifest URI when the algorithm MUST be invoked
with a manifest URI, or is set to C<undef> when the algorithm MUST
be invoked without a manifest URI.

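A hook of this kind might look like the following sketch. How C<new>
is called is not documented here, so a plain hash reference stands in
for I<$p>; the array C<@selected>, the placeholder string
C<(no manifest)>, and the example URI are likewise assumptions made
for illustration, and the hook is invoked by hand to simulate the parser.

```perl
use strict;
use warnings;

# Stand-in for a parser object, with the default empty hook.
my $p = {application_cache_selection => sub { }};

# Replace the default hook with one that records what it was given.
my @selected;
$p->{application_cache_selection} = sub {
  my $manifest_uri = shift;
  push @selected, defined $manifest_uri ? $manifest_uri : '(no manifest)';
};

# Simulate the two ways the parser would call the hook:
# with a manifest URI, and with undef when there is none.
$p->{application_cache_selection}->('http://example.com/app.manifest');
$p->{application_cache_selection}->(undef);

print "$_\n" for @selected;
```
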
=head1 ERROR REPORTS

@@ TBW

The list of the error types is available in
Whatpm Error Types <http://suika.fam.cx/gate/2005/sw/Whatpm%20Error%20Types>.

=head1 TO DO

Tokenizer should emit a sequence of character tokens as one token
to improve performance.

A method that accepts a byte stream as an input.

Charset detection algorithm.

Documentation for the setter of inner_html.

And there are many "TODO"s and "ISSUE"s in the source code.

=head1 SEE ALSO

Whatpm <http://suika.fam.cx/www/markup/html/whatpm/readme>.

Whatpm Error Types
<http://suika.fam.cx/gate/2005/sw/Whatpm%20Error%20Types>.

HTML5 <http://whatwg.org/html5>.

L<Whatpm::NanoDOM>.

L<Whatpm::ContentChecker::HTML>.

=head1 AUTHOR

Wakaba <w@suika.fam.cx>.

=head1 LICENSE

Copyright 2007 Wakaba <w@suika.fam.cx>

This library is free software; you can redistribute it
and/or modify it under the same terms as Perl itself.

=cut

# $Date: 2007/11/04 04:15:07 $
