Contents of /markup/html/whatpm/Whatpm/HTML.pod

Revision 1.7
Sun Nov 11 04:59:36 2007 UTC by wakaba
Branch: MAIN
Changes since 1.6: +3 -21 lines
++ ChangeLog	11 Nov 2007 04:59:27 -0000
2007-11-11  Wakaba  <wakaba@suika.fam.cx>

	* readme.en.html: Link to |Whatpm::HTML::Serializer|.

++ whatpm/Whatpm/ChangeLog	11 Nov 2007 04:59:14 -0000
2007-11-11  Wakaba  <wakaba@suika.fam.cx>

	* HTML.pod (get_inner_html): Removed.

	* Makefile (HTML-all, HTML-clean): New.

2007-11-11  Wakaba  <wakaba@suika.fam.cx>

	* HTML.pm.src (get_inner_html): Removed (moved to HTML/Serializer.pm).

++ whatpm/Whatpm/HTML/ChangeLog	11 Nov 2007 04:58:48 -0000
2007-11-11  Wakaba  <wakaba@suika.fam.cx>

	* Serializer.pod: New file.

	* Makefile: New file.

2007-11-11  Wakaba  <wakaba@suika.fam.cx>

	* Serializer.pm: New module (split from ../HTML.pm.src).

2007-11-11  Wakaba  <wakaba@suika.fam.cx>

	* ChangeLog: New file.

=head1 NAME

Whatpm::HTML - An HTML Parser and Serializer

=head1 SYNOPSIS

  use Whatpm::HTML;
  use Whatpm::NanoDOM;

  my $s = q<<!DOCTYPE html><html>...</html>>;
  my $doc = Whatpm::NanoDOM::Document->new; # an empty DOM |Document| object
  my $onerror = sub {
    my %error = @_;
    warn $error{type}, "\n";
  };

  Whatpm::HTML->parse_string ($s => $doc, $onerror);

  ## Now, |$doc| is the DOM representation of |$s|.

=head1 DESCRIPTION

The C<Whatpm::HTML> module contains an HTML parser and serializer.

The HTML parser can be used to construct the DOM tree representation
of an HTML document.  The parsing and tree construction are done
as described in the Web Applications 1.0 specification.

The HTML serializer can be used to obtain the HTML document representation
of a DOM tree (or a fragment thereof).  The serialization
is performed as described in the Web Applications 1.0 specification
for the C<innerHTML> DOM attribute.  The serializer itself is
implemented in the L<Whatpm::HTML::Serializer> module.

This module is part of Whatpm - Perl Modules for
Web Hypertext Application Technologies.

=head1 METHODS

=over 4

=item [I<$doc> =] Whatpm::HTML->parse_string (I<$s>, I<$doc>[, I<$onerror>]);

Parse a string I<$s> as an HTML document.

The first argument, I<$s>, MUST be a string.  It is parsed
as a sequence of characters representing an HTML document.

The second argument, I<$doc>, MUST be an empty read-write
DOM C<Document> object.  The HTML DOM tree is constructed
onto this C<Document> object.

The third argument, I<$onerror>, is optional.  If specified, it MUST
be a C<CODE> reference to the error handler.  Whenever a parse error
is detected, this code is invoked with arguments describing the error;
as shown in the L</SYNOPSIS>, they can be read as a hash whose C<type>
value identifies the kind of error (see L</ERROR REPORTS>).
The code MAY throw an exception, so that the whole parsing
process aborts.  Otherwise, the parser will continue to
process the input.  The code MUST NOT modify I<$s> or I<$doc>;
if it does, the result is undefined.
If the argument is missing, each parse error is reported by C<warn>.

B<NOTE>: To be a conforming user agent, the code MUST either
abort the processing by throwing an exception at the first
invocation or continue the processing until the parser
stops.

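The two conforming strategies in the note above can be sketched as follows.  This is a minimal illustration, not part of C<Whatpm::HTML>: the helper names C<abort_on_first_error> and C<make_collecting_handler> are hypothetical, and the sketch assumes, as in the SYNOPSIS, that the handler receives key/value pairs including C<type>.

```perl
use strict;
use warnings;

# Hypothetical helpers (not part of Whatpm::HTML) illustrating the two
# conforming strategies for an $onerror handler.  As in the SYNOPSIS,
# the handler is assumed to receive key/value pairs including "type".

# Strategy 1: abort the whole parse at the first error by dying.
sub abort_on_first_error {
  my %error = @_;
  die 'Parse error: '
      . (defined $error{type} ? $error{type} : 'unknown') . "\n";
}

# Strategy 2: record every error and let the parser run to the end.
# Returns the handler and a reference to the array it fills in.
sub make_collecting_handler {
  my @errors;
  my $handler = sub {
    my %error = @_;
    push @errors, +{ %error };  # store a copy of each report
    return;                     # returning normally lets parsing continue
  };
  return ($handler, \@errors);
}
```

For example, C<< my ($onerror, $errors) = make_collecting_handler (); >> followed by C<< Whatpm::HTML->parse_string ($s => $doc, $onerror); >> would leave every report in C<@$errors>.  Passing C<\&abort_on_first_error> instead makes C<parse_string> die at the first error, so that call would typically be wrapped in C<eval>.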
The method returns the DOM C<Document> object (i.e. the second argument).

Note that the C<Whatpm::NanoDOM> module provides a non-conforming
implementation of DOM that implements only the subset
necessary for C<Whatpm::HTML>'s parsing and serializing.
With this module, creating a new HTML C<Document> object
from a string containing an HTML document might be coded as:

  use Whatpm::HTML;
  use Whatpm::NanoDOM;
  my $doc = Whatpm::HTML->parse_string
      ($s => Whatpm::NanoDOM::Document->new, $onerror);

=back

=head1 LOW-LEVEL INTERFACE

@@ TBW

=head2 Application Cache Selection Algorithm Hook

Once a parser I<$p> is instantiated by the method C<new>,
a C<CODE> reference can be set to C<< I<$p>->{application_cache_selection} >>.
That C<CODE> will be called back whenever the application cache selection
algorithm MUST be run per HTML5.  By default,
C<< I<$p>->{application_cache_selection} >> is set to an empty subroutine.

The subroutine is invoked with one argument, I<manifest_uri>,
which is set to the manifest URI when the algorithm MUST be invoked
with a manifest URI, or to C<undef> when the algorithm MUST
be invoked without a manifest URI.

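A hook of this kind might look like the following minimal sketch.  It assumes, as described above, that the subroutine receives the manifest URI or C<undef> as its only argument.  The C<@selection_log> array and the logging behavior are hypothetical; a real application would run its own application cache selection at this point.

```perl
use strict;
use warnings;

# Hypothetical log of hook invocations, one entry per call.
my @selection_log;

# Sketch of an application cache selection hook.  It is assumed to be
# called with the manifest URI, or with undef when the algorithm is run
# without a manifest URI.  Here it only records which case occurred.
my $application_cache_selection = sub {
  my ($manifest_uri) = @_;
  if (defined $manifest_uri) {
    push @selection_log, "with manifest <$manifest_uri>";
  } else {
    push @selection_log, 'without manifest';
  }
  return;
};
```

Given a parser I<$p> created by C<new>, the hook would be installed with C<< $p->{application_cache_selection} = $application_cache_selection; >> before parsing.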
=head1 ERROR REPORTS

@@ TBW

The list of the error types is available in
Whatpm Error Types <http://suika.fam.cx/gate/2005/sw/Whatpm%20Error%20Types>.

=head1 TO DO

Tokenizer should emit a sequence of character tokens as one token
to improve performance.

A method that accepts a byte stream as an input.

Charset detection algorithm.

Documentation for the setter of inner_html.

And there are many "TODO"s and "ISSUE"s in the source code.

=head1 SEE ALSO

Whatpm <http://suika.fam.cx/www/markup/html/whatpm/readme>.

Whatpm Error Types
<http://suika.fam.cx/gate/2005/sw/Whatpm%20Error%20Types>.

HTML5 <http://whatwg.org/html5>.

L<Whatpm::HTML::Serializer>.

L<Whatpm::NanoDOM>.

L<Whatpm::ContentChecker::HTML>.

=head1 AUTHOR

Wakaba <w@suika.fam.cx>.

=head1 LICENSE

Copyright 2007 Wakaba <w@suika.fam.cx>

This library is free software; you can redistribute it
and/or modify it under the same terms as Perl itself.

=cut

# $Date: 2007/11/04 04:34:30 $
