
Contents of /markup/html/whatpm/Whatpm/HTML.pod

Revision 1.1
Tue May 1 10:36:06 2007 UTC by wakaba
Branch: MAIN
Renamed

=head1 NAME

What::HTML - An HTML Parser

=head1 SYNOPSIS

  use What::HTML;

  my $s = q<<!DOCTYPE html><html>...</html>>;
  # $doc = an empty DOM |Document| object
  my $on_error = sub {
    my $error_code = shift;
    warn $error_code, "\n";
  };

  What::HTML->parse_string ($s => $doc, $on_error);

  ## Then, |$doc| is the DOM representation of |$s|.

=head1 DESCRIPTION

The C<What::HTML> module contains an HTML parser and serializer.

The HTML parser can be used to construct the DOM tree representation
of an HTML document.  The parsing and tree construction are done
as described in the Web Applications 1.0 specification.

The HTML serializer can be used to obtain the HTML document representation
of a DOM tree (or a tree fragment thereof).  The serialization
is performed as described in the Web Applications 1.0 specification
for the C<innerHTML> DOM attribute.

This module is part of WHAT.pm - Perl Modules for
Web Hypertext Application Technologies.

=head1 METHODS

=over 4

=item [I<$doc> =] What::HTML->parse_string (I<$s>, I<$doc>[, I<$onerror>]);

Parse a string I<$s> as an HTML document.

The first argument, I<$s>, MUST be a string.  It is parsed
as a sequence of characters representing an HTML document.

The second argument, I<$doc>, MUST be an empty read-write
DOM C<Document> object.  The HTML DOM tree is constructed
onto this C<Document> object.

The third argument, I<$onerror>, if specified, MUST be a reference to
the error handler code.  Whenever a parse error is detected,
this code is invoked with one argument: a string that might
describe what is wrong, though it is not guaranteed to be helpful.
The code MAY throw an exception, so that the whole parsing
process aborts.  Otherwise, the parser continues to
process the input.  The code MUST NOT modify I<$s> or I<$doc>;
if it does, the result is undefined.
This argument is optional; if it is missing, the string for
each parse error is C<warn>ed.

The method returns the DOM C<Document> object (i.e. the second argument).

Note that the C<What::NanoDOM> module provides a non-conforming
implementation of DOM that only implements the subset that
is necessary for the purpose of C<What::HTML>'s parsing and
serializing.
With that module, creating a new HTML C<Document> object
from a string containing an HTML document can be coded as:

  use What::HTML;
  use What::NanoDOM;
  my $doc = What::HTML->parse_string
      ($s => What::NanoDOM::Document->new, $onerror);

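The error handler contract described above can be sketched without
the module itself.  In the following minimal sketch, C<fake_parse>
is a hypothetical stand-in for C<parse_string> (it is not part of
C<What::HTML>), and the error strings C<error-a> and C<error-b> are
made-up placeholders.  It only illustrates the two documented
behaviors: a handler that returns lets processing continue, and
a handler that throws aborts the whole process.

```perl
use strict;
use warnings;

# Hypothetical stand-in for What::HTML->parse_string, used only so that
# this sketch runs without the module.  It reports each string in
# @$codes to the handler, in order, and then returns $doc.
sub fake_parse {
  my ($codes, $doc, $onerror) = @_;
  # Documented default: with no handler, the string is warned.
  $onerror ||= sub { warn $_[0], "\n" };
  $onerror->($_) for @$codes;
  return $doc;
}

# Handler 1: collect the error strings and let processing continue.
my @errors;
my $collect = sub { push @errors, shift };
fake_parse (['error-a', 'error-b'], {}, $collect);
print scalar @errors, " error(s) collected\n";

# Handler 2: throw on the first error so that the whole process aborts.
my $strict = sub { die "parse error: $_[0]\n" };
my $doc = eval { fake_parse (['error-a', 'error-b'], {}, $strict) };
print "aborted: $@" unless defined $doc;
```

With the real module, the same handlers would presumably be passed
as the third argument of C<parse_string>, with the call wrapped in
C<eval> whenever the handler may throw.
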
=item I<$s> = What::HTML->get_inner_html (I<$node>[, I<$onerror>]);

Return the HTML serialization of a DOM node I<$node>.

The first argument, I<$node>, MUST be a DOM C<Document>,
C<Node>, or C<DocumentFragment> object.

The second argument, I<$onerror>, if specified, MUST be a reference
to the error handling code.  This code is invoked whenever a descendant
of C<$node> is not an C<Element>, C<Text>, C<CDATASection>,
C<Comment>, C<DocumentType>, or C<EntityReference> node, that is,
in the case where C<INVALID_STATE_ERR> MUST be thrown.
The code is invoked with one argument, the node
whose type is invalid.
This argument is optional; if it is missing, any such
node is simply ignored.

The method returns a reference to the C<inner_html> attribute
value, i.e. the HTML serialization of the C<$node>.

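The handling of invalid node types described above can be
illustrated with a toy serializer.  This is only a sketch under
stated assumptions: C<toy_inner_html> and its hash-based nodes are
hypothetical and are not the module's implementation, and the
escaping is deliberately simplified.  It mirrors three documented
points: the handler receives the invalid node, a missing handler
means the node is ignored, and the result is returned as a reference.

```perl
use strict;
use warnings;

# Toy stand-in for get_inner_html; NOT the module's implementation.
# Nodes are plain hashes: {type => 'element', name => ..., children => [...]},
# {type => 'text', data => ...}, or any other type (treated as invalid).
sub toy_inner_html {
  my ($node, $onerror) = @_;
  my $s = '';
  for my $child (@{$node->{children} || []}) {
    if ($child->{type} eq 'element') {
      $s .= "<$child->{name}>"
          . ${ toy_inner_html ($child, $onerror) }
          . "</$child->{name}>";
    } elsif ($child->{type} eq 'text') {
      # Simplified escaping of text content.
      (my $data = $child->{data}) =~ s/&/&amp;/g;
      $data =~ s/</&lt;/g;
      $data =~ s/>/&gt;/g;
      $s .= $data;
    } else {
      # Invalid node type: report it if a handler was given, else skip it.
      $onerror->($child) if $onerror;
    }
  }
  return \$s;  # a reference to the string, as the method is documented to return
}

my $tree = {type => 'element', name => 'div', children => [
  {type => 'element', name => 'p', children => [
    {type => 'text', data => 'a < b'},
  ]},
  {type => 'processing-instruction'},
]};

my @invalid;
my $html = toy_inner_html ($tree, sub { push @invalid, shift });
print $$html, "\n";
print scalar @invalid, " invalid node(s) reported\n";
```

Because the return value is a reference, the caller dereferences it
(C<$$html>) to obtain the serialized string.
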
=back

=head1 TO DO

Tokenizer should emit a sequence of character tokens as one token
to improve performance.

A method that accepts a byte stream as an input.

Charset detection algorithm.

Setting inner_html.

And there are many "TODO"s and "ISSUE"s in the source code.

=head1 SEE ALSO

Web Applications 1.0 Working Draft (aka HTML5)
<http://whatwg.org/html5>.  (Revision 792, 1 May 2007)

L<What::NanoDOM>

=head1 AUTHOR

Wakaba <w@suika.fam.cx>.

=head1 LICENSE

Copyright 2007 Wakaba <w@suika.fam.cx>

This library is free software; you can redistribute it
and/or modify it under the same terms as Perl itself.

=cut

# $Date: 2007/05/01 08:17:44 $
