Contents of /markup/html/whatpm/What/HTML.pod

Revision 1.3
Tue May 1 10:37:35 2007 UTC (17 years, 7 months ago) by wakaba
Branch: MAIN
CVS Tags: HEAD
Changes since 1.2: +1 -1 lines
FILE REMOVED
Renamed

1 wakaba 1.1 =head1 NAME
2    
3     What::HTML - An HTML Parser
4    
5     =head1 SYNOPSIS
6    
7     use What::HTML;
8    
9     my $s = q<<!DOCTYPE html><html>...</html>>;
10     # $doc = an empty DOM |Document| object
11     my $onerror = sub {
12     my $error_code = shift;
13     warn $error_code, "\n";
14     };
15    
16     What::HTML->parse_string ($s => $doc, $onerror);
17    
18     ## Then, |$doc| is the DOM representation of |$s|.
19    
20     =head1 DESCRIPTION
21    
22 wakaba 1.2 The C<What::HTML> module contains an HTML parser and serializer.
23    
24     The HTML parser can be used to construct the DOM tree representation
25     from an HTML document. The parsing and tree construction are done
26     as described in the Web Applications 1.0 specification.
27    
28     The HTML serializer can be used to obtain the HTML document representation
29     of a DOM tree (or a tree fragment thereof). The serialization
30     is performed as described in the Web Applications 1.0 specification
31     for the C<innerHTML> DOM attribute.
32    
33     This module is part of WHAT.pm - Perl Modules for
34     Web Hypertext Application Technologies.
35 wakaba 1.1
36     =head1 METHODS
37    
38     =over 4
39    
40     =item [I<$doc> =] What::HTML->parse_string (I<$s>, I<$doc>[, I<$onerror>]);
41    
42     Parse a string I<$s> as an HTML document.
43    
44     The first argument, I<$s>, MUST be a string. It is parsed
45     as a sequence of characters representing an HTML document.
46    
47     The second argument, I<$doc>, MUST be an empty read-write
48     DOM C<Document> object. The HTML DOM tree is constructed
49     onto this C<Document> object.
50    
51     The third argument, I<$onerror>, MUST be a reference to
52     the error handler code. Whenever a parse error is detected,
53     this code is invoked with one argument, a string that
54     might describe what is wrong.
55     The code MAY throw an exception, so that the whole parsing
56     process aborts. Otherwise, the parser will continue to
57     process the input. The code MUST NOT modify I<$s> or I<$doc>.
58     If it does, then the result is undefined.
59     This argument is optional; if it is omitted, the string for
60     each parse error is C<warn>ed.
61    
62     The method returns the DOM C<Document> object (i.e. the second argument).
63    
64     Note that the C<What::NanoDOM> module provides a non-conforming
65     implementation of DOM that only implements the subset that
66     is necessary for the purpose of C<What::HTML>'s parsing and
67     serializing.
68     With this module, creating a new HTML C<Document> object
69     from a string containing an HTML document can be coded as:
70    
71     use What::HTML;
72     use What::NanoDOM;
73     my $doc = What::HTML->parse_string ($s => What::NanoDOM->new, $onerror);
74    
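The error handler behaviour described above can be sketched as follows. This is only an illustration, not part of the module: the handler names C<$collect> and C<$abort> and the sample input are hypothetical, and the calls to C<parse_string> and C<< What::NanoDOM->new >> simply follow the usage shown in this document. The parsing part runs only if both modules can be loaded.

```perl
use strict;
use warnings;

# Collecting handler: records every error string and lets parsing continue.
my @errors;
my $collect = sub {
  my $error_code = shift;
  push @errors, $error_code;
};

# Aborting handler: throws an exception on the first parse error,
# which stops the whole parsing process.
my $abort = sub {
  my $error_code = shift;
  die "parse error: $error_code\n";
};

# Runs only if the modules are installed; the input string is a
# hypothetical example.
if (eval { require What::HTML; require What::NanoDOM; 1 }) {
  my $s = q<<!DOCTYPE html><html><p>text</html>>;

  # Parsing continues after each error; errors are inspected afterwards.
  my $doc = What::HTML->parse_string ($s => What::NanoDOM->new, $collect);
  print scalar (@errors), " parse error(s) reported\n";

  # The exception thrown by the handler is caught by the caller.
  my $strict_doc = eval {
    What::HTML->parse_string ($s => What::NanoDOM->new, $abort);
  };
  print "parsing aborted: $@" unless defined $strict_doc;
}
```

Neither handler touches I<$s> or I<$doc>, in keeping with the requirement above. Collecting errors in a closure keeps the handler free of side effects on the parse itself.
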
75     =item I<$s> = What::HTML->get_inner_html (I<$node>[, I<$onerror>]);
76    
77     Return the HTML serialization of a DOM node I<$node>.
78    
79     The first argument, I<$node>, MUST be a DOM C<Document>,
80     C<Node>, or C<DocumentFragment> object.
81    
82     The second argument, I<$onerror>, MUST be a reference to the
83     error handling code. This code will be invoked if a descendant
84     of I<$node> is not an C<Element>, C<Text>, C<CDATASection>,
85     C<Comment>, C<DocumentType>, or C<EntityReference> node, i.e.
86     where the specification requires C<INVALID_STATE_ERR> to be thrown.
87     The code will be invoked with one argument, which is the node
88     whose type is invalid.
89     This argument is optional; if it is omitted, any such
90     node is simply ignored.
91    
92     The method returns the C<inner_html> attribute
93     value, i.e. the HTML serialization of the C<$node>.
94    
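A minimal sketch of serialization with an error handler, under the same assumptions as the rest of this document: the calls follow the signatures given above, the variable names and input string are hypothetical, and the serializing part runs only if both modules can be loaded.

```perl
use strict;
use warnings;

# Records each node whose type the serializer does not support;
# serialization itself is left to continue.
my @invalid_nodes;
my $onerror = sub {
  my $node = shift;
  push @invalid_nodes, $node;
};

# Runs only if the modules are installed; the input string is a
# hypothetical example.
if (eval { require What::HTML; require What::NanoDOM; 1 }) {
  # Parse errors are ignored here by passing an empty handler.
  my $doc = What::HTML->parse_string
      (q<<!DOCTYPE html><html><p>text</html>> => What::NanoDOM->new, sub { });

  # Per the description above, the return value is the HTML serialization.
  my $html = What::HTML->get_inner_html ($doc, $onerror);
  print $html, "\n";
  print scalar (@invalid_nodes), " node(s) of an unsupported type\n";
}
```
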
95     =back
96    
97 wakaba 1.2 =head1 TO DO
98 wakaba 1.1
99     Tokenizer should emit a sequence of character tokens as one token
100     to improve performance.
101    
102     A method that accepts a byte stream as an input.
103    
104     Charset detection algorithm.
105    
106     Setting inner_html.
107    
108     And there are many "TODO"s and "ISSUE"s in the source code.
109    
110     =head1 SEE ALSO
111    
112     Web Applications 1.0 Working Draft (aka HTML5)
113     <http://whatwg.org/html5>. (Revision 792, 1 May 2007)
114    
115     L<What::NanoDOM>
116    
117     =head1 AUTHOR
118    
119     Wakaba <w@suika.fam.cx>.
120    
121     =head1 LICENSE
122    
123     Copyright 2007 Wakaba <w@suika.fam.cx>
124    
125     This library is free software; you can redistribute it
126     and/or modify it under the same terms as Perl itself.
127    
128     =cut
129    
130 wakaba 1.3 # $Date: 2007/05/01 08:17:44 $
