<!DOCTYPE html>
<html lang="en">
<head>
<title>Whatpm — Perl Modules for Web Hypertext Application
Technologies (beta)</title>
<link rel="stylesheet" href="http://suika.fam.cx/www/style/html/xhtml">
<link rel="license" href="#license">
<link rel="author" href="#author">
</head>
<body>
<h1>Whatpm (<em>beta</em>)</h1>

<div class="section" id="introduction">
<h2>Introduction</h2>

<p><dfn>Whatpm</dfn> is a <em>work-in-progress</em> set of
<mark>P</mark>erl <mark>m</mark>odules for <mark>W</mark>eb
<mark>h</mark>ypertext <mark>a</mark>pplication
<mark>t</mark>echnologies. It is part of the <a
href="http://suika.fam.cx/www/2006/manakai/" rel=up>manakai</a>
project.</p>

<p>Whatpm supports various Web standard technologies, including <a
href="#modules-html">HTML, XHTML</a>, <a href="#modules-xml">XML</a>,
<a href="#modules-css">CSS</a>, <a href="#modules-http">HTTP</a>, and
<a href="#modules-url">URL</a>.</p>
</div>

<div class=section id=modules>
<h2>Modules</h2>

<div class=section id=modules-html-xml>
<h3>Modules for HTML and XML</h3>

<p id=modules-html>Modules related to HTML and XHTML are as follows:
<dl>
<dt id=module-whatpm-html><a href="Whatpm/HTML.html"><code>Whatpm::HTML</code></a></dt>
<dd>An implementation of the HTML5 document and fragment
parsing algorithms. It can be used
to convert an arbitrary string into a
<abbr title="Document Object Model">DOM</abbr>. (See also
<a href="#demo-html-parser">demo</a>.)</dd>
<dt id=module-whatpm-html-serializer><a href="Whatpm/HTML/Serializer.html"><code>Whatpm::HTML::Serializer</code></a></dt>
<dd>An implementation of the HTML5 fragment serialization algorithm.
(See also <a href="#demo-html-parser">demo</a>.)</dd>
<dt><a href="Whatpm/HTMLTable.html"><code>Whatpm::HTMLTable</code></a></dt>
<dd>An implementation of the HTML5 table algorithm. It can be
used to extract a table structure from a DOM <code>table</code>
element node. (See also <a href="#demo-html-table">demo</a>.)</dd>
</dl>

<p id=modules-xml>The module for <i>tentative</i> XML support is as follows:
<dl>
<dt><a href="Whatpm/XMLSerializer.html"><code>Whatpm::XMLSerializer</code></a></dt>
<dd>A simple XML serializer.</dd>
</dl>

<p>A <i>real</i> XML parser and serializer are not available yet.

<p id=modules-cc>The module for conformance checking of a DOM tree (i.e.
an in-memory representation of an HTML or XML document) is as follows:
<dl>
<dt><a href="Whatpm/ContentChecker.html"><code>Whatpm::ContentChecker</code></a></dt>
<dd>A DOM5 HTML (in-memory representation of a document) conformance
checker with partial support for Atom 1.0. (See also
<a href="#demo-html-parser">demo</a>.)</dd>
</dl>

<p>Currently, conformance checking of HTML/XHTML and Atom documents
is supported.
</div>

<div class=section id=modules-css>
<h3>Modules for CSS</h3>

<p>Modules for CSS and related technologies are as follows:
<dl>
<dt><a href="Whatpm/CSS/Cascade.html"><code>Whatpm::CSS::Cascade</code></a>
<dd>A media-independent implementation of CSS cascading and value
computation. (See also <a href="#demo-css-parser">demo</a>.)
<dt><a href="Whatpm/CSS/Parser.html"><code>Whatpm::CSS::Parser</code></a>
<dd>A CSS parser that constructs CSSOM trees from style sheets. (See
also <a href="#demo-css-parser">demo</a>.)
<dt><a href="Whatpm/CSS/SelectorsParser.html"><code>Whatpm::CSS::SelectorsParser</code></a></dt>
<dd>A <a href="http://www.w3.org/TR/css3-selectors/#grouping">group of
selectors</a> parser. (See also <a href="#demo-css-parser">demo</a>.)</dd>
<dt><a href="Whatpm/CSS/SelectorsSerializer.html"><code>Whatpm::CSS::SelectorsSerializer</code></a></dt>
<dd>A <a href="http://www.w3.org/TR/css3-selectors/#grouping">group of
selectors</a> serializer. (See also <a href="#spec-ssft">specification</a>
and <a href="#demo-css-parser">demo</a>.)</dd>
<dt><a href="Whatpm/CSS/Tokenizer.html"><code>Whatpm::CSS::Tokenizer</code></a></dt>
<dd>A CSS tokenizer. (See also <a href="#demo-css-parser">demo</a>.)</dd>
</dl>
</div>

<div class=section id=modules-http>
<h3>Modules for HTTP</h3>

<p>Modules for HTTP and related technologies are as follows:
<dl>
<dt><a href="Whatpm/ContentType.html"><code>Whatpm::ContentType</code></a></dt>
<dd>An implementation of the HTML5 content type sniffing algorithm.</dd>
<dt><a href="Whatpm/IMTChecker.html"><code>Whatpm::IMTChecker</code></a></dt>
<dd>An Internet Media Type (<abbr>aka</abbr> MIME type) label
conformance checker.</dd>
</dl>

<p>Currently, support for parsing HTTP headers and the like is not
yet available.
</div>

<div class=section id=modules-url>
<h3>Module for URL</h3>

<p>The module for URL support is as follows:
<dl>
<dt><a href="Whatpm/URIChecker.html"><code>Whatpm::URIChecker</code></a></dt>
<dd>An IRI reference conformance checker.</dd>
</dl>

<p>Support for HTML5's realistic definition of URL is not available yet.
</div>

<div class=section id=modules-misc>
<h3>Modules for other technologies</h3>

<p>The following modules provide support for other Web-related technologies:
<dl>
<dt><a href="Whatpm/CacheManifest.html"><code>Whatpm::CacheManifest</code></a></dt>
<dd>An
<a href="http://www.whatwg.org/specs/web-apps/current-work/#manifests">HTML5
cache manifest</a> parser.</dd>
<dt id=whatpm-charset-universalchardet><a href="Whatpm/Charset/UniversalCharDet.html"><code>Whatpm::Charset::UniversalCharDet</code></a></dt>
<dd>A Perl interface to the universalchardet character encoding detection
library.</dd>
<dt><a href="Whatpm/LangTag.html"><code>Whatpm::LangTag</code></a>
<dd>A language tag parser and conformance checker, supporting both
the older RFC 3066 definition and the later RFC 4646 definition. (See also
<a href="#demo-langtag">demo</a>.)
<dt><a href="Whatpm/WebIDL.html"><code>Whatpm::WebIDL</code></a></dt>
<dd>A WebIDL fragment parser. It parses an IDL fragment, whether conforming
or not, and constructs a DOM-like object model for further processing.
A non-conforming (or broken) IDL fragment-like string is parsed using
CSS-like error-tolerant parsing rules, e.g. ignoring anything until the next
<code>;</code> character.
</dl>
</div>

<p>Note that all of these modules are <em>work in progress</em>
and have <a href="#todo">a number of unresolved problems</a>.</p>

<p>Note also that some modules have no documentation for now.</p>

<!-- Whatpm::ContentChecker::*, Whatpm::H2H, Whatpm::NanoDOM, and
Whatpm::XMLParser are intentionally omitted from the list. -->
</div>

<div class=section id=documents>
<h2>Documents</h2>

<p>For a description of the functionality provided by each module, see
the <abbr>pod</abbr> documentation of the module. HTML versions of the
<abbr>pod</abbr> documentation are linked from the <a
href="#modules">list of modules above</a>.

<p>In addition, there are documents on some specific topics:
<dl>
<dt><a href="http://suika.fam.cx/gate/2005/sw/Whatpm%20Error%20Types">List of error types</a></dt>
<!-- @@ TODO: Need to update the link - the document above is out of date -->
<dd>Description of errors reported to callback functions by Whatpm
modules.</dd>

<dt><a href="Whatpm/CSS/selectors-object">Selectors object</a></dt>
<dd>Description of the data structure for Selectors, as implemented by
<a href="Whatpm/CSS/SelectorsParser.html"><code>Whatpm::CSS::SelectorsParser</code></a>
(as output), and
<a href="Whatpm/CSS/SelectorsSerializer.html"><code>Whatpm::CSS::SelectorsSerializer</code></a>
(as input)<!--, and
<a href="http://suika.fam.cx/www/manakai-core/lib/Message/DOM/SelectorsAPI.html"><code>Message::DOM::SelectorsAPI</code></a>-->.</dd>

<dt id=doc-user-data-names><a href="http://suika.fam.cx/gate/2005/sw/manakai/Predefined%20User%20Data%20Names">List of predefined user data names</a></dt>
<dd>List of user data names defined by Whatpm modules.</dd>

<dt id=doc-handles><a href="Whatpm/Charset/handles">Handle objects</a>
<dd>Description of character or byte stream input handle interfaces.
</dl>

<p>The following specifications define Whatpm-specific formats and extensions:
<dl id=spec>
<dt id=spec-ssft><a href="http://suika.fam.cx/www/markup/selectors/ssft/ssft"><abbr title="Selectors Serialization Format for Testing">SSFT</abbr>
Specification</a></dt>
<dd>The specification for the serialization format used for
testing Selectors-related modules.</dd>

<dt id=spec-manakai-selectors><a href="http://suika.fam.cx/gate/2005/sw/manakai/Selectors%20Extensions">manakai's
Selectors Extensions</a></dt>
<dd>The specification for <code>:-manakai-<var>*</var></code>
pseudo-classes implemented by Selectors-related modules.</dd>
</dl>
</div>

<div class="section" id="demo">
<h2>Demo</h2>

<ul>
<li id=demo-html-parser-nanodom><a href="http://suika.fam.cx/gate/2007/html/parser-interface">HTML5 parser
and checker demo</a>
(<a href="http://suika.fam.cx/gate/cvs/*checkout*/webroot/gate/2007/html/parser.cgi">source</a>,
with <a href="Whatpm/NanoDOM.html">a lightweight non-conforming
DOM implementation</a>)</li>
<li id=demo-html-parser-manakai><a href="http://suika.fam.cx/gate/2007/html/parser-manakai-interface">HTML5
parser and checker demo, with manakai's DOM implementation</a>
(<a href="http://suika.fam.cx/gate/cvs/*checkout*/webroot/gate/2007/html/parser-manakai.cgi">source</a>)</li>
<li id=demo-html-table><a href="http://suika.fam.cx/gate/2007/html/table-interface">HTML5 table
structure visualization demo</a>
(<a href="http://suika.fam.cx/gate/cvs/*checkout*/webroot/gate/2007/html/table.cgi">source</a>)</li>

<li id=demo-css-parser><a href="http://suika.fam.cx/gate/2007/css/parser-interface">CSS
tokenizer, parser, and computed style computation demo</a>
(<a href="http://suika.fam.cx/gate/cvs/*checkout*/webroot/gate/2007/css/parser.cgi">source</a>)</li>

<li id=demo-langtag><a href="http://suika.fam.cx/gate/2007/langtag/langtag-demo-interface">Language
tag parsing and conformance checking demo</a>
(<a href="http://suika.fam.cx/gate/cvs/*checkout*/webroot/gate/2007/langtag/langtag-demo.cgi">source</a>)
</ul>
</div>

<div class=section id=applications>
<h2>Application</h2>

<ul>
<li><a href="http://suika.fam.cx/gate/2007/html/cc/"><abbr>WebHACC</abbr>
(Web hypertext application conformance checker)</a>
</ul>
</div>

<div class="section" id="dependency">
<h2>Dependency</h2>

<dl>
<dt id=dependency-perl>Perl 5.8 or later</dt>
<dd>Using a recent stable release of Perl 5.8 (or
later) is recommended.</dd>
<dd id=dependency-encode>Some modules require the <code>Encode</code>
modules, which are part of the standard Perl distribution.</dd>
<dt id=dependency-manakai-core>Modules from
<a href="http://suika.fam.cx/www/2006/manakai/">manakai-core</a></dt>
<dd>
<dl>
<dt id=dependency-error><a href="http://search.cpan.org/author/SHLOMIF/Error-0.17009/lib/Error.pm"><code>Error</code></a></dt>
<dd>Module <code>Whatpm::HTML</code> requires <code>Error</code>,
which is bundled in
<a href="http://suika.fam.cx/www/2006/manakai/">manakai-core</a>.</dd>
<dt><code>Message::IMT::InternetMediaType</code></dt>
<dd>Module <code>Whatpm::IMTChecker</code> depends on
<code>Message::IMT::InternetMediaType</code>, which is part of
<a href="http://suika.fam.cx/www/2006/manakai/">manakai-core</a>.</dd>
<dt><code>Message::URI::URIReference</code></dt>
<dd>Modules <code>Whatpm::URIChecker</code> and
<code>Whatpm::CacheManifest</code> depend on
<a href="http://suika.fam.cx/www/manakai-core/lib/Message/URI/URIReference.html"><code>Message::URI::URIReference</code></a>,
which is part of
<a href="http://suika.fam.cx/www/2006/manakai/">manakai-core</a>.</dd>
<dt><code>Message::Charset::Info</code></dt>
<dd>Module <code>Whatpm::ContentChecker</code> depends on
<a href="http://suika.fam.cx/www/manakai-core/lib/Message/Charset/Info.html"><code>Message::Charset::Info</code></a>,
which is part of
<a href="http://suika.fam.cx/www/2006/manakai/">manakai-core</a>.</dd>
<dt><code>Message::DOM::DOMImplementation</code>
<dd>Module <code>Whatpm::URIChecker</code> depends on
<code>Message::DOM::DOMImplementation</code>,
which is part of
<a href="http://suika.fam.cx/www/2006/manakai/">manakai-core</a>.
<dt><code>Message::DOM::DOMImplementation</code> and related modules</dt>
<dd><em>Testing</em> for module <code>Whatpm::ContentChecker</code>
depends on <code>Message::DOM::DOMImplementation</code> and related modules
in <a href="http://suika.fam.cx/www/2006/manakai/">manakai-core</a>.
They are not required in practice.</dd>
</dl>
</dd>
<dt><a href="http://suika.fam.cx/www/manakai-charlib/readme">manakai
charlib</a></dt>
<dd>Module <code>Whatpm::Charset::DecodeHandle</code> depends on
modules in <a href="http://suika.fam.cx/www/manakai-charlib/readme">manakai
charlib</a> for decoding of <em>Japanese character encodings</em>.
See the documentation for
<a href="http://suika.fam.cx/www/manakai-charlib/readme">manakai
charlib</a> for more information.</dd>
<dt><a href="http://www.python.org/">Python</a>, Perl
<a href="http://search.cpan.org/~neilw/Inline-Python-0.22/"><code>Inline::Python</code></a>
module, and <a href="http://chardet.feedparser.org/">Universal Encoding
Detector</a></dt>
<dd>For the module <code>Whatpm::Charset::UniversalCharDet</code> to be
useful, this software must be installed on the system. See the
<a href="Whatpm/Charset/UniversalCharDet.html#dependency">documentation</a>
for more information.</dd>
<dt><a href="http://search.cpan.org/~makamaka/JSON-1.14/"><code>JSON</code></a></dt>
<dd><em>Testing</em> for modules <code>Whatpm::HTML</code> and
<code>Whatpm::CSS::Tokenizer</code>
depends on <a href="http://search.cpan.org/~makamaka/JSON-1.14/"><code>JSON</code> and related modules</a>.
They are not required in practice.</dd>
</dl>
</div>

<div class="section" id="download">
<h2>Distribution</h2>

<p>The development version of Whatpm may be found in the
<a href="http://suika.fam.cx/gate/cvs/markup/html/whatpm/">CVS
repository</a>.</p>

<p><a href="http://suika.fam.cx/gate/cvs/markup/html/whatpm/whatpm.tar.gz?tarball=1">The
latest development version of Whatpm</a> is also available as a
tarball.

</div>

<div class="section" id="todo">
<h2>TO DO</h2>

<ul>
<li>Bug fixes (test results:
<a href="t/content-type-result"><code>Whatpm::ContentType</code></a>,
<a href="t/tokenizer-result">HTML tokenization</a>,
<a href="t/tree-construction-result">HTML tree construction</a>,
<a href="t/content-checker-result"><code>Whatpm::ContentChecker</code></a>).</li>
<li>Merge with the <a href="http://suika.fam.cx/www/2006/manakai/">manakai-core</a>
code tree.
<li>Charset detection.</li>
<li>Validation for <code>meta</code>.</li>
<li>Validation for media queries, IRIs (against URI schemes), language tags,
and so on.</li>
<li>Documentation is missing for some features.</li>
<li>XML parser<!-- with application cache selection algorithm hook-->.</li>
<li>In addition, each module has its own TO DO items.
(Search for <q>## TODO</q> and <q>## ISSUE</q> in each module.)</li>
</ul>
</div>

<div class=section id=acknowledgments>
<h2>Acknowledgments</h2>

<p>Thanks to the <a href="http://code.google.com/p/html5lib/">html5lib</a>
team for their
<a href="http://html5lib.googlecode.com/svn/trunk/testdata/">HTML5
parser test data</a>.</p>
</div>

<div class="section" id="author">
<h2>Author</h2>

<p><a href="http://suika.fam.cx/~wakaba/who?" rel="author">Wakaba</a>.</p>
</div>

<div class="section" id="license">
<h2>License</h2>

<p>Copyright 2007-2008 Wakaba
<code class="mail">&lt;<a href="mailto:w@suika.fam.cx"
rel="author">w@suika.fam.cx</a>&gt;</code>.</p>

<p>This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.</p>
</div>

</body>
</html>