Whatpm is a work-in-progress set of Perl modules for Web hypertext application technologies. It is part of the manakai project.
Whatpm supports various Web standard technologies, including HTML, XHTML, XML, CSS, HTTP, and URL.
An Atom feed for ChangeLog diffs is available.
Note that all of these modules are work in progress and have a number of unresolved problems.
Note also that some modules have no documentation for now.
Modules related to HTML and XHTML are as follows:
Whatpm::HTML
Whatpm::HTML::Serializer
Whatpm::HTMLTable
table element node. (See also demo.)
Modules for XML support are as follows:
Whatpm::XML::Parser
An XML parser with non-draconian error handling. It can
construct a DOM tree from XML 1.0/1.1 documents that do not
rely on external entities (including the external subset entity) and
that do not contain a general entity reference to an entity whose
replacement text contains the character & or <. It also supports
XML namespaces.
It does not stop constructing the DOM tree even if it
detects a well-formedness or namespace well-formedness error. It
recovers from errors in a manner similar to HTML5's tokenization
algorithm. It is expected that the combination of this module and a
future extension to the Whatpm::ContentChecker framework
will provide a means to detect all well-formedness and validity errors,
if desired.
(See also demo.)
Whatpm::XMLSerializer
The module for conformance checking of a DOM tree (i.e. an in-memory representation of an HTML or XML document) is as follows:
Whatpm::ContentChecker
Currently, conformance checking of HTML/XHTML and Atom documents is supported.
For these modules, a DOM implementation that supports manakai's
Perl binding of DOM is necessary to represent a
document in memory. The manakai-core
package contains such an implementation,
Message::DOM::Implementation, but it
should also be possible to use any other implementation that supports
the binding.
Modules for CSS and related technologies are as follows:
Whatpm::CSS::Cascade
Whatpm::CSS::MediaQueryParser
Whatpm::CSS::MediaQuerySerializer
Whatpm::CSS::Parser
Whatpm::CSS::SelectorsParser
Whatpm::CSS::SelectorsSerializer
Whatpm::CSS::Tokenizer
To represent the CSSOM tree built by the Whatpm::CSS::Parser
module, modules in the manakai-core
package are used. Those modules also provide the serializer for the
CSSOM tree, in the form of the standard css_text CSSOM attribute.
Modules for HTTP and related technologies are as follows:
Whatpm::ContentType
Whatpm::IMTChecker
Currently, support for parsing HTTP headers and the like is not yet available.
The module for URL support is as follows:
Whatpm::URIChecker
Support for HTML5's realistic definition of URL is not available yet.
The following modules provide support for other Web-related technologies:
Whatpm::CacheManifest
Whatpm::Charset::DecodeHandle
Whatpm::Charset::UnicodeChecker
Whatpm::Charset::UniversalCharDet
Whatpm::LangTag
Whatpm::RDFXML
Whatpm::WebIDL
;
character.
The modules listed above, which are included in
the Whatpm package, can be used by loading them directly with use or
require and then invoking their native
interfaces. For more information on those native interfaces, see the
documentation and the source code of those modules.
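Before calling a module's native interface, it can help to confirm that the module is actually loadable on the system. The following is a minimal sketch using only core Perl. The helper name try_load and the choice of modules to probe are assumptions made for illustration; the native interfaces themselves are not shown here and should be taken from each module's own documentation.

```perl
use strict;
use warnings;

# Module names taken from the lists above; which of them are installed
# depends on the local system. The selection here is only an example.
my @modules = qw(Whatpm::HTML Whatpm::CSS::Parser Whatpm::ContentChecker);

# Hypothetical helper: returns 1 if the named module can be loaded
# with require, 0 otherwise.
sub try_load {
    my ($module) = @_;
    # Convert Foo::Bar to Foo/Bar.pm, the form require expects for a string.
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (@modules) {
    printf "%-28s %s\n", $module,
        try_load($module) ? 'available' : 'not installed';
}
```

Loading with require at run time, as above, lets a script degrade gracefully when an optional module is missing, whereas a plain use statement fails at compile time.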
In addition, some of the functionality provided by those modules can be
accessed via standardized DOM interfaces implemented by modules
included in the manakai-core
package. See the documentation of the module Message::DOM::DOMImplementation
for how to access the DOM interfaces.
The table below summarizes the relationship between Whatpm modules and DOM methods/attributes implemented by manakai-core modules:
Whatpm module | DOM methods/attributes |
---|---|
Whatpm::CSS::Cascade | get_computed_style (ViewCSS), current_style (ElementCSS) |
Whatpm::CSS::Parser, Whatpm::CSS::Serializer | CSSStyleDeclaration's attributes and methods, css_text (CSSOM interfaces) |
Whatpm::CSS::SelectorsParser | query_selector, query_selector_all (DocumentSelector, ElementSelector) |
Whatpm::CSS::SelectorsParser, Whatpm::CSS::SelectorsSerializer | selector_text (CSSStyleRule) |
Whatpm::HTML, Whatpm::HTML::Serializer, Whatpm::XML::Parser, Whatpm::XMLSerializer | inner_html (HTMLDocument, Element) |
For a description of the functionality provided by each module, see the pod documentation of the module. HTML versions of the pod documentation are linked from the list of modules above.
There are also additional documents for some topics:
Whatpm::CSS::SelectorsParser (as output), and Whatpm::CSS::SelectorsSerializer (as input).
The following specifications define Whatpm-specific formats and extensions:
-manakai-* properties and property values implemented by CSS-related modules.
:-manakai-* pseudo-classes implemented by Selectors-related modules.
See also a list of applications using modules in the manakai-core package; some of them indirectly use Whatpm modules through the DOM interfaces provided by manakai-core.
Encode modules, which are part of the standard Perl distribution.
Error
Whatpm::HTML requires Error, which is bundled in manakai-core.
Message::IMT::InternetMediaType
Whatpm::IMTChecker depends on Message::IMT::InternetMediaType, which is part of manakai-core.
Message::URI::URIReference
Whatpm::URIChecker and Whatpm::CacheManifest depend on Message::URI::URIReference, which is part of manakai-core.
Message::Charset::Info
Whatpm::ContentChecker depends on Message::Charset::Info, which is part of manakai-core.
Message::DOM::DOMImplementation
Whatpm::URIChecker depends on Message::DOM::DOMImplementation, which is part of manakai-core.
Message::DOM::DOMImplementation and related modules
Whatpm::ContentChecker depends on Message::DOM::DOMImplementation and related modules in manakai-core.
They are not required for any practical use of those modules.
Whatpm::Charset::DecodeHandle depends on modules in manakai charlib for decoding of Japanese character encodings.
See the documentation for manakai charlib for more information.
Inline::Python module, and Universal Encoding Detector
For Whatpm::Charset::UniversalCharDet to be meaningful, this software is required on the system. See the
documentation for more information.
JSON
Whatpm::HTML and Whatpm::CSS::Tokenizer depend on JSON and related modules.
They are not required for any practical use of those modules.
The development version of Whatpm may be found in the CVS repository.
The latest development version of Whatpm is also available as a tarball.
Whatpm::ContentType, HTML tokenization, HTML tree construction, Whatpm::ContentChecker).
meta. ## TODO and ## ISSUE in each module.)
Thanks to the html5lib team for their HTML5 parser test data.
Copyright 2007‐2008 Wakaba <w@suika.fam.cx>.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.