Whatpm::Charset::UniversalCharDet - A Perl Interface to universalchardet Character Encoding Detection
  require Whatpm::Charset::UniversalCharDet;
  $charset_name = Whatpm::Charset::UniversalCharDet->detect_byte_string ($byte_string);
  # $charset_name: charset name (in lowercase) or undef
The Whatpm::Charset::UniversalCharDet module is a Perl interface to the universalchardet character encoding detection library.
universalchardet was originally developed by the Mozilla project and has since been ported to other platforms. The Whatpm::Charset::UniversalCharDet module provides a Perl interface to Universal Encoding Detector, a Python port of Mozilla's universalchardet code. A future version of this module might provide an interface to another port of universalchardet.
Detect the character encoding of the specified byte string.
Argument $byte_string: the byte string to examine.
Return value $charset_name: the name of the character encoding detected by universalchardet, in lowercase. If no character encoding can be detected (for example, because no universalchardet implementation is found), undef is returned.
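Because the method returns undef when nothing can be detected, callers usually want a fallback. The following is a minimal sketch, not part of the module: the helper name detect_or_default and the fallback value 'windows-1252' are hypothetical choices made for illustration. The require is wrapped in eval so that the sketch also runs where the module itself is not installed.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by the module): return the detected
# charset name, or $default when detection yields undef or the module
# cannot be loaded at all.
sub detect_or_default {
    my ($bytes, $default) = @_;
    my $name = eval {
        require Whatpm::Charset::UniversalCharDet;
        Whatpm::Charset::UniversalCharDet->detect_byte_string($bytes);
    };
    return defined $name ? $name : $default;
}

# A short byte string (these bytes are UTF-8 encoded Japanese text).
my $bytes = "\xE3\x81\x93\xE3\x82\x93\xE3\x81\xAB\xE3\x81\xA1\xE3\x81\xAF";

# 'windows-1252' is an arbitrary fallback chosen for this example.
my $charset = detect_or_default($bytes, 'windows-1252');
print "charset: $charset\n";
```

Which name is printed depends on whether the module and its Python dependencies are available and on what the detector decides for the given input.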
For the list of supported encodings, see documentation for Universal Encoding Detector <http://chardet.feedparser.org/docs/supported-encodings.html>.
Inline::Python: a Perl module that lets Python code be embedded in Perl through Inline, available from <http://search.cpan.org/dist/Inline-Python/>.
To install the module using CPAN:
  root# perl -MCPAN -eshell
  cpan> install Inline::Python
Python: available at <http://www.python.org/download/>.
Universal Encoding Detector: available at <http://chardet.feedparser.org/download/>.
Expand the archive and then execute python setup.py install in the expanded directory.
The Whatpm::Charset::UniversalCharDet module does not raise an error even when it fails to load the universalchardet library; it simply warns with the error message. This behavior can be changed by setting the flag $Whatpm::Charset::UniversalCharDet::DEBUG to a true value, which makes any such error invoke die instead of warn.
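As an illustration of the flag, the sketch below turns it on for the duration of one call and traps the resulting exception with eval. The helper name detect_strict is hypothetical, and the sketch assumes the module reads the flag at detection time rather than caching it when the module is loaded; if that assumption does not hold, set the flag before the first require instead.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by the module): run detection with the
# DEBUG flag set, and return ($charset_name, $error_message) instead of
# letting the exception propagate to the caller.
sub detect_strict {
    my ($bytes) = @_;
    no warnings 'once';  # the flag variable is only mentioned once here
    # Assumption: the module consults this flag when detection runs.
    local $Whatpm::Charset::UniversalCharDet::DEBUG = 1;
    my $name = eval {
        require Whatpm::Charset::UniversalCharDet;
        Whatpm::Charset::UniversalCharDet->detect_byte_string($bytes);
    };
    # $@ is the empty string when the eval block completed without dying.
    return ($name, $@);
}

my ($charset, $error) = detect_strict("plain ASCII text");
if (defined $charset) {
    print "detected: $charset\n";
} else {
    print "detection failed: ", ($error || "no encoding detected"), "\n";
}
```

Using local keeps the change scoped to the call, so other code in the same process continues to see the default warn behavior.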
Common error messages are as follows:
Module Inline is not installed.
Module Inline::Python is not installed. If you did install the module, find "the Inline DIRECTORY" (e.g. ./_Inline) and remove it.
The temporary directory for the Inline module is not available. See The Inline DIRECTORY in the Inline::Python manpage or <http://search.cpan.org/dist/Inline/Inline.pod#The_Inline_DIRECTORY>.
Universal Encoding Detector is not installed.
UNIVCHARDET - SuikaWiki <http://suika.fam.cx/gate/2005/sw/UNIVCHARDET>.
Universal Encoding Detector: character encoding auto-detection in Python <http://chardet.feedparser.org/>.
A composite approach to language/encoding detection <http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html>.
Web Applications 1.0 - Determining the character encoding <http://www.whatwg.org/specs/web-apps/current-work/complete.html#determining-the-character-encoding>.
Wakaba <w@suika.fam.cx>.
Copyright 2007-2010 Wakaba <w@suika.fam.cx>
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.