Whatpm::Charset::UniversalCharDet - A Perl Interface to universalchardet Character Encoding Detection
  require Whatpm::Charset::UniversalCharDet;
  $charset_name = Whatpm::Charset::UniversalCharDet
      ->detect_byte_string ($byte_string);
  # $charset_name: charset name (in lowercase) or undef
The Whatpm::Charset::UniversalCharDet module is a Perl interface to
the universalchardet character encoding detector.
universalchardet was originally developed by the Mozilla project
and later ported to other platforms. This module provides a Perl
interface to Universal Encoding Detector, a Python port of Mozilla's
universalchardet code. Future versions of this module might provide
an interface to another port of universalchardet.
The detect_byte_string method detects the character encoding of the
specified byte string.
$byte_string: The byte string to examine.
$charset_name: The name of the character encoding, as detected by
universalchardet, in lowercase.
If no character encoding can be detected (for example, because no
universalchardet implementation is found), undef is returned.
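Because the method can return undef, callers usually need a fallback. The following is a minimal sketch, not part of the module: detect_or_default is a hypothetical helper name, and the 'utf-8' default is an arbitrary choice for illustration. It also guards the require, so it still runs where the module is not installed.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by the module): return the detected
# charset name, or $default when the module cannot be loaded or the
# detection yields undef.
sub detect_or_default {
    my ($bytes, $default) = @_;
    my $name;
    if (eval { require Whatpm::Charset::UniversalCharDet; 1 }) {
        $name = Whatpm::Charset::UniversalCharDet
            ->detect_byte_string ($bytes);
    }
    return defined $name ? $name : $default;
}

# Three hiragana letters encoded as UTF-8 bytes.
my $charset = detect_or_default ("\xE3\x81\x82\xE3\x81\x84\xE3\x81\x86", 'utf-8');
print "charset: $charset\n";
```

Short inputs give the detector little evidence, so the detected name for a sample this small should not be relied upon.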
For the list of supported encodings, see the documentation for Universal Encoding Detector <http://chardet.feedparser.org/docs/supported-encodings.html>.
Inline::Python: A Perl module available at CPAN <http://search.cpan.org/~neilw/Inline-Python-0.22/>.
To install the module using CPAN.pm:
  root# perl -MCPAN -eshell
  cpan> install Inline::Python
Python: Available at <http://www.python.org/download/>.
Universal Encoding Detector: Available at <http://chardet.feedparser.org/download/>.
Expand the archive and then execute python setup.py install in the
expanded directory.
The Whatpm::Charset::UniversalCharDet module does not raise an error
even when it fails to load the universalchardet library; it simply
warns with the error message. This behavior can be changed by setting
the flag $Whatpm::Charset::UniversalCharDet::DEBUG to a true value,
which makes any error invoke die instead of warn.
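A minimal sketch of using the DEBUG flag follows. It sets the flag before the module is loaded, which is an assumption meant to cover both the case where the library is loaded when the module is required and the case where it is loaded on the first call. The eval block then catches whatever die raises.

```perl
use strict;
use warnings;

# Set the flag before loading the module so that a load failure raises
# an exception instead of a warning (assumption: the module does not
# reset the flag when it is loaded).
{
    no warnings 'once';
    $Whatpm::Charset::UniversalCharDet::DEBUG = 1;
}

my $charset = eval {
    require Whatpm::Charset::UniversalCharDet;
    Whatpm::Charset::UniversalCharDet->detect_byte_string ("plain ASCII text");
};
my $error = $@;

if (length $error) {
    # Either the module itself or one of its prerequisites is missing.
    print "detection unavailable: $error";
} else {
    print "charset: ", (defined $charset ? $charset : '(undetected)'), "\n";
}
```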
Common error messages are as follows:
Module Inline is not installed.
Module Inline::Python is not installed.
The temporary directory for the Inline module is not available. See "The Inline DIRECTORY" in the Inline::Python manpage or <http://search.cpan.org/~ingy/Inline-0.44/Inline.pod#The_Inline_DIRECTORY>.
Universal Encoding Detector is not installed.
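To narrow down which of the Perl prerequisites is missing, a quick check along the following lines may help. This is only a sketch: it covers the two Perl modules, not the Inline temporary directory or the Python-side Universal Encoding Detector, and the %status hash is an illustrative name.

```perl
use strict;
use warnings;

# Try to load each Perl prerequisite and record whether it succeeded.
my %status;
for my $module ('Inline', 'Inline::Python') {
    (my $file = "$module.pm") =~ s{::}{/}g;   # Inline::Python -> Inline/Python.pm
    $status{$module} = eval { require $file; 1 } ? 'installed' : 'missing';
}

printf "%-16s %s\n", $_, $status{$_} for sort keys %status;
```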
UNIVCHARDET - SuikaWiki <http://suika.fam.cx/gate/2005/sw/UNIVCHARDET>
Universal Encoding Detector: character encoding auto-detection in Python <http://chardet.feedparser.org/>
A composite approach to language/encoding detection <http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html>
Wakaba <w@suika.fam.cx>.
Copyright 2007-2008 Wakaba <w@suika.fam.cx>
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.