Whatpm::Charset::UniversalCharDet
require Whatpm::Charset::UniversalCharDet;
$charset_name = Whatpm::Charset::UniversalCharDet
->detect_byte_string ($byte_string);
# $charset_name: charset name (in lowercase) or undef
The Whatpm::Charset::UniversalCharDet module is a Perl interface to the universalchardet character encoding detector.
universalchardet was originally developed by the Mozilla project and later ported to other platforms. The Whatpm::Charset::UniversalCharDet module currently provides a Perl interface to Universal Encoding Detector, a Python port of Mozilla's universalchardet code. Future versions of this module might provide an interface to another port of universalchardet.
$charset = Whatpm::Charset::UniversalCharDet->detect_byte_string ($s)
Detect the character encoding of the specified byte string.
$s: The byte string to examine.
The name of the character encoding detected by universalchardet, in lowercase. If no character encoding can be detected (for example, because no implementation of universalchardet is found), undef is returned.
For the list of supported encodings, see documentation for Universal Encoding Detector <http://chardet.feedparser.org/docs/supported-encodings.html>.
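Because the method returns undef when nothing can be detected, callers usually want a fallback. The following is a minimal sketch of that pattern, not part of the module: the helper name detect_or_default and the choice of fallback value are assumptions made for illustration, and the require is wrapped in eval so the snippet also runs when the module is not installed.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by the module): returns the detected
# charset name, or $default when the module cannot be loaded or when
# detect_byte_string returns undef.
sub detect_or_default {
    my ($bytes, $default) = @_;
    my $charset = eval {
        require Whatpm::Charset::UniversalCharDet;
        Whatpm::Charset::UniversalCharDet->detect_byte_string($bytes);
    };
    return defined $charset ? $charset : $default;
}

# UTF-8 encoded bytes for three hiragana characters.
my $bytes = "\xE3\x81\x82\xE3\x81\x84\xE3\x81\x86";
print detect_or_default($bytes, 'utf-8'), "\n";
```

The value printed depends on whether the detector is installed and on what it reports for such a short input, so no particular output should be expected.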
Inline::Python, a Perl module that adds Python support to Inline code embedding, is available from <http://search.cpan.org/dist/Inline-Python/>.
To install the module using CPAN:
root# perl -MCPAN -eshell
cpan> install Inline::Python
Python is available at <http://www.python.org/download/>.
Universal Encoding Detector is available at <http://chardet.feedparser.org/download/>.
Expand the archive and then execute python setup.py install in the expanded directory.
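After installing the three prerequisites, a quick check can confirm that each one is visible. The sketch below is an illustration only: check_prerequisites is a hypothetical helper, it assumes the interpreter is on the PATH under the name python, and it relies on chardet being the Python package name of Universal Encoding Detector.

```perl
use strict;
use warnings;

# Hypothetical helper: reports which prerequisites appear to be usable.
# Returns a hash reference mapping each component name to 1 or 0.
sub check_prerequisites {
    my %ok;

    # Perl side: try to load each module without importing anything.
    $ok{'Inline'}         = eval { require Inline;         1 } ? 1 : 0;
    $ok{'Inline::Python'} = eval { require Inline::Python; 1 } ? 1 : 0;

    # Python side: ask the interpreter to import the chardet package.
    # Assumes the interpreter is reachable as "python" on the PATH.
    no warnings 'exec';   # stay quiet if "python" itself is missing
    my $status = system('python', '-c', 'import chardet');
    $ok{'chardet'} = $status == 0 ? 1 : 0;

    return \%ok;
}

my $result = check_prerequisites();
for my $name (sort keys %$result) {
    printf "%-16s %s\n", $name, $result->{$name} ? 'ok' : 'missing';
}
```

A component reported as missing here corresponds to one of the error messages listed later in this document.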
The Whatpm::Charset::UniversalCharDet module does not raise an error even when it fails to load the universalchardet library; it simply reports the error message with warn.
This behavior can be changed by setting the flag $Whatpm::Charset::UniversalCharDet::DEBUG to a true value, which makes any error invoke die instead of warn.
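To turn load failures into catchable exceptions, the DEBUG flag can be combined with eval. The following is a sketch under assumptions rather than documented usage: detect_strict is a hypothetical helper, and it assumes the flag is consulted when detect_byte_string is called, not when the module is first loaded.

```perl
use strict;
use warnings;

# Hypothetical helper: runs detection with DEBUG enabled and returns
# a two-element list: (charset name or undef, error message or '').
sub detect_strict {
    my ($bytes) = @_;
    my $charset = eval {
        require Whatpm::Charset::UniversalCharDet;
        no warnings 'once';   # the flag is named only once in this file
        # Assumption: the flag is read at detection time.
        local $Whatpm::Charset::UniversalCharDet::DEBUG = 1;
        Whatpm::Charset::UniversalCharDet->detect_byte_string($bytes);
    };
    my $error = $@;
    return ($charset, $error);
}

my ($charset, $error) = detect_strict("plain ASCII text");
if (length $error) {
    print "detection failed: $error\n";
} else {
    print "detected: ", (defined $charset ? $charset : 'nothing'), "\n";
}
```

Using local restores the previous value of the flag when the eval block exits, so the stricter behavior does not leak into other callers.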
Common error messages are as follows:
Module Inline is not installed.
Module Inline::Python is not installed. If you did install the module, please find "the Inline DIRECTORY" (e.g. ./_Inline) and remove it.
The temporary directory for the Inline module is not available. See "The Inline DIRECTORY" in Inline::Python or <http://search.cpan.org/dist/Inline/Inline.pod#The_Inline_DIRECTORY>.
Universal Encoding Detector is not installed.
UNIVCHARDET - SuikaWiki <https://suika.suikawiki.org/gate/2005/sw/UNIVCHARDET>.
Universal Encoding Detector: character encoding auto-detection in Python <http://chardet.feedparser.org/>.
A composite approach to language/encoding detection <http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html>.
Web Applications 1.0 - Determining the character encoding <http://www.whatwg.org/specs/web-apps/current-work/complete.html#determining-the-character-encoding>.
Wakaba <wakaba@suikawiki.org>.
Copyright 2007-2010 Wakaba <wakaba@suikawiki.org>
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.