=head1 NAME

Whatpm::Charset::UniversalCharDet - A Perl Interface to universalchardet
Character Encoding Detection

=head1 SYNOPSIS

  require Whatpm::Charset::UniversalCharDet;
  $charset_name = Whatpm::Charset::UniversalCharDet
      ->detect_byte_string ($byte_string);
  # $charset_name: charset name (in lowercase) or undef

=head1 DESCRIPTION

The C<Whatpm::Charset::UniversalCharDet> module is a Perl interface to
the universalchardet character encoding detector.

universalchardet was originally developed by the Mozilla project
and has since been ported to other platforms. The
C<Whatpm::Charset::UniversalCharDet> module provides a Perl interface to
Universal Encoding Detector, a Python port of Mozilla's universalchardet
code. A future version of this module might provide an interface to
another port of universalchardet.

=head1 METHOD

=over 4

=item I<$charset> = Whatpm::Charset::UniversalCharDet->detect_byte_string (I<$s>)

Detect the character encoding of the specified byte string.

=over 4

=item I<$s>

The byte string to examine.

=item I<$charset>

The name of the character encoding detected by universalchardet,
in lowercase.
If no character encoding can be detected (for example, because no
universalchardet implementation is found), C<undef> is returned.

For the list of supported encodings, see the documentation for
Universal Encoding Detector
<http://chardet.feedparser.org/docs/supported-encodings.html>.

=back

=back
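
The following is a minimal sketch of how the method might be called,
not a definitive recipe. The helper name C<describe_charset> and the
sample byte string are illustrative assumptions. The C<eval> around
C<require> is there only so that the script still runs when this module
is not installed.

  use strict;
  use warnings;

  # Hypothetical helper: turn the detection result into a printable label.
  sub describe_charset {
    my ($charset) = @_;
    return defined $charset ? $charset : '(not detected)';
  }

  # Load the module at run time; tolerate its absence in this sketch.
  my $loaded = eval { require Whatpm::Charset::UniversalCharDet; 1 };

  # Sample input (an assumption): a short run of UTF-8 encoded bytes.
  my $byte_string
      = "\xE3\x81\x93\xE3\x82\x93\xE3\x81\xAB\xE3\x81\xA1\xE3\x81\xAF";

  # The method returns a lowercase charset name, or undef on failure.
  my $charset = $loaded
      ? Whatpm::Charset::UniversalCharDet->detect_byte_string ($byte_string)
      : undef;

  my $label = describe_charset ($charset);
  print "$label\n";

Because C<undef> is a normal return value, callers probably want a
fallback encoding of their own choosing for that case.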

=head1 DEPENDENCY

=over 4

=item L<Inline::Python>

A Perl module available at CPAN
<http://search.cpan.org/~neilw/Inline-Python-0.22/>.

To install the module using L<CPAN.pm>:

  root# perl -MCPAN -eshell
  cpan> install Inline::Python

=item Python

Available at <http://www.python.org/download/>.

=item Universal Encoding Detector

Available at <http://chardet.feedparser.org/download/>.

Expand the archive and then execute C<python setup.py install>
in the expanded directory.

=back
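
As a rough sketch, the dependencies listed above could be checked from
Perl before using this module. The helper names C<has_perl_module> and
C<has_chardet> are hypothetical. The sketch also assumes that the Python
interpreter is reachable as C<python> and that Universal Encoding
Detector installs a Python package named C<chardet>. When something is
missing, Perl or Python may print its own diagnostics to standard error.

  use strict;
  use warnings;

  # Hypothetical helper: true if the named Perl module can be loaded.
  sub has_perl_module {
    my ($name) = @_;
    (my $file = "$name.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
  }

  # Hypothetical helper: true if the given Python command can import
  # the chardet package (package name is an assumption).
  sub has_chardet {
    my ($python) = @_;
    my $status = system ($python, '-c', 'import chardet');
    return $status == 0 ? 1 : 0;
  }

  printf "Inline::Python: %s\n",
      has_perl_module ('Inline::Python') ? 'ok' : 'missing';
  printf "chardet: %s\n",
      has_chardet ('python') ? 'ok' : 'missing';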

=head1 TROUBLESHOOTING

The C<Whatpm::Charset::UniversalCharDet> module does not raise an
error even when it fails to load the universalchardet library;
it simply C<warn>s with the error message.

To change this behavior, set the flag
C<$Whatpm::Charset::UniversalCharDet::DEBUG> to a true value. Any error
then invokes C<die> instead of C<warn>.

Common error messages are as follows:

=over 4

=item Can't locate Inline.pm in @INC

Module L<Inline> is not installed.

=item Error. You have specified 'Python' as an Inline programming language.

Module L<Inline::Python> is not installed.

=item Couldn't find an appropriate DIRECTORY for Inline to use.

The temporary directory for the L<Inline> module is not available.
See L<Inline/"The Inline DIRECTORY"> or
<http://search.cpan.org/~ingy/Inline-0.44/Inline.pod#The_Inline_DIRECTORY>.

=item Error -- py_eval raised an exception

Universal Encoding Detector is not installed.

=back
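
One way the C<DEBUG> flag might be combined with C<eval> to capture a
load failure is sketched here. The wrapper name C<detect_strictly> is
hypothetical. This document does not state when the universalchardet
library is loaded, so the sketch assumes the flag should be set before
the module is first C<require>d; if the module has already been loaded
elsewhere, an earlier failure would already have been reported by
C<warn>.

  use strict;
  use warnings;

  # Hypothetical wrapper: returns (charset, error).
  # The error is undef when nothing died.
  sub detect_strictly {
    my ($bytes) = @_;
    no warnings 'once';
    my $charset = eval {
      # Make errors fatal for the duration of this block only.
      local $Whatpm::Charset::UniversalCharDet::DEBUG = 1;
      require Whatpm::Charset::UniversalCharDet;
      Whatpm::Charset::UniversalCharDet->detect_byte_string ($bytes);
    };
    my $error = $@;
    return ($charset, length ($error) ? $error : undef);
  }

  my ($charset, $error) = detect_strictly ("plain ASCII text");
  if (defined $error) {
    print "Detection failed: $error";
  } else {
    print defined $charset ? "$charset\n" : "(not detected)\n";
  }

The captured message can then be compared against the common error
messages listed above.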

=head1 SEE ALSO

UNIVCHARDET - SuikaWiki
<http://suika.fam.cx/gate/2005/sw/UNIVCHARDET>

Universal Encoding Detector: character encoding auto-detection in Python
<http://chardet.feedparser.org/>

A composite approach to language/encoding detection
<http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html>

=head1 AUTHOR

Wakaba <w@suika.fam.cx>.

=head1 LICENSE

Copyright 2007-2008 Wakaba <w@suika.fam.cx>

This library is free software; you can redistribute it
and/or modify it under the same terms as Perl itself.

=cut

## $Date: 2007/11/19 12:18:27 $