<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Whatpm::Charset::UniversalCharDet - A Perl Interface to universalchardet
Character Encoding Detection</title>
<link rel="stylesheet" href="http://suika.fam.cx/www/style/html/pod.css" type="text/css" />
<link rev="made" href="mailto:admin@suika.fam.cx" />
</head>

<body>

<p><a name="__index__"></a></p>
<!-- INDEX BEGIN -->

<ul>

<li><a href="#name">NAME</a></li>
<li><a href="#synopsis">SYNOPSIS</a></li>
<li><a href="#description">DESCRIPTION</a></li>
<li><a href="#method">METHOD</a></li>
<li><a href="#dependency">DEPENDENCY</a></li>
<li><a href="#troubleshooting">TROUBLESHOOTING</a></li>
<li><a href="#see_also">SEE ALSO</a></li>
<li><a href="#author">AUTHOR</a></li>
<li><a href="#license">LICENSE</a></li>
</ul>
<!-- INDEX END -->

<hr />
<p>
</p>
<h1><a name="name">NAME</a></h1>
<p>Whatpm::Charset::UniversalCharDet - A Perl Interface to universalchardet
Character Encoding Detection</p>
<p>
</p>
<hr />
<h1><a name="synopsis">SYNOPSIS</a></h1>
<pre>
require Whatpm::Charset::UniversalCharDet;
$charset_name = Whatpm::Charset::UniversalCharDet
-&gt;detect_byte_string ($byte_string);
# $charset_name: charset name (in lowercase) or undef</pre>
<p>
</p>
<hr />
<h1><a name="description">DESCRIPTION</a></h1>
<p>The <code>Whatpm::Charset::UniversalCharDet</code> module is a Perl interface to
the universalchardet character encoding detector.</p>
<p>universalchardet was originally developed by the Mozilla project
and has since been ported to other platforms. The <code>Whatpm::Charset::UniversalCharDet</code>
module provides a Perl interface to Universal Encoding Detector,
a Python port of Mozilla's universalchardet code. Future
versions of this module might provide an interface to another
port of universalchardet.</p>
<p>
</p>
<hr />
<h1><a name="method">METHOD</a></h1>
<dl>
<dt><strong><a name="item_detect_byte_string"><em>$charset</em> = Whatpm::Charset::UniversalCharDet-&gt;detect_byte_string (<em>$s</em>)</a></strong><br />
</dt>
<dd>
Detects the character encoding of the specified byte string.
</dd>
<dl>
<dt><strong><a name="item__24s"><em>$s</em></a></strong><br />
</dt>
<dd>
The byte string to examine.
</dd>
<p></p>
<dt><strong><a name="item__24charset"><em>$charset</em></a></strong><br />
</dt>
<dd>
The name of the character encoding detected by universalchardet,
in lowercase.
If no character encoding can be detected (for example, because no
universalchardet implementation is found), <code>undef</code> is returned.
</dd>
<dd>
<p>For the list of supported encodings, see the documentation for
Universal Encoding Detector
&lt;http://chardet.feedparser.org/docs/supported-encodings.html&gt;.</p>
</dd>
<p></p></dl>
</dl>
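<p>A minimal usage sketch, assuming the module may or may not be usable
on the current system. The helper name <code>detect_or_default</code> and the
fallback value <code>iso-8859-1</code> are illustrative assumptions, not part
of the module; only <code>detect_byte_string</code> comes from the interface
described above.</p>
<pre>
use strict;
use warnings;

# Hypothetical helper (not part of the module): detect the encoding of
# $bytes, falling back to $default when detection is unavailable or
# inconclusive.
sub detect_or_default {
  my ($bytes, $default) = @_;
  my $charset = eval {
    require Whatpm::Charset::UniversalCharDet;
    Whatpm::Charset::UniversalCharDet-&gt;detect_byte_string ($bytes);
  };
  # $charset is undef if the module is missing or nothing was detected.
  return defined $charset ? $charset : $default;
}

# The UTF-8 bytes of two hiragana letters, used as sample input.
my $name = detect_or_default ("\xE3\x81\x82\xE3\x81\x84", 'iso-8859-1');
print "charset: $name\n";</pre>
<p>Because the method returns <code>undef</code> rather than raising an error,
a single <code>defined</code> check covers both a missing detector and an
inconclusive result.</p>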
<p>
</p>
<hr />
<h1><a name="dependency">DEPENDENCY</a></h1>
<dl>
<dt><strong><a name="item_inline_3a_3apython"><a href="../../Inline/Python.html">the Inline::Python manpage</a></a></strong><br />
</dt>
<dd>
A Perl module available from CPAN
&lt;http://search.cpan.org/~neilw/Inline-Python-0.22/&gt;.
</dd>
<dd>
<p>To install the module using <em>CPAN.pm</em>:</p>
</dd>
<dd>
<pre>
root# perl -MCPAN -eshell
cpan&gt; install Inline::Python</pre>
</dd>
<p></p>
<dt><strong><a name="item_python">Python</a></strong><br />
</dt>
<dd>
Available at &lt;http://www.python.org/download/&gt;.
</dd>
<p></p>
<dt><strong><a name="item_universal_encoding_detector">Universal Encoding Detector</a></strong><br />
</dt>
<dd>
Available at &lt;http://chardet.feedparser.org/download/&gt;.
</dd>
<dd>
<p>Expand the archive, then run <code>python setup.py install</code>
in the expanded directory.</p>
</dd>
<p></p></dl>
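<p>The Perl-side dependencies can be checked before using the module.
The following is a sketch only: the helper name <code>module_available</code>
is an assumption made for illustration, and the check covers the Perl
modules alone, not Python or Universal Encoding Detector.</p>
<pre>
use strict;
use warnings;

# Hypothetical helper: report whether a module can be loaded,
# without dying if it cannot.
sub module_available {
  my ($module) = @_;
  # Convert "Inline::Python" to the file name "Inline/Python.pm".
  (my $file = "$module.pm") =~ s{::}{/}g;
  return eval { require $file; 1 } ? 1 : 0;
}

for my $module (qw(Inline Inline::Python)) {
  printf "%-16s %s\n", $module,
      module_available ($module) ? 'found' : 'missing';
}</pre>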
<p>
</p>
<hr />
<h1><a name="troubleshooting">TROUBLESHOOTING</a></h1>
<p>The <code>Whatpm::Charset::UniversalCharDet</code> module does not raise
an error even when it fails to load the universalchardet library;
it simply <code>warn</code>s with the error message.</p>
<p>This behavior can be changed by setting the flag
<code>$Whatpm::Charset::UniversalCharDet</code> to a true value, which makes any
error <code>die</code> rather than <code>warn</code>.</p>
<p>Common error messages are as follows:</p>
<dl>
<dt><strong><a name="item_can_27t_locate_inlinea_2epm_in__40inc">Can't locate Inline.pm in @INC</a></strong><br />
</dt>
<dd>
<em>Inline</em> is not installed.
</dd>
<p></p>
<dt><strong><a name="item_error_2e_you_have_specified__27python_27_as_an_inl">Error. You have specified 'Python' as an Inline programming language.</a></strong><br />
</dt>
<dd>
<a href="../../Inline/Python.html">the Inline::Python manpage</a> is not installed.
</dd>
<p></p>
<dt><strong><a name="item_couldn_27t_find_an_appropriate_directory_for_inlin">Couldn't find an appropriate DIRECTORY for Inline to use.</a></strong><br />
</dt>
<dd>
The temporary directory for the <em>Inline</em> module is not available.
See <a href="../../Inline/Python.html#the_inline_directory">The Inline DIRECTORY in the Inline::Python manpage</a> or
&lt;http://search.cpan.org/~ingy/Inline-0.44/Inline.pod#The_Inline_DIRECTORY&gt;.
</dd>
<p></p>
<dt><strong><a name="item_error__2d_2d_py_eval_raised_an_exception">Error -- py_eval raised an exception</a></strong><br />
</dt>
<dd>
Universal Encoding Detector is not installed.
</dd>
<p></p></dl>
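<p>Since load failures are reported through <code>warn</code>, a caller that
wants to log or inspect them can intercept the warnings with Perl's
standard <code>$SIG{__WARN__}</code> hook. This is a sketch under that
assumption; the helper name <code>with_captured_warnings</code> is
hypothetical and not part of the module.</p>
<pre>
use strict;
use warnings;

# Hypothetical helper: run $code, collecting any warnings it emits
# instead of printing them. Returns the code's scalar result and an
# array reference holding the warning messages.
sub with_captured_warnings {
  my ($code) = @_;
  my @warnings;
  local $SIG{__WARN__} = sub { push @warnings, $_[0] };
  my $result = $code-&gt;();
  return ($result, \@warnings);
}

my ($charset, $warnings) = with_captured_warnings (sub {
  my $name = eval {
    require Whatpm::Charset::UniversalCharDet;
    Whatpm::Charset::UniversalCharDet-&gt;detect_byte_string ("example bytes");
  };
  # If the Whatpm module itself is missing, require dies inside eval;
  # turn that into a warning so that it is captured as well.
  warn $@ if $@;
  return $name;
});

print "detected: ", (defined $charset ? $charset : '(none)'), "\n";
print "warning: $_" for @$warnings;</pre>
<p>The captured messages can then be compared against the common error
messages listed above to decide which dependency is missing.</p>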
<p>
</p>
<hr />
<h1><a name="see_also">SEE ALSO</a></h1>
<p>UNIVCHARDET - SuikaWiki
&lt;http://suika.fam.cx/gate/2005/sw/UNIVCHARDET&gt;</p>
<p>Universal Encoding Detector: character encoding auto-detection in Python
&lt;http://chardet.feedparser.org/&gt;</p>
<p>A composite approach to language/encoding detection
&lt;http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html&gt;</p>
<p>
</p>
<hr />
<h1><a name="author">AUTHOR</a></h1>
<p>Wakaba &lt;<a href="mailto:w@suika.fam.cx">w@suika.fam.cx</a>&gt;.</p>
<p>
</p>
<hr />
<h1><a name="license">LICENSE</a></h1>
<p>Copyright 2007 Wakaba &lt;<a href="mailto:w@suika.fam.cx">w@suika.fam.cx</a>&gt;</p>
<p>This library is free software; you can redistribute it
and/or modify it under the same terms as Perl itself.</p>

</body>

</html>