<?xml version="1.0" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Whatpm::Charset::UniversalCharDet - A Perl Interface to universalchardet
Character Encoding Detection</title>
<link rel="stylesheet" href="http://suika.fam.cx/www/style/html/pod.css" type="text/css" />
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<link rev="made" href="mailto:wakaba@suika.fam.cx" />
</head>

<body>


<!-- INDEX BEGIN -->
<div name="index">
<p><a name="__index__"></a></p>

<ul>

<li><a href="#name">NAME</a></li>
<li><a href="#synopsis">SYNOPSIS</a></li>
<li><a href="#description">DESCRIPTION</a></li>
<li><a href="#method">METHOD</a></li>
<li><a href="#dependency">DEPENDENCY</a></li>
<li><a href="#troubleshooting">TROUBLESHOOTING</a></li>
<li><a href="#see_also">SEE ALSO</a></li>
<li><a href="#author">AUTHOR</a></li>
<li><a href="#license">LICENSE</a></li>
</ul>

<hr name="index" />
</div>
<!-- INDEX END -->

<p>
</p>
<h1><a name="name">NAME</a></h1>
<p>Whatpm::Charset::UniversalCharDet - A Perl Interface to universalchardet
Character Encoding Detection</p>
<p>
</p>
<hr />
<h1><a name="synopsis">SYNOPSIS</a></h1>
<pre>
  require Whatpm::Charset::UniversalCharDet;
  $charset_name = Whatpm::Charset::UniversalCharDet
      -&gt;detect_byte_string ($byte_string);
  # $charset_name: charset name (in lowercase) or undef</pre>
<p>
</p>
<hr />
<h1><a name="description">DESCRIPTION</a></h1>
<p>The <code>Whatpm::Charset::UniversalCharDet</code> module is a Perl interface to
the universalchardet character encoding detector.</p>
<p>universalchardet was originally developed by the Mozilla project
and has since been ported to other platforms.  The <code>Whatpm::Charset::UniversalCharDet</code>
module provides a Perl interface to Universal Encoding Detector,
a Python port of Mozilla's universalchardet code.  Future
versions of this module might provide an interface to another
port of universalchardet.</p>
<p>
</p>
<hr />
<h1><a name="method">METHOD</a></h1>
<dl>
<dt><strong><a name="detect_byte_string" class="item"><em>$charset</em> = Whatpm::Charset::UniversalCharDet-&gt;detect_byte_string (<em>$s</em>)</a></strong></dt>

<dd>
<p>Detects the character encoding of the specified byte string.</p>
</dd>
<dl>
<dt><strong><a name="_s" class="item"><em>$s</em></a></strong></dt>

<dd>
<p>The byte string to examine.</p>
</dd>
<dt><strong><a name="_charset" class="item"><em>$charset</em></a></strong></dt>

<dd>
<p>The name of the character encoding detected by universalchardet,
in lowercase.
If no character encoding can be detected, for example because no
implementation of universalchardet is found, <code>undef</code> is returned.</p>
</dd>
<dd>
<p>For the list of supported encodings, see the documentation for
Universal Encoding Detector
&lt;http://chardet.feedparser.org/docs/supported-encodings.html&gt;.</p>
</dd>
</dl>
</dl>
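<p>As a minimal sketch of how the return value might be used, the following
code decodes a byte string with the detected encoding and falls back to a
default when the result is <code>undef</code> or is a name that the standard
<code>Encode</code> module does not know.  The helper
<code>decode_with_fallback</code> and the <code>utf-8</code> fallback are
assumptions made for this illustration; they are not part of this module.</p>
<pre>
  use strict;
  use warnings;
  use Encode ();

  # Hypothetical helper: decode $bytes as $charset when Encode knows that
  # name; otherwise decode it as $fallback.
  sub decode_with_fallback {
    my ($bytes, $charset, $fallback) = @_;
    my $enc = defined $charset ? Encode::find_encoding ($charset) : undef;
    $enc ||= Encode::find_encoding ($fallback)
        or die "Unknown fallback encoding: $fallback";
    return $enc-&gt;decode ($bytes);
  }

  # Load the detector inside eval so that the sketch still runs without it.
  my $have_detector = eval { require Whatpm::Charset::UniversalCharDet; 1 };

  # Three hiragana characters encoded as UTF-8.
  my $bytes = "\xE3\x81\x82\xE3\x81\x84\xE3\x81\x86";
  my $charset = $have_detector
      ? Whatpm::Charset::UniversalCharDet-&gt;detect_byte_string ($bytes)
      : undef;
  my $text = decode_with_fallback ($bytes, $charset, 'utf-8');</pre>
<p>Looking the name up with <code>Encode::find_encoding</code> first keeps
the code from failing when the detector reports an encoding for which no
<code>Encode</code> implementation is installed.</p>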
<p>
</p>
<hr />
<h1><a name="dependency">DEPENDENCY</a></h1>
<dl>
<dt><strong><a name="inline_python" class="item"><a href="../../Inline/Python.html">the Inline::Python manpage</a></a></strong></dt>

<dd>
<p>A Perl module available from CPAN
&lt;http://search.cpan.org/~neilw/Inline-Python-0.22/&gt;.</p>
</dd>
<dd>
<p>To install the module using <em>CPAN.pm</em>:</p>
</dd>
<dd>
<pre>
  root# perl -MCPAN -eshell
  cpan&gt; install Inline::Python</pre>
</dd>
<dt><strong><a name="python" class="item">Python</a></strong></dt>

<dd>
<p>Available at &lt;http://www.python.org/download/&gt;.</p>
</dd>
<dt><strong><a name="universal_encoding_detector" class="item">Universal Encoding Detector</a></strong></dt>

<dd>
<p>Available at &lt;http://chardet.feedparser.org/download/&gt;.</p>
</dd>
<dd>
<p>Unpack the archive and then run <code>python setup.py install</code>
in the unpacked directory.</p>
</dd>
</dl>
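<p>To check whether the Perl-side dependencies listed above can be loaded,
something like the following sketch could be used.  The helper
<code>missing_modules</code> is a hypothetical name introduced here; the
sketch only tests the Perl modules and does not verify that Python or
Universal Encoding Detector is installed.</p>
<pre>
  use strict;
  use warnings;

  # Hypothetical helper: return the names of the given Perl modules that
  # cannot be loaded on this system.
  sub missing_modules {
    return grep {
      my $module = $_;
      (my $file = "$module.pm") =~ s{::}{/}g;
      !eval { require $file; 1 };
    } @_;
  }

  my @missing = missing_modules ('Inline', 'Inline::Python');
  print @missing
      ? "Missing Perl modules: @missing\n"
      : "All required Perl modules can be loaded\n";</pre>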
<p>
</p>
<hr />
<h1><a name="troubleshooting">TROUBLESHOOTING</a></h1>
<p>The <code>Whatpm::Charset::UniversalCharDet</code> module does not raise
an error even when it fails to load the universalchardet library;
it simply <code>warn</code>s with the error message.</p>
<p>This behavior can be changed by setting the flag
<code>$Whatpm::Charset::UniversalCharDet::DEBUG</code> to a true value, which
makes any error invoke <code>die</code> instead of <code>warn</code>.</p>
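<p>The flag might be used as in the following sketch, which turns failures
into exceptions and catches them with <code>eval</code>.  The helper
<code>detect_strict</code> is a hypothetical name, and setting the flag before
the module is loaded rests on the assumption that the library is loaded at
that point; this page does not say when loading happens.</p>
<pre>
  use strict;
  use warnings;

  # Hypothetical helper: returns ($charset, $error), where $error is undef
  # when no exception was raised.
  sub detect_strict {
    my ($bytes) = @_;
    {
      no warnings 'once';
      # Assumption: set the flag before loading the module so that
      # load-time failures also die.
      $Whatpm::Charset::UniversalCharDet::DEBUG = 1;
    }
    my $charset = eval {
      require Whatpm::Charset::UniversalCharDet;
      Whatpm::Charset::UniversalCharDet-&gt;detect_byte_string ($bytes);
    };
    my $error = $@;
    return ($charset, $error eq '' ? undef : $error);
  }

  my ($charset, $error) = detect_strict ("Hello, world!");
  if (defined $error) {
    warn "Detection failed: $error";
  } elsif (defined $charset) {
    print "Detected: $charset\n";
  } else {
    print "No encoding detected\n";
  }</pre>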
<p>Common error messages are as follows:</p>
<dl>
<dt><strong><a name="can_t_locate_inline_pm_in_inc" class="item">Can't locate Inline.pm in @INC</a></strong></dt>

<dd>
<p>The <em>Inline</em> module is not installed.</p>
</dd>
<dt><strong><a name="error_you_have_specified_python_as_an_inline_programming_language" class="item">Error. You have specified 'Python' as an Inline programming language.</a></strong></dt>

<dd>
<p>The <a href="../../Inline/Python.html">Inline::Python</a> module is not installed.</p>
</dd>
<dt><strong><a name="couldn_t_find_an_appropriate_directory_for_inline_to_use" class="item">Couldn't find an appropriate DIRECTORY for Inline to use.</a></strong></dt>

<dd>
<p>The temporary directory for the <em>Inline</em> module is not available.
See <a href="../../Inline/Python.html#the_inline_directory">The Inline DIRECTORY in the Inline::Python manpage</a> or
&lt;http://search.cpan.org/~ingy/Inline-0.44/Inline.pod#The_Inline_DIRECTORY&gt;.</p>
</dd>
<dt><strong><a name="error_py_eval_raised_an_exception" class="item">Error -- py_eval raised an exception</a></strong></dt>

<dd>
<p>Universal Encoding Detector is not installed.</p>
</dd>
</dl>
<p>
</p>
<hr />
<h1><a name="see_also">SEE ALSO</a></h1>
<p>UNIVCHARDET - SuikaWiki
&lt;http://suika.fam.cx/gate/2005/sw/UNIVCHARDET&gt;</p>
<p>Universal Encoding Detector: character encoding auto-detection in Python
&lt;http://chardet.feedparser.org/&gt;</p>
<p>A composite approach to language/encoding detection
&lt;http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html&gt;</p>
<p>
</p>
<hr />
<h1><a name="author">AUTHOR</a></h1>
<p>Wakaba &lt;<a href="mailto:w@suika.fam.cx">w@suika.fam.cx</a>&gt;.</p>
<p>
</p>
<hr />
<h1><a name="license">LICENSE</a></h1>
<p>Copyright 2007-2008 Wakaba &lt;<a href="mailto:w@suika.fam.cx">w@suika.fam.cx</a>&gt;</p>
<p>This library is free software; you can redistribute it
and/or modify it under the same terms as Perl itself.</p>

</body>

</html>