<?xml version="1.0" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Whatpm::Charset::UniversalCharDet - A Perl Interface to universalchardet
Character Encoding Detection</title>
<link rel="stylesheet" href="http://suika.fam.cx/www/style/html/pod.css" type="text/css" />
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<link rev="made" href="mailto:wakaba@suika.fam.cx" />
</head>

<body>

<!-- INDEX BEGIN -->
<div name="index">
<p><a name="__index__"></a></p>

<ul>

	<li><a href="#name">NAME</a></li>
	<li><a href="#synopsis">SYNOPSIS</a></li>
	<li><a href="#description">DESCRIPTION</a></li>
	<li><a href="#method">METHOD</a></li>
	<li><a href="#dependency">DEPENDENCY</a></li>
	<li><a href="#troubleshooting">TROUBLESHOOTING</a></li>
	<li><a href="#see_also">SEE ALSO</a></li>
	<li><a href="#author">AUTHOR</a></li>
	<li><a href="#license">LICENSE</a></li>
</ul>

<hr name="index" />
</div>
<!-- INDEX END -->

<p>
</p>
<h1><a name="name">NAME</a></h1>
<p>Whatpm::Charset::UniversalCharDet - A Perl Interface to universalchardet
Character Encoding Detection</p>
<p>
</p>
<hr />
<h1><a name="synopsis">SYNOPSIS</a></h1>
<pre>
  require Whatpm::Charset::UniversalCharDet;
  $charset_name = Whatpm::Charset::UniversalCharDet
      -&gt;detect_byte_string ($byte_string);
  # $charset_name: charset name (in lowercase) or undef</pre>
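<p>A slightly fuller sketch of the same call, reading the bytes from a file
before detection. The command-line handling and the <code>:raw</code> I/O
layer are illustrative assumptions and not part of the module; the only
module call used is <code>detect_byte_string</code>, as shown above.</p>
<pre>
  use strict;
  use warnings;
  require Whatpm::Charset::UniversalCharDet;

  # Assumption for illustration: the file name is the first argument.
  my $path = shift @ARGV;
  die "Usage: $0 file\n" unless defined $path;

  # Read the file as raw bytes, without any decoding layer.
  open my $fh, '&lt;:raw', $path or die "$0: $path: $!\n";
  my $byte_string = do { local $/; &lt;$fh&gt; };
  close $fh;

  my $charset_name = Whatpm::Charset::UniversalCharDet
      -&gt;detect_byte_string ($byte_string);

  # The result is a lowercase charset name, or undef if nothing was detected.
  print defined $charset_name ? "$charset_name\n" : "(not detected)\n";</pre>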
<p>
</p>
<hr />
<h1><a name="description">DESCRIPTION</a></h1>
<p>The <code>Whatpm::Charset::UniversalCharDet</code> module is a Perl interface to
the universalchardet character encoding detector.</p>
<p>universalchardet was originally developed by the Mozilla project
and later ported to other platforms. The <code>Whatpm::Charset::UniversalCharDet</code>
module provides a Perl interface to Universal Encoding Detector,
a Python port of Mozilla's universalchardet code. Future
versions of this module might provide an interface to another
port of universalchardet.</p>
<p>
</p>
<hr />
<h1><a name="method">METHOD</a></h1>
<dl>
<dt><strong><a name="detect_byte_string" class="item"><em>$charset</em> = Whatpm::Charset::UniversalCharDet-&gt;detect_byte_string (<em>$s</em>)</a></strong></dt>

<dd>
<p>Detect the character encoding of the specified byte string.</p>
<dl>
<dt><strong><a name="_s" class="item"><em>$s</em></a></strong></dt>

<dd>
<p>The byte string whose character encoding is to be detected.</p>
</dd>
<dt><strong><a name="_charset" class="item"><em>$charset</em></a></strong></dt>

<dd>
<p>The name of the character encoding detected by universalchardet,
in lowercase.
If no character encoding can be detected (for example, because no
universalchardet implementation is found), <code>undef</code> is returned.</p>
</dd>
<dd>
<p>For the list of supported encodings, see the documentation for
Universal Encoding Detector
&lt;http://chardet.feedparser.org/docs/supported-encodings.html&gt;.</p>
</dd>
</dl>
</dd>
</dl>
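<p>Because the method may return <code>undef</code>, callers usually need a
fallback. The following is a minimal sketch, assuming the core
<code>Encode</code> module is used for decoding; <code>decode_guess</code> is a
hypothetical helper name and not part of this module. <code>Encode</code> may not
recognize every name the detector reports, so the sketch also falls back
when the name cannot be resolved.</p>
<pre>
  use strict;
  use warnings;
  use Encode ();
  require Whatpm::Charset::UniversalCharDet;

  # Hypothetical helper: decode $bytes using the detected encoding,
  # or $fallback if detection fails or Encode does not know the name.
  sub decode_guess {
    my ($bytes, $fallback) = @_;

    my $name = Whatpm::Charset::UniversalCharDet
        -&gt;detect_byte_string ($bytes);

    # find_encoding returns an encoding object, or undef for unknown names.
    my $enc = defined $name ? Encode::find_encoding ($name) : undef;
    $enc ||= Encode::find_encoding ($fallback)
        or die "Unknown fallback encoding: $fallback\n";

    return ($enc-&gt;decode ($bytes), $enc-&gt;name);
  }

  # Usage: $bytes is assumed to hold a byte string obtained elsewhere.
  # my ($text, $used) = decode_guess ($bytes, 'utf-8');</pre>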
<p>
</p>
<hr />
<h1><a name="dependency">DEPENDENCY</a></h1>
<dl>
<dt><strong><a name="inline_python" class="item"></a><a href="../../Inline/Python.html">Inline::Python</a></strong></dt>

<dd>
<p>A Perl module available from CPAN
&lt;http://search.cpan.org/~neilw/Inline-Python-0.22/&gt;.</p>
</dd>
<dd>
<p>To install the module using <em>CPAN.pm</em>:</p>
</dd>
<dd>
<pre>
  root# perl -MCPAN -eshell
  cpan&gt; install Inline::Python</pre>
</dd>
<dt><strong><a name="python" class="item">Python</a></strong></dt>

<dd>
<p>Available at &lt;http://www.python.org/download/&gt;.</p>
</dd>
<dt><strong><a name="universal_encoding_detector" class="item">Universal Encoding Detector</a></strong></dt>

<dd>
<p>Available at &lt;http://chardet.feedparser.org/download/&gt;.</p>
</dd>
<dd>
<p>Expand the archive and then execute <code>python setup.py install</code>
in the expanded directory.</p>
</dd>
</dl>
<p>
</p>
<hr />
<h1><a name="troubleshooting">TROUBLESHOOTING</a></h1>
<p>The <code>Whatpm::Charset::UniversalCharDet</code> module does not raise
an error even when it fails to load the universalchardet library;
it simply <code>warn</code>s with the error message.</p>
<p>This behavior can be changed by setting the flag
<code>$Whatpm::Charset::UniversalCharDet::DEBUG</code> to a true value, which makes
any error invoke <code>die</code> instead of <code>warn</code>.</p>
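<p>A minimal sketch of using the flag to turn a silent failure into a
catchable exception. Whether the flag is consulted when the module is loaded
or when the method is called is not stated here, so this sketch assumes
neither: it sets the flag before loading the module and wraps both steps in
<code>eval</code>. The sample byte string is illustrative only.</p>
<pre>
  use strict;
  use warnings;

  {
    # The variable is named only once here, so silence the "once" warning.
    no warnings 'once';
    $Whatpm::Charset::UniversalCharDet::DEBUG = 1;
  }

  my $bytes = "\xE3\x81\x82\xE3\x81\x84\xE3\x81\x86";  # sample input
  my $charset;
  my $ok = eval {
    require Whatpm::Charset::UniversalCharDet;
    $charset = Whatpm::Charset::UniversalCharDet
        -&gt;detect_byte_string ($bytes);
    1;
  };

  if (not $ok) {
    # With the flag set, load or detection errors are expected to arrive here.
    print STDERR "Detection unavailable: $@";
  } elsif (defined $charset) {
    print "Detected: $charset\n";
  } else {
    print "No encoding detected\n";
  }</pre>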
<p>Common error messages are as follows:</p>
<dl>
<dt><strong><a name="can_t_locate_inline_pm_in_inc" class="item">Can't locate Inline.pm in @INC</a></strong></dt>

<dd>
<p>The <em>Inline</em> module is not installed.</p>
</dd>
<dt><strong><a name="error_you_have_specified_python_as_an_inline_programming_language" class="item">Error. You have specified 'Python' as an Inline programming language.</a></strong></dt>

<dd>
<p>The <a href="../../Inline/Python.html">Inline::Python</a> module is not installed.</p>
</dd>
<dt><strong><a name="couldn_t_find_an_appropriate_directory_for_inline_to_use" class="item">Couldn't find an appropriate DIRECTORY for Inline to use.</a></strong></dt>

<dd>
<p>The temporary directory for the <em>Inline</em> module is not available.
See <a href="../../Inline/Python.html#the_inline_directory">The Inline DIRECTORY in the Inline::Python manpage</a> or
&lt;http://search.cpan.org/~ingy/Inline-0.44/Inline.pod#The_Inline_DIRECTORY&gt;.</p>
</dd>
<dt><strong><a name="error_py_eval_raised_an_exception" class="item">Error -- py_eval raised an exception</a></strong></dt>

<dd>
<p>Universal Encoding Detector is not installed.</p>
</dd>
</dl>
<p>
</p>
<hr />
<h1><a name="see_also">SEE ALSO</a></h1>
<p>UNIVCHARDET - SuikaWiki
&lt;http://suika.fam.cx/gate/2005/sw/UNIVCHARDET&gt;</p>
<p>Universal Encoding Detector: character encoding auto-detection in Python
&lt;http://chardet.feedparser.org/&gt;</p>
<p>A composite approach to language/encoding detection
&lt;http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html&gt;</p>
<p>
</p>
<hr />
<h1><a name="author">AUTHOR</a></h1>
<p>Wakaba &lt;<a href="mailto:w@suika.fam.cx">w@suika.fam.cx</a>&gt;.</p>
<p>
</p>
<hr />
<h1><a name="license">LICENSE</a></h1>
<p>Copyright 2007-2008 Wakaba &lt;<a href="mailto:w@suika.fam.cx">w@suika.fam.cx</a>&gt;</p>
<p>This library is free software; you can redistribute it
and/or modify it under the same terms as Perl itself.</p>

</body>

</html>