
Contents of /markup/html/whatpm/Whatpm/Charset/UniversalCharDet.html



Revision 1.1
Mon Nov 19 12:18:27 2007 UTC by wakaba
Branch: MAIN
File MIME type: text/html
++ ChangeLog	19 Nov 2007 12:16:52 -0000
2007-11-19  Wakaba  <wakaba@suika.fam.cx>

	* readme.en.html: Link to |Whatpm::Charset::UniversalCharDet|.

++ whatpm/Whatpm/ChangeLog	19 Nov 2007 12:17:47 -0000
2007-11-19  Wakaba  <wakaba@suika.fam.cx>

	* HTML.pm.src (parse_byte_string): Detect charset
	by universalchardet if charset parameter is not specified.

	* Makefile (Charset-all, Charset-clean): New rules.

++ whatpm/Whatpm/Charset/ChangeLog	19 Nov 2007 11:54:20 -0000
2007-11-19  Wakaba  <wakaba@suika.fam.cx>

	* Makefile: New file.

	* UniversalCharDet.pm, UniversalCharDet.pod: New files.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Whatpm::Charset::UniversalCharDet - A Perl Interface to universalchardet
Character Encoding Detection</title>
<link rel="stylesheet" href="http://suika.fam.cx/www/style/html/pod.css" type="text/css" />
<link rev="made" href="mailto:admin@suika.fam.cx" />
</head>

<body>

<p><a name="__index__"></a></p>
<!-- INDEX BEGIN -->

<ul>

<li><a href="#name">NAME</a></li>
<li><a href="#synopsis">SYNOPSIS</a></li>
<li><a href="#description">DESCRIPTION</a></li>
<li><a href="#method">METHOD</a></li>
<li><a href="#dependency">DEPENDENCY</a></li>
<li><a href="#troubleshooting">TROUBLESHOOTING</a></li>
<li><a href="#see_also">SEE ALSO</a></li>
<li><a href="#author">AUTHOR</a></li>
<li><a href="#license">LICENSE</a></li>
</ul>
<!-- INDEX END -->

<hr />
<p>
</p>
<h1><a name="name">NAME</a></h1>
<p>Whatpm::Charset::UniversalCharDet - A Perl Interface to universalchardet
Character Encoding Detection</p>
<p>
</p>
<hr />
<h1><a name="synopsis">SYNOPSIS</a></h1>
<pre>
  require Whatpm::Charset::UniversalCharDet;
  $charset_name = Whatpm::Charset::UniversalCharDet
      -&gt;detect_byte_string ($byte_string);
  # $charset_name: charset name (in lowercase) or undef</pre>
<p>
</p>
<hr />
<h1><a name="description">DESCRIPTION</a></h1>
<p>The <code>Whatpm::Charset::UniversalCharDet</code> module is a Perl interface to
the universalchardet character encoding detector.</p>
<p>The universalchardet library was originally developed by the Mozilla project
and later ported to other platforms. The <code>Whatpm::Charset::UniversalCharDet</code>
module provides a Perl interface to Universal Encoding Detector,
a Python port of Mozilla's universalchardet code. A future
version of this module might provide an interface to another
port of universalchardet.</p>
<p>
</p>
<hr />
<h1><a name="method">METHOD</a></h1>
<dl>
<dt><strong><a name="item_detect_byte_string"><em>$charset</em> = Whatpm::Charset::UniversalCharDet-&gt;detect_byte_string (<em>$s</em>)</a></strong><br />
</dt>
<dd>
Detect the character encoding of the specified byte string.
</dd>
<dl>
<dt><strong><a name="item__24s"><em>$s</em></a></strong><br />
</dt>
<dd>
The byte string.
</dd>
<p></p>
<dt><strong><a name="item__24charset"><em>$charset</em></a></strong><br />
</dt>
<dd>
The name of the character encoding detected by universalchardet,
in lowercase.
If no character encoding can be detected (e.g., because no implementation
of universalchardet is found), <code>undef</code> is returned.
</dd>
<dd>
<p>For the list of supported encodings, see the documentation for
Universal Encoding Detector
&lt;http://chardet.feedparser.org/docs/supported-encodings.html&gt;.</p>
</dd>
<p></p></dl>
</dl>
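<p>To make the return-value contract concrete, here is a rough Python sketch of equivalent behavior. It assumes, without confirmation from this page, that the Perl module passes the byte string to <code>chardet.detect()</code> from Universal Encoding Detector. That function returns a dictionary with an <code>encoding</code> key. The Python-side function name <code>detect_byte_string</code> is only illustrative.</p>

```python
import warnings
from typing import Optional


def detect_byte_string(s: bytes) -> Optional[str]:
    """Illustrative Python analogue of ->detect_byte_string.

    Returns the detected charset name in lowercase, or None when no
    charset can be detected (including when chardet is not installed).
    """
    try:
        import chardet  # Universal Encoding Detector's package name
    except ImportError as exc:
        # Mirror the documented default: report the problem, do not fail.
        warnings.warn(f"universalchardet is unavailable: {exc}")
        return None
    result = chardet.detect(s)         # dict with "encoding" and "confidence"
    encoding = result.get("encoding")  # None when chardet cannot decide
    return encoding.lower() if encoding else None


if __name__ == "__main__":
    print(detect_byte_string(b"plain ASCII text"))
```

<p>An empty byte string gives chardet nothing to work with, so its <code>encoding</code> is <code>None</code>. The sketch then returns <code>None</code>, the Python counterpart of <code>undef</code>.</p>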
<p>
</p>
<hr />
<h1><a name="dependency">DEPENDENCY</a></h1>
<dl>
<dt><strong><a name="item_inline_3a_3apython"><a href="../../Inline/Python.html">the Inline::Python manpage</a></a></strong><br />
</dt>
<dd>
A Perl module available from CPAN
&lt;http://search.cpan.org/~neilw/Inline-Python-0.22/&gt;.
</dd>
<dd>
<p>To install the module using <em>CPAN.pm</em>:</p>
</dd>
<dd>
<pre>
  root# perl -MCPAN -eshell
  cpan&gt; install Inline::Python</pre>
</dd>
<p></p>
<dt><strong><a name="item_python">Python</a></strong><br />
</dt>
<dd>
Available at &lt;http://www.python.org/download/&gt;.
</dd>
<p></p>
<dt><strong><a name="item_universal_encoding_detector">Universal Encoding Detector</a></strong><br />
</dt>
<dd>
Available at &lt;http://chardet.feedparser.org/download/&gt;.
</dd>
<dd>
<p>Expand the archive and then execute <code>python setup.py install</code>
in the expanded directory.</p>
</dd>
<p></p></dl>
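<p>Before using the module, you may want to confirm that the Python side of the toolchain is in place. Below is a minimal Python sketch, assuming Universal Encoding Detector is importable under its package name <code>chardet</code>. The helper name <code>check_dependencies</code> is hypothetical, and it does not check the Perl side.</p>

```python
import importlib.util
import sys


def check_dependencies() -> dict:
    """Report the running Python version and whether chardet is importable."""
    return {
        "python": sys.version.split()[0],
        # find_spec returns None when the top-level package cannot be found.
        "chardet": importlib.util.find_spec("chardet") is not None,
    }


if __name__ == "__main__":
    for name, status in check_dependencies().items():
        print(f"{name}: {status}")
```

<p>On the Perl side, <code>perl -e 'require Inline::Python'</code> exits silently if Inline::Python can be loaded and dies with an error otherwise.</p>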
<p>
</p>
<hr />
<h1><a name="troubleshooting">TROUBLESHOOTING</a></h1>
<p>The <code>Whatpm::Charset::UniversalCharDet</code> module does not raise
an error even when it fails to load the universalchardet library;
it only reports the error message with <code>warn</code>.</p>
<p>This behavior can be changed by setting the
flag <code>$Whatpm::Charset::UniversalCharDet</code> to a true value, which makes any
error <code>die</code> rather than <code>warn</code>.</p>
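<p>The warn-or-die policy described above can be sketched in Python as follows. <code>STRICT</code> stands in for the Perl flag, and <code>load_detector</code> is a hypothetical helper. Neither is part of the module's documented interface.</p>

```python
import importlib
import warnings
from types import ModuleType
from typing import Optional

# Hypothetical stand-in for the Perl flag: False reports load failures as
# warnings (the documented default); True re-raises them, like Perl's die.
STRICT = False


def load_detector(module_name: str = "chardet",
                  strict: Optional[bool] = None) -> Optional[ModuleType]:
    """Import the detector backend, warning or raising on failure."""
    if strict is None:
        strict = STRICT  # read the global at call time, not at definition
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        if strict:
            raise
        warnings.warn(f"cannot load {module_name}: {exc}")
        return None
```

<p>Passing <code>strict=True</code>, or setting <code>STRICT</code>, turns the warning-plus-<code>None</code> result into an exception. That is often preferable in tests or batch jobs, where a missing backend should not go unnoticed.</p>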
<p>Common error messages are as follows:</p>
<dl>
<dt><strong><a name="item_can_27t_locate_inlinea_2epm_in__40inc">Can't locate Inline.pm in @INC</a></strong><br />
</dt>
<dd>
<em>Inline</em> is not installed.
</dd>
<p></p>
<dt><strong><a name="item_error_2e_you_have_specified__27python_27_as_an_inl">Error. You have specified 'Python' as an Inline programming language.</a></strong><br />
</dt>
<dd>
<a href="../../Inline/Python.html">the Inline::Python manpage</a> is not installed.
</dd>
<p></p>
<dt><strong><a name="item_couldn_27t_find_an_appropriate_directory_for_inlin">Couldn't find an appropriate DIRECTORY for Inline to use.</a></strong><br />
</dt>
<dd>
The temporary directory for the <em>Inline</em> module is not available.
See <a href="../../Inline/Python.html#the_inline_directory">The Inline DIRECTORY in the Inline::Python manpage</a> or
&lt;http://search.cpan.org/~ingy/Inline-0.44/Inline.pod#The_Inline_DIRECTORY&gt;.
</dd>
<p></p>
<dt><strong><a name="item_error__2d_2d_py_eval_raised_an_exception">Error -- py_eval raised an exception</a></strong><br />
</dt>
<dd>
Universal Encoding Detector is not installed.
</dd>
<p></p></dl>
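<p>When the module's warnings are captured, for example in a log, the list above can be turned into a small lookup. The sketch below is a hypothetical Python helper. The message substrings and causes are copied from this section. Plain substring matching is an assumption about how the messages appear in the captured text.</p>

```python
# Substrings of the error messages listed above, mapped to their causes.
KNOWN_ERRORS = {
    "Can't locate Inline.pm in @INC": "Inline is not installed.",
    "You have specified 'Python' as an Inline programming language":
        "Inline::Python is not installed.",
    "Couldn't find an appropriate DIRECTORY for Inline to use":
        "The temporary directory for the Inline module is not available.",
    "py_eval raised an exception":
        "Universal Encoding Detector is not installed.",
}


def diagnose(message: str) -> str:
    """Return the likely cause of a captured error message."""
    for pattern, cause in KNOWN_ERRORS.items():
        if pattern in message:
            return cause
    return "Unknown error; inspect the full message."
```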
<p>
</p>
<hr />
<h1><a name="see_also">SEE ALSO</a></h1>
<p>UNIVCHARDET - SuikaWiki
&lt;http://suika.fam.cx/gate/2005/sw/UNIVCHARDET&gt;</p>
<p>Universal Encoding Detector: character encoding auto-detection in Python
&lt;http://chardet.feedparser.org/&gt;</p>
<p>A composite approach to language/encoding detection
&lt;http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html&gt;</p>
<p>
</p>
<hr />
<h1><a name="author">AUTHOR</a></h1>
<p>Wakaba &lt;<a href="mailto:w@suika.fam.cx">w@suika.fam.cx</a>&gt;.</p>
<p>
</p>
<hr />
<h1><a name="license">LICENSE</a></h1>
<p>Copyright 2007 Wakaba &lt;<a href="mailto:w@suika.fam.cx">w@suika.fam.cx</a>&gt;</p>
<p>This library is free software; you can redistribute it
and/or modify it under the same terms as Perl itself.</p>

</body>

</html>
