
Contents of /markup/html/whatpm/Whatpm/Charset/UniversalCharDet.html



Revision 1.1
Mon Nov 19 12:18:27 2007 UTC (17 years, 7 months ago) by wakaba
Branch: MAIN
File MIME type: text/html
++ ChangeLog	19 Nov 2007 12:16:52 -0000
2007-11-19  Wakaba  <wakaba@suika.fam.cx>

	* readme.en.html: Link to |Whatpm::Charset::UniversalCharDet|.

++ whatpm/Whatpm/ChangeLog	19 Nov 2007 12:17:47 -0000
2007-11-19  Wakaba  <wakaba@suika.fam.cx>

	* HTML.pm.src (parse_byte_string): Detect charset
	by universalchardet if charset parameter is not specified.

	* Makefile (Charset-all, Charset-clean): New rules.

++ whatpm/Whatpm/Charset/ChangeLog	19 Nov 2007 11:54:20 -0000
2007-11-19  Wakaba  <wakaba@suika.fam.cx>

	* Makefile: New file.

	* UniversalCharDet.pm, UniversalCharDet.pod: New files.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Whatpm::Charset::UniversalCharDet - A Perl Interface to universalchardet
Character Encoding Detection</title>
<link rel="stylesheet" href="http://suika.fam.cx/www/style/html/pod.css" type="text/css" />
<link rev="made" href="mailto:admin@suika.fam.cx" />
</head>

<body>

<p><a name="__index__"></a></p>
<!-- INDEX BEGIN -->

<ul>

<li><a href="#name">NAME</a></li>
<li><a href="#synopsis">SYNOPSIS</a></li>
<li><a href="#description">DESCRIPTION</a></li>
<li><a href="#method">METHOD</a></li>
<li><a href="#dependency">DEPENDENCY</a></li>
<li><a href="#troubleshooting">TROUBLESHOOTING</a></li>
<li><a href="#see_also">SEE ALSO</a></li>
<li><a href="#author">AUTHOR</a></li>
<li><a href="#license">LICENSE</a></li>
</ul>
<!-- INDEX END -->

<hr />
<p>
</p>
<h1><a name="name">NAME</a></h1>
<p>Whatpm::Charset::UniversalCharDet - A Perl Interface to universalchardet
Character Encoding Detection</p>
<p>
</p>
<hr />
<h1><a name="synopsis">SYNOPSIS</a></h1>
<pre>
  require Whatpm::Charset::UniversalCharDet;
  $charset_name = Whatpm::Charset::UniversalCharDet
      -&gt;detect_byte_string ($byte_string);
  # $charset_name: charset name (in lowercase) or undef</pre>
<p>
</p>
<hr />
<h1><a name="description">DESCRIPTION</a></h1>
<p>The <code>Whatpm::Charset::UniversalCharDet</code> module is a Perl interface to
the universalchardet character encoding detection library.</p>
<p>universalchardet was originally developed by the Mozilla project
and has since been ported to other platforms. The <code>Whatpm::Charset::UniversalCharDet</code>
module provides a Perl interface to Universal Encoding Detector,
a Python port of Mozilla's universalchardet code. Future
versions of this module might provide an interface to another
port of universalchardet.</p>
<p>
</p>
<hr />
<h1><a name="method">METHOD</a></h1>
<dl>
<dt><strong><a name="item_detect_byte_string"><em>$charset</em> = Whatpm::Charset::UniversalCharDet-&gt;detect_byte_string (<em>$s</em>)</a></strong><br />
</dt>
<dd>
Detect the character encoding of the specified byte string.
</dd>
<dl>
<dt><strong><a name="item__24s"><em>$s</em></a></strong><br />
</dt>
<dd>
The byte string.
</dd>
<p></p>
<dt><strong><a name="item__24charset"><em>$charset</em></a></strong><br />
</dt>
<dd>
The name of the character encoding detected by universalchardet,
in lowercase.
If no character encoding can be detected (for example, because no
universalchardet implementation is found), <code>undef</code> is returned.
</dd>
<dd>
<p>For the list of supported encodings, see the documentation for
Universal Encoding Detector
&lt;http://chardet.feedparser.org/docs/supported-encodings.html&gt;.</p>
</dd>
<p></p></dl>
</dl>
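<p>As a minimal sketch of how the return value described above might be
handled, the following helper falls back to a caller-supplied default when
<code>detect_byte_string</code> returns <code>undef</code> or when the module
itself cannot be loaded. The helper name <code>detect_or_default</code>, the
sample bytes and the default charset name are assumptions made for
illustration; they are not part of the module.</p>
<pre>
  use strict;
  use warnings;

  # Hypothetical helper (not part of the module): returns the detected
  # charset name, or $default if detection is unavailable or inconclusive.
  sub detect_or_default {
    my ($byte_string, $default) = @_;
    my $charset = eval {
      require Whatpm::Charset::UniversalCharDet;
      Whatpm::Charset::UniversalCharDet
          -&gt;detect_byte_string ($byte_string);
    };
    # $charset is undef if the require died or nothing was detected.
    return defined $charset ? $charset : $default;
  }

  # Sample input: arbitrary non-ASCII bytes, chosen only for illustration.
  my $name = detect_or_default
      ("\x82\xB1\x82\xF1\x82\xC9\x82\xBF\x82\xCD", 'windows-1252');
  print "charset: $name\n";</pre>
<p>Wrapping the call in <code>eval</code> keeps the caller working even on
systems where none of the dependencies are installed, at the cost of hiding
the reason for the failure.</p>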
<p>
</p>
<hr />
<h1><a name="dependency">DEPENDENCY</a></h1>
<dl>
<dt><strong><a name="item_inline_3a_3apython"><a href="../../Inline/Python.html">the Inline::Python manpage</a></a></strong><br />
</dt>
<dd>
A Perl module available at CPAN
&lt;http://search.cpan.org/~neilw/Inline-Python-0.22/&gt;.
</dd>
<dd>
<p>To install the module using <em>CPAN.pm</em>:</p>
</dd>
<dd>
<pre>
  root# perl -MCPAN -eshell
  cpan&gt; install Inline::Python</pre>
</dd>
<p></p>
<dt><strong><a name="item_python">Python</a></strong><br />
</dt>
<dd>
Available at &lt;http://www.python.org/download/&gt;.
</dd>
<p></p>
<dt><strong><a name="item_universal_encoding_detector">Universal Encoding Detector</a></strong><br />
</dt>
<dd>
Available at &lt;http://chardet.feedparser.org/download/&gt;.
</dd>
<dd>
<p>Expand the archive and then execute <code>python setup.py install</code>
in the expanded directory.</p>
</dd>
<p></p></dl>
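<p>The dependencies listed above can be checked before the module is used.
The following is only a sketch: the function name
<code>check_dependencies</code> is hypothetical, and it assumes that the
Python interpreter is reachable as <code>python</code> on the command search
path and that Universal Encoding Detector is importable under the Python
module name <code>chardet</code>.</p>
<pre>
  use strict;
  use warnings;

  # Hypothetical helper: returns a list of [name, ok] pairs, one per
  # dependency listed above.
  sub check_dependencies {
    my @result;
    for my $module (qw(Inline Inline::Python)) {
      my $ok = eval "require $module; 1" ? 1 : 0;
      push @result, [$module, $ok];
    }
    # Assumption: the interpreter is named "python" and the detector
    # is installed as the Python module "chardet".
    my $status = system ('python', '-c', 'import chardet');
    push @result, ['chardet (Python)', $status == 0 ? 1 : 0];
    return @result;
  }

  printf "%-20s %s\n", $_-&gt;[0], $_-&gt;[1] ? 'ok' : 'MISSING'
      for check_dependencies ();</pre>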
<p>
</p>
<hr />
<h1><a name="troubleshooting">TROUBLESHOOTING</a></h1>
<p>The <code>Whatpm::Charset::UniversalCharDet</code> module does not raise
an error even when it fails to load the universalchardet library;
it simply <code>warn</code>s with the error message.</p>
<p>This behavior can be changed by setting the
flag <code>$Whatpm::Charset::UniversalCharDet</code> to a true value, which
makes any error <code>die</code> rather than <code>warn</code>.</p>
<p>Common error messages are as follows:</p>
<dl>
<dt><strong><a name="item_can_27t_locate_inlinea_2epm_in__40inc">Can't locate Inline.pm in @INC</a></strong><br />
</dt>
<dd>
<em>Inline</em> is not installed.
</dd>
<p></p>
<dt><strong><a name="item_error_2e_you_have_specified__27python_27_as_an_inl">Error. You have specified 'Python' as an Inline programming language.</a></strong><br />
</dt>
<dd>
<a href="../../Inline/Python.html">the Inline::Python manpage</a> is not installed.
</dd>
<p></p>
<dt><strong><a name="item_couldn_27t_find_an_appropriate_directory_for_inlin">Couldn't find an appropriate DIRECTORY for Inline to use.</a></strong><br />
</dt>
<dd>
The temporary directory for the <em>Inline</em> module is not available.
See <a href="../../Inline/Python.html#the_inline_directory">The Inline DIRECTORY in the Inline::Python manpage</a> or
&lt;http://search.cpan.org/~ingy/Inline-0.44/Inline.pod#The_Inline_DIRECTORY&gt;.
</dd>
<p></p>
<dt><strong><a name="item_error__2d_2d_py_eval_raised_an_exception">Error -- py_eval raised an exception</a></strong><br />
</dt>
<dd>
Universal Encoding Detector is not installed.
</dd>
<p></p></dl>
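<p>Because load failures are reported with <code>warn</code> by default,
a caller that wants to log or inspect messages such as those listed above
may need to capture warnings itself. The following is a minimal sketch that
relies only on Perl's standard <code>$SIG{__WARN__}</code> hook; the helper
name <code>detect_with_warnings</code> is hypothetical and not part of the
module.</p>
<pre>
  use strict;
  use warnings;

  # Hypothetical helper: runs the detection and returns the charset name
  # (or undef) together with a reference to the list of messages that
  # were warned or died with during the call.
  sub detect_with_warnings {
    my ($byte_string) = @_;
    my ($charset, @messages);
    {
      # Collect warnings instead of letting them go to STDERR.
      local $SIG{__WARN__} = sub { push @messages, $_[0] };
      $charset = eval {
        require Whatpm::Charset::UniversalCharDet;
        Whatpm::Charset::UniversalCharDet
            -&gt;detect_byte_string ($byte_string);
      };
      # A die (e.g. the module itself is missing) is recorded as well.
      push @messages, $@ if $@;
    }
    return ($charset, \@messages);
  }

  my ($charset, $messages) = detect_with_warnings ("example bytes");
  print defined $charset ? "charset: $charset\n" : "charset: (undetected)\n";
  print "message: $_\n" for @$messages;</pre>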
<p>
</p>
<hr />
<h1><a name="see_also">SEE ALSO</a></h1>
<p>UNIVCHARDET - SuikaWiki
&lt;http://suika.fam.cx/gate/2005/sw/UNIVCHARDET&gt;</p>
<p>Universal Encoding Detector: character encoding auto-detection in Python
&lt;http://chardet.feedparser.org/&gt;</p>
<p>A composite approach to language/encoding detection
&lt;http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html&gt;</p>
<p>
</p>
<hr />
<h1><a name="author">AUTHOR</a></h1>
<p>Wakaba &lt;<a href="mailto:w@suika.fam.cx">w@suika.fam.cx</a>&gt;.</p>
<p>
</p>
<hr />
<h1><a name="license">LICENSE</a></h1>
<p>Copyright 2007 Wakaba &lt;<a href="mailto:w@suika.fam.cx">w@suika.fam.cx</a>&gt;</p>
<p>This library is free software; you can redistribute it
and/or modify it under the same terms as Perl itself.</p>

</body>

</html>
