Contents of /markup/html/whatpm/Whatpm/Charset/UniversalCharDet.pod

Revision 1.1 - Mon Nov 19 12:18:27 2007 UTC by wakaba
Branch: MAIN
++ ChangeLog	19 Nov 2007 12:16:52 -0000
2007-11-19  Wakaba  <wakaba@suika.fam.cx>

	* readme.en.html: Link to |Whatpm::Charset::UniversalCharDet|.

++ whatpm/Whatpm/ChangeLog	19 Nov 2007 12:17:47 -0000
2007-11-19  Wakaba  <wakaba@suika.fam.cx>

	* HTML.pm.src (parse_byte_string): Detect charset
	by universalchardet if charset parameter is not specified.

	* Makefile (Charset-all, Charset-clean): New rules.

++ whatpm/Whatpm/Charset/ChangeLog	19 Nov 2007 11:54:20 -0000
2007-11-19  Wakaba  <wakaba@suika.fam.cx>

	* Makefile: New file.

	* UniversalCharDet.pm, UniversalCharDet.pod: New files.

=head1 NAME

Whatpm::Charset::UniversalCharDet - A Perl Interface to universalchardet
Character Encoding Detection

=head1 SYNOPSIS

  require Whatpm::Charset::UniversalCharDet;
  $charset_name = Whatpm::Charset::UniversalCharDet
      ->detect_byte_string ($byte_string);
  # $charset_name: charset name (in lowercase) or undef

=head1 DESCRIPTION

The C<Whatpm::Charset::UniversalCharDet> module is a Perl interface to
universalchardet, a character encoding detector.

The universalchardet library was originally developed by the Mozilla
project and has since been ported to other platforms. The
C<Whatpm::Charset::UniversalCharDet> module provides a Perl interface to
Universal Encoding Detector, a Python port of Mozilla's universalchardet
code. Future versions of this module might provide an interface to
another port of universalchardet.

=head1 METHOD

=over 4

=item I<$charset> = Whatpm::Charset::UniversalCharDet->detect_byte_string (I<$s>)

Detects the character encoding of the specified byte string.

=over 4

=item I<$s>

The byte string to examine.

=item I<$charset>

The name of the character encoding detected by universalchardet, in
lowercase. If no character encoding can be detected (for example,
because no universalchardet implementation can be found), C<undef> is
returned.

For the list of supported encodings, see the documentation for
Universal Encoding Detector
<http://chardet.feedparser.org/docs/supported-encodings.html>.

=back

=back
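
Because C<detect_byte_string> returns either a lowercase charset name or C<undef>, a caller usually needs a fallback before decoding the bytes. The following is a minimal sketch using the core L<Encode> module. The helper C<decode_with_detected_charset> and the choice of fallback encoding are hypothetical and are not part of this module.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper, not part of Whatpm: decode $bytes using the charset
# name reported by detect_byte_string.  If that name is undef, or is not a
# name Encode knows, fall back to the $default encoding instead.
sub decode_with_detected_charset {
  my ($bytes, $charset, $default) = @_;

  # Encode::find_encoding returns an encoding object, or undef if unknown.
  my $encoding;
  $encoding = Encode::find_encoding ($charset) if defined $charset;

  # Fall back to the caller-chosen default (an assumption of this sketch).
  $encoding ||= Encode::find_encoding ($default)
      or die "Unknown fallback encoding: $default\n";

  # Returns a Perl character string.
  return $encoding->decode ($bytes);
}
```

For example, pass the return value of C<detect_byte_string> as C<$charset> and C<'utf-8'> as C<$default>. The sketch treats a name that L<Encode> does not recognize the same way as C<undef>. It does not assume that every name reported by universalchardet has an L<Encode> counterpart.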

=head1 DEPENDENCY

=over 4

=item L<Inline::Python>

A Perl module available from CPAN
<http://search.cpan.org/~neilw/Inline-Python-0.22/>.

To install the module using L<CPAN>:

  root# perl -MCPAN -eshell
  cpan> install Inline::Python

=item Python

Available at <http://www.python.org/download/>.

=item Universal Encoding Detector

Available at <http://chardet.feedparser.org/download/>.

Expand the archive and then execute C<python setup.py install>
in the expanded directory.

=back
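
Before using this module, it can help to check whether its Perl-side dependencies load at all. The following is a minimal sketch. The helper C<module_available> is hypothetical and is not part of this module. It checks only L<Inline> and L<Inline::Python>, not Python or Universal Encoding Detector.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Whatpm: return 1 if the named Perl module
# can be loaded with require, and 0 otherwise (e.g. "Can't locate ... in @INC").
sub module_available {
  my ($module) = @_;
  # Turn a package name such as Inline::Python into Inline/Python.pm.
  (my $file = "$module.pm") =~ s{::}{/}g;
  return eval { require $file; 1 } ? 1 : 0;
}

# Report on the Perl-side dependencies of Whatpm::Charset::UniversalCharDet.
for my $module (qw(Inline Inline::Python)) {
  printf "%-16s %s\n", $module,
      module_available ($module) ? 'found' : 'NOT found';
}
```

Universal Encoding Detector is imported in Python as the C<chardet> package. The Python side can therefore be checked separately with, for example, C<python -c "import chardet">.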

=head1 TROUBLESHOOTING

The C<Whatpm::Charset::UniversalCharDet> module does not raise an
error even when it fails to load the universalchardet library;
it simply C<warn>s the error message.

This behavior can be changed by setting the flag
C<$Whatpm::Charset::UniversalCharDet> to a true value; this makes any
error C<die> rather than C<warn>.
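
By default, load failures are reported only through C<warn>. A caller that wants to react to them, rather than just see them on standard error, can collect the warnings itself. The following is a minimal sketch using C<local $SIG{__WARN__}>. The helper C<call_collecting_warnings> is hypothetical and is not part of this module.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Whatpm: run $code, collecting any
# warnings it emits instead of letting them go to standard error.
# Returns the code's scalar result and an array reference of messages.
sub call_collecting_warnings {
  my ($code) = @_;
  my @warnings;
  # The handler is restored automatically when this sub returns.
  local $SIG{__WARN__} = sub { push @warnings, $_[0] };
  my $result = $code->();
  return ($result, \@warnings);
}
```

For example, wrap both C<require Whatpm::Charset::UniversalCharDet> and the C<detect_byte_string> call in the code reference. This documentation does not say whether the warning is emitted when the module is loaded or when the method is called. An C<undef> result together with a non-empty warning list then suggests that universalchardet could not be loaded.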

Common error messages are as follows:

=over 4

=item Can't locate Inline.pm in @INC

L<Inline> is not installed.

=item Error. You have specified 'Python' as an Inline programming language.

L<Inline::Python> is not installed.

=item Couldn't find an appropriate DIRECTORY for Inline to use.

The directory that the L<Inline> module uses to build and cache code
is not available. See L<Inline/"The Inline DIRECTORY"> or
<http://search.cpan.org/~ingy/Inline-0.44/Inline.pod#The_Inline_DIRECTORY>.

=item Error -- py_eval raised an exception

Universal Encoding Detector is not installed.

=back

=head1 SEE ALSO

UNIVCHARDET - SuikaWiki
<http://suika.fam.cx/gate/2005/sw/UNIVCHARDET>

Universal Encoding Detector: character encoding auto-detection in Python
<http://chardet.feedparser.org/>

A composite approach to language/encoding detection
<http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html>

=head1 AUTHOR

Wakaba <w@suika.fam.cx>.

=head1 LICENSE

Copyright 2007 Wakaba <w@suika.fam.cx>

This library is free software; you can redistribute it
and/or modify it under the same terms as Perl itself.

=cut

## $Date:$
