Contents of /markup/html/whatpm/Whatpm/Charset/UniversalCharDet.pod

Revision 1.2
Sun Oct 5 06:42:04 2008 UTC by wakaba
Branch: MAIN
CVS Tags: HEAD
Changes since 1.1: +9 -9 lines
++ ChangeLog	5 Oct 2008 06:41:29 -0000
2008-10-05  Wakaba  <wakaba@suika.fam.cx>

	* readme.en.html: Missing link to Whatpm::RDFXML module is added.
	Typo fixed.  Noted that Media Query Level 3 is not supported yet.
	Linked to WebHACC supported standards documentation.

++ whatpm/Whatpm/Charset/ChangeLog	5 Oct 2008 06:41:56 -0000
2008-10-05  Wakaba  <wakaba@suika.fam.cx>

	* UniversalCharDet.pod: Typo fixed.

=head1 NAME

Whatpm::Charset::UniversalCharDet - A Perl Interface to universalchardet
Character Encoding Detection

=head1 SYNOPSIS

  require Whatpm::Charset::UniversalCharDet;
  $charset_name = Whatpm::Charset::UniversalCharDet
      ->detect_byte_string ($byte_string);
  # $charset_name: charset name (in lowercase) or undef

=head1 DESCRIPTION

The C<Whatpm::Charset::UniversalCharDet> module is a Perl interface to
universalchardet, a character encoding detection library.

universalchardet was originally developed by the Mozilla project
and later ported to other platforms.  The C<Whatpm::Charset::UniversalCharDet>
module provides a Perl interface to Universal Encoding Detector,
a Python port of Mozilla's universalchardet code.  Future
versions of this module might provide an interface to another
port of universalchardet.

=head1 METHOD

=over 4

=item I<$charset> = Whatpm::Charset::UniversalCharDet->detect_byte_string (I<$s>)

Detects the character encoding of the specified byte string.

=over 4

=item I<$s>

The byte string.

=item I<$charset>

The name of the character encoding detected by universalchardet,
in lowercase.
If no character encoding can be detected (for example, because no
implementation of universalchardet is found), C<undef> is returned.

For the list of supported encodings, see the documentation for
Universal Encoding Detector
<http://chardet.feedparser.org/docs/supported-encodings.html>.

=back

=back

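Because C<detect_byte_string> returns C<undef> when nothing can be
detected, callers usually need a fallback.  The following is a minimal
sketch, not part of this module: the helper name C<charset_or_default>
and the fallback value C<windows-1252> are assumptions chosen for
illustration.  The C<eval> also covers the case where the module itself
cannot be loaded.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by the module): return the detected
# charset name, or $default when detection is unavailable or inconclusive.
sub charset_or_default {
    my ($bytes, $default) = @_;
    my $charset = eval {
        require Whatpm::Charset::UniversalCharDet;
        Whatpm::Charset::UniversalCharDet->detect_byte_string ($bytes);
    };
    # $charset is undef if the module is missing, if the call died,
    # or if no encoding could be detected.
    return defined $charset ? $charset : $default;
}

# UTF-8 bytes for U+3042 U+3044 U+3046 (three hiragana letters).
my $bytes = "\xE3\x81\x82\xE3\x81\x84\xE3\x81\x86";
print charset_or_default ($bytes, 'windows-1252'), "\n";
```

What is printed depends on whether the module and its dependencies are
installed; without them the fallback value is printed.
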
=head1 DEPENDENCY

=over 4

=item L<Inline::Python>

A Perl module available from CPAN
<http://search.cpan.org/~neilw/Inline-Python-0.22/>.

To install the module using L<CPAN.pm>:

  root# perl -MCPAN -eshell
  cpan> install Inline::Python

=item Python

Available at <http://www.python.org/download/>.

=item Universal Encoding Detector

Available at <http://chardet.feedparser.org/download/>.

Expand the archive, then run C<python setup.py install>
in the expanded directory.

=back

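To confirm that the Perl-side dependencies listed above can be loaded,
a small check along the following lines may help.  This is only a
sketch: the function name C<module_status> is hypothetical, and the
script does not verify the Python interpreter or the Universal Encoding
Detector installation.

```perl
use strict;
use warnings;

# Hypothetical helper: report which of the given modules can be loaded
# by the running perl.  Returns a hash reference of module name => 1 or 0.
sub module_status {
    my %status;
    for my $module (@_) {
        # Convert Foo::Bar to Foo/Bar.pm, the form require expects
        # when it is given a string rather than a bareword.
        (my $file = "$module.pm") =~ s{::}{/}g;
        $status{$module} = eval { require $file; 1 } ? 1 : 0;
    }
    return \%status;
}

my $status = module_status ('Inline', 'Inline::Python');
for my $module (sort keys %$status) {
    printf "%-16s %s\n", $module, $status->{$module} ? 'ok' : 'missing';
}
```
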
=head1 TROUBLESHOOTING

The C<Whatpm::Charset::UniversalCharDet> module does not raise an
error even when it fails to load the universalchardet library;
it simply C<warn>s with the error message.

This behavior can be changed by setting the flag
C<$Whatpm::Charset::UniversalCharDet::DEBUG> to a true value, which
makes any error invoke C<die> instead of C<warn>.

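As a sketch of how the C<DEBUG> flag might be used to capture the
underlying error message, the flag can be set with C<local> and the
call wrapped in C<eval>.  The helper name C<detect_strict> is
hypothetical, and the code assumes that the flag is consulted when the
module is loaded or when C<detect_byte_string> is called.

```perl
use strict;
use warnings;

# Hypothetical helper: returns ($charset, $error).  $error is the empty
# string when nothing died.
sub detect_strict {
    my ($bytes) = @_;
    no warnings 'once';
    # Assumption: a true DEBUG value turns load failures into die.
    # 'local' restores the previous value when this sub returns.
    local $Whatpm::Charset::UniversalCharDet::DEBUG = 1;
    my $charset = eval {
        require Whatpm::Charset::UniversalCharDet;
        Whatpm::Charset::UniversalCharDet->detect_byte_string ($bytes);
    };
    my $error = $@;
    return ($charset, $error);
}

my ($charset, $error) = detect_strict ("plain ASCII text");
if ($error ne '') {
    chomp $error;
    print "detection failed: $error\n";
} else {
    print "detected: ", (defined $charset ? $charset : '(none)'), "\n";
}
```

The captured message can then be compared with the common error
messages documented in this section.
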
Common error messages are as follows:

=over 4

=item Can't locate Inline.pm in @INC

The L<Inline> module is not installed.

=item Error. You have specified 'Python' as an Inline programming language.

The L<Inline::Python> module is not installed.

=item Couldn't find an appropriate DIRECTORY for Inline to use.

The temporary directory for the L<Inline> module is not available.
See L<Inline::Python/"The Inline DIRECTORY"> or
<http://search.cpan.org/~ingy/Inline-0.44/Inline.pod#The_Inline_DIRECTORY>.

=item Error -- py_eval raised an exception

Universal Encoding Detector is not installed.

=back

=head1 SEE ALSO

UNIVCHARDET - SuikaWiki
<http://suika.fam.cx/gate/2005/sw/UNIVCHARDET>

Universal Encoding Detector: character encoding auto-detection in Python
<http://chardet.feedparser.org/>

A composite approach to language/encoding detection
<http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html>

=head1 AUTHOR

Wakaba <w@suika.fam.cx>.

=head1 LICENSE

Copyright 2007-2008 Wakaba <w@suika.fam.cx>

This library is free software; you can redistribute it
and/or modify it under the same terms as Perl itself.

=cut

## $Date: 2007/11/19 12:18:27 $
