<?xml version="1.0" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Char::Class::RFC1815 - Regular Expression Character Classes - RFC1815</title>
<link rel="stylesheet" href="http://suika.fam.cx/www/style/html/pod.css" type="text/css" />
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<link rev="made" href="mailto:wakaba@suika.fam.cx" />
</head>

<body>


<!-- INDEX BEGIN -->
<div name="index">
<p><a name="__index__"></a></p>

<ul>

	<li><a href="#name">NAME</a></li>
	<li><a href="#description">DESCRIPTION</a></li>
	<li><a href="#collection_names">COLLECTION NAMES</a></li>
	<li><a href="#example">EXAMPLE</a></li>
	<li><a href="#see_also">SEE ALSO</a></li>
	<li><a href="#license">LICENSE</a></li>
</ul>

<hr name="index" />
</div>
<!-- INDEX END -->

<p>
</p>
<h1><a name="name">NAME</a></h1>
<p>Char::Class::<a href="http://www.ietf.org/rfc/rfc1815.txt" class="rfc">RFC1815</a> - Regular Expression Character Classes - <code>RFC1815</code></p>
<p>
</p>
<hr />
<h1><a name="description">DESCRIPTION</a></h1>
<p>Two ISO/IEC 10646 character repertories, defined by <a href="http://www.ietf.org/rfc/rfc1815.txt" class="rfc">RFC 1815</a>.</p>
<p>RFC 1815 defines two profiled text encoding schemes based on
ISO/IEC 10646.  Because of the size of the full ISO/IEC 10646
character set, a full implementation of it is (or at least WAS,
when <a href="http://www.ietf.org/rfc/rfc1815.txt" class="rfc">RFC 1815</a> was published) too difficult, so a couple
of profiled (restricted) ISO/IEC 10646-based coded character sets
were defined, as <a href="http://www.ietf.org/rfc/rfc1815.txt" class="rfc">RFC 1815</a> did.  (For details of these schemes,
see <a href="http://www.ietf.org/rfc/rfc1815.txt" class="rfc">RFC 1815</a>.)</p>
<p>Both encoding schemes (like most such schemes) use the two-octet
BMP form (i.e. UCS-2), the only realistic encoding form in those
days.  Today that form should be considered obsolete, but the repertories
can still be useful for interoperability purposes.</p>
<dl>
<dt><strong><a name="iso_10646" class="item">ISO-10646</a></strong></dt>

<dd>
<p>The repertory of charset <a href="#iso_10646"><code>ISO-10646</code></a> is the same as that of ISO/IEC 8859-1
(MIME name: <code>ISO-8859-1</code>, also known as Latin-1).  In most situations
you can simply write <code>[ -ÿ]</code>.</p>
</dd>
<dt><strong><a name="iso_10646_j_1" class="item">ISO-10646-J-1</a></strong></dt>

<dd>
<p>The repertory of charset <a href="#iso_10646_j_1"><code>ISO-10646-J-1</code></a> is a superset of
BASIC JAPANESE plus FULLWIDTH ALPHANUMERICS plus HALFWIDTH KATAKANA,
as defined by JIS X 0221 Appendix 1.</p>
</dd>
<dd>
<p>This repertory includes all characters available in the Japanese
version of Windows NT 3.51.  (Note that more characters are available
in later versions of Windows NT.)</p>
</dd>
<dd>
<p>The description in <a href="http://www.ietf.org/rfc/rfc1815.txt" class="rfc">RFC 1815</a> does not mention any characters in U+21xx
and U+24xx.  This appears to be a bug, since ISO-10646-J-1 is defined
as an alternative to JIS X 0208, which includes those characters.
This module therefore includes those characters as well.</p>
</dd>
<dd>
<p>Since the description in <a href="http://www.ietf.org/rfc/rfc1815.txt" class="rfc">RFC 1815</a> is ambiguous, most of the character list
of <a href="#iso_10646_j_1"><code>ISO-10646-J-1</code></a> is taken from the list in JIS X 0221 Appendix 1.
(Note that <a href="http://www.ietf.org/rfc/rfc1815.txt" class="rfc">RFC 1815</a> does not refer to JIS X 0221-1995 because of
the publication schedules of the memo and the standard.)</p>
</dd>
</dl>
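<p>As a minimal sketch of the <code>[ -ÿ]</code> shortcut mentioned for
<code>ISO-10646</code>, the following checks whether a string stays within that
repertory without loading this module.  The helper name
<code>is_iso10646_text</code> is hypothetical, and the range U+0020..U+00FF is an
assumption that simply mirrors the character class given above.</p>
<pre>
use strict;
use warnings;

# Hypothetical helper, not part of Char::Class::RFC1815.
# Assumption: the repertory is U+0020..U+00FF, mirroring [ -ÿ].
sub is_iso10646_text {
    my ($s) = @_;
    # True only if every character falls inside the assumed range.
    return $s =~ /\A[\x{20}-\x{FF}]*\z/ ? 1 : 0;
}

# U+00E9 is inside the range; U+3042 (HIRAGANA LETTER A) is not.
print is_iso10646_text("caf\x{E9}") ? "ok\n" : "outside repertory\n";
print is_iso10646_text("\x{3042}")  ? "ok\n" : "outside repertory\n";</pre>
<p>Writing the range with <code>\x{...}</code> escapes avoids depending on the
encoding of the source file, which a literal <code>ÿ</code> would.</p>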
<p>
</p>
<hr />
<h1><a name="collection_names">COLLECTION NAMES</a></h1>
<dl>
<dt><strong><a name="inrfc1815iso10646j1" class="item"><code>InRFC1815ISO10646J1</code></a></strong></dt>

</dl>
<p>
</p>
<hr />
<h1><a name="example">EXAMPLE</a></h1>
<pre>
use Char::Class::RFC1815;
if ($s =~ /\p{InRFC1815ISO10646J1}/) {
  print "Match!\n";
}</pre>
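<p>Names of the form <code>\p{In...}</code> follow the naming pattern of Perl's
user-defined character properties: a subroutine whose name begins with
<code>In</code> or <code>Is</code> and which returns a list of code point ranges.
Whether this module defines its collections in exactly this way is an assumption.
The sketch below only illustrates the mechanism, using a hypothetical property
<code>InExampleLatin1</code> that is not provided by this module.</p>
<pre>
use strict;
use warnings;

# Hypothetical property, not part of Char::Class::RFC1815.
# Each line of the returned string is "START\tEND" in hexadecimal.
sub InExampleLatin1 {
    return "0020\t00FF\n";
}

# Anchoring with \A and \z tests the whole string, not just one character.
for my $s ("abc", "\x{3042}") {
    my $verdict = $s =~ /\A\p{InExampleLatin1}+\z/ ? "inside" : "outside";
    print "$verdict\n";
}</pre>
<p>The unanchored match in the example above succeeds if any single character
belongs to the collection; anchoring the pattern as in this sketch is one way to
require that every character does.</p>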
<p>
</p>
<hr />
<h1><a name="see_also">SEE ALSO</a></h1>
<p>RFC 1815 &lt;urn:ietf:rfc:1815&gt;</p>
<p><a href="../../Char/Class/UCS.html">the Char::Class::UCS manpage</a></p>
<p><a href="../../Char/Class/JISX0221.html">the Char::Class::JISX0221 manpage</a></p>
<p>
</p>
<hr />
<h1><a name="license">LICENSE</a></h1>
<p>Copyright 2008 Wakaba &lt;<a href="mailto:w@suika.fam.cx">w@suika.fam.cx</a>&gt;</p>
<p>This library and the library generated by it are free software;
you can redistribute them and/or modify them under the same
terms as Perl itself.</p>

</body>

</html>