Network Working Group                                          Jun Murai
Internet Draft                                              Mark Crispin
                                                       Erik van der Poel
                                                        25th August 1992


           Japanese Character Encoding for Internet Messages


Status of this Memo

   This document is an Internet Draft. Internet Drafts are working
   documents of the Internet Engineering Task Force (IETF), its Areas,
   and its Working Groups. Note that other groups may also distribute
   working documents as Internet Drafts.

   Internet Drafts are draft documents valid for a maximum of six
   months. Internet Drafts may be updated, replaced, or obsoleted by
   other documents at any time. It is not appropriate to use Internet
   Drafts as reference material or to cite them other than as a "working
   draft" or "work in progress."

   Please check the I-D abstract listing contained in each Internet
   Draft directory to learn the current status of this or any other
   Internet Draft.

   This draft document will be submitted to the RFC editor as an
   informational document. This document will expire before 2nd March
   1993. Distribution of this memo is unlimited. Please send comments
   to ietf-822@dimacs.rutgers.edu.


Introduction

   This document describes the encoding used in plain text electronic
   mail and network news in several Japanese networks. It was first
   specified by and used in JUNET [JUNET]. The encoding is now also
   widely used in Japanese IP communities.

   This document provides a name for the encoding which is intended to
   be used in the "charset" parameter field of MIME [MIME] messages.

   This document only describes the encoding of plain text. The encoding
   of other subtypes of text, such as rich text, is not discussed here.


Murai et al               Expires 2nd March 1993                [Page 1]

Internet Draft                                  Updated 25th August 1992


Informal Description

   The message body starts in ASCII, and switches to Japanese characters
   through an escape sequence. For example, the escape sequence ESC $ B
   (three bytes) indicates that the bytes following this escape sequence
   are Japanese characters, which are encoded in two bytes each. To
   switch back to ASCII, the escape sequence ESC ( B is used.
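
   As an illustration only, and not as part of the JUNET specification,
   the switch into and out of Japanese characters can be observed with
   the "iso2022_jp" codec that ships with Python 3. The sample string
   and the variable names are assumptions chosen for this sketch.

```python
# Sketch: observe the escape sequences placed around Japanese text.
# Assumes Python 3 and its standard "iso2022_jp" codec.
ESC = b"\x1b"

# "JUNET <nihongo> mail", with the three Kanji written as escapes
# so that this source file itself stays in ASCII.
text = "JUNET \u65e5\u672c\u8a9e mail"
encoded = text.encode("iso2022_jp")

# Offsets where the encoder switches to two-byte characters and
# where it switches back to ASCII.
start = encoded.index(ESC + b"$B")
end = encoded.index(ESC + b"(B")

print(encoded)
print("two-byte run:", encoded[start + 3:end])
```

   The run between the two escape sequences is six bytes long, two for
   each of the three Japanese characters.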

   The following table gives the escape sequences and the character sets
   used in JUNET messages.

      ESC ( B    ASCII
      ESC ( J    JIS X 0201-1976 (left-hand part)
      ESC $ @    JIS X 0208-1978
      ESC $ B    JIS X 0208-1983
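
   The table can be turned into a small lookup that splits a line into
   runs of one character set each. This is a minimal sketch under the
   assumption that the line has already had its CRLF removed; the names
   CHARSETS, ESCAPE_RE and split_segments are hypothetical and do not
   come from any JUNET software.

```python
import re

# The four escape sequences from the table, keyed by their three bytes.
CHARSETS = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JIS X 0201-1976 (left-hand part)",
    b"\x1b$@": "JIS X 0208-1978",
    b"\x1b$B": "JIS X 0208-1983",
}

# Matches any one of the four escape sequences.
ESCAPE_RE = re.compile(rb"\x1b(?:\([BJ]|\$[@B])")


def split_segments(line: bytes):
    """Return (character set name, payload bytes) pairs for one line.

    The line is assumed to start in ASCII, as the informal
    description states, and to carry no trailing CRLF.
    """
    segments = []
    current = "ASCII"
    pos = 0
    for match in ESCAPE_RE.finditer(line):
        if match.start() > pos:
            segments.append((current, line[pos:match.start()]))
        current = CHARSETS[match.group()]
        pos = match.end()
    if pos < len(line):
        segments.append((current, line[pos:]))
    return segments


for name, payload in split_segments(b"abc\x1b$BF|\x1b(Bdef"):
    print(name, payload)
```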

   The left-hand part of JIS X 0201-1976 is identical to ASCII except
   for backslash (\) and tilde (~). The backslash is replaced by the Yen
   sign, and the tilde is replaced by macron (overline). This set is
   Japan's national variant of ISO 646.
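
   The two differing code positions can be written down directly. The
   sketch below assumes the conventional Unicode equivalents (U+00A5 for
   the Yen sign and U+203E for the overline); that mapping and the
   function name decode_jis_roman are assumptions of this example, not
   something this memo defines.

```python
# Code positions where the left-hand part of JIS X 0201-1976 differs
# from ASCII. The Unicode targets are an assumption of this sketch.
JIS_ROMAN_DIFFERENCES = {
    0x5C: "\u00a5",  # backslash position -> YEN SIGN
    0x7E: "\u203e",  # tilde position     -> OVERLINE
}


def decode_jis_roman(payload: bytes) -> str:
    """Decode the bytes that follow ESC ( J into a Unicode string."""
    if any(byte > 0x7F for byte in payload):
        raise ValueError("the left-hand part is a 7-bit set")
    return "".join(
        JIS_ROMAN_DIFFERENCES.get(byte, chr(byte)) for byte in payload
    )


print(decode_jis_roman(b"100\\"))
```

   Every other byte decodes exactly as it would in ASCII.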

   The JIS X 0208 character sets consist of Kanji, Hiragana, Katakana
   and some other symbols and characters. Each character takes up two
   bytes.
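
   A two-byte run can be walked one character at a time, as in the
   sketch below. Each byte is expected to lie in the 94-character range
   33 to 126 given in the Formal Description, so subtracting 32 gives a
   row and cell number between 1 and 94. The helper names are
   hypothetical.

```python
def iter_characters(payload: bytes):
    """Yield each two-byte character of a JIS X 0208 run."""
    if len(payload) % 2:
        raise ValueError("a two-byte run must have an even length")
    for index in range(0, len(payload), 2):
        yield payload[index:index + 2]


def to_row_cell(character: bytes):
    """Map one two-byte character to its (row, cell) pair, each 1..94."""
    first, second = character
    for byte in (first, second):
        if not 0x21 <= byte <= 0x7E:
            raise ValueError("byte outside the 94-character range")
    return first - 0x20, second - 0x20


for character in iter_characters(b"F|K\\8l"):
    print(character, to_row_cell(character))
```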

   For further details about the JIS Japanese national character set
   standards, refer to the JIS standards themselves. For further
   information about the escape sequences, see ISO 2022 [ISO2022].

   If there are JIS X 0208 characters on a line, there must be a switch
   to ASCII or to the left-hand part of JIS X 0201 before the end of the
   line (i.e. before the CRLF). This means that the next line starts in
   the character set that was switched to before the end of the previous
   line. Other restrictions are given in the Formal Description below.
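
   The end-of-line rule can be checked, and repaired, by looking at the
   last escape sequence on a line. The following is a sketch under the
   assumption that the line is passed without its CRLF; the function
   names are hypothetical, and appending ESC ( B is only one of the two
   repairs the rule allows.

```python
import re

# Matches any of the four escape sequences used in JUNET messages.
ESCAPE_RE = re.compile(rb"\x1b(?:\([BJ]|\$[@B])")
TO_ASCII = b"\x1b(B"


def ends_in_single_byte(line: bytes) -> bool:
    """True if the line ends in ASCII or JIS X 0201 (left-hand part)."""
    escapes = ESCAPE_RE.findall(line)
    # No escape sequence at all means the line never left the
    # one-byte set it started in.
    return not escapes or escapes[-1][1:2] == b"("


def close_line(line: bytes) -> bytes:
    """Append ESC ( B when the line would otherwise end in JIS X 0208."""
    return line if ends_in_single_byte(line) else line + TO_ASCII


print(close_line(b"\x1b$BF|K\\8l"))
```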


Formal Description

   This section provides a formal description of the JUNET encoding. In
   the event that this description is not consistent with the above
   informal description, this formal description shall take precedence.

   The notational conventions used here are identical to those used in
   RFC 822 [RFC822].

   The * (asterisk) convention is as follows:

      l*m something

   meaning at least l and at most m somethings, with l and m taking
   default values of 0 and infinity, respectively.


   line                = *text *1( *segment single-byte-seq *text ) CRLF

   segment             = single-byte-segment / double-byte-segment

   single-byte-segment = single-byte-seq 1*text

   double-byte-segment = double-byte-seq 1*( one-of-94 one-of-94 )

   single-byte-seq     = ESC "(" ( "B" / "J" )

   double-byte-seq     = ESC "$" ( "@" / "B" )

                                                   ; ( Octal, Decimal.)

   ESC                 = <ISO 2022 ESC, escape>    ; (    33,     27.)

   one-of-94           = <any char in 94-char set> ; (41-176, 33.-126.)

   CHAR                = <any ASCII character>     ; ( 0-177,  0.-127.)

   text                = <any CHAR, including bare
                          CR & bare LF, but NOT
                          including CRLF>
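
   The grammar can be approximated by a small state machine. The sketch
   below is not a complete implementation of the rules: it does not
   enforce that a single-byte-segment contains at least one text
   character, and the name is_valid_line is hypothetical. It checks
   that every byte is 7-bit, that each double-byte-segment holds one or
   more pairs of one-of-94 bytes, and that the line is back in a
   one-byte set before the CRLF.

```python
ESC = 0x1B
SINGLE_BYTE = {b"(B", b"(J"}   # finals of single-byte-seq
DOUBLE_BYTE = {b"$@", b"$B"}   # finals of double-byte-seq


def is_valid_line(line: bytes) -> bool:
    """Approximate check of one line, CRLF included, against the grammar."""
    if not line.endswith(b"\r\n"):
        return False
    body = line[:-2]
    if b"\r\n" in body or any(byte > 0x7F for byte in body):
        return False

    double = False   # True while inside a double-byte-segment
    pairs = 0        # two-byte characters seen in the current segment
    i = 0
    while i < len(body):
        designator = body[i + 1:i + 3] if body[i] == ESC else b""
        if designator in SINGLE_BYTE or designator in DOUBLE_BYTE:
            if double and pairs == 0:
                return False          # 1*( one-of-94 one-of-94 )
            double = designator in DOUBLE_BYTE
            pairs = 0
            i += 3
        elif double:
            pair = body[i:i + 2]
            if len(pair) != 2 or not all(0x21 <= b <= 0x7E for b in pair):
                return False
            pairs += 1
            i += 2
        else:
            i += 1                    # ordinary text character
    return not double


print(is_valid_line(b"\x1b$BF|K\\8l\x1b(B\r\n"))
```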


   Additional restrictions that are difficult to describe in the above
   grammar are as follows.

   Adjacent segments should have different escape sequences. For
   example, the following is not recommended:

      ESC $ B .... ESC $ B ....
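
   A repeated escape sequence of this kind can be detected by comparing
   each escape sequence with the one before it. This is a sketch only;
   the function name redundant_escapes is hypothetical.

```python
import re

# Matches any of the four escape sequences used in JUNET messages.
ESCAPE_RE = re.compile(rb"\x1b(?:\([BJ]|\$[@B])")


def redundant_escapes(line: bytes):
    """Return byte offsets of escape sequences that repeat the previous one."""
    offsets = []
    previous = None
    for match in ESCAPE_RE.finditer(line):
        if match.group() == previous:
            offsets.append(match.start())
        previous = match.group()
    return offsets


# The second ESC $ B, at offset 5, repeats the first.
print(redundant_escapes(b"\x1b$BF|\x1b$BK\\\x1b(B"))
```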


MIME Considerations

   The name given to the JUNET character encoding is "ISO-2022-JP". This
   name is intended to be used in MIME messages as follows:

      Content-Type: text/plain; charset=iso-2022-jp

   The JUNET encoding is already in 7-bit form, so it is not necessary
   to use a Content-Transfer-Encoding header. It should be noted that
   applying the Base64 or Quoted-Printable encoding will render the
   message unreadable in current JUNET software.
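
   As a present-day illustration, the sketch below builds such a message
   with the email package of the Python 3 standard library. The sample
   text is an assumption of this example. Note that the library quotes
   the charset value and adds an explicit "7bit"
   Content-Transfer-Encoding header; neither Base64 nor Quoted-Printable
   is applied to the body.

```python
from email.mime.text import MIMEText

# Sample body: the greeting "konnichiwa" in Hiragana, written with
# escapes so that this source file itself stays in ASCII.
body = "\u3053\u3093\u306b\u3061\u306f"

# The third argument is the MIME charset name given in this memo.
msg = MIMEText(body, "plain", "iso-2022-jp")

print(msg["Content-Type"])
print(msg["Content-Transfer-Encoding"])

# The stored payload is 7-bit text that begins with ESC $ B.
print(repr(msg.get_payload()))
```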


Background Information

   The JUNET encoding was described in the JUNET User's Guide [JUNET]
   (JUNET Riyou No Tebiki Dai Ippan).

   The encoding is based on the particular usage of ISO 2022 [ISO2022]
   announced by 4/1. However, the escape sequence normally used for this
   announcement is not included in JUNET messages.
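
   The form "4/1" is the column/row notation of ISO 2022, in which the
   byte value is sixteen times the column plus the row. The sketch
   below shows the arithmetic; the function name from_column_row is
   hypothetical.

```python
def from_column_row(notation: str) -> int:
    """Convert column/row notation such as "4/1" to a 7-bit byte value."""
    column, row = (int(part) for part in notation.split("/"))
    if not (0 <= column <= 7 and 0 <= row <= 15):
        raise ValueError("column must be 0..7 and row must be 0..15")
    return column * 16 + row


# 4/1 is the byte 0x41, the letter "A"; 1/11 is ESC (decimal 27).
print(hex(from_column_row("4/1")), chr(from_column_row("4/1")))
print(from_column_row("1/11"))
```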


References

   [ISO2022] International Organization for Standardization (ISO),
   "Information processing -- ISO 7-bit and 8-bit coded character sets
   -- Code extension techniques", International Standard, 1986, Ref. No.
   ISO 2022-1986 (E).

   [JUNET] JUNET Riyou No Tebiki Sakusei Iin Kai (JUNET User's Guide
   Drafting Committee), "JUNET Riyou No Tebiki (Dai Ippan)" ("JUNET
   User's Guide (First Edition)"), February 1988.

   [MIME] Nathaniel Borenstein and Ned Freed, "MIME (Multipurpose
   Internet Mail Extensions): Mechanisms for Specifying and Describing
   the Format of Internet Message Bodies", Proposed (Internet) standard,
   June 1992, RFC 1341.

   [RFC822] David H. Crocker, "Standard for the Format of ARPA Internet
   Text Messages", Internet standard, August 1982, RFC 822.


Security Considerations

   Security considerations are not discussed in this memo.


Acknowledgements

   Many people assisted in drafting this document. The authors wish to
   thank in particular Akira Kato, Masahiro Sekiguchi and Ken'ichi
   Handa.


Authors' Addresses

   Jun Murai
   Keio University
   5322 Endo, Fujisawa
   Fujisawa 252 Japan

   Fax: +81 (466) 49-1101

   EMail: jun@wide.ad.jp


   Mark Crispin
   Panda Programming
   6158 Lariat Loop NE
   Bainbridge Island, WA 98110-2098
   USA

   Phone: +1 (206) 842-2385

   EMail: MRC@PANDA.COM


   Erik M. van der Poel
   A-105 Park Avenue
   4-4-10 Ohta, Kisarazu
   Chiba 292 Japan

   Phone: +81 (438) 22-5836
   Fax: +81 (438) 22-5837

   EMail: erik@poel.juice.or.jp