Network Working Group                                          Jun Murai
Internet Draft                                              Mark Crispin
                                                       Erik van der Poel
                                                        25th August 1992


           Japanese Character Encoding for Internet Messages


Status of this Memo

This document is an Internet Draft. Internet Drafts are working
documents of the Internet Engineering Task Force (IETF), its Areas,
and its Working Groups. Note that other groups may also distribute
working documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six
months. Internet Drafts may be updated, replaced, or obsoleted by
other documents at any time. It is not appropriate to use Internet
Drafts as reference material or to cite them other than as a "working
draft" or "work in progress."

Please check the I-D abstract listing contained in each Internet
Draft directory to learn the current status of this or any other
Internet Draft.

This draft document will be submitted to the RFC editor as an
informational document. This document will expire before 2nd March
1993. Distribution of this memo is unlimited. Please send comments
to ietf-822@dimacs.rutgers.edu.


Introduction

This document describes the encoding used in plain text electronic
mail and network news in several Japanese networks. It was first
specified by and used in JUNET [JUNET]. The encoding is now also
widely used in Japanese IP communities.

This document provides a name for the encoding which is intended to
be used in the "charset" parameter field of MIME [MIME] messages.

This document only describes the encoding of plain text. The encoding
of other subtypes of text, such as rich text, is not discussed here.


Murai et al              Expires 2nd March 1993                 [Page 1]

Internet Draft                                  Updated 25th August 1992


Informal Description

The message body starts in ASCII, and switches to Japanese characters
through an escape sequence. For example, the escape sequence ESC $ B
(three bytes) indicates that the bytes following this escape sequence
are Japanese characters, which are encoded in two bytes each. To
switch back to ASCII, the escape sequence ESC ( B is used.
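As a rough illustration of the switching described above, the
following Python sketch encodes a short mixed string and prints the
resulting bytes with each escape byte spelled out. It assumes that
Python's built-in "iso2022_jp" codec behaves like the encoding
described in this memo; the codec, the helper name show_escapes, and
the sample string are illustrative and are not part of this memo.

```python
ESC = 0x1B

def show_escapes(data: bytes) -> str:
    """Render encoded bytes with each ESC byte spelled out as 'ESC'."""
    parts = []
    for byte in data:
        parts.append("ESC" if byte == ESC else chr(byte))
    return " ".join(parts)

# Sample text (an assumption for illustration): ASCII, then three
# Japanese characters, then ASCII again.
text = "abc\u65e5\u672c\u8a9edef"
encoded = text.encode("iso2022_jp")

# Every byte of the encoded form stays within the 7-bit range.
assert all(byte < 0x80 for byte in encoded)

print(show_escapes(encoded))
```

The printed form makes the structure visible: a three-byte escape
sequence, then two bytes per Japanese character, then the escape
sequence that returns to ASCII.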

The following table gives the escape sequences and the character sets
used in JUNET messages.

      ESC ( B         ASCII
      ESC ( J         JIS X 0201-1976 (left-hand part)
      ESC $ @         JIS X 0208-1978
      ESC $ B         JIS X 0208-1983

The left-hand part of JIS X 0201-1976 is identical to ASCII except
for backslash (\) and tilde (~). The backslash is replaced by the Yen
sign, and the tilde is replaced by the macron (overline). This set is
Japan's national variant of ISO 646.
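The two differences from ASCII can be sketched as a small lookup
table. The Unicode code points chosen for the Yen sign and the
overline (U+00A5 and U+203E) and the names JIS_ROMAN_DIFFERENCES and
decode_jis_roman are assumptions of this sketch; this memo names only
the characters themselves.

```python
# Unicode stand-ins for the two characters that differ from ASCII
# (an assumption of this sketch; the memo names only the glyphs).
JIS_ROMAN_DIFFERENCES = {
    0x5C: "\u00a5",  # backslash position holds the Yen sign
    0x7E: "\u203e",  # tilde position holds the macron (overline)
}

def decode_jis_roman(data: bytes) -> str:
    """Decode bytes from the left-hand part of JIS X 0201."""
    chars = []
    for byte in data:
        if byte > 0x7F:
            raise ValueError("the left-hand part is 7-bit only")
        # All other positions are identical to ASCII.
        chars.append(JIS_ROMAN_DIFFERENCES.get(byte, chr(byte)))
    return "".join(chars)
```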

The JIS X 0208 character sets consist of Kanji, Hiragana, Katakana,
and some other symbols and characters. Each character takes up two
bytes.

For further details about the JIS Japanese national character set
standards, refer to the JIS standards themselves. For further
information about the escape sequences, see ISO 2022 [ISO2022].

If there are JIS X 0208 characters on a line, there must be a switch
to ASCII or to the left-hand part of JIS X 0201 before the end of the
line (i.e., before the CRLF). This means that the next line starts in
the character set that was switched to before the end of the previous
line. Other restrictions are given in the Formal Description below.


Formal Description

This section provides a formal description of the JUNET encoding. In
the event that this description is not consistent with the above
informal description, this formal description shall take precedence.

The notational conventions used here are identical to those used in
RFC 822 [RFC822].

The * (asterisk) convention is as follows:

      l*m something

meaning at least l and at most m somethings, with l and m taking
default values of 0 and infinity, respectively.


line                  = *text *1( *segment single-byte-seq *text ) CRLF

segment               = single-byte-segment / double-byte-segment

single-byte-segment   = single-byte-seq 1*text

double-byte-segment   = double-byte-seq 1*( one-of-94 one-of-94 )

single-byte-seq       = ESC "(" ( "B" / "J" )

double-byte-seq       = ESC "$" ( "@" / "B" )

                                                  ; ( Octal, Decimal.)

ESC                   = <ISO 2022 ESC, escape>    ; (    33,     27.)

one-of-94             = <any char in 94-char set> ; (41-176, 33.-126.)

CHAR                  = <any ASCII character>     ; ( 0-177,  0.-127.)

text                  = <any CHAR, including bare
                         CR & bare LF, but NOT
                         including CRLF>


Additional restrictions that are difficult to express in the above
notation are as follows.

Adjacent segments should have different escape sequences. For
example, the following is not recommended:

      ESC $ B .... ESC $ B ....
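The grammar and the end-of-line rule can be approximated by a small
line checker. The following Python sketch is one possible reading, not
a normative implementation: the names check_line and
is_double_byte_run are hypothetical, the line is passed without its
CRLF, and the checker is deliberately stricter than the "text" rule in
that it rejects any ESC byte that does not begin one of the four
escape sequences. It does not enforce the recommendation about
adjacent segments.

```python
import re

ESC = b"\x1b"
DOUBLE_BYTE_SEQS = (b"\x1b$@", b"\x1b$B")
# Matches single-byte-seq or double-byte-seq from the grammar.
ESCAPE_SEQ = re.compile(rb"\x1b(?:\([BJ]|\$[@B])")

def is_double_byte_run(chunk: bytes) -> bool:
    """True for 1*( one-of-94 one-of-94 ): pairs of bytes in 33-126."""
    return (
        len(chunk) >= 2
        and len(chunk) % 2 == 0
        and all(0x21 <= byte <= 0x7E for byte in chunk)
    )

def check_line(line: bytes) -> bool:
    """Check one line, given without its terminating CRLF."""
    if any(byte > 0x7F for byte in line):
        return False
    in_double_byte = False  # each line starts in a single-byte set
    pos = 0
    for match in ESCAPE_SEQ.finditer(line):
        chunk = line[pos:match.start()]
        if in_double_byte:
            if not is_double_byte_run(chunk):
                return False
        elif ESC in chunk:
            return False  # stricter than the grammar: unknown escape
        in_double_byte = match.group() in DOUBLE_BYTE_SEQS
        pos = match.end()
    # The line must end in a single-byte set, with no stray ESC.
    return not in_double_byte and ESC not in line[pos:]
```

A line that enters a double-byte set and never switches back, or whose
double-byte segment holds an odd number of bytes, is rejected by this
sketch.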


MIME Considerations

The name given to the JUNET character encoding is "ISO-2022-JP". This
name is intended to be used in MIME messages as follows:

      Content-Type: text/plain; charset=iso-2022-jp

The JUNET encoding is already in 7-bit form, so it is not necessary
to use a Content-Transfer-Encoding header. It should be noted that
applying the Base64 or Quoted-Printable encoding will render the
message unreadable in current JUNET software.
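As a hedged illustration of the labelling above, the following Python
sketch builds such a message with the standard library's
email.message.EmailMessage class. The helper name make_junet_message
and the sample body are assumptions of this sketch. Note that this
library always writes a Content-Transfer-Encoding header; the sketch
requests "7bit", which leaves the body unencoded.

```python
from email.message import EmailMessage

def make_junet_message(body: str) -> EmailMessage:
    """Build a text/plain message labelled charset=iso-2022-jp."""
    msg = EmailMessage()
    # cte="7bit" asks the library to leave the body as 7-bit data,
    # without applying Base64 or Quoted-Printable.
    msg.set_content(body, charset="iso-2022-jp", cte="7bit")
    return msg

# Sample body (an assumption for illustration): three Japanese
# characters followed by a newline.
msg = make_junet_message("\u65e5\u672c\u8a9e\n")
print(msg["Content-Type"])
print(msg["Content-Transfer-Encoding"])
```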


Background Information

The JUNET encoding was described in the JUNET User's Guide [JUNET]
(JUNET Riyou No Tebiki Dai Ippan).

The encoding is based on the particular usage of ISO 2022 [ISO2022]
announced by 4/1. However, the escape sequence normally used for this
announcement is not included in JUNET messages.


References

[ISO2022] International Organization for Standardization (ISO),
"Information processing -- ISO 7-bit and 8-bit coded character sets
-- Code extension techniques", International Standard, Ref. No.
ISO 2022-1986 (E), 1986.

[JUNET] JUNET Riyou No Tebiki Sakusei Iin Kai (JUNET User's Guide
Drafting Committee), "JUNET Riyou No Tebiki (Dai Ippan)" ("JUNET
User's Guide (First Edition)"), February 1988.

[MIME] Nathaniel Borenstein and Ned Freed, "MIME (Multipurpose
Internet Mail Extensions): Mechanisms for Specifying and Describing
the Format of Internet Message Bodies", Proposed (Internet) standard,
RFC 1341, June 1992.

[RFC822] David H. Crocker, "Standard for the Format of ARPA Internet
Text Messages", Internet standard, RFC 822, August 1982.


Security Considerations

Security considerations are not discussed in this memo.


Acknowledgements

Many people assisted in drafting this document. The authors wish to
thank in particular Akira Kato, Masahiro Sekiguchi and Ken'ichi
Handa.


Authors' Addresses

Jun Murai
Keio University
5322 Endo, Fujisawa
Fujisawa 252 Japan

Fax: +81 (466) 49-1101

EMail: jun@wide.ad.jp


Mark Crispin
Panda Programming
6158 Lariat Loop NE
Bainbridge Island, WA 98110-2098
USA

Phone: +1 (206) 842-2385

EMail: MRC@PANDA.COM


Erik M. van der Poel
A-105 Park Avenue
4-4-10 Ohta, Kisarazu
Chiba 292 Japan

Phone: +81 (438) 22-5836
Fax: +81 (438) 22-5837

EMail: erik@poel.juice.or.jp