draft-ietf-822ext-iso2022jp-00.txt

Network Working Group                                          Jun Murai
Internet Draft                                              Mark Crispin
                                                       Erik van der Poel
                                                        25th August 1992


           Japanese Character Encoding for Internet Messages


Status of this Memo

   This document is an Internet Draft.  Internet Drafts are working
   documents of the Internet Engineering Task Force (IETF), its Areas,
   and its Working Groups. Note that other groups may also distribute
   working documents as Internet Drafts.

   Internet Drafts are draft documents valid for a maximum of six
   months. Internet Drafts may be updated, replaced, or obsoleted by
   other documents at any time.  It is not appropriate to use Internet
   Drafts as reference material or to cite them other than as a "working
   draft" or "work in progress."

   Please check the I-D abstract listing contained in each Internet
   Draft directory to learn the current status of this or any other
   Internet Draft.

   This draft document will be submitted to the RFC editor as an
   informational document.  This document will expire before 2nd March
   1993.  Distribution of this memo is unlimited.  Please send comments
   to ietf-822@dimacs.rutgers.edu.


Introduction

   This document describes the encoding used in plain text electronic
   mail and network news in several Japanese networks. It was first
   specified by and used in JUNET [JUNET]. The encoding is now also
   widely used in Japanese IP communities.

   This document provides a name for the encoding, intended for use in
   the "charset" parameter of MIME [MIME] messages.

   This document describes only the encoding of plain text. The encoding
   of other text subtypes, such as rich text, is not discussed here.


Murai et al              Expires 2nd March 1993                 [Page 1]

Internet Draft                                  Updated 25th August 1992


Informal Description

   The message body starts in ASCII and switches to Japanese characters
   by means of an escape sequence. For example, the three-byte escape
   sequence ESC $ B indicates that the bytes following it are Japanese
   characters, each encoded in two bytes. To switch back to ASCII, the
   escape sequence ESC ( B is used.
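
   The switching can be observed with an existing implementation. The
   sketch below assumes Python 3 and uses its standard-library
   "iso2022_jp" codec, which is not defined by this memo; the sample
   string ("A", HIRAGANA LETTER A, "B") is arbitrary.

```python
# Sketch: Python's built-in "iso2022_jp" codec, used only to observe the
# ESC $ B ... ESC ( B switching described above.
encoded = "A\u3042B".encode("iso2022_jp")  # "A", HIRAGANA LETTER A, "B"

# Shape: b"A", then ESC $ B, two bytes for the kana, ESC ( B, then b"B".
print(encoded)
print(encoded.decode("iso2022_jp"))
```

   The body starts in ASCII, switches with ESC $ B for the single
   Japanese character, and returns to ASCII with ESC ( B before "B".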

   The following table gives the escape sequences and the character sets
   used in JUNET messages.

           ESC ( B         ASCII
           ESC ( J         JIS X 0201-1976 (left-hand part)
           ESC $ @         JIS X 0208-1978
           ESC $ B         JIS X 0208-1983
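
   The table lends itself to a lookup from escape sequence to character
   set. The following is a minimal sketch, assuming Python 3; the names
   CHARSETS and split_runs are hypothetical. It splits encoded bytes into
   runs, each tagged with the set in effect, without mapping the bytes to
   characters.

```python
import re
from typing import List, Tuple

# Escape sequences from the table above, mapped to the set each selects.
CHARSETS = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JIS X 0201-1976 (left-hand part)",
    b"\x1b$@": "JIS X 0208-1978",
    b"\x1b$B": "JIS X 0208-1983",
}
ESCAPE_RE = re.compile(rb"\x1b(?:\([BJ]|\$[@B])")


def split_runs(data: bytes, start: str = "ASCII") -> List[Tuple[str, bytes]]:
    """Split encoded bytes into (character set, bytes) runs.

    `start` is the set in effect before the first escape sequence: ASCII
    at the start of a body, otherwise whatever the previous line ended in.
    """
    runs = []
    charset = start
    pos = 0
    for match in ESCAPE_RE.finditer(data):
        if match.start() > pos:
            runs.append((charset, data[pos:match.start()]))
        charset = CHARSETS[match.group()]
        pos = match.end()
    if pos < len(data):
        runs.append((charset, data[pos:]))
    return runs


print(split_runs(b'Hi \x1b$B$"\x1b(B!'))
```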

   The left-hand part of JIS X 0201-1976 is identical to ASCII except
   for backslash (\) and tilde (~): the backslash is replaced by the Yen
   sign, and the tilde by a macron (overline). This set is Japan's
   national variant of ISO 646.
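
   Because only two code positions differ, decoding the set selected by
   ESC ( J amounts to ASCII plus two substitutions. A minimal sketch,
   assuming Python 3 and assuming the Unicode code points U+00A5 (YEN
   SIGN) and U+203E (OVERLINE) as targets; the helper name
   decode_jis_roman is hypothetical.

```python
# Hypothetical helper: decode bytes known to be in the left-hand part of
# JIS X 0201-1976 (selected by ESC ( J). Only 0x5C and 0x7E differ from
# ASCII; the Unicode targets below are an assumption, not part of this memo.
JIS_ROMAN_DIFFERENCES = {
    0x5C: "\u00a5",  # YEN SIGN instead of backslash
    0x7E: "\u203e",  # OVERLINE (macron) instead of tilde
}


def decode_jis_roman(data: bytes) -> str:
    chars = []
    for byte in data:
        if byte > 0x7F:
            raise ValueError(f"not a 7-bit byte: {byte:#04x}")
        chars.append(JIS_ROMAN_DIFFERENCES.get(byte, chr(byte)))
    return "".join(chars)


print(decode_jis_roman(b"C:\\dir~"))
```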

   The JIS X 0208 character sets consist of Kanji, Hiragana, Katakana,
   and other symbols and characters. Each character is encoded in two
   bytes, each byte in the range 33-126 decimal (the "one-of-94" range
   of the Formal Description).
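
   A double-byte run can therefore be split into characters mechanically.
   A minimal sketch, assuming Python 3 (split_double_byte is a
   hypothetical name); the sample bytes are arbitrary in-range values and
   no particular Kanji or kana is implied.

```python
from typing import List


def split_double_byte(payload: bytes) -> List[bytes]:
    """Split a JIS X 0208 run into two-byte characters.

    Each byte must lie in the 94-character range 0x21-0x7E (33-126).
    """
    if len(payload) % 2 != 0:
        raise ValueError("double-byte run has an odd number of bytes")
    for byte in payload:
        if not 0x21 <= byte <= 0x7E:
            raise ValueError(f"byte {byte:#04x} is outside the 94-char range")
    return [payload[i:i + 2] for i in range(0, len(payload), 2)]


print(split_double_byte(b'$"$$'))
```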

   For further details about the JIS Japanese national character set
   standards, refer to the JIS standards themselves. For further
   information about the escape sequences, see ISO 2022 [ISO2022].

   If there are JIS X 0208 characters on a line, there must be a switch
   to ASCII or to the left-hand part of JIS X 0201 before the end of the
   line (i.e. before the CRLF). This means that the next line starts in
   the character set that was switched to before the end of the previous
   line. Other restrictions are given in the Formal Description below.
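
   The end-of-line rule can be checked by looking at the last escape
   sequence on each line. A minimal sketch, assuming Python 3 and that
   lines are given as bytes with their CRLF; ends_in_single_byte is a
   hypothetical name.

```python
import re

ESCAPE_RE = re.compile(rb"\x1b(?:\([BJ]|\$[@B])")
SINGLE_BYTE = {b"\x1b(B", b"\x1b(J"}  # ASCII, JIS X 0201 left-hand part


def ends_in_single_byte(line: bytes) -> bool:
    """True if the line's last escape sequence, if any, selects ASCII or
    the left-hand part of JIS X 0201, as required before the CRLF."""
    escapes = ESCAPE_RE.findall(line.rstrip(b"\r\n"))
    return not escapes or escapes[-1] in SINGLE_BYTE


print(ends_in_single_byte(b'\x1b$B$"\x1b(J\r\n'))
```

   A line with no escape sequences passes, since it stays in whatever
   single-byte set the previous line switched to.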


Formal Description

   This section provides a formal description of the JUNET encoding. In
   the event that this description is not consistent with the above
   informal description, this formal description shall take precedence.

   The notational conventions used here are identical to those used in
   RFC 822 [RFC822].

   The * (asterisk) convention is as follows:

           l*m something

   meaning at least l and at most m somethings, with l and m taking
   default values of 0 and infinity, respectively.
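
   For readers used to regular expressions, "l*m something" corresponds
   to the quantifier {l,m}; in particular "*1( ... )" means zero or one
   occurrence. A minimal sketch of this correspondence, assuming Python 3
   (the helper name repeat is hypothetical):

```python
import re
from typing import Optional


def repeat(element: str, l: int = 0, m: Optional[int] = None) -> str:
    """Translate the RFC 822 form 'l*m element' into a regex quantifier.

    l defaults to 0 and m to infinity (None), as in the convention above.
    """
    upper = "" if m is None else str(m)
    return "(?:" + element + "){" + str(l) + "," + upper + "}"


print(repeat("ab", 1, 2))
```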


   line                = *text *1( *segment single-byte-seq *text ) CRLF

   segment             = single-byte-segment / double-byte-segment

   single-byte-segment = single-byte-seq 1*text

   double-byte-segment = double-byte-seq 1*( one-of-94 one-of-94 )

   single-byte-seq     = ESC "(" ( "B" / "J" )

   double-byte-seq     = ESC "$" ( "@" / "B" )

                                                    ; ( Octal, Decimal.)

   ESC                 = <ISO 2022 ESC, escape>     ; (    33,      27.)

   one-of-94           = <any char in 94-char set>  ; (41-176, 33.-126.)

   CHAR                = <any ASCII character>      ; ( 0-177,  0.-127.)

   text                = <any CHAR, including bare
                          CR & bare LF, but NOT
                          including CRLF>
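
   The "line" rule can be transcribed into a regular expression over
   bytes. The sketch below assumes Python 3, and additionally assumes
   that "text" excludes ESC (octal 33); the grammar above does not state
   this, but without it a text run could absorb an escape sequence. The
   names LINE_RE and is_valid_line are hypothetical.

```python
import re

# Assumption: 'text' excludes ESC (0x1B) so that segments are unambiguous.
TEXT = rb"(?:[\x00-\x0c\x0e-\x1a\x1c-\x7f]|\r(?!\n))"  # any CHAR but ESC, CRLF
SINGLE_BYTE_SEQ = rb"\x1b\([BJ]"
DOUBLE_BYTE_SEQ = rb"\x1b\$[@B]"
ONE_OF_94 = rb"[\x21-\x7e]"

SINGLE_BYTE_SEGMENT = SINGLE_BYTE_SEQ + TEXT + rb"+"
DOUBLE_BYTE_SEGMENT = DOUBLE_BYTE_SEQ + rb"(?:" + ONE_OF_94 + ONE_OF_94 + rb")+"
SEGMENT = rb"(?:" + SINGLE_BYTE_SEGMENT + rb"|" + DOUBLE_BYTE_SEGMENT + rb")"

# line = *text *1( *segment single-byte-seq *text ) CRLF
LINE_RE = re.compile(
    TEXT + rb"*"
    + rb"(?:" + SEGMENT + rb"*" + SINGLE_BYTE_SEQ + TEXT + rb"*" + rb")?"
    + rb"\r\n"
)


def is_valid_line(line: bytes) -> bool:
    return LINE_RE.fullmatch(line) is not None


print(is_valid_line(b'abc\x1b$B$"$$\x1b(Jxyz\r\n'))
```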


   Additional restrictions that are difficult to express in the above
   grammar are as follows.

   Adjacent segments should have different escape sequences. For
   example, the following is not recommended:

           ESC $ B .... ESC $ B ....
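
   Since every segment begins with an escape sequence, this
   recommendation can be checked by comparing consecutive escape
   sequences on a line. A minimal sketch, assuming Python 3 and that it
   is applied to one line at a time; redundant_escapes is a hypothetical
   name.

```python
import re
from typing import List

ESCAPE_RE = re.compile(rb"\x1b(?:\([BJ]|\$[@B])")


def redundant_escapes(line: bytes) -> List[int]:
    """Return offsets of escape sequences identical to the preceding one,
    i.e. segments adjacent to a segment with the same escape sequence."""
    offsets = []
    previous = None
    for match in ESCAPE_RE.finditer(line):
        if match.group() == previous:
            offsets.append(match.start())
        previous = match.group()
    return offsets


print(redundant_escapes(b'\x1b$B$"\x1b$B$$\x1b(B'))
```

   ESC never occurs inside a double-byte run (those bytes are 33-126),
   so the scan cannot mistake character data for an escape sequence.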


MIME Considerations

   The name given to the JUNET character encoding is "ISO-2022-JP". This
   name is intended to be used in MIME messages as follows:

           Content-Type: text/plain; charset=iso-2022-jp
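
   As an illustration of the label in use, the sketch below assumes
   Python 3 and its standard email package (email.mime.text.MIMEText);
   the sample text is arbitrary.

```python
from email.mime.text import MIMEText

# Sketch: label a plain-text body with the charset name given above.
msg = MIMEText("\u65e5\u672c\u8a9e", "plain", "iso-2022-jp")

print(msg["Content-Type"])
print(msg.get_payload())  # the ISO-2022-JP bytes, as 7-bit text
```

   The package may quote the parameter value (charset="iso-2022-jp"),
   which MIME treats as equivalent, and may also add a
   Content-Transfer-Encoding of 7bit, which is the MIME default.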

   The JUNET encoding is already in 7-bit form, so it is not necessary
   to use a Content-Transfer-Encoding header. It should be noted that
   applying the Base64 or Quoted-Printable encoding will render the
   message unreadable in current JUNET software.
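
   That the encoded form is already 7-bit can be checked directly. A
   minimal sketch, assuming Python 3 and its standard-library
   "iso2022_jp" codec; is_seven_bit is a hypothetical name.

```python
def is_seven_bit(data: bytes) -> bool:
    """True if every byte is in the 7-bit range 0x00-0x7F."""
    return all(byte < 0x80 for byte in data)


# "Hello " followed by five hiragana, then CRLF.
body = "Hello \u3053\u3093\u306b\u3061\u306f\r\n".encode("iso2022_jp")
print(is_seven_bit(body))
```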


Background Information

   The JUNET encoding was described in the JUNET User's Guide [JUNET]
   (JUNET Riyou No Tebiki Dai Ippan).

   The encoding is based on the particular usage of ISO 2022 [ISO2022]
   announced by 4/1, i.e. by the announcer escape sequence ESC 2/0 4/1
   (in ISO 2022 column/row notation). However, the escape sequence
   normally used for this announcement is not included in JUNET
   messages.
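
   The notation "4/1" is ISO 2022 column/row notation: the column is the
   high four bits of the byte and the row the low four bits, so 4/1 is
   0x41 (the letter A). A small sketch of the conversion, assuming
   Python 3; col_row is a hypothetical name.

```python
def col_row(notation: str) -> int:
    """Convert ISO 2022 column/row notation such as "4/1" to a byte value."""
    column, row = (int(part) for part in notation.split("/"))
    if not (0 <= column <= 15 and 0 <= row <= 15):
        raise ValueError(f"not a valid column/row pair: {notation}")
    return column * 16 + row


# ESC ( B written in column/row notation.
print(bytes(col_row(n) for n in "1/11 2/8 4/2".split()))
```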


References

   [ISO2022] International Organization for Standardization (ISO),
   "Information processing -- ISO 7-bit and 8-bit coded character sets
   -- Code extension techniques", International Standard, 1986, Ref. No.
   ISO 2022-1986 (E)

   [JUNET] JUNET Riyou No Tebiki Sakusei Iin Kai (JUNET User's Guide
   Drafting Committee), "JUNET Riyou No Tebiki (Dai Ippan)" ("JUNET
   User's Guide (First Edition)"), February 1988

   [MIME] Nathaniel Borenstein and Ned Freed, "MIME (Multipurpose
   Internet Mail Extensions): Mechanisms for Specifying and Describing
   the Format of Internet Message Bodies", Proposed (Internet) standard,
   June 1992, RFC 1341

   [RFC822] David H. Crocker, "Standard for the Format of ARPA Internet
   Text Messages", Internet standard, August 1982, RFC 822


Security Considerations

   Security considerations are not discussed in this memo.


Acknowledgements

   Many people assisted in drafting this document. The authors wish to
   thank in particular Akira Kato, Masahiro Sekiguchi and Ken'ichi
   Handa.


Authors' Addresses


   Jun Murai
   Keio University
   5322 Endo, Fujisawa
   Fujisawa 252 Japan

   Fax: +81 (466) 49-1101

   EMail: jun@wide.ad.jp


   Mark Crispin
   Panda Programming
   6158 Lariat Loop NE
   Bainbridge Island, WA 98110-2098
   USA

   Phone: +1 (206) 842-2385

   EMail: MRC@PANDA.COM


   Erik M. van der Poel
   A-105 Park Avenue
   4-4-10 Ohta, Kisarazu
   Chiba 292 Japan

   Phone: +81 (438) 22-5836
   Fax:   +81 (438) 22-5837

   EMail: erik@poel.juice.or.jp

