* Note:
* The name is likely to become "ISO-2022-JP" rather than "junet-code".
--

Network Working Group                                          [authors]
Internet Draft                                           [organizations]
                                                         10th April 1992


        JUNET Japanese Character Encoding for Internet Messages


Status of this Memo

   This draft document will be submitted to the RFC editor as an
   informational document.  This is a working document only; it should
   neither be cited nor quoted in any formal document.  This document
   will expire before 10th October 1992.  Distribution of this memo is
   unlimited.  Please send comments to net-char@sra.co.jp.


Introduction

   This document describes the encoding used in plain text electronic
   mail and network news in several Japanese networks. It was first
   specified by and used in JUNET [JUNET]. The encoding is now also
   widely used in Japanese IP communities and the Japanese BITNET
   community (BITNETJP).

   This document provides a name for the encoding which is intended to
   be used in the "charset" parameter field of MIME [MIME] messages.

   This document only describes the encoding of plain text. The encoding
   of other subtypes of text, such as rich text, is not discussed here.


Informal Description

   The message body starts in ASCII, and switches to Japanese characters
   through an escape sequence. For example, the escape sequence ESC $ B
   (three bytes) indicates that the bytes following this escape sequence
   are Japanese characters, which are encoded in two bytes each.  To
   switch back to ASCII, the escape sequence ESC ( B is used.

   The following table gives the escape sequences and the character sets
   used in JUNET messages.

           ESC ( B         ASCII
           ESC ( J         JIS X 0201-1976 (left-hand part)
           ESC $ @         JIS X 0208-1978
           ESC $ B         JIS X 0208-1983
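
   The switching behaviour described above can be illustrated with a
   minimal Python sketch. It is not part of the JUNET specification; the
   names DESIGNATIONS and split_segments are hypothetical and chosen
   only for this example. The sketch assumes that every escape sequence
   is exactly three bytes long, as in the table above.

```python
# Escape sequences from the table above, mapped to the set they select.
DESIGNATIONS = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JIS X 0201-1976 (left-hand part)",
    b"\x1b$@": "JIS X 0208-1978",
    b"\x1b$B": "JIS X 0208-1983",
}


def split_segments(data: bytes):
    """Split a message body into (character set name, raw bytes) runs."""
    current = "ASCII"  # the message body starts in ASCII
    segments = []
    start = 0
    i = 0
    while i < len(data):
        if data[i] == 0x1B:  # ESC
            seq = data[i:i + 3]
            if seq not in DESIGNATIONS:
                raise ValueError(f"unknown escape sequence at offset {i}: {seq!r}")
            if i > start:
                segments.append((current, data[start:i]))
            current = DESIGNATIONS[seq]
            i += 3
            start = i
        else:
            i += 1
    if start < len(data):
        segments.append((current, data[start:]))
    return segments
```

   Scanning byte by byte for ESC is safe here because the bytes of a
   two-byte character lie in the range 33 to 126 decimal, so ESC (27)
   can never occur inside a character.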


[authors]                                                       [Page 1]

Internet Draft                                           10th April 1992


   The left-hand part of JIS X 0201-1976 is identical to ASCII except
   for backslash (\) and tilde (~). The backslash is replaced by the Yen
   sign, and the tilde is replaced by macron (overline). This set is
   Japan's national variant of ISO 646.
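
   As an illustration only, the two differences from ASCII might be
   applied when converting such a segment to Unicode. The function name
   decode_jis_roman is hypothetical, and the choice of U+00A5 (YEN SIGN)
   and U+203E (OVERLINE) as target code points is an assumption of this
   sketch rather than something this memo specifies.

```python
def decode_jis_roman(data: bytes) -> str:
    """Decode bytes written in the left-hand part of JIS X 0201-1976."""
    # Every position is the same as ASCII except the two remapped below.
    text = data.decode("ascii")
    return text.translate({
        0x5C: 0x00A5,  # backslash position holds the Yen sign
        0x7E: 0x203E,  # tilde position holds the macron (overline)
    })
```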

   The JIS X 0208 character sets consist of Kanji, Hiragana, Katakana
   and some other symbols and characters. Each character takes up two
   bytes.

   For further details about the JIS Japanese national character set
   standards, refer to the JIS standards themselves. For further
   information about the escape sequences, see ISO 2022 [ISO2022].

   If there are JIS X 0208 characters on a line, there must be a switch
   to ASCII or to the left-hand part of JIS X 0201 before the end of the
   line (i.e. before the CRLF). This means that the next line starts in
   the character set that was switched to before the end of the previous
   line. Other restrictions are given in the Formal Description below.
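
   The end-of-line rule can be checked mechanically. The following is a
   rough sketch under the assumption that lines are separated by CRLF
   and that only the four escape sequences listed earlier occur; the
   function name lines_ending_in_double_byte is hypothetical.

```python
SINGLE_BYTE_SEQS = (b"\x1b(B", b"\x1b(J")
DOUBLE_BYTE_SEQS = (b"\x1b$@", b"\x1b$B")


def lines_ending_in_double_byte(body: bytes) -> list:
    """Return 1-based numbers of lines that end with a JIS X 0208 set active."""
    bad = []
    double = False  # the message body starts in ASCII
    for number, line in enumerate(body.split(b"\r\n"), start=1):
        i = 0
        while i < len(line):
            seq = line[i:i + 3]
            if seq in DOUBLE_BYTE_SEQS:
                double, i = True, i + 3
            elif seq in SINGLE_BYTE_SEQS:
                double, i = False, i + 3
            else:
                i += 1
        # The active set carries over to the next line, so a line that
        # never switches back is reported here.
        if double:
            bad.append(number)
    return bad
```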


Formal Description

   This section provides a formal description of the JUNET encoding. In
   the event that this description is not consistent with the above
   informal description, this formal description shall take precedence.

   The notational conventions used here are identical to those used in
   RFC 822 [RFC822].

   The * (asterisk) convention is as follows:

           l*m something

   meaning at least l and at most m somethings, with l and m taking
   default values of 0 and infinity, respectively.


   line                = *text *1( *segment single-byte-seq *text ) CRLF

   segment             = single-byte-segment / double-byte-segment

   single-byte-segment = single-byte-seq 1*text

   double-byte-segment = double-byte-seq 1*( one-of-94 one-of-94 )

   single-byte-seq     = ESC "(" ( "B" / "J" )

   double-byte-seq     = ESC "$" ( "@" / "B" )


                                                    ; ( Octal, Decimal.)

   ESC                 = <ISO 2022 ESC, escape>     ; (    33,      27.)

   one-of-94           = <any char in 94-char set>  ; (41-176, 33.-126.)

   CHAR                = <any ASCII character>      ; ( 0-177,  0.-127.)

   text                = <any CHAR, including bare  ; => atoms, specials,
                          CR & bare LF, but NOT     ;  comments and
                          including CRLF>           ;  quoted-strings are
                                                    ;  NOT recognized.
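
   For readers who prefer an executable form, the grammar might be
   approximated by a regular expression, as sketched below. This is an
   interpretation, not a normative definition. It makes one assumption
   that the grammar itself does not state: ESC is excluded from "text"
   so that escape sequences cannot be read as ordinary characters. All
   names are hypothetical.

```python
import re

# text: any 7-bit character except ESC (assumption), never starting a CRLF.
TEXT = rb"(?:(?!\r\n)[\x00-\x1a\x1c-\x7f])"
SB_SEQ = rb"\x1b\([BJ]"            # single-byte-seq
DB_SEQ = rb"\x1b\$[@B]"            # double-byte-seq
SB_SEGMENT = SB_SEQ + TEXT + rb"+"
DB_SEGMENT = DB_SEQ + rb"(?:[\x21-\x7e]{2})+"
SEGMENT = rb"(?:" + SB_SEGMENT + rb"|" + DB_SEGMENT + rb")"

# line = *text *1( *segment single-byte-seq *text ) CRLF
LINE = re.compile(
    TEXT + rb"*(?:" + SEGMENT + rb"*" + SB_SEQ + TEXT + rb"*)?\r\n"
)


def is_valid_line(line: bytes) -> bool:
    """Report whether one CRLF-terminated line matches the sketch grammar."""
    return LINE.fullmatch(line) is not None
```

   A line that contains two-byte characters but does not return to a
   single-byte set before the CRLF, or that contains an odd number of
   bytes in a double-byte segment, is rejected by this sketch.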


   One additional restriction that is difficult to express in the above
   grammar is as follows.

   Adjacent segments should have different escape sequences. For
   example, the following is not recommended:

           ESC $ B .... ESC $ B ....
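
   A checker for this recommendation could look roughly like the
   following sketch. The function name redundant_escapes is
   hypothetical, and the sketch only compares each escape sequence with
   the one immediately before it on the same line.

```python
import re

# Any of the four escape sequences used in JUNET messages.
ESCAPE = re.compile(rb"\x1b(?:\([BJ]|\$[@B])")


def redundant_escapes(line: bytes) -> list:
    """Return offsets of escape sequences that repeat the previous one."""
    offsets = []
    previous = None
    for match in ESCAPE.finditer(line):
        if match.group() == previous:
            offsets.append(match.start())
        previous = match.group()
    return offsets
```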


MIME Considerations

   The name given to the JUNET character encoding is "junet-code". This
   name is intended to be used in MIME messages as follows:

           Content-Type: text/plain; charset=junet-code

   The JUNET encoding is already in 7-bit form, so the correct "transfer
   encoding" to use is:

           Content-Transfer-Encoding: 7bit

   It should be noted that applying the Base64 or Quoted-Printable
   encoding will render the message unreadable in current JUNET
   software.
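
   The 7-bit property can be demonstrated with a short sketch. It
   assumes that the "iso2022_jp" codec in the Python standard library
   produces the same byte stream as the encoding described here, which
   this memo does not itself guarantee. The helper is_seven_bit is
   hypothetical, and the header lines are assembled by hand purely for
   illustration.

```python
import base64


def is_seven_bit(data: bytes) -> bool:
    """Report whether every byte fits in 7 bits."""
    return all(byte < 0x80 for byte in data)


# Assumption: Python's "iso2022_jp" codec stands in for the JUNET encoding.
body = "JUNET \u65e5\u672c\u8a9e\r\n".encode("iso2022_jp")

headers = (
    "Content-Type: text/plain; charset=junet-code\r\n"
    "Content-Transfer-Encoding: 7bit\r\n"
)
message = headers.encode("ascii") + b"\r\n" + body

# Base64 would replace the escape sequences with unrelated characters,
# which is why software expecting the raw encoding cannot read the result.
wrapped = base64.b64encode(body)
```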


Background Information

   The JUNET encoding was described in the JUNET User's Guide [JUNET]
   (JUNET Riyou No Tebiki Dai Ippan).

   The encoding is based on the particular usage of ISO 2022 [ISO2022]
   announced by 4/1. However, the escape sequence normally used for this
   announcement is not included in JUNET messages.


References

   [ISO2022] International Organization for Standardization (ISO),
   "Information processing -- ISO 7-bit and 8-bit coded character sets
   -- Code extension techniques", International Standard, 1986, Ref.
   No. ISO 2022-1986 (E)

   [JUNET] JUNET Riyou No Tebiki Sakusei Iin Kai (JUNET User's Guide
   Drafting Committee), "JUNET Riyou No Tebiki (Dai Ippan)" ("JUNET
   User's Guide (First Edition)"), February 1988

   [MIME] Nathaniel Borenstein and Ned Freed, "MIME (Multipurpose
   Internet Mail Extensions): Mechanisms for Specifying and Describing
   the Format of Internet Message Bodies", Internet Draft, March 1992,
   draft-ietf-822ext-messagebodies-06.txt

   [RFC822] David H. Crocker, "Standard for the Format of ARPA Internet
   Text Messages", Internet Standard, August 1982, RFC 822


Security Considerations

   Security considerations are not discussed in this memo.


Authors' Addresses

   [authors]
   [organizations]
   [addresses]

