*Note:* The name is likely to become "ISO-2022-JP" rather than
"junet-code".

Network Working Group                                          [authors]
Internet Draft                                           [organizations]
                                                         10th April 1992

        JUNET Japanese Character Encoding for Internet Messages

Status of this Memo

This draft document will be submitted to the RFC editor as an
informational document. This is a working document only; it should
neither be cited nor quoted in any formal document. This document will
expire before 10th October 1992. Distribution of this memo is
unlimited. Please send comments to net-char@sra.co.jp.

Introduction

This document describes the encoding used in plain text electronic
mail and network news in several Japanese networks. It was first
specified by and used in JUNET [JUNET]. The encoding is now also
widely used in Japanese IP communities and the Japanese BITNET
community (BITNETJP).

This document provides a name for the encoding which is intended to be
used in the "charset" parameter field of MIME [MIME] messages.

This document only describes the encoding of plain text. The encoding
of other subtypes of text, such as rich text, is not discussed here.

Informal Description

The message body starts in ASCII, and switches to Japanese characters
through an escape sequence. For example, the escape sequence ESC $ B
(three bytes) indicates that the bytes following this escape sequence
are Japanese characters, which are encoded in two bytes each. To
switch back to ASCII, the escape sequence ESC ( B is used.

The following table gives the escape sequences and the character sets
used in JUNET messages.

   ESC ( B     ASCII
   ESC ( J     JIS X 0201-1976 (left-hand part)
   ESC $ @     JIS X 0208-1978
   ESC $ B     JIS X 0208-1983

The left-hand part of JIS X 0201-1976 is identical to ASCII except for
backslash (\) and tilde (~). The backslash is replaced by the Yen
sign, and the tilde is replaced by macron (overline). This set is
Japan's national variant of ISO 646.
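As an illustration of the escape sequences in the table above, the
following Python sketch splits a message body into runs of bytes, each
labelled with the character set in effect. It is not part of the
encoding definition. The names CHARSETS and split_runs are hypothetical,
and the sketch assumes that any ESC not followed by one of the four
listed sequences is simply left in the data.

```python
# Escape sequences and character set names, taken from the table above.
# CHARSETS and split_runs are hypothetical names used for illustration.
CHARSETS = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JIS X 0201-1976 (left-hand part)",
    b"\x1b$@": "JIS X 0208-1978",
    b"\x1b$B": "JIS X 0208-1983",
}


def split_runs(data: bytes):
    """Split a message body into (character set name, raw bytes) runs."""
    runs = []
    charset = "ASCII"      # the message body starts in ASCII
    start = 0              # index where the current run begins
    i = 0
    while i < len(data):
        seq = data[i:i + 3]
        if seq in CHARSETS:
            # Close the current run (if non-empty) and switch sets.
            if i > start:
                runs.append((charset, data[start:i]))
            charset = CHARSETS[seq]
            i += 3
            start = i
        else:
            i += 1
    if start < len(data):
        runs.append((charset, data[start:]))
    return runs


# Usage: ASCII text, three two-byte characters, then ASCII again.
body = b"abc \x1b$BF|K\\8l\x1b(B def"
for name, raw in split_runs(body):
    print(name, raw)
```

Scanning byte by byte is safe here because the two-byte characters use
only values 33 to 126 (decimal), so an ESC byte (27) can never occur
inside a two-byte character.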

The JIS X 0208 character sets consist of Kanji, Hiragana, Katakana and
some other symbols and characters. Each character takes up two bytes.

For further details about the JIS Japanese national character set
standards, refer to the JIS standards themselves. For further
information about the escape sequences, see ISO 2022 [ISO2022].

If there are JIS X 0208 characters on a line, there must be a switch
to ASCII or to the left-hand part of JIS X 0201 before the end of the
line (i.e. before the CRLF). This means that the next line starts in
the character set that was switched to before the end of the previous
line.

Other restrictions are given in the Formal Description below.

Formal Description

This section provides a formal description of the JUNET encoding. In
the event that this description is not consistent with the above
informal description, this formal description shall take precedence.

The notational conventions used here are identical to those used in
RFC 822 [RFC822].

The * (asterisk) convention is as follows:

   l*m something

meaning at least l and at most m somethings, with l and m taking
default values of 0 and infinity, respectively.

   line                = *text *1( *segment single-byte-seq *text ) CRLF

   segment             = single-byte-segment / double-byte-segment

   single-byte-segment = single-byte-seq 1*text

   double-byte-segment = double-byte-seq 1*( one-of-94 one-of-94 )

   single-byte-seq     = ESC "(" ( "B" / "J" )

   double-byte-seq     = ESC "$" ( "@" / "B" )

                                                  ; ( Octal, Decimal.)

   ESC                 = <ISO 2022 ESC, escape>   ; (    33,      27.)

   one-of-94           = <any one of 94 values>   ; (41-176, 33.-126.)

   CHAR                = <any ASCII character>    ; ( 0-177,  0.-127.)

   text                = <any CHAR, including bare CR & bare LF,
                          but NOT including CRLF>
                         ; => atoms, specials, comments and
                         ;    quoted-strings are NOT recognized.

Additional restrictions that are difficult to describe in the above
are as follows.

Adjacent segments should have different escape sequences. For example,
the following is not recommended:

   ESC $ B .... ESC $ B ....
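
The grammar above can be approximated with a regular expression. The
following Python sketch is a simplified checker, not a normative
implementation. It makes two assumptions that go beyond the text of
this draft: ESC is excluded from "text" (otherwise escape sequences
could not be told apart from text), and bare CR and bare LF are
rejected rather than accepted. The names LINE and is_valid_line are
hypothetical. The sketch does not enforce the recommendation that
adjacent segments use different escape sequences.

```python
import re

# Building blocks, written as byte-oriented regular expressions.
# ASSUMPTION: "text" is any 7-bit byte except ESC, CR and LF.
TEXT = rb"[\x00-\x09\x0b\x0c\x0e-\x1a\x1c-\x7f]"
ONE_OF_94 = rb"[\x21-\x7e]"
SINGLE_BYTE_SEQ = rb"\x1b\([BJ]"     # ESC "(" ( "B" / "J" )
DOUBLE_BYTE_SEQ = rb"\x1b\$[@B]"     # ESC "$" ( "@" / "B" )

# single-byte-segment = single-byte-seq 1*text
SINGLE_BYTE_SEGMENT = SINGLE_BYTE_SEQ + TEXT + rb"+"
# double-byte-segment = double-byte-seq 1*( one-of-94 one-of-94 )
DOUBLE_BYTE_SEGMENT = (
    DOUBLE_BYTE_SEQ + rb"(?:" + ONE_OF_94 + ONE_OF_94 + rb")+"
)
SEGMENT = rb"(?:" + SINGLE_BYTE_SEGMENT + rb"|" + DOUBLE_BYTE_SEGMENT + rb")"

# line = *text *1( *segment single-byte-seq *text ) CRLF
LINE = re.compile(
    TEXT + rb"*"
    + rb"(?:" + SEGMENT + rb"*" + SINGLE_BYTE_SEQ + TEXT + rb"*)?"
    + rb"\r\n"
)


def is_valid_line(line: bytes) -> bool:
    """Return True if the line, including its CRLF, fits the grammar."""
    return LINE.fullmatch(line) is not None


# Usage: the second line is rejected because it does not switch back
# to a single-byte character set before the CRLF.
print(is_valid_line(b"abc \x1b$BF|K\\8l\x1b(B def\r\n"))
print(is_valid_line(b"\x1b$BF|K\\8l\r\n"))
```

Because every segment begins with ESC and ESC is excluded from both
"text" and one-of-94 in this sketch, the expression has only one way to
divide a line into segments.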

MIME Considerations

The name given to the JUNET character encoding is "junet-code". This
name is intended to be used in MIME messages as follows:

   Content-Type: text/plain; charset=junet-code

The JUNET encoding is already in 7-bit form, so the correct "transfer
encoding" to use is:

   Content-Transfer-Encoding: 7bit

It should be noted that applying the Base64 or Quoted-Printable
encoding will render the message unreadable in current JUNET software.

Background Information

The JUNET encoding was described in the JUNET User's Guide [JUNET]
(JUNET Riyou No Tebiki Dai Ippan).

The encoding is based on the particular usage of ISO 2022 [ISO2022]
announced by 4/1. However, the escape sequence normally used for this
announcement is not included in JUNET messages.

References

[ISO2022]  International Organization for Standardization (ISO),
           "Information processing -- ISO 7-bit and 8-bit coded
           character sets -- Code extension techniques",
           International Standard, 1986, Ref. No. ISO 2022-1986 (E).

[JUNET]    JUNET Riyou No Tebiki Sakusei Iin Kai (JUNET User's Guide
           Drafting Committee), "JUNET Riyou No Tebiki (Dai Ippan)"
           ("JUNET User's Guide (First Edition)"), February 1988.

[MIME]     Nathaniel Borenstein and Ned Freed, "MIME (Multipurpose
           Internet Mail Extensions): Mechanisms for Specifying and
           Describing the Format of Internet Message Bodies",
           Internet Draft, March 1992,
           draft-ietf-822ext-messagebodies-06.txt.

[RFC822]   David H. Crocker, "Standard for the Format of ARPA
           Internet Text Messages", Internet standard, August 1982,
           rfc822.

Security Considerations

Security considerations are not discussed in this memo.

Authors' Addresses

[authors]
[organizations]
[addresses]