#?SuikaWiki/0.9 default-name="US-ASCII"


* Parameters of the charset family

,charset ,token ,[[MIME]]
,charset-edition ,4DIGIT ,RFC 1922
,charset-extension ,token ,RFC 1922


* From RFC 2046

A critical parameter that may be specified in the Content-Type field
for "text/plain" data is the character set. This is specified with a
"charset" parameter, as in:

[PRE[
Content-type: text/plain; charset=iso-8859-1
]PRE]

Unlike some other parameter values, the values of the charset
parameter are NOT case sensitive. The default character set, which
must be assumed in the absence of a charset parameter, is US-ASCII.
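
The two receiver-side rules above can be sketched in Python with the standard [CODE[email.message.Message]] class. The first rule is that charset values compare case-insensitively. The second is that US-ASCII is assumed when the parameter is missing. This is an illustration under assumptions, not a reference implementation. The helper name [CODE[effective_charset]] is hypothetical. The sketch also applies the US-ASCII default to every [CODE[text/*]] type, not only [CODE[text/plain]], which is a simplification.

```python
from email.message import Message
from typing import Optional


def effective_charset(content_type: str) -> Optional[str]:
    """Return the charset a receiver should assume for a Content-Type value.

    Hypothetical helper. Charset values are compared case-insensitively,
    so the result is lowercased; a text body without a charset parameter
    falls back to US-ASCII. Non-text types without a charset give None.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() unquotes and lowercases the value, or returns None.
    charset = msg.get_content_charset()
    if charset is None and msg.get_content_maintype() == "text":
        return "us-ascii"
    return charset


print(effective_charset("text/plain; charset=ISO-8859-1"))  # iso-8859-1
print(effective_charset("text/plain"))                      # us-ascii
```

Because the value is lowercased, [CODE[charset=US-ASCII]] and [CODE[charset=us-ascii]] produce the same result. That is exactly the "NOT case sensitive" rule.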

[PRE[
The specification for any future subtypes of "text" must specify
whether or not they will also utilize a "charset" parameter, and may
possibly restrict its values as well. For other subtypes of "text"
than "text/plain", the semantics of the "charset" parameter should be
defined to be identical to those specified here for "text/plain",
i.e., the body consists entirely of characters in the given charset.
In particular, definers of future "text" subtypes should pay close
attention to the implications of multioctet character sets for their
subtype definitions.
]PRE]

[PRE[
The charset parameter for subtypes of "text" gives a name of a
character set, as "character set" is defined in RFC 2045. The rules
regarding line breaks detailed in the previous section must also be
observed -- a character set whose definition does not conform to
these rules cannot be used in a MIME "text" subtype.
]PRE]
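
A concrete check shows why a character set can fail the line-break rule. The sketch below is a deliberate simplification, not the full RFC 2045 definition. It only asks whether a Python codec encodes the line break [CODE["\r\n"]] as exactly the two octets 13 and 10. [CODE[crlf_compatible]] is a hypothetical name. Under this test US-ASCII, ISO-8859-1 and UTF-8 pass. UTF-16 fails because it spends two octets per character.

```python
import codecs


def crlf_compatible(charset: str) -> bool:
    """Simplified check: does this codec encode a line break as exactly CR LF?

    A text/* charset must represent a line break as the two octets 13 10.
    A codec that emits anything else for CR LF fails; UTF-16 is an example,
    since it uses two octets per character and may also prepend a BOM.
    """
    codec = codecs.lookup(charset)  # raises LookupError for unknown names
    encoded, _consumed = codec.encode("\r\n")
    return encoded == b"\r\n"


for name in ("us-ascii", "iso-8859-1", "utf-8", "utf-16", "utf-16-le"):
    print(name, crlf_compatible(name))
```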

An initial list of predefined character set names can be found at the
end of this section. Additional character sets may be registered with IANA.

Other media types than subtypes of "text" might choose to employ the
charset parameter as defined here, but with the CRLF/line break
restriction removed. Therefore, all character sets that conform to
the general definition of "character set" in RFC 2045 can be
registered for MIME use.

[PRE[
Note that if the specified character set includes 8-bit characters
and such characters are used in the body, a Content-Transfer-Encoding
header field and a corresponding encoding on the data are required in
order to transmit the body via some mail transfer protocols, such as
SMTP [RFC-821].
]PRE]
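
Here is a minimal sketch of the sender-side decision described in the note above. It assumes quoted-printable is the chosen Content-Transfer-Encoding; base64 would serve equally well. It uses Python's standard [CODE[quopri]] module, and [CODE[choose_transfer_encoding]] is a hypothetical helper. The sketch only checks whether any octet has the high bit set. It ignores other SMTP limits such as line length.

```python
import quopri
from typing import Tuple


def choose_transfer_encoding(body: bytes) -> Tuple[str, bytes]:
    """Return (Content-Transfer-Encoding value, encoded body).

    A body made only of 7-bit octets can be sent as-is. Once an 8-bit
    character of the declared charset actually appears, the body is
    encoded so that every transmitted octet is 7-bit.
    """
    if all(octet < 0x80 for octet in body):
        return "7bit", body
    return "quoted-printable", quopri.encodestring(body)


cte, data = choose_transfer_encoding("café\r\n".encode("iso-8859-1"))
print(cte)  # quoted-printable
```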

[PRE[
The default character set, US-ASCII, has been the subject of some
confusion and ambiguity in the past. Not only were there some
ambiguities in the definition, there have been wide variations in
practice. In order to eliminate such ambiguity and variations in the
future, it is strongly recommended that new user agents explicitly
specify a character set as a media type parameter in the Content-Type
header field. "US-ASCII" does not indicate an arbitrary 7-bit
character set, but specifies that all octets in the body must be
interpreted as characters according to the US-ASCII character set.
National and application-oriented versions of ISO 646 [ISO-646] are
usually NOT identical to US-ASCII, and in that case their use in
Internet mail is explicitly discouraged. The omission of the ISO 646
character set from this document is deliberate in this regard. The
character set name of "US-ASCII" explicitly refers to the character
set defined in ANSI X3.4-1986 [US-ASCII]. The new international
reference version (IRV) of the 1991 edition of ISO 646 is identical
to US-ASCII. The character set name "ASCII" is reserved and must not
be used for any purpose.
]PRE]

[PRE[
NOTE: RFC 821 explicitly specifies "ASCII", and references an earlier
version of the American Standard. Insofar as one of the purposes of
specifying a media type and character set is to permit the receiver
to unambiguously determine how the sender intended the coded message
to be interpreted, assuming anything other than "strict ASCII" as the
default would risk unintentional and incompatible changes to the
semantics of messages now being transmitted. This also implies that
messages containing characters coded according to other versions of
ISO 646 than US-ASCII and the 1991 IRV, or using code-switching
procedures (e.g., those of ISO 2022), as well as 8bit or multiple
octet character encodings MUST use an appropriate character set
specification to be consistent with MIME.
]PRE]

[PRE[
The complete US-ASCII character set is listed in ANSI X3.4-1986.
Note that the control characters including DEL (0-31, 127) have no
defined meaning in apart from the combination CRLF (US-ASCII values
13 and 10) indicating a new line. Two of the characters have de
facto meanings in wide use: FF (12) often means "start subsequent
text on the beginning of a new page"; and TAB or HT (9) often (though
not always) means "move the cursor to the next available column after
the current position where the column number is a multiple of 8
(counting the first column as column 0)." Aside from these
conventions, any use of the control characters or DEL in a body must
either occur
]PRE]

[PRE[
(1) because a subtype of text other than "plain"
    specifically assigns some additional meaning, or
]PRE]

[PRE[
(2) within the context of a private agreement between the
    sender and recipient. Such private agreements are
    discouraged and should be replaced by the other
    capabilities of this document.
]PRE]
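
The control-character rule can be made concrete with a scanner for a [CODE[text/plain]] body labelled US-ASCII. It reports every octet in 0-31 or 127 except a CR immediately followed by LF. Following the de facto conventions quoted above, it also lets FF and HT through. That allowance and the hypothetical name [CODE[stray_controls]] are assumptions of this sketch, not rules from the RFC.

```python
from typing import List, Tuple

CR, LF, HT, FF, DEL = 13, 10, 9, 12, 127


def stray_controls(body: bytes) -> List[Tuple[int, int]]:
    """Return (offset, octet) pairs for control characters with no defined meaning.

    Allowed: CR immediately followed by LF (a line break), plus FF and HT
    by wide de facto convention. Everything else in 0-31 or 127, including
    a bare CR or a bare LF, is reported.
    """
    problems = []
    i = 0
    while i < len(body):
        octet = body[i]
        if octet == CR and i + 1 < len(body) and body[i + 1] == LF:
            i += 2  # a proper CRLF line break
            continue
        if (octet < 32 or octet == DEL) and octet not in (HT, FF):
            problems.append((i, octet))
        i += 1
    return problems


print(stray_controls(b"col1\tcol2\r\n\x0cPage 2\r\n"))  # []
print(stray_controls(b"bare LF\nand a bell\x07"))       # [(7, 10), (18, 7)]
```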

[PRE[
NOTE: An enormous proliferation of character sets exist beyond US-ASCII.
A large number of partially or totally overlapping character
sets is NOT a good thing. A SINGLE character set that can be used
universally for representing all of the world's languages in Internet
mail would be preferrable. Unfortunately, existing practice in
several communities seems to point to the continued use of multiple
character sets in the near future. A small number of standard
character sets are, therefore, defined for Internet use in this
document.
]PRE]

The defined charset values are:

[PRE[
(1) US-ASCII -- as defined in ANSI X3.4-1986 [US-ASCII].
]PRE]

[PRE[
(2) ISO-8859-X -- where "X" is to be replaced, as
    necessary, for the parts of ISO-8859 [ISO-8859]. Note
    that the ISO 646 character sets have deliberately been
    omitted in favor of their 8859 replacements, which are
    the designated character sets for Internet mail. As of
    the publication of this document, the legitimate values
    for "X" are the digits 1 through 10.
]PRE]

Characters in the range 128-159 has no assigned meaning in ISO-8859-X.
Characters with values below 128 in ISO-8859-X have the same
assigned meaning as they do in US-ASCII.
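
Two checks implied by the list above can be sketched in Python. The first recognizes the [CODE[ISO-8859-X]] names that RFC 2046 lists, with X from 1 to 10 at publication; later parts were registered separately. The second confirms that octets below 128 decode exactly as in US-ASCII. The helper names are hypothetical. Using Python's built-in codec tables as a stand-in for the ISO 8859 standards is an assumption.

```python
import re

# "ISO-8859-X" with X from 1 to 10, as listed above; names are case-insensitive.
_ISO_8859 = re.compile(r"iso-8859-(\d+)\Z", re.IGNORECASE)


def is_rfc2046_iso_8859(name: str) -> bool:
    match = _ISO_8859.match(name)
    return bool(match) and 1 <= int(match.group(1)) <= 10


def lower_half_matches_ascii(name: str) -> bool:
    """True if every octet 0-127 decodes in `name` exactly as in US-ASCII."""
    lower_half = bytes(range(128))
    return lower_half.decode(name) == lower_half.decode("us-ascii")


print([n for n in ("ISO-8859-1", "iso-8859-10", "ISO-8859-15", "ISO8859-1")
       if is_rfc2046_iso_8859(n)])  # ['ISO-8859-1', 'iso-8859-10']
print(all(lower_half_matches_ascii(f"iso-8859-{x}") for x in range(1, 11)))  # True
```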

[PRE[
Part 6 of ISO 8859 (Latin/Arabic alphabet) and part 8 (Latin/Hebrew
alphabet) includes both characters for which the normal writing
direction is right to left and characters for which it is left to
right, but do not define a canonical ordering method for representing
bi-directional text. The charset values "ISO-8859-6" and
"ISO-8859-8", however, specify that the visual method is used [RFC-1556].
]PRE]

[PRE[
All of these character sets are used as pure 7bit or 8bit sets
without any shift or escape functions. The meaning of shift and
escape sequences in these character sets is not defined.
]PRE]

[PRE[
The character sets specified above are the ones that were relatively
uncontroversial during the drafting of MIME. This document does not
endorse the use of any particular character set other than US-ASCII,
and recognizes that the future evolution of world character sets
remains unclear.
]PRE]

Note that the character set used, if anything other than US-ASCII,
must always be explicitly specified in the Content-Type field.

[PRE[
No character set name other than those defined above may be used in
Internet mail without the publication of a formal specification and
its registration with IANA, or by private agreement, in which case
the character set name must begin with "X-".
]PRE]

Implementors are discouraged from defining new character sets unless
absolutely necessary.
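
A composing agent could apply the naming rules above roughly as follows. The rules are: the name "ASCII" is reserved, registered names may be used, and "X-" names are for private agreements. The [CODE[REGISTERED]] set is a tiny hypothetical stand-in for the IANA character set registry, not a copy of it. [CODE[check_charset_label]] is a made-up name.

```python
from typing import AbstractSet

# Hypothetical stand-in for the IANA registry; a real agent would load the full list.
REGISTERED = frozenset({"us-ascii", "iso-8859-1", "iso-8859-2", "iso-2022-jp"})


def check_charset_label(name: str, registered: AbstractSet[str] = REGISTERED) -> str:
    """Classify a charset label as "registered" or "private", else raise ValueError."""
    folded = name.lower()  # charset names are not case sensitive
    if folded == "ascii":
        raise ValueError('"ASCII" is reserved and must not be used; use "US-ASCII"')
    if folded in registered:
        return "registered"
    if folded.startswith("x-"):
        return "private"  # only meaningful by private agreement
    raise ValueError(f"unregistered charset name without X- prefix: {name!r}")
```

Folding with [CODE[lower()]] is sufficient here because charset names consist of US-ASCII characters only.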

[PRE[
The "charset" parameter has been defined primarily for the purpose of
textual data, and is described in this section for that reason.
However, it is conceivable that non-textual data might also wish to
specify a charset value for some purpose, in which case the same
syntax and values should be used.
]PRE]

[PRE[
In general, composition software should always use the "lowest common
denominator" character set possible. For example, if a body contains
only US-ASCII characters, it SHOULD be marked as being in the
US-ASCII character set, not ISO-8859-1, which, like all the ISO-8859
family of character sets, is a superset of US-ASCII. More generally,
if a widely-used character set is a subset of another character set,
and a body contains only characters in the widely-used subset, it
should be labelled as being in that subset. This will increase the
chances that the recipient will be able to view the resulting entity
correctly.
]PRE]
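
The "lowest common denominator" advice amounts to trying candidate character sets from narrowest to widest. The body is labelled with the first one that can represent it. In this sketch the candidate list is an assumption. US-ASCII and ISO-8859-1 come from the text above. UTF-8 is appended as a last resort, although RFC 2046 does not list it. [CODE[lowest_common_charset]] is a hypothetical name.

```python
from typing import Sequence

# Ordered from narrowest to widest. US-ASCII and ISO-8859-1 come from the
# text above; appending UTF-8 as a last resort is an assumption.
CANDIDATES = ("us-ascii", "iso-8859-1", "utf-8")


def lowest_common_charset(text: str, candidates: Sequence[str] = CANDIDATES) -> str:
    """Return the first (narrowest) candidate charset that can encode `text`."""
    for charset in candidates:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue
        return charset
    raise ValueError("no candidate charset can represent this text")


print(lowest_common_charset("plain text"))  # us-ascii
print(lowest_common_charset("café"))        # iso-8859-1
```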


* RFC 1922 4. Two New MIME parameters

Here we define two new MIME parameters to be used with "charset" parameters.


** 4.1. "charset-edition"

[PRE[
This parameter is used after the MIME "charset" parameter, using four
digits (AD) to indicate what the year of edition is for the character
set standard shown in "charset". Its use is optional.
Implementations should ignore this parameter unless the
implementation has specific support for that particular character set
edition.
]PRE]

[PRE[
The reason for defining this parameter is that there are often
differences in the defined characters between editions of a character
set standard. Sometimes, the difference can not be ignored,
otherwise implementations would have problems when processing it.
There are only two ways to indicate this difference, in the current
MIME syntax. One way is to indicate the edition in the charset name,
such as CN-GB-1988-80 (the 1980's edition of GB 1988). The other way
is to define a new optional parameter such as "charset-edition". The
latter way is better because receiving applications that can only
process an older edition can still recognize the character set and
offer to display the text in the older edition. This display may
have a few mistakes, but it is better than refusing to display any
text at all or defaulting to an inappropriate character set such as
US-ASCII or ISO-8859-1.
]PRE]
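
The receiver behaviour in 4.1 could look like the following sketch. It honours [CODE[charset-edition]] only when there is edition-specific support. Otherwise it ignores the parameter and still uses the charset. The capability table [CODE[SUPPORTED_EDITIONS]] and the function [CODE[select_charset]] are hypothetical, and no actual decoding is performed.

```python
import re
from typing import Dict, FrozenSet, Optional, Tuple

# Hypothetical capability table: lowercased charset name -> editions this
# receiver has edition-specific support for. Not taken from any real product.
SUPPORTED_EDITIONS: Dict[str, FrozenSet[int]] = {
    "cn-big5": frozenset({1984}),
}


def select_charset(charset: str, edition: Optional[str]) -> Tuple[str, Optional[int]]:
    """Return the (charset, edition) pair a receiver would actually act on."""
    name = charset.lower()  # charset names are case-insensitive
    if edition is not None and re.fullmatch(r"[0-9]{4}", edition):
        year = int(edition)
        if year in SUPPORTED_EDITIONS.get(name, frozenset()):
            return name, year
    # Unknown or malformed edition: ignore it and fall back to the plain
    # charset, which may show a few wrong glyphs but still displays the text.
    return name, None
```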


** 4.2. "charset-extension"

[PRE[
This parameter is also used after the MIME "charset" parameter. It
is case-insensitive and optional, and any value of this parameter
should be registered in IANA. Unregistered value should start with
"x-" as with any MIME extension-token. Implementations should ignore
this parameter unless the implementation has specific support for
that particular character set extension.
]PRE]

[PRE[
A character set extension has displayed glyphs for code points that
are not assigned in the character set, for example, vendor-specific
extensions of standard character sets. This parameter provides the
option of using these extensions. Although character set extensions
may cause interoperability problems, we recognize the existence of
such extensions.
]PRE]

[PRE[
For example:

    Content-Type: text/plain; charset=CN-Big5; charset-edition=1984;
                  charset-extension=ETen-2.00.03-DOS
]PRE]

[PRE[
This may indicate Eten company's extension of Big5: ETen 2.00.03 for
DOS, assuming that "ETen-2.00.03-DOS" is registered with the IANA.
]PRE]
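
Reading the three parameters from the example header can be sketched with Python's [CODE[email.message.Message.get_param]]. That method matches parameter names case-insensitively. The continuation line must begin with whitespace for the header to be folded correctly. The unfolding step below is a simplification of RFC 822 unfolding, and [CODE[charset_params]] is a hypothetical name.

```python
import re
from email.message import Message

# The folded header value from the example above.
RAW = ("text/plain; charset=CN-Big5; charset-edition=1984;\r\n"
       "              charset-extension=ETen-2.00.03-DOS")


def charset_params(raw_value: str) -> dict:
    """Return charset, charset-edition and charset-extension (None if absent)."""
    # Unfold: a line break followed by whitespace continues the same header.
    unfolded = re.sub(r"\r?\n(?=[ \t])", "", raw_value)
    msg = Message()
    msg["Content-Type"] = unfolded
    # get_param() compares parameter names case-insensitively and unquotes values.
    return {name: msg.get_param(name)
            for name in ("charset", "charset-edition", "charset-extension")}


print(charset_params(RAW))
```

Per 4.1 and 4.2, a receiver without specific support would look at [CODE[charset]] only and ignore the other two entries.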


** 4.3. Formal Syntax:

The following changes and additions are made to the MIME syntax:

[PRE[
charset-edition := "charset-edition" "=" 4DIGIT
                   ; year of edition in four digits
]PRE]

[PRE[
charset-extension := "charset-extension" "=" extension-token
]PRE]
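
The two productions above can be checked mechanically. This sketch takes [CODE[4DIGIT]] literally, as exactly four ASCII digits. It implements RFC 2045's [CODE[extension-token]] as either an [CODE[x-token]] or a registered token. No IANA registry for [CODE[charset-extension]] values seems to exist, so [CODE[registered]] is a caller-supplied, hypothetical set. All helper names are made up.

```python
import re
from typing import AbstractSet

# RFC 2045: token := 1*<any (US-ASCII) CHAR except SPACE, CTLs, or tspecials>
TSPECIALS = frozenset('()<>@,;:\\"/[]?=')
TOKEN_CHARS = frozenset(chr(c) for c in range(33, 127)) - TSPECIALS


def is_token(value: str) -> bool:
    return len(value) > 0 and all(ch in TOKEN_CHARS for ch in value)


def is_charset_edition(value: str) -> bool:
    """charset-edition value := 4DIGIT, i.e. exactly four ASCII digits."""
    return re.fullmatch(r"[0-9]{4}", value) is not None


def classify_charset_extension(value: str,
                               registered: AbstractSet[str] = frozenset()) -> str:
    """Classify a value against extension-token := ietf-token / x-token.

    Returns "x-token", "ietf-token" or "invalid". `registered` holds
    lowercased registered values (hypothetical; values are case-insensitive).
    """
    if value[:2].lower() == "x-" and is_token(value[2:]):
        return "x-token"
    if is_token(value) and value.lower() in registered:
        return "ietf-token"
    return "invalid"
```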


* Notes

I had finished translating the RFC 1922 part and was about to submit it
when that crappy IE went haywire and the whole thing vanished. Give me back my translation!

charset-edition is defined as 4DIGIT, but with the [[10000年問題]] (the year 10000 problem)
in mind, 4*DIGIT looks like the better choice. I don't think there is any need to add 1900
to a 2DIGIT value in the name of [[2000年問題]] (Y2K) countermeasures.
If you do receive such a value, why not simply treat it as
a charset established in the 1st century AD?
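
The note's proposal can be sketched in two parts. A receiver accepts any run of ASCII digits, so [CODE[4*DIGIT]] values such as 10000 still parse. It reads a short value such as 84 literally as the year 84 instead of adding 1900. A sender always emits at least four digits. This follows the note's suggestion, not RFC 1922, and both function names are hypothetical.

```python
import re
from typing import Optional


def parse_edition(value: str) -> Optional[int]:
    """Lenient reading of a received charset-edition value (per the note above).

    Any run of ASCII digits is accepted, so "10000" parses (4*DIGIT), and a
    two-digit "84" is taken literally as the year 84 rather than 1984.
    Anything else returns None, meaning "ignore the parameter".
    """
    if re.fullmatch(r"[0-9]+", value) is None:
        return None
    return int(value)


def format_edition(year: int) -> str:
    """Emit at least four digits (4*DIGIT), zero-padding small years."""
    if year < 0:
        raise ValueError("editions are years AD")
    return f"{year:04d}"
```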

IANA does not seem to have a registry for charset-extension.
[[#comment]]


* Old Netscape Navigator's charset parameter recognition problem

[1] Old versions of [[NetscapeNavigator]] (2 and earlier?) have a problem. If the [CODE[charset]]
parameter has a value other than certain specific ones (names hard-coded into the implementation),
the user cannot even select the [[符号化方法]] (encoding) manually, and as a result
the text is (in most cases) garbled.

[2] In particular, for Japanese [[文字コード]] (character encodings), only the private names [CODE[x-euc-jp]]
and [CODE[x-sjis]] are supported. The text is garbled when one of the [[IANA]] names
registered after [[NN]]2 is specified, such as [CODE[euc-jp]], [CODE[shift_jis]] or [CODE[windows-31j]].

[3] Note that the parameter seems to be ignored when it is given in the [[Content-Type:欄]] (Content-Type: field) of the [[HTTP]] [[頭]] (header).
It has to be specified with the [[http-equiv属性]] (http-equiv attribute) of an [[HTML]] [[meta要素]] (meta element).
- [4] >>2 Note that no problem occurs with [CODE[iso-2022-jp]].
- [5] Independently of this setting, part of a document is sometimes [[文字化け]] (garbled); the cause is not well understood.
- [6] [WEAK[2002-12-01 (Sun) 10:38]] ''[[US-ASCII]]'': >>1-5 Confirmed the problem with Netscape Navigator 2.01. (Really, let's ditch such an old version already.)
[[#comment]]


* quoted-string

[15] For the [CODE[charset]] parameter of [[媒体型]] (media types) used in [[MIME]]/[[HTTP]],
the parameter value is, like that of any other parameter, a
[CODE(ABNF)[[[value]]]].
That is, it can be given as either one [CODE(ABNF)[[[token]]]]
or one [CODE(ABNF)[[[quoted-string]]]].

Examples:
- [CODE(MIME)[charset=US-ASCII]]
- [CODE(MIME)[charset="US-ASCII"]]
- [CODE(MIME)[charset="US\-ASCII"]]

(all equivalent)
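
Here is a sketch of how a receiver might decode such a value, whichever form it takes. It accepts an RFC 2045 [CODE[token]] as-is. It unwraps a [CODE[quoted-string]], treating a backslash as a [CODE[quoted-pair]] that makes the next character literal. The sketch is simplified: it does not reject CR or the other characters that RFC 822 excludes from [CODE[qtext]]. [CODE[parse_param_value]] is a hypothetical name.

```python
from typing import Optional

# RFC 2045: token := 1*<any (US-ASCII) CHAR except SPACE, CTLs, or tspecials>
TSPECIALS = frozenset('()<>@,;:\\"/[]?=')
TOKEN_CHARS = frozenset(chr(c) for c in range(33, 127)) - TSPECIALS


def parse_param_value(raw: str) -> Optional[str]:
    """Decode a parameter value written as a token or as a quoted-string.

    Returns None when `raw` is neither. Inside a quoted-string a backslash
    (quoted-pair) makes the following character literal.
    """
    if raw and all(ch in TOKEN_CHARS for ch in raw):
        return raw  # token form: used verbatim
    if len(raw) < 2 or raw[0] != '"' or raw[-1] != '"':
        return None
    out = []
    chars = iter(raw[1:-1])
    for ch in chars:
        if ch == "\\":
            ch = next(chars, None)  # the escaped character, whatever it is
            if ch is None:
                return None  # the backslash escaped the closing quote
        elif ch == '"':
            return None  # an unescaped quote cannot occur inside
        out.append(ch)
    return "".join(out)


for form in ('US-ASCII', '"US-ASCII"', '"US\\-ASCII"'):
    print(form, "->", parse_param_value(form))
```

Charset names are case-insensitive, so a caller would then compare the decoded values after [CODE[lower()]].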

[16] [[WinIE6]] does not seem to support quoted values.

Simple supporting evidence:
= Display an entity labelled [CODE(HTTP)[charset=iso-8859-1]].
= Then display one of the following:
-- an entity labelled [CODE(HTTP)[charset=iso-2022-jp]]
-- an entity labelled [CODE(HTTP)[charset="iso-2022-jp"]]

With the quotation marks, the text is thoroughly garbled.

Needless to say, such a sloppy implementation does not conform to the specifications.
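
On the sending side, the simplest approach is to emit the bare [CODE[token]] form whenever the value allows it and to quote only when necessary. This stays conforming and also avoids trouble with receivers like the one above. A minimal sketch; [CODE[format_param]] is a hypothetical helper.

```python
# RFC 2045: token := 1*<any (US-ASCII) CHAR except SPACE, CTLs, or tspecials>
TSPECIALS = frozenset('()<>@,;:\\"/[]?=')
TOKEN_CHARS = frozenset(chr(c) for c in range(33, 127)) - TSPECIALS


def format_param(name: str, value: str) -> str:
    """Serialize name=value, quoting only when value is not a valid token."""
    if value and all(ch in TOKEN_CHARS for ch in value):
        return f"{name}={value}"  # bare token: conforming and widely understood
    # Fall back to a quoted-string, escaping backslash and double quote.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{name}="{escaped}"'


print(format_param("charset", "iso-2022-jp"))  # charset=iso-2022-jp
```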


[[#comment]]


* License of the RFC portions

See [[RFCのライセンス]] (license of RFCs).


* Notes

- [7] [WEAK[2002-12-01 (Sun) 10:38]] ''[[US-ASCII]]'': bump
- [8] [[RFC3023]] specifies the [CODE(MIME)[charset]] parameter for [[XML]]-based media types.
- [9] >>8 For [CODE[text/xml]] and similar types, the default when the parameter is omitted is [CODE(charset)[us-ascii]], in MIME and ''in HTTP as well''. For [CODE[application/xml]] and similar types, there is no default when it is omitted; the [[xml宣言]] (XML declaration) and the like are consulted instead.
- [10] >>8-9 Other [CODE[+xml]] media types often cite this RFC in their definitions, so its influence is considerable.
- [11] >>10 [INS[So: write a detailed explanation.]]
- [12] ''Bug 1697 - multipart/form-data 送出時に Content-Type が送られない'' (Content-Type is not sent when submitting multipart/form-data) <http://bugzilla.mozilla.gr.jp/show_bug.cgi?id=1697>, ''Bug 116346 - Content-Type should be supplied for form data of 'enctype="multipart/form-data"'[from sub]'' <http://bugzilla.mozilla.org/show_bug.cgi?id=116346> : the problem that [[Mozilla]] did not attach a [CODE(MIME)[charset]] parameter when sending [CODE(MIME)[[[multipart/form-data]]]]. This is not specific to one media type or one implementation, though; it points to a big underlying problem. How do you obtain meta-information for data that merely passes through an application generating MIME entities, as with "file attachments"? How should a multilingualized implementation deal with charset information? MIME is a ten-year-old standard that never considered any of this, so what happens next? Perhaps there is no option but to keep patching things up and maintaining the status quo, as we always have.
- [13] I came across [CODE(MIME)[charset=136]], but the content was [CODE(charset)[[[Big5]]]]. How did it end up with such a value?
- [14] >>13 was [[spam]], by the way.
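
The defaulting rules summarized in [9] can be sketched as a single function, under several simplifying assumptions. Every [CODE[text/*]] XML type is treated like [CODE[text/xml]], and every other type like [CODE[application/xml]]. UTF-16 without a BOM is not detected. The regular expression only approximates the grammar of the XML declaration. [CODE[xml_charset]] is a hypothetical name.

```python
import re
from email.message import Message

# Approximate match for an XML declaration's encoding pseudo-attribute at the
# very start of the entity (a simplification of the XML 1.0 grammar).
_XML_DECL = re.compile(rb"<\?xml[^>]*?encoding\s*=\s*['\"]([A-Za-z][A-Za-z0-9._-]*)['\"]")


def xml_charset(content_type: str, body: bytes) -> str:
    """Return the charset to use for an XML entity, following [9] above."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lowercased, or None if absent
    if charset is not None:
        return charset
    if msg.get_content_maintype() == "text":
        # text/xml and friends: US-ASCII, in MIME and over HTTP alike.
        return "us-ascii"
    # application/xml and friends: no MIME default, so look at the entity itself.
    if body.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if body.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"
    match = _XML_DECL.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"  # the XML 1.0 default when nothing else says otherwise
```

The first assertion in a quick check is the surprising part of RFC 3023. A [CODE[text/xml]] entity without a [CODE[charset]] parameter is US-ASCII even if its XML declaration says otherwise.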