1
2 Network Working Group F. Yergeau
3 Internet Draft G. Nicol
4 <draft-ietf-html-i18n-02.txt> G. Adams
5 Expires 27 May 1996 M. Duerst
6 22 November 1995
7
8
9 Internationalization of the Hypertext Markup Language
10
11
12 Status of this Memo
13
14 This document is an Internet-Draft. Internet-Drafts are working doc-
15 uments of the Internet Engineering Task Force (IETF), its areas, and
16 its working groups. Note that other groups may also distribute work-
17 ing documents as Internet-Drafts.
18
19 Internet-Drafts are draft documents valid for a maximum of six
20 months. Internet-Drafts may be updated, replaced, or obsoleted by
21 other documents at any time. It is not appropriate to use Internet-
22 Drafts as reference material or to cite them other than as a "working
23 draft" or "work in progress".
24
25 To learn the current status of any Internet-Draft, please check the
26 1id-abstracts.txt listing contained in the Internet-Drafts Shadow
27 Directories on ds.internic.net (US East Coast), nic.nordu.net
28 (Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
29 Rim).
30
31 Distribution of this document is unlimited. Please send comments to
32 the HTML working group (HTML-WG) of the Internet Engineering Task
33 Force (IETF) at <html-wg@oclc.org>. Discussions of the group are
34 archived at URL: http://www.acl.lanl.gov/HTML_WG/archives.html.
35
36
37 Abstract
38
39 The Hypertext Markup Language (HTML) is a simple markup language used
40 to create hypertext documents that are platform independent. Ini-
41 tially, the application of HTML on the World Wide Web was seriously
42 restricted by its reliance on the ISO-8859-1 coded character set,
43 which is appropriate only for Western European languages. Despite
44 this restriction, HTML has been widely used with other languages,
45 using other coded character sets or character encodings, through var-
46 ious ad hoc extensions to the language.
47
48 This document is meant to address the issue of the internationaliza-
49 tion of HTML by extending the specification of HTML and giving
50
51
52
53 Expires 27 May 1996 [Page 1]
54
55 Internet Draft HTML internationalization 22 November 1995
56
57
58 additional recommendations for proper internationalisation support.
59 A foremost consideration is to make sure that HTML remains a valid
60 application of SGML, while enabling its use in all languages of the
61 world.
62
63
64 Table of contents
65
66 1. Introduction .................................................. 2
67 1.1. Scope ...................................................... 3
68 1.2. Conformance ................................................ 3
69 2. The document character set ..................................... 4
70 2.1. Reference processing model ................................. 4
71 2.2. The document character set ................................. 6
72 2.3. Undisplayable characters ................................... 7
73 3. Language tags .................................................. 7
74 4. Additional entities, attributes and elements ................... 9
75 4.1. Full Latin-1 entity set .................................... 9
76 4.2. Markup for language-dependent presentation ................. 9
77 5. Forms ..........................................................11
78 5.1. DTD additions ..............................................12
79 5.2. Form submission ............................................12
80 6. Miscellaneous ..................................................13
81 7. HTML public text ...............................................14
82 7.1. HTML DTD ...................................................14
83 7.2. SGML declaration for HTML ..................................29
84 7.3. ISO Latin 1 character entity set ...........................31
85 Bibliography ......................................................33
86 Authors' Addresses ................................................35
87
88
89 1. Introduction
90
91 The Hypertext Markup Language (HTML) is a simple markup language used
92 to create hypertext documents that are platform independent. Ini-
93 tially, the application of HTML on the World Wide Web was seriously
94 restricted by its reliance on the ISO-8859-1 coded character set,
95 which is appropriate only for Western European languages. Despite
96 this restriction, HTML has been widely used with other languages,
97 using other coded character sets or character encodings, through var-
98 ious ad hoc extensions to the language [TAKADA].
99
100 This document is meant to address the issue of the internationaliza-
101 tion of HTML by extending the specification of HTML and giving addi-
102 tional recommendations for proper internationalisation support. It
103 is in good part based on a paper by one of the authors on multilin-
104 gualism on the WWW [NICOL]. A foremost consideration is to make sure
105 that HTML remains a valid application of SGML, while enabling its use
114 in all languages of the world.
115
116 The specific issues addressed are the SGML document character set to
117 be used for HTML, the proper treatment of the charset parameter asso-
118 ciated with the "text/html" content type and the specification of
119 language tags and additional entities.
120
121
122 1.1. Scope
123
124 HTML has been in use by the World-Wide Web (WWW) global information
125 initiative since 1990. This specification extends the capabilities
126 of HTML (RFC 1866), primarily by removing the restriction to the
127 ISO-8859-1 coded character set [ISO-8859-1].
128
129 HTML is an application of ISO Standard 8879:1986, Information Pro-
130 cessing Text and Office Systems -- Standard Generalized Markup Lan-
131 guage (SGML) [ISO-8879]. The HTML Document Type Definition (DTD) is a
132 formal definition of the HTML syntax in terms of SGML. This specifi-
133 cation amends the DTD of HTML in order to make it applicable to docu-
134 ments encompassing a character repertoire much larger than that of
135 ISO-8859-1, while still remaining SGML conformant.
136
137
138 1.2. Conformance
139
140 This specification changes slightly the conformance requirements of
141 HTML documents and HTML user agents.
142
143 1.2.1. Documents
144
145 All HTML 2.0 conforming documents remain conforming with this speci-
146 fication. However, the extensions introduced here make valid cer-
147 tain documents that would not be HTML 2.0 conforming, in particular
148 those containing characters or character references outside of the
149 repertoire of ISO 8859-1.
150
151 1.2.2. User agents
152
153 In addition to the requirements of RFC 1866, the following require-
154 ments are placed on HTML user agents.
155
156 To ensure interoperability and proper support for at least
157 ISO-8859-1 in an environment where character encoding schemes
158 other than ISO-8859-1 are present, user agents must correctly
159 interpret the charset parameter accompanying an HTML document
160 received from the network.
161
170 Furthermore, conforming user-agents are required to at least parse
171 correctly numeric character references within the range of the
172 Basic Multilingual Plane (BMP) of ISO 10646-1 [ISO-10646].
173
174 NOTE -- To support non-western writing systems, it is rec-
175 ommended that HTML user agents support `UNICODE-1-1' or
176 similar character encoding schemes and as much of the char-
177 acter repertoire of [ISO-10646] as is practical.
178
179 2. The document character set
180
181 2.1. Reference processing model
182
183 This overview explains a reference processing model used for HTML,
184 and in particular the SGML concept of a document character set. An
185 actual implementation may widely differ in its internal workings from
186 the model given below, but should behave as described to an outside
187 observer.
188
189 Because there are various widely differing encodings of text, SGML
190 does not directly address the question of how characters are encoded
191 e.g. in a file. SGML views the characters as a single set (called a
192 "character repertoire"), and a "code set" that assigns an integer
193 number (known as "character number") to each character in the reper-
194 toire. The document character set declaration defines what each of
195 the character numbers represents [GOLD90, p. 451]. In most cases, an
196 SGML DTD and all documents that refer to it have a single document
197 character set, and all markup and data characters are part of this
198 set.
199
200 HTML, as an application of SGML, does not directly address the ques-
201 tion of how characters are encoded as octets in external representa-
202 tions such as files. This is deferred to mechanisms external to HTML,
203 such as the HTTP protocol, or MIME for electronic mail.
204
205 For the HTTP protocol [HTTP], the way characters are encoded is
206 defined by the "charset" parameter[1] of the "Content-Type" field of
207 the header of an HTTP response. For example, to indicate that the
208 _________________________
209 [1] The term "charset" in MIME is used to designate a
210 character encoding, rather than a coded character set
211 as the term may suggest. A character encoding is a
212 mapping (possibly many-to-one) of a sequence of octets
213 to a sequence of characters taken from one or more
214 character repertoires. A coded character set is a map-
215 ping between individual bit patterns and individual
216 characters from a single character repertoire.
217
226 transmitted document is encoded in the "JIS" encoding of Japanese
227 [RFC1468], the header will contain the following line:
228
229 Content-Type: text/html; charset=ISO-2022-JP
230
231 The default charset parameter in the case of the HTTP protocol is
232 ISO-8859-1 (the so-called "Latin-1" for Western European characters).
233 The HTTP protocol also defines a mechanism for the client to specify
234 the character encodings it can accept. Clients and servers are
235 strongly requested to use these mechanisms to assure correct trans-
236 mission and interpretation of any document. Provisions that can be
237 taken to help correct interpretation, even in cases where a server or
238 client does not yet use these mechanisms, are described in section 6.
239
240 Similarly, if HTML documents are transferred by electronic mail, the
241 character encoding is defined by the "charset" parameter of the "Con-
242 tent-Type" MIME header line [RFC1521].
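
As a rough illustration of the two preceding paragraphs, the following Python sketch extracts the charset parameter from a Content-Type value and falls back to ISO-8859-1 when none is given. The names charset_of and decode_body are assumptions of this example, and the standard email.message.Message class is used here only as a convenient parameter parser; a real user agent may proceed quite differently.

```python
from email.message import Message

DEFAULT_CHARSET = "iso-8859-1"  # HTTP default named in the text above


def charset_of(content_type: str) -> str:
    """Return the lower-cased charset parameter, or the default if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(DEFAULT_CHARSET)


def decode_body(body: bytes, content_type: str) -> str:
    """Decode an entity body according to its declared character encoding."""
    return body.decode(charset_of(content_type))
```

With the header line shown earlier, charset_of("text/html; charset=ISO-2022-JP") yields "iso-2022-jp", which Python accepts directly as a codec name.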
243
244 Should other ways of transferring and storing HTML documents be
245 defined or become popular, it is advised that similar provisions
246 be made to clearly identify the character encoding used and/or
247 to use a single/default encoding capable of representing the widest
248 range of characters used in an international context.
249
250 Whatever the external character encoding may be, the reference
251 processing model translates it to a representation of the document
252 character set specified in Section 2.2 before processing specific to
253 SGML/HTML. The reference processing model can be depicted as fol-
254 lows:
255
256 [resource]->[decoder]->[entity ]->[ SGML ]->[application]->[display]
257 [manager] [parser]
258 ^ |
259 | |
260 +----------+
261
262 The decoder is responsible for decoding the external representation
263 of the resource to a representation using the document character set.
264 The entity manager, the parser, and the application deal only with
265 characters of the document character set. A display-oriented part of
266 the application or the display machinery itself may again convert
267 characters represented in the document character set to some other
268 representation more suitable for their purpose. In any case, the
269 entity manager, the parser, and the application, as far as character
270 semantics are concerned, are using the HTML document character set
271 only.
272
273 An actual implementation may choose, or not, to translate the
282 document into some encoding of the document character set as
283 described above; the behaviour described by this reference processing
284 model can be achieved otherwise. This subject is well out of the
285 scope of this specification, however, and the reader is invited to
286 consult the SGML standard [ISO-8879] or an SGML handbook [BRYAN88]
287 [GOLD90] [VANH90] [SQ91] for further information.
288
289 The most important consequence of this reference processing model is
290 that numeric character references are always resolved to the same
291 characters, whatever the external encoding actually used. For an
292 example, see Section 2.2.
293
294 2.2. The document character set
295
296 The document character set, in the SGML sense, is the Basic Multilin-
297 gual Plane of ISO 10646:1993 [ISO-10646], also known as UCS-2. This
298 is code-by-code identical with the Unicode standard [UNICODE]. The
299 adoption of this document character set implies a change in the SGML
300 declaration specified in the HTML 2.0 specification (section 9.5 of
301 [HTML-2]). The change amounts to removing the two BASESET specifica-
302 tions and their accompanying DESCSET declarations, replacing them
303 with the following declaration:
304
305 BASESET "ISO Registration Number 176//CHARSET
306 ISO/IEC 10646-1:1993 UCS-2 with implementation level 3
307 //ESC 2/5 2/15 4/5"
308 DESCSET 0 9 UNUSED
309 9 2 9
310 11 2 UNUSED
311 13 1 13
312 14 18 UNUSED
313 32 95 32
314 127 1 UNUSED
315 128 32 UNUSED
316 160 65374 160
317
318 Making UCS-2 the document character set does not create non-
319 conformance of any expression, construct or document that is conform-
320 ing to HTML 2.0. It does make conforming certain constructs that are
321 not admissible in HTML 2.0. One consequence is that data characters
322 outside the repertoire of ISO-8859-1, but within that of UCS-2 become
323 valid SGML characters. Another is that the upper limit of the range
324 of numeric character references is extended from 255 to 65533[2] ;
325 _________________________
326 [2] 65533 (FFFD hexadecimal) is the last valid char-
327 acter in UCS-2. 65534 (FFFE hexadecimal) is unassigned
328 and reserved as the byte-swapped version of ZERO WIDTH
329 NON-BREAKING SPACE for byte-sex detection purposes.
330
338 thus, &#1048; is a valid reference to a "CYRILLIC CAPITAL LETTER I".
339 [ERCS] is a good source of information on Unicode and SGML, although
340 its scope and technical content differ greatly from this specifica-
341 tion.
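
The consequence noted in Section 2.1, that a numeric character reference resolves to the same character whatever the external encoding, can be sketched in Python as follows. This is only an illustration under simplifying assumptions: the function names are invented for this example, only decimal references are handled, and the UNUSED positions of the DESCSET are not checked.

```python
import re

_NCR = re.compile(r"&#([0-9]+);")
LAST_VALID = 65533  # upper limit of numeric character references (FFFD)


def resolve_ncrs(text: str) -> str:
    """Replace decimal numeric character references by the characters
    they denote in the document character set."""
    def replace(match):
        number = int(match.group(1))
        # References beyond the upper limit are left untouched here.
        return chr(number) if number <= LAST_VALID else match.group(0)
    return _NCR.sub(replace, text)


def process(raw: bytes, charset: str) -> str:
    """Decoder step followed by reference resolution."""
    return resolve_ncrs(raw.decode(charset))
```

Because the decoder runs first, process(b"&#1048;", "iso-8859-1") and the same octets labeled as "koi8-r" both yield CYRILLIC CAPITAL LETTER I.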
342
343 ISO 10646-1:1993 is the most encompassing character set currently
344 existing, and there is no other character set that could take its
345 place as the document character set for HTML. Also, it is expected
346 that with future extensions of ISO 10646, this specification may also
347 be extended. If nevertheless for a specific application there is a
348 need to use characters outside this standard, this should be done by
349 avoiding any conflicts with present or future versions of ISO 10646,
350 i.e. by assigning these characters to a private zone. Also, it should
351 be borne in mind that such a use will be highly unportable; in many
352 cases, it may be better to use inline bitmaps.
353
354 2.3. Undisplayable characters
355
356 With the document character set being the full ISO 10646 BMP, the
357 possibility that a character cannot be displayed due to lack of
358 appropriate resources (fonts) cannot be avoided. Because there are
359 many different things that can be done in such a case, this document
360 does not recommend any specific behaviour. Depending on the implemen-
361 tation, this may also be handled by the underlying display system
362 and not the application itself. The following considerations, how-
363 ever, may be of help:
364
365 - A clearly visible, but unobtrusive behaviour should be preferred.
366 Some documents may contain many characters that cannot be ren-
367 dered, and so showing an alert for each of them is not the right
368 thing to do.
369
370 - In case a numeric representation of the missing character is
371 given, its hexadecimal (not decimal) form is to be preferred,
372 because this form is used in character set standards [ERCS].
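
One possible reading of the second consideration is sketched below in Python. The bracketed placeholder format and the can_display callback are assumptions of this example; this document does not prescribe any particular presentation.

```python
def placeholder(ch: str) -> str:
    """Hexadecimal (not decimal) stand-in for an undisplayable character."""
    return f"[{ord(ch):04X}]"


def render(text: str, can_display) -> str:
    """Substitute a placeholder for every character that the hypothetical
    can_display(ch) callback reports as lacking a glyph."""
    return "".join(ch if can_display(ch) else placeholder(ch) for ch in text)
```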
373
374 3. Language tags
375
376 Language tags can be used to control rendering of a marked up docu-
377 ment in various ways: character disambiguation, in cases where the
378 character encoding is not sufficient to resolve to a specific glyph;
379 quotation marks; hyphenation; ligatures; spacing; voice synthesis;
380 etc. Independently of rendering issues, language markup is useful as
381 content markup for purposes such as classification and searching.
382
383 _________________________
384 [2, continued] 65535 (FFFF hexadecimal) is unassigned.
385
394 The language attribute, LANG, takes as its value a language tag that
395 identifies a natural language spoken, written, or otherwise conveyed
396 by human beings for communication of information to other human
397 beings. Computer languages are explicitly excluded.
398
399 The syntax and registry of HTML language tags are the same as those
400 defined by RFC 1766 [RFC1766]. In summary, a language tag is composed
401 of one or more parts: a primary language tag and a possibly empty
402 series of subtags:
403
404 language-tag = primary-tag *( "-" subtag )
405 primary-tag = 1*8ALPHA
406 subtag = 1*8ALPHA
407
408 Whitespace is not allowed within the tag and all tags are case-
409 insensitive. The namespace of language tags is administered by the
410 IANA. Example tags include:
411
412 en, en-US, en-cockney, i-cherokee, x-pig-latin
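
The grammar above translates directly into a regular expression. The Python sketch below checks syntax only; it does not consult the IANA registry, ISO 639 or ISO 3166, and the name is_language_tag is an assumption of this example.

```python
import re

# language-tag = primary-tag *( "-" subtag ), each part being 1*8ALPHA
_LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")


def is_language_tag(tag: str) -> bool:
    """True if the tag matches the syntax above (no whitespace allowed)."""
    return _LANGUAGE_TAG.fullmatch(tag) is not None
```

All five example tags listed above pass this check, while a tag such as "en_US" or "en-" does not.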
413
414 Two-letter primary-tags are reserved for ISO 639 language abbrevia-
415 tions [ISO-639], and three-letter primary-tags for the language
416 abbreviations of the "Ethnologue" [ETHNO] (the latter is in addition
417 to the requirements of RFC 1766). Any two-letter initial subtag is an
418 ISO 3166 country code [ISO-3166].
419
420 In the context of HTML, a language tag is not to be interpreted as a
421 single token, as per RFC 1766, but as a hierarchy. For example, a
422 user agent that adjusts rendering according to language should con-
423 sider that it has a match when a language tag in a style sheet entry
424 matches the initial portion of the language tag of an element. An
425 exact match should be preferred. This interpretation allows an ele-
426 ment marked up as, for instance, "en-US" to trigger styles corre-
427 sponding to, in order of preference, US-English ("en-US") or 'plain'
428 or 'international' English ("en").
429
430 NOTE -- using the language tag as a hierarchy does not
431 imply that all languages with a common prefix will be
432 understood by those fluent in one or more of those lan-
433 guages; it simply allows the user to request this commonal-
434 ity when it is true for that user.
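
The hierarchical interpretation can be sketched as a longest-prefix search, as in the following Python fragment. The function name and the idea of passing the style sheet's language tags as a plain list are assumptions of this example; no style sheet mechanism is defined by this specification.

```python
def best_style_match(element_lang: str, style_langs):
    """Return the style sheet language tag that best matches the element's
    LANG value: an exact match first, then progressively shorter prefixes.
    Comparison is case-insensitive; None means no match."""
    available = {tag.lower(): tag for tag in style_langs}
    parts = element_lang.lower().split("-")
    for length in range(len(parts), 0, -1):
        candidate = "-".join(parts[:length])
        if candidate in available:
            return available[candidate]
    return None
```

An element marked "en-US" thus selects an "en-US" entry when one exists and otherwise falls back to "en"; an element marked "en" never matches an "en-US" entry.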
435
436 Since any text can logically be assigned a language, almost all HTML
437 elements admit the LANG attribute. The DTD reflects this. It is
438 also intended that any new element introduced in later versions of
439 HTML will admit the LANG attribute, unless there is a good reason not
440 to do so.
441
450 The rendering of elements is meant to be controlled (in part) by the
451 LANG attribute. Specific user preferences set within the browser
452 should override the value of the LANG attribute, which in turn over-
453 rides the value specified by the LANG attribute of any enclosing ele-
454 ment. If none of these are set, a suitable default, perhaps con-
455 trolled by the user's locale, should be used to control rendering.
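
The order of precedence just described might be expressed as follows. This Python sketch assumes that the caller supplies the LANG values of the element and of its ancestors, innermost first, with None where the attribute is absent; the function and parameter names are inventions of this example.

```python
def effective_language(user_preference, lang_chain, locale_default):
    """Pick the language tag that controls rendering.

    user_preference: tag set by the user in the browser, or None
    lang_chain:      LANG values from the element outward to the root
    locale_default:  fallback, e.g. derived from the user's locale
    """
    if user_preference:
        return user_preference
    for lang in lang_chain:      # nearest LANG attribute wins
        if lang:
            return lang
    return locale_default
```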
456
457 4. Additional entities, attributes and elements
458
459 4.1. Full Latin-1 entity set
460
461 According to the suggestion of section 14 of [HTML-2], the set of
462 Latin-1 entities is extended to cover the whole right part of
463 ISO-8859-1 (all code positions with the high-order bit set). The
464 names of the entities are taken from the appendices of [SGML]. A
465 list is provided in section 7.3.1 of this specification.
466
467 4.2. Markup for language-dependent presentation
468
469 For the correct presentation of text from certain languages (irre-
470 spective of formatting issues), some support in the form of addi-
471 tional entities and elements is needed. In particular, bidirectional
472 text (BIDI for short) requires markup in special circumstances where
473 ambiguities as to the directionality of some characters have to be
474 resolved. Plain text may contain this markup in the form of special-
475 purpose characters; in HTML, these are replaced by SGML markup to be
476 described below.
477
478 This markup affects the ability to render BIDI text in a semantically
479 legible fashion. That is, without this special BIDI markup, cases
480 arise which would prevent *any* rendering whatsoever that reflected
481 the basic meaning of the text. It is for this reason that these spe-
482 cial characters were added to Unicode (and, thence, to ISO/IEC
483 10646). If it were possible to do reliable layout and rendering of
484 bidirectional text without them, they definitely would not have been
485 included in Unicode.
486
487 First, a set of named character entities is added that allows partial
488 support of the Unicode bidirectional algorithm [UNICODE], plus some
489 help with languages requiring contextual analysis for rendering:
490
491 <!ENTITY zwnj CDATA "&#8204;"--=zero width non-joiner-->
492 <!ENTITY zwj CDATA "&#8205;"--=zero width joiner-->
493 <!ENTITY lrm CDATA "&#8206;"--=left-to-right mark-->
494 <!ENTITY rlm CDATA "&#8207;"--=right-to-left mark-->
495
496 The first two, zwnj and zwj, are used to force or block joining
497 behavior in contexts in which joining would occur but should not or
506 would not occur but should. For example, ARABIC LETTER HEH is used
507 to abbreviate "Hijri" (the Islamic calendrical system); however, the
508 isolated form of HEH looks like the digit five as employed in Arabic
509 script (actually based on Indic digits). In order to prevent one
510 from reading HEH as a final digit five in a year, the initial form of
511 HEH is used. However, there is no following context (i.e., a joining
512 letter) to which the HEH can join. Therefore, the ZWJ is used to
513 provide that context. In Persian texts, there are cases where a let-
514 ter that normally would join a subsequent letter in a cursive connec-
515 tion does not. Here the ZWNJ is used.
516
517 The other two, lrm and rlm, are used to disambiguate directionality
518 of directionally neutral characters, e.g., if you have a double quote
519 sitting between an Arabic and a Latin letter, then which direction
520 does the quote resolve to? These characters are like zero width
521 spaces which have a directional property (but no word/line break
522 property).
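
For illustration, the four entities can be tabulated against their character numbers and expanded in a string, as in the Python sketch below. The table name and the expand function are assumptions of this example, and the plain string replacement shown is not a substitute for an SGML parser.

```python
import unicodedata

# Entity name -> character number, as declared above
BIDI_ENTITIES = {"zwnj": 8204, "zwj": 8205, "lrm": 8206, "rlm": 8207}


def expand(text: str) -> str:
    """Replace &zwnj; &zwj; &lrm; &rlm; by the characters they denote."""
    for name, number in BIDI_ENTITIES.items():
        text = text.replace(f"&{name};", chr(number))
    return text


# ARABIC LETTER HEH followed by ZWJ, as in the "Hijri" abbreviation above
hijri_abbreviation = expand("\u0647&zwj;")

for name, number in BIDI_ENTITIES.items():
    print(name, f"{number:04X}", unicodedata.name(chr(number)))
```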
523
524 Next, an attribute called DIR is introduced, restricted to the values
525 LTR and RTL and admitted by most elements. On block-type elements,
526 the DIR attribute indicates the base directionality of the text in
527 the block; if omitted it is inherited from the parent element. On
528 inline elements, it makes the element start a new embedding level; if
529 omitted the inline element does not start a new embedding level.
530 Embedding is used to handle nested directional runs; a common need
531 for the embedding characters is to handle text that has been pasted
532 from one bidi context to another, and the possibility of multiply
533 embedded pastings. Following is an example of a case where embedding
534 is needed, showing its effect:
535
536 Given the following Latin (upper case) and Arabic (lower
537 case) letters in backing store with the specified embed-
538 dings (LRE is shorthand for <SPAN DIR=LTR>, RLE for <SPAN
539 DIR=RTL> and PDF for </SPAN>):
540
541 LRE A B RLE a b LRE C D PDF c d PDF E F PDF
542
543 One gets the following rendering (with [] showing the
544 directional transitions):
545
546 [ A B [ d c [ C D ] b a ] E F ]
547
548 On the other hand, without these characters, e.g., with
549
550 A B a b C D c d E F
551
552 and a base level of LTR one gets the following rendering:
553
562 [ A B [ b a ] C D [ d c ] E F ]
563
564 Notice that b,a is on the left and d,c on the right unlike
565 the above case where the embedding levels are used. With-
566 out the embedding characters one has at most two levels: a
567 base directional level and a single counterflow directional
568 level.
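
The two renderings above can be reproduced with a toy model, sketched below in Python. It is emphatically not the Unicode bidirectional algorithm: it assumes, as the example does, that upper case letters are left-to-right, that lower case letters are right-to-left, that tokens are separated by spaces, and that a right-to-left run simply displays its items in reverse order. All names are inventions of this example.

```python
from itertools import groupby


def _show(item) -> str:
    """Render a letter, or a (direction, children) run in brackets."""
    if isinstance(item, str):
        return item
    direction, children = item
    ordered = children if direction == "LTR" else children[::-1]
    return "[ " + " ".join(_show(child) for child in ordered) + " ]"


def render_embedded(backing_store: str) -> str:
    """Render text carrying explicit LRE / RLE / PDF embedding tokens."""
    root = []                      # items at the base level
    stack = [("LTR", root)]
    for token in backing_store.split():
        if token in ("LRE", "RLE"):
            run = ("LTR" if token == "LRE" else "RTL", [])
            stack[-1][1].append(run)
            stack.append(run)
        elif token == "PDF":
            if len(stack) > 1:     # ignore an unmatched PDF
                stack.pop()
        else:
            stack[-1][1].append(token)
    return " ".join(_show(item) for item in root)


def render_implicit(backing_store: str) -> str:
    """Render with a base level of LTR and no embeddings: each maximal
    run of lower case (right-to-left) letters is reversed in place."""
    shown = []
    for is_rtl, run in groupby(backing_store.split(), key=str.islower):
        letters = list(run)
        if is_rtl:
            shown.append("[ " + " ".join(letters[::-1]) + " ]")
        else:
            shown.append(" ".join(letters))
    return "[ " + " ".join(shown) + " ]"


print(render_embedded("LRE A B RLE a b LRE C D PDF c d PDF E F PDF"))
print(render_implicit("A B a b C D c d E F"))
```

The implicit variant can never produce more than two levels, which is the limitation the paragraph above points out.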
569
570 A directional override feature is needed to deal with
571 unusual pieces of text in which directionality cannot be
572 resolved from context in an unambiguous fashion. For exam-
573 ple, in part numbers, formulas, telephone numbers, and
574 other similar pieces of text, it is difficult or impossible
575 to derive the directionality of numbers, punctuation, and
576 other neutrals from their context. To this effect, a new
577 element called BDO (BIDI override) is introduced, which
578 requires the DIR attribute to specify whether the override
579 is left-to-right or right-to-left.
580
581 A few other additional elements are important to have for
582 proper language-dependent rendering. First, a generic con-
583 tainer is needed to carry the LANG and BIDI attributes in
584 cases where no other element is appropriate; the SPAN ele-
585 ment is introduced for that purpose.
586
587 Short quotations, and in particular the quotation marks
588 surrounding them, are typically rendered differently in
589 different languages and on platforms with different graphic
590 capabilities: "a quotation in English", `another, slightly
591 better one', ,,a quotation in German", << a quotation in
592 French >>. The <Q> element is introduced for that purpose.
593
594 Many languages, notably French, require superscripts for
595 proper rendering: "Mlle Dupont" should have "lle" in super-
596 script. The <SUP> element, and its sibling <SUB>, are
597 introduced to allow proper markup of such text. <SUP> and
598 <SUB> contents are restricted to PCDATA to avoid nesting
599 problems.
600
601 Finally, in many languages text justification is much more
602 important than it is in Western languages, and justifies
603 markup. The ALIGN attribute, admitting values of LEFT,
604 RIGHT, CENTER and JUSTIFY, is added to a selection of ele-
605 ments where it makes sense (block-like).
606
607 5. Forms
608
618 5.1. DTD additions
619
620 It is natural to expect input in any language in forms, as they pro-
621 vide one of the only ways of obtaining user input. While this is pri-
622 marily a UI issue, there are some things that should be specified at
623 the HTML level to guide behavior and promote interoperability.
624
625 To ensure interoperability, it is necessary for the user agent (and
626 the user) to have an indication of the character set(s) that the
627 server providing a form will be able to handle upon submission of the
628 filled-in form. Such an indication is provided by the ACCEPT-CHARSET
629 attribute of the FORM element, modeled on the HTTP Accept-Charset
630 header (see [HTTP]), which contains a space and/or comma delimited
631 list of character sets acceptable to the server. A user agent may
632 want to advise the user of the contents of this attribute, or
633 to prevent the user from entering unacceptable characters.
634
635 NOTE -- The list of character sets is to be interpreted as
636 an EXCLUSIVE-OR list; the server announces that it is ready
637 to accept any ONE of these character encoding schemes for
638 each part of a multipart entity.
639
640 NOTE -- The default value for the ACCEPT-CHARSET attribute
641 of a FORM element is the reserved value "UNKNOWN". A user
642 agent may interpret that value as the character encoding
643 scheme that was used to transmit the document containing
644 that FORM element.
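
A user agent might handle the attribute along the following lines. This Python sketch splits the space and/or comma delimited list, treats the reserved value UNKNOWN as the encoding of the containing document (one of the interpretations allowed by the second NOTE), and picks the first listed encoding able to represent the text entered. The function names are assumptions of this example.

```python
import re


def parse_accept_charset(value: str, document_charset: str):
    """Split an ACCEPT-CHARSET value into a list of charset names."""
    names = [name for name in re.split(r"[\s,]+", value.strip()) if name]
    if not names or [name.upper() for name in names] == ["UNKNOWN"]:
        return [document_charset]
    return names


def pick_charset(text: str, charsets):
    """Return the first charset that can encode the text, or None."""
    for charset in charsets:
        try:
            text.encode(charset)
        except (UnicodeEncodeError, LookupError):
            continue   # unencodable text, or a name Python does not know
        return charset
    return None
```

Returning None signals that the text cannot be submitted in any acceptable encoding, at which point the user agent could warn the user.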
645
646
647 5.2. Form submission
648
649 The HTML 2.0 form submission mechanism, based on the "application/x-
650 www-form-urlencoded" media type, is hopelessly broken with regard to
651 internationalization. In fact, since URLs are restricted to ASCII
652 characters, the mechanism is broken even for ISO-8859-1 text. Sec-
653 tion 2.2 of [RFC1738] specifies that octets may be encoded using the
654 "%HH" notation, but text submitted from a form is composed of charac-
655 ters, not octets. Lacking a specification of a character encoding
656 scheme, the "%HH" notation has no meaning.
657
658 A partial solution to this sorry state of affairs is to specify a
659 default character encoding scheme to be assumed when the GET method
660 of form submission is used. Specifying UCS-2 would break all exist-
661 ing forms, so the only sensible way is to designate ISO-8859-1. That
662 is, the encoded URL sent to submit a form by the GET method is to be
663 interpreted as a sequence of single-octet characters encoded accord-
664 ing to ISO-8859-1, and further encoded according to the scheme of
665 [RFC1738] (the "%HH" notation). This is clearly insufficient, so the
674 GET method of form submission is deprecated and should not be used in
675 future documents, despite the language of section 8 of [HTML-2].
676
677 A better solution is to add a MIME charset parameter to the "applica-
678 tion/x-www-form-urlencoded" media type specifier sent along with a
679 POST method form submission, with the understanding that the URL
680 encoding of [RFC1738] is applied on top of the specified character
681 encoding, as a kind of implicit Content-Transfer-Encoding. The
682 default ISO-8859-1 is implied in the absence of a charset parameter.
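
The layering described here, a character encoding first and the "%HH" notation on top of it, can be sketched with Python's standard urllib.parse module. The charset parameter on this media type is the proposal of this section rather than established practice, and the function names are assumptions of this example.

```python
from urllib.parse import parse_qs, urlencode

DEFAULT_CHARSET = "iso-8859-1"  # implied when no charset parameter is sent


def encode_form(fields: dict, charset: str = DEFAULT_CHARSET):
    """Return (Content-Type value, body) for a POST form submission."""
    body = urlencode(fields, encoding=charset)   # characters -> octets -> %HH
    content_type = f"application/x-www-form-urlencoded; charset={charset}"
    return content_type, body


def decode_form(body: str, charset: str = DEFAULT_CHARSET) -> dict:
    """Reverse the two layers; only the first value of each name is kept."""
    return {name: values[0]
            for name, values in parse_qs(body, encoding=charset).items()}
```

The same text yields different "%HH" sequences under different encodings ("Zo%EB" in ISO-8859-1, "Zo%C3%AB" in UTF-8), which is why the notation has no meaning unless the encoding is known.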
683
684 The best solution is to use the "multipart/form-data" media type
685 described in [FILE-UPLOAD] with the POST method of form submission.
686 This mechanism encapsulates the value part of each name-value pair in
687 a body-part of a multipart MIME body that is sent as the HTTP entity;
688 each body part can be labeled with an appropriate Content-Type,
689 including if necessary a charset parameter that specifies the charac-
690 ter encoding scheme. The changes to the DTD necessary to support
691 this method of form submission have been incorporated in the DTD
692 included in this specification.
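
As a sketch of this mechanism, the Python fragment below builds a multipart body in which each value carries its own charset label. It relies on the standard email.mime classes, which were designed for mail rather than HTTP (they add MIME-Version headers and may apply a transfer encoding), so it should be read as an illustration of the labeling only; build_form_data is a name invented for this example.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_form_data(fields):
    """fields: iterable of (name, value, charset) triples."""
    form = MIMEMultipart("form-data")
    for name, value, charset in fields:
        part = MIMEText(value, "plain", charset)   # labels the body part
        part.add_header("Content-Disposition", "form-data", name=name)
        form.attach(part)
    return form


form = build_form_data([
    ("city", "Zürich", "iso-8859-1"),
    ("title", "日本語", "iso-2022-jp"),
])
for part in form.get_payload():
    print(part.get_param("name", header="content-disposition"),
          part.get_content_charset())
```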

How the user agent determines the encoding of the text entered by the
user is outside the scope of this specification.

6. Miscellaneous

Proper interpretation of a text document requires that the character
encoding scheme be known. Current HTTP servers, however, do not
generally include an appropriate charset parameter with the
Content-Type header, even when the encoding scheme is different from
the default ISO-8859-1. This is bad behaviour, and as such strongly
discouraged, but some preventive measures can be taken to minimize the
detrimental effects.

In the case where a document is accessed from a hyperlink in an origin
HTML document, a CHARSET attribute is added to the attribute list of
elements with link semantics (A and LINK), specifically by adding it
to the linkExtraAttributes entity. The value of that attribute is to
be considered a hint to the User Agent as to the character encoding
scheme used by the resource pointed to by the hyperlink; it should be
the appropriate value of the MIME charset parameter for that resource.
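As an illustration of how a user agent might collect these hints, the following sketch records the CHARSET attribute of A and LINK elements. The class name LinkCharsetCollector and the example URL are hypothetical, and Python's html.parser stands in for a real SGML parser.

```python
from html.parser import HTMLParser


class LinkCharsetCollector(HTMLParser):
    """Collect CHARSET hints from elements with link semantics."""

    def __init__(self):
        super().__init__()
        self.hints = {}  # maps HREF value to the advisory charset

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag and attribute names in lower case.
        if tag in ("a", "link"):
            attributes = dict(attrs)
            if "href" in attributes and "charset" in attributes:
                self.hints[attributes["href"]] = attributes["charset"]


collector = LinkCharsetCollector()
collector.feed('<A HREF="http://www.example.org/jp.html" '
               'CHARSET="ISO-2022-JP">Japanese page</A>')
print(collector.hints)
```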

In any document, it is possible to include an indication of the
encoding scheme like the following, as early as possible within the
HEAD of the document:

   <META HTTP-EQUIV="Content-Type"
         CONTENT="text/html; charset=ISO-2022-JP">

This is not foolproof, but will work if the encoding scheme is such
that ASCII characters stand for themselves at least until the META
element is parsed.
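A user agent could look for such a META element by scanning the first octets of the entity as if they were ASCII. The sketch below is one possible approach, not a prescribed algorithm; the function name sniff_meta_charset, the regular expression and the 1024-octet limit are assumptions.

```python
import re

# Matches the META form shown above; deliberately narrow.
META_CHARSET = re.compile(
    rb'<META\s+HTTP-EQUIV\s*=\s*"?Content-Type"?\s+'
    rb'CONTENT\s*=\s*"[^"]*charset=([^\s";]+)',
    re.IGNORECASE,
)


def sniff_meta_charset(raw, limit=1024):
    """Return the charset named in an early META element, or None."""
    match = META_CHARSET.search(raw[:limit])
    return match.group(1).decode("ascii") if match else None


doc = (b'<HTML><HEAD>\n<META HTTP-EQUIV="Content-Type"\n'
       b'      CONTENT="text/html; charset=ISO-2022-JP">\n<TITLE>'
       + "\u65e5\u672c\u8a9e".encode("iso-2022-jp")
       + b'</TITLE></HEAD></HTML>')
print(sniff_meta_charset(doc))  # ISO-2022-JP

# In UCS-2 every ASCII character is preceded by a zero octet, so the
# scan finds nothing: this is the "not foolproof" case.
ucs2 = '<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UNICODE-1-1">'
print(sniff_meta_charset(ucs2.encode("utf-16-be")))  # None
```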

For definiteness, the "charset" parameter received from the source of
the document should be considered the most authoritative, followed in
order of preference by the contents of a META element such as the
above, and finally the CHARSET attribute of the anchor that was
followed (if any).
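This order of preference can be written down directly. The function below is a sketch with hypothetical parameter names; it falls back to ISO-8859-1, the default mentioned earlier in this section, when no indication is available.

```python
def choose_charset(http_charset=None, meta_charset=None, link_charset=None):
    """Pick the character encoding in the order of preference above.

    http_charset: charset parameter received with the document
    meta_charset: charset found in a META element of the document
    link_charset: CHARSET attribute of the anchor that was followed
    """
    for candidate in (http_charset, meta_charset, link_charset):
        if candidate:
            return candidate
    return "ISO-8859-1"  # default when nothing else is known


print(choose_charset(None, "ISO-2022-JP", "EUC-JP"))  # ISO-2022-JP
```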

When HTML text is transmitted directly in UCS-2 (charset=UNICODE-1-1),
the question of byte order arises: does the high-order byte of each
two-byte character come first or second? For definiteness, this
specification recommends that UCS-2 be transmitted in big-endian byte
order (high-order byte first), which corresponds both to the
established network byte order for two-byte quantities and to the
Unicode recommendation for serialized text data. Furthermore, to
maximize chances of proper interpretation, it is recommended that
documents transmitted as UCS-2 always begin with a ZERO WIDTH NO-BREAK
SPACE character (hexadecimal FEFF) which, when byte-reversed, becomes
hexadecimal FFFE, a value guaranteed never to be assigned to a
character. Thus, a user agent receiving FFFE as the first two octets
of a text would know that bytes have to be reversed for the remainder
of the text.
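The byte-order check can be sketched as follows. The function name decode_ucs2 is hypothetical. Python has no codec named UCS-2, so the UTF-16 codecs are used here on the assumption that the text contains no surrogate pairs, in which case the two encodings coincide.

```python
def decode_ucs2(raw):
    """Decode UCS-2 octets, using a leading FEFF/FFFE to fix byte order."""
    first = raw[:2]
    if first == b"\xfe\xff":
        # FEFF read in big-endian order: bytes are in the recommended order.
        return raw[2:].decode("utf-16-be")
    if first == b"\xff\xfe":
        # FFFE is never a character, so the bytes must be reversed.
        return raw[2:].decode("utf-16-le")
    # No signature: assume the recommended big-endian order.
    return raw.decode("utf-16-be")


big = "\ufeffA\u00e9".encode("utf-16-be")
little = "\ufeffA\u00e9".encode("utf-16-le")
print(decode_ucs2(big) == decode_ucs2(little))  # True
```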

The UTF-1 transformation format of ISO 10646 (registered by IANA as
ISO-10646-UTF-1) has been removed from the standard and should not be
used.

7. HTML Public Text

7.1. HTML DTD

<!--    html.dtd

        Document Type Definition for the HyperText Markup Language,
        extended for internationalisation (HTML DTD)

        Last revised: 95/09/25

        Authors: Daniel W. Connolly <connolly@w3.org>
                 Francois Yergeau <yergeau@alis.com>
-->

<!ENTITY % HTML.Version
        "-//IETF//DTD HTML//EN"

        -- Typical usage:

            <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
            <html>
            ...
            </html>
        --
        >


<!--============ Feature Test Entities ========================-->

<!ENTITY % HTML.Recommended "IGNORE"
        -- Certain features of the language are necessary for
           compatibility with widespread usage, but they may
           compromise the structural integrity of a document.
           This feature test entity enables a more prescriptive
           document type definition that eliminates
           those features.
        -->

<![ %HTML.Recommended [
        <!ENTITY % HTML.Deprecated "IGNORE">
]]>

<!ENTITY % HTML.Deprecated "INCLUDE"
        -- Certain features of the language are necessary for
           compatibility with earlier versions of the specification,
           but they tend to be used and implemented inconsistently,
           and their use is deprecated. This feature test entity
           enables a document type definition that eliminates
           these features.
        -->

<!ENTITY % HTML.Highlighting "INCLUDE"
        -- Use this feature test entity to validate that a
           document uses no highlighting tags, which may be
           ignored on minimal implementations.
        -->

<!ENTITY % HTML.Forms "INCLUDE"
        -- Use this feature test entity to validate that a document
           contains no forms, which may not be supported in minimal
           implementations
        -->

<!--============== Imported Names ==============================-->

<!ENTITY % Content-Type "CDATA"
        -- meaning an internet media type
           (aka MIME content type, as per RFC1521)
        -->

<!ENTITY % HTTP-Method "GET | POST"
        -- as per HTTP specification, in progress
        -->

<!--========= DTD "Macros" =====================-->

<!ENTITY % heading "H1|H2|H3|H4|H5|H6">

<!ENTITY % list " UL | OL | DIR | MENU " >

<!ENTITY % attrs -- common attributes for elements --
        "LANG  NAME      #IMPLIED  -- RFC 1766 language tag --
         DIR   (ltr|rtl) #IMPLIED  -- text directionality --
         id    ID        #IMPLIED  -- element identifier --
         class NAMES     #IMPLIED  -- for subclassing elements --">

<!ENTITY % just -- an attribute for text justification --
        "ALIGN (left|right|center|justify) #IMPLIED">

<!--======= Character mnemonic entities =================-->

<!ENTITY % ISOlat1 PUBLIC
  "ISO 8879-1986//ENTITIES Added Latin 1//EN//HTML">
%ISOlat1;

<!--Entities for markup significant characters -->
<!ENTITY amp  CDATA "&#38;"   -- ampersand    -->
<!ENTITY gt   CDATA "&#62;"   -- greater than -->
<!ENTITY lt   CDATA "&#60;"   -- less than    -->
<!ENTITY quot CDATA "&#34;"   -- double quote -->

<!--Entities for language-dependent presentation (BIDI and contextual analysis) -->
<!ENTITY zwnj CDATA "&#8204;" -- zero width non-joiner -->
<!ENTITY zwj  CDATA "&#8205;" -- zero width joiner     -->
<!ENTITY lrm  CDATA "&#8206;" -- left-to-right mark    -->
<!ENTITY rlm  CDATA "&#8207;" -- right-to-left mark    -->

<!--========= SGML Document Access (SDA) Parameter Entities =====-->

<!-- HTML 2.0 contains SGML Document Access (SDA) fixed attributes
in support of easy transformation to the International Committee
for Accessible Document Design (ICADD) DTD
         "-//EC-USA-CDA/ICADD//DTD ICADD22//EN".
ICADD applications are designed to support usable access to
structured information by print-impaired individuals through
Braille, large print and voice synthesis.  For more information on
SDA & ICADD:
        - ISO 12083:1993, Annex A.8, Facilities for Braille,
          large print and computer voice
        - ICADD ListServ
          <ICADD%ASUACAD.BITNET@ARIZVM1.ccit.arizona.edu>
        - Usenet news group bit.listserv.easi
        - Recording for the Blind, +1 800 221 4792
-->

<!ENTITY % SDAFORM  "SDAFORM  CDATA  #FIXED"
          -- one to one mapping        -->
<!ENTITY % SDARULE  "SDARULE  CDATA  #FIXED"
          -- context-sensitive mapping -->
<!ENTITY % SDAPREF  "SDAPREF  CDATA  #FIXED"
          -- generated text prefix     -->
<!ENTITY % SDASUFF  "SDASUFF  CDATA  #FIXED"
          -- generated text suffix     -->
<!ENTITY % SDASUSP  "SDASUSP  NAME   #FIXED"
          -- suspend transform process -->

<!--========== Text Markup =====================-->

<![ %HTML.Highlighting [

<!ENTITY % font " TT | B | I ">

<!ENTITY % phrase "EM | STRONG | CODE | SAMP | KBD | VAR | CITE">

<!ENTITY % text "#PCDATA|A|IMG|BR|%phrase|%font|SPAN|Q|BDO|SUP|SUB">

<!ELEMENT (%font;|%phrase) - - (%text)*>
<!ATTLIST ( TT | CODE | SAMP | KBD | VAR )
        %attrs;
        %SDAFORM; "Lit"
        >
<!ATTLIST ( B | STRONG )
        %attrs;
        %SDAFORM; "B"
        >
<!ATTLIST ( I | EM | CITE )
        %attrs;
        %SDAFORM; "It"
        >

<!-- <TT>       Typewriter text                         -->
<!-- <B>        Bold text                               -->
<!-- <I>        Italic text                             -->

<!-- <EM>       Emphasized phrase                       -->
<!-- <STRONG>   Strong emphasis                         -->
<!-- <CODE>     Source code phrase                      -->
<!-- <SAMP>     Sample text or characters               -->
<!-- <KBD>      Keyboard phrase, e.g. user input        -->
<!-- <VAR>      Variable phrase or substitutable        -->
<!-- <CITE>     Name or title of cited work             -->

<!ENTITY % pre.content "#PCDATA|A|HR|BR|%font|%phrase|SPAN|BDO">

]]>

<!ENTITY % text "#PCDATA|A|IMG|BR|SPAN|Q|BDO|SUP|SUB">

<!-- Should the BDO element have an SDAFORM attr.? Which? -->
<!ELEMENT BDO - - (%text)+>
<!ATTLIST BDO
        LANG NAME      #IMPLIED
        DIR  (ltr|rtl) #REQUIRED
        >

<!-- <BDO>      Control bidirectional text              -->

<!ELEMENT BR    - O EMPTY>
<!ATTLIST BR
        %SDAPREF; "&#RE;"
        >

<!-- <BR>       Line break                              -->

<!-- Should the SPAN element have an SDAFORM attr.? Which? -->
<!ELEMENT SPAN - - (%text)*>
<!ATTLIST SPAN
        %attrs;
        >

<!-- <SPAN>     Generic container                       -->

<!ELEMENT Q - - (%text)*>
<!ATTLIST Q
        %attrs;
        %SDAFORM; "It" -- to be verified --
        >

<!-- <Q>        Short quotation                         -->

<!ELEMENT (SUP|SUB) - - (#PCDATA)>
<!ATTLIST (SUP|SUB)
        %attrs;
        >

<!-- <SUP>      Superscript                             -->
<!-- <SUB>      Subscript                               -->

<!--========= Link Markup ======================-->

<!ENTITY % linkType "NAME">

<!ENTITY % linkExtraAttributes
        "REL %linkType #IMPLIED
        REV %linkType #IMPLIED
        URN CDATA #IMPLIED
        TITLE CDATA #IMPLIED
        METHODS NAMES #IMPLIED
        CHARSET NAME #IMPLIED
        ">

<![ %HTML.Recommended [
        <!ENTITY % A.content   "(%text)*"
        -- <H1><a name="xxx">Heading</a></H1>
                is preferred to
           <a name="xxx"><H1>Heading</H1></a>
        -->
]]>

<!ENTITY % A.content   "(%heading|%text)*">

<!ELEMENT A     - - %A.content -(A)>
<!ATTLIST A
        %attrs;
        HREF CDATA #IMPLIED
        NAME CDATA #IMPLIED
        %linkExtraAttributes;
        %SDAPREF; "<Anchor: #AttList>"
        >
<!-- <A>                Anchor; source/destination of link      -->
<!-- <A NAME="...">     Name of this anchor                     -->
<!-- <A HREF="...">     Address of link destination             -->
<!-- <A URN="...">      Permanent address of destination        -->
<!-- <A REL=...>        Relationship to destination             -->
<!-- <A REV=...>        Relationship of destination to this     -->
<!-- <A TITLE="...">    Title of destination (advisory)         -->
<!-- <A METHODS="...">  Operations on destination (advisory)    -->
<!-- <A CHARSET="...">  Charset of destination (advisory)       -->

<!--========== Images ==========================-->

<!ELEMENT IMG    - O EMPTY>
<!ATTLIST IMG
        %attrs;
        SRC CDATA  #REQUIRED
        ALT CDATA #IMPLIED
        ALIGN (top|middle|bottom) #IMPLIED
        ISMAP (ISMAP) #IMPLIED
        %SDAPREF; "<Fig><?SDATrans Img: #AttList>#AttVal(Alt)</Fig>"
        >

<!-- <IMG>              Image; icon, glyph or illustration      -->
<!-- <IMG SRC="...">    Address of image object                 -->
<!-- <IMG ALT="...">    Textual alternative                     -->
<!-- <IMG ALIGN=...>    Position relative to text               -->
<!-- <IMG ISMAP>        Each pixel can be a link                -->

<!--========== Paragraphs=======================-->

<!ELEMENT P     - O (%text)*>
<!ATTLIST P
        %attrs;
        %just;
        %SDAFORM; "Para"
        >

<!-- <P>        Paragraph       -->


<!--========== Headings, Titles, Sections ===============-->

<!ELEMENT HR    - O EMPTY>
<!ATTLIST HR
        DIR (ltr|rtl) #IMPLIED
        %just;
        %SDAPREF; "&#RE;&#RE;"
        >

<!-- <HR>       Horizontal rule -->

<!ELEMENT ( %heading )  - -  (%text;)*>
<!ATTLIST H1
        %attrs;
        %just;
        %SDAFORM; "H1"
        >
<!ATTLIST H2
        %attrs;
        %just;
        %SDAFORM; "H2"
        >
<!ATTLIST H3
        %attrs;
        %just;
        %SDAFORM; "H3"
        >
<!ATTLIST H4
        %attrs;
        %just;
        %SDAFORM; "H4"
        >
<!ATTLIST H5
        %attrs;
        %just;
        %SDAFORM; "H5"
        >
<!ATTLIST H6
        %attrs;
        %just;
        %SDAFORM; "H6"
        >

<!-- <H1>       Heading, level 1 -->
<!-- <H2>       Heading, level 2 -->
<!-- <H3>       Heading, level 3 -->
<!-- <H4>       Heading, level 4 -->
<!-- <H5>       Heading, level 5 -->
<!-- <H6>       Heading, level 6 -->


<!--========== Text Flows ======================-->

<![ %HTML.Forms [
        <!ENTITY % block.forms "BLOCKQUOTE | FORM | ISINDEX">
]]>

<!ENTITY % block.forms "BLOCKQUOTE">

<![ %HTML.Deprecated [
        <!ENTITY % preformatted "PRE | XMP | LISTING">
]]>

<!ENTITY % preformatted "PRE">

<!ENTITY % block "P | %list | DL
        | %preformatted
        | %block.forms">

<!ENTITY % flow "(%text|%block)*">

<!ENTITY % pre.content "#PCDATA | A | HR | BR | SPAN | BDO">
<!ELEMENT PRE - - (%pre.content)*>
<!ATTLIST PRE
        %attrs;
        WIDTH NUMBER #implied
        %SDAFORM; "Lit"
        >

<!-- <PRE>              Preformatted text               -->
<!-- <PRE WIDTH=...>    Maximum characters per line     -->

<![ %HTML.Deprecated [

<!ENTITY % literal "CDATA"
        -- historical, non-conforming parsing mode where
           the only markup signal is the end tag
           in full
        -->

<!ELEMENT (XMP|LISTING) - -  %literal>
<!ATTLIST XMP
        %attrs;
        %SDAFORM; "Lit"
        %SDAPREF; "Example:&#RE;"
        >
<!ATTLIST LISTING
        %attrs;
        %SDAFORM; "Lit"
        %SDAPREF; "Listing:&#RE;"
        >

<!-- <XMP>              Example section         -->
<!-- <LISTING>          Computer listing        -->

<!ELEMENT PLAINTEXT - O %literal>
<!-- <PLAINTEXT>        Plain text passage      -->

<!ATTLIST PLAINTEXT
        %attrs;
        %SDAFORM; "Lit"
        >
]]>


<!--========== Lists ==================-->

<!ELEMENT DL    - -  (DT | DD)+>
<!ATTLIST DL
        %attrs;
        COMPACT (COMPACT) #IMPLIED
        %SDAFORM; "List"
        %SDAPREF; "Definition List:"
        >

<!ELEMENT DT    - O (%text)*>
<!ATTLIST DT
        %attrs;
        %SDAFORM; "Term"
        >

<!ELEMENT DD    - O %flow>
<!ATTLIST DD
        %attrs;
        %SDAFORM; "LItem"
        >

<!-- <DL>               Definition list, or glossary    -->
<!-- <DL COMPACT>       Compact style list              -->
<!-- <DT>               Term in definition list         -->
<!-- <DD>               Definition of term              -->

<!ELEMENT (OL|UL) - -  (LI)+>
<!ATTLIST OL
        %attrs;
        %just;
        COMPACT (COMPACT) #IMPLIED
        %SDAFORM; "List"
        >
<!ATTLIST UL
        %attrs;
        %just;
        COMPACT (COMPACT) #IMPLIED
        %SDAFORM; "List"
        >
<!-- <UL>               Unordered list                  -->
<!-- <UL COMPACT>       Compact list style              -->
<!-- <OL>               Ordered, or numbered list       -->
<!-- <OL COMPACT>       Compact list style              -->


<!ELEMENT (DIR|MENU) - -  (LI)+ -(%block)>
<!ATTLIST DIR
        %attrs;
        %just;
        COMPACT (COMPACT) #IMPLIED
        %SDAFORM; "List"
        %SDAPREF; "<LHead>Directory</LHead>"
        >
<!ATTLIST MENU
        %attrs;
        %just;
        COMPACT (COMPACT) #IMPLIED
        %SDAFORM; "List"
        %SDAPREF; "<LHead>Menu</LHead>"
        >

<!-- <DIR>              Directory list                  -->
<!-- <DIR COMPACT>      Compact list style              -->
<!-- <MENU>             Menu list                       -->
<!-- <MENU COMPACT>     Compact list style              -->

<!ELEMENT LI    - O %flow>
<!ATTLIST LI
        %attrs;
        %just;
        %SDAFORM; "LItem"
        >

<!-- <LI>               List item                       -->

<!--========== Document Body ===================-->

<![ %HTML.Recommended [
        <!ENTITY % body.content "(%heading|%block|HR|ADDRESS|IMG)*"
        -- <h1>Heading</h1>
           <p>Text ...
                is preferred to
           <h1>Heading</h1>
           Text ...
        -->
]]>

<!ENTITY % body.content "(%heading | %text | %block |
                                 HR | ADDRESS)*">

<!ELEMENT BODY O O  %body.content>
<!ATTLIST BODY
        %attrs;
        >

<!-- <BODY>     Document body   -->

<!ELEMENT BLOCKQUOTE - - %body.content>
<!ATTLIST BLOCKQUOTE
        %attrs;
        %just;
        %SDAFORM; "BQ"
        >

<!-- <BLOCKQUOTE>       Quoted passage  -->

<!ELEMENT ADDRESS - - (%text|P)*>
<!ATTLIST ADDRESS
        %attrs;
        %just;
        %SDAFORM; "Lit"
        %SDAPREF; "Address:&#RE;"
        >

<!-- <ADDRESS>  Address, signature, or byline   -->


<!--======= Forms ====================-->

<![ %HTML.Forms [

<!ELEMENT FORM - - %body.content -(FORM) +(INPUT|SELECT|TEXTAREA)>
<!ATTLIST FORM
        %attrs;
        ACTION CDATA #IMPLIED
        METHOD (%HTTP-Method) GET
        ENCTYPE %Content-Type; "application/x-www-form-urlencoded"
        ACCEPT-CHARSET CDATA #IMPLIED
        %SDAPREF; "<Para>Form:</Para>"
        %SDASUFF; "<Para>Form End.</Para>"
        >

<!-- <FORM>                     Fill-out or data-entry form     -->
<!-- <FORM ACTION="...">        Address for completed form      -->
<!-- <FORM METHOD=...>          Method of submitting form       -->
<!-- <FORM ENCTYPE="...">       Representation of form data     -->

<!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX |
                        RADIO | SUBMIT | RESET |
                        IMAGE | HIDDEN | FILE )">
<!ELEMENT INPUT - O EMPTY>
<!ATTLIST INPUT
        %attrs;
        TYPE %InputType TEXT
        NAME CDATA #IMPLIED
        VALUE CDATA #IMPLIED
        SRC CDATA #IMPLIED
        CHECKED (CHECKED) #IMPLIED
        SIZE CDATA #IMPLIED
        MAXLENGTH NUMBER #IMPLIED
        ALIGN (top|middle|bottom) #IMPLIED
        ACCEPT CDATA #IMPLIED -- list of content types --
        %SDAPREF; "Input: "
        >

<!-- <INPUT>                    Form input datum                -->
<!-- <INPUT TYPE=...>           Type of input interaction       -->
<!-- <INPUT NAME=...>           Name of form datum              -->
<!-- <INPUT VALUE="...">        Default/initial/selected value  -->
<!-- <INPUT SRC="...">          Address of image                -->
<!-- <INPUT CHECKED>            Initial state is "on"           -->
<!-- <INPUT SIZE=...>           Field size hint                 -->
<!-- <INPUT MAXLENGTH=...>      Data length maximum             -->
<!-- <INPUT ALIGN=...>          Image alignment                 -->

<!ELEMENT SELECT - - (OPTION+) -(INPUT|SELECT|TEXTAREA)>
<!ATTLIST SELECT
        %attrs;
        NAME CDATA #REQUIRED
        SIZE NUMBER #IMPLIED
        MULTIPLE (MULTIPLE) #IMPLIED
        %SDAFORM; "List"
        %SDAPREF;
        "<LHead>Select #AttVal(Multiple)</LHead>"
        >

<!-- <SELECT>                   Selection of option(s)          -->
<!-- <SELECT NAME=...>          Name of form datum              -->
<!-- <SELECT SIZE=...>          Options displayed at a time     -->
<!-- <SELECT MULTIPLE>          Multiple selections allowed     -->

<!ELEMENT OPTION - O (#PCDATA)*>
<!ATTLIST OPTION
        %attrs;
        SELECTED (SELECTED) #IMPLIED
        VALUE CDATA #IMPLIED
        %SDAFORM; "LItem"
        %SDAPREF;
        "Option: #AttVal(Value) #AttVal(Selected)"
        >

<!-- <OPTION>                   A selection option              -->
<!-- <OPTION SELECTED>          Initial state                   -->
<!-- <OPTION VALUE="...">       Form datum value for this option-->

<!ELEMENT TEXTAREA - - (#PCDATA)* -(INPUT|SELECT|TEXTAREA)>
<!ATTLIST TEXTAREA
        %attrs;
        NAME CDATA #REQUIRED
        ROWS NUMBER #REQUIRED
        COLS NUMBER #REQUIRED
        %SDAFORM; "Para"
        %SDAPREF; "Input Text -- #AttVal(Name): "
        >

<!-- <TEXTAREA>                 An area for text input          -->
<!-- <TEXTAREA NAME=...>        Name of form datum              -->
<!-- <TEXTAREA ROWS=...>        Height of area                  -->
<!-- <TEXTAREA COLS=...>        Width of area                   -->

]]>


<!--======= Document Head ======================-->

<![ %HTML.Recommended [
        <!ENTITY % head.extra "">
]]>
<!ENTITY % head.extra "& NEXTID?">

<!ENTITY % head.content "TITLE & ISINDEX? & BASE? %head.extra">

<!ELEMENT HEAD O O  (%head.content) +(META|LINK)>
<!ATTLIST HEAD
        %attrs; >

<!-- <HEAD>     Document head   -->

<!ELEMENT TITLE - -  (#PCDATA)*  -(META|LINK)>
<!ATTLIST TITLE
        %attrs;
        %SDAFORM; "Ti"    >

<!-- <TITLE>    Title of document       -->

<!ELEMENT LINK - O EMPTY>
<!ATTLIST LINK
        %attrs;
        HREF CDATA #REQUIRED
        %linkExtraAttributes;
        %SDAPREF; "Linked to : #AttVal (TITLE) (URN) (HREF)>"    >

<!-- <LINK>             Link from this document                 -->
<!-- <LINK HREF="...">  Address of link destination             -->
<!-- <LINK URN="...">   Lasting name of destination             -->
<!-- <LINK REL=...>     Relationship to destination             -->
<!-- <LINK REV=...>     Relationship of destination to this     -->
<!-- <LINK TITLE="..."> Title of destination (advisory)         -->
<!-- <LINK CHARSET="..."> Charset of destination (advisory)     -->
<!-- <LINK METHODS="..."> Operations allowed (advisory)         -->

<!ELEMENT ISINDEX - O EMPTY>
<!ATTLIST ISINDEX
        %attrs;
        %SDAPREF;
   "<Para>[Document is indexed/searchable.]</Para>">

<!-- <ISINDEX>          Document is a searchable index          -->

<!ELEMENT BASE - O EMPTY>
<!ATTLIST BASE
        HREF CDATA #REQUIRED     >

<!-- <BASE>             Base context document                   -->
<!-- <BASE HREF="...">  Address for this document               -->

<!ELEMENT NEXTID - O EMPTY>
<!ATTLIST NEXTID
        N CDATA #REQUIRED     >

<!-- <NEXTID>           Next ID to use for link name            -->
<!-- <NEXTID N=...>     Next ID to use for link name            -->

<!ELEMENT META - O EMPTY>
<!ATTLIST META
        HTTP-EQUIV  NAME    #IMPLIED
        NAME        NAME    #IMPLIED
        CONTENT     CDATA   #REQUIRED
        >

<!-- <META>                     Generic Meta-information        -->
<!-- <META HTTP-EQUIV=...>      HTTP response header name       -->
<!-- <META NAME=...>            Meta-information name           -->
<!-- <META CONTENT="...">       Associated information          -->

<!--======= Document Structure =================-->

<![ %HTML.Deprecated [
        <!ENTITY % html.content "HEAD, BODY, PLAINTEXT?">
]]>
<!ENTITY % html.content "HEAD, BODY">

<!ELEMENT HTML O O  (%html.content)>
<!ENTITY % version.attr "VERSION CDATA #FIXED '%HTML.Version;'">

<!ATTLIST HTML
        %attrs;
        %version.attr;
        %SDAFORM; "Book"
        >

<!-- <HTML>             HTML Document   -->
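The four presentation entities declared in the DTD above (zwnj, zwj, lrm and rlm) are given by their decimal ISO 10646 code positions. As a cross-check under stated assumptions, the sketch below looks those positions up in the Unicode character database shipped with Python; the dictionary name BIDI_ENTITIES is hypothetical, and the numbers are copied from the DTD.

```python
import html
import unicodedata

# Entity name -> decimal code position, as declared in the DTD.
BIDI_ENTITIES = {"zwnj": 8204, "zwj": 8205, "lrm": 8206, "rlm": 8207}

for name, code in BIDI_ENTITIES.items():
    char = chr(code)
    # Print the entity, its hexadecimal position and its character name.
    print(f"&{name};  U+{code:04X}  {unicodedata.name(char)}")
    # Python's own entity table resolves the name to the same character.
    assert html.unescape(f"&{name};") == char
```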


7.2. SGML Declaration for HTML

<!SGML  "ISO 8879:1986"
--
        SGML Declaration for HyperText Markup Language version 2.x
        (HTML 2.x).

--

CHARSET
         BASESET  "ISO Registration Number 176//CHARSET
                   ISO/IEC 10646-1:1993 UCS-2 with
                   implementation level 3//ESC 2/5 2/15 4/5"
         DESCSET  0       9       UNUSED
                  9       2       9
                  11      2       UNUSED
                  13      1       13
                  14      18      UNUSED
                  32      95      32
                  127     1       UNUSED
                  128     32      UNUSED
                  160     65376   160


CAPACITY        SGMLREF
                TOTALCAP        150000
                GRPCAP          150000
                ENTCAP          150000

SCOPE    DOCUMENT
SYNTAX
         SHUNCHAR CONTROLS 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
                 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 127
         BASESET  "ISO 646:1983//CHARSET
                   International Reference Version
                   (IRV)//ESC 2/5 4/0"
         DESCSET  0 128 0

         FUNCTION
                  RE            13
                  RS            10
                  SPACE         32
                  TAB SEPCHAR    9

         NAMING   LCNMSTRT ""
                  UCNMSTRT ""
                  LCNMCHAR ".-"
                  UCNMCHAR ".-"
                  NAMECASE GENERAL YES
                           ENTITY  NO
         DELIM    GENERAL  SGMLREF
                  SHORTREF SGMLREF
         NAMES    SGMLREF
         QUANTITY SGMLREF
                  ATTSPLEN 2100
                  LITLEN   1024
                  NAMELEN  72    -- somewhat arbitrary; taken from
                                    internet line length conventions --
                  PILEN    1024
                  TAGLVL   100
                  TAGLEN   2100
                  GRPGTCNT 150
                  GRPCNT   64

FEATURES
  MINIMIZE
    DATATAG  NO
    OMITTAG  YES
    RANK     NO
    SHORTTAG YES
  LINK
    SIMPLE   NO
    IMPLICIT NO
    EXPLICIT NO
  OTHER
    CONCUR   NO
    SUBDOC   NO
    FORMAL   YES
  APPINFO    "SDA"  -- conforming SGML Document Access application
                    --
>
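The DESCSET rows of the CHARSET section above are meant to describe every code position of UCS-2 exactly once. The following sketch checks that property and shows how a position can be tested for usability. The row values are copied from the declaration; the names DESCSET, covered_codes and is_usable are assumptions made for this illustration.

```python
# (first code, number of codes, base code) rows from the declaration;
# None stands for UNUSED.
DESCSET = [
    (0, 9, None), (9, 2, 9), (11, 2, None), (13, 1, 13), (14, 18, None),
    (32, 95, 32), (127, 1, None), (128, 32, None), (160, 65376, 160),
]


def covered_codes(rows):
    """Return the number of codes described; fail on any gap or overlap."""
    expected = 0
    for start, count, _base in rows:
        if start != expected:
            raise ValueError(f"gap or overlap at {start}")
        expected = start + count
    return expected


def is_usable(code, rows=DESCSET):
    """True if the document character set assigns a character to code."""
    for start, count, base in rows:
        if start <= code < start + count:
            return base is not None
    return False


print(covered_codes(DESCSET))  # 65536
print(is_usable(0xFEFF))       # True
print(is_usable(128))          # False
```

The total of 65536 confirms that the rows tile the whole two-octet code space, with the C0 controls (other than tab, line feed and carriage return), DEL and the C1 range marked unused.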


7.3. ISO Latin 1 entity set

The following public text lists each of the characters specified in
the Added Latin 1 entity set, along with its name, syntax for use,
and description. This list is derived from ISO Standard
8879:1986//ENTITIES Added Latin 1//EN. HTML includes the entire
entity set, and adds entities for all missing characters in the right
part of ISO-8859-1.

<!-- (C) International Organization for Standardization 1986
     Permission to copy in any form is granted for use with
     conforming SGML systems and applications as defined in
     ISO 8879, provided this notice is included in all copies.
-->
<!-- Character entity set. Typical invocation:
     <!ENTITY % ISOlat1 PUBLIC
       "ISO 8879-1986//ENTITIES Added Latin 1//EN//HTML">
     %ISOlat1;
-->
<!ENTITY nbsp   CDATA "&#160;" -- no-break space -->
<!ENTITY iexcl  CDATA "&#161;" -- inverted exclamation mark -->
<!ENTITY cent   CDATA "&#162;" -- cent sign -->
<!ENTITY pound  CDATA "&#163;" -- pound sterling sign -->
<!ENTITY curren CDATA "&#164;" -- general currency sign -->
<!ENTITY yen    CDATA "&#165;" -- yen sign -->
<!ENTITY brvbar CDATA "&#166;" -- broken (vertical) bar -->
<!ENTITY sect   CDATA "&#167;" -- section sign -->
<!ENTITY uml    CDATA "&#168;" -- umlaut (dieresis) -->
<!ENTITY copy   CDATA "&#169;" -- copyright sign -->
<!ENTITY ordf   CDATA "&#170;" -- ordinal indicator, feminine -->
<!ENTITY laquo  CDATA "&#171;" -- angle quotation mark, left -->
<!ENTITY not    CDATA "&#172;" -- not sign -->
<!ENTITY shy    CDATA "&#173;" -- soft hyphen -->
<!ENTITY reg    CDATA "&#174;" -- registered sign -->
<!ENTITY macr   CDATA "&#175;" -- macron -->
<!ENTITY deg    CDATA "&#176;" -- degree sign -->
<!ENTITY plusmn CDATA "&#177;" -- plus-or-minus sign -->
<!ENTITY sup2   CDATA "&#178;" -- superscript two -->
<!ENTITY sup3   CDATA "&#179;" -- superscript three -->
<!ENTITY acute  CDATA "&#180;" -- acute accent -->
<!ENTITY micro  CDATA "&#181;" -- micro sign -->
<!ENTITY para   CDATA "&#182;" -- pilcrow (paragraph sign) -->
<!ENTITY middot CDATA "&#183;" -- middle dot -->
<!ENTITY cedil  CDATA "&#184;" -- cedilla -->
<!ENTITY sup1   CDATA "&#185;" -- superscript one -->
<!ENTITY ordm   CDATA "&#186;" -- ordinal indicator, masculine -->
<!ENTITY raquo  CDATA "&#187;" -- angle quotation mark, right -->
<!ENTITY frac14 CDATA "&#188;" -- fraction one-quarter -->
<!ENTITY frac12 CDATA "&#189;" -- fraction one-half -->
<!ENTITY frac34 CDATA "&#190;" -- fraction three-quarters -->
<!ENTITY iquest CDATA "&#191;" -- inverted question mark -->
<!ENTITY Agrave CDATA "&#192;" -- capital A, grave accent -->
<!ENTITY Aacute CDATA "&#193;" -- capital A, acute accent -->
<!ENTITY Acirc  CDATA "&#194;" -- capital A, circumflex accent -->
<!ENTITY Atilde CDATA "&#195;" -- capital A, tilde -->
<!ENTITY Auml   CDATA "&#196;" -- capital A, dieresis or umlaut mark -->
<!ENTITY Aring  CDATA "&#197;" -- capital A, ring -->
<!ENTITY AElig  CDATA "&#198;" -- capital AE diphthong (ligature) -->
<!ENTITY Ccedil CDATA "&#199;" -- capital C, cedilla -->
<!ENTITY Egrave CDATA "&#200;" -- capital E, grave accent -->
<!ENTITY Eacute CDATA "&#201;" -- capital E, acute accent -->
<!ENTITY Ecirc  CDATA "&#202;" -- capital E, circumflex accent -->
<!ENTITY Euml   CDATA "&#203;" -- capital E, dieresis or umlaut mark -->
<!ENTITY Igrave CDATA "&#204;" -- capital I, grave accent -->
<!ENTITY Iacute CDATA "&#205;" -- capital I, acute accent -->
<!ENTITY Icirc  CDATA "&#206;" -- capital I, circumflex accent -->
<!ENTITY Iuml   CDATA "&#207;" -- capital I, dieresis or umlaut mark -->
<!ENTITY ETH    CDATA "&#208;" -- capital Eth, Icelandic -->
<!ENTITY Ntilde CDATA "&#209;" -- capital N, tilde -->
<!ENTITY Ograve CDATA "&#210;" -- capital O, grave accent -->
<!ENTITY Oacute CDATA "&#211;" -- capital O, acute accent -->
<!ENTITY Ocirc  CDATA "&#212;" -- capital O, circumflex accent -->
<!ENTITY Otilde CDATA "&#213;" -- capital O, tilde -->
<!ENTITY Ouml   CDATA "&#214;" -- capital O, dieresis or umlaut mark -->
<!ENTITY times  CDATA "&#215;" -- multiply sign -->
<!ENTITY Oslash CDATA "&#216;" -- capital O, slash -->
<!ENTITY Ugrave CDATA "&#217;" -- capital U, grave accent -->
<!ENTITY Uacute CDATA "&#218;" -- capital U, acute accent -->
<!ENTITY Ucirc  CDATA "&#219;" -- capital U, circumflex accent -->
<!ENTITY Uuml   CDATA "&#220;" -- capital U, dieresis or umlaut mark -->
<!ENTITY Yacute CDATA "&#221;" -- capital Y, acute accent -->
<!ENTITY THORN  CDATA "&#222;" -- capital Thorn, Icelandic -->
<!ENTITY szlig  CDATA "&#223;" -- small sharp s, German (sz ligature) -->
<!ENTITY agrave CDATA "&#224;" -- small a, grave accent -->
<!ENTITY aacute CDATA "&#225;" -- small a, acute accent -->
<!ENTITY acirc  CDATA "&#226;" -- small a, circumflex accent -->
<!ENTITY atilde CDATA "&#227;" -- small a, tilde -->
<!ENTITY auml   CDATA "&#228;" -- small a, dieresis or umlaut mark -->
<!ENTITY aring  CDATA "&#229;" -- small a, ring -->
<!ENTITY aelig  CDATA "&#230;" -- small ae diphthong (ligature) -->
<!ENTITY ccedil CDATA "&#231;" -- small c, cedilla -->
<!ENTITY egrave CDATA "&#232;" -- small e, grave accent -->
<!ENTITY eacute CDATA "&#233;" -- small e, acute accent -->
<!ENTITY ecirc  CDATA "&#234;" -- small e, circumflex accent -->
<!ENTITY euml   CDATA "&#235;" -- small e, dieresis or umlaut mark -->
<!ENTITY igrave CDATA "&#236;" -- small i, grave accent -->
<!ENTITY iacute CDATA "&#237;" -- small i, acute accent -->
<!ENTITY icirc  CDATA "&#238;" -- small i, circumflex accent -->
<!ENTITY iuml   CDATA "&#239;" -- small i, dieresis or umlaut mark -->
<!ENTITY eth    CDATA "&#240;" -- small eth, Icelandic -->
<!ENTITY ntilde CDATA "&#241;" -- small n, tilde -->
<!ENTITY ograve CDATA "&#242;" -- small o, grave accent -->
<!ENTITY oacute CDATA "&#243;" -- small o, acute accent -->
<!ENTITY ocirc  CDATA "&#244;" -- small o, circumflex accent -->
<!ENTITY otilde CDATA "&#245;" -- small o, tilde -->
<!ENTITY ouml   CDATA "&#246;" -- small o, dieresis or umlaut mark -->
<!ENTITY divide CDATA "&#247;" -- divide sign -->
<!ENTITY oslash CDATA "&#248;" -- small o, slash -->
<!ENTITY ugrave CDATA "&#249;" -- small u, grave accent -->
<!ENTITY uacute CDATA "&#250;" -- small u, acute accent -->
<!ENTITY ucirc  CDATA "&#251;" -- small u, circumflex accent -->
<!ENTITY uuml   CDATA "&#252;" -- small u, dieresis or umlaut mark -->
<!ENTITY yacute CDATA "&#253;" -- small y, acute accent -->
<!ENTITY thorn  CDATA "&#254;" -- small thorn, Icelandic -->
<!ENTITY yuml   CDATA "&#255;" -- small y, dieresis or umlaut mark -->
1817
1818
1819 Bibliography
1820
1821 [BRYAN88] M. Bryan, "SGML -- An Author's Guide to the Standard
1822 Generalized Markup Language", Addison-Wesley, Reading,
1823 1988.
1824
1825 [ERCS] Extended Reference Concrete Syntax for SGML.
1826 <http://www.sgmlopen.org/sgml/docs/ercs/ercs-home.html>
1828
1829 [ETHNO] "Ethnologue, Languages of the World", 12th Edition,
1830 Barbara F. Grimes editor, Summer Institute of Linguis-
1831 tics, Dallas, 1992.
1832
1833 [FILE-UPLOAD] E. Nebel and L. Masinter, "Form-based File Upload in
1834 HTML", Work in progress (draft-ietf-html-
1835 fileupload-03.txt), Xerox Corporation, August 1995.
1836
1837 [GOLD90] C. F. Goldfarb, "The SGML Handbook", Y. Rubinsky, Ed.,
1838 Oxford University Press, 1990.
1839
1840 [HTML-2] T. Berners-Lee and D. Connolly, "Hypertext Markup Lan-
1841 guage - 2.0", RFC 1866, MIT/W3C, November 1995.
1842
1850 [HTTP] T. Berners-Lee, R. T. Fielding, and H. Frystyk
1851 Nielsen, "Hypertext Transfer Protocol - HTTP/1.0",
1852 Work in progress (draft-ietf-http-v10-spec-00.ps),
1853 MIT, UC Irvine, CERN, March 1995.
1854
1855 [ISO-639] ISO 639:1988. Codes pour la représentation des noms de
1856 langue. Technical content in
1857 <http://www.sil.org/sgml/iso639a.html>
1858
1859 [ISO-1000] ISO 1000:1992. Unités SI et recommandations pour
1860 l'emploi de leurs multiples et de certaines autres
1861 unités.
1862
1863 [ISO-3166] ISO 3166:1993. Codes pour la représentation des noms
1864 de pays.
1865
1866 [ISO-4217] ISO 4217:1990. Codes pour la représentation des
1867 monnaies et types des fonds.
1868
1869 [ISO-8601] ISO 8601:1988. Éléments de données et formats
1870 d'échange -- Échange d'information -- Représentation
1871 de la date et de l'heure.
1872
1873 [ISO-8859-1] ISO 8859-1:1987. International Standard -- Informa-
1874 tion Processing -- 8-bit Single-Byte Coded Graphic
1875 Character Sets -- Part 1: Latin Alphabet No. 1.
1876
1877 [ISO-8879] ISO 8879:1986. International Standard -- Information
1878 Processing -- Text and Office Systems -- Standard Gen-
1879 eralized Markup Language (SGML).
1880
1881 [ISO-10646] ISO/IEC 10646-1:1993. International Standard -- Infor-
1882 mation technology -- Universal Multiple-Octet Coded
1883 Character Set (UCS) -- Part 1: Architecture and Basic
1884 Multilingual Plane.
1885
1886 [NICOL] G.T. Nicol, "The Multilingual World Wide Web", Elec-
1887 tronic Book Technologies, 1995,
1888 <http://www.ebt.com/docs/multling.html>
1889
1890 [RFC1468] J. Murai, M. Crispin and E. van der Poel, "Japanese
1891 Character Encoding for Internet Messages", RFC 1468,
1892 Keio University, Panda Programming, June 1993.
1893
1894 [RFC1521] N. Borenstein and N. Freed, "MIME (Multipurpose Inter-
1895 net Mail Extensions) Part One: Mechanisms for Specify-
1896 ing and Describing the Format of Internet Message Bod-
1897 ies", RFC 1521, Bellcore, Innosoft, September 1993.
1898
1906 [RFC1590] J. Postel, "Media Type Registration Procedure", RFC
1907 1590, USC/ISI, March 1994.
1908
1909 [RFC1738] T. Berners-Lee, L. Masinter, and M. McCahill, "Uniform
1910 Resource Locators (URL)", RFC 1738, CERN, Xerox PARC,
1911 University of Minnesota, October 1994.
1912
1913 [RFC1766] H. Alvestrand, "Tags for the Identification of
1914 Languages", RFC 1766, UNINETT, March 1995.
1915
1916 [SQ91] SoftQuad, "The SGML Primer", 3rd ed., SoftQuad Inc.,
1917 1991.
1918
1919 [TAKADA] Toshihiro Takada, "Multilingual Information Exchange
1920 through the World-Wide Web", Computer Networks and
1921 ISDN Systems, Vol. 27, No. 2, Nov. 1994, p. 235-241.
1922
1923 [TEI] TEI Guidelines for Electronic Text Encoding and
1924 Interchange. <http://etext.virginia.edu/TEI.html>
1925
1926 [UNICODE] The Unicode Consortium, "The Unicode Standard --
1927 Worldwide Character Encoding -- Version 1.0", Addison-
1928 Wesley, Volume 1, 1991, Volume 2, 1992. The BIDI
1929 algorithm is in appendix A of volume 1, with correc-
1930 tions in appendix D of volume 2.
1931
1932 [VANH90] E. van Herwijnen, "Practical SGML", Kluwer Academic
1933 Publishers Group, Norwell and Dordrecht, 1990.
1934
1935 Authors' Addresses
1936
1937 François Yergeau
1938 Alis Technologies
1939 100, boul. Alexis-Nihon
1940 Suite 600
1941 Montréal QC H4M 2P2
1942 Canada
1943
1944 Tel: +1 (514) 747-2547
1945 Fax: +1 (514) 747-2561
1946 EMail: yergeau@alis.ca
1947
1948
1949 Gavin Thomas Nicol
1950 Electronic Book Technologies, Japan
1951 1-29-9 Tsurumaki,
1952 Setagaya-ku,
1953 Tokyo
1962 Japan
1963
1964 Tel + Fax: +81-3-3706-7351
1965 EMail: gtn@ebt.com, gtn@twics.co.jp
1966
1967
1968 Glenn Adams
1969 Stonehand
1970 118 Magazine Street
1971 Cambridge, MA 02139
1972 U.S.A.
1973
1974 Tel: +1 (617) 864-5524
1975 Fax: +1 (617) 864-4965
1976 EMail: glenn@stonehand.com
1977
1978
1979 Martin J. Duerst
1980 Multimedia-Laboratory
1981 Department of Computer Science
1982 University of Zurich
1983 Winterthurerstrasse 190
1984 CH-8057 Zurich
1985 Switzerland
1986
1987 Tel: +41 1 257 43 16
1988 Fax: +41 1 363 00 35
1989 EMail: mduerst@ifi.unizh.ch