Network Working Group                                         F. Yergeau
Internet Draft                                                  G. Nicol
<draft-ietf-html-i18n-01.txt>                                   G. Adams
Expires 30 March 1996                                          M. Duerst
                                                       25 September 1995


         Internationalization of the Hypertext Markup Language

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups. Note that other groups may also distribute working
documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months.
Internet-Drafts may be updated, replaced, or obsoleted by other
documents at any time. It is not appropriate to use Internet-Drafts as
reference material or to cite them other than as a "working draft" or
"work in progress".

To learn the current status of any Internet-Draft, please check the
1id-abstracts.txt listing contained in the Internet-Drafts Shadow
Directories on ds.internic.net (US East Coast), nic.nordu.net (Europe),
ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific Rim).

Distribution of this document is unlimited. Please send comments to the
HTML working group (HTML-WG) of the Internet Engineering Task Force
(IETF) at <html-wg@oclc.org>. Discussions of the group are archived at
URL: http://www.acl.lanl.gov/HTML_WG/archives.html.

Abstract

The Hypertext Markup Language (HTML) is a simple markup language used
to create hypertext documents that are platform independent. Initially,
the application of HTML on the World Wide Web was seriously restricted
by its reliance on the ISO-8859-1 coded character set, which is
appropriate only for Western European languages. Despite this
restriction, HTML has been widely used with other languages, using
other coded character sets or character encodings, through various ad
hoc extensions to the language.

This document is meant to address the issue of the internationalization
of HTML by extending the specification of HTML 2.0 and giving
additional recommendations for proper internationalization support. A
foremost consideration is to make sure that HTML remains a valid
application of SGML, while enabling its use in all languages of the
world.

The "text/html; version=2.1" Internet Media Type [RFC1590] and MIME
Content Type [RFC1521] is defined by this specification, taken together
with the HTML 2.0 specification [HTML-2].


Expires 30 March 1996                                           [Page 1]

Internet Draft          HTML internationalization      25 September 1995

Table of contents

   1. Introduction
      1.1. Scope
      1.2. Conformance
   2. The document character set
      2.1. Reference processing model
      2.2. The HTML 2.1 document character set
      2.3. Undisplayable characters
   3. Language tags
   4. Additional entities, attributes and elements
      4.1. Full Latin-1 entity set
      4.2. Markup for language-dependent presentation
   5. Forms
      5.1. DTD additions
      5.2. Form submission
   6. Miscellaneous
   7. HTML public text
      7.1. HTML DTD
      7.2. SGML declaration for HTML
      7.3. Entity sets
         7.3.1. ISO Latin 1 character entity set
   Bibliography
   Authors' Addresses

1. Introduction

The Hypertext Markup Language (HTML) is a simple markup language used
to create hypertext documents that are platform independent. Initially,
the application of HTML on the World Wide Web was seriously restricted
by its reliance on the ISO-8859-1 coded character set, which is
appropriate only for Western European languages. Despite this
restriction, HTML has been widely used with other languages, using
other coded character sets or character encodings, through various ad
hoc extensions to the language [TAKADA].

This document is meant to address the issue of the internationalization
of HTML by extending the specification of HTML 2.0 and giving
additional recommendations for proper internationalization support. It
is based in large part on a paper by one of the authors on
multilingualism on the WWW [NICOL]. A foremost consideration is to make
sure that HTML remains a valid application of SGML, while enabling its
use in all languages of the world.

The specific issues addressed are the SGML document character set to be
used for HTML, the proper treatment of the charset parameter associated
with the "text/html" content type, and the specification of language
tags and of additional entities, attributes and elements.

1.1. Scope

HTML has been in use by the World-Wide Web (WWW) global information
initiative since 1990. This specification extends the capabilities of
HTML 2.0 (RFC xxx), primarily by removing the restriction to the
ISO-8859-1 coded character set [ISO-8859-1]. Together with the HTML 2.0
specification, it defines a new version of HTML to be known as
"HTML 2.1".

HTML is an application of ISO Standard 8879:1986, Information
Processing -- Text and Office Systems -- Standard Generalized Markup
Language (SGML) [ISO-8879]. The HTML Document Type Definition (DTD) is
a formal definition of the HTML syntax in terms of SGML. This
specification amends the DTD of HTML 2.0 in order to make it applicable
to documents encompassing a character repertoire much larger than that
of ISO-8859-1, while still remaining SGML conformant.

Together with the HTML 2.0 specification, this specification also
defines HTML as an Internet Media Type [RFC1590] and MIME Content Type
[RFC1521] called "text/html", or "text/html; version=2.1". As such, it
defines the semantics of the HTML syntax and how that syntax should be
interpreted by user agents.


1.2. Conformance

This specification governs the syntax of HTML documents and aspects of
the behavior of HTML user agents.

1.2.1. Documents

A document is a conforming HTML document if:

* It is a conforming SGML document, and it conforms to the HTML DTD
  (see 7.1, "HTML DTD").

* It conforms to the application conventions in this specification.
  For example, the value of the HREF attribute of the <A> element must
  conform to the URI syntax.

1.2.2. User agents

An HTML user agent conforms to this specification if:

* It parses the characters of an HTML document into data characters
  and markup according to SGML [ISO-8879].

    NOTE -- In the interest of robustness and extensibility, there
    are a number of widely deployed conventions for handling
    non-conforming documents. See section 4.2.1 of the HTML 2.0
    specification [HTML-2], "Undeclared Markup Error Handling", for
    details.

* It supports at least the ISO-8859-1 character encoding scheme and
  processes each character in the ISO Latin Alphabet No. 1 as
  specified in section 6.1 of [HTML-2].

  To ensure interoperability and proper support for at least
  ISO-8859-1 in an environment where character encoding schemes other
  than ISO-8859-1 are present, user agents must correctly interpret
  the charset parameter accompanying an HTML document received from
  the network.

  Furthermore, conforming user agents are required at least to parse
  numeric character references correctly within the range of the
  Basic Multilingual Plane (BMP) of ISO 10646-1 [ISO-10646].

    NOTE -- To support non-Western writing systems, HTML user agents
    are encouraged to support `UNICODE-1-1' or similar character
    encoding schemes and as much of the character repertoire of
    [ISO-10646] as is practical.

* It behaves identically for documents whose parsed token sequences
  are identical.

  For example, comments and the whitespace in tags disappear during
  tokenization, and hence they do not influence the behavior of
  conforming user agents.

* It allows the user to traverse (or at least attempt to traverse,
  resources permitting) all hyperlinks from <A> elements in an HTML
  document.

An HTML user agent is a level 2 user agent if, additionally:

* It allows the user to express all form field values specified in an
  HTML document and to (attempt to) submit the values as requests to
  information services.

2. The document character set

2.1. Reference processing model

This overview explains the reference processing model used for HTML
2.1, and in particular the SGML concept of a document character set.
An actual implementation may differ widely in its internal workings
from the model given below, but should behave as described to an
outside observer.

Because there are various widely differing encodings of text, SGML does
not directly address the question of how characters are encoded, e.g.,
in a file. SGML views the characters of a document as a single set
(called a "character repertoire"), together with a "code set" that
assigns an integer (known as a "character number") to each character
in the repertoire. The document character set declaration defines what
each of the character numbers represents [GOLD90, p. 451]. In most
cases, an SGML DTD and all documents that refer to it have a single
document character set, and all markup and data characters are part of
this set.

HTML, as an application of SGML, does not directly address the question
of how characters are encoded as octets in external representations
such as files. This is deferred to mechanisms external to HTML, such as
the HTTP protocol, or MIME for electronic mail.

For the HTTP protocol [HTTP], the way characters are encoded is defined
by the "charset" parameter[1] of the "Content-Type" field of the header
of an HTTP response. For example, to indicate that the transmitted
document is encoded in the "JIS" encoding of Japanese [RFC1468], the
header will contain the following line:

    Content-Type: text/html; charset=ISO-2022-JP

_________________________
[1] The term "charset" in MIME is used to designate a character
    encoding, rather than a coded character set as the term may
    suggest. A character encoding is a mapping (possibly many-to-one)
    of a sequence of octets to a sequence of characters taken from one
    or more character repertoires. A coded character set is a mapping
    between individual bit patterns and individual characters from a
    single character repertoire.

The default charset parameter in the case of the HTTP protocol is
ISO-8859-1 (the so-called "Latin-1" for Western European characters).
The HTTP protocol also defines a mechanism (the Accept-Charset request
header) for the client to specify the character encodings it can
accept. Clients and servers are strongly encouraged to use these
mechanisms to ensure correct transmission and interpretation of any
document. Measures that can help ensure correct interpretation, even
when a server or client does not yet use these mechanisms, are
described in section 6.
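As an illustration of the decoder's first step for documents received over HTTP, the following Python sketch reads the charset parameter of the Content-Type header, falls back to the ISO-8859-1 default when the parameter is absent, and decodes the octets. It assumes the standard library's email.message.Message is used to parse the header; the helper name decode_html_body is hypothetical, and it also assumes MIME charset labels match Python codec names, which holds for the two shown but not for every registered charset.

```python
from email.message import Message

def decode_html_body(content_type: str, body: bytes) -> str:
    """Decode an HTML entity body according to its Content-Type charset.

    Hypothetical helper: the charset parameter selects the decoder, and
    ISO-8859-1 is assumed when the parameter is absent, as for HTTP.
    """
    header = Message()
    header["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset parameter,
    # or the supplied default when there is none.
    charset = header.get_content_charset("iso-8859-1")
    return body.decode(charset)

# "JIS" (ISO-2022-JP) encoded Japanese, as in the header example above.
jis_octets = "\u65e5\u672c\u8a9e".encode("iso-2022-jp")
print(decode_html_body("text/html; charset=ISO-2022-JP", jis_octets))  # 日本語
# No charset parameter: octet 0xE9 is read as ISO-8859-1.
print(decode_html_body("text/html", b"caf\xe9"))  # café
```

Everything downstream of this function (entity manager, parser, application) would then see only characters, never octets.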

Similarly, if HTML documents are transferred by electronic mail, the
character encoding is defined by the "charset" parameter of the
"Content-Type" MIME header line [RFC1521].

If any other means of transferring and storing HTML documents is
defined or becomes popular, similar provisions should be made to
clearly identify the character encoding used and/or to use a single,
default encoding capable of representing the widest range of
characters used in an international context.

Whatever the external character encoding actually is, the reference
processing model translates it to a representation of the document
character set specified in Section 2.2 before any processing specific
to SGML/HTML. The reference processing model can be depicted as
follows:

   [resource]->[decoder]->[entity ]->[ SGML ]->[application]->[display]
                          [manager]  [parser]
                              ^          |
                              |          |
                              +----------+

The decoder is responsible for decoding the external representation of
the resource to a representation using the document character set. The
entity manager, the parser, and the application deal only with
characters of the document character set. A display-oriented part of
the application or the display machinery itself may again convert
characters represented in the document character set to some other
representation more suitable for its purpose. In any case, the entity
manager, the parser, and the application, as far as character
semantics are concerned, use the HTML 2.1 document character set only.

An actual implementation may or may not translate the document into
some encoding of the document character set as described above; the
behavior described by this reference processing model can be achieved
otherwise. This subject is well outside the scope of this
specification, however, and the reader is invited to consult the SGML
standard [ISO-8879] or an SGML handbook [BRYAN88] [GOLD90] [VANH90]
[SQ91] for further information.

The most important consequence of this reference processing model is
that numeric character references are always resolved to the same
characters, whatever the external encoding actually used. For an
example, see Section 2.2.
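The consequence stated above can be checked with a small Python sketch. It assumes decoding has already taken place, so a numeric character reference names a position in the document character set rather than an octet of the external encoding. The regular expression and the name resolve_numeric_refs are illustrative only, and the sketch requires the terminating ";", although SGML also allows it to be omitted.

```python
import re

# Decimal numeric character reference such as "&#1048;" (simplified:
# the terminating ";" is required here).
NUMERIC_REF = re.compile(r"&#([0-9]+);")

def resolve_numeric_refs(text: str) -> str:
    """Replace each reference with the character at that code position.

    `text` is assumed to be already decoded into the document character
    set, so the number names a UCS-2 position, never an octet value of
    the external encoding.
    """
    return NUMERIC_REF.sub(lambda m: chr(int(m.group(1))), text)

source = "<P>&#233; &#1048;</P>"  # the markup itself is pure ASCII

# The same document transmitted in two different external encodings...
received_latin1 = source.encode("iso-8859-1").decode("iso-8859-1")
received_koi8r = source.encode("koi8-r").decode("koi8-r")

# ...yields the same characters after reference resolution.
assert resolve_numeric_refs(received_latin1) == resolve_numeric_refs(received_koi8r)
print(resolve_numeric_refs(received_koi8r))  # <P>é И</P>

# Treating 233 as an octet of the external encoding would be wrong:
# octet 233 in KOI8-R is not LATIN SMALL LETTER E WITH ACUTE.
assert bytes([233]).decode("koi8-r") != chr(233)
```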

2.2. The HTML 2.1 document character set

The document character set, in the SGML sense, of HTML 2.1 is the Basic
Multilingual Plane of ISO 10646:1993 [ISO-10646], also known as UCS-2.
This is code-for-code identical to the Unicode standard [UNICODE]. The
adoption of this document character set implies a change in the SGML
declaration specified in the HTML 2.0 specification (section 9.5 of
[HTML-2]). The change amounts to removing the two BASESET
specifications and their accompanying DESCSET declarations, replacing
them with the following declaration:

    BASESET  "ISO Registration Number 176//CHARSET
              ISO/IEC 10646-1:1993 UCS-2 with implementation level 3
              //ESC 2/5 2/15 4/5"
    DESCSET  0       9      UNUSED
             9       2      9
             11      2      UNUSED
             13      1      13
             14      18     UNUSED
             32      95     32
             127     1      UNUSED
             128     32     UNUSED
             160     65374  160

Making UCS-2 the document character set does not make any expression,
construct or document that conforms to HTML 2.0 non-conforming. It does
make conforming certain constructs that are not admissible in HTML 2.0.
One consequence is that data characters outside the repertoire of
ISO-8859-1, but within that of UCS-2, become valid SGML characters.
Another is that the upper limit of the range of numeric character
references is extended from 255 to 65533[2]; thus, &#1048; is a valid
reference to a "CYRILLIC CAPITAL LETTER I". [ERCS] is a good source of
information on Unicode and SGML, although its scope and technical
content differ greatly from this specification.

_________________________
[2] 65533 (FFFD hexadecimal) is the last valid character in UCS-2.
    65534 (FFFE hexadecimal) is unassigned and reserved as the
    byte-swapped version of ZERO WIDTH NO-BREAK SPACE for byte-order
    detection purposes. 65535 (FFFF hexadecimal) is unassigned.
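The DESCSET declaration above can be read as a table of character-number ranges. The following Python sketch encodes its rows and uses them to decide whether a numeric character reference names a declared character; it is an illustration, not part of the DTD. Treating a reference to an UNUSED number as an error is an assumption of this sketch, and the names DESCSET, is_sgml_character and resolve_reference are hypothetical.

```python
import unicodedata

# The DESCSET rows above as (first number, count, base number); a base
# of None stands for UNUSED.
DESCSET = [
    (0, 9, None), (9, 2, 9), (11, 2, None), (13, 1, 13),
    (14, 18, None), (32, 95, 32), (127, 1, None), (128, 32, None),
    (160, 65374, 160),
]

def is_sgml_character(number: int) -> bool:
    """True if `number` is mapped (not UNUSED) by the DESCSET rows."""
    for first, count, base in DESCSET:
        if first <= number < first + count:
            return base is not None
    return False  # beyond 65533: 65534 and 65535 are not described

def resolve_reference(number: int) -> str:
    """Resolve a numeric character reference, rejecting unused numbers."""
    if not is_sgml_character(number):
        raise ValueError(f"&#{number}; does not name a declared character")
    # Every mapped row has base == first, so the mapping is the identity.
    return chr(number)

print(unicodedata.name(resolve_reference(1048)))  # CYRILLIC CAPITAL LETTER I
```

The last row (160, 65374) covers 160 through 65533 inclusive, which is where the upper limit of 65533 comes from.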

ISO 10646-1:1993 is the most encompassing character set currently in
existence, and there is no other character set that could take its
place as the document character set for HTML 2.1. It is also expected
that this specification may be extended along with future extensions
of ISO 10646. If, nevertheless, a specific application needs
characters outside this standard, they should be used in a way that
avoids any conflict with present or future versions of ISO 10646,
i.e., by assigning these characters to a private use zone. It should
also be borne in mind that such use will be highly non-portable; in
many cases, it may be better to use inline bitmaps.

2.3. Undisplayable characters

With the document character set being the full ISO 10646 BMP, the
possibility that a character cannot be displayed due to lack of
appropriate resources (fonts) cannot be avoided. Because there are
many different things that can be done in such a case, this document
does not recommend any specific behavior. Depending on the
implementation, this may also be handled by the underlying display
system rather than by the application itself. The following
considerations, however, may be of help:

- A clearly visible but unobtrusive behavior should be preferred. Some
  documents may contain many characters that cannot be rendered, so
  showing an alert for each of them is not the right thing to do.

- If a numeric representation of the missing character is given, its
  hexadecimal (not decimal) form is to be preferred, because this form
  is used in character set standards [ERCS].
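Both considerations can be combined in a short Python sketch: characters outside a display repertoire are replaced inline by an unobtrusive hexadecimal placeholder rather than triggering one alert each. The "[U+XXXX]" placeholder format is an assumption (the text only asks for hexadecimal rather than decimal), and the displayable set stands in for a hypothetical font repertoire.

```python
from __future__ import annotations

def hex_placeholder(ch: str) -> str:
    """Visible stand-in for a character that cannot be displayed.

    The "U+" prefix and four-digit padding are assumptions; the text
    only recommends hexadecimal rather than decimal.
    """
    return f"[U+{ord(ch):04X}]"

def render_with_placeholders(text: str, displayable: set[str]) -> str:
    """Substitute a placeholder for each character outside `displayable`
    (a hypothetical font repertoire), with no per-character alert."""
    return "".join(c if c in displayable else hex_placeholder(c) for c in text)

latin1_font = {chr(n) for n in range(32, 256)}
print(render_with_placeholders("Ivan = \u0418\u0432\u0430\u043d", latin1_font))
# Ivan = [U+0418][U+0432][U+0430][U+043D]
```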

3. Language tags

Language tags can be used to control rendering of a marked up document
in various ways: character disambiguation, in cases where the
character encoding is not sufficient to resolve to a specific glyph;
quotation marks; hyphenation; ligatures; spacing; voice synthesis;
etc. Independently of rendering issues, language markup is useful as
content markup for purposes such as classification and searching.

The language attribute, LANG, takes as its value a language tag that
identifies a natural language spoken, written, or otherwise conveyed by
human beings for communication of information to other human beings.
Computer languages are explicitly excluded.

The syntax and registry of HTML language tags are the same as those
defined by RFC 1766 [RFC1766]. In summary, a language tag is composed
of one or more parts: a primary language tag and a possibly empty
series of subtags:

    language-tag = primary-tag *( "-" subtag )
    primary-tag  = 1*8ALPHA
    subtag       = 1*8ALPHA

Whitespace is not allowed within the tag, and all tags are
case-insensitive. The namespace of language tags is administered by
the IANA. Example tags include:

    en, en-US, en-cockney, i-cherokee, x-pig-latin

Two-letter primary tags are reserved for ISO 639 language abbreviations
[ISO-639], and three-letter primary tags for the language abbreviations
of the "Ethnologue" [ETHNO] (the latter is in addition to the
requirements of RFC 1766). Any two-letter initial subtag is an ISO 3166
country code [ISO-3166].
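The grammar above translates directly into a regular expression. The following Python sketch checks only the syntax of a LANG value and compares tags case-insensitively; it does not consult the IANA registry. ALPHA is taken here to mean the ASCII letters, and the names LANGUAGE_TAG, is_language_tag and same_tag are hypothetical.

```python
import re

# language-tag = primary-tag *( "-" subtag ), both parts 1*8ALPHA.
# ALPHA is taken to mean ASCII letters only.
LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")

def is_language_tag(value: str) -> bool:
    """Check the syntax only; registration with IANA is not verified."""
    return LANGUAGE_TAG.fullmatch(value) is not None

def same_tag(a: str, b: str) -> bool:
    """Tags are case-insensitive, so compare them case-folded."""
    return a.lower() == b.lower()

for tag in ["en", "en-US", "en-cockney", "i-cherokee", "x-pig-latin"]:
    assert is_language_tag(tag)
# Whitespace, empty subtags and parts longer than 8 letters are rejected.
for bad in ["en US", "en-", "-en", "abcdefghi"]:
    assert not is_language_tag(bad)
```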

In the context of HTML, a language tag is not to be interpreted as a
single token, as per RFC 1766, but as a hierarchy. For example, a user
agent that adjusts rendering according to language should consider
that it has a match when a language tag in a style sheet entry matches
the initial portion of the language tag of an element. An exact match
should be preferred. This interpretation allows an element marked up
as, for instance, "en-US" to trigger styles corresponding to, in order
of preference, US English ("en-US") or 'plain' or 'international'
English ("en").

    NOTE -- Using the language tag as a hierarchy does not imply that
    all languages with a common prefix will be understood by those
    fluent in one or more of those languages; it simply allows the
    user to request this commonality when it is true for that user.
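The hierarchical matching described above can be sketched in Python as follows. The sketch assumes that an "initial portion" must end on a subtag boundary, so that "en" matches "en-US" but not a hypothetical "eng"; the text does not spell this out. Specificity is measured by the number of subtags, which makes an exact match win over a shorter prefix. The function names are hypothetical.

```python
from __future__ import annotations

def tag_matches(style_tag: str, element_tag: str) -> bool:
    """True if style_tag equals element_tag or is an initial portion of it
    ending at a subtag boundary ("en" matches "en-US" but not "eng")."""
    style, element = style_tag.lower(), element_tag.lower()
    return element == style or element.startswith(style + "-")

def select_style(element_tag: str, style_tags: list[str]) -> str | None:
    """Return the most specific matching style-sheet tag, so that an exact
    match is preferred over a shorter prefix; None if nothing matches."""
    matching = [t for t in style_tags if tag_matches(t, element_tag)]
    return max(matching, key=lambda t: t.count("-"), default=None)

print(select_style("en-US", ["en", "en-US", "fr"]))  # en-US
print(select_style("en-GB", ["en", "en-US", "fr"]))  # en
```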

Since any text can logically be assigned a language, almost all HTML
elements admit the LANG attribute. The DTD reflects this. It is also
intended that any new element introduced in later versions of HTML
will admit the LANG attribute, unless there is a good reason not to do
so.

The rendering of elements is meant to be controlled (in part) by the
LANG attribute. Specific user preferences set within the browser
should override the value of the LANG attribute, which in turn
overrides the value specified by the LANG attribute of any enclosing
element. If none of these are set, a suitable default, perhaps
controlled by the user's locale, should be used to control rendering.
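The order of precedence in the previous paragraph amounts to a short lookup chain. The Python sketch below models parsed elements with a minimal hypothetical Element class holding an optional LANG value and a parent link; a real user agent would use its own document tree.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Optional

@dataclass
class Element:
    """Minimal stand-in for a parsed element (hypothetical model)."""
    name: str
    lang: Optional[str] = None        # LANG attribute, if present
    parent: Optional[Element] = None  # enclosing element, if any

def rendering_language(element: Element, user_preference: Optional[str],
                       locale_default: str) -> str:
    """User preference, then the element's own LANG, then the nearest
    enclosing LANG, then a locale-based default."""
    if user_preference:
        return user_preference
    node: Optional[Element] = element
    while node is not None:
        if node.lang:
            return node.lang
        node = node.parent
    return locale_default

html = Element("HTML", lang="en")
body = Element("BODY", parent=html)
quote = Element("Q", lang="fr", parent=body)
para = Element("P", parent=body)
print(rendering_language(quote, None, "de"))  # fr
print(rendering_language(para, None, "de"))   # en
```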


4. Additional entities, attributes and elements

4.1. Full Latin-1 entity set

Following the suggestion in section 14 of [HTML-2], the set of Latin-1
entities is extended to cover the whole upper (right-hand) half of
ISO-8859-1. The names of the entities are taken from the appendices of
[SGML]. A list is provided in section 7.3.1 of this specification.

4.2. Markup for language-dependent presentation

For the correct presentation of text from certain languages
(irrespective of formatting issues), some support in the form of
additional entities and elements is needed. In particular,
bidirectional text (BIDI for short) requires markup in special
circumstances where ambiguities as to the directionality of some
characters have to be resolved. Plain text may contain this markup in
the form of special-purpose characters; in HTML, these are replaced by
the SGML markup described below.

This markup affects the ability to render BIDI text in a semantically
legible fashion. That is, without this special BIDI markup, cases
arise that would prevent *any* rendering whatsoever that reflected the
basic meaning of the text. It is for this reason that these special
characters were added to Unicode (and, thence, to ISO/IEC 10646). If
it were possible to do reliable layout and rendering of bidirectional
text without them, they definitely would not have been included in
Unicode.

First, a set of named character entities is added that allows partial
support of the Unicode bidirectional algorithm [UNICODE], plus some
help with languages requiring contextual analysis for rendering:

    <!ENTITY zwnj CDATA "&#8204;" --=zero width non-joiner-->
    <!ENTITY zwj  CDATA "&#8205;" --=zero width joiner-->
    <!ENTITY lrm  CDATA "&#8206;" --=left-to-right mark-->
    <!ENTITY rlm  CDATA "&#8207;" --=right-to-left mark-->

The first two, zwnj and zwj, are used to block or force joining
behavior in contexts where joining would occur but should not, or would
not occur but should. For example, ARABIC LETTER HEH is used to
abbreviate "Hijri" (the Islamic calendrical system); however, the
isolated form of HEH looks like the digit five as employed in Arabic
script (actually based on Indic digits). To prevent readers from
taking HEH for a final digit five in a year, the initial form of HEH
is used. However, there is no following context (i.e., a joining
letter) to which the HEH can join, so the ZWJ is used to provide that
context. In Farsi texts, there are cases where a letter that normally
would join a subsequent letter in a cursive connection does not; here
the ZWNJ is used.

The other two, lrm and rlm, are used to disambiguate the directionality
of directionally neutral characters: for example, if a double quote
sits between an Arabic and a Latin letter, to which direction does the
quote resolve? These characters are like zero-width spaces that have a
directional property (but no word or line break property).
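To make the four declarations concrete, the following Python sketch expands the entity references to the characters they name and inspects them with the standard unicodedata module. It handles only these four entities; the names BIDI_ENTITIES and expand_bidi_entities are hypothetical, and a real SGML entity manager would of course handle all declared entities.

```python
from __future__ import annotations
import re
import unicodedata

# The four entities declared above, with their character numbers.
BIDI_ENTITIES = {"zwnj": 8204, "zwj": 8205, "lrm": 8206, "rlm": 8207}

ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")

def expand_bidi_entities(text: str) -> str:
    """Expand &zwnj; &zwj; &lrm; &rlm;, leaving other references alone."""
    def replace(m: re.Match) -> str:
        number = BIDI_ENTITIES.get(m.group(1))
        return chr(number) if number is not None else m.group(0)
    return ENTITY_REF.sub(replace, text)

# ARABIC LETTER HEH (&#1607;) followed by ZWJ, so that it takes its
# initial form when abbreviating "Hijri".
hijri = expand_bidi_entities("\u0647&zwj;")
print([unicodedata.name(c) for c in hijri])
# ['ARABIC LETTER HEH', 'ZERO WIDTH JOINER']

# LRM and RLM are invisible but carry a strong direction (L and R),
# unlike the neutral double quote (ON).
print([unicodedata.bidirectional(c) for c in expand_bidi_entities('&lrm;&rlm;"')])
# ['L', 'R', 'ON']
```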

Next, an attribute called DIR is introduced, restricted to the values
LTR and RTL and admitted by most elements. On block-type elements, the
DIR attribute indicates the base directionality of the text in the
block; if omitted, it is inherited from the parent element. On inline
elements, it makes the element start a new embedding level; if
omitted, the inline element does not start a new embedding level.
Embedding is used to handle nested directional runs; a common need for
the embedding characters is to handle text that has been pasted from
one BIDI context to another, and the possibility of multiply embedded
pastings. The following example shows a case where embedding is
needed, and its effect:

    Given the following Latin (upper case) and Arabic (lower case)
    letters in backing store with the specified embeddings (LRE is
    shorthand for <SPAN DIR=LTR>, RLE for <SPAN DIR=RTL> and PDF for
    </SPAN>):

        LRE A B RLE a b LRE C D PDF c d PDF E F PDF

    One gets the following rendering (with [] showing the
    directional transitions):

        [ A B [ d c [ C D ] b a ] E F ]

    On the other hand, without these characters, e.g., with

        A B a b C D c d E F

    and a base level of LTR, one gets the following rendering:

        [ A B [ b a ] C D [ d c ] E F ]

    Notice that b,a is on the left and d,c on the right, unlike the
    above case where the embedding levels are used. Without the
    embedding characters one has at most two levels: a base
    directional level and a single counterflow directional level.
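The two renderings in the example above can be reproduced with a toy Python model. It makes strong simplifying assumptions: upper-case letters are left-to-right and lower-case letters right-to-left; an explicit embedding simply lays out its children in its own direction, ignoring implicit resolution inside it; and, without embeddings, runs of lower-case letters are reversed at an LTR base level. This is not the Unicode bidirectional algorithm, and all names are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Embedding:
    """An explicit embedding: <SPAN DIR=LTR> (LRE) or <SPAN DIR=RTL> (RLE)."""
    direction: str                                # "LTR" or "RTL"
    children: list = field(default_factory=list)  # letters or Embeddings

def parse(tokens: list[str]) -> list:
    """Build nested embeddings from LRE/RLE ... PDF tokens and letters."""
    top: list = []
    open_embeddings: list[Embedding] = []
    for tok in tokens:
        target = open_embeddings[-1].children if open_embeddings else top
        if tok in ("LRE", "RLE"):
            node = Embedding("LTR" if tok == "LRE" else "RTL")
            target.append(node)
            open_embeddings.append(node)
        elif tok == "PDF":
            open_embeddings.pop()
        else:
            target.append(tok)
    return top

def render_explicit(items: list) -> list[str]:
    """Lay out each embedding in its own direction, bracketing each level."""
    out: list[str] = []
    for item in items:
        if isinstance(item, Embedding):
            kids = item.children if item.direction == "LTR" else item.children[::-1]
            out += ["["] + render_explicit(kids) + ["]"]
        else:
            out.append(item)
    return out

def render_implicit(letters: list[str]) -> list[str]:
    """LTR base level; each run of lower-case (RTL) letters is reversed."""
    out, run = ["["], []
    for ch in letters + [""]:  # "" is a sentinel that flushes the last run
        if ch.islower():
            run.append(ch)
            continue
        if run:
            out += ["["] + run[::-1] + ["]"]
            run = []
        if ch:
            out.append(ch)
    return out + ["]"]

with_embeddings = "LRE A B RLE a b LRE C D PDF c d PDF E F PDF".split()
print(" ".join(render_explicit(parse(with_embeddings))))
# [ A B [ d c [ C D ] b a ] E F ]
print(" ".join(render_implicit("A B a b C D c d E F".split())))
# [ A B [ b a ] C D [ d c ] E F ]
```

The nesting in parse() mirrors the way <SPAN DIR=...> elements nest in the document tree, which is why the explicit rendering can express three levels while the implicit one cannot.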

A directional override feature is needed to deal with unusual pieces of
text in which directionality cannot be resolved from context in an
unambiguous fashion. For example, in part numbers, formulas, telephone
numbers, and other similar pieces of text, it is difficult or
impossible to derive the directionality of numbers, punctuation, and
other neutrals from their context. To this end, a new element called
BDO (BIDI override) is introduced, which requires the DIR attribute to
specify whether the override is left-to-right or right-to-left.

A few other additional elements are important for proper
language-dependent rendering. First, a generic container is needed to
carry the LANG and DIR attributes in cases where no other element is
appropriate; the SPAN element is introduced for that purpose.

Short quotations, and in particular the quotation marks surrounding
them, are typically rendered differently in different languages and on
platforms with different graphic capabilities: "a quotation in
English", `another, slightly better one', ,,a quotation in German",
<< a quotation in French >>. The <Q> element is introduced for that
purpose.
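As an illustration of how a user agent might render <Q> from its LANG value, the following Python sketch picks quotation marks from a small table, trying the full language tag first and then shorter prefixes in the hierarchical manner of section 3. The table of marks and the ASCII fallback are assumptions made for this sketch; this specification does not prescribe any particular marks.

```python
# Illustrative quotation marks per language tag. The table is an
# assumption made for this sketch, not part of the specification.
QUOTE_MARKS = {
    "en": ("\u201c", "\u201d"),              # “…”
    "de": ("\u201e", "\u201c"),              # „…“
    "fr": ("\u00ab\u00a0", "\u00a0\u00bb"),  # « … » with no-break spaces
}

def render_q(content: str, lang: str) -> str:
    """Wrap <Q> content in marks chosen from its LANG value, trying the
    full tag first and then shorter prefixes ("de-CH", then "de")."""
    subtags = lang.lower().split("-")
    for n in range(len(subtags), 0, -1):
        marks = QUOTE_MARKS.get("-".join(subtags[:n]))
        if marks is not None:
            return marks[0] + content + marks[1]
    return '"' + content + '"'  # ASCII fallback for unknown languages

print(render_q("Guten Tag", "de-CH"))  # „Guten Tag“
```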

Many languages, including English, require superscripts for proper
rendering: "the XXth century" should have "th" in superscript. The
<SUP> element, and its sibling <SUB>, are introduced to allow proper
markup of such text. <SUP> and <SUB> contents are restricted to PCDATA
to avoid nesting problems.

Finally, in many languages text justification is much more important
than it is in Western languages, and warrants explicit markup. The
ALIGN attribute, admitting values of LEFT, RIGHT, CENTER and JUSTIFY,
is added to a selection of block-like elements where it makes sense.

5. Forms

5.1. DTD additions

It is natural to expect input in any language in forms, as they provide
one of the few ways of obtaining user input. While this is primarily a
user interface issue, there are some things that should be specified
at the HTML level to guide behavior and promote interoperability.

To ensure interoperability, it is necessary for the user agent (and
the user) to have an indication of the character set(s) that the
server providing a form will be able to handle upon submission of the
filled-in form. Such an indication is provided by the ACCEPT-CHARSET
attribute of the FORM element, modeled on the HTTP Accept-Charset
header (see [HTTP]), which contains a space- and/or comma-delimited
list of character sets acceptable to the server. A user agent may want
to advise the user of the contents of this attribute, or to restrict
the user's ability to enter unacceptable characters.

    NOTE -- The list of character sets is to be interpreted as an
    EXCLUSIVE-OR list; the server announces that it is ready to accept
    any ONE of these character encoding schemes for each part of a
    multipart entity.
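A user agent could use the ACCEPT-CHARSET value roughly as in the following Python sketch: split the list on spaces and/or commas, then choose exactly ONE charset able to represent the entered text, in keeping with the exclusive-or reading. Trying the labels in the order given, and skipping labels with no matching Python codec, are assumptions of this sketch; the function names are hypothetical.

```python
from __future__ import annotations
import re

def parse_accept_charset(value: str) -> list[str]:
    """Split an ACCEPT-CHARSET value on spaces and/or commas."""
    return [label for label in re.split(r"[\s,]+", value) if label]

def choose_charset(text: str, accept_charset: str) -> str | None:
    """Pick ONE acceptable charset that can represent `text`, trying the
    labels in the order given (an assumption of this sketch).

    Labels with no matching Python codec are skipped; mapping MIME names
    to codecs this way is also an assumption.
    """
    for label in parse_accept_charset(accept_charset):
        try:
            text.encode(label)
        except (LookupError, UnicodeEncodeError):
            continue
        return label
    return None  # the user agent could warn the user at this point

print(choose_charset("\u041f\u0440\u0438\u0432\u0435\u0442", "ISO-8859-1, KOI8-R"))
# KOI8-R
```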
687
688
689 5.2. Form submission
690
691 The HTML 2.0 form submission mechanism, based on the "application/x-
692 www-form-urlencoded" media type, is hopelessly broken with regard to
693 internationalization. In fact, since URLs are restricted to ASCII
694 characters, the mechanism is broken even for ISO-8859-1 text. Sec-
695 tion 2.2 of [RFC1738] specifies that octets may be encoded using the
696 "%HH" notation, but text submitted from a form is composed of charac-
697 ters, not octets. Lacking a specification of a character encoding
698 scheme, the "%HH" notation has no meaning.

A partial solution to this sorry state of affairs is to specify a
default character encoding scheme to be assumed when the GET method
of form submission is used. Specifying UCS-2 would break all existing
forms, so the only sensible choice is ISO-8859-1. That is, the
encoded URL sent to submit a form by the GET method is to be
interpreted as a sequence of single-octet characters encoded
according to ISO-8859-1, and further encoded according to the scheme
of [RFC1738] (the "%HH" notation). This is clearly insufficient,
since characters outside ISO-8859-1 cannot be represented at all; the
GET method of form submission is therefore deprecated and should not
be used in future documents, despite the language of section XX of
[HTML-2].
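Under the ISO-8859-1 default just described, a server could decode
the query part of a GET submission as in the following Python sketch
(the function name is hypothetical). Since every octet denotes some
ISO-8859-1 character, decoding never fails; the limitation lies on
the sending side, where characters outside ISO-8859-1 simply cannot
be expressed.

```python
from __future__ import annotations

from urllib.parse import parse_qsl


def decode_get_query(query: str) -> list[tuple[str, str]]:
    """Decode the query part of a GET form submission.

    Each "%HH" escape is one octet, every octet is taken as an
    ISO-8859-1 character (the default above), and "+" stands for a
    space as in application/x-www-form-urlencoded.
    """
    return parse_qsl(query, keep_blank_values=True, encoding="iso-8859-1")
```

For instance, decode_get_query("name=Fran%E7ois&city=Montr%E9al")
returns the pairs ("name", "Fran\u00e7ois") and ("city", "Montr\u00e9al").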

A better solution is to add a MIME charset parameter to the
Content-Type header sent along with a POST method form submission,
with the understanding that the URL encoding of [RFC1738] is applied
on top of the specified character encoding, as a kind of implicit
Content-Transfer-Encoding. In the absence of a charset parameter,
ISO-8859-1 is to be assumed.
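A sketch of the receiving side of this scheme in Python, assuming the
form data arrive as raw octets together with the Content-Type header
value: the charset parameter is read with the standard
email.message.Message class, the "%HH" encoding is undone, and the
resulting octets are decoded with that charset, or with ISO-8859-1
when it is absent. The function name is hypothetical.

```python
from __future__ import annotations

from email.message import Message
from urllib.parse import parse_qsl


def decode_post_form(content_type: str, body: bytes) -> list[tuple[str, str]]:
    """Decode an application/x-www-form-urlencoded POST body.

    The charset parameter of Content-Type names the encoding applied
    before the "%HH" URL encoding; ISO-8859-1 is assumed without it.
    """
    header = Message()
    header["Content-Type"] = content_type
    charset = header.get_content_charset("iso-8859-1")
    # After URL encoding, the body consists of ASCII characters only.
    return parse_qsl(body.decode("ascii"), keep_blank_values=True,
                     encoding=charset)
```

For example, with the header value
"application/x-www-form-urlencoded; charset=ISO-2022-JP", the body
"q=%1B%24BF%7C%1B%28B" decodes to the single Japanese character
U+65E5, whereas the same octets read as ISO-8859-1 would be garbage.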

The best solution is to use the "multipart/form-data" media type
described in [FILE-UPLOAD] with the POST method of form submission.
This mechanism encapsulates the value part of each name-value pair in
a body part of a multipart MIME body that is sent as the HTTP entity;
each body part can be labeled with an appropriate Content-Type,
including, if necessary, a charset parameter that specifies the
character encoding scheme. The changes to the DTD necessary to
support this method of form submission have been incorporated in the
DTD included in this specification.
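The following Python sketch illustrates the idea: each form value
travels in its own body part, encoded in its own character set and
labeled accordingly, and the receiver decodes each part with the
charset named in that part's Content-Type. It builds the body by hand
and parses it with the standard email package; the function names,
the text/plain labeling of each part and the fixed boundary string
are assumptions made for the example, and a real user agent would
choose a boundary that does not occur in the data.

```python
from __future__ import annotations

import email


def build_multipart(fields: list[tuple[str, str, str]], boundary: str) -> bytes:
    """Build a multipart/form-data body, one body part per field.

    Each field is (name, value, charset). The value is encoded with its
    own charset, and that charset is announced in the part's
    Content-Type header. The boundary must not occur in any value.
    """
    delimiter = b"--" + boundary.encode("ascii")
    lines: list[bytes] = []
    for name, value, charset in fields:
        lines.append(delimiter)
        lines.append(f'Content-Disposition: form-data; name="{name}"'.encode("ascii"))
        lines.append(f"Content-Type: text/plain; charset={charset}".encode("ascii"))
        lines.append(b"")  # blank line ends the part's headers
        lines.append(value.encode(charset))
    lines.append(delimiter + b"--")  # close delimiter
    return b"\r\n".join(lines) + b"\r\n"


def parse_multipart(body: bytes, boundary: str) -> dict[str, str]:
    """Decode each body part with the charset named in its Content-Type."""
    head = f"Content-Type: multipart/form-data; boundary={boundary}\r\n\r\n"
    message = email.message_from_bytes(head.encode("ascii") + body)
    fields: dict[str, str] = {}
    for part in message.get_payload():
        name = part.get_param("name", header="content-disposition")
        charset = part.get_content_charset("iso-8859-1")
        fields[name] = part.get_payload(decode=True).decode(charset)
    return fields
```

A French value sent as ISO-8859-1 and a Japanese value sent as
ISO-2022-JP can thus share one submission, each part carrying its own
charset label.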

How the user agent determines the encoding of the text entered by the
user is outside the scope of this specification.

6. Miscellaneous

Proper interpretation of a text document requires that its character
encoding scheme be known. Current HTTP servers, however, generally do
not include an appropriate charset parameter with the Content-Type
header, even when the encoding scheme differs from the default
ISO-8859-1. This behaviour is strongly discouraged, but some
preventive measures can be taken to minimize its detrimental effects.

For documents accessed from a hyperlink in an origin HTML document,
this specification adds a CHARSET attribute to the attribute list of
the elements with link semantics (A and LINK), by adding it to the
linkExtraAttributes entity. The value of this attribute is to be
considered a hint to the user agent about the character encoding
scheme used by the resource the hyperlink points to; it should be the
appropriate value of the MIME charset parameter for that resource.

In any document, it is advisable to include an indication of the
encoding scheme such as the following, as early as possible within
the HEAD of the document:

   <META HTTP-EQUIV="Content-Type"
         CONTENT="text/html; charset=ISO-2022-JP">

This is not foolproof, but it works if the encoding scheme is such
that ASCII characters stand for themselves, at least until the META
element has been parsed.
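A rough Python sketch of how a user agent might look for such a META
element before it knows the encoding: it scans the first octets of
the document as if they were ASCII, which is precisely why the
technique only works when ASCII characters stand for themselves. The
regular expression is deliberately simplified (it expects HTTP-EQUIV
before CONTENT, as in the example above), and the 1024-octet limit
and the function name are arbitrary assumptions.

```python
from __future__ import annotations

import re

# Simplified: expects HTTP-EQUIV before CONTENT, as in the example above.
META_CHARSET = re.compile(
    rb'<META\s+HTTP-EQUIV\s*=\s*"?Content-Type"?\s+'
    rb'CONTENT\s*=\s*"[^"]*charset=([A-Za-z0-9._:-]+)',
    re.IGNORECASE,
)


def sniff_meta_charset(document: bytes, limit: int = 1024) -> str | None:
    """Return the charset named by a META element in the first octets.

    The octets are matched as if they were ASCII, so this only works
    when ASCII characters stand for themselves in the document.
    """
    match = META_CHARSET.search(document[:limit])
    return match.group(1).decode("ascii") if match else None
```

A document transmitted in UCS-2, where every ASCII character is
accompanied by a zero octet, is not recognized by this scan, which
illustrates the limitation stated above.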

For definiteness, the "charset" parameter received from the source of
the document should be considered the most authoritative, followed in
order of preference by the contents of a META element such as the
above, and finally by the CHARSET attribute of the anchor that was
followed (if any).
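The order of preference can be expressed as a small Python function
(the name is hypothetical). Falling back to ISO-8859-1 when no
indication is available reflects the default mentioned at the
beginning of this section; lower-casing relies on charset names being
case-insensitive.

```python
from __future__ import annotations


def effective_charset(http_charset: str | None,
                      meta_charset: str | None,
                      anchor_charset: str | None) -> str:
    """Pick the charset to use, in the order of preference above.

    1. the charset parameter received with the document,
    2. a META HTTP-EQUIV="Content-Type" element in the document,
    3. the CHARSET attribute of the A or LINK element that was followed;
    ISO-8859-1 is the default when none of these is available.
    """
    for candidate in (http_charset, meta_charset, anchor_charset):
        if candidate:
            return candidate.lower()
    return "iso-8859-1"
```

Thus a CHARSET attribute on the anchor only matters when neither the
server nor the document itself says anything about the encoding.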

When HTML text is transmitted directly in UCS-2
(charset=UNICODE-1-1), the question of byte order arises: does the
high-order byte of each two-byte character come first or second? For
definiteness, this specification recommends that UCS-2 be transmitted
in big-endian byte order (high-order byte first), which corresponds
both to the established network byte order for two-byte quantities
and to the Unicode recommendation for serialized text data.
Furthermore, to maximize the chances of proper interpretation, it is
recommended that documents transmitted as UCS-2 always begin with a
ZERO WIDTH NO-BREAK SPACE character (hexadecimal FEFF), which, when
byte-reversed, becomes FFFE, a value guaranteed never to be assigned
to a character. Thus, a user agent receiving the octets FF FE as the
first two octets of a text knows that the two bytes of each character
have to be swapped for the remainder of the text.
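The byte-order rule can be sketched in Python as follows (the function
name is hypothetical). Python has no separate UCS-2 codec, so the
sketch relies on the UTF-16 codecs, which decode UCS-2 text
identically as long as the data contain no surrogate code values.

```python
def decode_ucs2(data: bytes) -> str:
    """Decode UCS-2 text, honouring a leading FEFF byte-order mark.

    FE FF: big-endian (the recommended order); the mark is removed.
    FF FE: byte-reversed, i.e. little-endian; the mark is removed.
    Otherwise big-endian is assumed, as recommended above.
    """
    if len(data) % 2:
        raise ValueError("UCS-2 text must have an even number of octets")
    if data[:2] == b"\xfe\xff":
        return data[2:].decode("utf-16-be")
    if data[:2] == b"\xff\xfe":
        return data[2:].decode("utf-16-le")
    return data.decode("utf-16-be")
```

Both b"\xfe\xff\x00H\x00i" and b"\xff\xfeH\x00i\x00" decode to "Hi",
which is exactly the robustness the leading FEFF is meant to provide.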

7. HTML Public Text

7.1. HTML DTD

<!-- html-2.1.dtd

Document Type Definition for the HyperText Markup Language,
version 2.1 (HTML DTD)

Last revised: 95/09/25

Authors: Daniel W. Connolly <connolly@w3.org>
         Francois Yergeau <yergeau@alis.com>
-->

<!ENTITY % HTML.Version
"-//IETF//DTD HTML 2.1//EN"

-- Typical usage:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.1//EN">
<html>
...
</html>
--
>


<!--============ Feature Test Entities ========================-->

<!ENTITY % HTML.Recommended "IGNORE"
-- Certain features of the language are necessary for
compatibility with widespread usage, but they may
compromise the structural integrity of a document.
This feature test entity enables a more prescriptive
document type definition that eliminates
those features.
-->

<![ %HTML.Recommended [
<!ENTITY % HTML.Deprecated "IGNORE">
]]>

<!ENTITY % HTML.Deprecated "INCLUDE"
-- Certain features of the language are necessary for
compatibility with earlier versions of the specification,
but they tend to be used and implemented inconsistently,
and their use is deprecated. This feature test entity
enables a document type definition that eliminates
these features.
-->

<!ENTITY % HTML.Highlighting "INCLUDE"
-- Use this feature test entity to validate that a
document uses no highlighting tags, which may be
ignored on minimal implementations.
-->

<!ENTITY % HTML.Forms "INCLUDE"
-- Use this feature test entity to validate that a document
contains no forms, which may not be supported in minimal
implementations.
-->

<!--============== Imported Names ==============================-->

<!ENTITY % Content-Type "CDATA"
-- meaning an internet media type
(aka MIME content type, as per RFC1521)
-->

<!ENTITY % HTTP-Method "GET | POST"
-- as per HTTP specification, in progress
-->

<!--========= DTD "Macros" =====================-->

<!ENTITY % heading "H1|H2|H3|H4|H5|H6">

<!ENTITY % list " UL | OL | DIR | MENU " >

<!ENTITY % attrs -- common attributes for elements --
"LANG NAME #IMPLIED -- RFC 1766 language tag --
DIR (ltr|rtl) #IMPLIED -- text directionality --">

<!ENTITY % just -- an attribute for text justification --
"ALIGN (left|right|center|justify) #IMPLIED">

<!--======= Character mnemonic entities =================-->

<!ENTITY % ISOlat1 PUBLIC
"ISO 8879-1986//ENTITIES Added Latin 1//EN//HTML">
%ISOlat1;

<!--Entities for markup significant characters -->
<!ENTITY amp CDATA "&#38;" -- ampersand -->
<!ENTITY gt CDATA "&#62;" -- greater than -->
<!ENTITY lt CDATA "&#60;" -- less than -->
<!ENTITY quot CDATA "&#34;" -- double quote -->

<!--Entities for language-dependent presentation (BIDI and contextual analysis) -->
<!ENTITY zwnj CDATA "&#8204;"-- zero width non-joiner-->
<!ENTITY zwj CDATA "&#8205;"-- zero width joiner-->
<!ENTITY lrm CDATA "&#8206;"-- left-to-right mark-->
<!ENTITY rlm CDATA "&#8207;"-- right-to-left mark-->

<!--========= SGML Document Access (SDA) Parameter Entities =====-->

<!-- HTML 2.0 contains SGML Document Access (SDA) fixed attributes
in support of easy transformation to the International Committee
for Accessible Document Design (ICADD) DTD
"-//EC-USA-CDA/ICADD//DTD ICADD22//EN".
ICADD applications are designed to support usable access to
structured information by print-impaired individuals through
Braille, large print and voice synthesis. For more information on
SDA & ICADD:
- ISO 12083:1993, Annex A.8, Facilities for Braille,
large print and computer voice
- ICADD ListServ
<ICADD%ASUACAD.BITNET@ARIZVM1.ccit.arizona.edu>
- Usenet news group bit.listserv.easi
- Recording for the Blind, +1 800 221 4792
-->

<!ENTITY % SDAFORM "SDAFORM CDATA #FIXED"
-- one to one mapping -->
<!ENTITY % SDARULE "SDARULE CDATA #FIXED"
-- context-sensitive mapping -->
<!ENTITY % SDAPREF "SDAPREF CDATA #FIXED"
-- generated text prefix -->
<!ENTITY % SDASUFF "SDASUFF CDATA #FIXED"
-- generated text suffix -->
<!ENTITY % SDASUSP "SDASUSP NAME #FIXED"
-- suspend transform process -->

<!--========== Text Markup =====================-->

<![ %HTML.Highlighting [

<!ENTITY % font " TT | B | I ">

<!ENTITY % phrase "EM | STRONG | CODE | SAMP | KBD | VAR | CITE">

<!ENTITY % text "#PCDATA|A|IMG|BR|%phrase|%font|SPAN|Q|BDO|SUP|SUB">

<!ELEMENT (%font;|%phrase) - - (%text)*>
<!ATTLIST ( TT | CODE | SAMP | KBD | VAR )
%attrs;
%SDAFORM; "Lit"
>
<!ATTLIST ( B | STRONG )
%attrs;
%SDAFORM; "B"
>
<!ATTLIST ( I | EM | CITE )
%attrs;
%SDAFORM; "It"
>

<!-- <TT> Typewriter text -->
<!-- <B> Bold text -->
<!-- <I> Italic text -->
<!-- <EM> Emphasized phrase -->
<!-- <STRONG> Strong emphasis -->
<!-- <CODE> Source code phrase -->
<!-- <SAMP> Sample text or characters -->
<!-- <KBD> Keyboard phrase, e.g. user input -->
<!-- <VAR> Variable phrase or substitutable -->
<!-- <CITE> Name or title of cited work -->

<!ENTITY % pre.content "#PCDATA|A|HR|BR|%font|%phrase|SPAN|BDO">

]]>

<!ENTITY % text "#PCDATA|A|IMG|BR|SPAN|Q|BDO|SUP|SUB">

<!-- Should the BDO element have an SDAFORM attr.? Which? -->
<!ELEMENT BDO - - (%text)+>
<!ATTLIST BDO
LANG NAME #IMPLIED
DIR (ltr|rtl) #REQUIRED
>

<!-- <BDO> Control bidirectional text -->

<!ELEMENT BR - O EMPTY>
<!ATTLIST BR
%SDAPREF; "&#RE;"
>

<!-- <BR> Line break -->

<!-- Should the SPAN element have an SDAFORM attr.? Which? -->
<!ELEMENT SPAN - - (%text)*>
<!ATTLIST SPAN
%attrs;
>

<!-- <SPAN> Generic container -->

<!ELEMENT Q - - (%text)*>
<!ATTLIST Q
%attrs;
%SDAFORM; "It" -- to be verified --
>

<!-- <Q> Short quotation -->

<!ELEMENT (SUP|SUB) - - (#PCDATA)>
<!ATTLIST (SUP|SUB)
%attrs;
>

<!-- <SUP> Superscript -->
<!-- <SUB> Subscript -->

<!--========= Link Markup ======================-->

<!ENTITY % linkType "NAME">

<!ENTITY % linkExtraAttributes
"REL %linkType #IMPLIED
REV %linkType #IMPLIED
URN CDATA #IMPLIED
TITLE CDATA #IMPLIED
METHODS NAMES #IMPLIED
CHARSET NAME #IMPLIED
">

<![ %HTML.Recommended [
<!ENTITY % A.content "(%text)*"
-- <H1><a name="xxx">Heading</a></H1>
is preferred to
<a name="xxx"><H1>Heading</H1></a>
-->
]]>

<!ENTITY % A.content "(%heading|%text)*">

<!ELEMENT A - - %A.content -(A)>
<!ATTLIST A
%attrs;
HREF CDATA #IMPLIED
NAME CDATA #IMPLIED
%linkExtraAttributes;
%SDAPREF; "<Anchor: #AttList>"
>
<!-- <A> Anchor; source/destination of link -->
<!-- <A NAME="..."> Name of this anchor -->
<!-- <A HREF="..."> Address of link destination -->
<!-- <A URN="..."> Permanent address of destination -->
<!-- <A REL=...> Relationship to destination -->
<!-- <A REV=...> Relationship of destination to this -->
<!-- <A TITLE="..."> Title of destination (advisory) -->
<!-- <A METHODS="..."> Operations on destination (advisory) -->
<!-- <A CHARSET="..."> Charset of destination (advisory) -->


<!--========== Images ==========================-->

<!ELEMENT IMG - O EMPTY>
<!ATTLIST IMG
%attrs;
SRC CDATA #REQUIRED
ALT CDATA #IMPLIED
ALIGN (top|middle|bottom) #IMPLIED
ISMAP (ISMAP) #IMPLIED
%SDAPREF; "<Fig><?SDATrans Img: #AttList>#AttVal(Alt)</Fig>"
>

<!-- <IMG> Image; icon, glyph or illustration -->
<!-- <IMG SRC="..."> Address of image object -->
<!-- <IMG ALT="..."> Textual alternative -->
<!-- <IMG ALIGN=...> Position relative to text -->
<!-- <IMG ISMAP> Each pixel can be a link -->

<!--========== Paragraphs=======================-->

<!ELEMENT P - O (%text)*>
<!ATTLIST P
%attrs;
%just;
%SDAFORM; "Para"
>

<!-- <P> Paragraph -->


<!--========== Headings, Titles, Sections ===============-->

<!ELEMENT HR - O EMPTY>
<!ATTLIST HR
%attrs;
%just;
%SDAPREF; "&#RE;&#RE;"
>

<!-- <HR> Horizontal rule -->

<!ELEMENT ( %heading ) - - (%text;)*>
<!ATTLIST H1
%attrs;
%just;
%SDAFORM; "H1"
>
<!ATTLIST H2
%attrs;
%just;
%SDAFORM; "H2"
>
<!ATTLIST H3
%attrs;
%just;
%SDAFORM; "H3"
>
<!ATTLIST H4
%attrs;
%just;
%SDAFORM; "H4"
>
<!ATTLIST H5
%attrs;
%just;
%SDAFORM; "H5"
>
<!ATTLIST H6
%attrs;
%just;
%SDAFORM; "H6"
>

<!-- <H1> Heading, level 1 -->
<!-- <H2> Heading, level 2 -->
<!-- <H3> Heading, level 3 -->
<!-- <H4> Heading, level 4 -->
<!-- <H5> Heading, level 5 -->
<!-- <H6> Heading, level 6 -->


<!--========== Text Flows ======================-->

<![ %HTML.Forms [
<!ENTITY % block.forms "BLOCKQUOTE | FORM | ISINDEX">
]]>

<!ENTITY % block.forms "BLOCKQUOTE">

<![ %HTML.Deprecated [
<!ENTITY % preformatted "PRE | XMP | LISTING">
]]>

<!ENTITY % preformatted "PRE">

<!ENTITY % block "P | %list | DL
| %preformatted
| %block.forms">

<!ENTITY % flow "(%text|%block)*">

<!ENTITY % pre.content "#PCDATA | A | HR | BR | SPAN | BDO">
<!ELEMENT PRE - - (%pre.content)*>
<!ATTLIST PRE
%attrs;
WIDTH NUMBER #implied
%SDAFORM; "Lit"
>

<!-- <PRE> Preformatted text -->
<!-- <PRE WIDTH=...> Maximum characters per line -->

<![ %HTML.Deprecated [

<!ENTITY % literal "CDATA"
-- historical, non-conforming parsing mode where
the only markup signal is the end tag
in full
-->

<!ELEMENT (XMP|LISTING) - - %literal>
<!ATTLIST XMP
%attrs;
%SDAFORM; "Lit"
%SDAPREF; "Example:&#RE;"
>
<!ATTLIST LISTING
%attrs;
%SDAFORM; "Lit"
%SDAPREF; "Listing:&#RE;"
>

<!-- <XMP> Example section -->
<!-- <LISTING> Computer listing -->

<!ELEMENT PLAINTEXT - O %literal>
<!-- <PLAINTEXT> Plain text passage -->

<!ATTLIST PLAINTEXT
%attrs;
%SDAFORM; "Lit"
>
]]>


<!--========== Lists ==================-->

<!ELEMENT DL - - (DT | DD)+>
<!ATTLIST DL
%attrs;
COMPACT (COMPACT) #IMPLIED
%SDAFORM; "List"
%SDAPREF; "Definition List:"
>

<!ELEMENT DT - O (%text)*>
<!ATTLIST DT
%attrs;
%SDAFORM; "Term"
>

<!ELEMENT DD - O %flow>
<!ATTLIST DD
%attrs;
%SDAFORM; "LItem"
>

<!-- <DL> Definition list, or glossary -->
<!-- <DL COMPACT> Compact style list -->
<!-- <DT> Term in definition list -->
<!-- <DD> Definition of term -->

<!ELEMENT (OL|UL) - - (LI)+>
<!ATTLIST OL
%attrs;
%just;
COMPACT (COMPACT) #IMPLIED
%SDAFORM; "List"
>
<!ATTLIST UL
%attrs;
%just;
COMPACT (COMPACT) #IMPLIED
%SDAFORM; "List"
>
<!-- <UL> Unordered list -->
<!-- <UL COMPACT> Compact list style -->
<!-- <OL> Ordered, or numbered list -->
<!-- <OL COMPACT> Compact list style -->


<!ELEMENT (DIR|MENU) - - (LI)+ -(%block)>
<!ATTLIST DIR
%attrs;
%just;
COMPACT (COMPACT) #IMPLIED
%SDAFORM; "List"
%SDAPREF; "<LHead>Directory</LHead>"
>
<!ATTLIST MENU
%attrs;
%just;
COMPACT (COMPACT) #IMPLIED
%SDAFORM; "List"
%SDAPREF; "<LHead>Menu</LHead>"
>

<!-- <DIR> Directory list -->
<!-- <DIR COMPACT> Compact list style -->
<!-- <MENU> Menu list -->
<!-- <MENU COMPACT> Compact list style -->

<!ELEMENT LI - O %flow>
<!ATTLIST LI
%attrs;
%just;
%SDAFORM; "LItem"
>

<!-- <LI> List item -->

<!--========== Document Body ===================-->

<![ %HTML.Recommended [
<!ENTITY % body.content "(%heading|%block|HR|ADDRESS|IMG)*"
-- <h1>Heading</h1>
<p>Text ...
is preferred to
<h1>Heading</h1>
Text ...
-->
]]>

<!ENTITY % body.content "(%heading | %text | %block |
HR | ADDRESS)*">

<!ELEMENT BODY O O %body.content>
<!ATTLIST BODY
%attrs;
>

<!-- <BODY> Document body -->

<!ELEMENT BLOCKQUOTE - - %body.content>
<!ATTLIST BLOCKQUOTE
%attrs;
%just;
%SDAFORM; "BQ"
>

<!-- <BLOCKQUOTE> Quoted passage -->

<!ELEMENT ADDRESS - - (%text|P)*>
<!ATTLIST ADDRESS
%attrs;
%just;
%SDAFORM; "Lit"
%SDAPREF; "Address:&#RE;"
>

<!-- <ADDRESS> Address, signature, or byline -->


<!--======= Forms ====================-->

<![ %HTML.Forms [

<!ELEMENT FORM - - %body.content -(FORM) +(INPUT|SELECT|TEXTAREA)>
<!ATTLIST FORM
%attrs;
ACTION CDATA #IMPLIED
METHOD (%HTTP-Method) GET
ENCTYPE %Content-Type; "application/x-www-form-urlencoded"
ACCEPT-CHARSET CDATA #IMPLIED
%SDAPREF; "<Para>Form:</Para>"
%SDASUFF; "<Para>Form End.</Para>"
>

<!-- <FORM> Fill-out or data-entry form -->
<!-- <FORM ACTION="..."> Address for completed form -->
<!-- <FORM METHOD=...> Method of submitting form -->
<!-- <FORM ENCTYPE="..."> Representation of form data -->
<!-- <FORM ACCEPT-CHARSET="..."> Character sets accepted by server -->

<!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX |
RADIO | SUBMIT | RESET |
IMAGE | HIDDEN | FILE )">
<!ELEMENT INPUT - O EMPTY>
<!ATTLIST INPUT
%attrs;
TYPE %InputType TEXT
NAME CDATA #IMPLIED
VALUE CDATA #IMPLIED
SRC CDATA #IMPLIED
CHECKED (CHECKED) #IMPLIED
SIZE CDATA #IMPLIED
MAXLENGTH NUMBER #IMPLIED
ALIGN (top|middle|bottom) #IMPLIED
ACCEPT CDATA #IMPLIED --list of content types --
%SDAPREF; "Input: "
>

<!-- <INPUT> Form input datum -->
<!-- <INPUT TYPE=...> Type of input interaction -->
<!-- <INPUT NAME=...> Name of form datum -->
<!-- <INPUT VALUE="..."> Default/initial/selected value -->
<!-- <INPUT SRC="..."> Address of image -->
<!-- <INPUT CHECKED> Initial state is "on" -->
<!-- <INPUT SIZE=...> Field size hint -->
<!-- <INPUT MAXLENGTH=...> Data length maximum -->
<!-- <INPUT ALIGN=...> Image alignment -->

<!ELEMENT SELECT - - (OPTION+) -(INPUT|SELECT|TEXTAREA)>
<!ATTLIST SELECT
%attrs;
NAME CDATA #REQUIRED
SIZE NUMBER #IMPLIED
MULTIPLE (MULTIPLE) #IMPLIED
%SDAFORM; "List"
%SDAPREF;
"<LHead>Select #AttVal(Multiple)</LHead>"
>

<!-- <SELECT> Selection of option(s) -->
<!-- <SELECT NAME=...> Name of form datum -->
<!-- <SELECT SIZE=...> Options displayed at a time -->
<!-- <SELECT MULTIPLE> Multiple selections allowed -->

<!ELEMENT OPTION - O (#PCDATA)*>
<!ATTLIST OPTION
%attrs;
SELECTED (SELECTED) #IMPLIED
VALUE CDATA #IMPLIED
%SDAFORM; "LItem"
%SDAPREF;
"Option: #AttVal(Value) #AttVal(Selected)"
>

<!-- <OPTION> A selection option -->
<!-- <OPTION SELECTED> Initial state -->
<!-- <OPTION VALUE="..."> Form datum value for this option-->

<!ELEMENT TEXTAREA - - (#PCDATA)* -(INPUT|SELECT|TEXTAREA)>
<!ATTLIST TEXTAREA
%attrs;
NAME CDATA #REQUIRED
ROWS NUMBER #REQUIRED
COLS NUMBER #REQUIRED
%SDAFORM; "Para"
%SDAPREF; "Input Text -- #AttVal(Name): "
>

<!-- <TEXTAREA> An area for text input -->
<!-- <TEXTAREA NAME=...> Name of form datum -->
<!-- <TEXTAREA ROWS=...> Height of area -->
<!-- <TEXTAREA COLS=...> Width of area -->

]]>


<!--======= Document Head ======================-->

<![ %HTML.Recommended [
<!ENTITY % head.extra "">
]]>
<!ENTITY % head.extra "& NEXTID?">

<!ENTITY % head.content "TITLE & ISINDEX? & BASE? %head.extra">

<!ELEMENT HEAD O O (%head.content) +(META|LINK)>
<!ATTLIST HEAD
%attrs; >

<!-- <HEAD> Document head -->

<!ELEMENT TITLE - - (#PCDATA)* -(META|LINK)>
<!ATTLIST TITLE
%attrs;
%SDAFORM; "Ti" >

<!-- <TITLE> Title of document -->

<!ELEMENT LINK - O EMPTY>
<!ATTLIST LINK
%attrs;
HREF CDATA #REQUIRED
%linkExtraAttributes;
%SDAPREF; "Linked to : #AttVal (TITLE) (URN) (HREF)>" >

<!-- <LINK> Link from this document -->
<!-- <LINK HREF="..."> Address of link destination -->
<!-- <LINK URN="..."> Lasting name of destination -->
<!-- <LINK REL=...> Relationship to destination -->
<!-- <LINK REV=...> Relationship of destination to this -->
<!-- <LINK TITLE="..."> Title of destination (advisory) -->
<!-- <LINK CHARSET="..."> Charset of destination (advisory) -->
<!-- <LINK METHODS="..."> Operations allowed (advisory) -->

<!ELEMENT ISINDEX - O EMPTY>
<!ATTLIST ISINDEX
%attrs;
%SDAPREF;
"<Para>[Document is indexed/searchable.]</Para>">

<!-- <ISINDEX> Document is a searchable index -->

<!ELEMENT BASE - O EMPTY>
<!ATTLIST BASE
HREF CDATA #REQUIRED >

<!-- <BASE> Base context document -->
<!-- <BASE HREF="..."> Address for this document -->

<!ELEMENT NEXTID - O EMPTY>
<!ATTLIST NEXTID
N CDATA #REQUIRED >

<!-- <NEXTID> Next ID to use for link name -->
<!-- <NEXTID N=...> Next ID to use for link name -->

<!ELEMENT META - O EMPTY>
<!ATTLIST META
HTTP-EQUIV NAME #IMPLIED
NAME NAME #IMPLIED
CONTENT CDATA #REQUIRED
>

<!-- <META> Generic Meta-information -->
<!-- <META HTTP-EQUIV=...> HTTP response header name -->
<!-- <META NAME=...> Meta-information name -->
<!-- <META CONTENT="..."> Associated information -->

<!--======= Document Structure =================-->

<![ %HTML.Deprecated [
<!ENTITY % html.content "HEAD, BODY, PLAINTEXT?">
]]>
<!ENTITY % html.content "HEAD, BODY">

<!ELEMENT HTML O O (%html.content)>
<!ENTITY % version.attr "VERSION CDATA #FIXED '%HTML.Version;'">

<!ATTLIST HTML
%attrs;
%version.attr;
%SDAFORM; "Book"
>

<!-- <HTML> HTML Document -->
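The feature test entities above rely on the SGML rule that the first
declaration of an entity is binding and later declarations are
ignored. A document can therefore select the prescriptive variant by
declaring the entity in the internal subset of its document type
declaration, which is read before the DTD, e.g. <!DOCTYPE HTML PUBLIC
"-//IETF//DTD HTML 2.1//EN" [ <!ENTITY % HTML.Recommended "INCLUDE">
]>. The following Python model, which is not an SGML parser and whose
names are hypothetical, traces that resolution for three of the
entities declared above.

```python
from __future__ import annotations


def resolve_feature_entities(recommended: str | None = None) -> dict[str, str]:
    """Model how the DTD above resolves its feature test entities.

    In SGML the first declaration of an entity is binding; later ones
    are ignored. `recommended` stands for a declaration of
    %HTML.Recommended in the document's internal subset (None: absent).
    """
    entities: dict[str, str] = {}

    def declare(name: str, value: str) -> None:
        entities.setdefault(name, value)  # first declaration wins

    if recommended is not None:
        declare("HTML.Recommended", recommended)  # internal subset
    declare("HTML.Recommended", "IGNORE")  # DTD default

    # <![ %HTML.Recommended [ <!ENTITY % HTML.Deprecated "IGNORE"> ]]>
    if entities["HTML.Recommended"] == "INCLUDE":
        declare("HTML.Deprecated", "IGNORE")
    declare("HTML.Deprecated", "INCLUDE")

    # <![ %HTML.Deprecated [ <!ENTITY % preformatted "PRE | XMP | LISTING"> ]]>
    if entities["HTML.Deprecated"] == "INCLUDE":
        declare("preformatted", "PRE | XMP | LISTING")
    declare("preformatted", "PRE")
    return entities
```

By default XMP and LISTING remain available; with HTML.Recommended set
to INCLUDE, HTML.Deprecated becomes IGNORE and only PRE is left.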


7.2. SGML Declaration for HTML

<!SGML "ISO 8879:1986"

--
SGML Declaration for HyperText Markup Language version 2.x
(HTML 2.x).

--

CHARSET
BASESET "ISO Registration Number 176//CHARSET
ISO/IEC 10646-1:1993 UCS-2 with
implementation level 3//ESC 2/5 2/15 4/5"
DESCSET   0     9 UNUSED
          9     2 9
         11     2 UNUSED
         13     1 13
         14    18 UNUSED
         32    95 32
        127     1 UNUSED
        128    32 UNUSED
        160 65376 160


CAPACITY SGMLREF
TOTALCAP 150000
GRPCAP 150000
ENTCAP 150000

SCOPE DOCUMENT
SYNTAX
SHUNCHAR CONTROLS 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 127
BASESET "ISO 646:1983//CHARSET
International Reference Version
(IRV)//ESC 2/5 4/0"
DESCSET 0 128 0

FUNCTION
RE 13
RS 10
SPACE 32
TAB SEPCHAR 9

NAMING LCNMSTRT ""
UCNMSTRT ""
LCNMCHAR ".-"
UCNMCHAR ".-"
NAMECASE GENERAL YES
ENTITY NO
DELIM GENERAL SGMLREF
SHORTREF SGMLREF
NAMES SGMLREF
QUANTITY SGMLREF
ATTSPLEN 2100
LITLEN 1024
NAMELEN 72 -- somewhat arbitrary; taken from
internet line length conventions --
PILEN 1024
TAGLVL 100
TAGLEN 2100
GRPGTCNT 150
GRPCNT 64

FEATURES
MINIMIZE
DATATAG NO
OMITTAG YES
RANK NO
SHORTTAG YES
LINK
SIMPLE NO
IMPLICIT NO
EXPLICIT NO
OTHER
CONCUR NO
SUBDOC NO
FORMAL YES
APPINFO "SDA" -- conforming SGML Document Access application
--
>
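The DESCSET portion of the declaration above determines which
character numbers may occur in an HTML document. A small Python sketch
transcribing it (names hypothetical) makes the effect explicit: TAB,
LF and CR, printable ASCII, and the range 160 to 65535 are usable,
while the remaining C0 controls, DEL and the range 128 to 159 are
UNUSED.

```python
from __future__ import annotations

# (first, count, base) rows transcribed from the DESCSET above;
# a base of None stands for UNUSED.
DESCSET = [
    (0, 9, None), (9, 2, 9), (11, 2, None), (13, 1, 13),
    (14, 18, None), (32, 95, 32), (127, 1, None),
    (128, 32, None), (160, 65376, 160),
]


def is_document_char(code: int) -> bool:
    """True if `code` is a document character mapped onto UCS-2."""
    for first, count, base in DESCSET:
        if first <= code < first + count:
            return base is not None
    return False  # beyond 65535: outside the declared character set


def invalid_chars(text: str) -> list[str]:
    """List the characters of `text` that the declaration leaves UNUSED."""
    return [ch for ch in text if not is_document_char(ord(ch))]
```

The rows cover 0 to 65535 exactly once (9 + 2 + 2 + 1 + 18 + 95 + 1 +
32 + 65376 = 65536), so every UCS-2 value is either mapped or
explicitly UNUSED.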


7.3. Entity sets

7.3.1. ISO Latin 1 Character Entity Set

The following public text lists each of the characters specified in
the Added Latin 1 entity set, along with its name, syntax for use,
and description. This list is derived from ISO Standard
8879:1986//ENTITIES Added Latin 1//EN. HTML includes the entire
entity set, and adds entities for all the characters of the right
half of ISO-8859-1 (positions 160 to 255) that the set lacks.
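Because the character numbers 160 to 255 of ISO 10646 coincide with
the right half of ISO-8859-1, each entity below stands for the
character whose ISO-8859-1 octet equals its number. A Python sketch
can cross-check a transcription of the list against Python's
html.entities.name2codepoint table, which follows the later HTML 4
entity definitions; the function name and the sample entries are
illustrative only.

```python
from __future__ import annotations

from html.entities import name2codepoint


def check_latin1_entities(entities: dict[str, int]) -> list[str]:
    """Return the entity names whose numbers look wrong.

    A number is accepted if it lies in the right half of ISO-8859-1
    (160-255), where the character number and the ISO-8859-1 octet
    coincide, and if Python's table gives the same number for the name.
    """
    problems = []
    for name, number in entities.items():
        same_octet = (160 <= number <= 255
                      and chr(number).encode("iso-8859-1") == bytes([number]))
        if not same_octet or name2codepoint.get(name) != number:
            problems.append(name)
    return problems
```

For example, a transcription of {"nbsp": 160, "copy": 169, "Eacute":
201} passes, whereas one that gave "eacute" the number 201 would be
reported.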

<!-- (C) International Organization for Standardization 1986
Permission to copy in any form is granted for use with
conforming SGML systems and applications as defined in
ISO 8879, provided this notice is included in all copies.
-->
<!-- Character entity set. Typical invocation:
<!ENTITY % ISOlat1 PUBLIC
"ISO 8879-1986//ENTITIES Added Latin 1//EN//HTML">
%ISOlat1;
-->
<!ENTITY nbsp CDATA "&#160;" -- no-break space -->
<!ENTITY iexcl CDATA "&#161;" -- inverted exclamation mark -->
<!ENTITY cent CDATA "&#162;" -- cent sign -->
<!ENTITY pound CDATA "&#163;" -- pound sterling sign -->
<!ENTITY curren CDATA "&#164;" -- general currency sign -->
<!ENTITY yen CDATA "&#165;" -- yen sign -->
<!ENTITY brvbar CDATA "&#166;" -- broken (vertical) bar -->
<!ENTITY sect CDATA "&#167;" -- section sign -->
<!ENTITY uml CDATA "&#168;" -- umlaut (dieresis) -->
<!ENTITY copy CDATA "&#169;" -- copyright sign -->
<!ENTITY ordf CDATA "&#170;" -- ordinal indicator, feminine -->
<!ENTITY laquo CDATA "&#171;" -- angle quotation mark, left -->
<!ENTITY not CDATA "&#172;" -- not sign -->
<!ENTITY shy CDATA "&#173;" -- soft hyphen -->
<!ENTITY reg CDATA "&#174;" -- registered sign -->
<!ENTITY macr CDATA "&#175;" -- macron -->
<!ENTITY deg CDATA "&#176;" -- degree sign -->
<!ENTITY plusmn CDATA "&#177;" -- plus-or-minus sign -->
<!ENTITY sup2 CDATA "&#178;" -- superscript two -->
<!ENTITY sup3 CDATA "&#179;" -- superscript three -->
<!ENTITY acute CDATA "&#180;" -- acute accent -->
<!ENTITY micro CDATA "&#181;" -- micro sign -->
<!ENTITY para CDATA "&#182;" -- pilcrow (paragraph sign) -->
<!ENTITY middot CDATA "&#183;" -- middle dot -->
<!ENTITY cedil CDATA "&#184;" -- cedilla -->
<!ENTITY sup1 CDATA "&#185;" -- superscript one -->
<!ENTITY ordm CDATA "&#186;" -- ordinal indicator, masculine -->
<!ENTITY raquo CDATA "&#187;" -- angle quotation mark, right -->
<!ENTITY frac14 CDATA "&#188;" -- fraction one-quarter -->
<!ENTITY frac12 CDATA "&#189;" -- fraction one-half -->
<!ENTITY frac34 CDATA "&#190;" -- fraction three-quarters -->
<!ENTITY iquest CDATA "&#191;" -- inverted question mark -->
<!ENTITY Agrave CDATA "&#192;" -- capital A, grave accent -->
<!ENTITY Aacute CDATA "&#193;" -- capital A, acute accent -->
<!ENTITY Acirc CDATA "&#194;" -- capital A, circumflex accent -->
<!ENTITY Atilde CDATA "&#195;" -- capital A, tilde -->
<!ENTITY Auml CDATA "&#196;" -- capital A, dieresis or umlaut mark -->
<!ENTITY Aring CDATA "&#197;" -- capital A, ring -->
<!ENTITY AElig CDATA "&#198;" -- capital AE diphthong (ligature) -->
<!ENTITY Ccedil CDATA "&#199;" -- capital C, cedilla -->
<!ENTITY Egrave CDATA "&#200;" -- capital E, grave accent -->
<!ENTITY Eacute CDATA "&#201;" -- capital E, acute accent -->
<!ENTITY Ecirc CDATA "&#202;" -- capital E, circumflex accent -->
<!ENTITY Euml CDATA "&#203;" -- capital E, dieresis or umlaut mark -->
<!ENTITY Igrave CDATA "&#204;" -- capital I, grave accent -->
<!ENTITY Iacute CDATA "&#205;" -- capital I, acute accent -->
<!ENTITY Icirc CDATA "&#206;" -- capital I, circumflex accent -->
<!ENTITY Iuml CDATA "&#207;" -- capital I, dieresis or umlaut mark -->
<!ENTITY ETH CDATA "&#208;" -- capital Eth, Icelandic -->
<!ENTITY Ntilde CDATA "&#209;" -- capital N, tilde -->
<!ENTITY Ograve CDATA "&#210;" -- capital O, grave accent -->
<!ENTITY Oacute CDATA "&#211;" -- capital O, acute accent -->
<!ENTITY Ocirc CDATA "&#212;" -- capital O, circumflex accent -->
<!ENTITY Otilde CDATA "&#213;" -- capital O, tilde -->
<!ENTITY Ouml CDATA "&#214;" -- capital O, dieresis or umlaut mark -->
<!ENTITY times CDATA "&#215;" -- multiply sign -->
<!ENTITY Oslash CDATA "&#216;" -- capital O, slash -->
<!ENTITY Ugrave CDATA "&#217;" -- capital U, grave accent -->
<!ENTITY Uacute CDATA "&#218;" -- capital U, acute accent -->
<!ENTITY Ucirc CDATA "&#219;" -- capital U, circumflex accent -->
<!ENTITY Uuml CDATA "&#220;" -- capital U, dieresis or umlaut mark -->
<!ENTITY Yacute CDATA "&#221;" -- capital Y, acute accent -->
<!ENTITY THORN CDATA "&#222;" -- capital Thorn, Icelandic -->
<!ENTITY szlig CDATA "&#223;" -- small sharp s, German (sz ligature) -->
<!ENTITY agrave CDATA "&#224;" -- small a, grave accent -->
<!ENTITY aacute CDATA "&#225;" -- small a, acute accent -->
<!ENTITY acirc CDATA "&#226;" -- small a, circumflex accent -->
<!ENTITY atilde CDATA "&#227;" -- small a, tilde -->
<!ENTITY auml CDATA "&#228;" -- small a, dieresis or umlaut mark -->
<!ENTITY aring CDATA "&#229;" -- small a, ring -->
<!ENTITY aelig CDATA "&#230;" -- small ae diphthong (ligature) -->
<!ENTITY ccedil CDATA "&#231;" -- small c, cedilla -->
<!ENTITY egrave CDATA "&#232;" -- small e, grave accent -->
<!ENTITY eacute CDATA "&#233;" -- small e, acute accent -->
<!ENTITY ecirc CDATA "&#234;" -- small e, circumflex accent -->
<!ENTITY euml CDATA "&#235;" -- small e, dieresis or umlaut mark -->
<!ENTITY igrave CDATA "&#236;" -- small i, grave accent -->
<!ENTITY iacute CDATA "&#237;" -- small i, acute accent -->
<!ENTITY icirc CDATA "&#238;" -- small i, circumflex accent -->
<!ENTITY iuml CDATA "&#239;" -- small i, dieresis or umlaut mark -->
<!ENTITY eth CDATA "&#240;" -- small eth, Icelandic -->
<!ENTITY ntilde CDATA "&#241;" -- small n, tilde -->
<!ENTITY ograve CDATA "&#242;" -- small o, grave accent -->
<!ENTITY oacute CDATA "&#243;" -- small o, acute accent -->
<!ENTITY ocirc CDATA "&#244;" -- small o, circumflex accent -->
<!ENTITY otilde CDATA "&#245;" -- small o, tilde -->
<!ENTITY ouml CDATA "&#246;" -- small o, dieresis or umlaut mark -->
<!ENTITY divide CDATA "&#247;" -- divide sign -->
<!ENTITY oslash CDATA "&#248;" -- small o, slash -->
<!ENTITY ugrave CDATA "&#249;" -- small u, grave accent -->
<!ENTITY uacute CDATA "&#250;" -- small u, acute accent -->
1841 <!ENTITY ucirc CDATA "&#251;" -- small u, circumflex accent -->
<!ENTITY uuml CDATA "&#252;" -- small u, dieresis or umlaut mark -->
<!ENTITY yacute CDATA "&#253;" -- small y, acute accent -->
<!ENTITY thorn CDATA "&#254;" -- small thorn, Icelandic -->
<!ENTITY yuml CDATA "&#255;" -- small y, dieresis or umlaut mark -->
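
Each declaration above binds an entity name to a decimal numeric
character reference, so that a reference such as &eacute; in document
text stands for the same character as &#233;. The following Python
fragment is a minimal illustrative sketch, not part of this
specification: it extracts name/number pairs from declarations written
exactly in the form used above and expands named and decimal references
in a string. The helper names parse_entities and expand_refs are
hypothetical. The sketch assumes that numeric references denote ISO
10646 code points (which coincide with ISO 8859-1 in the range 192-255
covered here) and that every reference is terminated by ";"; it ignores
other SGML rules, such as an omitted reference close delimiter.

```python
from __future__ import annotations

import re

# A few declarations copied verbatim from the entity set above.
DECLARATIONS = """
<!ENTITY Agrave CDATA "&#192;" -- capital A, grave accent -->
<!ENTITY szlig CDATA "&#223;" -- small sharp s, German (sz ligature) -->
<!ENTITY eacute CDATA "&#233;" -- small e, acute accent -->
<!ENTITY yuml CDATA "&#255;" -- small y, dieresis or umlaut mark -->
"""

# Matches declarations of the form: <!ENTITY name CDATA "&#NNN;"
DECL_RE = re.compile(r'<!ENTITY\s+(\w+)\s+CDATA\s+"&#(\d+);"')

# Matches a decimal reference (&#233;) or a named reference (&eacute;).
REF_RE = re.compile(r"&(?:#(\d+)|(\w+));")


def parse_entities(text: str) -> dict[str, int]:
    """Map each entity name (case-sensitive) to its decimal code point."""
    return {name: int(num) for name, num in DECL_RE.findall(text)}


def expand_refs(text: str, entities: dict[str, int]) -> str:
    """Replace character references with the characters they denote.

    Named references that are not in `entities` are left unchanged.
    """
    def repl(m: re.Match) -> str:
        if m.group(1) is not None:      # decimal reference: &#NNN;
            return chr(int(m.group(1)))
        name = m.group(2)               # named reference: &name;
        return chr(entities[name]) if name in entities else m.group(0)

    return REF_RE.sub(repl, text)


if __name__ == "__main__":
    ents = parse_entities(DECLARATIONS)
    print(ents)
    print(expand_refs("caf&eacute;, Stra&szlig;e, &#192; la carte, &nbsp;", ents))
```

Entity names are treated as case-sensitive, matching the separate
declarations for pairs such as Agrave and agrave. Python's standard
html.entities.name2codepoint table holds the same values for these
names and could serve instead of parsing the declarations.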

Bibliography

[BRYAN88] M. Bryan, "SGML -- An Author's Guide to the Standard
Generalized Markup Language", Addison-Wesley, Reading,
1988.

[ERCS] Extended Reference Concrete Syntax for SGML.
<http://www.sgmlopen.org/sgml/docs/ercs/ercs-home.html>

[ETHNO] "Ethnologue, Languages of the World", 12th Edition,
Barbara F. Grimes editor, Summer Institute of Linguis-
tics, Dallas, 1992.

[FILE-UPLOAD] E. Nebel and L. Masinter, "Form-based File Upload in
HTML", Work in progress (draft-ietf-html-fileupload-03.txt),
Xerox Corporation, August 1995.

[GOLD90] C. F. Goldfarb, "The SGML Handbook", Y. Rubinsky, Ed.,
Oxford University Press, 1990.

[HTML-2] T. Berners-Lee and D. Connolly, "Hypertext Markup Lan-
guage - 2.0", Work in progress (draft-ietf-html-spec-05.txt),
MIT/W3C, August 1995.

[HTTP] T. Berners-Lee, R. T. Fielding, and H. Frystyk
Nielsen, "Hypertext Transfer Protocol - HTTP/1.0",
Work in progress (draft-ietf-http-v10-spec-00.ps),
MIT, UC Irvine, CERN, March 1995.

[ISO-639] ISO 639:1988. Codes pour la représentation des noms de
langue. Technical content in
<http://www.sil.org/sgml/iso639a.html>

[ISO-1000] ISO 1000:1992. Unités SI et recommandations pour
l'emploi de leurs multiples et de certaines autres
unités.

[ISO-3166] ISO 3166:1993. Codes pour la représentation des noms
de pays.

[ISO-4217] ISO 4217:1990. Codes pour la représentation des
monnaies et types des fonds.

[ISO-8601] ISO 8601:1988. Éléments de données et formats
d'échange -- Échange d'information -- Représentation
de la date et de l'heure.

[ISO-8859-1] ISO 8859-1:1987. International Standard -- Informa-
tion Processing -- 8-bit Single-Byte Coded Graphic
Character Sets -- Part 1: Latin Alphabet No. 1.

[ISO-8879] ISO 8879:1986. International Standard -- Information
Processing -- Text and Office Systems -- Standard Gen-
eralized Markup Language (SGML).

[ISO-10646] ISO/IEC 10646-1:1993. International Standard -- Infor-
mation technology -- Universal Multiple-Octet Coded
Character Set (UCS) -- Part 1: Architecture and Basic
Multilingual Plane.

[NICOL] G. T. Nicol, "The Multilingual World Wide Web", Elec-
tronic Book Technologies, 1995,
<http://www.ebt.com/docs/multling.html>

[RFC1468] J. Murai, M. Crispin, and E. van der Poel, "Japanese
Character Encoding for Internet Messages", RFC 1468,
Keio University, Panda Programming, June 1993.

[RFC1521] N. Borenstein and N. Freed, "MIME (Multipurpose Inter-
net Mail Extensions) Part One: Mechanisms for Specify-
ing and Describing the Format of Internet Message Bod-
ies", RFC 1521, Bellcore, Innosoft, September 1993.

[RFC1590] J. Postel, "Media Type Registration Procedure", RFC
1590, USC/ISI, March 1994.

[RFC1738] T. Berners-Lee, L. Masinter, and M. McCahill, "Uniform
Resource Locators (URL)", RFC 1738, CERN, Xerox PARC,
University of Minnesota, October 1994.

[RFC1766] H. Alvestrand, "Tags for the Identification of Lan-
guages", RFC 1766, UNINETT, March 1995.

[SQ91] SoftQuad, "The SGML Primer", 3rd ed., SoftQuad Inc.,
1991.

[TAKADA] Toshihiro Takada, "Multilingual Information Exchange
through the World-Wide Web", Computer Networks and
ISDN Systems, Vol. 27, No. 2, Nov. 1994, pp. 235-241.

[TEI] TEI Guidelines for Electronic Text Encoding and Inter-
change. <http://etext.virginia.edu/TEI.html>

[UNICODE] The Unicode Consortium, "The Unicode Standard --
Worldwide Character Encoding -- Version 1.0",
Addison-Wesley, Volume 1, 1991, Volume 2, 1992. The BIDI
algorithm is in appendix A of volume 1, with corrections
in appendix D of volume 2.

[VANH90] E. van Herwijnen, "Practical SGML", Kluwer Academic
Publishers Group, Norwell and Dordrecht, 1990.

Authors' Addresses

François Yergeau
Alis Technologies
3410, rue Griffith
Montréal QC H4T 1A7
Canada

Tel: +1 (514) 738-9171
Fax: +1 (514) 342-0318
EMail: yergeau@alis.ca


Gavin Thomas Nicol
Electronic Book Technologies, Japan
1-29-9 Tsurumaki,
Setagaya-ku,
Tokyo
Japan

Tel + Fax: +81-3-3706-7351
EMail: gtn@ebt.com, gtn@twics.co.jp


Glenn Adams
Stonehand
118 Magazine Street
Cambridge, MA 02139
U.S.A.

Tel: +1 (617) 864-5524
Fax: +1 (617) 864-4965
EMail: glenn@stonehand.com


Martin J. Duerst
Multimedia-Laboratory
Department of Computer Science
University of Zurich
Winterthurerstrasse 190
CH-8057 Zurich
Switzerland

Tel: +41 1 257 43 16
Fax: +41 1 363 00 35
EMail: mduerst@ifi.unizh.ch