Network Working Group                                         F. Yergeau
Internet Draft                                                  G. Nicol
<draft-ietf-html-i18n-04.txt>                                   G. Adams
Expires 2 December 1996                                        M. Duerst
                                                             27 May 1996


          Internationalization of the Hypertext Markup Language


Status of this Memo

   This document is an Internet-Draft. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups. Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months. Internet-Drafts may be updated, replaced, or obsoleted by
   other documents at any time. It is not appropriate to use
   Internet-Drafts as reference material or to cite them other than as
   a "working draft" or "work in progress".

   To learn the current status of any Internet-Draft, please check the
   1id-abstracts.txt listing contained in the Internet-Drafts Shadow
   Directories on ds.internic.net (US East Coast), nic.nordu.net
   (Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
   Rim).

   Distribution of this document is unlimited. Please send comments to
   the HTML working group (HTML-WG) of the Internet Engineering Task
   Force (IETF) at <html-wg@w3.org>. Subscription address is
   <html-wg-request@w3.org>. Discussions of the group are archived at
   <URL:http://www.acl.lanl.gov/HTML_WG/archives.html>.


Abstract

   The Hypertext Markup Language (HTML) is a simple markup language used
   to create hypertext documents that are platform independent.
   Initially, the application of HTML on the World Wide Web was
   seriously restricted by its reliance on the ISO-8859-1 coded
   character set, which is appropriate only for Western European
   languages. Despite this restriction, HTML has been widely used with
   other languages, using other coded character sets or character
   encodings, at the expense of interoperability.

   This document is meant to address the issue of the
   internationalization (i18n, i followed by 18 letters followed by n)
   of HTML by extending the specification of HTML and giving additional
   recommendations for proper internationalization support. A foremost
   consideration is to make sure that HTML remains a valid application
   of SGML, while enabling its use in all languages of the world.



Expires 2 December 1996                                         [Page 1]

Internet Draft           HTML internationalization           27 May 1996


Table of contents

   1. Introduction
      1.1. Scope
      1.2. Conformance
   2. The document character set
      2.1. Reference processing model
      2.2. The document character set
      2.3. Undisplayable characters
   3. The LANG attribute
   4. Additional entities, attributes and elements
      4.1. Full Latin-1 entity set
      4.2. Markup for language-dependent presentation
   5. Forms
      5.1. DTD additions
      5.2. Form submission
   6. Miscellaneous
   7. HTML public text
      7.1. HTML DTD
      7.2. SGML declaration for HTML
      7.3. ISO Latin 1 character entity set
   Bibliography
   Authors' Addresses


1. Introduction

   The Hypertext Markup Language (HTML) is a simple markup language used
   to create hypertext documents that are platform independent.
   Initially, the application of HTML on the World Wide Web was
   seriously restricted by its reliance on the ISO-8859-1 coded
   character set, which is appropriate only for Western European
   languages. Despite this restriction, HTML has been widely used with
   other languages, using other coded character sets or character
   encodings, through various ad hoc extensions to the language
   [TAKADA].

   This document is meant to address the issue of the
   internationalization of HTML by extending the specification of HTML
   and giving additional recommendations for proper internationalization
   support. It is in good part based on a paper by one of the authors
   on multilingualism on the WWW [NICOL]. A foremost consideration is to
   make sure that HTML remains a valid application of SGML, while
   enabling its use in all languages of the world.

   The specific issues addressed are the SGML document character set to
   be used for HTML, the proper treatment of the charset parameter
   associated with the "text/html" content type and the specification
   of some additional elements and entities.


1.1. Scope

   HTML has been in use by the World-Wide Web (WWW) global information
   initiative since 1990. This specification extends the capabilities
   of HTML 2.0 (RFC 1866), primarily by removing the restriction to the
   ISO-8859-1 coded character set [ISO-8859-1].

   HTML is an application of ISO Standard 8879:1986, Information
   Processing -- Text and Office Systems -- Standard Generalized Markup
   Language (SGML) [ISO-8879]. The HTML Document Type Definition (DTD)
   is a formal definition of the HTML syntax in terms of SGML. This
   specification amends the DTD of HTML in order to make it applicable
   to documents encompassing a character repertoire much larger than
   that of ISO-8859-1, while still remaining SGML conformant.

   Both the formal standardization and the actual development of HTML
   are advancing very fast. The features described in this document
   are designed so that they can (and should) be added to other forms
   of HTML besides that described in RFC 1866. Where indicated,
   attributes introduced here should be extended to the appropriate
   elements.


1.2. Conformance

   This specification slightly changes the conformance requirements of
   HTML documents and HTML user agents.

1.2.1. Documents

   All HTML 2.0 conforming documents remain conforming with this
   specification. However, the extensions introduced here make valid
   certain documents that would not be HTML 2.0 conforming, in
   particular those containing characters or character references
   outside of the repertoire of ISO 8859-1, and those containing markup
   introduced herein.

1.2.2. User agents

   In addition to the requirements of RFC 1866, the following
   requirements are placed on HTML user agents.

   To ensure interoperability and proper support for at least
   ISO-8859-1 in an environment where character encoding schemes other
   than ISO-8859-1 are present, user agents must correctly interpret
   the charset parameter accompanying an HTML document received from
   the network.

   Furthermore, conforming user agents are required to at least
   correctly parse all numeric character references within the range
   of ISO 10646-1 [ISO-10646].

   Conforming user agents are required to apply the bidirectional
   (BIDI) presentation algorithm (see section 4.2.4) if they display
   right-to-left characters. If there is no displayable right-to-left
   character in a document, there is no need to apply BIDI processing.

2. The document character set

2.1. Reference processing model

   This overview explains a reference processing model used for HTML,
   and in particular the SGML concept of a document character set. An
   actual implementation may differ widely in its internal workings
   from the model given below, but should behave as described to an
   outside observer.

   Because there are various widely differing encodings of text, SGML
   does not directly address the question of how characters are
   encoded, e.g. in a file. SGML views the characters as a single set
   (called a "character repertoire"), together with a "code set" that
   assigns an integer (known as a "character number") to each character
   in the repertoire. The document character set declaration defines
   what each of the character numbers represents [GOLD90, p. 451]. In
   most cases, an SGML DTD and all documents that refer to it have a
   single document character set, and all markup and data characters
   are part of this set.

   HTML, as an application of SGML, does not directly address the
   question of how characters are encoded as octets in external
   representations such as files. This is deferred to mechanisms
   external to HTML, such as MIME as used by the HTTP protocol or by
   electronic mail.

   For the HTTP protocol [RFC1945], the way characters are encoded is
   defined by the "charset" parameter[1] of the "Content-Type" field of
   the header of an HTTP response. For example, to indicate that the
   transmitted document is encoded in the "JIS" encoding of Japanese
   [RFC1468], the header will contain the following line:

      Content-Type: text/html; charset=ISO-2022-JP

   The HTTP protocol also defines a mechanism for the client to specify
   the character encodings it can accept. Clients and servers are
   strongly encouraged to use these mechanisms to ensure correct
   transmission and interpretation of any document. Provisions that can
   be taken to help correct interpretation, even in cases where a
   server or client does not yet use these mechanisms, are described in
   section 6.

   Similarly, if HTML documents are transferred by electronic mail, the
   character encoding is defined by the "charset" parameter of the
   "Content-Type" MIME header line [RFC1521], and defaults to US-ASCII
   in its absence.
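
   As an illustration of how a decoder might obtain the encoding from
   MIME labelling, the Python sketch below reads the charset parameter
   of a Content-Type value with the standard library's
   email.message.Message class and decodes the body accordingly. It is
   not part of this specification; the function name decode_html_body
   is hypothetical, and the default charset is left to the caller
   because it depends on the transport: US-ASCII for electronic mail as
   stated above, while RFC 1945 gives ISO-8859-1 as the default for
   "text" media types received via HTTP.

```python
from email.message import Message


def decode_html_body(body: bytes, content_type: str, default_charset: str) -> str:
    """Decode an HTML body according to the charset parameter of a
    Content-Type value, using default_charset when the parameter is absent."""
    header = Message()
    header["Content-Type"] = content_type
    # get_content_charset() returns the (unquoted) charset parameter in
    # lower case, or failobj when the header carries no charset parameter.
    charset = header.get_content_charset(failobj=default_charset)
    # An unknown charset name makes bytes.decode() raise LookupError.
    return body.decode(charset)
```

   For example, decode_html_body(data, "text/html; charset=ISO-2022-JP",
   "us-ascii") decodes JIS-encoded octets into UCS characters, after
   which the rest of the processing no longer depends on the external
   encoding.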

   If other ways of transferring and storing HTML documents are defined
   or become popular, it is advised that similar provisions be made to
   clearly identify the character encoding used and/or to use a single
   (or default) encoding capable of representing the widest range of
   characters used in an international context.

   Whatever the external character encoding may be, the reference
   processing model translates it to a representation of the document
   character set specified in Section 2.2 before processing specific to
   SGML/HTML. The reference processing model can be depicted as
   follows:

   [resource]->[decoder]->[entity ]->[ SGML ]->[application]->[display]
                          [manager]  [parser]
                              ^          |
                              |          |
                              +----------+

   The decoder is responsible for decoding the external representation
   of the resource to a representation using the document character
   set. The entity manager, the parser, and the application deal only
   with characters of the document character set. A display-oriented
   part of the application or the display machinery itself may again
   convert characters represented in the document character set to some
   other representation more suitable for their purpose. In any case,
   the entity manager, the parser, and the application, as far as
   character semantics are concerned, use the HTML document character
   set only.

   -----------
   1 The term "charset" in MIME is used to designate a character
     encoding, rather than a coded character set as the term may
     suggest. A character encoding is a mapping (possibly many-to-one)
     of a sequence of octets to a sequence of characters taken from one
     or more character repertoires.

   An actual implementation may or may not choose to translate the
   document into some encoding of the document character set as
   described above; the behaviour described by this reference
   processing model can be achieved otherwise. This subject is well out
   of the scope of this specification, however, and the reader is
   invited to consult the SGML standard [ISO-8879] or an SGML handbook
   [BRYAN88] [GOLD90] [VANH90] [SQ91] for further information.

   The most important consequence of this reference processing model is
   that numeric character references are always resolved with respect
   to the fixed document character set, and thus to the same
   characters, whatever the external encoding actually used. For an
   example, see Section 2.2.
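
   The following Python sketch illustrates this consequence under
   simplifying assumptions: only decimal references terminated by ";"
   are recognized (SGML allows omitting the ";" in some contexts), and
   the helper names are hypothetical. The octet E9 (hexadecimal)
   denotes different characters in ISO-8859-1 and KOI8-R, but the
   reference &#233; denotes the UCS character 233, LATIN SMALL LETTER E
   WITH ACUTE, whichever of the two encodings the document arrives in.

```python
import re

# Decimal numeric character reference such as "&#1048;" (simplified:
# the reference close ";" is required here).
_NCR = re.compile(r"&#([0-9]+);")


def resolve_ncrs(text: str) -> str:
    """Replace each numeric character reference by the UCS character with
    that character number. Python's chr() rejects numbers above 0x10FFFF,
    so such references raise an exception in this sketch."""
    return _NCR.sub(lambda m: chr(int(m.group(1))), text)


def decode_then_resolve(body: bytes, charset: str) -> str:
    # Decoder step first (external encoding -> UCS), then references are
    # resolved against the document character set, never the encoding.
    return resolve_ncrs(body.decode(charset))
```

   Both decode_then_resolve(b"caf&#233;", "iso-8859-1") and
   decode_then_resolve(b"caf&#233;", "koi8_r") yield "caf" followed by
   U+00E9, while the raw octet E9 decodes to different characters under
   the two encodings.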

2.2. The document character set

   The document character set, in the SGML sense, is the Universal
   Character Set (UCS) of ISO 10646:1993 [ISO-10646], as amended.
   Currently, this is code-by-code identical with the Unicode standard,
   version 1.1 [UNICODE].

      NOTE -- implementers should be aware that ISO 10646 is amended
      from time to time; 4 amendments have been adopted since the
      initial 1993 publication, none of which significantly affects
      this specification. A fifth amendment (Draft Amendment 5, DAM 5),
      now under consideration, will introduce incompatible changes to
      the standard: 6656 Korean Hangul syllables allocated between code
      positions 3400 and 4DFF (hexadecimal) will be moved to new
      positions (and 4516 new syllables added), thus making references
      to the old positions invalid. Since the Unicode consortium has
      already adopted a corresponding amendment for inclusion in the
      forthcoming Unicode 2.0, adoption of DAM 5 is considered likely
      and implementers should probably consider the old code positions
      as already invalid. Despite this one-time change, the relevant
      standards bodies appear to remain committed never to change any
      allocated code position in the future. To encode Korean Hangul
      irrespective of these changes, the combining Hangul Jamo in the
      range 1100-11F9 can be used.

   The adoption of this document character set implies a change in the
   SGML declaration specified in the HTML 2.0 specification (section
   9.5 of [RFC1866]). The change amounts to removing the first BASESET
   specification and its accompanying DESCSET declaration, replacing
   them with the following declaration:

      BASESET "ISO Registration Number 177//CHARSET
               ISO/IEC 10646-1:1993 UCS-4 with implementation level 3
               //ESC 2/5 2/15 4/6"
      DESCSET   0          9  UNUSED
                9          2  9
               11          2  UNUSED
               13          1  13
               14         18  UNUSED
               32         95  32
              127          1  UNUSED
              128         32  UNUSED
              160 2147483486  160

   Making the UCS the document character set does not make
   non-conforming any expression, construct or document that conforms
   to HTML 2.0. It does make conforming certain constructs that are not
   admissible in HTML 2.0. One consequence is that data characters
   outside the repertoire of ISO-8859-1, but within that of UCS-4,
   become valid SGML characters. Another is that the upper limit of the
   range of numeric character references is extended from 255 to
   2147483645; thus, &#1048; is a valid reference to a
   "CYRILLIC CAPITAL LETTER I". [ERCS] is a good source of information
   on Unicode and SGML, although its scope and technical content differ
   greatly from this specification.

      NOTE -- the above SGML declaration, like that of HTML 2.0,
      specifies the character numbers 128 to 159 (80 to 9F hex) as
      UNUSED. This means that numeric character references within that
      range (e.g. &#146;) are illegal in HTML. Neither ISO 8859-1 nor
      ISO 10646 contains graphic characters in that range, which is
      reserved for control characters.
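
   The declaration can be read mechanically. The Python sketch below is
   an illustration only (the table layout and function name are not
   part of the declaration): it records each DESCSET row as a triple
   and tests whether a character number falls in a described range.
   The last row yields the upper limit of 2147483645 quoted above, and
   146 falls in the UNUSED range 128 to 159.

```python
from typing import List, Optional, Tuple

# DESCSET rows above: (first document character number, number of
# characters, base character number or None for UNUSED).
DESCSET: List[Tuple[int, int, Optional[int]]] = [
    (0, 9, None),
    (9, 2, 9),
    (11, 2, None),
    (13, 1, 13),
    (14, 18, None),
    (32, 95, 32),
    (127, 1, None),
    (128, 32, None),
    (160, 2147483486, 160),
]

# Highest character number described by the last row.
MAX_CHAR_NUMBER = DESCSET[-1][0] + DESCSET[-1][1] - 1


def is_valid_char_number(n: int) -> bool:
    """True if n is a character number that the declaration maps to a
    base character, i.e. one that a numeric reference may use."""
    for first, count, base in DESCSET:
        if first <= n < first + count:
            return base is not None
    # Negative numbers and numbers beyond the last row are not described.
    return False
```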

   ISO 10646-1:1993 is the most encompassing character set currently in
   existence, and there is no other character set that could take its
   place as the document character set for HTML. If a specific
   application nevertheless needs characters outside this standard,
   they should be assigned to a private use zone, so as to avoid any
   conflict with present or future versions of ISO 10646. It should
   also be borne in mind that such use will be highly unportable; in
   many cases, it may be better to use inline bitmaps.

2.3. Undisplayable characters

   With the document character set being the full ISO 10646, the
   possibility that a character cannot be displayed due to lack of
   appropriate resources (fonts) cannot be avoided. Because there are
   many different things that can be done in such a case, this document
   does not prescribe any specific behaviour. Depending on the
   implementation, this may also be handled by the underlying display
   system and not the application itself. The following considerations,
   however, may be of help:

   - A clearly visible, but unobtrusive behaviour should be preferred.
     Some documents may contain many characters that cannot be
     rendered, and so showing an alert for each of them is not the
     right thing to do.

   - In case a numeric representation of the missing character is
     given, its hexadecimal (not decimal) form is to be preferred,
     because this form is used in character set standards [ERCS].
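
   One possible behaviour along these lines is sketched below in
   Python. It is not prescribed by this document: the can_display
   predicate is a hypothetical stand-in for a platform-specific font
   lookup, and the bracketed four-digit hexadecimal placeholder is an
   arbitrary choice. The text is processed in a single pass, without
   alerts.

```python
from typing import Callable, List


def with_hex_fallback(text: str, can_display: Callable[[str], bool]) -> str:
    """Replace each character that cannot be displayed by its character
    number in hexadecimal, e.g. "[0418]" for CYRILLIC CAPITAL LETTER I."""
    parts: List[str] = []
    for ch in text:
        if can_display(ch):
            parts.append(ch)
        else:
            # At least four upper-case hex digits, as in character set tables.
            parts.append(f"[{ord(ch):04X}]")
    return "".join(parts)
```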

3. The LANG attribute

   Language tags can be used to control rendering of a marked up
   document in various ways: glyph disambiguation, in cases where the
   character encoding is not sufficient to resolve to a specific glyph;
   quotation marks; hyphenation; ligatures; spacing; voice synthesis;
   etc. Independently of rendering issues, language markup is useful as
   content markup for purposes such as classification and searching.

   Since any text can logically be assigned a language, almost all HTML
   elements admit the LANG attribute. The DTD reflects this. It is also
   intended that any new element introduced in later versions of HTML
   will admit the LANG attribute, unless there is a good reason not to
   do so.

   The language attribute, LANG, takes as its value a language tag that
   identifies a natural language spoken, written, or otherwise conveyed
   by human beings for communication of information to other human
   beings. Computer languages are explicitly excluded.

   The syntax and registry of HTML language tags are the same as those
   defined by RFC 1766 [RFC1766]. In summary, a language tag is
   composed of one or more parts: a primary language tag and a possibly
   empty series of subtags:

      language-tag = primary-tag *( "-" subtag )
      primary-tag  = 1*8ALPHA
      subtag       = 1*8ALPHA

   Whitespace is not allowed within the tag and all tags are
   case-insensitive. The namespace of language tags is administered by
   the IANA. Example tags include:

      en, en-US, en-cockney, i-cherokee, x-pig-latin
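
   The grammar above translates directly into a regular expression. The
   following Python sketch (the function name is hypothetical) checks
   tag syntax only; it does not consult the IANA registry.

```python
import re

# primary-tag *( "-" subtag ), each part 1 to 8 ASCII letters (RFC 1766).
_LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")


def is_language_tag(value: str) -> bool:
    # fullmatch() rejects leading, trailing or embedded whitespace.
    return _LANGUAGE_TAG.fullmatch(value) is not None
```

   Since tags are case-insensitive, an application comparing tags would
   normalize case first, for instance with str.lower().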

   In the context of HTML, a language tag is not to be interpreted as a
   single token, as per RFC 1766, but as a hierarchy. For example, a
   user agent that adjusts rendering according to language should
   consider that it has a match when a language tag in a style sheet
   entry matches the initial portion of the language tag of an element.
   An exact match should be preferred. This interpretation allows an
   element marked up as, for instance, "en-US" to trigger styles
   corresponding to, in order of preference, US-English ("en-US") or
   'plain' or 'international' English ("en").

      NOTE -- using the language tag as a hierarchy does not imply that
      all languages with a common prefix will be understood by those
      fluent in one or more of those languages; it simply allows the
      user to request this commonality when it is true for that user.
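
   A minimal Python sketch of this matching rule follows. It assumes
   that the "initial portion" is taken on subtag boundaries, so that
   "en" matches "en-US" but not a tag such as "eng"; the function names
   are illustrative only.

```python
from typing import Iterable, List, Optional


def _subtags(tag: str) -> List[str]:
    # Language tags are case-insensitive: compare lower-cased subtags.
    return tag.lower().split("-")


def tag_matches(style_tag: str, element_tag: str) -> bool:
    """True if style_tag equals element_tag or is a prefix of it ending
    on a subtag boundary."""
    style, element = _subtags(style_tag), _subtags(element_tag)
    return element[: len(style)] == style


def best_match(element_tag: str, style_tags: Iterable[str]) -> Optional[str]:
    """Return the matching style tag with the most subtags, so that an
    exact match is preferred over a shorter prefix; None if none match."""
    candidates = [tag for tag in style_tags if tag_matches(tag, element_tag)]
    if not candidates:
        return None
    return max(candidates, key=lambda tag: len(_subtags(tag)))
```

   With style entries for "en" and "en-US", an element tagged "en-US"
   selects "en-US"; if only "en" is present, it falls back to "en".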

   The rendering of elements may be affected by the LANG attribute. For
   any element, the value of the LANG attribute overrides the value
   specified by the LANG attribute of any enclosing element and the
   value (if any) of the HTTP Content-Language header. If none of these
   are set, a suitable default, perhaps controlled by user preferences,
   by automatic context analysis or by the user's locale, should be used
   to control rendering.
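
   This order of precedence can be sketched in Python as follows. The
   Element class is a hypothetical stand-in for a parsed document tree,
   and the fallback argument represents whatever default the user agent
   derives from user preferences, context analysis or locale.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    """Hypothetical minimal element: a name, its LANG value, its parent."""
    name: str
    lang: Optional[str] = None
    parent: Optional["Element"] = None


def effective_language(element: Element,
                       content_language: Optional[str],
                       fallback: str) -> str:
    """LANG of the element or its nearest ancestor that has one, then the
    HTTP Content-Language value, then the user agent's fallback."""
    node: Optional[Element] = element
    while node is not None:
        if node.lang:
            return node.lang
        node = node.parent
    if content_language:
        return content_language
    return fallback
```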

4. Additional entities, attributes and elements

4.1. Full Latin-1 entity set

   According to the suggestion of section 14 of [RFC1866], the set of
   Latin-1 entities is extended to cover the whole right part of
   ISO-8859-1 (all code positions with the high-order bit set),
   including the already commonly used &nbsp;, &copy; and &reg;. The
   names of the entities are taken from the appendices of SGML
   [ISO-8879]. A list is provided in section 7.3 of this specification.

4.2. Markup for language-dependent presentation

4.2.1. Overview

   For the correct presentation of text in certain languages
   (irrespective of formatting issues), some support in the form of
   additional entities and elements is needed.

   In particular, the following features are dealt with:

   - Markup of bidirectional text, i.e. text where left-to-right and
     right-to-left scripts are mixed.

   - Control of cursive joining behaviour in contexts where the default
     behaviour is not appropriate.

   - Language-dependent rendering of short (in-line) quotations.

   - Better justification control for languages where this is
     important.

   - Superscripts and subscripts for languages where they appear as
     part of general text.

   Some of the above features need very little additional support;
   others need more. The additional features are introduced below with
   brief comments only. Explanations on cursive joining behaviour and
   bidirectional text follow later. For cursive joining behaviour and
   bidirectional text, this document follows [UNICODE] in that: i)
   character semantics, where applicable, are identical to [UNICODE],
   and ii) where functionality is moved to HTML as a higher level
   protocol, this is done in a way that allows straightforward
   conversion to the lower-level mechanisms defined in [UNICODE].


4.2.2. List of entities, elements, and attributes

   First, a generic container is needed to carry the LANG and DIR (see
   below) attributes in cases where no other element is appropriate;
   the SPAN element is introduced for that purpose.

   A set of named character entities is added for use with
   bidirectional rendering and cursive joining control:

      <!ENTITY zwnj CDATA "&#8204;"--=zero width non-joiner-->
      <!ENTITY zwj  CDATA "&#8205;"--=zero width joiner-->
      <!ENTITY lrm  CDATA "&#8206;"--=left-to-right mark-->
      <!ENTITY rlm  CDATA "&#8207;"--=right-to-left mark-->

   These entities can be used in place of the corresponding formatting
   characters whenever convenient, for example to ease keyboard entry
   or when a formatting character is not available in the character
   encoding of the document.
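
   The Python sketch below expands these four entities into the
   formatting characters they stand for and leaves any other entity
   reference alone. The dictionary and function names are illustrative;
   a real SGML parser would resolve every entity declared in the DTD.

```python
import re

# The four entities declared above, mapped to their character numbers.
FORMATTING_ENTITIES = {
    "zwnj": 8204,  # U+200C ZERO WIDTH NON-JOINER
    "zwj": 8205,   # U+200D ZERO WIDTH JOINER
    "lrm": 8206,   # U+200E LEFT-TO-RIGHT MARK
    "rlm": 8207,   # U+200F RIGHT-TO-LEFT MARK
}

# Named entity reference with a NAME-token-like name, e.g. "&zwj;".
_ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")


def expand_formatting_entities(text: str) -> str:
    """Replace &zwnj; &zwj; &lrm; and &rlm; by their characters; other
    entity references (e.g. &copy;) are returned unchanged."""
    def replace(match):
        number = FORMATTING_ENTITIES.get(match.group(1))
        return match.group(0) if number is None else chr(number)
    return _ENTITY_REF.sub(replace, text)
```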

   Next, an attribute called DIR is introduced, restricted to the values
   LTR (left-to-right) and RTL (right-to-left) and admitted by most
   elements, for the indication of directionality in the context of
   bidirectional text (see 4.2.4 below for details). Since any text and
   many other elements (e.g. tables) can logically be assigned a
   directionality, almost all HTML elements admit the DIR attribute.
   The DTD reflects this. It is also intended that any new element
   introduced in later versions of HTML will admit the DIR attribute,
   unless there is a good reason not to do so.

   A new element called BDO (BIDI Override) is introduced, which
   requires the DIR attribute to specify whether the override is
   left-to-right or right-to-left. This element is required for
   bidirectional text control; for detailed explanations, see section
   4.2.4.

   The <Q> element is introduced to allow rendering of short quotations
   that depends on language and platform capability. As the following
   examples show, this affects in particular the quotation marks
   surrounding the quotation:

      "a quotation in English"
      `another, slightly better one'
      ,,a quotation in German''
      << a quotation in French >>

   The contents of the <Q> element do not include quotation marks;
   these have to be added by the rendering process.

      NOTE -- <Q> elements can be nested. Many languages use different
      quotation styles for outer and inner quotations, and this should
      be respected by user agents implementing this element.
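
   A possible rendering of nested <Q> elements is sketched below in
   Python. The quotation mark table is an illustrative assumption,
   limited to English, German and French primary tags, with an arbitrary
   choice of inner marks for French and an arbitrary English fallback;
   user agents and style sheets may make other choices. The sketch
   alternates between outer and inner marks by nesting depth, and a
   LANG attribute on a <Q> element overrides the inherited language.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union

# (outer pair, inner pair) per primary language subtag; illustrative only.
QUOTE_MARKS: Dict[str, List[Tuple[str, str]]] = {
    "en": [("\u201c", "\u201d"), ("\u2018", "\u2019")],
    "de": [("\u201e", "\u201c"), ("\u201a", "\u2018")],
    "fr": [("\u00ab\u00a0", "\u00a0\u00bb"), ("\u201c", "\u201d")],
}


@dataclass
class Q:
    """Hypothetical <Q> node: children are text or nested Q elements."""
    children: List[Union[str, "Q"]]
    lang: Optional[str] = None  # LANG attribute, if present


def render_q(node: Union[str, Q], lang: str, depth: int = 0) -> str:
    if isinstance(node, str):
        return node
    lang = node.lang or lang  # LANG on the Q overrides the inherited value
    primary = lang.split("-")[0].lower()
    pairs = QUOTE_MARKS.get(primary, QUOTE_MARKS["en"])
    opening, closing = pairs[depth % len(pairs)]  # outer, inner, outer...
    inner = "".join(render_q(child, lang, depth + 1) for child in node.children)
    return opening + inner + closing
```

   The marks are generated, never taken from the element content, which
   matches the requirement that the contents of <Q> exclude them.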

   Many languages require superscripts for proper rendering: as an
   example, the French "Mlle Dupont" should have "lle" in superscript.
   The <SUP> element, and its sibling <SUB>, are introduced to allow
   proper markup of such text. <SUP> and <SUB> contents are restricted
   to PCDATA to avoid nesting problems.

   Finally, in many languages text justification is much more important
   than it is in Western languages, and warrants markup. The ALIGN
   attribute, admitting values of LEFT, RIGHT, CENTER and JUSTIFY, is
   added to a selection of elements where it makes sense (block-like).
   If a user agent chooses to have LEFT as a default for blocks of
   left-to-right directionality, it should use RIGHT for blocks of
   right-to-left directionality.

   In the DTD, the LANG and DIR attributes are grouped together in a
   parameter entity called attrs. In addition, the ID and CLASS
   attributes from RFC 1942 [RFC1942] were added to attrs, as was done
   in the latter. The ID and CLASS attributes are required for use with
   style sheets, and RFC 1942 defines them as follows:

   ID     Used to define a document-wide identifier. This can be used
          for naming positions within documents as the destination of a
          hypertext link. It may also be used by style sheets for
          rendering an element in a unique style. An ID attribute value
          is an SGML NAME token. NAME tokens are formed by an initial
          letter followed by letters, digits, "-" and "." characters.
          The letters are restricted to A-Z and a-z.

   CLASS  A space separated list of SGML NAME tokens. CLASS names
          specify that the element belongs to the corresponding named
          classes. It allows authors to distinguish different roles
          played by the same tag. The classes may be used by style
          sheets to provide different renderings as appropriate to
          these roles.
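
   The NAME token rule and the CLASS list format can be checked with a
   short Python sketch; the function names are hypothetical, and only
   the characters listed above are accepted.

```python
import re
from typing import List

# Initial ASCII letter, then ASCII letters, digits, "-" and ".".
_NAME_TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9.-]*")


def is_name_token(value: str) -> bool:
    return _NAME_TOKEN.fullmatch(value) is not None


def parse_class(value: str) -> List[str]:
    """Split a CLASS attribute value on whitespace and validate each name."""
    names = value.split()
    invalid = [name for name in names if not is_name_token(name)]
    if invalid:
        raise ValueError(f"not SGML NAME tokens: {invalid}")
    return names
```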

4.2.3. Cursive joining behaviour

   Markup is needed in some cases to force cursive joining behaviour in
   contexts in which it would not normally occur, or to block it when
   it would normally occur.

   The zero-width joiner and non-joiner (&zwj; and &zwnj;) are used to
   control cursive joining behaviour. For example, ARABIC LETTER HEH is
   used in isolation to abbreviate "Hijri" (the Islamic calendrical
   system); however, the initial form of the letter is desired, because
   the isolated form of HEH looks like the digit five as employed in
   Arabic script. This is obtained by following the HEH with a
   zero-width joiner whose only effect is to provide context. In
   Persian texts, there are cases where a letter that normally would
   join a subsequent letter in a cursive connection does not. Here a
   zero-width non-joiner is used.
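
   For illustration, the Python sketch below builds the "Hijri"
   abbreviation as character data and confirms the character names with
   the standard unicodedata module. The constant and function names are
   hypothetical; in HTML source the same abbreviation could be written
   &#1607;&zwj;, 1607 being the character number of ARABIC LETTER HEH.

```python
import unicodedata
from typing import List

HEH = "\u0647"   # ARABIC LETTER HEH (character number 1607)
ZWJ = "\u200d"   # ZERO WIDTH JOINER, &zwj; in HTML
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER, &zwnj; in HTML

# HEH followed by ZWJ: the joiner supplies context so that a shaping
# engine selects the initial form of HEH rather than the isolated form.
HIJRI_ABBREVIATION = HEH + ZWJ


def break_join(left: str, right: str) -> str:
    """Put ZWNJ between two pieces of text to block a cursive connection
    across the boundary, as in the Persian case described above."""
    return left + ZWNJ + right


def describe(text: str) -> List[str]:
    # Character names make otherwise invisible formatting characters visible.
    return [unicodedata.name(ch) for ch in text]
```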

4.2.4. Bidirectional text

   Many languages are written in horizontal lines from left to right,
   while others are written from right to left. When both writing
   directions are present, one talks of bidirectional text (BIDI for
   short). BIDI text requires markup in special circumstances where
   ambiguities as to the directionality of some characters have to be
   resolved. This markup affects the ability to render BIDI text in a
   semantically legible fashion. That is, without this special BIDI
   markup, cases arise which would prevent *any* rendering whatsoever
   that reflects the basic meaning of the text. Plain text may contain
   this markup (joining or BIDI) in the form of special-purpose
   characters; in HTML, these are supplemented by SGML markup.

   BIDI is a complex issue, and implementers are advised to consult
   appropriate documentation such as [UNICODE]. Here, explanations are
   given only as far as they are needed to understand the necessity of
   the features introduced and to define their exact semantics.

   The Unicode BIDI algorithm is based on a logical sequence of text
   characters and works mainly by reference to the implicit
   directionality of characters (e.g. Hebrew and Arabic characters are
   specified to be rendered from right to left, etc.).

   The left-to-right and right-to-left marks (&lrm; and &rlm;) are used
   to disambiguate the directionality of neutral characters. For
   example, when a double quote sits between an Arabic and a Latin
   letter, its direction is ambiguous; if a directional mark is added
   on one side such that the quotation mark is surrounded by characters
   of only one directionality, the ambiguity is removed. These
   characters are like zero-width spaces that have a directional
   property (but no word or line break property).
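
   The effect of a directional mark can be illustrated with a
   deliberately simplified Python check. It is not the Unicode BIDI
   algorithm, only a test of whether the nearest strongly directional
   characters on either side of a neutral character disagree.
   Bidirectional classes come from the standard function
   unicodedata.bidirectional; the helper names are hypothetical.

```python
import unicodedata
from typing import Optional

# Strong bidirectional classes: L (left-to-right), R and AL (right-to-left).
_STRONG = {"L": "ltr", "R": "rtl", "AL": "rtl"}


def _nearest_strong(text: str, index: int, step: int) -> Optional[str]:
    """Direction of the nearest strong character from index, scanning
    left (step -1) or right (step +1); None if there is none."""
    i = index + step
    while 0 <= i < len(text):
        direction = _STRONG.get(unicodedata.bidirectional(text[i]))
        if direction is not None:
            return direction
        i += step
    return None


def is_ambiguous(text: str, index: int) -> bool:
    """True if the strong neighbours of text[index] point different ways."""
    before = _nearest_strong(text, index, -1)
    after = _nearest_strong(text, index, +1)
    return before is not None and after is not None and before != after
```

   With an Arabic letter, a double quote and a Latin letter in that
   logical order, the quote is reported as ambiguous; inserting RLM
   right after the quote leaves it between two right-to-left
   characters, so the check no longer flags it.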

   Nested embeddings of contra-directional text runs, due to nested
   quotations or to the pasting of text from one BIDI context to
   another, are also a case where the implicit directionality of
   characters is not sufficient, requiring markup. Also, it is
   frequently desirable to specify the basic directionality of a block
   of text. For these purposes, the DIR attribute is used.

   On block-type elements, the DIR attribute indicates the base
   directionality of the text in the block; if omitted it is inherited
   from the parent element. The default directionality of the overall
   HTML document is left-to-right.

   On inline elements, it makes the element start a new embedding level
   (to be explained below); if omitted the inline element does not
   start a new embedding level.
708
709 NOTE -- the PRE, XMP and LISTING elements admit the DIR
710 attribute, indicating that the contents should not be con-
711 sidered as preformatted with respect to bidirectional lay-
712 out. The BIDI algorithm still needs to be applied to each
713 line of text.
714
715 Following is an example of a case where embedding is needed, showing
716 its effect:
717
Given the following Latin (upper case) and Arabic (lower
case) letters in backing store with the specified embed-
dings:
721
722 <SPAN DIR=LTR> AB <SPAN DIR=RTL> xy <SPAN DIR=LTR> CD
723 </SPAN> zw </SPAN> EF </SPAN>
724
732 One gets the following rendering (with [] showing the
733 directional transitions):
734
735 [ AB [ wz [ CD ] yx ] EF ]
736
737 On the other hand, without this markup and with a base
738 direction of LTR one gets the following rendering:
739
740 [ AB [ yx ] CD [ wz ] EF ]
741
742 Notice that yx is on the left and wz on the right unlike
743 the above case where the embedding levels are used. With-
744 out the embedding markup one has at most two levels: a base
745 directional level and a single counterflow directional
746 level.
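
The two renderings above can be reproduced with the reordering step that later editions of the Unicode Bidirectional Algorithm (UAX #9) number as rule L2: from the highest embedding level down to the lowest odd level, reverse every run of characters at that level or higher. In this Python sketch the levels are supplied by hand (spaces omitted), assuming that a left-to-right embedding pushes the next higher even level and a right-to-left embedding the next higher odd level; the function names are hypothetical.

```python
def next_level(current, direction):
    """Level pushed by an embedding: next odd level for 'rtl', next even for 'ltr'."""
    if direction == "rtl":
        return (current + 1) | 1
    return (current + 2) & ~1


def reorder(chars, levels):
    """From the highest level down to the lowest odd level, reverse every
    maximal run of characters whose level is at least that level."""
    chars, levels = list(chars), list(levels)
    lowest_odd = min(levels) | 1
    for level in range(max(levels), lowest_odd - 1, -1):
        i = 0
        while i < len(chars):
            if levels[i] < level:
                i += 1
                continue
            j = i
            while j < len(chars) and levels[j] >= level:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            levels[i:j] = levels[i:j][::-1]
            i = j
    return "".join(chars)


# With markup: base level 0, then LTR -> 2, RTL -> 3, LTR -> 4.
outer = next_level(0, "ltr")
middle = next_level(outer, "rtl")
inner = next_level(middle, "ltr")
embedded = [outer] * 2 + [middle] * 2 + [inner] * 2 + [middle] * 2 + [outer] * 2
print(reorder("ABxyCDzwEF", embedded))  # ABwzCDyxEF

# Without markup, base LTR: Latin letters at level 0, Arabic letters at level 1.
implicit = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0]
print(reorder("ABxyCDzwEF", implicit))  # AByxCDwzEF
```

With only levels 0 and 1 available, the Arabic runs can be reversed individually but never as a unit around CD, which is why the embedding markup is needed.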
747
748 The DIR attribute on inline elements is equivalent to the formatting
749 characters LEFT-TO-RIGHT EMBEDDING (202A) and RIGHT-TO-LEFT EMBED-
750 DING (202B) of ISO 10646. The end tag of the element is equivalent
751 to the POP DIRECTIONAL FORMATTING (202C) character.
752
753 Directional override, as provided by the <BDO> element, is needed to
754 deal with unusual short pieces of text in which directionality cannot
755 be resolved from context in an unambiguous fashion. For example, it
756 can be used to force left-to-right (or right-to-left) display of part
757 numbers composed of Latin letters, digits and Hebrew letters.
758
759 The effect of <BDO> is to force the directionality of all characters
760 within it to the value of DIR, irrespective of their intrinsic direc-
761 tional properties. It is equivalent to using the LEFT-TO-RIGHT OVER-
762 RIDE (202D) or RIGHT-TO-LEFT OVERRIDE (202E) characters of ISO 10646,
763 the end tag again being equivalent to the POP DIRECTIONAL FORMATTING
764 (202C) character.
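
The equivalences stated above can be illustrated by a converter from markup to ISO 10646 formatting characters. This Python sketch assumes a hypothetical minimal tree (a string, or a tuple of tag name, lower-case DIR value or None, and children); it is not an HTML parser. Inline elements with DIR become LRE/RLE ... PDF, BDO becomes LRO/RLO ... PDF, and elements without DIR contribute no formatting characters.

```python
LRE, RLE, PDF, LRO, RLO = "\u202a", "\u202b", "\u202c", "\u202d", "\u202e"


def to_controls(node):
    """Flatten a (tag, dir, children) tree into text with formatting characters."""
    if isinstance(node, str):
        return node
    tag, direction, children = node
    inner = "".join(to_controls(child) for child in children)
    if direction is None:
        return inner  # no DIR attribute: no new embedding level
    if tag == "BDO":
        start = LRO if direction == "ltr" else RLO  # directional override
    else:
        start = LRE if direction == "ltr" else RLE  # directional embedding
    return start + inner + PDF  # the end tag corresponds to PDF


tree = ("SPAN", "ltr", ["AB", ("SPAN", "rtl", ["xy", ("SPAN", "ltr", ["CD"]), "zw"]), "EF"])
print([hex(ord(c)) for c in to_controls(tree) if c in (LRE, RLE, PDF)])
# ['0x202a', '0x202b', '0x202a', '0x202c', '0x202c', '0x202c']
```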
765
766 NOTE -- authors and authoring software writers should be
767 aware that conflicts can arise if the DIR attribute is used
768 on inline elements (including <BDO>) concurrently with the
769 use of the corresponding ISO 10646 formatting characters.
Preferably one or the other should be used exclusively; the
markup method is better able to guarantee document structural
integrity, and alleviates some problems when editing
bidirectional HTML text with a simple text editor, but some
software may be better suited to using the 10646 characters.
If both methods are used, great care should be exercised to
ensure proper nesting of markup and directional embedding
or override; otherwise, rendering results are undefined.
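
When both methods are mixed, a simple check that every embedding or override character is closed by a POP DIRECTIONAL FORMATTING character, and that none is popped without having been pushed, catches some nesting errors. The Python sketch below is a hypothetical authoring aid, not a requirement of this specification; it does not check how the characters interleave with the markup itself.

```python
PUSHES = {"\u202a", "\u202b", "\u202d", "\u202e"}  # LRE, RLE, LRO, RLO
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def controls_balanced(text):
    """True if embedding/override characters and PDFs nest properly."""
    depth = 0
    for ch in text:
        if ch in PUSHES:
            depth += 1
        elif ch == PDF:
            if depth == 0:
                return False  # PDF with nothing to pop
            depth -= 1
    return depth == 0  # every push must have been popped


print(controls_balanced("\u202bxy\u202c"))  # True
print(controls_balanced("\u202bxy"))        # False: unterminated embedding
```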
778
779
788 5. Forms
789
790
791 5.1. DTD additions
792
It is natural to expect input in any language in forms, as they
provide one of the few ways of obtaining user input. While this is
primarily a UI issue, there are some things that should be specified
at the HTML level to guide behavior and promote interoperability.
797
To ensure full interoperability, it is necessary for the user agent
(and the user) to have an indication of the character encoding(s)
that the server providing a form will be able to handle upon
submission of the filled-in form. Such an indication is provided by
the ACCEPT-CHARSET attribute of the INPUT and TEXTAREA elements,
modeled on the HTTP Accept-Charset header (see [HTTP-1.1]), which
contains a space- and/or comma-delimited list of character sets
acceptable to the server. A user agent may want to advise the user
of the contents of this attribute, or to prevent the user from
entering characters outside the repertoires of the listed character
sets.
808
809 NOTE -- The list of character sets is to be interpreted as
810 an EXCLUSIVE-OR list; the server announces that it is ready
811 to accept any ONE of these character encoding schemes for
812 each part of a multipart entity. The client may perform
813 character encoding translation to satisfy the server if
814 necessary.
815
816 NOTE -- The default value for the ACCEPT-CHARSET attribute
817 of an INPUT or TEXTAREA element is the reserved value
818 "UNKNOWN". A user agent may interpret that value as the
819 character encoding scheme that was used to transmit the
820 document containing that element.
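
A user agent might apply the two notes above roughly as follows. This Python sketch rests on assumptions: it splits the attribute value on spaces and commas, treats the reserved value "UNKNOWN" as the charset of the document containing the form (which the note permits but does not require), and picks the first listed charset that can encode the entered text, relying on Python's codec names matching the registered charset names.

```python
import re


def parse_accept_charset(value):
    """Split an ACCEPT-CHARSET value, which may be space and/or comma delimited."""
    return [name for name in re.split(r"[,\s]+", value.strip()) if name]


def choose_charset(text, accept_charset, document_charset):
    """Return the first acceptable charset that can encode text, or None."""
    candidates = parse_accept_charset(accept_charset or "UNKNOWN")
    if [name.upper() for name in candidates] == ["UNKNOWN"]:
        candidates = [document_charset]  # one permitted reading of the default
    for name in candidates:
        try:
            text.encode(name)
        except (LookupError, UnicodeEncodeError):
            continue  # unknown to this client, or cannot represent the text
        return name
    return None  # nothing listed will do; the user should be told


print(parse_accept_charset("ISO-8859-1, UTF-8 ISO-2022-JP"))
print(choose_charset("caf\u00e9", "US-ASCII, ISO-8859-1", "UTF-8"))  # ISO-8859-1
```

Because the list is an exclusive-or, the sketch returns a single charset to be used for the whole submitted part rather than mixing encodings.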
821
822
823 5.2. Form submission
824
The HTML 2.0 form submission mechanism, based on the "application/x-
www-form-urlencoded" media type, is ill-equipped with regard to
internationalization. In fact, since URLs are restricted to ASCII
characters, the mechanism is awkward even for ISO-8859-1 text.
Section 2.2 of [RFC1738] specifies that octets may be encoded using
the "%HH" notation, but text submitted from a form is composed of
characters, not octets. Lacking a specification of a character
encoding scheme, the "%HH" notation has no well-defined meaning.
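
The ambiguity is easy to demonstrate: the same character yields different "%HH" sequences depending on which character encoding scheme is applied before URL encoding, and nothing in "application/x-www-form-urlencoded" data says which one was used. This Python sketch uses the standard urllib.parse functions; the two charsets are chosen only for illustration.

```python
from urllib.parse import quote, unquote

text = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
as_latin1 = quote(text, encoding="iso-8859-1")
as_utf8 = quote(text, encoding="utf-8")
print(as_latin1, as_utf8)  # %E9 %C3%A9

# A receiver that guesses the wrong scheme recovers different text
# (here the two characters U+00C3 U+00A9).
print(unquote(as_utf8, encoding="iso-8859-1"))
```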
833
834 The best solution is to use the "multipart/form-data" media type
835 described in [RFC1867] with the POST method of form submission. This
844 mechanism encapsulates the value part of each name-value pair in a
845 body-part of a multipart MIME body that is sent as the HTTP entity;
846 each body part can be labeled with an appropriate Content-Type,
847 including if necessary a charset parameter that specifies the charac-
848 ter encoding scheme. The changes to the DTD necessary to support
849 this method of form submission have been incorporated in the DTD
850 included in this specification.
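
As a rough illustration of the per-part labelling, the following Python sketch builds a multipart/form-data entity by hand. The boundary string, the assumption that field names are ASCII, and the use of text/plain for every part are simplifications made for this example; a real user agent would follow [RFC1867] and the MIME specifications in full.

```python
def multipart_form_data(fields, boundary, charset):
    """Encode (name, value) pairs as a multipart/form-data body.

    Each value is encoded in `charset`, and that charset is declared in the
    part's own Content-Type header. Field names are assumed to be ASCII."""
    body = b""
    for name, value in fields:
        head = (f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n'
                f"Content-Type: text/plain; charset={charset}\r\n"
                "\r\n")
        body += head.encode("ascii") + value.encode(charset) + b"\r\n"
    return body + f"--{boundary}--\r\n".encode("ascii")


body = multipart_form_data([("city", "Montr\u00e9al")], "AaB03x", "ISO-8859-1")
content_type = "multipart/form-data; boundary=AaB03x"
print(content_type)
print(body.decode("iso-8859-1"))
```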
851
852 A less satisfactory solution is to add a MIME charset parameter to
853 the "application/x-www-form-urlencoded" media type specifier sent
854 along with a POST method form submission, with the understanding that
855 the URL encoding of [RFC1738] is applied on top of the specified
856 character encoding, as a kind of implicit Content-Transfer-Encoding.
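
Under this less satisfactory scheme, the sender first encodes the text in the chosen character encoding scheme, then URL-encodes the resulting octets, and labels the entity with the charset. A minimal Python sketch using urllib.parse.urlencode, whose encoding argument performs the first step; the field names and charset are only examples.

```python
from urllib.parse import parse_qsl, urlencode

charset = "ISO-8859-1"
fields = [("name", "Fran\u00e7ois"), ("city", "Montr\u00e9al")]

body = urlencode(fields, encoding=charset)
content_type = f"application/x-www-form-urlencoded; charset={charset}"
print(content_type)
print(body)  # name=Fran%E7ois&city=Montr%E9al

# The receiver must undo both steps using the declared charset.
assert parse_qsl(body, encoding=charset) == fields
```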
857
One problem with both solutions above is that current browsers do not
generally allow bookmarks to specify the POST method; this should be
improved. Alternatively, the GET method could be used with the form
data transmitted in the body instead of in the URL. Nothing in the
protocol seems to prevent this, but no implementations appear to
exist at present.
864
865 How the user agent determines the encoding of the text entered by the
866 user is outside the scope of this specification.
867
868 NOTE -- Designers of forms and their handling scripts
869 should be aware of an important caveat: when the default
870 value of a field (the VALUE attribute) is returned upon
871 form submission (i.e. the user did not modify this value),
872 it cannot be guaranteed to be transmitted as a sequence of
873 octets identical to that in the source document -- only as
874 a possibly different but valid encoding of the same
875 sequence of text elements. This may be true even if the
876 encoding of the document containing the form and that used
877 for submission are the same.
878
Differences can occur when a sequence of characters can be
represented by various sequences of octets, and also when a
composite sequence (a base character plus one or more
combining diacritics) can be represented by either a different
but equivalent composite sequence or by a fully precomposed
character. For instance, the UCS-2 sequence 00EA+0323
(LATIN SMALL LETTER E WITH CIRCUMFLEX + COMBINING DOT BELOW)
may be transformed into 1EC7 (LATIN SMALL LETTER E WITH
CIRCUMFLEX AND DOT BELOW), into 0065+0302+0323 (LATIN SMALL
LETTER E + COMBINING CIRCUMFLEX ACCENT + COMBINING DOT BELOW),
as well as into other equivalent composite sequences.
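
A form-handling script that needs to recognize an unmodified default value can compare text rather than octets. The Python sketch below assumes the Unicode normalization forms, which were standardized after this document was written and are not required by it; normalizing both strings to NFC makes canonically equivalent sequences such as 00EA+0323 and 0065+0302+0323 compare equal. The function name is hypothetical.

```python
import unicodedata


def same_text(submitted, default):
    """Compare two strings after canonical composition (NFC)."""
    return (unicodedata.normalize("NFC", submitted)
            == unicodedata.normalize("NFC", default))


default = "\u00ea\u0323"     # 00EA + 0323, as written in the VALUE attribute
returned = "e\u0302\u0323"   # 0065 + 0302 + 0323, as it might come back
print(default == returned)           # False: the code point sequences differ
print(same_text(returned, default))  # True
print(unicodedata.normalize("NFC", default) == "\u1ec7")  # True
```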
891
892
900 6. Miscellaneous
901
902 Proper interpretation of a text document requires that the character
903 encoding scheme be known. Current HTTP servers, however, do not gen-
904 erally include an appropriate charset parameter with the Content-Type
905 header. This is bad behaviour[2], and as such strongly discouraged,
906 but some preventive measures can be taken to minimize the detrimental
907 effects.
908
In the case where a document is accessed from a hyperlink in an
origin HTML document, a CHARSET attribute is added to the attribute
list of elements with link semantics (A and LINK), specifically by
adding it to the linkExtraAttributes entity. The value of that
attribute is to be considered a hint to the user agent as to the
character encoding scheme used by the resource pointed to by the
hyperlink; it should be the appropriate value of the MIME charset
parameter for that resource.
917
918 In any document, it is possible to include an indication of the
919 encoding scheme like the following, as early as possible within the
920 HEAD of the document:
921
922 <META HTTP-EQUIV="Content-Type"
923 CONTENT="text/html; charset=ISO-2022-JP">
924
925 This is not foolproof, but will work if the encoding scheme is such
926 that ASCII characters stand for themselves at least until the META
927 element is parsed. Note that there are better ways for a server to
928 obtain character encoding information, instead of the unreliable
929 <META> above; see [NICOL2] for some details and a proposal.
930
931 For definiteness, the "charset" parameter received from the source of
932 the document should be considered the most authoritative, followed in
933 order of preference by the contents of a META element such as the
934 above, and finally the CHARSET parameter of the anchor that was fol-
935 lowed (if any).
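
The order of preference above can be sketched in Python as follows. This is an illustration under stated assumptions: the META scan is a deliberately simple regular expression over the first octets of the document, valid only for encodings in which ASCII characters stand for themselves (as noted above), and the function names are hypothetical. The HTTP charset parameter is read with the standard email.message.Message class.

```python
import re
from email.message import Message

# Simplistic pattern for <META HTTP-EQUIV="Content-Type" CONTENT="...charset=...">.
META_CHARSET = re.compile(
    rb'<META\s+HTTP-EQUIV\s*=\s*"?Content-Type"?\s+'
    rb'CONTENT\s*=\s*"[^"]*?charset=([^"\s;]+)',
    re.IGNORECASE)


def http_charset(content_type):
    """charset parameter of an HTTP Content-Type header value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_param("charset")


def meta_charset(head_octets):
    """charset announced by a META element near the start of the document."""
    match = META_CHARSET.search(head_octets)
    return match.group(1).decode("ascii") if match else None


def effective_charset(content_type, head_octets, anchor_charset=None):
    """HTTP parameter first, then META, then the CHARSET attribute of the link."""
    return (http_charset(content_type)
            or meta_charset(head_octets)
            or anchor_charset)


doc = (b'<HTML><HEAD><META HTTP-EQUIV="Content-Type"\n'
       b'  CONTENT="text/html; charset=ISO-2022-JP"></HEAD>')
print(effective_charset("text/html; charset=ISO-8859-1", doc, "UTF-8"))  # ISO-8859-1
print(effective_charset("text/html", doc, "UTF-8"))                      # ISO-2022-JP
print(effective_charset("text/html", b"<HTML>", "UTF-8"))                # UTF-8
```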
936
-----------
2 This bad behaviour is even encouraged by the continued
existence of browsers that declare an unrecognized media
type when they receive a charset parameter. User agent
implementers are strongly encouraged to make their software
tolerant of this parameter, even if they cannot take
advantage of it.

When HTML text is transmitted directly in UCS-2 or UCS-4 form, the
question of byte order arises: does the high-order byte of each
multi-byte character come first or last? For definiteness, this
specification recommends that UCS-2 and UCS-4 be transmitted in
big-endian byte order (high-order byte first), which corresponds to
the established network byte order for two- and four-byte quantities,
to the Unicode recommendation for serialized text data and to RFC
1641. Furthermore, to maximize chances of proper interpretation, it
is recommended that documents transmitted as UCS-2 or UCS-4 always
begin with a ZERO WIDTH NO-BREAK SPACE character (hexadecimal FEFF or
0000FEFF) which, when byte-reversed, becomes FFFE or FFFE0000, a value
guaranteed never to be assigned to a character. Thus, a user agent
receiving FFFE as the first octets of a text would know that the
bytes have to be reversed for the remainder of the text.
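
A receiver could apply this recommendation roughly as follows. The Python sketch assumes the data is already known to be UCS-2 (width 2) or UCS-4 (width 4) and uses Python's UTF-16 and UTF-32 codecs, which decode UCS-2 and UCS-4 text identically (the UTF-16 codec additionally accepts surrogate pairs, which UCS-2 does not define). Without a leading FEFF, big-endian order is assumed, as recommended above.

```python
def decode_ucs(data, width):
    """Decode UCS-2 (width=2) or UCS-4 (width=4) octets, honouring a leading FEFF."""
    big, little = {2: ("utf-16-be", "utf-16-le"),
                   4: ("utf-32-be", "utf-32-le")}[width]
    if data.startswith("\ufeff".encode(big)):
        return data[width:].decode(big)      # FEFF in network byte order
    if data.startswith("\ufeff".encode(little)):
        return data[width:].decode(little)   # seen as FFFE: bytes are reversed
    return data.decode(big)                  # no signature: assume big-endian


print(decode_ucs("\ufeffHTML".encode("utf-16-le"), 2))  # HTML
print(decode_ucs("HTML".encode("utf-32-be"), 4))        # HTML
```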
966
There exist so-called UCS Transformation Formats that can be used to
transmit UCS data, in addition to UCS-2 and UCS-4. UTF-7 [RFC1642]
and UTF-8 [UTF-8] have favorable properties (no byte-ordering
problem, different flavours of ASCII compatibility) that make them
worthy of consideration, especially for transmission of multilingual
text. Another encoding scheme, MNEM [RFC1345], also has interesting
properties and the capability to transmit the full UCS. The UTF-1
transformation format of ISO 10646:1993 (registered by IANA as
ISO-10646-UTF-1) has been removed from ISO 10646 by amendment 4 and
should not be used.
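
The ASCII-related properties mentioned above can be observed with Python's built-in codecs. This sketch is only illustrative: UTF-8 leaves ASCII characters as single unchanged octets and uses octets of 80 (hexadecimal) and above for everything else, while UTF-7 produces only 7-bit octets; neither output depends on byte order. The helper name is hypothetical.

```python
def is_7bit(octets):
    """True if every octet is below 0x80."""
    return all(b < 0x80 for b in octets)


text = "HTML \u00e9t\u00e9 \u65e5\u672c"  # ASCII, Latin-1 and CJK characters
utf8 = text.encode("utf-8")
utf7 = text.encode("utf-7")

print(utf8[:5] == b"HTML ")   # True: ASCII is unchanged in UTF-8
print(is_7bit(utf8))          # False: non-ASCII uses octets >= 0x80
print(is_7bit(utf7))          # True: UTF-7 output is 7-bit only
print(utf7.decode("utf-7") == text and utf8.decode("utf-8") == text)  # True
```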
977
978 The SOFT HYPHEN character (U+00AD) needs a little attention from
979 user-agent implementers. It is present in many character sets
980 (including the whole ISO 8859 series and, of course, ISO 10646), and
981 has semantics different from the plain HYPHEN. If not used for
982 hyphenation, the soft hyphen must be completely ignored. For exam-
983 ple, "rec&shy;ord" should display as "record", should match a search
984 for "record", and should sort as "record". Non-observance of these
985 semantics effectively discourages its use on the World Wide Web, even
986 with software that does support it.
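
For searching and sorting, honouring these semantics amounts to removing the soft hyphen before comparing text. A minimal Python sketch, using the standard html.unescape function to resolve the &shy; entity; the function name search_key is hypothetical, and a real user agent would also have to decide where a soft hyphen may be displayed as a line-end hyphen.

```python
import html

SOFT_HYPHEN = "\u00ad"


def search_key(markup_text):
    """Text as it should be matched and sorted: entities resolved, soft hyphens dropped."""
    return html.unescape(markup_text).replace(SOFT_HYPHEN, "")


words = ["rec&shy;ord", "recede", "re&shy;cap"]
print(search_key("rec&shy;ord") == "record")  # True
print(sorted(words, key=search_key))          # ['re&shy;cap', 'recede', 'rec&shy;ord']
```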
987
988 7. HTML Public Text
989
990 7.1. HTML DTD
991
992 This section contains a DTD for HTML based on the HTML 2.0 DTD of RFC
993 1866, incorporating the changes for file upload as specified in RFC
994 1867, and the changes deriving from this document.
995
996 <!-- html.dtd
997
998 Document Type Definition for the HyperText Markup Language,
999 extended for internationalisation (HTML DTD)
1000
1001 Last revised: 96/05/27
1002
1003 Authors: Daniel W. Connolly <connolly@w3.org>
1012 Francois Yergeau <yergeau@alis.com>
1013 See Also: html.decl, html-1.dtd
1014 http://www.w3.org/hypertext/WWW/MarkUp/MarkUp.html
1015 -->
1016
1017 <!ENTITY % HTML.Version
1018 "-//IETF//DTD HTML//EN"
1019
1020 -- Typical usage:
1021
1022 <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
1023 <html>
1024 ...
1025 </html>
1026 --
1027 >
1028
1029
1030 <!--============ Feature Test Entities ========================-->
1031
1032 <!ENTITY % HTML.Recommended "IGNORE"
1033 -- Certain features of the language are necessary for
1034 compatibility with widespread usage, but they may
1035 compromise the structural integrity of a document.
1036 This feature test entity enables a more prescriptive
1037 document type definition that eliminates
1038 those features.
1039 -->
1040
1041 <![ %HTML.Recommended [
1042 <!ENTITY % HTML.Deprecated "IGNORE">
1043 ]]>
1044
1045 <!ENTITY % HTML.Deprecated "INCLUDE"
1046 -- Certain features of the language are necessary for
1047 compatibility with earlier versions of the specification,
1048 but they tend to be used and implemented inconsistently,
1049 and their use is deprecated. This feature test entity
1050 enables a document type definition that eliminates
1051 these features.
1052 -->
1053
1054 <!ENTITY % HTML.Highlighting "INCLUDE"
1055 -- Use this feature test entity to validate that a
1056 document uses no highlighting tags, which may be
1057 ignored on minimal implementations.
1058 -->
1059
1068 <!ENTITY % HTML.Forms "INCLUDE"
1069 -- Use this feature test entity to validate that a document
1070 contains no forms, which may not be supported in minimal
1071 implementations
1072 -->
1073
1074 <!--============== Imported Names ==============================-->
1075
1076 <!ENTITY % Content-Type "CDATA"
1077 -- meaning an internet media type
1078 (aka MIME content type, as per RFC1521)
1079 -->
1080
1081 <!ENTITY % HTTP-Method "GET | POST"
1082 -- as per HTTP specification, RFC1945
1083 -->
1084
1085 <!--========= DTD "Macros" =====================-->
1086
1087 <!ENTITY % heading "H1|H2|H3|H4|H5|H6">
1088
1089 <!ENTITY % list " UL | OL | DIR | MENU " >
1090
1091 <!ENTITY % attrs -- common attributes for elements --
1092 "LANG NAME #IMPLIED -- RFC 1766 language tag --
DIR (ltr|rtl) #IMPLIED -- text directionality --
1094 ID ID #IMPLIED -- element identifier (from RFC1942) --
1095 CLASS NAMES #IMPLIED -- for subclassing elements (from RFC1942) --">
1096
1097 <!ENTITY % just -- an attribute for text justification --
1098 "ALIGN (left|right|center|justify) #IMPLIED"
1099 -- default is left for ltr paragraphs, right for rtl -- >
1100
1101 <!--======= Character mnemonic entities =================-->
1102
1103 <!ENTITY % ISOlat1 PUBLIC
1104 "ISO 8879-1986//ENTITIES Added Latin 1//EN//HTML">
1105 %ISOlat1;
1106
1107 <!ENTITY amp CDATA "&#38;" -- ampersand -->
1108 <!ENTITY gt CDATA "&#62;" -- greater than -->
1109 <!ENTITY lt CDATA "&#60;" -- less than -->
1110 <!ENTITY quot CDATA "&#34;" -- double quote -->
1111
1112 <!--Entities for language-dependent presentation (BIDI and contextual analysis) -->
1113 <!ENTITY zwnj CDATA "&#8204;"-- zero width non-joiner-->
1114 <!ENTITY zwj CDATA "&#8205;"-- zero width joiner-->
1115 <!ENTITY lrm CDATA "&#8206;"-- left-to-right mark-->
1124 <!ENTITY rlm CDATA "&#8207;"-- right-to-left mark-->
1125
1126
1127 <!--========= SGML Document Access (SDA) Parameter Entities =====-->
1128
1129 <!-- HTML contains SGML Document Access (SDA) fixed attributes
1130 in support of easy transformation to the International Committee
1131 for Accessible Document Design (ICADD) DTD
1132 "-//EC-USA-CDA/ICADD//DTD ICADD22//EN".
1133 ICADD applications are designed to support usable access to
1134 structured information by print-impaired individuals through
1135 Braille, large print and voice synthesis. For more information on
1136 SDA & ICADD:
1137 - ISO 12083:1993, Annex A.8, Facilities for Braille,
1138 large print and computer voice
1139 - ICADD ListServ
1140 <ICADD%ASUACAD.BITNET@ARIZVM1.ccit.arizona.edu>
1141 - Usenet news group bit.listserv.easi
1142 - Recording for the Blind, +1 800 221 4792
1143 -->
1144
1145 <!ENTITY % SDAFORM "SDAFORM CDATA #FIXED"
1146 -- one to one mapping -->
1147 <!ENTITY % SDARULE "SDARULE CDATA #FIXED"
1148 -- context-sensitive mapping -->
1149 <!ENTITY % SDAPREF "SDAPREF CDATA #FIXED"
1150 -- generated text prefix -->
1151 <!ENTITY % SDASUFF "SDASUFF CDATA #FIXED"
1152 -- generated text suffix -->
1153 <!ENTITY % SDASUSP "SDASUSP NAME #FIXED"
1154 -- suspend transform process -->
1155
1156
1157 <!--========== Text Markup =====================-->
1158
1159 <![ %HTML.Highlighting [
1160
1161 <!ENTITY % font " TT | B | I ">
1162
1163 <!ENTITY % phrase "EM | STRONG | CODE | SAMP | KBD | VAR | CITE ">
1164
1165 <!ENTITY % text "#PCDATA|A|IMG|BR|%phrase|%font|SPAN|Q|BDO|SUP|SUB">
1166
1167 <!ELEMENT (%font;|%phrase) - - (%text)*>
1168 <!ATTLIST ( TT | CODE | SAMP | KBD | VAR )
1169 %attrs;
1170 %SDAFORM; "Lit"
1171 >
1172
1180 <!ATTLIST ( B | STRONG )
1181 %attrs;
1182 %SDAFORM; "B"
1183 >
1184 <!ATTLIST ( I | EM | CITE )
1185 %attrs;
1186 %SDAFORM; "It"
1187 >
1188
1189 <!-- <TT> Typewriter text -->
1190 <!-- <B> Bold text -->
1191 <!-- <I> Italic text -->
1192
1193 <!-- <EM> Emphasized phrase -->
1194 <!-- <STRONG> Strong emphasis -->
1195 <!-- <CODE> Source code phrase -->
1196 <!-- <SAMP> Sample text or characters -->
1197 <!-- <KBD> Keyboard phrase, e.g. user input -->
1198 <!-- <VAR> Variable phrase or substitutable -->
1199 <!-- <CITE> Name or title of cited work -->
1200
1201 <!ENTITY % pre.content "#PCDATA|A|HR|BR|%font|%phrase|SPAN|BDO">
1202
1203 ]]>
1204
1205 <!ENTITY % text "#PCDATA|A|IMG|BR|SPAN|Q|BDO|SUP|SUB">
1206
1207 <!ELEMENT BR - O EMPTY>
1208 <!ATTLIST BR
1209 %SDAPREF; "&#RE;"
1210 >
1211
1212 <!-- <BR> Line break -->
1213
1214 <!ELEMENT SPAN - - (%text)*>
1215 <!ATTLIST SPAN
1216 %attrs;
1217 %SDAFORM; "other #Attlist"
1218 >
1219
1220 <!-- <SPAN> Generic inline container -->
1221 <!-- <SPAN DIR=...> New counterflow embedding -->
1222 <!-- <SPAN LANG="..."> Language of contents -->
1223
1224 <!ELEMENT Q - - (%text)*>
1225 <!ATTLIST Q
1226 %attrs;
1227 %SDAPREF; '"'
1236 %SDASUFF; '"'
1237 >
1238
1239 <!-- <Q> Short quotation -->
1240 <!-- <Q LANG=xx> Language of quotation is xx -->
<!-- <Q DIR=...> New counterflow embedding -->
1242
1243 <!ELEMENT BDO - - (%text)+>
1244 <!ATTLIST BDO
1245 LANG NAME #IMPLIED
1246 DIR (ltr|rtl) #REQUIRED
1247 %SDAPREF "Bidi Override #Attval(DIR): "
1248 %SDASUFF "End Bidi"
1249 >
1250
1251 <!-- <BDO DIR=...> Override directionality of text to value of DIR -->
1252 <!-- <BDO LANG=...> Language of contents -->
1253
1254 <!ELEMENT (SUP|SUB) - - (#PCDATA)>
1255 <!ATTLIST (SUP)
1256 %attrs;
1257 %SDAPREF "Superscript(#content)"
1258 >
1259 <!ATTLIST (SUB)
1260 %attrs;
1261 %SDAPREF "Subscript(#content)"
1262 >
1263
1264 <!-- <SUP> Superscript -->
1265 <!-- <SUB> Subscript -->
1266
1267 <!--========= Link Markup ======================-->
1268
1269 <!ENTITY % linkType "NAMES">
1270
1271 <!ENTITY % linkExtraAttributes
1272 "REL %linkType #IMPLIED
1273 REV %linkType #IMPLIED
1274 URN CDATA #IMPLIED
1275 TITLE CDATA #IMPLIED
1276 METHODS NAMES #IMPLIED
1277 CHARSET NAME #IMPLIED
1278 ">
1279
1280 <![ %HTML.Recommended [
1281 <!ENTITY % A.content "(%text)*"
1282 -- <H1><a name="xxx">Heading</a></H1>
1283 is preferred to
1292 <a name="xxx"><H1>Heading</H1></a>
1293 -->
1294 ]]>
1295
1296 <!ENTITY % A.content "(%heading|%text)*">
1297
1298 <!ELEMENT A - - %A.content -(A)>
1299 <!ATTLIST A
1300 %attrs;
1301 HREF CDATA #IMPLIED
1302 NAME CDATA #IMPLIED
1303 %linkExtraAttributes;
1304 %SDAPREF; "<Anchor: #AttList>"
1305 >
1306 <!-- <A> Anchor; source/destination of link -->
1307 <!-- <A NAME="..."> Name of this anchor -->
1308 <!-- <A HREF="..."> Address of link destination -->
1309 <!-- <A URN="..."> Permanent address of destination -->
1310 <!-- <A REL=...> Relationship to destination -->
1311 <!-- <A REV=...> Relationship of destination to this -->
1312 <!-- <A TITLE="..."> Title of destination (advisory) -->
1313 <!-- <A METHODS="..."> Operations on destination (advisory) -->
1314 <!-- <A CHARSET="..."> Charset of destination (advisory) -->
1315 <!-- <A LANG="..."> Language of contents btw <A> and </A> -->
1316 <!-- <A DIR=...> Contents is a new counterflow embedding -->
1317
1318 <!--========== Images ==========================-->
1319
1320 <!ELEMENT IMG - O EMPTY>
1321 <!ATTLIST IMG
1322 %attrs;
1323 SRC CDATA #REQUIRED
1324 ALT CDATA #IMPLIED
1325 ALIGN (top|middle|bottom) #IMPLIED
1326 ISMAP (ISMAP) #IMPLIED
1327 %SDAPREF; "<Fig><?SDATrans Img: #AttList>#AttVal(Alt)</Fig>"
1328 >
1329
1330 <!-- <IMG> Image; icon, glyph or illustration -->
1331 <!-- <IMG SRC="..."> Address of image object -->
1332 <!-- <IMG ALT="..."> Textual alternative -->
1333 <!-- <IMG ALIGN=...> Position relative to text -->
1334 <!-- <IMG LANG=...> Image contains "text" in that language -->
1335 <!-- <IMG DIR=rtl> Inline image acts as a right-to-left
1336 embedding w/r to BIDI algorithm -->
1337 <!-- <IMG ISMAP> Each pixel can be a link -->
1338
1339 <!--========== Paragraphs=======================-->
1340
1348 <!ELEMENT P - O (%text)*>
1349 <!ATTLIST P
1350 %attrs;
1351 %just;
1352 %SDAFORM; "Para"
1353 >
1354
1355 <!-- <P> Paragraph -->
1356 <!-- <P LANG="..."> Language of paragraph text -->
1357 <!-- <P DIR=...> Base directionality of paragraph -->
1358 <!-- <P ALIGN=...> Paragraph alignment (justification) -->
1359
1360 <!--========== Headings, Titles, Sections ===============-->
1361
1362 <!ELEMENT HR - O EMPTY>
1363 <!ATTLIST HR
1364 %just;
1365 %SDAPREF; "&#RE;&#RE;"
1366 >
1367
1368 <!-- <HR> Horizontal rule -->
1369
1370 <!ELEMENT ( %heading ) - - (%text;)*>
1371 <!ATTLIST H1
1372 %attrs;
1373 %just;
1374 %SDAFORM; "H1"
1375 >
1376 <!ATTLIST H2
1377 %attrs;
1378 %just;
1379 %SDAFORM; "H2"
1380 >
1381 <!ATTLIST H3
1382 %attrs;
1383 %just;
1384 %SDAFORM; "H3"
1385 >
1386 <!ATTLIST H4
1387 %attrs;
1388 %just;
1389 %SDAFORM; "H4"
1390 >
1391 <!ATTLIST H5
1392 %attrs;
1393 %just;
1394 %SDAFORM; "H5"
1395 >
1404 <!ATTLIST H6
1405 %attrs;
1406 %just;
1407 %SDAFORM; "H6"
1408 >
1409
1410 <!-- <H1> Heading, level 1 -->
1411 <!-- <H2> Heading, level 2 -->
1412 <!-- <H3> Heading, level 3 -->
1413 <!-- <H4> Heading, level 4 -->
1414 <!-- <H5> Heading, level 5 -->
1415 <!-- <H6> Heading, level 6 -->
1416
1417
1418 <!--========== Text Flows ======================-->
1419
1420 <![ %HTML.Forms [
1421 <!ENTITY % block.forms "BLOCKQUOTE | FORM | ISINDEX">
1422 ]]>
1423
1424 <!ENTITY % block.forms "BLOCKQUOTE">
1425
1426 <![ %HTML.Deprecated [
1427 <!ENTITY % preformatted "PRE | XMP | LISTING">
1428 ]]>
1429
1430 <!ENTITY % preformatted "PRE">
1431
1432 <!ENTITY % block "P | %list | DL
1433 | %preformatted
1434 | %block.forms">
1435
1436 <!ENTITY % flow "(%text|%block)*">
1437
1438 <!ENTITY % pre.content "#PCDATA | A | HR | BR | SPAN | BDO">
1439 <!ELEMENT PRE - - (%pre.content)*>
1440 <!ATTLIST PRE
1441 %attrs;
1442 WIDTH NUMBER #implied
1443 %SDAFORM; "Lit"
1444 >
1445
1446 <!-- <PRE> Preformatted text -->
1447 <!-- <PRE WIDTH=...> Maximum characters per line -->
1448 <!-- <PRE DIR=...> Base direction of preformatted block -->
1449 <!-- <PRE LANG=...> Language of contents -->
1450
1451 <![ %HTML.Deprecated [
1452
1460 <!ENTITY % literal "CDATA"
1461 -- historical, non-conforming parsing mode where
1462 the only markup signal is the end tag
1463 in full
1464 -->
1465
1466 <!ELEMENT (XMP|LISTING) - - %literal>
1467 <!ATTLIST XMP
1468 %attrs;
1469 %SDAFORM; "Lit"
1470 %SDAPREF; "Example:&#RE;"
1471 >
1472 <!ATTLIST LISTING
1473 %attrs;
1474 %SDAFORM; "Lit"
1475 %SDAPREF; "Listing:&#RE;"
1476 >
1477
1478 <!-- <XMP> Example section -->
1479 <!-- <LISTING> Computer listing -->
1480
1481 <!ELEMENT PLAINTEXT - O %literal>
1482 <!-- <PLAINTEXT> Plain text passage -->
1483
1484 <!ATTLIST PLAINTEXT
1485 %attrs;
1486 %SDAFORM; "Lit"
1487 >
1488 ]]>
1489
1490
1491 <!--========== Lists ==================-->
1492
1493 <!ELEMENT DL - - (DT | DD)+>
1494 <!ATTLIST DL
1495 %attrs;
1496 COMPACT (COMPACT) #IMPLIED
1497 %SDAFORM; "List"
1498 %SDAPREF; "Definition List:"
1499 >
1500
1501 <!ELEMENT DT - O (%text)*>
1502 <!ATTLIST DT
1503 %attrs;
1504 %SDAFORM; "Term"
1505 >
1506
1507 <!ELEMENT DD - O %flow>
1516 <!ATTLIST DD
1517 %attrs;
1518 %SDAFORM; "LItem"
1519 >
1520
1521 <!-- <DL> Definition list, or glossary -->
1522 <!-- <DL COMPACT> Compact style list -->
1523 <!-- <DT> Term in definition list -->
1524 <!-- <DD> Definition of term -->
1525
1526 <!ELEMENT (OL|UL) - - (LI)+>
1527 <!ATTLIST OL
1528 %attrs;
1529 %just;
1530 COMPACT (COMPACT) #IMPLIED
1531 %SDAFORM; "List"
1532 >
1533 <!ATTLIST UL
1534 %attrs;
1535 %just;
1536 COMPACT (COMPACT) #IMPLIED
1537 %SDAFORM; "List"
1538 >
1539 <!-- <UL> Unordered list -->
1540 <!-- <UL COMPACT> Compact list style -->
1541 <!-- <OL> Ordered, or numbered list -->
1542 <!-- <OL COMPACT> Compact list style -->
1543
1544
1545 <!ELEMENT (DIR|MENU) - - (LI)+ -(%block)>
1546 <!ATTLIST DIR
1547 %attrs;
1548 %just;
1549 COMPACT (COMPACT) #IMPLIED
1550 %SDAFORM; "List"
1551 %SDAPREF; "<LHead>Directory</LHead>"
1552 >
1553 <!ATTLIST MENU
1554 %attrs;
1555 %just;
1556 COMPACT (COMPACT) #IMPLIED
1557 %SDAFORM; "List"
1558 %SDAPREF; "<LHead>Menu</LHead>"
1559 >
1560
1561 <!-- <DIR> Directory list -->
1562 <!-- <DIR COMPACT> Compact list style -->
1563 <!-- <MENU> Menu list -->
1572 <!-- <MENU COMPACT> Compact list style -->
1573
1574 <!ELEMENT LI - O %flow>
1575 <!ATTLIST LI
1576 %attrs;
1577 %just;
1578 %SDAFORM; "LItem"
1579 >
1580
1581 <!-- <LI> List item -->
1582
1583 <!--========== Document Body ===================-->
1584
1585 <![ %HTML.Recommended [
1586 <!ENTITY % body.content "(%heading|%block|HR|ADDRESS|IMG)*"
1587 -- <h1>Heading</h1>
1588 <p>Text ...
1589 is preferred to
1590 <h1>Heading</h1>
1591 Text ...
1592 -->
1593 ]]>
1594
1595 <!ENTITY % body.content "(%heading | %text | %block |
1596 HR | ADDRESS)*">
1597
1598 <!ELEMENT BODY O O %body.content>
1599 <!ATTLIST BODY
1600 %attrs;
1601 >
1602
1603 <!-- <BODY> Document body -->
1604 <!-- <BODY DIR=...> Base direction of whole body -->
1605 <!-- <BODY LANG=...> Language of contents -->
1606
1607 <!ELEMENT BLOCKQUOTE - - %body.content>
1608 <!ATTLIST BLOCKQUOTE
1609 %attrs;
1610 %just;
1611 %SDAFORM; "BQ"
1612 >
1613
1614 <!-- <BLOCKQUOTE> Quoted passage -->
1615
1616 <!ELEMENT ADDRESS - - (%text|P)*>
1617 <!ATTLIST ADDRESS
1618 %attrs;
1619 %just;
1628 %SDAFORM; "Lit"
1629 %SDAPREF; "Address:&#RE;"
1630 >
1631
1632 <!-- <ADDRESS> Address, signature, or byline -->
1633
1634
1635 <!--======= Forms ====================-->
1636
1637 <![ %HTML.Forms [
1638
1639 <!ELEMENT FORM - - %body.content -(FORM) +(INPUT|SELECT|TEXTAREA)>
1640 <!ATTLIST FORM
1641 %attrs;
1642 ACTION CDATA #IMPLIED
1643 METHOD (%HTTP-Method) GET
1644 ENCTYPE %Content-Type; "application/x-www-form-urlencoded"
1645 %SDAPREF; "<Para>Form:</Para>"
1646 %SDASUFF; "<Para>Form End.</Para>"
1647 >
1648
1649 <!-- <FORM> Fill-out or data-entry form -->
1650 <!-- <FORM ACTION="..."> Address for completed form -->
1651 <!-- <FORM METHOD=...> Method of submitting form -->
1652 <!-- <FORM ENCTYPE="..."> Representation of form data -->
1653 <!-- <FORM DIR=...> Base direction of form -->
1654 <!-- <FORM LANG=...> Language of contents -->
1655
1656 <!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX |
1657 RADIO | SUBMIT | RESET |
1658 IMAGE | HIDDEN | FILE )">
1659 <!ELEMENT INPUT - O EMPTY>
1660 <!ATTLIST INPUT
1661 %attrs;
1662 TYPE %InputType TEXT
1663 NAME CDATA #IMPLIED
1664 VALUE CDATA #IMPLIED
1665 SRC CDATA #IMPLIED
1666 CHECKED (CHECKED) #IMPLIED
1667 SIZE CDATA #IMPLIED
1668 MAXLENGTH NUMBER #IMPLIED
1669 ALIGN (top|middle|bottom) #IMPLIED
1670 ACCEPT CDATA #IMPLIED --list of content types --
1671 ACCEPT-CHARSET CDATA #IMPLIED --list of charsets accepted by server --
1672 %SDAPREF; "Input: "
1673 >
1674
1675 <!-- <INPUT> Form input datum -->
<!-- <INPUT TYPE=...> Type of input interaction -->
<!-- <INPUT NAME=...> Name of form datum -->
<!-- <INPUT VALUE="..."> Default/initial/selected value -->
<!-- <INPUT SRC="..."> Address of image -->
<!-- <INPUT CHECKED> Initial state is "on" -->
<!-- <INPUT SIZE=...> Field size hint -->
<!-- <INPUT MAXLENGTH=...> Data length maximum -->
<!-- <INPUT ALIGN=...> Image alignment -->
<!-- <INPUT ACCEPT="..."> List of desired media types -->
<!-- <INPUT ACCEPT-CHARSET="..."> List of acceptable charsets -->

<!ELEMENT SELECT - - (OPTION+) -(INPUT|SELECT|TEXTAREA)>
<!ATTLIST SELECT
    %attrs;
    NAME CDATA #REQUIRED
    SIZE NUMBER #IMPLIED
    MULTIPLE (MULTIPLE) #IMPLIED
    %SDAFORM; "List"
    %SDAPREF;
        "<LHead>Select #AttVal(Multiple)</LHead>"
    >

<!-- <SELECT> Selection of option(s) -->
<!-- <SELECT NAME=...> Name of form datum -->
<!-- <SELECT SIZE=...> Options displayed at a time -->
<!-- <SELECT MULTIPLE> Multiple selections allowed -->

<!ELEMENT OPTION - O (#PCDATA)*>
<!ATTLIST OPTION
    %attrs;
    SELECTED (SELECTED) #IMPLIED
    VALUE CDATA #IMPLIED
    %SDAFORM; "LItem"
    %SDAPREF;
        "Option: #AttVal(Value) #AttVal(Selected)"
    >

<!-- <OPTION> A selection option -->
<!-- <OPTION SELECTED> Initial state -->
<!-- <OPTION VALUE="..."> Form datum value for this option -->

<!ELEMENT TEXTAREA - - (#PCDATA)* -(INPUT|SELECT|TEXTAREA)>
<!ATTLIST TEXTAREA
    %attrs;
    NAME CDATA #REQUIRED
    ROWS NUMBER #REQUIRED
    COLS NUMBER #REQUIRED
    ACCEPT-CHARSET CDATA #IMPLIED -- list of charsets accepted by server --
    %SDAFORM; "Para"
    %SDAPREF; "Input Text -- #AttVal(Name): "
    >

<!-- <TEXTAREA> An area for text input -->
<!-- <TEXTAREA NAME=...> Name of form datum -->
<!-- <TEXTAREA ROWS=...> Height of area -->
<!-- <TEXTAREA COLS=...> Width of area -->

]]>


<!--======= Document Head ======================-->

<![ %HTML.Recommended [
<!ENTITY % head.extra "">
]]>
<!ENTITY % head.extra "& NEXTID?">

<!ENTITY % head.content "TITLE & ISINDEX? & BASE? %head.extra">

<!ELEMENT HEAD O O (%head.content) +(META|LINK)>
<!ATTLIST HEAD
    %attrs; >

<!-- <HEAD> Document head -->

<!ELEMENT TITLE - - (#PCDATA)* -(META|LINK)>
<!ATTLIST TITLE
    %attrs;
    %SDAFORM; "Ti" >

<!-- <TITLE> Title of document -->

<!ELEMENT LINK - O EMPTY>
<!ATTLIST LINK
    %attrs;
    HREF CDATA #REQUIRED
    %linkExtraAttributes;
    %SDAPREF; "Linked to : #AttVal (TITLE) (URN) (HREF)>" >

<!-- <LINK> Link from this document -->
<!-- <LINK HREF="..."> Address of link destination -->
<!-- <LINK URN="..."> Lasting name of destination -->
<!-- <LINK REL=...> Relationship to destination -->
<!-- <LINK REV=...> Relationship of destination to this -->
<!-- <LINK TITLE="..."> Title of destination (advisory) -->
<!-- <LINK CHARSET="..."> Charset of destination (advisory) -->
<!-- <LINK METHODS="..."> Operations allowed (advisory) -->

<!ELEMENT ISINDEX - O EMPTY>
<!ATTLIST ISINDEX
    %attrs;
    %SDAPREF;
        "<Para>[Document is indexed/searchable.]</Para>">

<!-- <ISINDEX> Document is a searchable index -->

<!ELEMENT BASE - O EMPTY>
<!ATTLIST BASE
    HREF CDATA #REQUIRED >

<!-- <BASE> Base context document -->
<!-- <BASE HREF="..."> Address for this document -->

<!ELEMENT NEXTID - O EMPTY>
<!ATTLIST NEXTID
    N CDATA #REQUIRED >

<!-- <NEXTID> Next ID to use for link name -->
<!-- <NEXTID N=...> Next ID to use for link name -->

<!ELEMENT META - O EMPTY>
<!ATTLIST META
    HTTP-EQUIV NAME #IMPLIED
    NAME NAME #IMPLIED
    CONTENT CDATA #REQUIRED >

<!-- <META> Generic Meta-information -->
<!-- <META HTTP-EQUIV=...> HTTP response header name -->
<!-- <META NAME=...> Meta-information name -->
<!-- <META CONTENT="..."> Associated information -->

<!--======= Document Structure =================-->

<![ %HTML.Deprecated [
<!ENTITY % html.content "HEAD, BODY, PLAINTEXT?">
]]>
<!ENTITY % html.content "HEAD, BODY">

<!ELEMENT HTML O O (%html.content)>
<!ENTITY % version.attr "VERSION CDATA #FIXED '%HTML.Version;'">

<!ATTLIST HTML
    %attrs;
    %version.attr;
    %SDAFORM; "Book"
    >

<!-- <HTML> HTML Document -->
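
The ACCEPT-CHARSET attribute declared above for INPUT and TEXTAREA is plain CDATA annotated "list of charsets accepted by server"; the DTD itself does not fix the list syntax or say how a client should act on it. The Python sketch below shows one plausible client-side use, under stated assumptions: the value is split on commas and/or white space, charset names are resolved through Python's codec registry, and the first listed charset that can encode every field value is chosen. choose_charset is a hypothetical helper, not part of this specification.

```python
import codecs
import re
from typing import Iterable, Optional


def choose_charset(accept_charset: str, values: Iterable[str]) -> Optional[str]:
    """Return the first charset listed in ACCEPT-CHARSET that can encode every value.

    Hypothetical helper. Assumes the attribute holds charset names separated
    by commas and/or white space, tried in the order listed.
    """
    values = list(values)
    for name in re.split(r"[\s,]+", accept_charset.strip()):
        if not name:
            continue
        try:
            codecs.lookup(name)  # skip names this Python build does not know
        except LookupError:
            continue
        try:
            for value in values:
                value.encode(name)  # UnicodeEncodeError if a character is missing
        except UnicodeEncodeError:
            continue
        return name
    return None  # no listed charset can carry the data


if __name__ == "__main__":
    print(choose_charset("ISO-8859-1, UTF-8", ["caf\u00e9"]))     # ISO-8859-1
    print(choose_charset("ISO-8859-1, UTF-8", ["\u65e5\u672c"]))  # UTF-8
    print(choose_charset("US-ASCII", ["caf\u00e9"]))              # None
```

Trying the charsets in the order listed treats the attribute as a preference list, and returning None leaves the caller to decide what to do when no accepted charset can represent the input (for example, warn the user); both are design choices of this sketch rather than requirements of the DTD.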


7.2. SGML Declaration for HTML

<!SGML "ISO 8879:1986"
    --
    SGML Declaration for HyperText Markup Language version 2.x
    (HTML 2.x = HTML 2.0 + i18n).
    --

    CHARSET
        BASESET "ISO Registration Number 177//CHARSET
                 ISO/IEC 10646-1:1993 UCS-4 with
                 implementation level 3//ESC 2/5 2/15 4/6"
        DESCSET   0          9 UNUSED
                  9          2 9
                 11          2 UNUSED
                 13          1 13
                 14         18 UNUSED
                 32         95 32
                127          1 UNUSED
                128         32 UNUSED
                160 2147483486 160

    CAPACITY SGMLREF
        TOTALCAP 150000
        GRPCAP   150000
        ENTCAP   150000

    SCOPE DOCUMENT
    SYNTAX
        SHUNCHAR CONTROLS 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
                 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 127
        BASESET "ISO 646:1983//CHARSET
                 International Reference Version
                 (IRV)//ESC 2/5 4/0"
        DESCSET 0 128 0

        FUNCTION
            RE          13
            RS          10
            SPACE       32
            TAB SEPCHAR  9

        NAMING
            LCNMSTRT ""
            UCNMSTRT ""
            LCNMCHAR ".-"
            UCNMCHAR ".-"
            NAMECASE GENERAL YES
                     ENTITY  NO
        DELIM
            GENERAL  SGMLREF
            SHORTREF SGMLREF
        NAMES SGMLREF
        QUANTITY SGMLREF
            ATTSPLEN 2100
            LITLEN   1024
            NAMELEN  72    -- somewhat arbitrary; taken from
                              internet line length conventions --
            PILEN    1024
            TAGLVL   100
            TAGLEN   2100
            GRPGTCNT 150
            GRPCNT   64

    FEATURES
        MINIMIZE
            DATATAG  NO
            OMITTAG  YES
            RANK     NO
            SHORTTAG YES
        LINK
            SIMPLE   NO
            IMPLICIT NO
            EXPLICIT NO
        OTHER
            CONCUR   NO
            SUBDOC   NO
            FORMAL   YES
    APPINFO "SDA" -- conforming SGML Document Access application --
>
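
Two parts of the declaration above lend themselves to a mechanical reading. DESCSET is a table of rows "first code point, count, base code point" (or UNUSED), so deciding whether a code point is a declared document character is a range lookup; in the last row, 160 + 2147483486 - 1 = 2147483645 (hexadecimal 7FFFFFFD) is the highest declared position. The NAMING and QUANTITY entries fix what a name may look like: a letter first (LCNMSTRT and UCNMSTRT are empty), then letters, digits, "." or "-", at most NAMELEN = 72 characters, with general names folded to upper case (NAMECASE GENERAL YES) and entity names left case-sensitive (ENTITY NO). The Python sketch below encodes only those rules; DESCSET, base_char, is_declared and fold_general_name are hypothetical names, and the sketch is no substitute for a conforming SGML parser.

```python
import re
from typing import Optional

# DESCSET rows from the SGML declaration above:
# (first document code point, number of code points, first base code point).
# None in the last position stands for UNUSED.
DESCSET = [
    (0, 9, None),
    (9, 2, 9),
    (11, 2, None),
    (13, 1, 13),
    (14, 18, None),
    (32, 95, 32),
    (127, 1, None),
    (128, 32, None),
    (160, 2147483486, 160),
]

NAMELEN = 72  # QUANTITY NAMELEN above
# Name start characters are letters only (LCNMSTRT and UCNMSTRT are empty);
# later characters may also be digits, "." or "-" (LCNMCHAR/UCNMCHAR ".-").
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9.-]*")


def base_char(code_point: int) -> Optional[int]:
    """Map a document code point to its UCS-4 code point; None if UNUSED or undeclared."""
    for start, count, base in DESCSET:
        if start <= code_point < start + count:
            return None if base is None else base + (code_point - start)
    return None  # outside every row


def is_declared(code_point: int) -> bool:
    return base_char(code_point) is not None


def fold_general_name(name: str) -> str:
    """Check a general name (element, attribute, ...) and upper-case it (NAMECASE GENERAL YES)."""
    if len(name) > NAMELEN or NAME_RE.fullmatch(name) is None:
        raise ValueError(f"not a valid name under this declaration: {name!r}")
    return name.upper()


if __name__ == "__main__":
    for cp in (9, 10, 12, 13, 0x41, 0x7F, 0x85, 0xE9, 0x65E5, 0x7FFFFFFD, 0x7FFFFFFE):
        print(f"{cp:#x}: declared={is_declared(cp)}")
    print(fold_general_name("accept-charset"))  # ACCEPT-CHARSET
    print(fold_general_name("http-equiv"))      # HTTP-EQUIV
```

Because every mapped row uses a base equal to its start, the document character set is the identity on its declared positions; only the UNUSED rows (most C0 controls, DEL and the range 128 to 159) are excluded.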


7.3. ISO Latin 1 entity set

The following public text lists each of the characters specified in
the Added Latin 1 entity set, along with its name, syntax for use,
and description. This list is derived from the public entity set
ISO 8879:1986//ENTITIES Added Latin 1//EN. HTML includes that entire
entity set and adds entities for the remaining characters in the
right part of ISO-8859-1, so that every code position from 160 to
255 has a named entity.

<!-- (C) International Organization for Standardization 1986
     Permission to copy in any form is granted for use with
     conforming SGML systems and applications as defined in
     ISO 8879, provided this notice is included in all copies.
-->
<!-- Character entity set. Typical invocation:
     <!ENTITY % ISOlat1 PUBLIC
       "ISO 8879-1986//ENTITIES Added Latin 1//EN//HTML">
     %ISOlat1;
-->
<!ENTITY nbsp   CDATA "&#160;" -- no-break space -->
<!ENTITY iexcl  CDATA "&#161;" -- inverted exclamation mark -->
<!ENTITY cent   CDATA "&#162;" -- cent sign -->
<!ENTITY pound  CDATA "&#163;" -- pound sterling sign -->
<!ENTITY curren CDATA "&#164;" -- general currency sign -->
<!ENTITY yen    CDATA "&#165;" -- yen sign -->
<!ENTITY brvbar CDATA "&#166;" -- broken (vertical) bar -->
<!ENTITY sect   CDATA "&#167;" -- section sign -->
<!ENTITY uml    CDATA "&#168;" -- umlaut (dieresis) -->
<!ENTITY copy   CDATA "&#169;" -- copyright sign -->
<!ENTITY ordf   CDATA "&#170;" -- ordinal indicator, feminine -->
<!ENTITY laquo  CDATA "&#171;" -- angle quotation mark, left -->
<!ENTITY not    CDATA "&#172;" -- not sign -->
<!ENTITY shy    CDATA "&#173;" -- soft hyphen -->
<!ENTITY reg    CDATA "&#174;" -- registered sign -->
<!ENTITY macr   CDATA "&#175;" -- macron -->
<!ENTITY deg    CDATA "&#176;" -- degree sign -->
<!ENTITY plusmn CDATA "&#177;" -- plus-or-minus sign -->
<!ENTITY sup2   CDATA "&#178;" -- superscript two -->
<!ENTITY sup3   CDATA "&#179;" -- superscript three -->
<!ENTITY acute  CDATA "&#180;" -- acute accent -->
<!ENTITY micro  CDATA "&#181;" -- micro sign -->
<!ENTITY para   CDATA "&#182;" -- pilcrow (paragraph sign) -->
<!ENTITY middot CDATA "&#183;" -- middle dot -->
<!ENTITY cedil  CDATA "&#184;" -- cedilla -->
<!ENTITY sup1   CDATA "&#185;" -- superscript one -->
<!ENTITY ordm   CDATA "&#186;" -- ordinal indicator, masculine -->
<!ENTITY raquo  CDATA "&#187;" -- angle quotation mark, right -->
<!ENTITY frac14 CDATA "&#188;" -- fraction one-quarter -->
<!ENTITY frac12 CDATA "&#189;" -- fraction one-half -->
<!ENTITY frac34 CDATA "&#190;" -- fraction three-quarters -->
<!ENTITY iquest CDATA "&#191;" -- inverted question mark -->
<!ENTITY Agrave CDATA "&#192;" -- capital A, grave accent -->
<!ENTITY Aacute CDATA "&#193;" -- capital A, acute accent -->
<!ENTITY Acirc  CDATA "&#194;" -- capital A, circumflex accent -->
<!ENTITY Atilde CDATA "&#195;" -- capital A, tilde -->
<!ENTITY Auml   CDATA "&#196;" -- capital A, dieresis or umlaut mark -->
<!ENTITY Aring  CDATA "&#197;" -- capital A, ring -->
<!ENTITY AElig  CDATA "&#198;" -- capital AE diphthong (ligature) -->
<!ENTITY Ccedil CDATA "&#199;" -- capital C, cedilla -->
<!ENTITY Egrave CDATA "&#200;" -- capital E, grave accent -->
<!ENTITY Eacute CDATA "&#201;" -- capital E, acute accent -->
<!ENTITY Ecirc  CDATA "&#202;" -- capital E, circumflex accent -->
<!ENTITY Euml   CDATA "&#203;" -- capital E, dieresis or umlaut mark -->
<!ENTITY Igrave CDATA "&#204;" -- capital I, grave accent -->
<!ENTITY Iacute CDATA "&#205;" -- capital I, acute accent -->
<!ENTITY Icirc  CDATA "&#206;" -- capital I, circumflex accent -->
<!ENTITY Iuml   CDATA "&#207;" -- capital I, dieresis or umlaut mark -->
<!ENTITY ETH    CDATA "&#208;" -- capital Eth, Icelandic -->
<!ENTITY Ntilde CDATA "&#209;" -- capital N, tilde -->
<!ENTITY Ograve CDATA "&#210;" -- capital O, grave accent -->
<!ENTITY Oacute CDATA "&#211;" -- capital O, acute accent -->
<!ENTITY Ocirc  CDATA "&#212;" -- capital O, circumflex accent -->
<!ENTITY Otilde CDATA "&#213;" -- capital O, tilde -->
<!ENTITY Ouml   CDATA "&#214;" -- capital O, dieresis or umlaut mark -->
<!ENTITY times  CDATA "&#215;" -- multiply sign -->
<!ENTITY Oslash CDATA "&#216;" -- capital O, slash -->
<!ENTITY Ugrave CDATA "&#217;" -- capital U, grave accent -->
<!ENTITY Uacute CDATA "&#218;" -- capital U, acute accent -->
<!ENTITY Ucirc  CDATA "&#219;" -- capital U, circumflex accent -->
<!ENTITY Uuml   CDATA "&#220;" -- capital U, dieresis or umlaut mark -->
<!ENTITY Yacute CDATA "&#221;" -- capital Y, acute accent -->
<!ENTITY THORN  CDATA "&#222;" -- capital Thorn, Icelandic -->
<!ENTITY szlig  CDATA "&#223;" -- small sharp s, German (sz ligature) -->
<!ENTITY agrave CDATA "&#224;" -- small a, grave accent -->
<!ENTITY aacute CDATA "&#225;" -- small a, acute accent -->
<!ENTITY acirc  CDATA "&#226;" -- small a, circumflex accent -->
<!ENTITY atilde CDATA "&#227;" -- small a, tilde -->
<!ENTITY auml   CDATA "&#228;" -- small a, dieresis or umlaut mark -->
<!ENTITY aring  CDATA "&#229;" -- small a, ring -->
<!ENTITY aelig  CDATA "&#230;" -- small ae diphthong (ligature) -->
<!ENTITY ccedil CDATA "&#231;" -- small c, cedilla -->
<!ENTITY egrave CDATA "&#232;" -- small e, grave accent -->
<!ENTITY eacute CDATA "&#233;" -- small e, acute accent -->
<!ENTITY ecirc  CDATA "&#234;" -- small e, circumflex accent -->
<!ENTITY euml   CDATA "&#235;" -- small e, dieresis or umlaut mark -->
<!ENTITY igrave CDATA "&#236;" -- small i, grave accent -->
<!ENTITY iacute CDATA "&#237;" -- small i, acute accent -->
<!ENTITY icirc  CDATA "&#238;" -- small i, circumflex accent -->
<!ENTITY iuml   CDATA "&#239;" -- small i, dieresis or umlaut mark -->
<!ENTITY eth    CDATA "&#240;" -- small eth, Icelandic -->
<!ENTITY ntilde CDATA "&#241;" -- small n, tilde -->
<!ENTITY ograve CDATA "&#242;" -- small o, grave accent -->
<!ENTITY oacute CDATA "&#243;" -- small o, acute accent -->
<!ENTITY ocirc  CDATA "&#244;" -- small o, circumflex accent -->
<!ENTITY otilde CDATA "&#245;" -- small o, tilde -->
<!ENTITY ouml   CDATA "&#246;" -- small o, dieresis or umlaut mark -->
<!ENTITY divide CDATA "&#247;" -- divide sign -->
<!ENTITY oslash CDATA "&#248;" -- small o, slash -->
<!ENTITY ugrave CDATA "&#249;" -- small u, grave accent -->
<!ENTITY uacute CDATA "&#250;" -- small u, acute accent -->
<!ENTITY ucirc  CDATA "&#251;" -- small u, circumflex accent -->
<!ENTITY uuml   CDATA "&#252;" -- small u, dieresis or umlaut mark -->
<!ENTITY yacute CDATA "&#253;" -- small y, acute accent -->
<!ENTITY thorn  CDATA "&#254;" -- small thorn, Icelandic -->
<!ENTITY yuml   CDATA "&#255;" -- small y, dieresis or umlaut mark -->
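
Every declaration in the entity set above follows one pattern: a name, the keyword CDATA, a decimal numeric character reference and a descriptive comment. As a sketch only, assuming each declaration sits on a single line in exactly that shape, the Python below extracts the declarations with a regular expression and cross-checks the code points against html.entities.name2codepoint from the Python standard library, which serves here purely as an independent reference table. ENTITY_DECL, parse_entities and SAMPLE are illustrative names.

```python
import re
from html.entities import name2codepoint
from typing import Dict, Tuple

# One declaration of the form:  <!ENTITY eacute CDATA "&#233;" -- small e, acute accent -->
ENTITY_DECL = re.compile(
    r'<!ENTITY\s+(?P<name>[A-Za-z][A-Za-z0-9]*)\s+CDATA\s+"&#(?P<num>[0-9]+);"'
    r"\s*--\s*(?P<desc>.*?)\s*-->"
)


def parse_entities(text: str) -> Dict[str, Tuple[int, str]]:
    """Map each declared entity name to (code point, description)."""
    return {
        m.group("name"): (int(m.group("num")), m.group("desc"))
        for m in ENTITY_DECL.finditer(text)
    }


SAMPLE = """
<!ENTITY nbsp   CDATA "&#160;" -- no-break space -->
<!ENTITY Eacute CDATA "&#201;" -- capital E, acute accent -->
<!ENTITY eacute CDATA "&#233;" -- small e, acute accent -->
<!ENTITY yuml   CDATA "&#255;" -- small y, dieresis or umlaut mark -->
"""

if __name__ == "__main__":
    for name, (cp, desc) in parse_entities(SAMPLE).items():
        # NAMECASE ENTITY NO: entity names are case-sensitive, so the
        # lookup uses the name exactly as declared (Eacute differs from eacute).
        assert name2codepoint[name] == cp, name
        print(f"&{name}; -> U+{cp:04X} {desc}")
```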


Bibliography

[BRYAN88]    M. Bryan, "SGML -- An Author's Guide to the Standard
             Generalized Markup Language", Addison-Wesley, Reading,
             1988.

[ERCS]       Extended Reference Concrete Syntax for SGML.
             <http://www.sgmlopen.org/sgml/docs/ercs/ercs-home.html>

[GOLD90]     C. F. Goldfarb, "The SGML Handbook", Y. Rubinsky, Ed.,
             Oxford University Press, 1990.

[HTTP-1.1]   R.T. Fielding, H. Frystyk Nielsen, and T. Berners-Lee,
             "Hypertext Transfer Protocol -- HTTP/1.1", Work in
             progress (draft-ietf-http-v11-spec-03.txt), MIT/LCS,
             May 1996.

[ISO-639]    ISO 639:1988. Code for the representation of names of
             languages. Technical content available at
             <http://www.sil.org/sgml/iso639a.html>

[ISO-3166]   ISO 3166:1993. Codes for the representation of names
             of countries.

[ISO-8601]   ISO 8601:1988. Data elements and interchange formats --
             Information interchange -- Representation of dates and
             times.

[ISO-8859-1] ISO 8859-1:1987. International Standard -- Information
             Processing -- 8-bit Single-Byte Coded Graphic Character
             Sets -- Part 1: Latin Alphabet No. 1.

[ISO-8879]   ISO 8879:1986. International Standard -- Information
             Processing -- Text and Office Systems -- Standard
             Generalized Markup Language (SGML).

[ISO-10646]  ISO/IEC 10646-1:1993. International Standard --
             Information technology -- Universal Multiple-Octet Coded
             Character Set (UCS) -- Part 1: Architecture and Basic
             Multilingual Plane.

[NICOL]      G.T. Nicol, "The Multilingual World Wide Web", Electronic
             Book Technologies, 1995,
             <http://www.ebt.com/docs/multling.html>

[NICOL2]     G.T. Nicol, "MIME Header Supplemented File Type", Work in
             progress, <draft-nicol-mime-header-type-00.txt>, EBT,
             October 1995.

[RFC1345]    K. Simonsen, "Character Mnemonics & Character Sets",
             RFC 1345, Rationel Almen Planlaegning, June 1992.

[RFC1468]    J. Murai, M. Crispin and E. van der Poel, "Japanese
             Character Encoding for Internet Messages", RFC 1468,
             Keio University, Panda Programming, June 1993.

[RFC1521]    N. Borenstein and N. Freed, "MIME (Multipurpose Internet
             Mail Extensions) Part One: Mechanisms for Specifying and
             Describing the Format of Internet Message Bodies",
             RFC 1521, Bellcore, Innosoft, September 1993.

[RFC1641]    D. Goldsmith, M. Davis, "Using Unicode with MIME",
             RFC 1641, Taligent Inc., July 1994.

[RFC1642]    D. Goldsmith, M. Davis, "UTF-7: A Mail-safe
             Transformation Format of Unicode", RFC 1642, Taligent
             Inc., July 1994.

[RFC1738]    T. Berners-Lee, L. Masinter, and M. McCahill, "Uniform
             Resource Locators (URL)", RFC 1738, CERN, Xerox PARC,
             University of Minnesota, December 1994.

[RFC1766]    H. Alvestrand, "Tags for the Identification of
             Languages", RFC 1766, UNINETT, March 1995.

[RFC1866]    T. Berners-Lee and D. Connolly, "Hypertext Markup
             Language - 2.0", RFC 1866, MIT/W3C, November 1995.

[RFC1867]    E. Nebel and L. Masinter, "Form-based File Upload in
             HTML", RFC 1867, Xerox Corporation, November 1995.

[RFC1942]    D. Raggett, "HTML Tables", RFC 1942, W3C, May 1996.

[RFC1945]    T. Berners-Lee, R.T. Fielding, and H. Frystyk Nielsen,
             "Hypertext Transfer Protocol -- HTTP/1.0", RFC 1945,
             MIT/LCS, UC Irvine, May 1996.

[SQ91]       SoftQuad, "The SGML Primer", 3rd ed., SoftQuad Inc.,
             1991.

[TAKADA]     Toshihiro Takada, "Multilingual Information Exchange
             through the World-Wide Web", Computer Networks and ISDN
             Systems, Vol. 27, No. 2, Nov. 1994, pp. 235-241.

[TEI]        TEI Guidelines for Electronic Text Encoding and
             Interchange. <http://etext.virginia.edu/TEI.html>

[UNICODE]    The Unicode Consortium, "The Unicode Standard --
             Worldwide Character Encoding -- Version 1.0",
             Addison-Wesley, Volume 1, 1991, Volume 2, 1992, and
             Technical Report #4, 1993. The BIDI algorithm is in
             appendix A of volume 1, with corrections in appendix D
             of volume 2.

[UTF-8]      ISO/IEC 10646-1:1993 AMENDMENT 2 (1996). UCS
             Transformation Format 8 (UTF-8).

[VANH90]     E. van Herwijnen, "Practical SGML", Kluwer Academic
             Publishers Group, Norwell and Dordrecht, 1990.

Authors' Addresses

François Yergeau
Alis Technologies
100, boul. Alexis-Nihon, bureau 600
Montréal QC H4M 2P2
Canada

Tel: +1 (514) 747-2547
Fax: +1 (514) 747-2561
EMail: fyergeau@alis.com


Gavin Thomas Nicol
Electronic Book Technologies, Japan
1-29-9 Tsurumaki,
Setagaya-ku,
Tokyo
Japan

Tel: +81-3-3230-8161
Fax: +81-3-3230-8163
EMail: gtn@ebt.com, gtn@twics.co.jp


Glenn Adams
Spyglass
118 Magazine Street
Cambridge, MA 02139
U.S.A.

Tel: +1 (617) 864-5524
Fax: +1 (617) 864-4965
EMail: glenn@spyglass.com


Martin J. Duerst
Multimedia-Laboratory
Department of Computer Science
University of Zurich
Winterthurerstrasse 190
CH-8057 Zurich
Switzerland

Tel: +41 1 257 43 16
Fax: +41 1 363 00 35
EMail: mduerst@ifi.unizh.ch