Revision 1.1
Tue Jun 15 08:04:04 2004 UTC (20 years, 10 months ago) by wakaba
Branch: MAIN
CVS Tags: HEAD
File MIME type: text/plain

1
2 Network Working Group F. Yergeau
3 Internet Draft G. Nicol
4 <draft-ietf-html-i18n-00.txt> G. Adams
5 Expires 20 February 1996 M. Duerst
6 15 August 1995
7
8
9 Internationalization of the Hypertext Markup Language
10
11
12 Status of this Memo
13
14 This document is an Internet-Draft. Internet-Drafts are working doc-
15 uments of the Internet Engineering Task Force (IETF), its areas, and
16 its working groups. Note that other groups may also distribute work-
17 ing documents as Internet-Drafts.
18
19 Internet-Drafts are draft documents valid for a maximum of six
20 months. Internet-Drafts may be updated, replaced, or obsoleted by
21 other documents at any time. It is not appropriate to use Internet-
22 Drafts as reference material or to cite them other than as a "working
23 draft" or "work in progress".
24
25 To learn the current status of any Internet-Draft, please check the
26 1id-abstracts.txt listing contained in the Internet-Drafts Shadow
27 Directories on ds.internic.net (US East Coast), nic.nordu.net
28 (Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
29 Rim).
30
31 Distribution of this document is unlimited. Please send comments to
32 the HTML working group (HTML-WG) of the Internet Engineering Task
33 Force (IETF) at <html-wg@oclc.org>. Discussions of the group are
34 archived at URL: http://www.acl.lanl.gov/HTML_WG/archives.html.
35
36
37 Abstract
38
39 The Hypertext Markup Language (HTML) is a simple markup language used
40 to create hypertext documents that are platform independent. Up to
41 the present time, the application of HTML on the World Wide Web was
42 seriously restricted by its reliance on the ISO-8859-1 coded charac-
43 ter set, which is appropriate only for Western European languages.
44 Despite this restriction, HTML has been widely used with other lan-
45 guages, using other coded character sets or character encodings,
46 through various ad hoc extensions to the language.
47
48 This document is meant to address the issue of the internationaliza-
49 tion of HTML by extending the specification of HTML 2.0 and giving
50
51
52
53 Expires 20 February 1996 [Page 1]
54
55 Internet Draft HTML internationalization 15 August 1995
56
57
58 additional recommendations for proper internationalization support.
59 A foremost consideration is to make sure that HTML remains a valid
60 application of SGML, while enabling its use in all languages of the
61 world.
62
63 The "text/html; version=2.x" Internet Media Type [RFC1590] and MIME
64 Content Type [RFC1521] is defined by this specification, taken
65 together with the HTML 2.0 specification [HTML-2].
66
67
68 Table of contents
69
70 1. Introduction .................................................. 2
71 1.1. Scope ...................................................... 3
72 1.2. Conformance ................................................ 3
73 2. The document character set ..................................... 5
74 2.1. Reference processing model ................................. 5
75 2.2. The HTML 2.x document character set ........................ 7
76 2.3. Undisplayable characters ................................... 8
77 3. Language tags .................................................. 8
78 4. Additional entities and elements ...............................10
79 4.1. Full Latin-1 entity set ....................................10
80 4.2. Date, time, measures and monetary amounts ..................10
81 4.3. Entities and elements for language-dependent presentation ..12
82 5. Forms ..........................................................15
83 5.1. DTD additions ..............................................15
84 5.2. Form submission ............................................17
85 6. Miscellaneous ..................................................17
86 7. HTML public text ...............................................18
87 7.1. HTML DTD ...................................................18
88 7.2. SGML declaration for HTML ..................................34
89 7.3. Entity sets ................................................36
90 7.3.1. ISO Latin 1 character entity set .......................36
91 7.3.2. BIDI entity set ........................................39
92 Bibliography ......................................................39
93 Authors' Addresses ................................................41
94
95
96 1. Introduction
97
98 The Hypertext Markup Language (HTML) is a simple markup language used
99 to create hypertext documents that are platform independent. Up to
100 the present time, the application of HTML on the World Wide Web was
101 seriously restricted by its reliance on the ISO-8859-1 coded charac-
102 ter set, which is appropriate only for Western European languages.
103 Despite this restriction, HTML has been widely used with other lan-
104 guages, using other coded character sets or character encodings,
105 through various ad hoc extensions to the language [TAKADA].
106
114 This document is meant to address the issue of the internationaliza-
115 tion of HTML by extending the specification of HTML 2.0 and giving
116 additional recommendations for proper internationalization support.
117 It is in good part based on a paper by one of the authors on multi-
118 lingualism on the WWW [NICOL]. A foremost consideration is to make
119 sure that HTML remains a valid application of SGML, while enabling
120 its use in all languages of the world.
121
122 The specific issues addressed are the SGML document character set to
123 be used for HTML, the proper treatment of the charset parameter asso-
124 ciated with the "text/html" content type and the specification of
125 language tags and additional entities.
126
127
128 1.1 Scope
129
130 HTML has been in use by the World-Wide Web (WWW) global information
131 initiative since 1990. This specification extends the capabilities
132 of HTML 2.0 (RFC xxx), primarily by removing the restriction to the
133 ISO-8859-1 coded character set [ISO-8859-1]. Together with the HTML
134 2.0 specification, it defines a new version of HTML to be known as
135 "HTML 2.x".
136
137 HTML is an application of ISO Standard 8879:1986, Information
138 Processing -- Text and Office Systems -- Standard Generalized Markup
139 Language (SGML) [ISO-8879]. The HTML Document Type Definition (DTD) is a
140 formal definition of the HTML syntax in terms of SGML. This specifi-
141 cation amends the DTD of HTML 2.0 in order to make it applicable to
142 documents encompassing a character repertoire much larger than that
143 of ISO-8859-1, while still remaining SGML conformant.
144
145 Together with the HTML 2.0 specification, this specification also
146 defines HTML as an Internet Media Type [RFC1590] and MIME Content
147 Type [RFC1521] called "text/html", or "text/html; version=2.x". As
148 such, it defines the semantics of the HTML syntax and how that syntax
149 should be interpreted by user agents.
150
151
152 1.2 Conformance
153
154 This specification governs the syntax of HTML documents and aspects
155 of the behavior of HTML user agents.
156
157 1.2.1 Documents
158
159 A document is a conforming HTML document if:
160
161 * It is a conforming SGML document, and it conforms to the HTML DTD
170 (see 7.1, "HTML DTD").
171
172 * It conforms to the application conventions in this specification.
173 For example, the value of the HREF attribute of the <A> element
174 must conform to the URI syntax.
175
176 1.2.2. User agents
177
178 An HTML user agent conforms to this specification if:
179
180 * It parses the characters of an HTML document into data characters
181 and markup according to SGML [ISO-8879].
182
183 NOTE -- In the interest of robustness and extensibility,
184 there are a number of widely deployed conventions for han-
185 dling non-conforming documents. See section 4.2.1 of the
186 HTML 2.0 specification [HTML-2], "Undeclared Markup Error
187 Handling" for details.
188
189 * It supports at least the ISO-8859-1 character encoding scheme and
190 processes each character in the ISO Latin Alphabet No. 1 as speci-
191 fied in section 6.1 of [HTML-2].
192
193 To ensure interoperability and proper support for at least
194 ISO-8859-1 in an environment where character encoding schemes
195 other than ISO-8859-1 are present, user agents must correctly
196 interpret the charset parameter accompanying an HTML document
197 received from the network.
198
199 Furthermore, conforming user agents are required, at a minimum, to
200 correctly parse numeric character references outside the range of
201 ISO-8859-1 but within that of UCS-2.
202
203 NOTE -- To support non-western writing systems, HTML user
204 agents are encouraged to support `ISO-10646-UCS-2' or simi-
205 lar character encoding schemes and as much of the character
206 repertoire of [ISO-10646] as is practical.
207
208 * It behaves identically for documents whose parsed token sequences
209 are identical.
210
211 For example, comments and the whitespace in tags disappear during
212 tokenization, and hence they do not influence the behavior of con-
213 forming user agents.
214
215 * It allows the user to traverse (or at least attempt to traverse,
216 resources permitting) all hyperlinks from <A> elements in an HTML
217 document.
218
226 An HTML user agent is a level 2 user agent if, additionally:
227
228 * It allows the user to express all form field values specified in
229 an HTML document and to (attempt to) submit the values as requests
230 to information services.
231
232 2. The document character set
233
234 2.1. Reference processing model
235
236 This overview explains the reference processing model used for HTML
237 2.x, and in particular the SGML concept of a document character set.
238 An actual implementation may differ widely in its internal workings
239 from the model given below, but should behave as described to an out-
240 side observer.
241
242 Because there are various widely differing encodings of text, SGML
243 does not directly address the question of how characters are encoded
244 e.g. in a file. SGML views the characters as a single set (called a
245 "character repertoire"), and a "code set" that assigns an integer
246 number (known as "character number") to each character in the reper-
247 toire. The document character set declaration defines what each of
248 the character numbers represents [GOLD90, p. 451]. In most cases, an
249 SGML DTD and all documents that refer to it have a single document
250 character set, and all markup and data characters are part of this
251 set.
252
253 HTML, as an application of SGML, does not directly address the ques-
254 tion of how characters are encoded as octets in external representa-
255 tions such as files. This is deferred to mechanisms external to HTML,
256 such as the HTTP protocol, or MIME for electronic mail.
257
258 For the HTTP protocol [HTTP], the way characters are encoded is
259 defined by the "charset" parameter[1] added to the "Content-Type"
260 field of the header of an HTTP response. For example, to indicate
261 that the transmitted document is encoded in the "JIS" encoding of
262 Japanese [RFC1468], the header will contain the following line:
263
264 Content-Type: text/html; charset=ISO-2022-JP
265
266 _________________________
267 [1] The use of the keyword "charset" in MIME suggests
268 that the corresponding parameter defines a character
269 set in the terms used here. This is not true: the
270 "charset" parameter actually specifies an encoding,
271 i.e. the mapping of one (or several) character set(s)
272 to octets.
273
282 The default value of the charset parameter in the HTTP protocol is
283 ISO-8859-1 (the so-called "Latin-1" for Western European characters).
284 The HTTP protocol also defines a mechanism for the client to define
285 the character encodings it can accept. Clients and servers are
286 strongly requested to use these mechanisms to assure correct trans-
287 mission and interpretation of any document. Provisions that can be
288 taken to help correct interpretation, even in cases where a server or
289 client does not yet use these mechanisms, are described in section 6.
290
291 Similarly, if HTML documents are transferred by electronic mail, the
292 character encoding is defined by the "charset" parameter of the "Con-
293 tent-Type" MIME header line [RFC1521].
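
The decoder stage described above can be sketched in Python. This is
only an illustration, not part of the specification: the helper names
charset_of and decode_body are hypothetical, and the header parsing is
delegated to the standard email package. A missing charset parameter
falls back to ISO-8859-1, the HTTP default stated in the text.

```python
from email.message import Message

# Default named in the text for HTTP when no charset parameter is present.
HTTP_DEFAULT_CHARSET = "iso-8859-1"


def charset_of(content_type: str) -> str:
    """Return the lowercased charset parameter of a Content-Type value,
    or the HTTP default when the parameter is absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(HTTP_DEFAULT_CHARSET)


def decode_body(body: bytes, content_type: str) -> str:
    """Decoder stage: turn external octets into characters."""
    return body.decode(charset_of(content_type))


if __name__ == "__main__":
    print(charset_of("text/html; charset=ISO-2022-JP"))
    print(decode_body(b"caf\xe9", "text/html"))
```

The same function serves the electronic mail case, since the MIME
Content-Type header line carries the charset parameter in the same
syntax.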
294
295 In case any other ways of transferring and storing HTML documents
296 are defined or become popular, it is advised that similar provisions
297 be made to clearly identify the character encoding used and/or
298 to use a single/default encoding capable of representing the widest
299 range of characters used in an international context.
300
301 Whatever the external character encoding may be, it is always
302 translated to a representation of the document character set speci-
303 fied in Section 2.2 before processing specific to SGML/HTML. The
304 reference processing model can be depicted as follows:
305
306 [resource]->[decoder]->[entity ]->[ SGML ]->[application]->[display]
307 [manager] [parser]
308 ^ |
309 | |
310 +----------+
311
312 The decoder is responsible for decoding the external representation
313 of the resource to a representation using the document character set.
314 The entity manager, the parser, and the application deal only with
315 characters of the document character set. A display-oriented part of
316 the application or the display machinery itself may again convert
317 characters represented in the document character set to some other
318 representation more suitable for their purpose. In any case, the
319 entity manager, the parser, and the application, as far as character
320 semantics are concerned, are using the HTML 2.x document character
321 set only.
322
323 An actual implementation may choose to translate the document into
324 some encoding of the document character set as described above. How-
325 ever, the behaviour described by this reference processing model can
326 be achieved otherwise, in particular by using scan-suppression tech-
327 niques. This subject is well out of the scope of this specification,
328 however, and the reader is invited to consult the SGML standard
329 [ISO-8879] or an SGML handbook [BRYAN88] [GOLD90] [VANH90] [SQ91] for
338 further information.
339
340 The most important consequence of this reference processing model is
341 that numeric character references are always resolved to the same
342 characters, whatever the external encoding actually used. For an
343 example, see Section 2.2.
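
A minimal Python sketch of this consequence follows. It is an
illustration under stated assumptions, not a conforming parser: the
function names are hypothetical, only decimal references of the form
&#N; are handled, and the exclusion of the range D800-DFFF is a choice
of this sketch rather than something the text requires. The point is
that the reference is resolved against the document character set
after decoding, so the external encoding does not affect the result.

```python
import re

NCR = re.compile(r"&#([0-9]+);")


def resolve_ncrs(text: str) -> str:
    """Replace each &#N; with the character whose number is N in the
    document character set (UCS-2 in this specification)."""
    def repl(match):
        n = int(match.group(1))
        # Upper limit 65533 comes from the text; skipping D800-DFFF is
        # an assumption of this sketch.
        if n <= 0xFFFD and not (0xD800 <= n <= 0xDFFF):
            return chr(n)
        return match.group(0)  # leave out-of-range references untouched
    return NCR.sub(repl, text)


def process(octets: bytes, charset: str) -> str:
    """Decode first, then resolve references on decoded characters."""
    return resolve_ncrs(octets.decode(charset))


if __name__ == "__main__":
    source = "\u0418 = &#1048;"
    for charset in ("koi8-r", "iso-8859-5", "utf-16"):
        print(charset, process(source.encode(charset), charset))
```

Whatever encoding carries the document, &#1048; resolves to the same
character, CYRILLIC CAPITAL LETTER I.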
344
345 2.2. The HTML 2.x document character set
346
347 The document character set, in the SGML sense, of HTML 2.x is the
348 Basic Multilingual Plane of ISO 10646:1993 [ISO-10646], also known as
349 UCS-2. This is code-by-code identical with the Unicode standard
350 [UNICODE]. The adoption of this document character set implies a
351 change in the SGML declaration specified in the HTML 2.0 specifica-
352 tion (section 9.5 of [HTML-2]). The change amounts to removing the
353 two BASESET specifications and their accompanying DESCSET declara-
354 tions, replacing them with the following declaration:
355
356 BASESET "ISO Registration Number 176//CHARSET
357 ISO/IEC 10646-1:1993 UCS-2 with implementation level 3
358 //ESC 2/5 2/15 4/5"
359 DESCSET 0 9 UNUSED
360 9 2 9
361 11 2 UNUSED
362 13 1 13
363 14 18 UNUSED
364 32 95 32
365 127 1 UNUSED
366 128 32 UNUSED
367 160 65376 160
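
As a sanity check, the DESCSET rows above can be read as (first
character number, count, base number or UNUSED) triples. The Python
sketch below is illustrative only and its function names are
hypothetical. It verifies that the rows are contiguous and reports
which base character number, if any, a document character number maps
to.

```python
# Rows transcribed from the declaration above; None stands for UNUSED.
DESCSET = [
    (0, 9, None),
    (9, 2, 9),
    (11, 2, None),
    (13, 1, 13),
    (14, 18, None),
    (32, 95, 32),
    (127, 1, None),
    (128, 32, None),
    (160, 65376, 160),
]


def check_contiguous(rows):
    """Return the total number of described positions, raising
    ValueError if the rows leave a gap or overlap."""
    expected = 0
    for start, count, _ in rows:
        if start != expected:
            raise ValueError(f"row starting at {start}, expected {expected}")
        expected = start + count
    return expected


def base_number(n, rows=DESCSET):
    """Map a document character number to its base set number, or
    None when the position is UNUSED or outside the described range."""
    for start, count, base in rows:
        if start <= n < start + count:
            return None if base is None else base + (n - start)
    return None


if __name__ == "__main__":
    print(check_contiguous(DESCSET))
    print(base_number(1048), base_number(128))
```

The rows describe 65536 positions in total, with the C0 and C1 control
ranges unused apart from tab, line feed and carriage return.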
368
369 Making UCS-2 the document character set does not make non-conforming
370 any expression, construct or document that conforms to HTML 2.0. It
371 does, however, make conforming certain constructs that are
372 not admissible in HTML 2.0. One consequence is that data characters
373 outside the repertoire of ISO-8859-1 but within that of UCS-2 become
374 valid SGML characters. Another is that the upper limit of the range
375 of numeric character references is extended from 255 to 65533[2];
376 thus, &#1048; is a valid reference to a "CYRILLIC CAPITAL LETTER I".
377 [ERCS] is a good source of information on Unicode and SGML, although
378 its scope and technical content differ greatly from this
379 _________________________
380 [2] 65533 (FFFD hexadecimal) is the last valid char-
381 acter in UCS-2. 65534 (FFFE hexadecimal) is unassigned
382 and reserved as the byte-swapped version of ZERO WIDTH
383 NO-BREAK SPACE for byte-order detection purposes.
384 65535 (FFFF hexadecimal) is unassigned.
385
394 specification.
395
396 ISO 10646-1:1993 is the most encompassing character set currently
397 existing, and there is no other character set that could take its
398 place as the document character set for HTML 2.x. Also, it is
399 expected that with future extensions of ISO 10646, this specification
400 may also be extended. If nevertheless for a specific application
401 there is a need to use characters outside this standard, this should
402 be done by avoiding any conflicts with present or future versions of
403 ISO 10646, i.e. by assigning these characters to a private zone.
404 Also, it should be borne in mind that such a use will be highly
405 unportable; in many cases, it may be better to use inline bitmaps.
406
407 2.3. Undisplayable characters
408
409 With the document character set being the full ISO 10646 BMP, the
410 possibility that a character cannot be displayed due to lack of
411 appropriate resources (fonts) cannot be avoided. Because there are
412 many different things that can be done in such a case, this document
413 does not recommend any specific behaviour. Depending on the implemen-
414 tation, this may also be handled by the underlying display system
415 and not the application itself. The following considerations, how-
416 ever, may be of help:
417
418 - A clearly visible, but unobtrusive behaviour should be preferred.
419 Some documents may contain many characters that cannot be ren-
420 dered, and so showing an alert for each of them is not the right
421 thing to do.
422
423 - In case a numeric representation of the missing character is
424 given, its hexadecimal (not decimal) form is to be preferred,
425 because this form is used in character set standards [ERCS].
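
One way to follow both considerations is sketched below in Python.
The bracketed "U+XXXX" placeholder format and the function names are
assumptions of this sketch; the text only asks for an unobtrusive
behaviour and a hexadecimal rather than decimal form.

```python
def placeholder(ch: str) -> str:
    """Hexadecimal stand-in for a character that cannot be displayed.
    The "[U+XXXX]" shape is an assumption, not mandated by the text."""
    return f"[U+{ord(ch):04X}]"


def render(text: str, displayable) -> str:
    """Substitute a placeholder inline for each undisplayable character,
    instead of raising one alert per character."""
    return "".join(ch if displayable(ch) else placeholder(ch) for ch in text)


if __name__ == "__main__":
    # Pretend only ISO-8859-1 glyphs are available.
    print(render("\u0418x", lambda ch: ord(ch) < 256))
```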
426
427 3. Language tags
428
429 Language tags can be used to control rendering of a marked up docu-
430 ment in various ways: character disambiguation, in cases where the
431 character encoding is not sufficient to resolve to a specific glyph;
432 quotation marks; hyphenation; ligatures; spacing; voice synthesis;
433 etc. Independently of rendering issues, language markup is useful as
434 content markup for purposes such as classification and searching.
435
436 The language attribute, LANG, takes as its value a language tag that
437 identifies a natural language spoken, written, or otherwise conveyed
438 by human beings for communication of information to other human
439 beings. Computer languages are explicitly excluded.
440
441 The syntax and registry of HTML language tags is the same as that
450 defined by RFC 1766 [RFC1766]. In summary, a language tag is composed
451 of one or more parts: a primary language tag and a possibly empty
452 series of subtags:
453
454 language-tag = primary-tag *( "-" subtag )
455 primary-tag = 1*8ALPHA
456 subtag = 1*8ALPHA
457
458 Whitespace is not allowed within the tag and all tags are case-
459 insensitive. The namespace of language tags is administered by the
460 IANA. Example tags include:
461
462 en, en-US, en-cockney, i-cherokee, x-pig-latin
463
464 Two-letter primary-tags are reserved for ISO 639 language abbrevia-
465 tions [ISO-639], and three-letter primary-tags for the language
466 abbreviations of ISO CD 639-2 [ISO-CD-639-2] (the latter is in addi-
467 tion to the requirements of RFC 1766). Any two-letter initial subtag
468 is an ISO 3166 country code [ISO-3166].
469
470 In the context of HTML, a language tag is not to be interpreted as a
471 single token, as per RFC 1766, but as a hierarchy. For example, a
472 user agent that adjusts rendering according to language should con-
473 sider that it has a match when a language tag in a style sheet entry
474 matches the initial portion of the language tag of an element. An
475 exact match should be preferred. This interpretation allows an ele-
476 ment marked up as, for instance, "en-US" to trigger styles corre-
477 sponding to, in order of preference, US-English ("en-US") or 'plain'
478 or 'international' English ("en").
479
480 NOTE -- using the language tag as a hierarchy does not
481 imply that all languages with a common prefix will be
482 understood by those fluent in one or more of those lan-
483 guages; it simply allows the user to request this commonal-
484 ity when it is true for that user.
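
The tag syntax and the hierarchical matching rule described above can
be sketched in Python as follows. This is an illustration with
hypothetical function names. It assumes that "initial portion" means a
prefix of whole subtags (so that "en" does not match a tag whose
primary tag merely begins with those letters), and that the longest
matching style sheet tag is the preferred one.

```python
import re

# language-tag = primary-tag *( "-" subtag ), each part 1*8ALPHA
LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")


def parse_tag(tag: str) -> list:
    """Validate a tag and split it into lowercased parts
    (tags are case-insensitive)."""
    if not LANGUAGE_TAG.fullmatch(tag):
        raise ValueError(f"not a valid language tag: {tag!r}")
    return tag.lower().split("-")


def matches(style_tag: str, element_tag: str) -> bool:
    """True when style_tag equals the initial parts of element_tag."""
    style, element = parse_tag(style_tag), parse_tag(element_tag)
    return element[:len(style)] == style


def best_match(element_tag: str, style_tags):
    """Prefer the most specific matching tag; an exact match wins."""
    candidates = [t for t in style_tags if matches(t, element_tag)]
    return max(candidates, key=lambda t: len(parse_tag(t)), default=None)


if __name__ == "__main__":
    print(best_match("en-US", ["fr", "en", "en-US"]))
    print(best_match("en-US", ["fr", "en"]))
```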
485
486 Since any text can logically be assigned a language, almost all HTML
487 elements admit the LANG attribute. The DTD reflects this. It is
488 also intended that any new element introduced in later versions of
489 HTML will admit the LANG attribute, unless there is a good reason not
490 to do so.
491
492 For the cases where a word or phrase differs only by language from
493 the surrounding text, an element is needed as a container. This ele-
494 ment is called LANG, and admits the LANG attribute.
495
496 The rendering of elements is meant to be controlled (in part) by the
497 LANG attribute. Specific user preferences set within the browser
506 should override the value of the LANG attribute, which in turn over-
507 rides the value specified by the LANG attribute of any enclosing ele-
508 ment. If none of these are set, a suitable default, perhaps con-
509 trolled by the user's locale, should be used to control rendering.
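
The precedence just described can be written as a short Python
function. The function name and the representation of the element
hierarchy as a list of LANG values, innermost element first, are
assumptions made for this sketch.

```python
def effective_lang(user_preference, element_chain, locale_default):
    """Pick the language tag that controls rendering.

    element_chain lists LANG attribute values from the element itself
    outward to the root, with None where the attribute is absent.
    """
    if user_preference:          # a browser setting overrides everything
        return user_preference
    for lang in element_chain:   # nearest LANG attribute wins
        if lang:
            return lang
    return locale_default        # nothing set: fall back to a default


if __name__ == "__main__":
    print(effective_lang(None, [None, "fr-CA", "en"], "en-US"))
```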
510
511 4. Additional entities and elements
512
513 4.1. Full Latin-1 entity set
514
515 According to the suggestion of section 14 of [HTML-2], the set of
516 Latin-1 entities is extended to cover the whole right part of
517 ISO-8859-1. The names of the entities are taken from the appendices
518 of [SGML]. A list is provided in section 7.3.1 of this specifica-
519 tion.
520
521 4.2. Date, time, measures and monetary amounts
522
523 One problem that faces the Web is that of data representation. Given
524 the date "12/9/95", many people will think that this represents the
525 12th of September, 1995, while many others will think it represents
526 December 9th. The same problem arises for many other data forms. It
527 is desirable that the Web have a culture-neutral format for data, so
528 that browsers can display the data in the most appropriate format for
529 the end user. However, taking away all presentation choice from the
530 publishers is also a bad idea; hence, some way of supplying override-
531 able presentation hints is also desirable. A set of elements is
532 proposed below to address the above problem.
533
534
535 DATE This is used to store dates in such a way that formatting
536 can be decided upon by the browser. It is desirable that
537 the document author be able to provide the default format,
538 with the end-user making the final decision. This format-
539 ting is decided upon by the combination of the CALENDAR
540 and LANG attributes. The declaration of the DATE element
541 is:
542
543 <!ELEMENT DATE - O EMPTY>
544 <!ATTLIST DATE
545 %attrs;
546 CALENDAR CDATA #IMPLIED --specify possible values? --
547 VALUE CDATA #REQUIRED
548 >
549
550 If the CALENDAR attribute is not specified, the Gregorian
551 calendar should be assumed, in which case, the format for
552 the value of the VALUE attribute should be in yyyy-mm-dd
553 format, as per ISO 8601:1988 [ISO-8601].
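
The ambiguity that DATE is meant to remove can be shown with a small
Python sketch. The function names and the two display conventions are
illustrative assumptions; only the yyyy-mm-dd VALUE format comes from
the text.

```python
from datetime import date


def parse_date_value(value: str) -> date:
    """Parse the VALUE attribute of DATE (Gregorian, yyyy-mm-dd)."""
    return date.fromisoformat(value)


def day_first(d: date) -> str:
    """Day/month/year display, as used in many European locales."""
    return f"{d.day}/{d.month}/{d:%y}"


def month_first(d: date) -> str:
    """Month/day/year display, as used in the United States."""
    return f"{d.month}/{d.day}/{d:%y}"


if __name__ == "__main__":
    twelfth_of_september = parse_date_value("1995-09-12")
    ninth_of_december = parse_date_value("1995-12-09")
    # Two different dates, one ambiguous rendering.
    print(day_first(twelfth_of_september), month_first(ninth_of_december))
```

Both calls produce the same string for two different dates, which is
why the unambiguous form is stored in VALUE and the display form is
left to the browser.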
554
555
562 TIME Like the DATE element, the TIME element is used to store
563 time such that it is independent of geographical location,
564 and formatting. The declaration of the TIME element is:
565
566 <!ELEMENT TIME - O EMPTY>
567 <!ATTLIST TIME
568 %attrs;
569 ZONE CDATA #IMPLIED
570 VALUE CDATA #REQUIRED
571 >
572
573 The contents of VALUE should be in hh:mm:ss.ss format. ZONE
574 should contain a string representing the offset of the zone
575 from GMT of the form "+HHMM" or "-HHMM". If omitted, Uni-
576 versal Time (GMT) should be assumed. For example, <TIME
577 ZONE="-0500" VALUE="11:35:04"> represents eleven hours
578 thirty-five minutes and four seconds after midnight in
579 Eastern North America, which is 16:35:04 GMT.
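
The worked example above can be checked with a short Python sketch.
The function name is hypothetical, fractional seconds are accepted on
input but dropped from the output for brevity, and wrap-around past
midnight is handled by ordinary date arithmetic.

```python
from datetime import datetime, timedelta, timezone


def time_to_gmt(value: str, zone: str = "+0000") -> str:
    """Convert a TIME element's VALUE (hh:mm:ss or hh:mm:ss.ss) and
    ZONE ("+HHMM" or "-HHMM") to GMT, returned as hh:mm:ss.
    An omitted ZONE means GMT, as stated in the text."""
    sign = -1 if zone[0] == "-" else 1
    offset = sign * timedelta(hours=int(zone[1:3]), minutes=int(zone[3:5]))
    fmt = "%H:%M:%S.%f" if "." in value else "%H:%M:%S"
    local = datetime.strptime(value, fmt).replace(tzinfo=timezone(offset))
    return local.astimezone(timezone.utc).strftime("%H:%M:%S")


if __name__ == "__main__":
    # <TIME ZONE="-0500" VALUE="11:35:04">
    print(time_to_gmt("11:35:04", "-0500"))
```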
580
581
582 MEASURE This element is designed to allow measurements to be marked
583 up such that they can be converted between systems, and
584 also to allow some formatting flexibility. The declaration
585 of the MEASURE element is:
586
587 <!ELEMENT MEASURE - O EMPTY>
588 <!ATTLIST MEASURE
589 %attrs;
590 TYPE (mass|length|area|volume|temp|dur) #REQUIRED
591 UNIT CDATA #IMPLIED
592 VALUE CDATA #REQUIRED
593 >
594
595 This is a variation of the TEI MEASURE element [TEI]. The
596 TYPE attribute specifies the type of measurement being rep-
597 resented. The UNIT attribute indicates the measurement unit
598 type, and defaults to the applicable unit type from SI
599 [ISO-1000] if not specified. The VALUE attribute specifies
600 the amount of the unit. The contents of the VALUE unit
601 should be parseable using the float_constant pattern from
602 the following lex(1) definition:
603
604 digit [0-9]
605 exponent [eE][+-]?{digit}+
606 i {digit}+
607 float_constant [+-]?({i}|({i}\.{i}?)|({i}?\.{i})){exponent}?
608
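
For readers without lex at hand, the float_constant pattern can be
transcribed into a Python regular expression. This sketch assumes that
the dot in the definition is meant as a literal decimal point; the
function name is hypothetical.

```python
import re

# digit    [0-9]
# exponent [eE][+-]?{digit}+
# i        {digit}+
FLOAT_CONSTANT = re.compile(
    r"[+-]?"
    r"(?:[0-9]+|[0-9]+\.(?:[0-9]+)?|(?:[0-9]+)?\.[0-9]+)"
    r"(?:[eE][+-]?[0-9]+)?"
)


def is_float_constant(value: str) -> bool:
    """True when the whole string matches the float_constant pattern."""
    return FLOAT_CONSTANT.fullmatch(value) is not None


if __name__ == "__main__":
    for sample in ("12", "12.", ".5", "-1.5e-3", ".", "1x5"):
        print(repr(sample), is_float_constant(sample))
```

Note that a lone "." and strings such as "1x5" are rejected, which
requires the decimal point to be escaped in a regular expression
dialect where an unescaped dot matches any character.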
609
618 MONEY This element is designed to represent monetary amounts,
619 such that conversion between systems and formatting varia-
620 tions are possible. The definition of this element is:
621
622 <!ELEMENT MONEY - O EMPTY>
623 <!ATTLIST MONEY
624 %attrs;
625 UNIT CDATA #REQUIRED
626 VALUE CDATA #REQUIRED
627 >
628
629 The UNIT attribute specifies the currency unit, using the
630 abbreviations of ISO 4217 [ISO-4217]. The VALUE attribute
631 contains the amount, and should follow the lexical model of
632 the VALUE attribute of the MEASURE element. It is conceiv-
633 able that the functionality of this element could be made
634 part of MEASURE.
635
636 It should be noted that there are many special cases involving the
637 representation of data. For example, many people in New Zealand still
638 use miles, even though New Zealand has officially adopted the metric
639 system. Worse, some people use miles when they mean kilometers. This
640 proposal is not aimed at handling all such cases, but rather to pre-
641 sent a reasonable balance between usability, and accuracy. When for-
642 mat is of the utmost importance, these tags need not be used.
643
644
645 4.3. Entities and elements for language-dependent presentation
646
647 For the correct presentation of text from certain languages (irre-
648 spective of formatting issues), some support in the form of addi-
649 tional entities and elements is needed. In particular, bidirectional
650 text (BIDI for short) requires markup in special circumstances where
651 ambiguities as to the directionnality of some characters have to be
652 resolved. First, a set of named character entities is added that
653 allows full support of the Unicode bidirectional algorithm [UNICODE],
654 plus some help with languages requiring contextual analysis for ren-
655 dering:
656
657 <!ENTITY zwnj SDATA "[zwnj ]"--=zero width non-joiner-->
658 <!ENTITY zwj SDATA "[zwj ]"--=zero width joiner-->
659 <!ENTITY lrm SDATA "[lrm ]"--=left-to-right mark-->
660 <!ENTITY rlm SDATA "[rlm ]"--=right-to-left mark-->
661 <!ENTITY lre SDATA "[lre ]"--=left-to-right embedding-->
662 <!ENTITY rle SDATA "[rle ]"--=right-to-left embedding-->
663 <!ENTITY pdf SDATA "[pdf ]"--=pop directional formatting-->
664 <!ENTITY lro SDATA "[lro ]"--=left-to-right override-->
665 <!ENTITY rlo SDATA "[rlo ]"--=right-to-left override-->
666
674 These correspond to the following characters from ISO/IEC
675 10646-1:1993 (with the equivalent numeric character reference added
676 at the right):
677
678 0x200C ZERO WIDTH NON-JOINER &#8204;
679 0x200D ZERO WIDTH JOINER &#8205;
680 0x200E LEFT-TO-RIGHT MARK &#8206;
681 0x200F RIGHT-TO-LEFT MARK &#8207;
682 0x202A LEFT-TO-RIGHT EMBEDDING &#8234;
683 0x202B RIGHT-TO-LEFT EMBEDDING &#8235;
684 0x202C POP DIRECTIONAL FORMATTING &#8236;
685 0x202D LEFT-TO-RIGHT OVERRIDE &#8237;
686 0x202E RIGHT-TO-LEFT OVERRIDE &#8238;
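
The correspondence in the table above can be expressed and checked in
Python. The dictionary and function names are hypothetical; the code
points and character names are those listed in the table, and the
name check relies on the character database shipped with Python.

```python
import unicodedata

# Entity name -> code point, transcribed from the table above.
BIDI_ENTITIES = {
    "zwnj": 0x200C,
    "zwj": 0x200D,
    "lrm": 0x200E,
    "rlm": 0x200F,
    "lre": 0x202A,
    "rle": 0x202B,
    "pdf": 0x202C,
    "lro": 0x202D,
    "rlo": 0x202E,
}


def numeric_reference(entity: str) -> str:
    """Decimal numeric character reference equivalent to an entity."""
    return f"&#{BIDI_ENTITIES[entity]};"


if __name__ == "__main__":
    for name, code in BIDI_ENTITIES.items():
        print(f"&{name};", f"0x{code:04X}", numeric_reference(name),
              unicodedata.name(chr(code)))
```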
687
688 These entities affect the ability to render BIDI text in a semanti-
689 cally legible fashion. That is, without these special BIDI charac-
690 ters, cases arise which would prevent *any* rendering whatsoever that
691 reflected the basic meaning of the text. It is for this reason that
692 these special characters were added to Unicode (and, thence, to
693 ISO/IEC 10646). If it were possible to do reliable layout and ren-
694 dering of bidirectional text without them, they definitely would not
695 have been included in Unicode (at least not the stateful characters:
696 LRE, RLE, LRO, RLO, and PDF). They are needed for the following:
697
698 1. RTL MARK, LTR MARK - used to disambiguate directionality
699 of directionally neutral characters, e.g., if you have a
700 double quote sitting between an Arabic and a Latin letter,
701 then which direction does the quote resolve to? These
702 characters are like zero width spaces which have a direc-
703 tional property (but no word/line break property).
704
705 2. ZWJ, ZWNJ - used to force or block joining behavior in
706 contexts in which joining would occur but should not, or would
707 not occur but should. For example, ARABIC LETTER HEH is
708 used to abbreviate "Hijri" (the Islamic calendrical sys-
709 tem); however, the isolated form of HEH looks like the
710 digit five as employed in Arabic script (actually based on
711 Indic digits). In order to prevent one from reading HEH as
712 a final digit five in a year, the initial form of HEH is
713 used. However, there is no following context (i.e., a
714 joining letter) to which the HEH can join. Therefore, the
715 ZWJ is used to provide that context. In Farsi texts, there
716 are cases where a letter that normally would join a subse-
717 quent letter in a cursive connection does not. Here the
718 ZWNJ is used.
719
720 3. RTL EMBEDDING, LTR EMBEDDING - used to handle nested
721 directional runs such as:
722
730 Given the following Latin/Arabic letters in backing store
731 with the specified embeddings:
732
733 LRE L0 L1 RLE A0 A1 LRE L2 L3 PDF A2 A3 PDF L4 L5 PDF
734
735 One gets the following rendering (with [] showing the
736 directional transitions):
737
738 [ L0 L1 [ A3 A2 [ L2 L3 ] A1 A0 ] L4 L5 ]
739
740 On the other hand, without these characters, e.g., with
741
742 L0 L1 A0 A1 L2 L3 A2 A3 L4 L5
743
744 and a base level of LTR one gets the following rendering:
745
746 [ L0 L1 [ A1 A0 ] L2 L3 [ A3 A2 ] L4 L5 ]
747
748 Notice that A1,A0 is on the left and A3,A2 on the right
749 unlike the above case where the embedding levels are used.
750 Without the embedding characters one has at most two lev-
751 els: a base directional level and a single counterflow
752 directional level.
753
754 A common need for the embedding characters is to handle
755 text that has been pasted from one bidi context into another,
756 including pastes that are themselves nested several levels deep.
757
758 4. LTR OVERRIDE, RTL OVERRIDE - these are needed to deal
759 with unusual pieces of text in which directionality cannot
760 be resolved from context in an unambiguous fashion. For
761 example, in part numbers, formulas, telephone numbers, and
762 other similar pieces of text, it is difficult or impossible
763 to derive the directionality of numbers, punctuation, and
764 other neutrals from their context.
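
The embedding example in item 3 can be reproduced with a small sketch.
It is a deliberate simplification and not the full Unicode bidirectional
algorithm: it handles only LRE, RLE, PDF and strong letters, and it
assumes that tokens beginning with "L" are Latin (left-to-right) and
tokens beginning with "A" are Arabic (right-to-left). The function names
are hypothetical.

```python
def resolve_levels(tokens, base_level=0):
    """Assign an embedding level to each letter; LRE/RLE/PDF are consumed."""
    stack = []
    level = base_level
    out = []
    for tok in tokens:
        if tok == "LRE":
            stack.append(level)
            # next even (left-to-right) level above the current one
            level = level + 2 if level % 2 == 0 else level + 1
        elif tok == "RLE":
            stack.append(level)
            # next odd (right-to-left) level above the current one
            level = level + 1 if level % 2 == 0 else level + 2
        elif tok == "PDF":
            if stack:
                level = stack.pop()
        else:
            rtl = tok.startswith("A")
            # a letter whose direction disagrees with the parity of the
            # current level is placed one level deeper (implicit level)
            out.append((tok, level + 1 if rtl != (level % 2 == 1) else level))
    return out


def reorder(pairs):
    """Reverse every run at or above each level, from the highest level
    down to the lowest odd level."""
    items = list(pairs)
    odd_levels = [lvl for _, lvl in items if lvl % 2 == 1]
    if not odd_levels:
        return [tok for tok, _ in items]
    max_level = max(lvl for _, lvl in items)
    for level in range(max_level, min(odd_levels) - 1, -1):
        i = 0
        while i < len(items):
            if items[i][1] >= level:
                j = i
                while j < len(items) and items[j][1] >= level:
                    j += 1
                items[i:j] = items[i:j][::-1]
                i = j
            else:
                i += 1
    return [tok for tok, _ in items]


def visual_order(tokens, base_level=0):
    return reorder(resolve_levels(tokens, base_level))


embedded = "LRE L0 L1 RLE A0 A1 LRE L2 L3 PDF A2 A3 PDF L4 L5 PDF".split()
plain = "L0 L1 A0 A1 L2 L3 A2 A3 L4 L5".split()
print(" ".join(visual_order(embedded)))  # L0 L1 A3 A2 L2 L3 A1 A0 L4 L5
print(" ".join(visual_order(plain)))     # L0 L1 A1 A0 L2 L3 A3 A2 L4 L5
```

The two printed lines correspond to the two renderings shown in item 3,
without the bracket annotations.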
765
766 To handle the case of the directional controls appearing directly in
767 the text as coded characters, a new element, entities and SHORTREFS
768 are defined:
769
770 <!ELEMENT BIDI - - (%text)+>
771 <!ATTLIST BIDI
772 %attrs;
773 DIR (ltr|rtl) #IMPLIED
774 FORCE (gad|dag) #IMPLIED
775 >
776
777 The dir attribute corresponds to the 'embedding' entities (lre and
786 rle), while the FORCE attribute corresponds to the 'override' ones
787 (lro and rlo). Different allowed values of these attributes have to
788 be used because of the quixotic semantics of SGML regarding tokens in
789 name token groups. To support the occurrence of Unicode BIDI charac-
790 ters in text (as coded characters), the following is defined:
791
792 <!ENTITY lretag "<BIDI DIR=LTR>" >
793 <!ENTITY rletag "<BIDI DIR=RTL>" >
794 <!ENTITY lrotag "<BIDI FORCE=GAD>" >
795 <!ENTITY rlotag "<BIDI FORCE=DAG>" >
796 <!ENTITY pdftag "</BIDI>" >
797 <!SHORTREF bidi "&#LRE;" lretag
798 "&#RLE;" rletag
799 "&#LRO;" lrotag
800 "&#RLO;" rlotag
801 "&#PDF;" pdftag
802 >
803
804 In this case LRE, RLE, LRO, RLO, and PDF have to be declared as func-
805 tion names (mapped to the appropriate character numbers) in the SGML
806 declaration's concrete syntax:
807
808 FUNCTION
809 LRE FUNCHAR 8234 -- LEFT-TO-RIGHT EMBEDDING --
810 RLE FUNCHAR 8235 -- RIGHT-TO-LEFT EMBEDDING --
811 PDF FUNCHAR 8236 -- POP DIRECTIONAL FORMATTING --
812 LRO FUNCHAR 8237 -- LEFT-TO-RIGHT OVERRIDE --
813 RLO FUNCHAR 8238 -- RIGHT-TO-LEFT OVERRIDE --
814
815 The above shortrefs and <BIDI> element allow dealing with existing
816 text containing bidi controls, and doing so in the framework of
817 marked up text.
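
The effect of the shortrefs can be approximated outside an SGML parser.
The following sketch only illustrates the mapping; it is not how an
SGML system implements SHORTREF. The function name is hypothetical, and
the character numbers are the decimal values from the FUNCTION
declarations above.

```python
import unicodedata

# Decimal character numbers from the FUNCTION section, mapped to the
# replacement text of the corresponding entities.
BIDI_CONTROLS = {
    8234: "<BIDI DIR=LTR>",    # LRE
    8235: "<BIDI DIR=RTL>",    # RLE
    8236: "</BIDI>",           # PDF
    8237: "<BIDI FORCE=GAD>",  # LRO
    8238: "<BIDI FORCE=DAG>",  # RLO
}


def controls_to_markup(text):
    """Replace coded bidi control characters by BIDI start and end tags."""
    return "".join(BIDI_CONTROLS.get(ord(ch), ch) for ch in text)


# Show the Unicode name of each declared character number.
for number in sorted(BIDI_CONTROLS):
    print(number, unicodedata.name(chr(number)))

print(controls_to_markup("abc \u202bxyz\u202c def"))
```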
818
819 Another additional element is important to have for proper language-
820 dependent rendering. Short quotations, and in particular the quota-
821 tion marks surrounding them, are typically rendered differently in
822 different languages and on platforms with different graphic capabili-
823 ties: "a quotation in English", `another, slightly better one', ,,a
824 quotation in German", << a quotation in French >>. The <Q> element
825 is introduced for that purpose.
826
827 5. Forms
828
829
830 5.1. DTD additions
831
832 It is natural to expect input in any language in forms, as they pro-
833 vide one of the only ways of obtaining user input. While this is
842 primarily a UI issue, there are some things that should be specified
843 at the HTML level to guide behavior and promote interoperability.
844
845 One is to add variants of the INPUT element corresponding to the ele-
846 ments described in section 4.2, thus allowing locale-independent
847 transmission of dates, times, etc. to a server. Specifically, DATE,
848 TIME, MEASURE and MONEY are added as possible values of the TYPE
849 attribute of the INPUT element. Prior to transmission, the data
850 should be converted to a canonical form, where possible. For example,
851 if a user entered "24/12/1996" into a DATE field, it should be
852 converted to "1996-12-24" when transmitted. Where this is not possible,
853 information corresponding to the attributes of the elements defined
854 in section 4.2 needs to be transmitted as well. This can be accomplished
855 by expanding the capabilities of the value part of the name-value pairs
856 used to transmit forms data. The following syntax is recommended:
857
858 forms-data = pair-list*
859 pair-list = pair ";" pair-list | pair
860 pair = name "=" value
861 name = text
862 value = simple-value | complex-value
863 simple-value = text
864 complex-value = "(" pair-list* ")"
865
866 In complex-values, the name of the attribute is used as the name part
867 of the name-value pair, "value" being the most common one. For
868 example, a date might be transmitted as:
869
870 date=(value=24/12/96;calendar=gregorian;lang=en-uk)
871
872 suitably encoded.
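
A minimal sketch of a reader and writer for this syntax follows. It
assumes that the delimiters "=", ";", "(" and ")" never occur inside
names or simple values (that is, that any such characters were escaped
by the encoding step, which is not shown). The function names are
hypothetical.

```python
def parse_pairs(s, pos=0):
    """Parse a pair-list starting at pos; return (pairs, new_pos).

    A complex value is returned as a nested list of (name, value) pairs.
    """
    pairs = []
    while pos < len(s) and s[pos] != ")":
        eq = s.index("=", pos)
        name = s[pos:eq]
        pos = eq + 1
        if pos < len(s) and s[pos] == "(":
            value, pos = parse_pairs(s, pos + 1)
            if pos >= len(s) or s[pos] != ")":
                raise ValueError("unbalanced parenthesis")
            pos += 1
        else:
            end = pos
            while end < len(s) and s[end] not in ";)":
                end += 1
            value = s[pos:end]
            pos = end
        pairs.append((name, value))
        if pos < len(s) and s[pos] == ";":
            pos += 1
    return pairs, pos


def parse_forms_data(s):
    pairs, pos = parse_pairs(s)
    if pos != len(s):
        raise ValueError("unexpected character at offset %d" % pos)
    return pairs


def serialize(pairs):
    """Inverse of parse_forms_data for the same nested structure."""
    parts = []
    for name, value in pairs:
        if isinstance(value, list):
            parts.append("%s=(%s)" % (name, serialize(value)))
        else:
            parts.append("%s=%s" % (name, value))
    return ";".join(parts)


example = "date=(value=24/12/96;calendar=gregorian;lang=en-uk)"
print(parse_forms_data(example))
```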
873
874 To ensure interoperability, it is necessary for the user agent (and
875 the user) to have an indication of the character set(s) that the
876 server providing a form will be able to handle upon submission of the
877 filled-in form. Such an indication is provided by the ACCEPT-CHARSET
878 attribute of the FORM element, modeled on the HTTP Accept-Charset
879 header (see [HTTP]), which contains a space and/or comma delimited
880 list of character sets acceptable to the server. A user agent may
881 want to advise the user of the contents of this attribute, or to
882 restrict the user's ability to enter unacceptable characters.
883
884 NOTE -- The list of character sets is to be interpreted as
885 an EXCLUSIVE-OR list; the server announces that it is ready
886 to accept any ONE of these character encoding schemes for
887 each part of a multipart entity.
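
One way a user agent might act on ACCEPT-CHARSET is sketched below. It
assumes that the charset names in the attribute are also codec names
known to the implementation (here, Python codec names), and it picks
the first listed character set able to represent the whole text. This
document does not mandate that selection policy. The function names are
hypothetical.

```python
import re


def parse_accept_charset(value):
    """Split a space and/or comma delimited list of charset names."""
    return [name for name in re.split(r"[\s,]+", value.strip()) if name]


def pick_charset(text, accept_charset):
    """Return the first acceptable charset that can encode text, or None."""
    for name in parse_accept_charset(accept_charset):
        try:
            text.encode(name)
        except (LookupError, UnicodeEncodeError):
            # unknown charset name, or a character it cannot represent
            continue
        return name
    return None


print(pick_charset("Montr\u00e9al", "us-ascii, iso-8859-1"))
```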
888
898 5.2. Form submission
899
900 The HTML 2.0 form submission mechanism, based on the "application/x-
901 www-form-urlencoded" media type, is hopelessly broken with regard to
902 internationalization. In fact, since URLs are restricted to ASCII
903 characters, the mechanism is broken even for ISO-8859-1 text. Sec-
904 tion 2.2 of [RFC1738] specifies that octets may be encoded using the
905 "%HH" notation, but text submitted from a form is composed of charac-
906 ters, not octets. Lacking a specification of a character encoding
907 scheme, the "%HH" notation has no meaning.
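
The ambiguity can be shown in a few lines. UTF-8 is used here only as a
second encoding for contrast; that choice is an assumption of this
sketch and is not proposed by this document.

```python
from urllib.parse import quote, unquote

text = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

as_latin1 = quote(text, encoding="iso-8859-1")
as_utf8 = quote(text, encoding="utf-8")
print(as_latin1)  # %E9
print(as_utf8)    # %C3%A9

# A receiver that guesses the wrong encoding recovers the wrong text:
# the two octets C3 A9 become two ISO-8859-1 characters.
print(unquote(as_utf8, encoding="iso-8859-1"))
```

The same character yields different "%HH" sequences, and nothing in the
submitted URL indicates which encoding the sender used.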
908
909 A partial solution to this sorry state of affairs is to specify a
910 default character encoding scheme to be assumed when the GET method
911 of form submission is used. Specifying UCS-2 would break all exist-
912 ing forms, so the only sensible way is to designate ISO-8859-1. That
913 is, the encoded URL sent to submit a form by the GET method is to be
914 interpreted as a sequence of single-octet characters encoded accord-
915 ing to ISO-8859-1, and further encoded according to the scheme of
916 [RFC1738] (the "%HH" notation). This is clearly insufficient, so the
917 GET method of form submission is deprecated and should not be used in
918 future documents, despite the language of section XX of [HTML-2].
919
920 A better solution is to add a MIME charset parameter to the Content-
921 Type header sent along with a POST method form submission, with the
922 understanding that the URL encoding of [RFC1738] is applied on top of
923 the specified character encoding, as a kind of implicit Content-
924 Transfer-Encoding. The default ISO-8859-1 is to be implied in the
925 absence of a charset parameter.
926
927 The best solution is to use the "multipart/form-data" media type
928 described in [FILE-UPLOAD] with the POST method of form submission.
929 This mechanism encapsulates the value part of each name-value pair in
930 a body-part of a multipart MIME body that is sent as the HTTP entity;
931 each body part can be labeled with an appropriate Content-Type,
932 including if necessary a charset parameter that specifies the charac-
933 ter encoding scheme. The changes to the DTD necessary to support
934 this method of form submission have been incorporated in the DTD
935 included in this specification.
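
The following sketch builds such a body by hand, labeling each part
with its own charset. The function name, field names and boundary
string are hypothetical, and a real sender would have to choose a
boundary that does not occur in any of the values.

```python
def build_multipart(fields, boundary="sketch-boundary"):
    """fields is a list of (name, text, charset) tuples.

    Returns the Content-Type header value and the body as bytes.
    """
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, text, charset in fields:
        lines.append(delimiter)
        lines.append(
            ('Content-Disposition: form-data; name="%s"' % name).encode("ascii"))
        lines.append(
            ("Content-Type: text/plain; charset=%s" % charset).encode("ascii"))
        lines.append(b"")                     # blank line ends the part headers
        lines.append(text.encode(charset))    # value in its labeled encoding
    lines.append(delimiter + b"--")           # closing delimiter
    lines.append(b"")
    return "multipart/form-data; boundary=%s" % boundary, b"\r\n".join(lines)


content_type, body = build_multipart([
    ("city", "Montr\u00e9al", "iso-8859-1"),
    ("country", "\u65e5\u672c", "iso-2022-jp"),
])
print(content_type)
```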
936
937 How the user agent determines the encoding of the text entered by the
938 user is outside the scope of this specification.
939
940 6. Miscellaneous
941
942 Proper interpretation of a text document requires that the character
943 encoding scheme be known. Current HTTP servers, however, do not gen-
944 erally include an appropriate charset parameter with the Content-Type
945 header, even when the encoding scheme is different from the default
954 ISO-8859-1. This is bad behaviour, and as such strongly discouraged,
955 but some preventive measures can be taken to minimize the detrimental
956 effects.
957
958 In the case where a document is accessed from a hyperlink in an ori-
959 gin HTML document, a CHARSET attribute is added to the attribute list
960 of elements with link semantics (A and LINK), specifically by adding
961 it to the linkExtraAttributes entity. The value of that attribute is
962 to be considered a hint to the User Agent as to the character
963 encoding scheme used by the resource pointed to by the hyperlink; it
964 should be the appropriate value of the MIME charset parameter for
965 that resource.
966
967 In any document, it may be wise to include an indication of the
968 encoding scheme like the following, as early as possible within the
969 HEAD of the document:
970
971 <META HTTP-EQUIV="Content-Type"
972 CONTENT="text/html; charset=ISO-2022-JP">
973
974 This is not foolproof, but will work if the encoding scheme is such
975 that ASCII characters stand for themselves at least until the META
976 element is parsed.
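
A user agent could look for such a META element before decoding the
document, as sketched below. The sketch assumes, as stated above, that
ASCII characters stand for themselves in the leading octets. The
pattern is intentionally narrow (HTTP-EQUIV before CONTENT, quoted
CONTENT value) and is not a full HTML parser. The function name and the
1024-octet limit are hypothetical.

```python
import re

META_CHARSET = re.compile(
    rb'<META\s+HTTP-EQUIV\s*=\s*"?Content-Type"?\s+'
    rb'CONTENT\s*=\s*"[^"]*charset=([A-Za-z0-9_.:-]+)',
    re.IGNORECASE,
)


def sniff_meta_charset(raw, limit=1024):
    """Search the first `limit` octets for a META charset declaration."""
    match = META_CHARSET.search(raw[:limit])
    return match.group(1).decode("ascii") if match else None


document = (b'<HTML><HEAD>\n<META HTTP-EQUIV="Content-Type"\n'
            b' CONTENT="text/html; charset=ISO-2022-JP">\n<TITLE>x</TITLE>')
print(sniff_meta_charset(document))
```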
977
978 For definiteness, the "charset" parameter received from the source of
979 the document should be considered the most authoritative, followed in
980 order of preference by the contents of a META element such as the
981 above, and finally the CHARSET attribute of the anchor that was
982 followed (if any).
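
This order of preference can be written as a small function. The
function and parameter names are hypothetical; the fallback to
ISO-8859-1 reflects the default mentioned earlier in this section.

```python
def choose_charset(http_charset=None, meta_charset=None, link_charset=None,
                   default="ISO-8859-1"):
    """Pick the character encoding scheme by decreasing authority:
    the charset parameter received with the document, then a META
    element, then the CHARSET attribute of the anchor that was followed.
    """
    for candidate in (http_charset, meta_charset, link_charset):
        if candidate:
            return candidate
    return default


print(choose_charset(meta_charset="ISO-2022-JP", link_charset="EUC-JP"))
```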
983
984 7. HTML Public Text
985
986 7.1. HTML DTD
987
988 <!-- html-2.x.dtd
989
990 Document Type Definition for the HyperText Markup Language,
991 version 2.x (HTML DTD)
992
993 Authors: Daniel W. Connolly <connolly@w3.org>
994 François Yergeau <yergeau@alis.com>
995 -->
996
997 <!ENTITY % HTML.Version
998 "-//IETF//DTD HTML 2.x//EN"
999
1000 -- Typical usage:
1001
1010 <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.x//EN">
1011 <html>
1012 ...
1013 </html>
1014 --
1015 >
1016
1017
1018 <!--============ Feature Test Entities ========================-->
1019
1020 <!ENTITY % HTML.Recommended "IGNORE"
1021 -- Certain features of the language are necessary for
1022 compatibility with widespread usage, but they may
1023 compromise the structural integrity of a document.
1024 This feature test entity enables a more prescriptive
1025 document type definition that eliminates
1026 those features.
1027 -->
1028
1029 <![ %HTML.Recommended [
1030 <!ENTITY % HTML.Deprecated "IGNORE">
1031 ]]>
1032
1033 <!ENTITY % HTML.Deprecated "INCLUDE"
1034 -- Certain features of the language are necessary for
1035 compatibility with earlier versions of the specification,
1036 but they tend to be used and implemented inconsistently,
1037 and their use is deprecated. This feature test entity
1038 enables a document type definition that eliminates
1039 these features.
1040 -->
1041
1042 <!ENTITY % HTML.Highlighting "INCLUDE"
1043 -- Use this feature test entity to validate that a
1044 document uses no highlighting tags, which may be
1045 ignored on minimal implementations.
1046 -->
1047
1048 <!ENTITY % HTML.Forms "INCLUDE"
1049 -- Use this feature test entity to validate that a document
1050 contains no forms, which may not be supported in minimal
1051 implementations
1052 -->
1053
1054 <!ENTITY % HTML.Bidi "INCLUDE"
1055 -- Use this feature test entity to validate that a document
1056 does not use the BIDI element, entities and SHORTREFs,
1057 which may not be supported in some implementations
1066 -->
1067
1068 <!--============== Imported Names ==============================-->
1069
1070 <!ENTITY % Content-Type "CDATA"
1071 -- meaning an internet media type
1072 (aka MIME content type, as per RFC1521)
1073 -->
1074
1075 <!ENTITY % HTTP-Method "GET | POST"
1076 -- as per HTTP specification, in progress
1077 -->
1078
1079 <!ENTITY % URI "CDATA"
1080 -- The term URI means a CDATA attribute
1081 whose value is a Uniform Resource Identifier.
1082 The syntax is defined by
1083
1084 RFC 1808, "Relative Uniform Resource Locators."
1085 R. Fielding, June 1995
1086
1087 Note that CDATA attributes are limited by the LITLEN
1088 capacity (1024 in the current version of html.decl),
1089 so that URIs in HTML have a bounded length.
1090
1091 -->
1092
1093
1094 <!--========= DTD "Macros" =====================-->
1095
1096 <!ENTITY % heading "H1|H2|H3|H4|H5|H6">
1097
1098 <!ENTITY % list " UL | OL | DIR | MENU " >
1099
1100 <!ENTITY % attrs -- common attributes for elements --
1101 "lang NAME #IMPLIED -- RFC 1766 language tag --">
1102 <!--or CDATA?-->
1103
1104 <!--======= Character mnemonic entities =================-->
1105
1106 <!ENTITY % ISOlat1 PUBLIC
1107 "ISO 8879-1986//ENTITIES Added Latin 1//EN//HTML">
1108 %ISOlat1;
1109
1110 <!ENTITY amp CDATA "&#38;" -- ampersand -->
1111 <!ENTITY gt CDATA "&#62;" -- greater than -->
1112 <!ENTITY lt CDATA "&#60;" -- less than -->
1113 <!ENTITY quot CDATA "&#34;" -- double quote -->
1114
1122 <!--========= SGML Document Access (SDA) Parameter Entities =====-->
1123
1124 <!-- HTML 2.0 contains SGML Document Access (SDA) fixed attributes
1125 in support of easy transformation to the International Committee
1126 for Accessible Document Design (ICADD) DTD
1127 "-//EC-USA-CDA/ICADD//DTD ICADD22//EN".
1128 ICADD applications are designed to support usable access to
1129 structured information by print-impaired individuals through
1130 Braille, large print and voice synthesis. For more information on
1131 SDA & ICADD:
1132 - ISO 12083:1993, Annex A.8, Facilities for Braille,
1133 large print and computer voice
1134 - ICADD ListServ
1135 <ICADD%ASUACAD.BITNET@ARIZVM1.ccit.arizona.edu>
1136 - Usenet news group bit.listserv.easi
1137 - Recording for the Blind, +1 800 221 4792
1138 -->
1139
1140 <!ENTITY % SDAFORM "SDAFORM CDATA #FIXED"
1141 -- one to one mapping -->
1142 <!ENTITY % SDARULE "SDARULE CDATA #FIXED"
1143 -- context-sensitive mapping -->
1144 <!ENTITY % SDAPREF "SDAPREF CDATA #FIXED"
1145 -- generated text prefix -->
1146 <!ENTITY % SDASUFF "SDASUFF CDATA #FIXED"
1147 -- generated text suffix -->
1148 <!ENTITY % SDASUSP "SDASUSP NAME #FIXED"
1149 -- suspend transform process -->
1150
1151
1152 <!--========= Entities for bidirectional text (BIDI) =========-->
1153
1154 <![ %HTML.Bidi [
1155
1156 <!ENTITY % HTMLbidi PUBLIC
1157 "-//IETF//ENTITIES bidi//EN//HTML">
1158 %HTMLbidi;
1159
1160 <!-- The following, together with the BIDI element, allow dealing
1161 with text containing BIDI controls in the context of marked
1162 up text. -->
1163 <!ENTITY lretag "<BIDI DIR=LTR>" >
1164 <!ENTITY rletag "<BIDI DIR=RTL>" >
1165 <!ENTITY lrotag "<BIDI FORCE=GAD>" >
1166 <!ENTITY rlotag "<BIDI FORCE=DAG>" >
1167 <!ENTITY pdftag "</BIDI>" >
1168 <!SHORTREF bidi
1169 "&#LRE;" lretag
1178 "&#RLE;" rletag
1179 "&#PDF;" pdftag
1180 "&#LRO;" lrotag
1181 "&#RLO;" rlotag
1182 >
1183
1184 ]]>
1185
1186 <!--========== Text Markup =====================-->
1187
1188 <!ENTITY % loc.values "DATE | TIME | MEASURE | MONEY">
1189
1190 <![ %HTML.Highlighting [
1191
1192 <!ENTITY % font " TT | B | I ">
1193
1194 <!ENTITY % phrase "EM | STRONG | CODE | SAMP | KBD | VAR | CITE | Q">
1195
1196 <![ %HTML.Bidi [
1197 <!ENTITY % text "#PCDATA | A | IMG | BR | %phrase | %font | LANG | BIDI | %loc.values">
1198 ]]>
1199
1200 <!ENTITY % text "#PCDATA | A | IMG | BR | %phrase | %font | LANG | %loc.values">
1201
1202 <!ELEMENT (%font;|%phrase) - - (%text)*>
1203 <!ATTLIST ( TT | CODE | SAMP | KBD | VAR )
1204 %attrs;
1205 %SDAFORM; "Lit"
1206 >
1207 <!ATTLIST ( B | STRONG )
1208 %attrs;
1209 %SDAFORM; "B"
1210 >
1211 <!ATTLIST ( I | EM | CITE )
1212 %attrs;
1213 %SDAFORM; "It"
1214 >
1215
1216 <!-- <TT> Typewriter text -->
1217 <!-- <B> Bold text -->
1218 <!-- <I> Italic text -->
1219 <!-- <EM> Emphasized phrase -->
1220 <!-- <STRONG> Strong emphasis -->
1221 <!-- <CODE> Source code phrase -->
1222 <!-- <SAMP> Sample text or characters -->
1223 <!-- <KBD> Keyboard phrase, e.g. user input -->
1224 <!-- <VAR> Variable phrase or substitutable -->
1225 <!-- <CITE> Name or title of cited work -->
1226
1234 <!ENTITY % pre.content "#PCDATA | A | HR | BR | %font | %phrase | LANG">
1235
1236 ]]>
1237
1238 <![ %HTML.Bidi [
1239
1240 <!ENTITY % text "#PCDATA | A | IMG | BR | LANG | BIDI | Q | %loc.values">
1241
1242 <!-- Should the BIDI element have an SDAFORM attr.? Which? -->
1243 <!ELEMENT BIDI - - (%text)+>
1244 <!ATTLIST BIDI
1245 %attrs;
1246 DIR (ltr|rtl) #IMPLIED
1247 FORCE (gad|dag) #IMPLIED
1248 >
1249
1250 <!-- <BIDI> Control bidirectional text -->
1251
1252 ]]>
1253
1254 <!ENTITY % text "#PCDATA | A | IMG | BR | LANG | Q | %loc.values">
1255
1256 <!ELEMENT BR - O EMPTY>
1257 <!ATTLIST BR
1258 %SDAPREF; "&#RE;"
1259 >
1260
1261 <!-- <BR> Line break -->
1262
1263 <!-- Should the LANG element have an SDAFORM attr.? Which? -->
1264 <!ELEMENT LANG - - (%text)*>
1265 <!ATTLIST LANG
1266 %attrs;
1267 >
1268
1269 <!-- <LANG> Container for language attribute -->
1270
1271 <!ATTLIST Q
1272 %attrs;
1273 %SDAFORM; "It" -- to be verified --
1274 >
1275 <!-- <Q> Short quotation -->
1276
1277 <!--========= Date, time, measures and monetary amounts ===========-->
1278
1279 <!ELEMENT (%loc.values) - O EMPTY>
1280 <!ATTLIST DATE
1281 %attrs;
1290 CALENDAR CDATA #IMPLIED
1291 VALUE CDATA #REQUIRED
1292 >
1293 <!ATTLIST TIME
1294 %attrs;
1295 ZONE CDATA #IMPLIED
1296 VALUE CDATA #REQUIRED
1297 >
1298 <!ATTLIST MEASURE
1299 %attrs;
1300 TYPE (weight|count|length|area|volume) #REQUIRED
1301 UNIT CDATA #IMPLIED
1302 VALUE CDATA #REQUIRED
1303 >
1304 <!ATTLIST MONEY
1305 %attrs;
1306 UNIT CDATA #REQUIRED
1307 VALUE CDATA #REQUIRED
1308 >
1309
1310 <!-- DATE A date value -->
1311 <!-- TIME A time value -->
1312 <!-- MEASURE A measurement (length, weight, etc) -->
1313 <!-- MONEY A monetary amount -->
1314
1315 <!--========= Link Markup ======================-->
1316
1317 <!ENTITY % linkType "NAME">
1318
1319 <!ENTITY % linkExtraAttributes
1320 "REL %linkType #IMPLIED
1321 REV %linkType #IMPLIED
1322 URN CDATA #IMPLIED
1323 TITLE CDATA #IMPLIED
1324 METHODS NAMES #IMPLIED
1325 CHARSET NAME #IMPLIED
1326 ">
1327
1328 <![ %HTML.Recommended [
1329 <!ENTITY % A.content "(%text)*"
1330 -- <H1><a name="xxx">Heading</a></H1>
1331 is preferred to
1332 <a name="xxx"><H1>Heading</H1></a>
1333 -->
1334 ]]>
1335
1336 <!ENTITY % A.content "(%heading|%text)*">
1337
1346 <!ELEMENT A - - %A.content -(A)>
1347 <!ATTLIST A
1348 %attrs;
1349 HREF %URI #IMPLIED
1350 NAME CDATA #IMPLIED
1351 %linkExtraAttributes;
1352 %SDAPREF; "<Anchor: #AttList>"
1353 >
1354 <!-- <A> Anchor; source/destination of link -->
1355 <!-- <A NAME="..."> Name of this anchor -->
1356 <!-- <A HREF="..."> Address of link destination -->
1357 <!-- <A URN="..."> Permanent address of destination -->
1358 <!-- <A REL=...> Relationship to destination -->
1359 <!-- <A REV=...> Relationship of destination to this -->
1360 <!-- <A TITLE="..."> Title of destination (advisory) -->
1361 <!-- <A CHARSET="..."> Charset of destination (advisory) -->
1362 <!-- <A METHODS="..."> Operations on destination (advisory) -->
1363
1364
1365 <!--========== Images ==========================-->
1366
1367 <!ELEMENT IMG - O EMPTY>
1368 <!ATTLIST IMG
1369 %attrs;
1370 SRC %URI; #REQUIRED
1371 ALT CDATA #IMPLIED
1372 ALIGN (top|middle|bottom) #IMPLIED
1373 ISMAP (ISMAP) #IMPLIED
1374 %SDAPREF; "<Fig><?SDATrans Img: #AttList>#AttVal(Alt)</Fig>"
1375 >
1376
1377 <!-- <IMG> Image; icon, glyph or illustration -->
1378 <!-- <IMG SRC="..."> Address of image object -->
1379 <!-- <IMG ALT="..."> Textual alternative -->
1380 <!-- <IMG ALIGN=...> Position relative to text -->
1381 <!-- <IMG ISMAP> Each pixel can be a link -->
1382
1383 <!--========== Paragraphs=======================-->
1384
1385 <!ELEMENT P - O (%text)*>
1386 <!ATTLIST P
1387 %attrs;
1388 %SDAFORM; "Para"
1389 >
1390
1391 <!-- <P> Paragraph -->
1392
1402 <!--========== Headings, Titles, Sections ===============-->
1403
1404 <!ELEMENT HR - O EMPTY>
1405 <!ATTLIST HR
1406 %attrs;
1407 %SDAPREF; "&#RE;&#RE;"
1408 >
1409
1410 <!-- <HR> Horizontal rule -->
1411
1412 <!ELEMENT ( %heading ) - - (%text;)*>
1413 <!ATTLIST H1
1414 %attrs;
1415 %SDAFORM; "H1"
1416 >
1417 <!ATTLIST H2
1418 %attrs;
1419 %SDAFORM; "H2"
1420 >
1421 <!ATTLIST H3
1422 %attrs;
1423 %SDAFORM; "H3"
1424 >
1425 <!ATTLIST H4
1426 %attrs;
1427 %SDAFORM; "H4"
1428 >
1429 <!ATTLIST H5
1430 %attrs;
1431 %SDAFORM; "H5"
1432 >
1433 <!ATTLIST H6
1434 %attrs;
1435 %SDAFORM; "H6"
1436 >
1437
1438 <!-- <H1> Heading, level 1 -->
1439 <!-- <H2> Heading, level 2 -->
1440 <!-- <H3> Heading, level 3 -->
1441 <!-- <H4> Heading, level 4 -->
1442 <!-- <H5> Heading, level 5 -->
1443 <!-- <H6> Heading, level 6 -->
1444
1445
1446 <!--========== Text Flows ======================-->
1447
1448 <![ %HTML.Forms [
1449 <!ENTITY % block.forms "BLOCKQUOTE | FORM | ISINDEX">
1458 ]]>
1459
1460 <!ENTITY % block.forms "BLOCKQUOTE">
1461
1462 <![ %HTML.Deprecated [
1463 <!ENTITY % preformatted "PRE | XMP | LISTING">
1464 ]]>
1465
1466 <!ENTITY % preformatted "PRE">
1467
1468 <!ENTITY % block "P | %list | DL
1469 | %preformatted
1470 | %block.forms">
1471
1472 <!ENTITY % flow "(%text|%block)*">
1473
1474 <!ENTITY % pre.content "#PCDATA | A | HR | BR | LANG">
1475 <!ELEMENT PRE - - (%pre.content)*>
1476 <!ATTLIST PRE
1477 %attrs;
1478 WIDTH NUMBER #implied
1479 %SDAFORM; "Lit"
1480 >
1481
1482 <!-- <PRE> Preformatted text -->
1483 <!-- <PRE WIDTH=...> Maximum characters per line -->
1484
1485 <![ %HTML.Deprecated [
1486
1487 <!ENTITY % literal "CDATA"
1488 -- historical, non-conforming parsing mode where
1489 the only markup signal is the end tag
1490 in full
1491 -->
1492
1493 <!ELEMENT (XMP|LISTING) - - %literal>
1494 <!ATTLIST XMP
1495 %attrs;
1496 %SDAFORM; "Lit"
1497 %SDAPREF; "Example:&#RE;"
1498 >
1499 <!ATTLIST LISTING
1500 %attrs;
1501 %SDAFORM; "Lit"
1502 %SDAPREF; "Listing:&#RE;"
1503 >
1504
1505 <!-- <XMP> Example section -->
1514 <!-- <LISTING> Computer listing -->
1515
1516 <!ELEMENT PLAINTEXT - O %literal>
1517 <!-- <PLAINTEXT> Plain text passage -->
1518
1519 <!ATTLIST PLAINTEXT
1520 %attrs;
1521 %SDAFORM; "Lit"
1522 >
1523 ]]>
1524
1525
1526 <!--========== Lists ==================-->
1527
1528 <!ELEMENT DL - - (DT | DD)+>
1529 <!ATTLIST DL
1530 %attrs;
1531 COMPACT (COMPACT) #IMPLIED
1532 %SDAFORM; "List"
1533 %SDAPREF; "Definition List:"
1534 >
1535
1536 <!ELEMENT DT - O (%text)*>
1537 <!ATTLIST DT
1538 %attrs;
1539 %SDAFORM; "Term"
1540 >
1541
1542 <!ELEMENT DD - O %flow>
1543 <!ATTLIST DD
1544 %attrs;
1545 %SDAFORM; "LItem"
1546 >
1547
1548 <!-- <DL> Definition list, or glossary -->
1549 <!-- <DL COMPACT> Compact style list -->
1550 <!-- <DT> Term in definition list -->
1551 <!-- <DD> Definition of term -->
1552
1553 <!ELEMENT (OL|UL) - - (LI)+>
1554 <!ATTLIST OL
1555 %attrs;
1556 COMPACT (COMPACT) #IMPLIED
1557 %SDAFORM; "List"
1558 >
1559 <!ATTLIST UL
1560 %attrs;
1561 COMPACT (COMPACT) #IMPLIED
1570 %SDAFORM; "List"
1571 >
1572 <!-- <UL> Unordered list -->
1573 <!-- <UL COMPACT> Compact list style -->
1574 <!-- <OL> Ordered, or numbered list -->
1575 <!-- <OL COMPACT> Compact list style -->
1576
1577
1578 <!ELEMENT (DIR|MENU) - - (LI)+ -(%block)>
1579 <!ATTLIST DIR
1580 %attrs;
1581 COMPACT (COMPACT) #IMPLIED
1582 %SDAFORM; "List"
1583 %SDAPREF; "<LHead>Directory</LHead>"
1584 >
1585 <!ATTLIST MENU
1586 %attrs;
1587 COMPACT (COMPACT) #IMPLIED
1588 %SDAFORM; "List"
1589 %SDAPREF; "<LHead>Menu</LHead>"
1590 >
1591
1592 <!-- <DIR> Directory list -->
1593 <!-- <DIR COMPACT> Compact list style -->
1594 <!-- <MENU> Menu list -->
1595 <!-- <MENU COMPACT> Compact list style -->
1596
1597 <!ELEMENT LI - O %flow>
1598 <!ATTLIST LI
1599 %attrs;
1600 %SDAFORM; "LItem"
1601 >
1602
1603 <!-- <LI> List item -->
1604
1605 <!--========== Document Body ===================-->
1606
1607 <![ %HTML.Recommended [
1608 <!ENTITY % body.content "(%heading|%block|HR|ADDRESS|IMG)*"
1609 -- <h1>Heading</h1>
1610 <p>Text ...
1611 is preferred to
1612 <h1>Heading</h1>
1613 Text ...
1614 -->
1615 ]]>
1616
1617 <!ENTITY % body.content "(%heading | %text | %block | HR | ADDRESS)*">
1618
1626 <!ELEMENT BODY O O %body.content>
1627 <!ATTLIST BODY
1628 %attrs;
1629 >
1630
1631 <!-- <BODY> Document body -->
1632
1633 <!ELEMENT BLOCKQUOTE - - %body.content>
1634 <!ATTLIST BLOCKQUOTE
1635 %attrs;
1636 %SDAFORM; "BQ"
1637 >
1638
1639 <!-- <BLOCKQUOTE> Quoted passage -->
1640
1641 <!ELEMENT ADDRESS - - (%text|P)*>
1642 <!ATTLIST ADDRESS
1643 %attrs;
1644 %SDAFORM; "Lit"
1645 %SDAPREF; "Address:&#RE;"
1646 >
1647
1648 <!-- <ADDRESS> Address, signature, or byline -->
1649
1650
1651 <!--======= Forms ====================-->
1652
1653 <![ %HTML.Forms [
1654
1655 <!ELEMENT FORM - - %body.content -(FORM) +(INPUT|SELECT|TEXTAREA)>
1656 <!ATTLIST FORM
1657 %attrs;
1658 ACTION %URI #IMPLIED
1659 METHOD (%HTTP-Method) GET
1660 ENCTYPE %Content-Type; "application/x-www-form-urlencoded"
1661 ACCEPT-CHARSET CDATA #IMPLIED
1662 %SDAPREF; "<Para>Form:</Para>"
1663 %SDASUFF; "<Para>Form End.</Para>"
1664 >
1665
1666 <!-- <FORM> Fill-out or data-entry form -->
1667 <!-- <FORM ACTION="..."> Address for completed form -->
1668 <!-- <FORM METHOD=...> Method of submitting form -->
1669 <!-- <FORM ENCTYPE="..."> Representation of form data -->
1670
1671 <!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX |
1672 RADIO | SUBMIT | RESET |
1673 IMAGE | HIDDEN | DATE |
1682 TIME | MEASURE | MONEY |
1683 FILE)">
1684 <!ELEMENT INPUT - O EMPTY>
1685 <!ATTLIST INPUT
1686 %attrs;
1687 TYPE %InputType TEXT
1688 NAME CDATA #IMPLIED
1689 VALUE CDATA #IMPLIED
1690 SRC %URI #IMPLIED
1691 CHECKED (CHECKED) #IMPLIED
1692 SIZE CDATA #IMPLIED
1693 MAXLENGTH NUMBER #IMPLIED
1694 ALIGN (top|middle|bottom) #IMPLIED
1695 ACCEPT CDATA #IMPLIED --list of content types --
1696 %SDAPREF; "Input: "
1697 >
1698
1699 <!-- <INPUT> Form input datum -->
1700 <!-- <INPUT TYPE=...> Type of input interaction -->
1701 <!-- <INPUT NAME=...> Name of form datum -->
1702 <!-- <INPUT VALUE="..."> Default/initial/selected value -->
1703 <!-- <INPUT SRC="..."> Address of image -->
1704 <!-- <INPUT CHECKED> Initial state is "on" -->
1705 <!-- <INPUT SIZE=...> Field size hint -->
1706 <!-- <INPUT MAXLENGTH=...> Data length maximum -->
1707 <!-- <INPUT ALIGN=...> Image alignment -->
1708
1709 <!ELEMENT SELECT - - (OPTION+) -(INPUT|SELECT|TEXTAREA)>
1710 <!ATTLIST SELECT
1711 %attrs;
1712 NAME CDATA #REQUIRED
1713 SIZE NUMBER #IMPLIED
1714 MULTIPLE (MULTIPLE) #IMPLIED
1715 %SDAFORM; "List"
1716 %SDAPREF;
1717 "<LHead>Select #AttVal(Multiple)</LHead>"
1718 >
1719
1720 <!-- <SELECT> Selection of option(s) -->
1721 <!-- <SELECT NAME=...> Name of form datum -->
1722 <!-- <SELECT SIZE=...> Options displayed at a time -->
1723 <!-- <SELECT MULTIPLE> Multiple selections allowed -->
1724
1725 <!ELEMENT OPTION - O (#PCDATA)*>
1726 <!ATTLIST OPTION
1727 %attrs;
1728 SELECTED (SELECTED) #IMPLIED
1729 VALUE CDATA #IMPLIED
1738 %SDAFORM; "LItem"
1739 %SDAPREF;
1740 "Option: #AttVal(Value) #AttVal(Selected)"
1741 >
1742
1743 <!-- <OPTION> A selection option -->
1744 <!-- <OPTION SELECTED> Initial state -->
1745 <!-- <OPTION VALUE="..."> Form datum value for this option-->
1746
1747 <!ELEMENT TEXTAREA - - (#PCDATA)* -(INPUT|SELECT|TEXTAREA)>
1748 <!ATTLIST TEXTAREA
1749 %attrs;
1750 NAME CDATA #REQUIRED
1751 ROWS NUMBER #REQUIRED
1752 COLS NUMBER #REQUIRED
1753 %SDAFORM; "Para"
1754 %SDAPREF; "Input Text -- #AttVal(Name): "
1755 >
1756
1757 <!-- <TEXTAREA> An area for text input -->
1758 <!-- <TEXTAREA NAME=...> Name of form datum -->
1759 <!-- <TEXTAREA ROWS=...> Height of area -->
1760 <!-- <TEXTAREA COLS=...> Width of area -->
1761
1762 ]]>
1763
1764
1765 <!--======= Document Head ======================-->
1766
1767 <![ %HTML.Recommended [
1768 <!ENTITY % head.extra "">
1769 ]]>
1770 <!ENTITY % head.extra "& NEXTID?">
1771
1772 <!ENTITY % head.content "TITLE & ISINDEX? & BASE? %head.extra">
1773
1774 <!ELEMENT HEAD O O (%head.content) +(META|LINK)>
1775
1776 <!-- <HEAD> Document head -->
1777
1778 <!ELEMENT TITLE - - (#PCDATA)*>
1779 <!ATTLIST TITLE
1780 %attrs;
1781 %SDAFORM; "Ti" >
1782
1783 <!-- <TITLE> Title of document -->
1784
1785 <!ELEMENT LINK - O EMPTY>
1794 <!ATTLIST LINK
1795 %attrs;
1796 HREF %URI #REQUIRED
1797 %linkExtraAttributes;
1798 %SDAPREF; "Linked to : #AttVal (TITLE) (URN) (HREF)>" >
1799
1800 <!-- <LINK> Link from this document -->
1801 <!-- <LINK HREF="..."> Address of link destination -->
1802 <!-- <LINK URN="..."> Lasting name of destination -->
1803 <!-- <LINK REL=...> Relationship to destination -->
1804 <!-- <LINK REV=...> Relationship of destination to this -->
1805 <!-- <LINK TITLE="..."> Title of destination (advisory) -->
1806 <!-- <LINK CHARSET="..."> Charset of destination (advisory) -->
1807 <!-- <LINK METHODS="..."> Operations allowed (advisory) -->
1808
1809 <!ELEMENT ISINDEX - O EMPTY>
1810 <!ATTLIST ISINDEX
1811 %attrs;
1812 %SDAPREF;
1813 "<Para>[Document is indexed/searchable.]</Para>">
1814
1815 <!-- <ISINDEX> Document is a searchable index -->
1816
1817 <!ELEMENT BASE - O EMPTY>
1818 <!ATTLIST BASE
1819 HREF %URI; #REQUIRED >
1820
1821 <!-- <BASE> Base context document -->
1822 <!-- <BASE HREF="..."> Address for this document -->
1823
1824 <!ELEMENT NEXTID - O EMPTY>
1825 <!ATTLIST NEXTID
1826 N CDATA #REQUIRED >
1827
1828 <!-- <NEXTID> Next ID to use for link name -->
1829 <!-- <NEXTID N=...> Next ID to use for link name -->
1830
1831 <!ELEMENT META - O EMPTY>
1832 <!ATTLIST META
1833 HTTP-EQUIV NAME #IMPLIED
1834 NAME NAME #IMPLIED
1835 CONTENT CDATA #REQUIRED
1836 >
1837
1838 <!-- <META> Generic Metainformation -->
1839 <!-- <META HTTP-EQUIV=...> HTTP response header name -->
1840 <!-- <META NAME=...> Metainformation name -->
1841 <!-- <META CONTENT="..."> Associated information -->
1842
1843
1850 <!--======= Document Structure =================-->
1851
1852 <![ %HTML.Deprecated [
1853 <!ENTITY % html.content "HEAD, BODY, PLAINTEXT?">
1854 ]]>
1855 <!ENTITY % html.content "HEAD, BODY">
1856
1857 <!ELEMENT HTML O O (%html.content)>
1858 <!ENTITY % version.attr "VERSION CDATA #FIXED '%HTML.Version;'">
1859
1860 <!ATTLIST HTML
1861 %attrs;
1862 %version.attr;
1863 %SDAFORM; "Book"
1864 >
1865
1866 <!-- <HTML> HTML Document -->
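
As an informal illustration of the default head content model declared
above (TITLE & ISINDEX? & BASE? & NEXTID?, with META and LINK admitted
by inclusion), the following Python sketch checks a list of child
element names. The function name check_head and the wording of its
messages are hypothetical; the sketch assumes the default %head.extra
(not the HTML.Recommended variant) and is not a substitute for a
validating SGML parser.

```python
from collections import Counter

# Content model from the DTD: TITLE & ISINDEX? & BASE? & NEXTID?
# The "&" connector allows the members in any order. META and LINK
# are inclusions, so they may appear any number of times.
REQUIRED = {"TITLE"}
OPTIONAL = {"ISINDEX", "BASE", "NEXTID"}
INCLUDED = {"META", "LINK"}


def check_head(children):
    """Return a list of problems found in a HEAD's child element names."""
    # Element names are case-insensitive (NAMECASE GENERAL YES).
    names = [c.upper() for c in children]
    counts = Counter(names)
    problems = []
    for name in sorted(REQUIRED):
        if counts[name] == 0:
            problems.append("missing " + name)
    for name in sorted(REQUIRED | OPTIONAL):
        if counts[name] > 1:
            problems.append("repeated " + name)
    for name in sorted(set(names) - REQUIRED - OPTIONAL - INCLUDED):
        problems.append("not allowed in HEAD: " + name)
    return problems
```

For example, check_head(["title", "meta", "link", "meta"]) returns an
empty list, while check_head(["base"]) reports the missing TITLE.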
1867
1868
1869 7.2. SGML Declaration for HTML
1870
1871 <!SGML "ISO 8879:1986"
1872 --
1873 SGML Declaration for HyperText Markup Language version 2.x
1874 (HTML 2.x).
1875
1876 --
1877
1878 CHARSET
1879 BASESET "ISO Registration Number 176//CHARSET
1880 ISO/IEC 10646-1:1993 UCS-2 with
1881 implementation level 3//ESC 2/5 2/15 4/5"
1882 DESCSET 0 9 UNUSED
1883 9 2 9
1884 11 2 UNUSED
1885 13 1 13
1886 14 18 UNUSED
1887 32 95 32
1888 127 1 UNUSED
1889 128 32 UNUSED
1890 160 65376 160
1891
1892
1893 CAPACITY SGMLREF
1894 TOTALCAP 150000
1895 GRPCAP 150000
1896 ENTCAP 150000
1897
1906 SCOPE DOCUMENT
1907 SYNTAX
1908 SHUNCHAR CONTROLS 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1909 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 127
1910 BASESET "ISO Registration Number 176//CHARSET
1911 ISO/IEC 10646-1:1993 UCS-2 with
1912 implementation level 3//ESC 2/5 2/15 4/5"
1913 DESCSET 0 65536 0
1914 FUNCTION
1915 RE 13
1916 RS 10
1917 SPACE 32
1918 TAB SEPCHAR 9
1919 EN-QUAD SEPCHAR 8192
1920 EM-QUAD SEPCHAR 8193
1921 EN-SPACE SEPCHAR 8194
1922 EM-SPACE SEPCHAR 8195
1923 T-P-E-SP SEPCHAR 8196
1924 F-P-E-SP SEPCHAR 8197
1925 S-P-E-SP SEPCHAR 8198
1926 FIG-SP SEPCHAR 8199
1927 PUNC-SP SEPCHAR 8200
1928 THIN-SP SEPCHAR 8201
1929 HAIR-SP SEPCHAR 8202
1930 Z-W-SP SEPCHAR 8203
1931 IDEO-SP SEPCHAR 12288
1932 LRE FUNCHAR 8234 -- LEFT-TO-RIGHT EMBEDDING --
1933 RLE FUNCHAR 8235 -- RIGHT-TO-LEFT EMBEDDING --
1934 PDF FUNCHAR 8236 -- POP DIRECTIONAL FORMATTING --
1935 LRO FUNCHAR 8237 -- LEFT-TO-RIGHT OVERRIDE --
1936 RLO FUNCHAR 8238 -- RIGHT-TO-LEFT OVERRIDE --
1937
1938 NAMING LCNMSTRT ""
1939 UCNMSTRT ""
1940 LCNMCHAR ".-"
1941 UCNMCHAR ".-"
1942 NAMECASE GENERAL YES
1943 ENTITY NO
1944 DELIM GENERAL SGMLREF
1945 SHORTREF SGMLREF
1946 "&#8234;" -- LEFT-TO-RIGHT EMBEDDING --
1947 "&#8235;" -- RIGHT-TO-LEFT EMBEDDING --
1948 "&#8236;" -- POP DIRECTIONAL FORMATTING --
1949 "&#8237;" -- LEFT-TO-RIGHT OVERRIDE --
1950 "&#8238;" -- RIGHT-TO-LEFT OVERRIDE --
1951 NAMES SGMLREF
1952 QUANTITY SGMLREF
1953 ATTSPLEN 2100
1962 LITLEN 1024
1963 NAMELEN 72 -- somewhat arbitrary; taken from
1964 internet line length conventions --
1965 PILEN 1024
1966 TAGLVL 100
1967 TAGLEN 2100
1968 GRPGTCNT 150
1969 GRPCNT 64
1970
1971 FEATURES
1972 MINIMIZE
1973 DATATAG NO
1974 OMITTAG YES
1975 RANK NO
1976 SHORTTAG YES
1977 LINK
1978 SIMPLE NO
1979 IMPLICIT NO
1980 EXPLICIT NO
1981 OTHER
1982 CONCUR NO
1983 SUBDOC NO
1984 FORMAL YES
1985 APPINFO "SDA" -- conforming SGML Document Access application
1986 --
1987 >
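
The arithmetic behind the CHARSET DESCSET in the declaration above can
be checked mechanically. The Python sketch below copies the nine
(start, count, mapping) rows from the declaration and confirms that
they tile the 65536 code positions of UCS-2 without gap or overlap.
The names DESCSET, check_contiguous and is_usable are hypothetical and
chosen only for this illustration.

```python
# (first code position, count, mapping) rows copied from the CHARSET
# DESCSET of the SGML declaration; None stands for UNUSED.
DESCSET = [
    (0, 9, None),
    (9, 2, 9),
    (11, 2, None),
    (13, 1, 13),
    (14, 18, None),
    (32, 95, 32),
    (127, 1, None),
    (128, 32, None),
    (160, 65376, 160),
]


def check_contiguous(descset, total=65536):
    """True if the ranges cover 0..total-1 with no gap or overlap."""
    expected = 0
    for start, count, _mapping in descset:
        if start != expected:
            return False
        expected = start + count
    return expected == total


def is_usable(code, descset=DESCSET):
    """True if the code position lies in a range that is not UNUSED."""
    for start, count, mapping in descset:
        if start <= code < start + count:
            return mapping is not None
    return False
```

Under these rows, positions 9, 10 and 13 (TAB, LF, CR) are usable,
while the remaining C0 controls, 127, and 128 through 159 are not.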
1988
1989
1990 7.3. Entity sets
1991
1992 7.3.1. ISO Latin 1 Character Entity Set
1993
1994 The following public text lists each of the characters specified in the
1995 Added Latin 1 entity set, along with its name, syntax for use, and
1996 description. This list is derived from ISO Standard 8879:1986//ENTITIES
1997 Added Latin 1//EN. HTML includes the entire entity set, and adds enti-
1998 ties for all missing characters in the right part of ISO-8859-1.
1999
2000 <!-- (C) International Organization for Standardization 1986
2001 Permission to copy in any form is granted for use with
2002 conforming SGML systems and applications as defined in
2003 ISO 8879, provided this notice is included in all copies.
2004 -->
2005 <!-- Character entity set. Typical invocation:
2006 <!ENTITY % ISOlat1 PUBLIC
2007 "ISO 8879-1986//ENTITIES Added Latin 1//EN//HTML">
2008 %ISOlat1;
2009 -->
2010
2018 <!ENTITY nbsp CDATA "&#160;" -- no-break space -->
2019 <!ENTITY iexcl CDATA "&#161;" -- inverted exclamation mark -->
2020 <!ENTITY cent CDATA "&#162;" -- cent sign -->
2021 <!ENTITY pound CDATA "&#163;" -- pound sterling sign -->
2022 <!ENTITY curren CDATA "&#164;" -- general currency sign -->
2023 <!ENTITY yen CDATA "&#165;" -- yen sign -->
2024 <!ENTITY brvbar CDATA "&#166;" -- broken (vertical) bar -->
2025 <!ENTITY sect CDATA "&#167;" -- section sign -->
2026 <!ENTITY uml CDATA "&#168;" -- umlaut (dieresis) -->
2027 <!ENTITY copy CDATA "&#169;" -- copyright sign -->
2028 <!ENTITY ordf CDATA "&#170;" -- ordinal indicator, feminine -->
2029 <!ENTITY laquo CDATA "&#171;" -- angle quotation mark, left -->
2030 <!ENTITY not CDATA "&#172;" -- not sign -->
2031 <!ENTITY shy CDATA "&#173;" -- soft hyphen -->
2032 <!ENTITY reg CDATA "&#174;" -- registered sign -->
2033 <!ENTITY macr CDATA "&#175;" -- macron -->
2034 <!ENTITY deg CDATA "&#176;" -- degree sign -->
2035 <!ENTITY plusmn CDATA "&#177;" -- plus-or-minus sign -->
2036 <!ENTITY sup2 CDATA "&#178;" -- superscript two -->
2037 <!ENTITY sup3 CDATA "&#179;" -- superscript three -->
2038 <!ENTITY acute CDATA "&#180;" -- acute accent -->
2039 <!ENTITY micro CDATA "&#181;" -- micro sign -->
2040 <!ENTITY para CDATA "&#182;" -- pilcrow (paragraph sign) -->
2041 <!ENTITY middot CDATA "&#183;" -- middle dot -->
2042 <!ENTITY cedil CDATA "&#184;" -- cedilla -->
2043 <!ENTITY sup1 CDATA "&#185;" -- superscript one -->
2044 <!ENTITY ordm CDATA "&#186;" -- ordinal indicator, masculine -->
2045 <!ENTITY raquo CDATA "&#187;" -- angle quotation mark, right -->
2046 <!ENTITY frac14 CDATA "&#188;" -- fraction one-quarter -->
2047 <!ENTITY frac12 CDATA "&#189;" -- fraction one-half -->
2048 <!ENTITY frac34 CDATA "&#190;" -- fraction three-quarters -->
2049 <!ENTITY iquest CDATA "&#191;" -- inverted question mark -->
2050 <!ENTITY Agrave CDATA "&#192;" -- capital A, grave accent -->
2051 <!ENTITY Aacute CDATA "&#193;" -- capital A, acute accent -->
2052 <!ENTITY Acirc CDATA "&#194;" -- capital A, circumflex accent -->
2053 <!ENTITY Atilde CDATA "&#195;" -- capital A, tilde -->
2054 <!ENTITY Auml CDATA "&#196;" -- capital A, dieresis or umlaut mark -->
2055 <!ENTITY Aring CDATA "&#197;" -- capital A, ring -->
2056 <!ENTITY AElig CDATA "&#198;" -- capital AE diphthong (ligature) -->
2057 <!ENTITY Ccedil CDATA "&#199;" -- capital C, cedilla -->
2058 <!ENTITY Egrave CDATA "&#200;" -- capital E, grave accent -->
2059 <!ENTITY Eacute CDATA "&#201;" -- capital E, acute accent -->
2060 <!ENTITY Ecirc CDATA "&#202;" -- capital E, circumflex accent -->
2061 <!ENTITY Euml CDATA "&#203;" -- capital E, dieresis or umlaut mark -->
2062 <!ENTITY Igrave CDATA "&#204;" -- capital I, grave accent -->
2063 <!ENTITY Iacute CDATA "&#205;" -- capital I, acute accent -->
2064 <!ENTITY Icirc CDATA "&#206;" -- capital I, circumflex accent -->
2065 <!ENTITY Iuml CDATA "&#207;" -- capital I, dieresis or umlaut mark -->
2074 <!ENTITY ETH CDATA "&#208;" -- capital Eth, Icelandic -->
2075 <!ENTITY Ntilde CDATA "&#209;" -- capital N, tilde -->
2076 <!ENTITY Ograve CDATA "&#210;" -- capital O, grave accent -->
2077 <!ENTITY Oacute CDATA "&#211;" -- capital O, acute accent -->
2078 <!ENTITY Ocirc CDATA "&#212;" -- capital O, circumflex accent -->
2079 <!ENTITY Otilde CDATA "&#213;" -- capital O, tilde -->
2080 <!ENTITY Ouml CDATA "&#214;" -- capital O, dieresis or umlaut mark -->
2081 <!ENTITY times CDATA "&#215;" -- multiply sign -->
2082 <!ENTITY Oslash CDATA "&#216;" -- capital O, slash -->
2083 <!ENTITY Ugrave CDATA "&#217;" -- capital U, grave accent -->
2084 <!ENTITY Uacute CDATA "&#218;" -- capital U, acute accent -->
2085 <!ENTITY Ucirc CDATA "&#219;" -- capital U, circumflex accent -->
2086 <!ENTITY Uuml CDATA "&#220;" -- capital U, dieresis or umlaut mark -->
2087 <!ENTITY Yacute CDATA "&#221;" -- capital Y, acute accent -->
2088 <!ENTITY THORN CDATA "&#222;" -- capital Thorn, Icelandic -->
2089 <!ENTITY szlig CDATA "&#223;" -- small sharp s, German (sz ligature) -->
2090 <!ENTITY agrave CDATA "&#224;" -- small a, grave accent -->
2091 <!ENTITY aacute CDATA "&#225;" -- small a, acute accent -->
2092 <!ENTITY acirc CDATA "&#226;" -- small a, circumflex accent -->
2093 <!ENTITY atilde CDATA "&#227;" -- small a, tilde -->
2094 <!ENTITY auml CDATA "&#228;" -- small a, dieresis or umlaut mark -->
2095 <!ENTITY aring CDATA "&#229;" -- small a, ring -->
2096 <!ENTITY aelig CDATA "&#230;" -- small ae diphthong (ligature) -->
2097 <!ENTITY ccedil CDATA "&#231;" -- small c, cedilla -->
2098 <!ENTITY egrave CDATA "&#232;" -- small e, grave accent -->
2099 <!ENTITY eacute CDATA "&#233;" -- small e, acute accent -->
2100 <!ENTITY ecirc CDATA "&#234;" -- small e, circumflex accent -->
2101 <!ENTITY euml CDATA "&#235;" -- small e, dieresis or umlaut mark -->
2102 <!ENTITY igrave CDATA "&#236;" -- small i, grave accent -->
2103 <!ENTITY iacute CDATA "&#237;" -- small i, acute accent -->
2104 <!ENTITY icirc CDATA "&#238;" -- small i, circumflex accent -->
2105 <!ENTITY iuml CDATA "&#239;" -- small i, dieresis or umlaut mark -->
2106 <!ENTITY eth CDATA "&#240;" -- small eth, Icelandic -->
2107 <!ENTITY ntilde CDATA "&#241;" -- small n, tilde -->
2108 <!ENTITY ograve CDATA "&#242;" -- small o, grave accent -->
2109 <!ENTITY oacute CDATA "&#243;" -- small o, acute accent -->
2110 <!ENTITY ocirc CDATA "&#244;" -- small o, circumflex accent -->
2111 <!ENTITY otilde CDATA "&#245;" -- small o, tilde -->
2112 <!ENTITY ouml CDATA "&#246;" -- small o, dieresis or umlaut mark -->
2113 <!ENTITY divide CDATA "&#247;" -- divide sign -->
2114 <!ENTITY oslash CDATA "&#248;" -- small o, slash -->
2115 <!ENTITY ugrave CDATA "&#249;" -- small u, grave accent -->
2116 <!ENTITY uacute CDATA "&#250;" -- small u, acute accent -->
2117 <!ENTITY ucirc CDATA "&#251;" -- small u, circumflex accent -->
2118 <!ENTITY uuml CDATA "&#252;" -- small u, dieresis or umlaut mark -->
2119 <!ENTITY yacute CDATA "&#253;" -- small y, acute accent -->
2120 <!ENTITY thorn CDATA "&#254;" -- small thorn, Icelandic -->
2121 <!ENTITY yuml CDATA "&#255;" -- small y, dieresis or umlaut mark -->
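
To show how the declarations above are meant to be used, the following
Python sketch expands named and numeric character references in a
string. Only four of the declared entities are copied into the table,
and the names DECLARED, expand and agrees_with_python are hypothetical.
Entity names are matched case-sensitively, in line with NAMECASE
ENTITY NO in the SGML declaration. As a cross-check, the table is
compared with Python's standard html.entities module, which is assumed
here to carry the same Latin 1 assignments.

```python
import re
from html.entities import name2codepoint

# A few declarations copied from the entity set above.
DECLARED = {"nbsp": 160, "copy": 169, "eacute": 233, "yuml": 255}

REFERENCE = re.compile(r"&(#[0-9]+|[A-Za-z][A-Za-z0-9]*);")


def expand(text, table=DECLARED):
    """Replace &name; and &#N; references; unknown names are kept as-is."""
    def repl(match):
        ref = match.group(1)
        if ref.startswith("#"):
            return chr(int(ref[1:]))
        if ref in table:
            return chr(table[ref])
        return match.group(0)
    return REFERENCE.sub(repl, text)


def agrees_with_python(table=DECLARED):
    """True if every entry matches the standard library's entity table."""
    return all(name2codepoint.get(n) == cp for n, cp in table.items())
```

With this table, both "&eacute;" and "&#233;" expand to the same
character, and a reference to an undeclared name is left untouched.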
2122
2123
2130 7.3.2. BIDI Entity Set
2131
2132 The following entity set is sufficient to support the full Unicode
2133 bidirectional algorithm.
2134
2135 <!-- Character entity set. Typical invocation:
2136 <!ENTITY % HTMLbidi PUBLIC
2137 "-//IETF//ENTITIES bidi//EN//HTML">
2138 %HTMLbidi;
2139 -->
2140 <!ENTITY zwnj SDATA "&#8204;"--=zero width non-joiner-->
2141 <!ENTITY zwj SDATA "&#8205;"--=zero width joiner-->
2142 <!ENTITY lrm SDATA "&#8206;"--=left-to-right mark-->
2143 <!ENTITY rlm SDATA "&#8207;"--=right-to-left mark-->
2144 <!ENTITY lre SDATA "&#8234;"--=left-to-right embedding-->
2145 <!ENTITY rle SDATA "&#8235;"--=right-to-left embedding-->
2146 <!ENTITY pdf SDATA "&#8236;"--=pop directional formatting-->
2147 <!ENTITY lro SDATA "&#8237;"--=left-to-right override-->
2148 <!ENTITY rlo SDATA "&#8238;"--=right-to-left override-->
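
The decimal values in the entity set above can be related to their
Unicode code points and character names with a short Python sketch.
The table BIDI repeats the nine declarations; the function name
describe is hypothetical, and the names printed come from the Unicode
data shipped with Python rather than from this document.

```python
import unicodedata

# Entity name -> decimal code position, as declared above.
BIDI = {
    "zwnj": 8204, "zwj": 8205, "lrm": 8206, "rlm": 8207,
    "lre": 8234, "rle": 8235, "pdf": 8236, "lro": 8237, "rlo": 8238,
}


def describe(name):
    """Return 'U+XXXX CHARACTER NAME' for a declared bidi entity."""
    cp = BIDI[name]
    return "U+%04X %s" % (cp, unicodedata.name(chr(cp)))
```

For instance, describe("lrm") yields "U+200E LEFT-TO-RIGHT MARK".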
2149
2150
2151 Bibliography
2152
2153 [BRYAN88] M. Bryan, "SGML -- An Author's Guide to the Standard
2154 Generalized Markup Language", Addison-Wesley, Reading,
2155 1988.
2156
2157 [ERCS] Extended Reference Concrete Syntax for SGML.
2158 <http://www.sgmlopen.org/sgml/docs/ercs/ercs-
2159 home.html>
2160
2161 [FILE-UPLOAD] E. Nebel and L. Masinter, "Form-based File Upload in
2162 HTML", Work in progress (draft-ietf-html-
2163 fileupload-02.txt), Xerox Corporation, April 1995.
2164
2165 [GOLD90] C. F. Goldfarb, "The SGML Handbook", Y. Rubinsky, Ed.,
2166 Oxford University Press, 1990.
2167
2168 [HTML-2] T. Berners-Lee and D. Connolly, "Hypertext Markup Lan-
2169 guage - 2.0", Work in progress (draft-ietf-html-
2170 spec-02.txt), MIT/W3C, May 1995.
2171
2172 [HTTP] T. Berners-Lee, R. T. Fielding, and H. Frystyk
2173 Nielsen, "Hypertext Transfer Protocol - HTTP/1.0",
2174 Work in progress (draft-ietf-http-v10-spec-00.ps),
2175 MIT, UC Irvine, CERN, March 1995.
2176
2177 [ISO-639] ISO 639:1988. Codes pour la représentation des noms de
2186 langue. Technical content in
2187 <http://www.sil.org/sgml/iso639a.html>
2188
2189 [ISO-CD-639-2] ISO CD 639-2:1992. Technical content in
2190 <http://www.sil.org/sgml/iso639-2a.html>
2191
2192 [ISO-1000] ISO 1000:1992. Unités SI et recommandations pour
2193 l'emploi de leurs multiples et de certaines autres
2194 unités.
2195
2196 [ISO-3166] ISO 3166:1993. Codes pour la représentation des noms
2197 de pays.
2198
2199 [ISO-4217] ISO 4217:1990. Codes pour la représentation des
2200 monnaies et types des fonds.
2201
2202 [ISO-8601] ISO 8601:1988. Éléments de données et formats
2203 d'échange -- Échange d'information -- Représentation
2204 de la date et de l'heure.
2205
2206 [ISO-8859-1] ISO 8859-1:1987. International Standard -- Informa-
2207 tion Processing -- 8-bit Single-Byte Coded Graphic
2208 Character Sets -- Part 1: Latin Alphabet No. 1.
2209
2210 [ISO-8879] ISO 8879:1986. International Standard -- Information
2211 Processing -- Text and Office Systems -- Standard Gen-
2212 eralized Markup Language (SGML).
2213
2214 [ISO-10646] ISO/IEC 10646-1:1993. International Standard -- Infor-
2215 mation technology -- Universal Multiple-Octet Coded
2216 Character Set (UCS) -- Part 1: Architecture and Basic
2217 Multilingual Plane.
2218
2219 [NICOL] G.T. Nicol, "The Multilingual World Wide Web", Elec-
2220 tronic Book Technologies, 1995,
2221 <http://www.ebt.com/docs/multling.html>
2222
2223 [RFC1468] J. Murai, M. Crispin and E. van der Poel, "Japanese
2224 Character Encoding for Internet Messages", RFC 1468,
2225 Keio University, Panda Programming, June 1993.
2226
2227 [RFC1521] N. Borenstein and N. Freed, "MIME (Multipurpose Inter-
2228 net Mail Extensions) Part One: Mechanisms for Specify-
2229 ing and Describing the Format of Internet Message Bod-
2230 ies", RFC 1521, Bellcore, Innosoft, September 1993.
2231
2232 [RFC1590] J. Postel, "Media Type Registration Procedure", RFC
2233 1590, USC/ISI, March 1994.
2234
2242 [RFC1738] T. Berners-Lee, L. Masinter, and M. McCahill, "Uniform
2243 Resource Locators (URL)", RFC 1738, CERN, Xerox PARC,
2244 University of Minnesota, October 1994.
2245
2246 [RFC1766] H. Alvestrand, "Tags for the Identification of
2247 Languages", RFC 1766, UNINETT, March 1995.
2248
2249 [SQ91] SoftQuad, "The SGML Primer", 3rd ed., SoftQuad Inc.,
2250 1991.
2251
2252 [TAKADA] Toshihiro Takada, "Multilingual Information Exchange
2253 through the World-Wide Web", Computer Networks and
2254 ISDN Systems, Vol. 27, No. 2, Nov. 1994 , p. 235-241.
2255
2256 [TEI] TEI Guidelines for Electronic Text Encoding and
2257 Interchange. <http://etext.virginia.edu/TEI.html>
2258
2259 [UNICODE] The Unicode Consortium, "The Unicode Standard --
2260 Worldwide Character Encoding -- Version 1.0", Addison-
2261 Wesley, Volume 1, 1991, Volume 2, 1992. The BIDI
2262 algorithm is in appendix A of volume 1, with correc-
2263 tions in appendix D of volume 2.
2264
2265 [VANH90] E. van Herwijnen, "Practical SGML", Kluwer Academic
2266 Publishers Group, Norwell and Dordrecht, 1990.
2267
2268 Authors' Addresses
2269
2270 François Yergeau
2271 Alis Technologies
2272 3410, rue Griffith
2273 Montréal QC H4T 1A7
2274 Canada
2275
2276 Tel: +1 (514) 738-9171
2277 Fax: +1 (514) 342-0318
2278 EMail: yergeau@alis.ca
2279
2280
2281 Gavin Thomas Nicol
2282 Electronic Book Technologies, Japan
2283 1-29-9 Tsurumaki,
2284 Setagaya-ku,
2285 Tokyo
2286 Japan
2287
2288 Tel + Fax: +81-3-3706-7351
2289 EMail: gtn@ebt.com, gtn@twics.co.jp
2290
2291
2298 Glenn Adams
2299 Stonehand
2300 118 Magazine Street
2301 Cambridge, MA 02139
2302 U.S.A.
2303
2304 Tel: +1 (617) 864-5524
2305 Fax: +1 (617) 864-4965
2306 EMail: glenn@stonehand.com
2307
2308
2309 Martin J. Duerst
2310 Multimedia-Laboratory
2311 Department of Computer Science
2312 University of Zurich
2313 Winterthurerstrasse 190
2314 CH-8057 Zurich
2315 Switzerland
2316
2317 Tel: +41 1 257 43 16
2318 Fax: +41 1 363 00 35
2319 E-mail: mduerst@ifi.unizh.ch