2     Network Working Group F. Yergeau
3     Internet Draft G. Nicol
4     <draft-ietf-html-i18n-01.txt> G. Adams
5     Expires 30 March 1996 M. Duerst
6     25 September 1995
7    
8    
9     Internationalization of the Hypertext Markup Language
10    
11    
12     Status of this Memo
13    
14     This document is an Internet-Draft. Internet-Drafts are working doc-
15     uments of the Internet Engineering Task Force (IETF), its areas, and
16     its working groups. Note that other groups may also distribute work-
17     ing documents as Internet-Drafts.
18    
19     Internet-Drafts are draft documents valid for a maximum of six
20     months. Internet-Drafts may be updated, replaced, or obsoleted by
21     other documents at any time. It is not appropriate to use Internet-
22     Drafts as reference material or to cite them other than as a "working
23     draft" or "work in progress".
24    
25     To learn the current status of any Internet-Draft, please check the
26     1id-abstracts.txt listing contained in the Internet-Drafts Shadow
27     Directories on ds.internic.net (US East Coast), nic.nordu.net
28     (Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
29     Rim).
30    
31     Distribution of this document is unlimited. Please send comments to
32     the HTML working group (HTML-WG) of the Internet Engineering Task
33     Force (IETF) at <html-wg@oclc.org>. Discussions of the group are
34     archived at URL: http://www.acl.lanl.gov/HTML_WG/archives.html.
35    
36    
37     Abstract
38    
39     The Hypertext Markup Language (HTML) is a simple markup language used
40     to create hypertext documents that are platform independent. Ini-
41     tially, the application of HTML on the World Wide Web was seriously
42     restricted by its reliance on the ISO-8859-1 coded character set,
43     which is appropriate only for Western European languages. Despite
44     this restriction, HTML has been widely used with other languages,
45     using other coded character sets or character encodings, through var-
46     ious ad hoc extensions to the language.
47    
48     This document is meant to address the issue of the internationaliza-
49     tion of HTML by extending the specification of HTML 2.0 and giving
50    
51    
52    
53     Expires 30 March 1996 [Page 1]
54    
55     Internet Draft HTML internationalization 25 September 1995
56    
57    
58     additional recommendations for proper internationalisation support.
59     A foremost consideration is to make sure that HTML remains a valid
60     application of SGML, while enabling its use in all languages of the
61     world.
62    
63     The "text/html; version=2.1" Internet Media Type [RFC1590] and MIME
64     Content Type [RFC1521] is defined by this specification, taken
65     together with the HTML 2.0 specification [HTML-2].
66    
67    
68     Table of contents
69    
1. Introduction
   1.1. Scope
   1.2. Conformance
2. The document character set
   2.1. Reference processing model
   2.2. The HTML 2.1 document character set
   2.3. Undisplayable characters
3. Language tags
4. Additional entities, attributes and elements
   4.1. Full Latin-1 entity set
   4.2. Markup for language-dependent presentation
5. Forms
   5.1. DTD additions
   5.2. Form submission
6. Miscellaneous
7. HTML public text
   7.1. HTML DTD
   7.2. SGML declaration for HTML
   7.3. Entity sets
      7.3.1. ISO Latin 1 character entity set
Bibliography
Authors' Addresses
92    
93    
94     1. Introduction
95    
96     The Hypertext Markup Language (HTML) is a simple markup language used
97     to create hypertext documents that are platform independent. Ini-
98     tially, the application of HTML on the World Wide Web was seriously
99     restricted by its reliance on the ISO-8859-1 coded character set,
100     which is appropriate only for Western European languages. Despite
101     this restriction, HTML has been widely used with other languages,
102     using other coded character sets or character encodings, through var-
103     ious ad hoc extensions to the language [TAKADA].
104    
105     This document is meant to address the issue of the
114     internationalization of HTML by extending the specification of HTML
115     2.0 and giving additional recommendations for proper internationali-
116     sation support. It is in good part based on a paper by one of the
117     authors on multilingualism on the WWW [NICOL]. A foremost considera-
118     tion is to make sure that HTML remains a valid application of SGML,
119     while enabling its use in all languages of the world.
120    
121     The specific issues addressed are the SGML document character set to
122     be used for HTML, the proper treatment of the charset parameter asso-
123     ciated with the "text/html" content type and the specification of
124     language tags and additional entities.
125    
126    
127     1.1 Scope
128    
129     HTML has been in use by the World-Wide Web (WWW) global information
130     initiative since 1990. This specification extends the capabilities
131     of HTML 2.0 (RFC xxx), primarily by removing the restriction to the
132     ISO-8859-1 coded character set [ISO-8859-1]. Together with the HTML
133     2.0 specification, it defines a new version of HTML to be known as
134     "HTML 2.1".
135    
136     HTML is an application of ISO Standard 8879:1986, Information Pro-
137     cessing Text and Office Systems -- Standard Generalized Markup Lan-
138     guage (SGML) [ISO-8879]. The HTML Document Type Definition (DTD) is a
139     formal definition of the HTML syntax in terms of SGML. This specifi-
140     cation amends the DTD of HTML 2.0 in order to make it applicable to
141     documents encompassing a character repertoire much larger than that
142     of ISO-8859-1, while still remaining SGML conformant.
143    
Together with the HTML 2.0 specification, this specification also
145     defines HTML as an Internet Media Type [RFC1590] and MIME Content
146     Type [RFC1521] called "text/html", or "text/html; version=2.1". As
147     such, it defines the semantics of the HTML syntax and how that syntax
148     should be interpreted by user agents.
149    
150    
151     1.2 Conformance
152    
153     This specification governs the syntax of HTML documents and aspects
154     of the behavior of HTML user agents.
155    
156     1.2.1 Documents
157    
158     A document is a conforming HTML document if:
159    
160     * It is a conforming SGML document, and it conforms to the HTML DTD
161     (see 7.1, "HTML DTD").
162    
170     * It conforms to the application conventions in this specification.
171     For example, the value of the HREF attribute of the <A> element
172     must conform to the URI syntax.
173    
1.2.2 User agents
175    
176     An HTML user agent conforms to this specification if:
177    
178     * It parses the characters of an HTML document into data characters
179     and markup according to SGML [ISO-8879].
180    
181     NOTE -- In the interest of robustness and extensibility,
182     there are a number of widely deployed conventions for han-
183     dling non-conforming documents. See section 4.2.1 of the
184     HTML 2.0 specification [HTML-2], "Undeclared Markup Error
185     Handling" for details.
186    
187     * It supports at least the ISO-8859-1 character encoding scheme and
188     processes each character in the ISO Latin Alphabet No. 1 as speci-
189     fied in section 6.1 of [HTML-2].
190    
191     To ensure interoperability and proper support for at least
192     ISO-8859-1 in an environment where character encoding schemes
193     other than ISO-8859-1 are present, user agents must correctly
194     interpret the charset parameter accompanying an HTML document
195     received from the network.
196    
Furthermore, conforming user agents are required, at a minimum, to parse
numeric character references correctly within the range of the
Basic Multilingual Plane (BMP) of ISO 10646-1 [ISO-10646].
200    
201     NOTE -- To support non-western writing systems, HTML user
202     agents are encouraged to support `UNICODE-1-1' or similar
203     character encoding schemes and as much of the character
204     repertoire of [ISO-10646] as is practical.
205    
206     * It behaves identically for documents whose parsed token sequences
207     are identical.
208    
209     For example, comments and the whitespace in tags disappear during
210     tokenization, and hence they do not influence the behavior of con-
211     forming user agents.
212    
213     * It allows the user to traverse (or at least attempt to traverse,
214     resources permitting) all hyperlinks from <A> elements in an HTML
215     document.
216    
217     An HTML user agent is a level 2 user agent if, additionally:
218    
226     * It allows the user to express all form field values specified in
227     an HTML document and to (attempt to) submit the values as requests
228     to information services.
229    
230     2. The document character set
231    
232     2.1. Reference processing model
233    
234     This overview explains the reference processing model used for HTML
235     2.1, and in particular the SGML concept of a document character set.
An actual implementation may differ widely in its internal workings
from the model given below, but should behave as described to an
outside observer.
239    
240     Because there are various widely differing encodings of text, SGML
241     does not directly address the question of how characters are encoded
242     e.g. in a file. SGML views the characters as a single set (called a
243     "character repertoire"), and a "code set" that assigns an integer
244     number (known as "character number") to each character in the reper-
245     toire. The document character set declaration defines what each of
246     the character numbers represents [GOLD90, p. 451]. In most cases, an
247     SGML DTD and all documents that refer to it have a single document
248     character set, and all markup and data characters are part of this
249     set.
250    
251     HTML, as an application of SGML, does not directly address the ques-
252     tion of how characters are encoded as octets in external representa-
253     tions such as files. This is deferred to mechanisms external to HTML,
254     such as the HTTP protocol, or MIME for electronic mail.
255    
256     For the HTTP protocol [HTTP], the way characters are encoded is
257     defined by the "charset" parameter[1] of the "Content-Type" field of
258     the header of an HTTP response. For example, to indicate that the
259     transmitted document is encoded in the "JIS" encoding of Japanese
260     [RFC1468], the header will contain the following line:
261    
262     Content-Type: text/html; charset=ISO-2022-JP
263    
264     _________________________
265     [1] The term "charset" in MIME is used to designate a
266     character encoding, rather than a coded character set
267     as the term may suggest. A character encoding is a
268     mapping (possibly many-to-one) of a sequence of octets
269     to a sequence of characters taken from one or more
270     character repertoires. A coded character set is a map-
271     ping between individual bit patterns and individual
272     characters from a single character repertoire.
273    
282     The default charset parameter in case of the HTTP protocol is
283     ISO-8859-1 (the so-called "Latin-1" for Western European characters).
284     The HTTP protocol also defines a mechanism for the client to specify
285     the character encodings it can accept. Clients and servers are
286     strongly requested to use these mechanisms to assure correct trans-
287     mission and interpretation of any document. Provisions that can be
288     taken to help correct interpretation, even in cases where a server or
289     client do not yet use these mechanisms, are described in section 6.
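
As a minimal sketch of the behaviour described above, a receiver might
extract the charset parameter from the Content-Type field and fall back
to ISO-8859-1 when none is given. The helper names charset_of and
decode_body are illustrative assumptions, not part of this specification
or of HTTP; the header parsing borrows Python's standard email package.

```python
from email.message import Message

HTTP_DEFAULT_CHARSET = "iso-8859-1"  # default stated in the text above


def charset_of(content_type: str) -> str:
    """Return the charset parameter of a Content-Type value, lower-cased.

    Falls back to the HTTP default when the parameter is absent.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or HTTP_DEFAULT_CHARSET


def decode_body(content_type: str, body: bytes) -> str:
    """Decode the octets of a message body into characters."""
    return body.decode(charset_of(content_type))
```

A header of "text/html; charset=ISO-2022-JP" selects the ISO-2022-JP
decoder, while a bare "text/html" selects ISO-8859-1.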
290    
291     Similarly, if HTML documents are transferred by electronic mail, the
292     character encoding is defined by the "charset" parameter of the "Con-
293     tent-Type" MIME header line [RFC1521].
294    
If any other way of transferring and storing HTML documents is
defined or becomes popular, similar provisions should be made to
identify clearly the character encoding used and/or to use a single
default encoding capable of representing the widest range of
characters used in an international context.
300    
Whatever the external character encoding may be, the reference
processing model translates it to a representation of the document
character set specified in Section 2.2 before processing specific to
SGML/HTML. The reference processing model can be depicted as follows:
306    
[resource]->[decoder]->[entity ]->[ SGML ]->[application]->[display]
                       [manager]  [parser]
                           ^          |
                           |          |
                           +----------+
312    
313     The decoder is responsible for decoding the external representation
314     of the resource to a representation using the document character set.
315     The entity manager, the parser, and the application deal only with
316     characters of the document character set. A display-oriented part of
317     the application or the display machinery itself may again convert
318     characters represented in the document character set to some other
319     representation more suitable for their purpose. In any case, the
320     entity manager, the parser, and the application, as far as character
321     semantics are concerned, are using the HTML 2.1 document character
322     set only.
323    
324     An actual implementation may choose, or not, to translate the docu-
325     ment into some encoding of the document character set as described
326     above; the behaviour described by this reference processing model can
327     be achieved otherwise. This subject is well out of the scope of this
328     specification, however, and the reader is invited to consult the SGML
standard [ISO-8879] or an SGML handbook [BRYAN88] [GOLD90] [VANH90]
[SQ91] for further information.
339    
340     The most important consequence of this reference processing model is
341     that numeric character references are always resolved to the same
342     characters, whatever the external encoding actually used. For an
343     example, see Section 2.2.
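
The consequence stated above can be illustrated with a small sketch, in
which the decoder stage runs before numeric character references are
resolved. The function names are hypothetical, the regular expression
handles decimal references only, and no range checking against the
document character set is attempted.

```python
import re

NUMERIC_REF = re.compile(r"&#([0-9]+);")


def resolve_numeric_refs(text: str) -> str:
    """Replace each &#N; with character number N of the document character set."""
    return NUMERIC_REF.sub(lambda m: chr(int(m.group(1))), text)


def process(octets: bytes, charset: str) -> str:
    # Decoder stage: external encoding -> document character set.
    text = octets.decode(charset)
    # Parser stage: references are resolved after decoding, so the
    # external encoding plays no part in what a reference means.
    return resolve_numeric_refs(text)


source = "\u0418 = &#1048;"  # CYRILLIC CAPITAL LETTER I, then a reference to it
from_koi8 = process(source.encode("koi8-r"), "koi8-r")
from_8859_5 = process(source.encode("iso8859-5"), "iso8859-5")
```

Although the two octet streams differ, both calls yield the same
character sequence, because the reference is interpreted against the
document character set rather than the external encoding.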
344    
345     2.2. The HTML 2.1 document character set
346    
347     The document character set, in the SGML sense, of HTML 2.1 is the
348     Basic Multilingual Plane of ISO 10646:1993 [ISO-10646], also known as
349     UCS-2. This is code-by-code identical with the Unicode standard
350     [UNICODE]. The adoption of this document character set implies a
351     change in the SGML declaration specified in the HTML 2.0 specifica-
352     tion (section 9.5 of [HTML-2]). The change amounts to removing the
353     two BASESET specifications and their accompanying DESCSET declara-
354     tions, replacing them with the following declaration:
355    
BASESET  "ISO Registration Number 176//CHARSET
          ISO/IEC 10646-1:1993 UCS-2 with implementation level 3
          //ESC 2/5 2/15 4/5"
DESCSET  0     9     UNUSED
         9     2     9
         11    2     UNUSED
         13    1     13
         14    18    UNUSED
         32    95    32
         127   1     UNUSED
         128   32    UNUSED
         160   65374 160
368    
Making UCS-2 the document character set does not create non-
conformance of any expression, construct or document that conforms
to HTML 2.0. It does, however, make conforming certain constructs that
are not admissible in HTML 2.0. One consequence is that data characters
outside the repertoire of ISO-8859-1, but within that of UCS-2, become
valid SGML characters. Another is that the upper limit of the range
of numeric character references is extended from 255 to 65533[2];
thus, &#1048; is a valid reference to a "CYRILLIC CAPITAL LETTER I".
[ERCS] is a good source of information on Unicode and SGML, although
its scope and technical content differ greatly from this
specification.
_________________________
[2] 65533 (FFFD hexadecimal) is the last valid character
in UCS-2. 65534 (FFFE hexadecimal) is unassigned
and reserved as the byte-swapped version of ZERO WIDTH
NO-BREAK SPACE for byte-sex detection purposes.
65535 (FFFF hexadecimal) is unassigned.
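
To make the effect of the DESCSET declaration concrete, the following
sketch transcribes its rows into a table and checks whether a character
number is described or UNUSED. The table layout and the function name
is_document_character are illustrative assumptions; an SGML parser would
derive the same information from the declaration itself.

```python
# Rows of the DESCSET above:
# (first character number, count, base number or None for UNUSED)
DESCSET = [
    (0, 9, None),
    (9, 2, 9),
    (11, 2, None),
    (13, 1, 13),
    (14, 18, None),
    (32, 95, 32),
    (127, 1, None),
    (128, 32, None),
    (160, 65374, 160),
]


def is_document_character(number: int) -> bool:
    """True if the character number is described (not UNUSED) by DESCSET."""
    for start, count, base in DESCSET:
        if start <= number < start + count:
            return base is not None
    return False  # outside every declared range
```

The last row covers 160 through 65533, which is why &#1048; is accepted
while 65534 and 65535 are not.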
395    
396     ISO 10646-1:1993 is the most encompassing character set currently
397     existing, and there is no other character set that could take its
398     place as the document character set for HTML 2.1. Also, it is
399     expected that with future extensions of ISO 10646, this specification
400     may also be extended. If nevertheless for a specific application
401     there is a need to use characters outside this standard, this should
402     be done by avoiding any conflicts with present or future versions of
403     ISO 10646, i.e. by assigning these characters to a private zone.
404     Also, it should be borne in mind that such a use will be highly
405     unportable; in many cases, it may be better to use inline bitmaps.
406    
407     2.3. Undisplayable characters
408    
409     With the document character set being the full ISO 10646 BMP, the
410     possibility that a character cannot be displayed due to lack of
411     appropriate resources (fonts) cannot be avoided. Because there are
412     many different things that can be done in such a case, this document
does not recommend any specific behaviour. Depending on the
implementation, this may also be handled by the underlying display
system and not the application itself. The following considerations,
however, may be of help:
417    
- A clearly visible, but unobtrusive behaviour should be preferred.
Some documents may contain many characters that cannot be
rendered, and so showing an alert for each of them is not the right
thing to do.
422    
423     - In case a numeric representation of the missing character is
424     given, its hexadecimal (not decimal) form is to be preferred,
425     because this form is used in character set standards [ERCS].
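
One unobtrusive fallback in the spirit of the considerations above can
be sketched as follows. The "[U+XXXX]" placeholder format and the
displayable callback are assumptions made for illustration; this
document does not prescribe any particular behaviour.

```python
def render_with_fallback(text: str, displayable) -> str:
    """Substitute a hexadecimal placeholder for characters that cannot be shown.

    `displayable` is a caller-supplied predicate standing in for a font
    lookup; it takes one character and returns True or False.
    """
    out = []
    for ch in text:
        if displayable(ch):
            out.append(ch)
        else:
            # Hexadecimal rather than decimal, as suggested above.
            out.append("[U+%04X]" % ord(ch))
    return "".join(out)
```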
426    
427     3. Language tags
428    
429     Language tags can be used to control rendering of a marked up docu-
430     ment in various ways: character disambiguation, in cases where the
431     character encoding is not sufficient to resolve to a specific glyph;
432     quotation marks; hyphenation; ligatures; spacing; voice synthesis;
433     etc. Independently of rendering issues, language markup is useful as
434     content markup for purposes such as classification and searching.
435    
436     The language attribute, LANG, takes as its value a language tag that
437     identifies a natural language spoken, written, or otherwise conveyed
438     by human beings for communication of information to other human
439     beings. Computer languages are explicitly excluded.
440    
441     The syntax and registry of HTML language tags is the same as that
442    
450     defined by RFC 1766 [RFC1766]. In summary, a language tag is composed
451     of one or more parts: A primary language tag and a possibly empty
452     series of subtags:
453    
language-tag = primary-tag *( "-" subtag )
primary-tag  = 1*8ALPHA
subtag       = 1*8ALPHA
457    
458     Whitespace is not allowed within the tag and all tags are case-
459     insensitive. The namespace of language tags is administered by the
460     IANA. Example tags include:
461    
462     en, en-US, en-cockney, i-cherokee, x-pig-latin
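
The grammar above translates directly into a regular expression. The
following is a minimal syntactic check only, assuming ASCII letters for
ALPHA; it does not consult the IANA registry, and the function name is
hypothetical.

```python
import re

# primary-tag and each subtag are 1 to 8 letters, separated by "-".
LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")


def is_language_tag(value: str) -> bool:
    """True if `value` matches the language-tag syntax, with no whitespace."""
    return LANGUAGE_TAG.fullmatch(value) is not None
```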
463    
464     Two-letter primary-tags are reserved for ISO 639 language abbrevia-
465     tions [ISO-639], and three-letter primary-tags for the language
466     abbreviations of the "Ethnologue" [ETHNO] (the latter is in addition
467     to the requirements of RFC 1766). Any two-letter initial subtag is an
468     ISO 3166 country code [ISO-3166].
469    
470     In the context of HTML, a language tag is not to be interpreted as a
471     single token, as per RFC 1766, but as a hierarchy. For example, a
472     user agent that adjusts rendering according to language should con-
473     sider that it has a match when a language tag in a style sheet entry
474     matches the initial portion of the language tag of an element. An
475     exact match should be preferred. This interpretation allows an ele-
476     ment marked up as, for instance, "en-US" to trigger styles corre-
477     sponding to, in order of preference, US-English ("en-US") or 'plain'
478     or 'international' English ("en").
479    
480     NOTE -- using the language tag as a hierarchy does not
481     imply that all languages with a common prefix will be
482     understood by those fluent in one or more of those lan-
483     guages; it simply allows the user to request this commonal-
484     ity when it is true for that user.
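
The hierarchical interpretation described above can be sketched as a
prefix match on subtags that prefers the longest (and therefore the
exact) match. Comparing at subtag boundaries and the function name
best_language_match are assumptions of this sketch, not requirements of
the specification.

```python
def best_language_match(element_tag, available_tags):
    """Return the available tag that best matches the element's language tag.

    A candidate matches when its subtags equal the initial subtags of the
    element's tag; comparison is case-insensitive. Returns None if nothing
    matches.
    """
    wanted = element_tag.lower().split("-")
    best, best_len = None, 0
    for candidate in available_tags:
        parts = candidate.lower().split("-")
        if wanted[:len(parts)] == parts and len(parts) > best_len:
            best, best_len = candidate, len(parts)
    return best
```

With style entries for "fr", "en" and "en-us", an element tagged "en-US"
selects "en-us"; if only "fr" and "en" are present it selects "en".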
485    
486     Since any text can logically be assigned a language, almost all HTML
487     elements admit the LANG attribute. The DTD reflects this. It is
488     also intended that any new element introduced in later versions of
489     HTML will admit the LANG attribute, unless there is a good reason not
490     to do so.
491    
492     The rendering of elements is meant to be controlled (in part) by the
493     LANG attribute. Specific user preferences set within the browser
494     should override the value of the LANG attribute, which in turn over-
495     rides the value specified by the LANG attribute of any enclosing ele-
496     ment. If none of these are set, a suitable default, perhaps con-
497     trolled by the user's locale, should be used to control rendering.
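
The order of precedence in the paragraph above might be expressed as
follows. The argument names are hypothetical; lang_chain is assumed to
hold the LANG values from the element itself outward through its
enclosing elements, with None where the attribute is absent.

```python
def effective_language(user_preference, lang_chain, locale_default):
    """Pick the language that controls rendering of an element."""
    if user_preference:
        return user_preference      # user settings override markup
    for lang in lang_chain:
        if lang:
            return lang             # nearest LANG attribute wins
    return locale_default           # nothing set: fall back to a default
```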
498    
506     4. Additional entities, attributes and elements
507    
508     4.1. Full Latin-1 entity set
509    
510     According to the suggestion of section 14 of [HTML-2], the set of
511     Latin-1 entities is extended to cover the whole right part of
512     ISO-8859-1. The names of the entities are taken from the appendices
513     of [SGML]. A list is provided in section 7.3.1 of this specifica-
514     tion.
515    
516     4.2. Markup for language-dependent presentation
517    
518     For the correct presentation of text from certain languages (irre-
519     spective of formatting issues), some support in the form of addi-
520     tional entities and elements is needed. In particular, bidirectional
521     text (BIDI for short) requires markup in special circumstances where
ambiguities as to the directionality of some characters have to be
523     resolved. Plain text may contain this markup in the form of special-
524     purpose characters; in HTML, these are replaced by SGML markup to be
525     described below.
526    
527     This markup affects the ability to render BIDI text in a semantically
528     legible fashion. That is, without this special BIDI markup, cases
529     arise which would prevent *any* rendering whatsoever that reflected
530     the basic meaning of the text. It is for this reason that these spe-
531     cial characters were added to Unicode (and, thence, to ISO/IEC
532     10646). If it were possible to do reliable layout and rendering of
bidirectional text without them, they definitely would not have been
534     included in Unicode.
535    
536     First, a set of named character entities is added that allows partial
537     support of the Unicode bidirectional algorithm [UNICODE], plus some
538     help with languages requiring contextual analysis for rendering:
539    
540     <!ENTITY zwnj CDATA "&#8204;"--=zero width non-joiner-->
541     <!ENTITY zwj CDATA "&#8205;"--=zero width joiner-->
542     <!ENTITY lrm CDATA "&#8206;"--=left-to-right mark-->
543     <!ENTITY rlm CDATA "&#8207;"--=right-to-left mark-->
544    
The first two, zwnj and zwj, are used respectively to block or force
joining behavior in contexts in which joining would occur but should
not, or would not occur but should. For example, ARABIC LETTER HEH is used
548     to abbreviate "Hijri" (the Islamic calendrical system); however, the
549     isolated form of HEH looks like the digit five as employed in Arabic
550     script (actually based on Indic digits). In order to prevent one
551     from reading HEH as a final digit five in a year, the initial form of
552     HEH is used. However, there is no following context (i.e., a joining
553     letter) to which the HEH can join. Therefore, the ZWJ is used to
554    
562     provide that context. In Farsi texts, there are cases where a letter
563     that normally would join a subsequent letter in a cursive connection
564     does not. Here the ZWNJ is used.
565    
566     The other two, lrm and rlm, are used to disambiguate directionality
567     of directionally neutral characters, e.g., if you have a double quote
568     sitting between an Arabic and a Latin letter, then which direction
569     does the quote resolve to? These characters are like zero width
570     spaces which have a directional property (but no word/line break
571     property).
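
As an illustration, the four entities above can be looked up in the
Unicode character database shipped with Python. The ENTITIES table
repeats the character numbers from the declarations; the describe helper
is a hypothetical name used only for this sketch.

```python
import unicodedata

# Character numbers taken from the entity declarations above.
ENTITIES = {"zwnj": 8204, "zwj": 8205, "lrm": 8206, "rlm": 8207}


def describe(name):
    """Return (code point, Unicode name, bidirectional class) for an entity."""
    ch = chr(ENTITIES[name])
    return ("U+%04X" % ord(ch), unicodedata.name(ch), unicodedata.bidirectional(ch))
```

The marks lrm and rlm carry the strong classes L and R, which is what
lets them settle the direction of a neighbouring neutral character; zwnj
and zwj are classed BN and affect joining only.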
572    
573     Next, an attribute called DIR is introduced, restricted to the values
574     LTR and RTL and admitted by most elements. On block-type elements,
the DIR attribute indicates the base directionality of the text in
576     the block; if omitted it is inherited from the parent element. On
577     inline elements, it makes the element start a new embedding level; if
578     omitted the inline element does not start a new embedding level.
579     Embedding is used to handle nested directional runs; a common need
580     for the embedding characters is to handle text that has been pasted
581     from one bidi context to another, and the possibility of multiply
582     embedded pastings. Following is an example of a case where embedding
583     is needed, showing its effect:
584    
Given the following Latin (upper case) and Arabic (lower
case) letters in backing store with the specified embeddings
(LRE is shorthand for <SPAN DIR=LTR>, RLE for <SPAN
DIR=RTL> and PDF for </SPAN>):
589    
590     LRE A B RLE a b LRE C D PDF c d PDF E F PDF
591    
592     One gets the following rendering (with [] showing the
593     directional transitions):
594    
595     [ A B [ d c [ C D ] b a ] E F ]
596    
597     On the other hand, without these characters, e.g., with
598    
599     A B a b C D c d E F
600    
601     and a base level of LTR one gets the following rendering:
602    
603     [ A B [ b a ] C D [ d c ] E F ]
604    
605     Notice that b,a is on the left and d,c on the right unlike
606     the above case where the embedding levels are used. With-
607     out the embedding characters one has at most two levels: a
608     base directional level and a single counterflow directional
609     level.
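
The two renderings in the example above can be reproduced with a toy
model. This is not the Unicode bidirectional algorithm: it assumes every
token is either an embedding control (LRE, RLE, PDF) or a single letter
with a strong direction (upper case for left-to-right, lower case for
right-to-left), and it ignores neutrals and digits entirely. All names
are illustrative.

```python
LTR, RTL = "LTR", "RTL"


def parse(tokens):
    """Build a tree of nested lists: [direction, child, child, ...]."""
    root = [LTR]          # base level, assumed left-to-right
    stack = [root]
    for tok in tokens:
        if tok in ("LRE", "RLE"):
            node = [LTR if tok == "LRE" else RTL]
            stack[-1].append(node)
            stack.append(node)
        elif tok == "PDF":
            if len(stack) > 1:
                stack.pop()
        else:
            stack[-1].append(tok)
    return root


def render(node):
    """Return the letters of `node` in left-to-right visual order."""
    direction, children = node[0], node[1:]
    units = []   # each unit is a list of letters already in visual order
    run = []     # pending run of letters opposite to `direction`

    def flush():
        if run:
            # A right-to-left run is shown reversed; a left-to-right run
            # keeps its logical order.
            units.append(run[::-1] if direction == LTR else run[:])
            run.clear()

    for child in children:
        if isinstance(child, list):
            flush()
            units.append(render(child))      # nested embedding is atomic
        elif (LTR if child.isupper() else RTL) == direction:
            flush()
            units.append([child])
        else:
            run.append(child)
    flush()
    if direction == RTL:
        units.reverse()                      # units flow right to left
    return [letter for unit in units for letter in unit]


embedded = render(parse("LRE A B RLE a b LRE C D PDF c d PDF E F PDF".split()))
plain = render(parse("A B a b C D c d E F".split()))
```

Here embedded comes out as A B d c C D b a E F and plain as
A B b a C D d c E F, matching the two renderings shown above without the
bracket annotations.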
610    
A directional override feature is needed to deal with
619     unusual pieces of text in which directionality cannot be
620     resolved from context in an unambiguous fashion. For exam-
621     ple, in part numbers, formulas, telephone numbers, and
622     other similar pieces of text, it is difficult or impossible
623     to derive the directionality of numbers, punctuation, and
624     other neutrals from their context. To this effect, a new
625     element called BDO (BIDI override) is introduced, which
626     requires the DIR attribute to specify whether the override
627     is left-to-right or right-to-left.
628    
A few other additional elements are important to have for
proper language-dependent rendering. First, a generic
container is needed to carry the LANG and DIR attributes in
cases where no other element is appropriate; the SPAN
element is introduced for that purpose.
634    
635     Short quotations, and in particular the quotation marks
636     surrounding them, are typically rendered differently in
637     different languages and on platforms with different graphic
638     capabilities: "a quotation in English", `another, slightly
639     better one', ,,a quotation in German", << a quotation in
640     French >>. The <Q> element is introduced for that purpose.
641    
642     Many languages, including English, require superscripts for
643     proper rendering: "the XXth century" should have "th" in
644     superscript. The <SUP> element, and its sibling <SUB>, are
645     introduced to allow proper markup of such text. <SUP> and
646     <SUB> contents are restricted to PCDATA to avoid nesting
647     problems.
648    
649     Finally, in many languages text justification is much more
650     important than it is in Western languages, and justifies
651     markup. The ALIGN attribute, admitting values of LEFT,
652     RIGHT, CENTER and JUSTIFY, is added to a selection of
653     block-like elements where it makes sense.
654    
655     5. Forms
656    
657    
658     5.1. DTD additions
659    
660     It is natural to expect input in any language in forms, as they pro-
661     vide one of the few ways of obtaining user input. While this is pri-
662     marily a UI issue, there are some things that should be specified at
663     the HTML level to guide behavior and promote interoperability.
664    
665     To ensure interoperability, it is necessary for the user agent (and
674     the user) to have an indication of the character set(s) that the
675     server providing a form will be able to handle upon submission of the
676     filled-in form. Such an indication is provided by the ACCEPT-CHARSET
677     attribute of the FORM element, modeled on the HTTP Accept-Charset
678     header (see [HTTP]), which contains a space and/or comma delimited
679     list of character sets acceptable to the server. A user agent may
680     want to advise the user of the contents of this attribute, or
681     to restrict the user's ability to enter unacceptable characters.
682    
683     NOTE -- The list of character sets is to be interpreted as
684     an EXCLUSIVE-OR list; the server announces that it is ready
685     to accept any ONE of these character encoding schemes for
686     each part of a multipart entity.
687    
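A minimal sketch of how a user agent might consume an ACCEPT-CHARSET value follows. The function names are hypothetical, and taking the first listed character set that can represent the whole text is one possible policy assumed here; the specification only requires that a single listed character set be used for each part.

```python
import re

def parse_accept_charset(value):
    """Split a space and/or comma delimited ACCEPT-CHARSET value."""
    return [name for name in re.split(r"[\s,]+", value.strip()) if name]

def pick_charset(accept_charset, text):
    """Return the first listed charset that can encode all of text, or None."""
    for name in parse_accept_charset(accept_charset):
        try:
            text.encode(name)
        except (LookupError, UnicodeEncodeError):
            continue  # unknown to this codec library, or cannot represent text
        return name
    return None

print(pick_charset("ISO-8859-1, ISO-8859-5", "\u041f\u0440\u0438\u0432\u0435\u0442"))
```

A return value of None would be the point at which a user agent could warn the user or refuse the offending input.
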
688    
689     5.2. Form submission
690    
691     The HTML 2.0 form submission mechanism, based on the "application/x-
692     www-form-urlencoded" media type, is hopelessly broken with regard to
693     internationalization. In fact, since URLs are restricted to ASCII
694     characters, the mechanism is broken even for ISO-8859-1 text. Sec-
695     tion 2.2 of [RFC1738] specifies that octets may be encoded using the
696     "%HH" notation, but text submitted from a form is composed of charac-
697     ters, not octets. Lacking a specification of a character encoding
698     scheme, the "%HH" notation has no meaning.
699    
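The ambiguity can be made concrete with Python's standard urllib.parse module, used here purely as an illustration. The sample text and the choice of encodings are assumptions of this sketch.

```python
from urllib.parse import quote, unquote

text = "caf\u00e9"  # "cafe" with an acute accent on the final letter

# The same characters yield different %HH sequences under different encodings.
print(quote(text, encoding="iso-8859-1"))  # caf%E9
print(quote(text, encoding="utf-8"))       # caf%C3%A9

# The same %HH sequence yields different characters when decoded.
print(unquote("%E9", encoding="iso-8859-1"))  # U+00E9, Latin e with acute
print(unquote("%E9", encoding="iso-8859-5"))  # U+0449, Cyrillic shcha
```
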
700     A partial solution to this sorry state of affairs is to specify a
701     default character encoding scheme to be assumed when the GET method
702     of form submission is used. Specifying UCS-2 would break all exist-
703     ing forms, so the only sensible way is to designate ISO-8859-1. That
704     is, the encoded URL sent to submit a form by the GET method is to be
705     interpreted as a sequence of single-octet characters encoded accord-
706     ing to ISO-8859-1, and further encoded according to the scheme of
707     [RFC1738] (the "%HH" notation). This is clearly insufficient, so the
708     GET method of form submission is deprecated and should not be used in
709     future documents, despite the language of section XX of [HTML-2].
710    
711     A better solution is to add a MIME charset parameter to the Content-
712     Type header sent along with a POST method form submission, with the
713     understanding that the URL encoding of [RFC1738] is applied on top of
714     the specified character encoding, as a kind of implicit Content-
715     Transfer-Encoding. The default ISO-8859-1 is to be implied in the
716     absence of a charset parameter.
717    
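On the receiving side, the charset parameter might be honored along the lines of the following sketch. The function name and sample data are hypothetical; the header parsing relies on Python's standard email.message.Message class and the decoding on urllib.parse.parse_qs.

```python
from email.message import Message
from urllib.parse import parse_qs, quote

def decode_form_body(content_type, body):
    """Decode an application/x-www-form-urlencoded body into a dict of lists."""
    header = Message()
    header["Content-Type"] = content_type
    # In the absence of a charset parameter, ISO-8859-1 is implied.
    charset = header.get_param("charset", "iso-8859-1")
    return parse_qs(body, encoding=charset, errors="strict")

# Hypothetical submission of a Cyrillic value labeled as ISO-8859-5.
body = "city=" + quote("\u041c\u043e\u0441\u043a\u0432\u0430", encoding="iso-8859-5")
fields = decode_form_body(
    "application/x-www-form-urlencoded; charset=ISO-8859-5", body)
print(fields["city"][0])
```
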
718     The best solution is to use the "multipart/form-data" media type
719     described in [FILE-UPLOAD] with the POST method of form submission.
720     This mechanism encapsulates the value part of each name-value pair in
721     a body-part of a multipart MIME body that is sent as the HTTP entity;
730     each body part can be labeled with an appropriate Content-Type,
731     including if necessary a charset parameter that specifies the charac-
732     ter encoding scheme. The changes to the DTD necessary to support
733     this method of form submission have been incorporated in the DTD
734     included in this specification.
735    
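The per-part labeling can be sketched with the MIME classes of Python's standard email package. This is an illustration under stated assumptions, not the wire format a particular user agent produces: the field names and values are hypothetical, and the library chooses a Content-Transfer-Encoding for each part on its own.

```python
from email import message_from_string
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def build_form_data(fields):
    """Build a multipart/form-data entity from (name, value, charset) triples."""
    entity = MIMEMultipart("form-data")
    for name, value, charset in fields:
        part = MIMEText(value, "plain", charset)
        part.add_header("Content-Disposition", "form-data", name=name)
        entity.attach(part)
    return entity

entity = build_form_data([
    ("city", "Montr\u00e9al", "iso-8859-1"),
    ("capital", "\u041c\u043e\u0441\u043a\u0432\u0430", "iso-8859-5"),
])

# Round-trip through the serialized form and decode each part by its own label.
for part in message_from_string(entity.as_string()).get_payload():
    charset = part.get_content_charset()
    name = part.get_param("name", header="content-disposition")
    print(name, charset, part.get_payload(decode=True).decode(charset))
```
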
736     How the user agent determines the encoding of the text entered by the
737     user is outside the scope of this specification.
738    
739     6. Miscellaneous
740    
741     Proper interpretation of a text document requires that the character
742     encoding scheme be known. Current HTTP servers, however, do not gen-
743     erally include an appropriate charset parameter with the Content-Type
744     header, even when the encoding scheme is different from the default
745     ISO-8859-1. This is bad behaviour, and as such strongly discouraged,
746     but some preventive measures can be taken to minimize the detrimental
747     effects.
748    
749     In the case where a document is accessed from a hyperlink in an ori-
750     gin HTML document, a CHARSET attribute is added to the attribute list
751     of elements with link semantics (A and LINK), specifically by adding
752     it to the linkExtraAttributes entity. The value of that attribute is
753     to be considered a hint to the User Agent as to the character encod-
754     ing scheme used by the resource pointed to by the hyperlink; it
755     should be the appropriate value of the MIME charset parameter for
756     that resource.
757    
758     In any document, it may be wise to include an indication of the
759     encoding scheme like the following, as early as possible within the
760     HEAD of the document:
761    
762     <META HTTP-EQUIV="Content-Type"
763     CONTENT="text/html; charset=ISO-2022-JP">
764    
765     This is not foolproof, but will work if the encoding scheme is such
766     that ASCII characters stand for themselves at least until the META
767     element is parsed.
768    
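A user agent could look for such a META element roughly as follows. This is a simplified sketch: the function name, the regular expressions and the 1024-octet limit are assumptions, and a real parser would need to handle more markup variations.

```python
import re

META_TAG = re.compile(r"<meta\b[^>]*>", re.IGNORECASE)
CHARSET = re.compile(r"charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", re.IGNORECASE)

def sniff_meta_charset(data, limit=1024):
    """Look for a charset in a META Content-Type element near the start.

    The leading octets are read as ASCII, which only works when the real
    encoding leaves ASCII characters unchanged up to the META element.
    """
    head = data[:limit].decode("ascii", errors="replace")
    for tag in META_TAG.findall(head):
        if re.search(r"http-equiv\s*=\s*[\"']?content-type", tag, re.IGNORECASE):
            match = CHARSET.search(tag)
            if match:
                return match.group(1)
    return None

sample = (b'<HEAD><META HTTP-EQUIV="Content-Type"\n'
          b'CONTENT="text/html; charset=ISO-2022-JP"><TITLE>x</TITLE></HEAD>')
print(sniff_meta_charset(sample))
```
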
769     For definiteness, the "charset" parameter received from the source of
770     the document should be considered the most authoritative, followed in
771     order of preference by the contents of a META element such as the
772     above, and finally the CHARSET attribute of the anchor that was fol-
773     lowed (if any).
774    
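The order of preference can be written down as a small function. The names are hypothetical, and falling back to ISO-8859-1 when no indication is available follows the default mentioned earlier in this section.

```python
DEFAULT_CHARSET = "ISO-8859-1"

def effective_charset(http_charset=None, meta_charset=None, link_charset=None):
    """Pick a charset using the order of preference described above."""
    for candidate in (http_charset, meta_charset, link_charset):
        if candidate:
            return candidate
    return DEFAULT_CHARSET

print(effective_charset(meta_charset="ISO-2022-JP", link_charset="ISO-8859-5"))
```
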
775     When HTML text is transmitted directly in UCS-2
776     (charset=UNICODE-1-1), the question of byte order arises: does the
777     high-order byte of each two-byte character come first or second? For
786     definiteness, this specification recommends that UCS-2 be transmitted
787     in big-endian byte order (high order byte first), which corresponds
788     both to the established network byte order for two-byte quantities
789     and to the Unicode recommendation for serialized text data. Further-
790     more, to maximize chances of proper interpretation, it is recommended
791     that documents transmitted as UCS-2 always begin with a ZERO WIDTH
792     NO-BREAK SPACE character (hexadecimal FEFF), which, when byte-
793     reversed, becomes FFFE, a code point guaranteed never to be
794     assigned. Thus, a user agent receiving FFFE as the first two octets
795     of a text would know that the bytes have to be reversed for the
796     remainder of the text.
797    
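The byte-order check can be sketched as follows. Python has no dedicated UCS-2 codec, so this sketch assumes that the UTF-16 codecs are an acceptable stand-in, which holds for text without surrogate pairs. The function name is hypothetical.

```python
def decode_ucs2(data):
    """Decode UCS-2 octets, using a leading FEFF or FFFE to settle byte order."""
    if data[:2] == b"\xfe\xff":       # FEFF read in order: big-endian
        return data[2:].decode("utf-16-be")
    if data[:2] == b"\xff\xfe":       # FFFE seen: octets arrive byte-reversed
        return data[2:].decode("utf-16-le")
    return data.decode("utf-16-be")   # no marker: assume the recommended order

print(decode_ucs2(b"\xfe\xff\x00H\x00i"))
print(decode_ucs2(b"\xff\xfeH\x00i\x00"))
```

Both calls print the same two-letter string, since the second input is simply the first with each pair of octets swapped.
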
798     7. HTML Public Text
799    
800     7.1. HTML DTD
801    
802     <!-- html-2.1.dtd
803    
804     Document Type Definition for the HyperText Markup Language,
805     version 2.1 (HTML DTD)
806    
807     Last revised: 95/09/25
808    
809     Authors: Daniel W. Connolly <connolly@w3.org>
810     Francois Yergeau <yergeau@alis.com>
811     -->
812    
813     <!ENTITY % HTML.Version
814     "-//IETF//DTD HTML 2.1//EN"
815    
816     -- Typical usage:
817    
818     <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.1//EN">
819     <html>
820     ...
821     </html>
822     --
823     >
824    
825    
826     <!--============ Feature Test Entities ========================-->
827    
828     <!ENTITY % HTML.Recommended "IGNORE"
829     -- Certain features of the language are necessary for
830     compatibility with widespread usage, but they may
831     compromise the structural integrity of a document.
832     This feature test entity enables a more prescriptive
833     document type definition that eliminates
842     those features.
843     -->
844    
845     <![ %HTML.Recommended [
846     <!ENTITY % HTML.Deprecated "IGNORE">
847     ]]>
848    
849     <!ENTITY % HTML.Deprecated "INCLUDE"
850     -- Certain features of the language are necessary for
851     compatibility with earlier versions of the specification,
852     but they tend to be used and implemented inconsistently,
853     and their use is deprecated. This feature test entity
854     enables a document type definition that eliminates
855     these features.
856     -->
857    
858     <!ENTITY % HTML.Highlighting "INCLUDE"
859     -- Use this feature test entity to validate that a
860     document uses no highlighting tags, which may be
861     ignored on minimal implementations.
862     -->
863    
864     <!ENTITY % HTML.Forms "INCLUDE"
865     -- Use this feature test entity to validate that a document
866     contains no forms, which may not be supported in minimal
867     implementations
868     -->
869    
870     <!--============== Imported Names ==============================-->
871    
872     <!ENTITY % Content-Type "CDATA"
873     -- meaning an internet media type
874     (aka MIME content type, as per RFC1521)
875     -->
876    
877     <!ENTITY % HTTP-Method "GET | POST"
878     -- as per HTTP specification, in progress
879     -->
880    
881     <!--========= DTD "Macros" =====================-->
882    
883     <!ENTITY % heading "H1|H2|H3|H4|H5|H6">
884    
885     <!ENTITY % list " UL | OL | DIR | MENU " >
886    
887     <!ENTITY % attrs -- common attributes for elements --
888     "LANG NAME #IMPLIED -- RFC 1766 language tag --
889     DIR (ltr|rtl) #IMPLIED -- text directionality --">
890    
898     <!ENTITY % just -- an attribute for text justification --
899     "ALIGN (left|right|center|justify) #IMPLIED">
900    
901     <!--======= Character mnemonic entities =================-->
902    
903     <!ENTITY % ISOlat1 PUBLIC
904     "ISO 8879-1986//ENTITIES Added Latin 1//EN//HTML">
905     %ISOlat1;
906    
907     <!--Entities for markup significant characters -->
908     <!ENTITY amp CDATA "&#38;" -- ampersand -->
909     <!ENTITY gt CDATA "&#62;" -- greater than -->
910     <!ENTITY lt CDATA "&#60;" -- less than -->
911     <!ENTITY quot CDATA "&#34;" -- double quote -->
912    
913     <!--Entities for language-dependent presentation (BIDI and contextual analysis) -->
914     <!ENTITY zwnj CDATA "&#8204;" -- zero width non-joiner -->
915     <!ENTITY zwj CDATA "&#8205;" -- zero width joiner -->
916     <!ENTITY lrm CDATA "&#8206;" -- left-to-right mark -->
917     <!ENTITY rlm CDATA "&#8207;" -- right-to-left mark -->
918    
919     <!--========= SGML Document Access (SDA) Parameter Entities =====-->
920    
921     <!-- HTML 2.0 contains SGML Document Access (SDA) fixed attributes
922     in support of easy transformation to the International Committee
923     for Accessible Document Design (ICADD) DTD
924     "-//EC-USA-CDA/ICADD//DTD ICADD22//EN".
925     ICADD applications are designed to support usable access to
926     structured information by print-impaired individuals through
927     Braille, large print and voice synthesis. For more information on
928     SDA & ICADD:
929     - ISO 12083:1993, Annex A.8, Facilities for Braille,
930     large print and computer voice
931     - ICADD ListServ
932     <ICADD%ASUACAD.BITNET@ARIZVM1.ccit.arizona.edu>
933     - Usenet news group bit.listserv.easi
934     - Recording for the Blind, +1 800 221 4792
935     -->
936    
937     <!ENTITY % SDAFORM "SDAFORM CDATA #FIXED"
938     -- one to one mapping -->
939     <!ENTITY % SDARULE "SDARULE CDATA #FIXED"
940     -- context-sensitive mapping -->
941     <!ENTITY % SDAPREF "SDAPREF CDATA #FIXED"
942     -- generated text prefix -->
943     <!ENTITY % SDASUFF "SDASUFF CDATA #FIXED"
944     -- generated text suffix -->
945     <!ENTITY % SDASUSP "SDASUSP NAME #FIXED"
954     -- suspend transform process -->
955    
956     <!--========== Text Markup =====================-->
957    
958     <![ %HTML.Highlighting [
959    
960     <!ENTITY % font " TT | B | I ">
961    
962     <!ENTITY % phrase "EM | STRONG | CODE | SAMP | KBD | VAR | CITE">
963    
964     <!ENTITY % text "#PCDATA|A|IMG|BR|%phrase|%font|SPAN|Q|BDO|SUP|SUB">
965    
966     <!ELEMENT (%font;|%phrase) - - (%text)*>
967     <!ATTLIST ( TT | CODE | SAMP | KBD | VAR )
968     %attrs;
969     %SDAFORM; "Lit"
970     >
971     <!ATTLIST ( B | STRONG )
972     %attrs;
973     %SDAFORM; "B"
974     >
975     <!ATTLIST ( I | EM | CITE )
976     %attrs;
977     %SDAFORM; "It"
978     >
979    
980     <!-- <TT> Typewriter text -->
981     <!-- <B> Bold text -->
982     <!-- <I> Italic text -->
983     <!-- <EM> Emphasized phrase -->
984     <!-- <STRONG> Strong emphasis -->
985     <!-- <CODE> Source code phrase -->
986     <!-- <SAMP> Sample text or characters -->
987     <!-- <KBD> Keyboard phrase, e.g. user input -->
988     <!-- <VAR> Variable phrase or substitutable -->
989     <!-- <CITE> Name or title of cited work -->
990    
991     <!ENTITY % pre.content "#PCDATA|A|HR|BR|%font|%phrase|SPAN|BDO">
992    
993     ]]>
994    
995     <!ENTITY % text "#PCDATA|A|IMG|BR|SPAN|Q|BDO|SUP|SUB">
996    
997     <!-- Should the BDO element have an SDAFORM attr.? Which? -->
998     <!ELEMENT BDO - - (%text)+>
999     <!ATTLIST BDO
1000     LANG NAME #IMPLIED
1001     DIR (ltr|rtl) #REQUIRED
1010     >
1011    
1012     <!-- <BDO> Control bidirectional text -->
1013    
1014     <!ELEMENT BR - O EMPTY>
1015     <!ATTLIST BR
1016     %SDAPREF; "&#RE;"
1017     >
1018    
1019     <!-- <BR> Line break -->
1020    
1021     <!-- Should the SPAN element have an SDAFORM attr.? Which? -->
1022     <!ELEMENT SPAN - - (%text)*>
1023     <!ATTLIST SPAN
1024     %attrs;
1025     >
1026    
1027     <!-- <SPAN> Generic container -->
1028    
1029     <!ELEMENT Q - - (%text)*>
1030     <!ATTLIST Q
1031     %attrs;
1032     %SDAFORM; "It" -- to be verified --
1033     >
1034    
1035     <!-- <Q> Short quotation -->
1036    
1037     <!ELEMENT (SUP|SUB) - - (#PCDATA)>
1038     <!ATTLIST (SUP|SUB)
1039     %attrs;
1040     >
1041    
1042     <!-- <SUP> Superscript -->
1043     <!-- <SUB> Subscript -->
1044    
1045     <!--========= Link Markup ======================-->
1046    
1047     <!ENTITY % linkType "NAME">
1048    
1049     <!ENTITY % linkExtraAttributes
1050     "REL %linkType #IMPLIED
1051     REV %linkType #IMPLIED
1052     URN CDATA #IMPLIED
1053     TITLE CDATA #IMPLIED
1054     METHODS NAMES #IMPLIED
1055     CHARSET NAME #IMPLIED
1056     ">
1057    
1066     <![ %HTML.Recommended [
1067     <!ENTITY % A.content "(%text)*"
1068     -- <H1><a name="xxx">Heading</a></H1>
1069     is preferred to
1070     <a name="xxx"><H1>Heading</H1></a>
1071     -->
1072     ]]>
1073    
1074     <!ENTITY % A.content "(%heading|%text)*">
1075    
1076     <!ELEMENT A - - %A.content -(A)>
1077     <!ATTLIST A
1078     %attrs;
1079     HREF CDATA #IMPLIED
1080     NAME CDATA #IMPLIED
1081     %linkExtraAttributes;
1082     %SDAPREF; "<Anchor: #AttList>"
1083     >
1084     <!-- <A> Anchor; source/destination of link -->
1085     <!-- <A NAME="..."> Name of this anchor -->
1086     <!-- <A HREF="..."> Address of link destination -->
1087     <!-- <A URN="..."> Permanent address of destination -->
1088     <!-- <A REL=...> Relationship to destination -->
1089     <!-- <A REV=...> Relationship of destination to this -->
1090     <!-- <A TITLE="..."> Title of destination (advisory) -->
1091     <!-- <A METHODS="..."> Operations on destination (advisory) -->
1092     <!-- <A CHARSET="..."> Charset of destination (advisory) -->
1093    
1094    
1095     <!--========== Images ==========================-->
1096    
1097     <!ELEMENT IMG - O EMPTY>
1098     <!ATTLIST IMG
1099     %attrs;
1100     SRC CDATA #REQUIRED
1101     ALT CDATA #IMPLIED
1102     ALIGN (top|middle|bottom) #IMPLIED
1103     ISMAP (ISMAP) #IMPLIED
1104     %SDAPREF; "<Fig><?SDATrans Img: #AttList>#AttVal(Alt)</Fig>"
1105     >
1106    
1107     <!-- <IMG> Image; icon, glyph or illustration -->
1108     <!-- <IMG SRC="..."> Address of image object -->
1109     <!-- <IMG ALT="..."> Textual alternative -->
1110     <!-- <IMG ALIGN=...> Position relative to text -->
1111     <!-- <IMG ISMAP> Each pixel can be a link -->
1112    
1113     <!--========== Paragraphs=======================-->
1114    
1122     <!ELEMENT P - O (%text)*>
1123     <!ATTLIST P
1124     %attrs;
1125     %just;
1126     %SDAFORM; "Para"
1127     >
1128    
1129     <!-- <P> Paragraph -->
1130    
1131    
1132     <!--========== Headings, Titles, Sections ===============-->
1133    
1134     <!ELEMENT HR - O EMPTY>
1135     <!ATTLIST HR
1136     %attrs;
1137     %just;
1138     %SDAPREF; "&#RE;&#RE;"
1139     >
1140    
1141     <!-- <HR> Horizontal rule -->
1142    
1143     <!ELEMENT ( %heading ) - - (%text;)*>
1144     <!ATTLIST H1
1145     %attrs;
1146     %just;
1147     %SDAFORM; "H1"
1148     >
1149     <!ATTLIST H2
1150     %attrs;
1151     %just;
1152     %SDAFORM; "H2"
1153     >
1154     <!ATTLIST H3
1155     %attrs;
1156     %just;
1157     %SDAFORM; "H3"
1158     >
1159     <!ATTLIST H4
1160     %attrs;
1161     %just;
1162     %SDAFORM; "H4"
1163     >
1164     <!ATTLIST H5
1165     %attrs;
1166     %just;
1167     %SDAFORM; "H5"
1168     >
1169     <!ATTLIST H6
1178     %attrs;
1179     %just;
1180     %SDAFORM; "H6"
1181     >
1182    
1183     <!-- <H1> Heading, level 1 -->
1184     <!-- <H2> Heading, level 2 -->
1185     <!-- <H3> Heading, level 3 -->
1186     <!-- <H4> Heading, level 4 -->
1187     <!-- <H5> Heading, level 5 -->
1188     <!-- <H6> Heading, level 6 -->
1189    
1190    
1191     <!--========== Text Flows ======================-->
1192    
1193     <![ %HTML.Forms [
1194     <!ENTITY % block.forms "BLOCKQUOTE | FORM | ISINDEX">
1195     ]]>
1196    
1197     <!ENTITY % block.forms "BLOCKQUOTE">
1198    
1199     <![ %HTML.Deprecated [
1200     <!ENTITY % preformatted "PRE | XMP | LISTING">
1201     ]]>
1202    
1203     <!ENTITY % preformatted "PRE">
1204    
1205     <!ENTITY % block "P | %list | DL
1206     | %preformatted
1207     | %block.forms">
1208    
1209     <!ENTITY % flow "(%text|%block)*">
1210    
1211     <!ENTITY % pre.content "#PCDATA | A | HR | BR | SPAN | BDO">
1212     <!ELEMENT PRE - - (%pre.content)*>
1213     <!ATTLIST PRE
1214     %attrs;
1215     WIDTH NUMBER #IMPLIED
1216     %SDAFORM; "Lit"
1217     >
1218    
1219     <!-- <PRE> Preformatted text -->
1220     <!-- <PRE WIDTH=...> Maximum characters per line -->
1221    
1222     <![ %HTML.Deprecated [
1223    
1224     <!ENTITY % literal "CDATA"
1225     -- historical, non-conforming parsing mode where
1234     the only markup signal is the end tag
1235     in full
1236     -->
1237    
1238     <!ELEMENT (XMP|LISTING) - - %literal>
1239     <!ATTLIST XMP
1240     %attrs;
1241     %SDAFORM; "Lit"
1242     %SDAPREF; "Example:&#RE;"
1243     >
1244     <!ATTLIST LISTING
1245     %attrs;
1246     %SDAFORM; "Lit"
1247     %SDAPREF; "Listing:&#RE;"
1248     >
1249    
1250     <!-- <XMP> Example section -->
1251     <!-- <LISTING> Computer listing -->
1252    
1253     <!ELEMENT PLAINTEXT - O %literal>
1254     <!-- <PLAINTEXT> Plain text passage -->
1255    
1256     <!ATTLIST PLAINTEXT
1257     %attrs;
1258     %SDAFORM; "Lit"
1259     >
1260     ]]>
1261    
1262    
1263     <!--========== Lists ==================-->
1264    
1265     <!ELEMENT DL - - (DT | DD)+>
1266     <!ATTLIST DL
1267     %attrs;
1268     COMPACT (COMPACT) #IMPLIED
1269     %SDAFORM; "List"
1270     %SDAPREF; "Definition List:"
1271     >
1272    
1273     <!ELEMENT DT - O (%text)*>
1274     <!ATTLIST DT
1275     %attrs;
1276     %SDAFORM; "Term"
1277     >
1278    
1279     <!ELEMENT DD - O %flow>
1280     <!ATTLIST DD
1281     %attrs;
1290     %SDAFORM; "LItem"
1291     >
1292    
1293     <!-- <DL> Definition list, or glossary -->
1294     <!-- <DL COMPACT> Compact style list -->
1295     <!-- <DT> Term in definition list -->
1296     <!-- <DD> Definition of term -->
1297    
1298     <!ELEMENT (OL|UL) - - (LI)+>
1299     <!ATTLIST OL
1300     %attrs;
1301     %just;
1302     COMPACT (COMPACT) #IMPLIED
1303     %SDAFORM; "List"
1304     >
1305     <!ATTLIST UL
1306     %attrs;
1307     %just;
1308     COMPACT (COMPACT) #IMPLIED
1309     %SDAFORM; "List"
1310     >
1311     <!-- <UL> Unordered list -->
1312     <!-- <UL COMPACT> Compact list style -->
1313     <!-- <OL> Ordered, or numbered list -->
1314     <!-- <OL COMPACT> Compact list style -->
1315    
1316    
1317     <!ELEMENT (DIR|MENU) - - (LI)+ -(%block)>
1318     <!ATTLIST DIR
1319     %attrs;
1320     %just;
1321     COMPACT (COMPACT) #IMPLIED
1322     %SDAFORM; "List"
1323     %SDAPREF; "<LHead>Directory</LHead>"
1324     >
1325     <!ATTLIST MENU
1326     %attrs;
1327     %just;
1328     COMPACT (COMPACT) #IMPLIED
1329     %SDAFORM; "List"
1330     %SDAPREF; "<LHead>Menu</LHead>"
1331     >
1332    
1333     <!-- <DIR> Directory list -->
1334     <!-- <DIR COMPACT> Compact list style -->
1335     <!-- <MENU> Menu list -->
1336     <!-- <MENU COMPACT> Compact list style -->
1337    
1346     <!ELEMENT LI - O %flow>
1347     <!ATTLIST LI
1348     %attrs;
1349     %just;
1350     %SDAFORM; "LItem"
1351     >
1352    
1353     <!-- <LI> List item -->
1354    
1355     <!--========== Document Body ===================-->
1356    
1357     <![ %HTML.Recommended [
1358     <!ENTITY % body.content "(%heading|%block|HR|ADDRESS|IMG)*"
1359     -- <h1>Heading</h1>
1360     <p>Text ...
1361     is preferred to
1362     <h1>Heading</h1>
1363     Text ...
1364     -->
1365     ]]>
1366    
1367     <!ENTITY % body.content "(%heading | %text | %block |
1368     HR | ADDRESS)*">
1369    
1370     <!ELEMENT BODY O O %body.content>
1371     <!ATTLIST BODY
1372     %attrs;
1373     >
1374    
1375     <!-- <BODY> Document body -->
1376    
1377     <!ELEMENT BLOCKQUOTE - - %body.content>
1378     <!ATTLIST BLOCKQUOTE
1379     %attrs;
1380     %just;
1381     %SDAFORM; "BQ"
1382     >
1383    
1384     <!-- <BLOCKQUOTE> Quoted passage -->
1385    
1386     <!ELEMENT ADDRESS - - (%text|P)*>
1387     <!ATTLIST ADDRESS
1388     %attrs;
1389     %just;
1390     %SDAFORM; "Lit"
1391     %SDAPREF; "Address:&#RE;"
1392     >
1393    
1402     <!-- <ADDRESS> Address, signature, or byline -->
1403    
1404    
1405     <!--======= Forms ====================-->
1406    
1407     <![ %HTML.Forms [
1408    
1409     <!ELEMENT FORM - - %body.content -(FORM) +(INPUT|SELECT|TEXTAREA)>
1410     <!ATTLIST FORM
1411     %attrs;
1412     ACTION CDATA #IMPLIED
1413     METHOD (%HTTP-Method) GET
1414     ENCTYPE %Content-Type; "application/x-www-form-urlencoded"
1415     ACCEPT-CHARSET CDATA #IMPLIED
1416     %SDAPREF; "<Para>Form:</Para>"
1417     %SDASUFF; "<Para>Form End.</Para>"
1418     >
1419    
1420     <!-- <FORM> Fill-out or data-entry form -->
1421     <!-- <FORM ACTION="..."> Address for completed form -->
1422     <!-- <FORM METHOD=...> Method of submitting form -->
1423     <!-- <FORM ENCTYPE="..."> Representation of form data -->
1424    
1425     <!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX |
1426     RADIO | SUBMIT | RESET |
1427     IMAGE | HIDDEN | FILE )">
1428     <!ELEMENT INPUT - O EMPTY>
1429     <!ATTLIST INPUT
1430     %attrs;
1431     TYPE %InputType TEXT
1432     NAME CDATA #IMPLIED
1433     VALUE CDATA #IMPLIED
1434     SRC CDATA #IMPLIED
1435     CHECKED (CHECKED) #IMPLIED
1436     SIZE CDATA #IMPLIED
1437     MAXLENGTH NUMBER #IMPLIED
1438     ALIGN (top|middle|bottom) #IMPLIED
1439     ACCEPT CDATA #IMPLIED --list of content types --
1440     %SDAPREF; "Input: "
1441     >
1442    
1443     <!-- <INPUT> Form input datum -->
1444     <!-- <INPUT TYPE=...> Type of input interaction -->
1445     <!-- <INPUT NAME=...> Name of form datum -->
1446     <!-- <INPUT VALUE="..."> Default/initial/selected value -->
1447     <!-- <INPUT SRC="..."> Address of image -->
1448     <!-- <INPUT CHECKED> Initial state is "on" -->
1449     <!-- <INPUT SIZE=...> Field size hint -->
1458     <!-- <INPUT MAXLENGTH=...> Data length maximum -->
1459     <!-- <INPUT ALIGN=...> Image alignment -->
1460    
1461     <!ELEMENT SELECT - - (OPTION+) -(INPUT|SELECT|TEXTAREA)>
1462     <!ATTLIST SELECT
1463     %attrs;
1464     NAME CDATA #REQUIRED
1465     SIZE NUMBER #IMPLIED
1466     MULTIPLE (MULTIPLE) #IMPLIED
1467     %SDAFORM; "List"
1468     %SDAPREF;
1469     "<LHead>Select #AttVal(Multiple)</LHead>"
1470     >
1471    
1472     <!-- <SELECT> Selection of option(s) -->
1473     <!-- <SELECT NAME=...> Name of form datum -->
1474     <!-- <SELECT SIZE=...> Options displayed at a time -->
1475     <!-- <SELECT MULTIPLE> Multiple selections allowed -->
1476    
1477     <!ELEMENT OPTION - O (#PCDATA)*>
1478     <!ATTLIST OPTION
1479     %attrs;
1480     SELECTED (SELECTED) #IMPLIED
1481     VALUE CDATA #IMPLIED
1482     %SDAFORM; "LItem"
1483     %SDAPREF;
1484     "Option: #AttVal(Value) #AttVal(Selected)"
1485     >
1486    
1487     <!-- <OPTION> A selection option -->
1488     <!-- <OPTION SELECTED> Initial state -->
1489     <!-- <OPTION VALUE="..."> Form datum value for this option-->
1490    
1491     <!ELEMENT TEXTAREA - - (#PCDATA)* -(INPUT|SELECT|TEXTAREA)>
1492     <!ATTLIST TEXTAREA
1493     %attrs;
1494     NAME CDATA #REQUIRED
1495     ROWS NUMBER #REQUIRED
1496     COLS NUMBER #REQUIRED
1497     %SDAFORM; "Para"
1498     %SDAPREF; "Input Text -- #AttVal(Name): "
1499     >
1500    
1501     <!-- <TEXTAREA> An area for text input -->
1502     <!-- <TEXTAREA NAME=...> Name of form datum -->
1503     <!-- <TEXTAREA ROWS=...> Height of area -->
1504     <!-- <TEXTAREA COLS=...> Width of area -->
1505    
1514     ]]>
1515    
1516    
1517     <!--======= Document Head ======================-->
1518    
1519     <![ %HTML.Recommended [
1520     <!ENTITY % head.extra "">
1521     ]]>
1522     <!ENTITY % head.extra "& NEXTID?">
1523    
1524     <!ENTITY % head.content "TITLE & ISINDEX? & BASE? %head.extra">
1525    
1526     <!ELEMENT HEAD O O (%head.content) +(META|LINK)>
1527     <!ATTLIST HEAD
1528     %attrs; >
1529    
1530     <!-- <HEAD> Document head -->
1531    
1532     <!ELEMENT TITLE - - (#PCDATA)* -(META|LINK)>
1533     <!ATTLIST TITLE
1534     %attrs;
1535     %SDAFORM; "Ti" >
1536    
1537     <!-- <TITLE> Title of document -->
1538    
1539     <!ELEMENT LINK - O EMPTY>
1540     <!ATTLIST LINK
1541     %attrs;
1542     HREF CDATA #REQUIRED
1543     %linkExtraAttributes;
1544     %SDAPREF; "Linked to : #AttVal (TITLE) (URN) (HREF)>" >
1545    
1546     <!-- <LINK> Link from this document -->
1547     <!-- <LINK HREF="..."> Address of link destination -->
1548     <!-- <LINK URN="..."> Lasting name of destination -->
1549     <!-- <LINK REL=...> Relationship to destination -->
1550     <!-- <LINK REV=...> Relationship of destination to this -->
1551     <!-- <LINK TITLE="..."> Title of destination (advisory) -->
1552     <!-- <LINK CHARSET="..."> Charset of destination (advisory) -->
1553     <!-- <LINK METHODS="..."> Operations allowed (advisory) -->
1554    
1555     <!ELEMENT ISINDEX - O EMPTY>
1556     <!ATTLIST ISINDEX
1557     %attrs;
1558     %SDAPREF;
1559     "<Para>[Document is indexed/searchable.]</Para>">
1560    
1561     <!-- <ISINDEX> Document is a searchable index -->
1562    
1570     <!ELEMENT BASE - O EMPTY>
1571     <!ATTLIST BASE
1572     HREF CDATA #REQUIRED >
1573    
1574     <!-- <BASE> Base context document -->
1575     <!-- <BASE HREF="..."> Address for this document -->
1576    
1577     <!ELEMENT NEXTID - O EMPTY>
1578     <!ATTLIST NEXTID
1579     N CDATA #REQUIRED >
1580    
1581     <!-- <NEXTID> Next ID to use for link name -->
1582     <!-- <NEXTID N=...> Next ID to use for link name -->
1583    
1584     <!ELEMENT META - O EMPTY>
1585     <!ATTLIST META
1586     HTTP-EQUIV NAME #IMPLIED
1587     NAME NAME #IMPLIED
1588     CONTENT CDATA #REQUIRED
1589     >
1590    
1591     <!-- <META> Generic Meta-information -->
1592     <!-- <META HTTP-EQUIV=...> HTTP response header name -->
1593     <!-- <META NAME=...> Meta-information name -->
1594     <!-- <META CONTENT="..."> Associated information -->
1595    
1596     <!--======= Document Structure =================-->
1597    
1598     <![ %HTML.Deprecated [
1599     <!ENTITY % html.content "HEAD, BODY, PLAINTEXT?">
1600     ]]>
1601     <!ENTITY % html.content "HEAD, BODY">
1602    
1603     <!ELEMENT HTML O O (%html.content)>
1604     <!ENTITY % version.attr "VERSION CDATA #FIXED '%HTML.Version;'">
1605    
1606     <!ATTLIST HTML
1607     %attrs;
1608     %version.attr;
1609     %SDAFORM; "Book"
1610     >
1611    
1612     <!-- <HTML> HTML Document -->
1613    
1614    
1615     7.2. SGML Declaration for HTML
1616    
1617     <!SGML "ISO 8879:1986"
1626     --
1627     SGML Declaration for HyperText Markup Language version 2.x
1628     (HTML 2.x).
1629    
1630     --
1631    
1632     CHARSET
1633     BASESET "ISO Registration Number 176//CHARSET
1634     ISO/IEC 10646-1:1993 UCS-2 with
1635     implementation level 3//ESC 2/5 2/15 4/5"
1636     DESCSET 0 9 UNUSED
1637     9 2 9
1638     11 2 UNUSED
1639     13 1 13
1640     14 18 UNUSED
1641     32 95 32
1642     127 1 UNUSED
1643     128 32 UNUSED
1644     160 65376 160
1645    
1646    
1647     CAPACITY SGMLREF
1648     TOTALCAP 150000
1649     GRPCAP 150000
1650     ENTCAP 150000
1651    
1652     SCOPE DOCUMENT
1653     SYNTAX
1654     SHUNCHAR CONTROLS 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1655     17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 127
1656     BASESET "ISO 646:1983//CHARSET
1657     International Reference Version
1658     (IRV)//ESC 2/5 4/0"
1659     DESCSET 0 128 0
1660    
1661     FUNCTION
1662     RE 13
1663     RS 10
1664     SPACE 32
1665     TAB SEPCHAR 9
1666    
1667     NAMING LCNMSTRT ""
1668     UCNMSTRT ""
1669     LCNMCHAR ".-"
1670     UCNMCHAR ".-"
1671     NAMECASE GENERAL YES
1672     ENTITY NO
1673     DELIM GENERAL SGMLREF
1682     SHORTREF SGMLREF
1683     NAMES SGMLREF
1684     QUANTITY SGMLREF
1685     ATTSPLEN 2100
1686     LITLEN 1024
1687     NAMELEN 72 -- somewhat arbitrary; taken from
1688     internet line length conventions --
1689     PILEN 1024
1690     TAGLVL 100
1691     TAGLEN 2100
1692     GRPGTCNT 150
1693     GRPCNT 64
1694    
1695     FEATURES
1696     MINIMIZE
1697     DATATAG NO
1698     OMITTAG YES
1699     RANK NO
1700     SHORTTAG YES
1701     LINK
1702     SIMPLE NO
1703     IMPLICIT NO
1704     EXPLICIT NO
1705     OTHER
1706     CONCUR NO
1707     SUBDOC NO
1708     FORMAL YES
1709     APPINFO "SDA" -- conforming SGML Document Access application
1710     --
1711     >
1712    
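The DESCSET rows in the declaration above can be checked mechanically. The following is a minimal sketch in Python, assuming each row reads as first code position, number of positions, and base-set mapping (None standing in for UNUSED); the names DESCSET and check_descset are illustrative and do not come from this specification. It confirms that the rows tile the 16-bit UCS-2 code space without gaps and counts how many positions are mapped.

```python
# Rows copied from the document character set DESCSET above:
# (first code position, number of positions, mapping or None for UNUSED).
DESCSET = [
    (0, 9, None),
    (9, 2, 9),
    (11, 2, None),
    (13, 1, 13),
    (14, 18, None),
    (32, 95, 32),
    (127, 1, None),
    (128, 32, None),
    (160, 65376, 160),
]


def check_descset(rows):
    """Verify the rows are contiguous from 0; return (total, used) counts."""
    expected_start = 0
    used = 0
    for start, count, mapping in rows:
        if start != expected_start:
            raise ValueError(f"gap or overlap at code position {start}")
        if mapping is not None:
            used += count
        expected_start = start + count
    return expected_start, used


total, used = check_descset(DESCSET)
print(total, used)
```

Run as written, this prints 65536 65474: every position of the 16-bit space is described, and the 62 UNUSED positions are the C0 and C1 control ranges (except TAB, LF and CR) plus DEL.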
1713    
1714     7.3. Entity sets
1715    
1716     7.3.1. ISO Latin 1 Character Entity Set
1717    
1718     The following public text lists each of the characters specified in the
1719     Added Latin 1 entity set, along with its name, syntax for use, and
1720     description. This list is derived from ISO Standard 8879:1986//ENTITIES
1721     Added Latin 1//EN. HTML includes the entire entity set, and adds enti-
1722     ties for all missing characters in the right part of ISO-8859-1.
1723    
1724     <!-- (C) International Organization for Standardization 1986
1725     Permission to copy in any form is granted for use with
1726     conforming SGML systems and applications as defined in
1727     ISO 8879, provided this notice is included in all copies.
1728     -->
1729     <!-- Character entity set. Typical invocation:
1738     <!ENTITY % ISOlat1 PUBLIC
1739     "ISO 8879-1986//ENTITIES Added Latin 1//EN//HTML">
1740     %ISOlat1;
1741     -->
1742     <!ENTITY nbsp CDATA "&#160;" -- no-break space -->
1743     <!ENTITY iexcl CDATA "&#161;" -- inverted exclamation mark -->
1744     <!ENTITY cent CDATA "&#162;" -- cent sign -->
1745     <!ENTITY pound CDATA "&#163;" -- pound sterling sign -->
1746     <!ENTITY curren CDATA "&#164;" -- general currency sign -->
1747     <!ENTITY yen CDATA "&#165;" -- yen sign -->
1748     <!ENTITY brvbar CDATA "&#166;" -- broken (vertical) bar -->
1749     <!ENTITY sect CDATA "&#167;" -- section sign -->
1750     <!ENTITY uml CDATA "&#168;" -- umlaut (dieresis) -->
1751     <!ENTITY copy CDATA "&#169;" -- copyright sign -->
1752     <!ENTITY ordf CDATA "&#170;" -- ordinal indicator, feminine -->
1753     <!ENTITY laquo CDATA "&#171;" -- angle quotation mark, left -->
1754     <!ENTITY not CDATA "&#172;" -- not sign -->
1755     <!ENTITY shy CDATA "&#173;" -- soft hyphen -->
1756     <!ENTITY reg CDATA "&#174;" -- registered sign -->
1757     <!ENTITY macr CDATA "&#175;" -- macron -->
1758     <!ENTITY deg CDATA "&#176;" -- degree sign -->
1759     <!ENTITY plusmn CDATA "&#177;" -- plus-or-minus sign -->
1760     <!ENTITY sup2 CDATA "&#178;" -- superscript two -->
1761     <!ENTITY sup3 CDATA "&#179;" -- superscript three -->
1762     <!ENTITY acute CDATA "&#180;" -- acute accent -->
1763     <!ENTITY micro CDATA "&#181;" -- micro sign -->
1764     <!ENTITY para CDATA "&#182;" -- pilcrow (paragraph sign) -->
1765     <!ENTITY middot CDATA "&#183;" -- middle dot -->
1766     <!ENTITY cedil CDATA "&#184;" -- cedilla -->
1767     <!ENTITY sup1 CDATA "&#185;" -- superscript one -->
1768     <!ENTITY ordm CDATA "&#186;" -- ordinal indicator, masculine -->
1769     <!ENTITY raquo CDATA "&#187;" -- angle quotation mark, right -->
1770     <!ENTITY frac14 CDATA "&#188;" -- fraction one-quarter -->
1771     <!ENTITY frac12 CDATA "&#189;" -- fraction one-half -->
1772     <!ENTITY frac34 CDATA "&#190;" -- fraction three-quarters -->
1773     <!ENTITY iquest CDATA "&#191;" -- inverted question mark -->
1774     <!ENTITY Agrave CDATA "&#192;" -- capital A, grave accent -->
1775     <!ENTITY Aacute CDATA "&#193;" -- capital A, acute accent -->
1776     <!ENTITY Acirc CDATA "&#194;" -- capital A, circumflex accent -->
1777     <!ENTITY Atilde CDATA "&#195;" -- capital A, tilde -->
1778     <!ENTITY Auml CDATA "&#196;" -- capital A, dieresis or umlaut mark -->
1779     <!ENTITY Aring CDATA "&#197;" -- capital A, ring -->
1780     <!ENTITY AElig CDATA "&#198;" -- capital AE diphthong (ligature) -->
1781     <!ENTITY Ccedil CDATA "&#199;" -- capital C, cedilla -->
1782     <!ENTITY Egrave CDATA "&#200;" -- capital E, grave accent -->
1783     <!ENTITY Eacute CDATA "&#201;" -- capital E, acute accent -->
1784     <!ENTITY Ecirc CDATA "&#202;" -- capital E, circumflex accent -->
1785     <!ENTITY Euml CDATA "&#203;" -- capital E, dieresis or umlaut mark -->
1794     <!ENTITY Igrave CDATA "&#204;" -- capital I, grave accent -->
1795     <!ENTITY Iacute CDATA "&#205;" -- capital I, acute accent -->
1796     <!ENTITY Icirc CDATA "&#206;" -- capital I, circumflex accent -->
1797     <!ENTITY Iuml CDATA "&#207;" -- capital I, dieresis or umlaut mark -->
1798     <!ENTITY ETH CDATA "&#208;" -- capital Eth, Icelandic -->
1799     <!ENTITY Ntilde CDATA "&#209;" -- capital N, tilde -->
1800     <!ENTITY Ograve CDATA "&#210;" -- capital O, grave accent -->
1801     <!ENTITY Oacute CDATA "&#211;" -- capital O, acute accent -->
1802     <!ENTITY Ocirc CDATA "&#212;" -- capital O, circumflex accent -->
1803     <!ENTITY Otilde CDATA "&#213;" -- capital O, tilde -->
1804     <!ENTITY Ouml CDATA "&#214;" -- capital O, dieresis or umlaut mark -->
1805     <!ENTITY times CDATA "&#215;" -- multiply sign -->
1806     <!ENTITY Oslash CDATA "&#216;" -- capital O, slash -->
1807     <!ENTITY Ugrave CDATA "&#217;" -- capital U, grave accent -->
1808     <!ENTITY Uacute CDATA "&#218;" -- capital U, acute accent -->
1809     <!ENTITY Ucirc CDATA "&#219;" -- capital U, circumflex accent -->
1810     <!ENTITY Uuml CDATA "&#220;" -- capital U, dieresis or umlaut mark -->
1811     <!ENTITY Yacute CDATA "&#221;" -- capital Y, acute accent -->
1812     <!ENTITY THORN CDATA "&#222;" -- capital Thorn, Icelandic -->
1813     <!ENTITY szlig CDATA "&#223;" -- small sharp s, German (sz ligature) -->
1814     <!ENTITY agrave CDATA "&#224;" -- small a, grave accent -->
1815     <!ENTITY aacute CDATA "&#225;" -- small a, acute accent -->
1816     <!ENTITY acirc CDATA "&#226;" -- small a, circumflex accent -->
1817     <!ENTITY atilde CDATA "&#227;" -- small a, tilde -->
1818     <!ENTITY auml CDATA "&#228;" -- small a, dieresis or umlaut mark -->
1819     <!ENTITY aring CDATA "&#229;" -- small a, ring -->
1820     <!ENTITY aelig CDATA "&#230;" -- small ae diphthong (ligature) -->
1821     <!ENTITY ccedil CDATA "&#231;" -- small c, cedilla -->
1822     <!ENTITY egrave CDATA "&#232;" -- small e, grave accent -->
1823     <!ENTITY eacute CDATA "&#233;" -- small e, acute accent -->
1824     <!ENTITY ecirc CDATA "&#234;" -- small e, circumflex accent -->
1825     <!ENTITY euml CDATA "&#235;" -- small e, dieresis or umlaut mark -->
1826     <!ENTITY igrave CDATA "&#236;" -- small i, grave accent -->
1827     <!ENTITY iacute CDATA "&#237;" -- small i, acute accent -->
1828     <!ENTITY icirc CDATA "&#238;" -- small i, circumflex accent -->
1829     <!ENTITY iuml CDATA "&#239;" -- small i, dieresis or umlaut mark -->
1830     <!ENTITY eth CDATA "&#240;" -- small eth, Icelandic -->
1831     <!ENTITY ntilde CDATA "&#241;" -- small n, tilde -->
1832     <!ENTITY ograve CDATA "&#242;" -- small o, grave accent -->
1833     <!ENTITY oacute CDATA "&#243;" -- small o, acute accent -->
1834     <!ENTITY ocirc CDATA "&#244;" -- small o, circumflex accent -->
1835     <!ENTITY otilde CDATA "&#245;" -- small o, tilde -->
1836     <!ENTITY ouml CDATA "&#246;" -- small o, dieresis or umlaut mark -->
1837     <!ENTITY divide CDATA "&#247;" -- divide sign -->
1838     <!ENTITY oslash CDATA "&#248;" -- small o, slash -->
1839     <!ENTITY ugrave CDATA "&#249;" -- small u, grave accent -->
1840     <!ENTITY uacute CDATA "&#250;" -- small u, acute accent -->
1841     <!ENTITY ucirc CDATA "&#251;" -- small u, circumflex accent -->
1850     <!ENTITY uuml CDATA "&#252;" -- small u, dieresis or umlaut mark -->
1851     <!ENTITY yacute CDATA "&#253;" -- small y, acute accent -->
1852     <!ENTITY thorn CDATA "&#254;" -- small thorn, Icelandic -->
1853     <!ENTITY yuml CDATA "&#255;" -- small y, dieresis or umlaut mark -->
1854    
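The numeric values in the listing above are code positions in the document character set, so a named reference such as &eacute; denotes the same character as &#233;. The following is a minimal sketch in Python, assuming the standard html.entities module as an independent cross-check; DRAFT_ENTITIES and expand are illustrative names, and only a handful of entries are copied here.

```python
from html.entities import name2codepoint

# A few (name, code position) pairs copied from the entity listing above.
DRAFT_ENTITIES = {
    "nbsp": 160,
    "copy": 169,
    "szlig": 223,
    "eacute": 233,
    "yuml": 255,
}


def expand(name):
    """Return the character that the named entity reference denotes."""
    return chr(DRAFT_ENTITIES[name])


# Entries whose code position disagrees with Python's own entity table.
mismatches = {
    name: code
    for name, code in DRAFT_ENTITIES.items()
    if name2codepoint.get(name) != code
}

# Every entity in this set falls in the upper half of ISO-8859-1.
in_range = all(160 <= code <= 255 for code in DRAFT_ENTITIES.values())
```

Here mismatches comes out empty and in_range is true, which is consistent with the listing covering code positions 160 through 255.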
1855    
1856     Bibliography
1857    
1858     [BRYAN88] M. Bryan, "SGML -- An Author's Guide to the Standard
1859     Generalized Markup Language", Addison-Wesley, Reading,
1860     1988.
1861    
1862     [ERCS] Extended Reference Concrete Syntax for SGML.
1863     <http://www.sgmlopen.org/sgml/docs/ercs/ercs-
1864     home.html>
1865    
1866     [ETHNO] "Ethnologue, Languages of the World", 12th Edition,
1867     Barbara F. Grimes editor, Summer Institute of Linguis-
1868     tics, Dallas, 1992.
1869    
1870     [FILE-UPLOAD] E. Nebel and L. Masinter, "Form-based File Upload in
1871     HTML", Work in progress (draft-ietf-html-
1872     fileupload-03.txt), Xerox Corporation, August 1995.
1873    
1874     [GOLD90] C. F. Goldfarb, "The SGML Handbook", Y. Rubinsky, Ed.,
1875     Oxford University Press, 1990.
1876    
1877     [HTML-2] T. Berners-Lee and D. Connolly, "Hypertext Markup Lan-
1878     guage - 2.0", Work in progress (draft-ietf-html-
1879     spec-05.txt), MIT/W3C, August 1995.
1880    
1881     [HTTP] T. Berners-Lee, R. T. Fielding, and H. Frystyk
1882     Nielsen, "Hypertext Transfer Protocol - HTTP/1.0",
1883     Work in progress (draft-ietf-http-v10-spec-00.ps),
1884     MIT, UC Irvine, CERN, March 1995.
1885    
1886     [ISO-639] ISO 639:1988. Codes pour la représentation des noms de
1887     langue. Technical content in
1888     <http://www.sil.org/sgml/iso639a.html>
1889    
1890     [ISO-1000] ISO 1000:1992. Unités SI et recommandations pour
1891     l'emploi de leurs multiples et de certaines autres
1892     unités.
1893    
1894     [ISO-3166] ISO 3166:1993. Codes pour la représentation des noms
1895     de pays.
1896    
1897     [ISO-4217] ISO 4217:1990. Codes pour la représentation des
1906     monnaies et types des fonds.
1907    
1908     [ISO-8601] ISO 8601:1988. Éléments de données et formats
1909     d'échange -- Échange d'information -- Représentation
1910     de la date et de l'heure.
1911    
1912     [ISO-8859-1] ISO 8859-1:1987. International Standard -- Informa-
1913     tion Processing -- 8-bit Single-Byte Coded Graphic
1914     Character Sets -- Part 1: Latin Alphabet No. 1.
1915    
1916     [ISO-8879] ISO 8879:1986. International Standard -- Information
1917     Processing -- Text and Office Systems -- Standard Gen-
1918     eralized Markup Language (SGML).
1919    
1920     [ISO-10646] ISO/IEC 10646-1:1993. International Standard -- Infor-
1921     mation technology -- Universal Multiple-Octet Coded
1922     Character Set (UCS) -- Part 1: Architecture and Basic
1923     Multilingual Plane.
1924    
1925     [NICOL] G.T. Nicol, "The Multilingual World Wide Web", Elec-
1926     tronic Book Technologies, 1995,
1927     <http://www.ebt.com/docs/multling.html>
1928    
1929     [RFC1468] J. Murai, M. Crispin and E. van der Poel, "Japanese
1930     Character Encoding for Internet Messages", RFC 1468,
1931     Keio University, Panda Programming, June 1993.
1932    
1933     [RFC1521] N. Borenstein and N. Freed, "MIME (Multipurpose Inter-
1934     net Mail Extensions) Part One: Mechanisms for Specify-
1935     ing and Describing the Format of Internet Message Bod-
1936     ies", RFC 1521, Bellcore, Innosoft, September 1993.
1937    
1938     [RFC1590] J. Postel, "Media Type Registration Procedure", RFC
1939     1590, USC/ISI, March 1994.
1940    
1941     [RFC1738] T. Berners-Lee, L. Masinter, and M. McCahill, "Uniform
1942     Resource Locators (URL)", RFC 1738, CERN, Xerox PARC,
1943     University of Minnesota, October 1994.
1944    
1945     [RFC1766] H. Alvestrand, "Tags for the Identification of
1946     Languages", RFC 1766, UNINETT, March 1995.
1947    
1948     [SQ91] SoftQuad, "The SGML Primer", 3rd ed., SoftQuad Inc.,
1949     1991.
1950    
1951     [TAKADA] Toshihiro Takada, "Multilingual Information Exchange
1952     through the World-Wide Web", Computer Networks and
1953     ISDN Systems, Vol. 27, No. 2, Nov. 1994, pp. 235-241.
1954    
1962     [TEI] TEI Guidelines for Electronic Text Encoding and
1963     Interchange. <http://etext.virginia.edu/TEI.html>
1964    
1965     [UNICODE] The Unicode Consortium, "The Unicode Standard --
1966     Worldwide Character Encoding -- Version 1.0", Addison-
1967     Wesley, Volume 1, 1991, Volume 2, 1992. The BIDI
1968     algorithm is in appendix A of volume 1, with correc-
1969     tions in appendix D of volume 2.
1970    
1971     [VANH90] E. van Herwijnen, "Practical SGML", Kluwer Academic
1972     Publishers Group, Norwell and Dordrecht, 1990.
1973    
1974     Authors' Addresses
1975    
1976     François Yergeau
1977     Alis Technologies
1978     3410, rue Griffith
1979     Montréal QC H4T 1A7
1980     Canada
1981    
1982     Tel: +1 (514) 738-9171
1983     Fax: +1 (514) 342-0318
1984     EMail: yergeau@alis.ca
1985    
1986    
1987     Gavin Thomas Nicol
1988     Electronic Book Technologies, Japan
1989     1-29-9 Tsurumaki,
1990     Setagaya-ku,
1991     Tokyo
1992     Japan
1993    
1994     Tel + Fax: +81-3-3706-7351
1995     EMail: gtn@ebt.com, gtn@twics.co.jp
1996    
1997    
1998     Glenn Adams
1999     Stonehand
2000     118 Magazine Street
2001     Cambridge, MA 02139
2002     U.S.A.
2003    
2004     Tel: +1 (617) 864-5524
2005     Fax: +1 (617) 864-4965
2006     EMail: glenn@stonehand.com
2007    
2008    
2009     Martin J. Duerst
2018     Multimedia-Laboratory
2019     Department of Computer Science
2020     University of Zurich
2021     Winterthurerstrasse 190
2022     CH-8057 Zurich
2023     Switzerland
2024    
2025     Tel: +41 1 257 43 16
2026     Fax: +41 1 363 00 35
2027     E-mail: mduerst@ifi.unizh.ch