Network Working Group                                         F. Yergeau
Internet Draft                                                  G. Nicol
<draft-ietf-html-i18n-01.txt>                                   G. Adams
Expires 30 March 1996                                          M. Duerst
                                                       25 September 1995


         Internationalization of the Hypertext Markup Language

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups. Note that other groups may also distribute working
documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months.
Internet-Drafts may be updated, replaced, or obsoleted by other
documents at any time. It is not appropriate to use Internet-Drafts as
reference material or to cite them other than as a "working draft" or
"work in progress".

To learn the current status of any Internet-Draft, please check the
1id-abstracts.txt listing contained in the Internet-Drafts Shadow
Directories on ds.internic.net (US East Coast), nic.nordu.net (Europe),
ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific Rim).

Distribution of this document is unlimited. Please send comments to the
HTML working group (HTML-WG) of the Internet Engineering Task Force
(IETF) at <html-wg@oclc.org>. Discussions of the group are archived at
URL: http://www.acl.lanl.gov/HTML_WG/archives.html.

Abstract

The Hypertext Markup Language (HTML) is a simple markup language used
to create hypertext documents that are platform independent. Initially,
the application of HTML on the World Wide Web was seriously restricted
by its reliance on the ISO-8859-1 coded character set, which is
appropriate only for Western European languages. Despite this
restriction, HTML has been widely used with other languages, using
other coded character sets or character encodings, through various ad
hoc extensions to the language.

This document addresses the internationalization of HTML by extending
the specification of HTML 2.0 and giving additional recommendations for
proper internationalization support. A foremost consideration is to
make sure that HTML remains a valid application of SGML, while enabling
its use in all languages of the world.

The "text/html; version=2.1" Internet Media Type [RFC1590] and MIME
Content Type [RFC1521] is defined by this specification, taken together
with the HTML 2.0 specification [HTML-2].

Table of contents

1. Introduction
   1.1. Scope
   1.2. Conformance
2. The document character set
   2.1. Reference processing model
   2.2. The HTML 2.1 document character set
   2.3. Undisplayable characters
3. Language tags
4. Additional entities, attributes and elements
   4.1. Full Latin-1 entity set
   4.2. Markup for language-dependent presentation
5. Forms
   5.1. DTD additions
   5.2. Form submission
6. Miscellaneous
7. HTML public text
   7.1. HTML DTD
   7.2. SGML declaration for HTML
   7.3. Entity sets
      7.3.1. ISO Latin 1 character entity set
Bibliography
Authors' Addresses

1. Introduction

The Hypertext Markup Language (HTML) is a simple markup language used
to create hypertext documents that are platform independent. Initially,
the application of HTML on the World Wide Web was seriously restricted
by its reliance on the ISO-8859-1 coded character set, which is
appropriate only for Western European languages. Despite this
restriction, HTML has been widely used with other languages, using
other coded character sets or character encodings, through various ad
hoc extensions to the language [TAKADA].

This document addresses the internationalization of HTML by extending
the specification of HTML 2.0 and giving additional recommendations for
proper internationalization support. It is based in good part on a
paper by one of the authors on multilingualism on the WWW [NICOL]. A
foremost consideration is to make sure that HTML remains a valid
application of SGML, while enabling its use in all languages of the
world.

The specific issues addressed are the SGML document character set to be
used for HTML, the proper treatment of the charset parameter associated
with the "text/html" content type, and the specification of language
tags and additional entities.

1.1. Scope

HTML has been in use by the World-Wide Web (WWW) global information
initiative since 1990. This specification extends the capabilities of
HTML 2.0 (RFC xxx), primarily by removing the restriction to the
ISO-8859-1 coded character set [ISO-8859-1]. Together with the HTML 2.0
specification, it defines a new version of HTML to be known as
"HTML 2.1".

HTML is an application of ISO Standard 8879:1986, Information
Processing Text and Office Systems -- Standard Generalized Markup
Language (SGML) [ISO-8879]. The HTML Document Type Definition (DTD) is
a formal definition of the HTML syntax in terms of SGML. This
specification amends the DTD of HTML 2.0 in order to make it applicable
to documents encompassing a character repertoire much larger than that
of ISO-8859-1, while still remaining SGML conformant.

Together with the HTML 2.0 specification, this specification also
defines HTML as an Internet Media Type [RFC1590] and MIME Content Type
[RFC1521] called "text/html", or "text/html; version=2.1". As such, it
defines the semantics of the HTML syntax and how that syntax should be
interpreted by user agents.

1.2. Conformance

This specification governs the syntax of HTML documents and aspects of
the behavior of HTML user agents.

1.2.1. Documents

A document is a conforming HTML document if:

* It is a conforming SGML document, and it conforms to the HTML DTD
  (see 7.1, "HTML DTD").

* It conforms to the application conventions in this specification.
  For example, the value of the HREF attribute of the <A> element must
  conform to the URI syntax.

1.2.2. User agents

An HTML user agent conforms to this specification if:

* It parses the characters of an HTML document into data characters
  and markup according to SGML [ISO-8879].

    NOTE -- In the interest of robustness and extensibility, there
    are a number of widely deployed conventions for handling
    non-conforming documents. See section 4.2.1 of the HTML 2.0
    specification [HTML-2], "Undeclared Markup Error Handling", for
    details.

* It supports at least the ISO-8859-1 character encoding scheme and
  processes each character in the ISO Latin Alphabet No. 1 as
  specified in section 6.1 of [HTML-2].

  To ensure interoperability and proper support for at least
  ISO-8859-1 in an environment where character encoding schemes other
  than ISO-8859-1 are present, user agents must correctly interpret
  the charset parameter accompanying an HTML document received from
  the network.

  Furthermore, conforming user agents are required at least to parse
  correctly numeric character references within the range of the
  Basic Multilingual Plane (BMP) of ISO 10646-1 [ISO-10646].

    NOTE -- To support non-western writing systems, HTML user agents
    are encouraged to support "UNICODE-1-1" or similar character
    encoding schemes and as much of the character repertoire of
    [ISO-10646] as is practical.

* It behaves identically for documents whose parsed token sequences
  are identical.

  For example, comments and the whitespace in tags disappear during
  tokenization, and hence they do not influence the behavior of
  conforming user agents.

* It allows the user to traverse (or at least attempt to traverse,
  resources permitting) all hyperlinks from <A> elements in an HTML
  document.

An HTML user agent is a level 2 user agent if, additionally:

* It allows the user to express all form field values specified in an
  HTML document and to (attempt to) submit the values as requests to
  information services.

2. The document character set

2.1. Reference processing model

This overview explains the reference processing model used for HTML
2.1, and in particular the SGML concept of a document character set. An
actual implementation may differ widely in its internal workings from
the model given below, but should behave as described to an outside
observer.

Because there are various widely differing encodings of text, SGML does
not directly address the question of how characters are encoded, e.g.
in a file. SGML views the characters as a single set (called a
"character repertoire"), together with a "code set" that assigns an
integer number (known as a "character number") to each character in the
repertoire. The document character set declaration defines what each of
the character numbers represents [GOLD90, p. 451]. In most cases, an
SGML DTD and all documents that refer to it have a single document
character set, and all markup and data characters are part of this set.

HTML, as an application of SGML, does not directly address the question
of how characters are encoded as octets in external representations
such as files. This is deferred to mechanisms external to HTML, such as
the HTTP protocol, or MIME for electronic mail.

For the HTTP protocol [HTTP], the way characters are encoded is defined
by the "charset" parameter[1] of the "Content-Type" field of the header
of an HTTP response. For example, to indicate that the transmitted
document is encoded in the "JIS" encoding of Japanese [RFC1468], the
header will contain the following line:

    Content-Type: text/html; charset=ISO-2022-JP

_________________________
[1] The term "charset" in MIME is used to designate a character
encoding, rather than a coded character set as the term may suggest. A
character encoding is a mapping (possibly many-to-one) of a sequence of
octets to a sequence of characters taken from one or more character
repertoires. A coded character set is a mapping between individual bit
patterns and individual characters from a single character repertoire.

The default charset parameter in the case of the HTTP protocol is
ISO-8859-1 (the so-called "Latin-1" for Western European characters).
The HTTP protocol also defines a mechanism for the client to specify
the character encodings it can accept. Clients and servers are strongly
requested to use these mechanisms to assure correct transmission and
interpretation of any document. Provisions that can be taken to help
correct interpretation, even in cases where a server or client does not
yet use these mechanisms, are described in section 6.

Similarly, if HTML documents are transferred by electronic mail, the
character encoding is defined by the "charset" parameter of the
"Content-Type" MIME header line [RFC1521].
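
As a minimal sketch of how a receiver might apply the rules above, the
following Python function extracts the charset parameter from a
Content-Type value and falls back to ISO-8859-1 when the parameter is
absent. The function name charset_of is an assumption made for
illustration and is not part of this specification; the parsing relies
on the email.message module of the Python standard library, which
reports the charset in lower case.

```python
from email.message import Message


def charset_of(content_type: str, default: str = "iso-8859-1") -> str:
    """Return the charset parameter of a Content-Type value.

    Hypothetical helper: falls back to ISO-8859-1, the HTTP default
    named in the text, when no charset parameter is present.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value and returns the
    # fallback object when the parameter is missing.
    return msg.get_content_charset(default)


print(charset_of("text/html; charset=ISO-2022-JP"))  # iso-2022-jp
print(charset_of("text/html"))                       # iso-8859-1
```

The same function serves for the MIME header line of a mail message,
since both carry the parameter in the same syntax.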

If any other way of transferring and storing HTML documents is defined
or becomes popular, similar provisions should be made to identify
clearly the character encoding used and/or to use a single default
encoding capable of representing the widest range of characters used in
an international context.

Whatever the external character encoding may be, the reference
processing model translates it to a representation of the document
character set specified in Section 2.2 before processing specific to
SGML/HTML. The reference processing model can be depicted as follows:

    [resource]->[decoder]->[entity ]->[ SGML ]->[application]->[display]
                           [manager]  [parser]
                               ^          |
                               |          |
                               +----------+

The decoder is responsible for decoding the external representation of
the resource to a representation using the document character set. The
entity manager, the parser, and the application deal only with
characters of the document character set. A display-oriented part of
the application or the display machinery itself may again convert
characters represented in the document character set to some other
representation more suitable for their purpose. In any case, the entity
manager, the parser, and the application, as far as character semantics
are concerned, are using the HTML 2.1 document character set only.
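
The decoder stage can be sketched in a few lines of Python. This is an
illustration under stated assumptions, not a prescribed implementation:
the function name is hypothetical, Python's built-in codecs stand in
for the decoder, and the result is a list of character numbers in the
document character set.

```python
from typing import List


def decode_to_document_charset(octets: bytes, charset: str) -> List[int]:
    """Turn an external representation into document character numbers.

    Hypothetical decoder stage: `charset` names the external character
    encoding; everything after this step sees character numbers only.
    """
    text = octets.decode(charset)
    return [ord(ch) for ch in text]


# The same character arrives as different octets in different external
# encodings, yet yields the same character number after decoding.
print(decode_to_document_charset(b"\xe9", "iso-8859-1"))  # [233]
print(decode_to_document_charset(b"\xc3\xa9", "utf-8"))   # [233]
```

Because later stages receive character numbers rather than octets, they
need no knowledge of the encoding in which the resource was transferred.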

An actual implementation may or may not choose to translate the
document into some encoding of the document character set as described
above; the behavior described by this reference processing model can be
achieved otherwise. This subject is well outside the scope of this
specification, however, and the reader is invited to consult the SGML
standard [ISO-8879] or an SGML handbook [BRYAN88] [GOLD90] [VANH90]
[SQ91] for further information.

The most important consequence of this reference processing model is
that numeric character references are always resolved to the same
characters, whatever the external encoding actually used. For an
example, see Section 2.2.

2.2. The HTML 2.1 document character set

The document character set, in the SGML sense, of HTML 2.1 is the Basic
Multilingual Plane of ISO 10646:1993 [ISO-10646], also known as UCS-2.
This is code-by-code identical with the Unicode standard [UNICODE]. The
adoption of this document character set implies a change in the SGML
declaration specified in the HTML 2.0 specification (section 9.5 of
[HTML-2]). The change amounts to removing the two BASESET
specifications and their accompanying DESCSET declarations, replacing
them with the following declaration:

    BASESET  "ISO Registration Number 176//CHARSET
              ISO/IEC 10646-1:1993 UCS-2 with implementation level 3
              //ESC 2/5 2/15 4/5"
    DESCSET    0      9   UNUSED
               9      2   9
              11      2   UNUSED
              13      1   13
              14     18   UNUSED
              32     95   32
             127      1   UNUSED
             128     32   UNUSED
             160  65374   160
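
Each DESCSET row gives a first character number, a count, and either
the base set number the range maps to or UNUSED. As a reading aid, the
rows can be transcribed into a small Python table and queried; the
names DESCSET and is_document_character are assumptions made for this
sketch.

```python
# (first character number, count, base number) rows copied from the
# DESCSET declaration; None stands for UNUSED.
DESCSET = [
    (0, 9, None),
    (9, 2, 9),
    (11, 2, None),
    (13, 1, 13),
    (14, 18, None),
    (32, 95, 32),
    (127, 1, None),
    (128, 32, None),
    (160, 65374, 160),
]


def is_document_character(number: int) -> bool:
    """True if the declaration assigns a character to this number."""
    for first, count, base in DESCSET:
        if first <= number < first + count:
            return base is not None
    return False  # beyond the last row (65534 and above)


print(is_document_character(9))      # True  (horizontal tab)
print(is_document_character(127))    # False (UNUSED)
print(is_document_character(65533))  # True  (last number in the last row)
print(is_document_character(65534))  # False
```

The last row starts at 160 and covers 65374 numbers, so it ends at
160 + 65374 - 1 = 65533.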

Making UCS-2 the document character set does not create non-conformance
of any expression, construct or document that is conforming to HTML
2.0. It does make conforming certain constructs that are not admissible
in HTML 2.0. One consequence is that data characters outside the
repertoire of ISO-8859-1, but within that of UCS-2, become valid SGML
characters. Another is that the upper limit of the range of numeric
character references is extended from 255 to 65533[2]; thus, &#1048; is
a valid reference to a "CYRILLIC CAPITAL LETTER I". [ERCS] is a good
source of information on Unicode and SGML, although its scope and
technical content differ greatly from those of this specification.

_________________________
[2] 65533 (FFFD hexadecimal) is the last valid character in UCS-2.
65534 (FFFE hexadecimal) is unassigned and reserved as the byte-swapped
version of ZERO WIDTH NO-BREAK SPACE for byte-order detection purposes.
65535 (FFFF hexadecimal) is unassigned.
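
To illustrate that numeric character references depend only on the
document character set, the Python sketch below resolves decimal
references without consulting the external encoding at all. The
function name and the choice to leave out-of-range references untouched
are assumptions of this example; the specification does not prescribe
error handling here.

```python
import re

MAX_REFERENCE = 65533  # FFFD hexadecimal, the upper limit stated above


def resolve_numeric_references(text: str) -> str:
    """Replace decimal references such as &#1048; by their characters."""

    def replace(match):
        number = int(match.group(1))
        if number > MAX_REFERENCE:
            # Assumption: an out-of-range reference is left as written.
            return match.group(0)
        return chr(number)

    return re.sub(r"&#([0-9]+);", replace, text)


resolved = resolve_numeric_references("&#1048;")
print(hex(ord(resolved)))  # 0x418, CYRILLIC CAPITAL LETTER I
```

The result is the same whether the document arrived as ISO-8859-1,
ISO-2022-JP or any other encoding, because the reference is resolved
after decoding.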

ISO 10646-1:1993 is the most encompassing character set currently
existing, and there is no other character set that could take its place
as the document character set for HTML 2.1. It is also expected that
this specification may be extended along with future extensions of ISO
10646. If a specific application nevertheless needs to use characters
outside this standard, it should avoid any conflict with present or
future versions of ISO 10646, i.e. by assigning these characters to a
private zone. It should also be borne in mind that such use will be
highly unportable; in many cases, it may be better to use inline
bitmaps.

2.3. Undisplayable characters

With the document character set being the full ISO 10646 BMP, the
possibility that a character cannot be displayed due to lack of
appropriate resources (fonts) cannot be avoided. Because there are many
different things that can be done in such a case, this document does
not recommend any specific behavior. Depending on the implementation,
this may also be handled by the underlying display system and not the
application itself. The following considerations, however, may be of
help:

- A clearly visible, but unobtrusive behavior should be preferred.
  Some documents may contain many characters that cannot be rendered,
  and so showing an alert for each of them is not the right thing to
  do.

- In case a numeric representation of the missing character is given,
  its hexadecimal (not decimal) form is to be preferred, because this
  form is used in character set standards [ERCS].
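
One possible reading of these two considerations is sketched below in
Python. It is only an example: the bracketed placeholder format, the
function name and the idea of passing the displayable characters as a
set are all assumptions, since this document deliberately recommends no
specific behavior.

```python
def render_with_placeholders(text: str, displayable: set) -> str:
    """Show undisplayable characters as inline hexadecimal placeholders.

    `displayable` is a hypothetical stand-in for whatever the display
    system can draw (for example, the characters covered by its fonts).
    """
    parts = []
    for ch in text:
        if ch in displayable:
            parts.append(ch)
        else:
            # Hexadecimal, not decimal, as suggested above; the
            # bracket notation is an assumption of this sketch.
            parts.append("[%04X]" % ord(ch))
    return "".join(parts)


print(render_with_placeholders("a\u0418", {"a"}))  # a[0418]
```

An inline placeholder stays visible without interrupting the user,
unlike an alert raised for every missing character.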

3. Language tags

Language tags can be used to control rendering of a marked up document
in various ways: character disambiguation, in cases where the character
encoding is not sufficient to resolve to a specific glyph; quotation
marks; hyphenation; ligatures; spacing; voice synthesis; etc.
Independently of rendering issues, language markup is useful as content
markup for purposes such as classification and searching.

The language attribute, LANG, takes as its value a language tag that
identifies a natural language spoken, written, or otherwise conveyed by
human beings for communication of information to other human beings.
Computer languages are explicitly excluded.

The syntax and registry of HTML language tags are the same as those
defined by RFC 1766 [RFC1766]. In summary, a language tag is composed
of one or more parts: a primary language tag and a possibly empty
series of subtags:

    language-tag = primary-tag *( "-" subtag )
    primary-tag  = 1*8ALPHA
    subtag       = 1*8ALPHA

Whitespace is not allowed within the tag and all tags are
case-insensitive. The namespace of language tags is administered by the
IANA. Example tags include:

    en, en-US, en-cockney, i-cherokee, x-pig-latin

Two-letter primary-tags are reserved for ISO 639 language abbreviations
[ISO-639], and three-letter primary-tags for the language abbreviations
of the "Ethnologue" [ETHNO] (the latter is in addition to the
requirements of RFC 1766). Any two-letter initial subtag is an ISO 3166
country code [ISO-3166].
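
The grammar above translates directly into a regular expression. The
Python sketch below checks syntax only; it does not consult the IANA
registry, ISO 639 or ISO 3166, and the names LANGUAGE_TAG and
is_language_tag are chosen for this example.

```python
import re

# primary-tag and subtag are both 1*8ALPHA; parts are joined by "-".
LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")


def is_language_tag(tag: str) -> bool:
    """True if `tag` matches the language-tag grammar shown above."""
    return LANGUAGE_TAG.fullmatch(tag) is not None


for tag in ["en", "en-US", "en-cockney", "i-cherokee", "x-pig-latin"]:
    print(tag, is_language_tag(tag))   # each prints True

print(is_language_tag("en US"))        # False: whitespace not allowed
print(is_language_tag("en-"))          # False: empty subtag
```

Since tags are case-insensitive, comparisons between tags should be
made on a case-folded form, for example after calling lower() on both.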

In the context of HTML, a language tag is not to be interpreted as a
single token, as per RFC 1766, but as a hierarchy. For example, a user
agent that adjusts rendering according to language should consider that
it has a match when a language tag in a style sheet entry matches the
initial portion of the language tag of an element. An exact match
should be preferred. This interpretation allows an element marked up
as, for instance, "en-US" to trigger styles corresponding to, in order
of preference, US-English ("en-US") or 'plain' or 'international'
English ("en").

    NOTE -- using the language tag as a hierarchy does not imply that
    all languages with a common prefix will be understood by those
    fluent in one or more of those languages; it simply allows the
    user to request this commonality when it is true for that user.
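
The hierarchical match can be sketched as a prefix comparison on whole
subtags, preferring the longest (and therefore an exact) match. The
function below is an illustration in Python; its name and the
representation of style sheet entries as a plain list of tags are
assumptions.

```python
from typing import Iterable, Optional


def best_match(element_tag: str, entry_tags: Iterable[str]) -> Optional[str]:
    """Pick the style sheet entry tag that best matches an element's tag.

    An entry matches when its parts equal the initial parts of the
    element's tag; the entry with the most parts wins, so an exact
    match is preferred over a shorter prefix.
    """
    wanted = element_tag.lower().split("-")
    best, best_len = None, 0
    for entry in entry_tags:
        parts = entry.lower().split("-")
        if wanted[:len(parts)] == parts and len(parts) > best_len:
            best, best_len = entry, len(parts)
    return best


print(best_match("en-US", ["fr", "en", "en-US"]))  # en-US
print(best_match("en-US", ["fr", "en"]))           # en
print(best_match("en-US", ["fr"]))                 # None
```

Comparing whole subtags rather than raw string prefixes keeps an entry
such as "e" from matching "en-US".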

Since any text can logically be assigned a language, almost all HTML
elements admit the LANG attribute. The DTD reflects this. It is also
intended that any new element introduced in later versions of HTML will
admit the LANG attribute, unless there is a good reason not to do so.

The rendering of elements is meant to be controlled (in part) by the
LANG attribute. Specific user preferences set within the browser should
override the value of the LANG attribute, which in turn overrides the
value specified by the LANG attribute of any enclosing element. If none
of these are set, a suitable default, perhaps controlled by the user's
locale, should be used to control rendering.
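
The order of precedence described above can be written as a short
lookup. The Python sketch below is one interpretation: the function
name, the parameter names and the representation of the element's
ancestry as a list are assumptions made for this example.

```python
from typing import Optional, Sequence


def effective_language(
    user_preference: Optional[str],
    lang_chain: Sequence[Optional[str]],
    locale_default: str,
) -> str:
    """Choose the language that controls rendering of an element.

    `lang_chain` holds the LANG values of the element itself and then
    of each enclosing element, innermost first; None means "not set".
    """
    if user_preference:
        return user_preference      # user preference overrides LANG
    for lang in lang_chain:
        if lang:
            return lang             # nearest LANG attribute wins
    return locale_default           # nothing set: fall back to default


print(effective_language(None, [None, "fr", "en"], "de"))  # fr
print(effective_language("ja", ["fr"], "de"))              # ja
print(effective_language(None, [None, None], "de"))        # de
```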

4. Additional entities, attributes and elements

4.1. Full Latin-1 entity set

Following the suggestion of section 14 of [HTML-2], the set of Latin-1
entities is extended to cover the whole right part of ISO-8859-1. The
names of the entities are taken from the appendices of [SGML]. A list
is provided in section 7.3.1 of this specification.

4.2. Markup for language-dependent presentation

For the correct presentation of text from certain languages
(irrespective of formatting issues), some support in the form of
additional entities and elements is needed. In particular,
bidirectional text (BIDI for short) requires markup in special
circumstances where ambiguities as to the directionality of some
characters have to be resolved. Plain text may contain this markup in
the form of special-purpose characters; in HTML, these are replaced by
SGML markup to be described below.

This markup affects the ability to render BIDI text in a semantically
legible fashion. That is, without this special BIDI markup, cases arise
which would prevent *any* rendering whatsoever that reflected the basic
meaning of the text. It is for this reason that these special
characters were added to Unicode (and, thence, to ISO/IEC 10646). If it
were possible to do reliable layout and rendering of bidirectional text
without them, they definitely would not have been included in Unicode.

First, a set of named character entities is added that allows partial
support of the Unicode bidirectional algorithm [UNICODE], plus some
help with languages requiring contextual analysis for rendering:

    <!ENTITY zwnj CDATA "&#8204;"--=zero width non-joiner-->
    <!ENTITY zwj  CDATA "&#8205;"--=zero width joiner-->
    <!ENTITY lrm  CDATA "&#8206;"--=left-to-right mark-->
    <!ENTITY rlm  CDATA "&#8207;"--=right-to-left mark-->

The first two, zwnj and zwj, are used to force or block joining
behavior in contexts in which joining would occur but should not, or
would not occur but should. For example, ARABIC LETTER HEH is used to
abbreviate "Hijri" (the Islamic calendrical system); however, the
isolated form of HEH looks like the digit five as employed in Arabic
script (actually based on Indic digits). In order to prevent one from
reading HEH as a final digit five in a year, the initial form of HEH is
used. However, there is no following context (i.e., a joining letter)
to which the HEH can join. Therefore, the ZWJ is used to provide that
context. In Farsi texts, there are cases where a letter that normally
would join a subsequent letter in a cursive connection does not. Here
the ZWNJ is used.

The other two, lrm and rlm, are used to disambiguate the directionality
of directionally neutral characters. For example, when a double quote
sits between an Arabic and a Latin letter, the direction to which the
quote resolves is ambiguous. These characters are like zero width
spaces which have a directional property (but no word/line break
property).
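
For readers who want to see the four entities as data, the Python
sketch below maps each name to its character number (8204 to 8207, that
is U+200C to U+200F in Unicode) and expands references to them. The
table name and function name are assumptions of this example, and
entities other than these four are left untouched.

```python
import re

# Entity name -> character number in the document character set.
BIDI_ENTITIES = {
    "zwnj": 8204,  # ZERO WIDTH NON-JOINER
    "zwj": 8205,   # ZERO WIDTH JOINER
    "lrm": 8206,   # LEFT-TO-RIGHT MARK
    "rlm": 8207,   # RIGHT-TO-LEFT MARK
}


def expand_bidi_entities(text: str) -> str:
    """Replace &zwnj; &zwj; &lrm; and &rlm; by their characters."""

    def replace(match):
        name = match.group(1)
        if name in BIDI_ENTITIES:
            return chr(BIDI_ENTITIES[name])
        return match.group(0)  # not one of the four: leave as written

    return re.sub(r"&([A-Za-z]+);", replace, text)


expanded = expand_bidi_entities("x&zwj;y")
print([hex(ord(ch)) for ch in expanded])  # ['0x78', '0x200d', '0x79']
```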
572 |
|
|
|
573 |
|
|
Next, an attribute called DIR is introduced, restricted to the values |
574 |
|
|
LTR and RTL and admitted by most elements. On block-type elements, |
575 |
|
|
the DIR attribute indicates the base directionnality of the text in |
576 |
|
|
the block; if omitted it is inherited from the parent element. On |
577 |
|
|
inline elements, it makes the element start a new embedding level; if |
578 |
|
|
omitted the inline element does not start a new embedding level. |
579 |
|
|
Embedding is used to handle nested directional runs; a common need |
580 |
|
|
for the embedding characters is to handle text that has been pasted |
581 |
|
|
from one bidi context to another, and the possibility of multiply |
582 |
|
|
embedded pastings. Following is an example of a case where embedding |
583 |
|
|
is needed, showing its effect: |

   Given the following Latin (upper case) and Arabic (lower case)
   letters in backing store with the specified embeddings (LRE is
   shorthand for <SPAN DIR=LTR>, RLE for <SPAN DIR=RTL> and PDF for
   </SPAN>):

      LRE A B RLE a b LRE C D PDF c d PDF E F PDF

   One gets the following rendering (with [] showing the directional
   transitions):

      [ A B [ d c [ C D ] b a ] E F ]

   On the other hand, without these characters, e.g., with

      A B a b C D c d E F

   and a base level of LTR, one gets the following rendering:

      [ A B [ b a ] C D [ d c ] E F ]

   Notice that b,a is on the left and d,c on the right, unlike the
   above case where the embedding levels are used. Without the
   embedding characters one has at most two levels: a base
   directional level and a single counterflow directional level.
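
The example above can be reproduced with a small sketch. It is a deliberately simplified model, not the full bidirectional algorithm: it assumes only strong characters (upper case standing for Latin, lower case for Arabic, as in the example), ignores neutrals, numbers and depth limits, and the function names `resolve_levels` and `reorder` are illustrative only.

```python
def resolve_levels(tokens, base_level=0):
    """Assign an embedding level to each letter in backing-store order.

    LRE/RLE push the current level and move to the next greater even/odd
    level; PDF pops. Upper case is treated as left-to-right, lower case
    as right-to-left (a convention of this sketch only).
    """
    stack = []
    level = base_level
    pairs = []
    for tok in tokens:
        if tok == "LRE":
            stack.append(level)
            level = level + 2 - (level % 2)      # next greater even level
        elif tok == "RLE":
            stack.append(level)
            level = level + 1 + (level % 2)      # next greater odd level
        elif tok == "PDF":
            if stack:
                level = stack.pop()
        else:
            is_rtl = tok.islower()
            char_level = level
            # A letter running against the current direction forms a
            # counterflow run one level deeper.
            if is_rtl != (level % 2 == 1):
                char_level += 1
            pairs.append((tok, char_level))
    return pairs


def reorder(pairs):
    """Return letters in visual order by reversing runs, highest level first."""
    chars = [c for c, _ in pairs]
    levels = [l for _, l in pairs]
    odd_levels = [l for l in levels if l % 2 == 1]
    if not odd_levels:
        return chars
    for lvl in range(max(levels), min(odd_levels) - 1, -1):
        i = 0
        while i < len(chars):
            if levels[i] >= lvl:
                j = i
                while j < len(chars) and levels[j] >= lvl:
                    j += 1
                chars[i:j] = chars[i:j][::-1]    # reverse the whole run
                i = j
            else:
                i += 1
    return chars


embedded = "LRE A B RLE a b LRE C D PDF c d PDF E F PDF".split()
plain = "A B a b C D c d E F".split()
print(" ".join(reorder(resolve_levels(embedded))))  # A B d c C D b a E F
print(" ".join(reorder(resolve_levels(plain))))     # A B b a C D d c E F
```

With the embeddings, three levels above the base are in play; without them, only the base level and one counterflow level exist, which is why the two renderings differ.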

Expires 30 March 1996 [Page 11]

Internet Draft HTML internationalization 25 September 1995

   A directional override feature is needed to deal with unusual
   pieces of text in which directionality cannot be resolved from
   context in an unambiguous fashion. For example, in part numbers,
   formulas, telephone numbers, and other similar pieces of text, it
   is difficult or impossible to derive the directionality of
   numbers, punctuation, and other neutrals from their context. To
   this effect, a new element called BDO (BIDI override) is
   introduced, which requires the DIR attribute to specify whether
   the override is left-to-right or right-to-left.

   A few additional elements are important to have for proper
   language-dependent rendering. First, a generic container is needed
   to carry the LANG and DIR attributes in cases where no other
   element is appropriate; the SPAN element is introduced for that
   purpose.

   Short quotations, and in particular the quotation marks
   surrounding them, are typically rendered differently in different
   languages and on platforms with different graphic capabilities:
   "a quotation in English", `another, slightly better one', ,,a
   quotation in German", << a quotation in French >>. The <Q>
   element is introduced for that purpose.

   Many languages, including English, require superscripts for
   proper rendering: "the XXth century" should have "th" in
   superscript. The <SUP> element, and its sibling <SUB>, are
   introduced to allow proper markup of such text. <SUP> and <SUB>
   contents are restricted to PCDATA to avoid nesting problems.

   Finally, in many languages text justification is much more
   important than it is in Western languages, and justifies markup.
   The ALIGN attribute, admitting values of LEFT, RIGHT, CENTER and
   JUSTIFY, is added to a selection of block-like elements where it
   makes sense.

5. Forms

5.1. DTD additions

It is natural to expect input in any language in forms, as they
provide one of the only ways of obtaining user input. While this is
primarily a UI issue, there are some things that should be specified
at the HTML level to guide behavior and promote interoperability.

To ensure interoperability, it is necessary for the user agent (and
the user) to have an indication of the character set(s) that the
server providing a form will be able to handle upon submission of the
filled-in form. Such an indication is provided by the ACCEPT-CHARSET
attribute of the FORM element, modeled on the HTTP Accept-Charset
header (see [HTTP]), which contains a space and/or comma delimited
list of character sets acceptable to the server. A user agent may
want to advise the user of the contents of this attribute, or to
prevent the user from entering unacceptable characters.

   NOTE -- The list of character sets is to be interpreted as an
   EXCLUSIVE-OR list; the server announces that it is ready to
   accept any ONE of these character encoding schemes for each part
   of a multipart entity.
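
As an illustration of how a user agent might use the attribute, the sketch below splits an ACCEPT-CHARSET value on spaces and/or commas and picks the first listed character set able to represent the text entered. The function names are hypothetical, and mapping the listed names onto Python codec names is an assumption of this sketch, not something the specification defines.

```python
import codecs
import re


def parse_accept_charset(value):
    """Split an ACCEPT-CHARSET value on spaces and/or commas."""
    return [name for name in re.split(r"[\s,]+", value.strip()) if name]


def pick_charset(accept_charset, text):
    """Return the first listed charset that can encode all of text, else None.

    Exactly one charset is chosen, in line with the exclusive-or
    reading of the list.
    """
    for name in parse_accept_charset(accept_charset):
        try:
            codecs.lookup(name)   # raises LookupError for unknown names
            text.encode(name)     # raises UnicodeEncodeError if not representable
        except (LookupError, UnicodeEncodeError):
            continue
        return name
    return None


accepted = "ISO-8859-1, ISO-2022-JP"
print(parse_accept_charset(accepted))
print(pick_charset(accepted, "caf\u00e9"))
print(pick_charset(accepted, "\u65e5\u672c\u8a9e"))
```

A user agent that gets `None` back could warn the user or refuse the offending characters, as suggested above.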

5.2. Form submission

The HTML 2.0 form submission mechanism, based on the
"application/x-www-form-urlencoded" media type, is hopelessly broken
with regard to internationalization. In fact, since URLs are
restricted to ASCII characters, the mechanism is broken even for
ISO-8859-1 text. Section 2.2 of [RFC1738] specifies that octets may
be encoded using the "%HH" notation, but text submitted from a form
is composed of characters, not octets. Lacking a specification of a
character encoding scheme, the "%HH" notation has no meaning.
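
The ambiguity can be seen in a short sketch using Python's standard `urllib.parse` module. The helper names are illustrative, and the choice of ISO-8859-1 versus ISO-8859-2 is only an example pair of encodings: the same escaped octet stands for two different characters depending on which one the receiver assumes.

```python
from urllib.parse import quote, unquote


def encode_value(text, charset):
    """Encode text to octets in the given charset, then apply %HH escaping."""
    return quote(text, safe="", encoding=charset)


def decode_value(escaped, charset):
    """Undo %HH escaping, then interpret the octets in the given charset."""
    return unquote(escaped, encoding=charset)


escaped = encode_value("\u00f5", "iso-8859-1")   # LATIN SMALL LETTER O WITH TILDE
print(escaped)                                    # %F5
print(decode_value(escaped, "iso-8859-1") == "\u00f5")   # True
print(decode_value(escaped, "iso-8859-2") == "\u0151")   # True: O WITH DOUBLE ACUTE
```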

A partial solution to this sorry state of affairs is to specify a
default character encoding scheme to be assumed when the GET method
of form submission is used. Specifying UCS-2 would break all
existing forms, so the only sensible way is to designate ISO-8859-1.
That is, the encoded URL sent to submit a form by the GET method is
to be interpreted as a sequence of single-octet characters encoded
according to ISO-8859-1, and further encoded according to the scheme
of [RFC1738] (the "%HH" notation). This is clearly insufficient, so
the GET method of form submission is deprecated and should not be
used in future documents, despite the language of section XX of
[HTML-2].

A better solution is to add a MIME charset parameter to the
Content-Type header sent along with a POST method form submission,
with the understanding that the URL encoding of [RFC1738] is applied
on top of the specified character encoding, as a kind of implicit
Content-Transfer-Encoding. The default ISO-8859-1 is to be implied
in the absence of a charset parameter.

The best solution is to use the "multipart/form-data" media type
described in [FILE-UPLOAD] with the POST method of form submission.
This mechanism encapsulates the value part of each name-value pair in
a body-part of a multipart MIME body that is sent as the HTTP entity;
each body part can be labeled with an appropriate Content-Type,
including if necessary a charset parameter that specifies the
character encoding scheme. The changes to the DTD necessary to
support this method of form submission have been incorporated in the
DTD included in this specification.
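
A minimal sketch of such an entity body follows. It is hand-built for illustration only: the function name, the fixed boundary string and the use of text/plain for every part are assumptions of this sketch, and a real user agent would also have to pick a boundary that does not occur in the data.

```python
def build_multipart(fields, boundary="AaB03x"):
    """Build a multipart/form-data body.

    fields is a list of (name, value, charset) tuples; each value is
    encoded in its own charset and labeled accordingly.
    """
    delimiter = b"--" + boundary.encode("ascii")
    parts = []
    for name, value, charset in fields:
        headers = (
            f'Content-Disposition: form-data; name="{name}"\r\n'
            f"Content-Type: text/plain; charset={charset}\r\n"
            "\r\n"
        ).encode("ascii")
        parts.append(delimiter + b"\r\n" + headers + value.encode(charset) + b"\r\n")
    parts.append(delimiter + b"--\r\n")   # closing delimiter
    return b"".join(parts)


body = build_multipart([
    ("city", "Montr\u00e9al", "ISO-8859-1"),
    ("name", "\u65e5\u672c", "ISO-2022-JP"),
])
print(body.decode("ascii", "replace"))
```

Because each part carries its own charset label, the receiver never has to guess how the octets of a value map to characters.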

How the user agent determines the encoding of the text entered by the
user is outside the scope of this specification.

6. Miscellaneous

Proper interpretation of a text document requires that the character
encoding scheme be known. Current HTTP servers, however, do not
generally include an appropriate charset parameter with the
Content-Type header, even when the encoding scheme is different from
the default ISO-8859-1. This is bad behaviour, and as such strongly
discouraged, but some preventive measures can be taken to minimize
the detrimental effects.

In the case where a document is accessed from a hyperlink in an
origin HTML document, a CHARSET attribute is added to the attribute
list of elements with link semantics (A and LINK), specifically by
adding it to the linkExtraAttributes entity. The value of that
attribute is to be considered a hint to the User Agent as to the
character encoding scheme used by the resource pointed to by the
hyperlink; it should be the appropriate value of the MIME charset
parameter for that resource.

In any document, it may be wise to include an indication of the
encoding scheme like the following, as early as possible within the
HEAD of the document:

   <META HTTP-EQUIV="Content-Type"
         CONTENT="text/html; charset=ISO-2022-JP">

This is not foolproof, but will work if the encoding scheme is such
that ASCII characters stand for themselves at least until the META
element is parsed.
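
One way a user agent could look for such a META element is to scan the first octets of the document as if they were ASCII. The sketch below does this with a regular expression; the function name, the 1024-octet limit and the pattern itself are assumptions made for illustration, not a parsing rule defined here.

```python
import re

# Matches <META HTTP-EQUIV="Content-Type" CONTENT="...charset=NAME...">
# with the two attributes in that order; case is ignored.
META_CHARSET = re.compile(
    rb'<META\s+HTTP-EQUIV\s*=\s*"?Content-Type"?\s+'
    rb'CONTENT\s*=\s*"[^">]*charset\s*=\s*([^\s";>]+)',
    re.IGNORECASE,
)


def sniff_meta_charset(raw, limit=1024):
    """Return the charset named in an early META element, or None."""
    match = META_CHARSET.search(raw[:limit])
    if match is None:
        return None
    return match.group(1).decode("ascii", "replace")


document = (
    b"<HEAD>\n"
    b'<META HTTP-EQUIV="Content-Type"\n'
    b'      CONTENT="text/html; charset=ISO-2022-JP">\n'
    b"</HEAD>\n"
)
print(sniff_meta_charset(document))   # ISO-2022-JP
```

The scan only works under the condition stated above: the octets up to and including the META element must read as ASCII.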

For definiteness, the "charset" parameter received from the source of
the document should be considered the most authoritative, followed in
order of preference by the contents of a META element such as the
above, and finally the CHARSET attribute of the anchor that was
followed (if any).
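
The order of preference can be written down as a small function. This is only a sketch: the function and parameter names are hypothetical, and falling back to ISO-8859-1 when nothing is known follows the default mentioned earlier in this section.

```python
DEFAULT_CHARSET = "ISO-8859-1"


def effective_charset(http_charset=None, meta_charset=None, link_charset=None):
    """Pick the charset to use, most authoritative source first.

    http_charset: charset parameter received with the document
    meta_charset: charset found in a META element of the document
    link_charset: CHARSET attribute of the anchor that was followed
    """
    for candidate in (http_charset, meta_charset, link_charset):
        if candidate:
            return candidate
    return DEFAULT_CHARSET


print(effective_charset(meta_charset="ISO-2022-JP", link_charset="ISO-8859-5"))
print(effective_charset())
```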

When HTML text is transmitted directly in UCS-2
(charset=UNICODE-1-1), the question of byte order arises: does the
high-order byte of each two-byte character come first or second? For
definiteness, this specification recommends that UCS-2 be transmitted
in big-endian byte order (high-order byte first), which corresponds
both to the established network byte order for two-byte quantities
and to the Unicode recommendation for serialized text data.
Furthermore, to maximize chances of proper interpretation, it is
recommended that documents transmitted as UCS-2 always begin with a
ZERO WIDTH NO-BREAK SPACE character (hexadecimal FEFF) which, when
byte-reversed, becomes FFFE, a value guaranteed never to be assigned
to a character. Thus, a user agent receiving FFFE as the first two
octets of a text would know that bytes have to be reversed for the
remainder of the text.
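
The check described above might look as follows. The function name is illustrative, and Python has no codec called UCS-2, so this sketch uses the UTF-16 codecs, which decode UCS-2 text identically as long as no surrogate values are involved; that substitution is an assumption of the sketch.

```python
def decode_ucs2(data):
    """Decode two-byte text, using a leading FEFF/FFFE to settle byte order.

    Without a leading mark, big-endian order is assumed, as recommended.
    """
    if data[:2] == b"\xff\xfe":
        # FEFF arrived byte-reversed: the sender used little-endian order.
        return data[2:].decode("utf-16-le")
    if data[:2] == b"\xfe\xff":
        data = data[2:]          # drop the mark; order is already big-endian
    return data.decode("utf-16-be")


print(decode_ucs2(b"\xfe\xff\x00H\x00i"))   # Hi
print(decode_ucs2(b"\xff\xfeH\x00i\x00"))   # Hi
print(decode_ucs2(b"\x00H\x00i"))           # Hi
```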

7. HTML Public Text

7.1. HTML DTD

<!--    html-2.1.dtd

        Document Type Definition for the HyperText Markup Language,
        version 2.1 (HTML DTD)

        Last revised: 95/09/25

        Authors: Daniel W. Connolly <connolly@w3.org>
                 Francois Yergeau <yergeau@alis.com>
-->

<!ENTITY % HTML.Version
        "-//IETF//DTD HTML 2.1//EN"

        -- Typical usage:

            <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.1//EN">
            <html>
            ...
            </html>
        --
        >


<!--============ Feature Test Entities ========================-->

<!ENTITY % HTML.Recommended "IGNORE"
        -- Certain features of the language are necessary for
           compatibility with widespread usage, but they may
           compromise the structural integrity of a document.
           This feature test entity enables a more prescriptive
           document type definition that eliminates
           those features.
        -->

<![ %HTML.Recommended [
        <!ENTITY % HTML.Deprecated "IGNORE">
]]>

<!ENTITY % HTML.Deprecated "INCLUDE"
        -- Certain features of the language are necessary for
           compatibility with earlier versions of the specification,
           but they tend to be used and implemented inconsistently,
           and their use is deprecated. This feature test entity
           enables a document type definition that eliminates
           these features.
        -->

<!ENTITY % HTML.Highlighting "INCLUDE"
        -- Use this feature test entity to validate that a
           document uses no highlighting tags, which may be
           ignored on minimal implementations.
        -->

<!ENTITY % HTML.Forms "INCLUDE"
        -- Use this feature test entity to validate that a document
           contains no forms, which may not be supported in minimal
           implementations
        -->

<!--============== Imported Names ==============================-->

<!ENTITY % Content-Type "CDATA"
        -- meaning an internet media type
           (aka MIME content type, as per RFC1521)
        -->

<!ENTITY % HTTP-Method "GET | POST"
        -- as per HTTP specification, in progress
        -->

<!--========= DTD "Macros" =====================-->

<!ENTITY % heading "H1|H2|H3|H4|H5|H6">

<!ENTITY % list " UL | OL | DIR | MENU " >

<!ENTITY % attrs -- common attributes for elements --
        "LANG NAME #IMPLIED -- RFC 1766 language tag --
         DIR (ltr|rtl) #IMPLIED -- text directionality --">

<!ENTITY % just -- an attribute for text justification --
        "ALIGN (left|right|center|justify) #IMPLIED">

<!--======= Character mnemonic entities =================-->

<!ENTITY % ISOlat1 PUBLIC
  "ISO 8879-1986//ENTITIES Added Latin 1//EN//HTML">
%ISOlat1;

<!--Entities for markup significant characters -->
<!ENTITY amp CDATA "&#38;"     -- ampersand          -->
<!ENTITY gt CDATA "&#62;"      -- greater than       -->
<!ENTITY lt CDATA "&#60;"      -- less than          -->
<!ENTITY quot CDATA "&#34;"    -- double quote       -->

<!--Entities for language-dependent presentation (BIDI and contextual analysis) -->
<!ENTITY zwnj CDATA "&#8204;"-- zero width non-joiner-->
<!ENTITY zwj  CDATA "&#8205;"-- zero width joiner-->
<!ENTITY lrm  CDATA "&#8206;"-- left-to-right mark-->
<!ENTITY rlm  CDATA "&#8207;"-- right-to-left mark-->

<!--========= SGML Document Access (SDA) Parameter Entities =====-->

<!-- HTML 2.0 contains SGML Document Access (SDA) fixed attributes
in support of easy transformation to the International Committee
for Accessible Document Design (ICADD) DTD
         "-//EC-USA-CDA/ICADD//DTD ICADD22//EN".
ICADD applications are designed to support usable access to
structured information by print-impaired individuals through
Braille, large print and voice synthesis. For more information on
SDA & ICADD:
        - ISO 12083:1993, Annex A.8, Facilities for Braille,
          large print and computer voice
        - ICADD ListServ
          <ICADD%ASUACAD.BITNET@ARIZVM1.ccit.arizona.edu>
        - Usenet news group bit.listserv.easi
        - Recording for the Blind, +1 800 221 4792
-->

<!ENTITY % SDAFORM "SDAFORM CDATA #FIXED"
          -- one to one mapping -->
<!ENTITY % SDARULE "SDARULE CDATA #FIXED"
          -- context-sensitive mapping -->
<!ENTITY % SDAPREF "SDAPREF CDATA #FIXED"
          -- generated text prefix -->
<!ENTITY % SDASUFF "SDASUFF CDATA #FIXED"
          -- generated text suffix -->
<!ENTITY % SDASUSP "SDASUSP NAME #FIXED"
          -- suspend transform process -->

<!--========== Text Markup =====================-->

<![ %HTML.Highlighting [

<!ENTITY % font " TT | B | I ">

<!ENTITY % phrase "EM | STRONG | CODE | SAMP | KBD | VAR | CITE">

<!ENTITY % text "#PCDATA|A|IMG|BR|%phrase|%font|SPAN|Q|BDO|SUP|SUB">

<!ELEMENT (%font;|%phrase) - - (%text)*>
<!ATTLIST ( TT | CODE | SAMP | KBD | VAR )
        %attrs;
        %SDAFORM; "Lit"
        >
<!ATTLIST ( B | STRONG )
        %attrs;
        %SDAFORM; "B"
        >
<!ATTLIST ( I | EM | CITE )
        %attrs;
        %SDAFORM; "It"
        >

<!-- <TT>       Typewriter text                         -->
<!-- <B>        Bold text                               -->
<!-- <I>        Italic text                             -->
<!-- <EM>       Emphasized phrase                       -->
<!-- <STRONG>   Strong emphasis                         -->
<!-- <CODE>     Source code phrase                      -->
<!-- <SAMP>     Sample text or characters               -->
<!-- <KBD>      Keyboard phrase, e.g. user input        -->
<!-- <VAR>      Variable phrase or substitutable        -->
<!-- <CITE>     Name or title of cited work             -->

<!ENTITY % pre.content "#PCDATA|A|HR|BR|%font|%phrase|SPAN|BDO">

]]>

<!ENTITY % text "#PCDATA|A|IMG|BR|SPAN|Q|BDO|SUP|SUB">

<!-- Should the BDO element have an SDAFORM attr.? Which? -->
<!ELEMENT BDO - - (%text)+>
<!ATTLIST BDO
        LANG NAME #IMPLIED
        DIR (ltr|rtl) #REQUIRED
        >

<!-- <BDO>      Control bidirectional text              -->

<!ELEMENT BR - O EMPTY>
<!ATTLIST BR
        %SDAPREF; "&#RE;"
        >

<!-- <BR>       Line break                              -->

<!-- Should the SPAN element have an SDAFORM attr.? Which? -->
<!ELEMENT SPAN - - (%text)*>
<!ATTLIST SPAN
        %attrs;
        >

<!-- <SPAN>     Generic container                       -->

<!ELEMENT Q - - (%text)*>
<!ATTLIST Q
        %attrs;
        %SDAFORM; "It" -- to be verified --
        >

<!-- <Q>        Short quotation                         -->

<!ELEMENT (SUP|SUB) - - (#PCDATA)>
<!ATTLIST (SUP|SUB)
        %attrs;
        >

<!-- <SUP>      Superscript                             -->
<!-- <SUB>      Subscript                               -->

<!--========= Link Markup ======================-->

<!ENTITY % linkType "NAME">

<!ENTITY % linkExtraAttributes
        "REL %linkType #IMPLIED
        REV %linkType #IMPLIED
        URN CDATA #IMPLIED
        TITLE CDATA #IMPLIED
        METHODS NAMES #IMPLIED
        CHARSET NAME #IMPLIED
        ">

<![ %HTML.Recommended [
        <!ENTITY % A.content "(%text)*"
        -- <H1><a name="xxx">Heading</a></H1>
                is preferred to
           <a name="xxx"><H1>Heading</H1></a>
        -->
]]>

<!ENTITY % A.content "(%heading|%text)*">

<!ELEMENT A - - %A.content -(A)>
<!ATTLIST A
        %attrs;
        HREF CDATA #IMPLIED
        NAME CDATA #IMPLIED
        %linkExtraAttributes;
        %SDAPREF; "<Anchor: #AttList>"
        >
<!-- <A>                Anchor; source/destination of link      -->
<!-- <A NAME="...">     Name of this anchor                     -->
<!-- <A HREF="...">     Address of link destination             -->
<!-- <A URN="...">      Permanent address of destination        -->
<!-- <A REL=...>        Relationship to destination             -->
<!-- <A REV=...>        Relationship of destination to this     -->
<!-- <A TITLE="...">    Title of destination (advisory)         -->
<!-- <A METHODS="...">  Operations on destination (advisory)    -->
<!-- <A CHARSET="...">  Charset of destination (advisory)       -->


<!--========== Images ==========================-->

<!ELEMENT IMG - O EMPTY>
<!ATTLIST IMG
        %attrs;
        SRC CDATA #REQUIRED
        ALT CDATA #IMPLIED
        ALIGN (top|middle|bottom) #IMPLIED
        ISMAP (ISMAP) #IMPLIED
        %SDAPREF; "<Fig><?SDATrans Img: #AttList>#AttVal(Alt)</Fig>"
        >

<!-- <IMG>              Image; icon, glyph or illustration      -->
<!-- <IMG SRC="...">    Address of image object                 -->
<!-- <IMG ALT="...">    Textual alternative                     -->
<!-- <IMG ALIGN=...>    Position relative to text               -->
<!-- <IMG ISMAP>        Each pixel can be a link                -->

<!--========== Paragraphs=======================-->

<!ELEMENT P - O (%text)*>
<!ATTLIST P
        %attrs;
        %just;
        %SDAFORM; "Para"
        >

<!-- <P>        Paragraph       -->


<!--========== Headings, Titles, Sections ===============-->

<!ELEMENT HR - O EMPTY>
<!ATTLIST HR
        %attrs;
        %just;
        %SDAPREF; "&#RE;&#RE;"
        >

<!-- <HR>       Horizontal rule -->

<!ELEMENT ( %heading ) - - (%text;)*>
<!ATTLIST H1
        %attrs;
        %just;
        %SDAFORM; "H1"
        >
<!ATTLIST H2
        %attrs;
        %just;
        %SDAFORM; "H2"
        >
<!ATTLIST H3
        %attrs;
        %just;
        %SDAFORM; "H3"
        >
<!ATTLIST H4
        %attrs;
        %just;
        %SDAFORM; "H4"
        >
<!ATTLIST H5
        %attrs;
        %just;
        %SDAFORM; "H5"
        >
<!ATTLIST H6
        %attrs;
        %just;
        %SDAFORM; "H6"
        >

<!-- <H1>       Heading, level 1 -->
<!-- <H2>       Heading, level 2 -->
<!-- <H3>       Heading, level 3 -->
<!-- <H4>       Heading, level 4 -->
<!-- <H5>       Heading, level 5 -->
<!-- <H6>       Heading, level 6 -->


<!--========== Text Flows ======================-->

<![ %HTML.Forms [
        <!ENTITY % block.forms "BLOCKQUOTE | FORM | ISINDEX">
]]>

<!ENTITY % block.forms "BLOCKQUOTE">

<![ %HTML.Deprecated [
        <!ENTITY % preformatted "PRE | XMP | LISTING">
]]>

<!ENTITY % preformatted "PRE">

<!ENTITY % block "P | %list | DL
        | %preformatted
        | %block.forms">

<!ENTITY % flow "(%text|%block)*">

<!ENTITY % pre.content "#PCDATA | A | HR | BR | SPAN | BDO">
<!ELEMENT PRE - - (%pre.content)*>
<!ATTLIST PRE
        %attrs;
        WIDTH NUMBER #implied
        %SDAFORM; "Lit"
        >

<!-- <PRE>              Preformatted text               -->
<!-- <PRE WIDTH=...>    Maximum characters per line     -->

<![ %HTML.Deprecated [

<!ENTITY % literal "CDATA"
        -- historical, non-conforming parsing mode where
           the only markup signal is the end tag
           in full
        -->

<!ELEMENT (XMP|LISTING) - - %literal>
<!ATTLIST XMP
        %attrs;
        %SDAFORM; "Lit"
        %SDAPREF; "Example:&#RE;"
        >
<!ATTLIST LISTING
        %attrs;
        %SDAFORM; "Lit"
        %SDAPREF; "Listing:&#RE;"
        >

<!-- <XMP>              Example section         -->
<!-- <LISTING>          Computer listing        -->

<!ELEMENT PLAINTEXT - O %literal>
<!-- <PLAINTEXT>        Plain text passage      -->

<!ATTLIST PLAINTEXT
        %attrs;
        %SDAFORM; "Lit"
        >
]]>


<!--========== Lists ==================-->

<!ELEMENT DL - - (DT | DD)+>
<!ATTLIST DL
        %attrs;
        COMPACT (COMPACT) #IMPLIED
        %SDAFORM; "List"
        %SDAPREF; "Definition List:"
        >

<!ELEMENT DT - O (%text)*>
<!ATTLIST DT
        %attrs;
        %SDAFORM; "Term"
        >

<!ELEMENT DD - O %flow>
<!ATTLIST DD
        %attrs;
        %SDAFORM; "LItem"
        >

<!-- <DL>               Definition list, or glossary    -->
<!-- <DL COMPACT>       Compact style list              -->
<!-- <DT>               Term in definition list         -->
<!-- <DD>               Definition of term              -->

<!ELEMENT (OL|UL) - - (LI)+>
<!ATTLIST OL
        %attrs;
        %just;
        COMPACT (COMPACT) #IMPLIED
        %SDAFORM; "List"
        >
<!ATTLIST UL
        %attrs;
        %just;
        COMPACT (COMPACT) #IMPLIED
        %SDAFORM; "List"
        >
<!-- <UL>               Unordered list                  -->
<!-- <UL COMPACT>       Compact list style              -->
<!-- <OL>               Ordered, or numbered list       -->
<!-- <OL COMPACT>       Compact list style              -->


<!ELEMENT (DIR|MENU) - - (LI)+ -(%block)>
<!ATTLIST DIR
        %attrs;
        %just;
        COMPACT (COMPACT) #IMPLIED
        %SDAFORM; "List"
        %SDAPREF; "<LHead>Directory</LHead>"
        >
<!ATTLIST MENU
        %attrs;
        %just;
        COMPACT (COMPACT) #IMPLIED
        %SDAFORM; "List"
        %SDAPREF; "<LHead>Menu</LHead>"
        >

<!-- <DIR>              Directory list                  -->
<!-- <DIR COMPACT>      Compact list style              -->
<!-- <MENU>             Menu list                       -->
<!-- <MENU COMPACT>     Compact list style              -->

<!ELEMENT LI - O %flow>
<!ATTLIST LI
        %attrs;
        %just;
        %SDAFORM; "LItem"
        >

<!-- <LI>               List item                       -->

<!--========== Document Body ===================-->

<![ %HTML.Recommended [
        <!ENTITY % body.content "(%heading|%block|HR|ADDRESS|IMG)*"
        -- <h1>Heading</h1>
           <p>Text ...
                is preferred to
           <h1>Heading</h1>
           Text ...
        -->
]]>

<!ENTITY % body.content "(%heading | %text | %block |
                                 HR | ADDRESS)*">

<!ELEMENT BODY O O %body.content>
<!ATTLIST BODY
        %attrs;
        >

<!-- <BODY>     Document body   -->

<!ELEMENT BLOCKQUOTE - - %body.content>
<!ATTLIST BLOCKQUOTE
        %attrs;
        %just;
        %SDAFORM; "BQ"
        >

<!-- <BLOCKQUOTE>       Quoted passage  -->

<!ELEMENT ADDRESS - - (%text|P)*>
<!ATTLIST ADDRESS
        %attrs;
        %just;
        %SDAFORM; "Lit"
        %SDAPREF; "Address:&#RE;"
        >

<!-- <ADDRESS>  Address, signature, or byline   -->


<!--======= Forms ====================-->

<![ %HTML.Forms [

<!ELEMENT FORM - - %body.content -(FORM) +(INPUT|SELECT|TEXTAREA)>
<!ATTLIST FORM
        %attrs;
        ACTION CDATA #IMPLIED
        METHOD (%HTTP-Method) GET
        ENCTYPE %Content-Type; "application/x-www-form-urlencoded"
        ACCEPT-CHARSET CDATA #IMPLIED
        %SDAPREF; "<Para>Form:</Para>"
        %SDASUFF; "<Para>Form End.</Para>"
        >

<!-- <FORM>                     Fill-out or data-entry form     -->
<!-- <FORM ACTION="...">        Address for completed form      -->
<!-- <FORM METHOD=...>          Method of submitting form       -->
<!-- <FORM ENCTYPE="...">       Representation of form data     -->

<!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX |
                        RADIO | SUBMIT | RESET |
                        IMAGE | HIDDEN | FILE )">
<!ELEMENT INPUT - O EMPTY>
<!ATTLIST INPUT
        %attrs;
        TYPE %InputType TEXT
        NAME CDATA #IMPLIED
        VALUE CDATA #IMPLIED
        SRC CDATA #IMPLIED
        CHECKED (CHECKED) #IMPLIED
        SIZE CDATA #IMPLIED
        MAXLENGTH NUMBER #IMPLIED
        ALIGN (top|middle|bottom) #IMPLIED
        ACCEPT CDATA #IMPLIED --list of content types --
        %SDAPREF; "Input: "
        >

<!-- <INPUT>                    Form input datum                -->
<!-- <INPUT TYPE=...>           Type of input interaction       -->
<!-- <INPUT NAME=...>           Name of form datum              -->
<!-- <INPUT VALUE="...">        Default/initial/selected value  -->
<!-- <INPUT SRC="...">          Address of image                -->
<!-- <INPUT CHECKED>            Initial state is "on"           -->
<!-- <INPUT SIZE=...>           Field size hint                 -->
<!-- <INPUT MAXLENGTH=...>      Data length maximum             -->
<!-- <INPUT ALIGN=...>          Image alignment                 -->
1460 |
|
|
|
1461 |
|
|
<!ELEMENT SELECT - - (OPTION+) -(INPUT|SELECT|TEXTAREA)> |
1462 |
|
|
<!ATTLIST SELECT |
1463 |
|
|
%attrs; |
1464 |
|
|
NAME CDATA #REQUIRED |
1465 |
|
|
SIZE NUMBER #IMPLIED |
1466 |
|
|
MULTIPLE (MULTIPLE) #IMPLIED |
1467 |
|
|
%SDAFORM; "List" |
1468 |
|
|
%SDAPREF; |
1469 |
|
|
"<LHead>Select #AttVal(Multiple)</LHead>" |
1470 |
|
|
> |
1471 |
|
|
|
1472 |
|
|
<!-- <SELECT> Selection of option(s) --> |
1473 |
|
|
<!-- <SELECT NAME=...> Name of form datum --> |
1474 |
|
|
<!-- <SELECT SIZE=...> Options displayed at a time --> |
1475 |
|
|
<!-- <SELECT MULTIPLE> Multiple selections allowed --> |
1476 |
|
|
|
1477 |
|
|
<!ELEMENT OPTION - O (#PCDATA)*> |
1478 |
|
|
<!ATTLIST OPTION |
1479 |
|
|
%attrs; |
1480 |
|
|
SELECTED (SELECTED) #IMPLIED |
1481 |
|
|
VALUE CDATA #IMPLIED |
1482 |
|
|
%SDAFORM; "LItem" |
1483 |
|
|
%SDAPREF; |
1484 |
|
|
"Option: #AttVal(Value) #AttVal(Selected)" |
1485 |
|
|
> |
1486 |
|
|
|
1487 |
|
|
<!-- <OPTION> A selection option --> |
1488 |
|
|
<!-- <OPTION SELECTED> Initial state --> |
1489 |
|
|
<!-- <OPTION VALUE="..."> Form datum value for this option--> |
1490 |
|
|
|
1491 |
|
|
<!ELEMENT TEXTAREA - - (#PCDATA)* -(INPUT|SELECT|TEXTAREA)> |
1492 |
|
|
<!ATTLIST TEXTAREA |
1493 |
|
|
%attrs; |
1494 |
|
|
NAME CDATA #REQUIRED |
1495 |
|
|
ROWS NUMBER #REQUIRED |
1496 |
|
|
COLS NUMBER #REQUIRED |
1497 |
|
|
%SDAFORM; "Para" |
1498 |
|
|
%SDAPREF; "Input Text -- #AttVal(Name): " |
1499 |
|
|
> |
1500 |
|
|
|
1501 |
|
|
<!-- <TEXTAREA> An area for text input --> |
1502 |
|
|
<!-- <TEXTAREA NAME=...> Name of form datum --> |
1503 |
|
|
<!-- <TEXTAREA ROWS=...> Height of area --> |
1504 |
|
|
<!-- <TEXTAREA COLS=...> Width of area --> |
1505 |
|
|
|
1514 |
|
|
]]> |
1515 |
|
|
|
1516 |
|
|
|
1517 |
|
|
<!--======= Document Head ======================--> |
1518 |
|
|
|
1519 |
|
|
<![ %HTML.Recommended [ |
1520 |
|
|
<!ENTITY % head.extra ""> |
1521 |
|
|
]]> |
1522 |
|
|
<!ENTITY % head.extra "& NEXTID?"> |
1523 |
|
|
|
1524 |
|
|
<!ENTITY % head.content "TITLE & ISINDEX? & BASE? %head.extra"> |
1525 |
|
|
|
1526 |
|
|
<!ELEMENT HEAD O O (%head.content) +(META|LINK)> |
1527 |
|
|
<!ATTLIST HEAD |
1528 |
|
|
%attrs; > |
1529 |
|
|
|
1530 |
|
|
<!-- <HEAD> Document head --> |
1531 |
|
|
|
1532 |
|
|
<!ELEMENT TITLE - - (#PCDATA)* -(META|LINK)> |
1533 |
|
|
<!ATTLIST TITLE |
1534 |
|
|
%attrs; |
1535 |
|
|
%SDAFORM; "Ti" > |
1536 |
|
|
|
1537 |
|
|
<!-- <TITLE> Title of document --> |
1538 |
|
|
|
1539 |
|
|
<!ELEMENT LINK - O EMPTY> |
1540 |
|
|
<!ATTLIST LINK |
1541 |
|
|
%attrs; |
1542 |
|
|
HREF CDATA #REQUIRED |
1543 |
|
|
%linkExtraAttributes; |
1544 |
|
|
%SDAPREF; "Linked to : #AttVal (TITLE) (URN) (HREF)>" > |
1545 |
|
|
|
1546 |
|
|
<!-- <LINK> Link from this document --> |
1547 |
|
|
<!-- <LINK HREF="..."> Address of link destination --> |
1548 |
|
|
<!-- <LINK URN="..."> Lasting name of destination --> |
1549 |
|
|
<!-- <LINK REL=...> Relationship to destination --> |
1550 |
|
|
<!-- <LINK REV=...> Relationship of destination to this --> |
1551 |
|
|
<!-- <LINK TITLE="..."> Title of destination (advisory) --> |
1552 |
|
|
<!-- <LINK CHARSET="..."> Charset of destination (advisory) --> |
1553 |
|
|
<!-- <LINK METHODS="..."> Operations allowed (advisory) --> |
1554 |
|
|
|
1555 |
|
|
<!ELEMENT ISINDEX - O EMPTY> |
1556 |
|
|
<!ATTLIST ISINDEX |
1557 |
|
|
%attrs; |
1558 |
|
|
%SDAPREF; |
1559 |
|
|
"<Para>[Document is indexed/searchable.]</Para>"> |
1560 |
|
|
|
1561 |
|
|
<!-- <ISINDEX> Document is a searchable index --> |
1562 |
|
|
|
1570 |
|
|
<!ELEMENT BASE - O EMPTY> |
1571 |
|
|
<!ATTLIST BASE |
1572 |
|
|
HREF CDATA #REQUIRED > |
1573 |
|
|
|
1574 |
|
|
<!-- <BASE> Base context document --> |
1575 |
|
|
<!-- <BASE HREF="..."> Address for this document --> |
1576 |
|
|
|
1577 |
|
|
<!ELEMENT NEXTID - O EMPTY> |
1578 |
|
|
<!ATTLIST NEXTID |
1579 |
|
|
N CDATA #REQUIRED > |
1580 |
|
|
|
1581 |
|
|
<!-- <NEXTID> Next ID to use for link name --> |
1582 |
|
|
<!-- <NEXTID N=...> Next ID to use for link name --> |
1583 |
|
|
|
1584 |
|
|
<!ELEMENT META - O EMPTY> |
1585 |
|
|
<!ATTLIST META |
1586 |
|
|
HTTP-EQUIV NAME #IMPLIED |
1587 |
|
|
NAME NAME #IMPLIED |
1588 |
|
|
CONTENT CDATA #REQUIRED |
1589 |
|
|
> |
1590 |
|
|
|
1591 |
|
|
<!-- <META> Generic Meta-information --> |
1592 |
|
|
<!-- <META HTTP-EQUIV=...> HTTP response header name --> |
1593 |
|
|
<!-- <META NAME=...> Meta-information name --> |
1594 |
|
|
<!-- <META CONTENT="..."> Associated information --> |
1595 |
|
|
|
1596 |
|
|
<!--======= Document Structure =================--> |
1597 |
|
|
|
1598 |
|
|
<![ %HTML.Deprecated [ |
1599 |
|
|
<!ENTITY % html.content "HEAD, BODY, PLAINTEXT?"> |
1600 |
|
|
]]> |
1601 |
|
|
<!ENTITY % html.content "HEAD, BODY"> |
1602 |
|
|
|
1603 |
|
|
<!ELEMENT HTML O O (%html.content)> |
1604 |
|
|
<!ENTITY % version.attr "VERSION CDATA #FIXED '%HTML.Version;'"> |
1605 |
|
|
|
1606 |
|
|
<!ATTLIST HTML |
1607 |
|
|
%attrs; |
1608 |
|
|
%version.attr; |
1609 |
|
|
%SDAFORM; "Book" |
1610 |
|
|
> |
1611 |
|
|
|
1612 |
|
|
<!-- <HTML> HTML Document --> |
1613 |
|
|
|
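The FORM declaration above defaults ENCTYPE to
"application/x-www-form-urlencoded" and adds an ACCEPT-CHARSET
attribute. As a minimal sketch, the following Python fragment shows how
the same field values yield different submitted data depending on the
character encoding applied before percent-escaping. The helper name
encode_form and the sample field values are assumptions made for this
illustration only; the sketch does not describe how any particular user
agent chooses an encoding.

```python
from urllib.parse import urlencode


def encode_form(fields, charset):
    """Serialize (name, value) pairs as application/x-www-form-urlencoded.

    `charset` is the encoding applied to non-ASCII characters before they
    are percent-escaped. The function name and calling convention are
    hypothetical, chosen for this illustration.
    """
    return urlencode(fields, encoding=charset)


# Sample data (an assumption for this sketch, not taken from the draft).
fields = [("name", "François"), ("city", "Montréal")]

latin1_data = encode_form(fields, "iso-8859-1")
utf8_data = encode_form(fields, "utf-8")

print(latin1_data)  # name=Fran%E7ois&city=Montr%E9al
print(utf8_data)    # name=Fran%C3%A7ois&city=Montr%C3%A9al
```

The two results differ only in the escaped octets, so a receiver that
assumes the wrong encoding would decode different characters from the
same submission.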
1614 |
|
|
|
1615 |
|
|
7.2. SGML Declaration for HTML |
1616 |
|
|
|
1617 |
|
|
<!SGML "ISO 8879:1986" |
1626 |
|
|
-- |
1627 |
|
|
SGML Declaration for HyperText Markup Language version 2.x |
1628 |
|
|
(HTML 2.x). |
1629 |
|
|
|
1630 |
|
|
-- |
1631 |
|
|
|
1632 |
|
|
CHARSET |
1633 |
|
|
BASESET "ISO Registration Number 176//CHARSET |
1634 |
|
|
ISO/IEC 10646-1:1993 UCS-2 with |
1635 |
|
|
implementation level 3//ESC 2/5 2/15 4/5" |
1636 |
|
|
DESCSET 0 9 UNUSED |
1637 |
|
|
9 2 9 |
1638 |
|
|
11 2 UNUSED |
1639 |
|
|
13 1 13 |
1640 |
|
|
14 18 UNUSED |
1641 |
|
|
32 95 32 |
1642 |
|
|
127 1 UNUSED |
1643 |
|
|
128 32 UNUSED |
1644 |
|
|
160 65376 160 |
1645 |
|
|
|
1646 |
|
|
|
1647 |
|
|
CAPACITY SGMLREF |
1648 |
|
|
TOTALCAP 150000 |
1649 |
|
|
GRPCAP 150000 |
1650 |
|
|
ENTCAP 150000 |
1651 |
|
|
|
1652 |
|
|
SCOPE DOCUMENT |
1653 |
|
|
SYNTAX |
1654 |
|
|
SHUNCHAR CONTROLS 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
1655 |
|
|
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 127 |
1656 |
|
|
BASESET "ISO 646:1983//CHARSET |
1657 |
|
|
International Reference Version |
1658 |
|
|
(IRV)//ESC 2/5 4/0" |
1659 |
|
|
DESCSET 0 128 0 |
1660 |
|
|
|
1661 |
|
|
FUNCTION |
1662 |
|
|
RE 13 |
1663 |
|
|
RS 10 |
1664 |
|
|
SPACE 32 |
1665 |
|
|
TAB SEPCHAR 9 |
1666 |
|
|
|
1667 |
|
|
NAMING LCNMSTRT "" |
1668 |
|
|
UCNMSTRT "" |
1669 |
|
|
LCNMCHAR ".-" |
1670 |
|
|
UCNMCHAR ".-" |
1671 |
|
|
NAMECASE GENERAL YES |
1672 |
|
|
ENTITY NO |
1673 |
|
|
DELIM GENERAL SGMLREF |
1682 |
|
|
SHORTREF SGMLREF |
1683 |
|
|
NAMES SGMLREF |
1684 |
|
|
QUANTITY SGMLREF |
1685 |
|
|
ATTSPLEN 2100 |
1686 |
|
|
LITLEN 1024 |
1687 |
|
|
NAMELEN 72 -- somewhat arbitrary; taken from |
1688 |
|
|
internet line length conventions -- |
1689 |
|
|
PILEN 1024 |
1690 |
|
|
TAGLVL 100 |
1691 |
|
|
TAGLEN 2100 |
1692 |
|
|
GRPGTCNT 150 |
1693 |
|
|
GRPCNT 64 |
1694 |
|
|
|
1695 |
|
|
FEATURES |
1696 |
|
|
MINIMIZE |
1697 |
|
|
DATATAG NO |
1698 |
|
|
OMITTAG YES |
1699 |
|
|
RANK NO |
1700 |
|
|
SHORTTAG YES |
1701 |
|
|
LINK |
1702 |
|
|
SIMPLE NO |
1703 |
|
|
IMPLICIT NO |
1704 |
|
|
EXPLICIT NO |
1705 |
|
|
OTHER |
1706 |
|
|
CONCUR NO |
1707 |
|
|
SUBDOC NO |
1708 |
|
|
FORMAL YES |
1709 |
|
|
APPINFO "SDA" -- conforming SGML Document Access application |
1710 |
|
|
-- |
1711 |
|
|
> |
1712 |
|
|
|
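The DESCSET rows in the CHARSET section of the declaration above are
meant to describe every code position of the 16-bit document character
set exactly once. The following Python sketch, offered only as an
illustration, copies those rows and checks that they are contiguous and
sum to 65536 positions. The names DESCSET, covers_exactly and is_usable
are hypothetical and do not come from the declaration itself.

```python
# (start, count, base) rows copied from the CHARSET DESCSET above.
# A base of None stands for UNUSED.
DESCSET = [
    (0, 9, None),
    (9, 2, 9),
    (11, 2, None),
    (13, 1, 13),
    (14, 18, None),
    (32, 95, 32),
    (127, 1, None),
    (128, 32, None),
    (160, 65376, 160),
]


def covers_exactly(rows, total):
    """Return True if the rows start at 0, are contiguous, and end at total."""
    expected_start = 0
    for start, count, _base in rows:
        if start != expected_start:
            return False
        expected_start = start + count
    return expected_start == total


def is_usable(code, rows):
    """Return True if code falls in a row mapped to a base-set character."""
    for start, count, base in rows:
        if start <= code < start + count:
            return base is not None
    return False


print(covers_exactly(DESCSET, 65536))  # True
print(is_usable(0x0A, DESCSET))        # True  (line feed)
print(is_usable(0x85, DESCSET))        # False (128 to 159 are UNUSED)
print(is_usable(0x3042, DESCSET))      # True  (inside the 160.. row)
```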
1713 |
|
|
|
1714 |
|
|
7.3. Entity sets |
1715 |
|
|
|
1716 |
|
|
7.3.1. ISO Latin 1 Character Entity Set |
1717 |
|
|
|
The following public text lists each of the characters specified in the
Added Latin 1 entity set, along with its name, syntax for use, and
description. This list is derived from ISO Standard 8879:1986//ENTITIES
Added Latin 1//EN. HTML includes the entire entity set, and adds
entities for all missing characters in the right half of ISO-8859-1
(code positions 160 to 255).
1723 |
|
|
|
1724 |
|
|
<!-- (C) International Organization for Standardization 1986 |
1725 |
|
|
Permission to copy in any form is granted for use with |
1726 |
|
|
conforming SGML systems and applications as defined in |
1727 |
|
|
ISO 8879, provided this notice is included in all copies. |
1728 |
|
|
--> |
<!-- Character entity set. Typical invocation:
     <!ENTITY % ISOlat1 PUBLIC
       "ISO 8879-1986//ENTITIES Added Latin 1//EN//HTML">
     %ISOlat1;
-->
<!ENTITY nbsp   CDATA "&#160;" -- no-break space -->
<!ENTITY iexcl  CDATA "&#161;" -- inverted exclamation mark -->
<!ENTITY cent   CDATA "&#162;" -- cent sign -->
<!ENTITY pound  CDATA "&#163;" -- pound sterling sign -->
<!ENTITY curren CDATA "&#164;" -- general currency sign -->
<!ENTITY yen    CDATA "&#165;" -- yen sign -->
<!ENTITY brvbar CDATA "&#166;" -- broken (vertical) bar -->
<!ENTITY sect   CDATA "&#167;" -- section sign -->
<!ENTITY uml    CDATA "&#168;" -- umlaut (dieresis) -->
<!ENTITY copy   CDATA "&#169;" -- copyright sign -->
<!ENTITY ordf   CDATA "&#170;" -- ordinal indicator, feminine -->
<!ENTITY laquo  CDATA "&#171;" -- angle quotation mark, left -->
<!ENTITY not    CDATA "&#172;" -- not sign -->
<!ENTITY shy    CDATA "&#173;" -- soft hyphen -->
<!ENTITY reg    CDATA "&#174;" -- registered sign -->
<!ENTITY macr   CDATA "&#175;" -- macron -->
<!ENTITY deg    CDATA "&#176;" -- degree sign -->
<!ENTITY plusmn CDATA "&#177;" -- plus-or-minus sign -->
<!ENTITY sup2   CDATA "&#178;" -- superscript two -->
<!ENTITY sup3   CDATA "&#179;" -- superscript three -->
<!ENTITY acute  CDATA "&#180;" -- acute accent -->
<!ENTITY micro  CDATA "&#181;" -- micro sign -->
<!ENTITY para   CDATA "&#182;" -- pilcrow (paragraph sign) -->
<!ENTITY middot CDATA "&#183;" -- middle dot -->
<!ENTITY cedil  CDATA "&#184;" -- cedilla -->
<!ENTITY sup1   CDATA "&#185;" -- superscript one -->
<!ENTITY ordm   CDATA "&#186;" -- ordinal indicator, masculine -->
<!ENTITY raquo  CDATA "&#187;" -- angle quotation mark, right -->
<!ENTITY frac14 CDATA "&#188;" -- fraction one-quarter -->
<!ENTITY frac12 CDATA "&#189;" -- fraction one-half -->
<!ENTITY frac34 CDATA "&#190;" -- fraction three-quarters -->
<!ENTITY iquest CDATA "&#191;" -- inverted question mark -->
<!ENTITY Agrave CDATA "&#192;" -- capital A, grave accent -->
<!ENTITY Aacute CDATA "&#193;" -- capital A, acute accent -->
<!ENTITY Acirc  CDATA "&#194;" -- capital A, circumflex accent -->
<!ENTITY Atilde CDATA "&#195;" -- capital A, tilde -->
<!ENTITY Auml   CDATA "&#196;" -- capital A, dieresis or umlaut mark -->
<!ENTITY Aring  CDATA "&#197;" -- capital A, ring -->
<!ENTITY AElig  CDATA "&#198;" -- capital AE diphthong (ligature) -->
<!ENTITY Ccedil CDATA "&#199;" -- capital C, cedilla -->
<!ENTITY Egrave CDATA "&#200;" -- capital E, grave accent -->
<!ENTITY Eacute CDATA "&#201;" -- capital E, acute accent -->
<!ENTITY Ecirc  CDATA "&#202;" -- capital E, circumflex accent -->
<!ENTITY Euml   CDATA "&#203;" -- capital E, dieresis or umlaut mark -->
<!ENTITY Igrave CDATA "&#204;" -- capital I, grave accent -->
<!ENTITY Iacute CDATA "&#205;" -- capital I, acute accent -->
<!ENTITY Icirc  CDATA "&#206;" -- capital I, circumflex accent -->
<!ENTITY Iuml   CDATA "&#207;" -- capital I, dieresis or umlaut mark -->
<!ENTITY ETH    CDATA "&#208;" -- capital Eth, Icelandic -->
<!ENTITY Ntilde CDATA "&#209;" -- capital N, tilde -->
<!ENTITY Ograve CDATA "&#210;" -- capital O, grave accent -->
<!ENTITY Oacute CDATA "&#211;" -- capital O, acute accent -->
<!ENTITY Ocirc  CDATA "&#212;" -- capital O, circumflex accent -->
<!ENTITY Otilde CDATA "&#213;" -- capital O, tilde -->
<!ENTITY Ouml   CDATA "&#214;" -- capital O, dieresis or umlaut mark -->
<!ENTITY times  CDATA "&#215;" -- multiply sign -->
<!ENTITY Oslash CDATA "&#216;" -- capital O, slash -->
<!ENTITY Ugrave CDATA "&#217;" -- capital U, grave accent -->
<!ENTITY Uacute CDATA "&#218;" -- capital U, acute accent -->
<!ENTITY Ucirc  CDATA "&#219;" -- capital U, circumflex accent -->
<!ENTITY Uuml   CDATA "&#220;" -- capital U, dieresis or umlaut mark -->
<!ENTITY Yacute CDATA "&#221;" -- capital Y, acute accent -->
<!ENTITY THORN  CDATA "&#222;" -- capital Thorn, Icelandic -->
<!ENTITY szlig  CDATA "&#223;" -- small sharp s, German (sz ligature) -->
<!ENTITY agrave CDATA "&#224;" -- small a, grave accent -->
<!ENTITY aacute CDATA "&#225;" -- small a, acute accent -->
<!ENTITY acirc  CDATA "&#226;" -- small a, circumflex accent -->
<!ENTITY atilde CDATA "&#227;" -- small a, tilde -->
<!ENTITY auml   CDATA "&#228;" -- small a, dieresis or umlaut mark -->
<!ENTITY aring  CDATA "&#229;" -- small a, ring -->
<!ENTITY aelig  CDATA "&#230;" -- small ae diphthong (ligature) -->
<!ENTITY ccedil CDATA "&#231;" -- small c, cedilla -->
<!ENTITY egrave CDATA "&#232;" -- small e, grave accent -->
<!ENTITY eacute CDATA "&#233;" -- small e, acute accent -->
<!ENTITY ecirc  CDATA "&#234;" -- small e, circumflex accent -->
<!ENTITY euml   CDATA "&#235;" -- small e, dieresis or umlaut mark -->
<!ENTITY igrave CDATA "&#236;" -- small i, grave accent -->
<!ENTITY iacute CDATA "&#237;" -- small i, acute accent -->
<!ENTITY icirc  CDATA "&#238;" -- small i, circumflex accent -->
<!ENTITY iuml   CDATA "&#239;" -- small i, dieresis or umlaut mark -->
<!ENTITY eth    CDATA "&#240;" -- small eth, Icelandic -->
<!ENTITY ntilde CDATA "&#241;" -- small n, tilde -->
<!ENTITY ograve CDATA "&#242;" -- small o, grave accent -->
<!ENTITY oacute CDATA "&#243;" -- small o, acute accent -->
<!ENTITY ocirc  CDATA "&#244;" -- small o, circumflex accent -->
<!ENTITY otilde CDATA "&#245;" -- small o, tilde -->
<!ENTITY ouml   CDATA "&#246;" -- small o, dieresis or umlaut mark -->
<!ENTITY divide CDATA "&#247;" -- divide sign -->
<!ENTITY oslash CDATA "&#248;" -- small o, slash -->
<!ENTITY ugrave CDATA "&#249;" -- small u, grave accent -->
<!ENTITY uacute CDATA "&#250;" -- small u, acute accent -->
<!ENTITY ucirc  CDATA "&#251;" -- small u, circumflex accent -->
<!ENTITY uuml   CDATA "&#252;" -- small u, dieresis or umlaut mark -->
<!ENTITY yacute CDATA "&#253;" -- small y, acute accent -->
<!ENTITY thorn  CDATA "&#254;" -- small thorn, Icelandic -->
<!ENTITY yuml   CDATA "&#255;" -- small y, dieresis or umlaut mark -->
1854 |
|
|
|
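Each entity declared above names one character of ISO-8859-1 in the
range 160 to 255. As a rough illustration, the Python sketch below looks
up a few of the names and prints the corresponding numeric character
reference and ISO-8859-1 octet. It relies on the name2codepoint table
from Python's standard html.entities module, which was defined for a
later version of HTML; that it agrees with this set for the Latin 1
names is an assumption of the sketch. The function names
numeric_reference and latin1_octet are hypothetical.

```python
from html.entities import name2codepoint


def numeric_reference(name):
    """Return the numeric character reference for a named entity."""
    code = name2codepoint[name]
    return f"&#{code};"


def latin1_octet(name):
    """Return the single ISO-8859-1 octet for a Latin 1 entity name."""
    return chr(name2codepoint[name]).encode("iso-8859-1")


# Print name, numeric reference, and octet for a few sample entities.
for name in ("nbsp", "eacute", "yuml"):
    print(name, numeric_reference(name), latin1_octet(name))
```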
1855 |
|
|
|
1856 |
|
|
Bibliography |
1857 |
|
|
|
1858 |
|
|
[BRYAN88] M. Bryan, "SGML -- An Author's Guide to the Standard |
1859 |
|
|
Generalized Markup Language", Addison-Wesley, Reading, |
1860 |
|
|
1988. |
1861 |
|
|
|
1862 |
|
|
[ERCS] Extended Reference Concrete Syntax for SGML. |
1863 |
|
|
<http://www.sgmlopen.org/sgml/docs/ercs/ercs- |
1864 |
|
|
home.html> |
1865 |
|
|
|
1866 |
|
|
[ETHNO] "Ethnologue, Languages of the World", 12th Edition, |
1867 |
|
|
Barbara F. Grimes editor, Summer Institute of Linguis- |
1868 |
|
|
tics, Dallas, 1992. |
1869 |
|
|
|
1870 |
|
|
[FILE-UPLOAD] E. Nebel and L. Masinter, "Form-based File Upload in |
1871 |
|
|
HTML", Work in progress (draft-ietf-html- |
1872 |
|
|
fileupload-03.txt), Xerox Corporation, August 1995. |
1873 |
|
|
|
1874 |
|
|
[GOLD90] C. F. Goldfarb, "The SGML Handbook", Y. Rubinsky, Ed., |
1875 |
|
|
Oxford University Press, 1990. |
1876 |
|
|
|
1877 |
|
|
[HTML-2] T. Berners-Lee and D. Connolly, "Hypertext Markup Lan- |
1878 |
|
|
guage - 2.0", Work in progress (draft-ietf-html- |
1879 |
|
|
spec-05.txt), MIT/W3C, August 1995. |
1880 |
|
|
|
1881 |
|
|
[HTTP] T. Berners-Lee, R. T. Fielding, and H. Frystyk |
1882 |
|
|
Nielsen, "Hypertext Transfer Protocol - HTTP/1.0", |
1883 |
|
|
Work in progress (draft-ietf-http-v10-spec-00.ps), |
1884 |
|
|
MIT, UC Irvine, CERN, March 1995. |
1885 |
|
|
|
[ISO-639] ISO 639:1988. Codes pour la représentation des noms de
langue. Technical content in
<http://www.sil.org/sgml/iso639a.html>

[ISO-1000] ISO 1000:1992. Unités SI et recommandations pour
l'emploi de leurs multiples et de certaines autres
unités.

[ISO-3166] ISO 3166:1993. Codes pour la représentation des noms
de pays.

[ISO-4217] ISO 4217:1990. Codes pour la représentation des
monnaies et types des fonds.

[ISO-8601] ISO 8601:1988. Éléments de données et formats
d'échange -- Échange d'information -- Représentation
de la date et de l'heure.
1911 |
|
|
|
1912 |
|
|
[ISO-8859-1] ISO 8859-1:1987. International Standard -- Informa- |
1913 |
|
|
tion Processing -- 8-bit Single-Byte Coded Graphic |
1914 |
|
|
Character Sets -- Part 1: Latin Alphabet No. 1. |
1915 |
|
|
|
1916 |
|
|
[ISO-8879] ISO 8879:1986. International Standard -- Information |
1917 |
|
|
Processing -- Text and Office Systems -- Standard Gen- |
1918 |
|
|
eralized Markup Language (SGML). |
1919 |
|
|
|
1920 |
|
|
[ISO-10646] ISO/IEC 10646-1:1993. International Standard -- Infor- |
1921 |
|
|
mation technology -- Universal Multiple-Octet Coded |
1922 |
|
|
Character Set (UCS) -- Part 1: Architecture and Basic |
1923 |
|
|
Multilingual Plane. |
1924 |
|
|
|
1925 |
|
|
[NICOL] G.T. Nicol, "The Multilingual World Wide Web", Elec- |
1926 |
|
|
tronic Book Technologies, 1995, |
1927 |
|
|
<http://www.ebt.com/docs/multling.html> |
1928 |
|
|
|
1929 |
|
|
[RFC1468] J. Murai, M. Crispin and E. van der Poel, "Japanese |
1930 |
|
|
Character Encoding for Internet Messages", RFC 1468, |
1931 |
|
|
Keio University, Panda Programming, June 1993. |
1932 |
|
|
|
1933 |
|
|
[RFC1521] N. Borenstein and N. Freed, "MIME (Multipurpose Inter- |
1934 |
|
|
net Mail Extensions) Part One: Mechanisms for Specify- |
1935 |
|
|
ing and Describing the Format of Internet Message Bod- |
1936 |
|
|
ies", RFC 1521, Bellcore, Innosoft, September 1993. |
1937 |
|
|
|
1938 |
|
|
[RFC1590] J. Postel, "Media Type Registration Procedure", RFC |
1939 |
|
|
1590, USC/ISI, March 1994. |
1940 |
|
|
|
1941 |
|
|
[RFC1738] T. Berners-Lee, L. Masinter, and M. McCahill, "Uniform |
1942 |
|
|
Resource Locators (URL)", RFC 1738, CERN, Xerox PARC, |
1943 |
|
|
University of Minnesota, October 1994. |
1944 |
|
|
|
[RFC1766] H. Alvestrand, "Tags for the Identification of
Languages", RFC 1766, UNINETT, March 1995.
1947 |
|
|
|
1948 |
|
|
[SQ91] SoftQuad, "The SGML Primer", 3rd ed., SoftQuad Inc., |
1949 |
|
|
1991. |
1950 |
|
|
|
[TAKADA] Toshihiro Takada, "Multilingual Information Exchange
through the World-Wide Web", Computer Networks and
ISDN Systems, Vol. 27, No. 2, Nov. 1994, p. 235-241.
1954 |
|
|
|
1962 |
|
|
[TEI] TEI Guidelines for Electronic Text Encoding and Inter- |
1963 |
|
|
change. <http://etext.virgina.edu/TEI.html> |
1964 |
|
|
|
1965 |
|
|
[UNICODE] The Unicode Consortium, "The Unicode Standard -- |
1966 |
|
|
Worldwide Character Encoding -- Version 1.0", Addison- |
1967 |
|
|
Wesley, Volume 1, 1991, Volume 2, 1992. The BIDI |
1968 |
|
|
algorithm is in appendix A of volume 1, with correc- |
1969 |
|
|
tions in appendix D of volume 2. |
1970 |
|
|
|
[VANH90] E. van Herwijnen, "Practical SGML", Kluwer Academic
Publishers Group, Norwell and Dordrecht, 1990.
1973 |
|
|
|
1974 |
|
|
Authors' Addresses |
1975 |
|
|
|
François Yergeau
Alis Technologies
3410, rue Griffith
Montréal QC H4T 1A7
Canada
1981 |
|
|
|
1982 |
|
|
Tel: +1 (514) 738-9171 |
1983 |
|
|
Fax: +1 (514) 342-0318 |
1984 |
|
|
EMail: yergeau@alis.ca |
1985 |
|
|
|
1986 |
|
|
|
1987 |
|
|
Gavin Thomas Nicol |
1988 |
|
|
Electronic Book Technologies, Japan |
1989 |
|
|
1-29-9 Tsurumaki, |
1990 |
|
|
Setagaya-ku, |
1991 |
|
|
Tokyo |
1992 |
|
|
Japan |
1993 |
|
|
|
1994 |
|
|
Tel + Fax: +81-3-3706-7351 |
1995 |
|
|
EMail: gtn@ebt.com, gtn@twics.co.jp |
1996 |
|
|
|
1997 |
|
|
|
1998 |
|
|
Glenn Adams |
1999 |
|
|
Stonehand |
2000 |
|
|
118 Magazine Street |
2001 |
|
|
Cambridge, MA 02139 |
2002 |
|
|
U.S.A. |
2003 |
|
|
|
2004 |
|
|
Tel: +1 (617) 864-5524 |
2005 |
|
|
Fax: +1 (617) 864-4965 |
2006 |
|
|
EMail: glenn@stonehand.com |
2007 |
|
|
|
2008 |
|
|
|
Martin J. Duerst
Multimedia-Laboratory
Department of Computer Science
University of Zurich
Winterthurerstrasse 190
CH-8057 Zurich
Switzerland
2024 |
|
|
|
2025 |
|
|
Tel: +41 1 257 43 16 |
2026 |
|
|
Fax: +41 1 363 00 35 |
2027 |
|
|
E-mail: mduerst@ifi.unizh.ch |