Network Working Group                                        F. Yergeau
Internet Draft                                                 G. Nicol
<draft-ietf-html-i18n-04.txt>                                  G. Adams
Expires 2 December 1996                                       M. Duerst
                                                            27 May 1996


          Internationalization of the Hypertext Markup Language


Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups. Note that other groups may also distribute working
documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months. Internet-Drafts may be updated, replaced, or obsoleted by
other documents at any time. It is not appropriate to use
Internet-Drafts as reference material or to cite them other than as a
"working draft" or "work in progress".

To learn the current status of any Internet-Draft, please check the
1id-abstracts.txt listing contained in the Internet-Drafts Shadow
Directories on ds.internic.net (US East Coast), nic.nordu.net
(Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
Rim).

Distribution of this document is unlimited. Please send comments to
the HTML working group (HTML-WG) of the Internet Engineering Task
Force (IETF) at <html-wg@w3.org>. Subscription address is
<html-wg-request@w3.org>. Discussions of the group are archived at
<URL:http://www.acl.lanl.gov/HTML_WG/archives.html>.


Abstract

The Hypertext Markup Language (HTML) is a simple markup language used
to create hypertext documents that are platform independent.
Initially, the application of HTML on the World Wide Web was seriously
restricted by its reliance on the ISO-8859-1 coded character set,
which is appropriate only for Western European languages. Despite
this restriction, HTML has been widely used with other languages,
using other coded character sets or character encodings, at the
expense of interoperability.

This document is meant to address the issue of the
internationalization (i18n, i followed by 18 letters followed by n)
of HTML by extending the specification of HTML and giving additional
recommendations for proper internationalization support. A foremost
consideration is to make sure that HTML remains a valid application
of SGML, while enabling its use in all languages of the world.



Expires 2 December 1996                                        [Page 1]

Internet Draft          HTML internationalization           27 May 1996


Table of contents

1. Introduction
   1.1. Scope
   1.2. Conformance
2. The document character set
   2.1. Reference processing model
   2.2. The document character set
   2.3. Undisplayable characters
3. The LANG attribute
4. Additional entities, attributes and elements
   4.1. Full Latin-1 entity set
   4.2. Markup for language-dependent presentation
5. Forms
   5.1. DTD additions
   5.2. Form submission
6. Miscellaneous
7. HTML public text
   7.1. HTML DTD
   7.2. SGML declaration for HTML
   7.3. ISO Latin 1 character entity set
Bibliography
Authors' Addresses


1. Introduction

The Hypertext Markup Language (HTML) is a simple markup language used
to create hypertext documents that are platform independent.
Initially, the application of HTML on the World Wide Web was seriously
restricted by its reliance on the ISO-8859-1 coded character set,
which is appropriate only for Western European languages. Despite
this restriction, HTML has been widely used with other languages,
using other coded character sets or character encodings, through
various ad hoc extensions to the language [TAKADA].

This document is meant to address the issue of the
internationalization of HTML by extending the specification of HTML
and giving additional recommendations for proper internationalization
support. It is in good part based on a paper by one of the authors on
multilingualism on the WWW [NICOL]. A foremost consideration is to
make sure that HTML remains a valid application of SGML, while
enabling its use in all languages of the world.

The specific issues addressed are the SGML document character set to
be used for HTML, the proper treatment of the charset parameter
associated with the "text/html" content type and the specification of
some additional elements and entities.


1.1. Scope

HTML has been in use by the World Wide Web (WWW) global information
initiative since 1990. This specification extends the capabilities
of HTML 2.0 (RFC 1866), primarily by removing the restriction to the
ISO-8859-1 coded character set [ISO-8859-1].

HTML is an application of ISO Standard 8879:1986, Information
Processing -- Text and Office Systems -- Standard Generalized Markup
Language (SGML) [ISO-8879]. The HTML Document Type Definition (DTD) is
a formal definition of the HTML syntax in terms of SGML. This
specification amends the DTD of HTML in order to make it applicable to
documents encompassing a character repertoire much larger than that
of ISO-8859-1, while still remaining SGML conformant.

Both formal and actual development of HTML are advancing very fast.
The features described in this document are designed so that they can
(and should) be added to other forms of HTML besides that described
in RFC 1866. Where indicated, attributes introduced here should be
extended to the appropriate elements.


1.2. Conformance

This specification slightly changes the conformance requirements of
HTML documents and HTML user agents.

1.2.1. Documents

All HTML 2.0 conforming documents remain conforming with this
specification. However, the extensions introduced here make valid
certain documents that would not be HTML 2.0 conforming, in particular
those containing characters or character references outside of the
repertoire of ISO 8859-1, and those containing markup introduced
herein.

1.2.2. User agents

In addition to the requirements of RFC 1866, the following
requirements are placed on HTML user agents.

To ensure interoperability and proper support for at least
ISO-8859-1 in an environment where character encoding schemes
other than ISO-8859-1 are present, user agents must correctly
interpret the charset parameter accompanying an HTML document
received from the network.

Furthermore, conforming user agents are required at least to parse
correctly all numeric character references within the range of ISO
10646-1 [ISO-10646].

Conforming user agents are required to apply the BIDI presentation
algorithm if they display right-to-left characters. If there is
no displayable right-to-left character in a document, there is no
need to apply BIDI processing.

2. The document character set

2.1. Reference processing model

This overview explains a reference processing model used for HTML,
and in particular the SGML concept of a document character set. An
actual implementation may differ widely in its internal workings from
the model given below, but should behave as described to an outside
observer.

Because there are various widely differing encodings of text, SGML
does not directly address the question of how characters are encoded,
e.g. in a file. SGML views the characters as a single set (called a
"character repertoire") together with a "code set" that assigns an
integer (known as a "character number") to each character in the
repertoire. The document character set declaration defines what each
of the character numbers represents [GOLD90, p. 451]. In most cases,
an SGML DTD and all documents that refer to it have a single document
character set, and all markup and data characters are part of this
set.

HTML, as an application of SGML, does not directly address the
question of how characters are encoded as octets in external
representations such as files. This is deferred to mechanisms
external to HTML, such as MIME as used by the HTTP protocol or by
electronic mail.

For the HTTP protocol [RFC1945], the way characters are encoded is
defined by the "charset" parameter[1] of the "Content-Type" field of
the header of an HTTP response. For example, to indicate that the
transmitted document is encoded in the "JIS" encoding of Japanese
[RFC1468], the header will contain the following line:

   Content-Type: text/html; charset=ISO-2022-JP

The HTTP protocol also defines a mechanism for the client to specify
the character encodings it can accept. Clients and servers are
strongly requested to use these mechanisms to assure correct
transmission and interpretation of any document. Provisions that can
be taken to help correct interpretation, even in cases where a server
or client does not yet use these mechanisms, are described in section
6.

Similarly, if HTML documents are transferred by electronic mail, the
character encoding is defined by the "charset" parameter of the
"Content-Type" MIME header line [RFC1521], and defaults to US-ASCII in
its absence.
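
As a minimal sketch of the receiving side, the charset parameter can
be extracted with Python's standard email package and then used to
decode the octets. The function names are hypothetical; the default
is supplied by the caller because it depends on the transport.

```python
from email.message import Message


def charset_of(content_type: str, default: str) -> str:
    """Return the charset parameter of a Content-Type value (lowercased),
    or `default` when the parameter is absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() unquotes and lowercases the parameter value;
    # failobj is returned when there is no charset parameter at all.
    return msg.get_content_charset(failobj=default)


def decode_body(body: bytes, content_type: str, default: str) -> str:
    """Decoder stage: turn the received octets into characters."""
    return body.decode(charset_of(content_type, default))


# Mail transport: a missing charset parameter means US-ASCII.
print(charset_of("text/html", "us-ascii"))
# The JIS example above; prints iso-2022-jp.
print(charset_of("text/html; charset=ISO-2022-JP", "us-ascii"))
```

For HTTP/1.0, RFC 1945 specifies ISO-8859-1 as the default for "text"
subtypes when no charset is sent, so a user agent would pass
default="iso-8859-1" for documents received via HTTP.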

If any other ways of transferring and storing HTML documents are
defined or become popular, it is advised that similar provisions be
made to clearly identify the character encoding used and/or to use a
single default encoding capable of representing the widest range of
characters used in an international context.

Whatever the external character encoding may be, the reference
processing model translates it to a representation of the document
character set specified in Section 2.2 before processing specific to
SGML/HTML. The reference processing model can be depicted as
follows:

   [resource]->[decoder]->[entity ]->[ SGML ]->[application]->[display]
                          [manager]  [parser]
                              ^         |
                              |         |
                              +---------+

The decoder is responsible for decoding the external representation
of the resource to a representation using the document character set.
The entity manager, the parser, and the application deal only with
characters of the document character set. A display-oriented part of
the application or the display machinery itself may again convert
characters represented in the document character set to some other
representation more suitable for their purpose. In any case, the
entity manager, the parser, and the application, as far as character
semantics are concerned, use the HTML document character set only.

   -----------
   [1] The term "charset" in MIME is used to designate a character
   encoding, rather than a coded character set as the term may
   suggest. A character encoding is a mapping (possibly many-to-one)
   of a sequence of octets to a sequence of characters taken from one
   or more character repertoires.

An actual implementation may or may not translate the document into
some encoding of the document character set as described above; the
behaviour described by this reference processing model can be
achieved otherwise. This subject is well out of the scope of this
specification, however, and the reader is invited to consult the SGML
standard [ISO-8879] or an SGML handbook [BRYAN88] [GOLD90] [VANH90]
[SQ91] for further information.

The most important consequence of this reference processing model is
that numeric character references are always resolved with respect to
the fixed document character set, and thus to the same characters,
whatever the external encoding actually used. For an example, see
Section 2.2.
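
The consequence can be illustrated with a small Python sketch of the
decoder followed by numeric character reference resolution. The
function names are hypothetical, only decimal references terminated
by ";" are handled, and no check is made for numbers declared UNUSED.
The same source text, stored in two different encodings, yields the
same characters, and "&#233;" means U+00E9 even in a KOI8-R document.

```python
import re

# Decimal numeric character reference such as "&#1048;". The ";" is
# required here for simplicity, although SGML allows omitting it in
# some contexts.
NCR = re.compile(r"&#([0-9]+);")


def decode(resource: bytes, charset: str) -> str:
    """Decoder stage: external octets -> UCS characters (a Python str)."""
    return resource.decode(charset)


def resolve_ncrs(text: str) -> str:
    """Resolve references against the UCS, never against the external
    encoding. Numbers above 0x10FFFF cannot be represented by Python's
    chr() and are left unchanged in this sketch."""
    def repl(match):
        n = int(match.group(1))
        return chr(n) if n <= 0x10FFFF else match.group(0)
    return NCR.sub(repl, text)


source = "café &#1048;"
latin1 = source.encode("iso-8859-1")
utf8 = source.encode("utf-8")
# Different octets, same characters after decoding and resolution.
print(resolve_ncrs(decode(latin1, "iso-8859-1")) == resolve_ncrs(decode(utf8, "utf-8")))
```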

2.2. The document character set

The document character set, in the SGML sense, is the Universal
Character Set (UCS) of ISO 10646:1993 [ISO-10646], as amended.
Currently, this is code-by-code identical with the Unicode standard,
version 1.1 [UNICODE].

   NOTE -- implementers should be aware that ISO 10646 is amended
   from time to time; 4 amendments have been adopted since the initial
   1993 publication, none of which significantly affects this
   specification. A fifth amendment (DAM 5), now under consideration,
   will introduce incompatible changes to the standard: 6656 Korean
   Hangul syllables allocated between code positions 3400 and 4DFF
   (hexadecimal) will be moved to new positions (and 4516 new
   syllables added), thus making references to the old positions
   invalid. Since the Unicode consortium has already adopted a
   corresponding amendment for inclusion in the forthcoming Unicode
   2.0, adoption of DAM 5 is considered likely and implementers should
   probably consider the old code positions as already invalid.
   Despite this one-time change, the relevant standard bodies appear
   to remain committed not to change any allocated code position in
   the future. To encode Korean Hangul irrespective of these changes,
   the combining Hangul Jamo in the range 1100-11F9 can be used.
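
The size of the old block follows directly from its bounds: 3400 to
4DFF hexadecimal is 6656 positions, which together with the 4516 added
syllables gives 11172. The sketch below checks this arithmetic and
flags references into the old block. The position of the new block
(AC00 to D7A3) is taken from Unicode 2.0 and is an assumption not
stated in this draft; the function name is hypothetical.

```python
# Hangul syllable block of ISO 10646:1993 / Unicode 1.1, moved by DAM 5.
OLD_HANGUL = range(0x3400, 0x4DFF + 1)
# Hangul Syllables block of Unicode 2.0 (assumption, see lead-in).
NEW_HANGUL = range(0xAC00, 0xD7A3 + 1)
ADDED_SYLLABLES = 4516


def refers_to_old_hangul(char_number: int) -> bool:
    """True for numeric character references that DAM 5 invalidates."""
    return char_number in OLD_HANGUL


print(len(OLD_HANGUL))                    # 6656
print(len(OLD_HANGUL) + ADDED_SYLLABLES)  # 11172
print(len(NEW_HANGUL))                    # 11172
```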

The adoption of this document character set implies a change in the
SGML declaration specified in the HTML 2.0 specification (section 9.5
of [RFC1866]). The change amounts to removing the first BASESET
specification and its accompanying DESCSET declaration, replacing
them with the following declaration:

   BASESET  "ISO Registration Number 177//CHARSET
             ISO/IEC 10646-1:1993 UCS-4 with implementation level 3
             //ESC 2/5 2/15 4/6"
   DESCSET  0            9  UNUSED
            9            2  9
            11           2  UNUSED
            13           1  13
            14          18  UNUSED
            32          95  32
            127          1  UNUSED
            128         32  UNUSED
            160 2147483486  160

Making the UCS the document character set does not create
non-conformance of any expression, construct or document that is
conforming to HTML 2.0. It does make conforming certain constructs
that are not admissible in HTML 2.0. One consequence is that data
characters outside the repertoire of ISO-8859-1, but within that of
UCS-4, become valid SGML characters. Another is that the upper limit
of the range of numeric character references is extended from 255 to
2147483645; thus, &#1048; is a valid reference to a "CYRILLIC CAPITAL
LETTER I". [ERCS] is a good source of information on Unicode and
SGML, although its scope and technical content differ greatly from
this specification.

   NOTE -- the above SGML declaration, like that of HTML 2.0,
   specifies the character numbers 128 to 159 (80 to 9F hex) as
   UNUSED. This means that numeric character references within that
   range (e.g. &#146;) are illegal in HTML. Neither ISO 8859-1 nor
   ISO 10646 contains characters in that range, which is reserved for
   control characters.
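
A user agent checking numeric character references against this
declaration could transcribe the DESCSET table directly. The Python
sketch below does so; the table layout and function name are
illustrative, not part of the specification. It accepts &#1048; and
the upper limit 2147483645, and rejects the UNUSED &#146;.

```python
# (first character number, number of characters, base-set number or
# None for UNUSED), transcribed from the DESCSET declaration above.
DESCSET = [
    (0, 9, None),
    (9, 2, 9),
    (11, 2, None),
    (13, 1, 13),
    (14, 18, None),
    (32, 95, 32),
    (127, 1, None),
    (128, 32, None),
    (160, 2147483486, 160),
]


def is_valid_character_number(n: int) -> bool:
    """True if character number n is mapped to a UCS character, i.e. a
    numeric character reference &#n; is legal in HTML."""
    for first, count, base in DESCSET:
        if first <= n < first + count:
            return base is not None
    return False  # negative, or beyond the last described range


print(is_valid_character_number(1048))  # True
print(is_valid_character_number(146))   # False
```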

ISO 10646-1:1993 is the most encompassing character set currently in
existence, and there is no other character set that could take its
place as the document character set for HTML. If, nevertheless, a
specific application needs characters outside this standard, this
should be done by avoiding any conflicts with present or future
versions of ISO 10646, i.e. by assigning these characters to a
private use zone. Also, it should be borne in mind that such a use
will be highly unportable; in many cases, it may be better to use
inline bitmaps.

2.3. Undisplayable characters

With the document character set being the full ISO 10646, the
possibility that a character cannot be displayed due to lack of
appropriate resources (fonts) cannot be avoided. Because there are
many different things that can be done in such a case, this document
does not prescribe any specific behaviour. Depending on the
implementation, this may also be handled by the underlying display
system and not the application itself. The following considerations,
however, may be of help:

- A clearly visible, but unobtrusive behaviour should be preferred.
  Some documents may contain many characters that cannot be rendered,
  and so showing an alert for each of them is not the right thing to
  do.

- In case a numeric representation of the missing character is
  given, its hexadecimal (not decimal) form is to be preferred,
  because this form is used in character set standards [ERCS].
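
One behaviour consistent with both considerations is an inline marker
carrying the hexadecimal code position, instead of an alert. In the
Python sketch below, the "[U+XXXX]" notation and the has_glyph
predicate (standing in for a real font-coverage query) are
hypothetical choices, not requirements of this specification.

```python
from typing import Callable


def render_with_fallback(text: str, has_glyph: Callable[[str], bool]) -> str:
    """Replace each character without an available glyph by an inline
    marker showing its code position in hexadecimal."""
    out = []
    for ch in text:
        out.append(ch if has_glyph(ch) else f"[U+{ord(ch):04X}]")
    return "".join(out)


# A hypothetical font covering only ASCII; prints A[U+0418]B.
print(render_with_fallback("A\u0418B", lambda ch: ch.isascii()))
```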

3. The LANG attribute

Language tags can be used to control rendering of a marked up
document in various ways: glyph disambiguation, in cases where the
character encoding is not sufficient to resolve to a specific glyph;
quotation marks; hyphenation; ligatures; spacing; voice synthesis;
etc. Independently of rendering issues, language markup is useful as
content markup for purposes such as classification and searching.

Since any text can logically be assigned a language, almost all HTML
elements admit the LANG attribute. The DTD reflects this. It is
also intended that any new element introduced in later versions of
HTML will admit the LANG attribute, unless there is a good reason not
to do so.

The language attribute, LANG, takes as its value a language tag that
identifies a natural language spoken, written, or otherwise conveyed
by human beings for communication of information to other human
beings. Computer languages are explicitly excluded.

The syntax and registry of HTML language tags are the same as those
defined by RFC 1766 [RFC1766]. In summary, a language tag is composed
of one or more parts: a primary language tag and a possibly empty
series of subtags:

   language-tag = primary-tag *( "-" subtag )
   primary-tag  = 1*8ALPHA
   subtag       = 1*8ALPHA

Whitespace is not allowed within the tag and all tags are
case-insensitive. The namespace of language tags is administered by
the IANA. Example tags include:

   en, en-US, en-cockney, i-cherokee, x-pig-latin
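
The RFC 1766 grammar maps directly onto a regular expression. The
Python sketch below (function names are hypothetical) validates the
syntax only; it does not check the IANA registry.

```python
import re

# RFC 1766:  language-tag = primary-tag *( "-" subtag )
#            primary-tag  = 1*8ALPHA
#            subtag       = 1*8ALPHA
LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")


def is_language_tag(value: str) -> bool:
    # fullmatch() rejects embedded whitespace and any other stray character.
    return LANGUAGE_TAG.fullmatch(value) is not None


def same_tag(a: str, b: str) -> bool:
    # Tags are case-insensitive, so compare them case-folded.
    return a.lower() == b.lower()


for tag in ["en", "en-US", "en-cockney", "i-cherokee", "x-pig-latin", "en US"]:
    print(tag, is_language_tag(tag))
```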

In the context of HTML, a language tag is not to be interpreted as a
single token, as per RFC 1766, but as a hierarchy. For example, a
user agent that adjusts rendering according to language should
consider that it has a match when a language tag in a style sheet
entry matches the initial portion of the language tag of an element.
An exact match should be preferred. This interpretation allows an
element marked up as, for instance, "en-US" to trigger styles
corresponding to, in order of preference, US-English ("en-US") or
'plain' or 'international' English ("en").

   NOTE -- using the language tag as a hierarchy does not imply that
   all languages with a common prefix will be understood by those
   fluent in one or more of those languages; it simply allows the user
   to request this commonality when it is true for that user.
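
The hierarchical matching rule can be sketched as follows. This sketch
assumes that "initial portion" means a run of whole subtags, so that
"en" matches "en-US" but not "eng"; that reading, and the function
names, are assumptions rather than part of this specification.

```python
from typing import List, Optional


def tag_matches(style_tag: str, element_tag: str) -> bool:
    """True if style_tag equals element_tag or consists of its leading
    subtags ("en" matches "en-US", but not "eng")."""
    style = style_tag.lower().split("-")
    element = element_tag.lower().split("-")
    return element[: len(style)] == style


def best_match(element_tag: str, style_tags: List[str]) -> Optional[str]:
    """Pick the most specific matching style-sheet tag, so that an exact
    match is preferred over a shorter prefix."""
    matches = [t for t in style_tags if tag_matches(t, element_tag)]
    return max(matches, key=lambda t: t.count("-"), default=None)


print(best_match("en-US", ["fr", "en", "en-US"]))  # en-US
print(best_match("en-GB", ["fr", "en", "en-US"]))  # en
```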

The rendering of elements may be affected by the LANG attribute. For
any element, the value of the LANG attribute overrides the value
specified by the LANG attribute of any enclosing element and the
value (if any) of the HTTP Content-Language header. If none of these
are set, a suitable default, perhaps controlled by user preferences,
by automatic context analysis or by the user's locale, should be used
to control rendering.
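
The precedence rule amounts to a short lookup chain. In this Python
sketch the function and parameter names are hypothetical, and the
Content-Language value is assumed to be a single tag rather than a
list.

```python
from typing import Optional, Sequence


def effective_language(
    lang_chain: Sequence[Optional[str]],
    content_language: Optional[str],
    default: str,
) -> str:
    """Resolve the language used to render an element.

    lang_chain holds the LANG values of the element itself, then of its
    parent, and so on up to the root (None where LANG is absent).
    default stands for the user-agent default (user preference,
    context analysis or locale)."""
    for lang in lang_chain:
        if lang is not None:
            return lang  # nearest LANG attribute wins
    if content_language is not None:
        return content_language  # then the HTTP Content-Language header
    return default


print(effective_language([None, "fr", "en"], "de", "en"))  # fr
```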

4. Additional entities, attributes and elements

4.1. Full Latin-1 entity set

According to the suggestion of section 14 of [RFC1866], the set of
Latin-1 entities is extended to cover the whole right part of
ISO-8859-1 (all code positions with the high-order bit set),
including the already commonly used &nbsp;, &copy; and &reg;. The
names of the entities are taken from the appendices of SGML
[ISO-8879]. A list is provided in section 7.3 of this specification.
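
Python's standard html.entities module includes the HTML 4 Latin-1
entity names, so the set can be inspected there. The sketch assumes
these names coincide with the list in section 7.3 (not verified
here), and covers the 96 printable positions 160 to 255; positions
128 to 159 are UNUSED in the document character set.

```python
from html.entities import codepoint2name

# Entity names for the printable right half of ISO-8859-1.
LATIN1_ENTITIES = {cp: codepoint2name[cp] for cp in range(160, 256)}

print(len(LATIN1_ENTITIES))  # 96
print(LATIN1_ENTITIES[160], LATIN1_ENTITIES[169], LATIN1_ENTITIES[174])  # nbsp copy reg
```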

4.2. Markup for language-dependent presentation

4.2.1. Overview

For the correct presentation of text in certain languages
(irrespective of formatting issues), some support in the form of
additional entities and elements is needed.

In particular, the following features are dealt with:

- Markup of bidirectional text, i.e. text where left-to-right and
  right-to-left scripts are mixed.

- Control of cursive joining behaviour in contexts where the default
  behaviour is not appropriate.

- Language-dependent rendering of short (in-line) quotations.

- Better justification control for languages where this is
  important.

- Superscripts and subscripts for languages where they appear as
  part of general text.

Some of the above features need very little additional support;
others need more. The additional features are introduced below with
brief comments only. Explanations of cursive joining behaviour and
bidirectional text follow later. For cursive joining behaviour and
bidirectional text, this document follows [UNICODE] in that: i)
character semantics, where applicable, are identical to [UNICODE],
and ii) where functionality is moved to HTML as a higher level
protocol, this is done in a way that allows straightforward
conversion to the lower-level mechanisms defined in [UNICODE].


4.2.2. List of entities, elements, and attributes

First, a generic container is needed to carry the LANG and DIR (see
below) attributes in cases where no other element is appropriate; the
SPAN element is introduced for that purpose.

A set of named character entities is added for use with bidirectional
rendering and cursive joining control:

   <!ENTITY zwnj CDATA "&#8204;" --=zero width non-joiner-->
   <!ENTITY zwj  CDATA "&#8205;" --=zero width joiner-->
   <!ENTITY lrm  CDATA "&#8206;" --=left-to-right mark-->
   <!ENTITY rlm  CDATA "&#8207;" --=right-to-left mark-->

These entities can be used in place of the corresponding formatting
characters whenever convenient, for example to ease keyboard entry or
when a formatting character is not available in the character
encoding of the document.
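
The character numbers in these declarations are the Unicode
formatting characters U+200C to U+200F. The Python sketch below
expands the four entities and prints each one's Unicode name; the
expansion function is a hypothetical helper, not a full SGML entity
manager.

```python
import unicodedata

# The four entities declared above, as name -> character number.
ENTITIES = {"zwnj": 8204, "zwj": 8205, "lrm": 8206, "rlm": 8207}


def expand_entities(text: str) -> str:
    """Replace &zwnj; &zwj; &lrm; &rlm; by the formatting characters."""
    for name, number in ENTITIES.items():
        text = text.replace("&" + name + ";", chr(number))
    return text


for name, number in ENTITIES.items():
    print(f"&{name}; = U+{number:04X} {unicodedata.name(chr(number))}")
# &zwnj; = U+200C ZERO WIDTH NON-JOINER
# &zwj; = U+200D ZERO WIDTH JOINER
# &lrm; = U+200E LEFT-TO-RIGHT MARK
# &rlm; = U+200F RIGHT-TO-LEFT MARK
```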

Next, an attribute called DIR is introduced, restricted to the values
LTR (left-to-right) and RTL (right-to-left) and admitted by most
elements, for the indication of directionality in the context of
bidirectional text (see 4.2.4 below for details). Since any text and
many other elements (e.g. tables) can logically be assigned a
directionality, almost all HTML elements admit the DIR attribute. The
DTD reflects this. It is also intended that any new element introduced
in later versions of HTML will admit the DIR attribute, unless there
is a good reason not to do so.

A new element called BDO (BIDI Override) is introduced, which
requires the DIR attribute to specify whether the override is
left-to-right or right-to-left. This element is required for
bidirectional text control; for detailed explanations, see section
4.2.4.

The <Q> element is introduced to allow the rendering of short
quotations to depend on language and platform capability. As the
following examples show, in particular the quotation marks
surrounding the quotation are affected: "a quotation in English",
`another, slightly better one', ,,a quotation in German'', << a
quotation in French >>. The contents of the <Q> element do not
include quotation marks; they have to be added by the rendering
process.

   NOTE -- <Q> elements can be nested. Many languages use different
   quotation styles for outer and inner quotations, and this should be
   respected by user agents implementing this element.
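
A rendering process might choose the marks from a per-language table,
alternating between outer and inner styles with nesting depth. The
Python sketch below is illustrative only: the two-entry table (typeset
English and German marks), the fallback to English, the odd/even
alternation and the function names are all assumptions, not part of
this specification.

```python
from typing import Tuple

# Illustrative table: (outer open, outer close, inner open, inner close).
QUOTES = {
    "en": ("\u201c", "\u201d", "\u2018", "\u2019"),  # “...” and ‘...’
    "de": ("\u201e", "\u201c", "\u201a", "\u2018"),  # „...“ and ‚...‘
}
FALLBACK = "en"


def quote_marks(lang: str, depth: int) -> Tuple[str, str]:
    """Marks for a <Q> at the given nesting depth (0 = outermost). The
    tag is shortened subtag by subtag until a table entry is found,
    e.g. "de-DE" -> "de"."""
    subtags = lang.lower().split("-")
    while subtags and "-".join(subtags) not in QUOTES:
        subtags.pop()
    marks = QUOTES["-".join(subtags) if subtags else FALLBACK]
    return (marks[0], marks[1]) if depth % 2 == 0 else (marks[2], marks[3])


def render_q(content: str, lang: str, depth: int = 0) -> str:
    """Render the content of a <Q> element with quotation marks added."""
    open_mark, close_mark = quote_marks(lang, depth)
    return open_mark + content + close_mark


inner = render_q("nein", "de", depth=1)
print(render_q("Er sagte " + inner + ".", "de-DE"))  # „Er sagte ‚nein‘.“
```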

Many languages require superscripts for proper rendering: as an
example, the French "Mlle Dupont" should have "lle" in superscript.
The <SUP> element, and its sibling <SUB>, are introduced to allow
proper markup of such text. <SUP> and <SUB> contents are restricted
to PCDATA to avoid nesting problems.

Finally, in many languages text justification is much more important
than it is in Western languages, and warrants markup. The ALIGN
attribute, admitting values of LEFT, RIGHT, CENTER and JUSTIFY, is
added to a selection of elements where it makes sense (block-like
elements). If a user agent chooses to have LEFT as a default for
blocks of left-to-right directionality, it should use RIGHT for
blocks of right-to-left directionality.

In the DTD, the LANG and DIR attributes are grouped together in a
parameter entity called attrs. In addition, the ID and CLASS
attributes from RFC 1942 [RFC1942] were added to attrs, as was done
in the latter. The ID and CLASS attributes are required for use with
style sheets, and RFC 1942 defines them as follows:

ID     Used to define a document-wide identifier. This can be used
       for naming positions within documents as the destination of a
       hypertext link. It may also be used by style sheets for
       rendering an element in a unique style. An ID attribute value
       is an SGML NAME token. NAME tokens are formed by an initial
       letter followed by letters, digits, "-" and "." characters.
       The letters are restricted to A-Z and a-z.

CLASS  A space-separated list of SGML NAME tokens. CLASS names
       specify that the element belongs to the corresponding named
       classes. This allows authors to distinguish different roles
       played by the same tag. The classes may be used by style
       sheets to provide different renderings as appropriate to
       these roles.
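
The NAME token rule quoted above translates into a one-line regular
expression. In the Python sketch below the function names are
hypothetical; ID uniqueness within a document is not checked, and
CLASS values are split on any whitespace.

```python
import re
from typing import List

# An initial letter (A-Z, a-z) followed by letters, digits, "-" and ".".
NAME = re.compile(r"[A-Za-z][A-Za-z0-9.-]*")


def is_valid_name(value: str) -> bool:
    """True if value is a NAME token, as required for ID values."""
    return NAME.fullmatch(value) is not None


def parse_class(value: str) -> List[str]:
    """Split a CLASS value into NAME tokens, rejecting invalid ones."""
    names = value.split()
    bad = [n for n in names if not is_valid_name(n)]
    if bad:
        raise ValueError(f"invalid CLASS name(s): {bad}")
    return names


print(parse_class("warning note"))  # ['warning', 'note']
```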

4.2.3. Cursive joining behaviour

Markup is needed in some cases to force cursive joining behaviour in
contexts in which it would not normally occur, or to block it when it
would normally occur.

The zero-width joiner and non-joiner (&zwj; and &zwnj;) are used to
control cursive joining behaviour. For example, ARABIC LETTER HEH is
used in isolation to abbreviate "Hijri" (the Islamic calendrical
system); however, the initial form of the letter is desired, because
the isolated form of HEH looks like the digit five as employed in
Arabic script. This is obtained by following the HEH with a
zero-width joiner whose only effect is to provide context. In Persian
texts, there are cases where a letter that normally would join a
subsequent letter in a cursive connection does not. Here a zero-width
non-joiner is used.
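
At the character level, the Hijri abbreviation is simply HEH followed
by ZERO WIDTH JOINER; selecting the initial glyph is left to the
shaping engine, which this Python sketch does not model. It shows the
character sequence and that the HTML spelling "&#1607;&zwj;" decodes
to it (1607 is the decimal number of U+0647). The variable names are
illustrative.

```python
import html
import unicodedata

HEH = "\u0647"  # ARABIC LETTER HEH, character number 1607
ZWJ = "\u200d"  # &zwj;, character number 8205

# "Hijri" abbreviation: the ZWJ provides joining context after HEH, so
# a shaping engine selects HEH's initial form instead of its isolated
# form (which resembles the Arabic-script digit five).
hijri = HEH + ZWJ

# The same text as it might be written in an HTML document.
assert html.unescape("&#1607;&zwj;") == hijri
print([unicodedata.name(ch) for ch in hijri])
# ['ARABIC LETTER HEH', 'ZERO WIDTH JOINER']
```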

4.2.4. Bidirectional text

Many languages are written in horizontal lines from left to right,
while others are written from right to left. When both writing
directions are present, one talks of bidirectional text (BIDI for
short). BIDI text requires markup in special circumstances where
ambiguities as to the directionality of some characters have to be
resolved. This markup affects the ability to render BIDI text in a
semantically legible fashion. That is, without this special BIDI
markup, cases arise which would prevent *any* rendering whatsoever
that reflected the basic meaning of the text. Plain text may contain
this markup (joining or BIDI) in the form of special-purpose
characters; in HTML, these are supplemented by SGML markup.

BIDI is a complex issue, and implementers are advised to consult
appropriate documentation such as [UNICODE]. Here, explanations are
given only as far as they are needed to understand the necessity of
the features introduced and to define their exact semantics.
678 |
|
679 |
The Unicode BIDI algorithm is based on a logical sequence of text |
680 |
characters and works mainly by reference to the implicit directional- |
681 |
ity of characters (e.g. Hebrew and Arabic characters are specified to |
682 |
be rendered from right to left, etc.). |
683 |
|
684 |
The left-to-right and right-to-left marks (‎ and ‏) are used |
685 |
to disambiguate directionality of neutral characters. For example, |
686 |
when a double quote sits between an Arabic and a Latin letter, its |
687 |
direction is ambiguous; if a directional mark is added on one side |
688 |
such that the quotation mark is surrounded by characters of only one |
689 |
directionality, the ambiguity is removed. These characters are like |
690 |
zero width spaces which have a directional property (but no word/line |
691 |
break property). |
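The following Python sketch illustrates how the marks affect the double-quote example. It is a deliberate simplification, assumed here for illustration: it looks only at the two characters adjacent to the neutral. The real BIDI algorithm looks for the nearest strong characters and, when they disagree, gives the neutral the embedding direction. The function name is hypothetical.

```python
import unicodedata
from typing import Optional

LRM = "\u200e"   # LEFT-TO-RIGHT MARK, entity &lrm;
RLM = "\u200f"   # RIGHT-TO-LEFT MARK, entity &rlm;
ALEF = "\u0627"  # ARABIC LETTER ALEF, bidi class "AL"

RTL_CLASSES = {"R", "AL"}  # strong right-to-left classes
LTR_CLASSES = {"L"}        # strong left-to-right class


def direction_of_neutral(text: str, index: int) -> Optional[str]:
    """Direction of the neutral text[index], judged by its two neighbours.

    Simplification: only the adjacent characters are examined; neighbours
    of opposite directions are reported as None (ambiguous).
    """
    before = unicodedata.bidirectional(text[index - 1])
    after = unicodedata.bidirectional(text[index + 1])
    if before in RTL_CLASSES and after in RTL_CLASSES:
        return "rtl"
    if before in LTR_CLASSES and after in LTR_CLASSES:
        return "ltr"
    return None


ambiguous = ALEF + '"' + "a"        # quote between Arabic and Latin
with_rlm = ALEF + '"' + RLM + "a"   # RLM after the quote: both sides RTL
with_lrm = ALEF + LRM + '"' + "a"   # LRM before the quote: both sides LTR
```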

Nested embeddings of contra-directional text runs, due to nested
quotations or to the pasting of text from one BIDI context to
another, are also a case where the implicit directionality of
characters is not sufficient, requiring markup. Also, it is
frequently desirable to specify the basic directionality of a block
of text. For these purposes, the DIR attribute is used.

On block-type elements, the DIR attribute indicates the base
directionality of the text in the block; if omitted, it is inherited
from the parent element. The default directionality of the overall
HTML document is left-to-right.

On inline elements, it makes the element start a new embedding level
(explained below); if omitted, the inline element does not start a
new embedding level.

   NOTE -- The PRE, XMP and LISTING elements admit the DIR
   attribute, indicating that their contents should not be
   considered as preformatted with respect to bidirectional layout.
   The BIDI algorithm still needs to be applied to each line of text.

The following example shows a case where embedding is needed, and its
effect:

   Given the following Latin (upper case) and Arabic (lower case)
   letters in backing store with the specified embeddings:

      <SPAN DIR=LTR> AB <SPAN DIR=RTL> xy <SPAN DIR=LTR> CD
      </SPAN> zw </SPAN> EF </SPAN>

   One gets the following rendering (with [] showing the directional
   transitions):

      [ AB [ wz [ CD ] yx ] EF ]

   On the other hand, without this markup and with a base direction
   of LTR, one gets the following rendering:

      [ AB [ yx ] CD [ wz ] EF ]

   Notice that yx is on the left and wz on the right, unlike the
   above case where the embedding levels are used. Without the
   embedding markup, one has at most two levels: a base directional
   level and a single counterflow directional level.
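Both renderings above can be reproduced with the final reordering step of the Unicode BIDI algorithm (rule L2 in current versions of that algorithm). From the highest embedding level down to the lowest odd level, every run of characters at that level or higher is reversed. The Python sketch below assumes that levels have already been resolved and assigns them by hand: 0 for the outer left-to-right text, 1 for the right-to-left span and 2 for the nested left-to-right span. Only their relative order and parity matter here. Spaces and brackets are omitted, and the helper names are hypothetical.

```python
from typing import List, Tuple

# (character, resolved embedding level) pairs in backing-store order.
Levels = List[Tuple[str, int]]


def runs(*pairs: Tuple[str, int]) -> Levels:
    """Expand (text, level) runs into per-character (char, level) pairs."""
    return [(ch, level) for text, level in pairs for ch in text]


def reorder(chars: Levels) -> str:
    """Visual order per rule L2: reverse runs from the highest level down."""
    order = list(chars)
    if not order:
        return ""
    levels = [level for _, level in order]
    highest, lowest = max(levels), min(levels)
    lowest_odd = lowest if lowest % 2 == 1 else lowest + 1
    for level in range(highest, lowest_odd - 1, -1):
        i = 0
        while i < len(order):
            if order[i][1] < level:
                i += 1
                continue
            j = i
            while j < len(order) and order[j][1] >= level:
                j += 1
            order[i:j] = order[i:j][::-1]  # reverse this run in place
            i = j
    return "".join(ch for ch, _ in order)


# Levels assigned by hand for the example in the text (an assumption).
with_markup = runs(("AB", 0), ("xy", 1), ("CD", 2), ("zw", 1), ("EF", 0))
without_markup = runs(("AB", 0), ("xy", 1), ("CD", 0), ("zw", 1), ("EF", 0))

print(reorder(with_markup))     # ABwzCDyxEF
print(reorder(without_markup))  # AByxCDwzEF
```

The second case shows why two levels are not enough. The Latin run CD falls back to the base level, so the two Arabic runs are reversed separately instead of as one embedding.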
747 |
|
748 |
The DIR attribute on inline elements is equivalent to the formatting |
749 |
characters LEFT-TO-RIGHT EMBEDDING (202A) and RIGHT-TO-LEFT EMBED- |
750 |
DING (202B) of ISO 10646. The end tag of the element is equivalent |
751 |
to the POP DIRECTIONAL FORMATTING (202C) character. |
752 |
|
753 |
Directional override, as provided by the <BDO> element, is needed to |
754 |
deal with unusual short pieces of text in which directionality cannot |
755 |
be resolved from context in an unambiguous fashion. For example, it |
756 |
can be used to force left-to-right (or right-to-left) display of part |
757 |
numbers composed of Latin letters, digits and Hebrew letters. |
758 |
|
759 |
The effect of <BDO> is to force the directionality of all characters |
760 |
within it to the value of DIR, irrespective of their intrinsic direc- |
761 |
tional properties. It is equivalent to using the LEFT-TO-RIGHT OVER- |
762 |
RIDE (202D) or RIGHT-TO-LEFT OVERRIDE (202E) characters of ISO 10646, |
763 |
the end tag again being equivalent to the POP DIRECTIONAL FORMATTING |
764 |
(202C) character. |
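The equivalence between markup and the ISO 10646 formatting characters can be sketched as a conversion from a small element tree to plain text. The tree representation and the function name below are hypothetical and chosen only for illustration. The sketch covers SPAN-like inline elements with DIR and the BDO element, nothing else.

```python
from typing import List, Optional, Tuple, Union

# ISO 10646 formatting characters named in the text above.
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING
LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE

EMBEDDING = {"ltr": LRE, "rtl": RLE}  # inline element with DIR, e.g. SPAN
OVERRIDE = {"ltr": LRO, "rtl": RLO}   # BDO element

# Hypothetical tree shape: a node is plain text or (tag, dir, children).
Node = Union[str, Tuple[str, Optional[str], List["Node"]]]


def to_plain_text(node: Node) -> str:
    """Flatten inline markup into text carrying equivalent BIDI controls."""
    if isinstance(node, str):
        return node
    tag, direction, children = node
    inner = "".join(to_plain_text(child) for child in children)
    if direction is None:
        return inner  # no DIR attribute: no new embedding level
    table = OVERRIDE if tag == "BDO" else EMBEDDING
    return table[direction] + inner + PDF  # the end tag acts as PDF
```

Because every end tag maps to one PDF, the generated controls are always balanced. Text that already contains 202A-202E characters would need the extra care described in the NOTE that follows.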

   NOTE -- Authors and authoring software writers should be aware
   that conflicts can arise if the DIR attribute is used on inline
   elements (including <BDO>) concurrently with the use of the
   corresponding ISO 10646 formatting characters. Preferably one or
   the other should be used exclusively; the markup method is better
   able to guarantee document structural integrity, and alleviates
   some problems when editing bidirectional HTML text with a simple
   text editor, but some software may be more adept at handling the
   10646 characters. If both methods are used, great care should be
   exercised to ensure proper nesting of markup and directional
   embedding or override; otherwise, rendering results are undefined.

5. Forms

5.1. DTD additions

It is natural to expect input in any language in forms, as they
provide one of the few ways of obtaining user input. While this is
primarily a user interface issue, there are some things that should
be specified at the HTML level to guide behavior and promote
interoperability.

To ensure full interoperability, it is necessary for the user agent
(and the user) to have an indication of the character encoding(s)
that the server providing a form will be able to handle upon
submission of the filled-in form. Such an indication is provided by
the ACCEPT-CHARSET attribute of the INPUT and TEXTAREA elements,
modeled on the HTTP Accept-Charset header (see [HTTP-1.1]), which
contains a space- and/or comma-delimited list of character sets
acceptable to the server. A user agent may want to advise the user of
the contents of this attribute, or to restrict the user's ability to
enter characters outside the repertoires of the listed character
sets.

   NOTE -- The list of character sets is to be interpreted as an
   EXCLUSIVE-OR list; the server announces that it is ready to accept
   any ONE of these character encoding schemes for each part of a
   multipart entity. The client may perform character encoding
   translation to satisfy the server if necessary.

   NOTE -- The default value for the ACCEPT-CHARSET attribute of an
   INPUT or TEXTAREA element is the reserved value "UNKNOWN". A user
   agent may interpret that value as the character encoding scheme
   that was used to transmit the document containing that element.
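A user agent might use ACCEPT-CHARSET roughly as follows. This Python sketch describes one reasonable strategy, not required behaviour. It splits the list on spaces and commas and treats "UNKNOWN" as the charset of the document containing the form. It then picks the first listed charset that can encode the entire value, so that one charset covers each part, per the EXCLUSIVE-OR note. Charset names are looked up in Python's codec registry, which may not know every registered name, and the function names are hypothetical.

```python
import re
from typing import List, Optional


def parse_accept_charset(value: Optional[str]) -> List[str]:
    """Split an ACCEPT-CHARSET value on spaces and/or commas."""
    if value is None:
        return ["UNKNOWN"]  # the default value described above
    return [token for token in re.split(r"[\s,]+", value) if token]


def choose_charset(text: str, accept_charset: Optional[str],
                   document_charset: str) -> str:
    """Pick ONE acceptable charset able to encode the whole of text."""
    for name in parse_accept_charset(accept_charset):
        # "UNKNOWN" is read as the charset used to transmit the document.
        candidate = document_charset if name.upper() == "UNKNOWN" else name
        try:
            text.encode(candidate)
        except (LookupError, UnicodeEncodeError):
            continue  # charset unknown to Python, or text not encodable
        return candidate
    raise ValueError("no acceptable charset can encode this text")
```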

5.2. Form submission

The HTML 2.0 form submission mechanism, based on the
"application/x-www-form-urlencoded" media type, is ill-equipped with
regard to internationalization. In fact, since URLs are restricted to
ASCII characters, the mechanism is awkward even for ISO-8859-1 text.
Section 2.2 of [RFC1738] specifies that octets may be encoded using
the "%HH" notation, but text submitted from a form is composed of
characters, not octets. Lacking a specification of a character
encoding scheme, the "%HH" notation has no well-defined meaning.
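The ambiguity of "%HH" is easy to demonstrate. The same character produces different octets, and therefore different escapes, depending on the charset assumed. The following short Python illustration uses the standard `urllib.parse` functions:

```python
from urllib.parse import quote, unquote

text = "é"  # LATIN SMALL LETTER E WITH ACUTE (U+00E9)

# The same character yields different %HH sequences in different charsets.
latin1_form = quote(text, encoding="iso-8859-1")
utf8_form = quote(text, encoding="utf-8")

# Decoding with the wrong charset silently produces different characters.
misread = unquote(utf8_form, encoding="iso-8859-1")

print(latin1_form, utf8_form, misread)  # %E9 %C3%A9 Ã©
```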

The best solution is to use the "multipart/form-data" media type
described in [RFC1867] with the POST method of form submission. This
mechanism encapsulates the value part of each name-value pair in a
body part of a multipart MIME body that is sent as the HTTP entity;
each body part can be labeled with an appropriate Content-Type,
including, if necessary, a charset parameter that specifies the
character encoding scheme. The changes to the DTD necessary to
support this method of form submission have been incorporated in the
DTD included in this specification.
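The following Python function sketches what such a body can look like: it assembles a multipart/form-data body in which each text part carries its own charset parameter. It is illustrative only. The function name is hypothetical, the caller is assumed to choose a boundary that does not occur in the data, and field names are assumed to be ASCII.

```python
from typing import List, Tuple


def multipart_form_data(fields: List[Tuple[str, str, str]],
                        boundary: str) -> bytes:
    """Build a multipart/form-data body whose text parts declare a charset.

    fields holds (name, value, charset) triples. Names and the boundary are
    assumed to be ASCII, and the boundary must not occur in any part.
    """
    delimiter = b"--" + boundary.encode("ascii")
    body = b""
    for name, value, charset in fields:
        headers = (f'Content-Disposition: form-data; name="{name}"\r\n'
                   f"Content-Type: text/plain; charset={charset}\r\n")
        body += delimiter + b"\r\n"
        body += headers.encode("ascii") + b"\r\n"  # blank line ends headers
        body += value.encode(charset) + b"\r\n"    # value in its own charset
    return body + delimiter + b"--\r\n"            # closing delimiter
```

The request itself would then carry a header such as `Content-Type: multipart/form-data; boundary=XyZ123`, so that the server can split the parts and decode each one with its declared charset.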

A less satisfactory solution is to add a MIME charset parameter to
the "application/x-www-form-urlencoded" media type specifier sent
along with a POST method form submission, with the understanding that
the URL encoding of [RFC1738] is applied on top of the specified
character encoding, as a kind of implicit Content-Transfer-Encoding.
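The following minimal Python sketch shows this approach. It relies on Python's `urlencode` and `parse_qs`, whose `encoding` argument selects the charset applied before the %HH escaping. The helper names are hypothetical.

```python
from typing import Dict, List, Tuple
from urllib.parse import parse_qs, urlencode


def encode_form(fields: Dict[str, str], charset: str) -> Tuple[str, bytes]:
    """Return (Content-Type header value, body) for a urlencoded POST."""
    # urlencode first encodes each name and value to octets using charset,
    # then applies %HH escaping to those octets.
    body = urlencode(fields, encoding=charset)
    content_type = f"application/x-www-form-urlencoded; charset={charset}"
    return content_type, body.encode("ascii")


def decode_form(body: bytes, charset: str) -> Dict[str, List[str]]:
    """Server side: undo %HH escaping, then decode octets with charset."""
    return parse_qs(body.decode("ascii"), encoding=charset)
```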

One problem with both solutions above is that current browsers do not
generally allow bookmarks to specify the POST method; this should be
improved. Alternatively, the GET method could be used with the form
data transmitted in the body instead of in the URL. Nothing in the
protocol seems to prevent it, but no implementations appear to exist
at present.

How the user agent determines the encoding of the text entered by the
user is outside the scope of this specification.

   NOTE -- Designers of forms and their handling scripts should be
   aware of an important caveat: when the default value of a field
   (the VALUE attribute) is returned upon form submission (i.e. the
   user did not modify this value), it cannot be guaranteed to be
   transmitted as a sequence of octets identical to that in the
   source document -- only as a possibly different but valid encoding
   of the same sequence of text elements. This may be true even if
   the encoding of the document containing the form and that used for
   submission are the same.

   Differences can occur when a sequence of characters can be
   represented by various sequences of octets, and also when a
   composite sequence (a base character plus one or more combining
   diacritics) can be represented by either a different but
   equivalent composite sequence or by a fully precomposed character.
   For instance, the UCS-2 sequence 00EA+0323 (LATIN SMALL LETTER E
   WITH CIRCUMFLEX ACCENT + COMBINING DOT BELOW) may be transformed
   into 1EC7 (LATIN SMALL LETTER E WITH CIRCUMFLEX ACCENT AND DOT
   BELOW), into 0065+0302+0323 (LATIN SMALL LETTER E + COMBINING
   CIRCUMFLEX ACCENT + COMBINING DOT BELOW), as well as into other
   equivalent composite sequences.
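One way for a form-handling script to cope with this is to compare values by canonical equivalence rather than by octets or code points. The Python sketch below uses Unicode Normalization Form C through `unicodedata.normalize`. Normalization forms were standardized after this document was written, so this is offered only as an example technique, not as something this specification requires. The helper name is hypothetical.

```python
import unicodedata

# Three canonically equivalent spellings of the example in the NOTE.
base_plus_dot = "\u00ea\u0323"      # 00EA + 0323
precomposed = "\u1ec7"              # 1EC7
fully_decomposed = "e\u0302\u0323"  # 0065 + 0302 + 0323


def same_text(a: str, b: str) -> bool:
    """True if a and b are canonically equivalent (compared in NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(base_plus_dot == precomposed)              # False: different code points
print(same_text(base_plus_dot, precomposed))     # True
print(same_text(fully_decomposed, precomposed))  # True
```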

6. Miscellaneous

Proper interpretation of a text document requires that the character
encoding scheme be known. Current HTTP servers, however, do not
generally include an appropriate charset parameter with the
Content-Type header. This is bad behaviour[2], and as such strongly
discouraged, but some preventive measures can be taken to minimize
the detrimental effects.

In the case where a document is accessed from a hyperlink in an
origin HTML document, a CHARSET attribute is added to the attribute
list of elements with link semantics (A and LINK), specifically by
adding it to the linkExtraAttributes entity. The value of that
attribute is to be considered a hint to the user agent as to the
character encoding scheme used by the resource pointed to by the
hyperlink; it should be the appropriate value of the MIME charset
parameter for that resource.

In any document, it is possible to include an indication of the
encoding scheme like the following, as early as possible within the
HEAD of the document:

   <META HTTP-EQUIV="Content-Type"
         CONTENT="text/html; charset=ISO-2022-JP">

This is not foolproof, but will work if the encoding scheme is such
that ASCII characters stand for themselves at least until the META
element is parsed. Note that there are better ways for a server to
obtain character encoding information than the unreliable <META>
above; see [NICOL2] for some details and a proposal.

For definiteness, the "charset" parameter received from the source of
the document should be considered the most authoritative, followed in
order of preference by the contents of a META element such as the
above, and finally the CHARSET attribute of the anchor that was
followed (if any).
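The order of preference above can be summarized in code. The Python sketch below rests on several assumptions. Its regular expressions handle only simple, quoted forms of the META element, and the function names are hypothetical. The final fallback (ISO-8859-1 here) is only a placeholder for whatever default a user agent applies when no information is available.

```python
import re
from typing import Optional

CHARSET_PARAM = re.compile(r';\s*charset\s*=\s*"?([A-Za-z0-9._:+-]+)', re.I)
META_TAG = re.compile(
    rb'<meta\s[^>]*http-equiv\s*=\s*"?content-type"?[^>]*>', re.I)
CONTENT_ATTR = re.compile(rb'content\s*=\s*"([^"]*)"', re.I)


def charset_from_content_type(value: Optional[str]) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, if present."""
    match = CHARSET_PARAM.search(value or "")
    return match.group(1) if match else None


def charset_from_meta(document: bytes) -> Optional[str]:
    """Look for <META HTTP-EQUIV="Content-Type" CONTENT="..."> in raw octets.

    This only works if ASCII characters stand for themselves up to the META.
    """
    tag = META_TAG.search(document)
    if tag is None:
        return None
    content = CONTENT_ATTR.search(tag.group(0))
    if content is None:
        return None
    return charset_from_content_type(content.group(1).decode("ascii", "replace"))


def resolve_charset(http_content_type: Optional[str], document: bytes,
                    anchor_charset: Optional[str],
                    default: str = "ISO-8859-1") -> str:
    """HTTP charset first, then META, then the anchor's CHARSET hint."""
    return (charset_from_content_type(http_content_type)
            or charset_from_meta(document)
            or anchor_charset
            or default)
```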

-----------
2 This bad behaviour is even encouraged by the continued existence
  of browsers that declare an unrecognized media type when they
  receive a charset parameter. User agent implementers are strongly
  encouraged to make their software tolerant of this parameter, even
  if they cannot take advantage of it.

When HTML text is transmitted directly in UCS-2 or UCS-4 form, the
question of byte order arises: does the high-order byte of each
multi-byte character come first or last? For definiteness, this
specification recommends that UCS-2 and UCS-4 be transmitted in
big-endian byte order (high-order byte first), which corresponds to
the established network byte order for two- and four-byte
quantities, to the Unicode recommendation for serialized text data
and to RFC 1641. Furthermore, to maximize chances of proper
interpretation, it is recommended that documents transmitted as
UCS-2 or UCS-4 always begin with a ZERO-WIDTH NON-BREAKING SPACE
character (hexadecimal FEFF or 0000FEFF) which, when byte-reversed,
becomes FFFE or FFFE0000, a value guaranteed never to be assigned to
a character. Thus, a user agent receiving FFFE as the first octets of
a text would know that the bytes have to be reversed for the
remainder of the text.
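The byte-order check described above can be sketched in Python as follows. Python's UTF-16 and UTF-32 codecs stand in for UCS-2 and UCS-4, with which they coincide for characters in the Basic Multilingual Plane. The function names are hypothetical. The four-octet signatures must be tested first, since FF FE 00 00 begins with FF FE.

```python
from typing import Optional, Tuple

# Octets of ZERO-WIDTH NON-BREAKING SPACE (U+FEFF) in each byte order.
# Four-octet signatures come first: FF FE 00 00 also starts with FF FE.
SIGNATURES = [
    (b"\x00\x00\xfe\xff", "utf-32-be"),  # UCS-4, big-endian (recommended)
    (b"\xff\xfe\x00\x00", "utf-32-le"),  # UCS-4, byte-reversed
    (b"\xfe\xff", "utf-16-be"),          # UCS-2, big-endian (recommended)
    (b"\xff\xfe", "utf-16-le"),          # UCS-2, byte-reversed
]


def detect_byte_order(data: bytes) -> Optional[Tuple[str, int]]:
    """Return (codec name, signature length), or None if no signature."""
    for signature, codec in SIGNATURES:
        if data.startswith(signature):
            return codec, len(signature)
    return None


def decode_ucs(data: bytes, default: str = "utf-16-be") -> str:
    """Decode UCS-2/UCS-4 octets, honouring a leading U+FEFF signature."""
    detected = detect_byte_order(data)
    if detected is None:
        return data.decode(default)  # no signature: assume big-endian UCS-2
    codec, length = detected
    return data[length:].decode(codec)
```

Text that legitimately begins with U+0000 would be misread by this check, which is not a concern for HTML documents.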

There exist so-called UCS Transformation Formats that can be used to
transmit UCS data, in addition to UCS-2 and UCS-4. UTF-7 [RFC1642]
and UTF-8 [UTF-8] have favorable properties (no byte-ordering
problem, different flavours of ASCII compatibility) that make them
worthy of consideration, especially for transmission of multilingual
text. Another encoding scheme, MNEM [RFC1345], also has interesting
properties and the capability to transmit the full UCS. The UTF-1
transformation format of ISO 10646:1993 (registered by IANA as
ISO-10646-UTF-1) has been removed from ISO 10646 by amendment 4, and
should not be used.
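Python's built-in `utf-8` and `utf-7` codecs show the ASCII-compatibility properties mentioned above. Pure ASCII text encodes to the same octets in both. UTF-7 output stays within 7 bits even for multilingual text, while UTF-8 uses octets above 7F for non-ASCII characters. The helper name below is hypothetical.

```python
from typing import Tuple


def describe(text: str, codec: str) -> Tuple[bytes, bool, bool]:
    """Encode text and report (octets, round-trips?, 7-bit only?)."""
    data = text.encode(codec)
    round_trips = data.decode(codec) == text
    seven_bit = all(octet < 0x80 for octet in data)
    return data, round_trips, seven_bit
```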

The SOFT HYPHEN character (U+00AD) needs a little attention from
user agent implementers. It is present in many character sets
(including the whole ISO 8859 series and, of course, ISO 10646), and
has semantics different from the plain HYPHEN. If not used for
hyphenation, the soft hyphen must be completely ignored. For example,
"rec&shy;ord" should display as "record", should match a search for
"record", and should sort as "record". Non-observance of these
semantics effectively discourages its use on the World Wide Web, even
with software that does support it.
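The three requirements in the example (display, search and sorting) all amount to ignoring U+00AD when no hyphenation takes place. The following minimal Python sketch uses hypothetical function names. Real collation would use a proper locale-aware sort key rather than plain string comparison.

```python
SOFT_HYPHEN = "\u00ad"


def without_soft_hyphens(text: str) -> str:
    """Drop SOFT HYPHEN characters (for display, search and sorting)."""
    return text.replace(SOFT_HYPHEN, "")


def contains(haystack: str, needle: str) -> bool:
    """Substring search that ignores soft hyphens on both sides."""
    return without_soft_hyphens(needle) in without_soft_hyphens(haystack)


def sort_key(word: str) -> str:
    """Sort key under which "rec<SHY>ord" collates exactly as "record"."""
    return without_soft_hyphens(word)
```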

7. HTML Public Text

7.1. HTML DTD

This section contains a DTD for HTML based on the HTML 2.0 DTD of
RFC 1866, incorporating the changes for file upload as specified in
RFC 1867, and the changes deriving from this document.

<!--    html.dtd

        Document Type Definition for the HyperText Markup Language,
        extended for internationalisation (HTML DTD)

        Last revised: 96/05/27

        Authors: Daniel W. Connolly <connolly@w3.org>
                 Francois Yergeau <yergeau@alis.com>
        See Also: html.decl, html-1.dtd
          http://www.w3.org/hypertext/WWW/MarkUp/MarkUp.html
-->

<!ENTITY % HTML.Version
        "-//IETF//DTD HTML//EN"

        -- Typical usage:

            <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
            <html>
            ...
            </html>
        --
        >

<!--============ Feature Test Entities ========================-->

<!ENTITY % HTML.Recommended "IGNORE"
        -- Certain features of the language are necessary for
           compatibility with widespread usage, but they may
           compromise the structural integrity of a document.
           This feature test entity enables a more prescriptive
           document type definition that eliminates
           those features.
        -->

<![ %HTML.Recommended [
        <!ENTITY % HTML.Deprecated "IGNORE">
]]>

<!ENTITY % HTML.Deprecated "INCLUDE"
        -- Certain features of the language are necessary for
           compatibility with earlier versions of the specification,
           but they tend to be used and implemented inconsistently,
           and their use is deprecated. This feature test entity
           enables a document type definition that eliminates
           these features.
        -->

<!ENTITY % HTML.Highlighting "INCLUDE"
        -- Use this feature test entity to validate that a
           document uses no highlighting tags, which may be
           ignored on minimal implementations.
        -->

<!ENTITY % HTML.Forms "INCLUDE"
        -- Use this feature test entity to validate that a document
           contains no forms, which may not be supported in minimal
           implementations
        -->

<!--============== Imported Names ==============================-->

<!ENTITY % Content-Type "CDATA"
        -- meaning an internet media type
           (aka MIME content type, as per RFC1521)
        -->

<!ENTITY % HTTP-Method "GET | POST"
        -- as per HTTP specification, RFC1945
        -->

<!--========= DTD "Macros" =====================-->

<!ENTITY % heading "H1|H2|H3|H4|H5|H6">

<!ENTITY % list " UL | OL | DIR | MENU " >

<!ENTITY % attrs -- common attributes for elements --
        "LANG NAME #IMPLIED -- RFC 1766 language tag --
        DIR (ltr|rtl) #IMPLIED -- text directionality --
        ID ID #IMPLIED -- element identifier (from RFC1942) --
        CLASS NAMES #IMPLIED -- for subclassing elements (from RFC1942) --">

<!ENTITY % just -- an attribute for text justification --
        "ALIGN (left|right|center|justify) #IMPLIED"
        -- default is left for ltr paragraphs, right for rtl -- >

<!--======= Character mnemonic entities =================-->

<!ENTITY % ISOlat1 PUBLIC
        "ISO 8879-1986//ENTITIES Added Latin 1//EN//HTML">
%ISOlat1;

<!ENTITY amp CDATA "&#38;" -- ampersand -->
<!ENTITY gt CDATA "&#62;" -- greater than -->
<!ENTITY lt CDATA "&#60;" -- less than -->
<!ENTITY quot CDATA "&#34;" -- double quote -->

<!--Entities for language-dependent presentation (BIDI and contextual analysis) -->
<!ENTITY zwnj CDATA "&#8204;" -- zero width non-joiner -->
<!ENTITY zwj CDATA "&#8205;" -- zero width joiner -->
<!ENTITY lrm CDATA "&#8206;" -- left-to-right mark -->
<!ENTITY rlm CDATA "&#8207;" -- right-to-left mark -->

<!--========= SGML Document Access (SDA) Parameter Entities =====-->

<!-- HTML contains SGML Document Access (SDA) fixed attributes
in support of easy transformation to the International Committee
for Accessible Document Design (ICADD) DTD
        "-//EC-USA-CDA/ICADD//DTD ICADD22//EN".
ICADD applications are designed to support usable access to
structured information by print-impaired individuals through
Braille, large print and voice synthesis. For more information on
SDA & ICADD:
        - ISO 12083:1993, Annex A.8, Facilities for Braille,
          large print and computer voice
        - ICADD ListServ
          <ICADD%ASUACAD.BITNET@ARIZVM1.ccit.arizona.edu>
        - Usenet news group bit.listserv.easi
        - Recording for the Blind, +1 800 221 4792
-->

<!ENTITY % SDAFORM "SDAFORM CDATA #FIXED"
        -- one to one mapping -->
<!ENTITY % SDARULE "SDARULE CDATA #FIXED"
        -- context-sensitive mapping -->
<!ENTITY % SDAPREF "SDAPREF CDATA #FIXED"
        -- generated text prefix -->
<!ENTITY % SDASUFF "SDASUFF CDATA #FIXED"
        -- generated text suffix -->
<!ENTITY % SDASUSP "SDASUSP NAME #FIXED"
        -- suspend transform process -->

<!--========== Text Markup =====================-->

<![ %HTML.Highlighting [

<!ENTITY % font " TT | B | I ">

<!ENTITY % phrase "EM | STRONG | CODE | SAMP | KBD | VAR | CITE ">

<!ENTITY % text "#PCDATA|A|IMG|BR|%phrase|%font|SPAN|Q|BDO|SUP|SUB">

<!ELEMENT (%font;|%phrase) - - (%text)*>
<!ATTLIST ( TT | CODE | SAMP | KBD | VAR )
        %attrs;
        %SDAFORM; "Lit"
        >
<!ATTLIST ( B | STRONG )
        %attrs;
        %SDAFORM; "B"
        >
<!ATTLIST ( I | EM | CITE )
        %attrs;
        %SDAFORM; "It"
        >

<!-- <TT> Typewriter text -->
<!-- <B> Bold text -->
<!-- <I> Italic text -->

<!-- <EM> Emphasized phrase -->
<!-- <STRONG> Strong emphasis -->
<!-- <CODE> Source code phrase -->
<!-- <SAMP> Sample text or characters -->
<!-- <KBD> Keyboard phrase, e.g. user input -->
<!-- <VAR> Variable phrase or substitutable -->
<!-- <CITE> Name or title of cited work -->

<!ENTITY % pre.content "#PCDATA|A|HR|BR|%font|%phrase|SPAN|BDO">

]]>

<!ENTITY % text "#PCDATA|A|IMG|BR|SPAN|Q|BDO|SUP|SUB">

<!ELEMENT BR - O EMPTY>
<!ATTLIST BR
        %SDAPREF; "&#RE;"
        >

<!-- <BR> Line break -->

<!ELEMENT SPAN - - (%text)*>
<!ATTLIST SPAN
        %attrs;
        %SDAFORM; "other #Attlist"
        >

<!-- <SPAN> Generic inline container -->
<!-- <SPAN DIR=...> New counterflow embedding -->
<!-- <SPAN LANG="..."> Language of contents -->

<!ELEMENT Q - - (%text)*>
<!ATTLIST Q
        %attrs;
        %SDAPREF; '"'
        %SDASUFF; '"'
        >

<!-- <Q> Short quotation -->
<!-- <Q LANG=xx> Language of quotation is xx -->
<!-- <Q DIR=...> New counterflow embedding -->

<!ELEMENT BDO - - (%text)+>
<!ATTLIST BDO
        LANG NAME #IMPLIED
        DIR (ltr|rtl) #REQUIRED
        %SDAPREF "Bidi Override #Attval(DIR): "
        %SDASUFF "End Bidi"
        >

<!-- <BDO DIR=...> Override directionality of text to value of DIR -->
<!-- <BDO LANG=...> Language of contents -->

<!ELEMENT (SUP|SUB) - - (#PCDATA)>
<!ATTLIST (SUP)
        %attrs;
        %SDAPREF "Superscript(#content)"
        >
<!ATTLIST (SUB)
        %attrs;
        %SDAPREF "Subscript(#content)"
        >

<!-- <SUP> Superscript -->
<!-- <SUB> Subscript -->

<!--========= Link Markup ======================-->

<!ENTITY % linkType "NAMES">

<!ENTITY % linkExtraAttributes
        "REL %linkType #IMPLIED
        REV %linkType #IMPLIED
        URN CDATA #IMPLIED
        TITLE CDATA #IMPLIED
        METHODS NAMES #IMPLIED
        CHARSET NAME #IMPLIED
        ">

<![ %HTML.Recommended [
        <!ENTITY % A.content "(%text)*"
        -- <H1><a name="xxx">Heading</a></H1>
                is preferred to
           <a name="xxx"><H1>Heading</H1></a>
        -->
]]>

<!ENTITY % A.content "(%heading|%text)*">

<!ELEMENT A - - %A.content -(A)>
<!ATTLIST A
        %attrs;
        HREF CDATA #IMPLIED
        NAME CDATA #IMPLIED
        %linkExtraAttributes;
        %SDAPREF; "<Anchor: #AttList>"
        >
<!-- <A> Anchor; source/destination of link -->
<!-- <A NAME="..."> Name of this anchor -->
<!-- <A HREF="..."> Address of link destination -->
<!-- <A URN="..."> Permanent address of destination -->
<!-- <A REL=...> Relationship to destination -->
<!-- <A REV=...> Relationship of destination to this -->
<!-- <A TITLE="..."> Title of destination (advisory) -->
<!-- <A METHODS="..."> Operations on destination (advisory) -->
<!-- <A CHARSET="..."> Charset of destination (advisory) -->
<!-- <A LANG="..."> Language of contents between <A> and </A> -->
<!-- <A DIR=...> Contents are a new counterflow embedding -->

<!--========== Images ==========================-->

<!ELEMENT IMG - O EMPTY>
<!ATTLIST IMG
        %attrs;
        SRC CDATA #REQUIRED
        ALT CDATA #IMPLIED
        ALIGN (top|middle|bottom) #IMPLIED
        ISMAP (ISMAP) #IMPLIED
        %SDAPREF; "<Fig><?SDATrans Img: #AttList>#AttVal(Alt)</Fig>"
        >

<!-- <IMG> Image; icon, glyph or illustration -->
<!-- <IMG SRC="..."> Address of image object -->
<!-- <IMG ALT="..."> Textual alternative -->
<!-- <IMG ALIGN=...> Position relative to text -->
<!-- <IMG LANG=...> Image contains "text" in that language -->
<!-- <IMG DIR=rtl> Inline image acts as a right-to-left
                   embedding w/r to BIDI algorithm -->
<!-- <IMG ISMAP> Each pixel can be a link -->

<!--========== Paragraphs=======================-->

<!ELEMENT P - O (%text)*>
<!ATTLIST P
        %attrs;
        %just;
        %SDAFORM; "Para"
        >

<!-- <P> Paragraph -->
<!-- <P LANG="..."> Language of paragraph text -->
<!-- <P DIR=...> Base directionality of paragraph -->
<!-- <P ALIGN=...> Paragraph alignment (justification) -->

<!--========== Headings, Titles, Sections ===============-->

<!ELEMENT HR - O EMPTY>
<!ATTLIST HR
        %just;
        %SDAPREF; "&#RE;&#RE;"
        >

<!-- <HR> Horizontal rule -->

<!ELEMENT ( %heading ) - - (%text;)*>
<!ATTLIST H1
        %attrs;
        %just;
        %SDAFORM; "H1"
        >
<!ATTLIST H2
        %attrs;
        %just;
        %SDAFORM; "H2"
        >
<!ATTLIST H3
        %attrs;
        %just;
        %SDAFORM; "H3"
        >
<!ATTLIST H4
        %attrs;
        %just;
        %SDAFORM; "H4"
        >
<!ATTLIST H5
        %attrs;
        %just;
        %SDAFORM; "H5"
        >
<!ATTLIST H6
        %attrs;
        %just;
        %SDAFORM; "H6"
        >

<!-- <H1> Heading, level 1 -->
<!-- <H2> Heading, level 2 -->
<!-- <H3> Heading, level 3 -->
<!-- <H4> Heading, level 4 -->
<!-- <H5> Heading, level 5 -->
<!-- <H6> Heading, level 6 -->

<!--========== Text Flows ======================-->

<![ %HTML.Forms [
        <!ENTITY % block.forms "BLOCKQUOTE | FORM | ISINDEX">
]]>

<!ENTITY % block.forms "BLOCKQUOTE">

<![ %HTML.Deprecated [
        <!ENTITY % preformatted "PRE | XMP | LISTING">
]]>

<!ENTITY % preformatted "PRE">

<!ENTITY % block "P | %list | DL
        | %preformatted
        | %block.forms">

<!ENTITY % flow "(%text|%block)*">

<!ENTITY % pre.content "#PCDATA | A | HR | BR | SPAN | BDO">
<!ELEMENT PRE - - (%pre.content)*>
<!ATTLIST PRE
        %attrs;
        WIDTH NUMBER #implied
        %SDAFORM; "Lit"
        >

<!-- <PRE> Preformatted text -->
<!-- <PRE WIDTH=...> Maximum characters per line -->
<!-- <PRE DIR=...> Base direction of preformatted block -->
<!-- <PRE LANG=...> Language of contents -->

<![ %HTML.Deprecated [

<!ENTITY % literal "CDATA"
        -- historical, non-conforming parsing mode where
           the only markup signal is the end tag
           in full
        -->

<!ELEMENT (XMP|LISTING) - - %literal>
<!ATTLIST XMP
        %attrs;
        %SDAFORM; "Lit"
        %SDAPREF; "Example:&#RE;"
        >
<!ATTLIST LISTING
        %attrs;
        %SDAFORM; "Lit"
        %SDAPREF; "Listing:&#RE;"
        >

<!-- <XMP> Example section -->
<!-- <LISTING> Computer listing -->

<!ELEMENT PLAINTEXT - O %literal>
<!-- <PLAINTEXT> Plain text passage -->

<!ATTLIST PLAINTEXT
        %attrs;
        %SDAFORM; "Lit"
        >
]]>
1489 |
|
1490 |
|
1491 |
<!--========== Lists ==================--> |
1492 |
|
1493 |
<!ELEMENT DL - - (DT | DD)+> |
1494 |
<!ATTLIST DL |
1495 |
%attrs; |
1496 |
COMPACT (COMPACT) #IMPLIED |
1497 |
%SDAFORM; "List" |
1498 |
%SDAPREF; "Definition List:" |
1499 |
> |
1500 |
|
1501 |
<!ELEMENT DT - O (%text)*> |
1502 |
<!ATTLIST DT |
1503 |
%attrs; |
1504 |
%SDAFORM; "Term" |
1505 |
> |
1506 |
|
1507 |
<!ELEMENT DD - O %flow> |
<!ATTLIST DD
    %attrs;
    %SDAFORM; "LItem"
>

<!-- <DL> Definition list, or glossary -->
<!-- <DL COMPACT> Compact style list -->
<!-- <DT> Term in definition list -->
<!-- <DD> Definition of term -->

<!ELEMENT (OL|UL) - - (LI)+>
<!ATTLIST OL
    %attrs;
    %just;
    COMPACT (COMPACT) #IMPLIED
    %SDAFORM; "List"
>
<!ATTLIST UL
    %attrs;
    %just;
    COMPACT (COMPACT) #IMPLIED
    %SDAFORM; "List"
>
<!-- <UL> Unordered list -->
<!-- <UL COMPACT> Compact list style -->
<!-- <OL> Ordered, or numbered list -->
<!-- <OL COMPACT> Compact list style -->


<!ELEMENT (DIR|MENU) - - (LI)+ -(%block)>
<!ATTLIST DIR
    %attrs;
    %just;
    COMPACT (COMPACT) #IMPLIED
    %SDAFORM; "List"
    %SDAPREF; "<LHead>Directory</LHead>"
>
<!ATTLIST MENU
    %attrs;
    %just;
    COMPACT (COMPACT) #IMPLIED
    %SDAFORM; "List"
    %SDAPREF; "<LHead>Menu</LHead>"
>

<!-- <DIR> Directory list -->
<!-- <DIR COMPACT> Compact list style -->
<!-- <MENU> Menu list -->
<!-- <MENU COMPACT> Compact list style -->

<!ELEMENT LI - O %flow>
<!ATTLIST LI
    %attrs;
    %just;
    %SDAFORM; "LItem"
>

<!-- <LI> List item -->

<!--========== Document Body ===================-->

<![ %HTML.Recommended [
<!ENTITY % body.content "(%heading|%block|HR|ADDRESS|IMG)*"
    -- <h1>Heading</h1>
       <p>Text ...
       is preferred to
       <h1>Heading</h1>
       Text ...
    -->
]]>

<!ENTITY % body.content "(%heading | %text | %block |
                          HR | ADDRESS)*">

<!ELEMENT BODY O O %body.content>
<!ATTLIST BODY
    %attrs;
>

<!-- <BODY> Document body -->
<!-- <BODY DIR=...> Base direction of whole body -->
<!-- <BODY LANG=...> Language of contents -->

<!ELEMENT BLOCKQUOTE - - %body.content>
<!ATTLIST BLOCKQUOTE
    %attrs;
    %just;
    %SDAFORM; "BQ"
>

<!-- <BLOCKQUOTE> Quoted passage -->

<!ELEMENT ADDRESS - - (%text|P)*>
<!ATTLIST ADDRESS
    %attrs;
    %just;
    %SDAFORM; "Lit"
    %SDAPREF; "Address:&#RE;"
>

<!-- <ADDRESS> Address, signature, or byline -->


<!--======= Forms ====================-->

<![ %HTML.Forms [

<!ELEMENT FORM - - %body.content -(FORM) +(INPUT|SELECT|TEXTAREA)>
<!ATTLIST FORM
    %attrs;
    ACTION CDATA #IMPLIED
    METHOD (%HTTP-Method) GET
    ENCTYPE %Content-Type; "application/x-www-form-urlencoded"
    %SDAPREF; "<Para>Form:</Para>"
    %SDASUFF; "<Para>Form End.</Para>"
>

<!-- <FORM> Fill-out or data-entry form -->
<!-- <FORM ACTION="..."> Address for completed form -->
<!-- <FORM METHOD=...> Method of submitting form -->
<!-- <FORM ENCTYPE="..."> Representation of form data -->
<!-- <FORM DIR=...> Base direction of form -->
<!-- <FORM LANG=...> Language of contents -->

<!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX |
                       RADIO | SUBMIT | RESET |
                       IMAGE | HIDDEN | FILE )">
<!ELEMENT INPUT - O EMPTY>
<!ATTLIST INPUT
    %attrs;
    TYPE %InputType TEXT
    NAME CDATA #IMPLIED
    VALUE CDATA #IMPLIED
    SRC CDATA #IMPLIED
    CHECKED (CHECKED) #IMPLIED
    SIZE CDATA #IMPLIED
    MAXLENGTH NUMBER #IMPLIED
    ALIGN (top|middle|bottom) #IMPLIED
    ACCEPT CDATA #IMPLIED -- list of content types --
    ACCEPT-CHARSET CDATA #IMPLIED
            -- list of charsets accepted by server --
    %SDAPREF; "Input: "
>

<!-- <INPUT> Form input datum -->
<!-- <INPUT TYPE=...> Type of input interaction -->
<!-- <INPUT NAME=...> Name of form datum -->
<!-- <INPUT VALUE="..."> Default/initial/selected value -->
<!-- <INPUT SRC="..."> Address of image -->
<!-- <INPUT CHECKED> Initial state is "on" -->
<!-- <INPUT SIZE=...> Field size hint -->
<!-- <INPUT MAXLENGTH=...> Data length maximum -->
<!-- <INPUT ALIGN=...> Image alignment -->
<!-- <INPUT ACCEPT="..."> List of desired media types -->
<!-- <INPUT ACCEPT-CHARSET="..."> List of acceptable charsets -->

<!ELEMENT SELECT - - (OPTION+) -(INPUT|SELECT|TEXTAREA)>
<!ATTLIST SELECT
    %attrs;
    NAME CDATA #REQUIRED
    SIZE NUMBER #IMPLIED
    MULTIPLE (MULTIPLE) #IMPLIED
    %SDAFORM; "List"
    %SDAPREF;
        "<LHead>Select #AttVal(Multiple)</LHead>"
>

<!-- <SELECT> Selection of option(s) -->
<!-- <SELECT NAME=...> Name of form datum -->
<!-- <SELECT SIZE=...> Options displayed at a time -->
<!-- <SELECT MULTIPLE> Multiple selections allowed -->

<!ELEMENT OPTION - O (#PCDATA)*>
<!ATTLIST OPTION
    %attrs;
    SELECTED (SELECTED) #IMPLIED
    VALUE CDATA #IMPLIED
    %SDAFORM; "LItem"
    %SDAPREF;
        "Option: #AttVal(Value) #AttVal(Selected)"
>

<!-- <OPTION> A selection option -->
<!-- <OPTION SELECTED> Initial state -->
<!-- <OPTION VALUE="..."> Form datum value for this option -->

<!ELEMENT TEXTAREA - - (#PCDATA)* -(INPUT|SELECT|TEXTAREA)>
<!ATTLIST TEXTAREA
    %attrs;
    NAME CDATA #REQUIRED
    ROWS NUMBER #REQUIRED
    COLS NUMBER #REQUIRED
    ACCEPT-CHARSET CDATA #IMPLIED
            -- list of charsets accepted by server --
    %SDAFORM; "Para"
    %SDAPREF; "Input Text -- #AttVal(Name): "
>

<!-- <TEXTAREA> An area for text input -->
<!-- <TEXTAREA NAME=...> Name of form datum -->
<!-- <TEXTAREA ROWS=...> Height of area -->
<!-- <TEXTAREA COLS=...> Width of area -->
<!-- <TEXTAREA ACCEPT-CHARSET="..."> List of acceptable charsets -->

]]>


<!--======= Document Head ======================-->

<![ %HTML.Recommended [
<!ENTITY % head.extra "">
]]>
<!ENTITY % head.extra "& NEXTID?">

<!ENTITY % head.content "TITLE & ISINDEX? & BASE? %head.extra">

<!ELEMENT HEAD O O (%head.content) +(META|LINK)>
<!ATTLIST HEAD
    %attrs; >

<!-- <HEAD> Document head -->

<!ELEMENT TITLE - - (#PCDATA)* -(META|LINK)>
<!ATTLIST TITLE
    %attrs;
    %SDAFORM; "Ti" >

<!-- <TITLE> Title of document -->

<!ELEMENT LINK - O EMPTY>
<!ATTLIST LINK
    %attrs;
    HREF CDATA #REQUIRED
    %linkExtraAttributes;
    %SDAPREF; "Linked to : #AttVal (TITLE) (URN) (HREF)>" >

<!-- <LINK> Link from this document -->
<!-- <LINK HREF="..."> Address of link destination -->
<!-- <LINK URN="..."> Lasting name of destination -->
<!-- <LINK REL=...> Relationship to destination -->
<!-- <LINK REV=...> Relationship of destination to this -->
<!-- <LINK TITLE="..."> Title of destination (advisory) -->
<!-- <LINK CHARSET="..."> Charset of destination (advisory) -->
<!-- <LINK METHODS="..."> Operations allowed (advisory) -->

<!ELEMENT ISINDEX - O EMPTY>
<!ATTLIST ISINDEX
    %attrs;
    %SDAPREF;
        "<Para>[Document is indexed/searchable.]</Para>">

<!-- <ISINDEX> Document is a searchable index -->

<!ELEMENT BASE - O EMPTY>
<!ATTLIST BASE
    HREF CDATA #REQUIRED >

<!-- <BASE> Base context document -->
<!-- <BASE HREF="..."> Address for this document -->

<!ELEMENT NEXTID - O EMPTY>
<!ATTLIST NEXTID
    N CDATA #REQUIRED >

<!-- <NEXTID> Next ID to use for link name -->
<!-- <NEXTID N=...> Next ID to use for link name -->

<!ELEMENT META - O EMPTY>
<!ATTLIST META
    HTTP-EQUIV NAME #IMPLIED
    NAME NAME #IMPLIED
    CONTENT CDATA #REQUIRED >

<!-- <META> Generic Meta-information -->
<!-- <META HTTP-EQUIV=...> HTTP response header name -->
<!-- <META NAME=...> Meta-information name -->
<!-- <META CONTENT="..."> Associated information -->

<!--======= Document Structure =================-->

<![ %HTML.Deprecated [
<!ENTITY % html.content "HEAD, BODY, PLAINTEXT?">
]]>
<!ENTITY % html.content "HEAD, BODY">

<!ELEMENT HTML O O (%html.content)>
<!ENTITY % version.attr "VERSION CDATA #FIXED '%HTML.Version;'">

<!ATTLIST HTML
    %attrs;
    %version.attr;
    %SDAFORM; "Book"
>

<!-- <HTML> HTML Document -->
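
The DTD above uses one idiom repeatedly: an alternative definition of a parameter entity (for example %block.forms or %preformatted) is placed inside a marked section controlled by a feature-test entity such as %HTML.Forms, and is followed by an unconditional default. This works because SGML binds an entity to its first declaration and ignores any later ones. The Python sketch below is illustrative only and not part of this specification; it models just that resolution step. The tuple representation and the name resolve_entities are assumptions made for the example, and nested marked sections are not handled.

```python
from typing import Dict, List, Optional, Tuple

# Illustrative model (not part of the DTD): one entry per parameter-entity
# declaration, in document order:
# (feature-test switch guarding it, or None if unconditional; name; value)
Decl = Tuple[Optional[str], str, str]


def resolve_entities(decls: List[Decl], switches: Dict[str, str]) -> Dict[str, str]:
    """Bind each parameter entity to its first effective declaration.

    `switches` maps each feature-test entity (e.g. "HTML.Forms") to the
    marked-section keyword it expands to, "INCLUDE" or "IGNORE".
    Nested marked sections are deliberately not modelled.
    """
    bound: Dict[str, str] = {}
    for switch, name, value in decls:
        # Declarations inside an IGNORE marked section are skipped entirely.
        if switch is not None and switches[switch].upper() != "INCLUDE":
            continue
        # SGML keeps the first declaration of an entity and ignores later ones.
        bound.setdefault(name, value)
    return bound


# The %block.forms and %preformatted declarations from the DTD above.
DECLS: List[Decl] = [
    ("HTML.Forms", "block.forms", "BLOCKQUOTE | FORM | ISINDEX"),
    (None, "block.forms", "BLOCKQUOTE"),
    ("HTML.Deprecated", "preformatted", "PRE | XMP | LISTING"),
    (None, "preformatted", "PRE"),
]

if __name__ == "__main__":
    with_forms = resolve_entities(
        DECLS, {"HTML.Forms": "INCLUDE", "HTML.Deprecated": "IGNORE"})
    print(with_forms["block.forms"])   # BLOCKQUOTE | FORM | ISINDEX
    print(with_forms["preformatted"])  # PRE
```

Setting %HTML.Forms to IGNORE instead yields "BLOCKQUOTE" for %block.forms, which is how the DTD removes FORM and ISINDEX from block content when forms are not supported.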


7.2. SGML Declaration for HTML

<!SGML  "ISO 8879:1986"
     --
        SGML Declaration for HyperText Markup Language version 2.x
        (HTML 2.x = HTML 2.0 + i18n).

     --

CHARSET
  BASESET  "ISO Registration Number 177//CHARSET
            ISO/IEC 10646-1:1993 UCS-4 with
            implementation level 3//ESC 2/5 2/15 4/6"
  DESCSET     0           9  UNUSED
              9           2  9
             11           2  UNUSED
             13           1  13
             14          18  UNUSED
             32          95  32
            127           1  UNUSED
            128          32  UNUSED
            160  2147483486  160

CAPACITY SGMLREF
  TOTALCAP 150000
  GRPCAP   150000
  ENTCAP   150000

SCOPE DOCUMENT
SYNTAX
  SHUNCHAR CONTROLS 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
           17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 127
  BASESET  "ISO 646:1983//CHARSET
            International Reference Version
            (IRV)//ESC 2/5 4/0"
  DESCSET  0 128 0

  FUNCTION
    RE          13
    RS          10
    SPACE       32
    TAB SEPCHAR  9

  NAMING   LCNMSTRT ""
           UCNMSTRT ""
           LCNMCHAR ".-"
           UCNMCHAR ".-"
           NAMECASE GENERAL YES
                    ENTITY  NO
  DELIM    GENERAL  SGMLREF
           SHORTREF SGMLREF
  NAMES    SGMLREF
  QUANTITY SGMLREF
           ATTSPLEN 2100
           LITLEN   1024
           NAMELEN  72    -- somewhat arbitrary; taken from
                             internet line length conventions --
           PILEN    1024
           TAGLVL   100
           TAGLEN   2100
           GRPGTCNT 150
           GRPCNT   64

FEATURES
  MINIMIZE
    DATATAG  NO
    OMITTAG  YES
    RANK     NO
    SHORTTAG YES
  LINK
    SIMPLE   NO
    IMPLICIT NO
    EXPLICIT NO
  OTHER
    CONCUR   NO
    SUBDOC   NO
    FORMAL   YES
  APPINFO    "SDA"  -- conforming SGML Document Access application
                    --
>
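
Each DESCSET entry in the declaration above gives a first document character number, a count, and either the corresponding base-set (UCS-4) character number or UNUSED. As an arithmetic check, the counts before the last range sum to 9+2+2+1+18+95+1+32 = 160, so the last range starts at 160 and, holding 2147483486 characters, ends at character number 2147483645. The Python sketch below is illustrative only, not part of the specification; the names DESCSET, last_described and unused_ranges are assumptions for the example. It verifies that the ranges are contiguous and lists the UNUSED ones: the C0 controls other than TAB (9), LF (10) and CR (13), plus DEL (127) and the C1 controls (128 to 159).

```python
from typing import List, Optional, Tuple

# (first document character number, number of characters,
#  base-set character number, or None for UNUSED)
Entry = Tuple[int, int, Optional[int]]

# Transcribed from the DESCSET parameter of the CHARSET section above.
DESCSET: List[Entry] = [
    (0, 9, None),
    (9, 2, 9),
    (11, 2, None),
    (13, 1, 13),
    (14, 18, None),
    (32, 95, 32),
    (127, 1, None),
    (128, 32, None),
    (160, 2147483486, 160),
]


def last_described(entries: List[Entry]) -> int:
    """Check that the ranges are contiguous from 0; return the last character number."""
    next_start = 0
    for start, count, base in entries:
        if start != next_start:
            raise ValueError(f"gap or overlap before character {start}")
        # Every used range in this declaration is expected to map each
        # number onto the same number in UCS-4.
        if base is not None and base != start:
            raise ValueError(f"range at {start} is not an identity mapping")
        next_start = start + count
    return next_start - 1


def unused_ranges(entries: List[Entry]) -> List[Tuple[int, int]]:
    """Return the UNUSED ranges as inclusive (first, last) pairs."""
    return [
        (start, start + count - 1)
        for start, count, base in entries
        if base is None
    ]


if __name__ == "__main__":
    print(last_described(DESCSET))  # 2147483645
    print(unused_ranges(DESCSET))   # [(0, 8), (11, 12), (14, 31), (127, 127), (128, 159)]
```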


7.3. ISO Latin 1 entity set

The following public text lists each of the characters specified in
the Added Latin 1 entity set, along with its name, syntax for use,
and description. This list is derived from ISO Standard
8879:1986//ENTITIES Added Latin 1//EN. HTML includes the entire
entity set, and adds entities for all missing characters in the right
half of ISO-8859-1 (code positions 160 to 255).

<!-- (C) International Organization for Standardization 1986
     Permission to copy in any form is granted for use with
     conforming SGML systems and applications as defined in
     ISO 8879, provided this notice is included in all copies.
-->
<!-- Character entity set. Typical invocation:
     <!ENTITY % ISOlat1 PUBLIC
       "ISO 8879-1986//ENTITIES Added Latin 1//EN//HTML">
     %ISOlat1;
-->
<!ENTITY nbsp   CDATA "&#160;" -- no-break space -->
<!ENTITY iexcl  CDATA "&#161;" -- inverted exclamation mark -->
<!ENTITY cent   CDATA "&#162;" -- cent sign -->
<!ENTITY pound  CDATA "&#163;" -- pound sterling sign -->
<!ENTITY curren CDATA "&#164;" -- general currency sign -->
<!ENTITY yen    CDATA "&#165;" -- yen sign -->
<!ENTITY brvbar CDATA "&#166;" -- broken (vertical) bar -->
<!ENTITY sect   CDATA "&#167;" -- section sign -->
<!ENTITY uml    CDATA "&#168;" -- umlaut (dieresis) -->
<!ENTITY copy   CDATA "&#169;" -- copyright sign -->
<!ENTITY ordf   CDATA "&#170;" -- ordinal indicator, feminine -->
<!ENTITY laquo  CDATA "&#171;" -- angle quotation mark, left -->
<!ENTITY not    CDATA "&#172;" -- not sign -->
<!ENTITY shy    CDATA "&#173;" -- soft hyphen -->
<!ENTITY reg    CDATA "&#174;" -- registered sign -->
<!ENTITY macr   CDATA "&#175;" -- macron -->
<!ENTITY deg    CDATA "&#176;" -- degree sign -->
<!ENTITY plusmn CDATA "&#177;" -- plus-or-minus sign -->
<!ENTITY sup2   CDATA "&#178;" -- superscript two -->
<!ENTITY sup3   CDATA "&#179;" -- superscript three -->
<!ENTITY acute  CDATA "&#180;" -- acute accent -->
<!ENTITY micro  CDATA "&#181;" -- micro sign -->
<!ENTITY para   CDATA "&#182;" -- pilcrow (paragraph sign) -->
<!ENTITY middot CDATA "&#183;" -- middle dot -->
<!ENTITY cedil  CDATA "&#184;" -- cedilla -->
<!ENTITY sup1   CDATA "&#185;" -- superscript one -->
<!ENTITY ordm   CDATA "&#186;" -- ordinal indicator, masculine -->
<!ENTITY raquo  CDATA "&#187;" -- angle quotation mark, right -->
<!ENTITY frac14 CDATA "&#188;" -- fraction one-quarter -->
<!ENTITY frac12 CDATA "&#189;" -- fraction one-half -->
<!ENTITY frac34 CDATA "&#190;" -- fraction three-quarters -->
<!ENTITY iquest CDATA "&#191;" -- inverted question mark -->
<!ENTITY Agrave CDATA "&#192;" -- capital A, grave accent -->
<!ENTITY Aacute CDATA "&#193;" -- capital A, acute accent -->
<!ENTITY Acirc  CDATA "&#194;" -- capital A, circumflex accent -->
<!ENTITY Atilde CDATA "&#195;" -- capital A, tilde -->
<!ENTITY Auml   CDATA "&#196;" -- capital A, dieresis or umlaut mark -->
<!ENTITY Aring  CDATA "&#197;" -- capital A, ring -->
<!ENTITY AElig  CDATA "&#198;" -- capital AE diphthong (ligature) -->
<!ENTITY Ccedil CDATA "&#199;" -- capital C, cedilla -->
<!ENTITY Egrave CDATA "&#200;" -- capital E, grave accent -->
<!ENTITY Eacute CDATA "&#201;" -- capital E, acute accent -->
<!ENTITY Ecirc  CDATA "&#202;" -- capital E, circumflex accent -->
<!ENTITY Euml   CDATA "&#203;" -- capital E, dieresis or umlaut mark -->
<!ENTITY Igrave CDATA "&#204;" -- capital I, grave accent -->
<!ENTITY Iacute CDATA "&#205;" -- capital I, acute accent -->
<!ENTITY Icirc  CDATA "&#206;" -- capital I, circumflex accent -->
<!ENTITY Iuml   CDATA "&#207;" -- capital I, dieresis or umlaut mark -->
<!ENTITY ETH    CDATA "&#208;" -- capital Eth, Icelandic -->
<!ENTITY Ntilde CDATA "&#209;" -- capital N, tilde -->
<!ENTITY Ograve CDATA "&#210;" -- capital O, grave accent -->
<!ENTITY Oacute CDATA "&#211;" -- capital O, acute accent -->
<!ENTITY Ocirc  CDATA "&#212;" -- capital O, circumflex accent -->
<!ENTITY Otilde CDATA "&#213;" -- capital O, tilde -->
<!ENTITY Ouml   CDATA "&#214;" -- capital O, dieresis or umlaut mark -->
<!ENTITY times  CDATA "&#215;" -- multiply sign -->
<!ENTITY Oslash CDATA "&#216;" -- capital O, slash -->
<!ENTITY Ugrave CDATA "&#217;" -- capital U, grave accent -->
<!ENTITY Uacute CDATA "&#218;" -- capital U, acute accent -->
<!ENTITY Ucirc  CDATA "&#219;" -- capital U, circumflex accent -->
<!ENTITY Uuml   CDATA "&#220;" -- capital U, dieresis or umlaut mark -->
<!ENTITY Yacute CDATA "&#221;" -- capital Y, acute accent -->
<!ENTITY THORN  CDATA "&#222;" -- capital Thorn, Icelandic -->
<!ENTITY szlig  CDATA "&#223;" -- small sharp s, German (sz ligature) -->
<!ENTITY agrave CDATA "&#224;" -- small a, grave accent -->
<!ENTITY aacute CDATA "&#225;" -- small a, acute accent -->
<!ENTITY acirc  CDATA "&#226;" -- small a, circumflex accent -->
<!ENTITY atilde CDATA "&#227;" -- small a, tilde -->
<!ENTITY auml   CDATA "&#228;" -- small a, dieresis or umlaut mark -->
<!ENTITY aring  CDATA "&#229;" -- small a, ring -->
<!ENTITY aelig  CDATA "&#230;" -- small ae diphthong (ligature) -->
<!ENTITY ccedil CDATA "&#231;" -- small c, cedilla -->
<!ENTITY egrave CDATA "&#232;" -- small e, grave accent -->
<!ENTITY eacute CDATA "&#233;" -- small e, acute accent -->
<!ENTITY ecirc  CDATA "&#234;" -- small e, circumflex accent -->
<!ENTITY euml   CDATA "&#235;" -- small e, dieresis or umlaut mark -->
<!ENTITY igrave CDATA "&#236;" -- small i, grave accent -->
<!ENTITY iacute CDATA "&#237;" -- small i, acute accent -->
<!ENTITY icirc  CDATA "&#238;" -- small i, circumflex accent -->
<!ENTITY iuml   CDATA "&#239;" -- small i, dieresis or umlaut mark -->
<!ENTITY eth    CDATA "&#240;" -- small eth, Icelandic -->
<!ENTITY ntilde CDATA "&#241;" -- small n, tilde -->
<!ENTITY ograve CDATA "&#242;" -- small o, grave accent -->
<!ENTITY oacute CDATA "&#243;" -- small o, acute accent -->
<!ENTITY ocirc  CDATA "&#244;" -- small o, circumflex accent -->
<!ENTITY otilde CDATA "&#245;" -- small o, tilde -->
<!ENTITY ouml   CDATA "&#246;" -- small o, dieresis or umlaut mark -->
<!ENTITY divide CDATA "&#247;" -- divide sign -->
<!ENTITY oslash CDATA "&#248;" -- small o, slash -->
<!ENTITY ugrave CDATA "&#249;" -- small u, grave accent -->
<!ENTITY uacute CDATA "&#250;" -- small u, acute accent -->
<!ENTITY ucirc  CDATA "&#251;" -- small u, circumflex accent -->
<!ENTITY uuml   CDATA "&#252;" -- small u, dieresis or umlaut mark -->
<!ENTITY yacute CDATA "&#253;" -- small y, acute accent -->
<!ENTITY thorn  CDATA "&#254;" -- small thorn, Icelandic -->
<!ENTITY yuml   CDATA "&#255;" -- small y, dieresis or umlaut mark -->
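
Because ISO-8859-1 occupies the first 256 positions of ISO 10646, each entity above denotes a character whose UCS code point equals its ISO-8859-1 byte value. The Python sketch below is illustrative only and not part of this specification. It cross-checks this property using the standard library's html.entities.name2codepoint table, assuming (as holds for the later HTML 4.0 entity tables that module follows) that its Latin-1 names are the same 96 names listed here. The names LATIN1_RIGHT_HALF and entity_to_latin1 are hypothetical.

```python
from html.entities import name2codepoint

# Assumption: Python's HTML 4 entity table uses the same Latin-1 names as
# the list above. Keep only entities in the right half of ISO-8859-1.
LATIN1_RIGHT_HALF = {
    name: cp for name, cp in name2codepoint.items() if 160 <= cp <= 255
}


def entity_to_latin1(name: str) -> bytes:
    """Encode the character denoted by a Latin-1 entity as one ISO-8859-1 byte."""
    return chr(LATIN1_RIGHT_HALF[name]).encode("latin-1")


if __name__ == "__main__":
    # Expect one name for each code position from 160 to 255.
    print(len(LATIN1_RIGHT_HALF))
    print(sorted(LATIN1_RIGHT_HALF.values()) == list(range(160, 256)))
    print(entity_to_latin1("eacute"))  # b'\xe9'
```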


Bibliography

[BRYAN88]   M. Bryan, "SGML -- An Author's Guide to the Standard
            Generalized Markup Language", Addison-Wesley, Reading,
            1988.

[ERCS]      Extended Reference Concrete Syntax for SGML.
            <http://www.sgmlopen.org/sgml/docs/ercs/ercs-home.html>

[GOLD90]    C. F. Goldfarb, "The SGML Handbook", Y. Rubinsky, Ed.,
            Oxford University Press, 1990.

[HTTP-1.1]  R.T. Fielding, H. Frystyk Nielsen, and T. Berners-Lee,
            "Hypertext Transfer Protocol -- HTTP/1.1", Work in
            progress (draft-ietf-http-v11-spec-03.txt), MIT/LCS,
            May 1996.

[ISO-639]   ISO 639:1988. Code for the representation of names of
            languages. Technical content in
            <http://www.sil.org/sgml/iso639a.html>

[ISO-3166]  ISO 3166:1993. Codes for the representation of names of
            countries.

[ISO-8601]  ISO 8601:1988. Data elements and interchange formats --
            Information interchange -- Representation of dates and
            times.

[ISO-8859-1] ISO 8859-1:1987. International Standard -- Information
            Processing -- 8-bit Single-Byte Coded Graphic Character
            Sets -- Part 1: Latin Alphabet No. 1.

[ISO-8879]  ISO 8879:1986. International Standard -- Information
            Processing -- Text and Office Systems -- Standard
            Generalized Markup Language (SGML).

[ISO-10646] ISO/IEC 10646-1:1993. International Standard --
            Information technology -- Universal Multiple-Octet Coded
            Character Set (UCS) -- Part 1: Architecture and Basic
            Multilingual Plane.

[NICOL]     G.T. Nicol, "The Multilingual World Wide Web", Electronic
            Book Technologies, 1995,
            <http://www.ebt.com/docs/multling.html>

[NICOL2]    G.T. Nicol, "MIME Header Supplemented File Type", Work
            in progress, <draft-nicol-mime-header-type-00.txt>,
            EBT, October 1995.

[RFC1345]   K. Simonsen, "Character Mnemonics & Character Sets",
            RFC 1345, Rationel Almen Planlaegning, June 1992.

[RFC1468]   J. Murai, M. Crispin and E. van der Poel, "Japanese
            Character Encoding for Internet Messages", RFC 1468,
            Keio University, Panda Programming, June 1993.

[RFC1521]   N. Borenstein and N. Freed, "MIME (Multipurpose Internet
            Mail Extensions) Part One: Mechanisms for Specifying and
            Describing the Format of Internet Message Bodies",
            RFC 1521, Bellcore, Innosoft, September 1993.

[RFC1641]   D. Goldsmith and M. Davis, "Using Unicode with MIME",
            RFC 1641, Taligent Inc., July 1994.

[RFC1642]   D. Goldsmith and M. Davis, "UTF-7: A Mail-safe
            Transformation Format of Unicode", RFC 1642, Taligent
            Inc., July 1994.

[RFC1738]   T. Berners-Lee, L. Masinter, and M. McCahill, "Uniform
            Resource Locators (URL)", RFC 1738, CERN, Xerox PARC,
            University of Minnesota, October 1994.

[RFC1766]   H. Alvestrand, "Tags for the Identification of
            Languages", RFC 1766, UNINETT, March 1995.

[RFC1866]   T. Berners-Lee and D. Connolly, "Hypertext Markup
            Language - 2.0", RFC 1866, MIT/W3C, November 1995.

[RFC1867]   E. Nebel and L. Masinter, "Form-based File Upload in
            HTML", RFC 1867, Xerox Corporation, November 1995.

[RFC1942]   D. Raggett, "HTML Tables", RFC 1942, W3C, May 1996.

[RFC1945]   T. Berners-Lee, R.T. Fielding, and H. Frystyk Nielsen,
            "Hypertext Transfer Protocol -- HTTP/1.0", RFC 1945,
            MIT/LCS, UC Irvine, May 1996.

[SQ91]      SoftQuad, "The SGML Primer", 3rd ed., SoftQuad Inc.,
            1991.

[TAKADA]    Toshihiro Takada, "Multilingual Information Exchange
            through the World-Wide Web", Computer Networks and
            ISDN Systems, Vol. 27, No. 2, Nov. 1994, pp. 235-241.

[TEI]       TEI Guidelines for Electronic Text Encoding and
            Interchange. <http://etext.virginia.edu/TEI.html>

[UNICODE]   The Unicode Consortium, "The Unicode Standard --
            Worldwide Character Encoding -- Version 1.0",
            Addison-Wesley, Volume 1, 1991, Volume 2, 1992, and
            Technical Report #4, 1993. The BIDI algorithm is in
            appendix A of volume 1, with corrections in appendix D
            of volume 2.

[UTF-8]     ISO/IEC 10646-1:1993 AMENDMENT 2 (1996). UCS
            Transformation Format 8 (UTF-8).

[VANH90]    E. van Herwijnen, "Practical SGML", Kluwer Academic
            Publishers Group, Norwell and Dordrecht, 1990.

Authors' Addresses

François Yergeau
Alis Technologies
100, boul. Alexis-Nihon, bureau 600
Montréal QC H4M 2P2
Canada

Tel: +1 (514) 747-2547
Fax: +1 (514) 747-2561
EMail: fyergeau@alis.com


Gavin Thomas Nicol
Electronic Book Technologies, Japan
1-29-9 Tsurumaki,
Setagaya-ku,
Tokyo
Japan

Tel: +81-3-3230-8161
Fax: +81-3-3230-8163
EMail: gtn@ebt.com, gtn@twics.co.jp


Glenn Adams
Spyglass
118 Magazine Street
Cambridge, MA 02139
U.S.A.

Tel: +1 (617) 864-5524
Fax: +1 (617) 864-4965
EMail: glenn@spyglass.com


Martin J. Duerst
Multimedia-Laboratory
Department of Computer Science
University of Zurich
Winterthurerstrasse 190
CH-8057 Zurich
Switzerland

Tel: +41 1 257 43 16
Fax: +41 1 363 00 35
EMail: mduerst@ifi.unizh.ch