Network Working Group                                        E. Levinson
Internet Draft: MIME/SGML                           Accurate Information
<draft-levinson-sgml-02.txt>                               Systems, Inc.
                                                        November 1, 1994

                 MIME Content-types for SGML Documents

This draft document is being circulated for comment. Please
send your comments to the authors or to the ietf-822
maillist <ietf-822@dimacs.rutgers.edu>. If consensus is
reached this document may be submitted to the RFC editor as
a Proposed Standard protocol specification for use with
MIME.

Status of this Memo

This document is an Internet Draft; Internet Drafts are
working documents of the Internet Engineering Task Force
(IETF), its Areas, and Working Groups. Note that other
groups may also distribute working documents as Internet
Drafts.

Internet Drafts are draft documents valid for a maximum of
six months. They may be updated, replaced, or obsoleted by
other documents at any time. It is not appropriate to use
Internet Drafts as reference material or to cite them other
than as a "working draft" or "work in progress".

Please check the abstract listing in each Internet Draft
directory for the current status of this or any other
Internet Draft.

Abstract

This document specifies how a document written using the
Standard Generalized Markup Language (SGML) is to be carried
within a MIME message. MIME provides a flexible mechanism
for structuring RFC 822 message bodies. Additional
mechanisms are required to encapsulate SGML documents, which
can consist of multiple parts. The mechanisms specify how
to label the SGML parts and how to represent the internal
references between the files. In addition, this document
specifies the requirements for conformance to the SGML
Document Interchange Format (SDIF). Subtypes are proposed
for the Multipart and Application content types to support
SGML documents and their interchange format within MIME.

SGML is one of a class of compound documents which consist
of a set of parts or files, some of which may
contain references to other files in the set. Explicit
indications of the bindings between the sender's file names
and the MIME body parts are needed to re-bind the sender's
file names to ones on the recipient's system. Further, the
document may refer to host system commands, a security

E. Levinson              Expires May 1, 1995                    [Page 1]

Internet Draft      MIME Types for SGML Documents

hazard which must be controlled.

0 Changes since last issued

Changed DTD to prolog in text and definition.

Increased precision when referring to SGML language
elements.

Various changes to clarify meaning.

1 Introduction

The Multipurpose Internet Mail Extensions (MIME)
specification [RFC-MIME] does not provide for objects
(content-types) that consist of many interrelated body parts
when arbitrary relationships exist between the object's body
parts. This document provides structuring mechanisms for
encapsulating objects or documents encoded with the Standard
Generalized Markup Language (SGML) [ISO-SGML].

MIME based mail User Agents can be configured to display
(and compose) many different message body content-types.
The user agents invoke applications that correspond to each
particular content-type. Existing MIME content-types
consist of a single body part and mechanisms to convey a
collection of body parts. Recently, MIME subtypes have been
registered that consist of a few related body parts, e.g.
Apple Macintosh file objects. In each case the body part is
an independent and atomic unit. That is not the case with
document objects; here the documents consist of many body
parts that refer to one another, body parts whose display,
independent of other body parts, has no meaning.
Consequently the entire set of body parts must be treated as
a single unit, a capability not present in MIME today.

The general mechanism used here may apply to other markup
languages, e.g. troff and TeX. The issues of internal
structure, security, and control must be addressed
separately for each markup language.

1.1 SGML

SGML is used in several communities to encode document
structure and layout. A rigorous description of SGML is
left to [ISO-SGML]. Appendix A of this document, which is
unbelievably brief, contains a description of the SGML
elements relevant to MIME encapsulation. The terms used in
the present document attempt to be consistent with SGML
terminology and usage.

A complete SGML document consists of an SGML declaration, a
prolog, and a document instance. The prolog contains the
document type definition (DTD), a description of the class
of documents to which the document instance belongs.
Subdocuments are defined by their own prolog (and DTD). The
prolog may, recursively, refer to subdocuments, each of which
consists of a prolog and instance. The applications that
process SGML documents may require the parts of these
documents or subdocuments (declaration, prolog, or
instance) to be individual files or a single file.

For a person or application to receive and display a
complete SGML document the mail message must carry a precise
definition for each of the SGML document parts. In the
sender's environment the document parts may reference
standard definitions or specific local files. Further, a
DTD may reference other files, for example images and
graphics. The identity of the document parts and the
content of each file must be available to enable the
recipient to transform the sender's file name references
into equivalent local references and to instantiate the
files locally.

1.2 SGML Document Interchange Format (SDIF)

The SGML Document Interchange Format (SDIF) [ISO-SDIF]
specifies the structure for a data stream which contains one
or more SGML documents. SDIF is focused on transferring
documents between sites and does not include a requirement
that the documents be displayed as they are encountered.
Users of mail based systems, however, expect to have each
mail item in a multipart message displayed -- more
precisely, ready for display -- when encountered. This
document shows how to meet both the SDIF and display
requirements.

1.3 Organization of this Memorandum

First a model for transferring SGML documents is presented;
the model defines the processing required and introduces the
SGML content types and parameters. The ensuing section
provides the formal definition of those content types. That
is followed by a brief discussion of the use of the
Content-ID field. The last two sections cover SDIF and
security considerations.

2 A Model for MIME/SGML

Four issues must be addressed for the recipient's user agent
to display a complete SGML document. The various parts must
be specified; file references on the sender's system
must be resolved to references on the receiver's system;
similarly, command references must be resolved. Finally, an
appropriate application, an unpacker, must be in control to
unpack the MIME body parts and present them to the
display software. The controlling application is discussed
first and then the document parts, file references, and
command references.

2.1 Invoking the SGML Parser Application

MIME offers the possibility to add SGML capability to
existing MIME-capable mail user agents. They achieve that
flexibility by invoking an application process based upon
the body part's content-type. That process receives the
body part and displays it. The approach is satisfactory
when each body part is an atomic entity but this is not the
case with compound SGML documents; the display must be
deferred until all the body parts are available to the
application and the internal entity structure is usable. To
accomplish this, a process is interposed between the SGML
application and the MIME user agent. Working on behalf of
the recipient, the process instantiates each body part as a
file on the local system, adjusts the internal file name
references, and provides the appropriate formatting
applications for non-SGML data; this process is called, in
accordance with [ISO-SDIF], the unpacker (the corresponding
process for the sender is the packer). Once the files have
been instantiated the SGML viewer application can be
invoked.

It is natural to think of the various document parts as
elements of a multipart MIME message. Ensuring they are
processed by the unpacker requires a specific multipart
subtype, SGML. A MIME user agent that cannot handle SGML
documents can use multipart/mixed as the default.

The remaining parts of this section describe the
requirements for instantiating a compound SGML document on
the recipient's system. The next subsection discusses the
document parts and the following one considers other data,
included in the document by reference.

2.2 Specifying the Document Parts

Different implementations of SGML parsers use different
methods for storing the SGML declaration, DTD, and document
instance. Consequently, for transfer, they are sent as
separate MIME body parts. The unpacker will find these
parts and store them as the local application requires; the
receiver may substitute the local default for a missing SGML
declaration or DTD.

An alternative solution would have been to require a single
file, the concatenation of the declaration, DTD, and
instance. That would often require transferring much more
data than needed; often only the document instance is
required.

The SGML document parts are specified as parameters of the
Multipart/SGML content-type. Thus a sender includes the
declaration, prolog, and instance as separate body parts,
and indicates, as the parameter value, the file-token
associated with that body part. If neither the SGML
declaration nor the prolog is specified the recipient is free
to apply a local default.

The following parameters are provided for each document or
subdocument instance.

    sgml-parm      := *( ";" sgml-part "=" sgml-part-spec)
                      [ ";" "sgml-version" "=" iso-sgml-spec ]
                      [ ";" "created-with" "=" ref-or-tok ]
                      [ ";" "charset" "=" charset ]

    sgml-part      := "instance" / "prolog"
                      / "dtd" / "fosi" / extension-token

    sgml-part-spec := file-token / sgml-public
                      / extension-token

    file-token     := content-ID

    sgml-public    := <An SGML PUBLIC identifier>

    iso-sgml-spec  := <The identifier of the SGML
                      specification to which the
                      document conforms, e.g.
                      ISO 8879-1986>

Sgml-parts specify the various parts of a complete document.
File-tokens are discussed in the next section; they refer
to body parts whose Content-ID: field matches the
file-token. Sgml-public values are identifiers that represent
well known files or entities; they are also discussed in the
next section.

The two parameters, sgml-version and created-with, are
provided for guidance to user agents. Sgml-version
specifies the particular SGML standard to which the document
conforms. A user agent can use this value to invoke the
application appropriate to that version of the standard. In
the absence of this parameter the receiver will assume the
version to be ISO 8879-1986. The created-with parameter
provides guidance in cases where inter-operability with
respect to SGML may be a problem. In environments where
users maintain several SGML processors, this parameter can
be used to invoke the appropriate implementation; it is
provided solely to accommodate existing environments. The
created-with parameter is likely to be deprecated.

The charset parameter specifies the body part character set.
If not specified, the default is us-ascii.

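The parameter rules of this section can be illustrated with a short
Python sketch. It is not part of the specification: the function name
parse_sgml_params and the dictionary it returns are assumptions made
for illustration, and the parser handles only the simple name=value
and name="quoted value" forms (it does not process backslash escapes).
The two defaults are the ones stated in the text.

```python
import re

# Defaults stated in the text: ISO 8879-1986 and us-ascii.
DEFAULTS = {"sgml-version": "ISO 8879-1986", "charset": "us-ascii"}

# One parameter:  ; name = token   or   ; name = "quoted string"
_PARAM = re.compile(r';\s*([^\s=;]+)\s*=\s*("(?:[^"\\]|\\.)*"|[^;\s]+)')


def parse_sgml_params(content_type):
    """Split a Content-Type value into (media type, parameters)."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    params = dict(DEFAULTS)              # start from the stated defaults
    for name, value in _PARAM.findall(content_type):
        if value.startswith('"'):
            value = value[1:-1]          # drop the surrounding quotes
        params[name.lower()] = value
    return media_type, params


header = ('Multipart/SGML;\n'
          ' instance=9311141653.VHFradio@ryc.pa.nj.us;\n'
          ' prolog="-//XYZ-CORP//SUBDOC RFC2010 100401//EN";\n'
          ' boundary=tiger-lily')
media_type, params = parse_sgml_params(header)
```

A receiver could then look up params["instance"] and params["prolog"]
to decide which body parts to fetch and whether a local default must be
substituted.
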
2.3 Resolving File References

The SGML prolog contains a Document Type Definition (DTD)
which defines document parts (entities) that can be
referenced for inclusion or interpolation. The entities
point to files that can contain SGML coded text, text not to
be interpreted, images, or other data. Within SGML there
are two types of file reference entities, SYSTEM and PUBLIC.
PUBLIC entities specify SGML document parts that are known
to and resolvable by SGML viewers and editors. The SYSTEM
identifiers refer to files in the local environment. In
order for the recipient's SGML application to properly
process the document, the file references must be resolvable
in the recipient's environment. Conceptually, one must
replace each of the sender's file references with a
corresponding reference in the recipient's file system.

There are two issues here. First, the sending user agent
must identify, in an architecturally neutral form, the
sender's file references. Second, the correspondence
between the file identity (name) and MIME body part must be
preserved. This document applies the principle of "sender
makes right" to these issues and requires first, that the
packer convert each file reference into a unique token
containing only US-ASCII characters. The sender also must
ensure that the corresponding body part header contains a
Content-ID: field whose value is the file's token. Thus,
the internal file name, now a token suitable for an 822
header, explicitly appears in the document and as a
Content-ID label of the corresponding MIME body part. When
the receiver stores the body part in the recipient's file
system it can convert the internal file references (tokens)
into valid local references.

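The receiver's half of this exchange can be sketched in Python as
follows. This is an illustration under stated assumptions, not a
prescribed implementation: the helper names local_name and
rebind_references, the rule for deriving a local file name from a
token, and the restriction to double-quoted SYSTEM identifiers are all
choices made for the example.

```python
import os
import re

# Matches the system identifier in:  SYSTEM "token"
_SYSTEM_ID = re.compile(r'(SYSTEM\s+)"([^"]*)"')


def local_name(token, directory):
    """Derive a local path from a file-token (hypothetical naming rule)."""
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", token)
    return os.path.join(directory, safe)


def rebind_references(sgml_text, known_tokens, directory):
    """Replace each known file-token with a path on the recipient's system."""
    def swap(match):
        token = match.group(2)
        if token not in known_tokens:
            return match.group(0)        # leave unknown references untouched
        return '%s"%s"' % (match.group(1), local_name(token, directory))
    return _SYSTEM_ID.sub(swap, sgml_text)
```

Here known_tokens would be the set of Content-ID values found in the
received message, so that only references backed by an actual body part
are rewritten.
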
2.4 Processors for Non-SGML Data

Non-SGML data requires the SGML application to invoke a
processor to format the data for display. The
correspondence between the entity (file name) that contains
the data and the processor is contained in the type field of
the SGML entity declaration. The type corresponds to an
SGML notation declaration. The notation declaration contains
the operating system command string that invokes the
processor. That is, the string in the notation declaration
is an arbitrary command! There are two problems with this
situation: the command may be valid only in the sender's
environment and, even if it is valid in the recipient's,
invoking that command is a security hazard.

Therefore, this document requires that any type used in an
SGML notation be a valid MIME content type (or an extension
token), thus ensuring the recipient's user agent knows how to
invoke the local display software. The recipient's mail
user agent just substitutes a local string for the string in
the corresponding notation declaration.

3 The SGML Subtypes

A complete document may be a single instance in which all
the other document parts are defined by existing standards
or private agreements. It may also be a set of parts
several of which must be included in the MIME message. Two
SGML subtypes are defined, Application/SGML and
Multipart/SGML. Both content types use the same parameters.
The multipart subtype is considered first; it is the general
case. The application subtype is a simplification for the
case where the multipart would contain a single part. It is
also used for SGML subdocument entities, that is, text with
mark-up.

3.1 The Multipart/SGML Subtype

An SGML document can be carried in a MIME message as a
Multipart body of subtype SGML (Content-Type:
Multipart/SGML). The content-type parameters specify each
of the parts of the SGML document. Additional parameters
specify the software that created the document and the
applicable SGML standard.

A complex document contains references to standard parts
(PUBLIC entities) and other files (SYSTEM entities). For
PUBLIC entities no corresponding body part need exist. Each
SYSTEM entity, on the other hand, corresponds to a body part
labelled with a Content-ID: header.

Thus a complete SGML document can appear as the following
MIME message.

    Content-Type: Multipart/SGML;
         instance=9311141653.VHFradio@ryc.pa.nj.us;
         prolog=9310240942.radioDTD@ryc.pa.nj.us;
         boundary=tiger-lily

    --tiger-lily
    Content-type: Application/SGML
    Content-ID: 9311141653.VHFradio@ryc.pa.nj.us

    < ... an SGML instance >

    --tiger-lily
    Content-type: Image/gif
    Content-ID: 9312161426.figure1@ryc.pa.nj.us

    <-- binary GIF data -->

    --tiger-lily
    Content-type: Application/SGML
    Content-ID: 9310240942.radioDTD@ryc.pa.nj.us

    <!DOCTYPE radio
         PUBLIC "-//USA-DOD//DTD DRAFT TEMPLATE 911201//EN"
         [<!ENTITY figure1
              SYSTEM "9312161426.figure1@ryc.pa.nj.us" NDATA gif
              -- a reference to the file "figure1" --
         >]>

    --tiger-lily--

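As an illustration of how an unpacker might locate the parts of such a
message, the following Python sketch parses a cut-down version of the
example with the standard library email package. The helper name
index_parts and the sample body text are assumptions made for the
example; the sketch relies on the package's default (lenient) parsing,
which treats any multipart/* type with a boundary as a multipart and
returns header values as plain strings.

```python
import email

RAW = """\
Content-Type: Multipart/SGML;
 instance=9311141653.VHFradio@ryc.pa.nj.us;
 prolog=9310240942.radioDTD@ryc.pa.nj.us;
 boundary=tiger-lily

--tiger-lily
Content-type: Application/SGML
Content-ID: 9311141653.VHFradio@ryc.pa.nj.us

<radio>an SGML instance</radio>

--tiger-lily
Content-type: Application/SGML
Content-ID: 9310240942.radioDTD@ryc.pa.nj.us

<!DOCTYPE radio PUBLIC "-//USA-DOD//DTD DRAFT TEMPLATE 911201//EN">

--tiger-lily--
"""


def index_parts(message):
    """Map each Content-ID value to the body part that carries it."""
    return {part["Content-ID"].strip(): part
            for part in message.get_payload()
            if part["Content-ID"] is not None}


message = email.message_from_string(RAW)
parts = index_parts(message)

# The content-type parameters name the body parts by their file-tokens.
instance = parts[message.get_param("instance")]
prolog = parts[message.get_param("prolog")]
```

A parameter value that is an SGML PUBLIC identifier rather than a
file-token would not appear in the index, which is the cue to resolve
it locally instead.
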
3.2 The Application/SGML Subtype

Application/SGML is used when transferring a file containing
text and mark-up as part of a Multipart/SGML message or as an
SGML document contained in a single body part.

    application-subtype := ("octet-stream" *stream)
                           / "postscript"
                           / ("sgml" *sgml-parm)
                           / extension-token

The following example shows a MIME message with a
content-type that specifies a prolog.

    Content-Type: application/SGML;
         prolog="-//XYZ-CORP//SUBDOC RFC2010 100401//EN"

    <! ... an SGML instance >

3.3 Character Set Considerations

Consistent with MIME defaults, SGML documents use, by
default, the ASCII character set. For documents not in that
character set, the charset= parameter of the Content-Type:
field specifies the actual character set. Note that the
values of the charset parameter must be registered with
IANA, or be a mutually agreed upon extension-token (i.e.,
charset=X-set).

4 The Content-ID Header Field

The Content-ID: header field, defined in [RFC-MIME], is a
globally unique identifier and provides the link between
file references within the SGML document and the MIME body
parts. It constitutes a unique file name token which
represents the sender's file name to which the body part
corresponds. The process that creates the Multipart/SGML
body parts generates a content-ID value for each file
referenced by the document's DTDs. The packer then uses the
content-ID value to replace the file name in the
corresponding SGML ENTITY statements. Conversion of those
values into valid references in the receiver's file system
occurs upon document storage or display.

It is recommended that the content-ID value include the name
used within the ENTITY statement; this will improve the
ability to associate the sender's data with the receiver's.

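A packer following this recommendation might look like the Python
sketch below. Everything specific in it is an assumption made for
illustration: the function names, the token layout
<timestamp>.<entity name>@<domain> (inferred from the shape of the
content-ID values used in the examples of this document), and the
restriction to double-quoted SYSTEM identifiers. The sketch does not by
itself guarantee global uniqueness; two entities with the same name
packed in the same minute would receive the same token.

```python
import re
import time

# Matches:  <!ENTITY name SYSTEM "file name"
_ENTITY = re.compile(r'(<!ENTITY\s+(\S+)\s+SYSTEM\s+)"([^"]*)"')


def make_content_id(entity_name, domain, when=None):
    """Build a token of the form <timestamp>.<entity name>@<domain>."""
    stamp = time.strftime("%y%m%d%H%M", time.localtime(when))
    return "%s.%s@%s" % (stamp, entity_name, domain)


def tokenize_entities(dtd_text, domain, when=None):
    """Replace SYSTEM file names with content-IDs; return (text, bindings)."""
    bindings = {}                        # content-ID -> sender's file name

    def swap(match):
        prefix, name, file_name = match.groups()
        token = make_content_id(name, domain, when)
        bindings[token] = file_name
        return '%s"%s"' % (prefix, token)

    return _ENTITY.sub(swap, dtd_text), bindings
```

The returned bindings tell the packer which local file to place in the
body part labelled with each Content-ID.
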
5 SDIF [ISO-SDIF] Data Streams

SDIF is an interchange format standard for SGML documents
[ISO-SDIF]. It defines a data stream that may contain
several SGML documents. This section defines a Multipart
subtype "SDIF" for an SDIF data stream that contains one or
more SGML documents. Messages that conform to the SDIF
subtype will conform to [ISO-SDIF].

Briefly, an SDIF data stream is a sequence of SGML documents
and their subdocument and external entities (cf. Appendix
A). These external entities are defined in the DTD and are
referred to via their SGML name elsewhere in the prolog or
in the document instance. Names are not unique across
documents and subdocuments. The scope of a name is the
document or subdocument that defines it and the subdocuments
contained therein, until it is redefined in one of those
subdocuments.

To provide unique identities within the SDIF data stream,
each document and subdocument is recursively examined,
breadth first, for external entities; each entity is
sequentially numbered and placed in the data stream in the
order encountered. Then each SGML document or subdocument
structure in the SDIF data stream lists the number of the
first entity it contains. When encoded in MIME, SDIF
sequence numbers need not be explicitly represented; they
can be recovered by parsing the input. The content-ID
serves the same purpose and the MIME user agent can use it
directly.

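The breadth-first numbering described in the preceding paragraph can be
sketched in Python as below. This follows one reading of that
paragraph only; it is not taken from [ISO-SDIF]. The function name, the
children mapping, and the simplifying assumption that entity names are
unique within the stream (the text notes that in general they are not)
are all hypothetical.

```python
from collections import deque


def number_entities(root, children):
    """Assign sequence numbers to external entities, breadth first.

    `children` maps a document or subdocument name to the external
    entities it declares, in declaration order.
    """
    numbers = {}                         # entity -> sequence number
    first_identifier = {}                # document -> number of its first entity
    queue = deque([root])
    next_number = 1
    while queue:
        doc = queue.popleft()
        for entity in children.get(doc, []):
            if entity in numbers:
                continue                 # already placed in the stream
            numbers[entity] = next_number
            first_identifier.setdefault(doc, next_number)
            next_number += 1
            if entity in children:       # a subdocument: examine it later
                queue.append(entity)
    return numbers, first_identifier
```

For a document "manual" that declares a subdocument "chapter1" and an
image "logo", where "chapter1" in turn declares two figures, the
top-level entities are numbered before the figures.
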
An SDIF data stream is encoded within a MIME message as a
Multipart/SDIF body part. It contains one to three body
parts. The first and third body parts are Multipart/Mixed
and are labelled with a Content-Description field whose
value is related-documents-A and related-documents-B
respectively. The second body part is either a
Multipart/SGML or an Application/SGML body part (mime/sgml,
for convenience). The Multipart/Mixed body parts contain
only mime/sgml body parts.

The Multipart/SDIF content type has a character set
parameter which specifies the character set used for SGML
markup tokens throughout the data stream.

There are five SDIF entity types:

subdocument      These can contain references to external entities as
                 well as marked up text.

text             An external entity containing only marked up text.

data             An external entity containing non-SGML data, images,
                 for example.

public-text      Corresponds to a PUBLIC external reference and contains
                 a NULL message body.

cross-reference  Corresponds to a previously included external entity.
                 This avoids duplicating material previously included.
                 It contains a NULL message body. This document
                 requires the referenced body part to have already
                 appeared to enable the user agent to display the SGML
                 documents as they are encountered.

The subdocument and text SDIF entities become Application/SGML body
parts; data entities are encapsulated as the appropriate MIME content
type, e.g. image/gif. The public-text and cross-reference entities
have null message bodies and are handled as parameters, public and
cross-reference, of an Application/SDIF content type. The syntax is:

    appl-subtype := <existing> / "sdif" sdif-param

    sdif-param   := ";" "public" "="
                    <an SGML PUBLIC identifier>
                    / ";" "cross-reference" "="
                    <a previous MIME body part Content-ID>
                    <-- the enclosing Multipart/SDIF body
                    part is taken as the root (level 1)
                    for numbering body parts -->

SDIF requires the entity name to accompany each entity in the data
stream. When MIME is used to transfer SDIF data streams the entity name
will be the value of the Content-Description field in each body part.

SDIF does not distinguish among the parts of a document entity
(declaration, prolog, and instance). However, when SGML documents are
contained in a Multipart/SDIF message the document parts are sent as
separate body parts; that is, as a Multipart/SGML body part.

Finally, SDIF uses sequential numbers to uniquely identify each entity,
called an entity-identifier in [ISO-SDIF]. To locate the position of
the first external entity, a first-identifier is provided for each
document or sub-document. These are not necessary when using the
methods in this paper but can be derived. Within a Multipart/SDIF
message each body part is numbered sequentially, starting at 1 with the
first Application/SGML body part. Note that the only Multipart body
part that can be present in a Multipart/SDIF message is
Multipart/Alternative. That will resolve into a single body part and
shall be treated as though it were a non-multipart body part. The
subdocument, text and data entities may, in fact, be
Message/External-body body parts. With the numbering described the
unpacker may build a table to translate body parts into SDIF.

When encapsulated with MIME each document can be displayed individually.
The SDIF stream is a sequence of Multipart/SGML body parts.

6 Security

An SGML parser can be directed to invoke a local process, usually to
format or display a graphical image. That capability presents an
opportunity for abuse. Understanding the potential problems requires
understanding two SGML constructs, entity and notation statements,
presented below. Capitalized items are literals, lowercase ones are
tokens, and the special characters are markup escape sequences.

    <!ENTITY name SYSTEM file NDATA type>
    <!NOTATION type SYSTEM qstring>

The document text will refer to name which, in turn, will cause the
application, type, represented by qstring to be invoked. Qstring could
be "delete *.*".

To eliminate potential problems the unpacker should accept only
registered MIME types and subtypes. The unpacker can then replace the
qstring in the NOTATION statement with the appropriate string for the
recipient's environment; such strings are already defined for MIME.
Types in the message that are not defined can be replaced by a
no-operation NOTATION statement.

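The replacement step can be sketched in Python as follows. The table
LOCAL_VIEWERS, the command names in it, the use of an empty string as
the no-operation value, and the function name are all hypothetical;
the sketch also assumes double-quoted system identifiers and notation
names written as MIME types, as this document requires.

```python
import re

# Hypothetical local table: MIME type -> display command on this host.
LOCAL_VIEWERS = {"image/gif": "viewgif", "application/postscript": "viewps"}
NO_OP = ""                               # stands in for "do nothing"

# Matches:  <!NOTATION type SYSTEM "qstring">
_NOTATION = re.compile(r'<!NOTATION\s+(\S+)\s+SYSTEM\s+"[^"]*"\s*>')


def sanitize_notations(dtd_text, viewers=LOCAL_VIEWERS):
    """Discard every sender-supplied command string in NOTATION statements."""
    def swap(match):
        mime_type = match.group(1)
        command = viewers.get(mime_type.lower(), NO_OP)
        return '<!NOTATION %s SYSTEM "%s">' % (mime_type, command)
    return _NOTATION.sub(swap, dtd_text)
```

The sender's qstring is never inspected or executed; it is dropped
unconditionally and a locally chosen string takes its place.
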
SGML also provides for sending non-interpreted data to the display
device or typesetter. The security hazard presented is similar to that
posed by the use of PostScript. Greater threats may be posed by more
"powerful" display systems and typesetters. Unauthorized access to the
recipient's system and resources may be possible.

7 References

[ISO-SGML]  ISO 8879:1986, Information processing -- Text and office
            systems -- Standard Generalized Markup Language (SGML).

[ISO-SDIF]  ISO 9069:1988, Information processing -- SGML support
            facilities -- SGML Document Interchange Format (SDIF).

[RFC-822]   Crocker, D., Standard for the Format of ARPA Internet Text
            Messages, August 1982, University of Delaware, RFC 822.

[RFC-HDRC]  Moore, K., Representation of Non-ASCII Text in Internet
            Message Headers, June 1992, RFC 1342.

[RFC-MIME]  Borenstein, N. and Freed, N., MIME (Multipurpose Internet
            Mail Extensions): Mechanisms for Specifying and Describing
            the Format of Internet Message Bodies, June 1992, RFC 1341.

[US-ASCII]  Coded Character Set -- 7-Bit American Standard Code for
            Information Interchange, ANSI X3.4-1986.

8 Acknowledgements

The author acknowledges Harald T. Alvestrand, UniNett, Andy Gelsey, CSC,
Nathaniel Borenstein, First Virtual Holdings Incorporated, Einar
Stefferud, Network Management Associates, Inc, John Klensin, MCI, and
Erik Naggum, for their suggestions, explanations, and encouragement. No
errors or faults in this document can be ascribed to them; those are
mine.

UNIX is a registered trademark of UNIX System Laboratories, Inc.

9 Author's Address

Ed Levinson
elevinson@accurate.com
Accurate Information Systems, Inc.
2 Industrial Way
Eatontown, NJ 0772

751 |

Appendix A.  SGML for IETFers

This appendix describes the elements of the Standard Generalized Markup
Language (SGML) that are key to understanding the relationship between
SGML and the Multipurpose Internet Mail Extensions (MIME).  For the
purposes of this discussion, and without doing too much damage to the
SGML specification, an SGML document contains text, markup, and
references to non-text document elements (graphics).  For a complete
and accurate description see ISO 8879, Information processing -- Text
and office systems -- Standard Generalized Markup Language (SGML)
[ISO-SGML].

An SGML document is processed by an application called an SGML parser
and has the following structure (the parenthesized numbers refer to
productions in ISO 8879).  Note that Internet-style ABNF is used for
notation here; SGML uses a different style.

sgml-doc     ::= sgml-decl prolog doc-inst                    (2)
sgml-sub-doc ::= dtd doc-inst                                 (3)

Sgml-decl defines the various elements and parameters of SGML, for
example the characters that introduce and end markup tags ("<" and ">"
respectively will be used here) and the maximum length of markup tags.

The prolog defines the document structure, usually through an SGML
construct called the document type definition (DTD).  Most importantly
for interchange considerations, the DTD contains references to external
files, system commands, and text to be sent directly to a typesetter or
printer.

Doc-inst is the actual document instance or text; it also includes
graphic elements and other text, with or without markup, by reference
to DTD elements.

The remainder of this discussion focuses on two elements which a DTD
references, entities and notations.  They appear in the DTD and have
the following format.

entity   ::= "<!" "ENTITY" name e-text ">"                    (101)
e-text   ::= q-string | data | b-text | external              (105)
data     ::= ( "CDATA" | "SDATA" | "PI" ) q-string            (106)
external ::= ext-id
             [ ( "SUBDOC" | ( "NDATA" type ) ) ]              (108)
ext-id   ::= ( "SYSTEM" q-string )
           | ( "PUBLIC" pub-id [ q-string ] )                 (73)
notation ::= "<!" "NOTATION" type ext-id ">"                  (148)

where name is a character string and the definition of b-text is left
to ISO 8879; for convenience q-string has been substituted for the SGML
term parameter literal.  Entities referred to via the SUBDOC keyword
differ from SGML documents in that they contain only a DTD and a
doc-inst.

Using the above productions, the following simple example entities
demonstrate the important issues.  Name and type are alphanumeric
tokens and q-string is a series of characters enclosed in double (or
single) quote marks.

<!ENTITY name PUBLIC pname>                 (A)
<!ENTITY name SYSTEM fname>                 (B)
<!ENTITY name SYSTEM fname NDATA type>      (C)
<!NOTATION type SYSTEM command>             (D)
<!ENTITY name PI q-string>                  (E)

Form A refers to a well-known or "public" name that the SGML parser is
able to resolve; in the marked-up text there will be a markup item
<name> that directs the parser to include the corresponding public
file.  Similarly, form B corresponds to a locally known file.  Form C
allows the marked-up text to refer to non-SGML data, an image for
example; its type parameter must match the type of a NOTATION element
(form D).  The matching element's command parameter specifies the
command which processes the file fname.  Finally, form E, a processing
instruction, specifies a string of characters to be sent directly to
the output device.
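
As an illustration only, the five example forms can be told apart
mechanically.  The following Python sketch follows the reduced grammar
of this appendix, not the full ISO 8879 syntax; the function name
classify and the sample identifiers are assumptions made for the
example, and keywords are matched in upper case only.

```python
import re

# Simplified patterns for example forms (A)-(E) of this appendix.
# They mirror the appendix's reduced grammar, not full ISO 8879 syntax.
_QSTR = r"""(?:"[^"]*"|'[^']*')"""      # q-string: double or single quoted
_NAME = r"[A-Za-z][A-Za-z0-9.-]*"       # alphanumeric token (assumed shape)

_FORMS = [
    ("A", re.compile(rf"<!ENTITY\s+({_NAME})\s+PUBLIC\s+({_QSTR})\s*>")),
    ("B", re.compile(rf"<!ENTITY\s+({_NAME})\s+SYSTEM\s+({_QSTR})\s*>")),
    ("C", re.compile(
        rf"<!ENTITY\s+({_NAME})\s+SYSTEM\s+({_QSTR})\s+NDATA\s+({_NAME})\s*>")),
    ("D", re.compile(rf"<!NOTATION\s+({_NAME})\s+SYSTEM\s+({_QSTR})\s*>")),
    ("E", re.compile(rf"<!ENTITY\s+({_NAME})\s+PI\s+({_QSTR})\s*>")),
]


def classify(declaration):
    """Return (form_letter, fields), or None if no example form matches."""
    text = declaration.strip()
    for letter, pattern in _FORMS:
        match = pattern.fullmatch(text)
        if match:
            # Remove the surrounding quote marks from q-string fields.
            fields = tuple(
                g[1:-1] if g[0] in "\"'" else g for g in match.groups()
            )
            return letter, fields
    return None
```

For instance, classify('<!ENTITY logo SYSTEM "logo.tif" NDATA tiff>')
yields form C with the fields name, fname, and type, which is the
information a transfer agent needs before it can consider the issues
each form raises.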

These examples give rise to the following issues when the document is
transferred from one environment to another.

A   Is the public name known to the recipient?  The recipient SGML
    parser may not know of the public file, and this will be discovered
    when it processes the document.

B   What is the file name on the recipient system?  There must be some
    process which binds the sender's file names to the recipient's.

C   See B and D.

D   Direct use of the NOTATION form is a large security risk, an
    invitation to a Trojan Horse attack.  The recipient must be
    protected from a sender invoking an arbitrary command on the
    recipient system.

E   Processing instructions permit the sender to manipulate the
    recipient's output device.  This is the same risk that exists for
    PostScript documents and is not addressed.

Issues A through D are addressed in this document.
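
Issue D can be made concrete with a small sketch.  One conventional
mitigation, shown here in Python and not necessarily the mechanism
this document defines, is for the recipient to select a handler from a
locally configured table keyed by notation type and never to execute
the command string supplied by the sender.  The table contents, paths,
and function name are hypothetical.

```python
# Hypothetical recipient-side table: notation type -> locally trusted
# handler command.  The entries are illustrative; a real site would
# configure its own.
TRUSTED_HANDLERS = {
    "tiff": ["/usr/local/bin/view-tiff"],
    "eps": ["/usr/local/bin/view-eps"],
}


def resolve_handler(notation_type, sender_command):
    """Pick a local handler for notation_type, ignoring sender_command.

    sender_command is the SYSTEM string from the sender's NOTATION
    declaration.  It is accepted only so a caller can log it; it is
    never executed or returned.
    """
    handler = TRUSTED_HANDLERS.get(notation_type.lower())
    if handler is None:
        raise LookupError(
            f"no trusted handler for notation {notation_type!r}"
        )
    return list(handler)
```

With this arrangement a declaration such as
<!NOTATION tiff SYSTEM "rm -rf /"> resolves to the local TIFF viewer,
and an unknown notation type is refused rather than run.
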

Appendix B.  Content-Type registrations

_________________________________

The Application/SGML Content-Type

(1)  MIME type name: Application

(2)  MIME subtype name: SGML

(3)  Required parameters: none

(4)  Optional parameters: declaration, prolog, instance, fosi, charset,
     SGML-version, created-with

(5)  Encoding considerations: may be encoded

(6)  Security considerations: none

(7)  Specification:

     This subtype is used for text marked up with the Standard
     Generalized Markup Language [ISO-SGML].  Body parts of this
     subtype within a Multipart/SGML body part will contain a
     Content-ID: field.
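
A minimal sketch of producing such a body part with the Python
standard library email package follows.  The helper name
make_sgml_part, the example.invalid domain, and the use of the charset
parameter are assumptions for illustration; the default base64
transfer encoding is simply what MIMEApplication applies, which is
compatible with "may be encoded" above.

```python
from email.mime.application import MIMEApplication
from email.utils import make_msgid


def make_sgml_part(sgml_text, charset="us-ascii"):
    """Wrap SGML text in an Application/SGML part with a Content-ID."""
    part = MIMEApplication(
        sgml_text.encode(charset),   # payload is base64-encoded by default
        _subtype="SGML",
        charset=charset,             # illustrative optional parameter
    )
    # A fixed placeholder domain avoids a host name lookup.
    part["Content-ID"] = make_msgid(domain="example.invalid")
    return part
```

Calling make_sgml_part("<memo>hi</memo>").as_string() then produces a
part whose Content-Type field names Application/SGML and which carries
a Content-ID: field, as the specification above calls for.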

_________________________________

The Application/SDIF Content-Type

(1)  MIME type name: Application

(2)  MIME subtype name: SDIF

(3)  Required parameters: one of public or cross-reference

(4)  Optional parameters: none

(5)  Encoding considerations: none

(6)  Security considerations:

(7)  Specification:

     This subtype contains a NULL or empty message body.  The value of
     the public parameter is an SGML PUBLIC entity identifier.  The
     value of cross-reference is the body part identifier of a
     previously occurring body part.
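
The "one of public or cross-reference" rule can be sketched as
follows, again in Python with the standard email package.  The
function name and the sample identifier values are hypothetical, and
the parameter values are treated as opaque strings because their exact
syntax is defined elsewhere.

```python
from email.message import Message


def make_sdif_reference(public=None, cross_reference=None):
    """Build an empty Application/SDIF part with exactly one parameter."""
    if (public is None) == (cross_reference is None):
        raise ValueError("give exactly one of public or cross_reference")
    if public is not None:
        name, value = "public", public
    else:
        name, value = "cross-reference", cross_reference
    part = Message()
    part.add_header("Content-Type", "Application/SDIF", **{name: value})
    part.set_payload("")   # the registration calls for an empty body
    return part
```

For example, make_sdif_reference(public="-//Example//DTD Memo//EN")
yields a part with no body whose Content-Type field carries the public
parameter; supplying both arguments, or neither, is rejected.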

_________________________________

The Multipart/SGML Content-Type

(1)  MIME type name: Multipart

(2)  MIME subtype name: SGML

(3)  Required parameters: boundary

(4)  Optional parameters: declaration, prolog, instance, fosi, charset,
     SGML-version, created-with

(5)  Encoding considerations: none

(6)  Security considerations: none

(7)  Specification:

     As specified in this document.
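
As a rough sketch, assuming the Python standard email package, a
Multipart/SGML entity could be assembled from Application/SGML body
parts, each labelled with a Content-ID: field.  The helper name and
the sample identifiers are hypothetical, and the optional parameters
are omitted because their values are defined in the body of this
document.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart


def make_multipart_sgml(parts):
    """Assemble a Multipart/SGML entity.

    parts is an iterable of (content_id, sgml_bytes) pairs, given in
    the order the body parts should appear.
    """
    outer = MIMEMultipart(_subtype="SGML")
    for content_id, data in parts:
        body = MIMEApplication(data, _subtype="SGML")
        body["Content-ID"] = f"<{content_id}>"
        outer.attach(body)
    return outer
```

The required boundary parameter is not chosen by the caller here; the
library generates one when the entity is serialized with as_string().
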

_________________________________

The Multipart/SDIF Content-Type

(1)  MIME type name: Multipart

(2)  MIME subtype name: SDIF

(3)  Required parameters: boundary

(4)  Optional parameters: charset

(5)  Encoding considerations: none

(6)  Security considerations: none

(7)  Specification:

     As specified in this document.