Network Working Group                                    E. Levinson
Internet Draft: MIME/SGML                       Accurate Information
<draft-levinson-sgml-02.txt>                           Systems, Inc.
                                                    November 1, 1994

                MIME Content-types for SGML Documents

This draft document is being circulated for comment. Please send
your comments to the author or to the ietf-822 mailing list
<ietf-822@dimacs.rutgers.edu>. If consensus is reached, this
document may be submitted to the RFC Editor as a Proposed Standard
protocol specification for use with MIME.

Status of this Memo

This document is an Internet Draft. Internet Drafts are working
documents of the Internet Engineering Task Force (IETF), its Areas,
and its Working Groups. Note that other groups may also distribute
working documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six
months. They may be updated, replaced, or obsoleted by other
documents at any time. It is not appropriate to use Internet Drafts
as reference material or to cite them other than as a "working
draft" or "work in progress".

Please check the abstract listing in each Internet Draft directory
for the current status of this or any other Internet Draft.

Abstract

This document specifies how a document written using the Standard
Generalized Markup Language (SGML) is to be carried within a MIME
message. MIME provides a flexible mechanism for structuring RFC 822
message bodies. Additional mechanisms are required to encapsulate
SGML documents, which can consist of multiple parts. These
mechanisms specify how to label the SGML parts and how to represent
the internal references between the files. In addition, this
document specifies the requirements for conformance to the SGML
Document Interchange Format (SDIF). Subtypes are proposed for the
Multipart and Application content types to support SGML documents
and their interchange format within MIME.

An SGML document is one of a class of compound documents, which
consist of a set of parts or files, some of which may contain
references to other files in the set. Explicit indications of the
bindings between the sender's file names and the MIME body parts are
needed to re-bind the sender's file names to ones on the recipient's
system. Further, the document may refer to host system commands, a
security hazard which must be controlled.

E. Levinson               Expires May 1, 1995                [Page 1]

Internet Draft          MIME Types for SGML Documents

0 Changes since last issued

Changed DTD to prolog in text and definition.

Increased precision when referring to SGML language elements.

Various changes to clarify meaning.

1 Introduction

The Multipurpose Internet Mail Extensions (MIME) specification
[RFC-MIME] does not provide for objects (content-types) that consist
of many interrelated body parts when arbitrary relationships exist
between the object's body parts. This document provides structuring
mechanisms for encapsulating objects or documents encoded with the
Standard Generalized Markup Language (SGML) [ISO-SGML].

MIME-based mail user agents can be configured to display (and
compose) many different message body content-types. The user agents
invoke applications that correspond to each particular content-type.
Existing MIME content-types consist either of a single body part or
of mechanisms to convey a collection of body parts. Recently, MIME
subtypes have been registered that consist of a few related body
parts, e.g., Apple Macintosh file objects. In each case the body
part is an independent and atomic unit. That is not the case with
document objects: such documents consist of many body parts that
refer to one another, and displaying any one of them independently
of the others has no meaning. Consequently, the entire set of body
parts must be treated as a single unit, a capability not present in
MIME today.

The general mechanism used here may apply to other markup languages,
e.g., troff and TeX. The issues of internal structure, security, and
control must be addressed separately for each markup language.

1.1 SGML

SGML is used in several communities to encode document structure and
layout. A rigorous description of SGML is left to [ISO-SGML].
Appendix A of this document, which is unbelievably brief, contains a
description of the SGML elements relevant to MIME encapsulation. The
terms used in the present document attempt to be consistent with
SGML terminology and usage.

A complete SGML document consists of an SGML declaration, a prolog,
and a document instance. The prolog contains the document type
definition (DTD), a description of the class of documents to which
the document instance belongs. Subdocuments are defined by their own
prolog (and DTD). The prolog may, recursively, refer to subdocuments,
each of which consists of a prolog and an instance. The applications
that process SGML documents may require each part of a document or
subdocument (declaration, prolog, or instance) to be an individual
file, or may require all of the parts in a single file.

For a person or application to receive and display a complete SGML
document, the mail message must carry a precise definition of each
of the SGML document parts. In the sender's environment the document
parts may reference standard definitions or specific local files.
Further, a DTD may reference other files, for example images and
graphics. The identity of the document parts and the content of each
file must be available to enable the recipient to transform the
sender's file name references into equivalent local references and
to instantiate the files locally.

1.2 SGML Document Interchange Format (SDIF)

The SGML Document Interchange Format (SDIF) [ISO-SDIF] specifies the
structure of a data stream which contains one or more SGML documents.
SDIF is focused on transferring documents between sites and does not
require that the documents be displayed as they are encountered.
Users of mail-based systems, however, expect each mail item in a
multipart message to be displayed -- more precisely, ready for
display -- when it is encountered. This document shows how to meet
both the SDIF and the display requirements.

1.3 Organization of this Memorandum

First a model for transferring SGML documents is presented; the model
defines the processing required and introduces the SGML content types
and parameters. The ensuing section provides the formal definition of
those content types. That is followed by a brief discussion of the
use of the Content-ID field. The last two sections cover SDIF and
security considerations.

2 A Model for MIME/SGML

Four issues must be addressed for the recipient's user agent to
display a complete SGML document. First, the various parts must be
specified. Second, file references on the sender's system must be
resolved to references on the recipient's system. Third, command
references must similarly be resolved. Finally, an appropriate
application, an unpacker, must be in control to unpack the MIME body
parts and present them to the display software. The controlling
application is discussed first, and then the document parts, file
references, and command references.

2.1 Invoking the SGML Parser Application

MIME offers the possibility of adding SGML capability to existing
MIME-capable mail user agents. Such user agents achieve their
flexibility by invoking an application process based upon the body
part's content-type; that process receives the body part and
displays it. The approach is satisfactory when each body part is an
atomic entity, but this is not the case with compound SGML
documents: the display must be deferred until all the body parts are
available to the application and the internal entity structure is
usable. To accomplish this, a process is interposed between the SGML
application and the MIME user agent. Working on behalf of the
recipient, the process instantiates each body part as a file on the
local system, adjusts the internal file name references, and
provides the appropriate formatting applications for non-SGML data.
In accordance with [ISO-SDIF], this process is called the unpacker
(the corresponding process for the sender is the packer). Once the
files have been instantiated, the SGML viewer application can be
invoked.

It is natural to think of the various document parts as elements of
a multipart MIME message. Ensuring that they are processed by the
unpacker requires a specific multipart subtype, SGML. A MIME user
agent that cannot handle SGML documents can treat the message as
multipart/mixed, the default.

The remaining parts of this section describe the requirements for
instantiating a compound SGML document on the recipient's system.
The next subsection discusses the document parts, and the following
ones consider file references and other data included in the
document by reference.

2.2 Specifying the Document Parts

Different implementations of SGML parsers use different methods for
storing the SGML declaration, DTD, and document instance.
Consequently, for transfer, these are sent as separate MIME body
parts. The unpacker will find these parts and store them as the
local application requires; the recipient may substitute the local
default for a missing SGML declaration or DTD.

An alternate solution would have been to require a single file, the
concatenation of the declaration, DTD, and instance. That would
often require transferring much more data than needed; often only
the document instance is required.

The SGML document parts are specified as parameters of the
Multipart/SGML content-type. Thus a sender includes the declaration,
prolog, and instance as separate body parts and indicates, as the
parameter value, the file-token associated with each body part. If
neither the SGML declaration nor the prolog is specified, the
recipient is free to apply a local default.

The following parameters are provided for each document or
subdocument instance.

    sgml-parm      := *( ";" sgml-part "=" sgml-part-spec )
                      [ ";" "sgml-version" "=" iso-sgml-spec ]
                      [ ";" "created-with" "=" ref-or-tok ]
                      [ ";" "charset" "=" charset ]

    sgml-part      := "instance" / "prolog"
                    / "dtd" / "fosi" / extension-token

    sgml-part-spec := file-token / sgml-public
                    / extension-token

    file-token     := content-ID

    sgml-public    := <An SGML PUBLIC identifier>

    iso-sgml-spec  := <The identifier of the SGML specification
                       to which the document conforms, e.g.,
                       ISO 8879-1986>

Sgml-parts specify the various parts of a complete document.
File-tokens are discussed in the next section; each refers to the
body part whose Content-ID: field matches the file-token.
Sgml-public values are identifiers that represent well-known files
or entities; they are also discussed in the next section.

The two parameters sgml-version and created-with are provided for
guidance to user agents. Sgml-version specifies the particular SGML
standard to which the document conforms. A user agent can use this
value to invoke the application appropriate to that version of the
standard. In the absence of this parameter the recipient will
assume the version to be ISO 8879-1986. The created-with parameter
provides guidance in cases where interoperability with respect to
SGML may be a problem. In environments where users maintain several
SGML processors, this parameter can be used to invoke the
appropriate implementation; it is provided solely to accommodate
existing environments. The created-with parameter is likely to be
deprecated.

The charset parameter specifies the body part character set. If it
is not specified, the default is us-ascii.
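As an illustration only, the following Python sketch shows how a receiving user agent might read these parameters from a Content-Type field value and apply the defaults stated above: ISO 8879-1986 for sgml-version and us-ascii for charset. It relies on the standard library class email.message.Message. The function name read_sgml_params is hypothetical, and extension tokens are simply ignored.

```python
from email.message import Message

# Defaults stated in Section 2.2 of this memo.
DEFAULT_SGML_VERSION = "ISO 8879-1986"
DEFAULT_CHARSET = "us-ascii"

# The sgml-part names from the grammar; extension tokens are ignored here.
SGML_PARTS = ("instance", "prolog", "dtd", "fosi")


def read_sgml_params(content_type: str) -> dict:
    """Split a Multipart/SGML or Application/SGML field value into parts.

    read_sgml_params is a hypothetical helper, not part of any library.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_params() returns (name, value) pairs with quotes removed and
    # names lowercased; the first pair is the type/subtype itself.
    params = dict(msg.get_params()[1:])
    return {
        "type": msg.get_content_type(),
        "parts": {name: params[name] for name in SGML_PARTS if name in params},
        "sgml-version": params.get("sgml-version", DEFAULT_SGML_VERSION),
        "created-with": params.get("created-with"),
        "charset": params.get("charset", DEFAULT_CHARSET),
    }


if __name__ == "__main__":
    print(read_sgml_params(
        'Multipart/SGML; instance="9311141653.VHFradio@ryc.pa.nj.us"; '
        'prolog="9310240942.radioDTD@ryc.pa.nj.us"; boundary=tiger-lily'
    ))
```

A real unpacker would go on to resolve each entry in "parts" either to a body part (file-token) or to a locally known PUBLIC identifier.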

2.3 Resolving File References

The SGML prolog contains a Document Type Definition (DTD), which
defines document parts (entities) that can be referenced for
inclusion or interpolation. The entities point to files that can
contain SGML coded text, text not to be interpreted, images, or
other data. Within SGML there are two types of file reference
entities, SYSTEM and PUBLIC. PUBLIC entities specify SGML document
parts that are known to and resolvable by SGML viewers and editors.
SYSTEM identifiers refer to files in the local environment. For the
recipient's SGML application to process the document properly, the
file references must be resolvable in the recipient's environment.
Conceptually, one must replace each of the sender's file references
with a corresponding reference in the recipient's file system.

There are two issues here. First, the sending user agent must
identify the sender's file references in an architecturally neutral
form. Second, the correspondence between the file identity (name)
and the MIME body part must be preserved. This document applies the
principle of "sender makes right" to these issues. It requires,
first, that the packer convert every file reference into a unique
token containing only US-ASCII characters, and second, that the
sender ensure that the corresponding body part header contains a
Content-ID: field whose value is the file's token. Thus the internal
file name, now a token suitable for an RFC 822 header, appears
explicitly both in the document and as the Content-ID label of the
corresponding MIME body part. When the recipient stores the body
part in its file system, it can convert the internal file references
(tokens) into valid local references.
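The rebinding step on the recipient's side can be sketched in Python as follows. This is a minimal sketch that assumes three things. First, the body parts have already been decoded into a mapping from Content-ID value (the file token) to bytes. Second, the local file naming scheme is invented purely for illustration. Third, only quoted SYSTEM identifiers are recognized. The helper name instantiate_parts is hypothetical.

```python
import re
import tempfile
from pathlib import Path

# A SYSTEM identifier written as a quoted literal: SYSTEM "x" or SYSTEM 'x'.
SYSTEM_ID = re.compile(r"""(SYSTEM\s+)(["'])(.*?)\2""")


def instantiate_parts(parts: dict, dtd: str, directory: Path) -> str:
    """Store each body part as a local file and rebind SYSTEM tokens.

    parts maps a Content-ID value (the sender's file token) to the body
    part's decoded bytes. Returns the DTD text with every known token
    replaced by the local path; unknown tokens are left unchanged.
    """
    local_names = {}
    for index, (token, data) in enumerate(parts.items()):
        # Hypothetical naming scheme: sequence number plus the portion of
        # the token before "@", with unsafe characters replaced by "_".
        stem = re.sub(r"[^A-Za-z0-9._-]", "_", token.split("@", 1)[0])
        path = directory / f"{index:03d}-{stem}"
        path.write_bytes(data)
        local_names[token] = str(path)

    def rebind(match):
        keyword, quote, token = match.groups()
        return f"{keyword}{quote}{local_names.get(token, token)}{quote}"

    return SYSTEM_ID.sub(rebind, dtd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        token = "9312161426.figure1@ryc.pa.nj.us"
        dtd = f'<!ENTITY figure1 SYSTEM "{token}" NDATA gif>'
        print(instantiate_parts({token: b"GIF89a"}, dtd, Path(tmp)))
```

Tokens that do not name a body part are left alone. Command strings in NOTATION declarations, for example, also follow the SYSTEM keyword; they are dealt with separately.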

2.4 Processors for Non-SGML Data

Non-SGML data requires the SGML application to invoke a processor to
format the data for display. The correspondence between the entity
(file name) that contains the data and the processor is given by the
type field of the SGML entity declaration. The type corresponds to
an SGML notation declaration, and the notation declaration contains
the operating system command string that invokes the processor. That
is, the string in the notation declaration is an arbitrary command!
There are two problems with this situation: the command may be valid
only in the sender's environment, and even if it is valid in the
recipient's, invoking that command is a security hazard.

Therefore, this document requires that any type used in an SGML
notation be a valid MIME content type (or an extension token), thus
ensuring that the recipient's user agent knows how to invoke the
local display software. The recipient's mail user agent simply
substitutes a local string for the string in the corresponding
notation declaration.
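The substitution described above might look like the following Python sketch. It assumes, as this section requires, that each notation name is spelled as a MIME content type (for example image/gif). It also assumes that the SGML declaration in use permits such names. The handler table, its command strings, and the no-operation string are hypothetical placeholders for the recipient's own configuration. Unknown types receive the no-operation string, in line with the Security section.

```python
import re

# Hypothetical local table: MIME type -> command that displays that type on
# the recipient's system. A real user agent would take this from its own
# configuration, never from the incoming message.
LOCAL_HANDLERS = {
    "image/gif": "local-gif-viewer %s",
    "application/postscript": "local-ps-viewer %s",
}

# Hypothetical command string meaning "do nothing".
NO_OP = "no-op"

# <!NOTATION type SYSTEM "qstring">: the type is captured and the sender's
# command string (qstring) is discarded.
NOTATION = re.compile(r"""<!NOTATION\s+(\S+)\s+SYSTEM\s+(["']).*?\2\s*>""",
                      re.IGNORECASE)


def rewrite_notations(dtd: str, handlers: dict = LOCAL_HANDLERS) -> str:
    """Replace every notation's command string with the recipient's own."""
    def local(match):
        notation_type = match.group(1)
        command = handlers.get(notation_type.lower(), NO_OP)
        return f'<!NOTATION {notation_type} SYSTEM "{command}">'

    return NOTATION.sub(local, dtd)


if __name__ == "__main__":
    print(rewrite_notations('<!NOTATION image/gif SYSTEM "delete *.*">'))
```

The sender's command string is never consulted. Only the type name survives the rewrite.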

3 The SGML Subtypes

A complete document may be a single instance in which all the other
document parts are defined by existing standards or private
agreements. It may also be a set of parts, several of which must be
included in the MIME message. Two SGML subtypes are defined,
Application/SGML and Multipart/SGML. Both content types use the same
parameters. The multipart subtype is considered first; it is the
general case. The application subtype is a simplification for the
case where the multipart would contain a single part. It is also
used for SGML subdocument entities, that is, text with markup.

3.1 The Multipart/SGML Subtype

An SGML document can be carried in a MIME message as a Multipart
body of subtype SGML (Content-Type: Multipart/SGML). The
content-type parameters specify each of the parts of the SGML
document. Additional parameters specify the software that created
the document and the applicable SGML standard.

A complex document contains references to standard parts (PUBLIC
entities) and other files (SYSTEM entities). For PUBLIC entities no
corresponding body part need exist. Each SYSTEM entity, on the other
hand, corresponds to a body part labelled with a Content-ID: header.

Thus a complete SGML document can appear as the following MIME
message.

    Content-Type: Multipart/SGML;
        instance="9311141653.VHFradio@ryc.pa.nj.us";
        prolog="9310240942.radioDTD@ryc.pa.nj.us";
        boundary=tiger-lily

    --tiger-lily
    Content-type: Application/SGML
    Content-ID: 9311141653.VHFradio@ryc.pa.nj.us

    < ... an SGML instance >

    --tiger-lily
    Content-type: Image/gif
    Content-ID: 9312161426.figure1@ryc.pa.nj.us

    <-- binary GIF data -->
    --tiger-lily
    Content-type: Application/SGML
    Content-ID: 9310240942.radioDTD@ryc.pa.nj.us

    <!DOCTYPE radio
        PUBLIC "-//USA-DOD//DTD DRAFT TEMPLATE 911201//EN"
        [<!ENTITY figure1
            SYSTEM "9312161426.figure1@ryc.pa.nj.us" NDATA gif
            -- a reference to the file "figure1" --
        >]>

    --tiger-lily--
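To make the layout concrete, here is a minimal Python sketch that assembles a message shaped like the example above, using the standard email package. The GIF bytes and the instance text are placeholders, and the helper names sgml_part and build_example are hypothetical. The email package quotes parameter values that contain "@", because "@" is a MIME special character. That is why the instance and prolog values appear in quotes.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.nonmultipart import MIMENonMultipart

INSTANCE_ID = "9311141653.VHFradio@ryc.pa.nj.us"
PROLOG_ID = "9310240942.radioDTD@ryc.pa.nj.us"
FIGURE_ID = "9312161426.figure1@ryc.pa.nj.us"


def sgml_part(text: str, content_id: str) -> MIMENonMultipart:
    """An Application/SGML body part labelled with its file token."""
    part = MIMENonMultipart("application", "sgml")
    part.set_payload(text)
    part["Content-ID"] = content_id
    return part


def build_example() -> MIMEMultipart:
    """Assemble a Multipart/SGML message shaped like the example above."""
    prolog = (
        '<!DOCTYPE radio PUBLIC "-//USA-DOD//DTD DRAFT TEMPLATE 911201//EN"\n'
        f'  [<!ENTITY figure1 SYSTEM "{FIGURE_ID}" NDATA gif>]>\n'
    )
    # instance= and prolog= become Content-Type parameters; boundary is
    # added by MIMEMultipart itself.
    message = MIMEMultipart("sgml", boundary="tiger-lily",
                            instance=INSTANCE_ID, prolog=PROLOG_ID)
    message.attach(sgml_part("<radio> ... an SGML instance ... </radio>\n",
                             INSTANCE_ID))
    figure = MIMEImage(b"GIF89a (placeholder, not a real image)",
                       _subtype="gif")
    figure["Content-ID"] = FIGURE_ID
    message.attach(figure)
    message.attach(sgml_part(prolog, PROLOG_ID))
    return message


if __name__ == "__main__":
    print(build_example().as_string())
```

The body part order mirrors the example. Nothing in the sketch depends on it, because the Content-Type parameters locate the instance and prolog by Content-ID.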

3.2 The Application/SGML Subtype

Application/SGML is used when transferring a file containing text
and markup, either as part of a Multipart/SGML message or as an SGML
document contained in a single body part.

    application-subtype := ( "octet-stream" *stream )
                         / "postscript"
                         / ( "sgml" *sgml-parm )
                         / extension-token

The following example shows a MIME message with a content-type that
specifies a prolog.

    Content-Type: application/SGML;
        prolog="-//XYZ-CORP//SUBDOC RFC2010 100401//EN"

    < ... an SGML instance >
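A sketch of producing this single-part form with Python's email package follows. The instance text is a placeholder and single_part_sgml is a hypothetical helper. The prolog is named by a PUBLIC identifier, so no further body part has to travel with the instance. The package quotes the parameter value because it contains "/" and spaces.

```python
from email.mime.nonmultipart import MIMENonMultipart

PROLOG = "-//XYZ-CORP//SUBDOC RFC2010 100401//EN"


def single_part_sgml(instance: str, prolog: str = PROLOG) -> MIMENonMultipart:
    """Wrap an SGML instance whose prolog is named by a PUBLIC identifier.

    No other body part is needed, so Application/SGML suffices.
    """
    part = MIMENonMultipart("application", "sgml", prolog=prolog)
    part.set_payload(instance)
    return part


if __name__ == "__main__":
    print(single_part_sgml("<memo> ... an SGML instance ... </memo>\n").as_string())
```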

3.3 Character Set Considerations

Consistent with MIME defaults, SGML documents use the US-ASCII
character set by default. For documents not in that character set,
the charset= parameter of the Content-Type: field specifies the
actual character set. Note that the values of the charset parameter
must be registered with IANA or be a mutually agreed upon
extension-token (e.g., charset=X-set).

4 The Content-ID Header Field

The Content-ID: header field, defined in [RFC-MIME], is a globally
unique identifier and provides the link between file references
within the SGML document and the MIME body parts. It constitutes a
unique file name token which represents the sender's file name to
which the body part corresponds. The process that creates the
Multipart/SGML body parts generates a content-ID value for each file
referenced by the document's DTDs. The packer then uses the
content-ID value to replace the file name in the corresponding SGML
ENTITY statements. Conversion of those values into valid references
in the recipient's file system occurs upon document storage or
display.

It is recommended that the content-ID value include the name used
within the ENTITY statement; this will improve the ability to
associate the sender's data with the recipient's.
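The packer's side of this exchange can be sketched in Python. The sketch follows the recommendation above by building each content-ID from the entity name, in the shape seen in this memo's examples. Treating the leading digits as a YYMMDDHHMM timestamp is an assumption, and so are the caller-supplied time, the host name, and the helper names make_content_id and pack_entities. Only quoted SYSTEM identifiers in general entity declarations are rewritten.

```python
import re
import time

# <!ENTITY name SYSTEM "file" ...>: captures the declaration prefix, the
# entity name, the quote character, and the sender's file name. Parameter
# entities (<!ENTITY % name ...>) are not handled by this sketch.
ENTITY_SYSTEM = re.compile(
    r"""(<!ENTITY\s+([^\s%]\S*)\s+SYSTEM\s+)(["'])(.*?)\3""", re.IGNORECASE)


def make_content_id(entity_name: str, host: str, when: float) -> str:
    """Token shaped like this memo's examples: YYMMDDHHMM.name@host (UTC)."""
    stamp = time.strftime("%y%m%d%H%M", time.gmtime(when))
    return f"{stamp}.{entity_name}@{host}"


def pack_entities(dtd: str, host: str, when: float):
    """Replace SYSTEM file names with content-IDs.

    Returns the rewritten DTD and a map from content-ID to the sender's
    file name, so the packer can label each body part it attaches.
    """
    files = {}

    def to_token(match):
        prefix, name, quote, filename = match.groups()
        token = make_content_id(name, host, when)
        # Two entities with the same name would collide; a real packer
        # must guarantee global uniqueness, e.g. by adding a counter.
        if token in files and files[token] != filename:
            raise ValueError(f"duplicate content-ID {token}")
        files[token] = filename
        return f"{prefix}{quote}{token}{quote}"

    return ENTITY_SYSTEM.sub(to_token, dtd), files


if __name__ == "__main__":
    print(pack_entities('<!ENTITY figure1 SYSTEM "figure1.gif" NDATA gif>',
                        "ryc.pa.nj.us", time.time()))
```

The returned map is what the packer would use to attach each file as a body part carrying the matching Content-ID: field.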

5 SDIF [ISO-SDIF] Data Streams

SDIF is an interchange format standard for SGML documents
[ISO-SDIF]. It defines a data stream that may contain several SGML
documents. This section defines a Multipart subtype, "SDIF", for an
SDIF data stream that contains one or more SGML documents. Messages
that conform to the SDIF subtype will conform to [ISO-SDIF].

Briefly, an SDIF data stream is a sequence of SGML documents and
their subdocument and external entities (cf. Appendix A). These
external entities are defined in the DTD and are referred to via
their SGML names elsewhere in the prolog or in the document
instance. Names are not unique across documents and subdocuments.
The scope of a name is the document or subdocument that defines it,
together with the subdocuments contained therein, until the name is
redefined in one of those subdocuments.

To provide unique identities within the SDIF data stream, each
document and subdocument is recursively examined, breadth first,
for external entities; each entity is sequentially numbered and
placed in the data stream in the order encountered. Each SGML
document or subdocument structure in the SDIF data stream then lists
the number of the first entity it contains. When encoded in MIME,
SDIF sequence numbers need not be explicitly represented; they can
be recovered by parsing the input. The content-ID serves the same
purpose, and the MIME user agent can use it directly.
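One reading of this numbering rule is sketched below in Python. It is an interpretation, not a transcription of [ISO-SDIF]. It assumes that a subdocument entity receives its number when it is met in its parent. It further assumes that the subdocument's own external entities are numbered later, when the subdocument is taken from the queue. Entity and number_entities are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Entity:
    """An external entity; subdocuments may declare entities of their own."""
    name: str
    children: list = field(default_factory=list)
    is_subdocument: bool = False


def number_entities(document_entities: list):
    """Number external entities breadth first.

    Returns (sequence, first_ids). sequence lists (number, name) pairs in
    stream order. first_ids maps each document or subdocument to the
    number of the first entity it contains (None if it has none); the main
    document is keyed 0 and a subdocument by its own entity number.
    """
    sequence = []
    first_ids = {}
    queue = deque([(0, document_entities)])
    while queue:
        owner, entities = queue.popleft()
        first_ids[owner] = len(sequence) + 1 if entities else None
        for entity in entities:
            number = len(sequence) + 1
            sequence.append((number, entity.name))
            if entity.is_subdocument:
                # Queue the subdocument so that its own entities are numbered
                # only after those of every document already queued.
                queue.append((number, entity.children))
    return sequence, first_ids


if __name__ == "__main__":
    doc = [Entity("fig1"),
           Entity("sub1", [Entity("fig2"), Entity("fig3")], True),
           Entity("fig4")]
    print(number_entities(doc))
```

In the demonstration, the main document's entities receive 1 to 3. The subdocument sub1 (number 2) then lists 4 as its first entity.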

An SDIF data stream is encoded within a MIME message as a
Multipart/SDIF body part containing one to three body parts. The
first and third body parts are Multipart/Mixed and are labelled with
a Content-Description field whose value is related-documents-A and
related-documents-B, respectively. The second body part is either a
Multipart/SGML or an Application/SGML body part (called mime/sgml
here, for convenience). The Multipart/Mixed body parts contain only
mime/sgml body parts.

The Multipart/SDIF content type has a character set parameter which
specifies the character set used for SGML markup tokens throughout
the data stream.

There are five SDIF entity types:

   subdocument       Can contain references to external entities as
                     well as marked up text.

   text              An external entity containing only marked up
                     text.

   data              An external entity containing non-SGML data,
                     images, for example.

   public-text       Corresponds to a PUBLIC external reference and
                     contains a null message body.

   cross-reference   Corresponds to a previously included external
                     entity. This avoids duplicating material
                     previously included. It contains a null message
                     body. This document requires the referenced body
                     part to have already appeared, so that the user
                     agent can display the SGML documents as they are
                     encountered.

The subdocument and text SDIF entities become Application/SGML body
parts, and data entities are encapsulated as the appropriate MIME
content type (application/SGML, image/gif, etc.). The public-text and
cross-reference entities have null message bodies and are handled as
parameters, public and cross-reference, of an Application/SDIF
content type. The syntax is:

    appl-subtype := <existing> / "sdif" sdif-param

    sdif-param   := ";" "public" "="
                        <an SGML PUBLIC identifier>
                  / ";" "cross-reference" "="
                        <a previous MIME body part Content-ID>
                    <-- the enclosing Multipart/SDIF body
                        part is taken as the root (level 1)
                        for numbering body parts -->

SDIF requires the entity name to accompany each entity in the data
stream. When MIME is used to transfer SDIF data streams, the entity
name will be the value of the Content-Description field in each body
part.
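As a sketch of how an unpacker might walk a Multipart/SDIF message in order, the Python code below does three things. It records each body part's entity name from Content-Description. It treats the public and cross-reference parameters of Application/SDIF as the public-text and cross-reference entity types. Finally, it enforces the rule that a cross-reference must point to a body part that has already appeared. It uses the standard email package. The helper names and the example.org identifiers are hypothetical, and Multipart/Alternative handling is omitted.

```python
from email.message import Message
from email.mime.multipart import MIMEMultipart
from email.mime.nonmultipart import MIMENonMultipart


def sdif_reference(description: str, **params) -> MIMENonMultipart:
    """An Application/SDIF body part with a null body.

    Pass public=... or cross_reference=...; the email package turns the
    underscore into a hyphen, giving the cross-reference parameter.
    """
    part = MIMENonMultipart("application", "sdif", **params)
    part.set_payload("")
    part["Content-Description"] = description
    return part


def sdif_entity_table(sdif: Message) -> list:
    """List (entity name, kind, identifier) for each leaf body part in order.

    Raises ValueError if a cross-reference names a Content-ID that has not
    already appeared. Multipart/Alternative is not collapsed here.
    """
    seen = set()
    table = []
    for part in sdif.walk():
        if part.is_multipart():
            continue  # containers such as Multipart/Mixed carry no entity
        name = part.get("Content-Description")
        if part.get_content_type() == "application/sdif":
            public = part.get_param("public")
            target = part.get_param("cross-reference")
            if target is not None and target not in seen:
                raise ValueError(f"cross-reference to unseen part {target!r}")
            kind = "public-text" if public is not None else "cross-reference"
            table.append((name, kind, public if public is not None else target))
        else:
            table.append((name, part.get_content_type(), part.get("Content-ID")))
        if part.get("Content-ID") is not None:
            seen.add(part["Content-ID"])
    return table


if __name__ == "__main__":
    stream = MIMEMultipart("sdif")
    memo = MIMENonMultipart("application", "sgml")
    memo.set_payload("<memo> ... </memo>\n")
    memo["Content-ID"] = "9401011200.memo@example.org"
    memo["Content-Description"] = "memo"
    stream.attach(memo)
    stream.attach(sdif_reference("memo-again",
                                 cross_reference="9401011200.memo@example.org"))
    print(sdif_entity_table(stream))
```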

SDIF does not distinguish among the parts of a document entity
(declaration, prolog, and instance). However, when SGML documents
are contained in a Multipart/SDIF message, the document parts are
sent as separate body parts, that is, as a Multipart/SGML body part.

Finally, SDIF uses sequential numbers, called entity-identifiers in
[ISO-SDIF], to uniquely identify each entity. To locate the position
of the first external entity, a first-identifier is provided for
each document or subdocument. These are not necessary when using
the methods in this document, but they can be derived. Within a
Multipart/SDIF message each body part is numbered sequentially,
starting at 1 with the first Application/SGML body part. Note that,
apart from the Multipart/Mixed and Multipart/SGML body parts
described above, the only Multipart body part that can be present in
a Multipart/SDIF message is Multipart/Alternative. It resolves into
a single body part and shall be treated as though it were a
non-multipart body part. The subdocument, text, and data entities
may, in fact, be Message/External-body parts. With the numbering
described, the unpacker may build a table to translate body parts
into SDIF.

When encapsulated with MIME, each document can be displayed
individually. The SDIF stream is a sequence of Multipart/SGML body
parts.

6 Security

An SGML parser can be directed to invoke a local process, usually to
format or display a graphical image. That capability presents an
opportunity for abuse. Understanding the potential problems requires
understanding two SGML constructs, the entity and notation
statements, presented below. Capitalized items are literals,
lowercase ones are tokens, and the special characters are markup
escape sequences.

    <!ENTITY name SYSTEM file NDATA type>
    <!NOTATION type SYSTEM qstring>

The document text will refer to name, which, in turn, will cause the
application for type, represented by qstring, to be invoked. Qstring
could be "delete *.*".

To eliminate potential problems, the unpacker should accept only
registered MIME types and subtypes. The unpacker can then replace
the qstring in the NOTATION statement with the appropriate string
for the recipient's environment; local handlers for these types are
already defined for MIME. Types in the message that are not defined
can be replaced by a no-operation NOTATION statement.

SGML also provides for sending non-interpreted data to the display
device or typesetter. The security hazard presented is similar to
that posed by the use of PostScript. Greater threats may be posed by
more "powerful" display systems and typesetters. Unauthorized access
to the recipient's system and resources may be possible.

7 References

[ISO-SGML]  ISO 8879:1986, Information processing -- Text and office
            systems -- Standard Generalized Markup Language (SGML).

[ISO-SDIF]  ISO 9069:1988, Information processing -- SGML support
            facilities -- SGML Document Interchange Format (SDIF).

[RFC-822]   Crocker, D., "Standard for the Format of ARPA Internet
            Text Messages", RFC 822, University of Delaware, August
            1982.

[RFC-HDRC]  Moore, K., "Representation of Non-ASCII Text in Internet
            Message Headers", RFC 1342, June 1992.

[RFC-MIME]  Borenstein, N., and N. Freed, "MIME (Multipurpose
            Internet Mail Extensions): Mechanisms for Specifying and
            Describing the Format of Internet Message Bodies",
            RFC 1341, June 1992.

[US-ASCII]  Coded Character Set -- 7-Bit American Standard Code for
            Information Interchange, ANSI X3.4-1986.

8 Acknowledgements

The author acknowledges Harald T. Alvestrand, UniNett; Andy Gelsey,
CSC; Nathaniel Borenstein, First Virtual Holdings Incorporated;
Einar Stefferud, Network Management Associates, Inc.; John Klensin,
MCI; and Erik Naggum for their suggestions, explanations, and
encouragement. No errors or faults in this document can be ascribed
to them; those are mine.

UNIX is a registered trademark of UNIX System Laboratories, Inc.

9 Author's Address

Ed Levinson
Accurate Information Systems, Inc.
2 Industrial Way
Eatontown, NJ 0772

EMail: elevinson@accurate.com

Appendix A. SGML for IETFers

This is a description of the elements of the Standard Generalized
Markup Language (SGML) that are key to understanding the
relationship between SGML and the Multipurpose Internet Mail
Extensions (MIME). For the purposes of this discussion, and without
doing too much damage to the SGML specification, an SGML document
contains text, markup, and references to non-text document elements
(graphics). For a complete and accurate description see ISO 8879,
Information processing -- Text and office systems -- Standard
Generalized Markup Language (SGML).

An SGML document has the following structure (the parenthesized
numbers refer to productions in ISO 8879) and is processed by an
application called an SGML parser. Note that Internet-style ABNF is
used for notation here; SGML uses a different style.

    sgml-doc     ::= sgml-decl prolog doc-inst              (2)
    sgml-sub-doc ::= dtd doc-inst                           (3)

Sgml-decl defines the various elements and parameters of SGML, for
example, the characters that introduce and end markup tags ("<" and
">", respectively, will be used here), the maximum length of markup
tags, etc.

The prolog defines the document structure, usually through an SGML
construct called the document type definition (DTD). Most
importantly for interchange considerations, the DTD contains
references to external files, system commands, and text to be sent
directly to a typesetter or printer.

Doc-inst is the actual document instance or text; it also includes,
by reference to DTD elements, graphic elements and other text with
or without markup.

The remainder of this discussion focuses on two elements which a DTD
references, entities and notations. They appear in the DTD and have
the following format.

    entity   ::= "<!" "ENTITY" name e-text ">"                 (101)
    e-text   ::= q-string | data | b-text | external           (105)
    data     ::= ( "CDATA" | "SDATA" | "PI" ) q-string         (106)
    external ::= ext-id
                 [ ( "SUBDOC" | ( "NDATA" type ) ) ]           (108)
    ext-id   ::= ( "SYSTEM" q-string )
               | ( "PUBLIC" pub-id [ q-string ] )              (73)
    notation ::= "<!" "NOTATION" type ext-id ">"               (148)

where name is a character string and the definition of b-text is
left to ISO 8879; for convenience, q-string has been substituted for
the SGML term parameter literal. Entities referred to via the SUBDOC
keyword differ from SGML documents in that they contain only a DTD
and a doc-inst.

Using the above productions, the following simple example entities
demonstrate the important issues. Name and type are alphanumeric
tokens, and q-string is a series of characters enclosed in double
(or single) quote marks.
815
816 <!ENTITY name PUBLIC pname> (A)
817 <!ENTITY name SYSTEM fname> (B)
818 <!ENTITY name SYSTEM fname NDATA type> (C)
819 <!NOTATION type SYSTEM command> (D)
820 <!ENTITY name PI q-string> (E)
821
822 Form A refers to a well known or "public" name that the SGML parser is
823 able to resolve; in the marked up text there will be a markup item
824 <name> that directs the parser to include the corresponding public file.
825 Similarly, form B corresponds to a locally known file. Form C allows
826 the markup text to refer to non-SGML data, an image for example, and the
827 type parameter must match the type of a NOTATION element . The matching
828 element's command parameter specifies the command which processes the
829 file fname. Finally form E, processing instructions, specifies a string
830 of characters to be sent directly to the output device.
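To make the five forms concrete, the hypothetical Python sketch below classifies a declaration as form A, B, C, D, or E. It is not an SGML parser. It assumes upper-case keywords and treats pname, fname, command, and the PI text as q-strings, as the productions above suggest. It ignores everything else ISO 8879 allows, such as comments, SUBDOC, and b-text. The names `classify` and `Decl` are invented for this example.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical recognizer for the simplified forms A-E only; it is NOT an
# ISO 8879 parser (no comments, parameter entities, b-text, SUBDOC, etc.).
QSTRING = r"""(?:"[^"]*"|'[^']*')"""   # q-string: double- or single-quoted
NAME = r"[A-Za-z][A-Za-z0-9.-]*"       # name and type tokens

ENTITY_RE = re.compile(
    rf"<!ENTITY\s+(?P<name>{NAME})\s+"
    rf"(?:PI\s+(?P<pi>{QSTRING})"                                       # E
    rf"|PUBLIC\s+(?P<pub>{QSTRING})"                                    # A
    rf"|SYSTEM\s+(?P<sys>{QSTRING})(?:\s+NDATA\s+(?P<ndata>{NAME}))?)"  # B, C
    r"\s*>"
)
NOTATION_RE = re.compile(
    rf"<!NOTATION\s+(?P<type>{NAME})\s+SYSTEM\s+(?P<cmd>{QSTRING})\s*>"  # D
)


@dataclass
class Decl:
    form: str                       # "A".."E", as labelled in the text
    name: str                       # entity name, or notation type for D
    value: str                      # pname, fname, command, or PI text
    notation: Optional[str] = None  # NDATA type (form C only)


def classify(decl: str) -> Decl:
    """Return which of the forms A-E a declaration uses, or raise ValueError."""
    text = decl.strip()
    if m := NOTATION_RE.fullmatch(text):
        return Decl("D", m["type"], m["cmd"][1:-1])  # [1:-1] drops the quotes
    m = ENTITY_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"not one of forms A-E: {decl!r}")
    if m["pi"] is not None:
        return Decl("E", m["name"], m["pi"][1:-1])
    if m["pub"] is not None:
        return Decl("A", m["name"], m["pub"][1:-1])
    if m["ndata"] is not None:
        return Decl("C", m["name"], m["sys"][1:-1], m["ndata"])
    return Decl("B", m["name"], m["sys"][1:-1])


if __name__ == "__main__":
    print(classify('<!ENTITY logo SYSTEM "logo.gif" NDATA gif>'))
    # Decl(form='C', name='logo', value='logo.gif', notation='gif')
```

Separating form C from form D mirrors the text. The NDATA type only names a notation, so a receiver can decide separately what, if anything, to do with the matching NOTATION declaration.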

These examples raise the following issues when the document is
transferred from one environment to another.


A    Is the public name known to the recipient? The recipient's SGML
     parser may not know of the public file, and this will be
     discovered only when it processes the document.


B    What is the file name on the recipient's system? There must be
     some process that binds the sender's file names to files on the
     recipient's system.


C    See B and D.


D    Direct use of the NOTATION form is a large security risk, an
     invitation to a Trojan Horse attack. The recipient must be
     protected from a sender invoking an arbitrary command on the
     recipient's system.


E    Processing instructions permit the sender to manipulate the
     recipient's output device. This is the same risk that exists for
     PostScript documents, and it is not addressed here.

Issues A through D are addressed in this document.
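Independently of the mechanisms this document specifies, the hypothetical Python sketch below shows the kind of recipient-side checks that issues A, B, and D call for:

- A public identifier is resolved only against a local table.
- A sender's file name is bound only to data delivered with the message.
- The sender's NOTATION command is never run; a locally trusted handler is chosen by notation type instead.

The tables, paths, handler names, and functions are all invented for illustration.

```python
from typing import Dict, Optional

# Hypothetical local tables; nothing here is defined by this document.
LOCAL_PUBLIC_IDS: Dict[str, str] = {
    "-//Example//DTD Report//EN": "/usr/local/lib/sgml/report.dtd",
}
TRUSTED_HANDLERS: Dict[str, str] = {  # NOTATION type -> local viewer
    "gif": "local-image-viewer",
    "tiff": "local-image-viewer",
}


def resolve_public(pub_id: str) -> Optional[str]:
    """Issue A: map a public identifier to a local file, or None if unknown."""
    return LOCAL_PUBLIC_IDS.get(pub_id)


def bind_system_id(fname: str, delivered: Dict[str, bytes]) -> bytes:
    """Issue B: bind a sender's file name to data delivered with the
    message, never to whatever local file happens to share that name."""
    if fname not in delivered:
        raise LookupError(f"sender file {fname!r} was not delivered")
    return delivered[fname]


def choose_handler(notation_type: str, sender_command: str) -> str:
    """Issue D: ignore the sender's command; use only a trusted local handler."""
    handler = TRUSTED_HANDLERS.get(notation_type.lower())
    if handler is None:
        # The sender's command is only reported, never executed.
        raise PermissionError(
            f"no trusted handler for notation {notation_type!r}; "
            f"refusing to run sender command {sender_command!r}"
        )
    return handler
```

For example, `choose_handler("GIF", "rm -rf ~")` returns `"local-image-viewer"`. The sender's command string appears at most in an error message.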

Appendix B. Content-Type registrations
_________________________________

The Application/SGML Content-Type


(1) MIME type name: Application


(2) MIME subtype name: SGML


(3) Required parameters: none


(4) Optional parameters: declaration, prolog, instance, fosi, charset,
    SGML-version, created-with


(5) Encoding considerations: may be encoded


(6) Security considerations: none


(7) Specification:

    This subtype is used for text marked up with the Standard
    Generalized Markup Language [ISO SGML]. Body parts of this subtype
    within a Multipart/SGML body part will contain a Content-ID: field.

_________________________________

The Application/SDIF Content-Type


(1) MIME type name: Application


(2) MIME subtype name: SDIF


(3) Required parameters: one of public or cross-reference


(4) Optional parameters: none


(5) Encoding considerations: none


(6) Security considerations:


(7) Specification:

    This subtype contains a null, or empty, message body. The value of
    the public parameter is an SGML PUBLIC entity identifier. The value
    of the cross-reference parameter is the body part identifier of a
    previously occurring body part.
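One reading of this registration is sketched below with Python's standard `email` package. The helper name `sdif_part` and the example public identifier are hypothetical. The sketch enforces "exactly one" of the two parameters, which is its own interpretation of "one of public or cross-reference".

```python
from email.message import Message
from typing import Optional


def sdif_part(public: Optional[str] = None,
              cross_reference: Optional[str] = None) -> Message:
    """Build an Application/SDIF body part with an empty body (hypothetical)."""
    # Interpretation assumed here: exactly one of the two parameters.
    if (public is None) == (cross_reference is None):
        raise ValueError("give exactly one of public or cross-reference")
    part = Message()
    part["Content-Type"] = "application/sdif"
    if public is not None:
        part.set_param("public", public)  # an SGML PUBLIC identifier
    else:
        # identifier of a previously occurring body part
        part.set_param("cross-reference", cross_reference)
    part.set_payload("")                  # the body is empty by definition
    return part


if __name__ == "__main__":
    print(sdif_part(public="-//Example//DTD Report//EN").as_string())
```

`set_param` quotes the value automatically when it contains characters such as "/" or spaces, which SGML public identifiers usually do.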

_________________________________

The Multipart/SGML Content-Type


(1) MIME type name: Multipart


(2) MIME subtype name: SGML


(3) Required parameters: boundary


(4) Optional parameters: declaration, prolog, instance, fosi, charset,
    SGML-version, created-with


(5) Encoding considerations: none


(6) Security considerations: none


(7) Specification:

    As specified in this document.
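The following is a minimal sketch of assembling a Multipart/SGML entity with Python's standard `email` package. It assumes, hypothetically, that the prolog and the document instance travel as separate Application/SGML body parts. Each part carries a Content-ID, as the Application/SGML registration requires. The Content-ID values, sample SGML text, and helper names are invented. The optional declaration, prolog, and instance parameters are omitted because this registration does not define their values.

```python
from email import encoders
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart


def sgml_part(text: str, content_id: str) -> MIMEApplication:
    """Application/SGML body part; inside Multipart/SGML it has a Content-ID."""
    part = MIMEApplication(text.encode("ascii"), "sgml",
                           _encoder=encoders.encode_7or8bit)  # keep text readable
    part["Content-ID"] = f"<{content_id}>"
    return part


def multipart_sgml(prolog: str, instance: str) -> MIMEMultipart:
    """Hypothetical layout: one part for the prolog, one for the instance."""
    outer = MIMEMultipart("sgml")  # boundary is generated on serialization
    outer.attach(sgml_part(prolog, "prolog.1@example.invalid"))
    outer.attach(sgml_part(instance, "instance.1@example.invalid"))
    return outer


if __name__ == "__main__":
    msg = multipart_sgml(
        "<!DOCTYPE memo [ <!ELEMENT memo - - (#PCDATA)> ]>",
        "<memo>Hello</memo>",
    )
    print(msg.as_string())
```

The example Content-ID values use the reserved `.invalid` domain so they cannot collide with real identifiers.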

_________________________________

The Multipart/SDIF Content-Type


(1) MIME type name: Multipart


(2) MIME subtype name: SDIF


(3) Required parameters: boundary


(4) Optional parameters: charset


(5) Encoding considerations: none


(6) Security considerations: none


(7) Specification:

    As specified in this document.