MHTML Working Group
INTERNET-DRAFT

Alexander Hopmann
ResNova Software, Inc.

Expires six months from February 20th, 1995

Packaging Aggregate HTML Objects Inside MIME

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

To learn the current status of any Internet-Draft, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ds.internic.net (US East Coast), nic.nordu.net
(Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
Rim).

Distribution of this document is unlimited. Please send comments to
the MHTML working group at . To subscribe to this list, send a
message to which contains the text "sub mhtml ". Discussions of the
working group are archived at .

Abstract

Although HTML was designed within the context of MIME, more than the
specification of HTML as defined in RFC 1866 is needed for two
electronic mail user agents to interoperate using HTML as a document
format. The open issues include the naming of objects that are
normally referred to by URIs, and the means of aggregating objects
that go together. This draft describes a set of guidelines that will
allow conforming mail user agents to send, deliver and display these
HTML objects. In addition, it is hoped that these techniques will
also apply to the wider category of URI-enabled objects.

Table of Contents

1. Introduction
1.1 Purpose
1.2 Overall Operation
1.3 URI References
1.3.1 Use of Multipart/Related
1.3.2 Content-ID URL References
1.3.3 New MIME Content Headers
1.4 Other MIME Issues
2. Examples
3. Security Considerations
4. Acknowledgments
5. References
6. Author's Address

1. Introduction

1.1 Purpose

Although HTML [1] is a valid MIME [2] type, RFC 1866 does not specify
enough for two electronic mail user agents to interoperate using HTML
as a document format. This draft describes a set of guidelines that
will allow conforming mail user agents to send, deliver and display
HTML objects. In addition, it is hoped that these techniques will
also apply to the wider category of URI-enabled [3] objects.

An HTML aggregate object is a MIME-encoded message that contains an
HTML document as well as other data that is required in order to
represent that object (inline pictures, style sheets, etc.). HTML
aggregate objects can also include additional HTML documents that are
linked to the first object, as well as other arbitrary MIME content.

In designing HTML capabilities for electronic mail user agents (UAs),
it is important to keep in mind the differing needs of several
audiences. Mail sending agents will send aggregate HTML objects as an
encoding of normal day-to-day electronic mail. Mail sending agents
will also send aggregate HTML objects when a user wishes to mail a
particular document from the World Wide Web to someone else. Finally,
mail sending agents will send aggregate HTML documents as automatic
responders, providing access to WWW resources for non-IP connected
clients.

Mail receiving agents also have several differing needs. Some mail
receiving agents will be able to receive an aggregate HTML document
and display it just as any other text content type would be
displayed. Others will have to pass this aggregate HTML document to
an HTML browsing program, and provisions need to be made to make this
possible.

Finally, several other constraints on the problem arise. It is
important that an HTML document can be signed, transmitted to a
client, and displayed with a minimum chance of breaking the message
integrity check that is part of the signature.

1.2 Overall Operation

A mail user agent that wishes to send a content-type of HTML can just
do so, so long as the normal data encoding issues are taken care of
as specified in [2]. However, at a basic level there are some
differences between HTML being transferred by HTTP and HTML being
transferred through Internet email. When transferred through HTTP,
HTML by default uses the document character set ISO-8859-1. Within
electronic mail, the default character set is US-ASCII. If a document
uses any characters that are not in US-ASCII, the document must
explicitly label the character set and perform appropriate MIME
content encodings.

Instead of applying normal MIME content encodings, it is possible to
translate non-US-ASCII characters to HTML-defined entity references.
However, it is inappropriate to use entity references to non-US-ASCII
characters without labeling the document character set appropriately.

1.3 URI References

The use of URI references creates some additional issues for
aggregate HTML objects. Normal URI references can of course be used;
however, it is likely that many user agents will not be able to
retrieve the objects referred to. This document provides a means for
these additional objects to be transmitted with the HTML and for the
links between these objects to be properly resolved.

1.3.1 Use of Multipart/Related

Multiple objects should be aggregated using the multipart/related
content type as defined in RFC 1872 [4]. RFC 1872 says that
multipart/related should have the parameters "start" and "type". The
"start" parameter refers to the Content-ID of the sub-part which
contains the main document, in this case usually the primary HTML
document.
The main document should be the one first displayed by the receiving
UA. The "type" parameter serves as a label for the type of the
aggregate object, in this case "text/html".

1.3.2 Content-ID URL References

An HTML body part can use Content-ID URLs as described by
draft-levinson-cid-01.txt [5] to refer to other body parts of the
same MIME message. Content-ID URLs can also refer to body parts in
other MIME messages, but it is unlikely that many clients will be
able to resolve such a reference.

1.3.3 New MIME Content Headers

In order to resolve URI references to other body parts, two new MIME
content headers are required. Both of these place URIs in MIME header
fields. Since MIME header fields have a limited length and URIs can
get quite long, these lines may have to be folded. When the lines are
folded, no additional non-white space characters may be introduced,
and since white space is not allowed in URIs, any white space
introduced by folding is simply ignored.

1.3.3.1 Content-Base

The Content-Base header field may be included in any MIME content
header. It specifies the base URL for the body part and should be a
full URL. Any relative URL references within the body part are
resolved relative to the body part's "base URL". An HTML body part
can also include a <BASE> tag. If it does contain a <BASE> tag, that
tag takes precedence over the Content-Base header field.

1.3.3.2 Content-Location

The Content-Location content header may be included in any MIME
subpart header. It specifies the URI that corresponds to the object
present in that subpart. If a URL is specified using the
Content-Location header, it should be a fully qualified URL.

1.4 Other MIME Issues

Several other tricky issues may exist regarding the deployment of
HTML email; however, they are out of the scope of this document.
These issues include the use of multipart/alternative for content
negotiation, the use of mail/WWW gateways, and the use of URLs
referring to objects outside of the encapsulation (WWW-based objects,
objects in other MIME documents, and objects in other parts of the
same MIME message but not in the same multipart/related).

The use of multipart/alternative is in no way an HTML-specific issue,
and no clear solution exists at this time for the problem of content
negotiation through electronic mail.

The use of mail/WWW gateways should be facilitated by the provisions
of this document. However, this document makes no attempt to specify
the format of a request to such a gateway.

The use of references to objects outside of the "encapsulating MIME
object" cannot be prohibited, but the sending UA needs to realize
that it creates a danger of the receiving UA not being able to
resolve the reference.

Finally, a sending user agent should not make any assumptions about
the method that the receiving user agent will use to display the HTML
files. For example, it should not use Content-Disposition and/or
"file:" URLs under the assumption that the receiving UA is going to
save the pieces of the HTML aggregate object as files on a disk to be
displayed by a separate browser. Content-Disposition should only be
used as described in RFC 1806, in order to distinguish between file
attachments and inline message components.

2. Examples

The first example is the simplest form of an HTML email message. This
is not an aggregate HTML object, but a single HTML document by
itself. This message contains a hot-link but does not provide the
ability to resolve the hot-link. To resolve the hot-link, the
receiving client would need either IP access to the Internet or an
electronic mail web gateway.

From: some.user@resnova.com
To: someone.else@entropy.net
Subject: Hello there
Mime-Version: 1.0
Content-Type: text/html

Hi there!

This is a rather pointless example of an HTML message.

Try clicking here.

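As a rough illustration of how a sending agent might produce a
message like the first example, the following Python sketch uses the
standard library's email package. It is only a sketch under stated
assumptions: the HTML markup and the link target are hypothetical
(the markup of the example above is not reproduced), the helper name
build_simple_html_message is invented for illustration, and the
fallback to UTF-8 for non-US-ASCII text is one possible choice of
labeled character set rather than something this draft prescribes.

```python
from email.message import EmailMessage

# Hypothetical HTML body: the markup and the link target are assumptions
# made for illustration, not taken from the example above.
HTML_BODY = (
    "<html><body>\n"
    "<h1>Hi there!</h1>\n"
    "<p>This is a rather pointless example of an HTML message.</p>\n"
    '<p>Try clicking <a href="http://www.entropy.net/mydoc.html">here</a>.</p>\n'
    "</body></html>\n"
)


def build_simple_html_message(html: str) -> EmailMessage:
    """Build a single-part text/html message with an explicit charset label."""
    msg = EmailMessage()
    msg["From"] = "some.user@resnova.com"
    msg["To"] = "someone.else@entropy.net"
    msg["Subject"] = "Hello there"
    # Label US-ASCII when the text fits; otherwise label a wider charset
    # (UTF-8 here, as an assumption) so the library can choose a suitable
    # content transfer encoding.
    try:
        html.encode("us-ascii")
        charset = "us-ascii"
    except UnicodeEncodeError:
        charset = "utf-8"
    msg.set_content(html, subtype="html", charset=charset)
    return msg


message = build_simple_html_message(HTML_BODY)
print(message.as_string())
```

The point of the sketch is the explicit charset parameter: a body
containing characters outside US-ASCII is labeled and encoded instead
of being sent under the mail default.
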
The second example shows a simple HTML document with a picture in it:

From: some.user@resnova.com
To: someone.else@entropy.net
Subject: Hello there
Mime-Version: 1.0
Content-Type: multipart/related; boundary=abcdefghij;
    start="<12345@entropy.net>"; type="text/html"

--abcdefghij
Content-Type: text/html
Content-ID: <12345@entropy.net>

Hi there!

This is a rather pointless example of an HTML message.

Try clicking here.

--abcdefghij
Content-Type: image/jpeg
Content-ID: <45678@entropy.net>
Content-Transfer-Encoding: Base64

jdisfdsufhsdjvbfdhvbfhbvfhbvfdvifdjrivjifdjvfivfd etc...
--abcdefghij--

The third example shows the use of Content-Location. This example
could be a web page that was mailed to someone. Note that the
starting body part still needs a Content-ID.

From: some.user@resnova.com
To: someone.else@entropy.net
Subject: Hello there
Mime-Version: 1.0
Content-Type: multipart/related; boundary=abcdefghij;
    start="<12345@entropy.net>"; type="text/html"

--abcdefghij
Content-Type: text/html
Content-ID: <12345@entropy.net>
Content-Location: http://www.entropy.net/mydoc.html

Hi there!

This is a rather pointless example of an HTML message.

Try clicking here.

--abcdefghij
Content-Type: image/jpeg
Content-Location: http://www.entropy.net/myimg.jpg
Content-Transfer-Encoding: Base64

jdisfdsufhsdjvbfdhvbfhbvfhbvfdvifdjrivjifdjvfivfd etc...
--abcdefghij--

3. Security Considerations

One security consideration is the potential to mail someone an object
and claim that it is represented by a particular URI (by giving it a
Content-Location: header field). There can be no assurance that a WWW
request for that same URI would normally result in that same object.
Because of this problem, receiving User Agents should not cache this
data in the same way that data retrieved through an HTTP or FTP
request might be cached.

In addition, by allowing people to mail aggregate HTML objects, we
are opening the door to other potential security problems that until
now were only problems for WWW users. For example, some HTML
documents now either themselves contain executable content
(JavaScript) or contain links to executable content (the "INSERT"
specification, Java). It would be exceedingly dangerous for a
receiving User Agent to execute content received through a mail
message without careful attention to restrictions on the capabilities
of that executable content.

4. Acknowledgments

Thanks to Dave Crocker, Roy Fielding, Ed Levinson, and Paul Hoffman
who, with me, worked out most of the details of this draft in an
informal discussion at the Dallas December 1995 IETF. Additional
thanks to Greg Herlihy and Ed Levinson for reviewing this draft.
Thanks also to Jacob Palme, who encouraged the existence of the Mail
HTML effort and addressed many of the issues in his drafts.

5. References

[1] T. Berners-Lee and D. Connolly, "HyperText Markup Language
Specification 2.0", RFC 1866, Proposed Standard, November 1995.

[2] N. Borenstein and N. Freed, "MIME (Multipurpose Internet Mail
Extensions) Part One", RFC 1521, Proposed Standard, September 1993.

[3] T. Berners-Lee, L. Masinter, and M. McCahill, "Uniform Resource
Locators (URL)", RFC 1738, Proposed Standard, December 1994.

[4] E. Levinson, "The MIME Multipart/Related Content-type", RFC 1872,
Experimental, December 1995.

6. Author's Address

Alex Hopmann
President
ResNova Software, Inc.
5011 Argosy Dr. #13
Huntington Beach, CA 92649
alex.hopmann@resnova.com