INTERNET-DRAFT                                                  E. Nebel
Form-based File Upload in HTML                               L. Masinter
draft-ietf-html-fileupload-02.txt                      Xerox Corporation
Expires in 6 months                                       April 19, 1995

                     Form-based File Upload in HTML

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

1. Abstract

Currently, HTML forms allow the producer of the form to request information from the user reading the form. These forms have proven useful in a wide variety of applications in which input from the user is necessary. However, this capability is limited because HTML forms don't provide a way to ask the user to submit files of data. Service providers who need to get files from the user have had to implement custom user applications. (Examples of these custom browsers have appeared on the www-talk mailing list.) Since file-upload is a feature that will benefit many applications, this draft proposes an extension to HTML to allow information providers to express file upload requests uniformly, and a MIME compatible representation for file upload responses. This draft also includes a description of a backward compatibility strategy that allows new servers to interact with the current HTML user agents.

The proposal is independent of the version of HTML of which it becomes a part.

2. HTML forms with file submission

The current draft HTML specification defines eight possible values for the attribute TYPE of an INPUT element: CHECKBOX, HIDDEN, IMAGE, PASSWORD, RADIO, RESET, SUBMIT, TEXT.

In addition, it defines the ENCTYPE attribute of the FORM element using the POST METHOD to have the default value "application/x-www-form-urlencoded".

This proposal makes three changes:

   1) add a FILE option for the TYPE attribute of INPUT
   2) allow an ACCEPT attribute for the INPUT tag, which is a list of media types or type patterns allowed for the input
   3) allow the ENCTYPE of a FORM to be "multipart/form-data".

These changes might be considered independently, but are all necessary for reasonable file upload.

The author of an HTML form who wants to request one or more files from a user would write (for example):
The change to the HTML DTD is to add one item to the entity "InputType". In addition, it is proposed that the INPUT tag have an ACCEPT attribute, which is a list of comma-separated media types.

    ... (other elements) ...
    ... (other elements) ...

3. Suggested implementation

While user agents that interpret HTML have wide leeway to choose the most appropriate mechanism for their context, this section suggests how one class of user agent, WWW browsers, might implement file upload.

3.1 Display of FILE widget

When an INPUT tag of type FILE is encountered, the browser might show a display of (previously selected) file names, and a "Browse" button or selection method. Selecting the "Browse" button would cause the browser to enter into a file selection mode appropriate for the platform. Window-based browsers might pop up a file selection window, for example. In such a file selection dialog, the user would have the option of replacing a current selection, adding a new file selection, etc. Browser implementors might choose to let the list of file names be manually edited.

If an ACCEPT attribute is present, the browser might constrain the file selection to files whose extensions correspond, on that platform, to the listed media types.

3.2 Action on submit

When the user completes the form, and selects the SUBMIT element, the browser should send the form data and the content of the selected files. The encoding type application/x-www-form-urlencoded is inefficient for sending large quantities of binary data. Thus, a new media type, multipart/form-data, is proposed as a way of efficiently sending the values associated with a filled-out form from client to server.

3.3 Use of multipart/form-data

The definition of multipart/form-data is included in section 7. The media-type multipart/form-data follows the rules of all multipart MIME data streams as outlined in RFC 1521: a boundary is selected that does not occur in any of the data.
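One way to meet the boundary requirement is to generate a random token and check that it is absent from every value about to be sent. The Python sketch below illustrates this under stated assumptions: the function name choose_boundary, the use of secrets.token_hex, and the retry limit are all choices made for this example and are not part of the proposal.

```python
import secrets


def choose_boundary(payloads, max_tries=100):
    """Return a boundary string that occurs in none of the given byte strings.

    Hypothetical helper: the proposal only requires that the boundary
    not occur in any of the data; the token format is an assumption.
    """
    payloads = list(payloads)  # allow any iterable, and permit re-scanning
    for _ in range(max_tries):
        candidate = secrets.token_hex(16)  # 32 hexadecimal characters
        delimiter = b"--" + candidate.encode("ascii")
        # Reject the candidate if the delimiter appears in any payload.
        if not any(delimiter in data for data in payloads):
            return candidate
    raise RuntimeError("could not find an unused boundary")
```

Scanning for "--" followed by the candidate is slightly stricter than necessary, since a delimiter only counts at the start of a line, but it keeps the check simple.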
Each field of the form is sent, in the order in which it occurs in the form, as a part of the multipart stream. Each part identifies the INPUT name within the original HTML form using a "content-disposition: form-data" header with a name attribute specifying the field name. Each part has an optional Content-Type (which defaults to text/plain). File inputs should be identified as either application/octet-stream or the appropriate media type, if known. If multiple files were selected, they should be transferred together using the multipart/mixed format.

The "content-transfer-encoding" header should be supplied for all fields whose values do not conform to the default 7BIT encoding (that is, all characters are 7-bit US-ASCII and no line is longer than 1000 characters). Otherwise, file data and longer field values may be transferred using a content-transfer-encoding appropriate to the protocol of the ACTION in the form. For HTTP applications, a content-transfer-encoding of "binary" may be used. If the ACTION is a "mailto:" URL, then the user agent may encode the data appropriately to the mail transport mechanism. [See section 5 of RFC 1521 for more details.]

File inputs may optionally identify the file name using the "filename" attribute on the content-disposition header. This is not required, but is a convenience for those cases where, for example, the uploaded files might contain references to each other, e.g., a TeX file and its .sty auxiliary style description.

On the server end, the ACTION might point to an HTTP URL that implements the forms action via CGI. In such a case, the CGI program would note that the content-type is multipart/form-data and parse the various fields (checking for validity, writing the file data to local files for subsequent processing, etc.).

3.4 Interpretation of other attributes

The VALUE attribute might be used with INPUT tags of type FILE for a default file name. This use is probably platform dependent.
It might be useful, however, in sequences of more than one transaction, e.g., to avoid having the user prompted for the same file name over and over again.

The SIZE attribute might be specified using SIZE=width,height, where width is some default for file name width, while height is the expected size showing the list of selected files. For example, this would be useful for forms designers who expect to get several files and who would like to show a multiline file input field in the browser (with a "browse" button beside it, hopefully). It would be useful to show a one line text field when no height is specified (when the forms designer expects one file, only) and to show a multiline text area with scrollbars when the height is greater than 1 (when the forms designer expects multiple files).

4. Backward compatibility issues

While not necessary for successful adoption of an enhancement to the current WWW form mechanism, it is useful to also plan for a migration strategy: users with older browsers can still participate in file upload dialogs, using a helper application. Most current web browsers, when given an INPUT tag of type FILE, will treat it as an INPUT tag of type TEXT and give the user a text box. The user can type a file name into this text box. In addition, current browsers seem to ignore the ENCTYPE parameter in the

and the user types "Joe Blow" in the name field, and selects a text file "file1.txt" and also an image file "file2.gif" for the answer to 'What files are you sending?'. The client would send back the following data:

    Content-type: multipart/form-data; boundary=AaB03x

    --AaB03x
    content-disposition: form-data; name="field1"

    Joe Blow
    --AaB03x
    content-disposition: form-data; name="pics"
    Content-type: multipart/mixed; boundary=BbC04y

    --BbC04y
    Content-disposition: attachment; filename="file1.txt"
    Content-Type: text/plain
    Content-Transfer-Encoding: binary

    ... contents of file1.txt ...
    --BbC04y
    Content-disposition: attachment; filename="file2.gif"
    Content-type: image/gif
    Content-Transfer-Encoding: binary

    ...contents of file2.gif...
    --BbC04y--
    --AaB03x--

7. Registration of multipart/form-data

The media-type multipart/form-data follows the rules of all multipart MIME data streams as outlined in RFC 1521. It is intended for use in returning the data that comes about from filling out a form. In a form (in HTML, although other applications may also use forms), there are a series of fields to be supplied by the user who fills out the form. Each field has a name. The name of the field is restricted to be a set of US-ASCII graphic characters; within a given form, the names are unique.

multipart/form-data contains a series of parts. Each part is expected to contain a content-disposition header where the value is "form-data" and a name attribute specifies the field name within the form, e.g., 'content-disposition: form-data; name="xxxxx"', where xxxxx is the field name corresponding to that field.

As with all multipart MIME types, each part has an optional Content-Type which defaults to text/plain. Note that MIME headers are generally required to consist only of 7-bit data in the US-ASCII character set. This specification thus requires that the field names used consist of 7-bit US-ASCII characters. If the contents of a file are returned via filling out a form, then the file input is identified as application/octet-stream or the appropriate media type, if known. If multiple files are to be returned as the result of a single form entry, they can be returned as multipart/mixed embedded within the multipart/form-data.

The "content-transfer-encoding" header should be supplied for all fields whose values do not conform to the default 7BIT encoding (all characters 7-bit US-ASCII data with lines no longer than 1000 characters).
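The part layout and the 7BIT rule described above can be sketched in Python as follows. This is an illustration only: the names is_7bit, encode_part and encode_form are hypothetical, the line-length check counts each line without its CRLF terminator (an assumption; RFC 1521 is authoritative), field and file names are assumed to contain no quote characters, and the multipart/mixed case for several files in one field is not covered.

```python
CRLF = b"\r\n"


def is_7bit(value: bytes) -> bool:
    """True if value is 7-bit US-ASCII with no line longer than 1000 bytes."""
    if any(byte > 127 for byte in value):
        return False
    # Assumption: the limit is applied to each line without its CRLF.
    return all(len(line) <= 1000 for line in value.split(CRLF))


def encode_part(name, value: bytes, content_type=None, filename=None) -> bytes:
    """Encode one form field as the headers and body of a single part."""
    disposition = f'content-disposition: form-data; name="{name}"'
    if filename is not None:
        disposition += f'; filename="{filename}"'
    headers = [disposition]
    if content_type is not None:  # when omitted, the part is text/plain
        headers.append(f"Content-Type: {content_type}")
    if not is_7bit(value):
        # "binary" suits an HTTP ACTION; mail transport would need another
        # encoding.
        headers.append("Content-Transfer-Encoding: binary")
    head = CRLF.join(header.encode("ascii") for header in headers)
    return head + CRLF + CRLF + value


def encode_form(parts, boundary: str) -> bytes:
    """Join already encoded parts into a multipart/form-data body."""
    delimiter = b"--" + boundary.encode("ascii")
    body = b""
    for part in parts:
        body += delimiter + CRLF + part + CRLF
    return body + delimiter + b"--" + CRLF
```

A caller would pass the same boundary string in the boundary parameter of the enclosing Content-Type header.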
Otherwise, file data and longer field values may be transferred using a content-transfer-encoding appropriate to the protocol of the ACTION in the form. For HTTP applications, a content-transfer-encoding of "binary" may be used. If the ACTION is a "mailto:" URL, then the user agent may encode the data appropriately to the mail transport mechanism. [See section 5 of RFC 1521 for more details.]

File inputs may also identify the file name. The file name may be described using the 'filename' parameter of the "content-disposition" header. This is not required, but is strongly recommended in any case where the original filename is known. This is useful or necessary in many applications.

8. Security Considerations

It is important that a user agent not send any file that the user has not explicitly asked to be sent. Thus, HTML interpreting agents are expected to confirm any default file names that might be suggested through the VALUE attribute of a FILE input. Hidden fields must never be able to specify a file.

9. Conclusion

The suggested implementation gives the client a lot of flexibility in the number and types of files it can send to the server, it gives the server control of the decision to accept the files, and it gives servers a chance to interact with browsers which do not support INPUT TYPE "file".

The change to the HTML DTD is very simple, but very powerful. It enables a much greater variety of services to be implemented via the World-Wide Web than is currently possible due to the lack of a file submission facility. This would be an extremely valuable addition to the capabilities of the World-Wide Web.

A. Authors' Addresses

Larry Masinter                          masinter@parc.xerox.com
Xerox Palo Alto Research Center         Voice: (415) 812-4365
3333 Coyote Hill Road                   Fax:   (415) 812-4333
Palo Alto, CA 94304

Ernesto Nebel                           nebel@xsoft.sd.xerox.com
XSoft, Xerox Corporation                Voice: (619) 676-7817
10875 Rancho Bernardo Road, Suite 200   Fax:   (619) 676-7865
San Diego, CA 92127-2116

B. Media type registration for multipart/form-data

Media Type name:          multipart

Media subtype name:       form-data

Required parameters:      none

Optional parameters:      none

Encoding considerations:  No additional considerations other than as for other multipart types.

Published specification:  draft-ietf-html-fileupload-02.txt

Security Considerations

The multipart/form-data type introduces no new security considerations beyond what might occur with any of the enclosed parts.

Person & email address to contact for further information:

Larry Masinter            masinter@parc.xerox.com
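
A receiver of the media type registered above, such as the CGI program mentioned in section 3.3, might decode a submission along the following lines. This Python sketch relies on the standard library email package as a general MIME parser; the function name parse_form_data and the shape of its return value are assumptions of this example, and the sketch performs no validity checking and does not write file data to local files.

```python
from email import message_from_bytes


def parse_form_data(content_type: str, body: bytes):
    """Return a list of (field name, file name or None, media type, bytes).

    Hypothetical helper: content_type is the value of the request's
    Content-Type header, including its boundary parameter.
    """
    # Prepend the header so the MIME parser sees a complete entity.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = message_from_bytes(raw)
    if message.get_content_type() != "multipart/form-data":
        raise ValueError("not a multipart/form-data entity")
    if not message.is_multipart():
        raise ValueError("no parts found; is the boundary correct?")

    results = []
    for part in message.get_payload():
        name = part.get_param("name", header="content-disposition")
        # Several files for one field arrive as a nested multipart/mixed.
        leaves = part.get_payload() if part.is_multipart() else [part]
        for leaf in leaves:
            results.append(
                (
                    name,
                    leaf.get_filename(),      # None when no file name is given
                    leaf.get_content_type(),  # text/plain when header absent
                    leaf.get_payload(decode=True),
                )
            )
    return results
```

Each file of a multipart/mixed field is reported under the name of the enclosing form field, so a caller can group the results by the first element of each tuple.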