Revision 1.1, Tue Jun 15 08:04:04 2004 UTC, by wakaba

2     INTERNET-DRAFT E. Nebel
3     Form-based File Upload in HTML L. Masinter
4     draft-ietf-html-fileupload-01.txt Xerox Corporation
5     Expires May 30, 1995 November 30, 1994
6    
7    
8     Form-based File Upload in HTML
9    
10     Status of this Memo
11    
12     This document is an Internet-Draft. Internet-Drafts are working
13     documents of the Internet Engineering Task Force (IETF), its areas,
14     and its working groups. Note that other groups may also distribute
15     working documents as Internet-Drafts.
16    
17     Internet-Drafts are draft documents valid for a maximum of six months
18     and may be updated, replaced, or obsoleted by other documents at any
19     time. It is inappropriate to use Internet-Drafts as reference
20     material or to cite them other than as ``work in progress.''
21    
22     To learn the current status of any Internet-Draft, please check the
23     ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
24     Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
25     munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
26     ftp.isi.edu (US West Coast).
27    
28    
29     1. Abstract
30    
31     Currently, HTML forms allow the producer of the form to request
32     information from the user reading the form. These forms have proven
33     useful in a wide variety of applications in which input from the
34     user is necessary. However, this capability is limited because HTML
35     forms don't provide a way to ask the user to submit files of data.
36     Service providers who need to get files from the user have had to
37     implement custom user applications. (Examples of these custom
38     browsers have appeared on the www-talk mailing list.) Since
39     file-upload is a feature that will benefit many applications, this
40     draft proposes an extension to HTML to allow information providers
41     to express file upload requests uniformly, and a MIME compatible
42     representation for file upload responses. This draft also includes
43     a description of a backward compatibility strategy that allows new
44     servers to interact with the current HTML user agents.
45    
46     The proposal is independent of which version of HTML it becomes a
47     part of; in particular, no proposal is being made as to whether this
48     might be part of HTML 2.0, 2.1, or 3.0.
49    
50     2. HTML forms with file submission
51    
52     The current draft HTML specification <URL:http://www.hal.com/
53     %7Fconnolly/html-spec/spyglass-19941014/html-19941014.txt.Z> defines
54     eight possible values for the attribute TYPE of an INPUT element:
55     CHECKBOX, HIDDEN, IMAGE, PASSWORD, RADIO, RESET, SUBMIT, TEXT.
56    
57     In addition, it defines the ENCTYPE attribute of the FORM element
58     using the POST METHOD to have the default value
59     "application/x-www-form-urlencoded".
60    
61     This proposal makes three changes:
62     1) Add a FILE option for the TYPE attribute of INPUT.
63     2) Allow an ACCEPT attribute for the INPUT tag, which is a list of
64     media types or type patterns allowed for the input.
65     3) Allow the ENCTYPE of a FORM to be "multipart/form-data".
66    
67     These changes might be considered independently, but are all
68     necessary for reasonable file upload.
69    
70     The author of an HTML form who wants to request one or more files
71     from a user would write (for example):
72    
73     <FORM ENCTYPE="multipart/form-data" ACTION="_URL_" METHOD=POST>
74    
75     File to process: <INPUT NAME="userfile1" TYPE="file">
76    
77     <INPUT TYPE="submit" VALUE="Send File">
78    
79     </FORM>
80    
81     The change to the HTML DTD is to add one item to the entity
82     "InputType". In addition, it is proposed that the INPUT tag have an
83     ACCEPT attribute, which is a list of comma-separated media types.
84    
85    
86     ... (other elements) ...
87    
88     <!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX |
89     RADIO | SUBMIT | RESET |
90     IMAGE | HIDDEN | FILE )">
91     <!ELEMENT INPUT - O EMPTY>
92     <!ATTLIST INPUT
93         TYPE %InputType TEXT
94         NAME CDATA #IMPLIED -- required for all but submit and reset --
95         VALUE CDATA #IMPLIED
96         SRC %URI #IMPLIED -- for image inputs --
97         CHECKED (CHECKED) #IMPLIED
98         SIZE CDATA #IMPLIED -- like NUMBERS,
99             but delimited with comma, not space --
100         MAXLENGTH NUMBER #IMPLIED
101         ALIGN (top|middle|bottom) #IMPLIED
102         ACCEPT CDATA #IMPLIED -- list of content types --
103         >
104    
105     ... (other elements) ...
106    
107     3. Suggested implementation
108    
109     While user agents that interpret HTML have wide leeway to choose the
110     most appropriate mechanism for their context, this section suggests
111     how one class of user agent, WWW browsers, might implement file
112     upload.
113    
114     When an INPUT tag of type FILE is encountered, the browser might show
115     a display of (previously selected) file names, and a "Browse" button
116     or selection method. Selecting the "Browse" button would cause the
117     browser to enter into a file selection mode appropriate for the
118     platform. Window-based browsers might pop up a file selection
119     window, for example. In such a file selection dialog, the user would
120     have the option of replacing a current selection, adding a new file
121     selection, etc. Browser implementors might choose to let the list of
122     file names be manually edited.
123    
124     If an ACCEPT attribute is present, the browser might constrain the
125     file selection to files whose names match the platform's file
126     extensions for the listed media types.
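
As an illustration only (not part of this proposal), the following
Python sketch shows one way a user agent might test a candidate file
against an ACCEPT list. The function names are hypothetical, and the
mapping from file extension to media type is delegated to Python's
standard "mimetypes" table, which is an assumption about the platform.

```python
import fnmatch
import mimetypes


def parse_accept(accept_value):
    """Split an ACCEPT attribute value into lowercase media-type patterns."""
    return [item.strip().lower() for item in accept_value.split(",") if item.strip()]


def file_matches_accept(filename, accept_value):
    """Return True if the file's guessed media type matches any ACCEPT pattern."""
    patterns = parse_accept(accept_value)
    if not patterns:
        return True  # no constraint was given
    guessed, _encoding = mimetypes.guess_type(filename)
    if guessed is None:
        # Assumption: unknown files are treated as application/octet-stream.
        guessed = "application/octet-stream"
    # Patterns such as "image/*" are matched with shell-style wildcards.
    return any(fnmatch.fnmatchcase(guessed.lower(), p) for p in patterns)
```

A browser could use such a test to filter the entries shown in its
file selection dialog; it is a convenience for the user, not a
guarantee to the server.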
127    
128     When the user completes the form, and selects the SUBMIT element,
129     the browser should send the form data and the content of the
130     selected files. The encoding type application/x-www-form-urlencoded
131     is inefficient for sending large quantities of binary data. Thus, a
132     new media type, multipart/form-data, is proposed as a way of
133     efficiently sending the values associated with a filled-out form
134     from client to server.
135    
136     The media-type multipart/form-data follows the rules of all
137     multipart MIME data streams as outlined in RFC 1521--a boundary is
138     selected that does not occur in any of the data. Each field of the
139     form is sent, in the order in which it occurs in the form, as a part
140     of the multipart stream. Each part identifies the INPUT name within
141     the original HTML form using a "Name:" header. Each part has an
142     optional Content-Type (which defaults to text/plain). File inputs
143     should be identified as either application/octet-stream or the
144     appropriate media type, if known. If multiple files were selected,
145     they should be transferred together using the multipart/mixed
146     format.
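
The following Python sketch illustrates the encoding just described,
under stated assumptions: the function names are hypothetical, each
field is given as a (name, value, content-type) triple, and the
multipart/mixed nesting for multiple files is left out for brevity.
A part with no Content-Type header relies on the text/plain default.

```python
import secrets

CRLF = b"\r\n"


def choose_boundary(payloads):
    """Pick a random boundary string that occurs in none of the payloads."""
    while True:
        boundary = secrets.token_hex(8)
        if not any(boundary.encode("ascii") in p for p in payloads):
            return boundary


def encode_form_data(fields, boundary=None):
    """Encode fields, in form order, as a multipart/form-data body.

    fields is a list of (name, value_bytes, content_type_or_None).
    Returns (boundary, body_bytes).
    """
    if boundary is None:
        boundary = choose_boundary([value for _, value, _ in fields])
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, value, content_type in fields:
        lines.append(delimiter)
        lines.append(b"Name: " + name.encode("ascii"))
        if content_type is not None:
            lines.append(b"Content-Type: " + content_type.encode("ascii"))
        lines.append(b"")      # blank line ends the part headers
        lines.append(value)
    lines.append(delimiter + b"--")  # closing delimiter
    lines.append(b"")
    return boundary, CRLF.join(lines)
```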
147    
148     The "content-transfer-encoding" header should be supplied for all
149     fields whose values do not conform to the default 7BIT encoding
150     (that is, 7-bit US-ASCII data with lines no longer than 1000
151     characters). File data and longer field values may be
152     transferred using a content-transfer-encoding appropriate to the
153     protocol of the ACTION in the form. For HTTP applications, a
154     content-transfer-encoding of "binary" may be used. If the ACTION is
155     a "mailto:" URL, then the user agent may encode the data
156     appropriately for the mail transport mechanism. [See section 5 of
157     RFC 1521 for more details.]
158    
159     File inputs may optionally identify the file name using the
160     "content-disposition" header. This is not required, but is provided
161     as a convenience for those cases where, for example, the uploaded
162     files might contain references to each other, e.g., a TeX file and its
163     .sty auxiliary style description.
164    
165     Browsers may optionally include a Content-Length header both in the
166     overall reply and in individual components. The content-length is
167     *not* intended as a replacement for the multipart boundary as a way
168     of detecting the end of an individual component. It is *only*
169     supplied as a way of forewarning the server of the amount of data
170     coming.
171    
172     On the server end, the ACTION might point to an HTTP URL that
173     implements the form's action via CGI. In such a case, the CGI program
174     would note that the content-type is multipart/form-data and parse the
175     various fields (checking for validity, writing the file data to local
176     files for subsequent processing, etc.).
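
A server-side parse of such a body might look like the Python sketch
below. It is a simplified illustration: the function name is
hypothetical, the boundary is assumed to have been extracted from the
request's content-type already, and the code relies on the sender's
guarantee that the boundary does not occur in any of the data.

```python
def parse_form_data(body, boundary):
    """Split a multipart/form-data body into (headers, content) pairs."""
    delimiter = b"--" + boundary.encode("ascii")
    parts = []
    # Anything before the first delimiter is a preamble and is ignored.
    for chunk in body.split(delimiter)[1:]:
        if chunk.startswith(b"--"):
            break  # closing delimiter reached
        if chunk.startswith(b"\r\n"):
            chunk = chunk[2:]   # CRLF that ends the delimiter line
        if chunk.endswith(b"\r\n"):
            chunk = chunk[:-2]  # CRLF that belongs to the next delimiter
        if chunk.startswith(b"\r\n"):
            head, content = b"", chunk[2:]  # part with no headers at all
        else:
            head, _, content = chunk.partition(b"\r\n\r\n")
        headers = {}
        for line in head.split(b"\r\n"):
            if line:
                key, _, value = line.partition(b":")
                headers[key.strip().lower().decode("ascii")] = (
                    value.strip().decode("ascii")
                )
        # A part without a Content-Type defaults to text/plain.
        headers.setdefault("content-type", "text/plain")
        parts.append((headers, content))
    return parts
```

The CGI program could then look up each part by its "name" header and
write file parts to local storage for subsequent processing.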
177    
178     4. Backward compatibility issues
179    
180     While not necessary for successful adoption of an enhancement to the
181     current WWW form mechanism, it is useful to also plan for a
182     migration strategy: users with older browsers can still participate
183     in file upload dialogs, using a helper application. Most current web
184     browsers, when given <INPUT TYPE=FILE>, will treat it as <INPUT
185     TYPE=TEXT> and give the user a text box. The user can type a file
186     name into this text box. In addition, current browsers seem to
187     ignore the ENCTYPE parameter in the <FORM> element, and always
188     transmit the data as application/x-www-form-urlencoded.
189    
190     Thus, the server CGI might be written in a way that would note that
191     the form data returned had content-type
192     application/x-www-form-urlencoded instead of
193     multipart/form-data, and know that the user was using a browser
194     that didn't implement file upload.
195    
196     In this case, rather than replying with a "text/html" response, the
197     CGI on the server could instead send back a data stream that a helper
198     application might process instead; this would be a data stream of
199     type "application/x-please-send-files", which contains:
200    
201     * The (fully qualified) URL to which the actual form data should
202     be posted (terminated with CRLF)
203     * The list of field names that were supposed to be file contents
204     (space separated, terminated with CRLF)
205     * The entire original application/x-www-form-urlencoded form data
206     as originally sent from client to server.
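
As a sketch of this server-side fallback, the Python fragment below
decides whether the request came from a browser that implements file
upload and, if not, assembles the three components listed above. The
function names are hypothetical; the layout follows the list above
and nothing more.

```python
import re


def supports_file_upload(content_type):
    """True if the request body was sent as multipart/form-data."""
    # Parameters such as the boundary follow the media type.
    media_type = re.split(r"[;,]", content_type, maxsplit=1)[0].strip().lower()
    return media_type == "multipart/form-data"


def build_please_send_files(action_url, file_fields, original_form_data):
    """Assemble an application/x-please-send-files body.

    action_url: fully qualified URL the helper should post to.
    file_fields: names of the fields that were meant to be file contents.
    original_form_data: the urlencoded bytes exactly as received.
    """
    return (
        action_url.encode("ascii") + b"\r\n"
        + " ".join(file_fields).encode("ascii") + b"\r\n"
        + original_form_data
    )
```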
207    
208     In this case, the browser needs to be configured to process
209     application/x-please-send-files to launch a helper application.
210    
211     The helper would read the form data, note which fields contained
212     'local file names' that needed to be replaced with their data
213     content, might itself prompt the user for changing or adding to the
214     list of files available, and then repackage the data & file contents
215     in multipart/form-data for retransmission back to the server.
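
The first step of such a helper, reading the stream and noting which
fields hold local file names, might be sketched in Python as follows.
The function name and the shape of the returned plan are assumptions
made for illustration; prompting the user and retransmitting the
data are not shown.

```python
from urllib.parse import parse_qsl


def parse_please_send_files(stream):
    """Parse an application/x-please-send-files stream.

    Returns (action_url, plan), where plan is a list of
    (field_name, value, is_file) in the original field order.
    """
    url_line, rest = stream.split(b"\r\n", 1)
    names_line, form_data = rest.split(b"\r\n", 1)
    file_fields = set(names_line.decode("ascii").split())
    # Decode the original urlencoded data, keeping empty fields.
    pairs = parse_qsl(form_data.decode("ascii"), keep_blank_values=True)
    plan = [(name, value, name in file_fields) for name, value in pairs]
    return url_line.decode("ascii"), plan
```

For each entry marked as a file, the helper would ask the user to
confirm the transfer before reading the named file and placing its
contents in the multipart/form-data reply.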
216    
217     The helper would generate the kind of data that a 'new' browser should
218     actually have sent in the first place, with the intention that the URL
219     to which it is sent corresponds to the original ACTION URL. The point
220     of this is that the server can use the *same* CGI to implement the
221     mechanism for dealing with both old and new browsers.
222    
223     The helper need not display the form data, but *should* ensure that
224     the user actually be prompted about the suitability of sending the
225     files requested (this is to avoid a security problem with malicious
226     servers that ask for files that weren't actually promised by the
227     user). It would be useful if the status of the transfer of the files
228     involved could be displayed.
229    
230     5. Other considerations
231    
232     5.1 Compression, encryption
233    
234     This scheme doesn't address the possible compression of files.
235     After some consideration, it seemed that the optimization issues of
236     file compression were too complex to try to automatically have
237     browsers decide that files should be compressed. Many link-layer
238     transport mechanisms (e.g., high-speed modems) perform data
239     compression over the link, and optimizing for compression at this
240     layer might not be appropriate. It might be possible for browsers to
241     optionally produce a content-transfer-encoding of x-compress for
242     file data, and for servers to decompress the data before processing,
243     if desired; this was left out of the proposal, however.
244    
245     Similarly, the proposal does not contain a mechanism for encryption
246     of the data; this should be handled by whatever other mechanisms are
247     in place for secure transmission of data, whether via secure HTTP or
248     mail.
249    
250     5.2 Deferred file transmission
251    
252     In some situations, it might be advisable to have the server
253     validate various elements of the form data (user name, account,
254     etc.) before actually preparing to receive the data. However,
255     after some consideration, it seemed best to require that servers
256     that wish to do this should implement this as a series of forms,
257     where some of the data elements that were previously validated might
258     be sent back to the client as 'hidden' fields, or by arranging the
259     form so that the elements that need validation occur first. This
260     puts the onus of maintaining the state of a transaction only on
261     those servers that wish to build a complex application, while
262     allowing those cases that have simple input needs to be built
263     simply. Clients are encouraged to supply content-length for overall
264     file input so that a busy server could detect if the proposed file
265     data is too large to be processed reasonably and just return an
266     error code and close the connection without waiting to process all
267     of the incoming data.
268    
269     If the INPUT tag includes the attribute MAXLENGTH, the user agent
270     should consider its value to represent the maximum Content-Length
271     (in bytes) which the server will accept for transferred files. In
272     this way, servers can hint to the client how much space they have
273     available for a file upload, before that upload takes place. It is
274     important to note, however, that this is only a hint, and the actual
275     requirements of the server may change between form creation and file
276     submission.
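
A busy server's early check on the declared size could be as simple
as the Python sketch below. The function name and the byte limit are
hypothetical; since the header is optional, a missing value is not
treated as an error here.

```python
def should_accept_upload(content_length_header, max_bytes):
    """Decide from the optional Content-Length whether to read the body."""
    if content_length_header is None:
        return True  # no forewarning given; rely on the boundary instead
    try:
        declared = int(content_length_header)
    except ValueError:
        return False  # malformed header value
    return 0 <= declared <= max_bytes
```

A server that receives a declared length above its limit could
return an error and close the connection without reading the data.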
277    
278     5.3 Other choices for return transmission of binary data
279    
280     Various people have suggested using a new MIME top-level type
281     "aggregate", e.g., aggregate/mixed or a content-transfer-encoding of
282     "packet" to express indeterminate-length binary data, rather than
283     relying on the multipart-style boundaries. While we are not opposed
284     to doing so, this would require additional design and
285     standardization work to get acceptance of "aggregate". On the other
286     hand, the 'multipart' mechanisms are well established, simple to
287     implement on both the sending client and receiving server, and as
288     efficient as other methods of dealing with multiple combinations of
289     binary data.
290    
291     5.4 Not overloading <INPUT>
292    
293     Various people have wondered about the advisability of overloading
294     'INPUT' for this function, rather than merely providing a different
295     type of FORM element. Among other considerations, the migration
296     strategy which is allowed when using <INPUT> is important. In
297     addition, the <INPUT> field *is* already overloaded to contain most
298     kinds of data input; rather than creating multiple kinds of <INPUT>
299     tags, it seems most reasonable to enhance <INPUT>. The 'type' of
300     INPUT is not the content-type of what is returned, but rather the
301     'widget-type'; i.e., it identifies the interaction style with the
302     user. The description here is carefully written to allow <INPUT
303     TYPE=FILE> to work for text browsers or audio-markup.
304    
305     5.5 Default content-type of field data
306    
307     Many input fields in HTML are to be typed in. There has been some
308     ambiguity as to how form data should be transmitted back to servers.
309     Making the content-type of <INPUT> fields be text/plain clearly
310     disambiguates that the client should properly encode the data before
311     sending it back to the server with CRLFs.
312    
313     5.6 Interpretation of other attributes
314    
315     The VALUE attribute might be used with <INPUT TYPE=file> tags for
316     a default file name. This use is probably platform dependent,
317     however, and probably should be avoided.
318    
319     The SIZE attribute should probably not be used with <INPUT
320     TYPE=file> tags. For text input, it indicates the length in
321     characters for the text area for the prompt.
322    
323     5.7 Using Name instead of existing MIME headers for field names
324    
325     To avoid any confusion and overloading of existing MIME headers,
326     this proposal has "Name" (instead of content-id or
327     content-description) for the name of the original field in the form
328     in multipart/form-data.
329    
330     On the other hand, the original file names might appear within a
331     multipart/mixed stream, and thus the proposal avoids defining a new
332     header field for transmitting the original file name by using
333     the "content-disposition" field to send the file name, if
334     appropriate. (Content-disposition is apparently the proposed way of
335     encoding file names in MIME messages, as well.)
336    
337     5.8 Allow form ACTION to be "mailto:"
338    
339     Independent of this proposal, it would be very useful for HTML
340     interpreting user agents to allow an ACTION in a form to be a
341     "mailto:" URL. This seems like a good idea, with or without this
342     proposal. Similarly, the ACTION for an HTML form which is received
343     via mail should probably default to the "reply-to:" of the message.
344     These two proposals would allow HTML forms to be served via HTTP
345     servers but sent back via mail, or, alternatively, allow HTML forms
346     to be sent by mail, filled out by HTML-aware mail recipients, and
347     the results mailed back.
348    
349     5.9 Remote files with third-party transfer
350    
351     In some scenarios, the user operating the client software might want
352     to specify a URL for remote data rather than a local file. In this
353     case, is there a way to allow the browser to send to the server a
354     pointer to the external data rather than the entire contents? This
355     capability could be implemented, for example, by having the client
356     send to the server data of type "message/external-body" with
357     "access-type" set to, say, "uri", and the URL of the remote data in
358     the body of the message.
359    
360     5.10 File transfer with ENCTYPE=x-www-form-urlencoded
361    
362     If a form contains <INPUT TYPE=file> elements but does not contain
363     an ENCTYPE in the enclosing <FORM>, the behavior is not specified.
364     It is probably inappropriate to attempt to URL-encode large
365     quantities of data to servers that don't expect it.
366    
367     5.11 CRLF used as line separator
368    
369     As with all MIME transmissions, CRLF is used as the separator for
370     lines in a POST of the data in multipart/form-data.
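
For text fields, a client might normalize line endings before
transmission along the lines of the Python sketch below. The function
name is hypothetical, and the normalization is meant for text/plain
field values only, not for binary file data.

```python
import re


def to_crlf(text):
    """Rewrite bare CR, bare LF, and CRLF line breaks uniformly as CRLF."""
    return re.sub(r"\r\n|\r|\n", "\r\n", text)
```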
371    
372     6. Examples
373    
374     Suppose the server supplies the following HTML:
375    
376     <FORM ACTION="http://server.dom/cgi/handle"
377     ENCTYPE="multipart/form-data"
378     METHOD=POST>
379     What is your name? <INPUT TYPE=TEXT NAME=submitter>
380     What files are you sending? <INPUT TYPE=FILE NAME=pics>
381     </FORM>
382    
383     and the user types "Joe Blow" in the name field, and selects
384     a text file "file1.txt" and also an image file "file2.gif" for
385     the answer to 'What files are you sending?'.
386    
387     The client would send back the following data:
388    
389     Content-type: multipart/form-data; boundary=AaB03x
390     --AaB03x
391     Name: submitter
392    
393     Joe Blow
394     --AaB03x
395     Name: pics
396     Content-type: multipart/mixed; boundary=BbC04y
397    
398     --BbC04y
399     Content-Disposition: attachment; filename=file1.txt
400     Content-Type: text/plain
401     Content-Transfer-Encoding: binary
402    
403     ... contents of file1.txt ...
404     --BbC04y
405     Content-Disposition: attachment; filename=file2.gif
406     Content-type: image/gif
407     Content-Transfer-Encoding: binary
408    
409     ...contents of file2.gif...
410     --BbC04y--
411     --AaB03x--
412    
413    
414     7. Conclusion
415    
416     The suggested implementation gives the client a lot of flexibility in
417     the number and types of files it can send to the server; it gives the
418     server control of the decision to accept the files; and it gives
419     servers a chance to interact with browsers which do not support INPUT
420     TYPE "file".
421    
422     The change to the HTML DTD is very simple, but very powerful. It
423     enables a much greater variety of services to be implemented via the
424     World-Wide Web than is currently possible due to the lack of a file
425     submission facility. This would be an extremely valuable addition to
426     the capabilities of the World-Wide Web.
427    
428    
429     A. Authors' Addresses
430    
431     Larry Masinter masinter@parc.xerox.com
432     Xerox Palo Alto Research Center Voice: (415) 812-4365
433     3333 Coyote Hill Road Fax: (415) 812-4333
434     Palo Alto, CA 94304
435    
436     Ernesto Nebel nebel@xsoft.sd.xerox.com
437     XSoft, Xerox Corporation Voice: (619) 676-7817
438     10875 Rancho Bernardo Road, Suite 200 Fax: (619) 676-7865
439     San Diego, CA 92127-2116
440    
