Revision 1.1
Tue Jun 15 08:04:04 2004 UTC (19 years, 11 months ago) by wakaba
Branch: MAIN
CVS Tags: HEAD
File MIME type: text/plain
New

1 INTERNET-DRAFT E. Nebel
2 Form-based File Upload in HTML L. Masinter
3 draft-ietf-html-fileupload-02.txt Xerox Corporation
4 Expires in 6 months April 19, 1995
5
6
7 Form-based File Upload in HTML
8
9 Status of this Memo
10
11 This document is an Internet-Draft. Internet-Drafts are working
12 documents of the Internet Engineering Task Force (IETF), its areas,
13 and its working groups. Note that other groups may also distribute
14 working documents as Internet-Drafts.
15
16 Internet-Drafts are draft documents valid for a maximum of six months
17 and may be updated, replaced, or obsoleted by other documents at any
18 time. It is inappropriate to use Internet-Drafts as reference
19 material or to cite them other than as ``work in progress.''
20
21 To learn the current status of any Internet-Draft, please check the
22 ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
23 Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
24 munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
25 ftp.isi.edu (US West Coast).
26
27
28 1. Abstract
29
30 Currently, HTML forms allow the producer of the form to request
31 information from the user reading the form. These forms have proven
32 useful in a wide variety of applications in which input from the
33 user is necessary. However, this capability is limited because HTML
34 forms don't provide a way to ask the user to submit files of data.
35 Service providers who need to get files from the user have had to
36 implement custom user applications. (Examples of these custom
37 browsers have appeared on the www-talk mailing list.) Since
38 file-upload is a feature that will benefit many applications, this
39 draft proposes an extension to HTML to allow information providers
40 to express file upload requests uniformly, and a MIME compatible
41 representation for file upload responses. This draft also includes
42 a description of a backward compatibility strategy that allows new
43 servers to interact with the current HTML user agents.
44
45 The proposal is independent of the version of HTML of which it
46 becomes a part.
47
48 2. HTML forms with file submission
49
50 The current draft HTML specification defines eight possible values
51 for the attribute TYPE of an INPUT element: CHECKBOX, HIDDEN, IMAGE,
52 PASSWORD, RADIO, RESET, SUBMIT, TEXT.
53
54 In addition, it defines the ENCTYPE attribute of the FORM element
55 using the POST METHOD to have the default value
56 "application/x-www-form-urlencoded".
57
58 This proposal makes three changes:
59 1) Add a FILE option for the TYPE attribute of INPUT.
60 2) Allow an ACCEPT attribute for the INPUT tag, which is a list of
61 media types or type patterns allowed for the input.
62 3) Allow the ENCTYPE of a FORM to be "multipart/form-data".
63
64 These changes might be considered independently, but are all
65 necessary for reasonable file upload.
66
67 The author of an HTML form who wants to request one or more files
68 from a user would write (for example):
69
70 <FORM ENCTYPE="multipart/form-data" ACTION="_URL_" METHOD=POST>
71
72 File to process: <INPUT NAME="userfile1" TYPE="file">
73
74 <INPUT TYPE="submit" VALUE="Send File">
75
76 </FORM>
77
78 The change to the HTML DTD is to add one item to the entity
79 "InputType". In addition, it is proposed that the INPUT tag have an
80 ACCEPT attribute, which is a list of comma-separated media types.
81
82
83 ... (other elements) ...
84
85 <!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX |
86 RADIO | SUBMIT | RESET |
87 IMAGE | HIDDEN | FILE )">
88 <!ELEMENT INPUT - O EMPTY>
89 <!ATTLIST INPUT
90 TYPE %InputType TEXT
91 NAME CDATA #IMPLIED -- required for all but submit and reset --
92 VALUE CDATA #IMPLIED
93 SRC %URI #IMPLIED -- for image inputs --
94 CHECKED (CHECKED) #IMPLIED
95 SIZE CDATA #IMPLIED -- like NUMBERS,
96 but delimited with comma, not space --
97 MAXLENGTH NUMBER #IMPLIED
98 ALIGN (top|middle|bottom) #IMPLIED
99 ACCEPT CDATA #IMPLIED -- list of content types --
100 >
101
102 ... (other elements) ...
103
104 3. Suggested implementation
105
106 While user agents that interpret HTML have wide leeway to choose the
107 most appropriate mechanism for their context, this section suggests
108 how one class of user agent, WWW browsers, might implement file
109 upload.
110
111 3.1 Display of FILE widget
112
113 When an INPUT tag of type FILE is encountered, the browser might show
114 a display of (previously selected) file names, and a "Browse" button
115 or selection method. Selecting the "Browse" button would cause the
116 browser to enter into a file selection mode appropriate for the
117 platform. Window-based browsers might pop up a file selection
118 window, for example. In such a file selection dialog, the user would
119 have the option of replacing a current selection, adding a new file
120 selection, etc. Browser implementors might choose to let the list of
121 file names be manually edited.
122
123 If an ACCEPT attribute is present, the browser might constrain the
124 file selection to files whose platform-specific file extensions
125 correspond to the listed media types.
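
One way a browser could derive a file filter from ACCEPT is sketched below in Python. This is only an illustration under stated assumptions: the function name `extensions_for_accept` is hypothetical, type patterns are assumed to look like "image/*", and the mapping from media types to extensions comes from Python's standard `mimetypes` table rather than from anything this draft defines.

```python
import mimetypes


def extensions_for_accept(accept):
    """Map a comma-separated ACCEPT value to a set of file extensions.

    Hypothetical helper: the draft does not define how a browser maps
    media types to platform file extensions.
    """
    mimetypes.init()
    extensions = set()
    for pattern in (p.strip().lower() for p in accept.split(",")):
        if not pattern:
            continue
        if pattern.endswith("/*"):
            # Type pattern such as "image/*": collect every known subtype.
            prefix = pattern[:-1]  # keeps the slash, e.g. "image/"
            extensions.update(
                ext for ext, mtype in mimetypes.types_map.items()
                if mtype.startswith(prefix)
            )
        else:
            # Exact media type such as "text/plain".
            extensions.update(mimetypes.guess_all_extensions(pattern))
    return extensions
```

A file selection dialog could then offer only files whose names end in one of the returned extensions, while still letting the user override the filter.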
126
127 3.2 Action on submit
128
129 When the user completes the form, and selects the SUBMIT element,
130 the browser should send the form data and the content of the
131 selected files. The encoding type application/x-www-form-urlencoded
132 is inefficient for sending large quantities of binary data. Thus, a
133 new media type, multipart/form-data, is proposed as a way of
134 efficiently sending the values associated with a filled-out form
135 from client to server.
136
137
138 3.3 Use of multipart/form-data
139
140 The definition of multipart/form-data is included in section 7.
141 The media-type multipart/form-data follows the rules of all
142 multipart MIME data streams as outlined in RFC 1521--a boundary is
143 selected that does not occur in any of the data. Each field of the
144 form is sent, in the order in which it occurs in the form, as a part
145 of the multipart stream. Each part identifies the INPUT name within
146 the original HTML form using a "content-disposition: form-data" header
147 with a name attribute specifying the field name. Each part has an
148 optional Content-Type (which defaults to text/plain). File inputs
149 should be identified as either application/octet-stream or the
150 appropriate media type, if known. If multiple files were selected,
151 they should be transferred together using the multipart/mixed
152 format.
153
154 The "content-transfer-encoding" header should be supplied for all
155 fields whose values do not conform to the default 7BIT encoding
156 (that is, 7-bit US-ASCII data with lines no longer than 1000
157 characters). File data and longer field values may be
158 transferred using a content-transfer-encoding appropriate to the
159 protocol of the ACTION in the form. For HTTP applications, a
160 content-transfer-encoding of "binary" may be used. If the ACTION is
161 a "mailto:" URL, then the user agent may encode the data
162 appropriately for the mail transport mechanism. [See section 5 of
163 RFC 1521 for more details.]
164
165 File inputs may optionally identify the file name using the
166 "filename" attribute on the content-disposition header. This is not
167 required, but is a convenience for those cases where, for
168 example, the uploaded files might contain references to each other,
169 e.g., a TeX file and its .sty auxiliary style description.
170
171
172 On the server end, the ACTION might point to an HTTP URL that
173 implements the form's action via CGI. In such a case, the CGI program
174 would note that the content-type is multipart/form-data, parse the
175 various fields (checking for validity, writing the file data to local
176 files for subsequent processing, etc.).
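
A receiving program might parse the submission along the following lines. This Python sketch leans on the standard library's `email` package as a generic MIME parser, which is an implementation choice and not something this draft prescribes. The function name `parse_form_data` is hypothetical, and validity checks and writing to local files are left out.

```python
from email.parser import BytesParser


def parse_form_data(content_type, body):
    """Return {field name: [(filename or None, payload bytes), ...]}.

    content_type is the request's Content-Type header value and body
    is the raw request body.
    """
    # Prepend the header so the parser sees a complete MIME entity.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = BytesParser().parsebytes(raw)
    if message.get_content_type() != "multipart/form-data":
        raise ValueError("not a multipart/form-data submission")

    result = {}
    for part in message.get_payload():
        name = part.get_param("name", header="content-disposition")
        if part.is_multipart():
            # Several files for one field, sent as multipart/mixed.
            entries = [
                (sub.get_filename(), sub.get_payload(decode=True))
                for sub in part.get_payload()
            ]
        else:
            entries = [(part.get_filename(), part.get_payload(decode=True))]
        result.setdefault(name, []).extend(entries)
    return result
```

Plain fields come back with a file name of None, so the caller can tell them apart from uploaded files.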
177
178 3.4 Interpretation of other attributes
179
180 The VALUE attribute might be used with <INPUT TYPE=file> tags for
181 a default file name. This use is probably platform dependent.
182 It might be useful, however, in sequences of more than one
183 transaction, e.g., to avoid having the user prompted for the same
184 file name over and over again.
185
186 The SIZE attribute might be specified using SIZE=width,height, where
187 width is some default for file name width, while height is the
188 expected size showing the list of selected files. For example, this
189 would be useful for forms designers who expect to get several files
190 and who would like to show a multiline file input field in the
191 browser (with a "browse" button beside it, hopefully). It would be
192 useful to show a one line text field when no height is specified
193 (when the forms designer expects one file, only) and to show a
194 multiline text area with scrollbars when the height is greater than
195 1 (when the forms designer expects multiple files).
196
197 4. Backward compatibility issues
198
199 While not necessary for successful adoption of an enhancement to the
200 current WWW form mechanism, it is useful to also plan for a
201 migration strategy: users with older browsers can still participate
202 in file upload dialogs, using a helper application. Most current web
203 browsers, when given <INPUT TYPE=FILE>, will treat it as <INPUT
204 TYPE=TEXT> and give the user a text box. The user can type a file
205 name into this text box. In addition, current browsers seem to
206 ignore the ENCTYPE parameter in the <FORM> element, and always
207 transmit the data as application/x-www-form-urlencoded.
208
209 Thus, the server CGI might be written in a way that would note that
210 the form data returned had content-type
211 application/x-www-form-urlencoded instead of
212 multipart/form-data, and know that the user was using a browser
213 that didn't implement file upload.
214
215 In this case, rather than replying with a "text/html" response, the
216 CGI on the server could instead send back a data stream that a helper
217 application might process instead; this would be a data stream of
218 type "application/x-please-send-files", which contains:
219
220 * The (fully qualified) URL to which the actual form data should
221 be posted (terminated with CRLF)
222 * The list of field names that were supposed to be file contents
223 (space separated, terminated with CRLF)
224 * The entire original application/x-www-form-urlencoded form data
225 as originally sent from client to server.
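
The three-part stream listed above could be produced on the server roughly as follows. This is a Python sketch under assumptions: the function names are hypothetical, and the URL and field names are assumed to be plain ASCII.

```python
def is_old_browser(request_content_type):
    """True if the form data arrived URL-encoded rather than as multipart."""
    media_type = request_content_type.split(";")[0].strip().lower()
    return media_type == "application/x-www-form-urlencoded"


def please_send_files(action_url, file_fields, original_form_data):
    """Build the body of an application/x-please-send-files response.

    action_url:         fully qualified URL to post the real data to
    file_fields:        names of the fields that should carry file contents
    original_form_data: the URL-encoded bytes exactly as received
    """
    return (
        action_url.encode("ascii") + b"\r\n"            # URL, then CRLF
        + " ".join(file_fields).encode("ascii") + b"\r\n"  # space separated
        + original_form_data                             # untouched form data
    )
```

A CGI program would call `is_old_browser` on the incoming Content-Type and, when it returns true, reply with this stream instead of a "text/html" page.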
226
227 In this case, the browser needs to be configured to process
228 application/x-please-send-files to launch a helper application.
229
230 The helper would read the form data, note which fields contained
231 'local file names' that needed to be replaced with their data
232 content, might itself prompt the user for changing or adding to the
233 list of files available, and then repackage the data & file contents
234 in multipart/form-data for retransmission back to the server.
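
The first step of such a helper, reading the stream and separating ordinary fields from local file names, might look like this in Python. The function name `parse_please_send_files` is hypothetical; prompting the user and repackaging the result as multipart/form-data are deliberately left out of the sketch.

```python
from urllib.parse import parse_qsl


def parse_please_send_files(stream):
    """Split an application/x-please-send-files stream into its parts.

    Returns (action_url, text_fields, local_files), where text_fields
    and local_files are lists of (name, value) pairs and each value in
    local_files is a file name typed by the user.
    """
    url_line, rest = stream.split(b"\r\n", 1)
    names_line, form_data = rest.split(b"\r\n", 1)
    file_fields = names_line.decode("ascii").split()

    pairs = parse_qsl(form_data.decode("ascii"), keep_blank_values=True)
    text_fields = [(n, v) for n, v in pairs if n not in file_fields]
    local_files = [(n, v) for n, v in pairs if n in file_fields]
    return url_line.decode("ascii"), text_fields, local_files
```

The helper would then confirm each entry of `local_files` with the user before reading any file from disk.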
235
236 The helper would generate the kind of data that a 'new' browser should
237 actually have sent in the first place, with the intention that the URL
238 to which it is sent corresponds to the original ACTION URL. The point
239 of this is that the server can use the *same* CGI to implement the
240 mechanism for dealing with both old and new browsers.
241
242 The helper need not display the form data, but *should* ensure that
243 the user actually be prompted about the suitability of sending the
244 files requested (this is to avoid a security problem with malicious
245 servers that ask for files that weren't actually promised by the
246 user.) It would be useful if the status of the transfer of the files
247 involved could be displayed.
248
249 5. Other considerations
250
251 5.1 Compression, encryption
252
253 This scheme doesn't address the possible compression of files.
254 After some consideration, it seemed that the optimization issues of
255 file compression were too complex to try to automatically have
256 browsers decide that files should be compressed. Many link-layer
257 transport mechanisms (e.g., high-speed modems) perform data
258 compression over the link, and optimizing for compression at this
259 layer might not be appropriate. It might be possible for browsers to
260 optionally produce a content-transfer-encoding of x-compress for
261 file data, and for servers to decompress the data before processing,
262 if desired; this was left out of the proposal, however.
263
264 Similarly, the proposal does not contain a mechanism for encryption
265 of the data; this should be handled by whatever other mechanisms are
266 in place for secure transmission of data, whether via secure HTTP or
267 mail.
268
269 5.2 Deferred file transmission
270
271 In some situations, it might be advisable to have the server
272 validate various elements of the form data (user name, account,
273 etc.) before actually preparing to receive the data. However,
274 after some consideration, it seemed best to require that servers
275 that wish to do this should implement this as a series of forms,
276 where some of the data elements that were previously validated might
277 be sent back to the client as 'hidden' fields, or by arranging the
278 form so that the elements that need validation occur first. This
279 puts the onus of maintaining the state of a transaction only on
280 those servers that wish to build a complex application, while
281 allowing those cases that have simple input needs to be built
282 simply. Clients are encouraged to supply content-length for overall
283 file input so that a busy server could detect if the proposed file
284 data is too large to be processed reasonably and just return an
285 error code and close the connection without waiting to process all
286 of the incoming data.
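
The early rejection described above might be sketched as follows in Python. The draft only says "an error code"; the specific status values used here (413 for an oversized request, 400 for an unparsable length) are assumptions borrowed from later HTTP practice, and the function name is hypothetical.

```python
def early_reject_status(content_length_header, limit_bytes):
    """Decide from the declared length alone whether to refuse an upload.

    Returns an HTTP status code to send before reading the body, or
    None if the server should go ahead and read the data.
    """
    if content_length_header is None:
        # No declared length: nothing can be decided in advance.
        return None
    try:
        declared = int(content_length_header)
    except ValueError:
        return 400  # assumed status for a malformed header
    if declared < 0:
        return 400
    # 413 is an assumed choice; the draft does not name a status code.
    return 413 if declared > limit_bytes else None
```

Because content-length is only encouraged, not required, the sketch accepts requests that omit it and leaves any later size enforcement to the code that reads the body.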
287
288 If the INPUT tag includes the attribute MAXLENGTH, the user agent
289 should consider its value to represent the maximum Content-Length
290 (in bytes) which the server will accept for transferred files. In
291 this way, servers can hint to the client how much space they have
292 available for a file upload, before that upload takes place. It is
293 important to note, however, that this is only a hint, and the actual
294 requirements of the server may change between form creation and file
295 submission.
296
297 5.3 Other choices for return transmission of binary data
298
299 Various people have suggested using a new MIME top-level type
300 "aggregate", e.g., aggregate/mixed or a content-transfer-encoding of
301 "packet" to express indeterminate-length binary data, rather than
302 relying on the multipart-style boundaries. While we are not opposed
303 to doing so, this would require additional design and
304 standardization work to get acceptance of "aggregate". On the other
305 hand, the 'multipart' mechanisms are well established, simple to
306 implement on both the sending client and receiving server, and as
307 efficient as other methods of dealing with multiple combinations of
308 binary data.
309
310 5.4 Not overloading <INPUT>
311
312 Various people have wondered about the advisability of overloading
313 'INPUT' for this function, rather than merely providing a different
314 type of FORM element. Among other considerations, the migration
315 strategy which is allowed when using <INPUT> is important. In
316 addition, the <INPUT> field *is* already overloaded to contain most
317 kinds of data input; rather than creating multiple kinds of <INPUT>
318 tags, it seems most reasonable to enhance <INPUT>. The 'type' of
319 INPUT is not the content-type of what is returned, but rather the
320 'widget-type'; i.e., it identifies the interaction style with the
321 user. The description here is carefully written to allow <INPUT
322 TYPE=FILE> to work for text browsers or audio-markup.
323
324 5.5 Default content-type of field data
325
326 Many input fields in HTML are to be typed in. There has been some
327 ambiguity as to how form data should be transmitted back to servers.
328 Making the content-type of <INPUT> fields text/plain makes it
329 clear that the client should encode the data properly, with CRLF
330 line breaks, before sending it back to the server.
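
As a small illustration of that encoding step, a client could normalize whatever line endings the platform's text widget produces before placing the value in a text/plain part. The helper name `to_crlf` is hypothetical.

```python
import re


def to_crlf(text):
    """Rewrite bare CR, bare LF and CRLF line breaks uniformly as CRLF."""
    # The alternation tries "\r\n" first so an existing CRLF is not doubled.
    return re.sub(r"\r\n|\r|\n", "\r\n", text)
```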
331
332 5.6 Allow form ACTION to be "mailto:"
333
334 Independent of this proposal, it would be very useful for HTML
335 interpreting user agents to allow an ACTION in a form to be a
336 "mailto:" URL. This seems like a good idea, with or without this
337 proposal. Similarly, the ACTION for an HTML form which is received
338 via mail should probably default to the "reply-to:" of the message.
339 These two proposals would allow HTML forms to be served via HTTP
340 servers but sent back via mail, or, alternatively, allow HTML forms
341 to be sent by mail, filled out by HTML-aware mail recipients, and
342 the results mailed back.
343
344 5.7 Remote files with third-party transfer
345
346 In some scenarios, the user operating the client software might want
347 to specify a URL for remote data rather than a local file. In this
348 case, is there a way to allow the browser to send to the server a
349 pointer to the external data rather than the entire contents? This
350 capability could be implemented, for example, by having the client
351 send to the server data of type "message/external-body" with
352 "access-type" set to, say, "uri", and the URL of the remote data in
353 the body of the message.
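
Purely as an illustration of that idea, one part of the multipart stream might be built like this in Python. The "uri" access-type is the draft's own tentative suggestion, not a registered value, and the function name is hypothetical.

```python
def external_body_part(field_name, url):
    """Build one form-data part that points at remote data by URL.

    Sketch only: access-type "uri" is the tentative value suggested in
    the text, and the URL is carried in the body of the part.
    """
    return (
        f'content-disposition: form-data; name="{field_name}"\r\n'
        "Content-Type: message/external-body; access-type=uri\r\n"
        "\r\n"
        f"{url}\r\n"
    ).encode("ascii")
```

The server would then have to fetch the data itself, which raises its own trust questions that this sketch does not address.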
354
355 5.8 File transfer with ENCTYPE=application/x-www-form-urlencoded
356
357 If a form contains <INPUT TYPE=file> elements but does not contain
358 an ENCTYPE in the enclosing <FORM>, the behavior is not specified.
359 It is probably inappropriate to attempt to URL-encode large
360 quantities of data to servers that don't expect it.
361
362 5.9 CRLF used as line separator
363
364 As with all MIME transmissions, CRLF is used as the separator for
365 lines in a POST of the data in multipart/form-data.
366
367 6. Examples
368
369 Suppose the server supplies the following HTML:
370
371 <FORM ACTION="http://server.dom/cgi/handle"
372 ENCTYPE="multipart/form-data"
373 METHOD=POST>
374 What is your name? <INPUT TYPE=TEXT NAME=submitter>
375 What files are you sending? <INPUT TYPE=FILE NAME=pics>
376 </FORM>
377
378 and the user types "Joe Blow" in the name field, and selects
379 a text file "file1.txt" and also an image file "file2.gif" for
380 the answer to 'What files are you sending?'.
381
382 The client would send back the following data:
383
384 Content-type: multipart/form-data; boundary=AaB03x
385 --AaB03x
386 content-disposition: form-data; name="submitter"
387
388 Joe Blow
389 --AaB03x
390 content-disposition: form-data; name="pics"
391 Content-type: multipart/mixed; boundary=BbC04y
392
393 --BbC04y
394 Content-disposition: attachment; filename="file1.txt"
395 Content-Type: text/plain
396 Content-Transfer-Encoding: binary
397
398 ... contents of file1.txt ...
399 --BbC04y
400 Content-disposition: attachment; filename="file2.gif"
401 Content-type: image/gif
402 Content-Transfer-Encoding: binary
403
404 ...contents of file2.gif...
405 --BbC04y--
406 --AaB03x--
407
408
409 7. Registration of multipart/form-data
410
411 The media-type multipart/form-data follows the rules of all
412 multipart MIME data streams as outlined in RFC 1521. It is intended
413 for use in returning the data that comes about from filling out a
414 form. In a form (in HTML, although other applications may also use
415 forms), there are a series of fields to be supplied by the user who
416 fills out the form. Each field has a name. The name of the field
417 is restricted to be a set of US-ASCII graphic characters; within a
418 given form, the names are unique.
419
420 multipart/form-data contains a series of parts. Each part is expected
421 to contain a content-disposition header where the value is
422 "form-data" and a name attribute specifies the field name within the
423 form, e.g., 'content-disposition: form-data; name="xxxxx"', where
424 xxxxx is the field name corresponding to that field. As with all
425 multipart MIME types, each part has an optional Content-Type which
426 defaults to text/plain.
427
428 Note that MIME headers are generally required to consist only of
429 7-bit data in the US-ASCII character set. This specification thus
430 requires that the field names used consist of 7-bit US-ASCII
431 characters.
432
433 If the contents of a file are returned via filling out a form, then
434 the file input is identified as application/octet-stream or the
435 appropriate media type, if known. If multiple files are to be
436 returned as the result of a single form entry, they can be returned
437 as multipart/mixed embedded within the multipart/form-data.
438
439 The "content-transfer-encoding" header should be supplied for all
440 fields whose values do not conform to the default 7BIT encoding
441 (that is, 7-bit US-ASCII data with lines no longer than 1000
442 characters).
443
444 File data and longer field values may be
445 transferred using a content-transfer-encoding appropriate to the
446 protocol of the ACTION in the form. For HTTP applications, a
447 content-transfer-encoding of "binary" may be used. If the ACTION is
448 a "mailto:" URL, then the user agent may encode the data
449 appropriately for the mail transport mechanism. [See section 5 of
450 RFC 1521 for more details.]
451
452 File inputs may also identify the file name. The file name may be
453 described using the 'filename' parameter of the
454 "content-disposition" header. This is not required, but is strongly
455 recommended in any case where the original filename is known. This
456 is useful or necessary in many applications.
457
458
459 8. Security Considerations
460
461 It is important that a user agent not send any file that the user
462 has not asked to be sent, explicitly. Thus, HTML interpreting agents
463 are expected to confirm any default file names that might be
464 suggested with <INPUT TYPE=file VALUE="yyyy">. Hidden fields must
465 never be able to specify a file.
466
467 9. Conclusion
468
469 The suggested implementation gives the client a lot of flexibility in
470 the number and types of files it can send to the server; it gives the
471 server control of the decision to accept the files; and it gives
472 servers a chance to interact with browsers which do not support INPUT
473 TYPE "file".
474
475 The change to the HTML DTD is very simple, but very powerful. It
476 enables a much greater variety of services to be implemented via the
477 World-Wide Web than is currently possible due to the lack of a file
478 submission facility. This would be an extremely valuable addition to
479 the capabilities of the World-Wide Web.
480
481
482 A. Authors' Addresses
483
484 Larry Masinter masinter@parc.xerox.com
485 Xerox Palo Alto Research Center Voice: (415) 812-4365
486 3333 Coyote Hill Road Fax: (415) 812-4333
487 Palo Alto, CA 94304
488
489 Ernesto Nebel nebel@xsoft.sd.xerox.com
490 XSoft, Xerox Corporation Voice: (619) 676-7817
491 10875 Rancho Bernardo Road, Suite 200 Fax: (619) 676-7865
492 San Diego, CA 92127-2116
493
494
495 B. Media type registration for multipart/form-data
496 Media Type name:
497 multipart
498
499 Media subtype name:
500 form-data
501
502 Required parameters:
503 none
504
505 Optional parameters:
506 none
507
508 Encoding considerations:
509 No additional considerations other than as for other multipart types.
510
511 Published specification:
512 draft-ietf-html-fileupload-02.txt
513
514 Security Considerations:
515
516 The multipart/form-data type introduces no new security
517 considerations beyond what might occur with any of the enclosed
518 parts.
519
520 Person & email address to contact for further information:
521
522 Larry Masinter
523 masinter@parc.xerox.com
524
525
526
527
