INTERNET-DRAFT                                                  E. Nebel
Form-based File Upload in HTML                               L. Masinter
draft-ietf-html-fileupload-02.txt                      Xerox Corporation
Expires in 6 months                                       April 19, 1995


                     Form-based File Upload in HTML

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the
``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).

1. Abstract

Currently, HTML forms allow the producer of the form to request
information from the user reading the form. These forms have proven
useful in a wide variety of applications in which input from the
user is necessary. However, this capability is limited because HTML
forms don't provide a way to ask the user to submit files of data.
Service providers who need to get files from the user have had to
implement custom user applications. (Examples of these custom
browsers have appeared on the www-talk mailing list.) Since
file-upload is a feature that will benefit many applications, this
draft proposes an extension to HTML to allow information providers
to express file upload requests uniformly, and a MIME-compatible
representation for file upload responses. This draft also includes
a description of a backward compatibility strategy that allows new
servers to interact with the current HTML user agents.

The proposal is independent of the version of HTML of which it
becomes a part.

2. HTML forms with file submission

The current draft HTML specification defines eight possible values
for the attribute TYPE of an INPUT element: CHECKBOX, HIDDEN, IMAGE,
PASSWORD, RADIO, RESET, SUBMIT, TEXT.

In addition, it defines the ENCTYPE attribute of the FORM element
using the POST METHOD to have the default value
"application/x-www-form-urlencoded".

This proposal makes three changes:
  1) add a FILE option for the TYPE attribute of INPUT
  2) allow an ACCEPT attribute for the INPUT tag, which is a list of
     media types or type patterns allowed for the input
  3) allow the ENCTYPE of a FORM to be "multipart/form-data"

These changes might be considered independently, but are all
necessary for reasonable file upload.

The author of an HTML form who wants to request one or more files
from a user would write (for example):

    <FORM ENCTYPE="multipart/form-data" ACTION="_URL_" METHOD=POST>

    File to process: <INPUT NAME="userfile1" TYPE="file">

    <INPUT TYPE="submit" VALUE="Send File">

    </FORM>

The change to the HTML DTD is to add one item to the entity
"InputType". In addition, it is proposed that the INPUT tag have an
ACCEPT attribute, which is a list of comma-separated media types.

    ... (other elements) ...

    <!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX |
                           RADIO | SUBMIT | RESET |
                           IMAGE | HIDDEN | FILE )">
    <!ELEMENT INPUT - O EMPTY>
    <!ATTLIST INPUT
            TYPE %InputType TEXT
            NAME CDATA #IMPLIED  -- required for all but submit and reset --
            VALUE CDATA #IMPLIED
            SRC %URI #IMPLIED  -- for image inputs --
            CHECKED (CHECKED) #IMPLIED
            SIZE CDATA #IMPLIED  -- like NUMBERS,
                                    but delimited with comma, not space --
            MAXLENGTH NUMBER #IMPLIED
            ALIGN (top|middle|bottom) #IMPLIED
            ACCEPT CDATA #IMPLIED  -- list of content types --
            >

    ... (other elements) ...

3. Suggested implementation

While user agents that interpret HTML have wide leeway to choose the
most appropriate mechanism for their context, this section suggests
how one class of user agent, WWW browsers, might implement file
upload.

3.1 Display of FILE widget

When an INPUT tag of type FILE is encountered, the browser might show
a display of (previously selected) file names, and a "Browse" button
or selection method. Selecting the "Browse" button would cause the
browser to enter into a file selection mode appropriate for the
platform. Window-based browsers might pop up a file selection
window, for example. In such a file selection dialog, the user would
have the option of replacing a current selection, adding a new file
selection, etc. Browser implementors might choose to let the list of
file names be manually edited.

If an ACCEPT attribute is present, the browser might constrain the
file patterns prompted for to those with the file extensions that
correspond to the listed media types on the platform.

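As an illustration of how a browser might turn an ACCEPT list into
file extensions for its selection dialog, here is a minimal sketch in
Python. It assumes the media-type table shipped with Python's
standard "mimetypes" module stands in for the platform's own mapping,
and it assumes that type patterns use "*" as a wildcard (for example
"image/*"); the function name extensions_for_accept is hypothetical.

```python
import fnmatch
import mimetypes


def extensions_for_accept(accept):
    """Return the set of file extensions matching an ACCEPT attribute.

    `accept` is a comma-separated list of media types or type patterns,
    e.g. "text/plain, image/*". The "*" wildcard syntax is an assumption.
    """
    patterns = [p.strip().lower() for p in accept.split(",") if p.strip()]
    return {
        ext
        for ext, media_type in mimetypes.types_map.items()
        if any(fnmatch.fnmatchcase(media_type, p) for p in patterns)
    }
```

A browser could then offer only files whose names end in one of the
returned extensions, while still letting the user override the
filter.
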
3.2 Action on submit

When the user completes the form, and selects the SUBMIT element,
the browser should send the form data and the content of the
selected files. The encoding type application/x-www-form-urlencoded
is inefficient for sending large quantities of binary data. Thus, a
new media type, multipart/form-data, is proposed as a way of
efficiently sending the values associated with a filled-out form
from client to server.

3.3 Use of multipart/form-data

The definition of multipart/form-data is included in section 7.
The media-type multipart/form-data follows the rules of all
multipart MIME data streams as outlined in RFC 1521: a boundary is
selected that does not occur in any of the data. Each field of the
form is sent, in the order in which it occurs in the form, as a part
of the multipart stream. Each part identifies the INPUT name within
the original HTML form using a "content-disposition: form-data" header
with a name attribute specifying the field name. Each part has an
optional Content-Type (which defaults to text/plain). File inputs
should be identified as either application/octet-stream or the
appropriate media type, if known. If multiple files were selected,
they should be transferred together using the multipart/mixed
format.

The "content-transfer-encoding" header should be supplied for all
fields whose values do not conform to the default 7BIT encoding
(7-bit US-ASCII data with lines no longer than 1000 characters).
Otherwise, file data and longer field values may be
transferred using a content-transfer-encoding appropriate to the
protocol of the ACTION in the form. For HTTP applications, a
content-transfer-encoding of "binary" may be used. If the ACTION is
a "mailto:" URL, then the user agent may encode the data
appropriately to the mail transport mechanism. [See section 5 of
RFC 1521 for more details.]

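The 7BIT test described above can be sketched as follows. This is
only an illustration: the function names are hypothetical, and it
assumes that the 1000-character limit counts the trailing CRLF of
each line, which the text above does not state.

```python
def conforms_to_7bit(value):
    """True if `value` (bytes) is 7-bit data with acceptably short lines."""
    if any(byte > 127 for byte in value):
        return False
    # Assumption: the 1000-character limit includes the CRLF terminator.
    return all(len(line) + 2 <= 1000 for line in value.split(b"\r\n"))


def transfer_encoding_for_http(value):
    """Return the content-transfer-encoding to declare, or None.

    None means the default 7BIT encoding applies and no header is needed.
    "binary" is the value suggested for HTTP applications.
    """
    return None if conforms_to_7bit(value) else "binary"
```

A "mailto:" ACTION would need a different choice, such as base64,
because mail transport generally cannot carry binary data.
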
File inputs may optionally identify the file name using the
"filename" attribute on the content-disposition header. This is not
required, but is a convenience for those cases where the uploaded
files might contain references to each other, e.g., a TeX file and
its .sty auxiliary style description.

On the server end, the ACTION might point to an HTTP URL that
implements the form's action via CGI. In such a case, the CGI program
would note that the content-type is multipart/form-data and parse the
various fields (checking for validity, writing the file data to local
files for subsequent processing, etc.).

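To make the client side of this section concrete, the following
Python sketch encodes a list of fields as multipart/form-data. It is
a minimal illustration under stated assumptions, not a complete
implementation: the function name build_form_data and its argument
shapes are hypothetical, field values are assumed to be US-ASCII, at
most one file per field is handled (several files for one field
would need a nested multipart/mixed part), and the boundary is a
random string that is checked against each value.

```python
import uuid

CRLF = b"\r\n"


def build_form_data(fields, boundary=None):
    """Encode (name, value) pairs, in form order, as multipart/form-data.

    A value is either a str (an ordinary field) or a tuple
    (filename, media_type, data) describing one selected file.
    Returns (content_type_header_value, body_bytes).
    """
    boundary = boundary or uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, value in fields:
        disposition = f'content-disposition: form-data; name="{name}"'
        if isinstance(value, str):
            headers = [disposition]  # Content-Type defaults to text/plain
            data = value.encode("ascii")
        else:
            filename, media_type, data = value
            headers = [
                f'{disposition}; filename="{filename}"',
                f"Content-Type: {media_type or 'application/octet-stream'}",
                "Content-Transfer-Encoding: binary",
            ]
        # The boundary must not occur in any of the data.
        if delimiter in data:
            raise ValueError("boundary occurs in the data; choose another")
        lines.append(delimiter)
        lines.extend(header.encode("ascii") for header in headers)
        lines.append(b"")  # blank line ends the part headers
        lines.append(data)
    lines.append(delimiter + b"--")  # closing delimiter
    lines.append(b"")
    return f"multipart/form-data; boundary={boundary}", CRLF.join(lines)
```

The first returned value would be sent as the Content-Type of the
POST request and the second as its body.
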
3.4 Interpretation of other attributes

The VALUE attribute might be used with <INPUT TYPE=file> tags for
a default file name. This use is probably platform dependent.
It might be useful, however, in sequences of more than one
transaction, e.g., to avoid having the user prompted for the same
file name over and over again.

The SIZE attribute might be specified using SIZE=width,height, where
width is some default for file name width, while height is the
expected size showing the list of selected files. For example, this
would be useful for forms designers who expect to get several files
and who would like to show a multiline file input field in the
browser (with a "browse" button beside it, hopefully). It would be
useful to show a one line text field when no height is specified
(when the forms designer expects one file, only) and to show a
multiline text area with scrollbars when the height is greater than
1 (when the forms designer expects multiple files).

4. Backward compatibility issues

While not necessary for successful adoption of an enhancement to the
current WWW form mechanism, it is useful to also plan for a
migration strategy: users with older browsers can still participate
in file upload dialogs, using a helper application. Most current web
browsers, when given <INPUT TYPE=FILE>, will treat it as <INPUT
TYPE=TEXT> and give the user a text box. The user can type a file
name into this text box. In addition, current browsers seem to
ignore the ENCTYPE parameter in the <FORM> element, and always
transmit the data as application/x-www-form-urlencoded.

Thus, the server CGI might be written in a way that would note that
the form data returned had content-type
application/x-www-form-urlencoded instead of
multipart/form-data, and know that the user was using a browser
that didn't implement file upload.

In this case, rather than replying with a "text/html" response, the
CGI on the server could instead send back a data stream that a helper
application might process instead; this would be a data stream of
type "application/x-please-send-files", which contains:

  * The (fully qualified) URL to which the actual form data should
    be posted (terminated with CRLF)
  * The list of field names that were supposed to be file contents
    (space separated, terminated with CRLF)
  * The entire original application/x-www-form-urlencoded form data
    as originally sent from client to server.

In this case, the browser needs to be configured to process
application/x-please-send-files to launch a helper application.

The helper would read the form data, note which fields contained
'local file names' that needed to be replaced with their data
content, might itself prompt the user for changing or adding to the
list of files available, and then repackage the data & file contents
in multipart/form-data for retransmission back to the server.

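The three-part "application/x-please-send-files" stream is simple
enough to sketch in a few lines of Python. The function names below
are hypothetical, and the sketch assumes the stream is handled as
text and that field names contain no spaces; it shows only how a
server might produce the stream and how a helper might split it
apart again.

```python
from urllib.parse import parse_qsl


def build_please_send_files(action_url, file_fields, original_form_data):
    """Server side: assemble the stream sent back to an old browser."""
    return "\r\n".join([action_url, " ".join(file_fields), original_form_data])


def parse_please_send_files(payload):
    """Helper side: recover the URL, the file fields, and the form data."""
    action_url, names_line, form_data = payload.split("\r\n", 2)
    # keep_blank_values preserves fields the user left empty
    pairs = parse_qsl(form_data, keep_blank_values=True)
    return action_url, names_line.split(), pairs
```

After parsing, the helper would replace the value of each listed
file field with the contents of the named local file, subject to the
user's confirmation, and post the result as multipart/form-data to
the returned URL.
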
The helper would generate the kind of data that a 'new' browser should
actually have sent in the first place, with the intention that the URL
to which it is sent corresponds to the original ACTION URL. The point
of this is that the server can use the *same* CGI to implement the
mechanism for dealing with both old and new browsers.

The helper need not display the form data, but *should* ensure that
the user is actually prompted about the suitability of sending the
files requested. (This is to avoid a security problem with malicious
servers that ask for files that weren't actually promised by the
user.) It would be useful if the status of the transfer of the files
involved could be displayed.

5. Other considerations

5.1 Compression, encryption

This scheme doesn't address the possible compression of files.
After some consideration, it seemed that the optimization issues of
file compression were too complex to try to automatically have
browsers decide that files should be compressed. Many link-layer
transport mechanisms (e.g., high-speed modems) perform data
compression over the link, and optimizing for compression at this
layer might not be appropriate. It might be possible for browsers to
optionally produce a content-transfer-encoding of x-compress for
file data, and for servers to decompress the data before processing,
if desired; this was left out of the proposal, however.

Similarly, the proposal does not contain a mechanism for encryption
of the data; this should be handled by whatever other mechanisms are
in place for secure transmission of data, whether via secure HTTP or
mail.

5.2 Deferred file transmission

In some situations, it might be advisable to have the server
validate various elements of the form data (user name, account,
etc.) before actually preparing to receive the data. However,
after some consideration, it seemed best to require that servers
that wish to do this should implement this as a series of forms,
where some of the data elements that were previously validated might
be sent back to the client as 'hidden' fields, or by arranging the
form so that the elements that need validation occur first. This
puts the onus of maintaining the state of a transaction only on
those servers that wish to build a complex application, while
allowing those cases that have simple input needs to be built
simply. Clients are encouraged to supply content-length for overall
file input so that a busy server could detect if the proposed file
data is too large to be processed reasonably and just return an
error code and close the connection without waiting to process all
of the incoming data.

If the INPUT tag includes the attribute MAXLENGTH, the user agent
should consider its value to represent the maximum Content-Length
(in bytes) which the server will accept for transferred files. In
this way, servers can hint to the client how much space they have
available for a file upload, before that upload takes place. It is
important to note, however, that this is only a hint, and the actual
requirements of the server may change between form creation and file
submission.

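Both size checks described in this section reduce to a comparison,
sketched below in Python. The function names are hypothetical. The
sketch assumes that a missing or malformed Content-Length is not by
itself grounds for rejection, a policy choice this proposal leaves
open.

```python
def declared_too_large(content_length, limit):
    """Server side: True if the declared Content-Length exceeds `limit`.

    `content_length` is the raw header value (str) or None if absent.
    """
    try:
        return int(content_length) > limit
    except (TypeError, ValueError):
        # Assumption: without a usable declaration the decision is deferred.
        return False


def fits_maxlength(body_length, maxlength):
    """Client side: compare the request body size with the MAXLENGTH hint.

    `maxlength` is the attribute value (str or int) or None if absent.
    """
    return maxlength is None or body_length <= int(maxlength)
```

Because MAXLENGTH is only a hint, a server would still apply its own
check when the request arrives, even if the client has already
compared the body size against the attribute.
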
5.3 Other choices for return transmission of binary data

Various people have suggested using a new MIME top-level type
"aggregate", e.g., aggregate/mixed, or a content-transfer-encoding of
"packet" to express indeterminate-length binary data, rather than
relying on the multipart-style boundaries. While we are not opposed
to doing so, this would require additional design and
standardization work to get acceptance of "aggregate". On the other
hand, the 'multipart' mechanisms are well established, simple to
implement on both the sending client and receiving server, and as
efficient as other methods of dealing with multiple combinations of
binary data.

5.4 Not overloading <INPUT>

Various people have wondered about the advisability of overloading
'INPUT' for this function, rather than merely providing a different
type of FORM element. Among other considerations, the migration
strategy which is allowed when using <INPUT> is important. In
addition, the <INPUT> field *is* already overloaded to contain most
kinds of data input; rather than creating multiple kinds of <INPUT>
tags, it seems most reasonable to enhance <INPUT>. The 'type' of
INPUT is not the content-type of what is returned, but rather the
'widget-type'; i.e., it identifies the interaction style with the
user. The description here is carefully written to allow <INPUT
TYPE=FILE> to work for text browsers or audio-markup.

5.5 Default content-type of field data

Many input fields in HTML are to be typed in. There has been some
ambiguity as to how form data should be transmitted back to servers.
Making the content-type of <INPUT> fields be text/plain makes it
clear that the client should properly encode the data, with CRLF
line breaks, before sending it back to the server.

5.6 Allow form ACTION to be "mailto:"

Independent of this proposal, it would be very useful for HTML
interpreting user agents to allow an ACTION in a form to be a
"mailto:" URL. This seems like a good idea, with or without this
proposal. Similarly, the ACTION for an HTML form which is received
via mail should probably default to the "reply-to:" of the message.
These two proposals would allow HTML forms to be served via HTTP
servers but sent back via mail, or, alternatively, allow HTML forms
to be sent by mail, filled out by HTML-aware mail recipients, and
the results mailed back.

5.7 Remote files with third-party transfer

In some scenarios, the user operating the client software might want
to specify a URL for remote data rather than a local file. In this
case, is there a way to allow the browser to send to the server a
pointer to the external data rather than the entire contents? This
capability could be implemented, for example, by having the client
send to the server data of type "message/external-body" with
"access-type" set to, say, "uri", and the URL of the remote data in
the body of the message.

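As a rough picture of what such a part could look like, the Python
sketch below builds one part of a multipart/form-data body that
points at remote data. Everything here is illustrative: the function
name is hypothetical, and the "uri" access-type is only the tentative
value floated above, not a registered one.

```python
def external_body_part(field_name, url):
    """Build one form-data part that refers to remote data by URL.

    Returns the part (headers, blank line, body) as bytes, without the
    surrounding multipart boundary lines.
    """
    lines = [
        f'content-disposition: form-data; name="{field_name}"',
        # "uri" is the tentative access-type suggested in the text.
        'Content-Type: message/external-body; access-type="uri"',
        "",
        url,
    ]
    return "\r\n".join(lines).encode("ascii")
```

A server receiving such a part would have to fetch the data itself,
which raises its own trust questions that this draft does not
address.
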
5.8 File transfer with ENCTYPE=x-www-form-urlencoded

If a form contains <INPUT TYPE=file> elements but does not contain
an ENCTYPE in the enclosing <FORM>, the behavior is not specified.
It is probably inappropriate to attempt to URL-encode large
quantities of data to servers that don't expect it.

5.9 CRLF used as line separator

As with all MIME transmissions, CRLF is used as the separator for
lines in a POST of the data in multipart/form-data.

6. Examples

Suppose the server supplies the following HTML:

    <FORM ACTION="http://server.dom/cgi/handle"
          ENCTYPE="multipart/form-data"
          METHOD=POST>
    What is your name? <INPUT TYPE=TEXT NAME=submitter>
    What files are you sending? <INPUT TYPE=FILE NAME=pics>
    </FORM>

and the user types "Joe Blow" in the name field, and selects
a text file "file1.txt" and also an image file "file2.gif" for
the answer to 'What files are you sending?'.

The client would send back the following data:

    Content-type: multipart/form-data; boundary=AaB03x

    --AaB03x
    content-disposition: form-data; name="submitter"

    Joe Blow
    --AaB03x
    content-disposition: form-data; name="pics"
    Content-type: multipart/mixed; boundary=BbC04y

    --BbC04y
    Content-disposition: attachment; filename="file1.txt"
    Content-Type: text/plain
    Content-Transfer-Encoding: binary

    ... contents of file1.txt ...
    --BbC04y
    Content-disposition: attachment; filename="file2.gif"
    Content-type: image/gif
    Content-Transfer-Encoding: binary

    ...contents of file2.gif...
    --BbC04y--
    --AaB03x--

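For the receiving side, the following Python sketch parses a message
shaped like the example in this section using the standard "email"
package, which implements the MIME multipart rules. It is a sketch
under assumptions, not a CGI implementation: the function name
parse_form is hypothetical, the sample message writes its parameters
with semicolons as MIME requires and names the first part after the
form's "submitter" field, and a real server would stream file data to
disk rather than hold it in memory.

```python
from email import message_from_bytes

EXAMPLE = (
    b"Content-Type: multipart/form-data; boundary=AaB03x\r\n"
    b"\r\n"
    b"--AaB03x\r\n"
    b'content-disposition: form-data; name="submitter"\r\n'
    b"\r\n"
    b"Joe Blow\r\n"
    b"--AaB03x\r\n"
    b'content-disposition: form-data; name="pics"\r\n'
    b"Content-Type: multipart/mixed; boundary=BbC04y\r\n"
    b"\r\n"
    b"--BbC04y\r\n"
    b'Content-Disposition: attachment; filename="file1.txt"\r\n'
    b"Content-Type: text/plain\r\n"
    b"Content-Transfer-Encoding: binary\r\n"
    b"\r\n"
    b"... contents of file1.txt ...\r\n"
    b"--BbC04y\r\n"
    b'Content-Disposition: attachment; filename="file2.gif"\r\n'
    b"Content-Type: image/gif\r\n"
    b"Content-Transfer-Encoding: binary\r\n"
    b"\r\n"
    b"...contents of file2.gif...\r\n"
    b"--BbC04y--\r\n"
    b"--AaB03x--\r\n"
)


def parse_form(raw):
    """Map each field name to its value.

    Ordinary fields map to bytes; a field carrying several files maps to
    a list of (filename, media_type, data) tuples.
    """
    fields = {}
    for part in message_from_bytes(raw).get_payload():
        name = part.get_param("name", header="content-disposition")
        if part.is_multipart():  # nested multipart/mixed: several files
            fields[name] = [
                (f.get_filename(), f.get_content_type(), f.get_payload(decode=True))
                for f in part.get_payload()
            ]
        else:
            fields[name] = part.get_payload(decode=True)
    return fields
```

Calling parse_form(EXAMPLE) yields one entry for "submitter" and one
for "pics", the latter holding a tuple for each of the two files.
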
7. Registration of multipart/form-data

The media-type multipart/form-data follows the rules of all
multipart MIME data streams as outlined in RFC 1521. It is intended
for use in returning the data that comes about from filling out a
form. In a form (in HTML, although other applications may also use
forms), there are a series of fields to be supplied by the user who
fills out the form. Each field has a name. The name of the field
is restricted to be a set of US-ASCII graphic characters; within a
given form, the names are unique.

multipart/form-data contains a series of parts. Each part is expected
to contain a content-disposition header where the value is
"form-data" and a name attribute specifies the field name within the
form, e.g., 'content-disposition: form-data; name="xxxxx"', where
xxxxx is the field name corresponding to that field. As with all
multipart MIME types, each part has an optional Content-Type which
defaults to text/plain.

Note that MIME headers are generally required to consist only of
7-bit data in the US-ASCII character set. This specification thus
requires that the field names used consist of 7-bit US-ASCII
characters.

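The restriction on field names can be checked mechanically. The
Python sketch below is one possible reading: it assumes that
"US-ASCII graphic characters" means the printing characters 0x21
through 0x7E, so the space character is excluded, and the function
name is hypothetical.

```python
def is_valid_field_name(name):
    """True if `name` is non-empty and uses only US-ASCII graphic characters.

    Assumption: "graphic" means code points 0x21 through 0x7E (no space).
    """
    return bool(name) and all(0x21 <= ord(ch) <= 0x7E for ch in name)
```

Note that a name containing a double quote passes this check but
would still need care when written inside the quoted name attribute
of the content-disposition header.
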
If the contents of a file are returned via filling out a form, then
the file input is identified as application/octet-stream or the
appropriate media type, if known. If multiple files are to be
returned as the result of a single form entry, they can be returned
as multipart/mixed embedded within the multipart/form-data.

The "content-transfer-encoding" header should be supplied for all
fields whose values do not conform to the default 7BIT encoding
(7-bit US-ASCII data with lines no longer than 1000 characters).

Otherwise, file data and longer field values may be
transferred using a content-transfer-encoding appropriate to the
protocol of the ACTION in the form. For HTTP applications, a
content-transfer-encoding of "binary" may be used. If the ACTION is
a "mailto:" URL, then the user agent may encode the data
appropriately to the mail transport mechanism. [See section 5 of
RFC 1521 for more details.]

File inputs may also identify the file name. The file name may be
described using the 'filename' parameter of the
"content-disposition" header. This is not required, but is strongly
recommended in any case where the original filename is known. This
is useful or necessary in many applications.

8. Security Considerations

It is important that a user agent not send any file that the user
has not explicitly asked to be sent. Thus, HTML interpreting agents
are expected to confirm any default file names that might be
suggested with <INPUT TYPE=file VALUE="yyyy">. Hidden fields must
never be able to specify a file.

9. Conclusion

The suggested implementation gives the client a lot of flexibility in
the number and types of files it can send to the server; it gives the
server control of the decision to accept the files; and it gives
servers a chance to interact with browsers which do not support INPUT
TYPE "file".

The change to the HTML DTD is very simple, but very powerful. It
enables a much greater variety of services to be implemented via the
World-Wide Web than is currently possible due to the lack of a file
submission facility. This would be an extremely valuable addition to
the capabilities of the World-Wide Web.

A. Authors' Addresses

Larry Masinter                         masinter@parc.xerox.com
Xerox Palo Alto Research Center        Voice: (415) 812-4365
3333 Coyote Hill Road                  Fax:   (415) 812-4333
Palo Alto, CA 94304

Ernesto Nebel                          nebel@xsoft.sd.xerox.com
XSoft, Xerox Corporation               Voice: (619) 676-7817
10875 Rancho Bernardo Road, Suite 200  Fax:   (619) 676-7865
San Diego, CA 92127-2116


B. Media type registration for multipart/form-data

Media Type name:
    multipart

Media subtype name:
    form-data

Required parameters:
    none

Optional parameters:
    none

Encoding considerations:
    No additional considerations other than as for other multipart
    types.

Published specification:
    draft-ietf-html-fileupload-02.txt

Security Considerations:
    The multipart/form-data type introduces no new security
    considerations beyond what might occur with any of the enclosed
    parts.

Person & email address to contact for further information:
    Larry Masinter
    masinter@parc.xerox.com