INTERNET-DRAFT                                                E. Nebel
Form-based File Upload in HTML                             L. Masinter
draft-ietf-html-fileupload-01.txt                    Xerox Corporation
Expires May 30, 1995                                 November 30, 1994


Form-based File Upload in HTML

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the
``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).


1. Abstract

Currently, HTML forms allow the producer of the form to request
information from the user reading the form. These forms have proven
useful in a wide variety of applications in which input from the
user is necessary. However, this capability is limited because HTML
forms don't provide a way to ask the user to submit files of data.
Service providers who need to get files from the user have had to
implement custom user applications. (Examples of these custom
browsers have appeared on the www-talk mailing list.) Since
file upload is a feature that will benefit many applications, this
draft proposes an extension to HTML to allow information providers
to express file upload requests uniformly, and a MIME-compatible
representation for file upload responses. This draft also includes
a description of a backward compatibility strategy that allows new
servers to interact with the current HTML user agents.

The proposal is independent of which version of HTML it becomes a
part of; in particular, no proposal is being made as to whether this
might be part of HTML 2.0, 2.1, or 3.0.

2. HTML forms with file submission

The current draft HTML specification <URL:http://www.hal.com/
%7Fconnolly/html-spec/spyglass-19941014/html-19941014.txt.Z> defines
eight possible values for the attribute TYPE of an INPUT element:
CHECKBOX, HIDDEN, IMAGE, PASSWORD, RADIO, RESET, SUBMIT, TEXT.

In addition, it defines the ENCTYPE attribute of a FORM element
using the POST METHOD to have the default value
"application/x-www-form-urlencoded".

This proposal makes three changes:
1) Add a FILE option for the TYPE attribute of INPUT.
2) Allow an ACCEPT attribute for the INPUT tag, which is a list of
   media types or type patterns allowed for the input.
3) Allow the ENCTYPE of a FORM to be "multipart/form-data".

These changes might be considered independently, but are all
necessary for reasonable file upload.

The author of an HTML form who wants to request one or more files
from a user would write (for example):

    <FORM ENCTYPE="multipart/form-data" ACTION="_URL_" METHOD=POST>

    File to process: <INPUT NAME="userfile1" TYPE="file">

    <INPUT TYPE="submit" VALUE="Send File">

    </FORM>

The change to the HTML DTD is to add one item to the entity
"InputType". In addition, it is proposed that the INPUT tag have an
ACCEPT attribute, which is a list of comma-separated media types.


    ... (other elements) ...

    <!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX |
                           RADIO | SUBMIT | RESET |
                           IMAGE | HIDDEN | FILE )">
    <!ELEMENT INPUT - O EMPTY>
    <!ATTLIST INPUT
            TYPE %InputType TEXT
            NAME CDATA #IMPLIED  -- required for all but submit and reset --
            VALUE CDATA #IMPLIED
            SRC %URI #IMPLIED  -- for image inputs --
            CHECKED (CHECKED) #IMPLIED
            SIZE CDATA #IMPLIED  -- like NUMBERS,
                                    but delimited with comma, not space --
            MAXLENGTH NUMBER #IMPLIED
            ALIGN (top|middle|bottom) #IMPLIED
            ACCEPT CDATA #IMPLIED  -- list of content types --
            >

    ... (other elements) ...

3. Suggested implementation

While user agents that interpret HTML have wide leeway to choose the
most appropriate mechanism for their context, this section suggests
how one class of user agent, WWW browsers, might implement file
upload.

When an INPUT tag of type FILE is encountered, the browser might show
a display of (previously selected) file names, and a "Browse" button
or selection method. Selecting the "Browse" button would cause the
browser to enter into a file selection mode appropriate for the
platform. Window-based browsers might pop up a file selection
window, for example. In such a file selection dialog, the user would
have the option of replacing a current selection, adding a new file
selection, etc. Browser implementors might choose to let the list of
file names be manually edited.

If an ACCEPT attribute is present, the browser might constrain the
file selection to files that match the listed media types, for
example by offering only the file extensions that correspond to those
types on the platform.
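
As an illustration only, the ACCEPT check could be sketched as below.
The function name accepted() is hypothetical, and the use of shell-style
wildcards for "type patterns" (such as image/*) is an assumption, since
this draft does not define the pattern syntax. The mapping from file
extension to media type is taken from Python's mimetypes module as a
stand-in for whatever the platform provides.

```python
import fnmatch
import mimetypes


def accepted(filename, accept):
    """Return True if the file's guessed media type matches the ACCEPT list.

    `accept` is the comma-separated ACCEPT attribute value; wildcard
    patterns such as "image/*" are an assumption, not part of the draft.
    """
    patterns = [p.strip().lower() for p in accept.split(",") if p.strip()]
    if not patterns:
        return True  # no ACCEPT constraint was given
    media_type, _ = mimetypes.guess_type(filename)
    if media_type is None:
        # Unknown extension: fall back to the generic binary type.
        media_type = "application/octet-stream"
    return any(fnmatch.fnmatchcase(media_type.lower(), p) for p in patterns)
```

A file dialog could call accepted(name, "image/*, text/plain") for each
candidate file and hide or grey out the ones that return False.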

When the user completes the form, and selects the SUBMIT element,
the browser should send the form data and the content of the
selected files. The encoding type application/x-www-form-urlencoded
is inefficient for sending large quantities of binary data. Thus, a
new media type, multipart/form-data, is proposed as a way of
efficiently sending the values associated with a filled-out form
from client to server.
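
The inefficiency is easy to see with a small experiment. The sketch
below percent-encodes a sample containing every byte value once, as a
stand-in for binary file data, and compares sizes. The helper name
urlencoded_length() is hypothetical.

```python
from urllib.parse import quote_from_bytes


def urlencoded_length(data):
    """Number of characters needed to percent-encode the given bytes."""
    return len(quote_from_bytes(data, safe=""))


sample = bytes(range(256))   # one of every byte value, standing in for file data
print(len(sample), "bytes become", urlencoded_length(sample), "characters")
```

Most byte values expand to a three-character %XX escape, so binary data
sent this way grows to well over twice its original size, whereas a
multipart body carries the bytes unchanged.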

The media type multipart/form-data follows the rules of all
multipart MIME data streams as outlined in RFC 1521: a boundary is
selected that does not occur in any of the data. Each field of the
form is sent, in the order in which it occurs in the form, as a part
of the multipart stream. Each part identifies the INPUT name within
the original HTML form using a "Name:" header. Each part has an
optional Content-Type (which defaults to text/plain). File inputs
should be identified as either application/octet-stream or the
appropriate media type, if known. If multiple files were selected,
they should be transferred together using the multipart/mixed
format.
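
A minimal sketch of a sender following the rules above is given here.
The function names encode_part() and encode_form() are hypothetical,
and the sketch covers only single-valued fields; a multi-file field
would carry a nested multipart/mixed body as its value.

```python
CRLF = b"\r\n"


def encode_part(name, value, content_type=None):
    """Encode one form field as a body part carrying the "Name:" header."""
    headers = [b"Name: " + name.encode("ascii")]
    if content_type is not None:   # when omitted, text/plain is assumed
        headers.append(b"Content-Type: " + content_type.encode("ascii"))
    # A blank line separates the part headers from the field value.
    return CRLF.join(headers) + CRLF + CRLF + value


def encode_form(fields, boundary):
    """Encode (name, value_bytes, content_type_or_None) tuples in form order."""
    delimiter = b"--" + boundary.encode("ascii")
    for _, value, _ in fields:
        if delimiter in value:
            raise ValueError("boundary occurs in the data; choose another")
    body = b""
    for name, value, content_type in fields:
        body += delimiter + CRLF + encode_part(name, value, content_type) + CRLF
    return body + delimiter + b"--" + CRLF   # closing delimiter
```

For example, encode_form([("submitter", b"Joe Blow", None)], "AaB03x")
yields a body with one part named "submitter" followed by the closing
delimiter "--AaB03x--".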

The "content-transfer-encoding" header should be supplied for all
fields whose values do not conform to the default 7BIT encoding
(that is, values that are not entirely 7-bit US-ASCII data with lines
no longer than 1000 characters). File data and longer field values
may be transferred using a content-transfer-encoding appropriate to
the protocol of the ACTION in the form. For HTTP applications, a
content-transfer-encoding of "binary" may be used. If the ACTION is
a "mailto:" URL, then the user agent may encode the data
appropriately for the mail transport mechanism. [See section 5 of
RFC 1521 for more details.]
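
One way a user agent might make this choice is sketched below. The
names is_7bit() and transfer_encoding() are hypothetical, the 7BIT test
is simplified to the two conditions stated above, and the use of base64
for non-HTTP actions is an assumption; the draft only says the data may
be encoded appropriately for the mail transport.

```python
from urllib.parse import urlsplit


def is_7bit(value):
    """True if value is 7-bit data in lines of at most 1000 bytes."""
    if any(byte > 127 for byte in value):
        return False
    return all(len(line) <= 1000 for line in value.split(b"\r\n"))


def transfer_encoding(value, action):
    """Pick a Content-Transfer-Encoding, or None if the 7BIT default applies."""
    if is_7bit(value):
        return None                      # no header needed
    if urlsplit(action).scheme == "http":
        return "binary"                  # HTTP can carry raw bytes
    return "base64"                      # assumption: mail-safe fallback
```

With this sketch, a short text value needs no header, a GIF image
posted to an HTTP URL is labelled "binary", and the same image sent to
a "mailto:" URL is labelled "base64".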

File inputs may optionally identify the file name using the
"content-disposition" header. This is not required, but is a
convenience for those cases where, for example, the uploaded files
might contain references to each other, e.g., a TeX file and its
.sty auxiliary style description.

Browsers may optionally include a Content-Length header both in the
overall reply and in individual components. The content-length is
*not* intended as a replacement for the multipart boundary as a way
of detecting the end of an individual component. It is *only*
supplied as a way of forewarning the server of the amount of data
coming.

On the server end, the ACTION might point to an HTTP URL that
implements the form's action via CGI. In such a case, the CGI program
would note that the content-type is multipart/form-data and parse the
various fields (checking for validity, writing the file data to local
files for subsequent processing, etc.).
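
The parsing step might look roughly like the following sketch. The
function name parse_multipart() is hypothetical, and the code is
deliberately simplified: it does not handle folded header lines, and it
splits on the delimiter wherever it appears rather than only at the
start of a line.

```python
def parse_multipart(body, boundary):
    """Split a multipart body into a list of (headers, content) pairs."""
    delimiter = b"--" + boundary.encode("ascii")
    parts = []
    # Element 0 is the preamble before the first delimiter; skip it.
    for chunk in body.split(delimiter)[1:]:
        if chunk.startswith(b"--"):      # closing delimiter reached
            break
        if chunk.startswith(b"\r\n"):    # end of the delimiter line
            chunk = chunk[2:]
        if chunk.endswith(b"\r\n"):      # CRLF that precedes the next delimiter
            chunk = chunk[:-2]
        if chunk.startswith(b"\r\n"):    # part has no headers at all
            head, content = b"", chunk[2:]
        else:
            head, _, content = chunk.partition(b"\r\n\r\n")
        headers = {}
        for line in head.split(b"\r\n"):
            if line:
                key, _, value = line.partition(b":")
                headers[key.strip().lower().decode("ascii")] = (
                    value.strip().decode("ascii"))
        parts.append((headers, content))
    return parts
```

A CGI program would take the boundary from the request's content-type,
call parse_multipart() on the request body, and look up each part by
its "name" header. A part of type multipart/mixed can be passed through
the same function again with its own inner boundary.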

4. Backward compatibility issues

While not necessary for successful adoption of an enhancement to the
current WWW form mechanism, it is useful to also plan for a
migration strategy: users with older browsers can still participate
in file upload dialogs, using a helper application. Most current web
browsers, when given <INPUT TYPE=FILE>, will treat it as <INPUT
TYPE=TEXT> and give the user a text box. The user can type a file
name into this text box. In addition, current browsers seem to
ignore the ENCTYPE parameter in the <FORM> element, and always
transmit the data as application/x-www-form-urlencoded.

Thus, the server CGI might be written in a way that would note that
the form data returned had content-type
application/x-www-form-urlencoded instead of
multipart/form-data, and know that the user was using a browser
that didn't implement file upload.

In this case, rather than replying with a "text/html" response, the
CGI on the server could instead send back a data stream for a helper
application to process; this would be a data stream of type
"application/x-please-send-files", which contains:

* The (fully qualified) URL to which the actual form data should
  be posted (terminated with CRLF)
* The list of field names that were supposed to be file contents
  (space separated, terminated with CRLF)
* The entire original application/x-www-form-urlencoded form data
  as originally sent from client to server.
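
The server side of this fallback could be sketched as follows. The
function names needs_helper() and please_send_files() are hypothetical;
the three items are written in the order listed above.

```python
CRLF = "\r\n"


def needs_helper(content_type):
    """True when the request came from a browser that ignored ENCTYPE."""
    media_type = content_type.split(";")[0].strip().lower()
    return media_type == "application/x-www-form-urlencoded"


def please_send_files(action_url, file_fields, original_form_data):
    """Build the body of an application/x-please-send-files response."""
    return (action_url + CRLF                # where the real data should go
            + " ".join(file_fields) + CRLF   # names of the file fields
            + original_form_data)            # urlencoded data as received
```

A CGI program would call needs_helper() on the incoming content-type
and, when it returns True, reply with the string built by
please_send_files() instead of an HTML page.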

In this case, the browser needs to be configured to process
application/x-please-send-files by launching a helper application.

The helper would read the form data, note which fields contained
'local file names' that needed to be replaced with their data
content, possibly prompt the user to change or add to the list of
files available, and then repackage the data and file contents
in multipart/form-data for retransmission back to the server.
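
The first step of such a helper, reading the stream and separating
ordinary fields from file-name fields, might be sketched like this. The
function names read_request() and split_fields() are hypothetical;
prompting the user and re-encoding the result are left out.

```python
from urllib.parse import parse_qsl


def read_request(body):
    """Split an application/x-please-send-files body into its three items."""
    url, field_line, form_data = body.split("\r\n", 2)
    pairs = parse_qsl(form_data, keep_blank_values=True)
    return url, field_line.split(), pairs


def split_fields(body):
    """Return (url, ordinary fields, file-name fields) for the helper."""
    url, file_fields, pairs = read_request(body)
    plain = [(k, v) for k, v in pairs if k not in file_fields]
    files = [(k, v) for k, v in pairs if k in file_fields]
    return url, plain, files
```

The values in the third list are the local file names the user typed;
the helper would confirm them with the user, read each file, and send
everything to the returned URL as multipart/form-data.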

The helper would generate the kind of data that a 'new' browser should
actually have sent in the first place, with the intention that the URL
to which it is sent corresponds to the original ACTION URL. The point
of this is that the server can use the *same* CGI to implement the
mechanism for dealing with both old and new browsers.

The helper need not display the form data, but *should* ensure that
the user is actually prompted about the suitability of sending the
files requested. (This is to avoid a security problem with malicious
servers that ask for files that weren't actually promised by the
user.) It would be useful if the status of the transfer of the files
involved could be displayed.

5. Other considerations

5.1 Compression, encryption

This scheme doesn't address the possible compression of files.
After some consideration, it seemed that the optimization issues of
file compression were too complex for browsers to decide
automatically that files should be compressed. Many link-layer
transport mechanisms (e.g., high-speed modems) perform data
compression over the link, and optimizing for compression at this
layer might not be appropriate. It might be possible for browsers to
optionally produce a content-transfer-encoding of x-compress for
file data, and for servers to decompress the data before processing,
if desired; this was left out of the proposal, however.

Similarly, the proposal does not contain a mechanism for encryption
of the data; this should be handled by whatever other mechanisms are
in place for secure transmission of data, whether via secure HTTP or
mail.

5.2 Deferred file transmission

In some situations, it might be advisable to have the server
validate various elements of the form data (user name, account,
etc.) before actually preparing to receive the data. However,
after some consideration, it seemed best to require that servers
that wish to do this implement it as a series of forms, where some
of the data elements that were previously validated might be sent
back to the client as 'hidden' fields, or by arranging the form so
that the elements that need validation occur first. This puts the
onus of maintaining the state of a transaction only on those servers
that wish to build a complex application, while allowing those cases
that have simple input needs to be built simply. Clients are
encouraged to supply content-length for overall file input so that a
busy server could detect if the proposed file data is too large to
be processed reasonably and just return an error code and close the
connection without waiting to process all of the incoming data.

If the INPUT tag includes the attribute MAXLENGTH, the user agent
should consider its value to represent the maximum Content-Length
(in bytes) which the server will accept for transferred files. In
this way, servers can hint to the client how much space they have
available for a file upload, before that upload takes place. It is
important to note, however, that this is only a hint, and the actual
requirements of the server may change between form creation and file
submission.
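
Both size checks described in this section are simple comparisons, as
the sketch below shows. The function names within_hint() and
too_large() are hypothetical, and the server's limit is whatever the
application chooses.

```python
def within_hint(file_sizes, maxlength):
    """Client side: do the selected files fit within the MAXLENGTH hint?"""
    if maxlength is None:              # attribute absent: no hint to apply
        return True
    return sum(file_sizes) <= int(maxlength)


def too_large(content_length, limit):
    """Server side: should the request be refused before reading the body?"""
    if content_length is None:         # header not supplied: cannot tell yet
        return False
    return int(content_length) > limit
```

Because MAXLENGTH is only a hint, a server would still apply its own
check with too_large() when the request arrives, even if the client has
already called within_hint().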

5.3 Other choices for return transmission of binary data

Various people have suggested using a new MIME top-level type
"aggregate", e.g., aggregate/mixed, or a content-transfer-encoding of
"packet" to express indeterminate-length binary data, rather than
relying on the multipart-style boundaries. While we are not opposed
to doing so, this would require additional design and
standardization work to get acceptance of "aggregate". On the other
hand, the 'multipart' mechanisms are well established, simple to
implement on both the sending client and receiving server, and as
efficient as other methods of dealing with multiple combinations of
binary data.

5.4 Not overloading <INPUT>

Various people have wondered about the advisability of overloading
'INPUT' for this function, rather than merely providing a different
type of FORM element. Among other considerations, the migration
strategy which is allowed when using <INPUT> is important. In
addition, the <INPUT> field *is* already overloaded to contain most
kinds of data input; rather than creating multiple kinds of <INPUT>
tags, it seems most reasonable to enhance <INPUT>. The 'type' of
INPUT is not the content-type of what is returned, but rather the
'widget-type'; i.e., it identifies the interaction style with the
user. The description here is carefully written to allow <INPUT
TYPE=FILE> to work for text browsers or audio-markup.

5.5 Default content-type of field data

Many input fields in HTML are to be typed in. There has been some
ambiguity as to how form data should be transmitted back to servers.
Making the content-type of <INPUT> fields text/plain makes it clear
that the client should properly encode the data, with CRLF line
breaks, before sending it back to the server.
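
For typed-in text, the encoding step amounts to normalizing whatever
line ending the platform uses to CRLF. A minimal sketch, with the
hypothetical name to_crlf():

```python
import re


def to_crlf(text):
    """Convert bare CR, bare LF, and CRLF line endings to CRLF."""
    # "\r\n" is listed first so an existing CRLF is not doubled.
    return re.sub(r"\r\n|\r|\n", "\r\n", text)
```

This should be applied only to text fields; file contents sent with a
"binary" content-transfer-encoding must be left untouched.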

5.6 Interpretation of other attributes

The VALUE attribute might be used with <INPUT TYPE=file> tags for
a default file name. This use is probably platform dependent,
however, and probably should be avoided.

The SIZE attribute should probably not be used with <INPUT
TYPE=file> tags. For text input, it indicates the length in
characters for the text area for the prompt.

5.7 Using Name instead of existing MIME headers for field names

To avoid any confusion and overloading of existing MIME headers,
this proposal uses "Name" (instead of content-id or
content-description) for the name of the original field in the form
in multipart/form-data.

On the other hand, the original file names might appear within a
multipart/mixed stream, and thus the proposal avoids defining a new
header field for transmitting the original file name by using
the "content-disposition" field to send the file name, if
appropriate. (Content-disposition is apparently the proposed way of
encoding file names in MIME messages, as well.)

5.8 Allow form ACTION to be "mailto:"

Independent of this proposal, it would be very useful for HTML
interpreting user agents to allow an ACTION in a form to be a
"mailto:" URL. This seems like a good idea, with or without this
proposal. Similarly, the ACTION for an HTML form which is received
via mail should probably default to the "reply-to:" of the message.
These two proposals would allow HTML forms to be served via HTTP
servers but sent back via mail, or, alternatively, allow HTML forms
to be sent by mail, filled out by HTML-aware mail recipients, and
the results mailed back.

5.9 Remote files with third-party transfer

In some scenarios, the user operating the client software might want
to specify a URL for remote data rather than a local file. In this
case, is there a way to allow the browser to send to the server a
pointer to the external data rather than the entire contents? This
capability could be implemented, for example, by having the client
send to the server data of type "message/external-body" with
"access-type" set to, say, "uri", and the URL of the remote data in
the body of the message.

5.10 File transfer with ENCTYPE=x-www-form-urlencoded

If a form contains <INPUT TYPE=file> elements but does not contain
an ENCTYPE in the enclosing <FORM>, the behavior is not specified.
It is probably inappropriate to attempt to URL-encode large
quantities of data and send them to servers that don't expect it.

5.11 CRLF used as line separator

As with all MIME transmissions, CRLF is used as the separator for
lines in a POST of the data in multipart/form-data.

6. Examples

Suppose the server supplies the following HTML:

    <FORM ACTION="http://server.dom/cgi/handle"
          ENCTYPE="multipart/form-data"
          METHOD=POST>
    What is your name? <INPUT TYPE=TEXT NAME=submitter>
    What files are you sending? <INPUT TYPE=FILE NAME=pics>
    </FORM>

and the user types "Joe Blow" in the name field, and selects
a text file "file1.txt" and also an image file "file2.gif" for
the answer to 'What files are you sending?'.

The client would send back the following data:

    Content-type: multipart/form-data; boundary=AaB03x

    --AaB03x
    Name: submitter

    Joe Blow
    --AaB03x
    Name: pics
    Content-type: multipart/mixed; boundary=BbC04y

    --BbC04y
    Content-Disposition: attachment; filename="file1.txt"
    Content-Type: text/plain
    Content-Transfer-Encoding: binary

    ... contents of file1.txt ...
    --BbC04y
    Content-Disposition: attachment; filename="file2.gif"
    Content-type: image/gif
    Content-Transfer-Encoding: binary

    ... contents of file2.gif ...
    --BbC04y--
    --AaB03x--


7. Conclusion

The suggested implementation gives the client a lot of flexibility in
the number and types of files it can send to the server, gives the
server control of the decision to accept the files, and gives
servers a chance to interact with browsers which do not support INPUT
TYPE "file".

The change to the HTML DTD is very simple, but very powerful. It
enables a much greater variety of services to be implemented via the
World-Wide Web than is currently possible due to the lack of a file
submission facility. This would be an extremely valuable addition to
the capabilities of the World-Wide Web.


A. Authors' Addresses

Larry Masinter                          masinter@parc.xerox.com
Xerox Palo Alto Research Center         Voice: (415) 812-4365
3333 Coyote Hill Road                   Fax:   (415) 812-4333
Palo Alto, CA 94304

Ernesto Nebel                           nebel@xsoft.sd.xerox.com
XSoft, Xerox Corporation                Voice: (619) 676-7817
10875 Rancho Bernardo Road, Suite 200   Fax:   (619) 676-7865
San Diego, CA 92127-2116