INTERNET-DRAFT                                             E. Nebel
File Transmission from WWW Browsers to Servers          L. Masinter
draft-ietf-html-fileupload-00.txt                 Xerox Corporation
Expires May 18, 1995                              November 18, 1994


        File Transfer from World-Wide Web Browsers to Servers

i. Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the
``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).

1. Abstract

Currently, a World-Wide Web server can get information from users
with HTML forms. These forms have proven useful in a wide variety
of applications in which input from the user is necessary. But this
capability is still greatly limited because HTML forms don't provide
a way for the user to submit files to the server. Service providers
who need to get files from the user have had to implement custom
browsers. (Examples of these custom browsers have appeared on the
www-talk mailing list.) To avoid the necessity for custom browsers
and to make WWW servers complete in their ability to get information
from the user, the WWW needs to provide a way for users to send files
to servers. Since user information is sent back to the server using
HTML forms, it is most logical to extend HTML forms to support file
submission.

This document proposes an extension to HTML that allows forms to
request that users supply files as data to be returned when the form
has been completely filled out and submitted. It also describes a
backward compatibility strategy that allows new servers to interact
with old WWW browsers.

2. HTML forms with file submission

The current draft HTML specification
<URL:http://www.hal.com/%7Fconnolly/html-spec/spyglass-19941014/html-19941014.txt.Z>
defines eight possible values for the TYPE attribute of an INPUT
element: CHECKBOX, HIDDEN, IMAGE, PASSWORD, RADIO, RESET, SUBMIT, TEXT.

In addition, it specifies that, for a FORM element using the POST
METHOD, the ENCTYPE attribute defaults to
"application/x-www-form-urlencoded".

This proposal makes two changes:

1) add a FILE option for the TYPE attribute of INPUT
2) allow the ENCTYPE of a FORM to be "multipart/www-form-data".

These changes might be considered independently, but both are
necessary for reasonable file upload.

The author of an HTML form who wants to request one or more files
from a user would write (for example):

    <FORM ENCTYPE="multipart/www-form-data" ACTION="_URL_" METHOD=POST>

    File to process: <INPUT NAME="userfile1" TYPE="file">

    <INPUT TYPE="submit" VALUE="Send File">

    </FORM>

The change to the HTML DTD is trivial: just one item added to the
entity "InputType", as follows:

    ... (other elements) ...

    <!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX |
                           RADIO | SUBMIT | RESET |
                           IMAGE | HIDDEN | FILE )">
    <!ELEMENT INPUT - O EMPTY>
    <!ATTLIST INPUT
            TYPE %InputType TEXT
            NAME CDATA #IMPLIED  -- required for all but submit and reset --
            VALUE CDATA #IMPLIED
            SRC %URI #IMPLIED  -- for image inputs --
            CHECKED (CHECKED) #IMPLIED
            SIZE CDATA #IMPLIED  --like NUMBERS,
                                   but delimited with comma, not space--
            MAXLENGTH NUMBER #IMPLIED
            ALIGN (top|middle|bottom) #IMPLIED
            >

    ... (other elements) ...

This is the minimal change requested. Other, larger changes to the
INPUT element might also be contemplated but are not part of this
proposal. For example, an INPUT element might usefully have an
attribute that identifies a set of acceptable media types, e.g.,

    <INPUT TYPE=file ACCEPT="image/gif, image/tiff" NAME="image1">

3. Proposed implementation

The proposed implementation in WWW browsers is, when an INPUT tag of
type FILE is encountered, to show a display of (previously selected)
file names and a "Browse" button or other selection method. Selecting
the "Browse" button would cause the browser to enter a file selection
mode appropriate for the platform; window-based browsers might pop up
a file selection window, for example. In such a file selection dialog,
the user would have the option of replacing a current selection,
adding a new file selection, etc. Browser implementors might choose
to let the list of file names be edited manually.

When the user completes the form and selects the SUBMIT element, the
browser should send the form data and the contents of the selected
files. The encoding type "application/x-www-form-urlencoded" is
inefficient for sending large quantities of binary data. Thus, a new
media type, "multipart/www-form-data", is proposed as a way of
efficiently sending the values associated with a filled-out form from
client to server.

The media type (MIME type) "multipart/www-form-data" follows the
rules of all multipart MIME data streams as outlined in RFC 1521: a
boundary is selected that has only an infinitesimal probability of
occurring in any of the data. Each field of the form is sent, in the
order in which it occurs in the form, as a part of the multipart
stream. Each part identifies the INPUT name within the original HTML
form using a "Name:" header. Each part has an optional Content-Type
(which defaults to text/plain). File inputs should be identified as
either application/binary or the appropriate media type, if known. If
multiple files were selected, they should be transferred together
using the multipart/mixed format. The "content-transfer-encoding" for
each part should be "binary".

File inputs may optionally identify the file name using the
"Content-Description" header. Browsers may optionally include a
Content-Length header both for the submission as a whole and for
individual parts. The Content-Length is not intended to replace the
multipart boundary as the way of detecting the end of an individual
part; it is merely a way of forewarning the server of the amount of
data coming.
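
The encoding rules above can be sketched in Python as follows. This is
a minimal illustration under stated assumptions, not a reference
implementation: the form is assumed to be given as (name, value) pairs
in form order, where a file field's value is a list of (file name,
content type, bytes) tuples; the names encode_form_data,
encode_multipart, pick_boundary and file_headers are hypothetical; text
fields are assumed to be encoded as UTF-8, which the draft does not
specify; and the optional Content-Length headers are omitted.

```python
import secrets

CRLF = b"\r\n"


def pick_boundary(chunks):
    """Pick a random boundary that occurs in none of the given byte strings."""
    while True:
        boundary = "b" + secrets.token_hex(12)
        if not any(boundary.encode("ascii") in chunk for chunk in chunks):
            return boundary


def encode_multipart(subtype, parts):
    """Encode (header lines, body bytes) pairs as multipart/<subtype>.

    Returns the Content-Type header value and the encoded body.
    """
    chunks = [body for _, body in parts]
    chunks += [line.encode("latin-1") for headers, _ in parts for line in headers]
    boundary = pick_boundary(chunks)
    delimiter = b"--" + boundary.encode("ascii")
    out = []
    for headers, body in parts:
        out.append(delimiter + CRLF)
        for line in headers:
            out.append(line.encode("latin-1") + CRLF)
        out.append(CRLF)         # blank line ends the part's headers
        out.append(body + CRLF)  # this CRLF belongs to the next delimiter
    out.append(delimiter + b"--" + CRLF)
    return "multipart/%s; boundary=%s" % (subtype, boundary), b"".join(out)


def file_headers(filename, ctype):
    """Headers for one file: name in Content-Description, binary transfer."""
    return ["Content-Description: " + filename,
            "Content-Type: " + ctype,
            "Content-Transfer-Encoding: binary"]


def encode_form_data(fields):
    """fields: (name, value) pairs in form order.  value is a str for ordinary
    inputs, or a list of (filename, content type, bytes) for TYPE=FILE inputs."""
    parts = []
    for name, value in fields:
        headers = ["Name: " + name]
        if isinstance(value, str):
            # Default type text/plain, line breaks as CRLF.
            # UTF-8 is an assumption; the draft names no charset.
            text = value.replace("\r\n", "\n").replace("\n", "\r\n")
            body = text.encode("utf-8")
        elif len(value) == 1:
            filename, ctype, body = value[0]
            headers += file_headers(filename, ctype)
        else:
            # Several files for one INPUT: nest them in multipart/mixed.
            inner = [(file_headers(fn, ct), data) for fn, ct, data in value]
            inner_type, body = encode_multipart("mixed", inner)
            headers.append("Content-Type: " + inner_type)
        parts.append((headers, body))
    return encode_multipart("www-form-data", parts)
```

Choosing the boundary at random and then checking it against every
part keeps the encoder to a single pass over the data; a retry is
needed only in the rare case of a collision.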

On the server end, the ACTION might point to an HTTP URL that
implements the form's action via CGI. In such a case, the CGI program
would note that the content-type is multipart/www-form-data and parse
the various fields (checking for validity, writing the file data to
local files for subsequent processing, etc.).
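
A minimal Python sketch of that parsing step follows. It assumes the
CGI program has already read the whole request body into memory (for
example from standard input, using the length given in CONTENT_LENGTH)
and it performs no validation; read_form, split_parts and boundary_of
are hypothetical names. Part headers are matched case-insensitively,
and a file's name is taken from its optional Content-Description
header.

```python
def boundary_of(content_type):
    """Return the boundary parameter of a multipart Content-Type value."""
    for param in content_type.split(";")[1:]:
        key, _, value = param.partition("=")
        if key.strip().lower() == "boundary":
            return value.strip().strip('"')
    raise ValueError("no boundary parameter in %r" % content_type)


def split_parts(body, boundary):
    """Split a multipart body into a list of (headers, content) pairs.

    Header names are lower-cased.  The CRLF before each delimiter is treated
    as part of the delimiter, so it is not included in the content.
    """
    delimiter = b"\r\n--" + boundary.encode("ascii")
    parts = []
    # Prepending CRLF lets a delimiter at the very start match;
    # segment 0 is the (ignored) preamble.
    for segment in (b"\r\n" + body).split(delimiter)[1:]:
        if segment.startswith(b"--"):            # close delimiter reached
            break
        segment = segment.partition(b"\r\n")[2]  # drop rest of delimiter line
        if segment.startswith(b"\r\n"):          # part with no headers
            head, content = b"", segment[2:]
        else:
            head, _, content = segment.partition(b"\r\n\r\n")
        headers = {}
        for line in head.split(b"\r\n"):
            if line:
                name, _, value = line.decode("latin-1").partition(":")
                headers[name.strip().lower()] = value.strip()
        parts.append((headers, content))
    return parts


def read_form(content_type, body):
    """Return (field name, file name or None, content type, data) per value."""
    if content_type.split(";")[0].strip().lower() != "multipart/www-form-data":
        raise ValueError("expected multipart/www-form-data")
    values = []
    for headers, content in split_parts(body, boundary_of(content_type)):
        name = headers.get("name")
        ctype = headers.get("content-type", "text/plain")
        if ctype.lower().startswith("multipart/mixed"):
            # Several files for one INPUT: one entry per inner part.
            for inner, data in split_parts(content, boundary_of(ctype)):
                values.append((name, inner.get("content-description"),
                               inner.get("content-type", "text/plain"), data))
        else:
            values.append((name, headers.get("content-description"),
                           ctype, content))
    return values
```

Applied to the body of the message in the Examples section, read_form
would return one entry for "submitter" and one entry per file for
"pics".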

4. Backward compatibility issues

While not necessary for successful adoption of an enhancement to the
current WWW form mechanism, it is useful to plan for a migration
strategy so that users with older browsers can still participate in
file upload dialogs, using a helper application. Most current web
browsers, when given <INPUT TYPE=FILE>, will treat it as
<INPUT TYPE=TEXT> and give the user a text box, into which the user
can type a file name. In addition, current browsers seem to ignore
the ENCTYPE attribute of the <FORM> element and always transmit the
data as application/x-www-form-urlencoded.

Thus, the server CGI might be written to note that the form data
returned had content-type application/x-www-form-urlencoded instead
of multipart/www-form-data, and so know that the user was using a
browser that did not implement file upload.
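
As a minimal sketch of this check, assuming a CGI environment in which
the request's media type arrives in the CONTENT_TYPE meta-variable, a
script might classify each submission as follows (classify_submission
and its return values are hypothetical):

```python
import os


def classify_submission(environ=os.environ):
    """Return 'new' for multipart/www-form-data submissions and 'old' for the
    application/x-www-form-urlencoded data sent by browsers without upload."""
    content_type = environ.get("CONTENT_TYPE", "")
    # Compare only the media type; parameters such as boundary are ignored.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "multipart/www-form-data":
        return "new"
    if media_type == "application/x-www-form-urlencoded":
        return "old"
    raise ValueError("unexpected form content-type: %r" % content_type)
```

Passing the environment as a parameter keeps the check testable outside
a web server.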

In this case, rather than replying with a "text/html" response, the
CGI on the server could send back a data stream for a helper
application to process. This data stream would contain:

* The (fully qualified) URL to which the actual form data should
  be posted (terminated with CRLF)
* The list of field names that were supposed to be file contents
  (space separated, terminated with CRLF)
* The entire original application/x-www-form-urlencoded form data.

This data stream would be marked as application/x-please-send-files.
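
The three-part stream described above could be assembled on the server
as in this minimal Python sketch. The function name please_send_files
is hypothetical; the CGI would send the result with a Content-Type of
application/x-please-send-files.

```python
def please_send_files(action_url, file_fields, form_data):
    """Build an application/x-please-send-files body.

    action_url  -- fully qualified URL the helper should POST to
    file_fields -- names of the fields that were meant to carry file contents
    form_data   -- the original application/x-www-form-urlencoded bytes
    """
    for name in file_fields:
        # Names are space separated, so they cannot contain white space.
        if not name or any(ch.isspace() for ch in name):
            raise ValueError("unusable field name: %r" % name)
    return (action_url.encode("ascii") + b"\r\n"
            + " ".join(file_fields).encode("ascii") + b"\r\n"
            + form_data)
```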

For this to work, the browser needs to be configured to launch a
helper application when it receives application/x-please-send-files.

The helper would read the form data, note which fields contained
'local file names' that needed to be replaced with their data
content, might itself prompt the user for changing or adding to the
list of files available, and then repackage the data and file
contents in multipart/www-form-data for retransmission back to the
server.
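
The helper's first steps can be sketched in Python as below. This
sketch assumes that each old-browser text box holds a single local
file name; read_file and confirm are hypothetical hooks standing in
for file access and for the user prompt, and the function names are
likewise illustrative. The standard urllib.parse.parse_qsl function
undoes the urlencoding.

```python
from urllib.parse import parse_qsl


def parse_please_send_files(stream):
    """Split an application/x-please-send-files body into its three pieces:
    the action URL, the list of file field names, and the decoded form pairs."""
    url_line, _, rest = stream.partition(b"\r\n")
    names_line, _, form_data = rest.partition(b"\r\n")
    file_fields = names_line.decode("ascii").split()
    # parse_qsl undoes '+' and %XX escapes and keeps the form's field order.
    pairs = parse_qsl(form_data.decode("ascii"), keep_blank_values=True)
    return url_line.decode("ascii"), file_fields, pairs


def gather_files(pairs, file_fields, read_file, confirm):
    """Replace each file field's typed-in local file name with that file's data.

    read_file(path) -> bytes and confirm(field, path) -> bool are hypothetical
    hooks supplied by the helper; confirm is where the user is asked whether
    the file may really be sent.
    """
    result = []
    for name, value in pairs:
        if name not in file_fields:
            result.append((name, value))         # ordinary field: pass through
            continue
        if not confirm(name, value):
            raise PermissionError("user declined to send %r" % value)
        result.append((name, (value, read_file(value))))
    return result
```

The resulting list, still in form order, would then be packaged as
multipart/www-form-data following the rules in section 3 and posted to
the URL taken from the first line of the stream.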

The helper would generate the kind of data that a 'new' browser would
have sent in the first place, with the intention that the URL to which
it is sent corresponds to the original ACTION URL. The point of this is
that the server can use the *same* CGI to implement the mechanism for
dealing with both old and new browsers.

The helper need not display the form data, but *should* ensure that
the user is actually prompted about the suitability of sending the
files requested (this is to avoid a security problem with malicious
servers that ask for files that weren't actually promised by the
user). It would be useful if the status of the transfer of the files
involved could be displayed.

5. Other considerations

Compression:

This scheme doesn't address the possible compression of files.
After some consideration, it seemed that the optimization issues of
file compression were too complex to have browsers decide
automatically whether files should be compressed. Many link-layer
transport mechanisms (e.g., high-speed modems) perform data
compression over the link, and optimizing for compression at this
layer might not be appropriate. It might be possible for browsers to
optionally produce a content-transfer-encoding of x-compress for
file data, and for servers to decompress the data before processing,
if desired; this was left out of the proposal, however.

Deferred file transmission:

In some situations, it might be advisable to have the server validate
various elements of the client's data (user name, account, etc.)
before actually preparing to receive the file data. However, after
some consideration, it seemed best to require that servers that wish
to do this implement it as a series of forms, where some of the data
elements that were previously validated might be sent back to the
client as 'hidden' fields. This puts the onus of maintaining the
state of a transaction only on those servers that wish to build a
complex application, while allowing applications with simple input
needs to be built simply.

Other choices for return transmission of binary data:

Various people have suggested using a new MIME top-level type
"aggregate" (e.g., aggregate/mixed) or a content-transfer-encoding of
"packet" to express indeterminate-length binary data, rather than
relying on the multipart-style boundaries. While we are not opposed
to doing so, this would require additional design and standardization
work to get acceptance of "aggregate". On the other hand, the
'multipart' mechanisms are well established, trivial to implement on
both the sending client and receiving server, and as efficient as
other methods of dealing with multiple combinations of binary data.

Not overloading <INPUT>:

Various people have wondered about the advisability of overloading
'INPUT' for this function, rather than merely providing a different
type of FORM element. Among other considerations, the migration
strategy which is allowed when using <INPUT> is important. In
addition, the <INPUT> field *is* already overloaded to contain most
kinds of data input; rather than creating multiple kinds of <INPUT>
tags, it seems most reasonable to enhance <INPUT>. The 'type' of
INPUT is not the content-type of what is returned, but rather the
'widget-type'; i.e., it identifies the interaction style with the
user. The description here is carefully written to allow
<INPUT TYPE=FILE> to work for text browsers or audio-markup.

Default content-type of field data:

Many input fields in HTML are to be typed in. There has been some
ambiguity as to how form data should be transmitted back to servers.
Making the content-type of <INPUT> fields be text/plain makes it
unambiguous that the client should properly encode the data, with
line breaks represented as CRLF, before sending it back to the server.
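
Typed-in text may reach the browser with CR, LF or CRLF line breaks
depending on the platform. A minimal Python sketch of the
normalization implied above (to_crlf is a hypothetical name):

```python
def to_crlf(text):
    """Rewrite CR, LF or CRLF line breaks in typed-in text as CRLF."""
    # Collapse CRLF first so it is not doubled, then bare CR, then expand LF.
    return text.replace("\r\n", "\n").replace("\r", "\n").replace("\n", "\r\n")
```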

6. Examples

Suppose the server supplies the following HTML:

    <FORM ACTION="http://server.dom/cgi/handle"
          ENCTYPE="multipart/www-form-data" METHOD=POST>
    What is your name? <INPUT TYPE=TEXT NAME=submitter>
    What files are you sending? <INPUT TYPE=FILE NAME=pics>
    </FORM>

and the user types "Joe Blow" in the name field, and selects a text
file "file1.txt" and also an image file "file2.gif" for the answer
to 'What files are you sending?'.

The client would send back the following data:

    Content-type: multipart/www-form-data; boundary=AaB03x

    --AaB03x
    Name: submitter

    Joe Blow
    --AaB03x
    Name: pics
    Content-type: multipart/mixed; boundary=BbC04y

    --BbC04y
    Content-Description: file1.txt
    Content-Type: text/plain

    ... contents of file1.txt ...
    --BbC04y
    Content-Description: file2.gif
    Content-type: image/gif

    ...contents of file2.gif...
    --BbC04y--
    --AaB03x--

7. Conclusion

The suggested implementation gives the client a lot of flexibility in
the number and types of files it can send to the server, gives the
server control of the decision to accept the files, and gives servers
a chance to interact with browsers that do not support INPUT TYPE
"file".

The change to the HTML DTD is very simple, but very powerful. It
enables a much greater variety of services to be implemented via the
World-Wide Web than is currently possible due to the lack of a file
submission facility. This would be an extremely valuable addition to
the capabilities of the World-Wide Web.

A. Authors' Addresses

Larry Masinter                          masinter@parc.xerox.com
Xerox Palo Alto Research Center         Voice: (415) 812-4365
3333 Coyote Hill Road                   Fax:   (415) 812-4333
Palo Alto, CA 94304

Ernesto Nebel                           nebel@xsoft.sd.xerox.com
XSoft, Xerox Corporation                Voice: (619) 676-7817
10875 Rancho Bernardo Road, Suite 200   Fax:   (619) 676-7865
San Diego, CA 92127-2116