1
2 INTERNET-DRAFT E. Nebel
3 Form-based File Upload in HTML L. Masinter
4 draft-ietf-html-fileupload-01.txt Xerox Corporation
5 Expires May 30, 1995 November 30, 1994
6
7
8 Form-based File Upload in HTML
9
10 Status of this Memo
11
12 This document is an Internet-Draft. Internet-Drafts are working
13 documents of the Internet Engineering Task Force (IETF), its areas,
14 and its working groups. Note that other groups may also distribute
15 working documents as Internet-Drafts.
16
17 Internet-Drafts are draft documents valid for a maximum of six months
18 and may be updated, replaced, or obsoleted by other documents at any
19 time. It is inappropriate to use Internet-Drafts as reference
20 material or to cite them other than as ``work in progress.''
21
22 To learn the current status of any Internet-Draft, please check the
23 ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
24 Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
25 munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
26 ftp.isi.edu (US West Coast).
27
28
29 1. Abstract
30
31 Currently, HTML forms allow the producer of the form to request
32 information from the user reading the form. These forms have proven
33 useful in a wide variety of applications in which input from the
34 user is necessary. However, this capability is limited because HTML
35 forms don't provide a way to ask the user to submit files of data.
36 Service providers who need to get files from the user have had to
37 implement custom user applications. (Examples of these custom
38 browsers have appeared on the www-talk mailing list.) Since
39 file-upload is a feature that will benefit many applications, this
40 draft proposes an extension to HTML to allow information providers
41 to express file upload requests uniformly, and a MIME compatible
42 representation for file upload responses. This draft also includes
43 a description of a backward compatibility strategy that allows new
44 servers to interact with the current HTML user agents.
45
46 The proposal is independent of the version of HTML of which it
47 becomes a part; in particular, no proposal is made here as to whether
48 it might be part of HTML 2.0, 2.1, or 3.0.
49
50 2. HTML forms with file submission
51
52 The current draft HTML specification <URL:http://www.hal.com/
53 %7Fconnolly/html-spec/spyglass-19941014/html-19941014.txt.Z> defines
54 eight possible values for the attribute TYPE of an INPUT element:
55 CHECKBOX, HIDDEN, IMAGE, PASSWORD, RADIO, RESET, SUBMIT, TEXT.
56
57 In addition, it defines the ENCTYPE attribute of a FORM element
58 using the POST METHOD to have the default value
59 "application/x-www-form-urlencoded".
60
61 This proposal makes three changes:
62 1) add a FILE option for the TYPE attribute of INPUT;
63 2) allow an ACCEPT attribute for the INPUT tag, which is a list of
64 media types or type patterns allowed for the input;
65 3) allow the ENCTYPE of a FORM to be "multipart/form-data".
66
67 These changes might be considered independently, but are all
68 necessary for reasonable file upload.
69
70 The author of an HTML form who wants to request one or more files
71 from a user would write (for example):
72
73 <FORM ENCTYPE="multipart/form-data" ACTION="_URL_" METHOD=POST>
74
75 File to process: <INPUT NAME="userfile1" TYPE="file">
76
77 <INPUT TYPE="submit" VALUE="Send File">
78
79 </FORM>
80
81 The change to the HTML DTD is to add one item to the entity
82 "InputType". In addition, it is proposed that the INPUT tag have an
83 ACCEPT attribute, which is a list of comma-separated media types.
84
85
86 ... (other elements) ...
87
88 <!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX |
89 RADIO | SUBMIT | RESET |
90 IMAGE | HIDDEN | FILE )">
91 <!ELEMENT INPUT - O EMPTY>
92 <!ATTLIST INPUT
93 TYPE %InputType TEXT
94 NAME CDATA #IMPLIED -- required for all but submit and reset --
95 VALUE CDATA #IMPLIED
96 SRC %URI #IMPLIED -- for image inputs --
97 CHECKED (CHECKED) #IMPLIED
98 SIZE CDATA #IMPLIED -- like NUMBERS,
99 but delimited with comma, not space --
100 MAXLENGTH NUMBER #IMPLIED
101 ALIGN (top|middle|bottom) #IMPLIED
102 ACCEPT CDATA #IMPLIED -- list of content types --
103 >
104
105 ... (other elements) ...
106
107 3. Suggested implementation
108
109 While user agents that interpret HTML have wide leeway to choose the
110 most appropriate mechanism for their context, this section suggests
111 how one class of user agent, WWW browsers, might implement file
112 upload.
113
114 When an INPUT tag of type FILE is encountered, the browser might show
115 a display of (previously selected) file names, and a "Browse" button
116 or selection method. Selecting the "Browse" button would cause the
117 browser to enter into a file selection mode appropriate for the
118 platform. Window-based browsers might pop up a file selection
119 window, for example. In such a file selection dialog, the user would
120 have the option of replacing a current selection, adding a new file
121 selection, etc. Browser implementors might choose to let the list of
122 file names be manually edited.
123
124 If an ACCEPT attribute is present, the browser might constrain the
125 file selection to files whose names carry the extensions that
126 correspond, on the platform, to the listed media types.
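
As a rough illustration of the ACCEPT check described above, the
following Python sketch guesses a file's media type from its extension
and compares it with a comma-separated ACCEPT list that may contain
type patterns such as "image/*". The function name accepts() and the
choice to refuse files of unknown type are assumptions made for this
sketch; the draft leaves the mechanism to the browser.

```python
import fnmatch
import mimetypes


def accepts(filename: str, accept: str) -> bool:
    """Return True if the guessed media type of filename matches ACCEPT."""
    guessed, _encoding = mimetypes.guess_type(filename)
    if guessed is None:
        # Assumption: a file of unknown type is not offered to the user.
        return False
    patterns = [p.strip().lower() for p in accept.split(",") if p.strip()]
    # fnmatchcase gives shell-style matching, so "image/*" covers image/gif.
    return any(fnmatch.fnmatchcase(guessed.lower(), p) for p in patterns)
```

A browser could apply such a test to each directory entry before
listing it in the file selection dialog.
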
127
128 When the user completes the form, and selects the SUBMIT element,
129 the browser should send the form data and the content of the
130 selected files. The encoding type application/x-www-form-urlencoded
131 is inefficient for sending large quantities of binary data. Thus, a
132 new media type, multipart/form-data, is proposed as a way of
133 efficiently sending the values associated with a filled-out form
134 from client to server.
135
136 The media-type multipart/form-data follows the rules of all
137 multipart MIME data streams as outlined in RFC 1521--a boundary is
138 selected that does not occur in any of the data. Each field of the
139 form is sent, in the order in which it occurs in the form, as a part
140 of the multipart stream. Each part identifies the INPUT name within
141 the original HTML form using a "Name:" header. Each part has an
142 optional Content-Type (which defaults to text/plain). File inputs
143 should be identified as either application/octet-stream or the
144 appropriate media type, if known. If multiple files were selected,
145 they should be transferred together using the multipart/mixed
146 format.
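
The encoding rules above can be sketched in Python as follows. This is
only an illustration under stated assumptions: the function names, the
representation of a file as a (media type, contents) pair, the
restriction of text values to US-ASCII, and the use of a semicolon
before the boundary parameter (as in RFC 1521) are choices made for the
sketch, not part of the proposal.

```python
from typing import List, Sequence, Tuple, Union

CRLF = b"\r\n"
FileItem = Tuple[str, bytes]            # (media type, file contents)
FieldValue = Union[str, List[FileItem]]


def encode_file(media_type: str, data: bytes) -> bytes:
    """Headers and body for one uploaded file."""
    return (b"Content-Type: " + media_type.encode("ascii") + CRLF
            + b"Content-Transfer-Encoding: binary" + CRLF + CRLF + data)


def join_parts(parts: Sequence[bytes], boundary: str) -> bytes:
    """Wrap already encoded parts in a multipart body."""
    delim = b"--" + boundary.encode("ascii")
    for part in parts:
        if delim in part:
            # The boundary must not occur in any of the data.
            raise ValueError("boundary occurs in the data; choose another")
    body = b"".join(delim + CRLF + part + CRLF for part in parts)
    return body + delim + b"--" + CRLF


def encode_form(fields: Sequence[Tuple[str, FieldValue]],
                boundary: str, inner_boundary: str) -> bytes:
    """Encode fields, in form order, as a multipart/form-data body."""
    parts = []
    for name, value in fields:
        header = b"Name: " + name.encode("ascii") + CRLF
        if isinstance(value, str):
            # Content-Type defaults to text/plain, so it is omitted here.
            parts.append(header + CRLF + value.encode("ascii"))
        elif len(value) == 1:
            parts.append(header + encode_file(*value[0]))
        else:
            # Several files for one field travel together as multipart/mixed.
            inner = join_parts([encode_file(t, d) for t, d in value],
                               inner_boundary)
            parts.append(header + b"Content-Type: multipart/mixed; boundary="
                         + inner_boundary.encode("ascii") + CRLF + CRLF + inner)
    return join_parts(parts, boundary)
```

A real user agent would pick boundaries at random and retry on a
collision instead of raising an error.
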
147
148 The "content-transfer-encoding" header should be supplied for all
149 fields whose values do not conform to the default 7BIT encoding
150 (that is, 7-bit US-ASCII data with lines no longer than 1000
151 characters). Otherwise, file data and longer field values may be
152 transferred using a content-transfer-encoding appropriate to the
153 protocol of the ACTION in the form. For HTTP applications, a
154 content-transfer-encoding of "binary" may be used. If the ACTION is
155 a "mailto:" URL, then the user agent may encode the data
156 appropriately for the mail transport mechanism. [See section 5 of
157 RFC 1521 for more details.]
158
159 File inputs may optionally identify the file name using the
160 "content-disposition" header. This is not required, but is a
161 convenience for those cases where, for example, the uploaded files
162 might contain references to each other, e.g., a TeX file and its
163 .sty auxiliary style description.
164
165 Browsers may optionally include a Content-Length header both in the
166 overall reply and in individual components. The content-length is
167 *not* intended as a replacement for the multipart boundary as a way
168 of detecting the end of an individual component. It is *only*
169 supplied as a way of forewarning the server of the amount of data
170 coming.
171
172 On the server end, the ACTION might point to an HTTP URL that
173 implements the form's action via CGI. In such a case, the CGI program
174 would note that the content-type is multipart/form-data and parse the
175 various fields (checking for validity, writing the file data to local
176 files for subsequent processing, etc.).
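
A minimal sketch of the parsing step, in Python, might look like the
following. It assumes the whole body is already in memory, that the
boundary has been taken from the content-type header, and that every
line ends in CRLF; the function name split_parts() is hypothetical. A
part of type multipart/mixed could be handed back to the same function
with its inner boundary.

```python
from typing import Dict, List, Tuple

CRLF = b"\r\n"


def split_parts(body: bytes, boundary: str) -> List[Tuple[Dict[str, str], bytes]]:
    """Split a multipart body into (headers, data) pairs."""
    delim = b"--" + boundary.encode("ascii")
    parts = []
    # Anything before the first delimiter is preamble and is dropped.
    for chunk in body.split(delim)[1:]:
        if chunk.startswith(b"--"):
            break  # closing delimiter reached
        chunk = chunk[len(CRLF):]        # CRLF that follows the delimiter
        if chunk.endswith(CRLF):
            chunk = chunk[:-len(CRLF)]   # CRLF that precedes the next one
        if chunk.startswith(CRLF):
            head, data = b"", chunk[len(CRLF):]   # part without headers
        else:
            head, _, data = chunk.partition(CRLF + CRLF)
        headers = {}
        for line in head.split(CRLF):
            if line:
                key, _, value = line.decode("ascii").partition(":")
                headers[key.strip().lower()] = value.strip()
        parts.append((headers, data))
    return parts
```

The CGI program would then look up each part's "name" header to decide
whether the data is a plain field value or file content to be written
to local storage.
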
177
178 4. Backward compatibility issues
179
180 While not necessary for successful adoption of an enhancement to the
181 current WWW form mechanism, it is useful to also plan for a
182 migration strategy: users with older browsers can still participate
183 in file upload dialogs, using a helper application. Most current web
184 browsers, when given <INPUT TYPE=FILE>, will treat it as <INPUT
185 TYPE=TEXT> and give the user a text box. The user can type in a file
186 name into this text box. In addition, current browsers seem to
187 ignore the ENCTYPE parameter in the <FORM> element, and always
188 transmit the data as application/x-www-form-urlencoded.
189
190 Thus, the server CGI might be written in a way that would note that
191 the form data returned had content-type
192 application/x-www-form-urlencoded instead of
193 multipart/form-data, and know that the user was using a browser
194 that didn't implement file upload.
195
196 In this case, rather than replying with a "text/html" response, the
197 CGI on the server could instead send back a data stream that a helper
198 application might process instead; this would be a data stream of
199 type "application/x-please-send-files", which contains:
200
201 * The (fully qualified) URL to which the actual form data should
202 be posted (terminated with CRLF)
203 * The list of field names that were supposed to be file contents
204 (space separated, terminated with CRLF)
205 * The entire original application/x-www-form-urlencoded form data
206 as originally sent from client to server.
207
208 In this case, the browser needs to be configured to launch a helper
209 application to process application/x-please-send-files.
210
211 The helper would read the form data, note which fields contained
212 'local file names' that needed to be replaced with their data
213 content, might itself prompt the user for changing or adding to the
214 list of files available, and then repackage the data & file contents
215 in multipart/form-data for retransmission back to the server.
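
The three-item layout of application/x-please-send-files described
above could be produced and read back roughly as in this Python sketch.
The function names are hypothetical, and the sketch assumes that the
third item (the original form data) simply runs to the end of the
stream, since only the first two items are said to be terminated with
CRLF.

```python
from typing import List, Tuple
from urllib.parse import parse_qsl

CRLF = "\r\n"


def build_request(action_url: str, file_fields: List[str], form_data: str) -> str:
    """Server side: assemble the stream sent to the helper application."""
    return action_url + CRLF + " ".join(file_fields) + CRLF + form_data


def parse_request(stream: str) -> Tuple[str, List[str], List[Tuple[str, str]]]:
    """Helper side: recover the URL, the file field names and the form data."""
    url, rest = stream.split(CRLF, 1)
    names_line, form_data = rest.split(CRLF, 1)
    file_fields = names_line.split()
    # Decode the original application/x-www-form-urlencoded data.
    pairs = parse_qsl(form_data, keep_blank_values=True)
    return url, file_fields, pairs
```

The helper would then replace the value of each field named in the
second item with the contents of the local file it names, after asking
the user for confirmation.
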
216
217 The helper would generate the kind of data that a 'new' browser should
218 actually have sent in the first place, with the intention that the URL
219 to which it is sent corresponds to the original ACTION URL. The point
220 of this is that the server can use the *same* CGI to implement the
221 mechanism for dealing with both old and new browsers.
222
223 The helper need not display the form data, but *should* ensure that
224 the user actually be prompted about the suitability of sending the
225 files requested (this is to avoid a security problem with malicious
226 servers that ask for files that weren't actually promised by the
227 user.) It would be useful if the status of the transfer of the files
228 involved could be displayed.
229
230 5. Other considerations
231
232 5.1 Compression, encryption
233
234 This scheme doesn't address the possible compression of files.
235 After some consideration, it seemed that the optimization issues of
236 file compression were too complex to try to automatically have
237 browsers decide that files should be compressed. Many link-layer
238 transport mechanisms (e.g., high-speed modems) perform data
239 compression over the link, and optimizing for compression at this
240 layer might not be appropriate. It might be possible for browsers to
241 optionally produce a content-transfer-encoding of x-compress for
242 file data, and for servers to decompress the data before processing,
243 if desired; this was left out of the proposal, however.
244
245 Similarly, the proposal does not contain a mechanism for encryption
246 of the data; this should be handled by whatever other mechanisms are
247 in place for secure transmission of data, whether via secure HTTP or
248 mail.
249
250 5.2 Deferred file transmission
251
252 In some situations, it might be advisable to have the server
253 validate various elements of the form data (user name, account,
254 etc.) before actually preparing to receive the data. However,
255 after some consideration, it seemed best to require that servers
256 that wish to do this should implement this as a series of forms,
257 where some of the data elements that were previously validated might
258 be sent back to the client as 'hidden' fields, or by arranging the
259 form so that the elements that need validation occur first. This
260 puts the onus of maintaining the state of a transaction only on
261 those servers that wish to build a complex application, while
262 allowing those cases that have simple input needs to be built
263 simply. Clients are encouraged to supply content-length for overall
264 file input so that a busy server could detect if the proposed file
265 data is too large to be processed reasonably and just return an
266 error code and close the connection without waiting to process all
267 of the incoming data.
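
The early check on content-length mentioned above might be sketched in
Python as follows. The function name should_reject(), the max_bytes
parameter, and the decision to treat a malformed header as an error are
assumptions made for illustration only.

```python
from typing import Optional


def should_reject(content_length: Optional[str], max_bytes: int) -> bool:
    """Return True when the declared length already exceeds the limit."""
    if content_length is None:
        # No forewarning was given; the server has to read and count.
        return False
    try:
        declared = int(content_length)
    except ValueError:
        # Assumption: a malformed header is treated as an error.
        return True
    return declared < 0 or declared > max_bytes
```

A server that gets True here can return an error code and close the
connection before reading any of the file data.
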
268
269 If the INPUT tag includes the attribute MAXLENGTH, the user agent
270 should consider its value to represent the maximum Content-Length
271 (in bytes) which the server will accept for transferred files. In
272 this way, servers can hint to the client how much space they have
273 available for a file upload, before that upload takes place. It is
274 important to note, however, that this is only a hint, and the actual
275 requirements of the server may change between form creation and file
276 submission.
277
278 5.3 Other choices for return transmission of binary data
279
280 Various people have suggested using a new MIME top-level type
281 "aggregate", e.g., aggregate/mixed, or a content-transfer-encoding of
282 "packet" to express indeterminate-length binary data, rather than
283 relying on the multipart-style boundaries. While we are not opposed
284 to doing so, this would require additional design and
285 standardization work to get acceptance of "aggregate". On the other
286 hand, the 'multipart' mechanisms are well established, simple to
287 implement on both the sending client and receiving server, and as
288 efficient as other methods of dealing with multiple combinations of
289 binary data.
290
291 5.4 Not overloading <INPUT>
292
293 Various people have wondered about the advisability of overloading
294 'INPUT' for this function, rather than merely providing a different
295 type of FORM element. Among other considerations, the migration
296 strategy which is allowed when using <INPUT> is important. In
297 addition, the <INPUT> field *is* already overloaded to contain most
298 kinds of data input; rather than creating multiple kinds of <INPUT>
299 tags, it seems most reasonable to enhance <INPUT>. The 'type' of
300 INPUT is not the content-type of what is returned, but rather the
301 'widget-type'; i.e., it identifies the interaction style with the
302 user. The description here is carefully written to allow <INPUT
303 TYPE=FILE> to work for text browsers or audio-markup.
304
305 5.5 Default content-type of field data
306
307 Many input fields in HTML are to be typed in. There has been some
308 ambiguity as to how form data should be transmitted back to servers.
309 Making text/plain the default content-type of <INPUT> fields makes it
310 clear that the client should encode the data properly, with CRLF line
311 breaks, before sending it back to the server.
312
313 5.6 Interpretation of other attributes
314
315 The VALUE attribute might be used with <INPUT TYPE=file> tags for
316 a default file name. This use is probably platform dependent,
317 however, and probably should be avoided.
318
319 The SIZE attribute should probably not be used with <INPUT
320 TYPE=file> tags. For text input, it indicates the length in
321 characters for the text area for the prompt.
322
323 5.7 Using Name instead of existing MIME headers for field names
324
325 To avoid any confusion and overloading of existing MIME headers,
326 this proposal has "Name" (instead of content-id or
327 content-description) for the name of the original field in the form
328 in multipart/form-data.
329
330 On the other hand, the original file names might appear within a
331 multipart/mixed stream, and thus the proposal avoids defining a new
332 header field for transmitting the original file name by using
333 the "content-disposition" field to send the file name, if
334 appropriate. (Content-disposition is apparently the proposed way of
335 encoding file names in MIME messages, as well.)
336
337 5.8 Allow form ACTION to be "mailto:"
338
339 Independent of this proposal, it would be very useful for HTML
340 interpreting user agents to allow an ACTION in a form to be a
341 "mailto:" URL. This seems like a good idea, with or without this
342 proposal. Similarly, the ACTION for an HTML form which is received
343 via mail should probably default to the "reply-to:" of the message.
344 These two proposals would allow HTML forms to be served via HTTP
345 servers but sent back via mail, or, alternatively, allow HTML forms
346 to be sent by mail, filled out by HTML-aware mail recipients, and
347 the results mailed back.
348
349 5.9 Remote files with third-party transfer
350
351 In some scenarios, the user operating the client software might want
352 to specify a URL for remote data rather than a local file. In this
353 case, is there a way to allow the browser to send to the server a
354 pointer to the external data rather than the entire contents? This
355 capability could be implemented, for example, by having the client
356 send to the server data of type "message/external-body" with
357 "access-type" set to, say, "uri", and the URL of the remote data in
358 the body of the message.
359
360 5.10 File transfer with ENCTYPE=x-www-form-urlencoded
361
362 If a form contains <INPUT TYPE=file> elements but does not contain
363 an ENCTYPE in the enclosing <FORM>, the behavior is not specified.
364 It is probably inappropriate to attempt to URL-encode large
365 quantities of data to servers that don't expect it.
366
367 5.11 CRLF used as line separator
368
369 As with all MIME transmissions, CRLF is used as the separator for
370 lines in a POST of the data in multipart/form-data.
371
372 6. Examples
373
374 Suppose the server supplies the following HTML:
375
376 <FORM ACTION="http://server.dom/cgi/handle"
377 ENCTYPE="multipart/form-data"
378 METHOD=POST>
379 What is your name? <INPUT TYPE=TEXT NAME=submitter>
380 What files are you sending? <INPUT TYPE=FILE NAME=pics>
381 </FORM>
382
383 and the user types "Joe Blow" in the name field, and selects
384 a text file "file1.txt" and also an image file "file2.gif" for
385 the answer to 'What files are you sending?'.
386
387 The client would send back the following data:
388
389 Content-type: multipart/form-data; boundary=AaB03x
390 --AaB03x
391 Name: submitter
392
393 Joe Blow
394 --AaB03x
395 Name: pics
396 Content-type: multipart/mixed; boundary=BbC04y
397
398 --BbC04y
399 Content-Description: file1.txt
400 Content-Type: text/plain
401 Content-Transfer-Encoding: binary
402
403 ... contents of file1.txt ...
404 --BbC04y
405 Content-Description: file2.gif
406 Content-type: image/gif
407 Content-Transfer-Encoding: binary
408
409 ...contents of file2.gif...
410 --BbC04y--
411 --AaB03x--
412
413
414 7. Conclusion
415
416 The suggested implementation gives the client a lot of flexibility in
417 the number and types of files it can send to the server, it gives the
418 server control of the decision to accept the files, and it gives
419 servers a chance to interact with browsers which do not support INPUT
420 TYPE "file".
421
422 The change to the HTML DTD is very simple, but very powerful. It
423 enables a much greater variety of services to be implemented via the
424 World-Wide Web than is currently possible due to the lack of a file
425 submission facility. This would be an extremely valuable addition to
426 the capabilities of the World-Wide Web.
427
428
429 A. Authors' Addresses
430
431 Larry Masinter masinter@parc.xerox.com
432 Xerox Palo Alto Research Center Voice: (415) 812-4365
433 3333 Coyote Hill Road Fax: (415) 812-4333
434 Palo Alto, CA 94304
435
436 Ernesto Nebel nebel@xsoft.sd.xerox.com
437 XSoft, Xerox Corporation Voice: (619) 676-7817
438 10875 Rancho Bernardo Road, Suite 200 Fax: (619) 676-7865
439 San Diego, CA 92127-2116
440
