Revision 1.1
Tue Jun 15 08:04:04 2004 UTC (19 years, 11 months ago) by wakaba
Branch: MAIN
CVS Tags: HEAD
File MIME type: text/plain
New

1
2 INTERNET-DRAFT E. Nebel
3 File Transmission from WWW Browsers to Servers L. Masinter
4 draft-ietf-html-fileupload-00.txt Xerox Corporation
5 Expires May 18, 1995 November 18, 1994
6
7
8 File Transfer from World-Wide Web Browsers to Servers
9
10 i. Status of this Memo
11
12 This document is an Internet-Draft. Internet-Drafts are working
13 documents of the Internet Engineering Task Force (IETF), its areas,
14 and its working groups. Note that other groups may also distribute
15 working documents as Internet-Drafts.
16
17 Internet-Drafts are draft documents valid for a maximum of six months
18 and may be updated, replaced, or obsoleted by other documents at any
19 time. It is inappropriate to use Internet-Drafts as reference
20 material or to cite them other than as ``work in progress.''
21
22 To learn the current status of any Internet-Draft, please check the
23 ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
24 Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
25 munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
26 ftp.isi.edu (US West Coast).
27
28
29 1. Abstract
30
31 Currently, a World-Wide Web server can get information from users
32 with HTML forms. These forms have proven useful in a wide variety
33 of applications in which input from the user is necessary. But this
34 capability is still greatly limited because HTML forms don't provide
35 a way for the user to submit files to the server. Service providers
36 who need to get files from the user have had to implement custom
37 browsers. (Examples of these custom browsers have appeared on the
38 www-talk mailing list.) To avoid the necessity for custom browsers
39 and to make WWW servers complete in their ability to get information
40 from the user, the WWW needs to provide a way for users to send files
41 to servers. Since user information is sent back to the server using
42 HTML forms, it is most logical to extend HTML forms to support file
43 submission.
44
45 This document proposes an extension to HTML to allow forms to request
46 that users supply files as data to be returned when the form has been
47 completely filled out and submitted. It also includes a description of
48 a backward compatibility strategy that allows new servers to interact
49 with old WWW browsers.
50
51 2. HTML forms with file submission
52
53 The current draft HTML specification <URL:http://www.hal.com/
54 %7Fconnolly/html-spec/spyglass-19941014/html-19941014.txt.Z> defines
55 eight possible values for the attribute TYPE of an INPUT element:
56 CHECKBOX, HIDDEN, IMAGE, PASSWORD, RADIO, RESET, SUBMIT, TEXT.
57
58 In addition, it defines the ENCTYPE attribute of the FORM
59 element using the POST METHOD to have the default value
60 "application/x-www-form-urlencoded".
61
62 This proposal makes two changes:
63 1) add a FILE option for the TYPE attribute of INPUT
64 2) allow the ENCTYPE of a FORM to be "multipart/www-form-data".
65
66 These changes might be considered independently, but are both
67 necessary for reasonable file upload.
68
69 The author of an HTML form who wants to request one or more files
70 from a user would write (for example):
71
72 <FORM ENCTYPE="multipart/www-form-data" ACTION="_URL_" METHOD=POST>
73
74 File to process: <INPUT NAME="userfile1" TYPE="file">
75
76 <INPUT TYPE="submit" VALUE="Send File">
77
78 </FORM>
79
80 The change to the HTML DTD is trivial--just one item added to the
81 entity "InputType", as follows:
82
83
84 ... (other elements) ...
85
86 <!ENTITY % InputType "(TEXT | PASSWORD | CHECKBOX |
87 RADIO | SUBMIT | RESET |
88 IMAGE | HIDDEN | FILE )">
89 <!ELEMENT INPUT - O EMPTY>
90 <!ATTLIST INPUT
91 TYPE %InputType TEXT
92 NAME CDATA #IMPLIED -- required for all but submit and reset --
93 VALUE CDATA #IMPLIED
94 SRC %URI #IMPLIED -- for image inputs --
95 CHECKED (CHECKED) #IMPLIED
96 SIZE CDATA #IMPLIED -- like NUMBERS,
97 but delimited with comma, not space --
98 MAXLENGTH NUMBER #IMPLIED
99 ALIGN (top|middle|bottom) #IMPLIED
100 >
101
102 ... (other elements) ...
103
104 This is the minimal change requested. Other, larger changes to the
105 InputType entity might also be contemplated but are not part of this
106 proposal. For example, an INPUT element might usefully have an
107 attribute which identifies a set of acceptable media-types, e.g.,
108 <INPUT TYPE=file ACCEPT="image/gif, image/tiff" NAME="image1">.
109
110
111 3. Proposed implementation
112
113 The proposed implementation in WWW browsers is, when an INPUT tag of
114 type FILE is encountered, to show a display of (previously selected)
115 file names, and a "Browse" button or selection method. Selecting the
116 "Browse" button would cause the browser to enter into a file selection
117 mode appropriate for the platform. Window-based browsers might pop up
118 a file selection window, for example. In such a file selection dialog,
119 the user would have the option of replacing a current selection,
120 adding a new file selection, etc. Browser implementors might choose
121 to let the list of file names be manually edited.
122
123 When the user completes the form, and selects the SUBMIT element, the
124 browser should send the form data and the content of the selected
125 files. The encoding type "application/x-www-form-urlencoded" is
126 inefficient for sending large quantities of binary data.
127 Thus, a (new) media type, "multipart/www-form-data" is proposed as a
128 way of efficiently sending the values associated with a filled-out
129 form from client to server.
130
131 The media-type (MIME-type) "multipart/www-form-data" follows the
132 rules of all multipart MIME data streams as outlined in RFC 1521--a
133 boundary is selected that does not occur (with more than
134 infinitesimal probability) in any of the data. Each field of the
135 form is sent, in the order in which it occurs in the form, as a
136 part of the multipart stream. Each part identifies the INPUT name
137 within the original HTML form using a "Name: " attribute. Each
138 part has an optional Content-Type (which defaults to
139 text/plain). File inputs should be identified as either
140 application/binary or the appropriate media type, if
141 known. If multiple files were selected, they should be
142 transferred together using the multipart/mixed format.
143 The "content-transfer-encoding" for each part should be
144 "binary".
145
146 File inputs may optionally identify the file name using the
147 "Content-Description" header. Browsers may optionally include a
148 Content-Length header both in the overall reply and in individual
149 components; the content-length is not intended as a replacement for
150 the multipart boundary as a way of detecting the end of an
151 individual component; rather, it is just a way of forewarning the
152 server of the amount of data coming.
153
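As an illustration only, a client might produce the encoding described
above along the following lines. This is a minimal Python sketch and not
part of the proposal: the function names (choose_boundary, encode_part,
encode_form) and the tuple layout are hypothetical, the boundary
parameter is written with a semicolon as in RFC 1521, field names are
assumed to be ASCII, and the nested multipart/mixed case for several
files in one field is omitted.

```python
import secrets

CRLF = b"\r\n"

def choose_boundary(payloads):
    """Pick a random boundary that does not occur in any payload."""
    while True:
        boundary = secrets.token_hex(16).encode("ascii")
        if not any(boundary in p for p in payloads):
            return boundary

def encode_part(name, content, content_type=None, description=None):
    """Encode one form field as a part carrying a "Name:" header."""
    headers = [b"Name: " + name.encode("ascii")]
    if description is not None:
        # Optional: the file name travels in Content-Description.
        headers.append(b"Content-Description: " + description.encode("ascii"))
    if content_type is not None:
        # Omitted for ordinary fields, which default to text/plain.
        headers.append(b"Content-Type: " + content_type.encode("ascii"))
    headers.append(b"Content-Transfer-Encoding: binary")
    return CRLF.join(headers) + CRLF + CRLF + content

def encode_form(fields):
    """fields: (name, content, content_type, file_name) tuples in form order.

    Returns the Content-Type header value and the encoded body."""
    boundary = choose_boundary([content for _, content, _, _ in fields])
    delimiter = b"--" + boundary
    body = b""
    for name, content, content_type, file_name in fields:
        body += delimiter + CRLF
        body += encode_part(name, content, content_type, file_name) + CRLF
    body += delimiter + b"--" + CRLF  # closing delimiter
    header = "multipart/www-form-data; boundary=" + boundary.decode("ascii")
    return header, body
```

A caller would pass the fields in the order they occur in the form, for
instance encode_form([("submitter", b"Joe Blow", None, None),
("pics", data, "image/gif", "file2.gif")]), where data holds the file
contents.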
154 On the server end, the ACTION might point to an HTTP URL that
155 implements the form's action via CGI. In such a case, the CGI program
156 would note that the content-type is multipart/www-form-data, parse the
157 various fields (checking for validity, writing the file data to local
158 files for subsequent processing, etc.).
159
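The server-side parsing step might be sketched as follows. This is an
assumption-laden Python illustration, not a prescribed implementation:
parse_form is a hypothetical name, the standard library "email" package
is used as a generic MIME multipart parser, and the boundary parameter
is assumed to be separated from the media type by a semicolon as RFC
1521 requires. Validation and writing file data to local files are left
to the caller.

```python
from email import message_from_bytes

def parse_form(content_type, body):
    """Return (field_name, file_name, media_type, data) tuples.

    content_type is the value of the request's Content-Type header and
    body is the raw request body as bytes."""
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = message_from_bytes(raw)
    if (message.get_content_type() != "multipart/www-form-data"
            or not message.is_multipart()):
        raise ValueError("not a multipart/www-form-data submission")
    results = []
    for part in message.get_payload():
        name = part["Name"]
        # Several files for one field arrive as a nested multipart/mixed.
        leaves = part.get_payload() if part.is_multipart() else [part]
        for leaf in leaves:
            results.append((
                name,
                leaf["Content-Description"],   # optional file name, or None
                leaf.get_content_type(),       # text/plain when absent
                leaf.get_payload(decode=True), # content as bytes
            ))
    return results
```

Each returned tuple could then be checked and, for file inputs, written
to a local file for subsequent processing.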
160 4. Backward compatibility issues
161
162 While not necessary for successful adoption of an enhancement to the
163 current WWW form mechanism, it is useful to also plan for a migration
164 strategy: users with older browsers can still participate in
165 file upload dialogs, using a helper application. Most current
166 web browsers, when given <INPUT TYPE=FILE>, will treat it as
167 <INPUT TYPE=TEXT> and give the user a text box. The user can
168 type a file name into this text box. In addition, current
169 browsers seem to ignore the ENCTYPE parameter in the <FORM>
170 element, and always transmit the data as
171 application/x-www-form-urlencoded.
172
173 Thus, the server CGI might be written in a way that would note that
174 the form data returned had content-type
175 application/x-www-form-urlencoded instead of multipart/www-form-data,
176 and know that the user was using a browser that didn't implement file
177 upload.
178
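The check described above might look like the following Python sketch.
The function name classify_submission and its three return labels are
hypothetical and serve only to illustrate the decision.

```python
def classify_submission(content_type):
    """Decide how a CGI might treat a POST, given its Content-Type value."""
    # Ignore parameters such as the boundary; compare the media type only.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "multipart/www-form-data":
        return "upload"          # browser implements file upload
    if media_type == "application/x-www-form-urlencoded":
        return "needs-helper"    # older browser: file fields hold names only
    return "unsupported"
```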
179 In this case, rather than replying with a "text/html" response, the
180 CGI on the server could instead send back a data stream that a helper
181 application might process instead; this would be a data stream that
182 contained:
183 * The (fully qualified) URL to which the actual form data should
184 be posted (terminated with CRLF)
185 * The list of field names that were supposed to be file contents
186 (space separated, terminated with CRLF)
187 * The entire original application/x-www-form-urlencoded form data.
188
189 This data stream would be marked as application/x-please-send-files.
190
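A server might assemble this three-part data stream roughly as shown in
the Python sketch below. The function name build_send_files_reply is
hypothetical, and the sketch assumes that the URL and field names are
plain ASCII and that the original form data is available as a list of
(name, value) pairs.

```python
from urllib.parse import urlencode

CRLF = "\r\n"

def build_send_files_reply(action_url, file_field_names, form_fields):
    """Build the body of an application/x-please-send-files reply."""
    url_line = action_url + CRLF                      # where to post
    names_line = " ".join(file_field_names) + CRLF    # fields that are files
    original_data = urlencode(form_fields)            # form data as received
    return (url_line + names_line + original_data).encode("ascii")
```

For the form in the Examples section, calling
build_send_files_reply("http://server.dom/cgi/handle", ["pics"],
[("submitter", "Joe Blow"), ("pics", "file1.txt")]) would yield the URL
line, the line "pics", and then "submitter=Joe+Blow&pics=file1.txt".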
191 In this case, the browser needs to be configured to process
192 application/x-please-send-files to launch a helper application.
193
194 The helper would read the form data, note which fields contained
195 'local file names' that needed to be replaced with their data
196 content, might itself prompt the user for changing or adding to the
197 list of files available, and then repackage the data & file contents
198 in multipart/www-form-data for retransmission back to the server.
199
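The first step of such a helper, reading the stream and separating the
file-name fields from ordinary fields, might be sketched in Python as
follows. The name parse_send_files_reply is hypothetical; prompting the
user and repackaging the data are not shown.

```python
from urllib.parse import parse_qsl

def parse_send_files_reply(stream):
    """Split an application/x-please-send-files body into its pieces.

    Returns (url, plain_fields, file_fields), where the two lists hold
    (name, value) pairs and file_fields values are local file names."""
    url_line, names_line, form_data = stream.split(b"\r\n", 2)
    file_names = names_line.decode("ascii").split()
    fields = parse_qsl(form_data.decode("ascii"), keep_blank_values=True)
    plain_fields = [(n, v) for n, v in fields if n not in file_names]
    file_fields = [(n, v) for n, v in fields if n in file_names]
    return url_line.decode("ascii"), plain_fields, file_fields
```

The helper would then confirm each entry of file_fields with the user
before reading the named files and posting the repackaged data to the
returned URL.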
200 The helper would generate the kind of data that a 'new' browser should
201 actually have sent in the first place, with the intention that the URL
202 to which it is sent corresponds to the original ACTION URL. The point
203 of this is that the server can use the *same* CGI to implement the
204 mechanism for dealing with both old and new browsers.
205
206 The helper need not display the form data, but *should* ensure that
207 the user actually be prompted about the suitability of sending the
208 files requested (this is to avoid a security problem with malicious
209 servers that ask for files that weren't actually promised by the
210 user.) It would be useful if the status of the transfer of the files
211 involved could be displayed.
212
213 5. Other considerations
214
215 Compression:
216
217 This scheme doesn't address the possible compression of files.
218 After some consideration, it seemed that the optimization issues of
219 file compression were too complex to try to automatically have
220 browsers decide that files should be compressed. Many link-layer
221 transport mechanisms (e.g., high-speed modems) perform data
222 compression over the link, and optimizing for compression at this
223 layer might not be appropriate. It might be possible for browsers to
224 optionally produce a content-transfer-encoding of x-compress for
225 file data, and for servers to decompress the data before processing,
226 if desired; this was left out of the proposal, however.
227
228 Deferred file transmission:
229
230 In some situations, it might be advisable to have the server validate
231 various elements of the client's data (user name, account, etc.)
232 before actually preparing to receive the data. However, after some
233 consideration, it seemed best to require that servers that wish to do
234 this should implement this as a series of forms, where some of the
235 data elements that were previously validated might be sent back to
236 the client as 'hidden' fields. This puts the onus of maintaining the
237 state of a transaction only on those servers that wish to build a
238 complex application, while allowing those cases that have simple
239 input needs to be built simply.
240
241 Other choices for return transmission of binary data:
242
243 Various people have suggested using a new MIME top-level type
244 "aggregate", e.g., aggregate/mixed or a content-transfer-encoding of
245 "packet" to express indeterminate-length binary data, rather than
246 relying on the multipart-style boundaries. While we are not opposed
247 to doing so, this would require additional design and standardization
248 work to get acceptance of "aggregate". On the other hand, the
249 'multipart' mechanisms are well established, trivial to implement on
250 both the sending client and receiving server, and as efficient as
251 other methods of dealing with multiple combinations of binary data.
252
253 Not overloading <INPUT>:
254
255 Various people have wondered about the advisability of overloading
256 'INPUT' for this function, rather than merely providing a different
257 type of FORM element. Among other considerations, the migration
258 strategy which is allowed when using <INPUT> is important. In
259 addition, the <INPUT> field *is* already overloaded to contain most
260 kinds of data input; rather than creating multiple kinds of <INPUT>
261 tags, it seems most reasonable to enhance <INPUT>. The 'type' of
262 INPUT is not the content-type of what is returned, but rather the
263 'widget-type'; i.e., it identifies the interaction style with the
264 user. The description here is carefully written to allow <INPUT
265 TYPE=FILE> to work for text browsers or audio-markup.
266
267 Default content-type of field data:
268
269 Many input fields in HTML are to be typed in. There has been some
270 ambiguity as to how form data should be transmitted back to servers.
271 Defining the content-type of <INPUT> fields as text/plain makes it
272 clear that the client should properly encode the data, with CRLF
273 line breaks, before sending it back to the server.
274
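As a small illustration, a client might normalize typed-in text to CRLF
line breaks before transmission as in the Python sketch below. The
function name to_crlf is hypothetical.

```python
import re

def to_crlf(text):
    """Convert bare CR, bare LF, and CRLF line breaks to CRLF."""
    # "\r\n" is listed first so an existing CRLF is matched as one unit.
    return re.sub(r"\r\n|\r|\n", "\r\n", text)
```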
275 6. Examples
276
277 Suppose the server supplies the following HTML:
278
279 <FORM ENCTYPE="multipart/www-form-data" ACTION="http://server.dom/cgi/handle" METHOD=POST>
280 What is your name? <INPUT TYPE=TEXT NAME=submitter>
281 What files are you sending? <INPUT TYPE=FILE NAME=pics>
282 </FORM>
283
284 and the user types "Joe Blow" in the name field, and selects
285 a text file "file1.txt" and also an image file "file2.gif" for
286 the answer to 'What files are you sending?'.
287
288 The client would send back the following data:
289
290 Content-type: multipart/www-form-data; boundary=AaB03x
291 --AaB03x
292 Name: submitter
293
294 Joe Blow
295 --AaB03x
296 Name: pics
297 Content-type: multipart/mixed; boundary=BbC04y
298
299 --BbC04y
300 Content-Description: file1.txt
301 Content-Type: text/plain
302
303 ... contents of file1.txt ...
304 --BbC04y
305 Content-Description: file2.gif
306 Content-type: image/gif
307
308 ...contents of file2.gif...
309 --BbC04y--
310 --AaB03x--
311
312 7. Conclusion
313
314 The suggested implementation gives the client a lot of flexibility in
315 the number and types of files it can send to the server, gives the
316 server control of the decision to accept the files, and gives
317 servers a chance to interact with browsers which do not support INPUT
318 TYPE "file".
319
320 The change to the HTML DTD is very simple, but very powerful. It
321 enables a much greater variety of services to be implemented via the
322 World-Wide Web than is currently possible due to the lack of a file
323 submission facility. This would be an extremely valuable addition to
324 the capabilities of the World-Wide Web.
325
326
327 A. Authors' Addresses
328
329 Larry Masinter masinter@parc.xerox.com
330 Xerox Palo Alto Research Center Voice: (415) 812-4365
331 3333 Coyote Hill Road Fax: (415) 812-4333
332 Palo Alto, CA 94304
333
334 Ernesto Nebel nebel@xsoft.sd.xerox.com
335 XSoft, Xerox Corporation Voice: (619) 676-7817
336 10875 Rancho Bernardo Road, Suite 200 Fax: (619) 676-7865
337 San Diego, CA 92127-2116
