INTERNET-DRAFT                                                J. Salsman
Suggested filename: <draft-www-device-upload-00.txt>      WebTV Networks
Expiration date: 15 May 1998                            12 November 1997

               Form-based Device Input and Upload in HTML

Status of this Memo

This draft extends an experimental protocol for the Internet
community. This draft does not specify an Internet standard of any
kind. Discussion and suggestions for improvement are requested.
Distribution of this memo will be unlimited when the W3C approves the
HTML 4.0 standard. Until then, please do not distribute this draft
beyond your department.

1. Abstract and introduction

Currently, HTML forms allow the producer of the form to request
information -- including files of data -- from the operator reading
the form. However, this capability is limited because HTML forms
don't provide a way to ask the operator to submit input from
arbitrary sources such as audio devices like microphones. Since
input and upload from various devices is a feature that will
benefit many applications, this draft proposes an extension to the
HTML INPUT TYPE=FILE form element specified in RFC 1867 to allow
information providers to express requests for uploads from audio
and other devices uniformly, and discusses MIME audio data types
to facilitate useful audio upload responses. This draft also
includes security and audio usability and quality discussions as
well as a description of a backward compatibility strategy that
allows new user agents to utilize HTML written with earlier
proposals for audio input in mind, and concludes with motivations.

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its
areas, and its working groups. Note that other groups may also
distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-Drafts
as reference material or to cite them other than as "work in
progress."

To view the entire list of current Internet-Drafts, please check
the "1id-abstracts.txt" listing contained in the Internet-Drafts
Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net
(Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East
Coast), or ftp.isi.edu (US West Coast).

2. HTML forms with device input file upload submission

Section 3.1 of RFC 1867 provides for the presentation of an
arbitrary "widget" to specify input for file uploads. When an
INPUT tag of type FILE is encountered with a DEVICE attribute, the
associated value (such as MICROPHONE, or MIC) might select the use
of a widget capable of buffering and editing real-time input (such
as speech) instead of entering a file selection mode.

If an ACCEPT attribute is present in a device file input element,
the browser might constrain the MIME type of uploaded data to match
one of the types in the specified list. If the value of the DEVICE
attribute is FILESYSTEM or FILES then the INPUT element might be
treated as usual according to RFC 1867 except that the subset of
files presented to the operator to choose from may be constrained
by the specified list of MIME types instead of a pattern of file
names or extensions.

Device input has no original filename of the kind specified in
section 3.3 of RFC 1867 for parameters of the 'content-disposition:
form-data' and 'content-disposition: file' HTTP headers. Those
headers might instead be provided with a 'type' parameter
representing the MIME type of the encoded data, if known, and a
'device' parameter with the same value as the DEVICE attribute of
the associated form input element. If the device or MIME type(s)
specified are unsupported, the value of the 'device' header
parameter might be 'unsupported'; if the device is unavailable, the
value might be 'unavailable'. If the MIME types requested are
unsupported, an additional parameter 'alternates' might be included
with a space-separated list of MIME types of the same content-type
which may be supported as alternatives for the specified device.
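
The parameter rules above can be sketched in Python as follows. This
is only an illustration under stated assumptions: the function name
device_disposition, its arguments, and the choice to quote every
parameter value are hypothetical and are not defined by this draft or
by RFC 1867.

```python
def device_disposition(name, device, mime_type=None,
                       supported=True, available=True, alternates=()):
    """Build a hypothetical 'content-disposition: form-data' value.

    Assumption: parameter values are written as quoted strings; the
    draft does not say how 'type', 'device', or 'alternates' are quoted.
    """
    if not supported:
        # Device or requested MIME type(s) are unsupported.
        device_value = "unsupported"
    elif not available:
        # Device is supported but cannot be used right now.
        device_value = "unavailable"
    else:
        # Echo the DEVICE attribute of the form input element.
        device_value = device

    params = ['name="%s"' % name]
    if mime_type is not None and supported and available:
        # 'type' is only sent when the encoding is actually known.
        params.append('type="%s"' % mime_type)
    params.append('device="%s"' % device_value)
    if not supported and alternates:
        # Space-separated list of MIME types offered instead.
        params.append('alternates="%s"' % " ".join(alternates))
    return "form-data; " + "; ".join(params)


if __name__ == "__main__":
    print(device_disposition("SPEECH1", "MIC", "audio/l16"))
    print(device_disposition("SPEECH2", "MICROPHONE", supported=False,
                             alternates=["audio/basic", "audio/l16"]))
```

A server receiving such a part could check the 'device' parameter
first, and only then decide whether the body holds usable data.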

There may be significant limitations on the client browser's
ability to buffer input for upload. Browsers might provide an
estimate of the default MAXLENGTH available for device input and
upload through the HTTP header 'Pragma: DEVICE-MAXLENGTH=' followed
by BYTES, a number which represents the content-length available to
the browser for buffering (see section 14.32 of RFC 2068).
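
As a rough sketch of how a server might read this hint, the Python
function below extracts the byte count from a Pragma header value. It
assumes the directive appears in the comma-separated directive list
that RFC 2068 defines for Pragma and that BYTES is a plain decimal
integer; the function name device_maxlength is hypothetical.

```python
def device_maxlength(pragma_value):
    """Return the advertised buffer size in bytes, or None.

    pragma_value is the text after 'Pragma:'. Directives are
    comma-separated; the directive name is matched case-insensitively.
    """
    for directive in pragma_value.split(","):
        key, sep, value = directive.strip().partition("=")
        if sep and key.strip().upper() == "DEVICE-MAXLENGTH":
            value = value.strip()
            # Accept only plain ASCII decimal digits.
            if value.isascii() and value.isdigit():
                return int(value)
    return None


if __name__ == "__main__":
    print(device_maxlength("no-cache, DEVICE-MAXLENGTH=65536"))
```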

Furthermore, the VALUE attribute may be used to disambiguate
between multiple similar devices when more than one is present.

If real time events, such as those described and proposed by
Gregory S. Aist in "A General Architecture for a Real-Time
Discourse Agent and a Case Study in Computerized Oral Reading
Tutoring" (Carnegie Mellon University Computational Linguistics
Program, 6 December 1996), are required, then the Real-time
Transport Protocol (RTP, currently RFC 1889) should be used
instead. Because of security concerns discussed in section 3
below, HTML scripts might not be able to invoke a form submission
when the form involves any kind of file upload without explicit
instructions from the session operator to the contrary.

2.1. Examples

    <FORM ENCTYPE="multipart/form-data" METHOD=POST ACTION="_URL_">
    Say something: <INPUT NAME=SPEECH1 TYPE=FILE DEVICE=MIC>
    <INPUT TYPE=SUBMIT VALUE="Send Speech">
    </FORM>

In this simple form, the HTML author has requested the upload of
sampled microphone input from the operator upon form submission.

    <INPUT NAME=SPEECH2 TYPE=FILE DEVICE=MICROPHONE
    ACCEPT="audio/l16 ;rate=11025 ;channels=1 audio/x-cepstral-voc">

Here MIC is not used as an abbreviation. The author of the HTML has
requested that the data input from the microphone be encoded as
either the MIME type Audio/L16 -- sixteen bit signed linear audio
samples (most-significant byte first) -- as specified in RFC 1890
section 4.4.8, with a single (monaural) channel and a sample rate of
11,025 samples per second, or an unspecified extended MIME Audio
type named 'x-cepstral-voc'.
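
One way a browser might split such an ACCEPT value into types and
parameters is sketched below in Python. The draft does not give a
grammar for the attribute, so this assumes that types are separated
by whitespace and that each ';name=value' parameter belongs to the
type it follows; the function name parse_accept is hypothetical.

```python
import re


def parse_accept(accept):
    """Split an ACCEPT value into (mime_type, params) pairs.

    Assumed syntax: whitespace-separated MIME types, each optionally
    followed by ';name=value' parameters (whitespace around ';' is
    tolerated).
    """
    # Glue parameters onto the type they follow, so that a plain
    # whitespace split then yields one item per MIME type.
    normalized = re.sub(r"\s*;\s*", ";", accept.strip())
    result = []
    for item in normalized.split():
        mime_type, *raw_params = item.split(";")
        params = {}
        for raw in raw_params:
            key, sep, value = raw.partition("=")
            if sep:
                params[key.lower()] = value
        result.append((mime_type.lower(), params))
    return result


if __name__ == "__main__":
    value = "audio/l16 ;rate=11025 ;channels=1 audio/x-cepstral-voc"
    for mime_type, params in parse_accept(value):
        print(mime_type, params)
```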

    <INPUT NAME=FILE1 TYPE=FILE DEVICE=FILES ACCEPT="text/*">

Here the form element may be used to upload a file as usual, except
that the files to select from might be constrained to text files,
without explicit regard to their filenames or extensions.

    <INPUT NAME=PICTURE1 TYPE=FILE DEVICE=CAMERA VALUE=2>

The final example shows how these extensions may be used to request
input from other kinds of devices, such as the second of two or
more cameras connected to the system running the browser.

3. Security considerations

Browser operators may not want to send their files, recordings,
pictures, video, or other device inputs to arbitrary sites without
their explicit permission and direction. Therefore, browser
authors are encouraged to disallow the submission of forms which
include any kind of file upload by any means other than the
standard HTML operator-controlled buttons for form submission
without explicit instruction from the session operator to the
contrary. Accordingly, the SIZE parameter, document style sheets,
and document layers may be prevented from obscuring any kind of
file upload widget, especially those capable of accepting a default
filename. Finally, just as the operator may take direct action to
initiate, terminate, review and edit recording as described in the
next section, browser authors are encouraged to prevent HTML
scripts from taking those and similar actions, unless for example
the operator has specifically enabled such script actions with a
security option. Even then, such preferences might be specified by
the operator to reset after an interval or at the end of the
session. Furthermore, explicit information might be provided to
ensure that the operator is informed when files are being uploaded.

4. User interface usability and quality concerns for audio

An audio sample is customarily recorded on computer equipment with
a dialog routine capable of allowing the user to record, pause,
play back, erase, or otherwise edit the recording. Browsers might
provide the operator with the same kind of dialog routine for audio
device input. If a MAXLENGTH has been specified or is in force
because of limited buffer size, the buffer size used and remaining
might be displayed as a dynamic bar graph (or as a percentage if
graphics are unavailable). A display of time in seconds used and
remaining in the buffer may also be provided.
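
The figures for such a display follow directly from the buffer size
and the encoding. The Python sketch below assumes uncompressed linear
samples (such as audio/l16) and a non-zero buffer limit; the function
name buffer_status and its argument names are hypothetical.

```python
def buffer_status(used_bytes, max_bytes, rate, channels=1,
                  bytes_per_sample=2):
    """Return percentage used and seconds used/remaining.

    Assumes uncompressed linear audio, so the data rate is simply
    rate * channels * bytes_per_sample bytes per second.
    """
    bytes_per_second = rate * channels * bytes_per_sample
    remaining_bytes = max(max_bytes - used_bytes, 0)
    return {
        "percent_used": 100.0 * used_bytes / max_bytes,
        "seconds_used": used_bytes / bytes_per_second,
        "seconds_remaining": remaining_bytes / bytes_per_second,
    }


if __name__ == "__main__":
    # 16-bit mono at 11,025 samples per second, 88,200-byte buffer.
    print(buffer_status(22050, 88200, 11025))
```

For a compressed or feature-extraction encoding, the bytes-per-second
figure would have to come from the encoder instead.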

Most MIME types defined for audio do not provide high-quality audio
encodings. The 'audio/basic' and other types which use a sample
rate of 8,000 samples per second truncate the audio spectrum at
4,000 Hz according to the Nyquist theorem, discarding information
important for discerning consonants. Also, audio/basic and other
MIME Audio types use a sample size of eight bits, which does not
usually provide enough dynamic range for accurate automatic speech
recognition unless published automatic gain control algorithms are
reliably used. If sixteen-bit signed audio encodings are used
according to section 4.4.8 of RFC 1890, the sample rate --
specified as the 'rate' parameter of the MIME type 'audio/l16' --
might be at least 11,025 or 16,000 samples per second to provide
sufficient information for automatic speech recognition. Otherwise,
the audio feature extraction encoding of the speech recognition
algorithm might be used to provide a more compact representation to
shorten the upload.
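
The trade-off between bandwidth and upload size can be worked out
with a few lines of Python. This sketch only restates the arithmetic
above (the Nyquist limit is half the sample rate; uncompressed size
is rate times channels times sample size); the function names are
hypothetical.

```python
def nyquist_hz(rate):
    """Highest frequency representable at the given sample rate."""
    return rate / 2


def upload_bytes(seconds, rate, channels=1, bits=16):
    """Size of an uncompressed linear recording in bytes."""
    return seconds * rate * channels * bits // 8


if __name__ == "__main__":
    for rate in (8000, 11025, 16000):
        print(rate, nyquist_hz(rate), upload_bytes(10, rate))
```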

5. Compatibility with earlier forms of audio input

Audio device input has been proposed before and implemented from a
microphone at least as early as 1994 in experimental versions of
common Web browsers. To accommodate the syntax of these earlier
extensions, a browser might interpret an HTML element such as

    <INPUT TYPE=AUDIO ...>

as the device input form

    <INPUT TYPE=FILE DEVICE=MICROPHONE ...>

with all other attribute/value pairs of the original INPUT element
kept the same as specified. This would retain compatibility for
all implementations of which the author of this draft is aware.
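
The rewriting rule can be sketched as a small Python function over an
already-parsed attribute table. The representation of an INPUT
element as a dictionary and the function name upgrade_audio_input are
assumptions made for illustration only.

```python
def upgrade_audio_input(attrs):
    """Map legacy <INPUT TYPE=AUDIO ...> attributes to the device form.

    attrs maps attribute names to values. Names are normalized to
    upper case; every attribute other than TYPE is kept unchanged.
    """
    normalized = {key.upper(): value for key, value in attrs.items()}
    input_type = str(normalized.get("TYPE") or "")
    if input_type.upper() != "AUDIO":
        # Not a legacy audio input: leave the element alone.
        return normalized
    upgraded = dict(normalized)
    upgraded["TYPE"] = "FILE"
    upgraded["DEVICE"] = "MICROPHONE"
    return upgraded


if __name__ == "__main__":
    print(upgrade_audio_input({"type": "audio", "name": "SPEECH1"}))
```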

6. HTML Document Type Definition changes

Along with the extension to the HTML InputType entity described in
the previous section, this proposal makes an addition to the HTML
DTD for the INPUT element ATTLIST of an #IMPLIED attribute DEVICE
of type CDATA.

7. Motivations and conclusion

The primary motivation for these extensions is to add the
capability of speech input to Web-based educational systems. For
example, the "Test of English as a Foreign Language," or TOEFL
assessment consists of multiple-choice questions based on media
composed of text and audio recordings, so it would be possible to
represent the TOEFL with current HTML multimedia content and forms.
However, the TOEFL makes no provision whatsoever for assessing the
accuracy of pronunciation by the subjects of the assessment, except
that provided by the ability to accurately identify the terms in
the text of the assessment. So while it scores the important
ability to listen, the TOEFL does not make provisions to assess the
important ability to speak with correct pronunciation. But with
form-based audio input and upload, and speech recognition servers
capable of aligning and scoring the pronunciation of words and
phonemes, such a Web-based TOEFL could be extended to reduce the
number of inscrutable graduate teaching assistants, for example.
Of course the possibilities for language instruction enabled by
these extensions are not limited to the graduate level or English.

Other motivations include the development of "dictation servers"
capable of transforming spoken audio uploaded through an HTTP
session to the corresponding text suitable for sending in email or
including in another document, for example. Natural language
continuous speech recognition software conforming to standard APIs
for automatic dictation is as of this writing available from retail
outlets for less than US$90, so there is ample reason to believe
that dictation servers could soon become commonplace on the Web
with these extensions.

Finally, this could be a great help for hearing impaired people who
want to use a "phonology server" (similar to the server described
in the Web-TOEFL example above) to practice improving their
pronunciation without depending on a human speech coach.

The change to the HTML DTD is very simple, but very powerful. It
enables a much greater variety of services to be implemented via
the World-Wide Web than is currently possible due to the lack of a
peripheral input upload submission facility. This would be a very
valuable addition to the capabilities of the World-Wide Web.

8. Author's address and acknowledgments

James Salsman
Bovik Research (nonprofit research institute)
courtesy WebTV Networks, Microsoft Corporation
and MindSource Software Engineers

575 S. Rengstorff Avenue
Mountain View, CA 94040-1982

Email: jps@bovik.org, jsalsman@corp.webtv.net
Phone: (650) 938-1440

"TOEFL" and "Test Of English as a Foreign Language" are
registered trademarks of Educational Testing Service.

References

[RFC 1867] Form-based File Upload in HTML. E. Nebel & L. Masinter,
           November 1995. ftp://ds.internic.net/rfc/rfc1867.txt

[RFC 1889] RTP: A Transport Protocol for Real-Time Applications.
           H. Schulzrinne, S. Casner, R. Frederick, & V. Jacobson,
           January 1996. ftp://ds.internic.net/rfc/rfc1889.txt

[RFC 1890] RTP Profile for Audio and Video Conferences with Minimal
           Control. H. Schulzrinne, January 1996.
           ftp://ds.internic.net/rfc/rfc1890.txt

[RFC 2068] Hypertext Transfer Protocol -- HTTP/1.1. R. Fielding,
           J. Gettys, J. Mogul, H. Frystyk, & T. Berners-Lee,
           January 1997. ftp://ds.internic.net/rfc/rfc2068.txt