INTERNET-DRAFT                                              J. Salsman
Suggested filename: <draft-www-device-upload-00.txt>    WebTV Networks
Expiration date: 15 May 1998                          12 November 1997

Form-based Device Input and Upload in HTML

Status of this Memo

This draft extends an experimental protocol for the Internet
community. This draft does not specify an Internet standard of any
kind. Discussion and suggestions for improvement are requested.
Distribution of this memo will be unlimited when the W3C approves the
HTML 4.0 standard. Until then, please do not distribute this draft
beyond your department.

1. Abstract and introduction

Currently, HTML forms allow the producer of the form to request
information -- including files of data -- from the operator reading
the form. However, this capability is limited because HTML forms
do not provide a way to ask the operator to submit input from
arbitrary sources such as audio devices like microphones. Since
input and upload from various devices is a feature that will
benefit many applications, this draft proposes an extension to the
HTML INPUT TYPE=FILE form element specified in RFC 1867 to allow
information providers to express requests for uploads from audio
and other devices uniformly, and discusses MIME audio data types
to facilitate useful audio upload responses. This draft also
includes security and audio usability and quality discussions as
well as a description of a backward compatibility strategy that
allows new user agents to utilize HTML written with earlier
proposals for audio input in mind, and concludes with motivations.

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its
areas, and its working groups. Note that other groups may also
distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-Drafts
as reference material or to cite them other than as
"work in progress."

To view the entire list of current Internet-Drafts, please check
the "1id-abstracts.txt" listing contained in the Internet-Drafts
Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net
(Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East
Coast), or ftp.isi.edu (US West Coast).

2. HTML forms with device input file upload submission

Section 3.1 of RFC 1867 provides for the presentation of an
arbitrary "widget" to specify input for file uploads. When an
INPUT tag of type FILE is encountered with a DEVICE attribute, the
associated value (such as MICROPHONE, or MIC) might select the use
of a widget capable of buffering and editing real-time input (such
as speech) instead of entering a file selection mode.

If an ACCEPT attribute is present in a device file input element,
the browser might constrain the MIME type of uploaded data to match
one of the types in the specified list. If the value of the DEVICE
attribute is FILESYSTEM or FILES, then the INPUT element might be
treated as usual according to RFC 1867, except that the subset of
files presented to the operator to choose from may be constrained
by the specified list of MIME types instead of a pattern of file
names or extensions.

Since there is no original filename to supply, as specified in
section 3.3 of RFC 1867, for the parameters of the
'content-disposition: form-data' and 'content-disposition: file'
headers, those headers might instead be provided with a 'type'
parameter representing the MIME type of the encoded data, if known,
and a 'device' parameter with the same value as the DEVICE
attribute of the associated form input element. If the device or
MIME type(s) specified are unsupported, the value of the 'device'
parameter might be 'unsupported'; if the device is unavailable, the
value might be 'unavailable'. If the MIME types requested are
unsupported, an additional parameter 'alternates' might be included
with a space-separated list of MIME types of the same content-type
which may be supported as alternatives for the specified device.

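As an illustration only, the 'type', 'device', and 'alternates'
parameters described above might be assembled as in the following
Python sketch. The function name, the "status" argument, and the use
of quoted parameter values are assumptions made for this example;
this draft does not define an exact header grammar.

```python
def device_disposition(name, device, mime_type=None, status="ok", alternates=()):
    """Build a content-disposition value for one device-input part.

    'status' is a hypothetical argument: "ok", "unsupported", or
    "unavailable", matching the three cases in the text above.
    """
    params = ['name="%s"' % name]
    if mime_type is not None:
        # 'type' is only sent when the MIME type of the data is known.
        params.append('type="%s"' % mime_type)
    # 'device' echoes the DEVICE attribute unless the request cannot be met.
    params.append('device="%s"' % (device if status == "ok" else status))
    if alternates:
        # Space-separated list of MIME types offered as alternatives.
        params.append('alternates="%s"' % " ".join(alternates))
    return "form-data; " + "; ".join(params)


print(device_disposition("SPEECH1", "MIC", "audio/l16"))
```

A server receiving device="unsupported" together with an
'alternates' list could, under these assumptions, reissue the form
with an ACCEPT value drawn from that list.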
There may be significant limitations on the client browser's
ability to buffer input for upload. Browsers might provide an
estimate of the default MAXLENGTH available for device input and
upload through the HTTP header 'Pragma: DEVICE-MAXLENGTH=BYTES',
where BYTES represents the content-length available to the browser
for buffering (see section 14.32 of RFC 2068).

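A server might read the advertised buffer size roughly as follows.
This is a minimal sketch: it assumes the directive appears in a
comma-separated Pragma value, as other Pragma directives do, and the
function name is hypothetical.

```python
def parse_device_maxlength(pragma_value):
    """Return the advertised byte count, or None if it is absent."""
    for directive in pragma_value.split(","):
        key, sep, value = directive.strip().partition("=")
        value = value.strip()
        # Directive names are compared case-insensitively; the value
        # must be a plain non-negative integer number of bytes.
        if sep and key.strip().upper() == "DEVICE-MAXLENGTH" and value.isdigit():
            return int(value)
    return None


print(parse_device_maxlength("no-cache, DEVICE-MAXLENGTH=262144"))
```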
Furthermore, the VALUE attribute may be used to disambiguate
between multiple similar devices when present.

If real time events, such as those described and proposed by
Gregory S. Aist in "A General Architecture for a Real-Time
Discourse Agent and a Case Study in Computerized Oral Reading
Tutoring" (Carnegie Mellon University Computational Linguistics
Program, 6 December 1996), are required, then the Real-time
Transport Protocol (RTP, currently RFC 1889) should be used
instead. Because of security concerns discussed in section 3
below, HTML scripts might not be able to invoke a form submission
when the form involves any kind of file upload without explicit
instructions from the session operator to the contrary.

2.1. Examples

   <FORM ENCTYPE="multipart/form-data" METHOD=POST ACTION="_URL_">
   Say something: <INPUT NAME=SPEECH1 TYPE=FILE DEVICE=MIC>
   <INPUT TYPE=SUBMIT VALUE="Send Speech">
   </FORM>

In this simple form, the HTML author has requested the upload of
sampled microphone input from the operator upon form submission.

   <INPUT NAME=SPEECH2 TYPE=FILE DEVICE=MICROPHONE
    ACCEPT="audio/l16 ;rate=11025 ;channels=1 audio/x-cepstral-voc">

Here MIC is not used as an abbreviation. The author of the HTML has
requested that the data input from the microphone be encoded as
either the MIME type Audio/L16 -- sixteen-bit signed linear audio
samples (most-significant byte first) -- as specified in RFC 1890
section 4.4.8, with a single (monaural) channel and a sample rate of
11,025 samples per second, or an unspecified extended MIME Audio
type named 'x-cepstral-voc'.

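The ACCEPT value in the SPEECH2 example could be split into MIME
types and their parameters along the following lines. This sketch
assumes the syntax seen in that example: whitespace-separated
tokens, with each token that begins with ';' attaching a parameter
to the preceding type. This draft does not define the grammar
formally, and the function name is hypothetical.

```python
def parse_accept(value):
    """Split an ACCEPT attribute value into (mime_type, params) pairs."""
    entries = []
    for token in value.split():
        head, *rest = token.split(";")
        if head:
            # A token that does not begin with ';' opens a new MIME type.
            entries.append((head.lower(), {}))
        for param in rest:
            key, sep, val = param.partition("=")
            if sep and entries:
                # Parameters belong to the most recently seen type.
                entries[-1][1][key.lower()] = val
    return entries


for mime_type, params in parse_accept(
        "audio/l16 ;rate=11025 ;channels=1 audio/x-cepstral-voc"):
    print(mime_type, params)
```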
   <INPUT NAME=FILE1 TYPE=FILE DEVICE=FILES ACCEPT="text/*">

Here the form element may be used to upload a file as usual, except
that the files to select from might be constrained to text files,
without explicit regard to their filenames or extensions.

   <INPUT NAME=PICTURE1 TYPE=FILE DEVICE=CAMERA VALUE=2>

The final example shows how these extensions may be used to request
input from other kinds of devices, such as the second of two or
more cameras connected to the system running the browser.

3. Security considerations

Browser operators may not want to send their files, recordings,
pictures, video, or other device inputs to arbitrary sites without
their explicit permission and direction. Therefore, browser
authors are encouraged to disallow the submission of forms which
include any kind of file upload by any means other than the
standard HTML operator-controlled buttons for form submission
without explicit instruction from the session operator to the
contrary. Accordingly, the SIZE attribute, document style sheets,
and document layers may be prevented from obscuring any kind of
file upload widget, especially those capable of accepting a default
filename. Finally, just as the operator may take direct action to
initiate, terminate, review, and edit recording as described in the
next section, browser authors are encouraged to prevent HTML
scripts from taking those and similar actions, unless for example
the operator has specifically enabled such script actions with a
security option. Even then, such preferences might be specified by
the operator to reset after an interval or at the end of the
session. Furthermore, explicit information might be provided to
ensure that the operator is informed when files are being uploaded.

4. User interface usability and quality concerns for audio

An audio sample is customarily recorded on computer equipment with
a dialog routine capable of allowing the user to record, pause,
play back, erase, or otherwise edit the recording. Browsers might
provide the operator with the same kind of dialog routine for audio
device input. If a MAXLENGTH has been specified or is in force
because of limited buffer size, the buffer space used and remaining
might be displayed as a dynamic bar graph (or as a percentage if
graphics are unavailable). A display of time in seconds used and
remaining in the buffer may also be provided.

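The arithmetic behind such a display is simple enough to sketch.
The example below assumes uncompressed linear samples, defaulting to
16-bit monaural audio at 11,025 samples per second; the function
names and the text-only bar are illustrative, not part of this
proposal.

```python
def buffer_status(bytes_used, max_bytes, rate=11025, channels=1, sample_bytes=2):
    """Summarize buffer use as a percentage and as seconds of audio."""
    bytes_per_second = rate * channels * sample_bytes
    return {
        "percent_used": round(100 * bytes_used / max_bytes),
        "seconds_used": bytes_used / bytes_per_second,
        "seconds_remaining": (max_bytes - bytes_used) / bytes_per_second,
    }


def text_bar(percent_used, width=20):
    """Render a character-cell bar for browsers without graphics."""
    filled = round(width * percent_used / 100)
    return "[" + "#" * filled + "-" * (width - filled) + "]"


status = buffer_status(110250, 220500)
print(text_bar(status["percent_used"]), status["seconds_remaining"], "s left")
```

With these defaults one second of audio occupies 22,050 bytes, so a
220,500-byte buffer holds ten seconds of speech.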
Most MIME types defined for audio do not provide high-quality audio
encodings. The 'audio/basic' and other types which use a sample
rate of 8,000 samples per second truncate the audio spectrum at
4,000 Hz according to the Nyquist theorem, discarding information
important for discerning consonants. Also, audio/basic and other
MIME Audio types use a sample size of eight bits, which does not
usually provide enough dynamic range for accurate automatic speech
recognition unless published automatic gain control algorithms are
reliably used. If sixteen-bit signed audio encodings are used
according to section 4.4.8 of RFC 1890, the sample rate --
specified as the 'rate' parameter of the MIME type 'audio/l16' --
might be at least 11,025 or 16,000 samples per second to provide
sufficient information for automatic speech recognition. Otherwise,
the audio feature extraction encoding of the speech recognition
algorithm might be used to provide a more compact representation to
shorten the upload.

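The figures above can be checked with a short calculation. The
sketch below uses the Nyquist limit (half the sample rate) and the
usual rule of roughly 6 dB of dynamic range per bit of linear
sample size. The function names are illustrative, and the dynamic
range formula applies to linear samples only; audio/basic uses
companded (mu-law) samples, so the eight-bit figure is only a rough
guide for that type.

```python
import math


def nyquist_hz(sample_rate):
    """Highest frequency representable at the given sample rate."""
    return sample_rate / 2


def dynamic_range_db(bits):
    """Approximate dynamic range of linear samples of the given size."""
    return 20 * math.log10(2 ** bits)


for rate in (8000, 11025, 16000):
    print(rate, "samples/s ->", nyquist_hz(rate), "Hz")
for bits in (8, 16):
    print(bits, "bits ->", round(dynamic_range_db(bits), 1), "dB")
```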
5. Compatibility with earlier forms of audio input

Audio device input has been proposed before and implemented from a
microphone at least as early as 1994 in experimental versions of
common Web browsers. To accommodate the syntax of these earlier
extensions, a browser might interpret an HTML element such as

   <INPUT TYPE=AUDIO ...>

as the device input form

   <INPUT TYPE=FILE DEVICE=MICROPHONE ...>

with all other attribute/value pairs of the original INPUT element
kept the same as specified. This would retain compatibility for
all implementations of which the author of this draft is aware.

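A browser's parser might apply this mapping to the attributes of
each INPUT element after tokenizing it. The sketch below assumes
the attributes are already available as a dictionary; the function
name is hypothetical, and upper-casing the attribute names is a
choice made for this example only.

```python
def normalize_legacy_input(attrs):
    """Map <INPUT TYPE=AUDIO ...> onto the TYPE=FILE DEVICE=MICROPHONE form."""
    # HTML attribute names are case-insensitive, so fold them first.
    normalized = {key.upper(): value for key, value in attrs.items()}
    if normalized.get("TYPE", "").upper() == "AUDIO":
        normalized["TYPE"] = "FILE"
        # All other attribute/value pairs are kept as specified.
        normalized.setdefault("DEVICE", "MICROPHONE")
    return normalized


print(normalize_legacy_input({"type": "audio", "name": "SPEECH1"}))
```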
6. HTML Document Type Definition changes

Along with the extension to the HTML InputType entity described in
the previous section, this proposal makes an addition to the HTML
DTD for the INPUT element ATTLIST of an #IMPLIED attribute DEVICE
of type CDATA.

7. Motivations and conclusion

The primary motivation for these extensions is to add the
capability of speech input to Web-based educational systems. For
example, the "Test of English as a Foreign Language," or TOEFL,
assessment consists of multiple choice questions based on media
consisting of text and audio recordings, so it would be possible to
represent the TOEFL with current HTML multimedia content and forms.
However, the TOEFL makes no provision whatsoever for assessing the
accuracy of pronunciation by the subjects of the assessment, except
that provided by the ability to accurately identify the terms in
the text of the assessment. So while scoring on the important
ability to listen, the TOEFL does not make provisions to assess the
important ability to speak with correct pronunciation. But with
form-based audio input and upload, and speech recognition servers
capable of aligning and scoring the pronunciation of words and
phonemes, such a Web-based TOEFL could be extended to reduce the
number of inscrutable graduate teaching assistants, for example.
Of course the possibilities for language instruction enabled by
these extensions are not limited to the graduate level or English.

Other motivations include the development of "dictation servers"
capable of transforming spoken audio uploaded through an HTTP
session to the corresponding text suitable for sending in email or
including in another document, for example. Natural language
continuous speech recognition software conforming to standard APIs
for automatic dictation is as of this writing available from retail
outlets for less than US$90, so there is ample reason to believe
that dictation servers could soon become commonplace on the Web
with these extensions.

Finally, this could be a great help for hearing-impaired people who
want to use a "phonology server" (similar to the server described
in the Web-TOEFL example above) to practice improving their
pronunciation without depending on a human speech coach.

The change to the HTML DTD is very simple, but very powerful. It
enables a much greater variety of services to be implemented via
the World-Wide Web than is currently possible due to the lack of a
peripheral input upload submission facility. This would be a very
valuable addition to the capabilities of the World-Wide Web.

8. Author's address and acknowledgments

James Salsman
Bovik Research (nonprofit research institute)
courtesy WebTV Networks, Microsoft Corporation
and MindSource Software Engineers

575 S. Rengstorff Avenue
Mountain View, CA 94040-1982

Email: jps@bovik.org, jsalsman@corp.webtv.net
Phone: (650) 938-1440

"TOEFL" and "Test Of English as a Foreign Language" are
registered trademarks of Educational Testing Service.

References

[RFC 1867] Form-based File Upload in HTML. E. Nebel & L. Masinter,
           November 1995. ftp://ds.internic.net/rfc/rfc1867.txt

[RFC 1889] RTP: A Transport Protocol for Real-Time Applications.
           H. Schulzrinne, S. Casner, R. Frederick, & V. Jacobson,
           January 1996. ftp://ds.internic.net/rfc/rfc1889.txt

[RFC 1890] RTP Profile for Audio and Video Conferences with Minimal
           Control. H. Schulzrinne, January 1996.
           ftp://ds.internic.net/rfc/rfc1890.txt

[RFC 2068] Hypertext Transfer Protocol -- HTTP/1.1. R. Fielding,
           J. Gettys, J. Mogul, H. Frystyk, & T. Berners-Lee,
           January 1997. ftp://ds.internic.net/rfc/rfc2068.txt