INTERNET-DRAFT                                              J. Salsman
Suggested filename: <draft-www-device-upload-00.txt>    WebTV Networks
Expiration date: 15 May 1998                          12 November 1997

              Form-based Device Input and Upload in HTML

Status of this Memo

This draft extends an experimental protocol for the Internet
community. This draft does not specify an Internet standard of any
kind. Discussion and suggestions for improvement are requested.
Distribution of this memo will be unlimited when the W3C approves the
HTML 4.0 standard. Until then, please do not distribute this draft
beyond your department.

1. Abstract and introduction

Currently, HTML forms allow the producer of the form to request
information -- including files of data -- from the operator reading
the form. However, this capability is limited because HTML forms
provide no way to ask the operator to submit input from arbitrary
sources such as microphones and other audio devices. Since input
and upload from various devices would benefit many applications,
this draft proposes an extension to the HTML INPUT TYPE=FILE form
element specified in RFC 1867 that allows information providers to
request uploads from audio and other devices uniformly. It also
discusses MIME audio data types that facilitate useful audio upload
responses, security, and audio usability and quality. It then
describes a backward compatibility strategy that allows new user
agents to use HTML written with earlier proposals for audio input
in mind, and concludes with motivations.

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its
areas, and its working groups. Note that other groups may also
distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use
Internet-Drafts as reference material or to cite them other than
as "work in progress."

To view the entire list of current Internet-Drafts, please check
the "1id-abstracts.txt" listing contained in the Internet-Drafts
Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net
(Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East
Coast), or ftp.isi.edu (US West Coast).

2. HTML forms with device input file upload submission

Section 3.1 of RFC 1867 provides for the presentation of an
arbitrary "widget" to specify input for file uploads. When an
INPUT tag of type FILE is encountered with a DEVICE attribute, the
associated value (such as MICROPHONE, or MIC) might select the use
of a widget capable of buffering and editing real-time input (such
as speech) instead of entering a file selection mode.

If an ACCEPT attribute is present in a device file input element,
the browser might constrain the MIME type of the uploaded data to
one of the types in the specified list. If the value of the DEVICE
attribute is FILESYSTEM or FILES, then the INPUT element might be
treated as usual according to RFC 1867, except that the subset of
files presented to the operator to choose from may be constrained
by the specified list of MIME types instead of by a pattern of file
names or extensions.
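
As an illustration only, the constraint described above might be
sketched in Python as follows. The function names, the dictionary of
candidate files, and the use of shell-style wildcard matching for
types such as "text/*" are assumptions of this sketch; none of them
is specified by this draft or by RFC 1867. The MIME type of each
candidate file is assumed to be known already, by whatever means the
browser determines it.

```python
from fnmatch import fnmatchcase


def type_matches(mime_type, accept_types):
    """Return True if mime_type matches any entry of accept_types.

    Entries may be exact types ("audio/l16") or wildcards ("text/*").
    Matching is case-insensitive, as MIME type names are.
    """
    mime_type = mime_type.strip().lower()
    return any(
        fnmatchcase(mime_type, pattern.strip().lower())
        for pattern in accept_types
    )


def filter_by_accept(candidates, accept_types):
    """Return the names of candidate files whose type is acceptable.

    candidates maps a file name to its already-known MIME type
    (hypothetical input; file names themselves are never inspected).
    """
    return [
        name
        for name, mime_type in candidates.items()
        if type_matches(mime_type, accept_types)
    ]
```

With ACCEPT="text/*", a chooser built on this sketch would offer
only those candidates whose type begins with "text/", whatever their
names or extensions happen to be.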

Device input has no original filename of the kind specified in
section 3.3 of RFC 1867 for the parameters of the
'content-disposition: form-data' and 'content-disposition: file'
headers. Those headers might instead be provided with a 'type'
parameter representing the MIME type of the encoded data, if known,
and a 'device' parameter with the same value as the DEVICE
attribute of the associated form input element. If the device or
the MIME type(s) specified are unsupported, the value of the
'device' parameter might be 'unsupported'; if the device is
unavailable, the value might be 'unavailable'. If the MIME types
requested are unsupported, an additional parameter 'alternates'
might be included with a space-separated list of MIME types of the
same content-type which may be supported as alternatives for the
specified device.
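
A minimal Python sketch of how a browser might assemble such a
header is given below. The function name and its arguments are
hypothetical. The quoting of parameter values and the omission of
the 'type' parameter when no data could be captured are assumptions
of this sketch, not requirements of this draft.

```python
def device_disposition(name, device, mime_type=None, status=None,
                       alternates=()):
    """Build a 'content-disposition: form-data' header line.

    name       -- NAME attribute of the INPUT element
    device     -- DEVICE attribute of the INPUT element
    mime_type  -- MIME type of the encoded data, if known
    status     -- None, 'unsupported' or 'unavailable'
    alternates -- MIME types offered as alternatives (may be empty)
    """
    if status not in (None, "unsupported", "unavailable"):
        raise ValueError("status must be None, 'unsupported' or 'unavailable'")

    params = ['name="{}"'.format(name)]
    # Assumption: 'type' is sent only when data was actually encoded.
    if mime_type is not None and status is None:
        params.append('type="{}"'.format(mime_type))
    # 'device' echoes the DEVICE attribute unless a status replaces it.
    params.append('device="{}"'.format(status or device))
    if alternates:
        params.append('alternates="{}"'.format(" ".join(alternates)))
    return "content-disposition: form-data; " + "; ".join(params)
```

For the attribute values NAME=SPEECH1 and DEVICE=MIC with data
encoded as audio/l16, this sketch yields a header carrying
name="SPEECH1", type="audio/l16" and device="MIC".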

There may be significant limitations on the client browser's
ability to buffer input for upload. Browsers might provide an
estimate of the default MAXLENGTH available for device input and
upload through the HTTP header 'Pragma: DEVICE-MAXLENGTH=BYTES',
where BYTES represents the content-length available to the browser
for buffering (see section 14.32 of RFC 2068).
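
A server receiving this header might extract the estimate roughly as
in the following Python sketch. The function name is hypothetical,
and the handling of several comma-separated Pragma directives and of
malformed values is an assumption of this sketch.

```python
def device_maxlength(pragma_value):
    """Return the DEVICE-MAXLENGTH byte count from a Pragma value.

    pragma_value is the text after 'Pragma:'. Returns None when the
    directive is absent or its value is not a plain decimal number.
    """
    for directive in pragma_value.split(","):
        key, sep, value = directive.strip().partition("=")
        value = value.strip()
        if (sep and key.strip().upper() == "DEVICE-MAXLENGTH"
                and value.isascii() and value.isdigit()):
            return int(value)
    return None
```

A form producer could compare the returned byte count with the data
rate of the requested encoding to judge how long a recording the
browser is likely to be able to buffer.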

Furthermore, the VALUE attribute may be used to disambiguate
between multiple similar devices when more than one is present.

If real-time events, such as those described and proposed by
Gregory S. Aist in "A General Architecture for a Real-Time
Discourse Agent and a Case Study in Computerized Oral Reading
Tutoring" (Carnegie Mellon University Computational Linguistics
Program, 6 December 1996), are required, then the Real-time
Transport Protocol (RTP, currently RFC 1889) should be used
instead. Because of the security concerns discussed in section 3
below, HTML scripts might not be able to invoke the submission of a
form that involves any kind of file upload, unless the session
operator has given explicit instructions to the contrary.

2.1. Examples

    <FORM ENCTYPE="multipart/form-data" METHOD=POST ACTION="_URL_">
    Say something: <INPUT NAME=SPEECH1 TYPE=FILE DEVICE=MIC>
    <INPUT TYPE=SUBMIT VALUE="Send Speech">
    </FORM>

In this simple form, the HTML author has requested the upload of
sampled microphone input from the operator upon form submission.

    <INPUT NAME=SPEECH2 TYPE=FILE DEVICE=MICROPHONE
    ACCEPT="audio/l16 ;rate=11025 ;channels=1 audio/x-cepstral-voc">

Here the full device name MICROPHONE is used rather than the
abbreviation MIC. The author of the HTML has requested that the
data input from the microphone be encoded as either the MIME type
Audio/L16 -- sixteen bit signed linear audio samples
(most-significant byte first) -- as specified in RFC 1890 section
4.4.8, with a single (monaural) channel and a sample rate of 11,025
samples per second, or an unspecified extended MIME Audio type named
'x-cepstral-voc'.
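
To make the requested Audio/L16 encoding concrete, the following
Python sketch packs signed sixteen-bit samples most-significant
byte first and computes the resulting data rate. The function names
are hypothetical, and the samples are assumed to have been captured
already as integers in the range -32768 to 32767.

```python
import struct


def encode_l16(samples):
    """Pack signed 16-bit samples, most-significant byte first."""
    # '>' selects big-endian order; 'h' is a signed 16-bit integer.
    return struct.pack(">{}h".format(len(samples)), *samples)


def l16_bytes_per_second(rate=11025, channels=1):
    """Data rate of audio/l16: two bytes per sample per channel."""
    return rate * channels * 2
```

At the rate and channel count requested in the example above, one
second of speech occupies 22,050 bytes before any transfer encoding.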

    <INPUT NAME=FILE1 TYPE=FILE DEVICE=FILES ACCEPT="text/*">

Here the form element may be used to upload a file as usual, except
that the files to select from might be constrained to text files,
without explicit regard to their filenames or extensions.

    <INPUT NAME=PICTURE1 TYPE=FILE DEVICE=CAMERA VALUE=2>

The final example shows how these extensions may be used to request
input from other kinds of devices, such as the second of two or
more cameras connected to the system running the browser.

3. Security considerations

Browser operators may not want to send their files, recordings,
pictures, video, or other device inputs to arbitrary sites without
their explicit permission and direction. Therefore, browser
authors are encouraged to disallow the submission of forms which
include any kind of file upload by any means other than the
standard HTML operator-controlled buttons for form submission,
unless the session operator has given explicit instruction to the
contrary. Accordingly, the SIZE parameter, document style sheets,
and document layers may be prevented from obscuring any kind of
file upload widget, especially those capable of accepting a default
filename. Finally, just as the operator may take direct action to
initiate, terminate, review and edit recording as described in the
next section, browser authors are encouraged to prevent HTML
scripts from taking those and similar actions, unless for example
the operator has specifically enabled such script actions with a
security option. Even then, such preferences might be specified by
the operator to reset after an interval or at the end of the
session. Furthermore, an explicit notice might be displayed to
ensure that the operator is informed when files are being uploaded.

4. User interface usability and quality concerns for audio

An audio sample is customarily recorded on computer equipment with
a dialog routine that allows the user to record, pause, play back,
erase, or otherwise edit the recording. Browsers might provide the
operator with the same kind of dialog routine for audio device
input. If a MAXLENGTH has been specified or is in force because of
limited buffer size, the buffer space used and remaining might be
displayed as a dynamic bar graph (or as a percentage if graphics
are unavailable). A display of the time in seconds used and
remaining in the buffer may also be provided.
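
A text-only version of such a display might be computed as in the
following Python sketch. The function name, the bar width, and the
layout of the status line are all assumptions of this sketch; the
data rate is supplied by the caller and depends on the encoding in
use.

```python
def buffer_status(used_bytes, max_bytes, bytes_per_second, width=20):
    """Format buffer usage as a text bar, a percentage and seconds."""
    if max_bytes <= 0 or bytes_per_second <= 0:
        raise ValueError("max_bytes and bytes_per_second must be positive")
    # Clamp so that an overfull or negative count cannot break the bar.
    used_bytes = min(max(used_bytes, 0), max_bytes)
    fraction = used_bytes / max_bytes
    filled = round(fraction * width)
    bar = "#" * filled + "-" * (width - filled)
    seconds_used = used_bytes / bytes_per_second
    seconds_left = (max_bytes - used_bytes) / bytes_per_second
    return "[{}] {}% used, {:.1f}s recorded, {:.1f}s remaining".format(
        bar, round(fraction * 100), seconds_used, seconds_left)
```

A recording widget could call this each time the buffer grows and
redraw the line, falling back to the percentage alone where even a
character bar is impractical.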

Most MIME types defined for audio do not provide high-quality audio
encodings. The 'audio/basic' and other types which use a sample
rate of 8,000 samples per second truncate the audio spectrum at
4,000 Hz according to the Nyquist theorem, discarding information
important for discerning consonants. Also, audio/basic and other
MIME Audio types use a sample size of eight bits, which does not
usually provide enough dynamic range for accurate automatic speech
recognition unless published automatic gain control algorithms are
reliably used. If sixteen-bit signed audio encodings are used
according to section 4.4.8 of RFC 1890, the sample rate --
specified as the 'rate' parameter of the MIME type 'audio/l16' --
might be at least 11,025 or 16,000 samples per second to provide
sufficient information for automatic speech recognition. Otherwise,
the audio feature extraction encoding of the speech recognition
algorithm might be used to provide a more compact representation to
shorten the upload.
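
The figures behind this argument can be checked with the short
Python sketch below. The function names are hypothetical. The
dynamic range formula assumes linear quantization; audio/basic uses
companded (mu-law) samples, so the eight-bit linear figure is only a
rough guide for that type.

```python
import math


def nyquist_hz(sample_rate):
    """Highest frequency representable at the given sample rate."""
    return sample_rate / 2


def linear_dynamic_range_db(bits):
    """Approximate dynamic range of linear samples of the given size."""
    return 20 * math.log10(2 ** bits)
```

At 8,000 samples per second the limit is 4,000 Hz, while 11,025 and
16,000 samples per second raise it to 5,512.5 Hz and 8,000 Hz.
Linear samples gain about 6 dB of dynamic range per bit, or roughly
48 dB at eight bits and 96 dB at sixteen.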

5. Compatibility with earlier forms of audio input

Audio device input has been proposed before and implemented from a
microphone at least as early as 1994 in experimental versions of
common Web browsers. To accommodate the syntax of these earlier
extensions, a browser might interpret an HTML element such as

    <INPUT TYPE=AUDIO ...>

as the device input form

    <INPUT TYPE=FILE DEVICE=MICROPHONE ...>

with all other attribute/value pairs of the original INPUT element
kept the same as specified. This would retain compatibility for
all implementations of which the author of this draft is aware.
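
Assuming a browser has already parsed the attributes of an INPUT
element into a name/value mapping, the reinterpretation might be
sketched in Python as follows. The function name and the choice to
normalize attribute names to upper case are assumptions of this
sketch.

```python
def normalize_input_attrs(attrs):
    """Map the legacy TYPE=AUDIO form onto TYPE=FILE DEVICE=MICROPHONE.

    attrs maps attribute names to values. A new dictionary with
    upper-case names is returned; other attributes are kept as given.
    """
    result = {name.upper(): value for name, value in attrs.items()}
    if str(result.get("TYPE") or "").upper() == "AUDIO":
        result["TYPE"] = "FILE"
        result["DEVICE"] = "MICROPHONE"
    return result
```

Elements of any other type pass through unchanged, so the function
could be applied to every INPUT element before further processing.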

6. HTML Document Type Definition changes

Along with the extension to the HTML InputType entity described in
the previous section, this proposal adds an #IMPLIED attribute
DEVICE of type CDATA to the ATTLIST of the INPUT element in the
HTML DTD.

7. Motivations and conclusion

The primary motivation for these extensions is to add the
capability of speech input to Web-based educational systems. For
example, the "Test of English as a Foreign Language," or TOEFL,
consists of multiple choice questions based on media composed of
text and audio recordings, so it would be possible to represent the
TOEFL with current HTML multimedia content and forms. However, the
TOEFL makes no provision for assessing the accuracy of the
subjects' pronunciation, beyond what can be inferred from their
ability to identify the terms in the text of the assessment. So
while it scores the important ability to listen, the TOEFL does not
assess the important ability to speak with correct pronunciation.
But with form-based audio input and upload, and speech recognition
servers capable of aligning and scoring the pronunciation of words
and phonemes, such a Web-based TOEFL could be extended to reduce
the number of inscrutable graduate teaching assistants, for
example. Of course the possibilities for language instruction
enabled by these extensions are not limited to the graduate level
or to English.

Other motivations include the development of "dictation servers"
capable of transforming spoken audio uploaded through an HTTP
session into the corresponding text, suitable for sending in email
or including in another document, for example. Natural language
continuous speech recognition software conforming to standard APIs
for automatic dictation is as of this writing available from retail
outlets for less than US$90, so there is ample reason to believe
that dictation servers could soon become commonplace on the Web
with these extensions.

Finally, these extensions could be a great help for hearing
impaired people who want to use a "phonology server" (similar to
the server described in the Web-TOEFL example above) to practice
improving their pronunciation without depending on a human speech
coach.

The change to the HTML DTD is very simple, but very powerful. It
enables a much greater variety of services to be implemented via
the World-Wide Web than is currently possible, given the lack of a
peripheral input upload submission facility. This would be a very
valuable addition to the capabilities of the World-Wide Web.

8. Author's address and acknowledgments

James Salsman
Bovik Research (nonprofit research institute)
courtesy WebTV Networks, Microsoft Corporation
and MindSource Software Engineers

575 S. Rengstorff Avenue
Mountain View, CA 94040-1982

Email: jps@bovik.org, jsalsman@corp.webtv.net
Phone: (650) 938-1440

"TOEFL" and "Test Of English as a Foreign Language" are
registered trademarks of Educational Testing Service.

References

[RFC 1867] Form-based File Upload in HTML. E. Nebel & L. Masinter,
           November 1995. ftp://ds.internic.net/rfc/rfc1867.txt

[RFC 1889] RTP: A Transport Protocol for Real-Time Applications.
           H. Schulzrinne, S. Casner, R. Frederick, & V. Jacobson,
           January 1996. ftp://ds.internic.net/rfc/rfc1889.txt

[RFC 1890] RTP Profile for Audio and Video Conferences with Minimal
           Control. H. Schulzrinne, January 1996.
           ftp://ds.internic.net/rfc/rfc1890.txt

[RFC 2068] Hypertext Transfer Protocol -- HTTP/1.1. R. Fielding,
           J. Gettys, J. Mogul, H. Frystyk, & T. Berners-Lee,
           January 1997. ftp://ds.internic.net/rfc/rfc2068.txt
