INTERNET-DRAFT                                          Larry Masinter
<draft-ietf-urlreg-guide-04.txt>                  Harald T. Alvestrand
November 13, 1998                                          Dan Zigmond
                                                            Rich Petke


                    Guidelines for new URL Schemes


Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its
areas, and its working groups. Note that other groups may also
distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use
Internet-Drafts as reference material or to cite them other than as
"work in progress."

To view the entire list of current Internet-Drafts, please check
the "1id-abstracts.txt" listing contained in the Internet-Drafts
Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net
(Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au
(Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu
(US West Coast).

Distribution of this memo is unlimited.

This Internet Draft expires May 13, 1999.

Copyright Notice

Copyright (C) The Internet Society (1998). All Rights Reserved.

Abstract

A Uniform Resource Locator (URL) is a compact string representation
of the location for a resource that is available via the Internet.
This document provides guidelines for the definition of new URL
schemes.


1. Introduction

A Uniform Resource Locator (URL) is a compact string representation
of the location for a resource that is available via the Internet.
RFC 2396 [1] defines the general syntax and semantics of URIs, and,
by inclusion, URLs. URLs are designated by including a "<scheme>:"
and then a "<scheme-specific-part>". Many URL schemes are already
defined.

This document provides guidelines for the definition of new URL
schemes, for consideration by those who define, register, or
evaluate such schemes.

The process by which new URL schemes are registered is defined in
RFC [URL-PROCESS] [2].


2. Guidelines for new URL schemes

Because new URL schemes potentially complicate client software, new
schemes must have demonstrable utility and operability, as well as
compatibility with existing URL schemes. This section elaborates on
these criteria.


2.1 Syntactic compatibility

New URL schemes should follow the same syntactic conventions as
existing schemes when appropriate. If a URI scheme that has
embedded links in content accessed by that scheme does not share
syntax with a different scheme, the same content cannot be served up
under different schemes without rewriting the content. This can
already be a problem, and with future digital signature schemes,
rewriting may not even be possible. Deployment of other schemes in
the future could therefore become extremely difficult.


2.1.1 Motivations for syntactic compatibility

Why should new URL schemes share as much of the generic URI syntax
(that makes sense to share) as possible? Consider the following:

o If fragment syntax (e.g. "<a href="#foo">") isn't shared between
  two schemes, you can't move individual, completely
  self-referential documents between schemes without rewriting the
  embedded references within the document. In the Web, the fragment
  syntax is a property of the media type, and is evaluated by the
  client.

o If fragment syntax is not shared between different media types of
  the same capability (e.g. HTML, XML, Word, or image types such as
  GIF, JPEG, PNG), then you can't have a URI reference that can
  evolve to superior media types as they become available, or that
  is even likely to work properly today with content negotiation.

o If relative syntax (e.g. "<a href="foo">") isn't shared between
  two schemes, to the extent of understanding that the URI is
  relative and what part of the URI string is relative, you can't
  move sets of documents that are internally self-referential
  between schemes without rewriting the embedded URIs.

o If the ".." syntax as a path component in relative URIs isn't
  shared between schemes, you can't easily have sets of document
  sets and refer to them between schemes without rewriting the
  embedded references.

o If the "/" syntax (to the extent of understanding that the URI
  refers to a path relative to the current naming authority, see
  section 2.1.2) isn't shared, you can't easily move multiple sets
  of documents up or down in a relative hierarchy of names and
  share a common set of documents between them without rewriting
  the content, whether the sets are shared within that scheme or
  between schemes. The best example is a site that has a common set
  of GIF, JPEG and PNG images, where you want to reorganize the
  site by changing the depth of a subtree, or by moving it to
  another directory where the depth isn't the same.

o If naming authority syntax (e.g. what comes after "//" in most URL
  schemes, see section 2.1.2) and relative path syntax (to the
  extent of understanding that the URI has a naming authority, and
  what part of the URI string is the naming authority vs. path)
  aren't shared between two schemes, you can't share identical name
  spaces and serve them up via different schemes. (The naming
  authority syntax is a property of the scheme.) The fact that
  HTTP and FTP have the same syntax, for example, has often been
  exploited by sites transitioning from FTP archive service to HTTP
  archive service, so that the URLs can be identical between schemes
  except for the scheme name; the same content can be served via two
  schemes simultaneously.
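
As an illustration only (this sketch is not part of the guidelines),
the following Python fragment resolves the same embedded references
against an "http" and an "ftp" base URL using the standard library
function urllib.parse.urljoin. The host example.org and all paths
are hypothetical. Because both schemes share the generic syntax, the
resolved URLs should differ only in the scheme name.

```python
from urllib.parse import urljoin

# Hypothetical base URLs: the same document tree served under two schemes.
BASES = [
    "http://example.org/pub/docs/guide/index.html",
    "ftp://example.org/pub/docs/guide/index.html",
]

# Embedded references as they might appear inside the document:
# a fragment, a sibling document, a ".." path, and a "/" rooted path.
REFERENCES = [
    "#foo",
    "chapter2.html",
    "../images/logo.gif",
    "/pub/common/style.css",
]


def resolve_all(base, references=REFERENCES):
    """Resolve each embedded reference against the given base URL."""
    return [urljoin(base, ref) for ref in references]


for base in BASES:
    for resolved in resolve_all(base):
        print(resolved)
```

No reference inside the document had to be rewritten to move from
one scheme to the other; only the base URL changed.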


2.1.2 Improper use of "//" following "<scheme>:"

Contrary to some examples set in past years, the use of double
slashes as the first component of the <scheme-specific-part> of a
URL is not simply an artistic indicator that what follows is a URL:
Double slashes are used ONLY when the syntax of the URL's
<scheme-specific-part> contains a hierarchical structure as
described in RFC 2396. In URLs from such schemes, the use of double
slashes indicates that what follows is the top hierarchical element
for a naming authority. (See section 3 of RFC 2396 for more
details.) URL schemes which do not contain a conformant
hierarchical structure in their <scheme-specific-part> should not
use double slashes following the "<scheme>:" string.
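
A minimal sketch of how a generic parser treats the double slashes
is given below, assuming Python's urllib.parse.urlsplit as the
parser. The helper name has_naming_authority and the example URLs
are hypothetical and chosen only for illustration.

```python
from urllib.parse import urlsplit


def has_naming_authority(url):
    """Return True if the scheme-specific part begins with '//'."""
    scheme, separator, rest = url.partition(":")
    return bool(separator) and rest.startswith("//")


EXAMPLES = [
    "http://example.org/a/b",      # hierarchical, with a naming authority
    "ftp://example.org/pub/file",  # hierarchical, with a naming authority
    "mailto:user@example.org",     # no hierarchy, so no "//"
    "news:comp.infosystems.www",   # no hierarchy, so no "//"
]

for url in EXAMPLES:
    # urlsplit reports the naming authority, if any, as "netloc".
    print(url, has_naming_authority(url), repr(urlsplit(url).netloc))
```

A generic parser will read whatever follows "//" as a naming
authority, which is why a scheme without one should not use the
double slashes.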


2.1.3 Compatibility with relative URLs

URL schemes should use the generic URL syntax if they are intended
to be used with relative URLs. A description of the allowed
relative forms should be included in the scheme's definition.
Many applications use relative URLs extensively. Specifically,
the following questions should be considered:

o Can the scheme be parsed according to RFC 2396 - that is, if the
  tokens "//", "/", ";", "?" and "#" are used, do they have the
  meaning given in RFC 2396?

o Does it make sense to use the scheme in relative URLs like those
  RFC 2396 specifies?

o If the scheme syntax is designed to be broken into pieces, does
  the documentation for the scheme's syntax specify what those
  pieces are, why it should be broken in this way, and why the
  breaks aren't where RFC 2396 says that they usually should be?

o If the scheme has a hierarchy, does it go left-to-right and with
  slash separators like RFC 2396? If not, why not?
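
One way to check the first question is to run candidate URLs of the
new scheme through the regular expression given in Appendix B of
RFC 2396 and see whether the resulting pieces are the intended ones.
The Python sketch below does this; the function name split_generic
is hypothetical. Note that the expression does not separate ";"
parameters, which remain part of the path.

```python
import re

# Regular expression for breaking down a URI reference, as given in
# Appendix B of RFC 2396.
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_generic(uri):
    """Split a URI reference into its five generic components.

    Components that are absent are reported as None; the path is
    always a string, possibly empty.
    """
    match = URI_REFERENCE.match(uri)  # every part is optional, so this always matches
    return {
        "scheme": match.group(2),
        "authority": match.group(4),
        "path": match.group(5),
        "query": match.group(7),
        "fragment": match.group(9),
    }


print(split_generic("http://www.ics.uci.edu/pub/ietf/uri/#Related"))
print(split_generic("../g?y#s"))
```

If a proposed scheme uses any of these tokens with another meaning,
the components printed here will not match what the scheme's author
intended, and relative URLs are unlikely to work.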


2.1.4 Compatibility with fragment syntax

Fragment syntax should be shared across URL schemes whenever
possible. Fragments indicate a location within a particular
document, of a particular media type. As media types evolve,
and content negotiation becomes deployed, a shared fragment syntax
allows a fragment to point to the correct location within documents
of different media types. For example, a named fragment (#foo)
should be able to point to the foo label in either an HTML
document or an XML document. Similarly for fragments identifying a
location in an image, where the image may want to evolve from GIF,
to JPEG, to PNG, the fragment ID should point to the same location.
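
The sketch below, which assumes Python's urllib.parse.urldefrag and
hypothetical example.org URLs, shows the fragment being separated
from the rest of the reference in the same way regardless of scheme
or media type. How the fragment is then interpreted is left to the
client and depends on the media type of the retrieved document.

```python
from urllib.parse import urldefrag


def split_fragment(uri_reference):
    """Return (url_without_fragment, fragment) for a URI reference."""
    url, fragment = urldefrag(uri_reference)
    return url, fragment


for reference in [
    "http://example.org/spec.html#foo",
    "http://example.org/spec.xml#foo",
    "ftp://example.org/pub/spec.html#foo",
]:
    print(split_fragment(reference))
```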


2.2 Is the scheme well defined?

It is important that the semantics of the "resource" that a URL
"locates" be well defined. This might mean different things
depending on the nature of the URL scheme.


2.2.1 Clear mapping from other name spaces

In many cases, new URL schemes are defined as ways to translate
other protocols and name spaces into the general framework of
URLs. The "ftp" URL scheme translates from the FTP protocol, while
the "mid" URL scheme translates from the Message-ID field of
messages.

In either case, the description of the mapping must be complete,
must describe how characters get encoded or not in URLs, must
describe exactly how all legal values of the base standard can be
represented using the URL scheme, and exactly which modifiers,
alternate forms and other artifacts from the base standards are
included or not included. These requirements are elaborated
below.


2.2.2 URL schemes associated with network protocols

Most new URL schemes are associated with network resources that
have one or several network protocols that can access them. The
'ftp', 'news', and 'http' schemes are of this nature. For such
schemes, the specification should completely describe how URLs are
translated into protocol actions in sufficient detail to make the
access of the network resource unambiguous. If an implementation
of the URL scheme requires some configuration, the configuration
elements must be clearly identified. (For example, the 'news'
scheme, if implemented using NNTP, requires configuration of the
NNTP server.)
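
To illustrate what "translated into protocol actions" can look like,
the following Python sketch maps an "ftp" URL to a sequence of FTP
commands, loosely following the description in RFC 1738, section
3.2.2. It is a simplified assumption-laden example, not a complete
implementation: login is omitted, only the "a" and "i" typecodes are
handled, and the function name ftp_actions and the example URL are
hypothetical.

```python
from urllib.parse import urlsplit, unquote


def ftp_actions(url):
    """Return (host, port, commands) for a simple ftp URL.

    Sketch only: no USER/PASS handling, and only the ";type=a" and
    ";type=i" typecodes are translated into a TYPE command.
    """
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        raise ValueError("not an ftp URL")

    # Separate an optional ";type=<typecode>" suffix from the path.
    path, _, typecode = parts.path.partition(";type=")

    # Split on "/" first and decode %HH afterwards, so that an
    # encoded "/" inside a segment is not mistaken for a separator.
    segments = [unquote(s) for s in path.split("/")[1:]]
    if not segments or not segments[-1]:
        raise ValueError("URL does not name a file")
    *directories, name = segments

    commands = ["CWD " + directory for directory in directories]
    if typecode in ("a", "i"):
        commands.append("TYPE " + typecode.upper())
    commands.append("RETR " + name)
    return parts.hostname, parts.port or 21, commands


print(ftp_actions("ftp://ftp.example.org/pub/docs/read%20me.txt;type=a"))
```

A scheme definition that spells out this kind of translation leaves
little room for two implementations to access different resources
for the same URL.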


2.2.3 Definition of non-protocol URL schemes

In some cases, URL schemes do not have particular network protocols
associated with them, because their use is limited to contexts
where the access method is understood. This is the case, for
example, with the "cid" and "mid" URL schemes. For these URL
schemes, the specification should describe the notation of the
scheme and a complete mapping of the locator from its source.


2.2.4 Definition of URL schemes not associated with data resources

Most URL schemes locate Internet resources that correspond
to data objects that can be retrieved or modified. This is the
case with "ftp" and "http", for example. However, some URL schemes
do not; for example, the "mailto" URL scheme corresponds to an
Internet mail address.

If a new URL scheme does not locate resources that are data
objects, the properties of names in the new space must be clearly
defined.


2.2.5 Character encoding

When describing URL schemes in which (some of) the elements of
the URL are actually representations of sequences of characters,
care should be taken not to introduce unnecessary variety in the
ways in which characters are encoded into octets and then into
URL characters. Unless there is some compelling reason for a
particular scheme to do otherwise, translating character sequences
into UTF-8 (RFC 2279) [3] and then subsequently using the %HH
encoding for unsafe octets is recommended.
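
The recommended two-step encoding can be sketched in Python as
follows, using urllib.parse.quote and unquote. The function names
encode_component and decode_component and the sample string are
hypothetical; which octets a given scheme treats as safe is a
decision for that scheme's definition, and this sketch simply
escapes everything except letters, digits, and a few marks.

```python
from urllib.parse import quote, unquote


def encode_component(text):
    """Encode characters as UTF-8 octets, then apply %HH escaping."""
    octets = text.encode("utf-8")  # step 1: characters to octets
    return quote(octets, safe="")  # step 2: unsafe octets to %HH


def decode_component(encoded):
    """Reverse the encoding: %HH to octets, then UTF-8 to characters."""
    return unquote(encoded, encoding="utf-8")


sample = "na\u00efve caf\u00e9"  # contains two non-ASCII characters
encoded = encode_component(sample)
print(encoded)
print(decode_component(encoded) == sample)
```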


2.2.6 Definition of operations

In some contexts (for example, HTML forms) it is possible to
specify any one of a list of operations to be performed on a
specific URL. (Outside forms, it is generally assumed to be
something you GET.)

The URL scheme definition should describe all well-defined
operations on the URL identifier, and what they are supposed to
do.

Some URL schemes (for example, "telnet") provide location
information for hooking onto bi-directional data streams, and don't
fit the "infoaccess" paradigm of most URLs very well; this should
be documented.

NOTE: It is perfectly valid to say that "no operation apart from
GET is defined for this URL". It is also valid to say that "there's
only one operation defined for this URL, and it's not very
GET-like". The important point is that what is defined on this type
is described.


2.3 Demonstrated utility

URL schemes should have demonstrated utility. New URL schemes are
expensive things to support. Often they require special code in
browsers, proxies, and/or servers. Having a lot of ways to say the
same thing needlessly complicates these programs without adding
value to the Internet.

The kinds of things that are useful include:

o Things that cannot be referred to in any other way.

o Things where it is much easier to get at them using this scheme
  than (for instance) a proxy gateway.


2.3.1 Proxy into HTTP/HTML

One way to provide a demonstration of utility is via a gateway
which provides objects in the new scheme for clients using an
existing protocol. It is much easier to deploy gateways to a new
service than it is to deploy browsers that understand the new URL
object.

Things to look for when thinking about a proxy are:

o Is there a single global resolution mechanism whereby any proxy
  can find the referenced object?
o If not, is there a way in which the user can find any object of
  this type, and "run his own proxy"?
o Are the operations mappable one-to-one (or possibly using
  modifiers) to HTTP operations?
o Is the type of returned objects well defined?
  - as MIME content-types?
  - as something that can be translated to HTML?
o Is there running code for a proxy?


2.4 Are there security considerations?

Above and beyond the security considerations of the base mechanism
a scheme builds upon, one must think of things that can happen in
the normal course of URL usage.

In particular:

o Does the user need to be warned that such a thing is happening
  without an explicit request (GET for the source of an IMG tag,
  for instance)? This has implications for the design of a proxy
  gateway, of course.

o Is it possible to fake URLs of this type that point to different
  things in a dangerous way?

o Are there mechanisms for identifying the requester that can be
  used or need to be used with this mechanism (the From: field in a
  mailto: URL, or the Kerberos login required for AFS access in the
  AFS: URL, for instance)?

o Does the mechanism contain passwords or other security
  information that are passed inside the referring document in the
  clear (as in the "ftp" URL, for instance)?
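
As a small illustration of the last point, the Python sketch below
shows that a password embedded in an "ftp" URL is available to
anything that parses the URL, and shows one possible way to remove
it before the URL is displayed or logged. The function name
strip_password and the credentials are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_password(url):
    """Return the URL with any password removed from its authority part."""
    parts = urlsplit(url)
    if parts.password is None:
        return url
    # The authority has the form "user:password@host[:port]".
    userinfo, _, hostport = parts.netloc.rpartition("@")
    user = userinfo.split(":", 1)[0]
    return urlunsplit(parts._replace(netloc=user + "@" + hostport))


url = "ftp://alice:secret@ftp.example.org/pub/file.txt"
print(urlsplit(url).password)  # the password is readable in the clear
print(strip_password(url))
```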


2.5 Does it start with UR?

Any scheme starting with the letters "U" and "R", in particular if
it attaches any of the meanings "uniform", "universal" or
"unifying" to the first letter, is going to cause intense debate,
and generate much heat (but maybe little light).

Any such proposal should either make sure that there is a large
consensus behind it that it will be the only scheme of its type, or
pick another name.


2.6 Non-considerations

Some issues that are often raised but are not relevant to new URL
schemes include the following.


2.6.1 Are all objects accessible?

Can all objects in the world that are validly identified by a
scheme be accessed by any UA implementing it?

Sometimes the answer will be yes and sometimes no; often it will
depend on factors (like firewalls or client configuration) not
directly related to the scheme itself.


3. Security considerations

New URL schemes are required to address all security considerations
in their definitions.


4. References

[1] Berners-Lee, T., Fielding, R., Masinter, L., "Uniform Resource
    Identifiers (URI): Generic Syntax", RFC 2396, August 1998.

[2] Petke, R., "Registration Procedures for URL Scheme Names",
    RFC [URL-PROCESS], November 1998.

[3] Yergeau, F., "UTF-8, A Transformation Format of Unicode and ISO
    10646", RFC 2279, January 1998.


5. Authors' Addresses

Larry Masinter
Xerox Corporation
Palo Alto Research Center
3333 Coyote Hill Road
Palo Alto, CA 94304
Fax: +1-415-812-4333
EMail: masinter@parc.xerox.com

Harald Tveit Alvestrand
Maxware, Pirsenteret
N-7005 Trondheim
NORWAY
Voice: +47 73 54 57 00
EMail: harald.alvestrand@maxware.no

Dan Zigmond
WebTV Networks, Inc.
305 Lytton Avenue
Palo Alto, CA 94301
USA
Voice: +1-650-614-6071
EMail: djz@corp.webtv.net

Rich Petke
MCIWORLDCOM Advanced Networks
5000 Britton Road
P. O. Box 5000
Hilliard, OH 43026-5000
Voice: +1-614-723-4157
Fax: +1-614-723-1333
EMail: rpetke@wcom.net
