1
2 INTERNET-DRAFT                                          Larry Masinter
3 <draft-ietf-urlreg-guide-04.txt>                  Harald T. Alvestrand
4 November 13, 1998                                          Dan Zigmond
5                                                             Rich Petke
6
7
8 Guidelines for new URL Schemes
9
10
11 Status of this Memo
12
13 This document is an Internet-Draft. Internet-Drafts are working
14 documents of the Internet Engineering Task Force (IETF), its
15 areas, and its working groups. Note that other groups may also
16 distribute working documents as Internet-Drafts.
17
18 Internet-Drafts are draft documents valid for a maximum of six
19 months and may be updated, replaced, or obsoleted by other
20 documents at any time. It is inappropriate to use Internet-
21 Drafts as reference material or to cite them other than as
22 "work in progress."
23
24 To view the entire list of current Internet-Drafts, please check
25 the "1id-abstracts.txt" listing contained in the Internet-Drafts
26 Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net
27 (Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au
28 (Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu
29 (US West Coast).
30
31 Distribution of this memo is unlimited.
32
33 This Internet Draft expires May 13, 1999.
34
35 Copyright Notice
36
37 Copyright (C) The Internet Society (1998). All Rights Reserved.
38
39 Abstract
40
41 A Uniform Resource Locator (URL) is a compact string representation
42 of the location for a resource that is available via the Internet.
43 This document provides guidelines for the definition of new URL
44 schemes.
45
46
47 1. Introduction
48
49 A Uniform Resource Locator (URL) is a compact string representation
50 of the location for a resource that is available via the Internet.
51 RFC 2396 [1] defines the general syntax and semantics of URIs, and,
52 by inclusion, URLs. URLs are designated by including a "<scheme>:"
53 and then a "<scheme-specific-part>". Many URL schemes are already
54 defined.
55
56 This document provides guidelines for the definition of new URL
57 schemes, for consideration by those who are defining and
58 registering or evaluating those definitions.
59
60 The process by which new URL schemes are registered is defined in
61 RFC [URL-PROCESS] [2].
62
63
64 2. Guidelines for new URL schemes
65
66 Because new URL schemes potentially complicate client software, new
67 schemes must have demonstrable utility and operability, as well as
68 compatibility with existing URL schemes. This section elaborates
69 these criteria.
70
71
72 2.1 Syntactic compatibility
73
74 New URL schemes should follow the same syntactic conventions as
75 existing schemes when appropriate. If a URI scheme that has
76 embedded links in content accessed by that scheme does not share
77 syntax with a different scheme, the same content cannot be served up
78 under different schemes without rewriting the content. This can
79 already be a problem, and with future digital signature schemes,
80 rewriting may not even be possible. Deployment of other schemes in
81 the future could therefore become extremely difficult.
82
83
84 2.1.1 Motivations for syntactic compatibility
85
86 Why should new URL schemes share as much of the generic URI syntax
87 (that makes sense to share) as possible? Consider the following:
88
89 o If fragment syntax isn't shared between two schemes (e.g. "<a
90 href="#foo">"), you can't move individual, completely
91 self-referential documents between schemes without rewriting the
92 embedded references within the document. In the Web, the fragment
93 syntax is a property of the media type, and evaluated by the
94 client.
95
96 o If fragment syntax is not shared between different media types of
97 the same capability (e.g. HTML, XML, Word, or image types such as
98 GIF, JPEG, PNG), then a URI reference can't evolve to superior
99 media types as they become available, and is unlikely even to
100 work properly today with content negotiation.
101
102 o If relative syntax (to the extent of understanding that the URI
103 is relative, and what part of the URI string is relative) isn't
104 shared between two schemes (e.g. "<a href="foo">"), you can't
105 move sets of documents that are internally self-referential
106 between schemes without rewriting the embedded URIs.
107
108 o If the ".." syntax as a path component in relative URIs isn't
109 shared between schemes, you can't easily have multiple document
110 sets and refer between them across schemes without rewriting the
111 embedded references.
112
113 o If the "/" syntax (to the extent of understanding that the URI
114 refers to a path relative to the current naming authority, see
115 section 2.1.2) isn't shared, you can't easily move multiple sets
116 of documents up or down in a relative hierarchy of names and
117 share a common set of documents between them without rewriting
118 the content, whether within that scheme or between schemes. The
119 best example is a site that has a common set of GIF, JPEG and PNG
120 images, where you want to reorganize the site by changing the
121 depth of a subtree, or by moving it from one directory to another
122 where the depth isn't the same.
123
124 o If naming authority syntax (e.g. what comes after "//" in most URL
125 schemes, see section 2.1.2) and relative path syntax (to the
126 extent of understanding that the URI has a naming authority, and
127 what part of the URI string is the naming authority vs. path)
128 aren't shared between two schemes, you can't share identical name
129 spaces and serve them up via different schemes. (The naming
130 authority syntax is a property of the scheme.) The fact that
131 HTTP and FTP have the same syntax, for example, has often been
132 exploited by sites transitioning from FTP archive service to HTTP
133 archive service so that the URLs can be identical between schemes
134 except for the scheme name; the same content can be served via two
135 schemes simultaneously.
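
The points above can be illustrated with a minimal sketch, assuming Python's standard urllib.parse module as a stand-in for a client's resolver. The host name and paths are hypothetical. Because "http" and "ftp" share the generic syntax, the same embedded references resolve to URLs that differ only in the scheme name.

```python
from urllib.parse import urljoin

# Hypothetical base URLs: same naming authority and path, different scheme.
BASES = [
    "http://example.org/pub/docs/index.html",
    "ftp://example.org/pub/docs/index.html",
]

# The kinds of embedded references discussed in the list above:
# fragment-only, relative, ".." and "/" (authority-relative) forms.
REFERENCES = ["#foo", "foo", "../images/logo.gif", "/images/logo.gif"]


def resolve_all(base):
    """Resolve every sample reference against one base URL."""
    return [urljoin(base, ref) for ref in REFERENCES]


for base in BASES:
    for ref, resolved in zip(REFERENCES, resolve_all(base)):
        print(f"{ref!r:24} -> {resolved}")
```

The document containing these references needs no rewriting when it moves from one scheme to the other; only the base URL changes.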
136
137
138 2.1.2 Improper use of "//" following "<scheme>:"
139
140 Contrary to some examples set in past years, the use of double
141 slashes as the first component of the <scheme-specific-part> of a
142 URL is not simply an artistic indicator that what follows is a URL:
143 Double slashes are used ONLY when the syntax of the URL's
144 <scheme-specific-part> contains a hierarchical structure as
145 described in RFC 2396. In URLs from such schemes, the use of double
146 slashes indicates that what follows is the top hierarchical element
147 for a naming authority. (See section 3 of RFC 2396 for more
148 details.) URL schemes which do not contain a conformant
149 hierarchical structure in their <scheme-specific-part> should not
150 use double slashes following the "<scheme>:" string.
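
As a rough illustration of the distinction, the following sketch uses Python's urllib.parse (an assumption of this example, not something the draft prescribes) to show how a generic parser treats the two forms. The addresses are hypothetical.

```python
from urllib.parse import urlsplit


def describe(url):
    """Return (scheme, naming authority, path) as a generic parser sees them."""
    parts = urlsplit(url)
    return parts.scheme, parts.netloc, parts.path


# Hierarchical scheme: "//" introduces the naming authority.
print(describe("http://example.org/a/b/c"))

# Non-hierarchical scheme: no "//", so there is no naming authority and
# the whole scheme-specific part is treated as an opaque path.
print(describe("mailto:user@example.org"))
```

A scheme that used "//" without a conformant hierarchy would lead such a parser to report a naming authority that does not exist.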
151
152
153 2.1.3 Compatibility with relative URLs
154
155 URL schemes should use the generic URL syntax if they are intended
156 to be used with relative URLs. A description of the allowed
157 relative forms should be included in the scheme's definition.
158 Many applications use relative URLs extensively. Specifically,
159
160 o Can the scheme be parsed according to RFC 2396 - that is, if the
161 tokens "//", "/", ";", "?" and "#" are used, do they have the
162 meaning given in RFC 2396?
163
164 o Does it make sense to use the scheme in relative URLs like those
165 RFC 2396 specifies?
166
167 o If the scheme syntax is designed to be broken into pieces, does
168 the documentation for the scheme's syntax specify what those
169 pieces are, why it should be broken in this way, and why the
170 breaks aren't where RFC 2396 says that they usually should be?
171
172 o If the scheme has a hierarchy, does it go left-to-right and with
173 slash separators like RFC 2396? If not, why not?
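
One way to check the first question is to run candidate URLs through the regular expression given in Appendix B of RFC 2396. The sketch below wraps that expression in Python; the function name and the dictionary keys are choices made for this example.

```python
import re

# Regular expression from RFC 2396, Appendix B, for breaking a URI
# reference into its generic components.
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri(reference):
    """Split a URI reference into the generic RFC 2396 components.

    Components that are absent from the reference are reported as None.
    """
    match = URI_REFERENCE.match(reference)
    return {
        "scheme": match.group(2),
        "authority": match.group(4),
        "path": match.group(5),
        "query": match.group(7),
        "fragment": match.group(9),
    }


print(split_uri("http://www.ics.uci.edu/pub/ietf/uri/#Related"))
print(split_uri("../g?x=1"))
```

If a proposed scheme gives "//", "/", "?" or "#" a different meaning, the components reported by this generic split will not match the scheme's own interpretation, which is a sign that relative URLs will not work as expected.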
174
175
176 2.1.4 Compatibility with fragment syntax
177
178 Fragment syntax should be shared across URL schemes whenever
179 possible. Fragments indicate a location within a particular
180 document, of a particular media type. As media types evolve,
181 and content negotiation becomes deployed, a shared fragment syntax
182 allows a fragment to point to the correct location within documents
183 of different media types. For example, a named fragment (#foo)
184 should be able to point to the foo label in either an HTML
185 document or an XML document. Similarly, for fragments identifying a
186 location in an image, where the image format may evolve from GIF
187 to JPEG to PNG, the fragment ID should point to the same location.
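
A small sketch of this separation, assuming Python's urllib.parse and hypothetical document names: the fragment is split off before retrieval, so the same fragment identifier can be applied by the client to whichever representation is returned.

```python
from urllib.parse import urldefrag


def split_fragment(reference):
    """Separate the retrievable URL from the client-side fragment."""
    url, fragment = urldefrag(reference)
    return url, fragment


# The same fragment identifier used against two representations.
for reference in (
    "http://example.org/report.html#foo",
    "http://example.org/report.xml#foo",
):
    url, fragment = split_fragment(reference)
    print(f"retrieve {url}, then locate {fragment!r} in the result")
```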
188
189
190 2.2 Is the scheme well defined?
191
192 It is important that the semantics of the "resource" that a URL
193 "locates" be well defined. This might mean different things
194 depending on the nature of the URL scheme.
195
196
197 2.2.1 Clear mapping from other name spaces
198
199 In many cases, new URL schemes are defined as ways to translate
200 other protocols and name spaces into the general framework of
201 URLs. The "ftp" URL scheme translates from the FTP protocol, while
202 the "mid" URL scheme translates from the Message-ID field of
203 messages.
204
205 In either case, the description of the mapping must be complete,
206 must describe how characters get encoded or not in URLs, must
207 describe exactly how all legal values of the base standard can be
208 represented using the URL scheme, and exactly which modifiers,
209 alternate forms and other artifacts from the base standards are
210 included or not included. These requirements are elaborated
211 below.
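
As an illustration of what a complete, reversible mapping looks like, the sketch below converts between a Message-ID field value and a "mid" URL. It is a simplification: it handles only the message-only form, and the choice of which characters to leave unencoded (here only "@" plus the unreserved characters that urllib.parse.quote never encodes) is an assumption of this example, not the normative rule for the scheme.

```python
from urllib.parse import quote, unquote


def message_id_to_mid(message_id):
    """Map a Message-ID field value such as "<id@host>" to a mid URL."""
    inner = message_id.strip()
    if not (inner.startswith("<") and inner.endswith(">")):
        raise ValueError("expected a Message-ID enclosed in angle brackets")
    # Drop the angle brackets, then %HH-encode anything outside the
    # (assumed) safe set so the result is a legal URL.
    return "mid:" + quote(inner[1:-1], safe="@")


def mid_to_message_id(url):
    """Inverse mapping: recover the Message-ID field value from a mid URL."""
    scheme, _, rest = url.partition(":")
    if scheme.lower() != "mid":
        raise ValueError("not a mid URL")
    return "<" + unquote(rest) + ">"


print(message_id_to_mid("<960830.1639@XIson.com>"))
print(message_id_to_mid("<foo4%foo1@bar.net>"))
```

The round trip in both directions is what makes the mapping "complete" in the sense used above: every legal value of the base name space has a URL form, and nothing is lost on the way back.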
212
213
214 2.2.2 URL schemes associated with network protocols
215
216 Most new URL schemes are associated with network resources that
217 have one or several network protocols that can access them. The
218 'ftp', 'news', and 'http' schemes are of this nature. For such
219 schemes, the specification should completely describe how URLs are
220 translated into protocol actions in sufficient detail to make the
221 access of the network resource unambiguous. If an implementation
222 of the URL scheme requires some configuration, the configuration
223 elements must be clearly identified. (For example, the 'news'
224 scheme, if implemented using NNTP, requires configuration of the
225 NNTP server.)
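
The following sketch suggests the level of detail intended, using the 'ftp' scheme as an example. It is a simplified, hypothetical translation (loosely following the CWD-per-segment convention for ftp URLs), not a complete client. The "CONNECT" entry is a pseudo-action standing in for opening the control connection, and default_user is an example of a configuration element that a specification would need to identify.

```python
from urllib.parse import unquote, urlsplit


def ftp_actions(url, default_user="anonymous"):
    """List the protocol actions a client might take for an ftp URL.

    default_user is a configuration element (an assumption of this
    sketch): it is used when the URL itself names no user.
    """
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        raise ValueError("not an ftp URL")
    user = unquote(parts.username) if parts.username else default_user
    # Every path segment but the last names a directory; the last
    # names the file to retrieve.
    *directories, name = parts.path.lstrip("/").split("/")
    actions = [f"CONNECT {parts.hostname}:{parts.port or 21}", f"USER {user}"]
    actions += [f"CWD {unquote(d)}" for d in directories]
    actions.append(f"RETR {unquote(name)}")
    return actions


for action in ftp_actions("ftp://example.org/pub/docs/readme.txt"):
    print(action)
```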
226
227
228 2.2.3 Definition of non-protocol URL schemes
229
230 In some cases, URL schemes do not have particular network protocols
231 associated with them, because their use is limited to contexts
232 where the access method is understood. This is the case, for
233 example, with the "cid" and "mid" URL schemes. For these URL
234 schemes, the specification should describe the notation of the
235 scheme and a complete mapping of the locator from its source.
236
237
238 2.2.4 Definition of URL schemes not associated with data resources
239
240 Most URL schemes locate Internet resources that correspond
241 to data objects that can be retrieved or modified. This is the
242 case with "ftp" and "http", for example. However, some URL schemes
243 do not; for example, the "mailto" URL scheme corresponds to an
244 Internet mail address.
245
246 If a new URL scheme does not locate resources that are data
247 objects, the properties of names in the new space must be clearly
248 defined.
249
250
251 2.2.5 Character encoding
252
253 When describing URL schemes in which (some of) the elements of
254 the URL are actually representations of sequences of characters,
255 care should be taken not to introduce unnecessary variety in the
256 ways in which characters are encoded into octets and then into
257 URL characters. Unless there is some compelling reason for a
258 particular scheme to do otherwise, translating character sequences
259 into UTF-8 (RFC 2279) [3] and then subsequently using the %HH
260 encoding for unsafe octets is recommended.
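
The recommended two-step encoding can be sketched as follows. The set of characters left unencoded is the "unreserved" set of RFC 2396 (alphanumerics plus the mark characters); treating everything else as requiring %HH encoding is a conservative assumption of this example, since individual schemes may allow more.

```python
from urllib.parse import unquote

# "unreserved" characters per RFC 2396: alphanum plus mark.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    "-_.!~*'()"
)


def encode_component(text):
    """Encode characters as UTF-8 octets, then as URL characters."""
    octets = text.encode("utf-8")  # step 1: characters -> octets
    return "".join(               # step 2: octets -> URL characters
        chr(octet) if chr(octet) in UNRESERVED else f"%{octet:02X}"
        for octet in octets
    )


def decode_component(encoded):
    """Reverse the encoding: %HH -> octets -> UTF-8 characters."""
    return unquote(encoded, encoding="utf-8")


print(encode_component("café au lait"))
```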
261
262
263 2.2.6 Definition of operations
264
265 In some contexts (for example, HTML forms) it is possible to
266 specify any one of a list of operations to be performed on a
267 specific URL. (Outside forms, it is generally assumed to be
268 something you GET.)
269
270 The URL scheme definition should describe all well-defined
271 operations on the URL identifier, and what they are supposed to
272 do.
273
274 Some URL schemes (for example, "telnet") provide location
275 information for hooking onto bi-directional data streams, and don't
276 fit the "infoaccess" paradigm of most URLs very well; this should
277 be documented.
278
279 NOTE: It is perfectly valid to say that "no operation apart from
280 GET is defined for this URL". It is also valid to say that "there's
281 only one operation defined for this URL, and it's not very
282 GET-like". The important point is that what is defined on this type
283 is described.
284
285
286 2.3 Demonstrated utility
287
288 URL schemes should have demonstrated utility. New URL schemes are
289 expensive things to support. Often they require special code in
290 browsers, proxies, and/or servers. Having a lot of ways to say the
291 same thing needlessly complicates these programs without adding value
292 to the Internet.
293
294 The kinds of things that are useful include:
295
296 o Things that cannot be referred to in any other way.
297
298 o Things where it is much easier to get at them using this scheme
299 than (for instance) a proxy gateway.
300
301
302 2.3.1 Proxy into HTTP/HTML
303
304 One way to provide a demonstration of utility is via a gateway
305 which provides objects in the new scheme for clients using an
306 existing protocol. It is much easier to deploy gateways to a new
307 service than it is to deploy browsers that understand the new URL
308 object.
309
310 Things to look for when thinking about a proxy are:
311
312 o Is there a single global resolution mechanism whereby any proxy
313 can find the referenced object?
314 o If not, is there a way in which the user can find any object of
315 this type, and "run his own proxy"?
316 o Are the operations mappable one-to-one (or possibly using
317 modifiers) to HTTP operations?
318 o Is the type of returned objects well defined?
319 - as MIME content-types?
320 - as something that can be translated to HTML?
321 o Is there running code for a proxy?
322
323
324 2.4 Are there security considerations?
325
326 Above and beyond the security considerations of the base mechanism
327 a scheme builds upon, one must think of things that can happen in
328 the normal course of URL usage.
329
330 In particular:
331
332 o Does the user need to be warned that such a thing is happening
333 without an explicit request (GET for the source of an IMG tag,
334 for instance)? This has implications for the design of a proxy
335 gateway, of course.
336
337 o Is it possible to fake URLs of this type that point to different
338 things in a dangerous way?
339
340 o Are there mechanisms for identifying the requester that can be
341 used or need to be used with this mechanism (the From: field in a
342 mailto: URL, or the Kerberos login required for AFS access in the
343 AFS: URL, for instance)?
344
345 o Does the mechanism contain passwords or other security
346 information that are passed inside the referring document in the
347 clear (as in the "ftp" URL, for instance)?
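
The last point can be made concrete with a short sketch, assuming Python's urllib.parse and a hypothetical URL. Anything that can read the referring document can read the password, so software that logs or displays such URLs may want to mask it first.

```python
from urllib.parse import urlsplit


def redact_password(url, mask="*****"):
    """Return the URL with any password in the authority masked."""
    parts = urlsplit(url)
    userinfo, at, hostport = parts.netloc.rpartition("@")
    user, colon, _password = userinfo.partition(":")
    if not at or not colon:
        return url  # no userinfo, or userinfo without a password
    return parts._replace(netloc=f"{user}:{mask}@{hostport}").geturl()


url = "ftp://alice:secret@example.org/pub/file.txt"
print(urlsplit(url).password)  # the password travels in the clear
print(redact_password(url))
```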
348
349
350 2.5 Does it start with UR?
351
352 Any scheme starting with the letters "U" and "R", in particular if
353 it attaches any of the meanings "uniform", "universal" or
354 "unifying" to the first letter, is going to cause intense debate,
355 and generate much heat (but maybe little light).
356
357 Any such proposal should either make sure that there is a large
358 consensus behind it that it will be the only scheme of its type, or
359 pick another name.
360
361
362 2.6 Non-considerations
363
364 Some issues that are often raised but are not relevant to new URL
365 schemes include the following.
366
367
368 2.6.1 Are all objects accessible?
369
370 Can all objects in the world that are validly identified by a
371 scheme be accessed by any UA implementing it?
372
373 Sometimes the answer will be yes and sometimes no; often it will
374 depend on factors (like firewalls or client configuration) not
375 directly related to the scheme itself.
376
377
378 3. Security considerations
379
380 New URL schemes are required to address all security considerations
381 in their definitions.
382
383
384 4. References
385
386 [1] Berners-Lee, T., Fielding, R., Masinter, L., "Uniform Resource
387 Identifiers (URI): Generic Syntax", RFC 2396, August 1998
388
389 [2] Petke, R., "Registration Procedures for URL Scheme Names",
390 RFC [URL-PROCESS], November 1998
391
392 [3] Yergeau, F., "UTF-8, A Transformation Format of Unicode and ISO
393 10646", RFC 2279, January 1998.
394
395
396 5. Authors' Addresses
397
398 Larry Masinter
399 Xerox Corporation
400 Palo Alto Research Center
401 3333 Coyote Hill Road
402 Palo Alto, CA 94304
403 Fax: +1-415-812-4333
404 EMail: masinter@parc.xerox.com
405
406 Harald Tveit Alvestrand
407 Maxware, Pirsenteret
408 N-7005 Trondheim
409 NORWAY
410 Voice: +47 73 54 57 00
411 EMail: harald.alvestrand@maxware.no
412
413 Dan Zigmond
414 WebTV Networks, Inc.
415 305 Lytton Avenue
416 Palo Alto, CA 94301
417 USA
418 Voice: +1-650-614-6071
419 EMail: djz@corp.webtv.net
420
421 Rich Petke
422 MCIWORLDCOM Advanced Networks
423 5000 Britton Road
424 P. O. Box 5000
425 Hilliard, OH 43026-5000
426 Voice: +1-614-723-4157
427 Fax: +1-614-723-1333
428 EMail: rpetke@wcom.net
429