INTERNET-DRAFT                                        Larry Masinter
<draft-ietf-urlreg-guide-04.txt>                Harald T. Alvestrand
November 13, 1998                                        Dan Zigmond
                                                          Rich Petke


                   Guidelines for new URL Schemes


Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its
areas, and its working groups. Note that other groups may also
distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use
Internet-Drafts as reference material or to cite them other than
as "work in progress."

To view the entire list of current Internet-Drafts, please check
the "1id-abstracts.txt" listing contained in the Internet-Drafts
Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net
(Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au
(Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu
(US West Coast).

Distribution of this memo is unlimited.

This Internet Draft expires May 13, 1999.

Copyright Notice

Copyright (C) The Internet Society (1998). All Rights Reserved.

Abstract

A Uniform Resource Locator (URL) is a compact string representation
of the location for a resource that is available via the Internet.
This document provides guidelines for the definition of new URL
schemes.


1. Introduction

A Uniform Resource Locator (URL) is a compact string representation
of the location for a resource that is available via the Internet.
RFC 2396 [1] defines the general syntax and semantics of URIs, and,
by inclusion, URLs. URLs are designated by including a "<scheme>:"
and then a "<scheme-specific-part>". Many URL schemes are already
defined.

This document provides guidelines for the definition of new URL
schemes, for consideration by those who are defining and
registering such schemes or evaluating their definitions.

The process by which new URL schemes are registered is defined in
RFC [URL-PROCESS] [2].


2. Guidelines for new URL schemes

Because new URL schemes potentially complicate client software, new
schemes must have demonstrable utility and operability, as well as
compatibility with existing URL schemes. This section elaborates on
these criteria.


2.1 Syntactic compatibility

New URL schemes should follow the same syntactic conventions as
existing schemes when appropriate. If a URI scheme that has
embedded links in content accessed by that scheme does not share
syntax with a different scheme, the same content cannot be served up
under different schemes without rewriting the content. This can
already be a problem, and with future digital signature schemes,
rewriting may not even be possible. Deployment of other schemes in
the future could therefore become extremely difficult.


2.1.1 Motivations for syntactic compatibility

Why should new URL schemes share as much of the generic URI syntax
(that makes sense to share) as possible? Consider the following:

o If fragment syntax (e.g. "<a href="#foo">") isn't shared between
  two schemes, you can't move individual, completely
  self-referential documents between schemes without rewriting the
  embedded references within the document. In the Web, the fragment
  syntax is a property of the media type, and is evaluated by the
  client.

o If fragment syntax is not shared between different media types of
  the same capability (e.g. HTML, XML, Word, or image types such as
  GIF, JPEG, PNG), then you can't have a URI reference that can
  evolve to superior media types as they become available, and such
  a reference is unlikely to work properly even today with content
  negotiation.

o If relative syntax (to the extent of understanding that the URI is
  relative, and what part of the URI string is relative) isn't
  shared between two schemes (e.g. "<a href="foo">"), you can't
  move sets of documents that are internally self-referential
  between schemes without rewriting the embedded URIs.

o If the ".." syntax as a path component in relative URIs isn't
  shared between schemes, you can't easily have sets of document
  sets and refer to them between schemes without rewriting the
  embedded references.

o If the "/" syntax (to the extent of understanding that the URI
  refers to a path relative to the current naming authority, see
  section 2.1.2) isn't shared, you can't easily move multiple sets
  of documents up or down in a relative hierarchy of names while
  they share a common set of documents, without rewriting the
  content, whether within that scheme or between schemes. The best
  example is a site that has a common set of GIF, JPEG and PNG
  images, where you want to reorganize the site by changing the
  depth of a subtree, or by moving it to another directory where
  the depth isn't the same.

o If naming authority syntax (e.g. what comes after "//" in most
  URL schemes, see section 2.1.2) and relative path syntax (to the
  extent of understanding that the URI has a naming authority, and
  what part of the URI string is the naming authority vs. path)
  aren't shared between two schemes, you can't share identical name
  spaces and serve them up via different schemes. (The naming
  authority syntax is a property of the scheme.) The fact that
  HTTP and FTP have the same syntax, for example, has often been
  exploited by sites transitioning from FTP archive service to HTTP
  archive service, so that the URLs can be identical between
  schemes except for the scheme name; the same content can be
  served via two schemes simultaneously.

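The effect of sharing relative, "..", "/" and fragment syntax can be
illustrated with a minimal sketch. It assumes Python's standard
urllib.parse module as the resolver, and the host name and paths are
hypothetical. The same embedded references resolve identically under
"http" and "ftp", differing only in the scheme name.

```python
from urllib.parse import urljoin

# Hypothetical base URLs: the same archive exposed under two schemes.
BASES = [
    "http://archive.example.org/pub/docs/guide/index.html",
    "ftp://archive.example.org/pub/docs/guide/index.html",
]

# Relative references as they might appear embedded in a document.
REFERENCES = [
    "chapter2.html",           # sibling document
    "../images/logo.gif",      # ".." path component
    "/pub/common/style.css",   # path relative to the naming authority
    "#foo",                    # fragment within the same document
]


def resolve_all(base):
    """Resolve every embedded reference against one base URL."""
    return [urljoin(base, ref) for ref in REFERENCES]


for base in BASES:
    for resolved in resolve_all(base):
        print(resolved)
```

Because both schemes follow the generic syntax, the document content
does not need to be rewritten when it is served under the other
scheme.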

2.1.2 Improper use of "//" following "<scheme>:"

Contrary to some examples set in past years, the use of double
slashes as the first component of the <scheme-specific-part> of a
URL is not simply an artistic indicator that what follows is a URL:
Double slashes are used ONLY when the syntax of the URL's
<scheme-specific-part> contains a hierarchical structure as
described in RFC 2396. In URLs from such schemes, the use of double
slashes indicates that what follows is the top hierarchical element
for a naming authority. (See section 3 of RFC 2396 for more
details.) URL schemes which do not contain a conformant
hierarchical structure in their <scheme-specific-part> should not
use double slashes following the "<scheme>:" string.

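As a rough illustration of the distinction, the following sketch
assumes Python's standard urllib.parse module and hypothetical
example addresses. A generic parser reports a naming authority only
when "//" follows the scheme; for a non-hierarchical scheme such as
"mailto" the authority is empty and everything lands in the path.

```python
from urllib.parse import urlsplit


def describe(url):
    """Return (scheme, naming authority, path) as a generic parser sees them."""
    parts = urlsplit(url)
    return parts.scheme, parts.netloc, parts.path


# Hierarchical scheme: "//" introduces the naming authority.
print(describe("http://www.example.org/pub/index.html"))

# Non-hierarchical scheme: no "//", hence no naming authority.
print(describe("mailto:user@example.org"))
```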

2.1.3 Compatibility with relative URLs

URL schemes should use the generic URL syntax if they are intended
to be used with relative URLs. A description of the allowed
relative forms should be included in the scheme's definition.
Many applications use relative URLs extensively. Specifically,
consider the following questions:

o Can the scheme be parsed according to RFC 2396 - that is, if the
  tokens "//", "/", ";", "?" and "#" are used, do they have the
  meaning given in RFC 2396?

o Does it make sense to use the scheme in relative URLs like those
  RFC 2396 specifies?

o If the scheme syntax is designed to be broken into pieces, does
  the documentation for the scheme's syntax specify what those
  pieces are, why it should be broken in this way, and why the
  breaks aren't where RFC 2396 says that they usually should be?

o If the scheme has a hierarchy, does it go left-to-right and with
  slash separators like RFC 2396? If not, why not?

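One way to check the first question in practice is to run candidate
URLs through the regular expression given in Appendix B of RFC 2396
and see whether the resulting pieces match the scheme's intent. The
sketch below is only illustrative; the function name split_reference
and the second sample reference are assumptions made for this example.

```python
import re

# Regular expression from RFC 2396, Appendix B. Group 2 is the scheme,
# group 4 the authority, group 5 the path, group 7 the query and
# group 9 the fragment.
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_reference(ref):
    """Break a URI reference into its generic components.

    Components that are absent are reported as None.
    """
    m = URI_REFERENCE.match(ref)  # always matches: every part is optional
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


print(split_reference("http://www.ics.uci.edu/pub/ietf/uri/#Related"))
print(split_reference("../images/logo.gif?size=2"))
```

If a proposed scheme uses "/", "?" or "#" with some other meaning,
the pieces reported here will not correspond to the scheme's own
notion of its components, which is a sign that relative forms will
not behave as expected.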

2.1.4 Compatibility with fragment syntax

Fragment syntax should be shared across URL schemes whenever
possible. Fragments indicate a location within a particular
document, of a particular media type. As media types evolve
and content negotiation is deployed, a shared fragment syntax
allows a fragment to point to the correct location within documents
of different media types. For example, a named fragment (#foo)
should be able to point to the foo label in either an HTML
document or an XML document. Similarly, for fragments identifying a
location in an image, where the image may evolve from GIF
to JPEG to PNG, the fragment ID should point to the same location.


2.2 Is the scheme well defined?

It is important that the semantics of the "resource" that a URL
"locates" be well defined. This might mean different things
depending on the nature of the URL scheme.


2.2.1 Clear mapping from other name spaces

In many cases, new URL schemes are defined as ways to translate
other protocols and name spaces into the general framework of
URLs. The "ftp" URL scheme translates from the FTP protocol, while
the "mid" URL scheme translates from the Message-ID field of
messages.

In either case, the description of the mapping must be complete,
must describe how characters get encoded or not in URLs, must
describe exactly how all legal values of the base standard can be
represented using the URL scheme, and exactly which modifiers,
alternate forms and other artifacts from the base standards are
included or not included. These requirements are elaborated
below.


2.2.2 URL schemes associated with network protocols

Most new URL schemes are associated with network resources that
have one or several network protocols that can access them. The
'ftp', 'news', and 'http' schemes are of this nature. For such
schemes, the specification should completely describe how URLs are
translated into protocol actions in sufficient detail to make the
access of the network resource unambiguous. If an implementation
of the URL scheme requires some configuration, the configuration
elements must be clearly identified. (For example, the 'news'
scheme, if implemented using NNTP, requires configuration of the
NNTP server.)

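To give a sense of the level of detail meant by "translated into
protocol actions", the following sketch maps an "ftp" URL onto a
sequence of steps. It is loosely modelled on the "ftp" scheme
description in RFC 1738 and is not a complete or authoritative
rendering of it: the function name ftp_actions, the host name, the
"connect" pseudo-step, and the choice to omit type codes and the
anonymous password are all assumptions made for illustration.

```python
from urllib.parse import urlsplit, unquote


def ftp_actions(url):
    """Sketch: list the steps a client might take to fetch an ftp URL.

    Assumes the URL names a file. Each directory segment becomes one
    CWD command and the final segment is retrieved with RETR.
    """
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        raise ValueError("not an ftp URL: %r" % url)

    # "connect" is a descriptive pseudo-step, not an FTP command.
    actions = ["connect %s %d" % (parts.hostname, parts.port or 21)]

    user = unquote(parts.username) if parts.username else "anonymous"
    actions.append("USER " + user)
    if parts.password is not None:
        actions.append("PASS " + unquote(parts.password))

    # The "/" after the host is a separator, not part of the path.
    path = parts.path[1:] if parts.path.startswith("/") else parts.path
    *directories, name = path.split("/")
    if not name:
        raise ValueError("this sketch expects the URL to name a file")

    # Decode %HH escapes only after splitting on "/".
    actions += ["CWD " + unquote(d) for d in directories]
    actions.append("RETR " + unquote(name))
    return actions


for step in ftp_actions("ftp://ftp.example.org/pub/docs/readme.txt"):
    print(step)
```

A scheme definition that leaves any of these steps open (for
instance, what happens when no user name is given) makes access to
the resource ambiguous.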

2.2.3 Definition of non-protocol URL schemes

In some cases, URL schemes do not have particular network protocols
associated with them, because their use is limited to contexts
where the access method is understood. This is the case, for
example, with the "cid" and "mid" URL schemes. For these URL
schemes, the specification should describe the notation of the
scheme and a complete mapping of the locator from its source.


2.2.4 Definition of URL schemes not associated with data resources

Most URL schemes locate Internet resources that correspond
to data objects that can be retrieved or modified. This is the
case with "ftp" and "http", for example. However, some URL schemes
do not; for example, the "mailto" URL scheme corresponds to an
Internet mail address.

If a new URL scheme does not locate resources that are data
objects, the properties of names in the new space must be clearly
defined.


2.2.5 Character encoding

When describing URL schemes in which (some of) the elements of
the URL are actually representations of sequences of characters,
care should be taken not to introduce unnecessary variety in the
ways in which characters are encoded into octets and then into
URL characters. Unless there is some compelling reason for a
particular scheme to do otherwise, translating character sequences
into UTF-8 (RFC 2279) [3] and then subsequently using the %HH
encoding for unsafe octets is recommended.

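The recommended two-step translation (characters to UTF-8 octets,
then %HH escapes) can be sketched as follows. The sketch assumes
Python's standard urllib.parse module, whose notion of which octets
need escaping may differ slightly from the "unsafe" set a particular
scheme defines; the helper names and the sample file name are
hypothetical.

```python
from urllib.parse import quote, unquote


def encode_component(text):
    """Characters -> UTF-8 octets -> %HH escapes for non-safe octets."""
    return quote(text, safe="", encoding="utf-8")


def decode_component(encoded):
    """Reverse the translation: %HH escapes -> UTF-8 octets -> characters."""
    return unquote(encoded, encoding="utf-8")


encoded = encode_component("résumé 1.txt")
print(encoded)                    # r%C3%A9sum%C3%A9%201.txt
print(decode_component(encoded))  # résumé 1.txt
```

Each "é" becomes the two UTF-8 octets C3 A9, so it appears as
"%C3%A9" in the URL.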

2.2.6 Definition of operations

In some contexts (for example, HTML forms) it is possible to
specify any one of a list of operations to be performed on a
specific URL. (Outside forms, it is generally assumed to be
something you GET.)

The URL scheme definition should describe all well-defined
operations on the URL identifier, and what they are supposed to
do.

Some URL schemes (for example, "telnet") provide location
information for hooking onto bi-directional data streams, and don't
fit the "infoaccess" paradigm of most URLs very well; this should
be documented.

NOTE: It is perfectly valid to say that "no operation apart from
GET is defined for this URL". It is also valid to say that "there's
only one operation defined for this URL, and it's not very
GET-like". The important point is that what is defined on this type
is described.


2.3 Demonstrated utility

URL schemes should have demonstrated utility. New URL schemes are
expensive things to support. Often they require special code in
browsers, proxies, and/or servers. Having a lot of ways to say the
same thing needlessly complicates these programs without adding
value to the Internet.

The kinds of things that are useful include:

o Things that cannot be referred to in any other way.

o Things that are much easier to get at using this scheme than
  through (for instance) a proxy gateway.


2.3.1 Proxy into HTTP/HTML

One way to provide a demonstration of utility is via a gateway
which provides objects in the new scheme for clients using an
existing protocol. It is much easier to deploy gateways to a new
service than it is to deploy browsers that understand the new URL
object.

Things to look for when thinking about a proxy are:

o Is there a single global resolution mechanism whereby any proxy
  can find the referenced object?
o If not, is there a way in which the user can find any object of
  this type, and "run his own proxy"?
o Are the operations mappable one-to-one (or possibly using
  modifiers) to HTTP operations?
o Is the type of returned objects well defined?
  - as MIME content-types?
  - as something that can be translated to HTML?
o Is there running code for a proxy?


2.4 Are there security considerations?

Above and beyond the security considerations of the base mechanism
a scheme builds upon, one must think of things that can happen in
the normal course of URL usage.

In particular:

o Does the user need to be warned when an access happens without
  an explicit request (GET for the source of an IMG tag, for
  instance)? This has implications for the design of a proxy
  gateway, of course.

o Is it possible to fake URLs of this type that point to different
  things in a dangerous way?

o Are there mechanisms for identifying the requester that can be
  used or need to be used with this mechanism (the From: field in a
  mailto: URL, or the Kerberos login required for AFS access in the
  AFS: URL, for instance)?

o Does the mechanism contain passwords or other security
  information that is passed inside the referring document in the
  clear (as in the "ftp" URL, for instance)?

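The last question can be made concrete with a small sketch. It
assumes Python's standard urllib.parse module; the helper name
redact_password, the "***" placeholder, and the sample credentials
are hypothetical. The sketch shows that a password embedded in an
"ftp" URL is plainly visible to anything that handles the URL, and
one way software might mask it before displaying or logging the URL.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_password(url):
    """Return the URL with any embedded password replaced by "***"."""
    parts = urlsplit(url)
    if parts.password is None:
        return url  # nothing to hide
    userinfo, _, hostport = parts.netloc.rpartition("@")
    user = userinfo.split(":", 1)[0]
    return urlunsplit(parts._replace(netloc=user + ":***@" + hostport))


url = "ftp://alice:s3cret@ftp.example.org/pub/file.txt"
print(urlsplit(url).password)  # the password travels in the clear
print(redact_password(url))
```

Masking on display does not remove the exposure in the referring
document itself; a scheme definition should state whether such
information may appear in the URL at all.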

2.5 Does it start with UR?

Any scheme starting with the letters "U" and "R", in particular if
it attaches any of the meanings "uniform", "universal" or
"unifying" to the first letter, is going to cause intense debate,
and generate much heat (but maybe little light).

Any such proposal should either make sure that there is a large
consensus behind it that it will be the only scheme of its type, or
pick another name.


2.6 Non-considerations

Some issues that are often raised but are not relevant to new URL
schemes include the following.


2.6.1 Are all objects accessible?

Can all objects in the world that are validly identified by a
scheme be accessed by any UA implementing it?

Sometimes the answer will be yes and sometimes no; often it will
depend on factors (like firewalls or client configuration) not
directly related to the scheme itself.


3. Security considerations

New URL schemes are required to address all security considerations
in their definitions.


4. References

[1] Berners-Lee, T., Fielding, R., Masinter, L., "Uniform Resource
    Identifiers (URI): Generic Syntax", RFC 2396, August 1998.

[2] Petke, R., "Registration Procedures for URL Scheme Names",
    RFC [URL-PROCESS], November 1998.

[3] Yergeau, F., "UTF-8, A Transformation Format of Unicode and ISO
    10646", RFC 2279, January 1998.


5. Authors' Addresses

Larry Masinter
Xerox Corporation
Palo Alto Research Center
3333 Coyote Hill Road
Palo Alto, CA 94304
Fax: +1-415-812-4333
EMail: masinter@parc.xerox.com

Harald Tveit Alvestrand
Maxware, Pirsenteret
N-7005 Trondheim
NORWAY
Voice: +47 73 54 57 00
EMail: harald.alvestrand@maxware.no

Dan Zigmond
WebTV Networks, Inc.
305 Lytton Avenue
Palo Alto, CA 94301
USA
Voice: +1-650-614-6071
EMail: djz@corp.webtv.net

Rich Petke
MCIWORLDCOM Advanced Networks
5000 Britton Road
P. O. Box 5000
Hilliard, OH 43026-5000
Voice: +1-614-723-4157
Fax: +1-614-723-1333
EMail: rpetke@wcom.net