INTERNET-DRAFT                                            Larry Masinter
<draft-ietf-urlreg-guide-02.txt>                    Harald T. Alvestrand
May 8, 1998                                                  Dan Zigmond
                                                              Rich Petke


                     Guidelines for new URL Schemes


Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-Drafts
as reference material or to cite them other than as ``work in
progress.''

To view the entire list of current Internet-Drafts, please check
the "1id-abstracts.txt" listing contained in the Internet-Drafts
Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net
(Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au
(Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu
(US West Coast).

Abstract

A Uniform Resource Locator (URL) is a compact string representation
of the location for a resource that is available via the Internet.
This document provides guidelines for the definition of new URL
schemes.


1. Introduction

A Uniform Resource Locator (URL) is a compact string representation
of the location for a resource that is available via the Internet.
RFC [URI-SYNTAX] defines the general syntax and semantics of URIs,
and, by inclusion, URLs. A URL consists of a "scheme" followed by a
"scheme-specific part". Many URL schemes are already defined.

This document provides guidelines for the definition of new URL
schemes, for consideration by those who define and register new
schemes and by those who evaluate such definitions.

The process by which new URL schemes are registered is defined in
RFC [URL-PROCESS].


2. Guidelines for new URL schemes

Because new URL schemes potentially complicate client software, new
schemes must have demonstrable utility and operability, as well as
compatibility with existing URL schemes. This section elaborates on
these criteria.


2.1 Syntactic compatibility

New URL schemes should follow the same syntactic conventions as
existing schemes when appropriate.


2.1.1 Use of initial "//" for top level

Many proposed new URL schemes seem to use "://" as a kind of
indicator that what follows is a URL. However, the use of "//"
indicates a "top level" for schemes that support relative
URLs, and is unnecessary (and confusing) for schemes
that have no relative forms. URL schemes without relative
forms (such as mailto, cid, mid) do not use an initial "//".
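
As a rough illustration of the distinction above, the following
Python sketch shows how a generic parser (here the standard library's
urllib.parse.urlsplit) treats the two forms. The host name and mail
address are invented for the example and are not taken from this
document.

```python
from urllib.parse import urlsplit

# Hypothetical example URLs; the names are illustrative only.
hierarchical = urlsplit("http://example.com/docs/guide.txt")
opaque = urlsplit("mailto:user@example.com")

# With "//", the text up to the next "/" is parsed as the authority
# (the "top level") and the remainder as a hierarchical path.
print(hierarchical.scheme, hierarchical.netloc, hierarchical.path)

# Without "//", there is no authority component; everything after the
# scheme name and colon is treated as the scheme-specific part.
print(opaque.scheme, repr(opaque.netloc), opaque.path)
```

A scheme that writes "//" without having an authority-like top level
would lead a parser of this kind to misread the first part of its
scheme-specific data as a host.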


2.1.2 Compatibility with relative URLs

URL schemes should use the generic URL syntax if they are intended
to be used with relative URLs. A description of the allowed
relative forms should be included in the scheme's definition.
Many applications use relative URLs extensively. Specifically, the
scheme's definition should answer the following questions:

o Can the scheme be parsed according to RFC [URI-SYNTAX] - that is,
  if the tokens "//", "/", ";", "?" and "#" are used, do they have
  the meaning given in RFC [URI-SYNTAX]?

o Does it make sense to use the scheme in relative URLs like those
  RFC [URI-SYNTAX] specifies?

o If the scheme syntax is designed to be broken into pieces, does
  the documentation for the scheme's syntax specify what those
  pieces are, why it should be broken in this way, and why the
  breaks aren't where RFC [URI-SYNTAX] says that they usually should
  be?

o If the scheme has a hierarchy, does it go left-to-right and with
  slash separators like RFC [URI-SYNTAX]? If not, why not?
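
The questions above can be made concrete by watching a generic
resolver at work. The sketch below uses Python's standard
urllib.parse.urljoin, which implements generic relative resolution;
the base URL and the references are hypothetical examples. A new
scheme that gives "/", "?", "#" or "//" a different meaning would not
resolve sensibly under such a resolver.

```python
from urllib.parse import urljoin

BASE = "http://example.com/a/b/c"  # hypothetical base URL

def resolve(reference, base=BASE):
    """Resolve a relative reference against the base URL."""
    return urljoin(base, reference)

# Each reference relies on one of the generic-syntax tokens:
# "/" for hierarchy, "//" for a new top level, "?" and "#" for
# query and fragment.
for ref in ("d", "../d", "/d", "//other.example/x", "?q=1", "#frag"):
    print(f"{ref!r:22} -> {resolve(ref)}")
```

Note that the hierarchy is walked left-to-right with slash
separators: "../d" removes the last directory level of the base path
before appending "d".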


2.2 Is the scheme well defined?

It is important that the semantics of the "resource" that a URL
"locates" be well defined. This might mean different things
depending on the nature of the URL scheme.


2.2.1 Clear mapping from other name spaces

In many cases, new URL schemes are defined as ways to translate
other protocols and name spaces into the general framework of
URLs. The "ftp" URL scheme translates from the FTP protocol, while
the "mid" URL scheme translates from the Message-ID field of
messages.

In either case, the description of the mapping must be complete,
must describe how character sets are (or are not) encoded in URLs,
must describe exactly how all legal values of the base standard can
be represented using the URL scheme, and must state exactly which
modifiers, alternate forms and other artifacts from the base
standards are included or not included. These requirements are
elaborated below.


2.2.2 URL schemes associated with network protocols

Most new URL schemes are associated with network resources that
have one or several network protocols that can access them. The
'ftp', 'news', and 'http' schemes are of this nature. For such
schemes, the specification should completely describe how URLs are
translated into protocol actions in sufficient detail to make the
access of the network resource unambiguous. If an implementation
of the URL scheme requires some configuration, the configuration
elements must be clearly identified. (For example, the 'news'
scheme, if implemented using NNTP, requires configuration of the
NNTP server.)
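
To give a feel for what "translated into protocol actions" can mean,
here is a minimal Python sketch for the 'ftp' scheme. It follows the
interpretation described for ftp URLs in RFC 1738 (one CWD per
directory segment, then RETR for the final segment) but is
deliberately simplified: the function name ftp_actions, the
"anonymous" default login and the textual "connect" step are
assumptions made for this illustration, and ";type=" parameters are
not handled.

```python
from urllib.parse import urlsplit, unquote

def ftp_actions(url):
    """Return the protocol steps implied by an ftp URL, in order."""
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        raise ValueError("not an ftp URL")

    # Configuration element: the login used when the URL names no user.
    user = unquote(parts.username) if parts.username else "anonymous"

    # Split the path on "/" first and decode %HH afterwards, so that an
    # encoded "/" (%2F) stays inside a single segment.
    segments = parts.path.split("/")[1:]
    if not segments or segments[-1] == "":
        raise ValueError("URL does not name a file")
    *directories, filename = segments

    steps = [f"(connect to {parts.hostname} port {parts.port or 21})",
             f"USER {user}"]
    steps += [f"CWD {unquote(d)}" for d in directories]
    steps.append(f"RETR {unquote(filename)}")
    return steps

for step in ftp_actions("ftp://myname@host.dom/%2Fetc/motd"):
    print(step)
```

A scheme specification at this level of detail leaves little room
for two clients to access different resources from the same URL, and
it makes configuration elements such as the default login explicit.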


2.2.3 Character encoding

When describing URL schemes in which (some of) the elements of
the URL are actually representations of sequences of characters,
care should be taken not to introduce unnecessary variety in the
ways in which characters are encoded into octets and then into
URL characters. Unless there is some compelling reason for a
particular scheme to do otherwise, translating character sequences
into UTF-8 [RFC 2044] and then subsequently using the %HH encoding
for unsafe octets is recommended.
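
The recommended two-step encoding can be sketched in Python as
follows. The set of characters left unencoded (ASCII letters, digits
and "-", ".", "_", "~") is an assumption chosen for this example; a
scheme definition would state its own set of safe characters. The
helper name encode_component is likewise hypothetical.

```python
from urllib.parse import quote, unquote

# Assumption: characters left unencoded are ASCII letters, digits and
# "-", ".", "_", "~". A real scheme definition would state its own set.
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789-._~"
)

def encode_component(text):
    """Encode characters as UTF-8 octets, then escape octets as %HH."""
    octets = text.encode("utf-8")
    return "".join(
        chr(octet) if octet in UNRESERVED else f"%{octet:02X}"
        for octet in octets
    )

sample = "résumé v2/final"
encoded = encode_component(sample)
print(encoded)

# The standard library performs the same two steps when no extra
# characters are declared safe.
print(encoded == quote(sample, safe=""))

# Decoding reverses the process: %HH to octets, octets to characters.
print(unquote(encoded) == sample)
```

Because every scheme that follows this recommendation produces the
same octets for the same characters, a single decoder works across
schemes.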


2.2.4 Definition of non-protocol URL schemes

In some cases, URL schemes do not have particular network protocols
associated with them, because their use is limited to contexts
where the access method is understood. This is the case, for
example, with the "cid" and "mid" URL schemes. For these URL
schemes, the specification should describe the notation of the
scheme and a complete mapping of the locator from its source.


2.2.5 Definition of URL schemes not associated with data resources

Most URL schemes locate Internet resources that correspond
to data objects that can be retrieved or modified. This is the
case with "ftp" and "http", for example. However, some URL schemes
do not; for example, the "mailto" URL scheme corresponds to an
Internet mail address.

If a new URL scheme does not locate resources that are data
objects, the properties of names in the new space must be clearly
defined.


2.2.6 Definition of operations

In some contexts (for example, HTML forms) it is possible to
specify any one of a list of operations to be performed on a
specific URL. (Outside forms, the operation is generally assumed
to be GET.)

The URL scheme definition should describe all well-defined
operations on the URL identifier, and what they are supposed to
do.

Some URL schemes (for example, "telnet") provide location
information for hooking onto bi-directional data streams, and don't
fit the "infoaccess" paradigm of most URLs very well; this should
be documented.

NOTE: It is perfectly valid to say that "no operation apart from
GET is defined for this URL". It is also valid to say that "there's
only one operation defined for this URL, and it's not very
GET-like". The important point is that whatever is defined for this
type of URL is described.


2.3 Demonstrated utility

URL schemes should have demonstrated utility. New URL schemes are
expensive things to support. Often they require special code in
browsers, proxies, and/or servers. Having a lot of ways to say the
same thing needlessly complicates these programs without adding
value to the Internet.

The kinds of things that are useful include:

o Things that cannot be referred to in any other way.

o Things where it is much easier to get at them using this
  scheme than (for instance) a proxy gateway.


2.3.1 Proxy into HTTP/HTML

One way to provide a demonstration of utility is via a gateway
which provides objects in the new scheme for clients using an
existing protocol. It is much easier to deploy gateways to a new
service than it is to deploy browsers that understand the new URL
object.

Things to look for when thinking about a proxy are:

o Is there a single global resolution mechanism whereby any proxy
  can find the referenced object?
o If not, is there a way in which the user can find any object of
  this type, and "run his own proxy"?
o Are the operations mappable one-to-one (or possibly using
  modifiers) to HTTP operations?
o Is the type of returned objects well defined?
    * as MIME content-types?
    * as something that can be translated to HTML?
o Is there running code for a proxy?


2.4 Are there security considerations?

Above and beyond the security considerations of the base mechanism
a scheme builds upon, one must think of things that can happen in
the normal course of URL usage.

In particular:

o Does the user need to be warned when a URL of this type is
  accessed without an explicit request (a GET for the source of an
  IMG tag, for instance)? This has implications for the design of a
  proxy gateway, of course.

o Is it possible to fake URLs of this type that point to different
  things in a dangerous way?

o Are there mechanisms for identifying the requester that can be
  used or need to be used with this mechanism (the From: field in a
  mailto: URL, or the Kerberos login required for AFS access in the
  AFS: URL, for instance)?

o Does the mechanism contain passwords or other security
  information that are passed inside the referring document in the
  clear (as in the "ftp" URL, for instance)?
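
The last point can be illustrated with a short Python sketch. It
shows that a password embedded in an "ftp" URL is available to anyone
who can read the referring document, and how software that displays
or logs URLs might mask it. The URL, the helper name redact_password
and the "***" mask are hypothetical choices for this example, not
part of any scheme definition.

```python
from urllib.parse import urlsplit, urlunsplit

def redact_password(url, mask="***"):
    """Return the URL with any embedded password replaced by a mask."""
    parts = urlsplit(url)
    if parts.password is None:
        return url
    # Rebuild the authority with the password masked. Assumes a plain
    # host name; bracketed IPv6 literals are not handled here.
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    netloc = f"{parts.username}:{mask}@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path,
                       parts.query, parts.fragment))

url = "ftp://alice:secret@ftp.example.com/pub/file.txt"

# The password is recoverable by any reader of the document.
print(urlsplit(url).password)

# A display or logging layer can at least avoid repeating it.
print(redact_password(url))
```

Masking only limits further exposure; it does not change the fact
that the password travels in the clear inside the document that
contains the URL.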


2.5 Does it start with UR?

Any scheme starting with the letters "U" and "R", in particular if
it attaches any of the meanings "uniform", "universal" or
"unifying" to the first letter, is going to cause intense debate,
and generate much heat (but maybe little light).

Any such proposal should either make sure that there is a large
consensus behind it that it will be the only scheme of its type, or
pick another name.


2.6 Non-considerations

Some issues that are often raised but are not relevant to new URL
schemes include the following.


2.6.1 Are all objects accessible?

Can all objects in the world that are validly identified by a
scheme be accessed by any UA implementing it?

Sometimes the answer will be yes and sometimes no; often it will
depend on factors (like firewalls or client configuration) not
directly related to the scheme itself.


3. Security considerations

New URL schemes are required to address all security considerations
in their definitions.


4. IANA considerations

The process by which URL scheme names are registered is specified
in RFC [URL-PROCESS].


5. References

RFC 2044 F. Yergeau, "UTF-8, A Transformation Format of Unicode
and ISO 10646", Alis Technologies, October 1996.

RFC [URI-SYNTAX] T. Berners-Lee, R. Fielding, L. Masinter, "Uniform
Resource Identifiers (URI): Generic Syntax and Semantics",
<draft-fielding-uri-syntax-*.txt>.

RFC [URL-PROCESS] R. Petke, "Registration Procedures for URL Scheme
Names", <draft-ietf-urlreg-procedures-*.txt>.


6. Authors' Addresses

Larry Masinter
Xerox Corporation
Palo Alto Research Center
3333 Coyote Hill Road
Palo Alto, CA 94304
Fax: +1-415-812-4333
EMail: masinter@parc.xerox.com

Harald Tveit Alvestrand
Maxware, Pirsenteret
N-7005 Trondheim
NORWAY
Voice: +47 73 54 57 00
EMail: harald.alvestrand@maxware.no

Dan Zigmond
WebTV Networks, Inc.
305 Lytton Avenue
Palo Alto, CA 94301
USA
Voice: +1-650-614-6071
EMail: djz@corp.webtv.net

Rich Petke
WorldCom Advanced Networks
5000 Britton Road
P. O. Box 5000
Hilliard, OH 43026-5000
Voice: +1-614-723-4157
Fax: +1-614-723-1333
EMail: rpetke@compuserve.net