INTERNET-DRAFT                                            Larry Masinter
<draft-ietf-urlreg-guide-01.txt>                    Harald T. Alvestrand
December 9, 1997                                             Dan Zigmond

                     Guidelines for new URL Schemes

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-Drafts
as reference material or to cite them other than as ``work in
progress.''

To learn the current status of any Internet-Draft, please check the
``1id-abstracts.txt'' listing contained in the Internet-Drafts
Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net
(Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East
Coast), or ftp.isi.edu (US West Coast).

Abstract

A Uniform Resource Locator (URL) is a compact string representation
of the location for a resource that is available via the Internet.
This document provides guidelines for the definition of new URL
schemes.

1. Introduction

A Uniform Resource Locator (URL) is a compact string representation
of the location for a resource that is available via the Internet.
[URI-SYNTAX] defines the general syntax and semantics of URIs, and,
by inclusion, URLs. URLs are designated by including a "scheme"
and then a "scheme-specific part". Many URL schemes are already
defined.

This document provides guidelines for the definition of new URL
schemes, for consideration by those who are defining and
registering or evaluating those definitions.

The process by which new URL schemes are registered or defined
is not defined here.

2. Guidelines for new URL schemes

Because new URL schemes potentially complicate client software, new
schemes must have demonstrable utility and operability, as well as
compatibility with existing URL schemes. This section elaborates
these criteria.

2.1 Syntactic compatibility

New URL schemes should follow the same syntactic conventions as
existing schemes when appropriate.

2.1.1 Use of initial "//" for top level

Many proposed new URL schemes seem to use "://" as a kind of
indicator that what follows is a URL. However, the use of "//"
indicates a "top level" for schemes that support relative
URLs, and is not necessary (and just confusing) for schemes
that have no relative forms. URL schemes without relative
forms (such as mailto, cid, mid) do not use an initial "//".
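
The distinction can be observed with a generic URL parser. The
following is a minimal sketch using Python's standard urllib.parse
module, chosen here purely for illustration; the example URLs are
hypothetical and only their syntax matters.

```python
from urllib.parse import urlsplit

# Hypothetical example URLs; only the syntax matters here.
hierarchical = urlsplit("http://www.example.com/docs/guide.html")
opaque = urlsplit("mailto:user@example.com")

# With "//", the parser finds a top-level (authority) component,
# followed by a path that relative URLs can be resolved against.
print(hierarchical.netloc)   # www.example.com
print(hierarchical.path)     # /docs/guide.html

# Without "//", there is no top-level component; everything after
# the scheme is treated as the scheme-specific part.
print(repr(opaque.netloc))   # ''
print(opaque.path)           # user@example.com
```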

2.1.2 Compatibility with relative URLs

URL schemes should use the generic-URL syntax if they are intended
to be used with relative URLs. A description of the allowed
relative forms should be included in the scheme's definition.
Many applications use relative URLs extensively.

o Can it be parsed according to [URI-SYNTAX]? That is, if the
  tokens "//", "/", ";", "?" and "#" are used, do they have the
  meaning given in [URI-SYNTAX]?

o Does it make sense to use it in relative URLs like those
  [URI-SYNTAX] specifies?

o If something is designed to be broken into pieces, does it
  document what those pieces are, why it should be broken in this
  way, and why the breaks aren't where [URI-SYNTAX] says that they
  usually should be?

o If it has a hierarchy, does it go left-to-right and with slash
  separators like [URI-SYNTAX]? If not, why not?
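
To make the questions above concrete, the following sketch shows how
a generic resolver treats a URL that follows the generic syntax. It
uses Python's standard urllib.parse module as one convenient
implementation; the base URL and the relative references are
hypothetical.

```python
from urllib.parse import urljoin, urlparse

base = "http://www.example.com/a/b/c.html"   # hypothetical base URL

# Relative references are resolved against the left-to-right,
# slash-separated hierarchy of the base URL.
print(urljoin(base, "d.html"))     # http://www.example.com/a/b/d.html
print(urljoin(base, "../e.html"))  # http://www.example.com/a/e.html
print(urljoin(base, "/f.html"))    # http://www.example.com/f.html
print(urljoin(base, "#frag"))      # http://www.example.com/a/b/c.html#frag

# The tokens ";", "?" and "#" delimit parameters, query and fragment.
parts = urlparse("http://www.example.com/search;v=1?q=url#top")
print(parts.path, parts.params, parts.query, parts.fragment)
# /search v=1 q=url top
```

A scheme that assigns other meanings to these tokens cannot rely on
generic code of this kind and needs its own parser in every client.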

2.2 Is the scheme well defined?

It is important that the semantics of the "resource" that a URL
"locates" be well defined. This might mean different things
depending on the nature of the URL scheme.

2.2.1 Clear mapping from other name spaces

In many cases, new URL schemes are defined as ways to translate
other protocols and name spaces into the general framework of
URLs. The "ftp" URL scheme translates from the FTP protocol, while
the "mid" URL scheme translates from the Message-ID field of
messages.

In either case, the description of the mapping must be complete,
must describe how character sets get encoded or not in URLs, must
describe exactly how all legal values of the base standard can be
represented using the URL scheme, and exactly which modifiers,
alternate forms and other artifacts from the base standards are
included or not included. These requirements are elaborated
below.

2.2.2 URL schemes associated with network protocols

Most new URL schemes are associated with network resources that
have one or several network protocols that can access them. The
'ftp', 'news', and 'http' schemes are of this nature. For such
schemes, the specification should completely describe how URLs are
translated into protocol actions in sufficient detail to make the
access of the network resource unambiguous. If an implementation
of the URL scheme requires some configuration, the configuration
elements must be clearly identified. (For example, the 'news'
scheme, if implemented using NNTP, requires configuration of the
NNTP server.)
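
As an illustration of the level of detail intended, the following
sketch translates an "ftp" URL into a sequence of FTP commands. It is
a simplified assumption about one plausible mapping (anonymous login
by default, one CWD per directory segment, then RETR), not the
normative definition of the "ftp" scheme. The function name
ftp_url_to_commands and the default password "guest@" are
hypothetical.

```python
from urllib.parse import urlsplit, unquote

def ftp_url_to_commands(url):
    """Return (host, port, commands) for a simplified ftp URL mapping."""
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        raise ValueError("not an ftp URL")

    # Assumed defaults when the URL carries no user information.
    user = unquote(parts.username) if parts.username else "anonymous"
    password = unquote(parts.password) if parts.password else "guest@"

    # Each path segment is decoded separately, after splitting on "/".
    segments = [unquote(s) for s in parts.path.split("/")[1:]]
    if not segments or not segments[-1]:
        raise ValueError("URL does not name a file")
    *dirs, name = segments

    commands = ["USER " + user, "PASS " + password]
    commands += ["CWD " + d for d in dirs]
    commands.append("RETR " + name)
    return parts.hostname, parts.port or 21, commands

host, port, commands = ftp_url_to_commands(
    "ftp://ftp.example.com/pub/docs/readme.txt")
print(host, port)
print(commands)
```

A real scheme definition would also have to state details that this
sketch leaves out, such as the transfer type and how directory
listings are requested.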

2.2.3 Character encoding

When describing URL schemes in which (some of) the elements of
the URL are actually representations of sequences of characters,
care should be taken not to introduce unnecessary variety in the
ways in which characters are encoded into octets and then into
URL characters. Unless there is some compelling reason for a
particular scheme to do otherwise, translating character sequences
into UTF-8 [RFC2044] and then subsequently using the %HH encoding
for unsafe octets is recommended.
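
The two-step translation recommended above can be sketched as
follows, using Python's standard urllib.parse module. The sample
string is hypothetical, and treating every reserved character as
unsafe (safe="") is an assumption made for this example; a scheme
definition would say which characters may appear unencoded.

```python
from urllib.parse import quote, unquote

name = "résumé 1.txt"   # hypothetical element with non-ASCII text

# Step 1: characters -> octets, using UTF-8.
octets = name.encode("utf-8")

# Step 2: octets -> URL characters, writing unsafe octets as %HH.
encoded = quote(octets, safe="")
print(encoded)   # r%C3%A9sum%C3%A9%201.txt

# Decoding reverses both steps.
decoded = unquote(encoded, encoding="utf-8")
print(decoded == name)   # True
```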

2.2.4 Definition of non-protocol URL schemes

In some cases, URL schemes do not have particular network protocols
associated with them, because their use is limited to contexts
where the access method is understood. This is the case, for
example, with the "cid" and "mid" URL schemes. For these URL
schemes, the specification should describe the notation of the
scheme and a complete mapping of the locator from its source.

2.2.5 Definition of URL schemes not associated with data resources

Most URL schemes locate Internet resources that correspond
to data objects that can be retrieved or modified. This is the
case with "ftp" and "http", for example. However, some URL schemes
do not; for example, the "mailto" URL scheme corresponds to an
Internet mail address.

If a new URL scheme does not locate resources that are data
objects, the properties of names in the new space must be clearly
defined.

2.2.6 Definition of operations

In some contexts (for example, HTML forms) it is possible to
specify any one of a list of operations to be performed on a
specific URL. (Outside forms, it is generally assumed to be
something you GET.)

The URL scheme definition should describe all well-defined
operations on the URL identifier, and what they are supposed to
do.

Some URL schemes (for example, "telnet") provide location
information for hooking onto bidirectional data streams, and don't
fit the "infoaccess" paradigm of most URLs very well; this should
be documented.

NOTE: It is perfectly valid to say that "no operation apart from
GET is defined for this URL". It is also valid to say that "there's
only one operation defined for this URL, and it's not very
GET-like". The important point is that what is defined on this type
is described.

2.3 Demonstrated utility

URL schemes should have demonstrated utility. New URL schemes are
expensive things to support. Often they require special code in
browsers, proxies, and/or servers. Having a lot of ways to say the
same thing needlessly complicates these programs without adding value
to the Internet.

The kinds of things that are useful include:

o Things that cannot be referred to in any other way.

o Things where it is much easier to get at them using this
  scheme than (for instance) a proxy gateway.

2.3.1 Proxy into HTTP/HTML

One way to provide a demonstration of utility is via a gateway
which provides objects in the new scheme for clients using an
existing protocol. It is much easier to deploy gateways to a new
service than it is to deploy browsers that understand the new URL
object.

Things to look for when thinking about a proxy are:

o Is there a single global resolution mechanism whereby any proxy can
  find the referenced object?
o If not, is there a way in which the user can find any object of
  this type, and "run his own proxy"?
o Are the operations mappable one-to-one (or possibly using
  modifiers) to HTTP operations?
o Is the type of returned objects well defined?
   * as MIME content-types?
   * as something that can be translated to HTML?
o Is there running code for a proxy?

2.4 Are there security considerations?

Above and beyond the security considerations of the base mechanism
a scheme builds upon, one must think of things that can happen in
the normal course of URL usage.

In particular:

o Does the user need to be warned when a URL of this type is
  dereferenced without an explicit request (a GET for the source
  of an IMG tag, for instance)? This has implications for the
  design of a proxy gateway, of course.

o Is it possible to fake URLs of this type that point to different
  things in a dangerous way?

o Are there mechanisms for identifying the requester that can be
  used or need to be used with this mechanism (the From: field in a
  mailto: URL, or the Kerberos login required for AFS access in the
  AFS: URL, for instance)?

o Does the mechanism contain passwords or other security
  information that are passed inside the referring document in the
  clear (as in the "ftp" URL, for instance)?
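
The last question can be illustrated with a short sketch that detects
a password embedded in a URL and masks it before the URL is logged or
displayed. The function name redact_password, the "***" placeholder
and the example URLs are hypothetical; the sketch uses Python's
standard urllib.parse module and assumes the user information has the
form "user:password@".

```python
from urllib.parse import urlsplit, urlunsplit

def redact_password(url):
    """Return url with any embedded password replaced by '***'."""
    parts = urlsplit(url)
    # Split "user:password@host:port" at the last "@".
    userinfo, _, hostport = parts.netloc.rpartition("@")
    user, colon, _password = userinfo.partition(":")
    if not colon:
        return url   # no password present; leave the URL unchanged
    netloc = user + ":***@" + hostport
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment))

print(redact_password("ftp://alice:secret@ftp.example.com/pub/file.txt"))
# ftp://alice:***@ftp.example.com/pub/file.txt
print(redact_password("ftp://ftp.example.com/pub/"))
# ftp://ftp.example.com/pub/
```

Masking on display does not remove the underlying exposure: the
password is still present in the referring document itself.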

2.5 Does it start with UR?

Any scheme starting with the letters "U" and "R", in particular if
it attaches any of the meanings "uniform", "universal" or
"unifying" to the first letter, is going to cause intense debate,
and generate much heat (but maybe little light).

Any such proposal should either make sure that there is a large
consensus behind it that it will be the only scheme of its type, or
pick another name.

2.6 Non-considerations

Some issues that are often raised but are not relevant to new URL
schemes include the following.

2.6.1 Are all objects accessible?

Can all objects in the world that are validly identified by a
scheme be accessed by any UA implementing it?

Sometimes the answer will be yes and sometimes no; often it will
depend on factors (like firewalls or client configuration) not
directly related to the scheme itself.

3. Security considerations

New URL schemes are required to address all security considerations
in their definitions.

4. IANA considerations

The process by which URL schemes are defined and registered
is not defined in this document.

5. References

[RFC2044] F. Yergeau, "UTF-8, A Transformation Format of Unicode
and ISO 10646", RFC 2044, Alis Technologies, October 1996.

[URI-SYNTAX] T. Berners-Lee, R. Fielding, L. Masinter, "Uniform
Resource Identifiers (URI): Generic Syntax and Semantics",
<draft-fielding-uri-syntax-*>.

6. Authors' Addresses

Larry Masinter
Xerox Corporation
Palo Alto Research Center
3333 Coyote Hill Road
Palo Alto, CA 94304
Fax: +1-415-812-4333
EMail: masinter@parc.xerox.com

Harald T. Alvestrand
UNINETT A/S
Postboks 6683 Elgeseter 7002
Trondheim, Norway
Tel: +47 73 59 70 94
EMail: Harald.T.Alvestrand@uninett.no

Dan Zigmond
Wink Communications
1001 Marina Village Parkway
Alameda CA 94610
Fax: +1-510-337-2960
Phone: +1-510-337-6359
EMail: dan.zigmond@wink.com