INTERNET-DRAFT                                            Larry Masinter
<draft-ietf-urlreg-guide-03.txt>                    Harald T. Alvestrand
August 7, 1998                                               Dan Zigmond
                                                              Rich Petke


                     Guidelines for new URL Schemes


Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its
areas, and its working groups. Note that other groups may also
distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-
Drafts as reference material or to cite them other than as
"work in progress."

To view the entire list of current Internet-Drafts, please check
the "1id-abstracts.txt" listing contained in the Internet-Drafts
Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net
(Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au
(Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu
(US West Coast).

Distribution of this memo is unlimited.

This Internet Draft expires February 7, 1999.

Copyright Notice

Copyright (C) The Internet Society (1998). All Rights Reserved.

Abstract

A Uniform Resource Locator (URL) is a compact string representation
of the location for a resource that is available via the Internet.
This document provides guidelines for the definition of new URL
schemes.


1. Introduction

A Uniform Resource Locator (URL) is a compact string representation
of the location for a resource that is available via the Internet.
RFC [URI-SYNTAX] [1] defines the general syntax and semantics of URIs,
and, by inclusion, URLs. URLs are designated by including a
"<scheme>:" and then a "<scheme-specific-part>". Many URL schemes
are already defined.

This document provides guidelines for the definition of new URL
schemes, for consideration by those who are defining and
registering new schemes or evaluating those definitions.

The process by which new URL schemes are registered is defined in
RFC [URL-PROCESS] [2].


2. Guidelines for new URL schemes

Because new URL schemes potentially complicate client software, new
schemes must have demonstrable utility and operability, as well as
compatibility with existing URL schemes. This section elaborates
these criteria.


2.1 Syntactic compatibility

New URL schemes should follow the same syntactic conventions as
existing schemes when appropriate.


2.1.1 Improper use of "//" following "<scheme>:"

Contrary to some examples set in past years, the use of double
slashes as the first component of the <scheme-specific-part> of a
URL is not simply an artistic indicator that what follows is a URL:
Double slashes are used ONLY when the syntax of the URL's
<scheme-specific-part> contains a hierarchical structure as
described in RFC [URI-SYNTAX]. In URLs from such schemes, the use
of double slashes indicates that what follows is the top
hierarchical element for a naming authority. (See section 3 of RFC
[URI-SYNTAX] for more details.) URL schemes which do not contain a
conformant hierarchical structure in their <scheme-specific-part>
should not use double slashes following the "<scheme>:" string.
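
As an illustration only, the following Python sketch contrasts a
hierarchical URL, where "//" introduces the naming authority, with a
non-hierarchical one that has no authority component. The helper name
uses_double_slash and the example.com addresses are assumptions made
for this sketch; they do not come from RFC [URI-SYNTAX].

```python
from urllib.parse import urlsplit


def uses_double_slash(url: str) -> bool:
    """Report whether the scheme-specific part begins with "//".

    Hypothetical helper for illustration; it does not validate the URL.
    """
    scheme, sep, scheme_specific = url.partition(":")
    return bool(sep) and scheme_specific.startswith("//")


# Hierarchical scheme: "//" is followed by the naming authority.
hier = urlsplit("http://www.example.com/docs/guide.txt")

# Non-hierarchical scheme: no "//", hence no authority component.
flat = urlsplit("mailto:user@example.com")

print(uses_double_slash("http://www.example.com/docs/guide.txt"), hier.netloc)
print(uses_double_slash("mailto:user@example.com"), repr(flat.netloc))
```

A generic parser finds an authority only in the first form, which is
one reason a scheme without a hierarchy should avoid "//".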


2.1.2 Compatibility with relative URLs

URL schemes should use the generic URL syntax if they are intended
to be used with relative URLs. A description of the allowed
relative forms should be included in the scheme's definition.
Many applications use relative URLs extensively. Specifically,
consider the following questions:

o Can the scheme be parsed according to RFC [URI-SYNTAX] - that is,
  if the tokens "//", "/", ";", "?" and "#" are used, do they have
  the meaning given in RFC [URI-SYNTAX]?

o Does it make sense to use the scheme in relative URLs like those
  RFC [URI-SYNTAX] specifies?

o If the scheme syntax is designed to be broken into pieces, does
  the documentation for the scheme's syntax specify what those
  pieces are, why it should be broken in this way, and why the
  breaks aren't where RFC [URI-SYNTAX] says that they usually should
  be?

o If the scheme has a hierarchy, does it go left-to-right and with
  slash separators like RFC [URI-SYNTAX]? If not, why not?
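
To make the questions above concrete, the sketch below resolves a few
relative references against a base URL using Python's generic
resolver. The base URL and the helper name resolve are assumptions
chosen for illustration. A scheme that follows the generic syntax gets
this behavior from existing software; a scheme that assigns other
meanings to "/", "?" or "#" does not.

```python
from urllib.parse import urljoin

# Hypothetical base URL, used only for this example.
BASE = "http://www.example.com/a/b/c"


def resolve(reference: str, base: str = BASE) -> str:
    """Resolve a relative reference against a base URL."""
    return urljoin(base, reference)


# A sibling name, a parent-relative path, an absolute path, and a
# fragment-only reference.
for ref in ["d", "../d", "/x/y", "#sec"]:
    print(f"{ref!r:8} -> {resolve(ref)}")
```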


2.2 Is the scheme well defined?

It is important that the semantics of the "resource" that a URL
"locates" be well defined. This might mean different things
depending on the nature of the URL scheme.


2.2.1 Clear mapping from other name spaces

In many cases, new URL schemes are defined as ways to translate
other protocols and name spaces into the general framework of
URLs. The "ftp" URL scheme translates from the FTP protocol, while
the "mid" URL scheme translates from the Message-ID field of
messages.

In either case, the description of the mapping must be complete,
must describe whether and how character sets are encoded in URLs,
must describe exactly how all legal values of the base standard can
be represented using the URL scheme, and exactly which modifiers,
alternate forms and other artifacts from the base standards are
included or not included. These requirements are elaborated
below.


2.2.2 URL schemes associated with network protocols

Most new URL schemes are associated with network resources that
have one or several network protocols that can access them. The
'ftp', 'news', and 'http' schemes are of this nature. For such
schemes, the specification should completely describe how URLs are
translated into protocol actions in sufficient detail to make the
access of the network resource unambiguous. If an implementation
of the URL scheme requires some configuration, the configuration
elements must be clearly identified. (For example, the 'news'
scheme, if implemented using NNTP, requires configuration of the
NNTP server.)


2.2.3 Character encoding

When describing URL schemes in which (some of) the elements of
the URL are actually representations of sequences of characters,
care should be taken not to introduce unnecessary variety in the
ways in which characters are encoded into octets and then into
URL characters. Unless there is some compelling reason for a
particular scheme to do otherwise, translating character sequences
into UTF-8 (RFC 2044) [3] and then subsequently using the %HH
encoding for unsafe octets is recommended.
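
The two-step translation recommended above can be sketched as
follows. This is a minimal illustration, not a normative algorithm:
the function name encode_component and the particular set of octets
treated as safe (ASCII letters, digits, "-", "_" and ".") are
assumptions, and a real scheme definition would state its own set.

```python
import string
from urllib.parse import unquote

# Assumed set of octets emitted unchanged; a scheme would define its own.
SAFE_OCTETS = frozenset((string.ascii_letters + string.digits + "-_.").encode("ascii"))


def encode_component(text: str) -> str:
    """Encode characters as UTF-8 octets, then %HH-escape unsafe octets."""
    octets = text.encode("utf-8")  # step 1: characters -> octets
    return "".join(                # step 2: octets -> URL characters
        chr(octet) if octet in SAFE_OCTETS else f"%{octet:02X}"
        for octet in octets
    )


encoded = encode_component("na\u00efve file")
print(encoded)
# Decoding reverses both steps: %HH -> octets -> UTF-8 characters.
print(unquote(encoded, encoding="utf-8"))
```

Because every scheme that follows this recommendation produces the
same octets for the same characters, a single decoder (here Python's
unquote) can serve all of them.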


2.2.4 Definition of non-protocol URL schemes

In some cases, URL schemes do not have particular network protocols
associated with them, because their use is limited to contexts
where the access method is understood. This is the case, for
example, with the "cid" and "mid" URL schemes. For these URL
schemes, the specification should describe the notation of the
scheme and a complete mapping of the locator from its source.


2.2.5 Definition of URL schemes not associated with data resources

Most URL schemes locate Internet resources that correspond
to data objects that can be retrieved or modified. This is the
case with "ftp" and "http", for example. However, some URL schemes
do not; for example, the "mailto" URL scheme corresponds to an
Internet mail address.

If a new URL scheme does not locate resources that are data
objects, the properties of names in the new space must be clearly
defined.


2.2.6 Definition of operations

In some contexts (for example, HTML forms) it is possible to
specify any one of a list of operations to be performed on a
specific URL. (Outside forms, the operation is generally assumed
to be a GET.)

The URL scheme definition should describe all well-defined
operations on the URL identifier, and what they are supposed to
do.

Some URL schemes (for example, "telnet") provide location
information for hooking onto bi-directional data streams, and don't
fit the "infoaccess" paradigm of most URLs very well; this should
be documented.

NOTE: It is perfectly valid to say that "no operation apart from
GET is defined for this URL". It is also valid to say that "there's
only one operation defined for this URL, and it's not very
GET-like". The important point is that whatever is defined for this
type of URL is described.


2.3 Demonstrated utility

URL schemes should have demonstrated utility. New URL schemes are
expensive things to support. Often they require special code in
browsers, proxies, and/or servers. Having a lot of ways to say the
same thing needlessly complicates these programs without adding
value to the Internet.

The kinds of things that are useful include:

o Things that cannot be referred to in any other way.

o Things where it is much easier to get at them using this scheme
  than (for instance) a proxy gateway.


2.3.1 Proxy into HTTP/HTML

One way to provide a demonstration of utility is via a gateway
which provides objects in the new scheme for clients using an
existing protocol. It is much easier to deploy gateways to a new
service than it is to deploy browsers that understand the new URL
object.

Things to look for when thinking about a proxy are:

o Is there a single global resolution mechanism whereby any proxy
  can find the referenced object?
o If not, is there a way in which the user can find any object of
  this type, and "run his own proxy"?
o Are the operations mappable one-to-one (or possibly using
  modifiers) to HTTP operations?
o Is the type of returned objects well defined?
  - as MIME content-types?
  - as something that can be translated to HTML?
o Is there running code for a proxy?


2.4 Are there security considerations?

Above and beyond the security considerations of the base mechanism
a scheme builds upon, one must think of things that can happen in
the normal course of URL usage.

In particular:

o Does the user need to be warned when a URL is dereferenced
  without an explicit request (GET for the source of an IMG tag,
  for instance)? This has implications for the design of a proxy
  gateway, of course.

o Is it possible to fake URLs of this type that point to different
  things in a dangerous way?

o Are there mechanisms for identifying the requester that can be
  used or need to be used with this mechanism (the From: field in a
  mailto: URL, or the Kerberos login required for AFS access in the
  AFS: URL, for instance)?

o Does the mechanism contain passwords or other security
  information that is passed inside the referring document in the
  clear (as in the "ftp" URL, for instance)?
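
The last question above can be illustrated with a short sketch. It
detects a password carried in the clear inside a URL and masks it
before the URL is displayed or logged. The function names
carries_password and redact_password, the "***" mask and the example
host are all assumptions made for this illustration; nothing here is
prescribed by any scheme definition.

```python
from urllib.parse import urlsplit, urlunsplit


def carries_password(url: str) -> bool:
    """Report whether the URL embeds a password in its authority part."""
    return urlsplit(url).password is not None


def redact_password(url: str) -> str:
    """Return the URL with any embedded password replaced by "***"."""
    parts = urlsplit(url)
    userinfo, at, hostport = parts.netloc.rpartition("@")
    user, colon, _password = userinfo.partition(":")
    if not (at and colon):
        return url  # no "user:password@" prefix, nothing to mask
    netloc = f"{user}:***@{hostport}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Hypothetical URL with credentials in the clear.
example = "ftp://user:secret@ftp.example.com/pub/file.txt"
print(carries_password(example))
print(redact_password(example))
```

Masking only helps on display; the document that contains the
original URL still exposes the password to anyone who can read it.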


2.5 Does it start with UR?

Any scheme starting with the letters "U" and "R", in particular if
it attaches any of the meanings "uniform", "universal" or
"unifying" to the first letter, is going to cause intense debate,
and generate much heat (but maybe little light).

Any such proposal should either make sure that there is a large
consensus behind it that it will be the only scheme of its type, or
pick another name.


2.6 Non-considerations

Some issues that are often raised but are not relevant to new URL
schemes include the following.


2.6.1 Are all objects accessible?

Can all objects in the world that are validly identified by a
scheme be accessed by any UA implementing it?

Sometimes the answer will be yes and sometimes no; often it will
depend on factors (like firewalls or client configuration) not
directly related to the scheme itself.


3. Security considerations

New URL schemes are required to address all security considerations
in their definitions.


4. IANA considerations

The process by which URL scheme names are registered is specified
in RFC [URL-PROCESS].


5. References

[1] Berners-Lee, T., Fielding, R., Masinter, L., "Uniform Resource
    Identifiers (URI): Generic Syntax", RFC [URI-SYNTAX], August 1998.

[2] Petke, R., "Registration Procedures for URL Scheme Names",
    RFC [URL-PROCESS], August 1998.

[3] Yergeau, F., "UTF-8, A Transformation Format of Unicode and ISO
    10646", RFC 2044, October 1996.


6. Authors' Addresses

Larry Masinter
Xerox Corporation
Palo Alto Research Center
3333 Coyote Hill Road
Palo Alto, CA 94304
Fax: +1-415-812-4333
EMail: masinter@parc.xerox.com

Harald Tveit Alvestrand
Maxware, Pirsenteret
N-7005 Trondheim
NORWAY
Voice: +47 73 54 57 00
EMail: harald.alvestrand@maxware.no

Dan Zigmond
WebTV Networks, Inc.
305 Lytton Avenue
Palo Alto, CA 94301
USA
Voice: +1-650-614-6071
EMail: djz@corp.webtv.net

Rich Petke
WorldCom Advanced Networks
5000 Britton Road
P. O. Box 5000
Hilliard, OH 43026-5000
Voice: +1-614-723-4157
Fax: +1-614-723-1333
EMail: rpetke@compuserve.net