

INTERNET-DRAFT                                           Larry Masinter
<draft-ietf-urlreg-guide-02.txt>                   Harald T. Alvestrand
May 8, 1998                                                 Dan Zigmond
                                                             Rich Petke


                  Guidelines for new URL Schemes


Status of this Memo

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as ``work in
   progress.''

   To view the entire list of current Internet-Drafts, please check
   the "1id-abstracts.txt" listing contained in the Internet-Drafts
   Shadow Directories on ftp.is.co.za (Africa), ftp.nordu.net
   (Northern Europe), ftp.nis.garr.it (Southern Europe), munnari.oz.au
   (Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu
   (US West Coast).

Abstract

   A Uniform Resource Locator (URL) is a compact string representation
   of the location for a resource that is available via the Internet.
   This document provides guidelines for the definition of new URL
   schemes.


1. Introduction

   A Uniform Resource Locator (URL) is a compact string representation
   of the location for a resource that is available via the Internet.
   RFC [URI-SYNTAX] defines the general syntax and semantics of URIs,
   and, by inclusion, URLs.  URLs are designated by including a
   "scheme" and then a "scheme-specific part".  Many URL schemes are
   already defined.

   This document provides guidelines for the definition of new URL
   schemes, for consideration by those who are defining and
   registering or evaluating those definitions.

   The process by which new URL schemes are registered is defined in
   RFC [URL-PROCESS].


2. Guidelines for new URL schemes

   Because new URL schemes potentially complicate client software, new
   schemes must have demonstrable utility and operability, as well as
   compatibility with existing URL schemes.  This section elaborates
   these criteria.


2.1 Syntactic compatibility

   New URL schemes should follow the same syntactic conventions as
   existing schemes when appropriate.


2.1.1 Use of initial "//" for top level

   Many proposed new URL schemes seem to use "://" as a kind of
   indicator that what follows is a URL.  However, the use of "//"
   indicates a "top level" for schemes that support relative
   URLs, and is not necessary (and just confusing) for schemes
   that have no relative forms.  URL schemes without relative
   forms (such as mailto, cid, mid) do not use an initial "//".
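
   The distinction can be illustrated with a generic URL parser.  The
   following is a minimal sketch using Python's standard urllib.parse
   module; the host name and mail address are placeholders chosen for
   illustration and do not come from this document.

```python
from urllib.parse import urlsplit

# Hierarchical scheme: "//" introduces the authority (the "top level").
http = urlsplit("http://www.example.com/a/b")

# Scheme without relative forms: no "//", so no authority component.
mailto = urlsplit("mailto:user@example.com")

print(http.netloc, http.path)            # www.example.com /a/b
print(repr(mailto.netloc), mailto.path)  # '' user@example.com
```

   A scheme that writes "://" without having a hierarchy would cause
   such a parser to report an authority component that the scheme
   never intended to have.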


2.1.2 Compatibility with relative URLs

   URL schemes should use the generic URL syntax if they are intended
   to be used with relative URLs.  A description of the allowed
   relative forms should be included in the scheme's definition.
   Many applications use relative URLs extensively.  Specifically,

   o Can the scheme be parsed according to RFC [URI-SYNTAX] - that is,
     if the tokens "//", "/", ";", "?" and "#" are used, do they have
     the meaning given in RFC [URI-SYNTAX]?

   o Does it make sense to use the scheme in relative URLs like those
     RFC [URI-SYNTAX] specifies?

   o If the scheme syntax is designed to be broken into pieces, does
     the documentation for the scheme's syntax specify what those
     pieces are, why it should be broken in this way, and why the
     breaks aren't where RFC [URI-SYNTAX] says that they usually should
     be?
     
   o If the scheme has a hierarchy, does it go left-to-right and with
     slash separators like RFC [URI-SYNTAX]?  If not, why not?
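
   As an illustration of the relative forms that applications rely
   on, the sketch below resolves a few relative references against a
   base URL with Python's urllib.parse.urljoin.  The base URL and the
   references are made-up examples; a scheme whose syntax departs
   from the generic one may not resolve in this way.

```python
from urllib.parse import urljoin

base = "http://www.example.com/a/b/c"

# Each relative reference is resolved against the same base URL.
references = ["d", "../d", "/d", "//other.example.com/d"]
resolved = {ref: urljoin(base, ref) for ref in references}

for ref, absolute in resolved.items():
    print(f"{ref!r:28} -> {absolute}")
```

   The last reference shows the role of the initial "//": it replaces
   everything after the scheme, starting from the top level.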


2.2 Is the scheme well defined?

   It is important that the semantics of the "resource" that a URL
   "locates" be well defined.  This might mean different things
   depending on the nature of the URL scheme.


2.2.1 Clear mapping from other name spaces

   In many cases, new URL schemes are defined as ways to translate
   other protocols and name spaces into the general framework of
   URLs.  The "ftp" URL scheme translates from the FTP protocol, while
   the "mid" URL scheme translates from the Message-ID field of
   messages.

   In either case, the description of the mapping must be complete,
   must describe how character sets get encoded or not in URLs, must
   describe exactly how all legal values of the base standard can be
   represented using the URL scheme, and exactly which modifiers,
   alternate forms and other artifacts from the base standards are
   included or not included.  These requirements are elaborated
   below.


2.2.2 URL schemes associated with network protocols

   Most new URL schemes are associated with network resources that
   have one or several network protocols that can access them.  The
   'ftp', 'news', and 'http' schemes are of this nature.  For such
   schemes, the specification should completely describe how URLs are
   translated into protocol actions in sufficient detail to make the
   access of the network resource unambiguous.  If an implementation
   of the URL scheme requires some configuration, the configuration
   elements must be clearly identified.  (For example, the 'news'
   scheme, if implemented using NNTP, requires configuration of the
   NNTP server.)
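
   To show the level of detail intended, the following sketch maps an
   "ftp" URL onto an ordered list of protocol actions.  It is an
   illustration only: the function name ftp_actions, the tuple
   layout, and the defaults (port 21, user "anonymous") are
   assumptions of this example, and a real scheme definition would
   also have to cover passwords, transfer types, and error handling.

```python
from urllib.parse import urlsplit, unquote


def ftp_actions(url):
    """Return a list of (action, argument...) tuples for an ftp URL.

    Hypothetical helper for illustration; not a complete mapping.
    """
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        raise ValueError("not an ftp URL")

    # Drop the single "/" that separates the host from the path, split
    # into segments first, and only then decode %HH escapes so that an
    # escaped "/" stays inside its segment.
    path = parts.path[1:] if parts.path.startswith("/") else parts.path
    *dirs, name = [unquote(segment) for segment in path.split("/")]

    actions = [
        ("CONNECT", parts.hostname, parts.port or 21),
        ("USER", parts.username or "anonymous"),
    ]
    actions += [("CWD", directory) for directory in dirs]
    actions.append(("RETR", name))
    return actions


for action in ftp_actions("ftp://ftp.example.com/pub/docs/readme.txt"):
    print(action)
```

   The choice of default user is an example of a configuration
   element that a specification would need to identify.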


2.2.3 Character encoding

   When describing URL schemes in which (some of) the elements of
   the URL are actually representations of sequences of characters,
   care should be taken not to introduce unnecessary variety in the
   ways in which characters are encoded into octets and then into
   URL characters.  Unless there is some compelling reason for a
   particular scheme to do otherwise, translating character sequences
   into UTF-8 [RFC 2044] and then subsequently using the %HH encoding
   for unsafe octets is recommended.
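
   The recommended two-step translation can be sketched as follows,
   using Python's standard urllib.parse module.  The sample string is
   an arbitrary example, and treating every octet outside the
   unreserved set as unsafe (safe="") is an assumption of this
   sketch; an actual scheme would state which characters it leaves
   unescaped.

```python
from urllib.parse import quote, unquote

text = "r\u00e9sum\u00e9.txt"     # character sequence with non-ASCII letters

octets = text.encode("utf-8")     # step 1: characters -> UTF-8 octets
escaped = quote(octets, safe="")  # step 2: unsafe octets -> %HH

print(escaped)                    # r%C3%A9sum%C3%A9.txt

# Decoding reverses both steps (unquote assumes UTF-8 by default).
assert unquote(escaped) == text
```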


2.2.4 Definition of non-protocol URL schemes
   
   In some cases, URL schemes do not have particular network protocols
   associated with them, because their use is limited to contexts
   where the access method is understood.  This is the case, for
   example, with the "cid" and "mid" URL schemes.  For these URL
   schemes, the specification should describe the notation of the
   scheme and a complete mapping of the locator from its source.


2.2.5 Definition of URL schemes not associated with data resources

   Most URL schemes locate Internet resources that correspond
   to data objects that can be retrieved or modified.  This is the
   case with "ftp" and "http", for example.  However, some URL schemes
   do not; for example, the "mailto" URL scheme corresponds to an
   Internet mail address.
   
   If a new URL scheme does not locate resources that are data
   objects, the properties of names in the new space must be clearly
   defined.


2.2.6 Definition of operations

   In some contexts (for example, HTML forms) it is possible to
   specify any one of a list of operations to be performed on a
   specific URL.  (Outside forms, the operation is generally assumed
   to be a GET.)

   The URL scheme definition should describe all well-defined
   operations on the URL identifier, and what they are supposed to
   do.
        
   Some URL schemes (for example, "telnet") provide location
   information for hooking onto bi-directional data streams, and don't
   fit the "infoaccess" paradigm of most URLs very well; this should
   be documented.

   NOTE: It is perfectly valid to say that "no operation apart from
   GET is defined for this URL".  It is also valid to say that "there's
   only one operation defined for this URL, and it's not very
   GET-like".  The important point is that whatever operations are
   defined for this type of URL are described.


2.3 Demonstrated utility

   URL schemes should have demonstrated utility.  New URL schemes are
   expensive things to support.  Often they require special code in
   browsers, proxies, and/or servers.  Having a lot of ways to say the
   same thing needlessly complicates these programs without adding
   value to the Internet.

   The kinds of things that are useful include:

      o Things that cannot be referred to in any other way.

      o Things where it is much easier to get at them using this
        scheme than (for instance) a proxy gateway.


2.3.1 Proxy into HTTP/HTML

   One way to provide a demonstration of utility is via a gateway
   which provides objects in the new scheme for clients using an
   existing protocol.  It is much easier to deploy gateways to a new
   service than it is to deploy browsers that understand the new URL
   object.

   Things to look for when thinking about a proxy are:

   o Is there a single global resolution mechanism whereby any proxy
     can find the referenced object?
   o If not, is there a way in which the user can find any object of
     this type, and "run his own proxy"?
   o Are the operations mappable one-to-one (or possibly using
     modifiers) to HTTP operations?
   o Is the type of returned objects well defined?
      * as MIME content-types?
      * as something that can be translated to HTML?
   o Is there running code for a proxy?
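
   One possible shape for such a gateway is to carry the new-scheme
   URL, escaped, inside an ordinary HTTP URL.  The sketch below is an
   assumption made for illustration: the gateway address, the "url"
   query parameter, and both function names are hypothetical and are
   not defined by this document or by any existing service.

```python
from urllib.parse import quote, unquote

# Hypothetical gateway prefix, used only for this example.
GATEWAY = "http://gateway.example.com/resolve?url="


def to_gateway_url(url):
    """Wrap a URL of a new scheme in an HTTP URL for the gateway."""
    return GATEWAY + quote(url, safe="")


def from_gateway_url(gateway_url):
    """Recover the original URL on the gateway side."""
    if not gateway_url.startswith(GATEWAY):
        raise ValueError("not a URL for this gateway")
    return unquote(gateway_url[len(GATEWAY):])


wrapped = to_gateway_url("mid:960830.1639@example.com")
print(wrapped)
print(from_gateway_url(wrapped))
```

   The mapping is reversible, so the gateway can recover the original
   URL; whether the gateway can then find the referenced object is
   exactly the resolution question raised in the list above.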


2.4 Are there security considerations?

   Above and beyond the security considerations of the base mechanism
   a scheme builds upon, one must think of things that can happen in
   the normal course of URL usage.

   In particular:

   o Does the user need to be warned when a URL of this type is
     accessed without an explicit request (GET for the source of an
     IMG tag, for instance)?  This has implications for the design of
     a proxy gateway, of course.

   o Is it possible to fake URLs of this type that point to different
     things in a dangerous way?

   o Are there mechanisms for identifying the requester that can be
     used or need to be used with this mechanism (the From: field in a
     mailto: URL, or the Kerberos login required for AFS access in the
     AFS: URL, for instance)?

   o Does the mechanism contain passwords or other security
     information that are passed inside the referring document in the
     clear (as in the "ftp" URL, for instance)?
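
   The last point can be made concrete with a small check for
   passwords embedded in a URL.  This is a minimal sketch using
   Python's urllib.parse; the function name redact_password, the
   "***" placeholder, and the sample URLs are assumptions of the
   example, and the host is rebuilt in a simplified way that does not
   handle IPv6 literals.

```python
from urllib.parse import urlsplit


def redact_password(url):
    """Return url with any embedded password replaced by '***'."""
    parts = urlsplit(url)
    if parts.password is None:
        return url  # nothing sensitive in the authority component

    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    netloc = f"{parts.username}:***@{host}"
    return parts._replace(netloc=netloc).geturl()


print(redact_password("ftp://alice:secret@ftp.example.com/pub/file.txt"))
# ftp://alice:***@ftp.example.com/pub/file.txt
```

   Any document that contains the first form of the URL exposes the
   password to every reader of that document.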


2.5 Does it start with UR?

   Any scheme starting with the letters "U" and "R", in particular if
   it attaches any of the meanings "uniform", "universal" or
   "unifying" to the first letter, is going to cause intense debate,
   and generate much heat (but maybe little light).

   Any such proposal should either make sure that there is a large
   consensus behind it that it will be the only scheme of its type, or
   pick another name.


2.6 Non-considerations

   Some issues that are often raised but are not relevant to new URL
   schemes include the following.


2.6.1 Are all objects accessible?

   Can all objects in the world that are validly identified by a
   scheme be accessed by any UA implementing it?

   Sometimes the answer will be yes and sometimes no; often it will
   depend on factors (like firewalls or client configuration) not
   directly related to the scheme itself.


3. Security considerations

   New URL schemes are required to address all security considerations
   in their definitions.


4. IANA considerations

   The process by which URL scheme names are registered is specified
   in RFC [URL-PROCESS].


5. References

   RFC 2044 F. Yergeau, "UTF-8, A Transformation Format of Unicode
    and ISO 10646", Alis Technologies, October 1996.

   RFC [URI-SYNTAX] T. Berners-Lee, R. Fielding, L. Masinter, "Uniform
    Resource Identifiers (URI): Generic Syntax and Semantics",
    <draft-fielding-uri-syntax-*.txt>.

   RFC [URL-PROCESS] R. Petke, "Registration Procedures for URL Scheme
    Names", <draft-ietf-urlreg-procedures-*.txt>.


6. Authors' Addresses

   Larry Masinter
   Xerox Corporation
   Palo Alto Research Center
   3333 Coyote Hill Road
   Palo Alto, CA 94304
   Fax: +1-415-812-4333
   EMail: masinter@parc.xerox.com

   Harald Tveit Alvestrand
   Maxware, Pirsenteret
   N-7005 Trondheim
   NORWAY
   Voice: +47 73 54 57 00
   EMail: harald.alvestrand@maxware.no

   Dan Zigmond
   WebTV Networks, Inc.
   305 Lytton Avenue
   Palo Alto, CA 94301
   USA
   Voice: +1-650-614-6071
   EMail: djz@corp.webtv.net 

   Rich Petke
   WorldCom Advanced Networks
   5000 Britton Road
   P. O. Box 5000
   Hilliard, OH 43026-5000
   Voice: +1-614-723-4157
   Fax: +1-614-723-1333
   EMail: rpetke@compuserve.net
