INTERNET-DRAFT Larry Masinter Harald T. Alvestrand December 9, 1997 Dan Zigmond Guidelines for new URL Schemes Status of this Memo This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as ``work in progress.'' To learn the current status of any Internet-Draft, please check the ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast). Abstract A Uniform Resource Locator (URL) is a compact string representation of the location for a resource that is available via the Internet. This document provides guidelines for the definition of new URL schemes. 1. Introduction A Uniform Resource Locator (URL) is a compact string representation of the location for a resource that is available via the Internet. [URI-SYNTAX] defines the general syntax and semantics of URIs, and, by inclusion, URLs. URLs are designated by including a "scheme" and then a "scheme-specific part". Many URL schemes are already defined. This document provides guidelines for the definition of new URL schemes, for consideration by those who are defining and registering or evaluating those definitions. The process by which new URL schemes are registered or defined is not defined here. 2. Guildelines for new URL schemes Because new URL schemes potentially complicate client software, new schemes must have demonstrable utility and operability, as well as compatibility with existing URL schemes. This section elaborates these criteria. 2.1 Syntactic compatibility New URL schemes should follow the same syntactic conventions of existing schemes when appropriate. 2.1.1 Use of initial "//" for top level Many proposed new URL schemes seem to use "://" as a kind of indicator that what follows is a URL. However, the use of "//" indicates a "top level" for schemes that support relative URLs, and is not necessary (and just confusing) for schemes that have no relative forms. URL schemes without relative forms (such as mailto, cid, mid) do not use an initial "//". 2.1.2 Compatibility with relative URLs URL schemes should use the generic-URL syntax if they are intended to be used with relative URLs. A description of the allowed relative forms should be included in the scheme's definition. Many applications use relative URLs extensively. o Can it be parsed according to [URI-SYNTAX] - that is, if the tokens "//", "/", ";", "?" and "#" are used, do they have the meaning given in RFC URI-SYNTAX? o Does it make sense to use it in relative URLs like those [URI-SYNTAX] specifies? o If something is designed to be broken into pieces, does it document what those pieces are, why it should be broken in this way, and why the breaks aren't where URI-SYNTAX says that they usually should be? o If it has a hierarchy, does it go left-to-right and with slash separators like [URN-SYNTAX]? If not, why not? 2.2 Is the scheme well defined? It is important that the semantics of the "resource" that a URL "locates" be well defined. This might mean different things depending on the nature of the URL scheme. 2.2.1 Clear mapping from other name spaces In many cases, new URL schemes are defined as ways to translate other protocols and name spaces into the general framework of URLs. The "ftp" URL scheme translates from the FTP protocol, while the "mid" URL scheme translates from the Message-ID field of messages. In either case, the description of the mapping must be complete, must describe how character sets get encoded or not in URLs, must describe exactly how all legal values of the base standard can be represented using the URL scheme, and exactly which modifiers, alternate forms and other artifacts from the base standards are included or not included. These requirements are elaborated below. 2.2.2 URL schemes associated with network protocols Most new URL schemes are associated with network resources that have one or several network protocols that can access them. The 'ftp', 'news', and 'http' schemes are of this nature. For such schemes, the specification should completely describe how URLs are translated into protocol actions in sufficient detail to make the access of the network resource unambiguous. If an implementation of of the URL scheme requires some configuration, the configuration elements must be clearly identified. (For example, the 'news' scheme, if implemented using NTTP, requires configuration of the NTTP server.) 2.2.3 Character encoding When describing URL schemes in which (some of) the elements of the URL are actually representations of sequences of characters, care should be taken not to introduce unnecessary variety in the ways in which characters are encoded into octets and then into URL characters. Unless there is some compelling reason for a particular scheme to do otherwise, translating character sequences into UTF-8 [RFC2044] and then subsequently using the %HH encoding for unsafe octets is recommended. 2.2.4 Definition of non-protocol URL schemes In some cases, URL schemes do not have particular network protocols associated with them, because their use is limited to contexts where the access method is understood. This is the case, for example, with the "cid" and "mid" URL schemes. For these URL schemes, the specification should describe the notation of the scheme and a complete mapping of the locator from its source. 2.2.5 Definition of URL schemes not associated with data resources Most URL schemes locate Internet resources that correspond to data objects that can be retrieved or modified. This is the case with "ftp" and "http", for example. However, some URL schemes do not; for example, the "mailto" URL scheme corresponds to an Internet mail address. If a new URL scheme does not locate resources that are data objects, the properties of names in the new space must be clearly defined. 2.2.6 Definition of operations In some contexts (for example, HTML forms) it is possible to specify any one of a list of operations to be performed on a specifc URL. (Outside forms, it is generally assumed to be something you GET.) The URL scheme definition should describe all well-defined operations on the URL identifier, and what they are supposed to do. Some URL schemes (for example, "telnet") provide location information for hooking onto bidirectional data streams, and don't fit the "infoaccess" paradigm of most URLs very well; this should be documented. NOTE: It is perfectly valid to say that "no operation apart from GET is defined for this URL". It is also valid to say that "there's only one operation defined for this URL, and it's not very GET-like". The important point is that what is defined on this type is described. 2.3 Demonstrated utility URL schemes should have demonstrated utility. New URL schemes are expensive things to support. Often they require special code in browsers, proxies, and/or servers. Having a lot of ways to say the same thing needless complicates these programs without adding value to the Internet. The kinds of things that are useful include: o Things that cannot be referred to in any other way. o Things where it is much easier to get at them using this scheme than (for instance) a proxy gateway. 2.3.1 Proxy into HTTP/HTML One way to provide a demonstration of utility is via a gateway which provides objects in the new scheme for clients using an existing protocol. It is much easier to deploy gateways to a new service than it is to deploy browsers that understand the new URL object. Things to look for when thinking about a proxy are: o Is there a single global resolution mechanism whereby any proxy can find the referenced object? o If not, is there a way in which the user can find any object of this type, and "run his own proxy"? o Are the operations mappable one-to-one (or possibly using modifiers) to HTTP operations? o Is the type of returned objects well defined? * as MIME content-types? * as something that can be translated to HTML? o Is there running code for a proxy? 2.4 Are there security considerations? Above and beyond the security considerations of the base mechanism a scheme builds upon, one must think of things that can happen in the normal course of URL usage. In particular: o Does the user need to be warned that such a thing is happening without an explicit request (GET for the source of an IMG tag, for instance)? This has implications for the design of a proxy gateway, of course. o Is it possible to fake URLs of this type that point to different things in a dangerous way? o Are there mechanisms for identifying the requester that can be used or need to be used with this mechanism (the From: field in a mailto: URL, or the Kerberos login required for AFS access in the AFS: url, for instance)? o Does the mechanism contain passwords or other security information that are passed inside the referring document in the clear (as in the "ftp" URL, for instance)? 2.5 Does it start with UR? Any scheme starting with the letters "U" and "R", in particular if it attaches any of the meanings "uniform", "universal" or "unifying" to the first letter, is going to cause intense debate, and generate much heat (but maybe little light). Any such proposal should either make sure that there is a large consensus behind it that it will be the only scheme of its type, or pick another name. 2.6 Non-considerations Some issues that are often raised but are not relevent to new URL schemes include the following. 2.6.1 Are all objects acessible? Can all objects in the world that are validly identified by a scheme be accessed by any UA implementing it? Sometimes the answer will be yes and sometimes no; often it will depend on factors (like firewalls or client configuration) not directly related to the scheme itself. 3. Security considerations New URL schemes are required to address all security considerations in their definitions. 4. IANA considerations The process by which URL schemes are defined and registered is not defined in this document. 5. References [RFC2044] F. Yergeau, "UTF-8, A Transformation Format of Unicode and ISO 10646", Alis Technologies, October 1996. [URI-SYNTAX] T. Berners-Lee, R. Fielding, L. Masinter, "Uniform Resource Identifiers (URI): Generic Syntax and Semantics", . 6. Author's Addresses Larry Masinter Xerox Corporation Palo Alto Research Center 3333 Coyote Hill Road Palo Alto, CA 94304 Fax: +1-415-812-4333 EMail: masinter@parc.xerox.com Harald T. Alvestrand UNINETT A/S Postboks 6683 Elgeseter 7002 Trondheim, Norway Tel: +47 73 59 70 94 EMail: Harald.T.Alvestrand@uninett.no Dan Zigmond Wink Communications 1001 Marina Village Parkway Alameda CA 94610 Fax: +1-510-337-2960 Phone: +1-510-337-6359 Email: dan.zigmond@wink.com