Uniform Resource Identifiers Working Group                  R. Fielding
INTERNET-DRAFT                                                UC Irvine
Expires January 7, 1996                                    July 7, 1995


              How Roy would Implement URNs and URCs Today
                  <draft-ietf-uri-roy-urn-urc-00.txt>


Status of this Memo

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups.  Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-
   Drafts as reference material or to cite them other than as
   ``work in progress.''

   To learn the current status of any Internet-Draft, please check
   the ``1id-abstracts.txt'' listing contained in the Internet-
   Drafts Shadow Directories on ftp.is.co.za (Africa),
   nic.nordu.net (Europe), munnari.oz.au (Pacific Rim),
   ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

   Distribution of this document is unlimited.  Please send comments
   to the author, Roy T. Fielding <fielding@ics.uci.edu>, or to the
   URI working group (URI-WG) of the Internet Engineering Task Force
   (IETF) at <uri@bunyip.com>. Discussions of the group are archived at
   <URL:http://www.acl.lanl.gov/URI/archive/uri-archive.index.html>.

   This document has no formal status and should not be considered as
   anything more than the opinions of the author.  Although it is
   hoped that someone will eventually implement these ideas, they are
   nonetheless only ideas and are not intended as a standards track
   document [which is why I have chosen such a strange title].


Abstract

   This document describes how the author would implement Uniform
   Resource Names (URNs) and Uniform Resource Characteristics (URCs),
   such that the basic concepts and technology can be usable by today's
   World-Wide Web clients and servers.  It is intended to identify the
   key ingredients which make the WWW extensible and open to the
   introduction of URNs and URCs, and thereby steer the implementors
   of URI technology toward more consistent solutions.

1.  Introduction

   The URI working group has been discussing the topic of Uniform
   Resource Names (URNs) for over three years.  Although the intentions
   of those participating in the WG have always been good, and usually
   constructive, the WG has failed to attain any consensus on how
   a URN service can be implemented such that it satisfies everyone's
   needs.  

   It is my opinion that this search for the "Holy Grail" of URNs is
   both misguided and unnecessary.  It is neither possible nor
   appropriate for us to define a single URN service.  Instead, the WG
   should focus on the interfaces between clients, servers, and name
   services, such that any reasonable form of naming service can be
   introduced when it is available, and according to the needs of the
   end users and content providers rather than those of the WG members.

   The World-Wide Web already contains an architecture capable of
   supporting the client and server interfaces necessary for URN
   addressing, though these interfaces have rarely been defined as such.
   This document is intended to remedy that situation.  Furthermore,
   it will attempt to identify how several URN services can be defined
   and implemented today.  Although these solutions will not solve
   everyone's problems (including such issues as replication and
   authentication of centralized name services), they do provide a
   significant step forward and supply the infrastructure required by
   all URN services.

   This document assumes that the reader has knowledge of the basic
   syntax of WWW Universal Resource Identifiers [1] and Uniform Resource
   Locators (URLs) [2].

2.  URI Syntax

   The World-Wide Web architecture assumes that resource addresses are
   identifiable by their scheme name.  This applies to all URIs, not
   just to what are commonly considered URLs today.  A URI in absolute
   form consists of

      <scheme>:<scheme-specific-part>#<fragment>

   where <scheme> contains only US-ASCII lowercase letters, digits, "+",
   "-", or ".", and the "#<fragment>" suffix is optional.

   The scheme name identifies the handler routine which would be used
   to resolve the address.  Note that it does not necessarily define
   the protocol to be used, although people commonly make that
   assumption after seeing that the most common scheme names are
   associated with preexisting Internet application protocols.
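
   As an illustration only, the absolute form given above can be split
   into its parts with a short routine.  The names URI_PATTERN and
   split_uri are hypothetical, and treating a missing fragment as None
   is an assumption of this sketch rather than part of any
   specification.

```python
import re

# Scheme characters follow the rule stated above: lowercase letters,
# digits, "+", "-", or ".".  The fragment is treated as optional
# (an assumption of this sketch).
URI_PATTERN = re.compile(r"^([a-z0-9+.-]+):([^#]*)(?:#(.*))?$")


def split_uri(uri):
    """Return (scheme, scheme_specific_part, fragment), or None when
    the string is not a URI in absolute form."""
    match = URI_PATTERN.match(uri)
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)
```

   The scheme returned by such a routine is what a client would use to
   select a handler; nothing in it implies a particular protocol.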

   The scheme handler routine may exist internal to the client
   application (either hardcoded or within a modular library
   architecture such as that found in libwww or libwww-perl),
   or may be redirected to a proxy application via environment variables
   or other user-configurable devices.  This ability to extend the
   addressing schemes of clients is one of the key features of WWW
   technology.

   In order to be successfully implemented within the current base of
   WWW technology, the URN syntax must correspond to the basic URI
   syntax as described above.  That is, it must start with a scheme name
   which identifies an appropriate resolver for that address (or allows
   the client to identify that it has no resolver for that address).
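
   A minimal sketch of scheme handler selection follows.  The registry
   SCHEME_HANDLERS, the functions register_scheme and find_handler, and
   the "<scheme>_proxy" environment variable naming are assumptions
   made for illustration; an actual client may order or name these
   differently.

```python
import os

# Hypothetical registry of handlers built into the client.
SCHEME_HANDLERS = {}


def register_scheme(scheme, handler):
    """Associate a scheme name with an internal handler routine."""
    SCHEME_HANDLERS[scheme] = handler


def find_handler(uri, environ=None):
    """Return ("proxy", proxy_url), ("internal", handler), or None
    when the client has no resolver for the address."""
    environ = os.environ if environ is None else environ
    scheme, colon, _ = uri.partition(":")
    if not colon:
        return None  # not an absolute URI
    # Assumed convention: a variable such as "wais_proxy" redirects
    # every address of that scheme to a proxy application.
    proxy = environ.get(scheme + "_proxy")
    if proxy:
        return ("proxy", proxy)
    if scheme in SCHEME_HANDLERS:
        return ("internal", SCHEME_HANDLERS[scheme])
    return None
```

   A None result corresponds to the case where the client can tell
   that it has no resolver for the address.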

3.  Media Types

   After an address is resolved and a retrieval action has been
   accomplished through the appropriate scheme handler, a World-Wide Web
   client will choose a second handler routine for the retrieved
   document.  The document handler is chosen according to that
   document's Internet media type [5].  The media type is either
   assigned by the transfer protocol or guessed by the client.

   The document handler routine may exist internal to the client
   application, or may be redirected to an external application via the
   MIME mailcap facility.  Although most handler routines are simply
   viewers for the document content, others exist that control internal
   events or prompt the user for additional input.  This ability to
   extend the behavior of clients is another one of the key features of
   WWW technology.

4.  URCs are Documents

   The notion of Uniform Resource Characteristics (URCs) has been one
   of the central issues in the debate about URN services.  Simply put,
   a URC is a set of characteristics regarding a named resource, in a
   format that can be easily parsed, which identifies a set of locations
   from which the named resource may be obtained.  The URC can then be
   used as the intermediate step between resolving a URN address and
   determining the most appropriate location (from the perspective of
   the client configuration) from which to retrieve the resource.

   Proposals for the format of a URC have ranged from a simple list of
   URLs to a hierarchical query language.  In all cases, however, a URC
   can be considered a document, and therefore should be assigned an
   appropriate media type.  Furthermore, since it is impossible for any
   one group to define a single, all-encompassing format for URCs which
   will satisfy the needs of all archivists and content providers, it
   will be necessary to define a range of media types.

   Note that this view of URCs already fits well with the WWW
   architecture.  If a URC is labelled as such, a WWW client can perform
   location redirection as part of the document handler routine.
   In other words, we can have URN -> URC -> URL indirection working
   with only minor changes to existing clients.

   Unfortunately, that's still not good enough.  Current browsing clients
   will default to "application/octet-stream" if they do not have a
   handler routine installed for the indicated media type (usually
   resulting in a prompt to save the document as a local file).  In
   practice, this has been a barrier to the wholesale introduction of
   new media types.  We need an implementation of URCs that will work
   with all existing clients, because without that assurance, content
   providers will be unwilling to use URCs as an intermediate step.
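
   The fallback behavior described above can be sketched as follows.
   The handler table and the functions show_text, save_to_file, and
   handle_document are hypothetical and stand in for whatever viewers
   a given client has installed.

```python
def show_text(body):
    """Stand-in for an internal viewer."""
    return "display text: " + body.decode("ascii", "replace")


def save_to_file(body):
    """Stand-in for the usual "save as local file" prompt."""
    return "prompt user to save %d bytes" % len(body)


# Hypothetical table of installed document handlers.
DOCUMENT_HANDLERS = {
    "text/html": show_text,
    "text/plain": show_text,
    "application/octet-stream": save_to_file,
}


def handle_document(media_type, body):
    """Choose a handler by media type, defaulting to the
    application/octet-stream handler when none is installed."""
    key = media_type.split(";", 1)[0].strip().lower()
    fallback = DOCUMENT_HANDLERS["application/octet-stream"]
    return DOCUMENT_HANDLERS.get(key, fallback)(body)
```

   With this table, a document labelled with an unknown type such as
   "urc/html" ends up at the save prompt, which is exactly the barrier
   that the text/html variant is meant to avoid.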

   The solution is to start with an intermediate form of URC which
   is a fixed variant of an already-universal media type: text/html.
   This is outlined below in Section 6.

5.  URI Resolution Architecture

   But wait, there's more!

   If a URC is identifiable as a document, then any document retrieval
   action may result in an indirection.  Therefore, we are no longer
   talking about just URN resolution via URCs, but also URL redirection
   via URCs (i.e., redirection of a single URL to multiple variants),
   URN resolution to a single URL (i.e., minimal URCs), and URN
   resolution directly to the named resource.  As far as the client is
   concerned, it is just using a URI to retrieve a resource.  All of the
   details of the resolution mechanism remain internal to the scheme
   handler and the URN service provider, thereby removing the need for
   the IETF to attempt to standardize any particular scheme, or any
   particular URN service.
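
   To make the indirection concrete, here is a rough sketch of a
   retrieval loop that follows URCs until it reaches a document of some
   other type.  Everything in it is an assumption for illustration: the
   in-memory RESOURCES table replaces real network access, "urc/list"
   is a made-up subtype whose body is a plain list of URIs, and taking
   the first entry replaces the client's real selection logic.

```python
# Hypothetical stand-in for the network: URI -> (media type, body).
RESOURCES = {
    "ietf:/rfc/rfc1521": ("urc/list",
                          ["ftp://ftp.isi.edu/rfc/rfc1521.txt"]),
    "ftp://ftp.isi.edu/rfc/rfc1521.txt": ("text/plain",
                                          "RFC 1521 text"),
}


def retrieve(uri, fetch=RESOURCES.get, max_hops=5):
    """Follow URC indirections and return (media_type, body) of the
    first non-URC document, or None on failure."""
    for _ in range(max_hops):
        result = fetch(uri)
        if result is None:
            return None
        media_type, body = result
        if not media_type.startswith("urc/"):
            return media_type, body
        if not body:
            return None
        # A real client would choose by its own configuration.
        uri = body[0]
    return None  # too many indirections
```

   The caller sees the same interface whether the URI named the
   resource directly or passed through one or more URCs.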

6.  Graceful Introduction of URNs and URCs

   Well, it's not all just a bed of roses -- there are plenty of thorns
   that need to be smoothed out in order to promulgate widespread
   implementations of URNs and URCs over the existing WWW.  The
   following sections outline the steps I would take.

6.1.  The ietf URI scheme

   The first thing we need is a simple, but worthwhile, mechanism for
   testing these ideas.  I suggest that we should define a new URI
   scheme called "ietf" -- its purpose would be to provide a single
   identifier for the replicated archives of the Internet Engineering
   Task Force.  The format for this identifier is simply:

      "ietf" ":" <existing-ietf-path>

   For example, the identifier of RFC 1808 would become

      ietf:/rfc/rfc1808.txt

   and the one for this draft would be

      ietf:/internet-drafts/draft-ietf-uri-roy-urn-urc-00.txt

   The implementation of the scheme handler is a fairly straightforward
   address replacement table and associated logic.  For example, the
   following could act as the configuration for my local client:

      PREFIX       REPLACEMENT                            AUTHORITATIVE?
      ietf:        file:/home/fielding/ietf                     No
      ietf:/rfc/   ftp://ftp.ics.uci.edu/pub/ietf/rfc/          No
      ietf:/rfc/   http://info.internet.isi.edu/in-notes/rfc/   No
      ietf:        http://ds.internic.net                       Yes
      ietf:        ftp://ds.internic.net                        Yes

   The retrieval logic behind this table is also simple: try each of the
   matching URI addresses (replacing the matching prefix with the
   replacement string) until a good response is received, or until a
   "not found" response is received from an authoritative location.

   Note that the first location points to my own personal archive -- the
   place where I keep a copy of most of the specs I have referenced in
   my past work (or anticipate referencing in the near future). 
   Clearly, I want to retrieve my local copy if I have one available.
   The second address is also a local copy, but consisting of only RFCs
   and maintained by others at UC Irvine working on Internet Mail and
   network management issues.  The ISI archive is also fairly close to
   my (network and physical) location, but uses a slightly different
   path and tends to be 1-2 days out-of-sync with the main IETF
   archives, which are represented as the final two locations.
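
   The replacement table and its retrieval logic can be sketched as
   follows.  The table entries are those listed above; the function
   names, the status constants, and the fetch callback (which stands in
   for actual retrieval and returns a status and a body) are
   assumptions of this sketch.

```python
# (prefix, replacement, authoritative?) in order of preference.
REPLACEMENTS = [
    ("ietf:", "file:/home/fielding/ietf", False),
    ("ietf:/rfc/", "ftp://ftp.ics.uci.edu/pub/ietf/rfc/", False),
    ("ietf:/rfc/", "http://info.internet.isi.edu/in-notes/rfc/", False),
    ("ietf:", "http://ds.internic.net", True),
    ("ietf:", "ftp://ds.internic.net", True),
]

FOUND, NOT_FOUND, UNREACHABLE = "found", "not found", "unreachable"


def candidates(uri, table=REPLACEMENTS):
    """Yield (url, authoritative) for each matching table row."""
    for prefix, replacement, authoritative in table:
        if uri.startswith(prefix):
            yield replacement + uri[len(prefix):], authoritative


def resolve(uri, fetch, table=REPLACEMENTS):
    """Try each candidate until a good response is received, or until
    an authoritative location reports "not found"."""
    for url, authoritative in candidates(uri, table):
        status, body = fetch(url)
        if status == FOUND:
            return url, body
        if status == NOT_FOUND and authoritative:
            return None
    return None
```

   Because the table is passed in as data, each client can keep its own
   ordering of local copies and mirrors without any change to the
   logic.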

   There are several interesting features of this example which have
   rarely been considered during past discussion of URN issues.  The
   first is that the table is particular to my own client setup.  There
   is no way for a centralized name service to know these details.
   The second is that the table format could be generic to any URI which
   can be resolved directly via some other URL (such as, for instance,
   via the URL of a URN name service).  Finally, note that the actual
   protocol used to resolve the name is defined by the replacement URL,
   and not by any decision of the WG.

6.2.  The ietf URCs

   The above example did not assume any changes to the existing IETF
   archive namespace.  However, we could get considerably more value
   out of this scheme if partial name matching resulted in a URC.
   For example, if the following name

      ietf:/internet-drafts/draft-ietf-uri-roy-urn-urc

   (note the missing "-00.txt") corresponded to a URC pointing to all
   of the currently available format variants of this draft, then I
   could avoid having to change references every time a new version is
   placed in the archives.  Similarly,

      ietf:/internet-drafts/draft-ietf-uri

   could point to a summary of all current drafts by the URI-WG, and

      ietf:/rfc/rfc1521

   could point to all format variants of RFC 1521.

6.3.  The urc major media type

   If URCs are to be given media types, we need to register them.  MIME
   provides seven major types: text, application, multipart, message,
   image, audio, and video [4].  However, it is clear that URCs do
   not fit within any one of these categories, and that subtypes of URC
   are desirable.  Therefore, I suggest that we define a new major media
   type called "urc".

   RFC 1590 [5] states that "If a new fundamental top-level type is
   needed, its specification must be published as an RFC or submitted in
   a form suitable to become an RFC, and be subject to the Internet
   standards process."  We'll just put that on the to-do list.

6.4.  The urc/html media type

   The first URC format that must be defined is one which will not
   adversely affect current WWW clients.  Therefore, we need to define
   a variant of HTML which will look like a menu on existing browsers,
   and yet be machine recognizable as a URC by new browsers.  We can
   do this by using a fixed format and requiring a specific SGML DOCTYPE
   declaration to appear as the first line of the URC document.
   For starters, here is what one may look like:

      <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML URC//EN">
      <HTML><HEAD>
      <TITLE>Available resources for ietf:/rfc/rfc1521</TITLE>
      </HEAD><BODY>
      <H1>ietf:/rfc/rfc1521</H1>
      <DL COMPACT>
      <DT>Title:
      <DD>MIME (Multipurpose Internet Mail Extensions)
          Part One: Mechanisms for Specifying and Describing the
          Format of Internet Message Bodies
      <DT>Author:
      <DD>N. Borenstein
      <DD>N. Freed
      <DT>Date:
      <DD>September 1993
      <DT>Obsoletes:
      <DD><A rel="obsoletes" href="ietf:/rfc/rfc1341">RFC 1341</A>
      <DT>Updated-by:
      <DD><A rev="updates" href="ietf:/rfc/rfc1590">RFC 1590</A>
      </DL>
      <MENU vary="location">
      <LI>ftp.is.co.za (Africa)
         <MENU vary="type">
         <LI><A href="ftp://ftp.is.co.za/rfc/rfc1521.txt.gz">
             gzip(text/plain), 20000 bytes</a>
         <LI><A href="ftp://ftp.is.co.za/rfc/rfc1521.ps.gz">
             gzip(application/postscript), 40000 bytes</A>
         </MENU>
      <LI>nic.nordu.net (Europe)
         <MENU vary="type">
         <LI><A href="ftp://nic.nordu.net/rfc/rfc1521.txt">
             text/plain, 187424 bytes</a>
         <LI><A href="ftp://nic.nordu.net/rfc/rfc1521.ps">
             application/postscript, 393670 bytes</A>
         </MENU>
      <LI>munnari.oz.au (Pacific Rim)
         <MENU vary="type">
         <LI><A href="ftp://munnari.oz.au/rfc/rfc1521.txt">
             text/plain, 187424 bytes</a>
         <LI><A href="ftp://munnari.oz.au/rfc/rfc1521.ps">
             application/postscript, 393670 bytes</A>
         </MENU>
      <LI>ds.internic.net (US East Coast)
         <MENU vary="type">
         <LI><A href="http://ds.internic.net/rfc/rfc1521.txt">
             text/plain, 187424 bytes</a>
         <LI><A href="http://ds.internic.net/rfc/rfc1521.ps">
             application/postscript, 393670 bytes</A>
         <LI><A href="ftp://ds.internic.net/rfc/rfc1521.txt">
             text/plain, 187424 bytes</a>
         <LI><A href="ftp://ds.internic.net/rfc/rfc1521.ps">
             application/postscript, 393670 bytes</A>
         </MENU>
      <LI>ftp.isi.edu (US West Coast)
         <MENU vary="type">
         <LI><A href="ftp://ftp.isi.edu/rfc/rfc1521.txt">
             text/plain, 187424 bytes</a>
         <LI><A href="ftp://ftp.isi.edu/rfc/rfc1521.ps">
             application/postscript, 393670 bytes</A>
         </MENU>
      </MENU>
      </BODY></HTML>

   This is only an example -- a complete definition (including BNF)
   would be necessary for the format to be usable for automated
   indirection.
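
   In the absence of such a definition, the following is only a rough
   sketch of how a new client might recognize the example above and
   pull out its candidate locations.  It assumes that the DOCTYPE line
   appears exactly as shown and that locations are the anchors found
   inside MENU elements; the names UrcLinkParser and urc_locations are
   hypothetical.

```python
from html.parser import HTMLParser

URC_DOCTYPE = '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML URC//EN">'


class UrcLinkParser(HTMLParser):
    """Collect href values of anchors that appear inside a MENU."""

    def __init__(self):
        super().__init__()
        self.menu_depth = 0
        self.locations = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lowercase.
        if tag == "menu":
            self.menu_depth += 1
        elif tag == "a" and self.menu_depth > 0:
            href = dict(attrs).get("href")
            if href:
                self.locations.append(href)

    def handle_endtag(self, tag):
        if tag == "menu" and self.menu_depth > 0:
            self.menu_depth -= 1


def urc_locations(document):
    """Return the list of locations, or None when the document does
    not start with the URC DOCTYPE declaration."""
    lines = document.lstrip().splitlines()
    if not lines or lines[0].strip() != URC_DOCTYPE:
        return None
    parser = UrcLinkParser()
    parser.feed(document)
    parser.close()
    return parser.locations
```

   An ordinary HTML page yields None here, so a client could fall back
   to displaying it as a menu, which is what existing browsers would do
   anyway.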

7.  Unfinished Business

   I do not pretend to think that the suggestions identified by this
   document will completely solve the URN problem.  However, I am
   certain that they will eventually be necessary in order to
   successfully implement any URN scheme on the World-Wide Web.
   Some of the outstanding problems are identified below, though
   there are probably more.

7.1.  Changes to HTML to support URNs

   The HTML 2.0 specification [3] already defines an attribute of
   anchors and link elements for containing a URN.  However, no general
   client supports it, and it's not what we really want anyway.  What we
   need is a way to assign multiple URIs to a single hypertext anchor.
   Fortunately, we don't need this right away, so it can be deferred
   to the HTML WG for consideration later.

7.2.  Name Persistence

   One of the "requirements" identified for URNs is that they be
   unique for all time (or at least a reasonable time such as to
   make name collision impossible).  This document completely
   ignores that issue, as I think should any real implementation
   of URNs.  Name persistence is not something that technology can
   guarantee, other than by the undesirable mechanism of assigning
   a new name based on the location and time of creation.  It is
   quite possible that some URN schemes will have such persistence,
   but it will be attained through the institutions responsible
   for assigning the names and maintaining the resolution services,
   not by constraining the syntax of names.

7.3.  Sub-second Resolution

   No constraints on resolution times are proposed, because they
   are simply unnecessary.  Nobody can determine the resolution time
   for any particular user at any particular network (or, egads,
   non-networked) site.  People will use the quickest (or cheapest)
   resolution available to them -- we do not need to define it in
   advance, nor should we.
   
7.4.  Security Considerations

   No security considerations have been identified by this document.
   This will require future work.

8.  Acknowledgements

   This paper is the result of over a year of thinking and only two
   days of writing, so I have left some things out and have probably
   failed to properly acknowledge all those who deserve it.
   Tim Berners-Lee is primarily responsible for the extensible
   architecture of the World-Wide Web.  I have discussed the issues
   involved in URI indirection, and URCs as media types, with
   Daniel LaLiberte several times, but he is not to blame for this
   treatise.  Larry Masinter has pointed out several times that the
   WG is unable to "create" the institutions needed for true
   persistence.
   

9.  References

   [1] T. Berners-Lee, "Universal Resource Identifiers in WWW:
       A Unifying Syntax for the Expression of Names and Addresses of
       Objects on the Network as used in the World-Wide Web", RFC 1630,
       CERN, June 1994.

   [2] T. Berners-Lee, L. Masinter, and M. McCahill, Editors,
       "Uniform Resource Locators (URL)", RFC 1738, CERN, 
       Xerox Corporation, University of Minnesota, December 1994. 

   [3] T. Berners-Lee and D. Connolly, "HyperText Markup Language
       Specification -- 2.0", Work in Progress, MIT/W3C,
       June 1995.  <URL:http://www.ics.uci.edu/pub/ietf/html/>

   [4] N. Borenstein and N. Freed, "MIME (Multipurpose Internet Mail
       Extensions): Mechanisms for Specifying and Describing the Format
       of Internet Message Bodies", RFC 1521, Bellcore, Innosoft,
       September 1993.

   [5] J. Postel, "Media Type Registration Procedure", RFC 1590,
       USC/ISI, March 1994.

10.  Author's Address

   Roy T. Fielding
   Department of Information and Computer Science
   University of California
   Irvine, CA  92717-3425
   U.S.A.

   Tel: +1 (714) 824-4049
   Fax: +1 (714) 824-4056
   Email: fielding@ics.uci.edu

1	Uniform Resource Identifiers Working Group R. Fielding
2	INTERNET-DRAFT UC Irvine
3	Expires January 7, 1996 July 7, 1995
4
5
6	How Roy would Implement URNs and URCs Today
7	<draft-ietf-uri-roy-urn-urc-00.txt>
8
9
10	Status of this Memo
11
12	This document is an Internet-Draft. Internet-Drafts are working
13	documents of the Internet Engineering Task Force (IETF), its areas,
14	and its working groups. Note that other groups may also distribute
15	working documents as Internet-Drafts.
16
17	Internet-Drafts are draft documents valid for a maximum of six
18	months and may be updated, replaced, or obsoleted by other
19	documents at any time. It is inappropriate to use Internet-
20	Drafts as reference material or to cite them other than as
21	``work in progress.''
22
23	To learn the current status of any Internet-Draft, please check
24	the ``1id-abstracts.txt'' listing contained in the Internet-
25	Drafts Shadow Directories on ftp.is.co.za (Africa),
26	nic.nordu.net (Europe), munnari.oz.au (Pacific Rim),
27	ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).
28
29	Distribution of this document is unlimited. Please send comments
30	to the author, Roy T. Fielding <fielding@ics.uci.edu>, or to the
31	URI working group (URI-WG) of the Internet Engineering Task Force
32	(IETF) at <uri@bunyip.com>. Discussions of the group are archived at
33	<URL:http://www.acl.lanl.gov/URI/archive/uri-archive.index.html>.
34
35	This document has no formal status and should not be considered as
36	anything more than the opinions of the author. Although it is
37	hoped that someone will eventually implement these ideas, they are
38	nonetheless only ideas and are not intended as a standards track
39	document [which is why I have chosen such a strange title].
40
41
42	Abstract
43
44	This document describes how the author would implement Uniform
45	Resource Names (URNs) and Uniform Resource Characteristics (URCs),
46	such that the basic concepts and technology can be usable by today's
47	World-Wide Web clients and servers. It is intended to identify the
48	key ingredients which make the WWW extensible and open to the
49	introduction of URNs and URCs, and thereby steer the implementors
50	of URI technology toward more consistent solutions.
51
52	1. Introduction
53
54	The URI working group has been discussing the topic of Uniform
55	Resource Names (URNs) for over three years. Although the intentions
56	of those participating in the WG have always been good, and usually
57	constructive, the WG has failed to attain any consensus on how
58	a URN service can be implemented such that it satisfies everyone's
59	needs.
60
61	It is my opinion that this search for the "Holy Grail" of URNs is
62	both misguided and unnecessary. It is neither possible nor
63	appropriate for us to define a single URN service. Instead, the WG
64	should focus on the interfaces between clients, servers, and name
65	services, such that any reasonable form of naming service can be
66	introduced when they are available, and according to the needs of the
67	end users and content providers rather than those of the WG members.
68
69	The World-Wide Web already contains an architecture capable of
70	supporting the client and server interfaces necessary for URN
71	addressing, though these interfaces have rarely been defined as such.
72	This document is intended to remedy that situation. Furthermore,
73	it will attempt to identify how several URN services can be defined
74	and implemented today. Although these solutions will not solve
75	everyone's problems (including such issues as replication and
76	authentication of centralized name services), they do provide a
77	significant step forward and supply the infrastructure required by
78	all URN services.
79
80	This document assumes that the reader has knowledge of the basic
81	syntax of WWW Universal Resource Identifiers [1] and Uniform Resource
82	Locators (URLs) [2].
83
84	2. URI Syntax
85
86	The World-Wide Web architecture assumes that resource addresses are
87	identifiable by their scheme name. This applies to all URIs, not
88	just to what are commonly considered URLs today. A URI in absolute
89	form consists of
90
91	<scheme>:<scheme-specific-part>#<fragment>
92
93	where <scheme> contains only US-ASCII lowercase letters, digits, "+",
94	"-", or ".".
95
96	The scheme name identifies the handler routine which would be used
97	to resolve the address. Note that it does not necessarily define
98	the protocol to be used, although people commonly make that
99	assumption after seeing that the most common scheme names are
100	associated with preexisting Internet application protocols.
101
102	The scheme handler routine may exist internal to the client
103	application (either hardcoded or within a modular library
104	architecture such as that found in libwww or libwww-perl),
105	or may be redirected to a proxy application via environment variables
106	or other user-configurable devices. This ability to extend the
107	addressing schemes of clients is one of the key features of WWW
108	technology.
109
110	In order to be successfully implemented within the current base of
111	WWW technology, the URN syntax must correspond to the basic URI
112	syntax as described above. That is, it must start with a scheme name
113	which identifies an appropriate resolver for that address (or allows
114	the client to identify that it has no resolver for that address).
115
116	3. Media Types
117
118	After an address is resolved and a retrieval action has been
119	accomplished through the appropriate scheme handler, a World-Wide Web
120	client will choose a second handler routine for the retrieved
121	document. The document handler is chosen according to that
122	document's Internet media type [5]. The media type is either
123	assigned by the transfer protocol or guessed by the client.
124
125	The document handler routine may exist internal to the client
126	application, or may be redirected to an external application via the
127	MIME mailcap facility. Although most handler routines are simply
128	viewers for the document content, others exist that control internal
129	events or prompt the user for additional input. This ability to
130	extend the behavior of clients is another one of the key features of
131	WWW technology.
132
133	4. URCs are Documents
134
135	The notion of Uniform Resource Characteristics (URCs) has been one
136	of the central issues in the debate about URN services. Simply put,
137	a URC is a set of characteristics regarding a named resource, in a
138	format that can be easily parsed, which identifies a set of locations
139	from which the named resource may be obtained. The URC can then be
140	used as the intermediate step between resolving a URN address and
141	determining the most appropriate location (from the perspective of
142	the client configuration) from which to retrieve the resource.
143
144	Proposals for the format of a URC have ranged from a simple list of
145	URLs to a hierarchical query language. In all cases, however, a URC
146	can be considered a document, and therefore should be assigned an
147	appropriate media type. Furthermore, since it is impossible for any
148	one group to define a single, all-encompassing format for URCs which
149	will satisfy the needs of all archivists and content providers, it
150	will be necessary to define a range of media types.
151
152	Note that this view of URCs already fits well with the WWW
153	architecture. If a URC is labelled as such, a WWW client can perform
154	location redirection as part of the document handler routine.
155	In other words, we can have URN -> URC -> URL indirection working
156	with only minor changes to existing clients.
157
158	Unfortunately, that's still not good enough. Current browsing clients
159	will default to "application/octet-stream" if they do not have a
160	handler routine installed for the indicated media type (usually
161	resulting in a prompt to save the document as a local file). In
162	practice, this has been a barrier to the wholesale introduction of
163	new media types. We need an implementation of URCs that will work
164	with all existing clients, because without that assurance, content
165	providers will be unwilling to use URCs as an intermediate step.
166
167	The solution is to start with an intermediate form of URC which
168	is a fixed variant of an already-universal media type: text/html.
169	This is outlined below in Section 6.
170
171	5. URI Resolution Architecture
172
173	But wait, there's more!
174
175	If a URC is identifiable as a document, then any document retrieval
176	action may result in an indirection. Therefore, we are no longer
177	talking about just URN resolution via URCs, but also URL redirection
178	via URCs (i.e., redirection of a single URL to multiple variants),
179	URN resolution to a single URL (i.e., minimal URCs), and URN
180	resolution directly to the named resource. As far as the client is
181	concerned, it is just using a URI to retrieve a resource. All of the
182	details of the resolution mechanism remain internal to the scheme
183	handler and the URN service provider, thereby removing the need for
184	the IETF to attempt to standardize any particular scheme, or any
185	particular URN service.
186
187	6. Graceful Introduction of URNs and URCs
188
189	Well, its not all just a bed of roses -- there are plenty of thorns
190	that need to be smoothed out in order to promulgate widespread
191	implementations of URNs and URCs over the existing WWW. The
192	following sections outline the steps I would take.
193
194	6.1. The ietf URI scheme
195
196	The first thing we need is a simple, but worthwhile, mechanism for
197	testing these ideas. I suggest that we should define a new URI
198	scheme called "ietf" -- it's purpose would be to provide a single
199	identifier for the replicated archives of the Internet Engineering
200	Taskforce. The format for this identifier is simply:
201
202	"ietf" ":" <existing-ietf-path>
203
204	For example, the identifier of RFC 1808 would become
205
206	ietf:/rfc/rfc1808.txt
207
208	and the one for this draft would be
209
210	ietf:/internet-drafts/draft-ietf-uri-roy-urn-urc-00.txt
211
212	The implementation of the scheme handler is a fairly straightforward
213	address replacement table and associated logic. For example, the
214	following could act as the configuration for my local client:
215
216	PREFIX REPLACEMENT AUTHORITATIVE?
217	ietf: file:/home/fielding/ietf No
218	ietf:/rfc/ ftp://ftp.ics.uci.edu/pub/ietf/rfc/ No
219	ietf:/rfc/ http://info.internet.isi.edu/in-notes/rfc/ No
220	ietf: http://ds.internic.net Yes
221	ietf: ftp://ds.internic.net Yes
222
223	The retrieval logic behind this table is also simple: try each of the
224	matching URI addresses (replacing the matching prefix with the
225	replacement string) until a good response is received, or until a
226	"not found" response is received from an authoritative location.
227
Note that the first location points to my own personal archive -- the
place where I keep a copy of most of the specs I have referenced in
my past work (or anticipate referencing in the near future).
Clearly, I want to retrieve my local copy if I have one available.
The second address is also a local copy, but it consists only of RFCs
and is maintained by others at UC Irvine working on Internet Mail and
network management issues. The ISI archive is also fairly close to
my (network and physical) location, but uses a slightly different
path and tends to be 1-2 days out-of-sync with the main IETF
archives, which are represented as the final two locations.

There are three interesting features of this example that have
rarely been considered during past discussions of URN issues. The
first is that the table is particular to my own client setup; there
is no way for a centralized name service to know these details.
The second is that the table format could be generic to any URI that
can be resolved directly via some other URL (such as, for instance,
the URL of a URN name service). The third is that the actual
protocol used to resolve the name is defined by the replacement URL,
and not by any decision of the WG.
248
6.2. The ietf URCs

The above example did not assume any changes to the existing IETF
archive namespace. However, we could get considerably more value
out of this scheme if partial name matching resulted in a URC.
For example, if the following name

   ietf:/internet-drafts/draft-ietf-uri-roy-urn-urc

(note the missing "-00.txt") corresponded to a URC pointing to all
of the currently available format variants of this draft, then I
could avoid having to change references every time a new version is
placed in the archives. Similarly,

   ietf:/internet-drafts/draft-ietf-uri

could point to a summary of all current drafts by the URI-WG, and

   ietf:/rfc/rfc1521

could point to all format variants of RFC 1521.
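
One way an archive might decide which entries belong in such a URC is
sketched below in Python. This is only an illustration: the function
name variants and the boundary rule (the remainder of the full name
must begin with "-" or ".") are assumptions, not part of this proposal.
The boundary rule is there so that a partial name such as
ietf:/rfc/rfc152 does not accidentally match rfc1521.txt.

```python
def variants(partial, archive_names):
    """Return the full names that a partial ietf: name could stand for.

    A full name matches when it extends `partial` with a suffix that
    starts at a "-" or "." boundary, e.g. "-00.txt" or ".ps".
    (Assumed rule, chosen for this sketch only.)
    """
    matches = []
    for name in archive_names:
        if name != partial and name.startswith(partial):
            rest = name[len(partial):]
            if rest[0] in "-.":
                matches.append(name)
    return sorted(matches)


# Hypothetical archive listing used only to exercise the function.
ARCHIVE = [
    "ietf:/rfc/rfc1521.txt",
    "ietf:/rfc/rfc1521.ps",
    "ietf:/rfc/rfc1522.txt",
    "ietf:/internet-drafts/draft-ietf-uri-roy-urn-urc-00.txt",
]

print(variants("ietf:/rfc/rfc1521", ARCHIVE))
```

The entries returned for a partial name would then be listed in the
URC, one per available location and format.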
270
6.3. The urc major media type

If URCs are to be given media types, we need to register them. MIME
provides seven major types: text, application, multipart, message,
image, audio, and video [4]. However, it is clear that URCs do
not fit within any one of these categories, and that subtypes of URC
are desirable. Therefore, I suggest that we define a new major media
type called "urc".

RFC 1590 [5] states that "If a new fundamental top-level type is
needed, its specification must be published as an RFC or submitted in
a form suitable to become an RFC, and be subject to the Internet
standards process." We'll just put that on the to-do list.
284
6.4. The urc/html media type

The first URC format that must be defined is one which will not
adversely affect current WWW clients. Therefore, we need to define
a variant of HTML which will look like a menu on existing browsers,
and yet be machine recognizable as a URC by new browsers. We can
do this by using a fixed format and requiring a specific SGML DOCTYPE
declaration to appear as the first line of the URC document.
For starters, here is what one might look like:
294
   <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML URC//EN">
   <HTML><HEAD>
   <TITLE>Available resources for ietf:/rfc/rfc1521</TITLE>
   </HEAD><BODY>
   <H1>ietf:/rfc/rfc1521</H1>
   <DL COMPACT>
   <DT>Title:
   <DD>MIME (Multipurpose Internet Mail Extensions)
       Part One: Mechanisms for Specifying and Describing the
       Format of Internet Message Bodies
   <DT>Author:
   <DD>N. Borenstein
   <DD>N. Freed
   <DT>Date:
   <DD>September 1993
   <DT>Obsoletes:
   <DD><A rel="obsoletes" href="ietf:/rfc/rfc1341">RFC 1341</A>
   <DT>Updated-by:
   <DD><A rev="updates" href="ietf:/rfc/rfc1590">RFC 1590</A>
   </DL>
   <MENU vary="location">
   <LI>ftp.is.co.za (Africa)
     <MENU vary="type">
     <LI><A href="ftp://ftp.is.co.za/rfc/rfc1521.txt.gz">
         gzip(text/plain), 20000 bytes</A>
     <LI><A href="ftp://ftp.is.co.za/rfc/rfc1521.ps.gz">
         gzip(application/postscript), 40000 bytes</A>
     </MENU>
   <LI>nic.nordu.net (Europe)
     <MENU vary="type">
     <LI><A href="ftp://nic.nordu.net/rfc/rfc1521.txt">
         text/plain, 187424 bytes</A>
     <LI><A href="ftp://nic.nordu.net/rfc/rfc1521.ps">
         application/postscript, 393670 bytes</A>
     </MENU>
   <LI>munnari.oz.au (Pacific Rim)
     <MENU vary="type">
     <LI><A href="ftp://munnari.oz.au/rfc/rfc1521.txt">
         text/plain, 187424 bytes</A>
     <LI><A href="ftp://munnari.oz.au/rfc/rfc1521.ps">
         application/postscript, 393670 bytes</A>
     </MENU>
   <LI>ds.internic.net (US East Coast)
     <MENU vary="type">
     <LI><A href="http://ds.internic.net/rfc/rfc1521.txt">
         text/plain, 187424 bytes</A>
     <LI><A href="http://ds.internic.net/rfc/rfc1521.ps">
         application/postscript, 393670 bytes</A>
     <LI><A href="ftp://ds.internic.net/rfc/rfc1521.txt">
         text/plain, 187424 bytes</A>
     <LI><A href="ftp://ds.internic.net/rfc/rfc1521.ps">
         application/postscript, 393670 bytes</A>
     </MENU>
   <LI>ftp.isi.edu (US West Coast)
     <MENU vary="type">
     <LI><A href="ftp://ftp.isi.edu/rfc/rfc1521.txt">
         text/plain, 187424 bytes</A>
     <LI><A href="ftp://ftp.isi.edu/rfc/rfc1521.ps">
         application/postscript, 393670 bytes</A>
     </MENU>
   </MENU>
   </BODY></HTML>

This is only an example -- a complete definition (including BNF)
would be necessary for the format to be usable for automated
indirection.
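
To give a feel for what "machine recognizable" could mean in
practice, here is a rough Python sketch that checks for the DOCTYPE
declaration and collects the links found inside the MENU elements.
It uses the standard html.parser module. The class name UrcParser and
its attributes are hypothetical, and the sketch does not verify that
the declaration is on the first line or that the fixed format is
followed, since that format is not yet defined.

```python
from html.parser import HTMLParser

URC_DOCTYPE = 'DOCTYPE HTML PUBLIC "-//IETF//DTD HTML URC//EN"'


class UrcParser(HTMLParser):
    """Collect (href, description) pairs from the MENU part of a URC."""

    def __init__(self):
        super().__init__()
        self.is_urc = False
        self.locations = []
        self._menu_depth = 0
        self._href = None
        self._text = []

    def handle_decl(self, decl):
        # decl is the text between "<!" and ">".
        if decl.strip() == URC_DOCTYPE:
            self.is_urc = True

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names.
        if tag == "menu":
            self._menu_depth += 1
        elif tag == "a" and self._menu_depth > 0:
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_endtag(self, tag):
        if tag == "menu":
            self._menu_depth -= 1
        elif tag == "a" and self._href is not None:
            # Collapse line breaks and runs of spaces in the link text.
            description = " ".join("".join(self._text).split())
            self.locations.append((self._href, description))
            self._href = None

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)
```

A new browser could feed a retrieved document to such a parser, fall
back to ordinary HTML display when is_urc is false, and otherwise pick
one of the collected locations automatically. Links outside the MENU
elements, such as the "Obsoletes" entry, are deliberately ignored here.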
361
7. Unfinished Business

I do not pretend that the suggestions made in this document will
completely solve the URN problem. However, I am certain that they
will eventually be necessary in order to successfully implement any
URN scheme on the World-Wide Web. Some of the outstanding problems
are identified below, though there are probably more.

7.1. Changes to HTML to support URNs

The HTML 2.0 specification [3] already defines an attribute of
anchors and link elements for containing a URN. However, no general
client supports it, and it's not what we really want anyway. What we
need is a way to assign multiple URIs to a single hypertext anchor.
Fortunately, we don't need this right away, so it can be deferred
to the HTML WG for consideration later.
379
7.2. Name Persistence

One of the "requirements" identified for URNs is that they be
unique for all time (or at least a reasonable time such as to
make name collision impossible). This document completely
ignores that issue, as I think should any real implementation
of URNs. Name persistence is not something that technology can
guarantee, other than by the undesirable mechanism of assigning
a new name based on the location and time of creation. It is
quite possible that some URN schemes will have such persistence,
but it will be attained through the institutions responsible
for assigning the names and maintaining the resolution services,
not by constraining the syntax of names.

7.3. Sub-second Resolution

No constraints on resolution times are proposed, because they
are simply unnecessary. Nobody can determine the resolution time
for any particular user at any particular network (or, egads,
non-networked) site. People will use the quickest (or cheapest)
resolution available to them -- we do not need to define it in
advance, nor should we.
402
7.4. Security Considerations

No security considerations have been identified in this document.
Identifying them will require future work.
407
8. Acknowledgements

This paper is the result of over a year of thinking and only two
days of writing, so I have left some things out and have probably
failed to properly acknowledge all those who deserve it.
Tim Berners-Lee is primarily responsible for the extensible
architecture of the World-Wide Web. I have discussed the issues
involved in URI indirection, and URCs as media types, with
Daniel LaLiberte several times, but he is not to blame for this
treatise. Larry Masinter has pointed out several times that the
WG is unable to "create" the institutions needed for true
persistence.
420
421
9. References

[1] T. Berners-Lee, "Universal Resource Identifiers in WWW:
    A Unifying Syntax for the Expression of Names and Addresses of
    Objects on the Network as used in the World-Wide Web", RFC 1630,
    CERN, June 1994.

[2] T. Berners-Lee, L. Masinter, and M. McCahill, Editors,
    "Uniform Resource Locators (URL)", RFC 1738, CERN,
    Xerox Corporation, University of Minnesota, December 1994.

[3] T. Berners-Lee and D. Connolly, "HyperText Markup Language
    Specification -- 2.0", Work in Progress, MIT/W3C,
    June 1995. <URL:http://www.ics.uci.edu/pub/ietf/html/>

[4] N. Borenstein and N. Freed, "MIME (Multipurpose Internet Mail
    Extensions) Part One: Mechanisms for Specifying and Describing
    the Format of Internet Message Bodies", RFC 1521, Bellcore,
    Innosoft, September 1993.

[5] J. Postel, "Media Type Registration Procedure", RFC 1590,
    USC/ISI, March 1994.
444
10. Author's Address

Roy T. Fielding
Department of Information and Computer Science
University of California
Irvine, CA 92717-3425
U.S.A.

Tel: +1 (714) 824-4049
Fax: +1 (714) 824-4056
Email: fielding@ics.uci.edu
456