Uniform Resource Identifiers Working Group                   R. Fielding
INTERNET-DRAFT                                                 UC Irvine
Expires January 7, 1996                                     July 7, 1995


How Roy would Implement URNs and URCs Today
<draft-ietf-uri-roy-urn-urc-00.txt>


Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-Drafts
as reference material or to cite them other than as
``work in progress.''

To learn the current status of any Internet-Draft, please check
the ``1id-abstracts.txt'' listing contained in the Internet-Drafts
Shadow Directories on ftp.is.co.za (Africa),
nic.nordu.net (Europe), munnari.oz.au (Pacific Rim),
ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

Distribution of this document is unlimited. Please send comments
to the author, Roy T. Fielding <fielding@ics.uci.edu>, or to the
URI working group (URI-WG) of the Internet Engineering Task Force
(IETF) at <uri@bunyip.com>. Discussions of the group are archived at
<URL:http://www.acl.lanl.gov/URI/archive/uri-archive.index.html>.

This document has no formal status and should not be considered as
anything more than the opinions of the author. Although it is
hoped that someone will eventually implement these ideas, they are
nonetheless only ideas and are not intended as a standards track
document [which is why I have chosen such a strange title].


Abstract

This document describes how the author would implement Uniform
Resource Names (URNs) and Uniform Resource Characteristics (URCs),
such that the basic concepts and technology are usable by today's
World-Wide Web clients and servers. It is intended to identify the
key ingredients which make the WWW extensible and open to the
introduction of URNs and URCs, and thereby steer the implementors
of URI technology toward more consistent solutions.

1. Introduction

The URI working group has been discussing the topic of Uniform
Resource Names (URNs) for over three years. Although the intentions
of those participating in the WG have always been good, and usually
constructive, the WG has failed to attain any consensus on how
a URN service can be implemented such that it satisfies everyone's
needs.

It is my opinion that this search for the "Holy Grail" of URNs is
both misguided and unnecessary. It is neither possible nor
appropriate for us to define a single URN service. Instead, the WG
should focus on the interfaces between clients, servers, and name
services, such that any reasonable form of naming service can be
introduced when it becomes available, and according to the needs of
the end users and content providers rather than those of the WG
members.

The World-Wide Web already contains an architecture capable of
supporting the client and server interfaces necessary for URN
addressing, though these interfaces have rarely been defined as such.
This document is intended to remedy that situation. Furthermore,
it will attempt to identify how several URN services can be defined
and implemented today. Although these solutions will not solve
everyone's problems (including such issues as replication and
authentication of centralized name services), they do provide a
significant step forward and supply the infrastructure required by
all URN services.

This document assumes that the reader has knowledge of the basic
syntax of WWW Universal Resource Identifiers [1] and Uniform Resource
Locators (URLs) [2].

2. URI Syntax

The World-Wide Web architecture assumes that resource addresses are
identifiable by their scheme name. This applies to all URIs, not
just to what are commonly considered URLs today. A URI in absolute
form consists of

    <scheme>:<scheme-specific-part>#<fragment>

where <scheme> contains only US-ASCII lowercase letters, digits, "+",
"-", or ".".
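
As a minimal sketch of that decomposition, the following Python
function splits an absolute URI into its three parts. The name
split_uri and the decision to reject anything outside the scheme
character set given above are assumptions made for illustration;
they are not defined by this document or by [1].

```python
import re

# Scheme characters as listed in the text: lowercase letters, digits,
# "+", "-", and ".". The trailing "-" in the class is a literal hyphen.
_SCHEME_RE = re.compile(r"^([a-z0-9+.-]+):(.*)$", re.DOTALL)


def split_uri(uri):
    """Split an absolute URI into (scheme, scheme_specific_part, fragment).

    The fragment is None when the URI carries no "#" part.
    (Hypothetical helper, for illustration only.)
    """
    match = _SCHEME_RE.match(uri)
    if match is None:
        raise ValueError("not an absolute URI: %r" % (uri,))
    scheme, rest = match.group(1), match.group(2)
    specific, sep, fragment = rest.partition("#")
    return scheme, specific, (fragment if sep else None)
```

Only the scheme name is interpreted here; everything after the colon
is left for the scheme's own handler to make sense of.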

The scheme name identifies the handler routine which would be used
to resolve the address. Note that it does not necessarily define
the protocol to be used, although people commonly make that
assumption after seeing that the most common scheme names are
associated with preexisting Internet application protocols.

The scheme handler routine may exist internal to the client
application (either hardcoded or within a modular library
architecture such as that found in libwww or libwww-perl),
or may be redirected to a proxy application via environment variables
or other user-configurable devices. This ability to extend the
addressing schemes of clients is one of the key features of WWW
technology.

In order to be successfully implemented within the current base of
WWW technology, the URN syntax must correspond to the basic URI
syntax as described above. That is, it must start with a scheme name
which identifies an appropriate resolver for that address (or allows
the client to identify that it has no resolver for that address).
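
The lookup a client performs on the scheme name can be sketched as
follows. This is an illustration under stated assumptions, not a
description of any existing client: the HANDLERS registry, the
function names, and the "<scheme>_proxy" environment variable naming
are all hypothetical.

```python
import os

# Hypothetical registry: scheme name -> callable taking the full URI.
HANDLERS = {}


def register(scheme, handler):
    """Install an internal handler routine for a scheme."""
    HANDLERS[scheme] = handler


def find_resolver(uri, environ=None):
    """Return ("internal", handler), ("proxy", proxy_url), or ("none", None).

    Internal handlers win; otherwise a proxy named by an environment
    variable is used (the variable naming is an assumption); otherwise
    the client knows that it has no resolver for this address.
    """
    environ = os.environ if environ is None else environ
    scheme, sep, _rest = uri.partition(":")
    if not sep:
        raise ValueError("not an absolute URI: %r" % (uri,))
    if scheme in HANDLERS:
        return "internal", HANDLERS[scheme]
    proxy = environ.get(scheme + "_proxy")
    if proxy:
        return "proxy", proxy
    return "none", None
```

The third outcome matters as much as the first two: a client that
cannot resolve a scheme can at least say so, rather than guessing.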

3. Media Types

After an address is resolved and a retrieval action has been
accomplished through the appropriate scheme handler, a World-Wide Web
client will choose a second handler routine for the retrieved
document. The document handler is chosen according to that
document's Internet media type [5]. The media type is either
assigned by the transfer protocol or guessed by the client.

The document handler routine may exist internal to the client
application, or may be redirected to an external application via the
MIME mailcap facility. Although most handler routines are simply
viewers for the document content, others exist that control internal
events or prompt the user for additional input. This ability to
extend the behavior of clients is another one of the key features of
WWW technology.

4. URCs are Documents

The notion of Uniform Resource Characteristics (URCs) has been one
of the central issues in the debate about URN services. Simply put,
a URC is a set of characteristics regarding a named resource, in a
format that can be easily parsed, which identifies a set of locations
from which the named resource may be obtained. The URC can then be
used as the intermediate step between resolving a URN address and
determining the most appropriate location (from the perspective of
the client configuration) from which to retrieve the resource.

Proposals for the format of a URC have ranged from a simple list of
URLs to a hierarchical query language. In all cases, however, a URC
can be considered a document, and therefore should be assigned an
appropriate media type. Furthermore, since it is impossible for any
one group to define a single, all-encompassing format for URCs which
will satisfy the needs of all archivists and content providers, it
will be necessary to define a range of media types.

Note that this view of URCs already fits well with the WWW
architecture. If a URC is labelled as such, a WWW client can perform
location redirection as part of the document handler routine.
In other words, we can have URN -> URC -> URL indirection working
with only minor changes to existing clients.

Unfortunately, that's still not good enough. Current browsing clients
will default to "application/octet-stream" if they do not have a
handler routine installed for the indicated media type (usually
resulting in a prompt to save the document as a local file). In
practice, this has been a barrier to the wholesale introduction of
new media types. We need an implementation of URCs that will work
with all existing clients, because without that assurance, content
providers will be unwilling to use URCs as an intermediate step.
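
The fallback behavior described above can be sketched in a few lines
of Python. The handler table and function names are hypothetical and
stand in for whatever a real client has installed; the point is only
that an unknown media type lands on the "application/octet-stream"
handler.

```python
def render_html(body):
    """Stand-in for an internal HTML viewer (hypothetical)."""
    return "render"


def save_to_file(body):
    """Stand-in for the usual 'save as local file' prompt (hypothetical)."""
    return "save"


# Hypothetical table of installed document handlers, keyed by media type.
DOCUMENT_HANDLERS = {
    "text/html": render_html,
    "application/octet-stream": save_to_file,
}


def choose_document_handler(media_type):
    """Pick a handler by media type, ignoring parameters such as charset.

    Types with no installed handler fall back to the
    application/octet-stream handler.
    """
    base = media_type.split(";", 1)[0].strip().lower()
    return DOCUMENT_HANDLERS.get(
        base, DOCUMENT_HANDLERS["application/octet-stream"]
    )
```

With such a table, a document labelled with a media type the client
has never seen is offered for saving, which is of little use for a
resource meant to redirect the client elsewhere.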

The solution is to start with an intermediate form of URC which
is a fixed variant of an already-universal media type: text/html.
This is outlined below in Section 6.

5. URI Resolution Architecture

But wait, there's more!

If a URC is identifiable as a document, then any document retrieval
action may result in an indirection. Therefore, we are no longer
talking about just URN resolution via URCs, but also URL redirection
via URCs (i.e., redirection of a single URL to multiple variants),
URN resolution to a single URL (i.e., minimal URCs), and URN
resolution directly to the named resource. As far as the client is
concerned, it is just using a URI to retrieve a resource. All of the
details of the resolution mechanism remain internal to the scheme
handler and the URN service provider, thereby removing the need for
the IETF to attempt to standardize any particular scheme, or any
particular URN service.

6. Graceful Introduction of URNs and URCs

Well, it's not all just a bed of roses -- there are plenty of thorns
that need to be smoothed out in order to promulgate widespread
implementations of URNs and URCs over the existing WWW. The
following sections outline the steps I would take.

6.1. The ietf URI scheme

The first thing we need is a simple, but worthwhile, mechanism for
testing these ideas. I suggest that we should define a new URI
scheme called "ietf" -- its purpose would be to provide a single
identifier for the replicated archives of the Internet Engineering
Task Force. The format for this identifier is simply:

    "ietf" ":" <existing-ietf-path>

For example, the identifier of RFC 1808 would become

    ietf:/rfc/rfc1808.txt

and the one for this draft would be

    ietf:/internet-drafts/draft-ietf-uri-roy-urn-urc-00.txt

The implementation of the scheme handler is a fairly straightforward
address replacement table and associated logic. For example, the
following could act as the configuration for my local client:

    PREFIX       REPLACEMENT                                  AUTHORITATIVE?
    ietf:        file:/home/fielding/ietf                     No
    ietf:/rfc/   ftp://ftp.ics.uci.edu/pub/ietf/rfc/          No
    ietf:/rfc/   http://info.internet.isi.edu/in-notes/rfc/   No
    ietf:        http://ds.internic.net                       Yes
    ietf:        ftp://ds.internic.net                        Yes

The retrieval logic behind this table is also simple: try each of the
matching URI addresses (replacing the matching prefix with the
replacement string) until a good response is received, or until a
"not found" response is received from an authoritative location.
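
The table and its retrieval logic can be sketched in Python as
follows. The table rows are the ones given above; everything else is
an assumption made for illustration: the names candidates and
resolve, and the fetch callable, which is supplied by the caller and
returns a (status, body) pair whose status is "ok", "not found", or
anything else for other failures.

```python
# (prefix, replacement, authoritative?) rows, in the order they are tried.
TABLE = [
    ("ietf:", "file:/home/fielding/ietf", False),
    ("ietf:/rfc/", "ftp://ftp.ics.uci.edu/pub/ietf/rfc/", False),
    ("ietf:/rfc/", "http://info.internet.isi.edu/in-notes/rfc/", False),
    ("ietf:", "http://ds.internic.net", True),
    ("ietf:", "ftp://ds.internic.net", True),
]


def candidates(uri, table=TABLE):
    """Yield (url, authoritative) for every row whose prefix matches."""
    for prefix, replacement, authoritative in table:
        if uri.startswith(prefix):
            yield replacement + uri[len(prefix):], authoritative


def resolve(uri, fetch, table=TABLE):
    """Try each candidate in table order.

    Returns (url, body) for the first good response. Returns None if an
    authoritative location answers "not found", or if every candidate
    has been tried without success.
    """
    for url, authoritative in candidates(uri, table):
        status, body = fetch(url)
        if status == "ok":
            return url, body
        if status == "not found" and authoritative:
            return None  # an authoritative "not found" ends the search
    return None
```

A "not found" from a non-authoritative location (a personal archive
that simply lacks a copy, say) does not stop the search; only an
authoritative one does.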

Note that the first location points to my own personal archive -- the
place where I keep a copy of most of the specs I have referenced in
my past work (or anticipate referencing in the near future).
Clearly, I want to retrieve my local copy if I have one available.
The second address is also a local copy, but consisting of only RFCs
and maintained by others at UC Irvine working on Internet Mail and
network management issues. The ISI archive is also fairly close to
my (network and physical) location, but uses a slightly different
path and tends to be 1-2 days out-of-sync with the main IETF
archives, which are represented as the final two locations.

There are a few interesting features of this example which have
rarely been considered during past discussion of URN issues. The
first is that the table is particular to my own client setup. There
is no way for a centralized name service to know these details.
The second is that the table format could be generic to any URI which
can be resolved directly via some other URL (such as, for instance,
via the URL of a URN name service). Finally, note that the actual
protocol used to resolve the name is defined by the replacement URL,
and not by any decision of the WG.

6.2. The ietf URCs

The above example did not assume any changes to the existing IETF
archive namespace. However, we could get considerably more value
out of this scheme if partial name matching resulted in a URC.
For example, if the following name

    ietf:/internet-drafts/draft-ietf-uri-roy-urn-urc

(note the missing "-00.txt") corresponded to a URC pointing to all
of the currently available format variants of this draft, then I
could avoid having to change references every time a new version is
placed in the archives. Similarly,

    ietf:/internet-drafts/draft-ietf-uri

could point to a summary of all current drafts by the URI-WG, and

    ietf:/rfc/rfc1521

could point to all format variants of RFC 1521.

6.3. The urc major media type

If URCs are to be given media types, we need to register them. MIME
provides seven major types: text, application, multipart, message,
image, audio, and video [4]. However, it is clear that URCs do
not fit within any one of these categories, and that subtypes of URC
are desirable. Therefore, I suggest that we define a new major media
type called "urc".

RFC 1590 [5] states that "If a new fundamental top-level type is
needed, its specification must be published as an RFC or submitted in
a form suitable to become an RFC, and be subject to the Internet
standards process." We'll just put that on the to-do list.

6.4. The urc/html media type

The first URC format that must be defined is one which will not
adversely affect current WWW clients. Therefore, we need to define
a variant of HTML which will look like a menu on existing browsers,
and yet be machine recognizable as a URC by new browsers. We can
do this by using a fixed format and requiring a specific SGML DOCTYPE
declaration to appear as the first line of the URC document.
For starters, here is what one may look like:

    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML URC//EN">
    <HTML><HEAD>
    <TITLE>Available resources for ietf:/rfc/rfc1521</TITLE>
    </HEAD><BODY>
    <H1>ietf:/rfc/rfc1521</H1>
    <DL COMPACT>
    <DT>Title:
    <DD>MIME (Multipurpose Internet Mail Extensions)
        Part One: Mechanisms for Specifying and Describing the
        Format of Internet Message Bodies
    <DT>Author:
    <DD>N. Borenstein
    <DD>N. Freed
    <DT>Date:
    <DD>September 1993
    <DT>Obsoletes:
    <DD><A rel="obsoletes" href="ietf:/rfc/rfc1341">RFC 1341</A>
    <DT>Updated-by:
    <DD><A rev="updates" href="ietf:/rfc/rfc1590">RFC 1590</A>
    </DL>
    <MENU vary="location">
    <LI>ftp.is.co.za (Africa)
      <MENU vary="type">
      <LI><A href="ftp://ftp.is.co.za/rfc/rfc1521.txt.gz">
          gzip(text/plain), 20000 bytes</A>
      <LI><A href="ftp://ftp.is.co.za/rfc/rfc1521.ps.gz">
          gzip(application/postscript), 40000 bytes</A>
      </MENU>
    <LI>nic.nordu.net (Europe)
      <MENU vary="type">
      <LI><A href="ftp://nic.nordu.net/rfc/rfc1521.txt">
          text/plain, 187424 bytes</A>
      <LI><A href="ftp://nic.nordu.net/rfc/rfc1521.ps">
          application/postscript, 393670 bytes</A>
      </MENU>
    <LI>munnari.oz.au (Pacific Rim)
      <MENU vary="type">
      <LI><A href="ftp://munnari.oz.au/rfc/rfc1521.txt">
          text/plain, 187424 bytes</A>
      <LI><A href="ftp://munnari.oz.au/rfc/rfc1521.ps">
          application/postscript, 393670 bytes</A>
      </MENU>
    <LI>ds.internic.net (US East Coast)
      <MENU vary="type">
      <LI><A href="http://ds.internic.net/rfc/rfc1521.txt">
          text/plain, 187424 bytes</A>
      <LI><A href="http://ds.internic.net/rfc/rfc1521.ps">
          application/postscript, 393670 bytes</A>
      <LI><A href="ftp://ds.internic.net/rfc/rfc1521.txt">
          text/plain, 187424 bytes</A>
      <LI><A href="ftp://ds.internic.net/rfc/rfc1521.ps">
          application/postscript, 393670 bytes</A>
      </MENU>
    <LI>ftp.isi.edu (US West Coast)
      <MENU vary="type">
      <LI><A href="ftp://ftp.isi.edu/rfc/rfc1521.txt">
          text/plain, 187424 bytes</A>
      <LI><A href="ftp://ftp.isi.edu/rfc/rfc1521.ps">
          application/postscript, 393670 bytes</A>
      </MENU>
    </MENU>
    </BODY></HTML>

This is only an example -- a complete definition (including BNF)
would be necessary for the format to be usable for automated
indirection.
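
To suggest how a new client might recognize such a document, here is
a rough Python sketch built on the standard library's html.parser
module. It treats a document as a URC only if it carries the DOCTYPE
shown above, and then collects the href of every anchor found inside
a MENU element. The class and function names are hypothetical, and
the "anchors inside MENU are locations" rule is an assumption drawn
from the example rather than from any complete definition.

```python
from html.parser import HTMLParser

# Public identifier taken from the example DOCTYPE above.
URC_PUBLIC_ID = "-//IETF//DTD HTML URC//EN"


class UrcHtmlParser(HTMLParser):
    """Collect candidate locations from a urc/html style document."""

    def __init__(self):
        super().__init__()
        self.is_urc = False
        self.menu_depth = 0
        self.locations = []

    def handle_decl(self, decl):
        # decl is the text between "<!" and ">".
        if decl.upper().startswith("DOCTYPE") and URC_PUBLIC_ID in decl:
            self.is_urc = True

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag and attribute names in lowercase.
        if tag == "menu":
            self.menu_depth += 1
        elif tag == "a" and self.menu_depth > 0:
            href = dict(attrs).get("href")
            if href:
                self.locations.append(href)

    def handle_endtag(self, tag):
        if tag == "menu" and self.menu_depth > 0:
            self.menu_depth -= 1


def urc_locations(text):
    """Return the list of location URLs, or None if text is not a URC."""
    parser = UrcHtmlParser()
    parser.feed(text)
    parser.close()
    return parser.locations if parser.is_urc else None
```

Anchors in the descriptive DL portion (such as the "obsoletes" link)
are deliberately skipped, since they name related resources rather
than locations of this one. A client that does not know the DOCTYPE
simply renders the same document as an ordinary HTML menu.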
361 |
|
362 |
7. Unfinished Business |
363 |
|
364 |
I do not pretend to think that the suggestions identified by this |
365 |
document will completely solve the URN problem. However, I am |
366 |
certain that they will eventually be necessary in order to |
367 |
successfully implement any URN scheme on the World-Wide Web. |
368 |
Some of the outstanding problems are identified below, though |
369 |
there are probably more. |
370 |
|
371 |
7.1. Changes to HTML to support URNs |
372 |
|
373 |
The HTML 2.0 specification [3] already defines an attribute of |
374 |
anchors and link elements for containing a URN. However, no general |
375 |
client supports it, and its not what we really want anyway. What we |
376 |
need is a way to assign multiple URIs to a single hypertext anchor. |
377 |
Fortunately, we don't need this right away, so it can be deferred |
378 |
to the HTML WG for consideration later. |
379 |
|
7.2. Name Persistence

One of the "requirements" identified for URNs is that they be
unique for all time (or at least a reasonable time such as to
make name collision impossible). This document completely
ignores that issue, as I think should any real implementation
of URNs. Name persistence is not something that technology can
guarantee, other than by the undesirable mechanism of assigning
a new name based on the location and time of creation. It is
quite possible that some URN schemes will have such persistence,
but it will be attained through the institutions responsible
for assigning the names and maintaining the resolution services,
not by constraining the syntax of names.

7.3. Sub-second Resolution

No constraints on resolution times are proposed, because they
are simply unnecessary. Nobody can determine the resolution time
for any particular user at any particular network (or, egads,
non-networked) site. People will use the quickest (or cheapest)
resolution available to them -- we do not need to define it in
advance, nor should we.

7.4. Security Considerations

No security considerations have been identified by this document.
This will require future work.

8. Acknowledgements

This paper is the result of over a year of thinking and only two
days of writing, so I have left some things out and have probably
failed to properly acknowledge all those who deserve it.
Tim Berners-Lee is primarily responsible for the extensible
architecture of the World-Wide Web. I have discussed the issues
involved in URI indirection, and URCs as media types, with
Daniel LaLiberte several times, but he is not to blame for this
treatise. Larry Masinter has pointed out several times that the
WG is unable to "create" the institutions needed for true
persistence.


9. References

[1] T. Berners-Lee, "Universal Resource Identifiers in WWW:
    A Unifying Syntax for the Expression of Names and Addresses of
    Objects on the Network as used in the World-Wide Web", RFC 1630,
    CERN, June 1994.

[2] T. Berners-Lee, L. Masinter, and M. McCahill, Editors,
    "Uniform Resource Locators (URL)", RFC 1738, CERN,
    Xerox Corporation, University of Minnesota, December 1994.

[3] T. Berners-Lee and D. Connolly, "HyperText Markup Language
    Specification -- 2.0", Work in Progress, MIT/W3C,
    June 1995. <URL:http://www.ics.uci.edu/pub/ietf/html/>

[4] N. Borenstein and N. Freed, "MIME (Multipurpose Internet Mail
    Extensions): Mechanisms for Specifying and Describing the Format
    of Internet Message Bodies", RFC 1521, Bellcore, Innosoft,
    September 1993.

[5] J. Postel, "Media Type Registration Procedure", RFC 1590,
    USC/ISI, March 1994.

10. Author's Address

Roy T. Fielding
Department of Information and Computer Science
University of California
Irvine, CA 92717-3425
U.S.A.

Tel: +1 (714) 824-4049
Fax: +1 (714) 824-4056
Email: fielding@ics.uci.edu