Revision 1.1
Tue Jun 15 08:37:16 2004 UTC by wakaba
Branch: MAIN
CVS Tags: HEAD
File MIME type: text/plain
New

1 Uniform Resource Identifiers Working Group                  R. Fielding
2 INTERNET-DRAFT                                                UC Irvine
3 Expires January 7, 1996                                    July 7, 1995
4
5
6 How Roy would Implement URNs and URCs Today
7 <draft-ietf-uri-roy-urn-urc-00.txt>
8
9
10 Status of this Memo
11
12 This document is an Internet-Draft. Internet-Drafts are working
13 documents of the Internet Engineering Task Force (IETF), its areas,
14 and its working groups. Note that other groups may also distribute
15 working documents as Internet-Drafts.
16
17 Internet-Drafts are draft documents valid for a maximum of six
18 months and may be updated, replaced, or obsoleted by other
19 documents at any time. It is inappropriate to use Internet-
20 Drafts as reference material or to cite them other than as
21 ``work in progress.''
22
23 To learn the current status of any Internet-Draft, please check
24 the ``1id-abstracts.txt'' listing contained in the Internet-
25 Drafts Shadow Directories on ftp.is.co.za (Africa),
26 nic.nordu.net (Europe), munnari.oz.au (Pacific Rim),
27 ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).
28
29 Distribution of this document is unlimited. Please send comments
30 to the author, Roy T. Fielding <fielding@ics.uci.edu>, or to the
31 URI working group (URI-WG) of the Internet Engineering Task Force
32 (IETF) at <uri@bunyip.com>. Discussions of the group are archived at
33 <URL:http://www.acl.lanl.gov/URI/archive/uri-archive.index.html>.
34
35 This document has no formal status and should not be considered as
36 anything more than the opinions of the author. Although it is
37 hoped that someone will eventually implement these ideas, they are
38 nonetheless only ideas and are not intended as a standards track
39 document [which is why I have chosen such a strange title].
40
41
42 Abstract
43
44 This document describes how the author would implement Uniform
45 Resource Names (URNs) and Uniform Resource Characteristics (URCs),
46 such that the basic concepts and technology are usable by today's
47 World-Wide Web clients and servers. It is intended to identify the
48 key ingredients which make the WWW extensible and open to the
49 introduction of URNs and URCs, and thereby steer the implementors
50 of URI technology toward more consistent solutions.
51
52 1. Introduction
53
54 The URI working group has been discussing the topic of Uniform
55 Resource Names (URNs) for over three years. Although the intentions
56 of those participating in the WG have always been good, and usually
57 constructive, the WG has failed to attain any consensus on how
58 a URN service can be implemented such that it satisfies everyone's
59 needs.
60
61 It is my opinion that this search for the "Holy Grail" of URNs is
62 both misguided and unnecessary. It is neither possible nor
63 appropriate for us to define a single URN service. Instead, the WG
64 should focus on the interfaces between clients, servers, and name
65 services, such that any reasonable form of naming service can be
66 introduced when it is available, and according to the needs of the
67 end users and content providers rather than those of the WG members.
68
69 The World-Wide Web already contains an architecture capable of
70 supporting the client and server interfaces necessary for URN
71 addressing, though these interfaces have rarely been defined as such.
72 This document is intended to remedy that situation. Furthermore,
73 it will attempt to identify how several URN services can be defined
74 and implemented today. Although these solutions will not solve
75 everyone's problems (including such issues as replication and
76 authentication of centralized name services), they do provide a
77 significant step forward and supply the infrastructure required by
78 all URN services.
79
80 This document assumes that the reader has knowledge of the basic
81 syntax of WWW Universal Resource Identifiers [1] and Uniform Resource
82 Locators (URLs) [2].
83
84 2. URI Syntax
85
86 The World-Wide Web architecture assumes that resource addresses are
87 identifiable by their scheme name. This applies to all URIs, not
88 just to what are commonly considered URLs today. A URI in absolute
89 form consists of
90
91 <scheme>:<scheme-specific-part>#<fragment>
92
93 where <scheme> contains only US-ASCII lowercase letters, digits, "+",
94 "-", or ".".
95
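As a minimal sketch of the syntax above (an illustration, not part of the
draft), the following Python splits an absolute URI into its three parts.
The function name split_uri and the regular expression are assumptions
made for this example; the scheme character set is the one given in the
text.

```python
import re

# Scheme characters as described in the text:
# US-ASCII lowercase letters, digits, "+", "-", or ".".
_SCHEME_RE = re.compile(r"^([a-z0-9+.-]+):(.*)$", re.DOTALL)


def split_uri(uri):
    """Split an absolute URI into (scheme, scheme_specific_part, fragment).

    The fragment is None when the URI contains no "#".
    """
    rest, sep, fragment = uri.partition("#")
    match = _SCHEME_RE.match(rest)
    if match is None:
        raise ValueError("not an absolute URI: %r" % uri)
    scheme, specific = match.groups()
    return scheme, specific, (fragment if sep else None)
```

For instance, split_uri("ietf:/rfc/rfc1808.txt") yields the scheme "ietf",
the scheme-specific part "/rfc/rfc1808.txt", and no fragment.
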
96 The scheme name identifies the handler routine which would be used
97 to resolve the address. Note that it does not necessarily define
98 the protocol to be used, although people commonly make that
99 assumption after seeing that the most common scheme names are
100 associated with preexisting Internet application protocols.
101
102 The scheme handler routine may exist internal to the client
103 application (either hardcoded or within a modular library
104 architecture such as that found in libwww or libwww-perl),
105 or may be redirected to a proxy application via environment variables
106 or other user-configurable devices. This ability to extend the
107 addressing schemes of clients is one of the key features of WWW
108 technology.
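
The handler lookup described above might be sketched as follows. This is
only an illustration under stated assumptions: the registry, the function
names, and the "<scheme>_proxy" environment variable convention are chosen
for the example and are not defined by this document.

```python
import os

# Hypothetical registry mapping scheme names to internal handler callables.
_HANDLERS = {}


def register_scheme(scheme, handler):
    """Install an internal handler routine for a scheme name."""
    _HANDLERS[scheme] = handler


def find_handler(uri, environ=None):
    """Return ("proxy", url), ("internal", handler), or ("none", None).

    A "<scheme>_proxy" environment variable (an assumed convention) takes
    precedence over any internal handler for that scheme.
    """
    environ = os.environ if environ is None else environ
    scheme = uri.split(":", 1)[0]
    proxy = environ.get(scheme + "_proxy")
    if proxy:
        return "proxy", proxy
    if scheme in _HANDLERS:
        return "internal", _HANDLERS[scheme]
    # The client can tell that it has no resolver for this address.
    return "none", None
```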
109
110 In order to be successfully implemented within the current base of
111 WWW technology, the URN syntax must correspond to the basic URI
112 syntax as described above. That is, it must start with a scheme name
113 which identifies an appropriate resolver for that address (or allows
114 the client to identify that it has no resolver for that address).
115
116 3. Media Types
117
118 After an address is resolved and a retrieval action has been
119 accomplished through the appropriate scheme handler, a World-Wide Web
120 client will choose a second handler routine for the retrieved
121 document. The document handler is chosen according to that
122 document's Internet media type [5]. The media type is either
123 assigned by the transfer protocol or guessed by the client.
124
125 The document handler routine may exist internal to the client
126 application, or may be redirected to an external application via the
127 MIME mailcap facility. Although most handler routines are simply
128 viewers for the document content, others exist that control internal
129 events or prompt the user for additional input. This ability to
130 extend the behavior of clients is another one of the key features of
131 WWW technology.
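
A rough sketch of document handler selection follows. The handler table
and the function name are hypothetical. The "major/*" wildcard mirrors
what mailcap entries allow, and the fallback to
"application/octet-stream" is the default behavior of current browsing
clients.

```python
def choose_document_handler(media_type, handlers):
    """Pick a handler for a media type from a dict keyed by type name.

    Lookup order: exact type, then "major/*", then the
    "application/octet-stream" entry, which must be present.
    """
    # Drop parameters such as "; charset=us-ascii" and normalize case.
    key = media_type.split(";", 1)[0].strip().lower()
    if key in handlers:
        return handlers[key]
    wildcard = key.split("/", 1)[0] + "/*"
    if wildcard in handlers:
        return handlers[wildcard]
    return handlers["application/octet-stream"]
```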
132
133 4. URCs are Documents
134
135 The notion of Uniform Resource Characteristics (URCs) has been one
136 of the central issues in the debate about URN services. Simply put,
137 a URC is a set of characteristics regarding a named resource, in a
138 format that can be easily parsed, which identifies a set of locations
139 from which the named resource may be obtained. The URC can then be
140 used as the intermediate step between resolving a URN address and
141 determining the most appropriate location (from the perspective of
142 the client configuration) from which to retrieve the resource.
143
144 Proposals for the format of a URC have ranged from a simple list of
145 URLs to a hierarchical query language. In all cases, however, a URC
146 can be considered a document, and therefore should be assigned an
147 appropriate media type. Furthermore, since it is impossible for any
148 one group to define a single, all-encompassing format for URCs which
149 will satisfy the needs of all archivists and content providers, it
150 will be necessary to define a range of media types.
151
152 Note that this view of URCs already fits well with the WWW
153 architecture. If a URC is labelled as such, a WWW client can perform
154 location redirection as part of the document handler routine.
155 In other words, we can have URN -> URC -> URL indirection working
156 with only minor changes to existing clients.
157
158 Unfortunately, that's still not good enough. Current browsing clients
159 will default to "application/octet-stream" if they do not have a
160 handler routine installed for the indicated media type (usually
161 resulting in a prompt to save the document as a local file). In
162 practice, this has been a barrier to the wholesale introduction of
163 new media types. We need an implementation of URCs that will work
164 with all existing clients, because without that assurance, content
165 providers will be unwilling to use URCs as an intermediate step.
166
167 The solution is to start with an intermediate form of URC which
168 is a fixed variant of an already-universal media type: text/html.
169 This is outlined below in Section 6.
170
171 5. URI Resolution Architecture
172
173 But wait, there's more!
174
175 If a URC is identifiable as a document, then any document retrieval
176 action may result in an indirection. Therefore, we are no longer
177 talking about just URN resolution via URCs, but also URL redirection
178 via URCs (i.e., redirection of a single URL to multiple variants),
179 URN resolution to a single URL (i.e., minimal URCs), and URN
180 resolution directly to the named resource. As far as the client is
181 concerned, it is just using a URI to retrieve a resource. All of the
182 details of the resolution mechanism remain internal to the scheme
183 handler and the URN service provider, thereby removing the need for
184 the IETF to attempt to standardize any particular scheme, or any
185 particular URN service.
186
187 6. Graceful Introduction of URNs and URCs
188
189 Well, it's not all just a bed of roses -- there are plenty of thorns
190 that need to be smoothed out in order to promulgate widespread
191 implementations of URNs and URCs over the existing WWW. The
192 following sections outline the steps I would take.
193
194 6.1. The ietf URI scheme
195
196 The first thing we need is a simple, but worthwhile, mechanism for
197 testing these ideas. I suggest that we should define a new URI
198 scheme called "ietf" -- its purpose would be to provide a single
199 identifier for the replicated archives of the Internet Engineering
200 Task Force. The format for this identifier is simply:
201
202 "ietf" ":" <existing-ietf-path>
203
204 For example, the identifier of RFC 1808 would become
205
206 ietf:/rfc/rfc1808.txt
207
208 and the one for this draft would be
209
210 ietf:/internet-drafts/draft-ietf-uri-roy-urn-urc-00.txt
211
212 The implementation of the scheme handler is a fairly straightforward
213 address replacement table and associated logic. For example, the
214 following could act as the configuration for my local client:
215
216 PREFIX       REPLACEMENT                                  AUTHORITATIVE?
217 ietf:        file:/home/fielding/ietf                     No
218 ietf:/rfc/   ftp://ftp.ics.uci.edu/pub/ietf/rfc/          No
219 ietf:/rfc/   http://info.internet.isi.edu/in-notes/rfc/   No
220 ietf:        http://ds.internic.net                       Yes
221 ietf:        ftp://ds.internic.net                        Yes
222
223 The retrieval logic behind this table is also simple: try each of the
224 matching URI addresses (replacing the matching prefix with the
225 replacement string) until a good response is received, or until a
226 "not found" response is received from an authoritative location.
227
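The table and retrieval logic can be sketched in Python as below. This is
one possible reading, not a definitive implementation: the status
constants and the fetch callable are assumptions made for the example,
and fetch is expected to return a (status, body) pair for a URL.

```python
# (prefix, replacement, authoritative), in the order given in the table.
TABLE = [
    ("ietf:", "file:/home/fielding/ietf", False),
    ("ietf:/rfc/", "ftp://ftp.ics.uci.edu/pub/ietf/rfc/", False),
    ("ietf:/rfc/", "http://info.internet.isi.edu/in-notes/rfc/", False),
    ("ietf:", "http://ds.internic.net", True),
    ("ietf:", "ftp://ds.internic.net", True),
]

# Hypothetical response statuses reported by the fetch callable.
OK, NOT_FOUND, ERROR = "ok", "not found", "error"


def candidates(uri, table=TABLE):
    """Yield (url, authoritative) for every table row whose prefix matches."""
    for prefix, replacement, authoritative in table:
        if uri.startswith(prefix):
            yield replacement + uri[len(prefix):], authoritative


def resolve(uri, fetch, table=TABLE):
    """Try each candidate in order; return (url, body) or None.

    Stops early on a good response, or on a "not found" response from an
    authoritative location.
    """
    for url, authoritative in candidates(uri, table):
        status, body = fetch(url)
        if status == OK:
            return url, body
        if status == NOT_FOUND and authoritative:
            break
    return None
```

A non-authoritative "not found" and any other failure simply move on to
the next matching row.
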
228 Note that the first location points to my own personal archive -- the
229 place where I keep a copy of most of the specs I have referenced in
230 my past work (or anticipate referencing in the near future).
231 Clearly, I want to retrieve my local copy if I have one available.
232 The second address is also a local copy, but consisting of only RFCs
233 and maintained by others at UC Irvine working on Internet Mail and
234 network management issues. The ISI archive is also fairly close to
235 my (network and physical) location, but uses a slightly different
236 path and tends to be 1-2 days out-of-sync with the main IETF
237 archives, which are represented as the final two locations.
238
239 There are several interesting features of this example which have
240 rarely been considered during past discussion of URN issues. The
241 first is that the table is particular to my own client setup. There
242 is no way for a centralized name service to know these details.
243 The second is that the table format could be generic to any URI which
244 can be resolved directly via some other URL (such as, for instance,
245 via the URL of a URN name service). Finally, note that the actual
246 protocol used to resolve the name is defined by the replacement URL,
247 and not by any decision of the WG.
248
249 6.2. The ietf URCs
250
251 The above example did not assume any changes to the existing IETF
252 archive namespace. However, we could get considerably more value
253 out of this scheme if partial name matching resulted in a URC.
254 For example, if the following name
255
256 ietf:/internet-drafts/draft-ietf-uri-roy-urn-urc
257
258 (note the missing "-00.txt") corresponded to a URC pointing to all
259 of the currently available format variants of this draft, then I
260 could avoid having to change references every time a new version is
261 placed in the archives. Similarly,
262
263 ietf:/internet-drafts/draft-ietf-uri
264
265 could point to a summary of all current drafts by the URI-WG, and
266
267 ietf:/rfc/rfc1521
268
269 could point to all format variants of RFC 1521.
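
One way an archive might decide which files a partial name stands for is
sketched below. The matching rule (the partial path must be followed by
"-" or ".") and the function name are assumptions for illustration; the
draft does not define how partial matching would work.

```python
def match_partial(name, archive_paths):
    """Return the archive paths that a partial ietf: name could stand for.

    A path matches when it begins with the partial path and the next
    character is "-" or ".", so that rfc1521 does not match rfc15210.
    """
    if not name.startswith("ietf:"):
        raise ValueError("not an ietf: name: %r" % name)
    prefix = name[len("ietf:"):]
    return sorted(
        path
        for path in archive_paths
        if path.startswith(prefix) and path[len(prefix):][:1] in ("-", ".")
    )
```

The resulting list is the raw material from which a URC for the partial
name could be generated.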
270
271 6.3. The urc major media type
272
273 If URCs are to be given media types, we need to register them. MIME
274 provides seven major types: text, application, multipart, message,
275 image, audio, and video [4]. However, it is clear that URCs do
276 not fit within any one of these categories, and that subtypes of URC
277 are desirable. Therefore, I suggest that we define a new major media
278 type called "urc".
279
280 RFC 1590 [5] states that "If a new fundamental top-level type is
281 needed, its specification must be published as an RFC or submitted in
282 a form suitable to become an RFC, and be subject to the Internet
283 standards process." We'll just put that on the to-do list.
284
285 6.4. The urc/html media type
286
287 The first URC format that must be defined is one which will not
288 adversely affect current WWW clients. Therefore, we need to define
289 a variant of HTML which will look like a menu on existing browsers,
290 and yet be machine recognizable as a URC by new browsers. We can
291 do this by using a fixed format and requiring a specific SGML DOCTYPE
292 declaration to appear as the first line of the URC document.
293 For starters, here is what one may look like:
294
295 <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML URC//EN">
296 <HTML><HEAD>
297 <TITLE>Available resources for ietf:/rfc/rfc1521</TITLE>
298 </HEAD><BODY>
299 <H1>ietf:/rfc/rfc1521</H1>
300 <DL COMPACT>
301 <DT>Title:
302 <DD>MIME (Multipurpose Internet Mail Extensions)
303 Part One: Mechanisms for Specifying and Describing the
304 Format of Internet Message Bodies
305 <DT>Author:
306 <DD>N. Borenstein
307 <DD>N. Freed
308 <DT>Date:
309 <DD>September 1993
310 <DT>Obsoletes:
311 <DD><A rel="obsoletes" href="ietf:/rfc/rfc1341">RFC 1341</A>
312 <DT>Updated-by:
313 <DD><A rev="updates" href="ietf:/rfc/rfc1590">RFC 1590</A>
314 </DL>
315 <MENU vary="location">
316 <LI>ftp.is.co.za (Africa)
317 <MENU vary="type">
318 <LI><A href="ftp://ftp.is.co.za/rfc/rfc1521.txt.gz">
319 gzip(text/plain), 20000 bytes</a>
320 <LI><A href="ftp://ftp.is.co.za/rfc/rfc1521.ps.gz">
321 gzip(application/postscript), 40000 bytes</A>
322 </MENU>
323 <LI>nic.nordu.net (Europe)
324 <MENU vary="type">
325 <LI><A href="ftp://nic.nordu.net/rfc/rfc1521.txt">
326 text/plain, 187424 bytes</a>
327 <LI><A href="ftp://nic.nordu.net/rfc/rfc1521.ps">
328 application/postscript, 393670 bytes</A>
329 </MENU>
330 <LI>munnari.oz.au (Pacific Rim)
331 <MENU vary="type">
332 <LI><A href="ftp://munnari.oz.au/rfc/rfc1521.txt">
333 text/plain, 187424 bytes</a>
334 <LI><A href="ftp://munnari.oz.au/rfc/rfc1521.ps">
335 application/postscript, 393670 bytes</A>
336 </MENU>
337 <LI>ds.internic.net (US East Coast)
338 <MENU vary="type">
339 <LI><A href="http://ds.internic.net/rfc/rfc1521.txt">
340 text/plain, 187424 bytes</a>
341 <LI><A href="http://ds.internic.net/rfc/rfc1521.ps">
342 application/postscript, 393670 bytes</A>
343 <LI><A href="ftp://ds.internic.net/rfc/rfc1521.txt">
344 text/plain, 187424 bytes</a>
345 <LI><A href="ftp://ds.internic.net/rfc/rfc1521.ps">
346 application/postscript, 393670 bytes</A>
347 </MENU>
348 <LI>ftp.isi.edu (US West Coast)
349 <MENU vary="type">
350 <LI><A href="ftp://ftp.isi.edu/rfc/rfc1521.txt">
351 text/plain, 187424 bytes</a>
352 <LI><A href="ftp://ftp.isi.edu/rfc/rfc1521.ps">
353 application/postscript, 393670 bytes</A>
354 </MENU>
355 </MENU>
356 </BODY></HTML>
357
358 This is only an example -- a complete definition (including BNF)
359 would be necessary for the format to be usable for automated
360 indirection.
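
Until such a definition exists, a new browser could recognize the example
format roughly as sketched here, using Python's standard html.parser
module. The class and function names are hypothetical, and the rule that
only anchors inside MENU elements count as locations is an assumption
drawn from the example above, not from a specification.

```python
from html.parser import HTMLParser

# The DOCTYPE line that marks a document as a URC in the example above.
URC_DOCTYPE = '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML URC//EN">'


class UrcLinkParser(HTMLParser):
    """Collect the href of every anchor that appears inside a MENU."""

    def __init__(self):
        super().__init__()
        self.menu_depth = 0
        self.locations = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lowercase.
        if tag == "menu":
            self.menu_depth += 1
        elif tag == "a" and self.menu_depth > 0:
            href = dict(attrs).get("href")
            if href:
                self.locations.append(href)

    def handle_endtag(self, tag):
        if tag == "menu" and self.menu_depth > 0:
            self.menu_depth -= 1


def urc_locations(document):
    """Return the candidate URLs, or None if the document is not a URC."""
    if not document.lstrip().startswith(URC_DOCTYPE):
        return None
    parser = UrcLinkParser()
    parser.feed(document)
    parser.close()
    return parser.locations
```

Anchors in the descriptive DL section (such as the "obsoletes" link) are
skipped, so the result lists only retrieval locations. Choosing among
them by location and type is left to the client.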
361
362 7. Unfinished Business
363
364 I do not pretend to think that the suggestions identified by this
365 document will completely solve the URN problem. However, I am
366 certain that they will eventually be necessary in order to
367 successfully implement any URN scheme on the World-Wide Web.
368 Some of the outstanding problems are identified below, though
369 there are probably more.
370
371 7.1. Changes to HTML to support URNs
372
373 The HTML 2.0 specification [3] already defines an attribute of
374 anchors and link elements for containing a URN. However, no general
375 client supports it, and it's not what we really want anyway. What we
376 need is a way to assign multiple URIs to a single hypertext anchor.
377 Fortunately, we don't need this right away, so it can be deferred
378 to the HTML WG for consideration later.
379
380 7.2. Name Persistence
381
382 One of the "requirements" identified for URNs is that they be
383 unique for all time (or at least a reasonable time such as to
384 make name collision impossible). This document completely
385 ignores that issue, as I think should any real implementation
386 of URNs. Name persistence is not something that technology can
387 guarantee, other than by the undesirable mechanism of assigning
388 a new name based on the location and time of creation. It is
389 quite possible that some URN schemes will have such persistence,
390 but it will be attained through the institutions responsible
391 for assigning the names and maintaining the resolution services,
392 not by constraining the syntax of names.
393
394 7.3. Sub-second Resolution
395
396 No constraints on resolution times are proposed, because they
397 are simply unnecessary. Nobody can determine the resolution time
398 for any particular user at any particular network (or, egads,
399 non-networked) site. People will use the quickest (or cheapest)
400 resolution available to them -- we do not need to define it in
401 advance, nor should we.
402
403 7.4. Security Considerations
404
405 No security considerations have been identified by this document.
406 This will require future work.
407
408 8. Acknowledgements
409
410 This paper is the result of over a year of thinking and only two
411 days of writing, so I have left some things out and have probably
412 failed to properly acknowledge all those who deserve it.
413 Tim Berners-Lee is primarily responsible for the extensible
414 architecture of the World-Wide Web. I have discussed the issues
415 involved in URI indirection, and URCs as media types, with
416 Daniel LaLiberte several times, but he is not to blame for this
417 treatise. Larry Masinter has pointed out several times that the
418 WG is unable to "create" the institutions needed for true
419 persistence.
420
421
422 9. References
423
424 [1] T. Berners-Lee, "Universal Resource Identifiers in WWW:
425 A Unifying Syntax for the Expression of Names and Addresses of
426 Objects on the Network as used in the World-Wide Web", RFC 1630,
427 CERN, June 1994.
428
429 [2] T. Berners-Lee, L. Masinter, and M. McCahill, Editors,
430 "Uniform Resource Locators (URL)", RFC 1738, CERN,
431 Xerox Corporation, University of Minnesota, December 1994.
432
433 [3] T. Berners-Lee and D. Connolly, "HyperText Markup Language
434 Specification -- 2.0", Work in Progress, MIT/W3C,
435 June 1995. <URL:http://www.ics.uci.edu/pub/ietf/html/>
436
437 [4] N. Borenstein and N. Freed, "MIME (Multipurpose Internet Mail
438 Extensions): Mechanisms for Specifying and Describing the Format
439 of Internet Message Bodies", RFC 1521, Bellcore, Innosoft,
440 September 1993.
441
442 [5] J. Postel, "Media Type Registration Procedure", RFC 1590,
443 USC/ISI, March 1994.
444
445 10. Author's Address
446
447 Roy T. Fielding
448 Department of Information and Computer Science
449 University of California
450 Irvine, CA 92717-3425
451 U.S.A.
452
453 Tel: +1 (714) 824-4049
454 Fax: +1 (714) 824-4056
455 Email: fielding@ics.uci.edu
456
