7     Network Working Group Keith Moore
8     Internet-Draft University of Tennessee
9     Expires: 7 January 1996
10    
11    
12     Location-Independent URLs
13     or
14     URNs considered harmful
15    
16     draft-ietf-uri-urns-harmful-00.txt
17    
18     Status of this Memo
19    
20     This document is an Internet-Draft. Internet-Drafts are working docu-
21     ments of the Internet Engineering Task Force (IETF), its areas, and its
22     working groups. Note that other groups may also distribute working doc-
23     uments as Internet-Drafts.
24    
25     Internet-Drafts are draft documents valid for a maximum of six months
26     and may be updated, replaced, or obsoleted by other documents at any
27     time. It is not appropriate to use Internet-Drafts as reference mate-
28     rial or to cite them other than as "work in progress."
29    
30     To learn the current status of any Internet-Draft, please check the
31     "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
32     Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
33     munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
34     ftp.isi.edu (US West Coast).
35    
36     Abstract
37    
38     This document describes a means by which location-independent access to
39     resources can be provided, without creating a new class of resource
40     names. Instead, a resolution service is proposed for existing URLs,
41     which allows information providers to advertise meta-information about a
42     resource named by a URL, and/or alternate locations from which that par-
43     ticular resource might be accessed.
44    
45     Depending on your point of view, the approach described in this document
46     might be taken as one or more of: (a) an alternative solution to the
47     "URL problem", (b) a strategy for gradual transition from URLs to URNs,
48     or (c) a worst-case scenario in the event that URN adoption takes too
49     long.
50    
51     1. The Problem
52    
53     URLs are widely used today, but they have three basic problems: (1) they
54     are tied to host names, (2) they are tied to filenames on a particular
55    
56    
57    
58     Moore Expires 7 January 1996 [Page 1]
59    
60    
61    
62    
63    
64    
65    
66    
67    
68    
69    
70     URNs considered harmful 7 July 1995
71    
72    
73     host, and (3) they are tied to access protocols. Since all of these are
74     likely to change over time, URLs are not stable names.
75    
76     Host names impose two problems: stability and scalability. Host names
77     based on DNS are frequently unstable because they (by design) reflect
78     administrative hierarchies, which tend to change fairly frequently rela-
79     tive to the lifetime of a resource. Host names also reflect the names
80     of organizations, which also tend to change over time. Finally, a
81     resource "owned" or maintained by one organization can migrate to
82     another "organization", thus necessitating a change in the host name at
83     which a resource is located. All of these changes invalidate old URLs.
84    
85     So long as the only way to access an object named by a URL is to
86     contact the host whose name appears in that URL, it is difficult to
87     provide scalable access to those objects. Popular world wide web sites
88     assign several server machines to a single host name (perhaps using a
89     modified name server which randomizes responses), but this requires
90     that the entire collection of files made available using that host name
91     be consistently mirrored to each server machine. As a practical
92     consequence, all of those server machines are usually maintained at a
93     single location, even though better service and conservation of
94     bandwidth would result from distributing the servers around the network.
95    
96     Filenames embedded in URLs serve two purposes which are in conflict.
97     They need to be stable so that a reference to a named resource will con-
98     tinue to be valid for as long as needed. On the other hand, files are
99     usually organized into hierarchies to help humans browse through the
100     file system. The hierarchies need to change from time to time as new
101     files are added and old ones are removed. When the file system hierar-
102     chy is changed, the old URLs become invalid.
103    
104     Having access protocols implicit in a resource name imposes an addi-
105     tional barrier: The client must support the access protocol (perhaps via
106     a proxy) in order to access the resource. This is effectively true even
107     if the resource is available via other protocols, because other URLs for
108     those protocols may not be available. Even if several URLs are
109     available, the user must be knowledgeable enough to choose one which is
110     supported by his client, will pass through his security firewall, etc.
111    
112     The issue of stability deserves careful scrutiny. Some people might be
113     happy with URLs if only they did not become stale during their useful
114     lifetime. Others have suggested that we need resource names to support
115     enforcement of copyright laws, and that resource names should therefore
116     be stable for the duration of the copyright (potentially 150 years or more).
117    
118     2. The "official future" scenario: URNs and URCs
119    
120     The widely expected solution to these problems is something like the
139     following: Instead of using URLs as resource names, we will migrate to
140     using "uniform resource *names*" or URNs. URNs will not be tied to
141     locations, but there will be resolution services available which will
142     allow a user to obtain the "characteristics" of the resource (also known
143     as the URC). Locations (URLs) of resources will also be obtainable via
144     the URN, either as part of the URC, indirectly (through a location-
145     independent file name or LIFN which appears in a URC), or perhaps via
146     some separate resolution service.
147    
148     To solve the problem of stale links, users will have to upgrade to
149     browsers that support URNs directly, or access the web via proxy servers
150     which convert URNs to URLs. References to existing objects (several
151     million of them) will need to be converted to provide URNs instead of or
152     in addition to the URLs.
153    
154     3. Cost of Transition to URNs
155    
156     We have resource names now; they're called URLs. URLs have all kinds of
157     undesirable properties, but one nice thing is that they're cheap. All
158     you need is a host on the Internet and a DNS name, and you have your own
159     very large chunk of URL-space. Since DNS is a well-established part of
160     the Internet infrastructure, it's "free".
161    
162     There's a cost to providing another level of indirection. New protocols
163     have to be defined and tested, new (and more complex) clients and
164     servers must be debugged. The new services have to be managed; the new
165     clients have to be configured correctly. Information providers (which
166     potentially includes nearly everybody) will have to learn how to use the
167     new tools.
168    
169     There are also issues of serviceability and reliability. If the client
170     doesn't do what the user thinks it should, the system administrator has
171     one or two more possible culprits. Every additional level of lookup
172     imposes an overhead in network bandwidth and delay (as seen by the
173     user). When adding a layer to a service that millions of people depend
174     on, the new layer must integrate smoothly with those that already exist
175     so as not to adversely impact reliability.
176    
177     Even after a URN lookup service exists and is available in clients, most
178     clients will still support only URLs for some time, and most references
179     to resources will still use URLs. During that period, there will be an
180     increased cost due not only to the need to maintain both sets of names,
181     but also to having multiple sources of failure. (Did the attempt to
182     access the resource fail because the URL was stale, or because the URN
183     server returned the wrong answer?) Due to the large number of
184     resources presently named by URLs, one can imagine needing a service to
185     map from URLs to URNs, in addition to the other way around.
186    
205     For any new protocol, naming scheme, or additional level of lookup, the
206     costs and benefits need to be analyzed to see whether the anticipated
207     gain is likely to be worth the cost. An important question is: when
208     will the user see a benefit from using a client that supports URNs?
209     Another is: how much investment is required to develop the URN infras-
210     tructure to the point that the user does see that benefit?
211    
212     4. Current state of affairs
213    
214     The Uniform Resource Identifiers working group was chartered in December
215     of 1992. Most of the problems identified in section 1 of this document
216     have been apparent for at least that long, yet there is no agreement on
217     an implementable solution for those problems.
218    
219     A specification for URLs - which were already widely used and interoper-
220     able before URI was chartered - has been published. Some requirements
221     for URNs have been specified and published as RFC 1737. Requirements
222     for URCs are still being debated. Neither URNs, URCs, nor the resolu-
223     tion protocols have yet been specified.
224    
225     To be fair, the URI working group was chartered only to develop resource
226     identifiers, not to design a complete system which solves problems.
227     Part of the reason for the long delay over URNs and URCs may be that the
228     discussion centers around the components of the system, without explic-
229     itly considering the system as a whole or the environment in which those
230     components must operate. While many of the proposals for URNs and URCs
231     were obviously designed to solve certain problems, there may still be a
232     lack of agreement about what problems need to be addressed, or the rela-
233     tive importance of those problems. There may also be a lack of under-
234     standing or agreement on the cost of providing a particular service ver-
235     sus the benefit that it would provide.
236    
237     Frederick Brooks's _The Mythical Man-Month_ describes a phenomenon known
238     as "the second-system effect". He writes:
239    
240         An architect's first work is apt to be spare and clean. He
241         knows he doesn't know what he's doing, so he does it carefully
242         and with great restraint.
243    
244         As he designs his first work, frill after frill and
245         embellishment after embellishment occur to him. These
246         get stored away to be used "next time". Sooner or later
247         the first system is finished, and the architect, with firm
248         confidence and a demonstrated mastery of that class of
249         systems, is ready to build a second system.
250    
251         This second is the most dangerous system a man ever designs...
252    
271     It is tempting to cite URNs and URCs as examples of the second-system
272     effect. Most of us are painfully aware of the deficiencies of URLs, and
273     also of the difficulties in changing from URLs to some other scheme.
274     There is therefore a tremendous incentive to do the job "right" this
275     time around, lest we end up with URNs and URCs that are somehow insuffi-
276     cient for the job within a few years. Some would even suggest that we
277     need to get URNs right "once and for all".
278    
279     But it's at least possible that problems inherent in URLs and the world
280     wide web are due not only to a lack of foresight in their design
281     (after all, who could have predicted the tremendous success of the
282     web?), but also to the lack of ability to control the ways in which the
283     web was used. To put it a different way, it takes time for tens of
284     thousands of new information providers to acquire disciplines and build
285     tools for consistent resource naming and keeping track of resource loca-
286     tions. This is true regardless of the scheme used for resource names.
287    
288     We probably cannot anticipate needs of web users and information
289     providers for more than the next few years, and we are deluding our-
290     selves if we believe we can impose a discipline on its use simply by
291     creating a new space for resource names. We need to design a system
292     which is adaptable to future needs without knowing precisely what those
293     needs are. At the same time, if a solution for possible future problems
294     does not address today's needs, it is doomed to failure.
295    
296     The functional requirements for URNs (from RFC 1737) include: global
297     scope, global uniqueness, persistence, scalability (of assignment),
298     legacy support, extensibility, independence, and resolution. Notably
299     absent from this list is scalability of access, even though the lack
300     thereof is one of our most pressing problems.
301    
302    
303     5. An alternate scenario: Location-Independent URLs
304    
305     The usual explanation for why URLs are a Bad Thing (tm) is "URLs are
306     tied to locations". But the Internet has already had one successful
307     transition away from location-based names - in electronic mail. Once
308     upon a time, email addresses were of the form user@host, and were there-
309     fore tied to the network "locations" of those hosts. To send mail to a
310     user at a host, you connected to that host's SMTP server (before that,
311     it was the FTP server), and told it to deliver the message to that
312     user's mailbox.
313    
314     Along came DNS and the MX record. Addresses are now of the form
315     user@domain. Instead of connecting to the host associated with that
316     domain, one now connects to one of the mail exchangers for that domain
317     listed in the DNS. One result is that email domains are increasingly
318     decoupled from host names. It is not uncommon for a single email domain
337     to serve hundreds of Internet hosts, which may not even be directly con-
338     nected to the Internet. Email addresses are now less likely to be tied
339     to individual host names. And since there can be multiple mail exchang-
340     ers for a domain, other results have been increased fault-tolerance via
341     redundancy, and better ability to handle load.
342    
343     Imagine a new DNS record type called RCS (for "resource catalog server")
344     which performed an MX-like function for URLs. For example, the records:
345    
346         www.netlib.org.  RCS 10 netlib2.cs.utk.edu.
347                          RCS 10 netlib1.epm.ornl.gov.
348    
349     would inform a web client that meta-information for any URL containing
350     the domain www.netlib.org could be found using the resource catalog
351     servers at netlib2.cs.utk.edu and netlib1.epm.ornl.gov.
352    
353     These records would be obtained in a single DNS query for
354     www.netlib.org, along with the IP addresses of the RCS servers. If
355     there had been no RCS records, the same query would have returned the IP
356     addresses of www.netlib.org. (Existing TXT records could be used
357     instead of RCS records, but additional DNS queries would be required to
358     look up the IP addresses of any resource catalog servers.)
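
As a rough illustration of how a client might handle the RCS records
in the example above, the following Python sketch parses zone-file
style lines and orders the servers. RCS is only a proposed record
type, so nothing here corresponds to an existing DNS library. The
names RcsRecord, parse_rcs_records and order_servers are hypothetical,
and treating the numeric field as an MX-style preference (lower value
tried first, ties shuffled) is an assumption, since this memo does not
define that field.

```python
import random
from typing import NamedTuple


class RcsRecord(NamedTuple):
    owner: str        # domain name the record is attached to
    preference: int   # ASSUMPTION: MX-like, lower value is tried first
    target: str       # host name of a resource catalog server


def parse_rcs_records(zone_text: str) -> list[RcsRecord]:
    """Parse lines such as 'www.netlib.org. RCS 10 netlib2.cs.utk.edu.'.

    A line that begins with 'RCS' inherits the owner of the previous
    line, as in the two-line example in the text.
    """
    records = []
    owner = None
    for line in zone_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].upper() != "RCS":
            owner = fields.pop(0)
        if owner is None or len(fields) != 3 or fields[0].upper() != "RCS":
            raise ValueError(f"malformed RCS line: {line!r}")
        records.append(RcsRecord(owner, int(fields[1]), fields[2]))
    return records


def order_servers(records, rng=random):
    """Order by preference; servers sharing a preference are shuffled."""
    shuffled = list(records)
    rng.shuffle(shuffled)
    # sorted() is stable, so the shuffled order survives within a tie.
    return sorted(shuffled, key=lambda record: record.preference)
```

Shuffling servers of equal preference is one simple way to spread load
across them; a real client might prefer some measure of network
proximity instead.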
359    
360     Once the addresses of RCS servers were known for the domain in a URL,
361     the client would use a special-purpose resolution protocol to obtain
362     characteristics of the resource named by that URL, alternate locations
363     (i.e., additional URLs) at which that resource could be accessed, or both.
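
The client-side flow described above might look roughly like the
following Python sketch. Neither the RCS lookup nor the resolution
protocol has been specified, so both are passed in as caller-supplied
functions (lookup_rcs and query_catalog, both hypothetical). Keeping
the original URL as a last resort is also an assumption of this
sketch, not something this memo requires.

```python
from typing import Callable, Iterable
from urllib.parse import urlsplit


def candidate_locations(
    url: str,
    lookup_rcs: Callable[[str], Iterable[str]],
    query_catalog: Callable[[str, str], Iterable[str]],
) -> list[str]:
    """Return the URLs a client could try for `url`, in order.

    lookup_rcs(domain)         -> catalog server host names (hypothetical)
    query_catalog(server, url) -> alternate URLs for `url` (hypothetical)
    """
    domain = urlsplit(url).hostname
    if domain is None:
        return [url]
    for server in lookup_rcs(domain):
        try:
            alternates = list(query_catalog(server, url))
        except OSError:
            continue  # catalog server unreachable; try the next one
        if alternates:
            if url not in alternates:
                alternates.append(url)  # ASSUMPTION: original URL as fallback
            return alternates
    # No RCS records, or no catalog server answered: behave like a
    # legacy client and use the URL as given.
    return [url]
```

Because the fallback is the unmodified URL, a client written this way
loses nothing when a domain publishes no RCS records.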
364    
365     New clients that supported this scheme would use the same URLs as the
366     old clients, but would gain immediate benefit from being able to dis-
367     cover alternate servers for their resources (with little penalty for
368     trying). Eventually they would be able to make use of URCs as well.
369     When coupled with a mechanism such as SONAR for finding network proxim-
370     ity information (see draft-moore-sonar-00.txt), the client would gain
371     the ability to automatically choose a nearby location for that resource
372     (thereby improving access times), and to recover from the failure of any
373     single resource server.
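
A minimal sketch of that selection and recovery behaviour, in Python,
is given below. It does not implement SONAR; estimate_cost is a
hypothetical stand-in for whatever proximity service is available, and
fetch is a hypothetical function that retrieves one URL and raises
OSError on failure.

```python
from typing import Callable, Iterable, Optional
from urllib.parse import urlsplit


def fetch_nearest(
    locations: Iterable[str],
    estimate_cost: Callable[[str], float],
    fetch: Callable[[str], bytes],
) -> Optional[bytes]:
    """Try locations from nearest to farthest; return the first success.

    estimate_cost(host) -> smaller means closer (hypothetical service)
    fetch(url)          -> resource bytes, or raises OSError on failure
    """
    ranked = sorted(
        locations,
        key=lambda url: estimate_cost(urlsplit(url).hostname or ""),
    )
    for url in ranked:
        try:
            return fetch(url)
        except OSError:
            continue  # this server failed; move on to the next-nearest
    return None  # every known location failed
```

The same loop gives both benefits named above: the nearest server is
tried first, and the failure of any single server only costs one
extra attempt.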
374    
375     Legacy clients would still be able to access those resources as long as
376     there were a *host* with the same name as that used in the URL which
377     provided such access. Information providers would therefore continue to
378     maintain such servers until most legacy clients had been replaced.
379    
380     It should be possible to reserve a few chunks of DNS for naming authori-
381     ties for long-term resource-names. Subdomains within these spaces would
382     be required to NOT be meaningful to humans; thus, the names themselves
383     need never be obsolete. While such a subdomain would initially be
384     assigned to a publisher, the responsibility for serving that subdomain
403     could be transferred as necessary when that publisher or its intellec-
404     tual property assets changed hands. Finally, resource names would only
405     be assigned for any particular subdomain for a short time (perhaps a
406     year), after which a new resource name would be used. This would allow
407     the resolution service for older subdomains to be shifted from primary
408     servers to "custodians".
409    
410     One nice feature of this scheme is that it still works for ordinary
411     users and their home pages. Sites could set up their location and/or
412     URC servers with their existing DNS names; they need not obtain new nam-
413     ing authority names. If a user's web page becomes unexpectedly popular,
414     a resource catalog server and appropriate RCS records could be installed
415     to inform clients of alternate locations, even though no prior arrange-
416     ments were made.
417    
418     But publishers and others who had an interest in making resources avail-
419     able over the long-term (including probably anybody who wanted to make
420     money selling access to his works), would see the benefit in using the
421     new name space. And new naming authority names could be distinguished
422     from ordinary DNS names.
423    
424     Such a system would not provide scalable resolution of resource names
425     for several centuries in the future. But it probably would work for a
426     few decades, during which usage patterns are almost certain to change
427     beyond what we can anticipate. It would also encourage building of an
428     infrastructure for maintaining meta-information about resources.
429    
430     6. URN functional requirements as applied to URLs
431    
432     RFC 1737 lists the following functional requirements for URNs:
433    
434     a. Global scope: A URN is a name with global scope which does not
435     imply a location. It has the same meaning everywhere.
436    
437     Non-relative URLs already have the same meaning everywhere. If an
438     MX-like system is defined, a URL no longer implies a location.
439    
440     b. Global uniqueness: The same URN will never be assigned to two dif-
441     ferent resources.
442    
443     This is generally true in practice for URLs. Only rarely is a file
444     name which is exported to the world re-used to name a completely
445     different resource. To the extent that this is not true, it is
446     a matter of discipline and education on the part of information
447     providers. If providers do not understand the value of unique naming,
448     the introduction of URNs will not force them to apply the discipline.
449     If providers do understand the value of unique naming, they can provide
450     unique names for URLs.
451    
469     c. Persistence: It is intended that the lifetime of a URN be perma-
470     nent. That is, the URN will be globally unique forever, and may
471     well be used as a reference to a resource well beyond the lifetime
472     of the resource it identifies or of any naming authority involved
473     in the assignment of its name.
474    
475     The persistence of URLs depends on the persistence of their components -
476     file names and domain names.
477    
478     Persistence of file names is also a matter of discipline on the part of
479     an information provider, since file systems and existing protocol
480     servers are capable of storing files under fairly arbitrary names. If
481     the "browsing" function that currently requires human-meaningful file
482     names and arranges them into human-meaningful hierarchies is replaced
483     with some other means of resource discovery, information providers are
484     free to begin using some other naming scheme that allows stable names,
485     *without* introducing a new kind of resource name. Similarly, persis-
486     tence of domain names can be achieved by using domain names that do not
487     correspond to organizational names or hierarchies, and which will never
488     be re-assigned.
489    
490     On the other hand, making persistent resource names useful requires
491     a long-term commitment to maintain the characteristics and location data
492     and to provide the resolution and/or access services. Few organizations
493     (and not all countries) can reliably commit to providing such support
494     for several generations, yet large centralized databases would provide a
495     dangerous means for control of networked information.
496    
497     d. Scalability: URNs can be assigned to any resource that might con-
498     ceivably be available on the network, for hundreds of years.
499    
500     URLs meet this requirement.
501    
502     e. Legacy support: The scheme must permit the support of existing
503     legacy naming systems, insofar as they satisfy the other require-
504     ments described here. For example, ISBN numbers, ISO public identi-
505     fiers, and UPC product codes seem to satisfy the functional
506     requirements, and allow an embedding that satisfies the syntactic
507     requirements described here.
508    
509     URLs were designed to support legacy protocols, and therefore are very
510     adaptable to legacy naming systems.
511    
512     f. Extensibility: Any scheme for URNs must permit future extensions to
513     the scheme.
514    
515     URLs have proven to be extensible to several protocols other than those
516     for which they were designed.
517    
535     g. Independence: It is solely the responsibility of a name issuing
536     authority to determine the conditions under which it will issue a
537     name.
538    
539     URLs meet this requirement.
540    
541     h. Resolution: A URN will not impede resolution (translation into a
542     URL, q.v.). To be more specific, for URNs that have corresponding
543     URLs, there must be some feasible mechanism to translate a URN to a
544     URL.
545    
546     By introducing a resolution service which uses URLs as resource names
547     and allows location of resolution servers using DNS, URLs meet this
548     requirement.
549    
550     7. Limitations
551    
552     If we provide the means for the continued use of URLs, some users and/or
553     information providers may delay migrating to new protocols (URC records,
554     resolution servers) or to better resource naming schemes (URLs based on
555     stable domain names and file names, or a new URL type named "URN").
556     Nevertheless, the new proposal provides incentives to migrate for both
557     users and providers.
558    
559     DNS names are not ideally suited as naming authority names. The problem
560     is not so much due to the names themselves (since new DNS subtrees can
561     be reserved for naming authorities) but to the way the lookup hierarchy
562     is exposed in the name. If a particular branch of the tree grows too
563     big (as in the .COM domain), it is difficult to sub-divide it. This can
564     be solved by limiting the number of branches at any node within the por-
565     tion of DNS space reserved for naming authority names. Alternatively, a
566     special-purpose server (which understands the DNS query protocol) could
567     be constructed to provide resolution for any large and flat subspace
568     within the DNS tree.
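
The first remedy, limiting the number of branches at any node, can be
illustrated with a short Python sketch. It assumes that naming
authorities are identified by a serial number and that each DNS label
carries one fixed-width group of decimal digits; both assumptions, the
function name authority_domain, and the suffix "urn.example" are
hypothetical and appear nowhere in this memo.

```python
def authority_domain(serial: int, suffix: str, fanout: int = 100) -> str:
    """Map a serial number to a DNS name with bounded branching.

    The serial is written in base `fanout`, one digit group per label,
    so no node in the reserved subtree has more than `fanout` children.
    The least significant group is the leftmost (most specific) label.
    """
    if serial < 0 or fanout < 2:
        raise ValueError("serial must be >= 0 and fanout >= 2")
    width = len(str(fanout - 1))  # zero-pad every label to the same width
    labels = []
    while True:
        serial, group = divmod(serial, fanout)
        labels.append(f"{group:0{width}d}")
        if serial == 0:
            break
    return ".".join(labels) + "." + suffix.strip(".")


print(authority_domain(123456, "urn.example"))  # 56.34.12.urn.example
```

The labels produced this way carry no meaning for humans, which is
consistent with the requirement stated earlier for long-term naming
authority subdomains.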
569    
570     8. Summary
571    
572     The difficulty in transition from URLs to URNs, combined with the delay
573     by IETF in defining URNs and resolution services, encourages solutions
574     to the existing problems of the web which use existing URLs. Such solu-
575     tions could possibly be implemented and deployed without the cooperation
576     of the IETF.
577    
578     On the other hand, IETF could use this mechanism to provide a transition
579     from present-day URLs to names which, while syntactically identical to
580     URLs, have all of the characteristics desired for URNs. The transition
581     to such a scheme would be easier than a transition to an entirely new
582     naming scheme.
583    
601     Many of the difficulties with the present URL scheme can be seen as a
602     lack of knowledge and discipline on the part of network information
603     providers. This is to be expected, considering the rapid growth of the
604     web and the fact that many of its information providers are ordinary
605     users. These problems would exist regardless of the naming scheme used,
606     and will only be solved through education, new tools, and time.
607    
608     9. Security Considerations
609    
610     Security Considerations are not addressed in this memo.
611    
612     10. Author's Address
613    
614     Keith Moore
615     Department of Computer Science
616     University of Tennessee
617     107 Ayres Hall
618     Knoxville TN 37996-1301
619    
620     moore@cs.utk.edu