7 Network Working Group Keith Moore
8 Internet-Draft University of Tennessee
9 Expires: 7 January 1996
10
11
12 Location-Independent URLs
13 or
14 URNs considered harmful
15
16 draft-ietf-uri-urns-harmful-00.txt
17
18 Status of this Memo
19
20 This document is an Internet-Draft. Internet-Drafts are working docu-
21 ments of the Internet Engineering Task Force (IETF), its areas, and its
22 working groups. Note that other groups may also distribute working doc-
23 uments as Internet-Drafts.
24
25 Internet-Drafts are draft documents valid for a maximum of six months
26 and may be updated, replaced, or obsoleted by other documents at any
27 time. It is not appropriate to use Internet-Drafts as reference mate-
28 rial or to cite them other than as "work in progress."
29
30 To learn the current status of any Internet-Draft, please check the
31 "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
32 Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
33 munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
34 ftp.isi.edu (US West Coast).
35
36 Abstract
37
38 This document describes a means by which location-independent access to
39 resources can be provided, without creating a new class of resource
40 names. Instead, a resolution service is proposed for existing URLs,
41 which allows information providers to advertise meta-information about a
42 resource named by a URL, and/or alternate locations from which that par-
43 ticular resource might be accessed.
44
45 Depending on your point of view, the approach described in this document
46 might be taken as one or more of: (a) an alternative solution to the
47 "URL problem", (b) a strategy for gradual transition from URLs to URNs,
48 or (c) a worst-case scenario in the event that URN adoption takes too
49 long.
50
51 1. The Problem
52
53 URLs are widely used today, but they have three basic problems: (1) they
54 are tied to host names, (2) they are tied to filenames on a particular
55
56
57
58 Moore Expires 7 January 1996 [Page 1]
70 URNs considered harmful 7 July 1995
71
72
73 host, and (3) they are tied to access protocols. Since all of these are
74 likely to change over time, URLs are not stable names.
75
76 Host names impose two problems: stability and scalability. Host names
77 based on DNS are frequently unstable because they (by design) reflect
78 administrative hierarchies, which tend to change fairly frequently rela-
79 tive to the lifetime of a resource. Host names also reflect the names
80 of organizations, which also tend to change over time. Finally, a
81 resource "owned" or maintained by one organization can migrate to
82 another "organization", thus necessitating a change in the host name at
83 which a resource is located. All of these changes invalidate old URLs.
84
85 So long as the only way to access an object named by a URL is to
86 contact the host whose name appears in that URL, it is difficult to
87 provide scalable access to those objects. Popular world wide web sites
88 assign several server machines to a single host name (perhaps using a
89 modified name server which randomizes responses), but this requires that
90 the entire collection of files made available under that host name be
91 consistently mirrored to each server machine. As a practical consequence,
92 all of those server machines are usually maintained at a single loca-
93 tion, even though better service and conservation of bandwidth would
94 result from distributing the servers around the network.
95
96 Filenames embedded in URLs serve two purposes which are in conflict.
97 They need to be stable so that a reference to a named resource will con-
98 tinue to be valid for as long as needed. On the other hand, files are
99 usually organized into hierarchies to help humans browse through the
100 file system. The hierarchies need to change from time to time as new
101 files are added and old ones are removed. When the file system hierar-
102 chy is changed, the old URLs become invalid.
103
104 Having access protocols implicit in a resource name imposes an addi-
105 tional barrier: The client must support the access protocol (perhaps via
106 a proxy) in order to access the resource. This is effectively true even
107 if the resource is available via other protocols, because other URLs for
108 those protocols may not be available. Even if several URLs are
109 available, the user must be knowledgeable enough to choose one which is
110 supported by his client, will pass through his security firewall, etc.
111
112 The issue of stability deserves careful scrutiny. Some people might be
113 happy with URLs if only they did not become stale during their useful
114 lifetime. Others have suggested that we need resource names to support
115 enforcement of copyright laws, and that resource names should therefore
116 be stable for the duration of the copyright (up to 150 years or more).
117
118 2. The "official future" scenario: URNs and URCs
119
120 The widely expected solution to these problems is something like the
139 following: Instead of using URLs as resource names, we will migrate to
140 using "uniform resource *names*" or URNs. URNs will not be tied to
141 locations, but there will be resolution services available which will
142 allow a user to obtain the "characteristics" of the resource (also known
143 as the URC). Locations (URLs) of resources will also be obtainable via
144 the URN, either as part of the URC, indirectly (through a location-
145 independent file name or LIFN which appears in a URC), or perhaps via
146 some separate resolution service.
147
148 To solve the problem of stale links, users will have to upgrade to
149 browsers that support URNs directly, or access the web via proxy servers
150 which convert URNs to URLs. References to existing objects (several
151 million of them) will need to be converted to provide URNs instead of or
152 in addition to the URLs.
153
154 3. Cost of Transition to URNs
155
156 We have resource names now; they're called URLs. URLs have all kinds of
157 undesirable properties, but one nice thing is that they're cheap. All
158 you need is a host on the Internet and a DNS name, and you have your own
159 very large chunk of URL-space. Since DNS is a well-established part of
160 the Internet infrastructure, it's "free".
161
162 There's a cost to providing another level of indirection. New protocols
163 have to be defined and tested, new (and more complex) clients and
164 servers must be debugged. The new services have to be managed; the new
165 clients have to be configured correctly. Information providers (which
166 potentially includes nearly everybody) will have to learn how to use the
167 new tools.
168
169 There are also issues of serviceability and reliability. If the client
170 doesn't do what the user thinks it should, the system administrator has
171 one or two more possible culprits. Every additional level of lookup
172 imposes an overhead in network bandwidth and delay (as seen by the
173 user). When adding a layer to a service that millions of people depend
174 on, the new layer must integrate smoothly with those that already exist
175 so as not to adversely impact reliability.
176
177 Even after a URN lookup service exists and is available in clients, most
178 clients will still support only URLs for some time, and most references
179 to resources will still use URLs. During that period, there will be an
180 increased cost due not only to the need to maintain both sets of names,
181 but also to having multiple sources of failure. (Did the attempt to
182 access the resource fail because the URL was stale, or because the URN
183 server returned the wrong answer?) Due to the large number of
184 resources presently named by URLs, one can imagine needing a service to
185 map from URLs to URNs, in addition to the other way around.
186
205 For any new protocol, naming scheme, or additional level of lookup, the
206 costs and benefits need to be analyzed to see whether the anticipated
207 gain is likely to be worth the cost. An important question is: when
208 will the user see a benefit from using a client that supports URNs?
209 Another is: how much investment is required to develop the URN infras-
210 tructure to the point that the user does see that benefit?
211
212 4. Current state of affairs
213
214 The Uniform Resource Identifiers working group was chartered in December
215 of 1992. Most of the problems identified in section 1 of this document
216 have been apparent for at least that long, yet there is no agreement on
217 an implementable solution for those problems.
218
219 A specification for URLs - which were already widely used and interoper-
220 able before URI was chartered - has been published. Some requirements
221 for URNs have been specified and published as RFC 1737. Requirements
222 for URCs are still being debated. Neither URNs, URCs, nor the resolu-
223 tion protocols have yet been specified.
224
225 To be fair, the URI working group was chartered only to develop resource
226 identifiers, not to design a complete system which solves problems.
227 Part of the reason for the long delay over URNs and URCs may be that the
228 discussion centers around the components of the system, without explic-
229 itly considering the system as a whole or the environment in which those
230 components must operate. While many of the proposals for URNs and URCs
231 were obviously designed to solve certain problems, there may still be a
232 lack of agreement about what problems need to be addressed, or the rela-
233 tive importance of those problems. There may also be a lack of under-
234 standing or agreement on the cost of providing a particular service ver-
235 sus the benefit that it would provide.
236
237 Frederick Brooks's _The Mythical Man-Month_ describes a phenomenon known
238 as "the second-system effect". He writes:
239
240 An architect's first work is apt to be spare and clean. He
241 knows he doesn't know what he's doing, so he does it carefully
242 and with great restraint.
243
244 As he designs his first work, frill after frill and
245 embellishment after embellishment occur to him. These
246 get stored away to be used "next time". Sooner or later
247 the first system is finished, and the architect, with firm
248 confidence and a demonstrated mastery of that class of
249 systems, is ready to build a second system.
250
251 This second is the most dangerous system a man ever designs...
252
271 It is tempting to cite URNs and URCs as examples of the second-system
272 effect. Most of us are painfully aware of the deficiencies of URLs, and
273 also of the difficulties in changing from URLs to some other scheme.
274 There is therefore a tremendous incentive to do the job "right" this
275 time around, lest we end up with URNs and URCs that are somehow insuffi-
276 cient for the job within a few years. Some would even suggest that we
277 need to get URNs right "once and for all".
278
279 But it's at least possible that problems inherent in URLs and the world
280 wide web are due not only to a lack of foresight in their design
281 (after all, who could have predicted the tremendous success of the
282 web?), but also to the lack of ability to control the ways in which the
283 web was used. To put it a different way, it takes time for tens of
284 thousands of new information providers to acquire disciplines and build
285 tools for consistent resource naming and keeping track of resource loca-
286 tions. This is true regardless of the scheme used for resource names.
287
288 We probably cannot anticipate needs of web users and information
289 providers for more than the next few years, and we are deluding our-
290 selves if we believe we can impose a discipline on its use simply by
291 creating a new space for resource names. We need to design a system
292 which is adaptable to future needs without knowing precisely what those
293 needs are. At the same time, if a solution for possible future problems
294 does not address today's needs, it is doomed to failure.
295
296 The functional requirements for URNs (from RFC 1737) include: global
297 scope, global uniqueness, persistence, scalability (of assignment),
298 legacy support, extensibility, independence, and resolution. Notably
299 absent from this list is scalability of access, even though the lack
300 thereof is one of our most pressing problems.
301
302
303 5. An alternate scenario: Location-Independent URLs
304
305 The usual explanation for why URLs are a Bad Thing (tm) is "URLs are
306 tied to locations". But the Internet has already had one successful
307 transition away from location-based names - in electronic mail. Once
308 upon a time, email addresses were of the form user@host, and were there-
309 fore tied to the network "locations" of those hosts. To send mail to a
310 user at a host, you connected to that host's SMTP server (before that,
311 it was the FTP server), and told it to deliver the message to that
312 user's mailbox.
313
314 Along came DNS and the MX record. Addresses are now of the form
315 user@domain. Instead of connecting to the host associated with that
316 domain, one now connects to one of the mail exchangers for that domain
317 listed in the DNS. One result is that email domains are increasingly
318 decoupled from host names. It is not uncommon for a single email domain
337 to serve hundreds of Internet hosts, which may not even be directly con-
338 nected to the Internet. Email addresses are now less likely to be tied
339 to individual host names. And since there can be multiple mail exchang-
340 ers for a domain, other results have been increased fault-tolerance via
341 redundancy, and better ability to handle load.
342
343 Imagine a new DNS record type called RCS (for "resource catalog server")
344 which performed an MX-like function for URLs. For example, the records:
345
346 www.netlib.org. RCS 10 netlib2.cs.utk.edu.
347 RCS 10 netlib1.epm.ornl.gov.
348
349 would inform a web client that meta-information for any URL containing
350 the domain www.netlib.org could be found using the resource catalog
351 servers at netlib2.cs.utk.edu and netlib1.epm.ornl.gov.
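
As a rough illustration of how a client might read such records, the
sketch below parses zone-file-style text and lists the catalog servers
for the domain in a URL. The RCS record type is only proposed in this
document; the function names, the text layout accepted by the parser,
and the example URL path are assumptions made for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical zone-file text using the proposed (not standardized) RCS type.
ZONE_TEXT = """
www.netlib.org.  RCS 10 netlib2.cs.utk.edu.
                 RCS 10 netlib1.epm.ornl.gov.
"""


def parse_rcs_records(zone_text):
    """Return {owner: [(preference, server), ...]}, sorted by preference."""
    records = defaultdict(list)
    owner = None
    for line in zone_text.splitlines():
        if not line.strip():
            continue
        fields = line.split()
        # A line that starts with whitespace inherits the previous owner name.
        if not line[0].isspace():
            owner = fields.pop(0)
        if owner is None or len(fields) != 3 or fields[0] != "RCS":
            raise ValueError(f"unrecognized record: {line!r}")
        records[owner.rstrip(".").lower()].append(
            (int(fields[1]), fields[2].rstrip(".").lower())
        )
    # Ties are broken alphabetically here only to keep the output stable;
    # a real client might randomize among equal preferences, as with MX.
    return {name: sorted(entries) for name, entries in records.items()}


def catalog_servers_for(url, records):
    """List the catalog servers advertised for the domain in a URL."""
    host = (urlsplit(url).hostname or "").lower()
    return [server for _preference, server in records.get(host, [])]


records = parse_rcs_records(ZONE_TEXT)
print(catalog_servers_for("http://www.netlib.org/lapack/index.html", records))
```

A domain with no RCS records yields an empty list, in which case the
client would simply contact the host named in the URL as it does today.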
352
353 These records would be obtained in a single DNS query for
354 www.netlib.org, along with the IP addresses of the RCS servers. If
355 there had been no RCS records, the same query would have returned the IP
356 addresses of www.netlib.org. (Existing TXT records could be used
357 instead of RCS records, but additional DNS queries would be required to
358 look up the IP addresses of any resource catalog servers.)
359
360 Once the addresses of RCS servers were known for the domain in a URL,
361 the client would use a special-purpose resolution protocol to obtain
362 characteristics of the resource named by that URL, alternate locations
363 (i.e., additional URLs) at which that resource could be accessed, or both.
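
The resolution step might be sketched as follows. The resolution
protocol itself is not specified in this document, so it is represented
by a caller-supplied function `query(server, url)`, which is
hypothetical. Keeping the original URL as a last-resort candidate is
also an assumption of this sketch.

```python
def candidate_locations(url, rcs_servers, query):
    """Return the URLs worth trying for `url`, always including `url` itself.

    `query(server, url)` stands in for the unspecified resolution protocol:
    it returns a list of alternate URLs or raises OSError on failure.
    """
    for server in rcs_servers:
        try:
            alternates = query(server, url)
        except OSError:
            continue  # this catalog server is unreachable; try the next one
        # Preserve order, drop duplicates, keep the original URL as fallback.
        return list(dict.fromkeys([*alternates, url]))
    # No catalog server answered (or none exists): behave like a legacy client.
    return [url]
```

With an empty server list the function returns only the original URL,
which matches the behavior expected when a domain has no RCS records.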
364
365 New clients that supported this scheme would use the same URLs as the
366 old clients, but would gain immediate benefit from being able to dis-
367 cover alternate servers for their resources (with little penalty for
368 trying). Eventually they would be able to make use of URCs as well.
369 When coupled with a mechanism such as SONAR for finding network proxim-
370 ity information (see draft-moore-sonar-00.txt), the client would gain
371 the ability to automatically choose a nearby location for that resource
372 (thereby improving access times), and to recover from the failure of any
373 single resource server.
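
A minimal sketch of proximity-based selection with failover is given
below. It does not implement SONAR; the `distance` function is a
stand-in for whatever proximity estimate is available, and `fetch` is a
hypothetical retrieval function. Both names are assumptions.

```python
def fetch_from_nearest(locations, distance, fetch):
    """Try locations from nearest to farthest; return (location, content).

    `distance(location)` returns a proximity estimate (smaller is nearer).
    `fetch(location)` returns the content or raises OSError on failure.
    """
    failures = []
    for location in sorted(locations, key=distance):
        try:
            return location, fetch(location)
        except OSError as exc:
            failures.append((location, exc))  # remember and move on
    raise OSError(f"all {len(failures)} locations failed")
```

Because every location is attempted before giving up, the failure of
any single resource server is hidden from the user as long as at least
one alternate location responds.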
374
375 Legacy clients would still be able to access those resources as long as
376 there were a *host* with the same name as that used in the URL which
377 provided such access. Information providers would therefore continue to
378 maintain such servers until most legacy clients had been replaced.
379
380 It should be possible to reserve a few chunks of DNS for naming authori-
381 ties for long-term resource-names. Subdomains within these spaces would
382 be required to NOT be meaningful to humans; thus, the names themselves
383 need never be obsolete. While such a subdomain would initially be
384 assigned to a publisher, the responsibility for serving that subdomain
403 could be transferred as necessary when that publisher or its intellec-
404 tual property assets changed hands. Finally, resource names would only
405 be assigned for any particular subdomain for a short time (perhaps a
406 year), after which a new resource name would be used. This would allow
407 the resolution service for older subdomains to be shifted from
408 primary servers to "custodians".
409
410 One nice feature of this scheme is that it still works for ordinary
411 users and their home pages. Sites could set up their location and/or
412 URC servers with their existing DNS names; they need not obtain new nam-
413 ing authority names. If a user's web page becomes unexpectedly popular,
414 a resource catalog server and appropriate RCS records could be installed
415 to inform clients of alternate locations, even though no prior arrange-
416 ments were made.
417
418 But publishers and others who had an interest in making resources avail-
419 able over the long-term (including probably anybody who wanted to make
420 money selling access to his works), would see the benefit in using the
421 new name space. And new naming authority names could be distinguished
422 from ordinary DNS names.
423
424 Such a system would not provide scalable resolution of resource names
425 for several centuries in the future. But it probably would work for a
426 few decades, during which usage patterns are almost certain to change
427 beyond what we can anticipate. It would also encourage building of an
428 infrastructure for maintaining meta-information about resources.
429
430 6. URN functional requirements as applied to URLs
431
432 RFC 1737 lists the following functional requirements for URNs:
433
434 a. Global scope: A URN is a name with global scope which does not
435 imply a location. It has the same meaning everywhere.
436
437 Non-relative URLs already have the same meaning everywhere. If an
438 MX-like system is defined, a URL no longer implies a location.
439
440 b. Global uniqueness: The same URN will never be assigned to two dif-
441 ferent resources.
442
443 This is generally true in practice for URLs. It is only rarely that a
444 file name which is exported to the world, is re-used to name a com-
445 pletely different resource. To the extent that this is not true, it is
446 a matter of discipline and education on the part of information
447 providers. If providers do not understand the value of unique naming,
448 the introduction of URNs will not force them to apply the discipline.
449 If providers do understand the value of unique naming, they can provide
450 unique names for URLs.
451
469 c. Persistence: It is intended that the lifetime of a URN be perma-
470 nent. That is, the URN will be globally unique forever, and may
471 well be used as a reference to a resource well beyond the lifetime
472 of the resource it identifies or of any naming authority involved
473 in the assignment of its name.
474
475 The persistence of URLs depends on the persistence of their components -
476 file names and domain names.
477
478 Persistence of file names is also a matter of discipline on the part of
479 an information provider, since file systems and existing protocol
480 servers are capable of storing files under fairly arbitrary names. If
481 the "browsing" function that currently requires human-meaningful file
482 names and arranges them into human-meaningful hierarchies is replaced
483 with some other means of resource discovery, information providers are
484 free to begin using some other naming scheme that allows stable names,
485 *without* introducing a new kind of resource name. Similarly, persis-
486 tence of domain names can be achieved by using domain names that do not
487 correspond to organizational names or hierarchies, and which will never
488 be re-assigned.
489
490 On the other hand, making persistent resource names useful requires
491 a long-term commitment to maintain the characteristics and location data
492 and to provide the resolution and/or access services. Few organizations
493 (and not all countries) can reliably commit to providing such support
494 for several generations, yet large centralized databases would provide a
495 dangerous means for control of networked information.
496
497 d. Scalability: URNs can be assigned to any resource that might con-
498 ceivably be available on the network, for hundreds of years.
499
500 URLs meet this requirement.
501
502 e. Legacy support: The scheme must permit the support of existing
503 legacy naming systems, insofar as they satisfy the other require-
504 ments described here. For example, ISBN numbers, ISO public identi-
505 fiers, and UPC product codes seem to satisfy the functional
506 requirements, and allow an embedding that satisfies the syntactic
507 requirements described here.
508
509 URLs were designed to support legacy protocols, and therefore are very
510 adaptable to legacy naming systems.
511
512 f. Extensibility: Any scheme for URNs must permit future extensions to
513 the scheme.
514
515 URLs have proven to be extensible to several protocols other than those
516 for which they were designed.
517
535 g. Independence: It is solely the responsibility of a name issuing
536 authority to determine the conditions under which it will issue a
537 name.
538
539 URLs meet this requirement.
540
541 h. Resolution: A URN will not impede resolution (translation into a
542 URL, q.v.). To be more specific, for URNs that have corresponding
543 URLs, there must be some feasible mechanism to translate a URN to a
544 URL.
545
546 By introducing a resolution service which uses URLs as resource names
547 and allows location of resolution servers using DNS, URLs meet this
548 requirement.
549
550 7. Limitations
551
552 If we provide the means for the continued use of URLs, some users and/or
553 information providers may delay migrating to new protocols (URC records,
554 resolution servers) or to better resource naming schemes (URLs based on
555 stable domain names and file names, or a new URL type named "URN").
556 Nevertheless, the new proposal provides incentives to migrate for both
557 users and providers.
558
559 DNS names are not ideally suited as naming authority names. The problem
560 is not so much due to the names themselves (since new DNS subtrees can
561 be reserved for naming authorities) but to the way the lookup hierarchy
562 is exposed in the name. If a particular branch of the tree grows too
563 big (as in the .COM domain), it is difficult to sub-divide it. This can
564 be solved by limiting the number of branches at any node within the por-
565 tion of DNS space reserved for naming authority names. Alternatively, a
566 special-purpose server (which understands the DNS query protocol) could
567 be constructed to provide resolution for any large and flat subspace
568 within the DNS tree.
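
One way to picture the first option, limiting the number of branches at
any node, is to spread an opaque numeric naming authority identifier
across several DNS labels. The sketch below is purely illustrative: the
numeric identifiers, the fan-out of 100, and the suffix `na.example`
are all assumptions and are not part of this proposal.

```python
def authority_name(authority_id, fanout=100, suffix="na.example"):
    """Map a non-negative integer to a DNS name with bounded fan-out.

    Each label is one base-`fanout` digit, so no node in the reserved
    subtree has more than `fanout` children. The least significant
    digit is the leftmost label.
    """
    if authority_id < 0 or fanout < 2:
        raise ValueError("authority_id must be >= 0 and fanout >= 2")
    width = len(str(fanout - 1))  # zero-pad so labels have a fixed width
    labels = []
    while True:
        authority_id, digit = divmod(authority_id, fanout)
        labels.append(f"{digit:0{width}d}")
        if authority_id == 0:
            break
    return ".".join(labels + [suffix])


print(authority_name(123456))  # 56.34.12.na.example
```

The resulting labels carry no meaning for humans, which is consistent
with the requirement in section 5 that long-term naming authority
subdomains not be human-meaningful.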
569
570 8. Summary
571
572 The difficulty in transition from URLs to URNs, combined with the delay
573 by IETF in defining URNs and resolution services, encourages solutions
574 to the existing problems of the web which use existing URLs. Such solu-
575 tions could possibly be implemented and deployed without the cooperation
576 of the IETF.
577
578 On the other hand, IETF could use this mechanism to provide a transition
579 from present-day URLs to names which (while syntactically identical to
580 URLs) have all of the characteristics desired for URNs. The transition
581 to such a scheme would be easier than a transition to an entirely new
582 naming scheme.
583
601 Many of the difficulties with the present URL scheme can be seen as a
602 lack of knowledge and discipline on the part of network information
603 providers. This is to be expected, considering the rapid growth of the
604 web and the fact that many of its information providers are ordinary
605 users. These problems would exist regardless of the naming scheme used,
606 and will only be solved through education, new tools, and time.
607
608 9. Security Considerations
609
610 Security Considerations are not addressed in this memo.
611
612 10. Author's Address
613
614 Keith Moore
615 Department of Computer Science
616 University of Tennessee
617 107 Ayres Hall
618 Knoxville TN 37996-1301
619
620 moore@cs.utk.edu