INTERNET-DRAFT

The Path URN Specification
**************************

draft-ietf-uri-urn-path-01.txt
Expires 17 Jan 96

Daniel LaLiberte <liberte@ncsa.uiuc.edu>
Michael Shapiro <mshapiro@ncsa.uiuc.edu>

Status of this memo
===================

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its
areas, and its working groups. Note that other groups may also
distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress."

To learn the current status of any Internet-Draft, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).

This Internet Draft expires 17 Jan 96.

Last modified: Wed Jul 26 09:56:05 CDT 1995

This document is also available in HTML at:

    <URL: http://union.ncsa.uiuc.edu/~liberte/www/path.html>

Modifications of that document relative to the internet draft are
shown in italic font.

Abstract
========

A new "path" URN scheme is proposed that defines a uniformly
hierarchical name space. This URN scheme supports dynamic
relocation and replication of resources. Existing DNS technology is
used to resolve a path into sets of equivalent URLs, and then one
URL is resolved into the named resource.

Introduction
============

The path scheme defines a uniformly hierarchical name space
where a path URN is a sequence of components and an optional
opaque string. An example path URN is:

    path:/A/B/C/doc.html

The path is /A/B/C and the opaque string is doc.html.

The significant features of the path URN scheme include the
following:

Highly Scalable

    The resolution process is highly scalable due to several
    factors. Resolution is distributed as much as the named
    resources themselves are. (This also permits the resolution
    of names to be handled by servers that are motivated to
    maintain the service because they also serve the named
    resources.) The public hierarchy enables clients to make
    use of caches of resolver locations.

Dynamically Reconfigurable

    The resolution process is reconfigurable to support
    additional scalability and persistence of names in the event
    of relocations. The responsibility for resolution of a part of a
    name space may be delegated to another resolver or
    several parts of the name space may be recombined and
    resolved by a single server.

Built-in Fallback Mechanism

    The resolution process has a built-in fallback mechanism in
    case the original resolver is uncooperative in forwarding
    references to resources that have moved.

Easily Deployed

    The resolution and name assignment mechanisms are easily
    deployable since they use existing DNS technology and URL
    resolution schemes such as HTTP and FTP. Only a small
    amount of path-specific code is added to clients or proxy
    servers. Existing URLs may be automatically mapped to path
    URNs.

Resolves to Resource

    A path resolves first into a list of sets of equivalent URLs,
    and then second, that list is resolved into the named
    resource using one of the URLs. The type of the resource is
    identified by the protocol of the particular URL that is used; if
    metadata for the resource is desired instead, the particular
    URL scheme may provide it. The path URN scheme does not
    depend on URCs.

In this document, we first describe the name assignment and
resolution process conceptually. This is followed by a more detailed
description of the protocol, the encoding rules, and the compliance
to URN requirements.

Name Assignment
===============

Names of resources are assigned by naming authorities that are
responsible for a subtree of the name space, and naming
authorities may delegate naming responsibility to sub-authorities.
The top-most naming authority in the hierarchy is known as the root
naming authority. Each naming authority corresponds to a name
resolution service; a name resolution service may be shared by
several naming authorities.

A naming authority may create any new name for a resource as
long as the encoding rules described below are met. Once a name
has been assigned, it should never be assigned again for a different
resource, as per the URN requirements. Naming authorities are
responsible for meeting this uniqueness requirement.

A path name may be declared by the appropriate naming authority
as the name of a collection of resources. Such a name must end
with a final "/". The resource that a collection name resolves into is
undefined by the path scheme protocol. Not all prefixes of path
names are guaranteed to be names of collections.

An automatic mapping from most FTP and HTTP URLs to path
URNs is feasible and will speed deployment. However, the
generated names may not be appropriate for some HTTP URLs due
to encoding requirements or misleading semantics, so some
manual intervention or customization of the generation process will
be required. Since the process is repeatable, the same generator
service may be used as a URN lookup service given URLs. The
generator service is not described in this document.

The Name Resolution Process
===========================

The resolution process is described in two steps. The first step
resolves the name into an ordered list of URL-sets. The second
step attempts to resolve URLs from successive sets in the list until
the resolution succeeds or the list is exhausted.

The first step in the resolution process involves traversing the
components of the path, left to right. Each component in the path
(except the final opaque string) has two functions. One function is to
provide a context for resolving the remainder of the path. The
context for resolving the first component is the resolver for the root
naming authority. The other function is to optionally provide a set of
equivalent URLs (called a URL-set) constructed from URL-prefixes
and the remainder of the path. All URLs in a set are equivalent in
that each should resolve to the "same" resource, if it resolves at all.

The first step ends when no more URL-sets are found. The result is
a list of URL-sets ordered from most specific to least specific, that
is, in the reverse of the order in which they were discovered during
the first step.

The second step is to resolve the list of URL-sets to the named
resource. A URL (which may be a URN) is selected from the first set
(e.g., randomly) and resolution of the URL is attempted. Any of the
URLs may be URNs, even other path URNs. If the resolution fails
because the URL service is unavailable (e.g., connection failure),
another URL is selected from the set, until none are left; retries with
exponential backoff may then follow, or the path resolution process
may be declared a failure. Alternatively, if the resolution of a URL
fails because the URL is unknown, then the process is repeated
with the next set in the list. The process is repeated until the
resolution succeeds or the list is exhausted (which implies
resolution failure).

If the resolution of a URL results in a redirection to yet another URL,
then that redirection should be followed to determine if it succeeds
before declaring that the first URL has been resolved. A failure to
resolve the redirection should be treated as the same kind of failure
to resolve the first URL.

Reconfiguration of the Resolution Service
+++++++++++++++++++++++++++++++++++++++++

The resolution process may be dynamically reconfigured in a
number of ways to meet the requirements of scalability and
persistence.

   o Resolvers may delegate part of their resolution service to
     sub-resolvers. Since the most specific URL-set is used
     first, the sub-resolver will have the first chance to resolve
     the URLs. Note, however, that a sub-resolver can only be
     created where there is already a corresponding sub-naming
     authority; that is, the name space must have already been
     subdivided.

     Administrators of a resolution service may want to delegate
     resolution to sub-resolvers for one of two reasons: to
     reduce the load on a resolver, or to allow a sub-resolver to
     be located elsewhere on the internet.

   o Responsibility for resolution of a set of path names may fall
     back to higher level resolution services in the event that a
     lower level resolution service declines to either resolve the
     paths to the resources or provide redirects.

Examples
++++++++

In the following partial tree diagram, the nodes marked with * have
URL-sets associated with them.

                        /
                        |
         -------------------------------
         A1                            A2
         |                             |
    --------------------------
    B1*                     B2*
    |                       |
  ----------                |
  C1      C2*               C
                            |
                            D*

    /A/B1 names resources under /A/B1 except those under
          /A/B1/C2
    /A/B2 names resources under /A/B2 except those under
          /A/B2/C/D

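As an illustration of how the marked nodes divide the name space, the
following Python sketch picks the most specific marked node that is a
prefix of a given name. The function name most_specific_node and the
set MARKED are hypothetical, introduced only for this example, and
case folding is ignored for simplicity.

```python
# Nodes that carry URL-sets in the example (hypothetical constant).
MARKED = {"/A/B1", "/A/B1/C2", "/A/B2", "/A/B2/C/D"}


def most_specific_node(name, marked=MARKED):
    """Return the longest prefix of `name` found in `marked`, or None."""
    parts = [p for p in name.split("/") if p]
    # Try the longest candidate prefix first, then shorten it.
    for end in range(len(parts), 0, -1):
        prefix = "/" + "/".join(parts[:end])
        if prefix in marked:
            return prefix
    return None
```

Under these assumptions, /A/B1/C1/doc.html is covered by /A/B1, while
/A/B1/C2/doc.html is covered by /A/B1/C2.
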
Details of the Resolution Process
=================================

This section describes more details of the path scheme resolution
process using existing capabilities of the Domain Name System
(DNS) [3]. In principle, the path scheme protocol could use any
global, hierarchical name system that provides the necessary
functionality, but it is necessary to specify one protocol so clients
and servers can communicate. The main reason for using DNS is
that it is widely deployed and relatively stable.

The path name space may use the existing DNS name space, or a
newly created name space within DNS devoted to the path name
space, or some combination of both. (This draft does not specify
which will be used.)

A small amount of new code is required on the client side to drive
the resolution process, but generic proxy mechanisms available in
many WWW browsers may be used with a path proxy server to
share the process across a number of clients.

Resolving the Name into URL-sets
+++++++++++++++++++++++++++++++++

The implementation uses DNS TXT records that are typed, based
on the information they contain. At present, there is one type of path
TXT record beginning with "path-u". TXT records that begin with
"path-" are reserved for future extensions.

The "path-u" TXT record is followed by a single URL-prefix. Note
that a URL-prefix is not necessarily a full URL; it specifies a
resolution service and it is used to construct a full URL during
resolution. There may be multiple "path-u" TXT records for a single
DNS name, and each should logically specify equivalent resolution
services.

The DNS step of the resolution process proceeds as follows.

   1. The list of URL-sets is initialized to the empty list.

   2. The entire path URN, except the scheme and the opaque
      string, is converted to lowercase and then to DNS names
      (one name for each component of the path). For example,

          path:/A/B2/C1/doc.html is converted to
          /a/b2/c1/doc.html and then to the DNS names

          .            (the root of DNS)
          a.
          b2.a.
          c1.b2.a.

   3. For each of the DNS names, in order of the shortest name to
      longest name, all TXT records associated with it are
      requested using DNS resolvers.

      If there are any "path-u" TXT records for a particular DNS
      name, then a URL-set is constructed from the URL-prefixes
      in the TXT records and the set is added to the head of the
      list. The URLs in a URL-set are constructed by appending
      the remaining components of the path and the opaque string
      to each URL-prefix.

      For example, suppose that while resolving
      path:/A/B2/C1/doc.html, we discover that the TXT record
      corresponding to the DNS name b2.a. is
          "path-u http://ietf.org/path/docs"
      Since b2.a. corresponds to /A/B2, and the remainder of the
      path is "/c1/doc.html", the URL for this URL-prefix
      would be
          http://ietf.org/path/docs/c1/doc.html

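The DNS step above can be sketched in Python as follows. This is a
minimal sketch under stated assumptions, not a definitive
implementation: the function names (split_path_urn, dns_name,
url_sets) are hypothetical, and DNS itself is replaced by a
caller-supplied lookup_txt function that is assumed to return None
for an unknown DNS name and a (possibly empty) list of TXT strings
otherwise.

```python
def split_path_urn(urn):
    """Return (lowercased components, opaque string) of a path URN."""
    if not urn.startswith("path:/"):
        raise ValueError("not a path URN: " + urn)
    *components, opaque = urn[len("path:/"):].split("/")
    # Only the components are folded to lowercase, not the opaque string.
    return [c.lower() for c in components], opaque


def dns_name(components):
    """Map path components to a DNS name: ['a', 'b2'] -> 'b2.a.'."""
    return ".".join(reversed(components)) + "."


def url_sets(urn, lookup_txt):
    """Build the list of URL-sets, most specific set first."""
    components, opaque = split_path_urn(urn)
    result = []                                   # step 1: empty list
    for depth in range(len(components) + 1):      # shortest name first
        records = lookup_txt(dns_name(components[:depth]))
        if records is None:                       # unknown DNS name: done
            break
        prefixes = [r[len("path-u "):].strip()
                    for r in records if r.startswith("path-u ")]
        if prefixes:
            # Append the remaining components and the opaque string.
            remainder = "/".join(components[depth:] + [opaque])
            urls = [p.rstrip("/") + "/" + remainder for p in prefixes]
            result.insert(0, urls)                # add to head of list
    return result
```

For instance, with a dictionary named table that holds empty lists
for "." and "a." and the single record
"path-u http://ietf.org/path/docs" for "b2.a.", the call
url_sets("path:/A/B2/C1/doc.html", table.get) yields one URL-set
containing http://ietf.org/path/docs/c1/doc.html.
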
To clarify the above algorithm, some examples are presented. The
examples use the partial document tree specified previously. The
DNS entries for this partial tree are:

                 TXT
    a.           -none-
    b1.a.        "path-u http://ietf.org/path/docs"
    c2.b1.a.     "path-u http://www.org:70/docs"
    b2.a.        "path-u http://ietf.org/path/docs"
    d.c.b2.a.    "path-u http://www.org:70/docs"
                 "path-u http://w3c.org/docs/www"

Example lookups

/A/B1/C1/doc.html

    a.           no "path-u" record
                 repeat with b1.a.
    b1.a.        URL http://ietf.org/path/docs/c1/doc.html
                 repeat with c1.b1.a.
    c1.b1.a.     unknown DNS name - done

    List of URL-sets is

    {http://ietf.org/path/docs/c1/doc.html}

/A/B2/C/D/doc.html

    a.           no "path-u" record
                 repeat with b2.a.
    b2.a.        URL http://ietf.org/path/docs/c/d/doc.html
                 repeat with c.b2.a.
    c.b2.a.      no "path-u" record
                 repeat with d.c.b2.a.
    d.c.b2.a.    URL http://www.org:70/docs/doc.html
                 URL http://w3c.org/docs/www/doc.html
                 done

    List of URL-sets is

    {http://www.org:70/docs/doc.html, http://w3c.org/docs/www/doc.html}
    {http://ietf.org/path/docs/c/d/doc.html}

Resolving the URL-sets into the Resource
+++++++++++++++++++++++++++++++++++++++++

Once the list of URL-sets has been constructed, it must be resolved
into the named resource. The list of URL-sets could itself be an
object that may be passed back from proxy servers to clients or
cached for later use. But here we describe the resolution of the list
of URL-sets into the named resource independent of which agent
resolves it or whether it is a first class object.

A URL is selected from the first set (e.g., randomly) and resolution of
the URL is attempted. Any of the URLs may be URNs, even other
path URNs. The appropriate protocol, as indicated by the scheme of
the URL and user preference, is used to resolve it. For example, if
the URL were http://ietf.org/path/docs/c1/doc.html, then
the HTTP protocol is typically used to resolve that URL using the
GET method.

If the resolution fails because the URL service is unavailable,
another URL is selected from the set, until none are left; retries with
exponential backoff may then follow, or the path resolution process
may be declared a failure. Resolution may fail because the server
doesn't exist, or the connection times out before or after it is made,
or the server returns an error code indicating that the service is
unavailable.

Alternatively, if the resolution of a URL fails because the URL is
unknown, then the process is repeated with the next set in the list.
The process is repeated until the resolution succeeds or the list is
exhausted (which implies resolution failure).

If the resolution of a URL results in a redirection to yet another URL
(which may be a URN), then that redirection should be followed to
determine if it succeeds before declaring that the first URL has been
resolved.

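The two kinds of failure described above lead to different control
flow, which the following Python sketch tries to make explicit. It is
an illustration only: the exception classes Unavailable and Unknown,
the function resolve_url_sets, and the caller-supplied fetch function
are all hypothetical. The fetch function is assumed to follow
redirections itself and to raise Unavailable or Unknown as
appropriate. Retries with exponential backoff are omitted, so an
entirely unavailable set is declared a failure.

```python
import random


class Unavailable(Exception):
    """The URL service could not be reached or reported unavailability."""


class Unknown(Exception):
    """The service was reached but does not know the URL."""


def resolve_url_sets(url_sets, fetch, choose=random.choice):
    """Resolve a list of URL-sets, most specific set first."""
    for url_set in url_sets:
        candidates = list(url_set)
        while candidates:
            url = choose(candidates)
            candidates.remove(url)
            try:
                return fetch(url)
            except Unavailable:
                continue      # try another equivalent URL in this set
            except Unknown:
                break         # fall back to the next, less specific set
        else:
            # The loop ended without a break: every URL in this set
            # was unavailable (or the set was empty).
            raise Unavailable("no URL in the set could be reached")
    raise Unknown("list of URL-sets exhausted")
```

Passing choose as a parameter keeps the random selection replaceable,
for example by a policy that prefers a nearby server.
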
Management Issues
=================

(This section will describe what administrators of naming authorities
and resolvers need to do to manage their portion of the path name
space.)

Encoding Syntax
===============

The encoding rules may vary depending on the underlying
implementation, but, again, we assume DNS is used. Therefore, the
components of a path must be compatible with DNS <label> names.
Hex encodings must be used for uppercase characters in the name
that are to be distinguished from the corresponding lowercase
characters. Hex encoding is also required for dot ("."), the DNS
component separator, and slash ("/"), if it is used within a
component name or the opaque string. Here is a BNF description of
the encoding rules.

    <path-urn>   ::= "path:" <name>
    <name>       ::= <path> "/" [ <final-part> ]
    <path>       ::= "" | "/" <label> [ <path> ]

    <final-part> ::= any sequence of ASCII characters except "/"

    <label>      ::= any sequence of ASCII characters except "/" or "."

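A rough Python sketch of a syntax check based on the BNF above
follows. The names PATH_URN and parse_path_urn are hypothetical. The
sketch assumes that a label is non-empty, and it leaves hex-encoded
characters untouched because this draft does not state which escape
form is used.

```python
import re

# "path:" then zero or more "/label" parts, one "/", and an optional
# final part that contains no "/".
PATH_URN = re.compile(r"path:((?:/[^/.]+)*)/([^/]*)\Z")


def parse_path_urn(urn):
    """Return (labels, final_part); raise ValueError if malformed."""
    match = PATH_URN.match(urn)
    if match is None or not urn.isascii():
        raise ValueError("malformed path URN: " + urn)
    # group(1) looks like "/A/B/C" (or ""), so drop the empty first item.
    labels = match.group(1).split("/")[1:]
    return labels, match.group(2)
```

Under these assumptions, path:/A/B/C/doc.html yields the labels A, B,
and C with the final part doc.html, and a collection name such as
path:/A/B/ yields an empty final part.
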
URN Requirements
================

The path scheme meets all of the requirements for Uniform
Resource Names, as described in [2]. For each functional
requirement, we discuss how the path scheme is in conformance
with it. We also discuss conformance to the encoding requirements.

Functional Requirements
+++++++++++++++++++++++

   o Global scope: The root of the path name space will be known
     to all clients, and for each node in the hierarchical name
     space, the corresponding resolution service will know all its
     subnodes. This guarantees that any particular path URN will
     have the same meaning for each client.

   o Global uniqueness: Each node in the hierarchical name
     space corresponds to a naming authority that is responsible
     for guaranteeing uniqueness within that portion of the name
     space, or for delegating that responsibility to a
     sub-authority.

   o Persistence: To help guarantee that path URNs remain
     useful as long as they are needed, the path scheme allows
     any subtree of the name space to be served at any net
     location, and this location may be changed without having to
     change names. Additionally, a fallback mechanism is
     provided by the protocol in case a resolver does not wish to
     forward requests.

   o Scalability: Assignment of path names is scalable for an
     arbitrarily large number of names since the assignment
     process is distributed across an arbitrarily large number of
     naming authorities. The name resolution process is also
     scalable for any number of names and clients, as discussed
     below under "Resolution". Each naming authority and
     resolution service need know about only a small number of
     neighboring authorities and services.

   o Legacy support: A legacy naming scheme may be supported
     in the path URN scheme by assigning it a well-known path
     naming authority, preferably near the root. Hierarchical
     names may be mapped to appropriate path names, or any
     names may be embedded in the opaque string of a path
     name. E.g., path:/isbn/1-884777-01-5 or
     path:/isbn/1/884777/01/5

   o Extensibility: Same as for legacy support.

   o Independence: Every path naming authority is constrained
     by the requirements of the path scheme (e.g., components of
     the path must follow the encoding rules), but control of
     whether a naming authority issues a conforming name in its
     name space is up to that authority alone.

   o Resolution: The path scheme facilitates efficient resolution of
     path URNs. The hierarchical nature of the name space
     allows clients to use caches of remote resolution server
     locations, so clients rarely need to query servers near the
     top of the hierarchy. For additional scalability, a server may
     delegate resolution of parts of its name space to other
     servers, and clients may then bypass contacting the original
     server.

There is an implied assumption in the URN requirements document
that names resolve into locations or metadata as opposed to the
resources themselves. This is based on the need for indirection to
allow the resource to change location, which we agree with.
However, a path name is actually a dynamic location since the
resolution process always finds the current location of the resolvers
along the path. So there is no need to impose the requirement of an
explicit indirection solely for the purpose of finding the current
location.

Encoding Requirements
+++++++++++++++++++++

The encoding syntax for path URNs conforms to the requirements
for generic URLs and for URNs.

Security Considerations
=======================

The decentralized path scheme is arguably less vulnerable to
attack than are centralized services.

The path scheme depends on DNS for most of the resolution
process, and insofar as DNS is secure or insecure, so is the path
scheme. A more complete reference of relevant weaknesses
should be included here.

The hierarchical path scheme allows security constraints to be
imposed on just the subtree of names that require it. The resolution
process hides whether a name actually is resolvable by first
requesting authentication.

References
==========

   1. Berners-Lee, T., Masinter, L., McCahill, M. (editors),
      "Uniform Resource Locators (URL)", RFC 1738, December
      1994. ftp://ds.internic.net/rfc/rfc1738.txt

   2. Sollins, K., Masinter, L., "Functional Requirements for Uniform
      Resource Names", RFC 1737, December 1994.
      ftp://ds.internic.net/rfc/rfc1737.txt

   3. Mockapetris, P., "Domain Names - Implementation and
      Specification", RFC 1035, November 1987.
      ftp://ds.internic.net/rfc/rfc1035.txt

   4. Berners-Lee, T., Fielding, R. T., Frystyk Nielsen, H., HTTP
      Internet-Draft, "Hypertext Transfer Protocol -- HTTP/1.0".
      The name of the draft at the time of this writing is
      "draft-ietf-http-v10-spec-00.txt".

Author Contact Information
==========================

Daniel LaLiberte
National Center for Supercomputing Applications
152 Computing Applications Building
605 East Springfield Avenue
Champaign, IL 61820
Tel: (217) 244-0013
liberte@ncsa.uiuc.edu

Michael Shapiro
National Center for Supercomputing Applications
152 Computing Applications Building
605 East Springfield Avenue
Champaign, IL 61820
Tel: (217) 244-6642
mshapiro@ncsa.uiuc.edu
