INTERNET-DRAFT

The Path URN Specification
**************************

draft-ietf-uri-urn-path-00.txt
Expires Sept 25, 1995

Daniel LaLiberte <liberte@ncsa.uiuc.edu>
Michael Shapiro <mshapiro@ncsa.uiuc.edu>

This document is also available in HTML at:

<URL: http://union.ncsa.uiuc.edu/~liberte/www/path.html>

Status of this memo
===================

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress."

To learn the current status of any Internet-Draft, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).

This Internet Draft expires Sept 25, 1995.

Last modified: Mon Mar 20 12:13:51 1995

Abstract
========

A new "path" URN scheme is proposed that defines a uniformly
hierarchical name space. The resolution of a path URN is a two-step
process: locating the resolution server and locating the resource
within the server. Existing DNS capabilities are used to locate the
resolution server and HTTP is used as the protocol for locating a
resource within the server.

Introduction
============

Conceptually, the path scheme defines a uniformly hierarchical name
space. A path is a sequence of components and an optional opaque
string. An example path is:

   path:/A/B/C/doc.html

Names are assigned by naming authorities that are responsible for a
subtree of the name space, and naming authorities may delegate
responsibility to sub-authorities. Each naming authority corresponds
to a name resolution service, which may be shared by several
naming authorities.

In this document, we first describe the name resolution process
conceptually. This is followed by a detailed description of our
(planned) implementation, the encoding rules, and a discussion of
URN requirements.

The Name Resolution Process
===========================

This section describes the resolution process conceptually but not
completely. See the implementation section for the details.

The name resolution process involves two steps: First we traverse
the path left to right until we find a most-specific server, then we
interact with that server to resolve the remainder of the path name.
The server has the option of returning a redirection to a URL.

The resolution process starts at the path name root located at some
fixed, globally known network address. The root corresponds to a
name resolution service which resolves the first component of a path
into the address of another node. Generally, each node in the
hierarchy resolves a path component into another node at the next
lower level. This process repeats until no more-specific resolver is
found.

The name resolver for each node must tell clients whether there is a
more-specific resolver for the given path. This information will be
used by clients to avoid requesting resolution for components of the
path that do not have a more-specific resolver. If there is a
more-specific resolver, then the client proceeds with the process of
requesting subsequent components of the path. If there is not a
more-specific resolver, then this first phase of the resolution
process is completed.

Clients are expected to make use of caches to retain information
about recently visited name resolvers so that resolution of a path
can start from the most-specific known resolver instead of at the
root.

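The caching behavior described above amounts to a longest-prefix
lookup. The following Python fragment is only a sketch of that idea
and is not part of the path scheme: the ResolverCache class, its
method names, and the (host, port) address form are hypothetical.

```python
from typing import Dict, Optional, Tuple

Address = Tuple[str, int]  # hypothetical (host, port) form of a resolver address


def _labels(prefix: str) -> Tuple[str, ...]:
    # Path components are compared without regard to case.
    return tuple(c.lower() for c in prefix.split("/") if c)


class ResolverCache:
    """Remembers resolvers by the path prefix they were found under."""

    def __init__(self) -> None:
        self._known: Dict[Tuple[str, ...], Address] = {}

    def remember(self, prefix: str, address: Address) -> None:
        self._known[_labels(prefix)] = address

    def most_specific(self, path: str) -> Optional[Tuple[str, Address]]:
        """Return the longest cached prefix of `path`, or None to start at the root."""
        labels = _labels(path.rpartition("/")[0])  # ignore the final part
        for end in range(len(labels), 0, -1):
            key = labels[:end]
            if key in self._known:
                return "/" + "/".join(key), self._known[key]
        return None
```

A client would call most_specific() before contacting the root and
fall back to the root only when it returns None.
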
Once the most-specific resolver is found for a particular path, it
returns the address of a separate terminal resolver to the client. The
client then sends the full path to this terminal resolver. The path
scheme defines the protocol for interacting with the terminal resolver
as HTTP.

The result of the terminal resolution may be any document, identified
by Content-type, or it may be a redirection to a URL. The URL may
be, for example, an http URL or another path URN.

Implementation of Resolution
============================

The implementation of the resolution process follows the abstract
two-step process. The first step resolves the name into an IP
address and a port number. The second step involves contacting a
server at the IP address and port number returned by the first step
and, using the HTTP protocol, issuing a GET of the entire URN.

Resolving the name into a server and port number
+++++++++++++++++++++++++++++++++++++++++++++++++

The resolution of a name into a server and port number is done
using existing DNS capabilities. As an aid for the discussion that
follows, the following partial document tree is used:

                    /
                    |
                    A
                    |
       --------------------------
       |                        |
      B1*                      B2*
       |                        |
   ----------                   |
   |        |                   |
   C1      C2*                  C
                                |
                                D*

The nodes marked with * are server nodes. They have one or more
(IP-address, port) pairs associated with them.

   /A/B1 serves all documents under /A/B1 except /A/B1/C2
   /A/B2 serves all documents under /A/B2 except /A/B2/C/D

The resolution process proceeds as follows.

1. The entire URN, except the scheme and the final component,
   is converted to a DNS name appended with ".path.urn". For
   example,

      path:/A/B2/C1/doc.html is converted to
      c1.b2.a.path.urn

2. Partial-names are built starting with the last three
   components of the DNS name and iteratively adding
   components. All DNS records associated with this
   partial-name are requested using DNS resolvers.

   o If the TXT record is missing, then the URN does not
     resolve into a server and the URN is assumed to be
     invalid.

   o If there is an A record, then this is a server node. The
     TXT record lists sub-nodes not handled by this
     server.

      o If none of the sub-nodes listed in the TXT
        record match, then this is the server.

      o Else this implies that there is a DNS entry for
        the sub-node. The matching component is
        added to the partial-name to form a new
        partial-name and this step is repeated.

   o If there is no A record

      o If no A record has been encountered up to this
        point, the next component of the URN is added
        to the partial-name to form a new partial-name
        and this step is repeated.

      o If at least one A record has been encountered up
        to this point

         o If none of the sub-nodes listed in the
           TXT record match the remaining
           components of the path, then the most
           recent partial-name that had an A
           record is the server for this name.

         o Else this implies that there is a DNS
           entry for the sub-node. The matching
           component is added to the partial-name
           to form a new partial-name and this step
           is repeated.

   Once the server DNS entry is located, the IP-address(es)
   are extracted from the A record and the associated port
   number(s) extracted from the TXT record.

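Step 1 and the starting point of step 2 can be sketched in Python as
follows. This is an illustrative sketch only; the function names are
hypothetical. Note that partial_names() simply lists the candidate
names from shortest to longest, whereas the full algorithm may skip
candidates when a TXT record names a multi-label sub-node.

```python
from typing import List

SUFFIX = "path.urn"


def urn_to_dns_name(urn: str) -> str:
    """Step 1: drop the scheme and the final component, then reverse the labels."""
    scheme, sep, name = urn.partition(":")
    if not sep or scheme != "path" or not name.startswith("/"):
        raise ValueError(f"not a path URN: {urn!r}")
    directory = name.rpartition("/")[0]  # everything before the final component
    labels = [label.lower() for label in directory.split("/") if label]
    if not labels:
        raise ValueError(f"no components to resolve in {urn!r}")
    return ".".join(reversed(labels)) + "." + SUFFIX


def partial_names(dns_name: str) -> List[str]:
    """Step 2: candidate partial-names, starting with the last three labels."""
    labels = dns_name.split(".")
    return [".".join(labels[-n:]) for n in range(3, len(labels) + 1)]
```

For the example in step 1, urn_to_dns_name("path:/A/B2/C1/doc.html")
yields "c1.b2.a.path.urn", and partial_names() of that result starts
with "a.path.urn".
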
To clarify the above algorithm, some examples are presented. The
examples use the partial document tree specified previously. The
DNS entries for this partial tree are:

                       TXT           A
   a.path.urn          -empty-       -none-
   b1.a.path.urn       c2, port=n    ip-address
   c2.b1.a.path.urn    port=n        ip-address
   b2.a.path.urn       d.c, port=n   ip-address
   d.c.b2.a.path.urn   port=n        ip-address

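The table suggests that a TXT value carries both the excluded
sub-nodes and the port. The exact record syntax is not specified
here, so the following Python sketch assumes a comma-separated list
in which a "port=" field gives the port and every other field names
a sub-node; the parse_txt function is hypothetical.

```python
from typing import List, Optional, Tuple


def parse_txt(txt: str) -> Tuple[List[str], Optional[int]]:
    """Split a TXT value into (sub-nodes not handled by this server, port)."""
    sub_nodes: List[str] = []
    port: Optional[int] = None
    for field in txt.split(","):
        field = field.strip().lower()
        if not field:
            continue  # an empty TXT record lists nothing
        if field.startswith("port="):
            port = int(field[len("port="):])
        else:
            sub_nodes.append(field)
    return sub_nodes, port
```

With a concrete port substituted for n, the entry for b2.a.path.urn
would parse into the sub-node list ["d.c"] and that port number.
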
Example lookups

   /A/B1/C1/doc.ps

      a.path.urn          no A record
                          repeat with b1.a.path.urn
      b1.a.path.urn       has A record, TXT doesn't have c1
                          this is the server

   /A/B2/C/D/doc.ps

      a.path.urn          no A record
                          repeat with b2.a.path.urn
      b2.a.path.urn       has A record, TXT has d.c
                          repeat with d.c.b2.a.path.urn
      d.c.b2.a.path.urn   has A record
                          this is the server

Alternatively, there could be an entry for c.b2.a.path.urn instead
of it being subsumed in b2.a.path.urn:

                       TXT           A
   a.path.urn          -empty-       -none-
   b2.a.path.urn       c, port=n     ip-address
   c.b2.a.path.urn     d             -none-
   d.c.b2.a.path.urn   port=n        ip-address

The lookups proceed as

   /A/B2/C/D/doc.ps

      a.path.urn          no A record
                          repeat with b2.a.path.urn
      b2.a.path.urn       has A record, TXT has c
                          repeat with c.b2.a.path.urn
      c.b2.a.path.urn     no A record, TXT has d
                          repeat with d.c.b2.a.path.urn
      d.c.b2.a.path.urn   has A record
                          this is the server

   /A/B2/C/E/doc.ps

      a.path.urn          no A record
                          repeat with b2.a.path.urn
      b2.a.path.urn       has A record, TXT has c
                          repeat with c.b2.a.path.urn
      c.b2.a.path.urn     no A record, TXT does not have e
                          server at b2.a.path.urn

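The lookups in this section can be reproduced against an in-memory
stand-in for the DNS. The Python sketch below is one reading of the
algorithm, not a definitive implementation: the Zone type, the
find_server function, and the 192.0.2.x addresses are hypothetical,
ports are omitted, and a name absent from the table stands for a
missing TXT record.

```python
from typing import Dict, List, Optional, Tuple

# name -> (sub-nodes listed in TXT, address from the A record or None)
Zone = Dict[str, Tuple[List[str], Optional[str]]]


def _match_length(sub_nodes: List[str], remaining: List[str]) -> int:
    """Number of leading path components matched by a TXT sub-node, else 0."""
    for sub in sub_nodes:
        labels = list(reversed(sub.split(".")))  # "d.c" covers components c, d
        if remaining[:len(labels)] == labels:
            return len(labels)
    return 0


def find_server(urn: str, zone: Zone) -> Optional[str]:
    """Return the DNS name of the server node for `urn`, or None if invalid."""
    directory = urn.partition(":")[2].rpartition("/")[0]
    remaining = [c.lower() for c in directory.split("/") if c]
    if not remaining:
        return None
    name = remaining.pop(0) + ".path.urn"
    server: Optional[str] = None
    while True:
        if name not in zone:  # TXT record missing: the URN is invalid
            return None
        sub_nodes, address = zone[name]
        if address is not None:
            server = name  # most recent partial-name with an A record
        if server is None:
            if not remaining:
                return None
            step = 1  # no server seen yet: add the next component
        else:
            step = _match_length(sub_nodes, remaining)
            if step == 0:
                return server
        consumed, remaining = remaining[:step], remaining[step:]
        name = ".".join(reversed(consumed)) + "." + name


FIRST: Zone = {
    "a.path.urn": ([], None),
    "b1.a.path.urn": (["c2"], "192.0.2.1"),
    "c2.b1.a.path.urn": ([], "192.0.2.2"),
    "b2.a.path.urn": (["d.c"], "192.0.2.3"),
    "d.c.b2.a.path.urn": ([], "192.0.2.4"),
}
ALTERNATIVE: Zone = {
    "a.path.urn": ([], None),
    "b2.a.path.urn": (["c"], "192.0.2.3"),
    "c.b2.a.path.urn": (["d"], None),
    "d.c.b2.a.path.urn": ([], "192.0.2.4"),
}
```

Calling find_server("path:/A/B2/C/E/doc.ps", ALTERNATIVE) follows the
last lookup shown above and ends at b2.a.path.urn.
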
Locating the Resource
+++++++++++++++++++++

The full path URN is passed to the server using the HTTP protocol
as a GET request. The server must either return a full response
(with HTTP header and response), or a URI-header in HTTP
message types 301 (moved permanently) or 302 (moved
temporarily). For the redirect messages, the client should process
the URLs normally.

If the HTTP server returns a full response, the object returned could
be the named object itself, or it might be metadata for the object. In
either case, it would be identified by the Content-type header line. If
and when URC standards are defined, clients that are capable of
handling URCs can indicate that in the Accept header line. For clients
that cannot handle URCs, the server could automatically process
the URC to instead return a URL for the object, or it could return the
object itself.

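The client side of this exchange can be sketched in Python as two
small functions. This is a sketch under stated assumptions, not a
definitive client: the function names are hypothetical, the request
is written in HTTP/1.0 form, and accepting either a "URI" or a
"Location" header as the redirect target is an assumption.

```python
from typing import Dict, Tuple, Union

Result = Tuple[str, Union[str, int, Tuple[str, bytes]]]


def build_request(urn: str) -> bytes:
    """Form a GET request whose target is the entire path URN."""
    return f"GET {urn} HTTP/1.0\r\n\r\n".encode("ascii")


def interpret_response(status: int, headers: Dict[str, str], body: bytes) -> Result:
    """Classify a reply as a redirect, a document, or an error."""
    fields = {key.lower(): value.strip() for key, value in headers.items()}
    if status in (301, 302):
        # Assumption: the target URL is carried in Location or URI.
        target = fields.get("location") or fields.get("uri", "")
        return "redirect", target.strip("<>")
    if status == 200:
        # The Content-type tells the client what kind of object came back.
        content_type = fields.get("content-type", "application/octet-stream")
        return "document", (content_type, body)
    return "error", status
```

A redirect target may itself be a path URN, in which case the client
would begin the two-step resolution again with the new name.
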
Encoding Syntax
===============

   <path-urn>    ::= "path:" <name>
   <name>        ::= <path> "/" [ <final-part> ]
   <path>        ::= "" | "/" <label> [ <path> ]

   <final-part>  ::= one or more ASCII characters other than "/"

   <label>       ::= <letter> [ [ <ldh-str> ] <let-dig> ]
   <ldh-str>     ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>
   <let-dig-hyp> ::= <let-dig> | "-"
   <let-dig>     ::= <letter> | <digit>
   <letter>      ::= A..Z | a..z
   <digit>       ::= 0..9

Note that <label> is defined using the same rules as the domain
name <label>. RFC 1035 specifies that

   "... while upper and lower case letters are allowed in domain
   names, no significance is attached to the case. That is, two
   names with the same spelling but different case are to be
   treated as identical.

   "The labels must follow the rules for ARPANET host names.
   They must start with a letter, end with a letter or digit, and
   have as interior characters only letters, digits, and hyphens.
   There are also some restrictions on the length. Labels must
   be 63 characters or less."

This document specifies that <label> have the same rules as the
<label> in RFC 1035.

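The grammar above can be checked mechanically. The Python sketch
below transcribes it into a regular expression and adds the 63
character label limit; the parse_path_urn function name is
hypothetical, and the final part is treated as a possibly empty
string of ASCII characters other than "/".

```python
import re
from typing import List, Tuple

# <label>: a letter, optionally followed by letters, digits, or hyphens
# and ending in a letter or digit.
LABEL = r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?"

# "path:" <path> "/" [ <final-part> ]
PATH_URN = re.compile(rf"path:((?:/{LABEL})*)/([\x00-\x2e\x30-\x7f]*)\Z")


def parse_path_urn(urn: str) -> Tuple[List[str], str]:
    """Return (labels, final-part) or raise ValueError if `urn` is malformed."""
    match = PATH_URN.match(urn)
    if match is None:
        raise ValueError(f"not a valid path URN: {urn!r}")
    labels = [label for label in match.group(1).split("/") if label]
    if any(len(label) > 63 for label in labels):
        raise ValueError("labels must be 63 characters or less")
    return labels, match.group(2)
```

For example, parse_path_urn("path:/A/B/C/doc.html") separates the
labels A, B, and C from the final part doc.html, while a component
such as "-b" is rejected because a label must start with a letter.
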
Naming Collections
++++++++++++++++++

A prefix of a name may be declared by the corresponding naming
authority as the name of a collection. Such a prefix must end with a
final "/". The behavior of resolving the name of a collection is
undefined at this point.

URN Requirements
================

The path scheme meets most of the requirements for Uniform
Resource Names, as described in [2]. For each functional
requirement, we discuss how the path scheme is in conformance
with it or why it should not be a consideration. We also discuss
conformance to the encoding requirements.

[These comments regarding the URN requirements themselves
should perhaps be in another document, or in a revision of the URN
Requirements document.]

Functional Requirements
+++++++++++++++++++++++

o Global scope: The root of the path name space will be known
  to all clients, and for each node in the hierarchical name
  space, the corresponding resolution service will know all its
  subnodes. This guarantees that any particular path URN will
  have the same meaning for each client.

o Global uniqueness: Each node in the hierarchical name
  space corresponds to a naming authority that is responsible
  for guaranteeing uniqueness within that portion of the name
  space, or for delegating that responsibility to a sub-authority.

o Persistence: To help guarantee that path URNs remain useful
  as long as they are needed, the path scheme allows any
  subtree of the name space to be served at any net location,
  and this location may be changed without having to change
  names. But there will always exist names that no one wants to
  continue to support indefinitely.

o Scalability: Assignment of path names is scalable for an
  arbitrarily large number of documents because the
  assignment process is distributed across an arbitrarily large
  number of naming authorities. The name resolution process is
  also scalable for any number of documents and clients, as
  discussed below under "Resolution". Each naming authority
  and resolution service need know about only a small number
  of neighboring authorities and services.

o Legacy support: The path URN scheme does not itself
  support existing legacy naming schemes, but it permits them
  to be supported outside of the path scheme via the
  extensible, generic URL scheme.

o Extensibility: New URN schemes may be supported outside of
  the path scheme via the extensible, generic URL scheme.

o Independence: Every path naming authority is constrained by
  the requirements of the path scheme (e.g. components of the
  path must follow the encoding rules), but control of whether a
  naming authority issues a conforming name in its name space
  is up to that authority alone.

o Resolution: The path scheme facilitates efficient resolution of
  path URNs. The hierarchical nature of the name space allows
  clients to use caches of remote resolution server locations,
  so clients rarely need to query servers near the top of the
  hierarchy. For additional scalability, a server may delegate
  resolution of parts of its name space to other servers, and
  clients would then bypass contacting the original server.

There is an implied assumption in the URN requirements document
that names resolve into locations as opposed to the documents
themselves. This assumption is predicated on the need for
independence from static location, which we agree with. However, a
path name is actually a dynamic location since the resolution
process always finds the current location of the resolvers along the
path. So there is no need to impose the additional indirection of a
map from names to locations solely for the purpose of finding the
current location. There are other advantages of indirection, however.

Instead, the path scheme permits different types of documents to be
returned from the resolution process, identified by Content-types as
defined by the HTTP protocol, or locations may be returned via
Redirect commands.

Encoding Requirements
+++++++++++++++++++++

The encoding syntax for path URNs conforms to the requirements for
generic URLs. Since we intend paths to be used as URNs, the
encoding syntax must also conform to the encoding requirements of
URNs.

The encoding requirements for URNs are met by the path scheme
except potentially for the simple comparison requirement. The path
scheme may be used in such a way that a single resource has only
one path name, and this constraint would be consistent with the
simple comparison requirement. But this requirement does not
specify the intended meaning of a comparison. The intention might
be that if two URNs are compared, inequality implies that the two
resources named by the URNs must necessarily be different. On the
other hand, the comparison might be intended only to find out if the
names themselves are supposed to be equivalent, modulo variation
in character sets and whitespace.

In general, we must allow that a single resource may have multiple
names by different naming schemes. So the simple comparison
requirement cannot be met across multiple naming schemes. Is
there sufficient advantage for the constraint that a resource have
only one name per naming scheme? Tools (such as browsers and
caches) should be made to work with the knowledge that resources
do not necessarily have a single name, by perhaps remembering the
canonical name for a resource in addition to its alternative names.

References
==========

1. Berners-Lee, T., Masinter, L., McCahill, M. (editors), "Uniform
   Resource Locators (URL)", RFC 1738, December 1994.
   ftp://ds.internic.net/rfc/rfc1738.txt

2. Sollins, K., Masinter, L., "Functional Requirements for Uniform
   Resource Names", RFC 1737, December 1994.
   ftp://ds.internic.net/rfc/rfc1737.txt

3. Mockapetris, P., "Domain Names - Implementation and
   Specification", RFC 1035, November 1987.
   ftp://ds.internic.net/rfc/rfc1035.txt

4. Fielding, R., HTTP

Author Contact Information
==========================

Daniel LaLiberte
National Center for Supercomputing Applications
152 Computing Applications Building
605 East Springfield Avenue
Champaign, IL 61820
Tel: (217) 244-0013
liberte@ncsa.uiuc.edu

Michael Shapiro
National Center for Supercomputing Applications
152 Computing Applications Building
605 East Springfield Avenue
Champaign, IL 61820
Tel: (217) 244-6642
mshapiro@ncsa.uiuc.edu
