Internet-Draft                                          Ryan Moats
draft-ietf-urn-syntax-03.txt                                  AT&T
Expires in six months                                   March 1997

URN Syntax
Filename: draft-ietf-urn-syntax-03.txt

Status of This Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the
``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).

Abstract

Uniform Resource Names (URNs) are intended to serve as persistent,
location-independent, resource identifiers. This document sets forth
the canonical syntax for URNs. It also discusses existing legacy and
new namespaces, and the requirements for URN presentation and
transmission. Finally, it discusses URN equivalence and how to
determine it.

1. Introduction

Uniform Resource Names (URNs) are intended to serve as persistent,
location-independent, resource identifiers and are designed to make
it easy to map other namespaces (which share the properties of URNs)
into URN-space. Therefore, the URN syntax provides a means to encode
character data in a form that can be sent in existing protocols,
transcribed on most keyboards, etc.

Expires 9/30/97                                           [Page 1]

INTERNET DRAFT               URN Syntax                 March 1997

2. Syntax

All URNs have the following syntax (phrases enclosed in quotes are
REQUIRED):

   <URN> ::= "urn:" <NID> ":" <NSS>

where <NID> is the Namespace Identifier, and <NSS> is the Namespace
Specific String. The leading "urn:" sequence is case-insensitive.
The Namespace ID determines the _syntactic_ interpretation of the
Namespace Specific String (as discussed in [1]).
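As a rough illustration of the top-level production, the following Python sketch splits a string into its NID and NSS. The function name parse_urn is hypothetical; it only checks the "urn:" <NID> ":" <NSS> shape and leaves validation of each part to the rules in sections 2.1 and 2.2.

```python
from typing import Optional, Tuple


def parse_urn(urn: str) -> Optional[Tuple[str, str]]:
    """Split a URN into (NID, NSS), or return None if the string does
    not have the "urn:" <NID> ":" <NSS> shape. Neither part is validated."""
    # The leading "urn:" is case-insensitive, so compare it lower-cased.
    if urn[:4].lower() != "urn:":
        return None
    # A NID can never contain ":", so the first colon after the prefix
    # ends the NID; any later colons belong to the NSS.
    nid, sep, nss = urn[4:].partition(":")
    if not sep or not nid or not nss:
        return None
    return nid, nss
```

For example, "URN:isbn:0-395:x" splits into ("isbn", "0-395:x"), while "urn:foo" and "urn::x" are rejected.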

RFC 1630 [2] and RFC 1737 [3] each present additional considerations
for URN encoding, which have the effect of restricting the syntax.
On the other hand, the requirement to support existing legacy naming
systems has the effect of broadening the syntax. Thus, we discuss the
acceptable syntax for the Namespace Identifier and for the Namespace
Specific String separately.

2.1 Namespace Identifier Syntax

The following is the syntax for the Namespace Identifier. To (a) be
consistent with all potential resolution schemes and (b) not put any
undue constraints on any potential resolution scheme, the syntax for
the Namespace Identifier is:

   <NID>         ::= <let-num> [ 1,31<let-num-hyp> ]

   <let-num-hyp> ::= <upper> | <lower> | <number> | "-"

   <let-num>     ::= <upper> | <lower> | <number>

   <upper>       ::= "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" |
                     "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" |
                     "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" |
                     "Y" | "Z"

   <lower>       ::= "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" |
                     "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" |
                     "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" |
                     "y" | "z"

   <number>      ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
                     "8" | "9"

This is slightly more restrictive than what is stated in [4] (which
allows the characters "." and "+"). Further, the Namespace
Identifier is case-insensitive, so that "ISBN" and "isbn" refer to
the same namespace.

To avoid confusion with the "urn:" identifier, the NID "urn" is
reserved and MUST NOT be used.
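The NID grammar and the reservation of "urn" can be checked together. This is a minimal Python sketch, assuming the hypothetical helper name is_valid_nid; the bracketed "[ 1,31<let-num-hyp> ]" is read as an optional run of 1 to 31 characters, so a NID is 1 to 32 characters long in total.

```python
import re

# <NID> ::= <let-num> [ 1,31<let-num-hyp> ]: a letter or digit, then
# 0 to 31 more letters, digits, or hyphens (1 to 32 characters total).
_NID_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]{0,31}")


def is_valid_nid(nid: str) -> bool:
    """Check a Namespace Identifier against the NID grammar."""
    if _NID_RE.fullmatch(nid) is None:
        return False
    # NIDs are case-insensitive, so "URN" and "Urn" are reserved as well.
    return nid.lower() != "urn"
```

For example, "isbn" and "ISBN" both pass, while "urn", "foo.bar" (which [4] would allow), and a 33-character string are rejected.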

2.2 Namespace Specific String Syntax

As required by RFC 1737, there is a single canonical representation
of the NSS portion of a URN. The format of this single canonical
form follows:

   <NSS>       ::= 1*<URN chars>

   <URN chars> ::= <trans> | "%" <hex> <hex>

   <trans>     ::= <upper> | <lower> | <number> | <other> | <reserved>

   <hex>       ::= <number> | "A" | "B" | "C" | "D" | "E" | "F" |
                   "a" | "b" | "c" | "d" | "e" | "f"

   <other>     ::= "(" | ")" | "+" | "," | "-" | "." |
                   ":" | "=" | "@" | ";" | "$" |
                   "_" | "!" | "*" | "'"

Depending on the rules governing a namespace, valid identifiers in a
namespace might contain characters that are not members of the URN
character set above (<URN chars>). Such strings MUST be translated
into canonical NSS format before using them as protocol elements or
otherwise passing them on to other applications. Translation is done
by encoding each character outside the URN character set as a
sequence of one to six octets using UTF-8 encoding, and then encoding
each of those octets as "%" followed by two characters from the
<hex> character set above. The two characters give the hexadecimal
representation of that octet.
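The translation rule can be sketched as follows. This Python example is an illustration under stated assumptions rather than a reference implementation: the name to_canonical_nss is hypothetical, upper-case hex digits are an arbitrary choice, and it also %-encodes a literal "%" (section 2.3.1) and the characters "/", "?", and "#" (section 2.3.2), which is one reasonable reading of those rules.

```python
import string

# Characters that may appear literally in a canonical NSS: <upper>,
# <lower>, <number>, and <other>. The <reserved> characters "%", "/",
# "?", and "#" are deliberately left out so that they get %-encoded.
_SAFE = frozenset(string.ascii_letters + string.digits + "()+,-.:=@;$_!*'")


def to_canonical_nss(raw: str) -> str:
    """Translate a namespace-native string into canonical NSS form."""
    out = []
    for ch in raw:
        if ch in _SAFE:
            out.append(ch)
        else:
            # Encode the character as UTF-8 and write each octet as "%"
            # plus two hex digits (upper case chosen here; the <hex>
            # production allows either case).
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)
```

Python's urllib.parse.quote is not used here because it always leaves "~" unencoded, while section 2.4 lists "~" as excluded.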

2.3 Reserved characters

The remaining character set referred to above is the reserved
character set, which contains various characters reserved from
normal use. The reserved character set follows, with a discussion of
why each character is reserved.

The reserved character set is:

   <reserved>  ::= "%" | "/" | "?" | "#"

2.3.1 The "%" character

The "%" character is reserved in the URN syntax for introducing the
escape sequence for an octet. Literal use of the "%" character in a
namespace must be encoded using "%25" in URNs for that namespace.
Every "%" character in a URN MUST be followed by two characters from
the <hex> character set.
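Combining the <URN chars> grammar from section 2.2 with the rule that every "%" introduces a two-digit escape gives a simple validity check. This is a minimal Python sketch with the hypothetical name is_valid_nss; it also rejects "%00", anticipating the octet 0 rule in section 2.4, and it accepts unencoded "/", "?", and "#", which the grammar permits even though section 2.3.2 discourages them.

```python
import re

# One or more <URN chars>: a literal <upper>, <lower>, <number>,
# <other>, "/", "?", or "#", or a "%" followed by exactly two <hex>.
# A bare "%" is rejected because it must always introduce an escape.
_NSS_RE = re.compile(r"(?:[A-Za-z0-9()+,\-.:=@;$_!*'/?#]|%[0-9A-Fa-f]{2})+")


def is_valid_nss(nss: str) -> bool:
    """Check that a string is a well-formed canonical NSS."""
    if _NSS_RE.fullmatch(nss) is None:
        return False
    # Octet 0 is never allowed, not even in %-encoded form.
    return "%00" not in nss
```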

Namespaces MAY designate one or more characters from the URN
character set as having special meaning for that namespace. If the
namespace also uses that character in a literal sense, each literal
occurrence MUST be encoded as "%" followed by the hexadecimal
representation of its octet. Further, a character MUST NOT be
"%"-encoded if the character is not a reserved character. Therefore,
the process of registering a namespace identifier shall include
publication of a definition of which characters have a special
meaning to that namespace.

2.3.2 The other reserved characters

RFC 1630 [2] reserves the characters "/", "?", and "#" for particular
purposes. The URN-WG has not yet debated the applicability and
precise semantics of those purposes as applied to URNs. Therefore,
these characters are RESERVED for future developments. Namespace
developers SHOULD NOT use these characters in unencoded form, but
should rather use the appropriate %-encoding for each character.

2.4 Excluded characters

The following list is included only for the sake of completeness.
Any octets/characters on this list are explicitly NOT part of the URN
character set, and if used in a URN, MUST be %-encoded:

   <excluded> ::= octets 1-32 (1-20 hex) | "\" | """ | "&" | "<"
                | ">" | "[" | "]" | "^" | "`" | "{" | "|" | "}" | "~"
                | octets 127-255 (7F-FF hex)

In addition, octet 0 (0 hex) should NEVER be used, in either
unencoded or %-encoded form.

A URN ends when an octet/character from the excluded character set
(<excluded>) is encountered. The character from the excluded
character set is NOT part of the URN.
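The termination rule lets software pick a URN out of surrounding text. The following Python sketch is deliberately naive and its names (is_excluded, extract_urn) are hypothetical: it finds the first case-insensitive "urn:", does not check the character that precedes it or validate the result, and treats every non-ASCII character as excluded because its UTF-8 octets are all 128 or above.

```python
import re
from typing import Optional

# <excluded> punctuation; control octets and octets 127-255 are
# handled numerically in is_excluded below.
_EXCLUDED_PUNCT = frozenset('\\"&<>[]^`{|}~')
_PREFIX_RE = re.compile("urn:", re.IGNORECASE | re.ASCII)


def is_excluded(ch: str) -> bool:
    """True for octets 0-32, the listed punctuation, and 127 and up."""
    code = ord(ch)
    return code <= 0x20 or code >= 0x7F or ch in _EXCLUDED_PUNCT


def extract_urn(text: str) -> Optional[str]:
    """Return the first URN-like run in text, or None if there is none."""
    match = _PREFIX_RE.search(text)
    if match is None:
        return None
    end = match.start()
    # The URN ends at, and does not include, the first excluded character.
    while end < len(text) and not is_excluded(text[end]):
        end += 1
    return text[match.start():end]
```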

3. Support of existing legacy naming systems and new naming systems

Any namespace (existing or newly-devised) that is proposed as a
URN-namespace and fulfills the criteria of URN-namespaces MUST be
expressed in this syntax. If names in these namespaces contain
characters other than those defined for the URN character set, they
MUST be translated into canonical form as discussed in section 2.2.

4. URN presentation and transport

The URN syntax defines the canonical format for URNs, and all URN
transport and interchanges MUST take place in this format. Further,
all URN-aware applications MUST offer the option of displaying URNs
in this canonical form to allow for direct transcription (for example
by cut and paste techniques). Such applications MAY support display
of URNs in a more human-friendly form and may use a character set
that includes characters that aren't permitted in URN syntax as
defined in this document (that is, they may replace %-notation by
characters in some extended character set in display to humans).
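One way an application might produce the human-friendly form is to decode the %-escapes as UTF-8 for display only. This sketch uses Python's standard urllib.parse.unquote; the wrapper name display_form is hypothetical, and errors="replace" is an assumption about how to show octets that are not valid UTF-8.

```python
from urllib.parse import unquote


def display_form(canonical_urn: str) -> str:
    """Human-friendly rendering of a canonical URN, for display only."""
    # Decode every %-escape as UTF-8; octet runs that are not valid
    # UTF-8 are shown as U+FFFD instead of raising an error.
    return unquote(canonical_urn, encoding="utf-8", errors="replace")
```

unquote is used rather than unquote_plus because "+" is an ordinary <other> character in a URN, not an encoded space. The decoded string is lossy (for instance "%2F" and "/" look the same), so it should not replace the canonical form in transport or interchange.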

5. Lexical Equivalence in URNs

For various purposes such as caching, it is often desirable to
determine whether two URNs are the same without resolving them. The
general-purpose means of doing so is by testing for "lexical
equivalence" as defined below.

Two URNs are lexically equivalent if they are octet-by-octet equal
after the following preprocessing:

   1. normalize the case of the leading "urn:" token
   2. normalize the case of the NID
   3. normalize the case of any %-escaping

Note that %-escaping MUST NOT be removed.

Some namespaces may define additional lexical equivalences, such as
case-insensitivity of the NSS (or parts thereof). Additional lexical
equivalences MUST be documented as part of namespace registration,
MUST always have the effect of eliminating some of the false
negatives obtained by the procedure above, and MUST NEVER say that
two URNs are not equivalent if the procedure above says they are
equivalent.

6. Examples of lexical equivalence

The following URN comparisons highlight the lexical equivalence
definitions:

   1- URN:foo:a123,456
   2- urn:foo:a123,456
   3- urn:FOO:a123,456
   4- urn:foo:A123,456
   5- urn:foo:a123%2C456
   6- URN:FOO:a123%2c456

URNs 1, 2, and 3 are all lexically equivalent. URN 4 is not
lexically equivalent to any of the other URNs of the above set. URNs
5 and 6 are only lexically equivalent to each other.
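The section 5 preprocessing, applied to the six URNs above, can be sketched in Python as follows. The helper names are hypothetical, the input is assumed to already have the "urn:" <NID> ":" <NSS> shape, and upper-casing the hex digits of each escape is an arbitrary choice; lower-casing them would serve equally well as long as it is applied consistently.

```python
import re

_ESCAPE_RE = re.compile(r"%[0-9A-Fa-f]{2}")


def normalize_lexical(urn: str) -> str:
    """Apply the lexical-equivalence preprocessing to a well-formed URN."""
    nid, _, nss = urn[4:].partition(":")
    # Step 3: upper-case the hex digits of every %-escape. The escapes
    # themselves stay in place, since %-escaping MUST NOT be removed.
    nss = _ESCAPE_RE.sub(lambda m: m.group(0).upper(), nss)
    # Steps 1 and 2: lower-case the "urn:" token and the NID.
    return "urn:" + nid.lower() + ":" + nss


def lexically_equivalent(a: str, b: str) -> bool:
    """Octet-by-octet comparison after preprocessing."""
    return normalize_lexical(a) == normalize_lexical(b)
```

With this sketch, examples 1, 2, and 3 all normalize to "urn:foo:a123,456", example 4 keeps its capital "A", and examples 5 and 6 both normalize to "urn:foo:a123%2C456".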

7. Functional Equivalence in URNs

Functional equivalence is determined by practice within a given
namespace and managed by resolvers for that namespace. Thus, it is
beyond the scope of this document. Namespace registration must
include guidance on how to determine functional equivalence for that
namespace, i.e. when two URNs are identical within a namespace.

8. Security considerations

This document specifies the syntax for URNs. While some namespace
resolvers may assign special meaning to certain characters of the
Namespace Specific String, any security considerations resulting
from such assignment are outside the scope of this document. It is
strongly recommended that the process of registering a namespace
identifier include any such considerations.

9. Acknowledgments

Thanks to various members of the URN working group and <<your name
here!!>> for comments on earlier drafts of this document. This
document is partially supported by the National Science Foundation,
Cooperative Agreement NCR-9218179.

10. References

Request For Comments (RFC) and Internet Draft documents are available
from <URL:ftp://ftp.internic.net> and numerous mirror sites.

[1] K. R. Sollins, "Requirements and a Framework for URN Resolution
    Systems," Internet Draft (work in progress), November 1996.

[2] T. Berners-Lee, "Universal Resource Identifiers in WWW," RFC 1630,
    June 1994.

[3] K. Sollins and L. Masinter, "Functional Requirements for Uniform
    Resource Names," RFC 1737, December 1994.

[4] T. Berners-Lee, R. Fielding, and L. Masinter, "Uniform Resource
    Locators (URL)," Internet Draft (work in progress), December 1996.

11. Editor's address

Ryan Moats
AT&T
15621 Drexel Circle
Omaha, NE 68135-2358
USA

Phone: +1 402 894-9456
EMail: jayhawk@ds.internic.net

Appendix A. Handling of URNs by URL resolvers/browsers.

The URN syntax has been defined so that URNs can be used in places
where URLs are expected. A resolver that conforms to the current URL
syntax specification [4] will extract a scheme value of "urn:"
rather than a scheme value of "urn:<nid>".

A URN MUST be considered an opaque URL by URL resolvers and passed
(with the "urn:" tag) to a URN resolver for resolution. The URN
resolver can either be an external resolver that the URL resolver
knows of, or it can be functionality built into the URL resolver.
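A URL-style resolver's dispatch step might look like the following Python sketch, which uses the standard urllib.parse.urlsplit. The function name route and the destination labels "urn-resolver" and "url-handler" are hypothetical placeholders. Note that urlsplit reports the scheme as "urn" (lower-cased, without the colon), never as "urn:<nid>".

```python
from typing import Tuple
from urllib.parse import urlsplit


def route(identifier: str) -> Tuple[str, str]:
    """Decide where a URL-style resolver should send an identifier."""
    # urlsplit ends the scheme at the first ":", so any URN yields the
    # scheme "urn" regardless of its NID.
    if urlsplit(identifier).scheme == "urn":
        # Treat the URN as opaque: hand over the whole string, "urn:"
        # tag included, to the (hypothetical) URN resolver.
        return ("urn-resolver", identifier)
    return ("url-handler", identifier)
```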

To avoid confusing users, a URL browser SHOULD display the complete
URN (including the "urn:" tag) to ensure that there is no confusion
between URN namespace identifiers and URL scheme identifiers.

This Internet Draft expires September 30, 1997.