1
2 Internet-Draft Ryan Moats
3 draft-ietf-urn-syntax-04.txt AT&T
4 Expires in six months March 1997
5
6
7 URN Syntax
8 Filename: draft-ietf-urn-syntax-04.txt
9
10
11 Status of This Memo
12
13 This document is an Internet-Draft. Internet-Drafts are working
14 documents of the Internet Engineering Task Force (IETF), its
15 areas, and its working groups. Note that other groups may also
16 distribute working documents as Internet-Drafts.
17
18 Internet-Drafts are draft documents valid for a maximum of six
19 months and may be updated, replaced, or obsoleted by other
20 documents at any time. It is inappropriate to use Internet-
21 Drafts as reference material or to cite them other than as ``work
22 in progress.''
23
24 To learn the current status of any Internet-Draft, please check
25 the ``1id-abstracts.txt'' listing contained in the Internet-
26 Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net
27 (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East
28 Coast), or ftp.isi.edu (US West Coast).
29
30
31 Abstract
32
33 Uniform Resource Names (URNs) are intended to serve as persistent,
34 location-independent, resource identifiers. This document sets
35 forward the canonical syntax for URNs. Both existing legacy and new
36 namespaces are discussed, along with the requirements for URN
37 presentation and transmission. Finally, there is a discussion of URN
38 equivalence and how to determine it.
39
40 1. Introduction
41
42 Uniform Resource Names (URNs) are intended to serve as persistent,
43 location-independent, resource identifiers and are designed to make
44 it easy to map other namespaces (which share the properties of URNs)
45 into URN-space. Therefore, the URN syntax provides a means to encode
46 character data in a form that can be sent in existing protocols,
47 transcribed on most keyboards, etc.
48
49
50
51
52
53 Expires 9/30/97 [Page 1]
54
55
56
57
58
59 INTERNET DRAFT URN Syntax March 1997
60
61
62 2. Syntax
63
64 All URNs have the following syntax (phrases enclosed in quotes are
65 REQUIRED):
66
67 <URN> ::= "urn:" <NID> ":" <NSS>
68
69 where <NID> is the Namespace Identifier, and <NSS> is the Namespace
70 Specific String. The leading "urn:" sequence is case-insensitive.
71 The Namespace ID determines the _syntactic_ interpretation of the
72 Namespace Specific String (as discussed in [1]).
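
As a rough illustration of the top-level syntax, the following Python
sketch splits a URN into its NID and NSS. The function name split_urn
is hypothetical and not part of this document; the sketch only checks
the overall "urn:" <NID> ":" <NSS> shape, not the character rules of
either part.

```python
from typing import Tuple


def split_urn(urn: str) -> Tuple[str, str]:
    """Split a URN into (NID, NSS) following "urn:" <NID> ":" <NSS>."""
    # The leading "urn" token is case-insensitive.
    prefix, sep, rest = urn.partition(":")
    if sep != ":" or prefix.lower() != "urn":
        raise ValueError("missing leading 'urn:'")
    # The NID cannot contain ":", so the next colon ends it.
    # The NSS may itself contain further colons.
    nid, sep, nss = rest.partition(":")
    if sep != ":" or not nid or not nss:
        raise ValueError("expected <NID> ':' <NSS>")
    return nid, nss
```

Splitting on the first two colons only is deliberate: ":" is a legal
NSS character, so any later colons belong to the NSS.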
73
74 RFC 1630 [2] and RFC 1737 [3] each present additional considerations
75 for URN encoding, which have the effect of limiting syntax.
76 On the other hand, the requirement to support existing legacy naming
77 systems has the effect of broadening syntax. Thus, we discuss the
78 acceptable syntax for both the Namespace Identifier and the Namespace
79 Specific String separately.
80
81 2.1 Namespace Identifier Syntax
82
83 The following is the syntax for the Namespace Identifier. To (a) be
84 consistent with all potential resolution schemes and (b) not put any
85 undue constraints on any potential resolution scheme, the syntax for
86 the Namespace Identifier is:
87
88 <NID> ::= <let-num> [ 1,31<let-num-hyp> ]
89
90 <let-num-hyp> ::= <upper> | <lower> | <number> | "-"
91
92 <let-num> ::= <upper> | <lower> | <number>
93
94 <upper> ::= "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" |
95 "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" |
96 "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" |
97 "Y" | "Z"
98
99 <lower> ::= "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" |
100 "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" |
101 "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" |
102 "y" | "z"
103
104 <number> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
105 "8" | "9"
106
107
108
109 This is slightly more restrictive than what is stated in [4] (which
122 allows the characters "." and "+"). Further, the Namespace
123 Identifier is case insensitive, so that "ISBN" and "isbn" refer to
124 the same namespace.
125
126 To avoid confusion with the "urn:" identifier, the NID "urn" is
127 reserved and MUST NOT be used.
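
The NID rules above can be sketched as a small Python check. This is
an illustration only: the name is_valid_nid is hypothetical, and the
sketch assumes that the bracketed part of the <NID> production is
optional, so that an NID is 1 to 32 characters long.

```python
import re

# One <let-num>, optionally followed by up to 31 <let-num-hyp>
# characters (assumed reading of the <NID> production).
_NID_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]{0,31}")


def is_valid_nid(nid: str) -> bool:
    """Return True if nid matches the <NID> syntax and is not reserved."""
    # The NID "urn" is reserved; NIDs are case-insensitive.
    if nid.lower() == "urn":
        return False
    return _NID_RE.fullmatch(nid) is not None
```

Because the NID is case-insensitive, a caller that stores NIDs would
normally fold them to one case before comparing them.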
128
129 2.2 Namespace Specific String Syntax
130
131 As required by RFC 1737, there is a single canonical representation
132 of the NSS portion of an URN. The format of this single canonical
133 form follows:
134
135 <NSS> ::= 1*<URN chars>
136
137 <URN chars> ::= <trans> | "%" <hex> <hex>
138
139 <trans> ::= <upper> | <lower> | <number> | <other> | <reserved>
140
141 <hex> ::= <number> | "A" | "B" | "C" | "D" | "E" | "F" |
142 "a" | "b" | "c" | "d" | "e" | "f"
143
144 <other> ::= "(" | ")" | "+" | "," | "-" | "." |
145 ":" | "=" | "@" | ";" | "$" |
146 "_" | "!" | "*" | "'"
147
148 Depending on the rules governing a namespace, valid identifiers in a
149 namespace might contain characters that are not members of the URN
150 character set above (<URN chars>). Such strings MUST be translated
151 into canonical NSS format before using them as protocol elements or
152 otherwise passing them on to other applications. Translation is done
153 by encoding each character outside the URN character set as a
154 sequence of one to six octets using UTF-8 encoding [5], and the
155 encoding of each of those octets as "%" followed by two characters
156 from the <hex> character set above. The two characters give the
157 hexadecimal representation of that octet.
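
The translation step can be sketched in Python as follows. The name
to_canonical_nss is hypothetical. The sketch assumes that the input
is a raw namespace identifier in which every character is meant
literally, so the reserved characters "%", "/", "?" and "#" are
encoded as well. Uppercase hex digits are an arbitrary choice, since
<hex> allows both cases. Python's UTF-8 encoder emits at most four
octets per character.

```python
import string

# <upper> | <lower> | <number> | <other>: characters copied unchanged.
_LITERAL_OK = set(string.ascii_letters + string.digits + "()+,-.:=@;$_!*'")


def to_canonical_nss(raw: str) -> str:
    """Translate a raw identifier into canonical NSS form."""
    if "\x00" in raw:
        # Octet 0 is never allowed, encoded or not.
        raise ValueError("octet 0 must not appear in an URN")
    out = []
    for ch in raw:
        if ch in _LITERAL_OK:
            out.append(ch)
        else:
            # Encode the character as UTF-8, then each octet as %XX.
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)
```

For example, a space becomes "%20" and a literal "%" becomes "%25".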
158
159 2.3 Reserved characters
160
161 The only character set not yet discussed is the reserved character
162 set, which contains various characters reserved from normal use.
163 The reserved character set follows, with a discussion of why each
164 character is reserved.
165
166 The reserved character set is:
167
168 <reserved> ::= "%" | "/" | "?" | "#"
169
182 2.3.1 The "%" character
183
184 The "%" character is reserved in the URN syntax for introducing the
185 escape sequence for an octet. Literal use of the "%" character in a
186 namespace must be encoded using "%25" in URNs for that namespace.
187 The presence of an "%" character in an URN MUST be followed by two
188 characters from the <hex> character set.
189
190 Namespaces MAY designate one or more characters from the URN
191 character set as having special meaning for that namespace. If the
192 namespace also uses that character in a literal sense as well, the
193 character used in a literal sense MUST be encoded with "%" followed
194 by the hexadecimal representation of that octet. Further, a
195 character MUST NOT be "%"-encoded if the character is not a reserved
196 character. Therefore, the process of registering a namespace
197 identifier shall include publication of a definition of which
198 characters have a special meaning to that namespace.
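
A minimal Python sketch of a check for the canonical NSS form follows.
The name is_canonical_nss is hypothetical. The sketch enforces the
rule that every "%" is followed by two <hex> characters, and it
accepts "/", "?" and "#" because <trans> includes the reserved
character set. It does not know about any namespace-specific special
characters.

```python
import re

# Either one literal character from <upper>, <lower>, <number>,
# <other> or the non-"%" reserved characters, or a "%" escape with
# exactly two hex digits.
_NSS_RE = re.compile(r"(?:[A-Za-z0-9()+,\-.:=@;$_!*'/?#]|%[0-9A-Fa-f]{2})+")


def is_canonical_nss(nss: str) -> bool:
    """Return True if nss consists only of <URN chars> (at least one)."""
    return _NSS_RE.fullmatch(nss) is not None
```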
199
200 2.3.2 The other reserved characters
201
202 RFC 1630 [2] reserves the characters "/", "?", and "#" for particular
203 purposes. The URN-WG has not yet debated the applicability and
204 precise semantics of those purposes as applied to URNs. Therefore,
205 these characters are RESERVED for future developments. Namespace
206 developers SHOULD NOT use these characters in unencoded form, but
207 rather use the appropriate %-encoding for each character.
208
209 2.4 Excluded characters
210
211 The following list is included only for the sake of completeness.
212 Any octets/characters on this list are explicitly NOT part of the URN
213 character set, and if used in an URN, MUST be %encoded:
214
215 <excluded> ::= octets 1-32 (1-20 hex) | "\" | """ | "&" | "<"
216 | ">" | "[" | "]" | "^" | "`" | "{" | "|" | "}" | "~"
217 | octets 127-255 (7F-FF hex)
218
219 In addition, octet 0 (0 hex) should NEVER be used, in either
220 unencoded or %-encoded form.
221
222 An URN ends when an octet/character from the excluded character set
223 (<excluded>) is encountered. The character from the excluded
224 character set is NOT part of the URN.
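
The termination rule can be sketched in Python as a scan that stops
at the first excluded character. The names is_excluded and
extract_urn are hypothetical. The sketch works on Python strings
rather than octets, so it treats any code point of 127 or above as
excluded, and it also stops at octet 0; both are assumptions of the
sketch.

```python
# Punctuation listed in <excluded>.
_EXCLUDED_PUNCT = set('\\"&<>[]^`{|}~')


def is_excluded(ch: str) -> bool:
    """Return True if ch is in the excluded set (or is octet 0)."""
    code = ord(ch)
    return code <= 32 or code >= 127 or ch in _EXCLUDED_PUNCT


def extract_urn(text: str, start: int = 0) -> str:
    """Return the URN beginning at text[start], ending before the
    first excluded character or at the end of the text."""
    end = start
    while end < len(text) and not is_excluded(text[end]):
        end += 1
    return text[start:end]
```

The excluded character that ends the scan is left out of the result,
since it is not part of the URN.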
225
226 3. Support of existing legacy naming systems and new naming systems
227
228 Any namespace (existing or newly-devised) that is proposed as an
229 URN-namespace and fulfills the criteria of URN-namespaces MUST be
242 expressed in this syntax. If names in these namespaces contain
243 characters other than those defined for the URN character set, they
244 MUST be translated into canonical form as discussed in section 2.2.
245
246 4. URN presentation and transport
247
248 The URN syntax defines the canonical format for URNs and all URN
249 transport and interchanges MUST take place in this format. Further,
250 all URN-aware applications MUST offer the option of displaying URNs
251 in this canonical form to allow for direct transcription (for example
252 by cut and paste techniques). Such applications MAY support display
253 of URNs in a more human-friendly form and may use a character set
254 that includes characters that aren't permitted in URN syntax as
255 defined in this RFC (that is, they may replace %-notation by
256 characters in some extended character set in display to humans).
257
258 5. Lexical Equivalence in URNs
259
260 For various purposes such as caching, it's often desirable to
261 determine if two URNs are the same without resolving them. The
262 general purpose means of doing so is by testing for "lexical
263 equivalence" as defined below.
264
265 Two URNs are lexically equivalent if they are octet-by-octet equal
266 after the following preprocessing:
267
268 1. normalize the case of the leading "urn:" token
269 2. normalize the case of the NID
270 3. normalize the case of any %-escaping
271
272 Note that %-escaping MUST NOT be removed.
273
274 Some namespaces may define additional lexical equivalences, such as
275 case-insensitivity of the NSS (or parts thereof). Additional lexical
276 equivalences MUST be documented as part of namespace registration,
277 MUST always have the effect of eliminating some of the false
278 negatives obtained by the procedure above, and MUST NEVER say that
279 two URNs are not equivalent if the procedure above says they are
280 equivalent.
281
282 6. Examples of lexical equivalence
283
284 The following URN comparisons highlight the lexical equivalence
285 definitions:
286
287 1- URN:foo:a123,456
288 2- urn:foo:a123,456
289 3- urn:FOO:a123,456
302 4- urn:foo:A123,456
303 5- urn:foo:a123%2C456
304 6- URN:FOO:a123%2c456
305 URNs 1, 2, and 3 are all lexically equivalent. URN 4 is not
306 lexically equivalent to any of the other URNs of the above set. URNs 5
307 and 6 are only lexically equivalent to each other.
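
The preprocessing of section 5 and the six examples above can be
checked with a short Python sketch. The names normalize_urn and
lexically_equivalent are hypothetical. Folding the "urn:" token and
the NID to lower case and the %-escapes to upper case is an arbitrary
choice; any consistent case would do. No namespace-specific
equivalences are applied.

```python
import re

_ESCAPE_RE = re.compile(r"%[0-9A-Fa-f]{2}")


def normalize_urn(urn: str) -> str:
    """Apply the three case normalizations; %-escapes are kept."""
    # Raises ValueError if there are fewer than two colons.
    prefix, nid, nss = urn.split(":", 2)
    if prefix.lower() != "urn":
        raise ValueError("missing leading 'urn:'")
    # Only the hex digits of each escape change case; the rest of
    # the NSS is left untouched.
    nss = _ESCAPE_RE.sub(lambda m: m.group(0).upper(), nss)
    return "urn:" + nid.lower() + ":" + nss


def lexically_equivalent(a: str, b: str) -> bool:
    """Compare two URNs after normalization."""
    return normalize_urn(a) == normalize_urn(b)


examples = {
    1: "URN:foo:a123,456",
    2: "urn:foo:a123,456",
    3: "urn:FOO:a123,456",
    4: "urn:foo:A123,456",
    5: "urn:foo:a123%2C456",
    6: "URN:FOO:a123%2c456",
}
```

With these definitions, lexically_equivalent(examples[1], examples[3])
is true, while lexically_equivalent(examples[2], examples[5]) is false
because the %-escape is not removed.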
308
309 7. Functional Equivalence in URNs
310
311 Functional equivalence is determined by practice within a given
312 namespace and managed by resolvers for that namespace. Thus, it is
313 beyond the scope of this document. Namespace registration must
314 include guidance on how to determine functional equivalence for that
315 namespace, i.e. when two URNs are identical within a namespace.
316
317 8. Security considerations
318
319 This document specifies the syntax for URNs. While some namespace
320 resolvers may assign special meaning to certain of the characters of
321 the Namespace Specific String, any security considerations resulting
322 from such assignment are outside the scope of this document. It is
323 strongly recommended that the process of registering a namespace
324 identifier include any such considerations.
325
326 9. Acknowledgments
327
328 Thanks to various members of the URN working group for comments on
329 earlier drafts of this document. This document is partially
330 supported by the National Science Foundation, Cooperative Agreement
331 NCR-9218179.
332
333 10. References
334
335 Request For Comments (RFC) and Internet Draft documents are available
336 from <URL:ftp://ftp.internic.net> and numerous mirror sites.
337
338 [1] K. R. Sollins, "Requirements and a Framework for
339 URN Resolution Systems," Internet Draft (work in
340 progress), November 1996.
341
342
343 [2] T. Berners-Lee, "Universal Resource Identifiers in
344 WWW," RFC 1630, June 1994.
345
346
347 [3] K. Sollins and L. Masinter, "Functional Requirements
348 for Uniform Resource Names," RFC 1737, December 1994.
350
351
362 [4] T. Berners-Lee, R. Fielding, L. Masinter, "Uniform
363 Resource Locators (URL)," Internet Draft (work in
364 progress), December 1996.
365
366
367 [5] Appendix A.2 of The Unicode Consortium, "The
368 Unicode Standard, Version 2.0", Addison-Wesley
369 Developers Press, 1996. ISBN 0-201-48345-9.
370
371 11. Editor's address
372
373 Ryan Moats
374 AT&T
375 15621 Drexel Circle
376 Omaha, NE 68135-2358
377 USA
378
379 Phone: +1 402 894-9456
380 EMail: jayhawk@ds.internic.net
381
382
383
384
385
386 Appendix A. Handling of URNs by URL resolvers/browsers.
387
388 The URN syntax has been defined so that URNs can be used in places
389 where URLs are expected. A resolver that conforms to the current URL
390 syntax specification [4] will extract a scheme value of "urn:"
391 rather than a scheme value of "urn:<nid>".
392
393 An URN MUST be considered an opaque URL by URL resolvers and passed
394 (with the "urn:" tag) to an URN resolver for resolution. The URN
395 resolver can either be an external resolver that the URL resolver
396 knows of, or it can be functionality built-in to the URL resolver.
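
As an illustration of the hand-off described above, the following
Python sketch routes an identifier by its scheme and passes an URN,
unmodified and with its "urn:" tag, to an URN resolver. The names
scheme_of and dispatch, and the handler table, are hypothetical; a
real URL resolver would have its own dispatch mechanism.

```python
from typing import Callable, Dict


def scheme_of(identifier: str) -> str:
    """Return the lower-cased scheme in front of the first colon."""
    scheme, sep, _ = identifier.partition(":")
    if not sep:
        raise ValueError("no scheme found")
    return scheme.lower()


def dispatch(identifier: str,
             handlers: Dict[str, Callable[[str], object]]) -> object:
    """Hand the whole identifier to the handler for its scheme."""
    # For an URN the scheme is "urn", never "urn:<nid>"; the rest of
    # the identifier stays opaque to the URL resolver.
    handler = handlers[scheme_of(identifier)]
    return handler(identifier)
```

A caller would register an URN resolver under the key "urn", either
as a wrapper around an external resolver or as built-in functionality.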
397
398 To avoid confusion of users, an URL browser SHOULD display the
399 complete URN (including the "urn:" tag) to ensure that there is no
400 confusion between URN namespace identifiers and URL scheme identifiers.
401
402
403 This Internet Draft expires September 30, 1997.