


Internet-Draft                                                Ryan Moats
draft-ietf-urn-syntax-01.txt                                        AT&T
Expires in six months                                      November 1996


                               URN Syntax
                 Filename: draft-ietf-urn-syntax-01.txt


Status of This Memo

      This document is an Internet-Draft.  Internet-Drafts are working
      documents of the Internet Engineering Task Force (IETF), its
      areas, and its working groups.  Note that other groups may also
      distribute working documents as Internet-Drafts.

      Internet-Drafts are draft documents valid for a maximum of six
      months and may be updated, replaced, or obsoleted by other
      documents at any time.  It is inappropriate to use Internet-
      Drafts as reference material or to cite them other than as ``work
      in progress.''

      To learn the current status of any Internet-Draft, please check
      the ``1id-abstracts.txt'' listing contained in the Internet-
      Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net
      (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East
      Coast), or ftp.isi.edu (US West Coast).


Abstract

   Uniform Resource Names (URNs) are intended to serve as persistent
   resource identifiers. This document sets forth the canonical syntax
   for URNs.  Support for both existing legacy and new namespaces is
   discussed. Requirements for URN presentation and transmission are
   presented.  Finally, there is a discussion of URN equivalence and how
   to determine it.

1. Introduction

   Uniform Resource Names (URNs) are intended to serve as persistent
   resource identifiers and are designed to make it easy to map other
   namespaces (which share the properties of URNs) into URN-space. The
   URN syntax therefore provides a means to encode character data in a
   form that can be sent in existing protocols, transcribed on most
   keyboards, etc.


Expires 5/19/97                                                 [Page 1]


INTERNET DRAFT                 URN Syntax                  November 1996


2. Syntax

   All URNs have the following syntax:

                    <URN> ::= ["urn:"] <NID> ":" <NSS>

   <NID> is the Namespace Identifier, and <NSS> is the Namespace
   Specific String.  The leading case-insensitive "urn:" sequence is
   currently optional, as no closure on its definite presence or absence
   has been reached.  The Namespace ID is used to determine the
   _syntactic_ interpretation of the Namespace Specific String (as
   discussed in [1]).

   RFC 1737 [2] places additional requirements on URN encoding, all of
   which tend to restrict the syntax.  On the other hand,
   the requirement to support existing legacy naming systems has the
   effect of broadening syntax.  Thus, we discuss the acceptable syntax
   for both the Namespace Identifier and the Namespace Specific String
   separately.
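
   As an illustration of the <URN> production above, the following is
   a minimal Python sketch that splits a URN into its NID and NSS.  The
   function name split_urn is hypothetical and not part of this
   document, and the sketch does not validate the NID or NSS against
   the grammars given later.

```python
def split_urn(urn):
    """Split a URN into (nid, nss); the leading "urn:" is optional."""
    # The "urn:" prefix is case-insensitive and may be absent.
    rest = urn[4:] if urn[:4].lower() == "urn:" else urn
    # The first ":" after the optional prefix separates <NID> from <NSS>.
    nid, sep, nss = rest.partition(":")
    if not sep or not nid:
        raise ValueError("expected <NID>:<NSS>, got %r" % (urn,))
    return nid, nss
```

   Under these assumptions, "urn:isbn:1-23485-8-29" and
   "isbn:1-23485-8-29" both split into the NID "isbn" and the NSS
   "1-23485-8-29".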

2.1 Namespace Identifier Syntax

   To (a) be consistent with all potential resolution schemes and (b)
   not put any undue constraints on any potential resolution scheme,
   the syntax for the Namespace Identifier is:

   <NID>         ::= <letter> [ <let-hyp> ]

   <let-hyp>     ::= <letter> | "-" | <letter> <let-hyp> | "-" <let-hyp>

   <letter>      ::= any one of the 52 alphabetic characters A through Z
                     in upper case and a through z in lower case

   This is slightly more restrictive than what is stated in RFC 1738 [4]
   (which allows the period "."). Further, the Namespace Identifier is
   case insensitive, so that "ISBN" and "isbn" refer to the same
   namespace.

   To avoid confusion with the optional "urn:" identifier, the NID "urn"
   is reserved and may not be used.
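
   The NID rules can be sketched in Python as follows.  This is an
   illustration only: the names is_valid_nid and same_namespace are
   hypothetical, and the sketch assumes that <let-hyp> denotes a run of
   one or more letters and hyphens.

```python
import re

# <NID> ::= <letter> [ <let-hyp> ]: one letter, then letters or hyphens.
# Digits and the period are not accepted by this grammar.
_NID_RE = re.compile(r"[A-Za-z][A-Za-z-]*")

def is_valid_nid(nid):
    """Return True if nid matches <NID> and is not the reserved "urn"."""
    if _NID_RE.fullmatch(nid) is None:
        return False
    # The NID is case insensitive, so "URN" and "Urn" are reserved too.
    return nid.lower() != "urn"

def same_namespace(nid_a, nid_b):
    """Compare two NIDs without regard to case."""
    return nid_a.lower() == nid_b.lower()
```

   With this sketch, "ISBN" and "isbn" name the same namespace, while
   "urn", "-isbn" and "a.b" are rejected as NIDs.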

2.2 Namespace Specific String Syntax

   As required by RFC 1737 [2], there is a single canonical
   representation of the NSS portion of a URN.  The format of this
   canonical form follows:


   <NSS>        ::= <URN chars>*

   <URN chars>  ::= <trans> | "%" <hex> <hex>

   <trans>      ::= <upper> | <lower> | <number> | <other>

   <hex>        ::= <number> | "A" | "B" | "C" | "D" | "E" | "F"

   <upper>      ::= "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" |
                    "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" |
                    "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" |
                    "Y" | "Z"

   <lower>      ::= "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" |
                    "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" |
                    "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" |
                    "y" | "z"

   <number>     ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
                    "8" | "9"

   <other>      ::= "(" | ")" | "+" | "
                    ":" | "=" | "?" | "@"


   Depending on the rules governing a namespace, valid identifiers in a
   namespace might contain characters that are not members of the URN
   character set above (<URN chars>).  Such strings MUST be translated
   into canonical NSS format before using them as protocol elements or
   otherwise passing them on to other applications. Translation is done
   by encoding each character outside the URN character set as a
   sequence of one to six octets using UTF-8 encoding, and the encoding
   of each of those octets as "%" followed by two characters from the
   <hex> character set above. The two characters give the hexadecimal
   representation of that octet.
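
   The translation rule above can be sketched in Python as follows.
   The function name to_canonical_nss is hypothetical.  Because the
   members of <other> are a matter for the grammar, the sketch takes
   them as a caller-supplied argument rather than assuming a list.
   Note also that Python's UTF-8 encoder emits at most four octets per
   character.

```python
import string

# <upper>, <lower> and <number> from the grammar above.
_ALNUM = set(string.ascii_letters + string.digits)

def to_canonical_nss(identifier, other_chars):
    """Percent-encode every character outside the URN character set.

    other_chars is the caller's list of <other> characters (assumption:
    supplied by the caller, ASCII only).
    """
    allowed = _ALNUM | set(other_chars)
    out = []
    for ch in identifier:
        if ch in allowed:
            out.append(ch)
        else:
            # Encode the character as UTF-8, then write each octet as
            # "%" <hex> <hex> with the upper-case digits of <hex>.
            out.extend("%%%02X" % octet for octet in ch.encode("utf-8"))
    return "".join(out)
```

   A literal "%" in the input is itself outside <trans>, so the sketch
   emits it as "%25"; this keeps the output unambiguous when it is
   decoded again.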

   Namespaces MAY designate one or more characters from the URN
   character set as having special meaning for that namespace. If the
   namespace also uses such a character in a literal sense, each
   literal occurrence must be encoded as "%" followed by the
   hexadecimal representation of that octet.  Therefore, the
   process of registering a namespace identifier shall include
   publication of a definition of which characters have a special
   meaning and how to encode these characters if used in a literal
   sense.


3. Support of existing legacy naming systems and new naming systems

   URN-aware applications MAY accept as input other resource identifiers
   from existing legacy namespaces.  If such identifiers contain
   characters that are not members of the URN character set specified in
   section 2.2, the identifier MUST be translated to canonical format as
   discussed in section 2.2.

   Some existing namespaces that have the properties of the URN-space
   contain some human-significant components, and these exist in a wide
   variety of languages.  However, URNs are NOT intended to convey
   information that is significant to humans.  While the translation
   rule in section 2.2 is provided for existing namespaces, new
   namespaces, as part of their registration documentation, MUST define
   a discipline for assigning new URNs that does not simplify the
   generation of human-significant names.

4. URN presentation and transport

   URN-aware applications MAY support "natural" display of URNs which
   contain characters encoded using "%" notation.  However, they MUST
   provide for display of URNs in canonical form (i.e. in a format
   suitable for transcription).

   URNs may only be transported in canonical format.
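
   One possible way to produce a "natural" display from the canonical
   form is sketched below in Python.  The function name natural_display
   is hypothetical, and the fallback to the canonical string when the
   decoded octets are not valid UTF-8 is a design choice of this
   sketch, not a requirement of this document.

```python
import re

# "%" <hex> <hex>, using only the upper-case digits of the <hex> rule.
_ESCAPE_RE = re.compile(rb"%([0-9A-F]{2})")

def natural_display(canonical_nss):
    """Decode "%" escapes for display; the canonical form is unchanged."""
    # Canonical form is pure ASCII, so this encode step is lossless.
    octets = _ESCAPE_RE.sub(
        lambda m: bytes.fromhex(m.group(1).decode("ascii")),
        canonical_nss.encode("ascii"),
    )
    try:
        return octets.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: fall back to showing the canonical form.
        return canonical_nss
```

   The application would keep the canonical string for transport and
   transcription and use the decoded string only for display.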

5. Equivalence in URNs

   URNs are considered equivalent if they return the same resource.  For
   various purposes, such as caching, a test is necessary to determine
   equivalence without actually resolving the URNs and fetching or
   comparing the underlying resources.  "Lexical equivalence" is a
   stricter condition than the equivalence described above (functional
   equivalence).

5.1 Lexical Equivalence

   Lexical equivalence may be determined by comparing two URNs without
   making any network accesses. Two URNs are lexically equivalent if
   they are octet-by-octet equal after the following preprocessing:

           1. drop any preceding "urn:" token
           2. normalize the case of the NID

   Some namespaces may define additional lexical equivalences, such as
   case-insensitivity of the NSS (or parts thereof).  Additional lexical
   equivalences MUST be documented as part of namespace registration,
   MUST always have the effect of eliminating some of the false
   negatives obtained by the procedure above, and MUST NEVER say that
   two URNs are not equivalent if the procedure above says they are
   equivalent.
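
   The two preprocessing steps can be sketched in Python as follows.
   The names lexical_key and lexically_equivalent are hypothetical, and
   the sketch implements only the base procedure; any additional
   equivalences defined by a namespace are not modelled.

```python
def lexical_key(urn):
    """Return the (nid, nss) pair used for lexical comparison."""
    # Step 1: drop any preceding "urn:" token (case-insensitive).
    rest = urn[4:] if urn[:4].lower() == "urn:" else urn
    nid, sep, nss = rest.partition(":")
    if not sep or not nid:
        raise ValueError("expected <NID>:<NSS>, got %r" % (urn,))
    # Step 2: normalize the case of the NID; the NSS is left as is.
    return nid.lower(), nss

def lexically_equivalent(urn_a, urn_b):
    """Compare two URNs without making any network accesses."""
    return lexical_key(urn_a) == lexical_key(urn_b)
```

   Since the NSS is compared exactly, "isbn:abc" and "isbn:ABC" are not
   lexically equivalent under this sketch unless the namespace defines
   the NSS to be case insensitive.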

5.2 Functional Equivalence

   Resolvers determine functional equivalence based on specific rules
   for the namespace.  Therefore, namespace registration must include
   documentation on how to determine functional equivalence for that
   namespace.

5.3 Examples

   The following URN comparisons highlight the difference between these
   types of equivalence:

     urn:isbn:1-23485-8-29, isbn:1-23485-8-29 are lexically equiv.
     urn:isbn:1-23485-8-29, ISBN:1-23485-8-29 are lexically equiv.
     urn:isbn:1-23485-8-29, isbn:123485829 are not lexically equiv.
        but may be functionally equivalent.

6. Security considerations

   Because of the number of potential namespaces, it must be restated
   that certain of the characters in the Namespace Specific String may
   have special meaning to certain namespace resolvers.  The process of
   registering a namespace identifier shall therefore include
   publication of a definition of which characters have a special
   meaning.

7. Acknowledgments

   Thanks to various members of the URN working group and <<your name
   here!!>> for comments on earlier drafts of this document.  This
   document is partially supported by the National Science Foundation.

8. References

   Request For Comments (RFC) and Internet Draft documents are available
   from <URL:ftp://ftp.internic.net> and numerous mirror sites.

         [1]         L. L. Daigle, P. Faltstrom, R. Iannella.  "A Frame-
                     work for the Assignment and Resolution of Uniform
                     Resource Names," Internet Draft (work in progress).
                     June 1996.


         [2]         K. Sollins, L. Masinter.  "Functional Requirements
                     for Uniform Resource Names," RFC 1737.  December
                     1994.


         [3]         T. Berners-Lee. "Universal Resource Identifiers in
                     WWW," RFC 1630. June 1994.


         [4]         T. Berners-Lee, L. Masinter, M. McCahill. "Uniform
                     Resource Locators (URL)," RFC 1738.  December 1994.

9. Editor's address

   Ryan Moats
   AT&T
   15621 Drexel Circle
   Omaha, NE 68135-2358
   USA

   Phone:  +1 402 894-9456
   EMail:  jayhawk@ds.internic.net


                 This Internet Draft expires May 19, 1997.

