Internet-Draft                                                Ryan Moats
draft-ietf-urn-syntax-00.txt                                        AT&T
Expires in six months                                       October 1996


                               URN Syntax
                 Filename: draft-ietf-urn-syntax-00.txt


Status of This Memo

      This document is an Internet-Draft.  Internet-Drafts are working
      documents of the Internet Engineering Task Force (IETF), its
      areas, and its working groups.  Note that other groups may also
      distribute working documents as Internet-Drafts.

      Internet-Drafts are draft documents valid for a maximum of six
      months and may be updated, replaced, or obsoleted by other
      documents at any time.  It is inappropriate to use Internet-
      Drafts as reference material or to cite them other than as ``work
      in progress.''

      To learn the current status of any Internet-Draft, please check
      the ``1id-abstracts.txt'' listing contained in the Internet-
      Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net
      (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East
      Coast), or ftp.isi.edu (US West Coast).


Abstract

   Uniform Resource Names (URNs) are intended to serve as persistent
   resource identifiers. This document presents the syntax for URNs.
   Support for existing legacy namespaces is discussed. URN transmission
   encoding requirements are presented.  Finally, there is a discussion
   of URN equivalence and how to determine it.

1. Syntax

   All URNs have the following syntax:

                    <URN> ::= ["urn:"] <NID> ":" <NSS>

   <NID> is the Namespace Identifier, and <NSS> is the Namespace
   Specific String.  The leading "urn:" sequence is currently optional,
   because no consensus has yet been reached on whether it should always
   be present or always be absent.  The Namespace ID is used to determine
   the _syntactic_ interpretation of the Namespace Specific String (as
   discussed in [1]).
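   As a rough illustration, the top-level rule can be applied by
   stripping the optional prefix and splitting at the first colon.  This
   is a minimal Python sketch, not part of the specification: the
   function name split_urn is hypothetical, and it assumes the "urn:"
   prefix is matched case-insensitively.  Because the NID "urn" is
   reserved (Section 1.1), a leading "urn:" can only be the prefix.

```python
def split_urn(urn: str) -> tuple:
    """Split a URN into (NID, NSS) following
    <URN> ::= ["urn:"] <NID> ":" <NSS>.

    Assumption: the optional "urn:" prefix is matched case-insensitively.
    """
    rest = urn
    if rest[:4].lower() == "urn:":
        rest = rest[4:]  # drop the optional prefix
    # The NID cannot contain ":", so the first colon ends it; any later
    # colons belong to the NSS.
    nid, sep, nss = rest.partition(":")
    if not sep or not nid:
        raise ValueError("not a URN: %r" % urn)
    return nid, nss
```

   Both "urn:isbn:1-23485-8-29" and "isbn:1-23485-8-29" split into the
   NID "isbn" and the NSS "1-23485-8-29".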

   RFC 1737 [2] sets out additional requirements on URN encoding, all of
   which tend to restrict the syntax.  On the other hand, the requirement
   to support existing legacy naming systems tends to broaden it.  We
   therefore discuss the acceptable syntax for the Namespace Identifier
   and for the Namespace Specific String separately.

1.1 Namespace Identifier Syntax

   To be consistent with all potential resolution schemes without
   placing undue constraints on any of them, the syntax for the
   Namespace Identifier is:

   <NID>         ::= <letter> *<let-hyp>

   <let-hyp>     ::= <letter> | "-"

   <letter>      ::= any one of the 52 alphabetic characters A through Z
                     in upper case and a through z in lower case

   Here "*" denotes zero or more repetitions, so an NID is a single
   letter followed by any number of letters and hyphens.  This is
   slightly more restrictive than what RFC 1738 [4] allows for scheme
   names (which also permits digits, the plus sign "+", and the period
   ".").  Further, the Namespace Identifier is case insensitive, so that
   "ISBN" and "isbn" refer to the same namespace.

   To avoid confusion with the optional "urn:" prefix, the NID "urn" (in
   any combination of upper and lower case) is reserved and may not be
   used.
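   The NID rule and the reservation of "urn" can be checked with a
   regular expression.  The following Python sketch is illustrative
   only; is_valid_nid is a hypothetical name, and it reads the grammar
   as a single ASCII letter followed by zero or more ASCII letters or
   hyphens.

```python
import re

# Assumption: <NID> is one letter followed by zero or more letters or "-".
_NID_RE = re.compile(r"[A-Za-z][A-Za-z-]*")


def is_valid_nid(nid: str) -> bool:
    """Check a Namespace Identifier against the Section 1.1 rules."""
    if not _NID_RE.fullmatch(nid):
        return False
    # NIDs are case insensitive, so "URN", "Urn", etc. are all reserved.
    return nid.lower() != "urn"
```

   Under this reading "isbn", "ISBN", and "x-foo" are accepted, while
   "urn", "1sbn", and "is.bn" are rejected.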

1.2 Namespace Specific String Syntax

   Depending on the rules governing a namespace, valid identifiers in a
   namespace might contain characters that are reserved characters in
   URI syntax or non-printable ASCII characters.  To accommodate the
   largest set of valid identifiers, the NSS portion of a URN shall use
   the UTF-8 representation of ISO 10646 as its character set.  Namespaces
   that do not currently use ISO 10646/UTF-8 are encouraged to migrate
   to it.

   Clients MUST be capable of %encoding the UTF-8 formatted NSS.
   %encoding (as discussed in [3]) represents an octet as a percent sign
   "%" immediately followed by two hexadecimal digits (0-9, A-F) giving
   the value of that octet.  The rules for %encoding presented in [3]
   apply, with the following exceptions:

      1. [3] states that occurrences of the '/' character in URIs must
      denote hierarchy, so that partial (relative) forms of a URI are
      possible.  This restriction is unenforceable, and relative URLs
      carry no scheme prefix in any case, so URNs may contain unescaped
      occurrences of the '/' character that do not denote hierarchy.

      2. As an optimization when the transport between systems is known
      to be 8-bit clean, clients MAY omit the %encoding of octets with
      the high bit set, but MUST still %encode the reserved characters
      listed below.

   For historic reasons, the characters "#" (%23), "?" (%3F), "%" (%25),
   "*" (%2A), "!" (%21), "<" (%3C), ">" (%3E), and '"' (%22) are
   reserved and must be %encoded.  Client implementers should therefore
   accept URNs from users in unencoded form, but must encode them before
   sending them to a resolver.
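   The client-side rules above can be sketched as follows.  This Python
   function (percent_encode_nss, a hypothetical name) first converts the
   NSS to UTF-8 and then always %encodes the reserved characters listed
   above.  Octets with the high bit set are %encoded unless the caller
   declares the transport 8-bit clean, and '/' is left alone per
   exception 1.  Encoding space and control characters is an assumption
   standing in for the full rules of [3], which are not reproduced here.

```python
# Characters that Section 1.2 reserves; they are %encoded on every transport.
RESERVED = frozenset(b'#?%*!<>"')


def percent_encode_nss(nss: str, eight_bit_clean: bool = False) -> bytes:
    """%encode a Namespace Specific String before sending it to a resolver.

    Returns bytes because, on an 8-bit-clean transport, raw UTF-8 octets
    may be left unencoded.
    """
    out = bytearray()
    for octet in nss.encode("utf-8"):
        must_encode = (
            octet in RESERVED
            # Assumption: space and ASCII control characters are always encoded.
            or octet <= 0x20
            or octet == 0x7F
            # Exception 2: high-bit octets may stay raw on 8-bit-clean links.
            or (octet >= 0x80 and not eight_bit_clean)
        )
        if must_encode:
            out += ("%%%02X" % octet).encode("ascii")
        else:
            out.append(octet)  # includes '/', which exception 1 allows unescaped
    return bytes(out)
```

   For example, "caf\u00e9#1" becomes b"caf%C3%A9%231" for a 7-bit
   transport and b"caf\xc3\xa9%231" when eight_bit_clean is True; the
   "#" is encoded in both cases.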

   URN resolvers MUST be capable of accepting URNs that have been
   %encoded for either 8-bit clean or 7-bit transports.  Resolvers first
   remove the %encoding and then perform UTF-8 decoding on the resulting
   octets.  URN resolvers MUST return identical results for ANY legally
   encoded form of the URN.
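   On the resolver side, the order matters: %encoding is removed to
   recover the raw octets, and only then are the octets decoded as
   UTF-8.  A minimal Python sketch using the standard library's
   urllib.parse.unquote_to_bytes follows; decode_nss is a hypothetical
   name, and handling of malformed UTF-8 is left to the caller
   (bytes.decode raises UnicodeDecodeError).

```python
from typing import Union
from urllib.parse import unquote_to_bytes


def decode_nss(received: Union[bytes, str]) -> str:
    """Recover the NSS text from a fully or partially %encoded form."""
    # Step 1: remove %encoding.  Octets that arrived unencoded (e.g. over an
    # 8-bit-clean transport) pass through unchanged; str input is first
    # converted to UTF-8 octets by unquote_to_bytes.
    octets = unquote_to_bytes(received)
    # Step 2: interpret the resulting octets as UTF-8.
    return octets.decode("utf-8")
```

   Both b"caf%C3%A9" (7-bit form) and b"caf\xc3\xa9" (8-bit form) decode
   to the same string, "caf\u00e9", which is what the requirement for
   identical results relies on.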

   It should be noted that certain characters in the Namespace Specific
   String syntax may have special meaning in certain namespaces.
   Therefore, the process of registering a namespace identifier shall
   include publication of a definition of which characters have a
   special meaning and how to encode these characters if used in a
   literal sense.

2. Support of existing legacy naming systems

   To support existing legacy naming systems (as required by [2]), the
   Namespace Specific String shall be treated as an opaque string: no
   structure is imposed on it beyond what is specified in Section 1.

   In addition, URN resolvers should be prepared to accept URNs that do
   not use ISO 10646/UTF-8 for those namespaces that currently use a
   different encoding.  This is not a general requirement on all
   resolvers; it applies only to resolvers that handle a namespace known
   not to use ISO 10646/UTF-8.

3. URN encoding for transmission

   Because the NSS of a URN is considered a series of octets of data,
   encoding URNs for transport is the responsibility of the transport
   mechanism and is not discussed here.  Any mechanism that can handle
   arbitrary 8-bit data will successfully transport a URN.

4. Equivalence in URNs

   URNs are considered equivalent if resolving them returns the same
   result.  For various purposes, such as caching, a test is needed that
   determines equivalence without actually resolving the URNs and
   fetching and comparing the underlying resources.  "Lexical
   equivalence" is such a test; it is a stricter condition than the
   equivalence described above (functional equivalence).

4.1 Lexical Equivalence

   Lexical equivalence may be determined by comparing two URNs without
   making any network accesses. Two URNs are lexically equivalent if
   they are octet-by-octet equal after the following preprocessing steps:

           1. remove any %encoding that might be present
           2. drop any preceding "urn:" token
           3. normalize the case of the NID

   Some namespaces may define additional lexical equivalences, such as
   case-insensitivity of the NSS (or parts thereof).  Additional lexical
   equivalences MUST be documented as part of namespace registration,
   MUST have the effect of eliminating some of the false negatives
   obtained by the procedure above, and MUST NEVER declare two URNs
   non-equivalent if the procedure above declares them equivalent.
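   The three preprocessing steps, plus an optional namespace-specific
   rule, can be sketched in Python as below.  The names are hypothetical,
   the "urn:" prefix is assumed to be matched case-insensitively, and the
   extra parameter stands in for whatever additional equivalence a
   namespace registration might define.  The extra rule is consulted only
   when the base comparison fails, so it can remove false negatives but
   can never declare non-equivalent a pair that the base procedure
   declares equivalent.

```python
from typing import Callable, Optional
from urllib.parse import unquote_to_bytes


def normalize_urn(urn: str) -> bytes:
    """Apply the Section 4.1 preprocessing steps and return the octets."""
    octets = unquote_to_bytes(urn)          # 1. remove any %encoding
    if octets[:4].lower() == b"urn:":       # 2. drop a preceding "urn:"
        octets = octets[4:]                 #    (assumed case-insensitive)
    nid, sep, nss = octets.partition(b":")
    return nid.lower() + sep + nss          # 3. normalize the case of the NID


def lexically_equivalent(
    a: str,
    b: str,
    extra: Optional[Callable[[bytes], bytes]] = None,
) -> bool:
    """Octet-by-octet comparison after preprocessing.

    `extra` is a hypothetical namespace-specific normalizer, applied to
    both sides alike and only when the base comparison fails.
    """
    na, nb = normalize_urn(a), normalize_urn(b)
    if na == nb:
        return True
    return extra is not None and extra(na) == extra(nb)
```

   Applied to the examples in Section 4.3, "urn:isbn:1-23485-8-29" is
   lexically equivalent to "isbn:1-23485-8-29" and "ISBN:1-23485-8-29"
   but not to "isbn:123485829".  A hypothetical namespace whose NSS is
   case insensitive could pass extra=bytes.lower.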

4.2 Functional Equivalence

   Resolvers determine functional equivalence based on specific rules
   for the namespace.  Therefore, namespace registration must include
   documentation on how to determine functional equivalence for that
   namespace.

4.3 Examples

   The following URN comparisons highlight the difference between these
   types of equivalence:

     urn:isbn:1-23485-8-29, isbn:1-23485-8-29 are lexically equiv.
     urn:isbn:1-23485-8-29, ISBN:1-23485-8-29 are lexically equiv.
     urn:isbn:1-23485-8-29, isbn:123485829 are not lexically equiv.
        but may be functionally equivalent.

5. Security considerations

   Because of the number of potential namespaces, it bears repeating
   that certain characters in the Namespace Specific String may have
   special meaning to certain namespace resolvers.  The process of
   registering a namespace identifier shall therefore include
   publication of a definition of which characters have a special
   meaning and how to encode these characters if used in a literal
   sense.

6. Acknowledgments

   Thanks to various members of the URN working group and <<your name
   here!!>> for comments on earlier drafts of this document.  This
   document is partially supported by the National Science Foundation.

7. References

   Request For Comments (RFC) and Internet Draft documents are available
   from <URL:ftp://ftp.internic.net> and numerous mirror sites.

   [1] L. L. Daigle, P. Faltstrom, R. Iannella.  "A Framework for the
       Assignment and Resolution of Uniform Resource Names", Internet
       Draft (work in progress).  June 1996.

   [2] K. Sollins, L. Masinter.  "Functional Requirements for Uniform
       Resource Names", RFC 1737.  December 1994.

   [3] T. Berners-Lee.  "Universal Resource Identifiers in WWW", RFC
       1630.  June 1994.

   [4] T. Berners-Lee, L. Masinter, M. McCahill.  "Uniform Resource
       Locators (URL)", RFC 1738.  December 1994.

8. Author's address

   Ryan Moats
   AT&T
   15621 Drexel Circle
   Omaha, NE 68135-2358
   USA

   Phone:  +1 402 894-9456
   EMail:  jayhawk@ds.internic.net


                This Internet Draft expires April 1, 1997.

