Internet-Draft                                                Ryan Moats
draft-ietf-urn-syntax-00.txt                                        AT&T
Expires in six months                                       October 1996


                               URN Syntax
                 Filename: draft-ietf-urn-syntax-00.txt


Status of This Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its
areas, and its working groups. Note that other groups may also
distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-Drafts
as reference material or to cite them other than as ``work in
progress.''

To learn the current status of any Internet-Draft, please check
the ``1id-abstracts.txt'' listing contained in the Internet-Drafts
Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net
(Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East
Coast), or ftp.isi.edu (US West Coast).


Abstract

Uniform Resource Names (URNs) are intended to serve as persistent
resource identifiers. This document presents the syntax for URNs.
Support for existing legacy namespaces is discussed. URN transmission
encoding requirements are presented. Finally, there is a discussion
of URN equivalence and how to determine it.

1. Syntax

All URNs have the following syntax:

<URN> ::= ["urn:"] <NID> ":" <NSS>

<NID> is the Namespace Identifier, and <NSS> is the Namespace
Specific String. The leading "urn:" sequence is currently optional,
as no final decision has been reached on whether it will be present
or absent. The Namespace ID is used to determine the _syntactic_
interpretation of the Namespace Specific String (as discussed in
[1]).

RFC 1737 [2] suggests additional requirements on URN encoding, all
of which tend to restrict the syntax. On the other hand, the
requirement to support existing legacy naming systems has the
effect of broadening the syntax. Thus, we discuss the acceptable
syntax for the Namespace Identifier and the Namespace Specific
String separately.

1.1 Namespace Identifier Syntax

To (a) be consistent with all potential resolution schemes and
(b) not put any undue constraints on any potential resolution
scheme, the syntax for the Namespace Identifier is:

<NID> ::= <letter> [ <let-hyp> ]

<let-hyp> ::= <letter> | "-"

<letter> ::= any one of the 52 alphabetic characters A through Z
in upper case and a through z in lower case

This is slightly more restrictive than what is stated in RFC 1738 [4]
(which allows the period "."). Further, the Namespace Identifier is
case insensitive, so that "ISBN" and "isbn" refer to the same
namespace.

To avoid confusion with the optional "urn:" identifier, the NID "urn"
is reserved and may not be used.
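
As an illustration only, the rules above might be applied as in the
following Python sketch. The names split_urn and NID_PATTERN are
hypothetical and not part of this document. The sketch assumes that
the optional <let-hyp> part may repeat, since the example NID "isbn"
is four characters long, and that the "urn:" prefix is matched
literally in lower case.

```python
import re
from typing import Tuple

# Assumption: a letter followed by any number of letters or hyphens.
# The grammar in the text does not show repetition explicitly.
NID_PATTERN = re.compile(r"[A-Za-z][A-Za-z-]*")


def split_urn(urn: str) -> Tuple[str, str]:
    """Split a URN into (NID, NSS); the NID is returned in lower case."""
    # The leading "urn:" sequence is optional, so drop it if present.
    if urn.startswith("urn:"):
        urn = urn[len("urn:"):]
    # The first ":" separates NID from NSS; the NSS may contain more.
    nid, sep, nss = urn.partition(":")
    if not sep:
        raise ValueError("missing ':' between NID and NSS")
    if not NID_PATTERN.fullmatch(nid):
        raise ValueError("invalid NID: %r" % nid)
    # The NID "urn" is reserved; NIDs are case insensitive.
    if nid.lower() == "urn":
        raise ValueError('the NID "urn" is reserved')
    return nid.lower(), nss
```

For example, split_urn("urn:ISBN:1-23485-8-29") returns the pair
("isbn", "1-23485-8-29"), while split_urn("urn:urn:x") raises
ValueError because the NID would be the reserved word "urn".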

1.2 Namespace Specific String Syntax

Depending on the rules governing a namespace, valid identifiers in a
namespace might contain characters that are reserved characters in
URI syntax or non-printable ASCII characters. To accommodate the
largest set of valid identifiers, the NSS portion of a URN shall use
the UTF-8 representation of ISO 10646 as its character set.
Namespaces that do not currently use ISO 10646/UTF-8 are encouraged
to migrate to it.

Clients MUST be capable of %encoding the UTF-8 formatted NSS.
%encoding (as discussed in [3]) uses a percent sign "%" immediately
followed by two hexadecimal digits (0-9, A-F) giving the value of
the encoded octet. The rules for %encoding presented in [3] apply
with the following exceptions:

1. [3] states that occurrence of the '/' character in URIs must
   denote hierarchy, so that partial forms of a URI are possible.
   This restriction is unenforceable, and relative URLs do not have a
   scheme prefix, so we allow URNs to contain unescaped occurrences
   of the '/' character that do not denote hierarchy.

2. As an optimization when the transport between systems is known
   to be 8-bit-clean, clients MAY omit the %encoding on 8-bit
   characters but MUST still %encode the reserved characters below.

For historic reasons, the characters "#" (%23), "?" (%3F), "%" (%25),
"*" (%2A), "!" (%21), "<" (%3C), ">" (%3E), and '"' (%22) are
reserved and must be %encoded. Thus client implementers should
accept URNs from users in an unencoded form but must encode them
before sending them to a resolver.
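
A minimal Python sketch of the client-side encoding described above
follows. The function name encode_nss and its eight_bit_clean
parameter are hypothetical. The sketch covers only the reserved
characters listed in this section and octets above 0x7F; any further
rules inherited from [3] are not modeled here.

```python
# The eight characters this section lists as reserved, as octet values.
RESERVED = frozenset(b'#?%*!<>"')


def encode_nss(nss: str, eight_bit_clean: bool = False) -> bytes:
    """%encode an NSS supplied by a user in unencoded form."""
    out = bytearray()
    # The NSS is first represented in UTF-8, then encoded octet by octet.
    for octet in nss.encode("utf-8"):
        is_8bit = octet >= 0x80
        # Reserved characters are always encoded; 8-bit octets may be
        # left alone only when the transport is known to be 8-bit-clean.
        if octet in RESERVED or (is_8bit and not eight_bit_clean):
            out += ("%%%02X" % octet).encode("ascii")
        else:
            out.append(octet)
    return bytes(out)
```

With these assumptions, encode_nss("a#b") yields b"a%23b", and an
unescaped '/' passes through unchanged, in line with exception 1.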

URN resolvers MUST be capable of accepting URNs that have been
%encoded for either 8-bit clean or 7-bit transports. %encoding is
removed first, then UTF-8 decoding is performed. URN resolvers MUST
return identical results from ANY legally encoded form of the URN.
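
The decoding order stated above (remove %encoding, then decode UTF-8)
might look like the following Python sketch. The name decode_nss is
hypothetical. Accepting lower-case hexadecimal digits and rejecting a
stray "%" are assumptions of this sketch, not requirements stated in
this document.

```python
import re

# A "%" followed by two hexadecimal digits (lower case accepted here
# as a leniency; the text only mentions 0-9 and A-F).
_ESCAPE = re.compile(rb"%([0-9A-Fa-f]{2})")
# A "%" that does not start a valid escape.
_STRAY = re.compile(rb"%(?![0-9A-Fa-f]{2})")


def decode_nss(wire: bytes) -> str:
    """Recover the NSS from its transmitted form."""
    # "%" is reserved, so an unescaped one cannot be legal input.
    if _STRAY.search(wire):
        raise ValueError("malformed %encoding")
    # Step 1: remove the %encoding in a single pass, giving raw octets.
    octets = _ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), wire)
    # Step 2: interpret the octets as UTF-8.
    return octets.decode("utf-8")
```

Both the 7-bit form b"%C3%A9" and the 8-bit-clean form b"\xc3\xa9"
decode to the same one-character string, which is what allows a
resolver to return identical results for either encoding.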

It should be noted that certain characters in the Namespace Specific
String syntax may have special meaning in certain namespaces.
Therefore, the process of registering a namespace identifier shall
include publication of a definition of which characters have a
special meaning and how to encode these characters if used in a
literal sense.

2. Support of existing legacy naming systems

To support existing legacy naming systems (as required by [2]), the
Namespace Specific String shall be considered an "opaque string"
with no structure other than that described in Section 1.

In addition, URN resolvers should be prepared to accept URNs that do
not use ISO 10646/UTF-8 for those namespaces that currently use a
different encoding. Note that this is not a general requirement on
all resolvers, only resolvers that handle a namespace that is known
not to use ISO 10646/UTF-8.

3. URN encoding for transmission

Because the NSS of a URN is considered a series of octets of data,
encoding URNs for transport is the responsibility of the transport
mechanism and is not discussed here. Any mechanism that can handle
arbitrary 8-bit data will successfully transport a URN.

4. Equivalence in URNs

URNs are considered equivalent if they return the same result. For
various purposes, such as caching, a test is necessary to determine
equivalence without actually resolving the URNs and
fetching/comparing the underlying resources. "Lexical equivalence"
is a stricter condition than the equivalence described above
(functional equivalence).

4.1 Lexical Equivalence

Lexical equivalence may be determined by comparing two URNs without
making any network accesses. Two URNs are lexically equivalent if
they are octet-by-octet equal after the following preprocessing:

1. remove any %encoding that might be present
2. drop any preceding "urn:" token
3. normalize the case of the NID

Some namespaces may define additional lexical equivalences, such as
case-insensitivity of the NSS (or parts thereof). Additional lexical
equivalences MUST be documented as part of namespace registration,
MUST always have the effect of eliminating some of the false
negatives obtained by the procedure above, and MUST NEVER say that
two URNs are not equivalent if the procedure above says they are
equivalent.
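
The three preprocessing steps can be sketched in Python as shown
below. The names normalize and lexically_equivalent are hypothetical.
The sketch assumes URNs are handled as octet strings, lower-cases the
NID as its case normalization, and does not apply any
namespace-specific equivalences.

```python
import re

_ESCAPE = re.compile(rb"%([0-9A-Fa-f]{2})")


def normalize(urn: bytes) -> bytes:
    """Apply the three preprocessing steps, in the order listed."""
    # 1. Remove any %encoding that might be present.
    urn = _ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), urn)
    # 2. Drop any preceding "urn:" token.
    if urn.startswith(b"urn:"):
        urn = urn[len(b"urn:"):]
    # 3. Normalize the case of the NID (everything before the first ":").
    nid, sep, nss = urn.partition(b":")
    return nid.lower() + sep + nss


def lexically_equivalent(a: bytes, b: bytes) -> bool:
    """Octet-by-octet comparison after preprocessing; no network access."""
    return normalize(a) == normalize(b)
```

Because only the NID is case-normalized, two URNs whose NSS parts
differ in case or punctuation compare as different here, even if
their namespace would treat them as functionally equivalent.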

4.2 Functional Equivalence

Resolvers determine functional equivalence based on specific rules
for the namespace. Therefore, namespace registration must include
documentation on how to determine functional equivalence for that
namespace.

4.3 Examples

The following URN comparisons highlight the difference between these
types of equivalence:

urn:isbn:1-23485-8-29, isbn:1-23485-8-29   are lexically equivalent.
urn:isbn:1-23485-8-29, ISBN:1-23485-8-29   are lexically equivalent.
urn:isbn:1-23485-8-29, isbn:123485829      are not lexically
                                           equivalent but may be
                                           functionally equivalent.

5. Security considerations

Because of the number of potential namespaces, it must be restated
that certain characters in the Namespace Specific String may have
special meaning to certain namespace resolvers. The process of
registering a namespace identifier shall therefore include
publication of a definition of which characters have a special
meaning and how to encode these characters if used in a literal
sense.

6. Acknowledgments

Thanks to various members of the URN working group and <<your name
here!!>> for comments on earlier drafts of this document. This
document is partially supported by the National Science Foundation.

7. References

Request For Comments (RFC) and Internet Draft documents are available
from <URL:ftp://ftp.internic.net> and numerous mirror sites.

[1] L. L. Daigle, P. Faltstrom, R. Iannella. "A Framework for the
    Assignment and Resolution of Uniform Resource Names", Internet
    Draft (work in progress). June 1996.

[2] K. Sollins, L. Masinter. "Functional Requirements for Uniform
    Resource Names", RFC 1737. December 1994.

[3] T. Berners-Lee. "Universal Resource Identifiers in WWW", RFC
    1630. June 1994.

[4] T. Berners-Lee, L. Masinter, M. McCahill. "Uniform Resource
    Locators (URL)", RFC 1738. December 1994.

8. Author's address

Ryan Moats
AT&T
15621 Drexel Circle
Omaha, NE 68135-2358
USA

Phone: +1 402 894-9456
EMail: jayhawk@ds.internic.net


This Internet Draft expires April 1, 1997.