Internet-Draft                                              Ryan Moats
draft-ietf-urn-syntax-00.txt                                      AT&T
Expires in six months                                     October 1996


                              URN Syntax

                Filename: draft-ietf-urn-syntax-00.txt
|
Status of This Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its
areas, and its working groups. Note that other groups may also
distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use
Internet-Drafts as reference material or to cite them other than
as ``work in progress.''

To learn the current status of any Internet-Draft, please check
the ``1id-abstracts.txt'' listing contained in the
Internet-Drafts Shadow Directories on ftp.is.co.za (Africa),
nic.nordu.net (Europe), munnari.oz.au (Pacific Rim),
ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).
|
Abstract

Uniform Resource Names (URNs) are intended to serve as persistent
resource identifiers. This document presents the syntax for URNs.
Support for existing legacy namespaces is discussed. URN transmission
encoding requirements are presented. Finally, there is a discussion
of URN equivalence and how to determine it.
|
1. Syntax

All URNs have the following syntax:

   <URN> ::= ["urn:"] <NID> ":" <NSS>

<NID> is the Namespace Identifier, and <NSS> is the Namespace
Specific String. The leading "urn:" sequence is currently optional,
because no decision has yet been reached on whether it must be
present or absent. The Namespace ID is used to determine the
_syntactic_ interpretation of the Namespace Specific String (as
discussed in [1]).
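
The grammar above can be sketched as a small splitting routine. This is
only an illustration under stated assumptions: the function name
split_urn is hypothetical, the "urn:" prefix is assumed to be matched
without regard to case, and the NID is taken to end at the first ":".

```python
from typing import Tuple


def split_urn(urn: str) -> Tuple[str, str]:
    """Split a URN into (NID, NSS), dropping an optional leading "urn:"."""
    # Assumption: the optional prefix is recognised case-insensitively.
    if urn[:4].lower() == "urn:":
        urn = urn[4:]
    # The NID runs up to the first ":"; everything after it is the NSS.
    nid, sep, nss = urn.partition(":")
    if not sep or not nid or not nss:
        raise ValueError("not of the form [urn:]<NID>:<NSS>")
    return nid, nss


if __name__ == "__main__":
    print(split_urn("urn:isbn:1-23485-8-29"))
    print(split_urn("ISBN:1-23485-8-29"))
```

Splitting at the first ":" leaves any later ":" inside the NSS, which
fits the NID syntax of Section 1.1 because a NID cannot contain ":".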
|
RFC 1737 [2] suggests additional requirements on URN encoding, all of
which tend to limit the syntax. On the other hand, the requirement to
support existing legacy naming systems has the effect of broadening
the syntax. Thus, we discuss the acceptable syntax for the Namespace
Identifier and for the Namespace Specific String separately.
|
1.1 Namespace Identifier Syntax

The following is the syntax for the Namespace Identifier. To (a) be
consistent with all potential resolution schemes and (b) not put any
undue constraints on any potential resolution scheme, the syntax for
the Namespace Identifier is:

   <NID> ::= <letter> [ <let-hyp> ]

   <let-hyp> ::= <letter> | "-"

   <letter> ::= any one of the 52 alphabetic characters A through Z
                in upper case and a through z in lower case

This is slightly more restrictive than what is stated in RFC 1738 [4]
(which allows the period "."). Further, the Namespace Identifier is
case insensitive, so that "ISBN" and "isbn" refer to the same
namespace.

To avoid confusion with the optional "urn:" identifier, the NID "urn"
is reserved and may not be used.
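
A minimal sketch of a check for these rules follows. It rests on two
assumptions that the text does not state outright: that <let-hyp> may
repeat, so a NID is a letter followed by any number of letters or
hyphens, and that the reservation of "urn" applies in any case. The
names is_valid_nid and same_namespace are hypothetical.

```python
import re

# Assumption: <let-hyp> may repeat (one letter, then letters or hyphens).
_NID_RE = re.compile(r"[A-Za-z][A-Za-z-]*")


def is_valid_nid(nid: str) -> bool:
    """Return True if nid fits the NID syntax and is not the reserved "urn"."""
    if _NID_RE.fullmatch(nid) is None:
        return False
    # Assumption: "urn" is reserved regardless of case, since NIDs are
    # case insensitive.
    return nid.lower() != "urn"


def same_namespace(a: str, b: str) -> bool:
    """NIDs are case insensitive, so compare them after lowercasing."""
    return a.lower() == b.lower()
```

With these definitions, is_valid_nid("a.b") is False because the period
is not allowed, and same_namespace("ISBN", "isbn") is True.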
|
1.2 Namespace Specific String Syntax

Depending on the rules governing a namespace, valid identifiers in a
namespace might contain characters that are reserved characters in
URI syntax or non-printable ASCII characters. To accommodate the
largest set of valid identifiers, the NSS portion of a URN shall use
UTF-8 representation of ISO 10646 as its character set. Namespaces
that do not currently use ISO 10646/UTF-8 are encouraged to migrate
to it.
|
Clients MUST be capable of %encoding the UTF-8 formatted NSS.
%encoding (as discussed in [3]) uses a percent sign "%" immediately
followed by two hexadecimal digits (0-9, A-F) giving the binary code
for that octet. The rules for %encoding presented in [3] apply with
the following exceptions:

1. [3] states that occurrences of the '/' character in URIs must
   denote hierarchy, so that partial forms of a URI are possible.
   This restriction is unenforceable, and relative URLs do not have a
   scheme prefix, so we allow URNs to contain unescaped occurrences
   of the '/' character that do not denote hierarchy.

2. As an optimization when the transport between systems is known
   to be 8-bit-clean, clients MAY omit the %encoding on 8-bit
   characters but MUST still %encode the reserved characters below.

For historic reasons, the characters "#" (%23), "?" (%3F), "%" (%25),
"*" (%2A), "!" (%21), "<" (%3C), ">" (%3E), and '"' (%22) are
reserved and must be %encoded. Thus client implementers should
accept URNs from users in an unencoded form but must encode them
before sending them to a resolver.
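
The client-side rules above can be sketched as follows. This is an
illustration only: the names encode_nss and eight_bit_clean are
hypothetical, and the sketch covers just the rules stated in this
section (the eight reserved characters and 8-bit octets). Any further
escaping required by [3], for example of spaces or control characters,
is not handled here.

```python
# The eight characters this section reserves, as octet values.
RESERVED = set(b'#?%*!<>"')


def encode_nss(nss: str, eight_bit_clean: bool = False) -> bytes:
    """%encode an unencoded NSS for sending to a resolver."""
    out = bytearray()
    for octet in nss.encode("utf-8"):
        needs_escape = octet in RESERVED
        # Octets above 0x7F come from multi-octet UTF-8 sequences; they
        # may be sent raw only when the transport is 8-bit-clean.
        if octet >= 0x80 and not eight_bit_clean:
            needs_escape = True
        if needs_escape:
            out += b"%%%02X" % octet
        else:
            out.append(octet)
    return bytes(out)


if __name__ == "__main__":
    print(encode_nss("a#b"))     # b'a%23b'
    print(encode_nss("100%"))    # b'100%25'
    print(encode_nss("a/b"))     # b'a/b' ('/' is left unescaped)
```

The result is a byte string rather than text because the 8-bit-clean
form carries raw UTF-8 octets.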
|
URN resolvers MUST be capable of accepting URNs that have been
%encoded for either 8-bit clean or 7-bit transports. %encoding is
removed first, then UTF-8 decoding is performed. URN resolvers MUST
return identical results from ANY legally encoded form of the URN.
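
The two-step order described above (remove %encoding, then decode
UTF-8) can be sketched as below. The name decode_nss is hypothetical,
and accepting lower-case hexadecimal digits is an added leniency, not
something this document specifies.

```python
import re

# Assumption: lower-case hex digits are tolerated on input.
_ESCAPE = re.compile(rb"%([0-9A-Fa-f]{2})")


def decode_nss(wire: bytes) -> str:
    """Recover the NSS text from its 7-bit or 8-bit-clean wire form."""
    # Step 1: remove %encoding in a single pass, so that an escaped
    # percent sign (%25) is not interpreted a second time.
    raw = _ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), wire)
    # Step 2: decode the resulting octets as UTF-8.
    return raw.decode("utf-8")
```

Because both wire forms reduce to the same octets after step 1, a
resolver built this way sees the same NSS whichever legal encoding the
client chose.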
|
It should be noted that certain characters in the Namespace Specific
String syntax may have special meaning in certain namespaces.
Therefore, the process of registering a namespace identifier shall
include publication of a definition of which characters have a
special meaning and how to encode these characters if used in a
literal sense.
|
2. Support of existing legacy naming systems

To allow for support of existing legacy naming systems (as required
by [2]), the Namespace Specific String shall be considered an "opaque
string" in the sense of structure except as mentioned in Section 1.

In addition, URN servers should be prepared to accept URNs that do
not use ISO 10646/UTF-8 for those namespaces that currently use a
different encoding. Note that this is not a general requirement on
all resolvers, only resolvers that handle a namespace that is known
not to use ISO 10646/UTF-8.
|
3. URN encoding for transmission

Because the NSS of a URN is considered a series of octets of data,
encoding URNs for transport is the responsibility of the transport
mechanism and is not discussed here. Any mechanism that can handle
arbitrary 8-bit data will successfully transport a URN.
|
4. Equivalence in URNs

URNs are considered equivalent if they return the same result. For
various purposes, such as caching, a test is necessary to determine
equivalence without actually resolving the URNs and fetching/comparing
the underlying resources. "Lexical equivalence" is a stricter condition
than the equivalence described above (functional equivalence).
|
4.1 Lexical Equivalence

Lexical equivalence may be determined by comparing two URNs without
making any network accesses. Two URNs are lexically equivalent if
they are octet-by-octet equal after the following preprocessing:

1. remove any %encoding that might be present
2. drop any preceding "urn:" token
3. normalize the case of the NID
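
The three preprocessing steps can be sketched as a comparison routine.
This is a minimal illustration, not a reference implementation: the
names normalize and lexically_equivalent are hypothetical, the "urn:"
token is assumed to be matched without regard to case, and the NID is
normalized to lower case (upper case would serve equally well).

```python
import re

_ESCAPE = re.compile(rb"%([0-9A-Fa-f]{2})")


def normalize(urn: str) -> bytes:
    """Apply the three preprocessing steps and return the octets to compare."""
    data = urn.encode("utf-8")
    # Step 1: remove any %encoding that might be present.
    data = _ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), data)
    # Step 2: drop any preceding "urn:" token (assumed case-insensitive).
    if data[:4].lower() == b"urn:":
        data = data[4:]
    # Step 3: normalize the case of the NID only; the NSS is untouched.
    nid, sep, nss = data.partition(b":")
    return nid.lower() + sep + nss


def lexically_equivalent(a: str, b: str) -> bool:
    """Octet-by-octet comparison after preprocessing."""
    return normalize(a) == normalize(b)
```

For example, lexically_equivalent("urn:isbn:1-23485-8-29",
"ISBN:1-23485-8-29") is True, while comparing the same URN against
"isbn:123485829" gives False because the NSS octets differ.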
|
Some namespaces may define additional lexical equivalences, such as
case-insensitivity of the NSS (or parts thereof). Additional lexical
equivalences MUST be documented as part of namespace registration,
MUST always have the effect of eliminating some of the false
negatives obtained by the procedure above, and MUST NEVER say that
two URNs are not equivalent if the procedure above says they are
equivalent.
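
One way to respect these constraints is to consult a namespace rule
only after the baseline comparison has failed, so that a rule can add
equivalences but never take one away. The sketch below illustrates
this. Everything in it is hypothetical: the names baseline_key,
EXTRA_RULES and equivalent, and in particular the hyphen-stripping
rule for "isbn", which is invented for illustration and is not a
registered rule of any namespace.

```python
import re
from typing import Callable, Dict

_ESCAPE = re.compile(rb"%([0-9A-Fa-f]{2})")


def baseline_key(urn: str) -> bytes:
    """Baseline preprocessing: unescape, drop "urn:", lower-case the NID."""
    data = _ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]),
                       urn.encode("utf-8"))
    if data[:4].lower() == b"urn:":
        data = data[4:]
    nid, sep, nss = data.partition(b":")
    return nid.lower() + sep + nss


# Hypothetical per-namespace rules, keyed by lower-case NID. Each maps
# an NSS to a canonical form. The "isbn" entry is an invented example.
EXTRA_RULES: Dict[bytes, Callable[[bytes], bytes]] = {
    b"isbn": lambda nss: nss.replace(b"-", b""),
}


def equivalent(a: str, b: str) -> bool:
    key_a, key_b = baseline_key(a), baseline_key(b)
    if key_a == key_b:
        return True  # the baseline verdict is never overridden
    nid_a, _, nss_a = key_a.partition(b":")
    nid_b, _, nss_b = key_b.partition(b":")
    rule = EXTRA_RULES.get(nid_a)
    # A namespace rule can only turn a "not equivalent" into "equivalent".
    return nid_a == nid_b and rule is not None and rule(nss_a) == rule(nss_b)
```

Because the namespace rule is reached only when the baseline says "not
equivalent", the combined test cannot contradict a baseline match.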
|
4.2 Functional Equivalence

Resolvers determine functional equivalence based on specific rules
for the namespace. Therefore, namespace registration must include
documentation on how to determine functional equivalence for that
namespace.
|
4.3 Examples

The following URN comparisons highlight the difference between these
types of equivalence:

   urn:isbn:1-23485-8-29, isbn:1-23485-8-29 are lexically equivalent.
   urn:isbn:1-23485-8-29, ISBN:1-23485-8-29 are lexically equivalent.
   urn:isbn:1-23485-8-29, isbn:123485829 are not lexically equivalent
   but may be functionally equivalent.
|
5. Security considerations

Because of the number of potential namespaces, it must be restated
that certain of the characters in the Namespace Specific String may
have special meaning to certain namespace resolvers. The process of
registering a namespace identifier shall therefore include
publication of a definition of which characters have a special
meaning and how to encode these characters if used in a literal
sense.
|
6. Acknowledgments

Thanks to various members of the URN working group and <<your name
here!!>> for comments on earlier drafts of this document. This
document is partially supported by the National Science Foundation.
|
7. References

Request For Comments (RFC) and Internet Draft documents are available
from <URL:ftp://ftp.internic.net> and numerous mirror sites.

[1] L. L. Daigle, P. Faltstrom, R. Iannella. "A Framework for the
    Assignment and Resolution of Uniform Resource Names", Internet
    Draft (work in progress). June 1996.

[2] K. Sollins, L. Masinter. "Functional Requirements for Uniform
    Resource Names", RFC 1737. December 1994.

[3] T. Berners-Lee. "Universal Resource Identifiers in WWW", RFC
    1630. June 1994.

[4] T. Berners-Lee, L. Masinter, M. McCahill. "Uniform Resource
    Locators (URL)", RFC 1738. December 1994.
|
8. Author's address

Ryan Moats
AT&T
15621 Drexel Circle
Omaha, NE 68135-2358
USA

Phone: +1 402 894-9456
EMail: jayhawk@ds.internic.net
|
This Internet Draft expires April 1, 1997.