Uniform Resource Identifiers Working Group               R. T. Fielding
INTERNET-DRAFT                                                UC Irvine
Expires May 27, 1995                                  November 27, 1994


                   Relative Uniform Resource Locators
                  <draft-ietf-uri-relative-url-02.txt>


Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-Drafts
as reference material or to cite them other than as ``work in
progress.''

To learn the current status of any Internet-Draft, please check
the ``1id-abstracts.txt'' listing contained in the Internet-Drafts
Shadow Directories on ds.internic.net (US East Coast),
nic.nordu.net (Europe), ftp.isi.edu (US West Coast), or
munnari.oz.au (Pacific Rim).

Distribution of this document is unlimited. Please send comments
to the author, Roy T. Fielding <fielding@ics.uci.edu>, or to the
URI working group (URI-WG) of the Internet Engineering Task Force
(IETF) at <uri@bunyip.com>. Discussions of the group are archived at
<URL:http://www.acl.lanl.gov/URI/archive/uri-archive.index.html>.


Abstract

Uniform Resource Locators (URLs) are a compact representation of the
location and access method for a resource available via the Internet.
When embedded within a base document, a URL in its absolute form may
contain a great deal of information which is already known from the
context of that base document's retrieval, including the access
scheme, network location, and parts of the url-path. In situations
where the base URL is well-defined and known to the parser (human or
machine), it is useful to be able to embed URL references which
inherit that context rather than re-specifying it in every instance.
This document defines the syntax and semantics for such Relative
Uniform Resource Locators.


1. Introduction

This work is derived from concepts introduced by the World-Wide Web
global information initiative, whose use of such objects dates from
1990 and is described in "Universal Resource Identifiers in WWW",
RFC 1630 [1]. This document is a companion to the Internet-Draft
"Uniform Resource Locators (URL)" [2], which specifies the
syntax and semantics of absolute URLs. A URL is "absolute" if it
can be interpreted consistently and unambiguously, with global scope,
independent of any other URL.

This document describes the syntax and semantics for "relative"
Uniform Resource Locators (relative URLs): a compact representation
of the location and access method for a resource available via the
Internet relative to an absolute base URL. The name space of
relative URLs is a superset of that defined in [2] for Uniform
Resource Locators, in that all absolute URLs can be interpreted
consistently relative to any Internet-accessible resource. For the
sake of clarity, however, this document will only term "relative"
those URLs which obtain global scope only when interpreted relative
to a separate base URL.

A common use for Uniform Resource Locators is to embed them within
a document (referred to as the "base" document) for the purpose of
identifying other Internet-accessible resources. For example, in
hypertext documents, URLs can be used as the identifiers for
hypertext link destinations.

Absolute URLs contain a great deal of information which may already
be known from the context of the base document's retrieval,
including the access scheme, network location, and parts of the
URL path. In situations where the base URL is well-defined and
known, it is useful to be able to embed a URL reference which
inherits that context rather than re-specifying it within each
instance.

It is often the case that a group or "tree" of documents has been
constructed to serve a common purpose; the vast majority of URLs
within these documents point to locations within the tree rather
than outside of it. Similarly, documents located at a particular
Internet site are much more likely to refer to other resources at
that site than to resources at remote sites.

Relative addressing of URLs allows document trees to be partially
independent of their location and/or access scheme. For instance,
if they refer to each other using relative URLs, it is possible for
a single set of documents to be simultaneously accessible and, if
hypertext, traversable via each of the "file", "http", and "ftp"
access schemes. Furthermore, document trees can be moved, as a whole,
without changing any of the embedded URLs. Experience within the
World-Wide Web has demonstrated that the ability to perform relative
referencing is necessary for the long-term usability of embedded
URLs.

2. Relative URL Syntax

The syntax for relative URLs is the same as that for absolute URLs
[2], with the exception that portions of the URL may be missing, and
certain path components ("." and "..") have a special meaning when
interpreting a relative URL path. Although this document does not
seek to define the overall URL syntax, some discussion of it is
necessary in order to describe the parsing of relative URLs.

2.1. URL Syntactic Components

Like absolute URLs, relative URL syntax is dependent upon the access
scheme. Some schemes use "?" and ";" to indicate special reserved
components, while others just consider them to be part of the path.
However, there is enough uniformity in the syntax to allow a parser
to resolve relative URLs based upon a few syntactic categories.
These categories are described in Section 2.3.

In general, the relative URL syntax consists of six components:

   <scheme>://<net_loc>/<path>;<params>?<query>#<fragment>

each of which may be absent or may be disallowed by a particular
scheme. They are defined as follows (a complete BNF is provided in
Section 2.2):

   scheme ":"   ::= access scheme name, as per Section 2.1 of [2].

   "//" net_loc ::= network location and login information, as per
                    Section 3.1 of [2].

   "/" path     ::= URL path, as per Section 3.1 of [2].

   ";" params   ::= object parameters (e.g. ";type=a" as in
                    Section 3.2.2 of [2]).

   "?" query    ::= query information, as per Section 3.3 of [2].

   "#" fragment ::= fragment identifier (currently only used within
                    the World-Wide Web initiative).

The order of the components is important. If both <params> and
<query> are present, the <query> information must occur after the
<params>. Relative components are resolved from left-to-right,
according to the rules given in Section 4.
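
As an illustration only, the six components and their fixed order
might be held in a small record such as the following Python sketch.
The class name URLParts, its field names, and the unparse() method
are hypothetical and not part of this specification. The sketch
assumes that None marks an absent component and that any leading
slash is stored together with the path.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class URLParts:
    """Hypothetical record for the six components of Section 2.1."""
    scheme: Optional[str] = None
    net_loc: Optional[str] = None
    path: str = ""                 # stored with its leading "/" if present
    params: Optional[str] = None
    query: Optional[str] = None
    fragment: Optional[str] = None

    def unparse(self) -> str:
        """Recombine the components in the order given in Section 2.1."""
        out = ""
        if self.scheme is not None:
            out += self.scheme + ":"
        if self.net_loc is not None:
            out += "//" + self.net_loc
        out += self.path
        if self.params is not None:
            out += ";" + self.params   # params always precede the query
        if self.query is not None:
            out += "?" + self.query
        if self.fragment is not None:
            out += "#" + self.fragment
        return out
```

A relative URL is then simply a record in which the leading
components are absent, e.g. URLParts(path="../g").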

2.2. BNF for Relative URLs

This is a BNF-like description of the Relative Uniform Resource
Locator syntax, using the conventions of RFC 822 [7], except that
"|" is used to designate alternatives, and brackets "[]" are used
around optional or repeated elements. Briefly, literals are quoted
with "", optional elements are enclosed in [brackets], and elements
may be preceded with <n>* to designate n or more repetitions of the
following element; n defaults to 0.

Because relative URLs are parsed within the context of the base URL,
this BNF is not sufficient to completely specify the allowed syntax
within any given context. Section 2.4 describes a context-sensitive
parsing algorithm which disambiguates the grammar.


   relativeURL = [ absoluteURL | location | abs_path | rel_path ]
                 [ "#" fragment ]

   absoluteURL = scheme ":" *[ uchar | reserved ]
   location    = "//" net_loc [ "/" rel_path ]
   abs_path    = "/" rel_path
   rel_path    = [ path ] [ ";" params ] [ "?" query ]

   path        = segment *[ "/" segment ]
   segment     = *[ pchar | ";" ]

   params      = param *[ ";" param ]
   param       = *[ pchar | "/" ]

   scheme      = 1*[ alpha | digit | "+" | "-" | "." ]
   net_loc     = *[ pchar | ";" ]
   query       = *[ uchar | reserved ]
   fragment    = *[ uchar | reserved ]

   pchar       = [ uchar | "?" | ":" | "@" | "&" | "=" ]
   uchar       = unreserved | escape
   unreserved  = alpha | digit | safe | extra | national

   escape      = "%" hex hex
   hex         = digit | "A" | "B" | "C" | "D" | "E" | "F" |
                         "a" | "b" | "c" | "d" | "e" | "f"

   alpha       = lowalpha | hialpha
   lowalpha    = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" |
                 "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" |
                 "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
   hialpha     = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
                 "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
                 "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"

   digit       = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
                 "8" | "9"

   safe        = "$" | "-" | "_" | "." | "+"
   extra       = "!" | "*" | "'" | "(" | ")" | ","
   national    = "{" | "}" | "|" | "\" | "^" | "~" | "[" | "]" | "`"
   reserved    = ";" | "/" | "?" | ":" | "@" | "&" | "="
   punctuation = "<" | ">" | "#" | "%" | <">


2.3. Specific Schemes and their Syntactic Categories

Each URL access scheme has its own rules regarding the presence or
absence of the syntactic components described in Sections 2.1 and
2.2. However, there is enough commonality among the schemes to be
able to group them into just a few categories. These categories are
sufficiently general to allow new schemes to be added without
substantial changes to the algorithm for resolving relative URLs.

Within this section, we include as examples only those schemes
which have a defined URL syntax in [2]. This includes:

   ftp       File Transfer Protocol [3]
   http      Hypertext Transfer Protocol [4]
   gopher    Gopher and Gopher+ Protocols [5, 6]
   mailto    Electronic Mail [7]
   news      USENET news [8]
   nntp      USENET news using NNTP access [9]
   telnet    TELNET Protocol for Interactive Sessions [10]
   wais      Wide Area Information Servers Protocol [11, 12]
   file      Host-specific Files
   prospero  Prospero Directory Service [13]

It is recommended that new schemes include a description of their
membership in the following categories when they are registered,
as per Section 4 of [2]. Membership in the five categories is
described in terms of named sets: Uses-Netloc, Non-Hierarchical,
Uses-Params, Uses-Query, and Uses-Fragment.

2.3.1. The Uses-Netloc Set

The Uses-Netloc set includes those access schemes which use the
Common Internet Scheme Syntax described in Section 3.1 of [2], where
the network location and/or login information starts with a
double-slash "//" to indicate its presence, and continues until the
following slash "/", if any.

   Uses-Netloc = {ftp, http, gopher, nntp, telnet, wais,
                  file, prospero}

2.3.2. The Non-Hierarchical Set

The Non-Hierarchical set includes those access schemes which do not
use "/" to indicate hierarchical segments in the URL path.

   Non-Hierarchical = {gopher, wais, mailto, news, telnet, prospero}

When resolving relative paths for schemes not in the Non-Hierarchical
set, the complete path segments ".." and "." have a significance
reserved for representing the path hierarchy, indicating up-one-level
and current-level, respectively.

2.3.3. The Uses-Params Set

The Uses-Params set includes those access schemes which allow the
semicolon ";" character to separate object parameters from the
URL path. There may be more than one parameter, each being
separated by a semicolon ";".

   Uses-Params = {ftp, http, prospero}

2.3.4. The Uses-Query Set

The Uses-Query set includes those access schemes which allow the
question mark "?" character to separate query information from the
URL path.

   Uses-Query = {http, wais}

2.3.5. The Uses-Fragment Set

The Uses-Fragment set includes those access schemes which allow the
crosshatch "#" character to separate a fragment identifier from
the rest of the URL. Within systems that use fragment identifiers,
the set is:

   Uses-Fragment = {ftp, http, gopher, news, nntp, wais,
                    file, prospero}

Unlike the other sets, however, the fragment identifier is only
reserved within systems which use it. Outside of those systems,
Uses-Fragment is equal to the empty set.

2.3.6. Summary of Categories by Scheme

             Uses-    Non-Hier  Uses-    Uses-    Uses-
             Netloc   archical  Params   Query   Fragment
            .--------------------------------------------.
   ftp      |  XXXX  |        |  XXXX  |        |  XXXX  |
   http     |  XXXX  |        |  XXXX  |  XXXX  |  XXXX  |
   gopher   |  XXXX  |  XXXX  |        |        |  XXXX  |
   mailto   |        |  XXXX  |        |        |        |
   news     |        |  XXXX  |        |        |  XXXX  |
   nntp     |  XXXX  |        |        |        |  XXXX  |
   telnet   |  XXXX  |  XXXX  |        |        |        |
   wais     |  XXXX  |  XXXX  |        |  XXXX  |  XXXX  |
   file     |  XXXX  |        |        |        |  XXXX  |
   prospero |  XXXX  |  XXXX  |  XXXX  |        |  XXXX  |
            `--------------------------------------------'
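
For implementers, the five named sets above translate directly into
lookup tables. The following Python sketch is illustrative only: the
constant names and the categories() helper are hypothetical, and it
assumes that scheme names are compared in lower case.

```python
from typing import List

# The five named sets of Section 2.3, copied from the definitions above.
USES_NETLOC = frozenset({"ftp", "http", "gopher", "nntp", "telnet",
                         "wais", "file", "prospero"})
NON_HIERARCHICAL = frozenset({"gopher", "wais", "mailto", "news",
                              "telnet", "prospero"})
USES_PARAMS = frozenset({"ftp", "http", "prospero"})
USES_QUERY = frozenset({"http", "wais"})
USES_FRAGMENT = frozenset({"ftp", "http", "gopher", "news", "nntp",
                           "wais", "file", "prospero"})

_CATEGORIES = [
    ("Uses-Netloc", USES_NETLOC),
    ("Non-Hierarchical", NON_HIERARCHICAL),
    ("Uses-Params", USES_PARAMS),
    ("Uses-Query", USES_QUERY),
    ("Uses-Fragment", USES_FRAGMENT),
]


def categories(scheme: str) -> List[str]:
    """Return the names of the sets that contain the given scheme."""
    scheme = scheme.lower()  # assumption: schemes are matched in lower case
    return [name for name, members in _CATEGORIES if scheme in members]
```

Calling categories("wais"), for example, reproduces the "wais" row
of the summary table.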

2.4. Parsing a URL

An accepted method for parsing URLs is necessary to disambiguate the
relative URL syntax of Section 2.2 and to describe the algorithm for
resolving relative URLs presented in Section 4. This section
describes the parsing rules for breaking down a URL (relative or
absolute) into the component parts described in Section 2.1. The
rules assume that the URL has already been separated from any
surrounding text and copied to a "parse string". The rules are
listed in the order in which they must be applied by the parser.

2.4.1. Parsing the Scheme

If the parse string contains a colon ":" after the first character
and before any characters not allowed as part of a scheme name
(i.e., any character other than an alphanumeric, plus "+",
period ".", or hyphen "-"), the scheme of the URL is the substring
of characters up to but not including the first colon. These
characters and the colon are then removed from the parse string
before continuing.
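
The scheme rule can be sketched with a single anchored regular
expression. This is a minimal Python illustration, not a normative
implementation; the function name split_scheme is hypothetical.

```python
import re
from typing import Optional, Tuple

# One or more scheme characters (alphanumeric, "+", ".", "-")
# followed immediately by the first colon.
_SCHEME = re.compile(r"([A-Za-z0-9+.\-]+):")


def split_scheme(parse_string: str) -> Tuple[Optional[str], str]:
    """Return (scheme, rest); scheme is None when no scheme is present."""
    match = _SCHEME.match(parse_string)
    if match is None:
        return None, parse_string
    # Remove the scheme and the colon from the parse string.
    return match.group(1), parse_string[match.end():]
```

Because "/" and "?" are not scheme characters, a string such as
"./this:that" or "g?y:z" yields no scheme and is left untouched.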

2.4.2. Parsing the Fragment Identifier

If the scheme is not a member of the Uses-Fragment set, this section
is skipped.

If the parse string contains a crosshatch "#" character, then the
substring after the last (right-most) crosshatch "#" and up to the
end of the parse string is the fragment identifier. If the
crosshatch is the last character, or no crosshatch is present, then
the fragment identifier is empty. The matched substring, including
the crosshatch character, is removed from the parse string before
continuing.

Note that the fragment identifier is not considered part of the URL.
However, since it is often attached to the URL, parsers must be able
to recognize and set aside fragment identifiers as part of the
process.
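
A minimal Python sketch of this rule follows. The function name
split_fragment is hypothetical, and the set is repeated here only so
that the example is self-contained.

```python
from typing import Tuple

USES_FRAGMENT = frozenset({"ftp", "http", "gopher", "news", "nntp",
                           "wais", "file", "prospero"})


def split_fragment(parse_string: str, scheme: str) -> Tuple[str, str]:
    """Return (fragment, rest), splitting at the right-most "#"."""
    if scheme not in USES_FRAGMENT:
        return "", parse_string          # section skipped for this scheme
    rest, sep, fragment = parse_string.rpartition("#")
    if not sep:
        return "", parse_string          # no crosshatch present
    return fragment, rest                # "#" itself is discarded
```

Note the use of the right-most crosshatch: for "//a/b#x#y" the
fragment is "y" and "#x" remains in the parse string.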

2.4.3. Parsing the Network Location/Login

If the scheme is not a member of the Uses-Netloc set, this section
is skipped.

If the parse string begins with a double-slash "//", then the
substring of characters after the double-slash and up to, but not
including, the next slash "/" character is the network location/login
(<net_loc>) of the URL. If no trailing slash "/" is present, the
entire remaining parse string is assigned to <net_loc>. The
double-slash and <net_loc> are removed from the parse string before
continuing.
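
The network location rule might be sketched in Python as shown
below. The function name split_net_loc is hypothetical; None is used
here to mean that no <net_loc> was found.

```python
from typing import Optional, Tuple

USES_NETLOC = frozenset({"ftp", "http", "gopher", "nntp", "telnet",
                         "wais", "file", "prospero"})


def split_net_loc(parse_string: str, scheme: str) -> Tuple[Optional[str], str]:
    """Return (net_loc, rest); net_loc is None when absent."""
    if scheme not in USES_NETLOC or not parse_string.startswith("//"):
        return None, parse_string
    end = parse_string.find("/", 2)      # next slash after the "//"
    if end == -1:
        return parse_string[2:], ""      # no trailing slash: all is net_loc
    return parse_string[2:end], parse_string[end:]
```

The slash that terminates <net_loc> stays in the parse string, so
the later path step can still tell that the path is absolute.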

2.4.4. Parsing the Query Information

If the scheme is not a member of the Uses-Query set, this section
is skipped.

If the parse string contains a question mark "?" character, then the
substring after the first (left-most) question mark "?" and up to the
end of the parse string is the query information. If the question
mark is the last character, or no question mark is present, then the
query information is empty. The matched substring, including the
question mark character, is removed from the parse string before
continuing.

2.4.5. Parsing the Parameters

If the scheme is not a member of the Uses-Params set, this section
is skipped.

If the parse string contains a semicolon ";" character, then the
substring after the first (left-most) semicolon ";" and up to the
end of the parse string is the parameters (<params>). If the
semicolon is the last character, or no semicolon is present, then
<params> is empty. The matched substring, including the semicolon
character, is removed from the parse string before continuing.

2.4.6. Parsing the Path

After the above steps, all that is left of the parse string is
the URL path and the slash "/" that may precede it. Even though
the initial slash is not part of the URL path, the parser must
remember whether or not it was present so that later processes
can differentiate between relative and absolute paths. Often this
is done by simply storing the preceding slash along with the path.
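
The last three parsing steps (Sections 2.4.4 through 2.4.6) can be
sketched together, since their order matters: the query is removed
before the parameters, and the path is whatever remains. The Python
below is illustrative only. The function name and the returned
tuple layout are hypothetical, and the sketch records the leading
slash as a separate flag rather than storing it with the path.

```python
from typing import Tuple

USES_PARAMS = frozenset({"ftp", "http", "prospero"})
USES_QUERY = frozenset({"http", "wais"})


def split_query_params_path(parse_string: str,
                            scheme: str) -> Tuple[str, str, str, bool]:
    """Return (query, params, path, has_leading_slash)."""
    query = params = ""
    if scheme in USES_QUERY:
        # Split at the first (left-most) "?".
        parse_string, _, query = parse_string.partition("?")
    if scheme in USES_PARAMS:
        # Split at the first (left-most) ";" of what is left.
        parse_string, _, params = parse_string.partition(";")
    has_leading_slash = parse_string.startswith("/")
    path = parse_string[1:] if has_leading_slash else parse_string
    return query, params, path, has_leading_slash
```

For a scheme outside both sets, such as "gopher", the characters
"?" and ";" are simply left in the path.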

3. Establishing a Base URL

In order for relative URLs to be usable within a base document,
the absolute "base URL" of that document must be known to the
parser. There are three methods for obtaining the base URL of
a document, listed here in order of precedence.

3.1. Base URL within Document Content

Within certain document content-types, the base URL of the document
can be embedded within the content itself such that it can be
readily obtained by a parser. This can be useful for descriptive
documents, such as tables of contents, which may be transmitted to
others through schemes which do not support relative addressing
(e.g., E-Mail or USENET news).

It is beyond the scope of this document to specify how, for each
content-type, the base URL can be embedded. However, an example of
how this is done for the Hypertext Markup Language (HTML) [14] is
provided in an Appendix (Section 10).

3.2. Base URL within Message Headers

For access schemes which make use of message headers like those
described in RFC 822 [7], a second method for identifying the base
URL of a document is to include that URL in the message headers.
It is recommended that the format of this header be:

   Base-URL: absoluteURL

where "Base-URL" is case-insensitive. For example,

   Base-URL: http://www.ics.uci.edu/Test/a/b/c

would indicate that any relative URLs found within the document
should be parsed relative to <URL:http://www.ics.uci.edu/Test/a/b/c>.
In situations where both an embedded base URL (as described in
Section 3.1) and a "Base-URL" message header are present, the
embedded base URL takes precedence.
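
As a sketch of how a recipient might read this header, the following
Python uses the standard library's email package, whose header
lookup is case-insensitive. The function name base_url_from_headers
is hypothetical, and the sample message used to exercise it is
invented for illustration.

```python
from email import message_from_string
from typing import Optional


def base_url_from_headers(raw_message: str) -> Optional[str]:
    """Return the Base-URL header value, or None if it is absent."""
    message = message_from_string(raw_message)
    value = message.get("Base-URL")      # matches "base-url", "BASE-URL", ...
    return value.strip() if value is not None else None
```

A message carrying "base-url: http://www.ics.uci.edu/Test/a/b/c"
in its headers would yield that URL regardless of the header's
capitalization.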

3.3. Base URL from the Retrieval Context

If neither an embedded base URL nor a "Base-URL" message header
is present, then, if a URL was used to retrieve the base document,
that URL shall be considered the base URL. Note that if the
retrieval was the result of a redirected request, the last URL used
(i.e., that which resulted in the actual retrieval of the document)
is the base URL.

3.4. Default Base URL

If none of the conditions described in Sections 3.1 -- 3.3 apply,
then the base URL is considered to be the empty string and all
embedded URLs within that document shall be interpreted as absolute.
It is the responsibility of the distributor(s) of a document
containing relative URLs to ensure that the base URL for that
document can be established. It must be emphasized that relative
URLs cannot be used reliably in situations where the object's base
URL is not well-defined.
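
The order of precedence described in Sections 3.1 through 3.4 can
be summarized in a few lines of Python. The function and parameter
names are hypothetical; each argument is assumed to be None (or
empty) when the corresponding source is unavailable.

```python
from typing import Optional


def establish_base_url(embedded: Optional[str] = None,
                       header: Optional[str] = None,
                       retrieval: Optional[str] = None) -> str:
    """Pick the base URL by precedence: 3.1, then 3.2, then 3.3."""
    for candidate in (embedded, header, retrieval):
        if candidate:
            return candidate
    return ""   # Section 3.4: unknown base, embedded URLs are absolute
```

For a redirected request, the caller would pass the last URL used
as the retrieval argument.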

4. Resolving Relative URLs

This section describes an example algorithm for resolving URLs
within a context in which the URLs may be relative, such that the
result is always a URL in absolute form. Although this algorithm
cannot guarantee that the resulting URL will equal that intended
by the original author, it does guarantee that any valid URL
(relative or absolute) can be consistently transformed to an
absolute form given a valid base URL.

The following steps are performed in order:

Step 1: The base URL is established according to the rules of
        Section 3.

Step 2: If the base URL is the empty string (unknown), the embedded
        URL is interpreted as an absolute URL and we are done.

Step 3: Both the base and embedded URLs are parsed into their
        component parts as described in Section 2.4.

        a) If the embedded URL starts with a scheme name, it is
           interpreted as an absolute URL and we are done.

        b) Otherwise, the embedded URL inherits the scheme of
           the base URL.

Step 4: If the scheme is a member of the Uses-Netloc set
        (Section 2.3.1), then

        a) If the embedded URL's <net_loc> is non-empty, we skip to
           Step 8.

        b) Otherwise, the embedded URL inherits the <net_loc> of the
           base URL.

Step 5: If the embedded URL path is preceded by a slash "/", the
        path is not relative and we skip to Step 8.

Step 6: If the embedded URL path is empty (and not preceded by a
        slash), then

        a) The embedded URL inherits the base URL path; and,

        b) If the embedded URL's <params> is empty, it
           inherits the <params> of the base URL (if any); and,

        c) If the embedded URL's <query> is empty, it inherits
           the <query> of the base URL (if any); and,

        d) We skip to Step 8.

Step 7: The last path segment of the base URL's path (anything
        following the rightmost slash "/", or the entire path if no
        slash is present) is removed and the embedded URL's path is
        appended in its place. The following operations are
        then applied, in order, to the new URL path:

        a) All occurrences of "./", where "." is a complete path
           segment, are removed.

        b) If the URL path ends with "." as a complete path segment,
           that "." is removed.

        c) All occurrences of "<segment>/../", where <segment> and
           ".." are complete path segments, are removed. Removal of
           these path segments is performed iteratively, removing the
           leftmost matching pattern on each iteration, until no
           matching pattern remains.

        d) If the URL path ends with "<segment>/..", that
           "<segment>/.." is removed.

Step 8: The resulting URL components, including any inherited from
        the base URL, are recombined to give the absolute form of
        the embedded URL.

Parameters, regardless of their purpose, do not form a part of the
URL path and thus have no effect on the resolving of relative paths.
In particular, the presence or absence of the ";type=d" parameter
on an ftp URL has no effect on the interpretation of paths relative
to that URL. Fragment identifiers are never inherited from the
base URL.
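
The steps above can be sketched as a short Python program. It is an
illustration under stated assumptions, not a reference
implementation: all function names are hypothetical, scheme names
are assumed to be lower case, the leading slash is stored with the
path, <segment> in Step 7 is assumed never to match "..", and the
embedded URL is parsed using the base URL's scheme for the category
checks. Step 7 is applied exactly as written, without a special case
for Non-Hierarchical schemes.

```python
import re

# Category sets from Section 2.3 (scheme names assumed lower case).
USES_NETLOC = {"ftp", "http", "gopher", "nntp", "telnet", "wais", "file", "prospero"}
USES_PARAMS = {"ftp", "http", "prospero"}
USES_QUERY = {"http", "wais"}
USES_FRAGMENT = {"ftp", "http", "gopher", "news", "nntp", "wais", "file", "prospero"}
SCHEME = re.compile(r"([A-Za-z0-9+.\-]+):")


def parse(url, default_scheme=None):
    """Section 2.4: split a URL into components ("" or None = absent)."""
    p = dict(scheme=None, net_loc=None, path="", params="", query="", fragment="")
    m = SCHEME.match(url)
    if m:                                                # 2.4.1
        p["scheme"], url = m.group(1), url[m.end():]
    scheme = p["scheme"] or default_scheme               # for set lookups
    if scheme in USES_FRAGMENT and "#" in url:           # 2.4.2
        url, _, p["fragment"] = url.rpartition("#")
    if scheme in USES_NETLOC and url.startswith("//"):   # 2.4.3
        end = url.find("/", 2)
        end = len(url) if end == -1 else end
        p["net_loc"], url = url[2:end], url[end:]
    if scheme in USES_QUERY:                             # 2.4.4
        url, _, p["query"] = url.partition("?")
    if scheme in USES_PARAMS:                            # 2.4.5
        url, _, p["params"] = url.partition(";")
    p["path"] = url                                      # 2.4.6, "/" kept
    return p


def remove_dots(path):
    """Step 7 a) to d), done in one left-to-right pass over the segments."""
    lead = "/" if path.startswith("/") else ""
    segments = path[len(lead):].split("/")
    out = []
    for i, seg in enumerate(segments):
        last = i == len(segments) - 1
        if seg == ".":                                   # a) and b)
            if last:
                out.append("")                           # keep trailing "/"
        elif seg == ".." and out and out[-1] != "..":    # c) and d)
            out.pop()
            if last:
                out.append("")
        else:
            out.append(seg)                              # includes unmatched ".."
    return lead + "/".join(out)


def unparse(p):
    """Step 8: recombine the components."""
    url = p["scheme"] + ":"
    if p["net_loc"] is not None:
        url += "//" + p["net_loc"]
        if p["path"] and not p["path"].startswith("/"):
            url += "/"
    url += p["path"]
    for mark, key in ((";", "params"), ("?", "query"), ("#", "fragment")):
        if p[key]:
            url += mark + p[key]
    return url


def resolve(base, embedded):
    if not base:                                         # Step 2
        return embedded
    b = parse(base)                                      # Step 3
    e = parse(embedded, b["scheme"])
    if e["scheme"] is not None:                          # Step 3a
        return embedded
    e["scheme"] = b["scheme"]                            # Step 3b
    if e["scheme"] in USES_NETLOC:                       # Step 4
        if e["net_loc"]:
            return unparse(e)                            # 4a: skip to Step 8
        e["net_loc"] = b["net_loc"]                      # 4b
    if e["path"].startswith("/"):                        # Step 5
        return unparse(e)
    if e["path"] == "":                                  # Step 6
        e["path"] = b["path"]
        e["params"] = e["params"] or b["params"]
        e["query"] = e["query"] or b["query"]
        return unparse(e)
    merged = b["path"][: b["path"].rfind("/") + 1] + e["path"]   # Step 7
    e["path"] = remove_dots(merged)
    return unparse(e)                                    # Step 8
```

With a base URL of "http://a/b/c/d", resolve() maps "../g" to
"http://a/b/g" and "?y" to "http://a/b/c/d?y". Note that the
fragment of the base URL is parsed but never copied to the result.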

5. Examples and Recommended Practice

Within an object with a well-defined base URL of

   <URL:http://a/b/c/d>

the relative URLs would be resolved as follows:

5.1. Normal Examples

   g:h        = <URL:g:h>
   g          = <URL:http://a/b/c/g>
   ./g        = <URL:http://a/b/c/g>
   g/         = <URL:http://a/b/c/g/>
   /g         = <URL:http://a/g>
   //g        = <URL:http://g>
   ?y         = <URL:http://a/b/c/d?y>
   g?y        = <URL:http://a/b/c/g?y>
   g?y/./x    = <URL:http://a/b/c/g?y/./x>
   .          = <URL:http://a/b/c/>
   ./         = <URL:http://a/b/c/>
   ..         = <URL:http://a/b/>
   ../        = <URL:http://a/b/>
   ../g       = <URL:http://a/b/g>
   ../..      = <URL:http://a/>
   ../../g    = <URL:http://a/g>

5.2. Abnormal Examples

   ../../../g = <URL:http://a/../g>
   ./../g     = <URL:http://a/b/g>
   ./g/.      = <URL:http://a/b/c/g/>
   /./g       = <URL:http://a/./g>
   g/./h      = <URL:http://a/b/c/g/h>
   g/../h     = <URL:http://a/b/c/h>
   http:g     = <URL:http:g>
   http:      = <URL:http:>

Note that, although the abnormal examples are not likely to occur
for a normal relative URL, all URL parsers should be capable of
resolving them consistently.

5.3. Recommended Practice

Authors should be aware that path names which contain a colon
":" character cannot be used as the first component of a relative
URL path (e.g., "this:that") because they will likely be mistaken
for a scheme name. It is therefore necessary to precede such cases
with other components (e.g., "./this:that"), or to escape the colon
character (e.g., "this%3Athat"), in order for them to be correctly
parsed. The former solution is preferred because it has no effect
on the absolute form of the URL.
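
An authoring tool could apply the preferred solution automatically.
The Python sketch below is illustrative only; the function name
protect_relative_path is hypothetical. It prefixes "./" whenever the
start of a relative path would otherwise be read as a scheme name
under the rule of Section 2.4.1.

```python
import re

# Same test as Section 2.4.1: scheme characters followed by a colon.
_LOOKS_LIKE_SCHEME = re.compile(r"[A-Za-z0-9+.\-]+:")


def protect_relative_path(rel_path: str) -> str:
    """Prefix "./" if the path would be mistaken for "scheme:..."."""
    if _LOOKS_LIKE_SCHEME.match(rel_path):
        return "./" + rel_path
    return rel_path
```

Paths whose colon appears only after a slash, such as
"dir/this:that", are returned unchanged.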

6. Security Considerations

None.

7. Acknowledgements

This work is derived from concepts introduced by Tim Berners-Lee and
the World-Wide Web global information initiative. Relative URLs are
described as "Partial URLs" in RFC 1630 [1]. That description was
expanded for inclusion as an appendix for the Internet-Draft
"Uniform Resource Locators (URL)" [2]. However, after further
discussion, the URI-WG decided to specify Relative URLs separately
from the primary URL draft.

This document is intended to fulfill the requirements for Internet
Resource Locators as stated in [15]. It has benefited greatly from
the comments of all those participating in the URI-WG. Particular
thanks go to Larry Masinter, Michael A. Dolan, Guido van Rossum, and
Dave Kristol for identifying problems/deficiencies in earlier drafts.

8. References

[1]  Berners-Lee, T., "Universal Resource Identifiers in WWW:
     A Unifying Syntax for the Expression of Names and Addresses of
     Objects on the Network as used in the World-Wide Web", RFC 1630,
     <URL:ftp://ds.internic.net/rfc/rfc1630.txt>, June 1994.

[2]  Berners-Lee, T., Masinter, L., and McCahill, M., Editors,
     "Uniform Resource Locators (URL)", Internet-Draft (work in
     progress), <URL:ftp://ds.internic.net/internet-drafts/
     draft-ietf-uri-url-08.txt>, October 1994.

[3]  Postel, J. and Reynolds, J.K., "File Transfer Protocol (FTP)",
     STD 9, RFC 959, <URL:ftp://ds.internic.net/rfc/rfc959.txt>,
     October 1985.

[4]  Berners-Lee, T., "Hypertext Transfer Protocol (HTTP)",
     CERN, <URL:ftp://info.cern.ch/pub/www/doc/http-spec.txt.Z>,
     November 1993.

[5]  Anklesaria, F., McCahill, M., Lindner, P., Johnson, D.,
     Torrey, D., and Alberti, B., "The Internet Gopher Protocol:
     A distributed document search and retrieval protocol",
     RFC 1436, <URL:ftp://ds.internic.net/rfc/rfc1436.txt>,
     March 1993.

[6]  Anklesaria, F., Lindner, P., McCahill, M., Torrey, D.,
     Johnson, D., and Alberti, B., "Gopher+: Upward compatible
     enhancements to the Internet Gopher protocol",
     University of Minnesota, <URL:ftp://boombox.micro.umn.edu
     /pub/gopher/gopher_protocol/Gopher+/Gopher+.txt>, July 1993.

[7]  Crocker, D. H., "Standard for the Format of ARPA Internet Text
     Messages", STD 11, RFC 822,
     <URL:ftp://ds.internic.net/rfc/rfc822.txt>, August 1982.

[8]  Horton, M. and Adams, R., "Standard For Interchange of USENET
     messages", RFC 1036, <URL:ftp://ds.internic.net/rfc/rfc1036.txt>,
     December 1987.

[9]  Kantor, B. and Lapsley, P., "Network News Transfer Protocol:
     A Proposed Standard for the Stream-Based Transmission of News",
     RFC 977, <URL:ftp://ds.internic.net/rfc/rfc977.txt>,
     February 1986.

[10] Postel, J. and Reynolds, J.K., "TELNET Protocol Specification",
     RFC 854, <URL:ftp://ds.internic.net/rfc/rfc854.txt>, May 1983.

[11] Davis, F., Kahle, B., Morris, H., Salem, J., Shen, T., Wang, R.,
     Sui, J., and Grinbaum, M., "WAIS Interface Protocol Prototype
     Functional Specification", (v1.5), Thinking Machines Corporation,
     <URL:ftp://quake.think.com/pub/wais/doc/protspec.txt>,
     April 1990.

[12] St. Pierre, M., Fullton, J., Gamiel, K., Goldman, J., Kahle, B.,
     Kunze, J., Morris, H., and Schiettecatte, F.,
     "WAIS over Z39.50-1988", RFC 1625,
     <URL:ftp://ds.internic.net/rfc/rfc1625.txt>, June 1994.

[13] Neuman, B.C., and Augart, S., "The Prospero Protocol",
     USC Information Sciences Institute, <URL:
     ftp://prospero.isi.edu/pub/prospero/doc/prospero-protocol.PS.Z>,
     June 1993.

[14] Berners-Lee, T., Connolly, D., et al., "HyperText Markup Language
     Specification -- 2.0", Internet-Draft (work in progress),
     <URL:ftp://www.ics.uci.edu/pub/ietf/html/>, November 1994.

[15] Kunze, J., "Functional Requirements for Internet Resource
     Locators", Internet-Draft (work in progress),
     <URL:ftp://ds.internic.net/internet-drafts/
     draft-ietf-uri-irl-fun-req-01.txt>, July 1994.

9. Author's Address

   Roy T. Fielding
   Department of Information and Computer Science
   University of California
   Irvine, CA 92717-3425
   U.S.A.

   Tel: +1 (714) 824-4049
   Fax: +1 (714) 824-4056
   Email: fielding@ics.uci.edu

This Internet-Draft expires May 27, 1995.


10. Appendix - Embedding the Base URL in HTML documents.

It is useful to consider an example of how the base URL of a
document can be embedded within the document's content. In this
appendix, we describe how documents written in the Hypertext Markup
Language (HTML) [14] can include an embedded base URL. This appendix
does not form a part of the relative URL specification and should not
be considered as anything more than a descriptive example.

HTML defines a special element "BASE" which, when present in the
"HEAD" portion of a document, signals that the parser should use
the BASE element's "HREF" attribute as the base URL for resolving
any relative URLs. The "HREF" attribute must be an absolute URL.
Note that, in HTML, element and attribute names are case-insensitive.
For example:

   <!doctype html public "-//IETF//DTD HTML//EN">
   <HTML><HEAD>
   <TITLE>An example HTML document</TITLE>
   <BASE href="http://www.ics.uci.edu/Test/a/b/c">
   </HEAD><BODY>
   ... <A href="../x">a hypertext anchor</A> ...
   </BODY></HTML>

A parser reading the example document should interpret the given
relative URL "../x" as representing the absolute URL

   <URL:http://www.ics.uci.edu/Test/a/x>

regardless of the context in which the example document was obtained.
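
As a further descriptive example, the embedded base URL could be
extracted with the HTMLParser class from Python's standard library,
which reports element and attribute names in lower case. The class
name BaseFinder and the function name embedded_base_url are
hypothetical, and the sketch assumes that only the first BASE
element inside HEAD is of interest.

```python
from html.parser import HTMLParser
from typing import Optional


class BaseFinder(HTMLParser):
    """Record the HREF of the first BASE element found inside HEAD."""

    def __init__(self) -> None:
        super().__init__()
        self.base_url: Optional[str] = None
        self._in_head = False

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive already lower-cased.
        if tag == "head":
            self._in_head = True
        elif tag == "base" and self._in_head and self.base_url is None:
            self.base_url = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "head":
            self._in_head = False


def embedded_base_url(html_text: str) -> Optional[str]:
    """Return the embedded base URL, or None if the document has none."""
    finder = BaseFinder()
    finder.feed(html_text)
    finder.close()
    return finder.base_url
```

Applied to the example document above, this sketch would return
"http://www.ics.uci.edu/Test/a/b/c".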
731 |
|