Uniform Resource Identifiers Working Group                R. T. Fielding
INTERNET-DRAFT                                                 UC Irvine
Expires July 30, 1995                                   January 30, 1995


                  Relative Uniform Resource Locators
                 <draft-ietf-uri-relative-url-05.txt>


Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-
Drafts as reference material or to cite them other than as
``work in progress.''

To learn the current status of any Internet-Draft, please check
the ``1id-abstracts.txt'' listing contained in the Internet-
Drafts Shadow Directories on ftp.is.co.za (Africa),
nic.nordu.net (Europe), munnari.oz.au (Pacific Rim),
ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

Distribution of this document is unlimited. Please send comments
to the author, Roy T. Fielding <fielding@ics.uci.edu>, or to the
URI working group (URI-WG) of the Internet Engineering Task Force
(IETF) at <uri@bunyip.com>. Discussions of the group are archived at
<URL:http://www.acl.lanl.gov/URI/archive/uri-archive.index.html>.


Abstract

A Uniform Resource Locator (URL) is a compact representation of the
location and access method for a resource available via the Internet.
When embedded within a base document, a URL in its absolute form may
contain a great deal of information which is already known from the
context of that base document's retrieval, including the scheme,
network location, and parts of the url-path. In situations where the
base URL is well-defined and known to the parser (human or machine),
it is useful to be able to embed URL references which inherit that
context rather than re-specifying it in every instance. This
document defines the syntax and semantics for such Relative Uniform
Resource Locators.


1. Introduction

This document describes the syntax and semantics for "relative"
Uniform Resource Locators (relative URLs): a compact representation
of the location of a resource relative to an absolute base URL.
It is a companion to RFC 1738, "Uniform Resource Locators (URL)" [2],
which specifies the syntax and semantics of absolute URLs.

A common use for Uniform Resource Locators is to embed them within
a document (referred to as the "base" document) for the purpose of
identifying other Internet-accessible resources. For example, in
hypertext documents, URLs can be used as the identifiers for
hypertext link destinations.

Absolute URLs contain a great deal of information which may already
be known from the context of the base document's retrieval,
including the scheme, network location, and parts of the URL path.
In situations where the base URL is well-defined and known, it is
useful to be able to embed a URL reference which inherits that
context rather than re-specifying it within each instance.
Similarly, relative URLs can be used within data-entry dialogs to
decrease the number of characters necessary to describe a location.

It is often the case that a group or "tree" of documents has been
constructed to serve a common purpose; the vast majority of URLs
within these documents point to locations within the tree rather
than outside of it. Similarly, documents located at a particular
Internet site are much more likely to refer to other resources at
that site than to resources at remote sites.

Relative addressing of URLs allows document trees to be partially
independent of their location and access scheme. For instance,
if they refer to each other using relative URLs, it is possible for
a single set of documents to be simultaneously accessible and, if
hypertext, traversable via each of the "file", "http", and "ftp"
schemes. Furthermore, document trees can be moved, as a whole,
without changing any of the embedded URLs. Experience within the
World-Wide Web has demonstrated that the ability to perform relative
referencing is necessary for the long-term usability of embedded
URLs.

2. Relative URL Syntax

The syntax for relative URLs is a shortened form of that for absolute
URLs [2], where some prefix of the URL is missing and certain path
components ("." and "..") have a special meaning when interpreting a
relative path. Because a relative URL may appear in any context that
could hold an absolute URL, systems that support relative URLs must
be able to recognize them as part of the URL parsing process.

Although this document does not seek to define the overall URL
syntax, some discussion of it is necessary in order to describe the
parsing of relative URLs. In particular, base documents can only
make use of relative URLs when their base URL fits within the
generic-RL syntax described below. Although some URL schemes do not
require this generic-RL syntax, it is assumed that any document which
contains a relative reference does have a base URL that obeys the
syntax. In other words, relative URLs cannot be used within
documents that have unsuitable base URLs.

2.1. URL Syntactic Components

The URL syntax is dependent upon the scheme. Some schemes use
reserved characters like "?" and ";" to indicate special components,
while others just consider them to be part of the path. However,
there is enough uniformity in the use of URLs to allow a parser
to resolve relative URLs based upon a single, generic-RL syntax.
This generic-RL syntax consists of six components:

   <scheme>://<net_loc>/<path>;<params>?<query>#<fragment>

each of which, except <scheme>, may be absent from a particular URL.
These components are defined as follows (a complete BNF is provided
in Section 2.2):

   scheme ":"   ::= scheme name, as per Section 2.1 of RFC 1738 [2].

   "//" net_loc ::= network location and login information, as per
                    Section 3.1 of RFC 1738 [2].

   "/" path     ::= URL path, as per Section 3.1 of RFC 1738 [2].

   ";" params   ::= object parameters (e.g. ";type=a" as in
                    Section 3.2.2 of RFC 1738 [2]).

   "?" query    ::= query information, as per Section 3.3 of
                    RFC 1738 [2].

   "#" fragment ::= fragment identifier.

Note that the fragment identifier (and the "#" that precedes it) is
not considered part of the URL. However, since it is commonly used
within the same string context as a URL, a parser must be able to
recognize the fragment when it is present and set it aside as part
of the parsing process.

The order of the components is important. If both <params> and
<query> are present, the <query> information must occur after the
<params>.

2.2. BNF for Relative URLs

This is a BNF-like description of the Relative Uniform Resource
Locator syntax, using the conventions of RFC 822 [5], except that
"|" is used to designate alternatives. Briefly, literals are quoted
with "", parentheses "(" and ")" are used to group elements, optional
elements are enclosed in [brackets], and elements may be preceded
with <n>* to designate n or more repetitions of the following
element; n defaults to 0.

   URL         = ( absoluteURL | relativeURL ) [ "#" fragment ]

   absoluteURL = generic-RL | ( scheme ":" *( uchar | reserved ) )

   generic-RL  = scheme ":" relativeURL

   relativeURL = net_path | abs_path | rel_path

   net_path    = "//" net_loc [ abs_path ]
   abs_path    = "/" rel_path
   rel_path    = [ path ] [ ";" params ] [ "?" query ]

   path        = fsegment *( "/" segment )
   fsegment    = 1*pchar
   segment     = *pchar

   params      = param *( ";" param )
   param       = *( pchar | "/" )

   scheme      = 1*( alpha | digit | "+" | "-" | "." )
   net_loc     = *( pchar | ";" | "?" )
   query       = *( uchar | reserved )
   fragment    = *( uchar | reserved )

   pchar       = uchar | ":" | "@" | "&" | "="
   uchar       = unreserved | escape
   unreserved  = alpha | digit | safe | extra | national

   escape      = "%" hex hex
   hex         = digit | "A" | "B" | "C" | "D" | "E" | "F" |
                         "a" | "b" | "c" | "d" | "e" | "f"

   alpha       = lowalpha | hialpha
   lowalpha    = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" |
                 "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" |
                 "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
   hialpha     = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
                 "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
                 "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"

   digit       = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
                 "8" | "9"

   safe        = "$" | "-" | "_" | "." | "+"
   extra       = "!" | "*" | "'" | "(" | ")" | ","
   national    = "{" | "}" | "|" | "\" | "^" | "~" | "[" | "]" | "`"
   reserved    = ";" | "/" | "?" | ":" | "@" | "&" | "="
   punctuation = "<" | ">" | "#" | "%" | <">


2.3. Specific Schemes and their Syntactic Categories

Each URL scheme has its own rules regarding the presence or absence
of the syntactic components described in Sections 2.1 and 2.2.
In addition, some schemes are never appropriate for use with relative
URLs. However, since relative URLs will only be used within contexts
in which they are useful, these scheme-specific differences can be
ignored by the resolution process.

Within this section, we include as examples only those schemes that
have a defined URL syntax in RFC 1738 [2]. The following schemes are
never used with relative URLs:

   mailto     Electronic Mail
   news       USENET news
   telnet     TELNET Protocol for Interactive Sessions

Some URL schemes allow the use of reserved characters for purposes
outside the generic-RL syntax given above. However, such use is
rare. Relative URLs can be used with these schemes whenever the
applicable base URL follows the generic-RL syntax.

   gopher     Gopher and Gopher+ Protocols
   prospero   Prospero Directory Service
   wais       Wide Area Information Servers Protocol

Users of gopher URLs should note that gopher-type information is
often included at the beginning of what would be the generic-RL path.
If present, this type information prevents relative-path references
to documents with differing gopher-types.

Finally, the following schemes can always be parsed using the
generic-RL syntax.

   file       Host-specific Files
   ftp        File Transfer Protocol
   http       Hypertext Transfer Protocol
   nntp       USENET news using NNTP access

It is recommended that new schemes be designed to be parsable via
the generic-RL syntax if they are intended to be used with relative
URLs. A description of the allowed relative forms should be included
when a new scheme is registered, as per Section 4 of RFC 1738 [2].

2.4. Parsing a URL

An accepted method for parsing URLs is useful to clarify the
generic-RL syntax of Section 2.2 and to describe the algorithm for
resolving relative URLs presented in Section 4. This section
describes the parsing rules for breaking down a URL (relative or
absolute) into the component parts described in Section 2.1. The
rules assume that the URL has already been separated from any
surrounding text and copied to a "parse string". The rules are
listed in the order in which they would be applied by the parser.

2.4.1. Parsing the Fragment Identifier

If the parse string contains a crosshatch "#" character, then the
substring after the first (left-most) crosshatch "#" and up to the
end of the parse string is the <fragment> identifier. If the
crosshatch is the last character, or no crosshatch is present, then
the fragment identifier is empty. The matched substring, including
the crosshatch character, is removed from the parse string before
continuing.

Note that the fragment identifier is not considered part of the URL.
However, since it is often attached to the URL, parsers must be able
to recognize and set aside fragment identifiers as part of the
process.

2.4.2. Parsing the Scheme

If the parse string contains a colon ":" after the first character
and before any characters not allowed as part of a scheme name
(i.e., any character that is not an alphanumeric, plus "+",
period ".", or hyphen "-"),
the <scheme> of the URL is the substring of characters up to but not
including the first colon. These characters and the colon are then
removed from the parse string before continuing.

2.4.3. Parsing the Network Location/Login

If the parse string begins with a double-slash "//", then the
substring of characters after the double-slash and up to, but not
including, the next slash "/" character is the network location/login
(<net_loc>) of the URL. If no trailing slash "/" is present, the
entire remaining parse string is assigned to <net_loc>. The
double-slash and <net_loc> are removed from the parse string before
continuing.

2.4.4. Parsing the Query Information

If the parse string contains a question mark "?" character, then the
substring after the first (left-most) question mark "?" and up to the
end of the parse string is the <query> information. If the question
mark is the last character, or no question mark is present, then the
query information is empty. The matched substring, including the
question mark character, is removed from the parse string before
continuing.

2.4.5. Parsing the Parameters

If the parse string contains a semicolon ";" character, then the
substring after the first (left-most) semicolon ";" and up to the
end of the parse string is the parameters (<params>). If the
semicolon is the last character, or no semicolon is present, then
<params> is empty. The matched substring, including the semicolon
character, is removed from the parse string before continuing.

2.4.6. Parsing the Path

After the above steps, all that is left of the parse string is
the URL <path> and the slash "/" that may precede it. Even though
the initial slash is not part of the URL path, the parser must
remember whether or not it was present so that later processes
can differentiate between relative and absolute paths. Often this
is done by simply storing the preceding slash along with the path.

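The parsing rules of Sections 2.4.1 through 2.4.6 can be sketched in
a few lines of Python. This is an illustration only, not part of the
specification: the names Parts, SCHEME_CHARS, and parse_url are
assumptions made for the example, and the sketch stores the leading
slash together with the path, as suggested in Section 2.4.6.

```python
from typing import NamedTuple

# Characters allowed in a scheme name (Section 2.2, "scheme").
SCHEME_CHARS = frozenset(
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+-.")


class Parts(NamedTuple):
    """Hypothetical container for the six generic-RL components."""
    scheme: str
    net_loc: str
    path: str      # keeps its leading "/" when one was present
    params: str
    query: str
    fragment: str


def parse_url(url: str) -> Parts:
    rest = url
    # 2.4.1: the fragment is everything after the left-most "#".
    rest, _, fragment = rest.partition("#")
    # 2.4.2: a scheme needs a colon after the first character, with
    # only scheme characters in front of it.
    scheme = ""
    colon = rest.find(":")
    if colon > 0 and all(c in SCHEME_CHARS for c in rest[:colon]):
        scheme, rest = rest[:colon], rest[colon + 1:]
    # 2.4.3: "//" introduces <net_loc>, which runs up to the next "/".
    net_loc = ""
    if rest.startswith("//"):
        slash = rest.find("/", 2)
        if slash == -1:
            net_loc, rest = rest[2:], ""
        else:
            net_loc, rest = rest[2:slash], rest[slash:]
    # 2.4.4: the query is everything after the left-most "?".
    rest, _, query = rest.partition("?")
    # 2.4.5: the parameters are everything after the left-most ";".
    rest, _, params = rest.partition(";")
    # 2.4.6: whatever remains is the path (and its optional slash).
    return Parts(scheme, net_loc, rest, params, query, fragment)


print(parse_url("http://a/b/c/d;p?q#f"))
```

Like the rules themselves, the sketch does not distinguish an absent
component from an empty one: "g?" and "g" both yield an empty query.
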
3. Establishing a Base URL

The term "relative URL" implies that there exists some absolute
"base URL" against which the relative reference is applied. Indeed,
the base URL is necessary to define the semantics of any embedded
relative URLs; without it, a relative reference is meaningless.
In order for relative URLs to be usable within a document, the base
URL of that document must be known to the parser.

The base URL of a document can be established in one of four ways,
listed below in order of precedence. The order of precedence can be
thought of in terms of layers, where the innermost defined base URL
has the highest precedence. This can be visualized graphically as:

   .---------------------------------------------------------.
   |  .---------------------------------------------------.  |
   |  |  .---------------------------------------------.  |  |
   |  |  |  .---------------------------------------.  |  |  |
   |  |  |  |   (3.1) Base URL embedded in the      |  |  |  |
   |  |  |  |         document's content            |  |  |  |
   |  |  |  `---------------------------------------'  |  |  |
   |  |  |   (3.2) URL defined by a "Base" message     |  |  |
   |  |  |         header (or equivalent)              |  |  |
   |  |  `---------------------------------------------'  |  |
   |  |   (3.3) URL of the document's retrieval context   |  |
   |  `---------------------------------------------------'  |
   |   (3.4) Base URL = "" (undefined)                       |
   `---------------------------------------------------------'

3.1. Base URL within Document Content

Within certain document media types, the base URL of the document
can be embedded within the content itself such that it can be
readily obtained by a parser. This can be useful for descriptive
documents, such as tables of contents, which may be transmitted to
others through protocols other than their usual retrieval context
(e.g. E-Mail or USENET news).

It is beyond the scope of this document to specify how, for each
media type, the base URL can be embedded. However, an example of
how this is done for the Hypertext Markup Language (HTML) [3] is
provided in an Appendix (Section 10).

3.2. Base URL within Message Headers

A second method for identifying the base URL of a document is to
specify it within the message headers (or equivalent tagged
metainformation) of the message enclosing the document. For
protocols that make use of message headers like those described in
RFC 822 [5], it is recommended that the format of this header be:

   base-header = "Base" ":" "<URL:" absoluteURL ">"

where "Base" is case-insensitive. For example, the header

   Base: <URL:http://www.ics.uci.edu/Test/a/b/c>

would indicate that any relative URLs found within the document
should be parsed relative to <URL:http://www.ics.uci.edu/Test/a/b/c>.
Any whitespace (including that used for line folding) inside the
angle brackets should be ignored.

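As an illustration of the recommended header format, the following
Python sketch extracts a base URL from a list of header fields. The
function name base_from_headers and the representation of headers as
(name, value) pairs are assumptions made for this example. The
"<URL:" prefix is matched literally, since only the field name "Base"
is stated to be case-insensitive.

```python
import re


def base_from_headers(headers):
    """Return the URL from the first "Base" header, or "" if none.

    `headers` is assumed to be an iterable of (name, value) pairs,
    with folded lines already joined into a single value string.
    """
    for name, value in headers:
        if name.strip().lower() != "base":      # field name: any case
            continue
        match = re.search(r"<URL:(.*?)>", value, flags=re.DOTALL)
        if match:
            # Whitespace inside the angle brackets is ignored.
            return re.sub(r"\s+", "", match.group(1))
    return ""


headers = [
    ("Content-Type", "text/html"),
    ("BASE", "<URL:http://www.ics.uci.edu/\r\n Test/a/b/c>"),
]
print(base_from_headers(headers))   # http://www.ics.uci.edu/Test/a/b/c
```
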
Protocols which do not use the RFC 822 message header syntax, but
which do allow some form of tagged metainformation to be included
within messages, may define their own syntax for passing the base URL
as part of a message. Describing the syntax for all possible
protocols is beyond the scope of this document. It is assumed that
user agents using such a protocol will be able to obtain the
appropriate syntax from that protocol's specification.

In situations where both an embedded base URL (as described in
Section 3.1) and a base-header are present, the embedded base URL
takes precedence.

3.3. Base URL from the Retrieval Context

If neither an embedded base URL nor a base-header is present, then,
if a URL was used to retrieve the base document, that URL shall be
considered the base URL. Note that if the retrieval was the result
of a redirected request, the last URL used (i.e., that which resulted
in the actual retrieval of the document) is the base URL.

Composite media types, such as the "multipart/*" and "message/*"
media types defined by MIME (RFC 1521, [4]), require special
processing in order to determine the retrieval context of an enclosed
document. For these types, the base URL of the composite entity
must be determined first; this base is then considered the retrieval
context for its component parts, and thus the base URL for any part
that does not define its own base via one of the methods described
in Sections 3.1 and 3.2. This logic is applied recursively for
component parts that are themselves composite entities.

In other words, the retrieval context (Section 3.3) of a component
part is the base URL of the composite entity of which it is a part.
Thus, a composite entity can redefine the retrieval context of its
component parts via inclusion of a base-header, and this redefinition
applies recursively for a hierarchy of composite parts. Note that
this is not necessarily the same as defining the base URL of the
components, since each component may include an embedded base URL
or base-header that takes precedence over the retrieval context.

3.4. Default Base URL

If none of the conditions described in Sections 3.1 -- 3.3 apply,
then the base URL is considered to be the empty string and all
embedded URLs within that document are assumed to be absolute URLs.
It is the responsibility of the distributor(s) of a document
containing relative URLs to ensure that the base URL for that
document can be established. It must be emphasized that relative
URLs cannot be used reliably in situations where the object's base
URL is not well-defined.

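The order of precedence in Sections 3.1 through 3.4 amounts to taking
the first base URL that is defined, from the innermost layer outward.
The Python sketch below is one way to express that. The function name
establish_base and its three keyword parameters are assumptions for
illustration, and how each candidate is obtained (from content, from
a header, or from the retrieval) is left to the caller.

```python
from typing import Optional


def establish_base(embedded: Optional[str] = None,
                   header: Optional[str] = None,
                   retrieval: Optional[str] = None) -> str:
    """Pick the base URL by precedence: 3.1, then 3.2, then 3.3.

    Returns "" (Section 3.4) when no candidate is defined.
    """
    for candidate in (embedded, header, retrieval):
        if candidate:
            return candidate
    return ""


# A composite entity retrieved from one URL but carrying a base-header.
# The header value below is an invented example.
outer = establish_base(header="http://a/b/", retrieval="http://x/y")

# The base of the composite is the retrieval context of each part
# (Section 3.3); a part with no base of its own inherits it.
part = establish_base(retrieval=outer)
print(outer, part)   # http://a/b/ http://a/b/
```
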
4. Resolving Relative URLs

This section describes an example algorithm for resolving URLs
within a context in which the URLs may be relative, such that the
result is always a URL in absolute form. Although this algorithm
cannot guarantee that the resulting URL will equal that intended
by the original author, it does guarantee that any valid URL
(relative or absolute) can be consistently transformed to an
absolute form given a valid base URL.

The following steps are performed in order:

Step 1: The base URL is established according to the rules of
        Section 3. If the base URL is the empty string (unknown),
        the embedded URL is interpreted as an absolute URL and
        we are done.

Step 2: Both the base and embedded URLs are parsed into their
        component parts as described in Section 2.4.

        a) If the embedded URL is entirely empty, it inherits the
           entire base URL (i.e. is set equal to the base URL)
           and we are done.

        b) If the embedded URL starts with a scheme name, it is
           interpreted as an absolute URL and we are done.

        c) Otherwise, the embedded URL inherits the scheme of
           the base URL.

Step 3: If the embedded URL's <net_loc> is non-empty, we skip to
        Step 7. Otherwise, the embedded URL inherits the <net_loc>
        (if any) of the base URL.

Step 4: If the embedded URL path is preceded by a slash "/", the
        path is not relative and we skip to Step 7.

Step 5: If the embedded URL path is empty (and not preceded by a
        slash), then the embedded URL inherits the base URL path,
        and

        a) if the embedded URL's <params> is non-empty, we skip to
           Step 7; otherwise, it inherits the <params> of the base
           URL (if any) and

        b) if the embedded URL's <query> is non-empty, we skip to
           Step 7; otherwise, it inherits the <query> of the base
           URL (if any) and we skip to Step 7.

Step 6: The last segment of the base URL's path (anything
        following the rightmost slash "/", or the entire path if no
        slash is present) is removed and the embedded URL's path is
        appended in its place. The following operations are
        then applied, in order, to the new path:

        a) All occurrences of "./", where "." is a complete path
           segment, are removed.

        b) If the path ends with "." as a complete path segment,
           that "." is removed.

        c) All occurrences of "<segment>/../", where <segment> and
           ".." are complete path segments, are removed. Removal of
           these path segments is performed iteratively, removing the
           leftmost matching pattern on each iteration, until no
           matching pattern remains.

        d) If the path ends with "<segment>/..", that "<segment>/.."
           is removed.

Step 7: The resulting URL components, including any inherited from
        the base URL, are recombined to give the absolute form of
        the embedded URL.

Parameters, regardless of their purpose, do not form a part of the
URL path and thus have no effect on the resolution of relative paths.
In particular, the presence or absence of the ";type=d" parameter
on an ftp URL has no effect on the interpretation of paths relative
to that URL. Fragment identifiers are only inherited from the base
URL when the entire embedded URL is empty.

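The steps of Section 4 can be sketched in Python as follows. This is
an illustration under stated assumptions, not a reference
implementation: the names split_url, normalize, and resolve are
invented for the example, and split_url is a compact restatement of
the Section 2.4 rules so that the block runs on its own. Two further
assumptions are made where the text is silent: in Steps 6c and 6d a
".." segment is never treated as the <segment> to be removed, and in
Step 7 a slash is supplied between a <net_loc> and a path that lacks
one.

```python
SCHEME_CHARS = frozenset(
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+-.")


def split_url(url):
    """Section 2.4 parse: (scheme, net_loc, path, params, query, fragment)."""
    rest, _, fragment = url.partition("#")
    scheme, colon = "", rest.find(":")
    if colon > 0 and all(c in SCHEME_CHARS for c in rest[:colon]):
        scheme, rest = rest[:colon], rest[colon + 1:]
    net_loc = ""
    if rest.startswith("//"):
        net_loc, slash, tail = rest[2:].partition("/")
        rest = slash + tail              # the path keeps its leading slash
    rest, _, query = rest.partition("?")
    path, _, params = rest.partition(";")
    return scheme, net_loc, path, params, query, fragment


def normalize(path):
    """Step 6 a) to d), applied to the merged path."""
    absolute = path.startswith("/")
    segs = path[1:].split("/") if absolute else path.split("/")
    last = len(segs) - 1
    # a) remove each "." segment that is followed by a slash
    segs = [s for i, s in enumerate(segs) if s != "." or i == last]
    # b) remove a trailing "." segment (the slash before it stays)
    if segs[-1] == ".":
        segs[-1] = ""
    # c) remove the left-most "<segment>/../" until none is left
    i = 1
    while i < len(segs) - 1:
        if segs[i] == ".." and segs[i - 1] != "..":   # assumption: see above
            del segs[i - 1:i + 1]
            i = max(i - 1, 1)
        else:
            i += 1
    # d) remove a trailing "<segment>/.."
    if len(segs) >= 2 and segs[-1] == ".." and segs[-2] != "..":
        segs[-2:] = [""]
    return ("/" if absolute else "") + "/".join(segs)


def resolve(base, embedded):
    """Return the absolute form of `embedded` relative to `base`."""
    if not base:                          # Step 1: no base URL is known
        return embedded
    if not embedded:                      # Step 2a: inherit the entire base
        return base
    b_scheme, b_loc, b_path, b_params, b_query, _ = split_url(base)
    scheme, net_loc, path, params, query, fragment = split_url(embedded)
    if scheme:                            # Step 2b: already absolute
        return embedded
    scheme = b_scheme                     # Step 2c
    if not net_loc:                       # Step 3
        net_loc = b_loc
        if not path:                      # Step 5
            path = b_path
            if not params:
                params = b_params
                if not query:
                    query = b_query
        elif not path.startswith("/"):    # Step 4 skipped; Step 6
            path = normalize(b_path[:b_path.rfind("/") + 1] + path)
    # Step 7: recombine the components
    if net_loc and path and not path.startswith("/"):
        path = "/" + path                 # assumption: see above
    url = scheme + ":" + ("//" + net_loc if net_loc else "") + path
    if params:
        url += ";" + params
    if query:
        url += "?" + query
    if fragment:
        url += "#" + fragment
    return url


print(resolve("http://a/b/c/d;p?q#f", "../g"))   # http://a/b/g
```

Because split_url cannot tell an empty <net_loc> from an absent one,
a base such as "file:///x" is recombined without its double slash;
a fuller implementation would need to record that distinction.
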
5. Examples and Recommended Practice

Within an object with a well-defined base URL of

   Base: <URL:http://a/b/c/d;p?q#f>

the relative URLs would be resolved as follows:

5.1. Normal Examples

   g:h        = <URL:g:h>
   g          = <URL:http://a/b/c/g>
   ./g        = <URL:http://a/b/c/g>
   g/         = <URL:http://a/b/c/g/>
   /g         = <URL:http://a/g>
   //g        = <URL:http://g>
   ?y         = <URL:http://a/b/c/d;p?y>
   g?y        = <URL:http://a/b/c/g?y>
   g?y/./x    = <URL:http://a/b/c/g?y/./x>
   #s         = <URL:http://a/b/c/d;p?q#s>
   g#s        = <URL:http://a/b/c/g#s>
   g#s/./x    = <URL:http://a/b/c/g#s/./x>
   g?y#s      = <URL:http://a/b/c/g?y#s>
   ;x         = <URL:http://a/b/c/d;x>
   g;x        = <URL:http://a/b/c/g;x>
   g;x?y#s    = <URL:http://a/b/c/g;x?y#s>
   .          = <URL:http://a/b/c/>
   ./         = <URL:http://a/b/c/>
   ..         = <URL:http://a/b/>
   ../        = <URL:http://a/b/>
   ../g       = <URL:http://a/b/g>
   ../..      = <URL:http://a/>
   ../../     = <URL:http://a/>
   ../../g    = <URL:http://a/g>

5.2. Abnormal Examples

Although the following abnormal examples are unlikely to occur
in normal practice, all URL parsers should be capable of resolving
them consistently. Each example uses the same base as above.

An empty reference resolves to the complete base URL:

   <>         = <URL:http://a/b/c/d;p?q#f>

Parsers must be careful in handling the case where there are more
relative path ".." segments than there are hierarchical levels in
the base URL's path. Note that the ".." syntax cannot be used to
change the <net_loc> of a URL.

   ../../../g = <URL:http://a/../g>

Similarly, parsers must avoid treating "." and ".." as special when
they are not complete components of a relative path.

   /./g       = <URL:http://a/./g>
   /../g      = <URL:http://a/../g>
   g.         = <URL:http://a/b/c/g.>
   .g         = <URL:http://a/b/c/.g>
   g..        = <URL:http://a/b/c/g..>
   ..g        = <URL:http://a/b/c/..g>

Less likely are cases where the relative URL uses unnecessary or
nonsensical forms of the "." and ".." complete path segments.

   ./../g     = <URL:http://a/b/g>
   ./g/.      = <URL:http://a/b/c/g/>
   g/./h      = <URL:http://a/b/c/g/h>
   g/../h     = <URL:http://a/b/c/h>

Finally, some older parsers allow the scheme name to be present in
a relative URL if it is the same as the base URL scheme. This is
considered to be a loophole in prior specifications of partial
URLs [1] and should be avoided by future parsers.

   http:g     = <URL:http:g>
   http:      = <URL:http:>

5.3. Recommended Practice

Authors should be aware that path names which contain a colon
":" character cannot be used as the first component of a relative
URL path (e.g. "this:that") because they will likely be mistaken for
a scheme name. It is therefore necessary to precede such cases with
other components (e.g., "./this:that"), or to escape the colon
character (e.g., "this%3Athat"), in order for them to be correctly
parsed. The former solution is preferred because it has no effect
on the absolute form of the URL.

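The colon ambiguity can be seen by applying the scheme rule of
Section 2.4.2 to each of the three forms. In the Python sketch below,
the pattern SCHEME_PREFIX and the function looks_absolute are
illustrative names assumed for this example, not part of any
specification.

```python
import re

# One or more scheme characters followed by a colon, anchored at the
# start of the reference (the rule of Section 2.4.2).
SCHEME_PREFIX = re.compile(r"[A-Za-z0-9+.-]+:")


def looks_absolute(reference):
    """True if a parser would take the reference to begin with a scheme."""
    return SCHEME_PREFIX.match(reference) is not None


for ref in ("this:that", "./this:that", "this%3Athat"):
    print(ref, looks_absolute(ref))
# this:that True
# ./this:that False
# this%3Athat False
```
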
There is an ambiguity in the semantics for the ftp URL scheme
regarding the use of a trailing slash ("/") character and/or a
parameter ";type=d" to indicate a resource that is an ftp directory.
If the result of retrieving that directory includes embedded
relative URLs, it is necessary that the base URL path for that result
include a trailing slash. For this reason, it is recommended that
the ";type=d" parameter value not be used within contexts that allow
relative URLs.

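The effect of the missing trailing slash can be seen with Python's
standard urllib.parse.urljoin. Note that urljoin follows later
resolution rules rather than this draft, but for this simple case it
likewise discards the last segment of the base path. The host
example.com and the file name are hypothetical.

```python
from urllib.parse import urljoin

# Base URL ends in "/": the directory name is kept when resolving.
with_slash = urljoin("ftp://example.com/pub/dir/", "file.txt")

# Base URL marks the directory with ";type=d" and has no trailing
# slash: "dir" is the last path segment, so it is replaced.
with_param = urljoin("ftp://example.com/pub/dir;type=d", "file.txt")

print(with_slash)
print(with_param)
```

The second result points at a sibling of the directory rather than at
a file inside it, which is why the trailing slash form is needed for
a base URL.
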
6. Security Considerations

There are no security considerations in the use or parsing of relative
URLs. However, once a relative URL has been resolved to its absolute
form, the same security considerations apply as those described in
RFC 1738 [2].

7. Acknowledgements

This work is derived from concepts introduced by Tim Berners-Lee and
the World-Wide Web global information initiative. Relative URLs are
described as "Partial URLs" in RFC 1630 [1]. That description was
expanded for inclusion as an appendix for an early draft of RFC 1738,
"Uniform Resource Locators (URL)" [2]. However, after further
discussion, the URI-WG decided to specify Relative URLs separately
from the primary URL draft.

This document is intended to fulfill the requirements for Internet
Resource Locators as stated in [6]. It has benefited greatly from
the comments of all those participating in the URI-WG. Particular
thanks go to Larry Masinter, Michael A. Dolan, Guido van Rossum, and
Dave Kristol for identifying problems/deficiencies in earlier drafts.

8. References

[1] T. Berners-Lee, "Universal Resource Identifiers in WWW:
    A Unifying Syntax for the Expression of Names and Addresses of
    Objects on the Network as used in the World-Wide Web", RFC 1630,
    CERN, June 1994. <URL:ftp://ds.internic.net/rfc/rfc1630.txt>

[2] T. Berners-Lee, L. Masinter, and M. McCahill, Editors,
    "Uniform Resource Locators (URL)", RFC 1738, CERN,
    Xerox Corporation, University of Minnesota, December 1994.
    <URL:ftp://ds.internic.net/rfc/rfc1738.txt>

[3] T. Berners-Lee and D. Connolly, "HyperText Markup Language
    Specification -- 2.0", Work in Progress, MIT, HaL Computer
    Systems, November 1994.
    <URL:http://www.ics.uci.edu/pub/ietf/html/>

[4] N. Borenstein and N. Freed, "MIME (Multipurpose Internet Mail
    Extensions): Mechanisms for Specifying and Describing the Format
    of Internet Message Bodies", RFC 1521, Bellcore, Innosoft,
    September 1993. <URL:ftp://ds.internic.net/rfc/rfc1521.txt>

[5] D. H. Crocker, "Standard for the Format of ARPA Internet
    Text Messages", STD 11, RFC 822, UDEL, August 1982.
    <URL:ftp://ds.internic.net/rfc/rfc822.txt>

[6] J. Kunze, "Functional Requirements for Internet Resource
    Locators", Work in Progress, IS&T, UC Berkeley, January 1995.
    <URL:ftp://ds.internic.net/internet-drafts/
    draft-ietf-uri-irl-fun-req-03.txt>

9. Author's Address

   Roy T. Fielding
   Department of Information and Computer Science
   University of California
   Irvine, CA 92717-3425
   U.S.A.

   Tel: +1 (714) 824-4049
   Fax: +1 (714) 824-4056
   Email: fielding@ics.uci.edu

This Internet-Draft expires July 30, 1995.


10. Appendix - Embedding the Base URL in HTML documents.

It is useful to consider an example of how the base URL of a
document can be embedded within the document's content. In this
appendix, we describe how documents written in the Hypertext Markup
Language (HTML) [3] can include an embedded base URL. This appendix
does not form a part of the relative URL specification and should not
be considered as anything more than a descriptive example.

HTML defines a special element "BASE" which, when present in the
"HEAD" portion of a document, signals that the parser should use
the BASE element's "HREF" attribute as the base URL for resolving
any relative URLs. The "HREF" attribute must be an absolute URL.
Note that, in HTML, element and attribute names are case-insensitive.
For example:

   <!doctype html public "-//IETF//DTD HTML//EN">
   <HTML><HEAD>
   <TITLE>An example HTML document</TITLE>
   <BASE href="http://www.ics.uci.edu/Test/a/b/c">
   </HEAD><BODY>
   ... <A href="../x">a hypertext anchor</A> ...
   </BODY></HTML>

A parser reading the example document should interpret the given
relative URL "../x" as representing the absolute URL

   <URL:http://www.ics.uci.edu/Test/a/x>

regardless of the context in which the example document was obtained.

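As a rough illustration of the behaviour described in this appendix,
the following Python sketch reads the example document with the
standard html.parser module and resolves anchors against the embedded
base URL. The class name BaseAwareLinkCollector and the retrieval URL
are hypothetical, and urllib.parse.urljoin stands in for the
resolution algorithm; it is not claimed to match this draft in every
case.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

EXAMPLE = """<!doctype html public "-//IETF//DTD HTML//EN">
<HTML><HEAD>
<TITLE>An example HTML document</TITLE>
<BASE href="http://www.ics.uci.edu/Test/a/b/c">
</HEAD><BODY>
... <A href="../x">a hypertext anchor</A> ...
</BODY></HTML>
"""


class BaseAwareLinkCollector(HTMLParser):
    """Collect anchor targets, resolved against an embedded BASE if any."""

    def __init__(self, retrieval_url: str):
        super().__init__()
        # Until a BASE element is seen, fall back to the retrieval URL.
        self.base_url = retrieval_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases element and attribute names, which
        # matches HTML's case-insensitive treatment of them.
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        if tag == "base":
            self.base_url = href
        elif tag == "a":
            self.links.append(urljoin(self.base_url, href))


# Hypothetical retrieval context; the embedded BASE overrides it.
collector = BaseAwareLinkCollector("http://example.org/mirror/copy.html")
collector.feed(EXAMPLE)
print(collector.links)
```

Because the BASE element appears before the anchor, the retrieval URL
passed to the constructor has no influence on the resolved link.
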