Uniform Resource Locators (URL)                          Tim Berners-Lee
draft-ietf-uri-url-03.{ps,txt}                         URI Working Group
Expires 21 September 1994                                  21 March 1994


Uniform Resource Locators (URL)

A Syntax for the Expression of
Access Information of Objects on the Network
10 |
|
11 |
|
12 |
ABOUT THIS DOCUMENT |
13 |
|
14 |
This document specifies a Uniform Resource Locator (URL), the |
15 |
syntax and semantics of formalized information for location and |
16 |
access of resources on the Internet. |
17 |
|
18 |
This document was written by the URI working group of the Internet |
19 |
Engineering Task Force. Comments may be addressed to the editor, |
20 |
Tim Berners-Lee <timbl@info.cern.ch>, or to the URI-WG |
21 |
<uri@bunyip.com>. Discussions of the group are archived at |
22 |
|
23 |
<http://www.acl.lanl.gov/URI/archive/uri-archive.index.html> |
24 |
|
25 |
This document is bound by the Requirements Specification in |
26 |
preparation. |
27 |
|
28 |
The work is derived from concepts introduced by the World-Wide Web
global information initiative, whose use of such objects dates
from 1990 and is described in "Universal Resource Identifiers for
the World-Wide Web", RFCXXX.
32 |
|
33 |
This document is available in hypertext form, with links to |
34 |
background information, as: |
35 |
|
36 |
<http://info.cern.ch/hypertext/WWW/Addressing/URL/Overview.html> |
37 |
|
39 |
|
40 |
STATUS OF THIS MEMO |
41 |
|
42 |
This document is an Internet Draft. Internet Drafts are working |
43 |
documents of the Internet Engineering Task Force (IETF), its Areas, |
44 |
and its Working Groups. Note that other groups may also distribute |
45 |
working documents as Internet Drafts. |
46 |
|
47 |
Internet Drafts are working documents valid for a maximum of six |
48 |
months. Internet Drafts may be updated, replaced, or obsoleted by |
49 |
other documents at any time. It is not appropriate to use Internet |
50 |
Drafts as reference material or to cite them other than as a |
51 |
"working draft" or "work in progress". |
52 |
|
53 |
Distribution of this document is unlimited. |
54 |
|
55 |
|
56 |
|
57 |
Berners-Lee 1 |
58 |
|
59 |
Recommendations |
60 |
|
61 |
This section describes the syntax for "Uniform Resource Locators" |
62 |
(URLs): that is, basically physical addresses of objects which are |
63 |
retrievable using protocols already deployed on the net. The |
64 |
generic syntax provides a framework for new schemes for names to be |
65 |
resolved using as yet undefined protocols. |
66 |
|
67 |
The syntax is described in two parts. Firstly, we give the syntax |
68 |
rules of a completely specified name; secondly, we give the rules |
69 |
under which parts of the name may be omitted in a well-defined |
70 |
context. |
71 |
|
72 |
URL SYNTAX |
73 |
|
74 |
A complete URL consists of a naming scheme specifier followed by a
string whose format is a function of the naming scheme. For
locators of information on the internet, a common syntax is used
for the IP address part. A BNF description of the URL syntax is
given in a later section. The components are as follows.
Fragment identifiers and partial URLs are not involved in the basic
URL definition.
81 |
|
82 |
PrePrefix |
83 |
|
84 |
To be a Uniform Resource Locator as currently defined by the URI |
85 |
working group, the whole string must start with a constant prefix |
86 |
"URL:". Note that to save space in this document, some URLs may |
87 |
have been quoted throughout without this preprefix. |
88 |
|
89 |
Scheme |
90 |
|
91 |
Within the URL of an object, the first element is the name of the
scheme, separated from the rest of the object by a colon. The rest
of the URL follows the colon in a format depending on the scheme.
94 |
|
95 |
Internet protocol parts |
96 |
|
97 |
Those schemes which refer to internet protocols mostly have a |
98 |
common syntax for the rest of the object name. This starts with a |
99 |
double slash "//" to indicate its presence, and continues until the |
100 |
following slash "/". Within that section are |
101 |
|
102 |
An optional user name, if required (as it is with a few FTP
servers). The password, if present, follows the user name,
separated from it by a colon; the user name and optional
password are followed by a commercial at sign "@". The use of
user names and passwords which are public is discouraged.

The internet domain name of the host in RFC1034 format (or,
optionally and less advisably, the IP address as a set of four
decimal numbers separated by periods).

The port number, if it is not the default number for the
protocol, is given in decimal notation after a colon.

Path    The rest of the locator is known as the "path". It may
        define details of how the client should communicate with
        the server, including information to be passed
        transparently to the server without any processing by
        the client.
131 |
|
132 |
The path is interpreted in a manner dependent on the scheme being |
133 |
used. Generally, the slash "/" (ASCII 2F hex) denotes a level in a |
134 |
hierarchical structure, the higher level part to the left of the |
135 |
slash. |
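
As an informal illustration of the common syntax for internet protocol
parts described above, the following Python sketch splits a URL into
scheme, user name, password, host, port and path. It is not part of
the specification: the function name split_internet_part and the host
ftp.example.org are assumptions made for the example, and the "URL:"
preprefix is assumed to have been removed already.

```python
def split_internet_part(url):
    """Split 'scheme://[user[:password]@]host[:port]/path' into its parts.

    Hypothetical helper: the name and the returned dictionary layout are
    illustrative choices, not something defined by the draft.
    """
    scheme, sep, rest = url.partition(":")
    if not sep or not rest.startswith("//"):
        raise ValueError("no internet protocol part (missing '//')")
    # The internet part runs from the double slash to the following slash.
    netloc, _, path = rest[2:].partition("/")
    user = password = None
    if "@" in netloc:
        userinfo, _, netloc = netloc.rpartition("@")
        user, colon, pw = userinfo.partition(":")
        password = pw if colon else None
    host, _, port = netloc.partition(":")
    return {
        "scheme": scheme,
        "user": user,
        "password": password,
        "host": host,
        "port": int(port) if port else None,  # None means "protocol default"
        "path": path,
    }


parts = split_internet_part("ftp://joe:secret@ftp.example.org:2121/pub/readme.txt")
print(parts["host"], parts["port"], parts["path"])
```

The path is returned undecoded, since its interpretation depends on
the scheme.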
136 |
|
137 |
ENCODING PROHIBITED CHARACTERS |
138 |
|
139 |
When a system uses a local addressing scheme, it is useful to |
140 |
provide a mapping from local addresses into URLs so that references |
141 |
to objects within the addressing scheme may be referred to |
142 |
globally, and possibly accessed through gateway servers. |
143 |
|
144 |
Any mapping scheme may be defined provided it is unambiguous, |
145 |
reversible, and provides valid URLs. It is recommended that where |
146 |
hierarchical aspects to the local naming scheme exist, they be |
147 |
mapped onto the hierarchical URL path syntax in order to allow the |
148 |
partial form to be used. |
149 |
|
150 |
The following encoding method shall be used for mapping WAIS, FTP, |
151 |
Prospero and Gopher addresses onto URLs. Where the local naming |
152 |
scheme uses characters which are not allowed in the URL, these may |
153 |
be represented in the URL by a percent sign "%" followed by two |
154 |
hexadecimal digits (0-9, A-F) giving the ISO Latin 1 code for that |
155 |
character. Character codes other than those allowed by the syntax |
156 |
shall not be used unencoded in a URL. |
157 |
|
158 |
The same encoding method may be used for encoding characters whose |
159 |
use, although technically allowed in a URL, would be unwise due to |
160 |
problems of corruption by imperfect gateways or misrepresentation |
161 |
due to the use of variant character sets, or which would simply be |
162 |
awkward in a given environment. Because a % sign always indicates |
163 |
an encoded character, a URL may be made safer simply by encoding |
164 |
any characters considered unsafe, while leaving already encoded |
165 |
characters still encoded. Similarly, in cases where a larger set |
166 |
of characters is acceptable, % signs can be selectively and |
167 |
reversibly expanded. |
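
The encoding rule above can be sketched in Python as follows. This is
an informal illustration only. The set ALLOWED is an assumption chosen
for the example (letters, digits and a few punctuation characters);
the characters actually allowed are those given by the BNF. The
function names encode, decode and make_safer are likewise
hypothetical.

```python
import string

# Assumption for this sketch: characters passed through unencoded.
ALLOWED = set(string.ascii_letters + string.digits + "$-_@.&+")


def encode(text, allowed=ALLOWED):
    """Replace each character not in `allowed` by %XX (ISO Latin 1 code)."""
    out = []
    for ch in text:
        if ch in allowed:
            out.append(ch)
        else:
            # Raises UnicodeEncodeError for characters outside ISO Latin 1.
            out.append("%%%02X" % ch.encode("latin-1")[0])
    return "".join(out)


def decode(text):
    """Reverse encode(): expand every %XX back to the character it denotes."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "%":
            out.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(ord(text[i]))
            i += 1
    return out.decode("latin-1")


def make_safer(url, unsafe):
    """Encode only the characters in `unsafe`, leaving existing %XX intact."""
    return "".join(
        "%%%02X" % ord(c) if c in unsafe and c != "%" else c for c in url
    )


print(encode("my file.txt"))            # my%20file.txt
print(make_safer("a%20b&c", "&"))       # a%20b%26c
```

Because "%" itself is never re-encoded by make_safer, applying it to
an already encoded URL does not disturb the existing escapes, which is
the property the paragraph above relies on.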
168 |
|
169 |
Specific Schemes |
170 |
|
175 |
The mapping for some existing standard and experimental protocols
is outlined in the BNF syntax definition. Notes on particular
protocols follow. The schemes covered are

http                     Hypertext Transfer Protocol

ftp                      File Transfer protocol

gopher                   The Gopher protocol

mailto                   Electronic mail address

mid                      Message identifiers for electronic mail

cid                      Content identifiers for MIME body part

news                     Usenet news

nntp                     Usenet news for local NNTP access only

prospero                 Access using the prospero protocols

telnet, rlogin, tn3270   Reference to interactive sessions

wais                     Wide Area Information Servers

Other schemes may be specified by future specifications.

New schemes may be registered at a later time.
205 |
|
206 |
FTP |
207 |
|
208 |
The ftp: prefix indicates that the FTP protocol is used, as defined
in RFC959 or any successor. The port number, if present, gives the
port of the FTP server if not the FTP default.
211 |
|
212 |
User name and password |
213 |
|
214 |
The syntax allows for the inclusion of a user name and even a
password for those systems which do not use the anonymous FTP
convention. The default, however, if no user or password is
supplied, will be to use that convention, viz. that the user name
is "anonymous" and the password the user's Internet-style mail
address.
220 |
|
221 |
Where possible, this mail address should correspond to a usable |
222 |
mail address for the user, and preferably give a DNS host name |
223 |
which resolves to the IP address of the client. Note that servers |
224 |
currently vary in their treatment of the anonymous password. |
225 |
|
226 |
Path |
227 |
|
233 |
The FTP protocol allows for a sequence of CWD commands (change
working directory) and a TYPE command prior to service commands
such as RETR (retrieve) or NLST (name list) which actually access a
file.

The arguments of any CWD commands are successive segment parts of
the URL delimited by slash, and the final segment is suitable as
the filename argument to the RETR command for retrieval or the
directory argument to NLST.
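
The mapping from URL path to FTP commands described above might be
sketched as follows. This is an informal illustration, not a
definitive client: the function name ftp_commands and its parameters
are assumptions, and the sketch only builds the command strings
without opening a connection. Segments are split on "/" before
percent-decoding, so that an encoded slash (%2F) inside a segment is
not mistaken for a delimiter.

```python
from urllib.parse import unquote


def ftp_commands(path, directory=False, type_code="I"):
    """Return the FTP commands implied by a URL path (hypothetical helper).

    path      -- the URL path, without the leading slash after the host
    directory -- True to list the final segment instead of retrieving it
    type_code -- argument for the TYPE command ("I" or "A" in this sketch)
    """
    # Split first, decode afterwards, so %2F stays inside its segment.
    segments = [unquote(s, encoding="latin-1") for s in path.split("/")]
    *dirs, last = segments
    commands = ["CWD " + d for d in dirs]
    commands.append("TYPE " + type_code)
    commands.append(("NLST " if directory else "RETR ") + last)
    return commands


for line in ftp_commands("pub/docs/readme.txt"):
    print(line)
```

For the path "pub/docs/readme.txt" this yields two CWD commands, a
TYPE command and a final RETR command.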
242 |
|
243 |
For some file systems (Unix in particular), the "/" used to denote |
244 |
the hierarchical structure of the URL corresponds to the delimiter |
245 |
used to construct a file name hierarchy, and thus, the filename |
246 |
will look the same as the URL path. This does NOT mean that the URL |
247 |
is a Unix filename. |
248 |
|
249 |
Note: Retrieving subsequent URLs from the same host |
250 |
|
251 |
There is no common hierarchical model to the FTP protocol, so if a |
252 |
directory change command has been given, it is impossible in |
253 |
general to deduce what sequence should be given to navigate to |
254 |
another directory for a second retrieval, if the paths are |
255 |
different. The only reliable algorithm is to disconnect and |
256 |
reestablish the control connection. |
257 |
|
258 |
Data type |
259 |
|
260 |
The data content type of a file can only, in the general FTP case, |
261 |
be deduced from the name, normally the suffix of the name. This is |
262 |
not standardized. An alternative is for it to be transferred in |
263 |
information outside the URL. A suitable FTP transfer type (for |
264 |
example binary "I" or text "A") must in turn be deduced from the |
265 |
data content type. It is recommended that conventions for suffixes |
266 |
of public archives be established, but it is outside the scope of |
267 |
this standard. |
268 |
|
269 |
An FTP URL may optionally specify the FTP data transfer type by
which an object is to be retrieved. Two of the methods correspond
to the FTP "Data Types" ASCII and IMAGE for the retrieval of a
document, as specified in FTP by the TYPE command. One method
indicates directory access.
274 |
|
275 |
The data type is specified by a suffix to the URL. Possible
suffixes are:

;type=<type-code>   Use FTP type as given to perform data
                    transfer.

;type=d             Use FTP directory list commands to read
                    directory.

The type code is in the format defined in RFC959.
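
A client might separate the type suffix from the path along the
following lines. This is a minimal sketch under the assumption that
the suffix is written exactly as ";type=" followed by the code; the
function name split_ftp_type is hypothetical.

```python
def split_ftp_type(path):
    """Split an FTP URL path into (path, type) at a trailing ';type=' suffix.

    Returns (path, None) when no suffix is present. Hypothetical helper,
    written for illustration only.
    """
    base, sep, code = path.rpartition(";type=")
    if not sep:
        return path, None
    return base, code


def service_command(code):
    """'d' selects a directory listing; any other code a file retrieval."""
    return "NLST" if code == "d" else "RETR"


path, code = split_ftp_type("pub/readme.txt;type=A")
print(path, code, service_command(code))
```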
285 |
|
291 |
Transfer Mode |
292 |
|
293 |
Stream Mode is always used. |
294 |
|
295 |
HTTP |
296 |
|
297 |
The HTTP protocol specifies that the path is handled transparently |
298 |
by those who handle URLs, except for the servers which de-reference |
299 |
them. The path is passed by the client to the server with any |
300 |
request, but is not otherwise understood by the client. The |
301 |
fragmentid part is not sent with the request. The search part, if |
302 |
present, is sent. Spaces and control characters in URLs must be |
303 |
escaped for transmission in HTTP. |
304 |
|
305 |
GOPHER |
306 |
|
307 |
Gopher selector strings are, in general, interpreted as a sequence |
308 |
of 8-bit bytes which may contain any characters other than tab, |
309 |
return, or linefeed. It is necessary to encode any characters |
310 |
disallowed in a URL, including spaces and other binary data not in |
311 |
the allowed character set, using the standard convention of the "%" |
312 |
character followed by two hexadecimal digits. |
313 |
|
314 |
Note that slash "/" in gopher selector strings may not correspond |
315 |
to a level in a hierarchical structure. |
316 |
|
317 |
The format of a gopher URL is: |
318 |
|
319 |
1. A single-character field to denote the Gopher type of the |
320 |
resource to which the URL refers. |
321 |
|
322 |
2. The gopher selector string. Note that some gopher selector |
323 |
strings begin with a copy of the gopher type character, in which |
324 |
case that character will occur twice consecutively. Also note |
325 |
that the gopher selector string may be an empty string since |
326 |
this is how gopher clients refer to the top-level directory on |
327 |
a gopher server. |
328 |
|
329 |
If the URL does not refer to a Gopher+ item and if there is no
gopher search string then parts 3, 4, 5, and 6 of the URL are
optional.
332 |
|
333 |
3. An encoded tab character (%09) to separate the gopher
selector string from the optional search string (see 4 below).
335 |
|
336 |
4. The gopher search string. If the URL refers to a search to |
337 |
be submitted to a gopher search engine, the search string is |
338 |
required. Otherwise this is an empty string. |
339 |
|
340 |
5. An encoded tab character (%09) to separate the gopher search
string from the optional gopher+ string (see 6 below).
[suggestion: Note that if the URL refers to a gopher+ item and
does not have a gopher search string, there will be two encoded
tab characters in a row.]
350 |
|
351 |
6. The Gopher+ string. Gopher+ strings consist of one or more
characters and are used to represent information required for
retrieval of the Gopher+ item. Gopher+ items may have alternate
views, arbitrary sets of attributes, and may have electronic
forms associated with them. To accommodate the various Gopher+
objects, the Gopher+ string in the URL must accommodate a
mapping of the information a Gopher+ client sends to the server.
This makes this section a bit long since we basically cover the
entire Gopher+ protocol here.
360 |
|
361 |
When a Gopher server returns a directory listing to a client, |
362 |
Gopher+ items are tagged with either a "+" (denoting gopher+ items) |
363 |
or a "?" (denoting items which have a +ASK form associated with |
364 |
them). A Gopher+ string which is only a "+" refers to the default |
365 |
view (data representation) of the item. To retrieve this item a |
366 |
gopher+ client should send |
367 |
|
368 |
a_gopher_selector<tab>+<cr><lf> |
369 |
|
370 |
to the gopher+ server. |
371 |
|
372 |
Note that items which have a +ASK associated with them (i.e.,
Gopher+ items tagged with a "?") require the client to fetch the
item's +ASK attribute to get the form definition, and then ask the
user to fill out the form and return the user's responses along
with the selector string to retrieve the item. Gopher+ clients
know how to do this but depend on the "?" tag in the gopher+ item
description to know when to handle this case. The "?" is used in
the Gopher+ string to be consistent with Gopher+ protocol's use of
this symbol.
381 |
|
382 |
To refer to the Gopher+ attributes of an item, the Gopher+ string
might consist of "!" or "$". "!" refers to all of a gopher+
item's attributes. "$" refers to all the item attributes for all
items in a Gopher directory. To retrieve an item or directory's
attributes, a gopher client will send:
387 |
|
388 |
a_gopher_selector<tab>!<cr><lf> |
389 |
|
390 |
for items or |
391 |
|
392 |
a_gopher_selector<tab>$<cr><lf> |
393 |
|
394 |
for directories to the gopher+ server. |
395 |
|
396 |
To refer to specific attributes, the Gopher+ string is
"!attribute_name" or "$attribute_name". For example, to refer to
the attribute containing the abstract of an item, the Gopher+
string would be "!+ABSTRACT". To refer to several attributes,
clients send the server the attribute names separated by spaces, so
it is necessary to separate the attribute names with encoded spaces.
402 |
|
407 |
To retrieve a collection of item attributes specified with a |
408 |
gopher+ string of "!+ABSTRACT%20+SMELL" a gopher client would send |
409 |
|
410 |
a_gopher_selector<tab>!+ABSTRACT +SMELL<cr><lf> |
411 |
|
412 |
to the gopher server. |
413 |
|
414 |
Gopher+ allows for optional alternate data representations
(alternate views) of items. To retrieve a Gopher+ alternate view,
the gopher+ client sends the appropriate view and language
identifier (found in the item's +VIEW attribute). To refer to a
specific Gopher+ alternate view, the URL's Gopher+ string would be
in the form "+view_name%20language_name". For example, a gopher+
string of "+application/postscript%20Es_ES" refers to the Spanish
language PostScript alternate view of a gopher+ item. To retrieve
this alternate view the client would send
423 |
|
424 |
a_gopher_selector<tab>+application/postscript Es_ES<cr><lf> |
425 |
|
426 |
to the gopher server. |
427 |
|
428 |
The gopher+ string for a URL that refers to an item referenced by |
429 |
an ASK form filled out with specific values is essentially a coded |
430 |
version of what the client sends to the server. The gopher+ string |
431 |
will be of the form |
432 |
|
433 |
+%091%0D%0A+-1%0D%0Aask_item1_value%0D%0Aask_item2_value%0D%0A.%0D%0A
435 |
|
436 |
To retrieve this item, the gopher client sends: |
437 |
|
438 |
a_gopher_selector<tab>+<tab>1<cr><lf> |
439 |
+-1<cr><lf> |
440 |
ask_item1_value<cr><lf> |
441 |
ask_item2_value<cr><lf> |
442 |
.<cr><lf> |
443 |
|
444 |
to the gopher server. |
445 |
|
446 |
For a really complex example, consider a URL that refers to an |
447 |
alternate view of an item that is referenced with a filled-out |
448 |
Gopher +ASK form. The gopher+ string will be of the form: |
449 |
|
450 |
|
451 |
+view_name%20language_name%091%0D%0A+-1%0D%0Aask_item1_value%0D%0Aask_item2_value%0D%0A.%0D%0A
453 |
|
454 |
To retrieve this item, the gopher client sends: |
455 |
|
456 |
a_gopher_selector<tab>+view_name language_name<tab>1<cr><lf> |
457 |
+-1<cr><lf> |
458 |
ask_item1_value<cr><lf> |
459 |
ask_item2_value<cr><lf> |
.<cr><lf> |
466 |
|
467 |
to the gopher server. |
468 |
|
469 |
Summary: gopher+ string part of Gopher URL |
470 |
|
471 |
To refer to an item which has an ASK form associated with it where |
472 |
the intent is to allow the user to enter values into the form as |
473 |
part of the retrieval process: |
474 |
|
475 |
%3F [was: ?] |
476 |
|
477 |
|
478 |
To refer to all or specific attributes of a gopher item: |
479 |
|
480 |
![attribute_name][%20attribute_name][%20attribute_name]... |
481 |
|
482 |
|
483 |
To refer to all or specific attributes of a gopher directory: |
484 |
|
485 |
$[attribute_name][%20attribute_name][%20attribute_name]... |
486 |
|
487 |
|
488 |
To refer to the content of a gopher+ item (including an item |
489 |
referred to by specific values in a filled-out ASK form): |
490 |
|
491 |
+[view_name[%20language_name]][%091%0D%0A+-1%0D%0Aask_item1_value%0D%0Aask_item2_value...%0D%0A.%0D%0A]
494 |
|
495 |
|
496 |
|
497 |
Overall summary and examples |
498 |
|
499 |
The general format of a Gopher URL path referring to a gopher type
"T" item is:

gopher://host [port]/T[gopher_selector]%09[search_string]%09[gopher+_string]
504 |
|
505 |
|
506 |
Examples: |
507 |
|
508 |
An example of a URL pointing to a gopher type 0 item (a document) |
509 |
is: |
510 |
|
511 |
gopher://host [port]/0a_gopher_selector |
512 |
|
513 |
|
514 |
An example of a URL pointing to a gopher type 7 item (a search |
515 |
engine) where the string foobar is to be submitted to the search |
516 |
engine is: |
517 |
|
523 |
gopher://host [port]/7a_gopher_selector%09foobar |
524 |
|
525 |
|
526 |
An example of a URL pointing to a Gopher+ type 0 item (a document) |
527 |
is: |
528 |
|
529 |
gopher://host [port]/0a_gopher_selector%09%09some_gplus_stuff |
530 |
|
531 |
|
532 |
An example of a URL pointing to a Gopher+ type 0 (document) item's |
533 |
attribute information is: |
534 |
|
535 |
gopher://host [port]/0a_gopher_selector%09%09! |
536 |
|
537 |
|
538 |
An example of a URL pointing to a Gopher+ document's Spanish
PostScript representation is:

gopher://host [port]/0a_gopher_selector%09%09+application/postscript%20Es_ES
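
The relationship between a Gopher URL and what the client sends can be
sketched as follows. This is an informal reading of the examples in
this section, not a definitive algorithm: the function name
gopher_request is hypothetical, an empty search string is assumed to
be omitted from the request (as in the Gopher+ examples above), and a
final <cr><lf> is assumed to be appended only when the decoded Gopher+
string does not already end with one.

```python
from urllib.parse import unquote


def gopher_request(url_path):
    """Map the part of a Gopher URL after 'host[:port]/' to (type, request).

    Hypothetical helper; `request` is the text the client would send.
    """
    gtype, rest = url_path[:1], url_path[1:]
    # Split on the *encoded* tab before decoding, so that tabs inside the
    # Gopher+ string (for example in a filled-out ASK form) are preserved.
    fields = rest.split("%09", 2)
    selector, search, gplus = (fields + ["", ""])[:3]
    parts = [unquote(selector, encoding="latin-1")]
    if search:
        parts.append(unquote(search, encoding="latin-1"))
    if gplus:
        parts.append(unquote(gplus, encoding="latin-1"))
    request = "\t".join(parts)
    if not request.endswith("\r\n"):
        request += "\r\n"
    return gtype, request


print(repr(gopher_request("7a_gopher_selector%09foobar")))
print(repr(gopher_request(
    "0a_gopher_selector%09%09+application/postscript%20Es_ES")))
```

The second call produces the request shown earlier for the Spanish
PostScript view: the selector, a tab, the view and language
identifier, and the terminating <cr><lf>.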
543 |
|
545 |
|
546 |
MAILTO |
547 |
|
548 |
This allows a URL to specify an RFC822 addr-spec mail address.
Note that use of "%", for example as used in forming a gatewayed
mail address, requires conversion to %25 in a URL.
551 |
|
552 |
NEWS |
553 |
|
554 |
The news locators refer to either news group names or article
message identifiers which must conform to the rules for a
Message-ID of RFC 1036 (Horton 1987). A message identifier may be
distinguished from a news group name by the presence of the
commercial at "@" character. These rules imply that within an
article, a reference to a news group or to another article will be
a valid URL (in the partial form).
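
The rule for telling the two kinds of news locator apart is simple
enough to state in a few lines of Python. The sketch below is
illustrative only; the function name classify_news and the example
message identifier are assumptions.

```python
def classify_news(url):
    """Return ('article', id) or ('group', name) for a news: URL.

    Hypothetical helper: a message identifier is recognised solely by
    the presence of the commercial at "@" character.
    """
    scheme, sep, rest = url.partition(":")
    if not sep or scheme.lower() != "news":
        raise ValueError("not a news URL: %r" % url)
    kind = "article" if "@" in rest else "group"
    return kind, rest


print(classify_news("news:comp.infosystems.www"))
print(classify_news("news:1234@example.org"))
```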
561 |
|
562 |
A news URL may be dereferenced using NNTP (RFC977, Kantor 86), with
the ARTICLE by message-id command, or using any other protocol for
the conveyance of usenet news articles, or by reference to a body of
news articles already received.
566 |
|
567 |
Note 1:
568 |
|
569 |
Among URLs the "news" URLs are anomalous in that they are |
570 |
location-independent. They are unsuitable as URN candidates because |
571 |
the NNTP architecture relies on the expiry of articles and |
572 |
therefore a small number of articles being available at any time. |
573 |
When a news: URL is quoted, the assumption is that the reader will |
574 |
fetch the article or group from his or her local news host. News |
575 |
host names are NOT part of news URLs. |
576 |
|
581 |
Note 2: |
582 |
|
583 |
An outstanding problem is that the message identifier is
insufficient to allow the retrieval of an expired article, as no
algorithm exists for deriving an archive site and file name. The
addition of the date and news group set to the article's URL would
allow this if a directory existed of archive sites by news group.
This is a suggested subject of study in conjunction with the NNTP
working group. A possible further extension is to allow the naming
of subject threads as addressable objects.
591 |
|
592 |
NNTP |
593 |
|
594 |
This is an alternative form of reference for news articles, |
595 |
specifically to be used with NNTP servers, and particularly those |
596 |
incomplete server implementations which do not allow retrieval by |
597 |
message identifier. In all other cases the "news" scheme should be |
598 |
used. |
599 |
|
600 |
The news server name, newsgroup name, and index number of an |
601 |
article within the newsgroup on that particular server are given. |
602 |
The NNTP protocol must be used. |
603 |
|
604 |
Note 1:

This form of URL is not globally accessible, as typically NNTP
servers only allow access from local clients. Note that the
article numbers within groups vary from server to server.

This form of URL should not be quoted outside this local area. It
should not be used within news articles for wider circulation than
the one server. This is a local identifier for a resource which is
often available globally, and so is not recommended except in the
case in which incomplete NNTP implementations on the local server
force its adoption.
616 |
|
617 |
PROSPERO |
618 |
|
619 |
The Prospero (Neuman, 1991) directory service is used to resolve |
620 |
the URL yielding an access method for the object (which can then |
621 |
itself be represented as a URL if translated). The host part |
622 |
contains a host name or internet address. The port part is |
623 |
optional. |
624 |
|
625 |
The path part contains a host specific object name and an optional |
626 |
version number. If present, the version number is separated from |
627 |
the host specific object name by the characters "%00" (percent |
628 |
zero zero), this being an escaped string terminator (null). |
629 |
External Prospero links are represented as URLs of the underlying |
630 |
access method and are not represented as Prospero URLs. |
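
Splitting a Prospero path at the escaped null can be sketched as
below. This is an illustration only: the function name
split_prospero_path is hypothetical, and the version is returned as
an undecoded string because the sketch does not attempt to interpret
anything that may follow it.

```python
from urllib.parse import unquote


def split_prospero_path(path):
    """Split a Prospero URL path into (object name, version or None).

    The separator is the literal escape "%00"; it is located before any
    decoding takes place. Hypothetical helper.
    """
    name, sep, version = path.partition("%00")
    return unquote(name, encoding="latin-1"), (version if sep else None)


print(split_prospero_path("pub/doc%20a%001"))
print(split_prospero_path("pub/doc"))
```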
631 |
|
632 |
TELNET, RLOGIN, TN3270 |
633 |
|
639 |
The use of URLs to represent interactive sessions is a convenient |
640 |
extension to their uses for objects. This allows access to |
641 |
information systems which only provide an interactive service, and |
642 |
no information server. As information within the service cannot be |
643 |
addressed individually or, in general, automatically retrieved, |
644 |
this is a less desirable, though currently common, solution. |
645 |
|
646 |
WAIS |
647 |
|
648 |
The current public domain WAIS implementation requires that a
client know the "type" of an object prior to retrieval. This value
is returned along with the internal object identifier in the search
response. It has been encoded into the path part of the URL in
order to make the URL sufficient for the retrieval of the object.
Within the WAIS world, names do not of course need to be prefixed
by "wais:" (by the partial form rules).

The wpath of a WAIS URL consists of encoded fields of the WAIS
identifier, in the same order as in the WAIS identifier. For each
field, the identifier field number is the digits before the equals
sign, and the field contents follow, encoded in the conventional
encoding, terminated by ";".
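
The wpath structure just described (field number, equals sign, encoded
contents, semicolon, repeated) can be decoded with a short routine
such as the following. It is a sketch under the assumption that "="
and ";" never appear unencoded inside field contents; the function
name parse_wpath and the sample field values are hypothetical.

```python
from urllib.parse import unquote


def parse_wpath(wpath):
    """Decode a WAIS wpath into a list of (field number, contents) pairs."""
    fields = []
    for chunk in wpath.split(";"):
        if not chunk:
            continue  # text after the final ";" terminator is empty
        number, sep, contents = chunk.partition("=")
        if not sep or not (number.isascii() and number.isdigit()):
            raise ValueError("malformed wpath field: %r" % chunk)
        fields.append((int(number), unquote(contents, encoding="latin-1")))
    return fields


print(parse_wpath("1=abc;2=def%20g;"))
```

The fields are returned in the order in which they appear, matching
the requirement that the order of the WAIS identifier be preserved.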
661 |
|
662 |
|
663 |
|
664 |
REGISTRATION OF NAMING SCHEMES |
665 |
|
666 |
A new naming scheme may be introduced by defining a mapping onto a |
667 |
conforming URL syntax, using a new prefix. Experimental prefixes |
668 |
may be used by mutual agreement between parties, and must start |
669 |
with the characters "x-". The scheme name "urn:" is reserved for |
670 |
the work in progress on a scheme for more persistent names. |
671 |
|
672 |
It is proposed that the Internet Assigned Numbers Authority (IANA) |
673 |
perform the function of registration of new schemes. Any submission |
674 |
of a new URI scheme must include a definition of an algorithm for |
675 |
the retrieval of any object within that scheme. The algorithm must |
676 |
take the URI and produce either a set of URL(s) which will lead to |
677 |
the desired object, or the object itself, in a well-defined or |
678 |
determinable format. |
679 |
|
680 |
It is recommended that those proposing a new scheme demonstrate its |
681 |
utility and operability by the provision of a gateway which will |
682 |
provide images of objects in the new scheme for clients using an |
683 |
existing protocol. If the new scheme is not a locator scheme, then |
684 |
the properties of names in the new space should be clearly defined. |
685 |
It is likewise recommended that, where a protocol allows for
retrieval by URL, the client software have provision for being
configured to use specific gateway locators for indirect access
through new naming schemes.
689 |
|
690 |
BNF for specific URL schemes |
691 |
|
697 |
This is a BNF-like description of the Uniform Resource Locator |
698 |
syntax. A vertical line "|" indicates alternatives, and |
699 |
[brackets] indicate optional parts. Spaces are represented by the |
700 |
word "space", and the vertical line character by "vline". Single |
701 |
letters stand for single letters. All words of more than one letter |
702 |
below are entities described somewhere in this description. |
703 |
|
704 |
The current IETF URI working group preference is for the |
705 |
prefixedurl production. (Nov 1993. July 93: url). |
706 |
|
707 |
The "national" and "punctuation" characters do not appear in any |
708 |
productions and therefore may not appear in URLs. |
709 |
|
710 |
The "afsaddress" is left in as a historical note, but is not a url
production.
712 |
|
713 |
prefixedurl u r l : url |
714 |
|
715 |
ur l httpaddress | ftpaddress | newsaddress | |
716 |
nntpaddress | prosperoaddress | telnetaddress |
717 |
| gopheraddress | waisaddress | |
718 |
mailtoaddress | midaddress | cidaddress |
719 |
|
720 |
scheme ialpha |
721 |
|
722 |
httpaddress h t t p : / / hostport [ / path ] [ ? |
723 |
search ] |
724 |
|
725 |
ftpaddress f t p : / / login / path [ ! ftptype ] |
726 |
|
727 |
afsaddress a f s : / / cellname / path |
728 |
|
729 |
newsaddress n e w s : groupart |
730 |
|
731 |
nntpaddress n n t p : group / digits |
732 |
|
733 |
midaddress m i d : addr-spec |
734 |
|
735 |
cidaddress c i d : content-identifier |
736 |
|
737 |
mailtoaddress m a i l t o : : xalphas @ hostname |
738 |
|
739 |
waisaddress waisindex | waisdoc |
740 |
|
741 |
waisindex w a i s : / / hostport / database [ ? search |
742 |
] |
743 |
|
744 |
waisdoc w a i s : / / hostport / database / wtype / |
745 |
wpath |
746 |
|
747 |
wpath digits = path ; [ wpath ] |
748 |
|
749 |
groupart * | group | article |
750 |
|
751 |
|
752 |
|
753 |
Berners-Lee 13 |
754 |
|
755 |
group ialpha [ . group ] |
756 |
|
757 |
article xalphas @ host |
758 |
|
759 |
database xalphas |
760 |
|
761 |
wtype xalphas |
762 |
|
763 |
prosperoaddress prosperolink |
764 |
|
765 |
prosperolink p r o s p e r o : / / hostport / hsoname [ % |
766 |
0 0 version [ attributes ] ] |
767 |
|
768 |
hsoname path |
769 |
|
770 |
version digits |
771 |
|
772 |
attributes attribute [ attributes ] |
773 |
|
774 |
attribute alphanums |
775 |
|
776 |
telnetaddress t e l n e t : / / login |
777 |
|
778 |
gopheraddress g o p h e r : / / hostport [/ gtype [ |
779 |
selector ] ] [ ? search ] |
780 |
|
781 |
login [ user [ : password ] @ ] hostport |
782 |
|
783 |
hostport host [ : port ] |
784 |
|
785 |
host hostname | hostnumber |
786 |
|
787 |
ftptype A | I | D |
788 |
|
789 |
cellname hostname |
790 |
|
791 |
hostname ialpha [ . hostname ] |
792 |
|
793 |
hostnumber digits . digits . digits . digits |
794 |
|
795 |
port digits |
796 |
|
797 |
selector path |
798 |
|
799 |
path void | segment [ / path ] |
800 |
|
801 |
segment xpalphas |
802 |
|
803 |
search xalphas [ + search ] |
804 |
|
805 |
user xalphas |
806 |
|
807 |
password xalphas |
808 |
|
fragmentid             xalphas

gtype                  xalpha

xalpha                 alpha | digit | safe | extra | escape

xalphas                xalpha [ xalphas ]

xpalpha                xalpha | +

xpalphas               xpalpha [ xpalphas ]

ialpha                 alpha [ xalphas ]

alpha                  a | b | c | d | e | f | g | h | i | j | k |
                       l | m | n | o | p | q | r | s | t | u | v |
                       w | x | y | z | A | B | C | D | E | F | G |
                       H | I | J | K | L | M | N | O | P | Q | R |
                       S | T | U | V | W | X | Y | Z

digit                  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

safe                   $ | - | _ | @ | . | & | +

extra                  " | ' | ( | ) | , | space

reserved               = | ; | / | # | ? | :

escape                 % hex hex

hex                    digit | a | b | c | d | e | f | A | B | C |
                       D | E | F

national               { | } | vline | [ | ] | \ | ^ | ~

punctuation            < | >

digits                 digit [ digits ]

alphanum               alpha | digit

alphanums              alphanum [ alphanums ]

void

(end of URL BNF)

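As an informal illustration of how these productions compose (it is
not part of the specification), the httpaddress, hostport and path
rules might be transcribed into regular expressions as in the Python
sketch below. The names XALPHA, HTTPADDRESS and parse_httpaddress are
assumptions made for the example, and the "space" member of the extra
class is deliberately left out.

```python
import re

# xalpha = alpha | digit | safe | extra | escape. The "space" member of
# "extra" is omitted, an assumption made to keep the sketch simple.
XALPHA = r"(?:[A-Za-z0-9$\-_@.&+\"'(),]|%[0-9A-Fa-f]{2})"
XALPHAS = XALPHA + "+"

# ialpha = alpha [ xalphas ]. Because "." is itself a safe character,
# every string generated by "hostname = ialpha [ . hostname ]" is
# already an ialpha, so a single pattern serves for both.
HOSTNAME = "[A-Za-z]" + XALPHA + "*"
HOSTNUMBER = r"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+"

# hostport = host [ : port ]
HOSTPORT = ("(?P<host>" + HOSTNAME + "|" + HOSTNUMBER + ")"
            + "(?::(?P<port>[0-9]+))?")

# path = void | segment [ / path ]; "+" is already in the safe class,
# so xpalphas and xalphas accept the same strings here.
PATH = "(?:" + XALPHAS + "(?:/" + XALPHAS + ")*/?)?"

# httpaddress = h t t p : / / hostport [ / path ] [ ? search ]
HTTPADDRESS = re.compile(
    "http://" + HOSTPORT
    + "(?:/(?P<path>" + PATH + "))?"
    + r"(?:\?(?P<search>" + XALPHAS + "))?"
)


def parse_httpaddress(url):
    """Return the named parts of an httpaddress, or None if no match."""
    match = HTTPADDRESS.fullmatch(url)
    return match.groupdict() if match else None
```

A string that cannot be derived from the httpaddress production, such
as one beginning with another scheme, yields None. Parts that are
absent from a matching URL are reported as None.
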
Security considerations

The URL scheme does not in itself pose a security threat. Users
should be aware, however, that there is no general guarantee that a
URL which at one time points to a given object will continue to do
so. Because objects may be moved on servers, the same URL may even
point to a different object at some later time.

A URL-related security threat is that it is sometimes possible to
construct a URL such that an attempt to perform a harmless
idempotent operation, such as the retrieval of the object, will in
fact cause a possibly damaging remote operation to occur. The
unsafe URL is typically constructed by specifying a port number
other than that reserved for the network protocol in question. The
client unwittingly contacts a server which is in fact running a
different protocol. The content of the URL contains instructions
which, when interpreted according to this other protocol, cause an
unexpected operation. An example has been the use of gopher URLs
to cause a rude message to be sent via an SMTP server. Caution
should be used when using any URL which specifies a port number
other than the default for the protocol, especially when it is a
number within the reserved space.

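A client could apply this caution mechanically. The following Python
sketch classifies the port named in a URL. The table of default ports
and the treatment of ports below 1024 as the reserved space are
assumptions of the example rather than statements of this document,
and port_risk is a hypothetical name.

```python
from urllib.parse import urlsplit

# Assumed default ports for a few schemes (not defined in this section).
DEFAULT_PORTS = {
    "http": 80,
    "ftp": 21,
    "gopher": 70,
    "telnet": 23,
    "nntp": 119,
    "wais": 210,
}

# Assumption: the "reserved space" means ports below 1024.
RESERVED_LIMIT = 1024


def port_risk(url):
    """Classify the port of a URL as default, non-default, or
    non-default within the reserved range."""
    parts = urlsplit(url)
    port = parts.port  # None when the URL names no port
    if port is None or port == DEFAULT_PORTS.get(parts.scheme):
        return "default"
    if port < RESERVED_LIMIT:
        return "non-default, reserved"
    return "non-default"
```

A gopher URL naming port 25, as in the SMTP example above, falls in
the "non-default, reserved" class, and a client might refuse it or ask
the user for confirmation.
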
When a URL contains encoded delimiters for a given protocol (for
example, CR and LF characters for telnet protocols), care should be
taken that these are not decoded before transmission. Decoding them
would violate the protocol and could be used to simulate an extra
operation or parameter, again causing an unexpected and possibly
harmful remote operation to be performed.

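One way to act on this advice is to look for escapes that decode to
control characters before a URL is handed to a protocol handler. The
Python sketch below is only an illustration. The function name
encoded_control_chars is hypothetical, and treating every ASCII
control character as suspect is an assumption that goes slightly
beyond the CR and LF case mentioned above.

```python
import re

# escape = % hex hex, as in the BNF
ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def encoded_control_chars(url):
    """Return the escapes in url that decode to ASCII control
    characters (codes below 0x20, or 0x7F)."""
    found = []
    for match in ESCAPE.finditer(url):
        code = int(match.group(1), 16)
        if code < 0x20 or code == 0x7F:
            found.append(match.group(0))
    return found
```

A non-empty result indicates that the URL carries an encoded
delimiter such as %0D or %0A, which should reach the server still
encoded or not at all.
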
The use of URLs containing passwords is clearly unwise.

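Software that displays or records URLs might guard against this by
removing the password from the login part before doing so. The Python
sketch below assumes the form "[ user [ : password ] @ ] hostport" of
the login production. The name redact_password and the loose
character classes are assumptions of the example.

```python
import re

# "//" user ":" password "@"; the user part is captured so that it can
# be kept. The character classes are looser than the BNF, as a sketch.
LOGIN_WITH_PASSWORD = re.compile(r"(//[^/:@]+):[^/@]*@")


def redact_password(url):
    """Return url with any password removed from its login part."""
    return LOGIN_WITH_PASSWORD.sub(r"\1@", url, count=1)
```

URLs without a password, including those that name only a port after
the host, are returned unchanged.
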
Acknowledgements

This paper builds on the basic W3 design and much discussion of
these issues by many people on the network. The discussion was
particularly stimulated by articles by Clifford Lynch (1991),
Brewster Kahle (1991) and Wengyik Yeong (1991b). Contributions from
John Curran (NEARnet), Clifford Neuman (ISI), Ed Vielmetti (MSEN)
and later the IETF URL BOF and URI working group have been
incorporated into this issue of this paper.

The draft url4 (Internet Draft 00) was generated from url3
following discussion and overall approval of the URL working group
on 29 March 1993. The paper url3 had been generated from udi2 in
the light of discussion at the UDI BOF meeting at the Boston IETF
in July 1992. Draft url5 incorporated changes suggested by Clifford
Neuman, and draft url6 (ID 01) incorporated character group changes
and a few other fixes defined by the IETF URI WG in submitting it
as a proposed standard. URL7 (Internet Draft 02) incorporated
changes introduced at the Amsterdam IETF and refined in net
discussion.

Draft 03 includes changes made at Houston in November 1993 and on
the net before Seattle in March 1994.

APPENDICES

The following are not formally part of this document.

Wrappers for URIs in plain text

This section does not formally form part of the URL specification.

URIs, including URLs, will ideally be transmitted through protocols
which accept them and data formats which define a context for them.
In practice, however, there are many occasions when URLs are
included in plain ASCII non-marked-up text such as electronic mail
and usenet news messages.

In this case, it is convenient to have a separate wrapper syntax to
define delimiters which will enable the human or automated reader
to recognize that the URI is a URI.

The recommendation is that the angle brackets (less than and
greater than signs) of the ASCII set be used for this purpose.

These wrappers do not form part of the URL, are not mandatory, and
should not be used in contexts (such as SGML parameters, HTTP
requests, etc.) in which delimiters are already specified.

Example

Yes, Jim, I found it under <ftp://info.cern.ch/pub/www/doc> but
you can probably pick it up from <ftp://ds.internic.net/rfc>.

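An automated reader might recognize wrapped URIs along the following
lines. This Python sketch is illustrative only. The pattern used for
the scheme name (a letter followed by letters, digits, "+", "." or
"-") is an assumption of the example, as is the function name
extract_wrapped_uris.

```python
import re

# "<" scheme ":" rest ">"; the wrapper characters are not captured,
# since they do not form part of the URI.
WRAPPED_URI = re.compile(r"<([A-Za-z][A-Za-z0-9+.\-]*:[^<>]+)>")


def extract_wrapped_uris(text):
    """Return the URIs found between angle brackets in plain text."""
    return WRAPPED_URI.findall(text)


message = (
    "Yes, Jim, I found it under <ftp://info.cern.ch/pub/www/doc> but\n"
    "you can probably pick it up from <ftp://ds.internic.net/rfc>."
)
uris = extract_wrapped_uris(message)
```

Applied to the example message, the function returns the two ftp URLs
without their brackets. A bracketed mail address with no scheme and
colon is not matched.
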
REFERENCES

Alberti, R., et al. (1991)
        "Notes on the Internet Gopher Protocol", University of
        Minnesota, December 1991,
        <ftp://boombox.micro.umn.edu/pub/gopher/gopher_protocol>.
        See also
        <gopher://gopher.micro.umn.edu/00/Information About Gopher/About Gopher>

Berners-Lee, T. (1991)
        "Hypertext Transfer Protocol (HTTP)", CERN, December 1991,
        as updated from time to time,
        <ftp://info.cern.ch/pub/www/doc/http-spec.txt>

Crocker
        David H. Crocker, "Standard for ARPA Internet Text
        Messages", RFC822.

Davis, F., et al. (1990)
        "WAIS Interface Protocol: Prototype Functional
        Specification", Thinking Machines Corporation,
        April 23, 1990,
        <ftp://quake.think.com/pub/wais/doc/protspec.txt>

International Standards Organization (1991)
        Information and Documentation - Search and Retrieve
        Application Protocol Specification for Open Systems
        Interconnection, ISO-10163

Horton (1987)
        M. Horton, R. Adams, "Standard for interchange of USENET
        messages", Internet RFC 1036, December 1987.

Huitema, C. (1991)
        "Naming: strategies and techniques", Computer Networks and
        ISDN Systems 23 (1991) 107-110.

Kahle, Brewster (1991)
        "Document Identifiers, or International Standard Book
        Numbers for the Electronic Age",
        <ftp://quake.think.com/pub/wais/doc/doc-ids.txt>

Kantor, B., and Lapsley, P. (1986)
        "A proposed standard for the stream-based transmission of
        news", Internet RFC-977, February 1986,
        <ftp://ds.internic.net/rfc/rfc977.txt>

Kunze, 1994
        J. Kunze, Requirements for URLs, to be published.

Lynch, C., Coalition for Networked Information (1991)
        "Workshop on ID and Reference Structures for Networked
        Information", November 1991. See
        <wais://quake.think.com/wais-discussion-archives?lynch>

Mockapetris, P. (1987)
        "Domain names - concepts and facilities", RFC-1034, USC-ISI,
        November 1987, <ftp://ds.internic.net/rfc/rfc1034.txt>

Neuman, B. Clifford (1992)
        "Prospero: A Tool for Organizing Internet Resources",
        Electronic Networking: Research, Applications and Policy,
        Vol 1 No 2, Meckler, Westport CT USA. See also
        <ftp://prospero.isi.edu/pub/prospero/oir.ps>

Postel, J., and Reynolds, J. (1985)
        "File Transfer Protocol (FTP)", Internet RFC-959,
        October 1985, <ftp://ds.internic.net/rfc/rfc959.txt>

Sollins 1994
        K. Sollins and L. Masinter, Requirements for URNs, to be
        published.

Yeong, W. (1991a)
        "Towards Networked Information Retrieval", Technical report
        91-06-25-01, June 1991, Performance Systems International,
        Inc., <ftp://uu.psi.com/wp/nir.txt>

Yeong, W. (1991b)
        "Representing Public Archives in the Directory", Internet
        Draft, November 1991, now expired.

EDITOR'S ADDRESS

Tim Berners-Lee
Address:    World-Wide Web project
            CERN,
            1211 Geneva 23,
            Switzerland

Telephone:  +41 (22)767 3755
Fax:        +41 (22)767 7155
Email:      timbl@info.cern.ch