Uniform Resource Locators (URL)                          Tim Berners-Lee
draft-ietf-uri-url-02.{ps,txt}                                      CERN
Expires 1 July 1994                                           1 Jan 1994


                    Uniform Resource Locators (URL)

                A Unifying Syntax for the Expression of
             Names and Addresses of Objects on the Network


Status of this memo

This document is an Internet Draft. Internet Drafts are working
documents of the Internet Engineering Task Force (IETF), its Areas,
and its Working Groups. Note that other groups may also distribute
working documents as Internet Drafts.

Internet Drafts are working documents valid for a maximum of six
months. Internet Drafts may be updated, replaced, or obsoleted by
other documents at any time. It is not appropriate to use Internet
Drafts as reference material or to cite them other than as a
"working draft" or "work in progress".

Distribution of this document is unlimited. Please send comments
to the author at timbl@info.cern.ch or to the discussion list
ietf-url@merit.edu.

Abstract

Many protocols and systems for document search and retrieval are
currently in use, and many more protocols or refinements of
existing protocols are to be expected in a field whose expansion is
explosive.

These systems aim to achieve global search and readership of
documents across differing computing platforms, despite a
plethora of protocols and data formats. As protocols evolve,
gateways can allow global access to remain possible. As data
formats evolve, format conversion programs can preserve global
access. There is one area, however, in which it is impractical to
make conversions, and that is in the names and addresses used to
identify objects. This is because names and addresses of objects
are passed on in so many ways, from the backs of envelopes to
hypertext objects, and may have a long life.

A common feature of almost all the data models of past and proposed
systems is something which can be mapped onto a concept of "object"
and some kind of name, address, or identifier for that object. One
can therefore define a set of name spaces in which these objects
can be said to exist.

Practical systems need to access and mix objects which are part of
different existing and proposed systems.

This paper discusses the requirements on a universal syntax which
can be used to encapsulate a name in any registered name space.
This will allow names in different spaces to be treated in a common
way, even though names in different spaces have differing
characteristics, as do the objects to which they refer.

The universal syntax can be applied to objects available using
existing protocols, and may be extended with technology. This paper
makes a recommendation for a generic syntax, and for specific forms
for "Uniform Resource Locators" (URLs) of objects accessible using
existing Internet protocols.

The syntax has been in widespread use by World-Wide Web software
since 1990.

Terms

The objects on the network which are to be named and addressed
typically include objects which can be retrieved, and objects which
can be searched. There is a great variety of other objects which
may support other operations. We imply nothing about the contents
of objects in this document. Whereas human-readable documents are
currently the center of interest of the field, we envisage all
aspects discussed in this paper applying to generalized objects
when systems to handle them become available. The "object" is the
unit of reference and need not correspond to any unit of storage.
We refer to objects which can be searched as "indexes". We
emphasize that this is the abstract view of the client, and these
objects need not correspond to physical files on computers. We
refer to the person who does the retrieval or searching as the
user.

Within this document, we use the term "name" very generally for a
string of characters describing an object, whatever its
combination of the properties mentioned below. (The term usually has
a narrower meaning but we needed some term for the universal set.)
This uniform syntax applied to a generic name is known as a Uniform
Resource Identifier (URI). The term "address" is reserved for a
string which specifies a more or less physical location. The term
"locator" refers to a URL as here defined. URIs which have a
greater persistence than URLs are referred to as URNs.

Characteristics

This section discusses characteristics of various naming schemes,
requirements which some existing schemes meet, and requirements
for the URL scheme itself, as an introduction to and background
for the Recommendations section.

USES OF NAMES AND ADDRESSES

A name allows a user, with the help of a "client" program, to
retrieve or operate on objects via a "server" program. A name may
be passed, for example:

   In communication of any form between two people, to refer to a
   document, or part of a document;

   As part of the description of a link associated with a hypertext
   document;

   As part of the result of searching an index.

Some typical requirements on a name, which are met to a varying
degree by various schemes, are for example that the name is:

   Persistent     A given name will remain valid as long as it
                  is needed;

   Extensible     A given naming syntax will remain valid
                  through the introduction of new protocols and
                  directory technologies;

   Resolvable     A name will contain enough information to
                  allow the document or index to which it
                  refers to be accessed, perhaps via resolution
                  into an intermediate, more physical, name.

   Unique         Each object can only have one such name.
                  The fact that two such names are different
                  implies that the objects to which they refer
                  are different (in some way).

   Unambiguous    The fact that two names are identical
                  implies that the objects named are the same
                  (in some way).

The syntax discussed is the syntax of one name, be it a lasting
name or a physical address. When a directory server or hypertext
link contains a set of alternative names, then that is beyond the
scope of this syntax. Similarly, a syntax for describing a
compound object is outside the scope of this syntax. The specific
locator name spaces (defined under the umbrella of the general
syntax) each meet the requirements above to a greater or lesser
extent.

CURRENT PRACTICE

Current protocols use many different standards for names. For some
protocols, such as ISO-10163 Search and Retrieve protocol[16], the
names returned in a search are only valid during the session. For
others, such as FTP[9], they are lasting names which may be used
for object retrieval at a later time. Typically, however, they are
not long-lasting names which are independent of the location of the
object. Such names may be provided using directory servers such as
x.500. They will refer to the registration, however formal or
informal, of an object with a particular organisation or person.
Both hypertext and manual references rely on long-lasting names.
Current names are basically location specifiers (addresses). These
may be known as Uniform Resource Locators (URLs). They give the
necessary parts of an address for a reader to access an information
provider using the given protocol, and ask for the object required.
Examples of names used by various protocols include:

   File Transfer Protocol (Postel 1985)

      Host name or IP-address
      [TCP port]
      [user name, password]
      Filename

   W.A.I.S. (Kahle 1990)

      Host name or IP-address
      [TCP port]
      local document id

   Gopher (Alberti 1991)

      Host name or IP-address
      [TCP port]
      database name
      selector string

   HTTP (Berners-Lee 1991)

      Host name or IP-address
      [TCP port]
      local object id

   NNTP (Kantor 1986)

      NNTP group
         Group name
      NNTP article
         Host name
         unique message identifier

   Prospero links (Neuman 1992)

      Host name or IP address
      [UDP port]
      Host specific object name
      [version]
      [identifier]*

   x.500 distinguished name

      Country
      Organisation
      Organisational unit
      Person
      Local object identifier

Other systems with their own naming schemes include the BITNET
"LISTSERV" application, FTAM file retrieval, SQLnet(TM) remote
database search, proprietary distributed file systems, etc.
Conventional syntaxes for writing these addresses involve various
forms of punctuation to separate these parts. This sometimes, but
not always, allows the naming scheme to be deduced from the
punctuation. For example, a name of the form
xxx.yyy.zz.edu:/pub.aa.bb.cc often implies anonymous FTP access.
However, there is no well-defined algorithm for parsing an
arbitrary name, as there is no common syntax.

EXPANDABILITY

There will necessarily be a phase during which lasting names will
become more common, as the deployment of directory services
increases to the point where every user has direct or indirect
access to one. Even then, however, one can envisage more than one
competing directory system, and cases in which physical names are
still required. A directory service takes a lasting name and
reduces it to a physical address (or set of addresses) which,
though less useful for lasting reference, is the only way to
actually retrieve the object. An addressing syntax is required
which will be able to encompass existing physical address spaces,
and be extendible to any future protocols. This requires that it
contain an identifier for the protocol in use. The format of the
rest of the address will necessarily depend to a certain extent on
the protocol.

RELEVANCE

The life of a name is limited by any information contained within
it which may become prematurely invalid. It is therefore necessary
to limit the contents of a name to the information required for the
operations above. Other extraneous information about the object
(its size, data format, authorisation details, etc.) may in general
change with time and should not be part of the name. One might
expect such information to be part of the "header" of an object, and
for protocols to allow the header information to be retrieved
independently of the objects themselves. Any physical address may
be subject to change with time: hence we encourage the move to
lasting names and directory services.

UNIQUENESS

Clearly one requires unambiguous names in the sense that one name
should refer to only one logical object. This is the case with all
the addressing schemes in use, whether they are directory systems
or physical addresses. (The internet addresses all rely on the
domain name (Mockapetris 1987) of the host to achieve this).

However, given that names can be translated, many apparently
different names may lead to the same object. Any object may
therefore be referred to by many names. One needs to be able to
know whether two objects, retrieved through different paths, are
in fact the same object. It is suggested that each object have a
unique "official" name. This name could be stored in the object in
some representations, or stored in a database accessible to the
server, for example. Any references within that object should be
parsed in the context of the official name. In the presence of a
directory service, the official name will normally be the
registered name of the object. However, a name in any scheme will
do, so long as it is completely specified. On systems which do not
allow the name to be stored (such as anonymous FTP archive sites),
a possible ambiguity will always exist as to whether two similarly
named objects are in fact the same.

Note that Internet newsgroup names are unique world-wide, and news
articles carry a unique message id. In most other cases, however,
there is no guarantee that dereferencing a URL will work, or that
if it does the object it refers to will in fact be the object
intended. URLs such as FTP addresses are transient in that files
may be moved and even replaced by different files of the same name.
This disorganisation may be limited by good server management, but
a naming scheme which is independent also of internet host name is
obviously preferable.

READABILITY BY PEOPLE

This requirement has been put forward by several people (Clifford
Lynch, Douglas Engelbart among others), and disputed by others.
The author's view is that it will be a while before technology and
standardisation have reached the point at which names and addresses
will be hidden from human beings. As long as they must be written
on the backs of envelopes and "cut and pasted" between workstation
windows, there is a strong need for names to be:

   Short

   Composed of printable (preferably non-white) characters

   To a certain extent, understandable by a human being.

STRUCTURE OF NAMES AND ADDRESSES

A physical address is required in order for:

   The user's program to contact the server;

   The server to perform the operation (e.g. search an index,
   retrieve an object, or look up the name) and return a result;

   The user's program to locate an individual position or element
   within a returned object.

This suggests that a name be structured, such that the parts
necessary for these three operations be separate and only used by
those system elements which need those parts. This corresponds to
the basic principle of information hiding. In fact, four parts
are necessary, including the indicator of the naming scheme to be
used:

   The naming scheme: a registered identifier for the protocol.

   The name of a suitable server. The format of this part must be
   well defined. It will depend on the lower-layer protocols in
   use. Systems which use widely distributed information, such as
   x.500 and NNTP, do not need this part as each client generally
   contacts its nearest server (or a particular server).

   Information to be passed to the server. This may be private to
   the server, as all names may be generated and used by the same
   server. This part of the name should be opaque to the client.

   Information to be used by the application once the object has
   been retrieved. This part is private to the application (or,
   more strictly, the data format) and so cannot be defined here.

Both lasting names and physical addresses often share a
hierarchical structure. This often follows from the organisation of
the system. From the naming point of view, it has the advantage
that a reference in one object to another object need not include
that part of the structure which is common to both names.
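
As an illustration only, the four parts listed above can be separated
mechanically. The sketch below uses Python's standard urllib.parse
module, which is a later library and not part of this draft. The
example name, the helper four_parts, and the use of "#" to introduce
the application-private part are assumptions made for the example.

```python
from urllib.parse import urljoin, urlsplit


def four_parts(name):
    """Split a name into the four parts described above."""
    parts = urlsplit(name)
    return {
        "scheme": parts.scheme,      # registered identifier for the protocol
        "server": parts.netloc,      # used by the client to contact the server
        "path": parts.path,          # passed to the server, opaque to the client
        "fragment": parts.fragment,  # used by the application after retrieval
    }


# Hypothetical example name, chosen only for illustration.
BASE = "http://info.cern.ch/hypertext/WWW/TheProject.html#intro"
parts = four_parts(BASE)

# A reference inside that object can omit the structure common to both names.
sibling = urljoin(BASE, "Status.html")
```

The last line shows the benefit of a hierarchical structure: the
short reference "Status.html" is completed from the name of the
object in which it appears.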

CHOICES FOR A UNIVERSAL SYNTAX

The requirements above leave little room for choice save for the
order and punctuation of the elements of an address. It is only
reasonable for the order of writing of the parts to be consistently
from left to right (or right to left) with increasing specificity.
Punctuation schemes fall into two categories (Huitema 1991): tagged
schemes, in which fields are given names, and schemes which use
special characters and field order. The latter tend to be more
compact.

   protocol: aftp  host: xxx.yyy.edu  path: /pub/doc/README

   PR=aftp; H=xx.yy.edu; PA=/pub/doc/README;

   PR:aftp/xx.yy.edu/pub/doc/README

   /aftp/xx.yy.edu/pub/doc/README

   Fig 1. Some alternative tagged and untagged representations

The choice of special symbols for punctuation tends to be a matter
of taste. It is easier to read addresses whose symbols correspond
to those of one's favourite operating system. A variety of symbols
is needed so that when a name is abbreviated it is possible to tell
which parts have been omitted.

The recommendation below uses special characters in order to
achieve a compact name, and uses where possible punctuation symbols
established in the internet or unix community.

The choice of escape character for introducing representations of
non-allowed characters also tends to be a matter of taste. An ANSI
standard exists in the C language, using the back-slash character
"\". The use of this character on unix command lines, however, can
be a problem as it is interpreted by many shell programs, and would
itself have to be escaped.

There is a conflict between the need to be able to represent many
characters including spaces within a URL directly, and the need to
be able to use a URL in environments which have limited character
sets or in which certain characters are prone to corruption. This
conflict has been resolved by use of a hexadecimal escaping method
which may be applied to any characters forbidden in a given
context. When URLs are moved between contexts, the set of
characters escaped may be enlarged or reduced unambiguously.

The use of multiple white space characters is discouraged in URLs
to be printed or sent by electronic mail. This is because of the
frequent introduction of extraneous white space when lines are
wrapped by systems such as mail, or by the sheer necessity of narrow
column width, and because of the inter-conversion of various forms
of white space which occurs during character code conversion and
the transfer of text between applications.

Recommendations

This section describes the syntax for "Uniform Resource Locators"
(URLs): that is, basically physical addresses of objects which are
retrievable using protocols already deployed on the net. The
generic syntax provides a framework for new schemes for names to be
resolved using as yet undefined protocols.

The syntax is described in two parts. Firstly, we give the syntax
rules of a completely specified name; secondly, we give the rules
under which parts of the name may be omitted in a well-defined
context.

FULL FORM

A complete URL consists of a naming scheme specifier followed by a
string whose format is a function of the naming scheme. For
locators of information on the internet, a common syntax is used
for the IP address part. A BNF description of the URL syntax is
given in a later section. The components are as follows.
Fragment identifiers and partial URLs are not involved in the basic
URL definition.

SCHEME

Within the URL of an object, the first element is the name of the
scheme, separated from the rest of the URL by a colon. The rest
of the URL follows the colon in a format depending on the scheme.
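
A minimal sketch of this first step, assuming only the rule stated
above (the scheme name ends at the first colon). The function name
split_scheme and the sample names are illustrative and not part of
the draft.

```python
def split_scheme(url):
    """Return (scheme, rest), splitting a full-form URL at the first colon."""
    scheme, colon, rest = url.partition(":")
    if not colon or not scheme:
        # A full-form URL must begin with a scheme name and a colon.
        raise ValueError("no scheme specifier found in %r" % url)
    return scheme, rest


# Later colons (for example before a port number) stay in the rest.
news = split_scheme("news:comp.infosystems.www")
ftp = split_scheme("ftp://xxx.yyy.edu:2121/pub/doc/README")
```

The remainder is left unparsed here because its format depends on
the scheme.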

Internet protocol parts

Those schemes which refer to internet protocols have a common
syntax for the rest of the object name. This starts with a double
slash "//" to indicate its presence, and continues until the
following slash "/". Within that section are:

   An optional user name,
                  if this must be quoted to the server,
                  followed by a commercial at sign "@". (Use
                  of this field is discouraged. Provision for
                  encoding a password after the user name,
                  delimited by a colon, could be made but
                  obviously is only useful when the password is
                  public, in which case it should not be
                  necessary, so that is also discouraged.)

   The internet domain name
                  of the host in RFC1037 format (or,
                  optionally and less advisably, the IP address
                  as four decimal numbers separated by periods)

   The port number,
                  if it is not the default number for the
                  protocol, is given in decimal notation after
                  a colon.

   Path           The rest of the locator is known as the
                  "path". It may define details of how the
                  client should communicate with the server,
                  including information to be passed
                  transparently to the server without any
                  processing by the client.

The path is interpreted in a manner dependent on the protocol being
used. However, when it contains slashes, these must imply a
hierarchical structure.
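
The layout described above can be sketched as a small parser. This
is an illustration under the assumptions stated in the text (double
slash, optional "user[:password]@", host, optional ":port", then
the path); the function name and sample inputs are hypothetical,
and the draft's BNF remains the authority on the exact syntax.

```python
def parse_internet_part(rest):
    """Parse '//[user[:password]@]host[:port][/path]' into its components."""
    if not rest.startswith("//"):
        raise ValueError("internet protocol part must start with '//'")
    # Everything up to the following slash names the server.
    authority, slash, path = rest[2:].partition("/")
    user = password = port = None
    if "@" in authority:
        userinfo, _, authority = authority.rpartition("@")
        user, colon, secret = userinfo.partition(":")
        password = secret if colon else None
    host, colon, port_text = authority.partition(":")
    if colon:
        port = int(port_text)  # the port is given in decimal notation
    return {"user": user, "password": password, "host": host,
            "port": port, "path": slash + path}


plain = parse_internet_part("//info.cern.ch:8080/hypertext/WWW")
with_user = parse_internet_part("//guest@xxx.yyy.edu/pub/doc/README")
```

A port of None means the default port of the protocol applies.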

ENCODING PROHIBITED CHARACTERS

When a system uses a local addressing scheme, it is useful to
provide a mapping from local addresses into URLs so that objects
within the addressing scheme may be referred to globally, and
possibly accessed through gateway servers.

Any mapping scheme may be defined provided it is unambiguous,
reversible, and provides valid URLs. It is recommended that where
hierarchical aspects of the local naming scheme exist, they be
mapped onto the hierarchical URL path syntax in order to allow the
partial form to be used.

The following encoding method shall be used for mapping WAIS, FTP,
Prospero and Gopher addresses onto URLs. Where the local naming
scheme uses ASCII characters which are not allowed in the URL,
these may be represented in the URL by a percent sign "%" followed
by two hexadecimal digits (0-9, A-F) giving the ISO Latin 1 code
for that character. Character codes other than those allowed by
the syntax shall not be used in a URL.

The same encoding method may be used for encoding characters whose
use, although technically allowed in a URL, would be unwise due to
problems of corruption by imperfect gateways or misrepresentation
due to the use of variant character sets, or which would simply be
awkward in a given environment. As a % sign always indicates an
encoded character, a URL may be made safer simply by encoding any
characters considered unsafe, while leaving already encoded
characters still encoded.
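
The encoding method can be sketched as follows. The set of
characters left unencoded (ASSUMED_SAFE) is an assumption made for
this example only; the allowed characters are those defined by the
URL syntax itself. The function names are likewise illustrative.

```python
import string

# Assumption for illustration: the draft's BNF defines the real allowed set.
ASSUMED_SAFE = set(string.ascii_letters + string.digits + "$-_@.&+/")


def encode_local(name, safe=ASSUMED_SAFE):
    """Map a local name onto URL characters using %XX (ISO Latin 1 codes)."""
    name.encode("latin-1")  # raises UnicodeEncodeError outside ISO Latin 1
    return "".join(ch if ch in safe else "%%%02X" % ord(ch) for ch in name)


def make_safer(url, unsafe):
    """Encode extra characters of an existing URL; '%' escapes are untouched."""
    return "".join("%%%02X" % ord(ch) if ch in unsafe and ch != "%" else ch
                   for ch in url)


def decode(url):
    """Reverse the mapping: replace each %XX with the character it encodes."""
    out, i = [], 0
    while i < len(url):
        if url[i] == "%":
            out.append(chr(int(url[i + 1:i + 3], 16)))
            i += 3
        else:
            out.append(url[i])
            i += 1
    return "".join(out)


encoded = encode_local("my file%.txt")     # space and "%" are both escaped
safer = make_safer("a%20b@c", {"@"})       # the existing %20 is left alone
```

Because a literal "%" in a local name is itself encoded, every "%"
in the result introduces an escape, which is what makes decoding
and later re-encoding unambiguous.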

(Note: If a new naming scheme is introduced which encodes binary
data as opposed to text, then a more compact encoding such as pure
hex or base 64 would be more appropriate.)

The same considerations apply to mapping local fragment identifiers
onto the fragmentid part of a URL.

Specific Schemes

The mapping for some existing standard and experimental protocols
is outlined in the BNF syntax definition. Notes on particular
protocols follow.

HTTP

The HTTP protocol specifies that the path is handled transparently
by those who handle URLs, except for the servers which de-reference
them. The path is passed by the client to the server with any
request, but is not otherwise understood by the client. The
fragmentid part is not sent with the request. The search part, if
present, is sent. Spaces in URLs should be escaped for transmission
in HTTP.
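
A minimal sketch of what a client would send, under two assumptions
that are not stated in this section: that the fragmentid is
introduced by "#" and the search part by "?". The function name and
the sample path are hypothetical.

```python
def http_request_target(url_after_host):
    """Return the part of a URL a client sends to the HTTP server."""
    # The fragmentid is kept by the client and never transmitted.
    to_send, _, _fragment = url_after_host.partition("#")
    # The path and any search part are sent as they are, with spaces escaped.
    return to_send.replace(" ", "%20")


target = http_request_target("/docs/my file.html?key word#sec2")
```

The client does not interpret the path beyond this; only the server
gives it meaning.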

FTP

The ftp: prefix indicates a file which is to be picked up from the
file system of the given host. The FTP protocol is used. The port
number, if given, gives the port of the FTP server if not the FTP
default. (A client may in practice use local file access to
retrieve objects which are available through more efficient means
such as local file open or NFS mounting, where this is available
and equivalent).

The syntax allows for the inclusion of a user name and even a
password for those systems which do not use the anonymous FTP
convention. The default, however, if no user or password is
supplied, will be to use that convention, viz. that the user name
is "anonymous" and the password the user's mail address.
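
The default can be sketched as below. The function name and the
mail addresses are placeholders invented for the example; only the
rule itself (user "anonymous", password the user's mail address)
comes from the text.

```python
def ftp_credentials(user=None, password=None,
                    mail_address="reader@example.org"):
    """Return the (user, password) pair a client would present."""
    if user is None:
        # Anonymous FTP convention: the password is the user's mail address.
        return "anonymous", mail_address
    return user, password


default_login = ftp_credentials(mail_address="reader@example.org")
explicit_login = ftp_credentials("guest", "secret")
```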

The adoption of a unix-style syntax involves the conversion into
non-unix local forms by either the client or server. Some non-unix
servers do this, but clients wishing to access sites which do not
have unix-style naming will need certain algorithms to enable
other file systems to be identified and treated. Client software
may also have to be flexible in terms of the sequence of FTP
commands used with different varieties of server. In view of a
tendency for file systems to look increasingly similar, it was felt
that the URL convention should not be weighed down by extra
mechanisms for identifying these cases.

The data format of a file can only, in the general FTP case, be
deduced from the name, normally the suffix of the name. This is not
standardized. An alternative is for it to be transferred in
information outside the URL. The transfer mode (binary or text)
must in turn be deduced from the data format. It is recommended
that conventions for suffixes of public archives be established,
but this is outside the scope of this paper.

NEWS

The news locators refer to either news group names or article
message identifiers which must conform to the rules of RFC 850. A
message identifier may be distinguished from a news group name by
the presence of the commercial at "@" character. These rules imply
that within an article, a reference to a news group or to another
article will be a valid URL (in the partial form).

A news URL may be dereferenced using NNTP or using any other
protocol for the conveyance of usenet news articles.
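
The distinction between the two kinds of news locator can be
sketched as follows. The function name and the sample message
identifier are hypothetical; the test on "@" is the rule given
above.

```python
def classify_news(url):
    """Return ('article' or 'group', identifier) for a news: URL."""
    scheme, colon, rest = url.partition(":")
    if not colon or scheme.lower() != "news":
        raise ValueError("not a news URL: %r" % url)
    # A message identifier contains "@"; a news group name does not.
    kind = "article" if "@" in rest else "group"
    return kind, rest


group = classify_news("news:comp.infosystems.www")
article = classify_news("news:1994Jan1.120000.1234@example.org")
```

Note that neither form carries a news host name.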

Note 1:

Among URLs the news: URLs are anomalous in that they are
location-independent. They are unsuitable as URN candidates because
the NNTP architecture relies on the expiry of articles and
therefore a small number of articles being available at any time.
When a news: URL is quoted, the assumption is that the reader will
fetch the article or group from his or her local news host. News
host names are NOT part of news URLs.

Note 2:

An outstanding problem is that the message identifier is
insufficient to allow the retrieval of an expired article, as no
algorithm exists for deriving an archive site and file name. The
addition of the date and news group set to the article's URL would
allow this if a directory existed of archive sites by news group.
This is a suggested subject of study in conjunction with the NNTP
WG. A possible further extension may be to allow the naming of
subject threads as addressable objects.

NNTP

This is an alternative form of reference for news articles,
specifically to be used with NNTP servers, and particularly those
incomplete server implementations which do not allow retrieval by
message identifier.

The news server name, newsgroup name, and index number of an
article within the newsgroup on that particular server are given.

Note 1:

This form of URL is not globally accessible, as typically NNTP
servers only allow access from local clients. This form of URL
should not be quoted outside this local area. It should not be
used within news articles for wider circulation than the one
server.
| 685 |
|
| 686 |
WAIS |
| 687 |
|
| 688 |
The current WAIS implementation public domain requires that a |
| 689 |
client know the "type" of a object prior to retrieval. This value |
| 690 |
is returned along with the internal object identifier in the search |
| 691 |
response. It has been encoded into the path part of the URL in |
| 697 |
order to make the URL sufficient for the retrieval of the object. |
| 698 |
Within the WAIS world, names do not, of course, need to be |
| 699 |
prefixed by "wais:" (by the partial form rules). |
| 700 |
|
| 701 |
PROSPERO |
| 702 |
|
| 703 |
The Prospero (Neuman, 1992) directory service is used to resolve |
| 704 |
the URL yielding an access method for the object (which can then |
| 705 |
itself be represented as a URL if translated). The host part |
| 706 |
contains a host name or internet address. The port part is |
| 707 |
optional. |
| 708 |
|
| 709 |
The path part contains a host specific object name and an optional |
| 710 |
version number. If present, the version number is separated from |
| 711 |
the host specific object name by the characters "%00" (percent |
| 712 |
zero zero), this being an escaped string terminator (null). |
| 713 |
External Prospero links are represented as URLs of the underlying |
| 714 |
access method and are not represented as Prospero URLs. |
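As an illustration only, the separation of the version number from
the host specific object name might be sketched as follows in
Python. The function name split_prospero_path and the sample paths
are assumptions of this sketch; they are not part of Prospero or of
this specification.

```python
def split_prospero_path(path):
    """Split a still-escaped Prospero path into (hsoname, version).

    The version is None when no "%00" separator is present. Whatever
    follows the separator is returned unparsed.
    """
    hsoname, separator, version = path.partition("%00")
    return hsoname, (version if separator else None)


# Hypothetical paths, for illustration only.
print(split_prospero_path("pub/papers/report%003"))  # ('pub/papers/report', '3')
print(split_prospero_path("pub/papers/report"))      # ('pub/papers/report', None)
```

The split is made on the escaped form, before any unescaping, so
that the null terminator never has to appear in a decoded string.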
| 715 |
|
| 716 |
GOPHER |
| 717 |
|
| 718 |
The first character of the URL path part (after the initial single |
| 719 |
slash) is a single-character "type" field which is that used by the |
| 720 |
Gopher protocol. The rest of the path is the "selector string", |
| 721 |
with disallowed characters encoded. Note that some selector strings |
| 722 |
begin with a copy of the gopher type character, in which case that |
| 723 |
character will occur twice consecutively in the URL. If the type |
| 724 |
character and selector are omitted, the type defaults to "1". |
| 725 |
Gopher links which refer to non-Gopher protocols are represented |
| 726 |
directly as URLs of the underlying access method and are not |
| 727 |
represented as Gopher URLs. |
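The following Python sketch shows one way the type character and
selector string might be recovered from the path part. The function
name split_gopher_path is an assumption of this sketch, and the
sample path is illustrative only.

```python
from urllib.parse import unquote


def split_gopher_path(path):
    """Split the path part of a gopher URL into (type, selector)."""
    # Drop the initial single slash, if present.
    if path.startswith("/"):
        path = path[1:]
    # With type character and selector omitted, the type defaults to "1".
    if not path:
        return "1", ""
    gtype, encoded_selector = path[0], path[1:]
    # Decode the escaped characters of the selector string.
    return gtype, unquote(encoded_selector)


# The selector here begins with a copy of the type character, so "0"
# occurs twice consecutively in the path.
print(split_gopher_path("/00/Information%20About%20Gopher"))
# ('0', '0/Information About Gopher')
print(split_gopher_path(""))
# ('1', '')
```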
| 728 |
|
| 729 |
MAILTO |
| 730 |
|
| 731 |
This allows a URL to specify an RFC822 addr-spec mail address. |
| 732 |
Note that use of "%", for example as used in forming a gatewayed |
| 733 |
mail address, requires conversion to %25 in a URL. |
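A minimal Python sketch of this conversion follows. The function
name mailto_url and the gatewayed address are hypothetical, and the
choice to leave "@" unescaped is an assumption of the sketch.

```python
from urllib.parse import quote


def mailto_url(addr_spec):
    """Build a mailto: URL, escaping "%" in the address as "%25"."""
    # safe="@" leaves the at sign literal; quote() always escapes "%".
    return "mailto:" + quote(addr_spec, safe="@")


# Hypothetical gatewayed address, for illustration only.
print(mailto_url("user%relay.example@gateway.example"))
# mailto:user%25relay.example@gateway.example
```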
| 734 |
|
| 735 |
The semantics may be considered to be that the object referred to |
| 736 |
by the mailto: URL is the set of messages sent to or from that |
| 737 |
address. There is no algorithm to retrieve this set, but the SMTP |
| 738 |
protocol allows messages to be added to it, and any given user may |
| 739 |
be aware of a subset of its members. |
| 740 |
|
| 741 |
TELNET, RLOGIN, TN3270 |
| 742 |
|
| 743 |
The use of URLs to represent interactive sessions is a convenient |
| 744 |
extension to their uses for objects. This allows access to |
| 745 |
information systems which only provide an interactive service, and |
| 746 |
no information server. As information within the service cannot be |
| 747 |
addressed individually or, in general, automatically retrieved, |
| 748 |
this is a less desirable, though currently common, solution. |
| 749 |
|
| 755 |
X500 |
| 756 |
|
| 757 |
The mapping of x500 names onto URLs is not defined here. A decision |
| 758 |
is required as to whether "distinguished names" or "user friendly |
| 759 |
names" (ufn), or both, should be allowed. If any punctuation |
| 760 |
conversions are needed from the adopted x500 representation (such |
| 761 |
as the use of slashes between parts of a ufn) they must be defined. |
| 762 |
This is a subject for study. |
| 763 |
|
| 764 |
WHOIS |
| 765 |
|
| 766 |
This prefix describes access using the "whois++" scheme, now in the |
| 767 |
process of definition. The host name part is the same as for other |
| 768 |
IP based schemes. The path part can be either a whois handle for a |
| 769 |
whois object, or it can be a valid whois query string. This is a |
| 770 |
subject for further study. |
| 771 |
|
| 772 |
NETWORK MANAGEMENT DATABASE |
| 773 |
|
| 774 |
This is a subject for study. |
| 775 |
|
| 776 |
REGISTRATION OF NAMING SCHEMES |
| 777 |
|
| 778 |
A new naming scheme may be introduced by defining a mapping onto a |
| 779 |
conforming URL syntax, using a new scheme identifier. Experimental |
| 780 |
scheme identifiers may be used by mutual agreement between parties, |
| 781 |
and must start with the characters "x-". The scheme name "urn:" is |
| 782 |
reserved for the work in progress on a scheme for more persistent |
| 783 |
names. Therefore URNs (Names) and URLs (Locators) can be |
| 784 |
distinguishable. An object which is either a URL or a URN is known |
| 785 |
as a URI (Identifier). |
| 786 |
|
| 787 |
It is proposed that the Internet Assigned Numbers Authority (IANA) |
| 788 |
perform the function of registration of new schemes. Any submission |
| 789 |
of a new URI scheme must include a definition of an algorithm for |
| 790 |
the retrieval of any object within that scheme. The algorithm must |
| 791 |
take the URI and produce either a set of URL(s) which will lead to |
| 792 |
the desired object, or the object itself, in a well-defined or |
| 793 |
determinable format. |
| 794 |
|
| 795 |
It is recommended that those proposing a new scheme demonstrate its |
| 796 |
utility and operability by the provision of a gateway which will |
| 797 |
provide images of objects in the new scheme for clients using an |
| 798 |
existing protocol. If the new scheme is not a locator scheme, then |
| 799 |
the properties of names in the new space should be clearly defined. |
| 800 |
It is likewise recommended that, where a protocol allows for |
| 801 |
retrieval by URI, the client software have provision for being |
| 802 |
configured to use specific gateway locators for indirect access |
| 803 |
through new naming schemes. |
| 804 |
|
| 805 |
BNF syntax |
| 806 |
|
| 807 |
This is a BNF-like description of the Uniform Resource Locator |
| 813 |
syntax. A vertical line "|" indicates alternatives, and |
| 814 |
[brackets] indicate optional parts. Spaces are represented by |
| 815 |
the word "space", and the vertical line character by "vline". |
| 816 |
Single letters stand for single letters. All words of more than one |
| 817 |
letter below are entities described somewhere in this description. |
| 818 |
|
| 819 |
The current IETF URI working group preference is for the |
| 820 |
prefixedurl production. (Nov 1993. July 93: url). |
| 821 |
|
| 822 |
The "generic" production gives a higher level parsing of the same |
| 823 |
URLs as the other productions. The "national" and "punctuation" |
| 824 |
characters do not appear in any productions and therefore may not |
| 825 |
appear in URLs. |
| 826 |
|
| 827 |
The "afsaddress" is left in as a historical note, but is not a url |
| 828 |
production. |
| 829 |
|
| 830 |
prefixedurl u r l : url |
| 831 |
|
| 832 |
fragmentaddress uri [ # fragmentid ] |
| 833 |
|
| 834 |
uri url | generic |
| 835 |
|
| 836 |
url generic | httpaddress | ftpaddress | |
| 837 |
newsaddress | nntpaddress | prosperoaddress | |
| 838 |
telnetaddress | gopheraddress | waisaddress |
| 839 |
| mailtoaddress |
| 840 |
|
| 841 |
generic scheme : path [ ? search ] |
| 842 |
|
| 843 |
scheme ialpha |
| 844 |
|
| 845 |
httpaddress h t t p : / / hostport [ / path ] [ ? |
| 846 |
search ] |
| 847 |
|
| 848 |
ftpaddress f t p : / / login / path |
| 849 |
|
| 850 |
afsaddress a f s : / / cellname / path |
| 851 |
|
| 852 |
newsaddress n e w s : groupart |
| 853 |
|
| 854 |
nntpaddress n n t p : group / digits |
| 855 |
|
| 856 |
mailtoaddress m a i l t o : xalphas @ hostname |
| 857 |
|
| 858 |
waisaddress waisindex | waisdoc |
| 859 |
|
| 860 |
waisindex w a i s : / / hostport / database [ ? search |
| 861 |
] |
| 862 |
|
| 863 |
waisdoc w a i s : / / hostport / database / wtype / |
| 864 |
path |
| 865 |
|
| 871 |
groupart * | group | article |
| 872 |
|
| 873 |
group ialpha [ . group ] |
| 874 |
|
| 875 |
article xalphas @ host |
| 876 |
|
| 877 |
database xalphas |
| 878 |
|
| 879 |
wtype xalphas |
| 880 |
|
| 881 |
prosperoaddress prosperolink |
| 882 |
|
| 883 |
prosperolink p r o s p e r o : / / hostport / hsoname [ % |
| 884 |
0 0 version [ attributes ] ] |
| 885 |
|
| 886 |
hsoname path |
| 887 |
|
| 888 |
version digits |
| 889 |
|
| 890 |
attributes attribute [ attributes ] |
| 891 |
|
| 892 |
attribute alphanums |
| 893 |
|
| 894 |
telnetaddress t e l n e t : / / login |
| 895 |
|
| 896 |
gopheraddress g o p h e r : / / hostport [ / gtype [ |
| 897 |
selector ] ] [ ? search ] |
| 898 |
|
| 899 |
login [ user [ : password ] @ ] hostport |
| 900 |
|
| 901 |
hostport host [ : port ] |
| 902 |
|
| 903 |
host hostname | hostnumber |
| 904 |
|
| 905 |
cellname hostname |
| 906 |
|
| 907 |
hostname ialpha [ . hostname ] |
| 908 |
|
| 909 |
hostnumber digits . digits . digits . digits |
| 910 |
|
| 911 |
port digits |
| 912 |
|
| 913 |
selector path |
| 914 |
|
| 915 |
path void | xpalphas [ / path ] |
| 916 |
|
| 917 |
search xalphas [ + search ] |
| 918 |
|
| 919 |
user xalphas |
| 920 |
|
| 921 |
password xalphas |
| 922 |
|
| 923 |
fragmentid xalphas |
| 924 |
|
| 929 |
gtype xalpha |
| 930 |
|
| 931 |
xalpha alpha | digit | safe | extra | escape |
| 932 |
|
| 933 |
xalphas xalpha [ xalphas ] |
| 934 |
|
| 935 |
xpalpha xalpha | + |
| 936 |
|
| 937 |
xpalphas xpalpha [ xpalphas ] |
| 938 |
|
| 939 |
ialpha alpha [ xalphas ] |
| 940 |
|
| 941 |
alpha a | b | c | d | e | f | g | h | i | j | k | |
| 942 |
l | m | n | o | p | q | r | s | t | u | v | |
| 943 |
w | x | y | z | A | B | C | D | E | F | G | |
| 944 |
H | I | J | K | L | M | N | O | P | Q | R | |
| 945 |
S | T | U | V | W | X | Y | Z |
| 946 |
|
| 947 |
digit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| 948 |
|
| 949 |
safe $ | - | _ | @ | . | & | + |
| 950 |
|
| 951 |
extra ! | * | " | ' | ( | ) | : | ; | , | space |
| 952 |
|
| 953 |
escape % hex hex |
| 954 |
|
| 955 |
hex digit | a | b | c | d | e | f | A | B | C | |
| 956 |
D | E | F |
| 957 |
|
| 958 |
national { | } | vline | [ | ] | \ | ^ | ~ |
| 959 |
|
| 960 |
punctuation < | > |
| 961 |
|
| 962 |
digits digit [ digits ] |
| 963 |
|
| 964 |
alphanum alpha | digit |
| 965 |
|
| 966 |
alphanums alphanum [ alphanums ] |
| 967 |
|
| 968 |
void |
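As an informal check of the character-level productions, the
"xalphas" and "hostnumber" productions above might be transcribed
into Python regular expressions as sketched below. The names
is_xalphas and is_hostnumber are assumptions of this sketch, and
the sketch is not a complete parser for the url production.

```python
import re
import string

SAFE = "$-_@.&+"
EXTRA = "!*\"'():;, "  # the final character is "space"
XALPHA_CHARS = string.ascii_letters + string.digits + SAFE + EXTRA

# xalpha: one permitted character, or an escape (% hex hex).
XALPHA = "(?:[" + re.escape(XALPHA_CHARS) + "]|%[0-9A-Fa-f]{2})"
XALPHAS_RE = re.compile(XALPHA + "+")

# hostnumber: digits . digits . digits . digits
HOSTNUMBER_RE = re.compile(r"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+")


def is_xalphas(text):
    """True if text matches the xalphas production in full."""
    return XALPHAS_RE.fullmatch(text) is not None


def is_hostnumber(text):
    """True if text matches the hostnumber production in full."""
    return HOSTNUMBER_RE.fullmatch(text) is not None


print(is_xalphas("a%20b"))         # True: "%20" is a well-formed escape
print(is_xalphas("a{b}"))          # False: "{" and "}" are "national"
print(is_hostnumber("192.0.2.1"))  # True
```

Because the "national" and "punctuation" characters are absent from
the character class, strings containing them are rejected without
any special handling.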
| 969 |
|
| 970 |
Security considerations |
| 971 |
|
| 972 |
The URL scheme does not in itself pose a security threat. Users |
| 973 |
should beware that there is no general guarantee that a URL which |
| 974 |
at one time points to a given object continues to do so, and does |
| 975 |
not even at some later time point to a different object due to the |
| 976 |
movement of objects on servers. |
| 977 |
|
| 978 |
The use of URLs containing passwords is clearly unwise. |
| 979 |
|
| 980 |
Conclusion |
| 981 |
|
| 987 |
A need has been demonstrated, and a number of requirements have |
| 988 |
been stated for uniform resource locators (URLs). A scheme has been |
| 989 |
proposed which builds on existing conventions to define a syntax |
| 990 |
for URLs. This scheme has been in serious use by the World-Wide Web |
| 991 |
(W3) initiative since 1991. Adoption of the scheme in |
| 992 |
correspondence, standards and software will ease the use of |
| 993 |
references to on-line information in a flexible way as the coming |
| 994 |
information age arrives. |
| 995 |
|
| 996 |
Acknowledgements |
| 997 |
|
| 998 |
This paper builds on the basic W3 design and much discussion of |
| 999 |
these issues by many people on the network. The discussion was |
| 1000 |
particularly stimulated by articles by Clifford Lynch (1991), |
| 1001 |
Brewster Kahle (1991) and Wengyik Yeong (1991b). Contributions from |
| 1002 |
John Curran (NEARnet), Clifford Neuman (ISI), Ed Vielmetti (MSEN) |
| 1003 |
and later the IETF URL BOF and URI working group have been |
| 1004 |
incorporated into this issue of this paper. |
| 1005 |
|
| 1006 |
The draft url4 (Internet Draft 00) was generated from url3 |
| 1007 |
following discussion and overall approval of the URL working group |
| 1008 |
on 29 March 1993. The paper url3 had been generated from udi2 in |
| 1009 |
the light of discussion at the UDI BOF meeting at the Boston IETF |
| 1010 |
in July 1992. Draft url4 was Internet Draft 00. Draft url5 |
| 1011 |
incorporated changes suggested by Clifford Neuman, and draft url6 |
| 1012 |
(ID 01) incorporated character group changes and a few other fixes |
| 1013 |
defined by the IETF URI WG in submitting it as a proposed standard. |
| 1014 |
URL7 (Internet Draft 02) incorporated changes introduced at the |
| 1015 |
Amsterdam IETF and refined in net discussion. |
| 1016 |
|
| 1017 |
APPENDICES |
| 1018 |
|
| 1019 |
The following are not formally part of this document. |
| 1020 |
|
| 1021 |
Fragment-id |
| 1022 |
|
| 1023 |
This represents a part of, fragment of, or a sub-function within, |
| 1024 |
an object. Its syntax and semantics are defined by the |
| 1025 |
application responsible for the object, or the specification of the |
| 1026 |
content type of the object. The only definition here is of the |
| 1027 |
allowed characters by which it may be represented in a URL. |
| 1028 |
|
| 1029 |
The fragment-id follows the URL of the whole object from which it |
| 1030 |
is separated by a hash sign (#). If the fragment-id is void, the |
| 1031 |
hash sign may be omitted: A void fragment-id with or without the |
| 1032 |
hash sign means that the URL refers to the whole object. |
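The separation of the fragment-id from the URL of the whole object
might be sketched in Python as follows. The function name
split_fragment and the sample references are hypothetical.

```python
def split_fragment(reference):
    """Split a reference into (url, fragment_id) at the hash sign."""
    url, _, fragment_id = reference.partition("#")
    return url, fragment_id


# A void fragment-id gives the same result with or without the hash sign.
print(split_fragment("http://www.example.org/doc#part2"))
# ('http://www.example.org/doc', 'part2')
print(split_fragment("http://www.example.org/doc#"))
# ('http://www.example.org/doc', '')
print(split_fragment("http://www.example.org/doc"))
# ('http://www.example.org/doc', '')
```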
| 1033 |
|
| 1034 |
While this hook is allowed for identification of fragments, the |
| 1035 |
question of addressing of parts of objects, or of the grouping of |
| 1036 |
objects and relationship between contained and containing objects, |
| 1037 |
is not addressed by this document. |
| 1038 |
|
| 1039 |
This document does not address the question of objects which are |
| 1045 |
different versions of a "living" object, nor of expressing the |
| 1046 |
relationships between different versions and the living object. |
| 1047 |
|
| 1048 |
Partial form |
| 1049 |
|
| 1050 |
In a certain limited set of cases, generally within a certain |
| 1051 |
application, it may be useful to pass only a section of the URL. |
| 1052 |
Within an object whose URL is well defined, the URL of another |
| 1053 |
object may be given in abbreviated form, where parts of the two |
| 1054 |
URLs are the same. This allows objects within a group to refer to |
| 1055 |
each other without requiring the space for a complete reference, |
| 1056 |
and it incidentally allows the group of objects to be moved |
| 1057 |
without changing any references. This is not discussed in detail |
| 1058 |
here; it is only mentioned so that the characters required by the |
| 1059 |
technique be reserved for that purpose. It must be emphasised that |
| 1060 |
when a reference is passed in anything other than a well controlled |
| 1061 |
context, the full form must always be used. |
| 1062 |
|
| 1063 |
The partial form relies on a property of the URL syntax that |
| 1064 |
certain characters ("/") and certain path elements ("..", ".") have |
| 1065 |
a significance reserved for representing a hierarchical space, and |
| 1066 |
must be recognised as such by both clients and servers. |
| 1067 |
|
| 1068 |
A partial form can be distinguished from a full form in that a full |
| 1069 |
form must have a colon and that colon must occur before any slash |
| 1070 |
characters. |
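This test is simple enough to state as code. The following Python
sketch is illustrative only; the function name is_full_form and the
sample references are assumptions.

```python
def is_full_form(reference):
    """True if a colon is present and occurs before any slash."""
    colon = reference.find(":")
    slash = reference.find("/")
    return colon != -1 and (slash == -1 or colon < slash)


print(is_full_form("http://www.example.org/a/b"))  # True
print(is_full_form("mailto:user@host.example"))    # True
print(is_full_form("../docs/a:b"))                 # False: slash comes first
print(is_full_form("docs/index"))                  # False: no colon
```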
| 1071 |
|
| 1072 |
The rules for the use of a partial name are: |
| 1073 |
|
| 1074 |
If the scheme parts are different, the whole absolute locator |
| 1075 |
must be given. Otherwise, the scheme is omitted, and: |
| 1076 |
|
| 1077 |
If the host and/or port parts are different, the host, port |
| 1078 |
name and all the rest of the locator must be given. |
| 1079 |
|
| 1080 |
If the access and host parts are the same, then the path may be |
| 1081 |
given in absolute (fully qualified) or relative form. Within the |
| 1082 |
path: |
| 1083 |
|
| 1084 |
If a leading slash is present, the path is absolute. Otherwise, |
| 1085 |
a relative path is interpreted as follows: |
| 1086 |
|
| 1087 |
The last part of the path of the context locator (anything |
| 1088 |
following the rightmost slash) is removed, and the given partial |
| 1089 |
URL appended in its place. |
| 1090 |
|
| 1091 |
Within the result, all occurrences of "xxx/../" or "/." are |
| 1092 |
recursively removed, where xxx, ".." and "." are complete path |
| 1093 |
elements. |
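The rules above might be sketched in Python as follows. This is an
illustration under stated assumptions, not a reference
implementation: the context locator is assumed to have the form
scheme://hostport/path with no search part, a trailing ".." is
treated like an interior one, and the names resolve_partial and
remove_dot_elements are hypothetical.

```python
def remove_dot_elements(path):
    """Remove "xxx/../" and "/." where each is a complete path element."""
    kept = []
    for element in path.split("/"):
        if element == ".":
            continue  # "/." is dropped
        if element == ".." and kept and kept[-1] != "..":
            kept.pop()  # "xxx/../" is dropped
        else:
            kept.append(element)
    return "/".join(kept)


def resolve_partial(context, partial):
    """Expand a partial URL against a context scheme://hostport/path."""
    colon, slash = partial.find(":"), partial.find("/")
    if colon != -1 and (slash == -1 or colon < slash):
        return partial  # full form: used as given
    scheme, rest = context.split(":", 1)
    if partial.startswith("//"):
        return scheme + ":" + partial  # host, port and the rest given
    hostport, _, path = rest[2:].partition("/")
    prefix = scheme + "://" + hostport
    if partial.startswith("/"):
        return prefix + partial  # absolute path
    # Relative path: replace whatever follows the rightmost slash.
    directory = path[: path.rfind("/") + 1]
    return prefix + "/" + remove_dot_elements(directory + partial)


context = "http://www.example.org/a/b/c"  # hypothetical context locator
print(resolve_partial(context, "d"))      # http://www.example.org/a/b/d
print(resolve_partial(context, "../d"))   # http://www.example.org/a/d
print(resolve_partial(context, "/x/y"))   # http://www.example.org/x/y
```

Resolving "d" against a context of "http://www.example.org/a/b/"
gives ".../a/b/d", whereas the same context without the final slash
gives ".../a/d".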
| 1094 |
|
| 1095 |
Note: If the path of the context locator ends in a slash, partial URLs |
| 1096 |
will be treated differently to their treatment with respect to the |
| 1097 |
same path without a slash. Using a trailing slash on a directory |
| 1103 |
name is not therefore recommended. The significance of a trailing |
| 1104 |
slash may be considered as that of the locator of a file with void |
| 1105 |
name within that directory. |
| 1106 |
|
| 1107 |
Wrappers for URIs in plain text |
| 1108 |
|
| 1109 |
This section does not formally form part of the URL specification. |
| 1110 |
|
| 1111 |
URIs, including URLs, will ideally be transmitted through protocols |
| 1112 |
which accept them and data formats which define a context for them. |
| 1113 |
However, in practice nowadays there are many occasions when URLs |
| 1114 |
are included in plain ASCII non-marked-up text such as electronic |
| 1115 |
mail and usenet news messages. |
| 1116 |
|
| 1117 |
In this case, it is convenient to have a separate wrapper syntax to |
| 1118 |
define delimiters which will enable the human or automated reader |
| 1119 |
to recognize that the URI is a URI. |
| 1120 |
|
| 1121 |
The recommendation is that the angle brackets (less than and |
| 1122 |
greater than signs) of the ASCII set be used for this purpose. |
| 1123 |
|
| 1124 |
These wrappers do not form part of the URL, are not mandatory, and |
| 1125 |
should not be used in contexts (such as SGML parameters, HTTP |
| 1126 |
requests, etc) in which delimiters are already specified. |
| 1127 |
|
| 1128 |
Example |
| 1129 |
|
| 1130 |
Yes, Jim, I found it under <ftp://info.cern.ch/pub> but |
| 1131 |
you can probably pick it up from <ftp://ds.internic.net/rfc>. |
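An automated reader might recognize wrapped URIs along the lines of
the following Python sketch. The pattern is an assumption of this
sketch: it requires a scheme followed by a colon and allows no
white space between the brackets, so a URI containing spaces or
broken across lines is not matched. The function name
find_wrapped_uris is hypothetical.

```python
import re

# "<", a scheme starting with a letter, ":", the rest of the URI, ">".
WRAPPED_URI_RE = re.compile(r"<([A-Za-z][^<>\s:]*:[^<>\s]+)>")


def find_wrapped_uris(text):
    """Return the URIs found between angle brackets, without the brackets."""
    return WRAPPED_URI_RE.findall(text)


message = (
    "Yes, Jim, I found it under <ftp://info.cern.ch/pub> but\n"
    "you can probably pick it up from <ftp://ds.internic.net/rfc>."
)
print(find_wrapped_uris(message))
# ['ftp://info.cern.ch/pub', 'ftp://ds.internic.net/rfc']
```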
| 1134 |
|
| 1135 |
|
| 1136 |
REFERENCES |
| 1137 |
|
| 1138 |
Alberti, R., et al. (1991) |
| 1139 |
"Notes on the Internet Gopher Protocol" |
| 1140 |
University of Minnesota, December 1991, |
| 1141 |
<ftp://boombox.micro.umn.edu/pub/gopher/gopher_protocol>. |
| 1142 |
See also |
| 1143 |
<gopher://gopher.micro.umn.edu/00/Information |
| 1144 |
About Gopher/About Gopher> |
| 1145 |
|
| 1146 |
Berners-Lee, T., (1991) |
| 1147 |
"Hypertext Transfer Protocol (HTTP)", CERN, |
| 1148 |
December 1991, |
| 1149 |
<ftp://info.cern.ch/pub/www/doc/http-spec.txt> |
| 1151 |
|
| 1152 |
Crocker, David H., |
| 1153 |
"Standard for ARPA Internet Text Messages", RFC822. |
| 1154 |
|
| 1155 |
Davis, F., et al., (1990) |
| 1161 |
"WAIS Interface Protocol: Prototype |
| 1162 |
Functional Specification", Thinking Machines |
| 1163 |
Corporation, April 23, 1990 |
| 1164 |
<ftp://quake.think.com/pub/wais/doc/protspec.txt> |
| 1166 |
|
| 1167 |
International Standards Organization, (1991) |
| 1168 |
Information and Documentation - Search and |
| 1169 |
Retrieve Application Protocol Specification |
| 1170 |
for Open Systems Interconnection, ISO-10163 |
| 1171 |
|
| 1172 |
Huitema, C., (1991) "Naming: strategies and techniques", |
| 1173 |
Computer Networks and ISDN Systems 23 (1991) |
| 1174 |
107-110. |
| 1175 |
|
| 1176 |
Kahle, Brewster, (1991) |
| 1177 |
"Document Identifiers, or International |
| 1178 |
Standard Book Numbers for the Electronic |
| 1179 |
Age", |
| 1180 |
<ftp://quake.think.com/pub/wais/doc/doc-ids.txt> |
| 1182 |
|
| 1183 |
Kantor, B., and Lapsley, P., (1986) |
| 1184 |
"A proposed standard for the stream-based |
| 1185 |
transmission of news", Internet RFC-977, |
| 1186 |
February 1986. |
| 1187 |
<ftp://ds.internic.net/rfc/rfc977.txt> |
| 1188 |
|
| 1189 |
Lynch, C., Coalition for Networked Information: (1991) |
| 1190 |
"Workshop on ID and Reference Structures for |
| 1191 |
Networked Information", November 1991. See |
| 1192 |
<wais://quake.think.com/wais-discussion-archives?lynch> |
| 1194 |
|
| 1195 |
Mockapetris, P., (1987) |
| 1196 |
"Domain names - concepts and facilities", |
| 1197 |
RFC-1034, USC-ISI, November 1987, |
| 1198 |
<ftp://ds.internic.net/rfc/rfc1034.txt> |
| 1199 |
|
| 1200 |
Neuman, B. Clifford, (1992) |
| 1201 |
"Prospero: A Tool for Organizing Internet |
| 1202 |
Resources", Electronic Networking: Research, |
| 1203 |
Applications and Policy, Vol 1 No 2, Meckler |
| 1204 |
Westport CT USA. See also |
| 1205 |
<ftp://prospero.isi.edu/pub/prospero/oir.ps> |
| 1206 |
|
| 1207 |
Postel, J. and Reynolds, J. (1985) |
| 1208 |
"File Transfer Protocol (FTP)", Internet |
| 1209 |
RFC-959, October 1985. |
| 1210 |
<ftp://ds.internic.net/rfc/rfc959.txt> |
| 1211 |
|
| 1212 |
Yeong, W., (1991a) "Towards Networked Information Retrieval", |
| 1213 |
Technical report 91-06-25-01, June 1991, |
| 1219 |
Performance Systems International, Inc. |
| 1220 |
<ftp://uu.psi.com/wp/nir.txt> |
| 1221 |
|
| 1222 |
Yeong, W., (1991b), "Representing Public Archives in the |
| 1223 |
Directory", Internet Draft, November 1991, |
| 1224 |
now expired. |
| 1225 |
|
| 1226 |
Author's address |
| 1227 |
|
| 1228 |
|
| 1229 |
Tim Berners-Lee |
| 1230 |
Address: World-Wide Web project |
| 1231 |
CERN, |
| 1232 |
1211 Geneva 23, |
| 1233 |
Switzerland |
| 1234 |
|
| 1235 |
Telephone: +41 (22)767 3755 |
| 1236 |
Fax: +41 (22)767 7155 |
| 1237 |
Email: timbl@info.cern.ch |