1 |
Universal Resource Identifiers Tim Berners-Lee |
2 |
draft-bernerslee-www-uri-00.{ps,txt} CERN |
3 |
Expires 12 September 1994 12 March 1994 |
4 |
|
5 |
|
6 |
Universal Resource Identifiers in WWW |
7 |
|
8 |
A Unifying Syntax for the Expression of |
9 |
Names and Addresses of Objects on the Network |
10 |
as used in the World-Wide Web |
11 |
|
12 |
|
13 |
ABOUT THIS DOCUMENT |
14 |
|
15 |
This document defines the syntax used by the World-Wide Web |
16 |
initiative to encode the names and addresses of objects on the |
17 |
Internet. The web is considered to include objects accessed using |
18 |
an extendable number of protocols, existing, invented for the web |
19 |
itself, or to be invented in the future. Access instructions for |
20 |
an individual object under a given protocol are encoded into forms |
21 |
of address string. Other protocols allow the use of object names |
22 |
of various forms. In order to abstract the idea of a generic |
23 |
object, the web needs the concepts of the universal set of objects, |
24 |
and of the universal set of names or addresses of objects. |
25 |
|
26 |
A Universal Resource Identifier (URI) is a member of this universal |
27 |
set of names in registered name spaces and addresses referring to |
28 |
registered protocols or name spaces. A Uniform Resource Locator |
29 |
(URL), defined elsewhere, is a form of URI which expresses an |
30 |
address which maps onto an access algorithm using network |
31 |
protocols. Existing URI schemes which correspond to the (still |
32 |
mutating) concept of IETF URLs are listed here. The Uniform |
33 |
Resource Name (URN) debate attempts to define a name space (and |
34 |
presumably resolution protocols) for persistent object names. This |
35 |
area is not addressed by this document, which is written in order |
36 |
to document existing practice and provide a reference point for URL |
37 |
and URN discussions. |
38 |
|
39 |
This document is therefore to be issued under the "informational |
40 |
RFC" disclaimer . |
41 |
|
42 |
The world-wide web protocols are discussed on the mailing list |
43 |
www-talk-request@info.cern.ch and the newsgroup |
44 |
comp.infosystems.www is preferable for beginner's questions. The |
45 |
mailing list uri-request@bunyip.com has discussion related |
46 |
particularly to the URI issue. The author may be contacted as |
47 |
timbl@info.cern.ch. |
48 |
|
49 |
This document is available in hypertext form at |
50 |
http://info.cern.ch/hypertext/WWW/Addressing/URL/URI_Overview.html |
51 |
|
52 |
STATUS OF THIS MEMO |
53 |
|
54 |
|
55 |
|
56 |
|
57 |
Berners-Lee 1 |
58 |
|
59 |
RFC XXXX Universal Resource Identifiers in WWW 12 March 1994 |
60 |
|
61 |
This document is an Internet Draft. Internet Drafts are working |
62 |
documents of the Internet Engineering Task Force (IETF), its Areas, |
63 |
and its Working Groups. Note that other groups may also distribute |
64 |
working documents as Internet Drafts. |
65 |
|
66 |
Internet Drafts are working documents valid for a maximum of six |
67 |
months. Internet Drafts may be updated, replaced, or obsoleted by |
68 |
other documents at any time. It is not appropriate to use Internet |
69 |
Drafts as reference material or to cite them other than as a |
70 |
"working draft" or "work in progress". |
71 |
|
72 |
Distribution of this document is unlimited. |
73 |
|
74 |
THE NEED FOR A UNIVERSAL SYNTAX |
75 |
|
76 |
This section describes the concept of the URI and does not form |
77 |
part of the specification. |
78 |
|
79 |
Many protocols and systems for document search and retrieval are |
80 |
currently in use, and many more protocols or refinements of |
81 |
existing protocols are to be expected in a field whose expansion is |
82 |
explosive. |
83 |
|
84 |
These systems are aiming to achieve global search and readership of |
85 |
documents across differing computing platforms, and despite a |
86 |
plethora of protocols and data formats. As protocols evolve, |
87 |
gateways can allow global access to remain possible. As data |
88 |
formats evolve, format conversion programs can preserve global |
89 |
access. There is one area, however, in which it is impractical to |
90 |
make conversions, and that is in the names and addresses used to |
91 |
identify objects. This is because names and addresses of objects |
92 |
are passed on in so many ways, from the backs of envelopes to |
93 |
hypertext objects, and may have a long life. |
94 |
|
95 |
A common feature of almost all the data models of past and proposed |
96 |
systems is something which can be mapped onto a concept of "object" |
97 |
and some kind of name, address, or identifier for that object. One |
98 |
can therefore define a set of name spaces in which these objects |
99 |
can be said to exist. |
100 |
|
101 |
Practical systems need to access and mix objects which are part of |
102 |
different existing and proposed systems. Therefore, the concept of |
103 |
the universal set of all objects, and hence the universal set of |
104 |
names and addresses, in all name spaces, becomes important. This |
105 |
allows names in different spaces to be treated in a common way, |
106 |
even though names in different spaces have differing |
107 |
characteristics, as do the objects to which they refer. |
108 |
|
109 |
URIs |
110 |
|
111 |
This document defines a way to encapsulate a name in any registered |
112 |
name space, and label it with the the name space, producing a |
113 |
member of the universal set. Such an encoded and labelled member |
114 |
|
115 |
Berners-Lee 2 |
116 |
|
117 |
RFC XXXX Universal Resource Identifiers in WWW 12 March 1994 |
118 |
|
119 |
of this set is known as a Universal Resource Identifier, or URI. |
120 |
|
121 |
The universal syntax allows access of objects available using |
122 |
existing protocols, and may be extended with technology. |
123 |
|
124 |
The specification of the URI syntax does not imply anything about |
125 |
the properties of names and addresses in the various name spaces |
126 |
which are mapped onto the set of URI strings. The properties |
127 |
follow from the specifications of the protocols and the associated |
128 |
usage conventions for each scheme. |
129 |
|
130 |
URLs |
131 |
|
132 |
For existing Internet access protocols, it is necessary in most |
133 |
cases to define the encoding of the access algorithm into something |
134 |
concise enough to be termed address. URIs which refer to objects |
135 |
accessed with existing protocols are known as "Uniform Resource |
136 |
Locators" (URLs) and are listed here as used in WWW, but to be |
137 |
formally defined in a separate document . |
138 |
|
139 |
URNs |
140 |
|
141 |
There is currently a drive to define a space of more persistent |
142 |
names than any URLs. These "Uniform Resource Names" are the |
143 |
subject of an IETF working group's discussions. (See Sollins and |
144 |
Masinter, Functional Specifications for URNs, circulated |
145 |
informally.) |
146 |
|
147 |
The URI syntax and URL forms have been in widespread use by |
148 |
World-Wide Web software since 1990. |
149 |
|
150 |
DESIGN CRITERIA AND CHOICES |
151 |
|
152 |
This section is not part of the specification: it is simply an |
153 |
explanation of the way in which the specification was derived. |
154 |
|
155 |
Design criteria |
156 |
|
157 |
The syntax was designed to be |
158 |
|
159 |
Extensible New naming schemes may be added later. |
160 |
|
161 |
Complete It is possible to encode any naming scheme. |
162 |
|
163 |
Printable It is possible to express any URI using |
164 |
7-bit ASCII characters so that URIs may if |
165 |
necessary be passed using pen and ink. |
166 |
|
167 |
Choices for a universal syntax |
168 |
|
169 |
For the syntax itself there is little choice except for the order |
170 |
and punctuation of the elements, and the acceptable characters and |
171 |
|
172 |
|
173 |
Berners-Lee 3 |
174 |
|
175 |
RFC XXXX Universal Resource Identifiers in WWW 12 March 1994 |
176 |
|
177 |
escaping rules. |
178 |
|
179 |
The extensibility requirement is met by allowing an arbitrary (but |
180 |
registered) string to be used as a prefix. A prefix is chosen as |
181 |
left to right parsing is more common than right to left. The |
182 |
choice of a colon as separator of the prefix from the rest of the |
183 |
URI was arbitrary. |
184 |
|
185 |
The decoding of the rest of the string is defined as a function of |
186 |
the prefix. New prefixed are introduced for new schemes as |
187 |
necessary, in agreement with the registration authority. The |
188 |
registration of a new scheme clearly requires the definition of the |
189 |
decoding of the URI into a given name space, and a definition of |
190 |
the properties and, where applicable, resolution protocols, for the |
191 |
name space. |
192 |
|
193 |
The completeness requirement is easily met by allowing particularly |
194 |
strange or plain binary names to be encoded in base 16 or 64 using |
195 |
the acceptable characters. |
196 |
|
197 |
The printability requirement could have been met by requiring all |
198 |
schemes to encode characters not part of a basic set. This led to |
199 |
many discussions of what the basic set should be. A difficult case, |
200 |
for example, is when an ISO latin 1 string appears in a URL, and |
201 |
within an application with ISO Latin-1 capability, it can be |
202 |
handled intact. However, for transport in general, the non-ASCII |
203 |
characters need to be escaped. |
204 |
|
205 |
The solution to this was to specify a safe set of characters, and a |
206 |
general escaping scheme which may be used for encoding "unsafe" |
207 |
characters. This "safe" set is suitable, for example, for use in |
208 |
electronic mail. This is the canonical form of a URI. |
209 |
|
210 |
The choice of escape character for introducing representations of |
211 |
non-allowed characters also tends to be a matter of taste. An ANSI |
212 |
standard exists in the C language, using the back-slash character |
213 |
"\". The use of this character on unix command lines, however, can |
214 |
be a problem as it is interpreted by many shell programs, and would |
215 |
have itself to be escaped. It is also a character which is not |
216 |
available on certain keyboards. The equals sign is commonly used |
217 |
in the encoding of names having attribute=value pairs. The percent |
218 |
sign was eventually chosen as a suitable escape character. |
219 |
|
220 |
There is a conflict between the need to be able to represent many |
221 |
characters including spaces within a URI directly, and the need to |
222 |
be able to use a URI in environments which have limited character |
223 |
sets or in which certain characters are prone to corruption. This |
224 |
conflict has been resolved by use of an hexadecimal escaping method |
225 |
which may be applied to any characters forbidden in a given |
226 |
context. When URLs are moved between contexts, the set of |
227 |
characters escaped may be enlarged or reduced unambiguously. |
228 |
|
229 |
|
230 |
|
231 |
Berners-Lee 4 |
232 |
|
233 |
RFC XXXX Universal Resource Identifiers in WWW 12 March 1994 |
234 |
|
235 |
The use of white space characters is risky in URIs to be printed |
236 |
or sent by electronic mail, and the use of multiple white space |
237 |
characters is very risky. This is because of the frequent |
238 |
introduction of extraneous white space when lines are wrapped by |
239 |
systems such as mail, or sheer necessity of narrow column width, |
240 |
and because of the inter-conversion of various forms of white |
241 |
space which occurs during character code conversion and the |
242 |
transfer of text between applications. This is why the canonical |
243 |
form for URIs has all white spaces encoded. |
244 |
|
245 |
RECOMMENDATIONS |
246 |
|
247 |
This section describes the syntax for URIs as used in the WorldWide |
248 |
Web initiative. The generic syntax provides a framework for new |
249 |
schemes for names to be resolved using as yet undefined protocols. |
250 |
|
251 |
|
252 |
URI syntax |
253 |
|
254 |
A complete URI consists of a naming scheme specifier followed by a |
255 |
string whose format is a function of the naming scheme. For |
256 |
locators of information on the Internet, a common syntax is used |
257 |
for the IP address part. A BNF description of the URL syntax is |
258 |
given in an a later section. The components are as follows. |
259 |
Fragment identifiers and relative URIs are not involved in the |
260 |
basic URL definition. |
261 |
|
262 |
SCHEME |
263 |
|
264 |
Within the URI of a object, the first element is the name of the |
265 |
scheme, separated from the rest of the object by a colon. |
266 |
|
267 |
PATH |
268 |
|
269 |
The rest of the URI follows the colon in a format depending on the |
270 |
scheme. The path is interpreted in a manner dependent on the |
271 |
protocol being used. However, when it contains slashes, these must |
272 |
imply a hierarchical structure. |
273 |
|
274 |
Reserved characters |
275 |
|
276 |
The path in the URI has a significance defined by the particular |
277 |
scheme. Typically it is used to encode a name in a given name |
278 |
space, or an algorithm for accessing an object. In either case, the |
279 |
encoding may use those characters allowed by the BNF syntax, or |
280 |
hexadecimal encoding of other characters. |
281 |
|
282 |
Some of the reserved characters have special uses as defined here. |
283 |
|
284 |
THE PERCENT SIGN |
285 |
|
286 |
The percent sign ("%", ASCII 25 hex) is used as the escape |
287 |
character in the encoding scheme and is never allowed for anything |
288 |
|
289 |
Berners-Lee 5 |
290 |
|
291 |
RFC XXXX Universal Resource Identifiers in WWW 12 March 1994 |
292 |
|
293 |
else. |
294 |
|
295 |
HIERARCHICAL FORMS |
296 |
|
297 |
The slash ("/", ASCII 2F hex) character is reserved for the |
298 |
delimiting of substrings whose relationship is hierarchical. This |
299 |
enables partial forms of the URI. Substrings consisting of single |
300 |
or double dots ("." or "..") are similarly reserved. |
301 |
|
302 |
The significance of the slash between two segments is that the |
303 |
segment of the path to the left is more significant than the |
304 |
segment of the path to the right. ("Significance" in this case |
305 |
refers solely to closeness to the root of the hierarchical |
306 |
structure and makes no value judgement!) |
307 |
|
308 |
Note |
309 |
|
310 |
The similarity to unix and other disk operating system filename |
311 |
conventions should be taken as purely coincidental, and should not |
312 |
be taken to indicate that URIs should be interpreted as file names. |
313 |
|
314 |
HASH FOR FRAGMENT IDENTIFIERS |
315 |
|
316 |
The hash ("#", ASCII 23 hex) character is reserved as a delimiter |
317 |
to separate the URI of an object from a fragment identifier . |
318 |
|
319 |
QUERY STRINGS |
320 |
|
321 |
The question mark ("?", ASCII 3F hex) is used to delimit the |
322 |
boundary between the URI of a queryable object, and a set of words |
323 |
used to express a query on that object. When this form is used, |
324 |
the combined URI stands for the object which results from the query |
325 |
being applied to the original object. |
326 |
|
327 |
Within the query string, the plus sign is reserved as shorthand |
328 |
notation for a space. Therefore, real plus signs must be encoded. |
329 |
This method was used to make query URIs easier to pass in systems |
330 |
which did not allow spaces. |
331 |
|
332 |
The query string represents some operation applied to the object, |
333 |
but this specification gives no common syntax or semantics for it. |
334 |
In practice the syntax and sematics may depend on the scheme and |
335 |
may even on the base URI. |
336 |
|
337 |
OTHER RESERVED CHARACTERS |
338 |
|
339 |
The astersik ("*", ASCII 2A hex) and exclamation mark ("!" , ASCII |
340 |
21 hex) are reserved for use as having special signifiance within |
341 |
specific schemes. |
342 |
|
343 |
Unsafe characters |
344 |
|
345 |
|
346 |
|
347 |
Berners-Lee 6 |
348 |
|
349 |
RFC XXXX Universal Resource Identifiers in WWW 12 March 1994 |
350 |
|
351 |
In canonical form, certain characters such as spaces, control |
352 |
characters, some characters whose ASCII code is used differently in |
353 |
different national character variant 7 bit sets, and all 8bit |
354 |
characters beyond DEL (7F hex) of the ISO Latin-1 set, shall not be |
355 |
used unencoded. This is a recommendation for trouble-free |
356 |
interchange, and as indicated below, the encoded set may be |
357 |
extended or reduced. |
358 |
|
359 |
Encoding reserved characters |
360 |
|
361 |
When a system uses a local addressing scheme, it is useful to |
362 |
provide a mapping from local addresses into URIs so that references |
363 |
to objects within the addressing scheme may be referred to |
364 |
globally, and possibly accessed through gateway servers. |
365 |
|
366 |
For a new naming scheme, any mapping scheme may be defined provided |
367 |
it is unambiguous, reversible, and provides valid URIs. It is |
368 |
recommended that where hierarchical aspects to the local naming |
369 |
scheme exist, they be mapped onto the hierarchical URL path syntax |
370 |
in order to allow the partial form to be used. |
371 |
|
372 |
It is also recommended that the conventional scheme below be used |
373 |
in all cases except for any scheme which encodes binary data as |
374 |
opposed to text, in which case a more compact encoding such as pure |
375 |
hexadecimal or base 64 might be more appropriate. For example, the |
376 |
conventional URI encoding method is used for mapping WAIS, FTP, |
377 |
Prospero and Gopher addresses in the URI specification. |
378 |
|
379 |
CONVENTIONAL URI ENCODING SCHEME |
380 |
|
381 |
Where the local naming scheme uses ASCII characters which are not |
382 |
allowed in the URI, these may be represented in the URL by a |
383 |
percent sign "%" immediately followed by two hexadecimal digits |
384 |
(0-9, A-F) giving the ISO Latin 1 code for that character. |
385 |
Character codes other than those allowed by the syntax shall not be |
386 |
used unencoded in a URI. |
387 |
|
388 |
REDUCED OR INCREASED SAFE CHARACTER SETS |
389 |
|
390 |
The same encoding method may be used for encoding characters whose |
391 |
use, although technically allowed in a URI, would be unwise due to |
392 |
problems of corruption by imperfect gateways or misrepresentation |
393 |
due to the use of variant character sets, or which would simply be |
394 |
awkward in a given environment. Because a % sign always indicates |
395 |
an encoded character, a URI may be made "safer" simply by encoding |
396 |
any characters considered unsafe, while leaving already encoded |
397 |
characters still encoded. Similarly, in cases where a larger set |
398 |
of characters is acceptable, % signs can be selectively and |
399 |
reversibly expanded. |
400 |
|
401 |
Before two URIs can be compared, it is therefore necessary to bring |
402 |
them to the same encoding level. |
403 |
|
404 |
|
405 |
Berners-Lee 7 |
406 |
|
407 |
RFC XXXX Universal Resource Identifiers in WWW 12 March 1994 |
408 |
|
409 |
However, the reserved characters mentioned above have a quite |
410 |
different significance when encoded, and so may NEVER be encoded |
411 |
and unencoded in this way. |
412 |
|
413 |
The percent sign intended as such must always be encoded, as its |
414 |
presence otherwise always indicates an encoding. Sequences which |
415 |
start with a percent sign but are not followed by two hexadecimal |
416 |
characters are reserved for future extension. (see example 3 ) |
417 |
|
418 |
Example 1 |
419 |
|
420 |
The URIs |
421 |
|
422 |
http://info.cern.ch/albert/bertram/marie-claude |
423 |
|
424 |
and |
425 |
|
426 |
http://info.cern.ch/albert/bertram/marie%2Dclaude |
427 |
|
428 |
are identical, as the %2D encodes a hyphen character. |
429 |
|
430 |
Example 2 |
431 |
|
432 |
The URIs |
433 |
|
434 |
http://info.cern.ch/albert/bertram/marie-claude |
435 |
|
436 |
and |
437 |
|
438 |
http://info.cern.ch/albert/bertram%2Fmarie-claude |
439 |
|
440 |
are NOT identical, as in the second case the encoded slash does not |
441 |
have hierarchical significance. |
442 |
|
443 |
Example 3 |
444 |
|
445 |
The URIs |
446 |
|
447 |
fxqn:/us/va/reston/cnri/ietf/24/asdf%*.fred |
448 |
|
449 |
and |
450 |
|
451 |
news:12345667123%asdghfh@info.cern.ch |
452 |
|
453 |
are illegal, as all % characters imply encodings, and there is no |
454 |
decoding defined for "%*" or "%as" in this recommendation. |
455 |
|
456 |
Partial (relative) form |
457 |
|
458 |
Within a object whose URI is well defined, the URI of another |
459 |
object may be given in abbreviated form, where parts of the two |
460 |
URIs are the same. This allows objects within a group to refer to |
461 |
each other without requiring the space for a complete reference, |
462 |
|
463 |
Berners-Lee 8 |
464 |
|
465 |
RFC XXXX Universal Resource Identifiers in WWW 12 March 1994 |
466 |
|
467 |
and it incidentally allows the group of objects to be moved |
468 |
without changing any references. It must be emphasized that when |
469 |
a reference is passed in anything other than a well controlled |
470 |
context, the full form must always be used. |
471 |
|
472 |
In the World-Wide Web applications, the context URI is that of the |
473 |
document or object containing a reference. In this case partial |
474 |
URIs can be generated in virtual objects or stored in real objects, |
475 |
without the need for dramatic change if the higher-order parts of a |
476 |
hierarchical naming system are modified. Apart from terseness, |
477 |
this gives greater robustness to practical systems, by enabling |
478 |
information hiding between system components. |
479 |
|
480 |
The partial form relies on a property of the URI syntax that |
481 |
certain characters ("/") and certain path elements ("..", ".") have |
482 |
a significance reserved for representing a hierarchical space, and |
483 |
must be recognized as such by both clients and servers. |
484 |
|
485 |
A partial form can be distinguished from an absolute form in that |
486 |
the latter must have a colon and that colon must occur before any |
487 |
slash characters. Systems not requiring partial forms should not |
488 |
use any unencoded slashes in their naming schemes. If they do, |
489 |
absolute URIs will still work, but confusion may result. (See note |
490 |
on Gopher below). |
491 |
|
492 |
The rules for the use of a partial name relative to the URI of the |
493 |
context are: |
494 |
|
495 |
If the scheme parts are different, the whole absolute URI must |
496 |
be given. Otherwise, the scheme is omitted, and: |
497 |
|
498 |
If the partial URI starts with a non-zero number of consecutive |
499 |
slashes, then everything from the context URI up to (but not |
500 |
including) the first occurrence of exactly the same number of |
501 |
consecutive slashes is taken to be the same and so prepended to |
502 |
the partial URL to form the full URL. Otherwise: |
503 |
|
504 |
The last part of the path of the context URI (anything following |
505 |
the rightmost slash) is removed, and the given partial URI |
506 |
appended in its place, and then: |
507 |
|
508 |
Within the result, all occurrences of "xxx/../" or "/." are |
509 |
recursively removed, where xxx, ".." and "." are complete path |
510 |
elements. |
511 |
|
512 |
Note: Trailing slashes |
513 |
|
514 |
If a path of the context locator ends in slash, partial URIs are |
515 |
treated differently to the URI with the same path but without a |
516 |
trailing slash. The trailing slash indicates a void segment of the |
517 |
path. |
518 |
|
519 |
|
520 |
|
521 |
Berners-Lee 9 |
522 |
|
523 |
RFC XXXX Universal Resource Identifiers in WWW 12 March 1994 |
524 |
|
525 |
Note: Gopher |
526 |
|
527 |
The gopher system does not have the concept of relative URIs, and |
528 |
the gopher community currently allows / as data characters in |
529 |
gopher URIs without escaping them to %2F. Relative forms may not |
530 |
in general be used for documents served by gopher servers. If they |
531 |
are used, then WWW software assumes, normally correctly, that in |
532 |
fact they do have hierarchical significance despite the |
533 |
specifications. The use of HTTP rather than gopher protocol is |
534 |
however recommended. |
535 |
|
536 |
Examples |
537 |
|
538 |
In the context of URI |
539 |
|
540 |
magic://a/b/c//d/e/f |
541 |
|
542 |
the partial URIs would expand as follows: |
543 |
|
544 |
g magic://a/b/c//d/e/g |
545 |
|
546 |
/g magic://a/g |
547 |
|
548 |
//g magic://g |
549 |
|
550 |
../g magic://a/b/c//d/g |
551 |
|
552 |
g:a g:a |
553 |
|
554 |
and in the context of the URI |
555 |
|
556 |
magic://a/b/c//d/e/ |
557 |
|
558 |
the results would be exactly the same. |
559 |
|
560 |
Fragment-id |
561 |
|
562 |
This represents a part of, fragment of, or a sub-function within, |
563 |
an object . Its syntax and semantics are defined by the application |
564 |
responsible for the object, or the specification of the content |
565 |
type of the object. The only definition here is of the allowed |
566 |
characters by which it may be represented in a URL. |
567 |
|
568 |
Specific syntaxes for representing fragments in text documents by |
569 |
line and character range, or in graphics by coordinates, or in |
570 |
structured documents using ladders, are suitable for |
571 |
standardization but not defined here. |
572 |
|
573 |
The fragment-id follows the URL of the whole object from which it |
574 |
is separated by a hash sign (#). If the fragment-id is void, the |
575 |
hash sign may be omitted: A void fragment-id with or without the |
576 |
hash sign means that the URL refers to the whole object. |
577 |
|
578 |
|
579 |
Berners-Lee 10 |
580 |
|
581 |
RFC XXXX Universal Resource Identifiers in WWW 12 March 1994 |
582 |
|
583 |
While this hook is allowed for identification of fragments, the |
584 |
question of addressing of parts of objects, or of the grouping of |
585 |
objects and relationship between continued and containing objects, |
586 |
is not addressed by this document. |
587 |
|
588 |
Fragment identifiers do NOT address the question of objects which |
589 |
are different versions of a "living" object, nor of expressing the |
590 |
relationships between different versions and the living object. |
591 |
|
592 |
There is no implication that a fragment identifier refers to |
593 |
anything which can be extracted as an object in its own right. It |
594 |
may, for example, refer to an indivisible point within an object. |
595 |
|
596 |
SPECIFIC SCHEMES |
597 |
|
598 |
The mapping for URIs onto some existing standard and experimental |
599 |
protocols is outlined in the BNF syntax definition . Notes on |
600 |
particular protocols follow. These URIs are frequently referred |
601 |
to as URLs, though the exact definition of the term URL is still |
602 |
under discussion (March 1993). The schemes covered are: |
603 |
|
604 |
http Hypertext Transfer Protocol (examples) |
605 |
|
606 |
ftp File Transfer protocol |
607 |
|
608 |
gopher Gopher protocol |
609 |
|
610 |
mailto Electronic mail address |
611 |
|
612 |
news Usenet news |
613 |
|
614 |
telnet , rlogin and tn3270 |
615 |
Reference to interactive sessions |
616 |
|
617 |
wais Wide Area Information Servers |
618 |
|
619 |
file Local file access |
620 |
|
621 |
The following schemes are proposed as essential to the unification |
622 |
of the web with electronic mail, but not currently (to the author's |
623 |
knowledge) implemented: |
624 |
|
625 |
mid Message identifiers for electronic mail |
626 |
|
627 |
cid Content identifiers for MIME body part |
628 |
|
629 |
The schemes for x.500, network management database, and whois++ |
630 |
have not been specified and may be the subject of further study. |
631 |
Schemes for Prospero , and restricted NNTP use are not currently |
632 |
implemented as far as the author is aware. |
633 |
|
634 |
The "urn" prefix is reserved for use in encoding a Uniform Resource |
635 |
Name when that has been developed by the IETF working group. |
636 |
|
637 |
Berners-Lee 11 |
638 |
|
639 |
RFC XXXX Universal Resource Identifiers in WWW 12 March 1994 |
640 |
|
641 |
|
642 |
New schemes may be registered at a later time. |
643 |
|
644 |
HTTP |
645 |
|
646 |
The HTTP protocol specifies that the path is handled transparently |
647 |
by those who handle URLs, except for the servers which de-reference |
648 |
them. The path is passed by the client to the server with any |
649 |
request, but is not otherwise understood by the client. |
650 |
|
651 |
The host details are not passed on to the client when the URL is an |
652 |
http URL which refers to the server in question. In this case the |
653 |
string sent starts with the slash which follows the host details. |
654 |
However, when an http server is being used as a gateway (or |
655 |
"proxy") then the entire URI, whether HTTP or some other scheme, is |
656 |
passed on the HTTP command line.The search part, if present, is |
657 |
sent as part of the HTTP command, and may in this respect be |
658 |
treated as part of the path.No fragmentid part of a WWW URI (the |
659 |
hash sign and following) is sent with the request. Spaces and |
660 |
control characters in URLs must be escaped for transmission in |
661 |
HTTP, as must other disallowed characters. |
662 |
|
663 |
EXAMPLES |
664 |
|
665 |
These examples are not part of the specification: they are provided |
666 |
as illustations only. The URI of the "welcome" page to a server |
667 |
is conventionally |
668 |
|
669 |
http://www.my.work.com/ |
670 |
|
671 |
As the rest of the URL (after the hostname an port) is opaque to |
672 |
the client, it shows great variety but the following are all |
673 |
fairly typical. |
674 |
|
675 |
http://www.my.uni.edu/info/matriculation/enroling.html |
676 |
|
677 |
http://info.my.org/AboutUs/Phonebook |
678 |
|
679 |
http://www.library.my.town.va.us/Catalogue/76523471236%2Fwen44 |
680 |
--4.98 |
681 |
|
682 |
http://www.my.org/462F4F2D4241522A314159265358979323846 |
683 |
|
684 |
A URL for a server on a different port to 80 looks like |
685 |
|
686 |
http://info.cern.ch:8000/imaginary/test |
687 |
|
688 |
A reference to a particular part of a document may, including the |
689 |
fragment identifier, look like |
690 |
|
691 |
http://www.myu.edu/org/admin/people#andy |
692 |
|
693 |
|
694 |
|
695 |
Berners-Lee 12 |
696 |
|
697 |
RFC XXXX Universal Resource Identifiers in WWW 12 March 1994 |
698 |
|
699 |
in which case the string "#andy" is not sent to the server, but is |
700 |
retained by the client and used when the whole object had been |
701 |
retrieved. |
702 |
|
703 |
A search on a text database might look like |
704 |
|
705 |
http://info.my.org/AboutUs/Index/Phonebook?dobbins |
706 |
|
707 |
and on another database |
708 |
|
709 |
http://info.cern.ch/RDB/EMP?*%20where%20name%%3Ddobbins |
710 |
|
711 |
In all cases the client passes the path string to the server |
712 |
uninterpreted, and for the client to deduce anything from |
713 |
|
714 |
FTP |
715 |
|
716 |
The ftp: prefix indicates that the FTP protocol is used, as defined |
717 |
in RFC957 or any successor. The port number, if present, gives the |
718 |
port of the FTP server if not the FTP default. |
719 |
|
720 |
User name and password |
721 |
|
722 |
The syntax allows for the inclusion of a user name and even a |
723 |
password for those systems which do not use the anonymous FTP |
724 |
convention. The default, however, if no user or password is |
725 |
supplied, will be to use that convention, viz. that the user name |
726 |
is "anonymous" and the password the user's Internet-style mail |
727 |
address . |
728 |
|
729 |
Where possible, this mail address should correspond to a usable |
730 |
mail address for the user, and preferably give a DNS host name |
731 |
which resolves to the IP address of the client. Note that servers |
732 |
currently vary in their treatment of the anonymous password. |
733 |
|
734 |
Path |
735 |
|
736 |
The FTP protocol allows for a sequence of CWD commands (change |
737 |
working directory) and a TYPE command prior to service commands |
738 |
such as RETR (retrieve) or NLIST (etc) which actually access a |
739 |
file. |
740 |
|
741 |
The arguments of any CWD commands are successive segment parts of |
742 |
the URL delimited by slash, and the final segment is suitable as |
743 |
the filename argument to the RETR command for retrieval or the |
744 |
directory argument to NLIST. |
745 |
|
746 |
For some file systems (Unix in particular), the "/" used to denote |
747 |
the hierarchical structure of the URL corresponds to the delimiter |
748 |
used to construct a file name hierarchy, and thus, the filename |
749 |
will look the same as the URL path. This does NOT mean that the URL |
750 |
is a Unix filename. |
751 |
|
752 |
|
753 |
Berners-Lee 13 |
754 |
|
755 |
RFC XXXX Universal Resource Identifiers in WWW 12 March 1994 |
756 |
|
757 |
Note: Retrieving subsequent URLs from the same host |
758 |
|
759 |
There is no common hierarchical model to the FTP protocol, so if a |
760 |
directory change command has been given, it is impossible in |
761 |
general to deduce what sequence should be given to navigate to |
762 |
another directory for a second retrieval, if the paths are |
763 |
different. The only reliable algorithm is to disconnect and |
764 |
reestablish the control connection. |
765 |
|
766 |
Data type |
767 |
|
768 |
The data content type of a file can only, in the general FTP case, |
769 |
be deduced from the name, normally the suffix of the name. This is |
770 |
not standardized. An alternative is for it to be transferred in |
771 |
information outside the URL. A suitable FTP transfer type (for |
772 |
example binary "I" or text "A") must in turn be deduced from the |
773 |
data content type. It is recommended that conventions for suffixes |
774 |
of public archives be established, but it is outside the scope of |
775 |
this standard. |
776 |
|
777 |
An FTP URL may optionally specify the FTP data transfer type by |
778 |
which an object is to be retrieved. Most of the methods correspond |
779 |
to the FTP "Data Types" ASCII and IMAGE for the retrieval of a |
780 |
document, as specified in FTP by the TYPE command . One method |
781 |
indicates directory access. |
782 |
|
783 |
The data type is specified by a suffix to the URL. Possible |
784 |
suffixes are: |
785 |
|
786 |
;type = <type-code> Use FTP type as given to perform data |
787 |
transfer. |
788 |
|
789 |
/ Use FTP directory list commands to read |
790 |
directory |
791 |
|
792 |
The type code is in the format defined in RFC959 except that THE |
793 |
SPACE IS OMITTED FROM THE URL. |
794 |
|
795 |
Transfer Mode |
796 |
|
797 |
Stream Mode is always used. |
798 |
|
799 |
Gopher |
800 |
|
801 |
The gopher URL specifies the host and optionally the port to which |
802 |
the client should connect. This is followed by a slash and a |
803 |
single gopher type code. This type code is used by the client to |
804 |
determine how to interpret the server's reply and is is not for |
805 |
sending to server. The command string to be sent to the server |
806 |
immediately follows the gopher type character. It consists of the |
807 |
gopher selector string followed by any "Gopher plus" syntax, but |
808 |
always omitting the trainling CR LF pair. |
809 |
|
810 |
|
811 |
Berners-Lee 14 |
812 |
|
813 |
RFC XXXX Universal Resource Identifiers in WWW 12 March 1994 |
814 |
|
815 |
When the gopher command string contains characters (such a |
816 |
embedded CR LF and HT characters) not allowed in a URL, these are |
817 |
encoded using the conventional encoding. |
818 |
|
819 |
|
820 |
|
821 |
Note that some gopher selector strings begin with a copy of the |
822 |
gopher type character, in which case that character will occur |
823 |
twice consecutively. Also note that the gopher selector string may |
824 |
be an empty string since this is how gopher clients refer to the |
825 |
top-level directory on a gopher server. |
826 |
|
827 |
If the encoded command string (with trailing CR LF stripped) would |
828 |
be void then the gopher type character may be omiited and "1" |
829 |
(ASCII 31 hex) is assumed. |
830 |
|
831 |
|
832 |
|
833 |
Note that slash "/" in gopher selector strings may not correspond |
834 |
to a level in a hierarchical structure. |
835 |
|
836 |
Mailto |
837 |
|
838 |
This allows a URL to specify an RFC822 addr-spec mail address. |
839 |
Note that use of % , for example as used in forming a gatewayed |
840 |
mail address, requires conversion to %25 in a URL. |
841 |
|
842 |
News |
843 |
|
844 |
The news locators refer to either news group names or article |
845 |
message identifiers which must conform to the rules for a |
846 |
Message-Idof RFC 1036 (Horton 1987). A message identifier may be |
847 |
distinguished from a news group name by the presence of the |
848 |
commercial at "@" character. These rules imply that within an |
849 |
article, a reference to a news group or to another article will be |
850 |
a valid URL (in the partial form). |
851 |
|
852 |
A news URL may be dereferenced using NNTP (RFC977, Kantor 86) (The |
853 |
ARTICLE by message-id command ) or using any other protocol for the |
854 |
conveyance of usenet news articles, or by reference to a body of |
855 |
news articles already received. |
856 |
|
857 |
Note1: |
858 |
|
859 |
Among URLs the "news" URLs are anomalous in that they are |
860 |
location-independent. They are unsuitable as URN candidates because |
861 |
the NNTP architecture relies on the expiry of articles and |
862 |
therefore a small number of articles being available at any time. |
863 |
When a news: URL is quoted, the assumption is that the reader will |
864 |
fetch the article or group from his or her local news host. News |
865 |
host names are NOT part of news URLs. |
866 |
|
867 |
Note 2: |
868 |
|
869 |
Berners-Lee 15 |
870 |
|
871 |
RFC XXXX Universal Resource Identifiers in WWW 12 March 1994 |
872 |
|
873 |
|
874 |
An outstanding problem is that the message identifier is |
875 |
insufficient to allow the retrieval of an expired article, as no |
876 |
algorithm exists for deriving an archive site and file name. The |
877 |
addition of the date and news group set to the article's URL would |
878 |
allow this if a directory existed of archive sites by news group. |
879 |
Suggested subject of study in conjunction with NNTP working group. |
880 |
Further extension possible may be to allow the naming of subject |
881 |
threads as addressable objects. |
882 |
|
883 |
Telnet, rlogin, tn3270 |
884 |
|
885 |
The use of URLs to represent interactive sessions is a convenient |
886 |
extension to their uses for objects. This allows access to |
887 |
information systems which only provide an interactive service, and |
888 |
no information server. As information within the service cannot be |
889 |
addressed individually or, in general, automatically retrieved, |
890 |
this is a less desirable, though currently common, solution. |
891 |
|
892 |
URN |
893 |
|
894 |
The "Universal Resource Name" is currently (March 1993) under |
895 |
development in the IETF. A requirements specification is in |
896 |
preparation. It currently looks as though it will be a short string |
897 |
suitable for encoding in URI syntax, for which case the "urn:" |
898 |
prefix is reserved. The URN shall be encoded precisely as defined |
899 |
in the (future) URN standard, except in that: |
900 |
|
901 |
If the official description of the URN syntax includes any |
902 |
constant wrapper characters, then they shall not be omitted from |
903 |
the URI encoding of the URN; |
904 |
|
905 |
If the URN has a hierarchical nature, then the slash delimiter |
906 |
shall be used in the URI encoding; |
907 |
|
908 |
If the URN has a hierarchical nature, the most significant part |
909 |
shall be encoded on the left in the URI encoding; |
910 |
|
911 |
Any characters with reserved meanings in the URI syntax shall be |
912 |
escape encoded |
913 |
|
914 |
These rules of course apply to any URI scheme. It is of course |
915 |
possible that the URN syntax will be chosen such that the URI |
916 |
encoding will be a 1-1 transcription. |
917 |
|
918 |
An example might be a name such as |
919 |
|
920 |
urn:/iana/dns/ch/cern/cn/techdoc/94/1642-3 |
921 |
|
922 |
but the reader should refer to the latest URN drafts or |
923 |
specifications. |
924 |
|
925 |
|
926 |
|
927 |
Berners-Lee 16 |
928 |
|
929 |
RFC XXXX Universal Resource Identifiers in WWW 12 March 1994 |
930 |
|
931 |
WAIS |
932 |
|
933 |
The current WAIS implementation public domain requires that a |
934 |
client know the "type" of a object prior to retrieval. This value |
935 |
is returned along with the internal object identifier in the search |
936 |
response. It has been encoded into the path part of the URL in |
937 |
order to make the URL sufficient for the retrieval of the object. |
938 |
Within the WAIS world, names do not of course need to be prefixed |
939 |
by "wais:" (by the partial form rules). |
940 |
|
941 |
The wpath of a WAIS URL consists of encoded fields of the WAIS |
942 |
identifier, in the same order as inthe WAIS identifier. For each |
943 |
field, the identifier field number is the digits before the equals |
944 |
sign, and the field contents follow, encoded in the conventional |
945 |
encoding, terminated by ";". |
946 |
|
947 |
|
948 |
|
949 |
file |
950 |
|
951 |
The other URI schemes (except nntp) share the property that they |
952 |
are equally valid at any geographical place. |
953 |
|
954 |
There is however a real practical requirement to be able to |
955 |
generate a URL for an object in a machine's local file system. |
956 |
|
957 |
The syntax is similar to the ftp syntax, but in this case the slash |
958 |
is used to donate boundaries between directory levels of a |
959 |
hierarchical file system is used. The "client" software converts |
960 |
the file URL into a file name in the local file name conventions. |
961 |
This allows local files to be treated just as network objects |
962 |
without any necessity to use a network server for access. This may |
963 |
be used for example for defining a user's "home" document in WWW. |
964 |
|
965 |
There is clearly a danger of confusion that a link made to a local |
966 |
file should be followed by someone on a different system, with |
967 |
unexpected and possibly harmful results. Therefore, the convention |
968 |
is that even a "file" URL is provided with a host part. This |
969 |
allows a client on another system to know that it cannot access the |
970 |
file system, or perhaps to use some other local mecahnism to access |
971 |
the file. |
972 |
|
973 |
The special value "localhost" is used in the host field to indicate |
974 |
that the filename should really be used on whatever host one is. |
975 |
This for example allows links to be made to files which are |
976 |
distribted on many machines, or to "your unix local password file" |
977 |
subject of course to consistency across the users of the data. |
978 |
|
979 |
A void host field is equivalent to "localhost". |
980 |
|
981 |
Message-Id |
982 |
|
983 |
For systems which include information transferred using mail |
984 |
|
985 |
Berners-Lee 17 |
986 |
|
987 |
RFC XXXX Universal Resource Identifiers in WWW 12 March 1994 |
988 |
|
989 |
protocols, there is a need to be able to make cross-references |
990 |
between different items of information, even though, by the nature |
991 |
of mail, those items are only available to a restricted set of |
992 |
people. |
993 |
|
994 |
Two schemes are defined. The first, "mid:", refers to the RFC822 |
995 |
Message-Id of a mail message. This Identifier is already used in |
996 |
RFC822 in for example the References and In-Reply-to field . The |
997 |
rest of the URL after the "mid:" is the RFC822 msg-id with the |
998 |
constant <> wrapper removed, leaving an identifier whose format |
999 |
in fact happens to be the same as addr-spec format for mailboxes |
1000 |
(though the semantics are different). |
1001 |
|
1002 |
The use of a "mid" URL implies access to a body of mail already |
1003 |
received. If a message has been distributed using NNTP or other |
1004 |
usenet protocols over the news system, then the "news:" form should |
1005 |
be used. |
1006 |
|
1007 |
Content-Id |
1008 |
|
1009 |
The second scheme, "cid:", is similar to "mid:" , but makes |
1010 |
reference to a body part of a MIME message by the value of its |
1011 |
content-id field. This allows, for example, a master document being |
1012 |
the first part of a multipart/related MIME message to refer to |
1013 |
component parts which are transferred in the same message. |
1014 |
|
1015 |
Note |
1016 |
|
1017 |
Beware however, that content identifiers are only required to be |
1018 |
unique within the context of a given MIME message, and so the cid: |
1019 |
URL is only meaningful with the context the same MIME message. For |
1020 |
a reference outside the message, it would need to be appended to |
1021 |
the message-id of the whole message. A syntax for this has not been |
1022 |
defined. |
1023 |
|
1024 |
Schemes for Further Study |
1025 |
|
1026 |
X500 |
1027 |
|
1028 |
The mapping of x500 names onto URLs is not defined here. A decision |
1029 |
is required as to whether "distinguished names" or "user friendly |
1030 |
names" (ufn), or both, should be allowed. If any punctuation |
1031 |
conversions are needed from the adopted x500 representation (such |
1032 |
as the use of slashes between parts of a ufn) they must be defined. |
1033 |
This is a subject for study. |
1034 |
|
1035 |
WHOIS |
1036 |
|
1037 |
This prefix describes the access using the "whois++" scheme in the |
1038 |
process of definition. The host name part is the same as for other |
1039 |
IP based schemes. The path part can be either a whois handle for a |
1040 |
whois object, or it can be a valid whois query string. This is a |
1041 |
|
1042 |
|
1043 |
Berners-Lee 18 |
1044 |
|
1045 |
RFC XXXX Universal Resource Identifiers in WWW 12 March 1994 |
1046 |
|
1047 |
subject for further study. |
1048 |
|
1049 |
NETWORK MANAGEMENT DATABASE |
1050 |
|
1051 |
This is a subject for study. |
1052 |
|
1053 |
NNTP |
1054 |
|
1055 |
This is an alternative form of reference for news articles, |
1056 |
specifically to be used with NNTP servers, and particularly those |
1057 |
incomplete server implementations which do not allow retrieval by |
1058 |
message identifier. In all other cases the "news" scheme should be |
1059 |
used. |
1060 |
|
1061 |
The news server name, newsgroup name, and index number of an |
1062 |
article within the newsgroup on that particular server are given. |
1063 |
The NNTP protocol must be used. |
1064 |
|
1065 |
Note1. |
1066 |
|
1067 |
This form of URL is not of global accessability, as typically NNTP |
1068 |
servers only allow access from local clients. Note that the |
1069 |
article numbers within groups vary from server to server. |
1070 |
|
1071 |
This form or URL should not be quoted outside this local area. It |
1072 |
should not be used within news articles for wider circulation than |
1073 |
the one server. This is a local identifier for a resource which is |
1074 |
often available globally, and so is not recommended except in the |
1075 |
case in which incomplete NNTP implementations on the local server |
1076 |
force its adoption. |
1077 |
|
1078 |
Prospero |
1079 |
|
1080 |
The Prospero (Neuman, 1991) directory service is used to resolve |
1081 |
the URL yielding an access method for the object (which can then |
1082 |
itself be represented as a URL if translated). The host part |
1083 |
contains a host name or internet address. The port part is |
1084 |
optional. |
1085 |
|
1086 |
The path part contains a host specific object name and an optional |
1087 |
version number. If present, the version number is separated from |
1088 |
the host specific object name by the characters "%00" (percent |
1089 |
zero zero), this being an escaped string terminator (null). |
1090 |
External Prospero links are represented as URLs of the underlying |
1091 |
access method and are not represented as Prospero URLs. |
1092 |
|
1093 |
Registration of naming schemes |
1094 |
|
1095 |
A new naming scheme may be introduced by defining a mapping onto a |
1096 |
conforming URL syntax, using a new prefix. Experimental prefixes |
1097 |
may be used by mutual agreement between parties, and must start |
1098 |
with the characters "x-". The scheme name "urn:" is reserved for |
1099 |
|
1100 |
|
1101 |
Berners-Lee 19 |
1102 |
|
1103 |
RFC XXXX Universal Resource Identifiers in WWW 12 March 1994 |
1104 |
|
1105 |
the work in progress on a scheme for more persistent names. |
1106 |
|
1107 |
It is proposed that the Internet Assigned Numbers Authority (IANA) |
1108 |
perform the function of registration of new schemes. Any submission |
1109 |
of a new URI scheme must include a definition of an algorithm for |
1110 |
the retrieval of any object within that scheme. The algorithm must |
1111 |
take the URI and produce either a set of URL(s) which will lead to |
1112 |
the desired object, or the object itself, in a well-defined or |
1113 |
determinable format. |
1114 |
|
1115 |
It is recommended that those proposing a new scheme demonstrate its |
1116 |
utility and operability by the provision of a gateway which will |
1117 |
provide images of objects in the new scheme for clients using an |
1118 |
existing protocol. If the new scheme is not a locator scheme, then |
1119 |
the properties of names in the new space should be clearly defined. |
1120 |
It is likewise recommended that, where a protocol allows for |
1121 |
retrieval by URL, that the client software have provision for being |
1122 |
configured to use specific gateway locators for indirect access |
1123 |
through new naming schemes. |
1124 |
|
1125 |
BNF OF GENERIC URI SYNTAX |
1126 |
|
1127 |
This is a BNF-like description of the URI syntax. at the level at |
1128 |
which specific schemes are not considered. |
1129 |
|
1130 |
A vertical line "|" indicates alternatives, and [brackets] |
1131 |
indicate optional parts. Spaces are represented by the word |
1132 |
"space", and the vertical line character by "vline". Single |
1133 |
letters stand for single letters. All words of more than one letter |
1134 |
below are entities described somewhere in this description. |
1135 |
|
1136 |
The "generic" production gives a higher level parsing of the same |
1137 |
URIs as the other productions. The "national" and "punctuation" |
1138 |
characters do not appear in any productions and therefore may not |
1139 |
appear in URIs. |
1140 |
|
1141 |
fragmentaddress uri [ # fragmentid ] |
1142 |
|
1143 |
uri scheme : path [ ? search ] |
1144 |
|
1145 |
scheme ialpha |
1146 |
|
1147 |
path void | xpalphas [ / path ] |
1148 |
|
1149 |
search xalphas [ + search ] |
1150 |
|
1151 |
fragmentid xalphas |
1152 |
|
1153 |
xalpha alpha | digit | safe | extra | escape |
1154 |
|
1155 |
xalphas xalpha [ xalphas ] |
1156 |
|
1157 |
|
1158 |
|
1159 |
Berners-Lee 20 |
1160 |
|
1161 |
RFC XXXX Universal Resource Identifiers in WWW 12 March 1994 |
1162 |
|
1163 |
xpalpha xalpha | + |
1164 |
|
1165 |
xpalphas xpalpha [ xpalpha ] |
1166 |
|
1167 |
ialpha alpha [ xalphas ] |
1168 |
|
1169 |
alpha a | b | c | d | e | f | g | h | i | j | k | |
1170 |
l | m | n | o | p | q | r | s | t | u | v | |
1171 |
w | x | y | z | A | B | C | D | E | F | G | |
1172 |
H | I | J | K | L | M | N | O | P | Q | R | |
1173 |
S | T | U | V | W | X | Y | Z |
1174 |
|
1175 |
digit 0 |1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
1176 |
|
1177 |
safe $ | - | _ | @ | . | & |
1178 |
|
1179 |
extra ! | * | " | ' | ( | ) | , |
1180 |
|
1181 |
reserved = | ; | / | # | ? | : | space |
1182 |
|
1183 |
escape % hex hex |
1184 |
|
1185 |
hex digit | a | b | c | d | e | f | A | B | C | |
1186 |
D | E | F |
1187 |
|
1188 |
national { | } | vline | [ | ] | \ | ^ | ~ |
1189 |
|
1190 |
punctuation < | > |
1191 |
|
1192 |
void |
1193 |
|
1194 |
(end of URI BNF) |
1195 |
|
1196 |
BNF for specific URL schemes |
1197 |
|
1198 |
This is a BNF-like description of the Uniform Resource Locator |
1199 |
syntax. A vertical line "|" indicates alternatives, and |
1200 |
[brackets] indicate optional parts. Spaces are represented by the |
1201 |
word "space", and the vertical line character by "vline". Single |
1202 |
letters stand for single letters. All words of more than one letter |
1203 |
below are entities described somewhere in this description. |
1204 |
|
1205 |
The current IETF URI working group preference is for the |
1206 |
prefixedurl production. (Nov 1993. July 93: url). |
1207 |
|
1208 |
The "national" and "punctuation" characters do not appear in any |
1209 |
productions and therefore may not appear in URLs. |
1210 |
|
1211 |
The "afsaddress" is left in as historical note, but is not a url |
1212 |
production |
1213 |
|
1214 |
prefixedurl u r l : url |
1215 |
|
1216 |
|
1217 |
Berners-Lee 21 |
1218 |
|
1219 |
RFC XXXX Universal Resource Identifiers in WWW 12 March 1994 |
1220 |
|
1221 |
ur l httpaddress | ftpaddress | newsaddress | |
1222 |
nntpaddress | prosperoaddress | telnetaddress |
1223 |
| gopheraddress | waisaddress | |
1224 |
mailtoaddress | midaddress | cidaddress |
1225 |
|
1226 |
scheme ialpha |
1227 |
|
1228 |
httpaddress h t t p : / / hostport [ / path ] [ ? |
1229 |
search ] |
1230 |
|
1231 |
ftpaddress f t p : / / login / path [ ftptype ] |
1232 |
|
1233 |
afsaddress a f s : / / cellname / path |
1234 |
|
1235 |
newsaddress n e w s : groupart |
1236 |
|
1237 |
nntpaddress n n t p : group / digits |
1238 |
|
1239 |
midaddress m i d : addr-spec |
1240 |
|
1241 |
cidaddress c i d : content-identifier |
1242 |
|
1243 |
mailtoaddress m a i l t o : : xalphas @ hostname |
1244 |
|
1245 |
waisaddress waisindex | waisdoc |
1246 |
|
1247 |
waisindex w a i s : / / hostport / database [ ? search |
1248 |
] |
1249 |
|
1250 |
waisdoc w a i s : / / hostport / database / wtype / |
1251 |
wpath |
1252 |
|
1253 |
wpath digits = path ; [ wpath ] |
1254 |
|
1255 |
groupart * | group | article |
1256 |
|
1257 |
group ialpha [ . group ] |
1258 |
|
1259 |
article xalphas @ host |
1260 |
|
1261 |
database xalphas |
1262 |
|
1263 |
wtype xalphas |
1264 |
|
1265 |
prosperoaddress prosperolink |
1266 |
|
1267 |
prosperolink p r o s p e r o : / / hostport / hsoname [ % |
1268 |
0 0 version [ attributes ] ] |
1269 |
|
1270 |
hsoname path |
1271 |
|
1272 |
version digits |
1273 |
|
1274 |
|
1275 |
Berners-Lee 22 |
1276 |
|
1277 |
RFC XXXX Universal Resource Identifiers in WWW 12 March 1994 |
1278 |
|
1279 |
attributes attribute [ attributes ] |
1280 |
|
1281 |
attribute alphanums |
1282 |
|
1283 |
telnetaddress t e l n e t : / / login |
1284 |
|
1285 |
gopheraddress g o p h e r : / / hostport [/ gtype [ |
1286 |
gcommand ] ] |
1287 |
|
1288 |
login [ user [ : password ] @ ] hostport |
1289 |
|
1290 |
hostport host [ : port ] |
1291 |
|
1292 |
host hostname | hostnumber |
1293 |
|
1294 |
ftptype A formcode | E formcode | I | L digits |
1295 |
|
1296 |
formcode N | T | C |
1297 |
|
1298 |
cellname hostname |
1299 |
|
1300 |
hostname ialpha [ . hostname ] |
1301 |
|
1302 |
hostnumber digits . digits . digits . digits |
1303 |
|
1304 |
port digits |
1305 |
|
1306 |
gcommand path |
1307 |
|
1308 |
path void | segment [ / path ] |
1309 |
|
1310 |
segment xpalphas |
1311 |
|
1312 |
search xalphas [ + search ] |
1313 |
|
1314 |
user alphanum2 [ user ] |
1315 |
|
1316 |
password alphanum2 [ password ] |
1317 |
|
1318 |
fragmentid xalphas |
1319 |
|
1320 |
gtype xalpha |
1321 |
|
1322 |
alphanum2 alpha | digit | - | _ | . | + |
1323 |
|
1324 |
xalpha alpha | digit | safe | extra | escape |
1325 |
|
1326 |
xalphas xalpha [ xalphas ] |
1327 |
|
1328 |
xpalpha xalpha | + |
1329 |
|
1330 |
xpalphas xpalpha [ xpalphas ] |
1331 |
|
1332 |
|
1333 |
Berners-Lee 23 |
1334 |
|
1335 |
RFC XXXX Universal Resource Identifiers in WWW 12 March 1994 |
1336 |
|
1337 |
ialpha alpha [ xalphas ] |
1338 |
|
1339 |
alpha a | b | c | d | e | f | g | h | i | j | k | |
1340 |
l | m | n | o | p | q | r | s | t | u | v | |
1341 |
w | x | y | z | A | B | C | D | E | F | G | |
1342 |
H | I | J | K | L | M | N | O | P | Q | R | |
1343 |
S | T | U | V | W | X | Y | Z |
1344 |
|
1345 |
digit 0 |1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
1346 |
|
1347 |
safe $ | - | _ | @ | . | & | + | - |
1348 |
|
1349 |
extra ! | * | " | ' | ( | ) | , |
1350 |
|
1351 |
reserved = | ; | / | # | ? | : | space |
1352 |
|
1353 |
escape % hex hex |
1354 |
|
1355 |
hex digit | a | b | c | d | e | f | A | B | C | |
1356 |
D | E | F |
1357 |
|
1358 |
national { | } | vline | [ | ] | \ | ^ | ~ |
1359 |
|
1360 |
punctuation < | > |
1361 |
|
1362 |
digits digit [ digits ] |
1363 |
|
1364 |
alphanum alpha | digit |
1365 |
|
1366 |
alphanums alphanum [ alphanums ] |
1367 |
|
1368 |
void |
1369 |
|
1370 |
(end of URL BNF) |
1371 |
|
1372 |
REFERENCES |
1373 |
|
1374 |
Alberti, R., et.al. (1991) |
1375 |
"Notes on the Internet Gopher Protocol" |
1376 |
University of Minnesota, December 1991, |
1377 |
<ftp://boombox.micro.umn.edu/pub/gopher/ |
1378 |
gopher_protocol> . See also |
1379 |
<gopher://gopher.micro.umn.edu/00/Information |
1380 |
About Gopher/About Gopher> |
1381 |
|
1382 |
Berners-Lee, T ., (1991) |
1383 |
"Hypertext Transfer Protocol (HTTP)" , CERN, |
1384 |
December 1991, as updated from time to time, |
1385 |
<ftp://info.cern.ch/pub/www/doc/http-spec.txt |
1386 |
> |
1387 |
|
1388 |
Crocker "Standard for ARPA Internet Text Messages" . |
1389 |
David H. Crocker, RFC822, |
1390 |
|
1391 |
Berners-Lee 24 |
1392 |
|
1393 |
RFC XXXX Universal Resource Identifiers in WWW 12 March 1994 |
1394 |
|
1395 |
|
1396 |
Davis, F, et al., (1990) |
1397 |
"WAIS Interface Protocol: Prototype |
1398 |
Functional Specification", Thinking Machines |
1399 |
Corporation, April 23, 1990 |
1400 |
<ftp://quake.think.com/pub/wa |
1401 |
is/doc/protspec.txt> |
1402 |
|
1403 |
International Standards Organization, (1991) |
1404 |
Information and Documentation - Search and |
1405 |
Retrieve Application Protocol Specification |
1406 |
for open Systems Interconnection, ISO-10163 |
1407 |
|
1408 |
Horton (1987) M. Horton, R. Adams, "Standard for |
1409 |
interchange of USENET messages", Internet RFC |
1410 |
1036 , 12/01/1987. |
1411 |
|
1412 |
Huitema, C., (1991) "Naming: strategies and techniques", |
1413 |
Computer Networks and ISDN Systems 23 (1991) |
1414 |
107-110. |
1415 |
|
1416 |
Kahle, Brewster, (1991) |
1417 |
"Document Identifiers, or International |
1418 |
Standard Book Numbers for the Electronic |
1419 |
Age", |
1420 |
<ftp: |
1421 |
//quake.think.com/pub/wais/doc/doc-ids.txt> |
1422 |
|
1423 |
Kantor, B., and Lapsley, P., (1986) |
1424 |
"A proposed standard for the stream-based |
1425 |
transmission of news" , Internet RFC-977, |
1426 |
February 1986. |
1427 |
<ftp://ds.internic.net/rfc/rfc977.txt> |
1428 |
|
1429 |
Kunze, 1994 J. Kunze, Requirements for URLs, to be |
1430 |
published. |
1431 |
|
1432 |
Lynch, C., Coallition for Networked Information: (1991) |
1433 |
"Workshop on ID and Reference Structures for |
1434 |
Networked Information", November 1991. See |
1435 |
<wais://quake.think.com/wais-discussion-ar |
1436 |
chives?lynch> |
1437 |
|
1438 |
Mockapetris, P., (1987) |
1439 |
"Domain names + concepts and facilities", |
1440 |
RFC-1034, USC-ISI, November 1987, |
1441 |
<ftp://ds.internic.net/rfc/rfc1034.txt> |
1442 |
|
1443 |
Neuman, B. Clifford, (1992) |
1444 |
"Prospero: A Tool for Organizing Internet |
1445 |
Resources", Electronic Networking: Research, |
1446 |
Applications and Policy, Vol 1 No 2, Meckler |
1447 |
|
1448 |
|
1449 |
Berners-Lee 25 |
1450 |
|
1451 |
RFC XXXX Universal Resource Identifiers in WWW 12 March 1994 |
1452 |
|
1453 |
Westport CT USA. See also |
1454 |
<ftp://prospero.isi.edu/pub/prospero/oir.ps> |
1455 |
|
1456 |
Postel, J. and Reynolds, J. (1985) |
1457 |
"File Transfer Protocol (FTP)", Internet |
1458 |
RFC-959, October 1985. |
1459 |
<ftp://ds.internic.net/rfc/rfc959.txt> |
1460 |
|
1461 |
Sollins 1994 K. Sollins and L. Masinter, Requiremnets for |
1462 |
URNs, to be published. |
1463 |
|
1464 |
Yeong, W., (1991a) "Towards Networked Information Retrieval", |
1465 |
Technical report 91-06-25-01, June 1991, |
1466 |
Performance Systems International, Inc. |
1467 |
<ftp://uu.psi.com/wp/nir.txt> |
1468 |
|
1469 |
Yeong, W., (1991b), "Representing Public Archives in the |
1470 |
Directory", Internet Draft, November 1991, |
1471 |
now expired. |
1472 |
|
1473 |
. |
1474 |
|
1475 |
AUTHOR'S ADDRESS |
1476 |
|
1477 |
Tim Berners-Lee |
1478 |
|
1479 |
Address: World-Wide Web project |
1480 |
|
1481 |
CERN, |
1482 |
|
1483 |
1211 Geneva 23, |
1484 |
|
1485 |
Switzerland |
1486 |
|
1487 |
|
1488 |
Telephone: +41 (22)767 3755 |
1489 |
|
1490 |
Fax: +41 (22)767 7155 |
1491 |
|
1492 |
Email: timbl@info.cern.ch |
1493 |
|
1494 |
|
1495 |
|
1496 |
|
1497 |
|
1498 |
|
1499 |
|
1500 |
|
1501 |
|
1502 |
|
1503 |
|
1504 |
|
1505 |
|
1506 |
|
1507 |
Berners-Lee 26 |
1508 |
|