1 Uniform Resource Locators (URL) Tim Berners-Lee
2 draft-ietf-uri-url-02.{ps,txt} CERN
3 Expires 1 July 1994 1 Jan 1994
4
5
6 Uniform Resource Locators (URL)
7
8 A Unifying Syntax for the Expression of
9 Names and Addresses of Objects on the Network
10
11
12 Status of this memo
13
14 This document is an Internet Draft. Internet Drafts are working
15 documents of the Internet Engineering Task Force (IETF), its Areas,
16 and its Working Groups. Note that other groups may also distribute
17 working documents as Internet Drafts.
18
19 Internet Drafts are working documents valid for a maximum of six
20 months. Internet Drafts may be updated, replaced, or obsoleted by
21 other documents at any time. It is not appropriate to use Internet
22 Drafts as reference material or to cite them other than as a
23 "working draft" or "work in progress".
24
25 Distribution of this document is unlimited. Please send comments
26 to the author at timbl@info.cern.ch or to the discussion list
27 ietf-url@merit.edu.
28
29 Abstract
30
31 Many protocols and systems for document search and retrieval are
32 currently in use, and many more protocols or refinements of
33 existing protocols are to be expected in a field whose expansion is
34 explosive.
35
36 These systems are aiming to achieve global search and readership of
37 documents across differing computing platforms, and despite a
38 plethora of protocols and data formats. As protocols evolve,
39 gateways can allow global access to remain possible. As data
40 formats evolve, format conversion programs can preserve global
41 access. There is one area, however, in which it is impractical to
42 make conversions, and that is in the names and addresses used to
43 identify objects. This is because names and addresses of objects
44 are passed on in so many ways, from the backs of envelopes to
45 hypertext objects, and may have a long life.
46
47 A common feature of almost all the data models of past and proposed
48 systems is something which can be mapped onto a concept of "object"
49 and some kind of name, address, or identifier for that object. One
50 can therefore define a set of name spaces in which these objects
51 can be said to exist.
52
53 Practical systems need to access and mix objects which are part of
54
55
56
57 Berners-Lee 1
58
59 different existing and proposed systems.
60
61 This paper discusses the requirements on a universal syntax which
62 can be used to encapsulate a name in any registered name space.
63 This will allow names in different spaces to be treated in a common
64 way, even though names in different spaces have differing
65 characteristics, as do the objects to which they refer.
66
67 The universal syntax applies to objects available using existing
68 protocols, and may be extended with technology. This paper makes a
69 recommendation for a generic syntax, and for specific forms for
70 "Uniform Resource Locators" (URLs) of objects accessible using
71 existing Internet protocols.
72
73 The syntax has been in widespread use by World-Wide Web software
74 since 1990.
75
76 Terms
77
78 The objects on the network which are to be named and addressed
79 include typically objects which can be retrieved, and objects which
80 can be searched. There is a great variety of other objects which
81 may support other operations. We imply nothing about the contents
82 of objects in this document. Whereas human-readable documents are
83 currently the center of interest of the field, we envisage all
84 aspects discussed in this paper applying to generalized objects
85 when systems to handle them become available. The "object" is the
86 unit of reference and need not correspond to any unit of storage.
87 We refer to objects which can be searched as "indexes". We
88 emphasize that this is the abstract view of the client, and these
89 objects need not correspond to physical files on computers. We
90 refer to the person who does the retrieval or searching as the
91 user.
92
93 Within this document, we use the term "name" very generally for a
94 string of characters describing an object, whatever its
95 combination of properties mentioned below. (The term usually has a
96 narrower meaning but we needed some term for the universal set.)
97 This uniform syntax applied to a generic name is known as a Uniform
98 Resource Identifier (URI). The term "address" is reserved for a
99 string which specifies a more or less physical location. The term
100 "locator" refers to a URL as here defined. URIs which have a
101 greater persistence than URLs are referred to as URNs.
102
103 Characteristics
104
105 This section discusses characteristics of various naming schemes,
106 requirements which some existing schemes meet, and requirements
107 for the URL scheme itself, as an introduction to and background
108 for the Recommendations section.
109
110 USES OF NAMES AND ADDRESSES
111
117 A name allows a user, with the help of a "client" program, to
118 retrieve or operate on objects via a "server" program. A name may
119 be passed for example:
120
121 In communication of any form between two people, to refer to a
122 document, or part of a document;
123
124 As part of the description of a link associated with a hypertext
125 document;
126
127 As part of the result of searching an index.
128
129 Some typical requirements on a name which are met to a varying
130 degree by various schemes are for example that the name is
131
132 Persistent A given name will remain valid as long as it
133 is needed;
134
135 Extensible A given naming syntax will remain valid
136 through the introduction of new protocols and
137 directory technologies;
138
139 Resolvable A name will contain enough information to
140 allow the document or index to which it
141 refers to be accessed, perhaps via resolution
142 into an intermediate, more physical, name.
143
144 Unique Each object can only have one such name.
145 The fact that two such names are different
146 implies that the objects to which they refer
147 are different (in some way).
148
149 Unambiguous The fact that two names are identical
150 implies that the objects named are the same
151 (in some way).
152
153 The syntax discussed is the syntax of one name, be it a lasting
154 name or a physical address. When a directory server or hypertext
155 link contains a set of alternative names, then that is beyond the
156 scope of this syntax. Similarly, a syntax for describing a
157 compound object is outside the scope of this syntax. The specific
158 locator name spaces (defined under the umbrella of the general
159 syntax) each meet the requirements above to a greater or lesser
160 extent.
161
162 CURRENT PRACTICE
163
164 Current protocols use many different standards for names. For some
165 protocols, such as ISO-10163 Search and Retrieve protocol[16], the
166 names returned in a search are only valid during the session. For
167 others, such as FTP[9], they are lasting names which may be used
168 for object retrieval at a later time. Typically, however, they are
169 not long-lasting names which are independent of the location of the
175 object. Such names may be provided using directory servers such as
176 x.500. They will refer to the registration, however formal or
177 informal, of an object with a particular organisation or person.
178 Both hypertext and manual references rely on long-lasting names.
179 Current names are basically location specifiers (addresses). These
180 may be known as Uniform Resource Locators (URLs). They give the
181 necessary parts of an address for a reader to access an information
182 provider using the given protocol, and ask for the object required.
183 Examples of names used by various protocols include
184
185 File Transfer Protocol (Postel 1985):
186
187 Host name or IP-address
188
189 [TCP port]
190
191 [user name, password]
192
193 Filename
194
195 W.A.I.S. (Kahle 1990)
196
197 Host name or IP-address
198
199 [TCP port]
200
201 local document id
202
203 Gopher (Alberti 1991)
204
205 Host name or IP-address
206
207 [TCP port]
208
209 database name
210
211 selector string
212
213 HTTP (Berners-Lee 1991)
214
215 Host name or IP-address
216
217 [TCP port]
218
219 local object id
220
221 NNTP (Kantor 1986)
222
223 NNTP group
224
225 Group name
226
227 NNTP article
228
233 Host name
234
235 unique message identifier
236
237 Prospero links (Neuman 1992)
238
239 Host name or IP address
240
241 [UDP port]
242
243 Host specific object name
244
245 [version]
246
247 [identifier]*
248
249 x.500 distinguished name
250
251 Country
252
253 Organisation
254
255 Organisational unit
256
257 Person
258
259 Local object identifier
260
261 Other systems with their own naming schemes include BITNET
262 "LISTSERV" application, FTAM file retrieval, SQLnet(TM) remote
263 database search, proprietary distributed file systems, etc.
264 Conventional syntax for writing these addresses involves various
265 forms of punctuation to separate these parts. This sometimes, but
266 not always, allows the naming scheme to be deduced from the
267 punctuation. For example, a name of the form
268 xxx.yyy.zz.edu:/pub.aa.bb.cc often implies anonymous FTP access.
269 However, there is no well-defined algorithm for parsing an
270 arbitrary name, as there is no common syntax.
271
272 EXPANDABILITY
273
274 There will necessarily be a phase during which lasting names will
275 become more common, as the deployment of directory services
276 increases to the point where every user has direct or indirect
277 access to one. Even then, however, one can envisage more than one
278 competing directory system, and cases in which physical names are
279 still required. A directory service takes a lasting name and
280 reduces it to a physical address (or set of addresses) which,
281 though less useful for lasting reference, is the only way to
282 actually retrieve the object. An addressing syntax is required
283 which will be able to encompass existing physical address spaces,
284 and be extendible to any future protocols. This requires that it
285 contain an identifier for the protocol in use. The format of the
291 rest of the address will necessarily depend to a certain extent on
292 the protocol.
293
294 RELEVANCE
295
296 The life of a name is limited by any information contained within
297 it which may become prematurely invalid. It is therefore necessary
298 to limit the contents of a name to the information required for the
299 operations above. Other extraneous information about the object
300 (its size, data format, authorisation details, etc.) may in general
301 change with time and should not be part of the name. One might
302 expect such information to be part of the "header" of an object, and
303 for protocols to allow the header information to be retrieved
304 independently of the objects themselves. Any physical address may
305 be subject to change with time: hence we encourage the move to
306 lasting names and directory services.
307
308 UNIQUENESS
309
310 Clearly one requires unambiguous names in the sense that one name
311 should refer to only one logical object. This is the case with all
312 the addressing schemes in use, whether they are directory systems
313 or physical addresses. (The internet addresses all rely on the
314 domain name (Mockapetris 1987) of the host to achieve this).
315 However, given that names can be translated, many apparently
316 different names may lead to the same object. Any object may
317 therefore be referred to by many names. One needs to be able to
318 know whether two objects, retrieved through different paths, are
319 in fact the same object. It is suggested that each object have a
320 unique "official" name. This name could be stored in the object in
321 some representations, or stored in a database accessible to the
322 server, for example. Any references within that object should be
323 parsed in the context of the official name. In the presence of a
324 directory service, the official name will normally be the
325 registered name of the object. However, a name in any scheme will
326 do, so long as it is completely specified. On systems which do not
327 allow the name to be stored (such as anonymous FTP archive sites),
328 a possible ambiguity will always exist as to whether two similarly
329 named objects are in fact the same. Note that Internet newsgroup
330 names are unique world-wide, and news articles carry a unique
331 message id. In most other cases, however, there is no guarantee
332 that dereferencing a URL will work, or that if it does the object
333 it refers to will in fact be the object intended. URLs such as FTP
334 addresses are transient in that files may be moved and even
335 replaced by different files of the same name. This disorganisation
336 may be limited by good server management, but a naming scheme which
337 is independent also of internet host name is obviously preferable.
338
339 READABILITY BY PEOPLE
340
341 This requirement has been put forward by several people (Clifford
342 Lynch, Douglas Engelbart among others), and disputed by others.
343 The author's view is that it will be a while before technology and
349 standardisation have reached the point at which names and addresses
350 will be hidden from human beings. As long as they must be written
351 on the backs of envelopes and "cut and pasted" between workstation
352 windows, there is a strong need for names to be
353
354 Short
355
356 Composed of printable (preferably non-white) characters
357
358 To a certain extent, understandable by a human being.
359
360 STRUCTURE OF NAMES AND ADDRESSES
361
362 A physical address is required in order for:
363
364 The user's program to contact the server;
365
366 The server to perform the operation (e.g. search and index,
367 retrieve an object, or look up the name) and return a result;
368
369 The user's program to locate an individual position or element
370 within a returned object.
371
372 This suggests that a name be structured, such that the parts
373 necessary for these three operations be separate and only used by
374 those system elements which need those parts. This corresponds to
375 the basic principle of information hiding. In fact, four parts
376 are necessary, including the indicator of the naming scheme to be
377 used:
378
379 The naming scheme: a registered identifier for the protocol.
380
381 The name of a suitable server. The format of this part must be
382 well defined. It will depend on the lower-layer protocols in
383 use. Systems which use widely distributed information, such as
384 x.500 and NNTP, do not need this part as each client generally
385 contacts his nearest server (or a particular server).
386
387 Information to be passed to the server. This may be private to
388 the server, as all names may be generated and used by the same
389 server. This part of the name should be opaque to the client.
390
391 Information to be used by the application once the object has
392 been retrieved. This part is private to the application (or,
393 more strictly, the data format) and so cannot be defined here.
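
The four parts listed above can be pulled apart mechanically once
punctuation is fixed. The following is a minimal sketch only, not a
definitive parser. It assumes the colon and double slash used later in
the Recommendations, and it assumes that a "#" introduces the part used
by the application after retrieval; the names NameParts and split_name
and the host xx.yy.edu are purely illustrative.

```python
from typing import NamedTuple, Optional


class NameParts(NamedTuple):
    scheme: str               # part 1: the naming scheme
    server: Optional[str]     # part 2: a suitable server, if the scheme needs one
    server_info: str          # part 3: passed to the server, opaque to the client
    fragment: Optional[str]   # part 4: used by the application after retrieval


def split_name(name: str) -> NameParts:
    # Assumption: "#" introduces the application-private part.
    rest, hash_mark, fragment = name.partition("#")
    # The scheme is separated from the rest of the name by a colon.
    scheme, _, rest = rest.partition(":")
    server = None
    if rest.startswith("//"):
        # Assumption: "//" introduces the server name, which runs up to
        # the following "/".
        server, slash, tail = rest[2:].partition("/")
        rest = slash + tail
    return NameParts(scheme, server, rest, fragment if hash_mark else None)
```

A scheme such as news, which needs no server part, simply yields None
for that field, which matches the remark above about widely distributed
systems.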
394
395 Both lasting names and physical addresses often share a
396 hierarchical structure. This follows often from the organisation of
397 the system. From the naming point of view, it has the advantage
398 that a reference in one object to another object need not include
399 that part of the structure which is common to both names.
400
401 CHOICES FOR A UNIVERSAL SYNTAX
402
407 The requirements above leave little room for choice save for the
408 order and punctuation of the elements of an address. It is only
409 reasonable for the order of writing of the parts to be consistently
410 from left to right (or right to left) with increasing specificity.
411 Punctuation schemes fall into two categories (Huitema 1991): tagged
412 schemes in which fields are given names, and schemes which use
413 special characters and field order. The latter tend to be more
414 compact schemes.
415
416
417 protocol: aftp host: xxx.yyy.edu path: /pub/doc/README
420
421 PR=aftp; H=xx.yy.edu; PA=/pub/doc/README;
422
423 PR:aftp/xx.yy.edu/pub/doc/README
424
425 /aftp/xx.yy.edu/pub/doc/README
426
427 Fig 1. Some alternative tagged and untagged representations
428
429 The choice of special symbols for punctuation tends to be a matter
430 of taste. It is easier to read addresses whose symbols correspond
431 to those of one's favourite operating system. A variety of symbols
432 is needed so that when a name is abbreviated it is possible to tell
433 which parts have been omitted.
434
435 The recommendation below uses special characters in order to
436 achieve a compact name, and uses where possible punctuation symbols
437 established in the internet or unix community.
438
439 The choice of escape character for introducing representations of
440 non-allowed characters also tends to be a matter of taste. An ANSI
441 standard exists in the C language, using the back-slash character
442 "\". The use of this character on unix command lines, however, can
443 be a problem as it is interpreted by many shell programs, and would
444 have itself to be escaped.
445
446 There is a conflict between the need to be able to represent many
447 characters including spaces within a URL directly, and the need to
448 be able to use a URL in environments which have limited character
449 sets or in which certain characters are prone to corruption. This
450 conflict has been resolved by use of a hexadecimal escaping method
451 which may be applied to any characters forbidden in a given
452 context. When URLs are moved between contexts, the set of
453 characters escaped may be enlarged or reduced unambiguously.
454
455 The use of multiple white space characters is discouraged in URLs
456 to be printed or sent by electronic mail. This is because of the
457 frequent introduction of extraneous white space when lines are
458 wrapped by systems such as mail, or sheer necessity of narrow
459 column width, and because of the inter-conversion of various forms
465 of white space which occurs during character code conversion and
466 the transfer of text between applications.
467
468 Recommendations
469
470 This section describes the syntax for "Uniform Resource Locators"
471 (URLs): that is, basically physical addresses of objects which are
472 retrievable using protocols already deployed on the net. The
473 generic syntax provides a framework for new schemes for names to be
474 resolved using as yet undefined protocols.
475
476 The syntax is described in two parts. Firstly, we give the syntax
477 rules of a completely specified name; secondly, we give the rules
478 under which parts of the name may be omitted in a well-defined
479 context.
480
481 FULL FORM
482
483 A complete URL consists of a naming scheme specifier followed by a
484 string whose format is a function of the naming scheme. For
485 locators of information on the internet, a common syntax is used
486 for the IP address part. A BNF description of the URL syntax is
487 given in a later section. The components are as follows.
488 Fragment identifiers and partial URLs are not involved in the basic
489 URL definition.
490
491 SCHEME
492
493 Within the URL of an object, the first element is the name of the
494 scheme, separated from the rest of the object by a colon. The rest
495 of the URL follows the colon in a format depending on the scheme.
496
497 Internet protocol parts
498
499 Those schemes which refer to internet protocols have a common
500 syntax for the rest of the object name. This starts with a double
501 slash "//" to indicate its presence, and continues until the
502 following slash "/". Within that section are
503
504 An optional user name,
505 if this must be quoted to the server,
506 followed by a commercial at sign "@". (Use
507 of this field is discouraged. Provision of
508 encoding a password after the user name,
509 delimited by a colon, could be made but
510 obviously is only useful when the password is
511 public, in which case it should not be
512 necessary, so that is also discouraged.)
513
514 The internet domain name
515 of the host in RFC1037 format (or,
516 optionally and less advisably, the IP address
517 as a set of four decimal numbers)
518
523 The port number, if it is not the default number for the
524 protocol, is given in decimal notation after
525 a colon.
526
527 Path The rest of the locator is known as the
528 "path". It may define details of how the
529 client should communicate with the server,
530 including information to be passed
531 transparently to the server without any
532 processing by the client.
533
534 The path is interpreted in a manner dependent on the protocol being
535 used. However, when it contains slashes, these must imply a
536 hierarchical structure.
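
As an illustration of the internet protocol parts just described, the
sketch below separates the optional user name, the host, the optional
port and the path. It is a rough sketch under the rules stated above
and nothing more; the function name parse_internet_part, the host
xx.yy.edu and the user "guest" are hypothetical, and a password, if
present, is deliberately left attached to the user name.

```python
from typing import NamedTuple, Optional


class InternetParts(NamedTuple):
    user: Optional[str]   # optional user name (its use is discouraged)
    host: str             # domain name or IP address of the host
    port: Optional[int]   # None means "use the protocol default"
    path: str             # interpreted according to the protocol


def parse_internet_part(rest: str) -> InternetParts:
    """Parse what follows "scheme:" for schemes that use internet protocols."""
    if not rest.startswith("//"):
        raise ValueError("no internet protocol part present")
    # The section runs from the double slash up to the following slash.
    section, slash, tail = rest[2:].partition("/")
    user = None
    if "@" in section:
        user, _, section = section.rpartition("@")
    host, _, port_text = section.partition(":")
    port = int(port_text) if port_text else None
    return InternetParts(user, host, port, slash + tail)
```

For example, parse_internet_part("//xx.yy.edu:8001/pub/doc/README")
gives a host of "xx.yy.edu", a port of 8001 and a path of
"/pub/doc/README".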
537
538 ENCODING PROHIBITED CHARACTERS
539
540 When a system uses a local addressing scheme, it is useful to
541 provide a mapping from local addresses into URLs so that references
542 to objects within the addressing scheme may be referred to
543 globally, and possibly accessed through gateway servers.
544
545 Any mapping scheme may be defined provided it is unambiguous,
546 reversible, and provides valid URLs. It is recommended that where
547 hierarchical aspects to the local naming scheme exist, they be
548 mapped onto the hierarchical URL path syntax in order to allow the
549 partial form to be used.
550
551 The following encoding method shall be used for mapping WAIS, FTP,
552 Prospero and Gopher addresses onto URLs. Where the local naming
553 scheme uses ASCII characters which are not allowed in the URL,
554 these may be represented in the URL by a percent sign "%" followed
555 by two hexadecimal digits (0-9, A-F) giving the ISO Latin 1 code
556 for that character. Character codes other than those allowed by
557 the syntax shall not be used in a URL.
558
559 The same encoding method may be used for encoding characters whose
560 use, although technically allowed in a URL, would be unwise due to
561 problems of corruption by imperfect gateways or misrepresentation
562 due to the use of variant character sets, or which would simply be
563 awkward in a given environment. As a % sign always indicates an
564 encoded character, a URL may be made safer simply by encoding any
565 characters considered unsafe, while leaving already encoded
566 characters still encoded.
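
The encoding method can be sketched as follows. This is an
illustration under stated assumptions, not a normative implementation:
the set SAFE of characters left unencoded is a guess made for the
example (the real set is fixed by the BNF), and the names escape_char,
encode, make_safer and decode are hypothetical.

```python
import re
import string

# Assumption: an illustrative set of characters allowed unencoded.
SAFE = set(string.ascii_letters + string.digits + "$-_.+/")

# Matches either an existing %XX escape or any single character.
_TOKEN = re.compile(r"%[0-9A-Fa-f]{2}|.", re.DOTALL)


def escape_char(ch):
    """Return "%" followed by the two-digit hex ISO Latin 1 code of ch."""
    code = ord(ch)
    if code > 0xFF:
        raise ValueError("no ISO Latin 1 code for %r" % ch)
    return "%%%02X" % code


def encode(text, safe=SAFE):
    """Map a local name onto URL characters, escaping everything not safe."""
    return "".join(ch if ch in safe else escape_char(ch) for ch in text)


def make_safer(url, unsafe):
    """Escape further characters while leaving existing escapes untouched."""
    def fix(match):
        token = match.group(0)
        if len(token) == 1 and token in unsafe:
            return escape_char(token)
        return token
    return _TOKEN.sub(fix, url)


def decode(url):
    """Reverse the encoding: each %XX becomes the character with that code."""
    return re.sub(r"%([0-9A-Fa-f]{2})",
                  lambda m: chr(int(m.group(1), 16)), url)
```

Because make_safer skips over existing %XX sequences, it can be applied
repeatedly as a URL moves between contexts without double-encoding what
is already encoded.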
567
568 (Note: If a new naming scheme is introduced which encodes binary
569 data as opposed to text, then a more compact encoding such as pure
570 hex or base 64 would be more appropriate.)
571
572 The same considerations apply to mapping local fragment identifiers
573 onto the fragmentid part of a URL.
574
575 Specific Schemes
576
581 The mapping for some existing standard and experimental protocols
582 is outlined in the BNF syntax definition. Notes on particular
583 protocols follow.
584
585 HTTP
586
587 The HTTP protocol specifies that the path is handled transparently
588 by those who handle URLs, except for the servers which de-reference
589 them. The path is passed by the client to the server with any
590 request, but is not otherwise understood by the client. The
591 fragmentid part is not sent with the request. The search part, if
592 present, is sent. Spaces in URLs should be escaped for transmission
593 in HTTP.
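
A client-side sketch of these rules follows. It assumes that "#"
introduces the fragmentid and that the search part travels with the
path after a "?", neither of which is spelled out in this section; the
function name http_request_path and the host xx.yy.edu are
hypothetical.

```python
def http_request_path(url):
    """Return the part of an http URL that the client sends to the server."""
    # Assumption: "#" introduces the fragmentid, which is never sent.
    without_fragment = url.partition("#")[0]
    scheme, _, rest = without_fragment.partition(":")
    if scheme != "http" or not rest.startswith("//"):
        raise ValueError("not an http URL: %r" % url)
    # Skip the host (and port); what follows is passed on untouched,
    # including any search part, except that spaces are escaped.
    _host, slash, tail = rest[2:].partition("/")
    return (slash + tail).replace(" ", "%20")
```

Apart from escaping spaces, the client does not interpret the path in
any way, in keeping with the transparency requirement above.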
594
595 FTP
596
597 The ftp: prefix indicates a file which is to be picked up from the
598 file system of the given host. The FTP protocol is used. The port
599 number if given gives the port of the FTP server if not the FTP
600 default. (A client may in practice use local file access to
601 retrieve objects which are available through more efficient means
602 such as local file open or NFS mounting, where this is available
603 and equivalent).
604
605 The syntax allows for the inclusion of a user name and even a
606 password for those systems which do not use the anonymous FTP
607 convention. The default, however, if no user or password is
608 supplied, will be to use that convention, viz. that the user name
609 is "anonymous" and the password the user's mail address.
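
The default just described amounts to a small rule, sketched here. The
function name ftp_login and the mail address shown in the test are
hypothetical; how a client learns the user's mail address is outside
this sketch.

```python
def ftp_login(user, password, mail_address):
    """Choose the FTP login pair for a URL's optional user and password."""
    if user is None and password is None:
        # Anonymous FTP convention: user "anonymous", password the
        # user's mail address.
        return ("anonymous", mail_address)
    return (user, password)
```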
610
611 The adoption of a unix-style syntax involves the conversion into
612 non-unix local forms by either the client or server. Some non-unix
613 servers do this, but clients wishing to access sites which do not
614 have unix-style naming will need certain algorithms to enable
615 other file systems to be identified and treated. Client software
616 may also have to be flexible in terms of the sequence of FTP
617 commands used with different varieties of server. In view of a
618 tendency for file systems to look increasingly similar, it was felt
619 that the URL convention should not be weighed down by extra
620 mechanisms for identifying these cases.
621
622 The data format of a file can only, in the general FTP case, be
623 deduced from the name, normally the suffix of the name. This is not
624 standardized. An alternative is for it to be transferred in
625 information outside the URL. The transfer mode (binary or text)
626 must in turn be deduced from the data format. It is recommended
627 that conventions for suffixes of public archives be established,
628 but this is outside the scope of this paper.
629
630 NEWS
631
632 The news locators refer to either news group names or article
633 message identifiers which must conform to the rules of RFC 850. A
639 message identifier may be distinguished from a news group name by
640 the presence of the commercial at "@" character. These rules imply
641 that within an article, a reference to a news group or to another
642 article will be a valid URL (in the partial form).
643
644 A news URL may be dereferenced using NNTP or using any other
645 protocol for the conveyance of usenet news articles.
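
The test for the "@" character can be sketched as below. This is only
an illustration; the function name classify_news_url and the example
group name and message identifier are invented for the purpose.

```python
def classify_news_url(url):
    """Tell a news group name from an article message identifier."""
    scheme, colon, rest = url.partition(":")
    if not colon or scheme != "news":
        raise ValueError("not a news URL: %r" % url)
    # A message identifier is distinguished by the presence of "@".
    kind = "article" if "@" in rest else "group"
    return (kind, rest)
```

Note that no host name is extracted: the reader's local news host is
assumed to supply the group or article.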
646
647 Note 1:
648
649 Among URLs the news: URLs are anomalous in that they are
650 location-independent. They are unsuitable as URN candidates because
651 the NNTP architecture relies on the expiry of articles and
652 therefore a small number of articles being available at any time.
653 When a news: URL is quoted, the assumption is that the reader will
654 fetch the article or group from his or her local news host. News
655 host names are NOT part of news URLs.
656
657 Note 2:
658
659 An outstanding problem is that the message identifier is
660 insufficient to allow the retrieval of an expired article, as no
661 algorithm exists for deriving an archive site and file name. The
662 addition of the date and news group set to the article's URL would
663 allow this if a directory existed of archive sites by news group.
664 Suggested subject of study in conjunction with NNTP WG. A further
665 possible extension may be to allow the naming of subject threads as
666 addressable objects.
667
668 NNTP
669
670 This is an alternative form of reference for news articles,
671 specifically to be used with NNTP servers, and particularly those
672 incomplete server implementations which do not allow retrieval by
673 message identifier.
674
675 The news server name, newsgroup name, and index number of an
676 article within the newsgroup on that particular server are given.
677
678 Note 1:
679
680 This form of URL is not globally accessible, as typically NNTP
681 servers only allow access from local clients. This form of URL
682 should not be quoted outside this local area. It should not be
683 used within news articles for wider circulation than the one
684 server.
685
686 WAIS
687
688 The current public domain WAIS implementation requires that a
689 client know the "type" of an object prior to retrieval. This value
690 is returned along with the internal object identifier in the search
691 response. It has been encoded into the path part of the URL in
697 order to make the URL sufficient for the retrieval of the object.
698 Within the WAIS world, names do not, of course, need to be
699 prefixed by "wais:" (by the partial form rules).
700
701 PROSPERO
702
703 The Prospero (Neuman, 1991) directory service is used to resolve
704 the URL yielding an access method for the object (which can then
705 itself be represented as a URL if translated). The host part
706 contains a host name or internet address. The port part is
707 optional.
708
709 The path part contains a host specific object name and an optional
710 version number. If present, the version number is separated from
711 the host specific object name by the characters "%00" (percent
712 zero zero), this being an escaped string terminator (null).
713 External Prospero links are represented as URLs of the underlying
714 access method and are not represented as Prospero URLs.
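
Splitting the path part on the escaped null can be sketched as follows.
The split must be made before any other escapes in the path are
decoded. The function name split_prospero_path and the example path are
hypothetical.

```python
def split_prospero_path(path):
    """Split a Prospero path into (object name, version or None)."""
    # "%00" is the escaped string terminator separating the optional
    # version number from the host specific object name.
    name, separator, version = path.partition("%00")
    return (name, version if separator else None)
```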
715
716 GOPHER
717
718 The first character of the URL path part (after the initial single
719 slash) is a single-character "type" field which is that used by the
720 Gopher protocol. The rest of the path is the "selector string",
721 with disallowed characters encoded. Note that some selector strings
722 begin with a copy of the gopher type character, in which case that
723 character will occur twice consecutively in the URL. If the type
724 character and selector are omitted, the type defaults to "1".
725 Gopher links which refer to non-Gopher protocols are represented
726 directly as URLs of the underlying access method and are not
727 represented as Gopher URLs.
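
The layout of the Gopher path can be illustrated with the sketch
below, which is not a complete Gopher client. The names unescape and
split_gopher_path and the example selectors are invented for the
illustration.

```python
import re


def unescape(text):
    """Replace each %XX escape by the character with that code."""
    return re.sub(r"%([0-9A-Fa-f]{2})",
                  lambda m: chr(int(m.group(1), 16)), text)


def split_gopher_path(path):
    """Return (type character, selector string) for a Gopher URL path."""
    body = path[1:] if path.startswith("/") else path
    if not body:
        # Type character and selector omitted: the type defaults to "1".
        return ("1", "")
    # First character is the type; the rest is the encoded selector.
    return (body[0], unescape(body[1:]))
```

With a selector that itself begins with the type character, as in the
path "/00/papers/url.txt", the result is type "0" and selector
"0/papers/url.txt", showing the character occurring twice in the URL.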
728
729 MAILTO
730
731 This allows a URL to specify an RFC822 addr-spec mail address.
732 Note that use of % , for example as used in forming a gatewayed
733 mail address, requires conversion to %25 in a URL.
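
The conversion of "%" can be sketched in one line. The function name
mailto_url and the gatewayed address used in the example are
hypothetical, and other characters that might need encoding are not
handled here.

```python
def mailto_url(addr_spec):
    """Form a mailto: URL from an RFC822 addr-spec."""
    # A literal "%" in the address must appear as %25 in the URL.
    return "mailto:" + addr_spec.replace("%", "%25")
```

For example, mailto_url("user%host.bitnet@gateway.xx.edu") gives
"mailto:user%25host.bitnet@gateway.xx.edu".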
734
735 The semantics may be considered to be that the object referred to
736 by the mailto: URL is the set of messages sent to or from that
737 address. There is no algorithm to retrieve this set, but the SMTP
738 protocol allows messages to be added to it, and any given user may
739 be aware of a subset of its members.
740
741 TELNET, RLOGIN, TN3270
742
743 The use of URLs to represent interactive sessions is a convenient
744 extension to their uses for objects. This allows access to
745 information systems which only provide an interactive service, and
746 no information server. As information within the service cannot be
747 addressed individually or, in general, automatically retrieved,
748 this is a less desirable, though currently common, solution.
749
755 X500
756
757 The mapping of x500 names onto URLs is not defined here. A decision
758 is required as to whether "distinguished names" or "user friendly
759 names" (ufn), or both, should be allowed. If any punctuation
760 conversions are needed from the adopted x500 representation (such
761 as the use of slashes between parts of a ufn) they must be defined.
762 This is a subject for study.
763
764 WHOIS
765
766 This prefix describes the access using the "whois++" scheme in the
767 process of definition. The host name part is the same as for other
768 IP based schemes. The path part can be either a whois handle for a
769 whois object, or it can be a valid whois query string. This is a
770 subject for further study.
771
772 NETWORK MANAGEMENT DATABASE
773
774 This is a subject for study.
775
776 REGISTRATION OF NAMING SCHEMES
777
778 A new naming scheme may be introduced by defining a mapping onto a
779 conforming URL syntax, using a new scheme identifier. Experimental
780 scheme identifiers may be used by mutual agreement between parties,
781 and must start with the characters "x-". The scheme name "urn:" is
782 reserved for the work in progress on a scheme for more persistent
783 names. Therefore URNs (Names) and URLs (Locators) will be
784 distinguishable. An object which is either a URL or a URN is known
785 as a URI (Identifier).
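
The distinction drawn above can be sketched in Python as follows. The
function name classify_identifier and its return labels are
hypothetical, and folding the scheme to lower case before comparison
is an assumption of this sketch rather than a rule stated here.

```python
def classify_identifier(uri: str) -> str:
    """Classify a URI by its scheme prefix (illustrative only)."""
    scheme, sep, _rest = uri.partition(":")
    if not sep or not scheme:
        raise ValueError("no scheme prefix found in %r" % uri)
    scheme = scheme.lower()  # assumption: compare schemes case-insensitively
    if scheme == "urn":
        return "URN"                # reserved for persistent names
    if scheme.startswith("x-"):
        return "experimental URL"   # used by mutual agreement only
    return "URL"
```

For example, classify_identifier("x-myscheme:foo") reports an
experimental scheme, while "http://info.cern.ch/" is reported as an
ordinary URL.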
786
787 It is proposed that the Internet Assigned Numbers Authority (IANA)
788 perform the function of registration of new schemes. Any submission
789 of a new URI scheme must include a definition of an algorithm for
790 the retrieval of any object within that scheme. The algorithm must
791 take the URI and produce either a set of URL(s) which will lead to
792 the desired object, or the object itself, in a well-defined or
793 determinable format.
794
795 It is recommended that those proposing a new scheme demonstrate its
796 utility and operability by the provision of a gateway which will
797 provide images of objects in the new scheme for clients using an
798 existing protocol. If the new scheme is not a locator scheme, then
799 the properties of names in the new space should be clearly defined.
800 It is likewise recommended that, where a protocol allows for
801 retrieval by URI, the client software have provision for being
802 configured to use specific gateway locators for indirect access
803 through new naming schemes.
804
805 BNF syntax
806
807 This is a BNF-like description of the Uniform Resource Locator
813 syntax. A vertical line "|" indicates alternatives, and
814 [brackets] indicate optional parts. Spaces are represented by
815 the word "space", and the vertical line character by "vline".
816 Single letters stand for single letters. All words of more than one
817 letter below are entities described somewhere in this description.
818
819 The current IETF URI working group preference is for the
820 prefixedurl production. (Nov 1993. July 93: url).
821
822 The "generic" production gives a higher level parsing of the same
823 URLs as the other productions. The "national" and "punctuation"
824 characters do not appear in any productions and therefore may not
825 appear in URLs.
826
827 The "afsaddress" is left in as a historical note, but is not a url
828 production.
829
830 prefixedurl u r l : url
831
832 fragmentaddress uri [ # fragmentid ]
833
834 uri url | generic
835
836 url generic | httpaddress | ftpaddress |
837 newsaddress | nntpaddress | prosperoaddress |
838 telnetaddress | gopheraddress | waisaddress
839 | mailtoaddress
840
841 generic scheme : path [ ? search ]
842
843 scheme ialpha
844
845 httpaddress h t t p : / / hostport [ / path ] [ ?
846 search ]
847
848 ftpaddress f t p : / / login / path
849
850 afsaddress a f s : / / cellname / path
851
852 newsaddress n e w s : groupart
853
854 nntpaddress n n t p : group / digits
855
856 mailtoaddress m a i l t o : xalphas @ hostname
857
858 waisaddress waisindex | waisdoc
859
860 waisindex w a i s : / / hostport / database [ ? search
861 ]
862
863 waisdoc w a i s : / / hostport / database / wtype /
864 path
865
871 groupart * | group | article
872
873 group ialpha [ . group ]
874
875 article xalphas @ host
876
877 database xalphas
878
879 wtype xalphas
880
881 prosperoaddress prosperolink
882
883 prosperolink p r o s p e r o : / / hostport / hsoname [ %
884 0 0 version [ attributes ] ]
885
886 hsoname path
887
888 version digits
889
890 attributes attribute [ attributes ]
891
892 attribute alphanums
893
894 telnetaddress t e l n e t : / / login
895
896 gopheraddress g o p h e r : / / hostport [ / gtype [
897 selector ] ] [ ? search ]
898
899 login [ user [ : password ] @ ] hostport
900
901 hostport host [ : port ]
902
903 host hostname | hostnumber
904
905 cellname hostname
906
907 hostname ialpha [ . hostname ]
908
909 hostnumber digits . digits . digits . digits
910
911 port digits
912
913 selector path
914
915 path void | xpalphas [ / path ]
916
917 search xalphas [ + search ]
918
919 user xalphas
920
921 password xalphas
922
923 fragmentid xalphas
924
929 gtype xalpha
930
931 xalpha alpha | digit | safe | extra | escape
932
933 xalphas xalpha [ xalphas ]
934
935 xpalpha xalpha | +
936
937 xpalphas xpalpha [ xpalphas ]
938
939 ialpha alpha [ xalphas ]
940
941 alpha a | b | c | d | e | f | g | h | i | j | k |
942 l | m | n | o | p | q | r | s | t | u | v |
943 w | x | y | z | A | B | C | D | E | F | G |
944 H | I | J | K | L | M | N | O | P | Q | R |
945 S | T | U | V | W | X | Y | Z
946
947 digit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
948
949 safe $ | - | _ | @ | . | & | + | -
950
951 extra ! | * | " | ' | ( | ) | : | ; | , | space
952
953 escape % hex hex
954
955 hex digit | a | b | c | d | e | f | A | B | C |
956 D | E | F
957
958 national { | } | vline | [ | ] | \ | ^ | ~
959
960 punctuation < | >
961
962 digits digit [ digits ]
963
964 alphanum alpha | digit
965
966 alphanums alphanum [ alphanums ]
967
968 void
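
A few of the productions above can be sketched as Python regular
expressions. This is a simplified illustration and not a complete
parser for the grammar: the names XALPHAS, HTTP, is_xalphas and
parse_httpaddress are hypothetical, and the host, path and search
parts of an httpaddress are split only on their delimiters without
being validated character by character.

```python
import re

# Character sets transcribed from the "safe" and "extra" productions.
SAFE = "$-_@.&+"
EXTRA = "!*\"'():;, "

# xalpha: alpha | digit | safe | extra | escape
_XALPHA = r"(?:[A-Za-z0-9" + re.escape(SAFE + EXTRA) + r"]|%[0-9A-Fa-f]{2})"
XALPHAS = re.compile(_XALPHA + r"+")

# httpaddress: h t t p : / / hostport [ / path ] [ ? search ]
HTTP = re.compile(
    r"http://(?P<host>[^/:?]+)(?::(?P<port>[0-9]+))?"
    r"(?:/(?P<path>[^?]*))?(?:\?(?P<search>.*))?"
)


def is_xalphas(text: str) -> bool:
    """True if text is a non-empty run of xalpha characters or escapes."""
    return XALPHAS.fullmatch(text) is not None


def parse_httpaddress(url: str) -> dict:
    """Split an http URL into host, port, path and search parts."""
    match = HTTP.fullmatch(url)
    if match is None:
        raise ValueError("not an httpaddress: %r" % url)
    return match.groupdict()
```

Because the "national" and "punctuation" characters are absent from
the character set, is_xalphas rejects strings such as "a<b" or "{x}",
and a "%" that is not followed by two hex digits is rejected as well.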
969
970 Security considerations
971
972 The URL scheme does not in itself pose a security threat. Users
973 should beware that there is no general guarantee that a URL which
974 at one time points to a given object will continue to do so, or
975 that it will not at some later time point to a different object
976 because objects have been moved on servers.
977
978 The use of URLs containing passwords is clearly unwise.
979
980 Conclusion
981
987 A need has been demonstrated, and a number of requirements have
988 been stated for uniform resource locators (URLs). A scheme has been
989 proposed which builds on existing conventions to define a syntax
990 for URLs. This scheme has been in serious use by the World-Wide Web
991 (W3) initiative since 1991. Adoption of the scheme in
992 correspondence, standards and software will ease the use of
993 references to on-line information in a flexible way as the coming
994 information age arrives.
995
996 Acknowledgements
997
998 This paper builds on the basic W3 design and much discussion of
999 these issues by many people on the network. The discussion was
1000 particularly stimulated by articles by Clifford Lynch (1991),
1001 Brewster Kahle (1991) and Wengyik Yeong (1991b). Contributions from
1002 John Curran (NEARnet), Clifford Neuman (ISI), Ed Vielmetti (MSEN)
1003 and later the IETF URL BOF and URI working group have been
1004 incorporated into this issue of this paper.
1005
1006 The draft url4 (Internet Draft 00) was generated from url3
1007 following discussion and overall approval of the URL working group
1008 on 29 March 1993. The paper url3 had been generated from udi2 in
1009 the light of discussion at the UDI BOF meeting at the Boston IETF
1010 in July 1992. Draft url5
1011 incorporated changes suggested by Clifford Neuman, and draft url6
1012 (ID 01) incorporated character group changes and a few other fixes
1013 defined by the IETF URI WG in submitting it as a proposed standard.
1014 URL7 (Internet Draft 02) incorporated changes introduced at the
1015 Amsterdam IETF and refined in net discussion.
1016
1017 APPENDICES
1018
1019 The following are not formally part of this document.
1020
1021 Fragment-id
1022
1023 This represents a part of, fragment of, or a sub-function within,
1024 an object. Its syntax and semantics are defined by the
1025 application responsible for the object, or the specification of the
1026 content type of the object. The only definition here is of the
1027 allowed characters by which it may be represented in a URL.
1028
1029 The fragment-id follows the URL of the whole object from which it
1030 is separated by a hash sign (#). If the fragment-id is void, the
1031 hash sign may be omitted: A void fragment-id with or without the
1032 hash sign means that the URL refers to the whole object.
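
A minimal Python sketch of this rule follows. The function name
split_fragment is hypothetical. Since the fragmentid production does
not admit the hash sign, the first "#" is taken as the separator.

```python
def split_fragment(reference: str):
    """Split a reference into (url, fragment_id) at the first "#".

    A missing "#" and a "#" followed by nothing both yield a void
    fragment-id (the empty string), so both refer to the whole object.
    """
    url, _sep, fragment_id = reference.partition("#")
    return url, fragment_id
```

For instance, "http://info.cern.ch/doc#" and "http://info.cern.ch/doc"
give the same result, with a void fragment-id.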
1033
1034 While this hook is allowed for identification of fragments, the
1035 question of addressing of parts of objects, or of the grouping of
1036 objects and relationship between contained and containing objects,
1037 is not addressed by this document.
1038
1039 This document does not address the question of objects which are
1045 different versions of a "living" object, nor of expressing the
1046 relationships between different versions and the living object.
1047
1048 Partial form
1049
1050 In a certain limited set of cases, generally within a certain
1051 application, it may be useful to pass only a section of the URL.
1052 Within an object whose URL is well defined, the URL of another
1053 object may be given in abbreviated form, where parts of the two
1054 URLs are the same. This allows objects within a group to refer to
1055 each other without requiring the space for a complete reference,
1056 and it incidentally allows the group of objects to be moved
1057 without changing any references. This is not discussed in detail
1058 here; it is only mentioned so that the characters required by the
1059 technique be reserved for that purpose. It must be emphasised that
1060 when a reference is passed in anything other than a well controlled
1061 context, the full form must always be used.
1062
1063 The partial form relies on a property of the URL syntax that
1064 certain characters ("/") and certain path elements ("..", ".") have
1065 a significance reserved for representing a hierarchical space, and
1066 must be recognised as such by both clients and servers.
1067
1068 A partial form can be distinguished from a full form in that a full
1069 form must have a colon and that colon must occur before any slash
1070 characters.
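
The test just described can be written directly. The following Python
sketch uses the hypothetical function name is_full_form.

```python
def is_full_form(reference: str) -> bool:
    """True if the reference has a colon before any slash character."""
    colon = reference.find(":")
    if colon == -1:
        return False            # no colon at all: a partial form
    slash = reference.find("/")
    return slash == -1 or colon < slash
```

Under this test "mailto:timbl@info.cern.ch" is a full form, while
"../pub/doc" and "dir/a:b" are partial forms.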
1071
1072 The rules for the use of a partial name are:
1073
1074 If the scheme parts are different, the whole absolute locator
1075 must be given. Otherwise, the scheme is omitted, and:
1076
1077 If the host and/or port parts are different, the host name, port
1078 and all the rest of the locator must be given.
1079
1080 If the access and host parts are the same, then the path may be
1081 given in absolute (fully qualified) or relative form. Within the
1082 path:
1083
1084 If a leading slash is present, the path is absolute. Otherwise,
1085 a relative path is interpreted as follows:
1086
1087 The last part of the path of the context locator (anything
1088 following the rightmost slash) is removed, and the given partial
1089 URL appended in its place.
1090
1091 Within the result, all occurrences of "xxx/../" or "/." are
1092 recursively removed, where xxx, ".." and "." are complete path
1093 elements.
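
The rules above can be sketched in Python as follows. This is an
illustration under stated assumptions, not a definitive algorithm: the
names resolve_partial and remove_dot_elements are hypothetical, the
context locator is assumed to have the form scheme://hostport/path,
a partial form that supplies its own host is assumed to begin with
"//", and search parts and fragment-ids are not treated specially.

```python
def remove_dot_elements(path: str) -> str:
    """Remove "." elements and "xxx/.." pairs from a slash-separated path."""
    kept = []
    for element in path.split("/"):
        if element == ".":
            continue                      # "." refers to the same level
        if element == ".." and kept and kept[-1] != "..":
            kept.pop()                    # ".." cancels the previous element
        else:
            kept.append(element)
    return "/".join(kept)


def resolve_partial(context: str, partial: str) -> str:
    """Resolve a partial URL against the URL of the containing object."""
    colon, slash = partial.find(":"), partial.find("/")
    if colon != -1 and (slash == -1 or colon < slash):
        return partial                    # full form: use it unchanged
    scheme, _, rest = context.partition("://")
    hostport, _, context_path = rest.partition("/")
    if partial.startswith("//"):
        return scheme + ":" + partial     # same scheme, different host
    if partial.startswith("/"):
        path = partial[1:]                # absolute path on the same host
    else:
        # Drop whatever follows the rightmost slash of the context path.
        directory = context_path.rpartition("/")[0]
        path = directory + "/" + partial if directory else partial
    return scheme + "://" + hostport + "/" + remove_dot_elements(path)
```

With the context "http://info.cern.ch/pub/www/doc/url.txt", the
partial form "../src/./main.c" resolves to
"http://info.cern.ch/pub/www/src/main.c".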
1094
1095 Note: If a path of the context locator ends in a slash, partial URLs
1096 will be treated differently to their treatment with respect to the
1097 same path without a slash. Using a trailing slash on a directory
1103 name is not therefore recommended. The significance of a trailing
1104 slash may be considered as that of the locator of a file with void
1105 name within that directory.
1106
1107 Wrappers for URIs in plain text
1108
1109 This section does not formally form part of the URL specification.
1110
1111 URIs, including URLs, will ideally be transmitted through protocols
1112 which accept them and data formats which define a context for them.
1113 However, in practice nowadays there are many occasions when URLs
1114 are included in plain ASCII non-marked-up text such as electronic
1115 mail and usenet news messages.
1116
1117 In this case, it is convenient to have a separate wrapper syntax to
1118 define delimiters which will enable the human or automated reader
1119 to recognize that the URI is a URI.
1120
1121 The recommendation is that the angle brackets (less than and
1122 greater than signs) of the ASCII set be used for this purpose.
1123
1124 These wrappers do not form part of the URL, are not mandatory, and
1125 should not be used in contexts (such as SGML parameters, HTTP
1126 requests, etc) in which delimiters are already specified.
1127
1128 Example
1129
1130 Yes, Jim, I found it under <ftp://info.cern.ch/pub> but
1132 you can probably pick it up from <ftp://ds.internic.net/rfc>.
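
An automated reader might recognize such wrapped URIs along the
following lines. This Python sketch is an illustration only: the
names WRAPPED_URI and find_wrapped_uris are hypothetical, and the
pattern merely requires a scheme-like prefix and a colon inside the
angle brackets without validating the URI any further.

```python
import re

# "<", a scheme starting with a letter, ":", then anything up to ">".
WRAPPED_URI = re.compile(r"<([A-Za-z][^<>\s:]*:[^<>]*)>")


def find_wrapped_uris(text: str) -> list:
    """Return the URIs that appear between angle brackets in text."""
    return WRAPPED_URI.findall(text)


message = (
    "Yes, Jim, I found it under <ftp://info.cern.ch/pub> but\n"
    "you can probably pick it up from <ftp://ds.internic.net/rfc>."
)
print(find_wrapped_uris(message))
```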
1134
1135
1136 REFERENCES
1137
1138 Alberti, R., et al., (1991)
1139 "Notes on the Internet Gopher Protocol",
1140 University of Minnesota, December 1991,
1141 <ftp://boombox.micro.umn.edu/pub/gopher/gopher_protocol>.
1142 See also
1143 <gopher://gopher.micro.umn.edu/00/Information
1144 About Gopher/About Gopher>
1145
1146 Berners-Lee, T., (1991)
1147 "Hypertext Transfer Protocol (HTTP)", CERN,
1148 December 1991,
1149 <ftp://info.cern.ch/pub/www/doc/http-spec.txt>
1151
1152 Crocker, D. H., "Standard for ARPA Internet Text Messages",
1153 RFC822.
1154
1155 Davis, F., et al., (1990)
1161 "WAIS Interface Protocol: Prototype
1162 Functional Specification", Thinking Machines
1163 Corporation, April 23, 1990
1164 <ftp://quake.think.com/pub/wais/doc/protspec.txt>
1166
1167 International Standards Organization, (1991)
1168 Information and Documentation - Search and
1169 Retrieve Application Protocol Specification
1170 for Open Systems Interconnection, ISO-10163
1171
1172 Huitema, C., (1991) "Naming: strategies and techniques",
1173 Computer Networks and ISDN Systems 23 (1991)
1174 107-110.
1175
1176 Kahle, Brewster, (1991)
1177 "Document Identifiers, or International
1178 Standard Book Numbers for the Electronic
1179 Age",
1180 <ftp://quake.think.com/pub/wais/doc/doc-ids.txt>
1182
1183 Kantor, B., and Lapsley, P., (1986)
1184 "A proposed standard for the stream-based
1185 transmission of news", Internet RFC-977,
1186 February 1986.
1187 <ftp://ds.internic.net/rfc/rfc977.txt>
1188
1189 Lynch, C., Coalition for Networked Information, (1991)
1190 "Workshop on ID and Reference Structures for
1191 Networked Information", November 1991. See
1192 <wais://quake.think.com/wais-discussion-archives?lynch>
1194
1195 Mockapetris, P., (1987)
1196 "Domain names - concepts and facilities",
1197 RFC-1034, USC-ISI, November 1987,
1198 <ftp://ds.internic.net/rfc/rfc1034.txt>
1199
1200 Neuman, B. Clifford, (1992)
1201 "Prospero: A Tool for Organizing Internet
1202 Resources", Electronic Networking: Research,
1203 Applications and Policy, Vol 1 No 2, Meckler
1204 Westport CT USA. See also
1205 <ftp://prospero.isi.edu/pub/prospero/oir.ps>
1206
1207 Postel, J. and Reynolds, J. (1985)
1208 "File Transfer Protocol (FTP)", Internet
1209 RFC-959, October 1985.
1210 <ftp://ds.internic.net/rfc/rfc959.txt>
1211
1212 Yeong, W., (1991a) "Towards Networked Information Retrieval",
1213 Technical report 91-06-25-01, June 1991,
1219 Performance Systems International, Inc.
1220 <ftp://uu.psi.com/wp/nir.txt>
1221
1222 Yeong, W., (1991b), "Representing Public Archives in the
1223 Directory", Internet Draft, November 1991,
1224 now expired.
1225
1226 Author's address
1227
1228
1229 Tim Berners-Lee
1230 Address: World-Wide Web project
1231 CERN,
1232 1211 Geneva 23,
1233 Switzerland
1234
1235 Telephone: +41 (22)767 3755
1236 Fax: +41 (22)767 7155
1237 Email: timbl@info.cern.ch