1 Uniform Resource Locators (URL) Tim Berners-Lee
2 draft-ietf-uri-url-03.{ps,txt} URI working Group
3 Expires 21 September 1994 21 March 1994
4
5
6 Uniform Resource Locators (URL)
7
8 A Syntax for the Expression of
9 Access Information of Objects on the Network
10
11
12 ABOUT THIS DOCUMENT
13
14 This document specifies a Uniform Resource Locator (URL), the
15 syntax and semantics of formalized information for location and
16 access of resources on the Internet.
17
18 This document was written by the URI working group of the Internet
19 Engineering Task Force. Comments may be addressed to the editor,
20 Tim Berners-Lee <timbl@info.cern.ch>, or to the URI-WG
21 <uri@bunyip.com>. Discussions of the group are archived at
22
23 <http://www.acl.lanl.gov/URI/archive/uri-archive.index.html>
24
25 This document is bound by the Requirements Specification in
26 preparation.
27
28 The work is derived from concepts introduced by the World-Wide Web
29 global information initiative, whose use of such objects dates
30 from 1990 and is described in "Universal Resource Identifiers for
31 the World-Wide Web", RFCXXX .
32
33 This document is available in hypertext form, with links to
34 background information, as:
35
36 <http://info.cern.ch/hypertext/WWW/Addressing/URL/Overview.html>
37
38 .
39
40 STATUS OF THIS MEMO
41
42 This document is an Internet Draft. Internet Drafts are working
43 documents of the Internet Engineering Task Force (IETF), its Areas,
44 and its Working Groups. Note that other groups may also distribute
45 working documents as Internet Drafts.
46
47 Internet Drafts are working documents valid for a maximum of six
48 months. Internet Drafts may be updated, replaced, or obsoleted by
49 other documents at any time. It is not appropriate to use Internet
50 Drafts as reference material or to cite them other than as a
51 "working draft" or "work in progress".
52
53 Distribution of this document is unlimited.
54
55
56
57 Berners-Lee 1
58
59 Recommendations
60
61 This section describes the syntax for "Uniform Resource Locators"
62 (URLs): that is, basically physical addresses of objects which are
63 retrievable using protocols already deployed on the net. The
64 generic syntax provides a framework for new schemes for names to be
65 resolved using as yet undefined protocols.
66
67 The syntax is described in two parts. Firstly, we give the syntax
68 rules of a completely specified name; secondly, we give the rules
69 under which parts of the name may be omitted in a well-defined
70 context.
71
72 URL SYNTAX
73
74 A complete URL consists of a naming scheme specifier followed by a
75 string whose format is a function of the naming scheme. For
76 locators of information on the internet, a common syntax is used
77 for the IP address part. A BNF description of the URL syntax is
78 given in a later section. The components are as follows.
79 Fragment identifiers and partial URLs are not involved in the basic
80 URL definition.
81
82 PrePrefix
83
84 To be a Uniform Resource Locator as currently defined by the URI
85 working group, the whole string must start with a constant prefix
86 "URL:". Note that to save space in this document, some URLs may
87 have been quoted throughout without this preprefix.
88
89 Scheme
90
91 Within the URL of an object, the first element is the name of the
92 scheme, separated from the rest of the object by a colon. The rest
93 of the URL follows the colon in a format depending on the scheme.
94
95 Internet protocol parts
96
97 Those schemes which refer to internet protocols mostly have a
98 common syntax for the rest of the object name. This starts with a
99 double slash "//" to indicate its presence, and continues until the
100 following slash "/". Within that section are
101
102 An optional user name,
103 if required (as it is with a few FTP
104 servers). The password, if present, follows
105 the user name, separated from it by a colon;
106 the user name and optional password are
107 followed by a commercial at sign "@". The
108 use of user names and passwords which are
109 public is discouraged.
110
111 The internet domain name
117 of the host in RFC1037 format (or,
118 optionally and less advisably, the IP address
119 as four decimal numbers separated by periods)
120
121 The port number, if it is not the default number for the
122 protocol, is given in decimal notation after
123 a colon.
124
125 Path The rest of the locator is known as the
126 "path". It may define details of how the
127 client should communicate with the server,
128 including information to be passed
129 transparently to the server without any
130 processing by the client.
131
132 The path is interpreted in a manner dependent on the scheme being
133 used. Generally, the slash "/" (ASCII 2F hex) denotes a level in a
134 hierarchical structure, the higher level part to the left of the
135 slash.
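
As an illustration only, the common internet syntax described above
can be split into its parts with a short Python sketch. The name
split_url and the pattern below are assumptions of this example and
are not part of the specification; in particular, the set of
characters accepted in a scheme name is a simplification of the BNF.

```python
import re

# Hypothetical helper: splits
#   [URL:]scheme://[user[:password]@]host[:port][/path]
# The scheme character set is a simplifying assumption.
_URL_RE = re.compile(
    r"^(?:URL:)?"                                   # optional preprefix
    r"(?P<scheme>[A-Za-z][A-Za-z0-9+.-]*):"         # scheme, then colon
    r"//(?:(?P<user>[^:@/]*)(?::(?P<password>[^@/]*))?@)?"
    r"(?P<host>[^:/]+)"                             # domain name or IP address
    r"(?::(?P<port>[0-9]+))?"                       # decimal port, if given
    r"(?:/(?P<path>.*))?$"                          # everything after the slash
)


def split_url(url):
    """Return a dict of the parts; absent parts are None."""
    match = _URL_RE.match(url)
    if match is None:
        raise ValueError("not in the common internet form: %r" % url)
    parts = match.groupdict()
    if parts["port"] is not None:
        parts["port"] = int(parts["port"])
    return parts
```

For example, split_url("ftp://ftp.example.org/pub/file.txt") yields
the scheme "ftp", the host "ftp.example.org", no user, password or
port, and the path "pub/file.txt". The path is left undecoded
because its interpretation depends on the scheme.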
136
137 ENCODING PROHIBITED CHARACTERS
138
139 When a system uses a local addressing scheme, it is useful to
140 provide a mapping from local addresses into URLs so that references
141 to objects within the addressing scheme may be referred to
142 globally, and possibly accessed through gateway servers.
143
144 Any mapping scheme may be defined provided it is unambiguous,
145 reversible, and provides valid URLs. It is recommended that where
146 hierarchical aspects to the local naming scheme exist, they be
147 mapped onto the hierarchical URL path syntax in order to allow the
148 partial form to be used.
149
150 The following encoding method shall be used for mapping WAIS, FTP,
151 Prospero and Gopher addresses onto URLs. Where the local naming
152 scheme uses characters which are not allowed in the URL, these may
153 be represented in the URL by a percent sign "%" followed by two
154 hexadecimal digits (0-9, A-F) giving the ISO Latin 1 code for that
155 character. Character codes other than those allowed by the syntax
156 shall not be used unencoded in a URL.
157
158 The same encoding method may be used for encoding characters whose
159 use, although technically allowed in a URL, would be unwise due to
160 problems of corruption by imperfect gateways or misrepresentation
161 due to the use of variant character sets, or which would simply be
162 awkward in a given environment. Because a % sign always indicates
163 an encoded character, a URL may be made safer simply by encoding
164 any characters considered unsafe, while leaving already encoded
165 characters still encoded. Similarly, in cases where a larger set
166 of characters is acceptable, % signs can be selectively and
167 reversibly expanded.
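
The encoding rule can be sketched in Python as follows. This is a
minimal example under stated assumptions: the names encode, decode
and ALLOWED are hypothetical, and ALLOWED is a deliberately small
set chosen for illustration rather than the full set permitted by
the BNF. The percent sign is always encoded here, which suits the
mapping of a local name into a URL; it is not the "make an existing
URL safer" operation, which must leave existing escapes alone.

```python
import string

# Assumed set of characters left unencoded (illustrative, not normative).
ALLOWED = set(string.ascii_letters + string.digits + "$-_@.&")


def encode(text, allowed=ALLOWED):
    """Percent-encode every ISO Latin 1 character not in `allowed`."""
    out = []
    for byte in text.encode("latin-1"):
        ch = chr(byte)
        out.append(ch if ch in allowed else "%%%02X" % byte)
    return "".join(out)


def decode(url_part):
    """Reverse encode(): expand each %XX into the Latin 1 character."""
    out = bytearray()
    i = 0
    while i < len(url_part):
        if url_part[i] == "%":
            out.append(int(url_part[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(ord(url_part[i]))
            i += 1
    return out.decode("latin-1")
```

Because every "%" in the output introduces an escape, decode(encode(s))
returns s for any string of ISO Latin 1 characters, which is the
reversibility the mapping requires.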
168
169 Specific Schemes
170
175 The mapping for some existing standard and experimental protocols
176 is outlined in the BNF syntax definition . Notes on particular
177 protocols follow. The schemes covered are
178
179 http Hypertext Transfer Protocol
180
181 ftp File Transfer protocol
182
183 gopher The Gopher protocol
184
185 mailto Electronic mail address
186
187 mid Message identifiers for electronic mail
188
189 cid Content identifiers for MIME body part
190
191 news Usenet news
192
193 nntp Usenet news for local NNTP access only
194
195 prospero Access using the prospero protocols
196
197 telnet , rlogin and tn3270
198 Reference to interactive sessions
199
200 wais Wide Area Information Servers
201
202 Other schemes may be specified by future specifications
203
204 New schemes may be registered at a later time.
205
206 FTP
207
208 The ftp: prefix indicates that the FTP protocol is used, as defined
209 in RFC959 or any successor. The port number, if present, gives the
210 port of the FTP server if not the FTP default.
211
212 User name and password
213
214 The syntax allows for the inclusion of a user name and even a
215 password for those systems which do not use the anonymous FTP
216 convention. The default, however, if no user or password is
217 supplied, will be to use that convention, viz. that the user name
218 is "anonymous" and the password the user's Internet-style mail
219 address .
220
221 Where possible, this mail address should correspond to a usable
222 mail address for the user, and preferably give a DNS host name
223 which resolves to the IP address of the client. Note that servers
224 currently vary in their treatment of the anonymous password.
225
226 Path
227
233 The FTP protocol allows for a sequence of CWD commands (change
234 working directory) and a TYPE command prior to service commands
235 such as RETR (retrieve) or NLST (name list) which actually access a
236 file.
237
238 The arguments of any CWD commands are successive segment parts of
239 the URL delimited by slash, and the final segment is suitable as
240 the filename argument to the RETR command for retrieval or the
241 directory argument to NLST.
242
243 For some file systems (Unix in particular), the "/" used to denote
244 the hierarchical structure of the URL corresponds to the delimiter
245 used to construct a file name hierarchy, and thus, the filename
246 will look the same as the URL path. This does NOT mean that the URL
247 is a Unix filename.
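
The mapping from path segments to FTP commands can be sketched as
below. The function name ftp_commands is hypothetical, NLST is the
FTP name-list command, and the sketch assumes the path has already
been separated from any type suffix. Decoding of %XX escapes within
each segment is omitted for brevity.

```python
def ftp_commands(path, list_directory=False):
    """Turn a URL path into the FTP commands that reach the object.

    Every segment but the last becomes a CWD argument; the last
    segment is the argument of RETR, or of NLST for a directory.
    """
    segments = path.split("/")
    commands = ["CWD " + segment for segment in segments[:-1]]
    verb = "NLST" if list_directory else "RETR"
    commands.append(verb + " " + segments[-1])
    return commands
```

Each segment is sent as its own CWD command rather than joined into
one file name, so the client makes no assumption about the
server's own directory delimiter.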
248
249 Note: Retrieving subsequent URLs from the same host
250
251 There is no common hierarchical model to the FTP protocol, so if a
252 directory change command has been given, it is impossible in
253 general to deduce what sequence should be given to navigate to
254 another directory for a second retrieval, if the paths are
255 different. The only reliable algorithm is to disconnect and
256 reestablish the control connection.
257
258 Data type
259
260 The data content type of a file can only, in the general FTP case,
261 be deduced from the name, normally the suffix of the name. This is
262 not standardized. An alternative is for it to be transferred in
263 information outside the URL. A suitable FTP transfer type (for
264 example binary "I" or text "A") must in turn be deduced from the
265 data content type. It is recommended that conventions for suffixes
266 of public archives be established, but it is outside the scope of
267 this standard.
268
269 An FTP URL may optionally specify the FTP data transfer type by
270 which an object is to be retrieved. Two of the methods correspond
271 to the FTP "Data Types" ASCII and IMAGE for the retrieval of a
272 document, as specified in FTP by the TYPE command . One method
273 indicates directory access.
274
275 The data type is specified by a suffix to the URL. Possible
276 suffixes are:
277
278 ;type = <type-code> Use FTP type as given to perform data
279 transfer.
280
281 ;type=d Use FTP directory list commands to read
282 directory
283
284 The type code is in the format defined in RFC959.
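
A small sketch of how a client might separate the ";type=" suffix
from the path and choose a TYPE command. The function names are
hypothetical. The sketch follows the ";type=" form given in this
section and assumes no spaces around the equals sign; the BNF later
in this document writes the suffix differently.

```python
def split_ftp_type(path):
    """Return (path_without_suffix, type_code or None)."""
    base, separator, code = path.rpartition(";type=")
    if not separator:
        return path, None
    return base, code


def type_command(code):
    """Return the FTP TYPE command for a type code, or None.

    None means no TYPE command is implied: either no suffix was
    given, or the suffix "d" asked for a directory listing.
    """
    if code is None or code.lower() == "d":
        return None
    return "TYPE " + code.upper()
```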
285
291 Transfer Mode
292
293 Stream Mode is always used.
294
295 HTTP
296
297 The HTTP protocol specifies that the path is handled transparently
298 by those who handle URLs, except for the servers which de-reference
299 them. The path is passed by the client to the server with any
300 request, but is not otherwise understood by the client. The
301 fragmentid part is not sent with the request. The search part, if
302 present, is sent. Spaces and control characters in URLs must be
303 escaped for transmission in HTTP.
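
As a hedged illustration, the part of the URL that a client sends
with an HTTP request could be prepared as follows. The function
name http_request_target is an assumption of this sketch, which
only drops the fragment identifier and escapes spaces and control
characters; it does not validate the rest of the URL.

```python
def http_request_target(url_tail):
    """Prepare the path and search part for transmission.

    `url_tail` is everything after host[:port], for example
    "/my doc.html?cats#sec2". The fragment identifier is removed,
    and space, control characters and DEL are percent-encoded.
    """
    without_fragment = url_tail.split("#", 1)[0]
    return "".join(
        "%%%02X" % ord(ch) if ord(ch) <= 0x20 or ord(ch) == 0x7F else ch
        for ch in without_fragment
    )
```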
304
305 GOPHER
306
307 Gopher selector strings are, in general, interpreted as a sequence
308 of 8-bit bytes which may contain any characters other than tab,
309 return, or linefeed. It is necessary to encode any characters
310 disallowed in a URL, including spaces and other binary data not in
311 the allowed character set, using the standard convention of the "%"
312 character followed by two hexadecimal digits.
313
314 Note that slash "/" in gopher selector strings may not correspond
315 to a level in a hierarchical structure.
316
317 The format of a gopher URL is:
318
319 1. A single-character field to denote the Gopher type of the
320 resource to which the URL refers.
321
322 2. The gopher selector string. Note that some gopher selector
323 strings begin with a copy of the gopher type character, in which
324 case that character will occur twice consecutively. Also note
325 that the gopher selector string may be an empty string since
326 this is how gopher clients refer to the top-level directory on
327 a gopher server.
328
329 If the URL does not refer to a Gopher+ item and if there is no
330 gopher search string then parts 3, 4, 5, and 6 of the URL are
331 optional.
332
333 3. An encoded tab character (%09) to separate the gopher
334 selector string from the optional search string (see 4 below).
335
336 4. The gopher search string. If the URL refers to a search to
337 be submitted to a gopher search engine, the search string is
338 required. Otherwise this is an empty string.
339
340 5. An encoded tab character (%09) to separate the gopher search
341 string from the optional gopher+ string (see 6 below).
342 [suggestion: Note that if the URL refers to a gopher+ item and
343 does not have a gopher search string, there will be two encoded
349 tab characters in a row.]
350
351 6. The Gopher+ string. Gopher+ strings consist of one or more
352 characters and are used to represent information required for
353 retrieval of the Gopher+ item. Gopher+ items may have alternate
354 views, arbitrary sets of attributes, and may have electronic
355 forms associated with them. To accommodate the various Gopher+
356 objects, the Gopher+ string in the URL must accommodate a
357 mapping of the information a Gopher+ client sends to the server.
358 This makes this section a bit long since we basically cover the
359 entire Gopher+ protocol here.
360
361 When a Gopher server returns a directory listing to a client,
362 Gopher+ items are tagged with either a "+" (denoting gopher+ items)
363 or a "?" (denoting items which have a +ASK form associated with
364 them). A Gopher+ string which is only a "+" refers to the default
365 view (data representation) of the item. To retrieve this item a
366 gopher+ client should send
367
368 a_gopher_selector<tab>+<cr><lf>
369
370 to the gopher+ server.
371
372 Note that items which have a +ASK associated with them (i.e.
373 Gopher+ items tagged with a "?") require the client to fetch the
374 item's +ASK attribute to get the form definition, and then ask the
375 user to fill out the form and return the user's responses along
376 with the selector string to retrieve the item. Gopher+ clients
377 know how to do this but depend on the "?" tag in the gopher+ item
378 description to know when to handle this case. The "?" is used in
379 the Gopher+ string to be consistent with Gopher+ protocol's use of
380 this symbol.
381
382 To refer to the Gopher+ attributes of an item, the Gopher+ string
383 might consist of "!" or "$". "!" refers to all of a gopher+
384 item's attributes. "$" refers to all the item attributes for all
385 items in a Gopher directory. To retrieve an item or directory's
386 attributes, a gopher client will send:
387
388 a_gopher_selector<tab>!<cr><lf>
389
390 for items or
391
392 a_gopher_selector<tab>$<cr><lf>
393
394 for directories to the gopher+ server.
395
396 To refer to specific attributes, the Gopher+ string is
397 "!attribute_name" or "$attribute_name". For example, to refer to
398 the attribute containing the abstract of an item, the Gopher+
399 string would be "!+ABSTRACT". To refer to several attributes,
400 clients send the server the attribute names separated by spaces, so
401 it is necessary to separate the attribute names with encoded spaces.
402
407 To retrieve a collection of item attributes specified with a
408 gopher+ string of "!+ABSTRACT%20+SMELL" a gopher client would send
409
410 a_gopher_selector<tab>!+ABSTRACT +SMELL<cr><lf>
411
412 to the gopher server.
413
414 Gopher+ allows for optional alternate data representations
415 (alternate views) of items. To retrieve a Gopher+ alternate view,
416 the gopher+ client sends the appropriate view and language
417 identifier (found in the item's +VIEW attribute). To refer to a
418 specific Gopher+ alternate view, the URL's Gopher+ string would be
419 in the form "+view_name%20language_name". For example, a gopher+
420 string of "+application/postscript%20Es_ES" refers to the Spanish
421 language PostScript alternate view of a gopher+ item. To retrieve
422 this alternate view the client would send
423
424 a_gopher_selector<tab>+application/postscript Es_ES<cr><lf>
425
426 to the gopher server.
427
428 The gopher+ string for a URL that refers to an item referenced by
429 an ASK form filled out with specific values is essentially a coded
430 version of what the client sends to the server. The gopher+ string
431 will be of the form
432
433 +%091%0D%0A+-1%0D%0Aask_item1_value%0D%0Aask_item2_value%0D%0A.%0D%0A
435
436 To retrieve this item, the gopher client sends:
437
438 a_gopher_selector<tab>+<tab>1<cr><lf>
439 +-1<cr><lf>
440 ask_item1_value<cr><lf>
441 ask_item2_value<cr><lf>
442 .<cr><lf>
443
444 to the gopher server.
445
446 For a really complex example, consider a URL that refers to an
447 alternate view of an item that is referenced with a filled-out
448 Gopher +ASK form. The gopher+ string will be of the form:
449
450
451 +view_name%20language_name%091%0D%0A+-1%0D%0Aask_item1_value%0D%0A
452 ask_item2_value%0D%0A.%0D%0A
453
454 To retrieve this item, the gopher client sends:
455
456 a_gopher_selector<tab>+view_name language_name<tab>1<cr><lf>
457 +-1<cr><lf>
458 ask_item1_value<cr><lf>
459 ask_item2_value<cr><lf>
465 .<cr><lf>
466
467 to the gopher server.
468
469 Summary: gopher+ string part of Gopher URL
470
471 To refer to an item which has an ASK form associated with it where
472 the intent is to allow the user to enter values into the form as
473 part of the retrieval process:
474
475 %3F [was: ?]
476
477
478 To refer to all or specific attributes of a gopher item:
479
480 ![attribute_name][%20attribute_name][%20attribute_name]...
481
482
483 To refer to all or specific attributes of a gopher directory:
484
485 $[attribute_name][%20attribute_name][%20attribute_name]...
486
487
488 To refer to the content of a gopher+ item (including an item
489 referred to by specific values in a filled-out ASK form):
490
491 +[view_name[%20language_name]]
492 [%091%0D%0A+-1%0D%0Aask_item1_value%0D%0Aask_item2_value...%0D%0A.
493 %0D%0A]
494
495
496
497 Overall summary and examples
498
499 The general format of a Gopher URL path referring to a gopher type
500 "T" item is:
501
502 gopher://host [port]/T[gopher_selector]%09[search_string]%09[gopher+_string]
504
505
506 Examples:
507
508 An example of a URL pointing to a gopher type 0 item (a document)
509 is:
510
511 gopher://host [port]/0a_gopher_selector
512
513
514 An example of a URL pointing to a gopher type 7 item (a search
515 engine) where the string foobar is to be submitted to the search
516 engine is:
517
523 gopher://host [port]/7a_gopher_selector%09foobar
524
525
526 An example of a URL pointing to a Gopher+ type 0 item (a document)
527 is:
528
529 gopher://host [port]/0a_gopher_selector%09%09some_gplus_stuff
530
531
532 An example of a URL pointing to a Gopher+ type 0 (document) item's
533 attribute information is:
534
535 gopher://host [port]/0a_gopher_selector%09%09!
536
537
538 An example of a URL pointing to a Gopher+ document's Spanish
539 PostScript representation is:
540
541 gopher://host [port]/0a_gopher_selector%09%09+application/postscript%20Es_ES
543
544 .
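
The correspondence between a Gopher URL and what the client sends
can be sketched in Python. The function name gopher_request is
hypothetical, and the input is assumed to be the part of the URL
after "gopher://host[:port]/". The sketch follows the examples in
this section: an empty search string contributes no tab of its own
on the wire, and a gopher+ string that already ends in an encoded
CR LF (as a filled-out ASK form does) is not given a second one.

```python
from urllib.parse import unquote


def gopher_request(url_path):
    """Return (gopher_type, bytes-as-text to send to the server)."""
    gtype, rest = url_path[:1], url_path[1:]
    # Split on the encoded tabs *before* decoding, at most twice, so
    # that encoded tabs inside the gopher+ string are left in place.
    fields = rest.split("%09", 2)
    fields += [""] * (3 - len(fields))
    selector, search, gplus = (
        unquote(field, encoding="latin-1") for field in fields
    )
    parts = [selector]
    if search:
        parts.append(search)
    if gplus:
        parts.append(gplus)
    wire = "\t".join(parts)
    if not wire.endswith("\r\n"):
        wire += "\r\n"
    return gtype, wire
```

For the search example above, gopher_request applied to
"7a_gopher_selector%09foobar" gives type "7" and the request
a_gopher_selector<tab>foobar<cr><lf>.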
545
546 MAILTO
547
548 This allows a URL to specify an RFC822 addr-spec mail address.
549 Note that use of % , for example as used in forming a gatewayed
550 mail address, requires conversion to %25 in a URL.
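
For instance, a gatewayed address could be turned into a mailto URL
as in the sketch below. The function name mailto_url and the
address used in the usage note are made up for illustration, and
only the percent sign is handled.

```python
def mailto_url(addr_spec):
    """Build a mailto URL, encoding any "%" in the address as %25."""
    return "mailto:" + addr_spec.replace("%", "%25")
```

With this sketch, the address user%host.example@gateway.example
becomes mailto:user%25host.example@gateway.example.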
551
552 NEWS
553
554 The news locators refer to either news group names or article
555 message identifiers which must conform to the rules for a
556 Message-ID of RFC 1036 (Horton 1987). A message identifier may be
557 distinguished from a news group name by the presence of the
558 commercial at "@" character. These rules imply that within an
559 article, a reference to a news group or to another article will be
560 a valid URL (in the partial form).
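
The distinction can be sketched as a small classifier. The function
name classify_news and the labels it returns are assumptions of
this example; the "*" case follows the groupart production in the
BNF.

```python
def classify_news(url):
    """Return ("all" | "article" | "group", target) for a news URL."""
    prefix = "news:"
    if not url.startswith(prefix):
        raise ValueError("not a news URL: %r" % url)
    target = url[len(prefix):]
    if target == "*":
        return "all", target
    if "@" in target:
        # A commercial at marks a message identifier.
        return "article", target
    return "group", target
```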
561
562 A news URL may be dereferenced using NNTP (RFC977, Kantor 86) (The
563 ARTICLE by message-id command ) or using any other protocol for the
564 conveyance of usenet news articles, or by reference to a body of
565 news articles already received.
566
567 Note1:
568
569 Among URLs the "news" URLs are anomalous in that they are
570 location-independent. They are unsuitable as URN candidates because
571 the NNTP architecture relies on the expiry of articles and
572 therefore a small number of articles being available at any time.
573 When a news: URL is quoted, the assumption is that the reader will
574 fetch the article or group from his or her local news host. News
575 host names are NOT part of news URLs.
576
581 Note 2:
582
583 An outstanding problem is that the message identifier is
584 insufficient to allow the retrieval of an expired article, as no
585 algorithm exists for deriving an archive site and file name. The
586 addition of the date and news group set to the article's URL would
587 allow this if a directory existed of archive sites by news group.
588 Suggested subject of study in conjunction with NNTP working group.
589 A possible further extension is to allow the naming of subject
590 threads as addressable objects.
591
592 NNTP
593
594 This is an alternative form of reference for news articles,
595 specifically to be used with NNTP servers, and particularly those
596 incomplete server implementations which do not allow retrieval by
597 message identifier. In all other cases the "news" scheme should be
598 used.
599
600 The news server name, newsgroup name, and index number of an
601 article within the newsgroup on that particular server are given.
602 The NNTP protocol must be used.
603
604 Note1.
605
606 This form of URL is not of global accessibility, as typically NNTP
607 servers only allow access from local clients. Note that the
608 article numbers within groups vary from server to server.
609
610 This form of URL should not be quoted outside this local area. It
611 should not be used within news articles for wider circulation than
612 the one server. This is a local identifier for a resource which is
613 often available globally, and so is not recommended except in the
614 case in which incomplete NNTP implementations on the local server
615 force its adoption.
616
617 PROSPERO
618
619 The Prospero (Neuman, 1991) directory service is used to resolve
620 the URL yielding an access method for the object (which can then
621 itself be represented as a URL if translated). The host part
622 contains a host name or internet address. The port part is
623 optional.
624
625 The path part contains a host specific object name and an optional
626 version number. If present, the version number is separated from
627 the host specific object name by the characters "%00" (percent
628 zero zero), this being an escaped string terminator (null).
629 External Prospero links are represented as URLs of the underlying
630 access method and are not represented as Prospero URLs.
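
A minimal sketch of separating the optional version number from the
host specific object name. The function name split_prospero_path is
hypothetical, and whatever follows "%00" is returned as an
uninterpreted string, since this sketch does not attempt to
separate a version from any attributes that follow it.

```python
def split_prospero_path(path):
    """Return (host_specific_object_name, version_part or None)."""
    name, separator, version = path.partition("%00")
    if not separator:
        return path, None
    return name, version
```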
631
632 TELNET, RLOGIN, TN3270
633
639 The use of URLs to represent interactive sessions is a convenient
640 extension to their uses for objects. This allows access to
641 information systems which only provide an interactive service, and
642 no information server. As information within the service cannot be
643 addressed individually or, in general, automatically retrieved,
644 this is a less desirable, though currently common, solution.
645
646 WAIS
647
648 The current public domain WAIS implementation requires that a
649 client know the "type" of an object prior to retrieval. This value
650 is returned along with the internal object identifier in the search
651 response. It has been encoded into the path part of the URL in
652 order to make the URL sufficient for the retrieval of the object.
653 Within the WAIS world, names do not of course need to be prefixed
654 by "wais:" (by the partial form rules).
655
656 The wpath of a WAIS URL consists of encoded fields of the WAIS
657 identifier, in the same order as in the WAIS identifier. For each
658 field, the identifier field number is the digits before the equals
659 sign, and the field contents follow, encoded in the conventional
660 encoding, terminated by ";".
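
The wpath layout can be illustrated with a short parser. The
function name parse_wpath and the sample field values in the usage
note are assumptions of this sketch; it expects each field in the
form number=contents followed by ";" and decodes the contents with
the conventional %XX encoding.

```python
from urllib.parse import unquote


def parse_wpath(wpath):
    """Return a list of (field_number, decoded_contents) pairs."""
    fields = []
    for chunk in wpath.split(";"):
        if not chunk:
            continue  # the text after the final ";" is empty
        number, separator, contents = chunk.partition("=")
        if not separator or not number.isdigit():
            raise ValueError("malformed wpath field: %r" % chunk)
        fields.append((int(number), unquote(contents, encoding="latin-1")))
    return fields
```

For example, parse_wpath("1=host.example%3A210;2=db;") returns the
fields (1, "host.example:210") and (2, "db"), in the order in which
they appear.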
661
662
663
664 REGISTRATION OF NAMING SCHEMES
665
666 A new naming scheme may be introduced by defining a mapping onto a
667 conforming URL syntax, using a new prefix. Experimental prefixes
668 may be used by mutual agreement between parties, and must start
669 with the characters "x-". The scheme name "urn:" is reserved for
670 the work in progress on a scheme for more persistent names.
671
672 It is proposed that the Internet Assigned Numbers Authority (IANA)
673 perform the function of registration of new schemes. Any submission
674 of a new URI scheme must include a definition of an algorithm for
675 the retrieval of any object within that scheme. The algorithm must
676 take the URI and produce either a set of URL(s) which will lead to
677 the desired object, or the object itself, in a well-defined or
678 determinable format.
679
680 It is recommended that those proposing a new scheme demonstrate its
681 utility and operability by the provision of a gateway which will
682 provide images of objects in the new scheme for clients using an
683 existing protocol. If the new scheme is not a locator scheme, then
684 the properties of names in the new space should be clearly defined.
685 It is likewise recommended that, where a protocol allows for
686 retrieval by URL, that the client software have provision for being
687 configured to use specific gateway locators for indirect access
688 through new naming schemes.
689
690 BNF for specific URL schemes
691
697 This is a BNF-like description of the Uniform Resource Locator
698 syntax. A vertical line "|" indicates alternatives, and
699 [brackets] indicate optional parts. Spaces are represented by the
700 word "space", and the vertical line character by "vline". Single
701 letters stand for single letters. All words of more than one letter
702 below are entities described somewhere in this description.
703
704 The current IETF URI working group preference is for the
705 prefixedurl production. (Nov 1993. July 93: url).
706
707 The "national" and "punctuation" characters do not appear in any
708 productions and therefore may not appear in URLs.
709
710 The "afsaddress" is left in as historical note, but is not a url
711 production
712
713 prefixedurl u r l : url
714
715 url httpaddress | ftpaddress | newsaddress |
716 nntpaddress | prosperoaddress | telnetaddress
717 | gopheraddress | waisaddress |
718 mailtoaddress | midaddress | cidaddress
719
720 scheme ialpha
721
722 httpaddress h t t p : / / hostport [ / path ] [ ?
723 search ]
724
725 ftpaddress f t p : / / login / path [ ! ftptype ]
726
727 afsaddress a f s : / / cellname / path
728
729 newsaddress n e w s : groupart
730
731 nntpaddress n n t p : group / digits
732
733 midaddress m i d : addr-spec
734
735 cidaddress c i d : content-identifier
736
737 mailtoaddress m a i l t o : xalphas @ hostname
738
739 waisaddress waisindex | waisdoc
740
741 waisindex w a i s : / / hostport / database [ ? search
742 ]
743
744 waisdoc w a i s : / / hostport / database / wtype /
745 wpath
746
747 wpath digits = path ; [ wpath ]
748
749 groupart * | group | article
750
755 group ialpha [ . group ]
756
757 article xalphas @ host
758
759 database xalphas
760
761 wtype xalphas
762
763 prosperoaddress prosperolink
764
765 prosperolink p r o s p e r o : / / hostport / hsoname [ %
766 0 0 version [ attributes ] ]
767
768 hsoname path
769
770 version digits
771
772 attributes attribute [ attributes ]
773
774 attribute alphanums
775
776 telnetaddress t e l n e t : / / login
777
778 gopheraddress g o p h e r : / / hostport [/ gtype [
779 selector ] ] [ ? search ]
780
781 login [ user [ : password ] @ ] hostport
782
783 hostport host [ : port ]
784
785 host hostname | hostnumber
786
787 ftptype A | I | D
788
789 cellname hostname
790
791 hostname ialpha [ . hostname ]
792
793 hostnumber digits . digits . digits . digits
794
795 port digits
796
797 selector path
798
799 path void | segment [ / path ]
800
801 segment xpalphas
802
803 search xalphas [ + search ]
804
805 user xalphas
806
807 password xalphas
808
813 fragmentid xalphas
814
815 gtype xalpha
816
817 xalpha alpha | digit | safe | extra | escape
818
819 xalphas xalpha [ xalphas ]
820
821 xpalpha xalpha | +
822
823 xpalphas xpalpha [ xpalphas ]
824
825 ialpha alpha [ xalphas ]
826
827 alpha a | b | c | d | e | f | g | h | i | j | k |
828 l | m | n | o | p | q | r | s | t | u | v |
829 w | x | y | z | A | B | C | D | E | F | G |
830 H | I | J | K | L | M | N | O | P | Q | R |
831 S | T | U | V | W | X | Y | Z
832
833 digit 0 |1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
834
835 safe $ | - | _ | @ | . | & | + | -
836
837 extra " | ' | ( | ) | , | space
838
839 reserved = | ; | / | # | ? | :
840
841 escape % hex hex
842
843 hex digit | a | b | c | d | e | f | A | B | C |
844 D | E | F
845
846 national { | } | vline | [ | ] | \ | ^ | ~
847
848 punctuation < | >
849
850 digits digit [ digits ]
851
852 alphanum alpha | digit
853
854 alphanums alphanum [ alphanums ]
855
856 void
857
858 (end of URL BNF)
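
The character classes above lend themselves to a simple check for
characters that may not appear unencoded. The sketch below is not a
parser for the productions: it only reports characters outside the
alpha, digit, safe, extra and reserved classes once well-formed
%XX escapes have been set aside. The function name
invalid_characters is hypothetical, and literal terminals used by
individual productions, such as "!" in ftpaddress or "*" in
groupart, are not treated specially and will be reported.

```python
import re
import string

ALPHA_DIGIT = string.ascii_letters + string.digits
SAFE = "$-_@.&+"
EXTRA = "\"'(), "
RESERVED = "=;/#?:"

_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")
_LEGAL = set(ALPHA_DIGIT + SAFE + EXTRA + RESERVED)


def invalid_characters(url):
    """Return the sorted list of characters not allowed unencoded.

    A "%" that does not begin a valid two-digit escape is reported
    as well, since it is then not part of any class.
    """
    without_escapes = _ESCAPE.sub("", url)
    return sorted(set(without_escapes) - _LEGAL)
```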
859
860 Security considerations
861
862 The URL scheme does not in itself pose a security threat. Users
863 should beware that there is no general guarantee that a URL which
864 at one time points to a given object continues to do so, and does
865 not even at some later time point to a different object due to the
871 movement of objects on servers.
872
873 A URL-related security threat is that it is sometimes possible to
874 construct a URL such that an attempt to perform a harmless
875 idempotent operation such as the retrieval of the object will in
876 fact cause a possibly damaging remote operation to occur. The
877 unsafe URL is typically constructed by specifying a port number
878 other than that reserved for the network protocol in question. The
879 client unwittingly contacts a server which is in fact running a
880 different protocol. The content of the URL contains instructions
881 which when interpreted according to this other protocol cause an
882 unexpected operation. An example has been the use of gopher URLs
883 to cause a rude message to be sent via a SMTP server. Caution
884 should be used when using any URL which specifies a port number
885 other than the default for the protocol, especially when it is a
886 number within the reserved space.
887
888 Care should be taken when URLs contain embedded encoded delimiters
889 for a given protocol (for example, CR and LF characters for telnet
890 protocols) that these are not unencoded before transmission. This
891 would violate the protocol but could be used to simulate an extra
892 operation or parameter, again causing an unexpected and possibly
893 harmful remote operation to be performed.
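
A client might apply the cautions above along the following lines.
This is a sketch under stated assumptions, not a complete defence:
the function name url_warnings is hypothetical, the table of
default ports reflects commonly assigned numbers and is not taken
from this document, and "reserved space" is assumed here to mean
ports below 1024.

```python
import re

# Commonly assigned default ports (an assumption of this sketch).
DEFAULT_PORTS = {
    "ftp": 21, "telnet": 23, "gopher": 70,
    "http": 80, "nntp": 119, "wais": 210,
}

_ENCODED_CR_OR_LF = re.compile(r"%0[AaDd]")


def url_warnings(scheme, port, rest):
    """Return human-readable warnings for a parsed URL.

    `port` is an int or None; `rest` is the still-encoded text
    that follows host[:port].
    """
    warnings = []
    if port is not None and port != DEFAULT_PORTS.get(scheme):
        warnings.append("non-default port %d" % port)
        if port < 1024:
            warnings.append("port is in the reserved range")
    if _ENCODED_CR_OR_LF.search(rest):
        warnings.append("encoded CR or LF in URL")
    return warnings
```

A gopher URL naming port 25 with encoded line breaks in its
selector, as in the SMTP example above, triggers all three
warnings.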
894
895 The use of URLs containing passwords is clearly unwise.
896
897 Acknowledgements
898
899 This paper builds on the basic W3 design and much discussion of
900 these issues by many people on the network. The discussion was
901 particularly stimulated by articles by Clifford Lynch (1991),
902 Brewster Kahle (1991) and Wengyik Yeong (1991b). Contributions from
903 John Curran (NEARnet), Clifford Neuman (ISI), Ed Vielmetti (MSEN)
904 and later the IETF URL BOF and URI working group have been
905 incorporated into this issue of this paper.
906
907 The draft url4 (Internet Draft 00) was generated from url3
908 following discussion and overall approval of the URL working group
909 on 29 March 1993. The paper url3 had been generated from udi2 in
910 the light of discussion at the UDI BOF meeting at the Boston IETF
911 in July 1992. Draft url5 incorporated changes suggested by
912 Clifford Neuman, and draft url6
913 (ID 01) incorporated character group changes and a few other fixes
914 defined by the IETF URI WG in submitting it as a proposed standard.
915 URL7 (Internet Draft 02) incorporated changes introduced at the
916 Amsterdam IETF and refined in net discussion.
917
918 Draft 03 includes changes made at the Houston IETF in November
919 1993 and on the net before the Seattle IETF in March 1994.
920
921 APPENDICES
922
923 The following are not formally part of this document.
924
929 Wrappers for URIs in plain text
930
931 This section does not formally form part of the URL specification.
932
933 URIs, including URLs, will ideally be transmitted through protocols
934 which accept them and data formats which define a context for them.
935 However, in practice nowadays there are many occasions when URLs
936 are included in plain ASCII non-marked-up text such as electronic
937 mail and usenet news messages.
938
939 In this case, it is convenient to have a separate wrapper syntax to
940 define delimiters which will enable the human or automated reader
941 to recognize that the URI is a URI.
942
943 The recommendation is that the angle brackets (less than and
944 greater than signs) of the ASCII set be used for this purpose.
945
946 These wrappers do not form part of the URL, are not mandatory, and
947 should not be used in contexts (such as SGML parameters, HTTP
948 requests, etc) in which delimiters are already specified.
949
950 Example
951
952 Yes, Jim, I found it under <ftp://info.cern.ch/pub/www/doc> but
953 you can probably pick it up from <ftp://ds.internic.net/rfc>.
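
An automated reader could recognize wrapped URIs along the following
lines. This is a sketch under stated assumptions: the function name
find_wrapped_uris is hypothetical, the scheme is taken to be a letter
followed by letters, digits, "+", "." or "-", and a URI is assumed
not to be broken across lines inside its wrapper.

```python
import re

# "<", a scheme name, ":", then anything up to the closing ">" that is
# not whitespace or another angle bracket.
_WRAPPED = re.compile(r"<([A-Za-z][A-Za-z0-9+.-]*:[^<>\s]*)>")


def find_wrapped_uris(text):
    """Return the URIs found inside angle-bracket wrappers, in order."""
    return _WRAPPED.findall(text)
```

Applied to the example above, this yields the two ftp URLs without
their wrappers. A bracketed mail address such as <uri@bunyip.com> is
not matched, because it contains no scheme followed by a colon.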
954
955
956
957 REFERENCES
958
959 Alberti, R., et al., (1991)
960 "Notes on the Internet Gopher Protocol",
961 University of Minnesota, December 1991,
962 <ftp://boombox.micro.umn.edu/pub/gopher/gopher_protocol>.
963 See also
964 <gopher://gopher.micro.umn.edu/00/Information
965 About Gopher/About Gopher>
966
967 Berners-Lee, T., (1991)
968 "Hypertext Transfer Protocol (HTTP)", CERN,
969 December 1991, as updated from time to time,
970 <ftp://info.cern.ch/pub/www/doc/http-spec.txt>
972
973 Crocker, D. H., (1982)
974 "Standard for the Format of ARPA Internet Text
974 Messages", Internet RFC-822, August 1982.
975
976 Davis, F., et al., (1990)
977 "WAIS Interface Protocol: Prototype
978 Functional Specification", Thinking Machines
979 Corporation, April 23, 1990,
980 <ftp://quake.think.com/pub/wais/doc/protspec.txt>
982
987 International Organization for Standardization, (1991)
988 "Information and Documentation - Search and
989 Retrieve Application Protocol Specification
990 for Open Systems Interconnection", ISO-10163.
991
992 Horton, M., and Adams, R., (1987)
993 "Standard for interchange of USENET messages",
994 Internet RFC-1036, December 1987.
995
996 Huitema, C., (1991) "Naming: strategies and techniques",
997 Computer Networks and ISDN Systems 23 (1991)
998 107-110.
999
1000 Kahle, Brewster, (1991)
1001 "Document Identifiers, or International
1002 Standard Book Numbers for the Electronic
1003 Age",
1004 <ftp://quake.think.com/pub/wais/doc/doc-ids.txt>
1006
1007 Kantor, B., and Lapsley, P., (1986)
1008 "A proposed standard for the stream-based
1009 transmission of news", Internet RFC-977,
1010 February 1986.
1011 <ftp://ds.internic.net/rfc/rfc977.txt>
1012
1013 Kunze, J., (1994) "Requirements for URLs", to be
1014 published.
1015
1016 Lynch, C., Coalition for Networked Information, (1991)
1017 "Workshop on ID and Reference Structures for
1018 Networked Information", November 1991. See
1019 <wais://quake.think.com/wais-discussion-archives?lynch>
1021
1022 Mockapetris, P., (1987)
1023 "Domain names - concepts and facilities",
1024 RFC-1034, USC-ISI, November 1987,
1025 <ftp://ds.internic.net/rfc/rfc1034.txt>
1026
1027 Neuman, B. Clifford, (1992)
1028 "Prospero: A Tool for Organizing Internet
1029 Resources", Electronic Networking: Research,
1030 Applications and Policy, Vol 1 No 2, Meckler
1031 Westport CT USA. See also
1032 <ftp://prospero.isi.edu/pub/prospero/oir.ps>
1033
1034 Postel, J. and Reynolds, J. (1985)
1035 "File Transfer Protocol (FTP)", Internet
1036 RFC-959, October 1985.
1037 <ftp://ds.internic.net/rfc/rfc959.txt>
1038
1039 Sollins, K., and Masinter, L., (1994)
1045 "Requirements for URNs", to be published.
1046
1047 Yeong, W., (1991a) "Towards Networked Information Retrieval",
1048 Technical report 91-06-25-01, June 1991,
1049 Performance Systems International, Inc.
1050 <ftp://uu.psi.com/wp/nir.txt>
1051
1052 Yeong, W., (1991b), "Representing Public Archives in the
1053 Directory", Internet Draft, November 1991,
1054 now expired.
1055
1058 EDITOR'S ADDRESS
1059
1060 Tim Berners-Lee
1061 Address: World-Wide Web project
1062 CERN,
1063 1211 Geneva 23,
1064 Switzerland
1065
1066 Telephone: +41 (22)767 3755
1067 Fax: +41 (22)767 7155
1068 Email: timbl@info.cern.ch
1069