Revision 1.1
Tue Jun 15 08:04:06 2004 UTC (19 years, 11 months ago) by wakaba
Branch: MAIN
CVS Tags: HEAD
File MIME type: text/plain
New

1 Uniform Resource Locators                              T. Berners-Lee
2 draft-ietf-uri-url-07.txt                                 L. Masinter
3 Expires March 13, 1995                                    M. McCahill
4                                                               Editors
5                                                     September 7, 1994
6
7 Uniform Resource Locators (URL)
8
9 Status of this memo
10
11 This document is an Internet-Draft. Internet-Drafts are
12 working documents of the Internet Engineering Task Force
13 (IETF), its areas, and its working groups. Note that other
14 groups may also distribute working documents as
15 Internet-Drafts.
16
17 Internet-Drafts are draft documents valid for a maximum of six
18 months. Internet-Drafts may be updated, replaced, or obsoleted
19 by other documents at any time. It is not appropriate to use
20 Internet-Drafts as reference material or to cite them other
21 than as a ``working draft'' or ``work in progress.''
22
23 To learn the current status of any Internet-Draft, please check
24 the 1id-abstracts.txt listing contained in the Internet-Drafts
25 Shadow Directories on ds.internic.net, nic.nordu.net,
26 ftp.isi.edu, or munnari.oz.au.
27
28 This Internet Draft expires April 7, 1995.
29
30 0. Abstract
31
32 This document specifies a Uniform Resource Locator (URL), the
33 syntax and semantics of formalized information for location and
34 access of resources via the Internet.
35
36 1. Introduction
37
38 The specification of a Uniform Resource Locator, defined in this
39 document, is derived from concepts introduced by the World-Wide Web
40 global information initiative, whose use of such objects dates from
41 1990 and is described in "Universal Resource Identifiers in WWW",
42 RFC 1630. The specification is based on the "Functional
43 Requirements for Internet Resource Locators"[12].
44
45 This document was written by the URI working group of the Internet
46 Engineering Task Force. Comments may be addressed to the editor,
47 Tim Berners-Lee <timbl@info.cern.ch>, or to the URI-WG
48 <uri@bunyip.com>. Discussions of the group are archived at
49 <URL:http://www.acl.lanl.gov/URI/archive/uri-archive.index.html>
50
51 2. Definitions
52
53 This document describes the syntax for "Uniform Resource Locators"
54 (URLs): a compact representation of the location and access method
55 for a resource available via the Internet. Just as there are many
56 different methods of access to resources, there are several
57 _schemes_ for describing the location of such resources.
58
59 The generic syntax provides a framework for new URL schemes to be
60 established using as yet undefined protocols.
61
62 URLs are used to `locate' resources, by providing an abstract
63 identification of the resource location. Having located a
64 resource, a system may perform a variety of operations on the
65 resource, as might be characterized by such words as `access',
66 `update', `replace', `find attributes'. In general, only the
67 `access' method needs to be specified for any URL scheme.
68
69 2.1. URL SYNTAX
70
71 URLs are written as follows:
72
73 <scheme>:<scheme-specific-part>
74
75 A URL contains the name of the scheme being used (<scheme>)
76 followed by a colon and then a string (the <scheme-specific-part>)
77 whose interpretation depends on the scheme.
78
79 Scheme names consist of lower case letters "a"--"z", digits, and
80 the characters plus ("+"), period ("."), and hyphen ("-"). For
81 resiliency, programs interpreting URLs should treat upper case
82 letters as equivalent to lower case in scheme names (e.g., allow
83 "HTTP" as well as "http").
84
85 A BNF description of the URL syntax is given in Section 5.
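
The split described in Section 2.1 can be sketched in a few lines of Python. This is only an illustration: the function name split_url is an assumption of the example, not part of this specification, and the check covers only the scheme name, not the scheme-specific part.

```python
import re

# Scheme names (Section 2.1): lower case letters, digits, "+", "." and "-".
# Upper case letters are accepted on input and folded to lower case,
# as the text recommends for resiliency.
_SCHEME_RE = re.compile(r"[A-Za-z0-9+.-]+")


def split_url(url):
    """Split a URL into (scheme, scheme_specific_part).

    Hypothetical helper for illustration only.
    """
    scheme, colon, rest = url.partition(":")
    if not colon or not _SCHEME_RE.fullmatch(scheme):
        raise ValueError("not a URL: %r" % (url,))
    return scheme.lower(), rest
```

Folding the scheme to lower case at this point means later code can compare scheme names with simple equality.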
86
87 2.2. Encoding of reserved and unsafe characters
88
89 URLs are represented as a sequence of characters taken from the NVT
90 ASCII character set.
91
92 Characters and other 8-bit bytes are _encoded_ by representing them
93 as a percent sign "%" followed by two hexadecimal digits (0-9,
94 A-F).
95
96 In any circumstance, only printable ASCII characters are allowed in
97 URLs: URLs may not contain space or other non-printable characters.
98 If it is necessary to designate a byte in a component of a URL that
99 would otherwise be represented by a space or a non-printable
100 character, it is necessary to represent that byte encoded.
101
102 There are a number of printable ASCII characters whose use in URLs
103 is _unsafe_; characters can be unsafe for a number of reasons. The
104 characters "<" and ">" are unsafe because they are used as the
105 delimiters around URLs in free text; the quote mark (""") is used
106 to delimit URLs in some systems. The character "#" is unsafe and
107 should always be encoded because it is used in World Wide Web and
108 in other systems to delimit a URL from a fragment/anchor identifier
109 that might follow it. The character "%" is unsafe because it is
110 used for encodings of other characters. Other characters are
111 unsafe because gateways and other transport agents are sometimes
112 known to modify such characters.
113
114 All unsafe characters should always be encoded within a URL. For
115 example, the character "#" should be encoded within URLs even in
116 systems that do not normally deal with fragment or anchor
117 identifiers, so that if the URL is copied into another system that
118 does use them, it will not be necessary to change the URL encoding.
119
120 In general, only alphanumerics, reserved characters used for their
121 reserved purposes, "$", "-", "_", ".", and "+" may be used
122 unencoded.
123
124 On the other hand, even safe characters such as alphanumerics _may_
125 be encoded, as long as they are not being used for a reserved
126 purpose.
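
A minimal sketch of the encoding rule above, in Python. The helper names encode_component and decode_component are assumptions of this example. The encoder treats every reserved character as data, so it is only suitable for a single component in which reserved characters have no special meaning.

```python
import string

# Characters that Section 2.2 allows unencoded: alphanumerics and "$-_.+".
# Reserved characters are deliberately left out (see the note above).
SAFE = frozenset(string.ascii_letters + string.digits + "$-_.+")
HEX = frozenset(string.hexdigits)


def encode_component(data):
    """Percent-encode a bytes object; hypothetical helper."""
    return "".join(chr(b) if chr(b) in SAFE else "%%%02X" % b for b in data)


def decode_component(text):
    """Reverse encode_component, returning bytes; hypothetical helper."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "%":
            pair = text[i + 1:i + 3]
            if len(pair) != 2 or not set(pair) <= HEX:
                raise ValueError("malformed escape at offset %d" % i)
            out.append(int(pair, 16))
            i += 3
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)
```

Because safe characters may also be encoded, "%41" and "A" decode to the same byte; only reserved characters change meaning when encoded.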
127
128 Many URL schemes reserve certain characters for a special meaning;
129 their appearance in the scheme-specific part of the URL has a
130 designated semantics. If it is necessary to designate a byte in a
131 component of a URL that would otherwise be represented by a
132 reserved character, it is necessary to represent that byte encoded.
133 The characters ";", "/", "?", ":", "@", "=" and "&" are the
134 characters which may be reserved for special meaning within a
135 scheme. No other characters may be reserved within a scheme.
136
137 Usually, a URL has the same interpretation when a byte is
138 represented by a character and when it is represented by its hex
139 encoding. However, this is not true for reserved characters:
140 encoding a reserved character for a particular scheme may change
141 the semantics of a URL.
142
143 2.3. Hierarchical schemes and relative links
144
145 In some cases, URLs are used to locate resources that contain
146 pointers to _other_ resources. In some cases, those pointers are
147 represented as _relative links_ where the expression of the
148 location of the second resource is in terms of "in the same place
149 as this one except with the following relative path". Relative
150 links are not described in this document. However, the use of
151 relative links depends on the original URL containing a
152 hierarchical structure against which the relative link is based.
153
154 Some URL schemes (such as the ftp, http, and file schemes) contain
155 names that can be considered hierarchical; the components of the
156 hierarchy are separated by "/".
157
158 3. Specific Schemes
159
160 The mapping for some existing standard and experimental protocols
161 is outlined in the BNF syntax definition. Notes on particular
162 protocols follow. The schemes covered are:
163
164 ftp        File Transfer Protocol
165 http       Hypertext Transfer Protocol
166 gopher     The Gopher protocol
167 mailto     Electronic mail address
168 news       USENET news
169 nntp       USENET news using NNTP access
170 telnet     Reference to interactive sessions
171 wais       Wide Area Information Servers
172 file       Host-specific file names
173 prospero   Prospero Directory Service
174
175 Other schemes may be specified by future specifications. Section 4
176 of this document describes how new schemes may be registered, and
177 lists some scheme names that are under development.
178
179 3.1. Common Internet Scheme Syntax
180
181 While the syntax for the rest of the URL may vary depending on the
182 particular scheme selected, URL schemes that involve the direct use
183 of an IP-based protocol to a specified host on the Internet use a
184 common syntax for the initial part of the scheme-specific data:
185
186 //<user>:<password>@<host>:<port>
187 or
188 //<user>:<password>@<host>:<port>/<url-path>
189
190 This initial part starts with a double slash "//" to indicate its
191 presence, and continues until the following slash "/", if any.
192 Within this section are:
193
194 user
195 An optional user name. Some schemes (e.g., ftp) allow the
196 specification of a user name.
197
198 password
199 An optional password. If present, it follows the user
200 name separated from it by a colon.
201
202 The user name (and password), if present, are followed by a
203 commercial at-sign "@". Within the user and password field, any
204 ":", "@", or "/" must be encoded.
205
206 Note that an empty user name or password is different than no user
207 name or password; there is no way to specify a password without
208 specifying a user name. E.g., <URL:ftp://@host.com/> has an empty
209 user name and no password, <URL:ftp://host.com/> has no user name,
210 while <URL:ftp://foo:@host.com/> has a user name of "foo" and an
211 empty password.
212
213 host
214 The fully qualified domain name of a network host, or its IP
215 address as a set of four decimal digit groups separated by periods.
216 Fully qualified domain names take the form described in
217 Section 3.5 of RFC 1034: a sequence of parts separated by
218 periods.
219
220 port
221 The port number to connect to. Most schemes designate
222 protocols that have a default port number. Another port number
223 may optionally be supplied, in decimal, separated from the
224 host by a colon. If the port is omitted, the colon is as well.
225
226 url-path
227 The rest of the locator consists of data specific to the
228 scheme, and is known as the "url-path". It supplies the
229 details of how the specified resource can be accessed. Note
230 that the "/" between the host (or port) and the url-path is
231 NOT part of the url-path.
232
233 The url-path syntax depends on the scheme being used, as does the
234 manner in which it is interpreted.
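
The common syntax of Section 3.1 can be taken apart as sketched below. The function name parse_ip_schemepart and the dictionary layout are assumptions of this example. Absent components are returned as None and empty ones as "", which preserves the distinction the text draws between an empty user name and no user name. Components are returned still encoded.

```python
def parse_ip_schemepart(schemepart):
    """Parse "//[user[:password]@]host[:port][/url-path]".

    Hypothetical helper; returns a dict of still-encoded components.
    """
    if not schemepart.startswith("//"):
        raise ValueError("missing leading //")
    # The login part runs up to the first "/", if any.  That "/" is a
    # delimiter and is NOT part of the url-path.
    login, slash, url_path = schemepart[2:].partition("/")

    user = password = None
    hostport = login
    if "@" in login:
        # Any "@" inside user or password must be encoded, so the first
        # literal "@" ends the user information.
        userinfo, _, hostport = login.partition("@")
        user, colon, pw = userinfo.partition(":")
        password = pw if colon else None

    host, colon, port = hostport.partition(":")
    return {
        "user": user,
        "password": password,
        "host": host,
        # int("") raises ValueError: a colon without a port is rejected.
        "port": int(port) if colon else None,
        "url_path": url_path if slash else None,
    }
```

A port of None means "use the default for the scheme"; the defaults themselves are given in the scheme-specific sections.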
235
236 3.2. FTP
237
238 The FTP URL scheme is used to designate files and directories on
239 Internet hosts accessible using the FTP protocol (RFC959).
240
241 An FTP URL follows the syntax described in Section 3.1. If :<port>
242 is omitted, the port defaults to 21.
243
244 3.2.1. FTP Name and Password
245
246 A user name and password may be supplied; they are used in the ftp
247 "USER" and "PASS" commands after first making the connection to the
248 FTP server. If no user name or password is supplied and one is
249 requested by the FTP server, the conventions for "anonymous" FTP
250 are to be used, as follows:
251
252 The user name "anonymous" is supplied.
253
254 The password is supplied as the Internet e-mail address
255 of the end user accessing the resource.
256
257 If the URL supplies a user name but no password, and the remote
258 server requests a password, the program interpreting the FTP URL
259 should request one from the user.
260
261 3.2.2. FTP url-path
262
263 The url-path of an FTP URL has the following syntax:
264
265 <cwd1>/<cwd2>/.../<cwdN>/<name>;type=<typecode>
266
267 Where <cwd1> through <cwdN> and <name> are (possibly encoded)
268 strings and <typecode> is one of the characters "a", "i", or "d".
269
270 The url-path is interpreted as a series of FTP commands as follows:
271
272 Each of the <cwd> elements is to be supplied, sequentially, as
273 the argument to a CWD (change working directory) command.
274
275 If the typecode is "d", perform a NLST (name list) command with
276 <name> as the argument, and interpret the results as a file
277 directory listing.
278
279 Otherwise, perform a TYPE command with <typecode> as the
280 argument, and then access the file whose name is <name> (for
281 example, using the RETR command.)
282
283 Within a name or CWD component, the characters "/" and ";" are
284 reserved and must be encoded. The components are decoded prior to
285 their use in the FTP protocol. In particular, if the appropriate
286 FTP sequence to access a particular file requires supplying a
287 string containing a "/" as an argument to a CWD or RETR command, it
288 is necessary to encode each "/" as %2F.
289
290 For example, the URL <URL:ftp://myname@host.dom/%2Fetc/motd> is
291 interpreted by FTP-ing to "host.dom", logging in as "myname"
292 (prompting for a password if it is asked for), and then executing
293 "CWD /etc" and then "RETR motd". This has a different meaning from
294 <URL:ftp://myname@host.dom/etc/motd> which would "CWD etc" and then
295 "RETR motd"; the initial "CWD" might be executed relative to the
296 default directory for "myname". On the other hand,
297 <URL:ftp://myname@host.dom//etc/motd>, would "CWD " with a null
298 argument, then "CWD etc", and then "RETR motd".
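
The interpretation above can be sketched as a function that turns an FTP url-path into a command list. The name ftp_commands is an assumption of this example, and urllib.parse.unquote is used purely as a convenient percent-decoder. The important detail is that the path is split on "/" before decoding, so an encoded "%2F" stays inside its component.

```python
from urllib.parse import unquote


def ftp_commands(url_path):
    """Map an FTP url-path to the FTP commands of Section 3.2.2.

    Hypothetical helper.  url_path excludes the "/" that follows the
    host (or port).
    """
    path, sep, typecode = url_path.partition(";type=")
    if sep and typecode not in ("a", "i", "d"):
        raise ValueError("unknown typecode: %r" % (typecode,))

    # Split first, decode afterwards.
    *cwds, name = path.split("/")
    commands = ["CWD " + unquote(c, encoding="latin-1") for c in cwds]
    name = unquote(name, encoding="latin-1")

    if typecode == "d":
        commands.append("NLST " + name)
    else:
        if sep:
            # Written in upper case here; RFC 959 treats case as equivalent.
            commands.append("TYPE " + typecode.upper())
        # Without ;type= no TYPE is issued; the client must guess the mode.
        commands.append("RETR " + name)
    return commands
```

For the three example URLs above, the url-paths are "%2Fetc/motd", "etc/motd" and "/etc/motd" respectively, and the function yields the command sequences given in the text.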
299
300 FTP URLs may also be used for other operations; for example, it is
301 possible to update a file on a remote file server, or infer
302 information about it from the directory listings. The mechanism for
303 doing so is not spelled out here.
304
305 3.2.3. FTP Typecode is Optional
306
307 The entire ;type=<typecode> part of an FTP URL is optional. If it is
308 omitted, the client program interpreting the URL must guess the
309 appropriate mode to use. In general, the data content type of a
310 file can only be guessed from the name, e.g., from the suffix of
311 the name; the appropriate type code to be used for transfer of the
312 file can then be deduced from the data content of the file.
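
One possible guessing heuristic, sketched in Python. Everything here is an assumption of the example: the function name, the suffix table, and the choice to fall back to "i" (image) when the suffix is unknown. This specification does not prescribe any particular table.

```python
import os.path

# Hypothetical table of suffixes treated as text; a real client would
# carry a much longer, configurable list.
TEXT_SUFFIXES = frozenset([".txt", ".html", ".ps", ".c", ".h"])


def guess_ftp_typecode(name):
    """Guess "a" (ASCII) or "i" (image) from a file name suffix."""
    suffix = os.path.splitext(name)[1].lower()
    return "a" if suffix in TEXT_SUFFIXES else "i"
```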
313
314 3.2.4. Hierarchy
315
316 For some file systems, the "/" used to denote the hierarchical
317 structure of the URL corresponds to the delimiter used to construct
318 a file name hierarchy, and thus, the filename will look similar to
319 the URL path. This does NOT mean that the URL is a Unix filename.
320
321 3.2.5. Optimization
322
323 Clients accessing resources via FTP may employ additional
324 heuristics to optimize the interaction. For some FTP servers, for
325 example, it may be reasonable to keep the control connection open
326 while accessing multiple URLs from the same server. However, there
327 is no common hierarchical model to the FTP protocol, so if a
328 directory change command has been given, it is impossible in
329 general to deduce what sequence should be given to navigate to
330 another directory for a second retrieval, if the paths are
331 different. The only reliable algorithm is to disconnect and
332 reestablish the control connection.
333
334 3.3. HTTP
335
336 The HTTP URL scheme is used to designate Internet resources
337 accessible using HTTP (HyperText Transfer Protocol).
338
339 The HTTP protocol is specified elsewhere. This specification only
340 describes the syntax of HTTP URLs.
341
342 An HTTP URL takes the form:
343
344 http://<host>:<port>/<path>?<searchpart>
345
346 where <host> and <port> are as described in Section 3.1. If :<port>
347 is omitted, the port defaults to 80. No user name or password is
348 allowed. <path> is an HTTP selector, and <searchpart> is a query
349 string. The <path> is optional, as is the <searchpart> and its
350 preceding "?". If neither <path> nor <searchpart> is present, the
351 "/" may also be omitted.
352
353 Within the <path> and <searchpart> components, "/", ";", "?" are
354 reserved. The "/" character may be used within HTTP to designate a
355 hierarchical structure.
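
A minimal sketch of an HTTP URL parser that follows the form above. The name parse_http_url and the returned dictionary are assumptions of this example. The path is returned without the delimiting "/", in line with Section 3.1.

```python
def parse_http_url(url):
    """Parse http://<host>:<port>/<path>?<searchpart>; hypothetical helper."""
    prefix = "http://"
    if url[:len(prefix)].lower() != prefix:
        raise ValueError("not an http URL")
    hostport, slash, tail = url[len(prefix):].partition("/")
    if "@" in hostport:
        raise ValueError("no user name or password is allowed")
    if "?" in hostport:
        raise ValueError('a searchpart requires the "/" after the host')
    host, colon, port = hostport.partition(":")
    path, question, search = tail.partition("?")
    return {
        "host": host,
        "port": int(port) if colon else 80,   # default port is 80
        "path": path if slash else None,
        "searchpart": search if question else None,
    }
```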
356
357 3.4. GOPHER
358
359 The Gopher URL scheme is used to designate Internet resources
360 accessible using the Gopher protocol.
361
362 The base Gopher protocol is described in RFC 1436 and supports
363 items and collections of items (directories). The Gopher+ protocol
364 is a set of upward compatible extensions to the base Gopher
365 protocol and is described in [2]. Gopher+ supports associating
366 arbitrary sets of attributes and alternate data representations
367 with Gopher items. Gopher URLs accommodate both Gopher and Gopher+
368 items and item attributes.
369
370 3.4.1. Gopher URL syntax
371
372 A Gopher URL takes the form:
373
374 gopher://<host>:<port>/<gopher-path>
375
376 where <gopher-path> is one of
377
378 <gophertype><selector>
379 <gophertype><selector>%09<search>
380 <gophertype><selector>%09<gopher+_string>
381 <gophertype><selector>%09<search>%09<gopher+_string>
382
383 If :<port> is omitted, the port defaults to 70. <gophertype> is a
384 single-character field to denote the Gopher type of the resource to
385 which the URL refers. The entire <gopher-path> may also be empty,
386 in which case the delimiting "/" is also optional and the
387 <gophertype> defaults to "1".
388
389 <selector> is the Gopher selector string. In the Gopher protocol,
390 Gopher selector strings are a sequence of 8-bit bytes which may
391 contain any characters other than tab, return, or linefeed. Gopher
392 clients specify which item to retrieve by sending the Gopher
393 selector string to a Gopher server.
394
395 Within the <gopher-path>, no additional characters have a reserved
396 interpretation.
397
398 Note that some Gopher <selector> strings begin with a copy of the
399 <gophertype> character, in which case that character will occur
400 twice consecutively. The Gopher selector string may be an empty
401 string; this is how Gopher clients refer to the top-level directory
402 on a Gopher server.
403
404 3.4.2. Specifying URLs for Gopher Search Engines
405
406 If the URL refers to a search to be submitted to a Gopher search
407 engine, the selector is followed by an encoded tab (%09) and the
408 search string. To submit a search to a Gopher search engine, the
409 Gopher client sends the <selector> string (after decoding), a tab,
410 and the search string to the Gopher server.
411
412 3.4.3. URL syntax for Gopher+ items
413
414 URLs for Gopher+ items have a second encoded tab (%09) and a
415 Gopher+ string. Note that in this case, the %09<search> string must
416 be supplied, although the <search> element may be the empty string.
417
418 The <gopher+_string> is used to represent information required for
419 retrieval of the Gopher+ item. Gopher+ items may have alternate
420 views, arbitrary sets of attributes, and may have electronic forms
421 associated with them.
422
423 To retrieve the data associated with a Gopher+ URL, a client will
424 connect to the server and send the Gopher selector, followed
425 optionally by a tab and the search string (if the <search> element
426 is not empty), followed by a tab and the Gopher+ commands.
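
The handling of <gopher-path> described so far can be sketched as follows. The names parse_gopher_path and gopher_request are assumptions of this example. The sketch follows Section 3.4.3 in always treating the second tab-separated field as <search>, and it appends the CRLF line terminator of RFC 1436 only when the request does not already end with one; that last choice is an assumption, not something this document states.

```python
from urllib.parse import unquote_to_bytes


def parse_gopher_path(gopher_path):
    """Split a <gopher-path> into its decoded parts; hypothetical helper."""
    if gopher_path == "":
        # Empty path: the gophertype defaults to "1".
        return {"gophertype": "1", "selector": b"",
                "search": None, "gopher_plus": None}
    gophertype, rest = gopher_path[0], gopher_path[1:]
    # Split on the encoded tab BEFORE decoding.  At most two splits,
    # because the Gopher+ string may itself contain encoded tabs.
    fields = rest.split("%09", 2)
    decoded = [unquote_to_bytes(f) for f in fields]
    return {
        "gophertype": gophertype,
        "selector": decoded[0],
        "search": decoded[1] if len(decoded) > 1 else None,
        "gopher_plus": decoded[2] if len(decoded) > 2 else None,
    }


def gopher_request(parsed):
    """Build the bytes a client would send; hypothetical helper."""
    parts = [parsed["selector"]]
    if parsed["search"]:                    # skipped when absent or empty
        parts.append(parsed["search"])
    if parsed["gopher_plus"] is not None:
        parts.append(parsed["gopher_plus"])
    request = b"\t".join(parts)
    if not request.endswith(b"\r\n"):
        request += b"\r\n"
    return request
```

Selectors are kept as bytes throughout because the Gopher protocol defines them as sequences of 8-bit bytes rather than text in any particular character set.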
427
428 3.4.4. Default Gopher+ data representation
429
430 When a Gopher server returns a directory listing to a client, the
431 Gopher+ items are tagged with either a "+" (denoting Gopher+ items)
432 or a "?" (denoting Gopher+ items which have a +ASK form associated
433 with them). A Gopher URL with a Gopher+ string consisting of only
434 a "+" refers to the default view (data representation) of the item
435 while a Gopher+ string containing only a "?" refers to an item with
436 a Gopher electronic form associated with it.
437
438 3.4.5. Gopher+ items with electronic forms
439
440 Gopher+ items which have a +ASK associated with them (i.e. Gopher+
441 items tagged with a "?") require the client to fetch the item's
442 +ASK attribute to get the form definition, and then ask the user to
443 fill out the form and return the user's responses along with the
444 selector string to retrieve the item. Gopher+ clients know how to
445 do this but depend on the "?" tag in the Gopher+ item description
446 to know when to handle this case. The "?" is used in the Gopher+
447 string to be consistent with Gopher+ protocol's use of this symbol.
448
449 3.4.6. Gopher+ item attribute collections
450
451 To refer to the Gopher+ attributes of an item, the Gopher URL's
452 Gopher+ string consists of "!" or "$". "!" refers to all of a
453 Gopher+ item's attributes. "$" refers to all the item attributes for
454 all items in a Gopher directory.
455
456 3.4.7. Referring to specific Gopher+ attributes
457
458 To refer to specific attributes, the URL's gopher+_string is
459 "!attribute_name" or "$attribute_name". For example, to refer to
460 the attribute containing the abstract of an item, the
461 gopher+_string would be "!+ABSTRACT".
462
463 To refer to several attributes, the gopher+_string consists of
464 the attribute names separated by coded spaces. For example,
465 "!+ABSTRACT%20+SMELL" refers to the +ABSTRACT and +SMELL attributes
466 of an item.
467
468 3.4.8. URL syntax for Gopher+ alternate views
469
470 Gopher+ allows for optional alternate data representations
471 (alternate views) of items. To retrieve a Gopher+ alternate view,
472 a Gopher+ client sends the appropriate view and language
473 identifier (found in the item's +VIEW attribute). To refer to a
474 specific Gopher+ alternate view, the URL's Gopher+ string would
475 be in the form:
476
477 +view_name%20language_name
478
479 For example, a Gopher+ string of "+application/postscript%20Es_ES"
480 refers to the Spanish language postscript alternate view of a
481 Gopher+ item.
482
483 3.4.9. URL syntax for Gopher+ electronic forms
484
485 The gopher+_string for a URL that refers to an item referenced by
486 a Gopher+ electronic form (an ASK block) filled out with specific
487 values is a coded version of what the client sends to the server.
488 The gopher+_string is of the form:
489
490 +%091%0D%0A+-1%0D%0Aask_item1_value%0D%0Aask_item2_value%0D%0A.%0D%0A
491
492 To retrieve this item, the Gopher client sends:
493
494 a_gopher_selector<tab>+<tab>1<cr><lf>
495 +-1<cr><lf>
496 ask_item1_value<cr><lf>
497 ask_item2_value<cr><lf>
498 .<cr><lf>
499
500 to the Gopher server.
501
502 3.5. MAILTO
503
504 The mailto URL scheme is used to designate the Internet mailing
505 address of an individual or service. No additional information
506 other than an Internet mailing address is present or implied.
507
508 A mailto URL takes the form:
509
510 mailto:<rfc822-addr-spec>
511
512 where <rfc822-addr-spec> is (the encoding of an) addr-spec, as
513 specified in RFC 822. Within mailto URLs, no additional characters
514 are reserved within the <rfc822-addr-spec> component.
515
516 Note that the percent sign ("%") is commonly used within RFC 822
517 addresses and must be URL-encoded.
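
As an illustration of the note above, the following sketch builds a mailto URL from an RFC 822 addr-spec. The function name mailto_url is an assumption of this example, as is the decision to leave "@" unencoded (no additional characters are reserved in this scheme); every other character outside the unencoded set of Section 2.2 is percent-encoded.

```python
import string

# Alphanumerics and "$-_.+" per Section 2.2, plus "@", which this sketch
# chooses to leave readable.
_UNENCODED = frozenset(string.ascii_letters + string.digits + "$-_.+@")


def mailto_url(addr_spec):
    """Build a mailto URL from an ASCII addr-spec; hypothetical helper."""
    encoded = "".join(
        chr(b) if chr(b) in _UNENCODED else "%%%02X" % b
        for b in addr_spec.encode("ascii")
    )
    return "mailto:" + encoded
```

An address that uses "%" for source routing, such as user%host@relay.example, therefore appears in the URL with the "%" written as "%25".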
518
519 Unlike many URLs, the mailto scheme does not represent a data
520 object to be accessed directly; there is no sense in which it
521 designates an object. It has a different use than the
522 message/external-body type in MIME.
523
524 3.6. NEWS
525
526 The news URL scheme is used to refer to either news groups or
527 individual articles of USENET news, as specified in RFC 1036.
528
529 A news URL takes one of two forms:
530
531 news:<newsgroup-name>
532 news:<message-id>
533
534 A <newsgroup-name> is a period-delimited hierarchical name, such as
535 "comp.infosystems.www.misc". A <message-id> corresponds to the
536 Message-ID of section 2.1.5 of RFC 1036, without the enclosing "<"
537 and ">"; it takes the form <unique>@<full_domain_name>. A message
538 identifier may be distinguished from a news group name by the
539 presence of the commercial at "@" character. No additional
540 characters are reserved within the components of a news URL.
541
542 If <newsgroup-name> is "*" (as in <URL:news:*>), it is used to
543 refer to "all available news groups".
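
The distinction between the forms of a news URL can be sketched as follows. The function name classify_news_url and the labels it returns are assumptions of this example.

```python
def classify_news_url(url):
    """Classify a news URL; hypothetical helper.

    Returns a (kind, value) pair where kind is "message-id",
    "all-groups" or "newsgroup".
    """
    scheme, colon, rest = url.partition(":")
    if not colon or scheme.lower() != "news":
        raise ValueError("not a news URL")
    if "@" in rest:
        # Only a message identifier contains the commercial at sign.
        return ("message-id", rest)
    if rest == "*":
        return ("all-groups", rest)
    return ("newsgroup", rest)
```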
544
545 The news URLs are unusual in that by themselves, they do not
546 contain sufficient information to locate a single resource, but,
547 rather, are location-independent.
548
549 3.7. NNTP
550
551 The nntp URL scheme is an alternative method of referencing news
552 articles, useful for specifying news articles from NNTP servers
553 (RFC 977).
554
555 An nntp URL takes the form:
556
557 nntp://<host>:<port>/<newsgroup-name>/<article-number>
558
559 where <host> and <port> are as described in Section 3.1. If :<port>
560 is omitted, the port defaults to 119.
561
562 The <newsgroup-name> is the name of the group, while the
563 <article-number> is the numeric id of the article within that
564 newsgroup.
565
566 Note that while nntp: URLs specify a unique location for the
567 article resource, most NNTP servers currently on the Internet
568 are configured only to allow access from local clients, and thus
569 nntp URLs do not designate globally accessible resources. Thus, the
570 news: form of URL is preferred as a way of identifying news
571 articles.
572
573 3.8. TELNET
574
575 The Telnet URL scheme is used to designate interactive services
576 that may be accessed by the Telnet protocol.
577
578 A telnet URL takes the form:
579
580 telnet://<user>:<password>@<host>:<port> [ / ]
581
582 as specified in Section 3.1. The port defaults to 23; the <user>
583 and <password> segments are completely optional (a <password>
584 requires a <user> element.)
585
586 This URL does not designate a data object, but rather an
587 interactive service. Remote interactive services vary widely in the
588 means by which they allow remote logins; in practice, the <user>
589 and <password> supplied are advisory only: clients accessing a
590 telnet URL merely advise the user of the suggested username and
591 password.
592
593 3.9. WAIS
594
595 The WAIS URL scheme is used to designate WAIS databases, searches,
596 or individual documents available from a WAIS database. WAIS is
597 described in [6]; the WAIS protocol is described in RFC 1625 [17].
598
599 A WAIS URL takes one of the following forms:
600
601 wais://<host>:<port>/<database>
602 wais://<host>:<port>/<database>?<search>
603 wais://<host>:<port>/<database>/<wtype>/<wpath>
604
605 where <host> and <port> are as described in Section 3.1. If :<port>
606 is omitted, the port defaults to 210. The first form designates a
607 WAIS database that is available for searching. The second form
608 designates a particular search. <database> is the name of the WAIS
609 database being queried.
610
611 The third form designates a particular document within a WAIS
612 database to be retrieved. In this form <wtype> is the WAIS
613 designation of the type of the object. Many WAIS implementations
614 require that a client know the "type" of an object prior to
615 retrieval, the type being returned along with the internal object
616 identifier in the search response. The <wtype> is included in the
617 URL in order to allow the client interpreting the URL adequate
618 information to actually retrieve the document.
619
620 The <wpath> of a WAIS URL consists of the WAIS document-id, encoded
621 as necessary using the method described in Section 2.2. The WAIS
622 document-id should be treated opaquely; it may only be decomposed
623 by the server that issued it.
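
The three forms of a WAIS URL can be told apart as sketched below. The function name parse_wais_url and the "form" labels are assumptions of this example. The <wpath> is returned still encoded, since the document-id is opaque to the client.

```python
def parse_wais_url(url):
    """Parse a WAIS URL into one of three forms; hypothetical helper."""
    prefix = "wais://"
    if url[:len(prefix)].lower() != prefix:
        raise ValueError("not a wais URL")
    hostport, _, rest = url[len(prefix):].partition("/")
    host, colon, port = hostport.partition(":")
    result = {"host": host, "port": int(port) if colon else 210}

    q, s = rest.find("?"), rest.find("/")
    if q != -1 and (s == -1 or q < s):
        # Second form: <database>?<search>
        database, _, search = rest.partition("?")
        result.update(form="search", database=database, search=search)
    elif s != -1:
        # Third form: <database>/<wtype>/<wpath>
        parts = rest.split("/", 2)
        if len(parts) != 3:
            raise ValueError("expected <database>/<wtype>/<wpath>")
        result.update(form="document", database=parts[0],
                      wtype=parts[1], wpath=parts[2])
    else:
        # First form: a database available for searching
        result.update(form="database", database=rest)
    return result
```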
624
625 3.10. FILES
626
627 The file URL scheme is used to designate files accessible on
628 a particular host computer. This scheme, unlike most other
629 URL schemes, does not designate a resource that is universally
630 accessible over the Internet.
631
632 A file URL takes the form:
633
634 file://<host>/<path>
635
636 where <host> is the fully qualified domain name of the system on
637 which the <path> is accessible, and <path> is a hierarchical
638 directory path of the form <directory>/<directory>/.../<name>.
639
640 For example, a VMS file
641
642 DISK$USER:[MY.NOTES]NOTE123456.TXT
643
644 might become
645
646 <URL:file://vms.host.edu/disk$user/my/notes/note123456.txt>
647
648 As a special case, <host> can be the string "localhost" or the
649 empty string; this is interpreted as `the machine from which the
650 URL is being interpreted'.
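
A minimal sketch of a file URL parser, including the "localhost" and empty-host special case. The function name parse_file_url and the shape of its return value are assumptions of this example. Path segments are left encoded.

```python
def parse_file_url(url):
    """Parse file://<host>/<path>; hypothetical helper.

    Returns (host, segments, is_local), where is_local is True when the
    host is empty or "localhost".
    """
    prefix = "file://"
    if url[:len(prefix)].lower() != prefix:
        raise ValueError("not a file URL")
    host, slash, path = url[len(prefix):].partition("/")
    if not slash:
        raise ValueError('missing "/" after host')
    is_local = host == "" or host.lower() == "localhost"
    return host, path.split("/"), is_local
```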
651
652 The file URL scheme is unusual in that it does not specify an
653 Internet protocol or access method for such files; as such, its
654 utility in network protocols between hosts is limited.
655
656 3.11. PROSPERO
657
658 The Prospero URL scheme is used to designate resources that are
659 accessed via the Prospero Directory Service. The Prospero protocol
660 is described elsewhere [14].
661
662 A Prospero URL takes the form:
663
664 prospero://<host>:<port>/<hsoname>;<field>=<value>
665
666 where <host> and <port> are as described in Section 3.1. If :<port>
667 is omitted, the port defaults to 1525. No username or password is
668 allowed.
669
670 The <hsoname> is the host-specific object name in the Prospero
671 protocol, suitably encoded. This name is opaque and interpreted by
672 the Prospero server. The semicolon ";" is reserved and may not
673 appear without quoting in the <hsoname>.
674
675 Prospero URLs are interpreted by contacting a Prospero directory
676 server on the specified host and port to determine appropriate
677 access methods for a resource, which might themselves be
678 represented as different URLs. External Prospero links are
679 represented as URLs of the underlying access method and are not
680 represented as Prospero URLs.
681
682 Note that a slash "/" may appear in the <hsoname> without quoting
683 and no significance may be assumed by the application. Though
684 slashes may indicate hierarchical structure on the server, such
685 structure is not guaranteed. Note that many <hsoname>s begin with a
686 slash, in which case the host or port will be followed by a double
687 slash: the slash from the URL syntax, followed by the initial slash
688 from the <hsoname>. (E.g., <URL:prospero://host.dom//pros/name>
689 designates a <hsoname> of "/pros/name".)
690
691 In addition, after the <hsoname>, optional fields and values
692 associated with a Prospero link may be specified as part of the
693 URL. When present, each field/value pair is separated from each
694 other and from the rest of the URL by a ";" (semicolon). The name
695 of the field and its value are separated by a "=" (equal sign). If
696 present, these fields serve to identify the target of the URL. For
697 example, the OBJECT-VERSION field can be specified to identify a
698 specific version of an object.
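
The structure of a Prospero URL can be sketched as follows. The function name parse_prospero_url and the returned dictionary are assumptions of this example, and urllib.parse.unquote is used only as a convenient percent-decoder. The URL is split on ";" and "=" before decoding, so encoded occurrences of those characters remain part of the <hsoname> or of a field value.

```python
from urllib.parse import unquote


def parse_prospero_url(url):
    """Parse prospero://<host>:<port>/<hsoname>;<field>=<value>...

    Hypothetical helper.
    """
    prefix = "prospero://"
    if url[:len(prefix)].lower() != prefix:
        raise ValueError("not a prospero URL")
    hostport, _, rest = url[len(prefix):].partition("/")
    if "@" in hostport:
        raise ValueError("no user name or password is allowed")
    host, colon, port = hostport.partition(":")

    # The first ";"-separated part is the hsoname; the rest are fields.
    hsoname, *pairs = rest.split(";")
    fields = {}
    for pair in pairs:
        name, equals, value = pair.partition("=")
        if not equals:
            raise ValueError("field without a value: %r" % (pair,))
        fields[unquote(name)] = unquote(value)

    return {
        "host": host,
        "port": int(port) if colon else 1525,   # default port is 1525
        "hsoname": unquote(hsoname),
        "fields": fields,
    }
```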
699
700 4. REGISTRATION OF NEW SCHEMES
701
702 A new scheme may be introduced by defining a mapping onto a
703 conforming URL syntax, using a new prefix. Experimental prefixes
704 may be used by mutual agreement between parties. Scheme names
705 starting with the characters "x-" are reserved for experimental
706 purposes.
707
708 The Internet Assigned Numbers Authority (IANA) will maintain a
709 registry of URL schemes. Any submission of a new URL scheme must
710 include a definition of an algorithm for accessing resources
711 within that scheme and the syntax for representing such a scheme.
712
713 URL schemes must have demonstrable utility and operability. One
714 way to provide such a demonstration is via a gateway which provides
715 objects in the new scheme for clients using an existing protocol.
716 If the new scheme does not locate resources that are data objects,
717 the properties of names in the new space must be clearly defined.
718
719 New schemes should try to follow the same syntactic conventions of
720 existing schemes, where appropriate. It is likewise recommended
721 that, where a protocol allows for retrieval by URL, the client
722 software have provision for being configured to use specific
723 gateway locators for indirect access through new naming schemes.
724
725 The following schemes have been proposed at various times, but this
726 document does not define their syntax or use at this time. It is
727 suggested that IANA reserve their scheme names for future
728 definition:
729
730 afs Andrew File System global file names.
731 mid Message identifiers for electronic mail.
732 cid Content identifiers for MIME body parts.
733 nfs Network File System (NFS) file names.
734 tn3270 Interactive 3270 emulation sessions.
735 mailserver Access to data available from mail servers.
736 z39.50 Access to ANSI Z39.50 services.
737
738 5. BNF for specific URL schemes
739
740 This is a BNF-like description of the Uniform Resource Locator
741 syntax, using the conventions of RFC822, except that "|" is used to
742 designate alternatives, and brackets [] are used around optional or
743 repeated elements. Briefly, literals are quoted with "", optional
744 elements are enclosed in [brackets], and elements may be preceded
745 with <n>* to designate n or more repetitions of the following
746 element; n defaults to 0.
747
748 ; The generic form of a URL is:
749
750 genericurl = scheme ":" schemepart
751
752 ; Specific predefined schemes are defined here; new schemes
753 ; may be registered with IANA
754
755 url = httpurl | ftpurl | newsurl |
756 nntpurl | telneturl | gopherurl |
757 waisurl | mailtourl | fileurl |
758 prosperourl | otherurl
759
760 ; new schemes follow the general syntax
761 otherurl = genericurl
762
763 ; the scheme is in lower case; interpreters should use case-ignore
764 scheme = 1*[ lowalpha | digit | "+" | "-" | "." ]
765 schemepart = *xchar | ip-schemepart
766
767
768 ; URL schemeparts for ip based protocols:
769
770 ip-schemepart = "//" login [ "/" urlpath ]
771
772 login = [ user [ ":" password ] "@" ] hostport
773 hostport = host [ ":" port ]
774 host = hostname | hostnumber
775 hostname = alpha *uchar
776 hostnumber = digits "." digits "." digits "." digits
777 port = digits
778 user = *[ uchar | ";" | "?" | "&" | "=" ]
779 password = *[ uchar | ";" | "?" | "&" | "=" ]
780 urlpath = *xchar ; depends on protocol see section 3.1
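
As an illustration of the generic productions above, the following Python sketch splits a URL into scheme and schemepart and then breaks an ip-schemepart into its login components. It is a sketch under stated assumptions, not a conforming parser: the function names are hypothetical, the characters inside user, password, host and urlpath are not validated, and no %-decoding is performed.

```python
import re

# scheme = 1*[ lowalpha | digit | "+" | "-" | "." ]
_SCHEME = re.compile(r"[a-z0-9+.-]+")


def split_generic(url):
    """Split 'scheme:schemepart'; the scheme is compared ignoring case."""
    scheme, colon, schemepart = url.partition(":")
    scheme = scheme.lower()
    if not colon or not _SCHEME.fullmatch(scheme):
        raise ValueError("not a generic URL: %r" % url)
    return scheme, schemepart


def split_ip_schemepart(schemepart):
    """Split '//[user[:password]@]host[:port][/urlpath]' into a dict."""
    if not schemepart.startswith("//"):
        raise ValueError("ip-schemepart must start with '//'")
    login, slash, urlpath = schemepart[2:].partition("/")
    userinfo, at, hostport = login.rpartition("@")
    user = password = None
    if at:
        user, colon, secret = userinfo.partition(":")
        password = secret if colon else None
    host, colon, port = hostport.partition(":")
    if not host:
        raise ValueError("missing host")
    if colon and not (port.isascii() and port.isdigit()):
        raise ValueError("port must be digits: %r" % port)
    return {
        "user": user,
        "password": password,
        "host": host,
        "port": int(port) if colon else None,
        "urlpath": urlpath if slash else None,
    }


# Illustrative input; the user name and password are made up.
scheme, rest = split_generic("FTP://anonymous:guest@ds.internic.net:21/rfc/rfc959.txt")
parts = split_ip_schemepart(rest)
```

Splitting on the first "/" before looking for "@" and ":" relies on the grammar above, in which those characters cannot appear unencoded inside user, password or host.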
781
782 ; The predefined schemes:
783
784 ; FTP (see also RFC959)
785
786 ftpurl = "ftp://" login [ "/" fpath [ ";type=" ftptype ]]
787 fpath = fsegment *[ "/" fsegment ]
788 fsegment = *[ uchar | "?" | ":" | "@" | "&" | "=" ]
789 ftptype = "A" | "I" | "D" | "a" | "i" | "d"
790
791 ; FILE
792
793 fileurl = "file://" host [ "/" fpath ]
794
795 ; HTTP
796
797 httpurl = "http://" hostport [ "/" hpath [ "?" search ]]
798 hpath = hsegment *[ "/" hsegment ]
799 hsegment = *[ uchar | ";" | ":" | "@" | "&" | "=" ]
800 search = *[ uchar | ";" | ":" | "@" | "&" | "=" ]
801
802 ; GOPHER (see also RFC1436)
803
804 gopherurl = "gopher://" hostport [ "/" [ gtype [ selector
805 [ "%09" search [ "%09" gopher+_string ] ] ] ] ]
806 gtype = xchar
807 selector = *xchar
808 gopher+_string = *xchar
809
810 ; MAILTO (see also RFC822)
811
812 mailtourl = "mailto:" encoded822addr
813 encoded822addr = 1*xchar ; further defined in RFC822
814
815 ; NEWS (see also RFC1036)
816
817 newsurl = "news:" grouppart
818 grouppart = "*" | group | article
819 group = alpha *[ alpha | digit | "-" | "." ]
820 article = 1*[ uchar | ";" | "/" | "?" | ":" | "&" | "=" ] "@" host
821
822 ; NNTP (see also RFC977)
823
824 nntpurl = "nntp://" hostport "/" group [ "/" digits ]
825
826 ; TELNET
827
828 telneturl = "telnet://" login [ "/" ]
829
830 ; WAIS (see also RFC1625)
831
832 waisurl = waisdatabase | waisindex | waisdoc
833 waisdatabase = "wais://" hostport "/" database
834 waisindex = "wais://" hostport "/" database "?" search
835 waisdoc = "wais://" hostport "/" database "/" wtype "/" wpath
836 database = *uchar
837 wtype = *uchar
838 wpath = *uchar
839
840 ; PROSPERO
841
842 prosperourl = "prospero://" hostport "/" ppath *[ fieldspec ]
843 ppath = psegment *[ "/" psegment ]
844 psegment = *[ uchar | "?" | ":" | "@" | "&" | "=" ]
845 fieldspec = ";" fieldname "=" fieldvalue
846 fieldname = *[ uchar | "?" | ":" | "@" | "&" ]
847 fieldvalue = *[ uchar | "?" | ":" | "@" | "&" ]
848
849 ; Miscellaneous definitions
850
851 lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" |
852 "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" |
853 "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" |
854 "y" | "z"
855 hialpha = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
856 "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
857 "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
858 alpha = lowalpha | hialpha
859 digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
860 "8" | "9"
861 safe = "$" | "-" | "_" | "." | "+"
862 extra = "!" | "*" | "'" | "(" | ")" | "," | "="
863 national = "{" | "}" | "|" | "\" | "^" | "~" | "[" | "]"
864 punctuation = "<" | ">" | <"> | "#"
865 reserved = ";" | "/" | "?" | ":" | "@" | "&" | "="
866 hex = digit | "A" | "B" | "C" | "D" | "E" | "F" |
867 "a" | "b" | "c" | "d" | "e" | "f"
868 escape = "%" hex hex
869
870 unreserved = alpha | digit | safe | extra | national
871 uchar = unreserved | escape
872 xchar = unreserved | reserved | escape
873 digits = 1*digit
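
The character classes above can be checked mechanically. The following Python sketch mirrors the safe, extra, national and reserved sets exactly as this draft lists them (including "=" in both extra and reserved) and tests whether a string consists only of uchar or xchar elements. The helper names are hypothetical.

```python
import string

SAFE = "$-_.+"
EXTRA = "!*'(),="
NATIONAL = "{}|\\^~[]"
RESERVED = ";/?:@&="

UNRESERVED = frozenset(string.ascii_letters + string.digits + SAFE + EXTRA + NATIONAL)
XCHAR_LITERALS = UNRESERVED | frozenset(RESERVED)
HEX = frozenset(string.hexdigits)


def _all_allowed(text, literals):
    """True if text is a run of allowed literals and '%' hex hex escapes."""
    i = 0
    while i < len(text):
        if text[i] == "%":
            pair = text[i + 1:i + 3]
            if len(pair) != 2 or not all(c in HEX for c in pair):
                return False
            i += 3
        elif text[i] in literals:
            i += 1
        else:
            return False
    return True


def is_uchar_string(text):
    """*uchar: unreserved characters and escapes only."""
    return _all_allowed(text, UNRESERVED)


def is_xchar_string(text):
    """*xchar: unreserved and reserved characters and escapes."""
    return _all_allowed(text, XCHAR_LITERALS)
```

For instance, `is_uchar_string("a%20b")` holds because the space is escaped, while `is_uchar_string("a b")` and `is_xchar_string("a#b")` do not, since space and "#" belong to none of the permitted sets.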
874
875
876 6. Security considerations
877
878 The URL scheme does not in itself pose a security threat. Users
879 should beware that there is no general guarantee that a URL which
880 at one time points to a given object will continue to do so, or
881 that it will not at some later time point to a different object
882 because objects have been moved on servers.
883
884 A URL-related security threat is that it is sometimes possible to
885 construct a URL such that an attempt to perform a harmless
886 idempotent operation such as the retrieval of the object will in
887 fact cause a possibly damaging remote operation to occur. The
888 unsafe URL is typically constructed by specifying a port number
889 other than that reserved for the network protocol in question. The
890 client unwittingly contacts a server which is in fact running a
891 different protocol. The content of the URL contains instructions
892 which when interpreted according to this other protocol cause an
893 unexpected operation. An example has been the use of gopher URLs
894 to cause a rude message to be sent via a SMTP server. Caution
895 should be used when using any URL which specifies a port number
896 other than the default for the protocol, especially when it is a
897 number within the reserved space.
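
A client could apply the caution above with a check along the following lines. This Python sketch rests on assumptions that this section does not state: the default ports in the table are the commonly assigned ones for these protocols, and "the reserved space" is read as port numbers below 1024. The function name and messages are hypothetical.

```python
# Assumed defaults (well-known assignments, not taken from this section).
DEFAULT_PORTS = {"ftp": 21, "telnet": 23, "gopher": 70, "http": 80, "nntp": 119}


def port_warning(scheme, port):
    """Return a warning string for a suspicious port, or None.

    port is None when the URL gives no explicit port. A scheme missing
    from DEFAULT_PORTS has no known default, so any explicit port is flagged.
    """
    if port is None or port == DEFAULT_PORTS.get(scheme.lower()):
        return None
    if port < 1024:  # assumption: "reserved space" means ports below 1024
        return "non-default port in reserved range"
    return "non-default port"


# A gopher URL aimed at the SMTP port, as in the example above.
warning = port_warning("gopher", 25)
```

What a client does with the warning (refuse, prompt the user, or log) is a policy decision outside the scope of this sketch.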
898
899 Care should be taken when URLs contain embedded encoded delimiters
900 for a given protocol (for example, CR and LF characters for telnet
901 protocols) that these are not decoded before transmission. Decoding
902 them would violate the protocol but could be used to simulate an extra
903 operation or parameter, again causing an unexpected and possibly
904 harmful remote operation to be performed.
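
One way to notice such encoded delimiters before a URL is acted upon is sketched below in Python. The sketch scans the "%" hex hex escapes in order and reports those whose value is a delimiter; CR and LF are used as the default set, and the function name is hypothetical. It only detects the escapes and leaves the decision about how to treat the URL to the caller.

```python
import re

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def encoded_delimiters(url, delimiters=b"\r\n"):
    """Return (offset, byte value) for each escape that encodes a delimiter."""
    found = []
    for match in _ESCAPE.finditer(url):
        value = int(match.group(1), 16)
        if value in delimiters:
            found.append((match.start(), value))
    return found


# Illustrative URL containing an encoded CR LF pair.
hits = encoded_delimiters("gopher://example.org/0a%0D%0Ab")
```

Because escapes are matched left to right without overlap, a sequence such as "%250A" (an encoded "%" followed by the literal characters "0A") is not reported.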
905
906 The use of URLs containing passwords that should be secret is
907 clearly unwise.
908
909 7. Acknowledgements
910
911 This paper builds on the basic WWW design (RFC 1630) and much
912 discussion of these issues by many people on the network. The
913 discussion was particularly stimulated by articles by Clifford
914 Lynch, Brewster Kahle [10] and Wengyik Yeong [18]. Contributions
915 from John Curran, Clifford Neuman, Ed Vielmetti and later the IETF
916 URL BOF and URI working group were incorporated.
917
918 Most recently, careful readings and comments by Dan Connolly, Ned
919 Freed, Roy Fielding, Guido van Rossum, Michael Dolan, Bert Bos,
920 John Kunze, and many others have helped refine the current draft.
921
922 APPENDIX: Recommendations for URLs in Context
923
924 URIs, including URLs, are intended to be transmitted through
925 protocols which provide a context for their interpretation.
926
927 In some cases, it will be necessary to distinguish URLs from other
928 possible data structures in a syntactic structure. In this case, it is
929 recommended that URLs be preceded by a prefix consisting of the
930 characters "URL:". For example, this prefix may be used to
931 distinguish URLs from other kinds of URIs.
932
933 In addition, there are many occasions when URLs are included in
934 other kinds of text; examples include electronic mail, USENET news
935 messages, and text printed on paper. In such cases, it is convenient to
936 have a separate syntactic wrapper that delimits the URL and
937 separates it from the rest of the text. For this purpose, it is
938 recommended that angle brackets ("<" and ">"), along with the
939 prefix "URL:", be used to delimit the boundaries of the URL. This
940 wrapper does not form part of the URL and should not be used in
941 contexts in which delimiters are already specified.
942
943 In the case where a fragment/anchor identifier is associated with a
944 URL (following a "#"), the identifier would be placed within the
945 brackets as well.
946
947 In some cases, extra whitespace (spaces, linebreaks, tabs, etc.)
948 may need to be added to break long URLs across lines. The
949 whitespace should be ignored when extracting the URL.
950
951 No whitespace should be introduced after a hyphen ("-") character.
952 Because some typesetters and printers may (erroneously) introduce a
953 hyphen at the end of line when breaking a line, the interpreter of
954 a URL containing a line break immediately after a hyphen should
955 ignore all unencoded whitespace around the line break, and should
956 be aware that the hyphen may or may not actually be part of the
957 URL.
958
959 Examples:
960
961 Yes, Jim, I found it under <URL:ftp://info.cern.ch/pub/www/doc;
962 type=d> but you can probably pick it up from <URL:ftp://ds.in
963 ternic.net/rfc>. Note the warning in <URL:http://ds.internic.
964 net/instructions/overview.html#WARNING>.
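
The wrapper convention lends itself to simple extraction. The following Python sketch finds each "<URL:" ... ">" wrapper and discards all whitespace inside it, as recommended above for URLs broken across lines. The function name is hypothetical, and the sketch does not attempt to resolve the hyphen ambiguity discussed above.

```python
import re

# Text between "<URL:" and the next ">"; the wrapper is not part of the URL.
_WRAPPED = re.compile(r"<URL:([^<>]*)>")


def extract_urls(text):
    """Return the URLs found in <URL:...> wrappers, with whitespace removed."""
    return ["".join(match.group(1).split()) for match in _WRAPPED.finditer(text)]


message = (
    "Yes, Jim, I found it under <URL:ftp://info.cern.ch/pub/www/doc;\n"
    "type=d> but you can probably pick it up from <URL:ftp://ds.in\n"
    "ternic.net/rfc>. Note the warning in <URL:http://ds.internic.\n"
    "net/instructions/overview.html#WARNING>."
)
urls = extract_urls(message)
```

Applied to the example text, this yields the three URLs with their line breaks removed; the fragment identifier after "#" stays inside the brackets and is therefore kept with its URL.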
965
966 REFERENCES
967
968 [1] Anklesaria, F., McCahill, M., Lindner, P., Johnson, D.,
969 Torrey, D., and Alberti, B., "The Internet Gopher Protocol:
970 A distributed document search and retrieval protocol",
971 RFC 1436, <URL:ftp://ds.internic.net/rfc/rfc1436.txt>,
972 March 1993.
973
974 [2] Anklesaria, F., Lindner, P., McCahill, M., Torrey, D.,
975 Johnson, D., and Alberti, B., "Gopher+: Upward compatible
976 enhancements to the Internet Gopher protocol",
977 University of Minnesota, <URL:ftp://boombox.micro.umn.edu
978 /pub/gopher/gopher_protocol/Gopher+/Gopher+.txt>, July 1993.
979
980 [3] Berners-Lee, T., "Universal Resource Identifiers in WWW: A
981 Unifying Syntax for the Expression of Names and Addresses of
982 Objects on the Network as used in the World-Wide Web", RFC
983 1630, <URL:ftp://ds.internic.net/rfc/rfc1630.txt>, June 1994.
984
985 [4] Berners-Lee, T., "Hypertext Transfer Protocol (HTTP)",
986 CERN, <URL:ftp://info.cern.ch/pub/www/doc/http-spec.txt.Z>,
987 November 1993.
988
989 [5] Crocker, D. H., "Standard for the Format of ARPA Internet Text
990 Messages", RFC 822, <URL:ftp://ds.internic.net/rfc/rfc822.txt>,
991 April 1982.
992
993 [6] Davis, F., Kahle, B., Morris, H., Salem, J., Shen, T., Wang, R.,
994 Sui, J., and Grinbaum, M., "WAIS Interface Protocol Prototype
995 Functional Specification", (v1.5), Thinking Machines Corporation,
996 <URL:ftp://quake.think.com/pub/wais/doc/protspec.txt>,
997 April 1990.
998
999 [7] Deutsch, P., Emtage, A., and Marine, A., "How to Use Anonymous
1000 FTP", RFC 1635, <URL:ftp://ds.internic.net/rfc/rfc1635.txt>,
1001 May 1994.
1002
1003 [8] Horton, M. and Adams, R., "Standard For Interchange of USENET
1004 messages", RFC 1036, <URL:ftp://ds.internic.net/rfc/rfc1036.txt>,
1005 December 1987.
1006
1007 [9] Huitema, C., "Naming: strategies and techniques", Computer
1008 Networks and ISDN Systems 23 (1991) 107-110.
1009
1010 [10] Kahle, B., "Document Identifiers, or International Standard
1011 Book Numbers for the Electronic Age", <URL:ftp://quake.think.
1012 com/pub/wais/doc/doc-ids.txt>, 1991.
1013
1014 [11] Kantor, B. and Lapsley, P., "Network News Transfer Protocol:
1015 A Proposed Standard for the Stream-Based Transmission of News",
1016 RFC 977, <URL:ftp://ds.internic.net/rfc/rfc977.txt>,
1017 February 1986.
1018
1019 [12] Kunze, J., "Functional Requirements for Internet Resource
1020 Locators", Internet-Draft (work in progress), <URL:ftp://ds.in
1021 ternic.net/internet-drafts/draft-ietf-uri-irl-fun-req-01.txt>,
1022 July 1994.
1023
1024 [13] Mockapetris, P., "Domain Names - Concepts and Facilities",
1025 RFC 1034, USC-ISI, <URL:ftp://ds.internic.net/rfc/rfc1034.txt>,
1026 November 1987.
1027
1028 [14] Neuman, B.C., and Augart, S., "The Prospero Protocol", USC
1029 Information Sciences Institute, <URL:ftp://prospero.isi.edu/pub
1030 /prospero/doc/prospero-protocol.PS.Z>, June 1993.
1031
1032 [15] Postel, J. and Reynolds, J.K., "File Transfer Protocol (FTP)",
1033 RFC 959, <URL:ftp://ds.internic.net/rfc/rfc959.txt>, October
1034 1985.
1035
1036 [16] Sollins, K. and Masinter, L., "Requirements for Uniform Resource
1037 Names", Internet-Draft (work in progress), <URL:ftp://ds.inter
1038 nic.net/internet-drafts/draft-sollins-urn-00.txt>
1039
1040 [17] St. Pierre, M., Fullton, J., Gamiel, K., Goldman, J., Kahle, B.,
1041 Kunze, J., Morris, H., and Schiettecatte, F., "WAIS over
1042 Z39.50-1988", RFC 1625, <URL:ftp://ds.internic.net/rfc/rfc1625.
1043 txt>, June 1994.
1044
1045 [18] Yeong, W., "Towards Networked Information Retrieval", Technical
1046 report 91-06-25-01, Performance Systems International, Inc.,
1047 <URL:ftp://uu.psi.com/wp/nir.txt>, June 1991.
1048
1049 [19] Yeong, W., "Representing Public Archives in the Directory",
1050 Internet Draft, November 1991, now expired.
1051
1052 EDITORS' ADDRESSES
1053
1054 Tim Berners-Lee
1055 World-Wide Web project
1056 CERN,
1057 1211 Geneva 23,
1058 Switzerland
1059 Tel: +41 (22)767 3755
1060 Fax: +41 (22)767 7155
1061 Email: timbl@info.cern.ch
1062
1063 Larry Masinter
1064 Xerox PARC
1065 3333 Coyote Hill Road
1066 Palo Alto, CA 94034
1067 Tel: (415) 812-4365
1068 Fax: (415) 812-4333
1069 Email: masinter@parc.xerox.com
1070
1071 Mark McCahill
1072 Computer and Information Services,
1073 University of Minnesota
1074 Room 152 Shepherd Labs
1075 100 Union Street SE
1076 Minneapolis, MN 55455
1077 Tel: (612) 625 1300
1078 Email: mpm@boombox.micro.umn.edu
1079
