INTERNET DRAFT                                              P. Deutsch
Expires: November 1, 1994                                    A. Emtage
                                                                Bunyip
                                                              May 1994


       Publishing Information on the Internet with Anonymous FTP

Status of This Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its Areas,
and its Working Groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months. Internet-Drafts may be updated, replaced, or obsoleted by
other documents at any time. It is not appropriate to use
Internet-Drafts as reference material or to cite them other than as a
"working draft" or "work in progress."

Please send comments to Alan Emtage, bajan@bunyip.com

TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGEMENTS
PREFACE
1. ADMINISTRATION
   1.1 Scope of this Document
   1.2 Definitions
   1.3 Directory Services and Uniform Resource Identifiers
       1.3.1 Variant Information
   1.4 Machine vs. Human Readability
2. CONFIGURATION AND CONTENTS INFORMATION
   2.1 Clusters: Common Data Elements
       2.1.1 Individuals and Groups
       2.1.2 Organizations
       2.1.3 Resource Information
   2.2 Site-Specific Configuration Information
       2.2.1 Configuration Information
       2.2.2 Logical Archives Configuration
   2.3 Site-Specific Content Information
       2.3.1 Services
       2.3.2 Documents, Datasets, Mailing List Archives, Usenet
             Archives, Software Packages, Images and other objects
3. INFORMATION ENCODING FOR SPECIFIC ENVIRONMENTS
   3.1 Data Element Structure
   3.2 Variant Fields
   3.3 Data Formats
   3.4 File Naming
       3.4.1 Scheme 1: Multi-Record Files
       3.4.2 Scheme 2: Single-Record Files
   3.5 Encoding
   3.6 Common Data Elements
       3.6.1 Individuals or Groups
       3.6.2 Organizations
       3.6.3 Miscellaneous
   3.7 Template Definitions
       3.7.1 Site Information
       3.7.2 Logical Archive Information
   3.8 Content Information
       3.8.1 User Information
       3.8.2 Organization Information
       3.8.3 Services Information
       3.8.4 Documents, Datasets, Mailing List Archives, Usenet
             Archives, Software Packages, Images and other objects
4. CONCLUSION
BIBLIOGRAPHY

ABSTRACT

This document specifies a range of information that your site may
wish to make available on your Anonymous FTP Archive to the Internet
user community. Automatic archive indexing tools have been created
that can gather and index this information, thus making it easier for
users to find and access it. It also may be used by the general user
community for extracting information about the archive itself, or
about material contained on the archive.

ACKNOWLEDGEMENTS

This document is the result of work done in the Internet Anonymous
FTP Archives (IAFA) working group of the IETF. Special thanks are due
to George Brett, Jill Foster, Jim Fullton, Joan Gargano, Rebecca
Guenther, John Kunze, Clifford Lynch, Pete Percival, Paul Peters,
Cecilia Preston, Peggy Seiden, Craig Summerhill, Chris Weider and
Janet Vratney.

PREFACE

Over the past several years, Anonymous FTP has become the primary
method of publishing information in the Internet environment.
Anonymous FTP is an application-level service that makes use of the
File Transfer Protocol [1], one of the principal protocols of the
TCP/IP suite.
A well organized and well maintained Anonymous FTP archive (AFA) can
provide a relatively cheap and simple way to distribute the software,
documents, datasets, images and other sources of information that are
produced for general availability on the network today.

Those groups wishing to set up an Anonymous FTP Archive should refer
to "A Guide to Anonymous FTP Site Administration" [2], which provides
details on why you would want to set up such an archive and what
steps are required to have a secure, well-maintained system.

This document specifies a range of information that your site may
wish to make available on the Anonymous FTP Archive to the Internet
user community. Automatic archive indexing tools have been created
that can gather and index this information, thus making it easier for
users to find and access it. It also may be used by the general user
community for extracting information about the archive itself or
about material contained on the archive. Although not required,
providing such information will make your archive a more useful
resource.

It is intended that this information be made available through
anonymous FTP archives, although the templates described may also be
made available through any other information access mechanism. It is
beyond the scope of this document to provide specific transformations
to other mechanisms, since the individual encoding method used will
necessarily depend on several external factors such as the operating
systems and network protocols used.

Section 1 of this document contains definitions of the terminology
used, as well as issues related to the use and construction of the
information to be distributed. In Section 2 we make recommendations
that are intended to provide a standardized means for sharing
information about the contents of a specific archive site, such as
services provided by the institution, document abstracts and software
descriptions.
In addition, administrative contacts, local time zone and other
site-specific details may be given.

Section 3 contains a set of encoding procedures for the information
outlined in Section 2. These procedures allow you, the AFA
administrator, to take into account site-specific issues such as
whether your particular operating system offers the capability of
creating and using subdirectories, any limitations on filename length
or the inability to use specific characters in filenames. A generic
encoding method is described, and it is expected that conventions for
transforming it to specific computing environments will be developed
and become generally known.

Interested parties should also refer to the companion document "Data
Element Templates for Internet Information Objects" [8] for full
definitions of the data templates defined in this document.

1. ADMINISTRATION

1.1 SCOPE OF THIS DOCUMENT

The templates listed below are not intended to comprehensively
describe all possible information that could be provided, but rather
to cover common, useful elements. The determination about what
specific information to provide will have to be made on a case by
case basis. Those individuals or groups completing the information
have to determine how appropriate a particular data element is for
their needs. In many cases data elements such as "home telephone
number" would not be desirable in databases open for public access.
However, in some cases they may be useful and thus have been included
in this document.

NOTE: Issues of privacy, security and maintainability should all be
considered when determining what information to provide. This
document does not mandate or require that any particular class of
information be offered. However, it is hoped that those sites wishing
to offer the information described in this document adhere to the
formats recommended in Section 3.

1.2 DEFINITIONS

For the purposes of this document, the term "data element" is defined
to be a discrete (though not necessarily atomic) piece of
information. For example, a name, telephone number or postal address
would each be considered a "data element". The granularity at which a
data element is defined is determined by the purpose for which it is
intended. The term "field" is interchangeable with "data element".

"Templates" are logical groupings of one or more data elements.
Collectively the templates described in this document will be
referred to as "indexing" or "data" templates.

A "resource" is any network object being described. This could be a
"physical" object like a file, document or printer, or it may be a
"service" such as a weather or Domain Name System server. Any object
which can be referred to as being accessible or addressable on the
network is a resource.

A "record" is an instance of the template with the appropriate fields
filled in for a particular resource.

1.3 DIRECTORY SERVICES AND UNIFORM RESOURCE IDENTIFIERS

Work is currently underway for the construction of what are known as
"Uniform Resource Identifiers" [3]. These will be structured strings
whose purpose is to uniquely identify any resource on the Internet
and to determine access and identification information for that
resource. This includes not only documents, software packages etc.,
but also images, interactive services and physical resources. This
concept has been integrated into the data templates described below;
however, no examples of an actual URI are included.

As this document is written, there are no ubiquitous directory
services on the Internet. However, it is likely that in the
relatively near future such services will be tested and deployed. It
is expected that such a system will provide information for both
network resources (commonly referred to as "Yellow Pages") and
personal data ("White Pages").
The Uniform Resource Identification scheme is needed to address the
issue of identifiers for network resources. The equivalents for
personal data are commonly referred to as "handles" [4]. In this
document it is assumed that these objects are semantically
equivalent.

1.3.1 VARIANT INFORMATION

Due to the lack of a universal directory service infrastructure on
the Internet, certain measures currently need to be taken to provide
additional information which such a system would make available in a
rationalized manner. These include the inclusion of what has come to
be known as "variant" information.

It is very difficult to determine, in a generic manner, the
equivalency of the "intellectual content" of a particular resource.
For example, a document may exist both in standard preformatted ASCII
(a "text" file) and in PostScript versions. The determination of
whether the two formats contain "equivalent" information must be left
to the person or group indexing (cataloging) such a document.

In order for the user searching the indexing files described in this
document to be able to ultimately locate the desired resource,
information such as location, format, character sets, languages etc.
needs to be included to provide enough context to make an informed
decision.

It is hoped and expected that the methods of information
dissemination described in this document will be superseded by a more
comprehensive system in the relatively near future.

1.4 MACHINE VS. HUMAN READABILITY

At the heart of some data element definitions is their ability to be
parsed and "understood" by computer programs. It is hoped and
expected that much of the information provided in the IAFA templates
described below will be collected and indexed by automated processes
without human intervention. As a result, care has been taken to
restrict the syntax and semantics of data element names and some
values so as to facilitate these procedures.

2. CONFIGURATION AND CONTENTS INFORMATION

In this section we define a recommended set of information that you
could make available as the administrator of an archive site. In
doing so, you would extend the functionality of your archive, as well
as the functionality of indexing and resource discovery tools that
can pick up and redistribute such information.

2.1 CLUSTERS: COMMON DATA ELEMENTS

There are certain classes of data elements, such as contact
information, which occur every time an individual, group or
organization needs to be described. Such data as names, telephone
numbers, postal and email addresses etc. fall into this category. To
avoid repeating these common elements explicitly in every template
below, we define such "clusters" here, which can then be referred to
in a shorthand manner in the actual template definitions. Predefined
symbols specifying these clusters will then be used in their place,
with a prefix which determines to whom or to what this information
applies.

NOTE: A handle should be used in preference to a fully expanded entry
in those situations where a handle for an individual, group or
organization can be obtained and subsequently resolved by some other
(external) method (directory service).

The following clusters have been identified.

2.1.1 INDIVIDUALS AND GROUPS

In order to describe each individual or group in a particular
template, the following common data element subcomponents are
defined. To avoid being repetitive, "individual" in this context
should be read as "individual or group".

- Name of individual
- Name of organization to which individual belongs or under whose
  authority this information is being made
- Type of organization to which this individual belongs (University,
  commercial organization etc.)
- Work telephone number of individual
- FAX (facsimile) telephone number of individual
- Postal address of individual
- Job title of individual (if appropriate)
- Department to which individual belongs
- Electronic mail address of individual
- Home telephone number of individual
- Home postal address of individual
- Handle

2.1.2 ORGANIZATIONS

The following elements apply when describing organizations and are a
subset of those listed above for individuals and groups. Obviously
some of the elements above (such as home phone number) make no sense
when being applied to an organization. As above, the following may be
subcomponents in a larger, hierarchically structured data element
name.

- Name of organization
- Type of organization to which this individual or group belongs
  (University, commercial organization etc.)
- Postal address of organization
- Electronic mail address of organization
- Phone number of organization
- Fax number of organization
- City of organization
- State (province) of organization
- Country of organization
- Handle

2.1.3 RESOURCE INFORMATION

The following is a list of generic data element subcomponents used
when referring to particular resources.

- A complete title for the resource
- Short title
- City of resource
- State (or Province) of resource
- Country of resource
- Description
- Any keywords which might be applied to the resource that would
  facilitate users' locating this information
- Type of resource
- Uniform Resource Identifier
- Comment

2.2 SITE-SPECIFIC CONFIGURATION INFORMATION

Information about your archive site itself can often be valuable to
users of your system in order for them to utilize the resource in an
efficient manner.

2.2.1 CONFIGURATION INFORMATION

Site configuration information will help users better understand your
wishes on how and when to access your AFA.
This would include such information as:

Site Information:

- Primary host name of the AFA
- A valid Domain Name System alias (CNAME) for this host [5]
- Individual contact information for site owner(s)
- Individual contact information for site maintainer (administrators)
- Sponsoring organization contact information
- The geographical (latitude/longitude) location
- The time zone of the site
- Individual contact information for the person who last modified
  this record
- The frequency with which the archive site is generally modified
- Times of preferred access for this site
- A summary of the access policies of this site. This should include
  such information as preferred times of usage, conventions or
  restrictions for uploading files to this site etc.
- A brief description of the kind of information stored at this
  anonymous FTP archive. If the site is intended to specialize in a
  particular type of information (examples might include software for
  a specific machine type, on-line copies of a particular type of
  literature, or research papers and information in a particular
  branch of science or arts) you should indicate this.
- Resource information as defined in the resource cluster

2.2.2 LOGICAL ARCHIVES CONFIGURATION

One physical archive site may contain multiple "logical" archives.
For example, a single archive host may be shared amongst multiple
departments, each responsible for the administration of its own part
of the anonymous FTP directory subtree. Some information (such as a
host's location) will remain constant for the site as a whole. We
therefore recommend that you list Logical Archive specific and
site-specific information separately.

Logical Archive configuration:

- Individual contact information for site maintainer (administrators)
- A valid Domain Name System alias (CNAME) for this host [5] when
  referring to this logical archive
- Owning organization contact information
- Sponsoring organization contact information
- Individual contact information for the person who last modified
  this record
- A summary of the access policies of this logical archive
- A summary of the type of information that this logical archive may
  specialize in
- The frequency with which the archive site is generally modified
- Resource information as defined in the resource cluster

2.3 SITE-SPECIFIC CONTENT INFORMATION

The preceding collections of information make available access and
utilization policies for a site. You may also wish to make available
a selection of information about the actual contents of your archive
or the services available from your organization or institution.

The host system providing the resources need not be the same physical
site on which the descriptive information below is stored. Thus at a
University an AFA maintained by the central campus administration
could advertise services provided by individual departments which
might not have an AFA of their own. Similarly, mailing lists provided
on other administratively related hosts (such as in the same
organization) may have the indexing information available on one host
while the actual mailing list is provided by another machine.

The following categories have been identified.

2.3.1 SERVICES

- The archive can offer an overall description of each of the various
  Internet services offered by your organization's systems, along
  with corresponding contact information.
  This description would then indicate whether the parent
  organization offers such services as:

  o on-line library catalogues
  o Interactive on-line information services such as WAIS, gopher,
    Prospero, World Wide Web or archie
  o specialized information servers such as those providing weather,
    geographic information, newswire feeds etc.
  o Other information services

The following information can be made available:

- Title of service
- Short title of service
- Name of host providing service
- Protocol used by service
- Port number of service
- Required access protocol (telnet, FTP, etc.)
- Contact information for service administration
- A description of the service
- Authentication information (login name, password etc. if required)
  or method for authentication (private key etc.)
- Description of registration process
- Charging policies for service
- Policies & restrictions on service use
- Access times for service
- Any keywords which might be applied to the record that would
  facilitate users' finding this service
- Information on last modification times of this record
- Information on last verification times of this record
- Uniform Resource Identifier

2.3.2 DOCUMENTS, DATASETS, MAILING LIST ARCHIVES, USENET ARCHIVES,
      SOFTWARE PACKAGES, IMAGES AND OTHER OBJECTS

You might wish to make available a brief description of available
software, documents, images, sounds, video, datasets, USENET [6]
archives and mailing list information through the AFA. Some of the
information classes described may not be applicable to each of the
above objects.

This is NOT intended to be an official catalog entry in the sense
used by librarians. It is a simple way to describe documents and
announce their availability. More formal methods may be used
elsewhere to further describe the documents.

- Type of object
- Category (for documents this would be technical report, conference
  paper etc.)
- Name of object. For example, the name of the mailing list, software
  package or title of the document.
- Names and other contact information on the authors
- Names and other contact information for object
  maintainer/administrator
- Version designator
- Source of data
- Abstract/description of the object
- Bibliographic entry
- Citation
- Special considerations or restrictions on the object's use (e.g.,
  in the case of a software package, programming
  languages/environments needed, hardware restrictions, etc.)
- Publication status (For documents: draft, published etc. For
  software packages: beta test, production etc.)
- Contact information of publisher
- Copyright and copying policy
- Creation date
- Appropriate keywords for this object
- Discussion forums appropriate for this object (mailing lists,
  USENET newsgroups etc.)
- Format of the object (variant)
- Size (variant)
- Language (variant)
- Character set (variant)
- ISBN (variant)
- ISSN (variant)
- Method of access (anonymous FTP etc.)
- Last revision date (variant)
- Library Cataloging information
- URI

3. INFORMATION ENCODING FOR SPECIFIC ENVIRONMENTS

In this section we offer a recommended encoding format for each of
the standard items of information suggested in Section 2. In most
cases these recommendations should be applicable to all environments.

We offer such a standardized format so that if such information IS to
be offered, it is formatted in such a way that it can be utilized by
automated indexing and retrieval tools. The encoding methods proposed
were developed to be extensible, so that additional information can
be offered in a similar format, if the site administrator so wishes.

Developing such recommendations offers several challenges. It is
hoped that the encoding conventions will be applicable to as wide a
variety of operating systems, file structures and encoding schemes as
possible. In addition, the globalization of the Internet requires
attention to constraints such as the language in use at an archive
site.
In addition, the encoding methods proposed must be easy to implement
and, for the moment, use existing methods of access and retrieval. We
currently assume that the site language is English and the encoding
ASCII, but it is expected that additional formats for other languages
and encoding schemes will be developed over time.

3.1 DATA ELEMENT STRUCTURE

All data elements have been defined as "attribute/value" pairs which
can be generically described as:

   <data element name>: <data element value>

where <data element name> would for example be "Work-Phone" and the
<data element value> would be "+1 (514) 555 1212" (note that the
double quotes (") are not part of the strings, but serve here to
delimit the example).

The term "field name" is interchangeable with "data element name".
The term "field value" is interchangeable with "data element value".

It is intended that wherever possible and necessary, a well-defined
hierarchical structure will be used when defining data element names.
This allows them to be generally and logically extensible.

All data element names may contain only alphanumeric characters, the
hyphen ("-") and the hash (number sign, pound sign "#"). No embedded
spaces are allowed. All data element names are case insensitive,
although here initial letters are capitalized for readability.

Some data elements may be for internal use by the site administrator.
These field names must start with the hash character "#". All other
rules for continuation (section 3.4.1) remain the same. Such fields
should be ignored by software indexing or otherwise processing the
templates.

Data element names without associated field values are legal in
templates.

3.2 VARIANT FIELDS

In section 1.3.1 we describe some information as being "variant" in
that network objects may vary in "format" but are judged to have the
same "intellectual content". In the following data element
definitions we use the technique of allowing a sequence number to be
appended to a set of data elements to describe a particular variant.
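
As a rough illustration of the naming rules in section 3.1, the
following Python sketch splits one "name: value" line and checks the
name against the permitted characters. It is not part of any IAFA
tool: the function name parse_field is hypothetical, "alphanumeric"
is assumed to mean ASCII letters and digits, and folding names to
lower case is only one possible way to honour case insensitivity.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits; the hyphen
# and the hash are the only other characters allowed in a name.
_NAME_RE = re.compile(r"[A-Za-z0-9#-]+")


def parse_field(line):
    """Split one 'Name: value' line into (name, value, internal).

    'internal' is True for names starting with "#", which section 3.1
    reserves for the site administrator's own use.
    """
    # Only the first colon separates name from value, so values such as
    # times ("12:36:47") are left intact.
    name, sep, value = line.partition(":")
    if not sep or not _NAME_RE.fullmatch(name):
        raise ValueError(f"not a valid data element line: {line!r}")
    # Names are case insensitive, so fold them to one case for lookups.
    return name.lower(), value.strip(), name.startswith("#")
```

With these assumptions, parse_field("Work-Phone: +1 (514) 555 1212")
returns ("work-phone", "+1 (514) 555 1212", False), and a name with
an embedded space such as "Work Phone" is rejected. An indexing tool
built this way could simply skip any field whose third element is
True.
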
For example, we have a document "War and Peace" which exists in ASCII
text, PostScript and NROFF format. The PostScript version also exists
in two natural languages, English and Russian. We define here 3 data
elements: "Filename", "Language" and "Format". In addition to the
other information stored in the indexing record for "War and Peace"
which we consider to remain constant across all variants (like the
name of the author), we can add the following data elements:

   Format-v0:    PostScript
   Language-v0:  English
   Filename-v0:  war-and-peace.english.ps

   Format-v1:    PostScript
   Language-v1:  Russian
   Filename-v1:  war-and-peace.russian.ps

   Format-v2:    ASCII
   Language-v2:  English
   Filename-v2:  war-and-peace.english.txt

   Format-v3:    nroff
   Language-v3:  English
   Filename-v3:  war-and-peace.english.nroff

The "-v<number>" syntax allows one to repeat a set of data elements
for a particular variant and tie them all together with a common
sequence number, so that individual instances of the particular
resource with the desired characteristics may be located. <number> is
an arbitrary number, with the only restriction that all data elements
with that particular sequence value are logically connected in a
similar manner to that illustrated above.

The variant number need not exist when variants are not being
described, and the "-v<number>" syntax may be omitted in those cases.

In the data element definitions below, the syntax "-v*" will be used
to identify those elements for which variants are allowed.

3.3 DATA FORMATS

To facilitate the machine readability of certain data elements, the
following syntax applies.

1) All electronic mail (Email) addresses must be as defined in RFC
   822, Section 6. Names and comments may be included in the Email
   address. For example:

      "John Doe" <jd@ftp.bar.org>

   or

      jd@ftp.bar.org

   are valid Email addresses.

2) All hostnames are to be given as Fully Qualified Domain Names as
   defined in RFC 1034, Section 3. For example, "foo.bar.com"

3) All host IP addresses are given in "dotted-quad" (or
   "dotted-decimal") notation.
   For example, "127.0.0.1"

4) All numeric values are in decimal unless otherwise stated.

5) Dates/times must be given as defined in RFC 822, Section 5.1 and
   modified in RFC 1123, Section 5.2.14 [7]:

   date-time  =  [ day "," ] date time     ; dd mm yy
                                           ;  hh:mm:ss zzz

   day        =  "Mon" / "Tue" / "Wed" / "Thu"
              /  "Fri" / "Sat" / "Sun"

   date       =  1*2DIGIT month 2*4DIGIT   ; day month year
                                           ;  e.g. 20 Jun 82

   month      =  "Jan" / "Feb" / "Mar" / "Apr"
              /  "May" / "Jun" / "Jul" / "Aug"
              /  "Sep" / "Oct" / "Nov" / "Dec"

   time       =  hour zone                 ; ANSI

   hour       =  2DIGIT ":" 2DIGIT [":" 2DIGIT]
                                           ; 00:00:00 - 23:59:59

   zone       =  "UT" / "GMT"              ; Universal Time
                                           ; North American : UT
              /  "EST" / "EDT"             ;  Eastern:  - 5/ - 4
              /  "CST" / "CDT"             ;  Central:  - 6/ - 5
              /  "MST" / "MDT"             ;  Mountain: - 7/ - 6
              /  "PST" / "PDT"             ;  Pacific:  - 8/ - 7
              /  ( ("+" / "-") 4DIGIT )    ; Local differential
                                           ;  hours+min. (HHMM)

   For example, the string "Fri, 18 Jun 93 12:36:47 -0500" is a valid
   date, while the string "12:36:47 GMT" is a valid time.

   Quoting from RFC 1123, Section 5.2.14:

      There is a strong trend towards the use of numeric timezone
      indicators, and implementations SHOULD use numeric timezones
      instead of timezone names. However, all implementations MUST
      accept either notation. If timezone names are used, they MUST
      be exactly as defined in RFC-822.

6) Time ranges (or periods) must be specified as pairs of time values
   (as defined above in note (5)), separated by a "/". Thus

      12:00 GMT / 05:45 GMT

   is a valid time range. Multiple time ranges are separated by
   whitespace.

7) "whitespace" is defined as one or more blank (octal 40) and/or tab
   (octal 11) ASCII characters.

8) References to "UT" mean Universal Time (also known as Greenwich
   Mean Time or "GMT").

9) All telephone numbers are to be given, as a minimum, in full with
   country and routing codes and without separators. The number
   should be given as it would be dialed by someone calling
   internationally. The number in the local convention may optionally
   be specified as well.
   For example,

      Telephone: 1 514 875 8189 (+1-514-875-8611)

   or

      Telephone: 44 71 732 8011

3.4 FILE NAMING

For the greatest flexibility, it is assumed that unless otherwise
stated each file containing the indexing information may reside
anywhere in the anonymous FTP subtree and, in addition, any number of
these files may exist. The intention here is that they may be placed
in the same location as the information they are indexing.

You, as the administrator, are free to place these files wherever you
think appropriate in most cases. However, some files may carry
information from their place in the directory structure and therefore
may not be placed arbitrarily in the archive.

One of two naming schemes may be used, to provide maximum flexibility
to the archive administrator and to allow use on as varied a set of
hardware and software platforms as possible. In the first scheme,
full filenames are used to identify indexing files. In the second,
use is made of a filename extension. It is possible to use both
methods on one archive concurrently; however, care should be taken
that information is not duplicated in files named under the two
conventions. The use of one naming scheme in a consistent manner is
strongly recommended.

It is expected that those administrators wishing to have one large
file containing all templates relevant to their AFA will use Scheme
1, whereas those who prefer associating templates with individual
packages or documents in the file system will use Scheme 2. For this
reason, Scheme 2 does not allow multi-record entries in the same
file.

3.4.1 SCHEME 1: MULTI-RECORD FILES

A file of the given name will exist for each category listed in
Section 2. For the sake of consistency across operating systems and
for the ability to distinguish them from non-configuration files of
the same name, the filenames will be in all upper case letters.
In order for tools to easily distinguish an indexing file from the
other data files at the archive site, all indexing filenames must
begin with the four character string "AFA-".

(1) Files that may contain multiple instances of a given category (a
    set of mailing lists, for example) will be divided into records,
    with each record containing multiple fields. Unless otherwise
    specified, files may contain one or more records as defined
    below, and multiple records are separated by one or more blank
    lines (lines which contain zero or more whitespace characters and
    the NEWLINE character). The start of each field is marked by a
    special fieldname on a new line in the leftmost (first) column
    followed by a colon (:). All data element names must start in the
    first column.

(2) Field data must be separated from fieldname by whitespace. Any
    field may be continued on the next line by placing whitespace in
    the first column of that line. Multi-line fields are delimited by
    the first line which does not have whitespace in the first
    column.

(3) Fields in the same record must not contain any blank lines
    between them.

3.4.2 SCHEME 2: SINGLE-RECORD FILES

An alternate method for naming indexing files is the use of the
filename extension ".AFA". Note that this is all capital letters.
Filenames in this context are defined as having 2 parts, a "basename"
and an "extension", which are combined to form the full filename in
the following manner:

   <basename>.<extension>

(1) The basename under this scheme can be any arbitrary string and is
    determined by the administrator. Any user or indexing tool
    retrieving this indexing file needs to know what kind of record
    the file contains, and so a special field (Template-Type) is used
    to specify this information.

(2) No indexing file under this scheme may contain more than one
    record (as defined above).

(3) Field data must be separated from fieldname by whitespace. Any
    field may be continued on the next line by placing whitespace in
    the first column of that line.
    A multi-line field is terminated by the first line which does not
    have whitespace in the first column.

(4) Fields must not have any blank lines between them.

3.5 ENCODING

Indexing files should be made world readable. It is assumed that size
and modification times can be obtained through the existing FTP
mechanism and are operating system specific.

The advantage of this system is that the information need only be
constructed once, with infrequent updates as changes occur. Several of
these files may never change during the lifetime of the host as an
anonymous FTP site. They require no special programs or protocols to
construct: a text editor is all that is needed.

For sites using the first naming scheme, the filename is given at the
top of each definition. It is not part of the definition.

NOTE: In the definitions below, the fields are separated by blank lines
ONLY to improve readability; these lines must not occur in an actual
record (see the examples below).

Below are listed the suggested templates for each type of information
described in Section 2.

3.6 COMMON DATA ELEMENTS

As described in Section 2, there are a number of data elements which
are often needed and which form a natural grouping for certain kinds of
information ("clusters"). Below we define the data element names and
semantics of these clusters. These clusters are intended to provide the
lowest level in the hierarchical structure of data element names. For
example, contact information for the authors of a document would be
preceded by the string "Author-", thus forming data elements of
"Author-Name", "Author-Postal", "Author-Fax", etc.

3.6.1 INDIVIDUALS OR GROUPS

Data Element Name     Description

Name                  Name of individual
Organization-Name     Name of organization to which individual belongs
                      or under whose authority this information is
                      being made available
Organization-Type     Type of organization to which this individual
                      belongs (University, commercial organization
                      etc.)
Work-Phone            Work telephone number of individual
Work-Fax              FAX (facsimile) telephone number of individual
Work-Postal           Postal address of individual
Job-Title             Job title of individual (if appropriate)
Department            Department to which individual belongs
Email                 Electronic mail address of individual
Handle                Unique identifier for this record
Home-Phone            Home telephone number of individual
Home-Postal           Home postal address of individual
Home-Fax              FAX (facsimile) telephone number of individual

This cluster will be referred to as "USER*" in the template definitions
below.

3.6.2 ORGANIZATIONS

The following elements apply when describing organizations and are a
subset of those listed above for individuals and groups. Obviously some
of the elements above (such as home phone number) make no sense when
being applied to an organization. As above, the following may be
subcomponents in a larger, hierarchically structured data element name.

Data Element Name     Description

Organization-Name     Name of organization
Organization-Type     Type of organization to which this individual or
                      group belongs (University, commercial
                      organization etc.)
Organization-Postal   Postal address of organization
Organization-City     City of organization
Organization-State    State (province) of organization
Organization-Country  Country of organization
Organization-Email    Electronic mail address of organization
Organization-Phone    Phone number of organization
Organization-Fax      Fax number of organization
Organization-Handle   Handle of organization

This cluster will be referred to as "ORGANIZATION*" in the template
definitions below.

3.6.3 MISCELLANEOUS

The following is a list of generic data element subcomponents used when
referring to particular resources.

Data Element Name     Description

Title                 A complete title for the resource
Short-Title           Summary title
City                  City of resource
State                 State (Province, etc) of resource
Country               Country of resource
Description           Description of resource
Keywords              Any keywords which might be applied to the record
                      that would facilitate users' finding this
                      resource
URI                   Uniform Resource Identifier

3.7 TEMPLATE DEFINITIONS

3.7.1 SITE INFORMATION

This file contains one (1) record with the following fields.

IMPORTANT: There should only be one instance of this file in each
archive.

Filename for this index file (naming scheme 1): AFA-SITEINFO

Fields for this file:

Template-Type: SITEINFO

Host-Name: Primary Domain Name System host name

Alias: Preferred DNS-registered name for the AFA host. This name must
    be a valid CNAME entry in the Domain Name System.

Admin-(USER*): Contact information of the individual or group
    responsible for administering this site.

Owner-(ORGANIZATION*): Contact information for the organization owning
    this site.

Sponsoring-(ORGANIZATION*): Contact information for the organization
    sponsoring this site.

City: City of the host

State: State (province) of the host

Country: Country of the host

Latitude-Longitude: Latitude and longitude of site (See Note <1>)

Timezone: Timezone as defined in section 3.3 above.

Record-Last-Modified-(USER*): Contact information for individual who
    last modified this record

Record-Last-Modified-Date: The date this record was last modified

Record-Last-Verified-(USER*): Contact information of person or group
    last verifying that this record was accurate

Record-Last-Verified-Date: The date this record was last verified

Update-Frequency: Preferred frequency of retrieval of all AFA extended
    configuration information by automated retrieval tools (See
    Note <2>)

Access-Times: Time ranges (as defined in Section 3.3) of access to
    anonymous FTP users

Access-Policy: Information such as conventions or restrictions for
    uploading files to this site etc.

Description: This file contains text describing any areas of
    specialization for this site. For example, if the site contains
    information related to the field of molecular biology a paragraph
    or two with the keywords "molecular biology" and some further
    description would be in order. It should also mention if this site
    contains "logical" archives.

Keywords: Appropriate keywords describing contents of this AFA

Notes for this template.

<1> Latitude and longitude are specified in that order as

        CDD.MM.SS/CDD.MM.SS

    Where

        DD is in degrees
        MM is in minutes
        SS is in seconds
        C  is the direction designator which is

           For latitude
               "+" is north of the equator
               "-" is south of the equator

           For longitude
               "+" is west of the Greenwich meridian
               "-" is east of the Greenwich meridian

    The double quotes (") are not part of the designator, but are used
    here to delimit the symbols.

<2> The period is measured in days. This value should be chosen to
    reflect the turnover of information at the archive.

An example of a SITEINFO record:

Template-Type: SITEINFO
Name: foo.bar.org
Preferred-Name: ftp.bar.org
Admin-Name: John Doe
Admin-Work-Postal: PO Box. 6977, Marinetown, PA 17602
Admin-Work-Phone: +1 717 555 1212
Admin-Work-Fax: +1 717 555 1213
Admin-Email: FTP@bar.org
Owner-Organization-Name: Beyond All Recognition Foundation
City: Lampeter
State: Pennsylvania
Country: USA
Latitude-Longitude: -37.24.43/+121.58.54
Timezone: -0400
Record-Last-Modified-Name: John X. Doe
Record-Last-Modified-Email: johnd@bar.org
Record-Last-Modified-Date: Mon, Feb 10 92 22:43:31 EST
Update-Frequency: 10
Access-Times: 02:00 GMT / 08:00 GMT
    18:00 GMT / 21:00 GMT
Access-Policy: Non-proprietary data may be uploaded to this site in the
    "incoming" directory. Please contact site administrators if you do
    so. Proprietary material found in this directory will be removed.
    This site is not to be used as a temporary storage area.
Description: This site contains data relating to DNA sequencing
    particularly Yeast chromosome 1.
    Datasets are available. There is also a selection of programs
    available for manipulating this information.
Keywords: DNA, sequencing, yeast, genome, chromosome

3.7.2 LOGICAL ARCHIVE INFORMATION

Filename for this index file (naming scheme 1): AFA-LARCHIVE

IMPORTANT: The placement of this file in the file structure is
significant: it implies that the directory in which this file exists
and all of its subdirectories are part of the logical archive.

Any number of these files may exist in the archive, but only one per
directory.

Template-Type: LARCHIVE

Admin-(USER*): Contact information of the individual or group
    responsible for administering this site.

Alias: Preferred DNS-registered name for the AFA host as this logical
    archive. This name must be a valid CNAME entry in the Domain Name
    System.

Owner-(ORGANIZATION*): Contact information for the organization owning
    this site.

Sponsoring-(ORGANIZATION*): Contact information for the organization
    sponsoring this site.

Record-Last-Modified-(USER*): Contact information for individual who
    last modified this record

Record-Last-Verified-(USER*): Contact information of person or group
    last verifying that this record was accurate

Record-Last-Verified-Date: The date this record was last verified

Access-Policy: Information such as conventions or restrictions for
    uploading files to this logical archive.

Description: Contains text describing any area of specialization for
    the logical archive

Update-Frequency: Preferred frequency of retrieval of all AFA extended
    configuration information by automated retrieval tools (See
    Note <1>)

Keywords: Appropriate keywords describing contents of this logical AFA

Notes for this record.

<1> The period is measured in days. This value should be chosen to
    reflect how often information at the archive changes.
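
The placement rule for AFA-LARCHIVE files can be illustrated with a
short sketch. It is not part of this memo's recommendations: the
function name and the sample paths are hypothetical, and the choice that
the nearest enclosing directory wins when logical archives are nested is
an assumption, since the text above does not say how nesting is
resolved.

```python
from pathlib import PurePosixPath


def logical_archive_for(path, larchive_dirs):
    """Return the logical archive directory that contains `path`.

    `larchive_dirs` lists the directories known to hold an AFA-LARCHIVE
    file. A directory holding that file, together with all of its
    subdirectories, forms one logical archive. Assumption: when archives
    are nested, the nearest enclosing directory is chosen.
    Returns None when `path` lies outside every logical archive.
    """
    roots = {PurePosixPath(d) for d in larchive_dirs}
    target = PurePosixPath(path)
    # Check the path itself first, then walk up towards the root.
    for candidate in [target, *target.parents]:
        if candidate in roots:
            return str(candidate)
    return None


if __name__ == "__main__":
    dirs = ["/pub", "/pub/yeast"]  # hypothetical archive layout
    print(logical_archive_for("/pub/yeast/chromosome1/seq.txt", dirs))
    print(logical_archive_for("/incoming/upload.tar", dirs))
```

Only pure path arithmetic is used, so the sketch can be run against a
listing obtained over FTP without touching the local file system.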

An example of a LARCHIVE record:

Template-Type: LARCHIVE
Owner-Organization: Orymonix Incorporated
Organization-Type: Commercial
Preferred-Name: oxymoron-x.com.uk
Access-Policy: This archive is open to general access
Description: This archive contains essays on Military Intelligence,
    Postal Service and Progressive Conservatism. All material contained
    in this archive is in the public domain
Admin-Name: Ima Admin
Admin-Email: imaa@oxymoron.com.uk
Admin-Work-Phone: +44 71 123 4567
Admin-Work-Fax: +44 71 123 5678
Admin-Postal: 555 Marsden Road, London, SE15 4EE
Record-Last-Modified: Yuri Tolstoy
Record-Last-Modified-Date: Mon, Jun 21 93 17:03:23 EDT
Update-Frequency: 20
Keywords: Militarism, Post Office, Conservatism

3.7.3 AUTOMATIC FILE UPDATE INFORMATION

Filename for this index file (naming scheme 1): AFA-MIRROR

Any number of these files may exist in the archive.

Template-Type: MIRROR

Admin-(USER*): Contact information of the individual or group
    responsible for administering this mirror

Admin-(ORGANIZATION*): Information on organization responsible for this
    mirror unit

Title: The title of the package

Description: Text describing the package

Record-Last-Modified-(USER*): Contact information for individual who
    last modified this record

Record-Last-Modified-Date: The date this record was last modified

Record-Last-Verified-(USER*): Contact information for individual or
    group last verifying this record was accurate

Record-Last-Verified-Date: The date this record was last verified

Reference-Location-v*: The starting point. This is the initial site on
    which the package can be found. As there may be more than one file
    or directory belonging to this package, this is a -v* type.
    Specified as a URL. <1>

Source-Location-v*: The location the package is mirrored from. This may
    itself be a mirror site of Reference-Location or another
    Source-Location. Specified as a URL.

Destination-Location-v*: The location where the package can be found
    locally. Specified as a URL.

Timezone: The timezone this site is in. (see section 3.3 of this
    document)

Update-Frequency: The source site is checked at an interval of this
    number of days, or on these days of the week. <2>

Update-Time: The time of the day the update is started. This is
    important for chained updates, i.e. sites using this site as
    Source-Location.

Update-Policy: This is how the update is done. There are a few valid
    keywords. See <3> for more information.

Update-Filename-Translation: Substitute expression. This may be used to
    reorganize e.g. a flat directory on Source-Location into various
    subdirectories on Destination-Location.

Update-Transfer-Pattern: A regular expression. Only files matching this
    pattern on Source-Location will be updated/fetched.

Update-Exclude-Pattern: A regular expression. Files matching this
    pattern on Source-Location will not be updated/fetched.

Update-Compression-Pattern: A regular expression. Used for packing or
    re-packing files being updated/fetched. <4>

Update-Software: Name and version of the software used for the
    automatic updates.

<1> The -v* form is especially useful if you mirror a package within a
    directory called "path" but do not mirror the whole of "path", only
    its "src" and "doc" subdirectories.

<2> This may be any number, or one or more of the (comma separated)
    words "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" or "Sun"

<3> Valid keywords are:

    autodelete    - files will be automatically deleted when they are
                    no longer found on Source-Location

    sizechange    - a file will also be updated if only its size, but
                    not its time, changed on the Source-Location.

    newer         - a file will be updated if the file on
                    Source-Location is newer than the one on
                    Destination-Location

    maxdays=<num> - a file will not be fetched/updated if its
                    modification time differs by more than <num> days
                    from that of the file on Destination-Location.

    recursive     - directories will be mirrored recursively (otherwise
                    only the contents of the "flat" directory will be
                    updated and no subdirectories will be checked).

<4> This specifies whether e.g. *.tar files will be packed (and
    therefore renamed) to *.tar.Z or *.tar.gz, or whether e.g. *.Z
    files will be packed and renamed to *.gz

Example:
--------

This is an example of an AFA-MIRROR file.

Template-Type: MIRROR
Admin-Name: John Long Silver
Admin-Email: silver@jamaica.world
Admin-Home-Phone: +1 222 333 4567
Admin-Organization-Name: The Pirates Club
Title: The ultimate treasury package
Description: This package helps you to become rich, and richer and
    richer. It shows how to collect money and hide it from anyone
    within your computer. You can use a program from this package to
    materialize the money again, later.
Record-Last-Modified-Name: Sailor One
Record-Last-Modified-Date: Sat Jan 15 02:47:57 MEZ 1994
Record-Last-Verified-Name: Sailer Two
Record-Last-Verified-Date: Sat Jan 15 02:48:31 MEZ 1854
Reference-Location-v0: URL:ftp://ftp.money.us/pub/coins/silver/
Reference-Location-v1: URL:ftp://ftp.money.us/pub/coins/gold/
Source-Location-v0: URL:ftp://ftp.cash.mx/pub/money/coins/silver/
Source-Location-v1: URL:ftp://ftp.cash.mx/pub/money/coins/gold/
Destination-Location-v0: URL:ftp://ftp.jamaica/pub/coins/
Destination-Location-v1: URL:ftp://ftp.jamaica/pub/coins/
Timezone: -0700
Update-Frequency: Mon, Wed, Fri
Update-Time: 02:00
Update-Policy: sizechange, maxdays=14, recursive
Update-Filename-Translation: s:(.*)(gold/|silver/)(.*):$1$2:;
Update-Transfer-Pattern: .*dollar.*
Update-Exclude-Pattern: .*penny.*
Update-Software: coin-transfer, version 3.17

3.8 CONTENT INFORMATION

For the following categories the assumption should not be made that the
information applies to the anonymous FTP host itself. Rather, the group
or organization may publish general information: the specific
information will be contained inside the file describing the category.
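
The record syntax of Section 3.4.1, the "-v*" variant fields and the
Update-Policy keywords used in the MIRROR example of Section 3.7.3 can
be exercised with a small reader. What follows is a minimal sketch under
stated assumptions, not a reference implementation: the function names,
the regular expressions and the shortened SAMPLE text are all
illustrative choices that do not come from this memo, and the sketch is
more lenient than the rules (it accepts a field whose data follows the
colon without whitespace).

```python
import re
from collections import defaultdict

# Assumed shape of a field line: a name in column one, then a colon.
FIELD_RE = re.compile(r"^([A-Za-z][A-Za-z0-9-]*):\s*(.*)$")
# Assumed shape of a variant field name: "<name>-v<number>".
VARIANT_RE = re.compile(r"^(.*)-v(\d+)$")


def parse_records(text):
    """Split indexing-file text into records.

    Each record is a list of (fieldname, value) pairs; a list is used
    because a field such as Author-Name may be repeated.
    """
    records, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line separates records.
            if current:
                records.append(current)
                current = []
        elif line[0].isspace():
            # Whitespace in the first column continues the previous field.
            if not current:
                raise ValueError("continuation line before any field")
            name, value = current[-1]
            current[-1] = (name, (value + " " + line.strip()).strip())
        else:
            match = FIELD_RE.match(line)
            if match is None:
                raise ValueError("not a field line: %r" % line)
            current.append((match.group(1), match.group(2).strip()))
    if current:
        records.append(current)
    return records


def split_variants(record):
    """Separate common fields from variant fields, grouped by number."""
    common, variants = [], defaultdict(dict)
    for name, value in record:
        match = VARIANT_RE.match(name)
        if match:
            variants[int(match.group(2))][match.group(1)] = value
        else:
            common.append((name, value))
    return common, dict(variants)


def parse_update_policy(value):
    """Turn "sizechange, maxdays=14" into a dictionary of keywords.

    Keywords without an argument map to True; an argument is assumed
    to be an integer, as in maxdays.
    """
    policy = {}
    for word in value.split(","):
        key, _, arg = word.strip().partition("=")
        if key:
            policy[key] = int(arg) if arg else True
    return policy


# Shortened excerpt of the MIRROR example, for illustration only.
SAMPLE = """\
Template-Type: MIRROR
Title: The ultimate treasury package
Description: This package helps you to become rich,
    and richer and richer.
Source-Location-v0: URL:ftp://ftp.cash.mx/pub/money/coins/silver/
Source-Location-v1: URL:ftp://ftp.cash.mx/pub/money/coins/gold/
Update-Policy: sizechange, maxdays=14, recursive
"""

if __name__ == "__main__":
    for record in parse_records(SAMPLE):
        common, variants = split_variants(record)
        print(common)
        print(variants)
        print(parse_update_policy(dict(common)["Update-Policy"]))
```

Keeping each record as an ordered list of pairs, rather than a
dictionary, preserves repeated fields and the order in which they were
written; a stricter reader could additionally reject blank lines inside
a record when processing Scheme 2 files.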

3.8.1 USER INFORMATION

To allow for the use of "handles", and so as not to require the
repetition of the USER* information each time this cluster is needed in
other templates, we define here a USER template in which the
information can be stored in one place. Assuming the use of a unique
handle, other records may then refer to this template to complete the
required information. The definition is simply the data elements listed
in 3.6.1 above.

Filename for this index file (naming scheme 1): AFA-USER. The
Template-Type is USER (naming scheme 2).

3.8.2 ORGANIZATION INFORMATION

In a similar manner to the USER template, the ORGANIZATION template
provides common information which may be used in other (larger)
templates, yielding a central source of information.

Filename for this index file (naming scheme 1): AFA-ORGANIZATION. The
Template-Type is ORGANIZATION (naming scheme 2).

3.8.3 SERVICES INFORMATION

This file contains records with the following fields. In the first
naming scheme each record is started and delimited by the
"Template-Type" field.

Filename for this index file (naming scheme 1): AFA-SERVICES

Any number of these files may exist in the archive.

Template-Type: SERVICES

Name: Name of service

Host-Name: Host name of host providing service

Host-Port: Port number of service

Protocol: Method required to access service (See Note <1>)

Admin-(USER*): Contact information of person or group responsible for
    service administration (administrative contact)

Admin-(ORGANIZATION*): Information on organization responsible for this
    service

Sponsoring-(ORGANIZATION*): Contact information for the organization
    sponsoring this site.

Description: Free text description of service

Authentication: Authentication information.
    Free text field supplying login and password information (if
    necessary) or other method for authentication

Registration: How to register for this service if general access is not
    available

Charging-Policy: Free text field describing any charging mechanism in
    place. Additionally, fee structure may be included in this field.

Access-Policy: Policies and restrictions for using this service

Access-Times: Time ranges for mandatory or preferred access of service.

Keywords: Keywords appropriate for describing this service

Record-Last-Modified-(USER*): Contact information of person or group
    last modifying this record

Record-Last-Modified-Date: The date this record was last modified

Record-Last-Verified-(USER*): Contact information of person or group
    last verifying that this record was accurate

Record-Last-Verified-Date: The date this record was last verified

Notes on this file.

<1> The Internet protocol used to communicate with this service. For
    example, "telnet" or "SMTP" or "NNTP" etc. A more complete
    explanation of specialized protocols (which may not be generally
    known) should be supplied in the main description.

Example 1
---------

The following is an example of an entry for a telnet service.

Template-Type: SERVICES
Name: Census Bureau information server
Host-Name: census.ispy.gov
Host-Port: 1234
Protocol: telnet
Admin-Name: Jay Bond
Admin-Postal: PO Box. 42, A Street Washington DC, USA 20001
Admin-Work-Phone: +1-202-222-3333
Admin-Work-Fax: +1 202 444 5555
Admin-Email: jb007@census.ispy.gov
Description: This server provides information from the latest USA
    Census Bureau statistics (1990) Type "help" for more information.
Authentication: Once connected type your email address at the "login:"
    prompt. No password is required.
Registration: No formal registration is required
Charging-Policy: There is no charge for the use of this service
Access-Times: 9:00 EST / 17:00 EST
Access-Policy: This service may not be used by sites in the Republic of
    the VTTS
Keywords: census, population, 1990, statistics
Last-Modified-Name: Miss Moneypenny
Last-Modified-Email: m.moneypenny@census.ispy.gov
Last-Modified-Date: Wed, 1 Jan 1970 12:00:00 GMT

Example 2
---------

The following is an example of a mailing list (service).

Template-Type: SERVICES
Name: fishlovers
Host-Name: foo.com
Admin-Name: Ima Adams
Admin-Email: fishlovers-request@foo.com
Protocol: Email to fishlovers@foo.com
Registration: Send mail to the administrative address with your own
    email address requesting addition
Description: Discussion list for people who love fish of all types
Address: iafa@cc.mcgill.ca
Keywords: fish, aquarium, marine, freshwater, saltwater
Access-Policy: Any Internet user may subscribe to this mailing list

3.8.4 DOCUMENTS, DATASETS, MAILING LIST ARCHIVES, USENET ARCHIVES,
      SOFTWARE PACKAGES, IMAGES AND OTHER OBJECTS

This file contains records with the following fields. For multi-record
files each record is started and the previous record is delimited by
the "Template-Type" field which also determines the type of object
being indexed. Suggestions for these types include:

Type of Object                     Template-Type
--------------                     -------------
Document                           DOCUMENT
Image                              IMAGE
Software Package                   SOFTWARE
Mailing List Archive               MAILARCHIVE
Usenet Archive                     USENET
Sound File                         SOUND
Video File                         VIDEO
Frequently Asked Questions File    FAQ

Other names may be constructed as necessary.

Any number of these files may exist in the archive.

Filename for this index file (naming scheme 1): AFA-OBJECT

Template-Type: See above list

Category: Type of object. See Note <1>

Title: Title of the object

Author-(USER*): Description/contact information about the
    authors/creators of the object. These fields may be repeated as
    often as is necessary.

Admin-(USER*): Description/contact information about the
    administrators/maintainers of the object. These fields may be
    repeated as often as is necessary.

Record-Last-Modified-(USER*): Contact information about person last
    modifying this record

Record-Last-Modified-Date: The date the last time this record was
    modified

Record-Last-Verified-(USER*): Contact information of person or group
    last verifying that this record was accurate

Record-Last-Verified-Date: The date the last time this record was
    verified

Version: A version designator for the object

Source: Information as to the source of the object.

Requirements: Any requirements for the use of the object. A free text
    description of any hardware/software requirements necessary to use
    the object

Description: Description (that is, "abstract" in the case of documents)
    of the object.

Bibliography: A bibliographic entry for the object

Citation: The citation for the object when used in other works

Publication-Status: Current publication status of object (draft,
    published etc).

Publisher-(ORGANIZATION*): Description/contact information about object
    publisher

Copyright: The copyright statement. Any additional information on the
    copying policy may be included

Creation-Date: The creation date for the object

Discussion: Free text description of possible discussion forums (USENET
    groups, mailing lists) appropriate for this object

Keywords: Appropriate keywords for this object

Format-v*: Formats in which the object is available (See Note <2>).

Size-v*: Length of object in bytes (octets).

Language-v*: The name of the language in which the object is written.
    For documents this would be the natural language. For software this
    would be the programming language

Character-Set-v*: The character set of the object. This should be a
    well-known value for example "ASCII" or "ISO Latin-1".

ISBN-v*: The International Standard Book Number of the object

ISSN-v*: The International Standard Serial Number of the object

Access-Protocol-v*: Method of access to this object (e.g., anonymous
    FTP, Gopher etc.) as well as the host on which it resides. Also any
    additional information needed to access the object.

Access-Host-Name-v*: Host on which to access the object

Access-Host-Port-v*: Port on which to access the object. This may be
    implied by the Access-Protocol and so not be necessary

Pathname-v*: The full pathname of this object. This is operating system
    specific. This is not required if naming scheme 2 is used

Last-Revision-Date-v*: Last date that the object was revised

Library-Catalog-v*: Library cataloging information (See Note <3>).

Notes for this template.

<1> The intention of this field is to define the category of the
    object. For example, in the case of documents it could be
    "Technical Report", or "Conference Paper" and the name and date of
    the conference at which the paper was presented. It may also be
    something like "General Guide" or "User manual".

<2> Objects are often available in several formats. Examples for
    documents include "PostScript", "ASCII text", "DVI" etc. For images
    this may be "gif", "jpeg", "miff" etc.

<3> Library cataloging numbers. In those cases where the number itself
    does not contain enough information to determine the cataloging
    scheme, the name of the scheme should be included.

Example 1
---------

Example of an AFA-OBJECT file for the DOCUMENT type. Note that this
example contains variant information.

Template-Type: DOCUMENT
Title: The Function of Homeoboxes in Yeast Chromosome 1
Access-Method: These files are available from ftp.fungus.newu.edu via
    anonymous FTP. They are stored in the directory
    pub/yeast/chromosome1
Author-Name: John Doe
Author-Email: jdoe@yeast.foobar.com
Author-Home-Phone: +1 898 555 1212;
Author-Name: Jane Buck
Author-Email: jane@fungus.newu.edu
Last-Revision-Date: 27 November 1991
Category: Conference paper.
    Yeastcon, January 1992, Mushroom Rock, CA, USA
Abstract: Homeoboxes have been shown to have a significant impact on
    the expressions of genes in Chromosome 1 of Bakers' Yeast.
Citation: J. Doe, J. Buck, The function of homeoboxes in Yeast
    Chromosome 1, Conf. proc. Yeastcon, January 1992, Mushroom Rock,
    pp. 33-50
Publication-Status: Published
Publisher-Organization-Name: Yeast-Hall
Publisher-Organization-Postal: 1212 5th Avenue NY, NY, 12001
Copyright: The copyright on this document is held by the authors. It
    may be freely copied and quoted as long as the contribution of the
    authors is acknowledged
Library-Catalog: LCC 1701D
Keywords: homeobox, yeast, chromosome, DNA, sequencing, yeastcon
Format-v0: PostScript
Pathname-v0: yeast-homeobox1.ps
Language-v0: English
Size-v0: 18 pages
Format-v1: ASCII (without graphs)
Pathname-v1: yeast-homeobox1.txt
Size-v1: 13 pages
Language-v1: Russian

Example 2
---------

This is an example of a software entry. Note the use of the software
maintainer's "handle" instead of the explicit contact information. This
could be used if there were a well-known external method of resolving
this handle.

Template-Type: SOFTWARE
Title: Beethoven's Fifth Player
Version: 67
Author-Name: Ludwig Van Beethoven
Author-Email: beet@romantic.power.org
Author-Fax: +43 1 123 4567
Admin-Handle: berlioz01
Abstract: The program provides the novice to Transitional
    Classical-Romantic music a V-window interface to the author's
    latest composition
Abstract: V-window based music player
Requirements: Requires the V-Window system version 10 or higher
Discussion: USENET rec.music.classical
Copyright: Freely redistributable for non-commercial use. Copyright
    held by author
Keywords: Classical music, V-windows
Format-v0: LZ compressed
Pathname-v0: /pub/Vfifth.tar.Z
Access-Method-v0: Anonymous FTP
Access-Host-Name-v0: power.org

4. CONCLUSION

This document attempts to provide the foundation for a common set of
recommended cataloging practices which may be used on the Internet to
enhance the utility of Anonymous FTP archives, currently the most
widely used and supported mechanism for general information storage and
retrieval. It is intended that these recommendations be flexible enough
to accommodate a broad spectrum of information classes, and it is hoped
that they will be widely used and that automated tools will be
developed to use the valuable information that they make available.

----------------------------------------------------------------------

Bibliography
------------

[1] RFC 959 Postel, J.B.; Reynolds, J.K. File Transfer Protocol.
    1985 October

[2] "A Guide to Anonymous FTP Site Administration". Work in progress
    from the Internet Anonymous FTP Archive Working Group of the IETF.

[3] Internet Draft "draft-ietf-uri-resource-names-00.txt". Work in
    progress from the Uniform Resource Identifier Working Group of the
    IETF.

[4] RFC 954 Harrenstien, K.; Stahl, M.K.; Feinler, E.J. NICNAME/WHOIS.
    1985 October

[5] RFC 1034 Mockapetris, P.V. Domain names - concepts and facilities.
    1987 November

[6] RFC 1036 Horton, M.R.; Adams, R. Standard for interchange of USENET
    messages. 1987 December

[7] RFC 1123 Braden, R.T., ed. Requirements for Internet hosts -
    application and support. 1989 October

[8] Internet Draft "Data Element Templates for Internet Information
    Objects". Work in progress from the Internet Anonymous FTP Archive
    Working Group of the IETF.