Content negotiation subgroup of HTTP working group Koen Holtman/TUE Internet-Draft 22 February 1996 Expires: 27 August 1996 Proposed Content Negotiation Definitions for HTTP/1.1 draft-holtman-http-negotiation-00.txt STATUS OF THIS MEMO This is a snapshot of a working draft of the content negotiation subgroup of the HTTP working group. It does not at the moment represent the consensus of the content negotiation subgroup of the http working group, or of the http working group. This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress". To learn the current status of any Internet-Draft, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast). Distribution of this document is unlimited. Please send comments to the HTTP working group at . Discussions of the working group are archived at . General discussions about HTTP and the applications which use HTTP should take place on the mailing list. ABSTRACT This document proposes a content negotiation mechanism for HTTP/1.1. It contains a number of sections that should be read as definitions in the context of the current draft HTTP/1.1 specification (draft-ietf-http-v11-spec-01.txt). It is intended that these sections are merged into a future version of this specification. 
[Page 1] Internet-Draft Content Negotiation 22 February 1996 Content negotiation, as proposed in this document, is an optional feature for the HTTP/1.1 protocol: resources may be negotiable, but they need not be. If a resource is negotiable, this changes the semantics of GET and HEAD transactions on the resource. Other transactions are not affected. A negotiable resource has a number of alternates bound to it. The proposed content negotiation mechanism allows for automatic selection of the preferred alternate bound to a negotiable resource based on the properties of the alternates and on the user agent preferences for the retrieval action. 1 Introduction This document proposes a content negotiation mechanism for HTTP/1.1. It contains a number of sections that should be read as definitions in the context of the current draft HTTP/1.1 specification [1]. It is intended that these sections are merged into a future version of the draft HTTP/1.1 specification. Content negotiation, as proposed in this document, is an optional feature for the HTTP/1.1 protocol: resources may be negotiable, but they need not be. If a resource is negotiable, this changes the semantics of GET and HEAD transactions on the resource. Other transactions are not affected. A negotiable resource has a number of alternates bound to it. The proposed content negotiation mechanism allows for automatic selection of the preferred alternate bound to a negotiable resource based on the properties of the alternates and on the user agent preferences for the retrieval action. This document builds on the content negotiation descriptions in [1], and directly incorporates text from [1] in some places. A new directive, reactive-on-wildcard, is introduced to allow user agents to signal the capability of doing content negotiation. If this directive is absent, the proposed definitions produce server behavior that yields adequate results for (HTTP/1.0) user agents that do not support content negotiation. 
2 Terminology and notation

This document uses the terminology and notational conventions defined in [1]. It sometimes refers directly to sections in [1], using the notation `Section (1.2[1])'.

If a (sub)section title below is marked with (*), it is intended as a replacement for the (sub)section with the same title in [1]. All other (sub)sections below, up to Section 7, contain new material intended as an addition to [1].

3 Status code definitions

3.1 Redirection 3xx

300 Multiple Choices (*)

The requested resource is a negotiable resource and the server is engaging in reactive content negotiation (Section 5). The server has determined that multiple alternates are acceptable, but is not able to determine which alternate is the best alternate. This response may only be generated if specific conditions given in Section 5.2 are met.

The response must include an Alternates header describing the alternates bound to the resource, allowing a user agent to automatically select and retrieve an alternate if appropriate. This response is cachable, subject to the restrictions specified in the cache-control directive, if present, of the included Alternates header.

If no Accept header in the request contains a reactive-on-wildcard directive, and it was not a HEAD request, the response must include an entity that gives the user the option to select the most appropriate alternate manually. The suggested entity media type as given in the Content-Type response header is "text/html". If there is a reactive-on-wildcard directive, no entity should be included.

If the service author finds it appropriate for any user agent that does not implement an alternate selection algorithm to automatically retrieve a certain alternate, then a Location response header giving the URI of that alternate may be included in the response.
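The entity rule for a 300 response described above can be sketched in Python. This is an illustration only: the function names are hypothetical, splitting the field value on commas ignores quoted parameter values, and matching the directive case-insensitively is an assumption.

```python
def has_reactive_on_wildcard(accept_headers):
    """Return True if any Accept field value carries the directive.

    accept_headers is a list of raw Accept field values (assumed input shape).
    """
    for value in accept_headers:
        for element in value.split(","):  # naive split, ignores quoting
            if element.strip().lower() in ("reactive-on-wildcard", "r-o-w"):
                return True
    return False


def reactive_response_needs_entity(method, accept_headers):
    """Decide whether a reactive response must carry a selection entity."""
    if method.upper() == "HEAD":
        return False  # the entity requirement does not apply to HEAD
    # With the directive present, no entity should be included.
    return not has_reactive_on_wildcard(accept_headers)
```

A user agent that sends no reactive-on-wildcard directive therefore always receives an entity from which the user can pick an alternate manually.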
9.4 Client Error 4xx

406 None Acceptable (*)

The requested resource is a negotiable resource and the server is engaging in reactive content negotiation (Section 5). Usually, this response indicates that the server was not able to positively determine that at least one of the available alternates would be acceptable.

The response must include an Alternates header describing the alternates bound to the resource, allowing a user agent to automatically select and retrieve an alternate if appropriate. This response is cachable, subject to the restrictions specified in the cache-control directive, if present, of the included Alternates header.

If no Accept header in the request contains a reactive-on-wildcard directive, and it was not a HEAD request, the response must include an entity that gives the user the option to select the most appropriate alternate manually. The suggested entity media type as given in the Content-Type response header is "text/html". If there is a reactive-on-wildcard directive, no entity should be included.

If the service author finds it appropriate for any user agent that does not implement an alternate selection algorithm to automatically retrieve a certain alternate, then a Location response header giving the URI of that alternate may be included in the response.

408 Not Acceptable (*)

The resource identified by the Request-URI has content characteristics that are not acceptable according to the accept headers sent in the request. This response code must only be generated by un-negotiable resources.

3 Protocol parameter descriptions

3.1 Language Tags (*)

A language tag identifies a natural language spoken, written, or otherwise conveyed by human beings for communication of information to other human beings. Computer languages are explicitly excluded. HTTP uses language tags within the Accept-Language, Content-Language, and Alternates fields.
The syntax and registry of HTTP language tags is the same as that defined by RFC 1766 [2]. In summary, a language tag is composed of one or more parts: a primary language tag and a possibly empty series of subtags:

    language-tag = primary-tag *( "-" subtag )
    primary-tag  = 1*8ALPHA
    subtag       = 1*8ALPHA

Whitespace is not allowed within the tag and all tags are case-insensitive. The namespace of language tags is administered by the IANA. Example tags include:

    en, en-US, en-cockney, i-cherokee, x-pig-latin

where any two-letter primary-tag is an ISO 639 language abbreviation and any two-letter initial subtag is an ISO 3166 country code.

4 Header field definitions

4.1 Accept (*)

The Accept request-header field can be used to specify certain media types which are acceptable for the response. Accept headers can be used to guide content negotiation (Section 5), and can also be used to indicate that the request is specifically limited to a small set of desired types, as in the case of a request for an in-line image.

In general, it is not efficient to send long Accept headers in every request. See Section 5.2 for a discussion of Accept header efficiency considerations.

The field may be folded onto several lines and more than one occurrence of the field is allowed, with the semantics being the same as if all the entries had been in one field value.

    Accept = "Accept" ":" #(
                 ( media-range
                   [ ";" "q" "=" qvalue ]
                   [ ";" "mxb" "=" 1*DIGIT ] )
                 | reactive-on-wildcard )

    media-range = ( "*/*"
                  | ( type "/" "*" )
                  | ( type "/" subtype )
                  ) *( ";" parameter )

    reactive-on-wildcard = "reactive-on-wildcard" | "r-o-w"

The asterisk "*" character is used to group media types into ranges, with "*/*" indicating all media types and "type/*" indicating all subtypes of that type. The parameter q is used to indicate the media type quality factor, which represents the user's preference for that range of media types.
The parameter mxb gives the maximum acceptable size of the Entity-Body, in decimal number of octets, for that range of media types. The default values are: q=1 and mxb=undefined (i.e., infinity). Section 5 describes the content negotiation algorithm which makes use of these values.

The example

    Accept: audio/*; q=0.2, audio/basic

should be interpreted as "I prefer audio/basic, but send me any audio type if it is the best available after an 80% mark-down in quality."

If no Accept header is present, then it is assumed that the client accepts all media types. If Accept headers are present, and if the resource is an un-negotiable resource which cannot generate a response which is acceptable according to the Accept headers, then the server should generate an error response with the 408 (not acceptable) status code.

A more elaborate example is

    Accept: text/plain; q=0.5, text/html,
            text/x-dvi; q=0.8; mxb=100000, text/x-c

Verbally, this would be interpreted as "text/html and text/x-c are the preferred media types, but if they do not exist, then send the text/x-dvi entity if it is less than 100000 bytes, otherwise send the text/plain entity."

Media ranges can be overridden by more specific media ranges or specific media types. If more than one media range applies to a given type, the most specific reference has precedence. For example,

    Accept: text/*, text/html, text/html;version=2.0, */*

has the following precedence:

    1) text/html;version=2.0
    2) text/html
    3) text/*
    4) */*

The media type quality factor and maximum acceptable size associated with a given type are determined by finding the media range with the highest precedence which matches that type.
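The precedence rule can be sketched in Python as follows. This is a minimal illustration, not a complete Accept parser: the function names are hypothetical, field values are split naively on "," and ";", and returning a quality factor of 0 when no media range matches is an assumption rather than something stated above.

```python
def split_media(text):
    """Split 'type/subtype;name=value...' into (type, subtype, params)."""
    parts = [p.strip() for p in text.split(";")]
    mtype, _, subtype = parts[0].lower().partition("/")
    params = {}
    for part in parts[1:]:
        name, _, value = part.partition("=")
        params[name.strip().lower()] = value.strip()
    return mtype, subtype, params


def parse_accept(value):
    """Parse an Accept field value into (type, subtype, params, q, mxb)."""
    ranges = []
    for element in value.split(","):
        if "/" not in element:
            continue  # skips directives such as r-o-w and empty elements
        mtype, subtype, params = split_media(element)
        q = float(params.pop("q", "1"))   # default q=1
        mxb = params.pop("mxb", None)     # default: undefined (infinity)
        ranges.append((mtype, subtype, params, q,
                       None if mxb is None else int(mxb)))
    return ranges


def specificity(media_range, mtype, subtype, params):
    """Return a precedence rank if the range matches the type, else None."""
    rtype, rsub, rparams, _, _ = media_range
    if rtype == "*" and rsub == "*":
        return 0
    if rtype != mtype:
        return None
    if rsub == "*":
        return 1
    if rsub != subtype:
        return None
    if any(params.get(k) != v for k, v in rparams.items()):
        return None
    return 2 + len(rparams)  # more parameters means more specific


def media_preferences(accept_value, media_type):
    """Return (q, mxb) taken from the most specific matching media range."""
    mtype, subtype, params = split_media(media_type)
    best_rank, best = -1, (0.0, None)  # assumption: no match gives q=0
    for media_range in parse_accept(accept_value):
        rank = specificity(media_range, mtype, subtype, params)
        if rank is not None and rank > best_rank:
            best_rank, best = rank, (media_range[3], media_range[4])
    return best
```

Ranking by the number of parameters is one simple way to make "text/html;version=2.0" outrank "text/html"; other orderings that respect the precedence list above would work equally well.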
For example,

    Accept: text/*;q=0.3, text/html;q=0.7, text/html;version=2.0,
            */*;q=0.5

would cause the following type quality factors to be associated:

    text/html;version=2.0 = 1
    text/html             = 0.7
    text/plain            = 0.3
    image/jpeg            = 0.5
    text/html;level=3     = 0.7

The inclusion of a reactive-on-wildcard directive in an Accept header will change the rules for the sending of reactive negotiation responses (Section 5). The example

    Accept: text/html, */*;q=0.95, r-o-w

should be interpreted as "text/html is my preferred media type, and I assign media type quality factors in the range 0 - 0.95 to all other media types. Send me a reactive negotiation response, so that I can pick the best alternate myself, if you have any non-text/html alternate which might give me a higher overall quality than any text/html alternate."

Note: A user agent may be provided with a default set of quality values for certain media ranges. However, unless the user agent is a closed system which cannot interact with other rendering agents, this default set should be configurable by the user.

4.2 Accept-Charset (*)

The Accept-Charset request-header field can be used to indicate what character sets are acceptable for the response. This field allows clients capable of understanding more comprehensive or special-purpose character sets to signal that capability to a server which is capable of representing documents in those character sets. The US-ASCII character set can be assumed to be acceptable to all user agents.

    Accept-Charset = "Accept-Charset" ":" 1#charset

Character set values are described in Section (3.4[1]). An example is

    Accept-Charset: iso-8859-1, unicode-1-1

If no Accept-Charset header is present, the default is that any character set is acceptable.
If an Accept-Charset header is present, and if the resource is an un-negotiable resource which cannot generate a response which is acceptable according to the Accept-Charset header, then the server should generate an error response with the 408 (not acceptable) status code.

4.3 Accept-Encoding (*)

The Accept-Encoding request-header field is similar to Accept, but restricts the content-coding values (Section (3.5[1])) which are acceptable in the response.

    Accept-Encoding = "Accept-Encoding" ":" #( content-coding )

An example of its use is

    Accept-Encoding: compress, gzip

If no Accept-Encoding header is present in a request, the server may assume that the client will accept any content coding. If an Accept-Encoding header is present, and if the resource is an un-negotiable resource which cannot generate a response which is acceptable according to the Accept-Encoding header, then the server should generate an error response with the 408 (not acceptable) status code.

4.4 Accept-Language (*)

The Accept-Language request-header field is similar to Accept, but restricts the set of natural languages that are preferred as a response to the request.

    Accept-Language = "Accept-Language" ":"
                      1#( language-range [ ";" "q" "=" qvalue ] )

    language-range  = ( ( 1*8ALPHA *( "-" 1*8ALPHA ) ) | "*" )

Each language-range may be given an associated quality value which represents an estimate of the user's comprehension of the languages specified by that range. The quality value defaults to "q=1" (100% comprehension). This value may be used in the server's content negotiation algorithm (Section 5). For example,

    Accept-Language: da, en-gb;q=0.8, en;q=0.7

would mean: "I prefer Danish, but will accept British English (with 80% comprehension) and other types of English (with 70% comprehension)."
A language-range matches a language-tag if it exactly equals the tag, or if it is a prefix of the tag such that the first tag character following the prefix is "-". The special range "*", if present in the Accept-Language field, matches every tag not matched by any other ranges present in the Accept-Language field.

Note: This use of a prefix matching rule does not imply that language tags are assigned to languages in such a way that it is always true that if a user understands a language with a certain tag, then this user will also understand all languages with tags for which this tag is a prefix. The prefix rule simply allows the use of prefix tags if this is the case.

The language quality factor assigned to a language-tag by the Accept-Language field is the quality value of the longest language-range in the field that matches the language-tag. If no language-range in the field matches the tag, the language quality factor assigned is 0.

If no Accept-Language header is present in a request, the server should assume that all languages are equally acceptable. If an Accept-Language header is present, then all languages which are assigned a quality factor greater than 0 are acceptable. If the resource is an un-negotiable resource which cannot generate a response for an audience capable of understanding at least one acceptable language, it is acceptable to serve a response that uses other languages.

It may be contrary to the privacy expectations of the user to send an Accept-Language header with the complete linguistic preferences of the user in every request. For a complete discussion of this issue, see Section 6.3.
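The matching and quality rules given earlier in this section can be sketched in Python. The function names are hypothetical, parsing is deliberately naive, and returning 1.0 when no Accept-Language header is present is an assumption: the text only says that all languages are then equally acceptable.

```python
def parse_accept_language(value):
    """Parse an Accept-Language field value into (language-range, q) pairs."""
    ranges = []
    for element in value.split(","):
        element = element.strip()
        if not element:
            continue
        lang, _, param = element.partition(";")
        name, _, val = param.partition("=")
        q = float(val) if name.strip().lower() == "q" else 1.0  # default q=1
        ranges.append((lang.strip().lower(), q))
    return ranges


def language_quality(accept_language, tag):
    """Return the language quality factor assigned to a language tag.

    accept_language is the raw field value, or None if the header is absent.
    """
    if accept_language is None:
        return 1.0  # assumption: equal acceptability modelled as 1.0
    tag = tag.lower()  # tags are case-insensitive
    best_len, best_q, wildcard_q = -1, 0.0, None
    for lang, q in parse_accept_language(accept_language):
        if lang == "*":
            wildcard_q = q
        elif tag == lang or tag.startswith(lang + "-"):
            # The longest matching language-range wins.
            if len(lang) > best_len:
                best_len, best_q = len(lang), q
    if best_len >= 0:
        return best_q
    # "*" only matches tags not matched by any other range.
    return wildcard_q if wildcard_q is not None else 0.0
```

With the header "da, en-gb;q=0.8, en;q=0.7", the tag "en-US" is matched only by the range "en", while "en-GB" is matched by both "en" and "en-gb" and takes its quality from the longer range.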
If a reactive-on-wildcard directive is present in an Accept header, the user agent can safely omit certain languages intelligible to the user from the Accept-Language header, without affecting the quality of the negotiation process in requests on negotiated resources, if the language-range "*" is included with an appropriate language quality factor.

Note: As intelligibility is highly dependent on the individual user, it is recommended that client applications make the choice of linguistic preference available to the user. If the choice is not made available, then the Accept-Language header field must not be given in the request.

4.5 URI (*)

The URI entity-header field is used to inform the recipient of other Uniform Resource Identifiers (Section (3.2[1])) by which the resource can be identified.

    URI-header = "URI" ":" 1#( uri-mirror | uri-name )

    uri-mirror = "{" "mirror" <"> URI <"> "}"
    uri-name   = "{" "name" <"> URI <"> "}"

Any URI specified in this field can be absolute or relative to the Request-URI. The "mirror" form of URI refers to a location which is a mirror copy of the Request-URI. The "name" form refers to a location-independent name corresponding to the Request-URI.

4.6 Alternates

The Alternates entity-header field is used to describe the alternate resources bound to a negotiable resource.
    Alternates = "Alternates" ":"
                 1#( alternate-descr | caching-directive )

    alternate-descr = "{" <"> URI <"> source-quality
                      [ "{" "type" <"> media-type <"> "}" ]
                      [ "{" "language" <"> 1#language-tag <"> "}" ]
                      [ "{" "encoding" <"> 1#content-coding <"> "}" ]
                      [ "{" "length" 1*DIGIT "}" ]
                      [ "{" "description" quoted-string "}" ]
                      [ extension-attribute ]
                      "}"

    source-quality = qvalue

    extension-attribute = "{" extension-name extension-value "}"
    extension-name      = token
    extension-value     = #( token | quoted-string | )

Note: the extension-attribute is included because it is expected that HTTP/1.2 will define new attributes for use in the Alternates header. Also, this attribute eases content negotiation experiments under HTTP/1.1.

    caching-directive = "{" "cache-control" 1#cache-directive "}"

Cache-directives are defined in Section (10.8[1]).

Any URI specified in this field can be absolute or relative to the Request-URI.

For each of the alternates bound to the negotiable resource, the alternates header should include an alternate-descr form describing that alternate.

The source-quality attribute given in an alternate description is measured by the content provider as representing the amount of degradation from the original source. For example, a picture originally in JPEG form would have a lower source quality when translated to the XBM format, and much lower source quality when translated to an ASCII-art alternate. Note, however, that this is a function of the source -- an original piece of ASCII-art may degrade in quality if it is captured in JPEG form.
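A reader of the Alternates grammar could be sketched in Python along the following lines. The names are hypothetical and the tokenizer is a simplification: it assumes quoted strings contain no embedded quote characters, keeps attribute values as plain lists of strings, and skips caching-directive elements rather than interpreting them. The sample value is the example header given in this section.

```python
import re

# Braces, quoted strings, or bare tokens; commas and whitespace are skipped.
TOKEN = re.compile(r'[{}]|"[^"]*"|[^\s{}",]+')


def parse_groups(text):
    """Turn a field value into nested lists, one list per {...} group."""
    stack = [[]]
    for tok in TOKEN.findall(text):
        if tok == "{":
            stack.append([])
        elif tok == "}":
            if len(stack) == 1:
                raise ValueError("unbalanced '}'")
            group = stack.pop()
            stack[-1].append(group)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced '{'")
    return stack[0]


def parse_alternates(value):
    """Return one dict per alternate-descr in an Alternates field value."""
    alternates = []
    for group in parse_groups(value):
        if not isinstance(group, list) or len(group) < 2:
            continue
        if group[0] == "cache-control":
            continue  # caching-directive, not an alternate description
        descr = {"uri": group[0].strip('"'), "qs": float(group[1])}
        for attr in group[2:]:
            if isinstance(attr, list) and attr:
                descr[attr[0]] = [v.strip('"') for v in attr[1:]]
        alternates.append(descr)
    return alternates


EXAMPLE = (
    '{"TheProject.fr.html" 1.0 {type "text/html"} {language "fr"}}, '
    '{"TheProject.en.html" 1.0 {type "text/html"} {language "en"}}, '
    '{"TheProject.fr.txt" 0.7 {type "text/plain"} {language "fr"}}, '
    '{"TheProject.en.txt" 0.8 {type "text/plain"} {language "en"}}'
)
```

Calling parse_alternates(EXAMPLE) yields four descriptions, each with a URI, a source quality, and the type and language attributes as lists.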
Content providers should use the following table as a guide when assigning source quality values:

    1.000        no degradation
    0.999-0.900  no noticeable degradation
    0.899-0.700  noticeable, but acceptable degradation
    0.699-0.500  barely acceptable degradation
    0.499-0.000  unacceptable degradation

It is important that content providers do not assign very low source quality values without good reason, as this will limit the ability of users to influence the negotiation process with their own preference settings.

If alternates are not converted from one source, but constructed separately to represent the same abstract information in different ways, then the source quality attributes can be used to express differences in quality between the alternates.

An example Alternates header for a negotiable resource with the URI http://www.w3.org/pub/WWW/TheProject is:

    Alternates: {"TheProject.fr.html" 1.0
                  {type "text/html"} {language "fr"}},
                {"TheProject.en.html" 1.0
                  {type "text/html"} {language "en"}},
                {"TheProject.fr.txt" 0.7
                  {type "text/plain"} {language "fr"}},
                {"TheProject.en.txt" 0.8
                  {type "text/plain"} {language "en"}}

which indicates that the negotiable resource binds to four alternate resources that differ in media type and natural language.

The type, language, encoding, and length attributes of an alternate description refer to their Content-* header counterparts. Though all attributes are optional, it is often desirable to include as many attributes as possible as this will increase the quality of the negotiation process.

Servers must only generate extension-attributes whose names start with "x-". Clients should ignore all extension attributes they do not recognize.

The description attribute is meant to provide a textual description of some properties of the alternate, to be displayed by a user agent when showing the list of all alternates bound to a negotiable
This attribute can be included if the URI and normal attributes of an alternate are considered too opaque to allow interpretation by the user.

The cache-control directive of the Alternates header field can be used to restrict the cachability of the Alternates header, and, for 300 (multiple choices) and 406 (none acceptable) responses, the other parts of the response. This directive duplicates the control functionality offered for un-negotiated resources by the Cache-Control header.

4.7 Alt-Header

The Alt-Header request-header can be used in requests to negotiable resources to introduce new request headers in any derived requests on alternate resources (see Section 5.2).

    Alt-Header = "Alt-Header" ":" <"> URI <"> Request-Header

The URI specified in this field can be absolute or relative to the Request-URI. A typical example is

    Alt-Header: "TheProject.en.html" If-Validator-Valid: 6a7bf

If it already has a copy of the "TheProject.en.html" alternate in cache, a caching client can include this header in requests to allow the server to shorten a 200 (OK) preemptive negotiation response to a 304 (not modified) response in case that preemptive negotiation yields "TheProject.en.html" as the best alternate.

Servers are always allowed to ignore Alt-Header request headers.

5 Content negotiation (*)

Content negotiation is an optional feature of the HTTP/1.1 protocol: resources may be negotiable, but they need not be. If a resource is negotiable, this changes the semantics of GET and HEAD transactions on the resource. Other transactions are not affected.

A negotiable resource has a number of alternates bound to it. The HTTP content negotiation mechanism allows for automatic selection of the preferred alternate bound to a negotiable resource based on the properties of the alternates and on the user agent preferences for the retrieval action on the negotiated resource.
An alternate is a resource, identified by an alternate URI, that provides one possible representation of the `contents' of the negotiable resource. An alternate resource must never be a negotiable resource itself. It is the responsibility of the author of the negotiable resource, not the author of the alternate, to ensure that this restriction is not violated.

The negotiability of a resource is expressed by the Alternates response header. If a 2xx or 3xx class response does not include an Alternates response header, then the resource is un-negotiable. If any response does include an Alternates response header, then the resource is negotiable.

When displaying an alternate as the end result of a retrieval action on a negotiable resource, a user agent should allow the user to review a list of all alternates bound to the negotiable resource, and to initiate retrieval of another alternate if desired. The list can be annotated with some or all of the properties of the alternates, as given by the Alternates header in the negotiable resource response.

When displaying an alternate as the end result of a retrieval action on a negotiable resource, a user agent should show the negotiable resource URI, not the alternate resource URI, as being the URI the contents of which were retrieved. If the user agent stores a reference to the content displayed for future use, it is the negotiable resource URI, not the alternate resource URI, which should be stored.

HTTP/1.1 provides for two types of content negotiation: preemptive and reactive. Preemptive negotiation is generally faster than reactive negotiation, but it can only be used if sufficient information about user agent capabilities and user preferences is present in the request on the negotiable resource. Reactive negotiation can always be used.
Therefore, preemptive negotiation is best seen as a mechanism that can sometimes optimize on reactive negotiation transactions.

5.1 Reactive negotiation

In reactive negotiation, the selection and retrieval of an alternate bound to the negotiable resource spans two transactions.

In the first transaction, the client transmits a request on the negotiable resource URI, and the server responds with a 300 (multiple choices) or 406 (none acceptable) response, which includes an Alternates header describing the alternates bound to the negotiable resource. A 406 response may always be generated; a 300 response may only be generated if specific conditions given in Section 5.2 are met.

The client can use the Alternates header in the 300 or 406 response to select the alternate that matches best to the preferences for the retrieval action.

In the second transaction, the user agent transmits a request on the URI of the selected alternate resource, and the server will typically respond with a 200 (OK) response, though other response codes like 302 (moved temporarily) are also possible. Only the user agent needs to know that the second request is part of a reactive negotiation process; all other parties can treat it as a normal request on an un-negotiated resource.

User agents should use the reactive alternate selection algorithm below when automatically selecting the best alternate listed in an Alternates header. User agents are allowed to use other selection algorithms, but this is not recommended, as preemptive negotiation is defined to optimize the case in which the reactive alternate selection algorithm below is used.

User agents that do not wish to implement an alternate selection algorithm can, by only using Accept request headers of a certain form, force servers to always include an entity when a reactive negotiation response is sent.
They can then use this entity to allow the user to select an alternate manually, or use the reactive response Location header, if present, to automatically fetch the alternate recommended by the server.

In the first step of the reactive alternate selection algorithm, the overall quality for every alternate listed in the Alternates header of the negotiable resource is computed. The overall quality of an alternate is a real number Q in the range 0 through 1, where 0 is the minimum and 1 the maximum value, defined as

    Q = qs * qe * qc * ql * q * qml

The values qs,qe,qc,ql,q,qml for a particular alternate are all determined using the part of the received Alternates header describing that alternate, called the alternate description below.

    qs   The source quality factor for the alternate is given by the source-quality attribute in the alternate description.

    qe   The encoding quality factor is 1 if there is no encoding attribute in the alternate description. If there is an encoding attribute in the alternate description, the encoding quality factor is 1 if the user agent can decode the given content encoding, 0 otherwise.

    qc   The charset quality factor is 1 if there is no type attribute in the alternate description, or if the media type given in the type attribute of the alternate description does not have a charset parameter. If there is a charset parameter, then the charset quality factor is 1 if the user agent can process a message with the given character set, 0 otherwise. User agents must always be able to process a message with the US-ASCII charset.

    ql   The language quality factor is 1 if there is no language attribute in the alternate description. If there is a language attribute, then the language quality factor is the highest quality factor assigned to any one of the listed languages according to the user agent language preferences for the retrieval action.

    q    The media type quality factor is 1 if there is no type attribute in the alternate description.
If there is a type attribute, then the media type quality factor is the quality factor assigned to the given media type in the user agent media type preferences for the retrieval action.

    qml  The maximum length quality factor is 1 if there is no length attribute in the alternate description. If there is a length attribute in the alternate description, then the maximum length quality factor is 1 if the length given is less than or equal to the maximum acceptable length according to the user agent maximum length preferences for the retrieval action, 0 otherwise. Preferred maximum lengths are often equal to `infinity'.

In the second step of the reactive alternate selection algorithm, the overall qualities of all alternates are compared to select the best alternate. If there is one alternate with the highest overall quality value, then that alternate is the best alternate. If there are multiple alternates that share the highest overall quality value, then the alternate that is listed first in the received Alternates header is the best alternate.

If all alternates have an overall quality value of zero, a user agent should not automatically retrieve the first alternate, but stop the reactive negotiation process, allowing the user to decide on the next action.

5.2 Preemptive negotiation (*)

In preemptive negotiation, the selection and retrieval of an alternate bound to the negotiable resource is done in a single transaction, saving one round trip time over reactive negotiation.

A preemptive negotiation response must only be generated by a server if the request on the negotiable resource contains enough information about user agent capabilities and user preferences to allow the server to determine which alternate would be chosen if the reactive alternate selection algorithm outlined above were used by the user agent in reactive negotiation.
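For reference, the reactive alternate selection algorithm of Section 5.1, whose outcome the server has to predict, can be sketched in Python. The data shapes are assumptions made for this illustration: each alternate is a dict with "uri", "qs" and optional "type", "language" (list), "encoding" (list) and "length" keys, and the user agent preferences are a dict of simple lookup tables. Exact-match lookups stand in for the range matching rules of Section 4.

```python
def charset_of(media_type):
    """Return the lower-cased charset parameter of a media type, or None."""
    for param in media_type.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            return value.strip().strip('"').lower()
    return None


def overall_quality(alt, prefs):
    """Compute Q = qs * qe * qc * ql * q * qml for one alternate."""
    qs = alt["qs"]
    qe = qc = ql = q = qml = 1.0  # each factor is 1 if the attribute is absent
    if "encoding" in alt:
        ok = all(e in prefs["encodings"] for e in alt["encoding"])
        qe = 1.0 if ok else 0.0
    if "type" in alt:
        charset = charset_of(alt["type"])
        if charset not in (None, "us-ascii") and charset not in prefs["charsets"]:
            qc = 0.0
        bare = alt["type"].split(";")[0].strip().lower()
        q = prefs["media"].get(bare, 0.0)  # simplification: exact match only
    if "language" in alt:
        ql = max(prefs["languages"].get(t, 0.0) for t in alt["language"])
    limit = prefs["max_length"]  # None stands for `infinity'
    if "length" in alt and limit is not None and alt["length"] > limit:
        qml = 0.0
    return qs * qe * qc * ql * q * qml


def select_best(alternates, prefs):
    """Return the best alternate, or None if every overall quality is zero."""
    best, best_q = None, 0.0
    for alt in alternates:
        quality = overall_quality(alt, prefs)
        if quality > best_q:  # strict '>' keeps the first-listed on ties
            best, best_q = alt, quality
    return best


ALTERNATES = [
    {"uri": "TheProject.fr.html", "qs": 1.0, "type": "text/html", "language": ["fr"]},
    {"uri": "TheProject.en.html", "qs": 1.0, "type": "text/html", "language": ["en"]},
    {"uri": "TheProject.fr.txt", "qs": 0.7, "type": "text/plain", "language": ["fr"]},
    {"uri": "TheProject.en.txt", "qs": 0.8, "type": "text/plain", "language": ["en"]},
]
PREFS = {
    "encodings": {"gzip"},
    "charsets": {"iso-8859-1"},
    "languages": {"en": 1.0, "fr": 0.5},
    "media": {"text/html": 1.0, "text/plain": 0.9},
    "max_length": None,
}
```

With these hypothetical preferences, select_best(ALTERNATES, PREFS) picks "TheProject.en.html". Returning None when every quality is zero mirrors the requirement that the user agent stop and let the user decide.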
When engaging in preemptive negotiation, the server must use the following algorithm, or any other algorithm that produces the same result, to construct the preemptive response message.

    1. Construct a request message on the best alternate resource by modifying the received request message on the negotiable resource in the following way. First, the Request-URI and the Host request header must be rewritten to point to the best alternate resource. Then, if there are any Alt-Header request headers that match the best alternate resource URI, the headers given in these matching Alt-Header request headers may be added to the headers in the request message. Finally, the Alt-Header request headers in the request message may be removed.

    2. Generate a valid HTTP response message for the request message constructed in step 1. If the server is a proxy, this may involve sending the constructed request to the origin server.

    3. Add two headers to the HTTP response message generated in step 2. These are an Alternates header describing the alternates bound to the negotiable resource, and a Location header that gives the URI of the best alternate resource.

A preemptive response message satisfies the origin server restriction if and only if the full URI of the best alternate resource can be obtained by adding a sequence of characters excluding "/" to the end of the full URI of the negotiable resource, where the first character added may not be a US-ASCII uppercase or lowercase letter.

Origin servers should not generate a preemptive response message that violates the origin server restriction. If a client receives a preemptive response message that violates the origin server restriction directly from an origin server, then that client must reject the message as a probable spoofing attempt. If the client is a proxy, it must not pass on the response; it can pass on a 502 (bad gateway) response instead.
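The origin server restriction lends itself to a short check, sketched here in Python. The function name is hypothetical, both arguments are assumed to be full (absolute) URIs compared as plain strings, and rejecting an empty suffix is an interpretation based on the rule that an alternate must never be a negotiable resource itself.

```python
import string


def satisfies_origin_server_restriction(negotiable_uri, alternate_uri):
    """Check whether alternate_uri is negotiable_uri plus an allowed suffix."""
    if not alternate_uri.startswith(negotiable_uri):
        return False
    suffix = alternate_uri[len(negotiable_uri):]
    if not suffix:
        return False  # assumption: the alternate must differ from the resource
    if "/" in suffix:
        return False  # added characters must exclude "/"
    if suffix[0] in string.ascii_letters:
        return False  # first added character may not be a US-ASCII letter
    return True
```

For the negotiable resource http://www.w3.org/pub/WWW/TheProject, the alternate http://www.w3.org/pub/WWW/TheProject.en.html passes, because the added characters ".en.html" contain no "/" and start with ".".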
Servers acting as proxies may generate preemptive responses that do violate the origin server restriction, and clients should not reject these responses.

Clients, including caching proxies, may treat the HTTP response that can be derived from a reactive negotiation response by deleting the Alternates and Location headers as being controlled by the author of the best alternate resource, not the author of the negotiable resource on which the actual request was made. It is the responsibility of the server to ensure that the best alternate resource author indeed has this control. Section 6.1 discusses the implications of this rule on server design and administration.

User agents can transmit information about their capabilities and preferences for a retrieval action using the various accept request headers. If the accept headers present in a request on a negotiable resource contain enough information, a server may be able to generate a preemptive negotiation response.

As most resources will be un-negotiable, user agents are encouraged to send empty or small accept headers, or even omit some accept headers entirely, by default. If a user agent knows or discovers that an origin server provides negotiated resources, it is encouraged to use data from the negotiated responses received so far to dynamically add or extend accept headers sent in future requests on resources provided by that origin server, in order to increase the probability that preemptive negotiation can be used instead of the slower reactive negotiation.

Servers that want to support preemptive negotiation must use the preemptive alternate selection algorithm below. This algorithm can be applied to determine

o whether a preemptive negotiation response may be sent, and if so, which alternate is the best alternate

o the appropriate response code, either 300 (Multiple Choices) or 406 (None Acceptable), when a reactive response is sent.
The algorithm uses the alternate descriptions for each of the available alternates, as will be included in the Alternates header of the response, and the Accept headers of the request on the negotiable resource as input.

In the first step of the preemptive alternate selection algorithm, the overall quality for every alternate bound to the negotiable resource is computed. The overall quality is a real number Q in the range 0 through 1, where 0 is the minimum and 1 the maximum value, defined as

   Q = qs * qe * qc * ql * q * qml

The overall quality values computed in the preemptive algorithm are not necessarily equal to the overall quality values computed in the reactive algorithm of Section 5.1.

The values qs, qe, qc, ql, q, and qml for a particular alternate are all determined using the alternate description of the particular alternate and the Accept headers of the request.

qs The source quality factor for the alternate is given by the source-quality attribute in the alternate description.

qe The encoding quality factor is 1 if there is no encoding attribute in the alternate description. If there is an encoding attribute in the alternate description, the encoding quality factor is 1 if no Accept-Encoding header is present in the request, 1 if an Accept-Encoding header present indicates the ability to decode the given content encoding, and 0 otherwise.

qc The charset quality factor is 1 if there is no type attribute in the alternate description, or if the media type given in the type attribute of the alternate description does not have a charset parameter. If there is a charset parameter, then the charset quality factor is 1 if the charset is US-ASCII, 1 if no Accept-Charset header is present in the request, 1 if an Accept-Charset header present indicates the ability to handle the given character set, and 0 otherwise.
ql The language quality factor is 1 if there is no language attribute in the alternate description. If there is a language attribute, then the language quality factor is the highest quality factor assigned by the Accept-Language header in the request to any one of the languages listed in the attribute, 0 if none of the listed languages are assigned a quality factor by the Accept-Language header in the request, and 1 if there is no Accept-Language header in the request.

q The media type quality factor is 1 if there is no type attribute in the alternate description. If there is a type attribute, then the media type quality factor is the quality factor assigned to the given media type by the Accept headers in the request, 0 if the Accept headers do not assign a quality factor to the media type, and 1 if there are no Accept headers in the request.

qml The maximum length quality factor is 1 if there is no length attribute or no type attribute in the alternate description. If there is a length and a type attribute in the alternate description, then the maximum length quality factor is 0 if the "mxb" value assigned to the given media type by the Accept headers in the request is less than the value given in the length attribute, 1 if the "mxb" value is greater than or equal to it, 1 if the Accept headers do not assign an "mxb" value to the media type, and 1 if there are no Accept headers in the request.

In the second step of the algorithm, the overall qualities of all alternates are compared to select the best one. If there is one alternate with the highest overall quality value, then this is the best alternate. If there are multiple alternates that share the highest overall quality value, then the alternate that is listed first in the Alternates header is the best alternate.
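The first two steps of the preemptive alternate selection algorithm described above can be sketched in Python as follows. This is an illustrative sketch under simplifying assumptions, not a normative definition: the Alternate and Prefs classes and both function names are hypothetical, the request headers are assumed to be already parsed into lowercase tokens, a value of None stands for an absent attribute or header, and wildcard media ranges and language ranges are not modeled (only exact matches are considered).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Alternate:
    """Hypothetical model of one alternate description."""
    uri: str
    source_quality: float                  # qs, the source-quality attribute
    media_type: Optional[str] = None       # type attribute, without parameters
    charset: Optional[str] = None          # charset parameter of the type
    languages: Optional[List[str]] = None  # language attribute
    encoding: Optional[str] = None         # encoding attribute
    length: Optional[int] = None           # length attribute


@dataclass
class Prefs:
    """Hypothetical parsed accept headers; None means the header is absent."""
    accept: Optional[Dict[str, Tuple[float, Optional[int]]]] = None  # type -> (q, mxb)
    accept_language: Optional[Dict[str, float]] = None
    accept_charset: Optional[Set[str]] = None
    accept_encoding: Optional[Set[str]] = None


def overall_quality(a: Alternate, p: Prefs) -> float:
    """First step: Q = qs * qe * qc * ql * q * qml for one alternate."""
    qs = a.source_quality
    qe = 1.0 if (a.encoding is None or p.accept_encoding is None
                 or a.encoding in p.accept_encoding) else 0.0
    qc = 1.0 if (a.media_type is None or a.charset is None
                 or a.charset == "us-ascii" or p.accept_charset is None
                 or a.charset in p.accept_charset) else 0.0
    if a.languages is None or p.accept_language is None:
        ql = 1.0
    else:
        # Highest quality factor among the listed languages, 0 if none match.
        ql = max((p.accept_language[lang] for lang in a.languages
                  if lang in p.accept_language), default=0.0)
    q, qml = 1.0, 1.0
    if a.media_type is not None and p.accept is not None:
        if a.media_type in p.accept:
            q, mxb = p.accept[a.media_type]
            if a.length is not None and mxb is not None and mxb < a.length:
                qml = 0.0
        else:
            q = 0.0  # no quality factor assigned to this media type
    return qs * qe * qc * ql * q * qml


def best_alternate(alternates: List[Alternate],
                   p: Prefs) -> Tuple[Optional[Alternate], float]:
    """Second step: highest Q wins; the first listed alternate wins ties."""
    best, best_q = None, -1.0
    for a in alternates:  # order of the Alternates header
        quality = overall_quality(a, p)
        if quality > best_q:  # strict comparison keeps the earlier alternate
            best, best_q = a, quality
    return best, best_q


alternates = [
    Alternate("paper.en.html", 1.0, "text/html", languages=["en"], length=5000),
    Alternate("paper.fr.html", 0.9, "text/html", languages=["fr"]),
    Alternate("paper.ps", 1.0, "application/postscript", languages=["en"],
              length=400000),
]
prefs = Prefs(
    accept={"text/html": (1.0, None), "application/postscript": (0.8, 100000)},
    accept_language={"fr": 1.0, "en": 0.5},
)
best, quality = best_alternate(alternates, prefs)
```

With the example preferences, the French HTML alternate is selected, and the PostScript alternate receives an overall quality of zero because its length exceeds the "mxb" value. With an empty Prefs object (no accept headers at all), the first and third alternates tie and the first listed one is chosen.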
If all alternates have an overall quality value of zero, then any reactive negotiation response sent must use the 406 (None Acceptable) response code. Otherwise, any reactive negotiation response sent should use the 300 (Multiple Choices) response code.

In the third step of the preemptive negotiation alternate selection algorithm, it is determined whether a preemptive negotiation response may be sent to return the best alternate found.

If the best alternate has an overall quality value of zero, then the server must not generate a preemptive response; it should generate a reactive response with the 406 (None Acceptable) response code.

If the best alternate has an overall quality factor greater than zero, and no Accept header in the request contains a reactive-on-wildcard directive, then the server may generate a preemptive response, provided that the origin server restriction, if applicable, is met.

If the best alternate has an overall quality factor greater than zero, and an Accept header in the request contains a reactive-on-wildcard directive, then the server may generate a preemptive response, provided that the origin server restriction, if applicable, is met, if

o the type quality factor (q) of the best alternate was not derived from a match to a media range containing an asterisk "*" wildcard character in an Accept header, and

o the language quality factor (ql) of the best alternate was not derived from a match to a "*" language-range in the Accept-Language header.

In all other cases, the server must generate a reactive response.

5.3 Caching issues

HTTP/1.1 does not provide a mechanism for conditional GET requests on negotiable resources, but does provide a mechanism, the Alt-Header request header, for conditional GET requests on alternate resources.
When generating a 300 (Multiple Choices) response, a 406 (None Acceptable) response, or the Alternates headers for a preemptive response, a cache may re-use an Alternates header received earlier from the negotiable resource, as long as the restrictions expressed by any cache-control directive in the Alternates header are met. If the presence of an entity is required in a 300 or 406 response, caches may generate that entity on behalf of the origin server.

When relaying a preemptive response, a cache may infer the request and response messages of the HTTP transaction on the best alternate resource performed by the server that generated the preemptive response, and may update its internal data structures to reflect the occurrence of this HTTP transaction. Caches are encouraged to perform such updates because they increase efficiency and prevent strange (but otherwise allowed) effects if the contents of an alternate resource are changed at the origin server while there is still a non-expired version of these contents in cache.

6 Security and Privacy considerations

6.1 Spoofing using Location headers

Clients, including caching proxies, may treat the HTTP response that can be derived from a reactive negotiation response by deleting the Alternates and Location headers as being controlled by the author of the best alternate resource, not the author of the negotiable resource on which the actual request was made.

It is the responsibility of the server to ensure that the best alternate resource author indeed has this control, because if this control is lost, control over the responses generated by direct requests on the best alternate resource is also lost. Origin servers are helped in carrying this responsibility by the rule that clients must reject preemptive responses that do not satisfy the origin server restriction.

This paragraph discusses the implications of the above on server design and administration.
First, it is intended that any negotiable resource authoring mechanism built into the server, and accessible to authors of static content and CGI scripts, generates preemptive responses by internally doing a request on the best variant resource, and adding the required Alternates and Location headers to the generated response. Second, it is intended that, if the CGI interface has a feature that allows script authors to generate a preemptive response directly, then a) two distrusting parties will never be able to author CGI scripts in a shared directory, or b) use of this feature is only enabled for a CGI script if the script author is trusted by all other authors that use the same directory, or c) the server filters the Location headers generated by the CGI script to prevent spoofing that is not prevented by clients applying the origin server restriction.

6.2 User tracking based on accept headers

If users fine-tune quality factors put into the default user agent accept headers to the third decimal, these accept headers can be used as relatively long-lived user identifiers, enabling content providers (even if they do not provide negotiable resources) to tell apart different users behind a proxy. This identification allows content providers to do click-trail tracking, and allows collaborating content providers to match cross-server click-trails or form submissions of individual users.

Thus, privacy reasons demand that user agents be conservative both in the amount of quality factor fine-tuning they allow users to do without giving a warning about privacy, and in the sending of long accept headers by default in a request. (See also the remarks on sending short accept headers for performance reasons in Section 5.2.)

6.3 Accept headers revealing information of private nature without real need.
Preferences sent in accept headers, in particular language quality factors sent in Accept-Language headers, may reveal information that the user would rather keep private unless it will directly improve the quality of the service.

The content negotiation mechanism allows users to leave some languages (e.g. languages the knowledge of which strongly correlates with membership of a particular ethnic group) out of the Accept-Language header without decreasing the quality of the negotiation process if the request happens to be on a negotiable resource. Note however that the speed of the negotiation process may be affected.

No matter how much information is left out of the Accept headers, automatic reactive negotiation by a user agent on a negotiable resource will inevitably reveal some of the user preferences by the generation of a request on the best alternate resource as partly determined by the user preferences. Malicious service authors could provide `fake' negotiable resources, which do not even bind to alternate resources that are in fact different, and whose only purpose is to get information about (ethnicity correlated) languages understood by the visiting users. Such plots would however be visible to alert victims, as user agents will allow the user to review a list of all alternates bound to the negotiable resource.

Maintainers of firewall proxies may want to process outgoing accept headers to enhance privacy beyond the level provided by the user agents behind the firewall.

7 Acknowledgments

This document builds on the content negotiation descriptions in [1], and directly incorporates text from [1] in some places.

Many members of the HTTP working group have contributed to discussions that are reflected in this document.

8 References

[1] Roy T. Fielding, Henrik Frystyk Nielsen, and Tim Berners-Lee. Hypertext Transfer Protocol -- HTTP/1.1. Internet-Draft draft-ietf-http-v11-spec-01.txt, HTTP Working Group, January 1996.

[2] H. Alvestrand. "Tags for the identification of languages." RFC 1766, UNINETT, March 1995.

9 Author's address

Koen Holtman
Technische Universiteit Eindhoven
Postbus 513
Kamer HG 6.57
5600 MB Eindhoven (Holland)

e-mail: koen@win.tue.nl

Expires: 27 August 1996