HTTP Working Group                                   J. C. Mogul, DECWRL
Internet-Draft                                            6 January 1997
Expires: 15 July 1997

Forcing HTTP/1.1 proxies to revalidate responses

draft-mogul-http-revalidate-00.txt

STATUS OF THIS MEMO

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

To learn the current status of any Internet-Draft, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

Distribution of this document is unlimited. Please send comments to the HTTP working group at . Discussions of the working group are archived at . General discussions about HTTP and the applications which use HTTP should take place on the mailing list.

ABSTRACT

The HTTP/1.1 specification [1] currently defines a ``proxy-revalidate'' Cache-control directive, which forces a proxy to revalidate a stale response before using it in a reply. There is no mechanism defined that forces a proxy, but not an end-client, to revalidate a fresh response. The lack of such a mechanism is due to an error in drafting RFC2068, and appears to create problems for use of the Authorization header, the Digest Access Authentication extension [2], the State Management Mechanism [3], and several other proposed extensions. This document discusses the problem and several possible solutions, and proposes to add a new ``proxy-maxage'' directive as the best available solution.
Mogul [Page 1]
Internet-Draft HTTP proxy revalidation 6 January 1997 18:22

TABLE OF CONTENTS

   1 Introduction
   2 Problems with proxy-revalidate
   3 Possible alternatives
       3.1 Alternatives not requiring changes to RFC2068
       3.2 Alternatives that require changes to RFC2068
   4 Proposed solution
   5 Security Considerations
   6 Acknowledgements
   7 References
   8 Author's address

1 Introduction

HTTP/1.1 introduces a ``Cache-control'' header to allow origin servers and clients to impose fine-grained control over the operation of HTTP caches.

One important aspect of HTTP caching is whether a cache should ``revalidate'' a cached response with the origin server, before using the response as a cache hit. In cases where the use of an invalid cache entry could lead to serious error, such as the violation of an authentication policy, or incorrect behavior of an online shopping application, proper revalidation could be crucial. On the other hand, caching can yield significant performance benefits, and so we want to make caching as effective as possible.

      Note: HTTP caches normally revalidate a cached response by sending a conditional GET to the origin server. This may be done using the ``If-none-match'' request header or the ``If-modified-since'' request header. If the server would return the same response as the cached response, the server may reply with a status code of 304 (Not Modified). While this does involve a message exchange, by avoiding the transmission of the entity body, a revalidation is often much cheaper than an unconditional retrieval.

Regarding the terms ``fresh'' and ``stale'': a cached response is considered to be fresh if its current age is less than its maximum allowed age. A cached response is stale otherwise. Normally, only stale responses need to be revalidated; a fresh response is inherently usable without revalidation, until it reaches its maximum age.
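
The freshness rule and the conditional GET described above can be sketched as follows. This is only an illustration, not text from RFC2068: the function names (is_fresh, plan_reuse) and the action strings are hypothetical, and the computation of the current age is assumed to happen elsewhere.

```python
from typing import Dict, Optional, Tuple


def is_fresh(current_age: int, max_age: int) -> bool:
    """A response is fresh while its current age is less than its maximum age."""
    return current_age < max_age


def plan_reuse(current_age: int, max_age: int,
               etag: Optional[str] = None,
               last_modified: Optional[str] = None) -> Tuple[str, Dict[str, str]]:
    """Return the action for a cache hit plus any conditional request headers.

    Hypothetical helper: a fresh entry is used as-is; a stale entry is
    revalidated with whatever validators the cache holds.
    """
    if is_fresh(current_age, max_age):
        return "use-cached", {}
    headers: Dict[str, str] = {}
    if etag is not None:
        headers["If-None-Match"] = etag
    if last_modified is not None:
        headers["If-Modified-Since"] = last_modified
    # With no validators at all, the request degenerates to a plain GET.
    return "revalidate", headers
```

For instance, plan_reuse(12, 10, etag='"v1"') asks for a revalidation carrying an If-None-Match header, and a 304 (Not Modified) reply would then let the cache reuse the stored entity body.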

During the design of HTTP/1.1, it was realized that different revalidation policies might be applied to end-client caches (e.g., in browsers) and to intermediate proxy caches. For example, a proxy cache might be shared between multiple users (raising security considerations), or it might be operated by someone whose interests in reducing transmission costs do not coincide with the interest of the ultimate client or origin server in preserving certain kinds of application semantics.

It was also realized that in some cases it might be appropriate to configure a cache to be ``loose'' in its behavior for stale responses. That is, in such a situation, the cache might return a stale response without revalidating it. This might be done, for example, if the network connection between the cache and the origin server is not working, or if the cost or delay for revalidation is prohibitively high.

There is an obvious potential contradiction between the occasional requirement for strict revalidation of certain responses, and the occasional desire to allow loose operation of some HTTP caches. HTTP/1.1 resolves this by allowing (although not encouraging) loose operation as the default, but by providing a protocol mechanism for origin servers or end-clients to insist on mandatory strict operation when necessary.

This is done using the ``Cache-control'' header, which can carry a number of cache-control directives. In particular, these directives are defined in section 14.9 of RFC2068:

   - max-age=NNN
        Sets the maximum age for this response to NNN seconds. By itself, does not force strict revalidation behavior.

   - no-cache
        Prevents any caching of this response.

   - private
        Prevents any caching by a shared cache.

   - must-revalidate
        Requires that an HTTP/1.1 cache revalidate the response before using it, if the response is stale.

   - proxy-revalidate
        Requires that an HTTP/1.1 proxy cache revalidate the response before using it, if the response is stale; does not affect an end-client cache.

2 Problems with proxy-revalidate

The fundamental problem with proxy-revalidate, as defined in RFC2068, is that it does not require a proxy cache to revalidate a fresh response before using it. However, there are several circumstances in which it is desirable or necessary to force a proxy cache to revalidate a response that, to an end-client cache, would appear to be fresh.

In section 14.8 of RFC2068, defining the ``Authorization'' header, this language appears:

      1. If the response includes the "proxy-revalidate" Cache-Control directive, the cache MAY use that response in replying to a subsequent request, but a proxy cache MUST first revalidate it with the origin server, using the request-headers from the new request to allow the origin server to authenticate the new request.

While this could be read as modifying the definition of proxy-revalidate from section 14.9.4, it was in fact not intended as a modification. Rather, the author of these two sections of RFC2068 (me!) failed to notice the conflicting intentions of these two uses of proxy-revalidate.

In section 2.1.2 of RFC2069 [2] (the specification of the Digest Access Authentication extension to HTTP/1.1), this language appears:

      Implementors should be aware of how authenticated transactions interact with proxy caches. The HTTP/1.1 protocol specifies that when a shared cache (see section 13.10 of [2]) has received a request containing an Authorization header and a response from relaying that request, it MUST NOT return that response as a reply to any other request, unless one of two Cache-control (see section 14.9 of [2]) directives was present in the response.

      If the original response included the ``must-revalidate'' Cache-control directive, the cache MAY use the entity of that response in replying to a subsequent request, but MUST first revalidate it with the origin server, using the request headers from the new request to allow the origin server to authenticate the new request. Alternatively, if the original response included the ``public'' Cache-control directive, the response entity MAY be returned in reply to any subsequent request.

This discussion appears to be in error, since its implication that ``must-revalidate'' always MUST cause a revalidation does not cover the case of apparently fresh responses. In fact, discussion with one of the authors of RFC2069 has confirmed that he understood that RFC2068 had provided a ``proxy must revalidate even if fresh'' directive, which it does not.

In section 4.2.3 of RFCXXXX [3] (the specification of the State Management Mechanism for HTTP/1.1), this language appears:

      The origin server should send [one of] the following additional HTTP/1.1 response headers, depending on circumstances:

      * To suppress caching of a private document in shared caches: Cache-control: private.

      * To allow caching of a document and require that it be validated before returning it to the client: Cache-control: must-revalidate.

      * To allow caching of a document, but to require that proxy caches (not user agent caches) validate it before returning it to the client: Cache-control: proxy-revalidate.

      * To allow caching of a document and request that it be validated before returning it to the client (by ``pre-expiring'' it): Cache-control: max-age=0. Not all caches will revalidate the document in every case.

Here again there seems to be a (false) assumption that must-revalidate and proxy-revalidate cause revalidation even of fresh responses.
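
The gap between the assumed and the actual semantics can be made concrete with a small sketch. It follows the summary of the RFC2068 directives given in the Introduction; the function name, its parameters, and the ``loose'' flag are hypothetical, chosen only for illustration.

```python
from typing import Iterable


def revalidation_required(directives: Iterable[str], is_proxy: bool,
                          is_fresh: bool, loose: bool = False) -> bool:
    """Decide whether a cache must revalidate before reusing a response.

    Simplified reading of RFC2068 as summarized in this document:
    must-revalidate and proxy-revalidate only constrain STALE responses.
    """
    present = {d.strip().lower() for d in directives}
    if is_fresh:
        # Neither directive says anything about fresh responses.
        return False
    if "must-revalidate" in present:
        return True
    if is_proxy and "proxy-revalidate" in present:
        return True
    # Without a strictness directive, a loosely configured cache may
    # return the stale response unvalidated.
    return not loose
```

Under this reading, revalidation_required(["proxy-revalidate"], is_proxy=True, is_fresh=True) is False, which is exactly the case that the quoted passages appear to overlook.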

Finally, the proposed Hit-Metering extension to HTTP/1.1 [4] depends on a mechanism whereby an origin server can require proxy caches to revalidate a response before every use, without requiring end-client caches to do the same thing (which would be prohibitively inefficient).

In summary, there is a clear need for a Cache-control mechanism that allows an origin server to specify that a proxy must always revalidate a response, while allowing end-clients to cache it without revalidation (perhaps for a limited period).

3 Possible alternatives

3.1 Alternatives not requiring changes to RFC2068

Assuming that we do want a mechanism that allows an origin server to specify that a proxy must always revalidate a response, while allowing end-clients to cache it without revalidation, we could certainly do this by modifying the HTTP/1.1 specification proposed in RFC2068. Would it be possible to do this without modifying RFC2068, possibly by using a combination of existing Cache-control directives to approximate the desired behavior?

One solution would simply be to use ``Cache-control: private''. This would preserve any necessary semantics (because it would prevent any and all proxy caching of the response). However, it is much less efficient; because ``private'' prevents a shared cache from even storing the response, it cannot do a conditional request for subsequent references. Hence, this approach would lead to much unnecessary transmission of entity bodies when we could be using 304 (Not Modified) responses.

Another approach would be to use ``Cache-control: proxy-revalidate, max-age=0''. This allows proxies to store the response and forces them to revalidate it on every reference. However, it also implies that strict end-user caches should revalidate on every reference as well, which could cause even more unnecessary traffic than ``Cache-control: private'' would.
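
The trade-off between these two workarounds can be sketched by classifying how a strict cache would serve a repeat reference. This is an illustration under stated assumptions, not an implementation of RFC2068: the helper names and the three outcome labels are hypothetical, a ``max-age=10'' is added to the ``private'' case so that the end-client's freshness lifetime is defined, and a response with no max-age is simply treated as stale.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[int]]:
    """Split a Cache-control value into {directive: delta-seconds or None}."""
    directives: Dict[str, Optional[int]] = {}
    for item in value.split(","):
        name, _, arg = item.strip().partition("=")
        if name:
            directives[name.lower()] = int(arg) if arg.isdigit() else None
    return directives


def repeat_reference(cache_control: str, is_proxy: bool, age: int) -> str:
    """Classify how a strict cache serves a second reference to a response."""
    d = parse_cache_control(cache_control)
    if is_proxy and "private" in d:
        return "full-fetch"       # a shared cache may not store the response
    max_age = d.get("max-age")
    if max_age is not None and age < max_age:
        return "local-hit"        # fresh: usable without revalidation
    return "conditional-get"      # stale: revalidate, hoping for a 304


for header in ("private, max-age=10", "proxy-revalidate, max-age=0"):
    for is_proxy in (True, False):
        role = "proxy" if is_proxy else "end-client"
        print(header, "|", role, "|", repeat_reference(header, is_proxy, age=3))
```

The behavior sought in this document is a conditional GET at the proxy combined with a local hit at the end-client; in this sketch, neither header value yields that pair.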

In short, there does not appear to be a way to use the existing RFC2068 mechanisms to preserve both the necessary semantics and optimal cache performance.

3.2 Alternatives that require changes to RFC2068

Several proposals have been made for modifications to RFC2068 to resolve this problem.

We could redefine proxy-revalidate to mean ``always revalidate, even if the response is fresh.'' However, this would leave us either with no way to allow strict caches to use a response while it is fresh, or with no way to force loose caches to revalidate certain responses.

The other proposals all involve adding one new Cache-control directive, while preserving the current meaning of the existing proxy-revalidate directive:

   - proxy-mustcheck
        This would mean that a proxy, but not an end-client, would have to revalidate the response even if it is fresh.

   - proxy-maxage=NNN
        This would mean defining separate maximum ages for proxy caches and for end-client caches. The existing max-age (or Expires) value would continue to apply to end-client caches, and would continue to apply to proxy caches if the proxy-maxage directive were not present. However, if proxy-maxage is present, then it would override the max-age (or Expires) limit for proxies, but would be ignored by end-clients.

   - agent-maxage=NNN
        This would also mean defining separate maximum ages for proxy caches and for end-client caches. The existing max-age (or Expires) value would continue to apply to proxy caches, and would continue to apply to end-client caches if the agent-maxage directive were not present. However, if agent-maxage is present, then it would override the max-age (or Expires) limit for end-clients, but would be ignored by proxy caches.

One option would be to make either ``proxy-maxage'' or ``agent-maxage'' always strict: that is, they would imply that a proxy or end-client, respectively, would be required to revalidate a stale response.
Alternatively, they could be combined with a ``must-revalidate'' directive to force strict behavior, but would otherwise allow loose behavior.

Each of these proposals would solve the existing problem, would be simple to specify, and would probably not require significant implementation complexity or overhead. However, we should probably choose just one of these options; what are the relative merits?

The proxy-mustcheck approach is clearly the simplest, but gives up the possibility of separate control over proxy and end-client expiration times. Since this orthogonality could potentially be useful, it seems preferable to adopt the proxy-maxage or agent-maxage proposals.

If one assumes that this change could be made to the HTTP/1.1 specification before the permanent deployment of any HTTP/1.1 proxies, there at first seems to be no obvious reason to prefer one to the other. That is, this header

      Cache-control: max-age=10,proxy-maxage=3

and this one

      Cache-control: max-age=3,agent-maxage=10

both express the same semantics in the same number of bytes. However, if we also adopt the rule that proxy-maxage implies the presence of proxy-revalidate, then in order to express the semantics of

      Cache-control: max-age=10,proxy-maxage=3

the origin server would have to send

      Cache-control: max-age=3,agent-maxage=10,proxy-revalidate

which is somewhat more expensive.

Also, if we do adopt the Hit-metering proposal [4], the proxy-maxage approach seems preferable, because it would allow the necessary header rewriting to be accomplished by simple addition of a directive, rather than more elaborate rewriting.
For example, if the origin server sends a hit-metered response with

      Cache-control: max-age=10

then it would be rewritten (at the appropriate proxy, if necessary; see [4] for details) as

      Cache-control: max-age=10,proxy-maxage=0

using the proxy-maxage alternative, but would have to be rewritten as

      Cache-control: max-age=0,agent-maxage=10

using the agent-maxage proposal.

On the other hand, if we cannot make the necessary change to the specification before the deployment of HTTP/1.1 proxies, then the agent-maxage proposal is somewhat safer in terms of semantics. That is, if HTTP/1.1 proxies are deployed that do not understand the proxy-maxage directive, the use of agent-maxage will not cause these proxies to avoid revalidating fresh responses. This is because such responses will presumably carry a ``max-age=0'' directive, and so will not appear to be fresh to these proxies.

Unfortunately, if we fail to change the specification before the permanent deployment of HTTP/1.1 end-clients, then we may face a performance problem with the use of agent-maxage: clients that do not understand this new directive might do many more revalidations than necessary, and so cause excessive network and server loading, as well as unnecessary delays.

Ultimately, therefore, it would be best if we made this specification change before any permanent deployment of HTTP/1.1 proxies or clients. If we do so, then it seems more efficient to use the proxy-maxage mechanism.

4 Proposed solution

The HTTP/1.1 specification in RFC2068 should be changed, in section 14.9 (Cache-Control), in the following ways:

   - The grammar for cache-response-directive should include a new alternative:

        | "proxy-maxage" "=" delta-seconds

     There is no need for a corresponding change to the grammar for cache-request-directive.

   - Section 14.9.3 should include, after the second paragraph (which starts with ``If a response includes ...''), this new paragraph:

        If a response includes a proxy-maxage directive, then for a proxy cache (but not for an end-client cache), the maximum age specified by this directive overrides the maximum age specified by either the max-age directive or the Expires header. The proxy-maxage directive also implies the semantics of the proxy-revalidate directive (see section 14.9.4), i.e., that the proxy MUST NOT use the entry after it becomes stale to respond to a subsequent request without first revalidating it with the origin server. The proxy-maxage directive is always ignored by an end-client.

   - In section 13.4, the list of response headers and directives implicitly allowing cachability should include ``proxy-maxage'' after ``max-age''.

   - In section 14.8 (Authorization), this paragraph:

        1. If the response includes the "proxy-revalidate" Cache-Control directive, the cache MAY use that response in replying to a subsequent request, but a proxy cache MUST first revalidate it with the origin server, using the request-headers from the new request to allow the origin server to authenticate the new request.

     should become

        1. If the response includes the "proxy-revalidate" Cache-Control directive, the cache MAY use that response in replying to a subsequent request, but if the response is stale, a proxy cache MUST first revalidate it with the origin server, using the request-headers from the new request to allow the origin server to authenticate the new request.

     This paragraph

        2. If the response includes the "must-revalidate" Cache-Control directive, the cache MAY use that response in replying to a subsequent request, but all caches MUST first revalidate it with the origin server, using the request-headers from the new request to allow the origin server to authenticate the new request.

     should become

        2. If the response includes the "must-revalidate" Cache-Control directive, the cache MAY use that response in replying to a subsequent request, but if the response is stale, all caches MUST first revalidate it with the origin server, using the request-headers from the new request to allow the origin server to authenticate the new request.

Additionally, RFC2069 [2] and RFCXXXX [3] should probably be modified to suggest the use of ``proxy-maxage=0'' and/or ``max-age=0, must-revalidate'' to force proxies to revalidate a response.

5 Security Considerations

The proposed Digest Access Authentication extension [2] depends upon a mechanism to force proxies to always revalidate certain responses. Whether or not the proposal in this document is adopted, the Digest Access Authentication extension requires modification to reflect the option chosen (unless the HTTP/1.1 specification is revised to make ``proxy-revalidate'' apply to fresh as well as to stale responses).

6 Acknowledgements

Several people contributed to my understanding of this issue, including Koen Holtman, Paul Leach, Ingrid Melve, and Anselm Baird-Smith. However, the proposal in this document is my fault alone.

7 References

   1. Roy T. Fielding, Jim Gettys, Jeffrey C. Mogul, Henrik Frystyk Nielsen, and Tim Berners-Lee. Hypertext Transfer Protocol -- HTTP/1.1. RFC 2068, HTTP Working Group, January, 1997.

   2. J. Franks, P. Hallam-Baker, J. Hostetler, P. Leach, A. Luotonen, E. Sink, L. Stewart. An Extension to HTTP: Digest Access Authentication. RFC 2069, HTTP Working Group, January, 1997.

   3. D. Kristol, L. Montulli. HTTP State Management Mechanism. RFC XXXX, HTTP Working Group, January, 1997. draft-ietf-http-state-mgmt-05.txt; approved by the IESG, not yet assigned an RFC number.

   4. J. Mogul and P. Leach. Simple Hit-Metering for HTTP. Internet Draft draft-mogul-http-hit-metering-01.txt, HTTP Working Group, December, 1996.
      This is a work in progress.

8 Author's address

   Jeffrey C. Mogul
   Western Research Laboratory
   Digital Equipment Corporation
   250 University Avenue
   Palo Alto, California, 94305, USA
   Email: mogul@wrl.dec.com