Network Working Group                         Jeffrey Mogul, Compaq WRL
Internet-Draft                                       Fred Douglis, AT&T
Expires: 25 February 2001                Daniel Hellerstein, ERS/USDA
                                                         24 August 2000


                   HTTP Delta Clusters and Templates

                    draft-mogul-http-dcluster-00.txt

STATUS OF THIS MEMO

   This document is an Internet-Draft and is in full
   conformance with all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet
   Engineering Task Force (IETF), its areas, and its working
   groups. Note that other groups may also distribute working
   documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of
   six months and may be updated, replaced, or obsoleted by
   other documents at any time. It is inappropriate to use
   Internet-Drafts as reference material or to cite them other
   than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be
   accessed at http://www.ietf.org/shadow.html.

   Distribution of this document is unlimited. Please send
   comments to the authors.


ABSTRACT

   HTTP "Delta encoding," the transmission of a compact
   encoding of the change between instances of a Web resource
   instead of retransmitting the entire new value, has been
   shown to be of potential value. Research has shown
   additional benefits if deltas can be computed between
   instances of different resources. This document describes
   a compatible extension to HTTP delta encoding to support
   "clustering", where multiple resources (URLs) are treated
   as a pool, and the use of "templates", where a large set of
   resource instances are most naturally described as deltas
   from a chosen template resource.

Mogul et al.                                                   [Page 1]

Internet-Draft           Delta clustering           24 August 2000 16:15

TABLE OF CONTENTS

   1 Introduction
      1.1 Related research and proposals
   2 Terminology
   3 Delta-encoding and clustering
   4 Use of templates
   5 Specification
      5.1 Modified basic requirements for delta-encoded responses
      5.2 Modified header specifications
         5.2.1 A-IM
      5.3 New header specifications
         5.3.1 DCluster
         5.3.2 DTemplate
      5.4 Rules for determining base instances in a uniqueness scope
   6 Security Considerations
      6.1 Spoofing attacks using the DCluster header
      6.2 Privacy attacks using the DCluster header
      6.3 Data leakage attacks using the DCluster header
   7 History
      7.1 draft-mogul-http-dcluster-00.txt
   8 Acknowledgements
   9 References
   10 Authors' addresses

1 Introduction

   WARNING: THIS SPECIFICATION WILL CHANGE. DO NOT DEPLOY
   ANY IMPLEMENTATIONS BASED ON THIS SPECIFICATION.

   The World Wide Web is a distributed system, and so often benefits
   from caching to reduce retrieval delays. Retrieval of a Web resource
   (such as a document, image, icon, or applet) over the Internet or
   other wide-area network usually takes long enough that the delay
   exceeds the human threshold of perception. Often, that delay is
   measured in seconds. Caching can often eliminate or significantly
   reduce retrieval delays.

   Many Web resources change over time, so a practical caching approach
   must include a coherency mechanism, to avoid presenting stale
   information to the user. Originally, the Hypertext Transfer Protocol
   (HTTP) provided little support for caching, but under operational
   pressures, it quickly evolved to support a simple mechanism for
   maintaining cache coherency.

   In HTTP/1.0 [2], the server may supply a ``Last-Modified'' timestamp
   with a response. If a client stores this response in a cache entry,
   and then later wishes to re-use the response, it may transmit a
   request message with an ``If-Modified-Since'' field containing that
   timestamp; this is known as a conditional retrieval. Upon receiving
   a conditional request, the server may either reply with a full
   response, or, if the resource has not changed, it may send an
   abbreviated reply, indicating that the client's cache entry is still
   valid. HTTP/1.0 also includes a means for the server to indicate,
   via an ``Expires'' timestamp, that a response will be valid until
   that time; if so, a client may use a cached copy of the response
   until that time, without first validating it using a conditional
   retrieval.

   HTTP/1.1 [6] adds many new features to improve cache coherency and
   performance. However, it preserves the all-or-none model for
   responses to conditional retrievals: either the server indicates that
   the resource value has not changed at all, or it must transmit the
   entire current value.

   Common sense suggests (and traces confirm), however, that even when a
   Web resource does change, the new instance is often substantially
   similar to the old one. If the difference, or ``delta'', between the
   two instances could be sent to the client instead of the entire new
   instance, a client holding a cached copy of the old instance could
   apply the delta to construct the new version. In a world of finite
   bandwidth, the reduction in response size and delay could be
   significant.

   One can think of deltas as a way to squeeze as much benefit as
   possible from client and proxy caches. Rather than treating an
   entire response as the ``cache line,'' with deltas we can treat
   arbitrary pieces of a cached response as the replaceable unit, and
   avoid transferring pieces that have not changed.

   A separate document [8] specifies a set of compatible extensions to
   HTTP/1.1 that allow clients and servers to use delta encoding with
   minimal overhead. That mechanism only supports deltas between
   instances of a single resource.

   This document specifies further extensions to the delta encoding
   mechanism. These extensions allow deltas to be computed between
   instances of different resources. This increases the likelihood that
   a compact delta might be found to encode the current instance of a
   requested resource.

   We assume that the reader is familiar with the HTTP/1.1
   specification, and with the delta encoding specification.

1.1 Related research and proposals

   The WebExpress project [7] appears to be the first published
   description of an implementation of delta encoding for HTTP (which
   they call ``differencing''). WebExpress is aimed specifically at
   wireless environments, and includes a number of orthogonal
   optimizations. Also, the WebExpress design does not propose changing
   the HTTP protocol itself, but rather uses a pair of interposed
   proxies to convert the HTTP message stream into an optimized form.
   The results reported for WebExpress differencing are impressive, but
   are limited to a few selected benchmarks.

   The WebExpress paper also pointed out that the individual responses
   to different queries with the same ``URL prefix'' (that is, the
   prefix of the URL before the ``?'' character) are often similar
   enough to make delta encoding effective. Since users frequently make
   numerous different queries using the same URL prefix, it might be
   much more effective to compute deltas between different queries for
   a given URL prefix, rather than only between repeated queries using
   an identical URL. Banga et al. [1] make a similar observation. A
   1997 trace-based study [9] showed that this approach has significant
   potential for reducing bandwidth requirements. The "clustering"
   mechanism described in this specification is intended to support the
   use of delta encoding in contexts where the delta is computed between
   two different URLs.

   The WebExpress project [7] adopted the concept of a designated ``base
   object'', rather than simply relying on a prefix-matching mechanism.
   WebExpress included a mechanism for ``rebasing'' a client (providing
   it with a new base object). The "templates" mechanism described in
   this specification supports a very similar approach.

   The approaches described above, and in this specification, operate
   independently of the syntax and semantics of the data being
   transferred (although delta encoding algorithms for images may
   require some specialization). They function by decomposing responses
   at the bit or byte level into currently-cached and
   need-to-be-transferred components. One can also do this
   decomposition at a higher level. Douglis et al. [5] describe an
   "HTML macro" mechanism, in which a set of similar HTML pages is
   decomposed into a constant component (akin to a macro body) and a
   variable component (akin to macro arguments). In many cases, the
   variable component can be quite small; this means that once the
   constant component is in a cache, references to similar pages
   require fetching only the small variable component, at a significant
   cost savings over transferring a monolithic response.

   The main drawback to the HTML macro approach is that it requires
   direct involvement by the designer (or software) when generating the
   Web pages, including some careful attention to the decomposition of a
   set of similar pages. It might also require some additional
   language-level standardization, although this perhaps could be
   obviated through the use of Java-based macros. Therefore, support
   for HTML macros is beyond the scope of this specification.

2 Terminology

   HTTP/1.1 [6] defines the following terms:

   resource
      A network data object or service that can be identified by a URI,
      as defined in section 3.2. Resources may be available in multiple
      representations (e.g. multiple languages, data formats, size,
      resolutions) or vary in other ways.

   entity
      The information transferred as the payload of a request or
      response. An entity consists of metainformation in the form of
      entity-header fields and content in the form of an entity-body, as
      described in section 7.

   variant
      A resource may have one, or more than one, representation(s)
      associated with it at any given instant. Each of these
      representations is termed a `variant.' Use of the term `variant'
      does not necessarily imply that the resource is subject to content
      negotiation.

   The specification for delta encoding [8] defined these additional
   terms:

   instance
      The entity that would be returned in a status-200 response to a
      GET request, at the current time, for the selected variant of the
      specified resource, with the application of zero or more
      content-codings, but without the application of any instance
      manipulations or transfer-codings.

   instance manipulation
      An operation on one or more instances which may result in an
      instance being conveyed from server to client in parts, or in more
      than one response message. For example, a range selection or a
      delta encoding. Instance manipulations are end-to-end, and often
      involve the use of a cache at the client.

   See that specification for further discussion of those terms.

   For the extensions specified in this document, we define one more
   term:

   uniqueness scope
      The uniqueness scope of an entity tag is the set of resources
      across which this entity tag is unique for all time. That is,
      within this set of resources, if two instances share an entity
      tag, then the values of these instances (including their instance
      bodies and their instance headers) are equal.

   In unmodified HTTP/1.1, the uniqueness scope of an entity tag is
   always a single resource. In this proposal, we provide a means to
   extend the uniqueness scope to include multiple resources.

3 Delta-encoding and clustering

   The basic delta-encoding model assumes that deltas are computed
   between two instances of a specific resource; i.e., both instances
   are associated with a single URL. However, the WebExpress project [7]
   suggested that by treating a query URL (that is, a URL with an
   embedded ``?'') as a prefix followed by a set of parameters, one
   could then profitably compute deltas between resource values whose
   URLs have identical prefixes, but perhaps different parameters
   (suffixes). Our trace-based study confirmed this [10]. We believe
   that this might be generalized to certain other patterns of URLs
   (i.e., not just those using ``?'' as a separator). We use the term
   ``clustering'' for this approach.

   For example, if a client has cached a response for a DEC stock quote
   (``http://quote.yahoo.com/q?s=DEC&d=f''), and then requests a quote
   for AT&T from the same server (``http://quote.yahoo.com/q?s=T&d=f''),
   the prefix for the cluster would be ``http://quote.yahoo.com/q?''.

   In order to support clustering, we need a mechanism for the server to
   indicate to the client which URLs are eligible for clustering (since
   it would be highly inefficient for the client to send the entity tags
   of every resource in its cache on every request).

   We propose a new, optional response header for this purpose, to
   specify a URL-prefix for other resources that ``cluster'' with the
   given response. The header name is ``DCluster''.

   Once a cluster-eligible response is cached, when the client is about
   to make a subsequent request, it would match the Request-URI against
   all of the URL-prefixes in its cache. (As specified in section
   5.3.1, only cache entries received in the same response as the
   matching DCluster header, or in later responses, are eligible.) The
   ``If-None-Match'' field in its request could then list the entity
   tags for all of the matching entries. In some cases, it might be
   more efficient to list only a subset (such as the most recently
   received cache entries), to avoid excessive request header lengths.

   For example, if a client makes this initial request:

      GET /foo?p=1 HTTP/1.1
      Host: bar.example.net

   and receives this response:

      HTTP/1.1 200 OK
      Date: Sun, 06 Nov 1994 08:49:37 GMT
      Etag: "abc"
      DCluster: "//bar.example.net/foo?"

   then when the client later makes a request for
   ``http://bar.example.net/foo?p=2'', it can match the stored cluster
   prefix in its cache, and generate this request:

      GET /foo?p=2 HTTP/1.1
      Host: bar.example.net
      If-None-Match: "abc"
      A-IM: vcdiff

   As a generalization, the DCluster header field may include multiple
   URL-prefixes, to allow specification of a set of URIs that do not
   share a single common prefix.
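
   The client-side matching described above can be sketched in Python.
   This is a minimal, hypothetical sketch and is not part of the
   specification. The names ClusterCache, Scope, and CacheEntry are
   invented for illustration. DCluster prefixes are assumed to have
   already been resolved to absolute form, so the example's
   "//bar.example.net/foo?" appears as "http://bar.example.net/foo?".
   An arrival counter stands in for the rule that a scope applies only
   to entity tags received in the same response or in later responses.

```python
from dataclasses import dataclass, field


@dataclass
class CacheEntry:
    url: str   # absolute Request-URI of the cached response
    etag: str  # strong entity tag as received, e.g. '"abc"'
    seq: int   # arrival order of the response that carried it


@dataclass
class Scope:
    prefix: str      # absolute DCluster prefix (assumed already resolved)
    origin_url: str  # Request-URI of the response that carried the prefix
    seq: int         # arrival order of that response

    def contains(self, url: str) -> bool:
        # The scope covers every URI matching the prefix, plus the
        # Request-URI of the response that carried it.
        return url.startswith(self.prefix) or url == self.origin_url


@dataclass
class ClusterCache:
    """Hypothetical client-side index of cached instances and scopes."""
    entries: list = field(default_factory=list)
    scopes: list = field(default_factory=list)
    next_seq: int = 0

    def store(self, url, etag, dcluster_prefixes=()):
        """Record one 200 response and any DCluster prefixes it carried."""
        self.next_seq += 1
        self.entries.append(CacheEntry(url, etag, self.next_seq))
        for p in dcluster_prefixes:
            self.scopes.append(Scope(p, url, self.next_seq))

    def base_etags(self, request_url):
        """Entity tags that may be listed in If-None-Match for request_url."""
        tags = []
        for e in self.entries:
            eligible = e.url == request_url or any(
                s.contains(request_url) and s.contains(e.url)
                # A scope applies only to tags received with it or later.
                and e.seq >= s.seq
                for s in self.scopes
            )
            if eligible and e.etag not in tags:
                tags.append(e.etag)
        return tags


cache = ClusterCache()
cache.store("http://bar.example.net/foo?p=1", '"abc"',
            ["http://bar.example.net/foo?"])
print("If-None-Match: " + ", ".join(
    cache.base_etags("http://bar.example.net/foo?p=2")))
```

   For the exchange shown above, this prints If-None-Match: "abc", the
   same header as in the second request. A real client would also have
   to bound the resulting list.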

   In order to use this approach to clustering, we need to impose one
   important constraint. HTTP/1.1 requires so-called ``strong'' entity
   tags to be unique for a given URI, but does not impose any broader
   requirements on the uniqueness of entity tags. However, if a server
   sends a ``DCluster'' header, this implies that the entity tag in the
   response is unique not only for the Request-URI, but also for all
   URIs for which the string given by ``DCluster'' is a prefix.

   We call this set of URIs the ``uniqueness scope'' of the entity tag.
   Note that a response might carry multiple ``DCluster'' header fields
   (or, by the basic HTTP syntax rules, one such header field with a
   comma-separated list of prefix strings). This means that the
   uniqueness scope is the union of the scopes specified by the set of
   prefixes, plus the original Request-URI. Because the URI in a
   ``DCluster'' header field can be an absolute URI (i.e., contain a
   host name), a uniqueness scope can span multiple servers.
   Presumably, these servers have some out-of-band means to maintain the
   uniqueness property.

   A client making a request may have cache entries for many different
   resources in the uniqueness scope of the Request-URI. This is
   another situation where the ability of ``If-None-Match'' to carry
   multiple entity tags is employed. Abstractly, when the client makes
   a request for which it wants a delta-encoded response, it finds all
   of its cache entries in the same uniqueness scope, then sends the
   entity tags for these cache entries in an ``If-None-Match'' header.

   It would not make sense to have an extremely broad uniqueness scope
   (i.e., one that includes large numbers of resources), because this
   would imply that a client that has cache entries for many of those
   resources would send many entity tags in its request for a delta.
   This would bloat the request message, offsetting the transfer-time
   reduction of the delta encoding. Therefore, in actual use, the
   ``DCluster'' header field value should represent not the entire
   uniqueness scope, but a subset of the uniqueness scope that is most
   likely to result in small deltas.

   Client implementations, however, should be prepared to prune their
   ``If-None-Match'' headers in case a server inadvertently (or
   maliciously) specifies an over-broad uniqueness scope.
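
   One way a client might prune its If-None-Match list is sketched
   below. It keeps the most recently received entries first, as this
   section suggests. The function name and the max_tags and max_bytes
   limits are illustrative assumptions. This specification does not
   define any limits.

```python
def prune_if_none_match(candidates, max_tags=8, max_bytes=256):
    """Build an If-None-Match value from (arrival_seq, etag) pairs.

    Newest entries are kept first; max_tags and max_bytes are
    illustrative limits, not values taken from this specification.
    """
    chosen = []
    length = 0
    for _, etag in sorted(candidates, key=lambda c: c[0], reverse=True):
        extra = len(etag) + (2 if chosen else 0)  # 2 bytes for ", "
        if len(chosen) >= max_tags or length + extra > max_bytes:
            break
        chosen.append(etag)
        length += extra
    return ", ".join(chosen)


candidates = [(1, '"a1"'), (5, '"e5"'), (3, '"c3"'), (4, '"d4"')]
print("If-None-Match: " + prune_if_none_match(candidates, max_tags=2))
```

   With max_tags=2, this prints If-None-Match: "e5", "d4". Stopping at
   the first tag that does not fit keeps the list strictly newest-first.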

   Server implementations that support clustering should minimize the
   length of the entity tags that they generate, consistent with the
   other requirements for entity tags, since the effect of overlong
   entity tags on request-header size is potentially multiplied many
   times by the use of clustering.

   Note that the ``DCluster'' header can be used in a potential spoofing
   attack. This attack, and defenses against it, are discussed in
   section 6.1.


4 Use of templates

   The model of delta encoding outlined so far requires the server to
   compute a delta between the current instance of the resource and some
   previous instance of that resource, or (if clustering is used) a
   previous instance of some other resource. This means that the base
   instance is, in effect, a moving target, since we do not want to
   require servers or clients to retain old instances for indefinite
   periods.

   Douglis et al. describe an approach to dynamically-generated
   documents in which the document is broken down into separate static
   and dynamic parts [5]. The static part is a macro with unbound
   variables, and the dynamic part is a set of bindings between
   variables and specific values. In their mechanism, the client
   retains the static part, called a ``template'', in its cache. It
   repeatedly requests, as needed, a new instance of the dynamic part,
   and then reevaluates the template macro, with its variables bound as
   specified in the dynamic part, in order to generate the current
   instance of the entire document. Their macro language is an
   extension to HTML, although other languages (such as Java) might be
   just as suitable.

   The WebExpress project [7] adopted the concept of a designated ``base
   object'', which is nearly identical to the template concept described
   here. WebExpress included a mechanism for ``rebasing'' a client
   (providing it with a new base object). The primary difference
   between the WebExpress approach and our approach is the time at which
   a client discovers the identity of a (possibly new) template.

   We can apply a similar template-based mechanism to substantially
   simplify the use of delta encoding. In this approach, the server
   ``computes'' the delta between the current instance of a resource
   and a separately-identified template resource. (Depending on the
   encoding format, it might be possible to generate the delta directly,
   rather than generating the current instance and then computing a
   delta.) The client then applies the delta to the template resource,
   rather than to a previous instance of the requested resource.
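
   The following Python sketch makes the template idea concrete. The
   server encodes the current instance as copy and add instructions
   against the template, and the client rebuilds the instance from its
   cached template. The standard library's difflib.SequenceMatcher is
   used here as a stand-in delta algorithm. This is not the vcdiff
   format named in the examples, and the instruction format and the
   sample HTML are hypothetical.

```python
import difflib


def make_delta(template: str, current: str):
    """Encode `current` as copy/add instructions relative to `template`."""
    ops = []
    matcher = difflib.SequenceMatcher(None, template, current, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))         # reuse template[i1:i2]
        elif j2 > j1:                            # "replace" or "insert"
            ops.append(("add", current[j1:j2]))  # literal new text
        # "delete" needs no instruction: that template text is not copied.
    return ops


def apply_delta(template: str, ops) -> str:
    """Rebuild the current instance from the cached template and a delta."""
    out = []
    for op in ops:
        if op[0] == "copy":
            out.append(template[op[1]:op[2]])
        else:
            out.append(op[1])
    return "".join(out)


template = "<html><body><h1>Quote</h1><p>PRICE</p></body></html>"
current = "<html><body><h1>Quote</h1><p>41.25</p></body></html>"
delta = make_delta(template, current)
assert apply_delta(template, delta) == current
print(delta)
```

   Only text that differs from the template travels as literal "add"
   data. Everything else is a reference into the template the client
   already holds, and no earlier instance of the resource is needed.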

   Since this approach avoids the need to retain old instances of the
   dynamic resource at either the client or the server, it greatly
   simplifies the implementation and optimization of base instance
   management at both client and server. However, it requires a new
   mechanism to inform the client of the appropriate template resource,
   and its success may depend on the proper construction of the
   template.

   To support template-based deltas, therefore, we define a new response
   header that the origin server uses as a ``hint'' to inform a client
   of the URI of the template resource. For example, if the client
   request is

      GET /foo.html HTTP/1.1
      Host: bar.example.net
      A-IM: vcdiff

   the server might send:

      HTTP/1.1 200 OK
      Date: Sun, 06 Nov 1994 08:49:37 GMT
      Etag: "abc"
      DTemplate: "http://bar.example.net/foo.tplt"

   The implication of the DTemplate header is that, on subsequent
   requests for http://bar.example.net/foo.html, the client should ask
   for a delta between http://bar.example.net/foo.tplt and the current
   instance. This means, of course, that the client would first have
   obtained and cached an instance of http://bar.example.net/foo.tplt.
   The client might retrieve the template either on demand (i.e., just
   before making the new request for foo.html), or during an otherwise
   idle moment, or not at all (since the use of deltas is fully
   optional).

   The DTemplate header implies that the specified URL is within the
   uniqueness scope of the Request-URI (or else it would not be
   meaningful to ask for a delta between the template and the
   Request-URI). For example, if the client requests the template:

      GET /foo.tplt HTTP/1.1
      Host: bar.example.net

   and receives the response:

      HTTP/1.1 200 OK
      Date: Sun, 06 Nov 1994 08:49:47 GMT
      Etag: "pqr"

   then the client can make a subsequent request for foo.html as:

      GET /foo.html HTTP/1.1
      Host: bar.example.net
      If-None-Match: "pqr"
      A-IM: vcdiff

   Alternatively, the DTemplate header field can be used to specify that
   a specific instance of a resource (rather than any available
   instance) be used as a template, by including an entity tag in the
   header field. For example:

      HTTP/1.1 200 OK
      Date: Sun, 06 Nov 1994 08:49:37 GMT
      Etag: "abc"
      DTemplate: "http://bar.example.net/foo.tplt"/etag="pqr"

   This form of the header further simplifies the instance-management
   problem, by eliminating any ambiguity about which instances are worth
   saving. It might, however, reduce the possibilities for delta
   encoding.
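
   A client has to parse DTemplate values in both of the forms shown
   above. The following Python sketch parses values that follow the
   grammar in section 5.3.2. It assumes a simple regular expression is
   adequate, so it does not handle escaped quotes inside entity tags.
   The helper name parse_dtemplate is hypothetical.

```python
import re
from urllib.parse import urlsplit

# One list element: "dt-uri", optionally followed by /etag=<entity-tag>.
_ELEMENT = re.compile(
    r'\s*"(?P<uri>[^"]*)"'                   # quoted dt-uri
    r'(?:/etag=(?P<etag>(?:W/)?"[^"]*"))?'   # optional dt-param
    r'\s*(?:,|$)'                            # list separator or end
)


def parse_dtemplate(value: str, request_uri: str):
    """Return (absolute template URI, pinned entity tag or None) pairs."""
    req = urlsplit(request_uri)
    result, pos = [], 0
    while pos < len(value):
        m = _ELEMENT.match(value, pos)
        if not m:
            raise ValueError(f"malformed DTemplate at offset {pos}")
        uri = m.group("uri")
        if uri.startswith("/"):  # abs_path: relative to the origin server
            uri = f"{req.scheme}://{req.netloc}{uri}"
        result.append((uri, m.group("etag")))
        pos = m.end()
    return result


print(parse_dtemplate('"http://bar.example.net/foo.tplt"/etag="pqr"',
                      "http://bar.example.net/foo.html"))
```

   A missing etag parameter is returned as None. In that case the client
   may use any cached instance of the template, and otherwise only the
   named one.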

   Finally, the DTemplate and DCluster headers can be combined. For
   example:

      HTTP/1.1 200 OK
      Date: Sun, 06 Nov 1994 08:49:37 GMT
      Etag: "abc"
      DTemplate: "http://bar.example.net/foo.tplt"
      DCluster: "//bar.example.net/foo?"

   This means that for any Request-URI matching the prefix specified in
   the DCluster header field, the URI specified in the DTemplate field
   is an appropriate template.

   Note that an origin server need not send a DTemplate header field on
   every response; doing so could waste network bandwidth, if the
   recipient is not delta-capable. Instead, the server should employ
   heuristics to decide whether to send this header field. For example,
   it might be worth sending it whenever the client's request message
   indicates its willingness to accept a delta-encoded response, and
   when the If-None-Match field in the request does not already specify
   the entity tag of the template resource.
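
   On the server side, the heuristic suggested above might look like
   the following sketch. It relies on several assumptions:

      - request headers arrive in a dict with lowercase names;
      - "vcdiff" is the only delta-coding the server supports;
      - q-values in A-IM are ignored;
      - If-None-Match is split naively on commas, so entity tags that
        contain commas would be mis-handled.

   The function name is hypothetical.

```python
def should_send_dtemplate(request_headers, template_etag,
                          delta_codings=("vcdiff",)):
    """Decide whether to attach a DTemplate hint to a 200 response."""
    a_im = request_headers.get("a-im", "")
    # Strip any ";q=..." parameter; q-values are ignored (simplification).
    offered = {c.split(";")[0].strip().lower()
               for c in a_im.split(",") if c.strip()}
    if not offered & {d.lower() for d in delta_codings}:
        return False  # client did not offer a delta-coding we support
    inm = request_headers.get("if-none-match", "")
    listed = {t.strip() for t in inm.split(",") if t.strip()}
    # Only hint if the client does not already hold the template instance.
    return template_etag not in listed


print(should_send_dtemplate({"a-im": "vcdiff"}, '"pqr"'))
```

   A client that sends If-None-Match: "pqr" with its delta request
   already holds the template, so repeating the hint would only waste
   bytes.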


5 Specification

   The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY"
   in this document are to be interpreted as described in RFC2119 [4].

5.1 Modified basic requirements for delta-encoded responses

   The basic requirements for delta-encoded responses, specified in [8],
   are modified for servers that support the DCluster and/or DTemplate
   header fields.

   A server MAY send a delta-encoded response if:

      1. The server would be able to send a 200 (OK) response for
         the request.

      2. The client's request includes an A-IM header field listing
         at least one delta-coding.

      3. The client's request includes an If-None-Match header
         field listing at least one valid entity tag for an
         instance (a "base instance") of at least one of:

         a. the Request-URI.

         b. a different URI within the uniqueness scope of the
            Request-URI.

         c. a URI specified in a DTemplate header field that was
            sent in a response for a URI within the uniqueness scope
            of the Request-URI.

   XXX Anything else?
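
   The three conditions above can be read as a server-side eligibility
   check, sketched below in Python. Everything in it is hypothetical
   and chosen for illustration:

      - the function name choose_base;
      - the instances store, which maps an entity tag to the URI of an
        instance the server can still compute a delta against;
      - representing the uniqueness scope as a list of absolute
        prefixes;
      - treating "vcdiff" as the only supported delta-coding.

```python
DELTA_CODINGS = {"vcdiff"}  # assumption: delta-codings this server supports


def choose_base(request_uri, can_send_200, a_im, inm_tags,
                instances, scope_prefixes, template_uris=()):
    """Return (etag, base URI) for a permissible delta response, or None.

    instances:      hypothetical server store mapping entity tag -> URI
                    of an instance the server can still diff against.
    scope_prefixes: absolute prefixes describing the Request-URI's
                    uniqueness scope.
    template_uris:  URIs named in DTemplate fields for this scope.
    """
    if not can_send_200:                      # condition 1
        return None
    if not (set(a_im) & DELTA_CODINGS):       # condition 2
        return None
    for tag in inm_tags:                      # condition 3
        uri = instances.get(tag)
        if uri is None:
            continue                          # unknown or discarded instance
        if (uri == request_uri                                     # 3a
                or any(uri.startswith(p) for p in scope_prefixes)  # 3b
                or uri in template_uris):                          # 3c
            return tag, uri
    return None


instances = {'"abc"': "http://bar.example.net/foo?p=1"}
print(choose_base("http://bar.example.net/foo?p=2", True, ["vcdiff"],
                  ['"abc"'], instances, ["http://bar.example.net/foo?"]))
```

   If no listed entity tag qualifies, the sketch returns None. The
   server would then send an ordinary full response.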

5.2 Modified header specifications

   One of the headers defined in the specification for delta
   encoding [8] has a slightly different meaning when delta clustering
   or delta templates are used.

5.2.1 A-IM

   When an A-IM request-header field includes one or more delta-coding
   values, the request MUST contain an If-None-Match header field,
   listing one or more entity tags from URIs in the uniqueness scope of
   an entity tag from a prior response for the Request-URI.

   Section 5.4 defines rules that a client uses for determining the set
   of base instances in the uniqueness scope of a Request-URI.

5.3 New header specifications

   The following headers are defined, for use as entity-headers. (Due
   to the terminological confusion discussed in [8], some entity-headers
   are more properly associated with instances than with entities.)

5.3.1 DCluster

   The DCluster entity-header field is used in a response to specify a
   subset of the uniqueness scope of the entity tag given in the ETag
   header field of the response. The uniqueness scope is the set of
   URIs across which this strong entity tag is guaranteed to be unique,
   for all time. A uniqueness scope is specified by providing one or
   more prefixes for other URIs in the set.

      DCluster   = "DCluster" ":" #( <"> uri-prefix <"> )
      uri-prefix = scheme ":" "//" host [ ":" port ] [ abs_path ]
                 | abs_path
                 | rel_path

   If the uri-prefix is an abs_path or rel_path, the implied scheme is
   the scheme used in the Request-URI. (Typically, the scheme would be
   "http".) If the uri-prefix is an abs_path, it is interpreted
   relative to the origin server host name. If the uri-prefix is a
   rel_path, it is interpreted relative to the Request-URI.
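
   The prefix interpretation rules can be sketched in Python as
   follows. This is an illustration, not a normative algorithm, and it
   makes these assumptions:

      - the helper names are hypothetical;
      - the DCluster value is split with a regular expression that
        assumes no embedded quotes;
      - rel_path resolution ignores "." and ".." segments;
      - the "//host/path" form used in this document's examples, which
        the grammar above does not list, inherits the Request-URI's
        scheme.

```python
import re
from urllib.parse import urlsplit

_HAS_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def parse_dcluster(value):
    """Return the quoted uri-prefix strings from a DCluster field value."""
    return re.findall(r'"([^"]*)"', value)


def resolve_prefix(prefix, request_uri):
    """Make a uri-prefix absolute, following the rules in this section."""
    req = urlsplit(request_uri)
    if _HAS_SCHEME.match(prefix):
        return prefix                                  # absolute form
    if prefix.startswith("//"):
        # Used in this document's examples but not in the grammar; assumed
        # here to inherit the scheme of the Request-URI.
        return f"{req.scheme}:{prefix}"
    if prefix.startswith("/"):
        return f"{req.scheme}://{req.netloc}{prefix}"  # abs_path
    # rel_path: appended to the Request-URI's directory by plain string
    # concatenation, so a trailing "?" survives; "." and ".." are ignored.
    base_dir = req.path[: req.path.rfind("/") + 1] or "/"
    return f"{req.scheme}://{req.netloc}{base_dir}{prefix}"


request_uri = "http://bar.example.net/dir/foo?p=1"
for p in parse_dcluster('"//bar.example.net/foo?", "/q?", "bar?"'):
    print(resolve_prefix(p, request_uri))
```

   For the Request-URI above, the three prefixes resolve to
   http://bar.example.net/foo?, http://bar.example.net/q?, and
   http://bar.example.net/dir/bar?. Keeping the trailing "?" matters:
   without it, a prefix such as ".../foo" would also match unrelated
   paths like ".../foobar".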
671    
672     The uniqueness scope of a strong entity tag in an ETag header field
673     always includes the Request-URI of the corresponding request, and the
674     union of all URIs matching one or more of the uri-prefix strings in
675     the DCluster header field of the response. It may include other URIs
676     not described in a DCluster header field. That is, the set of URIs
677     for which a uri-prefix in a DCluster header field is a prefix MUST be
678     a subset of the uniqueness scope, and MAY be a proper subset.

   A DCluster header field need not describe the entire uniqueness
   scope of an entity tag. Rather, it describes the subset of the
   uniqueness scope whose members are likely to differ by small deltas.

   A server SHOULD NOT include a uri-prefix in a DCluster header field
   if the server is not likely to be able to generate deltas between the
   Request-URI and the URIs matching that uri-prefix.

   The uniqueness scope specified by a DCluster header is valid for use
   by the client only for entity tags received in the same response or
   in subsequent responses, never for entity tags received in previous
   responses.

   Section 5.4 defines rules that a client uses for determining the set
   of base instances in the uniqueness scope of a Request-URI.

5.3.2 DTemplate
   The DTemplate entity-header field is used in a response to specify
   another resource that the origin server prefers to use as the base
   instance for computing deltas for the Request-URI, or for other
   resources in the uniqueness scope specified by a DCluster header
   field in the response.

      DTemplate = "DTemplate" ":"
                  #( <"> dt-uri <"> [ "/" dt-param])
      dt-uri    = absoluteURI | abs_path
      dt-param  = "etag" "=" entity-tag

   If the dt-uri is an abs_path, it is interpreted relative to the
   origin server host name.
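
   A client might parse a DTemplate field value as in the Python sketch
   below. The function name is hypothetical. The sketch assumes that
   quoted strings contain no escaped quotes. It also assumes that
   literal tokens such as "etag" match case-insensitively, as literals
   do elsewhere in the HTTP/1.1 grammar. Following the rule above, an
   abs_path dt-uri is made absolute using the scheme and host of the
   Request-URI.

```python
import re
from urllib.parse import urljoin

# One list element: a quoted dt-uri, optionally followed by /etag=<entity-tag>,
# then either a comma or the end of the field value.
_ELEMENT = re.compile(
    r'\s*"([^"]*)"'                                # dt-uri
    r'(?:\s*/\s*etag\s*=\s*((?:W/)?"[^"]*"))?'     # optional dt-param
    r'\s*(?:,|$)',
    re.IGNORECASE)


def parse_dtemplate(field_value, request_uri):
    """Return a list of (absolute template URI, entity tag or None) pairs."""
    templates = []
    value = field_value.strip()
    pos = 0
    while pos < len(value):
        match = _ELEMENT.match(value, pos)
        if match is None:
            raise ValueError("malformed DTemplate value at offset %d" % pos)
        dt_uri, etag = match.group(1), match.group(2)
        # An abs_path is relative to the origin server; absoluteURI is kept.
        templates.append((urljoin(request_uri, dt_uri), etag))
        pos = match.end()
    return templates
```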

   A URI specified in a DTemplate header field is, by definition, in the
   uniqueness scope of the Request-URI.

   If a client has received a DTemplate header field within a given
   uniqueness scope, the client SHOULD use an instance of the specified
   template resource(s) as the base instance for any future delta
   requests for other resources in the uniqueness scope.

   If the DTemplate header field includes an entity tag with a URI, then
   the client SHOULD use only the specified instance of the template
   resource as the base instance for any future delta requests for other
   resources in the uniqueness scope.
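
   A client could apply the two SHOULD rules above as in the Python
   sketch below. CacheEntry is a hypothetical stand-in for whatever a
   real cache stores. The template list is assumed to hold (URI, entity
   tag or None) pairs taken from a DTemplate header field. When a
   template carries an entity tag, only that exact cached instance
   qualifies. Otherwise any cached instance of the template resource
   does.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CacheEntry:
    uri: str   # absolute URI of the cached resource
    etag: str  # strong entity tag of the cached instance, with its quotes


def template_bases(templates: List[Tuple[str, Optional[str]]],
                   cache: List[CacheEntry]) -> List[CacheEntry]:
    """Cached instances that the DTemplate rules allow as delta bases."""
    chosen = []
    for template_uri, wanted_etag in templates:
        for entry in cache:
            if entry.uri != template_uri:
                continue
            # With an entity tag, only that instance of the template qualifies.
            if wanted_etag is None or entry.etag == wanted_etag:
                chosen.append(entry)
    return chosen


def if_none_match_value(entries: List[CacheEntry]) -> str:
    """Comma-separated entity tags for an If-None-Match header field."""
    return ", ".join(entry.etag for entry in entries)
```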

   The URI specified by a DTemplate header is valid for use by the
   client only with entity tags received in the same response or in
   subsequent responses, never for use with entity tags received in
   previous responses.

5.4 Rules for determining base instances in a uniqueness scope
   When a client is about to make a request for a given Request-URI, and
   wishes to choose entity tags to include in the request's
   If-None-Match header field, it follows a set of rules to determine
   which base instances (and hence, which entity tags) may be included.
   These rules do not require the client to include any entity tags,
   and for reasons of performance, a client implementation need not
   include all of the legal choices.

   Recall that the uniqueness scope of an entity tag is the set of
   resources across which this entity tag is unique for all time. In
   other words, if the client and server correctly agree that the
   Request-URI is contained in the uniqueness scope for an entity tag E
   for some URI X, then if the client sends this entity tag E in an
   If-None-Match header field, the server will know unambiguously which
   resource it refers to (even though X is not explicitly named in the
   request).

   The client's view of the uniqueness scope of an entity tag might be a
   subset of the server's view. (It cannot be a superset, or the server
   would be unable to interpret the If-None-Match field.) For example,
   a server might not list all possible uri-prefix values in a DCluster
   header, for performance reasons, or the client might not support the
   DTemplate header. A client probably will not have received responses
   for more than a small subset of the URIs in a uniqueness scope, or it
   might have deleted some of the instances in order to create space in
   its cache. A client SHOULD NOT list an entity tag in an
   If-None-Match header unless it has a cache entry containing at least
   part of the corresponding instance, since this would otherwise lead
   to uninterpretable delta responses.

   A Request-URI is in the uniqueness scope of an entity tag E for an
   instance of URI X if one or more of these conditions holds:

   1. X is the Request-URI.

   2. The DCluster header field of a prior response for the
      Request-URI includes a prefix of X. The base instance
      associated with entity tag E MUST NOT have been received
      before the first such DCluster header field.

   3. The DCluster header field of a prior response for X
      includes a prefix of the Request-URI. The base instance
      associated with entity tag E MUST NOT have been received
      before the first such DCluster header field.

   4. X has been listed in the DTemplate header field of a prior
      response for the Request-URI, or of a prior response for
      another URI Y in the uniqueness scope of the Request-URI
      (by recursive application of these conditions to an
      instance of URI Y).
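
   A client might check the four conditions as in the Python sketch
   below. This is one possible reading of the rules, not a definitive
   one, and every name in it is hypothetical. The sketch makes three
   assumptions:

   - the client numbers each response it receives in arrival order
     (seq);
   - uri-prefix values are already absolute;
   - "prefix" means a plain string prefix.

   The seq comparisons enforce the requirement that a base instance not
   predate the relevant DCluster header field. They also enforce the
   corresponding restriction on DTemplate in section 5.3.2. For the
   security reasons discussed below, condition 3 stays disabled unless
   the caller enables it. In keeping with the SHOULD NOT rule above,
   only cached instances are considered.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Instance:
    uri: str   # absolute URI X
    etag: str  # entity tag E of this instance
    seq: int   # arrival order of the response that delivered it (assumed)


@dataclass(frozen=True)
class DClusterRecord:
    response_uri: str  # Request-URI of the response carrying DCluster
    prefix: str        # one uri-prefix from that header, made absolute
    seq: int           # arrival order of that response


@dataclass(frozen=True)
class DTemplateRecord:
    response: Instance  # instance delivered with the DTemplate header
    template_uri: str   # absolute URI named in the DTemplate header


def in_scope(request_uri: str, inst: Instance,
             dclusters: List[DClusterRecord],
             dtemplates: List[DTemplateRecord],
             allow_condition_3: bool = False,
             _seen: FrozenSet[Instance] = frozenset()) -> bool:
    """Is request_uri in the uniqueness scope of inst.etag for inst.uri?"""
    if inst in _seen:
        return False  # cycle through DTemplate links; no new evidence
    seen = _seen | {inst}

    # Condition 1: X is the Request-URI.
    if inst.uri == request_uri:
        return True

    for rec in dclusters:
        # The instance must not predate the DCluster header field.
        if rec.seq > inst.seq:
            continue
        # Condition 2: a DCluster for the Request-URI has a prefix of X.
        if rec.response_uri == request_uri and inst.uri.startswith(rec.prefix):
            return True
        # Condition 3: a DCluster for X has a prefix of the Request-URI.
        if (allow_condition_3 and rec.response_uri == inst.uri
                and request_uri.startswith(rec.prefix)):
            return True

    # Condition 4: X named by DTemplate in a response whose own instance is
    # (recursively) in scope; a response for the Request-URI itself passes
    # condition 1 on recursion.
    for rec in dtemplates:
        if rec.template_uri != inst.uri or rec.response.seq > inst.seq:
            continue
        if in_scope(request_uri, rec.response, dclusters, dtemplates,
                    allow_condition_3, seen):
            return True
    return False


def candidate_etags(request_uri: str, cache: List[Instance],
                    dclusters: List[DClusterRecord],
                    dtemplates: List[DTemplateRecord],
                    allow_condition_3: bool = False) -> List[str]:
    """Entity tags of cached instances that may go in If-None-Match."""
    return [i.etag for i in cache
            if in_scope(request_uri, i, dclusters, dtemplates,
                        allow_condition_3)]
```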

   XXX Is this unambiguous?

   Security considerations (see section 6.1) require that a client not
   blindly trust every DCluster header that it receives. A malicious
   server might send a DCluster header that could cause the client to
   believe that a URI is within the uniqueness scope of an entity tag
   when, in fact, it is not. Therefore, a client MUST NOT use condition
   #3 above (DCluster of a prior response for X includes prefix of
   Request-URI) unless it can securely verify that a resulting delta is
   not spoofed.

   Our current belief is that spoofing can be detected by any one of the
   following means:

   - The delta-encoded response is accompanied by a secure
     message digest covering the entire current instance,
     generated by the origin server. This allows the client to
     verify that it has received the current instance of the
     Request-URI.

   - All of the URIs in the uniqueness scope of the Request-URI
     have the same "hostport" as the Request-URI; see
     RFC2396 [3] for the specification of this term. This
     ensures that, if no interception mechanism is in use, the
     client receives what the server wishes it to receive.
     (In general, malicious interception mechanisms create
     broader risks than the spoofing of deltas.)

   - All of the base instances associated with the entity tags
     listed in the client's If-None-Match header came from URIs
     listed in DCluster or DTemplate headers in responses for
     prior Request-URIs having the same "hostport" as the
     current Request-URI. This ensures that the chosen base
     instances came from origin servers trusted by the origin
     server for the current Request-URI.
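
   The second and third tests above reduce to comparisons of "hostport"
   values, as in the Python sketch below. The function names and the
   listings bookkeeping structure are hypothetical. The sketch makes
   three assumptions:

   - host names compare case-insensitively;
   - an omitted port means the scheme's default (80 for http, 443 for
     https);
   - an earlier instance of the Request-URI itself needs no vouching,
     which the text above does not state.

   The first test, digest verification, is a separate mechanism and is
   not shown here.

```python
from typing import Dict, Iterable, List, Optional, Tuple
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}  # assumed scheme defaults


def hostport(uri: str) -> Tuple[Optional[str], Optional[int]]:
    """(host, port) of an absolute URI, with the default port filled in.

    urlsplit() lower-cases the host name, so comparison ignores case.
    """
    parts = urlsplit(uri)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return parts.hostname, port


def scope_is_single_hostport(request_uri: str, scope_uris: Iterable[str]) -> bool:
    """Second test: every URI in the scope shares the Request-URI's hostport."""
    target = hostport(request_uri)
    return all(hostport(u) == target for u in scope_uris)


def bases_vouched_for(request_uri: str, base_uris: Iterable[str],
                      listings: Dict[str, List[str]]) -> bool:
    """Third test: each base URI was named (as a DCluster prefix or a
    DTemplate URI) in a response for a URI with the Request-URI's hostport.

    `listings` maps a prior Request-URI to the absolute prefixes/URIs its
    DCluster and DTemplate header fields named.
    """
    target = hostport(request_uri)
    vouched = [named for source, names in listings.items()
               if hostport(source) == target for named in names]
    return all(base == request_uri  # assumed: own earlier instance is safe
               or any(base.startswith(v) for v in vouched)
               for base in base_uris)
```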

   Note: the spoofing detection mechanisms listed above should be
   reviewed by competent security experts.


6 Security Considerations

   Note: This aspect of the specification is the subject of some
   controversy, and the details of protections against spoofing
   attacks in particular are likely to change. We will seek a
   more formal security review of this specification as part of
   the IETF standardization process.

6.1 Spoofing attacks using the DCluster header
   We have identified a potential spoofing attack via the "DCluster"
   header. In this scenario, a malicious server (e.g.,
   malicious.example.org) generates a response (e.g., for
   http://malicious.example.org/trap.html) with a "DCluster" header
   indicating that the uniqueness scope of the entity tag in the
   response includes another server (e.g., victim.example.com). Suppose
   that the response includes the entity tag "abc". Now suppose that
   the client makes this request:

      GET /foo.html HTTP/1.1
      Host: victim.example.com
      If-None-Match: "abc"
      A-IM: vcdiff

   If the victim.example.com server does actually have an instance with
   entity tag "abc", either for http://victim.example.com/foo.html or
   for a resource that really is in the same uniqueness scope, then the
   server will generate a delta. However, if the client applies this
   delta to the cached response for
   http://malicious.example.org/trap.html, it will end up either with
   garbage, or (more perniciously) with an apparently genuine result
   that actually contains bogus information inserted by
   malicious.example.org. (The response for
   http://malicious.example.org/trap.html might contain the bogus
   information concealed in HTML comments.)

   Protection against this attack can be accomplished by the use of
   end-to-end digests on the instances, as described in another
   proposal [11]. (Message digests, such as those provided by
   "Content-MD5" or by Digest Authentication, are not sufficient, since
   none of the individual messages is tampered with in this attack.)

   Note that protection against spoofing via the "DCluster" header
   does not inherently require a keyed digest. Since the delta-encoded
   response for http://victim.example.com/foo.html is not itself
   generated by malicious.example.org, an end-to-end digest included
   with this response by victim.example.com is sufficient to prove that
   the client's reconstruction of foo.html is correct. However, if
   message tampering is also a possibility, then the server should also
   provide a keyed message digest.
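
   The client side of the end-to-end digest check described above might
   look like the following Python sketch. It assumes the delta has
   already been applied to the chosen base instance. It also assumes
   that the digest arrives in a Digest header field of the form
   algorithm=base64-value, as in the instance-digest proposal [11]. The
   algorithm labels "sha-256" and "md5", and the preference for the
   former, are further assumptions of this sketch. As noted above,
   protection against message tampering calls for a keyed digest, which
   a plain digest like this one does not provide.

```python
import base64
import hashlib

# Algorithm labels this sketch understands, strongest first (assumed names).
_ALGORITHMS = [("sha-256", hashlib.sha256), ("md5", hashlib.md5)]


def parse_digest_field(field_value):
    """Parse 'alg=value, alg=value' into {lower-case alg: base64 value}."""
    digests = {}
    for item in field_value.split(","):
        alg, sep, encoded = item.strip().partition("=")
        if sep:  # base64 padding '=' stays in `encoded`
            digests[alg.lower()] = encoded.strip()
    return digests


def reconstruction_verified(reconstructed: bytes, digest_field: str) -> bool:
    """True if the instance rebuilt from base + delta matches the
    end-to-end digest the origin server sent with the delta."""
    offered = parse_digest_field(digest_field)
    for name, hash_fn in _ALGORITHMS:
        if name in offered:
            actual = base64.b64encode(hash_fn(reconstructed).digest()).decode("ascii")
            return actual == offered[name]
    return False  # no usable digest: spoofing cannot be ruled out
```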

   Another defense against such an attack is for the client to ignore a
   "DCluster" header that specifies a different server. However, this
   defense is only effective if servers that generate delta-encoded
   responses are not shared among multiple, possibly mutually
   untrustworthy, content providers. It also reduces the potential
   effectiveness of clustering, especially for large sites split across
   multiple servers.

   Note that because the DTemplate header field also adds one or more
   URIs to the uniqueness scope of an entity tag, the same spoofing
   attack is possible using the DTemplate header, and the same defenses
   apply.

   We recommend that if a client receives a delta-encoded response
   without an accompanying Digest, and if the client's view of the
   uniqueness scope for the Request-URI includes more than one server
   hostname, then the response should either be discarded, or presented
   to the user as potentially corrupt.
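
   The recommendation above reduces to a small decision rule, sketched
   here in Python with hypothetical names. As the recommendation is
   worded, the rule tests only whether a Digest header field is present.
   Verifying the digest is a separate step. The scope is assumed to be
   a list of absolute URIs that includes the Request-URI, and host
   names are compared without their ports.

```python
from enum import Enum
from typing import Iterable, Mapping
from urllib.parse import urlsplit


class DeltaVerdict(Enum):
    ACCEPT = "accept"
    SUSPECT = "suspect"  # discard, or show to the user as potentially corrupt


def judge_delta_response(headers: Mapping[str, str],
                         scope_uris: Iterable[str]) -> DeltaVerdict:
    """Apply the recommendation to one delta-encoded response.

    `headers` maps response header names to values (any case);
    `scope_uris` is the client's view of the uniqueness scope.
    """
    has_digest = any(name.lower() == "digest" for name in headers)
    hostnames = {urlsplit(uri).hostname for uri in scope_uris}
    if not has_digest and len(hostnames) > 1:
        return DeltaVerdict.SUSPECT
    return DeltaVerdict.ACCEPT
```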

6.2 Privacy attacks using the DCluster header
   Many people have drawn attention to the privacy risks associated with
   HTTP Cookies, which allow a site (or group of cooperating sites) to
   track the activity of a user. More recently, Martin Pool has
   identified a similar tracking mechanism based on cache validators,
   especially entity tags [12]. In this attack, a site encodes
   user-specific information in an entity tag, and then tracks repeated
   requests by that user to the same resource, as the user's browser
   attempts to validate its cache entry using that entity tag.

   Although this tracks only the requests for a specific resource (URL),
   a site can indirectly track references to many other pages by
   embedding an image reference to the tracked URL on each of those
   pages.

   Just as with Cookies, the entity-tag tracking mechanism depends upon
   the server's ability to induce the client to send back a specific
   string on subsequent requests. However, the basic entity-tag
   tracking mechanism only allows a site to track access to pages that
   it controls.

   The "DCluster" header field specified in this document makes this
   tracking mechanism more powerful, by allowing one site to gain access
   to entity tags from many other sites. For example, suppose that the
   site evil.example.com knows the format used to encode client-specific
   information in entity tags issued by the site naive.example.com. Any
   client who visits http://evil.example.com/home.html and receives a

      DCluster: "http://naive.example.com/"

   header in response might then later make a delta-capable request to
   evil.example.com that includes entity tags issued by
   naive.example.com.

   It might be possible to defend against such "hijacked" tracking
   attacks by choosing a cryptographically strong encoding for the
   client-specific data hidden in entity tags, but this might not always
   be feasible. In any event, this could not hide from evil.example.com
   the fact that the client had at some point visited naive.example.com
   (which could be significant if this site provided, for example,
   medical information about an embarrassing disease).

   Cryptographic digests of instances, as described in section 6.1 to
   protect against DCluster spoofing, do not help, because the malicious
   site in this case is the source of the requested data, and need not
   actually use a delta encoding to accomplish its attack.

   As in section 6.1, one possible defense is for the client to ignore a
   "DCluster" header that specifies a different server, but (also as
   discussed in section 6.1) this is not ideal.

   User agents SHOULD provide a method to allow users to disable the use
   of the "DCluster" header, preferably offering a choice between
   disabling it in all cases and disabling it only in cross-site cases.
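
   A user-agent setting of the kind recommended above could filter
   DCluster prefixes before they are recorded, as in this Python
   sketch. The policy names are hypothetical. "Cross-site" is assumed
   here to mean that a prefix names a different host from the
   Request-URI.

```python
from enum import Enum
from typing import List
from urllib.parse import urlsplit


class DClusterPolicy(Enum):
    ALLOW_ALL = "allow-all"
    SAME_SITE_ONLY = "same-site-only"  # ignore cross-site prefixes
    DISABLED = "disabled"              # ignore DCluster entirely


def accepted_prefixes(policy: DClusterPolicy, request_uri: str,
                      absolute_prefixes: List[str]) -> List[str]:
    """Prefixes from one DCluster header field that the user's setting permits."""
    if policy is DClusterPolicy.DISABLED:
        return []
    if policy is DClusterPolicy.SAME_SITE_ONLY:
        host = urlsplit(request_uri).hostname
        return [p for p in absolute_prefixes if urlsplit(p).hostname == host]
    return list(absolute_prefixes)
```

   Under SAME_SITE_ONLY, a DCluster: "http://naive.example.com/" header
   received from evil.example.com would be ignored. The client would
   then not send entity tags from naive.example.com to evil.example.com
   on the strength of that header.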

6.3 Data leakage attacks using the DCluster header
   Suppose that a server has asserted, using a DCluster header, that
   resources URL1 and URL2 are in the same uniqueness scope. Also
   suppose that a client is allowed to access URL1, but is not allowed
   to access URL2. (Access may be denied due to a lack of
   authentication, or a server configuration setting, or some other
   mechanism.) Finally, suppose that the client can guess or obtain the
   entity tag ET2 of some instance of URL2. If the client asks the
   server for the current instance of URL1 as a delta from the ET2
   instance of URL2, and the server responds with such a delta, this may
   reveal information about the contents of URL2. (The amount of
   information revealed depends strongly on the delta-coding format, and
   probably will not be enough to recover the full contents of URL2.)

   A server MUST NOT reply using a delta encoding if the chosen base
   instance is not an instance of the Request-URI, unless the server can
   verify that the client would currently be allowed access to both the
   chosen base instance and the Request-URI.
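
   A server might enforce the rule above with a check made just before
   it generates a delta, as in this Python sketch. The is_authorized
   callback is a hypothetical hook standing in for whatever access
   control the server already applies (authentication, configuration
   settings, and so on). The hook is evaluated against the client's
   current permissions. When the check fails, the server can presumably
   still send an ordinary, non-delta response if the client may access
   the Request-URI.

```python
from typing import Callable

# Hypothetical hook: may `client` access `uri` right now?
Authorizer = Callable[[str, str], bool]


def may_send_delta(client: str, request_uri: str, base_uri: str,
                   is_authorized: Authorizer) -> bool:
    """Whether a delta from base_uri's instance may be sent for request_uri."""
    if not is_authorized(client, request_uri):
        return False  # no content for this client at all, delta or not
    if base_uri == request_uri:
        return True   # base is an instance of the Request-URI itself
    # Base from another resource: the client must currently be allowed to
    # read it too, or the delta could leak its contents.
    return is_authorized(client, base_uri)
```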


7 History

7.1 draft-mogul-http-dcluster-00.txt
   This document was split off from draft-mogul-http-delta-*.txt, to
   avoid having the security issues affect the basic HTTP delta encoding
   specification, and to ensure that the design of clusters and
   templates was done so that they are entirely optional for
   implementors of basic delta encoding.


8 Acknowledgements

   Andrew Birrell alerted us to the possibility of data leakage attacks
   using the DCluster header. Koen Holtman contributed to the drafting
   of this document, and especially to the security considerations and
   mechanisms.


9 References

   NOTE TO RFC EDITOR: many of the references here might be out of date.
   Please verify these with the primary author of this Internet-Draft
   before issuing this document as an RFC.

   1. Gaurav Banga, Fred Douglis, and Michael Rabinovich. Optimistic
      Deltas for WWW Latency Reduction. Proc. 1997 USENIX Technical
      Conference, Anaheim, CA, January, 1997, pp. 289-303.

   2. T. Berners-Lee, R. Fielding, and H. Frystyk. Hypertext Transfer
      Protocol -- HTTP/1.0. RFC 1945, HTTP Working Group, May, 1996.

   3. T. Berners-Lee, R. Fielding, and L. Masinter. Uniform Resource
      Identifiers (URI): Generic Syntax. RFC 2396, IETF, August, 1998.

   4. S. Bradner. Key words for use in RFCs to Indicate Requirement
      Levels. RFC 2119, Harvard University, March, 1997.

   5. Fred Douglis, Antonio Haro, and Michael Rabinovich. HPP: HTML
      Macro-Preprocessing to Support Dynamic Document Caching. Proc.
      USENIX Symposium on Internet Technologies and Systems, USENIX,
      Monterey, CA, December, 1997, pp. 83-94.

   6. Roy T. Fielding, Jim Gettys, Jeffrey C. Mogul, Henrik Frystyk
      Nielsen, Larry Masinter, Paul Leach, and Tim Berners-Lee. Hypertext
      Transfer Protocol -- HTTP/1.1. RFC 2616, HTTP Working Group, June,
      1999.

   7. Barron C. Housel and David B. Lindquist. WebExpress: A System
      for Optimizing Web Browsing in a Wireless Environment. Proc. 2nd
      Annual Intl. Conf. on Mobile Computing and Networking, ACM, Rye, New
      York, November, 1996, pp. 108-116.
      http://www.networking.ibm.com/art/artwewp.htm.

   8. Jeffrey C. Mogul, Balachander Krishnamurthy, Fred Douglis, Anja
      Feldmann, Yaron Goland, and Arthur van Hoff. Delta encoding in HTTP.
      Internet-Draft draft-mogul-http-delta-06, IETF, August, 2000. This is
      a work in progress.

   9. Jeffrey C. Mogul, Fred Douglis, Anja Feldmann, and Balachander
      Krishnamurthy. Potential benefits of delta encoding and data
      compression for HTTP. Proc. SIGCOMM '97, Cannes, France, September,
      1997, pp. 181-194.

   10. Jeffrey C. Mogul, Fred Douglis, Anja Feldmann, and Balachander
       Krishnamurthy. Potential benefits of delta encoding and data
       compression for HTTP. Research Report 97/4, DECWRL, July, 1997. URL
       http://www.research.digital.com/wrl/techreports/abstracts/97.4.html.

   11. Jeffrey C. Mogul and Arthur Van Hoff. Instance Digests in HTTP.
       Internet-Draft draft-mogul-http-digest-02, IETF, March, 2000. This is
       a work in progress.

   12. Martin Pool. meantime: non-consensual http user tracking using
       caches. http://www.linuxcare.com.au/mbp/meantime/.


10 Authors' addresses

   Jeffrey C. Mogul
   Western Research Laboratory
   Compaq Computer Corporation
   250 University Avenue
   Palo Alto, California, 94305, U.S.A.
   Email: mogul@pa.dec.com
   Phone: 1 650 617 3304 (email preferred)

   Fred Douglis
   AT&T Labs - Research
   180 Park Ave, Room B-137
   Florham Park, NJ 07932-0971, U.S.A.
   Email: douglis@research.att.com
   Phone: 1 973 360-8775

   Daniel M. Hellerstein
   Economic Research Service, USDA
   1909 Franwall Ave, Wheaton MD 20902
   E-mail: danielh@crosslink.net or webmaster@srehttp.org
   Phone: 1 202 694-5613 or 1 301 649-4728