1
2 Network Working Group Jeffrey Mogul, Compaq WRL,
3 Internet-Draft Fred Douglis, AT&T,
4 Expires: 25 February 2001 Daniel Hellerstein, ERS/USDA
5 24 August 2000
6
7
8
9 HTTP Delta Clusters and Templates
10
11 draft-mogul-http-dcluster-00.txt
12
13
14 STATUS OF THIS MEMO
15
16 This document is an Internet-Draft and is in full
17 conformance with all provisions of Section 10 of RFC2026.
18
19 Internet-Drafts are working documents of the Internet
20 Engineering Task Force (IETF), its areas, and its working
21 groups. Note that other groups may also distribute working
22 documents as Internet-Drafts.
23
24 Internet-Drafts are draft documents valid for a maximum of
25 six months and may be updated, replaced, or obsoleted by
26 other documents at any time. It is inappropriate to use
27 Internet-Drafts as reference material or to cite them other
28 than as "work in progress."
29
30 The list of current Internet-Drafts can be accessed at
31 http://www.ietf.org/ietf/1id-abstracts.txt
32
33 The list of Internet-Draft Shadow Directories can be
34 accessed at http://www.ietf.org/shadow.html.
35
36 Distribution of this document is unlimited. Please send
37 comments to the authors.
38
39
40 ABSTRACT
41
42 HTTP "Delta encoding," the transmission of a compact
43 encoding of the change between instances of a Web resource
44 instead of retransmitting the entire new value, has been
45 shown to be of potential value. Research has shown
46 additional benefits if deltas can be computed between
47 instances of different resources. This document describes
48 a compatible extension to HTTP delta encoding to support
49 "clustering", where multiple resources (URLs) are treated
50 as a pool, and the use of "templates", where a large set of
51 resource instances are most naturally described as deltas
52 from a chosen template resource.
53
54
55
56
57 Mogul et al. [Page 1]
58
59 Internet-Draft Delta clustering 24 August 2000 16:15
60
61
62 TABLE OF CONTENTS
63
64 1 Introduction 3
65 1.1 Related research and proposals 4
66 2 Terminology 5
67 3 Delta-encoding and clustering 6
68 4 Use of templates 8
69 5 Specification 11
70 5.1 Modified basic requirements for delta-encoded responses 11
71 5.2 Modified header specifications 12
72 5.2.1 A-IM 12
73 5.3 New header specifications 12
74 5.3.1 DCluster 12
75 5.3.2 DTemplate 13
76 5.4 Rules for determining base instances in a uniqueness scope 13
77 6 Security Considerations 15
78 6.1 Spoofing attacks using the DCluster header 15
79 6.2 Privacy attacks using the DCluster header 17
80 6.3 Data leakage attacks using the DCluster header 18
81 7 History 18
82 7.1 draft-mogul-http-dcluster-00.txt 18
83 8 Acknowledgements 18
84 9 References 18
85 10 Authors' addresses 20
86
119 1 Introduction
120
121 WARNING: THIS SPECIFICATION WILL CHANGE. DO NOT DEPLOY
122 ANY IMPLEMENTATIONS BASED ON THIS SPECIFICATION.
123
124 The World Wide Web is a distributed system, and so often benefits
125 from caching to reduce retrieval delays. Retrieval of a Web resource
126 (such as a document, image, icon, or applet) over the Internet or other
127 wide-area network usually takes enough time that the delay is over
128 the human threshold of perception. Often, that delay is measured in
129 seconds. Caching can often eliminate or significantly reduce
130 retrieval delays.
131
132 Many Web resources change over time, so a practical caching approach
133 must include a coherency mechanism, to avoid presenting stale
134 information to the user. Originally, the Hypertext Transfer Protocol
135 (HTTP) provided little support for caching, but under operational
136 pressures, it quickly evolved to support a simple mechanism for
137 maintaining cache coherency.
138
139 In HTTP/1.0 [2], the server may supply a ``last-modified'' timestamp
140 with a response. If a client stores this response in a cache entry,
141 and then later wishes to re-use the response, it may transmit a
142 request message with an ``If-modified-since'' field containing that
143 timestamp; this is known as a conditional retrieval. Upon receiving
144 a conditional request, the server may either reply with a full
145 response, or, if the resource has not changed, it may send an
146 abbreviated reply, indicating that the client's cache entry is still
147 valid. HTTP/1.0 also includes a means for the server to indicate,
148 via an ``Expires'' timestamp, that a response will be valid until
149 that time; if so, a client may use a cached copy of the response
150 until that time, without first validating it using a conditional
151 retrieval.
152
153 HTTP/1.1 [6] adds many new features to improve cache coherency and
154 performance. However, it preserves the all-or-none model for
155 responses to conditional retrievals: either the server indicates that
156 the resource value has not changed at all, or it must transmit the
157 entire current value.
158
159 Common sense suggests (and traces confirm), however, that even when a
160 Web resource does change, the new instance is often substantially
161 similar to the old one. If the difference, or ``delta'', between the
162 two instances could be sent to the client instead of the entire new
163 instance, a client holding a cached copy of the old instance could
164 apply the delta to construct the new version. In a world of finite
165 bandwidth, the reduction in response size and delay could be
166 significant.
167
168 One can think of deltas as a way to squeeze as much benefit as
169 possible from client and proxy caches. Rather than treating an
176 entire response as the ``cache line,'' with deltas we can treat
177 arbitrary pieces of a cached response as the replaceable unit, and
178 avoid transferring pieces that have not changed.
179
180 A separate document [8] specifies a set of compatible extensions to
181 HTTP/1.1 that allow clients and servers to use delta encoding with
182 minimal overhead. That mechanism only supports deltas between
183 instances of a single resource.
184
185 This document specifies further extensions to the delta encoding
186 mechanism. These extensions allow deltas to be computed between
187 instances of different resources. This increases the likelihood that
188 a compact delta might be found to encode the current instance of a
189 requested resource.
190
191 We assume that the reader is familiar with the HTTP/1.1
192 specification, and with the delta encoding specification.
193
194 1.1 Related research and proposals
195 The WebExpress project [7] appears to be the first published
196 description of an implementation of delta encoding for HTTP (which
197 they call ``differencing''). WebExpress is aimed specifically at
198 wireless environments, and includes a number of orthogonal
199 optimizations. Also, the WebExpress design does not propose changing
200 the HTTP protocol itself, but rather uses a pair of interposed
201 proxies to convert the HTTP message stream into an optimized form.
202 The results reported for WebExpress differencing are impressive, but
203 are limited to a few selected benchmarks.
204
205 The WebExpress paper also pointed out that in many cases, the
206 individual responses to different queries with the same ``URL
207 prefix'' (that is, the prefix of the URL before the ``?'' character)
208 are often similar enough to make delta encoding effective. Since
209 users frequently make numerous different queries using the same URL
210 prefix, it might be much more effective to compute deltas between
211 different queries for a given URL prefix, rather than simply between
212 different queries using an identical URL. Banga et al. [1] make a
213 similar observation. A 1997 trace-based study [9] showed that this
214 approach has significant potential for improving the bandwidth
215 requirements. The "clustering" mechanism described in this
216 specification is intended to support the use of delta encoding in
217 contexts where the delta is computed between two different URLs.
218
219 The WebExpress project [7] adopted the concept of a designated ``base
220 object'', rather than simply relying on a prefix-matching mechanism.
221 WebExpress included a mechanism for ``rebasing'' a client (providing
222 it with a new base object). The "templates" mechanism described in
223 this specification supports a very similar approach.
224
225 The approaches described above, and in this specification, operate
226 independent of the syntax and semantics of the data being transferred
233 (although delta encoding algorithms for images may require some
234 specialization). They function by decomposing responses at the bit
235 or byte level into currently-cached and need-to-be-transferred
236 components. One can also do this decomposition at a higher level.
237 Douglis et al. [5] describe an "HTML macro" mechanism, in which a set
238 of similar HTML pages is decomposed into a constant component (akin
239 to a macro body) and a variable component (akin to macro arguments).
240 In many cases, the variable component can be quite small; this means
241 once the constant component is in a cache, references to similar
242 pages require fetching only the small variable component, at a
243 significant cost savings over transferring a monolithic response.
244
245 The main drawback to the HTML macro approach is that it requires
246 direct involvement by the designer (or software) when generating the
247 Web pages, including some careful attention to the decomposition of a
248 set of similar pages. It might also require some additional
249 language-level standardization, although this perhaps could be
250 obviated through the use of Java-based macros. Therefore, support
251 for HTML macros is beyond the scope of this specification.
252
253
254 2 Terminology
255
256 HTTP/1.1 [6] defines the following terms:
257
258 resource A network data object or service that can be
259 identified by a URI, as defined in section 3.2.
260 Resources may be available in multiple
261 representations (e.g. multiple languages, data
262 formats, size, resolutions) or vary in other ways.
263
264 entity The information transferred as the payload of a
265 request or response. An entity consists of
266 metainformation in the form of entity-header fields
267 and content in the form of an entity-body, as
268 described in section 7.
269
270 variant A resource may have one, or more than one,
271 representation(s) associated with it at any given
272 instant. Each of these representations is termed a
273 `variant.' Use of the term `variant' does not
274 necessarily imply that the resource is subject to
275 content negotiation.
276
277 The specification for delta encoding [8] defined these additional
278 terms:
279
280 instance The entity that would be returned in a status-200
281 response to a GET request, at the current time, for
282 the selected variant of the specified resource, with
283 the application of zero or more content-codings, but
290 without the application of any instance manipulations
291 or transfer-codings.
292
293 instance manipulation
294 An operation on one or more instances which may
295 result in an instance being conveyed from server to
296 client in parts, or in more than one response
297 message. For example, a range selection or a delta
298 encoding. Instance manipulations are end-to-end, and
299 often involve the use of a cache at the client.
300
301 See that specification for further discussion of those terms.
302
303 For the extensions specified in this document, we define one more
304 term:
305
306 uniqueness scope
307 The uniqueness scope of an entity tag is the set of
308 resources across which this entity tag is unique for
309 all time. That is, within this set of resources, if
310 two instances share an entity tag, then the values of
311 these instances (including their instance bodies and
312 their instance headers) are equal.
313
314 In unmodified HTTP/1.1, the uniqueness scope of an entity tag is
315 always a single resource. In this proposal, we provide a means to
316 extend the uniqueness scope to include multiple resources.
317
318
319 3 Delta-encoding and clustering
320
321 The basic delta-encoding model assumes that deltas are computed
322 between two instances of a specific resource; i.e., both instances are
323 associated with a single URL. However, the WebExpress project [7]
324 suggested that by treating a query URL (that is, a URL with an
325 embedded ``?'') as a prefix followed by a set of parameters, one
326 could then profitably compute deltas between resource values whose
327 URLs have identical prefixes, but perhaps different parameters
328 (suffixes). Our trace-based study confirmed this [10]. We believe
329 that this might be generalized to certain other patterns of URLs
330 (i.e., not just those using ``?'' as a separator). We use the term
331 ``clustering'' for this approach.
332
333 For example, if a client has cached a response for a DEC stock quote
334 (``http://quote.yahoo.com/q?s=DEC&d=f''), and then requests a quote
335 for AT&T from the same server (``http://quote.yahoo.com/q?s=T&d=f''),
336 the prefix for the cluster would be ``http://quote.yahoo.com/q?''.
337
338 In order to support clustering, we need a mechanism for the server to
339 indicate to the client which URLs are eligible for clustering (since
340 it would be highly inefficient for the client to send the entity tags
341 of every resource in its cache on every request).
346
347 We propose a new, optional response header for this purpose, to
348 specify a URL-prefix for other resources that ``cluster'' with the
349 given response. The header name is ``DCluster''.
350
351 Once a cluster-eligible response is cached, when the client is about
352 to make a subsequent request, it would match the request-URI against
353 all of the URL-prefixes in its cache. (As specified in section
354 5.3.1, only cache entries received after the matching DCluster header
355 are eligible.) The ``If-None-Match'' field in its request could then
356 list the entity tags for all of the matching entries. In some cases,
357 it might be more efficient to list only a subset (such as the most
358 recently received cache entries), to avoid excessive request header
359 lengths.
360
361 For example, if a client makes this initial request:
362
363 GET /foo?p=1 HTTP/1.1
364 Host: bar.example.net
365
366 and receives this response:
367
368 HTTP/1.1 200 OK
369 Date: Sun, 06 Nov 1994 08:49:37 GMT
370 Etag: "abc"
371 DCluster: "//bar.example.net/foo?"
372
373 then when the client later makes a request for
374 ``http://bar.example.net/foo?p=2'', it can match the stored cluster
375 prefix in its cache, and generate this request:
376
377 GET /foo?p=2 HTTP/1.1
378 Host: bar.example.net
379 If-None-Match: "abc"
380 A-IM: vcdiff
381
382 As a generalization, the DCluster header field may include multiple
383 URL-prefixes, to allow specification of a set of URIs that do not
384 share a single common prefix.
385
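The client-side matching described above can be sketched in Python. This is only an illustration under stated assumptions, not part of the specification: the `CacheEntry` record and the function name `cluster_etags` are hypothetical, and each stored prefix is assumed to have already been resolved to an absolute URL.

```python
from dataclasses import dataclass, field


@dataclass
class CacheEntry:
    url: str     # absolute URL the cached response was received for
    etag: str    # entity tag of the cached instance, quotes included
    prefixes: list = field(default_factory=list)  # absolute DCluster prefixes it carried


def cluster_etags(request_url, cache):
    """Collect entity tags of entries whose DCluster prefixes match request_url."""
    tags = []
    for entry in cache:
        # A DCluster header may carry several prefixes; one match is enough.
        if any(request_url.startswith(p) for p in entry.prefixes):
            if entry.etag not in tags:
                tags.append(entry.etag)
    return tags


cache = [
    CacheEntry("http://bar.example.net/foo?p=1", '"abc"',
               prefixes=["http://bar.example.net/foo?"]),
    CacheEntry("http://bar.example.net/other", '"xyz"'),
]
tags = cluster_etags("http://bar.example.net/foo?p=2", cache)
headers = {"If-None-Match": ", ".join(tags), "A-IM": "vcdiff"}
```

With the cache contents shown, the generated header fields correspond to the example request above. The full eligibility rules, including the ordering constraint on cache entries, are given in section 5.4.
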
386 In order to use this approach to clustering, we need to impose one
387 important constraint. HTTP/1.1 requires so-called ``strong'' entity
388 tags to be unique for a given URI, but does not impose any broader
389 requirements on the uniqueness of entity tags. However, if a server
390 sends a ``DCluster'' header, this implies that the entity tag in the
391 response is unique not only for the Request-URI, but also for all
392 URIs for which the string given by ``DCluster'' is a prefix.
393
394 We call this set of URIs the ``uniqueness scope'' of the entity tag.
395 Note that a response might carry multiple ``DCluster'' header fields
396 (or, by the basic HTTP syntax rules, one such header field with a
397 comma-separated list of prefix strings). This means that the
404 uniqueness scope is the union of the scopes specified by the set of
405 prefixes, plus the original Request-URI. Because the URI in a
406 ``DCluster'' header field can be an absolute URI (i.e., contain a
407 host name), a uniqueness scope can span multiple servers.
408 Presumably, these servers have some out-of-band means to maintain the
409 uniqueness property.
410
411 A client making a request may have cache entries for many different
412 resources in the uniqueness scope of the Request-URI. This is
413 another situation where the ability of ``If-None-Match'' to carry
414 multiple entity tags is employed. Abstractly, when the client makes
415 a request for which it wants a delta-encoded response, it finds all
416 of its cache entries in the same uniqueness scope, then sends the
417 entity tags for these cache entries in an ``If-None-Match'' header.
418
419 It would not make sense to have an extremely broad uniqueness scope
420 (i.e., one that includes large numbers of resources), because this
421 would imply that a client that has cache entries for many of those
422 files would send lots of entity-tags in its request for a delta.
423 This would bloat the request message, obviating the transfer-time
424 reduction of the delta encoding. Therefore, in actual use, the
425 ``DCluster'' header field value should represent not the entire
426 uniqueness scope, but a subset of the uniqueness scope that is most
427 likely to result in small deltas.
428
429 Client implementations, however, should be prepared to prune their
430 ``If-None-Match'' headers in case a server inadvertently (or
431 maliciously) specifies an over-broad uniqueness scope.
432
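One possible pruning policy is sketched here in Python. The limits `max_tags` and `max_bytes`, the `(etag, received_at)` pair layout, and the preference for the most recently received entries are all assumptions made for illustration; the specification does not mandate any particular policy.

```python
def prune_etags(candidates, max_tags=8, max_bytes=256):
    """Keep the most recently received entity tags that fit a size budget.

    candidates is a list of (etag, received_at) pairs; larger received_at
    means the cache entry arrived more recently.
    """
    chosen, used = [], 0
    for etag, _ in sorted(candidates, key=lambda c: c[1], reverse=True):
        # Every tag after the first also costs a ", " separator.
        cost = len(etag) + (2 if chosen else 0)
        if len(chosen) >= max_tags or used + cost > max_bytes:
            break
        chosen.append(etag)
        used += cost
    return chosen


recent = prune_etags([('"a"', 1), ('"b"', 3), ('"c"', 2)], max_tags=2)
```
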
433 Server implementations that support clustering should minimize the
434 length of the entity tags that they generate, consistent with the
435 other requirements for entity tags, since the effect of overlong
436 entity tags on request-header size is potentially multiplied many
437 times by the use of clustering.
438
439 Note that the ``DCluster'' header can be used in a potential spoofing
440 attack. This attack, and defenses against it, are discussed in
441 section 6.1.
442
443
444 4 Use of templates
445
446 The model of delta encoding outlined so far requires the server to
447 compute a delta between the current instance of the resource and some
448 previous instance of that resource, or (if clustering is used) a
449 previous instance of some other resource. This means that the base
450 instance is, in effect, a moving target, since we do not want to
451 require servers or clients to retain old instances for indefinite
452 periods.
453
461 Douglis et al. describe an approach to dynamically-generated
462 documents in which the document is broken down into separate static
463 and dynamic parts [5]. The static part is a macro with unbound
464 variables, and the dynamic part is a set of bindings between
465 variables and specific values. In their mechanism, the client
466 retains the static part, called a ``template'', in its cache. It
467 repeatedly requests, as needed, a new instance of the dynamic part,
468 and then reevaluates the template macro, with its variables bound as
469 specified in the dynamic part, in order to generate the current
470 instance of the entire document. Their macro language is an
471 extension to HTML, although other languages (such as Java) might be
472 just as suitable.
473
474 The WebExpress project [7] adopted the concept of a designated ``base
475 object'', which is nearly identical to the template concept described
476 here. WebExpress included a mechanism for ``rebasing'' a client
477 (providing it with a new base object). The primary difference
478 between the WebExpress approach and our approach is the time at which
479 a client discovers the identity of a (possibly new) template.
480
481 We can apply a similar template-based mechanism to substantially
482 simplify the use of delta encoding. In this approach, the server
483 ``computes'' the delta between the current instance of a resource,
484 and a separately-identified template resource. (Depending on the
485 encoding format, it might be possible to generate the delta directly,
486 rather than generating the current instance and then computing a
487 delta.) The client then applies the delta to the template resource,
488 rather than to a previous instance of the requested resource.
489
490 Since this approach avoids the need to retain old instances of the
491 dynamic resource at either the client or the server, it greatly
492 simplifies the implementation and optimization of base instance
493 management at both client and server. However, it requires a new
494 mechanism to inform the client of the appropriate template resource,
495 and its success may depend on the proper construction of the
496 template.
497
498 To support template-based deltas, therefore, we define a new response
499 header that the origin server uses as a ``hint'' to inform a client
500 of the URI of the template resource. For example, if the client
501 request is
502
503 GET /foo.html HTTP/1.1
504 Host: bar.example.net
505 A-IM: vcdiff
506
507 the server might send:
508
509 HTTP/1.1 200 OK
510 Date: Sun, 06 Nov 1994 08:49:37 GMT
518 Etag: "abc"
519 DTemplate: "http://bar.example.net/foo.tplt"
520
521 The implication of the DTemplate header is that, on subsequent
522 requests for http://bar.example.net/foo.html, the client should ask
523 for a delta between http://bar.example.net/foo.tplt and the current
524 instance. This means, of course, that the client would first have
525 obtained and cached an instance of http://bar.example.net/foo.tplt.
526 The client might retrieve the template either on demand (i.e., just
527 before making the new request for foo.html), or during an otherwise
528 idle moment, or not at all (since the use of deltas is fully
529 optional).
530
531 The DTemplate header implies that the specified URL is within the
532 uniqueness scope of the Request-URI (or else it would not be
533 meaningful to ask for a delta between the template and the
534 Request-URI). For example, if the client requests the template:
535
536 GET /foo.tplt HTTP/1.1
537 Host: bar.example.net
538
539 and receives the response:
540
541 HTTP/1.1 200 OK
542 Date: Sun, 06 Nov 1994 08:49:47 GMT
543 Etag: "pqr"
544
545 then the client can make a subsequent request for foo.html as:
546
547 GET /foo.html HTTP/1.1
548 Host: bar.example.net
549 If-None-Match: "pqr"
550 A-IM: vcdiff
551
552 Alternatively, the DTemplate header field can be used to specify that
553 a specific instance of a resource (rather than any available
554 instance) be used as a template, by including an entity tag in the
555 header field. For example:
556
557 HTTP/1.1 200 OK
558 Date: Sun, 06 Nov 1994 08:49:37 GMT
559 Etag: "abc"
560 DTemplate: "http://bar.example.net/foo.tplt"/etag="pqr"
561
562 This form of the header further simplifies the instance-management
563 problem, by eliminating any ambiguity about which instances are worth
564 saving. It might, however, reduce the possibilities for delta
565 encoding.
566
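The client behaviour for both forms of the DTemplate header can be sketched in Python. The `TemplateHint` type, the function name, and the representation of the cache as a dictionary from absolute URL to entity tag are hypothetical; the sketch also assumes that a client with no usable template instance simply falls back to an ordinary request.

```python
from typing import NamedTuple, Optional


class TemplateHint(NamedTuple):
    uri: str                     # template URI taken from a DTemplate header
    etag: Optional[str] = None   # set only when the /etag= form was used


def delta_request_headers(host, hint, cache):
    """Build request header fields for a resource that has a template hint.

    cache maps an absolute URL to the entity tag of the cached instance.
    """
    headers = {"Host": host}
    cached = cache.get(hint.uri)
    # Ask for a delta only if the template is cached and, when the hint
    # names a specific instance, the cached instance is that one.
    if cached is not None and hint.etag in (None, cached):
        headers["If-None-Match"] = cached
        headers["A-IM"] = "vcdiff"
    return headers


cache = {"http://bar.example.net/foo.tplt": '"pqr"'}
any_instance = delta_request_headers(
    "bar.example.net", TemplateHint("http://bar.example.net/foo.tplt"), cache)
wrong_instance = delta_request_headers(
    "bar.example.net", TemplateHint("http://bar.example.net/foo.tplt", '"old"'), cache)
```
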
567 Finally, the DTemplate and DCluster headers can be combined. For
568 example:
569
575 HTTP/1.1 200 OK
576 Date: Sun, 06 Nov 1994 08:49:37 GMT
577 Etag: "abc"
578 DTemplate: "http://bar.example.net/foo.tplt"
579 DCluster: "//bar.example.net/foo?"
580
581 This means that for any Request-URI matching the prefix specified in
582 the DCluster header field, the URI specified in the DTemplate field
583 is an appropriate template.
584
585 Note that an origin server need not send a DTemplate
586 header field on every response; doing so could waste network
587 bandwidth, if the recipient is not delta-capable. Instead, the
588 server should employ heuristics to decide whether to send this header
589 field. For example, it might be worth sending it whenever the
590 client's request message indicates its willingness to accept a
591 delta-encoded response, and when the If-None-Match field in the
592 request does not already specify the entity-tag of the template
593 resource.
594
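The example heuristic in the preceding paragraph might look like this in Python. The function name and the default set of delta-coding names are assumptions; only "vcdiff" appears in the examples in this document, and a real server could apply quite different heuristics.

```python
def should_send_dtemplate(a_im, if_none_match, template_etag,
                          delta_codings=("vcdiff",)):
    """Decide whether a DTemplate header field is worth sending.

    a_im and if_none_match are lists of the tokens and entity tags taken
    from the request; template_etag is the current tag of the template.
    """
    accepts_delta = any(coding in delta_codings for coding in a_im)
    already_has_template = template_etag in if_none_match
    return accepts_delta and not already_has_template


first_visit = should_send_dtemplate(["vcdiff"], [], '"pqr"')
has_template = should_send_dtemplate(["vcdiff"], ['"pqr"'], '"pqr"')
```
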
595
596 5 Specification
597
598 In this specification, the key words "MUST", "MUST NOT", "SHOULD",
599 "SHOULD NOT", and "MAY" are to be interpreted as described in
600 RFC2119 [4].
601
602 5.1 Modified basic requirements for delta-encoded responses
603 The basic requirements for delta-encoded responses, specified in [8],
604 are modified for servers that support the DCluster and/or DTemplate
605 header fields.
606
607 A server MAY send a delta-encoded response if:
608
609 1. The server would be able to send a 200 (OK) response for
610 the request.
611
612 2. The client's request includes an A-IM header field listing
613 at least one delta-coding.
614
615 3. The client's request includes an If-None-Match header
616 field listing at least one valid entity tag for an
617 instance (a "base instance") of at least one of:
618
619 a. the Request-URI.
620
621 b. a different URI within the uniqueness scope of the
622 Request-URI.
623
624 c. a URI that matches a dt-uri in a DTemplate
625 header field that was sent in a response for a URI
626 within the uniqueness scope of the Request-URI.
631
632 XXX Anything else?
633
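A server-side check of the three conditions listed above can be sketched in Python. The function name, the `base_instances` mapping, and the default delta-coding set are hypothetical. The mapping is assumed to contain only entity tags of instances that the server still holds and that satisfy condition 3 (a, b, or c).

```python
def usable_base_instance(can_send_200, a_im, if_none_match, base_instances,
                         delta_codings=("vcdiff",)):
    """Return an entity tag the server may delta against, or None.

    base_instances maps entity tag -> URI for instances of the Request-URI,
    of other URIs in its uniqueness scope, or of template resources.
    """
    if not can_send_200:                                   # condition 1
        return None
    if not any(coding in delta_codings for coding in a_im):  # condition 2
        return None
    for tag in if_none_match:                              # condition 3
        if tag in base_instances:
            return tag
    return None


held = {'"abc"': "http://bar.example.net/foo?p=1"}
base = usable_base_instance(True, ["vcdiff"], ['"zzz"', '"abc"'], held)
```

A result of None means only that a delta-encoded response is not permitted by these conditions; the server still answers the request normally.
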
634 5.2 Modified header specifications
635 One of the headers defined in the specification for delta
636 encoding [8] has a slightly different meaning when delta clustering
637 or delta templates are used.
638
639 5.2.1 A-IM
640 When an A-IM request-header field includes one or more delta-coding
641 values, the request MUST contain an If-None-Match header field,
642 listing one or more entity tags from URIs in the uniqueness scope of
643 an entity tag from a prior response for the request-URI.
644
645 Section 5.4 defines rules that a client uses for determining the set
646 of base instances in the uniqueness scope of a request-URI.
647
648 5.3 New header specifications
649 The following headers are defined, for use as entity-headers. (Due
650 to the terminological confusion discussed in [8], some entity-headers
651 are more properly associated with instances than with entities.)
652
653 5.3.1 DCluster
654 The DCluster entity-header field is used in a response to specify a
655 subset of the uniqueness scope of the entity tag given in the Etag
656 header field of the response. The uniqueness scope is the set of
657 URIs across which this strong entity tag is guaranteed to be unique,
658 for all time. A uniqueness scope is specified by providing one or
659 more prefixes for other URIs in the set.
660
661 DCluster = "DCluster" ":" #( <"> uri-prefix <">)
662 uri-prefix = scheme ":" "//" host [ ":" port ] [ abs_path ]
663 | abs_path
664 | rel_path
665
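Extracting the uri-prefix strings from a DCluster field value can be sketched with a regular expression. This Python fragment assumes that a uri-prefix never contains a double-quote character and that multiple DCluster header fields have already been joined with commas; the function name is hypothetical.

```python
import re


def parse_dcluster(value):
    """Return the quoted uri-prefix strings in a DCluster field value."""
    return re.findall(r'"([^"]*)"', value)


prefixes = parse_dcluster('"//bar.example.net/foo?", "/baz/"')
```
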
666 If the uri-prefix is an abs_path or rel_path, the implied scheme is
667 the scheme used in the Request-URI. (Typically, the scheme would be
668 "http".) If the uri-prefix is an abs_path, it is interpreted
669 relative to the origin server host name. If the uri-prefix is a
670 rel_path, it is interpreted relative to the Request-URI.
671
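The resolution rules above can be sketched in Python as follows. The function name is hypothetical. The sketch also accepts the "//host/path" form used in the examples of section 3, which is not covered by the grammar above; treating it as inheriting the scheme of the Request-URI is an assumption. Plain string assembly is used so that a trailing "?" in a prefix is preserved.

```python
import re
from urllib.parse import urlsplit

_HAS_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*://")


def resolve_uri_prefix(uri_prefix, request_uri):
    """Turn a DCluster uri-prefix into an absolute prefix string."""
    req = urlsplit(request_uri)
    if _HAS_SCHEME.match(uri_prefix):      # already absolute
        return uri_prefix
    if uri_prefix.startswith("//"):        # assumed: scheme taken from the request
        return f"{req.scheme}:{uri_prefix}"
    if uri_prefix.startswith("/"):         # abs_path: relative to the origin host
        return f"{req.scheme}://{req.netloc}{uri_prefix}"
    # rel_path: relative to the Request-URI, i.e. to its last path segment
    base_dir = req.path.rsplit("/", 1)[0] + "/"
    return f"{req.scheme}://{req.netloc}{base_dir}{uri_prefix}"


resolved = resolve_uri_prefix("/foo?", "http://bar.example.net/foo?p=1")
```
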
672 The uniqueness scope of a strong entity tag in an ETag header field
673 always includes the Request-URI of the corresponding request, and the
674 union of all URIs matching one or more of the uri-prefix strings in
675 the DCluster header field of the response. It may include other URIs
676 not described in a DCluster header field. That is, the set of URIs
677 for which a uri-prefix in a DCluster header field is a prefix MUST be
678 a subset of the uniqueness scope, and MAY be a proper subset.
679
680 Generally, the DCluster header does not necessarily describe the
681 entire uniqueness scope of an entity tag. Rather, it describes a
682 subset of the uniqueness scope whose members are likely to differ by
683 small deltas.
688
689 A server SHOULD NOT include a uri-prefix in a DCluster header field
690 if the server is not likely to be able to generate deltas between the
691 Request-URI and the URIs matching that uri-prefix.
692
693 The uniqueness scope specified by a DCluster header is valid for use
694 by the client only for entity tags received in the same response or
695 in subsequent responses, never for entity tags received in previous
696 responses.
697
698 Section 5.4 defines rules that a client uses for determining the set
699 of base instances in the uniqueness scope of a request-URI.
700
701 5.3.2 DTemplate
702 The DTemplate entity-header field is used in a response to specify
703 another resource that the origin server prefers to use as the base
704 instance for computing deltas for the Request-URI, or for other
705 resources in the uniqueness scope specified by a DCluster header
706 field in the response.
707
708 DTemplate = "DTemplate" ":"
709 #( <"> dt-uri <"> [ "/" dt-param])
710 dt-uri = absoluteURI | abs_path
711 dt-param = "etag" "=" entity-tag
712
713 If the dt-uri is an abs_path, it is interpreted relative to the
714 origin server host name.
715
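A DTemplate field value can be split into (dt-uri, entity-tag) pairs with a regular expression. This Python sketch assumes that neither the dt-uri nor the opaque part of the entity tag contains a double-quote character; the function name is hypothetical, and the entity tag is returned with its quotes, or as None when no dt-param is present.

```python
import re

# One list element: a quoted dt-uri, optionally followed by /etag=<entity-tag>.
_DT_ITEM = re.compile(r'"([^"]*)"(?:\s*/\s*etag\s*=\s*((?:W/)?"[^"]*"))?')


def parse_dtemplate(value):
    """Return a list of (dt_uri, etag_or_None) pairs from a DTemplate value."""
    return [(m.group(1), m.group(2)) for m in _DT_ITEM.finditer(value)]


pinned = parse_dtemplate('"http://bar.example.net/foo.tplt"/etag="pqr"')
```
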
716 A URI specified in a DTemplate header field is, by definition, in the
717 uniqueness scope of the Request-URI.
718
719 If a client has received a DTemplate header field within a given
720 uniqueness scope, the client SHOULD use an instance of the specified
721 template resource(s) as the base instance for any future delta
722 requests for other resources in the uniqueness scope.
723
724 If the DTemplate header field includes an entity tag with a URI, then
725 the client SHOULD use only the specified instance of the template
726 resource as the base instance for any future delta requests for other
727 resources in the uniqueness scope.
728
729 The URI specified by a DTemplate header is valid for use by the
730 client only with entity tags received in the same response or in
731 subsequent responses, never for use with entity tags received in
732 previous responses.
733
734 5.4 Rules for determining base instances in a uniqueness scope
735 When a client is about to make a request for a given Request-URI, and
736 wishes to choose entity tags to include in the request's If-None-Match header
737 field, it follows a set of rules to determine which base instances
738 (and hence, which entity tags) may be included. These rules do not
739 require the client to include any entity tags, and for reasons of
746 performance, a client implementation should not necessarily include
747 all of the legal choices.
748
749 Recall that the uniqueness scope of an entity tag is the set of
750 resources across which this entity tag is unique for all time. In
751 other words, if the client and server correctly agree that the
752 Request-URI is contained in the uniqueness scope for an entity tag E
753 for some URI X, then if the client sends this entity tag E in an
754 If-None-Match header field, the server will know unambiguously which
755 resource it refers to (even though X is not explicitly named in the
756 request).
757
758 The client's view of the uniqueness scope of an entity tag might be a
759 subset of the server's view. (It cannot be a superset, or the server
760 would be unable to interpret the If-None-Match field.) For example,
761 a server might not list all possible uri-prefix values in a DCluster
762 header, for performance reasons, or the client might not support the
763 DTemplate header. A client probably will not have received responses
764 for more than a small subset of the URIs in a uniqueness scope, or it
765 might have deleted some of the instances in order to create space in
766 its cache. A client SHOULD NOT list an entity tag in an
767 If-None-Match header unless it has a cache entry containing at least
768 part of the corresponding instance, since this would otherwise lead
769 to uninterpretable delta responses.
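
A minimal sketch of the cache-entry rule above, assuming a hypothetical
client that keeps the set of entity tags for which it still holds a
cached instance. The function name and the default "vcdiff" value are
illustrative choices, not requirements of this specification.

```python
def delta_request_headers(candidate_etags, cached_etags, a_im="vcdiff"):
    """Build delta-related request headers (hypothetical helper).

    Only entity tags with a cache entry are listed, since a delta
    against an instance the client no longer holds cannot be applied.
    """
    usable = [etag for etag in candidate_etags if etag in cached_etags]
    headers = {}
    if usable:
        headers["If-None-Match"] = ", ".join(usable)
        headers["A-IM"] = a_im
    return headers
```

The sketch omits both header fields when no usable base instance
remains; a real client might still send A-IM for other instance
manipulations.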
770
771 A Request-URI is in the uniqueness scope of an entity tag E for an
772 instance of URI X if one or more of these conditions holds:
773
774 1. X is the Request-URI.
775
776 2. The DCluster header field of a prior response for the
777 Request-URI includes a prefix of X. The base instance
778 associated with entity tag E MUST NOT have been received
779 before the first such DCluster header field.
780
781 3. The DCluster header field of a prior response for X
782 includes a prefix of the Request-URI. The base instance
783 associated with entity tag E MUST NOT have been received
784 before the first such DCluster header field.
785
786 4. X has been listed in the DTemplate header field of a prior
787 response for the Request-URI, or of a prior response for
788 another URI Y in the uniqueness scope of the Request-URI
789 (by recursive application of these conditions to an
790 instance of URI Y).
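
The four conditions can be sketched in Python as shown here. This is
only one possible reading, under loudly stated assumptions: the client
numbers responses with a counter ("seq"), a uri-prefix matches by plain
string prefix, and condition 4 recurses only one level (through URIs Y
that satisfy conditions 1-3). All names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ClientState:
    """Hypothetical client bookkeeping; seq is a response counter."""
    dcluster: dict = field(default_factory=dict)   # uri -> [(seq, uri_prefix)]
    dtemplate: dict = field(default_factory=dict)  # uri -> [(seq, template_uri)]
    received: dict = field(default_factory=dict)   # (uri, etag) -> seq

def _first_cluster_seq(state, header_uri, target_uri):
    """Seq of the first DCluster header received for header_uri that
    lists a prefix of target_uri, or None if there is none."""
    seqs = [seq for seq, prefix in state.dcluster.get(header_uri, [])
            if target_uri.startswith(prefix)]
    return min(seqs) if seqs else None

def in_scope_direct(state, request_uri, x, etag, allow_condition_3=False):
    """Conditions 1-3 for entity tag `etag` of an instance of URI `x`."""
    got = state.received.get((x, etag))
    if got is None:                  # no cached instance to offer
        return False
    if x == request_uri:             # condition 1
        return True
    first = _first_cluster_seq(state, request_uri, x)       # condition 2
    if first is not None and got >= first:
        return True
    if allow_condition_3:            # only with spoofing protection
        first = _first_cluster_seq(state, x, request_uri)   # condition 3
        if first is not None and got >= first:
            return True
    return False

def in_scope(state, request_uri, x, etag, allow_condition_3=False):
    """Conditions 1-4, with one level of recursion for condition 4."""
    if in_scope_direct(state, request_uri, x, etag, allow_condition_3):
        return True
    got = state.received.get((x, etag))
    if got is None:
        return False
    for y, listed in state.dtemplate.items():
        # Y is the Request-URI itself, or is in scope via conditions 1-3.
        y_ok = (y == request_uri) or any(
            in_scope_direct(state, request_uri, uri, tag, allow_condition_3)
            for (uri, tag) in state.received if uri == y)
        # The instance of x must not predate the DTemplate header.
        if y_ok and any(t == x and got >= seq for seq, t in listed):
            return True
    return False
```

Condition 3 is disabled by default in this sketch because a client may
apply it only when it can verify that the resulting delta is not
spoofed.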
791
792 XXX Is this unambiguous?
793
794 Security considerations (see section 6.1) require that a client not
795 always trust every DCluster header that it receives. A malicious
796 server might send a DCluster header that could cause the client to
803 believe that a URI is within the uniqueness scope of an entity tag
804 when, in fact, it is not. Therefore, a client MUST NOT use condition
805 #3 above (DCluster of a prior response for X includes prefix of
806 Request-URI) unless it can securely verify that a resulting delta is
807 not spoofed.
808
809 Our current belief is that spoofing can be detected by any one of the
810 following means:
811
812 - The delta-encoded response is accompanied by a secure
813 message digest covering the entire current instance,
814 generated by the origin server. This allows the client to
815 verify that it has received the current instance of the
816 Request-URI.
817
818 - All of the URIs in the uniqueness scope of the Request-URI
819 have the same "hostport" as the Request-URI; see
820 RFC2396 [3] for the specification of this term. This
821 ensures that, if no interception mechanism is in use,
822 the client receives what the server wishes it to receive.
823 (In general, malicious interception mechanisms create
824 broader risks than the spoofing of deltas.)
825
826 - All of the base instances associated with the entity tags
827 listed in the client's If-None-Match header came from URIs listed in
828 DCluster or DTemplate headers in responses for prior
829 Request-URIs having the same "hostport" as the current
830 Request-URI. This ensures that the chosen base instances
831 came from origin servers trusted by the origin server for
832 the current Request-URI.
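
The "same hostport" test in the second item could be sketched as below.
The helper names are hypothetical, and treating an absent port as the
scheme's default port (80 for http) is an assumption of this sketch
rather than something the text above states.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}  # assumption for this sketch

def hostport(uri):
    """Return (host, port) for a URI; the host is lower-cased by urlsplit."""
    parts = urlsplit(uri)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.hostname, port)

def scope_is_single_hostport(request_uri, scope_uris):
    """True if every URI in the client's view of the uniqueness scope
    has the same hostport as the Request-URI."""
    target = hostport(request_uri)
    return all(hostport(uri) == target for uri in scope_uris)
```

A client might apply condition #3 only when this check succeeds or
when one of the other listed protections is available.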
833
834 Note: the spoofing detection mechanisms listed above should be
835 reviewed by competent security experts.
836
837
838 6 Security Considerations
839
840 Note: This aspect of the specification is the subject of some
841 controversy, and the details of protections against spoofing
842 attacks in particular are likely to change. We will seek a
843 more formal security review of this specification as part of
844 the IETF standardization process.
845
846 6.1 Spoofing attacks using the DCluster header
847 We have identified a potential spoofing attack via the ``DCluster''
848 header. In this scenario, a malicious server (e.g.,
849 malicious.example.org) generates a response (e.g., for
850 http://malicious.example.org/trap.html) with a ``DCluster'' header
851 indicating that the uniqueness scope of the entity tag in the
852 response includes another server (e.g., victim.example.com). Suppose
853 that the response includes the entity tag "abc". Now suppose that
854 the client makes this request:
859
860 GET /foo.html HTTP/1.1
861 Host: victim.example.com
862 If-None-Match: "abc"
863 A-IM: vcdiff
864
865 If the victim.example.com server does actually have an instance with
866 entity tag "abc", either for http://victim.example.com/foo.html or
867 for a resource that really is in the same uniqueness scope, then the
868 server will generate a delta. However, if the client applies this
869 delta to the cached response for
870 http://malicious.example.org/trap.html, it will end up either with
871 garbage, or (more perniciously) with an apparently genuine result
872 that actually contains bogus information inserted by
873 malicious.example.org. (The response for
874 http://malicious.example.org/trap.html might contain the bogus
875 information concealed in HTML comments.)
876
877 Protection against this attack can be accomplished by the use of
878 end-to-end digests on the instances, as described in another
879 proposal [11]. (Message digests, such as provided by ``Content-MD5''
880 or by Digest Authentication, are not sufficient, since none of the
881 individual messages are tampered with in this attack.)
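
A minimal sketch of the end-to-end check: after applying the delta,
the client recomputes a digest over the reconstructed instance and
compares it with the value supplied by the origin server. The function
name is hypothetical, and both the base64 encoding of the digest value
and the choice of algorithm are assumptions of this sketch; the actual
header syntax is defined by the instance digest proposal [11].

```python
import base64
import hashlib
import hmac

def instance_digest_ok(reconstructed: bytes, algorithm: str,
                       expected_b64: str) -> bool:
    """Compare a digest of the reconstructed instance with the
    server-supplied value (assumed here to be base64-encoded)."""
    digest = hashlib.new(algorithm, reconstructed).digest()
    actual = base64.b64encode(digest).decode("ascii")
    # Constant-time comparison; not strictly required for this check.
    return hmac.compare_digest(actual, expected_b64)
```

If the check fails, the client has applied the delta to the wrong base
instance (or the base was bogus) and should not use the result.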
882
883 Note that protection against spoofing via the ``DCluster'' header
884 does not inherently require a keyed digest. Since the delta encoded
885 response for http://victim.example.com/foo.html is not itself
886 generated by malicious.example.org, an end-to-end digest included
887 with this response by victim.example.com is sufficient to prove that
888 the client's reconstruction of foo.html is correct. However, if
889 message tampering is also a possibility, then the server should also
890 provide a keyed message digest.
891
892 Another defense against such an attack is for the client to ignore a
893 ``DCluster'' header that specifies a different server. However, this
894 defense is only effective if servers that generate delta-encoded
895 responses are not shared among multiple, possibly mutually
896 untrustworthy, content providers. It also reduces the potential
897 effectiveness of clustering, especially for large sites split across
898 multiple servers.
899
900 Note that because the DTemplate header field also adds one or more
901 URIs to the uniqueness scope of an entity tag, the same spoofing
902 attack is possible using the DTemplate header, and the same defenses
903 apply.
904
905 We recommend that if a client receives a delta-encoded response
906 without an accompanying Digest, and if the client's view of the
907 uniqueness scope for the Request-URI includes more than one server
908 hostname, then the response should either be discarded, or presented
909 to the user as potentially corrupt.
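
This recommendation could be expressed as a simple client-side
predicate. The function name and arguments are hypothetical, and the
scope is represented here as a plain collection of absolute URIs.

```python
from urllib.parse import urlsplit

def delta_response_is_suspect(has_digest: bool, scope_uris) -> bool:
    """True if a delta-encoded response lacks a Digest while the
    client's view of the uniqueness scope spans several host names."""
    hosts = {urlsplit(uri).hostname for uri in scope_uris}
    return (not has_digest) and len(hosts) > 1
```

A suspect response would then be discarded or presented to the user as
potentially corrupt.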
910
917 6.2 Privacy attacks using the DCluster header
918 Many people have drawn attention to the privacy risks associated with
919 HTTP Cookies, which allow a site (or group of cooperating sites) to
920 track the activity of a user. More recently, Martin Pool has
921 identified a similar tracking mechanism based on cache validators,
922 especially entity tags [12]. In this attack, a site encodes
923 user-specific information in an entity tag, and then tracks repeated
924 requests by that user to the same resource, as the user's browser
925 attempts to validate its cache entry using that entity tag.
926
927 Although this tracks only the requests for a specific resource (URL),
928 a site can indirectly track references to many other pages by
929 embedding an image reference to the tracked URL on each of those
930 pages.
931
932 Just as with Cookies, the entity-tag tracking mechanism depends upon
933 the server's ability to induce the client to send back a specific
934 string on subsequent requests. However, the basic entity-tag
935 tracking mechanism only allows a site to track access to pages that
936 it controls.
937
938 The ``DCluster'' header field specified in this document makes this
939 tracking mechanism more powerful, by allowing one site to gain access
940 to entity tags from many other sites. For example, suppose that the
941 site evil.example.com knows the format used to encode client-specific
942 information in entity tags issued by the site naive.example.com. Any
943 client who visits http://evil.example.com/home.html and receives a
944
945 DCluster: http://naive.example.com/
946
947 header in response might then later make a delta-capable request to
948 evil.example.com that includes entity tags issued by
949 naive.example.com.
950
951 It might be possible to defend against such ``hijacked'' tracking
952 attacks by choosing a cryptographically strong encoding for the
953 client-specific data hidden in entity tags, but this might not always
954 be feasible. In any event, this could not hide from evil.example.com
955 the fact that the client had at some point visited naive.example.com
956 (which could be significant if this site provided, for example,
957 medical information about an embarrassing disease).
958
959 Cryptographic digests of instances, as described in section 6.1 to
960 protect against DCluster spoofing, do not help, because the malicious
961 site in this case is the source of the requested data, and need not
962 actually use a delta encoding to accomplish its attack.
963
964 As in section 6.1, one possible defense is for the client to ignore a
965 ``DCluster'' header that specifies a different server, but (also as
966 discussed in section 6.1) this is not ideal.
967
974 User agents SHOULD provide a method to allow users to disable the use
975 of the ``DCluster'' header, preferably either in all cases, or in
976 cross-site cases.
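
One way a user agent might implement such a setting is sketched below.
The policy names ("all", "same-site", "none") and the function name
are hypothetical, and "cross-site" is interpreted here as "names a
host other than that of the Request-URI", which is only one possible
reading.

```python
from urllib.parse import urljoin, urlsplit

def filter_dcluster_prefixes(request_uri, prefixes, policy="same-site"):
    """Apply a user-selected policy to the uri-prefix values of a
    DCluster header and return the prefixes the client will honor."""
    if policy == "all":
        return list(prefixes)
    if policy == "none":
        return []
    if policy != "same-site":
        raise ValueError("unknown policy: " + policy)
    host = urlsplit(request_uri).hostname
    # Resolve each prefix first so that a path-only prefix counts as
    # belonging to the Request-URI's own host.
    return [p for p in prefixes
            if urlsplit(urljoin(request_uri, p)).hostname == host]
```

With the "same-site" policy, the DCluster header in the example above
(naming http://naive.example.com/ in a response from evil.example.com)
would be ignored.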
977
978 6.3 Data leakage attacks using the DCluster header
979 Suppose that a server has asserted, using a DCluster header, that
980 resources URL1 and URL2 are in the same uniqueness scope. Also
981 suppose that a client is allowed to access URL1, but is not allowed
982 to access URL2. (Access may be denied due to a lack of
983 authentication, or a server configuration setting, or some other
984 mechanism.) Finally, suppose that the client can guess or obtain the
985 entity tag ET2 of some instance of URL2. If the client asks the
986 server for the current instance of URL1 as a delta from the ET2
987 instance of URL2, and the server responds with such a delta, this may
988 reveal information about the contents of URL2. (The amount of
989 information revealed depends strongly on the delta-coding format, and
990 probably will not be enough to recover the full contents of URL2.)
991
992 A server MUST NOT reply using a delta encoding, if the chosen base
993 instance is not an instance of the Request-URI, unless the server can
994 verify that the client would currently be allowed access to both the
995 chosen base instance and the Request-URI.
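
The server-side rule can be sketched as a guard evaluated before a
delta is generated. The function name and the client_may_access
callback are hypothetical; how a server actually evaluates access
(authentication, configuration, and so on) is outside this sketch.

```python
from typing import Callable

def may_send_delta(request_uri: str, base_uri: str,
                   client_may_access: Callable[[str], bool]) -> bool:
    """Return False when this rule forbids a delta-encoded reply."""
    if base_uri == request_uri:
        # The base is an instance of the Request-URI; this rule does
        # not apply, and normal access checks cover the request itself.
        return True
    # Cross-resource delta: the client must currently be allowed to
    # access both the base instance's resource and the Request-URI.
    return client_may_access(base_uri) and client_may_access(request_uri)
```

When the guard returns False, the server can still send the full
current instance of the Request-URI, subject to its normal access
checks.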
996
997
998 7 History
999
1000 7.1 draft-mogul-http-dcluster-00.txt
1001 This document was split off from draft-mogul-http-delta-*.txt, to
1002 avoid having the security issues affect the basic HTTP delta encoding
1003 specification, and to ensure that the design of clusters and
1004 templates was done so that they are entirely optional for
1005 implementors of basic delta encoding.
1006
1007
1008 8 Acknowledgements
1009
1010 Andrew Birrell alerted us to the possibility of data leakage attacks
1011 using the DCluster header. Koen Holtman contributed to the drafting
1012 of this document, and especially to the security considerations and
1013 mechanisms.
1014
1015
1016 9 References
1017
1018 NOTE TO RFC EDITOR: many of the references here might be out of date.
1019 Please verify these with the primary author of this Internet-Draft
1020 before issuing this document as an RFC.
1021
1022 1. Gaurav Banga, Fred Douglis, and Michael Rabinovich. Optimistic
1023 Deltas for WWW Latency Reduction. Proc. 1997 USENIX Technical
1024 Conference, Anaheim, CA, January, 1997, pp. 289-303.
1025
1031 2. T. Berners-Lee, R. Fielding, and H. Frystyk. Hypertext Transfer
1032 Protocol -- HTTP/1.0. RFC 1945, HTTP Working Group, May, 1996.
1033
1034 3. T. Berners-Lee, R. Fielding, and L. Masinter. Uniform Resource
1035 Identifiers (URI): Generic Syntax. RFC 2396, IETF, August, 1998.
1036
1037 4. S. Bradner. Key words for use in RFCs to Indicate Requirement
1038 Levels. RFC 2119, Harvard University, March, 1997.
1039
1040 5. Fred Douglis, Antonio Haro, and Michael Rabinovich. HPP: HTML
1041 Macro-Preprocessing to Support Dynamic Document Caching. Proc.
1042 USENIX Symposium on Internet Technologies and Systems, USENIX,
1043 Monterey, CA, December, 1997, pp. 83-94.
1044
1045 6. Roy T. Fielding, Jim Gettys, Jeffrey C. Mogul, Henrik Frystyk
1046 Nielsen, Larry Masinter, Paul Leach, and Tim Berners-Lee. Hypertext
1047 Transfer Protocol -- HTTP/1.1. RFC 2616, HTTP Working Group, June,
1048 1999.
1049
1050 7. Barron C. Housel and David B. Lindquist. WebExpress: A System
1051 for Optimizing Web Browsing in a Wireless Environment. Proc. 2nd
1052 Annual Intl. Conf. on Mobile Computing and Networking, ACM, Rye, New
1053 York, November, 1996, pp. 108-116.
1054 http://www.networking.ibm.com/art/artwewp.htm.
1055
1056 8. Jeffrey C. Mogul, Balachander Krishnamurthy, Fred Douglis, Anja
1057 Feldmann, Yaron Goland, and Arthur van Hoff. Delta encoding in HTTP.
1058 Internet-Draft draft-mogul-http-delta-06, IETF, August, 2000. This is
1059 a work in progress.
1060
1061 9. Jeffrey C. Mogul, Fred Douglis, Anja Feldmann, and Balachander
1062 Krishnamurthy. Potential benefits of delta encoding and data
1063 compression for HTTP. Proc. SIGCOMM '97, Cannes, France, September,
1064 1997, pp. 181-194.
1065
1066 10. Jeffrey C. Mogul, Fred Douglis, Anja Feldmann, and Balachander
1067 Krishnamurthy. Potential benefits of delta encoding and data
1068 compression for HTTP. Research Report 97/4, DECWRL, July, 1997. URL
1069 http://www.research.digital.com/wrl/techreports/abstracts/97.4.html.
1070
1071 11. Jeffrey C. Mogul and Arthur van Hoff. Instance Digests in HTTP.
1072 Internet-Draft draft-mogul-http-digest-02, IETF, March, 2000. This is
1073 a work in progress.
1074
1075 12. Martin Pool. meantime: non-consensual http user tracking using
1076 caches. http://www.linuxcare.com.au/mbp/meantime/.
1077
1078
1088 10 Authors' addresses
1089
1090 Jeffrey C. Mogul
1091 Western Research Laboratory
1092 Compaq Computer Corporation
1093 250 University Avenue
1094 Palo Alto, California, 94305, U.S.A.
1095 Email: mogul@pa.dec.com
1096 Phone: 1 650 617 3304 (email preferred)
1097
1098 Fred Douglis
1099 AT&T Labs - Research
1100 180 Park Ave, Room B-137
1101 Florham Park, NJ 07932-0971, U.S.A.
1102 Email: douglis@research.att.com
1103 Phone: 1 973 360-8775
1104
1105 Daniel M. Hellerstein
1106 Economic Research Service, USDA
1107 1909 Franwall Ave, Wheaton MD 20902
1108 E-mail: danielh@crosslink.net or webmaster@srehttp.org
1109 Phone: 1 202 694-5613 or 1 301 649-4728
1110