3 Network Working Group J. Mogul, DECWRL
4 Internet-Draft 12 September 1997
5 Expires: 26 March 1998
6
7
8 Generation of the Age header field in HTTP/1.1
9
10 draft-mogul-http-age-00.txt
11
12
13 STATUS OF THIS MEMO
14
15 This document is an Internet-Draft. Internet-Drafts are
16 working documents of the Internet Engineering Task Force
17 (IETF), its areas, and its working groups. Note that other
18 groups may also distribute working documents as
19 Internet-Drafts.
20
21 Internet-Drafts are draft documents valid for a maximum of
22 six months and may be updated, replaced, or obsoleted by
23 other documents at any time. It is inappropriate to use
24 Internet-Drafts as reference material or to cite them other
25 than as "work in progress."
26
27 To learn the current status of any Internet-Draft, please
28 check the "1id-abstracts.txt" listing contained in the
29 Internet-Drafts Shadow Directories on ftp.is.co.za
30 (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific
31 Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US
32 West Coast).
33
34 Distribution of this document is unlimited. Please send
35 comments to the HTTP working group at
36 <http-wg@cuckoo.hpl.hp.com>. Discussions of the working
37 group are archived at
38 <URL:http://www.ics.uci.edu/pub/ietf/http/>. General
39 discussions about HTTP and the applications which use HTTP
40 should take place on the <www-talk@w3.org> mailing list.
41
42
43 ABSTRACT
44
45 The 'Age' response-header field in HTTP/1.1 [RFC 2068] is
46 intended to provide a lower bound of an estimate of a
47 response message's age (time since generation), by
48 explicitly indicating the amount of time that is known to
49 have passed since the response message was retrieved or
50 revalidated. There has been considerable controversy over
51 when the Age header field should be added to a response.
52 This document explains the issues, rebuts a previous
53 proposal, and provides a set of proposed changes for the
54 revision of RFC 2068.
55
56
57
58 Mogul [Page 1]
59
60 Internet-Draft HTTP Age generation 12 September 1997 13:48
61
62
63 TABLE OF CONTENTS
64
1 Introduction
2 Proposed change to RFC2068
3 Discussion
3.1 Quantifying the error of the Age estimate: Interpretation C
3.2 Quantifying the error of the Age estimate: Interpretation B
3.3 Prevalence of clock skew in the Web
3.4 Implications of overestimating the Age value: Interpretation C
3.5 Implications of underestimating the Age value: Interpretation B
3.6 So what's all the fuss about, anyway?
4 Acknowledgements
5 References
6 Author's address
81
82
83 1 Introduction
84
85 RFC2068 [3] defines, in section 14.6, the Age header field for
86 HTTP/1.1:
87
88 The Age response-header field conveys the sender's estimate of
89 the amount of time since the response (or its revalidation) was
90 generated at the origin server. A cached response is "fresh" if
91 its age does not exceed its freshness lifetime. Age values are
92 calculated as specified in section 13.2.3.
93
94 Age = "Age" ":" age-value
95
96 age-value = delta-seconds
97
98 Age values are non-negative decimal integers, representing time
99 in seconds.
100
101 If a cache receives a value larger than the largest positive
102 integer it can represent, or if any of its age calculations
103 overflows, it MUST transmit an Age header with a value of
104 2147483648 (2^31). HTTP/1.1 caches MUST send an Age header in
105 every response. Caches SHOULD use an arithmetic type of at least
106 31 bits of range.
107
108 Roy Fielding has taken issue with the statement ``HTTP/1.1 caches
109 MUST send an Age header in every response.'' [2]. Fielding correctly
110 points out that the use of the word ``caches'' in this sentence is
111 incorrect. He lists two possible rewordings for this sentence:
112
113
119
120 a) An HTTP/1.1 server that includes a cache MUST send an Age
121 header field in every response.
122
123 b) An HTTP/1.1 server that includes a cache MUST include an Age
124 header field in every response generated from its own cache.
125
126 There is, however, a third possible rewording, if one realizes that
127 the original drafting error was to sloppily substitute the word
128 ``cache'' where ``proxy'' was meant (mea culpa):
129
130 c) An HTTP/1.1 proxy MUST send an Age header field in every
131 response.
132
133 Note that an HTTP proxy does not necessarily include a cache. The
134 other sentences referring to ``caches'' in that paragraph reflect the
135 same drafting error.
136
137
138 2 Proposed change to RFC2068
139
140 For this paragraph in section 14.6 (Age),
141
142 If a cache receives a value larger than the largest positive
143 integer it can represent, or if any of its age calculations
144 overflows, it MUST transmit an Age header with a value of
145 2147483648 (2^31). HTTP/1.1 caches MUST send an Age header in
146 every response. Caches SHOULD use an arithmetic type of at least
147 31 bits of range.
148
149 substitute
150
151 If a proxy server receives a value larger than the largest
152 positive integer it can represent, or if any of its age
153 calculations overflows, it MUST transmit an Age header with a
154 value of 2147483648 (2^31). An HTTP/1.1 proxy MUST send an Age
155 header in every response. Proxies SHOULD use an arithmetic type
156 of at least 31 bits of range to represent Age values.
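
As an illustration of the overflow rule in the proposed text, the following is a minimal sketch and not part of the proposed wording. It assumes a proxy whose largest representable Age is 2^31 - 1 (the "31 bits of range" case); the names MAX_REPRESENTABLE, AGE_OVERFLOW, and age_header are hypothetical.

```python
# Assumption: the proxy stores Age in a type with 31 bits of range.
MAX_REPRESENTABLE = 2**31 - 1
# Value the quoted paragraph requires on overflow: 2147483648 (2^31).
AGE_OVERFLOW = 2**31


def age_header(age_seconds: int) -> str:
    """Return an Age header line, substituting 2^31 when the value overflows."""
    if age_seconds < 0:
        raise ValueError("Age values are non-negative")
    if age_seconds > MAX_REPRESENTABLE:
        return f"Age: {AGE_OVERFLOW}"
    return f"Age: {age_seconds}"


print(age_header(65 * 60))
print(age_header(2**31 + 5))
```

Under these assumptions, a 65-minute age is sent as "Age: 3900", while any value beyond the representable range is sent as "Age: 2147483648".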
157
158
159 3 Discussion
160
161 Because of facts of life such as resolution granularity, clock skews,
162 and the theory of relativity, the Age value can never be exactly
163 accurate; it is always an estimate. We would like to bound the
164 inaccuracy of this estimate. Most important, we would like to ensure
165 that any inaccuracy remaining in the Age estimate not cause avoidable
166 and significant harm.
167
168
176
177 3.1 Quantifying the error of the Age estimate: Interpretation C
178 If interpretation (c) is used (i.e., the change proposed above in
179 section 2), then the value of the Age header field might overestimate
180 the actual amount of time since the response was generated at the
181 origin server. One can calculate the size of the estimation error
182 for a path containing N HTTP/1.1 proxies:
183
184 Error_C = Mean_RTT * N * (N + 1)/2
185
186 where Mean_RTT is the mean round-trip time (RTT) between neighboring
187 pairs of proxies. Note that this measures the error in the Age value
188 at the final recipient client, not the error at the last HTTP/1.1
189 proxy on the path.
190
Actually, this analysis assumes that the RTT on each hop equals
Mean_RTT. In fact, the hops closest to the origin server contribute
more to the true error than the simplified formula allows for, so
Error_C is not necessarily an upper bound on the error. If one
instead uses

Error_C_bound = Max_RTT * N * (N + 1)/2

where Max_RTT is the largest of the RTTs over all of the hops, then
Error_C_bound is an upper bound.
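
The difference between the two formulas can be sketched numerically. The per-hop weighting below is an assumption made for illustration only and is not stated in the text above: the hop nearest the origin server is taken to be counted once by each of the N proxies, the next hop by N - 1 of them, and so on. The function names and the sample RTT values are hypothetical.

```python
def weighted_error_c(hop_rtts):
    """Error estimate with per-hop RTTs listed from the origin server outward.

    Assumed model: the hop at index i (0 = nearest the origin) is counted
    by each of the n - i proxies on the client side of that hop.
    """
    n = len(hop_rtts)
    return sum(rtt * (n - i) for i, rtt in enumerate(hop_rtts))


def mean_rtt_error_c(hop_rtts):
    """Error_C = Mean_RTT * N * (N + 1)/2."""
    n = len(hop_rtts)
    return (sum(hop_rtts) / n) * n * (n + 1) / 2


def max_rtt_bound(hop_rtts):
    """Error_C_bound = Max_RTT * N * (N + 1)/2."""
    n = len(hop_rtts)
    return max(hop_rtts) * n * (n + 1) / 2


# Hypothetical path: one slow hop next to the origin server, two fast hops.
rtts = [2.0, 0.1, 0.1]
print(weighted_error_c(rtts), mean_rtt_error_c(rtts), max_rtt_bound(rtts))
```

With a slow hop next to the origin server, the weighted estimate exceeds the Mean_RTT formula but stays below the Max_RTT bound; when all hops have the same RTT, the weighted estimate and the Mean_RTT formula agree.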
201
202 In order to quantify this bound, we need parameter values for N and
203 Max_RTT. It's not clear exactly how many proxy caches are used on an
204 arbitrary path through the Internet, especially since many of these
205 paths start within large intranets, but I think it would be realistic
206 to assume that most such paths have somewhere between N=1 and N=6.
207
208 Clearly, the Max_RTT value is harder to quantify, since over
209 extremely lossy paths, the RTT can be nearly infinite. As a
210 surrogate for this value, I looked at the total retrieval time for
211 each of 504736 retrievals in a two-day proxy trace made in December,
212 1996. (See [5] for more discussion of this trace.) Among these
213 retrievals, the mean total retrieval time was 1.25 seconds, with a
214 median of 0.25 seconds and a standard deviation of 13.7 seconds. As
215 far as I can tell, a large part of the variation is due to servers
216 that drop SYN packets due to buggy TCP stacks. Given this, and given
217 that the median is closer to 1/4 second, and given that a full
218 HTTP/1.0 retrieval takes at least two round-trips (and probably
219 more), I think it's fair to assume a value of 2 seconds as an
220 approximate upper bound for the Max_RTT for most paths.
221
Therefore, doing the math:

Error_C_bound[N=1] = 2 seconds
Error_C_bound[N=6] = 42 seconds

and, just for comparison, if one uses the observed median retrieval
time (0.25 seconds) and assumes the use of no more than 3 HTTP/1.1
proxies, then Error_C_bound = 1.5 seconds.
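
The figures above can be reproduced with a few lines of code. This is only a sketch of the Error_C_bound formula; the function name error_c_bound is hypothetical.

```python
def error_c_bound(max_rtt, n):
    """Error_C_bound = Max_RTT * N * (N + 1)/2, in seconds."""
    return max_rtt * n * (n + 1) / 2


# Bound for each path length considered in the text, with Max_RTT = 2 seconds.
for n in range(1, 7):
    print(f"N={n}: {error_c_bound(2.0, n):.0f} seconds")

# Comparison case: median retrieval time of 0.25 seconds, 3 proxies.
print(error_c_bound(0.25, 3))
```

Because the bound grows quadratically with N, halving the number of proxies on a path reduces it by considerably more than half.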
236
237 3.2 Quantifying the error of the Age estimate: Interpretation B
238 If interpretation (b) is used, then the value of the Age header field
239 might underestimate the actual amount of time since the response was
240 generated at the origin server. The scenario that generates the
241 greatest underestimate is when the path between the origin server and
242 the HTTP/1.1 client's cache includes an HTTP/1.0 proxy cache, and the
243 client has a skewed clock. In particular, the client's clock is set
244 into the past by some amount.
245
246 In this scenario, since an HTTP/1.0 proxy cache will not add the Age
247 header, there is no indication that the response was delayed by a
248 cache between the origin server and the first HTTP/1.1 cache. The
249 only way for that HTTP/1.1 cache to construct its initial Age
250 estimate is to compare its own clock against the Date value. If the
251 client's clock is wrong, this error is, in general, impossible to
252 correct. For example, suppose the client's clock is one hour slow.
253 If the origin server generates the response at 10:00AM, and then the
254 response sits in the HTTP/1.0 cache for 65 minutes, when the response
255 arrives at the HTTP/1.1 client, it will appear to be only 5 minutes
256 old.
257
258 The benefit of using interpretation (c) is that if there is a
259 non-caching HTTP/1.1 proxy between the HTTP/1.0 cache and the client,
260 unless its clock is also skewed, it will add a nearly-correct Age
261 header field to the response. I.e., the client will realize that the
262 response is at least 65 minutes old, regardless of its clock skew.
263 So, while the error is impossible to correct "in general", it is
264 possible to correct in certain specific cases, and this is the value
265 of interpretation (c).
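
The scenario above can be worked through using the max() rules of RFC 2068 section 13.2.3 (apparent_age and corrected_received_age). This is a minimal sketch: the function name, the calendar date, and the use of naive datetime values are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def corrected_received_age(date_value, local_now, age_value=0):
    """Age in seconds as seen by the recipient, per RFC 2068 section 13.2.3.

    apparent_age = max(0, response_time - date_value)
    corrected_received_age = max(apparent_age, age_value)
    """
    apparent_age = max(0, int((local_now - date_value).total_seconds()))
    return max(apparent_age, age_value)


generated = datetime(1997, 9, 12, 10, 0)          # Date value from the origin server
true_arrival = generated + timedelta(minutes=65)  # after 65 minutes in the HTTP/1.0 cache
client_clock = true_arrival - timedelta(hours=1)  # client's clock is one hour slow

# Interpretation (b): no Age header reaches the client.
print(corrected_received_age(generated, client_clock))
# Interpretation (c): a non-caching HTTP/1.1 proxy with a good clock adds Age: 3900.
print(corrected_received_age(generated, client_clock, age_value=3900))
```

Without an Age header the client computes 300 seconds (5 minutes); with the Age value added by the intermediate proxy it computes 3900 seconds (65 minutes), regardless of its own clock skew.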
266
267 3.3 Prevalence of clock skew in the Web
268 Is clock skew a real problem? Unfortunately, I know of no systematic
269 study of HTTP client clock skews. This is difficult, in part,
270 because HTTP requests generally do not include a Date header.
271
272 However, since I do have access to a trace of the headers flowing
273 through a proxy whose clock, at the time of the trace, was carefully
274 synchronized using NTP, I was able to look at the clock-skew
275 distribution of a large set of HTTP servers. (The trace covers 22034
276 distinct server IP addresses.) While this is not the same as a
277 population of HTTP clients, one might actually expect a set of HTTP
278 servers to have better clock synchronization characteristics than a
279 set of HTTP clients. After all, many HTTP clients run on personal
280 computers or workstations, and are managed by non-experts; most Web
281 servers on the Internet have at least some semblance of
282 administration (e.g., someone at least had to obtain a DNS name). In
283 other words, whatever the situation with Web server clocks, one would
284 expect the situation among clients to be worse.
285
291 For each response in the trace, I compared the Date header field
292 value (if any) to the proxy's NTP-synchronized timestamps for the
293 start of the connection and the end of the connection. If the
294 server's clock is accurate, the Date value ought to be between those
295 two timestamps. If the server's clock is slow, the Date value would
296 be lower than the start-timestamp; if the server's clock is fast, the
297 Date value would be higher than the end-timestamp.
298
299 Because of the 1-second granularity of Date, I treated as "valid" any
300 values less than 1 second in error. I also treated as "obviously
301 bogus" any Date where the server's clock appeared to be more than 1
302 day wrong, since one could assume that such a badly skewed server
303 clock would be abnormal.
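
The classification just described can be sketched as follows. This is not the script used for the trace analysis; the function name, the parameter names, and the representation of times as seconds since an epoch are assumptions.

```python
def classify_date(date_value, start_ts, end_ts, tolerance=1.0, bogus_after=86400.0):
    """Classify a server's Date value against the proxy's connection timestamps.

    All times are in seconds. A Date between start_ts and end_ts has zero
    error; otherwise the error is the distance to the nearer timestamp.
    """
    if date_value < start_ts:
        error = date_value - start_ts   # negative: server clock is slow
    elif date_value > end_ts:
        error = date_value - end_ts     # positive: server clock is fast
    else:
        error = 0.0
    if abs(error) > bogus_after:
        return "bogus"                  # more than 1 day wrong
    if abs(error) < tolerance:
        return "valid"                  # within the granularity of Date
    return "slow" if error < 0 else "fast"


print(classify_date(40.0, 100.0, 102.0))
print(classify_date(40.0, 100.0, 102.0, tolerance=61.0))
```

Raising the tolerance parameter corresponds to choosing a looser threshold for what counts as a synchronized clock.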
304
305 The trace contained 503969 responses with parsable response headers.
306 Of these, only 286779 actually had Date headers (most of the rest
307 appear to be PointCast responses). 1087 of these had Date values
308 that were clearly bogus (by the "1-day-wrong" test). Of the others,
309 116966 (41%) showed a server with a "slow" clock (by at least one
310 second), and 83782 (29%) showed a "fast" clock. Only 84944 (30%) had
311 apparently-synchronized clocks.
312
What if we set the threshold for an OK clock at +/- 60 seconds
(which, by the earlier analysis, is somewhat larger than the
Error_C_bound of 42 seconds for N = 6 and Max_RTT = 2)? In this
case, we still find 79443 (27%) responses indicating "slow" clocks,
and 56429 responses (20%) indicating "fast" clocks. In other words,
nearly half of the responses came from servers whose clocks appeared
to be off by at least a minute.
319
320 Using the 1-second threshold, the mean error in the slow clocks is
321 1287 seconds, with a median error of 113 seconds. For the fast
322 clocks, the mean error is 1383 seconds, with a median of 97 seconds.
323
324 Using the 60-second threshold, the mean error in the slow clocks is
325 1884 seconds, with a median error of 198 seconds. For the fast
326 clocks, the mean error is 2039 seconds, with a median of 152 seconds.
327 (We're removing the small-error samples from these sets, so we're
328 left with sets biased towards high-error samples.)
329
330 In summary, clock skew seems to be prevalent among HTTP servers, and
331 the skews seem to be fairly large. One might be justified in
332 guessing that the situation is worse among HTTP clients.
333
334 NOTE: I should reanalyze this data, breaking it down by server
335 address, rather than by response, but that will have to wait
336 for another draft of this document.
337
338 3.4 Implications of overestimating the Age value: Interpretation C
339 What happens if the Age value is overestimated? If this happens,
340 some "fresh" responses appear to be "stale", and so unnecessary cache
341 misses may be generated. Except in the case where the network is
348 partitioned, this is a performance problem, but does not lead to the
349 delivery of responses with the wrong body or headers. (When the
350 network is partitioned, caches are allowed to return stale values
351 with an appropriate Warning, so this is irrelevant to the current
352 discussion).
353
354 Further, only a very small subset of the fresh, cachable responses
355 will be affected. In particular, this can only affect a response
356 whose actual remaining freshness lifetime is smaller than the error
357 in the Age estimate. Presumably, this is only a significant issue
358 for responses whose initial max-age value is small (since otherwise
359 it would be a low-probability coincidence that a request arrives at
360 the cache just before the freshness lifetime expires). (Remember
361 that the Error_C_bound = 42 for N = 6 and Max_RTT = 2, and is a lot
362 lower for shorter proxy paths.)
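
The condition for an unnecessary cache miss can be written out directly. The sketch below uses the rule quoted in section 1 (a response is fresh if its age does not exceed its freshness lifetime); the function names and the sample lifetimes are hypothetical.

```python
def is_fresh(age, freshness_lifetime):
    """A response is fresh if its age does not exceed its freshness lifetime."""
    return age <= freshness_lifetime


def needless_miss(true_age, age_error, freshness_lifetime):
    """True if an overestimated Age makes a fresh response look stale."""
    estimated_age = true_age + age_error
    return is_fresh(true_age, freshness_lifetime) and not is_fresh(
        estimated_age, freshness_lifetime
    )


# One-hour lifetime, 42-second overestimate: only the last 42 seconds are affected.
print(needless_miss(3500, 42, 3600))
print(needless_miss(3570, 42, 3600))
# A 30-second lifetime is shorter than the error, so every request is affected.
print(needless_miss(0, 42, 30))
```

For the one-hour lifetime, the affected window is about 1% of the lifetime; for the 30-second lifetime, the response can never be served from the cache.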
363
364 We have no good information on what kinds of lifetimes HTTP/1.1
365 origin servers will be assigning to their responses (very few current
366 servers send Expires), but we can make a guess that of the responses
367 that are likely to be cachable, most will be assigned fairly long
368 lifetimes. This is because existing caches already assign, by
369 heuristic, fairly long lifetimes to responses without Expires or
370 max-age values, and (for the most part) this kind of caching seems to
371 be tolerated.
372
373 Certainly, one would hope that forthcoming HTTP/1.1 server
374 implementations will make it easier for service authors to specify
375 small freshness lifetimes. However, it is unlikely that this could
376 lead to a great improvement in the number of cache hits. Several
377 trace-based studies [1, 4] have shown that there is an upper limit to
378 the performance of straightforward HTTP caches, because for many
379 resources, no two requests result in the same response body.
380 Therefore, one should not expect the currently-observed cache hit
381 rates to get a lot better.
382
383 3.5 Implications of underestimating the Age value: Interpretation B
384 What happens if the Age value is underestimated? If this happens,
385 some "stale" responses appear to be "fresh", and are returned to the
386 client without any Warning. The client, in this case, naively
387 obtains the wrong response value.
388
389 This is a far more serious error than causing an extra cache miss.
390 Delivering the wrong response is not always a significant error, but
391 in some cases it can lead to serious external consequences.
392
Moreover, the set of cachable responses vulnerable to this error is
much larger than for an overestimation error. The reason is that, as
we saw above, clock-skew errors appear to be quite common, especially
slow-clock errors, and the mean error is measured in tens of minutes
(1287 seconds for slow clocks at the 1-second threshold). This is
likely to be much closer to the magnitude of freshness lifetimes than
the approximately 1-minute overestimation error.
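
To make the comparison concrete, the sketch below checks when an underestimated Age lets a stale response pass as fresh. The 10-minute freshness lifetime and the function name are hypothetical; the 1287-second figure is the mean slow-clock error from section 3.3, and 42 seconds is the Error_C_bound for N = 6.

```python
def stale_served_as_fresh(true_age, underestimate, freshness_lifetime):
    """True if a stale response appears fresh because its Age is underestimated."""
    estimated_age = max(0, true_age - underestimate)
    return true_age > freshness_lifetime and estimated_age <= freshness_lifetime


LIFETIME = 600  # hypothetical 10-minute freshness lifetime, in seconds

# A response that is really 25 minutes old:
print(stale_served_as_fresh(1500, 1287, LIFETIME))  # error the size of the mean slow-clock skew
print(stale_served_as_fresh(1500, 42, LIFETIME))    # error the size of the overestimation bound
```

With the larger error, a response more than twice as old as its lifetime is still treated as fresh; an error of 42 seconds could only mislead for responses less than 42 seconds past expiry.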
399
405 The adoption of interpretation (b), in an attempt to avoid
406 unnecessary cache misses, could have a perverse effect: by
407 significantly increasing the likelihood of undetected delivery of
408 stale responses when the freshness lifetime is short, this might
409 discourage service authors from allowing such responses to be cached
410 at all. In other words, service authors generating responses with
411 short cachable lifetimes might be driven to declare these responses
412 as totally uncachable, to avoid the possibility of unwarranted
413 caching.
414
415 One should also note that a small (but certainly non-empty) fraction
416 of the bogus Date values in our trace are wrong by days, not hours.
417 If such a problem afflicts many clients, this could affect the
418 willingness of service authors to allow caching even of responses
419 with relatively long lifetimes.
420
421 3.6 So what's all the fuss about, anyway?
422 It might seem that arguing over the issue of caching responses with
423 relatively short freshness lifetimes is a tempest in a teapot. At
424 some level, it is: straightforward HTTP caching will probably never
425 yield the kind of hit rates seen for CPU caches, and this is
426 especially true for short-lifetime values.
427
428 However, there are a number of ways in which the effectiveness of an
429 HTTP cache could be improved; for example, prefetching seems to be
430 useful for decreasing latency (although it doesn't help if the path
431 is already bandwidth-limited). Prefetching might take advantage of
432 the cachability of responses with relatively short lifetimes, but it
433 will not be feasible if service authors cannot trust the prefetching
434 caches to be punctilious about freshness lifetimes.
435
436 In other fields of computer science (for example, compilers or
437 multiprocessor caches), aggressive optimization always depends on a
438 reliable understanding of the situation at hand. That is, if one
439 cannot be sure that the transformation done by the optimization
440 preserves the semantics of the system, one cannot safely do that
441 optimization (and, consequently, one may be prevented from doing a
442 lot of other optimizations that are enabled by the first one). Even
443 when optimizations are done "speculatively", this always involves
444 being able to check the results for semantic correctness before
445 committing them.
446
Attempts to make things faster at the cost of semantic transparency
might have some short-term attractions, especially in a setting where
we have historically been sloppy about semantic correctness. But
this is ultimately a dead-end path; once semantic correctness has
been given up, it is almost impossible to go back.
452
461
462 4 Acknowledgements
463
464 I would like to thank Jim Gettys for comments on this draft.
465
466
467 5 References
468
469 1. Fred Douglis, Anja Feldmann, Balachander Krishnamurthy, and
470 Jeffrey Mogul. Rate of Change and Other Metrics: a Live Study of
471 the World Wide Web. Proc. Symposium on Internet Technologies and
472 Systems, USENIX, Monterey, CA, December, 1997. To appear.
473
474 2. Roy T. Fielding. Age Header Field in HTTP/1.1. Internet-Draft
475 draft-fielding-http-age-00.txt, Network Working Group, March, 1997.
476
477 3. Roy T. Fielding, Jim Gettys, Jeffrey C. Mogul, Henrik Frystyk
478 Nielsen, and Tim Berners-Lee. Hypertext Transfer Protocol --
479 HTTP/1.1. RFC 2068, HTTP Working Group, January, 1997.
480
481 4. Thomas M. Kroeger, Darrell D. E. Long, and Jeffrey C. Mogul.
482 Exploring the Bounds of Web Latency Reduction from Caching and
483 Prefetching. Proc. Symposium on Internet Technologies and Systems,
484 USENIX, Monterey, CA, December, 1997. To appear.
485
486 5. Jeffrey C. Mogul, Fred Douglis, Anja Feldmann, and Balachander
487 Krishnamurthy. Potential benefits of delta encoding and data
488 compression for HTTP. Research Report 97/4, DECWRL, July, 1997. URL
489 http://www.research.digital.com/wrl/techreports/abstracts/97.4.html.
490
491
492 6 Author's address
493
494 Jeffrey C. Mogul
495 Western Research Laboratory
496 Digital Equipment Corporation
497 250 University Avenue
498 Palo Alto, California, 94305, USA
499 Email: mogul@wrl.dec.com
500