1
2 INTERNET DRAFT Phillip M. Hallam-Baker, W3C
3 Expires in six months email: <hallam@w3.org>
4 Brian Behlendorf
5 email: <brian@organic.com>
6 21st February 1996
7
8
9 Extended Log File Format
10
11 <draft-hallam-http-logfile-00.txt>
12
13 Status of this Memo
14
15 This document is an Internet draft. Internet drafts are working
16 documents of the Internet Engineering Task Force (IETF), its areas
17 and its working groups. Note that other groups may also distribute
18 working information as Internet drafts.
19
20 Internet Drafts are draft documents valid for a maximum of six
21 months and can be updated, replaced or obsoleted by other documents
22 at any time. It is inappropriate to use Internet drafts as reference
23 material or to cite them as other than as "work in progress".
24
25 To learn the current status of any Internet draft please check the
26 "1id-abstracts.txt" listing contained in the Internet drafts shadow
27 directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
28 munnari.oz.au (Pacific Rim), ds.internic.net (US East coast) or
29 ftp.isi.edu (US West coast). Further information about the IETF can
30 be found at URL: http://www.cnri.reston.va.us/
31
32 Distribution of this document is unlimited. Please send comments to
33 the HTTP working group (HTTP-WG) of the Internet Engineering Task
34 Force (IETF) at <http://www.ics.uci.edu/pub/ietf/http/>. This note
35 is also available as a World Wide Web Consortium Working Draft
36 WD-logfile-960221, archived at
37 http://www.w3.org/pub/WWW/TR/WD-logfile-960221.html
38
39 Extended Log File Format
40
45 W3C Working Draft _WD-logfile-960221_
46
47 This version:
48 http://www.w3.org/pub/WWW/TR/WD-logfile-960221.html
49
50 Latest version:
51 http://www.w3.org/pub/WWW/TR/WD-logfile.html
52
53 Authors:
54 Phillip M. Hallam-Baker <hallam@w3.org>
55
56 Phillip M. Hallam-Baker Page 1
62 Brian Behlendorf <brian@organic.com>
63
64 ------------------------------------------------------------------------------
65 Status of this document
66
67 This is a W3C Working Draft for review by W3C members and other
68 interested parties. It is a draft document and may be updated,
69 replaced or obsoleted by other documents at any time. It is
70 inappropriate to use W3C Working Drafts as reference material or to
71 cite them as other than "work in progress". A list of current W3C
72 working drafts can be found at: http://www.w3.org/pub/WWW/TR
73
74 Note: since working drafts are subject to frequent change, you are
75 advised to reference the above URL, rather than the URLs for working
76 drafts themselves.
77
122 Abstract
123
124 An improved format for Web server log files is presented. The format
125 is extensible, permitting a wider range of data to be captured. This
126 proposal is motivated by the need to capture a wider range of data
127 for demographic analysis and also the needs of proxy caches.
128
129 Introduction
130
131 Most Web servers offer the option to store logfiles in either the
132 common log format or a proprietary format. The common log file
133 format is supported by the majority of analysis tools but the
134 information about each server transaction is fixed. In many cases it
135 is desirable to record more information. Sites sensitive to personal
136 data issues may wish to omit the recording of certain data. In
137 addition ambiguities arise in analysing the common log file format
138 since field separator characters may in some cases occur within
139 fields. The extended log file format is designed to meet the
140 following needs:
141
142 * Permit control over the data recorded.
143
144 * Support needs of proxies, clients and servers in a common format.
145
146 * Provide robust handling of character escaping issues
147
148 * Allow exchange of demographic data.
149
150 * Allow summary data to be expressed.
151
152 The log file format described permits customized logfiles to be
153 recorded in a format readable by generic analysis tools. A header
154 specifying the data types recorded is written out at the start of
155 each log.
156
157 This work is in part motivated by the need to support collection of
158 demographic data. This work is discussed at greater length in
159 companion drafts describing session identifier URIs [Hallam96a] and
160 more consistent proxy behaviour [Hallam96b].
161
162 Format
163
164 An extended log file contains a sequence of _lines_ containing ASCII
165 characters terminated by either the sequence CR or CRLF. Log file
166 generators should follow the line termination convention for the
167 platform on which they are executed. Analysers should accept either
168 form. Each line may contain either a _directive_ or an _entry_.
169
170 Entries consist of a sequence of _fields_ relating to a single HTTP
171 transaction. Fields are separated by whitespace; the use of tab
172 characters for this purpose is encouraged. If a field is unused in a
173 particular entry, a dash "-" marks the omitted field. Directives record
174 information about the logging process itself.
175
182 The following directives are defined:
183
184 Version: _<integer>_._<integer>_
185 The version of the extended log file format used. This draft
186 defines version 1.0.
187
188 Fields: [_<specifier>_...]
189 Specifies the fields recorded in the log.
190
191 Software: _<string>_
192 Identifies the software which generated the log.
193
194 Start-Date: _<date>_ _<time>_
195 The date and time at which the log was started.
196
197 End-Date: _<date>_ _<time>_
198 The date and time at which the log was finished.
199
200 Date: _<date>_ _<time>_
201 The date and time at which the entry was added.
202
203 Remark: _<text>_
204 Comment information. Data recorded in this field should be
205 ignored by analysis tools.
206
207 The directives Version and Fields are required and should precede
208 all entries in the log. The Fields directive specifies the data
209 recorded in the fields of each entry.
210
211 Example
212
213 The following is an example file in the extended log format:
214
215
216 #Version: 1.0
217 #Date: 12-Jan-1996
218 #Fields: time cs-method cs-uri
219 00:34:23 GET /foo/bar.html
220 12:21:16 GET /foo/bar.html
221 12:45:52 GET /foo/bar.html
222 12:57:34 GET /foo/bar.html
223
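
As a rough illustration of how such a file might be read, the following
Python sketch separates directives from entries and pairs each entry's
values with the identifiers from the most recent #Fields directive. The
function name parse_log is hypothetical, and the leading "#" on directive
lines is taken from the example rather than from the normative text. The
naive whitespace split is a simplifying assumption that does not handle
quoted strings containing spaces.

```python
def parse_log(text):
    """Split an extended log into directives and entries (illustrative only)."""
    directives = {}   # directive name -> last value seen
    fields = []       # identifiers from the most recent #Fields directive
    entries = []      # one dict per entry line
    for line in text.splitlines():      # accepts CR, LF and CRLF endings
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Assumption: directives are "#Name: value", as in the example.
            name, _, value = line[1:].partition(":")
            directives[name.strip()] = value.strip()
            if name.strip() == "Fields":
                fields = value.split()
            continue
        # Naive split on whitespace; quoted <string> fields that contain
        # spaces would need a quote-aware tokenizer instead.
        values = line.split()
        entries.append(dict(zip(fields, values)))
    return directives, entries


sample = (
    "#Version: 1.0\n"
    "#Date: 12-Jan-1996\n"
    "#Fields: time cs-method cs-uri\n"
    "00:34:23 GET /foo/bar.html\n"
    "12:21:16 GET /foo/bar.html\n"
)
directives, entries = parse_log(sample)
print(directives["Version"], len(entries), entries[0]["cs-uri"])
```

Keeping the field list as parser state means a later #Fields directive in
the same file would simply change how subsequent entries are interpreted.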
224
225 Fields
226
227 The #Fields directive lists a sequence of _field identifiers_
228 specifying the information recorded in each entry. Field identifiers
229 may have one of the following forms:
230
231 _identifier_
232 Identifier relates to the transaction as a whole.
233
234 _prefix_-_identifier_
242 Identifier relates to information transfer between parties
243 defined by the value _prefix_.
244
245 _prefix_(_header_)
246 Identifies the value of the HTTP header field _header_ for
247 transfer between parties defined by the value _prefix_. Fields
248 specified in this manner always have the type <string>.
249
250 The following prefixes are defined:
251
252 cs
253 Client to Server.
254
255 sc
256 Server to Client.
257
258 sr
259 Server to Remote Server; this prefix is used by proxies.
260
261 rs
262 Remote Server to Server; this prefix is used by proxies.
263
264 x
265 Application specific identifier.
266
267 The identifier cs-method thus refers to the method in the request
268 sent by the client to the server while sc(Content-Type) refers to
269 the content type field of the reply.
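
The three identifier forms can be told apart mechanically. The sketch
below is one possible way to do so in Python; the function name
parse_field_identifier and the returned tuple layout are hypothetical.
It assumes that a leading component counts as a prefix only if it is one
of the five prefixes listed above, so that unprefixed identifiers which
happen to contain a dash (such as time-taken) are left whole.

```python
import re

# Prefixes defined in this draft.
PREFIXES = {"cs", "sc", "sr", "rs", "x"}

# Matches the prefix(header) form, e.g. sc(Content-Type).
HEADER_FORM = re.compile(r"^([a-z]+)\(([^()]+)\)$")


def parse_field_identifier(spec):
    """Return (prefix, identifier, header) for one #Fields specifier.

    Exactly one of identifier/header is set; prefix is None for
    identifiers that relate to the transaction as a whole.
    """
    m = HEADER_FORM.match(spec)
    if m and m.group(1) in PREFIXES:
        return m.group(1), None, m.group(2)
    prefix, dash, rest = spec.partition("-")
    if dash and prefix in PREFIXES:
        return prefix, rest, None
    return None, spec, None


for spec in ["time", "time-taken", "cs-method", "cs-uri-stem", "sc(Content-Type)"]:
    print(spec, parse_field_identifier(spec))
```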
270
271 Identifiers.
272
273 The following identifiers do not require a prefix:
274
275 date
276 Date at which transaction completed, field has type <date>
277
278 time
279 Time at which transaction completed, field has type <time>
280
281 time-taken
282 Time taken for transaction to complete in seconds, field has
283 type <fixed>
284
285 bytes
286 Bytes transferred, field has type <integer>
287
288 cached
289 Records whether a cache hit occurred, field has type <integer>; 0
290 indicates a cache miss.
291
292 The following identifiers require a prefix:
293
294 ip
302 IP address and port, field has type <address>
303
304 dns
305 DNS name, field has type <name>
306
307 status
308 Status code, field has type <integer>
309
310 comment
311 Comment returned with status code, field has type <>
312
313 method
314 Method, field has type <name>
315
316 uri
317 URI, field has type <uri>
318
319 uri-stem
320 Stem portion alone of URI (omitting query), field has type <uri>
321
322 uri-query
323 Query portion alone of URI, field has type <uri>
324
325 host
326 DNS hostname used, field has type <name>
327
328 Special fields for log summaries.
329
330 Analysis tools may generate log summaries. A log summary lists the
331 number of operations performed on a URI but omits information
332 specific to a particular transaction.
333
334 The following field is mandatory and must precede all others:
335
336 count
337 The number of entries to which the listed data applies, field has
338 type <integer>
339
340 The following fields may be used in place of time to allow
341 aggregation of log file entries over intervals of time.
342
343 time-from
344 Time at which sampling began, field has type <time>
345
346 time-to
347 Time at which sampling ended, field has type <time>
348
349 interval
350 Time over which sampling occurred in seconds, field has type
351 <integer>
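
To make the idea of a log summary concrete, the following Python sketch
collapses a list of parsed entries into one line per distinct
combination of the chosen fields, with count written first as required
above. The function name summarize and the dict-per-entry input shape
are hypothetical, and the use of LF as the line terminator is an
assumption standing in for the platform convention.

```python
from collections import Counter


def summarize(entries, key_fields):
    """Build a summary log: one line per distinct combination of key_fields."""
    # Missing fields are represented by "-", as for omitted fields in entries.
    counts = Counter(tuple(e.get(f, "-") for f in key_fields) for e in entries)
    lines = ["#Version: 1.0", "#Fields: count " + " ".join(key_fields)]
    for key, n in sorted(counts.items()):
        # count must precede all other fields; tabs separate the fields.
        lines.append("\t".join([str(n), *key]))
    return "\n".join(lines) + "\n"


entries = [
    {"cs-method": "GET", "cs-uri": "/foo/bar.html"},
    {"cs-method": "GET", "cs-uri": "/index.html"},
    {"cs-method": "GET", "cs-uri": "/foo/bar.html"},
]
print(summarize(entries, ["cs-method", "cs-uri"]), end="")
```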
352
353 Entries
354
362 This section describes the data formats for log file field entries.
363 These formats are chosen so as to avoid ambiguity, minimize the
364 difficulty of generation and parsing and provide for human
365 readability.
366
367 Each logfile entry consists of a sequence of fields separated by
368 whitespace and terminated by a CR or CRLF sequence. The meanings of
369 the fields are defined by a preceding #Fields directive. If a field
370 is omitted for a particular entry a single dash "-" is substituted.
371
372 Log file parsers should be tolerant of errors. If an entry contains
373 corrupt data or is terminated unexpectedly the parser should
374 resynchronize using the end of line marker and continue to parse the
375 following entries. Entries must not contain any ASCII control
376 characters.
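
One possible reading of the error-tolerance requirement is sketched
below in Python: each line is treated independently, so a damaged entry
is skipped and parsing resumes at the next end of line. The function
name parse_entries_tolerantly is hypothetical. Treating a wrong field
count as the sign of a corrupt entry, and permitting tab as the only
control character (because it is a field separator), are assumptions of
this sketch rather than rules stated by the draft.

```python
def parse_entries_tolerantly(lines, fields):
    """Return (good_entries, skipped_count) for already-split entry lines."""
    good, skipped = [], 0
    for line in lines:
        values = line.split()
        # ASCII control characters other than the tab separator are rejected.
        has_control = any(
            (ord(c) < 32 and c != "\t") or ord(c) == 127 for c in line
        )
        if has_control or len(values) != len(fields):
            skipped += 1          # resynchronize: move on to the next line
            continue
        good.append(dict(zip(fields, values)))
    return good, skipped


lines = [
    "00:34:23 GET /foo/bar.html",
    "12:21",                          # truncated entry
    "12:45:52\tGET\t/foo/bar.html",
]
good, skipped = parse_entries_tolerantly(lines, ["time", "cs-method", "cs-uri"])
print(len(good), skipped)
```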
377
378
379 <entry> = <field>* <end-of-line>
380
381 <field> = <integer> | <fixed> | <uri> | <date> | <time> | <string>
382
383 <digit> = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
384
385
386 Integer
387
388
389 <integer> = <digit> +
390
391
392 Integers are represented as a sequence of digits.
393
394 Fixed Format Float
395
396
397 <fixed> = <digit>+ ["." <digit>*]
398
399
400 URI
401
402 URIs are as specified by [RFC1738]; relative URIs are as specified by
403 [RFC1808]. URIs cannot by definition include whitespace or ASCII
404 control characters. Consequently no ambiguity arises from their use.
405
406 Date
407
408
409 <date> = <digit> <digit> <digit> <digit>
410 "-" <digit> <digit> "-" <digit> <digit>
411
412
413 Dates are recorded in the format YYYY-MM-DD where YYYY, MM and DD
414 stand for the numeric year, month and day respectively. This format
422 is chosen to assist collation using sort.
423
424 Time
425
426
427 <time> = <digit> <digit> ":" <digit> <digit>
428 [":" <digit> <digit> ["." <digit>*]]
429
430
431
432 Times are recorded in the form HH:MM, HH:MM:SS or HH:MM:SS.S where
433 HH is the hour in 24 hour format, MM is minutes and SS is seconds.
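
The date and time grammars translate directly into regular expressions.
The Python sketch below is illustrative only; the names DATE_RE,
TIME_RE, is_date and is_time are hypothetical. It checks syntax, not
range (so 99:99 would be accepted), and reads the time grammar with the
seconds part and its fraction as nested optional groups.

```python
import re

# YYYY-MM-DD, ASCII digits only.
DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")
# HH:MM, HH:MM:SS or HH:MM:SS.S (fraction digits optional after the dot).
TIME_RE = re.compile(r"[0-9]{2}:[0-9]{2}(:[0-9]{2}(\.[0-9]*)?)?")


def is_date(field):
    return DATE_RE.fullmatch(field) is not None


def is_time(field):
    return TIME_RE.fullmatch(field) is not None


# Because the most significant part comes first, a plain string sort
# puts dates in chronological order.
dates = ["1996-02-01", "1995-12-31", "1996-01-12"]
print(sorted(dates))
print(is_date("1996-01-12"), is_time("00:34:23"))
```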
434
435 String
436
437
438 <string> = '"' <schar>* '"'
439
440 <schar> = xchar | '"' '"'
441
442
443 Strings are output in quoted form. If a string contains a quotation
444 character the character is repeated. This format is unambiguous
445 since fields are by definition separated by whitespace.
446
447 No mechanism for incorporating control characters is defined.
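
A short Python sketch of the quoting rule follows, together with a
field splitter that keeps quoted strings intact even when they contain
whitespace. The function names quote, unquote and split_fields are
hypothetical, and the splitter treats only space and tab as separators,
which is an assumption. An unterminated quoted string is returned as
the remainder of the line rather than reported as an error.

```python
def quote(s):
    """Encode s as a <string>: wrap in quotes, double any embedded quote."""
    return '"' + s.replace('"', '""') + '"'


def unquote(field):
    """Decode a <string> field back to its raw text."""
    if len(field) < 2 or field[0] != '"' or field[-1] != '"':
        raise ValueError("not a quoted string: %r" % field)
    return field[1:-1].replace('""', '"')


def split_fields(line):
    """Split an entry on spaces/tabs, keeping quoted strings whole."""
    fields, i, n = [], 0, len(line)
    while i < n:
        if line[i] in " \t":
            i += 1
            continue
        start = i
        if line[i] == '"':
            i += 1
            while i < n:
                if line[i] == '"':
                    if i + 1 < n and line[i + 1] == '"':
                        i += 2      # doubled quote stays inside the string
                        continue
                    i += 1          # closing quote
                    break
                i += 1
        else:
            while i < n and line[i] not in " \t":
                i += 1
        fields.append(line[start:i])
    return fields


line = '12:00:01 200 ' + quote('Mozilla/2.0 (X11; I)') + ' -'
print(split_fields(line))
```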
448
449 Text
450
451
452 <text> = <char>*
453
454
455 The text field is used only by directives.
456
457 Name
458
459
460 <name> = <alpha> [ "." <alpha> * ]
461
462
463 DNS name.
464
465 Address
466
467
468 <address> = <integer> [ "." <integer> * ] [ ":" <integer> ]
469
470
471 Numeric IP address and optional port specifier.
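
As an illustration, an address field might be split into its host and
port parts as in the Python sketch below. The function name
parse_address is hypothetical. Following the grammar, it checks only
that the host consists of dot-separated integers; it does not insist on
exactly four components or on values below 256.

```python
def parse_address(field):
    """Split an <address> into (host, port); port is None when absent."""
    host, colon, port = field.partition(":")
    parts = host.split(".")
    if not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError("bad address: %r" % field)
    if colon and not (port.isascii() and port.isdigit()):
        raise ValueError("bad port: %r" % field)
    return host, int(port) if colon else None


print(parse_address("18.23.0.22:80"))
print(parse_address("18.23.0.22"))
```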
472
473 Acknowledgements.
474
482 Robert Thau provided useful advice and some code. John Mallery and
483 Roger Hurwitz helped develop many of the ideas.
484
485
486 Phillip M. Hallam-Baker
487 hallam@w3.org
488 World Wide Web Consortium
489 Cambridge MA
490
491 Brian Behlendorf
492 brian@organic.com
493 Organic Online
494
495
496 References.
497
498
499 [RFC1808]
500 R. Fielding, _Relative Uniform Resource Locators_, June 1995.
501
502 [RFC1738]
503 T. Berners-Lee, L. Masinter, _Uniform Resource Locators (URL)_,
504 December 1994.
505
506 [Luotonen95]
507 A. Luotonen, _The Common Logfile Format_, 1995,
508 http://www.w3.org/pub/WWW/Daemon/User/Config/Logging.html
509
510 [Hallam96a]
511 P. M. Hallam-Baker, _Session Identification URI_, World Wide Web
512 Consortium Working Draft, WD-session-id.
513
514 [Hallam96b]
515 P. M. Hallam-Baker, _Notification for Proxy Caches_, World Wide
516 Web Consortium Working Draft, WD-proxy.
517