INTERNET DRAFT                              Phillip M. Hallam-Baker, W3C
Expires in six months                       email: <hallam@w3.org>
                                            Brian Behlendorf
                                            email: <brian@organic.com>
                                            21st February 1996


Extended Log File Format

<draft-hallam-http-logfile-00.txt>

Status of this Memo

This document is an Internet Draft. Internet Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas
and its working groups. Note that other groups may also distribute
working documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six
months and can be updated, replaced or obsoleted by other documents
at any time. It is inappropriate to use Internet Drafts as reference
material or to cite them other than as "work in progress".

To learn the current status of any Internet Draft, please check the
"1id-abstracts.txt" listing contained in the Internet Drafts shadow
directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East coast) or
ftp.isi.edu (US West coast). Further information about the IETF can
be found at URL: http://www.cnri.reston.va.us/

Distribution of this document is unlimited. Please send comments to
the HTTP working group (HTTP-WG) of the Internet Engineering Task
Force (IETF) at <http://www.ics.uci.edu/pub/ietf/http/>. This note
is also available as a World Wide Web Consortium Working Draft,
WD-logfile-960221, archived at
http://www.w3.org/pub/WWW/TR/WD-logfile-960221.html

Extended Log File Format

W3C Working Draft _WD-logfile-960221_

This version:
http://www.w3.org/pub/WWW/TR/WD-logfile-960221.html

Latest version:
http://www.w3.org/pub/WWW/TR/WD-logfile.html

Authors:
Phillip M. Hallam-Baker <hallam@w3.org>
Brian Behlendorf <brian@organic.com>

Phillip M. Hallam-Baker Page 1

------------------------------------------------------------------------------
Status of this document

This is a W3C Working Draft for review by W3C members and other
interested parties. It is a draft document and may be updated,
replaced or obsoleted by other documents at any time. It is
inappropriate to use W3C Working Drafts as reference material or to
cite them other than as "work in progress". A list of current W3C
Working Drafts can be found at: http://www.w3.org/pub/WWW/TR

Note: since Working Drafts are subject to frequent change, you are
advised to reference the above URL rather than the URLs of
individual Working Drafts.

Abstract

An improved format for Web server log files is presented. The format
is extensible, permitting a wider range of data to be captured. This
proposal is motivated by the need to capture a wider range of data
for demographic analysis and by the needs of proxy caches.

Introduction

Most Web servers offer the option to store log files in either the
common log format or a proprietary format. The common log file
format is supported by the majority of analysis tools, but the
information recorded about each server transaction is fixed. In many
cases it is desirable to record more information, while sites
sensitive to personal data issues may wish to omit certain data. In
addition, ambiguities arise in analysing the common log file format
because field separator characters may in some cases occur within
fields. The extended log file format is designed to meet the
following needs:

* Permit control over the data recorded.

* Support the needs of proxies, clients and servers in a common format.

* Provide robust handling of character escaping issues.

* Allow exchange of demographic data.

* Allow summary data to be expressed.

The log file format described permits customized log files to be
recorded in a format readable by generic analysis tools. A header
specifying the data types recorded is written out at the start of
each log.

This work is in part motivated by the need to support collection of
demographic data. This topic is discussed at greater length in
companion drafts describing session identifier URIs [Hallam96a] and
more consistent proxy behaviour [Hallam96b].

Format

An extended log file contains a sequence of _lines_ containing ASCII
characters, each terminated by either CR or CRLF. Log file
generators should follow the line termination convention for the
platform on which they are executed. Analysers should accept either
form. Each line may contain either a _directive_ or an _entry_.

Entries consist of a sequence of _fields_ relating to a single HTTP
transaction. Fields are separated by whitespace; the use of tab
characters for this purpose is encouraged. If a field is unused in a
particular entry, a dash "-" marks the omitted field. Directives
record information about the logging process itself.
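The line and field rules above can be sketched in Python. This is a minimal sketch, not part of the draft: it assumes, as in the example file, that directive lines begin with "#", it also accepts a bare LF terminator, and the helper names `split_lines` and `classify` are hypothetical. Quoted <string> fields are kept whole so that embedded spaces do not split them.

```python
import re

# Terminators: the draft names CRLF and CR; bare LF is also accepted here
# as an assumption, since it is the Unix convention.
_EOL = re.compile(r"\r\n|\r|\n")

# A token is either a quoted <string> (with doubled quotes inside) or a
# run of non-whitespace characters.
_TOKEN = re.compile(r'"(?:[^"]|"")*"|\S+')


def split_lines(data: str) -> list[str]:
    """Split raw log text into lines, accepting any of the terminators."""
    lines = _EOL.split(data)
    if lines and lines[-1] == "":
        lines.pop()  # a final terminator leaves an empty trailing piece
    return lines


def classify(line: str) -> tuple[str, object]:
    """Return ("directive", text) or ("entry", fields), mapping "-" to None."""
    if line.startswith("#"):
        return ("directive", line[1:])
    return ("entry", [None if tok == "-" else tok for tok in _TOKEN.findall(line)])
```

Using a single tokenizer for whole lines keeps the "-" substitution and the quoted-string handling in one place.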

Lines beginning with the "#" character contain directives. The
following directives are defined:

Version: _<integer>_._<integer>_
The version of the extended log file format used. This draft
defines version 1.0.

Fields: [_<specifier>_...]
Specifies the fields recorded in the log.

Software: _<string>_
Identifies the software which generated the log.

Start-Date: _<date>_ _<time>_
The date and time at which the log was started.

End-Date: _<date>_ _<time>_
The date and time at which the log was finished.

Date: _<date>_ _<time>_
The date and time at which the entry was added.

Remark: _<text>_
Comment information. Data recorded in this directive should be
ignored by analysis tools.

The directives Version and Fields are required and should precede
all entries in the log. The Fields directive specifies the data
recorded in the fields of each entry.
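A sketch of reading the directive header and checking that Version and Fields appear before the first entry. It assumes the "#Name: value" layout used in the example file; `parse_directive` and `check_header` are illustrative names only.

```python
def parse_directive(line: str) -> tuple[str, str]:
    """Split a "#Name: value" line into ("Name", "value")."""
    if not line.startswith("#"):
        raise ValueError("not a directive: " + line)
    # Split at the first colon only, so times such as 00:00:00 stay intact.
    name, sep, value = line[1:].partition(":")
    if not sep:
        raise ValueError("directive without ':' separator: " + line)
    return name.strip(), value.strip()


def check_header(lines: list[str]) -> dict[str, str]:
    """Collect leading directives; require Version and Fields before any entry."""
    seen: dict[str, str] = {}
    for line in lines:
        if not line.startswith("#"):
            break  # the first entry ends the header
        name, value = parse_directive(line)
        seen[name] = value
    missing = [d for d in ("Version", "Fields") if d not in seen]
    if missing:
        raise ValueError("missing before first entry: " + ", ".join(missing))
    return seen
```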

Example

The following is an example file in the extended log format:

#Version: 1.0
#Date: 1996-01-12 00:00:00
#Fields: time cs-method cs-uri
00:34:23 GET /foo/bar.html
12:21:16 GET /foo/bar.html
12:45:52 GET /foo/bar.html
12:57:34 GET /foo/bar.html
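A file like the example can be produced by a small generator. In this sketch the function `write_log`, its parameters, and the choice of tab separators and CRLF terminators are assumptions; a generator following the platform convention could pass `os.linesep` as `eol` instead.

```python
def write_log(fields: list[str], records: list[dict],
              date_str: str, time_str: str, eol: str = "\r\n") -> str:
    """Emit the required directives, then one tab-separated entry per record."""
    out = [
        "#Version: 1.0",
        f"#Date: {date_str} {time_str}",
        "#Fields: " + " ".join(fields),
    ]
    for rec in records:
        # A missing or None value is written as a single dash.
        values = [rec.get(f) for f in fields]
        out.append("\t".join("-" if v is None else str(v) for v in values))
    return eol.join(out) + eol
```

Writing the #Fields line from the same list that drives each entry keeps the header and the entries from drifting apart.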

Fields

The #Fields directive lists a sequence of _field identifiers_
specifying the information recorded in each entry. Field identifiers
may have one of the following forms:

_identifier_
Identifier relates to the transaction as a whole.

_prefix_-_identifier_
Identifier relates to information transfer between the parties
defined by the value _prefix_.

_prefix_(_header_)
Identifies the value of the HTTP header field _header_ for the
transfer between the parties defined by the value _prefix_. Fields
specified in this manner always have type <string>.

The following prefixes are defined:

cs
Client to server.

sc
Server to client.

sr
Server to remote server; this prefix is used by proxies.

rs
Remote server to server; this prefix is used by proxies.

x
Application-specific identifier.

The identifier cs-method thus refers to the method in the request
sent by the client to the server, while sc(Content-Type) refers to
the Content-Type header field of the reply.
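One way to split a field identifier into its prefix, identifier and header parts, covering the three forms listed above. The regular expression and the name `parse_field_id` are assumptions of this sketch; an identifier without a recognised prefix (such as time-taken) is treated as relating to the whole transaction.

```python
import re
from typing import Optional, Tuple

# Prefixes defined by the draft.
PREFIXES = {"cs", "sc", "sr", "rs", "x"}

# prefix(header) form, e.g. sc(Content-Type).
_HEADER_FORM = re.compile(r"([a-z]+)\((.+)\)")


def parse_field_id(spec: str) -> Tuple[Optional[str], Optional[str], Optional[str]]:
    """Return (prefix, identifier, header); parts that do not apply are None."""
    m = _HEADER_FORM.fullmatch(spec)
    if m and m.group(1) in PREFIXES:
        return (m.group(1), None, m.group(2))
    prefix, sep, rest = spec.partition("-")
    if sep and rest and prefix in PREFIXES:
        return (prefix, rest, None)
    return (None, spec, None)
```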

Identifiers

The following identifiers do not require a prefix:

date
Date at which the transaction completed; field has type <date>.

time
Time at which the transaction completed; field has type <time>.

time-taken
Time taken for the transaction to complete, in seconds; field has
type <fixed>.

bytes
Bytes transferred; field has type <integer>.

cached
Records whether a cache hit occurred; field has type <integer>. 0
indicates a cache miss.

The following identifiers require a prefix:

ip
IP address and port; field has type <address>.

dns
DNS name; field has type <name>.

status
Status code; field has type <integer>.

comment
Comment returned with status code; field has type <>.

method
Method; field has type <name>.

uri
URI; field has type <uri>.

uri-stem
Stem portion alone of URI (omitting query); field has type <uri>.

uri-query
Query portion alone of URI; field has type <uri>.

host
DNS hostname used; field has type <name>.

Special fields for log summaries

Analysis tools may generate log summaries. A log summary lists the
number of operations performed on a URI but omits information
specific to a particular transaction.

The following field is mandatory and must precede all others:

count
The number of entries for which the listed data are identical;
field has type <integer>.

The following fields may be used in place of time to allow
aggregation of log file entries over intervals of time.

time-from
Time at which sampling began; field has type <time>.

time-to
Time at which sampling ended; field has type <time>.

interval
Time over which sampling occurred, in seconds; field has type
<integer>.
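A log summary over fixed intervals might be built as follows. This sketch assumes entries already parsed into dictionaries with `time` in HH:MM:SS form and a `cs-uri` field; the name `summarize` and the choice of time-from and time-to columns are illustrative. As required above, count is written first.

```python
from collections import Counter


def summarize(entries: list[dict], interval: int) -> list[str]:
    """Count entries per (interval start, URI) and emit summary lines."""
    def seconds(t: str) -> int:
        h, m, s = (int(p) for p in t.split(":"))
        return h * 3600 + m * 60 + s

    def hms(n: int) -> str:
        return f"{n // 3600:02d}:{n % 3600 // 60:02d}:{n % 60:02d}"

    counts: Counter = Counter()
    for e in entries:
        start = seconds(e["time"]) // interval * interval  # bucket start
        counts[(start, e["cs-uri"])] += 1

    lines = ["#Version: 1.0", "#Fields: count time-from time-to cs-uri"]
    for (start, uri), n in sorted(counts.items()):
        lines.append(f"{n}\t{hms(start)}\t{hms(start + interval)}\t{uri}")
    return lines
```

Applied to the four example entries with a one-hour interval, this yields one line with count 1 for 00:00:00 to 01:00:00 and one with count 3 for 12:00:00 to 13:00:00.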

Entries

This section describes the data formats for log file field entries.
These formats are chosen to avoid ambiguity, to minimize the
difficulty of generation and parsing, and to provide for human
readability.

Each log file entry consists of a sequence of fields separated by
whitespace and terminated by a CR or CRLF sequence. The meanings of
the fields are defined by a preceding #Fields directive. If a field
is omitted for a particular entry, a single dash "-" is substituted.

Log file parsers should be tolerant of errors. If an entry contains
corrupt data or is terminated unexpectedly, the parser should
resynchronize using the end-of-line marker and continue to parse the
following entries. Apart from the whitespace separating fields,
entries must not contain any ASCII control characters.
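The error-tolerant behaviour described above might look like this. Treating a line as corrupt when its field count does not match the current #Fields list, or when it contains a control character other than tab, is an assumption of this sketch, as is the name `read_entries`.

```python
import re

_TOKEN = re.compile(r'"(?:[^"]|"")*"|\S+')
# Control characters other than tab, which may separate fields.
_CONTROL = re.compile(r"[\x00-\x08\x0a-\x1f\x7f]")


def read_entries(lines: list[str]) -> tuple[list[dict], int]:
    """Parse entries, skipping corrupt lines; return (entries, skipped count)."""
    fields: list[str] = []
    entries: list[dict] = []
    skipped = 0
    for line in lines:
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#"):
            continue  # other directives are ignored by this sketch
        tokens = _TOKEN.findall(line)
        if not fields or len(tokens) != len(fields) or _CONTROL.search(line):
            skipped += 1  # resynchronize at the next end of line
            continue
        entries.append({f: (None if t == "-" else t) for f, t in zip(fields, tokens)})
    return entries, skipped
```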

<entry> = <field>* <end-of-line>

<field> = <integer> | <fixed> | <uri> | <date> | <time> | <string> |
          <name> | <address> | "-"

<digit> = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"

Integer

<integer> = <digit>+

Integers are represented as a sequence of digits.

Fixed Format Float

<fixed> = <digit>+ [ "." <digit>* ]

Fixed format floats are represented as a sequence of digits,
optionally followed by a decimal point and further digits.
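The <integer> and <fixed> productions translate directly into regular expressions. The names `INTEGER`, `FIXED` and `parse_fixed` are hypothetical.

```python
import re

INTEGER = re.compile(r"[0-9]+")               # <integer> = <digit>+
FIXED = re.compile(r"[0-9]+(?:\.[0-9]*)?")    # <fixed> = <digit>+ [ "." <digit>* ]


def parse_fixed(token: str) -> float:
    """Convert a <fixed> field, such as a time-taken value, to a float."""
    if not FIXED.fullmatch(token):
        raise ValueError("not a <fixed> value: " + token)
    return float(token)
```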

URI

A URI as specified by [RFC1738]; relative URIs are specified by
[RFC1808]. URIs cannot by definition include whitespace or ASCII
control characters, so no ambiguity arises from their use.
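A sketch of deriving the uri-stem and uri-query fields from a complete URI. Splitting at the first "?" follows ordinary URL syntax; the function name `split_uri` is an assumption, and a missing query is returned as `None` so that a generator can log it as "-".

```python
from typing import Optional, Tuple


def split_uri(uri: str) -> Tuple[str, Optional[str]]:
    """Return (uri-stem, uri-query); the query is None when absent."""
    # A valid URI field never contains whitespace or control characters.
    if any(c.isspace() or ord(c) < 0x20 or ord(c) == 0x7F for c in uri):
        raise ValueError("URI field contains whitespace or control characters")
    stem, sep, query = uri.partition("?")
    return stem, (query if sep else None)
```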

Date

<date> = <digit> <digit> <digit> <digit>
         "-" <digit> <digit> "-" <digit> <digit>

Dates are recorded in the format YYYY-MM-DD, where YYYY, MM and DD
stand for the numeric year, month and day respectively. This format
is chosen to assist collation using sort.
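Because YYYY-MM-DD is fixed-width with the most significant part first, a plain text sort of these strings gives chronological order. The sketch below parses a <date> field so that this can be checked; `parse_log_date` is a hypothetical helper.

```python
import re
from datetime import date

_DATE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def parse_log_date(token: str) -> date:
    """Parse a YYYY-MM-DD <date> field; other layouts such as 12-Jan-1996 fail."""
    m = _DATE.fullmatch(token)
    if not m:
        raise ValueError("not a YYYY-MM-DD <date>: " + token)
    year, month, day = (int(g) for g in m.groups())
    return date(year, month, day)  # also rejects e.g. month 13
```

For any list of valid <date> strings, `sorted(dates)` and `sorted(dates, key=parse_log_date)` give the same order.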

Time

<time> = <digit> <digit> ":" <digit> <digit>
         [ ":" <digit> <digit> [ "." <digit>* ] ]

Times are recorded in the form HH:MM, HH:MM:SS or HH:MM:SS.S, where
HH is the hour in 24-hour format, MM is minutes and SS is seconds.
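The <time> production as a parser returning seconds since midnight. The range checks (hours 00 to 23, minutes and seconds 00 to 59, leap seconds not handled) and the name `time_to_seconds` are assumptions of this sketch.

```python
import re

# HH:MM, optionally followed by :SS and an optional fraction.
_TIME = re.compile(r"([0-9]{2}):([0-9]{2})(?::([0-9]{2})(\.[0-9]*)?)?")


def time_to_seconds(token: str) -> float:
    """Convert HH:MM, HH:MM:SS or HH:MM:SS.S to seconds since midnight."""
    m = _TIME.fullmatch(token)
    if not m:
        raise ValueError("not a <time> value: " + token)
    hh, mm, ss, frac = m.groups()
    if int(hh) > 23 or int(mm) > 59 or (ss is not None and int(ss) > 59):
        raise ValueError("time component out of range: " + token)
    total = float(int(hh) * 3600 + int(mm) * 60)
    if ss is not None:
        total += int(ss)
    if frac:
        total += float("0" + frac)  # ".5" becomes 0.5; a bare "." adds 0
    return total
```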

String

<string> = '"' <schar>* '"'

<schar> = <xchar> | '"' '"'

Strings are output in quoted form. If a string contains a quotation
mark character, the character is doubled. This format is unambiguous
since fields are by definition separated by whitespace.

No mechanism for incorporating control characters is defined.
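The <string> encoding and its inverse, as a sketch. The helper names `quote_field` and `unquote_field` are hypothetical; rejecting control characters reflects the statement that no mechanism for them is defined.

```python
def quote_field(value: str) -> str:
    """Wrap a value in quotation marks, doubling any embedded quotation mark."""
    if any(ord(c) < 0x20 or ord(c) == 0x7F for c in value):
        raise ValueError("control characters cannot be represented")
    return '"' + value.replace('"', '""') + '"'


def unquote_field(token: str) -> str:
    """Reverse quote_field, rejecting unpaired quotation marks."""
    if len(token) < 2 or token[0] != '"' or token[-1] != '"':
        raise ValueError("not a quoted string: " + token)
    body = token[1:-1]
    # Every quotation mark inside the body must belong to a doubled pair.
    if body.replace('""', "").count('"'):
        raise ValueError("unpaired quotation mark in: " + token)
    return body.replace('""', '"')
```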

Text

<text> = <char>*

The <text> type is used only by directives.

Name

<name> = <alpha> [ "." <alpha> ]*

DNS name.

Address

<address> = <integer> [ "." <integer> ]* [ ":" <integer> ]

Numeric IP address and optional port specifier.
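A sketch of splitting an <address> field into its numeric address and optional port. It accepts only the dotted-integer form given by the grammar above; `parse_address` is a hypothetical name.

```python
import re
from typing import Optional, Tuple

# <integer> [ "." <integer> ]* [ ":" <integer> ]
_ADDRESS = re.compile(r"([0-9]+(?:\.[0-9]+)*)(?::([0-9]+))?")


def parse_address(token: str) -> Tuple[str, Optional[int]]:
    """Return (numeric address, port or None)."""
    m = _ADDRESS.fullmatch(token)
    if not m:
        raise ValueError("not an <address> value: " + token)
    host, port = m.groups()
    return host, (int(port) if port is not None else None)
```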

Acknowledgements

Robert Thau provided useful advice and some code. John Mallery and
Roger Hurwitz helped develop many of the ideas.


Phillip M. Hallam-Baker
hallam@w3.org
World Wide Web Consortium
Cambridge MA

Brian Behlendorf
brian@organic.com
Organic Online


References

[RFC1808]
R. Fielding, _Relative Uniform Resource Locators_, June 1995.

[RFC1738]
T. Berners-Lee, L. Masinter, _Uniform Resource Locators (URL)_,
December 1994.

[Luotonen95]
A. Luotonen, _The Common Logfile Format_, 1995,
http://www.w3.org/pub/WWW/Daemon/User/Config/Logging.html

[Hallam96a]
P. M. Hallam-Baker, _Session Identification URI_, World Wide Web
Consortium Working Draft, WD-session-id.

[Hallam96b]
P. M. Hallam-Baker, _Notification for Proxy Caches_, World Wide
Web Consortium Working Draft, WD-proxy.