INTERNET DRAFT Phillip M. Hallam-Baker, W3C
|
3 |
|
|
Expires in six months email: <hallam@w3.org>
|
4 |
|
|
Brian Behlendorf
|
5 |
|
|
email: <brian@organic.com>
|
6 |
|
|
21st February 1996
|
7 |
|
|
|
8 |
|
|
|
9 |
|
|
Extended Log File Format
|
10 |
|
|
|
11 |
|
|
<draft-hallam-http-logfile-00.txt>
|
12 |
|
|
|
13 |
|
|
Status of this Memo
|
14 |
|
|
|
15 |
|
|
This document is an Internet draft. Internet drafts are working
|
16 |
|
|
documents of the Internet Engineering Task Force (IETF), its areas
|
17 |
|
|
and its working groups. Note that other groups may also distribute
|
18 |
|
|
working information as Internet drafts.
|
19 |
|
|
|
20 |
|
|
Internet Drafts are draft documents valid for a maximum of six
|
21 |
|
|
months and can be updated, replaced or obsoleted by other documents
|
22 |
|
|
at any time. It is inappropriate to use Internet drafts as reference
|
23 |
|
|
material or to cite them other than as "work in progress".
|
24 |
|
|
|
25 |
|
|
To learn the current status of any Internet draft please check the
|
26 |
|
|
"1id-abstracts.txt" listing contained in the Internet drafts shadow
|
27 |
|
|
directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
|
28 |
|
|
munnari.oz.au (Pacific Rim), ds.internic.net (US East coast) or
|
29 |
|
|
ftp.isi.edu (US West coast). Further information about the IETF can
|
30 |
|
|
be found at URL: http://www.cnri.reston.va.us/
|
31 |
|
|
|
32 |
|
|
Distribution of this document is unlimited. Please send comments to
|
33 |
|
|
the HTTP working group (HTTP-WG) of the Internet Engineering Task
|
34 |
|
|
Force (IETF) at <http://www.ics.uci.edu/pub/ietf/http/>. This note
|
35 |
|
|
is also available as a World Wide Web Consortium Working Draft
|
36 |
|
|
WD-logfile-960221, archived at
|
37 |
|
|
http://www.w3.org/pub/WWW/TR/WD-logfile-960221.html
|
38 |
|
|
|
39 |
|
|
Extended Log File Format
|
40 |
|
|
|
41 |
|
|
WD-logfile-960221
|
42 |
|
|
|
43 |
|
|
Extended Log File Format
|
44 |
|
|
|
45 |
|
|
W3C Working Draft _WD-logfile-960221_
|
46 |
|
|
|
47 |
|
|
This version:
|
48 |
|
|
http://www.w3.org/pub/WWW/TR/WD-logfile-960221.html
|
49 |
|
|
|
50 |
|
|
Latest version:
|
51 |
|
|
http://www.w3.org/pub/WWW/TR/WD-logfile.html
|
52 |
|
|
|
53 |
|
|
Authors:
|
54 |
|
|
Phillip M. Hallam-Baker <hallam@w3.org>
|
55 |
|
|
|
56 |
|
|
Phillip M. Hallam-Baker Page 1
|
62 |
|
|
Brian Behlendorf <brian@organic.com>
|
63 |
|
|
|
64 |
|
|
------------------------------------------------------------------------------
|
65 |
|
|
Status of this document
|
66 |
|
|
|
67 |
|
|
This is a W3C Working Draft for review by W3C members and other
|
68 |
|
|
interested parties. It is a draft document and may be updated,
|
69 |
|
|
replaced or obsoleted by other documents at any time. It is
|
70 |
|
|
inappropriate to use W3C Working Drafts as reference material or to
|
71 |
|
|
cite them as other than "work in progress". A list of current W3C
|
72 |
|
|
working drafts can be found at: http://www.w3.org/pub/WWW/TR
|
73 |
|
|
|
74 |
|
|
Note: since working drafts are subject to frequent change, you are
|
75 |
|
|
advised to reference the above URL, rather than the URLs for working
|
76 |
|
|
drafts themselves.
|
77 |
|
|
|
122 |
|
|
Abstract
|
123 |
|
|
|
124 |
|
|
An improved format for Web server log files is presented. The format
|
125 |
|
|
is extensible, permitting a wider range of data to be captured. This
|
126 |
|
|
proposal is motivated by the need to capture a wider range of data
|
127 |
|
|
for demographic analysis and also the needs of proxy caches.
|
128 |
|
|
|
129 |
|
|
Introduction
|
130 |
|
|
|
131 |
|
|
Most Web servers offer the option to store logfiles in either the
|
132 |
|
|
common log format or a proprietary format. The common log file
|
133 |
|
|
format is supported by the majority of analysis tools but the
|
134 |
|
|
information about each server transaction is fixed. In many cases it
|
135 |
|
|
is desirable to record more information. Sites sensitive to personal
|
136 |
|
|
data issues may wish to omit the recording of certain data. In
|
137 |
|
|
addition, ambiguities arise in analysing the common log file format
|
138 |
|
|
since field separator characters may in some cases occur within
|
139 |
|
|
fields. The extended log file format is designed to meet the
|
140 |
|
|
following needs:
|
141 |
|
|
|
142 |
|
|
* Permit control over the data recorded.
|
143 |
|
|
|
144 |
|
|
* Support the needs of proxies, clients and servers in a common format.
|
145 |
|
|
|
146 |
|
|
* Provide robust handling of character escaping issues.
|
147 |
|
|
|
148 |
|
|
* Allow exchange of demographic data.
|
149 |
|
|
|
150 |
|
|
* Allow summary data to be expressed.
|
151 |
|
|
|
152 |
|
|
The log file format described permits customized logfiles to be
|
153 |
|
|
recorded in a format readable by generic analysis tools. A header
|
154 |
|
|
specifying the data types recorded is written out at the start of
|
155 |
|
|
each log.
|
156 |
|
|
|
157 |
|
|
This work is in part motivated by the need to support collection of
|
158 |
|
|
demographic data. This work is discussed at greater length in
|
159 |
|
|
companion drafts describing session identifier URIs [Hallam96a] and
|
160 |
|
|
more consistent proxy behaviour [Hallam96b].
|
161 |
|
|
|
162 |
|
|
Format
|
163 |
|
|
|
164 |
|
|
An extended log file contains a sequence of _lines_ containing ASCII
|
165 |
|
|
characters terminated by either the sequence CR or CRLF. Log file
|
166 |
|
|
generators should follow the line termination convention for the
|
167 |
|
|
platform on which they are executed. Analysers should accept either
|
168 |
|
|
form. Each line may contain either a _directive_ or an _entry_.
|
169 |
|
|
|
170 |
|
|
Entries consist of a sequence of _fields_ relating to a single HTTP
|
171 |
|
|
transaction. Fields are separated by whitespace; the use of tab
|
172 |
|
|
characters for this purpose is encouraged. If a field is unused in a
|
173 |
|
|
particular entry, a dash "-" marks the omitted field. Directives record
|
174 |
|
|
information about the logging process itself.
|
175 |
|
|
|
182 |
|
|
The following directives are defined:
|
183 |
|
|
|
184 |
|
|
Version: _<integer>_._<integer>_
|
185 |
|
|
The version of the extended log file format used. This draft
|
186 |
|
|
defines version 1.0.
|
187 |
|
|
|
188 |
|
|
Fields: [_<specifier>_...]
|
189 |
|
|
Specifies the fields recorded in the log.
|
190 |
|
|
|
191 |
|
|
Software: _<string>_
|
192 |
|
|
Identifies the software which generated the log.
|
193 |
|
|
|
194 |
|
|
Start-Date: _<date>_ _<time>_
|
195 |
|
|
The date and time at which the log was started.
|
196 |
|
|
|
197 |
|
|
End-Date: _<date>_ _<time>_
|
198 |
|
|
The date and time at which the log was finished.
|
199 |
|
|
|
200 |
|
|
Date: _<date>_ _<time>_
|
201 |
|
|
The date and time at which the entry was added.
|
202 |
|
|
|
203 |
|
|
Remark: _<text>_
|
204 |
|
|
Comment information. Data recorded in this field should be
|
205 |
|
|
ignored by analysis tools.
|
206 |
|
|
|
207 |
|
|
The directives Version and Fields are required and should precede
|
208 |
|
|
all entries in the log. The Fields directive specifies the data
|
209 |
|
|
recorded in the fields of each entry.
|
210 |
|
|
|
211 |
|
|
Example
|
212 |
|
|
|
213 |
|
|
The following is an example file in the extended log format:
|
214 |
|
|
|
215 |
|
|
|
216 |
|
|
#Version: 1.0
|
217 |
|
|
#Date: 1996-01-12
|
218 |
|
|
#Fields: time cs-method cs-uri
|
219 |
|
|
00:34:23 GET /foo/bar.html
|
220 |
|
|
12:21:16 GET /foo/bar.html
|
221 |
|
|
12:45:52 GET /foo/bar.html
|
222 |
|
|
12:57:34 GET /foo/bar.html
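A minimal reader for a file like the one above might look as follows. This is only a sketch: the function name `parse_extended_log` and the returned data layout are assumptions made for illustration, and fields are split on plain whitespace, so quoted strings containing spaces are not handled here.

```python
def parse_extended_log(text):
    """Split an extended log into a directive map and a list of entries."""
    directives = {}
    fields = []
    entries = []
    # str.splitlines() accepts CR, LF and CRLF, so either termination works.
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("#"):
            # Directive line: "#Name: value"
            name, _, value = line[1:].partition(":")
            name, value = name.strip(), value.strip()
            directives[name] = value
            if name == "Fields":
                fields = value.split()
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # malformed entry: skip it and carry on with the next line
        # A lone dash marks an omitted field.
        entries.append(
            {f: (None if v == "-" else v) for f, v in zip(fields, values)}
        )
    return directives, entries


sample = (
    "#Version: 1.0\r\n"
    "#Fields: time cs-method cs-uri\r\n"
    "00:34:23 GET /foo/bar.html\r\n"
    "12:21:16 GET -\r\n"
)
directives, entries = parse_extended_log(sample)
print(directives["Version"], entries)
```

Keying each entry by the names listed in the Fields directive lets a generic tool read logs whose columns differ from site to site.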
|
223 |
|
|
|
224 |
|
|
|
225 |
|
|
Fields
|
226 |
|
|
|
227 |
|
|
The #Fields directive lists a sequence of _field identifiers_
|
228 |
|
|
specifying the information recorded in each entry. Field identifiers
|
229 |
|
|
may have one of the following forms:
|
230 |
|
|
|
231 |
|
|
_identifier_
|
232 |
|
|
Identifier relates to the transaction as a whole.
|
233 |
|
|
|
234 |
|
|
_prefix_-_identifier_
|
242 |
|
|
Identifier relates to information transfer between parties
|
243 |
|
|
defined by the value _prefix_.
|
244 |
|
|
|
245 |
|
|
_prefix_(_header_)
|
246 |
|
|
Identifies the value of the HTTP header field _header_ for
|
247 |
|
|
transfer between parties defined by the value _prefix_. Fields
|
248 |
|
|
specified in this manner always have the type <string>.
|
249 |
|
|
|
250 |
|
|
The following prefixes are defined:
|
251 |
|
|
|
252 |
|
|
cs
|
253 |
|
|
Client to Server.
|
254 |
|
|
|
255 |
|
|
sc
|
256 |
|
|
Server to Client.
|
257 |
|
|
|
258 |
|
|
sr
|
259 |
|
|
Server to Remote Server; this prefix is used by proxies.
|
260 |
|
|
|
261 |
|
|
rs
|
262 |
|
|
Remote Server to Server; this prefix is used by proxies.
|
263 |
|
|
|
264 |
|
|
x
|
265 |
|
|
Application specific identifier.
|
266 |
|
|
|
267 |
|
|
The identifier cs-method thus refers to the method in the request
|
268 |
|
|
sent by the client to the server while sc(Content-Type) refers to
|
269 |
|
|
the content type field of the reply.
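The three identifier forms can be told apart mechanically. The sketch below is illustrative only: `parse_field_identifier` and the `PREFIXES` table are hypothetical names, and any leading word that is not a defined prefix is assumed to belong to the identifier itself (as in time-taken).

```python
import re

# Prefixes defined by this draft, with their meanings.
PREFIXES = {
    "cs": "client to server",
    "sc": "server to client",
    "sr": "server to remote server",
    "rs": "remote server to server",
    "x": "application specific",
}

_HEADER_FORM = re.compile(r"([a-z]+)\(([^()]+)\)")


def parse_field_identifier(spec):
    """Return (prefix, identifier, header); parts that do not apply are None."""
    match = _HEADER_FORM.fullmatch(spec)
    if match and match.group(1) in PREFIXES:
        return match.group(1), None, match.group(2)
    prefix, dash, rest = spec.partition("-")
    if dash and prefix in PREFIXES:
        return prefix, rest, None
    # No recognised prefix: the whole text is a plain identifier.
    return None, spec, None


for spec in ("cs-method", "sc(Content-Type)", "time-taken", "date"):
    print(spec, parse_field_identifier(spec))
```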
|
270 |
|
|
|
271 |
|
|
Identifiers.
|
272 |
|
|
|
273 |
|
|
The following identifiers do not require a prefix:
|
274 |
|
|
|
275 |
|
|
date
|
276 |
|
|
Date at which transaction completed, field has type <date>
|
277 |
|
|
|
278 |
|
|
time
|
279 |
|
|
Time at which transaction completed, field has type <time>
|
280 |
|
|
|
281 |
|
|
time-taken
|
282 |
|
|
Time taken for transaction to complete in seconds, field has
|
283 |
|
|
type <fixed>
|
284 |
|
|
|
285 |
|
|
bytes
|
286 |
|
|
Bytes transferred, field has type <integer>
|
287 |
|
|
|
288 |
|
|
cached
|
289 |
|
|
Records whether a cache hit occurred, field has type <integer>; 0
|
290 |
|
|
indicates a cache miss.
|
291 |
|
|
|
292 |
|
|
The following identifiers require a prefix:
|
293 |
|
|
|
294 |
|
|
ip
|
302 |
|
|
IP address and port, field has type <address>
|
303 |
|
|
|
304 |
|
|
dns
|
305 |
|
|
DNS name, field has type <name>
|
306 |
|
|
|
307 |
|
|
status
|
308 |
|
|
Status code, field has type <integer>
|
309 |
|
|
|
310 |
|
|
comment
|
311 |
|
|
Comment returned with status code, field has type <text>
|
312 |
|
|
|
313 |
|
|
method
|
314 |
|
|
Method, field has type <name>
|
315 |
|
|
|
316 |
|
|
uri
|
317 |
|
|
URI, field has type <uri>
|
318 |
|
|
|
319 |
|
|
uri-stem
|
320 |
|
|
Stem portion alone of URI (omitting query), field has type <uri>
|
321 |
|
|
|
322 |
|
|
uri-query
|
323 |
|
|
Query portion alone of URI, field has type <uri>
|
324 |
|
|
|
325 |
|
|
host
|
326 |
|
|
DNS hostname used, field has type <name>
|
327 |
|
|
|
328 |
|
|
Special fields for log summaries.
|
329 |
|
|
|
330 |
|
|
Analysis tools may generate log summaries. A log summary lists the
|
331 |
|
|
number of operations performed on a URI but omits information
|
332 |
|
|
specific to a particular transaction.
|
333 |
|
|
|
334 |
|
|
The following field is mandatory and must precede all others:
|
335 |
|
|
|
336 |
|
|
count
|
337 |
|
|
The number of entries for which the listed data applies, field has type <integer>
|
339 |
|
|
|
340 |
|
|
The following fields may be used in place of time to allow
|
341 |
|
|
aggregation of log file entries over intervals of time.
|
342 |
|
|
|
343 |
|
|
time-from
|
344 |
|
|
Time at which sampling began, field has type <time>
|
345 |
|
|
|
346 |
|
|
time-to
|
347 |
|
|
Time at which sampling ended, field has type <time>
|
348 |
|
|
|
349 |
|
|
interval
|
350 |
|
|
Time over which sampling occurred in seconds, field has type
|
351 |
|
|
<integer>
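As an illustration of how an analysis tool might emit a summary, the sketch below counts entries per URI and writes count as the first field. The draft gives no example summary, so the exact layout, the function name `summarize` and the choice of cs-uri as the grouping field are all assumptions.

```python
from collections import Counter


def summarize(entries, key="cs-uri"):
    """Build summary lines with 'count' as the leading field."""
    counts = Counter(entry[key] for entry in entries)
    lines = ["#Version: 1.0", "#Fields: count " + key]
    # Sort by value so the output is stable from run to run.
    lines += [f"{n} {value}" for value, n in sorted(counts.items())]
    return "\n".join(lines)


entries = [
    {"cs-uri": "/foo/bar.html"},
    {"cs-uri": "/index.html"},
    {"cs-uri": "/foo/bar.html"},
]
print(summarize(entries))
```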
|
352 |
|
|
|
353 |
|
|
Entries
|
354 |
|
|
|
362 |
|
|
This section describes the data formats for log file field entries.
|
363 |
|
|
These formats are chosen so as to avoid ambiguity, minimize the
|
364 |
|
|
difficulty of generation and parsing and provide for human
|
365 |
|
|
readability.
|
366 |
|
|
|
367 |
|
|
Each logfile entry consists of a sequence of fields separated by
|
368 |
|
|
whitespace and terminated by a CR or CRLF sequence. The meanings of
|
369 |
|
|
the fields are defined by a preceding #Fields directive. If a field
|
370 |
|
|
is omitted for a particular entry, a single dash "-" is substituted.
|
371 |
|
|
|
372 |
|
|
Log file parsers should be tolerant of errors. If an entry contains
|
373 |
|
|
corrupt data or is terminated unexpectedly the parser should
|
374 |
|
|
resynchronize using the end of line marker and continue to parse the
|
375 |
|
|
following entries. Entries must not contain any ASCII control
|
376 |
|
|
characters.
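One way to realise this tolerance is to validate each line on its own, so that a damaged entry never affects the lines after it. The sketch below rests on two assumptions: tab is exempted from the control character check because it is the encouraged field separator, and the helper names `is_clean_entry` and `resynchronize` are hypothetical.

```python
def is_clean_entry(line, field_count):
    """True if the line has no control characters and the expected field count."""
    for ch in line:
        # Reject ASCII control characters; tab is allowed as a separator.
        if (ord(ch) < 32 and ch != "\t") or ord(ch) == 127:
            return False
    return len(line.split()) == field_count


def resynchronize(lines, field_count):
    """Keep well-formed entries and count the ones that were discarded."""
    good, rejected = [], 0
    for line in lines:
        if is_clean_entry(line, field_count):
            good.append(line.split())
        else:
            rejected += 1  # resume parsing at the next end of line
    return good, rejected


lines = ["00:34:23 GET /a", "12:00:00 GE\x00T /b", "12:21", "12:45:52\tGET\t/c"]
print(resynchronize(lines, 3))
```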
|
377 |
|
|
|
378 |
|
|
|
379 |
|
|
<entry> = <field>* <end-of-line>
|
380 |
|
|
|
381 |
|
|
<field> = <integer> | <fixed> | <uri> | <date> | <time> | <string>
|
382 |
|
|
|
383 |
|
|
<digit> = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
|
384 |
|
|
|
385 |
|
|
|
386 |
|
|
Integer
|
387 |
|
|
|
388 |
|
|
|
389 |
|
|
<integer> = <digit>+
|
390 |
|
|
|
391 |
|
|
|
392 |
|
|
Integers are represented as a sequence of digits.
|
393 |
|
|
|
394 |
|
|
Fixed Format Float
|
395 |
|
|
|
396 |
|
|
|
397 |
|
|
<fixed> = <digit>+ ["." <digit>*]
|
398 |
|
|
|
399 |
|
|
|
400 |
|
|
URI
|
401 |
|
|
|
402 |
|
|
A URI as specified by RFC1738; relative URIs are specified by
|
403 |
|
|
RFC1808. URIs cannot by definition include whitespace or ASCII
|
404 |
|
|
control characters. Consequently no ambiguity arises from their use.
|
405 |
|
|
|
406 |
|
|
Date
|
407 |
|
|
|
408 |
|
|
|
409 |
|
|
<date> = <digit> <digit> <digit> <digit>
|
410 |
|
|
"-" <digit> <digit> "-" <digit> <digit>
|
411 |
|
|
|
412 |
|
|
|
413 |
|
|
Dates are recorded in the format YYYY-MM-DD where YYYY, MM and DD
|
414 |
|
|
stand for the numeric year, month and day respectively. This format
|
422 |
|
|
is chosen to assist collation using sort.
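The collation property can be checked directly: because the year comes first and every component has a fixed width, a plain text sort gives the same order as a calendar sort. The dates below are made up for illustration.

```python
from datetime import date

stamps = ["1996-02-21", "1995-12-31", "1996-01-12"]

by_text = sorted(stamps)                              # plain string comparison
by_calendar = sorted(stamps, key=date.fromisoformat)  # true chronological order

print(by_text == by_calendar)
```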
|
423 |
|
|
|
424 |
|
|
Time
|
425 |
|
|
|
426 |
|
|
|
427 |
|
|
<time> = <digit> <digit> ":" <digit> <digit>
|
428 |
|
|
[":" <digit> <digit> ["." <digit>*]]
|
429 |
|
|
|
430 |
|
|
|
431 |
|
|
|
432 |
|
|
Times are recorded in the form HH:MM, HH:MM:SS or HH:MM:SS.S where
|
433 |
|
|
HH is the hour in 24 hour format, MM is minutes and SS is seconds.
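A parser for the three time forms might be sketched as below. The name `parse_time`, the choice to return seconds since midnight, and the range checks on hours, minutes and seconds are assumptions that go beyond the grammar, which constrains only the digit counts.

```python
import re

# HH:MM, optionally :SS, optionally followed by a fractional part.
_TIME = re.compile(r"([0-9]{2}):([0-9]{2})(?::([0-9]{2})(\.[0-9]*)?)?")


def parse_time(text):
    """Return seconds since midnight as a float, or None if malformed."""
    match = _TIME.fullmatch(text)
    if match is None:
        return None
    hours, minutes = int(match.group(1)), int(match.group(2))
    seconds = int(match.group(3) or 0)
    fraction = float("0" + match.group(4)) if match.group(4) else 0.0
    if hours > 23 or minutes > 59 or seconds > 59:
        return None
    return hours * 3600 + minutes * 60 + seconds + fraction


for text in ("00:34:23", "12:21", "12:21:16.5", "1:2"):
    print(text, parse_time(text))
```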
|
434 |
|
|
|
435 |
|
|
String
|
436 |
|
|
|
437 |
|
|
|
438 |
|
|
<string> = '"' <schar>* '"'
|
439 |
|
|
|
440 |
|
|
<schar> = <xchar> | '"' '"'
|
441 |
|
|
|
442 |
|
|
|
443 |
|
|
Strings are output in quoted form. If a string contains a quotation
|
444 |
|
|
character the character is repeated. This format is unambiguous
|
445 |
|
|
since fields are by definition separated by whitespace.
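The quoting rule can be sketched as a pair of helpers plus a field splitter that keeps quoted strings whole. All three names (`quote`, `unquote`, `split_fields`) are hypothetical, and the splitter assumes a quoted field always begins with a quotation character.

```python
def quote(value):
    """Write a string in quoted form, doubling embedded quotation characters."""
    return '"' + value.replace('"', '""') + '"'


def unquote(field):
    """Reverse quote(); raises ValueError if the field is not quoted."""
    if len(field) < 2 or field[0] != '"' or field[-1] != '"':
        raise ValueError("not a quoted string: %r" % field)
    return field[1:-1].replace('""', '"')


def split_fields(line):
    """Split an entry on whitespace while keeping quoted strings intact."""
    fields, i, n = [], 0, len(line)
    while i < n:
        if line[i].isspace():
            i += 1
            continue
        start = i
        if line[i] == '"':
            i += 1
            while i < n:
                if line[i] == '"':
                    if i + 1 < n and line[i + 1] == '"':
                        i += 2  # doubled quote: one literal quotation character
                        continue
                    i += 1  # closing quote
                    break
                i += 1
        else:
            while i < n and not line[i].isspace():
                i += 1
        fields.append(line[start:i])
    return fields


line = "12:00:01 " + quote('Mozilla 2.0 "beta"') + " 200"
print(split_fields(line))
```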
|
446 |
|
|
|
447 |
|
|
No mechanism for incorporating control characters is defined.
|
448 |
|
|
|
449 |
|
|
Text
|
450 |
|
|
|
451 |
|
|
|
452 |
|
|
<text> = <char>*
|
453 |
|
|
|
454 |
|
|
|
455 |
|
|
The text field is used only by directives.
|
456 |
|
|
|
457 |
|
|
Name
|
458 |
|
|
|
459 |
|
|
|
460 |
|
|
<name> = <alpha> [ "." <alpha> * ]
|
461 |
|
|
|
462 |
|
|
|
463 |
|
|
DNS name.
|
464 |
|
|
|
465 |
|
|
Address
|
466 |
|
|
|
467 |
|
|
|
468 |
|
|
<address> = <integer> [ "." <integer> * ] [ ":" <integer> ]
|
469 |
|
|
|
470 |
|
|
|
471 |
|
|
Numeric IP address and optional port specifier.
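Splitting such a field into address and port could look like the sketch below. It assumes a dotted quad of four components, each at most 255, which is stricter than the grammar above; `parse_address` is a hypothetical name.

```python
def _is_number(text):
    # Non-empty and made of ASCII digits only.
    return text.isascii() and text.isdigit()


def parse_address(text):
    """Return (address, port); port is None when no port is given."""
    host, colon, port = text.partition(":")
    parts = host.split(".")
    if len(parts) != 4 or not all(_is_number(p) and int(p) <= 255 for p in parts):
        raise ValueError("bad address: %r" % text)
    if colon and not _is_number(port):
        raise ValueError("bad port: %r" % text)
    return host, (int(port) if colon else None)


print(parse_address("18.23.0.22:80"))
print(parse_address("18.23.0.22"))
```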
|
472 |
|
|
|
473 |
|
|
Acknowledgements.
|
474 |
|
|
|
482 |
|
|
Robert Thau provided useful advice and some code. John Mallery and
|
483 |
|
|
Roger Hurwitz helped develop many of the ideas.
|
484 |
|
|
|
485 |
|
|
|
486 |
|
|
Phillip M. Hallam-Baker
|
487 |
|
|
hallam@w3.org
|
488 |
|
|
World Wide Web Consortium
|
489 |
|
|
Cambridge MA
|
490 |
|
|
|
491 |
|
|
Brian Behlendorf
|
492 |
|
|
brian@organic.com
|
493 |
|
|
Organic Online
|
494 |
|
|
|
495 |
|
|
|
496 |
|
|
References.
|
497 |
|
|
|
498 |
|
|
|
499 |
|
|
[RFC1808]
|
500 |
|
|
R. Fielding, _Relative Uniform Resource Locators_, June 1995
|
501 |
|
|
|
502 |
|
|
[RFC1738]
|
503 |
|
|
T. Berners-Lee, L. Masinter, _Uniform Resource Locators (URL)_,
|
504 |
|
|
December 1994
|
505 |
|
|
|
506 |
|
|
[Luotonen95]
|
507 |
|
|
A. Luotonen, _The Common Logfile Format_, 1995,
|
508 |
|
|
http://www.w3.org/pub/WWW/Daemon/User/Config/Logging.html
|
509 |
|
|
|
510 |
|
|
[Hallam96a]
|
511 |
|
|
P. M. Hallam-Baker, _Session Identification URI_, World Wide Web
|
512 |
|
|
Consortium Working Draft, WD-session-id.
|
513 |
|
|
|
514 |
|
|
[Hallam96b]
|
515 |
|
|
P. M. Hallam-Baker, _Notification for Proxy Caches_, World Wide
|
516 |
|
|
Web Consortium Working Draft, WD-proxy.
|