Session Identification URI

INTERNET DRAFT                     Phillip M. Hallam-Baker, W3C
Expires in six months              email: <hallam@w3.org>
                                   Dan Connolly, W3C
                                   email: <connolly@w3.org>
                                   21st February 1996


                     Session Identification URI

               <draft-hallam-http-session-id-00.txt>

Status of this Memo

This document is an Internet draft. Internet drafts are working
documents of the Internet Engineering Task Force (IETF), its areas
and its working groups. Note that other groups may also distribute
working information as Internet drafts.

Internet Drafts are draft documents valid for a maximum of six
months and can be updated, replaced or obsoleted by other documents
at any time. It is inappropriate to use Internet drafts as reference
material or to cite them other than as "work in progress".

To learn the current status of any Internet draft please check the
"1id-abstracts.txt" listing contained in the Internet drafts shadow
directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East coast) or
ftp.isi.edu (US West coast). Further information about the IETF can
be found at URL: http://www.cnri.reston.va.us/

Distribution of this document is unlimited. Please send comments to
the HTTP working group (HTTP-WG) of the Internet Engineering Task
Force (IETF) at <http://www.ics.uci.edu/pub/ietf/http/>. This note
is also available as a World Wide Web Consortium Working Draft
WD-session-id-960221, archived at
http://www.w3.org/pub/WWW/TR/WD-session-id-960221.html

Abstract

A Uniform Resource Identifier for identifying HTTP sessions is
described. Session identification URIs permit HTTP transactions to
be linked within a limited domain. This provides a balance between
the needs of commercial servers for demographic data collection and
the privacy concerns of users. In addition session identification
URIs may be used as part of a high security authentication mechanism
to prevent replay attacks.

Introduction

HTTP is specified as a stateless protocol. This permits HTTP servers
to handle a large number of simultaneous requests. The stateless
nature of HTTP reduces its utility, however. It is not possible to
track user reading patterns on a single server, nor is it possible
for a server to adapt its behavior on the basis of previous
interactions.

The ability to trace the path of readers within a Web is important
for maintainers of larger sites. Trace information may be used to
analyze the efficacy of cross references within the site, and to
build profiles of typical users. If it is known, for example, that
readers of an online newspaper who visit the computer section are
very likely to also visit the business section, reporters might be
asked to provide more cross linkages between these sections.
Administrators may also wish to discover the number of users
visiting their site rather than the number of visits.

Advertising as Revenue for Web Content Providers.

Many content providers raise revenue through advertising.
Advertisers therefore need to know the effectiveness of Web based
advertising. Content providers who can provide advertisers with
detailed profiles of the readership of their material will be able
to charge higher rates. Reader profiling would permit those
advertisements most likely to obtain a response to be chosen.

A distinctive feature of the Web is its interactive nature. Gill
[Gill96] points out that the interactive nature of the Web may make
traditional models of "targeted" advertising obsolete, replacing
them with participatory models. The Web is an information system and
users who wish to purchase goods are likely to use it to find out
details. It may be unnecessary to target advertising in an intrusive
manner (e.g. unsolicited email). As users become accustomed to more
participatory modes of advertising, intrusive methods may become
counterproductive.

There are many metrics which an advertiser may wish to use to assess
the value of a Web placement. These include:

   Hit counts
      The number of times an advertisement is downloaded. These
      roughly correspond to exposures as understood in conventional
      media.

   Referrals
      The number of times an advertising hyperlink is followed. This
      implies that the advertiser also has a Web site.

   Hot leads and Sales.
      Referrals which result in readers demonstrating a significant
      level of interest or which generate sales.

Referrals may be determined using the HTTP referer field which
informs a server of the URI of the resource which referred the
client to a resource. Unfortunately current log file formats do not
include this information. A companion document describes an
extension to the logfile format to record this data.

The number of hot leads and/or sales generated by a placement may be
determined by correlating trace data within the advertiser's home
Web site with the referer field. This procedure creates an
interesting correspondence of interest between the parties which
removes the need for conventional auditing. An advertiser might pay
the publisher according to the business generated by a placement. It
is in the advertiser's interest to be honest in determining the
amount paid since the publisher would determine placement frequency
according to the rate of return. This mechanism is of particular
interest for adverts targeted at a particular readership where
auditing may be difficult.

Privacy Concerns

Just because an advertiser is interested in information does not
mean that the user is willing to provide it. If care is not taken to
protect the privacy of its users the Web could enable more extensive
surveillance of its users than has been available to the most
ruthless dictatorships.

The Internet has a strongly developed but highly unpredictable
ethical sense. It is a medium of active participants, not of passive
consumers. Users may complain very publicly about perceived wrongs
(whether justified or not) via Usenet, which has a readership of
several millions. Privacy issues in particular are frequently
raised. Consequently it is advisable to approach the issue of
personal privacy cautiously.

Users may be prepared to exchange information about themselves in
return for access to content. Such systems may provide inaccurate
data, however. Users who believe their privacy to be threatened may
deliberately supply incorrect information, supplying a false address
and telephone number to prevent unsolicited mail and phone calls.

Personal data is often collected by financial institutions to serve
as a means of customer authentication. Disclosure of personal data
may therefore increase fraud risks.

Many countries have enacted privacy legislation which controls
storage and use of personal data. Sites which are governed by such
laws may wish to avoid unnecessary acquisition and recording of
personal data.

Although the Web has gained popularity as a publishing medium it was
conceived as a collaboration tool. As Turkel points out [Turkel96],
a part of the interest of cyberspace may be the ability to take on
different personas, the ability to voice unpopular views without
risk. Such partitioning of identity requires the ability to separate
online activity from offline activity and online activity at one
site from activity at another. The Web should therefore permit users
to take on new cyberspace identities through use of pseudonyms and
the boundaries between these identities must be carefully protected.

Privacy or lack thereof has often been an unanticipated consequence
of a particular technology. Early telephone users had little privacy
since every conversation could be overheard by the operator. This
led directly to the automatic exchange, which was invented by an
undertaker whose rival was stealing his business by bribing
telephone operators.

Transactions in the HTTP/1.0 protocol are disjoint. A single request
is made which results in a single response, after which the operation
is completed and the TCP/IP connection closed. HTTP/1.1 allows
the same TCP/IP connection to be used to perform multiple
operations.

Pseudo Session Identifiers.

IP addresses and ports may be used to provide pseudo identifiers for
analysis of demographic data. The usefulness of such identifiers is
severely limited. It is not possible to differentiate two users
timesharing on a single machine. Nor do users necessarily use the
same IP address each time. The value of IP addresses for analysis is
rapidly decreasing due to the growing use of proxies and dynamic IP
address assignment. These trends will be exacerbated by new
developments such as mobile IP.

Although these pseudo session identifiers are unreliable and
unsatisfactory they should be taken into consideration when
considering the privacy issues raised by this proposal. In
particular it is unnecessary to provide exhaustive proofs that
certain forms of linkage cannot be achieved where this is possible
through similar analysis of IP addresses and ports.

Relationship to State-Info (Cookies)

State-Info [Kristol95] is a proposed extension to the HTTP
protocol. It is a refinement of the Netscape "Cookies" proposal
[Netscape95]. This mechanism permits a server to generate a token
which a client returns with future requests. This
mechanism requires clients to store data for every server visited
and is consequently unusable as a tracking mechanism unless the
number of sites using it is small. In the Session Identifier URI
proposal identifiers are generated by clients, not servers. This
provides for scalability since a client need only store a fixed
amount of identifier information regardless of the number of sites
visited.

URI Format

Session IDs have the form:

   SID:_type_:_realm_:_identifier_[-_thread_][:_count_]

Where the fields _type_, _realm_, _identifier_, _thread_ and _count_
are defined as follows:

   type
      Type of session identifier. This field allows other session
      identifier types to be defined. This draft specifies the
      identifier type "ANON".

   realm
      Specifies the realm within which linkage of the identifier is
      possible. Realms have the same format as DNS names.

   identifier
      Unstructured random integer specific to realm generated using a
      procedure with a negligible probability of collision. The
      identifier is encoded using base 64.

   thread
      Optional extension of identifier field used to differentiate
      concurrent uses of the same session identifier. The thread field
      is an integer encoded in hexadecimal.

   count
      Optional hexadecimal encoded integer containing a monotonically
      increasing counter value. A client should increment the count
      field after each operation.

Examples

The following example shows a sequence of session identifiers
created by the same client. Note that the same counter register is
used to generate all the session identifiers within the same thread.

   SID:ANON:www.w3.org:j6oAOxCWZh/CD723LGeXlf-01:34
   SID:ANON:mc.ai.mit.edu:NRviSpoYm7mdkYB4W2471l-01:35
   SID:ANON:www.w3.org:j6oAOxCWZh/CD723LGeXlf-01:36
   SID:ANON:mc.ai.mit.edu:NRviSpoYm7mdkYB4W2471l-01:37
   SID:ANON:www.w3.org:j6oAOxCWZh/CD723LGeXlf-02:01
   SID:ANON:www.w3.org:j6oAOxCWZh/CD723LGeXlf-01:38
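
The field layout described above can be illustrated with a small
parser. This is only a sketch: the names SessionId and parse_sid are
hypothetical, and the regular expression assumes that the identifier
uses the standard base 64 alphabet (which contains no "-" or ":"), so
the thread and count separators are unambiguous.

```python
import re
from typing import NamedTuple, Optional


class SessionId(NamedTuple):
    """Decoded fields of a session identification URI (hypothetical type)."""
    type: str
    realm: str
    identifier: str
    thread: Optional[int]
    count: Optional[int]


# SID:type:realm:identifier[-thread][:count]
_SID_RE = re.compile(
    r"^SID:(?P<type>[^:]+):(?P<realm>[^:]+):(?P<identifier>[A-Za-z0-9+/=]+)"
    r"(?:-(?P<thread>[0-9A-Fa-f]+))?(?::(?P<count>[0-9A-Fa-f]+))?$"
)


def parse_sid(uri: str) -> SessionId:
    """Split a session identifier into its fields, decoding hex integers."""
    m = _SID_RE.match(uri)
    if m is None:
        raise ValueError(f"not a session identification URI: {uri!r}")
    thread = int(m["thread"], 16) if m["thread"] is not None else None
    count = int(m["count"], 16) if m["count"] is not None else None
    return SessionId(m["type"], m["realm"], m["identifier"], thread, count)


if __name__ == "__main__":
    print(parse_sid("SID:ANON:www.w3.org:j6oAOxCWZh/CD723LGeXlf-01:34"))
```

Both optional fields decode to None when absent, which keeps "no
thread" distinct from thread zero.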

Limited Linkage of Session Identifiers.

Session Identifier URIs permit linkage of transactions within a
single _realm_. A realm may be considered to approximate to a DNS
name. DNS names correlate reasonably well with administrative
divisions. This allows a content provider to track activities within
sites on their network but does not permit data from different sites
to be correlated without specific user authorization in advance.

Prevention of Replay Attacks

Session Identifiers may also be used within a strong authentication
scheme to prevent replay attacks. A replay attack involves the
recording of authentic traffic and then replaying it at a later date.
For example Mallet might intercept Alice's request to download her
mail file on Monday and then replay it each day to receive the mail
for the rest of the week.

Replay attacks may be prevented by checking message timestamps.
Unfortunately this requires accurate and secure synchronisation of
clocks at both ends of the communication, which is difficult.
Alternatively a challenge/response sequence may be employed. This
introduces an additional round trip delay into the transaction and
requires the server to maintain a check of which challenges have
already been responded to.

The session identifier URI may be used to prevent replay attacks in
combination with a timestamp. The server maintains a record of each
identifier used and checks that subsequent requests with that
identifier have a higher count field. The volume of data storage
required may be minimized by checking that the timestamp falls
within an acceptable validity interval.
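
A server-side check along these lines might look as follows. This is
a sketch under stated assumptions rather than a specified algorithm:
the class name ReplayGuard, the 300 second default interval and the
choice to track the counter per (identifier, thread) pair are all
illustrative, and the request timestamp is assumed to arrive by some
authenticated means that this draft does not define.

```python
import time
from typing import Dict, Optional, Tuple


class ReplayGuard:
    """Rejects requests whose count is not higher than the last one seen."""

    def __init__(self, validity_seconds: float = 300.0) -> None:
        self.validity_seconds = validity_seconds
        # (identifier, thread) -> (highest count accepted, its timestamp)
        self._seen: Dict[Tuple[str, int], Tuple[int, float]] = {}

    def accept(self, identifier: str, thread: int, count: int,
               timestamp: float, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # A timestamp outside the validity interval is rejected outright.
        if abs(now - timestamp) > self.validity_seconds:
            return False
        # Entries older than the interval can be dropped: a replay of
        # such a request would already fail the timestamp check above.
        self._seen = {k: v for k, v in self._seen.items()
                      if now - v[1] <= self.validity_seconds}
        key = (identifier, thread)
        last = self._seen.get(key)
        if last is not None and count <= last[0]:
            return False  # count did not increase: treat as a replay
        self._seen[key] = (count, timestamp)
        return True
```

Discarding expired entries is what bounds the storage: the server
only remembers identifiers used within the validity interval.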

Implementation Issues

A standardized method of constructing session identifiers would
permit users to use the same session identification information on
different machines, avoiding the need to re-register with content
providers. This would also be convenient for content providers,
avoiding a user with more than one machine being counted twice. The
nature of the session identifiers prevents enforcement of such a
policy, however, and the following construction method is therefore
only advisory.

A convenient method of constructing session identifiers which does
not require separate storage for each realm visited is to use a
Message Authentication Code (MAC) based upon a cryptographically
secure one way function such as MD5 [Rivest92].

On initialization the client obtains a value _key_. This value
should be selected in a random manner so as to provide at least 128
bits of entropy. When a realm is visited the value of the
identifier field is created using the formula

   _identifier = MD5 (realm + key)_.

The client should store the value of _key_ and the counter value
associated with each thread.
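
The construction above can be sketched in a few lines. Several
details are assumptions made for illustration: "+" is read as byte
concatenation with the realm first, the realm is lowercased because
DNS names are case-insensitive, and base 64 padding is stripped so
that the result has the 22 character length of the identifiers shown
in the examples. MD5 is used only because the text names it; it is no
longer regarded as collision resistant, so a current implementation
would likely substitute a modern keyed digest.

```python
import base64
import hashlib
import secrets


def new_key() -> bytes:
    """Generate the client's secret _key_: 128 random bits."""
    return secrets.token_bytes(16)


def make_identifier(realm: str, key: bytes) -> str:
    """identifier = MD5(realm + key), encoded in base 64 without padding."""
    digest = hashlib.md5(realm.lower().encode("ascii") + key).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


if __name__ == "__main__":
    key = new_key()
    # The same key yields a stable identifier per realm, and unrelated
    # identifiers for different realms.
    print(make_identifier("www.w3.org", key))
    print(make_identifier("mc.ai.mit.edu", key))
```

Because the identifier is derived rather than stored, the client
keeps only the key and the per-thread counters however many realms it
visits.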

HTTP Integration

Session identifiers may be incorporated in HTTP messages using the
Session-Id header. The existing WWW-Authenticate header is extended
to permit use of session identifiers as a lightweight authentication
mechanism.

Session-Id

   Session-Id: _URI_

The Session-Id header may be incorporated in an HTTP request or
response. The header accepts a single parameter, the identifier URI.

Session identifiers are only created by clients. A Session-Id header
should only be present in a response if one was specified in the
corresponding request and should return the same session identifier
value as the request.

Example

The following example shows an HTTP request incorporating a session
identifier.

   GET / HTTP/1.0
   Accept: text/plain
   Accept: text/html
   Session-Id: SID:ANON:w3.org:j6oAOxCWZh/CD723LGeXlf-01:034
   User-Agent: libwww/4.1

A client supporting session identifier URIs should by default attach
a session identifier to every request using the DNS name of the
server as the realm. Clients must provide users with an option to
disable session identifier generation. Clients are encouraged to
provide a means of selecting the _realm -> identifier_ mapping.
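
The client default and the server echo rule described in this section
might be sketched as below. The function names, the "enabled" switch
and the two-digit lowercase hexadecimal formatting of thread and
count are assumptions for illustration; the draft does not fix a
field width or letter case.

```python
from typing import Dict, Optional


def session_id_value(realm: str, identifier: str, thread: int, count: int) -> str:
    """Build an ANON session identifier; thread and count are hex encoded."""
    return f"SID:ANON:{realm}:{identifier}-{thread:02x}:{count:02x}"


def request_headers(server: str, identifier: str, thread: int, count: int,
                    enabled: bool = True) -> Dict[str, str]:
    """Client side: attach Session-Id by default, with the server as realm."""
    headers = {"User-Agent": "example-client/0.1"}  # hypothetical agent name
    if enabled:  # the user must be able to switch identifiers off
        headers["Session-Id"] = session_id_value(server, identifier, thread, count)
    return headers


def response_session_id(request: Dict[str, str]) -> Optional[str]:
    """Server side: echo the request's identifier unchanged, or send none."""
    for name, value in request.items():
        if name.lower() == "session-id":  # header names are case-insensitive
            return value
    return None
```

The server never invents an identifier: when the request carried
none, the response carries none either.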

WWW-Authenticate

   WWW-Authenticate: _1#challenge_

The WWW-Authenticate header is used by a server to request that a
client provide a session identifier where none was given or to
specify one for an alternative realm. This mechanism permits linkage
of identifiers across realms, but only under user control.

Example

The following data shows a server requesting an identifier for the
realm "w3.org".

   HTTP/1.1 401 Unauthorized
   WWW-Authenticate: Session, realm=w3.org
   Server: libwww/4.1

Clients must not automatically respond to a WWW-Authenticate
challenge without user direction.

A client may offer the user a facility whereby requests for session
identifiers in alternative realms are automatically accepted provided
they are compatible. Realms may be considered compatible provided
they are a non trivial suffix of the server DNS name. For example, a
request from the server www.w3.org for a session identifier in the
realm w3.org would be regarded as compatible but requests for w3.com,
mit.edu or org would not. DNS names in the top level domains com,
edu, gov, mil and org may generally be considered non trivial
suffixes (the exclusion of net from this list is intentional). Other
DNS domains may be considered non trivial suffixes if they are below
the second level of the DNS hierarchy.
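
The compatibility rule can be made concrete with a short function.
This is one possible reading, not a normative definition: the name
is_compatible_realm is hypothetical, matching is done on whole DNS
labels so that the realm must be the trailing labels of the server
name, and a realm equal to the server name itself is assumed to be
compatible.

```python
# Top level domains under which a second-level name counts as non
# trivial. "net" is left out on purpose, as the text specifies.
GENERIC_TLDS = {"com", "edu", "gov", "mil", "org"}


def is_compatible_realm(server: str, realm: str) -> bool:
    """True if realm is a non trivial trailing part of the server name."""
    server_labels = server.lower().rstrip(".").split(".")
    realm_labels = realm.lower().rstrip(".").split(".")
    # The realm must consist of the final labels of the server name.
    if len(realm_labels) > len(server_labels):
        return False
    if server_labels[-len(realm_labels):] != realm_labels:
        return False
    # Under the listed domains two labels suffice (w3.org); elsewhere
    # the realm must lie below the second level (at least three labels).
    if realm_labels[-1] in GENERIC_TLDS:
        return len(realm_labels) >= 2
    return len(realm_labels) >= 3


if __name__ == "__main__":
    for realm in ("w3.org", "w3.com", "mit.edu", "org"):
        print(realm, is_compatible_realm("www.w3.org", realm))
```

Comparing labels rather than raw strings avoids accepting a realm
such as w3.org for an unrelated server named xw3.org.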

Security Considerations

Security considerations are discussed throughout this paper in
addition to this section.

Unintended Linkage

Collusion between sites may permit linkage of session identifiers
between realms. A server may permit linkage between identifiers
within its own realm and another by incorporating the identifier
component in a URI. The server www.w3.org receiving the session
identifier SID:ANON:www.w3.org:j6oAOxCWZh/CD723LGeXlf-01:34 could
construct an identifier
http://ai.mit.edu/link/j6oAOxCWZh/CD723LGeXlf. If the link was
followed the server ai.mit.edu would be able to track the user's
activity across both realms.

Unsafe Construction Techniques.

Care must be taken in constructing session identifiers. A keyed
digest technique known to be cryptographically sound is recommended.
In particular implementors should note that a number of techniques
for constructing MACs from ciphers using XOR functions are insecure
for this application.

Further Work

Data Escrow Agents.

The method for constructing session identification URIs described
provides only one possible compromise between privacy and tracking.
In particular no provision is made for supporting joint registration
services. Such services would permit a user to register demographic
details (age, sex, interests etc.) with a single server.

Data Escrow Agents support joint registration services without
compromising user privacy. A data escrow agent would capture
demographic data at a central location, and analyze content
providers' log files on their behalf. Escrow agents would be
responsible for preventing content providers receiving data detailed
enough to compromise user privacy.

In order to protect user privacy session identifiers must only be
linkable by the data escrow agent. This may be achieved using either
public key cryptography or message authentication codes.

In an implementation of a data escrow agent using public keys the
data escrow agent provides each content provider with the public
component of a public key pair. A user visiting a content provider's
site first creates a session identifier as if the escrow agent's
realm were to be visited then encrypts it using the content
provider's public key to create a session identifier specific to the
content provider. In order to analyze a log file the escrow agent
decrypts the session identifiers using the private portion of the
key.

In an implementation of a data escrow agent using a MAC, the user
provides the data escrow agent with demographic data indexed by a
session identifier keyed to the agent's realm. When contacting a
content provider the client constructs a session identifier using a
MAC of the session identifier keyed by the provider's realm. The
escrow agent may construct a linkage between the provider's logfiles
and entries in the escrowed database by calculating a MAC for every
entry in the database. Although this technique involves a larger
number of operations than the public key based scheme, these
operations are approximately four orders of magnitude faster.
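
The MAC based variant might be sketched as follows. The text says
only "a MAC"; HMAC with MD5 is substituted here as one concrete
choice, and the function names and the dictionary used as the
escrowed database are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, Iterable


def provider_identifier(escrow_identifier: bytes, provider_realm: str) -> bytes:
    """MAC of the escrow-realm identifier, keyed by the provider's realm."""
    return hmac.new(provider_realm.encode("ascii"), escrow_identifier,
                    hashlib.md5).digest()


def link_log(escrow_db: Dict[bytes, dict], provider_realm: str,
             logged_ids: Iterable[bytes]) -> Dict[bytes, dict]:
    """Escrow agent: match a provider's logged identifiers to escrowed records.

    One MAC is computed for every database entry, as the text describes,
    and the results are used as a lookup table for the log.
    """
    table = {provider_identifier(eid, provider_realm): record
             for eid, record in escrow_db.items()}
    return {sid: table[sid] for sid in logged_ids if sid in table}
```

Only a party holding the escrow-realm identifiers can build the
lookup table, so the provider's own log stays unlinkable to the
demographic records.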

Interaction with Proxies and Caches.

Many Web users browse the Web through a caching proxy. In many
countries this mode of operation is essential due to saturation of
international network connections. When a proxy serves a user from a
local cache the originating server has no knowledge of the
transaction. Consequently logfiles may be incomplete. This problem
is most serious for commercial sites which use hit counts as a
measure of readership.

A number of techniques may be used to prevent proxies from caching
data. This permits demographic data to be collected at the cost of
severely reducing network response. In a significant number of cases
this will prevent a user from receiving any data at all [Smith96].

A better solution is to provide a mechanism whereby a proxy supplies
a server on request with a log of hits served from the cache. Such
logs are potentially of value as an indication of audited
circulation, particularly if they were to be authenticated using a
digital signature technique. In some circumstances it may be
desirable for providers of such information to mask usernames by
using session identifiers. It is intended to address these issues in
a separate document.

Acknowledgments

Dave Raggett made the original proposal to use an anonymous session
identifier for capture of demographic data. Rohit Khare and Dan
Connolly helped refine many of the ideas. Roger Hurwitz and John
Mallery made many helpful comments on early versions of this draft.

Authors' Addresses

   Phillip M. Hallam-Baker
   hallam@w3.org
   World Wide Web Consortium
   Cambridge MA

   Dan Connolly
   connolly@w3.org
   World Wide Web Consortium
   Cambridge MA

References

[Netscape95]
   Netscape Communications Corp. _Persistent Client State HTTP
   Cookies_

[Hallam96]
   Phillip M. Hallam-Baker _Extended Log File Format_

[Kristol95]
   Kristol, D. _Proposed HTTP State-Info Mechanism_

[Connolly96]
   Dan Connolly _Proposals for Gathering Consumer Demographics_

[Hallam93]
   Phillip M. Hallam-Baker _Design note on HTTP referer field._
   Memo to Tim Berners-Lee.

[Smith96]
   Neil Smith, Address at MIT _Workshop on Internet Survey
   Methodology and Web Demographics_ January 29-30 1996. Cambridge
   Ma.

[Rivest92]
   Rivest, R., _"The MD5 Message-Digest Algorithm"_, RFC 1321, MIT
   and RSA Data Security, Inc., April 1992

[Berners-Lee96]
   Tim Berners-Lee, Roy T. Fielding, and Henrik Frystyk Nielsen.
   _Hypertext Transfer Protocol -- HTTP/1.0_

[Gill96]
   Neil Smith, Address at MIT _Workshop on Internet Survey
   Methodology and Web Demographics_ January 29-30 1996. Cambridge
   Ma.

[RFC1034]
   P. Mockapetris. _Domain Name System_. (RFC1034, RFC1035)
   November 1987

[Hallam-Baker94]
   Phillip M. Hallam-Baker _Shen Secure Hypertext Environment,
   Design Notes._ CERN Programming Techniques Group.