Session Identification URI

INTERNET DRAFT                          Phillip M. Hallam-Baker, W3C
Expires in six months                   email: <hallam@w3.org>
                                        Dan Connolly, W3C
                                        email: <connolly@w3.org>
                                        21st February 1996


Session Identification URI

<draft-hallam-http-session-id-00.txt>

Status of this Memo

This document is an Internet draft. Internet drafts are working
documents of the Internet Engineering Task Force (IETF), its areas
and its working groups. Note that other groups may also distribute
working information as Internet drafts.

Internet drafts are draft documents valid for a maximum of six
months and can be updated, replaced or obsoleted by other documents
at any time. It is inappropriate to use Internet drafts as reference
material or to cite them as other than as "work in progress".

To learn the current status of any Internet draft please check the
"lid-abstracts.txt" listing contained in the Internet drafts shadow
directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East coast) or
ftp.isi.edu (US West coast). Further information about the IETF can
be found at URL: http://www.cnri.reston.va.us/

Distribution of this document is unlimited. Please send comments to
the HTTP working group (HTTP-WG) of the Internet Engineering Task
Force (IETF) at <http://www.ics.uci.edu/pub/ietf/http/>. This note
is also available as a World Wide Web Consortium Working Draft
WD-session-id-960221, archived at
http://www.w3.org/pub/WWW/TR/WD-session-id-960221.html

Abstract

A Uniform Resource Identifier for identifying HTTP sessions is
described. Session identification URIs permit HTTP transactions to
be linked within a limited domain. This provides a balance between
the needs of commercial servers for demographic data collection and
the privacy concerns of users. In addition, session identification
URIs may be used as part of a high security authentication mechanism
to prevent replay attacks.

Introduction

HTTP is specified as a stateless protocol. This permits HTTP servers
to handle a large number of simultaneous requests. However, the
stateless nature of HTTP also reduces its utility: it is not
possible to track user reading patterns on a single server, nor is
it possible for a server to adapt its behavior on the basis of
previous interactions.

The ability to trace the path of readers within a Web site is
important for maintainers of larger sites. Trace information may be
used to analyze the efficacy of cross references within the site,
and to build profiles of typical users. If it is known, for example,
that readers of an online newspaper who visit the computer section
are very likely to also visit the business section, reporters might
be asked to provide more cross links between these sections.
Administrators may also wish to discover the number of users
visiting their site rather than the number of visits.

Advertising as Revenue for Web Content Providers

Many content providers raise revenue through advertising.
Advertisers therefore need to know the effectiveness of Web based
advertising. Content providers who can provide advertisers with
detailed profiles of the readership of their material will be able
to charge higher rates. Reader profiling would also allow the
advertisements most likely to obtain a response to be selected.

A distinctive feature of the Web is its interactive nature. Gill
[Gill96] points out that the interactive nature of the Web may make
traditional models of "targeted" advertising obsolete, replacing
them with participatory models. The Web is an information system,
and users who wish to purchase goods are likely to use it to find
out details. It may be unnecessary to target advertising in an
intrusive manner (e.g. unsolicited email). As users become
accustomed to more participatory modes of advertising, intrusive
methods may become counterproductive.

There are many metrics which an advertiser may wish to use to assess
the value of a Web placement. These include:

Hit counts
    The number of times an advertisement is downloaded. These
    roughly correspond to exposures as understood in conventional
    media.

Referrals
    The number of times an advertising hyperlink is followed. This
    implies that the advertiser also has a Web site.

Hot leads and sales
    Referrals which result in readers demonstrating a significant
    level of interest or which generate sales.

Referrals may be determined using the HTTP Referer field, which
informs a server of the URI of the resource that referred the
client to the requested resource. Unfortunately, current log file
formats do not include this information. A companion document
[Hallam96] describes an extension to the log file format to record
this data.

The number of hot leads and/or sales generated by a placement may be
determined by correlating trace data within the advertiser's home
Web site with the Referer field. This procedure aligns the interests
of the two parties in a way that removes the need for conventional
auditing. An advertiser might pay the publisher according to the
business generated by a placement. It is in the advertiser's
interest to be honest in determining the amount paid, since the
publisher would determine placement frequency according to the rate
of return. This mechanism is of particular interest for adverts
targeted at a particular readership, where auditing may be
difficult.

Privacy Concerns

Just because an advertiser is interested in information does not
mean that the user is willing to provide it. If care is not taken to
protect the privacy of its users, the Web could enable more
extensive surveillance of its users than has been available to the
most ruthless dictatorships.

The Internet has a strongly developed but highly unpredictable
ethical sense. It is a medium of active participants, not of passive
consumers. Users may complain very publicly about perceived wrongs
(whether justified or not) via Usenet, which has a readership of
several million. Privacy in particular is a frequently raised issue.
Consequently, it is advisable to approach the issue of personal
privacy cautiously.

Users may be prepared to exchange information about themselves in
return for access to content. However, such systems may yield
inaccurate data. Users who believe their privacy to be threatened
may deliberately supply incorrect information, such as a false
address and telephone number, to prevent unsolicited mail and phone
calls.

Personal data is often collected by financial institutions to serve
as a means of customer authentication. Disclosure of personal data
may therefore increase the risk of fraud.

Many countries have enacted privacy legislation which controls the
storage and use of personal data. Sites which are governed by such
laws may wish to avoid unnecessary acquisition and recording of
personal data.

Although the Web has gained popularity as a publishing medium, it
was conceived as a collaboration tool. As Turkel points out
[Turkel96], part of the interest of cyberspace may be the ability to
take on different personas and to voice unpopular views without
risk. Such partitioning of identity requires the ability to separate
online activity from offline activity, and online activity at one
site from activity at another. The Web should therefore permit users
to take on new cyberspace identities through the use of pseudonyms,
and the boundaries between these identities must be carefully
protected.

Privacy, or the lack of it, has often been an unanticipated
consequence of a particular technology. Early telephone users had
little privacy since every conversation could be overheard by the
operator. This led directly to the automatic exchange, which was
invented by an undertaker whose rival was stealing his business by
bribing telephone operators.

Transactions in the HTTP/1.0 protocol are disjoint. A single request
is made which results in a single response, after which the
operation is complete and the TCP/IP connection is closed. HTTP/1.1
allows the same TCP/IP connection to be used to perform multiple
operations.

Pseudo Session Identifiers

IP addresses and ports may be used to provide pseudo identifiers for
analysis of demographic data. The usefulness of such identifiers is
severely limited. It is not possible to differentiate two users
timesharing on a single machine, nor do users necessarily use the
same IP address each time. The value of IP addresses for analysis is
rapidly decreasing due to the growing use of proxies and dynamic IP
address assignment. These trends will be exacerbated by new
developments such as mobile IP.

Although these pseudo session identifiers are unreliable and
unsatisfactory, they should be taken into consideration when
considering the privacy issues raised by this proposal. In
particular, there is no need to prove exhaustively that a given form
of linkage cannot be achieved with session identifiers where the
same linkage is already possible through analysis of IP addresses
and ports.

Relationship to State-Info (Cookies)

State-Info [Kristol95] is a proposed extension to the HTTP protocol.
It is a refinement of the Netscape "Cookies" proposal [Netscape95].
This mechanism permits a server to generate a token which the client
returns with future requests. It requires clients to store data for
every server visited and is consequently impractical as a tracking
mechanism unless the number of sites using it is small. In the
Session Identifier URI proposal, identifiers are generated by
clients, not servers. This provides for scalability, since a client
need only store a fixed amount of identifier information regardless
of the number of sites visited.

URI Format

Session IDs have the form:

    SID:_type_:_realm_:_identifier_[-_thread_][:_count_]

Where the fields _type_, _realm_, _identifier_, _thread_ and _count_
are defined as follows:

type
    Type of session identifier. This field allows other session
    identifier types to be defined. This draft specifies the
    identifier type "ANON".

realm
    Specifies the realm within which linkage of the identifier is
    possible. Realms have the same format as DNS names.

identifier
    Unstructured random integer specific to the realm, generated
    using a procedure with a negligible probability of collision.
    The identifier is encoded using base 64.

thread
    Optional extension of the identifier field, used to
    differentiate concurrent uses of the same session identifier.
    The thread field is an integer encoded in hexadecimal.

count
    Optional hexadecimal-encoded integer containing a monotonically
    increasing counter value. A client should increment the count
    field after each operation.
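
The syntax above can be exercised with a small parser and formatter.
This is a minimal Python sketch, not part of the draft: the class and
function names are hypothetical, the character sets (the base-64
alphabet for _identifier_, hex digits for _thread_ and _count_) are
inferred from the field definitions, and the two-digit zero-padded
hex output is an assumption chosen to match the examples.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed grammar: base-64 identifier, optional hex thread and count.
_SID_RE = re.compile(
    r"^SID:(?P<sid_type>[A-Za-z0-9]+)"
    r":(?P<realm>[A-Za-z0-9.-]+)"
    r":(?P<identifier>[A-Za-z0-9+/=]+)"
    r"(?:-(?P<thread>[0-9A-Fa-f]+))?"
    r"(?::(?P<count>[0-9A-Fa-f]+))?$"
)


@dataclass(frozen=True)
class SessionId:
    sid_type: str
    realm: str
    identifier: str
    thread: Optional[int] = None
    count: Optional[int] = None

    def __str__(self) -> str:
        text = f"SID:{self.sid_type}:{self.realm}:{self.identifier}"
        if self.thread is not None:
            text += f"-{self.thread:02X}"   # hex, two digits (assumed width)
        if self.count is not None:
            text += f":{self.count:02X}"    # hex, two digits (assumed width)
        return text


def parse_sid(text: str) -> SessionId:
    """Split a session identifier URI into its fields."""
    m = _SID_RE.match(text)
    if m is None:
        raise ValueError(f"not a session identifier URI: {text!r}")
    thread, count = m.group("thread"), m.group("count")
    return SessionId(
        sid_type=m.group("sid_type"),
        realm=m.group("realm"),
        identifier=m.group("identifier"),
        thread=int(thread, 16) if thread is not None else None,
        count=int(count, 16) if count is not None else None,
    )
```

Because "-" and ":" are not in the base-64 alphabet, they can delimit
the optional fields unambiguously after the identifier.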

Examples

The following example shows a sequence of session identifiers
created by the same client. Note that the same counter register is
used to generate all the session identifiers within the same thread.

    SID:ANON:www.w3.org:j6oAOxCWZh/CD723LGeXlf-01:34
    SID:ANON:mc.ai.mit.edu:NRviSpoYm7mdkYB4W2471l-01:35
    SID:ANON:www.w3.org:j6oAOxCWZh/CD723LGeXlf-01:36
    SID:ANON:mc.ai.mit.edu:NRviSpoYm7mdkYB4W2471l-01:37
    SID:ANON:www.w3.org:j6oAOxCWZh/CD723LGeXlf-02:01
    SID:ANON:www.w3.org:j6oAOxCWZh/CD723LGeXlf-01:38
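
To make the shared counter register concrete, the following
hypothetical helper (not defined by the draft) keeps one counter per
thread and uses it for whichever realm a request goes to. Seeding
thread 1 at 0x34 and thread 2 at 1 is an assumption made only so that
the output matches the example sequence above; the realm-to-identifier
table is likewise supplied by hand here.

```python
from typing import Dict


class SessionIdClient:
    """Hypothetical client helper: one counter register per thread,
    shared by every realm used within that thread."""

    def __init__(self, identifiers: Dict[str, str]) -> None:
        self.identifiers = identifiers        # realm -> base-64 identifier
        self.counters: Dict[int, int] = {}    # thread -> next count value

    def start_thread(self, thread: int, initial_count: int = 1) -> None:
        self.counters[thread] = initial_count

    def next_sid(self, realm: str, thread: int) -> str:
        count = self.counters[thread]
        self.counters[thread] = count + 1     # increment after each operation
        ident = self.identifiers[realm]
        return f"SID:ANON:{realm}:{ident}-{thread:02X}:{count:02X}"
```

Requesting www.w3.org, mc.ai.mit.edu, www.w3.org and mc.ai.mit.edu on
thread 1, then www.w3.org on thread 2 and www.w3.org on thread 1 again,
reproduces the six identifiers listed above: the counter advances
across realms within a thread, while a new thread starts its own.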

Limited Linkage of Session Identifiers

Session identifier URIs permit linkage of transactions within a
single _realm_. A realm roughly corresponds to a DNS name, and DNS
names correlate reasonably well with administrative divisions. This
allows a content provider to track activities within the sites on
its network, but does not permit data from different sites to be
correlated without specific user authorization in advance.

Prevention of Replay Attacks

Session identifiers may also be used within a strong authentication
scheme to prevent replay attacks. A replay attack involves recording
authentic traffic and replaying it at a later date. For example,
Mallet might intercept Alice's request to download her mail file on
Monday and then replay it each day to receive her mail for the rest
of the week.

Replay attacks may be prevented by checking message timestamps.
Unfortunately, this requires accurate and secure synchronisation of
the clocks at both ends of the communication, which is difficult.
Alternatively, a challenge/response sequence may be employed. This
introduces an additional round trip into the transaction and
requires the server to keep a record of which challenges have
already been responded to.

The session identifier URI may be used to prevent replay attacks in
combination with a timestamp. The server maintains a record of each
identifier used and checks that each subsequent request with that
identifier has a higher count field. The volume of data storage
required may be minimized by checking that the timestamp falls
within an acceptable validity interval.
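
A minimal server-side sketch of this check follows. It assumes that
the request carries a timestamp alongside the Session-Id (the draft
does not say how the timestamp is conveyed), that records are kept
per (realm, identifier, thread) since the count increases per thread,
and that the validity interval is a hypothetical 300 seconds. The
class name ReplayGuard is illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Records are keyed by (realm, identifier, thread).
Key = Tuple[str, str, Optional[int]]


@dataclass
class ReplayGuard:
    window: float = 300.0  # hypothetical validity interval, in seconds
    # key -> (highest count accepted, latest timestamp accepted)
    seen: Dict[Key, Tuple[int, float]] = field(default_factory=dict, init=False)

    def accept(self, realm: str, identifier: str, thread: Optional[int],
               count: int, timestamp: float, now: float) -> bool:
        """Return True and record the request if it is not a replay."""
        if abs(now - timestamp) > self.window:
            return False  # outside the validity interval
        self._purge(now)
        key = (realm, identifier, thread)
        prev = self.seen.get(key)
        if prev is not None and count <= prev[0]:
            return False  # count did not increase: treat as a replay
        latest = timestamp if prev is None else max(prev[1], timestamp)
        self.seen[key] = (count, latest)
        return True

    def _purge(self, now: float) -> None:
        # Every request covered by a record older than the window would now
        # fail the timestamp check on its own, so the record can be dropped.
        cutoff = now - self.window
        for key in [k for k, (_, ts) in self.seen.items() if ts < cutoff]:
            del self.seen[key]
```

The purge step is what bounds storage: the server only remembers
identifiers seen within the last validity interval, rather than every
identifier ever used.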

Implementation Issues

A standardized method of constructing session identifiers would
permit users to use the same session identification information on
different machines, avoiding the need to re-register with content
providers. This would also be convenient for content providers,
since a user with more than one machine would not be counted twice.
The nature of session identifiers prevents enforcement of such a
policy, however, and the following construction method is therefore
only advisory.

A convenient method of constructing session identifiers which does
not require separate storage for each realm visited is to use a
Message Authentication Code (MAC) based upon a cryptographically
secure one-way function such as MD5 [Rivest92].

On initialization the client obtains a value _key_. This value
should be selected in a random manner so as to provide at least 128
bits of entropy. When a realm is visited, the value of the
identifier field is created using the formula

    _identifier_ = MD5(_realm_ + _key_)

where + denotes concatenation and the 128-bit digest is encoded in
base 64 as described for the _identifier_ field.

The client should store the value of _key_ and the counter value
associated with each thread.
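
A minimal sketch of this construction, assuming that + means byte
concatenation, that the realm is lower-cased before hashing (DNS
names are case-insensitive), and that the base-64 padding is dropped
so that the result has the 22 characters seen in the examples. MD5 is
no longer considered collision-resistant, and the draft's Security
Considerations recommend a keyed digest known to be cryptographically
sound, so an HMAC-MD5 variant (RFC 2104, published after this draft)
is shown alongside as a swapped-in alternative, not as the draft's
method. Function names are hypothetical.

```python
import base64
import hashlib
import hmac
import secrets


def new_key() -> bytes:
    # 128 bits from the operating system's CSPRNG.
    return secrets.token_bytes(16)


def _b64(digest: bytes) -> str:
    # 16-byte digest -> 24 base-64 chars; strip "==" padding -> 22 chars.
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def md5_identifier(realm: str, key: bytes) -> str:
    """identifier = MD5(realm + key), as written in the draft."""
    return _b64(hashlib.md5(realm.lower().encode("ascii") + key).digest())


def hmac_identifier(realm: str, key: bytes) -> str:
    """Alternative keyed digest: HMAC-MD5 with the client key as HMAC key."""
    mac = hmac.new(key, realm.lower().encode("ascii"), hashlib.md5)
    return _b64(mac.digest())
```

Either function gives the property the draft relies on: one stored
key yields a stable identifier per realm, and identifiers for
different realms cannot be linked without the key.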

HTTP Integration

Session identifiers may be incorporated in HTTP messages using the
Session-Id header. The existing WWW-Authenticate header is extended
to permit the use of session identifiers as a lightweight
authentication mechanism.

Session-Id

    Session-Id: _URI_

The Session-Id header may be incorporated in an HTTP request or
response. The header accepts a single parameter, the identifier URI.

Session identifiers are only created by clients. A Session-Id header
should only be present in a response if one was specified in the
corresponding request, and it should carry the same session
identifier value as the request.
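
The response rule can be sketched as follows. The function name is
hypothetical, and headers are modelled as a simple name-to-value
mapping; the point is that the server never generates an identifier
of its own and only echoes the client's value unchanged.

```python
from typing import Dict, Mapping


def response_session_headers(request_headers: Mapping[str, str]) -> Dict[str, str]:
    """Return the Session-Id header (if any) to add to a response."""
    for name, value in request_headers.items():
        if name.lower() == "session-id":   # header names are case-insensitive
            return {"Session-Id": value}   # echo the client's value verbatim
    return {}                              # no identifier in request: none in response
```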

Example

The following example shows an HTTP request incorporating a session
identifier.

    GET / HTTP/1.0
    Accept: text/plain
    Accept: text/html
    Session-Id: SID:ANON:w3.org:j6oAOxCWZh/CD723LGeXlf-01:034
    User-Agent: libwww/4.1

A client supporting session identifier URIs should by default attach
a session identifier to every request, using the DNS name of the
server as the realm. Clients must provide users with an option to
disable session identifier generation. Clients are encouraged to
provide a means of selecting the _realm -> identifier_ mapping.
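
A client-side sketch of the default behavior, assuming a single
thread (numbered 01), the MD5 identifier construction from
Implementation Issues, and a hypothetical SessionIdPolicy class whose
enabled flag stands in for the user option to disable session
identifiers. The realm-selection facility is not modelled.

```python
import base64
import hashlib
from typing import Dict
from urllib.parse import urlsplit


class SessionIdPolicy:
    """Hypothetical client policy: attach a Session-Id unless disabled."""

    def __init__(self, key: bytes, enabled: bool = True) -> None:
        self.key = key
        self.enabled = enabled   # user-controlled opt-out
        self.count = 1           # counter for the single thread used here

    def headers_for(self, url: str) -> Dict[str, str]:
        if not self.enabled:
            return {}
        # Default realm: the server's DNS name (urlsplit lower-cases it).
        realm = urlsplit(url).hostname or ""
        digest = hashlib.md5(realm.encode("ascii") + self.key).digest()
        ident = base64.b64encode(digest).decode("ascii").rstrip("=")
        sid = f"SID:ANON:{realm}:{ident}-01:{self.count:02X}"
        self.count += 1          # increment after each operation
        return {"Session-Id": sid}
```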

WWW-Authenticate

    WWW-Authenticate: _1#challenge_

The WWW-Authenticate header is used by a server to request that a
client provide a session identifier where none was given, or to
specify one for an alternative realm. This mechanism permits linkage
of identifiers across realms, but only under user control.

Example

The following data shows a server requesting an identifier for the
realm "w3.org".

    HTTP/1.1 401 Unauthorized
    WWW-Authenticate: Session, realm=w3.org
    Server: libwww/4.1

Clients must not automatically respond to a WWW-Authenticate
challenge without user direction.
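
The following sketch extracts the requested realm from a challenge
written as in the example and then defers to the user. The parser is
deliberately simple: it handles a single Session challenge with a
realm parameter and accepts quoted or unquoted realms (an assumption;
the draft gives no grammar beyond _1#challenge_). ask_user is a
hypothetical callback standing in for whatever dialog the client
presents.

```python
from typing import Callable, Optional


def session_challenge_realm(header_value: str) -> Optional[str]:
    """Return the realm of a 'Session, realm=...' challenge, else None."""
    scheme, _, params = header_value.partition(",")
    if scheme.strip().lower() != "session":
        return None
    for param in params.split(","):
        name, _, value = param.strip().partition("=")
        if name.strip().lower() == "realm" and value:
            return value.strip().strip('"').lower()
    return None


def answer_challenge(header_value: str,
                     ask_user: Callable[[str], bool]) -> Optional[str]:
    """Return the realm to answer for, only if the user approves it."""
    realm = session_challenge_realm(header_value)
    if realm is None or not ask_user(realm):
        return None   # never respond without user direction
    return realm
```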

A client may offer the user a facility whereby requests for session
identifiers in alternative realms are automatically accepted,
provided they are compatible. A realm may be considered compatible
if it is a non-trivial prefix of the server's DNS name, reading the
name from the root of the DNS hierarchy. For example, a request from
the server www.w3.org for a session identifier in the realm w3.org
would be regarded as compatible, but requests for w3.com, mit.edu or
org would not. Names within the top-level domains com, edu, gov, mil
and org (such as w3.org) may generally be considered non-trivial
prefixes (the exclusion of net from this list is intentional). Names
in other DNS domains may be considered non-trivial prefixes if they
are below the second level of the DNS hierarchy.
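
The compatibility rule can be expressed as a comparison of DNS labels
from the root downwards. In this sketch (function names hypothetical),
a realm is compatible when its labels form the rightmost labels of the
server's name and it is either a second-level name under com, edu,
gov, mil or org, or a name below the second level anywhere else.
Treating a realm equal to the server's own name as compatible is an
assumption.

```python
# Top-level domains whose second-level names count as non-trivial;
# "net" is deliberately absent, as in the draft.
GENERIC_TLDS = {"com", "edu", "gov", "mil", "org"}


def _labels(name: str) -> list:
    return name.lower().rstrip(".").split(".")


def is_nontrivial(realm: str) -> bool:
    labels = _labels(realm)
    if labels[-1] in GENERIC_TLDS:
        return len(labels) >= 2   # e.g. w3.org, but not org
    return len(labels) >= 3       # elsewhere: below the second level


def is_compatible(server: str, realm: str) -> bool:
    """True if realm is a non-trivial prefix (from the root) of server."""
    s, r = _labels(server), _labels(realm)
    prefix_from_root = len(r) <= len(s) and s[len(s) - len(r):] == r
    return prefix_from_root and is_nontrivial(realm)
```

Comparing whole labels rather than strings avoids treating 3.org as a
prefix of www.w3.org.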

Security Considerations

Security considerations are discussed throughout this document in
addition to this section.

Unintended Linkage

Collusion between sites may permit linkage of session identifiers
between realms. A server may permit linkage between identifiers
within its own realm and another by incorporating the identifier
component in a URI. For example, the server www.w3.org, on receiving
the session identifier

    SID:ANON:www.w3.org:j6oAOxCWZh/CD723LGeXlf-01:34

could construct the URI

    http://ai.mit.edu/link/j6oAOxCWZh/CD723LGeXlf

If the link were followed, the server ai.mit.edu would be able to
track the user's activity across both realms.

Unsafe Construction Techniques

Care must be taken in constructing session identifiers. A keyed
digest technique known to be cryptographically sound is recommended.
In particular, implementors should note that a number of techniques
for constructing MACs from ciphers using XOR functions are insecure
for this application.

Further Work

Data Escrow Agents

The method for constructing session identification URIs described
here provides only one possible compromise between privacy and
tracking. In particular, no provision is made for supporting joint
registration services. Such services would permit a user to register
demographic details (age, sex, interests, etc.) with a single server.

Data escrow agents would support joint registration services without
compromising user privacy. A data escrow agent would capture
demographic data at a central location and analyze content
providers' log files on their behalf. Escrow agents would be
responsible for preventing content providers from receiving data
detailed enough to compromise user privacy.

In order to protect user privacy, session identifiers must only be
linkable by the data escrow agent. This may be achieved using either
public key cryptography or message authentication codes.

In an implementation of a data escrow agent using public keys, the
data escrow agent provides each content provider with the public
component of a public key pair. A user visiting a content provider's
site first creates a session identifier as if the escrow agent's
realm were being visited, then encrypts it using the content
provider's public key to create a session identifier specific to
the content provider. In order to analyze a log file, the escrow
agent decrypts the session identifiers using the private portion of
the key.

In an implementation of a data escrow agent using a MAC, the user
provides the data escrow agent with demographic data indexed by a
session identifier keyed to the agent's realm. When contacting a
content provider, the client constructs a session identifier using a
MAC of the session identifier keyed by the provider's realm. The
escrow agent may construct a linkage between the provider's log
files and entries in the escrowed database by calculating a MAC for
every entry in the database. Although this technique involves a
larger number of operations than the public key based scheme, these
operations are approximately four orders of magnitude faster.
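
The MAC-based linkage might look like this. The draft does not name a
MAC, so HMAC-MD5 is assumed, with the provider's realm as the HMAC key
and the agent-realm session identifier as the message, following the
text literally; provider_identifier and link_log are hypothetical
names, and the profile records are placeholders.

```python
import base64
import hashlib
import hmac
from typing import Dict, Iterable, List, Tuple


def _b64(digest: bytes) -> str:
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def provider_identifier(agent_sid: str, provider_realm: str) -> str:
    """Client side: MAC of the agent-realm identifier, keyed by the
    provider's realm (HMAC-MD5 assumed)."""
    mac = hmac.new(provider_realm.lower().encode("ascii"),
                   agent_sid.encode("ascii"), hashlib.md5)
    return _b64(mac.digest())


def link_log(escrow_db: Dict[str, dict], provider_realm: str,
             logged_ids: Iterable[str]) -> List[Tuple[str, dict]]:
    """Escrow side: one MAC per database entry, then a join on the log."""
    index = {provider_identifier(sid, provider_realm): profile
             for sid, profile in escrow_db.items()}
    return [(ident, index[ident]) for ident in logged_ids if ident in index]
```

Because the provider's realm is public, unlinkability in this scheme
rests on the agent-realm identifiers being known only to the user and
the escrow agent.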

Interaction with Proxies and Caches

Many Web users browse the Web through a caching proxy. In many
countries this mode of operation is essential due to saturation of
international network connections. When a proxy serves a user from a
local cache, the originating server has no knowledge of the
transaction. Consequently, log files may be incomplete. This problem
is most serious for commercial sites which use hit counts as a
measure of readership.

A number of techniques may be used to prevent proxies from caching
data. This permits demographic data to be collected, at the cost of
severely degrading network response times. In a significant number
of cases this will prevent a user from receiving any data at all
[Smith96].

A better solution is to provide a mechanism whereby a proxy supplies
a server, on request, with a log of hits served from the cache. Such
logs are potentially of value as an indication of audited
circulation, particularly if they were to be authenticated using a
digital signature technique. In some circumstances it may be
desirable for providers of such information to mask usernames by
using session identifiers. It is intended to address these issues in
a separate document.

Acknowledgments

Dave Raggett made the original proposal to use an anonymous session
identifier for capture of demographic data. Rohit Khare and Dan
Connolly helped refine many of the ideas. Roger Hurwitz and John
Mallery made many helpful comments on early versions of this draft.

Authors' Addresses

Phillip M. Hallam-Baker
hallam@w3.org
World Wide Web Consortium
Cambridge MA

Dan Connolly
connolly@w3.org
World Wide Web Consortium
Cambridge MA

References

[Netscape95]
    Netscape Communications Corp. _Persistent Client State HTTP
    Cookies_.

[Hallam96]
    Phillip M. Hallam-Baker. _Extended Log File Format_.

[Kristol95]
    D. Kristol. _Proposed HTTP State-Info Mechanism_.

[Connoly96]
    Dan Connolly. _Proposals for Gathering Consumer Demographics_.

[Hallam93]
    Phillip M. Hallam-Baker. _Design Note on HTTP Referer Field_.
    Memo to Tim Berners-Lee.

[Smith96]
    Neil Smith. Address at MIT _Workshop on Internet Survey
    Methodology and Web Demographics_, January 29-30 1996,
    Cambridge MA.

[Rivest92]
    R. Rivest. _The MD5 Message-Digest Algorithm_. RFC 1321, MIT
    and RSA Data Security, Inc., April 1992.

[Berners-Lee96]
    Tim Berners-Lee, Roy T. Fielding, and Henrik Frystyk Nielsen.
    _Hypertext Transfer Protocol -- HTTP/1.0_.

[Gill96]
    Neil Smith. Address at MIT _Workshop on Internet Survey
    Methodology and Web Demographics_, January 29-30 1996,
    Cambridge MA.

[RFC1034]
    P. Mockapetris. _Domain Name System_. RFC 1034, RFC 1035,
    November 1987.

[Hallam-Baker94]
    Phillip M. Hallam-Baker. _Shen Secure Hypertext Environment,
    Design Notes_. CERN Programming Techniques Group.