Session Identification URI

INTERNET DRAFT                           Phillip M. Hallam-Baker, W3C
Expires in six months                    email: <hallam@w3.org>
                                         Dan Connolly, W3C
                                         email: <connolly@w3.org>
                                         21st February 1996


                       Session Identification URI

                 <draft-hallam-http-session-id-00.txt>

Status of this Memo

   This document is an Internet Draft. Internet Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas
   and its working groups. Note that other groups may also distribute
   working information as Internet Drafts.

   Internet Drafts are draft documents valid for a maximum of six
   months and can be updated, replaced or obsoleted by other documents
   at any time. It is inappropriate to use Internet Drafts as reference
   material or to cite them other than as "work in progress".

   To learn the current status of any Internet Draft, please check the
   "1id-abstracts.txt" listing contained in the Internet Drafts shadow
   directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East coast) or
   ftp.isi.edu (US West coast). Further information about the IETF can
   be found at URL: http://www.cnri.reston.va.us/

   Distribution of this document is unlimited. Please send comments to
   the HTTP working group (HTTP-WG) of the Internet Engineering Task
   Force (IETF); see <http://www.ics.uci.edu/pub/ietf/http/>. This note
   is also available as a World Wide Web Consortium Working Draft,
   WD-session-id-960221, archived at
   http://www.w3.org/pub/WWW/TR/WD-session-id-960221.html

Abstract

   A Uniform Resource Identifier for identifying HTTP sessions is
   described. Session identification URIs permit HTTP transactions to
   be linked within a limited domain. This provides a balance between
   the needs of commercial servers for demographic data collection and
   the privacy concerns of users. In addition, session identification
   URIs may be used as part of a high-security authentication mechanism
   to prevent replay attacks.

Introduction

   HTTP is specified as a stateless protocol. This permits HTTP servers
   to handle a large number of simultaneous requests. The stateless
   nature of HTTP reduces its utility, however. It is not possible to
   track user reading patterns on a single server, nor is it possible
   for a server to adapt its behavior on the basis of previous
   interactions.

   The ability to trace the path of readers within a Web site is
   important for maintainers of larger sites. Trace information may be
   used to analyze the efficacy of cross-references within the site and
   to build profiles of typical users. If it is known, for example, that
   readers of an online newspaper who visit the computer section are
   very likely also to visit the business section, reporters might be
   asked to provide more cross-links between these sections.
   Administrators may also wish to discover the number of users
   visiting their site rather than the number of visits.

Advertising as Revenue for Web Content Providers

   Many content providers raise revenue through advertising.
   Advertisers therefore need to know the effectiveness of Web-based
   advertising. Content providers who can provide advertisers with
   detailed profiles of the readership of their material will be able
   to charge higher rates. Reader profiling would also permit the
   advertisements most likely to obtain a response to be chosen.

   A distinctive feature of the Web is its interactive nature. Gill
   [Gill96] points out that the interactive nature of the Web may make
   traditional models of "targeted" advertising obsolete, replacing
   them with participatory models. The Web is an information system,
   and users who wish to purchase goods are likely to use it to find
   out details. It may be unnecessary to target advertising in an
   intrusive manner (e.g. unsolicited email). As users become
   accustomed to more participatory modes of advertising, intrusive
   methods may become counterproductive.

   There are many metrics which an advertiser may wish to use to assess
   the value of a Web placement. These include:

   Hit counts
      The number of times an advertisement is downloaded. These
      roughly correspond to exposures as understood in conventional
      media.

   Referrals
      The number of times an advertising hyperlink is followed. This
      implies that the advertiser also has a Web site.

   Hot leads and sales
      Referrals which result in readers demonstrating a significant
      level of interest or which generate sales.

   Referrals may be determined using the HTTP Referer header field,
   which informs a server of the URI of the resource that referred the
   client to the requested resource. Unfortunately, current log file
   formats do not record this information. A companion document
   [Hallam96] describes an extension to the log file format to record
   this data.
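
   To illustrate what recording the Referer field makes possible, the
   Python sketch below counts referrals from a publisher's placement
   pages in an advertiser's log. It is a hypothetical example: it
   assumes each log record has already been reduced to a (requested
   URI, Referer) pair, and the function name and host names are
   invented for illustration rather than taken from any log format.

```python
from collections import Counter
from typing import Iterable, Optional, Set, Tuple
from urllib.parse import urlsplit


def count_referrals(
    log: Iterable[Tuple[str, Optional[str]]], placement_hosts: Set[str]
) -> Counter:
    """Count requests whose Referer points at one of the publisher's hosts.

    Each record is a hypothetical (requested URI, Referer) pair from the
    advertiser's own server; Referer is None when the client sent none.
    """
    counts: Counter = Counter()
    for _requested_uri, referer in log:
        if not referer:
            continue  # direct visit, or no Referer supplied
        if urlsplit(referer).hostname in placement_hosts:
            counts[referer] += 1  # one referral from this placement page
    return counts


log = [
    ("/products", "http://news.example.com/computers"),
    ("/products", None),
    ("/about", "http://news.example.com/computers"),
    ("/", "http://other.example.org/"),
]
print(count_referrals(log, {"news.example.com"}))
```

   Grouping by the full Referer value keeps a separate count for each
   placement page; grouping by host name instead would give one total
   per publisher.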

   The number of hot leads and/or sales generated by a placement may be
   determined by correlating trace data within the advertiser's own
   Web site with the Referer field. This procedure aligns the interests
   of the two parties in a way that removes the need for conventional
   auditing. An advertiser might pay the publisher according to the
   business generated by a placement. It is in the advertiser's
   interest to be honest in determining the amount paid, since the
   publisher would determine placement frequency according to the rate
   of return. This mechanism is of particular interest for
   advertisements targeted at a particular readership, where auditing
   may be difficult.

Privacy Concerns

   Just because an advertiser is interested in information does not
   mean that the user is willing to provide it. If care is not taken to
   protect the privacy of its users, the Web could enable more extensive
   surveillance of its users than has been available to the most
   ruthless dictatorships.

   The Internet has a strongly developed but highly unpredictable
   ethical sense. It is a medium of active participants, not of passive
   consumers. Users may complain very publicly about perceived wrongs
   (whether justified or not) via Usenet, which has a readership of
   several million. Privacy issues in particular are a frequent source
   of complaint. Consequently, it is advisable to approach the issue of
   personal privacy cautiously.

   Users may be prepared to exchange information about themselves in
   return for access to content. Such systems may provide inaccurate
   data, however. Users who believe their privacy to be threatened may
   deliberately supply incorrect information, such as a false address
   and telephone number, to prevent unsolicited mail and phone calls.

   Personal data is often collected by financial institutions to serve
   as a means of customer authentication. Disclosure of personal data
   may therefore increase fraud risks.

   Many countries have enacted privacy legislation which controls the
   storage and use of personal data. Sites which are governed by such
   laws may wish to avoid unnecessary acquisition and recording of
   personal data.

   Although the Web has gained popularity as a publishing medium, it was
   conceived as a collaboration tool. As Turkle points out [Turkel96],
   part of the interest of cyberspace may be the ability to take on
   different personas and to voice unpopular views without risk. Such
   partitioning of identity requires the ability to separate online
   activity from offline activity, and online activity at one site from
   activity at another. The Web should therefore permit users to take
   on new cyberspace identities through the use of pseudonyms, and the
   boundaries between these identities must be carefully protected.

   Privacy, or the lack of it, has often been an unanticipated
   consequence of a particular technology. Early telephone users had
   little privacy, since every conversation could be overheard by the
   operator. This led directly to the automatic exchange, which was
   invented by an undertaker whose rival was stealing his business by
   bribing telephone operators.

   Transactions in the HTTP/1.0 protocol are disjoint. A single request
   is made, which results in a single response, after which the
   operation is complete and the TCP/IP connection is closed. HTTP/1.1
   allows the same TCP/IP connection to be used to perform multiple
   operations.

Pseudo Session Identifiers

   IP addresses and ports may be used to provide pseudo identifiers for
   analysis of demographic data. The usefulness of such identifiers is
   severely limited. It is not possible to differentiate two users
   timesharing on a single machine, nor do users necessarily use the
   same IP address each time. The value of IP addresses for analysis is
   rapidly decreasing due to the growing use of proxies and dynamic IP
   address assignment. These trends will be exacerbated by new
   developments such as mobile IP.

   Although these pseudo session identifiers are unreliable and
   unsatisfactory, they should be taken into consideration when
   considering the privacy issues raised by this proposal. In
   particular, it is unnecessary to provide exhaustive proofs that
   certain forms of linkage cannot be achieved where such linkage is
   already possible through similar analysis of IP addresses and ports.

Relationship to State-Info (Cookies)

   State-Info [Kristol95] is a proposed extension to the HTTP protocol.
   It is a refinement of the Netscape "Cookies" proposal [Netscape95].
   This mechanism permits a server to generate a token which the client
   returns with future requests. The mechanism requires clients to
   store data for every server visited and is consequently unusable as
   a tracking mechanism unless the number of sites using it is small.
   In the Session Identifier URI proposal, identifiers are generated by
   clients, not servers. This provides for scalability, since a client
   need only store a fixed amount of identifier information regardless
   of the number of sites visited.

URI Format

   Session IDs have the form:

      SID:_type_:_realm_:_identifier_[-_thread_][:_count_]

   where the fields _type_, _realm_, _identifier_, _thread_ and _count_
   are defined as follows:

   type
      Type of session identifier. This field allows other session
      identifier types to be defined. This draft specifies the
      identifier type "ANON".

   realm
      Specifies the realm within which linkage of the identifier is
      possible. Realms have the same format as DNS names [RFC1034].

   identifier
      Unstructured random integer specific to the realm, generated using
      a procedure with a negligible probability of collision. The
      identifier is encoded using base 64.

   thread
      Optional extension of the identifier field, used to differentiate
      concurrent uses of the same session identifier. The thread field
      is an integer encoded in hexadecimal.

   count
      Optional hexadecimal-encoded integer containing a monotonically
      increasing counter value. A client should increment the count
      field after each operation.
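
   The field definitions above can be exercised with a small parser and
   formatter. This Python sketch is illustrative only: the draft gives
   no formal grammar, so the character classes (alphanumeric type,
   DNS-style realm, standard base64 alphabet for the identifier,
   hexadecimal thread and count) and the two-digit zero padding used
   when formatting are assumptions chosen to match the examples that
   follow. The names SessionId, parse_sid and format_sid are
   hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed grammar for SID:type:realm:identifier[-thread][:count]; the draft
# gives none. type is alphanumeric, realm uses DNS name characters, the
# identifier uses the standard base64 alphabet, thread and count are hex.
SID_PATTERN = re.compile(
    r"SID:(?P<sid_type>[A-Za-z0-9]+)"
    r":(?P<realm>[A-Za-z0-9.-]+)"
    r":(?P<identifier>[A-Za-z0-9+/]+=*)"
    r"(?:-(?P<thread>[0-9A-Fa-f]+))?"
    r"(?::(?P<count>[0-9A-Fa-f]+))?"
)


@dataclass(frozen=True)
class SessionId:
    sid_type: str                 # the _type_ field, e.g. "ANON"
    realm: str                    # DNS-style realm
    identifier: str               # base64-encoded identifier
    thread: Optional[int] = None  # optional thread number
    count: Optional[int] = None   # optional counter value


def parse_sid(uri: str) -> SessionId:
    """Split a session identifier URI into its fields."""
    match = SID_PATTERN.fullmatch(uri)
    if match is None:
        raise ValueError(f"not a session identifier URI: {uri!r}")
    thread, count = match.group("thread"), match.group("count")
    return SessionId(
        sid_type=match.group("sid_type"),
        realm=match.group("realm"),
        identifier=match.group("identifier"),
        thread=int(thread, 16) if thread is not None else None,
        count=int(count, 16) if count is not None else None,
    )


def format_sid(sid: SessionId, width: int = 2) -> str:
    """Render a SessionId, zero-padding thread and count to `width` hex digits."""
    uri = f"SID:{sid.sid_type}:{sid.realm}:{sid.identifier}"
    if sid.thread is not None:
        uri += f"-{sid.thread:0{width}x}"
    if sid.count is not None:
        uri += f":{sid.count:0{width}x}"
    return uri
```

   For example, parse_sid applied to
   SID:ANON:www.w3.org:j6oAOxCWZh/CD723LGeXlf-01:34 yields thread 1 and
   count 0x34, and format_sid turns the result back into the same
   string.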

Examples

   The following example shows a sequence of session identifiers
   created by the same client. Note that the same counter register is
   used to generate all the session identifiers within the same thread,
   whichever realm they are sent to.

      SID:ANON:www.w3.org:j6oAOxCWZh/CD723LGeXlf-01:34
      SID:ANON:mc.ai.mit.edu:NRviSpoYm7mdkYB4W2471l-01:35
      SID:ANON:www.w3.org:j6oAOxCWZh/CD723LGeXlf-01:36
      SID:ANON:mc.ai.mit.edu:NRviSpoYm7mdkYB4W2471l-01:37
      SID:ANON:www.w3.org:j6oAOxCWZh/CD723LGeXlf-02:01
      SID:ANON:www.w3.org:j6oAOxCWZh/CD723LGeXlf-01:38
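
   The sequence above can be reproduced by a small generator that keeps
   one counter per thread and shares it across realms. This Python
   sketch is a hypothetical illustration: the class name, the
   precomputed realm-to-identifier table and the two-digit padding are
   assumptions chosen to match the example, and the counter for thread
   01 is seeded so that the first value issued is 34.

```python
from collections import defaultdict
from typing import Dict


class SessionIdGenerator:
    """Hypothetical generator: one counter per thread, shared by all realms."""

    def __init__(self, identifiers: Dict[str, str]) -> None:
        self.identifiers = identifiers  # realm -> precomputed identifier
        self.counters: Dict[int, int] = defaultdict(int)  # thread -> last count

    def next_sid(self, realm: str, thread: int) -> str:
        # Advance this thread's counter for the new operation, whatever the realm.
        self.counters[thread] += 1
        return (f"SID:ANON:{realm}:{self.identifiers[realm]}"
                f"-{thread:02x}:{self.counters[thread]:02x}")


gen = SessionIdGenerator({
    "www.w3.org": "j6oAOxCWZh/CD723LGeXlf",
    "mc.ai.mit.edu": "NRviSpoYm7mdkYB4W2471l",
})
gen.counters[1] = 0x33  # seed thread 01 so the first value issued is 34
for realm, thread in [("www.w3.org", 1), ("mc.ai.mit.edu", 1),
                      ("www.w3.org", 1), ("mc.ai.mit.edu", 1),
                      ("www.w3.org", 2), ("www.w3.org", 1)]:
    print(gen.next_sid(realm, thread))
```

   Run as written, this prints the six identifiers of the example in
   order. Because thread 02 has its own counter, its first identifier
   carries count 01 while thread 01 continues from 37 to 38.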

Limited Linkage of Session Identifiers

   Session Identifier URIs permit linkage of transactions within a
   single _realm_. A realm approximates a DNS name, and DNS names
   correlate reasonably well with administrative divisions. This allows
   a content provider to track activities within sites on its own
   network, but does not permit data from different sites to be
   correlated without specific user authorization in advance.
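
   Because linkage is confined to a realm, a content provider's
   analysis can only group requests per realm. The Python sketch below,
   a hypothetical illustration, counts requests and distinct
   identifiers per realm from logged Session-Id values, which gives the
   "users rather than visits" figure mentioned in the Introduction. It
   assumes the log simply yields the SID URIs as strings.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def users_per_realm(session_ids: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Return realm -> (requests, distinct identifiers) from logged SID URIs.

    The thread and count fields are ignored, so concurrent threads of the
    same client count as one user. Values that are not SID URIs are skipped.
    """
    requests: Dict[str, int] = defaultdict(int)
    users: Dict[str, Set[str]] = defaultdict(set)
    for sid in session_ids:
        parts = sid.split(":")  # SID, type, realm, identifier[-thread], [count]
        if len(parts) < 4 or parts[0] != "SID":
            continue
        realm = parts[2].lower()
        identifier = parts[3].split("-", 1)[0]  # drop the optional -thread
        requests[realm] += 1
        users[realm].add(identifier)
    return {realm: (requests[realm], len(users[realm])) for realm in requests}
```

   Applied to the six example identifiers, this reports four requests
   from one user for www.w3.org and two requests from one user for
   mc.ai.mit.edu; nothing in the data links the two realms.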

Prevention of Replay Attacks

   Session identifiers may also be used within a strong authentication
   scheme to prevent replay attacks. A replay attack involves recording
   authentic traffic and replaying it at a later date. For example,
   Mallet might intercept Alice's request to download her mail file on
   Monday and then replay it each day to receive her mail for the rest
   of the week.

   Replay attacks may be prevented by checking message timestamps.
   Unfortunately, this requires accurate and secure synchronisation of
   clocks at both ends of the communication, which is difficult.
   Alternatively, a challenge/response sequence may be employed. This
   introduces an additional round-trip delay into the transaction and
   requires the server to keep track of which challenges have already
   been responded to.

   The session identifier URI may be used to prevent replay attacks in
   combination with a timestamp. The server maintains a record of each
   identifier used and checks that subsequent requests with that
   identifier have a higher count field. The volume of data storage
   required may be minimized by also checking that the timestamp falls
   within an acceptable validity interval: records older than that
   interval can be discarded, since any replay of them would fail the
   timestamp check.
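
   A minimal sketch of this server-side check, in Python. It assumes
   the server has already parsed the realm, identifier, thread and
   count from the Session-Id header, and has obtained a request
   timestamp by some means the draft leaves open (for example a Date
   header). The class name ReplayGuard and the 300-second window are
   hypothetical; the count is tracked per thread because each thread
   has its own counter.

```python
import time
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, int]  # (realm, identifier, thread)


class ReplayGuard:
    """Hypothetical server-side replay check combining count and timestamp."""

    def __init__(self, window_seconds: float = 300.0) -> None:
        self.window = window_seconds
        # key -> (highest count accepted, latest timestamp accepted)
        self.seen: Dict[Key, Tuple[int, float]] = {}

    def accept(self, realm: str, identifier: str, thread: int, count: int,
               timestamp: float, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        if abs(now - timestamp) > self.window:
            return False  # outside the validity interval
        key = (realm, identifier, thread)
        previous = self.seen.get(key)
        if previous is not None and count <= previous[0]:
            return False  # count did not increase: replayed or reordered
        latest = timestamp if previous is None else max(previous[1], timestamp)
        self.seen[key] = (count, latest)
        self._expire(now)
        return True

    def _expire(self, now: float) -> None:
        # A record whose latest timestamp has left the window can be dropped:
        # any request it covers would now fail the timestamp check anyway.
        stale = [k for k, (_, ts) in self.seen.items() if now - ts > self.window]
        for k in stale:
            del self.seen[k]
```

   Expiring records on every call keeps the sketch short; a busy server
   would more likely expire them periodically.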

Implementation Issues

   A standardized method of constructing session identifiers would
   permit users to use the same session identification information on
   different machines, avoiding the need to re-register with content
   providers. This would also be convenient for content providers,
   since a user with more than one machine would not be counted twice.
   The nature of session identifiers prevents enforcement of such a
   policy, however, and the following construction method is therefore
   only advisory.

   A convenient method of constructing session identifiers which does
   not require separate storage for each realm visited is to use a
   Message Authentication Code (MAC) based upon a cryptographically
   secure one-way function such as MD5 [Rivest92].

   On initialization the client obtains a value _key_. This value
   should be selected at random so as to provide at least 128 bits of
   entropy. When a realm is visited, the value of the identifier field
   is created using the formula

      _identifier = MD5 (realm + key)_

   The client should store the value of _key_ and the counter value
   associated with each thread.
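
   The formula can be sketched in Python with the standard library. The
   draft does not define how realm and key are combined or encoded, so
   this sketch assumes "+" means the lower-cased ASCII realm followed by
   the raw key bytes, and drops the trailing "==" of the base64
   encoding, which leaves 22 characters, the same length as the
   identifiers in the examples. The function names are hypothetical.

```python
import base64
import hashlib
import secrets


def new_key() -> bytes:
    """Generate the client's secret _key_ (128 random bits)."""
    return secrets.token_bytes(16)


def make_identifier(realm: str, key: bytes) -> str:
    """Compute identifier = MD5(realm + key), base64-encoded.

    Assumptions: "+" is the lower-cased ASCII realm followed by the raw key
    bytes, and the trailing "==" padding is dropped (22 characters remain).
    """
    digest = hashlib.md5(realm.lower().encode("ascii") + key).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


key = new_key()  # generated once and stored by the client, never sent
print(make_identifier("www.w3.org", key))
```

   Only _key_ and the per-thread counters need to be stored; the
   identifier for any realm can be recomputed on demand. The Security
   Considerations recommend a keyed digest known to be cryptographically
   sound, and a present-day implementation would more likely substitute
   HMAC (Python's hmac module) with a stronger hash than MD5, which
   departs from the formula given here.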

HTTP Integration

   Session identifiers may be incorporated in HTTP messages using the
   Session-Id header. The existing WWW-Authenticate header is extended
   to permit use of session identifiers as a lightweight authentication
   mechanism.

Session-Id

      Session-Id: _URI_

   The Session-Id header may be incorporated in an HTTP request or
   response. The header accepts a single parameter, the session
   identifier URI.

   Session identifiers are only created by clients. A Session-Id header
   should only be present in a response if one was specified in the
   corresponding request, and it should return the same session
   identifier value as the request.
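
   The response rule can be sketched with Python's standard http.server
   module. The helper echo_session_id and the handler class are
   hypothetical names: the helper copies the request's Session-Id, if
   any, and returns nothing otherwise, since servers never create
   identifiers themselves. Header names are matched case-insensitively,
   as HTTP requires.

```python
from http.server import BaseHTTPRequestHandler
from typing import Dict, Iterable, Tuple


def echo_session_id(request_headers: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Return the Session-Id header, if any, that the response should carry."""
    for name, value in request_headers:
        if name.lower() == "session-id":
            return {"Session-Id": value.strip()}  # same value as the request
    return {}  # no Session-Id in the request: do not invent one


class SessionEchoHandler(BaseHTTPRequestHandler):
    """Minimal hypothetical handler that echoes the client's Session-Id."""

    def do_GET(self) -> None:
        body = b"hello\n"
        self.send_response(200)
        for name, value in echo_session_id(self.headers.items()).items():
            self.send_header(name, value)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

   For a quick local trial, the handler could be served with
   http.server.HTTPServer(("", 8080), SessionEchoHandler).serve_forever();
   the port is arbitrary.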

Example

   The following example shows an HTTP request incorporating a session
   identifier.

      GET / HTTP/1.0
      Accept: text/plain
      Accept: text/html
      Session-Id: SID:ANON:w3.org:j6oAOxCWZh/CD723LGeXlf-01:034
      User-Agent: libwww/4.1

   A client supporting session identifier URIs should by default attach
   a session identifier to every request, using the DNS name of the
   server as the realm. Clients must provide users with an option to
   disable session identifier generation. Clients are encouraged to
   provide a means of selecting the _realm -> identifier_ mapping.
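
   The default client behaviour can be sketched as a small policy
   object in Python. Everything here is a hypothetical illustration:
   the class name, the "ANON" type, the default thread 1 and the
   two-digit padding are assumptions, and identifier_for stands in for
   the user-selectable realm -> identifier mapping (for example, the
   MD5 construction from Implementation Issues).

```python
from collections import defaultdict
from typing import Callable, Dict
from urllib.parse import urlsplit


class SessionIdPolicy:
    """Hypothetical client policy for attaching Session-Id headers."""

    def __init__(self, identifier_for: Callable[[str], str],
                 enabled: bool = True) -> None:
        # identifier_for is the realm -> identifier mapping; a client could
        # let the user replace it, e.g. to use a different pseudonym.
        self.identifier_for = identifier_for
        self.enabled = enabled  # users must be able to switch generation off
        self.counters: Dict[int, int] = defaultdict(int)  # thread -> count

    def headers_for(self, url: str, thread: int = 1) -> Dict[str, str]:
        if not self.enabled:
            return {}
        realm = urlsplit(url).hostname  # default realm: the server's DNS name
        if not realm:
            return {}  # no host name, so no realm to use
        self.counters[thread] += 1
        sid = (f"SID:ANON:{realm}:{self.identifier_for(realm)}"
               f"-{thread:02x}:{self.counters[thread]:02x}")
        return {"Session-Id": sid}
```

   The resulting dictionary can be passed directly as the headers
   argument of urllib.request.Request(url, headers=...). Note that
   urlsplit lower-cases the host name, so the realm does not depend on
   how the user typed the URL.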

WWW-Authenticate

      WWW-Authenticate: _1#challenge_

   The WWW-Authenticate header is used by a server to request that a
   client provide a session identifier where none was given, or to
   request one for an alternative realm. This mechanism permits linkage
   of identifiers across realms, but only under user control.

Example

   The following data shows a server requesting an identifier for the
   realm "w3.org".

      HTTP/1.1 401 Unauthorized
      WWW-Authenticate: Session, realm=w3.org
      Server: libwww/4.1

   Clients must not automatically respond to a WWW-Authenticate
   challenge without user direction.

   A client may offer the user a facility whereby requests for session
   identifiers in alternative realms are automatically accepted,
   provided they are compatible. A realm may be considered compatible
   if it is a non-trivial prefix of the server's DNS name, reading the
   name from the top of the DNS hierarchy down; that is, a non-trivial
   parent domain of the server's name. For example, a request from the
   server www.w3.org for a session identifier in the realm w3.org would
   be regarded as compatible, but requests for w3.com, mit.edu or org
   would not. Second-level DNS names in the top-level domains com, edu,
   gov, mil and org may generally be considered non-trivial prefixes
   (the exclusion of net from this list is intentional). Other DNS
   domains may be considered non-trivial prefixes if they are below the
   second level of the DNS hierarchy.
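
   The compatibility rule can be sketched in Python. Two simplifying
   assumptions are made: the WWW-Authenticate value holds a single
   Session challenge with comma-separated name=value parameters, and a
   realm equal to the server's own name counts as compatible when it is
   itself non-trivial. The function names are hypothetical.

```python
from typing import Optional

# Top-level domains whose second-level names count as non-trivial, per the
# list above; "net" is deliberately absent.
GENERIC_TLDS = {"com", "edu", "gov", "mil", "org"}


def challenge_realm(header_value: str) -> Optional[str]:
    """Extract the realm from a 'Session, realm=...' challenge, else None."""
    scheme, _, params = header_value.partition(",")
    if scheme.strip().lower() != "session":
        return None
    for param in params.split(","):
        name, _, value = param.strip().partition("=")
        if name.strip().lower() == "realm":
            return value.strip().strip('"').lower() or None
    return None


def is_nontrivial(realm: str) -> bool:
    labels = realm.split(".")
    if labels[-1] in GENERIC_TLDS:
        return len(labels) >= 2  # e.g. w3.org, but not org
    return len(labels) >= 3      # below the second level, e.g. not example.net


def is_compatible(server: str, realm: str) -> bool:
    """True if realm equals, or is a parent domain of, server and is non-trivial."""
    server_labels = server.lower().rstrip(".").split(".")
    realm_labels = realm.lower().rstrip(".").split(".")
    if len(realm_labels) > len(server_labels):
        return False
    if server_labels[len(server_labels) - len(realm_labels):] != realm_labels:
        return False
    return is_nontrivial(".".join(realm_labels))
```

   A client would consult is_compatible only when the user has switched
   on the automatic-acceptance facility; otherwise it must not answer
   the challenge without user direction.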

Security Considerations

   Security considerations are discussed throughout this document in
   addition to this section.

Unintended Linkage

   Collusion between sites may permit linkage of session identifiers
   between realms. A server may permit linkage between identifiers
   within its own realm and another by incorporating the identifier
   component in a URI. For example, the server www.w3.org, on receiving
   the session identifier
   SID:ANON:www.w3.org:j6oAOxCWZh/CD723LGeXlf-01:34, could construct
   the URI http://ai.mit.edu/link/j6oAOxCWZh/CD723LGeXlf. If the link
   were followed, the server ai.mit.edu would be able to track the
   user's activity across both realms.

Unsafe Construction Techniques

   Care must be taken in constructing session identifiers. A keyed
   digest technique known to be cryptographically sound is recommended.
   In particular, implementors should note that a number of techniques
   for constructing MACs from ciphers using XOR functions are insecure
   for this application.

Further Work

Data Escrow Agents

   The method for constructing session identification URIs described
   here provides only one possible compromise between privacy and
   tracking. In particular, no provision is made for supporting joint
   registration services. Such services would permit a user to register
   demographic details (age, sex, interests, etc.) with a single server.

   Data escrow agents would support joint registration services without
   compromising user privacy. A data escrow agent would capture
   demographic data at a central location and analyze content
   providers' log files on their behalf. Escrow agents would be
   responsible for preventing content providers from receiving data
   detailed enough to compromise user privacy.

   In order to protect user privacy, session identifiers must only be
   linkable by the data escrow agent. This may be achieved using either
   public key cryptography or message authentication codes.

   In an implementation of a data escrow agent using public keys, the
   data escrow agent provides each content provider with the public
   component of a public key pair. A user visiting a content provider's
   site first creates a session identifier as if the escrow agent's
   realm were being visited, then encrypts it using the public key
   issued to that content provider to create a session identifier
   specific to the content provider. In order to analyze a log file,
   the escrow agent decrypts the session identifiers using the private
   portion of the key.

   In an implementation of a data escrow agent using a MAC, the user
   provides the data escrow agent with demographic data indexed by a
   session identifier keyed to the agent's realm. When contacting a
   content provider, the client constructs a session identifier by
   computing a MAC of that session identifier keyed by the provider's
   realm. The escrow agent may construct a linkage between the
   provider's log files and entries in the escrowed database by
   calculating a MAC for every entry in the database. Although this
   technique involves a larger number of operations than the
   public-key-based scheme, these operations are approximately four
   orders of magnitude faster.
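
   The MAC-based scheme can be sketched in Python. The draft does not
   name a MAC, so this sketch assumes HMAC-SHA-256 from the standard
   hmac module, with the provider's realm as the MAC key and the
   agent-realm identifier as the message; the function names and the
   dictionary layout of the escrowed database are hypothetical.

```python
import base64
import hashlib
import hmac
from typing import Dict, Iterable


def provider_identifier(agent_identifier: str, provider_realm: str) -> str:
    """Client side: MAC of the agent-realm identifier, keyed by the provider's realm."""
    mac = hmac.new(provider_realm.lower().encode("ascii"),
                   agent_identifier.encode("ascii"), hashlib.sha256).digest()
    return base64.b64encode(mac).decode("ascii").rstrip("=")


def link_log(provider_realm: str, escrow_db: Dict[str, dict],
             logged_ids: Iterable[str]) -> Dict[str, dict]:
    """Escrow side: map a provider's logged identifiers to escrowed records.

    One MAC per database entry builds a reverse lookup table; the provider
    itself only ever sees MAC outputs, not the agent-realm identifiers.
    """
    table = {provider_identifier(aid, provider_realm): aid for aid in escrow_db}
    return {lid: escrow_db[table[lid]] for lid in logged_ids if lid in table}
```

   The cost on the escrow side is one MAC computation per escrowed
   entry for each provider realm analyzed, which is the "larger number
   of operations" traded against cheaper individual operations.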

Interaction with Proxies and Caches

   Many Web users browse the Web through a caching proxy. In many
   countries this mode of operation is essential due to saturation of
   international network connections. When a proxy serves a user from a
   local cache, the originating server has no knowledge of the
   transaction. Consequently, log files may be incomplete. This problem
   is most serious for commercial sites which use hit counts as a
   measure of readership.

   A number of techniques may be used to prevent proxies from caching
   data. This permits demographic data to be collected, at the cost of
   severely degrading response times. In a significant number of cases
   this will prevent a user from receiving any data at all [Smith96].

   A better solution is to provide a mechanism whereby a proxy supplies
   a server, on request, with a log of hits served from the cache. Such
   logs are potentially of value as an indication of audited
   circulation, particularly if they were to be authenticated using a
   digital signature technique. In some circumstances it may be
   desirable for providers of such information to mask usernames by
   using session identifiers. It is intended to address these issues in
   a separate document.

Acknowledgments

   Dave Raggett made the original proposal to use an anonymous session
   identifier for capture of demographic data. Rohit Khare and Dan
   Connolly helped refine many of the ideas. Roger Hurwitz and John
   Mallery made many helpful comments on early versions of this draft.

Authors' Addresses

   Phillip M. Hallam-Baker
   hallam@w3.org
   World Wide Web Consortium
   Cambridge MA

   Dan Connolly
   connolly@w3.org
   World Wide Web Consortium
   Cambridge MA

References

   [Netscape95]
      Netscape Communications Corp. _Persistent Client State HTTP
      Cookies_

   [Hallam96]
      Phillip M. Hallam-Baker. _Extended Log File Format_

   [Kristol95]
      Kristol, D. _Proposed HTTP State-Info Mechanism_

   [Connoly96]
      Dan Connolly. _Proposals for Gathering Consumer Demographics_

   [Hallam93]
      Phillip M. Hallam-Baker. _Design note on HTTP referer field_.
      Memo to Tim Berners-Lee.

   [Smith96]
      Neil Smith. Address at MIT _Workshop on Internet Survey
      Methodology and Web Demographics_, January 29-30 1996, Cambridge
      MA.

   [Rivest92]
      Rivest, R. _The MD5 Message-Digest Algorithm_, RFC 1321, MIT and
      RSA Data Security, Inc., April 1992.

   [Berners-Lee96]
      Tim Berners-Lee, Roy T. Fielding, and Henrik Frystyk Nielsen.
      _Hypertext Transfer Protocol -- HTTP/1.0_

   [Gill96]
      Neil Smith. Address at MIT _Workshop on Internet Survey
      Methodology and Web Demographics_, January 29-30 1996, Cambridge
      MA.

   [RFC1034]
      P. Mockapetris. _Domain Name System_ (RFC 1034, RFC 1035).
      November 1987.

   [Hallam-Baker94]
      Phillip M. Hallam-Baker. _Shen Secure Hypertext Environment,
      Design Notes_. CERN Programming Techniques Group.