HTML Working Group                                           D. Connolly
INTERNET-DRAFT                                               MIT/W3C
draft-ietf-html-charset-harmful-00.txt                       May 2, 1995
Expires November, 1995


Character Set Considered Harmful


Status of this Document


This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups. Note that other groups may also distribute working
documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

To learn the current status of any Internet-Draft, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).

Distribution of this document is unlimited. Please send comments to the
HTML working group (HTML-WG) of the Internet Engineering Task Force
(IETF) at <html-wg@oclc.org>. Discussions of the group are archived at
http://www.acl.lanl.gov/HTML_WG/archives.html .

Abstract


The term character set is often used to describe a digital
representation of text. ASCII is perhaps the most widely deployed
representation of text, and in the interest of interoperability,
information systems on the Internet traditionally rely on it
exclusively.

The Multipurpose Internet Mail Extensions (MIME) introduces Internet
Media Types, including text representations besides ASCII. The Hypertext
Markup Language (HTML) used in the World-Wide Web is a proposed Internet
Media Type. But HTML is also an application of Standard Generalized
Markup Language (SGML).


Connolly                                                       [Page 1]

Internet Draft          Character Terminology                 May, 1995


In the MIME and SGML specifications, the discussion of character
representation is notoriously complex, and apparently subtly
inconsistent or incompatible. This document presents a collection of
terms intended to reconcile the two specifications and serve as a basis
for rigorous discussion of characters and their digital representations.

Introduction


The term character set is often used to describe a digital
representation of text. The specification of such a representation
typically involves identifying a sufficiently expressive collection of
characters, and giving each of them a number.

In conventional mathematics terminology then, a "character set" is not
just a set of characters, but a function whose domain is a set of
integers, and whose range is a set of characters.

Some standards documents, including the SGML standard, make little or no
use of such conventional mathematical terms as function, domain and
range. Perhaps the authors of those documents intend the documents to be
comprehensible without a prior understanding of mathematics. But the
specification of notions such as the conformance of an SGML document or
SGML system is much more complex than the basics of logic and
mathematics.

In his text on Calculus [Spivak], Michael Spivak writes:
     
     
     Every aspect of this book was influenced by the desire to
     present calculus not merely as a prelude to but as the first
     real encounter with mathematics. Since the foundation of
     analysis provided the arena in which modern modes of
     mathematical thinking developed, calculus ought to be the
     place in which to expect, rather than avoid, the strengthening
     of insight with logic. In addition to developing the students'
     intuition about the beautiful concepts of analysis, it is
     surely equally important to persuade them that precision and
     rigor are neither deterrents to intuition, nor ends in
     themselves, but the natural medium in which to formulate and
     think about mathematical questions.


This document is not intended as the first real encounter with
mathematics. But neither will we make any effort to avoid or apologize
for mathematical terminology. The reader is referred to the large body
of literature on logic and set theory, including a history of writings
on math and logic [SET] and Douglas Hofstadter's fascinating book [GEB].


Coded Character Sets


Using "character set" rather than something such as character table or
even character sequence to denote the function that maps integers to
characters is unfortunate, but it is water under the bridge, and a lot
of it by now. Rather than attempting to divert all that water at this
point, we introduce the primitive notion of character and use it to
define the term coded character set from [ISO10646] and other standards:

character
     An atom of information
coded character set
     A function whose domain is a subset of the integers, and whose
     range is a set of characters.


Note that by the term character, we do not mean a glyph, a name, a
phoneme, or a bit combination. A character is simply an atomic unit of
communication. It is typically a symbol whose various representations
are understood to mean the same thing by a community of people.

It might seem more intuitive to map from characters to integers, rather
than the way it is defined here. But in practice there are some coded
character sets that assign two different numbers to the same character
[Lee], and so the inverse is not a function in the general case.

There are two other terms used in standards such as [ISO10646] that we
define in relation to the first two:

code position
     An integer. A coded character set and a code position from its
     domain determine a character.
character repertoire
     A set of characters; that is, the range of a coded character set.
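
As an illustration only, the four terms above can be sketched in Python.
The table TOY_CCS and both helper functions are invented for this
sketch and are not taken from any standard; the table deliberately
assigns two code positions to one character, to show why the inverse of
a coded character set need not be a function.

```python
# A toy coded character set: a function from code positions (integers)
# to characters, modelled as a dict. The entries are hypothetical.
TOY_CCS = {
    65: "A",
    66: "B",
    200: "A",  # a second code position for the same character
}

def repertoire(ccs):
    """The character repertoire is the range of the coded character set."""
    return set(ccs.values())

def code_positions(ccs, character):
    """All code positions that the coded character set maps to `character`."""
    return sorted(n for n, c in ccs.items() if c == character)

print(repertoire(TOY_CCS) == {"A", "B"})  # True
print(code_positions(TOY_CCS, "A"))       # [65, 200]
```

Looking up TOY_CCS[66] yields exactly one character, while asking for
the code position of "A" yields two candidates.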


Character Encoding Schemes


The only practical means for exchanging information on the Internet is
to represent it as a sequence of octets (bytes).

One way to transmit a sequence of characters is to agree on a coded
character set and transmit the code position of each of the
characters.


But in practice, characters are encoded using a variety of optimizations
of this brute-force approach: code switching techniques, escape
sequences, etc. The encoding of a sequence of characters is not, in
general, the result of encoding each character independently and then
concatenating them. But it is sufficiently general to note that a
sequence of characters is encoded as a sequence of octets. So we
define:

octet
     an element of the set {0, 1, 2, ..., 255}
character encoding scheme
     a function whose domain is the set of sequences of octets, and
     whose range is the set of sequences of characters over some
     character repertoire.
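
A minimal sketch of this definition, assuming Python's built-in
"iso2022_jp" codec as the character encoding scheme. The wrapper name
iso_2022_jp is invented here. The sketch shows that encoding a sequence
is not the concatenation of per-character encodings, and that two
different octet sequences can map to the same character sequence.

```python
def iso_2022_jp(octets: bytes) -> str:
    """A character encoding scheme: octet sequence -> character sequence."""
    return octets.decode("iso2022_jp")

text = "\u65e5\u672c"  # two characters from the JIS X 0208 repertoire

# Encode the whole sequence at once, and each character on its own.
whole = text.encode("iso2022_jp")
piecewise = b"".join(ch.encode("iso2022_jp") for ch in text)

# The escape sequences that switch code are shared across the whole run,
# so the two octet sequences differ.
print(whole != piecewise)  # True

# Both octet sequences nevertheless denote the same character sequence.
print(iso_2022_jp(whole) == text and iso_2022_jp(piecewise) == text)  # True
```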


Representation of SGML Text Entities


An SGML document is made up of entities: a text entity called the
document entity, and possibly some other text entities and data
entities.

A text entity is a sequence of characters. The representation of a text
entity is not specified by the SGML standard. For the purpose of
MIME-based interchange of SGML text entities, we define the following:

text entity
     a sequence of characters
message entity
     a pair (T, OS) where T is an Internet Media Type and OS is a
     sequence of octets.


Note that each text/* media type has an associated charset parameter,
which designates a character encoding scheme. The character encoding
scheme maps the body -- a sequence of octets -- to a text entity -- a
sequence of characters. Hence any message entity of type text/* is
equivalent to a text entity.
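
The mapping from a message entity to a text entity might be sketched as
follows. The class and function names are hypothetical, and the media
type T is modelled, as an assumption of this sketch, as a type string
plus the value of its charset parameter.

```python
from typing import NamedTuple

class MessageEntity(NamedTuple):
    media_type: str  # e.g. "text/html"
    charset: str     # value of the charset parameter
    octets: bytes    # the body, OS

def text_entity(message: MessageEntity) -> str:
    """Map a text/* message entity to the text entity it represents."""
    if not message.media_type.lower().startswith("text/"):
        raise ValueError("only text/* message entities denote text entities")
    # The charset parameter names the character encoding scheme,
    # which maps the octets of the body to a sequence of characters.
    return message.octets.decode(message.charset)

msg = MessageEntity("text/html", "us-ascii", b"<TITLE>Hi</TITLE>")
print(text_entity(msg))  # <TITLE>Hi</TITLE>
```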

Numeric Character References


Numeric character references are a great source of confusion. The key
insights are that:
   * Every SGML document has exactly one document character set, which
     is a coded character set
   * Numeric character references give code positions in the document
     character set
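
The two points above can be sketched as a small resolver. This is an
illustration under assumptions: the function name is invented, only
decimal references of the form &#N; are handled, and Python's chr()
stands in for ISO10646 as the document character set.

```python
import re

def resolve_numeric_references(entity, document_character_set):
    """Replace each &#N; with the character at code position N.

    `document_character_set` is a coded character set, passed as a
    function from code positions to characters.
    """
    return re.sub(
        r"&#([0-9]+);",
        lambda m: document_character_set(int(m.group(1))),
        entity,
    )

# Assumption: chr() stands in for ISO10646.
print(resolve_numeric_references("caf&#233;", chr) == "caf\u00e9")  # True
```

Note that the resolver takes no character encoding scheme as input: the
meaning of N depends only on the document character set.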


Example: ISO2022 Encoding with ISO10646 Coded Character Set


Consider the following message entity:
Date: Saturday, 29-Apr-95 03:53:33 GMT
MIME-version: 1.0
Content-Type: text/html; charset=iso-2022-jp

<TITLE>...</TITLE>
<BODY>
Here is some normal text.
Here is a 10646 numeric character reference &#2432;.
Here is some ISO-2022-JP text: ...
</BODY>


To interpret the message entity, we notice that the Content-Type is
text/html, so this represents a text entity. The charset parameter
iso-2022-jp, along with the octet sequence of the body, determines a
sequence of characters. The octets denoted above by '...' represent
characters, as per iso-2022-jp.

To parse the resulting text entity as per SGML, the sender and receiver
must agree on an SGML declaration, since none is present in the document
entity. For this example, we assume that the SGML declaration specifies
ISO10646 as the document character set. So the numeric character
reference &#2432; is resolved with respect to ISO10646.

It may seem contradictory that the ISO-2022-JP character encoding scheme
is defined in terms of a collection of coded character sets, none of
which is ISO10646. But there is no contradiction. Each character encoded
by ISO-2022-JP is in the repertoire of one of those coded character
sets, and the repertoire of each is a subset of that of ISO10646.

So while ISO-2022-JP is not sufficient for every ISO10646 document, it
is the case that ISO10646 is a sufficient document character set for any
entity encoded with ISO-2022-JP.

Example: Reducing the Repertoire of an Entity


Suppose we have an SGML document D whose document character set is the
coded character set ISO10646. We find the document entity DE in the form
of a sequence of octets OS in a disk file, encoded using the
Unicode-UCS-2 character encoding scheme.
        Unicode-UCS-2(OS) = DE


We can reduce the character repertoire necessary to represent the
document entity by replacing characters outside the ISO-646-IRV
character repertoire with numeric character references:
        DE' = reduce(DE, ISO10646, ISO-646-IRV)

where

  reduce : SEQ(char) X Coded Character Set X Character Repertoire
           -> SEQ(char)

and

  reduce(c . rest, CCS, R) = if c in R, c . reduce(rest, CCS, R)
                                        else &#N; . reduce(rest, CCS, R)
                                        where CCS(N) = c


The resulting entity, DE', can then be encoded using US-ASCII:
        US-ASCII(OS') = DE' = reduce(DE, ISO10646, ISO-646-IRV)


Hence, we can represent the document D as a message entity whose content
type is "text/plain; charset=US-ASCII" and whose body is OS'.
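
The reduction above might be sketched in Python as follows. It is
written iteratively rather than recursively and is not a definitive
implementation. Two assumptions are made: ord() stands in for the
code positions of ISO10646, and the ISO-646-IRV repertoire is
approximated by the 128 characters chr(0) through chr(127). The
parameter position_of is a hypothetical helper that picks one N such
that CCS(N) = c.

```python
def reduce(entity, position_of, repertoire):
    """Replace characters outside `repertoire` with &#N; references,
    where N is a code position of the character in the document
    character set, as chosen by `position_of`."""
    out = []
    for c in entity:
        if c in repertoire:
            out.append(c)
        else:
            out.append("&#%d;" % position_of(c))
    return "".join(out)

# Assumption: an approximation of the ISO-646-IRV repertoire.
ISO_646_IRV = {chr(n) for n in range(128)}

de = "na\u00efve"                           # DE, one character outside R
de_reduced = reduce(de, ord, ISO_646_IRV)   # DE'
os_reduced = de_reduced.encode("us-ascii")  # OS'
print(de_reduced)  # na&#239;ve
```

Decoding os_reduced as US-ASCII gives back DE', and resolving the
numeric character reference against the document character set then
recovers DE.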

Conclusion


It is critical to keep the notion of a simple table of characters and
their numbers, i.e. a coded character set, separate from the various
algorithms used to encode sequences of characters, i.e. character
encoding schemes. This separation allows a representation of a
text entity which is consistent with both the MIME and SGML
specifications.

Acknowledgements


The idea for the title of this document actually came from John Klensin.
The notion of character encoding scheme was inspired by the MIME
specification by Ned Freed. James Clark, Ed Levinson, and several other
members of the MIMESGML working group collaborated in discussions
leading up to this draft. Liam Quin from SoftQuad and Gavin Nicol from
EBT have provided guidance on these issues in the past. Erik Naggum has
provided invaluable aid in understanding the SGML standard.

References


[MIME]
     N. Borenstein and N. Freed. "MIME (Multipurpose Internet Mail
     Extensions) Part One: Mechanisms for Specifying and Describing the
     Format of Internet Message Bodies." RFC 1521, Bellcore, Innosoft,
     September 1993.
[ASCII]
     US-ASCII. Coded Character Set - 7-Bit American Standard Code for
     Information Interchange. Standard ANSI X3.4-1986, ANSI, 1986.
[ISO-8859]
     ISO 8859. International Standard -- Information Processing -- 8-bit
     Single-Byte Coded Graphic Character Sets -- Part 1: Latin Alphabet
     No. 1, ISO 8859-1:1987. Part 2: Latin alphabet No. 2, ISO 8859-2,
     1987. Part 3: Latin alphabet No. 3, ISO 8859-3, 1988. Part 4: Latin
     alphabet No. 4, ISO 8859-4, 1988. Part 5: Latin/Cyrillic alphabet,
     ISO 8859-5, 1988. Part 6: Latin/Arabic alphabet, ISO 8859-6, 1987.
     Part 7: Latin/Greek alphabet, ISO 8859-7, 1987. Part 8:
     Latin/Hebrew alphabet, ISO 8859-8, 1988. Part 9: Latin alphabet No.
     5, ISO 8859-9, 1990.
[SGML]
     ISO 8879. Information Processing -- Text and Office Systems --
     Standard Generalized Markup Language (SGML), 1986.
[Nicol]
     The Multilingual World Wide Web , Gavin T. Nicol, Electronic Book
     Technologies, Japan gtn@ebt.com
[Lee]
     Private communication with Liam Quin, from SoftQuad.
[Spivak]
     Spivak, Michael. Calculus. 2nd Ed. 1967 ISBN 0-914098-77-2
[GEB]
     Hofstadter, Douglas R. Gödel, Escher, Bach: An Eternal Golden
     Braid, 1979 ISBN 0-394-75682-7
[SET]
     "Investigations in the foundations of set theory I", in Jean van
     Heijenoort (ed.) _From Frege to Gödel: A Source Book in
     Mathematical Logic, 1879-1931_ (Harvard U.P., 1967)


Author:

Dan Connolly
545 Technology Square
Cambridge, MA 02139
617-258-8143
connolly@w3.org

