HTML+ Document Format                                      Dave Raggett
Internet Draft                                          Hewlett Packard
                                                      28th October 1993

                   HTML+ (Hypertext markup format)

Status of this memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups. Note that other groups may also distribute working
documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months.
Internet-Drafts may be updated, replaced, or obsoleted by other
documents at any time. It is not appropriate to use Internet-Drafts as
reference material or to cite them other than as a "working draft" or
"work in progress".

To learn the current status of any Internet-Draft, please check the
1id-abstracts.txt listing contained in the Internet-Drafts Shadow
directories on ds.internic.net, nic.nordu.net, ftp.nisc.sri.com or
munnari.oz.au.

Distribution of this document is unlimited. Please mail comments to the
author at dsr@hplb.hpl.hp.com or to the discussion list:
www-talk@nxoc01.cern.ch

This draft is valid until May 1st, 1994. It is available in the files
draft-raggett-www-html-00.ps and draft-raggett-www-html-00.txt in the
internet-drafts directories on the hosts mentioned above. Readers are
recommended to obtain the Postscript version, which contains figures
and formatting examples that are missing from the plain text version.

Abstract

This draft presents a proposal for a lightweight delivery format for
browsing and querying information in a web of globally distributed
hypertext, accessible over the Internet. HTML+ embodies a pageless
model making it suitable for efficient rendering on a wide range of
display types, for example, VT100 terminals, X11 Workstations, Windows
3.x and the Macintosh. HTML+ is based upon SGML, and represents
document elements at a logical level, e.g. headers, paragraphs, lists,
tables, and figures.
Authors can choose to create HTML+ documents directly, or to use
filters to convert from other formats such as LaTeX, Framemaker, and
Word for Windows.

HTML+ has grown out of several years' experience with the HTML document
format in the World Wide Web community. Browser writers are
experimenting with extensions to HTML and it is now appropriate to draw
these ideas together into a revised document format. The new format is
designed to allow a gradual roll over from HTML, adding features like
tables, captioned figures and fill-out forms for querying remote
databases or mailing questionnaires. Large documents can be split into
a number of smaller nodes for reduced latency, with explicit or
implicit navigation links. This draft also includes a proposal to add
direct support for mathematical formulae. Authors can include limited
presentation hints, and further control may eventually be possible via
associated style sheets.

Internet Draft 1 November 1993
HTML+ Document Format Raggett

Table of Contents

    HTML+ Discussion Document
    Introduction
    Positioning of HTML+
    HTML+ and HTML
    HTML+ and SGML
    An Overview of HTML+
    Document Structure
    Large Documents
    Headers
    Paragraphs and <P>
    Normal Text
    Character Sets and Entity Definitions
    Hypertext Links
    Character Emphasis
    Presentation Only Tags
    Generic Emphasis
    Logical Emphasis
    Extending the Set of Logical Roles
    Annotations
    Images
    Change Bars and Document Amendments
    Conditional Text
    Explicit Line Breaks
    Different Paragraph Styles
    Longer Quotations
    Abstracts
    Bylines
    Notes and admonishments
    Lists
    Ordered Lists
    Bulleted Lists
    Plain Lists
    Definition Lists
    Figures
    Active Areas
    Placing Hypertext Buttons on Images
    Possible extensions
    Tables
    Implementation Issues for Tables
    Fill-out Forms and Input fields
    Sending form data to an HTTP server
    Sending a form via Electronic Mail
    Literal and Preformatted Text
    Mathematical Equations
    Indexing
    Document declarations
    HTMLPLUS
    The HEAD and BODY elements
    TITLE
    ISINDEX
    NEXTID
    BASE
    LINK
    Dealing with Large Documents
    Acknowledgements
    References
    Appendix I - The HTML+ DTD
    Appendix II - Character Entity Names
    Appendix III - Code for Polygon testing
    Appendix IV - Sorted list of tags and attributes

HTML+ (Hypertext markup format)

1 HTML+ Discussion Document

Following the WWW workshop in July 1993 and subsequent discussions on
www-talk, it seems an opportune moment to try and reach a consensus
before writing a more formal draft as an informational RFC.

1.1 Introduction

The World Wide Web is a wide-area client-server architecture for
retrieving hypermedia documents over the Internet. It also supports a
means of searching remote information sources, for example
bibliographies, phone directories and instruction manuals. There are
three main ingredients: naming schemes for retrievable objects,
protocols and interchange formats.

o Universal naming scheme for documents.
The Universal Resource Locator (URL) syntax specifies documents in
terms of the protocol to be used to retrieve them, their Internet host
and path name. A format for location independent lifetime identifiers
is currently being defined by a working group of the IETF. A network
protocol will allow Universal Resource Numbers (URNs) to be resolved to
the URL for the nearest available copy. A URN may specify a number of
variants of a document, but the URL will always specify a single copy.

o Use of de facto protocols for retrieving documents over the Internet
including FTP, NNTP, WAIS, Gopher and HTTP. The latter is designed
specifically for the World Wide Web, and uses the MIME message format
for document exchange.

o A document format supporting hypertext links based on URLs and URNs
which can be rendered on a wide variety of display types. HTML+ is
intended in this role as a successor to the existing HTML format.

HTML+ documents offer a means for providing hypertext links to a
variety of media including images, sound sequences, MPEG movies,
Postscript files and other formats. These links allow a global web of
information sources to be established as new servers and document names
are announced. Registers of information sources can also be made
available via the web, using its ability to let users search for
information via keywords. It is hoped that HTML+ will be useful for
information exchange via email and network news as well as HTTP.

1.2 Positioning of HTML+

HTML+ is designed for use in the World Wide Web as a non-proprietary
delivery format for wide-area hypertext. It embodies a pageless model
making it suitable for efficient rendering on a wide range of display
types including VT100 terminals, X11, Windows 3.1 and the Macintosh.
HTML+ is based upon SGML and represents document elements at a logical
level.
Authors may choose to create HTML+ documents directly or to use filters
to convert from other formats such as LaTeX, Framemaker, and Word for
Windows.

1.3 HTML+ and HTML

HTML+ is a superset of HTML and designed to allow a gradual roll over
from the earlier format, with features like tables, captioned figures
and fill-out forms for querying remote databases or mailing
questionnaires. Large documents can be split into a number of smaller
nodes for reduced latency, with explicit or implicit navigation links.
This draft also includes a proposal to add support for mathematical
formulae. Authors can include limited presentation hints, and further
control may eventually be possible via associated style sheets.

1.4 HTML+ and SGML

HTML+ is based on the Standard Generalized Markup Language which is an
international standard for document markup that is becoming
increasingly important. The term markup derives from the way
proof-readers have traditionally pencilled in marks that indicate how a
document is to be revised. SGML grew out of a decade of work addressing
the need for capturing the logical elements of documents as opposed to
the processing functions to be performed on those elements.

SGML is essentially an extensible document description language, based
on a notation for embedding tags into the body of a document's text. It
is defined by the international standard ISO 8879. The markup structure
permitted for each class of documents is defined by an SGML Document
Type Definition, usually abbreviated to DTD.

A lot of work is underway to produce DTDs for a range of purposes.
These include ISO 12083 for books and ISO 10744 which defines the
HyTime architectural forms for hypermedia/time-based documents. The
Text Encoding Initiative (TEI) is an international research project for
SGML-based document exchange in the humanities. Publishers are
cooperating to produce common DTDs for computer manuals, e.g. the
DocBook DTD.
The CALS programme of the US Department of Defence defines SGML DTDs
for documentation for defence procurement contracts.

So what sets HTML+ apart from these efforts? It is impractical to
design a DTD to meet the needs of all possible users. Instead, the
markup has to be tailored to the needs of a specific community. HTML+
is aimed at fulfilling the dream of a web of information freely
accessible over the Internet with links between documents spanning
continents. The need to support a very wide range of display types and
to keep browser software as simple as possible limits the complexity
that can be handled. Similarly the disparate needs of authors have led
to the inclusion of limited rendering hints. The features supported
arise from several years' experience with the World Wide Web and the
existing HTML format.

2 An Overview of HTML+

HTML+ documents consist of headers, paragraphs, lists, tables and
figures. A simple example of an HTML+ document is:
This is some normal text which will wrap at the window margin. You can emphasise parts of the text if you wish.
This is a new paragraph. Note that unlike title and header tags
the matching end tag is not needed.
The text of the document includes tags which are enclosed in
<angle brackets>.
Normal text is automatically wrapped by the browser at the current
window margin and adapts to changes in window size. The text is
generally shown in a proportional font:
The P element acts as a container for the text between
the start tag <P> and end tag </P>. You don't need
to give the end tag as it is implied by the context, e.g. the
following <P> tag.
If you wish, you may think of the <P> tag as
a paragraph separator. This works since HTML+ formally doesn't
require you to wrap text up as paragraphs.
This would be rendered as:
The P element acts as a container for the text between the start
tag <P> and end tag </P>. You don't need to give the end tag as it is
implied by the context, e.g. the following <P> tag.

If you wish, you may think of the <P> tag as a paragraph separator.
This works since HTML+ formally doesn't require you to wrap text up
as paragraphs.
The following samples of HTML+ all produce exactly the same results
when displayed:
The first piece of text The second piece The first piece of text The second piece
The second piece
They all produce:
Different ways of using the P element
The first piece of text
The second piece
In some situations you will want to preserve the original line breaks
and spacing; for this you should use the LIT or PRE elements, which are
described in a later section. You can force line breaks in normal
paragraph text with the <BR> element. Besides ALIGN=center, the other
possible values of the ALIGN attribute are ALIGN=left (the default),
ALIGN=right, ALIGN=justify and ALIGN=indent. This attribute is a hint
and may be ignored by some browsers. Note that when using explicit line
breaks (see section 5.12) you may wish to switch off word wrap with
WRAP=OFF.
Browsers, when parsing paragraphs, can choose to simply treat the <P>
tag as denoting a paragraph break. If the paragraph style includes a
blank line between paragraphs, then additional care is needed after
headers and other major elements to avoid inserting an unwanted blank
line, e.g. when a <P> tag directly follows a header. This ability to
perceive <P> as a paragraph break provides for continuity with HTML,
and allows authors to graduate to treating it as a container in their
own time.
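
As a rough illustration of the point above, the following Python sketch
treats the P start tag purely as a paragraph break and discards optional
end tags, so that container-style and separator-style markup yield the
same paragraphs. The function name split_paragraphs and the regular
expressions are assumptions made for this example; they are not part of
the draft and ignore every other element.

```python
import re

def split_paragraphs(text):
    """Split body text into paragraphs, treating <P> as a break."""
    # Optional </P> end tags carry no extra information here, so drop them.
    without_end_tags = re.sub(r"</P\s*>", "", text, flags=re.IGNORECASE)
    # Split on <P> start tags, with or without attributes such as ALIGN.
    pieces = re.split(r"<P(?:\s[^>]*)?>", without_end_tags,
                      flags=re.IGNORECASE)
    # Collapse runs of whitespace and discard empty pieces.
    return [" ".join(p.split()) for p in pieces if p.strip()]

as_container = "<P>The first piece of text</P><P>The second piece</P>"
as_separator = "The first piece of text\n<P>The second piece"
print(split_paragraphs(as_container) == split_paragraphs(as_separator))
```

A real browser would also have to handle headers and other block
elements around the paragraphs; the sketch only shows why both usages
can be parsed by the same simple rule.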
5 Normal Text
Paragraphs can include the following:
o Character entity names for unusual characters such as ó which
are included using SGML entity definitions: &name; as in
"the dream of &oacute;engus" which is displayed as: "the dream
of óengus". The full list of standard entity names recognised
by most browsers is given in Appendix II
o Character emphasis using logical and presentational markup. The
set of logical character emphasis can be extended, and HTML+
provides the means for browsers to determine how to render
such extensions
o Simple footnotes or margin notes, which can be rendered as
pop-up overlays
o Images which act as single characters and which can be
vertically aligned relative to the text line in which
they are embedded
o Hypertext Links based on the URL or URN notations
o Markup signifying the start and end of change bars. You can
also mark text as being removed or added, as is common in
legal documents
o Conditional text which appears only on-line or only when printed
o Input fields when the paragraph is part of a form
o Explicit line breaks
5.1 Character Sets and Entity Definitions
By default, HTML+ documents are made up of 8-bit characters from the
ISO 8859 Latin-1 character set. The network protocol used to retrieve
documents may translate the character set into a locally acceptable
form, e.g. EBCDIC. The HTTP protocol uses the MIME standard (RFC 1341)
to specify the document type and character set. ISO SGML entity
definitions are used to include characters which are missing from the
character set or which would otherwise be confused with markup
elements, e.g:
    &amp;     ampersand                 &
    &lt;      less than sign            <
    &gt;      greater than sign         >
    &quot;    the double quote sign     "
Appendix II lists a broad range of characters and symbols, relating
their ISO names to the corresponding character codes in common
character sets. They allow authors to include accented characters in
7-bit ASCII documents. Some other useful entity definitions are:
    &ndash;   en dash (half the width of an em unit)
    &mdash;   em dash (equal to width of an "m" character)
    &ensp;    en space
    &emsp;    em space
    &nbsp;    non breaking space
    &shy;     soft hyphen (normally invisible)
    &copy;    copyright sign
    &trade;   trade mark sign
    &reg;     registered sign
There are a large number of entities defined by the ISO, covering most
languages and symbols for publishing and mathematics. Requiring all
browsers to support these would be impractical, e.g. how should a dumb
terminal show such symbols? In some cases there will be accepted ways
of mapping them to normal characters, e.g. æ as ae and é as e. Perhaps
the safest recommendation is that where authors need to use a
specialised character or symbol, they should use ISO entity names
rather than inventing their own. Browsers should leave unrecognised
entity names untranslated.
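
The recommendation to leave unrecognised entity names untranslated can
be sketched in Python as below. The ENTITIES table is a small
illustrative subset chosen for this example, not the Appendix II list,
and expand_entities is a hypothetical helper name.

```python
import re

# Illustrative subset only; a browser would use a much larger table.
ENTITIES = {
    "amp": "&", "lt": "<", "gt": ">", "quot": '"',
    "oacute": "\u00f3", "copy": "\u00a9", "nbsp": "\u00a0",
}

def expand_entities(text, table=ENTITIES):
    """Replace &name; with its character; unknown names stay as written."""
    def substitute(match):
        # Fall back to the original "&name;" text when the name is unknown.
        return table.get(match.group(1), match.group(0))
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", substitute, text)

print(expand_entities("the dream of &oacute;engus"))
print(expand_entities("&bogus; stays as it is"))
```

Because the substitution is done in a single pass, text produced by one
replacement is never rescanned, so "&amp;lt;" becomes the four
characters "&lt;" rather than "<".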
In some cases it is useful to specify the language used in a given
element, with the LANG attribute. The ISO defines abbreviations for
most languages, e.g. FR for French as in:

Before
coming to CERN, Tim worked on, among other things, document
production and text processing. He developed his first hypertext
system, "Enquire", in 1980 for his own use (although unaware of
the existence of the term HyperText). With a background in text
processing, real-time software and communications, Tim decided
that high energy physics needed a networked hypertext system and
CERN was an ideal site for the development of wide-area hypertext
ideas. Tim started the WorldWideWeb project at CERN in 1989. He
wrote the application on the NeXT along with most of the
communications software.
The IMG element specifies an image via a URL. The ALIGN=TOP attribute
ensures that the top of the image is level with the top of the current
text line. You can also use ALIGN=MIDDLE to align the center of the
image with that of the current text line, and ALIGN=BOTTOM to align the
bottom of the image with the bottom of the current text line. Browsers
are not expected to apply text flow retrospectively, so using
ALIGN=MIDDLE and ALIGN=BOTTOM may overwrite previous lines of text. If
the ALIGN attribute is missing then ALIGN=TOP is assumed.
Not all display types can show images. The IMAGE element behaves in the
same way as IMG but allows you to include descriptive text, which can
be shown on text-only displays:
This bill would require the Legislative Counsel, with the advice
of the Joint Rules Committee of the Senate and Assembly, to make
available to the public by means of access by way of
contains the
text of the paragraph.
22 The Avenue .
6 Different Paragraph Styles
To avoid all text appearing in the same style, HTML+ provides distinct
styles for quotes, abstracts, bylines and admonishments. All these
elements can contain multiple paragraphs:
6.1 Longer Quotations
When you want to include a quotation that extends over more than one
paragraph, you should use the QUOTE element. Quoted text should
preferably be indented, and rendered using a distinctive font, e.g.
The following is a quotation from the foreword by Yuri Rubinsky
to "The SGML Handbook" by Charles F. Goldfarb, published by the
Clarendon Press, Oxford, 1990.
element
is allowed.
7 Lists
There are three kinds of lists, which can be freely nested within one
another:
o Ordered lists - the list items are automatically numbered
o Unordered lists - bulleted or plain styles, in single or
multiple columns
o Definition lists of terms and associated definitions
7.1 Ordered Lists
The OL element is used with LI for each item to represent ordered
lists:
The text in the following paragraphs will flow around the figure
if there is enough room. The browser is free to position the caption at
the top, bottom or sides of the figure.
which is rendered as:
[ ] The text in the following paragraphs will
[ picture missing ] flow around the figure if there is enough
[ ] room. The browser is free to position the
[ ] caption at the top, bottom or sides of the
figure.
"Not curried fish again!"
Note that browsers can only support a limited range of image types.
Currently these are GIF and XBM (X bitmap format). This list will
evolve over time.
8.1 Active Areas
The upper left of the image is designated as x,y = (0, 0), with x
increasing across the page and y down the page. This choice was made
for continuity with the IMG element in HTML, to ensure a simple
migration path to HTML+. If points are given in real numbers, the lower
right corner of the image is taken as being (1.0, 1.0), otherwise, with
integer values the coordinates are assumed to be in pixels. A simple
test to distinguish the two schemes is to check if a "." character
occurs anywhere in the list of points. Using scaled coordinates is much
safer as the pixel extent of an image may alter, e.g. as a result of
format negotiation with the server.
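
The test for distinguishing the two coordinate schemes can be sketched
in Python as follows. The comma-separated "x,y,x,y" layout of the point
list and the name to_pixels are assumptions made for this example, as
is the choice to map (1.0, 1.0) to the full pixel width and height.

```python
def to_pixels(points, width, height):
    """Convert a list of x,y values to pixel coordinates."""
    # A "." anywhere in the list means scaled (real number) coordinates.
    scaled = "." in points
    values = [float(v) for v in points.split(",")]
    pairs = list(zip(values[0::2], values[1::2]))
    if scaled:
        # (1.0, 1.0) is taken as the lower right corner of the image.
        return [(round(x * width), round(y * height)) for x, y in pairs]
    # Integer values are already pixel coordinates.
    return [(int(x), int(y)) for x, y in pairs]

print(to_pixels("0.5, 0.25", 200, 100))
print(to_pixels("10,20,30,40", 200, 100))
```

With scaled coordinates the same point list remains valid if the image
is delivered at a different pixel size, which is the safety argument
made above.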
For some images, HTTP servers will be able to handle mouse/pen clicks
or drags on the image. This is signalled in the header information
returned along with the image data. Alternatively, the ISMAP attribute
can be used to signal this capability. The mouse click is sent to the
server indicated by the URL in the SRC attribute, using the same URL
plus the suffix "?x=X&y=Y" where X and Y are the coordinates of the
click event. Mouse drags can be used
to designate a rectangular region of the image. In this case the suffix
takes the form: "?x=X&y=Y&w=W&h=H" where (X, Y) is the upper left of
the rectangle, and (W, H) define its width and height. The ISMAP
mechanism is useful when the active regions in the image change their
boundaries with time, e.g.
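
The two suffix forms described in this paragraph can be sketched in
Python as follows. The function name ismap_url and the image URL used
in the usage lines are hypothetical; only the "?x=X&y=Y" and
"?x=X&y=Y&w=W&h=H" shapes come from the text.

```python
def ismap_url(src, x, y, w=None, h=None):
    """Build the URL sent to the server for a click or a drag."""
    # A click reports a single point.
    suffix = f"?x={x}&y={y}"
    # A drag also reports the width and height of the rectangle whose
    # upper left corner is (x, y).
    if w is not None and h is not None:
        suffix += f"&w={w}&h={h}"
    return src + suffix

# Hypothetical image URL, used only for illustration.
print(ismap_url("http://host/map.gif", 12, 34))
print(ismap_url("http://host/map.gif", 12, 34, w=50, h=20))
```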
and Introduction to HTML+
Header names should be appropriate to the following section of the
document, while the title should cover the document as a whole. There
are no restrictions on the sequence of headers, e.g. you could use a
level three header following a level one header. Header and section
elements can take an identifier, unique to the current document, for
use as named destinations of hypertext links. This is specified with
the ID attribute, e.g.
Introduction to HTML+
This allows authors to create hypertext links to particular sections of
documents. It is a good idea to use something obvious when creating an
identifier, to help jog your memory at a later date. WYSIWYG editors
should automatically generate identifiers. In this case, they should
provide a point and click mechanism for defining links so that authors
don't need to deal explicitly with identifier names. Automatic
generation of IDs for headers, paragraphs and other major elements is
important as it makes it easier for other people to create links to
your document, by ensuring that there are plenty of ID attributes
present as potential destinations.
Should we support headers for which the level is implicitly defined by
nestable section elements? We could also support autonumbering of
headers. Unfortunately, on further investigation these ideas proved
trickier than thought at first, and so have been dropped from this
draft.
4 Paragraphs and Different ways of using the P element
Different ways of using the P element
Different ways of using the P element
The first piece of text
element, but the browser may wrap lines
arbitrarily at window margins prior to reaching the
element.
The ALIGN attribute can be used to center a paragraph, e.g. Je
m'aveugle.
. This attribute permits language dependent layout and
hyphenation decisions, e.g. Hebrew uses right to left word order.
To allow SGML parsers to recognise entity names, authors should declare
them before use, for example:

    <!ENTITY % ISOcyr1 PUBLIC
        "ISO 8879-1986//ENTITIES Russian Cyrillic//EN">
    %ISOcyr1;

This introduces ISOcyr1 as a local name for the ISO public identifier
for the Cyrillic alphabet and then includes the associated set of
entity definitions as part of the current document. This declaration is
unnecessary for entities defined within the HTML+ DTD.
5.2 Hypertext Links
HTML+ allows authors to embed hypertext links into the document text.
In a browser this might look to the reader like:
Links are defined with the A tag[1]. HTML+ supports a number of different
link types[2].
Clicking on a link will normally cause the browser to retrieve the
linked document and display it in place of the current one. This
example is represented by the following piece of HTML+:

    Links are defined with the <A HREF="#z1">A tag</A>. HTML+ supports
    a number of different <A HREF="links.html">link types</A>.
The first link is to an anchor named "z1" in the current document
(using an ID attribute on some element). The second is to a file named
"links.html" in the same directory as the current document. The link
caption is the text between the start and end tags. The HREF attribute
defines the link destination using the URL or URN notations. This may
be abbreviated in certain circumstances using relative URLs. The link
should be rendered in a clearly distinguishable way, e.g. as a raised
button, or with underlined text in a particular color or emphasis. For
displays without pointing devices, it is suggested that the link is
indicated with a reference number in square brackets after the caption,
which the reader enters to follow the link. Note that it is illegal
for anchors to include headers, paragraphs, lists etc. The anchor text
is restricted to normal text with emphasis and inline images.
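
The suggestion for displays without pointing devices can be sketched in
Python as below: each anchor is replaced by its caption followed by a
reference number in square brackets, and the destinations are collected
so the reader can pick one by number. The function name number_links,
the regular expression and the sample markup string are assumptions
made for this example; the pattern only handles simple anchors with an
HREF attribute placed first.

```python
import re

# Matches <A HREF=...>caption</A>; a deliberately simple pattern.
ANCHOR = re.compile(r'<A\s+HREF="?([^">\s]+)"?[^>]*>(.*?)</A>',
                    re.IGNORECASE | re.DOTALL)

def number_links(text):
    """Return the text with 'caption[n]' links, plus the link targets."""
    targets = []
    def substitute(match):
        targets.append(match.group(1))
        return f"{match.group(2)}[{len(targets)}]"
    return ANCHOR.sub(substitute, text), targets

sample = ('Links are defined with the <A HREF="#z1">A tag</A>. HTML+ '
          'supports a number of different '
          '<A HREF="links.html">link types</A>.')
shown, targets = number_links(sample)
print(shown)
print(targets)
```

Entering a number n would then follow targets[n - 1], resolved relative
to the current document where the URL is abbreviated.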
The A element has several optional attributes:
ID This can be used to define a unique identifier for the text
contained by the A element. Another document can then make
a reference to this by putting the identifier after the URL
for this document, separated by a hash sign. The ID
attribute replaces the NAME attribute in HTML.
HREF This specifies a URL or URN of a linked document which will
be retrieved when the user clicks on the anchor's label.
HREF=#id can be used for links to other parts of the same
document.
REL The relationship the linked document has to this one.
REL=Subdocument is used to break long documents into
smaller ones. The importance of this particular
attribute value is explained in section 14.
REV The reverse relationship type, and the inverse of REL.
EFFECT This determines how the browser displays the linked document
when following the link. EFFECT=Replace causes the browser
to replace the current document with the linked one;
EFFECT=NEW results in the linked document being shown in a
new window (if practical); and EFFECT=OVERLAY causes the
linked document to be shown in a pop-up window, as used by
the Microsoft Windows Help system.
PRINT This attribute makes it easy for users to print off the
current document and relevant parts. PRINT=REFERENCE (the
default) treats the link as a reference, i.e. the URL is
given as a footnote; PRINT=FOOTNOTE prints the linked
document as a footnote; PRINT=SIDEBAR prints the linked
document as a sidebar; and PRINT=SECTION prints the
linked document as a follow on section. Use PRINT=SILENT
when you don't want the link referenced or printed out.
TITLE Defines the title to use when the linked document is
otherwise untitled.
TYPE The MIME content type of the linked document - for use
in providing presentation cues only, as it could easily
become out of date.
SIZE The size in bytes for the linked document. This should only
be used as a guide to progress in retrieving documents, as
it is likely to get out of step with changes to the target
document.
METHODS This is a comma separated list of HTTP methods supported
by the linked object. The browser might choose a different
way of rendering the link for say searchable objects.
SHAPE This is used to define shaped buttons on top of images or
figures, and is explained later on.
You can also use the LINK element at the start of the document to
define document-wide relationships with other documents, e.g. a link to
a table of contents. This is described later on.
5.3 Character Emphasis
There has been considerable discussion on how to represent character
emphasis. The previous draft of HTML+ used a single element to handle
all forms with a role attribute for the logical role, and other
attributes for providing hints as to how to render the emphasis. This
mechanism was seen as being overloaded and prompted the use of separate
elements in the current draft.
5.4 Presentation Only Tags
In many cases it is convenient to indicate directly how the text is to
be rendered, e.g. as italic, bold, underline or strike-through:
    <I>italic text</I>          italic text
    <B>bold text</B>            bold text
    <U>underlined text</U>      underlined text
    <S>strike through</S>       strike through
    <SUP>superscript</SUP>      superscript
    <SUB>subscript</SUB>        subscript
    <TT>fixed pitch</TT>        fixed pitch (TT for Teletype)
These tags may be nested to combine effects, e.g.
bold-italic-fixed-pitch text, and should be considered as hints rather
than as binding obligations on the browser, e.g.
Some bold italic fixed pitch text.
which is rendered as: Some bold italic fixed pitch text.
5.5 Generic Emphasis
These are some tags for indicating a level of emphasis without
committing oneself to how they should be rendered:
    <EM>normal emphasis</EM>            typically italic
    <STRONG>strong emphasis</STRONG>    typically bold
5.6 Logical Emphasis
These tags indicate the role of the marked text, e.g. bibliographic
references. By using a standard way of marking up text, it becomes
possible to automatically index such references. There is a
potentially huge number of different distinctions that could be made,
and the set given below is intentionally minimalistic. Discussion is
welcomed on just which elements should be included in HTML+ given its
intended role as a delivery format for hypertext documents:
q a short quotation which can be included inline,
e.g. <Q>to be or not to be, that is the question</Q>;
use <Q> and </Q> in place of double quote marks.
cite citation, e.g. Festinger, L.(1957), A Theory
of Cognitive Dissonance, Stanford.
person proper names, e.g.
element which acts as a forced line break.
Harrow,
London, NW1 5ER
The next five years will see a revolution in computing. Users
will no longer have to work at every computer task as if they had
no need or ability to share data with all their other computer tasks,
they will not need to act as if the computer is simply a replacement
for paper, nor will they have to appease computers or software
programs that seem to be at war with one another.
which might be rendered as:
The following is a quotation from the foreword by Yuri Rubinsky to
"The SGML Handbook" by Charles F. Goldfarb, published by the
Clarendon Press, Oxford, 1990.
The next five years will see a revolution in computing. Users
will no longer have to work at every computer task as if they
had no need or ability to share data with all their other computer
tasks, they will not need to act as if the computer is simply a
replacement for paper, nor will they have to appease computers or
software programs that seem to be at war with one another.
6.2 Abstracts
The ABSTRACT element can be used to give an overview of a document and
typically follows a level one heading. It should be rendered in an
easily read font, distinct from normal text, and preferably indented.
An example is given in the next section.
6.3 Bylines
The BYLINE element is similar to QUOTE and is used for information
about the author, e.g. contact details and release date. A common
convention is to include a hypertext link to a node with more
information about the author. Bylines can occur at the beginning or end
of a document, e.g:
HTML+ (Hypertext markup format
element which can appear anywhere a
which is usually rendered as:
1) Wake up
2) Get dressed
3) Have breakfast
4) Drive to work
The COMPACT attribute when present, e.g. <OL COMPACT>, has the effect of
reducing inter-item spacing. The numbering style is the responsibility
of the browser. Other styles use roman numerals or letters from the
alphabet in upper or lower case. One issue for browsers is how to
render ordered lists nested within a list of the same type. List item
text can't include headers; see the DTD in Appendix I for details.
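
Since the numbering style is left to the browser, the following Python
sketch shows one way a browser might generate item labels in arabic,
roman or alphabetic style. The style names, the "n)" label format and
the helper names are assumptions made for this example; the draft does
not define them.

```python
def to_roman(n):
    """Lower case roman numeral for a positive integer."""
    numerals = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
                (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
                (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]
    out = ""
    for value, symbol in numerals:
        while n >= value:
            out += symbol
            n -= value
    return out

def label(n, style="arabic"):
    """Label for the n-th item (counting from 1) in the given style."""
    if style == "roman":
        return to_roman(n)
    if style == "alpha":
        return chr(ord("a") + n - 1)   # only meaningful for items 1 to 26
    return str(n)

def render_ordered(items, style="arabic"):
    """Return one rendered line per list item."""
    return [f"{label(i, style)}) {text}" for i, text in enumerate(items, 1)]

for line in render_ordered(["Wake up", "Get dressed",
                            "Have breakfast", "Drive to work"]):
    print(line)
```

A browser could pick a different style for each nesting depth, which is
one possible answer to the nesting issue raised above.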
7.2 Bulleted Lists
Bulleted lists are represented with the UL and LI elements:
which is usually rendered as:
o Wake up
o Get dressed
o Have breakfast
o Drive to work
The COMPACT attribute when present, e.g. <UL COMPACT>, has the effect of
reducing inter-item spacing. The bullet style is the responsibility of
the browser, and normally an unordered list nested within a list of the
same type is given a different style (bullet, dash, box or check).
Authors can instead use the SRC attribute for the LI element to specify
an icon with a URL or URN, e.g.
. The WRAP attribute is used
for multi-column lists and should be WRAP=HORIZ for horizontal
wrapping of list items or WRAP=VERT for vertical wrapping of list
items, e.g.
without the WRAP attribute, this is rendered as:
icons1/
icons2/
icons3/
src/
xpm-3-paper.ps
xpm-3.2-to-3.2a.patch
with WRAP=VERT this would appear like:
icons1/ icons3/ xpm-3-paper.ps
icons2/ src/ xpm-3.2-to-3.2a.patch
with WRAP=HORIZ it would appear like:
icons1/ icons2/ icons3/
src/ xpm-3-paper.ps xpm-3.2-to-3.2a.patch
Everyday familiarity with printed lists leads us to expect lists to be
organized into columns which are read top to bottom; horizontally
wrapped lists are seldom seen. Browsers are free to choose the number
of columns to match the current window size and item widths. If there
are N items and M columns then the longest column will have (N+M-1)/M
rows. This requires a prepass through the list to count the items (and
optionally their maximum width). However, this information can be
cached to avoid speed penalties when resizing the window or refreshing
the screen. You can use the SRC attribute for the LI element to specify
an icon for each item in the list, e.g. to show the type of each
document in a directory listing.
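The row count and the two wrapping orders described above can be
sketched in a few lines of Python. This is only an illustration, not
part of the proposal: the function names rows_for, wrap_vert and
wrap_horiz are hypothetical, the division in (N+M-1)/M is assumed to be
integer division, and the number of columns is assumed to have been
chosen already by the browser.

```python
def rows_for(n_items, n_columns):
    """Rows in the longest column: (N + M - 1) / M, using integer division."""
    return (n_items + n_columns - 1) // n_columns


def wrap_vert(items, n_columns):
    """Lay items out so that each column is read top to bottom (WRAP=VERT).

    Returns a list of screen rows; each row lists the items shown on it.
    """
    rows = rows_for(len(items), n_columns)
    # Item k goes to row k % rows, so taking every rows-th item yields one row.
    return [items[r::rows] for r in range(rows)]


def wrap_horiz(items, n_columns):
    """Lay items out so that each row is read left to right (WRAP=HORIZ)."""
    return [items[i:i + n_columns] for i in range(0, len(items), n_columns)]


listing = ["icons1/", "icons2/", "icons3/", "src/",
           "xpm-3-paper.ps", "xpm-3.2-to-3.2a.patch"]

for row in wrap_vert(listing, 3):
    print("  ".join(row))
```

With three columns the six items above need two rows, and the vertical
layout places icons1/ and icons2/ in the first column, matching the
WRAP=VERT rendering shown earlier.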
For convenience, the
and
respectively.
7.4 Definition Lists
These consist of pairs of terms and definitions, but can also be used
for plays as in:
This could be rendered as:
King Henry: I myself heard the King say he would not be
ransomed.
Williams: Ay, he said so, to make us fight cheerfully: but
when our throats are cut he may be ransomed, and we none
the wiser.
King Henry: If I live to see it, I will never trust his word
after.
Williams: You pay him then! That's a perilous shot out of an
elder-gun, that a poor and private displeasure can do
against a monarch! You may as well go about to turn the sun
to ice, with fanning his face with a peacock's feather.
You'll never trust his word after! Come `tis a foolish saying.
or as:
King Henry   I myself heard the King say he would not be
             ransomed.
Williams     Ay, he said so, to make us fight cheerfully: but
             when our throats are cut he may be ransomed, and
             we none the wiser.
King Henry   If I live to see it, I will never trust his word
             after.
Williams     You pay him then! That's a perilous shot out of an
             elder-gun, that a poor and private displeasure can
             do against a monarch! You may as well go about to
             turn the sun to ice, with fanning his face with a
             peacock's feather. You'll never trust his word
             after! Come `tis a foolish saying.
Browsers should make allowance for the infrequent case when the term
text (DT) is longer than the definition text (DD) and wraps onto
subsequent lines. Note that you are allowed to have several consecutive
DT elements followed by a DD element, but you can't have DD without an
associated DT element. The COMPACT attribute, as in <DL COMPACT>, forces
the browser to use the former, more compact style.
8 Figures
The FIG element is similar to the IMAGE element, but acts as a
paragraph. The ALIGN attribute can be one of LEFT (the default),
CENTER, RIGHT or FLOAT. The first three determine whether the figure is
flush left, centered or flush right. If ALIGN=FLOAT the figure may
float to another more convenient location (and possibly be zoomed or
reduced in the process). A caption can be defined with the CAPTION
element, followed by text describing the figure for readers using
text-only displays:
The BORDER attribute acts as a hint to the browser to draw lines
enclosing each cell. The TH element precedes header cell text and the
TD element precedes data cell text. The TR element is used to separate
table rows. By default text is centered in each cell. Header text
should be shown emphasised, e.g. the browser could use a bold sans
serif font for headers and a serif font for the data cells. The next
example shows how cells can be merged with their neighbors:
Table 2: A more complex table

             |      average      |  other
             |  height |  weight | category
    ---------+---------+---------+----------
    males    |   1.9   |  0.003  |   yyy
    females  |   1.7   |  0.002  |   xxx
This table is defined by the markup:
    Year    Month      Day
    1972    June       23rd
    1982    October    7th
The first cell (a header cell) is merged with the cell below it. Note
that this merged cell is empty - the definition of the next column for
the first row starts immediately. Looking again at the first row, the
second column is merged with the third. The definition for the third
column is skipped as it was covered by the merged cell. The fourth
column/first row is also merged, this time with the next row. The BR
element has been used here to force a line break between other and
category. The TR element signifies the end of the first row and the
beginning of the second. Note that empty cells at the end of a row can
be omitted as the TR element unambiguously marks the end of the row.
The second row only contains definitions for the second and third
columns since the others were merged with cells on the preceding row.
The general rule is to avoid defining any cell twice. The last two rows
start with headers and the align=left attribute ensures that the
browser will align these headers to the left of their cells. The ALIGN
attribute can be one of LEFT, CENTER or RIGHT, with CENTER as the
default. It can be used with both TH and TD.
9.1 Implementation Issues for Tables
Browsers need a prepass through the table markup to count the number of
columns and determine their widths. A simple algorithm that takes
merged cells into account will suffice. Text fields wrap to fit their
columns, which should be sized to best match current window width. This
information should be cached to avoid speed penalties during subsequent
screen refresh or window resize operations. Browsers can ignore alignment
hints if required, and using a fixed pitch font may speed up the sizing
step.
The number of columns is given by the row with the largest number of
TH and TD elements, remembering to add in merged cells. The widths
of columns are evaluated by finding the minimum and maximum widths
needed for each cell, and hence the minimum and maximum width for the
column as a whole. All this can be done during a single pass through
the TABLE element. Caching these min/max values for each column then
permits the browser to instantly adjust the table when the window is
resized.
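The prepass described above can be sketched in Python as follows. This
is a minimal illustration under several assumptions, not a prescribed
algorithm: the Cell class and its colspan/rowspan fields are a
hypothetical in-memory form of the parsed table, a fixed pitch font is
assumed so that widths are character counts, a cell's minimum width is
taken to be its longest word, cells that span several columns are
ignored when sizing, and the left-to-right growth policy in fit is just
one simple choice.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    text: str
    colspan: int = 1   # hypothetical: number of columns the cell covers
    rowspan: int = 1   # hypothetical: number of rows the cell covers


def measure(rows):
    """Single pass over the rows: column count plus min/max width per column."""
    occupied = set()            # (row, column) slots already covered by a cell
    min_w, max_w = {}, {}
    n_cols = 0
    for r, row in enumerate(rows):
        c = 0
        for cell in row:
            while (r, c) in occupied:       # skip slots merged from rows above
                c += 1
            for dr in range(cell.rowspan):
                for dc in range(cell.colspan):
                    occupied.add((r + dr, c + dc))
            if cell.colspan == 1:           # simplification: size from unmerged columns
                longest_word = max((len(w) for w in cell.text.split()), default=0)
                min_w[c] = max(min_w.get(c, 0), longest_word)
                max_w[c] = max(max_w.get(c, 0), len(cell.text))
            c += cell.colspan
            n_cols = max(n_cols, c)
    mins = [min_w.get(i, 0) for i in range(n_cols)]
    maxs = [max_w.get(i, 0) for i in range(n_cols)]
    return n_cols, mins, maxs


def fit(mins, maxs, available):
    """Reuse the cached min/max values to size columns for a window width."""
    widths = list(mins)
    spare = available - sum(widths)
    for i, limit in enumerate(maxs):
        grow = max(0, min(limit - widths[i], spare))
        widths[i] += grow
        spare -= grow
    return widths


# The cells of Table 2, row by row, with merged cells defined only once.
table2 = [
    [Cell("", rowspan=2), Cell("average", colspan=2),
     Cell("other category", rowspan=2)],
    [Cell("height"), Cell("weight")],
    [Cell("males"), Cell("1.9"), Cell("0.003"), Cell("yyy")],
    [Cell("females"), Cell("1.7"), Cell("0.002"), Cell("xxx")],
]

n_cols, mins, maxs = measure(table2)
print(n_cols, fit(mins, maxs, 30))
```

Only fit needs to run again when the window is resized, since measure
has already reduced the table to one pair of numbers per column.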
10 Fill-out Forms and Input fields
Forms are composed by placing input fields within paragraphs,
preformatted/literal text, lists and tables. This gives considerable
scope in designing the layout of forms. Each field is defined by an
INPUT element and must have a NAME attribute which uniquely names the
field in the document. Additional optional attributes can be used to
specify the type of the field (defaults to free text), its
size/precision, its initial value and whether the field is currently
disabled or in error:
This fictitious example is a questionnaire that will be emailed to
www_admin@info.cern.ch. The FORM element is used to delimit the form.
There can be several forms in a single document, but the FORM element
can't be nested. The ACTION attribute specifies a URL that designates
an HTTP server or an email address. If missing, the URL for the
document itself will be assumed. The effect of the action can be
modified by including a method prefix, e.g. ACTION="POST http://....".
This prefix is used to select the HTTP method when sending the form's
contents to an HTTP server. Would it be cleaner to use a separate
attribute, e.g. METHOD?
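As an illustration of the method prefix, a browser might separate the
prefix from the URL along the following lines. This Python sketch is an
assumption rather than part of the proposal: the function name
split_action is hypothetical, URLs are assumed to contain no white
space, the example.org addresses are placeholders, and the method is
reported as None when no prefix is given, since this draft does not
state which method applies in that case.

```python
def split_action(action, document_url):
    """Split an ACTION value into (method, url).

    The method is None when the value carries no method prefix.
    """
    action = (action or "").strip()
    if not action:
        # ACTION missing: the URL for the document itself is assumed.
        return None, document_url
    parts = action.split(None, 1)       # split on the first run of white space
    if len(parts) == 2:
        return parts[0].upper(), parts[1].strip()
    return None, parts[0]


print(split_action("POST http://example.org/survey", "http://example.org/doc"))
```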
Servers can disable forms by sending an appropriate header or by an
attribute on the optional HTMLPLUS element at the very start of the
document, e.g.
elements have been used to lay out the text and
input fields. The browser has changed the background color within the
FORM element to distinguish the form from other parts of the document.
The browser is responsible for handling the input focus, i.e. which
field will currently get keyboard input.
For many platforms there will be existing conventions for forms, e.g.
tab and shift-tab keys to move the keyboard focus forwards and
backwards between fields, while an Enter key submits the form. In the
example, the Submit and Reset buttons are specified explicitly with
special purpose fields. The Submit button is used to email the form or
send its contents to the server as specified by the ACTION attribute,
while the Reset button resets the fields to their initial values. When
the form consists of a single text field, it may be appropriate to
leave such buttons out and rely on the Enter key.
The INPUT element has the following attributes:
NAME        Symbolic name used when transferring the form's
            contents. This attribute is always needed and
            should uniquely identify this field.
TYPE        Defines the type of data the field accepts. Defaults
            to free text.
SIZE        Specifies the size or precision of the field according
            to its type.
MAXLENGTH   The maximum number of characters that will be accepted
            as input. This can be greater than that specified by
            SIZE, in which case the field will scroll
            appropriately. The default is unlimited.
VALUE       The initial value for the field, or the value when
            checked for checkboxes and radio buttons. This
            attribute is required for radio buttons.
CHECKED     When present indicates that a checkbox or radio
            button is selected.
DISABLED    When present indicates that this field is temporarily
            disabled. Browsers should show this by "greying it
            out" in some manner.
ERROR       When present indicates that the field's initial value
            is in error in some way, e.g. because it is
            inconsistent with the values of other fields. Servers
            should include an explanatory error message with the
            form's text.
SRC         A URL or URN specifying an image - for use only with
            TYPE=IMAGEMAP.
ALIGN       Vertical alignment of the image - for use only with
            TYPE=IMAGEMAP.
The following types of fields can be defined with the TYPE attribute
(upper or lower case):
TEXT        Single line text entry fields. Use the SIZE attribute
            to specify the visible width in characters, e.g.
            SIZE="24" for a 24 character field. The MAX attribute
            can be used to specify an upper limit to the number of
            characters that can be entered into a text field, e.g.
            MAX=72. Use the TEXTAREA element for text fields which
            can accept multiple lines (see below).
INT         For entering integer numbers. The maximum number of
            digits can be specified with the SIZE attribute
            (excluding the sign character), e.g. size=3 for a
            three digit number.
FLOAT       For fields which can accept floating point numbers.
DATE        Fields which can accept a recognized date format.
URL         For fields which expect document references as URLs or
            URNs.
CHECKBOX    Used for simple Boolean attributes, or for attributes
            which can take multiple values at the same time. The
            latter is represented by a number of checkbox fields
            each of which has the same NAME.
RADIO       For attributes which can take a single value from a set
            of alternatives. Each radio button field in the group
            should be given the same NAME.
RANGE       This allows you to specify an integer range with the
            MIN and MAX attributes, e.g. MIN=1 MAX=100. Users can
            select any value in this range.
IMAGE       This allows you to specify an image field upon which
            you can click with a pointing device. The SRC and
            ALIGN attributes are exactly the same as for the IMG
            and IMAGE elements. The symbolic names for the x and
            y coordinates of the click event are specified with
            name.x and name.y for the name given with the NAME
            attribute. The VALUE attribute is ignored.
SCRIBBLE    A field upon which you can write with a pen or mouse.
            The size of the field in millimeters is given as
            SIZE=width,height. The units are absolute as they
            relate to the dimensions of the human hand, rather
            than pixels of varying resolution. The scribble may
            involve time and pressure data in addition to the basic
            ink data. You can use scribble for signatures or
            sketches. The field can be initialised by setting the
            SRC attribute to a URL which contains the ink. The
            VALUE attribute is ignored.
AUDIO       This provides a way of entering spoken messages into
            a form. Browsers might show an icon which when clicked
            pops up a set of tape controls that you can use to
            record and replay messages. The initial message can be
            set by specifying a URL with the SRC attribute. The
            VALUE attribute is ignored.
SUBMIT      This is a button that when pressed submits the form. It
            is provided as a kind of field to offer authors some
            control over the location of this button. You can use
            an image as a submit button by specifying a URL with
            the SRC attribute.
RESET       This is a button that when pressed resets the form's
            fields to their initial values as specified by the
            VALUE attribute. You can use an image as a reset
            button by specifying a URL with the SRC attribute.
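The constraints attached to the TEXT, INT and RANGE types lend
themselves to simple checks before a form is submitted. The following
Python sketch is illustrative only: the function is_acceptable and its
parameter names are hypothetical, the character limit for TEXT is
passed as maxlength whichever attribute supplies it, and an unsigned or
signed run of decimal digits is assumed to be the only accepted integer
syntax.

```python
import re

INTEGER = re.compile(r"[+-]?([0-9]+)")


def is_acceptable(value, field_type="TEXT", size=None, maxlength=None,
                  lo=None, hi=None):
    """Check a field value against the limits described for its TYPE."""
    kind = field_type.upper()           # TYPE may be given in upper or lower case
    if kind == "TEXT":
        # Upper limit on the number of characters; unlimited by default.
        return maxlength is None or len(value) <= maxlength
    if kind == "INT":
        match = INTEGER.fullmatch(value)
        if match is None:
            return False
        # SIZE limits the number of digits, excluding the sign character.
        return size is None or len(match.group(1)) <= size
    if kind == "RANGE":
        if INTEGER.fullmatch(value) is None:
            return False
        number = int(value)
        # lo and hi stand for the MIN and MAX attributes.
        return (lo is None or lo <= number) and (hi is None or number <= hi)
    raise ValueError("type not covered by this sketch: " + field_type)


print(is_acceptable("-123", "INT", size=3))
print(is_acceptable("101", "range", lo=1, hi=100))
```

The other field types are left out because this section does not pin
down their accepted syntax.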
When you need to let users enter more than one line of text, you should
use the TEXTAREA element, e.g.
The text up to the end tag is used to initialize the field's value.
This end tag is always required even if the field is initially blank.
The ROWS and COLS attributes determine the visible dimension of the
field in characters. Browsers are recommended to allow text to grow
beyond these limits by scrolling as needed. In the initial design for
forms, multi-line text fields were supported by the INPUT element with
TYPE=TEXT. Unfortunately, this causes problems for fields with long
text values as SGML limits the length of attribute literals. The HTML+
DTD allows for up to 1024 characters (the SGML default is only 240
characters!).
The RADIO and CHECKBOX fields can be used to specify multiple choice
forms in which every alternative is visible as part of the form. An
alternative is to use the SELECT element which is generally rendered in
a more compact fashion as a pull down combo list. Every alternative is
represented by the OPTION element, e.g.
The SEVERAL attribute is needed when users are allowed to make several
selections, e.g.