HTML Tables                                                    03 Oct 95

INTERNET DRAFT                                         Dave Raggett, W3C
Expires in six months                                             email:

HTML Tables

Status of this Memo

This document is an Internet draft. Internet drafts are working documents of the Internet Engineering Task Force (IETF), its areas and its working groups. Note that other groups may also distribute working information as Internet drafts.

Internet Drafts are draft documents valid for a maximum of six months and can be updated, replaced or obsoleted by other documents at any time. It is inappropriate to use Internet drafts as reference material or to cite them other than as "work in progress".

To learn the current status of any Internet draft please check the "1id-abstracts.txt" listing contained in the Internet drafts shadow directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East coast) or ftp.isi.edu (US West coast). Further information about the IETF can be found at URL: http://www.cnri.reston.va.us/

Distribution of this document is unlimited. Please send comments to the HTML working group (HTML-WG) of the Internet Engineering Task Force (IETF) at . Discussions of this group are archived at URL: http://www.acl.lanl.gov/HTML-WG/archives.html.

Abstract

The HyperText Markup Language (HTML) is a simple markup language used to create hypertext documents that are portable from one platform to another. HTML documents are SGML documents with generic semantics that are appropriate for representing information from a wide range of applications. This specification extends HTML to support a wide variety of tables. The model is designed to work well with associated style sheets, but does not require them. It also supports rendering to braille, or speech, and exchange of tabular data with databases and spreadsheets.

The HTML table model embodies certain aspects of the CALS table model, e.g. the ability to group table rows into THEAD, TBODY and TFOOT sections, plus the ability to specify cell alignment compactly for sets of cells according to the context.

------------------------------------------------------------------------------
Contents

  * Recent Changes
  * Brief Introduction
  * Design Rationale
  * Walkthrough of the Table DTD
  * Recommended Layout Algorithms
  * The Table DTD
  * References

Recent Changes

This specification extends HTML to support tables. The table model has grown out of early work on HTML+ and the initial draft of HTML3. The earlier model has been extended in response to requests from information providers for improved control over the presentation of tabular information:

  * alignment on designated characters such as "." and ":" e.g. aligning a column of numbers on the decimal point
  * more flexibility in specifying table frames and rules
  * incremental display for large tables as data is received
  * the ability to support scrollable tables with fixed headers plus better support for breaking tables across pages for printing
  * optional column based defaults for alignment properties

In addition, a major goal has been to provide backwards compatibility with the widely deployed Netscape implementation of tables. A subsidiary goal has been to simplify importing tables conforming to the SGML CALS model.

------------------------------------------------------------------------------
A Brief Introduction to HTML Tables

Tables start with an optional caption followed by one or more rows.
Each row is formed by one or more cells, which are differentiated into header and data cells. Cells can be merged across rows and columns, and include attributes assisting rendering to speech and braille, or for exporting table data into databases.

The model provides limited support for control over appearance, for example horizontal and vertical alignment of cell contents, border styles and cell margins. You can further affect this by grouping rows and columns together. Tables can contain a wide range of content, such as headers, lists, paragraphs, forms, figures, preformatted text and even nested tables.

Example
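
    <TABLE BORDER>
      <CAPTION>A test table with merged cells</CAPTION>
      <TR><TH ROWSPAN=2><TH COLSPAN=2>Average
          <TH ROWSPAN=2>other<BR>category<TH>Misc
      <TR><TH>height<TH>weight
      <TR><TH>males<TD>1.9<TD>0.003
      <TR><TH>females<TD>1.7<TD>0.002
    </TABLE>
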
On a dumb terminal, this would be rendered something like:

              A test table with merged cells
    /--------------------------------------------------\
    |          |      Average      |  other   |  Misc  |
    |          |-------------------| category |--------|
    |          |  height |  weight |          |        |
    |-----------------------------------------|--------|
    | males    |   1.9   |  0.003  |          |        |
    |-----------------------------------------|--------|
    | females  |   1.7   |  0.002  |          |        |
    \--------------------------------------------------/

Next, a richer example with grouped rows and columns (adapted from "Developing International Software" by Nadine Kano). First here is what the table looks like on paper:

                    CODE-PAGE SUPPORT IN MICROSOFT WINDOWS
==============================================================================
Code-Page | Name                         | ACP OEMCP | Windows Windows Windows
   ID     |                              |           |  NT 3.1 NT 3.51   95
------------------------------------------------------------------------------
  1200    | Unicode (BMP of ISO 10646)   |           |    X       X       *
  1250    | Windows 3.1 Eastern European |  X        |    X       X       X
  1251    | Windows 3.1 Cyrillic         |  X        |    X       X       X
  1252    | Windows 3.1 US (ANSI)        |  X        |    X       X       X
  1253    | Windows 3.1 Greek            |  X        |    X       X       X
  1254    | Windows 3.1 Turkish          |  X        |    X       X       X
  1255    | Hebrew                       |  X        |                    X
  1256    | Arabic                       |  X        |                    X
  1257    | Baltic                       |  X        |                    X
  1361    | Korean (Johab)               |  X        |            **      X
------------------------------------------------------------------------------
  437     | MS-DOS United States         |       X   |    X       X       X
  708     | Arabic (ASMO 708)            |       X   |                    X
  709     | Arabic (ASMO 449+, BCON V4)  |       X   |                    X
  710     | Arabic (Transparent Arabic)  |       X   |                    X
  720     | Arabic (Transparent ASMO)    |       X   |                    X
==============================================================================

The markup for this uses COL elements to group columns and set column alignment. TBODY elements are used to group rows. The FRAME and RULES attributes are used to select which borders to render.
CODE-PAGE SUPPORT IN MICROSOFT WINDOWS
Code-Page
ID
Name ACP OEMCP Windows
NT 3.1
Windows
NT 3.51
Windows
95
1200Unicode (BMP of ISO 10646)XX*
1250Windows 3.1 Eastern EuropeanXXXX
1251Windows 3.1 CyrillicXXXX
1252Windows 3.1 US (ANSI)XXXX
1253Windows 3.1 GreekXXXX
1254Windows 3.1 TurkishXXXX
1255HebrewXX
1256ArabicXX
1257BalticXX
1361Korean (Johab)X**X
437MS-DOS United StatesXXXX
708Arabic (ASMO 708)XX
709Arabic (ASMO 449+, BCON V4)XX
710Arabic (Transparent Arabic)XX
720Arabic (Transparent ASMO)XX
------------------------------------------------------------------------------
Design Rationale

The HTML table model has evolved from studies of existing SGML table models, the treatment of tables in common word processing packages, and looking at a wide range of tabular layout in magazines, books and other paper-based documents. The model was chosen to allow simple tables to be expressed simply with extra complexity only when needed. This makes it practical to create the markup for HTML tables with everyday text editors and reduces the learning curve for getting started. This feature has been very important to the success of HTML to date.

Increasingly people are using filters from other document formats or direct WYSIWYG editors for HTML. It is important that the HTML table model fits well with these routes for authoring HTML. This affects how the representation handles cells which span multiple rows or columns, and how alignment and other presentation properties are associated with groups of cells.

A major consideration for the HTML table model is that the fonts and window sizes etc. in use with browsers are not under the author's control. This makes it risky to rely on column widths specified in terms of absolute units such as picas or pixels. Instead, tables can be dynamically sized to match the current window size and fonts. Authors can provide guidance as to the relative widths of columns, but user agents should ensure that columns are wide enough to avoid clipping cell contents.

For large tables or slow network connections, it is desirable to be able to start displaying the table before all of the data has been received. The default window width for most user agents shows about 80 characters, and the graphics for many HTML pages are designed with these defaults in mind. Authors can provide a hint to user agents to activate incremental display of table contents.
This feature requires the author to specify the number of columns, and includes provision for control of table width and the relative widths of different columns.

For incremental display, the browser needs the number of columns and their widths. The default width of the table is the current window size (width="100%"). This can be altered by including a WIDTH attribute in the TABLE start tag. By default all columns have the same width, but you can specify column widths with one or more COL elements before the table data starts.

The remaining issue is the number of columns. Some people have suggested waiting until the first row of the table has been received, but this could take a long time if the cells have a lot of content. On the whole it makes more sense, when incremental display is desired, to get authors to explicitly specify the number of columns in the TABLE start tag.

Authors still need a way of informing the browser whether to use incremental display or to automatically size the table to match the cell contents. For the two pass auto sizing mode, the number of columns is determined by the first pass, while for the incremental mode, the number of columns needs to be stated up front. So it seems that COLS=--nn-- would be better for this purpose than a LAYOUT attribute such as LAYOUT=FIXED or LAYOUT=AUTO.

It is generally held useful to consider documents from two perspectives: structural idioms such as headers, paragraphs, lists, tables, and figures; and rendering idioms such as margins, leading, font names and sizes. The wisdom of past experience encourages us to separate the structural information of documents from rendering information. Mixing them together ends up causing increased cost of ownership for maintaining documents, and reduced portability between applications and media.
For tables, the alignment of text within table cells, and the borders between cells are, from the purist's point of view, rendering information. In practice, though, it is useful to group these with the structural information, as these features are highly portable from one application to the next. The HTML table model leaves most rendering information to associated style sheets. The model is designed to take advantage of such style sheets but not to require them.

This specification provides a superset of the simpler model presented in earlier work on HTML+. Tables are considered as being formed from an optional caption together with a sequence of rows, which in turn consist of a sequence of table cells. The model further differentiates header and data cells, and allows cells to span multiple rows and columns.

Following the CALS table model, this specification allows table rows to be grouped into head and body and foot sections. This simplifies the representation of rendering information and can be used to repeat table head and foot rows when breaking tables across page boundaries, or to provide fixed headers above a scrollable body panel. In the markup, the foot section is placed before the body sections. This is an optimization shared with CALS for dealing with very long tables. It allows the foot to be rendered without having to wait for the entire table to be processed.

For the visually impaired, HTML offers the hope of setting to rights the damage caused by the adoption of windows based graphical user interfaces. The HTML table model includes attributes for labeling each cell, to support high quality text to speech conversion. The same attributes can also be used to support automated import and export of table data to databases or spreadsheets.

This specification allows authors to define groups of columns along with column based alignment properties.
A simple inheritance mechanism defines the precedence order for determining applicable values for alignment for each cell.

Current desktop publishing packages provide very rich control over the rendering of tables, and it would be impractical to reproduce this in HTML, without making HTML into a bulky rich text format like RTF or MIF. This specification does, however, offer authors the ability to choose from a set of commonly used classes of border styles. The BORDER attribute controls the appearance of the border frame around the table while the RULES attribute determines the choice of rulings within the table. A finer level of control will be possible via style sheets.

------------------------------------------------------------------------------
A walk through the table DTD

The table document type definition provides the formal definition of the allowed syntax for HTML tables. The following is an annotated listing of the DTD. The complete listing appears at the end of this document.

Note that the TABLE element is a block-like element rather than a character-level element. As such it is a peer of other HTML block-like elements such as paragraphs, lists and headers.

Common Attributes

The following attributes occur in several of the elements and are defined here for brevity. In general, all attribute names and values in this specification are case insensitive, except where noted otherwise.

ID
    Used to define a document-wide identifier. This can be used for naming positions within documents as the destination of a hypertext link. It may also be used by style sheets for rendering an element in a unique style. An ID attribute value is an SGML NAME token. NAME tokens are formed by an initial letter followed by letters, digits, "-" and "." characters. The letters are restricted to A-Z and a-z.

CLASS
    A space separated list of SGML NAME tokens. CLASS names specify that the element belongs to the corresponding named classes.
    These may be used by style sheets to provide class dependent renderings.

LANG
    A LANG attribute identifies the natural language used by the content of the associated element. The syntax and registry of language values are defined by RFC 1766. In summary the language is given as a primary tag followed by zero or more subtags, separated by "-". White space is not allowed and all tags are case insensitive. The name space of tags is administered by IANA. The two letter primary tag is an ISO 639 language abbreviation, while the initial subtag is a two letter ISO 3166 country code. Example values for LANG include: en, en-US, en-uk, i-cherokee, x-pig-latin.

DIR
    This specifies an override to the base direction for laying out text as determined from the language context (see LANG attribute). The attribute value is either DIR=LTR for left to right or DIR=RTL for right to left rendering of text. The value is case insensitive. The attribute sets the base direction for the text for this element and is overridden by directives within the content, e.g. by LANG and DIR attributes on nested elements.

------------------------------------------------------------------------------
Horizontal and Vertical Alignment Attributes

The alignment of cell contents can be specified on a cell by cell basis, or inherited from enclosing elements, such as the row, column or the table element itself.

ALIGN
    This specifies the horizontal alignment of cell contents. The attribute value should be one of LEFT, CENTER, RIGHT, JUSTIFY and CHAR. User agents may treat JUSTIFY as left alignment if they lack support for text justification. ALIGN=CHAR is used for aligning cell contents on a particular character.

    For cells spanning multiple rows or columns, where the alignment property is inherited from the row or column, the initial row and column for the cell determine the appropriate alignment property to use.
    Note that an alignment attribute on elements within the cell, e.g. on a P element, overrides the normal alignment value for the cell.

CHAR
    This is used to specify an alignment character for use with align=char, e.g. char=":". The default character is the decimal point for the current language, as set by the LANG attribute. The CHAR attribute value is case sensitive.

CHAROFF
    An integer value that defines the offset to the alignment character as a percentage of the cell width, e.g. charoff=70. The default value is charoff=50, i.e. midway through the cell.

    Only the first occurrence of the alignment character is significant for alignment purposes. If a line doesn't include the alignment character, it should be horizontally shifted to end at the alignment position. Note that this applies whether the text is displayed left to right, or right to left. If several cells in different rows for the same column use character alignment, then all such cells should line up, regardless of which character is used for alignment.

VALIGN
    Defines whether the cell contents are aligned with the top, middle or bottom of the cell. If present, the value of the attribute should be one of: TOP, MIDDLE, BOTTOM or BASELINE. All cells in the same row with valign=baseline should be vertically positioned so that the first text line in each such cell occurs on a common baseline. This constraint does not apply to subsequent text lines in these cells.

Inheritance Order

Alignment properties can be included with most of the table elements: COL, THEAD, TBODY, TFOOT, TR, TH and TD. When rendering cells, horizontal alignment is determined by columns in preference to rows, while for vertical alignment, the rows are more important than the columns.
The following table gives the detailed precedence order for each attribute:

    ALIGN     (TH|TD) < COL < TR < (THEAD|TBODY|TFOOT) < default
    VALIGN    (TH|TD) < TR < (THEAD|TBODY|TFOOT) < COL < default
    LANG      (TH|TD) < TR < (THEAD|TBODY|TFOOT) < COL < TABLE < default

Properties defined on cells take precedence over inherited properties, but are in turn overridden by alignment properties on elements within cells. In the absence of an ALIGN attribute along the inheritance path, the recommended default alignment for table cell contents is ALIGN=LEFT for table data and ALIGN=CENTER for table headers. The recommended default for vertical alignment is VALIGN=MIDDLE.

------------------------------------------------------------------------------
Standard Units for Widths

Several attributes specify widths as a number followed by an optional suffix. The units for widths are specified by the suffix: pt denotes points, pi denotes picas, in denotes inches, cm denotes centimeters, mm denotes millimeters, em denotes em units (equal to the height of the default font), and px denotes screen pixels. The default units are screen pixels. The number is an integer value or a real valued number such as "2.5". Exponents, as in "1.2e2", are not allowed. White space is not allowed between the number and the suffix.

------------------------------------------------------------------------------
The TABLE element

The TABLE element requires both start and end tags. Table elements start with an optional CAPTION element, optionally followed by one or more COL elements, then an optional THEAD, an optional TFOOT, and finally one or more TBODY elements.

------------------------------------------------------------------------------
ID, LANG and CLASS
    See earlier description of common attributes.

FLOAT
    Defines whether the table is part of the main text flow or whether it floats to the left (FLOAT=LEFT) or right (FLOAT=RIGHT), with the main text flow continuing around it. The default is deliberately unspecified.

    --The attribute is named FLOAT to distinguish it from the ALIGN attribute, which is used for setting the default horizontal alignment of cell contents.--

WIDTH
    Specifies the desired width of the table. In addition to the standard units, the "%" sign may be used to indicate that the width specifies the percentage width of the space between the current left and right margins, e.g. width="50%". It is recommended that the table width be increased beyond the value indicated by the WIDTH attribute as needed to avoid clipping of cell contents. In the absence of this attribute, the table width can be determined by the layout algorithm given later on.

COLS
    Specifies the number of columns for the table. If present the user agent may render the table dynamically as data is received from the network without waiting for the complete table to be received. If the WIDTH attribute is missing, a default of "100%" may be assumed for this purpose. If the COLS attribute is absent, a prepass through the table's contents is needed to determine the number of columns together with suitable values for the widths of each column.

BORDER
    Specifies the width of the border framing the table, see standard units.

FRAME
    Specifies which sides of the frame to render.

    NONE (the default, but see below)
        Don't render any parts of the frame.
    TOP
        The top part of the frame
    BOTTOM
        The bottom part of the frame
    TOPBOT
        The top and bottom parts of the frame
    SIDES
        The left and right sides of the frame
    ALL
        All four parts of the frame
    BORDER
        All four parts of the frame

    These values are compatible with the CALS table model with the exception of "border" which has been added for backwards compatibility with deployed browsers.
    If a document includes <TABLE BORDER> the user agent will see FRAME=BORDER and BORDER=--implied--. If the document includes <TABLE BORDER=n> then the user agent should treat this as FRAME=BORDER except if --n=0-- for which FRAME=NONE is appropriate.

RULES
    Specifies where to place rules within the table.

    NONE (the default if BORDER is absent or BORDER=0)
        The table should be rendered without any internal rulings.
    BASIC
        The THEAD, TFOOT and TBODY elements divide the table into groups of rows. This choice places a horizontal rule between each such group.
    ROWS
        Place horizontal rules between all rows. User agents may choose to use a heavier rule between groups of rows for emphasis.
    COLS
        Place vertical rules between groups of columns as defined by COL elements, plus horizontal rules between row groups (see rules=basic).
    ALL (default if BORDER=--n-- and --n-- is non-zero)
        Place rules between all rows and all columns. User agents may choose to use a heavier rule between groups of rows and columns for emphasis.

CELLSPACING
    Specifies the space between individual cells in a table. See standard units.

CELLPADDING
    Specifies the amount of space between the border of the cell and its contents. See standard units.

------------------------------------------------------------------------------
Table Captions

The optional CAPTION element is used to provide a caption for the table. Both start and end tags are required.

ID, LANG and CLASS
    See earlier description of common attributes.

ALIGN
    This may be used to control the placement of captions relative to the table. When present, the ALIGN attribute should have one of the values: TOP, BOTTOM, LEFT and RIGHT. It is recommended that the caption is made to fit within the width or height of the table as appropriate. The default position of the caption is deliberately unspecified.
    --The ALIGN attribute is overused in HTML, but is retained here for compatibility with currently deployed browsers.--

------------------------------------------------------------------------------
The COL Element

This optional element is used to specify column based defaults for table properties. It is an empty element, and as such has no content, and shouldn't be given an end tag. Several COL elements may be given in succession.

ID, LANG and CLASS
    See earlier description of common attributes.

SPAN
    A positive integer value that specifies how many columns this element applies to, defaulting to one. In the absence of SPAN attributes the first COL element applies to the first column, the second COL element to the second column and so on. If the second COL element had span=2, it would apply to the second and third column. The next COL element would then apply to the fourth column and so on. SPAN=0 has a special significance and implies that the COL element spans all columns from the current column up to and including the last column.

WIDTH
    Specifies the width of the columns, see standard units. In addition, the "*" suffix denotes relative widths, e.g.

        width=64        width in screen pixels
        width=0.5*      a relative width of 0.5

    Relative widths act as constraints on the relative widths of different columns. When widths are given in absolute units, the user agent can use these to constrain the width of the table.

    --Percentage widths are inappropriate as you would need to check that the numbers add up, and they would all have to be changed if a column was inserted or removed. The "*" suffix is used to simplify importing tables from the CALS representation.--

ALIGN, CHAR, CHAROFF and VALIGN
    Specify values for horizontal and vertical alignment within table cells. See inheritance order of alignment properties.
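
The SPAN and WIDTH rules for COL can be sketched in a few lines of Python. This is an illustration only, not part of the specification. The names Col, expand_cols and parse_col_width are hypothetical. Two choices are assumptions the text does not settle: COL elements that run past the known column count are clamped (one possible way to recover from inconsistent markup), and a width with no digit before the decimal point, such as ".5", is rejected.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Number, then an optional unit suffix with no white space in between.
# "*" marks a relative width; exponents such as "1.2e2" do not match.
_WIDTH_RE = re.compile(r"([0-9]+(?:\.[0-9]+)?)(pt|pi|in|cm|mm|em|px|\*)?",
                       re.IGNORECASE)


def parse_col_width(text: str) -> Tuple[float, str]:
    """Return (number, unit). The unit is 'px' when no suffix is given."""
    match = _WIDTH_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid COL width: {text!r}")
    number = float(match.group(1))
    unit = (match.group(2) or "px").lower()
    return number, unit


@dataclass
class Col:
    """Hypothetical stand-in for one COL element."""
    span: int = 1                 # SPAN defaults to one column
    width: Optional[str] = None
    align: Optional[str] = None


def expand_cols(cols: List[Col], total_columns: int) -> List[Optional[Col]]:
    """Map each 0-based column index to the COL element that applies to it.

    Columns not covered by any COL element map to None.
    """
    columns: List[Optional[Col]] = [None] * total_columns
    current = 0
    for col in cols:
        if current >= total_columns:
            break  # assumption: ignore COL elements beyond the last column
        if col.span == 0:
            end = total_columns  # SPAN=0 runs up to and including the last column
        else:
            end = min(current + col.span, total_columns)
        for index in range(current, end):
            columns[index] = col
        current = end
    return columns
```

For instance, three COL elements with spans of 1, 2 and 0 in a six column table cover columns 1, 2 to 3, and 4 to 6 respectively.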

------------------------------------------------------------------------------
Table Head, Foot and Body Elements

Tables may be divided up into head and body sections. The THEAD and TFOOT elements are optional, but one or more TBODY elements are always required. If the table only consists of a TBODY section, the TBODY start and end tags may be omitted, as the parser can infer them. If a THEAD element is present, the THEAD start tag is required, but the end tag can be omitted, provided a TFOOT or TBODY start tag follows. The same applies to TFOOT.

--This definition provides compatibility with tables created for the older model, as well as allowing the end tags for THEAD, TFOOT and TBODY to be omitted.--

The THEAD, TFOOT and TBODY elements provide a convenient means for controlling rendering. If the table has a large number of rows in the body, user agents may choose to use a scrolling region for the table body sections. When rendering to a paged device, tables will often have to be broken across page boundaries. The THEAD, TFOOT and TBODY elements allow the user agent to repeat the table foot at the bottom of the current page, and then the table head at the top of the new page before continuing on with the table body.

TFOOT is placed before the TBODY in the markup sequence, so that browsers can render the foot before receiving all of the table data. This is useful when very long tables are rendered with scrolling body sections, or for paged output, involving breaking the table over many pages.

Each THEAD, TFOOT and TBODY element must contain one or more TR elements.

ID, LANG and CLASS
    See earlier description of common attributes.

ALIGN, CHAR, CHAROFF and VALIGN
    Specify values for horizontal and vertical alignment within table cells. See inheritance order of alignment properties.
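
The repetition of head and foot rows on paged output can be sketched as follows. This is a simplified Python illustration under stated assumptions, not a prescribed algorithm: the function name paginate_rows is hypothetical, every row is assumed to have the same height, and a page is assumed to hold a fixed number of rows.

```python
from typing import List, Sequence


def paginate_rows(head: Sequence[str],
                  foot: Sequence[str],
                  body_groups: Sequence[Sequence[str]],
                  rows_per_page: int) -> List[List[str]]:
    """Break a table across pages, repeating head and foot rows on each page.

    head, foot and each entry of body_groups are the rows of the THEAD,
    TFOOT and TBODY elements. Rows are treated as equally tall (assumption).
    """
    # Flatten the TBODY groups; group boundaries matter for rules, not paging.
    body = [row for group in body_groups for row in group]
    room = rows_per_page - len(head) - len(foot)
    if room < 1:
        raise ValueError("page too short for head, foot and one body row")
    pages = []
    for start in range(0, len(body), room):
        # The foot closes the current page; the head reopens the next one.
        pages.append(list(head) + body[start:start + room] + list(foot))
    return pages
```

Because TFOOT precedes TBODY in the markup, the foot rows are already known when the first page is filled, so a page can be emitted as soon as enough body rows have arrived.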

------------------------------------------------------------------------------
Table Row (TR) elements

The TR or table row element acts as a container for a row of table cells. The end tag may be omitted.

ID, LANG and CLASS
    See earlier description of common attributes.

ALIGN, CHAR, CHAROFF and VALIGN
    Specify values for horizontal and vertical alignment within table cells. See inheritance order of alignment properties.

------------------------------------------------------------------------------
Table Cells: TH and TD

TH elements are used to represent header cells, while TD elements are used to represent data cells. This allows user agents to render header and data cells distinctly, even in the absence of style sheets.

Cells can span multiple rows and columns, and may be empty. Cells spanning rows contribute to the column count on each of the spanned rows, but only appear in the markup once (in the first row spanned). The row count is determined by the number of TR elements. Any rows implied by cells spanning rows beyond this should be ignored.

If the column count for the table is greater than the number of cells for a given row (after including cells for spanned rows), the missing cells are treated as occurring on the right hand side of the table and rendered as empty cells. If the language context indicates a right to left writing order, then the missing cells should be placed on the left hand side.

It is possible to create tables with overlapping cells, for instance:

    <TABLE BORDER>
      <TR><TD ROWSPAN=2>1<TD>2<TD>3
      <TR><TD ROWSPAN=2>4
      <TR><TD COLSPAN=2>5<TD>6
    </TABLE>

which might look something like:

    /-----------\
    | 1 | 2 | 3 |
    |   |-------|
    |   | 4 |   |
    |---|...|---|
    | 5 :   | 6 |
    \-----------/

In this example, the cells labelled 4 and 5 overlap. In such cases, the rendering is implementation dependent.

The AXIS and AXES attributes for cells provide a means for defining concise labels for cells. When rendering to speech, these attributes may be used to provide abbreviated names for the headers relevant to each cell. Another application is when you want to be able to later process table contents to enter them into a database. These attributes are then used to give database field names. The table's class attribute should be used to let the software recognize which tables can be treated in this way.

ID, LANG and CLASS
    See earlier description of common attributes.

AXIS
    This defines an abbreviated name for a header cell, e.g. which can be used when rendering to speech. It defaults to the cell's content.

AXES
    This is a comma separated list of axis names which together identify the row and column headers that pertain to this cell. It is used for example when rendering to speech to identify the cell's position in the table. If missing the user agent can try to follow up columns and left along rows (right for some languages) to find the corresponding header cells.

NOWRAP, e.g.
    The presence of this attribute disables automatic wrapping of text lines for this cell. If used incautiously, it may result in excessively wide cells.

ROWSPAN, e.g.
    A positive integer value that defines how many rows this cell spans. The default ROWSPAN is 1. ROWSPAN=0 has a special significance and implies that the cell spans all rows from the current row up to the last row of the table.

COLSPAN, e.g.
    A positive integer value that defines how many columns this cell spans. The default COLSPAN is 1.
    COLSPAN=0 has a special significance and implies that the cell spans all columns from the current column up to the last column of the table.

ALIGN, CHAR, CHAROFF and VALIGN
    Specify values for horizontal and vertical alignment within table cells. See inheritance order of alignment properties.

------------------------------------------------------------------------------
Recommended Layout Algorithms

If the COLS attribute on the TABLE element specifies the number of columns, then the table may be rendered using a fixed layout, otherwise the autolayout algorithm described below should be used.

Fixed Layout Algorithm

For this algorithm, it is assumed that the number of columns is known. The column widths by default should be set to the same size. Authors may override this by specifying relative or absolute column widths, using the COL element. The default table width is the space between the current left and right margins, but may be overridden by the WIDTH attribute on the TABLE element, or determined from absolute column widths.

The table syntax alone is insufficient to guarantee the consistency of attribute values. For instance, the number of columns specified by the COLS attribute may be inconsistent with the number of columns implied by the COL elements. This in turn, may be inconsistent with the number of columns implied by the table cells. A further problem occurs when the columns are too narrow to avoid clipping cell contents. The width of the table as specified by the TABLE element or COL elements may result in clipping of cell contents. It is recommended that user agents attempt to recover gracefully from these situations.

Autolayout Algorithm

If the COLS attribute is missing from the table start tag, then the user agent should use the following autolayout algorithm. It uses two passes through the table data and scales linearly with the size of the table.
In the first pass, line wrapping is disabled, and the user agent
keeps track of the minimum and maximum width of each cell. The
maximum width is given by the widest line. As line wrap has been
disabled, paragraphs are treated as long lines unless broken by BR
elements. The minimum width is given by the widest word, image or
similar item, taking into account leading indents, list bullets and
the like. In other words, if you were to format the cell's content
in a window of its own, the minimum width is the narrowest you could
make that window before things begin to be clipped.

To cope with character alignment of cell contents, the algorithm
keeps three running min/max totals for each column: left of the
alignment character, right of the alignment character, and
unaligned. The minimum width for a column is then:

    max(min_left + min_right, min_non-aligned)

The minimum and maximum cell widths are then used to determine the
corresponding minimum and maximum widths for the columns. These, in
turn, are used to find the minimum and maximum width for the table.
Note that cells can contain nested tables, but this doesn't
complicate the code significantly. The next step is to assign column
widths according to the current window size (more accurately, the
width between the left and right margins).

For cells which span multiple columns, a simple approach, as used by
Arena, is to evenly apportion the min/max widths to each of the
constituent columns. A slightly more complex approach is to use the
min/max widths of unspanned cells to weight how spanned widths are
apportioned. Experimental study suggests a blend of the two
approaches will give good results for a wide range of tables.

The table borders and intercell margins need to be included in
assigning column widths. There are three cases:

1.  The minimum table width is equal to or wider than the available
    space. In this case, assign the minimum widths and allow the
    user to scroll horizontally. For conversion to braille, it will
    be necessary to replace the cells by references to notes
    containing their full content. By convention these appear before
    the table.

2.  The maximum table width fits within the available space. In this
    case, set the columns to their maximum widths.

3.  The maximum width of the table is greater than the available
    space, but the minimum table width is smaller. In this case, let
    W be the difference between the available space and the minimum
    table width, and let D be the difference between the maximum and
    minimum width of the table. For each column, let d be the
    difference between the maximum and minimum width of that column.
    Now set the column's width to the minimum width plus d times W
    over D. This makes columns with lots of text wider than columns
    with smaller amounts.

This assignment step is then repeated recursively for all nested
tables. In this case, the width of the enclosing table's cell plays
the role of the current window size in the above description.

If the table width is specified with the WIDTH attribute, the user
agent attempts to set column widths to match. The WIDTH attribute is
not binding if this results in columns having less than their
minimum widths.

If relative widths are specified with the COL element, the algorithm
is modified to increase column widths over the minimum width to meet
the relative width constraints. The COL elements should be taken as
hints only, so columns shouldn't be set to less than their minimum
width. Similarly, columns shouldn't be made so wide that the table
stretches well beyond the extent of the window. If a COL element
specifies a relative width of zero, the column should always be set
to its minimum width.

------------------------------------------------------------------------------

HTML Table DTD

The DTD, or document type definition, provides the formal definition
of the allowed syntax for HTML tables.
------------------------------------------------------------------------------

References

Arena
    W3C's HTML3 browser, see "http://www.w3.org/pub/WWW/Arena/".
    Arena was originally created as a proof of concept demo for
    ideas in the HTML+ specification that preceded HTML3. The
    browser is now being re-implemented to provide a reference
    implementation of HTML3 along with support for style sheets and
    client-side scripting.

CALS
    Continuous Acquisition and Life-Cycle Support (formerly
    Computer-aided Acquisition and Logistics Support) (CALS) is a
    Department of Defense (DoD) strategy for achieving effective
    creation, exchange, and use of digital data for weapon systems
    and equipment. More information can be found from the US Navy
    CALS home page at http://navysgml.dt.navy.mil/cals.html

HTML 3.0
    HyperText Markup Language Specification Version 3.0. This is the
    initial draft specification as published in March 1995. Work on
    refining HTML3 is proceeding piecemeal with the new table
    specification as one of the pieces. For W3C related work on
    HTML, see "http://www.w3.org/pub/WWW/MarkUp/".

RFC 1766
    "Tags for the Identification of Languages", by H. Alvestrand,
    UNINETT, March 1995. This document can be downloaded from
    "ftp://ds.internic.net/rfc/rfc1766.txt".