Selectors Serialization Format for Testing (SSFT) Specification

Abstract

This document defines SSFT, a subset of the Selectors language that is suitable for describing the results of Selectors parsing tests.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents might supersede this document.

This document is a working draft, produced as part of the Whatpm subproject of the manakai project. It might be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

Comments on this document are welcome and may be sent to the author.

Translations of this document might be available. The English version of this document is the only normative version.

Introduction

This section is non‐normative.

Selectors [SELECTORS] are patterns that match against elements in a tree structure and are used in Web technologies, including CSS [CSS], HTML5 [HTML5], XBL2 [XBL2], and Selectors API [SAPI].

Selectors is such a flexible language that a conceptually identical pattern can be represented by infinitely many different selectors. For purposes such as validating test results, it might be convenient to define a subset of the language in which no two selectors are semantically equivalent.

The CSSOM specification [CSSOM] attempts to define an algorithm that serializes a group of selectors for the selectorText attribute. However, that algorithm is incomplete at the time of writing. In addition, current Web browser implementations of the selectorText attribute consistently fail to escape unsafe identifiers, so it is unwise to use that format for testing.

Terminology

The key words MUST, MUST NOT, SHOULD, and MAY in this document are to be interpreted as described in RFC 2119 [KEYWORDS].

Requirements phrased in the imperative as part of algorithms are to be interpreted with the meaning of the key word (MUST) used in introducing the algorithm.

Conformance requirements phrased as algorithms or specific steps MAY be implemented in any manner, so long as the end result is equivalent.

All examples and notes in this specification are non‐normative, as are all sections explicitly marked non‐normative. Everything else in this specification is normative.

For the purpose of this specification, pseudo‐elements are also referred to as simple selectors and are considered part of the last sequence of simple selectors in a selector.

Serializing Selectors in SSFT Representation

The subsections of this section define algorithms to serialize a simple selector and to serialize a group of selectors.

The serialized representation never contains comments.

TODO: link terms to Selectors

Serializing Tokens

To normalize a string (n) as an IDENT, the following algorithm MUST be used:

If n is either an IDENT or a HASH, replace each escape sequence (\ and the characters following it) in n with the character it represents.
If n is a HASH, remove the leading U+0023 NUMBER SIGN (#) character from n.
If there is any character in n that is not one of U+002D HYPHEN-MINUS (-), U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), U+0041 LATIN CAPITAL LETTER A (A) to U+005A LATIN CAPITAL LETTER Z (Z), U+005F LOW LINE (_), U+0061 LATIN SMALL LETTER A (a) to U+007A LATIN SMALL LETTER Z (z), U+0080 to U+D7FF, or U+E000 to U+10FFFF, then replace each such character with the string \HHHHHH, where HHHHHH is the uppercase, zero‐padded, six‐digit hexadecimal representation of the character's code point. If the code point is outside the range [U+0000, U+10FFFF], HHHHHH MUST be FFFFFF.
If the first character in n is one of U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), then prepend the string \00003 to n. The prepended string and the digit together form a six‐digit escape of that digit; for example, 1 becomes \000031.
If n consists of a single U+002D HYPHEN-MINUS (-) character, or if the first character in n is U+002D HYPHEN-MINUS (-) and the second character is not one of U+0041 LATIN CAPITAL LETTER A (A) to U+005A LATIN CAPITAL LETTER Z (Z), U+005F LOW LINE (_), U+0061 LATIN SMALL LETTER A (a) to U+007A LATIN SMALL LETTER Z (z), U+0080 to U+D7FF, or U+E000 to U+10FFFF, then replace that leading U+002D HYPHEN-MINUS (-) character with the string \00002D.
Then, n is the result of the algorithm.
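The steps above can be sketched in Python as follows. This is a minimal, non‐normative sketch that assumes steps 1 and 2 (unescaping, and removing the leading # of a HASH) have already been performed by a tokenizer, so it starts from the decoded value; the function names are hypothetical. Because Python strings cannot hold code points above U+10FFFF, the FFFFFF case of step 3 cannot arise here.

```python
def _is_name_start(ch: str) -> bool:
    """A-Z, U+005F LOW LINE, a-z, U+0080-U+D7FF, or U+E000-U+10FFFF."""
    cp = ord(ch)
    return ("A" <= ch <= "Z" or ch == "_" or "a" <= ch <= "z"
            or 0x80 <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF)


def _is_name_char(ch: str) -> bool:
    """Name-start characters plus U+002D HYPHEN-MINUS and 0-9."""
    return ch == "-" or "0" <= ch <= "9" or _is_name_start(ch)


def normalize_ident(value: str) -> str:
    """Apply steps 3-5 of IDENT normalization to an already decoded value."""
    # Step 3: replace every other character with \HHHHHH (six uppercase hex digits).
    n = "".join(c if _is_name_char(c) else "\\%06X" % ord(c) for c in value)
    # Step 4: a leading digit D becomes \00003D, a six-digit escape of D.
    if n and "0" <= n[0] <= "9":
        n = "\\00003" + n
    # Step 5: escape a leading "-" that stands alone or is not followed by a
    # name-start character (after step 3, the next character may be a "\").
    if n.startswith("-") and (len(n) == 1 or not _is_name_start(n[1])):
        n = "\\00002D" + n[1:]
    return n
```

For example, the class name of .\31 23 decodes to 123, which this sketch turns into \00003123, matching the Examples section. A namespace URI such as http://a.b/ becomes http\00003A\00002F\00002Fa\00002Eb\00002F.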

To normalize a string (s) as a STRING, the following algorithm MUST be used:

If s is either an IDENT or a STRING, replace each escape sequence (\ and the characters following it) in s with the character it represents, or with an empty string if the \ is immediately followed by newline character(s).
If s is a STRING, remove the quotation marks enclosing s.
If there is any character in s that is not one of U+0020 SPACE, U+0021 EXCLAMATION MARK (!), U+0023 NUMBER SIGN (#) to U+005B LEFT SQUARE BRACKET ([), U+005D RIGHT SQUARE BRACKET (]) to U+D7FF, or U+E000 to U+10FFFF, then replace each such character with the string \HHHHHH, where HHHHHH is the uppercase, zero‐padded, six‐digit hexadecimal representation of the character's code point. If the code point is outside the range [U+0000, U+10FFFF], HHHHHH MUST be FFFFFF.
Insert a U+0022 QUOTATION MARK (") at the beginning of s.
Insert a U+0022 QUOTATION MARK (") at the end of s.
Then, s is the result of the algorithm.
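A corresponding Python sketch of steps 3 to 5 is shown below. As with any such sketch, it is non‐normative: it assumes the value has already been unescaped and stripped of its enclosing quotation marks (steps 1 and 2), and the function names are hypothetical. The safe set excludes control characters below U+0020, U+0022 QUOTATION MARK, U+005C REVERSE SOLIDUS, and surrogates, so the output can always be re‐read as a double‐quoted STRING.

```python
def _is_string_safe(ch: str) -> bool:
    """U+0020, U+0021, U+0023-U+005B, U+005D-U+D7FF, or U+E000-U+10FFFF."""
    cp = ord(ch)
    return (cp in (0x20, 0x21) or 0x23 <= cp <= 0x5B
            or 0x5D <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF)


def normalize_string(value: str) -> str:
    """Apply steps 3-5 of STRING normalization to an already decoded value."""
    # Step 3: escape unsafe characters as \HHHHHH (six uppercase hex digits).
    body = "".join(c if _is_string_safe(c) else "\\%06X" % ord(c) for c in value)
    # Steps 4 and 5: enclose the result in U+0022 QUOTATION MARKs.
    return '"' + body + '"'
```

For example, the attribute value e (whether written as an IDENT or as 'e') becomes "e", and a value containing a quotation mark has it written as \000022.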

A newline SHOULD be represented as a U+000A LINE FEED character. However, depending on the context in which the SSFT representation is used, a U+000D CARRIAGE RETURN character followed by a U+000A LINE FEED character (i.e. a CRLF sequence), or a lone U+000D CARRIAGE RETURN character, MAY be used as a newline instead. Newline representations SHOULD be consistent within an SSFT representation.

Therefore, newlines need to be normalized before SSFT representations are compared literally.
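A minimal Python sketch of such a comparison follows; the function names are hypothetical. Mapping CR to LF cannot alter the selector itself, because both the IDENT and the STRING normalization algorithms escape U+000A and U+000D, so raw CR and LF characters in an SSFT representation can only be newlines.

```python
def normalize_newlines(text: str) -> str:
    """Map CRLF and lone CR newlines to U+000A LINE FEED."""
    # Replace CRLF first so that it becomes one LF rather than two.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def ssft_equal(left: str, right: str) -> bool:
    """Compare two SSFT representations literally, ignoring newline style."""
    return normalize_newlines(left) == normalize_newlines(right)
```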

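A future version of this specification needs to define how names are converted to lowercase (perhaps by referencing some version of a CSS module). For now, converting to lowercase means tr/A-Z/a-z/, i.e. replacing each character in the range U+0041 LATIN CAPITAL LETTER A (A) to U+005A LATIN CAPITAL LETTER Z (Z) with the corresponding character in the range U+0061 LATIN SMALL LETTER A (a) to U+007A LATIN SMALL LETTER Z (z).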
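The interim rule can be sketched in Python as below; the function name is hypothetical. str.lower() is deliberately not used, because it also lowercases non‐ASCII letters, which tr/A-Z/a-z/ leaves unchanged.

```python
import string

# tr/A-Z/a-z/ as a translation table: only the 26 ASCII capitals change.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lowercase(name: str) -> str:
    """Lowercase name the way this version of SSFT does (tr/A-Z/a-z/)."""
    # Unlike str.lower(), non-ASCII capitals such as U+00C4 are left as-is.
    return name.translate(_ASCII_LOWER)
```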

To normalize an integer (i), the following algorithm MUST be used:

If i is equal to zero (0), return a string consisting of a single U+0030 DIGIT ZERO (0) character and stop the algorithm.
Let result be an empty string.
If i is less than zero (0), append a U+002D HYPHEN-MINUS (-) character to result.
Append the decimal representation of the absolute value of i to result. The decimal (base‐ten) representation MUST NOT begin with a U+0030 DIGIT ZERO (0) character.
Return result.
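A direct transcription of these steps into Python is shown below as a non‐normative sketch (the function name is hypothetical). For Python's int type the result coincides with str(i), which never produces leading zeros or "-0"; the explicit steps are kept only to mirror the algorithm.

```python
def normalize_integer(i: int) -> str:
    # Zero is always "0", never "-0" or "00".
    if i == 0:
        return "0"
    # Start with an empty string; negative values get a leading "-".
    result = "-" if i < 0 else ""
    # Base-ten digits of |i|; str() of a nonzero int has no leading zeros.
    result += str(abs(i))
    return result
```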

Serializing Simple Selectors

No SSFT representation is defined for an invalid simple selector.

To serialize a simple selector (ss), the following algorithm MUST be used:

Let r be an empty string.
Process ss as follows:
If ss is a type or universal selector
1. Append a namespace prefix as follows:
  
  If a namespace URI is specified (via a namespace prefix, or by omitting the namespace prefix where a default namespace is specified)
  
  Append the namespace URI, normalized as an IDENT, to r. Then, append a U+007C VERTICAL BAR (|) character to r.
  
  If the null namespace is specified (by a zero‐length namespace prefix)
  
  Append a U+007C VERTICAL BAR (|) character to r.
  
  If no namespace is specified (by the namespace prefix *, or by omitting the namespace prefix where no default namespace is specified)
  
  Append a U+002A ASTERISK (*) character, followed by a U+007C VERTICAL BAR (|) character, to r.
2. If ss is a type selector, append the local name, normalized as an IDENT, to r.
3. If ss is a universal selector, append a U+002A ASTERISK (*) character to r.
If ss is an attribute selector
1. Append a U+005B LEFT SQUARE BRACKET ([) character to r.
2. Append a namespace prefix as follows:
  
  If a namespace URI is specified
  
  Append the namespace URI, normalized as an IDENT, to r. Then, append a U+007C VERTICAL BAR (|) character to r.
  
  If the null namespace is specified
  
  Append a U+007C VERTICAL BAR (|) character to r.
  
  If no namespace is specified
  
  Append a U+002A ASTERISK (*) character, followed by a U+007C VERTICAL BAR (|) character, to r.
3. Append the local name, normalized as an IDENT, to r.
4. If a value is specified in ss:
  1. Depending on the matching operator, append =, ~=, |=, ^=, $=, or *= to r.
  2. Append the value, normalized as a STRING, to r.
5. Append a U+005D RIGHT SQUARE BRACKET (]) to r.
If ss is a class selector

Append a U+002E FULL STOP (.) character, followed by the class name normalized as an IDENT, to r.

If ss is an ID selector

Append a U+0023 NUMBER SIGN (#) character, followed by the ID normalized as an IDENT, to r.

If ss is a pseudo‐class selector
1. Append a U+003A COLON (:) character to r.
2. Append the normalized name (in lowercase) of the pseudo‐class to r.
3. Append any arguments as follows:
  If the name of the pseudo‐class is lang
  
  Append a U+0028 LEFT PARENTHESIS (() character to r.
  
  Append the language tag (the argument), normalized as an IDENT, to r. Issue: should the letter case of the language tag be normalized?
  
  Append a U+0029 RIGHT PARENTHESIS ()) character to r.
  
  If the name of the pseudo‐class is -manakai-contains
  
  Append a U+0028 LEFT PARENTHESIS (() character to r.
  
  Append the argument, normalized as a STRING, to r.
  
  Append a U+0029 RIGHT PARENTHESIS ()) character to r.
  
  If the name of the pseudo‐class is either nth-child, nth-last-child, nth-of-type, or nth-last-of-type
  
  Append a U+0028 LEFT PARENTHESIS (() character to r.
  
  Let a be the a part of the argument. If the argument is the keyword odd or even, then let a be 2. Otherwise, if both the a part and the following n character are omitted, then let a be 0. Otherwise, if only the a part is omitted, then let a be 1. Otherwise, if the a part is -, then let a be -1.
  
  Let a be the normalized string representation of integer a.
  
  Append a to r.
  
  Append a U+006E LATIN SMALL LETTER N (n) character to r.
  
  Let b be the b part of the argument. If the argument is the keyword odd, then let b be 1. If the argument is the keyword even, then let b be 0. If the b part is omitted, then let b be 0.
  
  Let b be the normalized string representation of integer b.
  
  If b does not start with a U+002D HYPHEN-MINUS (-) character, then append U+002B PLUS SIGN (+) to r.
  
  Append b to r.
  
  Append a U+0029 RIGHT PARENTHESIS ()) character to r.
  
  If the name of the pseudo‐class is not (the negation pseudo‐class)
  
  Append a U+0028 LEFT PARENTHESIS (() character, followed by a newline, to r.
  
  Append four U+0020 SPACE characters to r.
  
  Append four U+0020 SPACE characters to r again.
  
  Serialize the simple selector specified as the argument to the not pseudo‐class, and append the result to r.
  
  Append a newline to r.
  
  Append four U+0020 SPACE characters to r.
  
  Append a U+0029 RIGHT PARENTHESIS ()) character to r.
  
  Otherwise
  
  The pseudo‐class is not supported by this version of SSFT and therefore no SSFT representation is defined for ss. Stop the algorithm.
If ss is a pseudo‐element
1. Append two U+003A COLON characters (::) to r.
2. Append the normalized name (in lowercase) of the pseudo‐element to r.
3. If the pseudo‐element has any arguments, then the pseudo‐element is not supported by this version of SSFT and therefore no SSFT representation is defined for ss. Stop the algorithm.
Then, r is the SSFT representation of ss.
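The branch for nth-child, nth-last-child, nth-of-type, and nth-last-of-type above can be sketched in Python as follows. This sketch is non‐normative and makes some assumptions: it takes the argument as text with comments already removed, it recognizes the keywords odd and even case‐insensitively, and it treats an a part of + (as in +n) like an omitted a part, a case the algorithm does not spell out. The function name and regular expressions are hypothetical. The returned string includes the enclosing parentheses.

```python
import re

# Hypothetical input: the argument text of :nth-child() and friends, with
# comments already stripped, e.g. "odd", "-n+3", "2n", "+5".
_B_ONLY = re.compile(r"[+-]?\d+")
_AN_PLUS_B = re.compile(r"([+-]?)(\d*)[nN](?:\s*([+-])\s*(\d+))?")


def serialize_nth_argument(argument: str) -> str:
    arg = argument.strip()
    keyword = arg.lower()
    if keyword == "odd":
        a, b = 2, 1
    elif keyword == "even":
        a, b = 2, 0
    elif _B_ONLY.fullmatch(arg):
        # Both the a part and the following n are omitted.
        a, b = 0, int(arg)
    else:
        m = _AN_PLUS_B.fullmatch(arg)
        if m is None:
            raise ValueError(f"not an an+b argument: {argument!r}")
        sign, digits, b_sign, b_digits = m.groups()
        if digits:
            a = int(sign + digits)
        elif sign == "-":
            a = -1  # "-n"
        else:
            a = 1   # "n"; treating "+n" the same way is an assumption
        b = int(b_sign + b_digits) if b_digits is not None else 0
    # str() matches the integer normalization: no leading zeros, "-" only
    # for negative values, and zero is "0".
    a_text = str(a)
    b_text = str(b)
    if not b_text.startswith("-"):
        b_text = "+" + b_text
    return "(" + a_text + "n" + b_text + ")"
```

With this sketch, odd and 2n+1 both serialize as (2n+1), 5 becomes (0n+5), and -n+3 becomes (-1n+3), so arguments with the same a and b always produce the same text.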

Serializing Groups of Selectors

No SSFT representation is defined for an invalid group of selectors.

To serialize a group of selectors (gs), the following algorithm MUST be used:

Let result be an empty string.
For each selector (selector) in gs:
1. If selector is not the first selector in gs, then append a U+002C COMMA (,) character followed by a newline to result.
2. Append four U+0020 SPACE characters to result.
3. Process each sequence of simple selectors and each combinator in selector, in order, as follows:
  If it is a sequence of simple selectors (sss)
  1. If sss contains a universal or type selector, then serialize that simple selector and append the result to result.
  2. Otherwise, i.e. if sss does not contain a universal or type selector, then serialize a simple selector * and append the result to result.
  3. Serialize any attribute selectors in sss, sort the results, then append them to result in order.
  4. Serialize any class selectors in sss, sort the results, then append them to result in order.
  5. Serialize any ID selectors in sss, sort the results, then append them to result in order.
  6. Serialize any pseudo‐class selectors in sss, sort the results, then append them to result in order.
  7. Serialize any pseudo‐elements in sss, sort the results, then append them to result in order.
  8. Append a newline to result.
  If it is a combinator (c)
  1. Append two U+0020 SPACE characters to result.
  2. Process c as follows:
    
    If c is a descendant combinator (white space)
    
    Append a U+0020 SPACE character to result.
    
    If c is a child combinator (>)
    
    Append a U+003E GREATER-THAN SIGN (>) character to result.
    
    If c is an adjacent sibling combinator (+)
    
    Append a U+002B PLUS SIGN (+) character to result.
    
    If c is a general sibling combinator (~)
    
    Append a U+007E TILDE (~) character to result.
  3. Append a U+0020 SPACE character to result.
Then, result is the SSFT representation of gs.
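The following Python sketch outlines the group serialization above. It is non‐normative and assumes a hypothetical in‐memory representation: each sequence of simple selectors is a Sequence object whose members have already been serialized with the simple selector algorithm and grouped by kind, and combinators are given as the single characters " ", ">", "+", and "~". Because this specification does not state how the serialized simple selectors are sorted, the sketch assumes Python's default code point order; it also assumes that no default namespace is declared, so an implied * serializes as *|*.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Sequence:
    """Hypothetical: already-serialized simple selectors, grouped by kind."""
    element: Optional[str] = None  # serialized type or universal selector
    attributes: List[str] = field(default_factory=list)
    classes: List[str] = field(default_factory=list)
    ids: List[str] = field(default_factory=list)
    pseudo_classes: List[str] = field(default_factory=list)
    pseudo_elements: List[str] = field(default_factory=list)


# A selector alternates Sequence objects and combinator characters.
Selector = List[Union[Sequence, str]]
_COMBINATORS = {" ", ">", "+", "~"}


def serialize_group(group: List[Selector], implied_universal: str = "*|*") -> str:
    result = ""
    for index, selector in enumerate(group):
        if index > 0:
            result += ",\n"  # step 1: comma and newline between selectors
        result += "    "     # step 2: four spaces
        for part in selector:
            if isinstance(part, Sequence):
                # Type/universal selector first, or the implied "*".
                result += part.element if part.element is not None else implied_universal
                for items in (part.attributes, part.classes, part.ids,
                              part.pseudo_classes, part.pseudo_elements):
                    result += "".join(sorted(items))
                result += "\n"
            else:
                if part not in _COMBINATORS:
                    raise ValueError(f"unknown combinator: {part!r}")
                # Two spaces, the combinator character, one space.
                result += "  " + part + " "
    return result
```

Every serialized selector, including the last one, ends with a newline, and a descendant combinator yields four spaces, so the next sequence lines up under the previous one.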

Parsing SSFT Representations of Selectors

Since the serializing algorithm for groups of selectors is designed so that it always outputs a valid group of selectors, no special parser for SSFT representations is necessary. Any conforming parser for groups of selectors MAY be used to parse SSFT representations of groups of selectors.

Likewise, the serializing algorithm for simple selectors always outputs a valid simple selector, and therefore any conforming parser for simple selectors MAY be used to parse SSFT representations of simple selectors.

Examples

This section is non‐normative.

The SSFT representation of a selector p + q is:

    *|p
  + *|q

The SSFT representation of a selector a#id.class1:n\ot(:Active)/* comment */.class2 is:

    *|a.class1.class2#id:not(
        :active
    )

The SSFT representation of a group of selectors a, #b, C[d^=e], .\31 23 is:

    *|a
,
    *|*#b
,
    *|C[|d^="e"]
,
    *|*.\00003123

References

Normative References

KEYWORDS: Key words for use in RFCs to Indicate Requirement Levels, IETF BCP 14, RFC 2119, March 1997. This version of the specification is referenced.
SELECTORS: Selectors, W3C Working Draft, 15 December 2005. Work in progress. The latest version of the specification is referenced.

Non‐normative References

CSS: Cascading Style Sheets Level 2 Revision 1 (CSS 2.1) Specification, W3C Candidate Recommendation, 19 July 2007. Work in progress. The latest version of the specification is available at <http://www.w3.org/TR/CSS21>.
CSSOM: Cascading Style Sheets Object Model (CSSOM), W3C Editor's Draft, 18 June 2007. Work in progress. The latest Editor's Draft of the specification is available at <http://dev.w3.org/csswg/cssom/Overview.html>.
HTML5: HTML 5, WHATWG Working Draft, 3 November 2007. Work in progress.
SAPI: Selectors API, W3C Editor's Draft, 29 August 2007. Work in progress. The latest Editor's Draft of the specification is available at <http://dev.w3.org/2006/webapi/selectors-api/Overview.html>. The latest published version of the specification is available at <http://www.w3.org/TR/selectors-api/>.
XBL2: XBL 2.0, Mozilla.org, 15 March 2007. Work in progress. The latest W3C‐published version of the specification is available at <http://www.w3.org/TR/xbl/>.