Selectors Serialization Format for Testing (SSFT) Specification

Working Draft

This Version
<http://suika.fam.cx/www/markup/selectors/ssft/ssft>
Latest Version
<http://suika.fam.cx/www/markup/selectors/ssft/ssft>
Version History
<http://suika.fam.cx/gate/cvs/markup/selectors/ssft/ssft.en.html>
Author
<>

Abstract

...

Status of This Document

This section describes the status of this document at the time of its publication. Other documents might supersede this document.

This document is a working draft, produced as part of the Whatpm subproject of the manakai project. It might be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

Comments on this document are welcome and may be sent to the author.

Translations of this document might be available. The English version of the document is the only normative version.

Introduction

This section is non‐normative.

Selectors [SELECTORS] are patterns that match against elements in a tree structure and are used in Web technologies, including CSS [CSS], HTML5 [HTML5], XBL2 [XBL2], and Selectors API [SAPI].

...

The CSSOM specification [CSSOM] attempts to define an algorithm to serialize a group of selectors for the purpose of the selectorText attribute. However, that algorithm is incomplete. Note that current Web browser implementations of the selectorText attribute are consistently broken in that they do not escape unsafe identifiers.

Terminology

The key words MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY in this document are to be interpreted as described in RFC 2119 [KEYWORDS].

Requirements phrased in the imperative as part of algorithms are to be interpreted with the meaning of the key word (MUST) used in introducing the algorithm.

Conformance requirements phrased as algorithms or specific steps MAY be implemented in any manner, so long as the end result is equivalent.

All examples and notes in this specification are non‐normative, as are all sections explicitly marked non‐normative. Everything else in this specification is normative.

For the purpose of this specification, pseudo‐elements are also referred to as simple selectors and are considered part of the last sequence of simple selectors in a selector.

Serializing Selectors in SSFT Representation

Subsections of this section define algorithms to serialize a simple selector or serialize a group of selectors.

See also the reference implementation for now.

The serialized representation will not contain any comment.

Serializing Tokens

To normalize an IDENT (n), the following algorithm MUST be used:

  1. Replace the escape sequences (\ and following characters) in n by characters represented by them respectively.
  2. If there is any character that is not one of U+002D HYPHEN-MINUS (-), U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), U+0041 LATIN CAPITAL LETTER A (A) to U+005A LATIN CAPITAL LETTER Z (Z), U+005F LOW LINE (_), U+0061 LATIN SMALL LETTER A (a) to U+007A LATIN SMALL LETTER Z (z), U+0080 to U+D7FF, or U+E000 to U+10FFFF in n, then replace it by the string \HHHHHH, where HHHHHH is the upper‐case zero‐padded hexadecimal representation of the character. If the character is outside the range of [U+0000, U+10FFFF], HHHHHH MUST be FFFFFF.
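  3. If the first character in n is one of U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), then prepend the string \00003 to n, so that the prefix and the digit together form the six‐digit escape sequence for that digit (for example, a leading 1 becomes \000031).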
  4. If the first character in n is U+002D HYPHEN-MINUS (-) and the second character, if any, is not one of U+0041 LATIN CAPITAL LETTER A (A) to U+005A LATIN CAPITAL LETTER Z (Z), U+005F LOW LINE (_), U+0061 LATIN SMALL LETTER A (a) to U+007A LATIN SMALL LETTER Z (z), U+0080 to U+D7FF, or U+E000 to U+10FFFF, or n is equal to U+002D HYPHEN-MINUS (-), then replace the U+002D HYPHEN-MINUS (-) character by the string \00002D.
  5. Then, n is the result of the algorithm.
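The following non‐normative sketch shows one way the IDENT normalization steps above might be implemented. The function and helper names are hypothetical, the escape‐sequence parsing is a simplified assumption based on the usual CSS escape syntax, and the lower‐case letter range is taken to be a to z (U+0061 to U+007A).

```python
import re

# One token of an IDENT: a hexadecimal escape (with optional trailing
# whitespace), an escaped literal character, or an ordinary character.
# Simplified assumption about CSS escape syntax, not taken from the draft.
_TOKEN = re.compile(
    r"\\([0-9A-Fa-f]{1,6})(?:\r\n|[ \t\r\n\f])?"
    r"|\\([^\r\n\f])"
    r"|(.)",
    re.DOTALL,
)


def _is_name_start(cp):
    """True for A-Z, '_', a-z, U+0080..U+D7FF and U+E000..U+10FFFF."""
    return (0x41 <= cp <= 0x5A or cp == 0x5F or 0x61 <= cp <= 0x7A
            or 0x80 <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF)


def _is_name_char(cp):
    """True for '-', 0-9 and every name-start character."""
    return cp == 0x2D or 0x30 <= cp <= 0x39 or _is_name_start(cp)


def normalize_ident(ident):
    # Step 1: resolve escapes into code points. Integers are used because
    # an escape such as \110000 has no corresponding Python character.
    code_points = []
    for hex_digits, literal, plain in _TOKEN.findall(ident):
        if hex_digits:
            code_points.append(int(hex_digits, 16))
        else:
            code_points.append(ord(literal or plain))

    # Step 2: escape every character outside the safe set as \HHHHHH.
    pieces = []
    for cp in code_points:
        if _is_name_char(cp):
            pieces.append(chr(cp))
        elif cp > 0x10FFFF:
            pieces.append("\\FFFFFF")
        else:
            pieces.append("\\%06X" % cp)
    n = "".join(pieces)

    # Step 3: a leading digit d becomes \00003d.
    if n and n[0] in "0123456789":
        n = "\\00003" + n

    # Step 4: a leading hyphen not followed by a name-start character
    # (or standing alone) becomes \00002D.
    if n.startswith("-") and (len(n) == 1 or not _is_name_start(ord(n[1]))):
        n = "\\00002D" + n[1:]

    # Step 5: n is the result.
    return n
```

For instance, under these assumptions both `1st` and `\31 st` normalize to `\000031st`, and a lone `-` normalizes to `\00002D`.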

To normalize a STRING (s), the following algorithm MUST be used:

  1. Replace the escape sequences (\ and following characters) in s by characters represented by them (or an empty string if \ is immediately followed by newline character(s)) respectively.

  2. Remove the quotation characters in s if s is a STRING.

    The input s to this algorithm may be an IDENT.

  3. If there is any character that is not one of U+0020 SPACE, U+0021 EXCLAMATION MARK (!), U+0023 NUMBER SIGN (#) to U+005B LEFT SQUARE BRACKET, U+005D RIGHT SQUARE BRACKET to U+D7FF, or U+E000 to U+10FFFF in s, then replace it by string \HHHHHH, where HHHHHH is the upper‐case zero‐padded hexadecimal representation of the character. If the character is outside the range of [U+0000, U+10FFFF], HHHHHH MUST be FFFFFF.

  4. Insert a U+0022 QUOTATION MARK (") at the beginning of s.

  5. Insert a U+0022 QUOTATION MARK (") at the end of s.

  6. Then, s is the result of the algorithm.
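The following non‐normative sketch shows one way the STRING normalization steps above might be implemented. The function name is hypothetical and the escape‐sequence parsing is a simplified assumption based on the usual CSS escape syntax. The sketch strips the quotation characters before resolving escapes, which is assumed to give the same end result as the order of the steps above.

```python
import re

# One token of a STRING body (simplified assumption about CSS syntax).
_TOKEN = re.compile(
    r"\\(\r\n|[\r\n\f])"                           # escaped newline: dropped
    r"|\\([0-9A-Fa-f]{1,6})(?:\r\n|[ \t\r\n\f])?"  # hexadecimal escape
    r"|\\(.)"                                      # escaped literal character
    r"|(.)",                                       # ordinary character
    re.DOTALL,
)


def _is_safe(cp):
    """True for characters that step 3 leaves unescaped."""
    return (cp in (0x20, 0x21) or 0x23 <= cp <= 0x5B
            or 0x5D <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF)


def normalize_string(s):
    # Step 2: remove the quotation characters if the input is a STRING;
    # an IDENT input has none and is left untouched.
    if len(s) >= 2 and s[0] in "\"'" and s[-1] == s[0]:
        s = s[1:-1]

    pieces = []
    for match in _TOKEN.finditer(s):
        newline, hex_digits, literal, plain = match.groups()
        # Step 1: resolve escapes; an escaped newline becomes nothing.
        if newline is not None:
            continue
        if hex_digits is not None:
            cp = int(hex_digits, 16)
        else:
            cp = ord(literal if literal is not None else plain)
        # Step 3: escape every character outside the safe set as \HHHHHH.
        if _is_safe(cp):
            pieces.append(chr(cp))
        elif cp > 0x10FFFF:
            pieces.append("\\FFFFFF")
        else:
            pieces.append("\\%06X" % cp)

    # Steps 4 to 6: wrap the result in U+0022 QUOTATION MARK characters.
    return '"' + "".join(pieces) + '"'
```

Under these assumptions, a single‐quoted input containing a U+0022 QUOTATION MARK comes out double‐quoted with that character replaced by `\000022`.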

A newline SHOULD be represented as a U+000A LINE FEED character. However, depending on the context where the SSFT representation is in use, a sequence of U+000D CARRIAGE RETURN followed by a U+000A LINE FEED (i.e. a CRLF sequence) or a character U+000D CARRIAGE RETURN MAY be used as a newline instead. Newline representations SHOULD be consistent in an SSFT representation.

Therefore, newlines must be normalized before SSFT representations are literally compared.
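As a non‐normative illustration, a test harness might normalize newlines before comparing two SSFT representations along the following lines. The function names are hypothetical, and converting every newline form to U+000A LINE FEED is only one possible normalization.

```python
def normalize_newlines(text):
    """Convert CRLF sequences and lone CR characters to LF."""
    # CRLF must be replaced first so that it does not become two LFs.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def ssft_equal(left, right):
    """Compare two SSFT representations, ignoring the newline convention."""
    return normalize_newlines(left) == normalize_newlines(right)
```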

Need to define lower‐case. It is tr/A-Z/a-z/ for now.

Serializing Simple Selectors

No SSFT representation is defined for an invalid simple selector.

To serialize a simple selector (ss), the following algorithm MUST be used:

  1. Let r be an empty string.
  2. Process ss as follows:
    If ss is a type or universal selector
    1. Append a namespace prefix as follows:
      If a namespace URI is specified (via namespace prefix, or by omitting namespace prefix where a default namespace is specified)
      @@
      If the null namespace is specified (by zero‐length namespace prefix)
      Append a U+007C VERTICAL BAR (|) character to r.
      If no namespace is specified (by namespace prefix *, or by omitting namespace prefix where no default namespace is specified)
      Append a U+002A ASTERISK (*) character, followed by a U+007C VERTICAL BAR (|) character, to r.
    2. If ss is a type selector, append the normalized local name to r.
    3. If ss is a universal selector, append a U+002A ASTERISK (*) character to r.
    If ss is an attribute selector
    1. Append a U+005B LEFT SQUARE BRACKET ([) to r.
    2. Append a namespace prefix as follows:
      If a namespace URI is specified
      @@
      @@ Current reference implementation is wrong, since its result of sorting depends on namespace prefix.
      If the null namespace is specified
      Append a U+007C VERTICAL BAR (|) character to r.
      If no namespace is specified
      Append a U+002A ASTERISK (*) character, followed by a U+007C VERTICAL BAR (|) character, to r.
    3. Append the normalized local name to r.
    4. If a value is specified in ss:
      1. Depending on the way to match, append =, ~=, |=, ^=, $=, or *= to r.
      2. Append the normalized value to r.
    5. Append a U+005D RIGHT SQUARE BRACKET (]) to r.
    If ss is a class selector
    Append a U+002E FULL STOP (.) character, followed by normalized class name, to r.
    If ss is an ID selector
    Append a U+0023 NUMBER SIGN (#) character, followed by normalized ID, to r.
    If ss is a pseudo‐class selector
    1. Append a U+003A COLON (:) character to r.
    2. Append the normalized name (in lower‐case) of the pseudo‐class to r.
    3. Append any arguments as follows:
      If the name of the pseudo‐class is lang
      1. Append a U+0028 LEFT PARENTHESIS (() character to r.
      2. Append the normalized language tag (the argument) to r. Should we normalize cases?
      3. Append a U+0029 RIGHT PARENTHESIS ()) character to r.
      If the name of the pseudo‐class is -manakai-contains
      1. Append a U+0028 LEFT PARENTHESIS (() character to r.
      2. Append the normalized string (the argument) to r.
      3. Append a U+0029 RIGHT PARENTHESIS ()) character to r.
      If the name of the pseudo‐class is one of nth-child, nth-last-child, nth-of-type, or nth-last-of-type
      1. Append a U+0028 LEFT PARENTHESIS (() character to r.
      2. Append @@ normalized integer a to r. If the argument is the keyword odd or even, a is 2. If both a and n are omitted, a is assumed to be 0. Otherwise, if a is omitted, it is assumed to be 1 (or -1, if there is a U+002D HYPHEN-MINUS (-) character).
      3. Append a U+006E LATIN SMALL LETTER N (n) character to r.
      4. Append @@ normalized integer b to r. It MUST be preceded by a sign, either a U+002B PLUS SIGN (+) or a U+002D HYPHEN-MINUS (-) character. If the argument is the keyword odd, b is 1. If the argument is the keyword even, b is 0. If b is omitted, it is assumed to be 0. The sign of 0 MUST be U+002B PLUS SIGN (+).
      5. Append a U+0029 RIGHT PARENTHESIS ()) character to r.
      If the name of the pseudo‐class is not
      1. Append the string :not( (that is, U+003A COLON, U+006E LATIN SMALL LETTER N, U+006F LATIN SMALL LETTER O, U+0074 LATIN SMALL LETTER T, and U+0028 LEFT PARENTHESIS), followed by a newline, to r.
      2. Append four U+0020 SPACE characters to r.
      3. Append four U+0020 SPACE characters to r again.
      4. Serialize the simple selector specified as the argument to the not pseudo‐class, and append the result to r.
      5. Append a newline to r.
      6. Append four U+0020 SPACE characters to r.
      7. Append a U+0029 RIGHT PARENTHESIS ()) character to r.
      Otherwise
      The pseudo‐class is not supported by this version of SSFT and therefore no SSFT representation is defined for ss. Stop the algorithm.
    If ss is a pseudo‐element
    1. Append two U+003A COLON characters (::) to r.
    2. Append the normalized name (in lower‐case) of the pseudo‐element to r.
    3. If the pseudo‐element has any arguments: The pseudo‐element is not supported by this version of SSFT and therefore no SSFT representation is defined for ss. Stop the algorithm.
  3. Then, r is the SSFT representation of ss.
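The following non‐normative sketch illustrates how the argument of nth-child and the related pseudo‐classes might be serialized according to the rules above. The function name and the regular‐expression parsing are hypothetical. Because the form of a normalized integer is still marked @@ above, the sketch assumes plain decimal digits, with a U+002D HYPHEN-MINUS (-) only when a is negative.

```python
import re

# Accepts forms such as "2n+1", "-n + 3", "n" and "5" (assumed syntax).
_AN_PLUS_B = re.compile(
    r"(?P<a>[+-]?\d*)n(?:\s*(?P<sign>[+-])\s*(?P<b>\d+))?"
    r"|(?P<only_b>[+-]?\d+)"
)


def serialize_nth_argument(argument):
    """Return the text placed between the parentheses, e.g. '2n+1'."""
    text = argument.strip().lower()
    if text == "odd":
        a, b = 2, 1
    elif text == "even":
        a, b = 2, 0
    else:
        match = _AN_PLUS_B.fullmatch(text)
        if match is None:
            raise ValueError("unsupported argument: %r" % argument)
        if match.group("only_b") is not None:
            # Both a and n are omitted: a is assumed to be 0.
            a, b = 0, int(match.group("only_b"))
        else:
            a_text = match.group("a")
            if a_text in ("", "+"):
                a = 1           # a omitted
            elif a_text == "-":
                a = -1          # a omitted, with a hyphen-minus
            else:
                a = int(a_text)
            if match.group("b") is not None:
                b = int(match.group("sign") + match.group("b"))
            else:
                b = 0           # b omitted
    # b always carries a sign; zero is written as +0.
    return "%dn%+d" % (a, b)
```

Under these assumptions, the arguments `odd`, `5`, and `-n+3` are serialized as `2n+1`, `0n+5`, and `-1n+3` respectively.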

Serializing Groups of Selectors

No SSFT representation is defined for an invalid group of selectors.

To serialize a group of selectors (gs), the following algorithm MUST be used:

  1. Let result be an empty string.
  2. For each selector (selector) in gs:
    1. If selector is not the first selector in the gs, then append a U+002C COMMA (,) followed by a newline to result.
    2. Append four U+0020 SPACE characters to result.
    3. Process each sequence of simple selectors or combinator in selector as follows:
      If it is a sequence of simple selectors (sss)
      1. Serialize each simple selector in sss.
      2. sort
      3. Append a newline to result.
      If it is a combinator (c)
      1. Append two U+0020 SPACE characters to result.
      2. Process c as follows:
        If c is a descendant combinator (white space)
        Append a U+0020 SPACE character to result.
        If c is a child combinator (>)
        Append a U+003E GREATER-THAN SIGN (>) character to result.
        If c is an adjacent sibling combinator (+)
        Append a U+002B PLUS SIGN (+) character to result.
        If c is a general sibling combinator (~)
        Append a U+007E TILDE (~) character to result.
      3. Append a U+0020 SPACE character to result.
  3. Then, result is the SSFT representation of gs.
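The following non‐normative sketch illustrates only the layout produced by the steps above: indentation, combinators, commas, and newlines. The data model is hypothetical. Each selector is a list of ("sequence", text) and ("combinator", character) pairs, where text is assumed to be a sequence of simple selectors that has already been serialized and ordered. The sketch also assumes that this text is appended to result before the newline, and it uses U+000A LINE FEED as the newline by default. The sort step itself is not sketched.

```python
COMBINATORS = {" ", ">", "+", "~"}


def layout_group(group, newline="\n"):
    """Lay out a group of selectors in the SSFT line structure."""
    result = ""
    for index, selector in enumerate(group):
        if index > 0:
            # Selectors after the first are separated by a comma line.
            result += "," + newline
        result += " " * 4
        for kind, text in selector:
            if kind == "sequence":
                # Assumption: the serialized sequence precedes the newline.
                result += text + newline
            elif kind == "combinator":
                if text not in COMBINATORS:
                    raise ValueError("unknown combinator: %r" % text)
                # Two spaces, the combinator, one space: four columns,
                # matching the indentation of the first sequence.
                result += "  " + text + " "
            else:
                raise ValueError("unknown item kind: %r" % kind)
    return result
```

A possible use, with hypothetical input:

```python
group = [
    [("sequence", "*|a"), ("combinator", ">"), ("sequence", "*|b.c")],
    [("sequence", "*|p")],
]
print(layout_group(group), end="")
```

Because a combinator line starts with two spaces, the combinator, and one space, every sequence of simple selectors begins in the fifth column.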

Parsing SSFT Representations of Selectors

Since the serializing algorithm for groups of selectors is designed so that it always outputs a valid group of selectors, no special parser for the SSFT representation is necessary. Any conforming parser for groups of selectors MAY be used to parse SSFT representations of groups of selectors.

Likewise, the serializing algorithm for simple selectors always outputs a valid simple selector and therefore any conforming parser for simple selectors MAY be used to parse SSFT representations of simple selectors.

Examples

This section is non‐normative.

Try the demo, choosing the Selectors radio button.

...
...

References

Normative References

KEYWORDS
Key words for use in RFCs to Indicate Requirement Levels, IETF BCP 14, RFC 2119. This version of the specification is referenced.
SELECTORS
Selectors, W3C Working Draft. Work in progress. The latest version of the specification is referenced.

Non‐normative References

CSS
Cascading Style Sheets Level 2 Revision 1 (CSS 2.1) Specification, W3C Candidate Recommendation. Work in progress. The latest version of the specification is available at <http://www.w3.org/TR/CSS21>.
CSSOM
Cascading Style Sheets Object Model (CSSOM), W3C Editor's Draft. Work in progress. The latest Editor's Draft of the specification is available at <http://dev.w3.org/csswg/cssom/Overview.html>.
HTML5
HTML 5, WHATWG Working Draft. Work in progress.
SAPI
Selectors API, W3C Editor's Draft. Work in progress. The latest Editor's Draft of the specification is available at <http://dev.w3.org/2006/webapi/selectors-api/Overview.html>. The latest published version of the specification is available at <http://www.w3.org/TR/selectors-api/>.
XBL2
XBL 2.0, Mozilla.org. Work in progress. The latest W3C‐published version of the specification is available at <http://www.w3.org/TR/xbl/>.