Selectors Serialization Format for Testing (SSFT) Specification

Abstract

...

Status of This Document

This section describes the status of this document at the time of its publication. Other documents might supersede this document.

This document is a working draft, produced as part of the Whatpm subproject of the manakai project. It might be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

Comments on this document are welcome and may be sent to the author.

Translations of this document might be available. The English version of the document is the only normative version.

Introduction

This section is non‐normative.

Selectors [SELECTORS] are patterns that match against elements in a tree structure and are used in Web technologies, including CSS [CSS], HTML5 [HTML5], XBL2 [XBL2], and Selectors API [SAPI].

...

The CSSOM specification [CSSOM] is trying to define an algorithm to serialize a group of selectors for the purpose of the selectorText attribute. However, the algorithm is incomplete at the time of writing. Note that current Web browser implementations of the selectorText attribute are consistently broken in that they do not escape unsafe identifiers.

Terminology

The key words MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY in this document are to be interpreted as described in RFC 2119 [KEYWORDS].

Requirements phrased in the imperative as part of algorithms are to be interpreted with the meaning of the key word (MUST) used in introducing the algorithm.

Conformance requirements phrased as algorithms or specific steps MAY be implemented in any manner, so long as the end result is equivalent.

All examples and notes in this specification are non‐normative, as are all sections explicitly marked non‐normative. Everything else in this specification is normative.

For the purpose of this specification, pseudo‐elements are also referred to as simple selectors and are considered as part of the last sequence of simple selectors in a selector.

Serializing Selectors as SSFT Representation

See the reference implementation for now.

The serialized representation will not contain any comment.

Serializing Tokens

To normalize an IDENT (n), the following algorithm MUST be used:

Replace the escape sequences (\ and following characters) in n by characters represented by them respectively.
If there is any character that is not one of U+002D HYPHEN-MINUS (-), U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), U+0041 LATIN CAPITAL LETTER A (A) to U+005A LATIN CAPITAL LETTER Z (Z), U+005F LOW LINE (_), U+0061 LATIN SMALL LETTER A (a) to U+007A LATIN SMALL LETTER Z (z), U+0080 to U+D7FF, or U+E000 to U+10FFFF in n, then replace it by string \HHHHHH, where HHHHHH is the upper‐case zero‐padded hexadecimal representation of the character. If the character is outside the range of [U+0000, U+10FFFF], HHHHHH MUST be FFFFFF.
If the first character in n is one of U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), then prepend a string \00003 at the beginning of n.
If the first character in n is U+002D HYPHEN-MINUS (-) and the second character, if any, is not one of U+0041 LATIN CAPITAL LETTER A (A) to U+005A LATIN CAPITAL LETTER Z (Z), U+005F LOW LINE (_), U+0061 LATIN SMALL LETTER A (a) to U+007A LATIN SMALL LETTER Z (z), U+0080 to U+D7FF, or U+E000 to U+10FFFF, or n is equal to U+002D HYPHEN-MINUS (-), then replace the U+002D HYPHEN-MINUS (-) character by string \00002D.
Then, n is the result of the algorithm.
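
The IDENT normalization steps above can be sketched in Python as follows. This is a minimal, non‐normative illustration and not the reference implementation: the function names are hypothetical, and the escape syntax recognized in the first step (a backslash followed by one to six hexadecimal digits and one optional white space character, or a backslash followed by any other non‐newline character) is assumed from the CSS 2.1 tokenizer rather than stated in this document.

```python
import re

# Assumed CSS 2.1 escape syntax: "\" + 1-6 hex digits (+ one optional
# white space character), or "\" + any other non-newline character.
_ESCAPE = re.compile(
    r"\\(?:([0-9A-Fa-f]{1,6})(?:\r\n|[ \t\r\n\f])?|([^\r\n\f]))"
)


def _is_name_start(cp):
    # A-Z, "_", a-z, U+0080..U+D7FF, U+E000..U+10FFFF
    return (0x41 <= cp <= 0x5A or cp == 0x5F or 0x61 <= cp <= 0x7A
            or 0x80 <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF)


def _is_safe_ident_char(cp):
    # "-", 0-9, or any name-start character
    return cp == 0x2D or 0x30 <= cp <= 0x39 or _is_name_start(cp)


def _unescape(text):
    """Return the code points of text with escape sequences resolved."""
    cps, pos = [], 0
    for m in _ESCAPE.finditer(text):
        cps.extend(ord(ch) for ch in text[pos:m.start()])
        if m.group(1) is not None:
            cps.append(int(m.group(1), 16))  # may exceed U+10FFFF
        else:
            cps.append(ord(m.group(2)))
        pos = m.end()
    cps.extend(ord(ch) for ch in text[pos:])
    return cps


def _hex_escape(cp):
    return "\\FFFFFF" if cp > 0x10FFFF else "\\%06X" % cp


def normalize_ident(n):
    # Steps 1 and 2: resolve escapes, then re-escape unsafe characters.
    out = "".join(
        chr(cp) if _is_safe_ident_char(cp) else _hex_escape(cp)
        for cp in _unescape(n)
    )
    # Step 3: a leading digit d becomes \00003d.
    if out[:1] and out[0] in "0123456789":
        out = "\\00003" + out
    # Step 4: a leading "-" not followed by a name-start character.
    elif out.startswith("-"):
        if len(out) == 1 or not _is_name_start(ord(out[1])):
            out = "\\00002D" + out[1:]
    return out
```

For example, normalize_ident("1a") yields \000031a: prepending \00003 to the digit 1 forms the six‐digit escape for U+0031, so the following a cannot be absorbed into the escape when the result is parsed again.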

To normalize a STRING (s), the following algorithm MUST be used:

Replace the escape sequences (\ and following characters) in s by characters represented by them (or an empty string if \ is immediately followed by newline character(s)) respectively.
Remove the quotation characters in s if s is a STRING.

The input s to this algorithm may be an IDENT.
If there is any character that is not one of U+0020 SPACE, U+0021 EXCLAMATION MARK (!), U+0023 NUMBER SIGN (#) to U+005B LEFT SQUARE BRACKET, U+005D RIGHT SQUARE BRACKET to U+D7FF, or U+E000 to U+10FFFF in s, then replace it by string \HHHHHH, where HHHHHH is the upper‐case zero‐padded hexadecimal representation of the character. If the character is outside the range of [U+0000, U+10FFFF], HHHHHH MUST be FFFFFF.
Insert a U+0022 QUOTATION MARK (") at the beginning of s.
Insert a U+0022 QUOTATION MARK (") at the end of s.
Then, s is the result of the algorithm.
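
The STRING normalization steps above can be sketched in the same way. Again this is a non‐normative illustration with hypothetical names; the escape syntax is assumed from the CSS 2.1 tokenizer, and the input is treated as a STRING exactly when it begins with a quotation mark or an apostrophe.

```python
import re

# Assumed CSS 2.1 escape syntax inside strings:
#   group 1: "\" + newline (line continuation, contributes nothing)
#   group 2: "\" + 1-6 hex digits (+ one optional white space character)
#   group 3: "\" + any other character
_STRING_ESCAPE = re.compile(
    r"\\(?:(\r\n|[\r\n\f])"
    r"|([0-9A-Fa-f]{1,6})(?:\r\n|[ \t\r\n\f])?"
    r"|([^\r\n\f]))"
)


def _is_safe_string_char(cp):
    # SPACE, "!", "#".."[", "]"..U+D7FF, U+E000..U+10FFFF
    return (cp in (0x20, 0x21) or 0x23 <= cp <= 0x5B
            or 0x5D <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF)


def _hex_escape(cp):
    return "\\FFFFFF" if cp > 0x10FFFF else "\\%06X" % cp


def normalize_string(s):
    # The input may be an IDENT, in which case there are no quotes to remove.
    is_string = s[:1] in ('"', "'")

    # Resolve escape sequences into a list of code points.
    cps, pos = [], 0
    for m in _STRING_ESCAPE.finditer(s):
        cps.extend(ord(ch) for ch in s[pos:m.start()])
        if m.group(2) is not None:
            cps.append(int(m.group(2), 16))  # may exceed U+10FFFF
        elif m.group(3) is not None:
            cps.append(ord(m.group(3)))
        pos = m.end()
    cps.extend(ord(ch) for ch in s[pos:])

    # Remove the quotation characters of a STRING.
    if is_string:
        cps = cps[1:-1]

    # Re-escape unsafe characters and wrap in U+0022 QUOTATION MARK.
    body = "".join(
        chr(cp) if _is_safe_string_char(cp) else _hex_escape(cp)
        for cp in cps
    )
    return '"' + body + '"'
```

With this sketch, the STRING 'it' and the IDENT en normalize to "it" and "en" respectively, and a U+0022 QUOTATION MARK inside a single‐quoted STRING is emitted as \000022.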

A newline SHOULD be represented as a U+000A LINE FEED character. However, depending on the context where the SSFT representation is in use, a sequence of U+000D CARRIAGE RETURN followed by a U+000A LINE FEED (i.e. a CRLF sequence) or a character U+000D CARRIAGE RETURN MAY be used as a newline instead. Newline representations SHOULD be consistent in an SSFT representation.

Therefore, newlines must be normalized before SSFT representations are literally compared.
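
A test harness might normalize newlines before comparing two SSFT representations along the following lines. This is a non‐normative sketch; the function names are hypothetical.

```python
def normalize_newlines(ssft):
    """Map CRLF sequences and lone CR characters to U+000A LINE FEED."""
    # CRLF must be replaced first so that it does not become two newlines.
    return ssft.replace("\r\n", "\n").replace("\r", "\n")


def ssft_equal(a, b):
    """Compare two SSFT representations, ignoring newline style."""
    return normalize_newlines(a) == normalize_newlines(b)
```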

Serializing Simple Selectors

No SSFT representation is defined for an invalid simple selector.

To serialize a simple selector (ss), the following algorithm MUST be used:

If ss is a type or universal selector

If a namespace URI is specified (via namespace prefix, or by omitting namespace prefix where a default namespace is specified)

@@

If the null namespace is specified (by zero‐length namespace prefix)

Append a U+007C VERTICAL BAR (|) character to result.

If no namespace is specified (by namespace prefix *, or by omitting namespace prefix where no default namespace is specified)

Append a U+002A ASTERISK (*) character, followed by a U+007C VERTICAL BAR (|) character, to result.
If ss is a type selector, append the normalized local name to result.
If ss is a universal selector, append a U+002A ASTERISK (*) character to result.

If ss is an attribute selector

Append a U+005B LEFT SQUARE BRACKET ([) to result.
Append a namespace prefix as follows:

If a namespace URI is specified

@@

@@ Current reference implementation is wrong, since its result of sorting depends on namespace prefix.

If the null namespace is specified

Append a U+007C VERTICAL BAR character to result.

If no namespace is specified

Append a U+002A ASTERISK (*) character, followed by a U+007C VERTICAL BAR (|) character, to result.
Append the normalized local name to result.
If a value is specified in ss:
1. Depending on the way to match, append =, ~=, |=, ^=, $=, or *= to result.
2. Append the normalized value to result.
Append a U+005D RIGHT SQUARE BRACKET (]) to result.

If ss is a class selector

Append a U+002E FULL STOP (.) character, followed by the normalized class name, to result.

If ss is an ID selector

Append a U+0023 NUMBER SIGN (#) character, followed by the normalized ID, to result.

If ss is a pseudo‐class selector

Append a U+003A COLON (:) character to result.
Append the normalized name of the pseudo‐class to result.
Append any arguments as follows:
If the name of the pseudo‐class is lang
1. Append a U+0028 LEFT PARENTHESIS (() character to result.
2. Append the normalized language tag (the argument) to result.
3. Append a U+0029 RIGHT PARENTHESIS ()) character to result.
If the name of the pseudo‐class is -manakai-contains
1. Append a U+0028 LEFT PARENTHESIS (() character to result.
2. Append the normalized string (the argument) to result.
3. Append a U+0029 RIGHT PARENTHESIS ()) character to result.
If the name of the pseudo‐class is either nth-child, nth-last-child, nth-of-type, or nth-last-of-type
1. Append a U+0028 LEFT PARENTHESIS (() character to result.
2. Append @@ normalized integer a to result. If the argument is a keyword odd or even, a is 2. If a and n are omitted, a is assumed to be 0. Otherwise, if a is omitted, it is assumed to be 1 (or -1, if there is a U+002D HYPHEN-MINUS (-) character).
3. Append a U+006E LATIN SMALL LETTER N (n) character to result.
4. Append @@ normalized integer b to result. It MUST be preceded by a sign, either U+002B PLUS SIGN (+) or U+002D HYPHEN-MINUS (-) character. If the argument is a keyword odd, b is 1. If the argument is a keyword even, b is 0. If b is omitted, it is assumed as 0. The sign of 0 MUST be U+002B PLUS SIGN (+).
5. Append a U+0029 RIGHT PARENTHESIS ()) character to result.
If the name of the pseudo‐class is not
1. Append a string of U+003A COLON, U+006E LATIN SMALL LETTER N, U+006F LATIN SMALL LETTER O, U+0074 LATIN SMALL LETTER T, and U+0028 LEFT PARENTHESIS (:not(), followed by a newline, to result.
2. Append four U+0020 SPACE characters to result.
3. Append four U+0020 SPACE characters to result again.
4. Serialize the simple selector specified as the argument to the not pseudo‐class.
5. Append a newline to result.
6. Append four U+0020 SPACE characters to result.
7. Append a U+0029 RIGHT PARENTHESIS ()) character to result.

If ss is a pseudo‐element

Append two U+003A COLON characters (::) to result.
Append the normalized name of the pseudo‐element to result.
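
The argument steps for nth-child, nth-last-child, nth-of-type, and nth-last-of-type above can be illustrated with the following non‐normative sketch. Because this document does not yet define "normalized integer" (see the @@ markers), the sketch assumes a plain decimal number without leading zeros, with a sign on a only when it is negative. The function name and the parsing of the input argument are also assumptions made for illustration.

```python
import re

# Hypothetical parser for the argument text: "an+b", "an", "n", "-n+b", or "b".
_AN_PLUS_B = re.compile(
    r"^(?:([+-]?)([0-9]*)n\s*(?:([+-])\s*([0-9]+))?|([+-]?[0-9]+))$"
)


def serialize_nth_argument(arg):
    text = arg.strip().lower()
    if text == "odd":
        a, b = 2, 1
    elif text == "even":
        a, b = 2, 0
    else:
        m = _AN_PLUS_B.match(text)
        if m is None:
            raise ValueError("not an an+b expression: %r" % arg)
        if m.group(5) is not None:
            # Both a and n are omitted: a is 0.
            a, b = 0, int(m.group(5))
        else:
            # a omitted: 1, or -1 if there is a HYPHEN-MINUS.
            sign = -1 if m.group(1) == "-" else 1
            a = sign * (int(m.group(2)) if m.group(2) else 1)
            # b omitted: 0.
            b = int(m.group(3) + m.group(4)) if m.group(4) else 0
    # "%+d" always emits a sign, and formats zero as "+0".
    return "(%dn%+d)" % (a, b)
```

Under these assumptions, the arguments odd, 5, and -n+3 are serialized as (2n+1), (0n+5), and (-1n+3) respectively.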

Serializing Groups of Selectors

No SSFT representation is defined for an invalid group of selectors.

To serialize a group of selectors (gs), the following algorithm MUST be used:

Let result be an empty string.
For each selector (selector) in gs:
1. If selector is not the first selector in the group of selectors, then append a U+002C COMMA (,) followed by a newline to result.
2. Append four U+0020 SPACE characters to result.
3. Process each sequence of simple selectors or combinator in selector by the following algorithm:
  If it is a sequence of simple selectors (sss)
  1. Serialize each simple selector in sss.
  2. sort
  3. Append a newline to result.
  If it is a combinator (c)
  1. Append two U+0020 SPACE characters to result.
  2. If c is descendant combinator (white space)
    
    Append a U+0020 SPACE character to result.
    
    If c is child combinator (>)
    
    Append a U+003E GREATER-THAN SIGN (>) character to result.
    
    If c is adjacent sibling combinator (+)
    
    Append a U+002B PLUS SIGN (+) character to result.
    
    If c is general sibling combinator (~)
    
    Append a U+007E TILDE (~) character to result.
  3. Append a U+0020 SPACE character to result.
Then, result is the SSFT representation of gs.
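
The overall layout produced by the algorithm above can be illustrated with the following non‐normative sketch. It assumes a hypothetical input model in which a group is a list of selectors, each selector is a list of items, a combinator is given as one of the strings " ", ">", "+", or "~", and a sequence of simple selectors is given as a list of already serialized simple selectors that are concatenated into result. The sort step is left out because its ordering is not defined in this document; the sketch keeps the input order.

```python
def serialize_group(group, newline="\n"):
    """Lay out a group of selectors; simple selectors are pre-serialized."""
    result = ""
    for index, selector in enumerate(group):
        # Step 1: separate selectors by a comma followed by a newline.
        if index > 0:
            result += "," + newline
        # Step 2: indent by four spaces.
        result += "    "
        # Step 3: process sequences of simple selectors and combinators.
        for item in selector:
            if isinstance(item, str):
                # Combinator: two spaces, the combinator character, one space.
                if item not in (" ", ">", "+", "~"):
                    raise ValueError("unknown combinator: %r" % item)
                result += "  " + item + " "
            else:
                # Sequence of simple selectors (sorting omitted, see above).
                result += "".join(item) + newline
    return result
```

For instance, with both type selectors having no namespace specified, serialize_group([[["*|div"], " ", ["*|p"]]]) places *|div and *|p on separate lines, each preceded by four U+0020 SPACE characters, since the descendant combinator contributes two spaces, one space, and one more space.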

Parsing SSFT Groups of Selectors

Since the serializing algorithm is designed so that it always outputs a valid group of selectors, no special parser for the SSFT representation is necessary. Any conforming parser for groups of selectors MAY be used to parse SSFT representations of groups of selectors.

Examples