<http://suika.fam.cx/www/markup/selectors/ssft/ssft>
<http://suika.fam.cx/gate/cvs/markup/selectors/ssft/ssft.en.html>
© Wakaba.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front‐Cover Texts, and no Back‐Cover Texts. A copy of the license is available at <http://www.gnu.org/copyleft/fdl.html>.
This document defines SSFT, a subset of the Selectors language that is suitable for describing the results of Selectors parsing tests.
This section describes the status of this document at the time of its publication. Other documents might supersede this document.
This document is a working draft, produced as part of the Whatpm subproject of the manakai project. It might be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
Comments on this document are welcome and may be sent to the author.
Translations of this document might be available. The English version of this document is the only normative version.
This section is non‐normative.
Selectors [SELECTORS] are patterns that match against elements in a tree structure and are used in Web technologies, including CSS [CSS], HTML5 [HTML5], XBL2 [XBL2], and Selectors API [SAPI].
Selectors is such a flexible language that conceptually the same pattern can be represented by an infinite number of different selectors. For purposes such as validating test results, it might be convenient to define a subset of the language that contains no two semantically equivalent selectors.
The CSSOM specification [CSSOM] is trying to define an algorithm to serialize a group of selectors for the purpose of the selectorText attribute. However, that algorithm is incomplete.
Note that current Web browser implementations of the selectorText attribute are consistently broken in that they do not escape unsafe identifiers. It is therefore not wise to use that format for testing.
The key words MUST, MUST NOT, SHOULD, and MAY in this document are to be interpreted as described in RFC 2119 [KEYWORDS].
Requirements phrased in the imperative as part of algorithms are to be interpreted with the meaning of the key word (MUST) used in introducing the algorithm.
Conformance requirements phrased as algorithms or specific steps MAY be implemented in any manner, so long as the end result is equivalent.
All examples and notes in this specification are non‐normative, as are all sections explicitly marked non‐normative. Everything else in this specification is normative.
For the purpose of this specification, pseudo‐elements are also referred to as simple selectors and are considered part of the last sequence of simple selectors in a selector.
Subsections of this section define algorithms to serialize a simple selector or a group of selectors.
The serialized representation never contains comments.
TODO: link terms to Selectors
To normalize a string (n) as an IDENT, the following algorithm MUST be used:
1. If n is an IDENT or a HASH, replace the escape sequences (\ and following characters) in n by the characters represented by them, respectively.
2. If n is a HASH, remove the leading U+0023 NUMBER SIGN (#) character from n.
3. If there is any character that is not one of U+002D HYPHEN-MINUS (-), U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), U+0041 LATIN CAPITAL LETTER A (A) to U+005A LATIN CAPITAL LETTER Z (Z), U+005F LOW LINE (_), U+0061 LATIN SMALL LETTER A (a) to U+007A LATIN SMALL LETTER Z (z), U+0080 to U+D7FF, or U+E000 to U+10FFFF in n, then replace it by the string \HHHHHH, where HHHHHH is the uppercase zero‐padded hexadecimal representation of the character. If the character is outside the range [U+0000, U+10FFFF], HHHHHH MUST be FFFFFF.
4. If the first character of n is one of U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), then prepend the string \00003 to n.
5. If the first character of n is U+002D HYPHEN-MINUS (-) and the second character, if any, is not one of U+0041 LATIN CAPITAL LETTER A (A) to U+005A LATIN CAPITAL LETTER Z (Z), U+005F LOW LINE (_), U+0061 LATIN SMALL LETTER A (a) to U+007A LATIN SMALL LETTER Z (z), U+0080 to U+D7FF, or U+E000 to U+10FFFF, or if n is equal to U+002D HYPHEN-MINUS (-), then replace that U+002D HYPHEN-MINUS (-) character by the string \00002D.
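The IDENT normalization steps can be sketched in Python as follows. This is a minimal, non‐normative sketch under stated assumptions. The escape decoder follows the CSS 2.1 escape syntax. Escapes denoting code points above U+10FFFF are not handled, because Python strings cannot hold them and chr() raises ValueError. The names decode_escapes and normalize_ident are hypothetical.

```python
import re

# CSS 2.1-style escapes: "\" + 1-6 hex digits (+ one optional white space),
# or "\" + any other character that is not a newline or a hex digit.
_ESCAPE = re.compile(
    r"\\(?:([0-9A-Fa-f]{1,6})(?:\r\n|[ \t\r\n\f])?|([^\r\n\f0-9A-Fa-f]))")


def decode_escapes(text: str) -> str:
    """Step 1: replace escape sequences by the characters they represent."""
    def repl(m):
        # chr() raises ValueError above U+10FFFF; that case is not handled here.
        return chr(int(m.group(1), 16)) if m.group(1) else m.group(2)
    return _ESCAPE.sub(repl, text)


def _is_name_start(c: str) -> bool:
    o = ord(c)
    return ("A" <= c <= "Z" or c == "_" or "a" <= c <= "z"
            or 0x80 <= o <= 0xD7FF or 0xE000 <= o <= 0x10FFFF)


def _is_name_char(c: str) -> bool:
    return c == "-" or "0" <= c <= "9" or _is_name_start(c)


def normalize_ident(token: str, is_hash: bool = False) -> str:
    n = decode_escapes(token)                 # step 1
    if is_hash:
        n = n[1:]                             # step 2: drop the leading "#"
    # Step 3: escape every character outside the safe set as \HHHHHH.
    n = "".join(c if _is_name_char(c) else "\\%06X" % ord(c) for c in n)
    # Step 4: a leading digit gets the prefix \00003, so "1" becomes \000031.
    if n[:1] and "0" <= n[0] <= "9":
        n = "\\00003" + n
    # Step 5: escape a "-" that could not start a valid identifier.
    if n[:1] == "-" and (len(n) == 1 or not _is_name_start(n[1])):
        n = "\\00002D" + n[1:]
    return n
```

Every escape uses exactly six hex digits, so a following character that happens to be a hex digit (as in \00003123, which reads as \000031 followed by 23) can never be absorbed into the escape.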
To normalize a string (s) as a STRING, the following algorithm MUST be used:
1. If s is an IDENT or a STRING, replace the escape sequences (\ and following characters) in s by the characters represented by them (or by an empty string if \ is immediately followed by newline character(s)), respectively.
2. If s is a STRING, remove the quotation marks enclosing s.
3. If there is any character that is not one of U+0020 SPACE, U+0021 EXCLAMATION MARK (!), U+0023 NUMBER SIGN (#) to U+005B LEFT SQUARE BRACKET ([), U+005D RIGHT SQUARE BRACKET (]) to U+D7FF, or U+E000 to U+10FFFF in s, then replace it by the string \HHHHHH, where HHHHHH is the uppercase zero‐padded hexadecimal representation of the character. If the character is outside the range [U+0000, U+10FFFF], HHHHHH MUST be FFFFFF.
4. Insert a U+0022 QUOTATION MARK (") at the beginning of s.
5. Insert a U+0022 QUOTATION MARK (") at the end of s.
6. Then, s is the result of the algorithm.
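Steps 3 to 6 of the STRING normalization can be sketched as follows. This is a minimal sketch that assumes steps 1 and 2 (escape decoding and quote removal) have already been applied. The names _is_string_char and normalize_string are hypothetical.

```python
def _is_string_char(c: str) -> bool:
    """True for characters that may appear unescaped inside the quotes."""
    o = ord(c)
    return (o == 0x20 or o == 0x21 or 0x23 <= o <= 0x5B
            or 0x5D <= o <= 0xD7FF or 0xE000 <= o <= 0x10FFFF)


def normalize_string(s: str) -> str:
    # Step 3: escape '"', '\', control characters, and lone surrogates as \HHHHHH.
    body = "".join(c if _is_string_char(c) else "\\%06X" % ord(c) for c in s)
    # Steps 4-5: enclose the result in U+0022 QUOTATION MARKs.
    return '"' + body + '"'
```

Because U+0022 and U+005C are always escaped, the quotation marks added in steps 4 and 5 are the only unescaped ones in the result.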
A newline SHOULD be represented as a U+000A LINE FEED character. However, depending on the context in which the SSFT representation is used, a U+000D CARRIAGE RETURN followed by a U+000A LINE FEED (i.e. a CRLF sequence), or a single U+000D CARRIAGE RETURN character, MAY be used as a newline instead. Newline representations SHOULD be consistent within an SSFT representation.
Since different newline representations are allowed, newlines must be normalized before SSFT representations are compared literally.
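The comparison step above can be sketched as follows. This minimal sketch assumes that both SSFT representations are available as Python strings. The function names are hypothetical. Rewriting raw newlines is safe because the IDENT and STRING normalizations escape U+000A and U+000D, so any raw newline in an SSFT representation is a separator.

```python
def normalize_newlines(text: str) -> str:
    # Replace CRLF first, so that it becomes one LF rather than two.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def ssft_equal(a: str, b: str) -> bool:
    """Compare two SSFT representations literally after newline normalization."""
    return normalize_newlines(a) == normalize_newlines(b)
```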
A future version of this specification needs to define lowercase (perhaps by referencing some version of some CSS module). For now, lowercasing is tr/A-Z/a-z/, i.e. only the ASCII letters A to Z are converted to a to z.
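The interim definition tr/A-Z/a-z/ can be sketched as a translation table that changes only the 26 ASCII capital letters. Python's built‐in str.lower() is not equivalent, because it applies Unicode case mapping. The checks below show the difference for U+212A KELVIN SIGN. The name ascii_lowercase is hypothetical.

```python
import string

# tr/A-Z/a-z/: map only the 26 ASCII capital letters; leave everything else alone.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lowercase(s: str) -> str:
    return s.translate(_ASCII_LOWER)
```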
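To normalize an integer (i), the following algorithm MUST be used:
1. If i is zero, then the result of the algorithm is a U+0030 DIGIT ZERO (0). Stop the algorithm.
2. Let result be an empty string.
3. If i is negative, append a U+002D HYPHEN-MINUS (-) character to result.
4. Append the decimal representation of the absolute value of i to result, without any leading U+0030 DIGIT ZERO (0) character.
5. Then, result is the result of the algorithm.
No SSFT representation is defined for an invalid simple selector.
To serialize a simple selector (ss), the following algorithm MUST be used:
1. Let r be an empty string.
2. If ss is a type selector or a universal selector:
   1. If a namespace is specified for ss, append that namespace, normalized as an IDENT, to r. Then, append a U+007C VERTICAL BAR (|) character to r.
   2. Otherwise, if the null namespace is specified for ss, append a U+007C VERTICAL BAR (|) character to r.
   3. Otherwise (i.e. if any namespace is allowed, either by the namespace prefix * or by omitting the namespace prefix where no default namespace is specified), append a U+002A ASTERISK (*) character, followed by a U+007C VERTICAL BAR (|) character, to r.
   4. If ss is a type selector, append its local name, normalized as an IDENT, to r. Otherwise, append a U+002A ASTERISK (*) character to r.
3. If ss is an attribute selector:
   1. Append a U+005B LEFT SQUARE BRACKET ([) to r.
   2. If a namespace is specified for the attribute, append that namespace, normalized as an IDENT, to r. Then, append a U+007C VERTICAL BAR (|) character to r.
   3. Otherwise, if the attribute is in the null namespace, append a U+007C VERTICAL BAR (|) character to r.
   4. Otherwise, append a U+002A ASTERISK (*) character, followed by a U+007C VERTICAL BAR (|) character, to r.
   5. Append the local name of the attribute, normalized as an IDENT, to r.
   6. If ss tests the value of the attribute, append its operator (=, ~=, |=, ^=, $=, or *=) to r, followed by the value, normalized as a STRING.
   7. Append a U+005D RIGHT SQUARE BRACKET (]) to r.
4. If ss is a class selector, append a U+002E FULL STOP (.) character, followed by the normalized class name, to r.
5. If ss is an ID selector, append a U+0023 NUMBER SIGN (#) character, followed by the normalized ID, to r.
6. If ss is a pseudo‐class other than not:
   1. Append a U+003A COLON (:) character to r.
   2. Append the name of the pseudo‐class, converted to lowercase, to r.
   3. If ss is a lang pseudo‐class, append a U+0028 LEFT PARENTHESIS (() character, its normalized argument, and a U+0029 RIGHT PARENTHESIS ()) character to r.
   4. If ss is a -manakai-contains pseudo‐class, append a U+0028 LEFT PARENTHESIS (() character, its normalized argument, and a U+0029 RIGHT PARENTHESIS ()) character to r.
   5. If ss is an nth-child, nth-last-child, nth-of-type, or nth-last-of-type pseudo‐class:
      1. Append a U+0028 LEFT PARENTHESIS (() character to r.
      2. If the argument is the keyword odd or even, then let a be 2. Otherwise, if the a part, together with the following n character, is omitted, then let a be 0. Otherwise, if only the a part is omitted, then let a be 1. Otherwise, if the a part is -, then let a be -1. Otherwise, let a be the integer represented by the a part.
      3. Append a, normalized as an integer, to r.
      4. Append a U+006E LATIN SMALL LETTER N (n) character to r.
      5. If the argument is the keyword odd, then let b be 1. If the argument is the keyword even, then let b be 0. Otherwise, if the b part is omitted, then let b be 0. Otherwise, let b be the integer represented by the b part.
      6. If b, normalized as an integer, does not begin with a U+002D HYPHEN-MINUS (-) character, then append a U+002B PLUS SIGN (+) to r.
      7. Append b, normalized as an integer, to r.
      8. Append a U+0029 RIGHT PARENTHESIS ()) character to r.
7. If ss is a not pseudo‐class:
   1. Append U+003A COLON, U+006E LATIN SMALL LETTER N, U+006F LATIN SMALL LETTER O, U+0074 LATIN SMALL LETTER T, and U+0028 LEFT PARENTHESIS (:not(), followed by a newline, to r.
   2. Append … U+0020 SPACE characters to r.
   3. Append … U+0020 SPACE characters to r again.
   4. Serialize the simple selector that is the argument of the not pseudo‐class, and append the result to r.
   5. Append a newline, followed by … U+0020 SPACE characters, to r.
   6. Append a U+0029 RIGHT PARENTHESIS ()) character to r.
8. If ss is a pseudo‐element, append two U+003A COLON characters (::), followed by the name of the pseudo‐element, to r.
9. Then, r is the result of the algorithm.
No SSFT representation is defined for an invalid group of selectors.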
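The sketch below covers only part of the simple‐selector algorithm: class and ID selectors, attribute selectors without a namespace prefix, and the nth-* pseudo‐classes. It assumes that escape sequences have already been decoded. All function names are hypothetical. The regular expression accepting an+b arguments is a hypothetical approximation of the Selectors grammar, and treating +n like n is an assumption that the steps above do not state.

```python
import re
from typing import Optional, Tuple


def _esc(c: str) -> str:
    # \HHHHHH: uppercase hexadecimal, zero-padded to six digits.
    return "\\%06X" % ord(c)


def _name_start(c: str) -> bool:
    o = ord(c)
    return ("A" <= c <= "Z" or c == "_" or "a" <= c <= "z"
            or 0x80 <= o <= 0xD7FF or 0xE000 <= o <= 0x10FFFF)


def normalize_ident(n: str) -> str:
    """IDENT normalization, steps 3-5 (escapes assumed already decoded)."""
    n = "".join(c if c == "-" or "0" <= c <= "9" or _name_start(c) else _esc(c)
                for c in n)
    if n[:1] and "0" <= n[0] <= "9":
        n = "\\00003" + n
    if n[:1] == "-" and (len(n) == 1 or not _name_start(n[1])):
        n = "\\00002D" + n[1:]
    return n


def normalize_string(s: str) -> str:
    """STRING normalization, steps 3-6 (escapes decoded, quotes removed)."""
    def ok(o: int) -> bool:
        return (o in (0x20, 0x21) or 0x23 <= o <= 0x5B
                or 0x5D <= o <= 0xD7FF or 0xE000 <= o <= 0x10FFFF)
    return '"' + "".join(c if ok(ord(c)) else _esc(c) for c in s) + '"'


def serialize_class(name: str) -> str:
    return "." + normalize_ident(name)


def serialize_id(name: str) -> str:
    return "#" + normalize_ident(name)


def serialize_attribute(local_name: str, op: Optional[str] = None,
                        value: Optional[str] = None) -> str:
    """Attribute selector without a namespace prefix (null namespace)."""
    r = "[|" + normalize_ident(local_name)
    if op is not None:  # one of =, ~=, |=, ^=, $=, *=
        r += op + normalize_string(value)
    return r + "]"


# Hypothetical an+b grammar: odd | even | [+-]?[0-9]*n([+-][0-9]+)? | [+-]?[0-9]+
_NTH = re.compile(r"\s*(?:(odd|even)|([+-]?[0-9]*)n\s*(?:([+-])\s*([0-9]+))?"
                  r"|([+-]?[0-9]+))\s*\Z", re.IGNORECASE)


def resolve_nth(arg: str) -> Tuple[int, int]:
    m = _NTH.match(arg)
    if m is None:
        raise ValueError("not an an+b argument: %r" % arg)
    keyword, a_part, sign, b_digits, b_only = m.groups()
    if keyword:                 # odd -> (2, 1), even -> (2, 0)
        return 2, (1 if keyword.lower() == "odd" else 0)
    if b_only is not None:      # the a part and its n are omitted
        return 0, int(b_only)
    if a_part in ("", "+"):     # only the a part is omitted ("+n": assumption)
        a = 1
    elif a_part == "-":
        a = -1
    else:
        a = int(a_part)
    b = int(sign + b_digits) if b_digits is not None else 0
    return a, b


def serialize_nth(name: str, arg: str) -> str:
    a, b = resolve_nth(arg)
    b_text = str(b)             # str() gives the same text as the integer normalization
    plus = "" if b_text.startswith("-") else "+"
    return ":%s(%dn%s%s)" % (name.lower(), a, plus, b_text)
```

Because a and b are always written out in full, odd, 2n+1 and 2N + 1 all serialize to the same text.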
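To serialize a group of selectors (gs), the following algorithm MUST be used:
1. Let result be an empty string.
2. For each selector in gs, in order:
   1. If the selector is not the first selector in gs, append a U+002C COMMA (,) followed by a newline to result.
   2. Append … U+0020 SPACE characters to result.
   3. For each sequence of simple selectors in the selector, in order:
      1. If the sequence contains neither a type selector nor a universal selector, serialize the universal selector * and append the result to result.
      2. Serialize each simple selector in the sequence and append the results to result.
      3. If the sequence is followed by a combinator:
         1. Append a newline, followed by … U+0020 SPACE characters, to result.
         2. If the combinator is a descendant combinator (white space), append a U+0020 SPACE character to result.
         3. If the combinator is a child combinator (>), append a U+003E GREATER-THAN SIGN (>) character to result.
         4. If the combinator is an adjacent sibling combinator (+), append a U+002B PLUS SIGN (+) character to result.
         5. If the combinator is a general sibling combinator (~), append a U+007E TILDE (~) character to result.
         6. Append a U+0020 SPACE character to result.
3. Then, result is the result of the algorithm.
Since the serializing algorithm for groups of selectors is designed so that it always outputs a valid group of selectors, no special parser for the SSFT representation is necessary. Any conforming parser for groups of selectors MAY be used to parse SSFT representations of groups of selectors.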
Likewise, the serializing algorithm for simple selectors always outputs a valid simple selector. Any conforming parser for simple selectors MAY therefore be used to parse SSFT representations of simple selectors.
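The group‐of‐selectors algorithm can be sketched as follows. This is a hedged sketch, not a definitive implementation. Its input model is hypothetical: each sequence holds simple selectors that are already serialized, and the order of simple selectors within a sequence is taken as given. The numbers of U+0020 SPACE characters used for indentation are not given in the text, so a single hypothetical indent parameter supplies both indentations. With the default empty indent, the output has the same shape as the examples in the following section.

```python
# Hypothetical input model (not defined by this specification):
#   sequence = (serialized type/universal selector or None,
#               [other serialized simple selectors])
#   selector = [sequence, combinator, sequence, ...]; combinator is " ", ">", "+" or "~"
#   group    = [selector, ...]
_COMBINATORS = (" ", ">", "+", "~")  # descendant, child, adjacent sibling, general sibling


def serialize_sequence(seq) -> str:
    type_part, others = seq
    # No type or universal selector: serialize an implicit "*"; "*|*" assumes
    # that no default namespace is declared.
    return (type_part if type_part is not None else "*|*") + "".join(others)


def serialize_selector(selector, indent: str = "") -> str:
    result = indent + serialize_sequence(selector[0])
    for combinator, seq in zip(selector[1::2], selector[2::2]):
        if combinator not in _COMBINATORS:
            raise ValueError("unknown combinator: %r" % combinator)
        # Newline and indentation, then the combinator (a space for the
        # descendant combinator), then one more space.
        result += "\n" + indent + combinator + " " + serialize_sequence(seq)
    return result


def serialize_group(group, indent: str = "") -> str:
    # A "," followed by a newline separates the selectors of the group.
    return ",\n".join(serialize_selector(sel, indent) for sel in group)
```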
This section is non‐normative.
The SSFT representation of a selector p + q is:
*|p
+ *|q
The SSFT representation of a selector a#id.class1:n\ot(:Active)/* comment */.class2 is:
*|a.class1.class2#id:not(
:active
)
The SSFT representation of a group of selectors a, #b, C[d^=e], .\31 23 is:
*|a,
*|*#b,
*|C[|d^="e"],
*|*.\00003123
[CSS] <http://www.w3.org/TR/CSS21>.
[CSSOM] <http://dev.w3.org/csswg/cssom/Overview.html>.
[SAPI] <http://dev.w3.org/2006/webapi/selectors-api/Overview.html>. The latest published version of the specification is available at <http://www.w3.org/TR/selectors-api/>.
[XBL2] <http://www.w3.org/TR/xbl/>.