This document defines the SWML syntax and the SWML vocabulary.
document element in the SuikaWiki/0.9 namespace
Name attribute in the SuikaWiki/0.9 namespace
Version attribute in the SuikaWiki/0.9 namespace
parameter element in the SuikaWiki/0.9 namespace
value element in the SuikaWiki/0.9 namespace
class attribute
id attribute
xml:lang attribute
dr element in the SuikaWiki/0.9 namespace
comment-p element in the SuikaWiki/0.10 namespace
history element in the SuikaWiki/0.9 namespace
example element in the SuikaWiki/0.9 namespace
preamble element in the SuikaWiki/0.9 namespace
postamble element in the SuikaWiki/0.9 namespace
box element in the SuikaWiki/0.9 namespace
anchor element in the SuikaWiki/0.9 namespace
anchor-internal element in the SuikaWiki/0.9 namespace
anchor-end element in the SuikaWiki/0.9 namespace
anchor attribute in the SuikaWiki/0.9 namespace
anchor-external element in the SuikaWiki/0.9 namespace
resScheme attribute in the SuikaWiki/0.9 namespace
resParameter attribute in the SuikaWiki/0.9 namespace
sw-l element in the SuikaWiki/0.9 namespace
sw-lt element in the SuikaWiki/0.9 namespace
sw-r element in the SuikaWiki/0.9 namespace
sw-rt element in the SuikaWiki/0.9 namespace
sw-v element in the SuikaWiki/0.9 namespace
sw-vt element in the SuikaWiki/0.9 namespace
sw-vb element in the SuikaWiki/0.9 namespace
sw-vbt element in the SuikaWiki/0.9 namespace
sw-tate element in the SuikaWiki/0.9 namespace
yoko element in the SuikaWiki/0.9 namespace
sw-mirrored element in the SuikaWiki/0.9 namespace
sw-left element in the SuikaWiki/0.9 namespace
sw-right element in the SuikaWiki/0.9 namespace
sw-vlr element in the SuikaWiki/0.9 namespace
sw-vrl element in the SuikaWiki/0.9 namespace
sw-leftbox element in the SuikaWiki/0.9 namespace
sw-rightbox element in the SuikaWiki/0.9 namespace
sw-leftbtbox element in the SuikaWiki/0.9 namespace
sw-rightbtbox element in the SuikaWiki/0.9 namespace
sw-vlrbox element in the SuikaWiki/0.9 namespace
sw-vrlbox element in the SuikaWiki/0.9 namespace
fenced element in the SuikaWiki/0.9 namespace
openfence element in the SuikaWiki/0.9 namespace
fencedtext element in the SuikaWiki/0.9 namespace
closefence element in the SuikaWiki/0.9 namespace
lines element in the SuikaWiki/0.9 namespace
line element in the SuikaWiki/0.9 namespace
dotabove element in the SuikaWiki/0.9 namespace
sw-macron element in the SuikaWiki/0.9 namespace
vector element in the SuikaWiki/0.9 namespace
subsup element in the SuikaWiki/0.9 namespace
subscript element in the SuikaWiki/0.9 namespace
superscript element in the SuikaWiki/0.9 namespace
f element in the SuikaWiki/0.9 namespace
key element in the SuikaWiki/0.10 namespace
n element in the SuikaWiki/0.9 namespace
lat element in the SuikaWiki/0.9 namespace
lon element in the SuikaWiki/0.9 namespace
tz element in the SuikaWiki/0.9 namespace
attrvalue element in the SuikaWiki/0.10 namespace
This section is non-normative.
This specification defines the SuikaWiki Markup Language (SWML). SWML is the markup language developed and implemented for the SuikaWiki hypertext system. It is also used by other systems as a human-friendly document authoring format.
SuikaWiki's Wiki syntax (now known as SWML text serialization) derived from WalWiki, which derived from YukiWiki, in .
The first specification of the extended language, SuikaWiki/0.9 Document Markup Format: Syntax Specification, was published in and was frequently updated until .
Then several updates to the language, known as SuikaWiki/0.10, were (incompletely) defined by the following documents:
These "two" versions of the language were merged and rewritten as the SWML specification in 2008.
Previous versions of the SWML specification were published at https://suika.suikawiki.org/www/markup/suikawiki/spec/swml-work.
There is an obsolete list of possible new features that might have been introduced in a revision of this specification.
Revisions of the SWML specification are now available in the GitHub repository.
This specification depends on the Infra Standard.
For historical reasons, different elements and attributes defined or used in this specification belong to different namespaces.
The AA namespace is http://pc5.2ch.net/test/read.cgi/hp/1096723178/aavocab#. The preferred prefix is aa.
The aa element is defined by Strict-HTML スレッド24 (Strict-HTML Thread 24) >>16.
The HTML namespace's preferred prefix is html. The following elements are defined by the HTML Standard:
abbr
b
blockquote
body
cite
code
dd
del
dfn
div
dt
dl
em
figcaption
figure
h1
head
hr
html
i
ins
kbd
li
ol
p
pre
q
rt
ruby
samp
section
span
strong
sub
sup
table
tbody
td
th
time
tr
u
ul
var
The HTML3 namespace is urn:x-suika-fam-cx:markup:ietf:html:3:draft:00:.
The note element is defined in the HTML 3.0 specification and belongs to the HTML3 namespace for the purpose of this specification.
The MathML namespace's preferred prefix is math. The following elements are defined by the MathML specification:
mfrac
mroot
msqrt
mtext
munder
munderover
The SuikaWiki/0.9 namespace is urn:x-suika-fam-cx:markup:suikawiki:0:9:. The preferred prefix is sw.
The SuikaWiki/0.10 namespace is urn:x-suika-fam-cx:markup:suikawiki:0:10:. The preferred prefix is sw10.
The XHTML2 namespace is http://www.w3.org/2002/06/xhtml2/.
The XML namespace's preferred prefix is xml.
The terms byte sequence, code point, string, ASCII case-insensitive, forgiving-base64 encode, forgiving-base64 decode, ordered map, key, value, exists, empty, HTML namespace, MathML namespace, XML namespace, and XMLNS namespace are defined by the Infra Standard.
The term Vertical_Orientation is defined by The Unicode Standard and UAX #50.
Terms node tree, node, parent, child, append, element, create an element, element's local name, element's namespace, element's namespace prefix, descendant text content, attribute, get an attribute value, set an attribute value, attribute's local name, attribute's namespace, attribute's namespace prefix, attribute's value, and attribute's element are defined by the DOM Standard.
Terms applicable specification, expected, content attribute, IDL attribute, valid integer, rules for parsing integers, valid e-mail address, HTML element, represents, inter-element whitespace, text, flow content, phrasing content, script-supporting elements, and nothing are defined by the HTML Standard.
Terms convert, UTF-8, and UTF-8 decode are defined by the Encoding Standard.
Terms URL and valid URL string are defined by the URL Standard.
Terms MIME type, essence, and XML MIME type are defined by the MIME Sniffing Standard.
White space characters are U+0009 CHARACTER TABULATION and U+0020 SPACE.
Digits are characters in the range U+0030 DIGIT ZERO .. U+0039 DIGIT NINE.
Uppercase letters are characters in the range U+0041 LATIN CAPITAL LETTER A .. U+005A LATIN CAPITAL LETTER Z.
Lowercase letters are characters in the range U+0061 LATIN SMALL LETTER A .. U+007A LATIN SMALL LETTER Z.
Language tag characters are digits, uppercase letters, lowercase letters, and U+002D HYPHEN-MINUS.
Scheme characters are digits, uppercase letters, lowercase letters, U+0025 PERCENT SIGN, U+002B PLUS SIGN, U+002D HYPHEN-MINUS, U+002E FULL STOP, and U+005F LOW LINE.
A language specification is a string consisting of a @ character followed by zero or more language tag characters.
The body of a language specification is the substring of the language specification that follows the first @ character. It might be the empty string.
Semantically, the body of a language specification represents a language tag, similar to the xml:lang attribute.
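The definition above can be sketched as a small matcher. This is an illustrative sketch only, not the normative parser; the function name parse_language_specification and its return shape are assumptions made for this example.

```python
import re

# Language tag characters, per the definitions above: digits, uppercase
# letters, lowercase letters, and U+002D HYPHEN-MINUS.
LANG_SPEC = re.compile(r"@([0-9A-Za-z-]*)")


def parse_language_specification(text, pos=0):
    """Match a language specification at `pos` (hypothetical helper).

    Returns (body, end_position), or None if no "@" is found at `pos`.
    The body may be the empty string.
    """
    m = LANG_SPEC.match(text, pos)
    if m is None:
        return None
    return m.group(1), m.end()
```

For example, parse_language_specification("@en-US[") yields the body "en-US", while a bare "@" yields the empty body.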
Terms document white space and preserved white space are defined by the CSS Text specification.
This section is non‐normative.
Obviously, this section is incomplete; some prose definitions are not yet available; some cross-references do not work yet. It should be specified why this is non-normative. The ABNF definition and charset considerations need to be addressed.
Both prose and ABNF descriptions are non-normative. The conformance of a SWML text serialization document is defined in terms of the parser and its output.
Conformance checking steps
A document in the SWML text serialization consists of three parts: header part, body part, and optional image.
Several constructs in a document refer to pages. A page is a unit of data in a hypertext database. The name of a page is sometimes referred to as a WikiName. A page sometimes represents or is associated with an image. How to implement these concepts, including how to resolve WikiNames, is not defined in this specification.
The header part has to be empty. In previous versions of SWML, a magic line could be contained, and in fact was required in some versions, in the header part of a document.
A magic line has to contain a string #?, followed by the format name, followed by a / character, followed by the format version. They identify the version of the markup language in which the document is written. Historically, only the two combinations of format name and format version shown in the table below were defined, used, and implemented:
Format name | Format version | Description |
---|---|---|
SuikaWiki | 0.9 | The SuikaWiki/0.9 markup language. |
SuikaWikiImage | 0.9 | The SuikaWikiImage/0.9 markup language. |
A magic line can contain zero or more parameters after the format version. A parameter consists of one or more white space characters, followed by the name, followed by a = character, followed by a quoted string whose value represents zero or more values separated by a , character.
A parameter value consists of zero or more text characters except for the separator character ,. Historically, the following combinations of parameter names and values were defined and used:
Name | Values | Description |
---|---|---|
default-name | Zero or more text characters except for , | The value represents the default user name for WikiForm input fields. Exactly one value can be specified. The default when this parameter is not specified is implementation dependent. |
import | Zero or more text characters except for , | A value represents the WikiName by which definitions for entity references are imported. When this parameter is not specified, no definition is imported. |
interactive | yes or no | Value yes means that the document contains interactive content such as WikiForm. Value no, the default value used when the parameter is not specified, means the document does not contain such content. It was intended to be used for the convenience of cache control mechanisms. |
obsolete | yes or no | Value yes means the content of the document is obsolete, and value no, the default value used when the parameter is not specified, means the content is not obsolete. The parameter name |
obsoleted | ||
page-icon | Zero or more text characters except for , | The value represents the WikiName by which the page icon is imported. The page icon can be used as favicon @@ [ref], for example. Exactly one value can be specified. The default when this parameter is not specified is implementation dependent. |
image-alt | Zero or more text characters except for , | The value represents the alternative text for the image embedded in the document. Exactly one value can be specified. The default when this parameter is not specified is the empty string. |
image-type | An Internet Media Type with no parameter, white spaces, comments | The value represents the type of the image embedded in the document. Exactly one value can be specified. This parameter has to be specified when the document contains an image. |
The order in which parameters are specified is not significant. The parameter name of a parameter has to be different from the parameter name of any other parameter.
A magic line has to be terminated by zero or more white space characters followed by a newline.
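The historical magic line syntax described above can be sketched as follows. This is a minimal, non-normative sketch. The function name parse_magic_line and the returned dictionary layout are assumptions for illustration, and so are the character sets assumed for format names, format versions, and parameter names, which the prose above does not pin down.

```python
import re

# Assumption: a format name has no "/" or white space; a version has no white space.
MAGIC = re.compile(r"#\?([^/\s]+)/(\S+)")
# One or more white space characters, a name, "=", then a quoted string
# (which may contain quoted pairs such as \" or \\).
PARAM = re.compile(r'[\t ]+([^=\s]+)="((?:\\.|[^"\\])*)"')


def parse_magic_line(line):
    """Parse a magic line without its newline; return None if it does not match."""
    m = MAGIC.match(line)
    if m is None:
        return None
    params = {}
    pos = m.end()
    while True:
        p = PARAM.match(line, pos)
        if p is None:
            break
        name = p.group(1)
        if name in params:
            return None  # parameter names have to be unique
        # Value of the quoted string: drop the "\" of each quoted pair.
        value = re.sub(r"\\(.)", r"\1", p.group(2))
        params[name] = value.split(",") if value else []
        pos = p.end()
    if line[pos:].strip("\t ") != "":
        return None  # only white space may follow the last parameter
    return {"name": m.group(1), "version": m.group(2), "params": params}
```

Because the order of parameters is not significant, the sketch stores them in a mapping keyed by parameter name.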
The body part of a document consists of zero or more blocks.
There are several kinds of blocks: paragraphs, headings, lists, labeled lists, quotations, preformatted paragraphs, edited sections, tables, editorial notes, comment paragraphs, hrs, and empty blocks. In addition, forms and entity references can also be used as blocks.
An empty block, which is represented by an empty line, can be inserted between any two blocks. It is sometimes necessary to prevent a block from being interpreted as a part of the previous block.
For example, consider the following fragment:
- List item.
This line is part of the list item.
The second line is part of the list, by definition. If this is not desired, an empty block can be inserted between the two lines, as follows:
- List item.

This line is not part of the list item.
... such that the third line represents a paragraph.
A paragraph represents a unit of the text, similar to HTML's p element. It consists of an optional destination anchor number, followed by line contents, followed by a newline, followed by zero or more block children.
A paragraph cannot begin with a form or entity reference, since it is treated as a block when it appears at the beginning of a line. A paragraph cannot begin with a white space character, since it is treated as a preformatted paragraph then.
A block child is one of an optional destination anchor number followed by line contents followed by a newline, a list, a labeled list, a preformatted paragraph, a section block, a table, an editorial note, a comment paragraph, or an hr.
An editorial note represents an editorial note. It is represented by a string @@, followed by zero or more white space characters, followed by zero or more block children.
A comment paragraph represents a note. It is represented by a string ;;, followed by zero or more white space characters, followed by zero or more block children.
An hr represents a break in the run of blocks in which it occurs, similar to the HTML hr element. It is represented by a string -*-*-, followed by an optional class specification, followed by zero or more white space characters, finally followed by a newline.
A heading introduces a section. It is represented by one or more * characters, followed by zero or more white space characters, optionally followed by a destination anchor number, optionally followed by line contents, followed by a newline. The number of * characters represents the depth of the section. A heading with only one * character begins a larger section than a heading with more than one * character. The line contents represent the name or caption for the section.
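The heading syntax can be illustrated with a short matcher. This is a sketch under simplifying assumptions: the helper name parse_heading is hypothetical, and the optional destination anchor number is not separated out but left as part of the returned line contents.

```python
import re

# One or more "*", zero or more white space characters, then the rest of the line.
HEADING = re.compile(r"(\*+)[\t ]*(.*)")


def parse_heading(line):
    """Return (depth, line_contents) for a heading line, else None.

    `line` is expected without its terminating newline.
    """
    m = HEADING.fullmatch(line)
    if m is None:
        return None
    return len(m.group(1)), m.group(2)
```

A smaller depth value corresponds to a larger section, so a line such as "* Introduction" opens a section that contains those opened by "** Scope".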
There are three kinds of lists: ordered lists, unordered lists, and labeled lists. Ordered lists and unordered lists are called lists in this specification.
A list consists of zero or more items. An item in the list is represented by one or more - or = characters, followed by zero or more white space characters, optionally followed by a destination anchor number, optionally followed by line contents, followed by a newline, followed by zero or more block children. The number of - or = characters at the beginning of the item represents the depth of the list. In a list, all items have to have the same depth. If there is another list in the block children, its items' depth has to be greater than the depth of the parent item. The last character that represents the depth of an item indicates the type of the list: - indicates an unordered list while = indicates an ordered list. In a list, all items have to be of the same type.
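As an illustration of the list item syntax, the following sketch extracts the depth and type of a single item line. The helper name parse_list_item is hypothetical, the destination anchor number is left inside the returned contents, and the check for the hr string -*-*- is included because such a line would otherwise look like a list item.

```python
import re

LIST_ITEM = re.compile(r"([-=]+)[\t ]*(.*)")
# An hr line ("-*-*-" with an optional class specification) is not a list item.
HR = re.compile(r"-\*-\*-(?:\([^()\\]*\))?[\t ]*")


def parse_list_item(line):
    """Return (depth, kind, line_contents) for a list item line, else None."""
    if HR.fullmatch(line):
        return None
    m = LIST_ITEM.fullmatch(line)
    if m is None:
        return None
    markers = m.group(1)
    # The last marker character decides the list type.
    kind = "unordered" if markers[-1] == "-" else "ordered"
    return len(markers), kind, m.group(2)
```

A line such as "-= Second level" is therefore an item of an ordered list at depth 2, nested in an item at depth 1.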
A labeled list consists of one or more labeled list items. A labeled list item is represented by a : character, followed by zero or more white space characters, optionally followed by a destination anchor number, optionally followed by line contents, followed by zero or more white space characters, followed by a : character, followed by a destination anchor number, followed by zero or more white space characters, optionally followed by line contents, followed by a newline, followed by zero or more block children. The former line contents, if any, represent the label. Block children cannot contain a labeled list.
The following example contains no quotation:
>>1 This is a reference, not a quote.
A section block is a marked section of zero or more blocks, preceded by a section block start tag and followed by a section block end tag.
A section block start tag is a [ character, followed by a section block tag name, followed by an optional class specification, followed by a [ character, followed by zero or more white space characters, optionally followed by line contents, followed by a newline.
Whether the line contents component is allowed or not, and its semantics, depend on the section block tag name.
For example, the line contents component of a FIG block represents a caption (i.e. a short form of a FIGCAPTION child).
A section block end tag is a ] character, followed by a section block tag name, followed by a ] character, followed by zero or more white space characters, followed by a newline.
A section block tag name represents the type of the section block. The section block tag names in the start tag and the end tag of a section block have to be the same value. Their semantics are described by the Block Element Table.
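The start and end tag syntax can be sketched with two patterns. This is illustrative only. The helper names are hypothetical, and the assumption that a section block tag name consists of uppercase letters is inferred from names such as FIG and FIGCAPTION rather than stated by the prose above.

```python
import re

# "[" NAME [ "(" class body ")" ] "[" white space* [line contents]
START_TAG = re.compile(r"\[([A-Z]+)(?:\(([^()\\]*)\))?\[[\t ]*(.*)")
# "]" NAME "]" white space*
END_TAG = re.compile(r"\]([A-Z]+)\][\t ]*")


def parse_section_start(line):
    """Return (tag_name, class_body_or_None, line_contents), else None."""
    m = START_TAG.fullmatch(line)
    return None if m is None else (m.group(1), m.group(2), m.group(3))


def parse_section_end(line):
    """Return the tag name of a section block end tag line, else None."""
    m = END_TAG.fullmatch(line)
    return None if m is None else m.group(1)
```

A checker built on these helpers would additionally compare the name returned for an end tag with the name of the innermost open start tag.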
A table represents two-dimensional tabular data. It is similar to the HTML table element, but what can be represented is even narrower than the HTML table model. A table consists of one or more table rows.
A table row consists of one or more table cells. Syntactically, a table row is followed by a newline.
There are three kinds of table cells: data cells, header cells, and colspan cells. The first cell in a row has to be a data cell or a header cell.
Syntactically, a cell is preceded by a , character followed by zero or more white space characters, and is followed by zero or more white space characters.
A data cell represents a cell that contains data, like the HTML td element. Likewise, a header cell represents a cell that contains data, like the HTML th element. The data of a header cell has to be preceded by a * character. The cell consists of an optional destination anchor number, optionally followed by line contents.
Syntactically, the cell can be provided as a quoted string, in which case its value is interpreted as an optional destination anchor number, optionally followed by line contents.
A colspan cell represents that the cell that would be placed there forms an integrated part of the cell just before that cell. The cell just before that cell might also be a colspan cell.
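The cell syntax above can be illustrated by a rough row splitter. This is a sketch under several assumptions and is not the tokenization algorithm of this specification: the helper name tokenize_table_row and the tuple layout are hypothetical, and the use of the string == to mark a colspan cell is taken from the tokenizer description later in this document.

```python
def tokenize_table_row(line):
    """Split a table row (without its newline) into (kind, text) tuples."""
    cells = []
    pos, n = 0, len(line)
    while pos < n and line[pos] == ",":
        pos += 1
        while pos < n and line[pos] in "\t ":   # white space before the cell
            pos += 1
        header = pos < n and line[pos] == "*"   # "*" marks a header cell
        if header:
            pos += 1
        if pos < n and line[pos] == '"':        # cell given as a quoted string
            pos += 1
            buf = []
            while pos < n and line[pos] != '"':
                if line[pos] == "\\" and pos + 1 < n:
                    pos += 1                    # quoted pair: keep next character
                buf.append(line[pos])
                pos += 1
            pos += 1                            # skip the closing quote
            while pos < n and line[pos] in "\t ":
                pos += 1
            cells.append(("header" if header else "data", "".join(buf)))
        else:
            start = pos
            while pos < n and line[pos] != ",":
                pos += 1
            text = line[start:pos].rstrip("\t ")
            if text == "==" and not header:
                cells.append(("colspan", None))
            else:
                cells.append(("header" if header else "data", text))
    return cells
```

Quoting a cell is what allows its data to contain a , character, which would otherwise start the next cell.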
Need prose definitions...
External reference scheme | Syntax of external reference parameter | Semantics |
---|---|---|
IW | (identifier / quoted-string) ":" (identifier / quoted-string) | InterWiki reference (An InterWikiName followed by a parameter) |
MAIL | valid e-mail address | Internet mail address |
URI | valid URL string | URL |
URL | valid URL string | URL |
InterWiki is a hyperlinking mechanism in which the combination of an InterWikiName and a parameter identifies the destination of the link. The interpretation of an InterWiki link is implementation dependent.
External reference schemes URI and URL ought not to be used.
Tag name | Number of middle tags | Internal reference source anchor | External reference source anchor | Semantics |
---|---|---|---|---|
AA | 0 | Not allowed | Not allowed | So-called ASCII-art (aa element) |
ABBR | 0 or 1 | Not allowed | Not allowed | Abbreviation (HTML abbr element) |
ASIS | 1 | Not allowed | Not allowed | “As-is” annotation (asis element) |
B | 0 | Not allowed | Not allowed | Bold (or sans-serif) text (HTML b element) |
BR | 0 | Not allowed | Not allowed | Explicit line break (sw-br element) |
CITE | 0 | Not allowed | Not allowed | Title of a work (HTML cite element) |
CODE | 0 or 1 | Not allowed | Not allowed | Code (HTML code element) |
CURSIVE | 0 | Not allowed | Not allowed | Cursive text (sw-cursive element) |
CSECTION | 0 | Not allowed | Not allowed | Title of a section in a work (csection element) |
DEL | 0 | Allowed | Allowed | Removal (HTML del element) |
DFN | 0 or 1 | Not allowed | Not allowed | Defined term (HTML dfn element) |
DOTABOVE | 0 | Not allowed | Not allowed | Text with dot above (dotabove element) |
EMPH | 0 | Not allowed | Not allowed | Emphasized text (emph element) |
F | 0 | Not allowed | Not allowed | Field name (f element) |
FENCED | 0, 1, or 2 | Not allowed | Not allowed | Text enclosed by parentheses (fenced element) |
FRAC | 1 | Not allowed | Not allowed | Fraction (MathML mfrac element) |
I | 0 | Not allowed | Not allowed | Italic text (HTML i element) |
INS | 0 | Allowed | Allowed | Insertion (HTML ins element) |
KBD | 0 | Not allowed | Not allowed | User input (HTML kbd element) |
KEY | 0 | Not allowed | Not allowed | Keyboard's key (key element) |
L | 0 | Not allowed | Not allowed | Horizontal left-to-right text (sw-l element) |
LAT | 0 or 1 | Not allowed | Not allowed | Latitude (lat element) |
LINES | 1 or more | Not allowed | Not allowed | Set of sublines (lines element) |
LON | 0 or 1 | Not allowed | Not allowed | Longitude (lon element) |
LT | 0 | Not allowed | Not allowed | Horizontal left-to-right, turned text (sw-lt element) |
MACRON | 0 | Not allowed | Not allowed | Text with macron (sw-macron element) |
MAY | 0 | Not allowed | Not allowed | RFC 2119 keyword "MAY" (MAY element) |
MIRRORED | 0 | Not allowed | Not allowed | Mirrored text (sw-mirrored element) |
MUST | 0 | Not allowed | Not allowed | RFC 2119 keyword "MUST" (MUST element) |
N | 0 or 1 | Not allowed | Not allowed | Number (n element) |
OKURI | 0 or 1 | Not allowed | Not allowed | Okuri-gana annotations in Kambun (okuri element) |
Q | 0 | Allowed | Allowed | Quotation (HTML q element) |
QN | 0 or 1 | Not allowed | Not allowed | Qualified name (qn element) |
R | 0 | Not allowed | Not allowed | Horizontal right-to-left text (sw-r element) |
ROOT | 1 | Not allowed | Not allowed | Root (MathML mroot element) |
RT | 0 | Not allowed | Not allowed | Horizontal right-to-left, turned text (sw-rt element) |
RUBY | 1 or 2 | Not allowed | Not allowed | Ruby annotation (HTML ruby element) |
RUBYB | 1 | Not allowed | Not allowed | Secondary ruby annotation (rubyb element) |
SAMP | 0 | Not allowed | Not allowed | Sample (HTML samp element) |
SEE | 0 | Not allowed | Not allowed | "See also" references (sw-see element) |
SHOULD | 0 | Not allowed | Not allowed | RFC 2119 keyword "SHOULD" (SHOULD element) |
SMALLCAPS | 0 | Not allowed | Not allowed | Small-caps text (smallcaps element) |
SNIP | 0 | Not allowed | Not allowed | Snipped point annotation (snip element) |
SPAN | 0 or 1 | Not allowed | Not allowed | Span of text (HTML span element) |
SQRT | 0 | Not allowed | Not allowed | Square root (MathML msqrt element) |
SRC | 0 | Not allowed | Not allowed | Short annotation for citation (src element) |
SUBSUP | 1 | Not allowed | Not allowed | Pair of subscript and superscript (subsup element) |
SUB | 0 | Not allowed | Not allowed | Subscript (HTML sub element) |
SUP | 0 | Not allowed | Not allowed | Superscript (HTML sup element) |
TATE | 0 | Not allowed | Not allowed | Vertical text (sw-tate element) |
TIME | 0 or 1 | Not allowed | Not allowed | Date or time (HTML time element) |
TZ | 0 or 1 | Not allowed | Not allowed | Time zone offset (tz element) |
U | 0 | Not allowed | Not allowed | Underlined text (HTML u element) |
UNDER | 0 | Not allowed | Not allowed | Text with an annotation under it (MathML munder element) |
UNDEROVER | 1 | Not allowed | Not allowed | Text with annotations under and over it (MathML munderover element) |
V | 0 | Not allowed | Not allowed | Vertical top-to-bottom text (sw-v element) |
VB | 0 | Not allowed | Not allowed | Vertical bottom-to-top text (sw-vb element) |
VT | 0 | Not allowed | Not allowed | Vertical top-to-bottom, turned text (sw-vt element) |
VBT | 0 | Not allowed | Not allowed | Vertical bottom-to-top, turned text (sw-vbt element) |
VAR | 0 | Not allowed | Not allowed | Variable (HTML var element) |
VECTOR | 0 | Not allowed | Not allowed | Text with vector symbol (vector element) |
WEAK | 0 | Not allowed | Not allowed | Small print (weak element) |
YOKO | 0 | Not allowed | Not allowed | Horizontal text (yoko element) |
A future revision to this specification might define more tag names.
An inline start tag whose tag name is INS or DEL cannot be placed at the beginning of a line contents construct, since it could be interpreted as a block start tag.
A class specification represents class names unless otherwise specified. The class specification syntactically consists of a ( character followed by the body of the class specification followed by a ) character.
The body of a class specification consists of zero or more text characters excluding (, ), and \. The body of the class specification has similar semantics to, and is processed similarly to, the HTML class attribute.
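A class specification can be matched and split into class names roughly as follows. This is a sketch, not a normative algorithm. The helper name parse_class_specification is hypothetical, and splitting the body on ASCII whitespace is an assumption based on the stated similarity to the HTML class attribute.

```python
import re

# "(" body ")" where the body excludes "(", ")", "\" and line breaks.
CLASS_SPEC = re.compile(r"\(([^()\\\r\n]*)\)")


def parse_class_specification(text, pos=0):
    """Return (class_names, end_position) for a class specification at `pos`.

    Returns None if no class specification starts at `pos`.
    """
    m = CLASS_SPEC.match(text, pos)
    if m is None:
        return None
    # Split the body the way an HTML class attribute value is split.
    names = [name for name in re.split(r"[\t\n\f\r ]+", m.group(1)) if name]
    return names, m.end()
```

An empty body, as in (), yields an empty list of class names.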
The form name specification, if any, defines the name of the form. It has to be different from any other form name defined in the document. A form name specification is syntactically a class specification, and its body is the form name. A form name cannot contain white space characters.
Specific form name | Syntax of specific form parameters | Semantics |
---|---|---|
comment | Empty | Comment input form. |
embed | ['IMG:'] identifier | Embedding another page. The parameter specifies the WikiName of the page embedded. If the parameter begins with a string IMG:, the page is embedded as an image and the string does not form part of the WikiName. |
form | N/A | Reserved. |
rcomment | Empty | Comment input form; a new comment is inserted after the form. |
searched | identifier | Insert a search result for the parameter. |
The form is an extension mechanism for the SWML text serialization. ...
The generic form can be used to embed a WikiForm specification. WikiForm provides a generic framework for describing user input forms and templates used for processing form inputs.
Three form fields in a form represent the input template, the output template, and options. Interpretation and processing of these fields are implementation dependent.
The name form cannot be used.
Names embed, rcomment, and searched are obsolete and cannot be used.
A document can contain an image by including a string __IMAGE__ followed by a newline followed by Base64-encoded image data, at the end of the document. Parameters image-type and image-alt provide metadata for the image.
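Separating the image from the rest of a document might look like the sketch below. It is a simplification under stated assumptions: the helper name split_image is hypothetical, the marker is assumed to occupy a whole line, tokenizer modes are ignored, and Python's base64 module is used as an approximation of forgiving-base64 decode.

```python
import base64
import re


def split_image(document):
    """Return (text_before_marker, image_bytes), or (document, None) if no image."""
    lines = re.split(r"\r\n|\r|\n", document)
    if "__IMAGE__" not in lines:
        return document, None
    i = lines.index("__IMAGE__")
    # Join the remaining lines and drop all white space from the Base64 data.
    data = "".join("".join(lines[i + 1:]).split())
    data += "=" * (-len(data) % 4)  # tolerate omitted padding
    return "\n".join(lines[:i]), base64.b64decode(data, validate=True)
```

The media type of the returned bytes is not detected here; it would come from the image-type parameter.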
An entity reference is a string __&&, followed by an entity reference name, followed by a string &&__.
An entity reference name is a string of one or more text characters that does not contain the substring &&__.
If the Character Reference Table [entity reference name] exists, the entity reference represents the character Character Reference Table [entity reference name].
Otherwise, the entity reference is an obsolete entity reference, which was expected to be replaced by a fragment imported from another document. It is no longer supported.
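The entity reference syntax can be illustrated with a single pattern. This sketch only extracts entity reference names. The helper name find_entity_references is hypothetical, and looking a name up in the Character Reference Table is left out because that table is not reproduced here.

```python
import re

# "__&&", a name of one or more text characters not containing "&&__", "&&__".
ENTITY_REF = re.compile(r"__&&((?:(?!&&__)[^\r\n])+)&&__")


def find_entity_references(text):
    """Return the entity reference names found in `text`, in order."""
    return [m.group(1) for m in ENTITY_REF.finditer(text)]
```

The negative lookahead is what keeps a name from running past the first &&__ terminator.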
A text character is a code point that is not U+000A or U+000D.
A newline is a U+000D character, a U+000A character, or a U+000D character followed by a U+000A character.
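The newline definition implies the following way of cutting a string into lines. This is a sketch: the helper name split_lines is hypothetical, and treating trailing characters without a final newline as a last line is an assumption.

```python
def split_lines(s):
    """Split `s` into lines, treating CR, LF, and CR LF as a single newline."""
    lines = []
    line = []
    pos = 0
    while pos < len(s):
        ch = s[pos]
        if ch == "\r":
            lines.append("".join(line))
            line = []
            # A U+000A right after U+000D belongs to the same newline.
            if pos + 1 < len(s) and s[pos + 1] == "\n":
                pos += 1
        elif ch == "\n":
            lines.append("".join(line))
            line = []
        else:
            line.append(ch)
        pos += 1
    if line:
        lines.append("".join(line))  # assumption: keep an unterminated last line
    return lines
```

Every element of the result consists of text characters only, since U+000A and U+000D never end up inside a line.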
A quoted string is zero or more text characters preceded by a " character and followed by a " character.
In a quoted string, the character \ can only be used as part of a quoted pair. A quoted pair is \ followed by a text character.
The value of a quoted string is the string obtained by removing the first and last " characters and removing the \ character at the beginning of each quoted pair.
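The value of a quoted string can be computed as in this sketch. The helper name quoted_string_value and its return shape are assumptions made for illustration.

```python
import re

# '"', then quoted pairs or characters other than '"', '\' and line breaks, then '"'.
QUOTED_STRING = re.compile(r'"((?:\\[^\r\n]|[^"\\\r\n])*)"')


def quoted_string_value(text, pos=0):
    """Return (value, end_position) for a quoted string at `pos`, else None."""
    m = QUOTED_STRING.match(text, pos)
    if m is None:
        return None
    # Drop the "\" that begins each quoted pair, keeping the character after it.
    value = re.sub(r"\\(.)", r"\1", m.group(1))
    return value, m.end()
```

A quoted pair is the only way to include a " or \ character in the value.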
This section specifies how to convert a string into a node tree, assuming the string is written in the SWML text serialization. This process is referred to as parsing, and an implementation that performs this process is referred to as a parser.
How to convert a byte sequence into a string is outside of the scope of this specification.
The parser MUST run these steps to parse a string string:
Document
.
html
element
in the HTML namespace in document.
xmlns
attribute in the XMLNS
namespace to the HTML namespace.
head
element in
the HTML namespace.
body
element in
the HTML namespace.
The following definitions are used to describe the parsing algorithm:
manakai_append_text method ... MUST be invoked on node with the argument char.
null, unless an attribute in namespace is created, in which case its namespace MUST be set to the namespace's URL and its namespace prefix MUST be set to namespace's preferred prefix.
( character, followed by zero or more sequence of code point that is not one of (, ), or white space characters, and finally followed by a ) character. The body of a class specification is the substring of the class specification between parentheses (exclusive). It might be the empty string.
When a string of characters is tokenized, the string s MUST be processed as follows:
CARRIAGE RETURN, process line. Set line to the empty string. If the (pos + 1)th character of s is U+000A LINE FEED, increment pos by one (1).
LINE FEED, process line. Set line to the empty string.
The steps above emit a sequence of one or more tokens, which are the input to the tree construction stage. A token can have zero or more properties, depending on the kind of the token. There are several kinds of tokens and properties as follows:
Mode is a state of the tokenizer and is one of "initial" (the initial value used when the tokenization starts), "body", "preformatted", "preformatted block", and "image data".
Continuous line flag is another flag of the tokenizer, representing whether a new line character should be appended to the data, and takes either true or false. This flag is mainly used in the "body" mode.
When a line is processed, the rules specified in the following subsections are used according to the appropriate mode. The rules below sometimes require that the line be reprocessed. In such cases, the rules for the appropriate mode MUST be followed with the same line.
In the "initial" mode, line MUST be processed as follows:
#?
In the "body" mode, line MUST be processed as follows:
*
*
, run the following substeps:
*
.)
-*-*-
,
optionally followed by a class specification, followed by
zero or more white space characters
-
or =
-
or =
, run the following substeps:
:
:
.)
:
, run the following
substeps:
:
character.
In this case, line does not represent a description list.
:
.)
>
>
, run the following substeps:
>
.
>
characters to data.
LINE FEED
character to
data.
@@
, run the
following substeps:
@@
).
;;
, run the
following substeps:
;;
).
[
character, followed by
a section block tag
name, optionally followed by class specification,
followed by a [
character, followed by zero or
more white space characters, followed by zero or
more characters
[
character after
the section block tag
name and class specification (if any),
from line.
FIGCAPTION
.
TALK
,
set tag name to SPEAKER
.
[PRE
,
optionally followed by class specification, followed by
a [
character, followed by zero or more white space
characters
PRE
and classes is
the body of
the class specification, if any, or null otherwise.
@@
@@
.)
;;
;;
.)
]
character, followed by
a section block tag
name, followed by a ]
character, followed by zero
or more white space characters
,
__IMAGE__
LINE
FEED
character.
In the "preformatted" mode, line MUST be processed as follows:
]
character, followed by
a section block tag
name, followed by a ]
character, followed by zero
or more white space characters
LINE
FEED
character.
In the "preformatted block" mode, line MUST be processed as follows:
]PRE]
followed by zero or more white space characters
PRE
.
LINE
FEED
character.
In the "image data" mode, line MUST be processed as follows:
image element in the SuikaWiki/0.9 namespace and set the image element to that element. Append the image element to the document element.
LINE FEED to the image element.
The algorithm to tokenize a table row data is as follows:
*
character, set the header cell
flag and increase pos by one (1).
"
, set cell quoted to the empty
string and follow the substeps below:
"
, abort these substeps.
\
, follow the substeps below:
,
, abort these substeps.
==
, then emit a table colspan cell
token and go back to the step
labeled LOOP.
The algorithm to tokenize a text data is as follows:
[
followed by one or
more digits followed by ]
, run the following
steps:
anchor-end
,
namespace is the
SuikaWiki/0.9 namespace, anchor attribute is
number, and content is
[
followed by number followed by
]
.
[[#
, followed by one
or more lowercase letters or U+002D HYPHEN-MINUS
HYPHEN-MINUS
in the matched
substring.
:
,
run the following substeps:
]]
, abort
these substeps.
'
,
run the following steps:
'
, abort
these substeps.
\
, run
the following substeps:
:
, abort these substeps.
]]
, remove
these characters from data.
[[
[
, followed by one or
more uppercase letters, optionally followed by a
class specification, optionally followed by a
language specification, followed by [
]]
]<
, followed by one
or more scheme characters, followed by :
<
instead of ]<
, except
that the emitted token is an inline end tag token instead
of an element token. The resScheme attribute
of the token MUST be the resScheme attribute of the
token that would be emitted if the first two characters were
<
. The resParameter
attribute of the token MUST be the resParameter attribute of
the token that would be emitted if the first two characters were
<
.
]
, remove the
character from data.
]>>
followed by one or
more digits, followed by ]
]
followed by zero or more
white space characters followed by [
]
followed by zero or more
white space characters followed by a language
specification followed by [
<
, followed by one
or more scheme characters, followed by :
>
,
remove the first character of data and abort these steps.
"
,
append "
to data and run the following
substeps:
"
,
append "
to value, remove the first
character of data, and abort these substeps.
\
, run
the following substeps:
\
to value.
:
followed by value.
URI
.
anchor-external
, namespace is the SuikaWiki/0.9
namespace, resScheme
attribute is scheme, resParameter attribute is
value, and content
is content.
'''
''
>>
followed by one or
more digits
anchor-internal
, namespace is the SuikaWiki/0.9
namespace, anchor
attribute is the digits part of the matched
substring, and content is the
matched substring.
__&&
&&__
, or if
data does not contain &&__
as a
substring:
_
, _
, &
,
and &
, respectively.
&&__
, exclusive.
&&__
,
inclusive, from data.
replace
, namespace
is the SuikaWiki/0.9
namespace, by
attribute is name.
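As a non-normative illustration, the two anchor rules above (a `[` followed by digits and `]`, and `>>` followed by digits) can be sketched as follows. The function name `match_anchor_token`, the dictionary token shape, and the use of ASCII digits are assumptions made for this sketch only; they are not defined by this specification.

```python
import re

# Assumption: "digits" means ASCII digits 0-9.
ANCHOR_END = re.compile(r"\[([0-9]+)\]")
ANCHOR_INTERNAL = re.compile(r">>([0-9]+)")


def match_anchor_token(data):
    """Try to match an anchor token at the start of data.

    Returns (token, remaining data), or (None, data) if nothing matches.
    The token is a plain dict; this shape is hypothetical.
    """
    rules = ((ANCHOR_END, "anchor-end"), (ANCHOR_INTERNAL, "anchor-internal"))
    for pattern, local_name in rules:
        m = pattern.match(data)
        if m:
            token = {
                "local_name": local_name,
                "namespace": "SuikaWiki/0.9",
                "anchor": m.group(1),   # the digits part
                "content": m.group(0),  # the whole matched substring
            }
            return token, data[m.end():]
    return None, data
```

For example, `match_anchor_token("[12] rest")` yields an `anchor-end` token whose content is `[12]` and leaves `" rest"` as the remaining data.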
To parse a magic line data, the following steps MUST be used:
#?
.)
/
, set the
substring after the first occurrence of the character to version. Note that version might become
the empty string. Remove the /
character and the
substring after the character from name.
Name
content
attribute of the document element in the
SuikaWiki/0.9 namespace to name.
Version
content attribute of
the document element in the SuikaWiki/0.9
namespace to version.
=
, set name to those characters
and remove those characters from data.
parameter
element in the
SuikaWiki/0.9 namespace and set the name
content
attribute of parameter to name.
=
.)
"
, remove that character from data.
"
, create a value
element in the SuikaWiki/0.9
namespace, set the textContent
IDL
attribute of the node to value, and append the
node to parameter.
\
, run the following substeps:
\
.)
,
, abort these substeps.
,
, create a value
element in the SuikaWiki/0.9 namespace, set the
textContent
IDL attribute of the node to
value, append the node to parameter,
and go back to the first substep of these substeps.
"
, remove that character from data.
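The magic line steps above can be approximated by the following non-normative sketch. It assumes that the format name ends at the first space, that parameters are separated by white space, and that every parameter value is enclosed in `"` characters; the function name `parse_magic_line` and the returned tuple shape are hypothetical and are not part of this specification.

```python
def parse_magic_line(line):
    """Return (name, version, params) or None if line is not a magic line.

    params maps each parameter name to its list of values.
    version is None when the name has no "/" (an assumption of this sketch).
    """
    if not line.startswith("#?"):
        return None
    data = line[2:]
    head, _, rest = data.partition(" ")
    # The substring after the first "/" becomes the version (possibly empty).
    name, slash, version = head.partition("/")
    params = {}
    i, n = 0, len(rest)
    while i < n:
        while i < n and rest[i].isspace():
            i += 1
        eq = rest.find("=", i)
        if eq < 0:
            break
        pname = rest[i:eq]
        i = eq + 1
        values = []
        while True:
            if i < n and rest[i] == '"':
                i += 1  # skip the opening quote
            value = []
            while i < n and rest[i] != '"':
                if rest[i] == "\\" and i + 1 < n:
                    i += 1  # a backslash escapes the next character
                value.append(rest[i])
                i += 1
            i += 1  # skip the closing quote
            values.append("".join(value))
            if i < n and rest[i] == ",":
                i += 1  # another value follows
                continue
            break
        params[pname] = values
    return name, (version if slash else None), params
```

In the tree, each entry of `params` would correspond to a `parameter` element with one `value` child per list item.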
The tree construction stage constructs a node tree from a series of tokens emitted by the tokenization stage. The tree construction stage has two state variables: insertion mode and stack of open elements.
The insertion mode is one of "in section", "in table row", or "in paragraph". The default that MUST be used when the tree construction begins is the "in section" insertion mode. The rules for these insertion modes are described in the subsections below.
When the algorithm below says that the parser is to do something “using the rules for the m insertion mode”, the parser MUST use the rules described under the m insertion mode's section, but MUST leave the insertion mode unchanged.
The stack of open elements contains tuples of (element node, section depth, quotation depth, list depth). The stack grows downwards; the topmost entry on the stack is the first one added to the stack, and the bottommost entry of the stack is the most recently added entry in the stack. It initially contains a single tuple: (the body element, 0, 0, 0). When an entry is pushed to the stack of open elements, the items of the new tuple are set to the same values as those of the bottommost tuple unless otherwise specified.
The current element is the element node of the bottommost entry in the stack of open elements.
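A minimal, non-normative sketch of the stack of open elements follows. Element nodes are modelled as plain strings, and the names `Entry` and `OpenElements` are hypothetical; only the tuple layout and the inheritance rule for pushed entries come from the text above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Entry:
    element: str            # stands in for the element node
    section_depth: int = 0
    quotation_depth: int = 0
    list_depth: int = 0


class OpenElements:
    def __init__(self):
        # Initially the stack holds only (the body element, 0, 0, 0).
        self.entries = [Entry("body")]

    def push(self, element, **overrides):
        # New entries copy the bottommost tuple unless otherwise specified.
        bottom = self.entries[-1]
        self.entries.append(replace(bottom, element=element, **overrides))

    def pop(self):
        return self.entries.pop()

    @property
    def current(self):
        # The current element is the element of the bottommost entry.
        return self.entries[-1].element
```

For instance, pushing a `section` with `section_depth=1` and then a `p` leaves the `p` entry with section depth 1, because the depth is inherited from the entry below which it was pushed.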
In the "in section" insertion mode, a token MUST be processed as follows:
body
,
section
, and block elements, then pop the
element off the stack of open elements and follow this
substep again.
section
element in the HTML
namespace, append the element created to the current
element, push the element created to the stack of open
elements with section depth set to current depth + 1, quotation depth set to
zero (0), and list depth set to zero (0), and go back to
the first substep of these substeps.
section
element in the HTML
namespace.
h1
element in the HTML
namespace.
PRE
TALK
:
dialogue
:
dialogue
element in
the SuikaWiki/0.9 namespace.
dialogue
, pop the current
element off the stack of open elements.
class
attribute of the
element created to classes.
PRE
hr
element in the HTML
namespace.
class
attribute of the
element created to classes.
blockquote
, body
,
section
, and block elements, then pop the
element off the stack of open elements and follow this
substep again.
blockquote
element in the HTML
namespace, append the element created to the current
element, push the element created to the stack of open
elements with section depth set to zero (0),
quotation depth set to current depth +
1, and list depth set to zero (0), and go back to the
first substep of these substeps.
dialogue
, pop the current
element off the stack of open elements.
ul
, if the last
character in depth is
-
, or ol
, otherwise.
-
, create a ul
element in the HTML namespace.
ol
element in the HTML
namespace.
li
element in the HTML
namespace.
li
element in the HTML
namespace.
dialogue
, pop the current
element off the stack of open elements.
dd
, pop the element off the stack of open
elements.
dl
, create a dl
element in the
HTML namespace, append the element created to the
current element, and push the element created to the
stack of open elements.
dt
element in the HTML
namespace.
dialogue
, pop the current
element off the stack of open elements.
table
element in the HTML
namespace.
tbody
element in the HTML
namespace.
tr
element in the HTML
namespace.
PRE
dialogue
, pop the current
element off the stack of open elements.
pre
element in the HTML
namespace.
class
attribute of the
element created to classes.
dialogue
, pop the current
element off the stack of open elements.
pre
element in the HTML
namespace.
dialogue
, pop the current
element off the stack of open elements.
comment-p
element in the
SuikaWiki/0.10 namespace.
dialogue
, pop the current
element off the stack of open elements.
ed
element in the SuikaWiki/0.10
namespace.
body
,
section
, dialogue
, and block
elements, then pop the element off the stack of open
elements and follow this substep again.
replace
Document
has been constructed. Abort
the parser.
dialogue
, pop the current
element off the stack of open elements.
p
, li
, dd
, figcaption
,
comment-p
, or ed
, or if the current
element's local
name is figcaption
or speaker
and
the current element has any child, run the
following substeps:
p
element in the HTML
namespace.
In the "in table row" insertion mode, a token MUST be processed as follows:
th
if
the header of
the token is true, or td
otherwise.
lastChild
IDL attribute of the current
element, if any, is td
or th
,
increase the value of
colspan
IDL attribute of the node by one (1)
and abort these substeps.
td
element in the HTML
namespace.
tr
, pop the element off the stack of open
elements.
tr
element in the HTML
namespace.
In the "in paragraph" insertion mode, a token MUST be processed as follows:
class
attribute of the
element created
to classes.
lang
attribute in
the XML namespace of the element created
to language.
FRAC
, ROOT
, SQRT
,
UNDER
, or UNDEROVER
mtext
element in the MathML
namespace.
LINES
line
element in the SuikaWiki/0.9
namespace.
SUBSUP
subscript
element in
the SuikaWiki/0.9 namespace.
FENCED
openfence
element in
the SuikaWiki/0.9 namespace.
OKURI
rt
element in the HTML
namespace.
title
.
rt
rt
.
title
, nsuri
, tz
, n
,
lat
, lon
, closefence
, or
superscript
attrvalue
.
qn
nsuri
.
ruby
or rubyb
rt
.
mtext
mtext
.
openfence
fencedtext
.
fencedtext
closefence
.
subscript
superscript
.
line
line
.
lang
attribute in
the XML namespace of the element created
to language.
rt
, title
, nsuri
, mtext
,
line
, subscript
, superscript
,
fencedtext
, openfence
, closefence
,
or attrvalue
, pop the element off the stack of open
elements.
strong
or em
, run the
following substeps:
]]
to the current element,
push the current element to the stack of open
elements, and abort these substeps.
As a result, the bottommost and second bottommost entries become equal, but one of them is popped from the stack of open elements soon.
anchor-external
element in the
SuikaWiki/0.9 namespace.
anchor-internal
element in the
SuikaWiki/0.9 namespace.
textContent
IDL attribute of the
element created to ]]
.
anchor
content
attribute in the SuikaWiki/0.9 namespace of the
current element to anchor attribute of the
token.
resScheme
content
attribute in the SuikaWiki/0.9 namespace of the
current element to resScheme attribute
of the token.
resParameter
content attribute in the SuikaWiki/0.9
namespace of the current element to resParameter
attribute of the token.
strong
, pop the element off the stack of open
elements and abort these substeps.
strong
element in the HTML
namespace.
em
, pop the element off the stack of open
elements and abort these substeps.
em
element in the HTML
namespace.
form
form
element in the SuikaWiki/0.9
namespace.
id
attribute of the element created
to id of the form token.
input
content
attribute of the element created to the first item in parameters of the form
token, if any, or the empty string otherwise.
template
content attribute of the element created to the second
item in parameters of the
form token, if any, or the empty string otherwise.
option
content attribute of the element created to the third
item in parameters of the
form token, if any, or the empty string otherwise.
parameter
content
attribute of the element created to the concatenation of items
in parameters, separated by a
:
character, in the same order.
form
element in the SuikaWiki/0.9
namespace.
ref
content
attribute of the element created to name of the form token.
id
of the form
token is not null, set the id
attribute of the
element created to id of the form
token.
parameter
content
attribute of the element created to the concatenation of items
in parameters, separated by a
:
character, in the same order. The result value might
be the empty string.
anchor
content attribute in
the SuikaWiki/0.9 namespace of the element created to
anchor attribute of the
element token.
by
content attribute of
the element created to by
attribute of the element token.
resScheme
content attribute in the SuikaWiki/0.9
namespace of the element created to resScheme attribute of the
element token.
resParameter
content attribute in the SuikaWiki/0.9
namespace of the element created to resParameter attribute of
the element token.
textContent
IDL attribute of the element created to content of the element
token.
dt
, pop the element off the stack of open
elements.
dd
element in the HTML
namespace.
h1
, pop the element off the stack of open
elements.
td
or th
, pop the element off
the stack of open elements.
PRE
pre
, pop the element off the stack of open
elements.
...
The following ordered map known as Character Reference Table is referenced from the parser:
Key | Value |
---|---|
# | # |
& | & |
' | ' |
* | * |
- | - |
: | : |
< | < |
= | = |
> | > |
_ | _ |
[ | [ |
] | ] |
The following Block Element Table is referenced from the parser:
Tag name | Local name | Namespace | Semantics (non-normative) | Semantics of start tag's line contents (non-normative) |
---|---|---|---|---|
BOX | box | SuikaWiki/0.9 namespace | A group of page content. | Not allowed. |
DEL | delete | SuikaWiki/0.9 namespace | Removal. | Not allowed. |
EG | example | SuikaWiki/0.9 namespace | Example. | Not allowed. |
FIG | figure | HTML namespace | Figure. | Figure caption. |
FIGCAPTION | figcaption | HTML namespace | Figure caption. | Not allowed. |
HISTORY | history | SuikaWiki/0.9 namespace | Historical notes. | Not allowed. |
INS | insert | SuikaWiki/0.9 namespace | Insertion. | Not allowed. |
LEFT | sw-left | SuikaWiki/0.9 namespace | A left-to-right defaulted horizontal content where lines are stacked top-to-bottom. | Not allowed. |
LEFTBOX | sw-leftbox | SuikaWiki/0.9 namespace | A box for left-to-right defaulted horizontal content where lines are stacked top-to-bottom. | Not allowed. |
LEFTBTBOX | sw-leftbtbox | SuikaWiki/0.9 namespace | A box for left-to-right defaulted horizontal content where lines are stacked bottom-to-top. | Not allowed. |
NOTE | note | HTML3 namespace | Note. | Not allowed. |
POSTAMBLE | postamble | SuikaWiki/0.9 namespace | Postamble. | Not allowed. |
PREAMBLE | preamble | SuikaWiki/0.9 namespace | Preamble. | Not allowed. |
REFS | refs | SuikaWiki/0.9 namespace | References and quotations. | Not allowed. |
RIGHT | sw-right | SuikaWiki/0.9 namespace | A right-to-left defaulted horizontal content where lines are stacked top-to-bottom. | Not allowed. |
RIGHTBOX | sw-rightbox | SuikaWiki/0.9 namespace | A box for right-to-left defaulted horizontal content where lines are stacked top-to-bottom. | Not allowed. |
RIGHTBTBOX | sw-rightbtbox | SuikaWiki/0.9 namespace | A box for right-to-left defaulted horizontal content where lines are stacked bottom-to-top. | Not allowed. |
SPEAKER | speaker | SuikaWiki/0.9 namespace | Speaker name of a talk. | Not allowed. |
TALK | talk | SuikaWiki/0.9 namespace | A single talk part in a dialogue. | Speaker name of the talk. |
VLR | sw-vlr | SuikaWiki/0.9 namespace | A top-to-bottom defaulted vertical content where lines are stacked left-to-right. | Not allowed. |
VLRBOX | sw-vlrbox | SuikaWiki/0.9 namespace | A box for top-to-bottom defaulted vertical content where lines are stacked left-to-right. | Not allowed. |
VRL | sw-vrl | SuikaWiki/0.9 namespace | A top-to-bottom defaulted vertical content where lines are stacked right-to-left. | Not allowed. |
VRLBOX | sw-vrlbox | SuikaWiki/0.9 namespace | A box for top-to-bottom defaulted vertical content where lines are stacked right-to-left. | Not allowed. |
A future revision to this specification might define more tag names.
The semantics of an element is formally defined in terms of corresponding DOM elements.
Block elements are elements whose local name is one of local names in the Block Element Table.
Structural elements are block elements and
elements
whose local name is one
of body
, section
,
blockquote
, h1
, ul
,
ol
, dl
, li
, dt
,
dd
, table
, tbody
, th
,
tr
, td
, p
, comment-p
,
ed
, box
, and pre
.
These definitions are referenced from the parser. The parser does not have to check elements' namespaces.
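As a non-normative sketch, the Block Element Table and the two definitions above can be represented as simple lookups keyed by local name, since the parser does not have to check namespaces. The names `BLOCK_ELEMENT_TABLE`, `is_block_element`, and `is_structural_element` are hypothetical helpers introduced only for illustration.

```python
# Tag name -> local name, taken from the Block Element Table.
BLOCK_ELEMENT_TABLE = {
    "BOX": "box", "DEL": "delete", "EG": "example",
    "FIG": "figure", "FIGCAPTION": "figcaption",
    "HISTORY": "history", "INS": "insert",
    "LEFT": "sw-left", "LEFTBOX": "sw-leftbox",
    "LEFTBTBOX": "sw-leftbtbox", "NOTE": "note",
    "POSTAMBLE": "postamble", "PREAMBLE": "preamble",
    "REFS": "refs", "RIGHT": "sw-right",
    "RIGHTBOX": "sw-rightbox", "RIGHTBTBOX": "sw-rightbtbox",
    "SPEAKER": "speaker", "TALK": "talk",
    "VLR": "sw-vlr", "VLRBOX": "sw-vlrbox",
    "VRL": "sw-vrl", "VRLBOX": "sw-vrlbox",
}

BLOCK_LOCAL_NAMES = frozenset(BLOCK_ELEMENT_TABLE.values())

# Additional local names listed in the definition of structural elements.
OTHER_STRUCTURAL_LOCAL_NAMES = frozenset({
    "body", "section", "blockquote", "h1", "ul", "ol", "dl", "li",
    "dt", "dd", "table", "tbody", "th", "tr", "td", "p",
    "comment-p", "ed", "box", "pre",
})


def is_block_element(local_name):
    return local_name in BLOCK_LOCAL_NAMES


def is_structural_element(local_name):
    # Structural elements are block elements plus the listed local names.
    return (local_name in BLOCK_LOCAL_NAMES
            or local_name in OTHER_STRUCTURAL_LOCAL_NAMES)
```

Checking only the local name mirrors the statement that namespaces need not be checked; an implementation that also tracks namespaces could key the table on (namespace, local name) pairs instead.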
A document using SWML elements and attributes can be serialized using the XML syntax.
An SWML MIME type is any MIME type
whose essence is text/x-suikawiki
or text/x.suikawiki.image
.
The SWML text serialization can be identified by
an SWML MIME type. However, MIME
type text/x.suikawiki.image
MUST
NOT be used.
An implementation that recognizes any SWML MIME type MUST interpret all the SWML MIME types as equivalent.
Historically, a document
whose format name
is SuikaWiki
was
expected to be labeled as text/x-suikawiki
, while
a document
whose format name
is SuikaWikiImage
was expected to be labeled as text/x.suikawiki.image
.
If a string is labeled by an SWML MIME type, it MUST be an SWML text serialization document.
If a byte sequence is labeled by an SWML MIME type, it MUST be an SWML text serialization document, encoded in UTF-8.
When a string labeled by an SWML MIME type is processed, the parser MUST be used with the string as the input.
When a byte sequence labeled by an SWML MIME type is processed, the parser MUST be used with the UTF-8 decoded I/O-queue-converted byte sequence as the input.
An SWML MIME type MAY have
a charset
parameter,
unless it is not allowed (e.g. when it is used to label
a string). If specified, its
value MUST be UTF-8
, ASCII
case-insensitive.
An SWML MIME type MUST NOT have any other parameter.
An implementation that recognizes any SWML MIME
type MUST ignore
any parameter (including
the charset
parameter).
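The recognition rules above might be sketched as follows (non-normative). The sketch compares only the essence, so every parameter, including `charset`, is ignored, and both essences are treated as equivalent. The helper names are hypothetical, and Python's `utf-8-sig` codec with replacement of malformed bytes is used here as an approximation of UTF-8 decoding, which is an assumption of this sketch.

```python
SWML_ESSENCES = frozenset({"text/x-suikawiki", "text/x.suikawiki.image"})


def is_swml_mime_type(mime_type):
    """True if the essence of mime_type is one of the SWML essences.

    Everything after the first ";" (the parameters) is ignored.
    """
    essence = mime_type.split(";", 1)[0].strip().lower()
    return essence in SWML_ESSENCES


def decode_swml_bytes(body):
    """Decode a byte sequence as UTF-8 before handing it to the parser.

    A leading byte order mark is dropped and malformed bytes are
    replaced with U+FFFD.
    """
    return body.decode("utf-8-sig", errors="replace")
```

For example, `is_swml_mime_type("Text/X-SuikaWiki; charset=Shift_JIS")` is true even though the `charset` value is not `UTF-8`, because the parameter is ignored on the consuming side.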
Historically, there was
the version
parameter
whose value is 0.9
or 0.10
. It was
used to encode format
version in the magic
line.
fragment identifier
The SWML XML serialization has no dedicated MIME type. It can be identified by any XML MIME type.
This specification is the specification for the SuikaWiki/0.9 namespace and the SuikaWiki/0.10 namespace. Anything belonging to those namespaces is defined in this specification.
Elements and attributes in the SuikaWiki/0.9 namespace and in the SuikaWiki/0.10 namespace, as well as attributes in no namespace for elements in the SuikaWiki/0.9 namespace and in the SuikaWiki/0.10 namespace, MUST NOT be used in context where they are not allowed explicitly.
Elements in the SuikaWiki/0.9 namespace and in the SuikaWiki/0.10 namespace defined in this specification MUST conform to their content model.
Inter-element whitespace, comment nodes, and processing instruction nodes MUST be ignored when establishing whether an element matches its content model or not.
Elements in the SuikaWiki/0.9 namespace and in the SuikaWiki/0.10 namespace MAY be orphan nodes (i.e. without a parent node).
In the following subsections, attributes listed in the allowed attributes entry MAY be set to an element described in that subsection.
Some elements belong to categories such as flow content and phrasing content.
document
element in the SuikaWiki/0.9 namespacehead
element in the
XHTML2 namespace, followed by a body
element in the XHTML2
namespace, optionally followed by an image
element
in the SuikaWiki/0.9 namespace.
This element MUST NOT be used.
...
Name
attribute in
the SuikaWiki/0.9 namespaceThis attribute MUST NOT be used.
...
Version
attribute in the SuikaWiki/0.9 namespaceThis attribute MUST NOT be used.
...
parameter
element in the SuikaWiki/0.9 namespacevalue
element in
the SuikaWiki/0.9 namespace.
name
This element MUST NOT be used.
... name
value
element
in the SuikaWiki/0.9 namespaceThis element MUST NOT be used.
...
class
attributeThe class
attribute of
an element in the SuikaWiki/0.9 namespace or
in the SuikaWiki/0.10 namespace has the same semantics
and requirements as the class
attribute of
the HTML elements.
Unless otherwise specified by other applicable
specification, the class
attribute of
an element in the AA namespace, in
the HTML3 namespace, or in the XHTML2
namespace has the same semantics and requirements as
the class
attribute of the HTML elements.
id
attributeThe id
attribute of
an element in the SuikaWiki/0.9 namespace or
in the SuikaWiki/0.10 namespace has the same semantics
and requirements as the id
attribute of
the HTML elements.
Unless otherwise specified by other applicable
specification, the id
attribute of
an element in the AA namespace, in
the HTML3 namespace, or in the XHTML2
namespace has the same semantics and requirements as
the id
attribute of the HTML elements.
xml:lang
attributeThe lang
attribute in
the XML namespace MAY be set to
any element in the SuikaWiki/0.9 namespace
or in the SuikaWiki/0.10 namespace.
Unless otherwise specified by other applicable
specification,
the lang
attribute in
the XML namespace MAY be set to
any element in the AA namespace, in
the HTML3 namespace, or in the XHTML2
namespace.
The space
attribute
in the XML namespace has no effect for
the elements in the SuikaWiki/0.9
namespace or in the SuikaWiki/0.10 namespace
and MUST NOT be used.
Unless otherwise specified by other applicable
specification,
the space
attribute in
the XML namespace has no effect for
the elements in the AA
namespace, in the HTML3 namespace, or in
the XHTML2 namespace and MUST NOT
be used.
The xml:base
attribute and
the xml:id
attribute MUST
NOT be used or supported.
dr
element in the
SuikaWiki/0.9 namespacedt
element in the XHTML2
namespace, followed by a dd
element
in the XHTML2 namespace.
The dr
element
is semantically equal to the
HTML div
element that is
a child of the HTML dl
element.
This element MUST NOT be used.
comment-p
element in the
SuikaWiki/0.10 namespaceThe comment-p
element represents
a note.
Historically, the p
suffix in the element name
implied that it represented
a paragraph. As its content is any flow
content, it now can contain any number
of paragraphs.
history
element in the
SuikaWiki/0.9 namespaceclass
The history
element represents a description of
history or an out-of-date content.
example
element in the
SuikaWiki/0.9 namespaceclass
The example
element represents an example.
preamble
element in the
SuikaWiki/0.9 namespaceclass
The preamble
element represents
a preamble or preface.
postamble
element in the
SuikaWiki/0.9 namespaceclass
The postamble
element represents
a postamble.
box
element in the
SuikaWiki/0.9 namespaceclass
The box
element represents a
physically grouped chunk of contents.
The class
attribute and CSS can be used
to specify presentations of the element.
dialogue
element in the
SuikaWiki/0.9 namespacetalk
elements, optionally
intermixed with script-supporting elements.
The dialogue
element represents
a conversation between one or more persons.
Each piece of the conversation is represented
by child talk
elements.
A dialogue
element SHOULD
have at least
one talk
element child.
talk
element in the
SuikaWiki/0.9 namespacespeaker
element followed
by flow content, optionally intermixed
with script-supporting elements.
class
The talk
element represents
a group of sentences by a person (or a specific group of persons) in
the dialogue.
The speaker of
a talk
element is the
first speaker
child element of
the element, if any, or null
. If
the speaker is
not null
, it describes the speaker(s) of the talk.
Otherwise, the speaker is not explicitly described.
Interviewer's questions are often identified by lack of explicit speaker name.
The class
attribute can be used to style talks in a
dialogue based on the speaker of them (e.g. use different colors for
different speakers).
speaker
element in the
SuikaWiki/0.9 namespaceclass
The speaker
element represents
a short string used to credit the person (or a group of persons) of the
piece of the conversation.
It can also contain other metadata than person name, such as affiliation of the person or the timestamp of the talk, if desired.
Some of the elements defined by this
specification or used in SWML documents are considered as
implicit link
elements. Elements
abbr
,
cite
,
code
, and
kbd
in the HTML namespace are implicit link
elements.
anchor
element in the
SuikaWiki/0.9 namespaceanchor
in the
SuikaWiki/0.9 namespace
...
anchor-internal
element in the
SuikaWiki/0.9 namespaceanchor
in the
SuikaWiki/0.9 namespace
...
anchor-end
element in the
SuikaWiki/0.9 namespaceanchor
in the
SuikaWiki/0.9 namespace.
...
anchor
attribute
in the SuikaWiki/0.9 namespaceThe anchor
attribute in the
SuikaWiki/0.9 namespace, when set to an
anchor-end
element, defines an anchor number
for the parent element of the anchor-end
element, if any.
The attribute MUST be present and its
value MUST be a valid integer.
The integer MUST have different value from any
other anchor
attribute in the
SuikaWiki/0.9 namespace of an
anchor-end
element in the SuikaWiki/0.9
namespace that belongs to the same tree as the first attribute.
The anchor
attribute in the
SuikaWiki/0.9 namespace MAY be
set to q
elements in the HTML
namespace and in the XHTML2 namespace, as well as
ins
and del
elements in the HTML
namespace. The attribute can also be present on
anchor
and anchor-internal
elements in the
SuikaWiki/0.9 namespace.
In these cases, the attribute represents the anchor
number of the element referenced. If the element on which the
attribute is found is an anchor
element, the element
referenced might be found in the document referenced by the element.
Otherwise, the element is in the tree the element belongs to.
If the element on which the attribute is found is not
an anchor
or anchor-internal
element, the
attribute has similar semantics to that of the cite
attribute on the element. In such cases, the anchor
attribute in the
SuikaWiki/0.9 namespace MUST NOT
be present when there is a cite
attribute.
A user agent MUST ignore the anchor
attribute in the
SuikaWiki/0.9 namespace if there is a cite
attribute.
The attribute value MUST be a valid
integer. Unless the element is anchor
, the
integer MUST be equal to one of the integers
represented by the anchor
attribute
in the SuikaWiki/0.9 namespace set to an
anchor-internal
element in the SuikaWiki/0.9
namespace that belongs to the same tree.
anchor-external
element in the
SuikaWiki/0.9 namespaceresParameter
in the
SuikaWiki/0.9 namespace
resScheme
in the
SuikaWiki/0.9 namespace
...
resScheme
attribute in the SuikaWiki/0.9 namespaceThe resScheme
attribute in
the SuikaWiki/0.9 namespace MAY be
used for q
elements in the HTML
namespace and in the XHTML2 namespace, as well as
ins
and del
elements in the HTML
namespace. The attribute can also be used for an
anchor-external
element in the SuikaWiki/0.9
namespace.
...
resParameter
attribute
in the SuikaWiki/0.9 namespaceThe resParameter
attribute
in the SuikaWiki/0.9 namespace MAY
be used for q
elements in the HTML
namespace and in the XHTML2 namespace, as well as
ins
and del
elements in the HTML
namespace. The attribute can also be used for an
anchor-external
element in the SuikaWiki/0.9
namespace.
...
The element's resScheme
of
a resParameter
attribute in
the SuikaWiki/0.9 namespace attr is the result
of the following steps:
resScheme
attribute
in the SuikaWiki/0.9 namespace.
If the element's resScheme
is MAIL
, the attribute
value MUST be a valid e-mail
address.
If the element's resScheme
is URL
or URI
, the attribute
value MUST be a valid URL
string.
Unless otherwise specified by other applicable
specification, the aa
element is
a phrasing content and its content model
is phrasing content.
form
element in the
SuikaWiki/0.9 namespaceid
input
option
parameter
ref
template
... ref
,
parameter
.
... input
, template
, option
image
element in
the SuikaWiki/0.9 namespaceThis element MUST NOT be used. It was used to embed an image data associated with the document.
An image
element in
the SuikaWiki/0.9 namespace represents the
result of the following steps:
Historically, at most one image
element
was allowed to be inserted after
the body
element in the HTML
namespace. Its content has to be
a Base64-encoded PNG or JPEG
(JFIF) image data.
replace
element in the
SuikaWiki/0.9 namespaceby
This element MUST NOT be used.
... by
text
element in the
SuikaWiki/0.9 namespaceThis element MUST NOT be used.
...
csection
element in the
SuikaWiki/0.10 namespaceclass
...
src
element in the
SuikaWiki/0.10 namespaceclass
...
refs
element in the
SuikaWiki/0.9 namespaceclass
The refs
element represents a list of referenced
documents.
sw-see
element in the
SuikaWiki/0.9 namespaceclass
The sw-see
element represents
a text that contains references to other relevant parts (typically
other pages in the same Wiki or other pieces in the same document)
associated with the surrounding texts.
A node's inline direction mode is
either horizontal mode or vertical mode.
Unless otherwise specified, the node's inline
direction mode is the same as
the node's parent's inline direction
mode if the node's parent is
not null
, or horizontal mode otherwise.
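The inheritance rule above can be sketched as a walk up the tree (non-normative). The `Node` class and its `own_mode` field are hypothetical: `own_mode` stands for a mode that is "otherwise specified" by an element's definition, and is `None` when the node simply inherits.

```python
from dataclasses import dataclass
from typing import Optional

HORIZONTAL = "horizontal"
VERTICAL = "vertical"


@dataclass
class Node:
    name: str
    parent: Optional["Node"] = None
    own_mode: Optional[str] = None  # set only when otherwise specified


def inline_direction_mode(node):
    """Resolve a node's inline direction mode.

    The nearest node (self or ancestor) with an explicit mode wins;
    a tree without any explicit mode is in horizontal mode.
    """
    current = node
    while current is not None:
        if current.own_mode is not None:
            return current.own_mode
        current = current.parent
    return HORIZONTAL
```

For example, an `em` inside an element that specifies vertical mode resolves to vertical mode, while a node with no parent and no explicit mode resolves to horizontal mode.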
In horizontal mode, characters are intended to be stacked horizontally, left-to-right or right-to-left. Vertical glyphs are not used.
In vertical mode, characters are intended to be stacked vertically, top-to-bottom or bottom-to-top. Vertical glyphs are used.
Vertical glyphs
are glyphs used in vertical writings. For
Unicode characters, they are described by
the Vertical_Orientation
property of
the code points.
When an element is defined as turned, it is rotated 180 degrees. For example, a left-to-right text becomes a right-to-left text with 180-degree rotated glyphs, a top-to-bottom text becomes a bottom-to-top text with 180-degree rotated glyphs, and an underline becomes an overline.
When an element is defined as mirrored, it is reversed at the horizontal center line in horizontal mode or at the vertical center line in vertical mode. For example, a left-to-right text becomes a right-to-left text with horizontally mirrored glyphs and a top-to-bottom text becomes a bottom-to-top text with vertically mirrored glyphs.
A node has horizontal orientation and vertical orientation. Unless otherwise specified, a horizontal mode node's horizontal orientation is its natural orientation and vertical orientation is its 90-degree clockwise rotation (that is, the left of the context becomes the bottom for the node's content). Unless otherwise specified, a vertical mode node's vertical orientation is its natural orientation and the node's horizontal orientation is its 270-degree clockwise rotation (that is, the right of the context becomes the top for the node's content). These rotations are applied after any turning.
For a horizontal mode node, its children represent their horizontal orientation. For a vertical mode node, its children represent their vertical orientation.
A directional group is a sequence of zero or more flow content with a specific set of inline-direction/block-direction. It is expected to be rendered in a way that is appropriate for the directions. For example, a right-to-left/top-to-bottom directional group is styled such that texts are right-aligned and blocks are top-aligned.
A directional box is a directional group. A directional box's horizontal orientation and vertical orientation are its natural orientation.
sw-l
element in the
SuikaWiki/0.9 namespaceclass
The sw-l
element represents
a left-to-right, horizontal mode text.
sw-lt
element in the
SuikaWiki/0.9 namespaceclass
The sw-lt
element represents
a turned left-to-right, horizontal mode
text.
sw-r
element in the
SuikaWiki/0.9 namespaceclass
The sw-r
element represents
a right-to-left, horizontal mode text.
sw-rt
element in the
SuikaWiki/0.9 namespaceclass
The sw-rt
element represents
a turned right-to-left, horizontal mode
text.
sw-v
element in the
SuikaWiki/0.9 namespaceclass
The sw-v
element represents
a top-to-bottom, vertical mode text.
sw-vt
element in the
SuikaWiki/0.9 namespaceclass
The sw-vt
element represents
a turned top-to-bottom, vertical mode text.
sw-vb
element in the
SuikaWiki/0.9 namespaceclass
The sw-vb
element represents
a bottom-to-top, vertical mode text.
sw-vbt
element in the
SuikaWiki/0.9 namespaceclass
The sw-vbt
element represents
a turned bottom-to-top, vertical mode text.
sw-tate
element in the
SuikaWiki/0.9 namespaceclass
The sw-tate
element represents
a vertical mode text embedded in horizontal
mode.
The sw-tate
element's horizontal
orientation and vertical orientation are its
natural orientation.
The sw-tate
element MUST
NOT be used in vertical mode.
yoko
element in the
SuikaWiki/0.9 namespaceclass
The yoko
element represents
a horizontal mode text embedded in vertical
mode.
The yoko
element's horizontal
orientation and vertical orientation are its
natural orientation.
The yoko
element MUST
NOT be used in horizontal mode.
The element can be used to mark up a 縦中横 text span or an upright Latin letter in a vertical text.
sw-mirrored element in the SuikaWiki/0.9 namespace
class
The sw-mirrored element represents a mirrored text.
sw-left element in the SuikaWiki/0.9 namespace
class
The sw-left element represents a left-to-right/top-to-bottom directional group in horizontal mode.
sw-right element in the SuikaWiki/0.9 namespace
class
The sw-right element represents a right-to-left/top-to-bottom directional group in horizontal mode.
sw-vlr element in the SuikaWiki/0.9 namespace
class
The sw-vlr element represents a top-to-bottom/left-to-right directional group in vertical mode.
sw-vrl element in the SuikaWiki/0.9 namespace
class
The sw-vrl element represents a top-to-bottom/right-to-left directional group in vertical mode.
sw-leftbox element in the SuikaWiki/0.9 namespace
class
The sw-leftbox element represents a left-to-right/top-to-bottom directional box in horizontal mode.
sw-rightbox element in the SuikaWiki/0.9 namespace
class
The sw-rightbox element represents a right-to-left/top-to-bottom directional box in horizontal mode.
sw-leftbtbox element in the SuikaWiki/0.9 namespace
class
The sw-leftbtbox element represents a left-to-right/bottom-to-top directional box in vertical mode.
Though it is desired that the element be in horizontal mode, it is defined as in vertical mode, such that the bottom-to-top block direction can be emulated by CSS 'writing-mode: vertical-lr' with 270-degree rotation.
sw-rightbtbox element in the SuikaWiki/0.9 namespace
class
The sw-rightbtbox element represents a right-to-left/bottom-to-top directional box in vertical mode.
Though it is desired that the element be in horizontal mode, it is defined as in vertical mode, such that the bottom-to-top block direction can be emulated by CSS 'writing-mode: vertical-lr' with 270-degree rotation.
sw-vlrbox element in the SuikaWiki/0.9 namespace
class
The sw-vlrbox element represents a bottom-to-top/left-to-right directional box in vertical mode.
sw-vrlbox element in the SuikaWiki/0.9 namespace
class
The sw-vrlbox element represents a bottom-to-top/right-to-left directional box in vertical mode.
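The directional group and directional box definitions above can be summarized as a lookup table. The following is a minimal sketch in Python, not part of the vocabulary itself: the tuple layout, the table name, and the helper functions are illustrative assumptions, and the values are copied from the definitions above (inline direction first, block direction second).

```python
# Directions as stated in the element definitions above.
# Tuple layout (an illustrative choice): (inline direction, block direction, mode)
DIRECTIONAL_ELEMENTS = {
    "sw-left":       ("left-to-right", "top-to-bottom", "horizontal"),
    "sw-right":      ("right-to-left", "top-to-bottom", "horizontal"),
    "sw-vlr":        ("top-to-bottom", "left-to-right", "vertical"),
    "sw-vrl":        ("top-to-bottom", "right-to-left", "vertical"),
    "sw-leftbox":    ("left-to-right", "top-to-bottom", "horizontal"),
    "sw-rightbox":   ("right-to-left", "top-to-bottom", "horizontal"),
    "sw-leftbtbox":  ("left-to-right", "bottom-to-top", "vertical"),
    "sw-rightbtbox": ("right-to-left", "bottom-to-top", "vertical"),
    "sw-vlrbox":     ("bottom-to-top", "left-to-right", "vertical"),
    "sw-vrlbox":     ("bottom-to-top", "right-to-left", "vertical"),
}


def is_directional_box(local_name):
    """True for the elements defined above as directional boxes."""
    return local_name in DIRECTIONAL_ELEMENTS and local_name.endswith("box")


def describe(local_name):
    """Rebuild the prose description used in the definitions above."""
    inline, block, mode = DIRECTIONAL_ELEMENTS[local_name]
    kind = "box" if is_directional_box(local_name) else "group"
    return f"{inline}/{block} directional {kind} in {mode} mode"
```

A renderer could use such a table to pick alignment and writing mode per element; how the directions map to concrete CSS properties is left open here.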
fenced element in the SuikaWiki/0.9 namespace
openfence element, followed by a fencedtext element, followed by an optional closefence element, optionally intermixed with script-supporting elements.
class
The fenced element represents a chunk of content with parentheses enclosing it.
The fenced element's open fence is the first child openfence element, if any, or null otherwise.
The fenced element's fenced content is the first child fencedtext element, if any, after any open fence of the element.
The fenced element's close fence is the first child closefence element, if any, after any fenced content of the element.
Any other child represents nothing.
The children of the element's non-null open fence or close fence MAY be empty. This is semantically equivalent to the case where the element's open fence or close fence, respectively, is null.
It is expected that the open fence and close fence, if any, are rendered with a height equal to the maximum of the height of the fenced content and the height of the line box usually generated by the open fence or close fence.
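The open fence, fenced content, and close fence definitions above can be sketched as a single left-to-right scan over the children. This is a minimal illustration using Python's xml.etree.ElementTree, not a normative algorithm. The namespace URI below is a placeholder assumption (substitute the real SuikaWiki/0.9 namespace URI), and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: placeholder URI standing in for the SuikaWiki/0.9 namespace.
SW09 = "urn:example:suikawiki:0.9"


def first_child_from(children, local_name, start):
    """Index of the first child named local_name at or after start, else None."""
    wanted = f"{{{SW09}}}{local_name}"
    for i in range(start, len(children)):
        if children[i].tag == wanted:
            return i
    return None


def fenced_parts(fenced):
    """Return the open fence, fenced content, and close fence (None if absent)."""
    children = list(fenced)
    parts = {}
    pos = 0  # each part is searched for only after the previously found part
    for key, name in (("open", "openfence"),
                      ("content", "fencedtext"),
                      ("close", "closefence")):
        i = first_child_from(children, name, pos)
        parts[key] = children[i] if i is not None else None
        if i is not None:
            pos = i + 1
    return parts
```

With this sketch, a closefence child that appears before the fencedtext child is not selected as the close fence, which matches the rule that any other child represents nothing.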
openfence element in the SuikaWiki/0.9 namespace
class
The openfence element represents the parenthesis before the fenced content if it is an open fence. Otherwise, it represents its children.
fencedtext element in the SuikaWiki/0.9 namespace
class
The fencedtext element represents the content enclosed by parentheses if it is a fenced content. Otherwise, it represents its children.
closefence element in the SuikaWiki/0.9 namespace
class
The closefence element represents the parenthesis after the fenced content if it is a close fence. Otherwise, it represents its children.
lines element in the SuikaWiki/0.9 namespace
line elements, optionally intermixed with script-supporting elements.
class
The lines element represents zero or more spans of parallel contents embedded within a line.
line element in the SuikaWiki/0.9 namespace
class
The line element represents a piece of content in the parent lines element, if any. Otherwise, it represents its content.
insert element in the SuikaWiki/0.9 namespace
class
delete element in the SuikaWiki/0.9 namespace
class
ed element in the SuikaWiki/0.10 namespace
...
asis element in the SuikaWiki/0.9 namespace
title element, optionally intermixed with script-supporting elements.
class
The asis element represents a chunk of content with an annotation that the content is repeated from the source material as-is. This annotation is typically used within a quotation to indicate that the content has an error that occurs in the original text (and is not produced by the author of the document).
If the last child element is a title element, it is the annotation (and is not part of the annotated content). A user agent is expected to show the annotation as such. If there is no title element child, a user-agent dependent annotation SHOULD be used instead.
Typically a short text such as “(sic)” or “ママ” is used as the annotation.
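The annotation rule for the asis element can be sketched as follows. This is an illustrative reading only: both namespace URIs are placeholder assumptions, the function name is hypothetical, and the fallback string stands in for whatever user-agent dependent annotation an implementation chooses.

```python
import xml.etree.ElementTree as ET

# Assumptions: placeholder URIs standing in for the real namespaces.
SW09 = "urn:example:suikawiki:0.9"
SW010 = "urn:example:suikawiki:0.10"

# Hypothetical user-agent dependent fallback annotation.
DEFAULT_ASIS_ANNOTATION = "(sic)"


def asis_annotation(asis):
    """Text of the annotation for an asis element.

    Uses the last child element when it is a title element; otherwise
    falls back to the user-agent default. A title child that is not the
    last child element is not treated as the annotation in this sketch.
    """
    children = list(asis)
    if children and children[-1].tag == f"{{{SW010}}}title":
        return "".join(children[-1].itertext())
    return DEFAULT_ASIS_ANNOTATION
```

A renderer would typically show the returned text next to the annotated content, for example as a small note.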
snip element in the SuikaWiki/0.9 namespace
class
The snip element represents the point where some content in the source material is omitted. It is typically used within a quotation.
If the element has children, they represent the annotation placed instead of the original text.
It is expected that the element's children are rendered. If the element has no children, a user-agent dependent annotation SHOULD be used instead.
Typically, a short text such as “...” or “(中略)” is used as the annotation.
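The fallback behavior of the snip element can be sketched in a few lines. This is a hedged illustration: the namespace URI is a placeholder assumption, the function name is hypothetical, and the default string is only one possible user-agent dependent annotation.

```python
import xml.etree.ElementTree as ET

# Assumption: placeholder URI standing in for the SuikaWiki/0.9 namespace.
SW09 = "urn:example:suikawiki:0.9"

# Hypothetical user-agent dependent fallback annotation.
DEFAULT_SNIP_ANNOTATION = "..."


def snip_annotation(snip):
    """Text rendered for a snip element.

    ElementTree stores leading text in .text and child elements in the
    element itself, so both are checked to decide whether the element
    has any children in the DOM sense.
    """
    has_children = bool(snip.text) or len(snip) > 0
    if has_children:
        return "".join(snip.itertext())
    return DEFAULT_SNIP_ANNOTATION
```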
emph element in the SuikaWiki/0.9 namespace
class
The emph element is equivalent to the em element.
The class attribute and CSS can be used to specify presentations of emphasized texts, such as italic and various kinds of annotation marks.
rubyb element in the SuikaWiki/0.9 namespace
rt element, optionally intermixed with script-supporting elements.
class
The rubyb element is equivalent to the ruby element except that it only contains the base text (phrasing content) and the secondary ruby text (an rt element).
It is expected that the first rt element child, if any, is rendered after the other children before it.
okuri element in the SuikaWiki/0.9 namespace
rt element, optionally followed by an rt element, optionally intermixed with script-supporting elements.
class
The okuri element represents okuri-ganas (送り仮名) annotated after the base character in Kambun (漢文).
Its first rt element child, if any, represents the primary ruby text. Its second rt element child, if any, represents the secondary ruby text. The base character is the character just before the okuri element (ignoring out-of-flow content such as other ruby texts).
It is expected that, in a vertical text, the primary ruby text, if any, is rendered at the right bottom of the base character and the secondary ruby text, if any, is rendered at the left bottom of the base character. Note that the base character might have its own ruby texts (i.e. it is part of a ruby element).
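Selecting the primary and secondary ruby text of an okuri element can be sketched as below. This is an assumption-laden illustration: the namespace of the rt element is not stated in this section, so the sketch matches rt children by local name only, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an ElementTree '{namespace}' prefix from a tag, if present."""
    return tag.rsplit("}", 1)[-1]


def okuri_ruby_texts(okuri):
    """Return (primary, secondary) rt children of an okuri element.

    The first rt child is the primary ruby text and the second rt child
    is the secondary ruby text; either may be None when absent.
    """
    rts = [child for child in okuri
           if isinstance(child.tag, str) and local_name(child.tag) == "rt"]
    primary = rts[0] if len(rts) > 0 else None
    secondary = rts[1] if len(rts) > 1 else None
    return primary, secondary
```

Finding the base character (the character just before the okuri element) requires access to the preceding content and is outside this sketch.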
weak element in the SuikaWiki/0.9 namespace
class
...
title element in the SuikaWiki/0.10 namespace
The title element represents a text that would be specified as the title attribute of the parent element, if applicable. Otherwise, it represents nothing.
A title element MAY be the last non-script-supporting child element of an abbr, dfn, or span element parent when there is no title attribute on the parent.
The title element MUST NOT be used when it represents nothing.
dotabove element in the SuikaWiki/0.9 namespace
class
The dotabove element represents content with a dot above it. This is used in mathematical expressions, such as recurring decimals.
It is expected that a dot symbol is rendered above the element's children.
sw-macron element in the SuikaWiki/0.9 namespace
class
The sw-macron element represents content with an overline above it. This is used in mathematical expressions, such as sample mean variables.
It is expected that an overline is rendered above the element's children.
vector element in the SuikaWiki/0.9 namespace
class
The vector element represents content with a right-pointing arrow above it. This is used in mathematical expressions to denote a vector.
It is expected that a right-pointing arrow is rendered above the element's children.
This element SHOULD NOT be used for other possible presentations of vectors, such as bold face without arrow.
subsup element in the SuikaWiki/0.9 namespace
subscript element, followed by a superscript element, optionally intermixed with script-supporting elements.
class
The subsup element represents a pair of subscript and superscript chunks of contents.
The subscript content of a subsup element is the first subscript element child, if any.
The superscript content of a subsup element is the first child superscript element, if any, after the subscript content if there is one.
Any other child represents nothing.
It is expected that the subscript content is rendered as subscript and the superscript content is rendered as superscript, sharing the same horizontal space.
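The selection of subscript content and superscript content can be sketched in the same way as for other ordered children. This is a minimal illustration, not a normative algorithm: the namespace URI is a placeholder assumption and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: placeholder URI standing in for the SuikaWiki/0.9 namespace.
SW09 = "urn:example:suikawiki:0.9"
SUBSCRIPT = f"{{{SW09}}}subscript"
SUPERSCRIPT = f"{{{SW09}}}superscript"


def subsup_parts(subsup):
    """Return (subscript content, superscript content); either may be None."""
    children = list(subsup)
    # Subscript content: the first subscript element child, if any.
    sub_i = next((i for i, c in enumerate(children) if c.tag == SUBSCRIPT), None)
    # Superscript content: the first superscript child after the subscript
    # content when there is one, otherwise the first superscript child.
    start = 0 if sub_i is None else sub_i + 1
    sup_i = next((i for i in range(start, len(children))
                  if children[i].tag == SUPERSCRIPT), None)
    sub = children[sub_i] if sub_i is not None else None
    sup = children[sup_i] if sup_i is not None else None
    return sub, sup
```

A superscript child placed before the subscript content is skipped by this sketch, consistent with the rule that any other child represents nothing.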
subscript element in the SuikaWiki/0.9 namespace
class
The subscript element is semantically equivalent to the sub element.
superscript element in the SuikaWiki/0.9 namespace
class
The superscript element is semantically equivalent to the sup element.
Some elements are defined as elements with value. For an element element with value, the element value is the value returned by the following steps:
attrvalue
element:
attrvalue
element child
of element.
f element in the SuikaWiki/0.9 namespace
class
The f element represents a field name or key of some structure, such as a field name of a C data structure, a key of a Perl hash, or a property name of an XML information item.
key element in the SuikaWiki/0.10 namespace
class
...
n element in the SuikaWiki/0.9 namespace
attrvalue element, optionally intermixed with script-supporting elements.
class
The n element represents the number given as the element value.
lat element in the SuikaWiki/0.9 namespace
attrvalue element, optionally intermixed with script-supporting elements.
class
The lat element represents a latitude given as the element value.
lon element in the SuikaWiki/0.9 namespace
attrvalue element, optionally intermixed with script-supporting elements.
class
The lon element represents a longitude given as the element value.
tz element in the SuikaWiki/0.9 namespace
attrvalue element, optionally intermixed with script-supporting elements.
class
A tz element represents a time-zone offset given as the element value.
attrvalue element in the SuikaWiki/0.10 namespace
class
The attrvalue element represents the element value of the parent element, if applicable, or the text that would be set to the datetime attribute of the parent element, if it is a time element without a datetime attribute. Otherwise, it represents nothing.
An attrvalue MAY be the last non-script-supporting child element of a time element.
An attrvalue MUST NOT be used when it represents nothing.
MUST element in the SuikaWiki/0.9 namespace
class
The MUST element represents an RFC 2119 keyword "MUST".
It can also be used for keywords "REQUIRED", "SHALL", "MUST NOT", and "SHALL NOT".
SHOULD element in the SuikaWiki/0.9 namespace
class
The SHOULD element represents an RFC 2119 keyword "SHOULD".
It can also be used for keywords "RECOMMENDED", "SHOULD NOT", and "NOT RECOMMENDED".
MAY element in the SuikaWiki/0.9 namespace
class
The MAY element represents an RFC 2119 keyword "MAY".
It can also be used for the keyword "OPTIONAL".
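The keyword-to-element assignments above can be captured in a small lookup. The sketch below is illustrative only: the table name and the helper function are hypothetical, while the keyword groupings are taken from the three definitions above.

```python
# Which element is used to mark up each RFC 2119 keyword,
# following the MUST, SHOULD and MAY element definitions above.
KEYWORD_ELEMENT = {
    "MUST": "MUST",
    "REQUIRED": "MUST",
    "SHALL": "MUST",
    "MUST NOT": "MUST",
    "SHALL NOT": "MUST",
    "SHOULD": "SHOULD",
    "RECOMMENDED": "SHOULD",
    "SHOULD NOT": "SHOULD",
    "NOT RECOMMENDED": "SHOULD",
    "MAY": "MAY",
    "OPTIONAL": "MAY",
}


def element_for_keyword(keyword):
    """Local name of the element for a keyword, or None if not a keyword.

    Internal white space is collapsed so that a keyword broken across
    lines (e.g. "MUST\nNOT") is still recognized. Matching is
    case-sensitive because RFC 2119 keywords are written in uppercase.
    """
    normalized = " ".join(keyword.split())
    return KEYWORD_ELEMENT.get(normalized)
```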
sw-cursive element in the SuikaWiki/0.9 namespace
class
The sw-cursive element represents a text in a cursive font.
smallcaps element in the SuikaWiki/0.9 namespace
class
The smallcaps element represents a text in small-capitals.
The characters in the smallcaps element, including but not limited to uppercase and lowercase Latin letters, are expected to be rendered using small-capital glyphs, whenever possible.
sw-br element in the SuikaWiki/0.9 namespace
class
The sw-br element represents an explicit line break.
The element's children, if not empty, represent the hyphenation or other symbols indicating the end of the line, placed just before the line break.
The sw-br element is expected to be rendered as if there were a U+000A LINE FEED character at the end of the element.
White space characters in the sw-br element and the U+000A character at the end of the element are expected to be processed as preserved white spaces.
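The expected rendering of the sw-br element can be illustrated by a plain-text flattening routine. This is a sketch under stated assumptions, not a rendering algorithm: the namespace URI is a placeholder, the function name is hypothetical, and white space handling beyond appending the line feed is not modeled.

```python
import xml.etree.ElementTree as ET

# Assumption: placeholder URI standing in for the SuikaWiki/0.9 namespace.
SW09 = "urn:example:suikawiki:0.9"
SW_BR = f"{{{SW09}}}sw-br"


def render_text(elem):
    """Flatten elem to text, appending U+000A at the end of every sw-br."""
    out = [elem.text or ""]
    for child in elem:
        out.append(render_text(child))
        out.append(child.tail or "")
    if elem.tag == SW_BR:
        # The line feed goes after the element's own children, so any
        # hyphenation symbol inside sw-br stays just before the break.
        out.append("\n")
    return "".join(out)
```

For example, content "foo", an sw-br element containing "-", and then "bar" flattens to "foo-" on one line and "bar" on the next.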
qn element in the SuikaWiki/0.10 namespace
nsuri element, optionally intermixed with script-supporting elements.
class
...
qname element in the SuikaWiki/0.10 namespace
This element MUST NOT be used.
...
nsuri element in the SuikaWiki/0.10 namespace
...
class
Uppercase elements are elements in the SuikaWiki/0.10 namespace whose local name consists of one or more uppercase letters.
These elements MUST NOT be used.
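A conformance checker could detect uppercase elements with a predicate like the one below. This is a hedged sketch: the namespace URI is a placeholder assumption, the function name is hypothetical, and "uppercase letters" is assumed here to mean ASCII A to Z, which the definition above does not state explicitly.

```python
import re

# Assumption: placeholder URI standing in for the SuikaWiki/0.10 namespace.
SW010 = "urn:example:suikawiki:0.10"

# Assumption: "uppercase letters" is read as ASCII uppercase letters.
_UPPERCASE_NAME = re.compile(r"[A-Z]+")


def is_uppercase_element(tag):
    """True if an ElementTree-style '{namespace}local' tag names an uppercase element."""
    if not tag.startswith("{"):
        return False  # no namespace, so not in the SuikaWiki/0.10 namespace
    namespace, _, local = tag[1:].partition("}")
    return namespace == SW010 and _UPPERCASE_NAME.fullmatch(local) is not None
```

Note that the MUST, SHOULD and MAY elements are defined in the SuikaWiki/0.9 namespace, so they are not uppercase elements under this definition.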
The node tree returned by the parsing algorithm does not contain any script element. The following nodes might contain URLs, which are potentially dangerous (e.g. a javascript: URL):
resParameter attribute in the SuikaWiki/0.9 namespace
nsuri element in the SuikaWiki/0.10 namespace
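One way a consumer might guard against dangerous URLs in these nodes is a scheme allowlist. The following is a minimal sketch, not a requirement of this document: the allowlist contents and the function name are assumptions, and a real application would choose the schemes it needs.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; adjust to the schemes the application supports.
SAFE_SCHEMES = {"http", "https"}


def is_safe_url(value):
    """True only if value is an absolute URL whose scheme is allowlisted.

    Relative references have an empty scheme and are rejected here,
    which is a deliberately conservative choice for this sketch.
    """
    try:
        scheme = urlsplit(value.strip()).scheme.lower()
    except ValueError:
        return False
    return scheme in SAFE_SCHEMES
```

An allowlist is used rather than a blocklist of known bad schemes, so that unexpected schemes are rejected by default.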
The image element in the SuikaWiki/0.9 namespace might contain Base64-encoded binary data that is not in fact an image (e.g. an executable binary).
There are test data.
There is a Perl implementation.
This document is written by Wakaba <wakaba@suikawiki.org>.
This document is developed as part of the SuikaWiki project.
Per CC0, to the extent possible under law, the author has waived all copyright and related or neighboring rights to this work.