Diff of /markup/xml/xmlcc/xmlcc-work.en.html

revision 1.19 by wakaba, Sat Dec 1 14:51:45 2007 UTC revision 1.24 by wakaba, Fri Mar 28 15:37:01 2008 UTC
# Line 18  Line 18 
18    
19  <div class="header">  <div class="header">
20  <h1>manakai's <abbr>XML</abbr> Conformance Checking</h1>  <h1>manakai's <abbr>XML</abbr> Conformance Checking</h1>
21  <h2>Working Draft <time datetime=2007-12-01>1 December 2007</time></h2>  <h2>Working Draft <time datetime=2008-03-29>29 March 2008</time></h2>
22    
23  <dl class="versions-uri">  <dl class="versions-uri">
24  <dt>This Version</dt>  <dt>This Version</dt>
# Line 42  Line 42 
42        >w@suika.fam.cx</a>&gt;</code></dd>        >w@suika.fam.cx</a>&gt;</code></dd>
43  </dl>  </dl>
44    
45  <p class="copyright" lang="en">&#xA9; <time>2007</time> <a  <p class="copyright" lang="en">&#xA9; <time>2007</time>&#x2013;<time>2008</time> <a
46      href="http://suika.fam.cx/~wakaba/who?" rel="author">Wakaba</a>.      href="http://suika.fam.cx/~wakaba/who?" rel="author">Wakaba</a>.
47  Permission is granted to copy, distribute and/or modify this document  Permission is granted to copy, distribute and/or modify this document
48  under the terms of the <a rel="license"  under the terms of the <a rel="license"
# Line 118  else in this specification is normative. Line 118  else in this specification is normative.
118  <p><span class=ed>Algorithm is normative but non-normative</span>.  <p><span class=ed>Algorithm is normative but non-normative</span>.
119  In addition, the order in which <a href="#errors">errors</a> are  In addition, the order in which <a href="#errors">errors</a> are
120  raised is undefined.</p>  raised is undefined.</p>
121    
122    <p>This document sometimes links to parts of the <abbr>XML</abbr> 1.0
123    specification.  When the document being processed is an <abbr>XML</abbr> 1.1
124    document, however, the corresponding parts of the <abbr>XML</abbr> 1.1
125    specification should be consulted instead.</p>
126  </div>  </div>
127    
128    
# Line 209  can be easily serialized into a valid XM Line 214  can be easily serialized into a valid XM
214    
215  <div class=ed><p>@@ TODO: #dt-atuseroption at user option  <div class=ed><p>@@ TODO: #dt-atuseroption at user option
216  (MAY or MUST), #dt-compat for compatibility,  (MAY or MUST), #dt-compat for compatibility,
217  #dt-interop for interoperability</p></div>  #dt-interop for interoperability</p>
218    
219    <p>TODO: XML 1.1, Namespaces in XML 1.0/1.1, xml:base, xml:id
220    </div>
221    
222  </div>  </div>
223    
224  <div class=section id=parsing-xml>  <div class=section id=parsing-xml>
225  <h2>Parsing <abbr>XML</abbr> Document</h2>  <h2>Parsing <abbr>XML</abbr> Document</h2>
226    
227  <ul>  <p>When an <abbr>XML</abbr> document is parsed, the following clauses
228  <li>If the <abbr>XML</abbr> document does not begin with an  are applied:</p>
229  <abbr>XML</abbr> declaration, then raise an  <dl>
230  <a href="#xml-misc-recommentation" id=xmr-xml-decl><code>xml-misc-recommendation</code></a>.</li>  <dt>For each document
231  <li>If the replacement text of an entity declaration is    <dd>If the <abbr>XML</abbr> document does not begin with an
232  <code>&lt;</code>, then raise an    <abbr>XML</abbr> declaration, then the parser <em class=rfc2119>MUST</em>
233  <a href="#xml-misc-warning" id=xmw-entity-value-lt><code>xml-misc-warning</code></a>.<!--    raise an
234      <a href="#xml-misc-recommentation" id=xmr-xml-decl><code>xml-misc-recommendation</code></a>.
235    <dt>For each internal general entity declaration processed by the parser
236      <dd>If the
237      <a href="http://www.w3.org/TR/2006/REC-xml-20060816/#NT-EntityValue"><code>EntityValue</code></a>
238      part of the general entity declaration contains a bare <code>U+003C</code>
239      <code>LESS-THAN SIGN</code> (<code>&lt;</code>) character, then the parser
240      <em class=rfc2119>MUST</em> raise an
241      <a href="#xml-misc-warning" id=xmw-entity-value-lt><code>xml-misc-warning</code></a>.<!--
242  "strongly advised to avoid" in a Note in Section 2.3 of [XML10], [XML11].  "strongly advised to avoid" in a Note in Section 2.3 of [XML10], [XML11].
243  --></li>  -->
244  <li>If there is an element type declaration whose <code>Name</code>  <dt>For each element type declaration processed by the parser
245  value is already declared, then raise an    <dd>If there is another element type declaration whose <code>Name</code>
246  <a href="#xml-validity-error" id=vc-edunique><code>xml-validity-error</code></a>.</li>    is equal to the <code>Name</code> of the element type declaration, then
247  <li>If attribute definition whose <code>Name</code> is    the parser <em class=rfc2119>MUST</em> raise an
248  <code>xml:space</code> has <span class=ed>declared type different from    <a href="#xml-validity-error" id=vc-edunique><code>xml-validity-error</code></a>.
249  (default|preserve), (default), or (preserve)</span>, then raise an  <!--
250  <a href="#xml-misc-error" id=xme-ad-xml-space><code>xml-misc-error</code></a>.    NOTE: <!ATTLIST a xml:space (default) #IMPLIED xml:space CDATA #IMPLIED>
251  <span class=ed>@@ duplication with    will not be warned.
252  <a href="#xml-at-xml-space">#xml-at-xml-space</a>.<!--  -->
253  <!ATTLIST e xml:space CDATA #IMPLIED xml:space CDATA #IMPLIED> --></span></li>  <dt>For each empty-element tag
254  <li>If an empty-element tag is used for an element which is <em>not</em>    <dd>If the <code>Name</code> of the tag is not declared by a processed
255  declared <code>EMPTY</code>, then raise an    element type declaration as <code>EMPTY</code> content, then the parser
256  <a href="#xml-misc-recommentation" id=xmr-emptyelemtag-not-empty><code>xml-misc-recommendation</code></a>.</li>    <em class=rfc2119>MUST</em> raise an
257  <li>If an empty-element tag is <em>not</em> used for an element which is    <a href="#xml-misc-recommentation" id=xmr-emptyelemtag-not-empty><code>xml-misc-recommendation</code></a>.
258  declared <code>EMPTY</code>, then raise an  <dt>For each start-tag
259  <a href="#xml-misc-recommentation" id=xmr-empty-not-emptyelemtag><code>xml-misc-recommendation</code></a>.</li>    <dd>If the <code>Name</code> of the tag is declared by a processed element
260      type declaration as <code>EMPTY</code> content, then the parser
261      <em class=rfc2119>MUST</em> raise an
262      <a href="#xml-misc-recommentation" id=xmr-empty-not-emptyelemtag><code>xml-misc-recommendation</code></a>.
263    </dl>
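The clauses above can be sketched as a set of per-construct hooks that a parser calls while it reads the document. This is a minimal, non-normative sketch: the class <code>ParseClauseChecker</code>, its method names, and the <code>(error type, id)</code> tuples are hypothetical and are not manakai's actual parser interface. It assumes the document text has already been decoded with any byte order mark removed, and that the EntityValue is passed in as written, before references are expanded.

```python
import re
from dataclasses import dataclass, field

# "<?xml" followed by white space (production S) starts an XML declaration;
# this deliberately does not match PIs such as "<?xml-stylesheet".
_XML_DECL_START = re.compile(r"<\?xml[ \t\r\n]")


@dataclass
class ParseClauseChecker:
    """Hypothetical sketch of the per-construct parsing clauses."""
    errors: list = field(default_factory=list)
    # element type Name -> content spec ("EMPTY", "ANY", "(#PCDATA)", ...)
    element_types: dict = field(default_factory=dict)

    def document(self, text: str) -> None:
        # For each document: no XML declaration at the very beginning.
        if not _XML_DECL_START.match(text):
            self.errors.append(("xml-misc-recommendation", "xmr-xml-decl"))

    def internal_general_entity(self, entity_value: str) -> None:
        # entity_value is the literal as written, so a character
        # reference such as "&#60;" is not a bare "<".
        if "<" in entity_value:
            self.errors.append(("xml-misc-warning", "xmw-entity-value-lt"))

    def element_type_decl(self, name: str, content_spec: str) -> None:
        # Raised when a later declaration repeats an earlier Name.
        # Keeping the first declaration is an assumption of this sketch.
        if name in self.element_types:
            self.errors.append(("xml-validity-error", "vc-edunique"))
        else:
            self.element_types[name] = content_spec

    def empty_element_tag(self, name: str) -> None:
        # <e/> for an element type that is undeclared or not declared EMPTY.
        if self.element_types.get(name) != "EMPTY":
            self.errors.append(
                ("xml-misc-recommendation", "xmr-emptyelemtag-not-empty"))

    def start_tag(self, name: str) -> None:
        # <e> (a start-tag) for an element type declared EMPTY.
        if self.element_types.get(name) == "EMPTY":
            self.errors.append(
                ("xml-misc-recommendation", "xmr-empty-not-emptyelemtag"))
```

Which of two duplicate element type declarations a real parser keeps is not specified by the clauses above; the sketch keeps the first only so that the EMPTY checks have something to consult.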
264    
265    <p>In addition, the parser <em class=rfc2119>MUST</em> check whether the
266    following constraints are met.
267    
268    <p><strong>Well-formedness constraints</strong>.  For each violation of
269    one of the constraints below, an
270    <a href="#xml-well-formedness-error"><code>xml-well-formedness-error</code></a>
271    <em class=rfc2119>MUST</em> be raised.  The well-formedness
272    constraints are:
273    <ul>
274    <li><a href="http://www.w3.org/TR/2006/REC-xml-20060816/#wfc-PEinInternalSubset">Well-formedness constraint: PEs in Internal Subset</a>
275    <li><a href="http://www.w3.org/TR/2006/REC-xml-20060816/#ExtSubset">Well-formedness constraint: External Subset</a>
276    <li><a href="http://www.w3.org/TR/2006/REC-xml-20060816/#PE-between-Decls">Well-formedness constraint: PE Between Declarations</a>
277    <li><a href="http://www.w3.org/TR/2006/REC-xml-20060816/#GIMatch">Well-formedness constraint: Element Type Match</a>
278    <li><a href="http://www.w3.org/TR/2006/REC-xml-20060816/#uniqattspec">Well-formedness constraint: Unique Att Spec</a>
279    <li><a href="http://www.w3.org/TR/2006/REC-xml-20060816/#NoExternalRefs">Well-formedness constraint: No External Entity References</a>
280    <li><a href="http://www.w3.org/TR/2006/REC-xml-20060816/#CleanAttrVals">Well-formedness constraint: No &lt; in Attribute Values</a>
281    <li><a href="http://www.w3.org/TR/2006/REC-xml-20060816/#wf-Legalchar">Well-formedness constraint: Legal Character</a>
282    <li><a href="http://www.w3.org/TR/2006/REC-xml-20060816/#textent">Well-formedness constraint: Parsed Entity</a>
283    <li><a href="http://www.w3.org/TR/2006/REC-xml-20060816/#norecursion">Well-formedness constraint: No Recursion</a>
284    <li><a href="http://www.w3.org/TR/2006/REC-xml-20060816/#indtd">Well-formedness constraint: In DTD</a>
285    </ul>
286    
287    <p><strong>Validity constraints</strong>.  For each violation of
288    one of the constraints below, an
289    <a href="#xml-validity-error"><code>xml-validity-error</code></a>
290    <em class=rfc2119>MUST</em> be raised.  The validity
291    constraints are:
292    <ul>
293    <li><a href="http://www.w3.org/TR/2006/REC-xml-20060816/#vc-PEinMarkupDecl">Validity constraint: Proper Declaration/PE Nesting</a>
294    <li><a href="http://www.w3.org/TR/2006/REC-xml-20060816/#vc-PEinGroup">Validity constraint: Proper Group/PE Nesting</a>
295    <li><a href="http://www.w3.org/TR/2006/REC-xml-20060816/#condsec-nesting">Validity constraint: Proper Conditional Section/PE Nesting</a>
296    <li><a href="http://www.w3.org/TR/2006/REC-xml-20060816/#UniqueNotationName">Validity constraint: Unique Notation Name</a>
297    </ul>
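The two constraint lists above differ mainly in the error type that is raised. As a minimal sketch, assuming the parser has already collected the relevant Names, the Unique Att Spec and Unique Notation Name constraints both reduce to duplicate detection. The function names are hypothetical, and reporting one error per repeated occurrence is an assumption about what counts as "each violation".

```python
from typing import Iterable, Iterator, List, Tuple

Error = Tuple[str, str]  # (error type, offending Name)


def duplicates(names: Iterable[str]) -> Iterator[str]:
    """Yield every Name that repeats an earlier Name, in document order."""
    seen = set()
    for name in names:
        if name in seen:
            yield name
        else:
            seen.add(name)


def check_unique_att_spec(attribute_names: Iterable[str]) -> List[Error]:
    # Well-formedness constraint: Unique Att Spec (attribute names of one tag).
    return [("xml-well-formedness-error", n) for n in duplicates(attribute_names)]


def check_unique_notation_name(notation_names: Iterable[str]) -> List[Error]:
    # Validity constraint: Unique Notation Name (Names of all notation
    # declarations processed by the parser).
    return [("xml-validity-error", n) for n in duplicates(notation_names)]
```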
298    
299    <p><strong>Other criteria</strong>.  For each violation of
300    one of the constraints below, an
301    <a href="#xml-misc-recommendation"><code>xml-misc-recommendation</code></a>
302    <em class=rfc2119>MUST</em> be raised.  The constraints are:
303    <ul>
304    <li><q>For interoperability, if a parameter-entity reference appears in a
305    <code>choice</code>, <code>seq</code>, or <code>Mixed</code> construct, its
306    replacement text <em class=rfc2119>SHOULD</em> contain at least one non-blank
307    character, and neither the first nor last non-blank character of the
308    replacement text <em class=rfc2119>SHOULD</em> be a connector (<code>|</code>
309    or <code>,</code>).</q>
310    <li><q>External parsed entities <em class=rfc2119>SHOULD</em> each begin with a
311    text declaration.</q>
312    </ul>
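The two quoted criteria can be sketched as predicates; when a predicate returns <code>False</code>, an <code>xml-misc-recommendation</code> would be raised. This is a minimal sketch under stated assumptions: "blank" is read as the white space of the XML 1.0 production <code>S</code>, the input is the already-expanded replacement text or decoded entity text, and the function names are hypothetical.

```python
import re

# White space as in the XML 1.0 production S.
XML_S = " \t\r\n"
# A text declaration starts with "<?xml" followed by white space.
_TEXT_DECL_START = re.compile(r"<\?xml[ \t\r\n]")


def pe_in_group_is_interoperable(replacement_text: str) -> bool:
    """Criterion for a PE reference in a choice, seq, or Mixed construct."""
    body = replacement_text.strip(XML_S)
    if not body:
        return False  # no non-blank character at all
    # Neither the first nor the last non-blank character may be a connector.
    return body[0] not in "|," and body[-1] not in "|,"


def external_parsed_entity_has_text_decl(entity_text: str) -> bool:
    """External parsed entities SHOULD each begin with a text declaration."""
    return _TEXT_DECL_START.match(entity_text) is not None
```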
313    
314    <!--
315    <li><q>For interoperability, an XML processor <em class=rfc2119>MAY</em> at user option issue a warning when more than one attribute-list declaration is provided for a given element type, or more than one attribute definition is provided for a given attribute, but this is not an error.</q>
316    
317    <a href="#xml-misc-error" id=xme-attr-xml-space><code>xml-misc-error</code></a>.
318    <li>It is an error for a reference to an unparsed entity to appear in the EntityValue in an entity declaration."
319    -->
320    
321  <!--  <!--
322    
 #vc-PEinMarkupDecl Validity constraint: Proper Declaration/PE Nesting  
 #wfc-PEinInternalSubset Well-formedness constraint: PEs in Internal Subset  
 #ExtSubset Well-formedness constraint: External Subset  
 #PE-between-Decls Well-formedness constraint: PE Between Declarations  
 #GIMatch Well-formedness constraint: Element Type Match  
 #uniqattspec Well-formedness constraint: Unique Att Spec  
 #NoExternalRefs Well-formedness constraint: No External Entity References  
 #CleanAttrVals Well-formedness constraint: No < in Attribute Values  
 #vc-PEinGroup Validity constraint: Proper Group/PE Nesting  
 "For interoperability, if a parameter-entity reference appears in a choice, seq, or Mixed construct, its replacement text SHOULD contain at least one non-blank character, and neither the first nor last non-blank character of the replacement text SHOULD be a connector (| or ,)."  
 "For interoperability, an XML processor MAY at user option issue a warning when more than one attribute-list declaration is provided for a given element type, or more than one attribute definition is provided for a given attribute, but this is not an error."  
 #condsec-nesting Validity constraint: Proper Conditional Section/PE Nesting  
 #wf-Legalchar Well-formedness constraint: Legal Character  
 #textent Well-formedness constraint: Parsed Entity  
 #norecursion Well-formedness constraint: No Recursion  
 #indtd Well-formedness constraint: In DTD  
 "External parsed entities SHOULD each begin with a text declaration."  
 "It is an error for a reference to an unparsed entity to appear in the EntityValue in an entity declaration."  
 #UniqueNotationName Validity constraint: Unique Notation Name  
323    
324  @@ Need detailed review, but maybe should be in parsing phase  @@ Need detailed review, but maybe should be in parsing phase
325    
# Line 284  declared <code>EMPTY</code>, then raise Line 342  declared <code>EMPTY</code>, then raise
342    
343  "It is a fatal error when an XML processor encounters an entity with an encoding that it is unable to process. It is a fatal error if an XML entity is determined (via default, encoding declaration, or higher-level protocol) to be in a certain encoding but contains byte sequences that are not legal in that encoding."  "It is a fatal error when an XML processor encounters an entity with an encoding that it is unable to process. It is a fatal error if an XML entity is determined (via default, encoding declaration, or higher-level protocol) to be in a certain encoding but contains byte sequences that are not legal in that encoding."
344    
345    @@ We should spell out what the parser should do where the XML specification
346    leaves behavior undefined.  For example: a comment must be converted to a Comment node,
347    an illegal xml:space value must be preserved, and so on.
348    
349  -->  -->
 </ul>  
350  </div>  </div>
351    
352  <div class="section" id=checking-dom>  <div class="section" id=checking-dom>
# Line 393  following algorithm <em class=rfc2119>MU Line 454  following algorithm <em class=rfc2119>MU
454  a public identifier (<dfn id=var-pid><var>pid</var></dfn>)</dfn>, the  a public identifier (<dfn id=var-pid><var>pid</var></dfn>)</dfn>, the
455  following algorithm <em class=rfc2119>MUST</em> be used:</p>  following algorithm <em class=rfc2119>MUST</em> be used:</p>
456  <ol>  <ol>
457    <li>If <var>pid</var> is <code>null</code>, abort these steps.</li>  <li>If <var>pid</var> is <code>null</code>, abort these steps.</li>
458    <li>If <var>pid</var> contains any character  <li>If <var>pid</var> contains a character that is <em>not</em> in the
459    that is outside of the range of <code>#x20 | #xD | #xA |  character class <a href="#class-PubidChar"><code>PubidChar</code></a>, then
460    [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]</code><!-- @@ TODO: formal def -->,  raise an
461    then it is an  <a href="#xml-well-formedness-error" id=wfe-pubid-char><code>xml-well-formedness-error</code></a>.</li>
   <a href="#xml-well-formedness-error" id=wfe-pubid-char><code>xml-well-formedness-error</code></a>.</li>  
462    <li>If <var>pid</var> contains one of <code class=char>U+0009</code>    <li>If <var>pid</var> contains one of <code class=char>U+0009</code>
463    <code class=charname>CHARACTER TABULATION</code>,    <code class=charname>CHARACTER TABULATION</code>,
464    <code class=char>U+000A</code> <code class=charname>LINE FEED</code>,    <code class=char>U+000A</code> <code class=charname>LINE FEED</code>,
# Line 414  following algorithm <em class=rfc2119>MU Line 474  following algorithm <em class=rfc2119>MU
474    <span class=ed>Is this really a round-trip problem?  In fact, the XML    <span class=ed>Is this really a round-trip problem?  In fact, the XML
475    spec only defines how to match public identifiers; it defines no    spec only defines how to match public identifiers; it defines no
476    canonical form.</span></li>    canonical form.</span></li>
   <li class=ed>@@ Should we check formal-public-identifierness?</li>  
477  </ol>  </ol>
478    
479  <p>To  <p>To
# Line 561  following:</p> Line 620  following:</p>
620      <li>If <code>nodeName</code> attribute of <var>n</var> is      <li>If <code>nodeName</code> attribute of <var>n</var> is
621      <code>xml:space</code> <span class=ed>@@ or {xml namespace}:space ?</span>      <code>xml:space</code> <span class=ed>@@ or {xml namespace}:space ?</span>
622      and <span class=ed>its declared type is different from (default|preserve),      and <span class=ed>its declared type is different from (default|preserve),
623      (default), or (preserve)</span>, then raise an      (preserve|default), (default), or (preserve)</span>, then raise an
624      <a href="#xml-misc-error" id=xme-at-xml-space><code>xml-misc-error</code></a>.</li>      <a href="#xml-misc-error" id=xme-at-xml-space><code>xml-misc-error</code></a>.</li>
625      <li>For each node <dfn id=var-ad-nc><var>n<sub><var>c</var></sub></var></dfn> in the      <li>For each node <dfn id=var-ad-nc><var>n<sub><var>c</var></sub></var></dfn> in the
626      <code>childNodes</code> list of <var>n</var>,      <code>childNodes</code> list of <var>n</var>,
# Line 996  as amended by Line 1055  as amended by
1055  contains the following characters:</p>  contains the following characters:</p>
1056  <ul class=ed>  <ul class=ed>
1057  </ul>  </ul>
1058    <div class="note memo">
1059    <p>This character class contains all characters allowed as the first character
1060    of a string matching the production rule
1061    <a href="http://www.w3.org/TR/2006/REC-xml-20060816/#NT-Name"><code>Name</code></a>
1062    of <abbr>XML</abbr> 1.0
1063    <cite class="bibref normative">[<a href="#ref-XML10">XML10</a>]</cite>.</p>
1064    </div>
1065    
1066  <p>The character class <dfn id=class-NameChar10><code>NameChar10</code></dfn>  <p>The character class <dfn id=class-NameChar10><code>NameChar10</code></dfn>
1067  contains the following characters:</p>  contains the following characters:</p>
# Line 1004  contains the following characters:</p> Line 1070  contains the following characters:</p>
1070  <a href="#class-NameStartChar10">NameStartChar10</a>.</li>  <a href="#class-NameStartChar10">NameStartChar10</a>.</li>
1071  <li class=ed></li>  <li class=ed></li>
1072  </ul>  </ul>
1073    <div class="note memo">
1074    <p>This character class contains all characters allowed as the second or a
1075    subsequent character of a string matching the production rule
1076    <a href="http://www.w3.org/TR/2006/REC-xml-20060816/#NT-Name"><code>Name</code></a>
1077    of <abbr>XML</abbr> 1.0
1078    <cite class="bibref normative">[<a href="#ref-XML10">XML10</a>]</cite>.</p>
1079    </div>
1080    
1081    <p>The character class <dfn id=class-PubidChar><code>PubidChar</code></dfn>
1082    contains the following characters:</p>
1083    <ul>
1086    <li><code class=char>U+000A</code> <code class=charname>LINE FEED</code></li>
1087    <li><code class=char>U+000D</code> <code class=charname>CARRIAGE
1088    RETURN</code></li>
1089    <li><code class=char>U+0020</code> <code class=charname>SPACE</code></li>
1090    <li><code class=char>U+0021</code> <code class=charname>EXCLAMATION MARK</code>
1091    (<code class=char>!</code>)</li>
1092    <li><code class=char>U+0023</code> <code class=charname>NUMBER SIGN</code>
1093    (<code class=char>#</code>)</li>
1094    <li><code class=char>U+0024</code> <code class=charname>DOLLAR SIGN</code>
1095    (<code class=char>$</code>)</li>
1096    <li><code class=char>U+0025</code> <code class=charname>PERCENT SIGN</code>
1097    (<code class=char>%</code>)</li>
1098    <li><code class=char>U+0027</code> <code class=charname>APOSTROPHE</code>
1099    (<code class=char>'</code>)</li>
1100    <li><code class=char>U+0028</code> <code class=charname>LEFT PARENTHESIS</code>
1101    (<code class=char>(</code>)</li>
1102    <li><code class=char>U+0029</code> <code class=charname>RIGHT
1103    PARENTHESIS</code> (<code class=char>)</code>)</li>
1104    <li><code class=char>U+002A</code> <code class=charname>ASTERISK</code>
1105    (<code class=char>*</code>)</li>
1106    <li><code class=char>U+002B</code> <code class=charname>PLUS SIGN</code>
1107    (<code class=char>+</code>)</li>
1108    <li><code class=char>U+002C</code> <code class=charname>COMMA</code>
1109    (<code class=char>,</code>)</li>
1110    <li><code class=char>U+002D</code> <code class=charname>HYPHEN-MINUS</code>
1111    (<code class=char>-</code>)</li>
1112    <li><code class=char>U+002E</code> <code class=charname>FULL STOP</code>
1113    (<code class=char>.</code>)</li>
1114    <li><code class=char>U+002F</code> <code class=charname>SOLIDUS</code>
1115    (<code class=char>/</code>)</li>
1116    <li><code class=char>U+0030</code> <code class=charname>DIGIT ZERO</code>
1117    (<code class=char>0</code>) .. <code class=char>U+0039</code>
1118    <code class=charname>DIGIT NINE</code> (<code class=char>9</code>)</li>
1119    <li><code class=char>U+003A</code> <code class=charname>COLON</code>
1120    (<code class=char>:</code>)</li>
1121    <li><code class=char>U+003B</code> <code class=charname>SEMICOLON</code>
1122    (<code class=char>;</code>)</li>
1123    <li><code class=char>U+003D</code> <code class=charname>EQUAL SIGN</code>
1124    (<code class=char>=</code>)</li>
1125    <li><code class=char>U+003F</code> <code class=charname>QUESTION MARK</code>
1126    (<code class=char>?</code>)</li>
1127    <li><code class=char>U+0040</code> <code class=charname>COMMERCIAL AT</code>
1128    (<code class=char>@</code>)</li>
1129    <li><code class=char>U+0041</code> <code class=charname>LATIN CAPITAL LETTER
1130    A</code> (<code class=char>A</code>) .. <code class=char>U+005A</code>
1131    <code class=charname>LATIN CAPITAL LETTER Z</code>
1132    (<code class=char>Z</code>)</li>
1133    <li><code class=char>U+005F</code> <code class=charname>LOW LINE</code>
1134    (<code class=char>_</code>)</li>
1135    <li><code class=char>U+0061</code> <code class=charname>LATIN SMALL LETTER
1136    A</code> (<code class=char>a</code>) .. <code class=char>U+007A</code>
1137    <code class=charname>LATIN SMALL LETTER Z</code>
1138    (<code class=char>z</code>)</li>
1139    </ul>
1140    <div class="note memo">
1141    <p>This character class contains all characters allowed in the production rule
1142    <a href="http://www.w3.org/TR/2006/REC-xml-20060816/#NT-PubidChar"><code>PubidChar</code></a>
1143    of <abbr>XML</abbr> 1.0
1144    <cite class="bibref normative">[<a href="#ref-XML10">XML10</a>]</cite>.</p>
1145    </div>
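The <code>PubidChar</code> class and the public identifier steps that use it can be sketched as follows. This minimal sketch follows the XML 1.0 production cited in the note (<code>#x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]</code>), models a <code>null</code> <var>pid</var> as <code>None</code>, and covers only the first two steps of the public identifier algorithm; the later steps and the names <code>is_pubid_char</code> and <code>check_pid</code> are hypothetical and left out or assumed here.

```python
import string
from typing import List, Optional

# XML 1.0 PubidChar: #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]
PUBID_CHARS = frozenset(
    " \r\n" + string.ascii_letters + string.digits + "-'()+,./:=?;!*#@$_%"
)


def is_pubid_char(ch: str) -> bool:
    """True if ch is in the PubidChar character class."""
    return ch in PUBID_CHARS


def check_pid(pid: Optional[str]) -> List[str]:
    """First two steps of the public identifier algorithm (sketch only)."""
    errors: List[str] = []
    if pid is None:  # Step 1: pid is null; abort these steps.
        return errors
    # Step 2: any character outside PubidChar raises one error.
    if any(not is_pubid_char(c) for c in pid):
        errors.append("xml-well-formedness-error")
    return errors
```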
1146    
1147  </div>  </div>
1148    
