
Diff of /markup/xml/xmlcc/xmlcc-work.en.html


revision 1.20 by wakaba, Sat Dec 1 15:08:18 2007 UTC revision 1.29 by wakaba, Fri Oct 17 05:57:26 2008 UTC
# Line 18  Line 18 
18    
19  <div class="header">  <div class="header">
20  <h1>manakai's <abbr>XML</abbr> Conformance Checking</h1>  <h1>manakai's <abbr>XML</abbr> Conformance Checking</h1>
21  <h2>Working Draft <time datetime=2007-12-01>1 December 2007</time></h2>  <h2>Working Draft <time datetime=2008-10-17>17 October 2008</time></h2>
22    
23  <dl class="versions-uri">  <dl class="versions-uri">
24  <dt>This Version</dt>  <dt>This Version</dt>
# Line 42  Line 42 
42        >w@suika.fam.cx</a>&gt;</code></dd>        >w@suika.fam.cx</a>&gt;</code></dd>
43  </dl>  </dl>
44    
45  <p class="copyright" lang="en">&#xA9; <time>2007</time> <a  <p class="copyright" lang="en">&#xA9; <time>2007</time>‐<time>2008</time> <a
46      href="http://suika.fam.cx/~wakaba/who?" rel="author">Wakaba</a>.      href="http://suika.fam.cx/~wakaba/who?" rel="author">Wakaba</a>.
47  Permission is granted to copy, distribute and/or modify this document  Permission is granted to copy, distribute and/or modify this document
48  under the terms of the <a rel="license"  under the terms of the <a rel="license"
# Line 79  project.  It might be updated, replaced, Line 79  project.  It might be updated, replaced,
79  other documents at any time.  It is inappropriate to  other documents at any time.  It is inappropriate to
80  cite this document as other than <q>work in progress</q>.</p>  cite this document as other than <q>work in progress</q>.</p>
81    
82    <p>The scope of this specification is explicitly limited to the <a
83    href="http://suika.fam.cx/www/markup/html/whatpm/readme">Whatpm</a>
84    implementation.  It is <em>not</em> the purpose of this specification
85    to define a general guideline to parse or to check XML documents.
86    This specification does <em>not</em> try to define a new version of
87    XML at all.
88    
89    <p>This version of the specification supports the fourth edition of
90    XML 1.0 and the second edition of XML 1.1.  The fifth edition of XML
91    1.0 might be supported in a later version.  The XML Namespaces
92    specifications are expected to be supported in a later version of this
93    specification.
94    
95  <p>Comments on this document are welcome and  <p>Comments on this document are welcome and
96  may be sent to the <a href="#author">author</a>.</p>  may be sent to the <a href="#author">author</a>.</p>
97    
# Line 92  normative version.</p> Line 105  normative version.</p>
105    
106  <p class=section-info>This section is <em>non‐normative</em>.</p>  <p class=section-info>This section is <em>non‐normative</em>.</p>
107    
108  <div class="issue ed">...</div>  <p>This specification defines how the parsing and the conformance
109    checking of XML documents should be implemented in the <a
110    href="http://suika.fam.cx/www/markup/html/whatpm/readme">Whatpm</a>
111    XML parser and conformance checker.
112    
113    <p>It is <em>not</em> the purpose of this specification to define,
114    e.g., how to parse XML documents in general; its scope is explicitly
115    limited to the <a
116    href="http://suika.fam.cx/www/markup/html/whatpm/readme">Whatpm</a>
117    implementation.
118    
119    <div class="issue ed">...
120    
121    <p>Much of invalid (well-formed or not) XML document parsing and XML document
122    / XML DOM conformance is left undefined so that this document provides a
123    guideline for conformance checkers.
124    </div>
125    
126    
127  </div>  </div>
# Line 118  else in this specification is normative. Line 147  else in this specification is normative.
147  <p><span class=ed>Algorithm is normative but non-normative</span>.  <p><span class=ed>Algorithm is normative but non-normative</span>.
148  In addition, the order in which <a href="#errors">errors</a> are  In addition, the order in which <a href="#errors">errors</a> are
149  raised is undefined.</p>  raised is undefined.</p>
150    
151    <p>This document sometimes cites parts of the <abbr>XML</abbr> 1.0 specification
152    by hyperlinks.  When the document being processed is an <abbr>XML</abbr> 1.1
153    document, however, corresponding parts of the <abbr>XML</abbr> 1.1
154    specification should be consulted instead.</p>
155  </div>  </div>
156    
157    <div class=section id=processing-model>
158    <h2>Processing Model</h2>
159    
160    <p>Conceptually, validation of an XML document is split into two
161    stages for the purpose of this specification: the <dfn
162    id=xml-document-parsing>XML document parsing</dfn> stage and the <dfn
163    id=dom-xml-conformance-checking>DOM XML conformance checking</dfn>
164    stage.
165    
166    <p>The input to the XML document parsing stage is a byte sequence
167    representing the parsed XML document (and any additional metadata),
168    and the outputs are a DOM tree representing the XML document and zero
169    or more <a href="#error">errors</a>.  The processor that implements
170    this stage is called <dfn id=parser>parser</dfn>.  Requirements for a
171    parser are defined in the section <a href="#parsing-xml">Parsing an
172    XML Document</a>.
173    
174    <p>The input to the DOM XML conformance checking stage is a DOM tree,
175    and the outputs are zero or more <a href="#error">errors</a>.  The
176    processor that implements this stage is called <dfn
177    id=conformance-checker>conformance checker</dfn>.  Requirements for a
178    conformance checker are defined in the section <a
179    href="#checking-dom">Checking an XML DOM Tree</a>.
180    
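The two-stage model described above can be sketched roughly as follows. This is only an illustration, not the Whatpm implementation: the names `Error`, `parse`, `check`, and `validate` are hypothetical, Python's `xml.dom.minidom` stands in for the parser, and the single `xml:space` rule in `check` (and the error category it uses) is an assumed example of a DOM-level check rather than a rule quoted from this specification.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple
from xml.dom import minidom
from xml.parsers.expat import ExpatError


@dataclass
class Error:
    category: str  # e.g. "xml-well-formedness-error"
    message: str


def parse(data: bytes) -> Tuple[Optional[minidom.Document], List[Error]]:
    """Stage 1 (XML document parsing): bytes in, DOM tree and errors out."""
    try:
        return minidom.parseString(data), []
    except ExpatError as exc:
        # minidom stops at the first fatal error, so no tree is available here.
        return None, [Error("xml-well-formedness-error", str(exc))]


def check(doc: minidom.Document) -> List[Error]:
    """Stage 2 (DOM XML conformance checking): DOM tree in, errors out."""
    errors: List[Error] = []
    for element in doc.getElementsByTagName("*"):
        if element.hasAttribute("xml:space"):
            value = element.getAttribute("xml:space")
            # Assumed illustrative rule: only these two values are acceptable.
            if value not in ("default", "preserve"):
                errors.append(Error("xml-misc-error",
                                    "unexpected xml:space value: " + value))
    return errors


def validate(data: bytes) -> List[Error]:
    """Run both stages and collect every error that was raised."""
    doc, errors = parse(data)
    if doc is not None:
        errors.extend(check(doc))
    return errors
```

Keeping the stages as separate functions mirrors the split in the text: the checker only ever sees a DOM tree, so it could also be run on a tree built by other means than parsing.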
181    
182  <div class=section id=error-categories>  <div class=section id=error-categories>
183  <h2>Error Classification</h2>  <h3>Error Classification</h3>
184    
185    <p class=ed>An <dfn id=error>error</dfn> is ...
186    
187  <p class=ed>If a <code>Document</code> node has no  <p class=ed>If a <code>Document</code> node has no
188  xml-well-formedness-error, entity-error, and unknown-error,  xml-well-formedness-error, entity-error, and unknown-error,
# Line 205  can be easily serialized into a valid XM Line 265  can be easily serialized into a valid XM
265    violate any well‐formedness constraint in XML    violate any well‐formedness constraint in XML
266    specification <cite class="bibref normative">[<a href="#ref-XML10">XML10</a>,    specification <cite class="bibref normative">[<a href="#ref-XML10">XML10</a>,
267    <a href="#ref-XML11">XML11</a>]</cite>.</p></dd>    <a href="#ref-XML11">XML11</a>]</cite>.</p></dd>
268    <dt><dfn id=misc-info><code>misc-info</code></dfn>
269      <dd><p>A <code>misc-info</code> is raised when status information
270      about the parsing or checking process that is considered useful for debugging
271      and similar purposes is available.  It by no means implies non-conformance of
272      the document.
273  </dl>  </dl>
274    
275  <div class=ed><p>@@ TODO: #dt-atuseroption at user option  <div class=ed><p>@@ TODO: #dt-atuseroption at user option
276  (MAY or MUST), #dt-compat for compatibility,  (MAY or MUST), #dt-compat for compatibility,
277  #dt-interop for interoperability</p></div>  #dt-interop for interoperability</p>
278    
279    <p>TODO: XML 1.1, XML Namespace 1.0/1.1, xml:base, xml:id
280    
281    <p>TODO: XML "error"/"fatal error" is not always non-conforming (only
282    when MUST or SHOULD).
283    </div>
284    
285    </div>
286    
287  </div>  </div>
288    
289  <div class=section id=parsing-xml>  <div class=section id=parsing-xml>
290  <h2>Parsing <abbr>XML</abbr> Document</h2>  <h2>Parsing an <abbr>XML</abbr> Document</h2>
291    
292    <p>When a byte stream that represents an XML document is given to a
293    parser, it <em class=rfc2119>MUST</em> create a DOM tree according to
294    relevant specifications <span class=ed>[XML10, XML11, XMLNAMES10,
295    XMLNAMES11, DOM3CORE, WEBDOMCORE, DOMDTDEF, MANAKAIDOMEXT]</span>.
296    
297    <p>The parser <em class=rfc2119>MAY</em> continue the parsing of the
298    document even after a fatal error (as defined by the relevant
299    specifications) is encountered.  How the parsing ought to be continued
300    is not defined by this specification.
301    
 <ul>  
 <li>If the <abbr>XML</abbr> document does not begin with an  
 <abbr>XML</abbr> declaration, then raise an  
 <a href="#xml-misc-recommentation" id=xmr-xml-decl><code>xml-misc-recommendation</code></a>.</li>  
 <li>If the replacement text of an entity declaration is  
 <code>&lt;</code>, then raise an  
 <a href="#xml-misc-warning" id=xmw-entity-value-lt><code>xml-misc-warning</code></a>.<!--  
 "strongly advised to avoid" in a Note in Section 2.3 of [XML10], [XML11].  
 --></li>  
 <li>If there is an element type declaration whose <code>Name</code>  
 value is already declared, then raise an  
 <a href="#xml-validity-error" id=vc-edunique><code>xml-validity-error</code></a>.</li>  
 <li>If attribute definition whose <code>Name</code> is  
 <code>xml:space</code> has <span class=ed>declared type different from  
 (default|preserve), (default), or (preserve)</span>, then raise an  
 <a href="#xml-misc-error" id=xme-ad-xml-space><code>xml-misc-error</code></a>.  
 <span class=ed>@@ duplication with  
 <a href="#xml-at-xml-space">#xml-at-xml-space</a>.<!--  
 <!ATTLIST e xml:space CDATA #IMPLIED xml:space CDATA #IMPLIED> --></span></li>  
 <li>If an empty-element tag is used for an element which is <em>not</em>  
 declared <code>EMPTY</code>, then raise an  
 <a href="#xml-misc-recommentation" id=xmr-emptyelemtag-not-empty><code>xml-misc-recommendation</code></a>.</li>  
 <li>If an empty-element tag is <em>not</em> used for an element which is  
 declared <code>EMPTY</code>, then raise an  
 <a href="#xml-misc-recommentation" id=xmr-empty-not-emptyelemtag><code>xml-misc-recommendation</code></a>.</li>  
302  <!--  <!--
303      <div class="note memo informative">
304      <p>It is expected that the XML5 specification <span class=ed>@@ ref</span>
305      will define how the parser has to convert any string into DOM tree
306      completely.
307      </div>
308    -->
309    
310  #vc-PEinMarkupDecl Validity constraint: Proper Declaration/PE Nesting  <p class=ed>A future version of this specification might define the
311  #wfc-PEinInternalSubset Well-formedness constraint: PEs in Internal Subset  entire parser in terms of an input stream preprocessor, a tokenizer, and a
312  #ExtSubset Well-formedness constraint: External Subset  tree constructor.
 #PE-between-Decls Well-formedness constraint: PE Between Declarations  
 #GIMatch Well-formedness constraint: Element Type Match  
 #uniqattspec Well-formedness constraint: Unique Att Spec  
 #NoExternalRefs Well-formedness constraint: No External Entity References  
 #CleanAttrVals Well-formedness constraint: No < in Attribute Values  
 #vc-PEinGroup Validity constraint: Proper Group/PE Nesting  
 "For interoperability, if a parameter-entity reference appears in a choice, seq, or Mixed construct, its replacement text SHOULD contain at least one non-blank character, and neither the first nor last non-blank character of the replacement text SHOULD be a connector (| or ,)."  
 "For interoperability, an XML processor MAY at user option issue a warning when more than one attribute-list declaration is provided for a given element type, or more than one attribute definition is provided for a given attribute, but this is not an error."  
 #condsec-nesting Validity constraint: Proper Conditional Section/PE Nesting  
 #wf-Legalchar Well-formedness constraint: Legal Character  
 #textent Well-formedness constraint: Parsed Entity  
 #norecursion Well-formedness constraint: No Recursion  
 #indtd Well-formedness constraint: In DTD  
 "External parsed entities SHOULD each begin with a text declaration."  
 "It is an error for a reference to an unparsed entity to appear in the EntityValue in an entity declaration."  
 #UniqueNotationName Validity constraint: Unique Notation Name  
   
 @@ Need detailed review, but maybe should be in parsing phase  
   
 #vc-check-rmd Validity constraint: Standalone Document Declaration  
   
 @@ Need detailed review  
   
 #wf-entdeclared Well-formedness constraint: Entity Declared  
 #vc-entdeclared Validity constraint: Entity Declared  
 "For interoperability, valid documents SHOULD declare the entities amp, lt, gt, apos, quot, in the form specified in 4.6 Predefined Entities."  
 "If the entities lt or amp are declared, they MUST be declared as internal entities whose replacement text is a character reference to the respective character (less-than sign or ampersand) being escaped; the double escaping is REQUIRED for these entities so that references to them produce a well-formed result. If the entities gt, apos, or quot are declared, they MUST be declared as internal entities whose replacement text is the single character being escaped (or a character reference to that character; the double escaping here is OPTIONAL but harmless)."  
313    
314  @@ flagged and then reported in DOM check phase  <p>In addition, the following requirements apply to the parser:
315    
316  "Entities encoded in UTF-16 MUST and entities encoded in UTF-8 MAY begin with the Byte Order Mark"  <dl class=switch>
317  "In the absence of external character encoding information (such as MIME headers), parsed entities which are stored in an encoding other than UTF-8 or UTF-16 MUST begin with a text declaration"  <dt>For each external entity (including the document entity and the external
318    subset entity, if any)
319      <dd>If there is a byte sequence that are not legal in the encoding in use,
320      then the parser <em class=rfc2119>MUST</em> raise an
321      <a href="#xml-misc-error" id=xme-illegal-bytes><code>xml-misc-error</code></a>.
322      <!--
323         <q>It is a fatal error when an XML processor encounters an entity with an
324         encoding that it is unable to process. It is a fatal error if an XML
325         entity is determined (via default, encoding declaration, or higher-level
326         protocol) to be in a certain encoding but contains byte sequences that are
327         not legal in that encoding.</q>
328      -->
329    
330      <dd>If it is the document entity or a general entity, then:
331        <ul>
332        <li>If the input byte sequence for the entity begins with the
333        <abbr title="BYTE ORDER MARK" class=charname>BOM</abbr>, then the parser
334        <em class=rfc2119>MUST</em> set the <span class=ed>BOM flag</span> of
335        the node corresponding to the entity (the <code>Document</code> node
336        for the document entity or an <code>Entity</code> node for a general
337        entity) to <code>true</code>.
338        <!--
339          <q>Entities encoded in UTF-16 MUST and entities encoded in UTF-8 MAY
340          begin with the Byte Order Mark</q>
341        -->
342        <span class=ed>@@ flag must be checked later</span>
343        <!-- <?xml encoding=""?> must be reflected to xmlEncoding; this should be
344        enforced by DOM Core spec. -->
345        </ul>
346      <dd>If it is a parameter entity or the external subset entity, then:
347        <ul>
348        <li>If the character encoding of the entity is <code>UTF-16</code> but
349        the input byte stream for the entity does not begin with the
350        <abbr title="BYTE ORDER MARK" class=charname>BOM</abbr>, then the parser
351        <em class=rfc2119>MUST</em> raise an
352        <a href="#xml-misc-error" id=xme-pe-bom><code>xml-misc-error</code></a>.
353        <li class=ed>@@ encoding="" preferred name?
354    <!--
355  "In an encoding declaration, the values "UTF-8", "UTF-16", "ISO-10646-UCS-2", and "ISO-10646-UCS-4" SHOULD be used for the various encodings and transformations of Unicode / ISO/IEC 10646, the values "ISO-8859-1", "ISO-8859-2", ... "ISO-8859-n" (where n is the part number) SHOULD be used for the parts of ISO 8859, and the values "ISO-2022-JP", "Shift_JIS", and "EUC-JP" SHOULD be used for the various encoded forms of JIS X-0208-1997. It is RECOMMENDED that character encodings registered (as charsets) with the Internet Assigned Numbers Authority [IANA-CHARSETS], other than those just listed, be referred to using their registered names; other encodings SHOULD use names starting with an "x-" prefix."  "In an encoding declaration, the values "UTF-8", "UTF-16", "ISO-10646-UCS-2", and "ISO-10646-UCS-4" SHOULD be used for the various encodings and transformations of Unicode / ISO/IEC 10646, the values "ISO-8859-1", "ISO-8859-2", ... "ISO-8859-n" (where n is the part number) SHOULD be used for the parts of ISO 8859, and the values "ISO-2022-JP", "Shift_JIS", and "EUC-JP" SHOULD be used for the various encoded forms of JIS X-0208-1997. It is RECOMMENDED that character encodings registered (as charsets) with the Internet Assigned Numbers Authority [IANA-CHARSETS], other than those just listed, be referred to using their registered names; other encodings SHOULD use names starting with an "x-" prefix."
356    
357  @@ in parsing phase  In addition, this should be checked later for Document and Entity nodes.
358    -->
359  "It is a fatal error when an XML processor encounters an entity with an encoding that it is unable to process. It is a fatal error if an XML entity is determined (via default, encoding declaration, or higher-level protocol) to be in a certain encoding but contains byte sequences that are not legal in that encoding."      </ul>
360    
361    <dt>For the document
362      <dd>If the <abbr>XML</abbr> document does not begin with an
363      <abbr>XML</abbr> declaration, then the parser <em class=rfc2119>MUST</em>
364      raise an
365      <a href="#xml-misc-recommentation" id=xmr-xml-decl><code>xml-misc-recommendation</code></a>.
366      <dd>If the document does not contain the document type declaration, or
367      if it does but the document type definition does not contain an entity
368      declaration for any of <code>amp</code>, <code>lt</code>, <code>gt</code>,
369      <code>apos</code>, or <code>quot</code>, then the parser
370      <em class=rfc2119>MUST</em> raise
371      <a href="#xml-misc-recommentation" id=xmr-predefined-decl><code>xml-misc-recommendation</code></a>(s).
372      <!--
373        <q>For interoperability, valid documents SHOULD declare the entities
374        amp, lt, gt, apos, quot, in the form specified in 4.6 Predefined
375        Entities.</q>
376      -->
377    <dt>For the document type declaration
378      <dd class=ed>@@ read external entity
379      <dd>The <code>entities</code> attribute of the <code>DocumentType</code>
380      node <em class=rfc2119>MUST</em> contain a <code>NamedNodeMap</code> object
381      whose first five items are as follows:
382        <ol start=0>
383        <li>An <code>Entity</code> node whose <code>nodeName</code> attribute
384        is <code>amp</code>.  It contains a <code>Text</code> node whose
385        <code>data</code> attribute is set to <code>&amp;</code>.
386        <li>An <code>Entity</code> node whose <code>nodeName</code> attribute
387        is <code>lt</code>.  It contains a <code>Text</code> node whose
388        <code>data</code> attribute is set to <code>&lt;</code>.
389        <li>An <code>Entity</code> node whose <code>nodeName</code> attribute
390        is <code>gt</code>.  It contains a <code>Text</code> node whose
391        <code>data</code> attribute is set to <code>></code>.
392        <li>An <code>Entity</code> node whose <code>nodeName</code> attribute
393        is <code>quot</code>.  It contains a <code>Text</code> node whose
394        <code>data</code> attribute is set to <code>"</code>.
395        <li>An <code>Entity</code> node whose <code>nodeName</code> attribute
396        is <code>apos</code>.  It contains a <code>Text</code> node whose
397        <code>data</code> attribute is set to <code>'</code>.
398        </ol>
399    <dt>For each internal general entity declaration being processed by the parser
400      <dd>If the
401      <a href="http://www.w3.org/TR/2006/REC-xml-20060816/#NT-EntityValue"><code>EntityValue</code></a>
402      part of the general entity declaration contains a bare <code>U+003C</code>
403      <code>LESS-THAN SIGN</code> (<code>&lt;</code>) character, then the parser
404      <em class=rfc2119>MUST</em> raise an
405      <a href="#xml-misc-warning" id=xmw-entity-value-lt><code>xml-misc-warning</code></a>.<!--
406    "strongly advised to avoid" in a Note in Section 2.3 of [XML10], [XML11].
407  -->  -->
408    <dt>For each element type declaration being processed by the parser
409      <dd>If there is another processed element type declaration whose
410      <code>Name</code> is equal to the <code>Name</code> of the element type
411      declaration, then the parser <em class=rfc2119>MUST</em> raise an
412      <a href="#xml-validity-error" id=vc-edunique><code>xml-validity-error</code></a>.
413    <dt>For each attribute definition list declaration being processed by the
414    parser
415      <dd>If there is another processed attribute definition list declaration whose
416      <code>Name</code> is equal to the <code>Name</code> of the attribute
417      definition list declaration, then the parser <em class=rfc2119>MUST</em>
418      raise an
419      <a href="#xml-misc-warning" id=xme-attlist-unique><code>xml-misc-warning</code></a>.
420      <dd>For each attribute definition in the attribute definition list
421      declaration, if there is another processed attribute definition whose
422      <code>Name</code> is equal to the <code>Name</code> of the attribute
423      definition (whether or not in the same attribute definition list
424      declaration), then the parser <em class=rfc2119>MUST</em> raise an
425      <a href="#xml-misc-warning" id=xme-attrdef-unique><code>xml-misc-warning</code></a>.
426      <!--
427        <q>For interoperability, an XML processor <em class=rfc2119>MAY</em> at
428        user option issue a warning when more than one attribute-list declaration
429        is provided for a given element type, or more than one attribute definition
430        is provided for a given attribute, but this is not an error.</q>
431      -->
432    <!--
433      NOTE: <!ATTLIST a xml:space (default) #IMPLIED xml:space CDATA #IMPLIED>
434      will not be warned.
435    -->
436    
437    <dt>For each entity declaration being processed by the parser
438      <dd>Handle as follows:
439        <ol>
440        <li><p>If the entity declaration declares a general entity, the following
441        is applied:
442          <dl>
443          <dt>If the <code>Name</code> is <code>lt</code> or <code>amp</code>
444            <dd><p>If the entity declaration does not declare an internal entity,
445            or if the replacement text of the entity is not the escaped form of
446            <code>&lt;</code> (if <code>lt</code>) or <code>&amp;</code> (if
447            <code>amp</code>), then the parser <em class=rfc2119>MUST</em> raise an
448            <a href="#xml-misc-error" id=xme-double-escape><code>xml-misc-error</code></a>.
449    
450              <div class="note memo informative">
451              <p>In other words, the character in the <code>EntityValue</code>
452              has to be double-escaped.
453              </div>
454          <dt>If the <code>Name</code> is <code>gt</code>, <code>quot</code>, or
455          <code>apos</code>
456            <dd><p>If the entity declaration does not declare an internal entity,
457            or if the replacement text of the entity is neither equal to nor the
458            escaped form of <code>></code> (if <code>gt</code>), <code>"</code> (if
459            <code>quot</code>), or <code>'</code> (if <code>apos</code>), then the
460            parser <em class=rfc2119>MUST</em> raise an
461            <a href="#xml-misc-error" id=xme-single-escape><code>xml-misc-error</code></a>.
462    
463              <div class="note memo informative">
464              <p>In other words, the character in the <code>EntityValue</code>
465              has to be single- or double-escaped.
466              </div>
467          </dl>
468          <!--
469            <q>If the entities lt or amp are declared, they MUST be declared as internal entities whose replacement text is a character reference to the respective character (less-than sign or ampersand) being escaped; the double escaping is REQUIRED for these entities so that references to them produce a well-formed result. If the entities gt, apos, or quot are declared, they MUST be declared as internal entities whose replacement text is the single character being escaped (or a character reference to that character; the double escaping here is OPTIONAL but harmless).</q>
470          -->
471    
472        <li><p>If the entity declaration has to be ignored because an entity
473        with the same <code>Name</code> has already been declared,
474        then the parser <em class=rfc2119>MUST</em> raise a
475        <a href="#misc-info" id=mi-ent-unique><code>misc-info</code></a>
476        and abort these steps.
477    
478        <div class="informative note memo">
479        <p>Five predefined entities, i.e. <code>amp</code>, <code>lt</code>,
480        <code>gt</code>, <code>quot</code>, and <code>apos</code>, are always
481        declared implicitly and therefore any declaration for such an entity
482        always raises an
483        <a href="#misc-info" id=mi-ent-unique><code>misc-info</code></a>.
484        </div>  
485    
486        <li><p>If the entity declaration declares a parameter entity and the
487        <code>Name</code> of the entity begins with the string <code>xml</code>
488        (in any combination of upper- and lowercase letters), then the parser
489        <em class=rfc2119>MUST</em> raise an
490        <a href="#xml-misc-warning" id=xmw-reserved-pe-name><code>xml-misc-warning</code></a>.
491    
492        <li><p>If the entity declaration contains the <code>EntityValue</code>,
493        then for each occurrence of a reference to an unparsed entity in the
494        <code>EntityValue</code>, the parser <em class=rfc2119>MUST</em> raise an
495        <a href="#xml-misc-error" id=xme-unparsed-in-ev><code>xml-misc-error</code></a>.
496        <!--
497          <q>It is an error for a reference to an unparsed entity to appear in the
498          EntityValue in an entity declaration.</q>
499        -->
500        <li><p>If the entity declaration declares a general entity, then an
501        <code>Entity</code> node <em class=rfc2119>MUST</em> be created and
502        appended to the <code>NamedNodeMap</code> object in the
503        <code>entities</code> attribute of the <code>DocumentType</code> node.
504        
505        <p class=ed>Read the external entity
506    
507        <p>If the replacement text of the entity is read, then parse the
508        replacement text as if it were referenced from the content of an
509        element (with no namespace bindings).  If no <span class=ed>@@ parse error</span>
510        is raised by the parsing process, then the nodes generated by the
511        parsing <em class=rfc2119>MUST</em> be appended to the <code>Entity</code>
512        node.  The parse error <em class=rfc2119>MUST NOT</em> be propagated to
513        the entire parsing process.  Other kinds of errors
514        <em class=rfc2119>MUST</em> be propagated.  The first parse error
515        <em class=rfc2119>MUST</em> abort the internal parsing process.
516        <span class=ed>@@ better wording</span>
517    
518        <p class=ed>@@ prop
519        
520        <p>Then, the <code>Entity</code> node and its descendants
521        <em class=rfc2119>MUST</em> be marked as read-only.
522      </ol>
523    
524    <dt>For each notation declaration being processed by the parser
525      <dd>If there is another processed notation declaration whose
526      <code>Name</code> is equal to the <code>Name</code> of the notation
527      declaration, then the parser <em class=rfc2119>MUST</em> raise an
528      <a href="#xml-validity-error" id=vc-uniquenotationname><code>xml-validity-error</code></a>.
529      <!-- <a href="http://www.w3.org/TR/2006/REC-xml-20060816/#UniqueNotationName">Validity constraint: Unique Notation Name</a> -->
530    
531    <dt>For each empty-element tag
532      <dd>If the <code>Name</code> of the tag is not declared by a processed
533      element type declaration as <code>EMPTY</code> content, then the parser
534      <em class=rfc2119>MUST</em> raise an
535      <a href="#xml-misc-recommentation" id=xmr-emptyelemtag-not-empty><code>xml-misc-recommendation</code></a>.
536    <dt>For each start-tag
537      <dd>If the <code>Name</code> of the tag is declared by a processed element
538      type declaration as <code>EMPTY</code> content, then the parser
539      <em class=rfc2119>MUST</em> raise an
540      <a href="#xml-misc-recommentation" id=xmr-empty-not-emptyelemtag><code>xml-misc-recommendation</code></a>.
541    <dt>For each attribute
542      <dd><p>The parser <em class=rfc2119>MUST</em> set the normalized value of
543      the attribute to the <code>value</code> attribute of the <code>Attr</code>
544      node created for the attribute.
545    
546        <div class="note memo informative">
547        <p>That is, any entity reference has to be expanded.  Unexpanded entity
548        references in attribute values are discarded.
549        </div>
550    <dt>For each <code>xml:space</code> attribute
551      <dd>The parser <em class=rfc2119>MUST</em> set the normalized value of
552      the <code>xml:space</code> attribute to the <code>value</code> attribute
553      of the <code>Attr</code> node created for the attribute even if the
554      normalized value is different from <code>default</code> or
555      <code>preserve</code>.
556      <!-- In XML 1.0/1.1 specification, the attribute specification MAY be
557      ignored. -->
558    
559    <dt>For each parameter entity reference
560      <dd><p>Process as follows:
561        <ol>
562        <li>If the declaration for the entity is <em>not</em> processed,
563        then:
564          <dl class=switch>
565          <dt>If the document contains no external entity or if the document
566          contains the <code>standalone</code> pseudo-attribute set to
567          <code>yes</code><!-- or the document contains no DTD -->
568            <dd>The parser <em class=rfc2119>MUST</em> raise an
569            <a href="#xml-well-formedness-error" id=wf-entdeclared-pe><code>xml-well-formedness-error</code></a>.
570          <dt>Otherwise
571            <dd>The parser <em class=rfc2119>MUST</em> raise an
572            <a href="#xml-validity-error" id=vc-entdeclared-pe><code>xml-validity-error</code></a>.
573          </dl>
574        <li>If the declaration for the entity <em>is</em> processed but the
575        referenced entity cannot be retrieved, then the parser
576        <em class=rfc2119>MUST</em> raise an
577        <span class=ed>@@ ??-error</span>.
578        </ol>
579    
580        <p>In either of the two cases above, process as follows:
581        <ul>
582        <li>If the parameter entity reference is contained in a declaration, then
583        the declaration <em class=rfc2119>MUST</em> be ignored <em>except</em> that
584        any error before the parameter entity <em class=rfc2119>MUST</em> be
585        raised as usual.
586        <li>If the parameter entity reference is contained in the status portion of
587        a conditional section, then the conditional section
588        <em class=rfc2119>MUST</em> be processed as if it were an
589        <code>IGNORE</code>d section.
590        <li>The parser <em class=rfc2119>MUST NOT</em> process any entity or
591        attribute-list declaration after the parameter entity reference in the DTD
592        <em>except</em> when the <code>standalone</code> pseudo-attribute of the
593        XML declaration (if any) is set to <code>yes</code>.
594        <!-- This requirement is enforced for internal DTD subset case in
595        XML 1.0/1.1 specification (section 5.1) but not for any other cases. -->
596        <!-- According to this definition, element type declarations, notation
597        declarations, and PIs ARE processed. -->
598        <li>If the parameter entity reference is the first reference to an entity
599        that is not read, then the parser <em class=rfc2119>MUST</em> raise an
600        <a href="#entity-error" id=ee-unread-pe><code>entity-error</code></a>.
601        <li>The <code>allDeclarationsProcessed</code> <span class=ed>@@ ref</span>
602        attribute of the <code>Document</code> node <em class=rfc2119>MUST</em> be
603        set to <code>false</code>.
604        </ul>
605    <dt>For each general entity reference in an attribute value or in the content
606    of an element
607      <dd><p>Process as follows:
608        <ol>
609        <li>If the <code>Name</code> of the entity reference is either
610        <code>amp</code>, <code>lt</code>, <code>gt</code>, <code>quot</code>,
611        or <code>apos</code>, then abort these steps.
612        <li>If the declaration for the entity is <em>not</em> processed,
613        then:
614          <dl class=switch>
615          <dt>If the document contains no external entity or if the document
616          contains the <code>standalone</code> pseudo-attribute set to
617          <code>yes</code><!-- or the document contains no DTD -->
618            <dd>The parser <em class=rfc2119>MUST</em> raise an
619            <a href="#xml-well-formedness-error" id=wf-entdeclared-ge><code>xml-well-formedness-error</code></a>.
620          <dt>Otherwise
621            <dd>The parser <em class=rfc2119>MUST</em> raise an
622            <a href="#xml-validity-error" id=vc-entdeclared-ge><code>xml-validity-error</code></a>.
623          </dl>
624        <li>If the declaration for the entity <em>is</em> processed but the
625        referenced entity cannot be retrieved, then the parser
626        <em class=rfc2119>MUST</em> raise an
627        <span class=ed>@@ ??-error</span>.
628        </ol>
629    
630        <p>In either of the two cases above, process as follows:
631        <ul>
632        <li>If the general entity reference is the first reference to an entity
633        that is not read, then the parser <em class=rfc2119>MUST</em> raise an
634        <a href="#entity-error" id=ee-unread-ge><code>entity-error</code></a>.
635        <span class=ed>@@ entity declared WFC?</span>
636        <li class=ed>An unexpanded entity reference node <em class=rfc2119>MUST</em> be inserted into the current node.
637        </ul>
638    
639    <dt>For each comment <em>outside</em> of document type declaration
640      <dd>A <code>Comment</code> node <em class=rfc2119>MUST</em> be created
641      and inserted appropriately.
642      <!-- In XML 1.0/1.1 spec, this is optional. -->
643    </dl>
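  <div class="note memo informative">
  <p>The choice of error for a general entity reference whose declaration
  is not processed can be sketched as follows.  This is only an
  illustration under stated assumptions, not the Whatpm implementation:
  the function name and its boolean parameters are hypothetical, and the
  error types are represented as plain strings.

```python
# Names of the predefined entities; references to them abort the steps.
PREDEFINED_ENTITY_NAMES = frozenset({"amp", "lt", "gt", "quot", "apos"})


def error_for_unprocessed_declaration(name, has_external_entity, standalone_yes):
    """Return the error type for a general entity reference whose
    declaration is not processed, or None for a predefined entity.

    has_external_entity: hypothetical flag, true if the document
        contains any external entity.
    standalone_yes: hypothetical flag, true if the standalone
        pseudo-attribute is set to "yes".
    """
    if name in PREDEFINED_ENTITY_NAMES:
        return None
    if not has_external_entity or standalone_yes:
        return "xml-well-formedness-error"
    return "xml-validity-error"
```

  <p>For example, a reference to an undeclared entity in a document that
  has an external subset and no <code>standalone</code> pseudo-attribute
  set to <code>yes</code> would yield <code>xml-validity-error</code>
  under this sketch.
  </div>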
644    
645    <p>The parser <em class=rfc2119>MUST</em> try to read any entity
646    referenced by general or parameter entity references<!--,--> and the
647    external subset entity, if any<!--, and any general entity declared-->
648    in the document type definition.
649    
650    <p><strong>Well-formedness constraints</strong>.  When the parser
651    detects a violation of one of certain well-formedness constraints, it
652    <em class=rfc2119>MUST</em> raise an <a
653    href="#xml-well-formedness-error"><code>xml-well-formedness-error</code></a>.
654    The list of such well-formedness constraints is as follows:
655    
656    <ul>
657    <li><a href="http://www.w3.org/TR/2006/REC-xml-20060816/#wfc-PEinInternalSubset">Well-formedness constraint: PEs in Internal Subset</a>
658    <li><a href="http://www.w3.org/TR/2006/REC-xml-20060816/#GIMatch">Well-formedness constraint: Element Type Match</a>
659    <li><a href="http://www.w3.org/TR/2006/REC-xml-20060816/#uniqattspec">Well-formedness constraint: Unique Att Spec</a>
660    <li><a href="http://www.w3.org/TR/2006/REC-xml-20060816/#NoExternalRefs">Well-formedness constraint: No External Entity References</a>
661    <li><a href="http://www.w3.org/TR/2006/REC-xml-20060816/#CleanAttrVals">Well-formedness constraint: No &lt; in Attribute Values</a>
662    <li><a href="http://www.w3.org/TR/2006/REC-xml-20060816/#wf-Legalchar">Well-formedness constraint: Legal Character</a>
663    <li><a href="http://www.w3.org/TR/2006/REC-xml-20060816/#textent">Well-formedness constraint: Parsed Entity</a>
664    <li><a href="http://www.w3.org/TR/2006/REC-xml-20060816/#norecursion">Well-formedness constraint: No Recursion</a>
665    <li><a href="http://www.w3.org/TR/2006/REC-xml-20060816/#indtd">Well-formedness constraint: In DTD</a>
666  </ul>  </ul>
667    
668    <p><strong>Validity constraints</strong>.  When the parser detects a
669    violation of one of certain validity constraints, it <em
670    class=rfc2119>MUST</em> raise an <a
671    href="#xml-validity-error"><code>xml-validity-error</code></a>.  The
672    list of such validity constraints is as follows:
673    
674    <ul>
675    <li><a href="http://www.w3.org/TR/2006/REC-xml-20060816/#vc-PEinMarkupDecl">Validity constraint: Proper Declaration/PE Nesting</a>
676    <li><a href="http://www.w3.org/TR/2006/REC-xml-20060816/#vc-PEinGroup">Validity constraint: Proper Group/PE Nesting</a>
677    <li><a href="http://www.w3.org/TR/2006/REC-xml-20060816/#condsec-nesting">Validity constraint: Proper Conditional Section/PE Nesting</a>
678    <li><a href="http://www.w3.org/TR/2006/REC-xml-20060816/#vc-check-rmd">Validity constraint: Standalone Document Declaration</a>
679    </ul>
680    
681    <p><strong>Other criteria</strong>.  If the parser detects a violation
682    of one of certain additional constraints, it <em
683    class=rfc2119>MUST</em> raise an <a
684    href="#xml-misc-recommendation"><code>xml-misc-recommendation</code></a>.
685    The list of such constraints is as follows:
686    
687    <ul>
688    <li><q>For interoperability, if a parameter-entity reference appears in a
689    <code>choice</code>, <code>seq</code>, or <code>Mixed</code> construct, its
690    replacement text <em class=rfc2119>SHOULD</em> contain at least one non-blank
691    character, and neither the first nor last non-blank character of the
692    replacement text <em class=rfc2119>SHOULD</em> be a connector (<code>|</code>
693    or <code>,</code>).</q>
694    <li><q>External parsed entities <em class=rfc2119>SHOULD</em> each begin with a
695    text declaration.</q>
696    </ul>
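  <div class="note memo informative">
  <p>The first interoperability criterion above can be checked roughly as
  in the following sketch.  It assumes that <q>blank</q> means the XML
  white space characters (space, tab, carriage return, line feed); the
  function name is hypothetical and is not part of any existing
  <abbr>API</abbr>.

```python
# Assumption: "blank" is read as XML white space (S production).
XML_BLANK = " \t\r\n"
CONNECTORS = "|,"


def pe_replacement_text_is_interoperable(replacement_text):
    """Return True if the replacement text of a parameter entity used in a
    choice, seq, or Mixed construct satisfies the interoperability
    recommendation: at least one non-blank character, and neither the
    first nor the last non-blank character is a connector."""
    stripped = replacement_text.strip(XML_BLANK)
    if not stripped:
        # No non-blank character at all.
        return False
    return stripped[0] not in CONNECTORS and stripped[-1] not in CONNECTORS
```

  <p>A parser following this sketch would raise an
  <code>xml-misc-recommendation</code> whenever the function returns
  false.
  </div>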
697    
698    <p>The parser <em class=rfc2119>MUST</em> act as if it is a <a
699    href="http://www.w3.org/TR/2006/REC-xml-20060816/#dt-validating">validating
700    XML processor</a> for the purpose of informing of white space
701    characters appearing in <a
702    href="http://www.w3.org/TR/2006/REC-xml-20060816/#dt-elemcontent">element
703    content</a> (See <a
704    href="http://www.w3.org/TR/2006/REC-xml-20060816/#sec-white-space">Section
705    2.10</a> of the XML specification).
706    
707      <div class="note memo informative">
708      <p>In other words, the <code>isElementContentWhitespace</code> attribute
709      of <code>Text</code> nodes has to be set appropriately.  Note that the
710      value of the attribute will be set to <code>false</code> for any
711      <code>Text</code> node in the content of an element whose declaration
712      is not processed.
713      </div>
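  <div class="note memo informative">
  <p>As a rough illustration of the note above, the value of
  <code>isElementContentWhitespace</code> could be computed as in the
  sketch below.  The function and its parameters are hypothetical; in
  particular, how the parser learns whether the parent element's
  declaration is processed and whether it has element content is
  assumed to be known to the caller.

```python
XML_WHITE_SPACE = frozenset(" \t\r\n")


def is_element_content_whitespace(data, declaration_processed, has_element_content):
    """Return the value for isElementContentWhitespace of a Text node.

    data: character data of the Text node.
    declaration_processed: hypothetical flag, true if the declaration of
        the parent element type is processed.
    has_element_content: hypothetical flag, true if that declaration
        gives the element type element content.
    """
    if not declaration_processed:
        # Without a processed declaration the value is always false.
        return False
    if not has_element_content:
        return False
    return len(data) > 0 and all(ch in XML_WHITE_SPACE for ch in data)
```
  </div>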
714    
715    <p>The parser <em class=rfc2119>MUST</em> raise at least one <a
716    href="#xml-well-formedness-error"
717    id=wfe-syntax><code>xml-well-formedness-error</code></a> if the entity
718    it parses does not match the appropriate production rule in the XML
719    specification.  As an exception to this requirement, it <em
720    class=rfc2119>MAY</em> choose not to raise such an error if the error
721    will be raised by the conformance checker when the conformance checker
722    <a href="#algorithm-to-check-a-node" title="check a node">checks</a>
723    the <code>Document</code> object produced by the parser.
724    
725    <!--
726      <li><a href="http://www.w3.org/TR/2006/REC-xml-20060816/#ExtSubset">Well-formedness constraint: External Subset</a>
727      <li><a href="http://www.w3.org/TR/2006/REC-xml-20060816/#PE-between-Decls">Well-formedness constraint: PE Between Declarations</a>
728    -->
729    
730    <!--
731      Impossible to test:
732        "In the absence of external character encoding information (such as MIME
733        headers), parsed entities which are stored in an encoding other than UTF-8
734        or UTF-16 MUST begin with a text declaration"
735    -->
736  </div>  </div>
737    
738  <div class="section" id=checking-dom>  <div class="section" id=checking-dom>
739  <h2>Checking <abbr>DOM</abbr></h2>  <h2>Checking an <abbr>XML</abbr> <abbr>DOM</abbr> Tree</h2>
740    
741  <p>The following algorithms and definitions are applied to  <p>The following algorithms and definitions are applied to
742  <abbr>XML</abbr> documents; especially, they are not applied  <abbr>XML</abbr> documents; especially, they are not applied
# Line 367  character that is <em>not</em> in the ch Line 814  character that is <em>not</em> in the ch
814  case combination), then raise an  case combination), then raise an
815  <a href="#xml-misc-warning" id=xmw-reserved-name><code>xml-misc-warning</code></a>.  <a href="#xml-misc-warning" id=xmw-reserved-name><code>xml-misc-warning</code></a>.
816  <span class=ed>@@ except for attribute names <code>xml:lang</code>,  <span class=ed>@@ except for attribute names <code>xml:lang</code>,
817  <code>xml:space</code>, <code>xml:base</code>, <code>xml:id</code>,  <code>xml:space</code><!--, <code>xml:base</code>, <code>xml:id</code>,
818  <code>xmlns</code>, <code>xmlns:<var>*</var></code>,  <code>xmlns</code>, <code>xmlns:<var>*</var></code>,
819  pi name <code>xml-stylesheet</code>.</span><!--  pi name <code>xml-stylesheet</code>-->.</span><!--
820  "names beginning with a match to (('X'|'x')('M'|'m')('L'|'l')) are reserved for standardization in this or future versions of this specification.":  "names beginning with a match to (('X'|'x')('M'|'m')('L'|'l')) are reserved for standardization in this or future versions of this specification.":
821  xmlns, xml-stylesheet, xml:base and xml:id specifications violate this sentence!  xmlns, xml-stylesheet, xml:base and xml:id specifications violate this sentence!
822  --></li>  --></li>
# Line 559  following:</p> Line 1006  following:</p>
1006      <li>If <code>nodeName</code> attribute of <var>n</var> is      <li>If <code>nodeName</code> attribute of <var>n</var> is
1007      <code>xml:space</code> <span class=ed>@@ or {xml namespace}:space ?</span>      <code>xml:space</code> <span class=ed>@@ or {xml namespace}:space ?</span>
1008      and <span class=ed>its declared type is different from (default|preserve),      and <span class=ed>its declared type is different from (default|preserve),
1009      (default), or (preserve)</span>, then raise an      (preserve|default), (default), or (preserve)</span>, then raise an
1010      <a href="#xml-misc-error" id=xme-at-xml-space><code>xml-misc-error</code></a>.</li>      <a href="#xml-misc-error" id=xme-at-xml-space><code>xml-misc-error</code></a>.</li>
1011      <li>For each node <dfn id=var-ad-nc><var>n<sub><var>c</var></sub></var></dfn> in the      <li>For each node <dfn id=var-ad-nc><var>n<sub><var>c</var></sub></var></dfn> in the
1012      <code>childNodes</code> list of <var>n</var>,      <code>childNodes</code> list of <var>n</var>,
# Line 714  following:</p> Line 1161  following:</p>
1161      <a href="#algorithm-to-check-a-node" title="check a node">check the      <a href="#algorithm-to-check-a-node" title="check a node">check the
1162      node</a> recursively.</li>      node</a> recursively.</li>
1163      <li class=ed>@@ externally declared?</li>      <li class=ed>@@ externally declared?</li>
1164        <li>If the <code>NamedNodeMap</code> object in the <code>entities</code>
1165        attribute of <var>n</var> does not contain <code>Entity</code> nodes
1166        whose <code>nodeName</code> attributes are <code>amp</code>,
1167        <code>lt</code>, <code>gt</code>, <code>apos</code>, and <code>quot</code>
1168        then raise
1169        <a href="#xml-misc-recommendation" id=xmr-predefined><code>xml-misc-recommendation</code></a>(s).
1170      </ol>      </ol>
1171    </dd>    </dd>
1172  <dt>If <var>n</var> is an <code>Element</code> node</dt>  <dt>If <var>n</var> is an <code>Element</code> node</dt>
# Line 901  parsed entity)</dt> Line 1354  parsed entity)</dt>
1354      <li>If the <code>childNodes</code> list of <var>n</var> contains      <li>If the <code>childNodes</code> list of <var>n</var> contains
1355      any nodes, then raise an      any nodes, then raise an
1356      <a href="#xml-well-formedness-error" id=wfe-pi-child><code>xml-well-formedness-error</code></a>.</li>      <a href="#xml-well-formedness-error" id=wfe-pi-child><code>xml-well-formedness-error</code></a>.</li>
1357        <li class=ed>@@ Warn if not declared
1358      </ol>      </ol>
1359    </dd>    </dd>
1360  <dt>If <var>n</var> is a <code>Text</code> node</dt>  <dt>If <var>n</var> is a <code>Text</code> node</dt>
# Line 1052  PARENTHESIS</code> (<code class=char>)</ Line 1506  PARENTHESIS</code> (<code class=char>)</
1506  (<code class=char>.</code>)</li>  (<code class=char>.</code>)</li>
1507  <li><code class=char>U+002F</code> <code class=charname>SOLIDUS</code>  <li><code class=char>U+002F</code> <code class=charname>SOLIDUS</code>
1508  (<code class=char>/</code>)</li>  (<code class=char>/</code>)</li>
1509    <li><code class=char>U+0030</code> <code class=charname>DIGIT ZERO</code>
1510    (<code class=char>0</code>) .. <code class=char>U+0039</code>
1511    <code class=charname>DIGIT NINE</code> (<code class=char>9</code>)</li>
1512  <li><code class=char>U+003A</code> <code class=charname>COLON</code>  <li><code class=char>U+003A</code> <code class=charname>COLON</code>
1513  (<code class=char>:</code>)</li>  (<code class=char>:</code>)</li>
1514  <li><code class=char>U+003B</code> <code class=charname>SEMICOLON</code>  <li><code class=char>U+003B</code> <code class=charname>SEMICOLON</code>
# Line 1062  PARENTHESIS</code> (<code class=char>)</ Line 1519  PARENTHESIS</code> (<code class=char>)</
1519  (<code class=char>?</code>)</li>  (<code class=char>?</code>)</li>
1520  <li><code class=char>U+0040</code> <code class=charname>COMMERCIAL AT</code>  <li><code class=char>U+0040</code> <code class=charname>COMMERCIAL AT</code>
1521  (<code class=char>@</code>)</li>  (<code class=char>@</code>)</li>
1522    <li><code class=char>U+0041</code> <code class=charname>LATIN CAPITAL LETTER
1523    A</code> (<code class=char>A</code>) .. <code class=char>U+005A</code>
1524    <code class=charname>LATIN CAPITAL LETTER Z</code>
1525    (<code class=char>Z</code>)</li>
1526  <li><code class=char>U+005F</code> <code class=charname>LOW LINE</code>  <li><code class=char>U+005F</code> <code class=charname>LOW LINE</code>
1527  (<code class=char>_</code>)</li>  (<code class=char>_</code>)</li>
1528    <li><code class=char>U+0061</code> <code class=charname>LATIN SMALL LETTER
1529    A</code> (<code class=char>a</code>) .. <code class=char>U+007A</code>
1530    <code class=charname>LATIN SMALL LETTER Z</code>
1531    (<code class=char>z</code>)</li>
1532  </ul>  </ul>
1533  <div class="note memo">  <div class="note memo">
1534  <p>This character class contains all characters allowed in the production rule  <p>This character class contains all characters allowed in the production rule
