Contents of /messaging/manakai/lib/Message/DOM/XMLParser.dis

Revision 1.5
Sat Jan 21 07:06:09 2006 UTC by wakaba
Branch: MAIN
Changes since 1.4: +71 -1 lines
++ manakai/t/ChangeLog	21 Jan 2006 07:06:03 -0000
2006-01-21  Wakaba  <wakaba@suika.fam.cx>

	* Makefile: |dom-xmlparser.t| added.

++ manakai/bin/ChangeLog	21 Jan 2006 06:58:44 -0000
2006-01-12  Wakaba  <wakaba@suika.fam.cx>

	* dac2test.pl: |test:ParserTestSet| and |test:ParserTest|
	test types are implemented.

++ manakai/lib/Message/Util/ChangeLog	21 Jan 2006 07:02:03 -0000
2006-01-19  Wakaba  <wakaba@suika.fam.cx>

	* PerlCode.dis (PCPackage.stringify): Appends a string
	representation of non-|pc:*| element children for
	the stringified value.

++ manakai/lib/Message/Util/DIS/ChangeLog	21 Jan 2006 07:04:28 -0000
2006-01-20  Wakaba  <wakaba@suika.fam.cx>

	* DPG.dis (plCodeFragment): Sets |param| value of default
	for default parse error handler to avoid array dereference error.

	* Test.dis (assertDOMTreeEquals): New method.

2006-01-11  Wakaba  <wakaba@suika.fam.cx>

	* Test.dis (TDTParser): New.
	(PARSE_ERR): New exception type.

++ manakai/lib/Message/DOM/ChangeLog	21 Jan 2006 07:00:52 -0000
2006-01-21  Wakaba  <wakaba@suika.fam.cx>

	* DOMFeature.dis (featuresParamToFeaturesHash): New block
	code (separated from |InputProcessor|).  Now
	a |features| parameter's version can be specified by
	an array reference that contains a set of version
	numbers.  A test added.

	* XMLParser.dis: A test added.

++ manakai/lib/manakai/ChangeLog	21 Jan 2006 07:05:39 -0000
2006-01-11  Wakaba  <wakaba@suika.fam.cx>

	* Test.dis (test:ParserTestSet, test:ParserTest): New types.
	(test:Entity, test:RootEntity): New types.
	(lang:tdt): New lextype.

1 wakaba 1.1 Module:
2     @QName: MDOM|XMLParser
3     @Namespace:
4     http://suika.fam.cx/~wakaba/archive/2004/dom/xml-parser#
5    
6     @FullName:
7     @@lang:en
8     @@@: XML Parser
9    
10     @DISCore:author: DISCore|Wakaba
11     @License: license|Perl+MPL
12     @Date:
13 wakaba 1.5 $Date: 2005/12/31 12:21:15 $
14 wakaba 1.1
15     @DefaultFor: ManakaiDOM|ManakaiDOMLatest
16    
17     @Require:
18     @@Module:
19     @@@QName: MDOM|DOMLS
20     @@@WithFor: ManakaiDOM|ManakaiDOMLatest
21    
22     Namespace:
23     @dis:
24     http://suika.fam.cx/~wakaba/archive/2004/8/18/lang#dis--
25     @DOMCore:
26     http://suika.fam.cx/~wakaba/archive/2004/8/18/dom-core#
27     @DOMMain:
28     http://suika.fam.cx/~wakaba/archive/2004/dom/main#
29 wakaba 1.5 @dtest:
30     http://suika.fam.cx/~wakaba/archive/2005/manakai/Util/DIS#Test/
31 wakaba 1.1 @dx:
32     http://suika.fam.cx/~wakaba/archive/2005/manakai/Util/Error/DOMException#
33     @ecore:
34     http://suika.fam.cx/~wakaba/archive/2005/manakai/Util/Error/Core/
35     @f:
36     http://suika.fam.cx/~wakaba/archive/2004/dom/feature#
37     @idl:
38     http://suika.fam.cx/~wakaba/archive/2004/dis/IDL#
39     @infoset:
40     http://www.w3.org/2001/04/infoset#
41     @lang:
42     http://suika.fam.cx/~wakaba/archive/2004/8/18/lang#
43     @license:
44     http://suika.fam.cx/~wakaba/archive/2004/8/18/license#
45     @LSEV:
46     http://www.w3.org/2002/DOMLS
47     @ManakaiDOM:
48     http://suika.fam.cx/~wakaba/archive/2004/8/18/manakai-dom#
49     @ManakaiDOMLS:
50     http://suika.fam.cx/~wakaba/archive/2004/mdom-ls#
51     @MDOM:
52     http://suika.fam.cx/~wakaba/archive/2004/8/18/manakai-dom#ManakaiDOM.
53     @MDOMX:
54     http://suika.fam.cx/~wakaba/archive/2004/8/4/manakai-dom-exception#
55     @rdf:
56     http://www.w3.org/1999/02/22-rdf-syntax-ns#
57     @rdfs:
58     http://www.w3.org/2000/01/rdf-schema#
59     @t:
60     http://suika.fam.cx/~wakaba/archive/2004/dom/tree#
61 wakaba 1.5 @test:
62     http://suika.fam.cx/~wakaba/archive/2004/dis/Test#
63 wakaba 1.1 @xml:
64     http://www.w3.org/XML/1998/namespace
65     @xmlns:
66     http://www.w3.org/2000/xmlns/
67     @xp:
68     http://suika.fam.cx/~wakaba/archive/2004/dom/xml-parser#
69    
70     ## -- Features
71    
72     ElementTypeBinding:
73     @Name: FeatureDef
74     @ElementType:
75     dis:ResourceDef
76     @ShadowContent:
77     @@rdf:type: f|Feature
78     @@For: =ManakaiDOM|all
79    
80     ElementTypeBinding:
81     @Name: FeatureVerDef
82     @ElementType:
83     dis:ResourceDef
84     @ShadowContent:
85     @@rdf:type: f|Feature
86    
87     ElementTypeBinding:
88     @Name: featureQName
89     @ElementType:
90     f:name
91     @ShadowContent:
92     @@ContentType: DISCore|QName
93    
94     ResourceDef:
95     @QName: DOMString
96     @AliasFor: DOMMain|DOMString
97     @For: ManakaiDOM|DOM
98    
99     ResourceDef:
100     @QName: Node
101     @AliasFor: t|Node
102     @For: ManakaiDOM|DOM
103    
104     ResourceDef:
105     @QName: Element
106     @AliasFor: t|Element
107     @For: ManakaiDOM|DOM
108    
109     ResourceDef:
110     @QName: Document
111     @AliasFor: t|Document
112     @For: ManakaiDOM|DOM
113    
114     ElementTypeBinding:
115     @Name: ClsDef
116     @ElementType:
117     dis:ResourceDef
118     @ShadowContent:
119     @@rdf:type:
120     @@@@: dis|MultipleResource
121     @@@ForCheck: !ManakaiDOM|ForIF !ManakaiDOM|ForClass
122     @@resourceFor:
123     @@@@: ManakaiDOM|ForClass
124     @@@ForCheck: ManakaiDOM|ManakaiDOM !=ManakaiDOM|ManakaiDOM
125     @@For: ManakaiDOM|DOM3
126     @@For: =ManakaiDOM|ManakaiDOM
127    
128     @@rdf:type:
129     @@@@: DISLang|Class
130     @@@ForCheck: ManakaiDOM|ForClass
131    
132     ElementTypeBinding:
133     @Name: ClsQName
134     @ElementType:
135     dis:QName
136     @ShadowContent:
137     @@ForCheck: ManakaiDOM|ForClass
138    
139     ElementTypeBinding:
140     @Name: ClsISA
141     @ElementType:
142     dis:ISA
143     @ShadowContent:
144     @@ForCheck: ManakaiDOM|ForClass
145    
146     ElementTypeBinding:
147     @Name: nullCase
148     @ElementType:
149     dis:ResourceDef
150     @ShadowContent:
151     @@rdf:type: ManakaiDOM|InCase
152     @@Value:
153     @@@is-null:1
154    
155     ResourceDef:
156     @QName: LSParser
157     @AliasFor: DOMLS|LSParser
158     @For: ManakaiDOM|DOM3
159    
160     ClsDef:
161     @ClsQName: ManakaiXMLParser
162    
163     @Implement: DOMLS|LSParser
164    
165     @f:implements:
166     @@@: DOMLS|LSFeature30
167     @@For: ManakaiDOM|DOM3
168    
169     @DISLang:role: DOMLS|ParserRole
170    
171 wakaba 1.3 @enDesc:
172     Note that the <Class::ManakaiXMLParser> reports any XML errors
173     (syntax errors and / or well-formedness constraint errors)
174     via the <IF::DOMCore:error-handler> registered to
175     the <A::DOMLS:LSParser.domConfig> object. Each error has
176     its <A::DOMCore:DOMError.severity>, either <C::DOMCore:SEVERITY_ERROR>
177     or <C::DOMCore:SEVERITY_FATAL_ERROR>. However, their semantics
178     are slightly different from those of <QUOTE::error> and
179     <QUOTE::fatal error> in XML; in this implementation,
180     <C::DOMCore:SEVERITY_ERROR> implies that the parsing process
181     can effectively be continued to detect more errors, while
182     <C::DOMCore:SEVERITY_FATAL_ERROR> implies that the error
183     is so serious that the resulting document tree and any later
184     errors might differ greatly from those that would be obtained
185     if the error were absent.
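
The reporting policy that the PerlDef blocks in this module follow (pass the error to the handler, abort with PARSE_ERR when the handler returns false, otherwise set a has_error flag and fail once the document has been read) can be sketched outside the dis framework. The Python below is an illustrative sketch only and is not part of manakai; the names `ParseError`, `Reporter`, `report`, and `finish` are hypothetical.

```python
class ParseError(Exception):
    """Stands in for DOMLS|PARSE_ERR (hypothetical name)."""


class Reporter:
    """Collects well-formedness errors in the style of the PerlDef blocks."""

    def __init__(self, handler):
        # handler(error_type, details) -> truthy to continue, falsy to stop
        self.handler = handler
        self.has_error = False

    def report(self, error_type, **details):
        # Ask the application whether parsing should go on.
        if not self.handler(error_type, details):
            raise ParseError(error_type)
        # Parsing continues, but the document is already known to be bad.
        self.has_error = True

    def finish(self):
        # At end of input, any recorded error still fails the parse.
        if self.has_error:
            raise ParseError("document is not well-formed")


seen = []
reporter = Reporter(lambda kind, details: seen.append(kind) or True)
reporter.report("wf-pi-target-is-xml", name="XML")
reporter.report("wf-unique-att-spec", name="id")
```

With a handler that always asks to continue, both errors are recorded and `reporter.finish()` then raises; a handler that returns a false value stops the parse at the first report.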
186    
187 wakaba 1.1 @Attr:
188     @@Name: domConfig
189     @@enDesc:
190     The configuration of the parser.
191    
192     @@Get:
193     @@@Type: DOMCore|DOMConfiguration
194     @@@enDesc: The DOM configuration object.
195     @@@PerlDef:
196     __CODE{DOMCore|getConfigObject::
197     $target => $self,
198     $targetHash => $self,
199     $targetType => {<IFName::LSParser>},
200     $result => $r,
201     }__;
202    
203     @Method:
204     @@ManakaiDOM:isForInternal:1
205     @@ForCheck: ManakaiDOM|ForClass
206     @@Operator: DISPerl|NewMethod
207     @@enDesc:
208     Creates a new instance of the object.
209     @@Param:
210     @@@Name: impl
211     @@@Type: DOMLS|GLSImplementation
212     @@@enDesc:
213     The implementation from which the parser is created.
214     @@Param:
215     @@@Name: features
216     @@@Type: DOMString
217     @@@dis:actualType: f|FeaturesString
218     @@@enDesc:
219     The set of features requested for the parser.
220     @@Return:
221     @@@Type: DOMMain|DOMObject
222     @@@dis:actualType: LSParser
223     @@@enDesc:
224     The newly created parser.
225     @@@PerlDef:
226     $r = bless {
227     <H::DOMCore:implementation> => $impl,
228     }, $self;
229    
230     @Method:
231     @@Name: parseString
232     @@enImplNote:
233     Non-standard - to be removed
234    
235     @@Param:
236     @@@Name: sourceText
237     @@@Type: DOMString
238     @@Return:
239     @@@Type: Document
240     @@@PerlDef:
241    
242     $self->{char} = [];
243     $self->{token} = [];
244     $self->{source} = $sourceText;
245 wakaba 1.3 $self->{xml_version} = '1.0';
246     $self->{standalone} = false;
247     ## Well-formedness constraint Entity Declared takes effect?
248     $self->{line} = 1;
249     $self->{column} = 1;
250 wakaba 1.1
251     __DEEP{
252     $r = $self->_parse_DocumentEntity
253     ($self->{<H::DOMCore:implementation>});
254     }__;
255    
256     @Method:
257     @@Name: shiftChar
258     @@ManakaiDOM:isForInternal:1
259     @@ForCheck: ManakaiDOM|ForClass
260     @@enDesc:
261     Returns the next character.
262     @@Return:
263     @@@Type: idl|long||ManakaiDOM|all
264     @@@enDesc:
265     The code position number of the next character, if any,
266     or <CODE::-1> at the end of the input.
267     @@@PerlDef:
268     if (@{$self->{char}}) {
269     $r = shift @{$self->{char}};
270     } else {
271     my $char = substr ($self->{source}, pos ($self->{source}), 1);
272     pos ($self->{source})++;
273    
274     if (length $char) {
275     $r = ord $char;
276 wakaba 1.3 if ($r == 0x000A) {
277     $self->{line}++;
278     $self->{column} = 1;
279     } elsif ($r == 0x000D) {
280     my $next_char = substr ($self->{source},
281     pos ($self->{source}), 1);
282     if ($next_char eq "\x0A") {
283     pos ($self->{source})++;
284     $self->{column} = 1;
285     } elsif ($next_char eq "\x85") {
286     if ($self->{xml_version} eq '1.1') {
287     pos ($self->{source})++;
288     $self->{column} = 1;
289     } else {
290     $self->{column} = 0;
291     }
292     } else {
293     $self->{column} = 1;
294     }
295     $r = 0x000A;
296     $self->{line}++;
297     } elsif (
298     not ((0x0020 <= $r and $r <= 0x007E) or
299     (0x00A0 <= $r and $r <= 0xD7FF) or
300     (0xE000 <= $r and $r <= 0xFFFD) or
301     (0x10000 <= $r and $r <= 0x10FFFF)) and
302     $r != 0x0009 and $r != 0x0085 and
303     not ($self->{xml_version} eq '1.0' and
304     (0x007F <= $r and $r <= 0x009F))
305     ) {
306     my $location = {
307     utf32_offset => pos ($self->{source}),
308     line_number => $self->{line},
309     column_number => $self->{column},
310     };
311     my $continue = __DOMCore:ERROR{xp|wf-legal-literal-character::
312     DOMCore|location => {$location},
313     xp|character-number => {$r},
314     }__;
315     unless ($continue) {
316     __EXCEPTION{DOMLS|PARSE_ERR}__;
317     }
318     $self->{has_error} = true;
319     $self->{column}++;
320     } elsif ($r == 0x0085 or $r == 0x2028) {
321     $r = 0x000A if $self->{xml_version} eq '1.1';
322     $self->{line}++;
323     $self->{column} = 1;
324     } else {
325     $self->{column}++;
326     }
327 wakaba 1.1 } else {
328     $r = -1;
329     }
330     }
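
The line-end folding and the legal-character test performed by shiftChar above can be illustrated in isolation. The following Python is a simplified sketch under stated assumptions, not the module's code: it leaves out the line and column bookkeeping and the error reporting, and the names `normalize_line_ends` and `is_legal_literal` are hypothetical.

```python
CR, LF, NEL, LS = 0x0D, 0x0A, 0x85, 0x2028


def is_legal_literal(cp, xml_version="1.0"):
    """Range test corresponding to the wf-legal-literal-character check."""
    if cp in (0x09, LF, CR, NEL):
        return True
    if xml_version == "1.0" and 0x7F <= cp <= 0x9F:
        return True  # DEL and C1 controls may appear literally in XML 1.0 only
    return (0x20 <= cp <= 0x7E or 0xA0 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)


def normalize_line_ends(source, xml_version="1.0"):
    """Return the code points of source with line ends folded to U+000A."""
    out = []
    i = 0
    while i < len(source):
        cp = ord(source[i])
        i += 1
        if cp == CR:
            nxt = ord(source[i]) if i < len(source) else None
            if nxt == LF or (nxt == NEL and xml_version == "1.1"):
                i += 1  # swallow the second half of the pair
            cp = LF
        elif cp in (NEL, LS) and xml_version == "1.1":
            cp = LF  # XML 1.1 also treats NEL and LS as line ends
        out.append(cp)
    return out
```

For example, `normalize_line_ends("a\r\nb\rc")` gives `[97, 10, 98, 10, 99]`: both CR LF and a lone CR become a single U+000A.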
331    
332     @Method:
333     @@ManakaiDOM:isForInternal: 1
334     @@Operator: ManakaiDOM|MUErrorHandler
335     @@enDesc:
336     When an <IF::ecore|ErrorInterface||ManakaiDOM|Perl> is <Perl::report>ed,
337     this method is invoked.
338    
339     The method calls the <cfg::DOMCore|error-handler> if the error is a
340     <IF::DOMCore|DOMError>. Otherwise, the error is re-thrown so that the
341     corresponding <Perl::catch> clause, if any, can catch it.
342     @@Param:
343     @@@Name: err
344     @@@Type: ecore|ErrorInterface||ManakaiDOM|Perl
345     @@@enDesc:
346     The reported error object.
347     @@Return:
348     @@@Type: DISPerl|Any
349     @@@enDesc:
350     If the <P::err> is a <IF::DOMCore|DOMError>, then the return value
351     of the error handler.
352    
353     {NOTE:: If the error is thrown, the method never returns.
354     }
355     @@@nullCase:
356     @@@@enDesc:
357     No error handler.
358     @@@PerlDef:
359     if ($err->isa (<IFName::DOMCore|DOMError||ManakaiDOM|ManakaiDOM>)) {
360     __DEEP{
361     A: {
362     my $cfg = $self-><AG::LSParser.domConfig>;
363     my $h = $cfg-><M::DOMCore|DOMConfiguration.getParameter>
364     ('error-handler');
365     $r = $h-><M::DOMCore|DOMErrorHandler.handleError> ($err);
366     } # A
367     }__;
368     } else {
369     $err-><M::ecore|ErrorInterface||ManakaiDOM|Perl.throw>;
370     }
371    
372     @DISPerl:dpgDef:
373    
374     /*
375     XML Document Entity
376    
377     document := prolog element *Misc
378     - *Char RestrictedChar *Char ;; [1]
379     */
380     rule DocumentEntity ($impl) : standalone {
381     my $doc : return;
382    
383     lang:Perl {
384     $doc = $impl-><M::DOMImpl.createDocument>;
385     $doc-><AS::Document.strictErrorChecking> (false);
386     }
387    
388     /*
389     prolog := XMLDecl? *Misc [doctypedecl *Misc] ;; [22]
390     */
391     ?lexmode 'DocumentStart';
392    
393 wakaba 1.3 ~? (XDO) {
394     &_XMLDeclaration_ ($doc => $doc);
395    
396     ~ (PIC) {
397     ?lexmode DocumentProlog;
398     } else {
399     ?lexmode DocumentProlog;
400     }
401     } else {
402     ?lexmode 'DocumentProlog';
403     }
404 wakaba 1.1
405     // *Misc
406 wakaba 1.2 ~* (CDO) {
407 wakaba 1.1 &_CommentDeclaration_ ($doc => $doc, $parent => $doc);
408    
409     ~ (MDC) {
410     ?lexmode DocumentProlog;
411     } else {
412     ?lexmode DocumentProlog;
413     }
414     } (PIO) {
415     &_ProcessingInstruction_ ($doc => $doc, $parent => $doc);
416    
417     ~ (PIC) {
418     ?lexmode 'DocumentProlog';
419     } else {
420     ?lexmode DocumentProlog;
421     }
422     } (S) {
423     //
424     }
425    
426     // doctypedecl
427     ~? (MDO) {
428     &_DocumentTypeDeclaration_ ($doc => $doc);
429    
430 wakaba 1.3 ~ (MDC) {
431     ?lexmode DocumentMisc;
432     } else {
433     ?lexmode DocumentMisc;
434     }
435     } else {
436     lang:Perl {
437     $self->{standalone} = true;
438     }
439     ?lexmode DocumentMisc;
440 wakaba 1.1 }
441    
442     // *Misc
443 wakaba 1.2 ~* (CDO) {
444 wakaba 1.1 &_CommentDeclaration_ ($doc => $doc, $parent => $doc);
445    
446     ~ (MDC) {
447     ?lexmode DocumentMisc;
448     } else {
449     ?lexmode DocumentMisc;
450     }
451     } (PIO) {
452     &_ProcessingInstruction_ ($doc => $doc, $parent => $doc);
453    
454     ~ (PIC) {
455     ?lexmode 'DocumentMisc';
456     } else {
457     ?lexmode DocumentMisc;
458     }
459     } (S) {
460     //
461     }
462    
463     // Document element
464     ~ (STAGO) {
465     &Element_ ($doc => $doc, $parent => $doc)
466     : unshift-current-token;
467     ~ (TAGC) {
468     ?lexmode DocumentEnd;
469     } else {
470     ?lexmode DocumentEnd;
471     }
472     } else {
473     ?lexmode 'DocumentEnd';
474     }
475    
476     // *Misc
477 wakaba 1.2 ~* (CDO) {
478 wakaba 1.1 &_CommentDeclaration_ ($doc => $doc, $parent => $doc);
479    
480     ~ (MDC) {
481     ?lexmode DocumentEnd;
482     } else {
483     ?lexmode DocumentEnd;
484     }
485     } (PIO) {
486     &_ProcessingInstruction_ ($doc => $doc, $parent => $doc);
487     ~ (PIC) {
488     ?lexmode 'DocumentEnd';
489     } else {
490     ?lexmode DocumentEnd;
491     }
492     } (S) {
493     //
494     }
495    
496     ~ (#EOF) { }
497    
498     lang:Perl {
499     if ($self->{has_error}) {
500     __EXCEPTION{DOMLS|PARSE_ERR::
501     }__;
502     }
503    
504     $doc-><AS::Document.strictErrorChecking> (true);
505     }
506     } // DocumentEntity
507    
508     /*
509     XML Declaration
510    
511     XMLDecl := '<?xml' VersionInfo
512     [EncodingDecl]
513     [SDDecl]
514     [S] '?>' ;; [23]
515    
516     NOTE: XML declaration is optional in XML 1.0
517     while it is required in XML 1.1.
518     */
519 wakaba 1.3 rule _XMLDeclaration_ ($doc) {
520     ?lexmode XMLDeclaration;
521    
522     ~ (S) { }
523    
524     ~ (Name == 'version') {
525     ~? (S) { }
526     ~ (VI) { }
527     ~? (S) { }
528 wakaba 1.1
529 wakaba 1.3 my $ver;
530     my $bad_token;
531    
532     ~ (LIT) {
533     ?lexmode AttributeValueLiteral;
534    
535     ~ (STRING) {
536     lang:Perl ($version => $token.value) {
537     $ver = $version;
538     $bad_token = $token;
539     }
540     }
541    
542     ~ (LIT) {
543     ?lexmode XMLDeclaration;
544     }
545     } (LITA) {
546     ?lexmode AttributeValueLiteralA;
547    
548     ~ (STRING) {
549     lang:Perl ($version => $token.value) {
550     $ver = $version;
551     $bad_token = $token;
552     }
553     }
554    
555     ~ (LITA) {
556     ?lexmode XMLDeclaration;
557     }
558     }
559    
560     lang:Perl : has-error {
561     unless ($ver eq '1.0' or $ver eq '1.1') {
562     my $location;
563     __CODE{xp|get-location-from-token::
564     $token => {$bad_token},
565     $result => {$location},
566     }__;
567     my $continue = __DOMCore:ERROR{xp|wf-unsupported-xml-version::
568     DOMCore|location => {$location},
569     xp|parent => {$doc},
570     infoset|version => {$ver},
571     xp|error-token => {$bad_token},
572     }__;
573     unless ($continue) {
574     __EXCEPTION{DOMLS|PARSE_ERR}__;
575     }
576     $self->{has_error} = true;
577     }
578     $doc-><AS::Document.xmlVersion> ($ver);
579     $self->{xml_version} = $ver;
580     }
581    
582     ~? (S) { }
583 wakaba 1.1 }
584    
585 wakaba 1.3 ~? (Name == 'encoding') {
586     ~? (S) { }
587     ~ (VI) { }
588     ~? (S) { }
589    
590     my $ver;
591     my $bad_token;
592    
593     ~ (LIT) {
594     ?lexmode AttributeValueLiteral;
595    
596     ~ (STRING) {
597     lang:Perl ($version => $token.value) {
598     $ver = $version;
599     $bad_token = $token;
600     }
601     }
602    
603     ~ (LIT) {
604     ?lexmode XMLDeclaration;
605     }
606     } (LITA) {
607     ?lexmode AttributeValueLiteralA;
608    
609     ~ (STRING) {
610     lang:Perl ($version => $token.value) {
611     $ver = $version;
612     $bad_token = $token;
613     }
614     }
615    
616     ~ (LITA) {
617     ?lexmode XMLDeclaration;
618     }
619     }
620    
621     lang:Perl : has-error {
622     unless ($ver =~ /\A[A-Za-z][A-Za-z0-9._-]*\z/) {
623     my $location;
624     __CODE{xp|get-location-from-token::
625     $token => {$bad_token},
626     $result => {$location},
627     }__;
628     my $continue = __DOMCore:ERROR{xp|wf-malformed-enc-name::
629     DOMCore|location => {$location},
630     xp|parent => {$doc},
631     xp|name => {$ver},
632     xp|error-token => {$bad_token},
633     }__;
634     unless ($continue) {
635     __EXCEPTION{DOMLS|PARSE_ERR}__;
636     }
637     $self->{has_error} = true;
638     }
639     $doc-><AS::Document.xmlEncoding> ($ver);
640     }
641    
642     ~? (S) { }
643 wakaba 1.1 }
644 wakaba 1.3
645     ~? (Name == 'standalone') {
646     ~? (S) { }
647     ~ (VI) { }
648     ~? (S) { }
649    
650     my $ver;
651     my $bad_token;
652    
653     ~ (LIT) {
654     ?lexmode AttributeValueLiteral;
655    
656     ~ (STRING) {
657     lang:Perl ($version => $token.value) {
658     $ver = $version;
659     $bad_token = $token;
660     }
661     }
662    
663     ~ (LIT) {
664     ?lexmode XMLDeclaration;
665     }
666     } (LITA) {
667     ?lexmode AttributeValueLiteralA;
668    
669     ~ (STRING) {
670     lang:Perl ($version => $token.value) {
671     $ver = $version;
672     $bad_token = $token;
673     }
674     }
675 wakaba 1.1
676 wakaba 1.3 ~ (LITA) {
677     ?lexmode XMLDeclaration;
678     }
679     }
680    
681     lang:Perl : has-error {
682     unless ($ver eq 'yes' or $ver eq 'no') {
683     my $location;
684     __CODE{xp|get-location-from-token::
685     $token => {$bad_token},
686     $result => {$location},
687     }__;
688     my $continue = __DOMCore:ERROR{xp|wf-malformed-xml-standalone::
689     DOMCore|location => {$location},
690     xp|parent => {$doc},
691     xp|name => {$ver},
692     xp|error-token => {$bad_token},
693     }__;
694     unless ($continue) {
695     __EXCEPTION{DOMLS|PARSE_ERR}__;
696     }
697     $self->{has_error} = true;
698     }
699     $doc-><AS::Document.xmlStandalone> ($ver eq 'yes');
700     $self->{standalone} = true if $ver eq 'yes';
701     }
702    
703     ~? (S) { }
704     }
705    
706     // ~ (PIC) { }
707     } // _XMLDeclaration_
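
The three value checks made by the _XMLDeclaration_ rule above (version, encoding name, standalone) are easy to try in isolation. The Python below is a minimal sketch, assuming the pseudo-attribute values have already been extracted as strings; the function name `check_xml_declaration` is hypothetical, while the error names and the encoding-name pattern are taken from the rule.

```python
import re

# Same pattern as the Perl test: a letter, then letters, digits, '.', '_', '-'.
ENC_NAME = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")


def check_xml_declaration(version, encoding=None, standalone=None):
    """Return the names of the errors the rule would report, in order."""
    errors = []
    if version not in ("1.0", "1.1"):
        errors.append("wf-unsupported-xml-version")
    if encoding is not None and not ENC_NAME.fullmatch(encoding):
        errors.append("wf-malformed-enc-name")
    if standalone is not None and standalone not in ("yes", "no"):
        errors.append("wf-malformed-xml-standalone")
    return errors
```

Passing `None` for `encoding` or `standalone` models a declaration that omits that pseudo-attribute, which the grammar allows.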
708 wakaba 1.1
709     /*
710     Comment Declaration
711    
712     Comment := '<!--' *(Char - '-' / '-' (Char - '-'))
713     '-->' ;; [15]
714 wakaba 1.4 */
715 wakaba 1.1 rule _CommentDeclaration_ ($doc, $parent) {
716     ?lexmode 'CommentDeclaration';
717    
718     ~? (STRING) {
719     lang:Perl ($data => $token.value) {
720     my $com = $doc-><M::Document.createComment> ($data);
721     $parent-><M::Node.appendChild> ($com);
722     }
723     } else {
724     lang:Perl {
725     my $com = $doc-><M::Document.createComment> ('');
726     $parent-><M::Node.appendChild> ($com);
727     }
728     }
729    
730     ~ (COM) {
731     ?lexmode MarkupDeclaration;
732     } else {
733     ?lexmode MarkupDeclaration;
734     }
735    
736     // ~ (MDC) { }
737 wakaba 1.4 } // _CommentDeclaration_
738    
739     rule _CommentDeclarationDTD ($doc) {
740     ?lexmode 'CommentDeclaration';
741    
742     ~? (STRING) {
743     //
744     }
745    
746     ~ (COM) {
747     ?lexmode MarkupDeclaration;
748     } else {
749     ?lexmode MarkupDeclaration;
750     }
751    
752     ~ (MDC) {
753     ?lexmode DTD;
754     } else {
755     ?lexmode DTD;
756     }
757     } // _CommentDeclarationDTD
758    
759 wakaba 1.1 /*
760     Processing Instruction
761    
762     PI := '<?' PITarget [S *Char - *Char '?>' *Char]
763     '?>' ;; [16]
764     */
765     rule _ProcessingInstruction_ ($doc, $parent) {
766     ?lexmode 'PIName';
767    
768     my $pi;
769    
770     ~ (Name) {
771 wakaba 1.3 lang:Perl ($name => $token.value) : has-error {
772 wakaba 1.1 if (lc $name eq 'xml') {
773 wakaba 1.3 my $location;
774     __CODE{xp|get-location-from-token::
775     $token => {$token},
776     $result => {$location},
777     }__;
778     my $continue = __DOMCore:ERROR{xp|wf-pi-target-is-xml::
779     xp|name => {$name},
780     DOMCore|location => {$location},
781     xp|parent => {$parent},
782     }__;
783     unless ($continue) {
784     __EXCEPTION{DOMLS|PARSE_ERR::
785     }__;
786     }
787     $self->{has_error} = true;
788 wakaba 1.1 }
789     ## TODO: Namespace well-formedness
790     $pi = $doc-><M::Document.createProcessingInstruction>
791     ($name);
792     }
793     }
794    
795     ~ (S) {
796     ?lexmode 'PIData';
797    
798     my $tdata;
799    
800     ~? (DATA) {
801     lang:Perl ($data => $token.value) {
802     $tdata = $data;
803     }
804     } else {
805     lang:Perl {
806     $tdata = '';
807     }
808     }
809    
810     lang:Perl {
811     $pi-><AS::Node.nodeValue> ($tdata);
812     }
813     }
814    
815     lang:Perl {
816     $parent-><M::Node.appendChild> ($pi);
817     ## TODO: PIs in document type declaration subsets
818     }
819    
820     // ~ (PIC) { }
821     } // _ProcessingInstruction_
822 wakaba 1.4
823     rule _ProcessingInstructionDTD ($doc) {
824     ?lexmode 'PIName';
825    
826     ~ (Name) {
827    
828     }
829    
830     ~ (S) {
831     ?lexmode 'PIData';
832    
833     ~? (DATA) {
834    
835     }
836     }
837    
838     ~ (PIC) {
839     ?lexmode DTD;
840     } else {
841     ?lexmode DTD;
842     }
843     } // _ProcessingInstructionDTD
844 wakaba 1.1
845     /*
846     Element content parsing mode
847    
848     element := EmptyElemTag /
849     STag content ETag ;; [39]
850     content := (CharData / element / Reference / CDSect /
851     PI / Comment) ;; [43]
852     */
853     rule Element_ ($doc, $parent) : standalone {
854     ?lexmode 'ElementContent';
855    
856     my $node; // Current "parent" node
857     my $nodes; // Node stack (w/o $node)
858     my $type; // Current "parent" element type QName
859     my $types; // Element type stack (w/o $type)
860     my $ns; // Current in-scope namespace bindings
861     my $nses; // Namespace binding stack (w/o $ns)
862    
863     lang:Perl {
864     $node = $parent;
865     $nodes = [];
866     $type = '';
867     $types = [];
868     $ns = {
869     xml => <Q::xml:>,
870     xmlns => <Q::xmlns:>,
871     };
872     $nses = [];
873     }
874    
875     ~* : name => CONTENT
876     (CharData) {
877     // Character data
878     lang:Perl ($data => $token.value) {
879     $node-><M::Node.appendChild>
880     ($doc-><M::Document.createTextNode> ($data));
881     }
882     } (STAGO) {
883     // Start tag or empty element tag
884    
885     ?lexmode 'StartTag';
886    
887     ~ (Name) {
888     my $attrs;
889     lang:Perl ($name => $token.value) {
890     push @{$types}, $type;
891     $type = $name;
892     $attrs = {};
893     }
894    
895     ~? (S) {
896     &AttributeSpecificationList
897     ($doc => $doc, $attrs => $attrs);
898     }
899    
900     my $el;
901    
902     lang:Perl {
903     push @{$nses}, $ns;
904     $ns = {%$ns};
905    
906     my %gattr;
907     my %lattr;
908     for my $atqname (keys %$attrs) {
909     my ($pfx, $lname) = split /:/, $atqname;
910     if (defined $lname) { ## Global attribute
911     ## TODO: Namespace well-formedness (lname is NCName)
912     if ($pfx eq 'xmlns') {
913     my $nsuri = $attrs->{$atqname}->{value};
914     if ($lname eq 'xml' and
915     $nsuri ne <Q::xml:>) {
916     ## TODO: error
917     } elsif ($lname eq 'xmlns') {
918     ## TODO: error
919     }
920     if ($nsuri eq '') {
921     ## TODO: error in XML 1.0
922     } elsif ($nsuri eq <Q::xml:> and
923     $lname ne 'xml') {
924     ## TODO: error
925     } elsif ($nsuri eq <Q::xmlns:>) {
926     ## TODO: error
927     }
928     $ns->{$lname} = $attrs->{$atqname}->{value};
929     delete $ns->{$lname} unless length $ns->{$lname};
930     } elsif ($pfx eq '') {
931     ## TODO: pfx is not NCName error
932     } else {
933     if ($gattr{$pfx}->{$lname}) {
934     ## TODO: Namespace well-formedness error
935     }
936     }
937     $gattr{$pfx}->{$lname} = $attrs->{$atqname};
938     } else { ## Local attribute
939     if ($pfx eq 'xmlns') {
940     $ns->{''} = $attrs->{xmlns}->{value};
941     delete $ns->{''} unless length $ns->{''};
942     } else {
943     $lattr{$pfx} = $attrs->{$atqname};
944     }
945     }
946     }
947    
948     my ($pfx, $lname) = split /:/, $type;
949     my $nsuri;
950     ## TODO: lname is NCName?
951     if (defined $lname) { ## Prefixed namespace
952     if ($pfx eq '') {
953     ## TODO: pfx is not NCName error
954     }
955     if (defined $ns->{$pfx}) {
956     $nsuri = $ns->{$pfx};
957     } else {
958     ## TODO: namespace ill-formed
959     }
960     } else { ## Default namespace
961     $nsuri = $ns->{''};
962     }
963    
964     $el = $doc-><M::Document.createElementNS>
965     ($nsuri, $type);
966    
967     if ($attrs->{xmlns}) {
968     my $attr = $doc-><M::Document.createAttributeNS>
969     (<Q::xmlns:>, 'xmlns');
970     for (@{$attrs->{xmlns}->{nodes}}) {
971     $attr-><M::Node.appendChild> ($_);
972     }
973     $el-><M::Element.setAttributeNodeNS> ($attr);
974     }
975    
976     for my $lname (keys %lattr) {
977     my $attr = $doc-><M::Document.createAttributeNS>
978     (null, $lname);
979     for (@{$lattr{$lname}->{nodes}}) {
980     $attr-><M::Node.appendChild> ($_);
981     }
982     $el-><M::Element.setAttributeNodeNS> ($attr);
983     }
984    
985     for my $pfx (keys %gattr) {
986     for my $lname (keys %{$gattr{$pfx}}) {
987     my $attr = $doc-><M::Document.createAttributeNS>
988     ($ns->{$pfx}, $pfx.':'.$lname);
989     for (@{$gattr{$pfx}->{$lname}->{nodes}}) {
990     $attr-><M::Node.appendChild> ($_);
991     }
992     $el-><M::Element.setAttributeNodeNS> ($attr);
993     }
994     }
995    
996     $node-><M::Node.appendChild> ($el);
997     }
998    
999     ~ (TAGC) {
1000     lang:Perl {
1001     push @{$nodes}, $node;
1002     $node = $el;
1003     }
1004     ?lexmode ElementContent;
1005     } (MTAGC) {
1006     lang:Perl {
1007     $ns = pop @{$nses};
1008     $type = pop @{$types};
1009     }
1010     ?lexmode ElementContent;
1011     } else {
1012     ?lexmode ElementContent;
1013     }
1014     } else {
1015     ?lexmode ElementContent;
1016     }
1017    
1018     } (ETAGO) {
1019     // End tag
1020    
1021     ?lexmode 'EndTag';
1022    
1023     my $is_docel;
1024    
1025     ~ (Name) {
1026 wakaba 1.3 lang:Perl ($name => $token.value) : has-error {
1027 wakaba 1.1 if ($name eq $type) {
1028     $type = pop @{$types};
1029     if ($type eq '') {
1030     $is_docel = true;
1031     }
1032     $node = pop @{$nodes};
1033     $ns = pop @{$nses};
1034     } else {
1035 wakaba 1.3 my $location;
1036     __CODE{xp|get-location-from-token::
1037     $token => $token,
1038     $result => $location,
1039     }__;
1040     my $continue = __DOMCore:ERROR{xp|wf-element-type-match::
1041     DOMCore:location => {$location},
1042     xp|token => {$token},
1043     xp|expected-element-type => {$type},
1044     xp|actual-element-type => {$name},
1045     xp|node => {$node},
1046     }__;
1047     unless ($continue) {
1048     __EXCEPTION{DOMLS|PARSE_ERR}__;
1049     }
1050     $self->{has_error} = true;
1051 wakaba 1.1 }
1052     }
1053     }
1054    
1055     ~? (S) { }
1056    
1057     if-true ($is_docel) {
1058 wakaba 1.3 lang:Perl : has-error {
1059 wakaba 1.1 if (@{$types}) {
1060 wakaba 1.3 my $location;
1061     __CODE{xp|get-location-from-token::
1062     $token => $token,
1063     $result => $location,
1064     }__;
1065     for my $type (reverse @{$types}) {
1066     my $continue = __DOMCore:ERROR{xp|wf-no-end-tag::
1067     DOMCore:location => {$location},
1068     xp|token => {$token},
1069     xp|expected-element-type => {$type},
1070     xp|node => {$node},
1071     }__;
1072     unless ($continue) {
1073     __EXCEPTION{DOMLS|PARSE_ERR}__;
1074     }
1075     $node = shift @{$nodes};
1076     }
1077     $self->{has_error} = true;
1078 wakaba 1.1 }
1079     }
1080     return;
1081     }
1082    
1083     ~ (TAGC) {
1084     ?lexmode ElementContent;
1085     } else {
1086     ?lexmode 'ElementContent';
1087     }
1088    
1089     } (HCRO) {
1090     &_HexadecimalCharacterReference_
1091     ($doc => $doc, $parent => $node);
1092    
1093     ~ (REFC) {
1094     ?lexmode 'ElementContent';
1095     } else {
1096     ?lexmode ElementContent;
1097     }
1098     } (CRO) {
1099     &_NumericCharacterReference_
1100     ($doc => $doc, $parent => $node);
1101    
1102     ~ (REFC) {
1103     ?lexmode 'ElementContent';
1104     } else {
1105     ?lexmode ElementContent;
1106     }
1107     } (ERO) {
1108     &_GeneralEntityReference_
1109     ($doc => $doc, $parent => $node);
1110    
1111     ~ (REFC) {
1112     ?lexmode 'ElementContent';
1113     } else {
1114     ?lexmode ElementContent;
1115     }
1116     } (CDO) {
1117     &_CommentDeclaration_ ($doc => $doc, $parent => $node);
1118    
1119     ~ (MDC) {
1120     ?lexmode ElementContent;
1121     } else {
1122     ?lexmode ElementContent;
1123     }
1124     } (CDSO) {
1125     &_CDATASection_ ($doc => $doc, $parent => $node);
1126    
1127     ~ (MSE) {
1128     ?lexmode 'ElementContent';
1129     } else {
1130     ?lexmode ElementContent;
1131     }
1132     } (PIO) {
1133     &_ProcessingInstruction_ ($doc => $doc, $parent => $node);
1134    
1135     ~ (PIC) {
1136     ?lexmode 'ElementContent';
1137     } else {
1138     ?lexmode ElementContent;
1139     }
1140     }
1141 wakaba 1.3
1142     ~ (#NONE) { }
1143 wakaba 1.1 } // Element_
1144    
1145     rule AttributeSpecificationList ($doc, $attrs)
1146     : standalone
1147     {
1148     ?lexmode 'StartTag';
1149    
1150     my $i;
1151     lang:Perl {
1152     $i = 0;
1153     }
1154    
1155     ~* (Name) {
1156     my $atqname;
1157     lang:Perl ($name => $token.value) {
1158     $atqname = $name;
1159     }
1160    
1161     my $vals;
1162     lang:Perl {
1163     if ($attrs->{$atqname}) {
1164 wakaba 1.3 my $location;
1165     __CODE{xp|get-location-from-token::
1166     $token => $token,
1167     $result => $location,
1168     }__;
1169     my $continue = __DOMCore:ERROR{xp|wf-unique-att-spec::
1170     DOMCore:location => {$location},
1171     xp|token => {$token},
1172     xp|name => {$atqname},
1173     }__;
1174     unless ($continue) {
1175     __EXCEPTION{DOMLS|PARSE_ERR}__;
1176     }
1177     $self->{has_error} = true;
1178 wakaba 1.1 }
1179    
1180     $vals = $attrs->{$atqname} = {
1181     nodes => [],
1182     value => '',
1183     index => $i++,
1184     };
1185     }
1186 wakaba 1.3
1187     ~? (S) { }
1188     ~ (VI) { }
1189     ~? (S) { }
1190 wakaba 1.1
1191     ~ (LIT) {
1192     &_AttributeValueSpecification_
1193     ($doc => $doc, $vals => $vals);
1194    
1195     ~ (LIT) {
1196     ?lexmode StartTag;
1197     } else {
1198     ?lexmode StartTag;
1199     }
1200     } (LITA) {
1201     &_AttributeValueSpecificationA_
1202     ($doc => $doc, $vals => $vals);
1203    
1204     ~ (LITA) {
1205     ?lexmode StartTag;
1206     } else {
1207     ?lexmode StartTag;
1208     }
1209     }
1210     } (S) : separator : terminator? { }
1211     } // AttributeSpecificationList
1212    
1213     rule _AttributeValueSpecification_ ($doc, $vals) {
1214     // ~ (LIT) { }
1215     ?lexmode 'AttributeValueLiteral';
1216    
1217     ~* (STRING) {
1218     lang:Perl ($value => $token.value) {
1219     $value =~ s/[\x09\x0A\x0D]/ /g;
1220     my $text = $doc-><M::Document.createTextNode> ($value);
1221     push @{$vals->{nodes}}, $text;
1222     $vals->{value} .= $value;
1223     }
1224     } (HCRO) {
1225     &_HexadecimalCharacterReferenceV_
1226     ($doc => $doc, $vals => $vals);
1227    
1228     ~ (REFC) {
1229     ?lexmode AttributeValueLiteral;
1230     } else {
1231     ?lexmode AttributeValueLiteral;
1232     }
1233     } (CRO) {
1234     &_NumericCharacterReferenceV_
1235     ($doc => $doc, $vals => $vals);
1236    
1237     ~ (REFC) {
1238     ?lexmode AttributeValueLiteral;
1239     } else {
1240     ?lexmode AttributeValueLiteral;
1241     }
1242     } (ERO) {
1243     // TODO: Attribute value normalization
1244     &_GeneralEntityReferenceV_
1245     ($doc => $doc, $vals => $vals);
1246    
1247     ~ (REFC) {
1248     ?lexmode AttributeValueLiteral;
1249     } else {
1250     ?lexmode AttributeValueLiteral;
1251     }
1252     }
1253    
1254     // ~ (LIT) { }
1255     } // _AttributeValueSpecification_
1256    
1257     rule _AttributeValueSpecificationA_ ($doc, $vals) {
1258     // ~ (LITA) { }
1259     ?lexmode 'AttributeValueLiteralA';
1260    
1261     ~* (STRING) {
1262     lang:Perl ($value => $token.value) {
1263     $value =~ s/[\x09\x0A\x0D]/ /g;
1264     my $text = $doc-><M::Document.createTextNode> ($value);
1265     push @{$vals->{nodes}}, $text;
1266     $vals->{value} .= $value;
1267     }
1268     } (HCRO) {
1269     &_HexadecimalCharacterReferenceV_
1270     ($doc => $doc, $vals => $vals);
1271    
1272     ~ (REFC) {
1273     ?lexmode AttributeValueLiteralA;
1274     } else {
1275     ?lexmode AttributeValueLiteralA;
1276     }
1277     } (CRO) {
1278     &_NumericCharacterReferenceV_
1279     ($doc => $doc, $vals => $vals);
1280    
1281     ~ (REFC) {
1282     ?lexmode AttributeValueLiteralA;
1283     } else {
1284     ?lexmode AttributeValueLiteralA;
1285     }
1286     } (ERO) {
1287     // TODO: Attribute value normalization
1288     &_GeneralEntityReferenceV_
1289     ($doc => $doc, $vals => $vals);
1290    
1291     ~ (REFC) {
1292     ?lexmode AttributeValueLiteralA;
1293     } else {
1294     ?lexmode AttributeValueLiteralA;
1295     }
1296     }
1297    
1298     // ~ (LITA) { }
1299     } // _AttributeValueSpecificationA_
1300    
1301     /*
1302     CDATA Section Content Parsing Mode
1303     */
1304     rule _CDATASection_ ($doc, $parent) {
1305     ?lexmode 'CDATASectionContent';
1306    
1307     my $cdata;
1308    
1309 wakaba 1.2 ~? (CData) {
1310 wakaba 1.1 lang:Perl ($data => $token.value) {
1311     $cdata = $data;
1312     }
1313     } else {
1314     lang:Perl {
1315     $cdata = '';
1316     }
1317     }
1318    
1319     lang:Perl {
1320     my $cdsect = $doc-><M::Document.createCDATASection>
1321     ($cdata);
1322     $parent-><M::Node.appendChild> ($cdsect);
1323     }
1324    
1325     // ~ (MSE) { }
1326     } // _CDATASection_
1327    
1328     rule _NumericCharacterReference_ ($doc, $parent) {
1329     ?lexmode 'NumericCharacterReference';
1330    
1331     ~ (NUMBER) {
1332 wakaba 1.3 lang:Perl ($num => $token.value) : has-error {
1333     $num += 0;
1334     unless (
1335     ($self->{xml_version} eq '1.0' and
1336     ((0x0020 <= $num and $num <= 0xD7FF) or
1337     (0xE000 <= $num and $num <= 0xFFFD) or
1338     (0x10000 <= $num and $num <= 0x10FFFF) or
1339     $num == 0x9 or $num == 0xA or $num == 0xD)) or
1340     ($self->{xml_version} eq '1.1' and
1341     ((0x0001 <= $num and $num <= 0xD7FF) or
1342     (0xE000 <= $num and $num <= 0xFFFD) or
1343     (0x10000 <= $num and $num <= 0x10FFFF)))
1344     ) {
1345     my $location;
1346     __CODE{xp|get-location-from-token::
1347     $token => $token,
1348     $result => $location,
1349     }__;
1350     my $continue = __DOMCore:ERROR{xp|wf-legal-character::
1351     DOMCore:location => {$location},
1352     xp|token => {$token},
1353     xp|character-number => {$num},
1354     xp|parent => {$parent},
1355     }__;
1356     unless ($continue) {
1357     __EXCEPTION{DOMLS|PARSE_ERR}__;
1358     }
1359     $self->{has_error} = true;
1360     }
1361     my $ncr = $doc-><M::Document.createTextNode> (chr $num);
1362 wakaba 1.1 $parent-><M::Node.appendChild> ($ncr);
1363     }
1364     }
1365    
1366     // ~ (REFC) { }
1367     } // _NumericCharacterReference_
1368    
1369     rule _NumericCharacterReferenceV_ ($doc, $vals) {
1370     ?lexmode 'NumericCharacterReference';
1371    
1372     ~ (NUMBER) {
1373 wakaba 1.3 lang:Perl ($num => $token.value) : has-error {
1374     $num += 0;
1375     unless (
1376     ($self->{xml_version} eq '1.0' and
1377     ((0x0020 <= $num and $num <= 0xD7FF) or
1378     (0xE000 <= $num and $num <= 0xFFFD) or
1379     (0x10000 <= $num and $num <= 0x10FFFF) or
1380     $num == 0x9 or $num == 0xA or $num == 0xD)) or
1381     ($self->{xml_version} eq '1.1' and
1382     ((0x0001 <= $num and $num <= 0xD7FF) or
1383     (0xE000 <= $num and $num <= 0xFFFD) or
1384     (0x10000 <= $num and $num <= 0x10FFFF)))
1385     ) {
1386     my $location;
1387     __CODE{xp|get-location-from-token::
1388     $token => $token,
1389     $result => $location,
1390     }__;
1391     my $continue = __DOMCore:ERROR{xp|wf-legal-character::
1392     DOMCore:location => {$location},
1393     xp|token => {$token},
1394     xp|character-number => {$num},
1395     }__;
1396     unless ($continue) {
1397     __EXCEPTION{DOMLS|PARSE_ERR}__;
1398     }
1399     $self->{has_error} = true;
1400     }
1401 wakaba 1.1 my $ncr = $doc-><M::Document.createTextNode>
1402     (my $char = chr (0+$num));
1403     push @{$vals->{nodes}}, $ncr;
1404     $vals->{value} .= $char;
1405     }
1406     }
1407    
1408     // ~ (REFC) { }
1409     } // _NumericCharacterReferenceV_
1410    
1411     rule _HexadecimalCharacterReference_ ($doc, $parent) {
1412     ?lexmode 'HexadecimalCharacterReference';
1413    
1414     ~ (Hex) {
1415 wakaba 1.3 lang:Perl ($num => $token.value) : has-error {
1416     $num = hex $num; # token value is a string of hexadecimal digits
1417     unless (
1418     ($self->{xml_version} eq '1.0' and
1419     ((0x0020 <= $num and $num <= 0xD7FF) or
1420     (0xE000 <= $num and $num <= 0xFFFD) or
1421     (0x10000 <= $num and $num <= 0x10FFFF) or
1422     $num == 0x9 or $num == 0xA or $num == 0xD)) or
1423     ($self->{xml_version} eq '1.1' and
1424     ((0x0001 <= $num and $num <= 0xD7FF) or
1425     (0xE000 <= $num and $num <= 0xFFFD) or
1426     (0x10000 <= $num and $num <= 0x10FFFF)))
1427     ) {
1428     my $location;
1429     __CODE{xp|get-location-from-token::
1430     $token => $token,
1431     $result => $location,
1432     }__;
1433     my $continue = __DOMCore:ERROR{xp|wf-legal-character::
1434     DOMCore:location => {$location},
1435     xp|token => {$token},
1436     xp|character-number => {$num},
1437     xp|parent => {$parent},
1438     }__;
1439     unless ($continue) {
1440     __EXCEPTION{DOMLS|PARSE_ERR}__;
1441     }
1442     $self->{has_error} = true;
1443     }
1444 wakaba 1.1 my $ncr = $doc-><M::Document.createTextNode>
1445     (chr $num);
1446     $parent-><M::Node.appendChild> ($ncr);
1447     }
1448     }
1449    
1450     // ~ (REFC) { }
1451     } // _HexadecimalCharacterReference_
1452    
1453 wakaba 1.3 rule _HexadecimalCharacterReferenceV_ ($doc, $vals) {
1454 wakaba 1.1 ?lexmode 'HexadecimalCharacterReference';
1455    
1456     ~ (Hex) {
1457 wakaba 1.3 lang:Perl ($num => $token.value) : has-error {
1458     $num = hex $num; # token value is a string of hexadecimal digits
1459     unless (
1460     ($self->{xml_version} eq '1.0' and
1461     ((0x0020 <= $num and $num <= 0xD7FF) or
1462     (0xE000 <= $num and $num <= 0xFFFD) or
1463     (0x10000 <= $num and $num <= 0x10FFFF) or
1464     $num == 0x9 or $num == 0xA or $num == 0xD)) or
1465     ($self->{xml_version} eq '1.1' and
1466     ((0x0001 <= $num and $num <= 0xD7FF) or
1467     (0xE000 <= $num and $num <= 0xFFFD) or
1468     (0x10000 <= $num and $num <= 0x10FFFF)))
1469     ) {
1470     my $location;
1471     __CODE{xp|get-location-from-token::
1472     $token => $token,
1473     $result => $location,
1474     }__;
1475     my $continue = __DOMCore:ERROR{xp|wf-legal-character::
1476     DOMCore:location => {$location},
1477     xp|token => {$token},
1478     xp|character-number => {$num},
1479     }__;
1480     unless ($continue) {
1481     __EXCEPTION{DOMLS|PARSE_ERR}__;
1482     }
1483     $self->{has_error} = true;
1484     }
1485 wakaba 1.1 my $ncr = $doc-><M::Document.createTextNode>
1486     (my $char = chr $num);
1487     push @{$vals->{nodes}}, $ncr;
1488     $vals->{value} .= $char;
1489     }
1490     }
1491    
1492     // ~ (REFC) { }
1493     } // _HexadecimalCharacterReferenceV_
1494    
1495     rule _GeneralEntityReference_ ($doc, $parent) {
1496     // TODO: Expansion
1497     ?lexmode 'EntityReference';
1498    
1499     ~ (Name) {
1500     lang:Perl ($name => $token.value) {
1501     ## TODO: Namespace well-formedness
1502     ## TODO: Entity declared constraints
1503     my $er = $doc-><M::Document.createEntityReference>
1504     ($name);
1505     $parent-><M::Node.appendChild> ($er);
1506     }
1507     }
1508    
1509     // ~ (REFC) { }
1510     } // _GeneralEntityReference_
1511    
1512     rule _GeneralEntityReferenceV_ ($doc, $vals) {
1513     // TODO: Expansion
1514     ?lexmode 'EntityReference';
1515    
1516     ~ (Name) {
1517     lang:Perl ($name => $token.value) {
1518     ## TODO: Namespace well-formedness
1519     ## TODO: Entity declared constraints
1520     my $er = $doc-><M::Document.createEntityReference>
1521     ($name);
1522     push @{$vals->{nodes}}, $er;
1523     }
1524     }
1525    
1526     // ~ (REFC) { }
1527     } // _GeneralEntityReferenceV_
1528    
1529     /*
1530 wakaba 1.4 Document Type Declaration
1531 wakaba 1.1 */
1532 wakaba 1.4 rule _DocumentTypeDeclaration_ ($doc) {
1533     ?lexmode MarkupDeclaration;
1534    
1535     ~ (Name == 'DOCTYPE') { }
1536    
1537     ~ (S) { }
1538    
1539     // Document type name
1540     ~ (Name) {
1541    
1542     }
1543    
1544     ~? (S) {
1545     ~? (Name == 'PUBLIC') {
1546     ~ (S) { }
1547    
1548     &PubidLiteral ($doc => $doc);
1549    
1550     ~ (S) { }
1551    
1552     &SystemLiteral ($doc => $doc);
1553    
1554     ~? (S) { }
1555     } (Name == 'SYSTEM') {
1556     ~ (S) { }
1557    
1558     &SystemLiteral ($doc => $doc);
1559    
1560     ~? (S) { }
1561     }
1562     }
1563    
1564     ~? (DSO) {
1565     &InternalSubset ($doc => $doc);
1566    
1567     ~ (DSC) {
1568     ?lexmode MarkupDeclaration;
1569     } else {
1570     ?lexmode MarkupDeclaration;
1571     }
1572    
1573     ~? (S) { }
1574     }
1575    
1576     // TODO: set $self->{standalone} to true if there is only an
1577     // internal subset containing no parameter entity references
1578    
1579     // ~ (MDC) { }
1580     } // _DocumentTypeDeclaration_
1581    
1582     rule PubidLiteral ($doc) {
1583     ~ (LIT) {
1584     ?lexmode SystemLiteral;
1585    
1586     ~? (STRING) {
1587     // TODO: Character range check and normalization are required
1588    
1589     }
1590    
1591     ~ (LIT) {
1592     ?lexmode MarkupDeclaration;
1593     } else {
1594     ?lexmode MarkupDeclaration;
1595     }
1596     } (LITA) {
1597     ?lexmode SystemLiteralA;
1598    
1599     ~? (STRING) {
1600     // TODO: Character range check and normalization are required
1601    
1602     }
1603    
1604     ~ (LITA) {
1605     ?lexmode MarkupDeclaration;
1606     } else {
1607     ?lexmode MarkupDeclaration;
1608     }
1609     }
1610     } // PubidLiteral
1611    
1612     rule SystemLiteral ($doc) {
1613     ~ (LIT) {
1614     ?lexmode SystemLiteral;
1615     &_SystemLiteral ($doc => $doc);
1616     } (LITA) {
1617     ?lexmode SystemLiteralA;
1618     &_SystemLiteral ($doc => $doc);
1619     }
1620     } // SystemLiteral
1621    
1622     rule _SystemLiteral ($doc) {
1623     ~? (STRING) {
1624    
1625     }
1626    
1627     ~ (LIT) {
1628     ?lexmode MarkupDeclaration;
1629     } (LITA) {
1630     ?lexmode MarkupDeclaration;
1631     } else {
1632     ?lexmode MarkupDeclaration;
1633     }
1634     } // _SystemLiteral
1635    
1636     /*
1637     DTD Internal Subset
1638    
1639     intSubset := *(markupdecl / DeclSep) ;; [28b]
1640     */
1641     rule InternalSubset ($doc) {
1642     ?lexmode DTD;
1643    
1644     ~* (MDO) {
1645     ?lexmode MarkupDeclaration;
1646    
1647     ~ (Name == 'ELEMENT') {
1648     &_ElementDeclaration ($doc => $doc);
1649     } (Name == 'ATTLIST') {
1650     &_AttlistDeclaration ($doc => $doc);
1651     } (Name == 'ENTITY') {
1652     &_EntityDeclaration ($doc => $doc);
1653     } (Name == 'NOTATION') {
1654     &_NotationDeclaration ($doc => $doc);
1655     }
1656     } (S) {
1657     //
1658     } (CDO) {
1659     &_CommentDeclarationDTD ($doc => $doc);
1660     } (PIO) {
1661     &_ProcessingInstructionDTD ($doc => $doc);
1662     } (PERO) {
1663     ?lexmode EntityReference;
1664    
1665     ~ (Name) {
1666    
1667     }
1668    
1669     ~ (REFC) {
1670     ?lexmode DTD;
1671     } else {
1672     ?lexmode DTD;
1673     }
1674     }
1675     } // InternalSubset
1676    
1677     rule _ElementDeclaration ($doc) {
1678     // ~ (MDO) { }
1679     // ?lexmode MarkupDeclaration
1680     // ~ (Name == 'ELEMENT') { }
1681    
1682     ~ (S) { }
1683    
1684     ~ (Name) {
1685    
1686     }
1687    
1688     ?lexmode ElementDeclaration;
1689    
1690     ~ (S) { }
1691    
1692     // contentspec
1693     ~ (MGO) {
1694     &_ContentModel ($doc => $doc);
1695     } (Name == 'EMPTY') {
1696    
1697     } (Name == 'ANY') {
1698    
1699     }
1700    
1701     ~? (S) { }
1702    
1703     ~ (MDC) {
1704     ?lexmode DTD;
1705     } else {
1706     ?lexmode DTD;
1707     }
1708     } // _ElementDeclaration
1709    
1710     rule _ContentModel ($doc) {
1711     // ~ (MGO) { }
1712    
1713     ~? (S) { }
1714    
1715     ~ (Name) {
1716     // Element content model
1717     &_ModelGroup ($doc => $doc)
1718     : unshift-current-token;
1719    
1720     ~? (OPT) {
1721    
1722     } (REP) {
1723    
1724     } (PLUS) {
1725    
1726     }
1727    
1728     } (MGO) {
1729     &_ModelGroup ($doc => $doc)
1730     : unshift-current-token;
1731    
1732     ~? (OPT) {
1733    
1734     } (REP) {
1735    
1736     } (PLUS) {
1737    
1738     }
1739    
1740     } (PCDATA) {
1741     // Mixed content declaration
1742     ~? (S) { }
1743    
1744     ~* (OR) {
1745     ~? (S) { }
1746    
1747     ~ (Name) {
1748    
1749     }
1750    
1751     ~? (S) { }
1752     }
1753    
1754     ~ (MGC) { }
1755    
1756     ~? (REP) {
1757    
1758     } else {
1759     // TODO: error if |Name|
1760     }
1761    
1762    
1763     }
1764     } // _ContentModel
1765    
1766     rule _ModelGroup ($doc)
1767     : standalone
1768     : recursive
1769     {
1770     // ~ (MGO) { }
1771     // ~? (S) { }
1772    
1773     &ContentParticle ($doc => $doc);
1774    
1775     ~? (S) { }
1776    
1777     ~? (OR) {
1778     ~? (S) { }
1779    
1780     &ContentParticle ($doc => $doc);
1781    
1782     ~? (S) { }
1783    
1784     ~* (OR) {
1785     ~? (S) { }
1786    
1787     &ContentParticle ($doc => $doc);
1788    
1789     ~? (S) { }
1790     }
1791    
1792     } (SEQ) {
1793     ~? (S) { }
1794    
1795     &ContentParticle ($doc => $doc);
1796    
1797     ~? (S) { }
1798    
1799     ~* (SEQ) {
1800     ~? (S) { }
1801    
1802     &ContentParticle ($doc => $doc);
1803    
1804     ~? (S) { }
1805     }
1806    
1807     }
1808    
1809     ~ (MGC) { }
1810     } // _ModelGroup
1811    
1812     rule ContentParticle ($doc) {
1813     ~ (Name) {
1814    
1815     } (MGO) {
1816     ~? (S) { }
1817    
1818     &_ModelGroup ($doc => $doc);
1819     }
1820    
1821     ~? (OPT) {
1822    
1823     } (REP) {
1824    
1825     } (PLUS) {
1826    
1827     }
1828     } // ContentParticle
1829    
1830     rule _AttlistDeclaration ($doc) {
1831     // ~ (MDO) { }
1832     // ~ (Name == 'ATTLIST') { }
1833    
1834     ?lexmode AttlistDeclaration;
1835    
1836     ~ (S) { }
1837    
1838     ~ (Name) {
1839    
1840     }
1841    
1842     ~? (S) { }
1843    
1844     ~* (Name) {
1845    
1846     ~ (S) { }
1847    
1848     // AttType
1849     ~ (Name == 'NOTATION') {
1850     ~ (S) { }
1851    
1852     ~ (EGO) {
1853     ~? (S) { }
1854    
1855     ~ (Name) {
1856    
1857     ~? (S) { }
1858     } (OR) : separator {
1859     ~? (S) { }
1860     }
1861    
1862     ~ (EGC) { }
1863     }
1864     } (Name) {
1865     // TODO: check keyword
1866    
1867     } (EGO) {
1868     ?lexmode Enumeration;
1869    
1870     ~? (S) { }
1871    
1872     ~ (Nmtoken) {
1873    
1874     ~? (S) { }
1875     } (OR) : separator {
1876     ~? (S) { }
1877     }
1878    
1879     ~ (EGC) {
1880     ?lexmode AttlistDeclaration;
1881     } else {
1882     ?lexmode AttlistDeclaration;
1883     }
1884    
1885     }
1886    
1887     ~ (S) { }
1888    
1889     // DefaultDecl
1890     ~ (RNI) {
1891     ~ (Name == 'REQUIRED') {
1892    
1893     } (Name == 'IMPLIED') {
1894    
1895     } (Name == 'FIXED') {
1896     ~ (S) { }
1897    
1898     ~ (LIT) {
1899     my $vals;
1900     lang:Perl {
1901     $vals = {nodes => [], value => ''};
1902     }
1903    
1904     &_AttributeValueSpecification_ ($doc => $doc, $vals => $vals);
1905    
1906     ~ (LIT) {
1907     ?lexmode AttlistDeclaration;
1908     } else {
1909     ?lexmode AttlistDeclaration;
1910     }
1911     } (LITA) {
1912     my $vals;
1913     lang:Perl {
1914     $vals = {nodes => [], value => ''};
1915     }
1916    
1917     &_AttributeValueSpecificationA_ ($doc => $doc, $vals => $vals);
1918    
1919     ~ (LITA) {
1920     ?lexmode AttlistDeclaration;
1921     } else {
1922     ?lexmode AttlistDeclaration;
1923     }
1924     }
1925     }
1926    
1927     } (LIT) {
1928     my $vals;
1929     lang:Perl {
1930     $vals = {nodes => [], value => ''};
1931     }
1932    
1933     &_AttributeValueSpecification_ ($doc => $doc, $vals => $vals);
1934    
1935     ~ (LIT) {
1936     ?lexmode AttlistDeclaration;
1937     } else {
1938     ?lexmode AttlistDeclaration;
1939     }
1940     } (LITA) {
1941     my $vals;
1942     lang:Perl {
1943     $vals = {nodes => [], value => ''};
1944     }
1945    
1946     &_AttributeValueSpecificationA_ ($doc => $doc, $vals => $vals);
1947    
1948     ~ (LITA) {
1949     ?lexmode AttlistDeclaration;
1950     } else {
1951     ?lexmode AttlistDeclaration;
1952     }
1953     }
1954    
1955     } (S) : separator : terminator? {
1956     //
1957     }
1958    
1959     ~ (MDC) {
1960     ?lexmode DTD;
1961     } else {
1962     ?lexmode DTD;
1963     }
1964     } // _AttlistDeclaration
1965    
1966     rule _EntityDeclaration ($doc) {
1967     // ~ (MDO) { }
1968     // ~ (Name == 'ENTITY') { }
1969    
1970     ~ (S) { }
1971    
1972     ~? (PERO) {
1973     ~ (S) { }
1974     }
1975    
1976     ~ (Name) {
1977    
1978     }
1979    
1980     ~ (S) { }
1981    
1982     ~ (LIT) {
1983     &_EntityValue ($doc => $doc);
1984     } (LITA) {
1985     &_EntityValueA ($doc => $doc);
1986     } (Name == 'PUBLIC') {
1987     ~ (S) { }
1988    
1989     &PubidLiteral ($doc => $doc);
1990    
1991     ~ (S) { }
1992    
1993     &SystemLiteral ($doc => $doc);
1994     } (Name == 'SYSTEM') {
1995     ~ (S) { }
1996    
1997     &SystemLiteral ($doc => $doc);
1998     }
1999    
2000     ~? (S) {
2001     ~? (Name == 'NDATA') {
2002     // TODO: error if parameter entity
2003    
2004     ~ (S) { }
2005    
2006     ~ (Name) {
2007    
2008     }
2009    
2010     ~? (S) { }
2011     }
2012     }
2013    
2014     ~ (MDC) {
2015     ?lexmode DTD;
2016     } else {
2017     ?lexmode DTD;
2018     }
2019     } // _EntityDeclaration
2020    
2021     rule _NotationDeclaration ($doc) {
2022     // ~ (MDO) { }
2023     // ~ (Name == 'NOTATION') { }
2024    
2025     ~ (S) { }
2026    
2027     ~ (Name) {
2028    
2029     }
2030    
2031     ~ (S) { }
2032    
2033     ~ (Name == 'PUBLIC') {
2034     ~ (S) { }
2035    
2036     &PubidLiteral ($doc => $doc);
2037    
2038     ~? (S) {
2039     ~? (LIT) {
2040     ?lexmode SystemLiteral;
2041     &_SystemLiteral ($doc => $doc);
2042    
2043     ~? (S) { }
2044     } (LITA) {
2045     ?lexmode SystemLiteralA;
2046     &_SystemLiteral ($doc => $doc);
2047    
2048     ~? (S) { }
2049     }
2050     }
2051     } (Name == 'SYSTEM') {
2052     ~ (S) { }
2053    
2054     &SystemLiteral ($doc => $doc);
2055    
2056     ~? (S) { }
2057     }
2058    
2059     ~ (MDC) {
2060     ?lexmode DTD;
2061     } else {
2062     ?lexmode DTD;
2063     }
2064     } // _NotationDeclaration
2065    
2066     rule _EntityValue ($doc) {
2067     ?lexmode EntityValue;
2068    
2069     my $vals;
2070     lang:Perl {
2071     $vals = {nodes => [], value => ''};
2072     }
2073    
2074     ~* (STRING) {
2075    
2076     } (PERO) {
2077     ?lexmode EntityDeclaration;
2078    
2079     ~ (Name) {
2080    
2081     }
2082    
2083     ~ (REFC) {
2084     ?lexmode EntityValue;
2085     } else {
2086     ?lexmode EntityValue;
2087     }
2088     } (HCRO) {
2089     &_HexadecimalCharacterReferenceV_
2090     ($doc => $doc, $vals => $vals);
2091    
2092     ~ (REFC) {
2093     ?lexmode EntityValue;
2094     } else {
2095     ?lexmode EntityValue;
2096     }
2097     } (CRO) {
2098     &_NumericCharacterReferenceV_
2099     ($doc => $doc, $vals => $vals);
2100    
2101     ~ (REFC) {
2102     ?lexmode EntityValue;
2103     } else {
2104     ?lexmode EntityValue;
2105     }
2106     } (ERO) {
2107     &_GeneralEntityReferenceV_
2108     ($doc => $doc, $vals => $vals);
2109    
2110     ~ (REFC) {
2111     ?lexmode EntityValue;
2112     } else {
2113     ?lexmode EntityValue;
2114     }
2115     }
2116    
2117     ~ (LIT) {
2118     ?lexmode MarkupDeclaration;
2119     } (LITA) {
2120     ?lexmode MarkupDeclaration;
2121     } else {
2122     ?lexmode MarkupDeclaration;
2123     }
2124     } // _EntityValue
2125    
2126     rule _EntityValueA ($doc) {
2127     ?lexmode EntityValueA;
2128    
2129     my $vals;
2130     lang:Perl {
2131     $vals = {nodes => [], value => ''};
2132     }
2133    
2134     ~* (STRING) {
2135    
2136     } (PERO) {
2137     ?lexmode EntityDeclaration;
2138    
2139     ~ (Name) {
2140    
2141     }
2142    
2143     ~ (REFC) {
2144     ?lexmode EntityValueA;
2145     } else {
2146     ?lexmode EntityValueA;
2147     }
2148     } (HCRO) {
2149     &_HexadecimalCharacterReferenceV_
2150     ($doc => $doc, $vals => $vals);
2151    
2152     ~ (REFC) {
2153     ?lexmode EntityValueA;
2154     } else {
2155     ?lexmode EntityValueA;
2156     }
2157     } (CRO) {
2158     &_NumericCharacterReferenceV_
2159     ($doc => $doc, $vals => $vals);
2160    
2161     ~ (REFC) {
2162     ?lexmode EntityValueA;
2163     } else {
2164     ?lexmode EntityValueA;
2165     }
2166     } (ERO) {
2167     &_GeneralEntityReferenceV_
2168     ($doc => $doc, $vals => $vals);
2169    
2170     ~ (REFC) {
2171     ?lexmode EntityValueA;
2172     } else {
2173     ?lexmode EntityValueA;
2174     }
2175     }
2176    
2177     ~ (LITA) {
2178     ?lexmode MarkupDeclaration;
2179     } else {
2180     ?lexmode MarkupDeclaration;
2181     }
2182     } // _EntityValueA
2183    
2184    
2185     /*
2186     XML Name
2187     */
2188     lexmode NameChar {
2189     $NameStartChar10 := [
2190     '_' ':'
2191     // Letter
2192     // BaseChar
2193     U+0041..U+005A U+0061..U+007A U+00C0..U+00D6
2194     U+00D8..U+00F6 U+00F8..U+00FF U+0100..U+0131
2195     U+0134..U+013E U+0141..U+0148 U+014A..U+017E
2196     U+0180..U+01C3 U+01CD..U+01F0 U+01F4..U+01F5
2197     U+01FA..U+0217 U+0250..U+02A8 U+02BB..U+02C1
2198     U+0386 U+0388..U+038A U+038C U+038E..U+03A1
2199     U+03A3..U+03CE U+03D0..U+03D6 U+03DA U+03DC
2200     U+03DE U+03E0 U+03E2..U+03F3 U+0401..U+040C
2201     U+040E..U+044F U+0451..U+045C U+045E..U+0481
2202     U+0490..U+04C4 U+04C7..U+04C8 U+04CB..U+04CC
2203     U+04D0..U+04EB U+04EE..U+04F5 U+04F8..U+04F9
2204     U+0531..U+0556 U+0559 U+0561..U+0586
2205     U+05D0..U+05EA U+05F0..U+05F2 U+0621..U+063A
2206     U+0641..U+064A U+0671..U+06B7 U+06BA..U+06BE
2207     U+06C0..U+06CE U+06D0..U+06D3 U+06D5
2208     U+06E5..U+06E6 U+0905..U+0939 U+093D
2209     U+0958..U+0961 U+0985..U+098C U+098F..U+0990
2210     U+0993..U+09A8 U+09AA..U+09B0 U+09B2
2211     U+09B6..U+09B9 U+09DC..U+09DD U+09DF..U+09E1
2212     U+09F0..U+09F1 U+0A05..U+0A0A U+0A0F..U+0A10
2213     U+0A13..U+0A28 U+0A2A..U+0A30 U+0A32..U+0A33
2214     U+0A35..U+0A36 U+0A38..U+0A39 U+0A59..U+0A5C
2215     U+0A5E U+0A72..U+0A74 U+0A85..U+0A8B U+0A8D
2216     U+0A8F..U+0A91 U+0A93..U+0AA8 U+0AAA..U+0AB0
2217     U+0AB2..U+0AB3 U+0AB5..U+0AB9 U+0ABD U+0AE0
2218     U+0B05..U+0B0C U+0B0F..U+0B10 U+0B13..U+0B28
2219     U+0B2A..U+0B30 U+0B32..U+0B33 U+0B36..U+0B39
2220     U+0B3D U+0B5C..U+0B5D U+0B5F..U+0B61
2221     U+0B85..U+0B8A U+0B8E..U+0B90 U+0B92..U+0B95
2222     U+0B99..U+0B9A U+0B9C U+0B9E..U+0B9F
2223     U+0BA3..U+0BA4 U+0BA8..U+0BAA U+0BAE..U+0BB5
2224     U+0BB7..U+0BB9 U+0C05..U+0C0C U+0C0E..U+0C10
2225     U+0C12..U+0C28 U+0C2A..U+0C33 U+0C35..U+0C39
2226     U+0C60..U+0C61 U+0C85..U+0C8C U+0C8E..U+0C90
2227     U+0C92..U+0CA8 U+0CAA..U+0CB3 U+0CB5..U+0CB9
2228     U+0CDE U+0CE0..U+0CE1 U+0D05..U+0D0C
2229     U+0D0E..U+0D10 U+0D12..U+0D28 U+0D2A..U+0D39
2230     U+0D60..U+0D61 U+0E01..U+0E2E U+0E30
2231     U+0E32..U+0E33 U+0E40..U+0E45 U+0E81..U+0E82
2232     U+0E84 U+0E87..U+0E88 U+0E8A U+0E8D
2233     U+0E94..U+0E97 U+0E99..U+0E9F U+0EA1..U+0EA3
2234     U+0EA5 U+0EA7 U+0EAA..U+0EAB U+0EAD..U+0EAE
2235     U+0EB0 U+0EB2..U+0EB3 U+0EBD U+0EC0..U+0EC4
2236     U+0F40..U+0F47 U+0F49..U+0F69 U+10A0..U+10C5
2237     U+10D0..U+10F6 U+1100 U+1102..U+1103
2238     U+1105..U+1107 U+1109 U+110B..U+110C
2239     U+110E..U+1112 U+113C U+113E U+1140 U+114C
2240     U+114E U+1150 U+1154..U+1155 U+1159
2241     U+115F..U+1161 U+1163 U+1165 U+1167 U+1169
2242     U+116D..U+116E U+1172..U+1173 U+1175 U+119E
2243     U+11A8 U+11AB U+11AE..U+11AF U+11B7..U+11B8
2244     U+11BA U+11BC..U+11C2 U+11EB U+11F0 U+11F9
2245     U+1E00..U+1E9B U+1EA0..U+1EF9 U+1F00..U+1F15
2246     U+1F18..U+1F1D U+1F20..U+1F45 U+1F48..U+1F4D
2247     U+1F50..U+1F57 U+1F59 U+1F5B U+1F5D
2248     U+1F5F..U+1F7D U+1F80..U+1FB4 U+1FB6..U+1FBC
2249     U+1FBE U+1FC2..U+1FC4 U+1FC6..U+1FCC
2250     U+1FD0..U+1FD3 U+1FD6..U+1FDB U+1FE0..U+1FEC
2251     U+1FF2..U+1FF4 U+1FF6..U+1FFC U+2126
2252     U+212A..U+212B U+212E U+2180..U+2182
2253     U+3041..U+3094 U+30A1..U+30FA U+3105..U+312C
2254     U+AC00..U+D7A3
2255     // Ideographic
2256     U+4E00..U+9FA5 U+3007 U+3021..U+3029
2257     ];
2258     $NameChar10 := [
2259     '.' '-' '_' ':'
2260     // Letter
2261     // BaseChar
2262     U+0041..U+005A U+0061..U+007A U+00C0..U+00D6
2263     U+00D8..U+00F6 U+00F8..U+00FF U+0100..U+0131
2264     U+0134..U+013E U+0141..U+0148 U+014A..U+017E
2265     U+0180..U+01C3 U+01CD..U+01F0 U+01F4..U+01F5
2266     U+01FA..U+0217 U+0250..U+02A8 U+02BB..U+02C1
2267     U+0386 U+0388..U+038A U+038C U+038E..U+03A1
2268     U+03A3..U+03CE U+03D0..U+03D6 U+03DA U+03DC
2269     U+03DE U+03E0 U+03E2..U+03F3 U+0401..U+040C
2270     U+040E..U+044F U+0451..U+045C U+045E..U+0481
2271     U+0490..U+04C4 U+04C7..U+04C8 U+04CB..U+04CC
2272     U+04D0..U+04EB U+04EE..U+04F5 U+04F8..U+04F9
2273     U+0531..U+0556 U+0559 U+0561..U+0586
2274     U+05D0..U+05EA U+05F0..U+05F2 U+0621..U+063A
2275     U+0641..U+064A U+0671..U+06B7 U+06BA..U+06BE
2276     U+06C0..U+06CE U+06D0..U+06D3 U+06D5
2277     U+06E5..U+06E6 U+0905..U+0939 U+093D
2278     U+0958..U+0961 U+0985..U+098C U+098F..U+0990
2279     U+0993..U+09A8 U+09AA..U+09B0 U+09B2
2280     U+09B6..U+09B9 U+09DC..U+09DD U+09DF..U+09E1
2281     U+09F0..U+09F1 U+0A05..U+0A0A U+0A0F..U+0A10
2282     U+0A13..U+0A28 U+0A2A..U+0A30 U+0A32..U+0A33
2283     U+0A35..U+0A36 U+0A38..U+0A39 U+0A59..U+0A5C
2284     U+0A5E U+0A72..U+0A74 U+0A85..U+0A8B U+0A8D
2285     U+0A8F..U+0A91 U+0A93..U+0AA8 U+0AAA..U+0AB0
2286     U+0AB2..U+0AB3 U+0AB5..U+0AB9 U+0ABD U+0AE0
2287     U+0B05..U+0B0C U+0B0F..U+0B10 U+0B13..U+0B28
2288     U+0B2A..U+0B30 U+0B32..U+0B33 U+0B36..U+0B39
2289     U+0B3D U+0B5C..U+0B5D U+0B5F..U+0B61
2290     U+0B85..U+0B8A U+0B8E..U+0B90 U+0B92..U+0B95
2291     U+0B99..U+0B9A U+0B9C U+0B9E..U+0B9F
2292     U+0BA3..U+0BA4 U+0BA8..U+0BAA U+0BAE..U+0BB5
2293     U+0BB7..U+0BB9 U+0C05..U+0C0C U+0C0E..U+0C10
2294     U+0C12..U+0C28 U+0C2A..U+0C33 U+0C35..U+0C39
2295     U+0C60..U+0C61 U+0C85..U+0C8C U+0C8E..U+0C90
2296     U+0C92..U+0CA8 U+0CAA..U+0CB3 U+0CB5..U+0CB9
2297     U+0CDE U+0CE0..U+0CE1 U+0D05..U+0D0C
2298     U+0D0E..U+0D10 U+0D12..U+0D28 U+0D2A..U+0D39
2299     U+0D60..U+0D61 U+0E01..U+0E2E U+0E30
2300     U+0E32..U+0E33 U+0E40..U+0E45 U+0E81..U+0E82
2301     U+0E84 U+0E87..U+0E88 U+0E8A U+0E8D
2302     U+0E94..U+0E97 U+0E99..U+0E9F U+0EA1..U+0EA3
2303     U+0EA5 U+0EA7 U+0EAA..U+0EAB U+0EAD..U+0EAE
2304     U+0EB0 U+0EB2..U+0EB3 U+0EBD U+0EC0..U+0EC4
2305     U+0F40..U+0F47 U+0F49..U+0F69 U+10A0..U+10C5
2306     U+10D0..U+10F6 U+1100 U+1102..U+1103
2307     U+1105..U+1107 U+1109 U+110B..U+110C
2308     U+110E..U+1112 U+113C U+113E U+1140 U+114C
2309     U+114E U+1150 U+1154..U+1155 U+1159
2310     U+115F..U+1161 U+1163 U+1165 U+1167 U+1169
2311     U+116D..U+116E U+1172..U+1173 U+1175 U+119E
2312     U+11A8 U+11AB U+11AE..U+11AF U+11B7..U+11B8
2313     U+11BA U+11BC..U+11C2 U+11EB U+11F0 U+11F9
2314     U+1E00..U+1E9B U+1EA0..U+1EF9 U+1F00..U+1F15
2315     U+1F18..U+1F1D U+1F20..U+1F45 U+1F48..U+1F4D
2316     U+1F50..U+1F57 U+1F59 U+1F5B U+1F5D
2317     U+1F5F..U+1F7D U+1F80..U+1FB4 U+1FB6..U+1FBC
2318     U+1FBE U+1FC2..U+1FC4 U+1FC6..U+1FCC
2319     U+1FD0..U+1FD3 U+1FD6..U+1FDB U+1FE0..U+1FEC
2320     U+1FF2..U+1FF4 U+1FF6..U+1FFC U+2126
2321     U+212A..U+212B U+212E U+2180..U+2182
2322     U+3041..U+3094 U+30A1..U+30FA U+3105..U+312C
2323     U+AC00..U+D7A3
2324     // Ideographic
2325     U+4E00..U+9FA5 U+3007 U+3021..U+3029
2326     // Digit
2327     U+0030..U+0039 U+0660..U+0669 U+06F0..U+06F9
2328     U+0966..U+096F U+09E6..U+09EF U+0A66..U+0A6F
2329     U+0AE6..U+0AEF U+0B66..U+0B6F U+0BE7..U+0BEF
2330     U+0C66..U+0C6F U+0CE6..U+0CEF U+0D66..U+0D6F
2331     U+0E50..U+0E59 U+0ED0..U+0ED9 U+0F20..U+0F29
2332     // CombiningChar
2333     U+0300..U+0345 U+0360..U+0361 U+0483..U+0486
2334     U+0591..U+05A1 U+05A3..U+05B9 U+05BB..U+05BD
2335     U+05BF U+05C1..U+05C2 U+05C4 U+064B..U+0652
2336     U+0670 U+06D6..U+06DC U+06DD..U+06DF
2337     U+06E0..U+06E4 U+06E7..U+06E8 U+06EA..U+06ED
2338     U+0901..U+0903 U+093C U+093E..U+094C U+094D
2339     U+0951..U+0954 U+0962..U+0963 U+0981..U+0983
2340     U+09BC U+09BE U+09BF U+09C0..U+09C4
2341     U+09C7..U+09C8 U+09CB..U+09CD U+09D7
2342     U+09E2..U+09E3 U+0A02 U+0A3C U+0A3E U+0A3F
2343     U+0A40..U+0A42 U+0A47..U+0A48 U+0A4B..U+0A4D
2344     U+0A70..U+0A71 U+0A81..U+0A83 U+0ABC
2345     U+0ABE..U+0AC5 U+0AC7..U+0AC9 U+0ACB..U+0ACD
2346     U+0B01..U+0B03 U+0B3C U+0B3E..U+0B43
2347     U+0B47..U+0B48 U+0B4B..U+0B4D U+0B56..U+0B57
2348     U+0B82..U+0B83 U+0BBE..U+0BC2 U+0BC6..U+0BC8
2349     U+0BCA..U+0BCD U+0BD7 U+0C01..U+0C03
2350     U+0C3E..U+0C44 U+0C46..U+0C48 U+0C4A..U+0C4D
2351     U+0C55..U+0C56 U+0C82..U+0C83 U+0CBE..U+0CC4
2352     U+0CC6..U+0CC8 U+0CCA..U+0CCD U+0CD5..U+0CD6
2353     U+0D02..U+0D03 U+0D3E..U+0D43 U+0D46..U+0D48
2354     U+0D4A..U+0D4D U+0D57 U+0E31 U+0E34..U+0E3A
2355     U+0E47..U+0E4E U+0EB1 U+0EB4..U+0EB9
2356     U+0EBB..U+0EBC U+0EC8..U+0ECD U+0F18..U+0F19
2357     U+0F35 U+0F37 U+0F39 U+0F3E U+0F3F
2358     U+0F71..U+0F84 U+0F86..U+0F8B U+0F90..U+0F95
2359     U+0F97 U+0F99..U+0FAD U+0FB1..U+0FB7 U+0FB9
2360     U+20D0..U+20DC U+20E1 U+302A..U+302F U+3099
2361     U+309A
2362     // Extender
2363     U+00B7 U+02D0 U+02D1 U+0387 U+0640 U+0E46
2364     U+0EC6 U+3005 U+3031..U+3035 U+309D..U+309E
2365     U+30FC..U+30FE
2366     ];
2367 wakaba 1.1
2368     $NameStartChar11 := [
2369     ':' '_'
2370     'A' 'B' 'C' 'D' 'E' 'F' 'G' 'H' 'I' 'J' 'K' 'L' 'M'
2371     'N' 'O' 'P' 'Q' 'R' 'S' 'T' 'U' 'V' 'W' 'X' 'Y' 'Z'
2372     'a' 'b' 'c' 'd' 'e' 'f' 'g' 'h' 'i' 'j' 'k' 'l' 'm'
2373     'n' 'o' 'p' 'q' 'r' 's' 't' 'u' 'v' 'w' 'x' 'y' 'z'
2374     U+00C0..U+00D6 U+00D8..U+00F6 U+00F8..U+02FF
2375     U+0370..U+037D U+037F..U+1FFF U+200C..U+200D
2376     U+2070..U+218F U+2C00..U+2FEF U+3001..U+D7FF
2377     U+F900..U+FDCF U+FDF0..U+FFFD U+10000..U+EFFFF
2378     ];
2379     $NameChar11 := [
2380     '-' '.' '0' '1' '2' '3' '4' '5' '6' '7' '8' '9'
2381     U+00B7 U+0300..U+036F U+203F..U+2040
2382     // NameStartChar
2383     ':' '_'
2384     'A' 'B' 'C' 'D' 'E' 'F' 'G' 'H' 'I' 'J' 'K' 'L' 'M'
2385     'N' 'O' 'P' 'Q' 'R' 'S' 'T' 'U' 'V' 'W' 'X' 'Y' 'Z'
2386     'a' 'b' 'c' 'd' 'e' 'f' 'g' 'h' 'i' 'j' 'k' 'l' 'm'
2387     'n' 'o' 'p' 'q' 'r' 's' 't' 'u' 'v' 'w' 'x' 'y' 'z'
2388     U+00C0..U+00D6 U+00D8..U+00F6 U+00F8..U+02FF
2389     U+0370..U+037D U+037F..U+1FFF U+200C..U+200D
2390     U+2070..U+218F U+2C00..U+2FEF U+3001..U+D7FF
2391     U+F900..U+FDCF U+FDF0..U+FFFD U+10000..U+EFFFF
2392     ];
2393 wakaba 1.4 } // NameChar
2394    
2395     lexmode Name
2396     : extends => 'NameChar'
2397     {
2398 wakaba 1.1 Name : value := $NameStartChar11 $NameChar11*;
2399     } // Name
2400 wakaba 1.4
2401     lexmode Nmtoken
2402     : extends => 'NameChar'
2403     {
2404     Nmtoken : value := $NameChar11+;
2405     } // Nmtoken
2406 wakaba 1.1
2407     /*
2408     Space
2409     */
2410     lexmode S {
2411     S := [U+0009 U+000A U+000D U+0020]+;
2412     } // S
2413    
2414     /*
2415     Document end scanning mode
2416     */
2417     lexmode DocumentEnd
2418     : standalone
2419     : extends => 'S'
2420     {
2421     /*
2422     Processing instruction
2423     */
2424     PIO := ['<'] ['?'];
2425    
2426     /*
2427     Comment declaration
2428     */
2429     CDO := ['<'] ['!'] ['-'] ['-'];
2430     } // DocumentEnd
2431    
2432     /*
2433     Document misc scanning mode
2434    
2435     This mode scans |Misc| constructions as well
2436     as the document element's start tag.
2437     */
2438     lexmode DocumentMisc
2439     : standalone
2440     : extends => 'DocumentEnd'
2441     {
2442     /*
2443     Document element start tag
2444     */
2445     STAGO := ['<'];
2446     } // DocumentMisc
2447    
2448     /*
2449     Document prolog scanning mode
2450     */
2451     lexmode DocumentProlog
2452     : standalone
2453     : extends => 'DocumentMisc'
2454     {
2455     /*
2456     |DOCTYPE| declaration
2457     */
2458     MDO := ['<'] ['!'];
2459     } // DocumentProlog
2460    
2461     /*
2462     Document start scanning mode
2463     */
2464     lexmode DocumentStart
2465     : initial
2466     : standalone
2467     : extends => 'DocumentProlog'
2468     {
2469     /*
2470     XML declaration
2471     */
2472     XDO := ['<'] ['?'] ['x'] ['m'] ['l'];
2473     } // DocumentStart
2474    
2475     /*
2476     Comment declaration scanning mode
2477     */
2478     lexmode CommentDeclaration
2479     : standalone
2480     {
2481     /*
2482     Comment close
2483     */
2484     COM := ['-'] ['-'];
2485    
2486     /*
2487     Comment data
2488     */
2489     $string := ['-']? [^'-'];
2490     STRING : value := $string+;
2491     } // CommentDeclaration
2492    
2493     /*
2494     Processing instruction name and |S| scanning mode
2495     */
2496     lexmode PIName
2497     : standalone
2498     : extends => 'Name'
2499     : extends => 'S'
2500     {
2501     /*
2502     Processing instruction close
2503     */
2504     PIC := ['?'] ['>'];
2505     } // PIName
2506    
2507     /*
2508     Processing instruction data scanning mode
2509     */
2510     lexmode PIData
2511     : standalone
2512     {
2513     /*
2514     Processing instruction close
2515     */
2516     PIC := ['?'] ['>'];
2517    
2518     /*
2519     Processing instruction target data
2520     */
2521 wakaba 1.2 ?default-token DATA : value;
2522 wakaba 1.1 } // PIData
2523    
2524     /*
2525     Content of element scanning mode
2526     */
2527     lexmode ElementContent
2528     : standalone
2529     {
2530     /*
2531     Start tag open
2532     */
2533     STAGO := ['<'];
2534    
2535     /*
2536     End tag open
2537     */
2538     ETAGO := ['<'] ['/'];
2539    
2540     /*
2541     Hexadecimal character reference open
2542     */
2543     HCRO := ['&'] ['#'] ['x'];
2544    
2545     /*
2546     Numeric character reference open
2547     */
2548     CRO := ['&'] ['#'];
2549    
2550     /*
2551     General entity reference open
2552     */
2553     ERO := ['&'];
2554    
2555     /*
2556     Comment declaration open
2557     */
2558     CDO := ['<'] ['!'] ['-'] ['-'];
2559    
2560     /*
2561     CDATA section open
2562     */
2563     CDSO := ['<'] ['!'] ['[']
2564     ['C'] ['D'] ['A'] ['T'] ['A'] ['['];
2565    
2566     /*
2567     Processing instruction open
2568     */
2569     PIO := ['<'] ['?'];
2570 wakaba 1.2
2571     /*
2572     Markup section end
2573     */
2574     MSE := [']'] [']'] ['>'];
2575    
2576     /*
2577     Character data
2578     */
2579     /*
2580     Character data and/or |MSE|
2581     */
2582     ?default-token CharData : value;
2583 wakaba 1.1 } // ElementContent
2584    
2585     /*
2586     CDATA section content scanning mode
2587     */
2588     lexmode CDATASectionContent
2589     : standalone
2590     {
2591     /*
2592     Markup section end
2593     */
2594     MSE := [']'] [']'] ['>'];
2595    
2596     /*
2597     Character data
2598 wakaba 1.2 */
2599     ?default-token CData : value;
2600 wakaba 1.1 } // CDATASectionContent
2601    
2602     lexmode EntityReference
2603     : standalone
2604     : extends => 'Name'
2605     {
2606     /*
2607     Reference close
2608     */
2609     REFC := [';'];
2610     } // EntityReference
2611    
2612     lexmode NumericCharacterReference
2613     : standalone
2614     {
2615     /*
2616     Decimal number
2617     */
2618     $digit := ['0' '1' '2' '3' '4' '5' '6' '7' '8' '9'];
2619     NUMBER : value := $digit+;
2620    
2621     /*
2622     Reference close
2623     */
2624     REFC := [';'];
2625     } // NumericCharacterReference
2626    
2627     lexmode HexadecimalCharacterReference
2628     : standalone
2629     {
2630     /*
2631     Hexadecimal number
2632     */
2633     $hexdigit := ['0' '1' '2' '3' '4' '5' '6' '7' '8' '9'
2634     'A' 'B' 'C' 'D' 'E' 'F'
2635     'a' 'b' 'c' 'd' 'e' 'f'];
2636     Hex : value := $hexdigit+;
2637    
2638     /*
2639     Reference close
2640     */
2641     REFC := [';'];
2642     } // HexadecimalCharacterReference
2643    
2644 wakaba 1.3 lexmode XMLDeclaration
2645     : standalone
2646     : extends => 'Name'
2647     : extends => 'S'
2648     {
2649    
2650     /*
2651     Value indicator
2652     */
2653     VI := ['='];
2654    
2655     /*
2656     Literal open
2657     */
2658     LIT := ['"'];
2659     LITA := [U+0027];
2660    
2661     /*
2662     Processing instruction close
2663     */
2664     PIC := ['?'] ['>'];
2665     } // XMLDeclaration
2666    
2667 wakaba 1.1 lexmode StartTag
2668     : standalone
2669     : extends => 'Name'
2670     : extends => 'S'
2671     {
2672    
2673     /*
2674     Value indicator
2675     */
2676     VI := ['='];
2677    
2678     /*
2679     Literal open
2680     */
2681     LIT := ['"'];
2682     LITA := [U+0027];
2683    
2684     /*
2685     Tag close
2686     */
2687     TAGC := ['>'];
2688    
2689     /*
2690     Empty element tag close
2691     */
2692     MTAGC := ['/'] ['>'];
2693     } // StartTag
2694    
2695     lexmode EndTag
2696     : standalone
2697     : extends => 'Name'
2698     : extends => 'S'
2699     {
2700     /*
2701     Tag close
2702     */
2703     TAGC := ['>'];
2704     } // EndTag
2705    
2706     lexmode AttributeValueLiteral_ {
2707     ERO := ['&'];
2708     CRO := ['&'] ['#'];
2709     HCRO := ['&'] ['#'] ['x'];
2710     } // AttributeValueLiteral_
2711    
2712     lexmode AttributeValueLiteral
2713     : standalone
2714     : extends => 'AttributeValueLiteral_'
2715     {
2716     LIT := ['"'];
2717 wakaba 1.3 STRING : value := [^'"' '&' '<']+;
2718 wakaba 1.1 } // AttributeValueLiteral
2719    
2720     lexmode AttributeValueLiteralA
2721     : standalone
2722     : extends => 'AttributeValueLiteral_'
2723     {
2724 wakaba 1.4 LITA := [U+0027];
2725 wakaba 1.3 STRING : value := [^U+0027 '&' '<']+;
2726 wakaba 1.1 } // AttributeValueLiteralA
2727 wakaba 1.4
2728     lexmode EntityValue_ {
2729     PERO := ['%'];
2730     ERO := ['&'];
2731     CRO := ['&'] ['#'];
2732     HCRO := ['&'] ['#'] ['x'];
2733     } // EntityValue_
2734    
2735     lexmode EntityValue
2736     : standalone
2737     : extends => 'EntityValue_'
2738     {
2739     LIT := ['"'];
2740     STRING : value := [^'"' '&' '%']+;
2741     } // EntityValue
2742    
2743     lexmode EntityValueA
2744     : standalone
2745     : extends => 'EntityValue_'
2746     {
2747     LITA := [U+0027];
2748     STRING : value := [^U+0027 '&' '%']+;
2749     } // EntityValueA
2750    
2751     lexmode SystemLiteral
2752     : standalone
2753     {
2754     LIT := ['"'];
2755     ?default-token STRING : value;
2756     } // SystemLiteral
2757    
2758     lexmode SystemLiteralA
2759     : standalone
2760     {
2761     LITA := [U+0027];
2762     ?default-token STRING : value;
2763     } // SystemLiteralA
2764    
2765     lexmode DTD
2766     : standalone
2767     : extends => 'S'
2768     {
2769     /*
2770     Markup declaration open
2771     */
2772     MDO := ['<'] ['!'];
2773    
2774     /*
2775     Comment declaration open
2776     */
2777     CDO := ['<'] ['!'] ['-'] ['-'];
2778    
2779     /*
2780     Processing instruction open
2781     */
2782     PIO := ['<'] ['?'];
2783    
2784     /*
2785     Parameter entity reference open
2786     */
2787     PERO := ['%'];
2788    
2789     /*
2790     Declaration subset close
2791     */
2792     DSC := [']'];
2793    
2794     /*
2795     Conditional section open
2796     */
2797     CSO := ['<'] ['!'] ['['];
2798     } // DTD
2799    
2800     /*
2801     Markup declaration scanning mode
2802    
2803     This mode is used to recognize the |MDC| that terminates
2804     a comment declaration, and also serves as the base |lexmode|
2805     for, e.g., the document type declaration scanning mode.
2806     */
2807     lexmode MarkupDeclaration
2808     : standalone
2809     : extends => 'Name'
2810     : extends => 'S'
2811     {
2812     /*
2813     Markup declaration close
2814     */
2815     MDC := ['>'];
2816    
2817     /*
2818     Literal open
2819     */
2820     LIT := ['"'];
2821    
2822     /*
2823     Alternative literal open
2824     */
2825     LITA := [U+0027];
2826    
2827     /*
2828     Parameter entity reference open
2829     */
2830     PERO := ['%'];
2831    
2832     /*
2833     Declaration subset open
2834     */
2835     DSO := ['['];
2836     } // MarkupDeclaration
2837    
2838     lexmode ElementDeclaration
2839     : standalone
2840     : extends => 'MarkupDeclaration'
2841     {
2842     /*
2843     Model group open
2844     */
2845     MGO := ['('];
2846    
2847     /*
2848     Model group close
2849     */
2850     MGC := [')'];
2851    
2852     /*
2853     Reserved name indicator
2854     */
2855     RNI := ['#'];
2856    
2857     /*
2858     Occurrence indicators
2859     */
2860     OPT := ['?'];
2861     REP := ['*'];
2862     PLUS := ['+'];
2863    
2864     /*
2865     Connectors
2866     */
2867     OR := ['|'];
2868     SEQ := [','];
2869     } // ElementDeclaration
2870    
2871     lexmode AttlistDeclaration
2872     : standalone
2873     : extends => 'MarkupDeclaration'
2874     {
2875     /*
2876     Enumeration group open
2877     */
2878     EGO := ['('];
2879    
2880     /*
2881     Enumeration group close
2882     */
2883     EGC := [')'];
2884    
2885     /*
2886     Enumeration choice delimiter
2887     */
2888     OR := ['|'];
2889    
2890     /*
2891     Reserved name indicator
2892     */
2893     RNI := ['#'];
2894     } // AttlistDeclaration
2895    
2896     lexmode Enumeration
2897     : standalone
2898     : extends => 'Nmtoken'
2899     : extends => 'S'
2900     {
2901     /*
2902     Enumeration group close
2903     */
2904     EGC := [')'];
2905    
2906     /*
2907     Enumeration choice delimiter
2908     */
2909     OR := ['|'];
2910     } // Enumeration
2911    
2912 wakaba 1.1
2913     token-error default : default {
2914     lang:Perl {
2915 wakaba 1.3 my $location;
2916     __CODE{xp|get-location-from-token::
2917     $token => {$token},
2918     $result => {$location},
2919     }__;
2920     my $continue = __DOMCore:ERROR{xp|wf-syntax-error::
2921 wakaba 1.1 xp|error-token => {$token},
2922     DOMCore|location => {$location},
2923     }__;
2924     unless ($continue) {
2925     __EXCEPTION{DOMLS|PARSE_ERR::
2926     }__;
2927     }
2928     $self->{has_error} = true;
2929     }
2930     } // default
2931 wakaba 1.5
2932     @XMLTests:
2933     @@XMLTest:
2934     @@@DEnt:
2935     @@@@test:value:
2936     <p></p>
2937     @@@test:domTree:
2938     document {
2939     xml-version: '1.0';
2940     element {
2941     namespace-uri: null;
2942     prefix: null;
2943     local-name: 'p';
2944     text-content: '';
2945     }
2946     }
2947     @@PerlDef:
2948     my $impl = $Message::DOM::ImplementationRegistry->get_implementation ({
2949     'Core' => '3.0',
2950     'XML' => '3.0',
2951     'XMLVersion' => ['1.0', '1.1'],
2952     });
2953     my $parser = <Class::ManakaiXMLParser>->new ($impl);
2954    
2955     for my $test_data (@$TestData) {
2956     $test->start_new_test ($test_data->{uri});
2957     my $doc_ent = $test_data->{entity}->{$test_data->{root_uri}};
2958     my $doc = $parser->parse_string ($doc_ent->{<H::test:value>});
2959     try {
2960     $test->assert_dom_tree_equals
2961     (actual_value => $doc,
2962     expected_hash => $test_data->{dom_tree});
2963     $test->ok;
2964     } catch Message::Util::IF::DTException with { };
2965     }
2966    
2967 wakaba 1.1 ##ManakaiXMLParser
2968 wakaba 1.5
2969     ElementTypeBinding:
2970     @Name: XMLTests
2971     @ElementType:
2972     dis:ResourceDef
2973     @ShadowContent:
2974     @@ForCheck: ManakaiDOM|ForClass
2975     @@rdf:type: test|ParserTestSet
2976    
2977     ElementTypeBinding:
2978     @Name: XMLTest
2979     @ElementType:
2980     dis:ResourceDef
2981     @ShadowContent:
2982     @@ForCheck: ManakaiDOM|ForClass
2983     @@rdf:type: test|ParserTest
2984    
2985     ElementTypeBinding:
2986     @Name: DEnt
2987     @ElementType:
2988     dis:ResourceDef
2989     @ShadowContent:
2990     @@rdf:type: test|RootEntity
2991    
2992     ElementTypeBinding:
2993     @Name: Ent
2994     @ElementType:
2995     dis:ResourceDef
2996     @ShadowContent:
2997     @@rdf:type: test|Entity
2998 wakaba 1.1
2999 wakaba 1.3 ResourceDef:
3000     @QName: xp|get-location-from-token
3001     @rdf:type: DISPerl|BlockCode
3002     @enDesc:
3003     Creates a <IF::DOMCore:DOMLocator> object from a token.
3004     @PerlDef:
3005     $result = {
3006     utf32_offset => pos ($self->{source}),
3007     };
3008     @For: ManakaiDOM|ManakaiDOM3
3009 wakaba 1.1
3010     ElementTypeBinding:
3011     @Name: RuleDef
3012     @ElementType:
3013     dis:ResourceDef
3014     @ShadowContent:
3015     @@ForCheck: ManakaiDOM|ForClass
3016     @@rdf:type: Muf2003|RuleDefClass
3017    
3018     ElementTypeBinding:
3019     @Name: RuleParam
3020     @ElementType:
3021     dis:ResourceDef
3022     @ShadowContent:
3023     @@rdf:type: Muf2003|RuleParameter
3024    
3025     ElementTypeBinding:
3026     @Name: enImplNote
3027     @ElementType:
3028     dis:ImplNote
3029     @ShadowContent:
3030     @@lang:en
3031    
3032     ElementTypeBinding:
3033     @Name: ErrDef
3034     @ElementType:
3035     dis:ResourceDef
3036     @ShadowContent:
3037     @@rdf:type: DOMCore|DOMErrorType
3038     @@For: ManakaiDOM|DOM3
3039     @@ecore:textFormatter: ManakaiXMLParserExceptionFormatter
3040    
3041 wakaba 1.3 ElementTypeBinding:
3042     @Name: WFErrDef
3043     @ElementType:
3044     dis:ResourceDef
3045     @ShadowContent:
3046     @@rdf:type: DOMCore|DOMErrorType
3047     @@For: ManakaiDOM|DOM3
3048     @@ecore:textFormatter: ManakaiXMLParserExceptionFormatter
3049    
3050     WFErrDef:
3051     @QName: xp|wf-syntax-error
3052     @enDesc:
3053     The entity does not match the production rule; it is not
3054     well-formed.
3055     @DOMCore:severity: DOMCore|SEVERITY_FATAL_ERROR
3056     @enMufDef:
3057     |%xp-error-token-type;|%xp-error-token-value
3058     (prefix => { (|}, suffix => {|)}); is not
3059     allowed%xp-error-lines (prefix => { (|}, suffix => {|)});
3060     @ecore:hasParameter:
3061     @@@: xp|error-token
3062     @@enDesc:
3063     The token that is not allowed.
3064    
3065     WFErrDef:
3066     @QName: xp|wf-pi-target-is-xml
3067     @enDesc:
3068     A processing instruction has a <CODE::PITarget> of
3069     <XML::xml> (in any letter case), which is not allowed.
3070     @DOMCore:severity: DOMCore|SEVERITY_ERROR
3071     @enMufDef:
3072     Processing instruction target name cannot be |%p
3073     (name => {<Q::xp|name>});|
3074     @ecore:hasParameter:
3075     @@@: xp|error-token
3076     @@enDesc:
3077     The token that contains the name.
3078     @ecore:hasParameter:
3079     @@@: xp|name
3080     @@enDesc:
3081     A string that is specified as target name of the
3082     processing instruction.
3083     @ecore:hasParameter: xp|parent
3084    
3085     WFErrDef:
3086     @QName: xp|wf-no-end-tag
3087     @DOMCore:severity: DOMCore|SEVERITY_ERROR
3088     @enDesc:
3089     An end-tag is not found.
3090     @enMufDef:
3091     End-tag |</%p (name => {<Q::xp|expected-element-type>});>| is required
3092     @ecore:hasParameter: xp|error-token
3093     @ecore:hasParameter:
3094     @@@: xp|node
3095     @@enDesc:
3096     The element node that is not closed.
3097     @ecore:hasParameter:
3098     @@@: xp|expected-element-type
3099     @@enDesc:
3100     The element type name of the element that is not closed.
3101    
3102     WFErrDef:
3103     @QName: xp|wf-unsupported-xml-version
3104     @DOMCore:severity: DOMCore|SEVERITY_ERROR
3105     @enDesc:
3106     The XML version specified in the version declaration is not supported.
3107     @enMufDef:
3108     XML version |%p (name => {<Q::infoset|version>});| is not supported
3109     @ecore:hasParameter: xp|error-token
3110     @ecore:hasParameter:
3111     @@@: xp|parent
3112     @@enDesc:
3113     The document node.
3114     @ecore:hasParameter:
3115     @@@: infoset|version
3116     @@enDesc:
3117     The specified XML version.
3118    
3119     WFErrDef:
3120     @QName: xp|wf-malformed-enc-name
3121     @DOMCore:severity: DOMCore|SEVERITY_ERROR
3122     @enDesc:
3123     An <XA::encoding> pseudo-attribute value does not match
3124     the production rule <CODE::EncName>.
3125     @enMufDef:
3126     Encoding name |%p (name => {<Q::xp|name>});| is not allowed
3127     @ecore:hasParameter: xp|error-token
3128     @ecore:hasParameter:
3129     @@@: xp|parent
3130     @@enDesc: The document node.
3131     @ecore:hasParameter:
3132     @@@: xp|name
3133     @@enDesc:
3134     The <XA::encoding> value.
3135    
3136     WFErrDef:
3137     @QName: xp|wf-malformed-xml-standalone
3138     @DOMCore:severity: DOMCore|SEVERITY_ERROR
3139     @enDesc:
3140     An <XA::standalone> pseudo-attribute value is neither <XML::yes>
3141     nor <XML::no>.
3142     @enMufDef:
3143     |standalone| pseudo-attribute value |%p (name => {<Q::xp|name>});|
3144     is not allowed
3145     @ecore:hasParameter: xp|error-token
3146     @ecore:hasParameter:
3147     @@@: xp|parent
3148     @@enDesc: The document node.
3149     @ecore:hasParameter:
3150     @@@: xp|name
3151     @@enDesc:
3152     The <XA::standalone> value.
3153    
3154     WFErrDef:
3155     @QName: xp|wf-legal-literal-character
3156     @DOMCore:severity: DOMCore|SEVERITY_ERROR
3157     @enDesc:
3158     Each character in an XML entity must match the production
3159     rule <CODE::Char - RestrictedChar>.
3160     @enMufDef:
3161     Character %character-code-point
3162     (v => {<Q::xp|character-number>}); is not allowed
3163     @ecore:hasParameter:
3164     @@@: xp|character-number
3165     @@enDesc:
3166     The code position of the character that is not allowed.
3167    
3168     WFErrDef:
3169     @QName: xp|wf-element-type-match
3170     @DOMCore:severity: DOMCore|SEVERITY_FATAL_ERROR
3171 wakaba 1.1 @enDesc:
3172 wakaba 1.3 The <CODE::Name> in an element's end-tag must match the element type
3173     in the start-tag.
3174     @enMufDef:
3175     End-tag |</%p (name => {<Q::xp|actual-element-type>});>| does
3176     not match start-tag |<%p (name => {<Q::xp|expected-element-type>});>|
3177     @ecore:hasParameter: xp|error-token
3178     @ecore:hasParameter:
3179     @@@: xp|node
3180     @@enDesc:
3181     The current opening element node.
3182     @ecore:hasParameter:
3183     @@@: xp|expected-element-type
3184     @@enDesc:
3185     The element type name of the current element.
3186     @ecore:hasParameter:
3187     @@@: xp|actual-element-type
3188     @@enDesc:
3189     The <CODE::Name> that occurs in the end-tag.
3190    
3191     WFErrDef:
3192     @QName: xp|wf-unique-att-spec
3193 wakaba 1.1 @DOMCore:severity: DOMCore|SEVERITY_ERROR
3194 wakaba 1.3 @enDesc:
3195     An attribute name <kwd:MUST-NOT> appear more than once in
3196     the same start-tag or empty-element tag.
3197 wakaba 1.1 @enMufDef:
3198 wakaba 1.3 Attribute |%p (name => {<Q::xp|name>});| is specified more
3199     than once in the same tag
3200     @ecore:hasParameter: xp|error-token
3201     @ecore:hasParameter:
3202     @@@: xp|name
3203     @@enDesc:
3204     The name of the attribute.
3205    
3206     WFErrDef:
3207     @QName: xp|wf-legal-character
3208     @DOMCore:severity: DOMCore|SEVERITY_ERROR
3209     @enDesc:
3210     Characters referred to using character references <kwd:MUST>
3211     match the production for <CODE::Char>.
3212     @enMufDef:
3213     Reference to character %character-code-point
3214     (v => {<Q::xp|character-number>}); is not allowed
3215     @ecore:hasParameter: xp|error-token
3216     @ecore:hasParameter:
3217     @@@: xp|character-number
3218     @@enDesc:
3219     The code position of the character being referred to.
3220     @ecore:hasParameter:
3221     @@@: xp|parent
3222     @@enDesc:
3223     The parent node in which the character reference has
3224     occurred, if available.
3225 wakaba 1.1
3226 wakaba 1.3 XWParam:
3227 wakaba 1.1 @QName: xp|error-token
3228     @enDesc:
3229     The token where the parser found an error.
3230    
3231 wakaba 1.3 XWParam:
3232     @QName: xp|name
3233     @enDesc:
3234     A name.
3235    
3236     XWParam:
3237     @QName: xp|parent
3238     @enDesc:
3239     The parent node in which the error occurs.
3240    
3241     XWParam:
3242     @QName: xp|node
3243     @enDesc:
3244     The current node.
3245    
3246     XWParam:
3247     @QName: xp|actual-element-type
3248     @enDesc:
3249     The actual element type name that occurred in the source.
3250    
3251     XWParam:
3252     @QName: xp|expected-element-type
3253 wakaba 1.1 @enDesc:
3254 wakaba 1.3 The element type name expected.
3255    
3256     XWParam:
3257     @QName: xp|character-number
3258     @enDesc:
3259     The character code position.
3260    
3261     ElementTypeBinding:
3262     @Name: XWParam
3263     @ElementType:
3264     dis:ResourceDef
3265     @ShadowContent:
3266     @@For: =ManakaiDOM|all
3267     @@rdf:type: ecore|Parameter
3268 wakaba 1.1
3269     ElementTypeBinding:
3270     @Name:enMufDef
3271     @ElementType:
3272     ecore:defaultMessage
3273     @ShadowContent:
3274     @@lang:en
3275     @@ContentType:
3276     lang:muf
3277    
3278     ResourceDef:
3279     @QName: DOMImpl
3280     @AliasFor: DOMCore|DOMImplementation
3281     @For: ManakaiDOM|DOM
3282    
3283     ElementTypeBinding:
3284     @Name: Attr
3285     @ElementType:
3286     dis:ResourceDef
3287     @ShadowContent:
3288     @@rdf:type: DISLang|Attribute
3289     @@ForCheck: !=ManakaiDOM|ManakaiDOM
3290    
3291     ElementTypeBinding:
3292     @Name: Get
3293     @ElementType:
3294     dis:ResourceDef
3295     @ShadowContent:
3296     @@rdf:type: DISLang|AttributeGet
3297    
3298     ElementTypeBinding:
3299     @Name: Set
3300     @ElementType:
3301     dis:ResourceDef
3302     @ShadowContent:
3303     @@rdf:type: DISLang|AttributeSet
3304    
3305     ElementTypeBinding:
3306     @Name: enDesc
3307     @ElementType:
3308     dis:Description
3309     @ShadowContent:
3310     @@lang:en
3311    
3312     ElementTypeBinding:
3313     @Name: Method
3314     @ElementType:
3315     dis:ResourceDef
3316     @ShadowContent:
3317     @@rdf:type: DISLang|Method
3318     @@For: !=ManakaiDOM|ManakaiDOM
3319    
3320     ElementTypeBinding:
3321     @Name: Return
3322     @ElementType:
3323     dis:ResourceDef
3324     @ShadowContent:
3325     @@rdf:type: DISLang|MethodReturn
3326    
3327     ElementTypeBinding:
3328     @Name: Param
3329     @ElementType:
3330     dis:ResourceDef
3331     @ShadowContent:
3332     @@rdf:type: DISLang|MethodParameter
3333    
3334     ElementTypeBinding:
3335     @Name: PerlDef
3336     @ElementType:
3337     dis:Def
3338     @ShadowContent:
3339     @@ContentType: lang|Perl
3340    
3341     ElementTypeBinding:
3342     @Name: PropDef
3343     @ElementType:
3344     dis:ResourceDef
3345     @ShadowContent:
3346     @@rdf:type: rdf|Property
3347    
3348     ClsDef:
3349     @ClsQName: ManakaiXMLParserExceptionFormatter
3350    
3351     @ClsISA: ecore|MUErrorFormatter||ManakaiDOM|Perl
3352    
3353     @RuleDef:
3354     @@Name: xp-error-token-type
3355     @@enDesc:
3356     The type of the token the parser has encountered.
3357    
3358     @@Method:
3359     @@@Name: after
3360     @@@Param:
3361     @@@@Name: name
3362     @@@@Type: DOMString
3363     @@@@enDesc: The name of the method.
3364     @@@Param:
3365     @@@@Name: p
3366     @@@@Type: DISPerl|HASH
3367     @@@@enDesc: The set of the parameters to the method.
3368     @@@Param:
3369     @@@@Name: o
3370     @@@@Type: DISPerl|HASH
3371     @@@@enDesc: The option value.
3372     @@@Return:
3373     @@@@PerlDef:
3374     $p->{-result} = $o->{<H::xp|error-token>}->{type}
3375     if defined $o->{<H::xp|error-token>}->{type};
3376    
3377     @RuleDef:
3378     @@Name: xp-error-token-value
3379     @@enDesc:
3380     The value of the token the parser has encountered, if any.
3381    
3382     @@Method:
3383     @@@Name: after
3384     @@@Param:
3385     @@@@Name: name
3386     @@@@Type: DOMString
3387     @@@@enDesc: The name of the method.
3388     @@@Param:
3389     @@@@Name: p
3390     @@@@Type: DISPerl|HASH
3391     @@@@enDesc: The set of the parameters to the method.
3392     @@@Param:
3393     @@@@Name: o
3394     @@@@Type: DISPerl|HASH
3395     @@@@enDesc: The option value.
3396     @@@Return:
3397     @@@@PerlDef:
3398     $p->{-result} = $o->{<H::xp|error-token>}->{value}
3399     if defined $o->{<H::xp|error-token>}->{value};
3400    
3401     @RuleDef:
3402     @@Name: xp-error-lines
3403     @@enDesc:
3404     A copy of the fragment of the source text that contains the line
3405     where the error occurred, if available.
3406    
3407     @@Method:
3408     @@@Name: after
3409     @@@Param:
3410     @@@@Name: name
3411     @@@@Type: DOMString
3412     @@@@enDesc: The name of the method.
3413     @@@Param:
3414     @@@@Name: p
3415     @@@@Type: DISPerl|HASH
3416     @@@@enDesc: The set of the parameters to the method.
3417     @@@Param:
3418     @@@@Name: o
3419     @@@@Type: DISPerl|HASH
3420     @@@@enDesc: The option value.
3421     @@@Return:
3422     @@@@PerlDef:
3423     my $pos = $o-><AG::DOMCore|DOMError.location>
3424     -><AG::DOMCore|DOMLocator.utf32Offset>;
3425     if ($pos > -1) {
3426 wakaba 1.3 my $src = \($o->{<H::ecore|object>}->{source});
3427 wakaba 1.1 my $start = $pos;
3428     $start = rindex ($$src, "\x0A", $start - 1) for 0..2;
3429     $start++;
3430     my $end = $pos;
3431     $end = index ($$src, "\x0A", $end + 1) for 0..2;
3432     $end = length $$src if $end < 0;
3433     $p->{-result} = substr $$src, $start, $end - $start;
3434     }
3435 wakaba 1.3
3436     @RuleDef:
3437     @@Name: character-code-point
3438     @@enDesc:
3439     The character code position, in <CODE::U+<VAR::HHHH>> notation.
3440    
3441     @@Method:
3442     @@@Name: after
3443     @@@Param:
3444     @@@@Name: name
3445     @@@@Type: DOMString
3446     @@@@enDesc: The name of the method.
3447     @@@Param:
3448     @@@@Name: p
3449     @@@@Type: DISPerl|HASH
3450     @@@@enDesc: The set of the parameters to the method.
3451     @@@Param:
3452     @@@@Name: o
3453     @@@@Type: DISPerl|HASH
3454     @@@@enDesc: The option value.
3455     @@@RuleParam:
3456     @@@@Name: v
3457     @@@@Type: DISPerl|Number
3458     @@@@enDesc:
3459     The name of the error parameter that contains the character code.
3460     @@@Return:
3461     @@@@PerlDef:
3462     $p->{-result} = sprintf 'U+%04X', $o->{$p->{v}};
3463 wakaba 1.1 ##XMLParserExceptionFormatter
