/[suikacvs]/messaging/manakai/lib/Message/Charset/Encode.dis
Suika

Contents of /messaging/manakai/lib/Message/Charset/Encode.dis

Parent Directory Parent Directory | Revision Log Revision Log


Revision 1.13 - (hide annotations) (download)
Sat Dec 30 12:00:38 2006 UTC (18 years, 3 months ago) by wakaba
Branch: MAIN
Changes since 1.12: +1 -6 lines
++ manakai/lib/Message/Markup/ChangeLog	30 Dec 2006 11:55:48 -0000
	* Atom.dis, SuikaWiki.dis, H2H.dis, SuikaWikiConfig21.dis: |For|
	specifications are removed.

	* SuikaWikiConfig21.dis: |WithFor| and |DefaultFor|
	specifications are removed.
	(ForEmpty, ForLatest): Removed.

2006-12-30  Wakaba  <wakaba@suika.fam.cx>

++ manakai/lib/Message/Util/ChangeLog	30 Dec 2006 11:57:42 -0000
	* PerlCode.dis, DIS.dis, ManakaiNode.dis,
	ManakaiNodeTest.dis: |For| specifications are removed.

	* common.dis: New module.

	* DIS.dis, PerlCode.dis, ManakaiNode.dis: |Util:| resource
	definitions are removed (and moved to |common.dis|).

	* DIS.dis (ForEmpty, ForLatest): Removed.

	* DIS.dis: |WithFor| and |DefaultFor| are removed.

2006-12-30  Wakaba  <wakaba@suika.fam.cx>

++ manakai/lib/Message/Util/Error/ChangeLog	30 Dec 2006 11:59:28 -0000
2006-12-30  Wakaba  <wakaba@suika.fam.cx>

	* Core.dis, DOMException.dis: |WithFor|, |DefaultFor|,
	and |For| specificaitons are removed.

++ manakai/lib/Message/Util/Formatter/ChangeLog	30 Dec 2006 11:59:59 -0000
2006-12-30  Wakaba  <wakaba@suika.fam.cx>

	* Muf2003.dis: |WithFor|, |DefaultFor|, and |For|
	specifications are removed.

++ manakai/lib/Message/Util/DIS/ChangeLog	30 Dec 2006 11:58:54 -0000
	* Perl.dis, Value.dis, DNLite.dis,
	DPG.dis, Test.dis: |WithFor|, |For|, and |DefaultFor|
	specifications are removed.

2006-12-30  Wakaba  <wakaba@suika.fam.cx>

++ manakai/lib/Message/DOM/ChangeLog	30 Dec 2006 11:53:43 -0000
        SimpleLS.dis, DOMMain.dis, XDP.dis: |For| specifications
	are removed.

2006-12-30  Wakaba  <wakaba@suika.fam.cx>

	* CharacterData.dis, DOMCore.dis, DOMFeature.dis,
        GenericLS.dis, TreeCore.dis, DOMString.dis,
        XML.dis, Element.dis, Document.dis, TreeStore,dis,
        Traversal.dis, XDoctype.dis, XMLParser.dis, DOMLS.dis,
++ manakai/lib/Message/URI/ChangeLog	30 Dec 2006 11:54:30 -0000
	* Generic.dis: |For| specifications are removed.

2006-12-30  Wakaba  <wakaba@suika.fam.cx>

++ manakai/lib/Message/Charset/ChangeLog	30 Dec 2006 11:54:10 -0000
	* Encode.dis: |For| specifications are removed.

2006-12-30  Wakaba  <wakaba@suika.fam.cx>

++ manakai/lib/manakai/ChangeLog	30 Dec 2006 12:00:29 -0000
	* XML.dis: |DefaultFor| specification is removed.

2006-12-30  Wakaba  <wakaba@suika.fam.cx>

1 wakaba 1.1 Module:
2     @QName: MCharset|Encode
3     @FullName:
4     @@lang: en
5     @@@: Manakai Charset Encode Module
6     @Namespace:
7     http://suika.fam.cx/~wakaba/archive/2005/manakai/Charset/Encode/
8    
9     @enDesc:
10     The <DFN::manakai Charset Encode Module> is ...
11    
12     {TODO::
13     This module is subject to change.
14     }
15    
16     @DISCore:author: DISCore|Wakaba
17     @License: license|Perl+MPL
18     @Date:
19 wakaba 1.13 $Date: 2006/12/30 08:27:43 $
20 wakaba 1.1
21     @Require:
22     @@Module:
23     @@@QName: MDOM|DOMFeature
24     @@Module:
25 wakaba 1.2 @@@QName: DISlib|Charset
26 wakaba 1.8 @@Module:
27     @@@QName: MDOM|DOMCore
28 wakaba 1.10 @@Module:
29     @@@QName: MDOM|DOMString
30 wakaba 1.1
31     Namespace:
32 wakaba 1.8 @c:
33     http://suika.fam.cx/~wakaba/archive/2004/8/18/dom-core#
34 wakaba 1.2 @cs:
35     http://suika.fam.cx/~wakaba/archive/2004/dis/Charset/
36 wakaba 1.1 @dis:
37     http://suika.fam.cx/~wakaba/archive/2004/8/18/lang#dis--
38     @DISlib:
39     http://suika.fam.cx/~wakaba/archive/2004/dis/
40 wakaba 1.8 @domperl:
41     http://suika.fam.cx/~wakaba/archive/2006/dom/perl/
42 wakaba 1.1 @dx:
43     http://suika.fam.cx/~wakaba/archive/2005/manakai/Util/Error/DOMException#
44     @f:
45     http://suika.fam.cx/~wakaba/archive/2004/dom/feature#
46     @fe:
47     http://suika.fam.cx/www/2006/feature/
48 wakaba 1.2 @icharset:
49     urn:x-suika-fam-cx:charset:
50 wakaba 1.1 @idl:
51     http://suika.fam.cx/~wakaba/archive/2004/dis/IDL#
52     @kwd:
53     http://suika.fam.cx/~wakaba/archive/2005/rfc2119/
54     @lang:
55     http://suika.fam.cx/~wakaba/archive/2004/8/18/lang#
56     @license:
57     http://suika.fam.cx/~wakaba/archive/2004/8/18/license#
58     @ManakaiDOM:
59     http://suika.fam.cx/~wakaba/archive/2004/8/18/manakai-dom#
60     @MCharset:
61     http://suika.fam.cx/~wakaba/archive/2005/manakai/Charset/
62 wakaba 1.2 @mce:
63     http://suika.fam.cx/~wakaba/archive/2005/manakai/Charset/Encode/
64 wakaba 1.1 @MDOM:
65     http://suika.fam.cx/~wakaba/archive/2004/8/18/manakai-dom#ManakaiDOM.
66     @mn:
67     http://suika.fam.cx/~wakaba/archive/2005/manakai/Util/ManakaiNode#
68     @rdf:
69     http://www.w3.org/1999/02/22-rdf-syntax-ns#
70     @rdfs:
71     http://www.w3.org/2000/01/rdf-schema#
72 wakaba 1.10 @str:
73     http://suika.fam.cx/~wakaba/archive/2005/manakai/DOM/DOMString/
74 wakaba 1.1 @test:
75     http://suika.fam.cx/~wakaba/archive/2004/dis/Test#
76 wakaba 1.2 @xml-auto-charset:
77     http://suika.fam.cx/www/2006/03/xml-entity/
78 wakaba 1.1
79     ResourceDef:
80     @QName: MCharset|
81     @rdf:type: dis|ModuleGroup
82     @FullName:
83     @@lang:en
84     @@@:
85     The manakai Charset modules
86     @DISPerl:packageName:
87     Message::Charset::
88     @DISPerl:interfacePackageName:
89 wakaba 1.12 Message::Charset::IF::
90 wakaba 1.1
91     ## -- Features
92    
93 wakaba 1.2 FeatureDef:
94     @QName: MCEncodeFeature
95     @featureQName: fe|MCEncode
96     @FeatureVerDef:
97     @@QName: MCEncodeFeature10
98     @@f:instanceOf: MCEncodeFeature
99     @@Version: 1.0
100     @@enDesc:
101     The manakai Charset Encode Module, version 1.0
102    
103 wakaba 1.1 ElementTypeBinding:
104     @Name: FeatureDef
105     @ElementType:
106     dis:ResourceDef
107     @ShadowContent:
108     @@rdf:type: f|Feature
109    
110     ElementTypeBinding:
111     @Name: FeatureVerDef
112     @ElementType:
113     dis:ResourceDef
114     @ShadowContent:
115     @@rdf:type: f|Feature
116    
117     ElementTypeBinding:
118     @Name: featureQName
119     @ElementType:
120     f:name
121     @ShadowContent:
122     @@ContentType: DISCore|QName
123    
124     ElementTypeBinding:
125     @Name: IFQName
126     @ElementType:
127     dis:QName
128     @ShadowContent:
129     @@ForCheck: ManakaiDOM|ForIF
130    
131     ElementTypeBinding:
132     @Name: ClsQName
133     @ElementType:
134     dis:QName
135     @ShadowContent:
136     @@ForCheck: ManakaiDOM|ForClass
137    
138     ElementTypeBinding:
139     @Name: IFISA
140     @ElementType:
141     dis:ISA
142     @ShadowContent:
143     @@ForCheck: ManakaiDOM|ForIF
144    
145     ElementTypeBinding:
146     @Name: ClsISA
147     @ElementType:
148     dis:ISA
149     @ShadowContent:
150     @@ForCheck: ManakaiDOM|ForClass
151    
152     ElementTypeBinding:
153     @Name: IFClsDef
154     @ElementType:
155     dis:ResourceDef
156     @ShadowContent:
157     @@rdf:type:
158     @@@@: dis|MultipleResource
159     @@@ForCheck: !ManakaiDOM|ForIF !ManakaiDOM|ForClass
160     @@resourceFor: ManakaiDOM|ForIF
161     @@resourceFor:
162     @@@@: ManakaiDOM|ForClass
163    
164     @@rdf:type:
165     @@@@: DISLang|Interface
166     @@@ForCheck: ManakaiDOM|ForIF
167    
168     @@rdf:type:
169     @@@@: DISLang|Class
170     @@@ForCheck: ManakaiDOM|ForClass
171 wakaba 1.11
172 wakaba 1.1 @@Implement:
173 wakaba 1.12 @@@@: ||+||ManakaiDOM|ForIF
174 wakaba 1.1 @@@ContentType: DISCore|TFPQNames
175 wakaba 1.12 @@@ForCheck: ManakaiDOM|ForClass
176 wakaba 1.1
177     @@f:implements:
178 wakaba 1.2 @@@@: MCEncodeFeature10
179    
180     ElementTypeBinding:
181     @Name: ClsDef
182     @ElementType:
183     dis:ResourceDef
184     @ShadowContent:
185     @@rdf:type:
186     @@@@: dis|MultipleResource
187     @@@ForCheck: !ManakaiDOM|ForIF !ManakaiDOM|ForClass
188     @@resourceFor:
189     @@@@: ManakaiDOM|ForClass
190    
191     @@rdf:type:
192     @@@@: DISLang|Class
193     @@@ForCheck: ManakaiDOM|ForClass
194    
195     @@f:implements:
196     @@@@: MCEncodeFeature10
197 wakaba 1.1
198     ElementTypeBinding:
199     @Name: Method
200     @ElementType:
201     dis:ResourceDef
202     @ShadowContent:
203     @@rdf:type: DISLang|Method
204    
205     ElementTypeBinding:
206     @Name: IntMethod
207     @ElementType:
208     dis:ResourceDef
209     @ShadowContent:
210     @@rdf:type: DISLang|Method
211 wakaba 1.11 @@ForCheck: ManakaiDOM|ForClass
212 wakaba 1.1 @@ManakaiDOM:isForInternal: 1
213    
214     ElementTypeBinding:
215     @Name: Param
216     @ElementType:
217     dis:ResourceDef
218     @ShadowContent:
219     @@DISCore:resourceType: DISLang|MethodParameter
220    
221     ElementTypeBinding:
222     @Name: NamedParam
223     @ElementType:
224     dis:ResourceDef
225     @ShadowContent:
226     @@DISCore:resourceType: DISLang|MethodParameter
227     @@DISPerl:isNamedParameter: 1
228    
229     ElementTypeBinding:
230     @Name: Return
231     @ElementType:
232     dis:ResourceDef
233     @ShadowContent:
234     @@DISCore:resourceType: DISLang|MethodReturn
235    
236     ElementTypeBinding:
237     @Name: Attr
238     @ElementType:
239     dis:ResourceDef
240     @ShadowContent:
241     @@DISCore:resourceType: DISLang|Attribute
242    
243     ElementTypeBinding:
244     @Name: Get
245     @ElementType:
246     dis:ResourceDef
247     @ShadowContent:
248     @@DISCore:resourceType: DISLang|AttributeGet
249    
250     ElementTypeBinding:
251     @Name: Set
252     @ElementType:
253     dis:ResourceDef
254     @ShadowContent:
255     @@DISCore:resourceType: DISLang|AttributeSet
256    
257     ElementTypeBinding:
258     @Name: InCase
259     @ElementType:
260     dis:ResourceDef
261     @ShadowContent:
262     @@DISCore:resourceType: ManakaiDOM|InCase
263    
264     ElementTypeBinding:
265     @Name: PerlDef
266     @ElementType:
267     dis:Def
268     @ShadowContent:
269     @@ContentType:
270     lang:Perl
271     @@ForCheck: ManakaiDOM|ForClass
272    
273     ElementTypeBinding:
274     @Name: Test
275     @ElementType:
276     dis:ResourceDef
277     @ShadowContent:
278     @@rdf:type: test|StandaloneTest
279     @@ForCheck: ManakaiDOM|ForClass
280    
281     ElementTypeBinding:
282     @Name: enDesc
283     @ElementType:
284     dis:Description
285     @ShadowContent:
286     @@lang:en
287    
288     ElementTypeBinding:
289     @Name: enImplNote
290     @ElementType:
291     dis:ImplNote
292     @ShadowContent:
293     @@lang:en
294    
295     ElementTypeBinding:
296     @Name: enFN
297     @ElementType:
298     dis:FullName
299     @ShadowContent:
300     @@lang:en
301 wakaba 1.2
302     ## -- Implementation
303    
304     IFClsDef:
305     @IFQName: MCEncodeImplementation
306     @ClsQName: ManakaiMCEncodeImplementation
307    
308 wakaba 1.8 @domperl:implementedByObjectsOf: c|DOMImplementation
309     @domperl:classImplementedByObjectsOf: c|ManakaiDOMImplementation
310 wakaba 1.2
311     @enDesc:
312     The <IF::MCEncodeImplementation> interface provides
313     factory methods to create <IF::MCEncodeHandle> objects.
314    
315     @f:provides: MCEncodeFeature10
316    
317     @Test:
318     @@enDesc:
319     The implementation registry should know this class when the
320     module is loaded.
321     @@PerlDef:
322     I: {
323     for my $impl (@{$Message::DOM::ImplementationRegistry
324 wakaba 1.10 ->get_dom_implementation_list
325 wakaba 1.2 ({<Q::fe|MCEncode> => '1.0'})}) {
326     if ($impl->isa (<IFName::MCEncodeImplementation>)) {
327     last I;
328     }
329     }
330     $test->assert_never;
331     } # I
332    
333     @Method:
334     @@Name: createMCDecodeHandle
335     @@enDesc:
336     Creates an <IF::MCDecodeHandle> object.
337     @@Param:
338     @@@Name: charset
339     @@@Type: String
340     @@@enDesc:
341     The URI that identifies the charset
342     in which strings are written to the <P::byteStream>.
343     @@Param:
344     @@@Name: byteStream
345     @@@Type: DISPerl|Filehandle||ManakaiDOM|all
346     @@@enDesc:
347     A reference to the filehandle that contains the byte
348     stream read by the <IF::MCDecodeHandle> object.
349 wakaba 1.3 @@Param:
350     @@@Name: onerror
351     @@@Type: DISPerl|CODE||ManakaiDOM|all
352     @@@enDesc:
353     A subroutine that is called back when an error is encountered.
354     The <A::MCDecodeHandle.onerror> attribute of the
355     created object is set to this parameter value.
356     @@@nullCase:
357     @@@@enDesc:
358     The <A::MCDecodeHandle.onerror> attribute is set to
359     a do-nothing subroutine.
360 wakaba 1.2 @@Return:
361     @@@Type: MCDecodeHandle
362     @@@enDesc:
363     The newly created filehandle object.
364     @@@nullCase:
365     @@@@enDesc:
366     If the implementation does not support <P::charset>.
367     @@@PerlDef:
368     my $csdef = $Message::Charset::Encode::CharsetDef->{$charset};
369     my $obj = {<H::mce|filehandle> => $byteStream,
370     <H::mce|charset> => $charset,
371     <H::mce|characterQueue> => [],
372     <H::mce|byteBuffer> => '',
373 wakaba 1.3 <H::mce|onerror> => $onerror || sub {}};
374 wakaba 1.2 if ($csdef->{uri}->{<Q::xml-auto-charset:>} or
375     $charset eq <Q::xml-auto-charset:>) {
376 wakaba 1.3 __DEEP{
377     my $b = '';
378     $csdef = $Message::Charset::Encode::CharsetDef
379     ->{<Q::cs|Perl.utf-8>}; # UTF-8 with no BOM
380     $obj->{<H::mce|inputEncoding>} = 'utf-8';
381     if (read $obj->{<H::mce|filehandle>}, $b, 256) {
382     no warnings "substr";
383     no warnings "uninitialized";
384     if (substr ($b, 0, 1) eq "<") {
385     if (substr ($b, 1, 1) eq "?") { # ASCII8
386     __CODE{XMLEntity.guess::
387     $ascii => $b,
388     $errorCondition => {
389     not $csdef->{<H::cs|ASCII8>} or
390     $csdef->{<H::cs|BOM.Required>}
391     },
392     $defaultURI => {<Q::cs|Perl.utf-8>},
393     $defaultName => 'utf-8',
394     $restoreBOM => {},
395     }__;
396     if (defined $csdef->{<H::cs|noBOMVariant>}) {
397     $csdef = $Message::Charset::Encode::CharsetDef
398     ->{$csdef->{<H::cs|noBOMVariant>}};
399     }
400     } elsif (substr ($b, 1, 1) eq "\x00") {
401     if (substr ($b, 2, 2) eq "?\x00") { # ASCII16LE
402     my $c = $b; $c =~ tr/\x00//d;
403     __CODE{XMLEntity.guess::
404     $ascii => $c,
405     $errorCondition => {
406     not $csdef->{<H::cs|ASCII16>} or
407     $csdef->{<H::cs|ASCII16BE>} or
408     $csdef->{<H::cs|BOM.Required>}
409     },
410     $defaultURI => {<Q::cs|Perl.utf-8>},
411     $defaultName => 'utf-8',
412     $restoreBOM => {},
413     }__;
414     if (defined $csdef->{<H::cs|noBOMVariant16LE>}) {
415     $csdef = $Message::Charset::Encode::CharsetDef
416     ->{$csdef->{<H::cs|noBOMVariant16LE>}};
417     }
418     } elsif (substr ($b, 2, 2) eq "\x00\x00") { # ASCII32Endian4321
419     my $c = $b; $c =~ tr/\x00//d;
420     __CODE{XMLEntity.guess::
421     $ascii => $c,
422     $errorCondition => {
423     not $csdef->{<H::cs|ASCII32>} or
424     $csdef->{<H::cs|ASCII32Endian1234>} or
425     $csdef->{<H::cs|ASCII32Endian2143>} or
426     $csdef->{<H::cs|ASCII32Endian3412>} or
427     $csdef->{<H::cs|BOM.Required>}
428     },
429     $defaultURI => {<Q::cs|Perl.utf-8>},
430     $defaultName => 'utf-8',
431     $restoreBOM => {},
432     }__;
433     if (defined $csdef->{<H::cs|noBOMVariant32Endian4321>}) {
434     $csdef = $Message::Charset::Encode::CharsetDef
435     ->{$csdef->{<H::cs|noBOMVariant32Endian4321>}};
436     }
437     }
438     }
439     } elsif (substr ($b, 0, 3) eq "\xEF\xBB\xBF") { # UTF8
440     $obj->{<H::mce|hasBOM>} = true;
441     substr ($b, 0, 3) = '';
442     my $c = $b;
443     __CODE{XMLEntity.guess::
444     $ascii => $c,
445     $errorCondition => {
446     not $csdef->{<H::cs|UTF8EncodingScheme>} or
447     not $csdef->{<H::cs|BOM.Allowed>}
448     },
449     $defaultURI => {<Q::cs|Perl.utf-8>},
450     $defaultName => 'utf-8',
451     $restoreBOM => {},
452     }__;
453     if (defined $csdef->{<H::cs|noBOMVariant>}) {
454     $csdef = $Message::Charset::Encode::CharsetDef
455     ->{$csdef->{<H::cs|noBOMVariant>}};
456     }
457     } elsif (substr ($b, 0, 2) eq "\x00<") {
458     if (substr ($b, 2, 2) eq "\x00?") { # ASCII16BE
459     my $c = $b; $c =~ tr/\x00//d;
460     __CODE{XMLEntity.guess::
461     $ascii => $c,
462     $errorCondition => {
463     not $csdef->{<H::cs|ASCII16>} or
464     $csdef->{<H::cs|ASCII16LE>} or
465     $csdef->{<H::cs|BOM.Required>}
466     },
467     $defaultURI => {<Q::cs|Perl.utf-8>},
468     $defaultName => 'utf-8',
469     $restoreBOM => {},
470     }__;
471     if (defined $csdef->{<H::cs|noBOMVariant16BE>}) {
472     $csdef = $Message::Charset::Encode::CharsetDef
473     ->{$csdef->{<H::cs|noBOMVariant16BE>}};
474     }
475     } elsif (substr ($b, 2, 2) eq "\x00\x00") { # ASCII32Endian3412
476     my $c = $b; $c =~ tr/\x00//d;
477     __CODE{XMLEntity.guess::
478     $ascii => $c,
479     $errorCondition => {
480     not $csdef->{<H::cs|ASCII32>} or
481     $csdef->{<H::cs|ASCII32Endian1234>} or
482     $csdef->{<H::cs|ASCII32Endian2143>} or
483     $csdef->{<H::cs|ASCII32Endian4321>} or
484     $csdef->{<H::cs|BOM.Required>}
485     },
486     $defaultURI => {<Q::cs|Perl.utf-8>},
487     $defaultName => 'utf-8',
488     $restoreBOM => {},
489     }__;
490     if (defined $csdef->{<H::cs|noBOMVariant32Endian3412>}) {
491     $csdef = $Message::Charset::Encode::CharsetDef
492     ->{$csdef->{<H::cs|noBOMVariant32Endian3412>}};
493     }
494     }
495     } elsif (substr ($b, 0, 2) eq "\xFE\xFF") {
496     if (substr ($b, 2, 2) eq "\x00<") { # ASCII16BE
497     $obj->{<H::mce|hasBOM>} = true;
498     substr ($b, 0, 2) = '';
499     my $c = $b; $c =~ tr/\x00//d;
500     __CODE{XMLEntity.guess::
501     $ascii => $c,
502     $errorCondition => {
503     not $csdef->{<H::cs|ASCII16>} or
504     $csdef->{<H::cs|ASCII16LE>} or
505     not $csdef->{<H::cs|BOM.Allowed>}
506     },
507     $defaultURI => {<Q::cs|Perl.utf-16be>},
508     $defaultName => 'utf-16',
509     $restoreBOM => {},
510     }__;
511     if (defined $csdef->{<H::cs|noBOMVariant16BE>}) {
512     $csdef = $Message::Charset::Encode::CharsetDef
513     ->{$csdef->{<H::cs|noBOMVariant16BE>}};
514     }
515     } elsif (substr ($b, 2, 2) eq "\x00\x00") { # ASCII32Endian3412
516     $obj->{<H::mce|hasBOM>} = true;
517     substr ($b, 0, 4) = '';
518     my $c = $b; $c =~ tr/\x00//d;
519     __CODE{XMLEntity.guess::
520     $ascii => $c,
521     $errorCondition => {
522     not $csdef->{<H::cs|ASCII32>} or
523     $csdef->{<H::cs|ASCII32Endian1234>} or
524     $csdef->{<H::cs|ASCII32Endian2143>} or
525     $csdef->{<H::cs|ASCII32Endian4321>} or
526     not $csdef->{<H::cs|BOM.Allowed>}
527     },
528     $defaultURI => {<Q::cs|Perl.utf-16be>},
529     $defaultName => 'utf-16',
530     $restoreBOM => {
531     $obj->{<H::mce|byteBuffer>} .= "\x00\x00";
532     },
533     }__;
534     if (defined $csdef->{<H::cs|noBOMVariant32Endian3412>}) {
535     $csdef = $Message::Charset::Encode::CharsetDef
536     ->{$csdef->{<H::cs|noBOMVariant32Endian3412>}};
537     }
538     } else {
539     $csdef = $Message::Charset::Encode::CharsetDef
540     ->{<Q::cs|Perl.utf-16be>};
541     $obj->{<H::mce|inputEncoding>} = 'utf-16';
542     substr ($b, 0, 2) = '';
543     $obj->{<H::mce|hasBOM>} = true;
544     }
545     } elsif (substr ($b, 0, 2) eq "\xFF\xFE") {
546     if (substr ($b, 2, 2) eq "<\x00") { # ASCII16LE
547     $obj->{<H::mce|hasBOM>} = true;
548     substr ($b, 0, 2) = '';
549     my $c = $b; $c =~ tr/\x00//d;
550     __CODE{XMLEntity.guess::
551     $ascii => $c,
552     $errorCondition => {
553     not $csdef->{<H::cs|ASCII16>} or
554     $csdef->{<H::cs|ASCII16BE>} or
555     not $csdef->{<H::cs|BOM.Allowed>}
556     },
557     $defaultURI => {<Q::cs|Perl.utf-16le>},
558     $defaultName => 'utf-16',
559     $restoreBOM => {},
560     }__;
561     if (defined $csdef->{<H::cs|noBOMVariant16LE>}) {
562     $csdef = $Message::Charset::Encode::CharsetDef
563     ->{$csdef->{<H::cs|noBOMVariant16LE>}};
564     }
565     } elsif (substr ($b, 2, 2) eq "\x00\x00") { # ASCII32Endian4321
566     $obj->{<H::mce|hasBOM>} = true;
567     substr ($b, 0, 4) = '';
568     my $c = $b; $c =~ tr/\x00//d;
569     __CODE{XMLEntity.guess::
570     $ascii => $c,
571     $errorCondition => {
572     not $csdef->{<H::cs|ASCII32>} or
573     $csdef->{<H::cs|ASCII32Endian1234>} or
574     $csdef->{<H::cs|ASCII32Endian2143>} or
575     $csdef->{<H::cs|ASCII32Endian3412>} or
576     not $csdef->{<H::cs|BOM.Allowed>}
577     },
578     $defaultURI => {<Q::cs|Perl.utf-16le>},
579     $defaultName => 'utf-16',
580     $restoreBOM => {
581     $obj->{<H::mce|byteBuffer>} .= "\x00\x00";
582     },
583     }__;
584     if (defined $csdef->{<H::cs|noBOMVariant32Endian4321>}) {
585     $csdef = $Message::Charset::Encode::CharsetDef
586     ->{$csdef->{<H::cs|noBOMVariant32Endian4321>}};
587     }
588     } else {
589     $csdef = $Message::Charset::Encode::CharsetDef
590     ->{<Q::cs|Perl.utf-16le>};
591     $obj->{<H::mce|inputEncoding>} = 'utf-16';
592     substr ($b, 0, 2) = '';
593     $obj->{<H::mce|hasBOM>} = true;
594     }
595     } elsif (substr ($b, 0, 2) eq "\x00\x00") {
596     if (substr ($b, 2, 2) eq "\x00<") { # ASCII32Endian1234
597     my $c = $b; $c =~ tr/\x00//d;
598     __CODE{XMLEntity.guess::
599     $ascii => $c,
600     $errorCondition => {
601     not $csdef->{<H::cs|ASCII32>} or
602     $csdef->{<H::cs|ASCII32Endian2143>} or
603     $csdef->{<H::cs|ASCII32Endian3412>} or
604     $csdef->{<H::cs|ASCII32Endian4321>} or
605     $csdef->{<H::cs|BOM.Required>}
606     },
607     $defaultURI => {<Q::cs|Perl.utf-8>},
608     $defaultName => 'utf-8',
609     $restoreBOM => {},
610     }__;
611     if (defined $csdef->{<H::cs|noBOMVariant32Endian1234>}) {
612     $csdef = $Message::Charset::Encode::CharsetDef
613     ->{$csdef->{<H::cs|noBOMVariant32Endian1234>}};
614     }
615     } elsif (substr ($b, 2, 2) eq "<\x00") { # ASCII32Endian2143
616     my $c = $b; $c =~ tr/\x00//d;
617     __CODE{XMLEntity.guess::
618     $ascii => $c,
619     $errorCondition => {
620     not $csdef->{<H::cs|ASCII32>} or
621     $csdef->{<H::cs|ASCII32Endian1234>} or
622     $csdef->{<H::cs|ASCII32Endian3412>} or
623     $csdef->{<H::cs|ASCII32Endian4321>} or
624     $csdef->{<H::cs|BOM.Required>}
625     },
626     $defaultURI => {<Q::cs|Perl.utf-8>},
627     $defaultName => 'utf-8',
628     $restoreBOM => {},
629     }__;
630     if (defined $csdef->{<H::cs|noBOMVariant32Endian2143>}) {
631     $csdef = $Message::Charset::Encode::CharsetDef
632     ->{$csdef->{<H::cs|noBOMVariant32Endian2143>}};
633     }
634     } elsif (substr ($b, 2, 2) eq "\xFE\xFF") { # ASCII32Endian1234
635     $obj->{<H::mce|hasBOM>} = true;
636     substr ($b, 0, 4) = '';
637     my $c = $b; $c =~ tr/\x00//d;
638     __CODE{XMLEntity.guess::
639     $ascii => $c,
640     $errorCondition => {
641     not $csdef->{<H::cs|ASCII32>} or
642     $csdef->{<H::cs|ASCII32Endian2143>} or
643     $csdef->{<H::cs|ASCII32Endian3412>} or
644     $csdef->{<H::cs|ASCII32Endian4321>} or
645     $csdef->{<H::cs|BOM.Required>}
646     },
647     $defaultURI => {<Q::cs|Perl.utf-8>},
648     $defaultName => 'utf-8',
649     $restoreBOM => {
650     $obj->{<H::mce|hasBOM>} = false;
651     $obj->{<H::mce|byteBuffer>} .= "\x00\x00\xFE\xFF";
652     },
653     }__;
654     if (defined $csdef->{<H::cs|noBOMVariant32Endian1234>}) {
655     $csdef = $Message::Charset::Encode::CharsetDef
656     ->{$csdef->{<H::cs|noBOMVariant32Endian1234>}};
657     }
658     } elsif (substr ($b, 2, 2) eq "\xFF\xFE") { # ASCII32Endian2143
659     $obj->{<H::mce|hasBOM>} = true;
660     substr ($b, 0, 4) = '';
661     my $c = $b; $c =~ tr/\x00//d;
662     __CODE{XMLEntity.guess::
663     $ascii => $c,
664     $errorCondition => {
665     not $csdef->{<H::cs|ASCII32>} or
666     $csdef->{<H::cs|ASCII32Endian1234>} or
667     $csdef->{<H::cs|ASCII32Endian3412>} or
668     $csdef->{<H::cs|ASCII32Endian4321>} or
669     $csdef->{<H::cs|BOM.Required>}
670     },
671     $defaultURI => {<Q::cs|Perl.utf-8>},
672     $defaultName => 'utf-8',
673     $restoreBOM => {
674     $obj->{<H::mce|hasBOM>} = false;
675     $obj->{<H::mce|byteBuffer>} .= "\x00\x00\xFF\xFE";
676     },
677     }__;
678     if (defined $csdef->{<H::cs|noBOMVariant32Endian2143>}) {
679     $csdef = $Message::Charset::Encode::CharsetDef
680     ->{$csdef->{<H::cs|noBOMVariant32Endian2143>}};
681     }
682     }
683     # \x4C\x6F\xA7\x94 EBCDIC
684     } # buffer
685     $obj->{<H::mce|byteBuffer>} .= $b;
686     } # read
687     }__;
688 wakaba 1.2 } elsif ($csdef->{uri}->{<Q::cs|XML.utf-8>}) {
689 wakaba 1.3 ## BOM is optional.
690     __DEEP{
691     my $b = '';
692     if (read $obj->{<H::mce|filehandle>}, $b, 3) {
693     if ($b eq "\xEF\xBB\xBF") {
694     $obj->{<H::mce|hasBOM>} = true;
695     } else {
696     $obj->{<H::mce|byteBuffer>} .= $b;
697     }
698     }
699     $csdef = $Message::Charset::Encode::CharsetDef
700     ->{<Q::cs|Perl.utf-8>}; # UTF-8 with no BOM
701     }__;
702 wakaba 1.2 } elsif ($csdef->{uri}->{<Q::cs|XML.utf-16>}) {
703 wakaba 1.3 ## BOM is mandated.
704     __DEEP{
705     my $b = '';
706     if (read $obj->{<H::mce|filehandle>}, $b, 2) {
707     if ($b eq "\xFE\xFF") {
708     $obj->{<H::mce|hasBOM>} = true;
709     $csdef = $Message::Charset::Encode::CharsetDef
710     ->{<Q::cs|Perl.utf-16be>}; # UTF-16BE with no BOM
711     } elsif ($b eq "\xFF\xFE") {
712     $obj->{<H::mce|hasBOM>} = true;
713     $csdef = $Message::Charset::Encode::CharsetDef
714     ->{<Q::cs|Perl.utf-16le>}; # UTF-16LE with no BOM
715     } else {
716     $obj->{<H::mce|onerror>}
717     ->($onerror, null, <Q::cs|no-bom-error>,
718     charset_uri => $charset);
719     $obj->{<H::mce|hasBOM>} = false;
720     $obj->{<H::mce|byteBuffer>} .= $b;
721     $csdef = $Message::Charset::Encode::CharsetDef
722     ->{<Q::cs|Perl.utf-16be>}; # UTF-16BE with no BOM
723     }
724     } else {
725     $obj->{<H::mce|onerror>}
726     ->($onerror, null, <Q::cs|no-bom-error>,
727     charset_uri => $charset);
728     $obj->{<H::mce|hasBOM>} = false;
729     $csdef = $Message::Charset::Encode::CharsetDef
730     ->{<Q::cs|Perl.utf-16be>}; # UTF-16BE with no BOM
731     }
732     }__;
733     }
734 wakaba 1.2
735 wakaba 1.6 if ($csdef->{uri}->{<Q::cs|XML.iso-2022-jp>}) {
736     $obj->{<H::mce|State.2440>} = 'gl-jis-1997-swapped';
737     $obj->{<H::mce|State.2442>} = 'gl-jis-1997';
738     $obj->{<H::mce|state>} = <H::mce|State.2842>;
739     $r = bless $obj, <ClassName::ManakaiMCISO2022JPDecodeHandle>;
740     require Encode::GLJIS1997Swapped;
741     require Encode::GLJIS1997;
742     undef $r unless Encode::find_encoding ($obj->{<H::mce|State.2440>});
743     undef $r unless Encode::find_encoding ($obj->{<H::mce|State.2442>});
744     } elsif ($csdef->{uri}->{<Q::icharset|iso-2022-jp>}) {
745     $obj->{<H::mce|State.2440>} = 'gl-jis-1978';
746     $obj->{<H::mce|State.2442>} = 'gl-jis-1983';
747     $obj->{<H::mce|state>} = <H::mce|State.2842>;
748     $r = bless $obj, <ClassName::ManakaiMCISO2022JPDecodeHandle>;
749     require Encode::GLJIS1978;
750     require Encode::GLJIS1983;
751     undef $r unless Encode::find_encoding ($obj->{<H::mce|State.2440>});
752     undef $r unless Encode::find_encoding ($obj->{<H::mce|State.2442>});
753     } elsif (defined $csdef->{<H::cs|perlName>}->[0]) {
754 wakaba 1.5 if ($csdef->{uri}->{<Q::cs|XML.euc-jp>} or
755     $csdef->{uri}->{<Q::icharset|euc-jp>}) {
756     $obj->{<H::mce|perlEncodingName>} = $csdef->{<H::cs|perlName>}->[0];
757     $r = bless $obj, <ClassName::ManakaiMCEUCJPDecodeHandle>;
758     require Encode::EUCJP1997;
759     undef $r unless Encode::find_encoding
760     ($obj->{<H::mce|perlEncodingName>});
761     } elsif ($csdef->{uri}->{<Q::cs|XML.shift_jis>} or
762     $csdef->{uri}->{<Q::icharset|shift_jis>}) {
763     $obj->{<H::mce|perlEncodingName>} = $csdef->{<H::cs|perlName>}->[0];
764     $r = bless $obj, <ClassName::ManakaiMCShiftJISDecodeHandle>;
765     require Encode::ShiftJIS1997;
766     undef $r unless Encode::find_encoding
767     ($obj->{<H::mce|perlEncodingName>});
768     } elsif ($csdef->{<H::cs|isBlockSafe>}) {
769 wakaba 1.2 $obj->{<H::mce|perlEncodingName>} = $csdef->{<H::cs|perlName>}->[0];
770     $r = bless $obj, <ClassName::ManakaiMCDecodeHandle>;
771     require Encode;
772     undef $r unless Encode::find_encoding
773     ($obj->{<H::mce|perlEncodingName>});
774     }
775     }
776 wakaba 1.5
777 wakaba 1.3 unless (defined $r) {
778     __DEEP{
779     $obj->{<H::mce|onerror>}
780     ->($onerror, null, <Q::cs|charset-not-supported-error>,
781     charset_uri => $charset);
782     }__;
783     }
784    
785     @@CODE:
786     @@@QName: XMLEntity.guess
787     @@@PerlDef:
788     if ($ascii =~ /^<\?xml\s+(?:version\s*=\s*["'][^"']*["']\s*)?
789     encoding\s*=\s*["']([^"']*)/x) {
790     $obj->{<H::mce|inputEncoding>} = lc $1;
791     my $__uri = $self-><M::MCEncodeImplementation.getURIFromCharsetName>
792     (<Q::cs|xmlName>, $obj->{<H::mce|inputEncoding>});
793     $csdef = $Message::Charset::Encode::CharsetDef->{$__uri};
794     if ($errorCondition) {
795     $obj->{<H::mce|onerror>}
796     ->($obj->{<H::mce|onerror>}, null,
797     <Q::cs|charset-name-mismatch-error>,
798     charset_uri => $__uri,
799     charset_name => $obj->{<H::mce|inputEncoding>});
800     }
801     } else {
802     $csdef = $Message::Charset::Encode::CharsetDef->{$defaultURI};
803     $obj->{<H::mce|inputEncoding>} = $defaultName;
804     $restoreBOM;
805     }
806 wakaba 1.2
807     @@Test:
808     @@@QName: MCEncodeImpl.createMCDecodeHandle.test
809     @@@PerlDef:
810     my $impl;
811     __CODE{createImplForTest:: $impl => $impl}__;
812    
813     my $byte = 'a';
814     open my $fh, '>', \$byte;
815     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
816     (<Q::cs|Perl.utf8>, $fh);
817    
818     $test->id ('interface');
819     $test->assert_isa ($efh, <IFName::MCDecodeHandle>);
820    
821 wakaba 1.3 $test->id ('onerr');
822     $test->assert_isa ($efh-><AG::MCDecodeHandle.onerror>, 'CODE');
823    
824 wakaba 1.2 @@Test:
825     @@@QName: MCEncodeImpl.createMCXMLDecodeHandle.test
826     @@@PerlDef:
827     my $impl;
828     __CODE{createImplForTest:: $impl => $impl}__;
829    
830     my $byte = 'a';
831     open my $fh, '<', \$byte;
832     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
833     (<Q::xml-auto-charset:>, $fh);
834    
835     $test->id ('interface');
836     $test->assert_isa ($efh, <IFName::MCDecodeHandle>);
837    
838 wakaba 1.3 $test->id ('onerr');
839     $test->assert_isa ($efh-><AG::MCDecodeHandle.onerror>, 'CODE');
840    
841     @@Test:
842     @@@QName: MCEncodeImpl.createMCDecodeHandle.3.test
843     @@@PerlDef:
844     my $impl;
845     __CODE{createImplForTest:: $impl => $impl}__;
846    
847     my $errors = 0;
848    
849     my $byte = 'a';
850     open my $fh, '>', \$byte;
851     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
852     (q<http://www.charset.example/>, $fh, sub {
853     $errors++;
854     $test->id ('errortype');
855     $test->assert_equals
856     ($_[2],
857     <Q::cs|charset-not-supported-error>);
858     });
859    
860     $test->id ('errors');
861     $test->assert_num_equals (actual_value => $errors, expected_value => 1);
862    
863     $test->id ('return');
864     $test->assert_null ($efh);
865 wakaba 1.2
866 wakaba 1.5 @@Test:
867     @@@QName: MCEncodeImpl.createMCDecodeHandle.4.test
868     @@@PerlDef:
869     my $impl;
870     __CODE{createImplForTest:: $impl => $impl}__;
871    
872     my $byte = 'a';
873     open my $fh, '<', \$byte;
874     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
875     (<Q::cs|XML.euc-jp>, $fh);
876    
877     $test->id ('interface');
878     $test->assert_isa ($efh, <IFName::MCDecodeHandle>);
879    
880     $test->id ('class');
881     $test->assert_isa ($efh, <ClassName::ManakaiMCEUCJPDecodeHandle>);
882    
883     $test->id ('onerr');
884     $test->assert_isa ($efh-><AG::MCDecodeHandle.onerror>, 'CODE');
885    
886     @@Test:
887     @@@QName: MCEncodeImpl.createMCDecodeHandle.5.test
888     @@@PerlDef:
889     my $impl;
890     __CODE{createImplForTest:: $impl => $impl}__;
891    
892     my $byte = 'a';
893     open my $fh, '<', \$byte;
894     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
895     (<Q::cs|XML.euc-jp>, $fh);
896    
897     $test->id ('interface');
898     $test->assert_isa ($efh, <IFName::MCDecodeHandle>);
899    
900     $test->id ('class');
901     $test->assert_isa ($efh, <ClassName::ManakaiMCEUCJPDecodeHandle>);
902    
903     $test->id ('onerr');
904     $test->assert_isa ($efh-><AG::MCDecodeHandle.onerror>, 'CODE');
905    
906 wakaba 1.6 @@Test:
907     @@@QName: MCEncodeImpl.createMCDecodeHandle.6.test
908     @@@PerlDef:
909     my $impl;
910     __CODE{createImplForTest:: $impl => $impl}__;
911    
912     my $byte = 'a';
913     open my $fh, '<', \$byte;
914     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
915     (<Q::cs|XML.iso-2022-jp>, $fh);
916    
917     $test->id ('interface');
918     $test->assert_isa ($efh, <IFName::MCDecodeHandle>);
919    
920     $test->id ('class');
921     $test->assert_isa ($efh, <ClassName::ManakaiMCISO2022JPDecodeHandle>);
922    
923     $test->id ('onerr');
924     $test->assert_isa ($efh-><AG::MCDecodeHandle.onerror>, 'CODE');
925    
926     @@Test:
927     @@@QName: MCEncodeImpl.createMCDecodeHandle.7.test
928     @@@PerlDef:
929     my $impl;
930     __CODE{createImplForTest:: $impl => $impl}__;
931    
932     my $byte = 'a';
933     open my $fh, '<', \$byte;
934     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
935     (<Q::icharset|iso-2022-jp>, $fh);
936    
937     $test->id ('interface');
938     $test->assert_isa ($efh, <IFName::MCDecodeHandle>);
939    
940     $test->id ('class');
941     $test->assert_isa ($efh, <ClassName::ManakaiMCISO2022JPDecodeHandle>);
942    
943     $test->id ('onerr');
944     $test->assert_isa ($efh-><AG::MCDecodeHandle.onerror>, 'CODE');
945    
946 wakaba 1.2 @Method:
947     @@Name: getURIFromCharsetName
948     @@enDesc:
949     Returns a DOM URI that identifies a charset.
950     @@Param:
951     @@@Name: domain
952     @@@Type: String
953     @@@enDesc:
954     A DOM URI that identifies the context in which the charset
955     name is used.
956     @@Param:
957     @@@Name: name
958     @@@Type: String
959     @@@enDesc:
960     The charset name to convert.
961     @@Return:
962     @@@Type: String
963     @@@enDesc:
964     A DOM URI that identifies <P::name>.
965     @@@nullCase:
966     @@@@enDesc:
967     The implementation was unable to resolve <P::name> to a URI.
968     @@@PerlDef:
969     if ({
970     <Q::cs|ietfName> => true,
971     <Q::cs|xmlName> => true,
972     }->{$domain}) {
973 wakaba 1.3 $name = lc $name;
974     if ($domain eq <Q::cs|ietfName>) {
975     $r = <Q::icharset|> . $name;
976     } elsif ($domain eq <Q::cs|xmlName>) {
977     $r = <Q::cs|XML.> . $name;
978     }
979    
980     unless ($Message::Charset::Encode::CharsetDef->{$r}) {
981     U: for my $uri (keys %$Message::Charset::Encode::CharsetDef) {
982     for (@{$Message::Charset::Encode::CharsetDef->{$uri}->{+{
983     <Q::cs|ietfName> => <H::cs|ietfName>,
984     <Q::cs|xmlName> => <H::cs|xmlName>,
985     }->{$domain}} or []}) {
986     if ($_ eq $name) {
987     $r = $uri;
988     last U;
989     }
990 wakaba 1.2 }
991 wakaba 1.3 } # U
992 wakaba 1.2 }
993 wakaba 1.3 } else {
994     $r = null;
995     }
996    
997     @@Test:
998     @@@QName: MCEncodeImpl.name2uri.test
999     @@@PerlDef:
1000     my $impl;
1001     __CODE{createImplForTest:: $impl => $impl}__;
1002    
1003     for (
1004     [<Q::icharset|utf-8>, 'utf-8'],
1005     [<Q::icharset|x-no-such-charset>, 'x-no-such-charset'],
1006     [<Q::icharset|utf-8>, 'UTF-8'],
1007     [<Q::icharset|utf-8>, 'uTf-8'],
1008     [<Q::icharset|utf-16be>, 'utf-16be'],
1009     ) {
1010     $test->id ('ietfname2uri.'.$_->[1]);
1011     my $iname = $impl-><M::MCEncodeImplementation.getURIFromCharsetName>
1012     (<Q::cs|ietfName>, $_->[1]);
1013     $test->assert_equals ($iname, $_->[0]);
1014     }
1015    
1016     for (
1017     [<Q::cs|XML.utf-8>, 'utf-8'],
1018     [<Q::cs|XML.x-no-such-charset>, 'x-no-such-charset'],
1019     [<Q::cs|XML.utf-8>, 'UTF-8'],
1020     [<Q::cs|XML.utf-8>, 'uTf-8'],
1021     [<Q::icharset|utf-16be>, 'utf-16be'],
1022     ) {
1023     $test->id ('xmlname2uri.'.$_->[1]);
1024     my $iname = $impl-><M::MCEncodeImplementation.getURIFromCharsetName>
1025     (<Q::cs|xmlName>, $_->[1]);
1026     $test->assert_equals ($iname, $_->[0]);
1027 wakaba 1.2 }
1028    
1029     @Method:
1030     @@Name: getCharsetNameFromURI
1031     @@enDesc:
1032     Returns a name for the charset identified by a DOM URI.
1033     @@Param:
1034     @@@Name: domain
1035     @@@Type: String
1036     @@@enDesc:
1037     A DOM URI that identifies the context in which the charset
1038     name is used.
1039     @@Param:
1040     @@@Name: uri
1041     @@@Type: String
1042     @@@enDesc:
1043     A DOM URI of the charset.
1044     @@Return:
1045     @@@Type: String
1046     @@@enDesc:
1047     A charset name that identifies <P::uri>.
1048     @@@nullCase:
1049     @@@@enDesc:
1050     The implementation was unable to find the charset name
1051     for the <P::uri> that can be used in <P::domain> context.
1052     @@@PerlDef:
1053     if ({
1054     <Q::cs|ietfName> => true,
1055     <Q::cs|xmlName> => true,
1056     }->{$domain}) {
1057 wakaba 1.3 $r = $Message::Charset::Encode::CharsetDef->{$uri}->{+{
1058     <Q::cs|ietfName> => <H::cs|ietfName>,
1059     <Q::cs|xmlName> => <H::cs|xmlName>,
1060     }->{$domain}}->[0];
1061     unless (defined $r) {
1062     if ($domain eq <Q::cs|ietfName> and
1063     substr ($uri, 0, length <Q::icharset|>) eq <Q::icharset|>) {
1064     $r = substr ($uri, length <Q::icharset|>);
1065     } elsif ($domain eq <Q::cs|xmlName> and
1066     substr ($uri, 0, length <Q::cs|XML.>) eq <Q::cs|XML.>) {
1067     $r = substr ($uri, length <Q::cs|XML.>);
1068     }
1069     }
1070 wakaba 1.2 } else {
1071     $r = null;
1072     }
1073    
1074 wakaba 1.3 @@Test:
1075     @@@QName: MCEncodeImpl.uri2name.test
1076     @@@PerlDef:
1077     my $impl;
1078     __CODE{createImplForTest:: $impl => $impl}__;
1079    
1080     for (
1081     [<Q::icharset|utf-8>, 'utf-8'],
1082     [<Q::icharset|x-no-such-charset>, 'x-no-such-charset'],
1083     [q<http://charset.example/>, null],
1084     ) {
1085     $test->id ('uri2ietfname.'.$_->[0]);
1086     my $iname = $impl-><M::MCEncodeImplementation.getCharsetNameFromURI>
1087     (<Q::cs|ietfName>, $_->[0]);
1088     $test->assert_equals ($iname, $_->[1]);
1089     }
1090    
1091     for (
1092     [<Q::cs|XML.utf-8>, 'utf-8'],
1093     [<Q::cs|XML.x-no-such-charset>, 'x-no-such-charset'],
1094     [q<http://charset.example/>, null],
1095     ) {
1096     $test->id ('uri2xmlname.'.$_->[0]);
1097     my $iname = $impl-><M::MCEncodeImplementation.getCharsetNameFromURI>
1098     (<Q::cs|xmlName>, $_->[0]);
1099     $test->assert_equals ($iname, $_->[1]);
1100     }
1101    
1102 wakaba 1.2 @CODE:
1103     @@QName: createImplForTest
1104     @@PerlDef:
1105 wakaba 1.9 $impl = <Class::c|ManakaiDOMImplementation>->_new;
1106 wakaba 1.2 ##MCEncodeImplementation
1107    
1108     ElementTypeBinding:
1109     @Name: CODE
1110     @ElementType:
1111     dis:ResourceDef
1112     @ShadowContent:
1113     @@DISCore:resourceType: DISPerl|BlockCode
1114     @@ForCheck: ManakaiDOM|ForClass
1115    
1116     IFClsDef:
1117     @IFQName: MCDecodeHandle
1118     @ClsQName: ManakaiMCDecodeHandle
1119    
1120     @enDesc:
1121     An <IF::MCDecodeHandle> provides the read access to a character
1122     stream.
1123    
1124     @enDesc:
1125     @@ddid: cestype
1126     @@ForCheck: ManakaiDOM|ForClass
1127     @@@:
1128     The class <Class::ManakaiMCDecodeHandle> can be used to
1129     encapsulate a byte filehandle with <Perl::Encode> call
1130     into a character filehandle-like object.
1131    
1132     The encoding <kwd:MUST> be stateless and signatureless. In addition,
1133     its <Perl::Encode> implementation <kwd:MUST> support
1134     the <Perl::FB_QUIET> flag.
1135    
1136     @Attr:
1137     @@Name: charset
1138     @@enDesc:
1139     A URI that identifies the charset of the handle.
1140     @@Type: String
1141     @@Get:
1142     @@@PerlDef:
1143     $r = $self->{<H::mce|charset>};
1144    
1145     @Attr:
1146 wakaba 1.3 @@Name: onerror
1147 wakaba 1.2 @@enDesc:
1148 wakaba 1.3 A callback function that is invoked when an error is encountered.
1149    
1150     {P:: The function will be invoked with arguments:
1151 wakaba 1.2
1152 wakaba 1.3 - <Perl::$self>::: The function itself.
1153 wakaba 1.2
1154 wakaba 1.3 - <Perl::$handle>::: The <IF::MCDecodeHandle> object.
1155     If the error is thrown during the
1156     construction of the object, it might
1157     be <DOM::null> instead.
1158    
1159     - <Perl::$errorType>::: The DOM URI that identifies the category
1160     of the error.
1161    
1162     - <Perl::%opt>::: Named parameters depending to the <Perl::$errorType>.
1163    
1164     It <kwd:MAY> throw an exception.
1165     }
1166 wakaba 1.2 @@Type: DISPerl|CODE||ManakaiDOM|all
1167     @@Get:
1168     @@@PerlDef:
1169 wakaba 1.3 $r = $self->{<H::mce|onerror>};
1170 wakaba 1.2 @@Set:
1171     @@@PerlDef:
1172 wakaba 1.3 $self->{<H::mce|onerror>} = $given;
1173 wakaba 1.2
1174     @@Test:
1175     @@@QName: MCDecodeHandle.onoctetstreamerror.test
1176     @@@PerlDef:
1177     my $impl;
1178     __CODE{createImplForTest:: $impl => $impl}__;
1179    
1180     my $byte = "a\xE3\x81\x82\x81a";
1181     open my $fh, '<', \$byte;
1182     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1183     (<Q::cs|Perl.utf8>, $fh);
1184    
1185     $test->id ('default');
1186 wakaba 1.3 $test->assert_isa ($efh-><AG::MCDecodeHandle.onerror>,
1187 wakaba 1.2 'CODE');
1188    
1189     $test->id ('get.set');
1190     my $sub1 = sub { return "2" };
1191 wakaba 1.3 $efh-><AS::MCDecodeHandle.onerror> ($sub1);
1192     my $sub2 = $efh-><AG::MCDecodeHandle.onerror>;
1193 wakaba 1.2 $test->assert_equals ($sub2, $sub1);
1194     $test->assert_equals ($sub2->(), "2");
1195    
1196     @Method:
1197     @@Name: getc
1198     @@enDesc:
1199     Returns the next character from the input.
1200     @@Return:
1201     @@@Type: String
1202     @@@enDesc:
1203     The next character.
1204     @@@nullCase:
1205     @@@@enDesc:
1206     If at the end of the file, or if there was an error, in which
1207     case <Perl::$!> is set.
1208     @@@PerlDef:
1209     if (@{$self->{<H::mce|characterQueue>}}) {
1210     $r = shift @{$self->{<H::mce|characterQueue>}};
1211     } else {
1212     __DEEP{
1213     my $error;
1214     if ($self->{<H::mce|continue>}) {
1215     if (read $self->{<H::mce|filehandle>},
1216     $self->{<H::mce|byteBuffer>}, 256,
1217     length $self->{<H::mce|byteBuffer>}) {
1218     #
1219     } else {
1220     $error = true;
1221     }
1222     $self->{<H::mce|continue>} = false;
1223 wakaba 1.3 } elsif (512 > length $self->{<H::mce|byteBuffer>}) {
1224 wakaba 1.2 read $self->{<H::mce|filehandle>},
1225     $self->{<H::mce|byteBuffer>}, 256,
1226     length $self->{<H::mce|byteBuffer>};
1227     }
1228    
1229     unless ($error) {
1230     my $string = Encode::decode ($self->{<H::mce|perlEncodingName>},
1231     $self->{<H::mce|byteBuffer>},
1232     Encode::FB_QUIET ());
1233     if (length $string) {
1234     push @{$self->{<H::mce|characterQueue>}}, split //, $string;
1235     $r = shift @{$self->{<H::mce|characterQueue>}};
1236 wakaba 1.3 if (length $self->{<H::mce|byteBuffer>}) {
1237     $self->{<H::mce|continue>} = true;
1238     }
1239 wakaba 1.2 } else {
1240 wakaba 1.3 if (length $self->{<H::mce|byteBuffer>}) {
1241     $error = true;
1242     } else {
1243     $r = null;
1244     }
1245 wakaba 1.2 }
1246 wakaba 1.3 }
1247     if ($error) {
1248 wakaba 1.2 $r = substr $self->{<H::mce|byteBuffer>}, 0, 1, '';
1249 wakaba 1.3 $self->{<H::mce|onerror>}
1250     ->($self->{<H::mce|onerror>}, $self,
1251     <Q::cs|illegal-octets-error>,
1252     octets => \$r);
1253 wakaba 1.2 }
1254     }__;
1255     }
1256    
1257     @@Test:
1258     @@@QName: MCDecodeHandle.getc.1.test
1259     @@@PerlDef:
1260     my $impl;
1261     __CODE{createImplForTest:: $impl => $impl}__;
1262    
1263     my $byte = "a\xE3\x81\x82\x81a";
1264     open my $fh, '<', \$byte;
1265     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1266     (<Q::cs|Perl.utf8>, $fh);
1267    
1268     my $error = null;
1269 wakaba 1.3 $efh-><AS::MCDecodeHandle.onerror> (sub {
1270     my ($self, $efh, $type, %opt) = @_;
1271     $error = ${$opt{octets}};
1272 wakaba 1.2 });
1273    
1274     $test->id (1);
1275     $test->assert_equals ($efh->getc, "a");
1276     $test->id ('1.err');
1277     $test->assert_null ($error);
1278    
1279     $test->id (2);
1280     $test->assert_equals ($efh->getc, "\x{3042}");
1281     $test->id ('1.err');
1282     $test->assert_null ($error);
1283    
1284     $test->id (3);
1285     $test->assert_equals ($efh->getc, "\x81");
1286     $test->id ('1.err');
1287     $test->assert_equals ($error, "\x81");
1288     $error = null;
1289    
1290     $test->id (4);
1291     $test->assert_equals ($efh->getc, "a");
1292     $test->id ('4.err');
1293     $test->assert_null ($error);
1294    
1295     $test->id ('eof');
1296     $test->assert_null ($efh->getc);
1297     $test->id ('eof.err');
1298     $test->assert_null ($error);
1299    
1300     @@Test:
1301     @@@QName: MCDecodeHandle.getc.2.test
1302     @@@PerlDef:
1303     my $impl;
1304     __CODE{createImplForTest:: $impl => $impl}__;
1305    
1306     my $byte = "a" x 256;
1307     $byte .= "b" x 256;
1308    
1309     open my $fh, '<', \$byte;
1310     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1311     (<Q::cs|Perl.utf8>, $fh);
1312    
1313     my $error = null;
1314 wakaba 1.3 $efh-><AS::MCDecodeHandle.onerror> (sub {
1315     my ($self, $efh, $type, %opt) = @_;
1316     $error = ${$opt{octets}};
1317 wakaba 1.2 });
1318    
1319     for my $i (0..255) {
1320     $test->id ("a.$i");
1321     $test->assert_equals ($efh->getc, "a");
1322     $test->id ("a.$i.err");
1323     $test->assert_null ($error);
1324     }
1325    
1326     for my $i (0..255) {
1327     $test->id ("b.$i");
1328     $test->assert_equals ($efh->getc, "b");
1329     $test->id ("b.$i.err");
1330     $test->assert_null ($error);
1331     }
1332    
1333     $test->id ('eof');
1334     $test->assert_null ($efh->getc);
1335     $test->id ('eof.err');
1336     $test->assert_null ($error);
1337    
1338     @@Test:
1339     @@@QName: MCDecodeHandle.getc.3.test
1340     @@@PerlDef:
1341     my $impl;
1342     __CODE{createImplForTest:: $impl => $impl}__;
1343    
1344     my $byte = "a" x 255;
1345     $byte .= "\xE3\x81\x82";
1346     $byte .= "b" x 256;
1347    
1348     open my $fh, '<', \$byte;
1349     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1350     (<Q::cs|Perl.utf8>, $fh);
1351    
1352     my $error = null;
1353 wakaba 1.3 $efh-><AS::MCDecodeHandle.onerror> (sub {
1354     my ($self, $efh, $type, %opt) = @_;
1355     $error = ${$opt{octets}};
1356 wakaba 1.2 });
1357    
1358     for my $i (0..254) {
1359     $test->id ("a.$i");
1360     $test->assert_equals ($efh->getc, "a");
1361     $test->id ("a.$i.err");
1362     $test->assert_null ($error);
1363     }
1364    
1365     $test->id ("A");
1366     $test->assert_equals ($efh->getc, "\x{3042}");
1367     $test->id ("A.err");
1368     $test->assert_null ($error);
1369    
1370     for my $i (0..255) {
1371     $test->id ("b.$i");
1372     $test->assert_equals ($efh->getc, "b");
1373     $test->id ("b.$i.err");
1374     $test->assert_null ($error);
1375     }
1376    
1377     $test->id ('eof');
1378     $test->assert_null ($efh->getc);
1379     $test->id ('eof.err');
1380     $test->assert_null ($error);
1381    
1382     @@Test:
1383     @@@QName: MCDecodeHandle.getc.4.test
1384     @@@PerlDef:
1385     my $impl;
1386     __CODE{createImplForTest:: $impl => $impl}__;
1387    
1388     my $byte = "a" x 255;
1389     $byte .= "\xE3";
1390    
1391     open my $fh, '<', \$byte;
1392     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1393     (<Q::cs|Perl.utf8>, $fh);
1394    
1395     my $error = null;
1396 wakaba 1.3 $efh-><AS::MCDecodeHandle.onerror> (sub {
1397     my ($self, $efh, $type, %opt) = @_;
1398     $error = ${$opt{octets}};
1399 wakaba 1.2 });
1400    
1401     for my $i (0..254) {
1402     $test->id ("a.$i");
1403     $test->assert_equals ($efh->getc, "a");
1404     $test->id ("a.$i.err");
1405     $test->assert_null ($error);
1406     }
1407    
1408     $test->id ("E3");
1409     $test->assert_equals ($efh->getc, "\xE3");
1410     $test->id ("E3.err");
1411     $test->assert_equals ($error, "\xE3");
1412     $error = null;
1413    
1414     $test->id ('eof');
1415     $test->assert_null ($efh->getc);
1416     $test->id ('eof.err');
1417     $test->assert_null ($error);
1418 wakaba 1.4
1419     @Method:
1420     @@Name: ungetc
1421     @@enDesc:
1422     Pushes a character with the given ordinal value back
1423     onto the handle's input stream. In <Perl::IO::Handle>
1424     only one character of pushback per handle is guaranteed.
1425     @@Param:
1426     @@@Name: ord
1427     @@@Type: idl|unsignedLong||ManakaiDOM|all
1428     @@@enDesc:
1429     The ordinal value of the character to push back.
1430     @@Return:
1431     @@@PerlDef:
1432     unshift @{$self->{<H::mce|characterQueue>}}, chr $ord;
1433    
1434     @@Test:
1435     @@@QName: MCDecodeHandle.ungetc.test
1436     @@@PerlDef:
1437     my $impl;
1438     __CODE{createImplForTest:: $impl => $impl}__;
1439    
1440     my $byte = "a\x{4E00}b\x{4E11}";
1441    
1442     open my $fh, '<', \$byte;
1443     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1444     (<Q::cs|Perl.utf8>, $fh);
1445    
1446     $test->id ('1.getc');
1447     $test->assert_equals ($efh->getc, "a");
1448    
1449     $test->id ('1.ungetc');
1450     $efh->ungetc (ord "a");
1451     $test->assert_equals ($efh->getc, "a");
1452    
1453     $test->id ('2.getc');
1454     $test->assert_equals ($efh->getc, "\x{4E00}");
1455    
1456     $test->id ('2.ungetc');
1457     $efh->ungetc (ord "\x{4E00}");
1458     $test->assert_equals ($efh->getc, "\x{4E00}");
1459    
1460     $test->id ('3.getc');
1461     $test->assert_equals ($efh->getc, "b");
1462    
1463     $test->id ('4.getc');
1464     $test->assert_equals ($efh->getc, "\x{4E11}");
1465    
1466     $test->id ('4.ungetc');
1467     $efh->ungetc (ord "\x{4E11}");
1468     $test->assert_equals ($efh->getc, "\x{4E11}");
1469 wakaba 1.2
1470 wakaba 1.3 @Attr:
1471     @@Name: inputEncoding
1472     @@enDesc:
1473     The name of the input charset.
1474     @@Type: String
1475     @@Get:
1476     @@@enDesc:
1477     If there is a string looks like encoding declaration,
1478     then the value of it, in lowercase. Otherwise and
1479     there is the UTF-16 <CHAR::BOM>, then <CODE::utf-16>.
1480     Otherwise, <CODE::utf-8>.
1481     @@@nullCase:
1482     @@@@enDesc:
1483     If the charset is different from <Q::xml-auto-charset:>.
1484     @@@PerlDef:
1485     $r = $self->{<H::mce|inputEncoding>};
1486    
1487     @Attr:
1488     @@Name: hasBOM
1489     @@enDesc:
1490     Whether the decoder detected the <CHAR::BYTE ORDER MARK> or not.
1491     @@Type: idl|boolean||ManakaiDOM|all
1492     @@Get:
1493     @@@TrueCase:
1494     @@@@enDesc:
1495     If there is the <CHAR::BOM>.
1496     @@@FalseCase:
1497     @@@@enDesc:
1498     Either if there is no <CHAR::BOM>, the decoder not
1499     reached to the end of the <CHAR::BOM>, or
1500     the decoder implementation does not provide whether
1501     there is the <CHAR::BOM> or not.
1502     @@@PerlDef:
1503     $r = $self->{<H::mce|hasBOM>};
1504    
1505     @Test:
1506     @@QName: MCDecodeHandle.utf-8-optional-bom.1.test
1507     @@PerlDef:
1508     my $impl;
1509     __CODE{createImplForTest:: $impl => $impl}__;
1510    
1511     my $byte = qq<\xEF\xBB\xBFabc>;
1512    
1513     my $error;
1514    
1515     open my $fh, '<', \$byte;
1516     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1517     (<Q::cs|XML.utf-8>, $fh, sub { $error = true });
1518    
1519     $test->id (1);
1520     $test->assert_equals ($efh->getc, "a");
1521    
1522     $test->id (2);
1523     $test->assert_equals ($efh->getc, "b");
1524    
1525     $test->id (3);
1526     $test->assert_equals ($efh->getc, "c");
1527    
1528     $test->id (4);
1529     $test->assert_null ($efh->getc);
1530    
1531     $test->id ('err');
1532     $test->assert_false ($error);
1533    
1534     $test->id ('bom');
1535     $test->assert_true ($efh-><AG::MCDecodeHandle.hasBOM>);
1536    
1537     @Test:
1538     @@QName: MCDecodeHandle.utf-8-optional-bom.2.test
1539     @@PerlDef:
1540     my $impl;
1541     __CODE{createImplForTest:: $impl => $impl}__;
1542    
1543     my $byte = qq<abc>;
1544    
1545     my $error;
1546    
1547     open my $fh, '<', \$byte;
1548     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1549     (<Q::cs|XML.utf-8>, $fh, sub { $error = true });
1550    
1551     $test->id (1);
1552     $test->assert_equals ($efh->getc, "a");
1553    
1554     $test->id (2);
1555     $test->assert_equals ($efh->getc, "b");
1556    
1557     $test->id (3);
1558     $test->assert_equals ($efh->getc, "c");
1559    
1560     $test->id (4);
1561     $test->assert_null ($efh->getc);
1562    
1563     $test->id ('err');
1564     $test->assert_false ($error);
1565    
1566     $test->id ('bom');
1567     $test->assert_false ($efh-><AG::MCDecodeHandle.hasBOM>);
1568    
1569     @Test:
1570     @@QName: MCDecodeHandle.utf-8-optional-bom.3.test
1571     @@PerlDef:
1572     my $impl;
1573     __CODE{createImplForTest:: $impl => $impl}__;
1574    
1575     my $byte = qq<\xEF\xBB\xBF\xEF\xBB\xBFabc>;
1576    
1577     my $error;
1578    
1579     open my $fh, '<', \$byte;
1580     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1581     (<Q::cs|XML.utf-8>, $fh, sub { $error = true });
1582    
1583     $test->id ('zwnbsp');
1584     $test->assert_equals ($efh->getc, "\x{FEFF}");
1585    
1586     $test->id (1);
1587     $test->assert_equals ($efh->getc, "a");
1588    
1589     $test->id (2);
1590     $test->assert_equals ($efh->getc, "b");
1591    
1592     $test->id (3);
1593     $test->assert_equals ($efh->getc, "c");
1594    
1595     $test->id (4);
1596     $test->assert_null ($efh->getc);
1597    
1598     $test->id ('err');
1599     $test->assert_false ($error);
1600    
1601     $test->id ('bom');
1602     $test->assert_true ($efh-><AG::MCDecodeHandle.hasBOM>);
1603    
1604     @Test:
1605     @@QName: MCDecodeHandle.utf-8-optional-bom.4.test
1606     @@PerlDef:
1607     my $impl;
1608     __CODE{createImplForTest:: $impl => $impl}__;
1609    
1610     my $byte = qq<\xEF\xBB\xBF>;
1611    
1612     my $error;
1613    
1614     open my $fh, '<', \$byte;
1615     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1616     (<Q::cs|XML.utf-8>, $fh, sub { $error = true });
1617    
1618     $test->id (1);
1619     $test->assert_null ($efh->getc);
1620    
1621     $test->id ('err');
1622     $test->assert_false ($error);
1623    
1624     $test->id ('bom');
1625     $test->assert_true ($efh-><AG::MCDecodeHandle.hasBOM>);
1626    
1627     @Test:
1628     @@QName: MCDecodeHandle.utf-8-optional-bom.5.test
1629     @@PerlDef:
1630     my $impl;
1631     __CODE{createImplForTest:: $impl => $impl}__;
1632    
1633     my $byte = qq<>;
1634    
1635     my $error;
1636    
1637     open my $fh, '<', \$byte;
1638     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1639     (<Q::cs|XML.utf-8>, $fh, sub { $error = true });
1640    
1641     $test->id (1);
1642     $test->assert_null ($efh->getc);
1643    
1644     $test->id ('err');
1645     $test->assert_false ($error);
1646    
1647     $test->id ('bom');
1648     $test->assert_false ($efh-><AG::MCDecodeHandle.hasBOM>);
1649    
1650     @Test:
1651     @@QName: MCDecodeHandle.utf-8-optional-bom.6.test
1652     @@PerlDef:
1653     my $impl;
1654     __CODE{createImplForTest:: $impl => $impl}__;
1655    
1656     my $byte = qq<ab>;
1657    
1658     my $error;
1659    
1660     open my $fh, '<', \$byte;
1661     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1662     (<Q::cs|XML.utf-8>, $fh, sub { $error = true });
1663    
1664     $test->id (1);
1665     $test->assert_equals ($efh->getc, "a");
1666    
1667     $test->id (2);
1668     $test->assert_equals ($efh->getc, "b");
1669    
1670     $test->id (3);
1671     $test->assert_null ($efh->getc);
1672    
1673     $test->id ('err');
1674     $test->assert_false ($error);
1675    
1676     $test->id ('bom');
1677     $test->assert_false ($efh-><AG::MCDecodeHandle.hasBOM>);
1678    
1679     @Test:
1680     @@QName: MCDecodeHandle.utf-8-optional-bom.7.test
1681     @@PerlDef:
1682     my $impl;
1683     __CODE{createImplForTest:: $impl => $impl}__;
1684    
1685     my $byte = qq<a>;
1686    
1687     my $error;
1688    
1689     open my $fh, '<', \$byte;
1690     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1691     (<Q::cs|XML.utf-8>, $fh, sub { $error = true });
1692    
1693     $test->id (1);
1694     $test->assert_equals ($efh->getc, "a");
1695    
1696     $test->id (2);
1697     $test->assert_null ($efh->getc);
1698    
1699     $test->id ('err');
1700     $test->assert_false ($error);
1701    
1702     $test->id ('bom');
1703     $test->assert_false ($efh-><AG::MCDecodeHandle.hasBOM>);
1704    
1705    
1706     @Test:
1707     @@QName: MCDecodeHandle.utf-16-with-bom.1.test
1708     @@PerlDef:
1709     my $impl;
1710     __CODE{createImplForTest:: $impl => $impl}__;
1711    
1712     my $byte = qq<\xFE\xFF\x4E\x00\x00a>;
1713    
1714     my $error;
1715    
1716     open my $fh, '<', \$byte;
1717     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1718     (<Q::cs|XML.utf-16>, $fh, sub { $error = true });
1719    
1720     $test->id (1);
1721     $test->assert_equals ($efh->getc, "\x{4E00}");
1722    
1723     $test->id (2);
1724     $test->assert_equals ($efh->getc, "a");
1725    
1726     $test->id (3);
1727     $test->assert_null ($efh->getc);
1728    
1729     $test->id ('err');
1730     $test->assert_false ($error);
1731    
1732     $test->id ('bom');
1733     $test->assert_true ($efh-><AG::MCDecodeHandle.hasBOM>);
1734    
1735     @Test:
1736     @@QName: MCDecodeHandle.utf-16-with-bom.2.test
1737     @@PerlDef:
1738     my $impl;
1739     __CODE{createImplForTest:: $impl => $impl}__;
1740    
1741     my $byte = qq<\xFF\xFE\x00\x4Ea\x00>;
1742    
1743     my $error;
1744    
1745     open my $fh, '<', \$byte;
1746     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1747     (<Q::cs|XML.utf-16>, $fh, sub { $error = true });
1748    
1749     $test->id (1);
1750     $test->assert_equals ($efh->getc, "\x{4E00}");
1751    
1752     $test->id (2);
1753     $test->assert_equals ($efh->getc, "a");
1754    
1755     $test->id (3);
1756     $test->assert_null ($efh->getc);
1757    
1758     $test->id ('err');
1759     $test->assert_false ($error);
1760    
1761     $test->id ('bom');
1762     $test->assert_true ($efh-><AG::MCDecodeHandle.hasBOM>);
1763    
1764     @Test:
1765     @@QName: MCDecodeHandle.utf-16-with-bom.3.test
1766     @@PerlDef:
1767     my $impl;
1768     __CODE{createImplForTest:: $impl => $impl}__;
1769    
1770     my $byte = qq<\xFE\xFF\x00a>;
1771    
1772     my $error;
1773    
1774     open my $fh, '<', \$byte;
1775     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1776     (<Q::cs|XML.utf-16>, $fh, sub { $error = true });
1777    
1778     $test->id (1);
1779     $test->assert_equals ($efh->getc, "a");
1780    
1781     $test->id (2);
1782     $test->assert_null ($efh->getc);
1783    
1784     $test->id ('err');
1785     $test->assert_false ($error);
1786    
1787     $test->id ('bom');
1788     $test->assert_true ($efh-><AG::MCDecodeHandle.hasBOM>);
1789    
1790     @Test:
1791     @@QName: MCDecodeHandle.utf-16-with-bom.4.test
1792     @@PerlDef:
1793     my $impl;
1794     __CODE{createImplForTest:: $impl => $impl}__;
1795    
1796     my $byte = qq<\xFF\xFEa\x00>;
1797    
1798     my $error;
1799    
1800     open my $fh, '<', \$byte;
1801     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1802     (<Q::cs|XML.utf-16>, $fh, sub { $error = true });
1803    
1804     $test->id (1);
1805     $test->assert_equals ($efh->getc, "a");
1806    
1807     $test->id (2);
1808     $test->assert_null ($efh->getc);
1809    
1810     $test->id ('err');
1811     $test->assert_false ($error);
1812    
1813     $test->id ('bom');
1814     $test->assert_true ($efh-><AG::MCDecodeHandle.hasBOM>);
1815    
1816     @Test:
1817     @@QName: MCDecodeHandle.utf-16-with-bom.5.test
1818     @@PerlDef:
1819     my $impl;
1820     __CODE{createImplForTest:: $impl => $impl}__;
1821    
1822     my $byte = qq<\xFE\xFFa>;
1823    
1824     my $error;
1825    
1826     open my $fh, '<', \$byte;
1827     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1828     (<Q::cs|XML.utf-16>, $fh, sub { $error = $_[2] });
1829    
1830     $test->id ('0.error');
1831     $test->assert_null ($error);
1832    
1833     $test->id (1);
1834     $test->assert_equals ($efh->getc, "a");
1835     $test->id ('1.error');
1836     $test->assert_equals ($error, <Q::cs|illegal-octets-error>);
1837     $error = null;
1838    
1839     $test->id (2);
1840     $test->assert_null ($efh->getc);
1841    
1842     $test->id ('err');
1843     $test->assert_null ($error);
1844    
1845     $test->id ('bom');
1846     $test->assert_true ($efh-><AG::MCDecodeHandle.hasBOM>);
1847    
1848     @Test:
1849     @@QName: MCDecodeHandle.utf-16-with-bom.6.test
1850     @@PerlDef:
1851     my $impl;
1852     __CODE{createImplForTest:: $impl => $impl}__;
1853    
1854     my $byte = qq<\xFF\xFEa>;
1855    
1856     my $error;
1857    
1858     open my $fh, '<', \$byte;
1859     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1860     (<Q::cs|XML.utf-16>, $fh, sub { $error = $_[2] });
1861    
1862     $test->id ('0.error');
1863     $test->assert_null ($error);
1864    
1865     $test->id (1);
1866     $test->assert_equals ($efh->getc, "a");
1867     $test->id ('1.error');
1868     $test->assert_equals ($error, <Q::cs|illegal-octets-error>);
1869     $error = null;
1870    
1871     $test->id (2);
1872     $test->assert_null ($efh->getc);
1873    
1874     $test->id ('err');
1875     $test->assert_null ($error);
1876    
1877     $test->id ('bom');
1878     $test->assert_true ($efh-><AG::MCDecodeHandle.hasBOM>);
1879    
1880     @Test:
1881     @@QName: MCDecodeHandle.utf-16-with-bom.7.test
1882     @@PerlDef:
1883     my $impl;
1884     __CODE{createImplForTest:: $impl => $impl}__;
1885    
1886     my $byte = qq<\xFE\xFF>;
1887    
1888     my $error;
1889    
1890     open my $fh, '<', \$byte;
1891     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1892     (<Q::cs|XML.utf-16>, $fh, sub { $error = $_[2] });
1893    
1894     $test->id (1);
1895     $test->assert_null ($efh->getc);
1896    
1897     $test->id ('err');
1898     $test->assert_null ($error);
1899    
1900     $test->id ('bom');
1901     $test->assert_true ($efh-><AG::MCDecodeHandle.hasBOM>);
1902    
1903     @Test:
1904     @@QName: MCDecodeHandle.utf-16-with-bom.8.test
1905     @@PerlDef:
1906     my $impl;
1907     __CODE{createImplForTest:: $impl => $impl}__;
1908    
1909     my $byte = qq<\xFF\xFE>;
1910    
1911     my $error;
1912    
1913     open my $fh, '<', \$byte;
1914     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1915     (<Q::cs|XML.utf-16>, $fh, sub { $error = $_[2] });
1916    
1917     $test->id (1);
1918     $test->assert_null ($efh->getc);
1919    
1920     $test->id ('err');
1921     $test->assert_null ($error);
1922    
1923     $test->id ('bom');
1924     $test->assert_true ($efh-><AG::MCDecodeHandle.hasBOM>);
1925    
1926     @Test:
1927     @@QName: MCDecodeHandle.utf-16-with-bom.9.test
1928     @@PerlDef:
1929     my $impl;
1930     __CODE{createImplForTest:: $impl => $impl}__;
1931    
1932     my $byte = qq<\xFD\xFF>;
1933    
1934     my $error;
1935    
1936     open my $fh, '<', \$byte;
1937     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1938     (<Q::cs|XML.utf-16>, $fh, sub { $error = $_[2] });
1939    
1940     $test->id ('no.bom');
1941     $test->assert_equals ($error, <Q::cs|no-bom-error>);
1942     $error = null;
1943    
1944     $test->id (1);
1945     $test->assert_equals ($efh->getc, "\x{FDFF}");
1946    
1947     $test->id (2);
1948     $test->assert_null ($efh->getc);
1949    
1950     $test->id ('err');
1951     $test->assert_null ($error);
1952    
1953     $test->id ('bom');
1954     $test->assert_false ($efh-><AG::MCDecodeHandle.hasBOM>);
1955    
1956     @Test:
1957     @@QName: MCDecodeHandle.utf-16-with-bom.10.test
1958     @@PerlDef:
1959     my $impl;
1960     __CODE{createImplForTest:: $impl => $impl}__;
1961    
1962     my $byte = qq<\xFD>;
1963    
1964     my $error;
1965    
1966     open my $fh, '<', \$byte;
1967     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1968     (<Q::cs|XML.utf-16>, $fh, sub { $error = $_[2] });
1969    
1970     $test->id ('no.bom');
1971     $test->assert_equals ($error, <Q::cs|no-bom-error>);
1972     $error = null;
1973    
1974     $test->id (1);
1975     $test->assert_equals ($efh->getc, "\xFD");
1976     $test->id ('1.error');
1977     $test->assert_equals ($error, <Q::cs|illegal-octets-error>);
1978     $error = null;
1979    
1980     $test->id (2);
1981     $test->assert_null ($efh->getc);
1982    
1983     $test->id ('err');
1984     $test->assert_null ($error);
1985    
1986     $test->id ('bom');
1987     $test->assert_false ($efh-><AG::MCDecodeHandle.hasBOM>);
1988    
1989     @Test:
1990     @@QName: MCDecodeHandle.utf-16-with-bom.11.test
1991     @@PerlDef:
1992     my $impl;
1993     __CODE{createImplForTest:: $impl => $impl}__;
1994    
1995     my $byte = qq<>;
1996    
1997     my $error;
1998    
1999     open my $fh, '<', \$byte;
2000     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
2001     (<Q::cs|XML.utf-16>, $fh, sub { $error = $_[2] });
2002    
2003     $test->id ('no.bom');
2004     $test->assert_equals ($error, <Q::cs|no-bom-error>);
2005     $error = null;
2006    
2007     $test->id (1);
2008     $test->assert_null ($efh->getc);
2009    
2010     $test->id ('err');
2011     $test->assert_null ($error);
2012    
2013     $test->id ('bom');
2014     $test->assert_false ($efh-><AG::MCDecodeHandle.hasBOM>);
2015 wakaba 1.2
2016 wakaba 1.3 @Test:
2017     @@QName: MCDecodeHandle.xml.1.test
2018     @@PerlDef:
2019     my $impl;
2020     __CODE{createImplForTest:: $impl => $impl}__;
2021 wakaba 1.2
2022 wakaba 1.3 my @testdata = (
2023     {
2024     id => q<l=0>,
2025     in => q<>,
2026     out => [null],
2027     name => 'utf-8', bom => false,
2028     },
2029     {
2030     id => q<l=1>,
2031     in => "a",
2032     out => [null, "a", null],
2033     name => 'utf-8', bom => false,
2034     },
2035     {
2036     id => q<bom8.l=0>,
2037     in => "\xEF\xBB\xBF",
2038     out => [null],
2039     name => 'utf-8', bom => true,
2040     },
2041     {
2042     id => q<bom8.l=1>,
2043     in => "\xEF\xBB\xBFa",
2044     out => [null, "a", null],
2045     name => 'utf-8', bom => true,
2046     },
2047     {
2048     id => q<bom8.zwnbsp>,
2049     in => "\xEF\xBB\xBF\xEF\xBB\xBF",
2050     out => [null, "\x{FEFF}", null],
2051     name => 'utf-8', bom => true,
2052     },
2053     {
2054     id => q<bom16be.l=0>,
2055     in => "\xFE\xFF",
2056     out => [null],
2057     name => 'utf-16', bom => true,
2058     },
2059     {
2060     id => q<bom16le.l=0>,
2061     in => "\xFF\xFE",
2062     out => [null],
2063     name => 'utf-16', bom => true,
2064     },
2065     {
2066     id => q<bom16be.l=1>,
2067     in => "\xFE\xFFa",
2068     out => [null, "a", [<Q::cs|illegal-octets-error>]],
2069     name => 'utf-16', bom => true,
2070     },
2071     {
2072     id => q<bom16le.l=1>,
2073     in => "\xFF\xFEa",
2074     out => [null, "a", [<Q::cs|illegal-octets-error>]],
2075     name => 'utf-16', bom => true,
2076     },
2077     {
2078     id => q<bom16be.l=2>,
2079     in => "\xFE\xFF\x4E\x00",
2080     out => [null, "\x{4E00}", null],
2081     name => 'utf-16', bom => true,
2082     },
2083     {
2084     id => q<bom16le.l=2>,
2085     in => "\xFF\xFE\x00\x4E",
2086     out => [null, "\x{4E00}", null],
2087     name => 'utf-16', bom => true,
2088     },
2089     {
2090     id => q<bom16be.l=2lt>,
2091     in => "\xFE\xFF\x00<",
2092     out => [null, "<", null],
2093     name => 'utf-16', bom => true,
2094     },
2095     {
2096     id => q<bom16le.l=2lt>,
2097     in => "\xFF\xFE<\x00",
2098     out => [null, "<", null],
2099     name => 'utf-16', bom => true,
2100     },
2101     {
2102     id => q<bom16be.zwnbsp>,
2103     in => "\xFE\xFF\xFE\xFF",
2104     out => [null, "\x{FEFF}", null],
2105     name => 'utf-16', bom => true,
2106     },
2107     {
2108     id => q<bom16le.zwnbsp>,
2109     in => "\xFF\xFE\xFF\xFE",
2110     out => [null, "\x{FEFF}", null],
2111     name => 'utf-16', bom => true,
2112     },
2113     {
2114     id => q<bom32e3412.l=0>,
2115     in => "\xFE\xFF\x00\x00",
2116     out => [null, "\x00", null],
2117     name => 'utf-16', bom => true,
2118     },
2119     {
2120     id => q<bom32e4321.l=0>,
2121     in => "\xFF\xFE\x00\x00",
2122     out => [null, "\x00", null],
2123     name => 'utf-16', bom => true,
2124     },
2125     {
2126     id => q<bom16be.l=4ltq>,
2127     in => "\xFE\xFF\x00<\x00?",
2128     out => [null, "<", null, "?", null],
2129     name => 'utf-16', bom => true,
2130     },
2131     {
2132     id => q<bom16le.l=4ltq>,
2133     in => "\xFF\xFE<\x00?\x00",
2134     out => [null, "<", null, "?", null],
2135     name => 'utf-16', bom => true,
2136     },
2137     {
2138     id => q<bom16be.decl.1>,
2139     in => qq[\xFE\xFF\x00<\x00?\x00x\x00m\x00l\x00 \x00v\x00e\x00r].
2140     qq[\x00s\x00i\x00o\x00n\x00=\x00"\x001\x00.\x000\x00"].
2141     qq[\x00 \x00e\x00n\x00c\x00o\x00d\x00i\x00n\x00g\x00=].
2142     qq[\x00"\x00u\x00t\x00f\x00-\x001\x006\x00"\x00?\x00>],
2143     out => [null, "<", null, "?", null, "x", null, "m", null, "l", null,
2144     " ", null, "v", null, "e", null, "r", null, "s", null,
2145     "i", null, "o", null, "n", null, "=", null, '"', null,
2146     "1", null, ".", null, "0", null, '"', null, " ", null,
2147     "e", null, "n", null, "c", null, "o", null, "d", null,
2148     "i", null, "n", null, "g", null, "=", null, '"', null,
2149     "u", null, "t", null, "f", null, "-", null, "1", null,
2150     "6", null, '"', null, "?", null, ">", null],
2151     name => 'utf-16', bom => true,
2152     },
2153     {
2154     id => q<bom16le.decl.1>,
2155     in => qq[\xFF\xFE<\x00?\x00x\x00m\x00l\x00 \x00v\x00e\x00r].
2156     qq[\x00s\x00i\x00o\x00n\x00=\x00"\x001\x00.\x000\x00"].
2157     qq[\x00 \x00e\x00n\x00c\x00o\x00d\x00i\x00n\x00g\x00=].
2158     qq[\x00"\x00u\x00t\x00f\x00-\x001\x006\x00"\x00?\x00>\x00],
2159     out => [null, "<", null, "?", null, "x", null, "m", null, "l", null,
2160     " ", null, "v", null, "e", null, "r", null, "s", null,
2161     "i", null, "o", null, "n", null, "=", null, '"', null,
2162     "1", null, ".", null, "0", null, '"', null, " ", null,
2163     "e", null, "n", null, "c", null, "o", null, "d", null,
2164     "i", null, "n", null, "g", null, "=", null, '"', null,
2165     "u", null, "t", null, "f", null, "-", null, "1", null,
2166     "6", null, '"', null, "?", null, ">", null],
2167     name => 'utf-16', bom => true,
2168     },
2169     {
2170     id => q<utf16be.decl.1>,
2171     in => qq[\x00<\x00?\x00x\x00m\x00l\x00 \x00v\x00e\x00r].
2172     qq[\x00s\x00i\x00o\x00n\x00=\x00"\x001\x00.\x000\x00"].
2173     qq[\x00 \x00e\x00n\x00c\x00o\x00d\x00i\x00n\x00g\x00=].
2174     qq[\x00"\x00u\x00t\x00f\x00-\x001\x006\x00b\x00e\x00"\x00?\x00>],
2175     out => [null, "<", null, "?", null, "x", null, "m", null, "l", null,
2176     " ", null, "v", null, "e", null, "r", null, "s", null,
2177     "i", null, "o", null, "n", null, "=", null, '"', null,
2178     "1", null, ".", null, "0", null, '"', null, " ", null,
2179     "e", null, "n", null, "c", null, "o", null, "d", null,
2180     "i", null, "n", null, "g", null, "=", null, '"', null,
2181     "u", null, "t", null, "f", null, "-", null, "1", null,
2182     "6", null, "b", null, "e", null, '"', null,
2183     "?", null, ">", null],
2184     name => 'utf-16be', bom => false,
2185     },
2186     {
2187     id => q<utf16le.decl.1>,
2188     in => qq[<\x00?\x00x\x00m\x00l\x00 \x00v\x00e\x00r].
2189     qq[\x00s\x00i\x00o\x00n\x00=\x00"\x001\x00.\x000\x00"].
2190     qq[\x00 \x00e\x00n\x00c\x00o\x00d\x00i\x00n\x00g\x00=].
2191     qq[\x00"\x00u\x00t\x00f\x00-\x001\x006\x00l\x00e\x00"].
2192     qq[\x00?\x00>\x00],
2193     out => [null, "<", null, "?", null, "x", null, "m", null, "l", null,
2194     " ", null, "v", null, "e", null, "r", null, "s", null,
2195     "i", null, "o", null, "n", null, "=", null, '"', null,
2196     "1", null, ".", null, "0", null, '"', null, " ", null,
2197     "e", null, "n", null, "c", null, "o", null, "d", null,
2198     "i", null, "n", null, "g", null, "=", null, '"', null,
2199     "u", null, "t", null, "f", null, "-", null, "1", null,
2200     "6", null, "l", null, "e", null, '"', null, "?", null,
2201     ">", null],
2202     name => 'utf-16le', bom => false,
2203     },
2204     {
2205     id => q<16be.decl.1>,
2206     in => qq[\x00<\x00?\x00x\x00m\x00l\x00 \x00v\x00e\x00r].
2207     qq[\x00s\x00i\x00o\x00n\x00=\x00"\x001\x00.\x000\x00"].
2208     qq[\x00 \x00e\x00n\x00c\x00o\x00d\x00i\x00n\x00g\x00=].
2209     qq[\x00"\x00u\x00t\x00f\x00-\x001\x006\x00"\x00?\x00>],
2210     out => [[<Q::cs|charset-name-mismatch-error>],
2211     "<", null, "?", null, "x", null, "m", null, "l", null,
2212     " ", null, "v", null, "e", null, "r", null, "s", null,
2213     "i", null, "o", null, "n", null, "=", null, '"', null,
2214     "1", null, ".", null, "0", null, '"', null, " ", null,
2215     "e", null, "n", null, "c", null, "o", null, "d", null,
2216     "i", null, "n", null, "g", null, "=", null, '"', null,
2217     "u", null, "t", null, "f", null, "-", null, "1", null,
2218     "6", null, '"', null, "?", null, ">", null],
2219     name => 'utf-16', bom => false,
2220     },
2221     {
2222     id => q<16le.decl.1>,
2223     in => qq[<\x00?\x00x\x00m\x00l\x00 \x00v\x00e\x00r].
2224     qq[\x00s\x00i\x00o\x00n\x00=\x00"\x001\x00.\x000\x00"].
2225     qq[\x00 \x00e\x00n\x00c\x00o\x00d\x00i\x00n\x00g\x00=].
2226     qq[\x00"\x00u\x00t\x00f\x00-\x001\x006\x00"\x00?\x00>\x00],
2227     out => [[<Q::cs|charset-name-mismatch-error>],
2228     "<", null, "?", null, "x", null, "m", null, "l", null,
2229     " ", null, "v", null, "e", null, "r", null, "s", null,
2230     "i", null, "o", null, "n", null, "=", null, '"', null,
2231     "1", null, ".", null, "0", null, '"', null, " ", null,
2232     "e", null, "n", null, "c", null, "o", null, "d", null,
2233     "i", null, "n", null, "g", null, "=", null, '"', null,
2234     "u", null, "t", null, "f", null, "-", null, "1", null,
2235     "6", null, '"', null, "?", null, ">", null],
2236     name => 'utf-16', bom => false,
2237     },
2238     {
2239     id => q<8.decl.1>,
2240     in => qq[<?xml version="1.0" encoding="utf-8"?>],
2241     out => [null, "<", null, "?", null, "x", null, "m", null, "l", null,
2242     " ", null, "v", null, "e", null, "r", null, "s", null,
2243     "i", null, "o", null, "n", null, "=", null, '"', null,
2244     "1", null, ".", null, "0", null, '"', null, " ", null,
2245     "e", null, "n", null, "c", null, "o", null, "d", null,
2246     "i", null, "n", null, "g", null, "=", null, '"', null,
2247     "u", null, "t", null, "f", null, "-", null, "8", null,
2248     '"', null, "?", null, ">", null],
2249     name => 'utf-8', bom => false,
2250     },
2251     {
2252     id => q<8.decl.2>,
2253     in => qq[<?xml encoding="utf-8"?>],
2254     out => [null, "<", null, "?", null, "x", null, "m", null, "l", null,
2255     " ", null,
2256     "e", null, "n", null, "c", null, "o", null, "d", null,
2257     "i", null, "n", null, "g", null, "=", null, '"', null,
2258     "u", null, "t", null, "f", null, "-", null, "8", null,
2259     '"', null, "?", null, ">", null],
2260     name => 'utf-8', bom => false,
2261     },
2262     {
2263     id => q<8.decl.3>,
2264     in => qq[<?xml version="1.1" encoding="utf-8"?>],
2265     out => [null, "<", null, "?", null, "x", null, "m", null, "l", null,
2266     " ", null, "v", null, "e", null, "r", null, "s", null,
2267     "i", null, "o", null, "n", null, "=", null, '"', null,
2268     "1", null, ".", null, "1", null, '"', null, " ", null,
2269     "e", null, "n", null, "c", null, "o", null, "d", null,
2270     "i", null, "n", null, "g", null, "=", null, '"', null,
2271     "u", null, "t", null, "f", null, "-", null, "8", null,
2272     '"', null, "?", null, ">", null],
2273     name => 'utf-8', bom => false,
2274     },
2275     {
2276     id => q<8.decl.4>,
2277     in => qq[<?xml version="1.0"?>],
2278     out => [null, "<", null, "?", null, "x", null, "m", null, "l", null,
2279     " ", null, "v", null, "e", null, "r", null, "s", null,
2280     "i", null, "o", null, "n", null, "=", null, '"', null,
2281     "1", null, ".", null, "0", null, '"', null,
2282     "?", null, ">", null],
2283     name => 'utf-8', bom => false,
2284     },
2285     {
2286     id => q<bom8.decl.1>,
2287     in => qq[\xEF\xBB\xBF<?xml encoding="utf-8"?>],
2288     out => [null, "<", null, "?", null, "x", null, "m", null, "l", null,
2289     " ", null,
2290     "e", null, "n", null, "c", null, "o", null, "d", null,
2291     "i", null, "n", null, "g", null, "=", null, '"', null,
2292     "u", null, "t", null, "f", null, "-", null, "8", null,
2293     '"', null, "?", null, ">", null],
2294     name => 'utf-8', bom => true,
2295     },
2296     {
2297     id => q<us-ascii.decl.1>,
2298     in => qq[<?xml encoding="us-ascii"?>],
2299     out => [null, "<", null, "?", null, "x", null, "m", null, "l", null,
2300     " ", null,
2301     "e", null, "n", null, "c", null, "o", null, "d", null,
2302     "i", null, "n", null, "g", null, "=", null, '"', null,
2303     "u", null, "s", null, "-", null, "a", null, "s", null,
2304     "c", null, "i", null, "i", null,
2305     '"', null, "?", null, ">", null],
2306     name => 'us-ascii', bom => false,
2307     },
2308     {
2309     id => q<us-ascii.decl.2>,
2310     in => qq[<?xml encoding="US-ascii"?>],
2311     out => [null, "<", null, "?", null, "x", null, "m", null, "l", null,
2312     " ", null,
2313     "e", null, "n", null, "c", null, "o", null, "d", null,
2314     "i", null, "n", null, "g", null, "=", null, '"', null,
2315     "U", null, "S", null, "-", null, "a", null, "s", null,
2316     "c", null, "i", null, "i", null,
2317     '"', null, "?", null, ">", null],
2318     name => 'us-ascii', bom => false,
2319     },
2320     {
2321     id => q<us-ascii.decl.3>,
2322     in => qq[<?xml encoding='us-ascii'?>],
2323     out => [null, "<", null, "?", null, "x", null, "m", null, "l", null,
2324     " ", null,
2325     "e", null, "n", null, "c", null, "o", null, "d", null,
2326     "i", null, "n", null, "g", null, "=", null, "'", null,
2327     "u", null, "s", null, "-", null, "a", null, "s", null,
2328     "c", null, "i", null, "i", null,
2329     "'", null, "?", null, ">", null],
2330     name => 'us-ascii', bom => false,
2331     },
2332     );
2333    
2334     for my $testdata (@testdata) {
2335     my $byte = $testdata->{in};
2336     my $error;
2337     my $i = 0;
2338    
2339     open my $fh, '<', \$byte;
2340     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
2341     (<Q::xml-auto-charset:>, $fh, sub {
2342     my (null, null, $etype, %opt) = @_;
2343     $error = [$etype, \%opt];
2344     });
2345    
2346     $test->id ("$testdata->{id}.bom");
2347     my $tf = $testdata->{bom} ? 'assert_true' : 'assert_false';
2348     $test->$tf ($efh-><AG::MCDecodeHandle.hasBOM>);
2349    
2350     $test->id ("$testdata->{id}.name");
2351     $test->assert_equals ($efh-><AG::MCDecodeHandle.inputEncoding>,
2352     $testdata->{name});
2353    
2354     while (@{$testdata->{out}}) {
2355     if ($i != 0) {
2356     my $c = shift @{$testdata->{out}};
2357     $test->id ("$testdata->{id}.$i");
2358     $test->assert_equals ($efh->getc, $c);
2359     }
2360    
2361     my $v = shift @{$testdata->{out}};
2362     $test->id ("$testdata->{id}.$i.error");
2363     if (defined $v) {
2364     $test->assert_not_null ($error);
2365     $test->assert_equals ($error->[0], $v->[0]);
2366     } else {
2367     $test->assert_null ($error->[0]);
2368     }
2369     $error = null;
2370     $i++;
2371     }
2372 wakaba 1.2
2373 wakaba 1.3 $test->id ("$testdata->{id}.eof");
2374     $test->assert_null ($efh->getc);
2375     $test->assert_null ($error);
2376     } # testdata
2377 wakaba 1.7
2378     @Method:
2379     @@Name: close
2380     @@Return:
2381     @@@PerlDef:
2382     close $self->{<H::mce|filehandle>};
2383 wakaba 1.3 ##MCDecodeHandle
2384 wakaba 1.5
2385     ClsDef:
2386     @ClsQName: ManakaiMCEUCJPDecodeHandle
2387    
2388     @ClsISA: ManakaiMCDecodeHandle
2389    
2390     @Method:
2391     @@Name: getc
2392     @@enDesc:
2393     Returns the next character from the input.
2394     @@Return:
2395     @@@Type: String
2396     @@@enDesc:
2397     The next character.
2398     @@@nullCase:
2399     @@@@enDesc:
2400     If at the end of the file, or if there was an error, in which
2401     case <Perl::$!> is set.
2402     @@@PerlDef:
2403     if (@{$self->{<H::mce|characterQueue>}}) {
2404     $r = shift @{$self->{<H::mce|characterQueue>}};
2405     } else {
2406     __DEEP{
2407     my $error;
2408     if ($self->{<H::mce|continue>}) {
2409     if (read $self->{<H::mce|filehandle>},
2410     $self->{<H::mce|byteBuffer>}, 256,
2411     length $self->{<H::mce|byteBuffer>}) {
2412     #
2413     } else {
2414     $error = true;
2415     }
2416     $self->{<H::mce|continue>} = false;
2417     } elsif (512 > length $self->{<H::mce|byteBuffer>}) {
2418     read $self->{<H::mce|filehandle>},
2419     $self->{<H::mce|byteBuffer>}, 256,
2420     length $self->{<H::mce|byteBuffer>};
2421     }
2422    
2423     unless ($error) {
2424     my $string = Encode::decode ($self->{<H::mce|perlEncodingName>},
2425     $self->{<H::mce|byteBuffer>},
2426     Encode::FB_QUIET ());
2427     if (length $string) {
2428     push @{$self->{<H::mce|characterQueue>}}, split //, $string;
2429     $r = shift @{$self->{<H::mce|characterQueue>}};
2430     if (length $self->{<H::mce|byteBuffer>}) {
2431     $self->{<H::mce|continue>} = true;
2432     }
2433     } else {
2434     if (length $self->{<H::mce|byteBuffer>}) {
2435     $error = true;
2436     } else {
2437     $r = null;
2438     }
2439     }
2440     }
2441     if ($error) {
2442     $r = substr $self->{<H::mce|byteBuffer>}, 0, 1, '';
2443     my $etype = <Q::cs|illegal-octets-error>;
2444     if ($r =~ /^[\xA1-\xFE]/) {
2445     if ($self->{<H::mce|byteBuffer>} =~ s/^([\xA1-\xFE])//) {
2446     $r .= $1;
2447     $etype = <Q::cs|unassigned-code-point-error>;
2448     }
2449     } elsif ($r eq "\x8F") {
2450     if ($self->{<H::mce|byteBuffer>}
2451     =~ s/^([\xA1-\xFE][\xA1-\xFE]?)//) {
2452     $r .= $1;
2453     $etype = <Q::cs|unassigned-code-point-error> if length $1 == 2;
2454     }
2455     } elsif ($r eq "\x8E") {
2456     if ($self->{<H::mce|byteBuffer>} =~ s/^([\xA1-\xFE])//) {
2457     $r .= $1;
2458     $etype = <Q::cs|unassigned-code-point-error>;
2459     }
2460     } elsif ($r eq "\xA0" or $r eq "\xFF") {
2461     $etype = <Q::cs|unassigned-code-point-error>;
2462     }
2463     $self->{<H::mce|onerror>}
2464     ->($self->{<H::mce|onerror>}, $self, $etype,
2465     octets => \$r);
2466     }
2467     }__;
2468     }
2469    
2470    
2471     @Test:
2472     @@QName: MCEUCJPDecodeHandle.1.test
2473     @@PerlDef:
2474     my $impl;
2475     __CODE{createImplForTest:: $impl => $impl}__;
2476    
2477     my @testdata = (
2478     {
2479     id => q<l=0>,
2480     in => q<>,
2481     out => [null],
2482     },
2483     {
2484     id => q<l=1.00>,
2485     in => qq<\x00>,
2486     out => [null, "\x00", null],
2487     },
2488     {
2489     id => q<l=1.0d>,
2490     in => qq<\x0D>,
2491     out => [null, "\x0D", null],
2492     },
2493     {
2494     id => q<l=1.0e>,
2495     in => qq<\x0E>,
2496     out => [null, "\x0E", null],
2497     }, # Error??
2498     {
2499     id => q<l=1.0f>,
2500     in => qq<\x0F>,
2501     out => [null, "\x0F", null],
2502     }, # Error??
2503     {
2504     id => q<l=1.1b>,
2505     in => qq<\x1B>,
2506     out => [null, "\x1B", null],
2507     }, # Error??
2508     {
2509     id => q<l=1.a>,
2510     in => q<a>,
2511     out => [null, "a", null],
2512     },
2513     {
2514     id => q<l=1.20>,
2515     in => qq<\x20>,
2516     out => [null, "\x20", null],
2517     },
2518     {
2519     id => q<5C>,
2520     in => qq<\x5C>,
2521     out => [null, "\x5C", null],
2522     },
2523     {
2524     id => q<l=1.7E>,
2525     in => qq<\x7E>,
2526     out => [null, "\x7E", null],
2527     },
2528     {
2529     id => q<l=1.7F>,
2530     in => qq<\x7F>,
2531     out => [null, "\x7F", null],
2532     },
2533     {
2534     id => q<l=1.80>,
2535     in => qq<\x80>,
2536     out => [null, "\x80", null],
2537     },
2538     {
2539     id => q<l=1.8c>,
2540     in => qq<\x8C>,
2541     out => [null, "\x8C", null],
2542     },
2543     {
2544     id => q<l=1.8e>,
2545     in => qq<\x8E>,
2546     out => [null, "\x8E", [<Q::cs|illegal-octets-error>]],
2547     },
2548     {
2549     id => q<l=1.8f>,
2550     in => qq<\x8F>,
2551     out => [null, "\x8F", [<Q::cs|illegal-octets-error>]],
2552     },
2553     {
2554     id => q<l=1.a0>,
2555     in => qq<\xA0>,
2556     out => [null, "\xA0", [<Q::cs|unassigned-code-point-error>]],
2557     },
2558     {
2559     id => q<l=1.a1>,
2560     in => qq<\xA1>,
2561     out => [null, "\xA1", [<Q::cs|illegal-octets-error>]],
2562     },
2563     {
2564     id => q<l=1.a2>,
2565     in => qq<\xA2>,
2566     out => [null, "\xA2", [<Q::cs|illegal-octets-error>]],
2567     },
2568     {
2569     id => q<l=1.fd>,
2570     in => qq<\xFD>,
2571     out => [null, "\xFD", [<Q::cs|illegal-octets-error>]],
2572     },
2573     {
2574     id => q<l=1.fe>,
2575     in => qq<\xFE>,
2576     out => [null, "\xFE", [<Q::cs|illegal-octets-error>]],
2577     },
2578     {
2579     id => q<l=1.ff>,
2580     in => qq<\xFF>,
2581     out => [null, "\xFF", [<Q::cs|unassigned-code-point-error>]],
2582     },
2583     {
2584     id => q<l=2.0000>,
2585     in => qq<\x00\x00>,
2586     out => [null, "\x00", null, "\x00", null],
2587     },
2588     {
2589     id => q<l=2.0D0A>,
2590     in => qq<\x0D\x0A>,
2591     out => [null, "\x0D", null, "\x0A", null],
2592     },
2593     {
2594     id => q<l=2.1B28>,
2595     in => qq<\x1B\x28>,
2596     out => [null, "\x1B", null, "\x28", null],
2597     },# Error??
2598     {
2599     id => q<l=2.2020>,
2600     in => qq<\x20\x20>,
2601     out => [null, "\x20", null, "\x20", null],
2602     },
2603     {
2604     id => q<l=2.ab>,
2605     in => qq<ab>,
2606     out => [null, "a", null, "b", null],
2607     },
2608     {
2609     id => q<l=2.a0a1>,
2610     in => qq<\xA0\xA1>,
2611     out => [null, "\xA0", [<Q::cs|unassigned-code-point-error>],
2612     "\xA1", [<Q::cs|illegal-octets-error>]],
2613     },
2614     {
2615     id => q<l=2.a1a1>,
2616     in => qq<\xA1\xA1>,
2617     out => [null, "\x{3000}", null],
2618     },
2619     {
2620     id => q<l=2.a1a2>,
2621     in => qq<\xA1\xA2>,
2622     out => [null, "\x{3001}", null],
2623     },
2624     {
2625     id => q<l=2.a1a4>,
2626     in => qq<\xA1\xA4>,
2627     out => [null, "\x{FF0C}", null], # FULLWIDTH COMMA
2628     },
2629     {
2630     id => q<a1a6>,
2631     in => qq<\xA1\xA6>,
2632     out => [null, "\x{30FB}", null], # KATAKABA MIDDLE DOT
2633     },
2634     {
2635     id => q<a1a7>,
2636     in => qq<\xA1\xA7>,
2637     out => [null, "\x{FF1A}", null], # FULLWIDTH COLON
2638     },
2639     {
2640     id => q<a1b1>,
2641     in => qq<\xA1\xB1>,
2642     out => [null, "\x{203E}", null], # OVERLINE
2643     },
2644     {
2645     id => q<a1bd>,
2646     in => qq<\xA1\xBD>,
2647     out => [null, "\x{2014}", null], # EM DASH
2648     },
2649     {
2650     id => q<a1c0>,
2651     in => qq<\xA1\xC0>,
2652     out => [null, "\x{FF3C}", null], # FULLWIDTH REVERSE SOLIDUS
2653     },
2654     {
2655     id => q<a1c1>,
2656     in => qq<\xA1\xC1>,
2657     out => [null, "\x{301C}", null], # WAVE DASH
2658     },
2659     {
2660     id => q<a1c2>,
2661     in => qq<\xA1\xC2>,
2662     out => [null, "\x{2016}", null], # DOUBLE VERTICAL LINE
2663     },
2664     {
2665     id => q<a1c4>,
2666     in => qq<\xA1\xC4>,
2667     out => [null, "\x{2026}", null], # HORIZONTAL ELLIPSIS
2668     },
2669     {
2670     id => q<a1dd>,
2671     in => qq<\xA1\xDD>,
2672     out => [null, "\x{2212}", null], # MINUS SIGN
2673     },
2674     {
2675     id => q<a1ef>,
2676     in => qq<\xA1\xEF>,
2677     out => [null, "\x{00A5}", null], # YEN SIGN
2678     },
2679     {
2680     id => q<a1f1>,
2681     in => qq<\xA1\xF1>,
2682     out => [null, "\x{00A2}", null], # CENT SIGN
2683     },
2684     {
2685     id => q<a1f2>,
2686     in => qq<\xA1\xF2>,
2687     out => [null, "\x{00A3}", null], # POUND SIGN
2688     },
2689     {
2690     id => q<a1f2>,
2691     in => qq<\xA1\xFF>,
2692     out => [null, "\xA1", [<Q::cs|illegal-octets-error>],
2693     "\xFF", [<Q::cs|unassigned-code-point-error>]],
2694     },
2695     {
2696     id => q<a2ae>,
2697     in => qq<\xA2\xAE>,
2698     out => [null, "\x{3013}", null], # GETA MARK
2699     },
2700     {
2701     id => q<a2af>,
2702     in => qq<\xA2\xAF>,
2703     out => [null, "\xA2\xAF", [<Q::cs|unassigned-code-point-error>]],
2704     },
2705     {
2706     id => q<a2ba>,
2707     in => qq<\xA2\xBA>,
2708     out => [null, "\x{2208}", null], # ELEMENT OF
2709     },
2710     {
2711     id => q<a2fe>,
2712     in => qq<\xA2\xFE>,
2713     out => [null, "\x{25EF}", null], # LARGE CIRCLE
2714     },
2715     {
2716     id => q<adce>,
2717     in => qq<\xAD\xCE>,
2718     out => [null, "\xAD\xCE", [<Q::cs|unassigned-code-point-error>]],
2719     },
2720     {
2721     id => q<b0a6>,
2722     in => qq<\xB0\xA6>,
2723     out => [null, "\x{611B}", null], # han
2724     },
2725     {
2726     id => q<f4a6>,
2727     in => qq<\xF4\xA6>,
2728     out => [null, "\x{7199}", null], # han
2729     },
2730     {
2731     id => q<8ea1>,
2732     in => qq<\x8E\xA1>,
2733     out => [null, "\x{FF61}", null],
2734     },
2735     {
2736     id => q<8efe>,
2737     in => qq<\x8E\xFE>,
2738     out => [null, "\x8E\xFE", [<Q::cs|unassigned-code-point-error>]],
2739     },
2740     {
2741     id => q<8ffe>,
2742     in => qq<\x8F\xFE>,
2743     out => [null, "\x8F\xFE", [<Q::cs|illegal-octets-error>]],
2744     },
2745     {
2746     id => q<l=2.a1a2a3>,
2747     in => qq<\xA1\xA2\xA3>,
2748     out => [null, "\x{3001}", null,
2749     "\xA3", [<Q::cs|illegal-octets-error>]],
2750     },
2751     {
2752     id => q<8ea1a1>,
2753     in => qq<\x8E\xA1\xA1>,
2754     out => [null, "\x{FF61}", null,
2755     "\xA1", [<Q::cs|illegal-octets-error>]],
2756     },
2757     {
2758     id => q<8fa1a1>,
2759     in => qq<\x8F\xA1\xA1>,
2760     out => [null, "\x8F\xA1\xA1", [<Q::cs|unassigned-code-point-error>]],
2761     },
2762     {
2763     id => q<8fa2af>,
2764     in => qq<\x8F\xA2\xAF>,
2765     out => [null, "\x{02D8}", null],
2766     },
2767     {
2768     id => q<8fa2b7>,
2769     in => qq<\x8F\xA2\xB7>,
2770     out => [null, "\x{FF5E}", null], # FULLWIDTH TILDE
2771     },
2772     {
2773     id => q<a1a2a1a3>,
2774     in => qq<\xA1\xA2\xA1\xA3>,
2775     out => [null, "\x{3001}", null, "\x{3002}", null],
2776     },
2777     {
2778     id => q<8fa2af>,
2779     in => qq<\x8F\xA2\xAF\xAF>,
2780     out => [null, "\x{02D8}", null,
2781     "\xAF", [<Q::cs|illegal-octets-error>]],
2782     },
2783     {
2784     id => q<8fa2afafa1>,
2785     in => qq<\x8F\xA2\xAF\xAF\xA1>,
2786     out => [null, "\x{02D8}", null,
2787     "\xAF\xA1", [<Q::cs|unassigned-code-point-error>]],
2788     },
2789     );
2790    
2791     for my $testdata (@testdata) {
2792     my $byte = $testdata->{in};
2793     my $error;
2794     my $i = 0;
2795    
2796     open my $fh, '<', \$byte;
2797     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
2798     (<Q::cs|XML.euc-jp>, $fh, sub {
2799     my (null, null, $etype, %opt) = @_;
2800     $error = [$etype, \%opt];
2801     });
2802    
2803     while (@{$testdata->{out}}) {
2804     if ($i != 0) {
2805     my $c = shift @{$testdata->{out}};
2806     $test->id ("$testdata->{id}.$i");
2807     $test->assert_equals ($efh->getc, $c);
2808     }
2809    
2810     my $v = shift @{$testdata->{out}};
2811     $test->id ("$testdata->{id}.$i.error");
2812     if (defined $v) {
2813     $test->assert_not_null ($error);
2814     $test->assert_equals ($error->[0], $v->[0]);
2815     } else {
2816     $test->assert_null ($error->[0]);
2817     }
2818     $error = null;
2819     $i++;
2820     }
2821    
2822     $test->id ("$testdata->{id}.eof");
2823     $test->assert_null ($efh->getc);
2824     $test->assert_null ($error);
2825     } # testdata
2826     ##MCEUCJPDecodeHandle
2827    
2828     ClsDef:
2829     @ClsQName: ManakaiMCShiftJISDecodeHandle
2830    
2831     @ClsISA: ManakaiMCDecodeHandle
2832    
2833     @Method:
2834     @@Name: getc
2835     @@enDesc:
2836     Returns the next character from the input.
2837     @@Return:
2838     @@@Type: String
2839     @@@enDesc:
2840     The next character.
2841     @@@nullCase:
2842     @@@@enDesc:
2843     If at the end of the file, or if there was an error, in which
2844     case <Perl::$!> is set.
2845     @@@PerlDef:
2846     if (@{$self->{<H::mce|characterQueue>}}) {
2847     $r = shift @{$self->{<H::mce|characterQueue>}};
2848     } else {
2849     __DEEP{
2850     my $error;
2851     if ($self->{<H::mce|continue>}) {
2852     if (read $self->{<H::mce|filehandle>},
2853     $self->{<H::mce|byteBuffer>}, 256,
2854     length $self->{<H::mce|byteBuffer>}) {
2855     #
2856     } else {
2857     $error = true;
2858     }
2859     $self->{<H::mce|continue>} = false;
2860     } elsif (512 > length $self->{<H::mce|byteBuffer>}) {
2861     read $self->{<H::mce|filehandle>},
2862     $self->{<H::mce|byteBuffer>}, 256,
2863     length $self->{<H::mce|byteBuffer>};
2864     }
2865    
2866     unless ($error) {
2867     my $string = Encode::decode ($self->{<H::mce|perlEncodingName>},
2868     $self->{<H::mce|byteBuffer>},
2869     Encode::FB_QUIET ());
2870     if (length $string) {
2871     push @{$self->{<H::mce|characterQueue>}}, split //, $string;
2872     $r = shift @{$self->{<H::mce|characterQueue>}};
2873     if (length $self->{<H::mce|byteBuffer>}) {
2874     $self->{<H::mce|continue>} = true;
2875     }
2876     } else {
2877     if (length $self->{<H::mce|byteBuffer>}) {
2878     $error = true;
2879     } else {
2880     $r = null;
2881     }
2882     }
2883     }
2884     if ($error) {
2885     $r = substr $self->{<H::mce|byteBuffer>}, 0, 1, '';
2886     my $etype = <Q::cs|illegal-octets-error>;
2887     if ($r =~ /^[\x81-\x9F\xE0-\xEF]/) {
2888     if ($self->{<H::mce|byteBuffer>} =~ s/(.)//s) {
2889     $r .= $1; # not limited to \x40-\xFC - \x7F
2890     $etype = <Q::cs|unassigned-code-point-error>;
2891     }
2892     } elsif ($r =~ /^[\x80\xA0\xF0-\xFF]/) {
2893     $etype = <Q::cs|unassigned-code-point-error>;
2894     }
2895     $self->{<H::mce|onerror>}
2896     ->($self->{<H::mce|onerror>}, $self, $etype,
2897     octets => \$r);
2898     }
2899     }__;
2900     }
2901    
2902     @Test:
2903     @@QName: MCShiftJISDecodeHandle.1.test
2904     @@PerlDef:
2905     my $impl;
2906     __CODE{createImplForTest:: $impl => $impl}__;
2907    
2908     my @testdata = (
2909     {
2910     id => q<l=0>,
2911     in => q<>,
2912     out => [null],
2913     },
2914     {
2915     id => q<l=1.00>,
2916     in => qq<\x00>,
2917     out => [null, "\x00", null],
2918     },
2919     {
2920     id => q<l=1.0d>,
2921     in => qq<\x0D>,
2922     out => [null, "\x0D", null],
2923     },
2924     {
2925     id => q<l=1.0e>,
2926     in => qq<\x0E>,
2927     out => [null, "\x0E", null],
2928     }, # Error??
2929     {
2930     id => q<l=1.0f>,
2931     in => qq<\x0F>,
2932     out => [null, "\x0F", null],
2933     }, # Error??
2934     {
2935     id => q<l=1.1b>,
2936     in => qq<\x1B>,
2937     out => [null, "\x1B", null],
2938     }, # Error??
2939     {
2940     id => q<l=1.a>,
2941     in => q<a>,
2942     out => [null, "a", null],
2943     },
2944     {
2945     id => q<l=1.20>,
2946     in => qq<\x20>,
2947     out => [null, "\x20", null],
2948     },
2949     {
2950     id => q<l=1.5C>,
2951     in => qq<\x5C>,
2952     out => [null, "\xA5", null], # YEN SIGN
2953     },
2954     {
2955     id => q<l=1.7E>,
2956     in => qq<\x7E>,
2957     out => [null, "\x{203E}", null], # OVERLINE
2958     },
2959     {
2960     id => q<l=1.7F>,
2961     in => qq<\x7F>,
2962     out => [null, "\x7F", null],
2963     },
2964     {
2965     id => q<l=1.80>,
2966     in => qq<\x80>,
2967     out => [null, "\x80", [<Q::cs|unassigned-code-point-error>]],
2968     },
2969     {
2970     id => q<l=1.8c>,
2971     in => qq<\x8C>,
2972     out => [null, "\x8C", [<Q::cs|illegal-octets-error>]],
2973     },
2974     {
2975     id => q<l=1.8e>,
2976     in => qq<\x8E>,
2977     out => [null, "\x8E", [<Q::cs|illegal-octets-error>]],
2978     },
2979     {
2980     id => q<l=1.8f>,
2981     in => qq<\x8F>,
2982     out => [null, "\x8F", [<Q::cs|illegal-octets-error>]],
2983     },
2984     {
2985     id => q<l=1.a0>,
2986     in => qq<\xA0>,
2987     out => [null, "\xA0", [<Q::cs|unassigned-code-point-error>]],
2988     },
2989     {
2990     id => q<l=1.a1>,
2991     in => qq<\xA1>,
2992     out => [null, "\x{FF61}", null],
2993     },
2994     {
2995     id => q<l=1.a2>,
2996     in => qq<\xA2>,
2997     out => [null, "\x{FF62}", null],
2998     },
2999     {
3000     id => q<l=1.df>,
3001     in => qq<\xdf>,
3002     out => [null, "\x{FF9F}", null],
3003     },
3004     {
3005     id => q<l=1.e0>,
3006     in => qq<\xe0>,
3007     out => [null, "\xE0", [<Q::cs|illegal-octets-error>]],
3008     },
3009     {
3010     id => q<l=1.ef>,
3011     in => qq<\xEF>,
3012     out => [null, "\xEF", [<Q::cs|illegal-octets-error>]],
3013     },
3014     {
3015     id => q<F0>,
3016     in => qq<\xF0>,
3017     out => [null, "\xF0", [<Q::cs|unassigned-code-point-error>]],
3018     },
3019     {
3020     id => q<l=1.fc>,
3021     in => qq<\xFC>,
3022     out => [null, "\xFC", [<Q::cs|unassigned-code-point-error>]],
3023     },
3024     {
3025     id => q<l=1.fd>,
3026     in => qq<\xFD>,
3027     out => [null, "\xFD", [<Q::cs|unassigned-code-point-error>]],
3028     },
3029     {
3030     id => q<l=1.fe>,
3031     in => qq<\xFE>,
3032     out => [null, "\xFE", [<Q::cs|unassigned-code-point-error>]],
3033     },
3034     {
3035     id => q<l=1.ff>,
3036     in => qq<\xFF>,
3037     out => [null, "\xFF", [<Q::cs|unassigned-code-point-error>]],
3038     },
3039     {
3040     id => q<l=2.0000>,
3041     in => qq<\x00\x00>,
3042     out => [null, "\x00", null, "\x00", null],
3043     },
3044     {
3045     id => q<l=2.0D0A>,
3046     in => qq<\x0D\x0A>,
3047     out => [null, "\x0D", null, "\x0A", null],
3048     },
3049     {
3050     id => q<l=2.1B28>,
3051     in => qq<\x1B\x28>,
3052     out => [null, "\x1B", null, "\x28", null],
3053     },# Error??
3054     {
3055     id => q<l=2.2020>,
3056     in => qq<\x20\x20>,
3057     out => [null, "\x20", null, "\x20", null],
3058     },
3059     {
3060     id => q<l=2.ab>,
3061     in => qq<ab>,
3062     out => [null, "a", null, "b", null],
3063     },
3064     {
3065     id => q<8040>,
3066     in => qq<\x80\x40>,
3067     out => [null, "\x80", [<Q::cs|unassigned-code-point-error>],
3068     "\x40", null],
3069     },
3070     {
3071     id => q<8100>,
3072     in => qq<\x81\x00>,
3073     out => [null, "\x81\x00", [<Q::cs|unassigned-code-point-error>]],
3074     },
3075     {
3076     id => q<8101>,
3077     in => qq<\x81\x01>,
3078     out => [null, "\x81\x01", [<Q::cs|unassigned-code-point-error>]],
3079     },
3080     {
3081     id => q<813F>,
3082     in => qq<\x81\x3F>,
3083     out => [null, "\x81\x3F", [<Q::cs|unassigned-code-point-error>]],
3084     },
3085     {
3086     id => q<8140>,
3087     in => qq<\x81\x40>,
3088     out => [null, "\x{3000}", null],
3089     },
3090     {
3091     id => q<8141>,
3092     in => qq<\x81\x41>,
3093     out => [null, "\x{3001}", null],
3094     },
3095     {
3096     id => q<8143>,
3097     in => qq<\x81\x43>,
3098     out => [null, "\x{FF0C}", null], # FULLWIDTH COMMA
3099     },
3100     {
3101     id => q<8150>,
3102     in => qq<\x81\x50>,
3103     out => [null, "\x{FFE3}", null], # FULLWIDTH MACRON
3104     },
3105     {
3106     id => q<815C>,
3107     in => qq<\x81\x5C>,
3108     out => [null, "\x{2014}", null], # EM DASH
3109     },
3110     {
3111     id => q<815F>,
3112     in => qq<\x81\x5F>,
3113     out => [null, "\x{005C}", null], # REVERSE SOLIDUS
3114     },
3115     {
3116     id => q<8160>,
3117     in => qq<\x81\x60>,
3118     out => [null, "\x{301C}", null], # WAVE DASH
3119     },
3120     {
3121     id => q<8161>,
3122     in => qq<\x81\x61>,
3123     out => [null, "\x{2016}", null], # DOUBLE VERTICAL LINE
3124     },
3125     {
3126     id => q<8163>,
3127     in => qq<\x81\x63>,
3128     out => [null, "\x{2026}", null], # HORIZONTAL ELLIPSIS
3129     },
3130     {
3131     id => q<817C>,
3132     in => qq<\x81\x7C>,
3133     out => [null, "\x{2212}", null], # MINUS SIGN
3134     },
3135     {
3136     id => q<817F>,
3137     in => qq<\x81\x7F>,
3138     out => [null, "\x81\x7F", [<Q::cs|unassigned-code-point-error>]],
3139     },
3140     {
3141     id => q<818F>,
3142     in => qq<\x81\x8F>,
3143     out => [null, "\x{FFE5}", null], # FULLWIDTH YEN SIGN
3144     },
3145     {
3146     id => q<8191>,
3147     in => qq<\x81\x91>,
3148     out => [null, "\x{00A2}", null], # CENT SIGN
3149     },
3150     {
3151     id => q<8192>,
3152     in => qq<\x81\x92>,
3153     out => [null, "\x{00A3}", null], # POUND SIGN
3154     },
3155     {
3156     id => q<81AC>,
3157     in => qq<\x81\xAC>,
3158     out => [null, "\x{3013}", null], # GETA MARK
3159     },
3160     {
3161     id => q<81AD>,
3162     in => qq<\x81\xAD>,
3163     out => [null, "\x81\xAD", [<Q::cs|unassigned-code-point-error>]],
3164     },
3165     {
3166     id => q<81B8>,
3167     in => qq<\x81\xB8>,
3168     out => [null, "\x{2208}", null], # ELEMENT OF
3169     },
3170     {
3171     id => q<81CA>,
3172     in => qq<\x81\xCA>,
3173     out => [null, "\x{00AC}", null], # NOT SIGN
3174     },
3175     {
3176     id => q<81FC>,
3177     in => qq<\x81\xFC>,
3178     out => [null, "\x{25EF}", null], # LARGE CIRCLE
3179     },
3180     {
3181     id => q<81FD>,
3182     in => qq<\x81\xFD>,
3183     out => [null, "\x81\xFD", [<Q::cs|unassigned-code-point-error>]],
3184     },
3185     {
3186     id => q<81FE>,
3187     in => qq<\x81\xFE>,
3188     out => [null, "\x81\xFE", [<Q::cs|unassigned-code-point-error>]],
3189     },
3190     {
3191     id => q<81FF>,
3192     in => qq<\x81\xFF>,
3193     out => [null, "\x81\xFF", [<Q::cs|unassigned-code-point-error>]],
3194     },
3195     {
3196     id => q<DDDE>,
3197     in => qq<\xDD\xDE>,
3198     out => [null, "\x{FF9D}", null, "\x{FF9E}", null],
3199     },
3200     {
3201     id => q<e040>,
3202     in => qq<\xE0\x40>,
3203     out => [null, "\x{6F3E}", null],
3204     },
3205     {
3206     id => q<eaa4>,
3207     in => qq<\xEA\xA4>,
3208     out => [null, "\x{7199}", null],
3209     },
3210     {
3211     id => q<eaa5>,
3212     in => qq<\xEA\xA5>,
3213     out => [null, "\xEA\xA5", [<Q::cs|unassigned-code-point-error>]],
3214     },
3215     {
3216     id => q<eb40>,
3217     in => qq<\xEB\x40>,
3218     out => [null, "\xEB\x40", [<Q::cs|unassigned-code-point-error>]],
3219     },
3220     {
3221     id => q<ed40>,
3222     in => qq<\xED\x40>,
3223     out => [null, "\xED\x40", [<Q::cs|unassigned-code-point-error>]],
3224     },
3225     {
3226     id => q<effc>,
3227     in => qq<\xEF\xFC>,
3228     out => [null, "\xEF\xFC", [<Q::cs|unassigned-code-point-error>]],
3229     },
3230     {
3231     id => q<f040>,
3232     in => qq<\xF0\x40>,
3233     out => [null, "\xF0", [<Q::cs|unassigned-code-point-error>],
3234     "\x40", null],
3235     },
3236     {
3237     id => q<f140>,
3238     in => qq<\xF1\x40>,
3239     out => [null, "\xF1", [<Q::cs|unassigned-code-point-error>],
3240     "\x40", null],
3241     },
3242     {
3243     id => q<fb40>,
3244     in => qq<\xFB\x40>,
3245     out => [null, "\xFB", [<Q::cs|unassigned-code-point-error>],
3246     "\x40", null],
3247     },
3248     {
3249     id => q<fc40>,
3250     in => qq<\xFc\x40>,
3251     out => [null, "\xFC", [<Q::cs|unassigned-code-point-error>],
3252     "\x40", null],
3253     },
3254     {
3255     id => q<fd40>,
3256     in => qq<\xFD\x40>,
3257     out => [null, "\xFD", [<Q::cs|unassigned-code-point-error>],
3258     "\x40", null],
3259     },
3260     {
3261     id => q<fE40>,
3262     in => qq<\xFE\x40>,
3263     out => [null, "\xFE", [<Q::cs|unassigned-code-point-error>],
3264     "\x40", null],
3265     },
3266     {
3267     id => q<ff40>,
3268     in => qq<\xFF\x40>,
3269     out => [null, "\xFF", [<Q::cs|unassigned-code-point-error>],
3270     "\x40", null],
3271     },
3272     {
3273     id => q<81408142>,
3274     in => qq<\x81\x40\x81\x42>,
3275     out => [null, "\x{3000}", null, "\x{3002}", null],
3276     },
3277     );
3278    
3279     for my $testdata (@testdata) {
3280     my $byte = $testdata->{in};
3281     my $error;
3282     my $i = 0;
3283    
3284     open my $fh, '<', \$byte;
3285     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
3286     (<Q::cs|XML.shift_jis>, $fh, sub {
3287     my (null, null, $etype, %opt) = @_;
3288     $error = [$etype, \%opt];
3289     });
3290    
3291     while (@{$testdata->{out}}) {
3292     if ($i != 0) {
3293     my $c = shift @{$testdata->{out}};
3294     $test->id ("$testdata->{id}.$i");
3295     $test->assert_equals ($efh->getc, $c);
3296     }
3297    
3298     my $v = shift @{$testdata->{out}};
3299     $test->id ("$testdata->{id}.$i.error");
3300     if (defined $v) {
3301     $test->assert_not_null ($error);
3302     $test->assert_equals ($error->[0], $v->[0]);
3303     } else {
3304     $test->assert_null ($error->[0]);
3305     }
3306     $error = null;
3307     $i++;
3308     }
3309    
3310     $test->id ("$testdata->{id}.eof");
3311     $test->assert_null ($efh->getc);
3312     $test->assert_null ($error);
3313     } # testdata
3314     ##MCShiftJISDecodeHandle
3315 wakaba 1.6
3316     ClsDef:
3317     @ClsQName: ManakaiMCISO2022JPDecodeHandle
3318    
3319     @ClsISA: ManakaiMCDecodeHandle
3320    
3321     @Method:
3322     @@Name: getc
3323     @@enDesc:
3324     Returns the next character from the input.
3325     @@Return:
3326     @@@Type: String
3327     @@@enDesc:
3328     The next character.
3329     @@@nullCase:
3330     @@@@enDesc:
3331     If at the end of the file, or if there was an error, in which
3332     case <Perl::$!> is set.
3333     @@@PerlDef:
3334     if (@{$self->{<H::mce|characterQueue>}}) {
3335     $r = shift @{$self->{<H::mce|characterQueue>}};
3336     } else {
3337     __DEEP{
3338     A: {
3339     my $error;
3340     if ($self->{<H::mce|continue>}) {
3341     if (read $self->{<H::mce|filehandle>},
3342     $self->{<H::mce|byteBuffer>}, 256,
3343     length $self->{<H::mce|byteBuffer>}) {
3344     #
3345     } else {
3346     $error = true;
3347     }
3348     $self->{<H::mce|continue>} = false;
3349     } elsif (512 > length $self->{<H::mce|byteBuffer>}) {
3350     read $self->{<H::mce|filehandle>},
3351     $self->{<H::mce|byteBuffer>}, 256,
3352     length $self->{<H::mce|byteBuffer>};
3353     }
3354    
3355     unless ($error) {
3356     if ($self->{<H::mce|byteBuffer>}
3357     =~ s/^\x1B(\x24[\x40\x42]|\x28[\x42\x4A])//) {
3358     $self->{<H::mce|state>} = {
3359     "\x24\x40" => <H::mce|State.2440>,
3360     "\x24\x42" => <H::mce|State.2442>,
3361     "\x28\x42" => <H::mce|State.2842>,
3362     "\x28\x4A" => <H::mce|State.284A>,
3363     }->{$1};
3364     redo A;
3365     } elsif ($self->{<H::mce|state>} eq <H::mce|State.2842>) { # IRV
3366     if ($self->{<H::mce|byteBuffer>}
3367     =~ s/^([\x00-\x0D\x10-\x1A\x1C-\x7F]+)//) {
3368     push @{$self->{<H::mce|characterQueue>}}, split //, $1;
3369     $r = shift @{$self->{<H::mce|characterQueue>}};
3370     } else {
3371     if (length $self->{<H::mce|byteBuffer>}) {
3372     $error = true;
3373     } else {
3374     $r = null;
3375     }
3376     }
3377     } elsif ($self->{<H::mce|state>} eq <H::mce|State.284A>) { # 0201
3378     if ($self->{<H::mce|byteBuffer>}
3379     =~ s/^([\x00-\x0D\x10-\x1A\x1C-\x7F]+)//) {
3380     my $v = $1; $v =~ tr/\x5C\x7E/\xA5\x{203E}/;
3381     push @{$self->{<H::mce|characterQueue>}}, split //, $v;
3382     $r = shift @{$self->{<H::mce|characterQueue>}};
3383     } else {
3384     if (length $self->{<H::mce|byteBuffer>}) {
3385     $error = true;
3386     } else {
3387     $r = null;
3388     $self->{<H::mce|onerror>}
3389     ->($self->{<H::mce|onerror>}, $self,
3390     <Q::cs|invalid-state-error>,
3391     state => $self->{<H::mce|state>});
3392     }
3393     }
3394     } elsif ($self->{<H::mce|state>} eq <H::mce|State.2442>) { # 1983
3395     my $v = Encode::decode ($self->{<H::mce|State.2442>},
3396     $self->{<H::mce|byteBuffer>},
3397     Encode::FB_QUIET ());
3398     if (length $v) {
3399     push @{$self->{<H::mce|characterQueue>}}, split //, $v;
3400     $r = shift @{$self->{<H::mce|characterQueue>}};
3401     } else {
3402     if (length $self->{<H::mce|byteBuffer>}) {
3403     $error = true;
3404     } else {
3405     $r = null;
3406     $self->{<H::mce|onerror>}
3407     ->($self->{<H::mce|onerror>}, $self,
3408     <Q::cs|invalid-state-error>,
3409     state => $self->{<H::mce|state>});
3410     }
3411     }
3412     } elsif ($self->{<H::mce|state>} eq <H::mce|State.2440>) { # 1978
3413     my $v = Encode::decode ($self->{<H::mce|State.2440>},
3414     $self->{<H::mce|byteBuffer>},
3415     Encode::FB_QUIET ());
3416     if (length $v) {
3417     push @{$self->{<H::mce|characterQueue>}}, split //, $v;
3418     $r = shift @{$self->{<H::mce|characterQueue>}};
3419     } else {
3420     if (length $self->{<H::mce|byteBuffer>}) {
3421     $error = true;
3422     } else {
3423     $r = null;
3424     $self->{<H::mce|onerror>}
3425     ->($self->{<H::mce|onerror>}, $self,
3426     <Q::cs|invalid-state-error>,
3427     state => $self->{<H::mce|state>});
3428     }
3429     }
3430     } else {
3431     $error = true;
3432     }
3433     }
3434     if ($error) {
3435     $r = substr $self->{<H::mce|byteBuffer>}, 0, 1, '';
3436     my $etype = <Q::cs|illegal-octets-error>;
3437     if (($self->{<H::mce|state>} eq <H::mce|State.2442> or
3438     $self->{<H::mce|state>} eq <H::mce|State.2440>) and
3439     $r =~ /^[\x21-\x7E]/ and
3440     $self->{<H::mce|byteBuffer>} =~ s/^([\x21-\x7E])//) {
3441     $r .= $1;
3442     $etype = <Q::cs|unassigned-code-point-error>;
3443     } elsif ($r eq "\x1B" and
3444     $self->{<H::mce|byteBuffer>} =~ s/^\(H//) { # Old 0201
3445     $r .= "(H";
3446     $self->{<H::mce|state>} = <H::mce|State.284A>;
3447     }
3448     $self->{<H::mce|onerror>}
3449     ->($self->{<H::mce|onerror>}, $self, $etype,
3450     octets => \$r);
3451     }
3452     } # A
3453     }__;
3454     }
3455    
3456     @Test:
3457     @@QName: MCISO2022JPDecodeHandle.1.test
3458     @@PerlDef:
3459     my $impl;
3460     __CODE{createImplForTest:: $impl => $impl}__;
3461    
3462     my @testdata = (
3463     {
3464     id => q<l=0>,
3465     in => q<>,
3466     out1 => [null],
3467     out2 => [null],
3468     },
3469     {
3470     id => q<l=1.00>,
3471     in => qq<\x00>,
3472     out1 => [null, "\x00", null],
3473     out2 => [null, "\x00", null],
3474     },
3475     {
3476     id => q<l=1.0d>,
3477     in => qq<\x0D>,
3478     out1 => [null, "\x0D", null],
3479     out2 => [null, "\x0D", null],
3480     }, # Error?
3481     {
3482     id => q<0A>,
3483     in => qq<\x0A>,
3484     out1 => [null, "\x0A", null],
3485     out2 => [null, "\x0A", null],
3486     }, # Error?
3487     {
3488     id => q<l=1.0e>,
3489     in => qq<\x0E>,
3490     out1 => [null, "\x0E", [<Q::cs|illegal-octets-error>]],
3491     out2 => [null, "\x0E", [<Q::cs|illegal-octets-error>]],
3492     },
3493     {
3494     id => q<l=1.0f>,
3495     in => qq<\x0F>,
3496     out1 => [null, "\x0F", [<Q::cs|illegal-octets-error>]],
3497     out2 => [null, "\x0F", [<Q::cs|illegal-octets-error>]],
3498     },
3499     {
3500     id => q<l=1.1b>,
3501     in => qq<\x1B>,
3502     out1 => [null, "\x1B", [<Q::cs|illegal-octets-error>]],
3503     out2 => [null, "\x1B", [<Q::cs|illegal-octets-error>]],
3504     },
3505     {
3506     id => q<l=1.a>,
3507     in => q<a>,
3508     out1 => [null, "a", null],
3509     out2 => [null, "a", null],
3510     },
3511     {
3512     id => q<l=1.20>,
3513     in => qq<\x20>,
3514     out1 => [null, "\x20", null],
3515     out2 => [null, "\x20", null],
3516     },
3517     {
3518     id => q<l=1.5C>,
3519     in => qq<\x5C>,
3520     out1 => [null, "\x5C", null],
3521     out2 => [null, "\x5C", null],
3522     },
3523     {
3524     id => q<l=1.7E>,
3525     in => qq<\x7E>,
3526     out1 => [null, "\x7E", null],
3527     out2 => [null, "\x7E", null],
3528     },
3529     {
3530     id => q<l=1.7F>,
3531     in => qq<\x7F>,
3532     out1 => [null, "\x7F", null],
3533     out2 => [null, "\x7F", null],
3534     },
3535     {
3536     id => q<l=1.80>,
3537     in => qq<\x80>,
3538     out1 => [null, "\x80", [<Q::cs|illegal-octets-error>]],
3539     out2 => [null, "\x80", [<Q::cs|illegal-octets-error>]],
3540     },
3541     {
3542     id => q<l=1.8c>,
3543     in => qq<\x8C>,
3544     out1 => [null, "\x8C", [<Q::cs|illegal-octets-error>]],
3545     out2 => [null, "\x8C", [<Q::cs|illegal-octets-error>]],
3546     },
3547     {
3548     id => q<l=1.8e>,
3549     in => qq<\x8E>,
3550     out1 => [null, "\x8E", [<Q::cs|illegal-octets-error>]],
3551     out2 => [null, "\x8E", [<Q::cs|illegal-octets-error>]],
3552     },
3553     {
3554     id => q<l=1.8f>,
3555     in => qq<\x8F>,
3556     out1 => [null, "\x8F", [<Q::cs|illegal-octets-error>]],
3557     out2 => [null, "\x8F", [<Q::cs|illegal-octets-error>]],
3558     },
3559     {
3560     id => q<l=1.a0>,
3561     in => qq<\xA0>,
3562     out1 => [null, "\xA0", [<Q::cs|illegal-octets-error>]],
3563     out2 => [null, "\xA0", [<Q::cs|illegal-octets-error>]],
3564     },
3565     {
3566     id => q<l=1.a1>,
3567     in => qq<\xA1>,
3568     out1 => [null, "\xA1", [<Q::cs|illegal-octets-error>]],
3569     out2 => [null, "\xA1", [<Q::cs|illegal-octets-error>]],
3570     },
3571     {
3572     id => q<l=1.a2>,
3573     in => qq<\xA2>,
3574     out1 => [null, "\xA2", [<Q::cs|illegal-octets-error>]],
3575     out2 => [null, "\xA2", [<Q::cs|illegal-octets-error>]],
3576     },
3577     {
3578     id => q<l=1.df>,
3579     in => qq<\xdf>,
3580     out1 => [null, "\xDF", [<Q::cs|illegal-octets-error>]],
3581     out2 => [null, "\xDF", [<Q::cs|illegal-octets-error>]],
3582     },
3583     {
3584     id => q<l=1.e0>,
3585     in => qq<\xe0>,
3586     out1 => [null, "\xE0", [<Q::cs|illegal-octets-error>]],
3587     out2 => [null, "\xE0", [<Q::cs|illegal-octets-error>]],
3588     },
3589     {
3590     id => q<l=1.ef>,
3591     in => qq<\xEF>,
3592     out1 => [null, "\xEF", [<Q::cs|illegal-octets-error>]],
3593     out2 => [null, "\xEF", [<Q::cs|illegal-octets-error>]],
3594     },
3595     {
3596     id => q<F0>,
3597     in => qq<\xF0>,
3598     out1 => [null, "\xF0", [<Q::cs|illegal-octets-error>]],
3599     out2 => [null, "\xF0", [<Q::cs|illegal-octets-error>]],
3600     },
3601     {
3602     id => q<l=1.fc>,
3603     in => qq<\xFC>,
3604     out1 => [null, "\xFC", [<Q::cs|illegal-octets-error>]],
3605     out2 => [null, "\xFC", [<Q::cs|illegal-octets-error>]],
3606     },
3607     {
3608     id => q<l=1.fd>,
3609     in => qq<\xFD>,
3610     out1 => [null, "\xFD", [<Q::cs|illegal-octets-error>]],
3611     out2 => [null, "\xFD", [<Q::cs|illegal-octets-error>]],
3612     },
3613     {
3614     id => q<l=1.fe>,
3615     in => qq<\xFE>,
3616     out1 => [null, "\xFE", [<Q::cs|illegal-octets-error>]],
3617     out2 => [null, "\xFE", [<Q::cs|illegal-octets-error>]],
3618     },
3619     {
3620     id => q<l=1.ff>,
3621     in => qq<\xFF>,
3622     out1 => [null, "\xFF", [<Q::cs|illegal-octets-error>]],
3623     out2 => [null, "\xFF", [<Q::cs|illegal-octets-error>]],
3624     },
3625     {
3626     id => q<l=2.0000>,
3627     in => qq<\x00\x00>,
3628     out1 => [null, "\x00", null, "\x00", null],
3629     out2 => [null, "\x00", null, "\x00", null],
3630     },
3631     {
3632     id => q<l=2.0D0A>,
3633     in => qq<\x0D\x0A>,
3634     out1 => [null, "\x0D", null, "\x0A", null],
3635     out2 => [null, "\x0D", null, "\x0A", null],
3636     },
3637     {
3638     id => q<l=2.1B1B>,
3639     in => qq<\x1B\x1B>,
3640     out1 => [null, "\x1B", [<Q::cs|illegal-octets-error>],
3641     "\x1B", [<Q::cs|illegal-octets-error>]],
3642     out2 => [null, "\x1B", [<Q::cs|illegal-octets-error>],
3643     "\x1B", [<Q::cs|illegal-octets-error>]],
3644     },
3645     {
3646     id => q<l=2.1B20>,
3647     in => qq<\x1B\x20>,
3648     out1 => [null, "\x1B", [<Q::cs|illegal-octets-error>], "\x20", null],
3649     out2 => [null, "\x1B", [<Q::cs|illegal-octets-error>], "\x20", null],
3650     },
3651     {
3652     id => q<l=2.1B24>,
3653     in => qq<\x1B\x24>,
3654     out1 => [null, "\x1B", [<Q::cs|illegal-octets-error>], "\x24", null],
3655     out2 => [null, "\x1B", [<Q::cs|illegal-octets-error>], "\x24", null],
3656     },
3657     {
3658     id => q<l=2.1B28>,
3659     in => qq<\x1B\x28>,
3660     out1 => [null, "\x1B", [<Q::cs|illegal-octets-error>], "\x28", null],
3661     out2 => [null, "\x1B", [<Q::cs|illegal-octets-error>], "\x28", null],
3662     },
3663     {
3664     id => q<l=2.2020>,
3665     in => qq<\x20\x20>,
3666     out1 => [null, "\x20", null, "\x20", null],
3667     out2 => [null, "\x20", null, "\x20", null],
3668     },
3669     {
3670     id => q<l=2.ab>,
3671     in => qq<ab>,
3672     out1 => [null, "a", null, "b", null],
3673     out2 => [null, "a", null, "b", null],
3674     },
3675     {
3676     id => q<8040>,
3677     in => qq<\x80\x40>,
3678     out1 => [null, "\x80", [<Q::cs|illegal-octets-error>],
3679     "\x40", null],
3680     out2 => [null, "\x80", [<Q::cs|illegal-octets-error>],
3681     "\x40", null],
3682     },
3683     {
3684     id => q<1B2440>,
3685     in => qq<\x1B\x24\x40>,
3686     out1 => [null],
3687     out2 => [null],
3688     eof_error => [<Q::cs|invalid-state-error>],
3689     },
3690     {
3691     id => q<1B2442>,
3692     in => qq<\x1B\x24\x42>,
3693     out1 => [null],
3694     out2 => [null],
3695     eof_error => [<Q::cs|invalid-state-error>],
3696     },
3697     {
3698     id => q<1B2840>,
3699     in => qq<\x1B\x28\x40>,
3700     out1 => [null, "\x1B", [<Q::cs|illegal-octets-error>], "(", null,
3701     "\x40", null],
3702     out2 => [null, "\x1B", [<Q::cs|illegal-octets-error>], "(", null,
3703     "\x40", null],
3704     },
3705     {
3706     id => q<1B2842>,
3707     in => qq<\x1B\x28\x42>,
3708     out1 => [null],
3709     out2 => [null],
3710     },
3711     {
3712     id => q<1B284A>,
3713     in => qq<\x1B\x28\x4A>,
3714     out1 => [null],
3715     out2 => [null],
3716     eof_error => [<Q::cs|invalid-state-error>],
3717     },
3718     {
3719     id => q<1B$B1B(B>,
3720     in => qq<\x1B\x24\x42\x1B\x28\x42>,
3721     out1 => [null],
3722     out2 => [null],
3723     },
3724     {
3725     id => q<1B(B1B(B>,
3726     in => qq<\x1B\x28\x42\x1B\x28\x42>,
3727     out1 => [null],
3728     out2 => [null],
3729     },
3730     {
3731     id => q<1B(Ba1B(B>,
3732     in => qq<\x1B\x28\x42a\x1B\x28\x42>,
3733     out1 => [null, "a", null],
3734     out2 => [null, "a", null],
3735     },
3736     {
3737     id => q<1B(Ba1B(B1B(B>,
3738     in => qq<\x1B\x28\x42a\x1B\x28\x42\x1B\x28\x42>,
3739     out1 => [null, "a", null],
3740     out2 => [null, "a", null],
3741     },
3742     {
3743     id => q<1B$42!!1B2842>,
3744     in => qq<\x1B\x24\x42!!\x1B\x28\x42>,
3745     out1 => [null, "\x{3000}", null],
3746     out2 => [null, "\x{3000}", null],
3747     },
3748     {
3749     id => q<1B$4221211B284A>,
3750     in => qq<\x1B\x24\x42!!\x1B\x28\x4A>,
3751     out1 => [null, "\x{3000}", null],
3752     out2 => [null, "\x{3000}", null],
3753     eof_error => [<Q::cs|invalid-state-error>],
3754     },
3755     {
3756     id => q<1B$4021211B2842>,
3757     in => qq<\x1B\x24\x40!!\x1B\x28\x42>,
3758     out1 => [null, "\x{3000}", null],
3759     out2 => [null, "\x{3000}", null],
3760     },
3761     {
3762     id => q<1B$402121211B2842>,
3763     in => qq<\x1B\x24\x40!!!\x1B\x28\x42>,
3764     out1 => [null, "\x{3000}", null, "!", [<Q::cs|illegal-octets-error>]],
3765     out2 => [null, "\x{3000}", null, "!", [<Q::cs|illegal-octets-error>]],
3766     },
3767     {
3768     id => q<1B$4021211B2442!!1B2842>,
3769     in => qq<\x1B\x24\x40!!\x1B\x24\x42!!\x1B\x28\x42>,
3770     out1 => [null, "\x{3000}", null, "\x{3000}", null],
3771     out2 => [null, "\x{3000}", null, "\x{3000}", null],
3772     },
3773     {
3774     id => q<1B$4021211B2440!!1B2842>,
3775     in => qq<\x1B\x24\x40!!\x1B\x24\x40!!\x1B\x28\x42>,
3776     out1 => [null, "\x{3000}", null, "\x{3000}", null],
3777     out2 => [null, "\x{3000}", null, "\x{3000}", null],
3778     },
3779     {
3780     id => q<1B$@!"1B(B\~|>,
3781     in => qq<\x1B\x24\x40!"\x1B(B\\~|>,
3782     out1 => [null, "\x{3001}", null, "\x5C", null,
3783     "\x7E", null, "|", null],
3784     out2 => [null, "\x{3001}", null, "\x5C", null,
3785     "\x7E", null, "|", null],
3786     },
3787     {
3788     id => q<1B$B!"1B(J\~|1B(B>,
3789     in => qq<\x1B\x24\x42!"\x1B(J\\~|\x1B(B>,
3790     out1 => [null, "\x{3001}", null, "\xA5", null,
3791     "\x{203E}", null, "|", null],
3792     out2 => [null, "\x{3001}", null, "\xA5", null,
3793     "\x{203E}", null, "|", null],
3794     },
3795     {
3796     id => q<78compat.3022(16-02)>,
3797     in => qq<\x1B\$\@\x30\x22\x1B\$B\x30\x22\x1B(B>,
3798     out1 => [null, "\x{555E}", null, "\x{5516}", null],
3799     out2 => [null, "\x{5516}", null, "\x{5516}", null],
3800     },
3801     {
3802     id => q<unassigned.2239>,
3803     in => qq<\x1B\$\@\x22\x39\x1B\$B\x22\x39\x1B(B>,
3804     out1 => [null, "\x22\x39", [<Q::cs|unassigned-code-point-error>],
3805     "\x22\x39", [<Q::cs|unassigned-code-point-error>]],
3806     out2 => [null, "\x22\x39", [<Q::cs|unassigned-code-point-error>],
3807     "\x22\x39", [<Q::cs|unassigned-code-point-error>]],
3808     },
3809     {
3810     id => q<83add.223A>,
3811     in => qq<\x1B\$\@\x22\x3A\x1B\$B\x22\x3A\x1B(B>,
3812     out1 => [null, "\x22\x3A", [<Q::cs|unassigned-code-point-error>],
3813     "\x{2208}", null],
3814     out2 => [null, "\x{2208}", null, "\x{2208}", null],
3815     },
3816     {
3817     id => q<83add.2840>,
3818     in => qq<\x1B\$\@\x28\x40\x1B\$B\x28\x40\x1B(B>,
3819     out1 => [null, "\x28\x40", [<Q::cs|unassigned-code-point-error>],
3820     "\x{2542}", null],
3821     out2 => [null, "\x{2542}", null, "\x{2542}", null],
3822     },
3823     {
3824     id => q<83add.7421>,
3825     in => qq<\x1B\$\@\x74\x21\x1B\$B\x74\x21\x1B(B>,
3826     out1 => [null, "\x74\x21", [<Q::cs|unassigned-code-point-error>],
3827     "\x{582F}", null],
3828     out2 => [null, "\x{5C2D}", null, "\x{582F}", null],
3829     },
3830     {
3831     id => q<83swap.3033>,
3832     in => qq<\x1B\$\@\x30\x33\x1B\$B\x30\x33\x1B(B>,
3833     out1 => [null, "\x{9C3A}", null, "\x{9BF5}", null],
3834     out2 => [null, "\x{9C3A}", null, "\x{9BF5}", null],
3835     },
3836     {
3837     id => q<83swap.724D>,
3838     in => qq<\x1B\$\@\x72\x4D\x1B\$B\x72\x4D\x1B(B>,
3839     out1 => [null, "\x{9BF5}", null, "\x{9C3A}", null],
3840     out2 => [null, "\x{9BF5}", null, "\x{9C3A}", null],
3841     },
3842     {
3843     id => q<90add.7425>,
3844     in => qq<\x1B\$\@\x74\x25\x1B\$B\x74\x25\x1B(B>,
3845     out1 => [null, "\x74\x25", [<Q::cs|unassigned-code-point-error>],
3846     "\x74\x25", [<Q::cs|unassigned-code-point-error>]],
3847     out2 => [null, "\x{51DC}", null, "\x{51DC}", null],
3848     },
3849     {
3850     id => q<90add.7426>,
3851     in => qq<\x1B\$\@\x74\x26\x1B\$B\x74\x26\x1B(B>,
3852     out1 => [null, "\x74\x26", [<Q::cs|unassigned-code-point-error>],
3853     "\x74\x26", [<Q::cs|unassigned-code-point-error>]],
3854     out2 => [null, "\x{7199}", null, "\x{7199}", null],
3855     },
3856     );
3857    
3858     for my $testdata (@testdata) {
3859     for my $c (1..2) {
3860     my $byte = $testdata->{in};
3861     my $error;
3862     my $i = 0;
3863    
3864     open my $fh, '<', \$byte;
3865     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
3866     ([null, <Q::icharset|iso-2022-jp>,
3867     <Q::cs|XML.iso-2022-jp>]->[$c], $fh, sub {
3868     my (null, null, $etype, %opt) = @_;
3869     $error = [$etype, \%opt];
3870     });
3871    
3872     while (@{$testdata->{"out$c"}}) {
3873     if ($i != 0) {
3874     my $c = shift @{$testdata->{"out$c"}};
3875     $test->id ("$testdata->{id}.$i.$c");
3876     $test->assert_equals ($efh->getc, $c);
3877     }
3878    
3879     my $v = shift @{$testdata->{"out$c"}};
3880     $test->id ("$testdata->{id}.$i.error.$c");
3881     if (defined $v) {
3882     $test->assert_not_null ($error);
3883     $test->assert_equals ($error->[0], $v->[0]);
3884     } else {
3885     $test->assert_null ($error->[0]);
3886     }
3887     $error = null;
3888     $i++;
3889     }
3890    
3891     $test->id ("$testdata->{id}.eof.$c");
3892     $test->assert_null ($efh->getc);
3893    
3894     my $v = $testdata->{eof_error};
3895     $test->id ("$testdata->{id}.$i.error.$c");
3896     if (defined $v) {
3897     $test->assert_not_null ($error);
3898     $test->assert_equals ($error->[0], $v->[0]);
3899     } else {
3900     $test->assert_null ($error->[0]);
3901     }
3902     $error = null;
3903     }} # testdata
3904     ##MCISO2022JPDecodeHandle
3905    
3906     PropDef:
3907     @QName: mce|state
3908     @mce:key: s
3909     @enDesc:
3910     ISO-2022-JP state.
3911    
3912     ResourceDef:
3913     @QName: mce|State.2440
3914     @mce:key: s1
3915     @enDesc:
3916     ISO-2022-JP state: ESC 2/4 4/0.
3917     @DISCore:resourceType: DISCore|Resource
3918     @DISCore:resourceType: DISCore|Property
3919    
3920     ResourceDef:
3921     @QName: mce|State.2442
3922     @mce:key: s2
3923     @enDesc:
3924     ISO-2022-JP state: ESC 2/4 4/2.
3925     @DISCore:resourceType: DISCore|Resource
3926     @DISCore:resourceType: DISCore|Property
3927    
3928     ResourceDef:
3929     @QName: mce|State.2842
3930     @mce:key: s3
3931     @enDesc:
3932     ISO-2022-JP state: ESC 2/8 4/2.
3933     @DISCore:resourceType: DISCore|Resource
3934    
3935     ResourceDef:
3936     @QName: mce|State.284A
3937     @mce:key: s4
3938     @enDesc:
3939     ISO-2022-JP state: ESC 2/8 4/10.
3940     @DISCore:resourceType: DISCore|Resource
3941 wakaba 1.2
3942     ResourceDef:
3943     @QName: String
3944 wakaba 1.10 @AliasFor: str|DOMString
3945 wakaba 1.2
3946     PropDef:
3947 wakaba 1.3 @QName: mce|onerror
3948 wakaba 1.2 @mce:key: onerr
3949 wakaba 1.3
3950     PropDef:
3951     @QName: mce|inputEncoding
3952     @mce:key: ie
3953    
3954     PropDef:
3955     @QName: mce|hasBOM
3956     @mce:key: bom
3957 wakaba 1.2
3958     PropDef:
3959     @QName: mce|continue
3960     @mce:key: cc
3961     @enDesc:
3962     Whether the <Q::mce|byteBuffer> contains octets that might
3963     be part of characters.
3964    
3965     PropDef:
3966     @QName: mce|charset
3967     @mce:key: cs
3968    
3969     PropDef:
3970     @QName: mce|perlEncodingName
3971     @mce:key: enc
3972    
3973     PropDef:
3974     @QName: mce|filehandle
3975     @mce:key: fh
3976    
3977     RPropDef:
3978     @QName: mce|key
3979     @subsetOf: DISPerl|propHashKey
3980    
3981     PropDef:
3982     @QName: mce|characterQueue
3983     @mce:key: cq
3984    
3985     PropDef:
3986     @QName: mce|byteBuffer
3987     @mce:key: bb
3988    
3989     ElementTypeBinding:
3990     @Name: PropDef
3991     @ElementType:
3992     dis:ResourceDef
3993     @ShadowContent:
3994     @@DISCore:resourceType: DISCore|Property
3995     @@ForCheck: =ManakaiDOM|all
3996    
3997     ElementTypeBinding:
3998     @Name: RPropDef
3999     @ElementType:
4000     dis:ResourceDef
4001     @ShadowContent:
4002     @@DISCore:resourceType: DISSource|ResourceProperty
4003     @@ForCheck: =ManakaiDOM|all
4004    
4005    
4006     ElementTypeBinding:
4007     @Name: nullCase
4008     @ElementType:
4009     dis:ResourceDef
4010     @ShadowContent:
4011     @@DISCore:resourceType: ManakaiDOM|InCase
4012     @@Value:
4013     @@@is-null:1
4014     @@@ContentType: DISCore|String
4015    
4016     ElementTypeBinding:
4017     @Name: TrueCase
4018     @ElementType:
4019     dis:ResourceDef
4020     @ShadowContent:
4021     @@DISCore:resourceType: ManakaiDOM|InCase
4022     @@Value:
4023     @@@@: 1
4024     @@@ContentType: DISCore|Boolean
4025     @@Type: idl|boolean||ManakaiDOM|all
4026    
4027     ElementTypeBinding:
4028     @Name: FalseCase
4029     @ElementType:
4030     dis:ResourceDef
4031     @ShadowContent:
4032     @@DISCore:resourceType: ManakaiDOM|InCase
4033     @@Value:
4034     @@@@: 0
4035     @@@ContentType: DISCore|Boolean
4036     @@Type: idl|boolean||ManakaiDOM|all
4037    
4038     ResourceDef:
4039     @DISCore:resourceType: cs|CharsetSet
4040     @cs:moduleRef: DISlib|Charset

admin@suikawiki.org
ViewVC Help
Powered by ViewVC 1.1.24