/[suikacvs]/messaging/manakai/lib/Message/Charset/Encode.dis
Suika

Contents of /messaging/manakai/lib/Message/Charset/Encode.dis

Parent Directory Parent Directory | Revision Log Revision Log


Revision 1.15 - (hide annotations) (download)
Sun Jul 29 08:04:40 2007 UTC (17 years, 8 months ago) by wakaba
Branch: MAIN
CVS Tags: HEAD
Changes since 1.14: +1 -1 lines
FILE REMOVED
++ ChangeLog	29 Jul 2007 08:02:27 -0000
2007-07-29  Wakaba  <wakaba@suika.fam.cx>

	* Makefile.PL: Updated.

++ manakai/lib/ChangeLog	29 Jul 2007 08:03:11 -0000
	* Makefile: No longer recurse into |manakai| and the
	AutoLoad module.

2007-07-29  Wakaba  <wakaba@suika.fam.cx>

++ manakai/lib/Message/ChangeLog	29 Jul 2007 08:03:25 -0000
2007-07-29  Wakaba  <wakaba@suika.fam.cx>

	* Makefile (all): Special rule for |Util| is removed.

	* Charset/: Removed.

++ manakai/lib/Message/DOM/ChangeLog	29 Jul 2007 07:51:50 -0000
	* Text.pm: |Message::DOM::Traversal, an obsolete module,
	was referenced.

2007-07-29  Wakaba  <wakaba@suika.fam.cx>

++ manakai/lib/Message/Util/ChangeLog	29 Jul 2007 08:03:51 -0000
	* Makefile: Rules for DIS are removed.

2007-07-29  Wakaba  <wakaba@suika.fam.cx>

++ manakai/t/ChangeLog	29 Jul 2007 08:04:33 -0000
2007-07-29  Wakaba  <wakaba@suika.fam.cx>

	* Makefile: Rules for DIS tests are removed.

1 wakaba 1.1 Module:
2     @QName: MCharset|Encode
3     @FullName:
4     @@lang: en
5     @@@: Manakai Charset Encode Module
6     @Namespace:
7     http://suika.fam.cx/~wakaba/archive/2005/manakai/Charset/Encode/
8    
9     @enDesc:
10     The <DFN::manakai Charset Encode Module> is ...
11    
12     {TODO::
13     This module is subject to change.
14     }
15    
16     @DISCore:author: DISCore|Wakaba
17     @License: license|Perl+MPL
18     @Date:
19 wakaba 1.15 $Date: 2006/12/30 13:25:32 $
20 wakaba 1.1
21     @Require:
22     @@Module:
23     @@@QName: MDOM|DOMFeature
24     @@Module:
25 wakaba 1.2 @@@QName: DISlib|Charset
26 wakaba 1.8 @@Module:
27     @@@QName: MDOM|DOMCore
28 wakaba 1.10 @@Module:
29     @@@QName: MDOM|DOMString
30 wakaba 1.1
31     Namespace:
32 wakaba 1.8 @c:
33     http://suika.fam.cx/~wakaba/archive/2004/8/18/dom-core#
34 wakaba 1.2 @cs:
35     http://suika.fam.cx/~wakaba/archive/2004/dis/Charset/
36 wakaba 1.1 @dis:
37     http://suika.fam.cx/~wakaba/archive/2004/8/18/lang#dis--
38     @DISlib:
39     http://suika.fam.cx/~wakaba/archive/2004/dis/
40 wakaba 1.8 @domperl:
41     http://suika.fam.cx/~wakaba/archive/2006/dom/perl/
42 wakaba 1.1 @dx:
43     http://suika.fam.cx/~wakaba/archive/2005/manakai/Util/Error/DOMException#
44     @f:
45     http://suika.fam.cx/~wakaba/archive/2004/dom/feature#
46     @fe:
47     http://suika.fam.cx/www/2006/feature/
48 wakaba 1.2 @icharset:
49     urn:x-suika-fam-cx:charset:
50 wakaba 1.1 @idl:
51     http://suika.fam.cx/~wakaba/archive/2004/dis/IDL#
52     @kwd:
53     http://suika.fam.cx/~wakaba/archive/2005/rfc2119/
54     @lang:
55     http://suika.fam.cx/~wakaba/archive/2004/8/18/lang#
56     @license:
57     http://suika.fam.cx/~wakaba/archive/2004/8/18/license#
58     @ManakaiDOM:
59     http://suika.fam.cx/~wakaba/archive/2004/8/18/manakai-dom#
60     @MCharset:
61     http://suika.fam.cx/~wakaba/archive/2005/manakai/Charset/
62 wakaba 1.2 @mce:
63     http://suika.fam.cx/~wakaba/archive/2005/manakai/Charset/Encode/
64 wakaba 1.1 @MDOM:
65     http://suika.fam.cx/~wakaba/archive/2004/8/18/manakai-dom#ManakaiDOM.
66     @mn:
67     http://suika.fam.cx/~wakaba/archive/2005/manakai/Util/ManakaiNode#
68     @rdf:
69     http://www.w3.org/1999/02/22-rdf-syntax-ns#
70     @rdfs:
71     http://www.w3.org/2000/01/rdf-schema#
72 wakaba 1.10 @str:
73     http://suika.fam.cx/~wakaba/archive/2005/manakai/DOM/DOMString/
74 wakaba 1.1 @test:
75     http://suika.fam.cx/~wakaba/archive/2004/dis/Test#
76 wakaba 1.2 @xml-auto-charset:
77     http://suika.fam.cx/www/2006/03/xml-entity/
78 wakaba 1.1
79     ResourceDef:
80     @QName: MCharset|
81     @rdf:type: dis|ModuleGroup
82     @FullName:
83     @@lang:en
84     @@@:
85     The manakai Charset modules
86     @DISPerl:packageName:
87     Message::Charset::
88     @DISPerl:interfacePackageName:
89 wakaba 1.12 Message::Charset::IF::
90 wakaba 1.1
91     ## -- Features
92    
93 wakaba 1.2 FeatureDef:
94     @QName: MCEncodeFeature
95     @featureQName: fe|MCEncode
96     @FeatureVerDef:
97     @@QName: MCEncodeFeature10
98     @@f:instanceOf: MCEncodeFeature
99 wakaba 1.14 @@f:version: 1.0
100 wakaba 1.2 @@enDesc:
101     The manakai Charset Encode Module, version 1.0
102    
103 wakaba 1.1 ElementTypeBinding:
104     @Name: FeatureDef
105     @ElementType:
106     dis:ResourceDef
107     @ShadowContent:
108     @@rdf:type: f|Feature
109    
110     ElementTypeBinding:
111     @Name: FeatureVerDef
112     @ElementType:
113     dis:ResourceDef
114     @ShadowContent:
115     @@rdf:type: f|Feature
116    
117     ElementTypeBinding:
118     @Name: featureQName
119     @ElementType:
120     f:name
121     @ShadowContent:
122     @@ContentType: DISCore|QName
123    
124     ElementTypeBinding:
125     @Name: IFQName
126     @ElementType:
127     dis:QName
128     @ShadowContent:
129     @@ForCheck: ManakaiDOM|ForIF
130    
131     ElementTypeBinding:
132     @Name: ClsQName
133     @ElementType:
134     dis:QName
135     @ShadowContent:
136     @@ForCheck: ManakaiDOM|ForClass
137    
138     ElementTypeBinding:
139     @Name: IFISA
140     @ElementType:
141     dis:ISA
142     @ShadowContent:
143     @@ForCheck: ManakaiDOM|ForIF
144    
145     ElementTypeBinding:
146     @Name: ClsISA
147     @ElementType:
148     dis:ISA
149     @ShadowContent:
150     @@ForCheck: ManakaiDOM|ForClass
151    
152     ElementTypeBinding:
153     @Name: IFClsDef
154     @ElementType:
155     dis:ResourceDef
156     @ShadowContent:
157     @@rdf:type:
158     @@@@: dis|MultipleResource
159     @@@ForCheck: !ManakaiDOM|ForIF !ManakaiDOM|ForClass
160     @@resourceFor: ManakaiDOM|ForIF
161     @@resourceFor:
162     @@@@: ManakaiDOM|ForClass
163    
164     @@rdf:type:
165     @@@@: DISLang|Interface
166     @@@ForCheck: ManakaiDOM|ForIF
167    
168     @@rdf:type:
169     @@@@: DISLang|Class
170     @@@ForCheck: ManakaiDOM|ForClass
171 wakaba 1.11
172 wakaba 1.1 @@Implement:
173 wakaba 1.12 @@@@: ||+||ManakaiDOM|ForIF
174 wakaba 1.1 @@@ContentType: DISCore|TFPQNames
175 wakaba 1.12 @@@ForCheck: ManakaiDOM|ForClass
176 wakaba 1.1
177     @@f:implements:
178 wakaba 1.2 @@@@: MCEncodeFeature10
179    
180     ElementTypeBinding:
181     @Name: ClsDef
182     @ElementType:
183     dis:ResourceDef
184     @ShadowContent:
185     @@rdf:type:
186     @@@@: dis|MultipleResource
187     @@@ForCheck: !ManakaiDOM|ForIF !ManakaiDOM|ForClass
188     @@resourceFor:
189     @@@@: ManakaiDOM|ForClass
190    
191     @@rdf:type:
192     @@@@: DISLang|Class
193     @@@ForCheck: ManakaiDOM|ForClass
194    
195     @@f:implements:
196     @@@@: MCEncodeFeature10
197 wakaba 1.1
198     ElementTypeBinding:
199     @Name: Method
200     @ElementType:
201     dis:ResourceDef
202     @ShadowContent:
203     @@rdf:type: DISLang|Method
204    
205     ElementTypeBinding:
206     @Name: IntMethod
207     @ElementType:
208     dis:ResourceDef
209     @ShadowContent:
210     @@rdf:type: DISLang|Method
211 wakaba 1.11 @@ForCheck: ManakaiDOM|ForClass
212 wakaba 1.1 @@ManakaiDOM:isForInternal: 1
213    
214     ElementTypeBinding:
215     @Name: Param
216     @ElementType:
217     dis:ResourceDef
218     @ShadowContent:
219     @@DISCore:resourceType: DISLang|MethodParameter
220    
221     ElementTypeBinding:
222     @Name: NamedParam
223     @ElementType:
224     dis:ResourceDef
225     @ShadowContent:
226     @@DISCore:resourceType: DISLang|MethodParameter
227     @@DISPerl:isNamedParameter: 1
228    
229     ElementTypeBinding:
230     @Name: Return
231     @ElementType:
232     dis:ResourceDef
233     @ShadowContent:
234     @@DISCore:resourceType: DISLang|MethodReturn
235    
236     ElementTypeBinding:
237     @Name: Attr
238     @ElementType:
239     dis:ResourceDef
240     @ShadowContent:
241     @@DISCore:resourceType: DISLang|Attribute
242    
243     ElementTypeBinding:
244     @Name: Get
245     @ElementType:
246     dis:ResourceDef
247     @ShadowContent:
248     @@DISCore:resourceType: DISLang|AttributeGet
249    
250     ElementTypeBinding:
251     @Name: Set
252     @ElementType:
253     dis:ResourceDef
254     @ShadowContent:
255     @@DISCore:resourceType: DISLang|AttributeSet
256    
257     ElementTypeBinding:
258     @Name: InCase
259     @ElementType:
260     dis:ResourceDef
261     @ShadowContent:
262     @@DISCore:resourceType: ManakaiDOM|InCase
263    
264     ElementTypeBinding:
265     @Name: PerlDef
266     @ElementType:
267     dis:Def
268     @ShadowContent:
269     @@ContentType:
270     lang:Perl
271     @@ForCheck: ManakaiDOM|ForClass
272    
273     ElementTypeBinding:
274     @Name: Test
275     @ElementType:
276     dis:ResourceDef
277     @ShadowContent:
278     @@rdf:type: test|StandaloneTest
279     @@ForCheck: ManakaiDOM|ForClass
280    
281     ElementTypeBinding:
282     @Name: enDesc
283     @ElementType:
284     dis:Description
285     @ShadowContent:
286     @@lang:en
287    
288     ElementTypeBinding:
289     @Name: enImplNote
290     @ElementType:
291     dis:ImplNote
292     @ShadowContent:
293     @@lang:en
294    
295     ElementTypeBinding:
296     @Name: enFN
297     @ElementType:
298     dis:FullName
299     @ShadowContent:
300     @@lang:en
301 wakaba 1.2
302     ## -- Implementation
303    
304     IFClsDef:
305     @IFQName: MCEncodeImplementation
306     @ClsQName: ManakaiMCEncodeImplementation
307    
308 wakaba 1.8 @domperl:implementedByObjectsOf: c|DOMImplementation
309     @domperl:classImplementedByObjectsOf: c|ManakaiDOMImplementation
310 wakaba 1.2
311     @enDesc:
312     The <IF::MCEncodeImplementation> interface provides
313     factory methods to create <IF::MCEncodeHandle> objects.
314    
315     @Test:
316     @@enDesc:
317     The implementation registry should know this class when the
318     module is loaded.
319     @@PerlDef:
320     I: {
321     for my $impl (@{$Message::DOM::ImplementationRegistry
322 wakaba 1.10 ->get_dom_implementation_list
323 wakaba 1.2 ({<Q::fe|MCEncode> => '1.0'})}) {
324     if ($impl->isa (<IFName::MCEncodeImplementation>)) {
325     last I;
326     }
327     }
328     $test->assert_never;
329     } # I
330    
331     @Method:
332     @@Name: createMCDecodeHandle
333     @@enDesc:
334     Creates an <IF::MCDecodeHandle> object.
335     @@Param:
336     @@@Name: charset
337     @@@Type: String
338     @@@enDesc:
339     The URI that identifies the charset
340     in which strings are written to the <P::byteStream>.
341     @@Param:
342     @@@Name: byteStream
343     @@@Type: DISPerl|Filehandle||ManakaiDOM|all
344     @@@enDesc:
345     A reference to the filehandle that contains the byte
346     stream read by the <IF::MCDecodeHandle> object.
347 wakaba 1.3 @@Param:
348     @@@Name: onerror
349     @@@Type: DISPerl|CODE||ManakaiDOM|all
350     @@@enDesc:
351     A subroutine that is called back when an error is encountered.
352     The <A::MCDecodeHandle.onerror> attribute of the
353     created object is set to this parameter value.
354     @@@nullCase:
355     @@@@enDesc:
356     The <A::MCDecodeHandle.onerror> attribute is set to
357     a do-nothing subroutine.
358 wakaba 1.2 @@Return:
359     @@@Type: MCDecodeHandle
360     @@@enDesc:
361     The newly created filehandle object.
362     @@@nullCase:
363     @@@@enDesc:
364     If the implementation does not support <P::charset>.
365     @@@PerlDef:
366     my $csdef = $Message::Charset::Encode::CharsetDef->{$charset};
367     my $obj = {<H::mce|filehandle> => $byteStream,
368     <H::mce|charset> => $charset,
369     <H::mce|characterQueue> => [],
370     <H::mce|byteBuffer> => '',
371 wakaba 1.3 <H::mce|onerror> => $onerror || sub {}};
372 wakaba 1.2 if ($csdef->{uri}->{<Q::xml-auto-charset:>} or
373     $charset eq <Q::xml-auto-charset:>) {
374 wakaba 1.3 __DEEP{
375     my $b = '';
376     $csdef = $Message::Charset::Encode::CharsetDef
377     ->{<Q::cs|Perl.utf-8>}; # UTF-8 with no BOM
378     $obj->{<H::mce|inputEncoding>} = 'utf-8';
379     if (read $obj->{<H::mce|filehandle>}, $b, 256) {
380     no warnings "substr";
381     no warnings "uninitialized";
382     if (substr ($b, 0, 1) eq "<") {
383     if (substr ($b, 1, 1) eq "?") { # ASCII8
384     __CODE{XMLEntity.guess::
385     $ascii => $b,
386     $errorCondition => {
387     not $csdef->{<H::cs|ASCII8>} or
388     $csdef->{<H::cs|BOM.Required>}
389     },
390     $defaultURI => {<Q::cs|Perl.utf-8>},
391     $defaultName => 'utf-8',
392     $restoreBOM => {},
393     }__;
394     if (defined $csdef->{<H::cs|noBOMVariant>}) {
395     $csdef = $Message::Charset::Encode::CharsetDef
396     ->{$csdef->{<H::cs|noBOMVariant>}};
397     }
398     } elsif (substr ($b, 1, 1) eq "\x00") {
399     if (substr ($b, 2, 2) eq "?\x00") { # ASCII16LE
400     my $c = $b; $c =~ tr/\x00//d;
401     __CODE{XMLEntity.guess::
402     $ascii => $c,
403     $errorCondition => {
404     not $csdef->{<H::cs|ASCII16>} or
405     $csdef->{<H::cs|ASCII16BE>} or
406     $csdef->{<H::cs|BOM.Required>}
407     },
408     $defaultURI => {<Q::cs|Perl.utf-8>},
409     $defaultName => 'utf-8',
410     $restoreBOM => {},
411     }__;
412     if (defined $csdef->{<H::cs|noBOMVariant16LE>}) {
413     $csdef = $Message::Charset::Encode::CharsetDef
414     ->{$csdef->{<H::cs|noBOMVariant16LE>}};
415     }
416     } elsif (substr ($b, 2, 2) eq "\x00\x00") { # ASCII32Endian4321
417     my $c = $b; $c =~ tr/\x00//d;
418     __CODE{XMLEntity.guess::
419     $ascii => $c,
420     $errorCondition => {
421     not $csdef->{<H::cs|ASCII32>} or
422     $csdef->{<H::cs|ASCII32Endian1234>} or
423     $csdef->{<H::cs|ASCII32Endian2143>} or
424     $csdef->{<H::cs|ASCII32Endian3412>} or
425     $csdef->{<H::cs|BOM.Required>}
426     },
427     $defaultURI => {<Q::cs|Perl.utf-8>},
428     $defaultName => 'utf-8',
429     $restoreBOM => {},
430     }__;
431     if (defined $csdef->{<H::cs|noBOMVariant32Endian4321>}) {
432     $csdef = $Message::Charset::Encode::CharsetDef
433     ->{$csdef->{<H::cs|noBOMVariant32Endian4321>}};
434     }
435     }
436     }
437     } elsif (substr ($b, 0, 3) eq "\xEF\xBB\xBF") { # UTF8
438     $obj->{<H::mce|hasBOM>} = true;
439     substr ($b, 0, 3) = '';
440     my $c = $b;
441     __CODE{XMLEntity.guess::
442     $ascii => $c,
443     $errorCondition => {
444     not $csdef->{<H::cs|UTF8EncodingScheme>} or
445     not $csdef->{<H::cs|BOM.Allowed>}
446     },
447     $defaultURI => {<Q::cs|Perl.utf-8>},
448     $defaultName => 'utf-8',
449     $restoreBOM => {},
450     }__;
451     if (defined $csdef->{<H::cs|noBOMVariant>}) {
452     $csdef = $Message::Charset::Encode::CharsetDef
453     ->{$csdef->{<H::cs|noBOMVariant>}};
454     }
455     } elsif (substr ($b, 0, 2) eq "\x00<") {
456     if (substr ($b, 2, 2) eq "\x00?") { # ASCII16BE
457     my $c = $b; $c =~ tr/\x00//d;
458     __CODE{XMLEntity.guess::
459     $ascii => $c,
460     $errorCondition => {
461     not $csdef->{<H::cs|ASCII16>} or
462     $csdef->{<H::cs|ASCII16LE>} or
463     $csdef->{<H::cs|BOM.Required>}
464     },
465     $defaultURI => {<Q::cs|Perl.utf-8>},
466     $defaultName => 'utf-8',
467     $restoreBOM => {},
468     }__;
469     if (defined $csdef->{<H::cs|noBOMVariant16BE>}) {
470     $csdef = $Message::Charset::Encode::CharsetDef
471     ->{$csdef->{<H::cs|noBOMVariant16BE>}};
472     }
473     } elsif (substr ($b, 2, 2) eq "\x00\x00") { # ASCII32Endian3412
474     my $c = $b; $c =~ tr/\x00//d;
475     __CODE{XMLEntity.guess::
476     $ascii => $c,
477     $errorCondition => {
478     not $csdef->{<H::cs|ASCII32>} or
479     $csdef->{<H::cs|ASCII32Endian1234>} or
480     $csdef->{<H::cs|ASCII32Endian2143>} or
481     $csdef->{<H::cs|ASCII32Endian4321>} or
482     $csdef->{<H::cs|BOM.Required>}
483     },
484     $defaultURI => {<Q::cs|Perl.utf-8>},
485     $defaultName => 'utf-8',
486     $restoreBOM => {},
487     }__;
488     if (defined $csdef->{<H::cs|noBOMVariant32Endian3412>}) {
489     $csdef = $Message::Charset::Encode::CharsetDef
490     ->{$csdef->{<H::cs|noBOMVariant32Endian3412>}};
491     }
492     }
493     } elsif (substr ($b, 0, 2) eq "\xFE\xFF") {
494     if (substr ($b, 2, 2) eq "\x00<") { # ASCII16BE
495     $obj->{<H::mce|hasBOM>} = true;
496     substr ($b, 0, 2) = '';
497     my $c = $b; $c =~ tr/\x00//d;
498     __CODE{XMLEntity.guess::
499     $ascii => $c,
500     $errorCondition => {
501     not $csdef->{<H::cs|ASCII16>} or
502     $csdef->{<H::cs|ASCII16LE>} or
503     not $csdef->{<H::cs|BOM.Allowed>}
504     },
505     $defaultURI => {<Q::cs|Perl.utf-16be>},
506     $defaultName => 'utf-16',
507     $restoreBOM => {},
508     }__;
509     if (defined $csdef->{<H::cs|noBOMVariant16BE>}) {
510     $csdef = $Message::Charset::Encode::CharsetDef
511     ->{$csdef->{<H::cs|noBOMVariant16BE>}};
512     }
513     } elsif (substr ($b, 2, 2) eq "\x00\x00") { # ASCII32Endian3412
514     $obj->{<H::mce|hasBOM>} = true;
515     substr ($b, 0, 4) = '';
516     my $c = $b; $c =~ tr/\x00//d;
517     __CODE{XMLEntity.guess::
518     $ascii => $c,
519     $errorCondition => {
520     not $csdef->{<H::cs|ASCII32>} or
521     $csdef->{<H::cs|ASCII32Endian1234>} or
522     $csdef->{<H::cs|ASCII32Endian2143>} or
523     $csdef->{<H::cs|ASCII32Endian4321>} or
524     not $csdef->{<H::cs|BOM.Allowed>}
525     },
526     $defaultURI => {<Q::cs|Perl.utf-16be>},
527     $defaultName => 'utf-16',
528     $restoreBOM => {
529     $obj->{<H::mce|byteBuffer>} .= "\x00\x00";
530     },
531     }__;
532     if (defined $csdef->{<H::cs|noBOMVariant32Endian3412>}) {
533     $csdef = $Message::Charset::Encode::CharsetDef
534     ->{$csdef->{<H::cs|noBOMVariant32Endian3412>}};
535     }
536     } else {
537     $csdef = $Message::Charset::Encode::CharsetDef
538     ->{<Q::cs|Perl.utf-16be>};
539     $obj->{<H::mce|inputEncoding>} = 'utf-16';
540     substr ($b, 0, 2) = '';
541     $obj->{<H::mce|hasBOM>} = true;
542     }
543     } elsif (substr ($b, 0, 2) eq "\xFF\xFE") {
544     if (substr ($b, 2, 2) eq "<\x00") { # ASCII16LE
545     $obj->{<H::mce|hasBOM>} = true;
546     substr ($b, 0, 2) = '';
547     my $c = $b; $c =~ tr/\x00//d;
548     __CODE{XMLEntity.guess::
549     $ascii => $c,
550     $errorCondition => {
551     not $csdef->{<H::cs|ASCII16>} or
552     $csdef->{<H::cs|ASCII16BE>} or
553     not $csdef->{<H::cs|BOM.Allowed>}
554     },
555     $defaultURI => {<Q::cs|Perl.utf-16le>},
556     $defaultName => 'utf-16',
557     $restoreBOM => {},
558     }__;
559     if (defined $csdef->{<H::cs|noBOMVariant16LE>}) {
560     $csdef = $Message::Charset::Encode::CharsetDef
561     ->{$csdef->{<H::cs|noBOMVariant16LE>}};
562     }
563     } elsif (substr ($b, 2, 2) eq "\x00\x00") { # ASCII32Endian4321
564     $obj->{<H::mce|hasBOM>} = true;
565     substr ($b, 0, 4) = '';
566     my $c = $b; $c =~ tr/\x00//d;
567     __CODE{XMLEntity.guess::
568     $ascii => $c,
569     $errorCondition => {
570     not $csdef->{<H::cs|ASCII32>} or
571     $csdef->{<H::cs|ASCII32Endian1234>} or
572     $csdef->{<H::cs|ASCII32Endian2143>} or
573     $csdef->{<H::cs|ASCII32Endian3412>} or
574     not $csdef->{<H::cs|BOM.Allowed>}
575     },
576     $defaultURI => {<Q::cs|Perl.utf-16le>},
577     $defaultName => 'utf-16',
578     $restoreBOM => {
579     $obj->{<H::mce|byteBuffer>} .= "\x00\x00";
580     },
581     }__;
582     if (defined $csdef->{<H::cs|noBOMVariant32Endian4321>}) {
583     $csdef = $Message::Charset::Encode::CharsetDef
584     ->{$csdef->{<H::cs|noBOMVariant32Endian4321>}};
585     }
586     } else {
587     $csdef = $Message::Charset::Encode::CharsetDef
588     ->{<Q::cs|Perl.utf-16le>};
589     $obj->{<H::mce|inputEncoding>} = 'utf-16';
590     substr ($b, 0, 2) = '';
591     $obj->{<H::mce|hasBOM>} = true;
592     }
593     } elsif (substr ($b, 0, 2) eq "\x00\x00") {
594     if (substr ($b, 2, 2) eq "\x00<") { # ASCII32Endian1234
595     my $c = $b; $c =~ tr/\x00//d;
596     __CODE{XMLEntity.guess::
597     $ascii => $c,
598     $errorCondition => {
599     not $csdef->{<H::cs|ASCII32>} or
600     $csdef->{<H::cs|ASCII32Endian2143>} or
601     $csdef->{<H::cs|ASCII32Endian3412>} or
602     $csdef->{<H::cs|ASCII32Endian4321>} or
603     $csdef->{<H::cs|BOM.Required>}
604     },
605     $defaultURI => {<Q::cs|Perl.utf-8>},
606     $defaultName => 'utf-8',
607     $restoreBOM => {},
608     }__;
609     if (defined $csdef->{<H::cs|noBOMVariant32Endian1234>}) {
610     $csdef = $Message::Charset::Encode::CharsetDef
611     ->{$csdef->{<H::cs|noBOMVariant32Endian1234>}};
612     }
613     } elsif (substr ($b, 2, 2) eq "<\x00") { # ASCII32Endian2143
614     my $c = $b; $c =~ tr/\x00//d;
615     __CODE{XMLEntity.guess::
616     $ascii => $c,
617     $errorCondition => {
618     not $csdef->{<H::cs|ASCII32>} or
619     $csdef->{<H::cs|ASCII32Endian1234>} or
620     $csdef->{<H::cs|ASCII32Endian3412>} or
621     $csdef->{<H::cs|ASCII32Endian4321>} or
622     $csdef->{<H::cs|BOM.Required>}
623     },
624     $defaultURI => {<Q::cs|Perl.utf-8>},
625     $defaultName => 'utf-8',
626     $restoreBOM => {},
627     }__;
628     if (defined $csdef->{<H::cs|noBOMVariant32Endian2143>}) {
629     $csdef = $Message::Charset::Encode::CharsetDef
630     ->{$csdef->{<H::cs|noBOMVariant32Endian2143>}};
631     }
632     } elsif (substr ($b, 2, 2) eq "\xFE\xFF") { # ASCII32Endian1234
633     $obj->{<H::mce|hasBOM>} = true;
634     substr ($b, 0, 4) = '';
635     my $c = $b; $c =~ tr/\x00//d;
636     __CODE{XMLEntity.guess::
637     $ascii => $c,
638     $errorCondition => {
639     not $csdef->{<H::cs|ASCII32>} or
640     $csdef->{<H::cs|ASCII32Endian2143>} or
641     $csdef->{<H::cs|ASCII32Endian3412>} or
642     $csdef->{<H::cs|ASCII32Endian4321>} or
643     $csdef->{<H::cs|BOM.Required>}
644     },
645     $defaultURI => {<Q::cs|Perl.utf-8>},
646     $defaultName => 'utf-8',
647     $restoreBOM => {
648     $obj->{<H::mce|hasBOM>} = false;
649     $obj->{<H::mce|byteBuffer>} .= "\x00\x00\xFE\xFF";
650     },
651     }__;
652     if (defined $csdef->{<H::cs|noBOMVariant32Endian1234>}) {
653     $csdef = $Message::Charset::Encode::CharsetDef
654     ->{$csdef->{<H::cs|noBOMVariant32Endian1234>}};
655     }
656     } elsif (substr ($b, 2, 2) eq "\xFF\xFE") { # ASCII32Endian2143
657     $obj->{<H::mce|hasBOM>} = true;
658     substr ($b, 0, 4) = '';
659     my $c = $b; $c =~ tr/\x00//d;
660     __CODE{XMLEntity.guess::
661     $ascii => $c,
662     $errorCondition => {
663     not $csdef->{<H::cs|ASCII32>} or
664     $csdef->{<H::cs|ASCII32Endian1234>} or
665     $csdef->{<H::cs|ASCII32Endian3412>} or
666     $csdef->{<H::cs|ASCII32Endian4321>} or
667     $csdef->{<H::cs|BOM.Required>}
668     },
669     $defaultURI => {<Q::cs|Perl.utf-8>},
670     $defaultName => 'utf-8',
671     $restoreBOM => {
672     $obj->{<H::mce|hasBOM>} = false;
673     $obj->{<H::mce|byteBuffer>} .= "\x00\x00\xFF\xFE";
674     },
675     }__;
676     if (defined $csdef->{<H::cs|noBOMVariant32Endian2143>}) {
677     $csdef = $Message::Charset::Encode::CharsetDef
678     ->{$csdef->{<H::cs|noBOMVariant32Endian2143>}};
679     }
680     }
681     # \x4C\x6F\xA7\x94 EBCDIC
682     } # buffer
683     $obj->{<H::mce|byteBuffer>} .= $b;
684     } # read
685     }__;
686 wakaba 1.2 } elsif ($csdef->{uri}->{<Q::cs|XML.utf-8>}) {
687 wakaba 1.3 ## BOM is optional.
688     __DEEP{
689     my $b = '';
690     if (read $obj->{<H::mce|filehandle>}, $b, 3) {
691     if ($b eq "\xEF\xBB\xBF") {
692     $obj->{<H::mce|hasBOM>} = true;
693     } else {
694     $obj->{<H::mce|byteBuffer>} .= $b;
695     }
696     }
697     $csdef = $Message::Charset::Encode::CharsetDef
698     ->{<Q::cs|Perl.utf-8>}; # UTF-8 with no BOM
699     }__;
700 wakaba 1.2 } elsif ($csdef->{uri}->{<Q::cs|XML.utf-16>}) {
701 wakaba 1.3 ## BOM is mandated.
702     __DEEP{
703     my $b = '';
704     if (read $obj->{<H::mce|filehandle>}, $b, 2) {
705     if ($b eq "\xFE\xFF") {
706     $obj->{<H::mce|hasBOM>} = true;
707     $csdef = $Message::Charset::Encode::CharsetDef
708     ->{<Q::cs|Perl.utf-16be>}; # UTF-16BE with no BOM
709     } elsif ($b eq "\xFF\xFE") {
710     $obj->{<H::mce|hasBOM>} = true;
711     $csdef = $Message::Charset::Encode::CharsetDef
712     ->{<Q::cs|Perl.utf-16le>}; # UTF-16LE with no BOM
713     } else {
714     $obj->{<H::mce|onerror>}
715     ->($onerror, null, <Q::cs|no-bom-error>,
716     charset_uri => $charset);
717     $obj->{<H::mce|hasBOM>} = false;
718     $obj->{<H::mce|byteBuffer>} .= $b;
719     $csdef = $Message::Charset::Encode::CharsetDef
720     ->{<Q::cs|Perl.utf-16be>}; # UTF-16BE with no BOM
721     }
722     } else {
723     $obj->{<H::mce|onerror>}
724     ->($onerror, null, <Q::cs|no-bom-error>,
725     charset_uri => $charset);
726     $obj->{<H::mce|hasBOM>} = false;
727     $csdef = $Message::Charset::Encode::CharsetDef
728     ->{<Q::cs|Perl.utf-16be>}; # UTF-16BE with no BOM
729     }
730     }__;
731     }
732 wakaba 1.2
733 wakaba 1.6 if ($csdef->{uri}->{<Q::cs|XML.iso-2022-jp>}) {
734     $obj->{<H::mce|State.2440>} = 'gl-jis-1997-swapped';
735     $obj->{<H::mce|State.2442>} = 'gl-jis-1997';
736     $obj->{<H::mce|state>} = <H::mce|State.2842>;
737     $r = bless $obj, <ClassName::ManakaiMCISO2022JPDecodeHandle>;
738     require Encode::GLJIS1997Swapped;
739     require Encode::GLJIS1997;
740     undef $r unless Encode::find_encoding ($obj->{<H::mce|State.2440>});
741     undef $r unless Encode::find_encoding ($obj->{<H::mce|State.2442>});
742     } elsif ($csdef->{uri}->{<Q::icharset|iso-2022-jp>}) {
743     $obj->{<H::mce|State.2440>} = 'gl-jis-1978';
744     $obj->{<H::mce|State.2442>} = 'gl-jis-1983';
745     $obj->{<H::mce|state>} = <H::mce|State.2842>;
746     $r = bless $obj, <ClassName::ManakaiMCISO2022JPDecodeHandle>;
747     require Encode::GLJIS1978;
748     require Encode::GLJIS1983;
749     undef $r unless Encode::find_encoding ($obj->{<H::mce|State.2440>});
750     undef $r unless Encode::find_encoding ($obj->{<H::mce|State.2442>});
751     } elsif (defined $csdef->{<H::cs|perlName>}->[0]) {
752 wakaba 1.5 if ($csdef->{uri}->{<Q::cs|XML.euc-jp>} or
753     $csdef->{uri}->{<Q::icharset|euc-jp>}) {
754     $obj->{<H::mce|perlEncodingName>} = $csdef->{<H::cs|perlName>}->[0];
755     $r = bless $obj, <ClassName::ManakaiMCEUCJPDecodeHandle>;
756     require Encode::EUCJP1997;
757     undef $r unless Encode::find_encoding
758     ($obj->{<H::mce|perlEncodingName>});
759     } elsif ($csdef->{uri}->{<Q::cs|XML.shift_jis>} or
760     $csdef->{uri}->{<Q::icharset|shift_jis>}) {
761     $obj->{<H::mce|perlEncodingName>} = $csdef->{<H::cs|perlName>}->[0];
762     $r = bless $obj, <ClassName::ManakaiMCShiftJISDecodeHandle>;
763     require Encode::ShiftJIS1997;
764     undef $r unless Encode::find_encoding
765     ($obj->{<H::mce|perlEncodingName>});
766     } elsif ($csdef->{<H::cs|isBlockSafe>}) {
767 wakaba 1.2 $obj->{<H::mce|perlEncodingName>} = $csdef->{<H::cs|perlName>}->[0];
768     $r = bless $obj, <ClassName::ManakaiMCDecodeHandle>;
769     require Encode;
770     undef $r unless Encode::find_encoding
771     ($obj->{<H::mce|perlEncodingName>});
772     }
773     }
774 wakaba 1.5
775 wakaba 1.3 unless (defined $r) {
776     __DEEP{
777     $obj->{<H::mce|onerror>}
778     ->($onerror, null, <Q::cs|charset-not-supported-error>,
779     charset_uri => $charset);
780     }__;
781     }
782    
783     @@CODE:
784     @@@QName: XMLEntity.guess
785     @@@PerlDef:
786     if ($ascii =~ /^<\?xml\s+(?:version\s*=\s*["'][^"']*["']\s*)?
787     encoding\s*=\s*["']([^"']*)/x) {
788     $obj->{<H::mce|inputEncoding>} = lc $1;
789     my $__uri = $self-><M::MCEncodeImplementation.getURIFromCharsetName>
790     (<Q::cs|xmlName>, $obj->{<H::mce|inputEncoding>});
791     $csdef = $Message::Charset::Encode::CharsetDef->{$__uri};
792     if ($errorCondition) {
793     $obj->{<H::mce|onerror>}
794     ->($obj->{<H::mce|onerror>}, null,
795     <Q::cs|charset-name-mismatch-error>,
796     charset_uri => $__uri,
797     charset_name => $obj->{<H::mce|inputEncoding>});
798     }
799     } else {
800     $csdef = $Message::Charset::Encode::CharsetDef->{$defaultURI};
801     $obj->{<H::mce|inputEncoding>} = $defaultName;
802     $restoreBOM;
803     }
804 wakaba 1.2
805     @@Test:
806     @@@QName: MCEncodeImpl.createMCDecodeHandle.test
807     @@@PerlDef:
808     my $impl;
809     __CODE{createImplForTest:: $impl => $impl}__;
810    
811     my $byte = 'a';
812     open my $fh, '>', \$byte;
813     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
814     (<Q::cs|Perl.utf8>, $fh);
815    
816     $test->id ('interface');
817     $test->assert_isa ($efh, <IFName::MCDecodeHandle>);
818    
819 wakaba 1.3 $test->id ('onerr');
820     $test->assert_isa ($efh-><AG::MCDecodeHandle.onerror>, 'CODE');
821    
822 wakaba 1.2 @@Test:
823     @@@QName: MCEncodeImpl.createMCXMLDecodeHandle.test
824     @@@PerlDef:
825     my $impl;
826     __CODE{createImplForTest:: $impl => $impl}__;
827    
828     my $byte = 'a';
829     open my $fh, '<', \$byte;
830     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
831     (<Q::xml-auto-charset:>, $fh);
832    
833     $test->id ('interface');
834     $test->assert_isa ($efh, <IFName::MCDecodeHandle>);
835    
836 wakaba 1.3 $test->id ('onerr');
837     $test->assert_isa ($efh-><AG::MCDecodeHandle.onerror>, 'CODE');
838    
839     @@Test:
840     @@@QName: MCEncodeImpl.createMCDecodeHandle.3.test
841     @@@PerlDef:
842     my $impl;
843     __CODE{createImplForTest:: $impl => $impl}__;
844    
845     my $errors = 0;
846    
847     my $byte = 'a';
848     open my $fh, '>', \$byte;
849     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
850     (q<http://www.charset.example/>, $fh, sub {
851     $errors++;
852     $test->id ('errortype');
853     $test->assert_equals
854     ($_[2],
855     <Q::cs|charset-not-supported-error>);
856     });
857    
858     $test->id ('errors');
859     $test->assert_num_equals (actual_value => $errors, expected_value => 1);
860    
861     $test->id ('return');
862     $test->assert_null ($efh);
863 wakaba 1.2
864 wakaba 1.5 @@Test:
865     @@@QName: MCEncodeImpl.createMCDecodeHandle.4.test
866     @@@PerlDef:
867     my $impl;
868     __CODE{createImplForTest:: $impl => $impl}__;
869    
870     my $byte = 'a';
871     open my $fh, '<', \$byte;
872     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
873     (<Q::cs|XML.euc-jp>, $fh);
874    
875     $test->id ('interface');
876     $test->assert_isa ($efh, <IFName::MCDecodeHandle>);
877    
878     $test->id ('class');
879     $test->assert_isa ($efh, <ClassName::ManakaiMCEUCJPDecodeHandle>);
880    
881     $test->id ('onerr');
882     $test->assert_isa ($efh-><AG::MCDecodeHandle.onerror>, 'CODE');
883    
884     @@Test:
885     @@@QName: MCEncodeImpl.createMCDecodeHandle.5.test
886     @@@PerlDef:
887     my $impl;
888     __CODE{createImplForTest:: $impl => $impl}__;
889    
890     my $byte = 'a';
891     open my $fh, '<', \$byte;
892     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
893     (<Q::cs|XML.euc-jp>, $fh);
894    
895     $test->id ('interface');
896     $test->assert_isa ($efh, <IFName::MCDecodeHandle>);
897    
898     $test->id ('class');
899     $test->assert_isa ($efh, <ClassName::ManakaiMCEUCJPDecodeHandle>);
900    
901     $test->id ('onerr');
902     $test->assert_isa ($efh-><AG::MCDecodeHandle.onerror>, 'CODE');
903    
904 wakaba 1.6 @@Test:
905     @@@QName: MCEncodeImpl.createMCDecodeHandle.6.test
906     @@@PerlDef:
907     my $impl;
908     __CODE{createImplForTest:: $impl => $impl}__;
909    
910     my $byte = 'a';
911     open my $fh, '<', \$byte;
912     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
913     (<Q::cs|XML.iso-2022-jp>, $fh);
914    
915     $test->id ('interface');
916     $test->assert_isa ($efh, <IFName::MCDecodeHandle>);
917    
918     $test->id ('class');
919     $test->assert_isa ($efh, <ClassName::ManakaiMCISO2022JPDecodeHandle>);
920    
921     $test->id ('onerr');
922     $test->assert_isa ($efh-><AG::MCDecodeHandle.onerror>, 'CODE');
923    
924     @@Test:
925     @@@QName: MCEncodeImpl.createMCDecodeHandle.7.test
926     @@@PerlDef:
927     my $impl;
928     __CODE{createImplForTest:: $impl => $impl}__;
929    
930     my $byte = 'a';
931     open my $fh, '<', \$byte;
932     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
933     (<Q::icharset|iso-2022-jp>, $fh);
934    
935     $test->id ('interface');
936     $test->assert_isa ($efh, <IFName::MCDecodeHandle>);
937    
938     $test->id ('class');
939     $test->assert_isa ($efh, <ClassName::ManakaiMCISO2022JPDecodeHandle>);
940    
941     $test->id ('onerr');
942     $test->assert_isa ($efh-><AG::MCDecodeHandle.onerror>, 'CODE');
943    
944 wakaba 1.2 @Method:
945     @@Name: getURIFromCharsetName
946     @@enDesc:
947     Returns a DOM URI that identifies a charset.
948     @@Param:
949     @@@Name: domain
950     @@@Type: String
951     @@@enDesc:
952     A DOM URI that identifies the context in which the charset
953     name is used.
954     @@Param:
955     @@@Name: name
956     @@@Type: String
957     @@@enDesc:
958     The charset name to convert.
959     @@Return:
960     @@@Type: String
961     @@@enDesc:
962     A DOM URI that identifies <P::name>.
963     @@@nullCase:
964     @@@@enDesc:
965     The implementation was unable to resolve <P::name> to a URI.
966     @@@PerlDef:
967     if ({
968     <Q::cs|ietfName> => true,
969     <Q::cs|xmlName> => true,
970     }->{$domain}) {
971 wakaba 1.3 $name = lc $name;
972     if ($domain eq <Q::cs|ietfName>) {
973     $r = <Q::icharset|> . $name;
974     } elsif ($domain eq <Q::cs|xmlName>) {
975     $r = <Q::cs|XML.> . $name;
976     }
977    
978     unless ($Message::Charset::Encode::CharsetDef->{$r}) {
979     U: for my $uri (keys %$Message::Charset::Encode::CharsetDef) {
980     for (@{$Message::Charset::Encode::CharsetDef->{$uri}->{+{
981     <Q::cs|ietfName> => <H::cs|ietfName>,
982     <Q::cs|xmlName> => <H::cs|xmlName>,
983     }->{$domain}} or []}) {
984     if ($_ eq $name) {
985     $r = $uri;
986     last U;
987     }
988 wakaba 1.2 }
989 wakaba 1.3 } # U
990 wakaba 1.2 }
991 wakaba 1.3 } else {
992     $r = null;
993     }
994    
995     @@Test:
996     @@@QName: MCEncodeImpl.name2uri.test
997     @@@PerlDef:
998     my $impl;
999     __CODE{createImplForTest:: $impl => $impl}__;
1000    
1001     for (
1002     [<Q::icharset|utf-8>, 'utf-8'],
1003     [<Q::icharset|x-no-such-charset>, 'x-no-such-charset'],
1004     [<Q::icharset|utf-8>, 'UTF-8'],
1005     [<Q::icharset|utf-8>, 'uTf-8'],
1006     [<Q::icharset|utf-16be>, 'utf-16be'],
1007     ) {
1008     $test->id ('ietfname2uri.'.$_->[1]);
1009     my $iname = $impl-><M::MCEncodeImplementation.getURIFromCharsetName>
1010     (<Q::cs|ietfName>, $_->[1]);
1011     $test->assert_equals ($iname, $_->[0]);
1012     }
1013    
1014     for (
1015     [<Q::cs|XML.utf-8>, 'utf-8'],
1016     [<Q::cs|XML.x-no-such-charset>, 'x-no-such-charset'],
1017     [<Q::cs|XML.utf-8>, 'UTF-8'],
1018     [<Q::cs|XML.utf-8>, 'uTf-8'],
1019     [<Q::icharset|utf-16be>, 'utf-16be'],
1020     ) {
1021     $test->id ('xmlname2uri.'.$_->[1]);
1022     my $iname = $impl-><M::MCEncodeImplementation.getURIFromCharsetName>
1023     (<Q::cs|xmlName>, $_->[1]);
1024     $test->assert_equals ($iname, $_->[0]);
1025 wakaba 1.2 }
1026    
1027     @Method:
1028     @@Name: getCharsetNameFromURI
1029     @@enDesc:
1030     Returns a name for the charset identified by a DOM URI.
1031     @@Param:
1032     @@@Name: domain
1033     @@@Type: String
1034     @@@enDesc:
1035     A DOM URI that identifies the context in which the charset
1036     name is used.
1037     @@Param:
1038     @@@Name: uri
1039     @@@Type: String
1040     @@@enDesc:
1041     A DOM URI of the charset.
1042     @@Return:
1043     @@@Type: String
1044     @@@enDesc:
1045     A charset name that identifies <P::uri>.
1046     @@@nullCase:
1047     @@@@enDesc:
1048     The implementation was unable to find the charset name
1049     for the <P::uri> that can be used in <P::domain> context.
1050     @@@PerlDef:
1051     if ({
1052     <Q::cs|ietfName> => true,
1053     <Q::cs|xmlName> => true,
1054     }->{$domain}) {
1055 wakaba 1.3 $r = $Message::Charset::Encode::CharsetDef->{$uri}->{+{
1056     <Q::cs|ietfName> => <H::cs|ietfName>,
1057     <Q::cs|xmlName> => <H::cs|xmlName>,
1058     }->{$domain}}->[0];
1059     unless (defined $r) {
1060     if ($domain eq <Q::cs|ietfName> and
1061     substr ($uri, 0, length <Q::icharset|>) eq <Q::icharset|>) {
1062     $r = substr ($uri, length <Q::icharset|>);
1063     } elsif ($domain eq <Q::cs|xmlName> and
1064     substr ($uri, 0, length <Q::cs|XML.>) eq <Q::cs|XML.>) {
1065     $r = substr ($uri, length <Q::cs|XML.>);
1066     }
1067     }
1068 wakaba 1.2 } else {
1069     $r = null;
1070     }
1071    
1072 wakaba 1.3 @@Test:
1073     @@@QName: MCEncodeImpl.uri2name.test
1074     @@@PerlDef:
1075     my $impl;
1076     __CODE{createImplForTest:: $impl => $impl}__;
1077    
1078     for (
1079     [<Q::icharset|utf-8>, 'utf-8'],
1080     [<Q::icharset|x-no-such-charset>, 'x-no-such-charset'],
1081     [q<http://charset.example/>, null],
1082     ) {
1083     $test->id ('uri2ietfname.'.$_->[0]);
1084     my $iname = $impl-><M::MCEncodeImplementation.getCharsetNameFromURI>
1085     (<Q::cs|ietfName>, $_->[0]);
1086     $test->assert_equals ($iname, $_->[1]);
1087     }
1088    
1089     for (
1090     [<Q::cs|XML.utf-8>, 'utf-8'],
1091     [<Q::cs|XML.x-no-such-charset>, 'x-no-such-charset'],
1092     [q<http://charset.example/>, null],
1093     ) {
1094     $test->id ('uri2xmlname.'.$_->[0]);
1095     my $iname = $impl-><M::MCEncodeImplementation.getCharsetNameFromURI>
1096     (<Q::cs|xmlName>, $_->[0]);
1097     $test->assert_equals ($iname, $_->[1]);
1098     }
1099    
1100 wakaba 1.2 @CODE:
1101     @@QName: createImplForTest
1102     @@PerlDef:
1103 wakaba 1.9 $impl = <Class::c|ManakaiDOMImplementation>->_new;
1104 wakaba 1.2 ##MCEncodeImplementation
1105    
1106     ElementTypeBinding:
1107     @Name: CODE
1108     @ElementType:
1109     dis:ResourceDef
1110     @ShadowContent:
1111     @@DISCore:resourceType: DISPerl|BlockCode
1112     @@ForCheck: ManakaiDOM|ForClass
1113    
1114     IFClsDef:
1115     @IFQName: MCDecodeHandle
1116     @ClsQName: ManakaiMCDecodeHandle
1117    
1118     @enDesc:
1119     An <IF::MCDecodeHandle> provides the read access to a character
1120     stream.
1121    
1122     @enDesc:
1123     @@ddid: cestype
1124     @@ForCheck: ManakaiDOM|ForClass
1125     @@@:
1126     The class <Class::ManakaiMCDecodeHandle> can be used to
1127     encapsulate a byte filehandle with <Perl::Encode> call
1128     into a character filehandle-like object.
1129    
1130     The encoding <kwd:MUST> be stateless and signatureless. In addition,
1131     its <Perl::Encode> implementation <kwd:MUST> support
1132     the <Perl::FB_QUIET> flag.
1133    
1134     @Attr:
1135     @@Name: charset
1136     @@enDesc:
1137     A URI that identifies the charset of the handle.
1138     @@Type: String
1139     @@Get:
1140     @@@PerlDef:
1141     $r = $self->{<H::mce|charset>};
1142    
1143     @Attr:
1144 wakaba 1.3 @@Name: onerror
1145 wakaba 1.2 @@enDesc:
1146 wakaba 1.3 A callback function that is invoked when an error is encountered.
1147    
1148     {P:: The function will be invoked with arguments:
1149 wakaba 1.2
1150 wakaba 1.3 - <Perl::$self>::: The function itself.
1151 wakaba 1.2
1152 wakaba 1.3 - <Perl::$handle>::: The <IF::MCDecodeHandle> object.
1153     If the error is thrown during the
1154     construction of the object, it might
1155     be <DOM::null> instead.
1156    
1157     - <Perl::$errorType>::: The DOM URI that identifies the category
1158     of the error.
1159    
1160     - <Perl::%opt>::: Named parameters depending to the <Perl::$errorType>.
1161    
1162     It <kwd:MAY> throw an exception.
1163     }
1164 wakaba 1.2 @@Type: DISPerl|CODE||ManakaiDOM|all
1165     @@Get:
1166     @@@PerlDef:
1167 wakaba 1.3 $r = $self->{<H::mce|onerror>};
1168 wakaba 1.2 @@Set:
1169     @@@PerlDef:
1170 wakaba 1.3 $self->{<H::mce|onerror>} = $given;
1171 wakaba 1.2
1172     @@Test:
1173     @@@QName: MCDecodeHandle.onoctetstreamerror.test
1174     @@@PerlDef:
1175     my $impl;
1176     __CODE{createImplForTest:: $impl => $impl}__;
1177    
1178     my $byte = "a\xE3\x81\x82\x81a";
1179     open my $fh, '<', \$byte;
1180     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1181     (<Q::cs|Perl.utf8>, $fh);
1182    
1183     $test->id ('default');
1184 wakaba 1.3 $test->assert_isa ($efh-><AG::MCDecodeHandle.onerror>,
1185 wakaba 1.2 'CODE');
1186    
1187     $test->id ('get.set');
1188     my $sub1 = sub { return "2" };
1189 wakaba 1.3 $efh-><AS::MCDecodeHandle.onerror> ($sub1);
1190     my $sub2 = $efh-><AG::MCDecodeHandle.onerror>;
1191 wakaba 1.2 $test->assert_equals ($sub2, $sub1);
1192     $test->assert_equals ($sub2->(), "2");
1193    
1194     @Method:
1195     @@Name: getc
1196     @@enDesc:
1197     Returns the next character from the input.
1198     @@Return:
1199     @@@Type: String
1200     @@@enDesc:
1201     The next character.
1202     @@@nullCase:
1203     @@@@enDesc:
1204     If at the end of the file, or if there was an error, in which
1205     case <Perl::$!> is set.
1206     @@@PerlDef:
1207     if (@{$self->{<H::mce|characterQueue>}}) {
1208     $r = shift @{$self->{<H::mce|characterQueue>}};
1209     } else {
1210     __DEEP{
1211     my $error;
1212     if ($self->{<H::mce|continue>}) {
1213     if (read $self->{<H::mce|filehandle>},
1214     $self->{<H::mce|byteBuffer>}, 256,
1215     length $self->{<H::mce|byteBuffer>}) {
1216     #
1217     } else {
1218     $error = true;
1219     }
1220     $self->{<H::mce|continue>} = false;
1221 wakaba 1.3 } elsif (512 > length $self->{<H::mce|byteBuffer>}) {
1222 wakaba 1.2 read $self->{<H::mce|filehandle>},
1223     $self->{<H::mce|byteBuffer>}, 256,
1224     length $self->{<H::mce|byteBuffer>};
1225     }
1226    
1227     unless ($error) {
1228     my $string = Encode::decode ($self->{<H::mce|perlEncodingName>},
1229     $self->{<H::mce|byteBuffer>},
1230     Encode::FB_QUIET ());
1231     if (length $string) {
1232     push @{$self->{<H::mce|characterQueue>}}, split //, $string;
1233     $r = shift @{$self->{<H::mce|characterQueue>}};
1234 wakaba 1.3 if (length $self->{<H::mce|byteBuffer>}) {
1235     $self->{<H::mce|continue>} = true;
1236     }
1237 wakaba 1.2 } else {
1238 wakaba 1.3 if (length $self->{<H::mce|byteBuffer>}) {
1239     $error = true;
1240     } else {
1241     $r = null;
1242     }
1243 wakaba 1.2 }
1244 wakaba 1.3 }
1245     if ($error) {
1246 wakaba 1.2 $r = substr $self->{<H::mce|byteBuffer>}, 0, 1, '';
1247 wakaba 1.3 $self->{<H::mce|onerror>}
1248     ->($self->{<H::mce|onerror>}, $self,
1249     <Q::cs|illegal-octets-error>,
1250     octets => \$r);
1251 wakaba 1.2 }
1252     }__;
1253     }
1254    
1255     @@Test:
1256     @@@QName: MCDecodeHandle.getc.1.test
1257     @@@PerlDef:
1258     my $impl;
1259     __CODE{createImplForTest:: $impl => $impl}__;
1260    
1261     my $byte = "a\xE3\x81\x82\x81a";
1262     open my $fh, '<', \$byte;
1263     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1264     (<Q::cs|Perl.utf8>, $fh);
1265    
1266     my $error = null;
1267 wakaba 1.3 $efh-><AS::MCDecodeHandle.onerror> (sub {
1268     my ($self, $efh, $type, %opt) = @_;
1269     $error = ${$opt{octets}};
1270 wakaba 1.2 });
1271    
1272     $test->id (1);
1273     $test->assert_equals ($efh->getc, "a");
1274     $test->id ('1.err');
1275     $test->assert_null ($error);
1276    
1277     $test->id (2);
1278     $test->assert_equals ($efh->getc, "\x{3042}");
1279     $test->id ('1.err');
1280     $test->assert_null ($error);
1281    
1282     $test->id (3);
1283     $test->assert_equals ($efh->getc, "\x81");
1284     $test->id ('1.err');
1285     $test->assert_equals ($error, "\x81");
1286     $error = null;
1287    
1288     $test->id (4);
1289     $test->assert_equals ($efh->getc, "a");
1290     $test->id ('4.err');
1291     $test->assert_null ($error);
1292    
1293     $test->id ('eof');
1294     $test->assert_null ($efh->getc);
1295     $test->id ('eof.err');
1296     $test->assert_null ($error);
1297    
1298     @@Test:
1299     @@@QName: MCDecodeHandle.getc.2.test
1300     @@@PerlDef:
1301     my $impl;
1302     __CODE{createImplForTest:: $impl => $impl}__;
1303    
1304     my $byte = "a" x 256;
1305     $byte .= "b" x 256;
1306    
1307     open my $fh, '<', \$byte;
1308     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1309     (<Q::cs|Perl.utf8>, $fh);
1310    
1311     my $error = null;
1312 wakaba 1.3 $efh-><AS::MCDecodeHandle.onerror> (sub {
1313     my ($self, $efh, $type, %opt) = @_;
1314     $error = ${$opt{octets}};
1315 wakaba 1.2 });
1316    
1317     for my $i (0..255) {
1318     $test->id ("a.$i");
1319     $test->assert_equals ($efh->getc, "a");
1320     $test->id ("a.$i.err");
1321     $test->assert_null ($error);
1322     }
1323    
1324     for my $i (0..255) {
1325     $test->id ("b.$i");
1326     $test->assert_equals ($efh->getc, "b");
1327     $test->id ("b.$i.err");
1328     $test->assert_null ($error);
1329     }
1330    
1331     $test->id ('eof');
1332     $test->assert_null ($efh->getc);
1333     $test->id ('eof.err');
1334     $test->assert_null ($error);
1335    
1336     @@Test:
1337     @@@QName: MCDecodeHandle.getc.3.test
1338     @@@PerlDef:
1339     my $impl;
1340     __CODE{createImplForTest:: $impl => $impl}__;
1341    
1342     my $byte = "a" x 255;
1343     $byte .= "\xE3\x81\x82";
1344     $byte .= "b" x 256;
1345    
1346     open my $fh, '<', \$byte;
1347     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1348     (<Q::cs|Perl.utf8>, $fh);
1349    
1350     my $error = null;
1351 wakaba 1.3 $efh-><AS::MCDecodeHandle.onerror> (sub {
1352     my ($self, $efh, $type, %opt) = @_;
1353     $error = ${$opt{octets}};
1354 wakaba 1.2 });
1355    
1356     for my $i (0..254) {
1357     $test->id ("a.$i");
1358     $test->assert_equals ($efh->getc, "a");
1359     $test->id ("a.$i.err");
1360     $test->assert_null ($error);
1361     }
1362    
1363     $test->id ("A");
1364     $test->assert_equals ($efh->getc, "\x{3042}");
1365     $test->id ("A.err");
1366     $test->assert_null ($error);
1367    
1368     for my $i (0..255) {
1369     $test->id ("b.$i");
1370     $test->assert_equals ($efh->getc, "b");
1371     $test->id ("b.$i.err");
1372     $test->assert_null ($error);
1373     }
1374    
1375     $test->id ('eof');
1376     $test->assert_null ($efh->getc);
1377     $test->id ('eof.err');
1378     $test->assert_null ($error);
1379    
1380     @@Test:
1381     @@@QName: MCDecodeHandle.getc.4.test
1382     @@@PerlDef:
1383     my $impl;
1384     __CODE{createImplForTest:: $impl => $impl}__;
1385    
1386     my $byte = "a" x 255;
1387     $byte .= "\xE3";
1388    
1389     open my $fh, '<', \$byte;
1390     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1391     (<Q::cs|Perl.utf8>, $fh);
1392    
1393     my $error = null;
1394 wakaba 1.3 $efh-><AS::MCDecodeHandle.onerror> (sub {
1395     my ($self, $efh, $type, %opt) = @_;
1396     $error = ${$opt{octets}};
1397 wakaba 1.2 });
1398    
1399     for my $i (0..254) {
1400     $test->id ("a.$i");
1401     $test->assert_equals ($efh->getc, "a");
1402     $test->id ("a.$i.err");
1403     $test->assert_null ($error);
1404     }
1405    
1406     $test->id ("E3");
1407     $test->assert_equals ($efh->getc, "\xE3");
1408     $test->id ("E3.err");
1409     $test->assert_equals ($error, "\xE3");
1410     $error = null;
1411    
1412     $test->id ('eof');
1413     $test->assert_null ($efh->getc);
1414     $test->id ('eof.err');
1415     $test->assert_null ($error);
1416 wakaba 1.4
1417     @Method:
1418     @@Name: ungetc
1419     @@enDesc:
1420     Pushes a character with the given ordinal value back
1421     onto the handle's input stream. In <Perl::IO::Handle>
1422     only one character of pushback per handle is guaranteed.
1423     @@Param:
1424     @@@Name: ord
1425     @@@Type: idl|unsignedLong||ManakaiDOM|all
1426     @@@enDesc:
1427     The ordinal value of the character to push back.
1428     @@Return:
1429     @@@PerlDef:
1430     unshift @{$self->{<H::mce|characterQueue>}}, chr $ord;
1431    
1432     @@Test:
1433     @@@QName: MCDecodeHandle.ungetc.test
1434     @@@PerlDef:
1435     my $impl;
1436     __CODE{createImplForTest:: $impl => $impl}__;
1437    
1438     my $byte = "a\x{4E00}b\x{4E11}";
1439    
1440     open my $fh, '<', \$byte;
1441     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1442     (<Q::cs|Perl.utf8>, $fh);
1443    
1444     $test->id ('1.getc');
1445     $test->assert_equals ($efh->getc, "a");
1446    
1447     $test->id ('1.ungetc');
1448     $efh->ungetc (ord "a");
1449     $test->assert_equals ($efh->getc, "a");
1450    
1451     $test->id ('2.getc');
1452     $test->assert_equals ($efh->getc, "\x{4E00}");
1453    
1454     $test->id ('2.ungetc');
1455     $efh->ungetc (ord "\x{4E00}");
1456     $test->assert_equals ($efh->getc, "\x{4E00}");
1457    
1458     $test->id ('3.getc');
1459     $test->assert_equals ($efh->getc, "b");
1460    
1461     $test->id ('4.getc');
1462     $test->assert_equals ($efh->getc, "\x{4E11}");
1463    
1464     $test->id ('4.ungetc');
1465     $efh->ungetc (ord "\x{4E11}");
1466     $test->assert_equals ($efh->getc, "\x{4E11}");
1467 wakaba 1.2
1468 wakaba 1.3 @Attr:
1469     @@Name: inputEncoding
1470     @@enDesc:
1471     The name of the input charset.
1472     @@Type: String
1473     @@Get:
1474     @@@enDesc:
1475     If there is a string looks like encoding declaration,
1476     then the value of it, in lowercase. Otherwise and
1477     there is the UTF-16 <CHAR::BOM>, then <CODE::utf-16>.
1478     Otherwise, <CODE::utf-8>.
1479     @@@nullCase:
1480     @@@@enDesc:
1481     If the charset is different from <Q::xml-auto-charset:>.
1482     @@@PerlDef:
1483     $r = $self->{<H::mce|inputEncoding>};
1484    
1485     @Attr:
1486     @@Name: hasBOM
1487     @@enDesc:
1488     Whether the decoder detected the <CHAR::BYTE ORDER MARK> or not.
1489     @@Type: idl|boolean||ManakaiDOM|all
1490     @@Get:
1491     @@@TrueCase:
1492     @@@@enDesc:
1493     If there is the <CHAR::BOM>.
1494     @@@FalseCase:
1495     @@@@enDesc:
1496     Either if there is no <CHAR::BOM>, the decoder not
1497     reached to the end of the <CHAR::BOM>, or
1498     the decoder implementation does not provide whether
1499     there is the <CHAR::BOM> or not.
1500     @@@PerlDef:
1501     $r = $self->{<H::mce|hasBOM>};
1502    
1503     @Test:
1504     @@QName: MCDecodeHandle.utf-8-optional-bom.1.test
1505     @@PerlDef:
1506     my $impl;
1507     __CODE{createImplForTest:: $impl => $impl}__;
1508    
1509     my $byte = qq<\xEF\xBB\xBFabc>;
1510    
1511     my $error;
1512    
1513     open my $fh, '<', \$byte;
1514     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1515     (<Q::cs|XML.utf-8>, $fh, sub { $error = true });
1516    
1517     $test->id (1);
1518     $test->assert_equals ($efh->getc, "a");
1519    
1520     $test->id (2);
1521     $test->assert_equals ($efh->getc, "b");
1522    
1523     $test->id (3);
1524     $test->assert_equals ($efh->getc, "c");
1525    
1526     $test->id (4);
1527     $test->assert_null ($efh->getc);
1528    
1529     $test->id ('err');
1530     $test->assert_false ($error);
1531    
1532     $test->id ('bom');
1533     $test->assert_true ($efh-><AG::MCDecodeHandle.hasBOM>);
1534    
1535     @Test:
1536     @@QName: MCDecodeHandle.utf-8-optional-bom.2.test
1537     @@PerlDef:
1538     my $impl;
1539     __CODE{createImplForTest:: $impl => $impl}__;
1540    
1541     my $byte = qq<abc>;
1542    
1543     my $error;
1544    
1545     open my $fh, '<', \$byte;
1546     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1547     (<Q::cs|XML.utf-8>, $fh, sub { $error = true });
1548    
1549     $test->id (1);
1550     $test->assert_equals ($efh->getc, "a");
1551    
1552     $test->id (2);
1553     $test->assert_equals ($efh->getc, "b");
1554    
1555     $test->id (3);
1556     $test->assert_equals ($efh->getc, "c");
1557    
1558     $test->id (4);
1559     $test->assert_null ($efh->getc);
1560    
1561     $test->id ('err');
1562     $test->assert_false ($error);
1563    
1564     $test->id ('bom');
1565     $test->assert_false ($efh-><AG::MCDecodeHandle.hasBOM>);
1566    
1567     @Test:
1568     @@QName: MCDecodeHandle.utf-8-optional-bom.3.test
1569     @@PerlDef:
1570     my $impl;
1571     __CODE{createImplForTest:: $impl => $impl}__;
1572    
1573     my $byte = qq<\xEF\xBB\xBF\xEF\xBB\xBFabc>;
1574    
1575     my $error;
1576    
1577     open my $fh, '<', \$byte;
1578     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1579     (<Q::cs|XML.utf-8>, $fh, sub { $error = true });
1580    
1581     $test->id ('zwnbsp');
1582     $test->assert_equals ($efh->getc, "\x{FEFF}");
1583    
1584     $test->id (1);
1585     $test->assert_equals ($efh->getc, "a");
1586    
1587     $test->id (2);
1588     $test->assert_equals ($efh->getc, "b");
1589    
1590     $test->id (3);
1591     $test->assert_equals ($efh->getc, "c");
1592    
1593     $test->id (4);
1594     $test->assert_null ($efh->getc);
1595    
1596     $test->id ('err');
1597     $test->assert_false ($error);
1598    
1599     $test->id ('bom');
1600     $test->assert_true ($efh-><AG::MCDecodeHandle.hasBOM>);
1601    
1602     @Test:
1603     @@QName: MCDecodeHandle.utf-8-optional-bom.4.test
1604     @@PerlDef:
1605     my $impl;
1606     __CODE{createImplForTest:: $impl => $impl}__;
1607    
1608     my $byte = qq<\xEF\xBB\xBF>;
1609    
1610     my $error;
1611    
1612     open my $fh, '<', \$byte;
1613     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1614     (<Q::cs|XML.utf-8>, $fh, sub { $error = true });
1615    
1616     $test->id (1);
1617     $test->assert_null ($efh->getc);
1618    
1619     $test->id ('err');
1620     $test->assert_false ($error);
1621    
1622     $test->id ('bom');
1623     $test->assert_true ($efh-><AG::MCDecodeHandle.hasBOM>);
1624    
1625     @Test:
1626     @@QName: MCDecodeHandle.utf-8-optional-bom.5.test
1627     @@PerlDef:
1628     my $impl;
1629     __CODE{createImplForTest:: $impl => $impl}__;
1630    
1631     my $byte = qq<>;
1632    
1633     my $error;
1634    
1635     open my $fh, '<', \$byte;
1636     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1637     (<Q::cs|XML.utf-8>, $fh, sub { $error = true });
1638    
1639     $test->id (1);
1640     $test->assert_null ($efh->getc);
1641    
1642     $test->id ('err');
1643     $test->assert_false ($error);
1644    
1645     $test->id ('bom');
1646     $test->assert_false ($efh-><AG::MCDecodeHandle.hasBOM>);
1647    
1648     @Test:
1649     @@QName: MCDecodeHandle.utf-8-optional-bom.6.test
1650     @@PerlDef:
1651     my $impl;
1652     __CODE{createImplForTest:: $impl => $impl}__;
1653    
1654     my $byte = qq<ab>;
1655    
1656     my $error;
1657    
1658     open my $fh, '<', \$byte;
1659     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1660     (<Q::cs|XML.utf-8>, $fh, sub { $error = true });
1661    
1662     $test->id (1);
1663     $test->assert_equals ($efh->getc, "a");
1664    
1665     $test->id (2);
1666     $test->assert_equals ($efh->getc, "b");
1667    
1668     $test->id (3);
1669     $test->assert_null ($efh->getc);
1670    
1671     $test->id ('err');
1672     $test->assert_false ($error);
1673    
1674     $test->id ('bom');
1675     $test->assert_false ($efh-><AG::MCDecodeHandle.hasBOM>);
1676    
1677     @Test:
1678     @@QName: MCDecodeHandle.utf-8-optional-bom.7.test
1679     @@PerlDef:
1680     my $impl;
1681     __CODE{createImplForTest:: $impl => $impl}__;
1682    
1683     my $byte = qq<a>;
1684    
1685     my $error;
1686    
1687     open my $fh, '<', \$byte;
1688     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1689     (<Q::cs|XML.utf-8>, $fh, sub { $error = true });
1690    
1691     $test->id (1);
1692     $test->assert_equals ($efh->getc, "a");
1693    
1694     $test->id (2);
1695     $test->assert_null ($efh->getc);
1696    
1697     $test->id ('err');
1698     $test->assert_false ($error);
1699    
1700     $test->id ('bom');
1701     $test->assert_false ($efh-><AG::MCDecodeHandle.hasBOM>);
1702    
1703    
1704     @Test:
1705     @@QName: MCDecodeHandle.utf-16-with-bom.1.test
1706     @@PerlDef:
1707     my $impl;
1708     __CODE{createImplForTest:: $impl => $impl}__;
1709    
1710     my $byte = qq<\xFE\xFF\x4E\x00\x00a>;
1711    
1712     my $error;
1713    
1714     open my $fh, '<', \$byte;
1715     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1716     (<Q::cs|XML.utf-16>, $fh, sub { $error = true });
1717    
1718     $test->id (1);
1719     $test->assert_equals ($efh->getc, "\x{4E00}");
1720    
1721     $test->id (2);
1722     $test->assert_equals ($efh->getc, "a");
1723    
1724     $test->id (3);
1725     $test->assert_null ($efh->getc);
1726    
1727     $test->id ('err');
1728     $test->assert_false ($error);
1729    
1730     $test->id ('bom');
1731     $test->assert_true ($efh-><AG::MCDecodeHandle.hasBOM>);
1732    
1733     @Test:
1734     @@QName: MCDecodeHandle.utf-16-with-bom.2.test
1735     @@PerlDef:
1736     my $impl;
1737     __CODE{createImplForTest:: $impl => $impl}__;
1738    
1739     my $byte = qq<\xFF\xFE\x00\x4Ea\x00>;
1740    
1741     my $error;
1742    
1743     open my $fh, '<', \$byte;
1744     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1745     (<Q::cs|XML.utf-16>, $fh, sub { $error = true });
1746    
1747     $test->id (1);
1748     $test->assert_equals ($efh->getc, "\x{4E00}");
1749    
1750     $test->id (2);
1751     $test->assert_equals ($efh->getc, "a");
1752    
1753     $test->id (3);
1754     $test->assert_null ($efh->getc);
1755    
1756     $test->id ('err');
1757     $test->assert_false ($error);
1758    
1759     $test->id ('bom');
1760     $test->assert_true ($efh-><AG::MCDecodeHandle.hasBOM>);
1761    
1762     @Test:
1763     @@QName: MCDecodeHandle.utf-16-with-bom.3.test
1764     @@PerlDef:
1765     my $impl;
1766     __CODE{createImplForTest:: $impl => $impl}__;
1767    
1768     my $byte = qq<\xFE\xFF\x00a>;
1769    
1770     my $error;
1771    
1772     open my $fh, '<', \$byte;
1773     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1774     (<Q::cs|XML.utf-16>, $fh, sub { $error = true });
1775    
1776     $test->id (1);
1777     $test->assert_equals ($efh->getc, "a");
1778    
1779     $test->id (2);
1780     $test->assert_null ($efh->getc);
1781    
1782     $test->id ('err');
1783     $test->assert_false ($error);
1784    
1785     $test->id ('bom');
1786     $test->assert_true ($efh-><AG::MCDecodeHandle.hasBOM>);
1787    
1788     @Test:
1789     @@QName: MCDecodeHandle.utf-16-with-bom.4.test
1790     @@PerlDef:
1791     my $impl;
1792     __CODE{createImplForTest:: $impl => $impl}__;
1793    
1794     my $byte = qq<\xFF\xFEa\x00>;
1795    
1796     my $error;
1797    
1798     open my $fh, '<', \$byte;
1799     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1800     (<Q::cs|XML.utf-16>, $fh, sub { $error = true });
1801    
1802     $test->id (1);
1803     $test->assert_equals ($efh->getc, "a");
1804    
1805     $test->id (2);
1806     $test->assert_null ($efh->getc);
1807    
1808     $test->id ('err');
1809     $test->assert_false ($error);
1810    
1811     $test->id ('bom');
1812     $test->assert_true ($efh-><AG::MCDecodeHandle.hasBOM>);
1813    
1814     @Test:
1815     @@QName: MCDecodeHandle.utf-16-with-bom.5.test
1816     @@PerlDef:
1817     my $impl;
1818     __CODE{createImplForTest:: $impl => $impl}__;
1819    
1820     my $byte = qq<\xFE\xFFa>;
1821    
1822     my $error;
1823    
1824     open my $fh, '<', \$byte;
1825     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1826     (<Q::cs|XML.utf-16>, $fh, sub { $error = $_[2] });
1827    
1828     $test->id ('0.error');
1829     $test->assert_null ($error);
1830    
1831     $test->id (1);
1832     $test->assert_equals ($efh->getc, "a");
1833     $test->id ('1.error');
1834     $test->assert_equals ($error, <Q::cs|illegal-octets-error>);
1835     $error = null;
1836    
1837     $test->id (2);
1838     $test->assert_null ($efh->getc);
1839    
1840     $test->id ('err');
1841     $test->assert_null ($error);
1842    
1843     $test->id ('bom');
1844     $test->assert_true ($efh-><AG::MCDecodeHandle.hasBOM>);
1845    
1846     @Test:
1847     @@QName: MCDecodeHandle.utf-16-with-bom.6.test
1848     @@PerlDef:
1849     my $impl;
1850     __CODE{createImplForTest:: $impl => $impl}__;
1851    
1852     my $byte = qq<\xFF\xFEa>;
1853    
1854     my $error;
1855    
1856     open my $fh, '<', \$byte;
1857     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1858     (<Q::cs|XML.utf-16>, $fh, sub { $error = $_[2] });
1859    
1860     $test->id ('0.error');
1861     $test->assert_null ($error);
1862    
1863     $test->id (1);
1864     $test->assert_equals ($efh->getc, "a");
1865     $test->id ('1.error');
1866     $test->assert_equals ($error, <Q::cs|illegal-octets-error>);
1867     $error = null;
1868    
1869     $test->id (2);
1870     $test->assert_null ($efh->getc);
1871    
1872     $test->id ('err');
1873     $test->assert_null ($error);
1874    
1875     $test->id ('bom');
1876     $test->assert_true ($efh-><AG::MCDecodeHandle.hasBOM>);
1877    
1878     @Test:
1879     @@QName: MCDecodeHandle.utf-16-with-bom.7.test
1880     @@PerlDef:
1881     my $impl;
1882     __CODE{createImplForTest:: $impl => $impl}__;
1883    
1884     my $byte = qq<\xFE\xFF>;
1885    
1886     my $error;
1887    
1888     open my $fh, '<', \$byte;
1889     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1890     (<Q::cs|XML.utf-16>, $fh, sub { $error = $_[2] });
1891    
1892     $test->id (1);
1893     $test->assert_null ($efh->getc);
1894    
1895     $test->id ('err');
1896     $test->assert_null ($error);
1897    
1898     $test->id ('bom');
1899     $test->assert_true ($efh-><AG::MCDecodeHandle.hasBOM>);
1900    
1901     @Test:
1902     @@QName: MCDecodeHandle.utf-16-with-bom.8.test
1903     @@PerlDef:
1904     my $impl;
1905     __CODE{createImplForTest:: $impl => $impl}__;
1906    
1907     my $byte = qq<\xFF\xFE>;
1908    
1909     my $error;
1910    
1911     open my $fh, '<', \$byte;
1912     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1913     (<Q::cs|XML.utf-16>, $fh, sub { $error = $_[2] });
1914    
1915     $test->id (1);
1916     $test->assert_null ($efh->getc);
1917    
1918     $test->id ('err');
1919     $test->assert_null ($error);
1920    
1921     $test->id ('bom');
1922     $test->assert_true ($efh-><AG::MCDecodeHandle.hasBOM>);
1923    
1924     @Test:
1925     @@QName: MCDecodeHandle.utf-16-with-bom.9.test
1926     @@PerlDef:
1927     my $impl;
1928     __CODE{createImplForTest:: $impl => $impl}__;
1929    
1930     my $byte = qq<\xFD\xFF>;
1931    
1932     my $error;
1933    
1934     open my $fh, '<', \$byte;
1935     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1936     (<Q::cs|XML.utf-16>, $fh, sub { $error = $_[2] });
1937    
1938     $test->id ('no.bom');
1939     $test->assert_equals ($error, <Q::cs|no-bom-error>);
1940     $error = null;
1941    
1942     $test->id (1);
1943     $test->assert_equals ($efh->getc, "\x{FDFF}");
1944    
1945     $test->id (2);
1946     $test->assert_null ($efh->getc);
1947    
1948     $test->id ('err');
1949     $test->assert_null ($error);
1950    
1951     $test->id ('bom');
1952     $test->assert_false ($efh-><AG::MCDecodeHandle.hasBOM>);
1953    
1954     @Test:
1955     @@QName: MCDecodeHandle.utf-16-with-bom.10.test
1956     @@PerlDef:
1957     my $impl;
1958     __CODE{createImplForTest:: $impl => $impl}__;
1959    
1960     my $byte = qq<\xFD>;
1961    
1962     my $error;
1963    
1964     open my $fh, '<', \$byte;
1965     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1966     (<Q::cs|XML.utf-16>, $fh, sub { $error = $_[2] });
1967    
1968     $test->id ('no.bom');
1969     $test->assert_equals ($error, <Q::cs|no-bom-error>);
1970     $error = null;
1971    
1972     $test->id (1);
1973     $test->assert_equals ($efh->getc, "\xFD");
1974     $test->id ('1.error');
1975     $test->assert_equals ($error, <Q::cs|illegal-octets-error>);
1976     $error = null;
1977    
1978     $test->id (2);
1979     $test->assert_null ($efh->getc);
1980    
1981     $test->id ('err');
1982     $test->assert_null ($error);
1983    
1984     $test->id ('bom');
1985     $test->assert_false ($efh-><AG::MCDecodeHandle.hasBOM>);
1986    
1987     @Test:
1988     @@QName: MCDecodeHandle.utf-16-with-bom.11.test
1989     @@PerlDef:
1990     my $impl;
1991     __CODE{createImplForTest:: $impl => $impl}__;
1992    
1993     my $byte = qq<>;
1994    
1995     my $error;
1996    
1997     open my $fh, '<', \$byte;
1998     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1999     (<Q::cs|XML.utf-16>, $fh, sub { $error = $_[2] });
2000    
2001     $test->id ('no.bom');
2002     $test->assert_equals ($error, <Q::cs|no-bom-error>);
2003     $error = null;
2004    
2005     $test->id (1);
2006     $test->assert_null ($efh->getc);
2007    
2008     $test->id ('err');
2009     $test->assert_null ($error);
2010    
2011     $test->id ('bom');
2012     $test->assert_false ($efh-><AG::MCDecodeHandle.hasBOM>);
2013 wakaba 1.2
2014 wakaba 1.3 @Test:
2015     @@QName: MCDecodeHandle.xml.1.test
2016     @@PerlDef:
2017     my $impl;
2018     __CODE{createImplForTest:: $impl => $impl}__;
2019 wakaba 1.2
2020 wakaba 1.3 my @testdata = (
2021     {
2022     id => q<l=0>,
2023     in => q<>,
2024     out => [null],
2025     name => 'utf-8', bom => false,
2026     },
2027     {
2028     id => q<l=1>,
2029     in => "a",
2030     out => [null, "a", null],
2031     name => 'utf-8', bom => false,
2032     },
2033     {
2034     id => q<bom8.l=0>,
2035     in => "\xEF\xBB\xBF",
2036     out => [null],
2037     name => 'utf-8', bom => true,
2038     },
2039     {
2040     id => q<bom8.l=1>,
2041     in => "\xEF\xBB\xBFa",
2042     out => [null, "a", null],
2043     name => 'utf-8', bom => true,
2044     },
2045     {
2046     id => q<bom8.zwnbsp>,
2047     in => "\xEF\xBB\xBF\xEF\xBB\xBF",
2048     out => [null, "\x{FEFF}", null],
2049     name => 'utf-8', bom => true,
2050     },
2051     {
2052     id => q<bom16be.l=0>,
2053     in => "\xFE\xFF",
2054     out => [null],
2055     name => 'utf-16', bom => true,
2056     },
2057     {
2058     id => q<bom16le.l=0>,
2059     in => "\xFF\xFE",
2060     out => [null],
2061     name => 'utf-16', bom => true,
2062     },
2063     {
2064     id => q<bom16be.l=1>,
2065     in => "\xFE\xFFa",
2066     out => [null, "a", [<Q::cs|illegal-octets-error>]],
2067     name => 'utf-16', bom => true,
2068     },
2069     {
2070     id => q<bom16le.l=1>,
2071     in => "\xFF\xFEa",
2072     out => [null, "a", [<Q::cs|illegal-octets-error>]],
2073     name => 'utf-16', bom => true,
2074     },
2075     {
2076     id => q<bom16be.l=2>,
2077     in => "\xFE\xFF\x4E\x00",
2078     out => [null, "\x{4E00}", null],
2079     name => 'utf-16', bom => true,
2080     },
2081     {
2082     id => q<bom16le.l=2>,
2083     in => "\xFF\xFE\x00\x4E",
2084     out => [null, "\x{4E00}", null],
2085     name => 'utf-16', bom => true,
2086     },
2087     {
2088     id => q<bom16be.l=2lt>,
2089     in => "\xFE\xFF\x00<",
2090     out => [null, "<", null],
2091     name => 'utf-16', bom => true,
2092     },
2093     {
2094     id => q<bom16le.l=2lt>,
2095     in => "\xFF\xFE<\x00",
2096     out => [null, "<", null],
2097     name => 'utf-16', bom => true,
2098     },
2099     {
2100     id => q<bom16be.zwnbsp>,
2101     in => "\xFE\xFF\xFE\xFF",
2102     out => [null, "\x{FEFF}", null],
2103     name => 'utf-16', bom => true,
2104     },
2105     {
2106     id => q<bom16le.zwnbsp>,
2107     in => "\xFF\xFE\xFF\xFE",
2108     out => [null, "\x{FEFF}", null],
2109     name => 'utf-16', bom => true,
2110     },
2111     {
2112     id => q<bom32e3412.l=0>,
2113     in => "\xFE\xFF\x00\x00",
2114     out => [null, "\x00", null],
2115     name => 'utf-16', bom => true,
2116     },
2117     {
2118     id => q<bom32e4321.l=0>,
2119     in => "\xFF\xFE\x00\x00",
2120     out => [null, "\x00", null],
2121     name => 'utf-16', bom => true,
2122     },
2123     {
2124     id => q<bom16be.l=4ltq>,
2125     in => "\xFE\xFF\x00<\x00?",
2126     out => [null, "<", null, "?", null],
2127     name => 'utf-16', bom => true,
2128     },
2129     {
2130     id => q<bom16le.l=4ltq>,
2131     in => "\xFF\xFE<\x00?\x00",
2132     out => [null, "<", null, "?", null],
2133     name => 'utf-16', bom => true,
2134     },
2135     {
2136     id => q<bom16be.decl.1>,
2137     in => qq[\xFE\xFF\x00<\x00?\x00x\x00m\x00l\x00 \x00v\x00e\x00r].
2138     qq[\x00s\x00i\x00o\x00n\x00=\x00"\x001\x00.\x000\x00"].
2139     qq[\x00 \x00e\x00n\x00c\x00o\x00d\x00i\x00n\x00g\x00=].
2140     qq[\x00"\x00u\x00t\x00f\x00-\x001\x006\x00"\x00?\x00>],
2141     out => [null, "<", null, "?", null, "x", null, "m", null, "l", null,
2142     " ", null, "v", null, "e", null, "r", null, "s", null,
2143     "i", null, "o", null, "n", null, "=", null, '"', null,
2144     "1", null, ".", null, "0", null, '"', null, " ", null,
2145     "e", null, "n", null, "c", null, "o", null, "d", null,
2146     "i", null, "n", null, "g", null, "=", null, '"', null,
2147     "u", null, "t", null, "f", null, "-", null, "1", null,
2148     "6", null, '"', null, "?", null, ">", null],
2149     name => 'utf-16', bom => true,
2150     },
2151     {
2152     id => q<bom16le.decl.1>,
2153     in => qq[\xFF\xFE<\x00?\x00x\x00m\x00l\x00 \x00v\x00e\x00r].
2154     qq[\x00s\x00i\x00o\x00n\x00=\x00"\x001\x00.\x000\x00"].
2155     qq[\x00 \x00e\x00n\x00c\x00o\x00d\x00i\x00n\x00g\x00=].
2156     qq[\x00"\x00u\x00t\x00f\x00-\x001\x006\x00"\x00?\x00>\x00],
2157     out => [null, "<", null, "?", null, "x", null, "m", null, "l", null,
2158     " ", null, "v", null, "e", null, "r", null, "s", null,
2159     "i", null, "o", null, "n", null, "=", null, '"', null,
2160     "1", null, ".", null, "0", null, '"', null, " ", null,
2161     "e", null, "n", null, "c", null, "o", null, "d", null,
2162     "i", null, "n", null, "g", null, "=", null, '"', null,
2163     "u", null, "t", null, "f", null, "-", null, "1", null,
2164     "6", null, '"', null, "?", null, ">", null],
2165     name => 'utf-16', bom => true,
2166     },
2167     {
2168     id => q<utf16be.decl.1>,
2169     in => qq[\x00<\x00?\x00x\x00m\x00l\x00 \x00v\x00e\x00r].
2170     qq[\x00s\x00i\x00o\x00n\x00=\x00"\x001\x00.\x000\x00"].
2171     qq[\x00 \x00e\x00n\x00c\x00o\x00d\x00i\x00n\x00g\x00=].
2172     qq[\x00"\x00u\x00t\x00f\x00-\x001\x006\x00b\x00e\x00"\x00?\x00>],
2173     out => [null, "<", null, "?", null, "x", null, "m", null, "l", null,
2174     " ", null, "v", null, "e", null, "r", null, "s", null,
2175     "i", null, "o", null, "n", null, "=", null, '"', null,
2176     "1", null, ".", null, "0", null, '"', null, " ", null,
2177     "e", null, "n", null, "c", null, "o", null, "d", null,
2178     "i", null, "n", null, "g", null, "=", null, '"', null,
2179     "u", null, "t", null, "f", null, "-", null, "1", null,
2180     "6", null, "b", null, "e", null, '"', null,
2181     "?", null, ">", null],
2182     name => 'utf-16be', bom => false,
2183     },
2184     {
2185     id => q<utf16le.decl.1>,
2186     in => qq[<\x00?\x00x\x00m\x00l\x00 \x00v\x00e\x00r].
2187     qq[\x00s\x00i\x00o\x00n\x00=\x00"\x001\x00.\x000\x00"].
2188     qq[\x00 \x00e\x00n\x00c\x00o\x00d\x00i\x00n\x00g\x00=].
2189     qq[\x00"\x00u\x00t\x00f\x00-\x001\x006\x00l\x00e\x00"].
2190     qq[\x00?\x00>\x00],
2191     out => [null, "<", null, "?", null, "x", null, "m", null, "l", null,
2192     " ", null, "v", null, "e", null, "r", null, "s", null,
2193     "i", null, "o", null, "n", null, "=", null, '"', null,
2194     "1", null, ".", null, "0", null, '"', null, " ", null,
2195     "e", null, "n", null, "c", null, "o", null, "d", null,
2196     "i", null, "n", null, "g", null, "=", null, '"', null,
2197     "u", null, "t", null, "f", null, "-", null, "1", null,
2198     "6", null, "l", null, "e", null, '"', null, "?", null,
2199     ">", null],
2200     name => 'utf-16le', bom => false,
2201     },
2202     {
2203     id => q<16be.decl.1>,
2204     in => qq[\x00<\x00?\x00x\x00m\x00l\x00 \x00v\x00e\x00r].
2205     qq[\x00s\x00i\x00o\x00n\x00=\x00"\x001\x00.\x000\x00"].
2206     qq[\x00 \x00e\x00n\x00c\x00o\x00d\x00i\x00n\x00g\x00=].
2207     qq[\x00"\x00u\x00t\x00f\x00-\x001\x006\x00"\x00?\x00>],
2208     out => [[<Q::cs|charset-name-mismatch-error>],
2209     "<", null, "?", null, "x", null, "m", null, "l", null,
2210     " ", null, "v", null, "e", null, "r", null, "s", null,
2211     "i", null, "o", null, "n", null, "=", null, '"', null,
2212     "1", null, ".", null, "0", null, '"', null, " ", null,
2213     "e", null, "n", null, "c", null, "o", null, "d", null,
2214     "i", null, "n", null, "g", null, "=", null, '"', null,
2215     "u", null, "t", null, "f", null, "-", null, "1", null,
2216     "6", null, '"', null, "?", null, ">", null],
2217     name => 'utf-16', bom => false,
2218     },
2219     {
2220     id => q<16le.decl.1>,
2221     in => qq[<\x00?\x00x\x00m\x00l\x00 \x00v\x00e\x00r].
2222     qq[\x00s\x00i\x00o\x00n\x00=\x00"\x001\x00.\x000\x00"].
2223     qq[\x00 \x00e\x00n\x00c\x00o\x00d\x00i\x00n\x00g\x00=].
2224     qq[\x00"\x00u\x00t\x00f\x00-\x001\x006\x00"\x00?\x00>\x00],
2225     out => [[<Q::cs|charset-name-mismatch-error>],
2226     "<", null, "?", null, "x", null, "m", null, "l", null,
2227     " ", null, "v", null, "e", null, "r", null, "s", null,
2228     "i", null, "o", null, "n", null, "=", null, '"', null,
2229     "1", null, ".", null, "0", null, '"', null, " ", null,
2230     "e", null, "n", null, "c", null, "o", null, "d", null,
2231     "i", null, "n", null, "g", null, "=", null, '"', null,
2232     "u", null, "t", null, "f", null, "-", null, "1", null,
2233     "6", null, '"', null, "?", null, ">", null],
2234     name => 'utf-16', bom => false,
2235     },
2236     {
2237     id => q<8.decl.1>,
2238     in => qq[<?xml version="1.0" encoding="utf-8"?>],
2239     out => [null, "<", null, "?", null, "x", null, "m", null, "l", null,
2240     " ", null, "v", null, "e", null, "r", null, "s", null,
2241     "i", null, "o", null, "n", null, "=", null, '"', null,
2242     "1", null, ".", null, "0", null, '"', null, " ", null,
2243     "e", null, "n", null, "c", null, "o", null, "d", null,
2244     "i", null, "n", null, "g", null, "=", null, '"', null,
2245     "u", null, "t", null, "f", null, "-", null, "8", null,
2246     '"', null, "?", null, ">", null],
2247     name => 'utf-8', bom => false,
2248     },
2249     {
2250     id => q<8.decl.2>,
2251     in => qq[<?xml encoding="utf-8"?>],
2252     out => [null, "<", null, "?", null, "x", null, "m", null, "l", null,
2253     " ", null,
2254     "e", null, "n", null, "c", null, "o", null, "d", null,
2255     "i", null, "n", null, "g", null, "=", null, '"', null,
2256     "u", null, "t", null, "f", null, "-", null, "8", null,
2257     '"', null, "?", null, ">", null],
2258     name => 'utf-8', bom => false,
2259     },
2260     {
2261     id => q<8.decl.3>,
2262     in => qq[<?xml version="1.1" encoding="utf-8"?>],
2263     out => [null, "<", null, "?", null, "x", null, "m", null, "l", null,
2264     " ", null, "v", null, "e", null, "r", null, "s", null,
2265     "i", null, "o", null, "n", null, "=", null, '"', null,
2266     "1", null, ".", null, "1", null, '"', null, " ", null,
2267     "e", null, "n", null, "c", null, "o", null, "d", null,
2268     "i", null, "n", null, "g", null, "=", null, '"', null,
2269     "u", null, "t", null, "f", null, "-", null, "8", null,
2270     '"', null, "?", null, ">", null],
2271     name => 'utf-8', bom => false,
2272     },
2273     {
2274     id => q<8.decl.4>,
2275     in => qq[<?xml version="1.0"?>],
2276     out => [null, "<", null, "?", null, "x", null, "m", null, "l", null,
2277     " ", null, "v", null, "e", null, "r", null, "s", null,
2278     "i", null, "o", null, "n", null, "=", null, '"', null,
2279     "1", null, ".", null, "0", null, '"', null,
2280     "?", null, ">", null],
2281     name => 'utf-8', bom => false,
2282     },
2283     {
2284     id => q<bom8.decl.1>,
2285     in => qq[\xEF\xBB\xBF<?xml encoding="utf-8"?>],
2286     out => [null, "<", null, "?", null, "x", null, "m", null, "l", null,
2287     " ", null,
2288     "e", null, "n", null, "c", null, "o", null, "d", null,
2289     "i", null, "n", null, "g", null, "=", null, '"', null,
2290     "u", null, "t", null, "f", null, "-", null, "8", null,
2291     '"', null, "?", null, ">", null],
2292     name => 'utf-8', bom => true,
2293     },
2294     {
2295     id => q<us-ascii.decl.1>,
2296     in => qq[<?xml encoding="us-ascii"?>],
2297     out => [null, "<", null, "?", null, "x", null, "m", null, "l", null,
2298     " ", null,
2299     "e", null, "n", null, "c", null, "o", null, "d", null,
2300     "i", null, "n", null, "g", null, "=", null, '"', null,
2301     "u", null, "s", null, "-", null, "a", null, "s", null,
2302     "c", null, "i", null, "i", null,
2303     '"', null, "?", null, ">", null],
2304     name => 'us-ascii', bom => false,
2305     },
2306     {
2307     id => q<us-ascii.decl.2>,
2308     in => qq[<?xml encoding="US-ascii"?>],
2309     out => [null, "<", null, "?", null, "x", null, "m", null, "l", null,
2310     " ", null,
2311     "e", null, "n", null, "c", null, "o", null, "d", null,
2312     "i", null, "n", null, "g", null, "=", null, '"', null,
2313     "U", null, "S", null, "-", null, "a", null, "s", null,
2314     "c", null, "i", null, "i", null,
2315     '"', null, "?", null, ">", null],
2316     name => 'us-ascii', bom => false,
2317     },
2318     {
2319     id => q<us-ascii.decl.3>,
2320     in => qq[<?xml encoding='us-ascii'?>],
2321     out => [null, "<", null, "?", null, "x", null, "m", null, "l", null,
2322     " ", null,
2323     "e", null, "n", null, "c", null, "o", null, "d", null,
2324     "i", null, "n", null, "g", null, "=", null, "'", null,
2325     "u", null, "s", null, "-", null, "a", null, "s", null,
2326     "c", null, "i", null, "i", null,
2327     "'", null, "?", null, ">", null],
2328     name => 'us-ascii', bom => false,
2329     },
2330     );
2331    
2332     for my $testdata (@testdata) {
2333     my $byte = $testdata->{in};
2334     my $error;
2335     my $i = 0;
2336    
2337     open my $fh, '<', \$byte;
2338     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
2339     (<Q::xml-auto-charset:>, $fh, sub {
2340     my (null, null, $etype, %opt) = @_;
2341     $error = [$etype, \%opt];
2342     });
2343    
2344     $test->id ("$testdata->{id}.bom");
2345     my $tf = $testdata->{bom} ? 'assert_true' : 'assert_false';
2346     $test->$tf ($efh-><AG::MCDecodeHandle.hasBOM>);
2347    
2348     $test->id ("$testdata->{id}.name");
2349     $test->assert_equals ($efh-><AG::MCDecodeHandle.inputEncoding>,
2350     $testdata->{name});
2351    
2352     while (@{$testdata->{out}}) {
2353     if ($i != 0) {
2354     my $c = shift @{$testdata->{out}};
2355     $test->id ("$testdata->{id}.$i");
2356     $test->assert_equals ($efh->getc, $c);
2357     }
2358    
2359     my $v = shift @{$testdata->{out}};
2360     $test->id ("$testdata->{id}.$i.error");
2361     if (defined $v) {
2362     $test->assert_not_null ($error);
2363     $test->assert_equals ($error->[0], $v->[0]);
2364     } else {
2365     $test->assert_null ($error->[0]);
2366     }
2367     $error = null;
2368     $i++;
2369     }
2370 wakaba 1.2
2371 wakaba 1.3 $test->id ("$testdata->{id}.eof");
2372     $test->assert_null ($efh->getc);
2373     $test->assert_null ($error);
2374     } # testdata
2375 wakaba 1.7
2376     @Method:
2377     @@Name: close
2378     @@Return:
2379     @@@PerlDef:
2380     close $self->{<H::mce|filehandle>};
2381 wakaba 1.3 ##MCDecodeHandle
2382 wakaba 1.5
2383     ClsDef:
2384     @ClsQName: ManakaiMCEUCJPDecodeHandle
2385    
2386     @ClsISA: ManakaiMCDecodeHandle
2387    
2388     @Method:
2389     @@Name: getc
2390     @@enDesc:
2391     Returns the next character from the input.
2392     @@Return:
2393     @@@Type: String
2394     @@@enDesc:
2395     The next character.
2396     @@@nullCase:
2397     @@@@enDesc:
2398     If at the end of the file, or if there was an error, in which
2399     case <Perl::$!> is set.
2400     @@@PerlDef:
2401     if (@{$self->{<H::mce|characterQueue>}}) {
2402     $r = shift @{$self->{<H::mce|characterQueue>}};
2403     } else {
2404     __DEEP{
2405     my $error;
2406     if ($self->{<H::mce|continue>}) {
2407     if (read $self->{<H::mce|filehandle>},
2408     $self->{<H::mce|byteBuffer>}, 256,
2409     length $self->{<H::mce|byteBuffer>}) {
2410     #
2411     } else {
2412     $error = true;
2413     }
2414     $self->{<H::mce|continue>} = false;
2415     } elsif (512 > length $self->{<H::mce|byteBuffer>}) {
2416     read $self->{<H::mce|filehandle>},
2417     $self->{<H::mce|byteBuffer>}, 256,
2418     length $self->{<H::mce|byteBuffer>};
2419     }
2420    
2421     unless ($error) {
2422     my $string = Encode::decode ($self->{<H::mce|perlEncodingName>},
2423     $self->{<H::mce|byteBuffer>},
2424     Encode::FB_QUIET ());
2425     if (length $string) {
2426     push @{$self->{<H::mce|characterQueue>}}, split //, $string;
2427     $r = shift @{$self->{<H::mce|characterQueue>}};
2428     if (length $self->{<H::mce|byteBuffer>}) {
2429     $self->{<H::mce|continue>} = true;
2430     }
2431     } else {
2432     if (length $self->{<H::mce|byteBuffer>}) {
2433     $error = true;
2434     } else {
2435     $r = null;
2436     }
2437     }
2438     }
2439     if ($error) {
2440     $r = substr $self->{<H::mce|byteBuffer>}, 0, 1, '';
2441     my $etype = <Q::cs|illegal-octets-error>;
2442     if ($r =~ /^[\xA1-\xFE]/) {
2443     if ($self->{<H::mce|byteBuffer>} =~ s/^([\xA1-\xFE])//) {
2444     $r .= $1;
2445     $etype = <Q::cs|unassigned-code-point-error>;
2446     }
2447     } elsif ($r eq "\x8F") {
2448     if ($self->{<H::mce|byteBuffer>}
2449     =~ s/^([\xA1-\xFE][\xA1-\xFE]?)//) {
2450     $r .= $1;
2451     $etype = <Q::cs|unassigned-code-point-error> if length $1 == 2;
2452     }
2453     } elsif ($r eq "\x8E") {
2454     if ($self->{<H::mce|byteBuffer>} =~ s/^([\xA1-\xFE])//) {
2455     $r .= $1;
2456     $etype = <Q::cs|unassigned-code-point-error>;
2457     }
2458     } elsif ($r eq "\xA0" or $r eq "\xFF") {
2459     $etype = <Q::cs|unassigned-code-point-error>;
2460     }
2461     $self->{<H::mce|onerror>}
2462     ->($self->{<H::mce|onerror>}, $self, $etype,
2463     octets => \$r);
2464     }
2465     }__;
2466     }
2467    
2468    
2469     @Test:
2470     @@QName: MCEUCJPDecodeHandle.1.test
2471     @@PerlDef:
2472     my $impl;
2473     __CODE{createImplForTest:: $impl => $impl}__;
2474    
2475     my @testdata = (
2476     {
2477     id => q<l=0>,
2478     in => q<>,
2479     out => [null],
2480     },
2481     {
2482     id => q<l=1.00>,
2483     in => qq<\x00>,
2484     out => [null, "\x00", null],
2485     },
2486     {
2487     id => q<l=1.0d>,
2488     in => qq<\x0D>,
2489     out => [null, "\x0D", null],
2490     },
2491     {
2492     id => q<l=1.0e>,
2493     in => qq<\x0E>,
2494     out => [null, "\x0E", null],
2495     }, # Error??
2496     {
2497     id => q<l=1.0f>,
2498     in => qq<\x0F>,
2499     out => [null, "\x0F", null],
2500     }, # Error??
2501     {
2502     id => q<l=1.1b>,
2503     in => qq<\x1B>,
2504     out => [null, "\x1B", null],
2505     }, # Error??
2506     {
2507     id => q<l=1.a>,
2508     in => q<a>,
2509     out => [null, "a", null],
2510     },
2511     {
2512     id => q<l=1.20>,
2513     in => qq<\x20>,
2514     out => [null, "\x20", null],
2515     },
2516     {
2517     id => q<5C>,
2518     in => qq<\x5C>,
2519     out => [null, "\x5C", null],
2520     },
2521     {
2522     id => q<l=1.7E>,
2523     in => qq<\x7E>,
2524     out => [null, "\x7E", null],
2525     },
2526     {
2527     id => q<l=1.7F>,
2528     in => qq<\x7F>,
2529     out => [null, "\x7F", null],
2530     },
2531     {
2532     id => q<l=1.80>,
2533     in => qq<\x80>,
2534     out => [null, "\x80", null],
2535     },
2536     {
2537     id => q<l=1.8c>,
2538     in => qq<\x8C>,
2539     out => [null, "\x8C", null],
2540     },
2541     {
2542     id => q<l=1.8e>,
2543     in => qq<\x8E>,
2544     out => [null, "\x8E", [<Q::cs|illegal-octets-error>]],
2545     },
2546     {
2547     id => q<l=1.8f>,
2548     in => qq<\x8F>,
2549     out => [null, "\x8F", [<Q::cs|illegal-octets-error>]],
2550     },
2551     {
2552     id => q<l=1.a0>,
2553     in => qq<\xA0>,
2554     out => [null, "\xA0", [<Q::cs|unassigned-code-point-error>]],
2555     },
2556     {
2557     id => q<l=1.a1>,
2558     in => qq<\xA1>,
2559     out => [null, "\xA1", [<Q::cs|illegal-octets-error>]],
2560     },
2561     {
2562     id => q<l=1.a2>,
2563     in => qq<\xA2>,
2564     out => [null, "\xA2", [<Q::cs|illegal-octets-error>]],
2565     },
2566     {
2567     id => q<l=1.fd>,
2568     in => qq<\xFD>,
2569     out => [null, "\xFD", [<Q::cs|illegal-octets-error>]],
2570     },
2571     {
2572     id => q<l=1.fe>,
2573     in => qq<\xFE>,
2574     out => [null, "\xFE", [<Q::cs|illegal-octets-error>]],
2575     },
2576     {
2577     id => q<l=1.ff>,
2578     in => qq<\xFF>,
2579     out => [null, "\xFF", [<Q::cs|unassigned-code-point-error>]],
2580     },
2581     {
2582     id => q<l=2.0000>,
2583     in => qq<\x00\x00>,
2584     out => [null, "\x00", null, "\x00", null],
2585     },
2586     {
2587     id => q<l=2.0D0A>,
2588     in => qq<\x0D\x0A>,
2589     out => [null, "\x0D", null, "\x0A", null],
2590     },
2591     {
2592     id => q<l=2.1B28>,
2593     in => qq<\x1B\x28>,
2594     out => [null, "\x1B", null, "\x28", null],
2595     },# Error??
2596     {
2597     id => q<l=2.2020>,
2598     in => qq<\x20\x20>,
2599     out => [null, "\x20", null, "\x20", null],
2600     },
2601     {
2602     id => q<l=2.ab>,
2603     in => qq<ab>,
2604     out => [null, "a", null, "b", null],
2605     },
2606     {
2607     id => q<l=2.a0a1>,
2608     in => qq<\xA0\xA1>,
2609     out => [null, "\xA0", [<Q::cs|unassigned-code-point-error>],
2610     "\xA1", [<Q::cs|illegal-octets-error>]],
2611     },
2612     {
2613     id => q<l=2.a1a1>,
2614     in => qq<\xA1\xA1>,
2615     out => [null, "\x{3000}", null],
2616     },
2617     {
2618     id => q<l=2.a1a2>,
2619     in => qq<\xA1\xA2>,
2620     out => [null, "\x{3001}", null],
2621     },
2622     {
2623     id => q<l=2.a1a4>,
2624     in => qq<\xA1\xA4>,
2625     out => [null, "\x{FF0C}", null], # FULLWIDTH COMMA
2626     },
2627     {
2628     id => q<a1a6>,
2629     in => qq<\xA1\xA6>,
2630     out => [null, "\x{30FB}", null], # KATAKABA MIDDLE DOT
2631     },
2632     {
2633     id => q<a1a7>,
2634     in => qq<\xA1\xA7>,
2635     out => [null, "\x{FF1A}", null], # FULLWIDTH COLON
2636     },
2637     {
2638     id => q<a1b1>,
2639     in => qq<\xA1\xB1>,
2640     out => [null, "\x{203E}", null], # OVERLINE
2641     },
2642     {
2643     id => q<a1bd>,
2644     in => qq<\xA1\xBD>,
2645     out => [null, "\x{2014}", null], # EM DASH
2646     },
2647     {
2648     id => q<a1c0>,
2649     in => qq<\xA1\xC0>,
2650     out => [null, "\x{FF3C}", null], # FULLWIDTH REVERSE SOLIDUS
2651     },
2652     {
2653     id => q<a1c1>,
2654     in => qq<\xA1\xC1>,
2655     out => [null, "\x{301C}", null], # WAVE DASH
2656     },
2657     {
2658     id => q<a1c2>,
2659     in => qq<\xA1\xC2>,
2660     out => [null, "\x{2016}", null], # DOUBLE VERTICAL LINE
2661     },
2662     {
2663     id => q<a1c4>,
2664     in => qq<\xA1\xC4>,
2665     out => [null, "\x{2026}", null], # HORIZONTAL ELLIPSIS
2666     },
2667     {
2668     id => q<a1dd>,
2669     in => qq<\xA1\xDD>,
2670     out => [null, "\x{2212}", null], # MINUS SIGN
2671     },
2672     {
2673     id => q<a1ef>,
2674     in => qq<\xA1\xEF>,
2675     out => [null, "\x{00A5}", null], # YEN SIGN
2676     },
2677     {
2678     id => q<a1f1>,
2679     in => qq<\xA1\xF1>,
2680     out => [null, "\x{00A2}", null], # CENT SIGN
2681     },
2682     {
2683     id => q<a1f2>,
2684     in => qq<\xA1\xF2>,
2685     out => [null, "\x{00A3}", null], # POUND SIGN
2686     },
2687     {
2688     id => q<a1f2>,
2689     in => qq<\xA1\xFF>,
2690     out => [null, "\xA1", [<Q::cs|illegal-octets-error>],
2691     "\xFF", [<Q::cs|unassigned-code-point-error>]],
2692     },
2693     {
2694     id => q<a2ae>,
2695     in => qq<\xA2\xAE>,
2696     out => [null, "\x{3013}", null], # GETA MARK
2697     },
2698     {
2699     id => q<a2af>,
2700     in => qq<\xA2\xAF>,
2701     out => [null, "\xA2\xAF", [<Q::cs|unassigned-code-point-error>]],
2702     },
2703     {
2704     id => q<a2ba>,
2705     in => qq<\xA2\xBA>,
2706     out => [null, "\x{2208}", null], # ELEMENT OF
2707     },
2708     {
2709     id => q<a2fe>,
2710     in => qq<\xA2\xFE>,
2711     out => [null, "\x{25EF}", null], # LARGE CIRCLE
2712     },
2713     {
2714     id => q<adce>,
2715     in => qq<\xAD\xCE>,
2716     out => [null, "\xAD\xCE", [<Q::cs|unassigned-code-point-error>]],
2717     },
2718     {
2719     id => q<b0a6>,
2720     in => qq<\xB0\xA6>,
2721     out => [null, "\x{611B}", null], # han
2722     },
2723     {
2724     id => q<f4a6>,
2725     in => qq<\xF4\xA6>,
2726     out => [null, "\x{7199}", null], # han
2727     },
2728     {
2729     id => q<8ea1>,
2730     in => qq<\x8E\xA1>,
2731     out => [null, "\x{FF61}", null],
2732     },
2733     {
2734     id => q<8efe>,
2735     in => qq<\x8E\xFE>,
2736     out => [null, "\x8E\xFE", [<Q::cs|unassigned-code-point-error>]],
2737     },
2738     {
2739     id => q<8ffe>,
2740     in => qq<\x8F\xFE>,
2741     out => [null, "\x8F\xFE", [<Q::cs|illegal-octets-error>]],
2742     },
2743     {
2744     id => q<l=2.a1a2a3>,
2745     in => qq<\xA1\xA2\xA3>,
2746     out => [null, "\x{3001}", null,
2747     "\xA3", [<Q::cs|illegal-octets-error>]],
2748     },
2749     {
2750     id => q<8ea1a1>,
2751     in => qq<\x8E\xA1\xA1>,
2752     out => [null, "\x{FF61}", null,
2753     "\xA1", [<Q::cs|illegal-octets-error>]],
2754     },
2755     {
2756     id => q<8fa1a1>,
2757     in => qq<\x8F\xA1\xA1>,
2758     out => [null, "\x8F\xA1\xA1", [<Q::cs|unassigned-code-point-error>]],
2759     },
2760     {
2761     id => q<8fa2af>,
2762     in => qq<\x8F\xA2\xAF>,
2763     out => [null, "\x{02D8}", null],
2764     },
2765     {
2766     id => q<8fa2b7>,
2767     in => qq<\x8F\xA2\xB7>,
2768     out => [null, "\x{FF5E}", null], # FULLWIDTH TILDE
2769     },
2770     {
2771     id => q<a1a2a1a3>,
2772     in => qq<\xA1\xA2\xA1\xA3>,
2773     out => [null, "\x{3001}", null, "\x{3002}", null],
2774     },
2775     {
2776     id => q<8fa2af>,
2777     in => qq<\x8F\xA2\xAF\xAF>,
2778     out => [null, "\x{02D8}", null,
2779     "\xAF", [<Q::cs|illegal-octets-error>]],
2780     },
2781     {
2782     id => q<8fa2afafa1>,
2783     in => qq<\x8F\xA2\xAF\xAF\xA1>,
2784     out => [null, "\x{02D8}", null,
2785     "\xAF\xA1", [<Q::cs|unassigned-code-point-error>]],
2786     },
2787     );
2788    
2789     for my $testdata (@testdata) {
2790     my $byte = $testdata->{in};
2791     my $error;
2792     my $i = 0;
2793    
2794     open my $fh, '<', \$byte;
2795     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
2796     (<Q::cs|XML.euc-jp>, $fh, sub {
2797     my (null, null, $etype, %opt) = @_;
2798     $error = [$etype, \%opt];
2799     });
2800    
2801     while (@{$testdata->{out}}) {
2802     if ($i != 0) {
2803     my $c = shift @{$testdata->{out}};
2804     $test->id ("$testdata->{id}.$i");
2805     $test->assert_equals ($efh->getc, $c);
2806     }
2807    
2808     my $v = shift @{$testdata->{out}};
2809     $test->id ("$testdata->{id}.$i.error");
2810     if (defined $v) {
2811     $test->assert_not_null ($error);
2812     $test->assert_equals ($error->[0], $v->[0]);
2813     } else {
2814     $test->assert_null ($error->[0]);
2815     }
2816     $error = null;
2817     $i++;
2818     }
2819    
2820     $test->id ("$testdata->{id}.eof");
2821     $test->assert_null ($efh->getc);
2822     $test->assert_null ($error);
2823     } # testdata
2824     ##MCEUCJPDecodeHandle
2825    
2826     ClsDef:
2827     @ClsQName: ManakaiMCShiftJISDecodeHandle
2828    
2829     @ClsISA: ManakaiMCDecodeHandle
2830    
2831     @Method:
2832     @@Name: getc
2833     @@enDesc:
2834     Returns the next character from the input.
2835     @@Return:
2836     @@@Type: String
2837     @@@enDesc:
2838     The next character.
2839     @@@nullCase:
2840     @@@@enDesc:
2841     If at the end of the file, or if there was an error, in which
2842     case <Perl::$!> is set.
2843     @@@PerlDef:
2844     if (@{$self->{<H::mce|characterQueue>}}) {
2845     $r = shift @{$self->{<H::mce|characterQueue>}};
2846     } else {
2847     __DEEP{
2848     my $error;
2849     if ($self->{<H::mce|continue>}) {
2850     if (read $self->{<H::mce|filehandle>},
2851     $self->{<H::mce|byteBuffer>}, 256,
2852     length $self->{<H::mce|byteBuffer>}) {
2853     #
2854     } else {
2855     $error = true;
2856     }
2857     $self->{<H::mce|continue>} = false;
2858     } elsif (512 > length $self->{<H::mce|byteBuffer>}) {
2859     read $self->{<H::mce|filehandle>},
2860     $self->{<H::mce|byteBuffer>}, 256,
2861     length $self->{<H::mce|byteBuffer>};
2862     }
2863    
2864     unless ($error) {
2865     my $string = Encode::decode ($self->{<H::mce|perlEncodingName>},
2866     $self->{<H::mce|byteBuffer>},
2867     Encode::FB_QUIET ());
2868     if (length $string) {
2869     push @{$self->{<H::mce|characterQueue>}}, split //, $string;
2870     $r = shift @{$self->{<H::mce|characterQueue>}};
2871     if (length $self->{<H::mce|byteBuffer>}) {
2872     $self->{<H::mce|continue>} = true;
2873     }
2874     } else {
2875     if (length $self->{<H::mce|byteBuffer>}) {
2876     $error = true;
2877     } else {
2878     $r = null;
2879     }
2880     }
2881     }
2882     if ($error) {
2883     $r = substr $self->{<H::mce|byteBuffer>}, 0, 1, '';
2884     my $etype = <Q::cs|illegal-octets-error>;
2885     if ($r =~ /^[\x81-\x9F\xE0-\xEF]/) {
2886     if ($self->{<H::mce|byteBuffer>} =~ s/(.)//s) {
2887     $r .= $1; # not limited to \x40-\xFC - \x7F
2888     $etype = <Q::cs|unassigned-code-point-error>;
2889     }
2890     } elsif ($r =~ /^[\x80\xA0\xF0-\xFF]/) {
2891     $etype = <Q::cs|unassigned-code-point-error>;
2892     }
2893     $self->{<H::mce|onerror>}
2894     ->($self->{<H::mce|onerror>}, $self, $etype,
2895     octets => \$r);
2896     }
2897     }__;
2898     }
2899    
2900     @Test:
2901     @@QName: MCShiftJISDecodeHandle.1.test
2902     @@PerlDef:
2903     my $impl;
2904     __CODE{createImplForTest:: $impl => $impl}__;
2905    
2906     my @testdata = (
2907     {
2908     id => q<l=0>,
2909     in => q<>,
2910     out => [null],
2911     },
2912     {
2913     id => q<l=1.00>,
2914     in => qq<\x00>,
2915     out => [null, "\x00", null],
2916     },
2917     {
2918     id => q<l=1.0d>,
2919     in => qq<\x0D>,
2920     out => [null, "\x0D", null],
2921     },
2922     {
2923     id => q<l=1.0e>,
2924     in => qq<\x0E>,
2925     out => [null, "\x0E", null],
2926     }, # Error??
2927     {
2928     id => q<l=1.0f>,
2929     in => qq<\x0F>,
2930     out => [null, "\x0F", null],
2931     }, # Error??
2932     {
2933     id => q<l=1.1b>,
2934     in => qq<\x1B>,
2935     out => [null, "\x1B", null],
2936     }, # Error??
2937     {
2938     id => q<l=1.a>,
2939     in => q<a>,
2940     out => [null, "a", null],
2941     },
2942     {
2943     id => q<l=1.20>,
2944     in => qq<\x20>,
2945     out => [null, "\x20", null],
2946     },
2947     {
2948     id => q<l=1.5C>,
2949     in => qq<\x5C>,
2950     out => [null, "\xA5", null], # YEN SIGN
2951     },
2952     {
2953     id => q<l=1.7E>,
2954     in => qq<\x7E>,
2955     out => [null, "\x{203E}", null], # OVERLINE
2956     },
2957     {
2958     id => q<l=1.7F>,
2959     in => qq<\x7F>,
2960     out => [null, "\x7F", null],
2961     },
2962     {
2963     id => q<l=1.80>,
2964     in => qq<\x80>,
2965     out => [null, "\x80", [<Q::cs|unassigned-code-point-error>]],
2966     },
2967     {
2968     id => q<l=1.8c>,
2969     in => qq<\x8C>,
2970     out => [null, "\x8C", [<Q::cs|illegal-octets-error>]],
2971     },
2972     {
2973     id => q<l=1.8e>,
2974     in => qq<\x8E>,
2975     out => [null, "\x8E", [<Q::cs|illegal-octets-error>]],
2976     },
2977     {
2978     id => q<l=1.8f>,
2979     in => qq<\x8F>,
2980     out => [null, "\x8F", [<Q::cs|illegal-octets-error>]],
2981     },
2982     {
2983     id => q<l=1.a0>,
2984     in => qq<\xA0>,
2985     out => [null, "\xA0", [<Q::cs|unassigned-code-point-error>]],
2986     },
2987     {
2988     id => q<l=1.a1>,
2989     in => qq<\xA1>,
2990     out => [null, "\x{FF61}", null],
2991     },
2992     {
2993     id => q<l=1.a2>,
2994     in => qq<\xA2>,
2995     out => [null, "\x{FF62}", null],
2996     },
2997     {
2998     id => q<l=1.df>,
2999     in => qq<\xdf>,
3000     out => [null, "\x{FF9F}", null],
3001     },
3002     {
3003     id => q<l=1.e0>,
3004     in => qq<\xe0>,
3005     out => [null, "\xE0", [<Q::cs|illegal-octets-error>]],
3006     },
3007     {
3008     id => q<l=1.ef>,
3009     in => qq<\xEF>,
3010     out => [null, "\xEF", [<Q::cs|illegal-octets-error>]],
3011     },
3012     {
3013     id => q<F0>,
3014     in => qq<\xF0>,
3015     out => [null, "\xF0", [<Q::cs|unassigned-code-point-error>]],
3016     },
3017     {
3018     id => q<l=1.fc>,
3019     in => qq<\xFC>,
3020     out => [null, "\xFC", [<Q::cs|unassigned-code-point-error>]],
3021     },
3022     {
3023     id => q<l=1.fd>,
3024     in => qq<\xFD>,
3025     out => [null, "\xFD", [<Q::cs|unassigned-code-point-error>]],
3026     },
3027     {
3028     id => q<l=1.fe>,
3029     in => qq<\xFE>,
3030     out => [null, "\xFE", [<Q::cs|unassigned-code-point-error>]],
3031     },
3032     {
3033     id => q<l=1.ff>,
3034     in => qq<\xFF>,
3035     out => [null, "\xFF", [<Q::cs|unassigned-code-point-error>]],
3036     },
3037     {
3038     id => q<l=2.0000>,
3039     in => qq<\x00\x00>,
3040     out => [null, "\x00", null, "\x00", null],
3041     },
3042     {
3043     id => q<l=2.0D0A>,
3044     in => qq<\x0D\x0A>,
3045     out => [null, "\x0D", null, "\x0A", null],
3046     },
3047     {
3048     id => q<l=2.1B28>,
3049     in => qq<\x1B\x28>,
3050     out => [null, "\x1B", null, "\x28", null],
3051     },# Error??
3052     {
3053     id => q<l=2.2020>,
3054     in => qq<\x20\x20>,
3055     out => [null, "\x20", null, "\x20", null],
3056     },
3057     {
3058     id => q<l=2.ab>,
3059     in => qq<ab>,
3060     out => [null, "a", null, "b", null],
3061     },
3062     {
3063     id => q<8040>,
3064     in => qq<\x80\x40>,
3065     out => [null, "\x80", [<Q::cs|unassigned-code-point-error>],
3066     "\x40", null],
3067     },
3068     {
3069     id => q<8100>,
3070     in => qq<\x81\x00>,
3071     out => [null, "\x81\x00", [<Q::cs|unassigned-code-point-error>]],
3072     },
3073     {
3074     id => q<8101>,
3075     in => qq<\x81\x01>,
3076     out => [null, "\x81\x01", [<Q::cs|unassigned-code-point-error>]],
3077     },
3078     {
3079     id => q<813F>,
3080     in => qq<\x81\x3F>,
3081     out => [null, "\x81\x3F", [<Q::cs|unassigned-code-point-error>]],
3082     },
3083     {
3084     id => q<8140>,
3085     in => qq<\x81\x40>,
3086     out => [null, "\x{3000}", null],
3087     },
3088     {
3089     id => q<8141>,
3090     in => qq<\x81\x41>,
3091     out => [null, "\x{3001}", null],
3092     },
3093     {
3094     id => q<8143>,
3095     in => qq<\x81\x43>,
3096     out => [null, "\x{FF0C}", null], # FULLWIDTH COMMA
3097     },
3098     {
3099     id => q<8150>,
3100     in => qq<\x81\x50>,
3101     out => [null, "\x{FFE3}", null], # FULLWIDTH MACRON
3102     },
3103     {
3104     id => q<815C>,
3105     in => qq<\x81\x5C>,
3106     out => [null, "\x{2014}", null], # EM DASH
3107     },
3108     {
3109     id => q<815F>,
3110     in => qq<\x81\x5F>,
3111     out => [null, "\x{005C}", null], # REVERSE SOLIDUS
3112     },
3113     {
3114     id => q<8160>,
3115     in => qq<\x81\x60>,
3116     out => [null, "\x{301C}", null], # WAVE DASH
3117     },
3118     {
3119     id => q<8161>,
3120     in => qq<\x81\x61>,
3121     out => [null, "\x{2016}", null], # DOUBLE VERTICAL LINE
3122     },
3123     {
3124     id => q<8163>,
3125     in => qq<\x81\x63>,
3126     out => [null, "\x{2026}", null], # HORIZONTAL ELLIPSIS
3127     },
3128     {
3129     id => q<817C>,
3130     in => qq<\x81\x7C>,
3131     out => [null, "\x{2212}", null], # MINUS SIGN
3132     },
3133     {
3134     id => q<817F>,
3135     in => qq<\x81\x7F>,
3136     out => [null, "\x81\x7F", [<Q::cs|unassigned-code-point-error>]],
3137     },
3138     {
3139     id => q<818F>,
3140     in => qq<\x81\x8F>,
3141     out => [null, "\x{FFE5}", null], # FULLWIDTH YEN SIGN
3142     },
3143     {
3144     id => q<8191>,
3145     in => qq<\x81\x91>,
3146     out => [null, "\x{00A2}", null], # CENT SIGN
3147     },
3148     {
3149     id => q<8192>,
3150     in => qq<\x81\x92>,
3151     out => [null, "\x{00A3}", null], # POUND SIGN
3152     },
3153     {
3154     id => q<81AC>,
3155     in => qq<\x81\xAC>,
3156     out => [null, "\x{3013}", null], # GETA MARK
3157     },
3158     {
3159     id => q<81AD>,
3160     in => qq<\x81\xAD>,
3161     out => [null, "\x81\xAD", [<Q::cs|unassigned-code-point-error>]],
3162     },
3163     {
3164     id => q<81B8>,
3165     in => qq<\x81\xB8>,
3166     out => [null, "\x{2208}", null], # ELEMENT OF
3167     },
3168     {
3169     id => q<81CA>,
3170     in => qq<\x81\xCA>,
3171     out => [null, "\x{00AC}", null], # NOT SIGN
3172     },
3173     {
3174     id => q<81FC>,
3175     in => qq<\x81\xFC>,
3176     out => [null, "\x{25EF}", null], # LARGE CIRCLE
3177     },
3178     {
3179     id => q<81FD>,
3180     in => qq<\x81\xFD>,
3181     out => [null, "\x81\xFD", [<Q::cs|unassigned-code-point-error>]],
3182     },
3183     {
3184     id => q<81FE>,
3185     in => qq<\x81\xFE>,
3186     out => [null, "\x81\xFE", [<Q::cs|unassigned-code-point-error>]],
3187     },
3188     {
3189     id => q<81FF>,
3190     in => qq<\x81\xFF>,
3191     out => [null, "\x81\xFF", [<Q::cs|unassigned-code-point-error>]],
3192     },
3193     {
3194     id => q<DDDE>,
3195     in => qq<\xDD\xDE>,
3196     out => [null, "\x{FF9D}", null, "\x{FF9E}", null],
3197     },
3198     {
3199     id => q<e040>,
3200     in => qq<\xE0\x40>,
3201     out => [null, "\x{6F3E}", null],
3202     },
3203     {
3204     id => q<eaa4>,
3205     in => qq<\xEA\xA4>,
3206     out => [null, "\x{7199}", null],
3207     },
3208     {
3209     id => q<eaa5>,
3210     in => qq<\xEA\xA5>,
3211     out => [null, "\xEA\xA5", [<Q::cs|unassigned-code-point-error>]],
3212     },
3213     {
3214     id => q<eb40>,
3215     in => qq<\xEB\x40>,
3216     out => [null, "\xEB\x40", [<Q::cs|unassigned-code-point-error>]],
3217     },
3218     {
3219     id => q<ed40>,
3220     in => qq<\xED\x40>,
3221     out => [null, "\xED\x40", [<Q::cs|unassigned-code-point-error>]],
3222     },
3223     {
3224     id => q<effc>,
3225     in => qq<\xEF\xFC>,
3226     out => [null, "\xEF\xFC", [<Q::cs|unassigned-code-point-error>]],
3227     },
3228     {
3229     id => q<f040>,
3230     in => qq<\xF0\x40>,
3231     out => [null, "\xF0", [<Q::cs|unassigned-code-point-error>],
3232     "\x40", null],
3233     },
3234     {
3235     id => q<f140>,
3236     in => qq<\xF1\x40>,
3237     out => [null, "\xF1", [<Q::cs|unassigned-code-point-error>],
3238     "\x40", null],
3239     },
3240     {
3241     id => q<fb40>,
3242     in => qq<\xFB\x40>,
3243     out => [null, "\xFB", [<Q::cs|unassigned-code-point-error>],
3244     "\x40", null],
3245     },
3246     {
3247     id => q<fc40>,
3248     in => qq<\xFc\x40>,
3249     out => [null, "\xFC", [<Q::cs|unassigned-code-point-error>],
3250     "\x40", null],
3251     },
3252     {
3253     id => q<fd40>,
3254     in => qq<\xFD\x40>,
3255     out => [null, "\xFD", [<Q::cs|unassigned-code-point-error>],
3256     "\x40", null],
3257     },
3258     {
3259     id => q<fE40>,
3260     in => qq<\xFE\x40>,
3261     out => [null, "\xFE", [<Q::cs|unassigned-code-point-error>],
3262     "\x40", null],
3263     },
3264     {
3265     id => q<ff40>,
3266     in => qq<\xFF\x40>,
3267     out => [null, "\xFF", [<Q::cs|unassigned-code-point-error>],
3268     "\x40", null],
3269     },
3270     {
3271     id => q<81408142>,
3272     in => qq<\x81\x40\x81\x42>,
3273     out => [null, "\x{3000}", null, "\x{3002}", null],
3274     },
3275     );
3276    
3277     for my $testdata (@testdata) {
3278     my $byte = $testdata->{in};
3279     my $error;
3280     my $i = 0;
3281    
3282     open my $fh, '<', \$byte;
3283     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
3284     (<Q::cs|XML.shift_jis>, $fh, sub {
3285     my (null, null, $etype, %opt) = @_;
3286     $error = [$etype, \%opt];
3287     });
3288    
3289     while (@{$testdata->{out}}) {
3290     if ($i != 0) {
3291     my $c = shift @{$testdata->{out}};
3292     $test->id ("$testdata->{id}.$i");
3293     $test->assert_equals ($efh->getc, $c);
3294     }
3295    
3296     my $v = shift @{$testdata->{out}};
3297     $test->id ("$testdata->{id}.$i.error");
3298     if (defined $v) {
3299     $test->assert_not_null ($error);
3300     $test->assert_equals ($error->[0], $v->[0]);
3301     } else {
3302     $test->assert_null ($error->[0]);
3303     }
3304     $error = null;
3305     $i++;
3306     }
3307    
3308     $test->id ("$testdata->{id}.eof");
3309     $test->assert_null ($efh->getc);
3310     $test->assert_null ($error);
3311     } # testdata
3312     ##MCShiftJISDecodeHandle
3313 wakaba 1.6
3314     ClsDef:
3315     @ClsQName: ManakaiMCISO2022JPDecodeHandle
3316    
3317     @ClsISA: ManakaiMCDecodeHandle
3318    
3319     @Method:
3320     @@Name: getc
3321     @@enDesc:
3322     Returns the next character from the input.
3323     @@Return:
3324     @@@Type: String
3325     @@@enDesc:
3326     The next character.
3327     @@@nullCase:
3328     @@@@enDesc:
3329     If at the end of the file, or if there was an error, in which
3330     case <Perl::$!> is set.
3331     @@@PerlDef:
3332     if (@{$self->{<H::mce|characterQueue>}}) {
3333     $r = shift @{$self->{<H::mce|characterQueue>}};
3334     } else {
3335     __DEEP{
3336     A: {
3337     my $error;
3338     if ($self->{<H::mce|continue>}) {
3339     if (read $self->{<H::mce|filehandle>},
3340     $self->{<H::mce|byteBuffer>}, 256,
3341     length $self->{<H::mce|byteBuffer>}) {
3342     #
3343     } else {
3344     $error = true;
3345     }
3346     $self->{<H::mce|continue>} = false;
3347     } elsif (512 > length $self->{<H::mce|byteBuffer>}) {
3348     read $self->{<H::mce|filehandle>},
3349     $self->{<H::mce|byteBuffer>}, 256,
3350     length $self->{<H::mce|byteBuffer>};
3351     }
3352    
3353     unless ($error) {
3354     if ($self->{<H::mce|byteBuffer>}
3355     =~ s/^\x1B(\x24[\x40\x42]|\x28[\x42\x4A])//) {
3356     $self->{<H::mce|state>} = {
3357     "\x24\x40" => <H::mce|State.2440>,
3358     "\x24\x42" => <H::mce|State.2442>,
3359     "\x28\x42" => <H::mce|State.2842>,
3360     "\x28\x4A" => <H::mce|State.284A>,
3361     }->{$1};
3362     redo A;
3363     } elsif ($self->{<H::mce|state>} eq <H::mce|State.2842>) { # IRV
3364     if ($self->{<H::mce|byteBuffer>}
3365     =~ s/^([\x00-\x0D\x10-\x1A\x1C-\x7F]+)//) {
3366     push @{$self->{<H::mce|characterQueue>}}, split //, $1;
3367     $r = shift @{$self->{<H::mce|characterQueue>}};
3368     } else {
3369     if (length $self->{<H::mce|byteBuffer>}) {
3370     $error = true;
3371     } else {
3372     $r = null;
3373     }
3374     }
3375     } elsif ($self->{<H::mce|state>} eq <H::mce|State.284A>) { # 0201
3376     if ($self->{<H::mce|byteBuffer>}
3377     =~ s/^([\x00-\x0D\x10-\x1A\x1C-\x7F]+)//) {
3378     my $v = $1; $v =~ tr/\x5C\x7E/\xA5\x{203E}/;
3379     push @{$self->{<H::mce|characterQueue>}}, split //, $v;
3380     $r = shift @{$self->{<H::mce|characterQueue>}};
3381     } else {
3382     if (length $self->{<H::mce|byteBuffer>}) {
3383     $error = true;
3384     } else {
3385     $r = null;
3386     $self->{<H::mce|onerror>}
3387     ->($self->{<H::mce|onerror>}, $self,
3388     <Q::cs|invalid-state-error>,
3389     state => $self->{<H::mce|state>});
3390     }
3391     }
3392     } elsif ($self->{<H::mce|state>} eq <H::mce|State.2442>) { # 1983
3393     my $v = Encode::decode ($self->{<H::mce|State.2442>},
3394     $self->{<H::mce|byteBuffer>},
3395     Encode::FB_QUIET ());
3396     if (length $v) {
3397     push @{$self->{<H::mce|characterQueue>}}, split //, $v;
3398     $r = shift @{$self->{<H::mce|characterQueue>}};
3399     } else {
3400     if (length $self->{<H::mce|byteBuffer>}) {
3401     $error = true;
3402     } else {
3403     $r = null;
3404     $self->{<H::mce|onerror>}
3405     ->($self->{<H::mce|onerror>}, $self,
3406     <Q::cs|invalid-state-error>,
3407     state => $self->{<H::mce|state>});
3408     }
3409     }
3410     } elsif ($self->{<H::mce|state>} eq <H::mce|State.2440>) { # 1978
3411     my $v = Encode::decode ($self->{<H::mce|State.2440>},
3412     $self->{<H::mce|byteBuffer>},
3413     Encode::FB_QUIET ());
3414     if (length $v) {
3415     push @{$self->{<H::mce|characterQueue>}}, split //, $v;
3416     $r = shift @{$self->{<H::mce|characterQueue>}};
3417     } else {
3418     if (length $self->{<H::mce|byteBuffer>}) {
3419     $error = true;
3420     } else {
3421     $r = null;
3422     $self->{<H::mce|onerror>}
3423     ->($self->{<H::mce|onerror>}, $self,
3424     <Q::cs|invalid-state-error>,
3425     state => $self->{<H::mce|state>});
3426     }
3427     }
3428     } else {
3429     $error = true;
3430     }
3431     }
3432     if ($error) {
3433     $r = substr $self->{<H::mce|byteBuffer>}, 0, 1, '';
3434     my $etype = <Q::cs|illegal-octets-error>;
3435     if (($self->{<H::mce|state>} eq <H::mce|State.2442> or
3436     $self->{<H::mce|state>} eq <H::mce|State.2440>) and
3437     $r =~ /^[\x21-\x7E]/ and
3438     $self->{<H::mce|byteBuffer>} =~ s/^([\x21-\x7E])//) {
3439     $r .= $1;
3440     $etype = <Q::cs|unassigned-code-point-error>;
3441     } elsif ($r eq "\x1B" and
3442     $self->{<H::mce|byteBuffer>} =~ s/^\(H//) { # Old 0201
3443     $r .= "(H";
3444     $self->{<H::mce|state>} = <H::mce|State.284A>;
3445     }
3446     $self->{<H::mce|onerror>}
3447     ->($self->{<H::mce|onerror>}, $self, $etype,
3448     octets => \$r);
3449     }
3450     } # A
3451     }__;
3452     }
3453    
3454     @Test:
3455     @@QName: MCISO2022JPDecodeHandle.1.test
3456     @@PerlDef:
3457     my $impl;
3458     __CODE{createImplForTest:: $impl => $impl}__;
3459    
3460     my @testdata = (
3461     {
3462     id => q<l=0>,
3463     in => q<>,
3464     out1 => [null],
3465     out2 => [null],
3466     },
3467     {
3468     id => q<l=1.00>,
3469     in => qq<\x00>,
3470     out1 => [null, "\x00", null],
3471     out2 => [null, "\x00", null],
3472     },
3473     {
3474     id => q<l=1.0d>,
3475     in => qq<\x0D>,
3476     out1 => [null, "\x0D", null],
3477     out2 => [null, "\x0D", null],
3478     }, # Error?
3479     {
3480     id => q<0A>,
3481     in => qq<\x0A>,
3482     out1 => [null, "\x0A", null],
3483     out2 => [null, "\x0A", null],
3484     }, # Error?
3485     {
3486     id => q<l=1.0e>,
3487     in => qq<\x0E>,
3488     out1 => [null, "\x0E", [<Q::cs|illegal-octets-error>]],
3489     out2 => [null, "\x0E", [<Q::cs|illegal-octets-error>]],
3490     },
3491     {
3492     id => q<l=1.0f>,
3493     in => qq<\x0F>,
3494     out1 => [null, "\x0F", [<Q::cs|illegal-octets-error>]],
3495     out2 => [null, "\x0F", [<Q::cs|illegal-octets-error>]],
3496     },
3497     {
3498     id => q<l=1.1b>,
3499     in => qq<\x1B>,
3500     out1 => [null, "\x1B", [<Q::cs|illegal-octets-error>]],
3501     out2 => [null, "\x1B", [<Q::cs|illegal-octets-error>]],
3502     },
3503     {
3504     id => q<l=1.a>,
3505     in => q<a>,
3506     out1 => [null, "a", null],
3507     out2 => [null, "a", null],
3508     },
3509     {
3510     id => q<l=1.20>,
3511     in => qq<\x20>,
3512     out1 => [null, "\x20", null],
3513     out2 => [null, "\x20", null],
3514     },
3515     {
3516     id => q<l=1.5C>,
3517     in => qq<\x5C>,
3518     out1 => [null, "\x5C", null],
3519     out2 => [null, "\x5C", null],
3520     },
3521     {
3522     id => q<l=1.7E>,
3523     in => qq<\x7E>,
3524     out1 => [null, "\x7E", null],
3525     out2 => [null, "\x7E", null],
3526     },
3527     {
3528     id => q<l=1.7F>,
3529     in => qq<\x7F>,
3530     out1 => [null, "\x7F", null],
3531     out2 => [null, "\x7F", null],
3532     },
3533     {
3534     id => q<l=1.80>,
3535     in => qq<\x80>,
3536     out1 => [null, "\x80", [<Q::cs|illegal-octets-error>]],
3537     out2 => [null, "\x80", [<Q::cs|illegal-octets-error>]],
3538     },
3539     {
3540     id => q<l=1.8c>,
3541     in => qq<\x8C>,
3542     out1 => [null, "\x8C", [<Q::cs|illegal-octets-error>]],
3543     out2 => [null, "\x8C", [<Q::cs|illegal-octets-error>]],
3544     },
3545     {
3546     id => q<l=1.8e>,
3547     in => qq<\x8E>,
3548     out1 => [null, "\x8E", [<Q::cs|illegal-octets-error>]],
3549     out2 => [null, "\x8E", [<Q::cs|illegal-octets-error>]],
3550     },
3551     {
3552     id => q<l=1.8f>,
3553     in => qq<\x8F>,
3554     out1 => [null, "\x8F", [<Q::cs|illegal-octets-error>]],
3555     out2 => [null, "\x8F", [<Q::cs|illegal-octets-error>]],
3556     },
3557     {
3558     id => q<l=1.a0>,
3559     in => qq<\xA0>,
3560     out1 => [null, "\xA0", [<Q::cs|illegal-octets-error>]],
3561     out2 => [null, "\xA0", [<Q::cs|illegal-octets-error>]],
3562     },
3563     {
3564     id => q<l=1.a1>,
3565     in => qq<\xA1>,
3566     out1 => [null, "\xA1", [<Q::cs|illegal-octets-error>]],
3567     out2 => [null, "\xA1", [<Q::cs|illegal-octets-error>]],
3568     },
3569     {
3570     id => q<l=1.a2>,
3571     in => qq<\xA2>,
3572     out1 => [null, "\xA2", [<Q::cs|illegal-octets-error>]],
3573     out2 => [null, "\xA2", [<Q::cs|illegal-octets-error>]],
3574     },
3575     {
3576     id => q<l=1.df>,
3577     in => qq<\xdf>,
3578     out1 => [null, "\xDF", [<Q::cs|illegal-octets-error>]],
3579     out2 => [null, "\xDF", [<Q::cs|illegal-octets-error>]],
3580     },
3581     {
3582     id => q<l=1.e0>,
3583     in => qq<\xe0>,
3584     out1 => [null, "\xE0", [<Q::cs|illegal-octets-error>]],
3585     out2 => [null, "\xE0", [<Q::cs|illegal-octets-error>]],
3586     },
3587     {
3588     id => q<l=1.ef>,
3589     in => qq<\xEF>,
3590     out1 => [null, "\xEF", [<Q::cs|illegal-octets-error>]],
3591     out2 => [null, "\xEF", [<Q::cs|illegal-octets-error>]],
3592     },
3593     {
3594     id => q<F0>,
3595     in => qq<\xF0>,
3596     out1 => [null, "\xF0", [<Q::cs|illegal-octets-error>]],
3597     out2 => [null, "\xF0", [<Q::cs|illegal-octets-error>]],
3598     },
3599     {
3600     id => q<l=1.fc>,
3601     in => qq<\xFC>,
3602     out1 => [null, "\xFC", [<Q::cs|illegal-octets-error>]],
3603     out2 => [null, "\xFC", [<Q::cs|illegal-octets-error>]],
3604     },
3605     {
3606     id => q<l=1.fd>,
3607     in => qq<\xFD>,
3608     out1 => [null, "\xFD", [<Q::cs|illegal-octets-error>]],
3609     out2 => [null, "\xFD", [<Q::cs|illegal-octets-error>]],
3610     },
3611     {
3612     id => q<l=1.fe>,
3613     in => qq<\xFE>,
3614     out1 => [null, "\xFE", [<Q::cs|illegal-octets-error>]],
3615     out2 => [null, "\xFE", [<Q::cs|illegal-octets-error>]],
3616     },
3617     {
3618     id => q<l=1.ff>,
3619     in => qq<\xFF>,
3620     out1 => [null, "\xFF", [<Q::cs|illegal-octets-error>]],
3621     out2 => [null, "\xFF", [<Q::cs|illegal-octets-error>]],
3622     },
3623     {
3624     id => q<l=2.0000>,
3625     in => qq<\x00\x00>,
3626     out1 => [null, "\x00", null, "\x00", null],
3627     out2 => [null, "\x00", null, "\x00", null],
3628     },
3629     {
3630     id => q<l=2.0D0A>,
3631     in => qq<\x0D\x0A>,
3632     out1 => [null, "\x0D", null, "\x0A", null],
3633     out2 => [null, "\x0D", null, "\x0A", null],
3634     },
3635     {
3636     id => q<l=2.1B1B>,
3637     in => qq<\x1B\x1B>,
3638     out1 => [null, "\x1B", [<Q::cs|illegal-octets-error>],
3639     "\x1B", [<Q::cs|illegal-octets-error>]],
3640     out2 => [null, "\x1B", [<Q::cs|illegal-octets-error>],
3641     "\x1B", [<Q::cs|illegal-octets-error>]],
3642     },
3643     {
3644     id => q<l=2.1B20>,
3645     in => qq<\x1B\x20>,
3646     out1 => [null, "\x1B", [<Q::cs|illegal-octets-error>], "\x20", null],
3647     out2 => [null, "\x1B", [<Q::cs|illegal-octets-error>], "\x20", null],
3648     },
3649     {
3650     id => q<l=2.1B24>,
3651     in => qq<\x1B\x24>,
3652     out1 => [null, "\x1B", [<Q::cs|illegal-octets-error>], "\x24", null],
3653     out2 => [null, "\x1B", [<Q::cs|illegal-octets-error>], "\x24", null],
3654     },
3655     {
3656     id => q<l=2.1B28>,
3657     in => qq<\x1B\x28>,
3658     out1 => [null, "\x1B", [<Q::cs|illegal-octets-error>], "\x28", null],
3659     out2 => [null, "\x1B", [<Q::cs|illegal-octets-error>], "\x28", null],
3660     },
3661     {
3662     id => q<l=2.2020>,
3663     in => qq<\x20\x20>,
3664     out1 => [null, "\x20", null, "\x20", null],
3665     out2 => [null, "\x20", null, "\x20", null],
3666     },
3667     {
3668     id => q<l=2.ab>,
3669     in => qq<ab>,
3670     out1 => [null, "a", null, "b", null],
3671     out2 => [null, "a", null, "b", null],
3672     },
3673     {
3674     id => q<8040>,
3675     in => qq<\x80\x40>,
3676     out1 => [null, "\x80", [<Q::cs|illegal-octets-error>],
3677     "\x40", null],
3678     out2 => [null, "\x80", [<Q::cs|illegal-octets-error>],
3679     "\x40", null],
3680     },
3681     {
3682     id => q<1B2440>,
3683     in => qq<\x1B\x24\x40>,
3684     out1 => [null],
3685     out2 => [null],
3686     eof_error => [<Q::cs|invalid-state-error>],
3687     },
3688     {
3689     id => q<1B2442>,
3690     in => qq<\x1B\x24\x42>,
3691     out1 => [null],
3692     out2 => [null],
3693     eof_error => [<Q::cs|invalid-state-error>],
3694     },
3695     {
3696     id => q<1B2840>,
3697     in => qq<\x1B\x28\x40>,
3698     out1 => [null, "\x1B", [<Q::cs|illegal-octets-error>], "(", null,
3699     "\x40", null],
3700     out2 => [null, "\x1B", [<Q::cs|illegal-octets-error>], "(", null,
3701     "\x40", null],
3702     },
3703     {
3704     id => q<1B2842>,
3705     in => qq<\x1B\x28\x42>,
3706     out1 => [null],
3707     out2 => [null],
3708     },
3709     {
3710     id => q<1B284A>,
3711     in => qq<\x1B\x28\x4A>,
3712     out1 => [null],
3713     out2 => [null],
3714     eof_error => [<Q::cs|invalid-state-error>],
3715     },
3716     {
3717     id => q<1B$B1B(B>,
3718     in => qq<\x1B\x24\x42\x1B\x28\x42>,
3719     out1 => [null],
3720     out2 => [null],
3721     },
3722     {
3723     id => q<1B(B1B(B>,
3724     in => qq<\x1B\x28\x42\x1B\x28\x42>,
3725     out1 => [null],
3726     out2 => [null],
3727     },
3728     {
3729     id => q<1B(Ba1B(B>,
3730     in => qq<\x1B\x28\x42a\x1B\x28\x42>,
3731     out1 => [null, "a", null],
3732     out2 => [null, "a", null],
3733     },
3734     {
3735     id => q<1B(Ba1B(B1B(B>,
3736     in => qq<\x1B\x28\x42a\x1B\x28\x42\x1B\x28\x42>,
3737     out1 => [null, "a", null],
3738     out2 => [null, "a", null],
3739     },
3740     {
3741     id => q<1B$42!!1B2842>,
3742     in => qq<\x1B\x24\x42!!\x1B\x28\x42>,
3743     out1 => [null, "\x{3000}", null],
3744     out2 => [null, "\x{3000}", null],
3745     },
3746     {
3747     id => q<1B$4221211B284A>,
3748     in => qq<\x1B\x24\x42!!\x1B\x28\x4A>,
3749     out1 => [null, "\x{3000}", null],
3750     out2 => [null, "\x{3000}", null],
3751     eof_error => [<Q::cs|invalid-state-error>],
3752     },
3753     {
3754     id => q<1B$4021211B2842>,
3755     in => qq<\x1B\x24\x40!!\x1B\x28\x42>,
3756     out1 => [null, "\x{3000}", null],
3757     out2 => [null, "\x{3000}", null],
3758     },
3759     {
3760     id => q<1B$402121211B2842>,
3761     in => qq<\x1B\x24\x40!!!\x1B\x28\x42>,
3762     out1 => [null, "\x{3000}", null, "!", [<Q::cs|illegal-octets-error>]],
3763     out2 => [null, "\x{3000}", null, "!", [<Q::cs|illegal-octets-error>]],
3764     },
3765     {
3766     id => q<1B$4021211B2442!!1B2842>,
3767     in => qq<\x1B\x24\x40!!\x1B\x24\x42!!\x1B\x28\x42>,
3768     out1 => [null, "\x{3000}", null, "\x{3000}", null],
3769     out2 => [null, "\x{3000}", null, "\x{3000}", null],
3770     },
3771     {
3772     id => q<1B$4021211B2440!!1B2842>,
3773     in => qq<\x1B\x24\x40!!\x1B\x24\x40!!\x1B\x28\x42>,
3774     out1 => [null, "\x{3000}", null, "\x{3000}", null],
3775     out2 => [null, "\x{3000}", null, "\x{3000}", null],
3776     },
3777     {
3778     id => q<1B$@!"1B(B\~|>,
3779     in => qq<\x1B\x24\x40!"\x1B(B\\~|>,
3780     out1 => [null, "\x{3001}", null, "\x5C", null,
3781     "\x7E", null, "|", null],
3782     out2 => [null, "\x{3001}", null, "\x5C", null,
3783     "\x7E", null, "|", null],
3784     },
3785     {
3786     id => q<1B$B!"1B(J\~|1B(B>,
3787     in => qq<\x1B\x24\x42!"\x1B(J\\~|\x1B(B>,
3788     out1 => [null, "\x{3001}", null, "\xA5", null,
3789     "\x{203E}", null, "|", null],
3790     out2 => [null, "\x{3001}", null, "\xA5", null,
3791     "\x{203E}", null, "|", null],
3792     },
3793     {
3794     id => q<78compat.3022(16-02)>,
3795     in => qq<\x1B\$\@\x30\x22\x1B\$B\x30\x22\x1B(B>,
3796     out1 => [null, "\x{555E}", null, "\x{5516}", null],
3797     out2 => [null, "\x{5516}", null, "\x{5516}", null],
3798     },
3799     {
3800     id => q<unassigned.2239>,
3801     in => qq<\x1B\$\@\x22\x39\x1B\$B\x22\x39\x1B(B>,
3802     out1 => [null, "\x22\x39", [<Q::cs|unassigned-code-point-error>],
3803     "\x22\x39", [<Q::cs|unassigned-code-point-error>]],
3804     out2 => [null, "\x22\x39", [<Q::cs|unassigned-code-point-error>],
3805     "\x22\x39", [<Q::cs|unassigned-code-point-error>]],
3806     },
3807     {
3808     id => q<83add.223A>,
3809     in => qq<\x1B\$\@\x22\x3A\x1B\$B\x22\x3A\x1B(B>,
3810     out1 => [null, "\x22\x3A", [<Q::cs|unassigned-code-point-error>],
3811     "\x{2208}", null],
3812     out2 => [null, "\x{2208}", null, "\x{2208}", null],
3813     },
3814     {
3815     id => q<83add.2840>,
3816     in => qq<\x1B\$\@\x28\x40\x1B\$B\x28\x40\x1B(B>,
3817     out1 => [null, "\x28\x40", [<Q::cs|unassigned-code-point-error>],
3818     "\x{2542}", null],
3819     out2 => [null, "\x{2542}", null, "\x{2542}", null],
3820     },
3821     {
3822     id => q<83add.7421>,
3823     in => qq<\x1B\$\@\x74\x21\x1B\$B\x74\x21\x1B(B>,
3824     out1 => [null, "\x74\x21", [<Q::cs|unassigned-code-point-error>],
3825     "\x{582F}", null],
3826     out2 => [null, "\x{5C2D}", null, "\x{582F}", null],
3827     },
3828     {
3829     id => q<83swap.3033>,
3830     in => qq<\x1B\$\@\x30\x33\x1B\$B\x30\x33\x1B(B>,
3831     out1 => [null, "\x{9C3A}", null, "\x{9BF5}", null],
3832     out2 => [null, "\x{9C3A}", null, "\x{9BF5}", null],
3833     },
3834     {
3835     id => q<83swap.724D>,
3836     in => qq<\x1B\$\@\x72\x4D\x1B\$B\x72\x4D\x1B(B>,
3837     out1 => [null, "\x{9BF5}", null, "\x{9C3A}", null],
3838     out2 => [null, "\x{9BF5}", null, "\x{9C3A}", null],
3839     },
3840     {
3841     id => q<90add.7425>,
3842     in => qq<\x1B\$\@\x74\x25\x1B\$B\x74\x25\x1B(B>,
3843     out1 => [null, "\x74\x25", [<Q::cs|unassigned-code-point-error>],
3844     "\x74\x25", [<Q::cs|unassigned-code-point-error>]],
3845     out2 => [null, "\x{51DC}", null, "\x{51DC}", null],
3846     },
3847     {
3848     id => q<90add.7426>,
3849     in => qq<\x1B\$\@\x74\x26\x1B\$B\x74\x26\x1B(B>,
3850     out1 => [null, "\x74\x26", [<Q::cs|unassigned-code-point-error>],
3851     "\x74\x26", [<Q::cs|unassigned-code-point-error>]],
3852     out2 => [null, "\x{7199}", null, "\x{7199}", null],
3853     },
3854     );
3855    
3856     for my $testdata (@testdata) {
3857     for my $c (1..2) {
3858     my $byte = $testdata->{in};
3859     my $error;
3860     my $i = 0;
3861    
3862     open my $fh, '<', \$byte;
3863     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
3864     ([null, <Q::icharset|iso-2022-jp>,
3865     <Q::cs|XML.iso-2022-jp>]->[$c], $fh, sub {
3866     my (null, null, $etype, %opt) = @_;
3867     $error = [$etype, \%opt];
3868     });
3869    
3870     while (@{$testdata->{"out$c"}}) {
3871     if ($i != 0) {
3872     my $c = shift @{$testdata->{"out$c"}};
3873     $test->id ("$testdata->{id}.$i.$c");
3874     $test->assert_equals ($efh->getc, $c);
3875     }
3876    
3877     my $v = shift @{$testdata->{"out$c"}};
3878     $test->id ("$testdata->{id}.$i.error.$c");
3879     if (defined $v) {
3880     $test->assert_not_null ($error);
3881     $test->assert_equals ($error->[0], $v->[0]);
3882     } else {
3883     $test->assert_null ($error->[0]);
3884     }
3885     $error = null;
3886     $i++;
3887     }
3888    
3889     $test->id ("$testdata->{id}.eof.$c");
3890     $test->assert_null ($efh->getc);
3891    
3892     my $v = $testdata->{eof_error};
3893     $test->id ("$testdata->{id}.$i.error.$c");
3894     if (defined $v) {
3895     $test->assert_not_null ($error);
3896     $test->assert_equals ($error->[0], $v->[0]);
3897     } else {
3898     $test->assert_null ($error->[0]);
3899     }
3900     $error = null;
3901     }} # testdata
3902     ##MCISO2022JPDecodeHandle
3903    
3904     PropDef:
3905     @QName: mce|state
3906     @mce:key: s
3907     @enDesc:
3908     ISO-2022-JP state.
3909    
3910     ResourceDef:
3911     @QName: mce|State.2440
3912     @mce:key: s1
3913     @enDesc:
3914     ISO-2022-JP state: ESC 2/4 4/0.
3915     @DISCore:resourceType: DISCore|Resource
3916     @DISCore:resourceType: DISCore|Property
3917    
3918     ResourceDef:
3919     @QName: mce|State.2442
3920     @mce:key: s2
3921     @enDesc:
3922     ISO-2022-JP state: ESC 2/4 4/2.
3923     @DISCore:resourceType: DISCore|Resource
3924     @DISCore:resourceType: DISCore|Property
3925    
3926     ResourceDef:
3927     @QName: mce|State.2842
3928     @mce:key: s3
3929     @enDesc:
3930     ISO-2022-JP state: ESC 2/8 4/2.
3931     @DISCore:resourceType: DISCore|Resource
3932    
3933     ResourceDef:
3934     @QName: mce|State.284A
3935     @mce:key: s4
3936     @enDesc:
3937     ISO-2022-JP state: ESC 2/8 4/10.
3938     @DISCore:resourceType: DISCore|Resource
3939 wakaba 1.2
3940     ResourceDef:
3941     @QName: String
3942 wakaba 1.10 @AliasFor: str|DOMString
3943 wakaba 1.2
3944     PropDef:
3945 wakaba 1.3 @QName: mce|onerror
3946 wakaba 1.2 @mce:key: onerr
3947 wakaba 1.3
3948     PropDef:
3949     @QName: mce|inputEncoding
3950     @mce:key: ie
3951    
3952     PropDef:
3953     @QName: mce|hasBOM
3954     @mce:key: bom
3955 wakaba 1.2
3956     PropDef:
3957     @QName: mce|continue
3958     @mce:key: cc
3959     @enDesc:
3960     Whether the <Q::mce|byteBuffer> contains octets that might
3961     be part of characters.
3962    
3963     PropDef:
3964     @QName: mce|charset
3965     @mce:key: cs
3966    
3967     PropDef:
3968     @QName: mce|perlEncodingName
3969     @mce:key: enc
3970    
3971     PropDef:
3972     @QName: mce|filehandle
3973     @mce:key: fh
3974    
3975     RPropDef:
3976     @QName: mce|key
3977     @subsetOf: DISPerl|propHashKey
3978    
3979     PropDef:
3980     @QName: mce|characterQueue
3981     @mce:key: cq
3982    
3983     PropDef:
3984     @QName: mce|byteBuffer
3985     @mce:key: bb
3986    
3987     ElementTypeBinding:
3988     @Name: PropDef
3989     @ElementType:
3990     dis:ResourceDef
3991     @ShadowContent:
3992     @@DISCore:resourceType: DISCore|Property
3993     @@ForCheck: =ManakaiDOM|all
3994    
3995     ElementTypeBinding:
3996     @Name: RPropDef
3997     @ElementType:
3998     dis:ResourceDef
3999     @ShadowContent:
4000     @@DISCore:resourceType: DISSource|ResourceProperty
4001     @@ForCheck: =ManakaiDOM|all
4002    
4003    
4004     ElementTypeBinding:
4005     @Name: nullCase
4006     @ElementType:
4007     dis:ResourceDef
4008     @ShadowContent:
4009     @@DISCore:resourceType: ManakaiDOM|InCase
4010     @@Value:
4011     @@@is-null:1
4012     @@@ContentType: DISCore|String
4013    
4014     ElementTypeBinding:
4015     @Name: TrueCase
4016     @ElementType:
4017     dis:ResourceDef
4018     @ShadowContent:
4019     @@DISCore:resourceType: ManakaiDOM|InCase
4020     @@Value:
4021     @@@@: 1
4022     @@@ContentType: DISCore|Boolean
4023     @@Type: idl|boolean||ManakaiDOM|all
4024    
4025     ElementTypeBinding:
4026     @Name: FalseCase
4027     @ElementType:
4028     dis:ResourceDef
4029     @ShadowContent:
4030     @@DISCore:resourceType: ManakaiDOM|InCase
4031     @@Value:
4032     @@@@: 0
4033     @@@ContentType: DISCore|Boolean
4034     @@Type: idl|boolean||ManakaiDOM|all
4035    
4036     ResourceDef:
4037     @DISCore:resourceType: cs|CharsetSet
4038     @cs:moduleRef: DISlib|Charset

admin@suikawiki.org
ViewVC Help
Powered by ViewVC 1.1.24