/[suikacvs]/messaging/manakai/lib/Message/Charset/Encode.dis
Suika

Contents of /messaging/manakai/lib/Message/Charset/Encode.dis

Parent Directory Parent Directory | Revision Log Revision Log


Revision 1.10 - (hide annotations) (download)
Sat Dec 2 12:46:19 2006 UTC (18 years, 4 months ago) by wakaba
Branch: MAIN
CVS Tags: manakai-200612
Changes since 1.9: +8 -3 lines
++ manakai/t/ChangeLog	2 Dec 2006 12:46:13 -0000
2006-12-02  Wakaba  <wakaba@suika.fam.cx>

	* Makefile (dom-DOMString.t): New test.

++ manakai/bin/ChangeLog	2 Dec 2006 12:35:25 -0000
2006-12-02  Wakaba  <wakaba@suika.fam.cx>

	* daf.pl: Call |get_dom_implementation|
	instead of obsolete |get_implementation|.

	* grep-dis.pl: |lib/manakai/*.pl| is added.

++ manakai/lib/Message/Util/ChangeLog	2 Dec 2006 12:45:49 -0000
2006-12-02  Wakaba  <wakaba@suika.fam.cx>

	* Makefile: |lib/Message/DOM/DOMString.pm| is added.

++ manakai/lib/Message/DOM/ChangeLog	2 Dec 2006 12:45:20 -0000
2006-12-02  Wakaba  <wakaba@suika.fam.cx>

	* DOMString.dis: New module.

	* DOMString.pm: New file.

	* DOMCore.dis (min): Moved from |DOMFeature.dis|.
	(ImplementationRegistryVariable): Moved from |DOMFeature.dis|.
	Now it references the |DOMImplementationRegistry| object.
	(DOMImplementationRegistryVariable): Moved from |DOMMain.dis|.
	(DOMImplementationRegistry): New interface and
	class, reformed from |ImplementationRegistry| in |DOMFeature.dis|
	and |DOMImplementationRegistry| in |DOMMain.dis|.  Note
	that the class no longer support |get_implementation|
	and |get_implementation_list| methods from
	the |ImplementationRegistry| interface.
	(DOMImplementationList): Class implemented; no
	longer inherits from |ImplementationList|.
	(DOMImplementationSource): Class implemented; no
	longer inherits from |ImplementationSource|.  Note that
	the class no longer support |get_implementation|
	and |get_implementation_list| methods from
	the |ImplementationSource| interface.
	(DOMStringList): Removed.

	* DOMFeature.dis (min, ManakaiDOM:DOMHTMLFeature,
	ManakaiDOM:DOMEventsFeature, ManakaiDOM:DOMXMLFeature,
	ManakaiDOM:DOMXMLFeatureXML11, most part of
	documentation for obsolete DOM Level 3 Minimum Implementation
	module, obsolete property name aliases,
	ImplemenationRegistryVar, ImplementationRegistry,
	DEBUG, MIString, ImplementationList, ImplementationSource,
	ManakaiDOM:implID): Removed.

	* DOMMain.dis (Redefine, RedefinedBy, Redefined): Removed.
	(DOMString): Removed.
	(DOMImplementationRegistryVar, DOMImplementationRegistry): Removed.

	* Makefile: |DOMString.pm| is added.

	* TreeCore.dis (is_default_namespace): |null| was
	returned where a false is expected (|null| is
	a false in Perl, but real |false| is appropriate here).

1 wakaba 1.1 Module:
2     @QName: MCharset|Encode
3     @FullName:
4     @@lang: en
5     @@@: Manakai Charset Encode Module
6     @Namespace:
7     http://suika.fam.cx/~wakaba/archive/2005/manakai/Charset/Encode/
8    
9     @enDesc:
10     The <DFN::manakai Charset Encode Module> is ...
11    
12     {TODO::
13     This module is subject to change.
14     }
15    
16     @DISCore:author: DISCore|Wakaba
17     @License: license|Perl+MPL
18     @Date:
19 wakaba 1.10 $Date: 2006/11/04 17:25:05 $
20 wakaba 1.1
21     @Require:
22     @@Module:
23     @@@QName: MDOM|DOMFeature
24     @@@WithFor: ManakaiDOM|ManakaiDOMLatest
25     @@Module:
26     @@@WithFor: ManakaiDOM|ManakaiDOM
27 wakaba 1.2 @@Module:
28     @@@QName: DISlib|Charset
29     @@@WithFor: ManakaiDOM|all
30 wakaba 1.8 @@Module:
31     @@@QName: MDOM|DOMCore
32     @@@WithFor: ManakaiDOM|ManakaiDOMLatest
33 wakaba 1.10 @@Module:
34     @@@QName: MDOM|DOMString
35     @@@WithFor: ManakaiDOM|ManakaiDOMLatest
36 wakaba 1.1 @DefaultFor: ManakaiDOM|ManakaiDOMLatest
37    
38     Namespace:
39 wakaba 1.8 @c:
40     http://suika.fam.cx/~wakaba/archive/2004/8/18/dom-core#
41 wakaba 1.2 @cs:
42     http://suika.fam.cx/~wakaba/archive/2004/dis/Charset/
43 wakaba 1.1 @dis:
44     http://suika.fam.cx/~wakaba/archive/2004/8/18/lang#dis--
45     @DISlib:
46     http://suika.fam.cx/~wakaba/archive/2004/dis/
47 wakaba 1.8 @domperl:
48     http://suika.fam.cx/~wakaba/archive/2006/dom/perl/
49 wakaba 1.1 @dx:
50     http://suika.fam.cx/~wakaba/archive/2005/manakai/Util/Error/DOMException#
51     @f:
52     http://suika.fam.cx/~wakaba/archive/2004/dom/feature#
53     @fe:
54     http://suika.fam.cx/www/2006/feature/
55 wakaba 1.2 @icharset:
56     urn:x-suika-fam-cx:charset:
57 wakaba 1.1 @idl:
58     http://suika.fam.cx/~wakaba/archive/2004/dis/IDL#
59     @kwd:
60     http://suika.fam.cx/~wakaba/archive/2005/rfc2119/
61     @lang:
62     http://suika.fam.cx/~wakaba/archive/2004/8/18/lang#
63     @license:
64     http://suika.fam.cx/~wakaba/archive/2004/8/18/license#
65     @ManakaiDOM:
66     http://suika.fam.cx/~wakaba/archive/2004/8/18/manakai-dom#
67     @MCharset:
68     http://suika.fam.cx/~wakaba/archive/2005/manakai/Charset/
69 wakaba 1.2 @mce:
70     http://suika.fam.cx/~wakaba/archive/2005/manakai/Charset/Encode/
71 wakaba 1.1 @MDOM:
72     http://suika.fam.cx/~wakaba/archive/2004/8/18/manakai-dom#ManakaiDOM.
73     @mn:
74     http://suika.fam.cx/~wakaba/archive/2005/manakai/Util/ManakaiNode#
75     @rdf:
76     http://www.w3.org/1999/02/22-rdf-syntax-ns#
77     @rdfs:
78     http://www.w3.org/2000/01/rdf-schema#
79 wakaba 1.10 @str:
80     http://suika.fam.cx/~wakaba/archive/2005/manakai/DOM/DOMString/
81 wakaba 1.1 @test:
82     http://suika.fam.cx/~wakaba/archive/2004/dis/Test#
83 wakaba 1.2 @xml-auto-charset:
84     http://suika.fam.cx/www/2006/03/xml-entity/
85 wakaba 1.1
86     ResourceDef:
87     @QName: MCharset|
88     @rdf:type: dis|ModuleGroup
89     @FullName:
90     @@lang:en
91     @@@:
92     The manakai Charset modules
93     @DISPerl:packageName:
94     Message::Charset::
95     @DISPerl:interfacePackageName:
96     @@@:
97     Message::Charset::IFLatest::
98     @@For: ManakaiDOM|ManakaiDOMLatest
99     @DISPerl:interfacePackageName:
100     @@@:
101     Message::Charset::IF::
102     @@For: !ManakaiDOM|ManakaiDOMLatest
103    
104     ## -- Features
105    
106 wakaba 1.2 FeatureDef:
107     @QName: MCEncodeFeature
108     @featureQName: fe|MCEncode
109     @FeatureVerDef:
110     @@QName: MCEncodeFeature10
111     @@f:instanceOf: MCEncodeFeature
112     @@Version: 1.0
113     @@enDesc:
114     The manakai Charset Encode Module, version 1.0
115    
116 wakaba 1.1 ElementTypeBinding:
117     @Name: FeatureDef
118     @ElementType:
119     dis:ResourceDef
120     @ShadowContent:
121     @@rdf:type: f|Feature
122     @@For: =ManakaiDOM|all
123    
124     ElementTypeBinding:
125     @Name: FeatureVerDef
126     @ElementType:
127     dis:ResourceDef
128     @ShadowContent:
129     @@rdf:type: f|Feature
130    
131     ElementTypeBinding:
132     @Name: featureQName
133     @ElementType:
134     f:name
135     @ShadowContent:
136     @@ContentType: DISCore|QName
137    
138     ElementTypeBinding:
139     @Name: IFQName
140     @ElementType:
141     dis:QName
142     @ShadowContent:
143     @@ForCheck: ManakaiDOM|ForIF
144    
145     ElementTypeBinding:
146     @Name: ClsQName
147     @ElementType:
148     dis:QName
149     @ShadowContent:
150     @@ForCheck: ManakaiDOM|ForClass
151    
152     ElementTypeBinding:
153     @Name: IFISA
154     @ElementType:
155     dis:ISA
156     @ShadowContent:
157     @@ForCheck: ManakaiDOM|ForIF
158    
159     ElementTypeBinding:
160     @Name: ClsISA
161     @ElementType:
162     dis:ISA
163     @ShadowContent:
164     @@ForCheck: ManakaiDOM|ForClass
165    
166     ElementTypeBinding:
167     @Name: IFClsDef
168     @ElementType:
169     dis:ResourceDef
170     @ShadowContent:
171     @@rdf:type:
172     @@@@: dis|MultipleResource
173     @@@ForCheck: !ManakaiDOM|ForIF !ManakaiDOM|ForClass
174     @@resourceFor: ManakaiDOM|ForIF
175     @@resourceFor:
176     @@@@: ManakaiDOM|ForClass
177 wakaba 1.2 @@@ForCheck: ManakaiDOM|ManakaiDOMLatest !=ManakaiDOM|ManakaiDOM
178 wakaba 1.1 @@For: ManakaiDOM|ManakaiDOMLatest
179     @@For: =ManakaiDOM|ManakaiDOM
180    
181     @@rdf:type:
182     @@@@: DISLang|Interface
183     @@@ForCheck: ManakaiDOM|ForIF
184    
185     @@rdf:type:
186     @@@@: DISLang|Class
187     @@@ForCheck: ManakaiDOM|ForClass
188     @@Implement:
189     @@@@: ||ManakaiDOM|ManakaiDOM||ManakaiDOM|ForIF
190     @@@ContentType: DISCore|TFPQNames
191     @@@ForCheck: ManakaiDOM|ForClass ManakaiDOM|ManakaiDOM
192     @@Implement:
193     @@@@: ||ManakaiDOM|ManakaiDOMLatest||ManakaiDOM|ForIF
194     @@@ContentType: DISCore|TFPQNames
195     @@@ForCheck: ManakaiDOM|ForClass ManakaiDOM|ManakaiDOMLatest
196    
197     @@f:implements:
198 wakaba 1.2 @@@@: MCEncodeFeature10
199     @@@For: ManakaiDOM|ManakaiDOMLatest
200    
201     ElementTypeBinding:
202     @Name: ClsDef
203     @ElementType:
204     dis:ResourceDef
205     @ShadowContent:
206     @@rdf:type:
207     @@@@: dis|MultipleResource
208     @@@ForCheck: !ManakaiDOM|ForIF !ManakaiDOM|ForClass
209     @@resourceFor:
210     @@@@: ManakaiDOM|ForClass
211     @@@ForCheck: ManakaiDOM|ManakaiDOMLatest
212     @@For: ManakaiDOM|ManakaiDOMLatest
213    
214     @@rdf:type:
215     @@@@: DISLang|Class
216     @@@ForCheck: ManakaiDOM|ForClass
217    
218     @@f:implements:
219     @@@@: MCEncodeFeature10
220 wakaba 1.1 @@@For: ManakaiDOM|ManakaiDOMLatest
221    
222     ElementTypeBinding:
223     @Name: Method
224     @ElementType:
225     dis:ResourceDef
226     @ShadowContent:
227     @@rdf:type: DISLang|Method
228     @@ForCheck: !=ManakaiDOM|ManakaiDOM
229    
230     ElementTypeBinding:
231     @Name: IntMethod
232     @ElementType:
233     dis:ResourceDef
234     @ShadowContent:
235     @@rdf:type: DISLang|Method
236     @@ForCheck: !=ManakaiDOM|ManakaiDOM ManakaiDOM|ForClass
237     @@ManakaiDOM:isForInternal: 1
238    
239     ElementTypeBinding:
240     @Name: Param
241     @ElementType:
242     dis:ResourceDef
243     @ShadowContent:
244     @@DISCore:resourceType: DISLang|MethodParameter
245    
246     ElementTypeBinding:
247     @Name: NamedParam
248     @ElementType:
249     dis:ResourceDef
250     @ShadowContent:
251     @@DISCore:resourceType: DISLang|MethodParameter
252     @@DISPerl:isNamedParameter: 1
253    
254     ElementTypeBinding:
255     @Name: Return
256     @ElementType:
257     dis:ResourceDef
258     @ShadowContent:
259     @@DISCore:resourceType: DISLang|MethodReturn
260    
261     ElementTypeBinding:
262     @Name: Attr
263     @ElementType:
264     dis:ResourceDef
265     @ShadowContent:
266     @@DISCore:resourceType: DISLang|Attribute
267     @@ForCheck: !=ManakaiDOM|ManakaiDOM
268    
269     ElementTypeBinding:
270     @Name: Get
271     @ElementType:
272     dis:ResourceDef
273     @ShadowContent:
274     @@DISCore:resourceType: DISLang|AttributeGet
275    
276     ElementTypeBinding:
277     @Name: Set
278     @ElementType:
279     dis:ResourceDef
280     @ShadowContent:
281     @@DISCore:resourceType: DISLang|AttributeSet
282    
283     ElementTypeBinding:
284     @Name: InCase
285     @ElementType:
286     dis:ResourceDef
287     @ShadowContent:
288     @@DISCore:resourceType: ManakaiDOM|InCase
289    
290     ElementTypeBinding:
291     @Name: PerlDef
292     @ElementType:
293     dis:Def
294     @ShadowContent:
295     @@ContentType:
296     lang:Perl
297     @@ForCheck: ManakaiDOM|ForClass
298    
299     ElementTypeBinding:
300     @Name: Test
301     @ElementType:
302     dis:ResourceDef
303     @ShadowContent:
304     @@rdf:type: test|StandaloneTest
305     @@ForCheck: ManakaiDOM|ForClass
306    
307     ElementTypeBinding:
308     @Name: enDesc
309     @ElementType:
310     dis:Description
311     @ShadowContent:
312     @@lang:en
313    
314     ElementTypeBinding:
315     @Name: enImplNote
316     @ElementType:
317     dis:ImplNote
318     @ShadowContent:
319     @@lang:en
320    
321     ElementTypeBinding:
322     @Name: enFN
323     @ElementType:
324     dis:FullName
325     @ShadowContent:
326     @@lang:en
327 wakaba 1.2
328     ## -- Implementation
329    
330     IFClsDef:
331     @IFQName: MCEncodeImplementation
332     @ClsQName: ManakaiMCEncodeImplementation
333    
334 wakaba 1.8 @domperl:implementedByObjectsOf: c|DOMImplementation
335     @domperl:classImplementedByObjectsOf: c|ManakaiDOMImplementation
336 wakaba 1.2
337     @enDesc:
338     The <IF::MCEncodeImplementation> interface provides
339     factory methods to create <IF::MCEncodeHandle> objects.
340    
341     @f:provides: MCEncodeFeature10
342    
343     @Test:
344     @@enDesc:
345     The implementation registry should know this class when the
346     module is loaded.
347     @@PerlDef:
348     I: {
349     for my $impl (@{$Message::DOM::ImplementationRegistry
350 wakaba 1.10 ->get_dom_implementation_list
351 wakaba 1.2 ({<Q::fe|MCEncode> => '1.0'})}) {
352     if ($impl->isa (<IFName::MCEncodeImplementation>)) {
353     last I;
354     }
355     }
356     $test->assert_never;
357     } # I
358    
359     @Method:
360     @@Name: createMCDecodeHandle
361     @@enDesc:
362     Creates an <IF::MCDecodeHandle> object.
363     @@Param:
364     @@@Name: charset
365     @@@Type: String
366     @@@enDesc:
367     The URI that identifies the charset
368     in which strings are written to the <P::byteStream>.
369     @@Param:
370     @@@Name: byteStream
371     @@@Type: DISPerl|Filehandle||ManakaiDOM|all
372     @@@enDesc:
373     A reference to the filehandle that contains the byte
374     stream read by the <IF::MCDecodeHandle> object.
375 wakaba 1.3 @@Param:
376     @@@Name: onerror
377     @@@Type: DISPerl|CODE||ManakaiDOM|all
378     @@@enDesc:
379     A subroutine that is called back when an error is encountered.
380     The <A::MCDecodeHandle.onerror> attribute of the
381     created object is set to this parameter value.
382     @@@nullCase:
383     @@@@enDesc:
384     The <A::MCDecodeHandle.onerror> attribute is set to
385     a do-nothing subroutine.
386 wakaba 1.2 @@Return:
387     @@@Type: MCDecodeHandle
388     @@@enDesc:
389     The newly created filehandle object.
390     @@@nullCase:
391     @@@@enDesc:
392     If the implementation does not support <P::charset>.
393     @@@PerlDef:
394     my $csdef = $Message::Charset::Encode::CharsetDef->{$charset};
395     my $obj = {<H::mce|filehandle> => $byteStream,
396     <H::mce|charset> => $charset,
397     <H::mce|characterQueue> => [],
398     <H::mce|byteBuffer> => '',
399 wakaba 1.3 <H::mce|onerror> => $onerror || sub {}};
400 wakaba 1.2 if ($csdef->{uri}->{<Q::xml-auto-charset:>} or
401     $charset eq <Q::xml-auto-charset:>) {
402 wakaba 1.3 __DEEP{
403     my $b = '';
404     $csdef = $Message::Charset::Encode::CharsetDef
405     ->{<Q::cs|Perl.utf-8>}; # UTF-8 with no BOM
406     $obj->{<H::mce|inputEncoding>} = 'utf-8';
407     if (read $obj->{<H::mce|filehandle>}, $b, 256) {
408     no warnings "substr";
409     no warnings "uninitialized";
410     if (substr ($b, 0, 1) eq "<") {
411     if (substr ($b, 1, 1) eq "?") { # ASCII8
412     __CODE{XMLEntity.guess::
413     $ascii => $b,
414     $errorCondition => {
415     not $csdef->{<H::cs|ASCII8>} or
416     $csdef->{<H::cs|BOM.Required>}
417     },
418     $defaultURI => {<Q::cs|Perl.utf-8>},
419     $defaultName => 'utf-8',
420     $restoreBOM => {},
421     }__;
422     if (defined $csdef->{<H::cs|noBOMVariant>}) {
423     $csdef = $Message::Charset::Encode::CharsetDef
424     ->{$csdef->{<H::cs|noBOMVariant>}};
425     }
426     } elsif (substr ($b, 1, 1) eq "\x00") {
427     if (substr ($b, 2, 2) eq "?\x00") { # ASCII16LE
428     my $c = $b; $c =~ tr/\x00//d;
429     __CODE{XMLEntity.guess::
430     $ascii => $c,
431     $errorCondition => {
432     not $csdef->{<H::cs|ASCII16>} or
433     $csdef->{<H::cs|ASCII16BE>} or
434     $csdef->{<H::cs|BOM.Required>}
435     },
436     $defaultURI => {<Q::cs|Perl.utf-8>},
437     $defaultName => 'utf-8',
438     $restoreBOM => {},
439     }__;
440     if (defined $csdef->{<H::cs|noBOMVariant16LE>}) {
441     $csdef = $Message::Charset::Encode::CharsetDef
442     ->{$csdef->{<H::cs|noBOMVariant16LE>}};
443     }
444     } elsif (substr ($b, 2, 2) eq "\x00\x00") { # ASCII32Endian4321
445     my $c = $b; $c =~ tr/\x00//d;
446     __CODE{XMLEntity.guess::
447     $ascii => $c,
448     $errorCondition => {
449     not $csdef->{<H::cs|ASCII32>} or
450     $csdef->{<H::cs|ASCII32Endian1234>} or
451     $csdef->{<H::cs|ASCII32Endian2143>} or
452     $csdef->{<H::cs|ASCII32Endian3412>} or
453     $csdef->{<H::cs|BOM.Required>}
454     },
455     $defaultURI => {<Q::cs|Perl.utf-8>},
456     $defaultName => 'utf-8',
457     $restoreBOM => {},
458     }__;
459     if (defined $csdef->{<H::cs|noBOMVariant32Endian4321>}) {
460     $csdef = $Message::Charset::Encode::CharsetDef
461     ->{$csdef->{<H::cs|noBOMVariant32Endian4321>}};
462     }
463     }
464     }
465     } elsif (substr ($b, 0, 3) eq "\xEF\xBB\xBF") { # UTF8
466     $obj->{<H::mce|hasBOM>} = true;
467     substr ($b, 0, 3) = '';
468     my $c = $b;
469     __CODE{XMLEntity.guess::
470     $ascii => $c,
471     $errorCondition => {
472     not $csdef->{<H::cs|UTF8EncodingScheme>} or
473     not $csdef->{<H::cs|BOM.Allowed>}
474     },
475     $defaultURI => {<Q::cs|Perl.utf-8>},
476     $defaultName => 'utf-8',
477     $restoreBOM => {},
478     }__;
479     if (defined $csdef->{<H::cs|noBOMVariant>}) {
480     $csdef = $Message::Charset::Encode::CharsetDef
481     ->{$csdef->{<H::cs|noBOMVariant>}};
482     }
483     } elsif (substr ($b, 0, 2) eq "\x00<") {
484     if (substr ($b, 2, 2) eq "\x00?") { # ASCII16BE
485     my $c = $b; $c =~ tr/\x00//d;
486     __CODE{XMLEntity.guess::
487     $ascii => $c,
488     $errorCondition => {
489     not $csdef->{<H::cs|ASCII16>} or
490     $csdef->{<H::cs|ASCII16LE>} or
491     $csdef->{<H::cs|BOM.Required>}
492     },
493     $defaultURI => {<Q::cs|Perl.utf-8>},
494     $defaultName => 'utf-8',
495     $restoreBOM => {},
496     }__;
497     if (defined $csdef->{<H::cs|noBOMVariant16BE>}) {
498     $csdef = $Message::Charset::Encode::CharsetDef
499     ->{$csdef->{<H::cs|noBOMVariant16BE>}};
500     }
501     } elsif (substr ($b, 2, 2) eq "\x00\x00") { # ASCII32Endian3412
502     my $c = $b; $c =~ tr/\x00//d;
503     __CODE{XMLEntity.guess::
504     $ascii => $c,
505     $errorCondition => {
506     not $csdef->{<H::cs|ASCII32>} or
507     $csdef->{<H::cs|ASCII32Endian1234>} or
508     $csdef->{<H::cs|ASCII32Endian2143>} or
509     $csdef->{<H::cs|ASCII32Endian4321>} or
510     $csdef->{<H::cs|BOM.Required>}
511     },
512     $defaultURI => {<Q::cs|Perl.utf-8>},
513     $defaultName => 'utf-8',
514     $restoreBOM => {},
515     }__;
516     if (defined $csdef->{<H::cs|noBOMVariant32Endian3412>}) {
517     $csdef = $Message::Charset::Encode::CharsetDef
518     ->{$csdef->{<H::cs|noBOMVariant32Endian3412>}};
519     }
520     }
521     } elsif (substr ($b, 0, 2) eq "\xFE\xFF") {
522     if (substr ($b, 2, 2) eq "\x00<") { # ASCII16BE
523     $obj->{<H::mce|hasBOM>} = true;
524     substr ($b, 0, 2) = '';
525     my $c = $b; $c =~ tr/\x00//d;
526     __CODE{XMLEntity.guess::
527     $ascii => $c,
528     $errorCondition => {
529     not $csdef->{<H::cs|ASCII16>} or
530     $csdef->{<H::cs|ASCII16LE>} or
531     not $csdef->{<H::cs|BOM.Allowed>}
532     },
533     $defaultURI => {<Q::cs|Perl.utf-16be>},
534     $defaultName => 'utf-16',
535     $restoreBOM => {},
536     }__;
537     if (defined $csdef->{<H::cs|noBOMVariant16BE>}) {
538     $csdef = $Message::Charset::Encode::CharsetDef
539     ->{$csdef->{<H::cs|noBOMVariant16BE>}};
540     }
541     } elsif (substr ($b, 2, 2) eq "\x00\x00") { # ASCII32Endian3412
542     $obj->{<H::mce|hasBOM>} = true;
543     substr ($b, 0, 4) = '';
544     my $c = $b; $c =~ tr/\x00//d;
545     __CODE{XMLEntity.guess::
546     $ascii => $c,
547     $errorCondition => {
548     not $csdef->{<H::cs|ASCII32>} or
549     $csdef->{<H::cs|ASCII32Endian1234>} or
550     $csdef->{<H::cs|ASCII32Endian2143>} or
551     $csdef->{<H::cs|ASCII32Endian4321>} or
552     not $csdef->{<H::cs|BOM.Allowed>}
553     },
554     $defaultURI => {<Q::cs|Perl.utf-16be>},
555     $defaultName => 'utf-16',
556     $restoreBOM => {
557     $obj->{<H::mce|byteBuffer>} .= "\x00\x00";
558     },
559     }__;
560     if (defined $csdef->{<H::cs|noBOMVariant32Endian3412>}) {
561     $csdef = $Message::Charset::Encode::CharsetDef
562     ->{$csdef->{<H::cs|noBOMVariant32Endian3412>}};
563     }
564     } else {
565     $csdef = $Message::Charset::Encode::CharsetDef
566     ->{<Q::cs|Perl.utf-16be>};
567     $obj->{<H::mce|inputEncoding>} = 'utf-16';
568     substr ($b, 0, 2) = '';
569     $obj->{<H::mce|hasBOM>} = true;
570     }
571     } elsif (substr ($b, 0, 2) eq "\xFF\xFE") {
572     if (substr ($b, 2, 2) eq "<\x00") { # ASCII16LE
573     $obj->{<H::mce|hasBOM>} = true;
574     substr ($b, 0, 2) = '';
575     my $c = $b; $c =~ tr/\x00//d;
576     __CODE{XMLEntity.guess::
577     $ascii => $c,
578     $errorCondition => {
579     not $csdef->{<H::cs|ASCII16>} or
580     $csdef->{<H::cs|ASCII16BE>} or
581     not $csdef->{<H::cs|BOM.Allowed>}
582     },
583     $defaultURI => {<Q::cs|Perl.utf-16le>},
584     $defaultName => 'utf-16',
585     $restoreBOM => {},
586     }__;
587     if (defined $csdef->{<H::cs|noBOMVariant16LE>}) {
588     $csdef = $Message::Charset::Encode::CharsetDef
589     ->{$csdef->{<H::cs|noBOMVariant16LE>}};
590     }
591     } elsif (substr ($b, 2, 2) eq "\x00\x00") { # ASCII32Endian4321
592     $obj->{<H::mce|hasBOM>} = true;
593     substr ($b, 0, 4) = '';
594     my $c = $b; $c =~ tr/\x00//d;
595     __CODE{XMLEntity.guess::
596     $ascii => $c,
597     $errorCondition => {
598     not $csdef->{<H::cs|ASCII32>} or
599     $csdef->{<H::cs|ASCII32Endian1234>} or
600     $csdef->{<H::cs|ASCII32Endian2143>} or
601     $csdef->{<H::cs|ASCII32Endian3412>} or
602     not $csdef->{<H::cs|BOM.Allowed>}
603     },
604     $defaultURI => {<Q::cs|Perl.utf-16le>},
605     $defaultName => 'utf-16',
606     $restoreBOM => {
607     $obj->{<H::mce|byteBuffer>} .= "\x00\x00";
608     },
609     }__;
610     if (defined $csdef->{<H::cs|noBOMVariant32Endian4321>}) {
611     $csdef = $Message::Charset::Encode::CharsetDef
612     ->{$csdef->{<H::cs|noBOMVariant32Endian4321>}};
613     }
614     } else {
615     $csdef = $Message::Charset::Encode::CharsetDef
616     ->{<Q::cs|Perl.utf-16le>};
617     $obj->{<H::mce|inputEncoding>} = 'utf-16';
618     substr ($b, 0, 2) = '';
619     $obj->{<H::mce|hasBOM>} = true;
620     }
621     } elsif (substr ($b, 0, 2) eq "\x00\x00") {
622     if (substr ($b, 2, 2) eq "\x00<") { # ASCII32Endian1234
623     my $c = $b; $c =~ tr/\x00//d;
624     __CODE{XMLEntity.guess::
625     $ascii => $c,
626     $errorCondition => {
627     not $csdef->{<H::cs|ASCII32>} or
628     $csdef->{<H::cs|ASCII32Endian2143>} or
629     $csdef->{<H::cs|ASCII32Endian3412>} or
630     $csdef->{<H::cs|ASCII32Endian4321>} or
631     $csdef->{<H::cs|BOM.Required>}
632     },
633     $defaultURI => {<Q::cs|Perl.utf-8>},
634     $defaultName => 'utf-8',
635     $restoreBOM => {},
636     }__;
637     if (defined $csdef->{<H::cs|noBOMVariant32Endian1234>}) {
638     $csdef = $Message::Charset::Encode::CharsetDef
639     ->{$csdef->{<H::cs|noBOMVariant32Endian1234>}};
640     }
641     } elsif (substr ($b, 2, 2) eq "<\x00") { # ASCII32Endian2143
642     my $c = $b; $c =~ tr/\x00//d;
643     __CODE{XMLEntity.guess::
644     $ascii => $c,
645     $errorCondition => {
646     not $csdef->{<H::cs|ASCII32>} or
647     $csdef->{<H::cs|ASCII32Endian1234>} or
648     $csdef->{<H::cs|ASCII32Endian3412>} or
649     $csdef->{<H::cs|ASCII32Endian4321>} or
650     $csdef->{<H::cs|BOM.Required>}
651     },
652     $defaultURI => {<Q::cs|Perl.utf-8>},
653     $defaultName => 'utf-8',
654     $restoreBOM => {},
655     }__;
656     if (defined $csdef->{<H::cs|noBOMVariant32Endian2143>}) {
657     $csdef = $Message::Charset::Encode::CharsetDef
658     ->{$csdef->{<H::cs|noBOMVariant32Endian2143>}};
659     }
660     } elsif (substr ($b, 2, 2) eq "\xFE\xFF") { # ASCII32Endian1234
661     $obj->{<H::mce|hasBOM>} = true;
662     substr ($b, 0, 4) = '';
663     my $c = $b; $c =~ tr/\x00//d;
664     __CODE{XMLEntity.guess::
665     $ascii => $c,
666     $errorCondition => {
667     not $csdef->{<H::cs|ASCII32>} or
668     $csdef->{<H::cs|ASCII32Endian2143>} or
669     $csdef->{<H::cs|ASCII32Endian3412>} or
670     $csdef->{<H::cs|ASCII32Endian4321>} or
671     $csdef->{<H::cs|BOM.Required>}
672     },
673     $defaultURI => {<Q::cs|Perl.utf-8>},
674     $defaultName => 'utf-8',
675     $restoreBOM => {
676     $obj->{<H::mce|hasBOM>} = false;
677     $obj->{<H::mce|byteBuffer>} .= "\x00\x00\xFE\xFF";
678     },
679     }__;
680     if (defined $csdef->{<H::cs|noBOMVariant32Endian1234>}) {
681     $csdef = $Message::Charset::Encode::CharsetDef
682     ->{$csdef->{<H::cs|noBOMVariant32Endian1234>}};
683     }
684     } elsif (substr ($b, 2, 2) eq "\xFF\xFE") { # ASCII32Endian2143
685     $obj->{<H::mce|hasBOM>} = true;
686     substr ($b, 0, 4) = '';
687     my $c = $b; $c =~ tr/\x00//d;
688     __CODE{XMLEntity.guess::
689     $ascii => $c,
690     $errorCondition => {
691     not $csdef->{<H::cs|ASCII32>} or
692     $csdef->{<H::cs|ASCII32Endian1234>} or
693     $csdef->{<H::cs|ASCII32Endian3412>} or
694     $csdef->{<H::cs|ASCII32Endian4321>} or
695     $csdef->{<H::cs|BOM.Required>}
696     },
697     $defaultURI => {<Q::cs|Perl.utf-8>},
698     $defaultName => 'utf-8',
699     $restoreBOM => {
700     $obj->{<H::mce|hasBOM>} = false;
701     $obj->{<H::mce|byteBuffer>} .= "\x00\x00\xFF\xFE";
702     },
703     }__;
704     if (defined $csdef->{<H::cs|noBOMVariant32Endian2143>}) {
705     $csdef = $Message::Charset::Encode::CharsetDef
706     ->{$csdef->{<H::cs|noBOMVariant32Endian2143>}};
707     }
708     }
709     # \x4C\x6F\xA7\x94 EBCDIC
710     } # buffer
711     $obj->{<H::mce|byteBuffer>} .= $b;
712     } # read
713     }__;
714 wakaba 1.2 } elsif ($csdef->{uri}->{<Q::cs|XML.utf-8>}) {
715 wakaba 1.3 ## BOM is optional.
716     __DEEP{
717     my $b = '';
718     if (read $obj->{<H::mce|filehandle>}, $b, 3) {
719     if ($b eq "\xEF\xBB\xBF") {
720     $obj->{<H::mce|hasBOM>} = true;
721     } else {
722     $obj->{<H::mce|byteBuffer>} .= $b;
723     }
724     }
725     $csdef = $Message::Charset::Encode::CharsetDef
726     ->{<Q::cs|Perl.utf-8>}; # UTF-8 with no BOM
727     }__;
728 wakaba 1.2 } elsif ($csdef->{uri}->{<Q::cs|XML.utf-16>}) {
729 wakaba 1.3 ## BOM is mandated.
730     __DEEP{
731     my $b = '';
732     if (read $obj->{<H::mce|filehandle>}, $b, 2) {
733     if ($b eq "\xFE\xFF") {
734     $obj->{<H::mce|hasBOM>} = true;
735     $csdef = $Message::Charset::Encode::CharsetDef
736     ->{<Q::cs|Perl.utf-16be>}; # UTF-16BE with no BOM
737     } elsif ($b eq "\xFF\xFE") {
738     $obj->{<H::mce|hasBOM>} = true;
739     $csdef = $Message::Charset::Encode::CharsetDef
740     ->{<Q::cs|Perl.utf-16le>}; # UTF-16LE with no BOM
741     } else {
742     $obj->{<H::mce|onerror>}
743     ->($onerror, null, <Q::cs|no-bom-error>,
744     charset_uri => $charset);
745     $obj->{<H::mce|hasBOM>} = false;
746     $obj->{<H::mce|byteBuffer>} .= $b;
747     $csdef = $Message::Charset::Encode::CharsetDef
748     ->{<Q::cs|Perl.utf-16be>}; # UTF-16BE with no BOM
749     }
750     } else {
751     $obj->{<H::mce|onerror>}
752     ->($onerror, null, <Q::cs|no-bom-error>,
753     charset_uri => $charset);
754     $obj->{<H::mce|hasBOM>} = false;
755     $csdef = $Message::Charset::Encode::CharsetDef
756     ->{<Q::cs|Perl.utf-16be>}; # UTF-16BE with no BOM
757     }
758     }__;
759     }
760 wakaba 1.2
761 wakaba 1.6 if ($csdef->{uri}->{<Q::cs|XML.iso-2022-jp>}) {
762     $obj->{<H::mce|State.2440>} = 'gl-jis-1997-swapped';
763     $obj->{<H::mce|State.2442>} = 'gl-jis-1997';
764     $obj->{<H::mce|state>} = <H::mce|State.2842>;
765     $r = bless $obj, <ClassName::ManakaiMCISO2022JPDecodeHandle>;
766     require Encode::GLJIS1997Swapped;
767     require Encode::GLJIS1997;
768     undef $r unless Encode::find_encoding ($obj->{<H::mce|State.2440>});
769     undef $r unless Encode::find_encoding ($obj->{<H::mce|State.2442>});
770     } elsif ($csdef->{uri}->{<Q::icharset|iso-2022-jp>}) {
771     $obj->{<H::mce|State.2440>} = 'gl-jis-1978';
772     $obj->{<H::mce|State.2442>} = 'gl-jis-1983';
773     $obj->{<H::mce|state>} = <H::mce|State.2842>;
774     $r = bless $obj, <ClassName::ManakaiMCISO2022JPDecodeHandle>;
775     require Encode::GLJIS1978;
776     require Encode::GLJIS1983;
777     undef $r unless Encode::find_encoding ($obj->{<H::mce|State.2440>});
778     undef $r unless Encode::find_encoding ($obj->{<H::mce|State.2442>});
779     } elsif (defined $csdef->{<H::cs|perlName>}->[0]) {
780 wakaba 1.5 if ($csdef->{uri}->{<Q::cs|XML.euc-jp>} or
781     $csdef->{uri}->{<Q::icharset|euc-jp>}) {
782     $obj->{<H::mce|perlEncodingName>} = $csdef->{<H::cs|perlName>}->[0];
783     $r = bless $obj, <ClassName::ManakaiMCEUCJPDecodeHandle>;
784     require Encode::EUCJP1997;
785     undef $r unless Encode::find_encoding
786     ($obj->{<H::mce|perlEncodingName>});
787     } elsif ($csdef->{uri}->{<Q::cs|XML.shift_jis>} or
788     $csdef->{uri}->{<Q::icharset|shift_jis>}) {
789     $obj->{<H::mce|perlEncodingName>} = $csdef->{<H::cs|perlName>}->[0];
790     $r = bless $obj, <ClassName::ManakaiMCShiftJISDecodeHandle>;
791     require Encode::ShiftJIS1997;
792     undef $r unless Encode::find_encoding
793     ($obj->{<H::mce|perlEncodingName>});
794     } elsif ($csdef->{<H::cs|isBlockSafe>}) {
795 wakaba 1.2 $obj->{<H::mce|perlEncodingName>} = $csdef->{<H::cs|perlName>}->[0];
796     $r = bless $obj, <ClassName::ManakaiMCDecodeHandle>;
797     require Encode;
798     undef $r unless Encode::find_encoding
799     ($obj->{<H::mce|perlEncodingName>});
800     }
801     }
802 wakaba 1.5
803 wakaba 1.3 unless (defined $r) {
804     __DEEP{
805     $obj->{<H::mce|onerror>}
806     ->($onerror, null, <Q::cs|charset-not-supported-error>,
807     charset_uri => $charset);
808     }__;
809     }
810    
811     @@CODE:
812     @@@QName: XMLEntity.guess
813     @@@PerlDef:
814     if ($ascii =~ /^<\?xml\s+(?:version\s*=\s*["'][^"']*["']\s*)?
815     encoding\s*=\s*["']([^"']*)/x) {
816     $obj->{<H::mce|inputEncoding>} = lc $1;
817     my $__uri = $self-><M::MCEncodeImplementation.getURIFromCharsetName>
818     (<Q::cs|xmlName>, $obj->{<H::mce|inputEncoding>});
819     $csdef = $Message::Charset::Encode::CharsetDef->{$__uri};
820     if ($errorCondition) {
821     $obj->{<H::mce|onerror>}
822     ->($obj->{<H::mce|onerror>}, null,
823     <Q::cs|charset-name-mismatch-error>,
824     charset_uri => $__uri,
825     charset_name => $obj->{<H::mce|inputEncoding>});
826     }
827     } else {
828     $csdef = $Message::Charset::Encode::CharsetDef->{$defaultURI};
829     $obj->{<H::mce|inputEncoding>} = $defaultName;
830     $restoreBOM;
831     }
832 wakaba 1.2
833     @@Test:
834     @@@QName: MCEncodeImpl.createMCDecodeHandle.test
835     @@@PerlDef:
836     my $impl;
837     __CODE{createImplForTest:: $impl => $impl}__;
838    
839     my $byte = 'a';
840     open my $fh, '>', \$byte;
841     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
842     (<Q::cs|Perl.utf8>, $fh);
843    
844     $test->id ('interface');
845     $test->assert_isa ($efh, <IFName::MCDecodeHandle>);
846    
847 wakaba 1.3 $test->id ('onerr');
848     $test->assert_isa ($efh-><AG::MCDecodeHandle.onerror>, 'CODE');
849    
850 wakaba 1.2 @@Test:
851     @@@QName: MCEncodeImpl.createMCXMLDecodeHandle.test
852     @@@PerlDef:
853     my $impl;
854     __CODE{createImplForTest:: $impl => $impl}__;
855    
856     my $byte = 'a';
857     open my $fh, '<', \$byte;
858     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
859     (<Q::xml-auto-charset:>, $fh);
860    
861     $test->id ('interface');
862     $test->assert_isa ($efh, <IFName::MCDecodeHandle>);
863    
864 wakaba 1.3 $test->id ('onerr');
865     $test->assert_isa ($efh-><AG::MCDecodeHandle.onerror>, 'CODE');
866    
867     @@Test:
868     @@@QName: MCEncodeImpl.createMCDecodeHandle.3.test
869     @@@PerlDef:
870     my $impl;
871     __CODE{createImplForTest:: $impl => $impl}__;
872    
873     my $errors = 0;
874    
875     my $byte = 'a';
876     open my $fh, '>', \$byte;
877     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
878     (q<http://www.charset.example/>, $fh, sub {
879     $errors++;
880     $test->id ('errortype');
881     $test->assert_equals
882     ($_[2],
883     <Q::cs|charset-not-supported-error>);
884     });
885    
886     $test->id ('errors');
887     $test->assert_num_equals (actual_value => $errors, expected_value => 1);
888    
889     $test->id ('return');
890     $test->assert_null ($efh);
891 wakaba 1.2
892 wakaba 1.5 @@Test:
893     @@@QName: MCEncodeImpl.createMCDecodeHandle.4.test
894     @@@PerlDef:
895     my $impl;
896     __CODE{createImplForTest:: $impl => $impl}__;
897    
898     my $byte = 'a';
899     open my $fh, '<', \$byte;
900     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
901     (<Q::cs|XML.euc-jp>, $fh);
902    
903     $test->id ('interface');
904     $test->assert_isa ($efh, <IFName::MCDecodeHandle>);
905    
906     $test->id ('class');
907     $test->assert_isa ($efh, <ClassName::ManakaiMCEUCJPDecodeHandle>);
908    
909     $test->id ('onerr');
910     $test->assert_isa ($efh-><AG::MCDecodeHandle.onerror>, 'CODE');
911    
912     @@Test:
913     @@@QName: MCEncodeImpl.createMCDecodeHandle.5.test
914     @@@PerlDef:
915     my $impl;
916     __CODE{createImplForTest:: $impl => $impl}__;
917    
918     my $byte = 'a';
919     open my $fh, '<', \$byte;
920     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
921     (<Q::cs|XML.euc-jp>, $fh);
922    
923     $test->id ('interface');
924     $test->assert_isa ($efh, <IFName::MCDecodeHandle>);
925    
926     $test->id ('class');
927     $test->assert_isa ($efh, <ClassName::ManakaiMCEUCJPDecodeHandle>);
928    
929     $test->id ('onerr');
930     $test->assert_isa ($efh-><AG::MCDecodeHandle.onerror>, 'CODE');
931    
932 wakaba 1.6 @@Test:
933     @@@QName: MCEncodeImpl.createMCDecodeHandle.6.test
934     @@@PerlDef:
935     my $impl;
936     __CODE{createImplForTest:: $impl => $impl}__;
937    
938     my $byte = 'a';
939     open my $fh, '<', \$byte;
940     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
941     (<Q::cs|XML.iso-2022-jp>, $fh);
942    
943     $test->id ('interface');
944     $test->assert_isa ($efh, <IFName::MCDecodeHandle>);
945    
946     $test->id ('class');
947     $test->assert_isa ($efh, <ClassName::ManakaiMCISO2022JPDecodeHandle>);
948    
949     $test->id ('onerr');
950     $test->assert_isa ($efh-><AG::MCDecodeHandle.onerror>, 'CODE');
951    
952     @@Test:
953     @@@QName: MCEncodeImpl.createMCDecodeHandle.7.test
954     @@@PerlDef:
955     my $impl;
956     __CODE{createImplForTest:: $impl => $impl}__;
957    
958     my $byte = 'a';
959     open my $fh, '<', \$byte;
960     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
961     (<Q::icharset|iso-2022-jp>, $fh);
962    
963     $test->id ('interface');
964     $test->assert_isa ($efh, <IFName::MCDecodeHandle>);
965    
966     $test->id ('class');
967     $test->assert_isa ($efh, <ClassName::ManakaiMCISO2022JPDecodeHandle>);
968    
969     $test->id ('onerr');
970     $test->assert_isa ($efh-><AG::MCDecodeHandle.onerror>, 'CODE');
971    
972 wakaba 1.2 @Method:
973     @@Name: getURIFromCharsetName
974     @@enDesc:
975     Returns a DOM URI that identifies a charset.
976     @@Param:
977     @@@Name: domain
978     @@@Type: String
979     @@@enDesc:
980     A DOM URI that identifies the context in which the charset
981     name is used.
982     @@Param:
983     @@@Name: name
984     @@@Type: String
985     @@@enDesc:
986     The charset name to convert.
987     @@Return:
988     @@@Type: String
989     @@@enDesc:
990     A DOM URI that identifies <P::name>.
991     @@@nullCase:
992     @@@@enDesc:
993     The implementation was unable to resolve <P::name> to a URI.
994     @@@PerlDef:
995     if ({
996     <Q::cs|ietfName> => true,
997     <Q::cs|xmlName> => true,
998     }->{$domain}) {
999 wakaba 1.3 $name = lc $name;
1000     if ($domain eq <Q::cs|ietfName>) {
1001     $r = <Q::icharset|> . $name;
1002     } elsif ($domain eq <Q::cs|xmlName>) {
1003     $r = <Q::cs|XML.> . $name;
1004     }
1005    
1006     unless ($Message::Charset::Encode::CharsetDef->{$r}) {
1007     U: for my $uri (keys %$Message::Charset::Encode::CharsetDef) {
1008     for (@{$Message::Charset::Encode::CharsetDef->{$uri}->{+{
1009     <Q::cs|ietfName> => <H::cs|ietfName>,
1010     <Q::cs|xmlName> => <H::cs|xmlName>,
1011     }->{$domain}} or []}) {
1012     if ($_ eq $name) {
1013     $r = $uri;
1014     last U;
1015     }
1016 wakaba 1.2 }
1017 wakaba 1.3 } # U
1018 wakaba 1.2 }
1019 wakaba 1.3 } else {
1020     $r = null;
1021     }
1022    
1023     @@Test:
1024     @@@QName: MCEncodeImpl.name2uri.test
1025     @@@PerlDef:
1026     my $impl;
1027     __CODE{createImplForTest:: $impl => $impl}__;
1028    
1029     for (
1030     [<Q::icharset|utf-8>, 'utf-8'],
1031     [<Q::icharset|x-no-such-charset>, 'x-no-such-charset'],
1032     [<Q::icharset|utf-8>, 'UTF-8'],
1033     [<Q::icharset|utf-8>, 'uTf-8'],
1034     [<Q::icharset|utf-16be>, 'utf-16be'],
1035     ) {
1036     $test->id ('ietfname2uri.'.$_->[1]);
1037     my $iname = $impl-><M::MCEncodeImplementation.getURIFromCharsetName>
1038     (<Q::cs|ietfName>, $_->[1]);
1039     $test->assert_equals ($iname, $_->[0]);
1040     }
1041    
1042     for (
1043     [<Q::cs|XML.utf-8>, 'utf-8'],
1044     [<Q::cs|XML.x-no-such-charset>, 'x-no-such-charset'],
1045     [<Q::cs|XML.utf-8>, 'UTF-8'],
1046     [<Q::cs|XML.utf-8>, 'uTf-8'],
1047     [<Q::icharset|utf-16be>, 'utf-16be'],
1048     ) {
1049     $test->id ('xmlname2uri.'.$_->[1]);
1050     my $iname = $impl-><M::MCEncodeImplementation.getURIFromCharsetName>
1051     (<Q::cs|xmlName>, $_->[1]);
1052     $test->assert_equals ($iname, $_->[0]);
1053 wakaba 1.2 }
1054    
1055     @Method:
1056     @@Name: getCharsetNameFromURI
1057     @@enDesc:
1058     Returns a name for the charset identified by a DOM URI.
1059     @@Param:
1060     @@@Name: domain
1061     @@@Type: String
1062     @@@enDesc:
1063     A DOM URI that identifies the context in which the charset
1064     name is used.
1065     @@Param:
1066     @@@Name: uri
1067     @@@Type: String
1068     @@@enDesc:
1069     A DOM URI of the charset.
1070     @@Return:
1071     @@@Type: String
1072     @@@enDesc:
1073     A charset name that identifies <P::uri>.
1074     @@@nullCase:
1075     @@@@enDesc:
1076     The implementation was unable to find the charset name
1077     for the <P::uri> that can be used in <P::domain> context.
1078     @@@PerlDef:
1079     if ({
1080     <Q::cs|ietfName> => true,
1081     <Q::cs|xmlName> => true,
1082     }->{$domain}) {
1083 wakaba 1.3 $r = $Message::Charset::Encode::CharsetDef->{$uri}->{+{
1084     <Q::cs|ietfName> => <H::cs|ietfName>,
1085     <Q::cs|xmlName> => <H::cs|xmlName>,
1086     }->{$domain}}->[0];
1087     unless (defined $r) {
1088     if ($domain eq <Q::cs|ietfName> and
1089     substr ($uri, 0, length <Q::icharset|>) eq <Q::icharset|>) {
1090     $r = substr ($uri, length <Q::icharset|>);
1091     } elsif ($domain eq <Q::cs|xmlName> and
1092     substr ($uri, 0, length <Q::cs|XML.>) eq <Q::cs|XML.>) {
1093     $r = substr ($uri, length <Q::cs|XML.>);
1094     }
1095     }
1096 wakaba 1.2 } else {
1097     $r = null;
1098     }
1099    
1100 wakaba 1.3 @@Test:
1101     @@@QName: MCEncodeImpl.uri2name.test
1102     @@@PerlDef:
1103     my $impl;
1104     __CODE{createImplForTest:: $impl => $impl}__;
1105    
1106     for (
1107     [<Q::icharset|utf-8>, 'utf-8'],
1108     [<Q::icharset|x-no-such-charset>, 'x-no-such-charset'],
1109     [q<http://charset.example/>, null],
1110     ) {
1111     $test->id ('uri2ietfname.'.$_->[0]);
1112     my $iname = $impl-><M::MCEncodeImplementation.getCharsetNameFromURI>
1113     (<Q::cs|ietfName>, $_->[0]);
1114     $test->assert_equals ($iname, $_->[1]);
1115     }
1116    
1117     for (
1118     [<Q::cs|XML.utf-8>, 'utf-8'],
1119     [<Q::cs|XML.x-no-such-charset>, 'x-no-such-charset'],
1120     [q<http://charset.example/>, null],
1121     ) {
1122     $test->id ('uri2xmlname.'.$_->[0]);
1123     my $iname = $impl-><M::MCEncodeImplementation.getCharsetNameFromURI>
1124     (<Q::cs|xmlName>, $_->[0]);
1125     $test->assert_equals ($iname, $_->[1]);
1126     }
1127    
1128 wakaba 1.2 @CODE:
1129     @@QName: createImplForTest
1130     @@PerlDef:
1131 wakaba 1.9 $impl = <Class::c|ManakaiDOMImplementation>->_new;
1132 wakaba 1.2 ##MCEncodeImplementation
1133    
1134     ElementTypeBinding:
1135     @Name: CODE
1136     @ElementType:
1137     dis:ResourceDef
1138     @ShadowContent:
1139     @@DISCore:resourceType: DISPerl|BlockCode
1140     @@ForCheck: ManakaiDOM|ForClass
1141    
1142     IFClsDef:
1143     @IFQName: MCDecodeHandle
1144     @ClsQName: ManakaiMCDecodeHandle
1145    
1146     @enDesc:
1147     An <IF::MCDecodeHandle> provides the read access to a character
1148     stream.
1149    
1150     @enDesc:
1151     @@ddid: cestype
1152     @@ForCheck: ManakaiDOM|ForClass
1153     @@@:
1154     The class <Class::ManakaiMCDecodeHandle> can be used to
1155     encapsulate a byte filehandle with <Perl::Encode> call
1156     into a character filehandle-like object.
1157    
1158     The encoding <kwd:MUST> be stateless and signatureless. In addition,
1159     its <Perl::Encode> implementation <kwd:MUST> support
1160     the <Perl::FB_QUIET> flag.
1161    
1162     @Attr:
1163     @@Name: charset
1164     @@enDesc:
1165     A URI that identifies the charset of the handle.
1166     @@Type: String
1167     @@Get:
1168     @@@PerlDef:
1169     $r = $self->{<H::mce|charset>};
1170    
1171     @Attr:
1172 wakaba 1.3 @@Name: onerror
1173 wakaba 1.2 @@enDesc:
1174 wakaba 1.3 A callback function that is invoked when an error is encountered.
1175    
1176     {P:: The function will be invoked with arguments:
1177 wakaba 1.2
1178 wakaba 1.3 - <Perl::$self>::: The function itself.
1179 wakaba 1.2
1180 wakaba 1.3 - <Perl::$handle>::: The <IF::MCDecodeHandle> object.
1181     If the error is thrown during the
1182     construction of the object, it might
1183     be <DOM::null> instead.
1184    
1185     - <Perl::$errorType>::: The DOM URI that identifies the category
1186     of the error.
1187    
1188     - <Perl::%opt>::: Named parameters depending to the <Perl::$errorType>.
1189    
1190     It <kwd:MAY> throw an exception.
1191     }
1192 wakaba 1.2 @@Type: DISPerl|CODE||ManakaiDOM|all
1193     @@Get:
1194     @@@PerlDef:
1195 wakaba 1.3 $r = $self->{<H::mce|onerror>};
1196 wakaba 1.2 @@Set:
1197     @@@PerlDef:
1198 wakaba 1.3 $self->{<H::mce|onerror>} = $given;
1199 wakaba 1.2
1200     @@Test:
1201     @@@QName: MCDecodeHandle.onoctetstreamerror.test
1202     @@@PerlDef:
1203     my $impl;
1204     __CODE{createImplForTest:: $impl => $impl}__;
1205    
1206     my $byte = "a\xE3\x81\x82\x81a";
1207     open my $fh, '<', \$byte;
1208     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1209     (<Q::cs|Perl.utf8>, $fh);
1210    
1211     $test->id ('default');
1212 wakaba 1.3 $test->assert_isa ($efh-><AG::MCDecodeHandle.onerror>,
1213 wakaba 1.2 'CODE');
1214    
1215     $test->id ('get.set');
1216     my $sub1 = sub { return "2" };
1217 wakaba 1.3 $efh-><AS::MCDecodeHandle.onerror> ($sub1);
1218     my $sub2 = $efh-><AG::MCDecodeHandle.onerror>;
1219 wakaba 1.2 $test->assert_equals ($sub2, $sub1);
1220     $test->assert_equals ($sub2->(), "2");
1221    
1222     @Method:
1223     @@Name: getc
1224     @@enDesc:
1225     Returns the next character from the input.
1226     @@Return:
1227     @@@Type: String
1228     @@@enDesc:
1229     The next character.
1230     @@@nullCase:
1231     @@@@enDesc:
1232     If at the end of the file, or if there was an error, in which
1233     case <Perl::$!> is set.
1234     @@@PerlDef:
1235     if (@{$self->{<H::mce|characterQueue>}}) {
1236     $r = shift @{$self->{<H::mce|characterQueue>}};
1237     } else {
1238     __DEEP{
1239     my $error;
1240     if ($self->{<H::mce|continue>}) {
1241     if (read $self->{<H::mce|filehandle>},
1242     $self->{<H::mce|byteBuffer>}, 256,
1243     length $self->{<H::mce|byteBuffer>}) {
1244     #
1245     } else {
1246     $error = true;
1247     }
1248     $self->{<H::mce|continue>} = false;
1249 wakaba 1.3 } elsif (512 > length $self->{<H::mce|byteBuffer>}) {
1250 wakaba 1.2 read $self->{<H::mce|filehandle>},
1251     $self->{<H::mce|byteBuffer>}, 256,
1252     length $self->{<H::mce|byteBuffer>};
1253     }
1254    
1255     unless ($error) {
1256     my $string = Encode::decode ($self->{<H::mce|perlEncodingName>},
1257     $self->{<H::mce|byteBuffer>},
1258     Encode::FB_QUIET ());
1259     if (length $string) {
1260     push @{$self->{<H::mce|characterQueue>}}, split //, $string;
1261     $r = shift @{$self->{<H::mce|characterQueue>}};
1262 wakaba 1.3 if (length $self->{<H::mce|byteBuffer>}) {
1263     $self->{<H::mce|continue>} = true;
1264     }
1265 wakaba 1.2 } else {
1266 wakaba 1.3 if (length $self->{<H::mce|byteBuffer>}) {
1267     $error = true;
1268     } else {
1269     $r = null;
1270     }
1271 wakaba 1.2 }
1272 wakaba 1.3 }
1273     if ($error) {
1274 wakaba 1.2 $r = substr $self->{<H::mce|byteBuffer>}, 0, 1, '';
1275 wakaba 1.3 $self->{<H::mce|onerror>}
1276     ->($self->{<H::mce|onerror>}, $self,
1277     <Q::cs|illegal-octets-error>,
1278     octets => \$r);
1279 wakaba 1.2 }
1280     }__;
1281     }
1282    
1283     @@Test:
1284     @@@QName: MCDecodeHandle.getc.1.test
1285     @@@PerlDef:
1286     my $impl;
1287     __CODE{createImplForTest:: $impl => $impl}__;
1288    
1289     my $byte = "a\xE3\x81\x82\x81a";
1290     open my $fh, '<', \$byte;
1291     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1292     (<Q::cs|Perl.utf8>, $fh);
1293    
1294     my $error = null;
1295 wakaba 1.3 $efh-><AS::MCDecodeHandle.onerror> (sub {
1296     my ($self, $efh, $type, %opt) = @_;
1297     $error = ${$opt{octets}};
1298 wakaba 1.2 });
1299    
1300     $test->id (1);
1301     $test->assert_equals ($efh->getc, "a");
1302     $test->id ('1.err');
1303     $test->assert_null ($error);
1304    
1305     $test->id (2);
1306     $test->assert_equals ($efh->getc, "\x{3042}");
1307     $test->id ('1.err');
1308     $test->assert_null ($error);
1309    
1310     $test->id (3);
1311     $test->assert_equals ($efh->getc, "\x81");
1312     $test->id ('1.err');
1313     $test->assert_equals ($error, "\x81");
1314     $error = null;
1315    
1316     $test->id (4);
1317     $test->assert_equals ($efh->getc, "a");
1318     $test->id ('4.err');
1319     $test->assert_null ($error);
1320    
1321     $test->id ('eof');
1322     $test->assert_null ($efh->getc);
1323     $test->id ('eof.err');
1324     $test->assert_null ($error);
1325    
1326     @@Test:
1327     @@@QName: MCDecodeHandle.getc.2.test
1328     @@@PerlDef:
1329     my $impl;
1330     __CODE{createImplForTest:: $impl => $impl}__;
1331    
1332     my $byte = "a" x 256;
1333     $byte .= "b" x 256;
1334    
1335     open my $fh, '<', \$byte;
1336     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1337     (<Q::cs|Perl.utf8>, $fh);
1338    
1339     my $error = null;
1340 wakaba 1.3 $efh-><AS::MCDecodeHandle.onerror> (sub {
1341     my ($self, $efh, $type, %opt) = @_;
1342     $error = ${$opt{octets}};
1343 wakaba 1.2 });
1344    
1345     for my $i (0..255) {
1346     $test->id ("a.$i");
1347     $test->assert_equals ($efh->getc, "a");
1348     $test->id ("a.$i.err");
1349     $test->assert_null ($error);
1350     }
1351    
1352     for my $i (0..255) {
1353     $test->id ("b.$i");
1354     $test->assert_equals ($efh->getc, "b");
1355     $test->id ("b.$i.err");
1356     $test->assert_null ($error);
1357     }
1358    
1359     $test->id ('eof');
1360     $test->assert_null ($efh->getc);
1361     $test->id ('eof.err');
1362     $test->assert_null ($error);
1363    
1364     @@Test:
1365     @@@QName: MCDecodeHandle.getc.3.test
1366     @@@PerlDef:
1367     my $impl;
1368     __CODE{createImplForTest:: $impl => $impl}__;
1369    
1370     my $byte = "a" x 255;
1371     $byte .= "\xE3\x81\x82";
1372     $byte .= "b" x 256;
1373    
1374     open my $fh, '<', \$byte;
1375     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1376     (<Q::cs|Perl.utf8>, $fh);
1377    
1378     my $error = null;
1379 wakaba 1.3 $efh-><AS::MCDecodeHandle.onerror> (sub {
1380     my ($self, $efh, $type, %opt) = @_;
1381     $error = ${$opt{octets}};
1382 wakaba 1.2 });
1383    
1384     for my $i (0..254) {
1385     $test->id ("a.$i");
1386     $test->assert_equals ($efh->getc, "a");
1387     $test->id ("a.$i.err");
1388     $test->assert_null ($error);
1389     }
1390    
1391     $test->id ("A");
1392     $test->assert_equals ($efh->getc, "\x{3042}");
1393     $test->id ("A.err");
1394     $test->assert_null ($error);
1395    
1396     for my $i (0..255) {
1397     $test->id ("b.$i");
1398     $test->assert_equals ($efh->getc, "b");
1399     $test->id ("b.$i.err");
1400     $test->assert_null ($error);
1401     }
1402    
1403     $test->id ('eof');
1404     $test->assert_null ($efh->getc);
1405     $test->id ('eof.err');
1406     $test->assert_null ($error);
1407    
1408     @@Test:
1409     @@@QName: MCDecodeHandle.getc.4.test
1410     @@@PerlDef:
1411     my $impl;
1412     __CODE{createImplForTest:: $impl => $impl}__;
1413    
1414     my $byte = "a" x 255;
1415     $byte .= "\xE3";
1416    
1417     open my $fh, '<', \$byte;
1418     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1419     (<Q::cs|Perl.utf8>, $fh);
1420    
1421     my $error = null;
1422 wakaba 1.3 $efh-><AS::MCDecodeHandle.onerror> (sub {
1423     my ($self, $efh, $type, %opt) = @_;
1424     $error = ${$opt{octets}};
1425 wakaba 1.2 });
1426    
1427     for my $i (0..254) {
1428     $test->id ("a.$i");
1429     $test->assert_equals ($efh->getc, "a");
1430     $test->id ("a.$i.err");
1431     $test->assert_null ($error);
1432     }
1433    
1434     $test->id ("E3");
1435     $test->assert_equals ($efh->getc, "\xE3");
1436     $test->id ("E3.err");
1437     $test->assert_equals ($error, "\xE3");
1438     $error = null;
1439    
1440     $test->id ('eof');
1441     $test->assert_null ($efh->getc);
1442     $test->id ('eof.err');
1443     $test->assert_null ($error);
1444 wakaba 1.4
1445     @Method:
1446     @@Name: ungetc
1447     @@enDesc:
1448     Pushes a character with the given ordinal value back
1449     onto the handle's input stream. In <Perl::IO::Handle>
1450     only one character of pushback per handle is guaranteed.
1451     @@Param:
1452     @@@Name: ord
1453     @@@Type: idl|unsignedLong||ManakaiDOM|all
1454     @@@enDesc:
1455     The ordinal value of the character to push back.
1456     @@Return:
1457     @@@PerlDef:
1458     unshift @{$self->{<H::mce|characterQueue>}}, chr $ord;
1459    
1460     @@Test:
1461     @@@QName: MCDecodeHandle.ungetc.test
1462     @@@PerlDef:
1463     my $impl;
1464     __CODE{createImplForTest:: $impl => $impl}__;
1465    
1466     my $byte = "a\x{4E00}b\x{4E11}";
1467    
1468     open my $fh, '<', \$byte;
1469     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1470     (<Q::cs|Perl.utf8>, $fh);
1471    
1472     $test->id ('1.getc');
1473     $test->assert_equals ($efh->getc, "a");
1474    
1475     $test->id ('1.ungetc');
1476     $efh->ungetc (ord "a");
1477     $test->assert_equals ($efh->getc, "a");
1478    
1479     $test->id ('2.getc');
1480     $test->assert_equals ($efh->getc, "\x{4E00}");
1481    
1482     $test->id ('2.ungetc');
1483     $efh->ungetc (ord "\x{4E00}");
1484     $test->assert_equals ($efh->getc, "\x{4E00}");
1485    
1486     $test->id ('3.getc');
1487     $test->assert_equals ($efh->getc, "b");
1488    
1489     $test->id ('4.getc');
1490     $test->assert_equals ($efh->getc, "\x{4E11}");
1491    
1492     $test->id ('4.ungetc');
1493     $efh->ungetc (ord "\x{4E11}");
1494     $test->assert_equals ($efh->getc, "\x{4E11}");
1495 wakaba 1.2
1496 wakaba 1.3 @Attr:
1497     @@Name: inputEncoding
1498     @@enDesc:
1499     The name of the input charset.
1500     @@Type: String
1501     @@Get:
1502     @@@enDesc:
1503     If there is a string looks like encoding declaration,
1504     then the value of it, in lowercase. Otherwise and
1505     there is the UTF-16 <CHAR::BOM>, then <CODE::utf-16>.
1506     Otherwise, <CODE::utf-8>.
1507     @@@nullCase:
1508     @@@@enDesc:
1509     If the charset is different from <Q::xml-auto-charset:>.
1510     @@@PerlDef:
1511     $r = $self->{<H::mce|inputEncoding>};
1512    
1513     @Attr:
1514     @@Name: hasBOM
1515     @@enDesc:
1516     Whether the decoder detected the <CHAR::BYTE ORDER MARK> or not.
1517     @@Type: idl|boolean||ManakaiDOM|all
1518     @@Get:
1519     @@@TrueCase:
1520     @@@@enDesc:
1521     If there is the <CHAR::BOM>.
1522     @@@FalseCase:
1523     @@@@enDesc:
1524     Either if there is no <CHAR::BOM>, the decoder not
1525     reached to the end of the <CHAR::BOM>, or
1526     the decoder implementation does not provide whether
1527     there is the <CHAR::BOM> or not.
1528     @@@PerlDef:
1529     $r = $self->{<H::mce|hasBOM>};
1530    
1531     @Test:
1532     @@QName: MCDecodeHandle.utf-8-optional-bom.1.test
1533     @@PerlDef:
1534     my $impl;
1535     __CODE{createImplForTest:: $impl => $impl}__;
1536    
1537     my $byte = qq<\xEF\xBB\xBFabc>;
1538    
1539     my $error;
1540    
1541     open my $fh, '<', \$byte;
1542     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1543     (<Q::cs|XML.utf-8>, $fh, sub { $error = true });
1544    
1545     $test->id (1);
1546     $test->assert_equals ($efh->getc, "a");
1547    
1548     $test->id (2);
1549     $test->assert_equals ($efh->getc, "b");
1550    
1551     $test->id (3);
1552     $test->assert_equals ($efh->getc, "c");
1553    
1554     $test->id (4);
1555     $test->assert_null ($efh->getc);
1556    
1557     $test->id ('err');
1558     $test->assert_false ($error);
1559    
1560     $test->id ('bom');
1561     $test->assert_true ($efh-><AG::MCDecodeHandle.hasBOM>);
1562    
1563     @Test:
1564     @@QName: MCDecodeHandle.utf-8-optional-bom.2.test
1565     @@PerlDef:
1566     my $impl;
1567     __CODE{createImplForTest:: $impl => $impl}__;
1568    
1569     my $byte = qq<abc>;
1570    
1571     my $error;
1572    
1573     open my $fh, '<', \$byte;
1574     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1575     (<Q::cs|XML.utf-8>, $fh, sub { $error = true });
1576    
1577     $test->id (1);
1578     $test->assert_equals ($efh->getc, "a");
1579    
1580     $test->id (2);
1581     $test->assert_equals ($efh->getc, "b");
1582    
1583     $test->id (3);
1584     $test->assert_equals ($efh->getc, "c");
1585    
1586     $test->id (4);
1587     $test->assert_null ($efh->getc);
1588    
1589     $test->id ('err');
1590     $test->assert_false ($error);
1591    
1592     $test->id ('bom');
1593     $test->assert_false ($efh-><AG::MCDecodeHandle.hasBOM>);
1594    
1595     @Test:
1596     @@QName: MCDecodeHandle.utf-8-optional-bom.3.test
1597     @@PerlDef:
1598     my $impl;
1599     __CODE{createImplForTest:: $impl => $impl}__;
1600    
1601     my $byte = qq<\xEF\xBB\xBF\xEF\xBB\xBFabc>;
1602    
1603     my $error;
1604    
1605     open my $fh, '<', \$byte;
1606     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1607     (<Q::cs|XML.utf-8>, $fh, sub { $error = true });
1608    
1609     $test->id ('zwnbsp');
1610     $test->assert_equals ($efh->getc, "\x{FEFF}");
1611    
1612     $test->id (1);
1613     $test->assert_equals ($efh->getc, "a");
1614    
1615     $test->id (2);
1616     $test->assert_equals ($efh->getc, "b");
1617    
1618     $test->id (3);
1619     $test->assert_equals ($efh->getc, "c");
1620    
1621     $test->id (4);
1622     $test->assert_null ($efh->getc);
1623    
1624     $test->id ('err');
1625     $test->assert_false ($error);
1626    
1627     $test->id ('bom');
1628     $test->assert_true ($efh-><AG::MCDecodeHandle.hasBOM>);
1629    
1630     @Test:
1631     @@QName: MCDecodeHandle.utf-8-optional-bom.4.test
1632     @@PerlDef:
1633     my $impl;
1634     __CODE{createImplForTest:: $impl => $impl}__;
1635    
1636     my $byte = qq<\xEF\xBB\xBF>;
1637    
1638     my $error;
1639    
1640     open my $fh, '<', \$byte;
1641     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1642     (<Q::cs|XML.utf-8>, $fh, sub { $error = true });
1643    
1644     $test->id (1);
1645     $test->assert_null ($efh->getc);
1646    
1647     $test->id ('err');
1648     $test->assert_false ($error);
1649    
1650     $test->id ('bom');
1651     $test->assert_true ($efh-><AG::MCDecodeHandle.hasBOM>);
1652    
1653     @Test:
1654     @@QName: MCDecodeHandle.utf-8-optional-bom.5.test
1655     @@PerlDef:
1656     my $impl;
1657     __CODE{createImplForTest:: $impl => $impl}__;
1658    
1659     my $byte = qq<>;
1660    
1661     my $error;
1662    
1663     open my $fh, '<', \$byte;
1664     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1665     (<Q::cs|XML.utf-8>, $fh, sub { $error = true });
1666    
1667     $test->id (1);
1668     $test->assert_null ($efh->getc);
1669    
1670     $test->id ('err');
1671     $test->assert_false ($error);
1672    
1673     $test->id ('bom');
1674     $test->assert_false ($efh-><AG::MCDecodeHandle.hasBOM>);
1675    
1676     @Test:
1677     @@QName: MCDecodeHandle.utf-8-optional-bom.6.test
1678     @@PerlDef:
1679     my $impl;
1680     __CODE{createImplForTest:: $impl => $impl}__;
1681    
1682     my $byte = qq<ab>;
1683    
1684     my $error;
1685    
1686     open my $fh, '<', \$byte;
1687     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1688     (<Q::cs|XML.utf-8>, $fh, sub { $error = true });
1689    
1690     $test->id (1);
1691     $test->assert_equals ($efh->getc, "a");
1692    
1693     $test->id (2);
1694     $test->assert_equals ($efh->getc, "b");
1695    
1696     $test->id (3);
1697     $test->assert_null ($efh->getc);
1698    
1699     $test->id ('err');
1700     $test->assert_false ($error);
1701    
1702     $test->id ('bom');
1703     $test->assert_false ($efh-><AG::MCDecodeHandle.hasBOM>);
1704    
1705     @Test:
1706     @@QName: MCDecodeHandle.utf-8-optional-bom.7.test
1707     @@PerlDef:
1708     my $impl;
1709     __CODE{createImplForTest:: $impl => $impl}__;
1710    
1711     my $byte = qq<a>;
1712    
1713     my $error;
1714    
1715     open my $fh, '<', \$byte;
1716     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1717     (<Q::cs|XML.utf-8>, $fh, sub { $error = true });
1718    
1719     $test->id (1);
1720     $test->assert_equals ($efh->getc, "a");
1721    
1722     $test->id (2);
1723     $test->assert_null ($efh->getc);
1724    
1725     $test->id ('err');
1726     $test->assert_false ($error);
1727    
1728     $test->id ('bom');
1729     $test->assert_false ($efh-><AG::MCDecodeHandle.hasBOM>);
1730    
1731    
1732     @Test:
1733     @@QName: MCDecodeHandle.utf-16-with-bom.1.test
1734     @@PerlDef:
1735     my $impl;
1736     __CODE{createImplForTest:: $impl => $impl}__;
1737    
1738     my $byte = qq<\xFE\xFF\x4E\x00\x00a>;
1739    
1740     my $error;
1741    
1742     open my $fh, '<', \$byte;
1743     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1744     (<Q::cs|XML.utf-16>, $fh, sub { $error = true });
1745    
1746     $test->id (1);
1747     $test->assert_equals ($efh->getc, "\x{4E00}");
1748    
1749     $test->id (2);
1750     $test->assert_equals ($efh->getc, "a");
1751    
1752     $test->id (3);
1753     $test->assert_null ($efh->getc);
1754    
1755     $test->id ('err');
1756     $test->assert_false ($error);
1757    
1758     $test->id ('bom');
1759     $test->assert_true ($efh-><AG::MCDecodeHandle.hasBOM>);
1760    
1761     @Test:
1762     @@QName: MCDecodeHandle.utf-16-with-bom.2.test
1763     @@PerlDef:
1764     my $impl;
1765     __CODE{createImplForTest:: $impl => $impl}__;
1766    
1767     my $byte = qq<\xFF\xFE\x00\x4Ea\x00>;
1768    
1769     my $error;
1770    
1771     open my $fh, '<', \$byte;
1772     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1773     (<Q::cs|XML.utf-16>, $fh, sub { $error = true });
1774    
1775     $test->id (1);
1776     $test->assert_equals ($efh->getc, "\x{4E00}");
1777    
1778     $test->id (2);
1779     $test->assert_equals ($efh->getc, "a");
1780    
1781     $test->id (3);
1782     $test->assert_null ($efh->getc);
1783    
1784     $test->id ('err');
1785     $test->assert_false ($error);
1786    
1787     $test->id ('bom');
1788     $test->assert_true ($efh-><AG::MCDecodeHandle.hasBOM>);
1789    
1790     @Test:
1791     @@QName: MCDecodeHandle.utf-16-with-bom.3.test
1792     @@PerlDef:
1793     my $impl;
1794     __CODE{createImplForTest:: $impl => $impl}__;
1795    
1796     my $byte = qq<\xFE\xFF\x00a>;
1797    
1798     my $error;
1799    
1800     open my $fh, '<', \$byte;
1801     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1802     (<Q::cs|XML.utf-16>, $fh, sub { $error = true });
1803    
1804     $test->id (1);
1805     $test->assert_equals ($efh->getc, "a");
1806    
1807     $test->id (2);
1808     $test->assert_null ($efh->getc);
1809    
1810     $test->id ('err');
1811     $test->assert_false ($error);
1812    
1813     $test->id ('bom');
1814     $test->assert_true ($efh-><AG::MCDecodeHandle.hasBOM>);
1815    
1816     @Test:
1817     @@QName: MCDecodeHandle.utf-16-with-bom.4.test
1818     @@PerlDef:
1819     my $impl;
1820     __CODE{createImplForTest:: $impl => $impl}__;
1821    
1822     my $byte = qq<\xFF\xFEa\x00>;
1823    
1824     my $error;
1825    
1826     open my $fh, '<', \$byte;
1827     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1828     (<Q::cs|XML.utf-16>, $fh, sub { $error = true });
1829    
1830     $test->id (1);
1831     $test->assert_equals ($efh->getc, "a");
1832    
1833     $test->id (2);
1834     $test->assert_null ($efh->getc);
1835    
1836     $test->id ('err');
1837     $test->assert_false ($error);
1838    
1839     $test->id ('bom');
1840     $test->assert_true ($efh-><AG::MCDecodeHandle.hasBOM>);
1841    
1842     @Test:
1843     @@QName: MCDecodeHandle.utf-16-with-bom.5.test
1844     @@PerlDef:
1845     my $impl;
1846     __CODE{createImplForTest:: $impl => $impl}__;
1847    
1848     my $byte = qq<\xFE\xFFa>;
1849    
1850     my $error;
1851    
1852     open my $fh, '<', \$byte;
1853     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1854     (<Q::cs|XML.utf-16>, $fh, sub { $error = $_[2] });
1855    
1856     $test->id ('0.error');
1857     $test->assert_null ($error);
1858    
1859     $test->id (1);
1860     $test->assert_equals ($efh->getc, "a");
1861     $test->id ('1.error');
1862     $test->assert_equals ($error, <Q::cs|illegal-octets-error>);
1863     $error = null;
1864    
1865     $test->id (2);
1866     $test->assert_null ($efh->getc);
1867    
1868     $test->id ('err');
1869     $test->assert_null ($error);
1870    
1871     $test->id ('bom');
1872     $test->assert_true ($efh-><AG::MCDecodeHandle.hasBOM>);
1873    
1874     @Test:
1875     @@QName: MCDecodeHandle.utf-16-with-bom.6.test
1876     @@PerlDef:
1877     my $impl;
1878     __CODE{createImplForTest:: $impl => $impl}__;
1879    
1880     my $byte = qq<\xFF\xFEa>;
1881    
1882     my $error;
1883    
1884     open my $fh, '<', \$byte;
1885     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1886     (<Q::cs|XML.utf-16>, $fh, sub { $error = $_[2] });
1887    
1888     $test->id ('0.error');
1889     $test->assert_null ($error);
1890    
1891     $test->id (1);
1892     $test->assert_equals ($efh->getc, "a");
1893     $test->id ('1.error');
1894     $test->assert_equals ($error, <Q::cs|illegal-octets-error>);
1895     $error = null;
1896    
1897     $test->id (2);
1898     $test->assert_null ($efh->getc);
1899    
1900     $test->id ('err');
1901     $test->assert_null ($error);
1902    
1903     $test->id ('bom');
1904     $test->assert_true ($efh-><AG::MCDecodeHandle.hasBOM>);
1905    
1906     @Test:
1907     @@QName: MCDecodeHandle.utf-16-with-bom.7.test
1908     @@PerlDef:
1909     my $impl;
1910     __CODE{createImplForTest:: $impl => $impl}__;
1911    
1912     my $byte = qq<\xFE\xFF>;
1913    
1914     my $error;
1915    
1916     open my $fh, '<', \$byte;
1917     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1918     (<Q::cs|XML.utf-16>, $fh, sub { $error = $_[2] });
1919    
1920     $test->id (1);
1921     $test->assert_null ($efh->getc);
1922    
1923     $test->id ('err');
1924     $test->assert_null ($error);
1925    
1926     $test->id ('bom');
1927     $test->assert_true ($efh-><AG::MCDecodeHandle.hasBOM>);
1928    
1929     @Test:
1930     @@QName: MCDecodeHandle.utf-16-with-bom.8.test
1931     @@PerlDef:
1932     my $impl;
1933     __CODE{createImplForTest:: $impl => $impl}__;
1934    
1935     my $byte = qq<\xFF\xFE>;
1936    
1937     my $error;
1938    
1939     open my $fh, '<', \$byte;
1940     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1941     (<Q::cs|XML.utf-16>, $fh, sub { $error = $_[2] });
1942    
1943     $test->id (1);
1944     $test->assert_null ($efh->getc);
1945    
1946     $test->id ('err');
1947     $test->assert_null ($error);
1948    
1949     $test->id ('bom');
1950     $test->assert_true ($efh-><AG::MCDecodeHandle.hasBOM>);
1951    
1952     @Test:
1953     @@QName: MCDecodeHandle.utf-16-with-bom.9.test
1954     @@PerlDef:
1955     my $impl;
1956     __CODE{createImplForTest:: $impl => $impl}__;
1957    
1958     my $byte = qq<\xFD\xFF>;
1959    
1960     my $error;
1961    
1962     open my $fh, '<', \$byte;
1963     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1964     (<Q::cs|XML.utf-16>, $fh, sub { $error = $_[2] });
1965    
1966     $test->id ('no.bom');
1967     $test->assert_equals ($error, <Q::cs|no-bom-error>);
1968     $error = null;
1969    
1970     $test->id (1);
1971     $test->assert_equals ($efh->getc, "\x{FDFF}");
1972    
1973     $test->id (2);
1974     $test->assert_null ($efh->getc);
1975    
1976     $test->id ('err');
1977     $test->assert_null ($error);
1978    
1979     $test->id ('bom');
1980     $test->assert_false ($efh-><AG::MCDecodeHandle.hasBOM>);
1981    
1982     @Test:
1983     @@QName: MCDecodeHandle.utf-16-with-bom.10.test
1984     @@PerlDef:
1985     my $impl;
1986     __CODE{createImplForTest:: $impl => $impl}__;
1987    
1988     my $byte = qq<\xFD>;
1989    
1990     my $error;
1991    
1992     open my $fh, '<', \$byte;
1993     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
1994     (<Q::cs|XML.utf-16>, $fh, sub { $error = $_[2] });
1995    
1996     $test->id ('no.bom');
1997     $test->assert_equals ($error, <Q::cs|no-bom-error>);
1998     $error = null;
1999    
2000     $test->id (1);
2001     $test->assert_equals ($efh->getc, "\xFD");
2002     $test->id ('1.error');
2003     $test->assert_equals ($error, <Q::cs|illegal-octets-error>);
2004     $error = null;
2005    
2006     $test->id (2);
2007     $test->assert_null ($efh->getc);
2008    
2009     $test->id ('err');
2010     $test->assert_null ($error);
2011    
2012     $test->id ('bom');
2013     $test->assert_false ($efh-><AG::MCDecodeHandle.hasBOM>);
2014    
2015     @Test:
2016     @@QName: MCDecodeHandle.utf-16-with-bom.11.test
2017     @@PerlDef:
2018     my $impl;
2019     __CODE{createImplForTest:: $impl => $impl}__;
2020    
2021     my $byte = qq<>;
2022    
2023     my $error;
2024    
2025     open my $fh, '<', \$byte;
2026     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
2027     (<Q::cs|XML.utf-16>, $fh, sub { $error = $_[2] });
2028    
2029     $test->id ('no.bom');
2030     $test->assert_equals ($error, <Q::cs|no-bom-error>);
2031     $error = null;
2032    
2033     $test->id (1);
2034     $test->assert_null ($efh->getc);
2035    
2036     $test->id ('err');
2037     $test->assert_null ($error);
2038    
2039     $test->id ('bom');
2040     $test->assert_false ($efh-><AG::MCDecodeHandle.hasBOM>);
2041 wakaba 1.2
2042 wakaba 1.3 @Test:
2043     @@QName: MCDecodeHandle.xml.1.test
2044     @@PerlDef:
2045     my $impl;
2046     __CODE{createImplForTest:: $impl => $impl}__;
2047 wakaba 1.2
2048 wakaba 1.3 my @testdata = (
2049     {
2050     id => q<l=0>,
2051     in => q<>,
2052     out => [null],
2053     name => 'utf-8', bom => false,
2054     },
2055     {
2056     id => q<l=1>,
2057     in => "a",
2058     out => [null, "a", null],
2059     name => 'utf-8', bom => false,
2060     },
2061     {
2062     id => q<bom8.l=0>,
2063     in => "\xEF\xBB\xBF",
2064     out => [null],
2065     name => 'utf-8', bom => true,
2066     },
2067     {
2068     id => q<bom8.l=1>,
2069     in => "\xEF\xBB\xBFa",
2070     out => [null, "a", null],
2071     name => 'utf-8', bom => true,
2072     },
2073     {
2074     id => q<bom8.zwnbsp>,
2075     in => "\xEF\xBB\xBF\xEF\xBB\xBF",
2076     out => [null, "\x{FEFF}", null],
2077     name => 'utf-8', bom => true,
2078     },
2079     {
2080     id => q<bom16be.l=0>,
2081     in => "\xFE\xFF",
2082     out => [null],
2083     name => 'utf-16', bom => true,
2084     },
2085     {
2086     id => q<bom16le.l=0>,
2087     in => "\xFF\xFE",
2088     out => [null],
2089     name => 'utf-16', bom => true,
2090     },
2091     {
2092     id => q<bom16be.l=1>,
2093     in => "\xFE\xFFa",
2094     out => [null, "a", [<Q::cs|illegal-octets-error>]],
2095     name => 'utf-16', bom => true,
2096     },
2097     {
2098     id => q<bom16le.l=1>,
2099     in => "\xFF\xFEa",
2100     out => [null, "a", [<Q::cs|illegal-octets-error>]],
2101     name => 'utf-16', bom => true,
2102     },
2103     {
2104     id => q<bom16be.l=2>,
2105     in => "\xFE\xFF\x4E\x00",
2106     out => [null, "\x{4E00}", null],
2107     name => 'utf-16', bom => true,
2108     },
2109     {
2110     id => q<bom16le.l=2>,
2111     in => "\xFF\xFE\x00\x4E",
2112     out => [null, "\x{4E00}", null],
2113     name => 'utf-16', bom => true,
2114     },
2115     {
2116     id => q<bom16be.l=2lt>,
2117     in => "\xFE\xFF\x00<",
2118     out => [null, "<", null],
2119     name => 'utf-16', bom => true,
2120     },
2121     {
2122     id => q<bom16le.l=2lt>,
2123     in => "\xFF\xFE<\x00",
2124     out => [null, "<", null],
2125     name => 'utf-16', bom => true,
2126     },
2127     {
2128     id => q<bom16be.zwnbsp>,
2129     in => "\xFE\xFF\xFE\xFF",
2130     out => [null, "\x{FEFF}", null],
2131     name => 'utf-16', bom => true,
2132     },
2133     {
2134     id => q<bom16le.zwnbsp>,
2135     in => "\xFF\xFE\xFF\xFE",
2136     out => [null, "\x{FEFF}", null],
2137     name => 'utf-16', bom => true,
2138     },
2139     {
2140     id => q<bom32e3412.l=0>,
2141     in => "\xFE\xFF\x00\x00",
2142     out => [null, "\x00", null],
2143     name => 'utf-16', bom => true,
2144     },
2145     {
2146     id => q<bom32e4321.l=0>,
2147     in => "\xFF\xFE\x00\x00",
2148     out => [null, "\x00", null],
2149     name => 'utf-16', bom => true,
2150     },
2151     {
2152     id => q<bom16be.l=4ltq>,
2153     in => "\xFE\xFF\x00<\x00?",
2154     out => [null, "<", null, "?", null],
2155     name => 'utf-16', bom => true,
2156     },
2157     {
2158     id => q<bom16le.l=4ltq>,
2159     in => "\xFF\xFE<\x00?\x00",
2160     out => [null, "<", null, "?", null],
2161     name => 'utf-16', bom => true,
2162     },
2163     {
2164     id => q<bom16be.decl.1>,
2165     in => qq[\xFE\xFF\x00<\x00?\x00x\x00m\x00l\x00 \x00v\x00e\x00r].
2166     qq[\x00s\x00i\x00o\x00n\x00=\x00"\x001\x00.\x000\x00"].
2167     qq[\x00 \x00e\x00n\x00c\x00o\x00d\x00i\x00n\x00g\x00=].
2168     qq[\x00"\x00u\x00t\x00f\x00-\x001\x006\x00"\x00?\x00>],
2169     out => [null, "<", null, "?", null, "x", null, "m", null, "l", null,
2170     " ", null, "v", null, "e", null, "r", null, "s", null,
2171     "i", null, "o", null, "n", null, "=", null, '"', null,
2172     "1", null, ".", null, "0", null, '"', null, " ", null,
2173     "e", null, "n", null, "c", null, "o", null, "d", null,
2174     "i", null, "n", null, "g", null, "=", null, '"', null,
2175     "u", null, "t", null, "f", null, "-", null, "1", null,
2176     "6", null, '"', null, "?", null, ">", null],
2177     name => 'utf-16', bom => true,
2178     },
2179     {
2180     id => q<bom16le.decl.1>,
2181     in => qq[\xFF\xFE<\x00?\x00x\x00m\x00l\x00 \x00v\x00e\x00r].
2182     qq[\x00s\x00i\x00o\x00n\x00=\x00"\x001\x00.\x000\x00"].
2183     qq[\x00 \x00e\x00n\x00c\x00o\x00d\x00i\x00n\x00g\x00=].
2184     qq[\x00"\x00u\x00t\x00f\x00-\x001\x006\x00"\x00?\x00>\x00],
2185     out => [null, "<", null, "?", null, "x", null, "m", null, "l", null,
2186     " ", null, "v", null, "e", null, "r", null, "s", null,
2187     "i", null, "o", null, "n", null, "=", null, '"', null,
2188     "1", null, ".", null, "0", null, '"', null, " ", null,
2189     "e", null, "n", null, "c", null, "o", null, "d", null,
2190     "i", null, "n", null, "g", null, "=", null, '"', null,
2191     "u", null, "t", null, "f", null, "-", null, "1", null,
2192     "6", null, '"', null, "?", null, ">", null],
2193     name => 'utf-16', bom => true,
2194     },
2195     {
2196     id => q<utf16be.decl.1>,
2197     in => qq[\x00<\x00?\x00x\x00m\x00l\x00 \x00v\x00e\x00r].
2198     qq[\x00s\x00i\x00o\x00n\x00=\x00"\x001\x00.\x000\x00"].
2199     qq[\x00 \x00e\x00n\x00c\x00o\x00d\x00i\x00n\x00g\x00=].
2200     qq[\x00"\x00u\x00t\x00f\x00-\x001\x006\x00b\x00e\x00"\x00?\x00>],
2201     out => [null, "<", null, "?", null, "x", null, "m", null, "l", null,
2202     " ", null, "v", null, "e", null, "r", null, "s", null,
2203     "i", null, "o", null, "n", null, "=", null, '"', null,
2204     "1", null, ".", null, "0", null, '"', null, " ", null,
2205     "e", null, "n", null, "c", null, "o", null, "d", null,
2206     "i", null, "n", null, "g", null, "=", null, '"', null,
2207     "u", null, "t", null, "f", null, "-", null, "1", null,
2208     "6", null, "b", null, "e", null, '"', null,
2209     "?", null, ">", null],
2210     name => 'utf-16be', bom => false,
2211     },
2212     {
2213     id => q<utf16le.decl.1>,
2214     in => qq[<\x00?\x00x\x00m\x00l\x00 \x00v\x00e\x00r].
2215     qq[\x00s\x00i\x00o\x00n\x00=\x00"\x001\x00.\x000\x00"].
2216     qq[\x00 \x00e\x00n\x00c\x00o\x00d\x00i\x00n\x00g\x00=].
2217     qq[\x00"\x00u\x00t\x00f\x00-\x001\x006\x00l\x00e\x00"].
2218     qq[\x00?\x00>\x00],
2219     out => [null, "<", null, "?", null, "x", null, "m", null, "l", null,
2220     " ", null, "v", null, "e", null, "r", null, "s", null,
2221     "i", null, "o", null, "n", null, "=", null, '"', null,
2222     "1", null, ".", null, "0", null, '"', null, " ", null,
2223     "e", null, "n", null, "c", null, "o", null, "d", null,
2224     "i", null, "n", null, "g", null, "=", null, '"', null,
2225     "u", null, "t", null, "f", null, "-", null, "1", null,
2226     "6", null, "l", null, "e", null, '"', null, "?", null,
2227     ">", null],
2228     name => 'utf-16le', bom => false,
2229     },
2230     {
2231     id => q<16be.decl.1>,
2232     in => qq[\x00<\x00?\x00x\x00m\x00l\x00 \x00v\x00e\x00r].
2233     qq[\x00s\x00i\x00o\x00n\x00=\x00"\x001\x00.\x000\x00"].
2234     qq[\x00 \x00e\x00n\x00c\x00o\x00d\x00i\x00n\x00g\x00=].
2235     qq[\x00"\x00u\x00t\x00f\x00-\x001\x006\x00"\x00?\x00>],
2236     out => [[<Q::cs|charset-name-mismatch-error>],
2237     "<", null, "?", null, "x", null, "m", null, "l", null,
2238     " ", null, "v", null, "e", null, "r", null, "s", null,
2239     "i", null, "o", null, "n", null, "=", null, '"', null,
2240     "1", null, ".", null, "0", null, '"', null, " ", null,
2241     "e", null, "n", null, "c", null, "o", null, "d", null,
2242     "i", null, "n", null, "g", null, "=", null, '"', null,
2243     "u", null, "t", null, "f", null, "-", null, "1", null,
2244     "6", null, '"', null, "?", null, ">", null],
2245     name => 'utf-16', bom => false,
2246     },
2247     {
2248     id => q<16le.decl.1>,
2249     in => qq[<\x00?\x00x\x00m\x00l\x00 \x00v\x00e\x00r].
2250     qq[\x00s\x00i\x00o\x00n\x00=\x00"\x001\x00.\x000\x00"].
2251     qq[\x00 \x00e\x00n\x00c\x00o\x00d\x00i\x00n\x00g\x00=].
2252     qq[\x00"\x00u\x00t\x00f\x00-\x001\x006\x00"\x00?\x00>\x00],
2253     out => [[<Q::cs|charset-name-mismatch-error>],
2254     "<", null, "?", null, "x", null, "m", null, "l", null,
2255     " ", null, "v", null, "e", null, "r", null, "s", null,
2256     "i", null, "o", null, "n", null, "=", null, '"', null,
2257     "1", null, ".", null, "0", null, '"', null, " ", null,
2258     "e", null, "n", null, "c", null, "o", null, "d", null,
2259     "i", null, "n", null, "g", null, "=", null, '"', null,
2260     "u", null, "t", null, "f", null, "-", null, "1", null,
2261     "6", null, '"', null, "?", null, ">", null],
2262     name => 'utf-16', bom => false,
2263     },
2264     {
2265     id => q<8.decl.1>,
2266     in => qq[<?xml version="1.0" encoding="utf-8"?>],
2267     out => [null, "<", null, "?", null, "x", null, "m", null, "l", null,
2268     " ", null, "v", null, "e", null, "r", null, "s", null,
2269     "i", null, "o", null, "n", null, "=", null, '"', null,
2270     "1", null, ".", null, "0", null, '"', null, " ", null,
2271     "e", null, "n", null, "c", null, "o", null, "d", null,
2272     "i", null, "n", null, "g", null, "=", null, '"', null,
2273     "u", null, "t", null, "f", null, "-", null, "8", null,
2274     '"', null, "?", null, ">", null],
2275     name => 'utf-8', bom => false,
2276     },
2277     {
2278     id => q<8.decl.2>,
2279     in => qq[<?xml encoding="utf-8"?>],
2280     out => [null, "<", null, "?", null, "x", null, "m", null, "l", null,
2281     " ", null,
2282     "e", null, "n", null, "c", null, "o", null, "d", null,
2283     "i", null, "n", null, "g", null, "=", null, '"', null,
2284     "u", null, "t", null, "f", null, "-", null, "8", null,
2285     '"', null, "?", null, ">", null],
2286     name => 'utf-8', bom => false,
2287     },
2288     {
2289     id => q<8.decl.3>,
2290     in => qq[<?xml version="1.1" encoding="utf-8"?>],
2291     out => [null, "<", null, "?", null, "x", null, "m", null, "l", null,
2292     " ", null, "v", null, "e", null, "r", null, "s", null,
2293     "i", null, "o", null, "n", null, "=", null, '"', null,
2294     "1", null, ".", null, "1", null, '"', null, " ", null,
2295     "e", null, "n", null, "c", null, "o", null, "d", null,
2296     "i", null, "n", null, "g", null, "=", null, '"', null,
2297     "u", null, "t", null, "f", null, "-", null, "8", null,
2298     '"', null, "?", null, ">", null],
2299     name => 'utf-8', bom => false,
2300     },
2301     {
2302     id => q<8.decl.4>,
2303     in => qq[<?xml version="1.0"?>],
2304     out => [null, "<", null, "?", null, "x", null, "m", null, "l", null,
2305     " ", null, "v", null, "e", null, "r", null, "s", null,
2306     "i", null, "o", null, "n", null, "=", null, '"', null,
2307     "1", null, ".", null, "0", null, '"', null,
2308     "?", null, ">", null],
2309     name => 'utf-8', bom => false,
2310     },
2311     {
2312     id => q<bom8.decl.1>,
2313     in => qq[\xEF\xBB\xBF<?xml encoding="utf-8"?>],
2314     out => [null, "<", null, "?", null, "x", null, "m", null, "l", null,
2315     " ", null,
2316     "e", null, "n", null, "c", null, "o", null, "d", null,
2317     "i", null, "n", null, "g", null, "=", null, '"', null,
2318     "u", null, "t", null, "f", null, "-", null, "8", null,
2319     '"', null, "?", null, ">", null],
2320     name => 'utf-8', bom => true,
2321     },
2322     {
2323     id => q<us-ascii.decl.1>,
2324     in => qq[<?xml encoding="us-ascii"?>],
2325     out => [null, "<", null, "?", null, "x", null, "m", null, "l", null,
2326     " ", null,
2327     "e", null, "n", null, "c", null, "o", null, "d", null,
2328     "i", null, "n", null, "g", null, "=", null, '"', null,
2329     "u", null, "s", null, "-", null, "a", null, "s", null,
2330     "c", null, "i", null, "i", null,
2331     '"', null, "?", null, ">", null],
2332     name => 'us-ascii', bom => false,
2333     },
2334     {
2335     id => q<us-ascii.decl.2>,
2336     in => qq[<?xml encoding="US-ascii"?>],
2337     out => [null, "<", null, "?", null, "x", null, "m", null, "l", null,
2338     " ", null,
2339     "e", null, "n", null, "c", null, "o", null, "d", null,
2340     "i", null, "n", null, "g", null, "=", null, '"', null,
2341     "U", null, "S", null, "-", null, "a", null, "s", null,
2342     "c", null, "i", null, "i", null,
2343     '"', null, "?", null, ">", null],
2344     name => 'us-ascii', bom => false,
2345     },
2346     {
2347     id => q<us-ascii.decl.3>,
2348     in => qq[<?xml encoding='us-ascii'?>],
2349     out => [null, "<", null, "?", null, "x", null, "m", null, "l", null,
2350     " ", null,
2351     "e", null, "n", null, "c", null, "o", null, "d", null,
2352     "i", null, "n", null, "g", null, "=", null, "'", null,
2353     "u", null, "s", null, "-", null, "a", null, "s", null,
2354     "c", null, "i", null, "i", null,
2355     "'", null, "?", null, ">", null],
2356     name => 'us-ascii', bom => false,
2357     },
2358     );
2359    
2360     for my $testdata (@testdata) {
2361     my $byte = $testdata->{in};
2362     my $error;
2363     my $i = 0;
2364    
2365     open my $fh, '<', \$byte;
2366     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
2367     (<Q::xml-auto-charset:>, $fh, sub {
2368     my (null, null, $etype, %opt) = @_;
2369     $error = [$etype, \%opt];
2370     });
2371    
2372     $test->id ("$testdata->{id}.bom");
2373     my $tf = $testdata->{bom} ? 'assert_true' : 'assert_false';
2374     $test->$tf ($efh-><AG::MCDecodeHandle.hasBOM>);
2375    
2376     $test->id ("$testdata->{id}.name");
2377     $test->assert_equals ($efh-><AG::MCDecodeHandle.inputEncoding>,
2378     $testdata->{name});
2379    
2380     while (@{$testdata->{out}}) {
2381     if ($i != 0) {
2382     my $c = shift @{$testdata->{out}};
2383     $test->id ("$testdata->{id}.$i");
2384     $test->assert_equals ($efh->getc, $c);
2385     }
2386    
2387     my $v = shift @{$testdata->{out}};
2388     $test->id ("$testdata->{id}.$i.error");
2389     if (defined $v) {
2390     $test->assert_not_null ($error);
2391     $test->assert_equals ($error->[0], $v->[0]);
2392     } else {
2393     $test->assert_null ($error->[0]);
2394     }
2395     $error = null;
2396     $i++;
2397     }
2398 wakaba 1.2
2399 wakaba 1.3 $test->id ("$testdata->{id}.eof");
2400     $test->assert_null ($efh->getc);
2401     $test->assert_null ($error);
2402     } # testdata
2403 wakaba 1.7
2404     @Method:
2405     @@Name: close
2406     @@Return:
2407     @@@PerlDef:
2408     close $self->{<H::mce|filehandle>};
2409 wakaba 1.3 ##MCDecodeHandle
2410 wakaba 1.5
2411     ClsDef:
2412     @ClsQName: ManakaiMCEUCJPDecodeHandle
2413    
2414     @ClsISA: ManakaiMCDecodeHandle
2415    
2416     @Method:
2417     @@Name: getc
2418     @@enDesc:
2419     Returns the next character from the input.
2420     @@Return:
2421     @@@Type: String
2422     @@@enDesc:
2423     The next character.
2424     @@@nullCase:
2425     @@@@enDesc:
2426     If at the end of the file, or if there was an error, in which
2427     case <Perl::$!> is set.
2428     @@@PerlDef:
2429     if (@{$self->{<H::mce|characterQueue>}}) {
2430     $r = shift @{$self->{<H::mce|characterQueue>}};
2431     } else {
2432     __DEEP{
2433     my $error;
2434     if ($self->{<H::mce|continue>}) {
2435     if (read $self->{<H::mce|filehandle>},
2436     $self->{<H::mce|byteBuffer>}, 256,
2437     length $self->{<H::mce|byteBuffer>}) {
2438     #
2439     } else {
2440     $error = true;
2441     }
2442     $self->{<H::mce|continue>} = false;
2443     } elsif (512 > length $self->{<H::mce|byteBuffer>}) {
2444     read $self->{<H::mce|filehandle>},
2445     $self->{<H::mce|byteBuffer>}, 256,
2446     length $self->{<H::mce|byteBuffer>};
2447     }
2448    
2449     unless ($error) {
2450     my $string = Encode::decode ($self->{<H::mce|perlEncodingName>},
2451     $self->{<H::mce|byteBuffer>},
2452     Encode::FB_QUIET ());
2453     if (length $string) {
2454     push @{$self->{<H::mce|characterQueue>}}, split //, $string;
2455     $r = shift @{$self->{<H::mce|characterQueue>}};
2456     if (length $self->{<H::mce|byteBuffer>}) {
2457     $self->{<H::mce|continue>} = true;
2458     }
2459     } else {
2460     if (length $self->{<H::mce|byteBuffer>}) {
2461     $error = true;
2462     } else {
2463     $r = null;
2464     }
2465     }
2466     }
2467     if ($error) {
2468     $r = substr $self->{<H::mce|byteBuffer>}, 0, 1, '';
2469     my $etype = <Q::cs|illegal-octets-error>;
2470     if ($r =~ /^[\xA1-\xFE]/) {
2471     if ($self->{<H::mce|byteBuffer>} =~ s/^([\xA1-\xFE])//) {
2472     $r .= $1;
2473     $etype = <Q::cs|unassigned-code-point-error>;
2474     }
2475     } elsif ($r eq "\x8F") {
2476     if ($self->{<H::mce|byteBuffer>}
2477     =~ s/^([\xA1-\xFE][\xA1-\xFE]?)//) {
2478     $r .= $1;
2479     $etype = <Q::cs|unassigned-code-point-error> if length $1 == 2;
2480     }
2481     } elsif ($r eq "\x8E") {
2482     if ($self->{<H::mce|byteBuffer>} =~ s/^([\xA1-\xFE])//) {
2483     $r .= $1;
2484     $etype = <Q::cs|unassigned-code-point-error>;
2485     }
2486     } elsif ($r eq "\xA0" or $r eq "\xFF") {
2487     $etype = <Q::cs|unassigned-code-point-error>;
2488     }
2489     $self->{<H::mce|onerror>}
2490     ->($self->{<H::mce|onerror>}, $self, $etype,
2491     octets => \$r);
2492     }
2493     }__;
2494     }
2495    
2496    
2497     @Test:
2498     @@QName: MCEUCJPDecodeHandle.1.test
2499     @@PerlDef:
2500     my $impl;
2501     __CODE{createImplForTest:: $impl => $impl}__;
2502    
2503     my @testdata = (
2504     {
2505     id => q<l=0>,
2506     in => q<>,
2507     out => [null],
2508     },
2509     {
2510     id => q<l=1.00>,
2511     in => qq<\x00>,
2512     out => [null, "\x00", null],
2513     },
2514     {
2515     id => q<l=1.0d>,
2516     in => qq<\x0D>,
2517     out => [null, "\x0D", null],
2518     },
2519     {
2520     id => q<l=1.0e>,
2521     in => qq<\x0E>,
2522     out => [null, "\x0E", null],
2523     }, # Error??
2524     {
2525     id => q<l=1.0f>,
2526     in => qq<\x0F>,
2527     out => [null, "\x0F", null],
2528     }, # Error??
2529     {
2530     id => q<l=1.1b>,
2531     in => qq<\x1B>,
2532     out => [null, "\x1B", null],
2533     }, # Error??
2534     {
2535     id => q<l=1.a>,
2536     in => q<a>,
2537     out => [null, "a", null],
2538     },
2539     {
2540     id => q<l=1.20>,
2541     in => qq<\x20>,
2542     out => [null, "\x20", null],
2543     },
2544     {
2545     id => q<5C>,
2546     in => qq<\x5C>,
2547     out => [null, "\x5C", null],
2548     },
2549     {
2550     id => q<l=1.7E>,
2551     in => qq<\x7E>,
2552     out => [null, "\x7E", null],
2553     },
2554     {
2555     id => q<l=1.7F>,
2556     in => qq<\x7F>,
2557     out => [null, "\x7F", null],
2558     },
2559     {
2560     id => q<l=1.80>,
2561     in => qq<\x80>,
2562     out => [null, "\x80", null],
2563     },
2564     {
2565     id => q<l=1.8c>,
2566     in => qq<\x8C>,
2567     out => [null, "\x8C", null],
2568     },
2569     {
2570     id => q<l=1.8e>,
2571     in => qq<\x8E>,
2572     out => [null, "\x8E", [<Q::cs|illegal-octets-error>]],
2573     },
2574     {
2575     id => q<l=1.8f>,
2576     in => qq<\x8F>,
2577     out => [null, "\x8F", [<Q::cs|illegal-octets-error>]],
2578     },
2579     {
2580     id => q<l=1.a0>,
2581     in => qq<\xA0>,
2582     out => [null, "\xA0", [<Q::cs|unassigned-code-point-error>]],
2583     },
2584     {
2585     id => q<l=1.a1>,
2586     in => qq<\xA1>,
2587     out => [null, "\xA1", [<Q::cs|illegal-octets-error>]],
2588     },
2589     {
2590     id => q<l=1.a2>,
2591     in => qq<\xA2>,
2592     out => [null, "\xA2", [<Q::cs|illegal-octets-error>]],
2593     },
2594     {
2595     id => q<l=1.fd>,
2596     in => qq<\xFD>,
2597     out => [null, "\xFD", [<Q::cs|illegal-octets-error>]],
2598     },
2599     {
2600     id => q<l=1.fe>,
2601     in => qq<\xFE>,
2602     out => [null, "\xFE", [<Q::cs|illegal-octets-error>]],
2603     },
2604     {
2605     id => q<l=1.ff>,
2606     in => qq<\xFF>,
2607     out => [null, "\xFF", [<Q::cs|unassigned-code-point-error>]],
2608     },
2609     {
2610     id => q<l=2.0000>,
2611     in => qq<\x00\x00>,
2612     out => [null, "\x00", null, "\x00", null],
2613     },
2614     {
2615     id => q<l=2.0D0A>,
2616     in => qq<\x0D\x0A>,
2617     out => [null, "\x0D", null, "\x0A", null],
2618     },
2619     {
2620     id => q<l=2.1B28>,
2621     in => qq<\x1B\x28>,
2622     out => [null, "\x1B", null, "\x28", null],
2623     },# Error??
2624     {
2625     id => q<l=2.2020>,
2626     in => qq<\x20\x20>,
2627     out => [null, "\x20", null, "\x20", null],
2628     },
2629     {
2630     id => q<l=2.ab>,
2631     in => qq<ab>,
2632     out => [null, "a", null, "b", null],
2633     },
2634     {
2635     id => q<l=2.a0a1>,
2636     in => qq<\xA0\xA1>,
2637     out => [null, "\xA0", [<Q::cs|unassigned-code-point-error>],
2638     "\xA1", [<Q::cs|illegal-octets-error>]],
2639     },
2640     {
2641     id => q<l=2.a1a1>,
2642     in => qq<\xA1\xA1>,
2643     out => [null, "\x{3000}", null],
2644     },
2645     {
2646     id => q<l=2.a1a2>,
2647     in => qq<\xA1\xA2>,
2648     out => [null, "\x{3001}", null],
2649     },
2650     {
2651     id => q<l=2.a1a4>,
2652     in => qq<\xA1\xA4>,
2653     out => [null, "\x{FF0C}", null], # FULLWIDTH COMMA
2654     },
2655     {
2656     id => q<a1a6>,
2657     in => qq<\xA1\xA6>,
2658     out => [null, "\x{30FB}", null], # KATAKABA MIDDLE DOT
2659     },
2660     {
2661     id => q<a1a7>,
2662     in => qq<\xA1\xA7>,
2663     out => [null, "\x{FF1A}", null], # FULLWIDTH COLON
2664     },
2665     {
2666     id => q<a1b1>,
2667     in => qq<\xA1\xB1>,
2668     out => [null, "\x{203E}", null], # OVERLINE
2669     },
2670     {
2671     id => q<a1bd>,
2672     in => qq<\xA1\xBD>,
2673     out => [null, "\x{2014}", null], # EM DASH
2674     },
2675     {
2676     id => q<a1c0>,
2677     in => qq<\xA1\xC0>,
2678     out => [null, "\x{FF3C}", null], # FULLWIDTH REVERSE SOLIDUS
2679     },
2680     {
2681     id => q<a1c1>,
2682     in => qq<\xA1\xC1>,
2683     out => [null, "\x{301C}", null], # WAVE DASH
2684     },
2685     {
2686     id => q<a1c2>,
2687     in => qq<\xA1\xC2>,
2688     out => [null, "\x{2016}", null], # DOUBLE VERTICAL LINE
2689     },
2690     {
2691     id => q<a1c4>,
2692     in => qq<\xA1\xC4>,
2693     out => [null, "\x{2026}", null], # HORIZONTAL ELLIPSIS
2694     },
2695     {
2696     id => q<a1dd>,
2697     in => qq<\xA1\xDD>,
2698     out => [null, "\x{2212}", null], # MINUS SIGN
2699     },
2700     {
2701     id => q<a1ef>,
2702     in => qq<\xA1\xEF>,
2703     out => [null, "\x{00A5}", null], # YEN SIGN
2704     },
2705     {
2706     id => q<a1f1>,
2707     in => qq<\xA1\xF1>,
2708     out => [null, "\x{00A2}", null], # CENT SIGN
2709     },
2710     {
2711     id => q<a1f2>,
2712     in => qq<\xA1\xF2>,
2713     out => [null, "\x{00A3}", null], # POUND SIGN
2714     },
2715     {
2716     id => q<a1f2>,
2717     in => qq<\xA1\xFF>,
2718     out => [null, "\xA1", [<Q::cs|illegal-octets-error>],
2719     "\xFF", [<Q::cs|unassigned-code-point-error>]],
2720     },
2721     {
2722     id => q<a2ae>,
2723     in => qq<\xA2\xAE>,
2724     out => [null, "\x{3013}", null], # GETA MARK
2725     },
2726     {
2727     id => q<a2af>,
2728     in => qq<\xA2\xAF>,
2729     out => [null, "\xA2\xAF", [<Q::cs|unassigned-code-point-error>]],
2730     },
2731     {
2732     id => q<a2ba>,
2733     in => qq<\xA2\xBA>,
2734     out => [null, "\x{2208}", null], # ELEMENT OF
2735     },
2736     {
2737     id => q<a2fe>,
2738     in => qq<\xA2\xFE>,
2739     out => [null, "\x{25EF}", null], # LARGE CIRCLE
2740     },
2741     {
2742     id => q<adce>,
2743     in => qq<\xAD\xCE>,
2744     out => [null, "\xAD\xCE", [<Q::cs|unassigned-code-point-error>]],
2745     },
2746     {
2747     id => q<b0a6>,
2748     in => qq<\xB0\xA6>,
2749     out => [null, "\x{611B}", null], # han
2750     },
2751     {
2752     id => q<f4a6>,
2753     in => qq<\xF4\xA6>,
2754     out => [null, "\x{7199}", null], # han
2755     },
2756     {
2757     id => q<8ea1>,
2758     in => qq<\x8E\xA1>,
2759     out => [null, "\x{FF61}", null],
2760     },
2761     {
2762     id => q<8efe>,
2763     in => qq<\x8E\xFE>,
2764     out => [null, "\x8E\xFE", [<Q::cs|unassigned-code-point-error>]],
2765     },
2766     {
2767     id => q<8ffe>,
2768     in => qq<\x8F\xFE>,
2769     out => [null, "\x8F\xFE", [<Q::cs|illegal-octets-error>]],
2770     },
2771     {
2772     id => q<l=2.a1a2a3>,
2773     in => qq<\xA1\xA2\xA3>,
2774     out => [null, "\x{3001}", null,
2775     "\xA3", [<Q::cs|illegal-octets-error>]],
2776     },
2777     {
2778     id => q<8ea1a1>,
2779     in => qq<\x8E\xA1\xA1>,
2780     out => [null, "\x{FF61}", null,
2781     "\xA1", [<Q::cs|illegal-octets-error>]],
2782     },
2783     {
2784     id => q<8fa1a1>,
2785     in => qq<\x8F\xA1\xA1>,
2786     out => [null, "\x8F\xA1\xA1", [<Q::cs|unassigned-code-point-error>]],
2787     },
2788     {
2789     id => q<8fa2af>,
2790     in => qq<\x8F\xA2\xAF>,
2791     out => [null, "\x{02D8}", null],
2792     },
2793     {
2794     id => q<8fa2b7>,
2795     in => qq<\x8F\xA2\xB7>,
2796     out => [null, "\x{FF5E}", null], # FULLWIDTH TILDE
2797     },
2798     {
2799     id => q<a1a2a1a3>,
2800     in => qq<\xA1\xA2\xA1\xA3>,
2801     out => [null, "\x{3001}", null, "\x{3002}", null],
2802     },
2803     {
2804     id => q<8fa2af>,
2805     in => qq<\x8F\xA2\xAF\xAF>,
2806     out => [null, "\x{02D8}", null,
2807     "\xAF", [<Q::cs|illegal-octets-error>]],
2808     },
2809     {
2810     id => q<8fa2afafa1>,
2811     in => qq<\x8F\xA2\xAF\xAF\xA1>,
2812     out => [null, "\x{02D8}", null,
2813     "\xAF\xA1", [<Q::cs|unassigned-code-point-error>]],
2814     },
2815     );
2816    
2817     for my $testdata (@testdata) {
2818     my $byte = $testdata->{in};
2819     my $error;
2820     my $i = 0;
2821    
2822     open my $fh, '<', \$byte;
2823     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
2824     (<Q::cs|XML.euc-jp>, $fh, sub {
2825     my (null, null, $etype, %opt) = @_;
2826     $error = [$etype, \%opt];
2827     });
2828    
2829     while (@{$testdata->{out}}) {
2830     if ($i != 0) {
2831     my $c = shift @{$testdata->{out}};
2832     $test->id ("$testdata->{id}.$i");
2833     $test->assert_equals ($efh->getc, $c);
2834     }
2835    
2836     my $v = shift @{$testdata->{out}};
2837     $test->id ("$testdata->{id}.$i.error");
2838     if (defined $v) {
2839     $test->assert_not_null ($error);
2840     $test->assert_equals ($error->[0], $v->[0]);
2841     } else {
2842     $test->assert_null ($error->[0]);
2843     }
2844     $error = null;
2845     $i++;
2846     }
2847    
2848     $test->id ("$testdata->{id}.eof");
2849     $test->assert_null ($efh->getc);
2850     $test->assert_null ($error);
2851     } # testdata
2852     ##MCEUCJPDecodeHandle
2853    
2854     ClsDef:
2855     @ClsQName: ManakaiMCShiftJISDecodeHandle
2856    
2857     @ClsISA: ManakaiMCDecodeHandle
2858    
2859     @Method:
2860     @@Name: getc
2861     @@enDesc:
2862     Returns the next character from the input.
2863     @@Return:
2864     @@@Type: String
2865     @@@enDesc:
2866     The next character.
2867     @@@nullCase:
2868     @@@@enDesc:
2869     If at the end of the file, or if there was an error, in which
2870     case <Perl::$!> is set.
2871     @@@PerlDef:
2872     if (@{$self->{<H::mce|characterQueue>}}) {
2873     $r = shift @{$self->{<H::mce|characterQueue>}};
2874     } else {
2875     __DEEP{
2876     my $error;
2877     if ($self->{<H::mce|continue>}) {
2878     if (read $self->{<H::mce|filehandle>},
2879     $self->{<H::mce|byteBuffer>}, 256,
2880     length $self->{<H::mce|byteBuffer>}) {
2881     #
2882     } else {
2883     $error = true;
2884     }
2885     $self->{<H::mce|continue>} = false;
2886     } elsif (512 > length $self->{<H::mce|byteBuffer>}) {
2887     read $self->{<H::mce|filehandle>},
2888     $self->{<H::mce|byteBuffer>}, 256,
2889     length $self->{<H::mce|byteBuffer>};
2890     }
2891    
2892     unless ($error) {
2893     my $string = Encode::decode ($self->{<H::mce|perlEncodingName>},
2894     $self->{<H::mce|byteBuffer>},
2895     Encode::FB_QUIET ());
2896     if (length $string) {
2897     push @{$self->{<H::mce|characterQueue>}}, split //, $string;
2898     $r = shift @{$self->{<H::mce|characterQueue>}};
2899     if (length $self->{<H::mce|byteBuffer>}) {
2900     $self->{<H::mce|continue>} = true;
2901     }
2902     } else {
2903     if (length $self->{<H::mce|byteBuffer>}) {
2904     $error = true;
2905     } else {
2906     $r = null;
2907     }
2908     }
2909     }
2910     if ($error) {
2911     $r = substr $self->{<H::mce|byteBuffer>}, 0, 1, '';
2912     my $etype = <Q::cs|illegal-octets-error>;
2913     if ($r =~ /^[\x81-\x9F\xE0-\xEF]/) {
2914     if ($self->{<H::mce|byteBuffer>} =~ s/(.)//s) {
2915     $r .= $1; # not limited to \x40-\xFC - \x7F
2916     $etype = <Q::cs|unassigned-code-point-error>;
2917     }
2918     } elsif ($r =~ /^[\x80\xA0\xF0-\xFF]/) {
2919     $etype = <Q::cs|unassigned-code-point-error>;
2920     }
2921     $self->{<H::mce|onerror>}
2922     ->($self->{<H::mce|onerror>}, $self, $etype,
2923     octets => \$r);
2924     }
2925     }__;
2926     }
2927    
2928     @Test:
2929     @@QName: MCShiftJISDecodeHandle.1.test
2930     @@PerlDef:
2931     my $impl;
2932     __CODE{createImplForTest:: $impl => $impl}__;
2933    
2934     my @testdata = (
2935     {
2936     id => q<l=0>,
2937     in => q<>,
2938     out => [null],
2939     },
2940     {
2941     id => q<l=1.00>,
2942     in => qq<\x00>,
2943     out => [null, "\x00", null],
2944     },
2945     {
2946     id => q<l=1.0d>,
2947     in => qq<\x0D>,
2948     out => [null, "\x0D", null],
2949     },
2950     {
2951     id => q<l=1.0e>,
2952     in => qq<\x0E>,
2953     out => [null, "\x0E", null],
2954     }, # Error??
2955     {
2956     id => q<l=1.0f>,
2957     in => qq<\x0F>,
2958     out => [null, "\x0F", null],
2959     }, # Error??
2960     {
2961     id => q<l=1.1b>,
2962     in => qq<\x1B>,
2963     out => [null, "\x1B", null],
2964     }, # Error??
2965     {
2966     id => q<l=1.a>,
2967     in => q<a>,
2968     out => [null, "a", null],
2969     },
2970     {
2971     id => q<l=1.20>,
2972     in => qq<\x20>,
2973     out => [null, "\x20", null],
2974     },
2975     {
2976     id => q<l=1.5C>,
2977     in => qq<\x5C>,
2978     out => [null, "\xA5", null], # YEN SIGN
2979     },
2980     {
2981     id => q<l=1.7E>,
2982     in => qq<\x7E>,
2983     out => [null, "\x{203E}", null], # OVERLINE
2984     },
2985     {
2986     id => q<l=1.7F>,
2987     in => qq<\x7F>,
2988     out => [null, "\x7F", null],
2989     },
2990     {
2991     id => q<l=1.80>,
2992     in => qq<\x80>,
2993     out => [null, "\x80", [<Q::cs|unassigned-code-point-error>]],
2994     },
2995     {
2996     id => q<l=1.8c>,
2997     in => qq<\x8C>,
2998     out => [null, "\x8C", [<Q::cs|illegal-octets-error>]],
2999     },
3000     {
3001     id => q<l=1.8e>,
3002     in => qq<\x8E>,
3003     out => [null, "\x8E", [<Q::cs|illegal-octets-error>]],
3004     },
3005     {
3006     id => q<l=1.8f>,
3007     in => qq<\x8F>,
3008     out => [null, "\x8F", [<Q::cs|illegal-octets-error>]],
3009     },
3010     {
3011     id => q<l=1.a0>,
3012     in => qq<\xA0>,
3013     out => [null, "\xA0", [<Q::cs|unassigned-code-point-error>]],
3014     },
3015     {
3016     id => q<l=1.a1>,
3017     in => qq<\xA1>,
3018     out => [null, "\x{FF61}", null],
3019     },
3020     {
3021     id => q<l=1.a2>,
3022     in => qq<\xA2>,
3023     out => [null, "\x{FF62}", null],
3024     },
3025     {
3026     id => q<l=1.df>,
3027     in => qq<\xdf>,
3028     out => [null, "\x{FF9F}", null],
3029     },
3030     {
3031     id => q<l=1.e0>,
3032     in => qq<\xe0>,
3033     out => [null, "\xE0", [<Q::cs|illegal-octets-error>]],
3034     },
3035     {
3036     id => q<l=1.ef>,
3037     in => qq<\xEF>,
3038     out => [null, "\xEF", [<Q::cs|illegal-octets-error>]],
3039     },
3040     {
3041     id => q<F0>,
3042     in => qq<\xF0>,
3043     out => [null, "\xF0", [<Q::cs|unassigned-code-point-error>]],
3044     },
3045     {
3046     id => q<l=1.fc>,
3047     in => qq<\xFC>,
3048     out => [null, "\xFC", [<Q::cs|unassigned-code-point-error>]],
3049     },
3050     {
3051     id => q<l=1.fd>,
3052     in => qq<\xFD>,
3053     out => [null, "\xFD", [<Q::cs|unassigned-code-point-error>]],
3054     },
3055     {
3056     id => q<l=1.fe>,
3057     in => qq<\xFE>,
3058     out => [null, "\xFE", [<Q::cs|unassigned-code-point-error>]],
3059     },
3060     {
3061     id => q<l=1.ff>,
3062     in => qq<\xFF>,
3063     out => [null, "\xFF", [<Q::cs|unassigned-code-point-error>]],
3064     },
3065     {
3066     id => q<l=2.0000>,
3067     in => qq<\x00\x00>,
3068     out => [null, "\x00", null, "\x00", null],
3069     },
3070     {
3071     id => q<l=2.0D0A>,
3072     in => qq<\x0D\x0A>,
3073     out => [null, "\x0D", null, "\x0A", null],
3074     },
3075     {
3076     id => q<l=2.1B28>,
3077     in => qq<\x1B\x28>,
3078     out => [null, "\x1B", null, "\x28", null],
3079     },# Error??
3080     {
3081     id => q<l=2.2020>,
3082     in => qq<\x20\x20>,
3083     out => [null, "\x20", null, "\x20", null],
3084     },
3085     {
3086     id => q<l=2.ab>,
3087     in => qq<ab>,
3088     out => [null, "a", null, "b", null],
3089     },
3090     {
3091     id => q<8040>,
3092     in => qq<\x80\x40>,
3093     out => [null, "\x80", [<Q::cs|unassigned-code-point-error>],
3094     "\x40", null],
3095     },
3096     {
3097     id => q<8100>,
3098     in => qq<\x81\x00>,
3099     out => [null, "\x81\x00", [<Q::cs|unassigned-code-point-error>]],
3100     },
3101     {
3102     id => q<8101>,
3103     in => qq<\x81\x01>,
3104     out => [null, "\x81\x01", [<Q::cs|unassigned-code-point-error>]],
3105     },
3106     {
3107     id => q<813F>,
3108     in => qq<\x81\x3F>,
3109     out => [null, "\x81\x3F", [<Q::cs|unassigned-code-point-error>]],
3110     },
3111     {
3112     id => q<8140>,
3113     in => qq<\x81\x40>,
3114     out => [null, "\x{3000}", null],
3115     },
3116     {
3117     id => q<8141>,
3118     in => qq<\x81\x41>,
3119     out => [null, "\x{3001}", null],
3120     },
3121     {
3122     id => q<8143>,
3123     in => qq<\x81\x43>,
3124     out => [null, "\x{FF0C}", null], # FULLWIDTH COMMA
3125     },
3126     {
3127     id => q<8150>,
3128     in => qq<\x81\x50>,
3129     out => [null, "\x{FFE3}", null], # FULLWIDTH MACRON
3130     },
3131     {
3132     id => q<815C>,
3133     in => qq<\x81\x5C>,
3134     out => [null, "\x{2014}", null], # EM DASH
3135     },
3136     {
3137     id => q<815F>,
3138     in => qq<\x81\x5F>,
3139     out => [null, "\x{005C}", null], # REVERSE SOLIDUS
3140     },
3141     {
3142     id => q<8160>,
3143     in => qq<\x81\x60>,
3144     out => [null, "\x{301C}", null], # WAVE DASH
3145     },
3146     {
3147     id => q<8161>,
3148     in => qq<\x81\x61>,
3149     out => [null, "\x{2016}", null], # DOUBLE VERTICAL LINE
3150     },
3151     {
3152     id => q<8163>,
3153     in => qq<\x81\x63>,
3154     out => [null, "\x{2026}", null], # HORIZONTAL ELLIPSIS
3155     },
3156     {
3157     id => q<817C>,
3158     in => qq<\x81\x7C>,
3159     out => [null, "\x{2212}", null], # MINUS SIGN
3160     },
3161     {
3162     id => q<817F>,
3163     in => qq<\x81\x7F>,
3164     out => [null, "\x81\x7F", [<Q::cs|unassigned-code-point-error>]],
3165     },
3166     {
3167     id => q<818F>,
3168     in => qq<\x81\x8F>,
3169     out => [null, "\x{FFE5}", null], # FULLWIDTH YEN SIGN
3170     },
3171     {
3172     id => q<8191>,
3173     in => qq<\x81\x91>,
3174     out => [null, "\x{00A2}", null], # CENT SIGN
3175     },
3176     {
3177     id => q<8192>,
3178     in => qq<\x81\x92>,
3179     out => [null, "\x{00A3}", null], # POUND SIGN
3180     },
3181     {
3182     id => q<81AC>,
3183     in => qq<\x81\xAC>,
3184     out => [null, "\x{3013}", null], # GETA MARK
3185     },
3186     {
3187     id => q<81AD>,
3188     in => qq<\x81\xAD>,
3189     out => [null, "\x81\xAD", [<Q::cs|unassigned-code-point-error>]],
3190     },
3191     {
3192     id => q<81B8>,
3193     in => qq<\x81\xB8>,
3194     out => [null, "\x{2208}", null], # ELEMENT OF
3195     },
3196     {
3197     id => q<81CA>,
3198     in => qq<\x81\xCA>,
3199     out => [null, "\x{00AC}", null], # NOT SIGN
3200     },
3201     {
3202     id => q<81FC>,
3203     in => qq<\x81\xFC>,
3204     out => [null, "\x{25EF}", null], # LARGE CIRCLE
3205     },
3206     {
3207     id => q<81FD>,
3208     in => qq<\x81\xFD>,
3209     out => [null, "\x81\xFD", [<Q::cs|unassigned-code-point-error>]],
3210     },
3211     {
3212     id => q<81FE>,
3213     in => qq<\x81\xFE>,
3214     out => [null, "\x81\xFE", [<Q::cs|unassigned-code-point-error>]],
3215     },
3216     {
3217     id => q<81FF>,
3218     in => qq<\x81\xFF>,
3219     out => [null, "\x81\xFF", [<Q::cs|unassigned-code-point-error>]],
3220     },
3221     {
3222     id => q<DDDE>,
3223     in => qq<\xDD\xDE>,
3224     out => [null, "\x{FF9D}", null, "\x{FF9E}", null],
3225     },
3226     {
3227     id => q<e040>,
3228     in => qq<\xE0\x40>,
3229     out => [null, "\x{6F3E}", null],
3230     },
3231     {
3232     id => q<eaa4>,
3233     in => qq<\xEA\xA4>,
3234     out => [null, "\x{7199}", null],
3235     },
3236     {
3237     id => q<eaa5>,
3238     in => qq<\xEA\xA5>,
3239     out => [null, "\xEA\xA5", [<Q::cs|unassigned-code-point-error>]],
3240     },
3241     {
3242     id => q<eb40>,
3243     in => qq<\xEB\x40>,
3244     out => [null, "\xEB\x40", [<Q::cs|unassigned-code-point-error>]],
3245     },
3246     {
3247     id => q<ed40>,
3248     in => qq<\xED\x40>,
3249     out => [null, "\xED\x40", [<Q::cs|unassigned-code-point-error>]],
3250     },
3251     {
3252     id => q<effc>,
3253     in => qq<\xEF\xFC>,
3254     out => [null, "\xEF\xFC", [<Q::cs|unassigned-code-point-error>]],
3255     },
3256     {
3257     id => q<f040>,
3258     in => qq<\xF0\x40>,
3259     out => [null, "\xF0", [<Q::cs|unassigned-code-point-error>],
3260     "\x40", null],
3261     },
3262     {
3263     id => q<f140>,
3264     in => qq<\xF1\x40>,
3265     out => [null, "\xF1", [<Q::cs|unassigned-code-point-error>],
3266     "\x40", null],
3267     },
3268     {
3269     id => q<fb40>,
3270     in => qq<\xFB\x40>,
3271     out => [null, "\xFB", [<Q::cs|unassigned-code-point-error>],
3272     "\x40", null],
3273     },
3274     {
3275     id => q<fc40>,
3276     in => qq<\xFc\x40>,
3277     out => [null, "\xFC", [<Q::cs|unassigned-code-point-error>],
3278     "\x40", null],
3279     },
3280     {
3281     id => q<fd40>,
3282     in => qq<\xFD\x40>,
3283     out => [null, "\xFD", [<Q::cs|unassigned-code-point-error>],
3284     "\x40", null],
3285     },
3286     {
3287     id => q<fE40>,
3288     in => qq<\xFE\x40>,
3289     out => [null, "\xFE", [<Q::cs|unassigned-code-point-error>],
3290     "\x40", null],
3291     },
3292     {
3293     id => q<ff40>,
3294     in => qq<\xFF\x40>,
3295     out => [null, "\xFF", [<Q::cs|unassigned-code-point-error>],
3296     "\x40", null],
3297     },
3298     {
3299     id => q<81408142>,
3300     in => qq<\x81\x40\x81\x42>,
3301     out => [null, "\x{3000}", null, "\x{3002}", null],
3302     },
3303     );
3304    
3305     for my $testdata (@testdata) {
3306     my $byte = $testdata->{in};
3307     my $error;
3308     my $i = 0;
3309    
3310     open my $fh, '<', \$byte;
3311     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
3312     (<Q::cs|XML.shift_jis>, $fh, sub {
3313     my (null, null, $etype, %opt) = @_;
3314     $error = [$etype, \%opt];
3315     });
3316    
3317     while (@{$testdata->{out}}) {
3318     if ($i != 0) {
3319     my $c = shift @{$testdata->{out}};
3320     $test->id ("$testdata->{id}.$i");
3321     $test->assert_equals ($efh->getc, $c);
3322     }
3323    
3324     my $v = shift @{$testdata->{out}};
3325     $test->id ("$testdata->{id}.$i.error");
3326     if (defined $v) {
3327     $test->assert_not_null ($error);
3328     $test->assert_equals ($error->[0], $v->[0]);
3329     } else {
3330     $test->assert_null ($error->[0]);
3331     }
3332     $error = null;
3333     $i++;
3334     }
3335    
3336     $test->id ("$testdata->{id}.eof");
3337     $test->assert_null ($efh->getc);
3338     $test->assert_null ($error);
3339     } # testdata
3340     ##MCShiftJISDecodeHandle
3341 wakaba 1.6
3342     ClsDef:
3343     @ClsQName: ManakaiMCISO2022JPDecodeHandle
3344    
3345     @ClsISA: ManakaiMCDecodeHandle
3346    
3347     @Method:
3348     @@Name: getc
3349     @@enDesc:
3350     Returns the next character from the input.
3351     @@Return:
3352     @@@Type: String
3353     @@@enDesc:
3354     The next character.
3355     @@@nullCase:
3356     @@@@enDesc:
3357     If at the end of the file, or if there was an error, in which
3358     case <Perl::$!> is set.
3359     @@@PerlDef:
3360     if (@{$self->{<H::mce|characterQueue>}}) {
3361     $r = shift @{$self->{<H::mce|characterQueue>}};
3362     } else {
3363     __DEEP{
3364     A: {
3365     my $error;
3366     if ($self->{<H::mce|continue>}) {
3367     if (read $self->{<H::mce|filehandle>},
3368     $self->{<H::mce|byteBuffer>}, 256,
3369     length $self->{<H::mce|byteBuffer>}) {
3370     #
3371     } else {
3372     $error = true;
3373     }
3374     $self->{<H::mce|continue>} = false;
3375     } elsif (512 > length $self->{<H::mce|byteBuffer>}) {
3376     read $self->{<H::mce|filehandle>},
3377     $self->{<H::mce|byteBuffer>}, 256,
3378     length $self->{<H::mce|byteBuffer>};
3379     }
3380    
3381     unless ($error) {
3382     if ($self->{<H::mce|byteBuffer>}
3383     =~ s/^\x1B(\x24[\x40\x42]|\x28[\x42\x4A])//) {
3384     $self->{<H::mce|state>} = {
3385     "\x24\x40" => <H::mce|State.2440>,
3386     "\x24\x42" => <H::mce|State.2442>,
3387     "\x28\x42" => <H::mce|State.2842>,
3388     "\x28\x4A" => <H::mce|State.284A>,
3389     }->{$1};
3390     redo A;
3391     } elsif ($self->{<H::mce|state>} eq <H::mce|State.2842>) { # IRV
3392     if ($self->{<H::mce|byteBuffer>}
3393     =~ s/^([\x00-\x0D\x10-\x1A\x1C-\x7F]+)//) {
3394     push @{$self->{<H::mce|characterQueue>}}, split //, $1;
3395     $r = shift @{$self->{<H::mce|characterQueue>}};
3396     } else {
3397     if (length $self->{<H::mce|byteBuffer>}) {
3398     $error = true;
3399     } else {
3400     $r = null;
3401     }
3402     }
3403     } elsif ($self->{<H::mce|state>} eq <H::mce|State.284A>) { # 0201
3404     if ($self->{<H::mce|byteBuffer>}
3405     =~ s/^([\x00-\x0D\x10-\x1A\x1C-\x7F]+)//) {
3406     my $v = $1; $v =~ tr/\x5C\x7E/\xA5\x{203E}/;
3407     push @{$self->{<H::mce|characterQueue>}}, split //, $v;
3408     $r = shift @{$self->{<H::mce|characterQueue>}};
3409     } else {
3410     if (length $self->{<H::mce|byteBuffer>}) {
3411     $error = true;
3412     } else {
3413     $r = null;
3414     $self->{<H::mce|onerror>}
3415     ->($self->{<H::mce|onerror>}, $self,
3416     <Q::cs|invalid-state-error>,
3417     state => $self->{<H::mce|state>});
3418     }
3419     }
3420     } elsif ($self->{<H::mce|state>} eq <H::mce|State.2442>) { # 1983
3421     my $v = Encode::decode ($self->{<H::mce|State.2442>},
3422     $self->{<H::mce|byteBuffer>},
3423     Encode::FB_QUIET ());
3424     if (length $v) {
3425     push @{$self->{<H::mce|characterQueue>}}, split //, $v;
3426     $r = shift @{$self->{<H::mce|characterQueue>}};
3427     } else {
3428     if (length $self->{<H::mce|byteBuffer>}) {
3429     $error = true;
3430     } else {
3431     $r = null;
3432     $self->{<H::mce|onerror>}
3433     ->($self->{<H::mce|onerror>}, $self,
3434     <Q::cs|invalid-state-error>,
3435     state => $self->{<H::mce|state>});
3436     }
3437     }
3438     } elsif ($self->{<H::mce|state>} eq <H::mce|State.2440>) { # 1978
3439     my $v = Encode::decode ($self->{<H::mce|State.2440>},
3440     $self->{<H::mce|byteBuffer>},
3441     Encode::FB_QUIET ());
3442     if (length $v) {
3443     push @{$self->{<H::mce|characterQueue>}}, split //, $v;
3444     $r = shift @{$self->{<H::mce|characterQueue>}};
3445     } else {
3446     if (length $self->{<H::mce|byteBuffer>}) {
3447     $error = true;
3448     } else {
3449     $r = null;
3450     $self->{<H::mce|onerror>}
3451     ->($self->{<H::mce|onerror>}, $self,
3452     <Q::cs|invalid-state-error>,
3453     state => $self->{<H::mce|state>});
3454     }
3455     }
3456     } else {
3457     $error = true;
3458     }
3459     }
3460     if ($error) {
3461     $r = substr $self->{<H::mce|byteBuffer>}, 0, 1, '';
3462     my $etype = <Q::cs|illegal-octets-error>;
3463     if (($self->{<H::mce|state>} eq <H::mce|State.2442> or
3464     $self->{<H::mce|state>} eq <H::mce|State.2440>) and
3465     $r =~ /^[\x21-\x7E]/ and
3466     $self->{<H::mce|byteBuffer>} =~ s/^([\x21-\x7E])//) {
3467     $r .= $1;
3468     $etype = <Q::cs|unassigned-code-point-error>;
3469     } elsif ($r eq "\x1B" and
3470     $self->{<H::mce|byteBuffer>} =~ s/^\(H//) { # Old 0201
3471     $r .= "(H";
3472     $self->{<H::mce|state>} = <H::mce|State.284A>;
3473     }
3474     $self->{<H::mce|onerror>}
3475     ->($self->{<H::mce|onerror>}, $self, $etype,
3476     octets => \$r);
3477     }
3478     } # A
3479     }__;
3480     }
3481    
3482     @Test:
3483     @@QName: MCISO2022JPDecodeHandle.1.test
3484     @@PerlDef:
3485     my $impl;
3486     __CODE{createImplForTest:: $impl => $impl}__;
3487    
3488     my @testdata = (
3489     {
3490     id => q<l=0>,
3491     in => q<>,
3492     out1 => [null],
3493     out2 => [null],
3494     },
3495     {
3496     id => q<l=1.00>,
3497     in => qq<\x00>,
3498     out1 => [null, "\x00", null],
3499     out2 => [null, "\x00", null],
3500     },
3501     {
3502     id => q<l=1.0d>,
3503     in => qq<\x0D>,
3504     out1 => [null, "\x0D", null],
3505     out2 => [null, "\x0D", null],
3506     }, # Error?
3507     {
3508     id => q<0A>,
3509     in => qq<\x0A>,
3510     out1 => [null, "\x0A", null],
3511     out2 => [null, "\x0A", null],
3512     }, # Error?
3513     {
3514     id => q<l=1.0e>,
3515     in => qq<\x0E>,
3516     out1 => [null, "\x0E", [<Q::cs|illegal-octets-error>]],
3517     out2 => [null, "\x0E", [<Q::cs|illegal-octets-error>]],
3518     },
3519     {
3520     id => q<l=1.0f>,
3521     in => qq<\x0F>,
3522     out1 => [null, "\x0F", [<Q::cs|illegal-octets-error>]],
3523     out2 => [null, "\x0F", [<Q::cs|illegal-octets-error>]],
3524     },
3525     {
3526     id => q<l=1.1b>,
3527     in => qq<\x1B>,
3528     out1 => [null, "\x1B", [<Q::cs|illegal-octets-error>]],
3529     out2 => [null, "\x1B", [<Q::cs|illegal-octets-error>]],
3530     },
3531     {
3532     id => q<l=1.a>,
3533     in => q<a>,
3534     out1 => [null, "a", null],
3535     out2 => [null, "a", null],
3536     },
3537     {
3538     id => q<l=1.20>,
3539     in => qq<\x20>,
3540     out1 => [null, "\x20", null],
3541     out2 => [null, "\x20", null],
3542     },
3543     {
3544     id => q<l=1.5C>,
3545     in => qq<\x5C>,
3546     out1 => [null, "\x5C", null],
3547     out2 => [null, "\x5C", null],
3548     },
3549     {
3550     id => q<l=1.7E>,
3551     in => qq<\x7E>,
3552     out1 => [null, "\x7E", null],
3553     out2 => [null, "\x7E", null],
3554     },
3555     {
3556     id => q<l=1.7F>,
3557     in => qq<\x7F>,
3558     out1 => [null, "\x7F", null],
3559     out2 => [null, "\x7F", null],
3560     },
3561     {
3562     id => q<l=1.80>,
3563     in => qq<\x80>,
3564     out1 => [null, "\x80", [<Q::cs|illegal-octets-error>]],
3565     out2 => [null, "\x80", [<Q::cs|illegal-octets-error>]],
3566     },
3567     {
3568     id => q<l=1.8c>,
3569     in => qq<\x8C>,
3570     out1 => [null, "\x8C", [<Q::cs|illegal-octets-error>]],
3571     out2 => [null, "\x8C", [<Q::cs|illegal-octets-error>]],
3572     },
3573     {
3574     id => q<l=1.8e>,
3575     in => qq<\x8E>,
3576     out1 => [null, "\x8E", [<Q::cs|illegal-octets-error>]],
3577     out2 => [null, "\x8E", [<Q::cs|illegal-octets-error>]],
3578     },
3579     {
3580     id => q<l=1.8f>,
3581     in => qq<\x8F>,
3582     out1 => [null, "\x8F", [<Q::cs|illegal-octets-error>]],
3583     out2 => [null, "\x8F", [<Q::cs|illegal-octets-error>]],
3584     },
3585     {
3586     id => q<l=1.a0>,
3587     in => qq<\xA0>,
3588     out1 => [null, "\xA0", [<Q::cs|illegal-octets-error>]],
3589     out2 => [null, "\xA0", [<Q::cs|illegal-octets-error>]],
3590     },
3591     {
3592     id => q<l=1.a1>,
3593     in => qq<\xA1>,
3594     out1 => [null, "\xA1", [<Q::cs|illegal-octets-error>]],
3595     out2 => [null, "\xA1", [<Q::cs|illegal-octets-error>]],
3596     },
3597     {
3598     id => q<l=1.a2>,
3599     in => qq<\xA2>,
3600     out1 => [null, "\xA2", [<Q::cs|illegal-octets-error>]],
3601     out2 => [null, "\xA2", [<Q::cs|illegal-octets-error>]],
3602     },
3603     {
3604     id => q<l=1.df>,
3605     in => qq<\xdf>,
3606     out1 => [null, "\xDF", [<Q::cs|illegal-octets-error>]],
3607     out2 => [null, "\xDF", [<Q::cs|illegal-octets-error>]],
3608     },
3609     {
3610     id => q<l=1.e0>,
3611     in => qq<\xe0>,
3612     out1 => [null, "\xE0", [<Q::cs|illegal-octets-error>]],
3613     out2 => [null, "\xE0", [<Q::cs|illegal-octets-error>]],
3614     },
3615     {
3616     id => q<l=1.ef>,
3617     in => qq<\xEF>,
3618     out1 => [null, "\xEF", [<Q::cs|illegal-octets-error>]],
3619     out2 => [null, "\xEF", [<Q::cs|illegal-octets-error>]],
3620     },
3621     {
3622     id => q<F0>,
3623     in => qq<\xF0>,
3624     out1 => [null, "\xF0", [<Q::cs|illegal-octets-error>]],
3625     out2 => [null, "\xF0", [<Q::cs|illegal-octets-error>]],
3626     },
3627     {
3628     id => q<l=1.fc>,
3629     in => qq<\xFC>,
3630     out1 => [null, "\xFC", [<Q::cs|illegal-octets-error>]],
3631     out2 => [null, "\xFC", [<Q::cs|illegal-octets-error>]],
3632     },
3633     {
3634     id => q<l=1.fd>,
3635     in => qq<\xFD>,
3636     out1 => [null, "\xFD", [<Q::cs|illegal-octets-error>]],
3637     out2 => [null, "\xFD", [<Q::cs|illegal-octets-error>]],
3638     },
3639     {
3640     id => q<l=1.fe>,
3641     in => qq<\xFE>,
3642     out1 => [null, "\xFE", [<Q::cs|illegal-octets-error>]],
3643     out2 => [null, "\xFE", [<Q::cs|illegal-octets-error>]],
3644     },
3645     {
3646     id => q<l=1.ff>,
3647     in => qq<\xFF>,
3648     out1 => [null, "\xFF", [<Q::cs|illegal-octets-error>]],
3649     out2 => [null, "\xFF", [<Q::cs|illegal-octets-error>]],
3650     },
3651     {
3652     id => q<l=2.0000>,
3653     in => qq<\x00\x00>,
3654     out1 => [null, "\x00", null, "\x00", null],
3655     out2 => [null, "\x00", null, "\x00", null],
3656     },
3657     {
3658     id => q<l=2.0D0A>,
3659     in => qq<\x0D\x0A>,
3660     out1 => [null, "\x0D", null, "\x0A", null],
3661     out2 => [null, "\x0D", null, "\x0A", null],
3662     },
3663     {
3664     id => q<l=2.1B1B>,
3665     in => qq<\x1B\x1B>,
3666     out1 => [null, "\x1B", [<Q::cs|illegal-octets-error>],
3667     "\x1B", [<Q::cs|illegal-octets-error>]],
3668     out2 => [null, "\x1B", [<Q::cs|illegal-octets-error>],
3669     "\x1B", [<Q::cs|illegal-octets-error>]],
3670     },
3671     {
3672     id => q<l=2.1B20>,
3673     in => qq<\x1B\x20>,
3674     out1 => [null, "\x1B", [<Q::cs|illegal-octets-error>], "\x20", null],
3675     out2 => [null, "\x1B", [<Q::cs|illegal-octets-error>], "\x20", null],
3676     },
3677     {
3678     id => q<l=2.1B24>,
3679     in => qq<\x1B\x24>,
3680     out1 => [null, "\x1B", [<Q::cs|illegal-octets-error>], "\x24", null],
3681     out2 => [null, "\x1B", [<Q::cs|illegal-octets-error>], "\x24", null],
3682     },
3683     {
3684     id => q<l=2.1B28>,
3685     in => qq<\x1B\x28>,
3686     out1 => [null, "\x1B", [<Q::cs|illegal-octets-error>], "\x28", null],
3687     out2 => [null, "\x1B", [<Q::cs|illegal-octets-error>], "\x28", null],
3688     },
3689     {
3690     id => q<l=2.2020>,
3691     in => qq<\x20\x20>,
3692     out1 => [null, "\x20", null, "\x20", null],
3693     out2 => [null, "\x20", null, "\x20", null],
3694     },
3695     {
3696     id => q<l=2.ab>,
3697     in => qq<ab>,
3698     out1 => [null, "a", null, "b", null],
3699     out2 => [null, "a", null, "b", null],
3700     },
3701     {
3702     id => q<8040>,
3703     in => qq<\x80\x40>,
3704     out1 => [null, "\x80", [<Q::cs|illegal-octets-error>],
3705     "\x40", null],
3706     out2 => [null, "\x80", [<Q::cs|illegal-octets-error>],
3707     "\x40", null],
3708     },
3709     {
3710     id => q<1B2440>,
3711     in => qq<\x1B\x24\x40>,
3712     out1 => [null],
3713     out2 => [null],
3714     eof_error => [<Q::cs|invalid-state-error>],
3715     },
3716     {
3717     id => q<1B2442>,
3718     in => qq<\x1B\x24\x42>,
3719     out1 => [null],
3720     out2 => [null],
3721     eof_error => [<Q::cs|invalid-state-error>],
3722     },
3723     {
3724     id => q<1B2840>,
3725     in => qq<\x1B\x28\x40>,
3726     out1 => [null, "\x1B", [<Q::cs|illegal-octets-error>], "(", null,
3727     "\x40", null],
3728     out2 => [null, "\x1B", [<Q::cs|illegal-octets-error>], "(", null,
3729     "\x40", null],
3730     },
3731     {
3732     id => q<1B2842>,
3733     in => qq<\x1B\x28\x42>,
3734     out1 => [null],
3735     out2 => [null],
3736     },
3737     {
3738     id => q<1B284A>,
3739     in => qq<\x1B\x28\x4A>,
3740     out1 => [null],
3741     out2 => [null],
3742     eof_error => [<Q::cs|invalid-state-error>],
3743     },
3744     {
3745     id => q<1B$B1B(B>,
3746     in => qq<\x1B\x24\x42\x1B\x28\x42>,
3747     out1 => [null],
3748     out2 => [null],
3749     },
3750     {
3751     id => q<1B(B1B(B>,
3752     in => qq<\x1B\x28\x42\x1B\x28\x42>,
3753     out1 => [null],
3754     out2 => [null],
3755     },
3756     {
3757     id => q<1B(Ba1B(B>,
3758     in => qq<\x1B\x28\x42a\x1B\x28\x42>,
3759     out1 => [null, "a", null],
3760     out2 => [null, "a", null],
3761     },
3762     {
3763     id => q<1B(Ba1B(B1B(B>,
3764     in => qq<\x1B\x28\x42a\x1B\x28\x42\x1B\x28\x42>,
3765     out1 => [null, "a", null],
3766     out2 => [null, "a", null],
3767     },
3768     {
3769     id => q<1B$42!!1B2842>,
3770     in => qq<\x1B\x24\x42!!\x1B\x28\x42>,
3771     out1 => [null, "\x{3000}", null],
3772     out2 => [null, "\x{3000}", null],
3773     },
3774     {
3775     id => q<1B$4221211B284A>,
3776     in => qq<\x1B\x24\x42!!\x1B\x28\x4A>,
3777     out1 => [null, "\x{3000}", null],
3778     out2 => [null, "\x{3000}", null],
3779     eof_error => [<Q::cs|invalid-state-error>],
3780     },
3781     {
3782     id => q<1B$4021211B2842>,
3783     in => qq<\x1B\x24\x40!!\x1B\x28\x42>,
3784     out1 => [null, "\x{3000}", null],
3785     out2 => [null, "\x{3000}", null],
3786     },
3787     {
3788     id => q<1B$402121211B2842>,
3789     in => qq<\x1B\x24\x40!!!\x1B\x28\x42>,
3790     out1 => [null, "\x{3000}", null, "!", [<Q::cs|illegal-octets-error>]],
3791     out2 => [null, "\x{3000}", null, "!", [<Q::cs|illegal-octets-error>]],
3792     },
3793     {
3794     id => q<1B$4021211B2442!!1B2842>,
3795     in => qq<\x1B\x24\x40!!\x1B\x24\x42!!\x1B\x28\x42>,
3796     out1 => [null, "\x{3000}", null, "\x{3000}", null],
3797     out2 => [null, "\x{3000}", null, "\x{3000}", null],
3798     },
3799     {
3800     id => q<1B$4021211B2440!!1B2842>,
3801     in => qq<\x1B\x24\x40!!\x1B\x24\x40!!\x1B\x28\x42>,
3802     out1 => [null, "\x{3000}", null, "\x{3000}", null],
3803     out2 => [null, "\x{3000}", null, "\x{3000}", null],
3804     },
3805     {
3806     id => q<1B$@!"1B(B\~|>,
3807     in => qq<\x1B\x24\x40!"\x1B(B\\~|>,
3808     out1 => [null, "\x{3001}", null, "\x5C", null,
3809     "\x7E", null, "|", null],
3810     out2 => [null, "\x{3001}", null, "\x5C", null,
3811     "\x7E", null, "|", null],
3812     },
3813     {
3814     id => q<1B$B!"1B(J\~|1B(B>,
3815     in => qq<\x1B\x24\x42!"\x1B(J\\~|\x1B(B>,
3816     out1 => [null, "\x{3001}", null, "\xA5", null,
3817     "\x{203E}", null, "|", null],
3818     out2 => [null, "\x{3001}", null, "\xA5", null,
3819     "\x{203E}", null, "|", null],
3820     },
3821     {
3822     id => q<78compat.3022(16-02)>,
3823     in => qq<\x1B\$\@\x30\x22\x1B\$B\x30\x22\x1B(B>,
3824     out1 => [null, "\x{555E}", null, "\x{5516}", null],
3825     out2 => [null, "\x{5516}", null, "\x{5516}", null],
3826     },
3827     {
3828     id => q<unassigned.2239>,
3829     in => qq<\x1B\$\@\x22\x39\x1B\$B\x22\x39\x1B(B>,
3830     out1 => [null, "\x22\x39", [<Q::cs|unassigned-code-point-error>],
3831     "\x22\x39", [<Q::cs|unassigned-code-point-error>]],
3832     out2 => [null, "\x22\x39", [<Q::cs|unassigned-code-point-error>],
3833     "\x22\x39", [<Q::cs|unassigned-code-point-error>]],
3834     },
3835     {
3836     id => q<83add.223A>,
3837     in => qq<\x1B\$\@\x22\x3A\x1B\$B\x22\x3A\x1B(B>,
3838     out1 => [null, "\x22\x3A", [<Q::cs|unassigned-code-point-error>],
3839     "\x{2208}", null],
3840     out2 => [null, "\x{2208}", null, "\x{2208}", null],
3841     },
3842     {
3843     id => q<83add.2840>,
3844     in => qq<\x1B\$\@\x28\x40\x1B\$B\x28\x40\x1B(B>,
3845     out1 => [null, "\x28\x40", [<Q::cs|unassigned-code-point-error>],
3846     "\x{2542}", null],
3847     out2 => [null, "\x{2542}", null, "\x{2542}", null],
3848     },
3849     {
3850     id => q<83add.7421>,
3851     in => qq<\x1B\$\@\x74\x21\x1B\$B\x74\x21\x1B(B>,
3852     out1 => [null, "\x74\x21", [<Q::cs|unassigned-code-point-error>],
3853     "\x{582F}", null],
3854     out2 => [null, "\x{5C2D}", null, "\x{582F}", null],
3855     },
3856     {
3857     id => q<83swap.3033>,
3858     in => qq<\x1B\$\@\x30\x33\x1B\$B\x30\x33\x1B(B>,
3859     out1 => [null, "\x{9C3A}", null, "\x{9BF5}", null],
3860     out2 => [null, "\x{9C3A}", null, "\x{9BF5}", null],
3861     },
3862     {
3863     id => q<83swap.724D>,
3864     in => qq<\x1B\$\@\x72\x4D\x1B\$B\x72\x4D\x1B(B>,
3865     out1 => [null, "\x{9BF5}", null, "\x{9C3A}", null],
3866     out2 => [null, "\x{9BF5}", null, "\x{9C3A}", null],
3867     },
3868     {
3869     id => q<90add.7425>,
3870     in => qq<\x1B\$\@\x74\x25\x1B\$B\x74\x25\x1B(B>,
3871     out1 => [null, "\x74\x25", [<Q::cs|unassigned-code-point-error>],
3872     "\x74\x25", [<Q::cs|unassigned-code-point-error>]],
3873     out2 => [null, "\x{51DC}", null, "\x{51DC}", null],
3874     },
3875     {
3876     id => q<90add.7426>,
3877     in => qq<\x1B\$\@\x74\x26\x1B\$B\x74\x26\x1B(B>,
3878     out1 => [null, "\x74\x26", [<Q::cs|unassigned-code-point-error>],
3879     "\x74\x26", [<Q::cs|unassigned-code-point-error>]],
3880     out2 => [null, "\x{7199}", null, "\x{7199}", null],
3881     },
3882     );
3883    
3884     for my $testdata (@testdata) {
3885     for my $c (1..2) {
3886     my $byte = $testdata->{in};
3887     my $error;
3888     my $i = 0;
3889    
3890     open my $fh, '<', \$byte;
3891     my $efh = $impl-><M::MCEncodeImplementation.createMCDecodeHandle>
3892     ([null, <Q::icharset|iso-2022-jp>,
3893     <Q::cs|XML.iso-2022-jp>]->[$c], $fh, sub {
3894     my (null, null, $etype, %opt) = @_;
3895     $error = [$etype, \%opt];
3896     });
3897    
3898     while (@{$testdata->{"out$c"}}) {
3899     if ($i != 0) {
3900     my $c = shift @{$testdata->{"out$c"}};
3901     $test->id ("$testdata->{id}.$i.$c");
3902     $test->assert_equals ($efh->getc, $c);
3903     }
3904    
3905     my $v = shift @{$testdata->{"out$c"}};
3906     $test->id ("$testdata->{id}.$i.error.$c");
3907     if (defined $v) {
3908     $test->assert_not_null ($error);
3909     $test->assert_equals ($error->[0], $v->[0]);
3910     } else {
3911     $test->assert_null ($error->[0]);
3912     }
3913     $error = null;
3914     $i++;
3915     }
3916    
3917     $test->id ("$testdata->{id}.eof.$c");
3918     $test->assert_null ($efh->getc);
3919    
3920     my $v = $testdata->{eof_error};
3921     $test->id ("$testdata->{id}.$i.error.$c");
3922     if (defined $v) {
3923     $test->assert_not_null ($error);
3924     $test->assert_equals ($error->[0], $v->[0]);
3925     } else {
3926     $test->assert_null ($error->[0]);
3927     }
3928     $error = null;
3929     }} # testdata
3930     ##MCISO2022JPDecodeHandle
3931    
3932     PropDef:
3933     @QName: mce|state
3934     @mce:key: s
3935     @enDesc:
3936     ISO-2022-JP state.
3937    
3938     ResourceDef:
3939     @QName: mce|State.2440
3940     @mce:key: s1
3941     @enDesc:
3942     ISO-2022-JP state: ESC 2/4 4/0.
3943     @DISCore:resourceType: DISCore|Resource
3944     @For: =ManakaiDOM|all
3945     @DISCore:resourceType: DISCore|Property
3946    
3947     ResourceDef:
3948     @QName: mce|State.2442
3949     @mce:key: s2
3950     @enDesc:
3951     ISO-2022-JP state: ESC 2/4 4/2.
3952     @DISCore:resourceType: DISCore|Resource
3953     @For: =ManakaiDOM|all
3954     @DISCore:resourceType: DISCore|Property
3955    
3956     ResourceDef:
3957     @QName: mce|State.2842
3958     @mce:key: s3
3959     @enDesc:
3960     ISO-2022-JP state: ESC 2/8 4/2.
3961     @DISCore:resourceType: DISCore|Resource
3962     @For: =ManakaiDOM|all
3963    
3964     ResourceDef:
3965     @QName: mce|State.284A
3966     @mce:key: s4
3967     @enDesc:
3968     ISO-2022-JP state: ESC 2/8 4/10.
3969     @DISCore:resourceType: DISCore|Resource
3970     @For: =ManakaiDOM|all
3971 wakaba 1.2
3972     ResourceDef:
3973     @QName: String
3974 wakaba 1.10 @AliasFor: str|DOMString
3975 wakaba 1.2 @For: ManakaiDOM|ManakaiDOM
3976    
3977     PropDef:
3978 wakaba 1.3 @QName: mce|onerror
3979 wakaba 1.2 @mce:key: onerr
3980 wakaba 1.3
3981     PropDef:
3982     @QName: mce|inputEncoding
3983     @mce:key: ie
3984    
3985     PropDef:
3986     @QName: mce|hasBOM
3987     @mce:key: bom
3988 wakaba 1.2
3989     PropDef:
3990     @QName: mce|continue
3991     @mce:key: cc
3992     @enDesc:
3993     Whether the <Q::mce|byteBuffer> contains octets that might
3994     be part of characters.
3995    
3996     PropDef:
3997     @QName: mce|charset
3998     @mce:key: cs
3999    
4000     PropDef:
4001     @QName: mce|perlEncodingName
4002     @mce:key: enc
4003    
4004     PropDef:
4005     @QName: mce|filehandle
4006     @mce:key: fh
4007    
4008     RPropDef:
4009     @QName: mce|key
4010     @subsetOf: DISPerl|propHashKey
4011    
4012     PropDef:
4013     @QName: mce|characterQueue
4014     @mce:key: cq
4015    
4016     PropDef:
4017     @QName: mce|byteBuffer
4018     @mce:key: bb
4019    
4020     ElementTypeBinding:
4021     @Name: PropDef
4022     @ElementType:
4023     dis:ResourceDef
4024     @ShadowContent:
4025     @@DISCore:resourceType: DISCore|Property
4026     @@ForCheck: =ManakaiDOM|all
4027    
4028     ElementTypeBinding:
4029     @Name: RPropDef
4030     @ElementType:
4031     dis:ResourceDef
4032     @ShadowContent:
4033     @@DISCore:resourceType: DISSource|ResourceProperty
4034     @@ForCheck: =ManakaiDOM|all
4035    
4036    
4037     ElementTypeBinding:
4038     @Name: nullCase
4039     @ElementType:
4040     dis:ResourceDef
4041     @ShadowContent:
4042     @@DISCore:resourceType: ManakaiDOM|InCase
4043     @@Value:
4044     @@@is-null:1
4045     @@@ContentType: DISCore|String
4046    
4047     ElementTypeBinding:
4048     @Name: TrueCase
4049     @ElementType:
4050     dis:ResourceDef
4051     @ShadowContent:
4052     @@DISCore:resourceType: ManakaiDOM|InCase
4053     @@Value:
4054     @@@@: 1
4055     @@@ContentType: DISCore|Boolean
4056     @@Type: idl|boolean||ManakaiDOM|all
4057    
4058     ElementTypeBinding:
4059     @Name: FalseCase
4060     @ElementType:
4061     dis:ResourceDef
4062     @ShadowContent:
4063     @@DISCore:resourceType: ManakaiDOM|InCase
4064     @@Value:
4065     @@@@: 0
4066     @@@ContentType: DISCore|Boolean
4067     @@Type: idl|boolean||ManakaiDOM|all
4068    
4069     ResourceDef:
4070     @DISCore:resourceType: cs|CharsetSet
4071     @cs:moduleRef: DISlib|Charset

admin@suikawiki.org
ViewVC Help
Powered by ViewVC 1.1.24