Log of /markup/html/whatpm/Whatpm/Charset/DecodeHandle.pm


Revision 1.18
Mon Sep 15 07:19:03 2008 UTC (16 years, 1 month ago) by wakaba
Branch: MAIN
CVS Tags: HEAD
Changes since 1.17: +10 -3 lines
++ whatpm/Whatpm/ChangeLog	15 Sep 2008 07:17:34 -0000
	* HTML.pm.src: Remove checking for control character, surrogate
	pair, or noncharacter code points and non-Unicode code
	points (they should be handled by Whatpm::Charset::UnicodeChecker).
	(parse_char_stream): Support for the |$get_wrapper| argument and
	character stream error handlers.

2008-09-15  Wakaba  <wakaba@suika.fam.cx>

++ whatpm/Whatpm/Charset/ChangeLog	15 Sep 2008 07:18:45 -0000
	* DecodeHandle.pm (onerror): Return |undef| if no explicit value
	is set.

	* UnicodeChecker.pm: Support for HTML5 parse errors.
	(onerror): Return |undef| if no explicit value is set.

2008-09-15  Wakaba  <wakaba@suika.fam.cx>


Revision 1.17
Sun Sep 14 11:57:41 2008 UTC (16 years, 1 month ago) by wakaba
Branch: MAIN
Changes since 1.16: +3 -1 lines
++ whatpm/Whatpm/ChangeLog	14 Sep 2008 11:56:24 -0000
	* HTML.pm.src: Use |read| instead of |getc|.  |set_inner_html|
	now reports character errors.

2008-09-14  Wakaba  <wakaba@suika.fam.cx>

++ whatpm/Whatpm/Charset/ChangeLog	14 Sep 2008 11:57:38 -0000
	* DecodeHandle.pm (CharString onerror): New method.

	* UnicodeString.pm (read): New.
	(getc): Removed.
	(manakai_read_until): Checking operation implemented.

2008-09-14  Wakaba  <wakaba@suika.fam.cx>


Revision 1.16
Sun Sep 14 07:19:47 2008 UTC (16 years, 1 month ago) by wakaba
Branch: MAIN
Changes since 1.15: +18 -100 lines
++ whatpm/Whatpm/ChangeLog	14 Sep 2008 07:19:17 -0000
	* HTML.pm.src: Make a "bare ero" error for unknown
	entities point to the "&" character.

2008-09-14  Wakaba  <wakaba@suika.fam.cx>

++ whatpm/Whatpm/Charset/ChangeLog	14 Sep 2008 07:19:40 -0000
	* DecodeHandle.pm: Merge the ShiftJIS class into the Encode class.

2008-09-14  Wakaba  <wakaba@suika.fam.cx>


Revision 1.15
Sun Sep 14 06:58:28 2008 UTC (16 years, 1 month ago) by wakaba
Branch: MAIN
Changes since 1.14: +102 -186 lines
++ whatpm/Whatpm/ChangeLog	14 Sep 2008 06:57:36 -0000
	* HTML.pm.src: It turns out that U+FFFD does not have to
	be added to the list of excluded characters.

2008-09-14  Wakaba  <wakaba@suika.fam.cx>

++ whatpm/Whatpm/Charset/ChangeLog	14 Sep 2008 06:58:07 -0000
	* DecodeHandle.pm: Merge the EUCJP class into the Encode class.

2008-09-14  Wakaba  <wakaba@suika.fam.cx>


Revision 1.14
Sun Sep 14 06:32:49 2008 UTC (16 years, 1 month ago) by wakaba
Branch: MAIN
Changes since 1.13: +139 -84 lines
++ whatpm/Whatpm/ChangeLog	14 Sep 2008 06:32:02 -0000
	* HTML.pm.src ($char_onerror): Give the character decoder's |line|
	and |column| higher priority than those set by the
	tokenizer's input handler.
	($self->{read_until}): Exclude U+FFFD (but this might
	not be necessary, since now we do line/column fixup in
	the character decode handle).

2008-09-14  Wakaba  <wakaba@suika.fam.cx>

++ whatpm/Whatpm/Charset/ChangeLog	14 Sep 2008 06:32:40 -0000
	* DecodeHandle.pm: EUCJP class reimplemented using a |read|-centric
	model.

2008-09-14  Wakaba  <wakaba@suika.fam.cx>


Revision 1.13
Sun Sep 14 03:59:08 2008 UTC (16 years, 1 month ago) by wakaba
Branch: MAIN
Changes since 1.12: +2 -2 lines
++ whatpm/Whatpm/ChangeLog	14 Sep 2008 03:58:42 -0000
	* HTML.pm.src: Use |{read_until}| where possible.

2008-09-14  Wakaba  <wakaba@suika.fam.cx>

++ whatpm/Whatpm/Charset/ChangeLog	14 Sep 2008 03:59:00 -0000
	* DecodeHandle.pm: Typo fixed.

2008-09-14  Wakaba  <wakaba@suika.fam.cx>


Revision 1.12
Sun Sep 14 03:07:58 2008 UTC (16 years, 1 month ago) by wakaba
Branch: MAIN
Changes since 1.11: +44 -36 lines
++ whatpm/Whatpm/ChangeLog	14 Sep 2008 03:06:56 -0000
	* HTML.pm.src: Change |{getc_until}| to |{read_until}|
	and |manakai_getc_until| to |manakai_read_until| to
	reduce the number of string copies.

2008-09-14  Wakaba  <wakaba@suika.fam.cx>

++ whatpm/Whatpm/Charset/ChangeLog	14 Sep 2008 03:07:37 -0000
	* DecodeHandle.pm, UnicodeChecker.pm: Change |manakai_getc_until|
	to |manakai_read_until| to reduce the number of string copies.

2008-09-14  Wakaba  <wakaba@suika.fam.cx>


Revision 1.11
Sun Sep 14 01:51:08 2008 UTC (16 years, 1 month ago) by wakaba
Branch: MAIN
Changes since 1.10: +133 -7 lines
++ whatpm/Whatpm/ChangeLog	14 Sep 2008 01:47:27 -0000
2008-09-14  Wakaba  <wakaba@suika.fam.cx>

	* HTML.pm.src (parse_char_string): Use the newly created
	|Whatpm::Charset::DecodeHandle::CharString| instead of Perl's
	standard feature to |open| a string as a filehandle,
	since Perl's string filehandles do not seem to support the
	|ungetc| method correctly.
	(parse_char_stream): Define |{getc_until}| method.
	(DATA_STATE): Experimental support for |getc_until| feature.

++ whatpm/Whatpm/Charset/ChangeLog	14 Sep 2008 01:50:52 -0000
2008-09-14  Wakaba  <wakaba@suika.fam.cx>

	* DecodeHandle.pm (CharString): New class.
	(Encode read): Don't remove the read string from |{char_buffer}|,
	to decrease the number of string operations and to enable
	|manakai_getc_until| ungetc'ing without any string operation.
	(manakai_getc_until): New method.

	* UnicodeChecker.pm (getc): Don't |read| more than one
	character, so that no characters are left buffered
	and a mixture of |getc| and |manakai_getc_until|
	calls does not break the result.


Revision 1.10
Fri Sep 12 03:31:40 2008 UTC (16 years, 1 month ago) by wakaba
Branch: MAIN
Changes since 1.9: +4 -1 lines
++ whatpm/Whatpm/ContentChecker/ChangeLog	12 Sep 2008 03:31:23 -0000
2008-09-12  Wakaba  <wakaba@suika.fam.cx>

	* HTML.pm: The WF2 |oninput| attribute was not added.
	The WF2 |onforminput| and |onformchange| attributes were
	not added, since they are not defined (although they ARE
	mentioned) in the normative part of the WF2 spec.
	Typo on the |oninvalid| attribute fixed.


Revision 1.9
Thu Sep 11 12:09:38 2008 UTC (16 years, 1 month ago) by wakaba
Branch: MAIN
Changes since 1.8: +112 -29 lines
++ whatpm/Whatpm/Charset/ChangeLog	11 Sep 2008 12:09:15 -0000
	* UnicodeChecker.pm, DecodeHandle.pm: Try to reduce the
	number of string copies and method calls, first round.

2008-09-11  Wakaba  <wakaba@suika.fam.cx>


Revision 1.8
Thu Sep 11 09:55:56 2008 UTC (16 years, 1 month ago) by wakaba
Branch: MAIN
Changes since 1.7: +21 -1 lines
++ whatpm/Whatpm/Charset/ChangeLog	11 Sep 2008 09:55:54 -0000
	* UnicodeChecker.pm, DecodeHandle.pm: Tentative support
	for the |read| method.

2008-09-11  Wakaba  <wakaba@suika.fam.cx>


Revision 1.7
Wed Sep 10 10:27:09 2008 UTC (16 years, 1 month ago) by wakaba
Branch: MAIN
Changes since 1.6: +26 -11 lines
++ whatpm/Whatpm/ChangeLog	10 Sep 2008 10:25:19 -0000
2008-09-10  Wakaba  <wakaba@suika.fam.cx>

	* ContentChecker.pm: Support for charset-layer error levels.

	* HTML.pm.src: Don't specify |text| argument for the
	|chardecode:fallback| error, since it is not the encoding
	being used alternatively.

++ whatpm/Whatpm/Charset/ChangeLog	10 Sep 2008 10:26:52 -0000
2008-09-10  Wakaba  <wakaba@suika.fam.cx>

	* DecodeHandle.pm: Set error levels.

	* WebLatin1.pm: Support for |us-ascii| and |iso-8859-5|
	charsets (this module is no longer for Latin1 only, but for
	Latin* encodings).

	* WebThai.pm: Support for |tis-620| charset.


Revision 1.6
Sun May 18 06:07:22 2008 UTC (16 years, 5 months ago) by wakaba
Branch: MAIN
Changes since 1.5: +12 -2 lines
++ whatpm/Whatpm/Charset/ChangeLog	18 May 2008 06:07:15 -0000
	* WebThai.pm, WebLatin1.pm: New modules.

	* DecodeHandle.pm: Support for fallback character mappings,
	used for WebLatin1 and WebThai encodings.

2008-05-18  Wakaba  <wakaba@suika.fam.cx>


Revision 1.5
Sun May 18 04:15:52 2008 UTC (16 years, 5 months ago) by wakaba
Branch: MAIN
Changes since 1.4: +7 -3 lines
++ whatpm/Whatpm/ChangeLog	18 May 2008 04:15:00 -0000
	* HTML.pm.src (parse_byte_string): Redefined to invoke
	|parse_byte_stream|.
	(parse_byte_stream): New method.

2008-05-18  Wakaba  <wakaba@suika.fam.cx>

++ whatpm/Whatpm/Charset/ChangeLog	18 May 2008 04:15:42 -0000
	* DecodeHandle.pm (ShiftJIS): \xF0-\xFC should be considered
	as part of double-byte characters for more user-friendly error
	recovery.

2008-05-18  Wakaba  <wakaba@suika.fam.cx>


Revision 1.4
Sun May 18 03:46:30 2008 UTC (16 years, 5 months ago) by wakaba
Branch: MAIN
Changes since 1.3: +10 -1 lines
++ whatpm/Whatpm/ChangeLog	18 May 2008 03:46:02 -0000
2008-05-18  Wakaba  <wakaba@suika.fam.cx>

	* HTML.pm.src (parse_byte_string): Fix the column number reported
	by the encoding layer's error reporter.

++ whatpm/Whatpm/Charset/ChangeLog	18 May 2008 03:46:17 -0000
2008-05-18  Wakaba  <wakaba@suika.fam.cx>

	* DecodeHandle.pm (getc): Ignore BOM if BOM pattern is specified.


Revision 1.3
Sat May 17 12:29:24 2008 UTC (16 years, 5 months ago) by wakaba
Branch: MAIN
Changes since 1.2: +40 -18 lines
++ whatpm/Whatpm/ChangeLog	17 May 2008 12:28:47 -0000
	* HTML.pm.src (parse_byte_string): Use a streaming decoder
	rather than converting the whole byte string and then parsing.
	Propagate errors in the character encoding layer.
	(get_next_token): Precise error reporting for the |bare stago| error.

2008-05-17  Wakaba  <wakaba@suika.fam.cx>

++ whatpm/Whatpm/Charset/ChangeLog	17 May 2008 12:29:09 -0000
2008-05-17  Wakaba  <wakaba@suika.fam.cx>

	* DecodeHandle.pm (ByteBuffer): New class.


Revision 1.2
Sun Jul 15 16:51:14 2007 UTC (17 years, 3 months ago) by wakaba
Branch: MAIN
Changes since 1.1: +138 -139 lines
++ whatpm/t/ChangeLog	15 Jul 2007 16:51:05 -0000
	* Charset-DecodeHandler.t: Cases for charset names have
	been changed to match the module's implementation.

2007-07-15  Wakaba  <wakaba@suika.fam.cx>

++ whatpm/Whatpm/Charset/ChangeLog	15 Jul 2007 16:49:53 -0000
	* DecodeHandle.pm (create_decode_handle): Set the canonical
	or specified name to the |input_encoding| attribute.
	(uri_to_name): Reimplemented.
	(input_encoding): Return the charset name returned
	by |uri_to_name| if available.
	($CharsetDef): Property |xml_name| now contains
	only names defined in XML specifications.

2007-07-15  Wakaba  <wakaba@suika.fam.cx>


Revision 1.1
Sun Jul 15 12:41:51 2007 UTC (17 years, 3 months ago) by wakaba
Branch: MAIN
++ whatpm/t/ChangeLog	15 Jul 2007 12:41:07 -0000
2007-07-15  Wakaba  <wakaba@suika.fam.cx>

	* Charset-DecodeHandler.t: New test script.

++ whatpm/Whatpm/ChangeLog	15 Jul 2007 08:22:21 -0000
	* Charset/: New directory.

2007-07-15  Wakaba  <wakaba@suika.fam.cx>

++ whatpm/Whatpm/Charset/ChangeLog	15 Jul 2007 08:22:53 -0000
2007-07-15  Wakaba  <wakaba@suika.fam.cx>

	* DecodeHandle.pm: New Perl module (created
	from manakai's |Encode.dis|).

2007-07-15  Wakaba  <wakaba@suika.fam.cx>

	* ChangeLog: New file.

