2008-10-04 Wakaba
* HTML.pm.src: Support for new flow (but not phrasing) elements (HTML5
revisions 1731 and 1778). Support for the end tag (HTML5
revision 1731).
2008-10-04 Wakaba
* HTML.pm.src: Support for |command| and |eventsource| elements (HTML5
revision 1731). End tags of |option| and |optgroup| elements are
now optional (HTML5 revision 1731).
2008-10-04 Wakaba
* HTML.pm.src: New "special" elements added to the list (HTML5
revision 1778). "strile" -> "strike".
2008-10-02 Wakaba
* ContentType.pm (get_sniffed_type): Support for the "better"
content sniffing (HTML5 revision 1927). In some cases the official
type was not returned when the method was invoked in list
context.
2008-09-22 Wakaba
* HTML.pm.src: Character references for non-space C0 characters,
including U+000B VT, DEL character, noncharacter code points, are
now converted to the U+FFFD character (cf. HTML5 revision 2138).
2008-09-21 Wakaba
* ContentChecker.pm: |form=""| check support added.
2008-09-21 Wakaba
* ContentChecker.pm: |contextmenu| validity is now checked using
|id| and |id_type| properties, and |menu| property is removed.
2008-09-21 Wakaba
* ContentChecker.pm: Prepare for |form| |name| attribute's
duplication checking.
2008-09-21 Wakaba
* HTML.pm.src (parse_byte_stream): Support (or non-support) for
unsupported charset="" parameter value (HTML5 revision 2131).
2008-09-20 Wakaba
* HTML.pm.src: Remaining places where U+000B was allowed as a space
character are fixed (cf. HTML5 revision 1738).
* ContentChecker.pm, HTMLTable.pm: U+000B is no longer part of
space characters (HTML5 revision 1738).
2008-09-20 Wakaba
* HTML.pm.src: The "anything else" case for the "after after body"
insertion mode was not updated to switch to the "in body"
insertion mode. U+000B is no longer a space character for the
purpose of tree construction phase (HTML5 revision 1738).
2008-09-20 Wakaba
* HTML.pm.src: U+000B is no longer a space character (HTML5
revision 1738).
2008-09-20 Wakaba
* ContentType.pm: 0x0B is no longer a space character (HTML5
revision 1738).
* HTML.pm.src: U+000B is no longer a space character for the
algorithm for extracting an encoding from a Content-Type (HTML5
revision 1738).
2008-09-20 Wakaba
* ContentChecker.pm ($IsInHTMLInteractiveContent): New.
2008-09-18 Wakaba
* LangTag.pm: Add checks for remaining requirements from RFC 4646.
* mklangreg.pl: Sort 'Prefix' values by their length, to ease
matching.
2008-09-18 Wakaba
* LangTag.pm: Warn for private use language subtags. Error level
typos fixed. Support for Suppress-Script field.
* mklangreg.pl: Support for dumping of nested structure.
2008-09-18 Wakaba
* LangTag.pm (check_rfc4646_langtag): Check if a tag is in the
recommended case as per RFC 4646.
2008-09-18 Wakaba
* LangTag.pm (check_rfc4646_langtag): New method.
2008-09-18 Wakaba
* mklangreg.pl: New script.
* Makefile: Updated for creation of the module for language subtag
registry.
2008-09-16 Wakaba
* Makefile: WebIDL.html added.
* WebIDL.pod: New documentation.
2008-09-16 Wakaba
* WebIDL.pm: Checker's error types are redefined.
2008-09-16 Wakaba
* WebIDL.pm: Parser's error types are redefined. Some forward
compatible parsing bugs are fixed. Some unreachable code is
commented out.
2008-09-16 Wakaba
* WebIDL.pm: Support for the remaining extended attributes is
added. It does not satisfy the definition that a forward
interface declaration has an extended attribute. It seems that
unless explicitly allowed, multiple extended attributes with the
same name are not allowed, though it is not explicitly mentioned in
the spec.
2008-09-16 Wakaba
* WebIDL.pm: Unescapes extended attribute names and extended
attribute identifiers. Preserve whether an extended attribute has
an argument list or not. Support for extended attributes:
Constructor, ExceptionConsts, IndexGetter, IndexSetter,
NameGetter, NameSetter, and Null.
(has_argument_list): New attribute.
(idl_text): Stringifies argument lists, if any, even if it is
empty.
2008-09-15 Wakaba
* HTML.pm.src: New state |PCDATA_STATE|. Use an empty string for
|{s_kwd}| in DATA_STATE as default.
2008-09-15 Wakaba
* HTML.pm.src, mkhtmlparser.pl: Replace |{prev_char}|
by |{s_kwd}| in DATA_STATE.
2008-09-15 Wakaba
* HTML.pm.src: Shorten keys.
2008-09-15 Wakaba
* HTML.pm.src: Remove checking for control character, surrogate
pair, or noncharacter code points and non-Unicode code
points (they should be handled by Whatpm::Charset::UnicodeChecker).
(parse_char_stream): Support for the |$get_wrapper| argument and
character stream error handlers.
2008-09-15 Wakaba
* ContentChecker.pm: Don't call |load_ns_module|
for null-namespace elements/attributes.
* HTML.pm.src: Factor out $disallowed_control_chars
as a hash.
2008-09-14 Wakaba
* HTML.pm.src: Regexp typo fixed. |{prev_char}|
and |{next_char}| initializations are moved to initialization
method. |{read_until}| now supports buffering. Sync |set_inner_html|
with |parse_char_stream|.
2008-09-14 Wakaba
* HTML.pm.src (parse_char_stream): Make |set_next_char|
invoke |manakai_read_until|, not only |read|, where
possible, to decrease the number of |read| method calls.
* mkhtmlparser.pl: Related changes to the aforementioned
modification.
2008-09-14 Wakaba
* HTML.pm.src: Use |read| instead of |getc|. |set_inner_html|
now reports character errors.
2008-09-14 Wakaba
* HTML.pm.src: Character tokens consisting of leading white space
followed by non-white-space characters were not handled
correctly in the "before head" insertion mode.
(set_inner_html): Reimplemented using CharString decodehandle
class. Support for $get_wrapper argument. Support
for |{read_until}| feature.
2008-09-14 Wakaba
* HTML.pm.src: Make a "bare ero" error for unknown
entities point to the "&" character.
2008-09-14 Wakaba
* HTML.pm.src: It turns out that U+FFFD doesn't have to
be added to the list of excluded characters.
2008-09-14 Wakaba
* HTML.pm.src ($char_onerror): Give the character decoder's |line|
and |column| higher priority than those set by the
tokenizer's input handler.
($self->{read_until}): Exclude U+FFFD (but this might
not be necessary, since now we do line/column fixup in
the character decode handle).
2008-09-14 Wakaba
* HTML.pm.src: Use |{read_until}| where possible.
2008-09-14 Wakaba
* HTML.pm.src: Change |{getc_until}| to |{read_until}|
and |manakai_getc_until| to |manakai_read_until| to
reduce the number of string copies.
2008-09-14 Wakaba
* HTML.pm.src (parse_char_string): Use newly created
|Whatpm::Charset::DecodeHandle::CharString| instead of Perl's
standard feature to |open| a string as a filehandle,
since Perl's string filehandle does not seem to support the |ungetc|
method correctly.
(parse_char_stream): Define |{getc_until}| method.
(DATA_STATE): Experimental support for |getc_until| feature.
2008-09-13 Wakaba
* HTML.pm.src: Check points added to newly added branches.
2008-09-13 Wakaba
* HTML.pm.src: Remove |{char}|, which is no longer used.
|{entity_in_attr}| and |{last_attribute_value_state}| are removed
and replaced by |{prev_state}|.
* mkhtmlparser.pl: Remove |{char}| feature.
Remove |!!!back-next-input-character;| macro.
2008-09-13 Wakaba
* HTML.pm.src: Finally we get rid of all the inner loops. Remove
entity related tokenizer states in favor of new states
implementing the consume character reference algorithm.
2008-09-13 Wakaba
* HTML.pm.src: "Consume a character reference" algorithm is
now implemented as a tokenizer's state, rather than
a method, with minimum changes (more changes will
be made, in due course). "Bogus comment state"'s inner
loop gets removed.
2008-09-13 Wakaba
* HTML.pm.src: Make |PUBLIC| and |SYSTEM| keyword tokenizing
into their own tokenizer states.
2008-09-13 Wakaba
* HTML.pm.src: |CDATA_SECTION_STATE| (formerly |CDATA_BLOCK_STATE|)
is split into three states.
2008-09-13 Wakaba
* HTML.pm.src: |CLOSE_TAG_OPEN_STATE| is broken into
itself and new |CDATA_PCDATA_CLOSE_TAG_STATE| so that
no longer does the tokenizer have to push back next input
characters in those states.
2008-09-13 Wakaba
* HTML.pm.src: |MARKUP_DECLARATION_OPEN_STATE| broken
into four states so that no longer does the tokenizer have to push
back next input characters in that state.
2008-09-11 Wakaba
* HTML.pm.src: Methods now accept additional parameter, $get_wrapper,
which can be used to insert some wrapper between the character
stream handle and the tokenizer. (It is currently not supported
for |set_inner_html| for |Element|s).
2008-09-10 Wakaba
* HTML.pm.src: Ignore punctuation in charset names.
2008-09-10 Wakaba
* ContentChecker.pm: Support for charset-layer error levels.
* HTML.pm.src: Don't specify |text| argument for the
|chardecode:fallback| error, since it is not the encoding
being used alternatively.
2008-09-06 Wakaba
* HTML.pm.src: Support for |XSLT-compat| (HTML5 revision 2141).
2008-08-31 Wakaba
* CacheManifest.pm: Support for extensibility (HTML5 revision 2051).
2008-08-31 Wakaba
* HTML.pm.src: Bug fix and sync with the spec with regard
to after after frameset insertion mode processing (HTML5
revision 1909). Note that the implementation was wrong
per the old spec before the r1909 changes.
2008-08-30 Wakaba
* HTMLTable.pm: scope=auto algorithm fix synced with the
spec (HTML5 revision 2093).
($process_row): Algorithm step numbers synced with the
spec (HTML5 revision 2092).
2008-08-30 Wakaba
* HTMLTable.pm: Zs is not what we want; we want White_Space! (HTML5
revision 2094).
2008-08-30 Wakaba
* ContentType.pm: Support for image/svg+xml (HTML5 revision 2096).
2008-08-30 Wakaba
* HTML.pm.src: '"' and "'" at the end of attribute
name (after another attribute) now raise parse error (HTML5
revision 2123). Empty unquoted attribute values are no
longer allowed (HTML5 revision 2122).
2008-08-30 Wakaba
* mkhtmlparser.pl: Support for MathML |definitionURL| attribute (HTML5
revision 2130).
2008-08-30 Wakaba
* ContentChecker.pm: |xml:lang| attribute value must be same
as |lang| attribute value for HTML elements (HTML5 revision 2062
and so on).
2008-08-30 Wakaba
* ContentChecker.pm: Error level definition for |xml_id_error|
was missing.
* URIChecker.pm: The end of the URL should be marked as the
error location for an empty path error. The position
between the userinfo and the port components should be
marked as the error location for an empty host error.
2008-08-30 Wakaba
* URIChecker.pm: Set parameters representing where in the
value the error occurs for errors. Report unknown
address format error in warning level, since address
formats are rarely added. Path segments starting with "/.."
were misinterpreted as a dot-segment.
2008-08-30 Wakaba
* URIChecker.pm (check_iri_reference): Requires
|Message::DOM::DOMImplementation|.
2008-08-29 Wakaba
* IMTChecker.pm: Updated for the new error reporting architecture.
* ContentChecker.pm: Error levels for IMTs are added.
2008-08-17 Wakaba
* H2H.pm (_shift_token): Support for unquoted HTML attribute
values.
2008-08-16 Wakaba
* CacheManifest.pm: Support for new style of error
reports.
* HTML.pm.src: Set line=1, column=1 to the document node.
2008-08-16 Wakaba
* ContentChecker.pm, RDFXML.pm: Pass {level} object to language tag
and URL checkers. Support for more error levels for bogus
language tag and URL "standards".
* LangTag.pm, URIChecker.pm: Support for new style error
level reporting.
2008-08-15 Wakaba
* ContentChecker.pm: Support for RDF/XML error levels.
* HTMLTable.pm, RDFXML.pm: Support for new style of error level
specifying. Error types are revised.
2008-08-15 Wakaba
* ContentChecker.pm: All error reporting method calls are
renewed.
2008-08-15 Wakaba
* HTML.pm.src: All error type names and "text" parameters
are revised. Use new style for "level" specification.
* mkhtmlparser.pl: Use new style for "level" specification.
2008-08-03 Wakaba
* WebIDL.pm (parse_char_string): Simplified error
reporting process for broken ignored valuetype definition.
(Valuetype idl_text): Support for special "DOMString" name.
2008-08-03 Wakaba
* WebIDL.pm ($get_scoped_name): Append "::::" if the last
terminal of the ScopedName is "DOMString", so that whether
the last part of the scoped name is "DOMString" or "_DOMString"
can be determined later. It is necessary to determine whether
a |typedef| definition should be ignored or not.
(parse_char_string): Unescape the identifier of
exception members.
($resolve): Return undef for builtin types and sequence
types (we might not have to do this, however...).
(check): Support checking for Exceptions, Valuetypes,
and Typedefs.
($serialize_type): Support for "DOMString::::" syntax.
(Typedef idl_text): Output Type as "DOMString" if it
is really "DOMString" (i.e. its internal representation
is "::DOMString::").
2008-08-03 Wakaba
* WebIDL.pm ($resolve): New code, based on resolve code
for constant types in the |check| method.
(check): Support for checking of attributes, operations, and
arguments.
(Attribute/Operation idl_text): Exception names in getraises,
setraises, and raises clauses are serialized by |$serialize_type|
code.
2008-08-02 Wakaba
* WebIDL.pm ($integer): Order of alternatives is changed to match
hexadecimal numbers (the original pattern, taken from the spec,
did not work for hexadecimal numbers, because the "0" prefix
matches to the [0-7]* part (as an empty string) and therefore
it does not match the remaining "x..." part of a "0x..." integer
literal).
($get_type): It now returns a string, not an array reference,
for regular types and |sequence| types (i.e. it in any case
returns a string).
($get_next_token): The second item in the array that represents
an integer or float token is now a Perl number value, not the
original string representation of the number.
(check): Support for const value consistency checking.
No extended attribute is defined for constants.
(Node subclasses): Use simple strings rather than array references
for default data type values.
($serialize_type): Type values are now simple strings.
(value): If the new attribute value is a false value, then
a FALSE value is set to the attribute.
2008-08-02 Wakaba
* WebIDL.pm ($get_scoped_name): Now scoped names are stored
in their stringified format ("scoped name" as defined in the
spec). Note that a future version of this module should not use
array references for type values and the |type_text| attribute
should be made obsolete.
(parse_char_string): Unescape attribute names.
(check): Support for checking of whether inherited interfaces
are actually defined or not. Support for checking of whether
interface member identifiers are duplicated or not.
($serialize_type): Scoped names are returned as is. A future
version of this code should escape identifiers other than "DOMString",
otherwise the idl_text would be non-conforming.
2008-08-02 Wakaba
* WebIDL.pm (parse_char_string): Set line/column numbers
to generated nodes. Unescape identifiers. Extended attributes
for Definition's were ignored.
(append_child): Set |parent_node| attribute.
(parent_node): New attribute.
(check): Support interface/exception members. Support
extended attributes. Support definition identifier uniqueness
constraint.
(qualified_name): New attribute.
(Interface/Exception idl_text): Extended attributes were
not prepended to the returned text.
2008-08-02 Wakaba
* WebIDL.pm (parse_char_string): Set line/column numbers
to interface object experimentally. s/shift/pop/g, shift
would make things wrong. Support for interface forward
declarations was missing. Broken interface declarations
with no block were not ignored entirely.
(Whatpm::WebIDL::Node): New abstract class. This class
makes things easier.
(child_nodes): New attribute. Unlike DOM's attribute with
the same name, this attribute returns a dead list of nodes for
simplicity.
(get_user_data, set_user_data): New methods.
(Module idl_text): A SPACE character should be inserted
before the |{| character.
(Interface idl_text): Support for interface forward declarations.
(is_forward_declaration): New attribute.
2008-07-19 Wakaba
* WebIDL.pm (type_text): Better serializer.
2008-07-19 Wakaba
* WebIDL.pm: Revise forward-compatible parsing so that
it can now handle broken extended attributes and the like.
2008-07-19 Wakaba
* WebIDL.pm: Real support for extended attributes.
Support for extended attributes with arguments.
2008-07-19 Wakaba
* WebIDL.pm: Support for |exception| syntax.
(Interface->idl_text): Tentative support for inheritances.
2008-07-19 Wakaba
* WebIDL.pm: Hierarchical scoped name support was broken.
Support for raises, setraises, and getraises syntaxes.
2008-07-18 Wakaba
* WebIDL.pm: Support for |idl_text| attribute, version 1 (no
proper support for types, extended attributes, and exceptions yet).
WebIDL parser, version 1 (no support for exceptions yet,
no proper support for extended attributes yet).
2008-07-09 Wakaba
* WebIDL.pm (parse_char_string): Support for basic attribute syntax.
2008-06-29 Wakaba
* WebIDL.pm: Support for valuetype and const.
2008-06-29 Wakaba
* WebIDL.pm: New module.
2008-06-15 Wakaba
* Makefile (Entities.html): URI changed.
2008-06-08 Wakaba
* HTML.pm.src: Support for ruby parsing (HTML5 revision 1704).
2008-06-01 Wakaba
* HTML.pm.src (_get_next_token): A parse error was missing.
2008-06-01 Wakaba
* mklinktypelist.pl: rel=contact is no longer part of the HTML5
spec (commented out). (HTML5 revision 1711).
2008-05-25 Wakaba
* ContentType.pm: Drop support for UTF-32 (HTML5 revision 1701).
* HTML.pm.src: UTF-16BE and UTF-16LE should be considered
as UTF-16 (HTML5 revision 1701).
2008-05-25 Wakaba
* HTML.pm.src: Support for in (HTML5 revision
1692).
2008-05-25 Wakaba
* HTML.pm.src: The secondary insertion mode used when switching
to foreign content is the "in body" insertion mode (HTML5 revision
1696).
2008-05-25 Wakaba
* HTML.pm.src: Don't raise parse error for (HTML5
revision 1697).
2008-05-24 Wakaba
* HTML.pm.src: Support for end-of-file token in foreign content
insertion mode (HTML5 revision 1693). Update SVG camelCase
attribute list (HTML5 revision 1700).