NAME

Char::Normalize::FullwidthHalfwidth - Fullwidth/halfwidth character normalization


SYNOPSIS

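  use Char::Normalize::FullwidthHalfwidth qw/normalize_width/;
  binmode STDIN, ':encoding(UTF-8)';   # assumes UTF-8 input; decode to characters
  binmode STDOUT, ':encoding(UTF-8)';  # encode characters again on output
  
  my $s = <STDIN>;
  normalize_width (\$s);
  print $s;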


DESCRIPTION

The Char::Normalize::FullwidthHalfwidth module provides a function that normalizes fullwidth/halfwidth compatibility characters into their canonical representations.


FUNCTION

This module provides one function, normalize_width. It can be imported into a package using the standard Exporter mechanism:

  use Char::Normalize::FullwidthHalfwidth qw/normalize_width/;

Note that the use statement exports nothing unless the function name is explicitly listed.

Alternatively, you can invoke the function in its fully qualified form as:

  require Char::Normalize::FullwidthHalfwidth;
  Char::Normalize::FullwidthHalfwidth::normalize_width (\$s);

normalize_width ($scalarref)

Normalizes the fullwidth/halfwidth characters in the scalar referenced by the argument into their preferred forms. The argument must be a scalar reference; the referenced scalar is modified in place. The scalar is treated as a character string (possibly with the utf8 flag set), not as a byte string. The function returns the same scalar reference.
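The calling convention (a scalar reference in, the same reference out, the referenced string modified in place) can be illustrated with a small self-contained stand-in. The function normalize_width_sketch below is hypothetical and is not the module's implementation: it handles only U+3000 and the range U+FF01..U+FF5E, which line up with U+0020 and U+0021..U+007E at a fixed offset of 0xFEE0.

```perl
use strict;
use warnings;

# Hypothetical stand-in for normalize_width, NOT the module's own code.
# It covers only U+3000 and U+FF01..U+FF5E, to show the calling convention.
sub normalize_width_sketch {
    my $ref = shift;
    die "normalize_width_sketch: argument must be a scalar reference\n"
        unless ref($ref) eq 'SCALAR';

    # U+3000 IDEOGRAPHIC SPACE -> U+0020 SPACE
    $$ref =~ tr/\x{3000}/ /;

    # U+FF01..U+FF5E -> U+0021..U+007E (fixed offset of 0xFEE0)
    $$ref =~ s/([\x{FF01}-\x{FF5E}])/chr(ord($1) - 0xFEE0)/ge;

    return $ref;    # the same reference that was passed in
}

# Fullwidth "Hello", an ideographic space, and fullwidth "123"
my $s = "\x{FF28}\x{FF45}\x{FF4C}\x{FF4C}\x{FF4F}\x{3000}\x{FF11}\x{FF12}\x{FF13}";
my $r = normalize_width_sketch(\$s);
print "$s\n";                                     # prints "Hello 123"
print $r == \$s ? "same reference\n" : "copy\n";  # prints "same reference"
```

Returning the reference rather than a new string lets a caller either keep using the original variable or dereference the return value directly, for example print ${ normalize_width_sketch(\$s) }.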

The function performs the following conversions:

U+3000 IDEOGRAPHIC SPACE (the so-called fullwidth space)

Replaced by U+0020 SPACE (the so-called halfwidth space).

Characters in the range U+FF01..U+FF5E (so-called fullwidth ASCII characters)

Replaced by the corresponding character in the range U+0021..U+007E (so-called halfwidth ASCII characters).

Characters in the range U+FF61..U+FF9F (halfwidth Katakana)

Replaced by the corresponding fullwidth Katakana character (or ideographic punctuation mark). Note that U+FF9E HALFWIDTH KATAKANA VOICED SOUND MARK and U+FF9F HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK are replaced by U+3099 COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK and U+309A COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK, respectively, not by their spacing variants (U+309B and U+309C).

Characters in the range U+FFE0..U+FFE6 (fullwidth symbols)

Replaced by the corresponding ordinary (non-fullwidth) character.
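The halfwidth Katakana and fullwidth symbol rules amount to a fixed table lookup. The sketch below is a hypothetical table-driven version (normalize_kana_symbols_sketch is not part of the module); its targets are assumed from the Unicode compatibility decomposition mappings of U+FF61..U+FF9F and U+FFE0..U+FFE6, and the module's own table may differ in detail. Because U+FF9E and U+FF9F become combining marks, an optional NFC pass with the core Unicode::Normalize module afterwards composes, for example, U+30AB U+3099 into U+30AC KATAKANA LETTER GA.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);

# Hypothetical table-driven sketch, NOT the module's own code. Targets are
# assumed from the Unicode compatibility decompositions of each source char.
my @kana_targets = (
    0x3002, 0x300C, 0x300D, 0x3001, 0x30FB,          # U+FF61..U+FF65 punctuation
    0x30F2, 0x30A1, 0x30A3, 0x30A5, 0x30A7, 0x30A9,  # U+FF66..U+FF6B wo, small a..o
    0x30E3, 0x30E5, 0x30E7, 0x30C3, 0x30FC,          # U+FF6C..U+FF70 small ya/yu/yo/tu, prolonged mark
    0x30A2, 0x30A4, 0x30A6, 0x30A8, 0x30AA,          # U+FF71..U+FF75 a i u e o
    0x30AB, 0x30AD, 0x30AF, 0x30B1, 0x30B3,          # U+FF76..U+FF7A ka..ko
    0x30B5, 0x30B7, 0x30B9, 0x30BB, 0x30BD,          # U+FF7B..U+FF7F sa..so
    0x30BF, 0x30C1, 0x30C4, 0x30C6, 0x30C8,          # U+FF80..U+FF84 ta..to
    0x30CA, 0x30CB, 0x30CC, 0x30CD, 0x30CE,          # U+FF85..U+FF89 na..no
    0x30CF, 0x30D2, 0x30D5, 0x30D8, 0x30DB,          # U+FF8A..U+FF8E ha..ho
    0x30DE, 0x30DF, 0x30E0, 0x30E1, 0x30E2,          # U+FF8F..U+FF93 ma..mo
    0x30E4, 0x30E6, 0x30E8,                          # U+FF94..U+FF96 ya yu yo
    0x30E9, 0x30EA, 0x30EB, 0x30EC, 0x30ED,          # U+FF97..U+FF9B ra..ro
    0x30EF, 0x30F3,                                  # U+FF9C..U+FF9D wa n
    0x3099, 0x309A,                                  # U+FF9E..U+FF9F -> combining marks
);

# U+FFE0..U+FFE6: cent, pound, not, macron, broken bar, yen, won
my @symbol_targets = (0x00A2, 0x00A3, 0x00AC, 0x00AF, 0x00A6, 0x00A5, 0x20A9);

# Build a character-to-character lookup table from the two code point lists.
my %map;
@map{ map { chr($_) } 0xFF61 .. 0xFF9F } = map { chr($_) } @kana_targets;
@map{ map { chr($_) } 0xFFE0 .. 0xFFE6 } = map { chr($_) } @symbol_targets;

sub normalize_kana_symbols_sketch {
    my $ref = shift;
    # Every character matched by this class has an entry in %map.
    $$ref =~ s/([\x{FF61}-\x{FF9F}\x{FFE0}-\x{FFE6}])/$map{$1}/g;
    return $ref;
}

# HALFWIDTH KATAKANA LETTER KA + HALFWIDTH VOICED SOUND MARK + FULLWIDTH YEN SIGN
my $s = "\x{FF76}\x{FF9E}\x{FFE5}";
normalize_kana_symbols_sketch(\$s);
# $s is now U+30AB U+3099 U+00A5 (base letter followed by a combining mark)

my $composed = NFC($s);      # optional: compose the base letter and mark
printf "%vX\n", $composed;   # prints 30AC.A5
```

Keeping the sound marks as combining characters means the result is a valid decomposed sequence; whether to compose it is left to the caller, who can apply NFC when precomposed Katakana is wanted.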


BUGS

Not all compatibility characters in the Halfwidth and Fullwidth Forms block of the Unicode Standard are currently supported; in particular, halfwidth Hangul letters are not converted to their fullwidth equivalents. A future version of this module is expected to address this issue by extending the conversion table.


AUTHOR

Wakaba <w@suika.fam.cx>.


HISTORY

This module was originally developed as part of SuikaWiki <http://suika.fam.cx/~wakaba/wiki/sw/n/SuikaWiki>.


LICENSE

Copyright 2008 Wakaba <w@suika.fam.cx>

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.