in reply to The Unicode Bug with Transliteration or Substitution

It could be :) Do you have sample data to play with?

Have you tried utf8::upgrade($string)? You could also try Unicode::Semantics.


Re^2: The Unicode Bug with Transliteration or Substitution
by choroba (Cardinal) on May 03, 2014 at 20:07 UTC
    You can use the Japanese Wikipedia Perl page. Perl 5.8.3 at work outputs different files for

    tr/ / /s; tr/\t/ /s;

    and

    s/ +/ /g; s/\t+/ /g;

    I tested with diff -w against the original, i.e. ignoring whitespace.

    utf8::upgrade didn't change anything, before or after the substitution/transliteration.
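
For anyone who wants to reproduce the comparison without downloading the Wikipedia page, here is a minimal sketch. The sample string and the helper names squeeze_tr and squeeze_s are made up for illustration; they are not from the original test. On a recent perl the two variants should agree for a character string like this one, so a "different" result would point at the old interpreter or at how the data was read in.

```perl
use strict;
use warnings;

# Hypothetical sample: a character string mixing ASCII, Japanese
# characters (written as escapes), and runs of spaces and tabs.
my $sample = "Perl  \x{306F}\t\t\x{8A00}\x{8A9E}   \x{3067}\x{3059}";

# Variant 1: transliteration with /s (squeeze repeated results).
sub squeeze_tr {
    my ($s) = @_;
    $s =~ tr/ / /s;     # collapse runs of spaces
    $s =~ tr/\t/ /s;    # turn runs of tabs into one space
    return $s;
}

# Variant 2: substitution with a quantifier.
sub squeeze_s {
    my ($s) = @_;
    $s =~ s/ +/ /g;     # collapse runs of spaces
    $s =~ s/\t+/ /g;    # turn runs of tabs into one space
    return $s;
}

my $via_tr = squeeze_tr($sample);
my $via_s  = squeeze_s($sample);

print $via_tr eq $via_s ? "same\n" : "different\n";
```

To test real data, read the file through an :encoding(UTF-8) layer and pass each line to both subs in the same way.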

    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ