in reply to Re^4: Date::Manip and German months names (solved)
in thread Date::Manip and German months names

Sounds like you forgot to use utf8::upgrade($from);.

only works when I use locale

No, using unicode semantics is enough.

presuming other changes will be made as well

No, using unicode semantics is enough.

use HTML::Entities qw( decode_entities );
use locale qw();

# The entities give one-character strings regardless of the file's encoding.
my $lc = decode_entities('&auml;');
my $uc = decode_entities('&Auml;');
utf8::downgrade($uc);

for (0..2) {
    if ($_ == 0) {
        utf8::downgrade($lc);
        locale->unimport();
        print("Byte Semantics\n");
        print("--------------\n");
    } elsif ($_ == 1) {
        utf8::downgrade($lc);
        locale->import();
        print("Locale Semantics\n");
        print("----------------\n");
    } elsif ($_ == 2) {
        utf8::upgrade($lc);
        locale->unimport();
        print("Unicode Semantics\n");
        print("-----------------\n");
    }

    if ($lc =~ /^\Q$uc\E\z/) {
        print("case sensitive match\n");
    } elsif ($lc =~ /^\Q$uc\E\z/i) {
        print("case insensitive match\n");
    } else {
        print("no match\n");
    }

    if ($lc =~ /^[[:alpha:]]\z/) {
        print("[:alpha:]\n");
    } else {
        print("Not [:alpha:]\n");
    }

    if ($lc =~ /^[\p{IsAlpha}]\z/) {
        print("\\p{IsAlpha}\n");
    } else {
        print("Not \\p{IsAlpha}\n");
    }

    print("\n");
}

Byte Semantics
--------------
no match
Not [:alpha:]
\p{IsAlpha}

Locale Semantics
----------------
no match
Not [:alpha:]
\p{IsAlpha}

Unicode Semantics
-----------------
case insensitive match
[:alpha:]
\p{IsAlpha}
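
The same contrast can be reduced to a smaller check. This is only a sketch: the helper names posix_alpha and prop_alpha are made up for the example, and the character is written as "\xE4" so the result does not depend on the encoding of the source file. It assumes Perl's default regex semantics (no use locale, no use feature 'unicode_strings', no /u or /a modifier); under other settings the [:alpha:] column may differ.

```perl
use strict;
use warnings;

# Hypothetical helpers, for illustration only.
sub posix_alpha { return $_[0] =~ /^[[:alpha:]]\z/ ? 1 : 0 }
sub prop_alpha  { return $_[0] =~ /^\p{IsAlpha}\z/ ? 1 : 0 }

my $down = "\xE4";       # "ä" as a single code point
utf8::downgrade($down);  # stored as one byte: byte semantics apply

my $up = "\xE4";
utf8::upgrade($up);      # stored as UTF-8: Unicode semantics apply

# Report how each class treats the same character in both storage forms.
for my $case ( [ downgraded => $down ], [ upgraded => $up ] ) {
    my ($label, $str) = @$case;
    printf "%-10s [:alpha:]=%d \\p{IsAlpha}=%d\n",
        $label, posix_alpha($str), prop_alpha($str);
}
```

The point of the comparison is that \p{IsAlpha} looks at the code point either way, while [:alpha:] depends on how the string happens to be stored.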

Re^6: Date::Manip and German months names (solved)
by almut (Canon) on Jul 10, 2008 at 00:54 UTC
    Sounds like you forgot to use utf8::upgrade($from);

    No, I tried this:

    #!/usr/bin/perl

    $Lang{$L}{"Repl"} = { "m" => "Monat" };  # mimic Date::Manip

    $_ = "Mär";
    print "before: $_\n";

    foreach $from (keys %{ $Lang{$L}{"Repl"} }) {
        $to=$Lang{$L}{"Repl"}{$from};
        utf8::upgrade($from);  # Use Unicode semantics
        s/(^|[^[:alpha:]])$from($|[^[:alpha:]])/$1$to$2/i;
    }

    print "after: $_\n";

    which prints

    before: Mär
    after: Monatär

    Using

    s/(^|[^\p{IsAlpha}])$from($|[^\p{IsAlpha}])/$1$to$2/i;

    in place of the above substitution does work fine, though (as does adding use locale to the [:alpha:] version):

    before: Mär
    after: Mär

    presuming other changes will be made as well
    No, using unicode semantics is enough.

    I was referring to "MÄR" also working (in addition to "Mär") in the context of Date::Manip...  which it won't unless that abbreviation is also being set up in the respective $$d{"month_abb"}=...

      Interesting. I would have thought that promoting the regexp would have been enough. Anyway, the fix is to promote the string on which s/// acts. Change
      utf8::upgrade($from);  # Use Unicode semantics
      to
      utf8::upgrade($_);  # Use Unicode semantics
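
Put together, the change might look like the sketch below. It is not Date::Manip's code: replace_word is a hypothetical helper written for this example, \Q...\E is added so that $from is treated literally, and the input is spelled "M\xE4r" so that the result does not depend on the encoding of the source file.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Date::Manip): replace $from with $to
# only where $from is not adjacent to another letter.
sub replace_word {
    my ($text, $from, $to) = @_;
    utf8::upgrade($text);  # promote the target string: Unicode semantics
    $text =~ s/(^|[^[:alpha:]])\Q$from\E($|[^[:alpha:]])/$1$to$2/i;
    return $text;
}

my $in  = "M\xE4r";                         # "Mär"
my $out = replace_word($in, "m", "Monat");

# With the target promoted, "ä" counts as a letter, so the "M" in "Mär"
# is not a standalone word and should be left alone.
print $out eq $in ? "unchanged\n" : "changed\n";
```

A standalone "m", as in "1 m 2", is still replaced, since the characters around it are not letters.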