in reply to Re^2: Date::Manip and German months names (solved)
in thread Date::Manip and German months names

A proper solution would, of course, have to dynamically construct the correct character set depending on the language being selected.

A simpler solution might be

foreach $from (keys %{ $Lang{$L}{"Repl"} }) {
    $to=$Lang{$L}{"Repl"}{$from};
    utf8::upgrade($from);  # Use Unicode semantics for \b
    s/\b$from\b/$to/i;
}

He's already assuming $from doesn't contain symbols, since he's not using quotemeta, so using \b doesn't introduce any new limitations.

My solution will also make "MÄR" work, unlike the current implementation and your proposed solution.

Update: Shoot! \w includes digits, so \b won't do. There's a POSIX class that includes just letters that does the trick:

utf8::upgrade($from);  # Use Unicode semantics
s/(^|[^[:alpha:]])$from($|[^[:alpha:]])/$1$to$2/i;
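
To illustrate the digit problem, here's a minimal sketch comparing the two patterns on a made-up input ('3m'). The helper names replace_with_b and replace_with_alpha are mine, purely for illustration; they aren't part of Date::Manip.

```perl
use strict;
use warnings;

# Hypothetical helper: the original \b-based substitution.
sub replace_with_b {
    my ($text, $from, $to) = @_;
    $text =~ s/\b$from\b/$to/i;
    return $text;
}

# Hypothetical helper: the letters-only boundary using [:alpha:].
sub replace_with_alpha {
    my ($text, $from, $to) = @_;
    $text =~ s/(^|[^[:alpha:]])$from($|[^[:alpha:]])/$1$to$2/i;
    return $text;
}

# "3" and "m" are both \w, so there is no \b between them.
print replace_with_b('3m', 'm', 'Monat'), "\n";      # prints "3m"

# "3" is not a letter, so it counts as a boundary here.
print replace_with_alpha('3m', 'm', 'Monat'), "\n";  # prints "3Monat"
```

The input is plain ASCII on purpose, so this only shows the digit issue and doesn't depend on byte vs. Unicode semantics.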

Update: As discovered below, what needs to be upgraded is the string on which s/// acts.

utf8::upgrade($_);  # Use Unicode semantics
s/(^|[^[:alpha:]])$from($|[^[:alpha:]])/$1$to$2/i;
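
Here's a small sketch of why it's the target that has to be upgraded. It assumes a Perl where the unicode_strings feature isn't enabled, so a byte string gets byte semantics in the regex; replace_month and the sample input are hypothetical, just for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: run the substitution on a copy of $text,
# optionally upgrading that copy first.
sub replace_month {
    my ($text, $from, $to, $upgrade) = @_;
    utf8::upgrade($text) if $upgrade;
    $text =~ s/(^|[^[:alpha:]])$from($|[^[:alpha:]])/$1$to$2/i;
    return $text;
}

my $input = "M\xE4r";     # "Mär" as Latin-1 characters
utf8::downgrade($input);  # make sure it starts out as a byte string

for my $upgrade (0, 1) {
    my $result = replace_month($input, 'm', 'Monat', $upgrade);
    # eq compares characters, so the internal encoding doesn't matter here.
    printf "upgrade=%d: %s\n", $upgrade,
        $result eq $input ? 'unchanged' : 'replaced';
}
```

Under those assumptions I'd expect the non-upgraded run to report "replaced" (the ä isn't seen as a letter, so "M" looks like a standalone "m") and the upgraded run to report "unchanged".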

Re^4: Date::Manip and German months names (solved)
by almut (Canon) on Jul 09, 2008 at 23:22 UTC

    Yes, that looks like a good (simple) solution.  Interestingly though

    s/(^|[^[:alpha:]])$from($|[^[:alpha:]])/$1$to$2/i;

    only works for me when I use locale (which I may not necessarily want to do in this case), while

    s/(^|[^\p{IsAlpha}])$from($|[^\p{IsAlpha}])/$1$to$2/i;

    does work without...

    My solution will also make "MÄR" work

    ...presuming other changes will be made as well — i.e. adding another list of month abbreviations to the definition of $$d{"month_abb"}=...

      Sounds like you forgot to use utf8::upgrade($from);.

      only works when I use locale

      No, using unicode semantics is enough.

      presuming other changes will be made as well

      No, using unicode semantics is enough.

      use HTML::Entities qw( decode_entities );
      use locale qw();

      my $lc = decode_entities('&auml;');
      my $uc = decode_entities('&Auml;');
      utf8::downgrade($uc);

      for (0..2) {
          if ($_ == 0) {
              utf8::downgrade($lc);
              locale->unimport();
              print("Byte Semantics\n");
              print("--------------\n");
          } elsif ($_ == 1) {
              utf8::downgrade($lc);
              locale->import();
              print("Locale Semantics\n");
              print("----------------\n");
          } elsif ($_ == 2) {
              utf8::upgrade($lc);
              locale->unimport();
              print("Unicode Semantics\n");
              print("-----------------\n");
          }

          if ($lc =~ /^\Q$uc\E\z/) {
              print("case sensitive match\n");
          } elsif ($lc =~ /^\Q$uc\E\z/i) {
              print("case insensitive match\n");
          } else {
              print("no match\n");
          }

          if ($lc =~ /^[[:alpha:]]\z/) {
              print("[:alpha:]\n");
          } else {
              print("Not [:alpha:]\n");
          }

          if ($lc =~ /^[\p{IsAlpha}]\z/) {
              print("\\p{IsAlpha}\n");
          } else {
              print("Not \\p{IsAlpha}\n");
          }

          print("\n");
      }

      Byte Semantics
      --------------
      no match
      Not [:alpha:]
      \p{IsAlpha}

      Locale Semantics
      ----------------
      no match
      Not [:alpha:]
      \p{IsAlpha}

      Unicode Semantics
      -----------------
      case insensitive match
      [:alpha:]
      \p{IsAlpha}

        Sounds like you forgot to use utf8::upgrade($from);

        No, I tried this:

        #!/usr/bin/perl

        $Lang{$L}{"Repl"} = { "m" => "Monat" };  # mimic Date::Manip

        $_ = "Mär";
        print "before: $_\n";

        foreach $from (keys %{ $Lang{$L}{"Repl"} }) {
            $to=$Lang{$L}{"Repl"}{$from};
            utf8::upgrade($from);  # Use Unicode semantics
            s/(^|[^[:alpha:]])$from($|[^[:alpha:]])/$1$to$2/i;
        }

        print "after: $_\n";

        which prints

        before: Mär
        after: Monatär

        Using

        s/(^|[^\p{IsAlpha}])$from($|[^\p{IsAlpha}])/$1$to$2/i;

        in place of the above substitution does work fine, though (as does adding use locale to the [:alpha:] version):

        before: Mär
        after: Mär

        presuming other changes will be made as well
        No, using unicode semantics is enough.

        I was referring to "MÄR" also working (in addition to "Mär") in the context of Date::Manip...  which it won't unless that abbreviation is also being set up in the respective $$d{"month_abb"}=...