Re^3: Date::Manip and German months names (solved)

a proper solution would of course have to dynamically construct the correct character set depending on the language being selected.

A simpler solution might be

    foreach $from (keys %{ $Lang{$L}{"Repl"} }) {
      $to=$Lang{$L}{"Repl"}{$from};

      utf8::upgrade($from);  # Use Unicode semantics for \b
      s/\b$from\b/$to/i;
    }

He's already assuming $from doesn't contain symbols since he's not using quotemeta, so using \b doesn't introduce any limitations.

My solution will also make "MÄR" work, unlike the current implementation and your proposed solution.

Update: Shoot! \w includes digits, so \b won't do. There's a POSIX class that includes just letters that does the trick:

      utf8::upgrade($from);  # Use Unicode semantics
      s/(^|[^[:alpha:]])$from($|[^[:alpha:]])/$1$to$2/i;
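
To make the digit problem concrete, here is a minimal sketch comparing the two patterns on an ASCII-only input. The helper names are made up for illustration, and the "m" => "Monat" pair is simply the test entry used further down in this thread, not necessarily what Date::Manip ships.

```perl
use strict;
use warnings;

# Hypothetical helper: the original \b-based substitution.
sub with_word_boundary {
    my ($text, $from, $to) = @_;
    $text =~ s/\b$from\b/$to/i;
    return $text;
}

# Hypothetical helper: boundaries defined by "not a letter" instead of \b.
sub with_letter_boundary {
    my ($text, $from, $to) = @_;
    $text =~ s/(^|[^[:alpha:]])$from($|[^[:alpha:]])/$1$to$2/i;
    return $text;
}

# "3" and "m" are both \w characters, so there is no \b between them.
print with_word_boundary("3m", "m", "Monat"), "\n";    # prints "3m"

# "3" is not a letter, so the letter-based boundary does match.
print with_letter_boundary("3m", "m", "Monat"), "\n";  # prints "3Monat"
```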

Update: As discovered below, what needs to be upgraded is the string on which s/// acts.

      utf8::upgrade($_);  # Use Unicode semantics
      s/(^|[^[:alpha:]])$from($|[^[:alpha:]])/$1$to$2/i;
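
Put together as a standalone script, the idea might look like the sketch below. `apply_replacements` and `%repl` are hypothetical stand-ins for the loop over `$Lang{$L}{"Repl"}`, and the sketch assumes neither `use locale` nor the `unicode_strings` feature is in scope.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $Lang{$L}{"Repl"} in Date::Manip.
my %repl = ( "m" => "Monat" );

# Hypothetical helper: apply every replacement to a copy of the text.
sub apply_replacements {
    my ($text, $repl) = @_;

    # Upgrade the string that s/// acts on, so [[:alpha:]] and /i
    # follow Unicode rules for characters such as \xE4 (a-umlaut).
    utf8::upgrade($text);

    for my $from (keys %$repl) {
        my $to = $repl->{$from};
        $text =~ s/(^|[^[:alpha:]])$from($|[^[:alpha:]])/$1$to$2/i;
    }
    return $text;
}

# "M\xE4r" is the month abbreviation written with a Latin-1 code point.
my $month = apply_replacements("M\xE4r", \%repl);
my $unit  = apply_replacements("3 m",    \%repl);

print $month eq "M\xE4r" ? "month abbreviation left alone\n"
                         : "month abbreviation was mangled\n";
print "$unit\n";
```

The function works on a copy, so the caller's string is not upgraded as a side effect; the real loop operates on $_ directly.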


Replies are listed 'Best First'.
Re^4: Date::Manip and German months names (solved) by almut (Canon) on Jul 09, 2008 at 23:22 UTC
Yes, that looks like a good (simple) solution. Interestingly though

      s/(^|[^[:alpha:]])$from($|[^[:alpha:]])/$1$to$2/i;

only works for me when I `use locale` (which I may not necessarily want to do in this case), while

      s/(^|[^\p{IsAlpha}])$from($|[^\p{IsAlpha}])/$1$to$2/i;

does work without...

> My solution will also make "MÄR" work

...presuming other changes will be made as well, i.e. adding another list of month abbreviations to the definition of `$$d{"month_abb"}=...`
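
For comparison, a minimal sketch of the `\p{IsAlpha}` variant applied to a string that has deliberately not been upgraded. The helper name is made up for illustration, and the sketch assumes no `use locale` is in effect.

```perl
use strict;
use warnings;

# Hypothetical helper: letter boundaries expressed with a Unicode property.
sub replace_with_property_class {
    my ($text, $from, $to) = @_;
    $text =~ s/(^|[^\p{IsAlpha}])$from($|[^\p{IsAlpha}])/$1$to$2/i;
    return $text;
}

my $text = "M\xE4r";    # a-umlaut as a single Latin-1 code point
print utf8::is_utf8($text) ? "upgraded\n" : "not upgraded\n";

# \xE4 counts as a letter for \p{IsAlpha}, so the leading "M" is not
# treated as a standalone "m" and the string comes back unchanged.
my $result = replace_with_property_class($text, "m", "Monat");
print $result eq $text ? "unchanged\n" : "changed\n";
```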
Re^5: Date::Manip and German months names (solved) by ikegami (Patriarch) on Jul 10, 2008 at 00:15 UTC
Sounds like you forgot to use `utf8::upgrade($from);`.

> only works when I `use locale`

No, using unicode semantics is enough.

> presuming other changes will be made as well

No, using unicode semantics is enough.

    use HTML::Entities qw( decode_entities );
    use locale qw();

    my $lc = decode_entities('&auml;');
    my $uc = decode_entities('&Auml;');
    utf8::downgrade($uc);

    for (0..2) {
       if ($_ == 0) {
          utf8::downgrade($lc);
          locale->unimport();
          print("Byte Semantics\n");
          print("--------------\n");
       }
       elsif ($_ == 1) {
          utf8::downgrade($lc);
          locale->import();
          print("Locale Semantics\n");
          print("----------------\n");
       }
       elsif ($_ == 2) {
          utf8::upgrade($lc);
          locale->unimport();
          print("Unicode Semantics\n");
          print("-----------------\n");
       }

       if ($lc =~ /^\Q$uc\E\z/) {
          print("case sensitive match\n");
       }
       elsif ($lc =~ /^\Q$uc\E\z/i) {
          print("case insensitive match\n");
       }
       else {
          print("no match\n");
       }

       if ($lc =~ /^[[:alpha:]]\z/) {
          print("[:alpha:]\n");
       } else {
          print("Not [:alpha:]\n");
       }

       if ($lc =~ /^[\p{IsAlpha}]\z/) {
          print("\\p{IsAlpha}\n");
       } else {
          print("Not \\p{IsAlpha}\n");
       }

       print("\n");
    }

Output:

    Byte Semantics
    --------------
    no match
    Not [:alpha:]
    \p{IsAlpha}

    Locale Semantics
    ----------------
    no match
    Not [:alpha:]
    \p{IsAlpha}

    Unicode Semantics
    -----------------
    case insensitive match
    [:alpha:]
    \p{IsAlpha}
Re^6: Date::Manip and German months names (solved) by almut (Canon) on Jul 10, 2008 at 00:54 UTC
> Sounds like you forgot to use `utf8::upgrade($from);`

No, I tried this:

    #!/usr/bin/perl

    $Lang{$L}{"Repl"} = { "m" => "Monat" };  # mimic Date::Manip

    $_ = "Mär";
    print "before: $_\n";

    foreach $from (keys %{ $Lang{$L}{"Repl"} }) {
      $to=$Lang{$L}{"Repl"}{$from};

      utf8::upgrade($from);  # Use Unicode semantics
      s/(^|[^[:alpha:]])$from($|[^[:alpha:]])/$1$to$2/i;
    }

    print "after: $_\n";

which prints

    before: Mär
    after: Monatär

Using

      s/(^|[^\p{IsAlpha}])$from($|[^\p{IsAlpha}])/$1$to$2/i;

in place of the above substitution does work fine, though (as does adding `use locale` to the `[:alpha:]` version):

    before: Mär
    after: Mär

>> presuming other changes will be made as well
> No, using unicode semantics is enough.

I was referring to "MÄR" also working (in addition to "Mär") in the context of Date::Manip... which it won't unless that abbreviation is also being set up in the respective `$$d{"month_abb"}=...`
Re^7: Date::Manip and German months names (solved) by ikegami (Patriarch) on Jul 10, 2008 at 01:38 UTC