Heidegger has asked for the wisdom of the Perl Monks concerning the following question:

I need to uppercase Lithuanian strings. I couldn't make the setlocale() function work under Windows, so I have to write my own uc() function. So far my regular expression is very simple and dumb:
s/š/Š/; s/ž/Ž/; s/ą/Ą/; s/ę/Ę/;

Is there a way to make it nicer? Unfortunately, I couldn't think of anything better.

Replies are listed 'Best First'.
Re: Regular expression question
by MarkM (Curate) on Jan 16, 2003 at 10:21 UTC

    Assuming the translation is always one character for one character, the 'nice' way is to use the "transliteration" operator:

    tr/abc/ABC/;
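    Applied to the Lithuanian alphabet, that could look roughly like the sketch below. It assumes the script is saved as UTF-8 and that the strings are already decoded Perl character strings; the helper name lt_uc is made up for illustration.

```perl
use strict;
use warnings;
use utf8;    # assumption: this source file is saved as UTF-8

# Hypothetical helper: map a-z plus the nine Lithuanian diacritic letters
# to their uppercase partners, one character for one character.
sub lt_uc {
    my ($s) = @_;
    $s =~ tr/a-ząčęėįšųūž/A-ZĄČĘĖĮŠŲŪŽ/;
    return $s;
}

binmode STDOUT, ':encoding(UTF-8)';
print lt_uc('ąžuolas šalia'), "\n";
```

    Characters that are not listed, such as digits and spaces, pass through tr/// unchanged.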

    You may want to look at the work being done on Unicode support in Perl. I'm positive that either somebody has already done what you are looking for, or that somebody would appreciate your efforts as a contribution.
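    As a rough sketch of that route, assuming a Perl that ships the Encode module (5.8 or later) and assuming the data is stored as Windows Baltic (cp1257) bytes: decode the bytes into characters, let the built-in uc() do the work, and encode the result again. The helper name uc_bytes and the default encoding are illustrative guesses, not something the original poster confirmed.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: uppercase a byte string held in a legacy encoding.
# The default 'cp1257' is an assumption; pass whatever the data really uses.
sub uc_bytes {
    my ($bytes, $encoding) = @_;
    $encoding = 'cp1257' unless defined $encoding;
    my $chars = decode($encoding, $bytes);   # bytes -> Perl characters
    return encode($encoding, uc $chars);     # uppercase, then back to bytes
}
```

    This sidesteps setlocale() entirely, because uc() applies Unicode case rules once the string has been decoded.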

    Cheers!

Re: Regular expression question
by Hofmator (Curate) on Jan 16, 2003 at 12:37 UTC
    An alternative to the tr/// suggested by MarkM might be a hash lookup table, something like the following. This would also let you replace one character with multiple characters if you need that.
    my %hash = ( a => 'A', b => 'B', ... ); s/(.)/$hash{$1}/eg;
    I would suggest benchmarking to find out what works best for you, though (if you have a one-to-one correspondence between characters, then tr/// should be quicker than my suggestion here).

    -- Hofmator

      s/(.)/$hash{$1}/eg
      will end up removing every character in the string that is not specifically listed in the hash, because $hash{$1} is undefined for those characters and is substituted as an empty string.
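      One possible way around that, as a minimal sketch: fall back to the original character whenever the table has no entry for it. The sub name map_chars and the sample table are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: replace each character via a lookup table,
# keeping any character the table does not mention.
sub map_chars {
    my ($s, $map) = @_;
    # /s lets "." match newlines too; /e evaluates the replacement as code.
    $s =~ s/(.)/exists $map->{$1} ? $map->{$1} : $1/seg;
    return $s;
}

# Illustrative table only; the 'x' entry shows a one-to-many replacement.
my %table = ( a => 'A', b => 'B', x => 'XX' );
print map_chars('123 abc x', \%table), "\n";
```

      The exists test also avoids the "uninitialized value" warnings that the original substitution triggers under warnings.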

      I also believe that tr/// is always faster than s///.

      #!/usr/local/bin/perl
      use strict;
      use Benchmark;

      timethese( 1000000, {
          trchange => \&trchange,
          schange  => \&schange,
      });

      sub trchange {
          my $s = q{123 abc def a b a b a b};
          $s =~ tr/a/A/;
      }

      sub schange {
          my $s = q{123 abc def a b a b a b};
          $s =~ s/a/A/g;
      }

      # Benchmark: timing 1000000 iterations of schange, trchange...
      #  schange: 21 wallclock secs (21.39 usr + 0.01 sys = 21.40 CPU) @ 46728.97/s (n=1000000)
      # trchange:  7 wallclock secs ( 6.41 usr + 0.01 sys =  6.42 CPU) @ 155763.24/s (n=1000000)
      Wonko