arunhorne has asked for the wisdom of the Perl Monks concerning the following question:

Monks ~

I have a list of words in variable case that contains some case variants of the same word, such as Arun and arun. I want to eliminate these. It is easy for me to do this using lcfirst, and I have done so. However, some names begin with a square bracket or another symbol, e.g. [Pyruvate], [pyruvate]. So what I need is a regular expression that performs an advanced version of lcfirst, i.e. one that acts on the first word character rather than the first character. Is this possible?

Thanks, Arun

Edit kudra, 2002-05-07 Fixed unclosed code tag

Re: Regexes for Case Change
by Molt (Chaplain) on May 08, 2002 at 11:23 UTC

    One way to do this is the code below. It goes through the string, finds each word boundary followed by a run of word characters and another word boundary, and replaces that match with the lcfirst'ed version of the word.

    Another way would be to match on \b\w and replace it with the lowercased version of that letter. To me, though, this version seems more readable. The other version may well be more efficient, but I think that's a question for the benchmarkers.

    #!/usr/bin/perl -w
    use strict;

    my $test = "This Is A [Big] [Nasty Old] [Test]";

    # Lowercase the first letter of every word; /e evaluates lcfirst as code
    $test =~ s/\b(\w+)\b/lcfirst $1/eg;
    print "$test\n";
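A minimal sketch of the other approach mentioned above: match \b\w and lowercase just that letter with the \l escape in the replacement, so no /e is needed. The variable names here are illustrative only; on this sample string both versions should produce the same result.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $test = "This Is A [Big] [Nasty Old] [Test]";

# Version from the post: lcfirst each whole word, evaluated as code (/e).
(my $via_lcfirst = $test) =~ s/\b(\w+)\b/lcfirst $1/eg;

# Alternative: match only the first word character after a word boundary
# and lowercase it with \l in the replacement string.
(my $via_l_escape = $test) =~ s/\b(\w)/\l$1/g;

print "$via_lcfirst\n";
print "$via_l_escape\n";
```

Whether one is faster than the other is best settled with the Benchmark module on real data.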

    Update: Okay, with the odd formatting I misread things as needing uppercasing; fixed now, though. If you're just trying to reduce everything to lowercase, just use lc $test: it's quicker, easier, and does exactly what it says on the tin.

Re: Regexes for Case Change
by jmcnamara (Monsignor) on May 08, 2002 at 11:36 UTC

    You could use something like this:
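    #!/usr/bin/perl -w
    use strict;

    my $line = "Hello world [Pyruvate] [pyruvate]\n";

    # \l lowercases the next character of the replacement, i.e. the first
    # letter of each word, skipping the brackets
    $line =~ s/\b(\w+)\b/\l$1/g;
    print $line;

    __END__
    prints:
    hello world [pyruvate] [pyruvate]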

    However, this may be overkill if you can just lc() the entire line.
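For the original goal of collapsing case variants such as [Pyruvate] and [pyruvate], here is a minimal sketch, assuming a plain list of names and that two names count as duplicates when they differ only in the case of their first word character. The sub name lcfirst_word is hypothetical: it applies \l to the first word character only (no /g), which is the "advanced lcfirst" asked about. Keeping the first-seen spelling is an arbitrary design choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: like lcfirst, but acts on the first *word*
# character, skipping any leading brackets or other symbols.
sub lcfirst_word {
    my ($s) = @_;
    $s =~ s/(\w)/\l$1/;    # no /g: only the first word character changes
    return $s;
}

my @names = ('Arun', 'arun', '[Pyruvate]', '[pyruvate]', 'Glucose');

# Keep the first spelling seen for each normalised key.
my %seen;
my @unique = grep { !$seen{ lcfirst_word($_) }++ } @names;

print join(', ', @unique), "\n";    # Arun, [Pyruvate], Glucose
```

If the names can differ in case anywhere, not just at the start, keying the hash on lc($_) instead is simpler.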

    --
    John.

Re: Regexes for Case Change
by rob_au (Abbot) on May 08, 2002 at 11:40 UTC
    If you are looking to lowercase all of the characters in your string, you could easily do this with the transliteration operator. For example:

    $string =~ tr [A-Z] [a-z];

      I think it's generally better to do this with the lc operator rather than tr, since lc handles localisation character sets (umlauts and so forth) and Unicode properly.

      Not that I think it matters in this case, but it's one of those habits where, once you settle into a style, you may as well pick the one that won't trip you up when you expand what you're working with.
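A small sketch of the difference being discussed, assuming a reasonably modern Perl (5.12 or later) so that the unicode_strings feature is available; that feature did not exist when this thread was written. The character is written as \x{DC} (Ü) to sidestep source-encoding issues.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'unicode_strings';   # Perl 5.12+: Unicode rules for lc/uc

my $word = "\x{DC}BER";          # "ÜBER"

(my $via_tr = $word) =~ tr/A-Z/a-z/;   # only touches ASCII A-Z
my $via_lc = lc $word;                  # also maps U+00DC to U+00FC

binmode STDOUT, ':encoding(UTF-8)';
print "tr: $via_tr\n";    # Über
print "lc: $via_lc\n";    # über
```

Without unicode_strings (or use locale, or a UTF-8 flagged string), older Perls apply ASCII-only rules to lc as well, which is the caveat raised in the reply below.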


        I think it's generally better to do this with the lc operator rather than tr since lc handles localisation character sets

        Only if "use locale" is in effect. Otherwise the following is unlikely to do anything:

            print uc 'ü';

        This assertion also depends on what the "general" case is considered to be. The general case is likely a single character set, so a transliteration, as shown by rob_au, is probably sufficient.

        --
        John.

        While the tr operator may not handle localisation character sets, it does have a speed advantage over substitution, as it doesn't perform interpolation or use the regex engine. As such, the choice between them really comes down to the data being manipulated and whether character and locale classes come into play.

        The transliteration solution was provided mostly as a demonstration of TMTOWTDI; YMMV.
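To check the speed claim on your own data, something like this sketch using the core Benchmark module would do. The sample string and the labels 'tr', 'lc' and 'subst' are arbitrary assumptions, and timings vary by machine and Perl version, so no results are shown here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $sample = "Hello World [Pyruvate] [PYRUVATE] " x 20;

# Three ways to lowercase ASCII text; each sub returns the new string.
my %impl = (
    'tr'    => sub { (my $s = $sample) =~ tr/A-Z/a-z/; $s },
    'lc'    => sub { lc $sample },
    'subst' => sub { (my $s = $sample) =~ s/([A-Z]+)/\L$1/g; $s },
);

# Sanity check first: all three should agree on ASCII input.
my %result;
$result{$_} = $impl{$_}->() for keys %impl;

# Run each for at least 1 CPU second and print a comparison table.
cmpthese(-1, \%impl);
```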

Re: Regexes for Case Change
by arunhorne (Pilgrim) on May 08, 2002 at 11:20 UTC
    Sorry for my lame use of formatting; I clicked 'submit' when I meant 'preview' after making changes.
      Sorry for my lame use of formatting; I clicked 'submit' when I meant 'preview' after making changes.

      Just go back, click on the title, and you can edit again to your heart's content. That's how people Update their questions or replies. The manual is your friend. :)

      --t. alex

      "Nyahhh (munch, munch) What's up, Doc?" --Bugs Bunny