perlpal has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perl Monks,

I need to perform multiple operations on a string:

- remove special characters such as %, (, ) and *

- replace continuous white spaces with a single underscore

- replace all uppercase letters with lowercase letters

Currently, I have written the following code to accomplish this:

    $string =~ s/ \(.*\)//g;
    $string =~ s/ {1,}/_/g;
    $string = lc ($string);

Is there a more efficient way to do this?

Re: Regex operation optimisation
by jethro (Monsignor) on Jul 02, 2010 at 09:57 UTC

    If you need to remove only matched pairs of parentheses, your code won't work. Since '.' matches any character, including ')', and the match is greedy, it removes everything from the first ' (' up to the last ')' in the string, including any text in between. Also, your pattern requires a space before the opening parenthesis. You could use

    s/\([^()]*\)//g;        # eliminate anything in parentheses

    or

    s/\(([^()]*)\)/$1/g;    # remove matching parentheses but keep the content

    but you would have to repeat the substitution until it no longer matches, unless you only have one level of parentheses, in which case a single pass is enough. Excluding both '(' and ')' from the character class makes each pass match only an innermost pair, so nested groups are removed from the inside out.
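A minimal sketch of the repeat-until-no-match idea, assuming the parentheses in the input are balanced. It uses [^()]* so that each pass can only match an innermost pair (a class that excludes only ')' would let a match start at an outer '(' and swallow an inner one), and the common Perl idiom "1 while s///" reruns the substitution until it reports no matches. The helper names strip_parens and unwrap_parens are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: delete every (...) group, including nested ones.
# [^()]* cannot cross another parenthesis, so each pass removes only
# innermost pairs; the loop repeats until s///g finds nothing to replace.
sub strip_parens {
    my ($s) = @_;
    1 while $s =~ s/\([^()]*\)//g;
    return $s;
}

# Hypothetical helper: drop the parentheses but keep the text inside them.
sub unwrap_parens {
    my ($s) = @_;
    1 while $s =~ s/\(([^()]*)\)/$1/g;
    return $s;
}

print strip_parens("foo (bar (baz) qux) quux"), "\n";   # prints "foo  quux"
print unwrap_parens("foo (bar (baz) qux) quux"), "\n";  # prints "foo bar baz qux quux"
```

On the nested example, the first pass handles "(baz)" and the second pass handles the outer group that remains.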

    Removing characters can be done with a single regex using a character class:

    s/[%*()]//g;

    This removes parentheses too, if you don't mind that they are deleted individually rather than as matched pairs.

    You could simply use + instead of {1,}.

    UPDATE: corrected a very silly mistake (changed ? to +).
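Putting those suggestions together, the OP's three steps might look like the sketch below. It assumes, as in the original code, that only literal space characters should become underscores; the sub name normalize and the sample string are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper combining the three steps from the question.
sub normalize {
    my ($string) = @_;
    $string =~ s/[%*()]//g;   # delete %, *, ( and ) one character at a time
    $string =~ s/ +/_/g;      # collapse each run of spaces into one underscore
    return lc $string;        # lowercase the result
}

print normalize("Total (NET) 50%  *Approx*"), "\n";  # prints "total_net_50_approx"
```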

      I think that ? means {0,1}. Possibly + would be more appropriate.

      Also, if you wish to collapse all whitespace characters, which include newlines, tabs, etc., then \s+ might be the way to go.
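To see the difference, this sketch runs both patterns over a made-up string that contains a tab and a newline; only the \s+ version replaces those characters.

```perl
use strict;
use warnings;

my $input = "Hello  World\tFoo\nBar";

(my $spaces_only = $input) =~ s/ +/_/g;   # only literal spaces are replaced
(my $any_ws      = $input) =~ s/\s+/_/g;  # spaces, tabs, newlines, etc.

print "$spaces_only\n";   # the tab and the newline are still there
print "$any_ws\n";        # prints "Hello_World_Foo_Bar"
```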

      A non-greedy quantifier could be used to match only the first closing parenthesis.

      s/\(.*?\)//
        That turns "foo (bar (baz) qux) quux" into "foo  qux) quux". I'd be very surprised if the OP wants that to happen.
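A quick way to check claims like this is to run the candidate patterns side by side on the same inputs. The sketch compares the greedy pattern (the OP's, minus the leading space) with the non-greedy one on two made-up strings; the helper names strip_greedy and strip_lazy are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helpers: each applies one single-pass pattern to a copy.
sub strip_greedy {
    my ($s) = @_;
    $s =~ s/\(.*\)//g;     # .*  runs from the first '(' to the LAST ')'
    return $s;
}

sub strip_lazy {
    my ($s) = @_;
    $s =~ s/\(.*?\)//g;    # .*? stops at the NEXT ')', even an inner one
    return $s;
}

for my $in ("a (b) c (d) e", "foo (bar (baz) qux) quux") {
    print "$in\n  greedy: ", strip_greedy($in), "\n  lazy:   ", strip_lazy($in), "\n";
}
```

The greedy version also deletes the " c " between two separate groups ("a  e"), while the non-greedy version leaves a stray ')' behind in the nested case ("foo  qux) quux").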
Re: Regex operation optimisation
by jwkrahn (Abbot) on Jul 02, 2010 at 12:31 UTC
    - remove special characters like (% ,(,),*)
    $string =~ tr/%()*//d;
    - replace continuous white spaces with a single underscore
    $string =~ tr/ /_/s;
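Assembled into one routine, the tr/// approach might look like the sketch below (the name normalize_tr and the sample strings are hypothetical). tr/// works on single characters, so it cannot remove matched pairs of parentheses, but the question does not require that. With /s, each run of spaces that was translated to '_' is squeezed to one underscore, while underscores that were already in the string are left alone because they were not transliterated.

```perl
use strict;
use warnings;

sub normalize_tr {
    my ($string) = @_;
    $string =~ tr/%()*//d;    # /d deletes every listed character
    $string =~ tr/ /_/s;      # map spaces to '_' and squeeze each run to one
    return lc $string;
}

print normalize_tr("Total (NET) 50%  *Approx*"), "\n";  # prints "total_net_50_approx"
print normalize_tr("a__b  c"), "\n";                    # prints "a__b_c"
```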

        perlpal's example was $string =~ s/ {1,}/_/g; which matches only the space character, while your example $string =~ s/\s+/\_/g; uses the whitespace character class (\s), which matches "\t", "\r", "\n" and "\f" as well as the space character. In general, the tr/// operator is more efficient than the s/// operator, so I try to use it when appropriate.
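Whether tr/// is actually faster for a particular input can be measured with the core Benchmark module. The sketch below is only a template: the sample string is made up, and the figures cmpthese prints depend on the machine, the Perl version and the data. It first checks that both versions produce the same result, since timing two different transformations would be meaningless.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $sample = "Some  Sample   Text (with) %several% *special* chars  " x 20;

# Sanity check: both versions must give the same result before timing them.
(my $via_tr = $sample) =~ tr/ /_/s;
(my $via_s  = $sample) =~ s/ +/_/g;
die "results differ\n" unless $via_tr eq $via_s;

# Run each sub for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    'tr///' => sub { (my $s = $sample) =~ tr/ /_/s },
    's///'  => sub { (my $s = $sample) =~ s/ +/_/g },
});
```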