in reply to quick question about parenthesis and regular expressions

But it DOES work. If $word contains "absorb" anywhere in it, $1 will be set to absorb after the match.

One thing of note is that $1 will never contain absorbs or absorbed. The engine tries absorb first, and as soon as that alternative matches, it considers the whole regex a success and stops. If, on the other hand, you want to match absorbed or absorbs, you might want to change the order of the alternation around so that the last thing it attempts to match is absorb. Or, for less backtracking, you might try something like:

$word =~ /(absorb(?:s|ed)?)/i; print $1;
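A small check of the ordering point, as a sketch. It assumes the pattern being discussed was an alternation along the lines of (absorb|absorbs|absorbed) (the original question isn't quoted here), and the sample sentence is made up. Each capture is taken in list context, so no stale $1 carries over from one match to the next.

```perl
use strict;
use warnings;

# Hypothetical sample text.
my $text = 'She was very absorbed in her homework.';

# Alternatives are tried left to right: 'absorb' succeeds first,
# so 'absorbs' and 'absorbed' never get a chance.
my ($short) = $text =~ /(absorb|absorbs|absorbed)/i;

# Longest alternative first: 'absorbed' is tried before 'absorb'.
my ($long) = $text =~ /(absorbed|absorbs|absorb)/i;

# Optional suffix: the greedy ? tries the suffix before giving it up.
my ($opt) = $text =~ /(absorb(?:s|ed)?)/i;

print "$short\n";   # absorb
print "$long\n";    # absorbed
print "$opt\n";     # absorbed
```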

Update: added the ? as per ysth below. ysth++ for catching my oversight.

-enlil

Re: Re: quick question about parenthesis and regular expressions
by ysth (Canon) on Nov 04, 2003 at 10:15 UTC
    You need to say (?:s|ed)? or "absorb" won't match.
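A quick way to see ysth's point is to run both versions of the pattern over the three forms of the word. This is a minimal sketch: the hash names are illustrative, and the // (defined-or) operator used for printing needs Perl 5.10 or later.

```perl
use strict;
use warnings;

my (%required, %optional);
for my $text (qw(absorb absorbs absorbed)) {
    # Suffix group mandatory: plain "absorb" leaves nothing for (?:s|ed) to match.
    $required{$text} = $text =~ /(absorb(?:s|ed))/i  ? $1 : undef;
    # Suffix group optional (the trailing ?), so "absorb" alone is accepted.
    $optional{$text} = $text =~ /(absorb(?:s|ed)?)/i ? $1 : undef;
    printf "%-8s  without ?: %-8s  with ?: %s\n",
        $text, $required{$text} // 'no match', $optional{$text};
}
```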
Re: Re: quick question about parenthesis and regular expressions
by cranberry13 (Beadle) on Nov 04, 2003 at 12:31 UTC
    Enlil--
    Thanks for the reg expression -- it works :)

    But here's an interesting problem: it seems that if I'm using a regular expression with the substitution operator, the value of $1 comes up blank.

    Why is that?

    What I really want to do is this:

    if ($line =~ s/($word(?:s|ed))/<b>$1<\/b>/igm)

    assuming that:

    $line= "She was very absorbed in her homework."; $word = "absorb";

    I want it to match absorbed but also store the matched value in $1 so that I can use it later.

      Ummm, I have no problem doing the substitution using the code supplied:

      $line = "She was very absorbed in her homework.";
      $word = "absorb";
      print STDOUT "Line: $line\n";
      $line =~ s/($word(?:s|ed))/<b>$1<\/b>/igm;
      print STDOUT "New Line: $line\n";
      print STDOUT "\$1 contains: $1\n";

      Of course, I might consider changing it slightly to read:

      $line =~ s/\b(${word}(?:s|ed)?)\b/<b>$1<\/b>/igm;

      However, I just discovered that there is something more going on here: while the original (without the \b) does set $1, my change doesn't, even though it does the substitution properly.

      And finally, if you're just doing a match, why are you using s///?
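One way to sidestep the question, sketched under a few assumptions: do a plain match first and copy $1 straight away, then run the substitution and don't look at $1 afterwards. The \Q...\E quoting is an addition (it keeps any regex metacharacters in $word literal), and the /m flag is dropped because the pattern has no ^ or $ for it to affect.

```perl
use strict;
use warnings;

my $line = 'She was very absorbed in her homework.';
my $word = 'absorb';

# The word plus an optional "s"/"ed" suffix, as whole words, case-insensitive.
my $pat = qr/\b(\Q$word\E(?:s|ed)?)\b/i;

my $found;
if ($line =~ $pat) {
    $found = $1;                    # copied right after a successful match
    $line =~ s/$pat/<b>$1<\/b>/g;   # $1 is only used inside each replacement here
}

print "Found:    $found\n";   # absorbed
print "New line: $line\n";    # She was very <b>absorbed</b> in her homework.
```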

        It looks like $1 is being lost because of the g modifier. That is, it matches, it replaces, and it tries again. It restarts the matching process, finds the start of a possible match at the word "in", wipes out $1 etc., finds that the possible match isn't really one, and then, when the substitution loop finishes, you have lost $1.

        This appears to be a bug. (Report it with perlbug if you like.) However, I would also point out that any code which relies on the correct behaviour is likely to be buggy anyway: if the word appears multiple times, you won't catch all of the substitutions. If you really want fine-grained access to all of the substitution information after the fact, then you either need to write your own substitution loop (using matching with /g, pos and substr) or you need to embed code in the substitution, like this:

        my @matches;
        $line =~ s/\b($word(?:s|ed))/ push @matches, $1; "<b>$1<\/b>" /iegm;
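The other route mentioned above, a hand-written loop built on m//g, pos and substr, might look roughly like the sketch below. The sample text, the @edits array, and the right-to-left application are illustrative choices, not the poster's code. Edits are collected first and applied afterwards, right to left, so that changing $line never interferes with the m//g scan or shifts offsets that are still waiting to be applied.

```perl
use strict;
use warnings;

# Hypothetical sample text with several forms of the word.
my $line = 'Absorbs water, absorbed light, and absorb heat.';
my $word = 'absorb';

my @matches;   # every captured word, in order
my @edits;     # [offset, length, replacement] for each match

while ($line =~ /\b(\Q$word\E(?:s|ed)?)\b/gi) {
    my $end   = pos($line);           # offset just past the match
    my $start = $end - length($1);    # \b is zero-width, so the capture is the whole match
    push @matches, $1;
    push @edits, [ $start, length($1), "<b>$1</b>" ];
}

# Apply edits right to left so earlier offsets stay valid.
for my $e (reverse @edits) {
    substr($line, $e->[0], $e->[1]) = $e->[2];
}

print "Matches: @matches\n";   # Matches: Absorbs absorbed absorb
print "$line\n";               # <b>Absorbs</b> water, <b>absorbed</b> light, and <b>absorb</b> heat.
```

Because every capture is stored inside the loop, nothing depends on what $1 holds once the loop ends.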
        You're right: the \bs in the pattern do seem to sabotage $1. In particular, any character in front of the () group kills $1 if the g and i switches are both set.
        my $line = 'She was very absorbed in her homework.';
        my $word = 'absorb';
        $_ = $line; s/ ($word)//g;  print "\$1 is $1\n";
        $_ = $line; s/ ($word)//i;  print "\$1 is $1\n";
        $_ = $line; s/ ($word)//ig; print "\$1 is $1\n";
        One other note:
        # This also fails:
        s/(\b$word)//ig;
        # Although this is OK:
        s/( $word)//ig;
        # And this is fine, too!
        my $pat = qr/\b($word)/; s/$pat//gi;
        # Or even:
        my $pat = qr/($word)/; s/\b$pat//gi;
        # Or EVEN THIS!
        my $pat = qr/$word/; s/\b($word)//gi;
        I think we have a perlbug. Frenzy of updates completed. Really.