jeanluca has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

Well, the aim of the following test script is to extract the text after the b up to the end of that line:
#! /usr/bin/perl
$a = "\na 10\nb a2 s2\nc 30" ;
($c) = $a =~ /^b\s+(.*?)$/m ;
print "c is $c\n" ;
This works great. However, it is also possible that nothing follows b. In that particular case I would like $c to be empty, but it is not: it contains c 30.
Any suggestions on how to change the regexp?

Thanks a lot in advance
Luca

Replies are listed 'Best First'.
Re: regexp need a match characters and spaces
by GrandFather (Saint) on Mar 23, 2006 at 10:48 UTC

    use \n in place of $:

    (my $c) = $a =~ /^b\s+(.*?)\n/m;

    DWIM is Perl's answer to Gödel
      This works for the OP's code, but would force the need for a trailing newline if the OP wanted to match what came after c.

      If you swap the regex for the following you'll remove that requirement. Note that I've had to swap out the \s+, as it was scooping up the newline character when there was no data on the line, so the regex matched the whole of the following line.

      /^b +([^\n]*)$/m;
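To make the difference concrete, here is a minimal sketch comparing the two patterns on a string where nothing follows the b. The sample string and the variable names are assumptions made for illustration, not part of the original posts.

```perl
use strict;
use warnings;

# Hypothetical test string: the "b" line has nothing after it.
my $text = "a 10\nb\nc 30";

# \s+ is allowed to match the newline after "b", so the capture
# is taken from the following line.
my ($with_s) = $text =~ /^b\s+(.*?)$/m;

# A literal space cannot match a newline, so the match fails
# and the variable stays undefined.
my ($with_space) = $text =~ /^b +([^\n]*)$/m;

printf "with \\s+ : %s\n", defined $with_s     ? "'$with_s'"     : 'undef';
printf "with ' +': %s\n",  defined $with_space ? "'$with_space'" : 'undef';
```

Note that with the second pattern $c ends up undefined rather than an empty string when b has no trailing space, so a defined check may be needed before printing.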
      ---
      my name's not Keith, and I'm not reasonable.
Re: regexp need a match characters and spaces
by codeacrobat (Chaplain) on Mar 23, 2006 at 14:48 UTC
    Why do this in one shot?
    A safe approach extracting the content is:
    $a = "\na 10\nb a2 s2\nc 30";
    @_ = split /\n/, $a;
    ($c) = map { /^b\s+(.*)/ } @_;
Re: regexp need a match characters and spaces
by jeanluca (Deacon) on Mar 23, 2006 at 11:44 UTC
    I see the problem too with the \s+, which is scooping up the newline character. Any suggestions if this can be done with negative look-behind or something like that ?

    Luca
      You could try the following. It seems to cope both with spaces between the b and its newline and with a b followed immediately by a newline. It does so by making the spaces optional and by capturing with a character class that, unlike \s, excludes the newline.

      Set up some text to try and set up the regular expression.

      #!/usr/local/bin/perl
      #
      use strict;
      use warnings;

      our @tryThese = (
          "a 10\nb a2 s2\nc 30",
          "a 10\nb\nc 30",
          "a 10\nb \nc 30",
          "a 10\nb a2 s2 \nc 30");
      our $c;
      our $rxAfterB = qr{(?m)^b[\x20\x09]*([^\n]*)};

      Loop over @tryThese printing out what we are testing, doing the match then showing the result.

      foreach my $a (@tryThese)
      {
          print "\n\$a contains ...\n";
          {
              local $" = "<--\n";
              print "@{[split /\n/, $a]}\n";
          }
          ($c) = $a =~ /$rxAfterB/;
          print "c is -->$c<--\n\n";
      }

      When run this produces

      $a contains ...
      a 10<--
      b a2 s2<--
      c 30
      c is -->a2 s2<--


      $a contains ...
      a 10<--
      b<--
      c 30
      c is --><--


      $a contains ...
      a 10<--
      b <--
      c 30
      c is --><--


      $a contains ...
      a 10<--
      b a2 s2 <--
      c 30
      c is -->a2 s2 <--

      I hope this is of use.

      Cheers,

      JohnGG

      Update: The slashes around the compiled regular expression in the line ($c) = $a =~ /$rxAfterB/; are superfluous. The line should read

      ($c) = $a =~ $rxAfterB;

      JohnGG

      I think the solution given by codeacrobat is the best one, but if you wanted to insist on working with the one scalar value containing multiple lines, maybe something like this:
      $a = "\na 10\nb a2 s2\nc 30";
      ( $c ) = $a =~ /^b[ \t]+(.*)/m;
      print "$c\n";
      That is, simply make sure you don't include "\n" among the kinds of white-space that can follow "b" in order to yield a match.

      (BTW, you want everything on the line that starts with "b", and the "m" modifier on the regex does not affect the behavior of "." -- it still will not match "\n", so the question mark and dollar sign in the OP version -- (.*?)$ -- are redundant here.)
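A small sketch of that last point, assuming a hypothetical two-line sample string: the same pattern is run with no modifier, with /m, and with /s. Only /s changes what "." can match.

```perl
use strict;
use warnings;

# Hypothetical two-line sample that starts with the "b" line.
my $text = "b a2 s2\nc 30";

my ($plain)  = $text =~ /^b[ \t]+(.*)/;    # no modifier
my ($multi)  = $text =~ /^b[ \t]+(.*)/m;   # /m only changes ^ and $
my ($single) = $text =~ /^b[ \t]+(.*)/s;   # /s lets "." match "\n"

# Show the newline as a visible \n so the three results are easy to compare.
for my $pair ([plain => $plain], [m => $multi], [s => $single]) {
    (my $shown = $pair->[1]) =~ s/\n/\\n/g;
    print "$pair->[0]: $shown\n";
}
```

With no modifier and with /m the capture stops at the end of the b line; with /s it runs on through the newline into the next line.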

Re: regexp need a match characters and spaces
by berntsmr (Novice) on Mar 25, 2006 at 06:07 UTC
    I would suggest a lookahead for a newline. It will work in this case, but not if the b row is the last row in your parsed variable (unless you force it to always have a newline as the last character), e.g.
    ($c) = $a =~ m{^b\s+(.*?)(?=\n)$}m ;
    Needs some tweaking if b is the last row.
    -m
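One possible tweak for the last-row case, offered only as a sketch: let the lookahead accept either a newline or the end of the string. The sample strings are assumptions, and [ \t]+ is swapped in for \s+ here so that the whitespace after b cannot swallow a newline.

```perl
use strict;
use warnings;

# Hypothetical samples: the "b" row in the middle, and as the last row.
my @samples = ("a 10\nb a2 s2\nc 30", "a 10\nb a2 s2");

my @captured;
for my $text (@samples) {
    # (?=\n|\z) asserts "newline or end of string" without consuming it.
    my ($c) = $text =~ /^b[ \t]+(.*?)(?=\n|\z)/m;
    push @captured, $c;
    print "c is -->$c<--\n";
}
```

Both samples yield the same capture, so the trailing newline is no longer required.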
Re: regexp need a match characters and spaces
by doc_faustroll (Scribe) on Mar 27, 2006 at 05:22 UTC
    In general, you probably don't want to pull out the m or s modifiers unless you are going to attempt to make a match across lines. Wisdom, if there were such, would follow the principle of parsimony. Solve simple problems simply, and save the bells and whistles for harder problems.

    My advice would be to process any multiline string in which you only need matches within individual lines as an array of lines, matching against each line individually. You've run afoul here of exactly the kind of problem that you should not even need to bother with.
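A minimal sketch of that line-by-line approach, assuming a hypothetical sample string and an array named @found to collect the results. Once the string is split, no line contains a newline, so \s+ is harmless and no modifiers are needed.

```perl
use strict;
use warnings;

# Hypothetical sample: one "b" line with data, one with nothing after it.
my $text = "a 10\nb a2 s2\nc 30\nb";

my @found;
for my $line (split /\n/, $text) {
    # The whole "\s+ plus text" part is optional, so a bare "b" also matches.
    next unless $line =~ /^b(?:\s+(.*))?$/;
    # $1 is undef when the optional group did not take part in the match.
    push @found, defined $1 ? $1 : '';
}

print "c is -->$_<--\n" for @found;
```

Here a b line with nothing after it gives an empty string rather than a failed match.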

    There is a reason why the newline character is such an important part of the regexp world. Don't go there unless you have to. And when you do, you'll learn certain practices that become more important when matching across lines. Until then, solve your simple problems simply and move on.

    Explaining to you the finer points of regexp design is not even appropriate at this point, as the principle of parsimony has precedence.