Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi there, I am far from understanding perl - but I have to get along with it. In other words, I'm completely new to this. :-(

So here's the problem I'm going to town with: I'd like to parse a line of text for a keyword, and then "split" the line at that place. I need the text prior to the match also, and the keywords may occur more than once.

To stick with the perlfaq example, let's say my text is

$restofline = "One fish two fish red fish blue fish";

To split the words "one", "two", "red" and "blue" I tried a simple match:

while ($restofline =~ m/(.*)fish(.*)/) {
    $beginofline = $1;
    $restofline  = $2;
    # .... do some mighty magic
}

Sooo ....
What happens is that it always finds the LAST match instead of the first! The problem seems to be the "(.*)" to the left of the keyword. As soon as I try
$restofline =~ m/fish(.*)/
it works fine. But how do I get the text to the left of the matched keyword then?

I tried several things like the g modifier, {1} and + match quantifiers, but to no avail. It either works like finding the last match or it doesn't work at all.

Can you guys please tell me what I did wrong and - even more important - how to do that right?

Thanks in advance, Faltblatt

Replies are listed 'Best First'.
Re: match last element instead of first??
by jettero (Monsignor) on Oct 17, 2007 at 17:07 UTC

    m/(.*)fish/ makes perl go all the way to the end of the string and then back up to find fish...

    m/^(.*?)fish/ tells perl to match as little as possible while still finding fish afterwards. It will only match zero characters though, unless you tell it to start from the beginning.

    It's all explained rather well in perlre. I read that every year or so still and I've been at this quite a while. I also recommend reading perldata and perlref on a semi-regular basis.

    -Paul

      ... and m/fish(.*?)$/ makes perl go to the end of the string, match as little as possible (from back to front), and then fish.

      Update: After rereading the original message that was not asked ;-). But it may still be worth noting that anchoring is a good idea if there is no compelling reason against it.

      Second update: ignore this node ;)

        No:

        perl -le "$_=shift; m/fish(.*?)$/ and print $1" "one fish two fish three fish"

        gives

        two fish three fish

        .*? will match as little as necessary, but it will not match as little as possible. The fish will still match at the leftmost position. Then, the .*? will match as little as necessary to make the match succeed, which still is the rest of the string.
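The leftmost-match behaviour described above can be checked with a small sketch. The helper name `capture_after` is made up for illustration; it only returns whatever the first capture group holds. The second pattern, with a greedy prefix, is added here purely as a contrast and is not from the thread.

```perl
use strict;
use warnings;

# Return what the first capture group holds for a pattern, or undef if
# the pattern does not match. (Hypothetical helper for this sketch.)
sub capture_after {
    my ($text, $pattern) = @_;
    return $text =~ $pattern ? $1 : undef;
}

my $text = "one fish two fish three fish";

# The leftmost "fish" wins, so the lazy group has to stretch to the end
# of the string before $ can match.
print "[", capture_after($text, qr/fish(.*?)$/), "]\n";

# A greedy prefix pushes the match to the last "fish", leaving nothing
# for the lazy group to capture.
print "[", capture_after($text, qr/.*fish(.*?)$/), "]\n";
```

The first line printed should contain " two fish three fish" (with the leading space), the second an empty pair of brackets.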

Re: match last element instead of first??
by kyle (Abbot) on Oct 17, 2007 at 17:18 UTC

    I'm not sure I understand your question. That said, I'd recommend you look into using split for what you appear to be doing. Here's a quick example:

    use Data::Dumper;
    my $restofline = "One fish two fish red fish blue fish";
    print Dumper( [ split /\s*fish\s*/, $restofline ] );
    __END__
    $VAR1 = [
              'One',
              'two',
              'red',
              'blue'
            ];

    If for some reason, you really do have to process the string incrementally, maybe index is a good solution (provided your keyword really is static).
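A minimal sketch of the index approach, assuming a fixed, non-empty keyword. The helper name `split_at_keyword` is hypothetical; it walks the string with `index` and `substr` and collects the text before each occurrence.

```perl
use strict;
use warnings;

# Collect the text that precedes each occurrence of a fixed keyword.
# (Hypothetical helper; whitespace around the keyword is kept.)
sub split_at_keyword {
    my ($text, $keyword) = @_;
    # An empty keyword would never advance the position, so bail out.
    return ($text) if $keyword eq '';

    my @pieces;
    my $pos = 0;
    while ((my $found = index($text, $keyword, $pos)) >= 0) {
        push @pieces, substr($text, $pos, $found - $pos);
        $pos = $found + length $keyword;
    }
    # Keep any trailing text after the last keyword.
    push @pieces, substr($text, $pos) if $pos < length $text;
    return @pieces;
}

print "[$_]\n" for split_at_keyword("One fish two fish red fish blue fish", "fish");
```

Unlike the split example, this keeps the surrounding spaces in each piece, so they would need trimming if only the bare words are wanted.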

    If you really do want to use a regex, I wonder if a non-greedy match is what you're after. That is, "$restofline =~ m/(.*?)fish(.*)/".
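Dropped into the loop from the original question, the non-greedy pattern might look like the sketch below. It assumes a single-line string; the wrapper `leading_chunks` is a hypothetical name used only to make the loop easy to try out.

```perl
use strict;
use warnings;

# Repeatedly peel off the text in front of the first "fish".
# (Hypothetical wrapper around the loop from the question.)
sub leading_chunks {
    my ($restofline) = @_;
    my @chunks;
    while ($restofline =~ m/(.*?)fish(.*)/) {
        # Copy the captures before doing anything else with regexes.
        my ($beginofline, $rest) = ($1, $2);
        push @chunks, $beginofline;
        # The remainder is always shorter, so the loop terminates.
        $restofline = $rest;
    }
    return @chunks;
}

print "[$_]\n" for leading_chunks("One fish two fish red fish blue fish");
```

Each chunk still carries its surrounding spaces, and any text after the final "fish" is left in the remainder rather than returned.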

Re: match last element instead of first??
by amarquis (Curate) on Oct 17, 2007 at 17:31 UTC

    Basically, what is happening is that when greedy constructs fight, the left one wins. In your pattern m/(.*)fish(.*)/ there are two (.*) constructs that each try to get as many characters as they can without failing the match. The first one gobbles up everything until the last fish (because if it ate that fish too, your "fish" couldn't match), then the "fish" matches, and only then is the last (.*) left to grab as much as it can.

    As stated above, you either want the non-greedy version in front (if you want to split the string only on the first "fish") or the split function (which is appropriate if you don't mind splitting on each fish in the line).
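To see the two behaviours side by side, here is a small sketch that returns both captures for the greedy and the non-greedy pattern. The helper name `captures` is made up for illustration.

```perl
use strict;
use warnings;

# Return the capture groups of a match as a list (empty list on failure).
# (Hypothetical helper for this sketch.)
sub captures {
    my ($text, $pattern) = @_;
    my @groups = $text =~ $pattern;
    return @groups;
}

my $line = "One fish two fish red fish blue fish";

# Greedy: the first group swallows everything up to the last "fish".
my ($g_before, $g_after) = captures($line, qr/(.*)fish(.*)/);
print "greedy:     [$g_before] [$g_after]\n";

# Non-greedy: the first group stops in front of the first "fish".
my ($n_before, $n_after) = captures($line, qr/(.*?)fish(.*)/);
print "non-greedy: [$n_before] [$n_after]\n";
```

With the greedy pattern the second group ends up empty; with the non-greedy one the first group holds only "One " and the rest of the line lands in the second group.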

    Also as stated above, read perlre early and often. I've read it in the last week and I was still thrown for a loop by Corion's post above; my interpretation of the pattern was very wrong.

Re: match last element instead of first??
by johngg (Canon) on Oct 17, 2007 at 17:50 UTC
    You can pull out your words into an array by using captures in a global match, like this

    use strict;
    use warnings;

    my $restOfLine = q{one fish two fish red fish blue fish};
    my @words = $restOfLine =~ m{(\S+)\s+fish}g;
    print qq{$_\n} for @words;

    which produces

    one
    two
    red
    blue

    I hope this is of use.

    Cheers,

    JohnGG

Re: match last element instead of first??
by Anonymous Monk on Oct 18, 2007 at 09:19 UTC
    Hi again, I thank you all for your comments. From experience in other forums I somehow expected nothing more than a "RTFM, dork!". I'm very happy that didn't happen here. :-)

    Every time I have to try something with perl I have to read the docs, most of the time perlre. I read that back and forth, from left to right, but didn't find the answer to my problem.

    And yes, m/(.*?)fish(.*)/ was exactly what I was looking for. I just overlooked two things:

    - the possibility to tie the "?" to something that's a match pattern itself
    - the idea of "greediness" in matches.
    Never, not in my darkest dreams, would I have come up with a solution like that. In my opinion a pattern has to find the leftmost match, period. But who am I to judge this - Perl is like that, go with it.

    And, as the perldocs say somewhere at the beginning: There's always more than one way to reach your goal. So I will look into the split function also, perhaps that's an alternative solution to my problem.

    So, to come to an end, I hope I don't have to ask a lot of questions here - but of course I will if there's another problem. ;-)

    Thanks again and have a nice day all,
    Faltblatt

Re: match last element instead of first??
by naikonta (Curate) on Oct 18, 2007 at 06:49 UTC
    Just as you ask, split the line. (assuming the separator is fixed).
    $ perl -le '$line = shift; @fish = split /\s*fish\s*/, $line; print for @fish' 'One fish two fish red fish blue fish'
    One
    two
    red
    blue
    You may compare between regex and split solutions, if you like.

    Open source softwares? Share and enjoy. Make profit from them if you can. Yet, share and enjoy!