Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have data like the following in a multiline string $lines:
___DATA___
#Pattern 1 - aaa <anything> bbb <anything> ccc <whitespaces> dddd
aaa xxxxxxx bbb ccc dddd
-------------------------------------
# Pattern 2 - aaa <anything> bbb <anything> ccc <whitespaces> dddd
aaa xxxxxxxxxxxxxxxxxxxxx xxxxxxx bbb xxxxxxxxxxxxxxxxxxxxx xxxx xx ccc dddd
I want
print $#{[$lines =~ /aaa.*bbb.*ccc\s*dddd/gsi]} + 1 . "\n";
to give me 2, but it gives me 1. I assume that the culprit is .* Is there any way to get around this?

Replies are listed 'Best First'.
Re: regex matches more than I want
by duff (Parson) on Dec 02, 2003 at 14:38 UTC

    The easiest way is either to change your .* to an appropriate negated character class or to add the ? modifier to the greedy * to make it non-greedy.

    print $#{[$lines =~ /aaa[^b]*bbb[^c]*ccc\s*dddd/gsi]} + 1 . "\n";  # or ...
    print $#{[$lines =~ /aaa.*?bbb.*?ccc\s*dddd/gsi]} + 1 . "\n";

    See the documentation in perlre, perlrequick and perlretut.
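To make the difference between the three patterns concrete, here is a minimal sketch that counts the matches each one finds. The line breaks in the sample data are an assumption (the question only shows the records, not their exact layout), and count_matches is a hypothetical helper introduced for this illustration.

```perl
use strict;
use warnings;

# Sample data modelled on the question; the exact line breaks are assumed.
my $lines = <<'END';
aaa xxxxxxx
bbb
ccc dddd
-------------------------------------
aaa xxxxxxxxxxxxxxxxxxxxx
xxxxxxx
bbb xxxxxxxxxxxxxxxxxxxxx
xxxx
xx
ccc dddd
END

# Hypothetical helper: number of elements a list-context /g match returns.
sub count_matches {
    my ($text, $re) = @_;
    my @found = $text =~ /$re/g;
    return scalar @found;
}

# Greedy .* runs to the last bbb and the last ccc, swallowing both records.
my $greedy  = count_matches($lines, qr/aaa.*bbb.*ccc\s*dddd/si);
# Non-greedy .*? stops at the first bbb and the first ccc that let the match succeed.
my $lazy    = count_matches($lines, qr/aaa.*?bbb.*?ccc\s*dddd/si);
# Negated classes cannot cross a 'b' (or a 'c'), so they stop early too.
my $negated = count_matches($lines, qr/aaa[^b]*bbb[^c]*ccc\s*dddd/si);

print "greedy: $greedy\n";          # greedy: 1
print "non-greedy: $lazy\n";        # non-greedy: 2
print "negated class: $negated\n";  # negated class: 2

# Caveat: the negated classes break as soon as the filler contains a 'b' or 'c'.
my $tricky         = "aaa abc bbb ccc dddd";
my $tricky_negated = count_matches($tricky, qr/aaa[^b]*bbb[^c]*ccc\s*dddd/si);
my $tricky_lazy    = count_matches($tricky, qr/aaa.*?bbb.*?ccc\s*dddd/si);

print "tricky negated: $tricky_negated\n";  # tricky negated: 0
print "tricky non-greedy: $tricky_lazy\n";  # tricky non-greedy: 1
```

The negated character class only behaves like "anything up to bbb" when the filler text never contains the excluded letter, so the non-greedy form is probably the safer choice for arbitrary data.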

      Thanks guys for all the help !!!
Re: regex matches more than I want
by ChrisR (Hermit) on Dec 02, 2003 at 13:44 UTC
    .* is very greedy. Try this instead:
    print $#{[$lines =~ /aaa(.*?)bbb(.*?)ccc\s*dddd/gsi]} + 1 . "\n";

    Update:
    I was working under the assumption that:
    $lines = "aaa xxxxxxx bbb ccc dddd";
    print $#{[$lines =~ /aaa(.*?)bbb(.*?)ccc\s*dddd/gsi]} + 1 . "\n";
    will print 2, because with capturing parentheses a list-context match returns the captured groups, two per match.

    If
    $lines = " aaa xxxxxxx bbb ccc dddd aaa xxxxxxxxxxxxxxxxxxxxx xxxxxxx bbb xxxxxxxxxxxxxxxxxxxxx xxxx xx ccc dddd";
    print $#{[$lines =~ /aaa(.*?)bbb(.*?)ccc\s*dddd/gsi]} + 1 . "\n";
    it will print 4 (two matches, two captured groups each).

    If you use
    $lines = " aaa xxxxxxx bbb ccc dddd aaa xxxxxxxxxxxxxxxxxxxxx xxxxxxx bbb xxxxxxxxxxxxxxxxxxxxx xxxx xx ccc dddd";
    print $#{[$lines =~ /aaa.*?bbb.*?ccc\s*dddd/gsi]} + 1 . "\n";
    it will print 2, since without capturing parentheses each match returns a single element.
    I'm not sure if this helps or not, but my guess is that the last solution I show is what you want.
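The counts of 2 and 4 above come from how a list-context /g match treats capturing parentheses. The following sketch, which assumes the same single-line $lines string as in the reply, shows a few ways to count matches rather than captured groups; the variable names are chosen for illustration only.

```perl
use strict;
use warnings;

my $lines = " aaa xxxxxxx bbb ccc dddd aaa xxxxxxxxxxxxxxxxxxxxx xxxxxxx bbb xxxxxxxxxxxxxxxxxxxxx xxxx xx ccc dddd";

# With capturing groups, list context returns every captured substring,
# so each of the two matches contributes two elements.
my @captures = $lines =~ /aaa(.*?)bbb(.*?)ccc\s*dddd/gsi;

# Non-capturing groups (?:...) keep the grouping without adding elements.
my @matches = $lines =~ /aaa(?:.*?)bbb(?:.*?)ccc\s*dddd/gsi;

# Assigning to an empty list in scalar context yields the number of
# elements on the right-hand side, without building a temporary array ref.
my $count = () = $lines =~ /aaa.*?bbb.*?ccc\s*dddd/gsi;

# A scalar-context /g loop advances one match at a time, so the count
# does not depend on how many groups the pattern captures.
my $loop_count = 0;
$loop_count++ while $lines =~ /aaa(.*?)bbb(.*?)ccc\s*dddd/gsi;

print scalar(@captures), "\n";  # 4
print scalar(@matches), "\n";   # 2
print "$count\n";               # 2
print "$loop_count\n";          # 2
```

The empty-list assignment is a common alternative to the $#{[...]} + 1 expression used in the thread; both give the same number when the pattern has no capturing groups.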