However, *capturing* just a non-greedy dot star will still suffer from having to test the remaining pattern (outside of the parens) at each step. Thus, the negated character class will perform a lot better in the following:

    cc => sub { "this is an amazingly long string" =~ /\s([^l]*)l/ },
    ds => sub { "this is an amazingly long string" =~ /\s(.*?)l/ },
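
For anyone who wants to try the comparison, here is a minimal sketch of a complete script around those two entries. It assumes the core Benchmark module's cmpthese() and a run time of about one CPU second per entry (the -1); the variable names are mine. The timings will vary by machine and perl version, so no numbers are claimed here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $str = "this is an amazingly long string";

# Sanity check: both patterns capture the same text on this input,
# so the benchmark compares matching effort only.
my ($cc) = $str =~ /\s([^l]*)l/;
my ($ds) = $str =~ /\s(.*?)l/;
print "cc captured: '$cc'\n";
print "ds captured: '$ds'\n";

# Negative count = run each entry for at least that many CPU seconds.
cmpthese(-1, {
    cc => sub { $str =~ /\s([^l]*)l/ },   # negated character class
    ds => sub { $str =~ /\s(.*?)l/ },     # non-greedy dot star
});
```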

However, the two approaches *express* different things; they just happen to coincide functionally in the example above. For some tasks .*? is the right approach, and for others a negated character class is.

And, to add to japhy's additional warning regarding the stricter meaning of a negated character class, I'll offer another example. For those who do not see the potential difference in meaning and use of each approach, consider the following contrived example: I want to match (and extract) the first two fields of colon-separated data, but only when the third field starts with an 'A' (let's not worry for a minute about whether split() would be a better approach):

#!/usr/bin/perl -w
use strict;
my %data;
while(<DATA>){
    next unless m/^(.*?):(.*?):A/;       # non-greedy DS
    #next unless m/^([^:]*):([^:]*):A/;  # negated CC
    $data{$1} = $2;
}
while( my($k,$v) = each %data) {
    print "$k => $v\n";
}
__DATA__
abc:123:A:B
def:456:A:C
ghi:789:B:A
jkl:000:C:C

OUTPUT:
non-greedy DS:
    abc => 123
    def => 456
    ghi => 789:B

negated CC:
    abc => 123
    def => 456

The non-greedy DS version doesn't work according to the spec (only the first two lines have an 'A' in the 3rd field). That's because the dot star part in (.*?): does not say "match only up to the next colon" (as some people occasionally believe it does); it says: "match as few (of *any* characters[1]) as we can and still have the remainder of the expression match". When the whole pattern is (.*?):, the end result (aside from efficiency) is the same, but if the pattern that follows is more than a single character, things are not at all the same as with a negated character class.
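
To make that concrete, here is a small sketch that runs both patterns against just the third data line from above. The variable names are mine and purely illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "ghi:789:B:A";

# Non-greedy dot star: the second group keeps growing, colons included,
# until the rest of the pattern (":A") can finally match.
my ($ds_key, $ds_val) = $line =~ /^(.*?):(.*?):A/;

# Negated class: the second group can never cross a colon, so once the
# third field turns out to be 'B' there is nothing left to try.
my @cc = $line =~ /^([^:]*):([^:]*):A/;

print "non-greedy DS: $ds_key => $ds_val\n";
print "negated CC: ", (@cc ? "$cc[0] => $cc[1]" : "no match"), "\n";
```

The first print shows ghi => 789:B, and the second reports no match, which is the behaviour the spec actually asked for.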

I only wanted to reiterate this because I've often seen both beginners and more experienced programmers make the mistake of thinking that the non-greedy dot star and a negated character class are interchangeable, and they simply aren't.

[1] Well, 'any character' except a newline, unless the /s modifier is in effect.

In reply to Re: Re: Ovid, Long Live .*? (dot star question-mark) by danger
in thread Ovid, Long Live .*? (dot star question-mark) by japhy
