Random thoughts on match operating efficiency

For the + (1 or more) quantifier, the difference between the maximally matching form (.+), greedy, and the minimally matching form (.+?), nongreedy, lies partly in what is matched but, more importantly, in how, or from where, it is matched.

In the greedy case the quantifier first consumes everything up to the end of the line, then backtracks one position at a time, checking each time whether the rest of the pattern matches, and repeats until it succeeds or gets back to the starting match position.

In the nongreedy case the quantifier begins at the starting match position and tracks forward one character at a time until the rest of the pattern succeeds or the end of the line is reached.
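To make the two descriptions above concrete, here is a rough sketch in Python, whose `re` module follows the same greedy/nongreedy rules for these quantifiers. The `first_fit` function, the sample string, and the patterns are made up for illustration. It is a deliberately simplified model of the order in which lengths are tried, not a description of how any real engine is implemented.

```python
import re

def first_fit(text, start, suffix, greedy):
    """Simplified model of '.+' (or '.+?') followed by a literal suffix.

    `start` is where the quantifier begins matching. Returns a tuple of
    (captured text or None, number of candidate lengths tried).
    """
    eol = text.find("\n", start)
    if eol == -1:
        eol = len(text)
    max_len = eol - start
    # Greedy: try the longest run first and shrink. Nongreedy: the reverse.
    lengths = range(max_len, 0, -1) if greedy else range(1, max_len + 1)
    tried = 0
    for n in lengths:
        tried += 1
        if text.startswith(suffix, start + n):
            return text[start:start + n], tried
    return None, tried

text = "aXbXb"

# The model: the quantifier starts at index 1, just after the leading 'a'.
greedy_model = first_fit(text, 1, "b", greedy=True)    # ('XbX', 2)
lazy_model = first_fit(text, 1, "b", greedy=False)     # ('X', 1)

# The real engine agrees on what ends up captured.
greedy_real = re.search(r"a(.+)b", text).group(1)      # 'XbX'
lazy_real = re.search(r"a(.+?)b", text).group(1)       # 'X'

print(greedy_model, lazy_model, greedy_real, lazy_real)
```

On this string the two forms capture different text, because there is more than one place where the rest of the pattern can succeed. The count of attempts is only a property of the toy model and says nothing definite about real engine cost.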

Application of + quantifier behaviour to ? quantifier behaviour:

Applying this to the ? (0 or 1) quantifier, I would expect the two forms to differ in where matching starts: the greedy form first tries one position ahead (the optional item taken), while the nongreedy form first tries the starting match position itself (the optional item skipped).
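A small check of that expectation, again sketched in Python's `re` on the assumption that it treats `?` and `??` the same way Perl does. The sample string and the patterns are invented for the purpose.

```python
import re

text = "ab"

# Greedy '?': the optional 'a' is tried first, so group 1 takes it.
greedy = re.match(r"(a?)(.*)", text)
# Nongreedy '??': zero occurrences are tried first, so group 1 stays empty.
lazy = re.match(r"(a??)(.*)", text)

print(greedy.groups())   # ('a', 'b')
print(lazy.groups())     # ('', 'ab')
print(greedy.group(0) == lazy.group(0))   # True: the overall match is 'ab' both times
```

The overall match is identical; only the way it is split between the two groups shows which alternative was tried first.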

Random summation:

In the case discussed in this thread, the difference is not in what is matched but in how, or from where, the matching starts. If that is right, the nongreedy form would save one jump-ahead operation per use, which would make it marginally more efficient.

Just Random:

I would imagine this has already been optimised internally, unless (or perhaps especially if) there is some security benefit to a forward-looking match as opposed to a backward-looking one.

Update, later the same day:

Crumbs: I wrote "+ (0 or 1) quantifier", which is incorrect. '+' is the (1 or more) quantifier.

OK, so to fix the above I have replaced the '*' quantifiers with '+' quantifiers, and the '+' quantifiers with '?' quantifiers, so that what I wrote at least makes sense. The argument holds despite the syntax errors, which are now rectified.

After trying to come up with examples where the default greedy behaviour and the nongreedy behaviour (indicated by a secondary '?') would give different results, I realised that you are right: there are no differences in what is matched when the '\n' are included. That agrees with davidos and with my own response: the difference is in how the match is carried out.
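For what it is worth, here is a sketch of why including the '\n' removes the difference. It uses Python's `re` and a made-up two-line string, not the regex from the original question. Since '.' does not match a newline by default, there is only one place on the line where the '\n' can match, so both forms have to stop there.

```python
import re

text = "hello world\nsecond line\n"

# Both patterns must end at the first newline, so both capture the whole first line.
greedy = re.match(r"(.+)\n", text).group(1)
lazy = re.match(r"(.+?)\n", text).group(1)

print(greedy == lazy)   # True
print(repr(greedy))     # 'hello world'
```

The greedy form reaches that point by running to the end of the line first; the nongreedy form gets there by extending one character at a time.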

In reply to Re: Puzzled by regex by Don Coyote
in thread Puzzled by regex by syphilis
