The immediate reason that this very complex regex stops where it does is that it doesn't take into account the possibility of "S.txt" in the middle of the line. Thus it stops matching when it bumps into that part of the line. Only matches in the part before "S.txt" are getting printed out. However, even without that annoying "S.txt", there are several other problems with your regex and I also wonder if we might have an XY Problem.

Judging by your response to targetsmart and the code you've shown above, you seem to just want to strip the introductory label and replace runs of spaces with "==". If that is the case, then why not do something simple like this:

my $sNoLabel = substr($var, length("FILES CHECKED IN:"));
$sNoLabel =~ s/\s+/==/g;
print "MATCH$sNoLabel\n";

On the other hand, if you wanted to do something else, e.g. print out "TEST" and the version numbers but skip past that pesky "S.txt", then it might help to understand a bit better what the regex you currently have is doing. I have a feeling it is not what you think:

[\s] - the square brackets are ok but unnecessary. You only need square brackets when you want to match any one of several letters or symbols. Even though \s matches all sorts of whitespace, it is a single symbol, so plain \s is enough. On the other hand, if you want to match the letter a and all whitespace, *then* you would need square brackets, like this: [\sa] or [a\s].
(.*?) - were you hoping to match TEST? A non-greedy wildcard expression is almost always a problem. Please try to avoid this construct - it rarely matches what one thinks and there are usually much better choices. In this case, you might consider something like \w+ or \S+. \w matches any "word" character, i.e. letter, digit, or underscore. \S matches any non-whitespace character.
(\d+\.\d+(\.*\d*)*) - in addition to the CVS revision numbers you were hoping to match, this also matches things like "1.23......33..456" and "1.23." Also, the inner parentheses form a second capturing group, so every match hands back an extra value besides the revision number itself (a repeated group keeps only its last repetition). Again, probably not what you want. To clean this up and capture only one variable for the whole revision number, you'll need to use a "non-capturing" group. It looks like (?:blah) rather than (blah). You'll also need to get rid of all of the * except the one at the end, so that every dot must be followed by at least one digit. The cleaned-up regular expression looks like this: (\d+\.\d+(?:\.\d+)*)
(\d+\.\d+(\.*\d*)*)\s+(\d+\.\d+(\.*\d*)*|NONE)* - this has the same problems as your revision number regex before. But even if these were fixed, the regex would match a revision number, some spaces, and then either a revision number or the word "NONE" repeated an arbitrary number of times (including zero), with no spaces allowed between the repetitions. Is this really your intent? Did you mean to say that revision numbers after the first one always come in pairs? That the word NONE can follow the first revision number found?
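
To see the difference between the two revision number patterns, here is a small sketch you can run. The helper names (matches_fully, all_captures) and the test strings are made up for illustration; only the two patterns come from the discussion above.

```perl
use strict;
use warnings;

# Pattern from the question: nested capturing group, stray * quantifiers.
my $loose = qr/(\d+\.\d+(\.*\d*)*)/;
# Cleaned-up pattern: non-capturing group, a single * at the end.
my $tight = qr/(\d+\.\d+(?:\.\d+)*)/;

# Returns 1 if the whole string (not just part of it) matches the pattern.
sub matches_fully {
    my ($re, $text) = @_;
    return $text =~ /^$re$/ ? 1 : 0;
}

# Returns everything a list-context /g match hands back for the string.
sub all_captures {
    my ($re, $text) = @_;
    my @found = $text =~ /$re/g;
    return @found;
}

for my $candidate ('1.2.3', '1.23.', '1.23......33..456') {
    printf "%-20s loose=%d tight=%d\n", $candidate,
        matches_fully($loose, $candidate), matches_fully($tight, $candidate);
}

# The loose pattern returns two values per match, the tight one only one.
my @loose_found = all_captures($loose, '1.2.3 4.5');
my @tight_found = all_captures($tight, '1.2.3 4.5');
print scalar(@loose_found), " values vs ", scalar(@tight_found), " values\n";
```

The loose pattern accepts all three candidates, while the tight one accepts only "1.2.3". The extra values in @loose_found come from the inner capturing group.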

Putting this all together, here is how we could capture "TEST" and the revision numbers, but skip past "S.txt":

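# Move the match start forward to just after the label.
# In scalar context, g tells Perl to save the point where we stopped
# matching; c keeps that point even if a later match fails.
$var =~ /^FILES CHECKED IN:\s+/gc
    or die "label not found in: $var\n";

# Capture the name (TEST) and move the match start to just after it.
# \G anchors the match at the point saved by the previous match.
$var =~ /\G(\w+)/gc
    or die "name not found in: $var\n";
my $sName = $1;

# Capture the remaining revision numbers (ignore S.txt).
# In list context, g returns every match from the saved point onward.
my @aRevnums = ($var =~ /(\d+\.\d+(?:\.\d+)*)/g);

print "$sName: @aRevnums\n";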
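
If you need to do this for many lines, the same steps can be wrapped in a subroutine. This is only a sketch: the name parse_checkin_line and the sample line are my own inventions, and your real input may be laid out differently.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the name followed by the
# revision numbers, or an empty list if the line lacks the expected label.
sub parse_checkin_line {
    my ($line) = @_;

    # Scalar-context /g records where the match ended; /c keeps that
    # position even if a later match fails.
    return unless $line =~ /^FILES CHECKED IN:\s+/gc;

    # \G anchors this match at the recorded position, right after the label.
    return unless $line =~ /\G(\w+)/gc;
    my $name = $1;

    # List-context /g collects every revision number from here to the end.
    my @revnums = $line =~ /(\d+\.\d+(?:\.\d+)*)/g;
    return ($name, @revnums);
}

# Invented sample line, for illustration only.
my $sample = "FILES CHECKED IN: TEST 1.1 1.2 S.txt 1.3";
my ($name, @revnums) = parse_checkin_line($sample);
print "$name: @revnums\n";
```

For the sample line this prints "TEST: 1.1 1.2 1.3". Returning an empty list on failure lets the caller write something like: my ($n, @r) = parse_checkin_line($line) or next;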

Best, beth

In reply to Re: Repeated Pattern Matching Fails by ELISHEVA
in thread Repeated Pattern Matching Fails by kapila
