Repeated Pattern Matching Fails

kapila has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Repeated Pattern Matching Fails by ELISHEVA (Prior) on Apr 14, 2009 at 10:35 UTC
The immediate reason that this very complex regex stops where it does is that it doesn't take into account the possibility of "S.txt" in the middle of the line. It stops matching when it bumps into that part of the line, so only matches in the part before "S.txt" get printed out.

However, even without that annoying "S.txt", there are several other problems with your regex, and I also wonder if we might have an XY Problem. Judging by your response to targetsmart and the code you've shown above, you seem to just want to strip the introductory label and replace runs of spaces with "==". If that is the case, then why not do something simple like this:

`my $sNoLabel = substr($var, length("FILES CHECKED IN:"));
$sNoLabel =~ s/\s+/==/g;
print "MATCH$sNoLabel\n";`

On the other hand, if you wanted to do something else, e.g. print out "TEST" and the version numbers but skip past that pesky "S.txt", then it might help to understand a bit better what the regex you currently have is doing. I have a feeling it is not what you think:

`[\s]` - the square brackets are OK but unnecessary. You only need square brackets when you want to match any one of several characters or classes. Even though `\s` matches all sorts of whitespace, it is a single class, so plain `\s` is enough. On the other hand, if you want to match the letter a and all whitespace, then you would need square brackets, like this: `[\sa]` or `[a\s]`.

`(.*?)` - were you hoping to match TEST? A non-greedy wildcard expression is almost always a problem. Please try to avoid this construct: it rarely matches what one thinks, and there are usually much better choices. In this case, you might consider something like `\w+` or `\S+`. `\w` matches any "word" character, i.e. letter, digit, or underscore. `\S` matches any non-whitespace character.

`(\d+\.\d+(\.\d))` - in addition to the CVS revision numbers you were hoping to match, this also matches things like "1.23......33..456" and "1.23.". Also, if the version number has more than 2 segments, the inner parentheses capture the extra segments into a separate variable of their own (and a repeated group keeps only its last repetition). Again, probably not what you want. To clean this up and capture only one variable for the whole revision number, you'll need to use "non-capturing" groups. They look like `(?:blah)` rather than `(blah)`. You'll also need to get rid of all of the `*` except the one at the end. The cleaned-up regular expression looks like this: `(\d+\.\d+(?:\.\d+)*)`

`(\d+\.\d+(\.\d))\s+(\d+\.\d+(\.\d)|NONE)` - this has the same problems as your revision number regex before. But even if these were fixed, the regex would match either "revnum spaces revnum" or the word "NONE" an arbitrary number of times. Is this really your intent? Did you mean to say that revision numbers after the first one always come in pairs? That the word NONE can follow the first revision number found?

Putting this all together, here is how we would capture "TEST" and the revision numbers, but skip past "S.txt":

`#move match start forward to after label
# g tells Perl to save the point where we stopped matching
$var =~ /^FILES CHECKED IN:\s+/g;

#capture the name that follows the label (TEST)
my ($sName) = ($var =~ /(\w+)/g);

#capture the remaining revision numbers (ignore S.txt)
my @aRevnums = ($var =~ /(\d+\.\d+(?:\.\d+)*)/g);
print "$sName: @aRevnums\n";`

Best, beth
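To make the capturing versus non-capturing point from the reply above concrete, here is a minimal sketch. The sample `$line` is an assumption (the original `$var` is not shown in the thread), and the variable names are chosen only for illustration. It compares what a list-context `/g` match returns with `(...)` and with `(?:...)` as the inner group.

```perl
use strict;
use warnings;

# Assumed sample input; the poster's real $var is not shown in the thread.
my $line = "FILES CHECKED IN: TEST 1.23 1.3.4.5 S.txt 2.1";

# Capturing inner group: list-context /g returns $1 AND $2 for every match.
# The repeated group keeps only its last repetition (undef if it never ran).
my @with_capture = $line =~ /(\d+\.\d+(\.\d+)*)/g;

# Non-capturing inner group: exactly one element per revision number.
my @revnums = $line =~ /(\d+\.\d+(?:\.\d+)*)/g;

print scalar(@with_capture), " elements with (...)\n";
print scalar(@revnums), " elements with (?:...): @revnums\n";
```

With the capturing version, every revision number drags a second list element along with it, which is why the non-capturing form is easier to work with when all you want is a flat list of revision numbers.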
Re^2: Repeated Pattern Matching Fails by kapila (Acolyte) on Apr 14, 2009 at 11:40 UTC
Thanks, Beth and Parv, for your replies. Beth, thanks for correcting my pattern. I basically need a pattern that matches lines like this:

`$var="Files Checked IN: test/abc.txt 1.23 1.3.4.5 team/hello.cpp 2.1 NONE Clear/thing.pl NONE 1.2.34 etc etc";`
Re^3: Repeated Pattern Matching Fails by shmem (Chancellor) on Apr 14, 2009 at 12:11 UTC
You don't need a pattern for that, split will do:

`use Data::Dumper;
my %found;
while (<DATA>) {
    my ($file, @vers) = split " ";
    # -f keeps only entries that name a file existing on disk
    if (-f $file) {
        $found{$file} = [ @vers ];
    }
}
print Dumper(\%found);
__DATA__
Files Checked IN:
test/abc.txt 1.23 1.3.4.5
team/hello.cpp 2.1 NONE
Clear/thing.pl NONE 1.2.34
etc etc`

Output:

`$VAR1 = {
    'Clear/thing.pl' => [ 'NONE', '1.2.34' ],
    'team/hello.cpp' => [ '2.1', 'NONE' ],
    'test/abc.txt' => [ '1.23', '1.3.4.5' ]
};`

You write: "Need to call this pattern in one line..(requirement of my script)". Would you mind telling us why you have such a requirement?

<update> Anyway, you want the /g modifier on your pattern, and thus you can't say

`if ( $var =~ m{$pattern}g ) {`

because that will match only once (but without resetting the match location). If you really have to match globally and evaluate the result in a scalar (or boolean) context, you have to write something like

`if ( (@matches = $var =~ m{$pattern}g ) > 0) {`

which is ugly and not very readable. Anyway, here it is...

`$var="Files Checked IN: test/abc.txt 1.23 1.3.4.5 team/hello.cpp 2.1 NONE Clear/thing.pl NONE 1.2.34 etc etc";
my $pattern = '([\w./]+)\s+([\d.]+|NONE)\s+([\d.]+|NONE)';
if ( ( @matches = $var =~ m{$pattern}g ) > 0 ) {
    print join('|', @matches), $/;
}`
</update>
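If the one-line requirement can be relaxed, one alternative to packing a list assignment into the condition is to run the same pattern with `/g` in scalar context inside a `while` loop, where each pass resumes from the previous match position. This is only a sketch and not code from the thread: the sub name `parse_checkins` and the hash layout are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): map each file to its two revisions.
sub parse_checkins {
    my ($line) = @_;
    my %found;
    # In scalar context, each m//g resumes where the previous match ended,
    # so the loop body runs once per "file rev rev" triple.
    while ( $line =~ m{([\w./]+)\s+([\d.]+|NONE)\s+([\d.]+|NONE)}g ) {
        $found{$1} = [ $2, $3 ];
    }
    return \%found;
}

my $var = "Files Checked IN: test/abc.txt 1.23 1.3.4.5 team/hello.cpp 2.1 NONE "
        . "Clear/thing.pl NONE 1.2.34 etc etc";
my $found = parse_checkins($var);
for my $file ( sort keys %$found ) {
    print "$file: @{ $found->{$file} }\n";
}
```

Compared with the flat `@matches` list, this keeps each file name paired with its own revisions, at the cost of a few more lines.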
Re^4: Repeated Pattern Matching Fails by kapila (Acolyte) on Apr 15, 2009 at 08:51 UTC
Re^5: Repeated Pattern Matching Fails by parv (Parson) on Apr 15, 2009 at 10:10 UTC
Re: Repeated Pattern Matching Fails by targetsmart (Curate) on Apr 14, 2009 at 08:17 UTC
`@match = $var =~ /\d+\.\d+/g;`

Will this help you? Again, what exactly do you want to match?

Vivek -- In accordance with the prarabdha of each, the One whose function it is to ordain makes each to act. What will not happen will never happen, whatever effort one may put forth. And what will happen will not fail to happen, however much one may seek to prevent it. This is certain. The part of wisdom therefore is to stay quiet.
Re^2: Repeated Pattern Matching Fails by kapila (Acolyte) on Apr 14, 2009 at 09:34 UTC
Thanks, Vivek, for the reply. But this pattern will not capture the word string. The pattern I wrote matches only up to a certain point instead of matching repeatedly. If I do `m/($pattern)*/g` it won't work. What do you suggest?
Re: Repeated Pattern Matching Fails by parv (Parson) on Apr 14, 2009 at 10:12 UTC
Quite an unusual code style you seem to have adopted, if the code is indeed indicative of a style. Could you please post an example of `$var` for which the pattern does not match more than once? Also, would you please include everything (an example of each line, that is) that you are trying to match? Please post the expected output for your example input, too.