in reply to Regex problem
if( $stdout =~ /(?:OVERVIEW|SUMMARY)(.+)(?:AFFECTED\sPRODUCTS|BACKGROUND)/s ) { print "$1\n"; }
You said you want to pick out everything between "OVERVIEW" and "AFFECTED PRODUCTS", so why does your regex also look for "SUMMARY" and "BACKGROUND"?
Here's what's happening: Your first grouping looks for OVERVIEW or SUMMARY. Your last grouping looks for AFFECTED PRODUCTS or BACKGROUND. Between those, your capture of (.+) is greedy, so it will try to match as much as possible. So your first grouping matches OVERVIEW, then your capture greedily matches all the way to the end of the string, then starts working backwards until it finds a point where your last grouping can match. Since BACKGROUND comes later in the string than AFFECTED PRODUCTS, BACKGROUND gets matched.
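A minimal sketch of that backtracking, assuming input of roughly this shape. The sample text is invented for illustration, since the real contents of $stdout weren't posted; all that matters is that AFFECTED PRODUCTS appears before BACKGROUND.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $stdout; the actual advisory text was not shown.
my $stdout = "OVERVIEW\nA flaw exists.\nAFFECTED PRODUCTS\nWidget 1.0\n"
           . "BACKGROUND\nWidgets are common.\n";

my $greedy;

# Same regex as in the question: the capture (.+) is greedy.
if ( $stdout =~ /(?:OVERVIEW|SUMMARY)(.+)(?:AFFECTED\sPRODUCTS|BACKGROUND)/s ) {
    $greedy = $1;
    # The capture runs past AFFECTED PRODUCTS and stops only at BACKGROUND,
    # because that is the last place the final grouping can match.
    print "captured: [$greedy]\n";
}
```

With this sample, the captured text still contains the AFFECTED PRODUCTS heading and the line after it, which is the symptom described in the question.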
To put it another way, by matching BACKGROUND instead of AFFECTED PRODUCTS, the regex is able to give the longest possible string to your greedy capture in the middle. To fix it, the short answer is to make that captured match non-greedy by changing it to (.+?). A better answer would require understanding why you have those other words in there if you don't need to match them.
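Here is a sketch of the non-greedy version, again run against made-up sample text (the real $stdout wasn't posted). The second pattern is only a guess at what you'd want if the SUMMARY and BACKGROUND alternatives turn out to be unnecessary.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $stdout; the actual advisory text was not shown.
my $stdout = "OVERVIEW\nA flaw exists.\nAFFECTED PRODUCTS\nWidget 1.0\n"
           . "BACKGROUND\nWidgets are common.\n";

my ( $lazy, $simple );

# (.+?) is non-greedy: it stops at the first place the last grouping matches.
if ( $stdout =~ /(?:OVERVIEW|SUMMARY)(.+?)(?:AFFECTED\sPRODUCTS|BACKGROUND)/s ) {
    $lazy = $1;
    print "non-greedy: [$lazy]\n";
}

# Assumption: if only OVERVIEW .. AFFECTED PRODUCTS matters, drop the alternatives.
if ( $stdout =~ /OVERVIEW(.+?)AFFECTED\sPRODUCTS/s ) {
    $simple = $1;
    print "simplified: [$simple]\n";
}
```

Both patterns capture only the text between OVERVIEW and AFFECTED PRODUCTS for this sample. Note that the non-greedy version with alternatives would stop at BACKGROUND instead if BACKGROUND happened to come first in the input.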
Aaron B.
Available for small or large Perl jobs; see my home node.
Re^2: Regex problem
by jayto (Acolyte) on Aug 01, 2012 at 13:03 UTC |