while ( <DATA> ) {
    if ( /^(\d+\/123 11.*?)(?= a )(?:.*?zyx(.*))?/ ) {
        print "THIS IS THE FIRST MATCH: $1" ,$/;
        print "THIS IS THE SECOND MATCH: $2" ,$/ if defined $2;
    }
}
__DATA__
12/123 11-abc 456 a 1/2 zyx this is the last 1
13/123 11-abc 456 a 1/2 zyx this is the last 2
14/456 11-abc 456 a 1/2 zyx this is it
15/456 11-abc 456 a 1/2 zyx this is it
16/123 11-abc 456 a 1/2 zyx this is the last 3
17/123 11-abc 456 a 1/2 xyz this is the last 3
The regex is thus explained by YAPE::Regex::Explain:
C:\>perl -MYAPE::Regex::Explain -e "print YAPE::Regex::Explain->new('^(\d+\/123 11.*?)(?= a )(?:.*?zyx(.*))?')->explain"
The regular expression:
(?-imsx:^(\d+\/123 11.*?)(?= a )(?:.*?zyx(.*))?)
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
----------------------------------------------------------------------
    \/                       '/'
----------------------------------------------------------------------
    123 11                   '123 11'
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (?=                      look ahead to see if there is:
----------------------------------------------------------------------
     a                       ' a '
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
    zyx                      'zyx'
----------------------------------------------------------------------
    (                        group and capture to \2:
----------------------------------------------------------------------
      .*                       any character except \n (0 or more
                               times (matching the most amount
                               possible))
----------------------------------------------------------------------
    )                        end of \2
----------------------------------------------------------------------
  )?                       end of grouping
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
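To make the role of the optional group concrete, here is a small self-contained sketch. It runs the same regex over three lines taken from the sample data and collects the captures; the @lines and @results names are only illustrative. The point it tries to show is that \2 stays undefined when "zyx" is absent, while \1 is still captured.

```perl
use strict;
use warnings;

# Sample lines copied from the __DATA__ section; nothing else is assumed.
my @lines = (
    '12/123 11-abc 456 a 1/2 zyx this is the last 1',    # contains "zyx"
    '17/123 11-abc 456 a 1/2 xyz this is the last 3',    # no "zyx"
    '14/456 11-abc 456 a 1/2 zyx this is it',            # prefix is not /123
);

my $re = qr/^(\d+\/123 11.*?)(?= a )(?:.*?zyx(.*))?/;

my @results;    # one [ $first, $second ] pair per matching line
for my $line (@lines) {
    # In list context the match returns ($1, $2); $2 is undef when the
    # optional group did not take part in the match. No match: skip the line.
    my ( $first, $second ) = $line =~ $re or next;
    push @results, [ $first, $second ];
}

for my $r (@results) {
    print "first  = '$r->[0]'\n";
    print "second = '$r->[1]'\n" if defined $r->[1];
}
```

The quotes around the printed values are there only to make the leading space in the second capture visible.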
Though I think an easier approach might be just using an if construct like so:
while ( <DATA> ) {
    if ( /^(\d+\/123 11.*?) a /g ) {       # first match; /g records pos() so \G can use it
        my $first = $1;
        my $second;                        # declared separately: "my ... if ..." is unreliable
        $second = $1 if /\G.*?zyx(.*)/;    # check for next match, starting after " a "
        print "First = $first\n";
        # print $second only if it is defined
        print "Second = $second\n" if defined $second;
    }
}
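A note on \G: it anchors at pos() of the string, and pos() is only set by a successful match that uses /g in scalar context. The following is a minimal sketch of that behavior on one line from the sample data; the variable names ($pos_plain, $pos_g, $rest) are made up for the illustration.

```perl
use strict;
use warnings;

my $line = '12/123 11-abc 456 a 1/2 zyx this is the last 1';    # from __DATA__

# A plain match (no /g) does not record a position.
my $ok_plain  = $line =~ /^(\d+\/123 11.*?) a /;
my $pos_plain = pos($line);

# The same match with /g in scalar context leaves pos() just after " a ".
my $ok_g  = $line =~ /^(\d+\/123 11.*?) a /g;
my $pos_g = pos($line);

# \G now anchors the next match at that position.
my ($rest) = $line =~ /\G(.*)/;

print "pos after plain match: ", ( defined $pos_plain ? $pos_plain : 'undef' ), "\n";
print "pos after /g match:    ", ( defined $pos_g     ? $pos_g     : 'undef' ), "\n";
print "rest of line from \\G:  '$rest'\n";
```

When pos() is undefined, \G behaves like an anchor at the start of the string, so a pattern such as /\G.*?zyx(.*)/ would still find "zyx" but would not be restricted to the text after the first match.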
Depending on what your data actually looks like, the regexes might need changing to better suit your needs.
-enlil