in reply to Extracting data after a match

one way to do it:
while ( <DATA> ) {
    if ( /^(\d+\/123 11.*?)(?= a )(?:.*?zyx(.*))?/ ) {
        print "THIS IS THE FIRST MATCH: $1" ,$/;
        print "THIS IS THE SECOND MATCH: $2" ,$/ if defined $2;
    }
}
__DATA__
12/123 11-abc 456 a 1/2 zyx this is the last 1
13/123 11-abc 456 a 1/2 zyx this is the last 2
14/456 11-abc 456 a 1/2 zyx this is it
15/456 11-abc 456 a 1/2 zyx this is it
16/123 11-abc 456 a 1/2 zyx this is the last 3
17/123 11-abc 456 a 1/2 xyz this is the last 3
The regex is thus explained by YAPE::Regex::Explain:
C:\>perl -MYAPE::Regex::Explain -e "print YAPE::Regex::Explain->new('^(\d+\/123 11.*?)(?= a )(?:.*?zyx(.*))?')->explain"
The regular expression:

(?-imsx:^(\d+\/123 11.*?)(?= a )(?:.*?zyx(.*))?)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
----------------------------------------------------------------------
    \/                       '/'
----------------------------------------------------------------------
    123 11                   '123 11'
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (?=                      look ahead to see if there is:
----------------------------------------------------------------------
     a                       ' a '
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
    zyx                      'zyx'
----------------------------------------------------------------------
    (                        group and capture to \2:
----------------------------------------------------------------------
      .*                       any character except \n (0 or more
                               times (matching the most amount
                               possible))
----------------------------------------------------------------------
    )                        end of \2
----------------------------------------------------------------------
  )?                       end of grouping
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
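For readers who prefer the explanation next to the code, here is a rough sketch of the same pattern written with the /x modifier. The helper name extract_fields is made up for this illustration and is not part of the original snippet; the sample lines are taken from the __DATA__ section above.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the original post: returns both captures for
# one line, or an empty list when the line does not match at all.
sub extract_fields {
    my ($line) = @_;
    return unless $line =~ m{
        ^ ( \d+ /123 [ ] 11 .*? )   # capture 1: digits, "/123 11", minimal run
        (?= [ ] a [ ] )             # stop just before " a " without consuming it
        (?: .*? zyx ( .* ) )?       # optional: capture 2 is the rest after "zyx"
    }x;
    return ( $1, $2 );
}

for my $line (
    '12/123 11-abc 456 a 1/2 zyx this is the last 1',   # both captures
    '14/456 11-abc 456 a 1/2 zyx this is it',           # no match (456, not 123)
    '17/123 11-abc 456 a 1/2 xyz this is the last 3',   # first capture only
) {
    my ( $first, $second ) = extract_fields($line) or next;
    print "first=[$first]";
    print " second=[$second]" if defined $second;
    print "\n";
}
```

Because the trailing group is optional, a line without "zyx" still matches and simply leaves the second capture undefined, which is why the defined check matters.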
Though I think an easier approach might be just using an if construct like so:
while ( <DATA> ) {
    # /g in scalar context records pos(), so \G below resumes after " a "
    if ( /^(\d+\/123 11.*?) a /g ) {
        my $first = $1;
        # list assignment leaves $second undefined when there is no "zyx"
        my ($second) = /\G.*?zyx(.*)/;
        print "First = $first\n";
        # print $second only if it is defined
        print "Second = $second\n" if defined $second;
    }
}
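One detail worth checking with this approach: \G anchors at pos() of the string, and pos() is only set by a match that uses /g. Without /g on the first match, \G just behaves like an anchor at the start of the string. The sketch below wraps the two-step match in a made-up helper (split_after_a is not from the original snippet) and also returns the resume offset. The second sample line is invented to show a case where resuming after " a " changes the result.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration: two-step match using /g and \G.
# Returns (first, second, resume offset), or an empty list on no match.
sub split_after_a {
    my ($line) = @_;

    # /g in scalar context stores the end of the match in pos($line)
    return unless $line =~ /^(\d+\/123 11.*?) a /g;
    my $first     = $1;
    my $resume_at = pos($line);

    # \G continues from pos($line); list context returns the capture,
    # or an empty list (leaving $second undefined) when "zyx" is absent
    my ($second) = $line =~ /\G.*?zyx(.*)/;

    return ( $first, $second, $resume_at );
}

for my $line (
    '12/123 11-abc 456 a 1/2 zyx this is the last 1',
    '18/123 11-zyx early a 1/2 zyx late',    # invented line: "zyx" before " a "
) {
    my ( $first, $second, $resume_at ) = split_after_a($line) or next;
    print "First = $first (resumed at offset $resume_at)\n";
    print "Second = $second\n" if defined $second;
}
```

For the invented line, the second capture comes from the "zyx" that follows " a ", not from the earlier one, because the search resumes where the first match stopped.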
Depending on what your data actually looks like, the regexes might need changing to better suit your needs.

-enlil