Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am reading a file and need to find several matches within
a line. I have managed the first match, but
the second match is proving rather difficult.

$line1="12/123 11-abc 456 a 1/2 zyx this is the last";
$line2="14/456 11-abc 456 a 1/2 zyx this is it";

I want to match on '123 11' and extract '12/123 11-abc 456',
then check for 'zyx' and, if it matches, extract the rest of the line.

$line1 matches on both counts, and the two extracted pieces are stored in two variables.

$line2 does not match at all, so I proceed to the next line.

Replies are listed 'Best First'.
Re: Extracting data after a match
by Enlil (Parson) on Jun 10, 2003 at 20:30 UTC
    one way to do it:
    while ( <DATA> ) {
        if ( /^(\d+\/123 11.*?)(?= a )(?:.*?zyx(.*))?/ ) {
            print "THIS IS THE FIRST MATCH: $1", $/;
            print "THIS IS THE SECOND MATCH: $2", $/ if defined $2;
        }
    }
    __DATA__
    12/123 11-abc 456 a 1/2 zyx this is the last 1
    13/123 11-abc 456 a 1/2 zyx this is the last 2
    14/456 11-abc 456 a 1/2 zyx this is it
    15/456 11-abc 456 a 1/2 zyx this is it
    16/123 11-abc 456 a 1/2 zyx this is the last 3
    17/123 11-abc 456 a 1/2 xyz this is the last 3
    The regex is thus explained by YAPE::Regex::Explain:
    C:\>perl -MYAPE::Regex::Explain -e "print YAPE::Regex::Explain->new('^(\d+\/123 11.*?)(?= a )(?:.*?zyx(.*))?')->explain"

    The regular expression:

    (?-imsx:^(\d+\/123 11.*?)(?= a )(?:.*?zyx(.*))?)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      ^                        the beginning of the string
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        \d+                      digits (0-9) (1 or more times (matching
                                 the most amount possible))
    ----------------------------------------------------------------------
        \/                       '/'
    ----------------------------------------------------------------------
        123 11                   '123 11'
    ----------------------------------------------------------------------
        .*?                      any character except \n (0 or more times
                                 (matching the least amount possible))
    ----------------------------------------------------------------------
      )                        end of \1
    ----------------------------------------------------------------------
      (?=                      look ahead to see if there is:
    ----------------------------------------------------------------------
         a                       ' a '
    ----------------------------------------------------------------------
      )                        end of look-ahead
    ----------------------------------------------------------------------
      (?:                      group, but do not capture (optional
                               (matching the most amount possible)):
    ----------------------------------------------------------------------
        .*?                      any character except \n (0 or more times
                                 (matching the least amount possible))
    ----------------------------------------------------------------------
        zyx                      'zyx'
    ----------------------------------------------------------------------
        (                        group and capture to \2:
    ----------------------------------------------------------------------
          .*                       any character except \n (0 or more
                                   times (matching the most amount
                                   possible))
    ----------------------------------------------------------------------
        )                        end of \2
    ----------------------------------------------------------------------
      )?                       end of grouping
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
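If you would rather keep that explanation next to the code, the same pattern can be written with the /x modifier, which allows whitespace and comments inside the regex. This is only a sketch: parse_line is a made-up helper name, not something from the post above, and literal spaces are written as [ ] because /x ignores bare whitespace.

```perl
use strict;
use warnings;

# parse_line is a hypothetical helper name used only for this sketch.
# Returns ($first, $second) on a match, or an empty list otherwise.
sub parse_line {
    my ($line) = @_;
    return unless $line =~ m{
        ^                          # start of the line
        ( \d+ / 123 [ ] 11 .*? )   # capture 1: digits, slash, 123 11, then as little as possible
        (?= [ ] a [ ] )            # stop just before the lone a, without consuming it
        (?: .*? zyx ( .* ) )?      # optionally skip ahead to zyx and capture the rest
    }x;
    return ($1, $2);               # $2 is undef when zyx is absent
}
```

Called on the first sample line, this should return '12/123 11-abc 456' and ' this is the last' (note the leading space in the second value). A line without '123 11' returns an empty list.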
    Though I think an easier approach might be just using an if construct like so:
    while ( <DATA> ) {
        if ( /^(\d+\/123 11.*?) a /g ) {    # first matched; /g records pos() so \G has an anchor
            my $first = $1;
            my $second;                      # declared separately: "my ... if ..." is unreliable
            $second = $1 if /\G.*?zyx(.*)/;  # check for next match
            print "First = $first\n";
            # print $second only if it is defined
            print "Second = $second\n" if defined $second;
        }
    }
    Depending on what your data actually looks like, the regexes might need changing to better suit your needs.

    -enlil

Re: Extracting data after a match
by pzbagel (Chaplain) on Jun 10, 2003 at 20:04 UTC

    I don't know whether you want to capture the string before zyx. If you don't care about it, the /g and \G may not be necessary, although they do ensure that you get the right zyx, i.e. one that comes after the start of the record. How about something like:

    $line[0]="12/123 11-abc 456 a 1/2 zyx this is the last";
    $line[1]="14/456 11-abc 456 a 1/2 zyx this is it";
    for (@line) {
        if(m|(\d+/123 11-[a-z]+ \d+)|g) {
            $start=$1;
            /\G(.*?) zyx (.*)$/;
            print "Start of record:$start\nBefore zyx:$1\nand After zyx:$2\n";
        }
    }
    __OUTPUT__
    Start of record:12/123 11-abc 456
    Before zyx: a 1/2
    and After zyx:this is the last
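To see what /g and \G buy you, here is a small sketch of the same two-step match wrapped in a sub. A /g match in scalar context records where it ended in pos(), and \G anchors the next match at that position, so a zyx that appears before the record start is ignored. The name split_record is invented for illustration, and the early return when zyx is missing is my own addition.

```perl
use strict;
use warnings;

# split_record is a hypothetical name used only for this sketch.
# Returns ($start, $before, $after), ($start) alone if no zyx follows,
# or an empty list if the record start is not found.
sub split_record {
    my ($line) = @_;

    # /g in scalar context remembers where this match ended in pos($line).
    return unless $line =~ m{(\d+/123 11-[a-z]+ \d+)}g;
    my $start = $1;

    # \G pins this match to pos($line), so only text after the record
    # start is searched for zyx.
    return ($start) unless $line =~ m{\G(.*?) zyx (.*)};
    return ($start, $1, $2);
}
```

With the first sample line this should give '12/123 11-abc 456', ' a 1/2' and 'this is the last'. For a line such as 'zyx 12/123 11-abc 456 a 1/2', only the start is returned, because the zyx sits before pos().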