Re: Extracting data after a match

One way to do it:

while ( <DATA> ) {
  if ( /^(\d+\/123 11.*?)(?= a )(?:.*?zyx(.*))?/ ) {
    print "THIS IS THE FIRST MATCH: $1" ,$/;
    print "THIS IS THE SECOND MATCH: $2" ,$/ if defined $2;
  }

}

__DATA__
12/123 11-abc 456 a 1/2 zyx this is the last 1
13/123 11-abc 456 a 1/2 zyx this is the last 2
14/456 11-abc 456 a 1/2 zyx this is it
15/456 11-abc 456 a 1/2 zyx this is it
16/123 11-abc 456 a 1/2 zyx this is the last 3
17/123 11-abc 456 a 1/2 xyz this is the last 3

YAPE::Regex::Explain breaks the regex down as follows:

C:\>perl -MYAPE::Regex::Explain -e "print YAPE::Regex::Explain->new('^(\d+\/123 11.*?)(?= a )(?:.*?zyx(.*))?')->explain"
The regular expression:

(?-imsx:^(\d+\/123 11.*?)(?= a )(?:.*?zyx(.*))?)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
----------------------------------------------------------------------
    \/                       '/'
----------------------------------------------------------------------
    123 11                   '123 11'
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (?=                      look ahead to see if there is:
----------------------------------------------------------------------
     a                       ' a '
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
    zyx                      'zyx'
----------------------------------------------------------------------
    (                        group and capture to \2:
----------------------------------------------------------------------
      .*                       any character except \n (0 or more
                               times (matching the most amount
                               possible))
----------------------------------------------------------------------
    )                        end of \2
----------------------------------------------------------------------
  )?                       end of grouping
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
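For comparison, here is a rough sketch of the same pattern written with the /x modifier and named captures, so the explanation sits next to the regex itself. The capture names head and tail and the helper sub extract are my own illustrative choices, not anything from the original question, and named captures assume Perl 5.10 or later.

```perl
use strict;
use warnings;

# Same pattern as above, spelled out with /x and named captures.
# "head", "tail" and extract() are illustrative names only.
my $re = qr{
    ^
    (?<head> \d+ /123 [ ] 11 .*? )   # digits, "/123 11", then as little as possible
    (?= [ ] a [ ] )                  # stop just before " a " (not consumed)
    (?:                              # optionally:
        .*? zyx                      #   skip ahead to "zyx"
        (?<tail> .* )                #   and capture the rest of the line
    )?
}x;

# Returns (head, tail) on a match, or an empty list otherwise.
# tail is undef when the line has no "zyx" after " a ".
sub extract {
    my ($line) = @_;
    return unless $line =~ $re;
    return ( $+{head}, $+{tail} );
}

for my $line (
    '12/123 11-abc 456 a 1/2 zyx this is the last 1',
    '14/456 11-abc 456 a 1/2 zyx this is it',
    '17/123 11-abc 456 a 1/2 xyz this is the last 3',
) {
    my ( $head, $tail ) = extract($line) or next;
    print "head: $head\n";
    print "tail: $tail\n" if defined $tail;
}
```

Under /x, literal spaces have to be written as [ ] (or escaped), which is why the pattern looks a little different from the one-liner.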

Though I think an easier approach might be to do the two matches separately inside an if, like so:

while ( <DATA> ) {
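  if ( /^(\d+\/123 11.*?) a /g ) { # first match; /g records pos() so \G works below
    my $first = $1;
    my $second;                     # declared separately: "my ... if ..." is unreliable
    $second = $1 if /\G.*?zyx(.*)/; # continue from where the first match ended
    print "First = $first\n";
    # print $second only if it is defined
    print "Second = $second\n" if defined $second;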
  }

}

Depending on what your data actually looks like, the regexes might need changing to better suit your needs.

-enlil
