Extracting data after a match

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: Extracting data after a match by Enlil (Parson) on Jun 10, 2003 at 20:30 UTC
One way to do it:

```perl
while ( <DATA> ) {
    if ( /^(\d+\/123 11.*?)(?= a )(?:.*?zyx(.*))?/ ) {
        print "THIS IS THE FIRST MATCH: $1", $/;
        print "THIS IS THE SECOND MATCH: $2", $/ if defined $2;
    }
}
__DATA__
12/123 11-abc 456 a 1/2 zyx this is the last 1
13/123 11-abc 456 a 1/2 zyx this is the last 2
14/456 11-abc 456 a 1/2 zyx this is it
15/456 11-abc 456 a 1/2 zyx this is it
16/123 11-abc 456 a 1/2 zyx this is the last 3
17/123 11-abc 456 a 1/2 xyz this is the last 3
```

The regex is explained by YAPE::Regex::Explain as follows:

```
C:\>perl -MYAPE::Regex::Explain -e "print YAPE::Regex::Explain->new('^(\d+\/123 11.*?)(?= a )(?:.*?zyx(.*))?')->explain"
The regular expression:

(?-imsx:^(\d+\/123 11.*?)(?= a )(?:.*?zyx(.*))?)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
----------------------------------------------------------------------
    \/                       '/'
----------------------------------------------------------------------
    123 11                   '123 11'
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (?=                      look ahead to see if there is:
----------------------------------------------------------------------
     a                       ' a '
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
----------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
    zyx                      'zyx'
----------------------------------------------------------------------
    (                        group and capture to \2:
----------------------------------------------------------------------
      .*                       any character except \n (0 or more
                               times (matching the most amount
                               possible))
----------------------------------------------------------------------
    )                        end of \2
----------------------------------------------------------------------
  )?                       end of grouping
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
```

Though I think an easier approach might be to do it in two steps with an if construct, like so:

```perl
while ( <DATA> ) {
    if ( /^(\d+\/123 11.*?) a /g ) {    # first match; /g records pos() so \G has an anchor
        my $first = $1;
        # check for the next match, starting where the first one ended
        my ($second) = /\G.*?zyx(.*)/;
        print "First = $first\n";
        # print $second only if it is defined
        print "Second = $second\n" if defined $second;
    }
}
```

Depending on what your data actually looks like, the regexes might need changing to better suit your needs.

-enlil
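For readers who prefer the explanation inside the pattern itself, here is a minimal sketch of the same regex written with the /x modifier. The sub name `extract_fields` is hypothetical and not part of the original reply. Under /x, literal spaces have to be written as `[ ]` because bare whitespace in the pattern is ignored.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (first, second) for a matching line,
# or an empty list when the line does not match at all.
sub extract_fields {
    my ($line) = @_;
    return unless $line =~ m{
        ^
        ( \d+ / 123 [ ] 11 .*? )   # capture 1: digits, "/123 11", then as little as possible
        (?= [ ] a [ ] )            # stop just before " a " without consuming it
        (?:
            .*? zyx                # skip ahead to the first "zyx"
            ( .* )                 # capture 2: the rest of the line
        )?                         # the whole "zyx" part is optional
    }x;
    return ($1, $2);    # second element is undef when "zyx" is absent
}

my ($first, $second) = extract_fields("12/123 11-abc 456 a 1/2 zyx this is the last 1");
print "First = $first\n";
print "Second = $second\n" if defined $second;
```

Because the trailing group is optional, a line without "zyx" still matches and simply leaves the second value undefined, which is why the caller checks `defined` before printing.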
Re: Extracting data after a match by pzbagel (Chaplain) on Jun 10, 2003 at 20:04 UTC
I don't know whether you want to capture the string before zyx. If you don't care about it, the /g and \G may not be necessary, although they do ensure that you get the right zyx. How about something like:

```perl
$line[0] = "12/123 11-abc 456 a 1/2 zyx this is the last";
$line[1] = "14/456 11-abc 456 a 1/2 zyx this is it";

for (@line) {
    if ( m|(\d+/123 11-[a-z]+ \d+)|g ) {
        $start = $1;
        /\G(.*?) zyx (.*)$/;
        print "Start of record:$start\nBefore zyx:$1\nand After zyx:$2\n";
    }
}
__OUTPUT__
Start of record:12/123 11-abc 456
Before zyx: a 1/2
and After zyx:this is the last
```
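The /g and \G interplay mentioned above can be sketched as a small sub. This is only an illustration under assumptions: the name `split_record` is hypothetical, and the patterns are the ones from the post. A scalar-context match with /g leaves pos() just past the match, and \G anchors the next match at that position, so "zyx" is only looked for after the record start.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (start, before_zyx, after_zyx),
# or an empty list when the record start is not found.
sub split_record {
    my ($line) = @_;

    # Scalar-context /g: on success, pos($line) sits right after the match.
    return unless $line =~ m|(\d+/123 11-[a-z]+ \d+)|g;
    my $start = $1;

    # \G ties this match to pos($line). In list context the match returns
    # its captures, or an empty list (both values undef) if it fails.
    my ($before, $after) = $line =~ /\G(.*?) zyx (.*)$/;

    return ($start, $before, $after);
}

my ($start, $before, $after) =
    split_record("12/123 11-abc 456 a 1/2 zyx this is the last");
print "Start of record:$start\n";
print "Before zyx:$before\nand After zyx:$after\n" if defined $after;
```

Assigning the captures in list context avoids reading a stale $1 or $2 when the second match fails, which the inline version would do silently.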