eh3civic has asked for the wisdom of the Perl Monks concerning the following question:
I have some large data sets with oddly formatted lines, and I don't need much of the data. The general rule is that I need to keep every line that starts with "john#" when it is followed by 2 to 5 other lines ({2,5} in regex terms) and then a line starting with "jacob - \d\.0".
bhgfsggdsgsg
--
john1
weruwearnwrnweuarar
jjafdaiuweifweofiuwe
jacob - 1.0
--
nfaslf23523525
john2
asfsjldf43tgre
john3
asbdfhskafbv3333v
sdfahh34ttg
sadfhk34t3wtg
sdfhk3gfwghhw3
jacob - 2.0
The output that I need would look like this:
john1 > jacob - 1.0
john3 > jacob - 2.0
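The rule above (a john# line, then 2 to 5 other lines, then a jacob line) can be sketched as a line-by-line scan. This is only a minimal sketch under assumptions that are not in the post: each item of the sample sits on its own line, a newer john# line replaces an older candidate, and `john_jacob_pairs` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "john# > jacob - N.0" strings for every john#
# line that is followed by 2 to 5 other lines and then a jacob line.
sub john_jacob_pairs {
    my @lines = @_;
    my @pairs;
    my ($john, $gap);    # most recent john# line, and lines seen since it

    for my $line (@lines) {
        if ($line =~ /^john\d+/) {
            # Assumption: a newer john# line replaces the older candidate.
            ($john, $gap) = ($line, 0);
        }
        elsif ($line =~ /^jacob - \d\.0/) {
            push @pairs, "$john > $line"
                if defined $john && $gap >= 2 && $gap <= 5;
            undef $john;    # a jacob line closes the current candidate either way
        }
        elsif (defined $john) {
            $gap++;         # an ordinary line between john# and jacob
        }
    }
    return @pairs;
}

my @sample = (
    'bhgfsggdsgsg', '--', 'john1', 'weruwearnwrnweuarar',
    'jjafdaiuweifweofiuwe', 'jacob - 1.0', '--', 'nfaslf23523525',
    'john2', 'asfsjldf43tgre', 'john3', 'asbdfhskafbv3333v',
    'sdfahh34ttg', 'sadfhk34t3wtg', 'sdfhk3gfwghhw3', 'jacob - 2.0',
);

print "$_\n" for john_jacob_pairs(@sample);
```

On the sample data this prints the two wanted pairs and skips john2, which has only one line before the next john#. For a large file the same loop could read from a filehandle instead of an array, so the whole file never has to be held in memory.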
Obviously my real data is a little different from this. I have regexes that pull out exactly what I need, just not in the way I want. I can't figure out how to tell it to take a john# only when it is followed by a jacob; I don't want to keep a john# unless a jacob follows it. For instance, the code below is meant to handle the case where there are 3 lines between them. I know it isn't right, but the multi-match, multi-line thing has me confused.
if ($line1 =~ /(john)^.{1,100}$^.{1,100}$^.{1,100}$^(jacob \- \d\.0)/s) {
    ($JOHN, $JACOB) = ($1, $2);
    print MYOUTPUTFILE1 "$JOHN - $JACOB";
}
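In Perl, `^` and `$` match at internal line boundaries only under the `/m` modifier, and `/s` lets `.` match newlines, so the pattern above cannot step from one line to the next the way it intends. One possible single-regex version is sketched below. It assumes the whole input has been read into one string (here a hypothetical `$text`), that each sample item sits on its own line, and that the lines in between must not themselves start with john# or jacob; none of this is confirmed by the post.

```perl
use strict;
use warnings;

# Assumption: the whole input is available as one string.
my $text = <<'END';
bhgfsggdsgsg
--
john1
weruwearnwrnweuarar
jjafdaiuweifweofiuwe
jacob - 1.0
--
nfaslf23523525
john2
asfsjldf43tgre
john3
asbdfhskafbv3333v
sdfahh34ttg
sadfhk34t3wtg
sdfhk3gfwghhw3
jacob - 2.0
END

my $re = qr/
    ^(john\d+) .* \n                         # a line starting with john plus digits
    (?: (?!john\d|jacob\ -\ ) .* \n ){2,5}   # 2 to 5 lines that are neither john nor jacob lines
    (jacob\ -\ \d\.0)                        # then a line starting with "jacob - N.0"
/mx;

my @pairs;
while ($text =~ /$re/g) {
    push @pairs, "$1 > $2";    # $1 is the john token, $2 the jacob token
}

print "$_\n" for @pairs;
```

The `{2,5}` quantifier counts whole lines because each repetition consumes exactly one `.*\n`, and the negative lookahead keeps an earlier john# from being paired across a later one. For very large files, a line-by-line scan may be easier on memory than slurping everything into one string.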
Replies are listed 'Best First'.
Re: Matching consecutive "different" regex patterns across multiple lines
by Anonymous Monk on Apr 23, 2011 at 11:53 UTC
by duelafn (Parson) on Apr 23, 2011 at 13:04 UTC
Re: Matching consecutive "different" regex patterns across multiple lines
by wind (Priest) on Apr 23, 2011 at 15:57 UTC