Lines above and Below a matched string

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: Lines above and Below a matched string by BrowserUk (Patriarch) on Jun 26, 2003 at 18:48 UTC
If the file is small, slurp the whole file into a scalar and use a regex of the general form `my $data = do{ local (@ARGV, $/) = 'text.file'; <>; }; print "$2,$1,$3" if $data =~ m[(DATE).*?(SOMETHING).*?(SOMETHINGELSE)]s;` The /s modifier allows . to match newlines, so the .*? will span lines. Add /g (and loop with while instead of if) if you expect to find more than one occurrence. If your file is too big to slurp, then you could use a sliding buffer on the file. See Re: split and sysread() for some sample code. Examine what is said, not who speaks. "Efficiency is intelligent laziness." -David Dunham "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
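As a minimal sketch of the slurp-and-match idea with /g, the following collects every triple from text that is already in memory. The helper name `find_triples`, the `\d*` suffixes and the sample data are assumptions made for illustration; DATE, SOMETHING and SOMETHINGELSE stand in for whatever real patterns apply.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one "SOMETHING,DATE,SOMETHINGELSE" string
# for each match found in the slurped text.
sub find_triples {
    my ($data) = @_;
    my @found;
    # /s lets . match newlines; /g resumes after the previous match.
    while ($data =~ m[(DATE\d*).*?(SOMETHING\d*).*?(SOMETHINGELSE\d*)]sg) {
        push @found, "$2,$1,$3";
    }
    return @found;
}

# Made-up sample data standing in for the slurped file.
my $data = "x DATE1 x\nx SOMETHING1 x\nx SOMETHINGELSE1 x\n"
         . "x DATE2 x\nx SOMETHING2 x\nx SOMETHINGELSE2 x\n";
print "$_\n" for find_triples($data);
```

Note that the lazy `.*?` spans any number of lines, so this version does not insist that the three items sit on adjacent lines.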
Re: Lines above and Below a matched string by gjb (Vicar) on Jun 26, 2003 at 19:30 UTC
Just keep track of the last three lines you read, and match against the one before the last you read. `my $line1 = <>; chomp($line1); my $line2 = <>; chomp($line2); while (<>) { chomp($_); my $line3 = $_; if ($line2 =~ /(SOMETHING)/) { my $something = $1; $line1 =~ /(DATE)/; my $date = $1; $line3 =~ /(SOMETHINGELSE)/; my $somethingElse = $1; print "$something,$date,$somethingElse\n"; } $line1 = $line2; $line2 = $line3; }` Note: this is untested code. It will fail on files shorter than three lines. And no, it's not elegant, but it's simple. Hope this helps, -gjb- Update: This is a sliding buffer just as in BrowserUk's approach, but this should work for large files too.
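A hedged variant of the same three-line window, wrapped in a sub so it can be tried on an in-memory filehandle. The name `scan_window`, the `\d*` suffixes and the sample text are assumptions. It differs from the code above in two small ways: it tolerates inputs shorter than three lines, and it captures in list context so a failed match on a neighbouring line cannot leave a stale `$1` behind.

```perl
use strict;
use warnings;

# Hypothetical helper: slides a three-line window over a filehandle and
# returns "SOMETHING,DATE,SOMETHINGELSE" for every middle-line match.
sub scan_window {
    my ($fh) = @_;
    my @out;
    my ($prev, $mid);
    while (my $next = <$fh>) {
        chomp $next;
        # $prev is only defined once three lines have been read.
        if (defined $prev && $mid =~ /(SOMETHING\d*)/) {
            my $something = $1;
            my ($date) = $prev =~ /(DATE\d*)/;
            my ($else) = $next =~ /(SOMETHINGELSE\d*)/;
            push @out, "$something,$date,$else"
                if defined $date && defined $else;
        }
        ($prev, $mid) = ($mid, $next);    # slide the window by one line
    }
    return @out;
}

# Made-up sample text read through an in-memory filehandle.
my $text = "x DATE1 x\nx SOMETHING1 x\nx SOMETHINGELSE1 x\nnoise\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
print "$_\n" for scan_window($fh);
close $fh;
```

Only three lines are held at any time, which is why this approach stays cheap on large files.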
Re: Lines above and Below a matched string by svsingh (Priest) on Jun 26, 2003 at 19:57 UTC
Here's something a little different. I know you want to match on SOMETHING, but since you'll also need to match on DATE and SOMETHINGELSE (I'm assuming), why not match all three in sequence? To me, this is a little simpler than the buffer ideas. It probably has some flaws, but it seems to be working. `my $tmp; while ($tmp = <DATA>) { if ($tmp =~ /(DATE\d)/) { my %h = (); $h{'date'} = $1; $tmp = <DATA>; if ($tmp =~ /(SOMETHING\d)/) { $h{'match'} = $1; $tmp = <DATA>; if ($tmp =~ /(SOMETHINGELSE\d)/) { $h{'more'} = $1; print "$h{'match'},$h{'date'},$h{'more'}\n"; } else { redo; } } else { redo; } } } __DATA__ anything anything DATE1 anything anything anything SOMETHING1 anything anything anything SOMETHINGELSE1 anything anything DATE2 anything anything anything DATE3 anything anything anything SOMETHING2 anything anything anything SOMETHINGELSE2 anything anything DATE4 anything anything anything SOMETHING3 anything anything DATE5 anything anything anything SOMETHING4 anything anything anything SOMETHINGELSE3` That returns: `SOMETHING1,DATE1,SOMETHINGELSE1 SOMETHING2,DATE3,SOMETHINGELSE2 SOMETHING4,DATE5,SOMETHINGELSE3` Hope this helps.
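In the code above, `redo` restarts the loop body without reading a new line, so a line that breaks the sequence is re-tested as a possible DATE. The sketch below expresses the same idea as a small state machine rather than with `redo`. The name `match_in_sequence` is hypothetical, and the line breaks in the sample data are an assumption (one keyword per line), since the original listing lost them.

```perl
use strict;
use warnings;

# Hypothetical helper: looks for DATE, SOMETHING and SOMETHINGELSE on three
# consecutive lines, restarting whenever the sequence is broken.
sub match_in_sequence {
    my ($fh) = @_;
    my @patterns = (qr/(DATE\d)/, qr/(SOMETHING\d)/, qr/(SOMETHINGELSE\d)/);
    my (@got, @out);
    while (my $line = <$fh>) {
        if ($line =~ $patterns[ scalar @got ]) {
            push @got, $1;            # the expected next item was found
        }
        elsif ($line =~ $patterns[0]) {
            @got = ($1);              # a fresh DATE restarts the sequence
        }
        else {
            @got = ();                # anything else resets it
        }
        if (@got == 3) {
            push @out, "$got[1],$got[0],$got[2]";
            @got = ();
        }
    }
    return @out;
}

# Sample data modelled on the post; the line breaks are assumed.
my $text = join "\n",
    'anything DATE1 anything', 'anything SOMETHING1 anything',
    'anything SOMETHINGELSE1 anything', 'anything DATE2 anything',
    'anything DATE3 anything', 'anything SOMETHING2 anything',
    'anything SOMETHINGELSE2 anything', 'anything DATE4 anything',
    'anything SOMETHING3 anything', 'anything DATE5 anything',
    'anything SOMETHING4 anything', 'anything SOMETHINGELSE3', '';
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
print "$_\n" for match_in_sequence($fh);
close $fh;
```

With that data it yields the same three lines the post reports. Unlike the window approaches, it requires DATE and SOMETHINGELSE to be present on the neighbouring lines before it reports anything.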
Re: Lines above and Below a matched string by Itatsumaki (Friar) on Jun 26, 2003 at 18:37 UTC
One way is to read your file in chunks of three lines. If line 2 matches the format you need, then you go ahead and process lines 1 & 3, otherwise go to the next chunk of three. -Tats
Re: Re: Lines above and Below a matched string by derby (Abbot) on Jun 26, 2003 at 18:46 UTC
Kinda yes and kinda no. If line 2 doesn't match, you then have to make line three the first line of your buffer and read in two more lines. -derby
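To illustrate the objection, here is a small sketch of a scan over fixed, non-overlapping chunks of three. The helper name `scan_chunks`, the `\d*` suffixes and the sample lines are assumptions. The same record is found when it lines up with a chunk boundary and missed when a single extra line shifts it.

```perl
use strict;
use warnings;

# Hypothetical helper: scans a list of lines in fixed, non-overlapping
# chunks of three and reports chunks whose middle line matches.
sub scan_chunks {
    my @lines = @_;
    my @out;
    while (@lines >= 3) {
        my ($first, $second, $third) = splice @lines, 0, 3;
        next unless $second =~ /(SOMETHING\d*)/;
        my $something = $1;
        my ($date) = $first =~ /(DATE\d*)/;
        my ($else) = $third =~ /(SOMETHINGELSE\d*)/;
        push @out, "$something,$date,$else"
            if defined $date && defined $else;
    }
    return @out;
}

# Made-up record: three consecutive lines of interest.
my @record = ('x DATE1 x', 'x SOMETHING1 x', 'x SOMETHINGELSE1 x');

my @aligned = scan_chunks(@record);             # record fills one chunk
my @shifted = scan_chunks('header', @record);   # record straddles two chunks
print scalar(@aligned), " match(es) when aligned\n";
print scalar(@shifted), " match(es) when shifted by one line\n";
```

This is why the window has to advance one line at a time rather than three.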
Re: Re: Lines above and Below a matched string by Itatsumaki (Friar) on Jun 26, 2003 at 19:30 UTC
derby pointed out the problem with my approach. I think BrowserUk's idea of sliding buffers is probably best, but you can salvage my approach by opening three file-handles and having them in different frames. In code: `open(IN1, "<$infile"); open(IN2, "<$infile"); open(IN3, "<$infile"); my $temp; $temp = <IN2>; # remove one dummy line from 2nd file-handle $temp = <IN3>; $temp = <IN3>; # remove two dummy lines from 3rd file-handle # Now process all three file-handles separately # in chunks of three lines` And there was a recent node on how to read a file in chunks of n lines, that may help. -Tats
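One way the staggered file-handle idea could be completed is sketched below, reading the three handles in lockstep one line at a time. The helper name `scan_three_handles`, the `\d*` suffixes and the sample text are assumptions. It uses lexical handles and three-argument `open`, and accepts either a file name or a reference to an in-memory string.

```perl
use strict;
use warnings;

# Hypothetical helper: opens the same source three times, staggers the
# handles by 0, 1 and 2 lines, then reads them in lockstep.
sub scan_three_handles {
    my ($source) = @_;    # file name, or reference to an in-memory string
    my @fh;
    for my $skip (0 .. 2) {
        open my $fh, '<', $source or die "cannot open input: $!";
        scalar readline($fh) for 1 .. $skip;    # discard $skip leading lines
        push @fh, $fh;
    }
    my @out;
    # The third handle runs two lines ahead, so it reaches EOF first.
    while (defined(my $third = readline($fh[2]))) {
        my $first  = readline($fh[0]);
        my $second = readline($fh[1]);
        next unless $second =~ /(SOMETHING\d*)/;
        my $something = $1;
        my ($date) = $first =~ /(DATE\d*)/;
        my ($else) = $third =~ /(SOMETHINGELSE\d*)/;
        push @out, "$something,$date,$else"
            if defined $date && defined $else;
    }
    close $_ for @fh;
    return @out;
}

# Made-up sample text; a real call would pass a file name instead.
my $text = "header\nx DATE1 x\nx SOMETHING1 x\nx SOMETHINGELSE1 x\n";
print "$_\n" for scan_three_handles(\$text);
```

Each handle advances one line per iteration, so every line gets a turn as the middle line. The cost is reading the input three times.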