PerlMonks  

Re: grab 'n' lines from a file above and below a /match/

by mrpeabody (Friar)
on Sep 17, 2004 at 04:35 UTC (id://391659)


in reply to grab 'n' lines from a file above and below a /match/

Obligatory Tie::File solution. I haven't done any benchmarks, but I would guess it's as fast as the other Perl solutions while being less memory-intensive.

As others have said, /bin/grep is the way to go here.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Tie::File;
    use Fcntl 'O_RDONLY';

    my $DEBUG   = 0;
    my $text    = qr/c9391b56-b174-441b-921c-7d63/;
    my $file    = 'GWSvc.log';
    my $context = 3;

    sub dprint { print @_ if $DEBUG }

    # Tie the file read-only; Tie::File fetches lines on demand.
    my @lines;
    tie @lines, 'Tie::File', $file, mode => O_RDONLY
        or die "tie failed: $!";

    for (my $i = 0; $i <= $#lines; $i++) {
        dprint "SCAN: line $i\n";
        if ($lines[$i] =~ /$text/) {
            dprint "MATCH at line $i\n";
            # Clamp the context window to the bounds of the file,
            # so a match near either end doesn't touch missing lines.
            my $start = $i - $context;
            $start = 0 if $start < 0;
            my $end = $i + $context;
            $end = $#lines if $end > $#lines;
            for my $j ($start .. $end) {
                dprint "$j: ";
                print "$lines[$j]\n";
            }
            print "\n";
            # Skip past the trailing context that was just printed.
            $i += $context;
        }
    }

Re^2: grab 'n' lines from a file above and below a /match/
by Aristotle (Chancellor) on Sep 17, 2004 at 06:23 UTC

    It's actually slower and more memory-intensive than any of the other solutions. Tie::File internally keeps a list of byte offsets for all the lines, and it carries a lot of additional overhead meant to optimize writes, which you never make use of.
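To make the memory argument concrete, here is a minimal sketch of a per-line byte-offset table, roughly the kind of index described above. It is not Tie::File's actual code (Tie::File fills its table lazily and also maintains a read cache and deferred-write support); `build_offsets` and `fetch_line` are hypothetical names, and an in-memory filehandle stands in for GWSvc.log.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw( :seek );

# Hypothetical helper: record the byte offset at which every line starts.
# After a full scan the table holds one entry per line (plus one for EOF),
# so its size grows with the length of the file.
sub build_offsets {
    my ($fh) = @_;
    seek $fh, 0, SEEK_SET or die "seek failed: $!";
    my @offsets = (0);
    while (<$fh>) {
        push @offsets, tell $fh;
    }
    return \@offsets;
}

# Hypothetical helper: random access to line $n (0-based) via the table.
sub fetch_line {
    my ($fh, $offsets, $n) = @_;
    return if $n < 0 || $n >= $#$offsets;
    seek $fh, $offsets->[$n], SEEK_SET or die "seek failed: $!";
    my $line = <$fh>;
    return $line;
}

my $data = "alpha\nbeta\ngamma\n";
open my $fh, '<', \$data or die "open failed: $!";
my $offsets = build_offsets($fh);
print "offsets: @$offsets\n";                     # offsets: 0 6 11 17
print "line 1: ", fetch_line($fh, $offsets, 1);   # line 1: beta
```

Every line costs one stored offset even though a context search only ever looks a few lines back.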

    Your code also doesn't get the edge cases right: if a match falls within $context lines after the previous one, it will be missed.
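One way to give every match its full context without printing any line twice is to compute each match's window and merge windows that overlap or touch, much as GNU grep's -C option groups nearby matches. A minimal sketch, assuming the lines are already in an array; `context_ranges` is a hypothetical name and the sample data is made up:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return [start, end] index pairs (inclusive) covering
# every match plus $context lines on either side. Windows that overlap or
# are adjacent are merged, so a match inside a previous window still gets
# its own trailing context and no line is listed twice.
sub context_ranges {
    my ($lines, $rx, $context) = @_;
    my @ranges;
    for my $i (0 .. $#$lines) {
        next unless $lines->[$i] =~ $rx;
        my $start = $i - $context;
        $start = 0 if $start < 0;
        my $end = $i + $context;
        $end = $#$lines if $end > $#$lines;
        if (@ranges && $start <= $ranges[-1][1] + 1) {
            # Overlaps or touches the previous window: extend it.
            $ranges[-1][1] = $end if $end > $ranges[-1][1];
        }
        else {
            push @ranges, [ $start, $end ];
        }
    }
    return @ranges;
}

my @lines = map { "line $_" } 0 .. 19;
$lines[$_] .= ' MATCH' for 5, 7, 15;

# Matches at 5 and 7 share one group; ranges are [3,9] and [13,17].
for my $r (context_ranges(\@lines, qr/MATCH/, 2)) {
    print "$lines[$_]\n" for $r->[0] .. $r->[1];
    print "--\n";
}
```

The match at index 7 gets its own two lines of trailing context (8 and 9) instead of being swallowed by the window around 5.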

    You gave me an idea with regard to memory consumption, though:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Fcntl qw( :seek );

        my $rx       = qr/c9391b56-b174-441b-921c-7d63/;
        my $to_print = 0;
        my $context  = 10;

        # End offsets of the last $context + 1 lines; the first one is
        # where the before-context of the current line starts.
        my @offs = ( 0 ) x ( 1 + $context );

        while (<>) {
            my $context_start = shift @offs;
            my $here = tell ARGV;
            push @offs, $here;
            if ( /$rx/ ) {
                if ( not $to_print ) {
                    # Seek back and re-read the before-context plus this line.
                    my $length = $here - $context_start;
                    seek ARGV, $context_start, SEEK_SET;
                    read ARGV, $_, $length;
                }
                $to_print = 1 + $context;
            }
            --$to_print, print if $to_print;
        }

    This only needs to keep $context + 1 offsets in memory.

    Update: fixed bugs. It was ( 0 ) x $context, which gave one line too few of before-context, and $here - $context_start + length, which of course ate too much input, but that wasn't obvious with my test data. Oopsie.
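To try the offset-window idea without a real log, the same logic can be wrapped in a sub that reads from any seekable handle, including an in-memory one opened on a scalar. This is a sketch, not the code above verbatim: `print_with_context` is a hypothetical name, it collects output into a string instead of printing, and it needs a seekable handle (seeking a pipe fails).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw( :seek );

# Hypothetical wrapper around the offset-window technique.
# @offs holds the end offsets of the last $context + 1 lines, so its
# first element is where the before-context of the current line begins.
sub print_with_context {
    my ($fh, $rx, $context) = @_;
    my $out      = '';
    my $to_print = 0;
    my @offs     = (0) x (1 + $context);
    while (my $line = <$fh>) {
        my $context_start = shift @offs;
        my $here = tell $fh;
        push @offs, $here;
        if ($line =~ $rx) {
            if (not $to_print) {
                # Re-read the before-context plus the matching line;
                # afterwards the handle is back at $here.
                seek $fh, $context_start, SEEK_SET or die "seek failed: $!";
                read $fh, $line, $here - $context_start;
            }
            $to_print = 1 + $context;
        }
        if ($to_print) {
            $out .= $line;
            --$to_print;
        }
    }
    return $out;
}

my $log = join '', map { "line $_\n" } 1 .. 10;
$log =~ s/^line 5$/line 5 MATCH/m;
open my $fh, '<', \$log or die "open failed: $!";
print print_with_context($fh, qr/MATCH/, 2);   # prints lines 3 through 7
```

With $context = 2 the window holds three offsets, which is why the initializer needs 1 + $context zeros: one short and line 3 would be lost from the before-context.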

    Makeshifts last the longest.

      > It's actually slower and more memory-intensive than any of the other solutions. Tie::File internally keeps a list of byte offsets for all the lines, and it carries a lot of additional overhead meant to optimize writes, which you never make use of.
      Oops. Guessed wrong, then.

      > Your code also doesn't get the edge cases right: if a match falls within $context lines after the previous one, it will be missed.
      That was intentional, and it depends on your definition of "missed". That hit will be printed with the context of the previous hit. Changing the behavior would just require removing the line:
      $i += $context;
