last $n lines that match a criteria

BUU has asked for the wisdom of the Perl Monks concerning the following question:

Basically I have a semi-large file (5 megs+) that contains output separated by newlines. What I want to do is get the last $n lines that match a certain criteria.
My immediate thought was to use tail, in the spirit of not reinventing the wheel. But (as far as I can tell) tail only has an option to specify the number of lines from the very bottom. Thus if the $n lines weren't contained in the first sample (default 10 lines or what not) then I would have to specify a larger number of lines from the bottom, tail -n 20 and so on until I find the $n lines I want. While this is workable I suppose, probably in the form of:

my @lines;
my $i = $n;   # start by asking tail for $n lines, then widen the window
while (@lines < $n)
{
  @lines = grep /criteria/, split /\n/, `tail -n $i file.foo`;
  $i += $n;
}

But the idea of having to reparse the same lines over and over (bottom 10, then bottom 20) and so on repeatedly kind of rankles. Perhaps it's not an overwhelming problem, but it bothers me =]. And of course I would have to make constant calls to tail, perhaps 5 or more calls from a single invocation. This doesn't seem to bode well for efficiency..

My second thought was that I could do something along the lines of:

my @file = <FILEHANDLE>;
for(@file)
{
  push @lines,$_ if /criteria/;
  last if @lines>$n;
}

But of course slurping a semi-massive file into memory is going to incur even worse penalties than using tail.

Last and probably least, I could do something along the lines of tail -f and have a daemon that constantly watches the file in question and keeps some sort of database of the last 10 lines that match my criteria. This might be the most efficient of the three, but it seems vastly more complicated, in that I have to maintain the daemon, make sure it's running and configured properly, etc. Any thoughts?


Replies are listed 'Best First'.
Re: last $n lines that match a criteria by Anonymous Monk on Nov 17, 2003 at 07:00 UTC
#!/usr/bin/perl -w
use strict;
use File::ReadBackwards;

my @lines = ();
my $n = 10;
my $elif = File::ReadBackwards->new('somefile') || die $!;
while (defined(my $line = $elif->readline())) {
  unshift @lines, $line if $line =~ /criteria/;
  last if @lines >= $n;
}
print @lines;
Re: last $n lines that match a criteria by Zaxo (Archbishop) on Nov 17, 2003 at 06:42 UTC
Your second example gets the first $n, not the last. Both are pretty wasteful of resources. Here's one way to do it:

{
  local $_;
  while (<FILEHANDLE>) {
    push @lines, $_ if /criteria/;
    shift @lines if @lines > $n;
  }
}

I've localized `$_` since `while (<>) {...}` does not, but that is usually not necessary.

After Compline,
Zaxo
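For anyone who wants to try the sliding-window idea without a real log file, here is a minimal sketch. The sub name `last_matching_lines` and the in-memory test data are made up for illustration; they are not from the post above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the last $n lines read from $fh that match $re.
sub last_matching_lines {
    my ($fh, $re, $n) = @_;
    my @window;
    while (my $line = <$fh>) {
        next unless $line =~ $re;
        push @window, $line;              # newest match goes on the end
        shift @window if @window > $n;    # drop the oldest once over the limit
    }
    return @window;
}

# Demo on an in-memory filehandle so the sketch runs without a real file.
my $data = join '', map { $_ % 3 ? "noise $_\n" : "match $_\n" } 1 .. 20;
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @last = last_matching_lines($fh, qr/^match/, 3);
close $fh;
print @last;
```

Memory use stays bounded by $n lines plus the line being read, but every line in the file still gets tested against the regex.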
Re: last $n lines that match a criteria by cleverett (Friar) on Nov 17, 2003 at 08:01 UTC
Heh, what's CPAN good for if you don't use it?

Read the file backwards until you have 10 instances:

#!/usr/bin/perl
use strict;
use File::ReadBackwards;

my @instances = ();
my $file = File::ReadBackwards->new("/some/log/file") or die $!;
while (@instances < 10 and defined(my $line = $file->readline)) {
  # readline walks from the end of the file, so unshift keeps file order
  unshift @instances, $line if $line =~ m/criteria/;
}

A daemon that tails the file (needs lots more to be a real daemon):

#!/usr/bin/perl
use strict;
use File::Tail;

my @instances = ();
my $file = File::Tail->new("/some/log/file");
while (defined(my $line = $file->read)) {
  if ($line =~ m/criteria/) {
    shift @instances if @instances >= 10;   # drop the oldest match
    push @instances, $line;
  }
}
Re: last $n lines that match a criteria by davido (Cardinal) on Nov 17, 2003 at 07:08 UTC
Here is a FIFO approach:

my @lastmatches;
my $keep = 5;
while ( my $line = <FILEHANDLE> ) {
  next unless $line =~ /criteria/;
  push @lastmatches, $line;
  shift @lastmatches if @lastmatches > $keep;
}

I don't know if shift is "expensive" from a time-critical standpoint, but where the array is never more than five elements long, it probably isn't terribly inefficient to use it in this way. I've essentially created a FIFO list that won't grow larger than five elements. It does scale pretty well though, and passed my tests.

Or there's this grep and list slice approach:

my @lastfive = ( grep { /criteria/ } <FILEHANDLE> ) [ -5 .. -1 ];

UPDATE: I created a 5 MB file and used the grep method along with a list slice to gather the last five using the following snippet. On the machine I tested it with, it took about 5 seconds to grep the file using a simple regex, and that on an old beat-up 266 MHz Pentium II notebook. Again, I'm not sure how time critical the OP's needs are, and while I know the grep method is slower than the File::ReadBackwards method, it's pretty simple, and seems to work just fine as long as it's OK to take a few seconds per 5 MB file.

Here's the test snippet:

use strict;
use warnings;

# Create the 5 MB file.
my @alphabet = ( "A".."Z", "a".."z", " ", "\n" );
open OUTFILE, ">file.txt" or die;
print OUTFILE $alphabet[ rand( @alphabet ) ] for 1 .. (1024 * 1024 * 5);
close OUTFILE;

# Find the last five occurrences of 'abc'.
print "Testing grep method:\n";
open IN, "file.txt" or die;
my @lastfive = ( grep { /abc/ } <IN> ) [ -5 .. -1 ];
close IN;
my $count = 5;
print $count--, ".: ", $_ foreach @lastfive;

Dave

"If I had my life to live over again, I'd be a plumber." -- Albert Einstein
Re: Re: last $n lines that match a criteria by Anonymous Monk on Nov 17, 2003 at 07:15 UTC
"...it probably isn't terribly inefficient to use it in this way"

But it is inefficient to read every line in the file, test every line against the regex, and push every matching line onto the array and shift all but $keep matching lines back off the array. File::ReadBackwards was designed for this kind of problem.
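To make the "read from the end" idea concrete without installing anything, here is a rough core-Perl sketch of the general technique: seek to the end, read fixed-size chunks backwards, and stop as soon as enough matches are found. This is not how File::ReadBackwards is implemented internally (I haven't checked), and the sub name `last_matches_backwards`, the chunk size, and the temp-file demo are all assumptions for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: scan $path from the end in $chunk_size-byte chunks and
# return the last $n lines matching $re, in file order.
sub last_matches_backwards {
    my ($path, $re, $n, $chunk_size) = @_;
    $chunk_size ||= 8192;
    return () if $n < 1;

    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $pos  = -s $fh;    # byte offset where the unread region ends
    my $tail = '';        # possibly incomplete line carried to the next chunk
    my @found;

    while ($pos > 0 && @found < $n) {
        my $len = $pos < $chunk_size ? $pos : $chunk_size;
        $pos -= $len;
        seek $fh, $pos, 0 or die "seek failed: $!";
        my $got = read $fh, my $buf, $len;
        die "read failed: $!" unless defined $got && $got == $len;

        $buf .= $tail;
        my @lines = split /(?<=\n)/, $buf;    # split after each newline, keeping it
        # Unless we are at the start of the file, the first piece may be a
        # partial line, so hold it back until the preceding chunk is read.
        $tail = $pos > 0 ? shift @lines : '';

        for my $line (reverse @lines) {
            next unless $line =~ $re;
            unshift @found, $line;            # keep results in file order
            last if @found >= $n;
        }
    }
    close $fh;
    return @found;
}

# Demo: a small temp file and a tiny chunk size to force several reads.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} ($_ % 4 ? "noise $_\n" : "match $_\n") for 1 .. 50;
close $tmp_fh;
print last_matches_backwards($tmp_path, qr/^match/, 3, 64);
```

The sketch works on raw bytes and assumes "\n" line endings; for anything beyond an experiment the CPAN module is the safer choice.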
Re: Re: last $n lines that match a criteria by Anonymous Monk on Nov 18, 2003 at 07:45 UTC
In your updated second example you are in fact reading the entire file into memory (something the OP wanted to avoid), and creating the entire grep list in memory (at the same time), and then skimming the final N lines off that list. If the pattern occurs on every other line, you'll actually have the entire file plus half again in memory at once.
Re: last $n lines that match a criteria by sgifford (Prior) on Nov 17, 2003 at 08:13 UTC
An alternative to File::ReadBackwards would be the `tac(1)` command:

NAME
       tac - concatenate and print files in reverse

For example:

open(ELIF, "tac file.foo |") or die "Couldn't tac file: $!\n";
while (<ELIF>) {
  push @lines, $_ if /criteria/;   # lines arrive last-first, so @lines is newest-first
  last if @lines >= $n;
}
Re: last $n lines that match a criteria by Roger (Parson) on Nov 17, 2003 at 07:30 UTC
I am a lazy programmer. I would combine the Unix grep and tail utilities to do that kind of thing with a one-liner. ;-)

my @lines = split /\n/, `grep criteria file.foo | tail -n $n`;

Provided what you are searching for is not too complicated, of course.
Re: last $n lines that match a criteria by jmcnamara (Monsignor) on Nov 17, 2003 at 08:54 UTC
Here is a one-liner, change 5 and the match criteria to suit.

perl -ne '$a[($i+=1)%=5] = $_ if /foo/; END{print @a[$i+1..@a,0..$i]}' file

This reads all the way through the file so it will be less efficient than methods that read backwards through the file.

--
John.
Re: Re: last $n lines that match a criteria by ysth (Canon) on Nov 17, 2003 at 11:19 UTC
I think you mean ..$#a, not ..@a. This works, too:

perl -wpe'$a[($i+=1)%=5]=$_ if /foo/} for(@a[$i+1..$#a,0..$i]){' file
Re: Re: Re: last $n lines that match a criteria by jmcnamara (Monsignor) on Nov 17, 2003 at 11:50 UTC
No, I meant `@a` because it came from some golf code and that was one character shorter. :-)

Here is the progression of code that I took it from (note this is for tail and not the last `$n` matching lines, but the change is minor).

# tail
perl -ne '$i=$.%5; $a[$i]=$_; END{print @a[$i+1..$#a,0..$i]}' file
perl -ne 'END{print@a[$i+1..$#a,0..$i]}$a[$i=$.%5]=$_' file
perl -ne 'END{print@a[$i+1..@a,0..$i]}$a[$i=$.%5]=$_' file
perl -pe '$a[$i=$.%5]=$_}{print@a[$i+1..@a,0..$i]' file
perl -pe '$_[$==$.%5]=$_}{print@_[$=+1..@_,0..$=]' file

Also, I left out `-w` on purpose for cases where there were fewer matches than `$n`. Try this:

echo foo | perl -wpe 'your code here' file

That is also why I was able to get away with `@a`.

--
John.
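For readers who find the golfed versions hard to follow, the ring-buffer trick can be spelled out long-hand. This is only a sketch of the same idea: the sub name `last_matches_ring` and the demo data are invented here, and it handles the "fewer matches than `$n`" case explicitly instead of relying on warnings being off.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the last $n matching lines in a fixed-size
# circular buffer, then return them oldest-first.
sub last_matches_ring {
    my ($fh, $re, $n) = @_;
    return () if $n < 1;
    my @ring;
    my $count = 0;                         # total matches seen so far
    while (my $line = <$fh>) {
        next unless $line =~ $re;
        $ring[ $count++ % $n ] = $line;    # overwrite the oldest slot
    }
    return @ring if $count <= $n;          # buffer never wrapped around
    my $oldest = $count % $n;              # slot holding the oldest kept match
    return @ring[ $oldest .. $n - 1, 0 .. $oldest - 1 ];
}

# Demo on an in-memory filehandle: seven matches, keep the last three.
my $text = join '', map { ("m$_\n", "skip\n") } 1 .. 7;
open my $in, '<', \$text or die "Cannot open in-memory file: $!";
print last_matches_ring($in, qr/^m/, 3);
close $in;
```

Unlike push and shift, the array never changes size once it is full; only one slot is overwritten per match. It still reads the whole file front to back.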
Re: Re: Re: Re: last $n lines that match a criteria by ysth (Canon) on Nov 17, 2003 at 11:56 UTC