| Category: | File utilities |
| Author/Contact Info | Michael K. Neylon [mailto://mneylon-pm@masemware.com] |
| Description: | There were a few questions in SOPW that involved finding patterns in a number of files, and they typically resulted in answers revolving around a system call to grep. I was thinking that it would be easier if people simply used File::Grep, when I realized that File::Grep does not exist at all, despite all the other File:: modules. While grepping is a trivial task, I wrote this with efficiency in mind: the calling context determines how much work is done, using the 3-way state of wantarray to tell the contexts apart. Note that I've not released this to CPAN yet. Two issues are still to be decided. First, the File:: namespace appears to be typically reserved for several base packages, and I don't know if releasing this as File::Grep would invade that space (similar to releasing a DBI::-named package, which should be in the DBIx:: namespace). Second, I can think of some supplementary functions to add to this, such as "fgrep_flat", which would return the flattened array of matches for all files, and possibly "fgrep_into_array", to which the user would supply a reference to an array, so that for very large files the transfer of the return value would not temporarily duplicate the large number of matches and possibly overwhelm memory. So I'm soliciting other suggestions, or opinions on whether this module is even necessary (I've not seen anything via CPAN, Google, or PM Super Search that implies anything similar exists). If this is just duplication of effort, I'm not too worried; it only took half an hour to write and test appropriately, most of that in CVS and testing. |
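The 3-way state of wantarray mentioned in the description can be seen in isolation with a minimal sketch. The sub name record_context and the @seen array are hypothetical and exist only for this illustration; they are not part of the module. The last call shows that a condition inside if is evaluated in scalar context, not void context.

```perl
use strict;
use warnings;

my @seen;

# Hypothetical helper (not part of File::Grep): records the context
# it was called in, using the three possible states of wantarray.
sub record_context {
    push @seen, wantarray          ? 'list'
              : defined(wantarray) ? 'scalar'
              :                      'void';
    return;
}

my @list   = record_context();    # list context
my $scalar = record_context();    # scalar context
record_context();                 # void context: result is discarded
if ( record_context() ) { }       # a boolean test is scalar context

print join( ", ", @seen ), "\n";  # prints: list, scalar, void, scalar
```

A sub can therefore only detect void context when its return value is thrown away entirely.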
#!/usr/bin/perl -w
package File::Grep;
use strict;
use Carp;
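BEGIN {
    use Exporter ();
    use vars qw($VERSION @ISA @EXPORT @EXPORT_OK %EXPORT_TAGS);
    # No leading \s in the pattern: the string "0.01" has no whitespace,
    # so the original pattern never matched and $VERSION came out as 0.00
    $VERSION = sprintf( "%d.%02d", q(0.01) =~ /(\d+)\.(\d+)/ );
    @ISA = qw(Exporter);
    @EXPORT = qw();
    @EXPORT_OK = qw( fgrep );
    %EXPORT_TAGS = ( );
}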
sub fgrep (&@) {
    my ( $coderef, @files ) = @_;
    my $returntype;
    if ( wantarray ) {
        $returntype = 2;    # Return everything
    } elsif ( defined( wantarray ) ) {
        $returntype = 1;    # Return just the count
    } else {
        $returntype = 0;    # Return at first match
    }
    my @matches;
    my $count = 0;          # Start at 0 so scalar context never returns undef
    foreach my $file ( @files ) {
        if ( $returntype == 2 ) {
            push @matches, { filename => $file,
                             count    => 0,
                             matches  => [] };
        }
        # Three-argument open with a lexical handle, so that unusual
        # characters in a filename are not interpreted as an open mode
        my $fh;
        unless ( open $fh, '<', $file ) {
            carp "Cannot open file $file to grep: $!";
            next;
        }
        while ( my $line = <$fh> ) {
            local $_ = $line;
            if ( &$coderef ) {
                $count++;
                last if ( $returntype == 0 );    # Last of while loop!
                if ( $returntype == 2 ) {
                    $matches[-1]->{ count }++;
                    push @{ $matches[-1]->{ matches } }, $line;
                }
            }
        }
        close $fh;
        if ( !$returntype && $count ) {
            return 1;
        }
    }
    if ( $returntype == 2 ) {
        return @matches;
    } elsif ( $returntype == 1 ) {
        return $count;
    } else {
        return 0;    # Void context; if here, nothing was found, ever
    }
}
1;
__END__
=head1 NAME
File::Grep - Find matches to a pattern in a series of files
=head1 SYNOPSIS
    use File::Grep qw( fgrep );
    # Boolean test; note that an if condition supplies scalar context,
    # so this call returns the match count
    if ( fgrep { /$user/ } "/etc/passwd" ) { do_something(); }
    # Scalar context; scalar() is needed because print supplies list context
    print "The index page was hit ",
          scalar( fgrep { /index\.html/ } glob "/var/log/httpd/access.log.*" ),
          " times\n";
    # Array context
    my @matches = fgrep { /index\.html/ } glob "/var/log/httpd/access.log.*";
    foreach my $matchset ( @matches ) {
        print "There were ", $matchset->{ count }, " matches in ",
              $matchset->{ filename }, "\n";
    }
=head1 DESCRIPTION
File::Grep mimics the functionality of the grep function in perl, but
applies it to files instead of a list. This is similar in nature to
the UNIX grep command, but more powerful, as the pattern can be any
legal perl function.
While looking for patterns in files is trivially easy, File::Grep takes
steps to be efficient in both computation and resources. Namely, if
called in void context, it will short-circuit execution when a match is
located and immediately report truthfulness. In scalar context, it will
only keep track of the number of matches and return that value. In
array context, it will generate an array of hashes that include details
on the matching: for each hash, key "filename" will be the name of the
current file, "count" will be the number of hits, and "matches" will be
an array reference containing the matched lines, in order of discovery.
The ordering of this array will follow the same order of files as
passed in to fgrep.
The syntax for this command is similar to grep:
fgrep BLOCK ARRAY.
The block should be a subroutine that returns true if a match was found
and false otherwise. The variable $_ will be localized before this
routine is called, so it may be used to process the current line. Note,
however, that only the original content of the line is saved in the
array of hashes in array context. The array is a list of files to be
grepped. If a file cannot be opened, a warning will be issued, though
the function will continue to process remaining files; in addition, an
entry in the array of hashes will still be created so as not to mess up
any indexing with the original file list.
=head1 EXPORT
"fgrep" may be exported, but this is not set by default.
=head1 AUTHOR
Michael K. Neylon, E<lt>mneylon-pm@masemware.comE<gt>
=head1 SEE ALSO
L<perl>.
=cut
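The "fgrep_flat" idea floated in the description could be approximated on top of the documented return structure. The sketch below is only one possible shape for it: flatten_matches is a hypothetical name, and the sample data is hand-built in the documented form (hashes with "filename", "count" and "matches"), with invented file names and lines, so that it runs without the module installed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of File::Grep as posted): flattens the
# array-of-hashes result into a single list of matched lines.
sub flatten_matches {
    my @matchsets = @_;
    return map { @{ $_->{ matches } } } @matchsets;
}

# Hand-built sample in the documented shape; file names and lines are invented.
# The middle entry mimics a file that had no matches or could not be opened.
my @matchsets = (
    { filename => 'a.log',       count => 2,
      matches  => [ "GET /index.html\n", "HEAD /index.html\n" ] },
    { filename => 'missing.log', count => 0, matches => [] },
    { filename => 'b.log',       count => 1,
      matches  => [ "GET /index.html?x=1\n" ] },
);

my @lines = flatten_matches( @matchsets );
print scalar( @lines ), " matching lines in total\n";
```

Flattening after the fact still holds every match in memory twice for a moment, which is the concern behind the proposed "fgrep_into_array" variant.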