Your "sample find results" don't seem to be consistent with the command line you use to create them. The output of a pipeline like this:
find path -type f -print0 | xargs -0 grep -i -n pattern
should be a set of zero or more lines like this:
path/filename:line#:content of line containing pattern
path/subdir/filename:line#:another line with pattern in it
(Note the addition of "-print0" on the find command, and "-0" on xargs; someday those will save you a lot of grief, e.g. when you have file names containing spaces or other shell-magic characters -- BTW, it's possible to create a file on a unix box with line-feed and/or carriage-return characters in the file name; I've seen it happen.)
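To make the point about "-print0" and "-0" concrete, here is a minimal sketch you can run in a scratch directory. The directory and the file name "two words.txt" are made up for the demonstration; it assumes a find and xargs that support "-print0" and "-0" (GNU and BSD versions do).

```shell
# Scratch directory holding one file whose name contains a space (hypothetical name).
demo=$(mktemp -d)
printf 'needle here\n' > "$demo/two words.txt"

# Default xargs splits on whitespace, so grep is handed ".../two" and "words.txt";
# neither exists, and the match is missed.
plain=$(find "$demo" -type f | xargs grep -il needle 2>/dev/null | wc -l | tr -d ' ')

# With NUL-separated names the file name arrives at grep intact.
safe=$(find "$demo" -type f -print0 | xargs -0 grep -il needle | wc -l | tr -d ' ')

echo "files matched without -print0/-0: $plain"
echo "files matched with -print0/-0:    $safe"
rm -rf "$demo"
```

The first pipeline finds nothing and the second finds the one file, even though both are searching the same directory for the same pattern.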
But none of your sample results would match those templates. Anyway, given the command line that you are using, and the presence of some files with the pattern ":\d+:" as part of the file name (and there may be some grepped lines from data files that also contain matches for ":\d+:"), I don't think you want to use "xargs grep -n" that way -- the results cannot be parsed reliably.
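Here is a small sketch of why that output can't be parsed reliably. The file name "log:12:old.txt" is invented for the demonstration, and the "parser" is just the obvious split-on-colon approach, not anything from your code.

```shell
# Scratch directory with a file whose name happens to contain ":12:" (hypothetical name).
demo=$(mktemp -d)
printf 'some text\n' > "$demo/log:12:old.txt"

# Passing /dev/null as a second file makes grep prefix each match with the file name.
line=$(cd "$demo" && grep -n text log:12:old.txt /dev/null)
echo "grep output: $line"

# Naive parse: the file name is everything before the first colon,
# and the line number is the field after it.
naive_file=${line%%:*}
rest=${line#*:}
naive_lineno=${rest%%:*}
echo "parsed file: $naive_file, parsed line: $naive_lineno"
rm -rf "$demo"
```

The parse reports file "log", line 12, while the real match is on line 1 of "log:12:old.txt". Anchoring on the last ":digits:" instead only moves the problem, since the matched line itself may contain that pattern.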
Take the time to let perl do the grepping on the files:
#!/usr/bin/perl
use strict;
use warnings;
# use Data::Dumper; # you might want this
die "Usage: $0 search_path search_pattern\n"
    unless ( @ARGV == 2 and -d $ARGV[0] );
my ( $path, $pattern ) = @ARGV;
my @filelist;
# list-form pipe open: no shell is involved, so $path needs no quoting
open( my $find, "-|", "find", $path, "-type", "f", "-print0" )
    or die "Unable to run 'find $path ...': $!\n";
{
    local $/ = chr(0); # set input record separator to null byte
    @filelist = <$find>;
    chomp @filelist; # remove null byte terminations
}
close $find;
my %found;
for my $filename ( @filelist ) {
    # $. = 0; # (update: this line is not needed)
    # three-arg open, so odd characters in the file name are not treated as a mode
    open( my $fh, "<", $filename )
        or do { warn "Unable to read $filename: $!\n"; next };
    while (<$fh>) {
        $found{$filename}{$.} = $_ if ( /$pattern/ );
    }
    close $fh;
}
# check out how the data is stored if you want:
# print Dumper( \%found );
# or pretty-print it:
for my $file ( sort keys %found ) {
    # parse $file into directory and filename if you want
    for my $line ( sort {$a<=>$b} keys %{$found{$file}} ) {
        printf( "File <<%s>> LineNo <<%d>> matches: %s",
                $file, $line, $found{$file}{$line} );
    }
}
(update: removed some misleading stuff from one of the "die" messages)
There are two features that result from using the search_pattern string within perl, and you'll probably like them:
- You don't need to worry about properly quoting the match string in order to make it work in the "xargs grep ..." shell command; having things like spaces, angle-brackets, etc, in the search pattern will be safe.
- You can leverage the extra power of perl regular expressions -- they provide some special tricks you don't get with a standard "grep" shell command. (In using the above script, some regex patterns would need to be quoted on the command line when the script is run, in order to get past shell interpretation and directly into @ARGV.)
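On that last point about quoting, here is a rough illustration of what the shell does to a pattern before any script gets to see it. The show_args function is a hypothetical stand-in for the perl script; it just reports how many arguments arrived and what they were.

```shell
# Stand-in for the script: print the argument count, then each argument in <...>.
show_args() { printf '%d:' "$#"; printf '<%s>' "$@"; printf '\n'; }

# Unquoted: the shell splits on the space and strips the backslash.
unquoted=$(show_args some_dir foo bar\d+)

# Single-quoted: the pattern arrives as one argument, backslash intact.
quoted=$(show_args some_dir 'foo bar\d+')

echo "unquoted: $unquoted"   # unquoted: 3:<some_dir><foo><bard+>
echo "quoted:   $quoted"     # quoted:   2:<some_dir><foo bar\d+>
```

With the script above, the unquoted form would fail the "@ARGV == 2" check and print the usage message, which is at least a loud failure rather than a silent wrong search.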