RegEx to get file name from find results

vsailas has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,
I do a find | grep using find ../ -depth -type f | xargs grep -i -n "<search Term>";
It pulls up many files. I use a regex to get the file name and path: $ResultOfGrep=~/(\w+[\.\w+[\.]?]+)(\:)/;
where $1 holds the file name and $` holds the path.
Sample find results look like this:

/work/MergeProject/cgi-bin/log/spliceforms/Dec-13-2007_02:21:10:                     6:Error Details:

/work/cgi-bin/BulkGO.cgi:If any individual load errors occured, they will be listed here:</b><p>

/work/GOBKTProject/cgi-bin/TitleLiner.cgi-ELFE-fixed:# Lastly, the count variable logs the result.

where 'BulkGO.cgi', 'TitleLiner.cgi-ELFE-fixed' and 'Dec-13-2007_02:21:10' are the file names.
The file name and path are separated from the matching string by ':'.
Please help me with a regex that can separate all possible file names from such a find | grep result.

Thanking all in advance.

Replies are listed 'Best First'.
Re: RegEx to get file name from find results by graff (Chancellor) on Feb 05, 2008 at 07:17 UTC
Your "sample find results" don't seem to be consistent with the command line you use to create them. The output of a pipeline like this:

    find path -type f -print0 | xargs -0 grep -i -n pattern

should be a set of zero or more lines like this:

    path/filename:line#:content of line containing pattern
    path/subdir/filename:line#:another line with pattern in it

(Note the addition of "-print0" on the find command, and "-0" on xargs; someday those will save you a lot of grief, e.g. when you have file names containing spaces or other shell-magic characters -- BTW, it's possible to create a file on a unix box with line-feed and/or carriage-return characters in the file name; I've seen it happen.)

But none of your sample results would match those templates. Anyway, given the command line that you are using, and the presence of some files with the pattern ":\d+:" as part of the file name (and there may be some grepped lines from data files that also contain matches for ":\d+:"), I don't think you want to use "xargs grep -n" that way -- the results cannot be parsed reliably. Take the time to let perl do the grepping on the files:

    #!/usr/bin/perl
    use strict;
    use warnings;
    # use Data::Dumper;   # you might want this

    die "Usage: $0 search_path search_pattern\n"
        unless ( @ARGV == 2 and -d $ARGV[0] );

    my ( $path, $pattern ) = @ARGV;

    # list-form pipe open: no shell is involved, so $path needs no quoting
    open( my $find, "-|", "find", $path, "-depth", "-type", "f", "-print0" )
        or die "Unable to run 'find $path ...': $!\n";
    my @filelist;
    {
        local $/ = chr(0);   # set input record separator to null byte
        @filelist = <$find>;
        chomp @filelist;     # remove null byte terminations
    }
    close $find;

    my %found;
    for my $filename ( @filelist ) {
        # three-arg open copes with odd file names; skip unreadable files
        open( my $fh, "<", $filename ) or next;
        while (<$fh>) {
            $found{$filename}{$.} = $_ if ( /$pattern/ );
        }
        close $fh;   # closing also resets $. for the next file
    }

    # check out how the data is stored if you want:
    # print Dumper( \%found );
    # or pretty-print it:
    for my $file ( sort keys %found ) {
        # parse $file into directory and filename if you want
        for my $line ( sort { $a <=> $b } keys %{ $found{$file} } ) {
            printf( "File <<%s>> LineNo <<%d>> matches: %s",
                    $file, $line, $found{$file}{$line} );
        }
    }

(update: removed some misleading stuff from one of the "die" messages)

There are two features that result from using the search_pattern string within perl, and you'll probably like them:

1. You don't need to worry about properly quoting the match string in order to make it work in the "xargs grep ..." shell command; having things like spaces, angle-brackets, etc, in the search pattern will be safe.
2. You can leverage the extra power of perl regular expressions -- they provide some special tricks you don't get with a standard "grep" shell command.

(In using the above script, some regex patterns would need to be quoted on the command line when the script is run, in order to get past shell interpretation and directly into @ARGV.)
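The script above leaves the split into directory and file name as a comment ("parse $file into directory and filename if you want"). A minimal sketch of that step, assuming the core File::Basename module is acceptable; the helper name `split_path` is made up for illustration and is not part of graff's script:

```perl
use strict;
use warnings;
use File::Basename qw(basename dirname);

# Hypothetical helper: split a full path into (directory, file name).
# Only "/" separators matter, so colons inside the name are harmless.
sub split_path {
    my ($path) = @_;
    return ( dirname($path), basename($path) );
}

# Paths taken from the sample results in the question
for my $path (
    '/work/cgi-bin/BulkGO.cgi',
    '/work/MergeProject/cgi-bin/log/spliceforms/Dec-13-2007_02:21:10',
) {
    my ( $dir, $name ) = split_path($path);
    print "dir=$dir name=$name\n";
}
```

Because the path is already a hash key in %found, no regex over grep output is needed to recover it.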
Re: RegEx to get file name from find results by ikegami (Patriarch) on Feb 05, 2008 at 06:23 UTC
There's no reliable way to parse those. Have you considered using `grep` with the `-l` option (to just list the file names), or using `perl` instead of `grep`? (File::Find::Rule could replace `find` too, trivially)
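The `grep -l` idea can be driven from Perl so that no "file:line:text" output has to be parsed at all. What follows is only a sketch, assuming a POSIX `find` and `grep` are on the PATH; the sub name `files_matching` and its arguments are hypothetical. It reads one name per line, so a file name that itself contains a newline would still be split.

```perl
use strict;
use warnings;

# Hypothetical helper: return the files under $dir that contain $term
# (case-insensitive), letting "grep -l" print bare file names only.
sub files_matching {
    my ( $dir, $term ) = @_;
    # List-form pipe open: the arguments bypass the shell entirely
    open( my $out, '-|', 'find', $dir, '-type', 'f',
          '-exec', 'grep', '-i', '-l', '-e', $term, '{}', '+' )
        or die "Unable to run find: $!\n";
    chomp( my @files = <$out> );
    close $out;    # a non-zero exit may only mean a batch had no match
    return @files;
}

# Usage: script.pl search_path search_term
if ( @ARGV == 2 ) {
    print "$_\n" for files_matching(@ARGV);
}
```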
Re: RegEx to get file name from find results by poolpi (Hermit) on Feb 05, 2008 at 08:19 UTC
    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Find;
    use File::Slurp;   # CPAN module, provides read_file

    my @dir  = ( '/work' );
    my $term = 'my term';

    find(
        sub {
            return unless -f;   # plain files only
            my @lines = read_file( $File::Find::name );   # complete pathname to the file
            for my $line (@lines) {
                # \Q..\E keeps the term literal (under /x a bare space is ignored);
                # \A anchors the match at the start of the line
                if ( $line =~ /\A \Q$term\E /x ) {
                    print $File::Find::dir      # Directory name
                        . q{/}
                        . qq{[$_]}              # file name
                        . qq{ : $line}, "\n";   # line matching the term
                }
            }
        },
        @dir
    );

Output:

    /work/subd0/subd1/[test_05022008.pl] : my_matching_line;

hth, PooLpi
Re: RegEx to get file name from find results by bunch (Sexton) on Feb 05, 2008 at 16:39 UTC
Use grep with the "-Z" option; this makes it print a null character instead of ":" after the file name.
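With "-Z" the file name ends at the first NUL byte, so colons inside the name no longer matter. A minimal sketch of splitting one such record, assuming GNU grep's "-n -Z" layout of file name, NUL, line number, ":", text; the sub name `parse_record` is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: split one line of "grep -n -Z" output into
# (file, line number, text). Returns an empty list if it does not parse.
sub parse_record {
    my ($record) = @_;
    # Everything before the first NUL byte is the file name
    my ( $file, $rest ) = split /\0/, $record, 2;
    return unless defined $rest;
    # What follows is "line-number:text"
    my ( $lineno, $text ) = $rest =~ /\A(\d+):(.*)\z/s
        or return;
    return ( $file, $lineno, $text );
}

# Simulated record; real input would come from something like
#   find ../ -type f -print0 | xargs -0 grep -i -n -Z "<search Term>"
# ("\0" is concatenated separately so it is not read as the octal escape \06)
my $record = "/work/MergeProject/cgi-bin/log/spliceforms/Dec-13-2007_02:21:10\0"
           . "6:Error Details:\n";
my ( $file, $lineno, $text ) = parse_record($record);
print "file=$file line=$lineno text=$text";
```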