Reading a Directory of Files, Searching for Text, Outputting Matches

NorthShore44 has asked for the wisdom of the Perl Monks concerning the following question:

Greetings Monks - I am trying to write a program that will read in a directory of txt files, search through each file sequentially for text matches, and subsequently output the text matches to other txt files. If the script does find matches, I would like it to copy the text to a filename that would match the file from which it came. So far, I've gotten the text search to work, and print matches to one file, not individual files. But I can't seem to get the loop to work. To do this, I've done a $File::Find::name of the directory and outputted the list of txt files into a text file, which I've then read as an array. Here's my code - if you can give me some help, it would be much appreciated. Thanks!

#! /opt/local/bin/perl -w


#   Part 1: Write all of the filenames to Index.txt to be able to input names as an array #


open directory1, ">./index.txt" or die "Could not create file: $!\n";

use File::Find;

my @all_file_names;

find sub {
     return if -d;
     push @all_file_names, $File::Find::name;
     }, '*my directory*';

for my $path ( @all_file_names ) {
     select directory1;
     print "$path\n";
     }

print STDERR "contents of directory written \n";
close directory1;     

##################################################
#        Part 2: Open & Read Contract List       #
##################################################

my $filename = 'index.txt';
open my $fh, $filename
    or die "Couldn't read '$filename': $!";

#chomp @ARGV = <$fh>;

print "Processing $_"
    for @ARGV;

##################################################
#     Part 3: Read in the contracts and search for keywords     #
##################################################

open Contract1a, ">./test.txt" or die "Could not create file: $!\n";

@contracts = @ARGV;
$linescount = 20;     #set how many lines after the match you'd like to pull;

foreach (@contracts) {

  open Contract1, "<./$_ \n";
  @lines = <Contract1>;
  close Contract1;
  my $contents = join "", @lines;

  @all = undef;
  shift @all;

  while ($contents =~ m/Management Fees/i) {

    if ($contents =~ m/(\n.*Management Fees(.*\n){$linescount})/i) {

      push (@all, "$1\n");
      print STDERR "$1\n";
      $contents =~ s/Management Fees//i;

    }

  }

  select Contract1a;
  print join("\n",(@all));
  print "\n";

}

print STDERR "...Contract1 match complete.\n\n";

close Contract1a;

Replies are listed 'Best First'.
Re: Reading a Directory of Files, Searching for Text, Outputting Matches by cdarke (Prior) on Oct 13, 2009 at 16:07 UTC
I'm not sure I can provide a complete solution, since I have no idea what is coming in on the command line or what the data is. You get the filenames into `@all_file_names`, but all you do is write them to index.txt; you do not appear to do anything else with the files, or maybe I misunderstand. However, you have some strange constructs, and tidying them up might make your code clearer.

    use strict;

There are several variables that are used inside the main loop. Are these supposed to be globals? Declaring them with `my` will make their scope clearer (that might solve part of your problem).

In a couple of places you have code like this:

    select directory1;
    print "$path\n";

which is rather unnecessary. It would be simpler to:

    print directory1 "$path\n";

Be careful of your opens:

    open Contract1, "<./$_ \n";     # What's with the space and new-line?
    my @lines = <Contract1>;
    close Contract1;
    my $contents = join "", @lines;

Perl will look in the current directory anyway (no need for ./), and you should always test the open. Could be written as:

    open Contract1, '<', $_ or die "Unable to open $_: $!";
    local $/ = undef;               # slurp mode
    my $contents = <Contract1>;     # Hope the files are small!
    close Contract1;

You place an `undef` element onto the array, then shift it off:

    @all = undef;
    shift @all;

Would be simpler if you did this:

    my @all;

Which you would have done if you `use strict;`

I'm not sure that you actually need to do a multi-line match and store everything, but then again I don't know what the data looks like.
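As a minimal sketch of the slurp-mode read suggested above, the open, the `local $/` and the read can be wrapped in a small sub so that the change to `$/` is undone as soon as the sub returns. The name `slurp_file` is hypothetical, and the lexical filehandle is a substitution for the bareword `Contract1` used in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): return a whole file
# as a single string, dying with the OS error if it cannot be opened.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Unable to open $path: $!";
    local $/;                       # undef record separator = slurp mode;
                                    # restored automatically on return
    my $contents = <$fh>;
    close $fh;
    return defined $contents ? $contents : '';
}
```

Inside the main loop this would be called as `my $contents = slurp_file($_);`, which keeps the rest of the script reading files line by line as usual.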
Re: Reading a Directory of Files, Searching for Text, Outputting Matches by gmargo (Hermit) on Oct 13, 2009 at 17:31 UTC
I interpret your code as searching through these contract files for the "Management Fees" string, and then printing that line and the next 19 lines. If this is true then I think you're making it too difficult by slurping the entire file into a string, and then iteratively searching through that string, each time from the beginning.

If my interpretation is correct, here is a bit of code that does the same thing but processes the files line-by-line. Also it has code, which I think you requested, creating different output files for each input file.

    foreach my $contract (@contracts) {

        open (Contract1, "<", $contract)
            || die("Cannot open input file $contract: $!");

        my @all;
        while (<Contract1>) {
            if (/Management Fees/i) {
                # Print this line
                push @all, $_;

                # And the next 19 lines too
                for (my $i=0; $i<($linescount-1); $i++) {
                    my $line = <Contract1>;
                    last if !defined $line;
                    push @all, $line;
                }
            }
        }
        close Contract1;

        # Base output filename on original filename.
        my $outfile = "$contract".".output";
        open (Contract1a, ">", $outfile)
            || die("Cannot open output file $outfile: $!");
        print Contract1a join("\n",(@all));
        close Contract1a;
    }
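For illustration, here is a fuller sketch that combines the File::Find directory walk from the original post with the line-by-line approach above, skipping the intermediate index.txt. It rests on several assumptions: the sub names `find_txt_files`, `extract_matches` and `process_directory` are hypothetical, only files ending in `.txt` are searched, output goes next to each input file with the `.output` suffix used above, and files without a match produce no output file. The matched lines are printed as they were read, so the original line spacing is preserved.

```perl
use strict;
use warnings;
use File::Find;

# Collect every plain file ending in .txt under $dir (recursively).
sub find_txt_files {
    my ($dir) = @_;
    my @files;
    find(sub {
        push @files, $File::Find::name if -f $_ && /\.txt\z/i;
    }, $dir);
    return sort @files;
}

# Return each line matching $pattern plus the ($count - 1) lines after it.
sub extract_matches {
    my ($path, $pattern, $count) = @_;
    open my $in, '<', $path or die "Cannot open input file $path: $!";
    my @hits;
    while (my $line = <$in>) {
        next unless $line =~ $pattern;
        push @hits, $line;
        for (2 .. $count) {             # read the following lines, if any
            my $next = <$in>;
            last unless defined $next;
            push @hits, $next;
        }
    }
    close $in;
    return @hits;
}

# Write the matches from each input file to "<input>.output" and
# return the list of output files that were created.
sub process_directory {
    my ($dir, $pattern, $count) = @_;
    my @written;
    for my $path (find_txt_files($dir)) {
        my @hits = extract_matches($path, $pattern, $count);
        next unless @hits;              # no match: no output file
        my $outfile = "$path.output";
        open my $out, '>', $outfile
            or die "Cannot open output file $outfile: $!";
        print {$out} @hits;             # lines still carry their newlines
        close $out or die "Cannot close $outfile: $!";
        push @written, $outfile;
    }
    return @written;
}
```

A call such as `process_directory('contracts', qr/Management Fees/i, 20);` (with `contracts` standing in for the real directory) would then mirror the 20-line window from the original script. Because the output names end in `.output` rather than `.txt`, a second run does not pick up its own results as input. As in the code above, a second match that falls inside the lines already pulled is treated as context rather than starting a new block.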
Re: Reading a Directory of Files, Searching for Text, Outputting Matches by planetscape (Chancellor) on Oct 13, 2009 at 22:35 UTC
The example shown in Re: Read entire file from given folder is the basic skeleton I use for this sort of thing, and may give you some ideas. HTH, planetscape
Re: Reading a Directory of Files, Searching for Text, Outputting Matches by stonecolddevin (Parson) on Oct 13, 2009 at 20:58 UTC
(planetscape is going to kill me for improperly formatting this, but I'm too lazy to turn off TinyMCE) Why don't you check out KinoSearch? You'll have a full index of words to search, and it'll allow you to focus on the other business logic, like writing the results you want out to wherever you choose. mtfnpy
Re: Reading a Directory of Files, Searching for Text, Outputting Matches by afoken (Chancellor) on Oct 13, 2009 at 20:55 UTC
It seems you are searching for a combination of find, xargs, grep and simple output redirection. No perl needed:

    find /some/where/in/the/wild -name '*.txt' -print0 | xargs -0 grep -A3 'needle_in_the_haystack' > result.txt

Alexander
--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)