
Will this work?
$accts = accts_file.txt;
$dirtoget = "/my/data/dir/with/files";
opendir(FILEDIR, $dirtoget) || die("Cannot open directory");
@thefiles = grep -T, <$sample_dirtoget/*>;
closedir(FILEDIR);
my %layout = (file1 => 0, file2 => 0, file3 => 0, file4 => 0, file5 => 0,
              file6 => 0, file6 => 0, file8 => 20, file9 ->10, file10 => 11);
foreach $file (@thefiles) {
    $sample_filename = substr($file, 57);
    open(FILE, $file) || die ("Cannot open $file:$!");
    while ($line = <FILE>){
        chomp $line;
        $acct = substr($line, $layout{$file}, 9);
        `grep $acct $accts_file.txt >> NEW_$sample_filename`;
    }
    close(FILE);
}

Re^3: retrieving information from a set of files
by graff (Chancellor) on Oct 21, 2006 at 05:32 UTC
    Have you tried it? Did it work? If not, how did it fail?

    There are some obvious problems, most of which would be revealed to you if you added "use strict;" and "use warnings;" at the top of the script: there are no quotes around "accts_file.txt"; "$accts" is used only once; grep -T,<$sample_dirtoget/*> should have curly braces around -T and no comma; and you never assigned a value to $sample_dirtoget. You call opendir, but then you use a glob instead of readdir, so the opendir is unnecessary. There's probably more stuff like that, but you get the idea...
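    For comparison, here is a minimal sketch of how those first few lines might look with the fixes applied: the file name is a quoted string, grep uses a block, and readdir replaces the glob. The helper name text_files_in is hypothetical, and the directory path is just the placeholder from your post. Note that readdir returns bare names, so the -T test needs the directory prepended (a glob, by contrast, hands back full paths).

```perl
use strict;
use warnings;

# Hypothetical helper: return the names of the text files in $dir, sorted.
sub text_files_in {
    my ($dir) = @_;
    opendir( my $dh, $dir ) or die "Cannot open directory $dir: $!";
    # Block form of grep; readdir gives bare names, so build the full path for -T.
    # -T is false for directories, so "." and ".." drop out here too.
    my @names = grep { -T "$dir/$_" } readdir $dh;
    closedir $dh;
    return sort @names;
}

my $accts_file = "accts_file.txt";           # a quoted string, not a bareword
my $dirtoget   = "/my/data/dir/with/files";  # placeholder path from the question
if ( -d $dirtoget ) {
    print "$_\n" for text_files_in($dirtoget);
}
```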

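    Running a back-ticked grep command inside your while loop isn't a good solution: it starts a new grep process for every line of every file, and you aren't doing any error checking on those commands. (Also, inside the back-ticks, $accts_file.txt interpolates a variable named $accts_file, which you never set, followed by the literal text ".txt".) In fact, I think you've lost your train of thought there; this probably isn't doing what you set out to do.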
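    If you do keep calling an external program, a minimal sketch of the kind of error checking meant here is below: run the command with the list form of system (no shell involved when there is more than one argument) and inspect $? afterwards. The helper name run_checked is hypothetical, and the demo runs perl itself ($^X) as a stand-in command. For grep(1), exit code 0 means a match was found, 1 means no match, and anything higher means an error. Even with checking, loading the account numbers into a hash once, as in the script further down, avoids starting a process per line.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command and return its exit code.
# Dies if the command could not be started or was killed by a signal.
sub run_checked {
    my @cmd = @_;
    system(@cmd);    # with more than one argument, no shell is involved
    if ( $? == -1 ) {
        die "failed to execute $cmd[0]: $!\n";
    }
    elsif ( $? & 127 ) {
        die sprintf "%s died with signal %d\n", $cmd[0], $? & 127;
    }
    return $? >> 8;  # the program's own exit code
}

# Stand-in command: a perl one-liner that exits with status 1
my $status = run_checked( $^X, '-e', 'exit 1' );
print "exit code: $status\n";
```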

    And I'm not quite sure I understand what you're trying to do with the %layout hash. Are those the actual file names you are using as hash keys? What if there's a file whose name doesn't match one of those keys? (Why get file names from a glob or readdir if you have the names in a hash?)
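    If the keys really are file names, one thing to watch is that a glob returns full paths, so $layout{$file} would never match a bare key like "file8". A minimal sketch, assuming the keys are bare file names: strip the directory with File::Basename and skip (with a warning) any file that has no entry. The helper offset_for and the sample %layout entries are hypothetical.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical layout table: offset of the 9-digit account number in each file.
my %layout = ( file1 => 0, file8 => 20, file10 => 11 );

# Return the offset for a path, or undef (with a warning) if the file is unknown.
sub offset_for {
    my ( $path, $layout ) = @_;
    my $name = basename($path);    # glob gives full paths; keys are bare names
    unless ( exists $layout->{$name} ) {
        warn "no layout entry for $name, skipping\n";
        return;
    }
    return $layout->{$name};
}

for my $path ( "/my/data/dir/with/files/file8", "/my/data/dir/with/files/other" ) {
    my $offset = offset_for( $path, \%layout );
    next unless defined $offset;
    print "$path: account number starts at offset $offset\n";
}
```

    Checking with exists (rather than just using the value) matters here, because an offset of 0 is a legitimate entry but is false in Perl.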

    If the account number field is always nine digits (bounded by non-digit characters), is it the case that other fields in any given file would never contain exactly nine digits (bounded by non-digit characters)? If so, you could try something like this:

    use strict;
    use warnings;

    # get the list of account numbers we're looking for:
    my %target_acc;
    my $target_file = "accts_file.txt";
    open( TARGS, "<", $target_file ) or die "$target_file: $!";
    while (<TARGS>) {
        chomp;
        $target_acc{$_} = undef;   # keep target account numbers as hash keys
    }
    close TARGS;

    # get the list of files we want to search over:
    my $datadir = "/my/data/dir/with/files";
    opendir( DIR, $datadir ) or die "$datadir: $!";
    my @files = grep { -T "$datadir/$_" } readdir DIR;
    closedir DIR;

    # open each file to be searched, output found lines to
    # a corresponding NEW file in the current directory
    for my $file ( @files ) {
        open( OUT, ">", "NEW_$file" ) or die "NEW_$file: $!";
        if ( open( IN, "<", "$datadir/$file" )) {
            while (<IN>) {
                # a bare "print" would go to STDOUT, so name the OUT handle
                print OUT $_ if (( /^(\d{9})\|/ or /\|(\d{9})[|\r\n]/ )
                                 and exists( $target_acc{$1} ));
            }
            close IN;
        }
        else {
            warn "$datadir/$file: $!\n";
        }
        close OUT;
    }
    That hasn't been tested, but it does compile cleanly with strict and warnings enabled. Instead of using substr or split, I'm just applying a regex to each line of each file being searched, to match a 9-digit string either at the beginning of the line followed by "|", or anywhere in the line preceded by "|" and followed by "|" or a line ending (CR or LF). Then I check whether the matched 9-digit string is one of the numbers of interest, simply by testing for it as a key in the %target_acc hash.
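    To see what those two patterns accept, here is a small sketch that wraps them in a hypothetical helper, account_in, and runs it over a few made-up lines with a made-up %target_acc. A 10-digit field is rejected because the character after the ninth digit is neither "|" nor a line ending.

```perl
use strict;
use warnings;

# Hypothetical helper using the same two patterns as the script above:
# return the 9-digit field at the start of the line (followed by "|"), or one
# preceded by "|" and followed by "|", CR or LF; otherwise return undef.
sub account_in {
    my ($line) = @_;
    return $1 if $line =~ /^(\d{9})\|/;
    return $1 if $line =~ /\|(\d{9})[|\r\n]/;
    return;
}

# Made-up target accounts and sample lines
my %target_acc = map { $_ => undef } qw(123456789 555000111);
for my $line ( "123456789|Smith|2006\n", "Jones|555000111|x\n", "Brown|1234567890|y\n" ) {
    my $acct = account_in($line);
    my $hit  = ( defined $acct and exists $target_acc{$acct} ) ? "match" : "no match";
    chomp( my $shown = $line );
    print "$shown => $hit\n";
}
```

    One case the second pattern misses: if the account number is the last field on a line with no trailing newline (say, the final line of a file), nothing follows the ninth digit. A lookahead such as /\|(\d{9})(?=[|\r\n]|\z)/ would cover that, though it is untested against real data.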