Perlseeker_1 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Experts,

I am trying to read data from a bunch of large files, based on whether a key element exists in another file.

For example, I have a file called file - 1, which has two fields, name and number, as below:

file - 1

K 1
J 2
L 3
H 4

The bunch of large files contains data as below. I have shown only one file, but we may have n files:

file 01

K 01 JK 1234
J 02 HJ 5678
L 03 JH 6789
H 04 IH 5467

I am reading all the file names and storing them in an array called @filelist:

@filelist
---------
file 01
file 02
file 03
file 04
file 05
file 06

I am reading each element from file - 1 and checking whether it exists in any of the files available in @filelist. If it exists, I need to print the value of that element to a text file.

For example, when I read file - 1, the first element is K. Take K and search whether it exists in any of the files available in @filelist. If, for example, the value of K is available in file 01, print K 01 JK 1234 to a text file called output.txt.

My code

my @array;
open F, "$input_file" or die "Couldn't Open File: $!";
while (<F>) {
    chomp;
    my ( $name, $num ) = split / /, $_;
    push (@array, $name);
}
foreach my $name (@array) {
}

Can anybody please help me with the syntax?

Replies are listed 'Best First'.
Re: how to check if a particular value exist in list of files stored in an array
by baxy77bax (Deacon) on Oct 29, 2013 at 10:41 UTC
    Well, here is a quick solution, but be aware there is more than one way to do it:
    use strict;

    my @filelist = ("f1");

    while (<DATA>) {
        chomp;
        /^([^\t]*)\t/ or next;    # I assume it is tab separated
        my $rec = $1;
        foreach my $file (@filelist) {
            my $entry = qx(grep -P "^$rec\t" $file);    # I assume it is tab separated
            print "$rec\t$file\t$entry";
        }
    }
    __DATA__
    K	1
    J	2
    L	3
    H	4
    The code is a bit redundant, but I'll leave it up to you to clean it up. You can simply redirect the output to a third file with "perl program.pl > output" or use:
    open (OUT, ">", $myoutfile) || die "$!";
    cheers

    baxy

Re: how to check if a particular value exist in list of files stored in an array
by hdb (Monsignor) on Oct 29, 2013 at 11:36 UTC

    Your approach is one good way to do it if you make a few minor modifications. Read your first file into a hash instead of an array for easier lookup:

    my %records;
    open my $fh, "<", $file1;
    /^(\S+)\s/ and $records{$1}=1 while <$fh>;
    close $fh;

    I am using a regular expression to find the first column in your file which seems to be simpler here and can also be used as a test. For example, empty lines will be skipped automatically this way.
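    To illustrate the point about the regular expression doubling as a test, here is a minimal sketch. The sample lines in @lines are made up for the illustration; only the pattern is taken from the code above.

```perl
use strict;
use warnings;

# Hypothetical sample input: two records, an empty line and a blank line.
my @lines = ( "K 1\n", "\n", "J 2\n", "   \n" );

my @keys;
for my $line (@lines) {
    # The pattern only matches when the line starts with a non-blank field
    # followed by whitespace, so empty and blank lines are skipped.
    push @keys, $1 if $line =~ /^(\S+)\s/;
}

print "@keys\n";    # prints "K J"
```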

    Then open each of your other files in turn and check whether a given line shall be printed by doing a lookup in the hash:

    for my $file (@filelist) {
        open my $fh01, "<", $file;
        /^(\S+)\s/ and defined $records{$1} and print while <$fh01>;
        close $fh01;
    }

    I have skipped all checks whether files can be opened etc. You need to add those yourself.
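    As a rough sketch of what the same two steps might look like with the checks added: the helper names read_keys and print_matches are invented for this example, and writing to a separate output file is an assumption based on the output.txt mentioned in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref keyed by the first column of $key_file.
sub read_keys {
    my ($key_file) = @_;
    my %records;
    open my $fh, '<', $key_file or die "Cannot open '$key_file': $!";
    while ( my $line = <$fh> ) {
        # Lines without a leading non-blank field (e.g. empty lines) are skipped.
        $records{$1} = 1 if $line =~ /^(\S+)\s/;
    }
    close $fh;
    return \%records;
}

# Hypothetical helper: copies every line whose first column is a known key
# from the files in @filelist to $out_file.
sub print_matches {
    my ( $records, $out_file, @filelist ) = @_;
    open my $out, '>', $out_file or die "Cannot open '$out_file': $!";
    for my $file (@filelist) {
        open my $in, '<', $file or die "Cannot open '$file': $!";
        while ( my $line = <$in> ) {
            print {$out} $line
                if $line =~ /^(\S+)\s/ and exists $records->{$1};
        }
        close $in;
    }
    close $out or die "Cannot close '$out_file': $!";
    return;
}
```

    It could then be called as print_matches( read_keys($file1), "output.txt", @filelist ); which prints a line once for every occurrence of a key in the big files.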

      hdb,
      Defensive programming aside, I believe that this may not completely satisfy the requirements. Perlseeker_1 wasn't clear about whether the value should be printed once for each time it appears in the files, whether it can appear more than once, etc. I could see it being one of at least 3 choices:
      • Print once for each instance in the big files
      • Print once per big file if it appears (can still be multiple)
      • Print exactly once if it appears in any of the big files
      Assuming option 3, short-circuiting would speed things up, provided the real-world case isn't also the worst case.
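      A minimal sketch of that short-circuit for option 3, assuming the keys are already in a hash as in hdb's code. The helper name first_matches is hypothetical, and returning the lines as a list (rather than printing them) is just a choice made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first matching line for each key and
# stops reading as soon as every key has been found once.
sub first_matches {
    my ( $wanted, @filelist ) = @_;    # $wanted: hash ref of keys to look for
    my %pending = %$wanted;            # copy, so the caller's hash is untouched
    my @found;
    FILE: for my $file (@filelist) {
        open my $in, '<', $file or die "Cannot open '$file': $!";
        while ( my $line = <$in> ) {
            next unless $line =~ /^(\S+)\s/ and exists $pending{$1};
            push @found, $line;
            delete $pending{$1};          # never report this key again
            last FILE unless %pending;    # nothing left to find: stop reading
        }
        close $in;
    }
    return @found;
}
```

      In the worst case (some key never appears, or only at the very end of the last file) every file is still read in full, so the gain depends entirely on the data.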

      Cheers - L~R

        Agree. But I hope that my proposal will help as a starting point even if it does not meet all the requirements.