in reply to how to check if a particular value exist in list of files stored in an array

Your approach is one good way to do it if you make a few minor modifications. Read your first file into a hash instead of an array, so that each lookup is a single hash access instead of a scan through the array:

my %records;
open my $fh, "<", $file1;
/^(\S+)\s/ and $records{$1} = 1 while <$fh>;
close $fh;

I am using a regular expression to find the first column of each line. That seems simplest here, and the match doubles as a test: for example, empty lines are skipped automatically because they do not match.
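To see the "match doubles as a test" effect in isolation, here is a small sketch that runs the same one-liner against made-up sample data. An in-memory filehandle (open on a scalar reference) stands in for $file1, so nothing on disk is needed. One caveat: because the pattern requires whitespace after the first column, a line holding a single word with no trailing newline (possible on the last line of a file) would also be skipped.

```perl
use strict;
use warnings;

# Hypothetical sample contents of $file1. An in-memory filehandle
# (open on a scalar reference) stands in for the real file here.
my $sample = "K 01\n\nJ 02\n   \nH 03\n";

my %records;
open my $fh, "<", \$sample or die "Cannot open in-memory file: $!";
# Only lines that start with a non-space run followed by whitespace match;
# empty or all-blank lines fail the match, so nothing is stored for them.
/^(\S+)\s/ and $records{$1} = 1 while <$fh>;
close $fh;

print join(",", sort keys %records), "\n";    # prints "H,J,K"
```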

Then open each of your other files in turn and decide whether a given line should be printed by looking up its first column in the hash:

for my $file (@filelist) {
    open my $fh01, "<", $file;
    /^(\S+)\s/ and defined $records{$1} and print while <$fh01>;
    close $fh01;
}

I have skipped all checks on whether the files can be opened, etc. You need to add those yourself.
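Here is one way those checks could look, as a sketch: the same two steps wrapped in a hypothetical matching_lines sub that dies with the system error ($!) when a file cannot be opened or closed. It returns the matching lines instead of printing them so the result is easy to inspect, and the input files are throwaway ones made with the core File::Temp module purely for the demo.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Same two steps as above, with the open/close checks added. Hypothetical
# helper: returns the matching lines instead of printing them.
sub matching_lines {
    my ($file1, @filelist) = @_;
    local $_;    # the while modifiers below assign to $_

    my %records;
    open my $fh, "<", $file1 or die "Cannot open '$file1': $!";
    /^(\S+)\s/ and $records{$1} = 1 while <$fh>;
    close $fh or die "Cannot close '$file1': $!";

    my @hits;
    for my $file (@filelist) {
        open my $fh01, "<", $file or die "Cannot open '$file': $!";
        /^(\S+)\s/ and exists $records{$1} and push @hits, $_ while <$fh01>;
        close $fh01 or die "Cannot close '$file': $!";
    }
    return @hits;
}

# Writes $content to a throwaway file (deleted at exit) and returns its name.
sub temp_file_with {
    my ($content) = @_;
    my ($tfh, $name) = tempfile(UNLINK => 1);
    print {$tfh} $content;
    close $tfh or die "Cannot close '$name': $!";
    return $name;
}

my $keys = temp_file_with("K 01\nH 03\n");
my $big1 = temp_file_with("K 01 Europe\nJ 02 Australia\n");
my $big2 = temp_file_with("H 03 German\nI 04 Negria\n");

my @hits = matching_lines($keys, $big1, $big2);
print @hits;    # prints "K 01 Europe" and "H 03 German", one per line
```

Replacing push @hits, $_ with print gives back the original print-as-you-go behavior.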

Re^2: how to check if a particular value exist in list of files stored in an array
by Limbic~Region (Chancellor) on Oct 29, 2013 at 11:57 UTC
    hdb,
    Defensive programming aside, I believe this may not completely satisfy the requirements. Perlseeker_1 wasn't clear whether a value should be printed once for each time it appears in the files, whether it can appear more than once, etc. I could see it being one of at least three choices:
    • Print once for each instance in the big files
    • Print once per big file if it appears (can still be multiple)
    • Print exactly once if it appears in any of the big files
    Assuming option 3, short-circuiting (stopping as soon as every value has been found) would speed things up, provided the real-world case isn't also the worst case.
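A minimal sketch of option 3 with that short-circuit, assuming the whitespace-separated layout from hdb's code: each key is removed from a copy of the lookup hash once it has been reported, and reading stops as soon as the copy is empty. The name first_occurrences is made up, and it takes already-open filehandles (in-memory ones in the demo) so that it runs standalone.

```perl
use strict;
use warnings;

# Option 3 with a short-circuit: each key is reported at most once across all
# files, and reading stops as soon as every key has been found.
# first_occurrences is a hypothetical name; it expects already-open handles.
sub first_occurrences {
    my ($records, @handles) = @_;   # $records: hash ref of the keys to look for
    my %wanted = %$records;         # work on a copy; the caller's hash is untouched
    my @found;
    FILE: for my $fh (@handles) {
        last FILE unless %wanted;   # everything found: skip the remaining files
        while (my $line = <$fh>) {
            next unless $line =~ /^(\S+)\s/ and exists $wanted{$1};
            delete $wanted{$1};     # report each key only for its first occurrence
            push @found, $line;
            last FILE unless %wanted;
        }
    }
    return @found;
}

# Made-up sample data read through in-memory filehandles.
my $in1 = "K 01 Europe\nJ 02 Australia\nK 01 China\n";
my $in2 = "H 03 German\nJ 02 Japan\n";
open my $h1, "<", \$in1 or die "Cannot open in-memory file: $!";
open my $h2, "<", \$in2 or die "Cannot open in-memory file: $!";

my @found = first_occurrences({ K => 1, J => 1 }, $h1, $h2);
print @found;    # "K 01 Europe" and "J 02 Australia"; $h2 is never read
```

Option 1 needs no short-circuit at all (it is hdb's loop as is), and option 2 would refill the copy of the hash at the start of each file.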

    Cheers - L~R

      Agreed. But I hope my proposal helps as a starting point, even if it does not meet all the requirements.

        Hi Experts,

        I have a file called condition_file, and the data in it looks like this:

        condition_file
        K 01
        J 02
        H 03
        I 04

        I am using the code below to read condition_file and place the values into a hash:

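        open F, "<", $condition_file or die "Cannot open $condition_file: $!";
        while (<F>) {
            chomp;
            my ($field1, $field2) = split;    # e.g. "K 01" -> ("K", "01")
            $records{$field1} = $field2;      # first field => second field
        }
        close F;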

        I am reading a list of files into an array:

        @array
        --------
        file-1
        file-2
        file-3
        file-4

        The contents of each file:

        file-1
        -------
        K,01,Europe,Sweden,Mace,rank1,check,01234
        J,02,Australia,Sydney,Syd,rank2,chek1,01234
        K,01,China,chen,mar,rank4,chack,11234
        J,02,japan,Syin,yhk,ranek,chek2,21234

        file-2
        -------
        H,03,German,Ger,hgtk,rank4,hekc,1245
        I,04,Negria,neg,ghsjk,rankk1,jusk,4562
        K,01,Europe1,Sweden4,Mace1,rank15,check1,12234
        K,02,Europe2,Sweden3,Mace2,rank14,check2,21234

        file-3
        -------
        H,03,German2,Ger,hgtk,rank4,hekc,1245
        I,04,Negria2,neg,ghsjk,rankk1,jusk,4562
        K,11,Europe5,Sweden6,Mace3,rank16,check11,42234

        file-4
        -------
        H,16,German2,Ger,hgtk,rank4,hekc,1245
        I,17,Negria2,neg,ghsjk,rankk1,jusk,4562
        K,11,Europe5,Sweden6,Mace3,rank16,check11,42234

        I need to check whether the first field of each line in condition_file exists in any of the files listed in @array.

        foreach my $file_name (@array) {
            open FILE, "<", $file_name or die "Cannot open $file_name: $!";
            while (my $line = <FILE>) {
                chomp $line;
                my ($field1, $field2) = split /,/, $line;
                # print only if field1 is a key in %records and field2 equals its stored value
                if (exists $records{$field1} and $field2 eq $records{$field1}) {
                    print OUTPUT join(",", $field1, $field2), "\n";    # OUTPUT is opened elsewhere
                }
            }
            close FILE;
        }

        The code works like this: it reads each file (for example file-1), takes the first two fields (field1 --> K and field2 --> 01), and checks whether field1 exists in the %records hash. If it does, it compares field2 with the value stored for field1 in %records, and if they are equal, it prints the two fields to OUTPUT. This writes the following to the output text file:

        Output I got:

        output file
        -------------
        K,01,
        J,02,
        K,01,
        J,02,
        H,03,
        I,04,
        K,01,
        H,03,
        I,04,

        Output I am expecting:

        output file
        -------------
        K,01,Europe,Sweden,Mace,rank1,check,01234
        J,02,Australia,Sydney,Syd,rank2,chek1,01234
        K,01,China,chen,mar,rank4,chack,11234
        J,02,japan,Syin,yhk,ranek,chek2,21234
        H,03,German,Ger,hgtk,rank4,hekc,1245
        I,04,Negria,neg,ghsjk,rankk1,jusk,4562
        K,01,Europe1,Sweden4,Mace1,rank15,check1,12234
        H,03,German2,Ger,hgtk,rank4,hekc,1245
        I,04,Negria2,neg,ghsjk,rankk1,jusk,4562

        The above logic takes almost 4 hours to fetch the data from the large files.
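For reference, here is a sketch (not taken from the thread) of one way to get the expected output: store each condition as a "first,second" pair, split off only the first two fields of each data line, and keep the original line instead of re-joining the two fields, which is why the output above shows only two fields. It assumes whitespace-separated condition_file lines and comma-separated data files, as in the samples; load_conditions and matching_records are hypothetical names, and in-memory filehandles replace the real files. Each line costs one limited split and one hash lookup, so run time should grow linearly with total input size, but whether that alone removes the 4-hour run time depends on where the time actually goes, which the post does not show.

```perl
use strict;
use warnings;

# Build a set of wanted "first,second" pairs from condition_file lines such as
# "K 01" (assumes whitespace-separated fields, as in the sample above).
sub load_conditions {
    my ($fh) = @_;
    my %wanted;
    while (my $line = <$fh>) {
        my ($field1, $field2) = split ' ', $line;
        next unless defined $field2;          # skip blank or one-field lines
        $wanted{"$field1,$field2"} = 1;
    }
    return \%wanted;
}

# Return every whole line whose first two comma-separated fields form a wanted
# pair. The limit of 3 stops split after the second field, and the original
# line (not a re-join of the two fields) is what gets kept.
sub matching_records {
    my ($wanted, @handles) = @_;
    my @out;
    for my $fh (@handles) {
        while (my $line = <$fh>) {
            chomp $line;
            my ($field1, $field2) = split /,/, $line, 3;
            next unless defined $field2;
            push @out, "$line\n" if exists $wanted->{"$field1,$field2"};
        }
    }
    return @out;
}

# Sample data copied from the post; in-memory filehandles stand in for the
# real condition_file and file-1 .. file-4.
my $conditions = "K 01\nJ 02\nH 03\nI 04\n";
my @contents = (
    "K,01,Europe,Sweden,Mace,rank1,check,01234\n"
  . "J,02,Australia,Sydney,Syd,rank2,chek1,01234\n"
  . "K,01,China,chen,mar,rank4,chack,11234\n"
  . "J,02,japan,Syin,yhk,ranek,chek2,21234\n",
    "H,03,German,Ger,hgtk,rank4,hekc,1245\n"
  . "I,04,Negria,neg,ghsjk,rankk1,jusk,4562\n"
  . "K,01,Europe1,Sweden4,Mace1,rank15,check1,12234\n"
  . "K,02,Europe2,Sweden3,Mace2,rank14,check2,21234\n",
    "H,03,German2,Ger,hgtk,rank4,hekc,1245\n"
  . "I,04,Negria2,neg,ghsjk,rankk1,jusk,4562\n"
  . "K,11,Europe5,Sweden6,Mace3,rank16,check11,42234\n",
    "H,16,German2,Ger,hgtk,rank4,hekc,1245\n"
  . "I,17,Negria2,neg,ghsjk,rankk1,jusk,4562\n"
  . "K,11,Europe5,Sweden6,Mace3,rank16,check11,42234\n",
);

open my $cond_fh, "<", \$conditions or die "Cannot open conditions: $!";
my $wanted = load_conditions($cond_fh);
my @handles = map {
    open my $h, "<", \$_ or die "Cannot open in-memory file: $!";
    $h;
} @contents;

my @result = matching_records($wanted, @handles);
print @result;    # prints the nine lines of the expected output above
```

Keying on the pair rather than on the first field alone also keeps working if condition_file ever lists the same first field with more than one second value.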