in reply to Re^3: how to check if a particular value exist in list of files stored in an array
in thread how to check if a particular value exist in list of files stored in an array

Hi Experts,

I have a file called condition_file, and the data in this file looks like this:

condition_file
--------------
K 01
J 02
H 03
I 04

I am using the code below to read condition_file and place the values into a hash:

open F, "$condition_file" or die "File not exists";
while (<F>) {
    chomp;
    $records{$field1} = $field1;
}
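
For reference, a minimal sketch of loading such a condition file into a hash that maps field1 to field2 (for example K => 01), which is what the later comparison seems to rely on. It assumes one whitespace-separated pair per line; `load_conditions` is a hypothetical name, not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: read a condition file and return a hash
# mapping field1 => field2 (e.g. K => 01).
# Assumes one whitespace-separated pair per line.
sub load_conditions {
    my ($path) = @_;
    my %records;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        my ( $field1, $field2 ) = split ' ', $line;
        next unless defined $field2;    # skip blank or malformed lines
        $records{$field1} = $field2;
    }
    close $fh;
    return %records;
}
```

Because the hash is keyed on field1 alone, a later line with the same field1 would overwrite an earlier one.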

I am reading a list of files into an array:

@array
--------
file-1
file-2
file-3
file-4

The content of each file:

file-1
-------
K,01,Europe,Sweden,Mace,rank1,check,01234
J,02,Australia,Sydney,Syd,rank2,chek1,01234
K,01,China,chen,mar,rank4,chack,11234
J,02,japan,Syin,yhk,ranek,chek2,21234

file-2
-------
H,03,German,Ger,hgtk,rank4,hekc,1245
I,04,Negria,neg,ghsjk,rankk1,jusk,4562
K,01,Europe1,Sweden4,Mace1,rank15,check1,12234
K,02,Europe2,Sweden3,Mace2,rank14,check2,21234

file-3
-------
H,03,German2,Ger,hgtk,rank4,hekc,1245
I,04,Negria2,neg,ghsjk,rankk1,jusk,4562
K,11,Europe5,Sweden6,Mace3,rank16,check11,42234

file-4
-------
H,16,German2,Ger,hgtk,rank4,hekc,1245
I,17,Negria2,neg,ghsjk,rankk1,jusk,4562
K,11,Europe5,Sweden6,Mace3,rank16,check11,42234

I need to see if the first field in condition_file exists in any of the files available in @array

foreach my $file_name (@array) {
    open FILE, "$file_name" or die "File not exists";
    while ( chomp( my ( $field1, $field2 ) = ( split /\,/, <FILE> ) ) ) {
        if (exists $records{$field1}) {
            $field-2 = $records{$field1};
            if $field2 = $field-2; {
                print OUTPUT ( ( join ",", $field1, $field2 ), "\n" );
            }
        }
    }
}

The code works like this: it reads each file (for example file-1), takes the first two fields (field1 --> K and field2 --> 01), and checks whether field1 exists in the records hash. If it does, it assigns the stored value to field-2 and then checks whether field2 and field-2 are equal; if they are, it prints the values to OUTPUT. This writes the following to the output text file.

The output I got:

output file
-------------
K,01,
J,02,
K,01,
J,02,
H,03,
I,04,
K,01,
H,03,
I,04,

The output I am expecting:

output file
-------------
K,01,Europe,Sweden,Mace,rank1,check,01234
J,02,Australia,Sydney,Syd,rank2,chek1,01234
K,01,China,chen,mar,rank4,chack,11234
J,02,japan,Syin,yhk,ranek,chek2,21234
H,03,German,Ger,hgtk,rank4,hekc,1245
I,04,Negria,neg,ghsjk,rankk1,jusk,4562
K,01,Europe1,Sweden4,Mace1,rank15,check1,12234
H,03,German2,Ger,hgtk,rank4,hekc,1245
I,04,Negria2,neg,ghsjk,rankk1,jusk,4562

The above logic takes almost 4 hours to fetch the data from very large files.

Re^5: how to check if a particular value exist in list of files stored in an array
by hdb (Monsignor) on Oct 30, 2013 at 07:12 UTC

    A few comments:

    • $field-2 is not a valid variable name, so I have replaced it with $field_2.
    • In order to get your complete output you need to store the full line of data, so I have changed the while statement, see below.
    • You do not need chomp as you are only looking at the first two fields. When $line is printed later, the newline is desired anyway.
    • $field2 = $field-2 is an assignment, not a comparison. Use eq to compare strings, see below.

    foreach my $file_name (@array) {
        open FILE, "$file_name" or die "File not exists";
        while ( my $line = <FILE> ) {
            my ( $field1, $field2 ) = split /\,/, $line;
            if (exists $records{$field1}) {
                $field_2 = $records{$field1};
                if( $field2 eq $field_2 ) {
                    print OUTPUT $line;
                }
            }
        }
    }
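
To illustrate the comment about eq, here is a small sketch of how string and numeric comparison can disagree on values like the ones in these files. The sample values are made up for the illustration.

```perl
use strict;
use warnings;

my $field2  = '01';
my $field_2 = '1';

# String comparison: the two values differ character by character.
my $same_string = ( $field2 eq $field_2 ) ? 1 : 0;

# Numeric comparison: both strings convert to the number 1.
my $same_number = ( $field2 == $field_2 ) ? 1 : 0;

print "eq: $same_string, ==: $same_number\n";    # prints "eq: 0, ==: 1"
```

A single = in the condition would instead copy the right-hand value into $field2 and test the copied value for truth, so the test would pass for any non-empty value other than "0".
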
      Hi Experts,

      The code I am using takes a long time to check each file and print the matching lines to the output file.

      Each file contains millions of records, and field1 from the condition file has to be searched for in every file in @filelist. There may be anywhere from 1 to 100 files, and each file is several GB in size.

      foreach my $file_name (@array) {
          open FILE, "$file_name" or die "File not exists";
          while ( my $line = <FILE> ) {
              my ( $field1, $field2 ) = split /\,/, $line;
              if (exists $records{$field1}) {
                  $field_2 = $records{$field1};
                  if( $field2 eq $field_2 ) {
                      print OUTPUT $line;
                  }
              }
          }
      }

      Please help with this.

        I would expect that checking several hundred GBytes takes some time...

        Update: If you store the information from your condition file as "field1,field2" => 1, i.e. in the same format that you encounter in your large files, you can rewrite it as:

        foreach my $file_name (@array) {
            open FILE, "$file_name" or die "File not exists";
            while ( <FILE> ) {
                print OUTPUT $_ if /^(.*?,.*?),/ and exists $records{$1};
            }
        }

        Maybe this is faster...
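
A self-contained sketch of the combined-key idea, for anyone who wants to try it. It assumes the condition file holds one whitespace-separated pair per line; `load_condition_keys` and `filter_lines` are hypothetical names. The regex uses `[^,]*` in place of `.*?`, which should capture the same text on these lines. Whether this is actually faster on the real data has not been measured.

```perl
use strict;
use warnings;

# Hypothetical helper: build the lookup hash in the
# "field1,field2" => 1 format described above.
# Assumes one whitespace-separated pair per line.
sub load_condition_keys {
    my ($path) = @_;
    my %records;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    while ( my $line = <$fh> ) {
        my ( $field1, $field2 ) = split ' ', $line;
        next unless defined $field2;    # skip blank or malformed lines
        $records{"$field1,$field2"} = 1;
    }
    close $fh;
    return \%records;
}

# Hypothetical helper: copy matching lines from an input handle
# to an output handle.
sub filter_lines {
    my ( $in, $out, $records ) = @_;
    while ( my $line = <$in> ) {
        # Capture everything before the second comma, e.g. "K,01".
        print {$out} $line
            if $line =~ /^([^,]*,[^,]*),/ and exists $records->{$1};
    }
    return;
}
```

With this layout each line costs one regex match and one hash lookup, and a line such as K,02,... is rejected because "K,02" is not a key in the hash.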