
Hi Experts,

The code I used takes a long time to check each file and print the matching lines to the output file.

Each file contains millions of records. I take field1 from the condition file and search all the files in @filelist. There may be anywhere from 1 to 100 files, and each file is gigabytes in size.

foreach my $file_name (@array) {
    open my $fh, '<', $file_name or die "Cannot open $file_name: $!";
    while ( my $line = <$fh> ) {
        my ( $field1, $field2 ) = split /,/, $line;
        # print the line only if field1 is known and field2 matches its stored value
        if ( exists $records{$field1} and $field2 eq $records{$field1} ) {
            print OUTPUT $line;
        }
    }
    close $fh;
}

Please help with this.

Re^7: how to check if a particular value exist in list of files stored in an array
by hdb (Monsignor) on Oct 30, 2013 at 08:59 UTC

    I would expect that checking several hundred gigabytes takes some time...

    Update: If you already store the contents of your condition file as "field1,field2" => 1, i.e. in the same format that appears in your large files, you can rewrite the loop as:

    foreach my $file_name (@array) {
        open my $fh, '<', $file_name or die "Cannot open $file_name: $!";
        while ( <$fh> ) {
            # $1 is "field1,field2", the same form as the keys of %records
            print OUTPUT $_ if /^(.*?,.*?),/ and exists $records{$1};
        }
        close $fh;
    }

    Maybe this is faster...
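    For the rewrite above to work, %records has to be keyed on "field1,field2". A minimal sketch of how that hash could be built, assuming the condition file holds whitespace-separated pairs such as "K 01"; the helper names load_conditions and line_matches are hypothetical and not part of the original code:

```perl
use strict;
use warnings;

# Build a lookup hash keyed on "field1,field2" from condition-file lines
# such as "K 01". load_conditions() is a hypothetical helper name.
sub load_conditions {
    my ($fh) = @_;
    my %records;
    while ( my $line = <$fh> ) {
        my ( $field1, $field2 ) = split ' ', $line;    # split on whitespace
        next unless defined $field2;                   # skip blank or short lines
        $records{"$field1,$field2"} = 1;
    }
    return \%records;
}

# Return 1 if the first two comma-separated fields of $line form a known key,
# 0 otherwise. [^,]* avoids backtracking compared with .*? on long lines.
sub line_matches {
    my ( $records, $line ) = @_;
    return ( $line =~ /^([^,]*,[^,]*),/ && exists $records->{$1} ) ? 1 : 0;
}
```

    With this, the inner loop reduces to one regex match and one hash lookup per line, e.g. print OUTPUT $line if line_matches( $records, $line ). Whether that is measurably faster than the split version depends on how much of the run time is disk I/O.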

      Hi Experts,

      As mentioned in my previous posts, I have a condition file which contains data like this:

      condition-file
      ---------------
      K 01
      J 02
      I 03
      H 04

      I have input files whose names are stored in an array called @filelist, which may contain file1, file2, ..., filen. Each file contains records like this:

      file1
      -------
      K,01,data,japan,jung,kcika,Kisjl,01234
      J,02,thaja,China,cunk,Jksb,Yjski,23456

      I need to take the first element from the condition file and check whether it exists in every file in @filelist. For example, I take the first element from the condition file, K, and check whether it exists in each file in @filelist: check file1 and, if it exists, print the line to the output text file; then check file2, and so on up to filen.

      My existing code works fine but takes a very long time, from 30 minutes to more than 1 hour. I also tried the following Perl script:

      foreach my $file_name (@array) {
          open my $fh, '<', $file_name or die "Cannot open $file_name: $!";
          while ( my $line = <$fh> ) {
              my ( $field1, $field2 ) = split /,/, $line;
              # print the line only if field1 is known and field2 matches its stored value
              if ( exists $records{$field1} and $field2 eq $records{$field1} ) {
                  print OUTPUT $line;
              }
          }
          close $fh;
      }

      I was wondering if someone could suggest a solution.

        Have you checked how long it takes to fetch the data without doing anything else?

        You can't get faster than the time it takes you to fetch the data.
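
        One way to get that baseline is to time a loop that only reads the file and does nothing else per line. A rough sketch, where time_raw_read is a hypothetical helper name and the result will vary with disk caching:

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Read a file line by line without doing any work per line and return
# ( line count, elapsed seconds ). time_raw_read() is a hypothetical name.
sub time_raw_read {
    my ($file_name) = @_;
    open my $fh, '<', $file_name or die "Cannot open $file_name: $!";
    my $start = time();
    my $lines = 0;
    while ( my $line = <$fh> ) {
        $lines++;
    }
    close $fh;
    return ( $lines, time() - $start );
}
```

        Calling my ( $lines, $seconds ) = time_raw_read($file_name) for each file in the list and comparing the total with the run time of the full script shows how much of the 30 to 60 minutes is pure I/O. If the two numbers are close, tuning the matching code will not help much.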

        Maybe you can be smarter and find out that you don't need to read all the data.

        • For example, if you can guess which file is most likely to contain the data, start with that file.
        • Or, if you need to search the same files multiple times, it may prove fruitful to create an index into the files that tells you where what data lives.
        • Or it might be fruitful to sort your large files so that you can very quickly check whether data is available.
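
        The index idea could look roughly like the sketch below. It assumes the key is the first two comma-separated fields and that the index fits in memory; build_index and fetch_lines are hypothetical helper names:

```perl
use strict;
use warnings;

# Map each "field1,field2" key to the byte offsets of the lines that start
# with it. build_index() and fetch_lines() are hypothetical helper names.
sub build_index {
    my ($fh) = @_;
    my %index;
    my $offset = tell($fh);
    while ( my $line = <$fh> ) {
        push @{ $index{$1} }, $offset if $line =~ /^([^,]*,[^,]*),/;
        $offset = tell($fh);    # start of the next line
    }
    return \%index;
}

# Seek straight to the recorded offsets and return the matching lines.
sub fetch_lines {
    my ( $fh, $index, $key ) = @_;
    my @lines;
    for my $offset ( @{ $index->{$key} || [] } ) {
        seek( $fh, $offset, 0 ) or die "seek failed: $!";
        push @lines, scalar <$fh>;
    }
    return @lines;
}
```

        Building the index still costs one full pass over each file, so it only pays off when the same files are searched repeatedly. In that case the index would have to be saved between runs, for example with the core Storable module, and with millions of records per file its size may itself become a concern.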

        Only you with your knowledge about the files and the data can decide whether you need to read through all files to find the entries or not.
