
I don't understand the purpose of sub buildTestFile. Can you please explain?

GrandFather has posted an SSCCE (a short, self-contained, correct example), which is the best way to illustrate a problem in code. Rather than distributing many megabytes of input data (which would have been rather impolite), the SSCCE builds its test files on the fly. That is what buildTestFile does.

What is the purpose of %words, which we don't use in any other part of the code?

Using the hash forces uniqueness, because hash keys are unique by definition: storing the same word twice simply overwrites the existing key.
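As a minimal sketch of that property (the sub name unique_words and the sample word list are made up for illustration and are not taken from GrandFather's code), duplicates collapse as soon as the words are used as hash keys:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the distinct words from a list.
# Each word becomes a hash key; a repeated word overwrites the
# existing key instead of adding a second one.
sub unique_words {
    my %words;
    $words{$_} = 1 for @_;
    return sort keys %words;
}

# Made-up sample data with repeats.
my @unique = unique_words(qw(apple pear apple fig pear apple));
print scalar(@unique), " unique words: @unique\n";
# prints "3 unique words: apple fig pear"
```

The values stored in the hash do not matter here; only the keys are used, which is why a hash like %words can look unused elsewhere in the code.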

Re^4: compare two text file line by line, how to optimise
by thespirit (Novice) on Feb 28, 2016 at 11:09 UTC
    Hi. But I don't want to test! I have data, and I am looking for real results, not random output. When I remove buildTestFile and just use the rest of the code, it takes days to process 50 MB of data, not the 3 minutes that was stated. This code is also slow, like all the others, on my computer with 2 GB of RAM :( Regards

      Run this simple program, which does minimal processing, against your data and post the results. This will help eliminate one potential source of your problem (I/O) and give a better indication of your data than just a size of 50 MB.

      #!/usr/bin/perl
      use strict;

      my $t0 = time;
      my $file1 = $ARGV[0] || 'ficc.txt';
      my $file2 = $ARGV[1] || 'fic.txt';

      # count lines and whitespace-separated words in the first file
      my $count1=0; my $words1=0;
      open FICC,'<',$file1 or die "$file1 : $!";
      while (<FICC>) {
        my @words = split /\s+/,lc $_;
        $words1 += @words;
        ++$count1;
      }
      close FICC;

      # same again for the second file
      my $count2=0; my $words2=0;
      open FIC,'<',$file2 or die "$file2 : $!";
      while (<FIC>) {
        my @words = split /\s+/,lc $_;
        $words2 += @words;
        ++$count2;
      }
      close FIC;

      # elapsed wall-clock time in whole seconds
      my $dur = int(time - $t0);
      print "\n File1 : $count1 lines $words1 words in $file1\n",
            " File2 : $count2 lines $words2 words in $file2\n",
            " Time : $dur seconds\n";
      poj
        Hi,

        File1 : 3874004 lines 6050371 words in file1
        File2 : 4305242 lines 6457863 words in file2
        Time : 33 seconds

        Thanks