Re: compare two text file line by line, how to optimise

Cleaned up and using a test harness (i.e. some internal test data and printing to stdout), your code looks like this:

#!/usr/bin/perl
use strict;
use warnings;

my $file1 = <<FILE;
chirac prime paris
chirac prime jacques
chirac prime president 
chirac paris france
chirac paris french
FILE
my $file2 = <<FILE;
chirac presidential migration 
chirac presidential paris 
chirac prime president
chirac presidential 007
chirac paris migration 
chirac paris french
FILE

#open my $inA, '<', $ARGV[0] or die "Can't open $ARGV[0]: $!\n";
#open my $inB, '<', $ARGV[1] or die "Can't open $ARGV[1]: $!\n";
open my $inA, '<', \$file1;
open my $inB, '<', \$file2;

#print "bonjour\n";
#print "choose the output file name\n";
#
#chomp(my $fic2 = <STDIN>);
#open my $outFile, '>', $fic2 or die "Can't create $fic2: $!\n";

my @aLines;

while (my $ligne = <$inA>) {
    chomp $ligne;
    push @aLines, lc($ligne);
}

while (my $che = <$inB>) {
    chomp $che;

    my @bWords = split(/\s/, $che);

    foreach my $kh (@aLines) {
        my @aWords = split(/\s/, $kh);
        my $total = 0;
        
        for my $bWord (@bWords) {
            my $matched;
            
            for my $aWord (@aWords) {
                $matched = $bWord eq $aWord;
                last if $matched;
            }

            $total++ if $matched;
        }

        #print the retrieved line
        #print $outFile "$.: $kh\n";
        print "$.: $kh\n";
    }
}

Prints:

1: chirac prime paris
1: chirac prime jacques
1: chirac prime president 
1: chirac paris france
1: chirac paris french
2: chirac prime paris
2: chirac prime jacques
2: chirac prime president 
2: chirac paris france
2: chirac paris french
3: chirac prime paris
3: chirac prime jacques
3: chirac prime president 
3: chirac paris france
3: chirac paris french
4: chirac prime paris
4: chirac prime jacques
4: chirac prime president 
4: chirac paris france
4: chirac paris french
5: chirac prime paris
5: chirac prime jacques
5: chirac prime president 
5: chirac paris france
5: chirac paris french
6: chirac prime paris
6: chirac prime jacques
6: chirac prime president 
6: chirac paris france
6: chirac paris french

which doesn't work as advertised: $total is computed but never tested, so every line of the first file is printed for every line of the second. Making this code generate the wrong answer more quickly is possible, but probably not what you actually want to do!

Maybe you should describe more fully what it is you want to achieve? Are you looking for matching lines (in which case the word matching stuff and the nested loops are bogus), or do you want to match lines that have some minimum number of matching words, or something else? We can't tell unless you tell us. Tell us why you are doing this and we may be able to make better guesses!
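If plain matching lines turn out to be the goal, a hash lookup can replace the nested loops entirely. The following is only a minimal sketch under that assumption: normalise() and common_lines() are hypothetical helper names, and the rule that case and extra whitespace are ignored is a guess at what "matching" should mean here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: lower-case a line, trim it and collapse runs of
# whitespace, so that "chirac prime president " still matches.
sub normalise {
    my ($line) = @_;
    $line = lc $line;
    $line =~ s/^\s+|\s+$//g;
    $line =~ s/\s+/ /g;
    return $line;
}

# Hypothetical helper: return the lines of @$bLines that also occur in
# @$aLines. Each line of either list is looked at once.
sub common_lines {
    my ($aLines, $bLines) = @_;
    my %seen;
    $seen{ normalise($_) } = 1 for @$aLines;
    return grep { $seen{ normalise($_) } } @$bLines;
}

my @fileA = (
    'chirac prime paris',
    'chirac prime jacques',
    'chirac prime president ',
    'chirac paris france',
    'chirac paris french',
);
my @fileB = (
    'chirac presidential migration ',
    'chirac presidential paris ',
    'chirac prime president',
    'chirac presidential 007',
    'chirac paris migration ',
    'chirac paris french',
);

print "$_\n" for common_lines(\@fileA, \@fileB);
```

With the test data from the harness this prints the two lines that appear in both files, and the work grows with the sum of the file sizes rather than their product.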

Premature optimization is the root of all job security


Re^2: compare two text file line by line, how to optimise by thespirit (Novice) on Feb 26, 2016 at 14:26 UTC
Thanks again. What I want exactly is to match lines that have a minimum number of matching words.
Re^2: compare two text file line by line, how to optimise by thespirit (Novice) on Feb 26, 2016 at 11:28 UTC
Thank you for your reply. Your code is well written and concise :) This is what my code does, and the output that I wrote in the post is an error. I tested your code; it is as slow as mine, because it does exactly the same processing. What do you think about storing the second file, or both files, in a hash table?
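For what it's worth, the hash idea can be sketched as a word index over the first file: each word maps to the lines that contain it, so a line from the second file only touches the lines it shares at least one word with. This is an illustrative sketch, not tested against the real data; build_index(), matches_for() and the threshold of 3 are all assumptions made for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build word => [0-based line numbers in file A].
# Each word is recorded at most once per line.
sub build_index {
    my ($lines) = @_;
    my %index;
    for my $i (0 .. $#$lines) {
        my %uniq;
        $uniq{$_} = 1 for split ' ', lc $lines->[$i];
        push @{ $index{$_} }, $i for keys %uniq;
    }
    return \%index;
}

# Hypothetical helper: for one line of file B, count the distinct words
# shared with each line of file A and return, in ascending order, the
# A line numbers that share at least $min words.
sub matches_for {
    my ($index, $line, $min) = @_;
    my (%uniq, %shared);
    $uniq{$_} = 1 for split ' ', lc $line;
    for my $word (keys %uniq) {
        $shared{$_}++ for @{ $index->{$word} || [] };
    }
    return sort { $a <=> $b } grep { $shared{$_} >= $min } keys %shared;
}

my @linesA = (
    'chirac prime paris',
    'chirac prime jacques',
    'chirac prime president',
    'chirac paris france',
    'chirac paris french',
);
my @linesB = (
    'chirac presidential migration',
    'chirac presidential paris',
    'chirac prime president',
    'chirac presidential 007',
    'chirac paris migration',
    'chirac paris french',
);

my $min   = 3;                      # assumed threshold, adjust to taste
my $index = build_index(\@linesA);

for my $n (0 .. $#linesB) {
    for my $m (matches_for($index, $linesB[$n], $min)) {
        printf "%d: %s  <=>  %s\n", $n + 1, $linesB[$n], $linesA[$m];
    }
}
```

With a threshold of 3 only lines 3 and 6 of the second file find a partner in this sample. Note that a word shared by nearly every line (such as "chirac" here) still makes almost every pair a candidate, so very common words may need to be skipped for the index to pay off.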
Re^3: compare two text file line by line, how to optimise by poj (Abbot) on Feb 26, 2016 at 12:27 UTC
"the output that i wrote in the post is an error"

But you haven't shown what the correct output should be so we can only guess what you are trying to do. Here's my guess, matching a combination of words from FIC with lines in FICC

#!/usr/bin/perl
use strict;

my @FIC = ();
#open FIC,'<','fic.txt' or die "$!";
#while (my $line = <FIC>){
#  next unless $line =~ /\S/;
#  my @words = split /\s+/,$line;
#  push @FIC,[ @words ];
#}
#close FIC;

@FIC = (
  [ qw(chirac prime paris)],
  [ qw(chirac prime jacques) ],
  [ qw(chirac prime president) ],
  [ qw(chirac paris france) ],
  [ qw(chirac paris french) ],
);

my $u=0;
open FICC,'<','ficc.txt' or die "$!";
#open OUT, '>','output.txt' or die "$!";
while (my $line = <FICC>){
  ++$u;
  next unless $line =~ /\S/; # skip blank lines
  for my $ar (@FIC){
    my @matched = grep $line=~/$_/,@$ar;
    if (@matched == @$ar){
      print "$u: $line matched all words : @matched\n\n";
      #print OUT "$u: $line matched all words : @matched\n\n";
      last;
    }
  }
}
close FICC;
#close OUT
__DATA__
chirac presidential migration
chirac presidential paris
jacques chirac has been the prime minster and the president
chirac presidential 007
chirac paris migration
chirac aaa french bbb paris ccc

poj
Re^4: compare two text file line by line, how to optimise by thespirit (Novice) on Feb 26, 2016 at 14:23 UTC
Thank you for the reply. I edited the post with the correct output.
Re^5: compare two text file line by line, how to optimise by poj (Abbot) on Feb 27, 2016 at 13:30 UTC