in reply to Re: compare two text file line by line, how to optimise

Thank you for your reply. Your code is well written and concise :) This is what my code does, and the output that I wrote in the post is an error. I tested your code; it is as slow as mine, because it does exactly the same processing. What do you think about storing the second file, or both files, in a hash table?

Re^3: compare two text file line by line, how to optimise
by poj (Abbot) on Feb 26, 2016 at 12:27 UTC
    the output that i wrote in the post is an error

    But you haven't shown what the correct output should be, so we can only guess what you are trying to do. Here's my guess: matching a combination of words from FIC with lines in FICC.

    #!/usr/bin/perl
    use strict;

    my @FIC = ();
    #open FIC,'<','fic.txt' or die "$!";
    #while (my $line = <FIC>){
    #  next unless $line =~ /\S/;
    #  my @words = split /\s+/,$line;
    #  push @FIC,[ @words ];
    #}
    #close FIC;

    @FIC = (
      [ qw(chirac prime paris)],
      [ qw(chirac prime jacques) ],
      [ qw(chirac prime president) ],
      [ qw(chirac paris france) ],
      [ qw(chirac paris french) ],
    );

    my $u=0;
    open FICC,'<','ficc.txt' or die "$!";
    #open OUT, '>','output.txt' or die "$!";
    while (my $line = <FICC>){
      ++$u;
      next unless $line =~ /\S/; # skip blank lines
      for my $ar (@FIC){
        my @matched = grep $line=~/$_/,@$ar;
        if (@matched == @$ar){
          print "$u: $line matched all words : @matched\n\n";
          #print OUT "$u: $line matched all words : @matched\n\n";
          last;
        }
      }
    }
    close FICC;
    #close OUT
    __DATA__
    chirac presidential migration
    chirac presidential paris
    jacques chirac has been the prime minster and the president
    chirac presidential 007
    chirac paris migration
    chirac aaa french bbb paris ccc
    poj
      Thank you for the reply. I have edited the post with the correct output.

        So, taking the first line of file 2

        chirac presidential migration

        compare this with each line of file 1 in turn

        chirac prime paris
        chirac prime jacques
        chirac prime president 
        chirac paris france
        chirac paris french
        

        and calculate how many words match. Output the file 1 line if the count is greater than a minimum value. Repeat for each line in file 2.

        For this example, the number of words matching is only 1 ("chirac") in each case, so if the minimum is 2 then none of the above lines would be output. Is that the logic?
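
If that reading is right, the counting step might be sketched roughly as below. This is only a guess at the intended logic: the helper names `count_matches` and `matching_lines`, the threshold `$min`, the "at least `$min`" comparison, and the whole-word comparison through a hash lookup (rather than a regex substring match) are all assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical threshold: minimum number of shared words required.
my $min = 2;

# File 1 lines from the example above, each split into a list of words.
my @file1 = map { [ split ' ' ] } (
    'chirac prime paris',
    'chirac prime jacques',
    'chirac prime president',
    'chirac paris france',
    'chirac paris french',
);

# Count how many words of one file 1 line occur as whole words in $line.
# Illustrative helper, not part of the code posted earlier in the thread.
sub count_matches {
    my ($line, $words) = @_;
    my %seen = map { $_ => 1 } split ' ', $line;   # words of the file 2 line
    return scalar grep { $seen{$_} } @$words;
}

# Return the file 1 lines that share at least $min words with $line.
sub matching_lines {
    my ($line, $min) = @_;
    return map { join ' ', @$_ }
           grep { count_matches($line, $_) >= $min } @file1;
}

# Two sample file 2 lines taken from the example data.
for my $line ('chirac presidential migration', 'chirac presidential paris') {
    my @hits = matching_lines($line, $min);
    print "$line => ", (@hits ? join(' | ', @hits) : 'no match'), "\n";
}
```

With a minimum of 2, the first file 2 line reports no match, as described above, while "chirac presidential paris" matches the three file 1 lines that contain both "chirac" and "paris". Because the words of each file 2 line are put into a hash, each word test is a single lookup instead of a regex scan over the line, and "president" no longer counts as matching "presidential".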

        poj