I have a program that reads two text files and compares their lines word by word. If a line shares at least a certain number of terms with the line it is compared against, the program displays that line.
#!/usr/bin/perl
print "bonjour\n";
open(FIC, $ARGV[0]);
open(FICC, $ARGV[1]);
my @a = ();
my @b = ();
my $l=0; my $v=0; my $g=0; my $h=0; my $t=0; my $q=0;
print "choose the output file name\n";
chomp(my $fic2=<STDIN>);
open(FIC2, ">$fic2");
#---------------------------------------------------
# variable initialization
#---------------------------------------------------
$i=0; $j=0; $u=0; $v=0; $t=0; $kk=0; $total=0;
while (<FICC>) {
    my $ligne=$_;
    $b[$i]=lc($ligne);
    $i++;
}
while (<FIC>) {
    my $ligne=$_;
    $a[$j]=lc($ligne);
    $j++;
}
foreach my $che(@b){
    chomp($che);
    @aa=split(/\s/,$che);
    $u++;
    foreach my $kh(@a){
        chomp($kh);
        @bb=split(/\s/,$kh);
        $v++;
        $t=0;$total=0;
        for ($l=0;$l<=$#bb;$l++){
            for ($m=0;$m<=$#aa;$m++){
                # here compare the words of each line
                if(($bb[$l] eq $aa[$m]) ){
                    $t++;
                    $m++;
                    $kk=1;
                    # if the two terms are identical then $kk=1
                    goto che
                }
            }
            che:
            if($kk==1) {
                # count the number of identical terms per line in $total
                $total++;
            }
            $kk=0;
        }
        # print the retrieved line
        print FIC2 "$u: $kh\n";
    }
    $v=0;
}
The problem with this code is that it is too slow on files of about 50 MB. How can I speed it up? Thank you.
File 1 contains many lines, for example:
chirac prime paris
chirac prime jacques
chirac prime president
chirac paris france
chirac paris french
File 2:
chirac presidential migration
chirac presidential paris
chirac prime president
chirac presidential 007
chirac paris migration
chirac paris french
Output:
1: chirac prime paris
1: chirac prime jacques
1: chirac prime president
1: chirac paris france
1: chirac paris french
2: chirac prime paris
2: chirac prime jacques
2: chirac prime president
2: chirac paris france
2: chirac paris french
3: chirac prime paris
3: chirac prime jacques
3: chirac prime president
3: chirac paris france
3: chirac paris french
4: chirac prime paris
4: chirac prime jacques
4: chirac prime president
4: chirac paris france
4: chirac paris french
5: chirac prime paris
5: chirac prime jacques
5: chirac prime president
5: chirac paris france
5: chirac paris french
6: chirac prime paris
6: chirac prime jacques
6: chirac prime president
6: chirac paris france
6: chirac paris french
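For reference, here is a minimal sketch of one direction that might avoid the word-by-word nested loops. It is not the script above: it assumes that "a certain number of terms" means a minimum count of distinct shared words, and the names `matching_lines` and `$min_shared` are hypothetical. The idea is to index the words of file 1 once in a hash (word to line numbers), so each line of file 2 only touches the file 1 lines it shares a word with.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script).
# Returns "N: line" strings for every file-1 line that shares at least
# $min_shared distinct words with line N of file 2 (case-insensitive).
sub matching_lines {
    my ($file1_lines, $file2_lines, $min_shared) = @_;

    # Build the inverted index once: word => list of file-1 line indices.
    my %index;
    for my $i (0 .. $#$file1_lines) {
        my %seen;
        for my $word (grep { length } split /\s+/, lc $file1_lines->[$i]) {
            push @{ $index{$word} }, $i unless $seen{$word}++;
        }
    }

    my @out;
    my $n = 0;
    for my $line (@$file2_lines) {
        $n++;
        my (%seen, %shared);
        # Count, per file-1 line, how many distinct words it shares with $line.
        for my $word (grep { length } split /\s+/, lc $line) {
            next if $seen{$word}++;
            $shared{$_}++ for @{ $index{$word} || [] };
        }
        # Keep file-1 order; only lines reaching the threshold are reported.
        for my $i (sort { $a <=> $b } keys %shared) {
            push @out, "$n: $file1_lines->[$i]" if $shared{$i} >= $min_shared;
        }
    }
    return @out;
}

# Sample data copied from the two example files in this post.
my @file1 = (
    'chirac prime paris', 'chirac prime jacques', 'chirac prime president',
    'chirac paris france', 'chirac paris french',
);
my @file2 = (
    'chirac presidential migration', 'chirac presidential paris',
    'chirac prime president', 'chirac presidential 007',
    'chirac paris migration', 'chirac paris french',
);

# Assumed threshold of 2 shared words; adjust as needed.
print "$_\n" for matching_lines(\@file1, \@file2, 2);
```

With the sample lines and a threshold of 2, nothing is printed for lines 1 and 4 of file 2, since they share only "chirac" with file 1. On real files the lines would be read with `while (<FIC>)` as in the script above. If one word appears in nearly every line (as "chirac" does here), the index lists for that word stay long, so the gain depends on the data.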
In reply to compare two text file line by line, how to optimise by thespirit