in reply to compare two text file line by line, how to optimise
Cleaned up and run in a test harness (i.e. some internal test data, printing to stdout), your code looks like this:
#!/usr/bin/perl
use strict;
use warnings;

my $file1 = <<FILE;
chirac prime paris
chirac prime jacques
chirac prime president
chirac paris france
chirac paris french
FILE

my $file2 = <<FILE;
chirac presidential migration
chirac presidential paris
chirac prime president
chirac presidential 007
chirac paris migration
chirac paris french
FILE

#open my $inA, '<', $ARGV[0] or die "Can't open $ARGV[0]: $!\n";
#open my $inB, '<', $ARGV[1] or die "Can't open $ARGV[1]: $!\n";
open my $inA, '<', \$file1;
open my $inB, '<', \$file2;

#print "bonjour\n";
#print "choose the output file name\n";
#
#chomp(my $fic2 = <STDIN>);
#open my $outFile, '>', $fic2 or die "Can't create $fic2: $!\n";

my @aLines;

while (my $ligne = <$inA>) {
    chomp $ligne;
    push @aLines, lc($ligne);
}

while (my $che = <$inB>) {
    chomp $che;
    my @bWords = split(/\s/, $che);

    foreach my $kh (@aLines) {
        my @aWords = split(/\s/, $kh);
        my $total = 0;

        for my $bWord (@bWords) {
            my $matched;

            for my $aWord (@aWords) {
                $matched = $bWord eq $aWord;
                last if $matched;
            }

            $total++ if $matched;
        }

        #print the retrieved line
        #print $outFile "$.: $kh\n";
        print "$.: $kh\n";
    }
}
Prints:
1: chirac prime paris
1: chirac prime jacques
1: chirac prime president
1: chirac paris france
1: chirac paris french
2: chirac prime paris
2: chirac prime jacques
2: chirac prime president
2: chirac paris france
2: chirac paris french
3: chirac prime paris
3: chirac prime jacques
3: chirac prime president
3: chirac paris france
3: chirac paris french
4: chirac prime paris
4: chirac prime jacques
4: chirac prime president
4: chirac paris france
4: chirac paris french
5: chirac prime paris
5: chirac prime jacques
5: chirac prime president
5: chirac paris france
5: chirac paris french
6: chirac prime paris
6: chirac prime jacques
6: chirac prime president
6: chirac paris france
6: chirac paris french
which doesn't work as advertised: $total is calculated but never used, so every line from the first file is printed for every line of the second. Making this code generate the wrong answer more quickly is possible, but probably not what you actually want to do!

Maybe you should describe more fully what it is you want to achieve? Are you looking for matching lines (in which case the word matching and nested loops are bogus), do you want to match lines that have some minimum number of matching words, or something else? We can't tell unless you tell us. Tell us why you are doing this and we may be able to make better guesses!
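If the aim turns out to be "lines that share some minimum number of words", one possible shape is sketched below. This is only a guess at the intent, not a fix for the original: the sub names shared_words and matching_pairs and the threshold of 3 are made up for illustration, and the data is a small subset of the test data above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count the distinct words of $lineB that also occur in $lineA
# (case-insensitive, split on runs of whitespace).
sub shared_words {
    my ($lineA, $lineB) = @_;
    my %inA = map { $_ => 1 } split ' ', lc $lineA;
    my %seen;
    return scalar grep { $inA{$_} && !$seen{$_}++ } split ' ', lc $lineB;
}

# Return [line number in B, line from A, shared word count] for every
# pair of lines that shares at least $min words.
sub matching_pairs {
    my ($linesA, $linesB, $min) = @_;
    my @pairs;
    for my $i (0 .. $#$linesB) {
        for my $lineA (@$linesA) {
            my $count = shared_words($lineA, $linesB->[$i]);
            push @pairs, [$i + 1, $lineA, $count] if $count >= $min;
        }
    }
    return @pairs;
}

my @fileA = ('chirac prime paris', 'chirac paris french');
my @fileB = ('chirac prime president', 'chirac paris french');

# Hypothetical threshold: a pair counts only if all three words match.
printf "%d: %s (%d shared)\n", @$_ for matching_pairs(\@fileA, \@fileB, 3);
# prints "2: chirac paris french (3 shared)"
```

Building a hash of the words in one line replaces the innermost loop with a lookup. If plain identical-line matching is all that is needed, a single hash keyed on the whole lowercased line would do instead, with no word splitting at all.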