Thanks for the help. Correcting the spelling and the [] brackets fixed it; I don't know how I missed them. Here's my code anyway.
    #!/usr/bin/perl
    print "content-type: text/html \n\n";

    open MYINPUTFILE, "<projecta.txt";
    my $holdTerminator = $/;
    undef $/;
    my $projectA = <MYINPUTFILE>;
    $/ = $holdTerminator;
    my @lines = split /$holdTerminator/, $projectA;
    $projectA = "init";
    $projectA = join $holdTerminator, @lines;
    print $projectA;
    print "\n";

    open MYINPUTFILEE, "<projectb.txt";
    my $holdTerminator1 = $/;
    undef $/;
    my $projectB = <MYINPUTFILEE>;
    $/ = $holdTerminator1;
    my @lines = split /$holdTerminator1/, $projectB;
    $projectB = "init";
    $projectB = join $holdTerminator1, @lines;
    print $projectB;
    print "\n";

    $MatchCount = 0;
    @sentencesA = split(/\./, $projectA);
    print @sentencesA;
    @sentencesB = split(/\./, $projectB);
    print "\n";
    print @sentencesB;
    $arrLenA = scalar @sentencesA;
    print $arrLenA;
    print "\n";
    $arrLenB = scalar @sentencesB;
    print $arrLenB;
    print "\n";

    for ($z=0;$z<=$arrLenA;$z++){
        for ($i=0;$i<=$arrLenB;$i++){
            if ($sentencesA[$z] eq $sentencesB[$i]){
                $MatchCount++;
            }
        }
    }
    print $MatchCount;
    close(MYINPUTFILE);
    close(MYINPUTFILEE);

Re^3: Comparing 2 arrays too find the total number of identical elements.
by polettix (Vicar) on Apr 22, 2007 at 19:33 UTC
    You're still missing these two fundamental lines:

        use strict;
        use warnings;

    at the beginning of your program. Get into the habit of always including them, because they will help you spot various errors and pitfalls.

    Regarding loading files, note that you had to cut and paste the same code twice, whereas you could have factored it out into a function. Moreover, IMHO you're using constructs that I'd avoid. If you're able to install modules on your system, you could install File::Slurp and do something along these lines:

        use File::Slurp qw( read_file );
        my $projectA = read_file('projecta.txt');
        my $projectB = read_file('projectb.txt');

    If you want to roll your own, implement the read_file() sub yourself:

        sub read_file {
            my $filename = shift;

            # Force whole file into one scalar unless we want
            # each "line" on its own
            local $/ unless wantarray;

            open my $fh, '<', $filename          # USE 3-args version of open!
                or die "open('$filename'): $!";  # verify your open!

            return <$fh>;    # enjoy auto-close of $fh :)
        }

    Regarding the split into sentences, note that there are other sentence terminators (question and exclamation marks, to name a few). Last, but not least, please re-read my previous post about your algorithm.
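
    To illustrate the point about other sentence terminators, here is a minimal sketch that splits on full stops, question marks and exclamation marks. The helper name split_sentences is made up for this example, and the approach is deliberately naive: it will still break on abbreviations such as "e.g." or on decimal numbers.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): split text into
# sentences on '.', '!' or '?'.
sub split_sentences {
    my ($text) = @_;

    # Split on one or more terminators plus any whitespace that follows;
    # grep drops pieces that are empty or contain only whitespace.
    return grep { /\S/ } split /[.!?]+\s*/, $text;
}

my @sentences = split_sentences("Is it here? Yes! It is here.");
print "$_\n" for @sentences;
```

    The terminators themselves are discarded, so "Yes!" and "Yes." would compare as equal afterwards; whether that is what you want depends on your data.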

    Flavio
    perl -ple'$_=reverse' <<<ti.xittelop@oivalf

    Don't fool yourself.
Re^3: Comparing 2 arrays too find the total number of identical elements.
by johngg (Canon) on Apr 22, 2007 at 22:24 UTC
    frodo72's advice to put use strict; and use warnings; at the top of your scripts is worth paying attention to, as is the advice to use a 3-argument open and check that it succeeded. They will save you a lot of time and grief in the long run.

    I am puzzled by the data-reading part of the code you have shown:

        my $holdTerminator = $/;
        undef $/;
        my $projectA = <MYINPUTFILE>;
        $/ = $holdTerminator;
        my @lines = split /$holdTerminator/, $projectA;
        $projectA = "init";
        $projectA = join $holdTerminator, @lines;
        print $projectA;
        print "\n";

    Why do you slurp the whole file into the scalar $projectA, split it into lines in the array @lines, assign the string "init" to $projectA, and then immediately overwrite that by joining the elements of @lines again? All you have done is take the last line terminator off the string in $projectA. I'm not sure what the significance of 'init' is to you, but the line $projectA = "init"; is futile in your code because you trample all over it straight away. Did you want to put a line containing 'init' at the beginning of your text? I don't see any further mention of it outside your data-reading blocks, so I'm not sure why it is there. Perhaps you could clarify the point.
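
    As a small demonstration of that split-then-join round trip, the sketch below uses an in-memory string instead of your file and assumes "\n" is the line terminator. Because split discards trailing empty fields, the only visible effect is that the final newline disappears.

```perl
use strict;
use warnings;

# Sample data standing in for the slurped file contents (an assumption).
my $text = "first line\nsecond line\n";

# split drops trailing empty fields, so the final "\n" adds no element.
my @lines  = split /\n/, $text;
my $joined = join "\n", @lines;

print scalar(@lines), " lines\n";
print $joined eq "first line\nsecond line"
    ? "trailing newline removed\n"
    : "string unchanged\n";
```

    If removing the final terminator is all that is wanted, chomp does it directly.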

    There is no need to remember and reset $/ using a temporary variable. Just localise it, perhaps in a subroutine as frodo72 has shown you or in a bare code block.

        use strict;
        use warnings;

        print qq{content-type: text/html \n\n};

        my $inputFile = q{projecta.txt};
        open my $inputFH, q{<}, $inputFile
            or die qq{open: $inputFile: $!\n};
        my $projectA;
        {
            local $/;
            $projectA = <$inputFH>;
        }
        close $inputFH
            or die qq{close: $inputFile: $!\n};

        # If you want a line containing "init" first
        # uncomment the line below
        #
        # $projectA = qq{init\n} . $projectA;

        print $projectA;
        ...

    It is a good idea to close your files explicitly unless you are relying on the automatic close that happens when a lexical filehandle goes out of scope.

    I think you might need to give some more thought to the way you split your text into sentences. Firstly, how do you want to treat sentences that are identical except that one has a double space in it somewhere and the other doesn't? You might wish to consider collapsing multiple spaces down to one. Secondly, you may have two identical sentences that, because of what has gone before, wrap over lines at different points. Again, you might wish to consider replacing line terminators (\n in your case?) with spaces. Once you have your sentences split up, hashes are the way to go.
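
    The following is only a rough sketch of those two ideas together, not a finished solution. The helper names normalise and count_common and the sample sentences are invented for the example. It collapses line breaks and runs of whitespace into single spaces, then uses a hash for the lookup instead of nested loops.

```perl
use strict;
use warnings;

# Hypothetical helper: turn line breaks and runs of whitespace into
# single spaces and trim both ends.
sub normalise {
    my ($sentence) = @_;
    $sentence =~ s/\s+/ /g;
    $sentence =~ s/^ | $//g;
    return $sentence;
}

# Hypothetical helper: count the sentences in @$listA that also occur
# in @$listB after normalisation.
sub count_common {
    my ($listA, $listB) = @_;

    # Record every normalised sentence of the second list as a hash key.
    my %seen;
    $seen{ normalise($_) } = 1 for @$listB;

    # grep in scalar context returns the number of matching elements.
    return scalar grep { $seen{ normalise($_) } } @$listA;
}

my @sentencesA = ("The cat sat",  "A dog\nbarked", "Nothing here");
my @sentencesB = ("A dog barked", "The  cat sat");

print count_common(\@sentencesA, \@sentencesB), "\n";
```

    Note that this counts each sentence of the first list at most once, however often it occurs in the second, which is not the same as counting every matching pair. Decide which of the two you actually need.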

    I hope you find these thoughts useful.

    Cheers,

    JohnGG