barrymcv has asked for the wisdom of the Perl Monks concerning the following question:

Just a quick question, monks. I've been trying this code for a while to count the number of identical elements in two arrays, and it keeps returning a number 1 greater than the array length. If I test 2 arrays of 10 elements each, I get 11 even if only 1 element matches. The arrays contain string elements.
$MatchCount = 0;
for ($z=0;$z<=$arrLenA;$z++){
    for ($i=0;$i<=$arrLenB;$i++){
        if ($sentencesA[$z] eq $setencesB{$i}){
            $MatchCount++;
        }
    }
}
print $MatchCount;
Thanks in advance

Replies are listed 'Best First'.
Re: Comparing 2 arrays too find the total number of identical elements.
by naikonta (Curate) on Apr 22, 2007 at 14:15 UTC
    OK, here's the quick answer: Array in perlfaq, Array on PerlMonks Q&A.

    Basically you need hash to track the elements of the arrays. Here's a clue:

    my @array1 = qw(perl monks camel);
    my @array2 = qw(node thread monks);
    my %hash;
    for (@array1, @array2) {
        $hash{$_}++;
    }
    If you iterate over %hash you'll get the list of unique elements and the number of times each one occurs across the arrays. You can do that, right? I mean the hash iteration. Let us know after you try that. Good luck, barrymcv :-)
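    One way to finish the idea, as a rough sketch rather than the only approach: a single %hash fed from both arrays also reaches a count of 2 when an element is repeated inside one array, so the version below keeps a separate lookup hash for the first array. The names %in_first, %common and @shared are made up for this illustration.

```perl
use strict;
use warnings;

my @array1 = qw(perl monks camel);
my @array2 = qw(node thread monks);

# Record each distinct element of the first array.
my %in_first = map { $_ => 1 } @array1;

# Collect the distinct elements of the second array that also occur in the first.
my %common;
for my $element (@array2) {
    $common{$element} = 1 if $in_first{$element};
}

my @shared = sort keys %common;
print scalar(@shared), " shared element(s): @shared\n";
```

    Each array is walked once, so the work grows with the combined length of the arrays instead of with their product.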

    Oh wait, why didn't you tell us you already asked about this? What did you learn from there, and what went wrong so you need to ask again with a bit different question?


    Update: Added ref to his previous question.

    Open source softwares? Share and enjoy. Make profit from them if you can. Yet, share and enjoy!

Re: Comparing 2 arrays too find the total number of identical elements.
by betterworld (Curate) on Apr 22, 2007 at 14:13 UTC
    if ($sentencesA[$z] eq $setencesB{$i}){

    You seem to be confusing curly braces with square brackets. Plus, "setences" should probably be "sentences". These mistakes would not have happened with strict and warnings.

    I cannot test that code because I don't know what is in the variables.

    Maybe List::Compare is the best solution to your problem.

    Update: Changed List::Util to List::Compare
Re: Comparing 2 arrays too find the total number of identical elements.
by polettix (Vicar) on Apr 22, 2007 at 14:16 UTC
    There is too little context in your code sample; for example, you don't show how $arrLenA and $arrLenB are initialised. Beyond this, I strongly suspect you're not running under strict and warnings, which would likely have hinted that something is wrong with your code. As an example, $sentencesA[$z] tells us that there should be an @sentencesA array, but $setencesB{$i} shows two things:
    • you are probably misspelling your variable name (maybe you wanted to write sentencesB?)
    • you're not using an array, but hash %setencesB
    Please clean up your code, and bake a self-contained example that's easy for us to cut-and-paste into a file to see what's wrong. Strive to make it as small as possible, but working. Be sure to put
    use strict; use warnings;
    at the beginning. You'll probably see that when you've got the example ready without receiving any compilation error or warning, your code will probably work ;-).

    On a more general ground, there are other things to note:

    • you correctly start iterating from index 0, but then you should probably set your continuation condition to $z < $arrLenA instead of <=
    • if both arrays contain the string 'hello' repeated 10 times, the result will be 10 * 10 = 100 matches, because you're comparing each element in A to each element in B. Is this what you really want?
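    A small sketch of how the reported 11 can arise, using made-up data: with <= the outer loop reaches index 10, where $sentencesA[10] is undef; the lookup $setencesB{$i} in the (empty) hash is undef for every $i; and eq treats undef as the empty string, so all 11 inner iterations of that last pass count as matches. The empty %setencesB is declared here only so the buggy loop compiles under strict.

```perl
use strict;
use warnings;

my @sentencesA = map { "a$_" } 1 .. 10;             # 10 made-up strings
my @sentencesB = ( 'a1', map { "b$_" } 2 .. 10 );   # only 'a1' is shared
my %setencesB;                                      # the misspelled, never-filled hash

my $arrLenA = scalar @sentencesA;
my $arrLenB = scalar @sentencesB;

# The loop as posted: 0 .. $arrLenA is the same range as $z <= $arrLenA.
my $buggy = 0;
{
    no warnings 'uninitialized';    # hide the warnings that would have pointed at the bug
    for my $z ( 0 .. $arrLenA ) {
        for my $i ( 0 .. $arrLenB ) {
            $buggy++ if $sentencesA[$z] eq $setencesB{$i};
        }
    }
}

# The corrected loop: stop at the last valid index and index the array with [].
my $fixed = 0;
for my $z ( 0 .. $#sentencesA ) {
    for my $i ( 0 .. $#sentencesB ) {
        $fixed++ if $sentencesA[$z] eq $sentencesB[$i];
    }
}

print "buggy: $buggy, fixed: $fixed\n";    # buggy: 11, fixed: 1
```

    The corrected loop still compares every pair, so repeated strings are counted once per pairing, as described in the second point above.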
    Hope this helps!

    Flavio
    perl -ple'$_=reverse' <<<ti.xittelop@oivalf

    Don't fool yourself.
      Thanks for the help. Correcting the spelling and the [] brackets fixed it; I don't know how I missed them. Here's my code anyway.
      #!/usr/bin/perl
      print "content-type: text/html \n\n";

      open MYINPUTFILE, "<projecta.txt";
      my $holdTerminator = $/;
      undef $/;
      my $projectA = <MYINPUTFILE>;
      $/ = $holdTerminator;
      my @lines = split /$holdTerminator/, $projectA;
      $projectA = "init";
      $projectA = join $holdTerminator, @lines;
      print $projectA;
      print "\n";

      open MYINPUTFILEE, "<projectb.txt";
      my $holdTerminator1 = $/;
      undef $/;
      my $projectB = <MYINPUTFILEE>;
      $/ = $holdTerminator1;
      my @lines = split /$holdTerminator1/, $projectB;
      $projectB = "init";
      $projectB = join $holdTerminator1, @lines;
      print $projectB;
      print "\n";

      $MatchCount = 0;
      @sentencesA = split(/\./, $projectA);
      print @sentencesA;
      @sentencesB = split(/\./, $projectB);
      print "\n";
      print @sentencesB;
      $arrLenA = scalar @sentencesA;
      print $arrLenA;
      print "\n";
      $arrLenB = scalar @sentencesB;
      print $arrLenB;
      print "\n";

      for ($z=0;$z<=$arrLenA;$z++){
          for ($i=0;$i<=$arrLenB;$i++){
              if ($sentencesA[$z] eq $sentencesB[$i]){
                  $MatchCount++;
              }
          }
      }
      print $MatchCount;
      close(MYINPUTFILE);
      close(MYINPUTFILEE);
        You're still missing these two fundamental lines:
        use strict; use warnings;
        at the beginning of your program. Get into the habit of always including them, because they will help you spot various errors and pitfalls.

        Regarding loading files, note that you had to cut-and-paste the code when you could have factored it all out into a function. Moreover, IMHO you're using constructs that I'd avoid. If you're able to install modules on your system, you could install File::Slurp and do something along these lines:

        use File::Slurp qw( read_file );
        my $projectA = read_file('projecta.txt');
        my $projectB = read_file('projectb.txt');
        If you want to roll your own, implement the read_file() sub yourself:
        sub read_file {
            my $filename = shift;

            # Force whole file into one scalar unless we want
            # each "line" on its own
            local $/ unless wantarray;

            open my $fh, '<', $filename        # USE 3-args version of open!
                or die "open('$filename'): $!"; # verify your open!

            return <$fh>;                       # enjoy auto-close of $fh :)
        }
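        To see how the calling context changes what read_file() returns, here is a small self-contained sketch. It repeats the sub so it can run on its own, and writes a throwaway file with the core File::Temp module; the file contents and the variable names are invented for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Same sub as above, repeated so this sketch runs on its own.
sub read_file {
    my $filename = shift;
    local $/ unless wantarray;    # slurp mode only in scalar context
    open my $fh, '<', $filename
        or die "open('$filename'): $!";
    return <$fh>;
}

# Create a small temporary file to read back (removed at program exit).
my ( $tmp_fh, $tmp_name ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "first line\nsecond line\n";
close $tmp_fh or die "close('$tmp_name'): $!";

my $whole = read_file($tmp_name);    # scalar context: the whole file in one string
my @lines = read_file($tmp_name);    # list context: one element per line

print length($whole), " characters, ", scalar(@lines), " lines\n";
```

        Because $/ is localised inside the sub, the caller's input record separator is untouched once read_file() returns.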

        Regarding the split into sentences, note that there are other sentence terminators (question and exclamation marks, to name a few). Last, but not least, please re-read my previous post about your algorithm.

        Flavio
        perl -ple'$_=reverse' <<<ti.xittelop@oivalf

        Don't fool yourself.
        frodo72's advice to put use strict; and use warnings; at the top of your scripts is worth paying attention to, as is the advice to use a 3-argument open and check that it succeeded. They will save you a lot of time and grief in the long run.

        I am puzzled by the data-reading part of the code you have shown:

        my $holdTerminator = $/;
        undef $/;
        my $projectA = <MYINPUTFILE>;
        $/ = $holdTerminator;
        my @lines = split /$holdTerminator/, $projectA;
        $projectA = "init";
        $projectA = join $holdTerminator, @lines;
        print $projectA;
        print "\n";

        Why do you slurp the whole file into the scalar $projectA then split it into lines in the array @lines, assign the string "init" to $projectA but then immediately overwrite that by join'ing the elements of @lines again? All you have done is take the last line terminator off the string in $projectA and I'm not sure what the significance of the 'init' is to you but the line $projectA = "init"; is futile in your code as you trample all over it straight away. Did you want to put a line containing 'init' at the beginning of your text? I don't see any further mention of it outside your data reading blocks so I'm not sure why it is there. Perhaps you could clarify the point.

        There is no need to remember and reset $/ using a temporary variable. Just localise it, perhaps in a subroutine as frodo72 has shown you or in a bare code block.

        use strict;
        use warnings;

        print qq{content-type: text/html \n\n};

        my $inputFile = q{projecta.txt};
        open my $inputFH, q{<}, $inputFile
            or die qq{open: $inputFile: $!\n};
        my $projectA;
        {
            local $/;
            $projectA = <$inputFH>;
        }
        close $inputFH
            or die qq{close: $inputFile: $!\n};

        # If you want a line containing "init" first
        # uncomment the line below
        #
        # $projectA = qq{init\n} . $projectA;

        print $projectA;
        ...

        It is a good idea to explicitly close your files if not relying on an automatic close as a lexical filehandle goes out of scope.

        I think you might need to give some more thought to the way you split your text into sentences. Firstly, how do you want to treat sentences that are identical other than the fact that one has a double space in it somewhere and the other doesn't? You might wish to consider collapsing multiple spaces down to one. Secondly, you may have two identical sentences but, because of what has gone before, they wrap over lines at different points. Again, you might wish to consider replacing line terminators (\n in your case?) with spaces. Once you have your sentences split up, hashes are the way to go.
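        A minimal sketch of that normalisation, assuming that '.', '!' and '?' all end a sentence and that any run of whitespace (including line breaks) should count as a single space. The helper name sentences_from and the sample texts are invented for illustration; real text with abbreviations or decimal numbers would need more care.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a block of text into a list of normalised sentences.
sub sentences_from {
    my ($text) = @_;
    $text =~ s/\s+/ /g;                       # line breaks and runs of spaces become one space
    my @sentences = split /[.!?]+/, $text;    # split on one or more of . ! ?
    s/^\s+|\s+$//g for @sentences;            # trim leading and trailing spaces
    return grep { length } @sentences;        # drop empty fragments
}

my @a = sentences_from("The cat sat.  It  purred!\nWas it\nhappy?");
my @b = sentences_from("Was it happy? The dog barked. The cat sat.");

# Count the distinct sentences that appear in both texts.
my %in_a   = map { $_ => 1 } @a;
my %common = map { $_ => 1 } grep { $in_a{$_} } @b;

print scalar( keys %common ), " sentence(s) in common\n";
```

        With these sample texts, "The cat sat" and "Was it happy" match even though the spacing and line wrapping differ between the two inputs.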

        I hope you find these thoughts useful.

        Cheers,

        JohnGG