Re^3: Comparing 2 arrays to find the total number of identical elements.

frodo72's advice to put use strict; and use warnings; at the top of your scripts is worth paying attention to, as is the advice to use a 3-argument open and check that it succeeded. They will save you a lot of time and grief in the long run.

I am puzzled by the data-reading part of the code you have shown:

my $holdTerminator = $/;
undef $/;
my $projectA = <MYINPUTFILE>;
$/ = $holdTerminator;
my @lines = split /$holdTerminator/, $projectA;
$projectA = "init";
$projectA = join $holdTerminator, @lines;
print $projectA;
print "\n";

Why do you slurp the whole file into the scalar $projectA, split it into lines in the array @lines, assign the string "init" to $projectA, and then immediately overwrite that by joining the elements of @lines again? All you have done is take the last line terminator off the string in $projectA. I'm not sure what the significance of the 'init' is to you, but the line $projectA = "init"; is futile because you overwrite the value straight away. Did you want to put a line containing 'init' at the beginning of your text? I don't see any further mention of it outside your data-reading blocks, so I'm not sure why it is there. Perhaps you could clarify the point.
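To see what the split/join round trip actually does, here is a small sketch that uses an in-memory string instead of your file. The sample text and the variable names $text and $roundTrip are made up for illustration, and I have assumed \n as the line terminator.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for the slurped file contents.
my $text = qq{line one\nline two\nline three\n};

# The same round trip as in your code: split on the
# terminator, then join the pieces with it again.
my @lines     = split /\n/, $text;
my $roundTrip = join qq{\n}, @lines;

# Compare the result with what we started with.
print $roundTrip eq $text ? qq{same\n} : qq{different\n};
print length( $text ) - length( $roundTrip ),
    qq{ character(s) removed\n};
```

The only change is the missing final newline. Note that split discards trailing empty fields by default, so a file ending in several blank lines would lose all of them, not just the last terminator.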

There is no need to remember and reset $/ using a temporary variable. Just localise it, perhaps in a subroutine as frodo72 has shown you or in a bare code block.

use strict;
use warnings;

print qq{content-type: text/html \n\n};

my $inputFile = q{projecta.txt};
open my $inputFH, q{<}, $inputFile
   or die qq{open: $inputFile: $!\n};

my $projectA;
{
    local $/;
    $projectA = <$inputFH>;
}

close $inputFH
   or die qq{close: $inputFile: $!\n};

# If you want a line containing "init" first
# uncomment the line below
#
# $projectA = qq{init\n} . $projectA;

print $projectA;

...

It is a good idea to close your files explicitly unless you are relying on the automatic close that happens when a lexical filehandle goes out of scope.
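If you would rather rely on the automatic close, a subroutine keeps both the filehandle and the localised $/ confined to one small scope. This is only a sketch: slurpFile is a name I have invented, and the temporary file created with File::Temp is there purely so that the example runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw{tempfile};

# Hypothetical helper: return the whole content of a file.
sub slurpFile {
    my $fileName = shift;
    open my $fh, q{<}, $fileName
        or die qq{open: $fileName: $!\n};
    local $/;               # undefined only until the sub returns
    my $content = <$fh>;
    return $content;        # $fh goes out of scope and is closed
}

# Make a throwaway file to read back (illustration only).
my( $tmpFH, $tmpName ) = tempfile( UNLINK => 1 );
print {$tmpFH} qq{first line\nsecond line\n};
close $tmpFH
    or die qq{close: $tmpName: $!\n};

print slurpFile( $tmpName );
```

The trade-off, as far as I know, is that an implicit close has no way of reporting a failure to you, which matters more for files you are writing than for files you are reading.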

I think you might need to give some more thought to the way you split your text into sentences. Firstly, how do you want to treat sentences that are identical except that one has a double space in it somewhere and the other doesn't? You might wish to consider collapsing multiple spaces down to one. Secondly, you may have two identical sentences that, because of what has gone before, wrap over lines at different points. Again, you might wish to consider replacing line terminators (\n in your case?) with spaces. Once you have your sentences split up, hashes are the way to go.
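Here is a rough sketch of those ideas, not a finished solution. I don't know how you actually split your sentences, so I have assumed that a sentence ends at ".", "!" or "?" followed by whitespace, which will trip over abbreviations such as "e.g.". The sub name sentences and the two sample texts are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return a list of normalised sentences.
sub sentences {
    my $text = shift;
    # Line terminators and runs of spaces become one space.
    $text =~ s{\s+}{ }g;
    # Naive split after sentence-ending punctuation.
    my @sentences = split m{(?<=[.!?])\s+}, $text;
    # Trim leading and trailing whitespace.
    s{^\s+|\s+$}{}g for @sentences;
    return grep { length } @sentences;
}

# Made-up sample data: note the double space and the
# line break inside the second sentence of $textA.
my $textA = qq{The cat sat.  The dog\nbarked. Birds sing.};
my $textB = qq{Birds sing. The dog barked. Fish swim.};

# Record every sentence of A as a hash key, then keep the
# sentences of B that are keys, counting each only once.
my %seenInA = map { $_ => 1 } sentences( $textA );
my %counted;
my @common =
    grep { $seenInA{ $_ } && ! $counted{ $_ }++ }
    sentences( $textB );

print scalar( @common ), qq{ sentence(s) in common\n};
print qq{  $_\n} for @common;
```

With this sample data two sentences are reported as common, "Birds sing." and "The dog barked.", even though the latter is spaced and wrapped differently in the two texts. The hash lookup means you don't have to compare every sentence of one text against every sentence of the other.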

I hope you find these thoughts useful.

Cheers,

JohnGG
