barrymcv has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have been building an online assignment submission system, mostly in PHP, for my final-year project. The system must check for similarities between submitted assignments. Basically, I want to read in two files, compare them, and return the number of identical sentences. I just started learning Perl a few days ago; here is my code so far. Thanks for any help, shouldn't be too difficult?
#!/Perl/bin/perl.exe
print "content-type: text/html \n\n";
$projectA = (OPEN, c:\projecta.txt);
$projectB = (OPEN, c:\projectb.txt);
$MatchCount = 0;
@sentencesA = split(/\./, $projectA);
@sentencesB = split(/\./, $projectB);
$arrLenA = scalar @sentencesA;
$arrLenB = scalar @sentencesB;
for ($z=0;$z<=$arrLenA;$z++){
    for ($i=0;$i<=$arrLenB;$i++){
        if $sentencesA[$i] == $setencesB[$z]{
            $MatchCount++;
        }
    }
}
return $MatchCount;

Replies are listed 'Best First'.
Re: compare 2 files and return the number of similar sentences
by moklevat (Priest) on Apr 20, 2007 at 13:10 UTC
    You are going to want to use eq instead of == for your comparison. In Perl, eq compares strings and == compares numbers. Also, putting use strict; and use warnings; at the top of your script will save you a lot of grief in the long run.
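A minimal sketch of why the operator matters, using two made-up sentences (the variable names are illustrative only). Under ==, both strings are converted to numbers, and text with no leading digits converts to 0, so unrelated sentences compare as equal:

```perl
use strict;
use warnings;

my $first  = "The cat sat";
my $second = "The dog ran";

my $numeric_match;
{
    # Silence the "isn't numeric" warning that use warnings would
    # otherwise raise here; it is the hint that == is the wrong tool.
    no warnings 'numeric';
    $numeric_match = ($first == $second) ? 1 : 0;   # both numify to 0
}

my $string_match = ($first eq $second) ? 1 : 0;     # real string comparison

print "== reports a match: $numeric_match\n";
print "eq reports a match: $string_match\n";
```

With == every pair of ordinary sentences would count as identical, so the match count would simply be the product of the two array lengths.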
Re: compare 2 files and return the number of similar sentences
by derby (Abbot) on Apr 20, 2007 at 13:35 UTC

    Sounds like a poor man's version of Turnitin. Well, a couple of things:

    • Your open syntax is not valid
    • Looks like a CGI program - so use CGI
    • You haven't read in the contents of the file
    • Splitting on just a period is not going to help - what about ?, !, ;
    • What if the sentences are slightly out of order ... you may want to sort
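One way the points above might fit together, as a rough sketch rather than a finished solution. The helper names slurp, sentences and count_matches are made up for this example. It uses a three-argument open, reads the whole file, and splits on ., ?, ! and ;. Instead of sorting, it makes the comparison order-independent with a hash lookup, which is a different technique aimed at the same goal:

```perl
use strict;
use warnings;

# Read a whole file into one string (three-argument open, lexical handle).
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;                      # no record separator: read everything at once
    my $text = <$fh>;
    close $fh;
    return defined $text ? $text : '';
}

# Split on ., ?, ! or ; and drop fragments that are empty after trimming.
sub sentences {
    my ($text) = @_;
    my @out;
    for my $s (split /[.?!;]+/, $text) {
        $s =~ s/^\s+|\s+$//g;
        push @out, $s if length $s;
    }
    return @out;
}

# Count the sentences of A that also occur anywhere in B.
sub count_matches {
    my ($text_a, $text_b) = @_;
    my %in_b = map { $_ => 1 } sentences($text_b);
    return scalar grep { $in_b{$_} } sentences($text_a);
}

# Inline strings stand in for the two files here.
my $demo_count = count_matches("One. Two! Three?", "Three. One; Four.");
print "$demo_count\n";
```

For real files the call would look like count_matches(slurp($path_a), slurp($path_b)). Note that a sentence repeated in A is counted once per occurrence; whether that is wanted depends on how the score is defined.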

    -derby
Re: compare 2 files and return the number of similar sentences
by Krambambuli (Curate) on Apr 20, 2007 at 13:28 UTC
    Maybe you know or can make some assumptions about the general conditions required for your program.

    Your code (with the above-mentioned correction of using 'eq' instead of '==') might already be OK; it might not be if the files to compare could be large or huge.

    Then some other possible restrictions/caveats would come up:

    - would there be any restrictions on how much memory is available?
    - would it be important to get a quick answer, or is it no problem to let the processing take some time?
    - would 'similarity' mean 'equality' (as in your code), or would a somewhat 'softer' test be needed (allowing for a variable amount of whitespace, line breaks, casing, ...)?

    Considering such issues would make it possible to know whether the solution drafted above is sufficient or whether it needs refinement.
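If a 'softer' test turned out to be needed, one possible approach is to reduce each sentence to a canonical form before comparing it with eq. The sketch below assumes that only casing and whitespace should be ignored; normalize is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: canonical form of a sentence for comparison.
sub normalize {
    my ($s) = @_;
    $s = lc $s;            # ignore casing
    $s =~ s/\s+/ /g;       # collapse runs of spaces, tabs and line breaks
    $s =~ s/^ | $//g;      # drop the single leading/trailing space left over
    return $s;
}

my $a_sentence = "  The  Quick\nbrown   Fox ";
my $b_sentence = "the quick brown fox";

print "match\n" if normalize($a_sentence) eq normalize($b_sentence);
```

Anything stricter or looser (ignoring punctuation, tolerating small edits) would need a different normalization or a distance measure, which is outside this sketch.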
      Thanks for the quick responses. Memory + time shouldn't be an issue and the files will be small. I'm keeping it simple, so I would just like to find identical sentences. Is it difficult to pass the $MatchCount variable back into a PHP script? Thanks again.
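On the Perl side, one simple option is to print the count to standard output, since return is only valid inside a subroutine and cannot hand a value to another process. The sketch below assumes the PHP code runs the script as a command (for example with PHP's shell_exec) rather than fetching it over HTTP; format_result is a made-up helper name and the count is a placeholder:

```perl
use strict;
use warnings;

# Hypothetical helper: one plain line that the caller can parse as an integer.
sub format_result {
    my ($match_count) = @_;
    return sprintf "%d\n", $match_count;
}

my $MatchCount = 3;    # placeholder; the real script would compute this

# Whatever is printed to STDOUT is what the calling process receives.
print format_result($MatchCount);
```

If the script were instead served as CGI, the same number would go in the response body after the content-type header, and the PHP side would read it from the HTTP response.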
Re: compare 2 files and return the number of similar sentences
by zentara (Cardinal) on Apr 20, 2007 at 13:58 UTC