in reply to Re: File Similarity Concept
in thread File Similarity Concept

Thank you for the reply and the thought you put into it.

I'm taking your suggestions re slurping and the hash variant under serious consideration (and not responding directly to those, right now, because first I have to be sure I didn't miss something; that I understand your intent; and that I know how and where to implement them).

As to your point re specifying the version in use 5.018: I understand, but I choose to post with info for the reader on just what I used to run the script. A downward revision might be 'kind' (as in "changing it so as to save another Monk the trouble of doing so"), but it might sometimes leave that individual without the info re what version I used, and it would always incur extra work for me.

Update: fixed in para 2: s/reading/reader/

Re^3: File Similarity Concept
by FreeBeerReekingMonk (Deacon) on May 18, 2015 at 21:41 UTC

    I can clarify that intent. Consider this:

    my $file2 = <DATA>;
    chomp $file2;
    die $file2;
    __DATA__
    The quick
    brown fox

    It yields:

    The quick at data.pl line 5, <DATA> line 1.

    whereas

    undef $/;
    my $file2 = <DATA>;
    chomp $file2;
    die $file2;
    __DATA__
    The quick
    brown fox

    yields:

    The quick
    brown fox

    I understand your dilemma with versions, and of course, you are free to do so.

    As for the counting: I suggested counting all words in one file as positives, and all words in the other file as negatives. Thus, if the word "the" occurs the same number of times in both files, the value for that word in the hash will be zero. The value will be positive or negative if the word occurs more often in one file than in the other.
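
    A minimal sketch of that counting idea, in case it helps. The helper name signed_word_counts is hypothetical, and splitting on \w+ and lowercasing are my assumptions here, not something your script has to do:

```perl
use strict;
use warnings;

# Hypothetical helper: every word in the first text counts +1,
# every word in the second text counts -1, all in a single hash.
sub signed_word_counts {
    my ($text_a, $text_b) = @_;
    my %count;
    # Assumption: a "word" is a run of \w characters, compared case-insensitively.
    $count{ lc $_ }++ for $text_a =~ /\w+/g;
    $count{ lc $_ }-- for $text_b =~ /\w+/g;
    return \%count;
}

my $diff = signed_word_counts(
    "the quick brown fox jumps over the lazy dog",
    "the lazy dog sleeps near the quick cat",
);

# Zero means the word is equally frequent in both texts; the sign
# tells you which text has more of it.
for my $word (sort keys %$diff) {
    printf "%-6s %+d\n", $word, $diff->{$word};
}
```

    With these two sample strings, "the" ends up at zero (twice in each), "brown" ends up positive and "cat" negative. In a real script the two strings would be the slurped contents of the two files.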