With your code reworked slightly to use "built in" files (strings opened as in-memory filehandles), the results are as expected:

#!/usr/bin/perl
use strict;
use warnings;

=pod

Removed original file name code to make sample self contained.

my $f1 = shift;
my $f2 = shift;

if (! defined($f1) or ! defined($f2)) {
    die "Need two text file names as arguments. \n";
}

=cut

my $file1Content = <<CONTENT;
red green blue red orange
CONTENT

my $file2Content = <<CONTENT;
yellow orange red grey purple
CONTENT

my %results;

open my $file1, '<', \$file1Content;
while (my $line = <$file1>) {
    $line =~ s/[[:punct:]]//g;
    for my $word (split(/\s+/, $line)) {
        $word =~ s/[^A-Za-z0-9]//g;
        $results{lc $word} = 1;
    }
}

my @words2;
my @storage;

open my $file2, '<', \$file2Content;
while (my $line = <$file2>) {
    $line =~ s/[[:punct:]]/ /g;
    @words2 = grep {/\S/} split(/ /, $line);
    for (my $i = 0; $i < scalar @words2; $i++) {
        $words2[$i] = lc($words2[$i]);
        $words2[$i] =~ s/[^A-Za-z0-9]//g;
        push(@storage, $words2[$i]);
        if (grep {$_ eq $words2[$i]} @storage[0 .. $#storage - 1]) {
            $results{$words2[$i]} = 1;
        }
        else {
            $results{$words2[$i]}++;
        }
    }
}

my $counter = 0;
foreach my $words (sort {$results{$b} <=> $results{$a}} keys %results) {
    if ($results{$words} > 1) {
        $counter = $counter + 1;
        print $words, "\n\n";
    }
}
printf "Found %1.0f words in common\n", $counter;

Prints:

orange

red

Found 2 words in common

Perhaps you could provide sample "file contents" that demonstrate the failure you haven't yet described?
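The "built in" files rely on Perl's ability to open a filehandle on a reference to a scalar (supported since Perl 5.8), so any problem data can be pasted straight into a test script. A minimal sketch, where the sample contents are made up purely for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative sample data standing in for a real file.
my $content = <<CONTENT;
Red, green; BLUE
red orange
CONTENT

# Opening a reference to a scalar gives an in-memory filehandle,
# so the usual <$fh> line-reading loop works unchanged.
open my $fh, '<', \$content or die "Can't open in-memory file: $!";

my @lines;
while (my $line = <$fh>) {
    chomp $line;
    push @lines, $line;
}
close $fh;

print scalar(@lines), " lines read\n";
print "$_\n" for @lines;
```

Replacing the heredoc text with the contents that misbehave reproduces the problem without needing the original files.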

Of course, the code can be cleaned up a little:

#!/usr/bin/perl
use strict;
use warnings;

my $file1Content = <<CONTENT;
red green blue red orange
CONTENT

my $file2Content = <<CONTENT;
yellow orange red grey purple
CONTENT

my %group1;

open my $file1, '<', \$file1Content;
while (my $line = <$file1>) {
    my @words = map {lc} grep {$_} split /[\W\d]+/, $line;
    $group1{$_} = $_ for @words;
}

my %common;

open my $file2, '<', \$file2Content;
while (my $line = <$file2>) {
    my @words = map {lc} grep {/\S/} split /[\W\d]+/, $line;
    $common{$_} = $_ for grep {exists $group1{$_}} @words;
}

print "$_\n\n" for sort values %common;
printf "Found %1.0f words in common\n", scalar keys %common;
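If the cleaned-up approach has to run against real files as well as in-memory samples, one option is to wrap it in a sub that accepts any two readable filehandles. This is a sketch under that assumption, not part of the original reply: `common_words` is a hypothetical name, and it returns the shared words sorted rather than printing them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns the words that
# appear in both filehandles, lowercased and sorted alphabetically.
sub common_words {
    my ($fh1, $fh2) = @_;

    # Split on runs of non-word characters or digits, drop empty fields,
    # and remember every word from the first source in a hash.
    my %group1;
    while (my $line = <$fh1>) {
        $group1{$_} = 1
            for map { lc $_ } grep { length $_ } split /[\W\d]+/, $line;
    }

    # A word is common if it was seen in the first source; using a hash
    # as a set means each common word is reported only once.
    my %common;
    while (my $line = <$fh2>) {
        $common{$_} = 1
            for grep { exists $group1{$_} }
                map { lc $_ } grep { length $_ } split /[\W\d]+/, $line;
    }

    return sort keys %common;
}

# Same sample data as above, read through in-memory filehandles.
my $text1 = "red green blue red orange\n";
my $text2 = "yellow orange red grey purple\n";
open my $in1, '<', \$text1 or die "Can't open in-memory file: $!";
open my $in2, '<', \$text2 or die "Can't open in-memory file: $!";

my @common = common_words($in1, $in2);
print "$_\n" for @common;
printf "Found %d words in common\n", scalar @common;
```

For real files, open each one by name (for example `open my $fh1, '<', $f1 or die "Can't open $f1: $!";`) and pass the resulting handles to the same sub.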

In reply to Re: Extracting common words by GrandFather
in thread Extracting common words by Anonymous Monk
