Re: Dividing a file into groups of two words and counting them

I may have misunderstood, but I think you want the pairs to overlap, i.e. "this word and that" has three pairs, viz. "this word", "word and" and "and that". If that is the case, you can do a global regex match using a capture for one word followed by non-word characters, then a look-ahead assertion for the next word, also with a capture. Because the look-ahead does not consume the second word, the next match starts at that word, which is what makes the pairs overlap.

use strict;
use warnings;

# Open an in-memory filehandle on a here-document so the example is self-contained.
open my $inFH, q{<}, \ <<'END_OF_FILE' or die qq{open: $!\n};
peter piper picked  a peck of pickled peppers
a peck of pickled peppers peter   piper picked
if peter   piper picked a peck of pickled peppers
where's the peck of pickled  peppers peter piper picked
END_OF_FILE

my $textFile = do
   {
       local $/;
       <$inFH>;
   };

close $inFH or die qq{close: $!\n};

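my $rxWordPair = qr
   {(?x)
       ([\w'-]+)        # capture 1: a word, allowing apostrophes and hyphens
       \W+              # the gap between the two words
       (?=([\w'-]+))    # capture 2: the next word, seen but not consumed
   };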

my %pairFrequencies;
while ( $textFile =~ m{$rxWordPair}g )
{
    $pairFrequencies{ qq{$1 $2} } ++;
}

print
   map  { qq{$_: $pairFrequencies{ $_ }\n} }
   sort
   {
       $pairFrequencies{$b} <=> $pairFrequencies{$a}
       ||
       $a cmp $b
   }
   keys %pairFrequencies;

This produces

of pickled: 4
peck of: 4
peter piper: 4
pickled peppers: 4
piper picked: 4
a peck: 3
peppers peter: 2
picked a: 2
if peter: 1
peppers a: 1
peppers where's: 1
picked if: 1
the peck: 1
where's the: 1
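To see why the look-ahead matters, here is a minimal sketch that runs both forms of the pattern over the "this word and that" example. The helper name word_pairs and its $overlap flag are made up for illustration and are not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the code above): return the word pairs
# found in a string. With $overlap true, the second word is captured
# inside a look-ahead, so it is not consumed and can start the next pair.
sub word_pairs
{
    my( $text, $overlap ) = @_;

    my $rxWord = qr{[\w'-]+};
    my $rxPair = $overlap
        ? qr{($rxWord)\W+(?=($rxWord))}
        : qr{($rxWord)\W+($rxWord)};

    my @pairs;
    while ( $text =~ m{$rxPair}g )
    {
        push @pairs, qq{$1 $2};
    }
    return @pairs;
}

my $phrase = q{this word and that};

print qq{overlapping: },
    join( q{, }, word_pairs( $phrase, 1 ) ), qq{\n};
print qq{consuming:   },
    join( q{, }, word_pairs( $phrase, 0 ) ), qq{\n};
```

With the look-ahead this should give the three pairs "this word", "word and" and "and that"; without it the second word is consumed, so "word and" is lost. Note also that in the sample output above, pairs such as "peppers a" and "picked if" arise because \W+ matches the newline as well, so pairs run across line ends. If that is not wanted, one option might be to call a helper like this once per line rather than on the slurped text.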

I hope this is useful.

Cheers,

JohnGG
