in reply to Dividing a file into groups of two words and counting them

I may have misunderstood, but I think you want overlapping pairs, i.e. "this word and that" has three pairs, viz. "this word", "word and" and "and that". If that is the case, you can do a global regex match that captures one word, skips the non-word characters that follow it, then uses a look-ahead assertion, also with a capture, for the next word. Because the look-ahead does not consume the second word, the next match starts at that word, which is what makes the pairs overlap.

use strict;
use warnings;

# Open a filehandle on an in-memory string (the here-doc) for the demo.
open my $inFH, q{<}, \ <<'END_OF_FILE' or die qq{open: $!\n};
peter piper picked a peck of pickled peppers
a peck of pickled peppers peter piper picked
if peter piper picked a peck of pickled peppers
where's the peck of pickled peppers peter piper picked
END_OF_FILE

# Slurp the whole text so that pairs can span line breaks.
my $textFile = do { local $/; <$inFH>; };
close $inFH or die qq{close: $!\n};

# First word is captured and consumed, second word is captured
# inside a look-ahead so it is left for the next match.
my $rxWordPair = qr
    {(?x)
        ([\w'-]+)
        \W+
        (?=([\w'-]+))
    };

my %pairFrequencies;
while ( $textFile =~ m{$rxWordPair}g )
{
    $pairFrequencies{ qq{$1 $2} } ++;
}

# Most frequent pairs first, ties broken alphabetically.
print
    map  { qq{$_: $pairFrequencies{ $_ }\n} }
    sort {
        $pairFrequencies{$b} <=> $pairFrequencies{$a}
        ||
        $a cmp $b
    }
    keys %pairFrequencies;

This produces

of pickled: 4
peck of: 4
peter piper: 4
pickled peppers: 4
piper picked: 4
a peck: 3
peppers peter: 2
picked a: 2
if peter: 1
peppers a: 1
peppers where's: 1
picked if: 1
the peck: 1
where's the: 1
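In case it helps to see what the look-ahead is buying you, here is a minimal sketch that runs the same pattern with and without it on the phrase "this word and that". The pairs_for routine and the $rxOverlap / $rxConsume names are only made up for this illustration; they are not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper for this illustration only: collect every
# "word word" pair that the given pattern finds in the text.
sub pairs_for
{
    my ( $rx, $text ) = @_;
    my @pairs;
    while ( $text =~ m{$rx}g )
    {
        push @pairs, qq{$1 $2};
    }
    return @pairs;
}

my $phrase = q{this word and that};

# Second word inside a look-ahead: captured but not consumed,
# so the next match can begin at that word.
my $rxOverlap = qr{([\w'-]+)\W+(?=([\w'-]+))};

# Second word matched normally: it is consumed, so the next
# match has to begin after it and the pairs cannot overlap.
my $rxConsume = qr{([\w'-]+)\W+([\w'-]+)};

my @overlapping = pairs_for( $rxOverlap, $phrase );
my @consuming   = pairs_for( $rxConsume, $phrase );

print qq{look-ahead: }, join( q{, }, @overlapping ), qq{\n};
print qq{consuming:  }, join( q{, }, @consuming ),   qq{\n};
```

With the look-ahead you should get "this word, word and, and that"; with the consuming version only "this word, and that", because "word" has already been eaten by the first match.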

I hope this is useful.

Cheers,

JohnGG