I haven't tried to understand every line, but where exactly is the performance gain compared to the naive algorithm? Instead of comparing m positions you compare 2^(m-1) patterns?
for my $pat (1 .. 2**$m-1) {
...
for my $i (0 .. $n-2) {
for my $j ( $i+1 .. $n-1 ) {
...
}
}
}