Finding vowels in a cryptogram

tall_man has asked for the wisdom of the Perl Monks concerning the following question:

One way to start solving simple-substitution cryptograms is to try to find the vowels. I look for high-frequency letters that seldom contact (sit next to) each other. (I've seen other approaches to cryptograms here, like merlyn's pat program, but nothing for vowels.)

I would like to automate the process of finding vowels, and it seems like a tree cluster analysis algorithm might work, as described on this page.

The distance measure is a little tricky. I would like the distance from letter 'x' to letter 'y' to be a percent disagreement, perhaps the number of times 'x' contacts 'y' divided by the total number of contacts for 'x' and 'y'. The more contacts, the greater the distance, since I am looking for letters that avoid each other.
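To make that concrete, here is a rough sketch of one way the measure could be computed. The helper names contact_counts and contact_distance are made up for illustration, and the denominator is only one reading of "total number of contacts for 'x' and 'y'": it counts every contact that involves either letter, so the result stays between 0 and 1.

```perl
use strict;
use warnings;

# Hypothetical helper: count adjacent-letter contacts within words.
# Returns (\%pair, \%total): %pair is keyed by the two letters in sorted
# order, %total holds the number of contacts each letter takes part in.
sub contact_counts {
    my ($text) = @_;
    my (%pair, %total);
    for my $word (split ' ', lc $text) {
        my @c = grep { /[a-z]/ } split //, $word;
        for my $i (1 .. $#c) {
            my ($x, $y) = sort { $a cmp $b } ($c[$i - 1], $c[$i]);
            $pair{"$x$y"}++;
            $total{$x}++;
            $total{$y}++ if $y ne $x;   # a doubled letter is one contact
        }
    }
    return (\%pair, \%total);
}

# Hypothetical helper: contacts between x and y divided by all contacts
# involving x or y (an assumption about what "total" should mean).
sub contact_distance {
    my ($pair, $total, $x, $y) = @_;
    ($x, $y) = sort { $a cmp $b } ($x, $y);
    my $both  = $pair->{"$x$y"} || 0;
    my $union = ($total->{$x} || 0) + ($total->{$y} || 0) - $both;
    return $union ? $both / $union : 0;
}

my ($pair, $total) = contact_counts("the cat sat");
printf "a-t: %.2f\n", contact_distance($pair, $total, 'a', 't');   # a-t: 0.40
printf "a-e: %.2f\n", contact_distance($pair, $total, 'a', 'e');   # a-e: 0.00
```

In the toy text, 'a' and 't' touch twice out of five contacts involving either letter, while 'a' and 'e' never touch, so the latter pair comes out as "closer" under this measure.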

I took a look at Algorithm::Cluster, but it doesn't seem to be directly applicable to this case. It's more for genetic data with real-number values.
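Failing a suitable module, the tree clustering step could probably be hand-rolled. The following is only a sketch, assuming average linkage (my choice here, not something the referenced page prescribes) and a precomputed table of pairwise distances. The names tree_cluster and average_distance are hypothetical, and the distances in the usage lines are invented purely to show the call.

```perl
use strict;
use warnings;

# Hypothetical helper: average-linkage agglomerative clustering.
# $dist->{$x}{$y} is the (symmetric) distance between items $x and $y.
# Returns a list of merges, each [ \@cluster_a, \@cluster_b, $distance ].
sub tree_cluster {
    my ($dist, @items) = @_;
    my @clusters = map { [$_] } sort @items;
    my @merges;
    while (@clusters > 1) {
        my ($best, $bi, $bj);
        # Find the closest pair of clusters.
        for my $i (0 .. $#clusters - 1) {
            for my $j ($i + 1 .. $#clusters) {
                my $d = average_distance($dist, $clusters[$i], $clusters[$j]);
                ($best, $bi, $bj) = ($d, $i, $j)
                    if !defined $best || $d < $best;
            }
        }
        push @merges, [ $clusters[$bi], $clusters[$bj], $best ];
        my $merged = [ @{ $clusters[$bi] }, @{ $clusters[$bj] } ];
        splice @clusters, $bj, 1;            # remove the later index first
        splice @clusters, $bi, 1, $merged;   # then replace the earlier one
    }
    return @merges;
}

# Mean distance over all cross-cluster pairs; missing entries count as 0.
sub average_distance {
    my ($dist, $ca, $cb) = @_;
    my $sum = 0;
    for my $x (@$ca) {
        for my $y (@$cb) {
            $sum += $dist->{$x}{$y} || 0;
        }
    }
    return $sum / (@$ca * @$cb);
}

# Invented distances for four letters, only to demonstrate the call.
my %d;
for my $row (['a','e',0.0], ['a','t',0.4], ['a','n',0.5],
             ['e','t',0.3], ['e','n',0.4], ['n','t',0.1]) {
    my ($x, $y, $v) = @$row;
    $d{$x}{$y} = $d{$y}{$x} = $v;
}
for my $m (tree_cluster(\%d, qw(a e n t))) {
    printf "%s + %s at %.2f\n", join('', @{ $m->[0] }), join('', @{ $m->[1] }), $m->[2];
}
```

The loop over all cluster pairs is quadratic per merge, which should be fine for 26 letters. Whether letters that avoid each other really fall into a clean vowel cluster under this linkage is exactly the open question.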

Here is my starting code, which just gets the single-letter and digram frequencies. Any suggestions on modules I could use to help solve this?

use strict;
use warnings;
use Statistics::Frequency;
use FileHandle;
use Data::Dumper;

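# Normalize the text: collapse whitespace, lowercase, strip some punctuation.
sub simplifyText {
   my $txt = shift;
   $txt =~ s/\s+/ /g;        # runs of whitespace become a single space
   $txt =~ tr/A-Z/a-z/;      # lowercase
   $txt =~ tr[.(),/:][]d;    # delete . ( ) , / and :
   return $txt;
}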

my $f1  = Statistics::Frequency->new;

my $fn = "/net/fox/vol02/tallman/notes/dynamac";
my $fh = FileHandle->new($fn, "r");
defined $fh or die "Cannot open $fn: $!\n";
local $/ = undef;   # slurp mode: read the whole file in one go

my $text = <$fh>;
$text = simplifyText($text);
print "text *$text*\n";

my @txt = split //,$text;
my @txt_nospaces = grep { $_ ne ' ' } @txt;
$f1->add_data(\@txt_nospaces);

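# Digram frequencies: pair each letter with its predecessor in the same word.
my $f2 = Statistics::Frequency->new;
my $last;    # previous letter; undef at the start of each word
my @pairs;
foreach my $letter (@txt) {
   if ($letter eq ' ') {
      $last = undef;   # a space ends the word, so no pair spans it
      next;
   }
   push @pairs, $last . $letter if defined $last;
   $last = $letter;
}
$f2->add_data(\@pairs);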

my %freq = $f1->frequencies;
print Data::Dumper->Dump([\%freq],["*freq"]);

my %freq2 = $f2->frequencies;
print Data::Dumper->Dump([\%freq2],["*freq2"]);