Re: Capitalization Clusters

A slightly more straightforward approach (IMO). It picks up 'These' unless you uncomment the first line in the regexp, but then you run the risk of missing proper names at the beginning of sentences; you still need to decide how you want to handle that.

#!/usr/bin/perl -l
use warnings;
use strict;

my $data = 'Douglas built five Douglas World Cruisers to attempt his first flight to Buenos Aires. These were the predecesors of the modern AH-64D and AH-64D Apache.';

# What do you call a capitalized "word"?
my $cap_word = qr/[A-Z][\w-]*/;

my @clusters = $data =~ /
  #(?<!\.\s)           # Ignore words at beginning of sentences?
  (
    $cap_word          # Capitalized word, followed by any number
    (?:\s+$cap_word)*  # of other cap words (separated by spaces)
  )
/gx;

# Update: Oh, you wanted the largest...
print "Largest cluster: ", (sort { length $b <=> length $a } @clusters)[0];
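
For what it's worth, here is a rough sketch of what enabling that lookbehind actually changes. The helper name clusters_skipping_sentence_starts and the second test sentence are made up for illustration; the pattern itself is the one above with the first line uncommented.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $cap_word = qr/[A-Z][\w-]*/;

# Hypothetical helper (not part of the code above): the same pattern,
# with the lookbehind switched on.
sub clusters_skipping_sentence_starts {
    my ($text) = @_;
    return $text =~ /
      (?<!\.\s)            # no match may start right after a period and one whitespace char
      (
        $cap_word          # capitalized word
        (?:\s+$cap_word)*  # followed by more capitalized words
      )
    /gx;
}

my $data = 'Douglas built five Douglas World Cruisers to attempt his first flight to Buenos Aires. These were the predecesors of the modern AH-64D and AH-64D Apache.';

# 'These' is no longer reported.
print join(' | ', clusters_skipping_sentence_starts($data)), "\n";
# Douglas | Douglas World Cruisers | Buenos Aires | AH-64D | AH-64D Apache

# Made-up sentence showing the trade-off: the proper name is truncated.
my $other = 'He flew south. Buenos Aires was the goal.';
print join(' | ', clusters_skipping_sentence_starts($other)), "\n";
# He | Aires
```

Two things to notice: a cluster that opens a sentence loses its first word rather than disappearing (so 'Buenos Aires' becomes 'Aires'), and the very first word of the text still gets through because there is no period in front of it. The lookbehind also only covers a period followed by exactly one whitespace character.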

bbfu
Black flowers blossom
Fearless on my breath
