Leaving that aside, this may be one of those rare cases where a "goto" statement is worth having. Consider:
Because it ignores punctuation, this will do the wrong thing in a case like "I saw John Brown. He was dead." Good luck! (update: enhanced the final block of commentary in the code, and made the title more relevant)use strict; my @tokens = (); my $properName = ""; while (<>) { push @tokens, split; FINDCAP: # the "until" loop skips non-capitalized tokens until ( @tokens == 0 or $tokens[0] =~ /^\W*[A-Z][\'A-Za-z]*\b/ ) { shift @tokens; } # the "while" loop accumulates consecutive capitalized toke +ns, # if any were found that caused us to break out of the "unt +il" while ( @tokens and $tokens[0] =~ /^\W*([A-Z][\'A-Za-z]*)\b/ ) { $properName .= $1 . " "; shift @tokens; } # go into the next block if there are still tokens left # (this means we haven't reached the end of this line) if ( @tokens ) { if ( $properName =~ / [A-Z]/ ) { print $properName,$/; # print if $properName has >1 word } $properName = ""; # reset to empty string goto FINDCAP; # look through the remainder of this line } # that block was skipped if there are no tokens left # so we loop back to the outer "while" loop to get the # next line, and append its words to @tokens # (and $properName, if not empty, remains intact # for appending the next Capitalized Token, if any) }
In reply to Looking for proper names in text
by graff
in thread Out of memory
by Anonymous Monk
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |