http://qs1969.pair.com?node_id=517242


in reply to Re: Creating Dictionaries
in thread Creating Dictionaries

This line has some problems:
while (my ($word) = (lc $word) =~ /[a-z]{2,}/g) {
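It won't compile under use strict, because the $word on the right-hand side isn't declared yet (a my variable only becomes visible after the statement that declares it). I think you meant to pattern match against lc $line. There are also no parens in the regex to capture anything, although with /g in list context the whole match is returned anyway.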
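For comparison, here is a minimal sketch of a while loop that does compile under strict, assuming the intent was to walk the lowercased line one match at a time. The sub name count_words and the sample sentence are made up for illustration. The lowercased copy goes into a named variable because a scalar-context //g match needs a real variable to remember its position between iterations.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): count the words of two or
# more letters found in a single line.
sub count_words {
    my ($line) = @_;
    my %count;
    my $lower = lc $line;    # named variable, so pos() persists between matches
    while ($lower =~ /([a-z]{2,})/g) {
        $count{$1}++;        # $1 holds the word captured by the parens
    }
    return \%count;
}

my $count = count_words("The cat and the hat");
print "$_: $count->{$_}\n" for sort keys %$count;
```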

Working off your general idea, I came up with this (no clue how it rates performance-wise against the OP or my other solution below):
    my %hash;
    while (my $line = <STDIN>) {
        foreach my $word ( $line =~ m/\b([a-zA-Z]{2,4})\b/g ) {
            $hash{lc $word}++;
        }
    }
    print "$_\n" for sort keys %hash;
Which can be rewritten as:
    while (my $line = <STDIN>) {
        $hash{lc $_}++ for $line =~ m/\b([a-zA-Z]{2,4})\b/g;
    }

    # or
    do { $hash{lc $_}++ for m/\b([a-zA-Z]{2,4})\b/g } for <STDIN>;

Update: Doh. I misread the /(\w)\1\1\1\1/ regex as "5+ letters" when it actually means 5+ of the _same_ character in a row, so the {2,4} length limit above is wrong. If words with such runs don't turn up very often, it might be better to just exclude them at the end:
    while (my $line = <STDIN>) {
        $hash{lc $_}++ for $line =~ m/([a-zA-Z]{2,})/g;
    }
    delete $hash{$_} for grep /(\w)\1{4}/, keys %hash;
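To try the exclude-at-the-end idea without piping in a file, here is a small self-contained sketch. The sub name build_dictionary and the sample lines are assumptions for illustration only. It takes a list of lines instead of reading STDIN, but the matching and deleting steps are the same as above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): collect lowercased words of
# two or more letters, then drop any word that contains the same
# character five or more times in a row.
sub build_dictionary {
    my @lines = @_;
    my %hash;
    for my $line (@lines) {
        $hash{lc $_}++ for $line =~ m/([a-zA-Z]{2,})/g;
    }
    # grep builds the full list of offending keys before any delete runs
    delete $hash{$_} for grep /(\w)\1{4}/, keys %hash;
    my @sorted = sort keys %hash;
    return @sorted;
}

my @words = build_dictionary("The cat sat", "Zzzzzz went the CAT a");
print "$_\n" for @words;
```

With real input, something like build_dictionary(<STDIN>) should behave the same way, at the cost of reading every line into memory first.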