http://qs1969.pair.com?node_id=517242


in reply to Re: Creating Dictionaries
in thread Creating Dictionaries

This line has some problems:
while (my ($word) = (lc $word) =~ /[a-z]{2,}/g) {
It won't compile under use strict. I think you meant to pattern-match against lc $line, and there are no parens in the regex to capture anything.
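For anyone who wants to see the strict failure for themselves, here is a minimal sketch. It assumes no $word was declared earlier in the surrounding code, in which case the $word on the right-hand side is not yet the lexical being declared by my. The quoted line is compiled inside a string eval (with an empty loop body, which is my addition) so the error can be printed rather than killing the script.

```perl
use strict;
use warnings;

# Compile the quoted loop in a string eval so the failure can be inspected
# instead of aborting this script. The empty loop body is a placeholder.
my $code = <<'END';
use strict;
while (my ($word) = (lc $word) =~ /[a-z]{2,}/g) { }
1;
END

my $ok  = eval $code;   # undef if the code fails to compile
my $err = $@;           # holds the compile error, if any

print $ok ? "compiled\n" : "did not compile: $err";
```

The message should complain about the undeclared $word; swapping in lc $line (with $line declared) gets rid of that particular error.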

Working off your general idea, I came up with the following (no idea how it compares performance-wise with the OP's code or with my other solution below):
my %hash;
while (my $line = <STDIN>) {
    foreach my $word ( $line =~ m/\b([a-zA-Z]{2,4})\b/g ) {
        $hash{lc $word}++;
    }
}
print "$_\n" for sort keys %hash;
Which can be rewritten as:
while (my $line = <STDIN>) {
    $hash{lc $_}++ for $line =~ m/\b([a-zA-Z]{2,4})\b/g;
}
# or
do { $hash{lc $_}++ for m/\b([a-zA-Z]{2,4})\b/g } for <STDIN>;

Update: Doh. Note that I misread the /(\w)\1\1\1\1/ regex as matching 5+ letters instead of 5+ of the _same_ letter in a row. If such words don't turn up very often, it might be better to just exclude them at the end:
while (my $line = <STDIN>) {
    $hash{lc $_}++ for $line =~ m/([a-zA-Z]{2,})/g;
}
delete $hash{$_} for grep /(\w)\1{4}/, keys %hash;
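A self-contained sketch of the collect-then-delete idea, for trying it out without piping a file in. The @lines array and its contents are made up for illustration and simply stand in for <STDIN>; everything else follows the snippet above.

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for <STDIN>.
my @lines = (
    "The quick brown fox",
    "Zzzzzz goes the bee, a Bookkeeper said",
);

my %hash;
for my $line (@lines) {
    # Collect every run of two or more ASCII letters, lowercased.
    $hash{lc $_}++ for $line =~ m/([a-zA-Z]{2,})/g;
}

# Drop words that contain the same character five or more times in a row.
delete $hash{$_} for grep { /(\w)\1{4}/ } keys %hash;

print "$_\n" for sort keys %hash;
```

With these sample lines it prints bee, bookkeeper, brown, fox, goes, quick, said and the, one per line. "zzzzzz" is deleted by the final filter, while "bookkeeper" survives because its doubled letters never reach five in a row.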

Re^3: Creating Dictionaries
by Perl Mouse (Chaplain) on Dec 16, 2005 at 14:33 UTC
    There's no need for parentheses if you use m//g in list context (as I've done). However, I should have used a for, not a while.
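A small sketch of the list-context behaviour described above. The sample string and the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $line = "Hello big World";   # hypothetical sample line

# No capturing parentheses: in list context, m//g returns each whole match.
my @whole = lc($line) =~ /[a-z]{2,}/g;

# With one capture group, it returns what the group captured instead.
my @first = lc($line) =~ /([a-z])[a-z]+/g;

# A for loop visits each returned match exactly once.
print "whole match: $_\n" for @whole;
print "first letter: $_\n" for @first;
```

Here @whole holds hello, big and world, and @first holds h, b and w. As far as I can tell, the while form is a problem because the condition re-runs the list-context match from the start on every pass, so it keeps returning a non-empty list and the loop never ends on a line that has at least one match.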

    Now, your solution only grabs words that are 2, 3 or 4 letters long, which is a restriction the OP didn't have: he eliminates words that contain the same letter five times in a row (not words of five letters!).

    Perl --((8:>*