in reply to Conversation Pools
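While the time domain is statistically challenging, the text parsing seems ripe for Perl.

    use strict;
    use warnings;

    my $threshold = 100;    # minimum count a word needs before it is printed
    my %count;

    while (<>) {
        chomp;
        # keep only letters and whitespace; the range A-z would also
        # let through [ \ ] ^ _ ` which sit between Z and a in ASCII
        s/[^A-Za-z\s]//g;
        # split ' ' skips leading whitespace and collapses runs of spaces
        my @words = split ' ', $_;
        # count every occurrence, including the first
        $count{$_}++ for @words;
    }

    # print each word that appears more than $threshold times
    while (my ($word, $n) = each %count) {
        print " $n : $word\n" if $n > $threshold;
    }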