Here is my effort. It finds all the repeated patterns (substrings of two or
more characters) and also does a quick dictionary lookup for real words. If
you don't have a dictionary text file, you can select one from a wide variety
at the National Puzzlers' League Word Lists page.
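The approach boils down to three steps: count every substring of two or more characters, keep the repeated ones that are dictionary words, then drop any word that sits inside a longer match with the same count. Here is a minimal sketch of those steps as two reusable subs. It is a refactoring sketch, not part of the script itself; the sub names `count_substrings` and `best_matches` are hypothetical, and a tiny inline word list stands in for a real dictionary file.

```perl
use strict;
use warnings;

# Count every substring of length >= 2 in $str, overlapping occurrences included.
sub count_substrings {
    my ($str) = @_;
    my $len = length $str;
    my %count;
    for my $start ( 0 .. $len - 2 ) {
        $count{ substr( $str, $start, $_ ) }++ for 2 .. $len - $start;
    }
    return \%count;
}

# Keep substrings that repeat and are dictionary words, then drop any word
# contained in a longer kept word that has the same count.
sub best_matches {
    my ( $count, $dict ) = @_;
    my @words = sort { length($b) <=> length($a) || $a cmp $b }
                grep { $count->{$_} > 1 && exists $dict->{$_} } keys %$count;
    my %drop;
    for my $i ( 0 .. $#words - 1 ) {
        for my $j ( $i + 1 .. $#words ) {
            $drop{ $words[$j] } = 1
                if index( $words[$i], $words[$j] ) >= 0
                && $count->{ $words[$i] } == $count->{ $words[$j] };
        }
    }
    # most frequent first, ties broken alphabetically
    return sort { $count->{$b} <=> $count->{$a} || $a cmp $b }
           grep { !$drop{$_} } @words;
}

# Hypothetical stand-in for a dictionary file: just enough words for the demo.
my %dict;
@dict{qw(el ell he hell hello hi lo oh or wo world)} = ();

my $count = count_substrings('helloworldhellohellohihellohiworld');
print "$count->{$_} occurrences of $_\n" for best_matches( $count, \%dict );
```

With this word list it prints the same four Best Matches as the sample output (hello, oh, hi, world).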
use strict;
use warnings;

my $str = 'helloworldhellohellohihellohiworld';
my %hash;
# grab all the substrings of 2+ chars and count occurrences in a hash
for my $i ( 0 .. length($str) - 1 ) {
    $hash{ substr($str, $i, $_) }++ for 2 .. length($str) - $i;
}
# sort on occurrences and then in alphabetical order;
# only select elements that occur more than once using grep
my @order = sort { $hash{$b} <=> $hash{$a}
                   ||
                   $a cmp $b
                 } grep { $hash{$_} > 1 } keys %hash;
print "\nPatterns found:\n\n";
print "$hash{$_} occurrences of $_\n" for @order;
# now read the dictionary file into a hash
# (you can save memory using Search::Dict instead)
my $dict_file = 'c:/windows/desktop/dict.txt';
open my $dict_fh, '<', $dict_file or die "Can't open $dict_file: $!";
my %dict;
while ( my $word = <$dict_fh> ) {
    chomp $word;
    $dict{$word}++;
}
close $dict_fh;
my @words = grep { exists $dict{$_} } @order;
print "\nReal words found in dictionary:\n\n";
print "$hash{$_} occurrences of $_\n" for @words;
# remove substrings of larger words that occur the same number of times
@words = sort { length $b <=> length $a } @words;
for my $i ( 0 .. $#words - 1 ) {
    for my $j ( $i + 1 .. $#words ) {
        $hash{ $words[$j] } = 0
            if index( $words[$i], $words[$j] ) >= 0
            and $hash{ $words[$i] } == $hash{ $words[$j] };
    }
}
# regenerate sort order, grepping out unwanted substrings
# (their occurrences were set to zero above)
@words = sort { $hash{$b} <=> $hash{$a}
                ||
                $a cmp $b
              } grep { $hash{$_} } @words;
print "\nBest Matches:\n\n";
print "$hash{$_} occurrences of $_\n" for @words;
__END__
# sample output
Patterns found:
4 occurrences of el
4 occurrences of ell
4 occurrences of ello
4 occurrences of he
4 occurrences of hel
4 occurrences of hell
4 occurrences of hello
4 occurrences of ll
4 occurrences of llo
4 occurrences of lo
3 occurrences of elloh
3 occurrences of helloh
3 occurrences of lloh
3 occurrences of loh
3 occurrences of oh
2 occurrences of ellohi
2 occurrences of hellohi
2 occurrences of hi
2 occurrences of ld
2 occurrences of llohi
2 occurrences of lohi
2 occurrences of ohi
2 occurrences of or
2 occurrences of orl
2 occurrences of orld
2 occurrences of rl
2 occurrences of rld
2 occurrences of wo
2 occurrences of wor
2 occurrences of worl
2 occurrences of world
Real words found in dictionary:
4 occurrences of el
4 occurrences of ell
4 occurrences of he
4 occurrences of hell
4 occurrences of hello
4 occurrences of lo
3 occurrences of oh
2 occurrences of hi
2 occurrences of or
2 occurrences of wo
2 occurrences of world
Best Matches:
4 occurrences of hello
3 occurrences of oh
2 occurrences of hi
2 occurrences of world
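The script's comment mentions Search::Dict as a way to save memory. Here is a minimal sketch of that idea, assuming the word list is sorted in plain `cmp` (ASCII) order with one word per line. `look` from the core Search::Dict module binary-searches the file and leaves the handle at the first line that sorts at or after the key, so a single line read tells you whether the word is there. The helper name `in_sorted_dict` and the temporary demo file are hypothetical stand-ins for the real dictionary file.

```perl
use strict;
use warnings;
use Search::Dict;               # core module; exports look()
use File::Temp qw(tempfile);

# True if $word is a whole line in the sorted file behind $fh.
# Assumes the file is sorted in plain cmp (ASCII) order, one word per line.
sub in_sorted_dict {
    my ( $fh, $word ) = @_;
    # look() positions $fh at the first line that sorts at or after $word
    return 0 if look( $fh, $word ) < 0;
    my $line = <$fh>;
    return 0 unless defined $line;
    $line =~ s/\r?\n\z//;       # strip the newline (and a DOS \r if present)
    return $line eq $word;
}

# Hypothetical demo word list written to a temporary file in place of dict.txt.
my ( $tmp, $path ) = tempfile( UNLINK => 1 );
print {$tmp} "$_\n" for sort qw(el ell he hell hello hi lo oh or wo world);
close $tmp or die "close failed: $!";

open my $dict_fh, '<', $path or die "Can't open $path: $!";
for my $candidate (qw(hello orl world zzz)) {
    printf "%-6s %s\n", $candidate,
        in_sorted_dict( $dict_fh, $candidate ) ? 'word' : 'not a word';
}
close $dict_fh;
```

Only a few lines of the file are read per lookup, so nothing has to be loaded into a hash. If your list is sorted case-insensitively or in "dictionary" order, `look` also accepts optional `$dict` and `$fold` arguments; these must match how the file was sorted, and the final `eq` test would then need the same normalisation.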
cheers
tachyon
s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print