Yow. I'm not sure who's crazier - you for suggesting this might be something one would want to do, or me for trying to do it ;)
Because It's There, as the man said.
Having struggled with it a bit I realised one thing about the question itself, which is that we can't say there are only three patterns. In fact there are a lot more - "hell", "hel", and "he" to name but the most obvious additions. That's unless we want to match against a dictionary, in which case it's just a matter of processing power.
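To make that concrete, here is a minimal sketch that counts how often a few of those candidates turn up in the example string. The helper name count_occurrences is my own invention for illustration, and it only counts non-overlapping matches of a literal substring.

```perl
use strict;
use warnings;

my $string = "helloworldhellohellohihellohiworld";

# count_occurrences is a hypothetical helper (not from the original post):
# it counts non-overlapping occurrences of a literal substring.
sub count_occurrences {
    my ($text, $pat) = @_;
    # \Q...\E quotes any regex metacharacters in $pat;
    # the empty-list assignment forces list context so we get a count.
    my $n = () = $text =~ /\Q$pat\E/g;
    return $n;
}

for my $pat (qw(hello hell hel he world hi)) {
    printf "%-5s occurs %d times\n", $pat, count_occurrences($string, $pat);
}
```

"hello", "hell", "hel" and "he" all come out at four occurrences each, which is why the three obvious words can't be the whole answer.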
Assuming we are interested in patterns rather than specific words, I think the following does it. I should say at the outset that the clever bit in this comes from japhy's regex book, which is referred to in this node.
    use strict;
    use warnings;

    my $string = "helloworldhellohellohihellohiworld";
    my $length = length $string;
    my $window = int(($length - 2) / 2);   # longest candidate length

    # use japhy's regex to hoover up all char
    # sequences that MIGHT be patterns:
    my @pats;
    while ($window > 1) {
        # the lookahead captures $window chars without consuming them,
        # so overlapping candidates are collected too
        my $regex = '(?=(' . '.' x $window . '))';
        push @pats, ($string =~ /$regex/g);
        $window--;
    }

    # now go through @pats to find the duplicates
    # and print the final result
    @pats = sort @pats;
    my %dups;
    for (1 .. $#pats) {   # start at 1 so $pats[0] and $pats[1] get compared
        $dups{$pats[$_]}++ if $pats[$_] eq $pats[$_ - 1];
    }
    # n occurrences give n - 1 adjacent matches, so add one back
    $dups{$_}++ for keys %dups;
    print $dups{$_}, ' occurrences of "', $_, '"', "\n" for keys %dups;
This throws up 31 patterns, with up to four occurrences each. (BTW, in case $window doesn't make sense, I assumed (A) there must be at least two occurrences of each pattern, otherwise it wouldn't really be a pattern; (B) each pattern must be at least 2 chars; and (C) there must be at least 2 patterns.)
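For anyone who wants to cross-check that figure, here is a rough sketch of the same idea with the tallying done in a hash rather than by sorting and comparing neighbours. The sub name find_repeats and its optional minimum-length argument are my own assumptions; the lookahead capture and the upper bound on length are the ones used above.

```perl
use strict;
use warnings;

# find_repeats is a hypothetical name. It returns a hash ref mapping each
# substring that occurs at least twice to its number of occurrences.
sub find_repeats {
    my ($string, $min_len) = @_;
    $min_len = 2 unless defined $min_len;             # assumption (B)
    my $max_len = int((length($string) - 2) / 2);     # same bound as $window

    my %count;
    for my $len ($min_len .. $max_len) {
        # The lookahead captures $len chars without consuming them,
        # so overlapping candidates are all counted.
        $count{$_}++ for $string =~ /(?=(.{$len}))/g;
    }

    # Assumption (A): anything seen only once is not a pattern.
    delete @count{ grep { $count{$_} < 2 } keys %count };
    return \%count;
}

my $repeats = find_repeats("helloworldhellohellohihellohiworld");

# Longest patterns first, then alphabetical.
for my $pat (sort { length($b) <=> length($a) or $a cmp $b } keys %$repeats) {
    printf qq{%d occurrences of "%s"\n}, $repeats->{$pat}, $pat;
}
```

For the 34-char example string the bound works out to int((34 - 2) / 2) = 16, so candidates of 2 to 16 chars are tried. Because the lookahead does not consume anything, occurrences that overlap each other are counted separately, which may or may not be what you want for other inputs.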
Thanks for making me think. Can I stop now?
§
George Sherston