Your description of the problem is a bit puzzling. It looks like you want to run through the string in order and select four characters from it. After you've taken those four out, you want to go back and take four more from what's left, and so on, until you can't form any more tuples. The rule for selecting a 4-tuple is that its characters must either all be the same letter (taken from a cluster of at least 4), or be one letter from each of four consecutive clusters of letters.
Is that right?
And then to add to the difficulty, you want to get the largest possible set. I think that is a hard problem. I don't have a solution for you, but I hope I've made your requirements clearer to others.
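To make that reading of the rules concrete, here is a rough sketch that splits a string into clusters and lists the 4-tuples that could be taken first. It only reflects my interpretation above, not necessarily what you meant, and the sub names clusters and candidate_tuples are made up for illustration.

```perl
use strict;
use warnings;

# Split a string into clusters (runs of one repeated \w character).
# Returns [char, count] pairs, e.g. "AADEEFFG" gives
# (['A',2], ['D',1], ['E',2], ['F',2], ['G',1]).
sub clusters {
    my ($str) = @_;
    my @clusters;
    while ($str =~ /((\w)\2*)/g) {
        push @clusters, [ $2, length $1 ];
    }
    return @clusters;
}

# List every 4-tuple that could be taken first (assumed rules):
#  - four of one letter, from a cluster of at least 4, or
#  - one letter from each of four consecutive clusters.
sub candidate_tuples {
    my @c = clusters(shift);
    my @found;
    for my $i (0 .. $#c) {
        push @found, $c[$i][0] x 4 if $c[$i][1] >= 4;
        push @found, join '', map { $_->[0] } @c[ $i .. $i + 3 ]
            if $i + 3 <= $#c;
    }
    return @found;
}

print join(' ', candidate_tuples('AADEEFFG')), "\n";   # prints "ADEF DEFG"
```

Note that this only enumerates the first choice. Which one to take so that the final set is as large as possible is the hard part.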
Update
I have come up with a script that does what I think you want, although it never skips a match in favor of one that would lead to a larger set. It just takes the leftmost tuple, removes it, and repeats until no tuples are left. (Update again: I think it's debugged now.) Anyway, it's a start.
#!perl
use strict;
use warnings;

while (<DATA>) {
    chomp;
    my $tuple;
    print "Starting with $_\n";
    # Either four of the same letter, or one letter from each of
    # four consecutive clusters (each cluster is skipped in full)
    while (($tuple) = /((\w)\2\2\2|(\w)(?:\3*)(?!\3)(\w)(?:\4*)(?!\4)(\w)(?:\5*)(?!\5)(\w))/) {
        $tuple =~ y///cs;    # Squeeze runs: only one of any character
        if (length($tuple) == 1) {
            $tuple x= 4;     # Same-letter match: restore all four
        }
        print "Found $tuple!\n";
        # Remove tuple
        for my $char (split //, $tuple) {
            # If it is the last of its kind,
            # no more matches across it are possible
            # I put a space in there, so it won't match \w
            s/(?<!$char)$char(?!$char)/ / or s/$char//;
        }
        print "Next round: $_\n";
    }
}
__DATA__
AAAAADDDDDEFFGMMSSTVVVVV
AADDDEEEEFFFFGGMMMMMMMMMMSTV
AADEEFFG
Caution: Contents may have been coded under pressure.