I think I like LanX's grep/unpack approach best (Update: also the approaches of hdb and kcott), but here's another (no claims for efficiency):
>perl -wMstrict -le
"my $s = join '', qw(ABCbb bCBAd ddCAC DDDAC ABBBC ABCDA ABCCC CCCAB);
 print qq{'$s'};
 ;;
 my $row_width = 5;
 my $contiguous = 3;
 $contiguous <= $row_width or die 'nonsense';
 $contiguous > 0 or die 'ridiculous';
 ;;
 my $pre_max = $row_width - $contiguous;
 my $post = $contiguous - 1;
 ;;
 use re 'eval';
 my $mod;
 my @reps =
     grep { $mod = ! $mod }
     $s =~ m{
         \G (?: .{$row_width}){0,}? .{0,$pre_max}?
         ((.)\2{$post})
         (?(?{ pos($s) % $row_width }) .){$pre_max}
         }xmsg;
 ;;
 printf qq{'$_' } for @reps;
 "
'ABCbbbCBAdddCACDDDACABBBCABCDAABCCCCCCAB'
'DDD' 'BBB' 'CCC' 'CCC'
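For cross-checking the output of the regex, here is a plain row-by-row sketch. It is only an illustration and is not claimed to be what any of the posts mentioned above do: the sub name runs_in_rows is made up, and it simply cuts the string into rows of $row_width characters and looks for runs inside each row, so a run can never straddle a row boundary. Unlike the single-regex version, it reports every non-overlapping run in a row.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this sketch): split $s into rows
# of $row_width characters and collect every run of $contiguous identical
# characters that lies entirely inside one row.
sub runs_in_rows {
    my ($s, $row_width, $contiguous) = @_;
    $contiguous > 0 && $contiguous <= $row_width or die 'bad widths';

    my $post = $contiguous - 1;   # repeats needed after the first character
    my @reps;
    # 'a' (not 'A') keeps each row exactly as it is, trailing blanks included.
    for my $row (unpack "(a$row_width)*", $s) {
        # Scan the row left to right for non-overlapping runs.
        push @reps, $1 while $row =~ m{ ((.)\2{$post}) }xmsg;
    }
    return @reps;
}

my $s = join '', qw(ABCbb bCBAd ddCAC DDDAC ABBBC ABCDA ABCCC CCCAB);
my @reps = runs_in_rows($s, 5, 3);
print join(' ', map { "'$_'" } @reps), "\n";
```

With the sample data this should print the same four runs as the one-liner, since 'bbb' and 'ddd' each cross a row boundary and are therefore not seen.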
Update: If I correctly understand how all this works, the regex sub-expression
(?(?{ pos($s) % $row_width }) .){$pre_max}
will continue to "loop" until the value of the $pre_max quantifier is exhausted even though a row boundary has been reached. A way to "break out" of this loop once a boundary is reached (again, IIUC) is
(?(?{ $+[1] % $row_width }) . | (*SKIP:AHEAD)){$pre_max} (*MARK:AHEAD)
Here $+[1] (the end offset of capture group 1) is used in place of pos($s). Again, no timings have been done to confirm that this is actually beneficial. See "Special Backtracking Control Verbs" (Perl version 5.10+) in perlre, and note that these verbs are marked "experimental". If you really want to walk on the wild side, try (*ACCEPT), which is marked "highly experimental":
(?(?{ $+[1] % $row_width }) . | (*ACCEPT)){$pre_max}
and no (*MARK) is needed. Both of these variations have been tested.
In reply to Re: Regex matching on grid alignment
by AnomalousMonk
in thread Regex matching on grid alignment
by Anonymous Monk