in reply to Regex matching on grid alignment
I think I like LanX's grep/unpack approach best (Update: also the approaches of hdb here and kcott here), but here's another (no claims for efficiency):
>perl -wMstrict -le
"my $s = join '', qw(ABCbb bCBAd ddCAC DDDAC ABBBC ABCDA ABCCC CCCAB);
 print qq{'$s'};
 ;;
 my $row_width = 5;
 my $contiguous = 3;
 $contiguous <= $row_width or die 'nonsense';
 $contiguous > 0 or die 'ridiculous';
 ;;
 my $pre_max = $row_width - $contiguous;
 my $post = $contiguous - 1;
 ;;
 use re 'eval';
 my $mod;
 my @reps = grep { $mod = ! $mod }
     $s =~ m{ \G (?: .{$row_width}){0,}? .{0,$pre_max}?
              ((.)\2{$post})
              (?(?{ pos($s) % $row_width }) .){$pre_max}
            }xmsg;
 ;;
 printf qq{'$_' } for @reps;
"
'ABCbbbCBAdddCACDDDACABBBCABCDAABCCCCCCAB'
'DDD' 'BBB' 'CCC' 'CCC'
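As a cross-check on the output above, the same runs can be found without the single-pass regex by cutting the string into rows first and searching each row on its own. This is only a minimal sketch in the spirit of a split-into-rows approach, not anyone's posted code; the `unpack` template and the loop are my own assumptions about how one might write it.

```perl
use strict;
use warnings;

my $s = join '', qw(ABCbb bCBAd ddCAC DDDAC ABBBC ABCDA ABCCC CCCAB);

my $row_width  = 5;
my $contiguous = 3;
my $post       = $contiguous - 1;

# Cut the flat string into fixed-width rows ('a' keeps every character as-is).
my @rows = unpack "(a$row_width)*", $s;

# Look for runs of $contiguous identical characters inside each row only,
# so a run can never straddle a row boundary.
my @reps;
for my $row (@rows) {
    while ($row =~ m{ ((.)\2{$post}) }xmsg) {
        push @reps, $1;
    }
}

print join(' ', map { "'$_'" } @reps), "\n";
```

For the sample data this should print the same four runs as the one-liner. Searching row by row trades the single regex for a loop, but it makes the row-boundary rule explicit.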
Update: If I correctly understand how all this works, the regex sub-expression
(?(?{ pos($s) % $row_width }) .){$pre_max}
will continue to "loop" until the value of the $pre_max quantifier is exhausted even though a row boundary has been reached. A way to "break out" of this loop once a boundary is reached (again, IIUC) is
(?(?{ $+[1] % $row_width }) . | (*SKIP:AHEAD)){$pre_max} (*MARK:AHEAD)
($+[1] used in place of pos($s)). Again, no timings have been done to confirm this is actually beneficial. See Special Backtracking Control Verbs (Perl version 5.10+) in perlre and note that these verbs are marked "experimental". If you really want to walk on the wild side, try (*ACCEPT) which is marked highly experimental:
(?(?{ $+[1] % $row_width }) . | (*ACCEPT)){$pre_max}
and no (*MARK) is needed. Both of these variations have been tested.
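The behaviour described for the first conditional, (?(?{ pos($s) % $row_width }) .){$pre_max}, can be looked at in isolation. The sketch below is a hedged illustration only: the ten-character string, the count `$max` and the leading `..` are made up for the demonstration and are not part of the original pattern.

```perl
use strict;
use warnings;
use re 'eval';    # kept from the one-liner; older Perls want it when
                  # (?{ ... }) is combined with interpolated variables

my $s         = 'abcdefghij';   # hypothetical grid of two rows: 'abcde' / 'fghij'
my $row_width = 5;
my $max       = 4;              # hypothetical count, larger than what is left in the row

# Skip two characters, then take "a character, but only while we are not
# on a row boundary" exactly $max times.
my ($tail) = $s =~ m{ \A .. ( (?(?{ pos($s) % $row_width }) .){$max} ) }xms;

print "'$tail'\n";
```

This should print 'cde': three characters are consumed up to the row boundary at offset 5, and the {4} quantifier is still satisfied, which is consistent with the last iteration matching the empty string rather than ending the loop early.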
Re^2: Regex matching on grid alignment
by Anonymous Monk on Sep 10, 2013 at 00:08 UTC
by AnomalousMonk (Archbishop) on Sep 10, 2013 at 10:02 UTC