Re: Regex matching on grid alignment

I think I like LanX's grep/unpack approach best (Update: also the approaches of hdb and kcott), but here's another (no claims for efficiency):

>perl -wMstrict -le
"my $s = join '', qw(ABCbb bCBAd ddCAC DDDAC ABBBC ABCDA ABCCC CCCAB);
 print qq{'$s'};
 ;;
 my $row_width  = 5;
 my $contiguous = 3;
 $contiguous <= $row_width or die 'nonsense';
 $contiguous >  0          or die 'ridiculous';
 ;;
 my $pre_max = $row_width - $contiguous;
 my $post    = $contiguous - 1;
 ;;
 use re 'eval';
 my $mod;
 my @reps =
   grep { $mod = ! $mod }
   $s =~ m{
     \G (?: .{$row_width}){0,}? .{0,$pre_max}? ((.)\2{$post})
     (?(?{ pos($s) % $row_width }) .){$pre_max}
     }xmsg;
 ;;
 printf qq{'$_'  } for @reps;
"
'ABCbbbCBAdddCACDDDACABBBCABCDAABCCCCCCAB'
'DDD'  'BBB'  'CCC'  'CCC'
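
A note on the grep { $mod = ! $mod } step in the one-liner: the pattern has two capture groups, so m//g in list context returns them interleaved (whole run, single character, whole run, ...), and the toggle keeps only every other element. A minimal sketch on a made-up string; the names $s, @flat, @runs and $keep are illustrative only and not taken from the one-liner:

```perl
use strict;
use warnings;

my $s = 'xAAAyBBBz';

# Two capture groups: group 1 is the whole run, group 2 the repeated character.
# In list context, m//g returns every group of every match as one flat list.
my @flat = $s =~ m{ ((.)\2{2}) }xmsg;    # ('AAA', 'A', 'BBB', 'B')

# The toggle is true for the 1st, 3rd, 5th ... element, so only group 1 survives.
my $keep;
my @runs = grep { $keep = ! $keep } @flat;

print "@runs\n";    # AAA BBB
```

A non-capturing way to get the same list would need a different pattern shape, since \2 has to refer to a captured character; the toggle is simply the cheapest filter for the pattern as written.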

Update: If I correctly understand how all this works, the regex sub-expression
(?(?{ pos($s) % $row_width }) .){$pre_max}
will continue to "loop" until the value of the $pre_max quantifier is exhausted even though a row boundary has been reached. A way to "break out" of this loop once a boundary is reached (again, IIUC) is
(?(?{ $+[1] % $row_width }) . | (*SKIP:AHEAD)){$pre_max} (*MARK:AHEAD)
($+[1] used in place of pos($s)). Again, no timings have been done to confirm this is actually beneficial. See Special Backtracking Control Verbs (Perl version 5.10+) in perlre and note that these verbs are marked "experimental". If you really want to walk on the wild side, try (*ACCEPT) which is marked highly experimental:
(?(?{ $+[1] % $row_width }) . | (*ACCEPT)){$pre_max}
and no (*MARK) is needed. Both of these variations have been tested.
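
If the experimental verbs are a worry, the re-synchronization can also be done outside the regex, by moving pos() to the next row boundary after each match. This is a different technique from the conditional shown above, offered only as a sketch: it assumes the string length is a multiple of $row_width (as in the sample data), and it has not been timed. The variable $end is a name introduced here for illustration.

```perl
use strict;
use warnings;

my $s = join '', qw(ABCbb bCBAd ddCAC DDDAC ABBBC ABCDA ABCCC CCCAB);

my $row_width  = 5;
my $contiguous = 3;
my $pre_max    = $row_width - $contiguous;   # most characters that may precede a run in its row
my $post       = $contiguous - 1;            # repeats needed after the first character

my @reps;
while ($s =~ m{ \G (?: .{$row_width})*? .{0,$pre_max}? ((.)\2{$post}) }xmsg) {
    push @reps, $1;
    my $end = pos $s;
    # Perl's % takes the sign of its right operand, so (-$end) % $row_width
    # is the distance from $end to the next row boundary (0 if already on one).
    pos($s) = $end + (-$end) % $row_width;
}

print "'$_'  " for @reps;    # 'DDD'  'BBB'  'CCC'  'CCC'
print "\n";
```

Because \G always starts from a row boundary and the lazy prefix skips whole rows plus at most $pre_max characters, a run that straddles two rows (such as the bbb across the first two rows) is never reported.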


Re^2: Regex matching on grid alignment by Anonymous Monk on Sep 10, 2013 at 00:08 UTC
"I think I like LanX's grep/unpack approach best"

May I ask why? When I benchmarked it, rjt's regex was 5-10 times faster (depending on grid size) than the grep/unpack approach when used alone, and even faster when I combined it with our existing expression. Efficiency was a main requirement, and I personally don't find the grep/unpack any more readable or maintainable than the regex, but maybe that's just me.

(LanX, please don't feel like I'm picking on you. I really appreciate your comments and learned from them. It's just that for what I asked for help with, I thought rjt's solution was way better, but now I'm trying to understand why it might not be.)
Re^3: Regex matching on grid alignment by AnomalousMonk (Archbishop) on Sep 10, 2013 at 10:02 UTC
"I think I like LanX's grep/unpack approach best" "May I ask why?"

Well, I did only say I thought I liked it best :)

My practice in dealing with certain SoPW queries, regex-related ones in particular, is to try to formulate an answer before looking at any of the responses, then post what I have if it seems it might bring something to the table. In quickly reading through the replies already posted before posting my own, it seemed that rjt's was closest to mine in matching to row-groups, then to a repeat within a row. (Although rjt's solution finds the rightmost repeat in a string, and you would presumably work leftward from there to pick up the others; my understanding is that you want all repeats from the string. My approach finds the leftmost repeat first, then continues working to the right.) What my approach added, it seemed to me, was a way within the regex to "re-synchronize" to a row boundary before looking for the next repeat to capture.

As I say, I only briefly glanced at the various replies before posting my own, and did no benchmarking. I was beguiled by the simplicity of some of the other replies and by the fact that I've found such approaches to be quite efficient in some other cases. I failed to notice that rjt had already done some benchmarking and found his or her regex approach to be significantly faster.

And that's the point: benchmarking tells the tale. No matter if an alternate solution is simpler, more elegant or maintainable (whatever those words really mean), if you have a need for speed and a solution's not fast enough, it's unusable. Period.

So good luck in your quest, and I hope your experience of the Monastery has been and will continue to be productive. And please consider registering: it just makes things a bunch simpler.
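
For anyone wanting to repeat such a comparison, here is a minimal sketch using the core Benchmark module. Both subs are stand-ins written for this example (the names by_unpack and by_single_pass are hypothetical); they are not LanX's or rjt's actual code, and the sample string is far too small to say anything about real grids. No timings are claimed.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $row_width  = 5;
my $contiguous = 3;
my $pre_max    = $row_width - $contiguous;
my $post       = $contiguous - 1;
my $s = join '', qw(ABCbb bCBAd ddCAC DDDAC ABBBC ABCDA ABCCC CCCAB);

# Stand-in 1 (hypothetical): cut the string into rows, then search each row.
sub by_unpack {
    my ($str) = @_;
    my @reps;
    for my $row (unpack "(a$row_width)*", $str) {
        push @reps, $1 if $row =~ m{ ((.)\2{$post}) }xms;
    }
    return @reps;
}

# Stand-in 2 (hypothetical): one pass over the whole string with \G,
# stepping pos() forward to the next row boundary after each hit.
sub by_single_pass {
    my ($str) = @_;
    my @reps;
    while ($str =~ m{ \G (?: .{$row_width})*? .{0,$pre_max}? ((.)\2{$post}) }xmsg) {
        push @reps, $1;
        my $end = pos $str;
        pos($str) = $end + (-$end) % $row_width;
    }
    return @reps;
}

# Timing is only meaningful if both candidates agree on the answer.
my @from_unpack = by_unpack($s);
my @from_pass   = by_single_pass($s);
"@from_unpack" eq "@from_pass"
    or die "results differ: (@from_unpack) vs (@from_pass)";

# Run each sub for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    unpack      => sub { my @r = by_unpack($s) },
    single_pass => sub { my @r = by_single_pass($s) },
});
```

Checking that the candidates return the same list before timing them is the important part; it is easy to "win" a benchmark with code that quietly does less work. For a real test, substitute a string of realistic size and the actual expressions under consideration.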