Using regex as an alternative to usual loops on 1D data (Using surrogate string)

Hello,

Here I describe an idea and share several examples of using a surrogate string: we loop over its characters with a regex, mimicking traditional loops. Such regexes contain evaluation blocks, (?{}) or (??{}). Inside these blocks we can manipulate array elements, and the indexing comes not from the traditional variables i, j, k (of C-style or foreach loops) but from the regex match-position variables: pos, $-[0], $+[0] (see @- and @+ in perlvar).

This short essay is a sister of Using regex as an alternative to usual loops on 1D data. The difference: there, instead of a surrogate string, the array elements themselves are joined with a particular separator, which requires checking that the separator does not occur in the data.

These ideas are offered for comparison, in the spirit of TMTOWTDI; I do not expect the regex code to be faster, and it is hardly more readable.

The target (surrogate) string is generated by:

',' x scalar @array;

...and the backbone of regex is:

m/.(?{ $array[ pos() - 1 ] = do_smth(); })(*FAIL)/;

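...which we can expand. We match one or more characters and manipulate array elements, indexing them by match position. Inside the block, pos is the offset just past the text matched so far, so after a single . it equals $-[0] + 1; the element under the matched character is $array[ $-[0] ], or equivalently $array[ pos() - 1 ].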
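
As a minimal runnable sketch of this backbone (the array contents and the doubling operation are hypothetical stand-ins for real data and do_smth()), the following records $-[0] next to pos for each iteration, to show how the two differ after a single matched character:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @array = ( 10, 20, 30 );    # hypothetical sample data
my @seen;                      # "start:pos" pairs, kept only for illustration

# One surrogate character per array element.
# $-[0] is the offset where the current attempt started;
# pos() is the offset just past the matched character.
( ',' x @array ) =~ m/
    .
    (?{
        push @seen, $-[0] . ':' . pos();
        $array[ $-[0] ] *= 2;
        })
    (*FAIL)
    /x;

print "@array\n";    # 20 40 60
print "@seen\n";     # 0:1 1:2 2:3
```

The forced (*FAIL) makes the engine retry at every starting offset, which is what turns a single match attempt into a loop.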

Here is an example program. It calculates the sum of absolute differences between consecutive array elements:

#!/usr/bin/perl -wl

use strict;

my @A = ( -5, 3, 1, -2 );

my $acc = 0;

for my $i ( 0 .. @A - 2 ){
    $acc += abs( $A[ $i ] - $A[ $i + 1 ] );
    }

print $acc;

$acc = 0;

( ',' x ( @A - 1 ) ) =~ /
    .
    (?{
        $acc += abs( $A[ $-[0] ] - $A[ $-[0] + 1 ] );
        })
    (*FAIL)
    /x;

print $acc;

OUTPUT:

13
13

The next example is a 'TRIANGLE' loop (a loop within a loop). Here I use a greedy .* to compare pairs of consecutive elements backwards, which acts as a bubble sort. The program starts with a traditional 'for' loop, followed by the alternative: a regex "loop" on a surrogate string.

#!/usr/bin/perl -wl

use strict;

my @A = qw( d c b a );

print "@A";

for my $i ( 0 .. @A - 2 ){
    for my $j ( reverse $i .. @A - 2 ){
        print ' ' x $i . "<$A[ $j ]> cmp <$A[ $j + 1 ]>";
        }
    }

print "-" x 5;

( ',' x ( @A - 1 ) ) =~ m/
    .
    .*
    (?{
        print ' ' x $-[0] . "<$A[ $+[0] - 1 ]> cmp <$A[ $+[0] ]>";
        })
    (*FAIL)
    /x;

print "-" x 5;

( ',' x ( @A - 1 ) ) =~ m/
    .
    .*
    (?{
          $A[ $+[0] - 1 ] gt $A[ $+[0] ] and
        ( $A[ $+[0] - 1 ],   $A[ $+[0] ] ) =
          reverse
        ( $A[ $+[0] - 1 ],   $A[ $+[0] ] );
        print "--@A";
        })
    (*FAIL)
    /x;

print "@A";

OUTPUT:

d c b a
<b> cmp <a>
<c> cmp <b>
<d> cmp <c>
 <b> cmp <a>
 <c> cmp <b>
  <b> cmp <a>
-----
<b> cmp <a>
<c> cmp <b>
<d> cmp <c>
 <b> cmp <a>
 <c> cmp <b>
  <b> cmp <a>
-----
--d c a b
--d a c b
--a d c b
--a d b c
--a b d c
--a b c d
a b c d

A similar example: selection sort (with non-greedy .*?, which means the forward direction):

#!/usr/bin/perl -wl

use strict;

my @A = qw( d c b a );

print "@A";

for my $i ( 0 .. @A - 2 ){
    for my $j ( $i .. @A - 2 ){
        print ' ' x $i . "<$A[ $j ]> cmp <$A[ $j + 1 ]>";
        }
    }

print "-" x 5;

( ',' x ( @A - 1 ) ) =~ m/
    .
    .*?
    (?{
        print ' ' x $-[0] . "<$A[ $+[0] - 1 ]> cmp <$A[ $+[0] ]>";
        })
    (*FAIL)
    /x;

print "-" x 5;

my $jmin;

( ',' x ( @A - 1 ) ) =~ m/
    .
    (?{ $jmin = $+[0] - 1; })
    .*?
    (?{
        $A[ $jmin ] gt $A[ $+[0] ] and
        $jmin = $+[0];
        })
    $
    (?{ $jmin != $-[0] and
        ( $A[ $-[0] ], $A[ $jmin ] ) =
          reverse
        ( $A[ $-[0] ], $A[ $jmin ] );
        print "--@A";
        })
    (*FAIL)
    /x;

print "@A";

OUTPUT:

d c b a
<d> cmp <c>
<c> cmp <b>
<b> cmp <a>
 <c> cmp <b>
 <b> cmp <a>
  <b> cmp <a>
-----
<d> cmp <c>
<c> cmp <b>
<b> cmp <a>
 <c> cmp <b>
 <b> cmp <a>
  <b> cmp <a>
-----
--a c b d
--a b c d
--a b c d
a b c d

The next example traverses the array taking not 1 or 2 but 3 consecutive elements at a time. Moreover, it advances with a step of two. The first two sub-examples are written as ordinary C-style for and foreach loops, and the third is a regex. I use the (*SKIP) verb to skip some starting positions of the match; in this case I skip one. There is no manipulation here, only printing of array values.
This example is analogous to the examples in the sister node.

#!/usr/bin/perl -w

use strict;

my @A = ( 1 .. 3, 'abc', 'zz', 79, 444, 5 );

for( my $i = 0; $i < @A - 2; $i += 2 ){
    print "[$A[ $i ]-$A[ $i + 1 ]-$A[ $i + 2 ]]";
    }
print "\n";

for my $i ( grep $_ % 2 == 0, 0 .. @A - 3 ){
    print "[$A[ $i ]-$A[ $i + 1 ]-$A[ $i + 2 ]]";
    }
print "\n";

( ',' x ( @A - 1 ) ) =~ m/
    (,)
    (,)(*SKIP)
    (,)
    (?{ print "[$A[ $-[ 1 ] ]-$A[ $-[ 2 ] ]-$A[ $-[ 3 ] ]]" })
#    (?{ local $" = '-'; print "[@A[ @-[ 1 .. 3 ] ]]" })  # same output
#    (?{ local $" = '-'; print "[@A[ $-[ 0 ] .. ( pos ) - 1 ]]" })  # same output
    (*FAIL)
    /x;
print "\n";

OUTPUT:

[1-2-3][3-abc-zz][zz-79-444]
[1-2-3][3-abc-zz][zz-79-444]
[1-2-3][3-abc-zz][zz-79-444]
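
In the (*SKIP) example above, the step size follows from how many characters are consumed before (*SKIP): after the forced failure, the next attempt starts where (*SKIP) was reached. A minimal sketch of the same idea, assuming hypothetical sample data and a window of two elements, so that the windows do not overlap:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @A = ( 'a' .. 'g' );    # hypothetical sample data, 7 elements
my @pairs;

# Two characters are consumed before (*SKIP), so after each forced
# failure the next attempt starts two positions further: step size 2.
( ',' x @A ) =~ m/
    (.)(.)
    (*SKIP)
    (?{ push @pairs, join '', @A[ @-[ 1, 2 ] ] })
    (*FAIL)
    /x;

print "@pairs\n";    # ab cd ef
```

The seventh element is left unpaired: at the last offset the second (.) cannot match, so the block never runs there.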

If we manipulate two or more elements, we can choose how many surrogate symbols to match. If we match only the first symbol, we need one variable, e.g. $-[ 0 ], and the indexes of the consecutive elements are $-[ 0 ] + 1, $-[ 0 ] + 2, ... Otherwise we match all the symbols and take the indexes from the @- or @+ arrays.
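
Both choices can be put side by side in a small sketch. The data and the variable names @first_only and @all_matched are hypothetical; both variants collect the same overlapping triples with step 1:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @A = ( 1 .. 3, 'abc', 'zz' );    # hypothetical sample data
my ( @first_only, @all_matched );

# Variant 1: match only the first symbol; the other indexes are
# offsets from $-[0]. The surrogate string is two characters shorter
# than the array, so $-[0] + 2 never runs past the last element.
( ',' x ( @A - 2 ) ) =~ m/
    .
    (?{ push @first_only, join '-', @A[ $-[0] .. $-[0] + 2 ] })
    (*FAIL)
    /x;

# Variant 2: match all three symbols and read the indexes
# from the capture-group offsets in @-.
( ',' x @A ) =~ m/
    (.)(.)(.)
    (?{ push @all_matched, join '-', @A[ @-[ 1 .. 3 ] ] })
    (*FAIL)
    /x;

print "[$_]" for @first_only;  print "\n";    # [1-2-3][2-3-abc][3-abc-zz]
print "[$_]" for @all_matched; print "\n";    # [1-2-3][2-3-abc][3-abc-zz]
```

In the first variant the bounds are controlled by the length of the surrogate string; in the second, by the pattern itself, which simply fails to match near the end.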

Thanks for reading!
