neversaint has asked for the wisdom of the Perl Monks concerning the following question:

Dear Masters,

I am trying to delimit an array given a string (which later be decomposed as region marker).
Let me give the example:
$delim = 'A -3 C -4 B'; $array = ['A -4 A','A -1 C','C -4 D','D -4 B','B -3 C','C -2 B','B -1 E']; + * + * + * + * + * + * # Note that in $array, starting from # second element $array->[1] until $array->[-1], # the first alphabet ( asterisks *) is actually # a repeat of the last alphabet ( plus +) # from the previous element.
Using $delim, I wish to identify region in $array. First of all we can decompose $delim into sub-regions.
I have a function that does this job. The decomposition looks like this:
my $delim = 'A -3 C -4 B'; my $delim_reg = ['A -3 C','C -4 B'];

Once we know $delim_reg, we can identify those region in $array.
There are 6 possibilities (instances) we can identify. They are:
my $delim_reg = ['A -3 C','C -4 B']; \__i__/ \__j__/ --> sub-regions my $array = ['A -4 A','A -1 C','C -4 D','D -4 B','B -3 C', 'C -2 B','B -1 E']; |\_____i_______/ \__j__/|-> Inst 1 |____________________________________________________| |\_i__/ \_____j________/ |-------> Instance 2 |_________________________| ... till |\_____i_______/ \_______________j__________________/ |--> Inst 6 |______________________________________________________|
Let me try to describe the picture above. For example, Instance 1 contains two sub-regions (i and j) which correspond to each marker from $delim_reg. So in "i" a sub-region is started with A and ended with C. Similarly in "j" a sub-region is started with C and ended with B.

Thus, we define a sub-region 'A -3 C' as a collection of tuples, such that in this collection the first alphabet of the first tuple must be equal with 'A' and last alphabet of the last tuple must be equal with 'C'.

Now the main task is to come up with a function that can capture those 6 instances with $delim and $array above as input. The final result will be this ( I did it by hand):
my $VAR = [ [['A -4 A', 'A -1 C'], ['C -2 B']], # Ins1 [['A -4 A', 'A -1 C'], ['C -4 D','D -4 B']], # Ins2 [['A -4 A', 'A -1 C'], ['C -4 D','D -4 B','B -4 C','C -2 B']], # Ins3 [['A -4 A', 'A -1 C', 'C -4 D','D -4 B','B -4 C'],['C -2 B']], # Ins4 [['A -1 C'], ['C -4 D', 'D -4 B']], # Ins5 [['A -1 C'], ['C -4 D', 'D -4 B','B -3 C','C -2 B']], # Ins6 ]
I need to keep the result in forms of AoA, so that I can later add other value into it. In other cases, $delim can contain more or less than 3 alphabets. The size of the $array may also be varied. Thus generating more/lesser sub-regions and number of instances too.

My code below is still far far from achieving the desired result above. I encounter two main difficulties at the moment: I really don't know how to go about it. Here is the code:
use Data::Dumper; my $delim = 'A -3 C -4 B'; my $array = ['A -4 A','A -1 C','C -4 D','D -4 B','B -3 C','C -2 B','B -1 E']; get_delim_region($delim,$array); sub get_delim_region{ my ($delim,$array) = @_; my $delim_reg = decomp_str($delim); # I'm really stuck from here..... my @instances; OUT: foreach my $delim_rg ( @{$delim_reg} ){ my @delimreg; my $st1 = (split(" ",$delim_rg))[0]; my $st2 = (split(" ",$delim_rg))[2]; IN: foreach my $i ( 0 .. @{$array}-1 ){ my $tr1 = (split(" ",$array->[$i]))[0]; my $tr2 = (split(" ",$array->[$i]))[2]; if($st1 eq $tr1){ push @delimreg,$array->[$i]; } elsif($st2 eq $tr2){ push @delimreg,$array->[$i]; next OUT; } push @instances, [ @delimreg ]; } } print Dumper \@instances; return ; } sub decomp_str{ # This subroutine decompose a string into sub-regions i.e # from: $delim = 'A -3 C -4 B'; # into: $delim_reg = ['A -3 C','C -4 B']; # Credit Roy Johnson - fastest [$_[0] =~ /(?=([a-z]\s*(?:\S+\s*){2}))\S+\s*/gi ] }
Dear fellow monks, I humbly seek your enlighthenment in this matter. Thanks so much beforehand.

---
neversaint and everlastingly indebted.......

Replies are listed 'Best First'.
Re: Identifying Delimited Regions of an Array
by blokhead (Monsignor) on Oct 15, 2005 at 20:59 UTC
    It's easier to deal with the data if each "A -4 C" string token is already split into an array:
    my @delim = ( ["A", -3, "C"], ["C", -4, "B"] ); my @data = ( ["A", -4, "A"], ["A", -1, "C"], ["C", -4, "D"], ["D", -4, "B"], ["B", -3, "C"], ["C", -2, "B"], ["B", -1, "E"] );
    Now, we want to compute sets of pairs of indices such that each pair ($x,$y) tells us where the intervals are. I.e, @data[$x..$y] is one of the intervals we want.

    To do this for any number of things in @delim is tricky, but we can use tye's Algorithm::Loops to loop through all possible sets of 2*@delim indices. If they give a good decomposition into these intervals, then we print the intervals.

    Here's the code first, with an explanation later:

    #!/usr/bin/perl -l use Algorithm::Loops 'NestedLoops'; my @delim = ... my @data = ... my $N = $#data; my $D = $#delim; NestedLoops( [ [0 .. $N], sub {[$_ .. $N]}, ( sub {[$_+1 .. $N]}, sub {[$_ .. $N]} ) x $D ], { OnlyWhen => sub { return 0 if @_ < 2*@delim; for my $d (0 .. $D) { return 0 if $data[ $_[ $d*2 ] ][0] ne $delim[$d][0] or $data[ $_[ $d*2 + 1 ] ][2] ne $delim[$d][2] } return 1; } }, sub { my @ranges = map { [@data[ $_[2*$_] .. $_[2*$_+1] ]] } 0 .. $D; print join " / ", map { my $range = $_; join " ", map "[@$_]", @$range } @rang +es; } );
    It probably could have been much more elegant in a functional language, but we take what we can get. Here's roughly how it works:

    Standard disclaimers apply. I didn't test this for larger @delim, but I'm fairly confident that I generalized it properly.

    Update: Knowing what you're really doing with this (is it routing, as Skeeve suggests?) will help. There may be some graph algorithm that can come to our aid.

    Update 2: I added a line to the OnlyWhen sub, to make it work with warnings. The fix also makes it give the correct output for the buggy test cases that neversaint gives in his reply..

    blokhead

      Dear blokhead,

      Thank you so much for your -life saving- answer to my posting. I managed to get it work on larger test case.

      There are couple of critical issues however, that I truly hope you won't mind to take a look at it. Because I honestly don't know how to go about resolving it.

      My code below already give correct answers (98%), except for these two cases that give unexpected result:

      Case 1
      Given $delim = = 'A -3 C -4 B'; $set->{'key2'} => ['A -4 B','B -4 C','C -4 A','A -4 B'], RESULT: key2 [A -4 B] [B -4 C] / [A -4 B] [B -4 C] / [C -4 A] [A -4 B] # only this is correct
      Given that input, only the latter is the answer, because both sub-region are satisfied (covered).

      Case 2
      Given: $delim2 = 'A -4 B'; $set->{'key2'} => ['A -4 B','B -4 C','C -4 A','A -4 B'], RESULT: [A -4 B] #ok [A -4 B] #ok [A -4 B] [B -4 C] [C -4 A] [A -4 B] #ok [A -4 B] #not ok -- don't know how this is obtained
      I can understand where the first 3 lines come from - and they are correct, but not the last one.
      My full running code based on your approach is as follows:

      ---
      neversaint and everlastingly indebted.......
Re: Identifying Delimited Regions of an Array
by Skeeve (Parson) on Oct 15, 2005 at 21:02 UTC

    It appears to me that each of your triple (X # Y) is a piece of a route with X being the starting point and Y being the end point.

    What you seem to have in $array is a long route and a shorter one in $delim.

    No you seem to want to find regions in $array that give you the same stop on the long route, you got on the short.

    So here is my idea:

    1. Make two HoA where you store for each starting and each end point a list of indeces in your $array where to find them
      example data structure:
      %start = ( A => [ 0, 1, ], B => [ 4, 6, ], C => [ 2, 5, ], D => [ 3, ], } %end = { A => [ 0, ], B => [ 3, 5, ], C => [ 1, 4, ], D => [ 2, ], E => [ 6, ], );
    2. With this information you can easily find, for each starting point, where it is located.
    3. You will also find easily where each end point can be found.
    4. A valid route from A to C are now all those ranges where the value from @{$start{'A'}} is less or equal to a value from @{$end{'C'}}
      This should be:
      A C
      0 1
      0 4
      1 1
      1 4
    5. Implementation is left as an excercise to you ;-)


    s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
    +.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e