in reply to Identifying Delimited Regions of an Array

It's easier to deal with the data if each "A -4 C" string token is already split into an array:
my @delim = ( ["A", -3, "C"], ["C", -4, "B"] ); my @data = ( ["A", -4, "A"], ["A", -1, "C"], ["C", -4, "D"], ["D", -4, "B"], ["B", -3, "C"], ["C", -2, "B"], ["B", -1, "E"] );
Now, we want to compute sets of pairs of indices such that each pair ($x,$y) tells us where the intervals are. I.e, @data[$x..$y] is one of the intervals we want.

To do this for any number of things in @delim is tricky, but we can use tye's Algorithm::Loops to loop through all possible sets of 2*@delim indices. If they give a good decomposition into these intervals, then we print the intervals.

Here's the code first, with an explanation later:

#!/usr/bin/perl -l use Algorithm::Loops 'NestedLoops'; my @delim = ... my @data = ... my $N = $#data; my $D = $#delim; NestedLoops( [ [0 .. $N], sub {[$_ .. $N]}, ( sub {[$_+1 .. $N]}, sub {[$_ .. $N]} ) x $D ], { OnlyWhen => sub { return 0 if @_ < 2*@delim; for my $d (0 .. $D) { return 0 if $data[ $_[ $d*2 ] ][0] ne $delim[$d][0] or $data[ $_[ $d*2 + 1 ] ][2] ne $delim[$d][2] } return 1; } }, sub { my @ranges = map { [@data[ $_[2*$_] .. $_[2*$_+1] ]] } 0 .. $D; print join " / ", map { my $range = $_; join " ", map "[@$_]", @$range } @rang +es; } );
It probably could have been much more elegant in a functional language, but we take what we can get. Here's roughly how it works:

NestedLoops( [ [0 .. $N], sub {[$_ .. $N]}, ( sub {[$_+1 .. $N]}, sub {[$_ .. $N]} ) x $D ],
This part tells what variables we'll be looping over. This is like constructing:
for my $i0 ( 0 .. $N ) { for my $i1 ( $i0 .. $N ) { for my $i2 ( $i1+1 .. $N ) { for my $i3 ( $i2 .. $N ) { ...
Note that we want the even-numbered intervals to tell where an interval begins, and the odd-numbered intervals where an interval ends. The two endpoints of an interval can be the same (i.e, $i3 can be the same as $i2), but the beginning of an interval has to be after the previous interval ends (i.e, $i2 must be at least $i1+1). That's why in every other loop, we start indices at $_ or $_+1.

So in what follows, instead of having a for loop with all these $iN variables, we'll get the list of loop variables in the array @_.

{ OnlyWhen => sub { return 0 if @_ < 2*@delim; for my $d (0 .. $D) { return 0 if $data[ $_[ $d*2 ] ][0] ne $delim[$d][0] or $data[ $_[ $d*2 + 1 ] ][2] ne $delim[$d][2] } return 1; } },
This part filters out the intervals that don't match up with @delim. Note that the $d'th intervals endpoints are $_[$d*2] and $_[$d*2+1]. We simply check that our $d'th interval starts and ends with the correct letter from @delim.

Update: I added the line return 0 if @_ < 2*@delim;, because I guess NestedLoops will give you fewer loop variables than you ask for (when the last few loop variables don't have a valid range to loop over). This change gets rid of the warnings and makes the code now produce the correct output for the test cases in your reply.

sub { my @ranges = map { [@data[ $_[2*$_] .. $_[2*$_+1] ]] } 0 .. $D; print join " / ", map { my $range = $_; join " ", map "[@$_]", @$range } @rang +es; }
This sub gets called for every set of loop variables @_ that pass the filter, which are all the interval-endpoints that we're interested in. This part just prints them out in a slick way. It's a little messy because we have several levels of arrays.

Anyway, when it's all said and done, this prints:

[A -4 A] [A -1 C] / [C -4 D] [D -4 B] [A -4 A] [A -1 C] / [C -4 D] [D -4 B] [B -3 C] [C -2 B] [A -4 A] [A -1 C] / [C -2 B] [A -4 A] [A -1 C] [C -4 D] [D -4 B] [B -3 C] / [C -2 B] [A -1 C] / [C -4 D] [D -4 B] [A -1 C] / [C -4 D] [D -4 B] [B -3 C] [C -2 B] [A -1 C] / [C -2 B] [A -1 C] [C -4 D] [D -4 B] [B -3 C] / [C -2 B]
So it appears that you missed a few when you did it by hand ;)

If you wanted to return the deep structure of all the intervals, try this instead:

my @intervals = NestedLoops( .. sub { return [ map { [@data[ $_[2*$_] .. $_[2*$_+1] ]] } 0 .. $D ] } );

Standard disclaimers apply. I didn't test this for larger @delim, but I'm fairly confident that I generalized it properly.

Update: Knowing what you're really doing with this (is it routing, as Skeeve suggests?) will help. There may be some graph algorithm that can come to our aid.

Update 2: I added a line to the OnlyWhen sub, to make it work with warnings. The fix also makes it give the correct output for the buggy test cases that neversaint gives in his reply..

blokhead

Replies are listed 'Best First'.
Re^2: Identifying Delimited Regions of an Array
by neversaint (Deacon) on Oct 16, 2005 at 09:08 UTC
    Dear blokhead,

    Thank you so much for your -life saving- answer to my posting. I managed to get it work on larger test case.

    There are couple of critical issues however, that I truly hope you won't mind to take a look at it. Because I honestly don't know how to go about resolving it.

    My code below already give correct answers (98%), except for these two cases that give unexpected result:

    Case 1
    Given $delim = = 'A -3 C -4 B'; $set->{'key2'} => ['A -4 B','B -4 C','C -4 A','A -4 B'], RESULT: key2 [A -4 B] [B -4 C] / [A -4 B] [B -4 C] / [C -4 A] [A -4 B] # only this is correct
    Given that input, only the latter is the answer, because both sub-region are satisfied (covered).

    Case 2
    Given: $delim2 = 'A -4 B'; $set->{'key2'} => ['A -4 B','B -4 C','C -4 A','A -4 B'], RESULT: [A -4 B] #ok [A -4 B] #ok [A -4 B] [B -4 C] [C -4 A] [A -4 B] #ok [A -4 B] #not ok -- don't know how this is obtained
    I can understand where the first 3 lines come from - and they are correct, but not the last one.
    My full running code based on your approach is as follows:

    ---
    neversaint and everlastingly indebted.......