comment on

It's easier to deal with the data if each "A -4 C" string token is already split into an array:

my @delim = ( ["A", -3, "C"], ["C", -4, "B"] );
my @data  = (
  ["A", -4, "A"],
  ["A", -1, "C"],
  ["C", -4, "D"],
  ["D", -4, "B"],
  ["B", -3, "C"],
  ["C", -2, "B"],
  ["B", -1, "E"]
);
[download]

Now, we want to compute sets of pairs of indices such that each pair ($x,$y) tells us where the intervals are. I.e, @data[$x..$y] is one of the intervals we want.

To do this for any number of things in @delim is tricky, but we can use tye's Algorithm::Loops to loop through all possible sets of 2*@delim indices. If they give a good decomposition into these intervals, then we print the intervals.

Here's the code first, with an explanation later:

#!/usr/bin/perl -l
use Algorithm::Loops 'NestedLoops';

my @delim = ...
my @data  = ...

my $N = $#data;
my $D = $#delim;

NestedLoops(
  [        [0    .. $N],  sub {[$_ .. $N]},
    ( sub {[$_+1 .. $N]}, sub {[$_ .. $N]} ) x $D
  ],
  { OnlyWhen => sub {
       return 0 if @_ < 2*@delim;
       for my $d (0 .. $D) {
         return 0 if $data[ $_[ $d*2     ] ][0] ne $delim[$d][0]
                  or $data[ $_[ $d*2 + 1 ] ][2] ne $delim[$d][2]
       }
       return 1;
  } },
  sub {
    my @ranges = map { [@data[ $_[2*$_] .. $_[2*$_+1] ]] } 0 .. $D;
    print join " / ",
          map { my $range = $_; join " ", map "[@$_]", @$range } @rang
+es;
  }
);
[download]

It probably could have been much more elegant in a functional language, but we take what we can get. Here's roughly how it works:

NestedLoops(
  [        [0    .. $N],  sub {[$_ .. $N]},
    ( sub {[$_+1 .. $N]}, sub {[$_ .. $N]} ) x $D
  ],
[download]

This part tells what variables we'll be looping over. This is like constructing:

for my $i0 ( 0 .. $N ) {
  for my $i1 ( $i0 .. $N ) {
    for my $i2 ( $i1+1 .. $N ) {
      for my $i3 ( $i2 .. $N ) {
        ...
[download]

Note that we want the even-numbered intervals to tell where an interval begins, and the odd-numbered intervals where an interval ends. The two endpoints of an interval can be the same (i.e, $i3 can be the same as $i2), but the beginning of an interval has to be after the previous interval ends (i.e, $i2 must be at least $i1+1). That's why in every other loop, we start indices at $_ or $_+1.

So in what follows, instead of having a for loop with all these $iN variables, we'll get the list of loop variables in the array @_.

  { OnlyWhen => sub {
       return 0 if @_ < 2*@delim;
       for my $d (0 .. $D) {
         return 0 if $data[ $_[ $d*2     ] ][0] ne $delim[$d][0]
                  or $data[ $_[ $d*2 + 1 ] ][2] ne $delim[$d][2]
       }
       return 1;
  } },
[download]

This part filters out the intervals that don't match up with @delim. Note that the $d'th intervals endpoints are $_[$d*2] and $_[$d*2+1]. We simply check that our $d'th interval starts and ends with the correct letter from @delim.

Update: I added the line return 0 if @_ < 2*@delim;, because I guess NestedLoops will give you fewer loop variables than you ask for (when the last few loop variables don't have a valid range to loop over). This change gets rid of the warnings and makes the code now produce the correct output for the test cases in your reply.

  sub {
    my @ranges = map { [@data[ $_[2*$_] .. $_[2*$_+1] ]] } 0 .. $D;
    print join " / ",
          map { my $range = $_; join " ", map "[@$_]", @$range } @rang
+es;
  }
[download]

This sub gets called for every set of loop variables @_ that pass the filter, which are all the interval-endpoints that we're interested in. This part just prints them out in a slick way. It's a little messy because we have several levels of arrays.

Anyway, when it's all said and done, this prints:

[A -4 A] [A -1 C] / [C -4 D] [D -4 B]
[A -4 A] [A -1 C] / [C -4 D] [D -4 B] [B -3 C] [C -2 B]
[A -4 A] [A -1 C] / [C -2 B]
[A -4 A] [A -1 C] [C -4 D] [D -4 B] [B -3 C] / [C -2 B]
[A -1 C] / [C -4 D] [D -4 B]
[A -1 C] / [C -4 D] [D -4 B] [B -3 C] [C -2 B]
[A -1 C] / [C -2 B]
[A -1 C] [C -4 D] [D -4 B] [B -3 C] / [C -2 B]
[download]

So it appears that you missed a few when you did it by hand ;)

If you wanted to return the deep structure of all the intervals, try this instead:

my @intervals = NestedLoops(
  ..
  sub {
    return [ map { [@data[ $_[2*$_] .. $_[2*$_+1] ]] } 0 .. $D ]
  }
);
[download]

Standard disclaimers apply. I didn't test this for larger @delim, but I'm fairly confident that I generalized it properly.

Update: Knowing what you're really doing with this (is it routing, as Skeeve suggests?) will help. There may be some graph algorithm that can come to our aid.

Update 2: I added a line to the OnlyWhen sub, to make it work with warnings. The fix also makes it give the correct output for the buggy test cases that neversaint gives in his reply..

blokhead

In reply to Re: Identifying Delimited Regions of an Array by blokhead
in thread Identifying Delimited Regions of an Array by neversaint

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.