haukex has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I'm looking for some module recommendations before I run off and reinvent a wheel. This is a two-part question; for the first part I haven't really found anything on CPAN, and for the second part I have a few ideas (below), but again, haven't really found anything. I suspect there might be some modules I'm missing.

So I have two-dimensional data represented as an AoA. In a configuration file, I need to specify ranges of that data, for example "extract rows X through Y and columns M through N". It would be nice if these ranges could be expressed in some easy way, as in the example shown below, and I would like to get back only a two-dimensional subset of the larger data. The second part is that it would be really nice if the returned values were aliases/references to the original data, so that a modification to the subset modifies the original, and vice versa (so basically, I want a "view" of the larger dataset). I know I could do this with Data::Alias or tied arrays, but again, maybe someone knows a module that already provides this. Here is the whole thing expressed in code:

    use warnings;
    use strict;
    use Test::More tests=>2;

    sub getsubset { ... }

    my $data = [
        ['a','b','c','d'],
        ['e','f','g','h'],
        ['i','j','k','l'],
        ['m','n','o','p'] ];
    my $range = 'R2C2:R3C3'; # or any other useful syntax!
    my $subset = getsubset($data, $range);
    is_deeply $subset, [
        ['f','g'],
        ['j','k'] ];

    # Part 2:
    $subset->[0][1] = 'X';
    $subset->[1][0] = 'Y';
    is_deeply $data, [
        ['a','b','c','d'],
        ['e','f','X','h'],
        ['i','Y','k','l'],
        ['m','n','o','p'] ];
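For the first part only (the range syntax, without the aliasing), a minimal sketch might look like the following. The names parse_rc_range and getsubset_copy are hypothetical and not from any CPAN module; the sketch assumes 1-based row and column numbers in the 'R2C2:R3C3' form and returns plain copies, so it would pass only the first test above.

```perl
use warnings;
use strict;

# Hypothetical helper: turn 'R2C2:R3C3' into [row1, col1, row2, col2]
# (1-based, as in the example range above).
sub parse_rc_range {
    my ($spec) = @_;
    my ($r1, $c1, $r2, $c2) =
        $spec =~ /\A R(\d+) C(\d+) : R(\d+) C(\d+) \z/x
        or die "bad range '$spec'";
    return [ $r1, $c1, $r2, $c2 ];
}

# Hypothetical helper: returns COPIES of the selected cells, not aliases.
sub getsubset_copy {
    my ($data, $spec) = @_;
    my ($r1, $c1, $r2, $c2) = @{ parse_rc_range($spec) };
    return [
        map { [ @{$_}[ $c1 - 1 .. $c2 - 1 ] ] }   # column slice of each row
            @{$data}[ $r1 - 1 .. $r2 - 1 ]        # row slice
    ];
}
```

Because each inner array is built with [ ... ], writes to the result do not reach the original data; that is exactly the gap the second part of the question asks about.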

Re: Selecting Ranges of 2-Dimensional Data
by pryrt (Abbot) on Oct 26, 2018 at 15:23 UTC

    I know that PDL uses fancy reference slices for multidimensional numerical matrices. I don't know if it can be used for non-numeric matrices (which your example implies), or whether you could at least use PDL-like techniques.

    update: PDL::Char is a likely candidate.

      Thanks! I'll look into PDL.

      I should have said, the strings will be variable length - many of them only a couple of characters long, but some column headers or annotations are hundreds of characters long.

Re: Selecting Ranges of 2-Dimensional Data (array of aliases)
by LanX (Saint) on Oct 26, 2018 at 21:53 UTC
    ♪..♫ It's a kind of magic ... ♩..♬ ;-)
    use warnings;
    use strict;
    #use Data::Dump qw/pp/;
    use Test::More tests=>2;

    sub arr_alias { \ @_ } # arr_ref to list of aliases

    sub getsubset {
        my ( $data, $range ) = @_;
        my ( $rows, $cols ) = @$range; # [y0,y1], [x0,x1] # YMMV!
        return [
            map {
                arr_alias @$_[ $cols->[0] .. $cols->[1] ] # x-slice
            } @$data[ $rows->[0] .. $rows->[1] ]          # y-slice
        ];
    }

    my $data = [
        ['a','b','c','d'],
        ['e','f','g','h'],
        ['i','j','k','l'],
        ['m','n','o','p'] ];
    my $range = [ [1,2], [1,2] ]; # or any other useful syntax!
    my $subset = getsubset($data, $range);
    is_deeply $subset, [
        ['f','g'],
        ['j','k'] ];

    # Part 2:
    $subset->[0][1] = 'X';
    $subset->[1][0] = 'Y';
    is_deeply $data, [
        ['a','b','c','d'],
        ['e','f','X','h'],
        ['i','Y','k','l'],
        ['m','n','o','p'] ];

    C:/Perl_524/bin\perl.exe d:/tmp/matrix_range.pl
    1..2
    ok 1
    ok 2

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery
    FootballPerl is like chess, only without the dice

    update

    simplified code, added comments

    update

    changed from slices to ranges

      D'oh! Of course, @_! Thank you very much, LanX :-)

      An updated getsubset (for the code here):

      sub getsubset {
          my ($data,$range) = @_;
          my $cols = @{$$data[0]};
          @$_==$cols or croak "data not rectangular" for @$data;
          $range = rangeparse($range) unless ref $range eq 'ARRAY';
          @$range==4 or croak "bad size of range";
          my @max = (0+@$data,$cols)x2;
          for my $i (0..3) {
              $$range[$i]=$max[$i] if $$range[$i]<0;
              croak "index $i out of range"
                  if $$range[$i]<1 || $$range[$i]>$max[$i];
          }
          croak "bad rows $$range[0]-$$range[2]" if $$range[0]>$$range[2];
          croak "bad cols $$range[1]-$$range[3]" if $$range[1]>$$range[3];
          my @cis = $$range[1]-1 .. $$range[3]-1;
          return [ map { sub{\@_}->(@{$$data[$_]}[@cis]) }
              $$range[0]-1 .. $$range[2]-1 ]
      }
        Honestly, I would have implemented it differently, to get a higher degree of reusability and readability.

        For instance

        • using a callback instead of arr_alias would allow a version that returns copies, which would still pass your first test criterion.
        • the nested maps could be abstracted to slice arbitrary nested structures, not only matrices.
        • allowing different projections, not only ranges
        OTOH I tend to get lost in abstraction. .. ;)
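A rough sketch of the callback idea from the first bullet, under stated assumptions: getsubset_with, as_copies, and the $wrap parameter are hypothetical names, and $rows and $cols are [first, last] pairs of 0-based indices as in the arr_alias version upthread.

```perl
use warnings;
use strict;

sub as_aliases { \@_ }      # array ref whose elements alias the arguments
sub as_copies  { [ @_ ] }   # array ref holding fresh copies of the arguments

# $wrap (hypothetical parameter) decides how each row slice is packaged.
sub getsubset_with {
    my ( $data, $rows, $cols, $wrap ) = @_;
    return [
        map { $wrap->( @{$_}[ $cols->[0] .. $cols->[1] ] ) }   # x-slice
            @{$data}[ $rows->[0] .. $rows->[1] ]               # y-slice
    ];
}
```

Passing \&as_aliases should give the "view" behaviour, while \&as_copies gives an independent subset that satisfies only the first test.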

        > Of course, @_!

        It's a hack I once learned from Ikegami++, but it seems to be reliable.

        Though not many people know if it depends on an implementation detail.

        And unfortunately I don't think it can be used to alias hashes (i.e. values) too.

        Cheers Rolf

        This is all very interesting to read. I've put some say statements in the new getsubset to figure out aspects of the parameter array and matrix manipulations. I'll put abridged output and (unabridged) source between readmore tags and pull out the bits I want to ask about after. All of the useful source in this has been listed upthread, so I'd probably skip to the code niblets...

        I have not seen this syntax before, and just to be sure, I thumbed through _Learning Perl_ and did not find it in chapter 4, Lists and Arrays. Asking Google what "perl arrays x 2" means is not an effective query.

          my @max = ( 0 + @$data, $cols ) x 2;

        I'm just looking for a reference to read up on that. The second thing I wanted to bring up was about the parameter array. Is it the case that @_ does not change over the life of the function? Does it have intrinsic aliasing?
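For reference, x is the repetition operator, documented in perlop under the multiplicative operators; the aliasing of @_ is described in perlsub. A small illustration follows. The values and the sub names double_in_place and no_effect are made up for the example.

```perl
use warnings;
use strict;

# With a parenthesized list on the left, x repeats the whole list.
my @max = ( 4, 3 ) x 2;            # same as (4, 3, 4, 3)

# The elements of @_ are aliases to the caller's arguments,
# so assigning to them changes the caller's data.
sub double_in_place { $_ *= 2 for @_ }
my @n = ( 1, 2, 3 );
double_in_place( @n[ 0, 2 ] );     # @n is now (2, 2, 6)

# Copying @_ into lexicals, as most subs do, breaks that link.
sub no_effect { my ($v) = @_; $v = 'changed'; return }
my $s = 'original';
no_effect($s);                     # $s is still 'original'
```

This is why sub { \@_ } is useful here: the returned array reference keeps the aliased elements alive after the call returns.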

        Finally, after days of tinkering with it, I'm still baffled by the return from getsubset. We know what it's to be because we print it out when it gets returned. Lo and behold, it is a reference to an array. I can't see how the sausage gets made here:

        return [ map { sub { \@_ } ->( @{ $$data[$_] }[@cis] ) } $$range[0] - 1 .. $$range[2] - 1 ];

        You use the range operator once. LanX (upthread for the curious) used it twice:

        return [
            map {
                arr_alias @$_[ $cols->[0] .. $cols->[1] ] # x-slice
            } @$data[ $rows->[0] .. $rows->[1] ]          # y-slice
        ];

        Are they logically equivalent? Thanks for your comment and raising a topic I haven't looked at in perl very far. (Scientific computing in my day was fortran.) Wouldn't it be relatively easy to display these values using Tk::TableMatrix?
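One way to check the equivalence is to build the subset both ways and write through one of them. This is only a sketch; the index lists @ris and @cis and the variable names $by_index and $by_rows are made up for the comparison.

```perl
use warnings;
use strict;

sub arr_alias { \@_ }    # array ref whose elements alias the arguments

my $data = [
    ['a','b','c','d'],
    ['e','f','g','h'],
    ['i','j','k','l'],
    ['m','n','o','p'] ];
my @ris = ( 1, 2 );      # 0-based row indices
my @cis = ( 1, 2 );      # 0-based column indices

# Form 1: map over the row indices, then slice the columns of each row.
my $by_index = [ map { arr_alias( @{ $$data[$_] }[@cis] ) } @ris ];

# Form 2: slice the rows first, then slice the columns of each row.
my $by_rows  = [ map { arr_alias( @$_[@cis] ) } @{$data}[@ris] ];

# A write through one view is visible through the other and in $data,
# because all three refer to the same underlying scalars.
$by_index->[0][1] = 'X';
print "$by_rows->[0][1] $data->[1][2]\n";
```

The only difference is where the row selection happens: as a range fed to map, or as an array slice fed to map.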

Re: Selecting Ranges of 2-Dimensional Data (updated)
by haukex (Archbishop) on Oct 26, 2018 at 18:20 UTC

      I think that part of testing code is trying to understand it and then giving it more cases. In order for me to see what is happening here, I had to add a routine to print the array as we are accustomed to seeing 2d arrays. I had to add a bunch of say statements to suss out the internals. I got my first look at is_deeply, where the 3rd argument $test_name must be optional:

      is_deeply( $got, $expected, $test_name );

      I'm also new with the syntax involving alias, so I printed out values to form a question there. I added a test to substitute in an 'M', intentionally causing the ultimate test to fail, to see if it would. I'll list output and then source between readmore tags:

      Q1) What is the purpose of using alias in this line

      alias my $r = $$range[$i];

      when it goes out of scope at the end of the loop?

      I always enjoy reading your posts, haukex, finding them challenging in the right way. I hope you don't resent me "embellishing" on your script.

        What is the purpose of using alias in this line alias my $r = $$range[$i];

        It was just to shorten code, and play around with Data::Alias a bit. Since LanX showed that the module isn't necessary (because the elements of @_ are already aliases), I removed it and posted an updated getsubset here.
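For what it's worth, a short alias like that is also available in core Perl, because a foreach loop variable aliases each element it iterates over. A sketch with made-up values for @max and $range, where a negative entry stands for "last" as in the updated getsubset:

```perl
use warnings;
use strict;

my @max   = ( 4, 4, 4, 4 );
my $range = [ 2, 2, -1, -1 ];      # a negative entry means "last row/column"

for my $i ( 0 .. $#$range ) {
    for my $r ( $$range[$i] ) {    # $r is an alias for $$range[$i]
        $r = $max[$i] if $r < 0;   # so this assigns to the array element
    }
}
```

The single-element inner loop is only there to create the alias; it runs exactly once per index.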

        I hope you don't resent me "embellishing" on your script.

        Not at all, do whatever you need to figure stuff out, that's why I post publicly :-)

Re: Selecting Ranges of 2-Dimensional Data
by akuk (Beadle) on Oct 26, 2018 at 16:55 UTC

    For multi-dimensional arrays, you need to take a look at the PDL libraries. I have used the Data::Frame modules, but they are for numbers. I believe for strings you need to check PDL::Char. Here is the link: http://pdl.perl.org/?docs=Char&title=PDL::Char

Re: Selecting Ranges of 2-Dimensional Data
by LanX (Saint) on Oct 26, 2018 at 15:51 UTC
    FWIW:

    it's possible to alias single array elements

    use warnings;
    use strict;
    use Data::Dump qw/pp/;

    my @a= ('a','b','c','d');
    *b = \$a[1];
    $b ="X";
    pp @a

    ("a", "X", "c", "d")

    But I wasn't yet able to bind the alias to another array element.

    But I seem to remember seeing it done in the past ...°

    Cheers Rolf

    update

    °) yep, see arr_alias() in Re: Selecting Ranges of 2-Dimensional Data (array of aliases)

      The following hack passes the tests in the root node:

      use Data::Alias;
      sub getsubset {
          my $d = shift;
          alias my @x = @{$$d[1]}[1,2];
          alias my @y = @{$$d[2]}[1,2];
          return [ \@x, \@y ];
      }
        yeah, but I meant without Data::Alias and just with a standard *glob mechanism.

        Cheers Rolf