First, the definition: It occurs to me that people refer to lots of things as "pivot tables" including the ability to drag certain rows to column headers for filtering and all that datawarehousing stuff.

This isn't that.

I have an unsettlingly frequent need to take "few column, many row" files and query outputs and line them up next to each other. I can't use sql because they are frequently bridges between databases.

and our perl environments are stuck with core modules, and we have a distressing mix of unixen and windows environments, so things like sqlite are out.

What I have here is some code I've banged together that will take a 2-dimensional list (of the "list of list references" variety) and produce a similarly structured list where columns and rows switch places. That is to say this:

1,a 2,b
will become this:
1,2 a,b
On it's own, I don't think that would really even bare mention in CB. But what I did add was padding and handling for irregular field counts and blank rows. Meaning that this:
1,2,3 4,5
(note the leading blank line) will be converted to:
,1,,4 ,2,,5 ,3,,
Note that it's not remarkably efficient code. It probably couldn't even be considered particularly 'perlish'. (I find I write remarkably C-ish perl.)

I'm definitely up for suggestions (and recommendations on where to put usage examples, test cases, etc. This is getting a bit long.) The next version (still in frustration) uses external buffering for an intermediate stage and doesn't do the whole schmeggegy in ram.

To use, just call the pivot function with a list of listrefs.

sub pivot { my (@srcgrid) = @_; my $max_row = $#srcgrid; my $max_col = get_max_cols(@srcgrid); my @pivoted; for my $col (0..$max_col) { my @new_row; for my $row (0..$max_row) { my $value; if (defined $srcgrid[$row][$col]) { $value = $srcgrid[$row][$col]; } else { $value = ''; } push @new_row,$value; } push @pivoted,\@new_row; } return @pivoted; } sub get_max_cols { my (@grid) = @_; my $max= 0; for my $rowref (@grid) { my @row = @$rowref; if ($max < $#row) { $max = $#row; } } return $max; }
It's not earth shattering code, but I save a lot of time and get a lot of use out of it.

EDIT: Ok, it's not 'pivoting' it's "transposing" (title changed accordingly. Thanks to tye and ambrus for the clarification.

Me

Replies are listed 'Best First'.
Re: Pivoting 2 dimensional array refs
by wind (Priest) on May 13, 2011 at 01:13 UTC

    Here's my version that's a little more perlish as you'd say.:

    use Data::Dumper; use strict; use warnings; sub pivot { my @src = @_; my $max_col = 0; $max_col < $#$_ and $max_col = $#$_ for @src; my @dest; for my $col (0..$max_col) { my @new_row; for my $row (0..$#src) { push @new_row, $src[$row][$col] // ''; } push @dest, \@new_row; } return @dest; } my @test = ( [1..5], [11..15], [21..25], [31..35], [41], [51..55], ); my @results = pivot(@test); print Dumper(\@results); =prints $VAR1 = [ [1,11,21,31,41,51], [2,12,22,32,'',52], [3,13,23,33,'',53], [4,14,24,34,'',54], [5,15,25,35,'',55] ]; =cut

    Update: And a little more perlish still would be to use map:

    sub pivot { my @src = @_; my $max_col = 0; $max_col < $#$_ and $max_col = $#$_ for @src; my @dest; for my $col (0..$max_col) { push @dest, [map {$src[$_][$col] // ''} (0..$#src)]; } return @dest; }

    This last is actually my preferred way of doing this, since it's easy to see what the inner loop is iterating on.

    - Miller
      Nice. It's fundimentally the same, but you get a lot of collapse of excess code by folding up the if/else decision when computing $value.

      I would normally, in any language, spot that construct:

      if (something) value = xxxxx else value = yyyy
      as something to be folded up into one expression. Perhaps a terinary operator: value = (something) ? xxxxx : yyyy. But the "defined or" really shines here, reeling in the rest. That is precisely the idiom that the OP needs in the problem: grab the value at the source coordinates, but make it '' instead of undef.

      I see that the computation of max_col was made into one line by using the suffix form of 'for'. That shrunk the vertical expanse of code down quite a bit.

        I once thought as you do ;).

        Unless there is a performance gain that is:

        1. Truly significant
        2. Actually needed
        I stay away from ternary assignment and other forms of code collapse in favor of more verbose easy to read code. I'd far rather opt for clear suboptimal code than optimized idioms that don't really give me much.

        Now if there's actual redundancy I'm more inclined to squnch it up.

        But I suspect this largely talks past your point of the 'defined or' which, as I'm packing, I haven't yet had time to cogitate over.

        o/

        Me
      Your results are different from the results of the original sub:
      $VAR1 = [ [1,11,21,31,41,51], [2,12,22,32,'',52], [3,13,23,33,'',53], [4,14,24,34,'',54], [5,15,25,35,'',55] ];
      Update: They are not, you just did not update the pod :)
      Hey nice. Thanks for that. I'll have to sit down with a cigar and go over that a bit. Your assignment of max_col is somewhat dizzying. Would this handle an empty subordinate array? (eh, nm. I'll check it out.)
      Me
Re: Pivoting 2 dimensional array refs
by ambrus (Abbot) on May 13, 2011 at 07:13 UTC

    For searchability, this code transposes an array of arrays.

      Yeah tye brought that up yesterday as well. If only I'd had the linear algebra background to have known that earlier, I may have come across the Algorithm::Loops route to getting this done.

      o/

      Me
        Linear Algebra is well worth the trouble to read about.

        When I was a teen, I bought a fancy new calculator, an HP 15C, that featured matrix operations. I borrowed a book on Linear Algebra in order to learn how to use it effectively. (If I knew how much they would sell for as antiques, I would not have thrown mine out!)

        I recall calculating a "cash discount" table at a gas station (I worked as a cashier part time) using a single operation to populate the matrix giving the table. It's basically the kind of stuff you do in APL.

        Later when I took L.A. in college I had an easy time. It's really the simplest "advanced" math there is. It's more like programming.