punkish has asked for the wisdom of the Perl Monks concerning the following question:

How do I convert

my $foo = [ [1, 2, 3, ...], ['a', 'b', 'c', ...], ['foo', 'bar', 'baz', ...], ];

into

my $bar = [ [1, 'a', 'foo', ...], [2, 'b', 'bar', ...], [3, 'c', 'baz', ...], ];

The component arrays in the input array 'foo' can be of different lengths, so component arrays in the output array 'bar' should have a zero-length string '' for missing values.

Here is what I came up with...

my $bar = rearrange($foo);

sub rearrange {
    my ($in) = @_;
    my @out;
    # One output row per column index of the longest input array.
    for my $col ( 0 .. ( max($in) - 1 ) ) {
        my @row;
        foreach my $arr (@$in) {
            push @row, defined( $arr->[$col] ) ? $arr->[$col] : '';
        }
        push @out, \@row;
    }
    return \@out;
}

# Returns the length of the longest component array.
sub max {
    my ($arr) = @_;
    my $max = 0;
    foreach my $array (@$arr) {
        $max = scalar @$array if ( scalar @$array > $max );
    }
    return $max;
}

Oh, and component arrays in $foo could be pretty large... million elements or so each, and 30 or more such arrays.

Note: Original problem -- I have 30 or so text files with a million or so rows in each. Each text file has values destined for a column in a table in a db. So, a table with 30 or so columns and a million or so rows. The rows in the text files are ordered... they go in the same order in the table. In other words, row 'n' in each text file is a column value for the nth row in the table. How do I populate that table?

So, I thought I would build an array of arrays (AoA) representing the table, and then INSERT the AoA into the table. Better suggestions are welcome, but I am also academically curious about the problem of rearranging an array.

--

when small people start casting long shadows, it is time to go to bed

Re: rearranging an array of arrays
by tilly (Archbishop) on Mar 10, 2009 at 16:56 UTC
    There is a major problem here. If you have 30 million items which take, say, 40 bytes of memory each (an empty string takes 28 before you put anything in it) you're talking 1.2 GB of RAM. If you have 2 million rows, you could find yourself going past the 2 GB addressing limit for 32-bit code. I would therefore strongly suggest not putting it into RAM. Instead insert the data as you read with something like this:
    # Assume that %file has the filename for each field in your
    # table and @fields has the list of field names.
    my @fhs;
    for my $field (@fields) {
        open(my $fh, "<", $file{$field})
            or die "Can't open '$file{$field}': $!";
        push @fhs, $fh;
    }

    while (1) {
        my $did_read;
        my @data;
        for my $fh (@fhs) {
            my $rec = <$fh>;
            if (defined($rec)) {
                chomp $rec;    # keep the trailing newline out of the column value
                $did_read++;
                push @data, $rec;
            }
            else {
                push @data, "";
            }
        }
        if ($did_read) {
            insert_data(@data);
        }
        else {
            # Came to the end of all data streams
            last;
        }
    }
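    The code above leaves insert_data undefined. One possible shape for it, as a sketch only: a closure around a prepared DBI statement that commits every N rows. The helper name make_inserter, the batch size, and the table and column names are all hypothetical, and the sketch assumes $dbh is a DBI handle connected with AutoCommit => 0 and RaiseError => 1, and that the table and column names are trusted (they are interpolated into the SQL).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of DBI: builds an insert_data closure
# plus a finish closure around a single prepared INSERT statement.
# Assumes $dbh was connected with AutoCommit => 0 and RaiseError => 1.
sub make_inserter {
    my ( $dbh, $table, $fields, $batch_size ) = @_;
    $batch_size ||= 10_000;    # assumed default; tune for your database

    # One placeholder per column, e.g. "VALUES (?, ?, ?)".
    my $sql = sprintf 'INSERT INTO %s (%s) VALUES (%s)',
        $table,
        join( ', ', @$fields ),
        join( ', ', ('?') x @$fields );

    my $sth     = $dbh->prepare($sql);
    my $pending = 0;

    # Insert one row; commit once $batch_size rows are pending.
    my $insert_data = sub {
        $sth->execute(@_);
        if ( ++$pending >= $batch_size ) {
            $dbh->commit;
            $pending = 0;
        }
    };

    # Commit whatever is left after the last row.
    my $finish = sub {
        $dbh->commit if $pending;
        $pending = 0;
    };

    return ( $insert_data, $finish );
}
```

    With this, the read loop would call $insert_data->(@data) for each row and $finish->() once after the loop. Preparing the statement once and committing in batches usually matters far more for a million-row load than how the rows are assembled.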
Re: rearranging an array of arrays
by Fletch (Bishop) on Mar 10, 2009 at 16:12 UTC

    Inspired by the approach in Two handy tools for nested arrays: mapat and transpose and vague residual memories from HOP, I'd set up a list of iterators, one for each source (said iterator returns '' when exhausted to handle "short" source rows); then just pull off the head from each to get your next new transposed row. That should be able to scale well (if your iterators are sufficiently lazy).
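    A rough sketch of that iterator idea, using in-memory arrays as the sources. The helper names array_iter and transposed_rows are made up for illustration. One deliberate difference from the description above: here each iterator returns an empty list when exhausted, and the combining function fills in '' itself, so that it can tell when every source has run dry.

```perl
use strict;
use warnings;

# Hypothetical helper: iterator over an array reference.
# Returns a one-element list while data remains, an empty list afterwards.
sub array_iter {
    my ($aref) = @_;
    my $i = 0;
    return sub { $i < @$aref ? ( $aref->[ $i++ ] ) : () };
}

# Hypothetical helper: combines iterators into an iterator over
# transposed rows. Exhausted sources contribute ''. Returns undef
# once every source is exhausted.
sub transposed_rows {
    my @iters = @_;
    return sub {
        my $live = 0;
        my @row;
        for my $it (@iters) {
            my @got = $it->();
            $live++ if @got;
            push @row, @got ? $got[0] : '';
        }
        return $live ? \@row : undef;
    };
}

my @sources = ( [ 1, 2, 3 ], [ 'a', 'b' ], ['foo'] );
my $next    = transposed_rows( map { array_iter($_) } @sources );

my @rows;
while ( my $row = $next->() ) {
    push @rows, $row;
}
# @rows is now ( [1, 'a', 'foo'], [2, 'b', ''], [3, '', ''] )
```

    Because only one row exists at a time, array_iter could be swapped for an iterator that reads a line from a filehandle, and memory use would stay flat regardless of how many rows the sources hold.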

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.

Re: rearranging an array of arrays
by ikegami (Patriarch) on Mar 10, 2009 at 16:28 UTC

    If you're ok with undef instead of the empty string for missing elements, you could use Algorithm::Loops's MapCarU.

    use Algorithm::Loops qw( MapCarU );

    my $bar = [ MapCarU { [ @_ ] } @$foo ];

    The block could be changed to convert undefined values into empty strings, but it wouldn't be able to distinguish between undefined values in $foo and missing elements in $foo.
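    For comparison, the distinction between an undefined value and a missing element can be kept in plain Perl by testing the column index against each array's length instead of calling defined(). This is only a sketch; transpose_padded is a hypothetical name, and List::Util ships with Perl.

```perl
use strict;
use warnings;
use List::Util qw( max );

# Hypothetical helper: transpose an array of arrays, padding only
# positions past the end of a short array with ''. A genuine undef
# stored in the input is passed through untouched.
sub transpose_padded {
    my ($in) = @_;
    my $longest = max( 0, map { scalar @$_ } @$in );
    my @out;
    for my $col ( 0 .. $longest - 1 ) {
        push @out, [ map { $col < @$_ ? $_->[$col] : '' } @$in ];
    }
    return \@out;
}

# The undef in the first array survives; the gaps in the second become ''.
my $padded = transpose_padded( [ [ 1, undef, 3 ], ['a'] ] );
# $padded is [ [1, 'a'], [undef, ''], [3, ''] ]
```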

Re: rearranging an array of arrays
by kyle (Abbot) on Mar 10, 2009 at 21:33 UTC

    Aside from this being a huge data set, as tilly has noted, I might do it this way.

    use List::Util qw( max );
    use List::MoreUtils qw( mesh );

    # Find the length of the longest list.
    my $longest = max map { scalar @{ $_ } } @{ $foo };

    # Pad every array to equal length using empty strings.
    # (This modifies the arrays in $foo in place.)
    push @{ $_ }, ( q{} ) x ( $longest - scalar @{ $_ } ) for @{ $foo };

    # Since 'mesh' takes a list of arrays, using a prototype,
    # I use '&mesh' to avoid the prototype.
    my @meshed = &mesh( @{ $foo } );

    # Alternate way to do the same thing
    #my @meshed = eval "mesh " . join q{,},
    #    map { "\@{ \$foo->[ $_ ] }" } 0 .. $#{ $foo };

    # Pull out the individual rows. Each row holds one element per
    # input array, so its width is the number of arrays, not $longest.
    my $width = scalar @{ $foo };
    my $bar   = [];
    push @{ $bar }, [ splice @meshed, 0, $width ] while @meshed;

    This preserves any undef in the original input.

    It does more looping than it really needs to, but most of those loops are over the fairly small set of input arrays.