punkish has asked for the wisdom of the Perl Monks concerning the following question:

# I have the following two arrays
@foo = (
    # date_time -----#  #foo#
    ["20031001 000000", 1.4],
    ["20031001 001500", 1.5],
    ["20031001 003000", 1.6],
    ["20031001 004500", 1.5],
    ["20031001 010000", 1.4],
);

@bar = (
    # date_time -----#  #bar#
    ["20031001 000000", 0.001],
    ["20031001 000005", 0.004],
    ["20031001 000015", 0.005],
    ["20031001 001000", 0.008],
    ["20031001 001005", 0.007],
    ["20031001 001500", 0.007],
    ["20031001 001515", 0.007],
);

# and I want the following array
# @baz is a mashup of the above two
# @baz is sorted on the date_time stamps (the first col), and
# 'undef' is used for any missing value in corresponding arrays
@baz = (
    # date_time -----#  #foo#, #bar#
    ["20031001 000000", 1.4,   0.001],
    ["20031001 000005", undef, 0.004],
    ["20031001 000015", undef, 0.005],
    ["20031001 001000", undef, 0.008],
    ["20031001 001005", undef, 0.007],
    ["20031001 001500", 1.5,   0.007],
    ["20031001 001515", undef, 0.007],
    ["20031001 003000", 1.6,   undef],
    ["20031001 004500", 1.5,   undef],
    ["20031001 010000", 1.4,   undef],
);

In the real world, both foo and bar are rather large (a few thousand elements each), and either one could be bigger than the other.

I have tried to create a hash with date_time as keys to avoid duplicate time stamps, checking for existence of keys, and suitably mashing them together, then sorting the hash on its keys to write out an array. But my code is messy and bad, and slow, and I am fatigued, and bad, and slow now. Any pointers would be appreciated.

--

when small people start casting long shadows, it is time to go to bed

Re: mashing two arrays
by jettero (Monsignor) on Jan 19, 2007 at 17:41 UTC
    Even though your hash attempts have failed, I still think it's probably the way to go. It might look something like the following. There may be better ways to do it, but this seems simplest to me.
    my %h = ();
    # originally posted as [$->[1]], a typo for [$_->[1]]
    $h{$_->[0]} = [$_->[1]] for @a1;
    # this will make empty arrayrefs automatically where necessary
    $h{$_->[0]}[1] = $_->[1] for @a2;
    my @b = map {[ $_ => @{$h{$_}} ]} sort keys %h;

    (sort is probably ok to preserve the order here, but if it weren't, then Tie::IxHash might be helpful instead.)

    -Paul

      Yes, thank you. After fixing a typo above and making one modification, shown below, it works (I had to add the undef in the first assignment; otherwise it doesn't show up wherever the second array's elements don't exist).

      my %h = ();
      $h{$_->[0]} = [$_->[1], undef,] for @a1;
      $h{$_->[0]}[1] = $_->[1] for @a2;
      my @arr = map {[$_, @{$h{$_}}]} sort keys %h;

        Alternatives:

        my %h;
        $h{$_->[0]}[0] = $_->[1] for @a1;
        $h{$_->[0]}[1] = $_->[1] for @a2;
        my @arr = map [ $_, $h{$_}[0], $h{$_}[1] ], sort keys %h;

        # Nevermind, this is broken and not worth fixing.
        my %h1 = map @$_, @a1;
        my %h2 = map @$_, @a2;
        my @arr = map [ $_, $h1{$_}, $h2{$_} ], sort keys %h1, keys %h2;

        If you were dealing in hashes instead of arrays:

        my %hash = map { $_ => [ $h1{$_}, $h2{$_} ] } keys %h1, keys %h2;
Re: mashing two arrays
by ferreira (Chaplain) on Jan 19, 2007 at 18:05 UTC

    An alternative to jettero's solution is

    my %baz;
    $baz{$_->[0]}{foo} = $_->[1] for @foo;
    $baz{$_->[0]}{bar} = $_->[1] for @bar;

    my @baz;
    for my $k (sort keys %baz) {
        push @baz, [ $k, @{$baz{$k}}{qw(foo bar)} ];
    }

    use Data::Dump 'dump';
    print dump(\@baz);
    which is very easy to generalize to more than two arrays. The idea is
    1. build a hash whose keys are the timestamps,
    2. each value is itself a hash ref,
    3. store the value from each array under its own key (like 'foo' or 'bar') in these nested hashes,
    4. and then build the result array with the key and a hash slice.
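The generalization to more than two arrays might look roughly like the sketch below. It is a positional variation on the idea above (array slots instead of named hash keys), not ferreira's code; the helper name merge_by_key and the third series @qux are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: merge any number of [key, value] row lists into rows
# of [key, value_from_list_1, value_from_list_2, ...], sorted by key.
sub merge_by_key {
    my @lists = @_;    # each element is an array ref of [key, value] rows
    my %seen;          # key => [ one slot per list, undef where missing ]
    for my $i (0 .. $#lists) {
        for my $row (@{ $lists[$i] }) {
            $seen{ $row->[0] } ||= [ (undef) x @lists ];
            $seen{ $row->[0] }[$i] = $row->[1];
        }
    }
    return map { [ $_, @{ $seen{$_} } ] } sort keys %seen;
}

my @foo = ( ["20031001 000000", 1.4],   ["20031001 001500", 1.5] );
my @bar = ( ["20031001 000000", 0.001], ["20031001 000005", 0.004] );
my @qux = ( ["20031001 000005", 7] );    # made-up third series

my @merged = merge_by_key(\@foo, \@bar, \@qux);
```

Pre-filling each row with undef keeps every output row the same length, which is the same concern punkish raised about the two-array version.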
Re: mashing two arrays
by marcpestana (Initiate) on Jan 19, 2007 at 18:10 UTC
    Hello, How about...
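    # declare anonymous hash, keyed on date_time
    my $hash = {};
    # foo goes in slot 0; slot 1 stays undef until a bar value arrives
    $hash->{$_->[0]} = [$_->[1], undef] for @foo;
    for my $key (@bar) {
        # if/else rather than "cond ? a = b : c = d", which Perl
        # parses as "(cond ? a = b : c) = d"
        if ($hash->{$key->[0]}) {
            $hash->{$key->[0]}[1] = $key->[1];
        }
        else {
            $hash->{$key->[0]} = [undef, $key->[1]];
        }
    }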
Re: mashing two arrays
by thospel (Hermit) on Jan 20, 2007 at 22:17 UTC
    If the arrays are already sorted, it might be a pity to lose that information and have to re-sort at the end though. You might consider doing a standard merge instead:
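    my @baz;
    # index into @foo and @bar (named to avoid $b, which sort reserves)
    my ($fi, $bi) = (0, 0);
    while ($fi < @foo && $bi < @bar) {
        my $cmp = $foo[$fi][0] cmp $bar[$bi][0];
        push @baz, [
            # date_time and foo value, or date_time and undef if only bar has it
            $cmp <= 0 ? @{$foo[$fi++]} : ($bar[$bi][0], undef),
            # bar value, or undef if only foo has this date_time
            $cmp >= 0 ? $bar[$bi++][1] : undef,
        ];
    }
    # whatever is left over in either array
    push @baz, [@{$foo[$_]}, undef] for $fi..$#foo;
    push @baz, [$bar[$_][0], undef, $bar[$_][1]] for $bi..$#bar;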
    You can do some time measurements with your real data to see if this gains you anything or not.
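One way to take such measurements is the core Benchmark module. The sketch below is only an illustration: the data is synthetic (zero-padded numeric keys standing in for sorted date_time stamps), the sub names via_hash and via_merge are invented here, and the iteration count of 100 is arbitrary. Substitute the real arrays before drawing any conclusions.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic, already-sorted data (an assumption; use the real arrays instead).
my @foo = map { [ sprintf("%08d", $_ * 3), $_ ] } 0 .. 1999;
my @bar = map { [ sprintf("%08d", $_ * 2), $_ / 1000 ] } 0 .. 2999;

# Hash-based approach: collect by key, then sort the keys.
sub via_hash {
    my %h;
    $h{ $_->[0] }[0] = $_->[1] for @foo;
    $h{ $_->[0] }[1] = $_->[1] for @bar;
    return [ map { [ $_, $h{$_}[0], $h{$_}[1] ] } sort keys %h ];
}

# Merge-based approach: walk both sorted arrays in step, no final sort.
sub via_merge {
    my ( $i, $j, @out ) = ( 0, 0 );
    while ( $i < @foo && $j < @bar ) {
        my $cmp = $foo[$i][0] cmp $bar[$j][0];
        if    ( $cmp < 0 ) { push @out, [ @{ $foo[ $i++ ] }, undef ] }
        elsif ( $cmp > 0 ) { push @out, [ $bar[$j][0], undef, $bar[ $j++ ][1] ] }
        else               { push @out, [ @{ $foo[ $i++ ] }, $bar[ $j++ ][1] ] }
    }
    push @out, [ @{ $foo[$_] }, undef ] for $i .. $#foo;
    push @out, [ $bar[$_][0], undef, $bar[$_][1] ] for $j .. $#bar;
    return \@out;
}

# Run each approach 100 times and print a comparison table.
cmpthese( 100, { hash => \&via_hash, merge => \&via_merge } );
```

The table cmpthese prints shows the rate of each approach and their relative difference; the numbers depend on the machine and on how the real data is distributed.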
Re: mashing two arrays
by Anonymous Monk on Jan 24, 2007 at 15:17 UTC
    This assumes that both arrays are sorted.

    Make an iterator that gets initialized with both arrays and keeps an index for each of them.

    On each call, the iterator would return the next date_time together with, for each of @foo and @bar, either the value at that timestamp or undef. The index into an original array is incremented only if its value is taken.

    If you really need an array @baz you could fill it with the iterator.

    This solution would work with any number of arrays (but would not make sense with only one).

    You may find details, code snippets, etc. in "Higher Order Perl" reviewed on this site.
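A minimal sketch of such an iterator, written as a closure, could look like this. It is one possible reading of the description above, not code from the book; the name make_merge_iterator is hypothetical, and it assumes every input is an array ref of [date_time, value] rows already sorted by date_time.

```perl
use strict;
use warnings;

# Hypothetical constructor: returns a closure that yields one merged row
# per call, or nothing once all inputs are exhausted.
sub make_merge_iterator {
    my @lists = @_;
    my @pos   = (0) x @lists;    # one index per input array
    return sub {
        # Find the smallest key among the current heads of all lists.
        my $min;
        for my $i (0 .. $#lists) {
            next if $pos[$i] > $#{ $lists[$i] };
            my $key = $lists[$i][ $pos[$i] ][0];
            $min = $key if !defined $min || $key lt $min;
        }
        return unless defined $min;    # every list is exhausted

        # Take a value from each list whose head carries that key;
        # only those lists have their index advanced.
        my @row = ($min);
        for my $i (0 .. $#lists) {
            my $head = $pos[$i] <= $#{ $lists[$i] } ? $lists[$i][ $pos[$i] ] : undef;
            if ( $head && $head->[0] eq $min ) {
                push @row, $head->[1];
                $pos[$i]++;
            }
            else {
                push @row, undef;
            }
        }
        return \@row;
    };
}

my @foo = ( ["20031001 000000", 1.4],   ["20031001 001500", 1.5] );
my @bar = ( ["20031001 000000", 0.001], ["20031001 000005", 0.004] );

# Fill @baz from the iterator, as suggested above.
my $next = make_merge_iterator( \@foo, \@bar );
my @baz;
while ( my $row = $next->() ) {
    push @baz, $row;
}
```

Scanning every list head on each call is fine for two or three arrays; with many inputs a heap of heads would likely be a better fit.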