Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Perl Monks,

I have 7 different arrays. The first being an unique identifier and rest being its values. Example;

@ID = qw(1 1 2 3 4 4 6 6); @Value_1 = qw(a a a a a a a a ); @Value_2 = qw(b b b b b b b b ); @Value_3 = qw(c d c c c d c d);

If one of the IDs have more than one value in one of the arrays it appears more than one time in each section. Since 1 hac c and d value in @Value_3 it appears two times with the same value in ID Value_1 and Value_2. What is to get rid of redundancies and if there is a different value for single ID then merge it like "c / d".

What I tried is to go through all the IDs and if ID$n eq ID$n - 1 then check if each Value1_$n is ne Value_1$n - 1 have a separate variable to merge it delete $n - 1 and append the new one. The problem is the code is getting really long with 7+ columns. Is there a shorter way to do it? I realize my explanation might be vague.. Thanks for your help!

if (($Entry > 0) && ($ID[$Entry] eq $ID[$Entry - 1])) { if ($Value_1[$Entry] =~ m/($Value_1[$Entry - 1])/) { $Placeholder = join(' / '$Value_1[$Entry], $Value_1[$Entry - 1]); delete $Value[$Entry - 1]; push(@Value, $Placeholder); etc... }

Replies are listed 'Best First'.
Re: Arrays merges and redundancies
by kennethk (Abbot) on Mar 30, 2012 at 18:34 UTC
    The most natural way to generalize the code IMHO is to change to a loop over array references rather than treating each array individually - see perlreftut. You could put this all in a complex data structure (perldsc) rather than how you have it loaded now, but you could generalize what you have as something like:

    if (($Entry > 0) && ($ID[$Entry] eq $ID[$Entry - 1])) { for my $ref (\@Value_1, \@Value_2, \@Value_3, \@Value_4, \@Value_5, \@Value_6, \@Value_7, ) { if ($ref->[$Entry] =~ m/($ref->[$Entry - 1])/) { $Placeholder = join(' / ', $ref->[$Entry], $ref->[$Entry - 1] +); delete $ref->[$Entry - 1]; push(@Value, $Placeholder); } } }

    Of course, the above is buggy since your posted code was already buggy -- you've got index problems on your counter as soon as you start deleting elements, and that push isn't helping. Also note the added comma in your join. But the loop should answer your question.

    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Re: Arrays merges and redundancies
by Marshall (Canon) on Mar 30, 2012 at 20:19 UTC
    Here is one way...using use List::MoreUtils qw(each_array uniq);...Uses a hash of array (HoA) to gather up the values associated with the ID, then that hash is accessed in the same order as the ID array and values are compressed, re-formatted.

    Update: changed code so that it works with non-numeric ID's
    instead of printing, you can push a reference to \@new onto a new @Compressed array...
    each_array() makes an iterator that pulls pairs of numbers walking left to right from the ID array and the Value array row that we are working on. uniq() removes duplicates (unique values only - order is preserved). see List::MoreUtils

    #!/usr/bin/perl -w use strict; use List::MoreUtils qw(each_array uniq); use Data::Dumper; my @ID = qw(1 1 2 3 4 4 6 6); my @Values = ([qw(a a a a a a a a )], [qw(b b b b b b b b )], [qw(c d c c c d c d )]); print join(" ",uniq @ID), "\n"; #compressed ID's foreach my $row_ref (@Values) { # make hash to gather up values for each ID # eg: 1 => [c, d] my %id2values; my $ea = each_array(@ID, @$row_ref); while ( my ($id, $value) = $ea->() ) { push @{$id2values{$id}}, $value; } # compress out the dupe values then make a "c/d" string # if there is more than one value my @new = map{join ("/", uniq @{$id2values{$_}})} uniq @ID; print "@new\n"; #compressed values } __END__ 1 2 3 4 6 a a a a a b b b b b c/d c c c/d c/d
    Update Also tested with the other test case, the above code produces the correct result of:
    Apple Grape Banana 5 2 3/4 10/15 3 4 for: my @ID = qw(Apple Apple Grape Banana Banana); my @Values = ([qw(5 5 2 3 4 )], [qw(10 15 3 4 4 )], );

      I tried.

      use strict; use warnings; use Data::Dumper; my @IDs = qw(Apple Apple Grape Banana Banana); my @Price = qw(5 5 2 3 4 ); my @Amount = qw(10 15 3 4 4 ); sub uniqjoin { my %seen; my $sep=shift; return join($sep, grep { !$seen{$_}++ } sort @_); } my ($idx, $id, %h); #to hash of array while( ($idx,$id)= each(@IDs) ){ push @{$h{$id}->{price}} , $Price[$idx]; push @{$h{$id}->{amount}}, $Amount[$idx]; } #concatenate arrays with '/' foreach my $id (keys %h){ $h{$id}->{price} = uniqjoin( '/', @{$h{$id}->{price}} ); $h{$id}->{amount} = uniqjoin( '/', @{$h{$id}->{amount}} ); } #print print join(' ', sort keys(%h)) . "\n"; print join(' ' , map{ $h{$_}->{price} } sort keys(%h) ) . "\n"; print join(' ' , map{ $h{$_}->{amount} } sort keys(%h) ) . "\n";
      Sometimes I see List::Util and List::MoreUtil at Monk. I should see them.

        You were close.. I fixed your code? ...Don't sort the IDs - if you want to preserve the order of the data! Also, its a fluke that the data appeared in the right order due to sort in uniqjoin().

        List::Util is a core module (no installation required) - you will have to install List::MoreUtil if you want to use it. Besides being very handy functions, they are fast because they are implemented in C.

        use strict; use warnings; use Data::Dumper; my @IDs = qw(Apple Apple Grape Banana Banana); my @Price = qw(5 5 2 3 4 ); my @Amount = qw(10 15 3 4 4 ); sub uniqjoin { my $sep=shift; my %seen; return join($sep, grep { !$seen{$_}++ } @_); #NO SORT } my %h; #to hash of array my $idx =0; foreach my $id (@IDs) { push @{$h{$id}->{price}} , $Price[$idx]; push @{$h{$id}->{amount}}, $Amount[$idx]; $idx++; } #concatenate arrays with '/' foreach my $id (keys %h){ $h{$id}->{price} = uniqjoin( '/', @{$h{$id}->{price} } ); $h{$id}->{amount} = uniqjoin( '/', @{$h{$id}->{amount}} ); } my %seen; my @uniqIDs = grep{!$seen{$_}++}@IDs; #print print join(' ', @uniqIDs) . "\n"; print join(' ' , map{ $h{$_}->{price} } @uniqIDs ) . "\n"; print join(' ' , map{ $h{$_}->{amount} } @uniqIDs ) . "\n"; __END__ Apple Grape Banana 5 2 3/4 10/15 3 4
        update: notice that keys %h does not come out in any particular order - make @uniqIDs and use that array to enforce the ordering of the data to be like in the original when you print it out - and this will wind up being faster than doing multiple "sort keys %h" anyway

        There is some redundancy in the code, you could make your own uniq() function like the one in List:MoreUtil and use in join() - but why bother? Use the module and get the advantage of well debugged, fast code (should run faster than a pure Perl implementation). Of course there are some scalability/re-usability issues too - due to hard coding of price and value - but if this meets your needs - go for it!

        If you use a Perl 5.12 feature like the each() function for arrays, I would put a "use 5.012;" statement in the code. Many people like me are still at 5.10.1.

Re: Arrays merges and redundancies
by Riales (Hermit) on Mar 30, 2012 at 18:35 UTC

    I'm a little confused by your explanation, but I think you're using arrays when you probably want hashes? Maybe something like...

    my %hash = ( 1 => [qw/a a a a a a a a/], 2 => [qw/b b b b b b b b/], 3 => [qw/c d c c c d c d/], );

    But I'm still not very clear on what you're trying to do...

      From the following studpi example;

      @IDs = qw(Apple Apple Grape Banana Banana) @Price = qw(5 5 2 3 4 ) @Amount = qw(10 15 3 4 4 ) I want; @IDs = qw(Apple Grape Banana) @Price = qw(5 2 3/4) @Amount = qw(10/15 3 4)

      Ithought about using hashes but for that I need unique key and IDs are not unique. Sorry for vague explanations and thanks for help.