janDD has asked for the wisdom of the Perl Monks concerning the following question:

Dear Community,
I have to align several arrays into a 2D matrix. Let's say, I have the two vectors :
['A','B','C'], ['A','B','D']
The output should look like:
A A B B C D
In order to achieve that "simple" version, I wrote the script which I attach below. It takes the first array and aligns the second and takes the second and aligns the first and outputs the version with the minimum amount of holes.
If, however, there are the following three vectors:
['A','B','C'], ['A','D','C'], ['A','B','C']
The script outputs (6 holes):
A A A B B C C D C
An "intelligent" person would find instead: (3holes)
A A A B B D C C C
I wonder if there is a way to achieve that. Actually, I'd need a code that "looks forward" which rows come and align either the right or the left side ... mhm.
Thanks for ideas/solutions,
Jan
script:
#!/usr/bin/perl -w use strict; #use Data::Dumper; my $hash; #$hash->{'Vector1'} = ['A','B','D','H','J']; #$hash->{'Vector2'} = ['A','B','C','D']; #$hash->{'Vector4'} = ['H','K']; $hash->{'Vector3'} = ['A','B','C']; $hash->{'Vector4'} = ['A','D','C']; $hash->{'Vector5'} = ['A','B','C']; #print Dumper($hash); my $count_total = scalar (keys %{$hash}); my @keys = keys %{$hash}; my $aligned; my $minimum; my $key_count=0; for (0 .. $#keys) { my $test = shift(@keys); push(@keys,$test); # # push first into results matrix # to start with # my @result; foreach (@{$hash->{$keys[0]}}) { push(@result,[$_]); } # # compare # my $count_processed=0; my @vector; foreach (keys %{$hash}) { if (($count_processed+2)>$count_total) { last; } @vector = @{$hash->{$keys[$count_processed+1]}}; my $i=0; while ($i <= $#vector) { my $entry = $vector[$i]; if (defined(${$result[$i]}[0])) { # print "1)Is "; # print $entry; # print " in "; # print join(',',@{$result[$i]}); # print " ?"; if (grep $_ eq $entry,@{$result[$i]}) { # print " => yes\n"; push(@{$result[$i]},$entry); } else { # print " => no\n"; splice(@vector,$i,0,''); my @space; for (0 .. ($count_processed-$#{$result[$i]})) { push(@space,''); } push(@{$result[$i]},@space); } ; } else { # print "undefined => shift\n"; my @space; for (0 .. $count_processed) { push(@space,''); } push(@{$result[$i]},@space,$entry); } $i++; } # # the latest column might have missing ''s at the end. This needs +to be # filled up # my $p=0; foreach (@result) { if ($#{$result[$p]} < $#{$result[0]} ) { push(@{$result[$p]},''); } $p++; } $count_processed++; } # # count the number of holes # my $n=0; foreach (@result) { foreach (@$_) { # print "$_ \t"; if ($_ eq '') { $n++; } } } # print "\n"; if($key_count == 0){ $minimum = $n; $aligned = { 'matrix' => [@result], 'n' => $n }; }else{ if($n < $minimum){ $minimum = $n; $aligned = { 'matrix' => [@result], 'n' => $n }; } } $key_count++; } print "minimal configuration ($aligned->{'n'} holes):\n"; foreach (@{$aligned->{'matrix'}}) { foreach (@$_) { print "$_ \t"; } print "\n"; }

Replies are listed 'Best First'.
Re: Converting Arrays into Matrix
by ikegami (Patriarch) on Apr 26, 2011 at 07:39 UTC

    I don't know what algorithms there are for doing N-way diffs, but this came up with something sane for your example:

    use strict; use warnings; use Algorithm::Diff qw( ); my @seqs = ( [qw( A B C )], [qw( A D C )], [qw( A B C )], ); my @combined; my @grid; for my $col_idx (0..$#seqs) { my $seq = $seqs[$col_idx]; my $diff = Algorithm::Diff->new(\@combined, $seq); my @new_combined; my @new_grid; while ($diff->Next()) { if ($diff->Same()) { for ($diff->Range(1)) { push @new_combined, $combined[$_]; push @new_grid, $grid[$_]; $new_grid[-1][$col_idx] = 1; } } else { for ($diff->Range(1)) { push @new_combined, $combined[$_]; push @new_grid, $grid[$_]; } for ($diff->Range(2)) { push @new_combined, $seq->[$_]; push @new_grid, []; $new_grid[-1][$col_idx] = 1; } } } @combined = @new_combined; @grid = @new_grid; } for my $row_idx (0..$#combined) { my $ch = $combined[$row_idx]; for my $col_idx (0..$#seqs) { print($grid[$row_idx][$col_idx] ? $ch : " ", " "); } print("\n"); }
    A A A B B D C C C
      Cool! That is exactly what I need. Had I just known that there exists a solution already x).
      Thanks, Jan

        Me again I have another problem with the script and since I frankly don't fully understand it, I ask again here:
        Please look at the output of:

        use strict; use warnings; use Algorithm::Diff qw( ); my @seqs = ( [qw( A B C D E F G H I )], [qw( A D C X F G H I )], # [qw( A )], # [qw( A B C )], ); my @combined; my @grid; for my $col_idx (0..$#seqs) { my $seq = $seqs[$col_idx]; my $diff = Algorithm::Diff->new(\@combined, $seq); my @new_combined; my @new_grid; while ($diff->Next()) { if ($diff->Same()) { for ($diff->Range(1)) { push @new_combined, $combined[$_]; push @new_grid, $grid[$_]; $new_grid[-1][$col_idx] = 1; } } else { for ($diff->Range(1)) { push @new_combined, $combined[$_]; push @new_grid, $grid[$_]; } for ($diff->Range(2)) { push @new_combined, $seq->[$_]; push @new_grid, []; $new_grid[-1][$col_idx] = 1; } } } @combined = @new_combined; @grid = @new_grid; } for my $row_idx (0..$#combined) { my $ch = $combined[$row_idx]; for my $col_idx (0..$#seqs) { print($grid[$row_idx][$col_idx] ? $ch : " ", " "); } print("\n"); }

        It outputs

        A A B C D D E C X F F G G H H I I

        What I want is:

        A A B B C C D X E F F G G H H I I

        The rows must be, so to say, unique, meaning that there must not be a "C" in line 3 and one in line "6". What I also wonder is: The Documentation of Algorithm:Diff reads that if finds the LCS. But for my example it finds 6 common rows while I find 7 ...
        Greetings,
        Jan

Re: Converting Arrays into Matrix
by Ratazong (Monsignor) on Apr 26, 2011 at 07:02 UTC

    Have you tried to sort your vectors before combining them? That should solve your issue ... (being a kind of the "look forward" you were asking for)

    HTH, Rata

    Update: Sorting (of course) changes the order of the elements (that made the proposed solution so elegant ;-) ... when proposing it I wasn't aware of a requirement to preserve the order ... so with the clarification below the simple solution won't work ...

      Hi,
      well, what exactly do you mean by "sorting"? The order of the entries must be preserved, e.g.
      ['A','C','B'] ne ['A','B','C']
      So it means that the result of the above should be
      A A C B B C
      or
      A A B C C B
      If I understood you correctly, sorting would result in
      A A B B C C
      Or not? This I don't want ;P
      Jan
Re: Converting Arrays into Matrix
by anonymized user 468275 (Curate) on Apr 26, 2011 at 11:07 UTC
    It looks like just two passes through a 2D array to me, which can be factorised into one pass through the outer dimension:
    my @TwoD = ( ['A','B','C'], ['A','D','C'], ['A','B','C'] ); my ( $slotH, %slotH ) = (-1,()); my @result = (); for ( my $i1 = 0; $i1 <= $#TwoD; $i1++ ) { for ( my $i2 = 0; $i2 <= $#{ $TwoD[ $i1 ]}; $i2++ ) { my $v = $TwoD[ $i1 ][ $i2 ]; defined( $slotH{ $v } ) or $slotH{ $v } = ++$slotH; $result[$i1][$slotH{ $v }] = $v; } for ( my $i2 = 0; $i2 <= $slotH; $i2++ ) { $i2 and print "\t"; print ( $result[$i1][$i2] || '' ); } print "\n"; }
    produces
    A B C A C D A B C
    Update: actually my version needs its output rotating 90 degrees to get exactly the same output, but the technique is similar for doing that, resulting in minor change to the above script.

    One world, one people

Re: Converting Arrays into Matrix
by LanX (Saint) on Apr 26, 2011 at 23:06 UTC
    using hash-slices for set arithmetics:

    use strict; my @AoA=( ['A','B','C'], ['A','D','C'], ['A','B','C'], ); my (%join,@AoH); for (@AoA) { @join{@$_}=(); my %set; @set{@$_}=(); push @AoH,\%set; } for my $elem (sort keys %join) { for (@AoH){ print "$elem" if exists $_->{$elem}; print "\t"; } print "\n"; }

    prints

    A	A	A	
    B		B	
    C	C	C	
    	D		
    

    Cheers Rolf

      Oh, I didn't see the previous two posts.
      They find indeed the correct number of common "rows". There is however the problem that the order is not fully conserved.

      [qw( A B C D E F G H I )],[qw( A D C X F G H I )]
      With "conserved" I mean, that in the second vector, the "X" comes after the "C" and in front of "F".
      The last two versions output the aligned vectors such that the "X" is at the end, i.e. after the "I".

      The letters stand for a process flow and its order must be conserved (and there is no way of sorting them as the names are arbitrary human readable things) ;)
      Greetings, Jan
        You can use my code with which ever order you want, just provide a sorted array to loop through.

        UPDATE: you might think that the wanted sorting is obvious, but that's wrong.

        take  [[qw/A X C/][qw/A Y C/] and find a heuristic that decides the order of X and Y.

        You must be explicit about the order!!!

        Cheers Rolf