in reply to Converting Arrays into Matrix

I don't know what algorithms there are for doing N-way diffs, but this came up with something sane for your example:

use strict;
use warnings;

use Algorithm::Diff qw( );

my @seqs = (
    [qw( A B C )],
    [qw( A D C )],
    [qw( A B C )],
);

my @combined;
my @grid;
for my $col_idx (0..$#seqs) {
    my $seq = $seqs[$col_idx];

    my $diff = Algorithm::Diff->new(\@combined, $seq);

    my @new_combined;
    my @new_grid;
    while ($diff->Next()) {
        if ($diff->Same()) {
            for ($diff->Range(1)) {
                push @new_combined, $combined[$_];
                push @new_grid, $grid[$_];
                $new_grid[-1][$col_idx] = 1;
            }
        } else {
            for ($diff->Range(1)) {
                push @new_combined, $combined[$_];
                push @new_grid, $grid[$_];
            }
            for ($diff->Range(2)) {
                push @new_combined, $seq->[$_];
                push @new_grid, [];
                $new_grid[-1][$col_idx] = 1;
            }
        }
    }

    @combined = @new_combined;
    @grid     = @new_grid;
}

for my $row_idx (0..$#combined) {
    my $ch = $combined[$row_idx];
    for my $col_idx (0..$#seqs) {
        print($grid[$row_idx][$col_idx] ? $ch : " ", " ");
    }
    print("\n");
}

A A A
B   B
  D
C C C

Re^2: Converting Arrays into Matrix
by janDD (Acolyte) on Apr 26, 2011 at 08:09 UTC
    Cool! That is exactly what I need. If only I had known that a solution already existed x).
    Thanks, Jan

      Me again. I have another problem with the script, and since I frankly don't fully understand it, I am asking again here:
      Please look at the output of:

      use strict;
      use warnings;

      use Algorithm::Diff qw( );

      my @seqs = (
          [qw( A B C D E F G H I )],
          [qw( A D C X F G H I )],
          # [qw( A )],
          # [qw( A B C )],
      );

      my @combined;
      my @grid;
      for my $col_idx (0..$#seqs) {
          my $seq = $seqs[$col_idx];

          my $diff = Algorithm::Diff->new(\@combined, $seq);

          my @new_combined;
          my @new_grid;
          while ($diff->Next()) {
              if ($diff->Same()) {
                  for ($diff->Range(1)) {
                      push @new_combined, $combined[$_];
                      push @new_grid, $grid[$_];
                      $new_grid[-1][$col_idx] = 1;
                  }
              } else {
                  for ($diff->Range(1)) {
                      push @new_combined, $combined[$_];
                      push @new_grid, $grid[$_];
                  }
                  for ($diff->Range(2)) {
                      push @new_combined, $seq->[$_];
                      push @new_grid, [];
                      $new_grid[-1][$col_idx] = 1;
                  }
              }
          }

          @combined = @new_combined;
          @grid     = @new_grid;
      }

      for my $row_idx (0..$#combined) {
          my $ch = $combined[$row_idx];
          for my $col_idx (0..$#seqs) {
              print($grid[$row_idx][$col_idx] ? $ch : " ", " ");
          }
          print("\n");
      }

      It outputs

      A A
      B
      C
      D D
      E
        C
        X
      F F
      G G
      H H
      I I

      What I want is:

      A A
      B B
      C C
      D
        X
      E
      F F
      G G
      H H
      I I

      The rows must be, so to speak, unique, meaning that there must not be a "C" in line 3 and another in line 6. What I also wonder is this: the documentation of Algorithm::Diff says that it finds the LCS (longest common subsequence). But for my example it finds 6 common rows, while I find 7 ...
      Greetings,
      Jan

        [qw( A B C D E F G H I )],
        [qw( A D C X F G H I )],
        Your second array does not contain a 'B', but starts with A D C. Looks like a typo.
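
        The typo would also account for the 6-versus-7 count. As a minimal sketch (plain dynamic programming rather than Algorithm::Diff; the sub name `lcs_length` and the "assumed" array with B in place of D are illustrative assumptions, the latter read off the desired output), the LCS length of the two arrays can be checked independently:

```perl
use strict;
use warnings;

# Length of the longest common subsequence of two array refs,
# computed with the textbook dynamic-programming recurrence.
# (Hypothetical helper, not part of Algorithm::Diff.)
sub lcs_length {
    my ($x, $y) = @_;
    my @prev = (0) x (scalar(@$y) + 1);
    for my $i (1 .. scalar(@$x)) {
        my @curr = (0);
        for my $j (1 .. scalar(@$y)) {
            if ($x->[$i - 1] eq $y->[$j - 1]) {
                $curr[$j] = $prev[$j - 1] + 1;
            } else {
                $curr[$j] = $prev[$j] > $curr[$j - 1] ? $prev[$j] : $curr[$j - 1];
            }
        }
        @prev = @curr;
    }
    return $prev[-1];
}

my @first   = qw( A B C D E F G H I );
my @posted  = qw( A D C X F G H I );   # second array as posted
my @assumed = qw( A B C X F G H I );   # assumption: B was meant instead of D

print lcs_length(\@first, \@posted),  "\n";   # prints 6
print lcs_length(\@first, \@assumed), "\n";   # prints 7
```

        With the arrays as posted, C and D appear in opposite orders in the two sequences, so only one of them can be part of a common subsequence; that is why 6 is the correct answer for the posted data.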

        Dear community
        Well, you are right, it was a typo. However, the problem I faced still persists (though I was not able to describe it properly because of my typo). It is also hard to reproduce. This is why I would like to show you a sample of my data. Please execute the script and look at the output:

        You will see that the letter "V" appears in rows 19 AND 25 ... That is not correct, is it?

        They should all be in 25 or even 28, depending on the situation with the 3 and the U...

        I am really sorry for the amount of data, but I cannot reproduce this behavior otherwise. With many smaller data sets (like those I showed you), it worked ...
        Greetings, Jan

        #!/usr/bin/perl -w
        use strict;
        use warnings;

        use Algorithm::Diff qw( );

        my @seqs = (
            [ qw ( A B C D E Z F G H I J K L M N O V W X Y ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W 2 4 ) ],
            [ qw ( A B C D E Z F G H I J K L M N O P Q V W Y ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W 2 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W 2 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W 2 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W 2 4 ) ],
            [ qw ( A B C D E Z F G H I J K L M N O P V W Y ) ],
            [ qw ( A B C D E Z F G H I J K L M N O P V Y ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W 2 4 ) ],
            [ qw ( A B C D E Z F G H I J K L M N O P Q R S T V W Y ) ],
            [ qw ( 3 W 2 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W Y ) ],
            [ qw ( A B C D E Z F G H I J K L M N O V W 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W 2 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W 2 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S W 2 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P V W Y ) ],
            [ qw ( A B C D E Z F G H I J K L M N Y ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P V W Y ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T V W 2 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S W 2 4 ) ],
            [ qw ( N O V Y ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W 2 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W 2 4 ) ],
            [ qw ( N O P Q R S V W Y ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W 2 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W 2 4 ) ],
            [ qw ( 7 2 4 ) ],
            [ qw ( A B C D E 1 F G H I J K L M N O P V 5 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T V W 2 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W 2 4 ) ],
            [ qw ( A B C D E Z F G H I J K L M N O V W 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T U W 2 Y ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O V 5 ) ],
            [ qw ( A B C D E Z F 6 G I J K V W Y ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S W X 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W 2 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T V W Y ) ],
            [ qw ( A O P Q R V W Y ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T V 5 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W 2 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T V W 2 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W Y ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W 2 4 ) ],
            [ qw ( A B C D E Z F G H I J K L M N O V W 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W 2 4 ) ],
            [ qw ( A B C D E Z F G H I J K L M N O P Q V W 4 ) ],
            [ qw ( A B C D E 1 F G H I J K L M 5 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S W 2 4 ) ],
            [ qw ( A B C D E Z F G H I J K L M N Y ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W 2 4 ) ],
            [ qw ( A B C D E 1 Z F 6 G H I J K L M N O P Q R V W 2 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T V W 2 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S W 2 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S V W Y ) ],
            [ qw ( A B C D E Z F G H I J K L M N O P V W Y ) ],
            [ qw ( A B C D E 1 F G H I J K L M N O P V 5 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W 2 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T U V W Y ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T V W 2 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T V W Y ) ],
            [ qw ( 3 W 2 4 ) ],
            [ qw ( 3 W 2 4 ) ],
            [ qw ( A B C D E 1 F G H I J K L M N O P V 5 ) ],
            [ qw ( A O P Q V W Y ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W 2 4 ) ],
            [ qw ( A B C D E Z F 6 G I J K L M N P V W Y ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W 2 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W 2 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T V W 2 4 ) ],
            [ qw ( A B C D E Z F G H I J K L M N O P V X Y ) ],
            [ qw ( A B C D E 1 F G H I J K L M N O P V 5 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W 2 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W 2 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W 2 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T V W Y ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T V W 2 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W 2 4 ) ],
            [ qw ( N O P V W Y ) ],
            [ qw ( A B C D E 1 F G H I J K L M N O P V 5 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W 2 4 ) ],
            [ qw ( A B C D E Z F G H I J K L M N O V W Y ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W 2 4 ) ],
            [ qw ( A B C D E Z F 6 G I J K L M N O P Q R V W Y ) ],
            [ qw ( A O P Q V W Y ) ],
            [ qw ( 3 W Y ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P V W Y ) ],
            [ qw ( A O P Q R V W Y ) ],
            [ qw ( A B C D E Z F G H I J K L M N O P V X Y ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T U V W Y ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T V W 2 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O 5 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S W Y ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T U V W Y ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W 2 4 ) ],
            [ qw ( N O P V W Y ) ],
            [ qw ( A B C D E Z F G H I J K L M N O P V W Y ) ],
            [ qw ( A B C D E Z F G H I J K L M N O V W Y ) ],
            [ qw ( A B C D E Z F 6 G I J K L M N P V W Y ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P V W Y ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T V W Y ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T V W 2 4 ) ],
            [ qw ( A B C D E Z F G H I J K L M N O P Q R S T U V W Y ) ],
            [ qw ( A B C D E Z F G H I J K L M N O P V X Y ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S W 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W 2 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W 2 4 ) ],
            [ qw ( A O P Q R V W Y ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T V W 2 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P V W Y ) ],
            [ qw ( N O P V W Y ) ],
            [ qw ( A B C D E Z F G H I J K L M N O V W 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T V W 2 4 ) ],
            [ qw ( A B C D E Z F 6 G I J K L M N O P Q R V W Y ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W 2 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T V W 2 4 ) ],
            [ qw ( A B C D E Z F G H I J K L M N O V W Y ) ],
            [ qw ( A B C D E 1 F G H I J K L M N O P V 5 ) ],
            [ qw ( A B C D E Z F G H I J K L M N O P Q R S T V W Y ) ],
            [ qw ( N O P V Y ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S W X 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W 2 4 ) ],
            [ qw ( A B C D E Z F G H I J K L M N O P Q R S T U V W Y ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W 2 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T V W 2 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W 2 4 ) ],
            [ qw ( A B C D E Z F G H I J K L M N O P V Y ) ],
            [ qw ( A B C D E Z F G H I J K L M N O P V W Y ) ],
            [ qw ( A B C D E Z F G H I J K L M N O V W 4 ) ],
            [ qw ( A B C D E Z F G H I J K L M N O P V Y ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T V W 2 4 ) ],
            [ qw ( A B C D E 1 Z F G H I J K L M N O P Q R S T W 2 4 ) ],
            [ qw ( A B C D E Z F G H I J K L M N O P Q V W 2 4 ) ],
            [ qw ( A B C D E Z F G H I J K L M N O V W Y ) ],
        );

        my @combined;
        my @grid;
        for my $col_idx (0..$#seqs) {
            my $seq = $seqs[$col_idx];

            my $diff = Algorithm::Diff->new(\@combined, $seq);

            my @new_combined;
            my @new_grid;
            while ($diff->Next()) {
                if ($diff->Same()) {
                    for ($diff->Range(1)) {
                        push @new_combined, $combined[$_];
                        push @new_grid, $grid[$_];
                        $new_grid[-1][$col_idx] = 1;
                    }
                } else {
                    for ($diff->Range(1)) {
                        push @new_combined, $combined[$_];
                        push @new_grid, $grid[$_];
                    }
                    for ($diff->Range(2)) {
                        push @new_combined, $seq->[$_];
                        push @new_grid, [];
                        $new_grid[-1][$col_idx] = 1;
                    }
                }
            }

            @combined = @new_combined;
            @grid     = @new_grid;
        }

        for my $row_idx (0..$#combined) {
            my $ch = $combined[$row_idx];
            for my $col_idx (0..$#seqs) {
                print($grid[$row_idx][$col_idx] ? $ch : " ", " ");
            }
            print("\n");
        }
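
        Rather than spotting repeated letters by eye in a grid this wide, the merged row labels can be scanned for repeats. The following is only a sketch: `duplicate_rows` is a hypothetical helper, not part of Algorithm::Diff, and the sample list is the row order from the smaller two-array example earlier in the thread. In the script above it could presumably be called as `duplicate_rows(@combined)` after the merge loop.

```perl
use strict;
use warnings;

# Return a hash ref mapping every label that occurs more than once
# to the list of 1-based row numbers in which it occurs.
# (Hypothetical helper for illustration.)
sub duplicate_rows {
    my @labels = @_;

    # Collect the row numbers for each label.
    my %rows_for;
    for my $idx (0 .. $#labels) {
        push @{ $rows_for{ $labels[$idx] } }, $idx + 1;
    }

    # Keep only the labels seen in more than one row.
    my %dups;
    for my $label (keys %rows_for) {
        $dups{$label} = $rows_for{$label} if @{ $rows_for{$label} } > 1;
    }
    return \%dups;
}

# Row labels of the merged result from the two-array example.
my $dups = duplicate_rows(qw( A B C D E C X F G H I ));
print "$_ appears in rows @{ $dups->{$_} }\n" for sort keys %$dups;
# prints: C appears in rows 3 6
```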