in reply to Merging/Rearranging Tables

Hi homeveg,

I think you will want to do this with a couple of passes.

In the first pass, read each ID, and keep track of how many columns each table has, as well as saving individual values per table.

In the second pass, for each ID, collect the corresponding values from each table, or assign the appropriate number of "n.a." (if the ID isn't defined for the table).

For example:

    # Strict
    use strict;
    use warnings;

    # Libraries
    use Data::Dumper;

    # User-defined

    # Define tables, each one a separate value in the hash "$ptables"
    my $ptables = {
        'tab1' => [
            "ID column | column 1",
            "gene 1 | value 1.1",
            "gene 2 | value 2.1",
            "gene 4 | value 4.1",
            "gene 8 | value 8.1",
        ],

        'tab2' => [
            "ID column | column 1 | column2",
            "gene 1 | value 1.1 | value 1.2",
            "gene 3 | value 3.1 | value 3.2",
            "gene 4 | value 4.1 | value 4.2",
        ]
    };

    # Globals
    my %output;
    my %ncolumns;
    my %values;
    my @tables = (sort keys %$ptables);                 # Get all table names

    # Main program

    # First pass -- parse each table to fetch all the IDs
    print "=== Pass 1 ===\n";
    foreach my $table (@tables) {
        my $ptab = $ptables->{$table};                  # Assign to table
        my @rows = split(/\s*\|\s*/, shift @$ptab);     # Get column headings
        shift @rows;                                    # Discard "ID column"
        my $ncols = @rows;                              # Find number of columns
        $ncolumns{$table} = $ncols;                     # Save # of columns
        print "Reading $table; $ncols col(s)...\n";     # Announce table name

        foreach my $line (@$ptab) {
            my ($id,@vals) = split(/\s*\|\s*/, $line);  # Get ID and values
            $output{$id} ||= [ ];                       # Placeholder for ID
            $values{$table}{$id} = [ @vals ];           # Save values for table/ID
        }
    }

    # Second pass -- process each ID, adding values from each table
    my @ids = (sort keys %output);
    print "=== Pass 2 ===\n";
    foreach my $id (@ids) {
        print "Processing ID $id\n";
        my $pout = $output{$id};                        # Get current ID list
        foreach my $table (@tables) {
            my $ncols   = $ncolumns{$table};            # Get number of columns
            my $pvalues = $values{$table}{$id};         # Get values for table/ID
            if (defined($pvalues)) {
                push @$pout, @$pvalues;                 # Save values
            } else {
                push @$pout, ( "n.a." ) x $ncols;       # Missing value = N/A
            }
        }
    }

    # Verify results
    print "=== Verify results ===\n";
    foreach my $id (@ids) {
        my $pvalues = $output{$id};
        printf "%12.12s | %s\n", $id, join(" | ", @$pvalues);
    }

The final section of the output (after the progress messages from the two passes) is:

    gene 1 | value 1.1 | value 1.1 | value 1.2
    gene 2 | value 2.1 | n.a. | n.a.
    gene 3 | n.a. | value 3.1 | value 3.2
    gene 4 | value 4.1 | value 4.1 | value 4.2
    gene 8 | value 8.1 | n.a. | n.a.

Of course, you can always add more tables to the master table $ptables.
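
As a rough illustration of that last point, a third table could be added as one more key in the hash reference before the first pass runs. The name tab3 and its rows are made up for this sketch, and tab1 is shortened to keep it brief:

```perl
use strict;
use warnings;

# Shortened master table, in the same format as the program above.
my $ptables = {
    'tab1' => [ "ID column | column 1", "gene 1 | value 1.1" ],
};

# Hypothetical third table; 'tab3' and its contents are invented for illustration.
$ptables->{'tab3'} = [
    "ID column | column 1",
    "gene 2 | value 2.3",
    "gene 9 | value 9.3",
];

# The table list is derived from the hash keys, so the new table is picked up as-is.
my @tables = sort keys %$ptables;
print "Tables: @tables\n";
```

Because @tables is built from the keys of $ptables, nothing else in the two passes should need to change.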


s''(q.S:$/9=(T1';s;(..)(..);$..=substr+crypt($1,$2),2,3;eg;print$..$/

Re^2: Merging/Rearranging Tables
by homeveg (Acolyte) on Feb 11, 2007 at 02:05 UTC
    Thank you a lot! It looks like a ready-made solution for me!
    I still have some problems with hashes (but I am learning...), so I have a question:

    -----------
    How do I fill the HoA (hash of arrays) with the values? Will the following code work for me?

        # define files containing tables (tab-delimited text files)
        my @files = ['tab1.txt','tab2.txt'];

        # Define master hash "$ptables"
        my %ptables;

        #read files and add data to the HoA:
        foreach my $file (@files) {
            my @string_array = Read_File($file);     # define arrays of strings
            my $hash_key =~ s/\.txt//;               # generate hash key
            $ptables{$hash_key} = [ @string_array ]; # save all strings to the hash
        }

          "Thank you a lot!"

      You're welcome a lot!

          "Will the following code work for me?"

      Well, I'm tempted to ask -- "what happens when you try?".  The best way to learn is, after all, by trying.

      I do see that you're trying to initialize @files from an array reference, ['tab1.txt','tab2.txt'], which likely won't do what you're expecting: you'll get a single item in @files, which is itself a reference to the two-element anonymous array.

      Better to declare it like this:

          my @files = ('tab1.txt','tab2.txt');

          # Or, using the "quote-word" function "qw",
          # which lets you omit the quotes and the comma:
          my @files = qw( tab1.txt tab2.txt );
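
      To make the difference concrete, here is a small sketch comparing the two forms. The variable names @wrong and @right are only for illustration:

```perl
use strict;
use warnings;

# Square brackets build an anonymous array and return ONE reference to it,
# so the array on the left ends up with a single element.
my @wrong = ['tab1.txt', 'tab2.txt'];

# Parentheses build a plain list, so the array gets two elements.
my @right = ('tab1.txt', 'tab2.txt');

printf "wrong: %d element(s), ref type of first: %s\n", scalar(@wrong), ref($wrong[0]);
printf "right: %d element(s), first: %s\n",             scalar(@right), $right[0];
```

      With the bracketed form, the foreach loop would run once, with $file holding the reference rather than a filename.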

      Furthermore, you're declaring a variable $hash_key but never assigning to it; instead, you're applying a regex substitution to the still-undefined variable with:

      my $hash_key =~ s/\.txt//; # generate hash key

      That's why, when I run it with use strict; and use warnings; I get the warning:

      Use of uninitialized value in substitution (s///) at merge.pl line 16.

      So I'm assuming what you want instead is to assign the filename $file to $hash_key, and then perform the substitution on that copy to get the resulting hash key:

      (my $hash_key = $file) =~ s/\.txt//; # generate hash key
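
      A short sketch of what that idiom does, for anyone who hasn't seen it before. The anchored pattern in the second half is an optional refinement of mine rather than part of the original suggestion, and the filename my.txt.notes.txt is invented to show the difference:

```perl
use strict;
use warnings;

my $file = 'tab1.txt';

# The parentheses make the assignment happen first; the substitution then
# operates on $hash_key, leaving $file untouched.
(my $hash_key = $file) =~ s/\.txt//;
print "file = $file, key = $hash_key\n";

# Hypothetical awkward filename: the unanchored pattern removes the first
# ".txt" it finds, while anchoring with "$" removes only a trailing one.
my $odd = 'my.txt.notes.txt';
(my $loose = $odd) =~ s/\.txt//;     # gives my.notes.txt
(my $tight = $odd) =~ s/\.txt$//;    # gives my.txt.notes
print "loose = $loose, tight = $tight\n";
```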

      A final thought:  make liberal use of Data::Dumper to see what data a given data structure contains at any time.  For example, to see the entire contents of $ptables after making each assignment:

          $ptables{$hash_key} = [ @string_array ]; # save all strings to the hash
          print Dumper(\%ptables);                 # use "\" to pass reference of hash
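
      Putting the corrections together, the loading loop might look roughly like the sketch below. Several things here are assumptions: read_file_lines is a hypothetical stand-in for your Read_File (which wasn't shown), the sample files are written to a temporary directory only so the sketch runs on its own, and their contents use the pipe-separated format from the earlier example rather than your tab-delimited data.

```perl
use strict;
use warnings;
use Data::Dumper;
use File::Temp qw(tempdir);
use File::Basename qw(basename);

$Data::Dumper::Sortkeys = 1;    # print hash keys in sorted order

# Hypothetical stand-in for Read_File(): returns the lines of a file, chomped.
sub read_file_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    chomp(my @lines = <$fh>);
    close($fh);
    return @lines;
}

# Write two small sample files (invented data) so the sketch is self-contained.
my $dir = tempdir(CLEANUP => 1);
my %sample = (
    'tab1.txt' => "ID column | column 1\ngene 1 | value 1.1\n",
    'tab2.txt' => "ID column | column 1 | column2\ngene 3 | value 3.1 | value 3.2\n",
);
foreach my $name (sort keys %sample) {
    open(my $out, '>', "$dir/$name") or die "Cannot write $dir/$name: $!";
    print {$out} $sample{$name};
    close($out);
}

# A plain list of filenames, not an array reference.
my @files = map { "$dir/$_" } qw( tab1.txt tab2.txt );

# Master hash of arrays: table name => list of lines.
my %ptables;
foreach my $file (@files) {
    my @string_array = read_file_lines($file);
    (my $hash_key = basename($file)) =~ s/\.txt$//;    # strip directory and extension
    $ptables{$hash_key} = [ @string_array ];
}

print Dumper(\%ptables);
```

      Since the earlier program keeps its tables in a hash reference called $ptables, \%ptables is what you would hand to that code.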

      Update:  fixed typo (thanks johngg).


        Well, you are right, it is better to try before asking, but I thought maybe I had made some fundamental mistake in assigning data to the hash structure that one could identify just by looking at it.

        Concerning the @files and $hash_key initialization: it was a mistake. I should have checked it beforehand, sorry.

        I'll try with Data::Dumper.
        Thanks for your comments.

        Cheers,
        Evgeniy