in reply to Merging/Rearranging Tables

A little late, but I wanted to show a slightly different way. Really, much of it follows liverpole's ideas, but it saves each row's data as a string instead of an array of values.

Chris

#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;
use Sort::Naturally;

my %data;
my $table;
my %col_cnt;

@ARGV = glob "tab*.txt";

while (<>) {
    next if /^ID/;
    chomp;
    ($table = fileparse $ARGV) =~ s/\.txt$// unless $table;
    $col_cnt{$table} ||= tr/|//;
    my ($gene, $cols) = split /\s*\|\s*/, $_, 2;
    $data{$gene}{$table} = $cols;
    $table = '' if eof;
}

my @tables = nsort keys %col_cnt;

for my $gene (nsort keys %data) {
    print $gene;
    for my $table (@tables) {
        if ($data{$gene}{$table}) {
            print ' | ' . $data{$gene}{$table};
        }
        else {
            print ' | ' . join " | ", ("n.a.") x $col_cnt{$table};
        }
    }
    print "\n";
}

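
For anyone who wants to see the string-based storage in isolation, here is a minimal sketch of the three idioms the script leans on: counting pipes with tr, splitting with a limit of 2, and building the "n.a." filler with the repetition operator. The sample line and the variable names $line and $padding are made up for illustration; only core Perl is used.

```perl
use strict;
use warnings;

# Hypothetical input row in the same pipe-separated layout.
my $line = 'gene3 | value 3.1 | value 3.2';

# tr with an empty replacement list only counts matches:
# one pipe per value column that follows the gene name.
my $col_cnt = ($line =~ tr/|//);

# A limit of 2 splits at the first pipe only, so everything
# after the gene name stays together as a single string.
my ($gene, $cols) = split /\s*\|\s*/, $line, 2;

# Filler for a gene that is missing from a table with $col_cnt columns.
my $padding = join ' | ', ('n.a.') x $col_cnt;

print "$gene => '$cols' ($col_cnt columns)\n";
print "padding: $padding\n";
```

Because the columns are never split apart, printing a row is just a matter of appending the stored string, and the column count is only needed to build the filler.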
Update: Changed from accumulating the entire output in a string and printing it at the end to printing as the program proceeds.

Update: Changed from the built-in sort to 'nsort' (use Sort::Naturally). If the names being sorted contain a number greater than 9, e.g. (gene1 gene2 gene9 gene10) or (tab1 tab2 ... tab11 tab12), the default sort will not order them properly, because it compares them as strings.
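
To make the sorting problem concrete, here is a small core-only sketch. The comparator by_prefix_then_number and the helper name_parts are hypothetical and written just for this illustration; they are a simplified stand-in that handles "text followed by one number" names, not the algorithm Sort::Naturally uses.

```perl
use strict;
use warnings;

my @names = qw(gene10 gene2 gene1 gene9);

# The default sort compares strings character by character,
# so "gene10" lands before "gene2".
my @default = sort @names;

# Split a name into its non-digit prefix and trailing number.
# Names that do not fit this shape are treated as all prefix.
sub name_parts {
    my ($name) = @_;
    return $name =~ /^(\D*)(\d+)$/ ? ($1, $2) : ($name, 0);
}

# Simplified stand-in for a natural sort: prefix as text,
# then the number compared numerically.
sub by_prefix_then_number {
    my ($pa, $na) = name_parts($a);
    my ($pb, $nb) = name_parts($b);
    return $pa cmp $pb || $na <=> $nb;
}

my @natural = sort by_prefix_then_number @names;

print "default: @default\n";
print "natural: @natural\n";
```

With these four names, the default sort should give gene1 gene10 gene2 gene9, while the numeric comparison gives gene1 gene2 gene9 gene10. In the script itself, nsort does this job and also copes with more general mixed names.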

Re^2: Merging/Rearranging Tables
by homeveg (Acolyte) on Feb 13, 2007 at 14:37 UTC
    Hi,
    Thanks for your solution. It is very interesting! Could you comment it a bit?
    The only minor modification I would make is to keep the header line:
    while (<>) {
        # next if /^ID/;
        chomp;
        ($table = fileparse $ARGV) =~ s/\.txt$// unless $table;
        $col_cnt{$table} ||= tr/|//;
        my ($gene, $cols) = split /\s*\|\s*/, $_, 2;
        $data{$gene}{$table} = $cols;
        $table = '' if eof;
    }

      "The only minor modification, I would do - keep the header line"

      That would not work with the code below. It stores genes and their values, so a header line would be treated like any other gene row. If you want to keep the headers, the code would need to be rewritten to handle them.

      while (<>) {
          next if /^ID/;
          chomp;

          # If $ARGV is 'tab1.txt', $table will be 'tab1',
          # unless $table is already initialized.
          # $table is uninitialized on the first read
          # and after the end of every file
          # (see below: $table = '' if eof;)
          ($table = fileparse $ARGV) =~ s/\.txt$// unless $table;

          # If the column count for this table is not already
          # stored, then use the transliteration operator to count
          # the number of pipes (# of columns) and store it.
          $col_cnt{$table} ||= tr/|//;

          my ($gene, $cols) = split /\s*\|\s*/, $_, 2;

          # Here the columns are not stored in an array, but in a string.
          # If the data is:
          #     gene 3 | value 3.1 | value 3.2
          # then $cols would be:
          #     'value 3.1 | value 3.2'
          $data{$gene}{$table} = $cols;

          $table = '' if eof;
      }

      Does this explain it better? I suspect that part of your question is about $ARGV and where it comes from, and about the input (angle) operator (while (<>)) and reading files from @ARGV.
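
In case it helps, here is a minimal, self-contained sketch of how <>, $ARGV and eof fit together. The temporary directory, the sample file contents and the @seen array are assumptions made up for this demonstration; in the real script the file names come from the glob "tab*.txt".

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Basename qw(basename);

# Hypothetical sample files, created only for this demonstration.
my $dir = tempdir(CLEANUP => 1);
my %sample = (
    'tab1.txt' => "ID | v1\ngene1 | 1.1\n",
    'tab2.txt' => "ID | v1\ngene1 | 2.1\ngene2 | 2.2\n",
);
for my $name (sort keys %sample) {
    open my $fh, '>', "$dir/$name" or die "Cannot write $name: $!";
    print {$fh} $sample{$name};
    close $fh or die "Cannot close $name: $!";
}

# <> reads every file named in @ARGV, one after another.
@ARGV = map { "$dir/$_" } sort keys %sample;

my @seen;
while (<>) {
    chomp;
    # $ARGV holds the name of the file currently being read.
    my $file = basename($ARGV);
    push @seen, "$file: $_";
    # eof without parentheses is true at the end of each file,
    # which is where the script resets $table.
    push @seen, "-- end of $file" if eof;
}

print "$_\n" for @seen;
```

The "-- end of" lines mark the points where the script sets $table back to '', so that the next line read derives the table name from the new value of $ARGV.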

      Chris

        Hi,
        Thanks again for your answer!
        You were right, I had questions about reading from @ARGV, but I could not formulate them correctly, so I asked for your comments in general. :)
        Now it is clear to me what is going on.
        In addition, nsort is very relevant in my case, since I really do have mixed names that contain both letters and numbers.
        Cheers,
        Evgeniy