in reply to Re^2: Merging/Rearranging Tables
in thread Merging/Rearranging Tables

    "Thank you a lot!"

You're welcome a lot!

    "Will the following code work for me?"

Well, I'm tempted to ask -- "what happens when you try?".  The best way to learn is, after all, by trying.

I do see that you're trying to initialize @files from an array reference, ['tab1.txt','tab2.txt'], which won't do what you expect: you'll get a single element in @files, and that element is itself a reference to the two-element array.

Better to declare it like this:

my @files = ('tab1.txt', 'tab2.txt');

# Or, using the "quote-word" operator qw, which lets you
# omit the quotes and the commas:
my @files = qw( tab1.txt tab2.txt );
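To see the difference for yourself, here's a small self-contained check that compares the two initializations. The variable names @wrong and @right are just for illustration.

```perl
use strict;
use warnings;

# Square brackets build ONE anonymous array and return a reference to it,
# so @wrong ends up with a single element: that reference.
my @wrong = ['tab1.txt', 'tab2.txt'];

# Parentheses (or qw) give a plain list, so @right gets two elements.
my @right = qw( tab1.txt tab2.txt );

print "wrong: ", scalar(@wrong), " element(s); ref type of first: ", ref($wrong[0]), "\n";
print "right: ", scalar(@right), " element(s): @right\n";

# The filenames are still reachable through the reference, but only by dereferencing:
print "via ref: @{ $wrong[0] }\n";
```

Looping over @wrong would hand your open() call a string like "ARRAY(0x...)" rather than a filename, which is why the list form is what you want here.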

Furthermore, you're declaring a variable $hash_key that you never assign to, and then trying to perform a regex substitution on it with:

my $hash_key =~ s/\.txt//; # generate hash key

That's why, when I run it under use strict and use warnings, I get the warning:

Use of uninitialized value in substitution (s///) at merge.pl line 16.

So I assume what you want instead is to copy the filename $file into $hash_key, and then perform the substitution on that copy to get the hash key:

(my $hash_key = $file) =~ s/\.txt//; # generate hash key
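Here's a minimal sketch showing that the parenthesized form copies first and then substitutes on the copy only, leaving $file untouched. Two things in it go beyond the original: the \z anchor (so a hypothetical name like "notes.txt.bak" isn't mangled), and the /r modifier alternative, which requires Perl 5.14 or newer.

```perl
use strict;
use warnings;

my @files = qw( tab1.txt tab2.txt notes.txt.bak );
my @keys;

foreach my $file (@files) {
    # Copy $file into $hash_key, then run s/// on the copy only.
    (my $hash_key = $file) =~ s/\.txt\z//;   # \z anchors the match at the end of the string
    push @keys, $hash_key;
    print "$file -> $hash_key\n";
}

# Equivalent with the /r modifier (Perl 5.14+): s///r returns the
# modified copy and leaves the original string alone.
my @keys_r = map { s/\.txt\z//r } @files;
print "same result\n" if "@keys" eq "@keys_r";
```

Either form works; the parenthesized copy-then-substitute idiom also runs on older perls.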

A final thought: make liberal use of Data::Dumper to see what a data structure contains at any point. For example, to see the entire contents of %ptables after each assignment:

use Data::Dumper;                          # provides Dumper()

$ptables{$hash_key} = [ @string_array ];   # save all strings to the hash
print Dumper(\%ptables);                   # use "\" to pass a reference to the hash
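A runnable sketch of the same idea, with two optional Data::Dumper settings that make the output easier to compare between runs. The inline %contents hash is a hypothetical stand-in for the file contents, so the example runs without any input files.

```perl
use strict;
use warnings;
use Data::Dumper;

# Easier-to-read dumps: sort hash keys, and indent each level by a fixed amount.
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent   = 1;

# Hypothetical stand-in for the file contents.
my %contents = (
    'tab1.txt' => [ 'ID column | column 1', 'gene 1 | value 1.1' ],
    'tab2.txt' => [ 'ID column | column 1 | column2', 'gene 3 | value 3.1 | value 3.2' ],
);

my %ptables;
foreach my $file (sort keys %contents) {
    my @string_array = @{ $contents{$file} };
    (my $hash_key = $file) =~ s/\.txt//;      # generate hash key
    $ptables{$hash_key} = [ @string_array ];  # save all strings to the hash
    print Dumper(\%ptables);                  # dump the whole hash after each assignment
}

# Dumper() returns a string, so you can also keep it or write it to a log.
my $final_dump = Dumper(\%ptables);
```

Sortkeys matters more than it looks: without it, hash order varies from run to run, which makes two dumps hard to compare by eye.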

Update:  fixed typo (thanks johngg).


Re^4: Merging/Rearranging Tables
by homeveg (Acolyte) on Feb 11, 2007 at 11:30 UTC
    Well, you are right, it is better to try before asking, but I thought maybe I had made some fundamental mistake in assigning data to the hash structure that one could spot just by looking at it.

    Concerning the @files and $hash_key initialization: those were mistakes. I should have checked before, sorry.

    I'll try Data::Dumper.
    Thanks for your comments.

    Cheers,
    Evgeniy
      Finally, here is the working version of the script:
      Input:
      Tab1.txt
      ID column | column 1
      gene 1 | value 1.1
      gene 2 | value 2.1
      gene 4 | value 4.1
      gene 8 | value 8.1

      Tab2.txt
      ID column | column 1 | column2
      gene 1 | value 1.1 | n.a.
      gene 3 | value 3.1 | value 3.2
      gene 4 | value 4.1 | value 4.2

      To run the script, pass the file names as arguments. The files should be in the current working directory, or be given with their paths.
      The table merging script is below:
      # Strict
      use strict;
      use warnings;

      # Libraries
      use Data::Dumper;

      # Variable definitions
      my @filenames;

      # Read the user-defined table file names
      if (@ARGV < 2) {
          die "Please provide at least 2 file names. Good luck!!!\n";
      }
      foreach my $element (@ARGV) {
          if ($element =~ /-help=/i) {
              print STDERR "Please provide at least 2 file names\n";
              exit;
          }
          else {
              push(@filenames, $element);
          }
      }

      # Define the master hash reference "$ptables"
      my $ptables;

      # Read the files and add their lines to the HoA
      foreach my $file (@filenames) {
          my @string_array;

          # Read the input file into an array of lines
          open(my $fh, '<', $file) or die "Cannot open '$file': $!";
          while (my $line = <$fh>) {
              chomp $line;
              push(@string_array, $line) if $line =~ /\S/;   # skip blank lines
          }
          close($fh);

          (my $hash_key = $file) =~ s/\.txt//;          # generate hash key
          $ptables->{$hash_key} = [ @string_array ];    # save all strings to the hash
          print Dumper($ptables);                       # $ptables is already a reference
      }

      # Globals
      my %output;
      my %ncolumns;
      my %values;
      my @tables = (sort keys %$ptables);               # Get all table names

      # Main program
      # First pass -- parse each table to fetch all the IDs
      print "=== Pass 1 ===\n";
      foreach my $table (@tables) {
          my $ptab = $ptables->{$table};                # Assign to table
          my @rows = split(/\s*\|\s*/, shift @$ptab);   # Get column headings
          shift @rows;                                  # Discard "ID column"
          my $ncols = @rows;                            # Find number of columns
          $ncolumns{$table} = $ncols;                   # Save # of columns
          print "Reading $table; $ncols col(s)...\n";   # Announce table name
          foreach my $line (@$ptab) {
              my ($id, @vals) = split(/\s*\|\s*/, $line);   # Get ID and values
              $output{$id} ||= [ ];                         # Placeholder for ID
              $values{$table}{$id} = [ @vals ];             # Save values for table/ID
          }
      }

      # Second pass -- process each ID, adding values from each table
      my @ids = (sort keys %output);
      print "=== Pass 2 ===\n";
      foreach my $id (@ids) {
          print "Processing ID $id\n";
          my $pout = $output{$id};                      # Get current ID list
          foreach my $table (@tables) {
              my $ncols   = $ncolumns{$table};          # Get number of columns
              my $pvalues = $values{$table}{$id};       # Get values for table/ID
              if (defined($pvalues)) {
                  push @$pout, @$pvalues;               # Save values
              }
              else {
                  push @$pout, ("n.a.") x $ncols;       # Missing value = N/A
              }
          }
      }

      # Verify results
      print "=== Verify results ===\n";
      foreach my $id (@ids) {
          my $pvalues = $output{$id};
          printf "%12.12s | %s\n", $id, join(" | ", @$pvalues);
      }
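For experimenting without creating files, here is a compact sketch of the same two-pass merge working on in-memory lines. The merge_tables name and the inline %tables hash (copied from the Tab1.txt and Tab2.txt examples above) are hypothetical, chosen for illustration; the logic follows the script: count each table's columns, collect values per ID, then pad missing IDs with "n.a.".

```perl
use strict;
use warnings;

# Hypothetical helper: merge tables given as { name => [ header, rows... ] }.
# Returns a hash ref of ID => [ values from all tables, in sorted table order ].
sub merge_tables {
    my ($tables) = @_;
    my (%ncols, %values, %seen);

    # Pass 1: record each table's column count and its values per ID.
    foreach my $table (sort keys %$tables) {
        my ($header, @lines) = @{ $tables->{$table} };
        my (undef, @cols) = split /\s*\|\s*/, $header;   # drop "ID column"
        $ncols{$table} = @cols;
        foreach my $line (@lines) {
            my ($id, @vals) = split /\s*\|\s*/, $line;
            $seen{$id} = 1;
            $values{$table}{$id} = \@vals;
        }
    }

    # Pass 2: for every ID, append each table's values or "n.a." padding.
    my %merged;
    foreach my $id (sort keys %seen) {
        foreach my $table (sort keys %$tables) {
            my $vals = $values{$table}{$id};
            push @{ $merged{$id} }, $vals ? @$vals : ('n.a.') x $ncols{$table};
        }
    }
    return \%merged;
}

my %tables = (
    tab1 => [ 'ID column | column 1',
              'gene 1 | value 1.1', 'gene 2 | value 2.1',
              'gene 4 | value 4.1', 'gene 8 | value 8.1' ],
    tab2 => [ 'ID column | column 1 | column2',
              'gene 1 | value 1.1 | n.a.', 'gene 3 | value 3.1 | value 3.2',
              'gene 4 | value 4.1 | value 4.2' ],
);

my $merged = merge_tables(\%tables);
foreach my $id (sort keys %$merged) {
    printf "%12.12s | %s\n", $id, join(' | ', @{ $merged->{$id} });
}
```

One caveat shared with the script: split discards trailing empty fields, so a row ending in an empty cell (e.g. "gene 5 | value 5.1 |") yields fewer values than the header has columns, and the later columns shift. Writing an explicit placeholder such as "n.a." in the input, as Tab2.txt does, avoids this.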

      Thanks everybody for your help! Regards, Evgeniy