in reply to Combining hashes of hahses?

One issue is that push @{ $data{$taxon} }, $3; does not make sense: you imply that %data is a hash of hashes, but that statement uses it as a hash of arrays.
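To make the distinction concrete, here is a minimal sketch contrasting the two structures. The names %hoa and %hoh and the sample values are made up for illustration and are not taken from the original question.

```perl
use strict;
use warnings;

# Hypothetical sample values, for illustration only.
my %hoa;    # hash of arrays: each taxon maps to an ordered list
push @{ $hoa{species_1} }, 'ACCATG';
push @{ $hoa{species_1} }, '001001';

my %hoh;    # hash of hashes: each taxon maps to named fields
$hoh{species_1}{dna}   = 'ACCATG';
$hoh{species_1}{morph} = '001001';

print ref($hoa{species_1}), "\n";    # ARRAY
print ref($hoh{species_1}), "\n";    # HASH

# Mixing the two does not work: the value is a hash reference, so
# dereferencing it as an array dies with "Not an ARRAY reference".
my $ok  = eval { push @{ $hoh{species_1} }, 'GGTT'; 1 };
my $err = $@;
print "push on a hash ref failed: $err" unless $ok;
```

Once a key holds one kind of reference, every access to that key has to treat it the same way.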

Without more information telling us what you want to do with the data it's not clear if the data structures you are generating are appropriate. Consider the following sample however:

use strict;
use warnings;

my $file1 = <<FILE;
dna:
species_1 ACCATGATACGATG
species_2 GGTTTCGACGCAGA
species_3 GGACTCAGCGACTA
FILE

my $file2 = <<FILE;
morph:
species_1 001001010201001001
species_2 002010200210120201
species_4 001001110000000101
species_5 111001001001000201
FILE

my %data;
my $dnaLen = 0;
my $morphLen = 0;

open IN, '<', \$file1 or die "Failed to open file1: $!";
while (<IN>) {
    next unless /(^\w+)\s+(\w+)/;
    $data{$1}{dna} = $2;
    $dnaLen ||= length $2;
}
close IN;

open IN, '<', \$file2 or die "Failed to open file2: $!";
while (<IN>) {
    next unless /(^\w+)\s+(\w+)/;
    $data{$1}{morph} = $2;
    $morphLen ||= length $2;
}
close IN;

die "No dna data found" unless $dnaLen;
die "No morph data found" unless $morphLen;

for my $species (sort keys %data) {
    $data{$species}{dna} ||= '?' x $dnaLen;
    $data{$species}{morph} ||= '?' x $morphLen;
    print "$species: $data{$species}{dna}$data{$species}{morph}\n";
}

Prints:

species_1: ACCATGATACGATG001001010201001001
species_2: GGTTTCGACGCAGA002010200210120201
species_3: GGACTCAGCGACTA??????????????????
species_4: ??????????????001001110000000101
species_5: ??????????????111001001001000201

The sample uses a hash of hashes in which the primary key is the species and the secondary key is either morph or dna.
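The two read loops in the sample above differ only in the secondary key, so they could be folded into one routine. The following is a rough sketch of that idea, not a replacement for the sample: load_block is a hypothetical helper name, the input strings are shortened made-up data, and the check that every value has the same length is an added assumption about what the input should look like. It also uses // (defined-or), which needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# load_block is a hypothetical helper, not part of the sample above.
# It stores each "name value" line of $text under $data->{name}{$key}
# and returns the common value length, dying if the lengths differ.
sub load_block {
    my ($data, $key, $text) = @_;
    my $len = 0;
    open my $in, '<', \$text or die "Failed to open $key data: $!";
    while (my $line = <$in>) {
        next unless $line =~ /^(\w+)\s+(\w+)/;
        my ($species, $value) = ($1, $2);
        $len ||= length $value;    # first value sets the expected length
        die "$species: $key has length " . length($value) . ", expected $len\n"
            if length($value) != $len;
        $data->{$species}{$key} = $value;
    }
    close $in;
    die "No $key data found\n" unless $len;
    return $len;
}

# Shortened, made-up input for illustration.
my %data;
my $dnaLen   = load_block(\%data, 'dna',   "dna:\nspecies_1 ACCA\nspecies_2 GGTT\n");
my $morphLen = load_block(\%data, 'morph', "morph:\nspecies_1 0010\nspecies_3 1112\n");

for my $species (sort keys %data) {
    # Pad whichever block is missing for this species with '?'.
    my $dna   = $data{$species}{dna}   // '?' x $dnaLen;
    my $morph = $data{$species}{morph} // '?' x $morphLen;
    print "$species: $dna$morph\n";
}
```

Passing the hash by reference keeps one shared structure across both calls, and the lexical filehandle avoids reusing the global IN handle.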


Perl is environmentally friendly - it saves trees

Re^2: Combining hashes of hahses?
by erio (Initiate) on Nov 07, 2007 at 18:38 UTC
    Thanks GrandFather. Very helpful. Sorry to have been a bit light on context. I am trying to put DNA sequence data and coded morphological data into a file format that is accepted by a number of programs that reconstruct the evolutionary relationships amongst a group of organisms. The file will look something like this:
    #nexus
    begin data;
    dimensions ntax=5 nchar=32;
    format datatype=mixed (dna:1-14, standard:15-32) missing=? gap=-;
    Matrix
    species_1: ACCATGATACGATG001001010201001001
    species_2: GGTTTCGACGCAGA002010200210120201
    species_3: GGACTCAGCGACTA??????????????????
    species_4: ??????????????001001110000000101
    species_5: ??????????????111001001001000201
    end;
    The number of species and characters can be quite large.

      In that case the hash of hash is exactly appropriate and it looks like my sample code should drop right into your application. Happy to help.

      Update: there is enough information to generate the header too: ;)

      ...
      my @species = sort keys %data;
      my $nSpecies = @species;

      # Print the header
      print "begin data;\n";
      printf "dimensions ntax=%d nchar=%d;\n", $nSpecies, $dnaLen + $morphLen;
      printf "format datatype=mixed (dna:1-%d, standard:%d-%d) missing=? gap=-;\n",
          $dnaLen, $dnaLen + 1, $dnaLen + $morphLen;
      print "Matrix\n";

      # Print the data
      for my $species (sort keys %data) {
          $data{$species}{dna} ||= '?' x $dnaLen;
          $data{$species}{morph} ||= '?' x $morphLen;
          print "$species: $data{$species}{dna}$data{$species}{morph}\n";
      }

      print "end;\n";

      Prints:

      begin data;
      dimensions ntax=5 nchar=32;
      format datatype=mixed (dna:1-14, standard:15-32) missing=? gap=-;
      Matrix
      species_1: ACCATGATACGATG001001010201001001
      species_2: GGTTTCGACGCAGA002010200210120201
      species_3: GGACTCAGCGACTA??????????????????
      species_4: ??????????????001001110000000101
      species_5: ??????????????111001001001000201
      end;

        You are wise GrandFather, I had tried something similar without the benefit of a comprehension of printf, which makes things a bit tidier. Cheers!
Re^2: Combining hashes of hahses?
by convenientstore (Pilgrim) on Nov 07, 2007 at 21:48 UTC
    GrandFather, what is the purpose of
    $dnaLen ||= length $2;
    At first I was just trying to make sense of the ||= notation, and I found that it is shorthand for $dnaLen = $dnaLen || length $2; But in this case, wouldn't $dnaLen never be anything other than 0 (false)? Am I not reading this correctly?

    UPDATE -- I guess it's being used here
    die "No dna data found" unless $dnaLen;
    die "No morph data found" unless $morphLen;

      $x ||= something; is commonly used to give $x a value if it hasn't one already (more correctly, if the current value is false). In the case cited it is to pick up the first non-zero length of a dna string. There is an implicit assumption that all dna strings are the same length.

      Note that when evaluating ||, Perl returns the value of whichever operand decides the result (not simply a true or false value), so if $x is false to start with, $x gets the value of 'something' regardless of what 'something' is. In particular, this trick can be used to set a scalar to a default string if the scalar hasn't been set already:

      my $error;
      ...
      $error ||= 'No error found';
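As a small illustration of the "false, not just unset" caveat, the sketch below compares ||= with the defined-or assignment //=, which is available from Perl 5.10 onward. All variable names are made up for the example.

```perl
use strict;
use warnings;

# || returns the operand that decided the result, not just 1 or 0.
my $empty = '';
my $set   = 'set';
my $first = $empty || 'fallback';    # 'fallback'
my $kept  = $set   || 'fallback';    # 'set'

# ||= replaces any false value, including a legitimate 0 or ''.
my $count = 0;
$count ||= 10;                       # $count is now 10

# //= (Perl 5.10 and later) replaces only undef.
my $zero = 0;
$zero //= 10;                        # $zero stays 0
my $unset;
$unset //= 10;                       # $unset is now 10

print "$first $kept $count $zero $unset\n";
```

For lengths and default messages, as in this thread, ||= behaves as intended; //= is the safer choice when 0 or the empty string is a valid value.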

        thank you always