in reply to Multidimensional hash help!

G'day ila14,

Welcome to the monastery.

My biggest problem with this is determining the structure of the multidimensional hash you're trying to create.

Your description, "I would like to hash: % hash1 = Column1 => Column 2. %hash2 = %hash1 => Column3. %hash3 = %hash2 => Column 4.", conveys no real meaning. Your code is not helpful either: given you've stated "the syntax is not correct", this isn't too surprising.

From the code you've posted, I suspect you'd benefit from reading "perlintro -- a brief introduction and overview of Perl".

For information on data structures, I suggest you read "perldsc - Perl Data Structures Cookbook"; paying particular attention to the "HASHES OF HASHES" section.

The actual code you need may be as simple as this:

#!/usr/bin/env perl

use strict;
use warnings;
use autodie;

use constant {
    ID => 0,
    SYMBOL => 1,
    GO_ID => 5,
    GO_NAME => 6,
};

my $file = './pm_1076856.tsv';
my %go_hash;

open my $fh, '<', $file;

while (<$fh>) {
    next if $. == 1;
    my @cols = split /\t/;
    $go_hash{$cols[SYMBOL]}{$cols[ID]}{$cols[GO_ID]} = $cols[GO_NAME];
}

use Data::Dump;
dd \%go_hash;

The file pm_1076856.tsv contains the input data you posted. Here's the output after running my example script:

{
  Symbol1 => { H1SXX9 => { "GO:0015031" => "protein transport" } },
  Symbol2 => {
    H1SXZ5 => {
      "GO:0003824" => "catalytic activity",
      "GO:0008152" => "metabolic process",
      "GO:0008728" => "GTP diphosphokinase activity",
      "GO:0015969" => "guanosine tetraphosphate metabolic process",
      "GO:0016301" => "kinase activity",
      "GO:0016310" => "phosphorylation",
      "GO:0016597" => "amino acid binding",
      "GO:0016740" => "transferase activity",
    },
  },
  Symbol3 => {
    H1SXZ8 => {
      "GO:0006812" => "cation transport",
      "GO:0008324" => "cation transmembrane transporter activity",
      "GO:0030001" => "metal ion transport",
      "GO:0046872" => "metal ion binding",
      "GO:0055085" => "transmembrane transport",
    },
  },
  Symbol4 => {
    H1SY02 => {
      "GO:0006810" => "transport",
      "GO:0008565" => "protein transporter activity",
      "GO:0015031" => "protein transport",
    },
  },
  Symbol5 => {
    H1SY06 => {
      "GO:0004129" => "cytochrome-c oxidase activity",
      "GO:0005506" => "iron ion binding",
    },
  },
}

If that's close to what you want, try changing the hash depth and @cols indices to get whatever you require.
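As a rough sketch of what "changing the hash depth and @cols indices" could look like, the snippet below builds the nested hash from a list of column indices, so the depth is simply the number of indices passed in. The helper name build_nested and the in-memory @rows are assumptions made for illustration (they are not part of the script above), and the 'x' fields are placeholders for columns whose content isn't shown in this thread.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): builds a nested hash
# with one level per column index in @$key_idx; the column at $val_idx
# becomes the innermost value.
sub build_nested {
    my ($rows, $key_idx, $val_idx) = @_;
    my %hash;
    for my $row (@$rows) {
        my @cols = split /\t/, $row;
        my @keys = @cols[@$key_idx];
        my $last = pop @keys;
        my $node = \%hash;
        for my $key (@keys) {
            $node->{$key} //= {};    # create the next level on first use
            $node = $node->{$key};
        }
        $node->{$last} = $cols[$val_idx];
    }
    return \%hash;
}

# Two records matching the output above; 'x' marks placeholder columns.
my @rows = (
    join("\t", 'H1SY06', 'Symbol5', 'x', 'x', 'x', 'GO:0004129', 'cytochrome-c oxidase activity'),
    join("\t", 'H1SY06', 'Symbol5', 'x', 'x', 'x', 'GO:0005506', 'iron ion binding'),
);

my $two_levels   = build_nested(\@rows, [1, 5], 6);       # SYMBOL => GO_ID
my $three_levels = build_nested(\@rows, [1, 0, 5], 6);    # SYMBOL => ID => GO_ID

print $two_levels->{Symbol5}{'GO:0005506'}, "\n";
print $three_levels->{Symbol5}{H1SY06}{'GO:0004129'}, "\n";
```

Passing [1, 0, 5] reproduces the SYMBOL, ID, GO_ID nesting of the script above; adding or removing an index changes the depth without touching the loop.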

If that's completely different from what you're after, and you still can't work out what code you need, reduce your example data to a more manageable size for demonstration purposes (maybe half a dozen records) and post the actual data structure you require (something along the lines of my posted output would be preferable).

Also take a look at the guidelines in "How do I post a question effectively?" for hints and tips on what you can do to help us to help you.

-- Ken

Re^2: Multidimensional hash help!
by ila14 (Initiate) on Mar 05, 2014 at 12:41 UTC

    I do have a further question. I tried to manually traverse my hash so that I can print the keys separated by newlines and tabs; however, because the hash has more than two dimensions, I am having trouble (http://perlmaven.com/multi-dimensional-hashes). Here is the piece of code that I wrote.

    #foreach my $symb (sort keys %go_hash) {
    #foreach my $UniID (keys %{ $go_hash{$symb} }) {
    #foreach my $TaX (keys %{ $go_hash{$symb}{$UniID} }) {
    #print "$symb, $UniID, $go_hash{$symb}{$UniID}{$TaX}\n";
    #}
    #}
    #}

    and here is a sample output:

    Hey, HSWZH7, HASH(0x7fdeb30dc310)
    how, HSX0L1, HASH(0x7fdeb3169768)
    are, HSX1I1, HASH(0x7fdeb31784b0)
    you, HSX4J3, HASH(0x7fdeb31784b0)

    The "prettiest" I have made my output using Data::Dumper so far is to set the indent to 1 and the pair to "\t", as shown below.

    $Data::Dumper::Pair = " \t ";
    $Data::Dumper::Indent = 1;

    Thanks again.

      Thanks for fixing the formatting; however, instead of creating a new post you can just edit the original (see "How do I change/delete my post?"). Don't worry about the original (Re^2: Multidimensional hash help!): I've requested that it be reaped.

      When you post, please be specific about what you're doing. In this case, I've guessed $symb refers to the Symbol column. $UniID and $TaX are unclear (there are two columns with ID and two with Taxon): UniID and TaX may be standard abbreviations where you work (or generally in your industry), but I don't work in your industry, nor do most of the people here who could help you.

      The number of levels of the hash shouldn't be an issue: I'm guessing another nested for loop would've accessed all the data. In the script below, I've added another level (to what I had in my previous script) and shown how to print the fields. [For future reference, you'll find logically indenting your code makes it a lot easier to read and maintain (compare your code with mine).]
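The HASH(0x...) values in the posted output are what Perl prints when a hash reference is used as a string, which suggests the loops stopped one level short of the actual values. As an alternative to adding one loop per level, here is a minimal sketch of a recursive walk that keeps descending while the current value is a hash reference. The sub name flatten is hypothetical, and the sample data is a cut-down copy of the structure discussed in this thread.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: returns one array ref of [key1, key2, ..., value]
# per leaf, descending for as long as the current value is a hash reference.
sub flatten {
    my ($node, @path) = @_;
    return [@path, $node] unless ref($node) eq 'HASH';
    return map { flatten($node->{$_}, @path, $_) } sort keys %$node;
}

# Cut-down sample with the same four-level shape as %go_hash.
my %go_hash = (
    Symbol1 => { H1SXX9 => { 'Homo Sapiens' => {
        'GO:0015031' => 'protein transport',
    } } },
    Symbol5 => { H1SY06 => { 'Homo Sapiens' => {
        'GO:0004129' => 'cytochrome-c oxidase activity',
        'GO:0005506' => 'iron ion binding',
    } } },
);

# One tab-separated line per leaf, whatever the depth.
print join("\t", @$_), "\n" for flatten(\%go_hash);
```

The explicit nested loops are easier to follow when the depth is fixed and known; the recursive form only pays off when the depth varies or is likely to change.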

      Data::Dump (which I used in my previous script) is a CPAN module which you may need to install. Data::Dumper is a core module that ships with Perl. I've shown usage examples of both for comparison; you'll need to click on "Reveal this spoiler" to see the output.
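If installing Data::Dump isn't an option, Data::Dumper's output can be made somewhat tidier through its configuration variables. The sketch below uses $Data::Dumper::Indent, $Data::Dumper::Sortkeys and $Data::Dumper::Terse, all of which are documented Data::Dumper settings; the wrapper name dump_sorted and the sample data are assumptions for illustration only.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical wrapper: returns a Data::Dumper string with fixed indentation,
# sorted hash keys and no leading "$VAR1 = ". The settings are localised so
# they don't affect other Dumper calls.
sub dump_sorted {
    my ($data) = @_;
    local $Data::Dumper::Indent   = 1;
    local $Data::Dumper::Sortkeys = 1;
    local $Data::Dumper::Terse    = 1;
    return Dumper($data);
}

# Small sample in the same shape as the hash built in this thread.
my %sample = (
    Symbol5 => { H1SY06 => {
        'GO:0005506' => 'iron ion binding',
        'GO:0004129' => 'cytochrome-c oxidase activity',
    } },
);

print dump_sorted(\%sample);
```

Sorting the keys makes successive runs comparable, which helps when checking whether a change to the hash-building code had the intended effect.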

      #!/usr/bin/env perl

      use strict;
      use warnings;
      use autodie;

      use constant {
          ID => 0,
          SYMBOL => 1,
          TAXON_NAME => 3,
          GO_ID => 5,
          GO_NAME => 6,
      };

      my $file = './pm_1076856.tsv';
      my %go_hash;

      open my $fh, '<', $file;

      while (<$fh>) {
          next if $. == 1;
          my @cols = split /\t/;
          $go_hash{$cols[SYMBOL]}{$cols[ID]}{$cols[TAXON_NAME]}{$cols[GO_ID]} = $cols[GO_NAME];
      }

      close $fh;

      for my $symbol (sort keys %go_hash) {
          for my $id (sort keys %{$go_hash{$symbol}}) {
              for my $taxon_name (sort keys %{$go_hash{$symbol}{$id}}) {
                  for my $go_id (sort keys %{$go_hash{$symbol}{$id}{$taxon_name}}) {
                      print join("\t" => $symbol, $id, $taxon_name, $go_id,
                          $go_hash{$symbol}{$id}{$taxon_name}{$go_id}
                      ), "\n";
                  }
              }
          }
      }

      {
          print "\nData::Dumper Output:\n";
          use Data::Dumper;
          local $Data::Dumper::Indent = 1;
          print Dumper \%go_hash;
      }

      print "\nData::Dump Output:\n";
      use Data::Dump;
      dd \%go_hash;

      Output:

      Symbol1  H1SXX9  Homo Sapiens  GO:0015031  protein transport
      Symbol2  H1SXZ5  Homo Sapiens  GO:0003824  catalytic activity
      Symbol2  H1SXZ5  Homo Sapiens  GO:0008152  metabolic process
      Symbol2  H1SXZ5  Homo Sapiens  GO:0008728  GTP diphosphokinase activity
      Symbol2  H1SXZ5  Homo Sapiens  GO:0015969  guanosine tetraphosphate metabolic process
      Symbol2  H1SXZ5  Homo Sapiens  GO:0016301  kinase activity
      Symbol2  H1SXZ5  Homo Sapiens  GO:0016310  phosphorylation
      Symbol2  H1SXZ5  Homo Sapiens  GO:0016597  amino acid binding
      Symbol2  H1SXZ5  Homo Sapiens  GO:0016740  transferase activity
      Symbol3  H1SXZ8  Homo Sapiens  GO:0006812  cation transport
      Symbol3  H1SXZ8  Homo Sapiens  GO:0008324  cation transmembrane transporter activity
      Symbol3  H1SXZ8  Homo Sapiens  GO:0030001  metal ion transport
      Symbol3  H1SXZ8  Homo Sapiens  GO:0046872  metal ion binding
      Symbol3  H1SXZ8  Homo Sapiens  GO:0055085  transmembrane transport
      Symbol4  H1SY02  Homo Sapiens  GO:0006810  transport
      Symbol4  H1SY02  Homo Sapiens  GO:0008565  protein transporter activity
      Symbol4  H1SY02  Homo Sapiens  GO:0015031  protein transport
      Symbol5  H1SY06  Homo Sapiens  GO:0004129  cytochrome-c oxidase activity
      Symbol5  H1SY06  Homo Sapiens  GO:0005506  iron ion binding

      -- Ken

Re^2: Multidimensional hash help!
by ila14 (Initiate) on Mar 05, 2014 at 09:37 UTC
    Hello Ken,

    Thank you for your response. I am a biologist and have only started using perl over the past month so am feeling a little lost. Your code does help a lot and is similar to what I require. I would need to add an additional level to complete it, and I shall read the references you posted.

    Thank you.

    ila

      "I am a biologist and have only started using perl over the past month so am feeling a little lost."

      I recommend you bookmark "perl - The Perl 5 language interpreter".

      From this page, you'll find links to the documentation for all the built-in functions, modules and other parts of the language as well as FAQs, tutorials and other resources.

      Rather than attempting to read everything at once (a particularly daunting endeavour), I suggest you familiarise yourself with the various sections and what they provide: this is a much simpler task and will allow you to quickly access information as and when you need it.

      Having said that, you'd probably benefit from reading "perlintro -- a brief introduction and overview of Perl" in its entirety.

      -- Ken
