in reply to can i avoid all these nested hashes

It may help to use objects instead of explicit hashes. The hashes are still effectively there and the nested structure is still effectively there, but you can provide methods that give friendly ways of managing the data that hide the underlying complexity. Consider:

#!/usr/bin/perl
use strict;
use warnings;

package SNP;

sub new {
    my ($class, %params) = @_;

    die "id parameter required by $class constructor\n"
        if !exists $params{id};
    $params{genes} ||= {};
    return bless \%params, $class;
}

sub addGene {
    my ($self, $geneId, %params) = @_;

    # Create the gene on first use, then hand back the existing object
    $self->{genes}{$geneId} ||= Gene->new(id => $geneId, %params);
    return $self->{genes}{$geneId};
}

# Returns an empty string on a match, otherwise a description of the mismatch
sub match {
    my ($self, $other) = @_;
    my @otherGenes = sort keys %{$other->{genes}};
    my @genes      = sort keys %{$self->{genes}};
    my $result     = '';

    return "$self->{id} and $other->{id} differ in number of genes.\n"
        if @otherGenes != @genes;    # different number of genes

    # Check all genes match
    for my $gene (@genes) {
        my $matchFail = $self->{genes}{$gene}->match($other->{genes}{$gene});

        next if !$matchFail;
        $result .= "- Gene mismatch for $gene:\n";
        $result .= $matchFail;
    }

    if ($result) {
        $result = "$self->{id} does not match $other->{id}:\n" . $result;
    }

    return $result;
}

package Gene;

sub new {
    my ($class, %params) = @_;

    die "id parameter required by $class constructor\n"
        if !exists $params{id};
    $params{trans} ||= {};
    return bless \%params, $class;
}

sub addTranscript {
    my ($self, $transId, %params) = @_;

    $self->{trans}{$transId} ||= Transcript->new(id => $transId, %params);
    return $self->{trans}{$transId};
}

sub match {
    my ($self, $other) = @_;
    my @otherTrans = sort keys %{$other->{trans}};
    my @trans      = sort keys %{$self->{trans}};
    my $result     = '';

    return "$self->{id} and $other->{id} differ in number of transcripts\n"
        if @otherTrans != @trans;    # different number of transcripts

    # Check all transcripts match
    for my $transName (@trans) {
        my $matchFail =
            $self->{trans}{$transName}->match($other->{trans}{$transName});

        next if !$matchFail;
        $result .= "-- Transcript mismatch for $transName:\n";
        $result .= $matchFail;
    }

    return $result;
}

package Transcript;

sub new {
    my ($class, %params) = @_;

    die "id parameter required by $class constructor\n"
        if !exists $params{id};
    $params{props} ||= {};
    return bless \%params, $class;
}

sub setProp {
    my ($self, $prop, $value) = @_;

    $self->{props}{$prop} = $value;
}

sub match {
    my ($self, $other) = @_;
    my @otherProps = sort keys %{$other->{props}};
    my @props      = sort keys %{$self->{props}};
    my $result     = '';

    # A differing property count is a mismatch, so report it
    return "--- $self->{id} and $other->{id} differ in number of properties\n"
        if @otherProps != @props;    # different number of properties

    # Check all properties match
    for my $propName (@props) {
        if (!defined $other->{props}{$propName}) {
            $result .= "$self->{id} has $propName but $other->{id} doesn't\n";
            next;
        }

        if ($self->{props}{$propName} ne $other->{props}{$propName}) {
            $result .=
                "--- $self->{id} and $other->{id} property $propName differs:\n";
            $result .=
                "    '$self->{props}{$propName}' and '$other->{props}{$propName}'\n";
            next;
        }
    }

    return $result;
}

package main;

my $snp1  = SNP->new(id => 'SNP1');
my $gene  = $snp1->addGene('Gene1');
my $trans = $gene->addTranscript('Trans1');

$trans->setProp(big   => 1);
$trans->setProp(color => 'blue');

$gene  = $snp1->addGene('Gene2');
$trans = $gene->addTranscript('Trans2');
$trans->setProp(big   => 1);
$trans->setProp(color => 'green');

my $snp2 = SNP->new(id => 'SNP2');
$gene  = $snp2->addGene('Gene1');
$trans = $gene->addTranscript('Trans1');
$trans->setProp(big   => 1);
$trans->setProp(color => 'blue');

$gene  = $snp2->addGene('Gene2');
$trans = $gene->addTranscript('Trans2');
$trans->setProp(big   => 1);
$trans->setProp(color => 'blue');

print $snp1->match($snp2);

Prints:

SNP1 does not match SNP2:
- Gene mismatch for Gene2:
-- Transcript mismatch for Trans2:
--- Trans2 and Trans2 property color differs:
    'green' and 'blue'
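
To make the point that the hashes are still there underneath, here is a minimal standalone sketch. It assumes a cut-down class called MiniTranscript and a getProp accessor, both hypothetical names used only for illustration and not part of the code above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, cut-down stand-in for a transcript class (illustration only)
package MiniTranscript;

sub new {
    my ($class, %params) = @_;

    $params{props} ||= {};
    return bless \%params, $class;
}

sub setProp {
    my ($self, $prop, $value) = @_;

    $self->{props}{$prop} = $value;
    return $self;
}

# getProp is an assumed accessor added for this sketch
sub getProp {
    my ($self, $prop) = @_;

    return $self->{props}{$prop};
}

package main;

my $trans = MiniTranscript->new(id => 'Trans1');
$trans->setProp(color => 'blue');

# The object is a blessed hash reference, so both lines read the same slot
print $trans->getProp('color'), "\n";
print $trans->{props}{color}, "\n";
```

Both print statements show blue: the method is only a friendlier name for the same nested lookup, which is what lets the storage change later without touching the callers.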
True laziness is hard work

Re^2: can i avoid all these nested hashes
by Anonymous Monk on Dec 13, 2010 at 01:51 UTC

    That's clumsier, slower, and uses 4 times as much memory.

    You're suggesting replacing one line of built-in code

    $snp{'Gene1'}{'Trans1'}{'color'} = 'blue';

    with

    my $snp1 = SNP->new(id => 'SNP1');
    my $gene = $snp1->addGene('Gene1');
    my $trans = $gene->addTranscript('Trans1');
    $trans->setProp(big => 1);

    Plus 70 lines of code that do nothing more than replicate built-in functionality. More code means more bugs.

    The very definition of assinine.
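
    For comparison, a minimal sketch of the plain nested-hash approach, assuming every leaf value is a defined scalar. The diff_hashes helper and its path-style output are hypothetical, introduced here only to show what a comparison over bare hashes might look like:

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical helper: walk two nested hashes and list where they differ.
    # Assumes leaf values are defined scalars.
    sub diff_hashes {
        my ($left, $right, $path) = @_;
        $path = '' if !defined $path;

        my (@diffs, %seen);
        my @keys = grep { !$seen{$_}++ } (keys %$left, keys %$right);

        for my $key (sort @keys) {
            my $where = $path eq '' ? $key : "$path/$key";

            if (!exists $left->{$key} || !exists $right->{$key}) {
                push @diffs, "$where: present on one side only";
            }
            elsif (ref($left->{$key}) eq 'HASH' && ref($right->{$key}) eq 'HASH') {
                # Both sides nest further, so recurse one level down
                push @diffs, diff_hashes($left->{$key}, $right->{$key}, $where);
            }
            elsif ($left->{$key} ne $right->{$key}) {
                push @diffs, "$where: '$left->{$key}' vs '$right->{$key}'";
            }
        }

        return @diffs;
    }

    my (%snp1, %snp2);
    $snp1{Gene1}{Trans1}{color} = 'blue';    # intermediate levels autovivify
    $snp1{Gene2}{Trans2}{color} = 'green';
    $snp2{Gene1}{Trans1}{color} = 'blue';
    $snp2{Gene2}{Trans2}{color} = 'blue';

    print "$_\n" for diff_hashes(\%snp1, \%snp2);
    ```

    This prints Gene2/Trans2/color: 'green' vs 'blue'. The helper knows nothing about genes or transcripts, which is both its appeal and its limit.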

      At first glance, what Grandfather did above looks a lot like "copy/paste" coding, and maybe even "OOP for OOP's sake". But you don't have to look all that closely at his code, and you don't have to think very long about the domain of the problem, to understand that Grandfather is actually making a smart investment up front, in anticipation of what could end up being a fairly diverse set of problems to be addressed.

      I'm all for using expedients whenever possible, and I would have probably used BrowserUK's approach myself, if I knew that there was just the one question to be answered (and quickly) about the data. But putting in a foundation for future work is not "assinine" [sic].

        Ask your average haulier if it is a "sound investment" to send an 8mpg 18-wheeler to pick up a 1/4 tonne of goods from a new customer, in anticipation that they might become a 25-tonne-per-trip customer sometime in the future.