DanielM0412 has asked for the wisdom of the Perl Monks concerning the following question:

this code is reading in a 4by4 matrix, ( related to biology ) here is the matrix

0.95 0.02 0.07 0.07 #A 0.03 0.01 0.06 0.83 #C 0.01 0.02 0.80 0.09 #G 0.01 0.95 0.07 0.01 #T
my $filepath = "OP.txt"; my $newdata = readdt2($filepath); sub readdt2 { my $ifn = shift; # how does this work open(my $IFH, "<$ifn") or die "cannot open file $ifn\n"; my $line; my @nucleotides = ("A","C","G","T"); my %ret; #why did he name a hash return?? that really confused me my @tmp; for(my $j=0;$j<4;$j++) { $ret{$nucleotides[$j]} = []; #dont understand $line = <$IFH>; chomp($line); @tmp = split(/\s+/,$line); for(my $i=0;$i<=$#tmp;$i++) { $ret{$nucleotides[$j]}->[$i]= $tmp[$i] ; #dont understand } } close($IFH); return(\%ret); }

what really threw me off was why did he name a hash $ret which is short for return thanks in advance! P.S. how can i make this code more flexible? say it reads in a matrix with unknown dimensions H, and K

Replies are listed 'Best First'.
Re: trouble understanding boss's code
by Util (Priest) on Jul 20, 2011 at 17:16 UTC

    It is a common practice, although not always the best idea. I often see $r or $ret (or @r, %r, @ret, %ret, etc) as the name of whatever variable will ultimately contain the value(s) to be returned. Often, the variable should instead be named in a way that makes the name itself self-document its contents or its use. I cannot tell if this would be one of those cases.

    Untested, equivalent code (just because I read via refactoring):

    sub read_dt2 { my ($path) = @_; open my $IFH, '<', $path or die "cannot open file '$path': $!"; my %mysterious_hash_keyed_by_nucleotide; for my $nucleotide ( qw( A C G T ) ) { local $_ = <$IFH>; $mysterious_hash_keyed_by_nucleotide{$nucleotide} = [split]; } close $IFH or warn "cannot close file '$path': $!"; return \%mysterious_hash_keyed_by_nucleotide; }

      thanks, i will definately try it out
Re: trouble understanding boss's code
by SuicideJunkie (Vicar) on Jul 20, 2011 at 17:27 UTC

    Well, he does return a reference to that hash at the end, so that part makes sense.

    Comment1: The shift at the beginning is implicitly shifting off of @_ to pull the first parameter to the function into $ifn. The open would be better written as: open my $IFH, '<', $ifn or die "Cannot open file '$ifn': reason = $!\n"; so that you can see WHY it failed to open via the $!.

    Comment2: Initializing $ret{A} through $ret{T} to be new arrayrefs as they come up in the loop. The loop would be better written as for my $j (@nucleotides) {...}, and then use $j in place of $nucleotides[$j]

    Comment3: The -> is redundant. But it seems to me the whole darn $i loop is redundant too. It could be more simply written as $ret{$nucleotides[$j]} = split(/\s+/, $line); (note the comment 2 replacement could simplify the $j here too). My guess for this one is that the Boss is thinking C, and is doing a manual memcpy().

    Putting the first line of the file into $ret{A} and the fourth line into $ret{T} seems very odd, but I've no idea what the later use is, so maybe it makes sense.

      wow, thanks a lot, i shortened it significantly,

      sub readdt2 { my $ifn = shift; open(my $IFH, "<$ifn") or die "cannot open file $ifn\n"; my $line; my @nt = ("A","C","G","T"); my %ret; my @tmp; for my $j(@nt) { $ret{$j} = []; $line = <$IFH>; chomp($line); @tmp = split(/\s+/,$line); for (my $i=0; $i<=$#tmp; $i++ ) { $ret{$j}[$i]= $tmp[$i] +0.0001; } } close($IFH); return(\%ret); }

      but when i used a data dumper the numbers didnt add up to 1.00, "A" added up to 1.11, "T" added up to 1.04, "C" added up to .93, G added up to .92 does anything stick out to you?

        According to the sample data you posted:

        0.95 0.02 0.07 0.07 #A 0.03 0.01 0.06 0.83 #C 0.01 0.02 0.80 0.09 #G 0.01 0.95 0.07 0.01 #T

        Each column adds up to 1.00, but each row adds up to an arbitrary value. Since you're putting 0.95, 0.02, 0.07 and 0.07 into the A array, it makes sense that the A array adds up to 1.11 :)

        PS: Why are you adding 0.0001 to your data? Printing the output with appropriate printfs should round the values off nicely and hide the artifacts from the floating point math in the CPU.

      $ret{$nucleotides[$j]} = split(/\s+/, $line); # ^ ^

      Missed to turn split() result in to an array reference.

Re: trouble understanding boss's code
by runrig (Abbot) on Jul 20, 2011 at 17:18 UTC
    The variable ret is what is being returned from the function, so it's a decent enough if not a great name.