trouble understanding boss's code

DanielM0412 has asked for the wisdom of the Perl Monks concerning the following question:

this code is reading in a 4by4 matrix, ( related to biology ) here is the matrix

0.95 0.02 0.07 0.07 #A
0.03 0.01 0.06 0.83 #C
0.01 0.02 0.80 0.09 #G
0.01 0.95 0.07 0.01 #T
[download]

my $filepath = "OP.txt";
my $newdata = readdt2($filepath);

sub readdt2 {
  my $ifn = shift; # how does this work
  open(my $IFH, "<$ifn") or die "cannot open file $ifn\n";
  my $line;
  my @nucleotides = ("A","C","G","T");
  my %ret; #why did he name a hash return?? that really confused me
  my @tmp;
  for(my $j=0;$j<4;$j++) {
    $ret{$nucleotides[$j]} = []; #dont understand 
    $line = <$IFH>;
    chomp($line);
    @tmp = split(/\s+/,$line);
    for(my $i=0;$i<=$#tmp;$i++) {
      $ret{$nucleotides[$j]}->[$i]= $tmp[$i] ;  #dont understand 
    }
  }
  close($IFH);
  return(\%ret);
}
[download]

what really threw me off was why did he name a hash $ret which is short for return thanks in advance! P.S. how can i make this code more flexible? say it reads in a matrix with unknown dimensions H, and K

Comment on trouble understanding boss's code Select or Download Code

Replies are listed 'Best First'.
Re: trouble understanding boss's code by Util (Priest) on Jul 20, 2011 at 17:16 UTC
It is a common practice, although not always the best idea. I often see $r or $ret (or @r, %r, @ret, %ret, etc) as the name of whatever variable will ultimately contain the value(s) to be returned. Often, the variable should instead be named in a way that makes the name itself self-document its contents or its use. I cannot tell if this would be one of those cases. Untested, equivalent code (just because I read via refactoring): `sub read_dt2 { my ($path) = @_; open my $IFH, '<', $path or die "cannot open file '$path': $!"; my %mysterious_hash_keyed_by_nucleotide; for my $nucleotide ( qw( A C G T ) ) { local $_ = <$IFH>; $mysterious_hash_keyed_by_nucleotide{$nucleotide} = [split]; } close $IFH or warn "cannot close file '$path': $!"; return \%mysterious_hash_keyed_by_nucleotide; }` [download]	[reply] [d/l]
Re^2: trouble understanding boss's code by DanielM0412 (Acolyte) on Jul 20, 2011 at 18:01 UTC
thanks, i will definately try it out	[reply]
Re: trouble understanding boss's code by SuicideJunkie (Vicar) on Jul 20, 2011 at 17:27 UTC
Well, he does return a reference to that hash at the end, so that part makes sense. Comment1: The `shift` at the beginning is implicitly shifting off of `@_` to pull the first parameter to the function into $ifn. The open would be better written as: `open my $IFH, '<', $ifn or die "Cannot open file '$ifn': reason = $!\n";` so that you can see WHY it failed to open via the $!. Comment2: Initializing $ret{A} through $ret{T} to be new arrayrefs as they come up in the loop. The loop would be better written as `for my $j (@nucleotides) {...}`, and then use `$j` in place of `$nucleotides[$j]` Comment3: The -> is redundant. But it seems to me the whole darn $i loop is redundant too. It could be more simply written as `$ret{$nucleotides[$j]} = split(/\s+/, $line);` (note the comment 2 replacement could simplify the $j here too). My guess for this one is that the Boss is thinking C, and is doing a manual memcpy(). Putting the first line of the file into $ret{A} and the fourth line into $ret{T} seems very odd, but I've no idea what the later use is, so maybe it makes sense.	[reply] [d/l] [select]
Re^2: trouble understanding boss's code by DanielM0412 (Acolyte) on Jul 20, 2011 at 18:39 UTC
wow, thanks a lot, i shortened it significantly, `sub readdt2 { my $ifn = shift; open(my $IFH, "<$ifn") or die "cannot open file $ifn\n"; my $line; my @nt = ("A","C","G","T"); my %ret; my @tmp; for my $j(@nt) { $ret{$j} = []; $line = <$IFH>; chomp($line); @tmp = split(/\s+/,$line); for (my $i=0; $i<=$#tmp; $i++ ) { $ret{$j}[$i]= $tmp[$i] +0.0001; } } close($IFH); return(\%ret); }` [download] but when i used a data dumper the numbers didnt add up to 1.00, "A" added up to 1.11, "T" added up to 1.04, "C" added up to .93, G added up to .92 does anything stick out to you?	[reply] [d/l]
Re^3: trouble understanding boss's code by SuicideJunkie (Vicar) on Jul 20, 2011 at 18:51 UTC
According to the sample data you posted: `0.95 0.02 0.07 0.07 #A 0.03 0.01 0.06 0.83 #C 0.01 0.02 0.80 0.09 #G 0.01 0.95 0.07 0.01 #T` [download] Each column adds up to 1.00, but each row adds up to an arbitrary value. Since you're putting 0.95, 0.02, 0.07 and 0.07 into the A array, it makes sense that the A array adds up to 1.11 :) PS: Why are you adding 0.0001 to your data? Printing the output with appropriate `printf`s should round the values off nicely and hide the artifacts from the floating point math in the CPU.	[reply] [d/l] [select]
Re^4: trouble understanding boss's code by DanielM0412 (Acolyte) on Jul 20, 2011 at 19:04 UTC
Re^5: trouble understanding boss's code by DanielM0412 (Acolyte) on Jul 20, 2011 at 19:11 UTC
Re^2: trouble understanding boss's code by Anonymous Monk on Jul 21, 2011 at 11:27 UTC
`$ret{$nucleotides[$j]} = split(/\s+/, $line); # ^ ^` [download] Missed to turn split() result in to an array reference.	[reply] [d/l]
Re: trouble understanding boss's code by runrig (Abbot) on Jul 20, 2011 at 17:18 UTC
The variable ret is what is being returned from the function, so it's a decent enough if not a great name.	[reply]