in reply to trouble understanding boss's code

Well, he does return a reference to that hash at the end, so that part makes sense.

Comment1: The shift at the beginning implicitly shifts off of @_, pulling the first parameter to the function into $ifn. The open would be better written as: open my $IFH, '<', $ifn or die "Cannot open file '$ifn': reason = $!\n"; so that you can see WHY it failed to open, via $!.
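To make that concrete, here is a minimal sketch of the three-argument open with $! in the error message. The sub name first_line and the path are made up for illustration; they are not from the Boss's code.

```perl
use strict;
use warnings;

# Hypothetical helper: open a file for reading and return its first line.
sub first_line {
    my $ifn = shift;    # same as shift(@_): the first argument to the sub

    # Three-argument open with a lexical filehandle; $! carries the OS error.
    open my $IFH, '<', $ifn
        or die "Cannot open file '$ifn': reason = $!\n";

    my $line = <$IFH>;
    close $IFH;
    return $line;
}

# A path that presumably does not exist, so the die message shows the reason.
my $ok = eval { first_line('/no/such/file.txt'); 1 };
print "open failed: $@" unless $ok;
```

The eval is only there so the example keeps running after the die; in the real sub you would most likely just let it die.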

Comment2: This initializes $ret{A} through $ret{T} to new arrayrefs as they come up in the loop. The loop would be better written as for my $j (@nucleotides) {...}, using $j in place of $nucleotides[$j].
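A small sketch of that loop, assuming @nucleotides holds the four letters in the order A, C, G, T (the rest of the thread suggests it does, but I have not seen the declaration):

```perl
use strict;
use warnings;

# Assumed contents of @nucleotides.
my @nucleotides = ('A', 'C', 'G', 'T');
my %ret;

# Iterate over the elements themselves rather than over their indexes.
for my $j (@nucleotides) {
    $ret{$j} = [];    # a fresh, empty array reference per nucleotide
}

print join(',', sort keys %ret), "\n";
```

Here $j is the letter itself, so there is no index to look up.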

Comment3: The -> is redundant. But it seems to me the whole darn $i loop is redundant too. It could be more simply written as $ret{$nucleotides[$j]} = split(/\s+/, $line); (note the comment 2 replacement could simplify the $j here too). My guess for this one is that the Boss is thinking C, and is doing a manual memcpy().

Putting the first line of the file into $ret{A} and the fourth line into $ret{T} seems very odd, but I've no idea what the later use is, so maybe it makes sense.

Re^2: trouble understanding boss's code
by DanielM0412 (Acolyte) on Jul 20, 2011 at 18:39 UTC

    Wow, thanks a lot. I shortened it significantly:

    sub readdt2 {
        my $ifn = shift;
        open(my $IFH, "<$ifn") or die "cannot open file $ifn\n";
        my $line;
        my @nt = ("A","C","G","T");
        my %ret;
        my @tmp;
        for my $j (@nt) {
            $ret{$j} = [];
            $line = <$IFH>;
            chomp($line);
            @tmp = split(/\s+/,$line);
            for (my $i=0; $i<=$#tmp; $i++) {
                $ret{$j}[$i] = $tmp[$i] + 0.0001;
            }
        }
        close($IFH);
        return(\%ret);
    }

    But when I dumped the result with Data::Dumper, the numbers didn't add up to 1.00: "A" added up to 1.11, "T" to 1.04, "C" to 0.93, and "G" to 0.92. Does anything stick out to you?

      According to the sample data you posted:

      0.95 0.02 0.07 0.07 #A
      0.03 0.01 0.06 0.83 #C
      0.01 0.02 0.80 0.09 #G
      0.01 0.95 0.07 0.01 #T

      Each column adds up to 1.00, but each row adds up to an arbitrary value. Since you're putting 0.95, 0.02, 0.07 and 0.07 into the A array, it makes sense that the A array adds up to 1.11 :)
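      If you want to check this yourself, here is a rough sketch that sums both ways. The hash is typed in by hand from the sample data above (without the +0.0001), so it stands in for what readdt2 returns rather than calling it.

      ```perl
      use strict;
      use warnings;
      use List::Util qw(sum);

      # The four rows of the sample data, keyed by nucleotide.
      my %ret = (
          A => [0.95, 0.02, 0.07, 0.07],
          C => [0.03, 0.01, 0.06, 0.83],
          G => [0.01, 0.02, 0.80, 0.09],
          T => [0.01, 0.95, 0.07, 0.01],
      );

      # Row sums: one total per nucleotide.
      for my $nt (qw(A C G T)) {
          printf "row %s: %.2f\n", $nt, sum(@{ $ret{$nt} });
      }

      # Column sums: one total per position, across all four nucleotides.
      for my $col (0 .. 3) {
          printf "col %d: %.2f\n", $col, sum(map { $ret{$_}[$col] } qw(A C G T));
      }
      ```

      The row totals are the numbers you saw in the dump; the column totals are the ones that come to 1.00.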

      PS: Why are you adding 0.0001 to your data? Printing the output with appropriate printfs should round the values off nicely and hide the artifacts from the floating point math in the CPU.

        Oh okay, I see what you mean, thanks. The .0001 is supposed to represent the noise factor, I think.
Re^2: trouble understanding boss's code
by Anonymous Monk on Jul 21, 2011 at 11:27 UTC
    $ret{$nucleotides[$j]} = split(/\s+/, $line);
    #                        ^                  ^

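    You missed turning the split() result into an array reference. Assigned to a scalar like this, split() gives you the number of fields rather than the fields themselves, so the right-hand side needs square brackets around it: [ split(/\s+/, $line) ].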
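    A quick sketch of the difference, assuming a reasonably recent Perl (very old versions also split into @_ in scalar context) and using the first sample row as $line:

    ```perl
    use strict;
    use warnings;

    my $line = "0.95 0.02 0.07 0.07";

    # Scalar assignment: split() returns the number of fields, not the fields.
    my $count = split(/\s+/, $line);

    # Wrapping the call in [ ] builds an anonymous array and yields a reference.
    my $aref = [ split(/\s+/, $line) ];

    print "count: $count\n";
    print "fields: @$aref\n";
    ```

    Only the second form gives you something you can later index as $ret{A}[0].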