To_Bz has asked for the wisdom of the Perl Monks concerning the following question:

I have the following code to get the nucleotide frequency of DNA subfragments.
my $file1=shift; my @array=('A','T','C','G','AA','AT','AG','AC','TA','TT','TG','TC','GA +','GT','GG','GC','CA','CT','CG','CC','AAA','AAT','AAG','AAC','ATA','A +TT','ATG','ATC','AGA','AGT','AGG','AGC','ACA','ACT','ACG','ACC','TAA' +,'TAT','TAG','TAC','TTA','TTT','TTG','TTC','TGA','TGT','TGG','TGC','T +CA','TCT','TCG','TCC','GAA','GAT','GAG','GAC','GTA','GTT','GTG','GTC' +,'GGA','GGT','GGG','GGC','GCA','GCT','GCG','GCC','CAA','CAT','CAG','C +AC','CTA','CTT','CTG','CTC','CGA','CGT','CGG','CGC','CCA','CCT','CCG' +,'CCC','AAAA','AAAT','AAAG','AAAC','AATA','AATT','AATG','AATC','AAGA' +,'AAGT','AAGG','AAGC','AACA','AACT','AACG','AACC','ATAA','ATAT','ATAG +','ATAC','ATTA','ATTT','ATTG','ATTC','ATGA','ATGT','ATGG','ATGC','ATC +A','ATCT','ATCG','ATCC','AGAA','AGAT','AGAG','AGAC','AGTA','AGTT','AG +TG','AGTC','AGGA','AGGT','AGGG','AGGC','AGCA','AGCT','AGCG','AGCC','A +CAA','ACAT','ACAG','ACAC','ACTA','ACTT','ACTG','ACTC','ACGA','ACGT',' +ACGG','ACGC','ACCA','ACCT','ACCG','ACCC','TAAA','TAAT','TAAG','TAAC', +'TATA','TATT','TATG','TATC','TAGA','TAGT','TAGG','TAGC','TACA','TACT' +,'TACG','TACC','TTAA','TTAT','TTAG','TTAC','TTTA','TTTT','TTTG','TTTC +','TTGA','TTGT','TTGG','TTGC','TTCA','TTCT','TTCG','TTCC','TGAA','TGA +T','TGAG','TGAC','TGTA','TGTT','TGTG','TGTC','TGGA','TGGT','TGGG','TG +GC','TGCA','TGCT','TGCG','TGCC','TCAA','TCAT','TCAG','TCAC','TCTA','T +CTT','TCTG','TCTC','TCGA','TCGT','TCGG','TCGC','TCCA','TCCT','TCCG',' +TCCC','GAAA','GAAT','GAAG','GAAC','GATA','GATT','GATG','GATC','GAGA', +'GAGT','GAGG','GAGC','GACA','GACT','GACG','GACC','GTAA','GTAT','GTAG' +,'GTAC','GTTA','GTTT','GTTG','GTTC','GTGA','GTGT','GTGG','GTGC','GTCA +','GTCT','GTCG','GTCC','GGAA','GGAT','GGAG','GGAC','GGTA','GGTT','GGT +G','GGTC','GGGA','GGGT','GGGG','GGGC','GGCA','GGCT','GGCG','GGCC','GC +AA','GCAT','GCAG','GCAC','GCTA','GCTT','GCTG','GCTC','GCGA','GCGT','G +CGG','GCGC','GCCA','GCCT','GCCG','GCCC','CAAA','CAAT','CAAG','CAAC',' +CATA','CATT','CATG','CATC','CAGA','CAGT','CAGG','CAGC','CACA','CACT', +'CACG','CACC','CTAA','CTAT','CTAG','CTAC','CTTA','CTTT','CTTG','CTTC' +,'CTGA','CTGT','CTGG','CTGC','CTCA','CTCT','CTCG','CTCC','CGAA','CGAT +','CGAG','CGAC','CGTA','CGTT','CGTG','CGTC','CGGA','CGGT','CGGG','CGG +C','CGCA','CGCT','CGCG','CGCC','CCAA','CCAT','CCAG','CCAC','CCTA','CC +TT','CCTG','CCTC','CCGA','CCGT','CCGG','CCGC','CCCA','CCCT','CCCG','C +CCC'); my $name1=""; my $seq1=""; my %counts=(); my %counts_1=(); my %counts_2=(); my %counts_3=(); my %counts_4=(); my %total_mono=(); open (IN, "<$file1") or die ("Couldn't open file $file1\n"); while (my $i=<IN>){ next unless ($i =~ /\w+/); chomp($i); if ($i =~ /^>(\S+)/){ unless ($seq1 eq ""){ $seq1 =~ s/[^ATCG]//g; &process_nuc($seq1, $name1); } $seq1=""; $name1=$1; }else{ $seq1.=uc($i); } } close IN; $seq1 =~ s/[^ATCG]//g; &process_nuc($seq1, $name1); print "Matrix_"; print scalar(@array); for (my $k=0; $k<@array; $k++){ print "\t$array[$k]"; }print "\n"; my %norm_1=(); my %norm_2=(); my %norm_3=(); my %norm_4=(); foreach my $k (keys (%counts)){ print "$k"; 60 my $value=0; 61 $norm_1{'A'}=$counts_1{$k}{'A'}/$total_mono{$k}; 62 $norm_1{'T'}=$counts_1{$k}{'T'}/$total_mono{$k}; 63 $norm_1{'C'}=$counts_1{$k}{'C'}/$total_mono{$k}; 64 $norm_1{'G'}=$counts_1{$k}{'G'}/$total_mono{$k};

When I executed, I got results and the following error message:

Use of uninitialized value in division (/) at ./n_f.pl line 61.

Illegal division by zero at ./n_f.pl line 61.

I tried to use eval BLOCK as recomended in http://www.perlmonks.org/?node_id=369002, but it didnīt help that much because I get the same error message. I know that first message is a warning, but how can I know if this affects my results?.

Also, Iīm not quite sure if the second message is an error or a warning, so I donīt know if this made the script to stop.

Iīll appreciate the help.

Replies are listed 'Best First'.
Re: Uninitialized value in division and Illegal division by zero fix
by AppleFritter (Vicar) on Jul 09, 2014 at 17:47 UTC
Re: Uninitialized value in division and Illegal division by zero fix
by Old_Gray_Bear (Bishop) on Jul 09, 2014 at 17:42 UTC
    You are getting the error in line 61 because the denominator is zero, and division by zero is not defined (see the "Proof that 2 = 1" for example).

    The fix is to either:

    1. Check the denominator and skip the calculation if you are going to have a divide-by-zero.
    2. Debug the code and determine why $total_mono{$k} is not getting updated like you think it ought to be.

    ----
    I Go Back to Sleep, Now.

    OGB

Re: Uninitialized value in division and Illegal division by zero fix
by toolic (Bishop) on Jul 09, 2014 at 17:11 UTC
Re: Uninitialized value in division and Illegal division by zero fix
by PerlSufi (Friar) on Jul 09, 2014 at 18:46 UTC
    Listen to everything the other monks have previously told you as far as asking a question on here. It is not totally clear what the goal of your script is, either. You simply want to read a fasta file and calculate the percentage of occurances for each nucleotide? I have done this before, so I will try to help you.
    Also, It appears that you are trying to read a fasta file. Why are you shifting the file in the beginning? Please, do file handle like this instead:
    my $file = '/path/to/file.txt'; open (my $fh, '<', $file) or die "Could not open file '$file' $!";
    Update: So, if you delete the subroutine reference  &process_nuc from your first while loop, the code sucessfully reads a fasta file into  $seq1
    After that, you're gonna need to loop through it and increment each time a nucleo tide is found, then do your division for percentage.
Re: Uninitialized value in division and Illegal division by zero fix
by AnomalousMonk (Archbishop) on Jul 09, 2014 at 22:48 UTC

    It's not entirely clear to me what you're trying to do, but the
        my @array=('A','T','C','G','AA','AT','AG', ..., 'CCCA','CCCT','CCCG','CCCC');
    array and other code suggests you are iterating over every possible permutation of one through four of the  A T C G bases and searching a string for and counting every occurrence of each of these substrings.

    If so, there's an easier, probably faster, way. The following code can probably use more error checking, but as I don't know just what you're doing, I'll leave that to you. I also do not claim it's the fastest possible approach. HTH.

Re: Uninitialized value in division and Illegal division by zero fix
by johngg (Canon) on Jul 10, 2014 at 09:16 UTC

    Not addressing your problem but you might be able to save some typing and avoid mistakes by building your @array programmatically. The following code uses a recursive subroutine to construct an anonymous array ($raTerms), which is then dereferenced (@$raTerms) and printed. Running it with a maximum of three letters for brevity here.

    use strict; use warnings; use 5.014; my $raItems = [ qw{ A T G C } ]; my $raTerms = []; makeCombos( $raTerms, 3, $raItems ); say for @$raTerms; sub makeCombos { my( $raTerms, $iters, $raItems, @terms ) = @_; my @localTerms; if ( @terms ) { push @localTerms, map { my $item = $_; map { $_ . $item } @terms } @$raItems; } else { push @localTerms, @$raItems; } $iters --; push @$raTerms, @localTerms; return $iters ? makeCombos( $raTerms, $iters, $raItems, @localTerms ) : (); }

    The output.

    I hope this is helpful.

    Cheers,

    JohnGG

Re: Uninitialized value in division and Illegal division by zero fix
by redbull2012 (Sexton) on Jul 10, 2014 at 09:41 UTC

    Hmmm,

    First:

    #... if ($i =~ /^>(\S+)/){ # Captured in $1 unless ($seq1 eq ""){ $seq1 =~ s/[^ATCG]//g; #if match The s/// clears $1 &process_nuc($seq1, $name1); } $seq1=""; $name1=$1; #hence $name1 may be undef #...

    Second: Why do you print out all elements in "@array"? Ok, this may be test code.

    Third: Why do you create %norm_2, %norm_3, %norm_4 (more...) without using them ???

    Fourth: Where do you get the data for %counts_1, %total_mono ???

    Fifth: What does process_nuc() do????

    ....

    ..