in reply to Re: .qual File Writing Complication
in thread .qual File Writing Complication

A .qual file is a numerical representation of the confidence of a nucleotide sequencer. Used by the modern, automated sequencers, they assign a 2 digit value to each nucleotide ACTGU (which are what all DNA is composed of) representing the accuracy of the sequencer guessing at that particular nucleotide (in my case, 40 being = 100% confidence). For example: if I were a DNA sequencer and I KNOW that the first three nucleotides are ACT but I'm only partially confident that the last nucleotide is a G. The .qual file would look something like this: 40 40 40 34. I hope this makes sense! Let me know if you need/would like further explanation!



here's my code:
#!/usr/bin/perl -w use strict; use warnings; use Bio::Seq; use Bio::SeqIO; use Bio::Seq::Quality; my $seq_count = 0; my $used_seqs; my $match; my $front; my $end; my $front_tag = 0; my $end_tag = 0; my $nfront = 0; my $nend = 0; my $no_match = 0; my $seq_file = "fastaTestData.fna"; my $qual_file = "qualTestData.qual"; my $seqIn_obj = Bio::SeqIO->new(-file => "$seq_file", -format => 'fast +a') or die "Can't access $seq_file\n"; #creates the fasta output file print "creating fasta output file\n"; my $fasta = Bio::SeqIO->new(-file => '>testFasta.fasta', -format => 'f +asta'); # creates new qual output file print "creating quality output file\n"; my $quality = Bio::SeqIO->new(-file => '>testQual.qual', -format => 'q +ual'); #reads in the tag file as a fasta file print "Reading in tag file\n"; my $tag_obj = Bio::SeqIO->new(-file => "mids.fa", -format => 'fasta') +or die "Can't access mids.fa\n"; #adds the tag id and sequence into a hash print "creating tag hash\n"; my %tag_hash = (); while(my $tag = $tag_obj->next_seq){ # instantiates the hash map, using the tag id as the key and the t +ag sequence as the value. $tag_hash{$tag->id} = $tag->seq; } # reads in the quality files associated with the fasta files. print "reading in qual file\n"; my $qual_obj = Bio::SeqIO->new(-file => "$qual_file", -format => "qual +") or die "Can't access $qual_file"; print "creating qual hash\n"; my %qual_hash = (); while(my $qual_seq = $qual_obj->_next_qual){ # next line is pushing a reference to an array onto an array of sc +alar array references $qual_hash{$qual_seq->id()} = $qual_seq->qual(); } # reads in the next sequence (next_seq) and stores it in $seqIn_obj th +en prints it out. print "Applying regexs\n"; while (my $seq = $seqIn_obj->next_seq){ print "getting new sequence\n"; $seq_count += 1; $match = 0; $front = 0; $end = 0; my $temp_seq = $seq->seq; while((my $id, my $tag_seq) = each(%tag_hash)){ print "cycling through tag hash\n"; # reverses the tag_seq my $rtag_seq = reverse $tag_seq; for(my $i = 0; $i <= 1; $i += 1){ if($temp_seq =~ s/^$tag_seq//i){ print "removing tag\n"; $match += 1; $front += 1; $front_tag = 1; } if($temp_seq =~ s/^$rtag_seq//i){ print "removing tag\n"; $match += 1; $front += 1; $front_tag = 1; } if($temp_seq =~ s/$tag_seq$//i){ print "removing tag\n"; $match += 1; $end += 1; $end_tag = 1; } if($temp_seq =~ s/$rtag_seq$//i){ print "removing tag\n"; $match += 1; $end += 1; $end_tag = 1; } } } if($front > 1){ $nfront += 1; } if($end > 1){ $nend += 1; } if($match == 0){ $no_match += 1; } if($front <= 1 && $end <= 1 && $match != 0){ print "removing 6 nts and amending qual sequence\n"; $used_seqs += 1; my @qual = @{$qual_hash{$seq->id()}}; print ("@qual\n"); my $qual_string = join(" ", @qual); print "$qual_string\n"; $qual_string =~ s///; print "$qual_string\n"; if($front_tag == 1){ $qual_string =~ s/^\w{32}//; $temp_seq =~ s/^\w{6}//; $front_tag = 0; } elsif($end_tag == 1){ $qual_string =~ s/\w{32}$//; $temp_seq =~ s/\w{6}$//; $end_tag = 0; } # creates the new sequence and passes the metadata to the sequ +ence object print "creating fasta seq and writing to file\n"; my $new_seq = Bio::Seq->new(-seq => $temp_seq, -display_id => +$seq->id, -length => $seq->length, -desc => $seq->desc, -alphabet => $seq->alphabet); # writes the new sequence object to the specified file $fasta->write_seq($new_seq); print "creating qual seq and writing to file\n"; my $new_qual = Bio::Seq::Quality->new(-seq => $qual_string, -i +d => $seq->id(), -desc => $seq->desc); $quality->write_seq($new_qual); } } print("nfront = $nfront\nnend = $nend\nno_match = $no_match\nseq_count + = $seq_count\nused_seqs = $used_seqs\n");


Feel free to ask if anything doesn't make sense (for I may have left something out from the copy and paste), if there are any hints to make it more efficient etc.. most importantly, to fix the error stated previously! Thanks again!

Replies are listed 'Best First'.
Re^3: .qual File Writing Complication
by ww (Archbishop) on Jul 08, 2010 at 21:09 UTC
    #!??? How does one leave "something out from the copy and paste?" :-)

    No syntax errors found. Warnings in OP appear to come from Bio:xxx; not from Perl.

    Based on the warning MSG: seq doesn't validate with [0-9A-Za-z\*\-\.=~\\/\?] and the fact I can't find any attempt in your code to "validate" a "$seq" with such a character class nor even a ($temp_seq I have to wonder: could your fastaTestData.fna be corrupt?

    Fixed unclosed code tag

Re^3: .qual File Writing Complication
by graff (Chancellor) on Jul 09, 2010 at 01:20 UTC
    Perhaps it's just naivete on my part (I'm not at all familiar with "try/catch/throw exception" style error handling), but I find it odd that your error-report stack trace ends with this:
    STACK: tagfinder.pl:257
    which would suggest to me that the your script that uses those Bio modules and throws that stack trace (i.e. the code that you wrote yourself) should have at least 257 lines in it. The thing that's odd about it is that the code you posted here isn't that long -- it's only 172 lines.

    So I don't think it makes a lot of sense to look for the error in the code you posted.

    Apart from that, I wonder if maybe you are getting your files confused -- like maybe you are passing your file full of numbers to a function that is expecting a file full of nucleotide letters, or something like that.

    Anyway, ask yourself whether you can break things up into smaller components, identify the particular piece that is causing trouble, and present just a small snippet of code and data to demonstrate the problem -- something that can be run by anyone who has the necessary modules installed (which wouldn't include me, I'm afraid). Good luck with that...

Re^3: .qual File Writing Complication
by furry_marmot (Pilgrim) on Jul 09, 2010 at 04:55 UTC

    I looked up your modules on CPAN and I see you're using the BioPerl package. There are 860 modules, 84 utility scripts, and a dozen or more documents. I don't know about the rest of y'all, but I'm not going to install and search the files. :-)

    From what I can see, Bio::Root::Exception is designed to throw exceptions, and the stack trace seems to be saying that it began with tagfinder.pl. How does that relate to the code you posted?

    I would guess that one of the many BioPerl modules is attempting to validate your data and failing. But we don't know where in your script it's failing or how you're running the program. In fact, I notice you've got print statements throughout your code, but there are none in your original post; so I can only guess that you didn't post the entire output. Is that true? If so, then what do you expect us to do? Even if someone here knows BioPerl and decides to respond, you still haven't given them enough to go on.

    Care to try again?

    --marmot
Re^3: .qual File Writing Complication
by cjfields (Novice) on Jul 12, 2010 at 20:08 UTC

    Just to note, as others have indicated putting the code here that generated that stack trace does help to trace the problem down. Looks like the problem arises when you are creating a new Bio::Seq::Quality, on line 165 of the code you submitted.

    According to the Bio::Seq::Quality you are using the wrong named attribute for the quality information:

    my $new_qual = Bio::Seq::Quality->new(-seq => $qual_string, -id => $seq->id(), -desc => $seq->desc);

    That should be:

    my $new_qual = Bio::Seq::Quality->new(-qual => $qual_string, -id => $seq->id(), -desc => $seq->desc);

    This class also accepts sequence data to go with those scores; not sure if it requires it, though.