Re: A program to extract the reads and modify the seq ID

Hello Teju,

Welcome to PerlMonks! There are several problems with your program and data:

- Your ID file is named "sample_ID.txt", but your program opens "sample_IDs.txt".
- Your ID file contains two columns, but you only want the first column when matching FASTA records.
- There are no die statements, so failed file I/O goes unreported.
- You use "\n>" as your input record separator (look up $/ in perlvar), but you still process the opening '>' of the FASTA headers as though it were always there. With that separator, chomp strips the trailing "\n>", so only the first record still starts with '>'.
- You discard most of the useful information in your FASTA processing, so your output won't match what you expect. If I'm reading your program right, you'd get only the DNA sequences and nothing else.
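A small self-contained sketch of the record-separator point, using made-up reads in an in-memory filehandle (the sample data is hypothetical, not the OP's). It shows that with $/ set to "\n>", chomp removes the trailing "\n>", only the first record keeps its leading '>', and the last record keeps its final newline because it doesn't end in "\n>".

```perl
use strict;
use warnings;

# Hypothetical sample data; an in-memory filehandle stands in for the FASTA file.
my $fasta = ">read1 sample A\nACGT\nTTGA\n>read2 sample B\nGGCC\n";
open(my $fh, "<", \$fasta) or die "Can't open in-memory FASTA - $!\n";

my @records;
{
    local $/ = "\n>";           # a record ends at a newline followed by '>'
    while (my $record = <$fh>) {
        chomp $record;          # removes a trailing "\n>" when present
        push @records, $record;
    }
}
close $fh;

# Show each record with its newlines made visible.
printf "%d: [%s]\n", $_, $records[$_] =~ s/\n/\\n/gr for 0 .. $#records;
# Prints:
# 0: [>read1 sample A\nACGT\nTTGA]
# 1: [read2 sample B\nGGCC\n]
```

So any lookup that builds keys from these records has to account for the '>' on the first record only; stripping a leading '>' from every record (and from the IDs) keeps the keys consistent.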

So, if you fix all of those things, you should be on your way (to glory or otherwise!).

Here's my program fixed up mostly for clarity/readability:

use strict;
use warnings;

my $idsfile = "sample_IDs.txt";
my $seqfile = "sample_reads.fasta";
my %ids;

open(my $idfh, "<", $idsfile) or die "Can't open $idsfile - $!\n";
while (my $line = <$idfh>) {
    chomp $line;
    next unless $line =~ /\S/;           # skip blank lines
    my ($id) = split ' ', $line;         # keep only the first column
    $id =~ s/^>//;                       # store IDs without a leading '>'
    $ids{$id}++;
}
close $idfh;

open(my $fastafh, "<", $seqfile) or die "Can't open $seqfile - $!\n";
{
    local $/ = "\n>";                    # read by FASTA record
    while (my $record = <$fastafh>) {
        chomp $record;                   # strips the trailing "\n>"
        $record =~ s/^>//;               # only the first record still has its '>'
        my ($id) = $record =~ /^(\S+)/;  # parse ID as first word in FASTA header
        next unless defined $id && $ids{$id};
        $record =~ s/^[^\n]*\n?//;       # remove FASTA header
        $record =~ s/\n//g;              # remove endlines
        print "$record\n";
    }
}
close $fastafh;

Re^2: A program to extract the reads and modify the seq ID by Kenosis (Priest) on Feb 25, 2014 at 18:08 UTC
Your script prints only the sequences, not the headers with the info the OP wanted appended.
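One way to keep the headers is a variant that rebuilds each matching header instead of deleting it. This is a minimal sketch, not the OP's intended output format: it assumes, hypothetically, that the text to append is the ID file's second column, and it uses made-up in-memory data (standing in for "sample_IDs.txt" and "sample_reads.fasta") so it runs on its own.

```perl
use strict;
use warnings;

# Hypothetical inputs held in memory; replace with the real files for real use.
my $id_data    = "read1 tagA\nread3 tagC\n";
my $fasta_data = ">read1 sample A\nACGT\nTTGA\n>read2 sample B\nGGCC\n>read3 sample C\nAATT\n";

# Map each wanted ID (first column) to the text to append (second column, an assumption).
my %extra;
open(my $idfh, "<", \$id_data) or die "Can't open ID data - $!\n";
while (my $line = <$idfh>) {
    my ($id, $tag) = split ' ', $line;
    next unless defined $id;             # skip blank lines
    $id =~ s/^>//;                       # accept IDs with or without '>'
    $extra{$id} = defined $tag ? $tag : "";
}
close $idfh;

my @output;
open(my $fastafh, "<", \$fasta_data) or die "Can't open FASTA data - $!\n";
{
    local $/ = "\n>";                    # read one FASTA record at a time
    while (my $record = <$fastafh>) {
        chomp $record;                   # strips the trailing "\n>"
        $record =~ s/^>//;               # only the first record keeps its '>'
        my ($header, @seq_lines) = split /\n/, $record;
        next unless defined $header;
        my ($id) = $header =~ /^(\S+)/;
        next unless defined $id && exists $extra{$id};
        my $suffix = length $extra{$id} ? " $extra{$id}" : "";
        # Restore '>', append the extra text, then emit the joined sequence.
        push @output, ">$header$suffix", join('', @seq_lines);
    }
}
close $fastafh;

print "$_\n" for @output;
```

Splitting each record on newlines keeps the header separate from the sequence lines, so the header can be rebuilt with its '>' restored rather than removed with a substitution.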
Re^3: A program to extract the reads and modify the seq ID by robby_dobby (Hermit) on Feb 26, 2014 at 04:02 UTC
Yes, didn't I say that I fixed the program "mostly for clarity/readability"? I also mentioned this in my last point. :-)
Re^4: A program to extract the reads and modify the seq ID by Kenosis (Priest) on Feb 26, 2014 at 05:29 UTC
The OP says that the program ...neither prints out the output nor gives me no errors.... In fact, it doesn't print anything. I'm not sure that you've clarified it or made it more readable--nor did the OP request such--but your refactoring now makes it print only the sequences. Please don't get me wrong: your bullet points for the OP were very well done. I suppose I just assumed too much thinking that the code you posted was a corrected version of the OP's code. Instead, the OP must now fix your code--or his/her own--to get it to work correctly. The latter, however, is certainly not a bad thing, and your bullet points should help...
Re^5: A program to extract the reads and modify the seq ID by robby_dobby (Hermit) on Feb 26, 2014 at 06:02 UTC
Re^6: A program to extract the reads and modify the seq ID by Kenosis (Priest) on Feb 26, 2014 at 06:17 UTC