in reply to A program to extract the reads and modify the seq ID
G'day Teju,
Welcome to the monastery.
I think this is closer to what you want:
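
    #!/usr/bin/env perl

    use strict;
    use warnings;
    use autodie;

    # Slurp the ID file and build a hash from its whitespace-separated
    # tokens (ID => weight), dropping any leading '>' from each token.
    open my $id_fh, '<', 'sample_ID.txt';
    my %ids = map { s/^>//; $_ } grep { length } split /\s+/ => do { local $/; <$id_fh> };
    close $id_fh;

    {
        # Read the FASTA file one record at a time; records are separated by "\n>".
        local $/ = "\n>";
        open my $seq_fh, '<', 'sample_reads.fasta';

        while (<$seq_fh>) {
            # The record's ID is its first whitespace-delimited field.
            (my $key = (split)[0]) =~ s/^>//;
            next unless exists $ids{$key};

            my ($head, $seq) = split /\n/, $_, 2;

            # Only the first record still carries its leading '>'.
            print '>' unless $head =~ /^>/;
            print "${head}_weight=$ids{$key}\n";

            # Remove the trailing '>' separator, spaces and newlines
            # so the sequence prints on a single line.
            $seq =~ y/> \n//d;
            print "$seq\n";
        }

        close $seq_fh;
    }
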
I only used the first five blocks of data from each file for my test. Here's the output:

    >comp10003_c0_seq1 len=166 path=[748:0-22 1004:23-46 2527:47-165]_weight=41
    AAGTAGCCTATGCGCTACAGTAAGAAAGACAGGTGAAAAAATGGAAGTAAAACAATTAGATGACTACTTTGGATATACAGAAAAGGGCAGTTCCTTAGAGGGGGAATTACGAGCAGGACTAACGACATTCTTGACAATGGCGTACATTCTGTTTGTGAACCCAGAC
    >comp10004_c0_seq1 len=143 path=[2167:0-44 2322:45-68 2508:69-142]_weight=25
    AATCTTTAATTTAAACTTAAAAAAAATTAACTTTTGAAAGGAATTAAAATGGAAAAAGAAATGTTAGTAGTAGCTAAATTAAAAGAAGGTACATTTGAAAAATTTATGGGTTTCATGCAATCGCCTGAAGGTTTAGCAGAAAG

-- Ken