onlyIDleft has asked for the wisdom of the Perl Monks concerning the following question:
Dear monks,
I am trying to replace the headers of a multifasta file as input 1 by the set of new headers in input 2.
INPUT 1 could be for example:
>head1 \n seq1 \n >head2 \n seq2 \n
INPUT2 could be, for example
head1 \t headA \n headB \t headB \n
The FINAL output should be of the form
>headA \n seq1 \n >headB \n seq2 \n
This is the code I've have so far, but I have trouble when trying to use key value in hashes to extract out new header! Strange, what am I doing wrong?
note: I am a a very new to Perl programming
Thanks in advance to anyone who can help
use strict; use warnings; my @filenames = @ARGV; my @output; my (@headers_seqs, %head_seqs, %header_pairs, $destination); open(IN1, '<', $filenames[0]) or die "Can't read from multifasta file +with alternating lines of headers and sequences $_: $! \n"; open(IN2, '<', $filenames[1]) or die "Can't read from tab-delimited he +ader replacement file $_: $! \n"; while(<IN1>){ chomp; if ($_=~ m/\>/) #looks for match to the + '>' character { my $header = $_; # print $header, "\n"; $header =~ s/\>//; push @headers_seqs, $header; # print "header:", "\t", $header, "\n"; } elsif ($_!~ m/\>/) #looks for match to +the '>' character { my $seq = $_; # print "seq:", "\t", $seq, "\n"; push @headers_seqs, $seq; } } print $#headers_seqs, "\n"; %head_seqs = @headers_seqs; ###########################***********************-------------------- +--------------------***********************########################## +# my @head_orig_new; while(<IN2>){ chomp; my @line_splits = split('\s+',$_); #print "@line_splits", "\n"; my $orig_header = $line_splits[0]; #print $orig_header, "\n"; my $new_header = $line_splits[1]; #print $new_header, "\n"; push @head_orig_new, $orig_header; push @head_orig_new, $new_header; } #print $#head_orig_new, "\n"; %header_pairs = @head_orig_new; ###########################***********************-------------------- +--------------------***********************########################## +# foreach(keys %head_seqs) { push @output, '>', $header_pairs{$_}, "\n"; push @output, $head_seqs{$_},"\n"; } ###########################***********************-------------------- +--------------------***********************########################## +# $destination = $filenames[0].'_headers-replaced.fasta'; open (OUT, '>', $destination) or die "Can't write to file $destination +: $!\n"; print OUT @output; close IN1; close IN2; close OUT;
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: replace FASTA sequence headers
by Athanasius (Archbishop) on Jun 10, 2012 at 17:23 UTC | |
by jwkrahn (Abbot) on Jun 10, 2012 at 20:38 UTC | |
by Anonymous Monk on Jun 10, 2012 at 21:34 UTC | |
by jwkrahn (Abbot) on Jun 11, 2012 at 01:26 UTC | |
by Anonymous Monk on Jun 10, 2012 at 18:14 UTC | |
|
Re: replace FASTA sequence headers
by Cristoforo (Curate) on Jun 10, 2012 at 20:41 UTC | |
|
Re: replace FASTA sequence headers
by jwkrahn (Abbot) on Jun 10, 2012 at 19:34 UTC |