Dear monks,

I am trying to replace the headers of a multifasta file (input 1) with a set of new headers listed in a second file (input 2).

INPUT 1 could be, for example:

>head1 \n seq1 \n >head2 \n seq2 \n

INPUT 2 could be, for example:

head1 \t headA \n head2 \t headB \n

The FINAL output should be of the form

>headA \n seq1 \n >headB \n seq2 \n

This is the code I have so far, but I run into trouble when I try to use the hash keys to look up the new header. Strange. What am I doing wrong?

Note: I am very new to Perl programming.

Thanks in advance to anyone who can help.

use strict;
use warnings;

my @filenames = @ARGV;
my @output;
my (@headers_seqs, %head_seqs, %header_pairs, $destination);

open(IN1, '<', $filenames[0]) or die "Can't read from multifasta file with alternating lines of headers and sequences $filenames[0]: $!\n";
open(IN2, '<', $filenames[1]) or die "Can't read from tab-delimited header replacement file $filenames[1]: $!\n";

while (<IN1>) {
    chomp;
    if ($_ =~ m/\>/)    # line contains the '>' character, so it is a header
    {
        my $header = $_;
        # print $header, "\n";
        $header =~ s/\>//;
        push @headers_seqs, $header;
        # print "header:", "\t", $header, "\n";
    }
    elsif ($_ !~ m/\>/)    # no '>' character, so it is a sequence line
    {
        my $seq = $_;
        # print "seq:", "\t", $seq, "\n";
        push @headers_seqs, $seq;
    }
}
print $#headers_seqs, "\n";
%head_seqs = @headers_seqs;    # header => sequence pairs

###########################***********************----------------------------------------***********************###########################

my @head_orig_new;
while (<IN2>) {
    chomp;
    my @line_splits = split('\s+', $_);
    # print "@line_splits", "\n";
    my $orig_header = $line_splits[0];
    # print $orig_header, "\n";
    my $new_header = $line_splits[1];
    # print $new_header, "\n";
    push @head_orig_new, $orig_header;
    push @head_orig_new, $new_header;
}
# print $#head_orig_new, "\n";
%header_pairs = @head_orig_new;    # original header => new header pairs

###########################***********************----------------------------------------***********************###########################

foreach (keys %head_seqs) {
    push @output, '>', $header_pairs{$_}, "\n";
    push @output, $head_seqs{$_}, "\n";
}

###########################***********************----------------------------------------***********************###########################

$destination = $filenames[0] . '_headers-replaced.fasta';
open(OUT, '>', $destination) or die "Can't write to file $destination: $!\n";
print OUT @output;

close IN1;
close IN2;
close OUT;
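
For comparison, here is a minimal sketch of one way the replacement described above could be done. It is not the only approach, and it rests on some assumptions that the post does not state: the mapping file has one "old header, tab, new header" pair per line, stray whitespace or Windows line endings around a header should be ignored when matching, and a header with no listed replacement is kept as it is. The sub names read_header_map, replace_headers and main are made up for illustration. The FASTA file is streamed line by line rather than loaded into a hash, so the records come out in their original order.

```perl
use strict;
use warnings;

# Build a hash of old header => new header from a tab-delimited file.
# (read_header_map is a hypothetical helper name.)
sub read_header_map {
    my ($map_file) = @_;
    open(my $in, '<', $map_file) or die "Can't read $map_file: $!\n";
    my %new_for;
    while (my $line = <$in>) {
        $line =~ s/\s+\z//;                  # drop newline, CR and trailing blanks
        next unless length $line;            # skip empty lines
        my ($old, $new) = split /\t/, $line, 2;
        next unless defined $new;            # skip lines without a tab
        s/^\s+|\s+$//g for $old, $new;       # tolerate spaces around the tab
        $new_for{$old} = $new;
    }
    close $in;
    return \%new_for;
}

# Copy the FASTA file, swapping each header that has an entry in the map.
# Assumption: headers without an entry are written out unchanged.
sub replace_headers {
    my ($fasta_file, $new_for, $out_file) = @_;
    open(my $in,  '<', $fasta_file) or die "Can't read $fasta_file: $!\n";
    open(my $out, '>', $out_file)   or die "Can't write $out_file: $!\n";
    while (my $line = <$in>) {
        $line =~ s/\s+\z//;
        if ($line =~ /^>\s*(.*)$/) {         # header line
            my $old = $1;
            my $new = exists $new_for->{$old} ? $new_for->{$old} : $old;
            print {$out} ">$new\n";
        }
        else {                               # sequence line, copied as is
            print {$out} "$line\n";
        }
    }
    close $in;
    close $out or die "Can't finish writing $out_file: $!\n";
}

sub main {
    my ($fasta_file, $map_file) = @_;
    die "Usage: $0 multifasta_file header_map_file\n" unless defined $map_file;
    replace_headers($fasta_file, read_header_map($map_file),
                    $fasta_file . '_headers-replaced.fasta');
}

main(@ARGV) if @ARGV;
```

Saved under a name of your choice (say replace_headers.pl, a made-up name), it would be run as: perl replace_headers.pl input.fasta headers.tsv. The output goes to input.fasta_headers-replaced.fasta, the same naming as in the code above.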

In reply to replace FASTA sequence headers by onlyIDleft
