combine 2 fasta files into 1

newbie25 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm a newbie when it comes to Perl and am stuck on a simple script. I am trying to take two different text files (exact same size and alignment, line by line) and put them together into one file.
File 1 looks like this (it's a FASTA DNA sequence):

>GK9PVB108JH8SN rank=0000021 x=3781.0 y=885.0 length=137
TGATTATGAGTTAGATGTTCGCTCTGAGGTTTCAACGATGCTTCAAGATTCCTAATTCGC
GTTGCGACTCTCGAGTATGCGTTCTATTCACATTTCTGTTGTCGTACATATTTGACTCAC
GATCTTGATTTCTTATC <br>
>GK9PVB108JYDQ5 rank=0000032 x=3965.0 y=143.0 length=53
GTTTTCAACGCTGGTTCGAGATTTCCTAATTTCACATTGCGACTCTCGAGTGC

File 2 is the complement of 1 and looks like this (it's a quality assessment of the above data):

>GK9PVB108JH8SN rank=0000021 x=3781.0 y=885.0 length=137
40 38 38 30 30 30 38 40 40 30 30 30 36 40 40 40 40 40 40 40 40 40 40 40 39 34 34 14 14 14 14 13 19 14 18 25 32 32 34 27
>GK9PVB108JYDQ5 rank=0000032 x=3965.0 y=143.0 length=53
25 21 21 23 23 37 31 31 31 37 37 39 38 40 40 40 40 40 40 40 40 39 39 39 39 35 23 25 25 37 34 35 36 37 37 37 37 37 39 39 40 40 40 40 39 38 35 33 33 33 32 31 19

What I want is a file that combines these, putting the quality data under the corresponding sequence.

e.g.:
>GK9PVB108JYDQ5 rank=0000032 x=3965.0 y=143.0 length=53
GTTTTCAACGCTGGTTCGAGATTTCCTAATTTCACATTGCGACTCTCGAGTGC
>GK9PVB108JYDQ5 rank=0000032 x=3965.0 y=143.0 length=53
25 21 21 23 23 37 31 31 31 37 37 39 38 40 40 40 40 40 40 40 40 39 39 39 39 35 23 25 25 37 34 35 36 37 37 37 37 37 39 39 40 40 40 40 39 38 35 33 33 33 32 31 19

The script I have written is:

use warnings;
use strict; 

open( F1, "subsettest.txt" ) or die "file1: $!";
open( F2, "subsettest1.txt" ) or die "file2: $!";

while ( my$s1 = <F1> ) {
   my$s2 = <F2>;
   if ( $s1 =~ /^>/ ) {
      print $s1;
   } else {
      chomp $s1;
      print "$s1\n$s2";  
   }
}
close F1;
close F2;
exit;

The problem is that this adds the second file line by line into the first file instead of underneath it. eg:

>GK9PVB108I6QD3 rank=0000053 x=3650.5 y=393.0 length=71
ACACTTTAGCGGGACATTATTACAAGAAGGTACCTGAACCACATCGGGTTTCCTTGCTTC
40 40 38 34 21 21 21 30 36 40 35 33 33 36 32 32 33 29 28 26 26 28 31 26 26 31 24 24 23 23 34 25 27 27 32 32 22 22 22 23 32 32 34 34 34 19 19 19 27 26 26 16 16 14 14 29 29 27 27 31
TTCAACGGTAA

What am I missing?!?!!


Replies are listed 'Best First'.
Re: combine 2 fasta files into 1 by NetWallah (Canon) on Jul 30, 2010 at 20:25 UTC
The formatting of the files is not clear. If you indicate every appearance of a newline by <br>, then the second file does not contain any newlines, so the single <F2> read will suck in the entire file, giving you the results you seem to have. It would be a good idea to check that you actually got something in that read: `defined( my $s2 = <F2> ) or die "I really need F2 to contain the same number of records as F1: $!";` To improve readability, please add a space after each "my". Syntactic sugar causes cancer of the semicolon. --Alan Perlis
Re: combine 2 fasta files into 1 by BioLion (Curate) on Jul 31, 2010 at 10:06 UTC
Not actually a fix to your code, but a slightly different approach would be to temporarily store each FASTA record in a hash, using the complete header as the key. This way you can check for the existence of a known header and print out the records together. I haven't tested it, but the problem you are getting looks like it could be a buffering issue? If you `++$|; # turn off buffering`, the issue might go away, but I still think I would go with the hash approach, as it doesn't rely on the records being in the same order, etc. HTH Just a something something...
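
For anyone wanting to try the hash approach described above, here is a minimal, untested-against-the-real-files sketch. The names `read_records` and `combine_records` are made up for illustration, and it assumes the headers in the two files are byte-for-byte identical (apart from line endings) and that each header appears only once per file. The output repeats the header before the quality data, as in the example the original poster gave.

```perl
use strict;
use warnings;

# Read FASTA-style records from an open filehandle.
# Returns (\@order, \%records): the headers in file order, and a hash that
# maps each complete header line to an array ref of that record's body lines.
sub read_records {
    my ($fh) = @_;
    my ( @order, %records, $header );
    while ( my $line = <$fh> ) {
        $line =~ s/\r?\n\z//;    # strip LF or CRLF line endings
        next if $line eq '';
        if ( $line =~ /^>/ ) {
            $header = $line;
            die "duplicate header: $header\n" if exists $records{$header};
            push @order, $header;
            $records{$header} = [];
        }
        else {
            die "data before first header\n" unless defined $header;
            push @{ $records{$header} }, $line;
        }
    }
    return ( \@order, \%records );
}

# Build the combined text: for each header (in sequence-file order), the
# sequence record followed by the quality record under the same header.
sub combine_records {
    my ( $seq_fh, $qual_fh ) = @_;
    my ( $order, $seq )  = read_records($seq_fh);
    my ( undef,  $qual ) = read_records($qual_fh);
    my $out = '';
    for my $header (@$order) {
        die "no quality record for: $header\n" unless exists $qual->{$header};
        $out .= join( "\n", $header, @{ $seq->{$header} } ) . "\n";
        $out .= join( "\n", $header, @{ $qual->{$header} } ) . "\n";
    }
    return $out;
}

# Demo on made-up in-memory data (not the poster's files); note that the
# quality records are deliberately in a different order.
my $seq_text  = ">r1 length=4\nACGT\n>r2 length=2\nGG\n";
my $qual_text = ">r2 length=2\n30 31\n>r1 length=4\n40 40 38 30\n";
open my $seq_fh,  '<', \$seq_text  or die "seq: $!";
open my $qual_fh, '<', \$qual_text or die "qual: $!";
print combine_records( $seq_fh, $qual_fh );
```

Because both files are held in memory, this is only reasonable for files that fit comfortably in RAM; the upside is that record order in the quality file no longer matters.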
Re: combine 2 fasta files into 1 by Generoso (Prior) on Jul 30, 2010 at 21:21 UTC
It looks like the F1 file has HTML in it. A <br> looks like a newline, but it is not; it is just for formatting when you view the file in a browser like Internet Explorer.
Re^2: combine 2 fasta files into 1 by Generoso (Prior) on Jul 30, 2010 at 21:34 UTC
Maybe this is what you are looking for.

use warnings;
use strict;

open( F1, "subsettest.txt" ) or die "file1: $!";
open( F2, "subsettest1.txt" ) or die "file2: $!";

while ( my $s1 = <F1> ) {
    my $s2 = <F2>;
    if ( $s1 =~ /^>/ ) {
        print $s1;
    } else {
        print $s2,$s1;
    }
}
close F1;
close F2;
exit;

>GK9PVB108JH8SN rank=0000021 x=3781.0 y=885.0 length=137
40 38 38 30 30 30 38 40 40 30 30 30 36 40 40 40 40 40 40 40 40 40 40 40 39 34 34 14 14 14 14 13 19 14 18 25 32 32 34 27
TGATTATGAGTTAGATGTTCGCTCTGAGGTTTCAACGATGCTTCAAGATTCCTAATTCGCGTTGCGACTCTCGAGTATGCGTTCTATTCACATTTCTGTTGTCGTACATATTTGACTCACGATCTTGATTTCTTATC
>GK9PVB108JYDQ5 rank=0000032 x=3965.0 y=143.0 length=53
25 21 21 23 23 37 31 31 31 37 37 39 38 40 40 40 40 40 40 40 40 39 39 39 39 35 23 25 25 37 34 35 36 37 37 37 37 37 39 39 40 40 40 40 39 38 35 33 33 33 32 31 19
GTTTTCAACGCTGGTTCGAGATTTCCTAATTTCACATTGCGACTCTCGAGTGC
Re^3: combine 2 fasta files into 1 by newbie25 (Initiate) on Jul 30, 2010 at 22:58 UTC
Thanks for your suggestion, but removing the chomp function from my script doesn't solve my problem. It is still giving output that looks like this:

>GK9PVB108JRQBH rank=0000011 x=3889.0 y=1131.0 length=82
TCCATGTGTACAACTCATATGGAGCATCGATAGTATTAACAGTCTTGGTTGTGCGAGTTC
19 19 19 32 32 32 32 23 23 15 15 15 19 19 24 30 29 30 30 27 19 19 20 30 35 35 33 31 31 31 31 32 32 30 23 16 15 15 15 24 28 28 25 22 17 17 17 17 23 21 21 21 24 28 24 19 18 17 16 19
TTTGTTGTTTCCTTTAACTAAC

instead of:

>GK9PVB108JRQBH rank=0000011 x=3889.0 y=1131.0 length=82
TCCATGTGTACAACTCATATGGAGCATCGATAGTATTAACAGTCTTGGTTGTGCGAGTTCTTTGTTGTTTCCTTTAACTAAC
19 19 19 32 32 32 32 23 23 15 15 15 19 19 24 30 29 30 30 27 19 19 20 30 35 35 33 31 31 31 31 32 32 30 23 16 15 15 15 24 28 28 25 22 17 17 17 17 23 21 21 21 24 28 24 19 18 17 16 19
Re^3: combine 2 fasta files into 1 by Generoso (Prior) on Aug 03, 2010 at 17:45 UTC
OK, from what I can figure, you have several lines per record in F1 and you want to skip the first corresponding line in F2.

use warnings;
use strict;

open( F1, "subsettest.txt" ) or die "file1: $!";
open( F2, "subsettest1.txt" ) or die "file2: $!";

my @s2s = ();    # quality lines held back for the current record
while ( my $s1 = <F1> ) {
    my $s2 = <F2>;
    if ( $s1 =~ /^>/ ) {
        # New header: flush the previous record's quality lines first.
        foreach (@s2s) { print $_; }
        print $s1;
        @s2s = ();
    } else {
        push( @s2s, $s2 );
        print $s1;
    }
}
foreach (@s2s) { print $_; }    # flush the last record
close F1;
close F2;
exit;
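
A streaming variant of the same buffering idea might look like the sketch below. It reads one record at a time from each file, checks that the headers agree (in the spirit of the `defined` check suggested earlier in the thread), and joins the wrapped lines so the whole sequence lands on one line with the whole quality string beneath it, which is the layout shown in newbie25's follow-up. The names `record_iterator` and `merge_streams` are hypothetical, and joining quality lines with a single space assumes the file only wraps between values.

```perl
use strict;
use warnings;

# Return a closure that yields one record per call as (header, \@body_lines),
# or an empty list at end of file.
sub record_iterator {
    my ($fh) = @_;
    my $next_header;    # header already read while scanning the previous record
    return sub {
        my $header = $next_header;
        $next_header = undef;
        my @body;
        while ( defined( my $line = <$fh> ) ) {
            $line =~ s/\s+\z//;    # strip newline, CR and trailing blanks
            next if $line eq '';
            if ( $line =~ /^>/ ) {
                if ( defined $header ) { $next_header = $line; last; }
                $header = $line;
            }
            else {
                die "data before first header\n" unless defined $header;
                push @body, $line;
            }
        }
        return defined $header ? ( $header, \@body ) : ();
    };
}

# Write header, joined sequence, joined quality values for each record pair.
sub merge_streams {
    my ( $seq_fh, $qual_fh, $out_fh ) = @_;
    my $next_seq  = record_iterator($seq_fh);
    my $next_qual = record_iterator($qual_fh);
    while ( my ( $header, $seq ) = $next_seq->() ) {
        my ( $q_header, $qual ) = $next_qual->()
            or die "quality file ended before: $header\n";
        die "header mismatch: '$header' vs '$q_header'\n"
            unless $header eq $q_header;
        print {$out_fh} $header, "\n",
            join( '',  @$seq ),  "\n",
            join( ' ', @$qual ), "\n";
    }
    my @extra = $next_qual->();
    die "quality file has extra records\n" if @extra;
    return;
}

# Demo on made-up in-memory data (not the poster's files).
my $seq_text  = ">r1 length=6\nACG\nTTA\n>r2 length=2\nGG\n";
my $qual_text = ">r1 length=6\n40 40 38\n30 30 21\n>r2 length=2\n30 31\n";
open my $seq_fh,  '<', \$seq_text  or die "seq: $!";
open my $qual_fh, '<', \$qual_text or die "qual: $!";
merge_streams( $seq_fh, $qual_fh, \*STDOUT );
```

Unlike the hash approach, this keeps only one record per file in memory, but it does rely on both files listing the records in the same order.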