FIJI42 has asked for the wisdom of the Perl Monks concerning the following question:
I have a sample subroutine that parses through two fasta-formatted files containing several genes/gene sequences and returns two sets of hashes: one with values as nucleotide sequences, and another with values as description information - both have gene names as keys.
Ultimately, I want to print the hash information into two separate files in fasta format (i.e. gene name and description in header, then sequence underneath, for each gene). However, I don't know how to print the information to the files in the correct format - any suggestions how to do this?
Here's my code so far:
#!/usr/bin/perl
use strict;
use warnings;

my ($file1, $file2) = @ARGV;
my $out1 = 'output1.fasta';
my $out2 = 'output2.fasta';

open (FILE1, "<$file1") or die "Cannot open input file: $file1 !\n";
open (FILE2, "<$file2") or die "Cannot open input file: $file2 !\n";
open (my $output1, ">$out1") or die "Cannot open output file: $out1 !\n";
open (my $output2, ">$out2") or die "Cannot open output file: $out2 !\n";

my @file1=<FILE1>;
my @file2=<FILE2>;
my %File1=();
my %File1_D=();
my %File2=();
my %File2_D=();

Parse2Files (\@file1, \@file2, \%File1, \%File2, \%File1_D, \%File2_D);

my @temp1a = keys %File1;      ## gene name
my @temp1b = values %File1_D;  ## description
my @temp1c = values %File1;    ## gene seq
print $output1 "@temp1a, @temp1b, @temp1c\n";

my @temp2a = keys %File2;      ## gene name
my @temp2b = values %File2_D;  ## description
my @temp2c = values %File2;    ## gene seq
print $output2 "@temp2a, @temp2b, @temp2c\n";

sub Parse2Files {
    my ($arrayref1, $arrayref2, $hashref1, $hashref2) = @_;
    my @file1=@{$arrayref1};
    my @file2=@{$arrayref2};
    my $f1dscp='';  ## File 1 - fasta description
    my $f1name='';  ## File 1 - fasta gene name
    my $f2dscp='';  ## File 2 - fasta description
    my $f2name='';  ## File 2 - fasta gene name

    for (my $i=0; $i<=$#file1; $i++) {
        chomp($file1[$i]);
        if ($file1[$i] =~ m/^>(\S+)\s(.+)/) {
            $f1name = $1;
            $f1dscp = $2;
            $File1_D{$f1name} = $f1dscp;
        }
        else {
            $File1{$f1name} .= $file1[$i];
        }
    }

    for (my $i=0; $i<=$#file2; $i++) {
        chomp($file2[$i]);
        if ($file2[$i] =~ m/^>(\S+)\s(.+)/) {
            $f2name = $1;
            $f2dscp = $2;
            $File2_D{$f2name} = $f2dscp;
        }
        else {
            $File2{$f2name} .= $file2[$i];
        }
    }
    return (%File1, %File2, %File1_D, %File2_D);
}
Yes, the code is basically taking two fasta files and putting them back into other fasta-formatted files - I'm just toying with this to practice making output files.
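One possible way to get the output described in the question is to loop over the gene names and print one record at a time, rather than printing all keys and then all values. The following is only a sketch: the helper name `write_fasta`, the 60-character default line width, and the sample data are assumptions for illustration and are not part of the original post. It writes to an in-memory filehandle so it can be run on its own; in the script above you would pass `$output1` or `$output2` instead.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): writes one FASTA record
# per gene to an already opened filehandle.
sub write_fasta {
    my ($fh, $seqs, $descs, $width) = @_;
    $width ||= 60;    # assumed default; FASTA does not mandate a line width

    for my $name (sort keys %$seqs) {
        # Header line: ">name description" (description omitted if missing).
        my $desc   = $descs->{$name};
        my $header = (defined($desc) && length($desc)) ? ">$name $desc" : ">$name";
        print {$fh} "$header\n";

        # Sequence lines: chunks of at most $width characters.
        my $seq = $seqs->{$name};
        for (my $pos = 0; $pos < length($seq); $pos += $width) {
            print {$fh} substr($seq, $pos, $width), "\n";
        }
    }
    return;
}

# Made-up sample data standing in for %File1 and %File1_D.
my %seqs  = (geneA => 'ATGCATGCAT', geneB => 'GGCC');
my %descs = (geneA => 'example description', geneB => 'another example');

# In-memory filehandle for demonstration; a real script would use $output1.
open(my $fh, '>', \my $buffer) or die "Cannot open in-memory handle: $!";
write_fasta($fh, \%seqs, \%descs, 4);
close $fh;
print $buffer;
```

Looking up both hashes by the same gene name inside the loop keeps each name, description and sequence together. Printing `keys %File1` next to `values %File1_D` does not guarantee that, because the two hashes need not return their entries in the same order.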
Re: Print Output to New File
by 1nickt (Canon) on Nov 03, 2017 at 10:52 UTC

Re: Print Output to New File
by hippo (Archbishop) on Nov 03, 2017 at 09:35 UTC

Re: Print Output to New File
by holli (Abbot) on Nov 02, 2017 at 23:28 UTC

Re: Print Output to New File (UPDATED)
by thanos1983 (Parson) on Nov 03, 2017 at 09:00 UTC
by 1nickt (Canon) on Nov 03, 2017 at 13:11 UTC
by thanos1983 (Parson) on Nov 03, 2017 at 15:50 UTC