I have a sample subroutine that parses two FASTA-formatted files, each containing several gene sequences, and fills two hashes per file: one mapping gene names to nucleotide sequences, and one mapping gene names to description information.

Ultimately, I want to print the hash contents to two separate files in FASTA format (i.e. for each gene, the gene name and description in the header line, then the sequence underneath). However, I don't know how to print the information to the files in the correct format. Any suggestions on how to do this?

Here's my code so far:

#!/usr/bin/perl
use strict;
use warnings;

my ($file1, $file2) = @ARGV;
my $out1 = 'output1.fasta';
my $out2 = 'output2.fasta';

open (FILE1, "<$file1") or die "Cannot open input file: $file1 !\n";
open (FILE2, "<$file2") or die "Cannot open input file: $file2 !\n";
open (my $output1, ">$out1") or die "Cannot open output file: $out1 !\n";
open (my $output2, ">$out2") or die "Cannot open output file: $out2 !\n";

my @file1=<FILE1>;
my @file2=<FILE2>;
my %File1=();
my %File1_D=();
my %File2=();
my %File2_D=();

Parse2Files (\@file1, \@file2, \%File1, \%File2, \%File1_D, \%File2_D);

my @temp1a = keys %File1;       ## gene name
my @temp1b = values %File1_D;   ## description
my @temp1c = values %File1;     ## gene seq
print $output1 "@temp1a, @temp1b, @temp1c\n";

my @temp2a = keys %File2;       ## gene name
my @temp2b = values %File2_D;   ## description
my @temp2c = values %File2;     ## gene seq
print $output2 "@temp2a, @temp2b, @temp2c\n";

sub Parse2Files {
    ## Note: only four of the six arguments are unpacked; the hashes are
    ## filled through the file-scoped %File1, %File2, %File1_D, %File2_D.
    my ($arrayref1, $arrayref2, $hashref1, $hashref2) = @_;
    my @file1=@{$arrayref1};
    my @file2=@{$arrayref2};
    my $f1dscp='';   ## File 1 - fasta description
    my $f1name='';   ## File 1 - fasta gene name
    my $f2dscp='';   ## File 2 - fasta description
    my $f2name='';   ## File 2 - fasta gene name

    for (my $i=0; $i<=$#file1; $i++) {
        chomp($file1[$i]);
        if ($file1[$i] =~ m/^>(\S+)\s(.+)/) {
            $f1name = $1;
            $f1dscp = $2;
            $File1_D{$f1name} = $f1dscp;
        }
        else{
            $File1{$f1name} .= $file1[$i];
        }
    }

    for (my $i=0; $i<=$#file2; $i++) {
        chomp($file2[$i]);
        if ($file2[$i] =~ m/^>(\S+)\s(.+)/){
            $f2name = $1;
            $f2dscp = $2;
            $File2_D{$f2name} = $f2dscp;
        }
        else{
            $File2{$f2name} .= $file2[$i];
        }
    }
    return (%File1,%File2, %File1_D, %File2_D);
}

Yes, the code basically takes two FASTA files and writes them back out to new FASTA-formatted files. I'm just toying with this to practice making output files.
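
For reference, the output layout described above (gene name and description in the header, sequence underneath) could be produced by looping over the keys of the sequence hash and looking up the matching description, rather than printing all keys and all values at once. This is only a minimal sketch under some assumptions: `format_fasta` is a hypothetical helper that is not part of the script above, the 60-character line width is an arbitrary choice, and the sample hashes stand in for `%File1` and `%File1_D`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): build FASTA text from
# a sequence hash and a description hash that share gene names as keys.
sub format_fasta {
    my ($seq_ref, $desc_ref, $width) = @_;
    $width ||= 60;    # assumed default line width, chosen for illustration
    my $out = '';
    for my $name (sort keys %$seq_ref) {
        my $desc = $desc_ref->{$name};
        # Header line: ">name description" (description omitted if missing)
        if (defined $desc && length $desc) {
            $out .= ">$name $desc\n";
        }
        else {
            $out .= ">$name\n";
        }
        # Emit the sequence in chunks of at most $width characters
        my $seq = $seq_ref->{$name};
        $out .= "$1\n" while $seq =~ /(.{1,$width})/g;
    }
    return $out;
}

# Sample data standing in for %File1 (sequences) and %File1_D (descriptions)
my %seqs  = (geneA => 'ATGCATGCAT', geneB => 'GGCC');
my %descs = (geneA => 'example kinase', geneB => 'example ligase');

print format_fasta(\%seqs, \%descs, 4);
```

To write to a file instead of standard output, the same string could be printed to the already opened handle, for example `print {$output1} format_fasta(\%File1, \%File1_D);`. Sorting the keys is a design choice to make the output order repeatable, since Perl hashes do not preserve the order in which genes appeared in the input.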


In reply to Print Output to New File by FIJI42
