G'day BiologySwede,
Welcome to the monastery.
There are some issues with what you've posted:
-
You didn't state whether any sequences could contain no Ns.
I've assumed the sequences might not have Ns.
The code I've provided below can be shortened if that's not the case; however, it will work with either case as written.
-
You provided an example of a problematic sequence but didn't say whether:
1. you didn't want the initial zero-length string that your split would produce, or
2. you actually wanted to retain the leading (and/or trailing) Ns.
I've assumed (1).
The following code eliminates the need for an interim %sequence hash, requires no regex for split and reduces your code substantially (all processing occurs in a single statement). Also note that I've added some additional test data.
#!/usr/bin/env perl -l
use strict;
use warnings;
/^[^>]/ && do { y/N/ /; print join "\n" => split } while <DATA>;
__DATA__
>fasta1
NNNAGTCTGCAAANAATTTGCGGCTCACAAT
>fasta2
CGCAGCCATTAACATCTCAACAAGCCAAAAATTCCTTCTCAGAAATTCGGNNN
>mytest1
NNNACGTNNTGCANN
>mytest2
ACGTNNCGTANNNNNGTACNTACG
>mytest3
TGCA
Output:
AGTCTGCAAA
AATTTGCGGCTCACAAT
CGCAGCCATTAACATCTCAACAAGCCAAAAATTCCTTCTCAGAAATTCGG
ACGT
TGCA
ACGT
CGTA
GTAC
TACG
TGCA
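If the one-liner is too terse, here's roughly the same logic spelled out. Treat it as a sketch only: the sub name split_on_n is my own invention, the sample lines are held in an array rather than read from DATA, and it splits on runs of N with a regex instead of using the y/N/ / trick.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (the name is mine): return the non-empty
# fragments of one sequence line, split on runs of N.
sub split_on_n {
    my ($seq) = @_;
    # split drops trailing empty fields; grep drops the leading one
    return grep { length } split /N+/, $seq;
}

# A few lines of the test data, inlined to keep the sketch self-contained.
my @lines = (
    '>mytest1',
    'NNNACGTNNTGCANN',
    '>mytest3',
    'TGCA',
);

for my $line (@lines) {
    next if $line eq '' or $line =~ /^>/;   # skip blank lines and fasta headers
    print "$_\n" for split_on_n($line);
}
```

One small difference: a line consisting only of Ns produces no output here, whereas the one-liner would print an empty line for it.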
Here are some additional tips regarding the code you posted:
-
Hashes have no inherent ordering. "keys %sequence" will probably return a different order to that in the original fasta file. I don't know if that's important to you.
-
Get into the habit of using the 3-argument form of open with a lexical filehandle.
-
It's easy to forget to check for I/O errors (as you did with "open (OUTFILE,">fasta_report.txt");").
Consider using the autodie pragma: it's a lot less work for you and removes the possibility of forgetting the I/O checks.
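The three tips above might look something like this when put together. It's only a sketch under a few assumptions of mine: the variable names (@order, $in_fh, $out_fh, $report) are made up, and both filehandles are opened on in-memory strings so the example runs anywhere. In real code you'd pass your fasta filename and 'fasta_report.txt' in place of the scalar references.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;    # open and close now die with a useful message on failure

# Sample input held in a string (an assumption, to keep this self-contained).
my $fasta = ">fasta1\nNNNAGTC\n>fasta2\nCGCANNN\n";

my (%sequence, @order, $id);

open my $in_fh, '<', \$fasta;    # 3-argument open, lexical filehandle
while (my $line = <$in_fh>) {
    chomp $line;
    if ($line =~ /^>(.+)/) {
        $id = $1;
        # Remember the order in which IDs first appear in the file.
        push @order, $id unless exists $sequence{$id};
        $sequence{$id} //= '';
    }
    elsif (defined $id) {
        $sequence{$id} .= $line;
    }
}
close $in_fh;

# In real use: open my $out_fh, '>', 'fasta_report.txt';
open my $out_fh, '>', \my $report;
print {$out_fh} "$_\t$sequence{$_}\n" for @order;   # file order, not hash order
close $out_fh;

print $report;
```

Iterating over @order rather than "keys %sequence" is what keeps the report in the same order as the input; the hash is still there for lookups by ID.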