in reply to Converting fasta (with multiple sequences) into tabular using perl

Hi, here is one solution. I'm using Path::Tiny for file handling, and localizing the special variable $/ (the input record separator) to '>', as you hinted at, so that each read returns one whole FASTA record instead of one line.

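use strict;
use warnings;
use feature 'say';
use Path::Tiny;

# Read one '>'-delimited record at a time instead of one line at a time.
local $/ = '>';

my $fh = path('./foo.fasta')->openr;

while ( my $paragraph = <$fh> ) {
    chomp $paragraph;    # strips the trailing '>'
    # The chunk before the first '>' is empty, so skip it.
    my @lines = split /\n/, $paragraph or next;
    my ( $identifier, $string );
    for my $line ( @lines ) {
        if ( $line =~ /(sequence\d+)/ ) {
            $identifier = $1;     # header line
        }
        else {
            $string .= $line;     # sequence line: append to the record
        }
    }
    say "$identifier\t$string";
}
__END__
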
I used the following input file:

>sequence1
ACTCCCCGTGCGCGCCCGGCCCGTAGCGTCCTCGTCGCCGCCCCTCGTCTCGCAGCCGCAGCCCGCGTGG
ACGCTCTCGCCTGAGCGCCGCGGACTAGCCCGGGTGGCC
> sequence2
CAGTCCGGCAGCGCCGGGGTTAAGCGGCCCAAGTAAACGTAGCGCAGCGATCGGCGCCGGAGATTCGCGA
ACCCGACACTCCGCGCCGCCCGCCGGCCAGGACCCGCGGCGCGATCGCGGCGCCGCGCTACAGCCAGCCT
CACTGGCGCGCGGGCGAGCGCACGGGCGCTC
>randomstuff sequence3
CACGACAGGCCCGCTGAGGCTTGTGCCAGACCTTGGAAACCTCAGGTATATACCTTTCCAGACGCGGGAT
CTCCCCTCCCC
> sequence4 blahblah
CAGCAGACATCTGAATGAAGAAGAGGGTGCCAGCGGGTATGAGGAGTGCATTATCGTTAATGGGAACTTC
AGTGACCAGTCCTCAGACACGAAGGATGCTCCCTCACCCCCAGTCTTGGAGGCAATCTGCACAGAGCCAG
TCTGCACACC

and got the following output:

$ perl foo.pl
sequence1	ACTCCCCGTGCGCGCCCGGCCCGTAGCGTCCTCGTCGCCGCCCCTCGTCTCGCAGCCGCAGCCCGCGTGGACGCTCTCGCCTGAGCGCCGCGGACTAGCCCGGGTGGCC
sequence2	CAGTCCGGCAGCGCCGGGGTTAAGCGGCCCAAGTAAACGTAGCGCAGCGATCGGCGCCGGAGATTCGCGAACCCGACACTCCGCGCCGCCCGCCGGCCAGGACCCGCGGCGCGATCGCGGCGCCGCGCTACAGCCAGCCTCACTGGCGCGCGGGCGAGCGCACGGGCGCTC
sequence3	CACGACAGGCCCGCTGAGGCTTGTGCCAGACCTTGGAAACCTCAGGTATATACCTTTCCAGACGCGGGATCTCCCCTCCCC
sequence4	CAGCAGACATCTGAATGAAGAAGAGGGTGCCAGCGGGTATGAGGAGTGCATTATCGTTAATGGGAACTTCAGTGACCAGTCCTCAGACACGAAGGATGCTCCCTCACCCCCAGTCTTGGAGGCAATCTGCACAGAGCCAGTCTGCACACC
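
If your real headers don't contain "sequenceN", or you'd rather avoid a CPAN dependency, the same $/ trick can be written with core Perl only. This is just a minimal sketch, not a tested replacement for the script above. The sub name fasta_to_tab is hypothetical, and it assumes the whole header line (trimmed) is the identifier and that '>' never appears inside a header. It reads from an in-memory string so it runs without an input file.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): turn FASTA text
# read from a filehandle into "identifier<TAB>sequence" rows.
sub fasta_to_tab {
    my ($fh) = @_;
    local $/ = '>';    # one read returns everything up to the next '>'
    my @rows;
    while ( my $record = <$fh> ) {
        chomp $record;    # strips the trailing '>' (chomp follows $/)
        my ( $header, @seq ) = split /\n/, $record;
        # The chunk before the first '>' is empty; skip it.
        next unless defined $header && $header =~ /\S/;
        # Assumption: the whole header line, trimmed, is the identifier.
        $header =~ s/^\s+|\s+$//g;
        my $sequence = join '', @seq;
        $sequence =~ s/\s+//g;    # drop stray whitespace such as "\r"
        push @rows, "$header\t$sequence";
    }
    return @rows;
}

# Demo on an in-memory string instead of a file.
my $fasta = ">seq1\nACGT\nTTGA\n> seq2 note\nGGCC\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory file: $!";
print "$_\n" for fasta_to_tab($fh);
close $fh;
```

Keeping the local $/ inside the sub means the separator change ends when the sub returns, so the rest of the program still reads line by line.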

Hope this helps!


The way forward always starts with a minimal test.

Re^2: Converting fasta (with multiple sequences) into tabular using perl
by rarenas (Acolyte) on Dec 14, 2017 at 10:47 UTC

    Thank you! This was indeed helpful. :)