in reply to Manipulating tab delimited file

(Update: added $species detection from @ARGV)

Here's the code...

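# The first argument is the species name; any remaining arguments are input files read by <>.
my ($c, $species) = (0, shift);
while (<>) {
    chomp;
    next unless length;    # skip blank lines
    my ($seq, $scount) = split /\s+/;
    # Keep only sequences seen at least twice and 15 to 30 characters long.
    next if $scount < 2 || length($seq) < 15 || length($seq) > 30;
    print ">$species" . $c++ . "_count=$scount\n$seq\n";
}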

If it's named process-sequences, it could be invoked as:

process-sequences speciesname inputfilename >outputfilename
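
To make the filter easier to check, here is a rough sketch of the same rules pulled into a small sub and run against three made-up lines. The sub name fasta_record and the sample data are only for illustration; they are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a FASTA-style record for one input line,
# or undef if the line is blank or fails the count/length filters.
sub fasta_record {
    my ($species, $index, $line) = @_;
    chomp $line;
    return unless length $line;
    my ($seq, $scount) = split /\s+/, $line;
    return if $scount < 2 || length($seq) < 15 || length($seq) > 30;
    return ">$species${index}_count=$scount\n$seq\n";
}

# Made-up sample input: sequence, whitespace, count.
my @lines = (
    "ACGTACGTACGTACGTACGT\t5\n",    # 20 characters, count 5: kept
    "ACGTACGT\t9\n",                # 8 characters: too short, skipped
    "ACGTACGTACGTACGTACGT\t1\n",    # count below 2: skipped
);

my $c = 0;
for my $line (@lines) {
    my $record = fasta_record('bird', $c, $line);
    next unless defined $record;
    print $record;
    $c++;    # the counter only advances for records that are printed
}
```

With that sample data, only the first line passes, so the output is a single record: ">bird0_count=5" followed by the sequence on the next line.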

It's too bad the filenames aren't named for the species they represent, because then you could do something like this:

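my ($c, $species) = (0, $ARGV[0]);
# Use low-precedence "or" so that die actually fires when open or close fails.
open my $outfh, '>', "$species.new" or die "Can't open $species.new: $!";
while (<>) {
    chomp;
    if (length) {
        my ($seq, $scount) = split /\s+/;
        if ($scount >= 2 && length($seq) >= 15 && length($seq) <= 30) {
            print $outfh ">$species" . $c++ . "_count=$scount\n$seq\n";
        }
    }
    # eof without parentheses is true at the end of each input file;
    # eof() would only be true at the end of the very last one.
    if (eof) {
        close $outfh or die "Can't close $species.new: $!";
        last unless @ARGV;    # no more input files
        # <> has already shifted the current file off @ARGV, so $ARGV[0] is the next one.
        ($species, $c) = ($ARGV[0], 0);
        open $outfh, '>', "$species.new" or die "Can't open $species.new: $!";
    }
}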

That would be invoked with a list of filenames, each named for its species, and would write one speciesname.new output file per input file:

process-sequences bird bee cat dog

Dave

Re^2: Manipulating tab delimited file
by andyBio (Novice) on Apr 29, 2016 at 04:41 UTC
    Thanks for the help, Dave! I appreciate your wisdom!