tstrobaugh has asked for the wisdom of the Perl Monks concerning the following question:

I'm running Strawberry Perl 5.20.2 on Windows x64. It's my first day, and I'm trying to take over some bioinformatics work from a person who is leaving (I have no Perl experience). Could you tell me what switches/options there are for the script below? I can get it to work using: perl split_FASTA-no_overlap.pl "filename". But the output just goes to the "terminal", and I would like it to write the files to the directory for import into a sequencing program. Sorry if the question is not in the correct form. Thanks.

#! /opt/local/compbio/bin/perl5.8.0 -w

$size = 5000;
$overlap = 1;
$overlap_size = 0.00*$size;
$f = $ARGV[0];

local $/ = "\n>";    # read one FASTA record at a time
open(IN, "$f") || die "$f no\n";
while(<IN>){
    s/>//g;
    $id = '';
    $annt = '';
    @ln = split /\n/;
    $defLine = shift @ln;
    $id = $1 if $defLine =~ /^(\S+)\s?/;
    $annt = $1 if $defLine =~ /^\S+\s+(.+)$/;
    $seq = join("",@ln);
    for(my $i=0; ($i*$size)<(length$seq); $i++){
        # chunk number $i of the sequence, $size characters long
        $subSeq = substr($seq,$i*$size, $size);
        $subSeq =~ s/(\w{60})/$1\n/g;    # wrap at 60 characters per line
        chomp $subSeq;
        $start = $i*$size +1;
        $end = $start + (length$subSeq) - 1;
        print ">$id:[",$start,'-',$end,"] $annt\n$subSeq\n";
        if($overlap == 1 && ($i*$size+$overlap_size) < (length$seq)){
            # same chunk again, shifted by $overlap_size
            $subSeq = substr($seq,$i*$size+$overlap_size, $size);
            $subSeq =~ s/(\w{60})/$1\n/g;
            chomp $subSeq;
            $start = $i*$size +1+$overlap_size;
            $end = $start + (length$subSeq) - 1;
            print ">$id:[",$start,'-',$end,"] $annt\n$subSeq\n";
        }
    }
}

Replies are listed 'Best First'.
Re: beginner help running perl script
by CountZero (Bishop) on May 08, 2015 at 18:57 UTC
    There is nothing in this script that allows you to redirect the output to a different file every 5000 lines. There are no switches or options for Perl to do that.

    You can of course redirect the output to a file from the command line:

    perl split_FASTA-no_overlap.pl "filename" > output.file

    Or you can "pipe" the output of this program to the input of another Perl program which then saves the data in a file and starts a new file every 5000 lines:

    perl split_FASTA-no_overlap.pl "filename" | perl splitter.pl -L5000
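
    The splitter.pl named above is not an existing program. A minimal sketch of what such a script might look like follows, assuming a -L option that gives the number of lines per file and output files named part_0001.txt, part_0002.txt, and so on (the sub name, the option handling, and the file names are all hypothetical):

```perl
use strict;
use warnings;

# Copy lines from $in_fh into numbered files, starting a new file
# every $lines_per_file lines. Returns the list of files written.
# (Hypothetical helper; names and file naming scheme are assumptions.)
sub split_lines {
    my ($in_fh, $lines_per_file, $prefix) = @_;
    my (@written, $out);
    my $count = 0;
    while (my $line = <$in_fh>) {
        if ($count % $lines_per_file == 0) {
            # Time for a new file: close the previous one, open the next.
            close($out) if $out;
            my $path = sprintf '%s_%04d.txt', $prefix, scalar(@written) + 1;
            open($out, '>', $path) or die "Cannot write $path: $!";
            push @written, $path;
        }
        print {$out} $line;
        $count++;
    }
    close($out) if $out;
    return @written;
}

# Only runs when called as: perl splitter.pl -L5000
# Input is then taken from STDIN, i.e. from the pipe.
if (@ARGV && $ARGV[0] =~ /^-L([1-9]\d*)$/) {
    split_lines(\*STDIN, $1, 'part');
}
```

    Note that this counts output lines, not sequences, so a file boundary can fall in the middle of a FASTA record.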

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
Re: beginner help running perl script
by toolic (Bishop) on May 08, 2015 at 18:13 UTC
    By default, print goes to the STDOUT filehandle (your terminal). To direct your output to a file, open a file for output, and print to the new filehandle. UNTESTED:
    $size = 5000;
    $overlap = 1;
    $overlap_size = 0.00*$size;
    $f = $ARGV[0];

    local $/ = "\n>";
    # open the output file once; every print below writes to it
    open $fh_out, '>', 'out.txt' or die "file out.txt: $!";
    open(IN, "$f") || die "$f no\n";
    while(<IN>){
        s/>//g;
        $id = '';
        $annt = '';
        @ln = split /\n/;
        $defLine = shift @ln;
        $id = $1 if $defLine =~ /^(\S+)\s?/;
        $annt = $1 if $defLine =~ /^\S+\s+(.+)$/;
        $seq = join("",@ln);
        for(my $i=0; ($i*$size)<(length$seq); $i++){
            $subSeq = substr($seq,$i*$size, $size);
            $subSeq =~ s/(\w{60})/$1\n/g;
            chomp $subSeq;
            $start = $i*$size +1;
            $end = $start + (length$subSeq) - 1;
            print $fh_out ">$id:[",$start,'-',$end,"] $annt\n$subSeq\n";
            if($overlap == 1 && ($i*$size+$overlap_size) < (length$seq)){
                $subSeq = substr($seq,$i*$size+$overlap_size, $size);
                $subSeq =~ s/(\w{60})/$1\n/g;
                chomp $subSeq;
                $start = $i*$size +1+$overlap_size;
                $end = $start + (length$subSeq) - 1;
                print $fh_out ">$id:[",$start,'-',$end,"] $annt\n$subSeq\n";
            }
        }
    }
    close $fh_out;

    See also

      Thanks for that. It does output to a text file now, but it has "doubles" of every output and it's all in one file, as opposed to each output of 5000 sequences being in a separate file. I don't think there is any need to rewrite the Perl script though; he was using this script and generating the separate files. I just don't know the correct way to execute it with the proper switches or formatting to make that happen. I don't think this guy (before me) wrote scripts, he just used available ones.
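
    The script as posted reads no switches, so separate files would have to come from a changed script or from another tool. One possible approach, shown here only as a sketch, is to open a new output file for each chunk inside the loop. The sub name, the output directory argument, and the file naming scheme (id_start-end.fasta) are assumptions, not part of the original script. The overlap branch is left out, which also avoids the "doubles" that appear when $overlap is 1 and $overlap_size is 0.

```perl
use strict;
use warnings;
use File::Spec;

# Split every record of a FASTA file into pieces of at most $size
# characters and write each piece to its own file in $out_dir.
# Returns the list of files written. (Hypothetical helper.)
sub split_fasta_to_files {
    my ($in_path, $out_dir, $size) = @_;
    $size ||= 5000;
    my @written;

    open(my $in, '<', $in_path) or die "Cannot read $in_path: $!";
    local $/ = "\n>";                   # one FASTA record per read
    while (my $record = <$in>) {
        chomp $record;                  # drop the trailing "\n>" separator
        $record =~ s/^>//;              # the first record still has its ">"
        my ($def_line, @lines) = split /\r?\n/, $record;
        next unless defined $def_line && $def_line =~ /^(\S+)\s*(.*)$/;
        my ($id, $annt) = ($1, $2);

        my $seq = join '', @lines;
        $seq =~ s/\s+//g;               # remove any stray whitespace
        (my $safe_id = $id) =~ s/[^\w.-]/_/g;   # id made safe for file names

        for (my $offset = 0; $offset < length($seq); $offset += $size) {
            my $piece = substr($seq, $offset, $size);
            my $start = $offset + 1;
            my $end   = $offset + length($piece);
            (my $wrapped = $piece) =~ s/(.{1,60})/$1\n/g;   # 60 per line

            my $name = "${safe_id}_${start}-${end}.fasta";
            my $path = File::Spec->catfile($out_dir, $name);
            open(my $out, '>', $path) or die "Cannot write $path: $!";
            print {$out} sprintf(">%s:[%d-%d] %s\n", $id, $start, $end, $annt);
            print {$out} $wrapped;
            close($out) or die "Cannot close $path: $!";
            push @written, $path;
        }
    }
    close $in;
    return @written;
}

# Only runs when called as: perl script.pl input.fasta out_dir [size]
if (@ARGV >= 2) {
    print "$_\n" for split_fasta_to_files(@ARGV);
}
```

    The output directory must already exist; the sketch does not create it.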

Re: beginner help running perl script
by vinoth.ree (Monsignor) on May 08, 2015 at 18:18 UTC

    From perldoc -f open:

    open STDOUT, '>', "foo.out"

    The docs are your friend.
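
    Reopening STDOUT this way means the existing print statements need no changes, because everything printed afterwards goes to the file. A small sketch of the idea, wrapped in a helper that also restores the original STDOUT when done (the helper name with_stdout_to_file is made up for illustration):

```perl
use strict;
use warnings;

# Run a block of code with STDOUT temporarily redirected to $path,
# then put the original STDOUT back. (Hypothetical helper.)
sub with_stdout_to_file {
    my ($path, $code) = @_;
    open(my $saved, '>&', \*STDOUT) or die "Cannot dup STDOUT: $!";
    open(STDOUT, '>', $path)        or die "Cannot open $path: $!";
    $code->();                      # plain print calls now write to $path
    close(STDOUT)                   or die "Cannot close $path: $!";
    open(STDOUT, '>&', $saved)      or die "Cannot restore STDOUT: $!";
    close($saved);
    return;
}
```

    Usage might look like with_stdout_to_file('foo.out', sub { print ">seq1\nACGT\n" }); the result is still a single output file, the same as redirecting with > on the command line.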


    All is well. I learn by answering your questions...