Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!
So, I have a file like the following:
    236 28KD_MYCLE Train
    MPNRRRCKLSTAISTVATLAIASPCAYFLVYEPTASAKPAAKHYEFKQAASIADLPGEVLDAISQGLSQFGINLPPVPSL
    TGTDDPGNGLRTPGLTSPDLTNQELGTPVLTAPGTGLTPPVTGSPICTAPDLNLGGTCPSEVPITTPISLDPGTDGTYPI
    LGDPSTLGGTSPISTSSGELVNDLLKVANQLGASQVMDLIKGVVMPAVMQGVQNGNVAGDLSGSVTPAAISLIPVT
    SSSSSSSSSSSSSSSSSSSSSS..........................................................
    ................................................................................
    ............................................................................
    //
    338 A85A_MYCTU Train
    MQLVDRVRGAVTGMSRRLVVGAVGAALVSGLVGAVGGTATAGAFSRPGLPVEYLQVPSPSMGRDIKVQFQSGGANSPALY
    LLDGLRAQDDFSGWDINTPAFEWYDQSGLSVVMPVGGQSSFYSDWYQPACGKAGCQTYKWETFLTSELPGWLQANRHVKP
    TGSAVVGLSMAASSALTLAIYHPQQFVYAGAMSGLLDPSQAMGPTLIGLAMGDAGGYKASDMWGPKEDPAWQRNDPLLNV
    GKLIANNTRVWVYCGNGKPSDLGGNNLPAKFLEGFVRTSNIKFQDAYNAGGGHNGVFDFPDSGTHSWEYWGAQLNAMKPD
    LQRALGATPNTGPAPQGA
    SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS......................................
    ................................................................................
    ................................................................................
    ................................................................................
    ..................
    //
    325 A85B_MYCBO
    MTDVSRKIRAWGRRLMIGTAAAVVLPGLVGLAGGAATAGAFSRPGLPVEYLQVPSPSMGRDIKVQFQSGGNNSPAVYLLD
    GLRAQDDYNGWDINTPAFEWYYQSGLSIVMPVGGQSSFYSDWYSPACGKAGCQTYKWETFLTSELPQWLSANRAVKPTGS
    AAIGLSMAGSSAMILAAYHPQQFIYAGSLSALLDPSQGMGPSLIGLAMGDAGGYKAADMWGPSSDPAWERNDPTQQIPKL
    VANNTRLWVYCGNGTPNELGGANIPAEFLENFVRSSNLKFQDAYNAAGGHNAVFNFPPNGTHSWEYWGAQLNAMKGDLQS
    SLGAG
    SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS........................................
    ................................................................................
    ................................................................................
    ................................................................................
    .....
    //
    325 A85B_MYCKA
    MTDVSGKIRAWGRRLLVGAAAAAALPGLVGLAGGAATAGAFSRPGLPVEYLQVPSAAMGRSIKVQFQSGGDNSPAVYLLD
    GLRAQDDYNGWDINTPAFEWYYQSGLSVIMPVGGQSSFYSDWYSPACGKAGCTTYKWETFLTSELPQWLSANRSVKPTGS
    AAVGISMAGSSALILSVYHPQQFIYAGSLSALMDPSQGMGPSLIGLAMGDAGGYKASDMWGPSSDPAWQRNDPSLHIPEL
    VANNTRLWIYCGNGTPSELGGANVPAEFLENFVRSSNLKFQDAYNAAGGHNAVFNLDANGTHSWEYWGAQLNAMKGDLQA
    SLGAR
    SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS........................................
    ................................................................................
    ................................................................................
    ................................................................................
    .....
    //

What I want is to print the line with the ID (namely the one that starts with a number), followed by two lines: one with the sequence of letters and another with the SSSSSSSSSSS....... sequence, each joined into a single line.
For some reason the code that I have tried does not print anything.
    $/="//\n";
    while(<>) {
        if($_=~/^(\d+)\s+(.*)/m) {$seq_length=$1; $ID=$2;}
        $seq='';
        while($_!~/([A_z]+)/mg) {
            $part1=$1;
            $seq.=$seq.$part1;
        }
        $top='';
        while($_=~/([S\.]+)/mg) {$part2=$1; $top.=$top.$part2;}
        print ">$ID\n$seq\n$top\n";
    }
    $/="\n";
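
Looking at this attempt, there seem to be three problems. `[A_z]` is a character class containing only the three characters `A`, `_` and `z` (presumably `[A-Z]` was intended); since the sample records contain `A`, the first match succeeds, `!~` returns false, and the sequence loop body never runs, so `$seq` stays empty. `[S\.]+` also matches the isolated `S` characters inside the sequence lines. Finally, `$top .= $top . $part2` appends `$top` to itself on every pass, so its length roughly doubles with each match; after a few dozen matches it is enormous, which is a plausible reason the script seems to print nothing. The sketch below keeps the same `$/ = "//\n"` record-at-a-time idea but classifies whole lines instead. It assumes the layout of the sample (a line starting with a number, then lines of capital letters, then lines made only of `S` and `.`); `parse_record` is a hypothetical name, and the script reads the files named on the command line.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: turn one "//"-terminated record into
# (ID, joined sequence, joined S/. string) by classifying whole lines.
sub parse_record {
    my ($record) = @_;
    my ($id, $seq, $top) = ('', '', '');
    for my $line (split /\n/, $record) {
        if ($line =~ /^\d+\s+(.*)$/) {
            $id = $1;                 # header: length, then the ID
        }
        elsif ($line =~ /^[S.]+$/) {
            $top .= $line;            # S/. line: appended once, no doubling
        }
        elsif ($line =~ /^[A-Z]+$/) {
            $seq .= $line;            # sequence line
        }
        # anything else (e.g. the "//" separator) is ignored
    }
    return ($id, $seq, $top);
}

# Read each file named on the command line one record at a time.
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    local $/ = "//\n";
    while (my $record = <$fh>) {
        my ($id, $seq, $top) = parse_record($record);
        print ">$id\n$seq\n$top\n" if length $id;   # skip empty trailing chunks
    }
    close $fh;
}
```

Run it as `perl script.pl yourfile`. Because each line is tested as a whole, an `S` inside a sequence line is never mistaken for part of the S/. string.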

Re: create an one-line format for this
by McA (Priest) on Oct 01, 2013 at 22:47 UTC

    Hi,

    this would be my approach, based on what you wrote:

        #!/usr/bin/env perl

        use strict;
        use warnings;

        my $filename = 'file';

        open my $fh, "<", $filename or die $!;

        local $/ = "//\n";    # read one "//"-terminated record at a time

        while(defined(my $line = <$fh>)) {
            chomp $line;    # removes the trailing "//\n"
            my ($seq_length, $ID, $top, $seq);
            foreach my $part (split /\n/, $line) {
                # header line: length followed by the ID
                if($part =~ /^(\d+)\s+(.*)$/) {
                    $seq_length = $1;
                    $ID = $2;
                    next;
                }
                # line made only of S and . characters
                if($part =~ /^([S\.]+)$/) {
                    $top .= $1;
                    next;
                }
                # anything else is part of the sequence
                $seq .= $part;
            }
            print ">$ID\n$seq\n$top\n";
        }

        close $fh;

    Best regards
    McA

      Thank you very much for your help!
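
McA's loop captures `$seq_length` but never uses it. Since that number appears to be the sequence length (the OP calls it `$seq_length`, and Kenosis's reply treats it as the sequence length), a small check could flag records that were not split as expected. `check_lengths` is a hypothetical helper, not part of McA's code, and it assumes the S/. string should have the same length as the sequence.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: compare the joined strings against the number at the
# start of the record, which is assumed to be the sequence length.
# Returns a list of problem descriptions (empty when both lengths match).
sub check_lengths {
    my ($seq_length, $id, $seq, $top) = @_;
    my @problems;
    for my $pair (['sequence', $seq], ['S/. string', $top]) {
        my ($what, $string) = @$pair;
        my $got = length($string // '');    # $seq/$top stay undef if no line matched
        push @problems, "$id: $what has $got chars, expected $seq_length"
            if $got != $seq_length;
    }
    return @problems;
}
```

Inside McA's `while` loop it could be called just before the `print`, for example `warn "$_\n" for check_lengths($seq_length, $ID, $seq, $top);`.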
Re: create an one-line format for this
by Kenosis (Priest) on Oct 01, 2013 at 23:50 UTC

    Here's another option:

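        use strict;
        use warnings;

        local $/ = "//\n";

        while (<>) {
            chomp;    # remove the trailing "//\n"
            # length, ID (rest of the first line), then everything after it
            my ( $len, $id, $parts ) = /(\d+)\s+(.+?)\n(.+)/s;
            $parts =~ s/\n//g;       # join the wrapped lines into one string
            $parts =~ /.{$len}/p;    # first $len chars: the sequence; the rest: the S/. string
            print ">$id\n${^MATCH}\n${^POSTMATCH}\n"
        }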

    If you're using Perl v5.14+, the non-destructive /r modifier lets you combine the substitution and the match into a single statement:

        $parts =~ s/\n//gr =~ /.{$len}/p;

    Hope this helps!
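
Kenosis's version relies on `/p` with `${^MATCH}` and `${^POSTMATCH}`; on Perl 5.18 and earlier those variables are only guaranteed to be defined when `/p` is used. The same length-based split can be written with `substr`, which avoids the special variables altogether. A minimal sketch under the same record layout; `split_by_length` is a hypothetical name, and the `/r` join assumes Perl 5.14 or later.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: split one record into (ID, sequence, S/. string)
# using the leading length, as Kenosis's reply does, but with substr
# instead of /p and ${^MATCH}/${^POSTMATCH}. Needs Perl 5.14+ for s///r.
sub split_by_length {
    my ($record) = @_;
    (my $body = $record) =~ s{//\s*\z}{};        # drop the "//" terminator if present
    my ($len, $id, $parts) = $body =~ /^\s*(\d+)[ \t]+([^\n]*)\n(.*)/s
        or return;                                 # no header line: empty list
    my $joined = $parts =~ s/\s+//gr;              # join the wrapped lines
    return ($id, substr($joined, 0, $len), substr($joined, $len));
}
```

In Kenosis's loop, `my ($id, $seq, $top) = split_by_length($_) or next;` could stand in for the three lines after `chomp;`, with the `print` using `$seq` and `$top`. Only the `/r` join needs 5.14; `substr` itself works on any Perl 5.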

Re: create an one-line format for this
by hdb (Monsignor) on Oct 02, 2013 at 06:41 UTC

    As an alternative, read the input line by line and decide what to do with each type of line:

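        use strict;
        use warnings;

        while(<>){
            print and next if /^\d+\s/;      # ID line: print it unchanged, newline included
            chomp;                           # remaining lines are printed without their newline
            print if /^([A-Z]+|[.]+)$/;      # sequence lines and dots-only lines
            print "\n$_" if /^S+[.]*$/;      # first S/. line starts a new output line
            print "\n" if m|^//$|;           # end of record
        }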
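
hdb's loop prints the ID line as it is, including the leading length and without the `>` that the original attempt printed. The sketch below is a hedged variation that emits `>ID` instead. It is written as a hypothetical `reformat_lines` sub over filehandles so it can be exercised with in-memory files, and like hdb's version it assumes that only the first `S`/`.` line of each record begins with `S`.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical variation on hdb's loop: reads lines from $in, writes to $out,
# and prints ">ID" (length dropped) instead of the raw header line.
# Assumes only the first S/. line of each record begins with 'S'.
sub reformat_lines {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        chomp $line;
        if ($line =~ /^\d+\s+(.*)$/) {
            print {$out} ">$1\n";          # header: drop the length, add '>'
        }
        elsif ($line =~ /^S+\.*$/) {
            print {$out} "\n$line";        # first S/. line: end the sequence line first
        }
        elsif ($line =~ /^(?:[A-Z]+|\.+)$/) {
            print {$out} $line;            # sequence line or dots-only continuation
        }
        elsif ($line eq '//') {
            print {$out} "\n";             # end of record
        }
    }
    return;
}

# Process each file named on the command line.
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    reformat_lines($fh, \*STDOUT);
    close $fh;
}
```

A sequence line such as `SLGAG` still counts as sequence, because `^S+\.*$` requires the rest of the line to be dots.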