Let's suppose for a moment that you wanted to split the data at the lines containing "INPUT SEQUENCE=" and use the number following the = as part of the name of the output file. Then you could do something like:

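use strict;
use warnings;

# Use the marker text as the input record separator, so each read returns
# everything up to and including the next 'INPUT SEQUENCE='.
$/ = 'INPUT SEQUENCE=';

while (<DATA>) {
    chomp;                      # remove the trailing marker
    next unless length;         # skip the empty record before the first marker
    next unless s/^(\d+)\n//;   # capture the sequence number and strip its line
    print "Start of file seq$1.dat\n";
    print "$_\n";
}

__DATA__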
INPUT SEQUENCE=6618
>P40757|ALN_RANCA Allantoinase, mitochondrial precursor - Rana catesbeiana (Bull frog)
MALKSKPGIMNITPGSKISVIRSKRVIQANTISSCDIIISDGKISSVLAWGKHVTSGAKLLDVGDLVVMA
GIIDPHVHVNEPGRTDWEGYRTATLAAAAGGITAIVDMPLNSLPPTTSVTNFHTKLQAAKRQCYVDVAFW
GGVIPDNQVELIPMLQAGVAGFKCFLINSGVPEFPHVSVTDLHTAMSELQGTNSVLLFHAELEIAKPAPE
IGDSTLYQTFLDSRPDDMEIAAVQLVADLCQQYKVRCHIVHLSSAQSLTIIRKAKEAGAPLTVETTHHYL
SLSSEHIPPGATYFKCCPPVRGHRNKEALWNALLQGHIDMVVSDHSPCTPDLKLLKEGDYMKAWGGISSL
QFGLPLFWTSARTRGFSLTDVSQLLSSNTAKLCGLGIVKEPLKWVMMLIWSSGILTKSFRCKKMIFITRI
SSPHIWDSFFKEKSWLLLFEGLLFISKGSMLPNQLENLFLYTLWSLVKPVHPVHPIIRKNLPHI

Total Number of residues in the sequence =484
---------------------------------------------------------------------------------------------------------
Number of residues in the repeat = 5
AAAAG 96 to 100
AAAGG 97 to 101
---------------------------------------------------------------------------------------------------------
Number of residues in the repeat = 7
AGVAGFK 157 to 163
GGISSLQ 345 to 351
---------------------------------------------------------------------------------------------------------
Minimum number of amino-acids present in the distant repeat is = 5
Maximum number of amino-acids present in the distant repeat is = 7
Total number of distant repeats found = 62
_________________________________________________________________________________________________________
INPUT SEQUENCE=6619
>Q9RKU5|ALN_STRCO Probable allantoinase - Streptomyces coelicolor
MSEAELVLRSTRVITPEGTRAASVAVTGEKITAVLPYDAPVPAGARLEDVGDHVVLPGLVDTHVHVNDPG
RTEWEGFWTATRAAAAGGITTLVDMPLNSIPPTTTVDNLRTKREVAADKAHIDVGFWGGALPDNVKDLRP
LHEAGVFGFKAFLSPSGVDEFPHLDQEQLARSLAEIAAFDGLLIVHAEDPHHLAAAPQQGGPKYTHFLAS
RPRDAEDTAIATLLAQAKRFNARVHVLHLSSSDALPLIAEARADGVRVTVETCPHYLTLTAEEVPDGASE
FKCCPPIREAANQDLLWQALADGTIDCVVTDHSPSTADLKTDDFATAWGGIAGLQLSLPAMWTAARGRGL
GLEDVVRWMSERTAALVGLDARKGAIAPGHDADFAVLAPDETFTVDPAALQHRNRVTAYAGKTLYGVVKS
TWLRGERIVADGAFTDPKGQLLDRA

Total Number of residues in the sequence =445
---------------------------------------------------------------------------------------------------------
Number of residues in the repeat = 5
AAAAG 83 to 87
AAAGG 84 to 88
PSTAD 314 to 318
---------------------------------------------------------------------------------------------------------
Number of residues in the repeat = 6
AAAGGI 84 to 89
SSSDAL 240 to 245
---------------------------------------------------------------------------------------------------------
Minimum number of amino-acids present in the distant repeat is = 5
Maximum number of amino-acids present in the distant repeat is = 8
Total number of distant repeats found = 94
_________________________________________________________________________________________________________

Prints:

Start of file seq6618.dat
>P40757|ALN_RANCA Allantoinase, mitochondrial precursor - Rana catesbeiana (Bull frog)
MALKSKPGIMNITPGSKISVIRSKRVIQANTISSCDIIISDGKISSVLAWGKHVTSGAKLLDVGDLVVMA
GIIDPHVHVNEPGRTDWEGYRTATLAAAAGGITAIVDMPLNSLPPTTSVTNFHTKLQAAKRQCYVDVAFW
GGVIPDNQVELIPMLQAGVAGFKCFLINSGVPEFPHVSVTDLHTAMSELQGTNSVLLFHAELEIAKPAPE
IGDSTLYQTFLDSRPDDMEIAAVQLVADLCQQYKVRCHIVHLSSAQSLTIIRKAKEAGAPLTVETTHHYL
SLSSEHIPPGATYFKCCPPVRGHRNKEALWNALLQGHIDMVVSDHSPCTPDLKLLKEGDYMKAWGGISSL
QFGLPLFWTSARTRGFSLTDVSQLLSSNTAKLCGLGIVKEPLKWVMMLIWSSGILTKSFRCKKMIFITRI
SSPHIWDSFFKEKSWLLLFEGLLFISKGSMLPNQLENLFLYTLWSLVKPVHPVHPIIRKNLPHI

Total Number of residues in the sequence =484
---------------------------------------------------------------------------------------------------------
Number of residues in the repeat = 5
AAAAG 96 to 100
AAAGG 97 to 101
---------------------------------------------------------------------------------------------------------
Number of residues in the repeat = 7
AGVAGFK 157 to 163
GGISSLQ 345 to 351
---------------------------------------------------------------------------------------------------------
Minimum number of amino-acids present in the distant repeat is = 5
Maximum number of amino-acids present in the distant repeat is = 7
Total number of distant repeats found = 62
_________________________________________________________________________________________________________

Start of file seq6619.dat
>Q9RKU5|ALN_STRCO Probable allantoinase - Streptomyces coelicolor
MSEAELVLRSTRVITPEGTRAASVAVTGEKITAVLPYDAPVPAGARLEDVGDHVVLPGLVDTHVHVNDPG
RTEWEGFWTATRAAAAGGITTLVDMPLNSIPPTTTVDNLRTKREVAADKAHIDVGFWGGALPDNVKDLRP
LHEAGVFGFKAFLSPSGVDEFPHLDQEQLARSLAEIAAFDGLLIVHAEDPHHLAAAPQQGGPKYTHFLAS
RPRDAEDTAIATLLAQAKRFNARVHVLHLSSSDALPLIAEARADGVRVTVETCPHYLTLTAEEVPDGASE
FKCCPPIREAANQDLLWQALADGTIDCVVTDHSPSTADLKTDDFATAWGGIAGLQLSLPAMWTAARGRGL
GLEDVVRWMSERTAALVGLDARKGAIAPGHDADFAVLAPDETFTVDPAALQHRNRVTAYAGKTLYGVVKS
TWLRGERIVADGAFTDPKGQLLDRA

Total Number of residues in the sequence =445
---------------------------------------------------------------------------------------------------------
Number of residues in the repeat = 5
AAAAG 83 to 87
AAAGG 84 to 88
PSTAD 314 to 318
---------------------------------------------------------------------------------------------------------
Number of residues in the repeat = 6
AAAGGI 84 to 89
SSSDAL 240 to 245
---------------------------------------------------------------------------------------------------------
Minimum number of amino-acids present in the distant repeat is = 5
Maximum number of amino-acids present in the distant repeat is = 8
Total number of distant repeats found = 94
_________________________________________________________________________________________________________

which may or may not be anything at all like what you had in mind, but then you haven't actually told us that, so guessing is all we can do.
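The code above only announces where each file would start. If the goal really is to write every record to its own seqNNNN.dat file, a minimal sketch along the same lines might look like this. The sub name split_sequences, its output-directory argument and the trimmed sample data are made up for illustration; the record-separator trick is the same one used above.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper (name and signature are illustrative only): read records
# from $in, split at each 'INPUT SEQUENCE=' marker, and write each record to
# "$out_dir/seq<number>.dat". Returns the list of files written.
sub split_sequences {
    my ($in, $out_dir) = @_;
    my @written;

    # 'local' limits the changed record separator to this sub.
    local $/ = 'INPUT SEQUENCE=';
    while (my $record = <$in>) {
        chomp $record;                          # drop the trailing marker
        next unless length $record;             # empty record before the first marker
        next unless $record =~ s/^(\d+)\n//;    # capture and strip the sequence number
        my $file = "$out_dir/seq$1.dat";
        open my $out, '>', $file or die "Can't write $file: $!";
        print {$out} $record;
        close $out or die "Can't close $file: $!";
        push @written, $file;
    }
    return @written;
}

# Usage with a small in-memory sample (trimmed from the data above) and a
# temporary directory that is removed when the script exits.
my $text = "INPUT SEQUENCE=6618\n>P40757|ALN_RANCA Allantoinase\nMALKSKPGIMNITPGSKISV\n"
         . "INPUT SEQUENCE=6619\n>Q9RKU5|ALN_STRCO Probable allantoinase\nMSEAELVLRSTRVITPEGTR\n";
open my $in, '<', \$text or die "Can't open in-memory input: $!";
my $dir   = tempdir(CLEANUP => 1);
my @files = split_sequences($in, $dir);
print "Wrote $_\n" for @files;
```

Using local $/ inside the sub, rather than assigning to $/ globally, keeps the odd separator from affecting any other reads elsewhere in the program.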


DWIM is Perl's answer to Gödel

In reply to Re: split of files by GrandFather
in thread split of files by boby
