in reply to split of files

Let's suppose for a moment that you wanted to split the data at the lines containing "INPUT SEQUENCE=" and use the number following the = as part of the output file name. Then you could:

use strict;
use warnings;

# Treat 'INPUT SEQUENCE=' as the record separator, so each read
# returns one whole sequence report instead of one line.
$/ = 'INPUT SEQUENCE=';

while (<DATA>) {
    chomp;                       # removes the trailing 'INPUT SEQUENCE='
    next unless length;          # skip the empty record before the first marker
    next unless s/^(\d+)\n//;    # peel off the sequence number into $1
    print "Start of file seq$1.dat\n";
    print "$_\n";
}

__DATA__
INPUT SEQUENCE=6618
>P40757|ALN_RANCA Allantoinase, mitochondrial precursor - Rana catesbeiana (Bull frog)
MALKSKPGIMNITPGSKISVIRSKRVIQANTISSCDIIISDGKISSVLAWGKHVTSGAKLLDVGDLVVMA
GIIDPHVHVNEPGRTDWEGYRTATLAAAAGGITAIVDMPLNSLPPTTSVTNFHTKLQAAKRQCYVDVAFW
GGVIPDNQVELIPMLQAGVAGFKCFLINSGVPEFPHVSVTDLHTAMSELQGTNSVLLFHAELEIAKPAPE
IGDSTLYQTFLDSRPDDMEIAAVQLVADLCQQYKVRCHIVHLSSAQSLTIIRKAKEAGAPLTVETTHHYL
SLSSEHIPPGATYFKCCPPVRGHRNKEALWNALLQGHIDMVVSDHSPCTPDLKLLKEGDYMKAWGGISSL
QFGLPLFWTSARTRGFSLTDVSQLLSSNTAKLCGLGIVKEPLKWVMMLIWSSGILTKSFRCKKMIFITRI
SSPHIWDSFFKEKSWLLLFEGLLFISKGSMLPNQLENLFLYTLWSLVKPVHPVHPIIRKNLPHI

Total Number of residues in the sequence =484
---------------------------------------------------------------------------------------------------------
Number of residues in the repeat = 5
AAAAG 96 to 100
AAAGG 97 to 101
---------------------------------------------------------------------------------------------------------
Number of residues in the repeat = 7
AGVAGFK 157 to 163
GGISSLQ 345 to 351
---------------------------------------------------------------------------------------------------------
Minimum number of amino-acids present in the distant repeat is = 5
Maximum number of amino-acids present in the distant repeat is = 7
Total number of distant repeats found = 62
_________________________________________________________________________________________________________
INPUT SEQUENCE=6619
>Q9RKU5|ALN_STRCO Probable allantoinase - Streptomyces coelicolor
MSEAELVLRSTRVITPEGTRAASVAVTGEKITAVLPYDAPVPAGARLEDVGDHVVLPGLVDTHVHVNDPG
RTEWEGFWTATRAAAAGGITTLVDMPLNSIPPTTTVDNLRTKREVAADKAHIDVGFWGGALPDNVKDLRP
LHEAGVFGFKAFLSPSGVDEFPHLDQEQLARSLAEIAAFDGLLIVHAEDPHHLAAAPQQGGPKYTHFLAS
RPRDAEDTAIATLLAQAKRFNARVHVLHLSSSDALPLIAEARADGVRVTVETCPHYLTLTAEEVPDGASE
FKCCPPIREAANQDLLWQALADGTIDCVVTDHSPSTADLKTDDFATAWGGIAGLQLSLPAMWTAARGRGL
GLEDVVRWMSERTAALVGLDARKGAIAPGHDADFAVLAPDETFTVDPAALQHRNRVTAYAGKTLYGVVKS
TWLRGERIVADGAFTDPKGQLLDRA

Total Number of residues in the sequence =445
---------------------------------------------------------------------------------------------------------
Number of residues in the repeat = 5
AAAAG 83 to 87
AAAGG 84 to 88
PSTAD 314 to 318
---------------------------------------------------------------------------------------------------------
Number of residues in the repeat = 6
AAAGGI 84 to 89
SSSDAL 240 to 245
---------------------------------------------------------------------------------------------------------
Minimum number of amino-acids present in the distant repeat is = 5
Maximum number of amino-acids present in the distant repeat is = 8
Total number of distant repeats found = 94
_________________________________________________________________________________________________________

Prints:

Start of file seq6618.dat
>P40757|ALN_RANCA Allantoinase, mitochondrial precursor - Rana catesbeiana (Bull frog)
MALKSKPGIMNITPGSKISVIRSKRVIQANTISSCDIIISDGKISSVLAWGKHVTSGAKLLDVGDLVVMA
GIIDPHVHVNEPGRTDWEGYRTATLAAAAGGITAIVDMPLNSLPPTTSVTNFHTKLQAAKRQCYVDVAFW
GGVIPDNQVELIPMLQAGVAGFKCFLINSGVPEFPHVSVTDLHTAMSELQGTNSVLLFHAELEIAKPAPE
IGDSTLYQTFLDSRPDDMEIAAVQLVADLCQQYKVRCHIVHLSSAQSLTIIRKAKEAGAPLTVETTHHYL
SLSSEHIPPGATYFKCCPPVRGHRNKEALWNALLQGHIDMVVSDHSPCTPDLKLLKEGDYMKAWGGISSL
QFGLPLFWTSARTRGFSLTDVSQLLSSNTAKLCGLGIVKEPLKWVMMLIWSSGILTKSFRCKKMIFITRI
SSPHIWDSFFKEKSWLLLFEGLLFISKGSMLPNQLENLFLYTLWSLVKPVHPVHPIIRKNLPHI

Total Number of residues in the sequence =484
---------------------------------------------------------------------------------------------------------
Number of residues in the repeat = 5
AAAAG 96 to 100
AAAGG 97 to 101
---------------------------------------------------------------------------------------------------------
Number of residues in the repeat = 7
AGVAGFK 157 to 163
GGISSLQ 345 to 351
---------------------------------------------------------------------------------------------------------
Minimum number of amino-acids present in the distant repeat is = 5
Maximum number of amino-acids present in the distant repeat is = 7
Total number of distant repeats found = 62
_________________________________________________________________________________________________________

Start of file seq6619.dat
>Q9RKU5|ALN_STRCO Probable allantoinase - Streptomyces coelicolor
MSEAELVLRSTRVITPEGTRAASVAVTGEKITAVLPYDAPVPAGARLEDVGDHVVLPGLVDTHVHVNDPG
RTEWEGFWTATRAAAAGGITTLVDMPLNSIPPTTTVDNLRTKREVAADKAHIDVGFWGGALPDNVKDLRP
LHEAGVFGFKAFLSPSGVDEFPHLDQEQLARSLAEIAAFDGLLIVHAEDPHHLAAAPQQGGPKYTHFLAS
RPRDAEDTAIATLLAQAKRFNARVHVLHLSSSDALPLIAEARADGVRVTVETCPHYLTLTAEEVPDGASE
FKCCPPIREAANQDLLWQALADGTIDCVVTDHSPSTADLKTDDFATAWGGIAGLQLSLPAMWTAARGRGL
GLEDVVRWMSERTAALVGLDARKGAIAPGHDADFAVLAPDETFTVDPAALQHRNRVTAYAGKTLYGVVKS
TWLRGERIVADGAFTDPKGQLLDRA

Total Number of residues in the sequence =445
---------------------------------------------------------------------------------------------------------
Number of residues in the repeat = 5
AAAAG 83 to 87
AAAGG 84 to 88
PSTAD 314 to 318
---------------------------------------------------------------------------------------------------------
Number of residues in the repeat = 6
AAAGGI 84 to 89
SSSDAL 240 to 245
---------------------------------------------------------------------------------------------------------
Minimum number of amino-acids present in the distant repeat is = 5
Maximum number of amino-acids present in the distant repeat is = 8
Total number of distant repeats found = 94
_________________________________________________________________________________________________________

which may or may not be anything at all like what you had in mind, but then you haven't actually told us what that is, so guessing is all we can do.
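If the guess is right, the "Start of file" prints can be swapped for real file writes. What follows is only a sketch under that assumption: the helper names split_records and write_records, the tiny sample text and the use of a temporary directory are all mine, not anything from your post. It reuses the same $/ trick, but localized and reading from an in-memory string so that it runs without your real input.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Split $text into records at each 'INPUT SEQUENCE=' marker.
# Returns a list of [file name, record body] pairs, in input order.
# (split_records is a hypothetical helper name.)
sub split_records {
    my ($text) = @_;
    open my $in, '<', \$text or die "Can't read in-memory text: $!";
    local $/ = 'INPUT SEQUENCE=';    # one record per read, only inside this sub
    my @records;
    while (my $record = <$in>) {
        chomp $record;                          # strip the trailing separator
        next unless $record =~ s/^(\d+)\n//;    # peel off the sequence number
        push @records, ["seq$1.dat", $record];
    }
    close $in;
    return @records;
}

# Write each record to its own file under $dir; returns the paths written.
# (write_records is a hypothetical helper name.)
sub write_records {
    my ($dir, @records) = @_;
    my @paths;
    for my $rec (@records) {
        my ($name, $body) = @$rec;
        my $path = File::Spec->catfile($dir, $name);
        open my $out, '>', $path or die "Can't write $path: $!";
        print {$out} $body;
        close $out or die "Can't close $path: $!";
        push @paths, $path;
    }
    return @paths;
}

# Made-up miniature input standing in for the real report.
my $sample = "INPUT SEQUENCE=6618\n>first\nAAAA\n"
           . "INPUT SEQUENCE=6619\n>second\nCCCC\n";

my $dir = tempdir(CLEANUP => 1);    # throwaway directory for the demo
print "$_\n" for write_records($dir, split_records($sample));
```

Using local $/ keeps the changed record separator from leaking into any other file reads in the program. For real use you would open your report file instead of the in-memory string and pass whatever output directory you actually want.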


DWIM is Perl's answer to Gödel