in reply to Bioinformatic task

Define 'big'. One way to handle the problem for smaller values of 'big' is to read the entire file into memory, then use a regular expression to cut it up for you.

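For example, a minimal slurp-and-split sketch; the filename 'seqs.fasta' and the exact split pattern are illustrative (a FASTA header is just a line starting with '>'):

use strict;
use warnings;

# Slurp the whole file in one read ('seqs.fasta' is a placeholder name).
local $/;
open my $fh, '<', 'seqs.fasta' or die "Can't open seqs.fasta: $!";
my $data = <$fh>;
close $fh;

# Split just before each line that starts with '>', keeping each header
# together with its sequence; drop any empty fields for safety.
my @records = grep { length } split /^(?=>)/m, $data;
print "Record: $_\n" for @records;
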
For larger values of 'big' you may be able to use something like:

use strict;
use warnings;

# Set the input record separator so each read returns one ">Seq..." record.
# The first "record" is the leading ">Seq" delimiter itself, which chomps
# away to an empty string and is skipped.
local $/ = ">Seq";

while (<DATA>) {
    chomp;    # strip the trailing ">Seq" delimiter, if any
    next if !length;
    print "Record: $_\n";
}

__DATA__
>Seq1
AAATTTGGG.....
>Seq2
AGATTTACC.....
True laziness is hard work

Re^2: Bioinformatic task
by uvnew (Acolyte) on Nov 07, 2010 at 23:18 UTC
    Thanks for replying. I actually used the word 'big' incorrectly: I must read the whole file into memory, with one structure for the headers and another for the sequences. My computer definitely has enough RAM for that.

      I strongly second what aquarium implies in Re: Bioinformatic task - using a single structure containing records with two parts is much better than trying to keep two parallel arrays in sync.

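      A minimal sketch of that idea (the field names are illustrative): one array of hashes, where each record keeps its header and sequence together:

          use strict;
          use warnings;

          # One record per sequence; nothing to keep in sync by hand.
          my @records = (
              { header => 'Seq1', sequence => 'AAATTTGGG' },
              { header => 'Seq2', sequence => 'AGATTTACC' },
          );

          for my $rec (@records) {
              print "$rec->{header}: $rec->{sequence}\n";
          }
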
      However, it's probably worth your while to tell us more about the problem you are trying to solve. I suspect there are other areas where you could use a little help!

      True laziness is hard work
Re^2: Bioinformatic task
by patcat88 (Deacon) on Nov 09, 2010 at 03:59 UTC
    For the OP's purpose, don't use a regular expression just to cut up a string with a static delimiter. Regular expressions aren't the answer to every parsing problem in Perl: no matter how basic the pattern, they are roughly 10x slower than a couple of lines of index and substr. Use substr and index (see my post at Re: Is Using Threads Slower Than Not Using Threads?), or, as you showed, redefine the input record separator and read "by line".
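    A minimal sketch of the index/substr approach (the sample string is illustrative): scan for each '>' delimiter with index and carve out records with substr, never touching the regex engine:

        use strict;
        use warnings;

        my $fasta = ">Seq1\nAAATTTGGG\n>Seq2\nAGATTTACC\n";

        my @records;
        my $pos = index $fasta, '>';                  # start of the first record
        while ($pos >= 0) {
            my $next = index $fasta, '>', $pos + 1;   # start of the next record
            my $len  = ($next >= 0 ? $next : length $fasta) - $pos;
            push @records, substr $fasta, $pos, $len;
            $pos = $next;
        }

        print "Record: $_\n" for @records;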