in reply to Optimizing a script

Why are you now delimiting (or trying to split) records on $<TAB><TAB><NEWLINE>? I'm not even sure that's what it's trying to split on, since you've got quotes in the wrong places in "...$/ = qq{"\$"\t""\t""\n};...". To cut to the chase: the program couldn't find the record delimiter, so it ended up trying very hard to slurp the whole file into memory. Here's a simple solution. If the end of each record isn't marked by the original "$" (a dollar sign on a line by itself), just change the regex below to match whatever sequence of characters actually ends a record.
while ($line = <>) {
    chomp $line;
    if ($line =~ /^\$$/) {    # start of line, a literal dollar sign, then end of line
                              # (after chomp there is no newline, so $ anchors the end of $line)
        close OUTFILE;
        undef $data;
        next;
    }
    if (!$data) {             # first line of a new section: use it as the output filename
        open(OUTFILE, ">$line") or die "can't open $line: $!";
        $data = 1;
    }
    else {
        print OUTFILE "$line\n";
    }
}
This is nice and simple, procedural-style code that you should find easier to understand. Run the script as: perl script.pl <your_data_file
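To make the expected input concrete, here's a made-up example file (the layout is my assumption based on the code above, not taken from your actual data): the first line of each section names the output file, the following lines are data, and a lone "$" ends the section.

    section_one.txt
    first data line
    second data line
    $
    section_two.txt
    more data
    $

Running perl script.pl on that file would create section_one.txt (holding the two data lines) and section_two.txt (holding the one).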

Re: Re: Optimizing a script
by aquarium (Curate) on Apr 19, 2004 at 12:52 UTC
    btw...processing the file this way, one line at a time, is very friendly to your computer's memory: it will only ever hold a single line from the input file in memory at once. So it doesn't matter whether a section you want to separate out has 50,000 lines or 5 lines. Doing the "tricky" thing of changing your record delimiter would eat up lots of memory if a data section were 50,000 lines. The problem would be worse still with a malformed record delimiter, e.g. "$\t\n", which is harder to debug, because the program would simply fail instead of producing output (even output that isn't exactly what you want).
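    For comparison, here is a minimal sketch of the record-separator approach being discussed (the exact value of $/ below is my assumption; adjust it to whatever really ends your records). Each read of <> pulls an entire section into memory at once:

        $/ = "\n\$\n";                          # assumed: a lone "$" line ends each record
        while (my $record = <>) {
            chomp $record;                      # removes the "\n$\n" terminator
            my ($filename, @lines) = split /\n/, $record;
            # the whole section -- filename line plus every data line -- now sits in memory
        }

    Fine for small sections, but a 50,000-line section means 50,000 lines held at once, and a wrong $/ means the whole file.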
      for( $above_post ) {s/lines/records/g; s/single line/single record/g} # </pedantic>

      Update: Never mind me, I misread and thought this was in reply to the original post, which was processing by records. However, if you do the record handling yourself line by line, that's more state you have to keep track of rather than letting perl handle it for you. Not to mention that reading by lines isn't necessarily going to protect you from malformed input any better (for example, someone sends you a multi-meg file with Mac \cM line endings . . .).
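      (For what it's worth, one way to defuse that particular example is to normalize the line endings once before feeding the file to either script; the filenames here are made up:)

          perl -pe 's/\r\n?/\n/g' mac_file.txt > unix_file.txt

      That rewrites bare \cM (and \cM\cJ) endings to plain newlines, so "read a line" really does read one line.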

        It's not pedantry...line-by-line is just much easier to debug. Good luck trying to print $_ if your record separator is wrong: the script will run out of memory before it gets the chance. They need code that is memory efficient, and the most memory-efficient way to do this is to read and write line by line. Just because everybody else is on the record-separator track doesn't mean the original poster had that kind of efficiency in mind; in fact, I'm pretty sure they just wanted the script to run well on a low-spec PC. It's not about being pedantic, but about giving them a script they can debug if they need to. I would have used record slurping myself (if the data sections aren't large), and I appreciate all the earlier code. My code is not fancy, but it works: it can process a data section of several million lines on a 386, and you can watch the output with tail -f instead of waiting a day to find out whether it worked.