in reply to Re: Optimizing a script
in thread Optimizing a script

BTW, processing the file this way, one line at a time, is very friendly to your computer's memory: it only ever holds a single line of the input file in memory at once. So it doesn't matter whether the section you want to separate out runs to 50,000 lines or just 5. The "tricky" approach of changing your record delimiter would eat up a lot of memory if a data section were 50,000 lines, and the problem gets worse if the input has malformed record delimiters (e.g. "$\t\n"), which is harder to debug, because the program just fails outright rather than producing output that is merely not quite what you want.
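
Something like this is all it takes (a minimal sketch; the START_MARKER / END_MARKER patterns and the section.txt name are just placeholders, since I don't know your exact format):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Copy one section of the input to its own file, reading line by line.
    # START_MARKER and END_MARKER stand in for whatever really delimits
    # a section in your data.
    open my $out, '>', 'section.txt' or die "can't write section.txt: $!";
    my $in_section = 0;
    while ( my $line = <> ) {            # only this one line is held in memory
        $in_section = 1 if $line =~ /^START_MARKER/;
        print {$out} $line if $in_section;
        $in_section = 0 if $line =~ /^END_MARKER/;
    }
    close $out;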

Re: Re: Re: Optimizing a script
by Fletch (Bishop) on Apr 19, 2004 at 13:09 UTC
    for( $above_post ) {s/lines/records/g; s/single line/single record/g} # </pedantic>

    Update: Never mind me; I misread and thought this was a reply to the original post, which was processing by records. However, if you do the record processing yourself, that's more state you have to keep track of rather than letting perl handle it for you. Not to mention that reading by lines isn't necessarily going to protect you from malformed input any better (for example, someone sends you a multi-meg file with Mac \cM line endings . . .).
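
    Something along these lines is what I mean by letting perl keep the state for you (the "$$\n" separator here is just an example, not from the original script):

        # Set $/ to the record separator and each read returns a whole record.
        # If the separator never matches (wrong value, Mac line endings, etc.)
        # the entire file comes back as one giant "record", which is where
        # the memory blowup comes from.
        {
            local $/ = "\$\$\n";              # hypothetical record separator
            while ( my $record = <STDIN> ) {
                chomp $record;                # chomp strips $/, not just "\n"
                # ... process one whole record here ...
            }
        }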

      It's not pedantic, but reading line by line is much easier to debug. Good luck trying to print $_ if your record separator is wrong: the script will run out of memory before it ever gets that far. They need code that is memory-efficient, and the most efficient way to get that is to read and write line by line. Just because everybody is on this track doesn't mean the original poster had that kind of efficiency in mind; in fact, I'm pretty sure they just wanted the script to run well on a low-spec PC. It's not about being pedantic, but about giving them a script they can debug if they need to.

      I would have used record slurping myself (if the data sections aren't large), and I appreciate all the earlier code. My code is not fancy, but it works: it can process a data section of several million lines on a 386, and you can watch the output with tail -f instead of waiting a day to find out whether it worked.
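
      One small thing worth doing if you watch it with tail -f: turn on autoflush for the output handle, otherwise perl's output buffering can make the file look stalled for a while. A rough sketch (the section.out name is just a placeholder):

          use IO::Handle;                 # gives autoflush() on lexical handles
          open my $out, '>', 'section.out' or die "can't open section.out: $!";
          $out->autoflush(1);             # flush every print so tail -f sees it right away
          # ... then print to $out from the line-by-line loop as before ...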