in reply to File to @array then parse like a file

As I understand it, an I/O read is an I/O read and will take the same time whether it goes into an array or is processed line by line (a sketch of both approaches follows the list below). The only advantages I can see to using the array approach are:
1) You have to pass over the file more than once (which doesn't seem to be your case)
2) You are locking the file and want to reduce the amount of time that it is locked for; the script would take the same time, but the file would be available to another process sooner.
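
For reference, here is a minimal sketch of the two approaches being compared; the filename and the processing stub are made up for illustration:

    # Line by line: only one line is held in memory at a time.
    open( my $fh, '<', 'data.txt' ) or die "Cannot open data.txt: $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        # ... process $line here ...
    }
    close $fh;

    # Slurp into an array: the whole file is in memory and can be
    # walked more than once.
    open( my $fh2, '<', 'data.txt' ) or die "Cannot open data.txt: $!";
    my @lines = <$fh2>;
    close $fh2;
    for my $line (@lines) {
        chomp $line;
        # ... process $line here ...
    }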

On another note, if you are looking to squeeze every last second out, your second and subsequent if statements should be elsif() statements; that way, once a match is made for a given line, the rest of the tests get skipped. This of course assumes that the tests are mutually exclusive.
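
A hypothetical sketch of such a chain; the record types and handler subs are invented for illustration, and the patterns are assumed to be mutually exclusive:

    while ( my $line = <$fh> ) {
        if    ( $line =~ /^HEADER/  ) { handle_header($line)  }
        elsif ( $line =~ /^DETAIL/  ) { handle_detail($line)  }
        elsif ( $line =~ /^TRAILER/ ) { handle_trailer($line) }
        # With a plain if() on every branch, each pattern would be
        # tested against every line even after one had matched.
    }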

PS: the scary part to me is that I recognize the file format you are trying to parse.

Re: Re: File to @array then parse like a file
by Limbic~Region (Chancellor) on Mar 14, 2003 at 23:15 UTC
    I could be wrong, but I disagree. Slurping a file into an array or a scalar can certainly be faster than parsing the file line by line, because it replaces many consecutive small reads with fewer, larger ones. Changing $/ can increase speed without slurping the whole file into memory, as a happy medium. For instance, reading a 100 MB file that has only 10 characters per line one line at a time would certainly be slower than reading it in 64K chunks.
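
    As a sketch of that happy medium, $/ can be set to a reference to an integer so that readline returns fixed-length records instead of lines (the filename is hypothetical):

        open( my $fh, '<', 'big_file.dat' ) or die "Cannot open: $!";
        {
            local $/ = \65536;                # read fixed 64K records
            while ( my $chunk = <$fh> ) {
                # ... munge $chunk here; note a record may end mid-line ...
            }
        }
        close $fh;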

    In dbrock's example, it certainly does seem like a waste to slurp the file. The point is not to slurp the file as a premature optimization unless there is a valid reason to do so.

    Cheers - L~R

    UPDATE: If the file is extremely large, but you only need the top part of it, you can use last to end the loop once you have all the data that you need. I am guessing that this may be the reason for wanting to speed things up.
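
    Something along these lines, assuming an invented end-of-header marker:

        my @header;
        while ( my $line = <$fh> ) {
            last if $line =~ /^END OF HEADER/;  # got what we need, skip the rest
            push @header, $line;
        }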

    UPDATE 2: Setting $/ = \65536; does indeed change how much of the file is read per buffered read. The other thing that slows down iterating by newlines is the work in between (data munging), which has to be performed more times than when you are working with a larger data block. Thanks to chromatic for keeping me on my toes and runrig for clearing up some confusion in the CB.
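
    If you want to measure the difference on your own data, here is a rough sketch using the core Benchmark module (the filename is hypothetical and the results will vary by system):

        use Benchmark qw(cmpthese);

        cmpthese( -3, {
            by_line => sub {
                open( my $fh, '<', 'big_file.dat' ) or die $!;
                while ( my $line = <$fh> ) { my $len = length $line }
                close $fh;
            },
            by_64k => sub {
                open( my $fh, '<', 'big_file.dat' ) or die $!;
                local $/ = \65536;
                while ( my $chunk = <$fh> ) { my $len = length $chunk }
                close $fh;
            },
        } );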

      I'm not aware of any operating system that deals with "lines" on a file level. Unix-wise, it's all just a stream of characters. Perl-wise, unless you use sysread, you'll get buffered reads, so you'll only hit the disk when you've exhausted the buffer, not every time you want a new line.
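
      A small sketch of an unbuffered read via sysread, which bypasses Perl's buffering layer (the filename is hypothetical):

          open( my $fh, '<', 'data.txt' ) or die "Cannot open: $!";
          my $buf;
          while ( sysread( $fh, $buf, 65536 ) ) {
              # $buf now holds up to 64K straight from the OS read() call
          }
          close $fh;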

      There may be a filesystem out there that does work with lines, not characters. To my knowledge, Perl doesn't do anything different there.

      Update: I forgot to mention block device buffering, or buffering in the disk controller.