in reply to Opening file and checking for data

Well, of course, benchmarking may prove me wrong, but I would have thought that slurping the contents into an array and then iterating over that array is going to be less efficient than simply reading the file line by line, like:

... while(<DATA>) { ... }
You probably also want to avoid using DATA as a filehandle, as this is a predefined handle, set up when Perl initializes, that points to the stuff after an __END__ or __DATA__ token at the end of the program. It doesn't break anything, but it might confuse someone reading the program later.
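
As a minimal sketch (the file name and handle name here are my own, for illustration), reading line by line through a lexical filehandle instead of the predefined DATA handle might look like:

use strict;
use warnings;

# a lexical filehandle avoids clashing with the predefined DATA handle
open my $fh, '<', 'input.txt' or die "Can't open input.txt: $!";
while (my $line = <$fh>) {
    chomp $line;
    # ... process one line at a time ...
}
close $fh;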

/J\

Re: Re: Opening file and checking for data
by hardburn (Abbot) on Jul 03, 2003 at 14:09 UTC

    I haven't done any benchmarking myself, but I've seen benchmarks in the past. Slurping is by far faster than processing line-by-line. The reason is that doing the I/O in a single operation is faster than doing a bit at a time, since you don't have to worry about things like resetting the drive head to the correct position.

    Naturally, you have to worry about memory limitations. There are really only a limited number of cases where slurping is worth it. If your file is small, you won't see much of a speed gain; if it's too large to fit in memory, you'll end up swapping to the hard disk and will thus lose any benefit from slurping.
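
    If you want to measure this yourself, here is a rough sketch using the core Benchmark module (the file name test.dat and the /perl/ pattern are placeholders for illustration):

    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    my $file = 'test.dat';    # placeholder: point this at a real test file

    cmpthese(-5, {
        # read and test one line at a time
        line_by_line => sub {
            open my $fh, '<', $file or die "Can't open $file: $!";
            my $count = 0;
            while (<$fh>) { $count++ if /perl/ }
            close $fh;
        },
        # slurp into an array, then iterate over it
        slurp => sub {
            open my $fh, '<', $file or die "Can't open $file: $!";
            my @lines = <$fh>;
            close $fh;
            my $count = 0;
            for (@lines) { $count++ if /perl/ }
        },
    });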

    ----
    I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
    -- Schemer

    Note: All code is untested, unless otherwise stated

      Perl's I/O is buffered, so it does one I/O for every disk block regardless of which method you use.
      #!/usr/bin/perl
      if ($ARGV[0] eq 'line') {
          print "Line-at-a-time\n";
          while (<STDIN>) {
              print if /perl/;
          }
      }
      else {
          print "All at once\n";
          my @arr = (<STDIN>);
          foreach (@arr) {
              print if /perl/;
          }
      }
      On my system, the block size is 4096 bytes. On an 8K file with 128 lines, we see:
      $ strace -e read /tmp/t29 line </tmp/t29.8192 >/dev/null
      ...
      read(0, "This is a line that contains the"..., 4096) = 4096
      read(0, "This is a line that contains the"..., 4096) = 4096
      read(0, "", 4096)                       = 0
      
      $ strace -e read /tmp/t29 slurp </tmp/t29.8192 >/dev/null
      ...
      read(0, "This is a line that contains the"..., 4096) = 4096
      read(0, "This is a line that contains the"..., 4096) = 4096
      read(0, "", 4096)                       = 0
      
      Still, the diamond operator takes some time on each call, so slurping is probably faster, but not because of the I/O.
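
      If the goal is to avoid that per-line diamond overhead altogether, one sketch (the file name is a placeholder) is to undefine the input record separator and read the whole file into a single scalar with one <> call:

      use strict;
      use warnings;

      open my $fh, '<', 'input.txt' or die "Can't open input.txt: $!";
      my $content = do {
          local $/;    # undef the input record separator
          <$fh>;       # one read now returns the entire file
      };
      close $fh;

      # count matching lines without building a list of lines
      my $count = () = $content =~ /^.*perl.*$/mg;
      print "$count matching lines\n";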

      And of course the caching algorithms of your hard drive and OS will have a big influence as well. If the file is small enough to be read in one go, slurping will not add any big speed benefit, but it will increase memory load.

      And if the file is rather large, it may crowd out other items in your cache and slow down other programs: TANSTAAFL ("There Ain't No Such Thing As A Free Lunch"), as Heinlein said.

      CountZero

      "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law