in reply to Re: Where's the leak?
in thread Where's the leak?

One drawback of line-by-line in this user's case, though, is that it will be much slower: he is reading these files over the network, and the Windows backend will be far more efficient if he pulls the whole file at once. That said, if memory is more of a concern than speed, line-by-line is the way to go here.

-Waswas

Re: Re: Re: Where's the leak?
by dws (Chancellor) on Dec 23, 2002 at 21:05 UTC
    One drawback of line-by-line in this user's case, though, is that it will be much slower: he is reading these files over the network, and the Windows backend will be far more efficient if he pulls the whole file at once.

    Do you have evidence to support this? My experience says the opposite. For one, reading the file in slurp mode doesn't save substantial network traffic over reading it line-at-a-time, since disk pages are read and buffered to support per-line access. For another, assuming the pattern you're trying to match occurs once and is distributed randomly through the target file, on average you'll only need to read half the file to match it.
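To make the early-exit argument concrete, here is a minimal sketch. The sub names, the test file and the pattern are made up for illustration. The line-by-line version stops reading at the first match, while the slurp version always pulls in the whole file before it can match anything.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read line by line and stop at the first match.
# Returns the matching line number, or nothing if there is no match.
sub find_by_line {
    my ($path, $re) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    while (my $line = <$fh>) {
        if ($line =~ $re) {
            my $n = $.;          # capture before close resets $.
            close $fh;
            return $n;
        }
    }
    close $fh;
    return;
}

# Slurp the whole file into memory, then match once.
# Returns 1 on a match, 0 otherwise.
sub find_by_slurp {
    my ($path, $re) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $content = do { local $/; <$fh> };
    close $fh;
    return $content =~ $re ? 1 : 0;
}

# Hypothetical test data: 1000 lines with the target in the middle.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} $_ == 500 ? "needle\n" : "hay $_\n" for 1 .. 1000;
close $tmp or die "Cannot close $path: $!";

print "line-by-line matched at line ", find_by_line($path, qr/needle/), "\n";
print "slurp matched: ", find_by_slurp($path, qr/needle/), "\n";
```

With the target in the middle, the first sub never asks for the second half of the file. How much that saves on the wire depends on how far ahead the buffering and the network redirector have already read.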

      I have run into a few projects in C where mmapped files over network mounts on Windows were dropped in favor of a full read of the file, in order to get the low-level Windows networking code to burst the file. In this case, though, your point may be true: if the match happens at a random position in the file, there is no need to transfer the whole file across the network. In my cases I need access to the whole file every time. The difference between memory-mapped file access and a full open/read that triggers burst mode is easy to see, though: copying with Explorer almost always triggers burst mode. Try installing MS Office from a network drive (the installer mmaps the cabs) and time it, then time copying the files across and installing locally. Or, for a Perl-only test, compare a slurp-and-dump to local disk against a line-by-line dump to local disk on a large text file. dws++ for bringing up a point I completely missed, though.
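A rough sketch of that Perl-only test might look like the following. The sub names and the generated test file are assumptions; to see any network effect, $src would have to point at a large file on the share rather than at a local temp file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Time::HiRes qw(time);

# Copy $src to $dst by slurping the whole file into memory first.
sub copy_slurp {
    my ($src, $dst) = @_;
    open my $in,  '<', $src or die "Cannot open $src: $!";
    open my $out, '>', $dst or die "Cannot open $dst: $!";
    my $data = do { local $/; <$in> };
    print {$out} $data;
    close $in;
    close $out or die "Cannot close $dst: $!";
}

# Copy $src to $dst one line at a time.
sub copy_by_line {
    my ($src, $dst) = @_;
    open my $in,  '<', $src or die "Cannot open $src: $!";
    open my $out, '>', $dst or die "Cannot open $dst: $!";
    while (my $line = <$in>) {
        print {$out} $line;
    }
    close $in;
    close $out or die "Cannot close $dst: $!";
}

# Hypothetical local test data; replace $src with a file on the network mount.
my ($tmp, $src) = tempfile(UNLINK => 1);
print {$tmp} "line $_ of a largish text file\n" for 1 .. 100_000;
close $tmp or die "Cannot close $src: $!";

for my $case ([slurp => \&copy_slurp], [by_line => \&copy_by_line]) {
    my ($name, $code) = @$case;
    my (undef, $dst) = tempfile(UNLINK => 1);
    my $start = time();
    $code->($src, $dst);
    printf "%-8s %.3fs\n", $name, time() - $start;
}
```

A single run says little, since the second pass may be served from cache. Repeating the runs and alternating the order would give a fairer comparison.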

      Edited:
      Also, as far as I know, the bursting mode does not work on Samba servers.

      -Waswas
Re^3: Where's the leak?
by Aristotle (Chancellor) on Dec 23, 2002 at 21:35 UTC
    Enter buffering. Perl doesn't read the file line by line, even if your code requests it that way.

    Makeshifts last the longest.

      I may be out of date, but doesn't perl use its PerlIO layer, which just indirectly uses fseek, fwrite and ftell ad nauseam to mem-map files for line-by-line? Last time I looked, I don't think I saw that it buffered the entire file, which is what you need to do (i.e. read the entire file in one swoop) to get Windows bursting mode to kick in.

      stdio
      Layer which calls fread, fwrite and fseek/ftell etc. Note that as this is "real" stdio it will ignore any layers beneath it and go straight to the operating system via the C library as usual.

      perlio
      This is a re-implementation of "stdio-like" buffering written as a PerlIO "layer". As such it will call whatever layer is below it for its operations.

      -Waswas
        Ah. I thought you meant it was reading the file one line at a time over the network - which it doesn't, it gobbles up larger chunks and hands you the lines out of the current one from a buffer. It won't buffer the entire file at once though, that's correct.
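One way to see the chunked reads for yourself, as a sketch only: read a single line, then compare where the program thinks it is (tell) with where the OS file descriptor actually is (sysseek with a zero offset). The sub name and test file are made up, and the exact chunk size depends on the perl build and the I/O layers in use, so none is assumed here.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_CUR);
use File::Temp qw(tempfile);

# Read one line, then return (logical position, OS-level position).
# sysseek is only used to query the descriptor's offset, not to move it.
sub positions_after_first_line {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $line    = <$fh>;
    my $logical = tell($fh);
    my $os      = sysseek($fh, 0, SEEK_CUR);
    defined $os or die "sysseek failed: $!";
    close $fh;
    return ($logical, $os + 0);   # "+ 0" turns "0 but true" into a plain 0
}

# Hypothetical test data, much larger than a single line.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "line $_\n" for 1 .. 20_000;
close $tmp or die "Cannot close $path: $!";

my ($logical, $os) = positions_after_first_line($path);
print "after one line: program is at byte $logical, descriptor is at byte $os\n";
```

If the descriptor is well past the end of the first line, the buffer has already pulled in a larger chunk, yet still nowhere near the whole file.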

        Makeshifts last the longest.