in reply to Perl reads minimum 8KB from files. Can this be lowered?

I've read somewhere that DOS/Windows reads an entire cluster at a time. The cluster size can vary anywhere between 4KB and 32KB, but the behavior is hard-coded into the OS, so there's not much you can do. If you're using an NTFS file system with 4KB clusters, each read operation of 4KB or less will grab 4KB, even if you're just reading one byte from a file. If you want to read 5000 bytes, it will read 8KB. There's no way around it. In Perl, you can turn off buffering, and maybe that will help, but I'm no expert.
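To make the arithmetic concrete, here is a minimal sketch of the rounding described above. It assumes the read starts on a cluster boundary and that the OS really does fetch whole clusters; the helper name bytes_fetched and the 4096-byte cluster size are illustrative only, not anything Perl or Windows exposes.

```perl
use strict;
use warnings;

# Round a requested byte count up to a whole number of clusters.
# Assumption: the read starts at a cluster boundary.
sub bytes_fetched {
    my ($requested, $cluster_size) = @_;
    return 0 if $requested <= 0;
    my $clusters = int(($requested + $cluster_size - 1) / $cluster_size);
    return $clusters * $cluster_size;
}

# 4096 matches the NTFS example above; other cluster sizes work the same way.
for my $n (1, 4096, 5000) {
    printf "%d bytes requested -> %d bytes fetched\n", $n, bytes_fetched($n, 4096);
}
# 1 bytes requested -> 4096 bytes fetched
# 4096 bytes requested -> 4096 bytes fetched
# 5000 bytes requested -> 8192 bytes fetched
```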

The sysread() function will give you however many bytes you want to get, but that doesn't mean that at the lowest level you're forcing the OS to read smaller chunks. You can't.
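For reference, a small sketch of what asking sysread() for an exact byte count looks like. The helper read_bytes and the temporary test file are made up for illustration. The code only shows what Perl hands back; it says nothing about how much the OS or the disk fetched underneath.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Return up to $want bytes from the start of $path using sysread,
# which bypasses PerlIO's own buffer.
sub read_bytes {
    my ($path, $want) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    my $got = sysread($fh, my $buf, $want);
    die "sysread failed: $!" unless defined $got;
    close($fh);
    return $buf;
}

# Build a 1000-byte scratch file to read from.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode($tmp_fh);
print {$tmp_fh} 'x' x 1000;
close($tmp_fh);

print length(read_bytes($tmp_path, 512)), "\n";    # 512
print length(read_bytes($tmp_path, 1)), "\n";      # 1
```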

Re^2: Perl reads minimum 8KB from files. Can this be lowered?
by sectokia (Friar) on Apr 11, 2022 at 12:11 UTC
    It appears that on Windows, regardless of the file system cluster size, you can fetch 512 bytes with sysread. At least on my system, this is the minimum it ends up being. Clusters really only matter to the file system for its allocation of clusters to a file, whereas the devices only care about blocks.
      Well, what I was saying is that you can fetch any number of bytes using sysread(). You can fetch just one byte or you can read the entire file with one call. But behind the scenes, Windows does a lot of buffering. So, instead of just reading 512 bytes of a file, it reads an entire cluster. That may be 4KB or 32KB...whatever the size of the cluster. Sysread() will give you 512 or 513 bytes, if that's what you requested, but the OS will read more, because that's how the system is designed.
        Sysread() will give you 512 or 513 bytes, if that's what you requested, but the OS will read more, because that's how the system is designed.

        That continues below the operating system. In the PC world, both hard disks and floppy disks have had sector sizes of 512 bytes from day 1, and there was and is simply no way to read less than a single sector. CD-ROMs introduced a sector size of 2048 bytes with the same restriction: you have to read whole sectors. Modern large-capacity hard disks have sector sizes of 4096 bytes. Only those can emulate a hard disk with a 512-byte sector size, i.e. you can read and write less than a physical sector. Of course, there is a performance penalty: to write an emulated 512-byte sector, the hard disk has to read the whole 4096 bytes of the physical sector into its internal memory, modify one of the eight emulated sectors, and write back the physical sector. That needs at least one full rotation of the spindle. Compare that to writing a full physical sector, which needs no read and just a single write at the right time.
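The read-modify-write cycle for an emulated sector can be sketched as a toy model. This is purely illustrative: the "disk" is a Perl array of 4096-byte strings, and the names locate and write_logical are invented here. Real drive firmware is far more involved.

```perl
use strict;
use warnings;

my $PHYSICAL = 4096;    # physical sector size in bytes
my $LOGICAL  = 512;     # emulated sector size in bytes

# Map an emulated sector number to (physical sector number, byte offset).
sub locate {
    my ($lba) = @_;
    my $per_physical = $PHYSICAL / $LOGICAL;    # eight emulated sectors each
    my $physical     = int($lba / $per_physical);
    my $offset       = ($lba % $per_physical) * $LOGICAL;
    return ($physical, $offset);
}

# Write one emulated sector: read the whole physical sector,
# patch 512 bytes of it, then write the whole physical sector back.
sub write_logical {
    my ($disk, $lba, $data) = @_;
    die "data must be exactly $LOGICAL bytes" unless length($data) == $LOGICAL;
    my ($physical, $offset) = locate($lba);
    my $sector = $disk->[$physical];               # read
    substr($sector, $offset, $LOGICAL, $data);     # modify
    $disk->[$physical] = $sector;                  # write
    return;
}

# Two zero-filled physical sectors stand in for the disk.
my @disk = map { "\0" x $PHYSICAL } 1 .. 2;
write_logical(\@disk, 9, 'A' x $LOGICAL);

my ($physical, $offset) = locate(9);
print "emulated sector 9 -> physical sector $physical, offset $offset\n";
# emulated sector 9 -> physical sector 1, offset 512
```

Writing all eight emulated sectors of one physical sector in a single request would skip the read step entirely, which is the cheaper case described above.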

        The real implementation is much more complex, involving caching and relocation of sectors, and things get really crazy with shingled magnetic recording (SMR), where the hard disk has to rewrite a bunch of sectors because the sectors overlap each other. In that regard, SMR is comparable to flash memory (SD cards, USB sticks, SSDs, etc.), where erase blocks are typically much larger than write blocks, so you have to shuffle data around before writing used blocks.

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)