in reply to Re: perl performance vs egrep
in thread perl performance vs egrep

I wouldn't expect threaded searching to help. The operating system knows that many programs read files sequentially, so when you read part of a file from disk, it will probably read ahead while the disk is otherwise idle, making the rest of the file faster to access. (You may even be able to override this behaviour with the posix_fadvise function.)
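
To make the posix_fadvise remark concrete, here is a minimal sketch in Python, which exposes the call as os.posix_fadvise on Unix systems that provide it. The helper name read_sequentially and the 1 MiB chunk size are my own choices for illustration. The hint is purely advisory, so the kernel is free to ignore it and any speedup is not guaranteed.

```python
import os

def read_sequentially(path, chunk_size=1 << 20):
    """Read a file front to back after hinting the access pattern to the kernel.

    Returns the number of bytes read. The function name and chunk size are
    illustrative assumptions, not anything from the original thread.
    """
    total = 0
    with open(path, "rb") as fh:
        # os.posix_fadvise only exists on some Unix platforms, so guard it.
        if hasattr(os, "posix_fadvise"):
            try:
                # Offset 0 with length 0 means "from the start to the end of the file".
                os.posix_fadvise(fh.fileno(), 0, 0, os.POSIX_FADV_SEQUENTIAL)
            except OSError:
                # The hint is advisory; carry on if the filesystem rejects it.
                pass
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total
```

Swapping os.POSIX_FADV_SEQUENTIAL for os.POSIX_FADV_RANDOM would instead tell the kernel not to expect sequential access, which is the "override" direction mentioned above.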

Re^3: perl performance vs egrep
by superfrink (Curate) on Jan 23, 2005 at 21:16 UTC
    My first thought about making the search multi-threaded is that the disk then has to read from several files at once (one per thread). That probably means the read head on the hard drive has to move around the surface of the disk more than it would when reading each file sequentially.

    It is hard to predict which approach will give the best read performance. I think it's reasonable to assume that your OS and filesystem try to keep each file stored sequentially on the disk, so I would expect searching the files one after another to be faster.

    You might want to time running 'wc' on all of the files in sequence versus on all of the files at different levels of concurrency to see what works best for reading the data.
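
A rough sketch of that timing experiment, written in Python rather than as a shell loop around 'wc'. The names count_lines and time_reads are hypothetical, and counting newline bytes only stands in for what 'wc -l' reports. Bear in mind that once a file has been read, the OS will likely serve it from its cache, so later runs over the same files may measure memory speed rather than disk speed.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def count_lines(path, chunk_size=1 << 20):
    """Count newline bytes in a file, roughly what `wc -l` reports."""
    count = 0
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            count += chunk.count(b"\n")
    return count

def time_reads(paths, workers):
    """Read every file with `workers` threads.

    Returns (elapsed_seconds, total_lines). With workers <= 1 the files are
    read strictly one after another in the calling thread.
    """
    start = time.perf_counter()
    if workers <= 1:
        total = sum(count_lines(p) for p in paths)
    else:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            total = sum(pool.map(count_lines, paths))
    return time.perf_counter() - start, total
```

Calling time_reads(files, 1), then time_reads(files, 2), time_reads(files, 4) and so on over the same list of files gives one elapsed time per concurrency level to compare.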

    The main point is that your disk is probably much slower than your CPU, so it will probably be the bigger bottleneck, especially if you start moving the read head around a lot.
Re^3: perl performance vs egrep
by exussum0 (Vicar) on Jan 23, 2005 at 14:39 UTC
    Buffering will help, no doubt, but at some point the process will have to block on a read, whether for a couple of bytes or a large chunk of data. Also, on a machine with dual CPUs, having both chug away could be advantageous.

    ----
    Give me strength for today.. I will not talk it away..
    Just for a moment.. It will burn through the clouds.. and shine down on me.