in reply to Re^4: search array for closest lower and higher number from another array
in thread search array for closest lower and higher number from another array

You may want to consider the different types of disk I/O sub-systems in modern operating systems. Most *nix systems have Raw I/O support, Direct I/O support, Concurrent I/O support, Modular I/O support, etc. For example, most databases use raw I/O; their performance is usually better with raw I/O than with the other I/O methods because it avoids the additional work of memory copies, logging, and inode locking.

My comment about perl was directed at which I/O sub-system perl was using, not that perl would be treated differently by that I/O sub-system.

When writing *nix utilities ( like grep ), system programmers were encouraged to write "cache-aware" programs: whenever possible, work on the cached version directly, and avoid memory-to-memory move/copy. ( For clarification, I say "move/copy" because the operating system may perform a move rather than a copy, but this happens in the paging I/O sub-system and has to do with paging performance. It is transparent to the application. )

All I was trying to point out was that a 500MB file may be cached on a test machine but not on a production machine, and that a pure perl solution may very well be the better solution on a production machine. But that is the decision of the OP.

"Well done is better than well said." - Benjamin Franklin


Re^6: search array for closest lower and higher number from another array
by bigbot (Beadle) on Feb 07, 2011 at 07:06 UTC
    Well, I ran 10 iterations on the benchmark, as you saw. I have also done greps on fresh data files (which were certainly not cached), and the difference in speed was very small (~1%) compared with subsequent searches. I do believe grep is far faster than even running the data files through an empty Perl loop. I definitely wish there were a better pure Perl solution.
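
    For what it's worth, a minimal sketch of that kind of comparison, using the core Benchmark module ( the file name and search string here are placeholders for the real ones ):

        use strict;
        use warnings;
        use Benchmark qw(timethese);

        my $file   = 'data.txt';   # placeholder
        my $string = 'target';     # placeholder

        timethese( 10, {
            # floor cost of just reading the file line by line in perl
            empty_loop => sub {
                open my $fh, '<', $file or die "Cannot open $file: $!";
                1 while <$fh>;
                close $fh;
            },
            # hand the matching to the external grep
            system_grep => sub {
                my $count = qx{grep -c $string $file};
            },
        } );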

    This is of course because I'm using grep as a "dumb" tool to get context around the match. Then I feed this data into Perl, where the true parsing is done (to remove irrelevant lines I don't wish to see). If I could do everything inside a Perl loop, I imagine it would be more efficient. In this case, however, Perl needs to find the line with the data header before the match and continue after the match until the next header. I just haven't found a better way than "pre-searching" the file with grep. It's fast enough, but could it be faster? :D I'm turning into an efficiency addict now.
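
    For illustration, here is one possible single-pass, pure-Perl sketch of that idea. It assumes each record begins with a header line; /^HEADER/, the file name, and the search string are all placeholders for the real formats:

        use strict;
        use warnings;

        my $file   = 'data.txt';   # placeholder
        my $string = 'target';     # placeholder

        open my $fh, '<', $file or die "Cannot open $file: $!";

        my @record;        # lines of the record we are currently inside
        my $matched = 0;   # has this record contained the string yet?

        while ( my $line = <$fh> ) {
            if ( $line =~ /^HEADER/ ) {      # a new record starts here
                print @record if $matched;   # emit the finished one on a hit
                @record  = ();
                $matched = 0;
            }
            push @record, $line;
            $matched = 1 if index( $line, $string ) >= 0;
        }
        print @record if $matched;   # the last record has no trailing header
        close $fh;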

      bigbot

      I don't think anyone here would tell you that a perl script could/would be faster than a "highly" optimized C program ( I could be wrong ). And there is nothing wrong with using the system command 'grep' to produce your solution.

      But if you still want to improve the execution time, some things to look at:

      • In your test script you used the regex match

        $matchCount++ if ($_ =~ /$string/o);

        however, in this case a simple index() test should be faster ( see the sketch after this list )

      • If you read blocks of data ( read or sysread ) you would decrease the difference between grep and perl, but now the complexity of your script has increased ( see the sketch below ).
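
      A minimal sketch combining both suggestions -- sysread() in large blocks, scanned with index() for a fixed string ( the file name and search string are placeholders ):

        use strict;
        use warnings;

        my $file   = 'data.txt';          # placeholder
        my $string = 'target';            # placeholder
        my $keep   = length($string) - 1; # overlap so a match may span blocks

        open my $fh, '<:raw', $file or die "Cannot open $file: $!";

        my $matchCount = 0;
        my $buf        = '';
        my $block;

        while ( sysread( $fh, $block, 1 << 20 ) ) {   # 1MB reads
            $buf .= $block;
            my $pos = 0;
            while ( ( $pos = index( $buf, $string, $pos ) ) >= 0 ) {
                $matchCount++;
                $pos += length $string;
            }
            # the overlap kept here is shorter than the string itself, so a
            # straddling match is still found and nothing is counted twice
            $buf = $keep > 0
                 ? ( length($buf) > $keep ? substr( $buf, -$keep ) : $buf )
                 : '';
        }
        close $fh;
        print "$matchCount\n";
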
      All of these improvements are nice, but is it worth spending the time to debug the improved script to gain 'nn' seconds at execution time? That is your call -- balancing your time efficiency against the script's efficiency. Chapter 24 of the Camel book does an excellent job of explaining the trade-offs.

      Good Luck

      "Well done is better than well said." - Benjamin Franklin

        Thank you Flex, I will certainly look into that book!