in reply to threads and multiple filehandles

Have you tried "grep" yet to see if the result returns in acceptable time?
If your files are sorted, you could break up the problem by seeking halfway into the file to see whether the number in question is smaller or larger, then seeking 1/4 or 3/4 of the way in depending on the previous result, and perhaps one more iteration of such "divide and conquer" before doing a plain scan of the remaining 1/8 of the file, since the rest has been eliminated.
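A rough sketch of that seek-and-compare idea in Perl, assuming one sorted number per line; the sub name, the example file name and the 4096-byte cut-off are made up for illustration:

#!/usr/bin/perl
use strict;
use warnings;

# Binary-search a sorted one-number-per-line file for $target.
# Returns 1 if found, 0 otherwise.
sub find_in_sorted_file {
    my ( $file, $target ) = @_;
    open my $fh, '<', $file or die "Can't open $file: $!";
    my $lo = 0;
    my $hi = -s $file;                  # file size in bytes
    while ( $hi - $lo > 4096 ) {        # narrow down to a small window
        my $mid = int( ( $lo + $hi ) / 2 );
        seek $fh, $mid, 0;
        <$fh>;                          # discard the partial line we landed in
        my $line = <$fh>;
        last unless defined $line;      # ran off the end of the file
        chomp $line;
        if ( $line < $target ) { $lo = $mid } else { $hi = $mid }
    }
    # plain scan of whatever window is left
    seek $fh, $lo, 0;
    <$fh> if $lo;                       # skip the partial line unless at byte 0
    while ( my $line = <$fh> ) {
        chomp $line;
        return 1 if $line == $target;
        last     if $line >  $target;   # sorted, so we can stop here
    }
    return 0;
}

print find_in_sorted_file( 'numbers.txt', 1234567890 ) ? "found\n" : "not found\n";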
Another alternative is to pre-process the files and generate (say) 100 smaller files, splitting on the first two digits of each number. When you do the actual searching, the first two digits of the number in question tell you which (much smaller) file to search.
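A rough sketch of that pre-processing pass, again with made-up file names (numbers.txt in, bucket_NN.txt buckets out):

#!/usr/bin/perl
use strict;
use warnings;

# Split one big file of numbers into bucket files keyed on the first
# two digits, so a later search only has to open one small file.
my %bucket_fh;
open my $in, '<', 'numbers.txt' or die "Can't open numbers.txt: $!";
while ( my $line = <$in> ) {
    next unless $line =~ /^(\d\d)/;     # first two digits pick the bucket
    my $key = $1;
    unless ( $bucket_fh{$key} ) {
        open $bucket_fh{$key}, '>>', "bucket_$key.txt"
            or die "Can't open bucket_$key.txt: $!";
    }
    print { $bucket_fh{$key} } $line;
}
close $_ for values %bucket_fh;

# Searching later: the first two digits of the number say which file to read,
# e.g. my ($key) = $searchnum =~ /^(\d\d)/; then scan "bucket_$key.txt" only.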
the hardest line to type correctly is: stty erase ^H

Replies are listed 'Best First'.
Re^2: threads and multiple filehandles
by zentara (Cardinal) on Sep 21, 2006 at 18:20 UTC
    Have you tried "grep" yet to see if the result returns in acceptable time?

    Hi, I was playing around some more with it, and found that the fastest search is Perl's regex engine:

    # slurp file into $buf
    if ( $buf =~ /($searchnum)/ ) { .... }
    This was faster than running the C grep with backticks.

    So I guess the trick is to decide how much memory you want to use at any one time, read in overlapping segments of the file of that size, and then run the regex over each segment.
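
    Something like this, say, where the chunk size, the overlap and the file name are arbitrary choices rather than anything from the node above:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Scan a big file in fixed-size chunks, carrying a little overlap so
    # a match can't be lost on a chunk boundary.
    my $searchnum = '1234567890';
    my $chunk     = 10 * 1024 * 1024;            # memory to use per read
    my $overlap   = length($searchnum) - 1;      # enough to bridge a boundary

    open my $fh, '<', 'bigfile.txt' or die "Can't open bigfile.txt: $!";
    my ( $tail, $found ) = ( '', 0 );
    while ( read( $fh, my $buf, $chunk ) ) {
        $buf = $tail . $buf;                     # prepend leftover from last pass
        if ( $buf =~ /\Q$searchnum\E/ ) {
            $found = 1;
            last;
        }
        $tail = length($buf) > $overlap ? substr( $buf, -$overlap ) : $buf;
    }
    print $found ? "found\n" : "not found\n";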


    I'm not really a human, but I play one on earth. Cogito ergo sum a bum