in reply to Fastest Search method for strings in large file

Why don't you take your time and try to explain to us in detail what you want to achieve: how many strings you want to match, what kind of data is in the file, etc.

Otherwise that will be like playing piñata!

Re^2: Fastest Search method for strings in large file
by Anonymous Monk on Jul 14, 2008 at 13:04 UTC
    The basic requirement is to extract the records containing the search strings from the file and write them to another file. Each search string will be a plain string about 10 characters long. The number of search strings will be around 100-1000. The file is a delimited file with a large number of records (size 21GB).
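
    For comparison, here is a minimal line-by-line sketch of that requirement (the file names, the one-needle-per-line format, and the whole script are assumptions on my part, not code from the thread); the buffered version in the reply below should be considerably faster on a 21GB file:

      #!/usr/bin/perl
      # Hypothetical baseline: read the search strings (one per line) from a
      # needles file, build a single alternation regex, and filter the big
      # file record by record, printing matching records to stdout.
      use strict;
      use warnings;

      my( $needleFile, $dataFile ) = @ARGV;

      open my $nfh, '<', $needleFile or die "$needleFile: $!";
      chomp( my @needles = <$nfh> );
      close $nfh;
      @needles = grep length, @needles;    # ignore blank lines

      # One precompiled regex covering all 100-1000 search strings.
      my $regex = '(?:' . join( '|', map quotemeta, @needles ) . ')';
      $regex = qr/$regex/;

      open my $dfh, '<', $dataFile or die "$dataFile: $!";
      while( my $line = <$dfh> ) {
          print $line if $line =~ $regex;
      }
      close $dfh;

    Run as something like: perl baseline.pl needles.txt bigfile.dat > matches.dat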

      The code from Re: Fastest Search method for strings in large file, modified to print whole "\n"-delimited records to stdout:

      #! perl -slw
      use strict;
      use List::Util qw[ max ];

      our $BUFSIZE ||= 2**16;

      my @needles = qw[ 12345 67890 ];
      my $regex   = '(?:' . join( '|', map quotemeta, @needles ) . ')';
      my $maxLen  = max map length, @needles;

      open FILE, '<', $ARGV[ 0 ] or die "$ARGV[ 0 ]: $!";

      my( $toRead, $soFar, $offset ) = ( $BUFSIZE, 0, 0 );

      while( my $read = sysread FILE, $_, $toRead, $offset ) {
          ## Print every "\n"-delimited record in the buffer that contains a needle
          if( m[$regex] ) {
              while( m[^([^\n]*$regex[^\n]*$)]mg ) {
                  print $1;
              }
          }
          $soFar += $read;

          ## Carry the trailing partial record to the front of the next buffer
          my $len = length() - rindex $_, "\n";
          substr $_, 0, $len, substr $_, -$len;
          $offset = $len;
          $toRead = $BUFSIZE - $len;
      }

      On my system, performance tails off sharply with BUFSIZEs above 2**16.
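
      With 100-1000 search strings you probably don't want them hardcoded; a small change (my assumption, not part of the original) reads them from a file named as the first argument, replacing the my @needles = qw[ 12345 67890 ]; line:

        # Hypothetical tweak: read the search strings, one per line, from a
        # needles file; $ARGV[0] then refers to the big data file as before.
        my $needleFile = shift @ARGV;
        open my $nfh, '<', $needleFile or die "$needleFile: $!";
        chomp( my @needles = grep /\S/, <$nfh> );
        close $nfh;

      The -s switch on the shebang line is what lets $BUFSIZE be set from the command line, so an invocation would look something like:

        perl search.pl -BUFSIZE=65536 needles.txt bigfile.dat > matches.dat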


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.