Did you ever settle upon a solution?
For grins, I just ran a test that looked up 1000 randomly generated 10-digit telephone numbers (nnn-nnn-nnnn) in a flatfile database containing approximately 6.6% (2e6 / 3e7) of the 1e10 numbers:
c:\test>572961
9991230061
9991230061 is not found
9991230062
9991230062 is found
9991230063
9991230063 is not found
Terminating on signal SIGINT(2)

c:\test>perl -wle"printf qq[%03d%03d%04d\n], int( rand 1000 ), int( rand 1000 ), int( rand 10000 ) for 1 .. 1e3" | perl 572961.pl >nul
File for area code '000' not found at 572961.pl line 12, <STDIN> line 57.
999 trials of lookup (32.287s total), 32.319ms/trial
Each lookup takes around 32 ms, which ought to be quick enough for most purposes.
The disk files (for all 999 possible area codes) require 10 GB, though that could trivially be reduced to 2.5 GB. Each area code is stored in a separate file, with one line of 10,000 characters for each of the 999 subarea codes. Each byte in a line represents a single telephone number as a simple '0' or '1'.
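For anyone wanting to reproduce the layout, here is a minimal sketch of how one area-code file could be generated. The sub name build_area_file and its arguments are hypothetical (the generator used for the test isn't shown), and it assumes each record is 10,000 characters plus a 2-byte CRLF, which is what the 10002-byte stride in the lookup script implies.

```perl
use strict;
use warnings;

# Hypothetical helper: write one area-code file in the layout described above.
#   $path    - output file, e.g. "./tele/999"
#   @numbers - 7-digit strings "sssnnnn" (subarea code + number) that are listed
# Returns the size of the file written, in bytes.
sub build_area_file {
    my( $path, @numbers ) = @_;

    # One 10,000-character line of '0's for each of the 999 subarea codes.
    my @lines = ( '0' x 10_000 ) x 999;

    for my $n ( @numbers ) {
        my( $subarea, $no ) = $n =~ m[^(\d{3})(\d{4})$] or next;
        # The 1-based offsets leave no slot for subarea 000 or number 0000.
        next if $subarea == 0 or $no == 0;
        substr( $lines[ $subarea - 1 ], $no - 1, 1 ) = '1';
    }

    open my $out, '>', $path or die "Cannot write '$path': $!";
    binmode $out;    # assumption: emit CRLF ourselves so every record is 10002 bytes
    print {$out} "$_\r\n" for @lines;
    close $out or die "Cannot close '$path': $!";

    return -s $path;
}
```

Under those assumptions each file comes to 999 * 10002 = 9,991,998 bytes, which is where the roughly 10 MB per area code (and roughly 10 GB for 999 area codes) comes from.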
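The lookup process is:
1. Split the number into a 3-digit area code, a 3-digit subarea code and a 4-digit number.
2. Open the file for that area code.
3. Seek to the line for the subarea code, at byte offset (subarea - 1) * 10002 (10,000 characters plus the line ending).
4. Read that line and test the character at offset (number - 1): '1' means found, '0' means not found.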
Care to trade 10 MB (or 2.5 MB) of disk space per area code for a 32 ms lookup time, regardless of how the application grows?
#! perl -slw
use strict;
use Benchmark::Timer;

my $T = new Benchmark::Timer;

while( my $number = <STDIN> ) {
    chomp $number;
    $T->start( 'lookup' );
    if( my( $area, $subarea, $no ) = $number =~ m[^(\d{3})(\d{3})(\d{4})$] ) {
        open FILE, '<', "./tele/$area"
            or warn "File for area code '$area' not found" and next;
        seek FILE, ( $subarea - 1 ) * 10002, 0;
        my $mask = <FILE>;
        print "$number is ", ( substr $mask, ( $no - 1 ), 1 ) ? 'found' : 'not found';
    }
    else {
        print "Invalid telephone number: $number";
    }
    $T->stop( 'lookup' );
}

$T->report;
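How the smaller on-disk figure would be reached isn't spelled out above. One possibility, purely as a sketch, is to pack each line into bits rather than '0'/'1' characters. The sub names pack_line and lookup_packed are hypothetical, and the fixed 1250-byte record with no line ending is an assumption rather than the layout used in the test.

```perl
use strict;
use warnings;

# Pack a 10,000-character '0'/'1' line into 1250 bytes, one bit per number.
# pack's 'b' format uses ascending bit order within each byte, the same
# order vec() uses, so bit $i of the result corresponds to character $i.
sub pack_line {
    my( $line ) = @_;
    return pack 'b*', $line;
}

# Look up one number in an already-open, binmode'd packed area-code file.
# Returns 1 or 0, or undef if the record could not be read.
# Offsets are 1-based, mirroring the character-based script above.
sub lookup_packed {
    my( $fh, $subarea, $no ) = @_;
    my $bits;
    seek( $fh, ( $subarea - 1 ) * 1250, 0 ) or return;
    ( read( $fh, $bits, 1250 ) // 0 ) == 1250 or return;
    return vec( $bits, $no - 1, 1 );
}
```

At one bit per number an area code needs 999 * 1250 = 1,248,750 bytes, so this particular packing would go somewhat below the 2.5 MB per area code mentioned above. The lookup is otherwise the same: one seek and one fixed-size read.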
In reply to Re: Searching text files
by BrowserUk
in thread Searching text files
by SteveS832001