#3 is the idea I like most
I don't know much about american phone numbers, but if they all have a fixed length of 10, you'd just need slightly more than 1GB disk space to store one bit for each existing number.
I wouldn't create this bit vector in memory. Just create a big enough file, initialized with zeros and then go through your text file and position with fseek to $phone_num >> 3 and set bit number $phone_num & 7.
do the same positioning for read access, but check the bit.
I think searching will be done in less than a second.
Update: Of course you can couple this with the idea of splitting for each area code. This should reduce the summed size of your three files to 1/333 (about 4MB) if the area code has 3 numbers.
Update #2: If you have 10 numbers in each phone number and have 2million numbers you already have 21MB disk space used. So the bit vector on disk will save you 16MB.
In reply to Re^2: Searching text files
by Skeeve
in thread Searching text files
by SteveS832001
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |