in reply to Re: Searching text files
in thread Searching text files
#3 is the idea I like most
I don't know much about American phone numbers, but if they all have a fixed length of 10 digits, you'd need just slightly more than 1GB of disk space (10^10 bits, about 1.25 billion bytes) to store one bit for each possible number.
I wouldn't create this bit vector in memory. Just create a big enough file, initialized with zeros, then go through your text file and, for each number, position with fseek to byte offset $phone_num >> 3 and set bit number $phone_num & 7 in that byte.
Do the same positioning for read access, but check the bit instead of setting it.
Since each lookup is a single seek plus a one-byte read, I think searching will be done in less than a second.
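Here is a minimal sketch of that set-bit / check-bit idea, written in Python rather than Perl just to keep it short. The function names, the file name and the 7-digit demo size are my own assumptions for illustration. It also assumes the platform zero-fills a file that is extended with truncate, which most systems do.

```python
import os
import tempfile

NUM_BITS = 10**10  # one bit per possible 10-digit number (about 1.25 GB on disk)

def create_bit_file(path, num_bits=NUM_BITS):
    """Create a file large enough for num_bits bits.
    Assumption: the extended area is zero-filled (true on most platforms)."""
    with open(path, "wb") as f:
        f.truncate((num_bits + 7) // 8)

def set_bit(f, phone_num):
    """Seek to byte phone_num >> 3 and set bit phone_num & 7."""
    f.seek(phone_num >> 3)
    byte = f.read(1)[0]
    f.seek(phone_num >> 3)  # seek back before overwriting the same byte
    f.write(bytes([byte | (1 << (phone_num & 7))]))

def check_bit(f, phone_num):
    """Same positioning as set_bit, but only test the bit."""
    f.seek(phone_num >> 3)
    return bool(f.read(1)[0] & (1 << (phone_num & 7)))

# Small demo with 7-digit numbers so the file stays tiny (1,250,000 bytes).
path = os.path.join(tempfile.mkdtemp(), "numbers.bits")
create_bit_file(path, 10**7)
with open(path, "r+b") as f:
    set_bit(f, 5551234)
    found = check_bit(f, 5551234)    # number was stored
    missing = check_bit(f, 5551235)  # neighbouring number was not
print(found, missing)
```

Each lookup touches exactly one byte of the file, so the cost does not depend on how many numbers are stored.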
Update: Of course you can combine this with the idea of splitting by area code. If the area code has 3 digits, each file only has to cover the remaining 7 digits, so this should reduce the combined size of your three files to about 1/333 of the full bit vector (about 4MB).
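A rough sketch of the per-area-code variant, again in Python and under my own assumptions: numbers arrive as 10-digit strings, the first 3 digits are the area code, and there is one bit file per area code. The helper names and the `area_NNN.bits` file naming are made up for the example.

```python
import os
import tempfile

LOCAL_BITS = 10**7  # 7 digits remain once the 3-digit area code is split off

def split_number(phone):
    """Split a 10-digit string into (area code, local number as int)."""
    return phone[:3], int(phone[3:])

def bit_file_path(directory, area_code):
    # Hypothetical naming scheme: one bit file per area code.
    return os.path.join(directory, f"area_{area_code}.bits")

def add_number(directory, phone):
    area, local = split_number(phone)
    path = bit_file_path(directory, area)
    if not os.path.exists(path):
        # Assumption: truncate zero-fills the new file (true on most platforms).
        with open(path, "wb") as f:
            f.truncate((LOCAL_BITS + 7) // 8)
    with open(path, "r+b") as f:
        f.seek(local >> 3)
        byte = f.read(1)[0]
        f.seek(local >> 3)
        f.write(bytes([byte | (1 << (local & 7))]))

def has_number(directory, phone):
    area, local = split_number(phone)
    path = bit_file_path(directory, area)
    if not os.path.exists(path):
        return False  # no file for this area code means no numbers stored
    with open(path, "rb") as f:
        f.seek(local >> 3)
        return bool(f.read(1)[0] & (1 << (local & 7)))

directory = tempfile.mkdtemp()
add_number(directory, "2125551234")
print(has_number(directory, "2125551234"))  # True
print(has_number(directory, "2125551235"))  # False
print(has_number(directory, "3105551234"))  # False
```

Each file is 1,250,000 bytes, so only the area codes that actually occur cost any disk space.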
Update #2: If you have 10 digits in each phone number and have 2 million numbers, you already use about 21MB of disk space for the text. So the bit vector on disk (about 4MB) will save you roughly 17MB.
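The back-of-the-envelope numbers can be checked with a few lines of Python. Two assumptions of mine go into this: the text file stores one number per line with a single newline character, and "MB" is read as 2^20 bytes.

```python
DIGITS = 10
MIB = 2**20  # assumption: "MB" means 2**20 bytes

full_bytes = 10**DIGITS // 8            # one bit per possible 10-digit number
per_area_bytes = 10**7 // 8             # one bit per 7-digit local number
three_files = 3 * per_area_bytes        # three area-code files
text_bytes = 2_000_000 * (DIGITS + 1)   # assumption: one newline per number

print(f"full bit vector:  {full_bytes / MIB:8.1f} MB")
print(f"three area files: {three_files / MIB:8.1f} MB")
print(f"plain text file:  {text_bytes / MIB:8.1f} MB")
print(f"saving:           {(text_bytes - three_files) / MIB:8.1f} MB")
print(f"size ratio:       1/{full_bytes / three_files:.0f}")
```

Different line endings or a different definition of MB shift the results by a megabyte or two, so treat them as estimates.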
Re^3: Searching text files
by rminner (Chaplain) on Sep 15, 2006 at 05:18 UTC