I'm in a classic position of having data in a flat file that has grown large over time and is starting to take too long to search. The technology needs to move on. I've searche the databse here and haven't found so much about large constant datbases and hence this...
The existing setup
15,000 records/lines with about 12 pipe delimited fields.
File size - 15 MB
each week we add about 40 new records to the database - it's a growing problem!
It's essentially a constant database and in real time we don't add things in or sort etc.
Of the 12 fields we allow searching on 7 defined fields. Sometimes we just search on one field but much of the time we search on 3 or more fields.
for info the 7 fields we search on occupy about 10Mb of the 15MB total
Searching is done with an old and somewhat modified Salina Sol cgi perl script which iterates over the text file a line at a time.
The script runs under cgi - not mod_perl or any special environment. The webserver is Apache.
Searching on one field takes about 10 seconds to pull up 40 results
and searching on 3 fields takes about 12 secs.
I like the existing setup because we can see the data easily and correct any problems etc.
A new setup
I'd like a system that could handle 45,000 records of the existing size and search them on the same 7 fields in reasonable time - say 5 seconds on my setup.
I'm not too keen to go down a full blown MySQL type route because I belive that searching on multiple fields will be slower then other possible solutions and I don't need the benefits of a relational db particularly. I know some people will be appalled at this view and to them I apologise in advance.
I'm open to other suggestions on the best way forward (and I suppose relational solutions too if they can meet the spec). It occurred to me that I could put all the information in a cdb database, each line having a unique key and have a smaller database, with just the 7 fields, and the key, held in memory. The figures would show that as being about 15Mb in size. If it were done with mod_perl then the searchable bit of the databases would be loaded once from a disk file and thereafter searches should be pretty quick - if with the on-cost of getting the full entries out of the cdb database. But I have never used mod_perl before and my perl experience is pretty low I have to say.
Thanks for your help asnd ideas...
In reply to Large Constant Database in Text File by stephentyler
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |