in reply to Speeding up data lookups

By monkeying around with your own searches, you're probably reinventing some wheels, and doing it rather inefficiently.

As a rule of thumb, any high-access, high-volume data should be indexed. Even moving your data into something like SQLite would give you the benefit of indexing.
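
To make the indexing point concrete, here is a minimal sketch using Python's standard sqlite3 module. The table name `records`, the index name `idx_records_key`, and the helper functions are all assumptions made up for illustration; nothing here comes from the original poster's data or code.

```python
import sqlite3


def build_lookup_db(rows):
    """Load (key, value) pairs into an in-memory SQLite table, then index the key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (key TEXT, value TEXT)")
    with conn:  # commit the whole load as a single transaction
        conn.executemany("INSERT INTO records (key, value) VALUES (?, ?)", rows)
    # Hypothetical index name; the index is built once, after the bulk load.
    conn.execute("CREATE INDEX idx_records_key ON records (key)")
    return conn


def lookup(conn, key):
    """Return the value stored for key, or None if the key is absent."""
    row = conn.execute(
        "SELECT value FROM records WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row else None


def query_plan(conn, key):
    """Return SQLite's description of how it would run the lookup."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT value FROM records WHERE key = ?", (key,)
    ).fetchall()
    # The last column of each plan row holds the human-readable detail text.
    return " ".join(str(r[-1]) for r in rows)


if __name__ == "__main__":
    db = build_lookup_db(("k%d" % i, "v%d" % i) for i in range(1000))
    print(lookup(db, "k42"))
    print(query_plan(db, "k42"))
```

The query plan text should mention the index by name, which is a quick way to check that a lookup is not falling back to a full table scan.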

If it were me, I'd definitely pursue some sort of robust indexing mechanism first. I'd probably think about threading and smart serialization next.

Good luck -- that actually sounds like a fun piece of work.

Re^2: Speeding up data lookups
by cees (Curate) on Sep 20, 2005 at 14:27 UTC

    SQLite is a fantastic piece of work, but I found that once your dataset gets large, it becomes quite slow at loading data. Once the SQLite file is over a gigabyte or so, inserts slow to a crawl compared to when the DB is empty (and that is with transactions, inserting 1,000 or 10,000 rows at a time). Queries were still surprisingly fast, but inserts really sucked.

    It is of course possible that I was missing something, but I think using DBM files would be faster if you need to regenerate the databases often (i.e., new data files come in regularly).

    It does sound like a fun project though...

      For my type of work, I load SQLite DBs once, and then do various studies against that data. I load the entire shebang in one transaction, sometimes several million records in one go.

      I've never really studied the difference between inserting the first 100,000 records and, say, the tenth 100,000, but that would be quite interesting to see. So, here's a benchmark:

      Inserting 4096-byte strings for 15 seconds per benchmark, ten sets, all in one huge transaction. Final file size is 2.3 GB.

      3122.67/s (n=50681)
      3166.34/s (n=48825)
      3165.59/s (n=50966)
      3206.15/s (n=51619)
      3157.13/s (n=50230)
      3179.75/s (n=50399)
      3188.18/s (n=50692)
      3211.26/s (n=51316)
      3098.32/s (n=49914)
      3046.30/s (n=49807)

      I don't really see any slowdown, but maybe I should change the test a bit. At any rate, I'm sticking with it -- IMNSHO, SQLite is one of the most underrated pieces of code out there. (When used as indicated.)
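
For anyone who wants to try a similar measurement, the following is a rough sketch in Python's sqlite3 module, not the script that produced the numbers above. It assumes a fixed number of rows per set rather than a 15-second time window, and the table name `blobs`, the function name, and its parameters are all hypothetical.

```python
import sqlite3
import time


def insert_benchmark(path=":memory:", sets=10, rows_per_set=1000, payload_size=4096):
    """Insert fixed-size strings in several timed sets inside one transaction.

    Returns (rates, total): inserts per second for each set, and the final row count.
    """
    # isolation_level=None puts the connection in autocommit mode, so the
    # explicit BEGIN/COMMIT below are the only transaction boundaries.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("CREATE TABLE IF NOT EXISTS blobs (id INTEGER PRIMARY KEY, data TEXT)")
    payload = "x" * payload_size
    rates = []
    conn.execute("BEGIN")
    for _ in range(sets):
        start = time.perf_counter()
        conn.executemany(
            "INSERT INTO blobs (data) VALUES (?)",
            ((payload,) for _ in range(rows_per_set)),
        )
        # Guard against a zero reading on very coarse clocks.
        elapsed = max(time.perf_counter() - start, 1e-9)
        rates.append(rows_per_set / elapsed)
    conn.execute("COMMIT")
    total = conn.execute("SELECT COUNT(*) FROM blobs").fetchone()[0]
    conn.close()
    return rates, total


if __name__ == "__main__":
    rates, total = insert_benchmark(sets=10, rows_per_set=5000)
    for rate in rates:
        print("%.2f/s" % rate)
    print("rows:", total)
```

Pointing `path` at a file on disk and raising `rows_per_set` until the file passes a gigabyte would be closer to the scenario discussed in this thread; the in-memory default is only there to keep the sketch quick to run.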