I am not actually reading all the data into Perl, just the smaller of two
tables in an SQLite DB (one has 40000 records, the other 26 million).
Profiling with Devel::NYTProf indicated that around 50% of the time
was spent doing SELECTs on the database, so that looked like a good starting
point for optimisation. As I have access to a cluster with available memory
varying between 18 and 90 GB per node, I thought it might be worthwhile
investigating whether some of the operations could be performed in memory,
especially as I currently only require two such look-up tables.
The idea is that I can build the look-up tables once and thereby
effectively cache the results of multiple SELECTs, which can then be accessed
via a hash key. This seemed like it might be faster in Perl, but as you
point out, a single SELECT is what the DB is optimised to do. So I guess my mileage
may vary.
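To make that concrete, here is a minimal sketch of the kind of look-up table I have in mind; the database file, table and column names (data.db, small_table, id, value) are just placeholders for my real schema:
<code>
use strict;
use warnings;
use DBI;

# Placeholder names throughout: data.db, small_table, id, value.
my $dbh = DBI->connect( "dbi:SQLite:dbname=data.db", "", "",
                        { RaiseError => 1, AutoCommit => 1 } );

# Build the look-up table once, with a single SELECT over the small table.
my %lookup;
my $sth = $dbh->prepare("SELECT id, value FROM small_table");
$sth->execute;
while ( my ( $id, $value ) = $sth->fetchrow_array ) {
    $lookup{$id} = $value;
}

# Afterwards every "query" is just a hash access instead of a SELECT.
my $val = $lookup{42};
</code>
For a simple two-column mapping like this, DBI's selectcol_arrayref with Columns => [1,2] would build the same hash in a single call.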
Cheers,
loris
| [reply] |
I'd suggest creating temporary copies of one or both of your tables in an SQLite in-memory DB.
If you really need more performance than that gives you, and if your SELECT queries are sufficiently predictable that you could pre-index the smaller table, that might be worth thinking about.
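Untested sketch of one way to do both, using DBD::SQLite; the file, table and column names are placeholders:
<code>
use strict;
use warnings;
use DBI;

# Start with an empty in-memory DB, then attach the on-disk one.
my $dbh = DBI->connect( "dbi:SQLite:dbname=:memory:", "", "",
                        { RaiseError => 1, AutoCommit => 1 } );
$dbh->do("ATTACH DATABASE 'data.db' AS disk");

# Copy the smaller table into memory and pre-index the column you query on.
$dbh->do("CREATE TABLE small_table AS SELECT * FROM disk.small_table");
$dbh->do("CREATE INDEX idx_small_key ON small_table ( key_column )");
$dbh->do("DETACH DATABASE disk");

# Subsequent SELECTs on small_table now run entirely against the in-memory copy.
</code>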
| [reply] |
I did try the in-memory approach, but this didn't seem any faster than
reading from the file system, presumably because the entire database file,
which is only 1.4 GB, is being cached anyway.
But I shall look into indexing.
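Something along these lines, I suppose, with placeholder table and column names, indexing whatever column appears in the WHERE clause of the slow SELECTs:
<code>
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect( "dbi:SQLite:dbname=data.db", "", "",
                        { RaiseError => 1, AutoCommit => 1 } );

# Index the column used in the WHERE clause of the hot SELECTs (placeholder names).
$dbh->do("CREATE INDEX IF NOT EXISTS idx_big_key ON big_table ( key_column )");
$dbh->do("ANALYZE");

# Check that the index is actually being used.
my $plan = $dbh->selectall_arrayref(
    "EXPLAIN QUERY PLAN SELECT value FROM big_table WHERE key_column = 42");
print join( " | ", @$_ ), "\n" for @$plan;
</code>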
Thanks,
loris
| [reply] |