I also don't know why there is a difference between the benchmarks. After converting the code to run under Perl 5.10 on my machine (push $aref.. and keys @array aren't valid until 5.14), I got similar results.
If you are using a real disk drive and the database is not "online", then
$dbh->do('PRAGMA synchronous = 0');
speeds things up considerably: it took your 'old' benchmark from 47 secs down to 32 secs on my older machine. The SQLite app I'm working on now takes roughly this long to create a 300K-row table consisting mostly of varchars, so I think you are in the "zone of reason" in terms of performance. Since you are mainly using ints, it might be interesting to rerun your benchmark with numeric values instead of text.
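For reference, here is a minimal sketch of where that pragma would go; the database file, table name and row count are made up for illustration and are not your benchmark script. Note that the pragma only pays off together with a surrounding transaction, and that turning synchronous off means a power loss can corrupt the file, which is fine for a scratch benchmark database but risky for real data.

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Hypothetical database and table names, for illustration only.
my $dbh = DBI->connect('dbi:SQLite:dbname=bench.db', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

# Don't wait for fsync() after each write.
$dbh->do('PRAGMA synchronous = 0');

$dbh->do('CREATE TABLE IF NOT EXISTS bench (id INTEGER, val TEXT)');

# One transaction around all inserts; otherwise every INSERT is its own
# transaction and the pragma alone will not help much.
$dbh->begin_work;
my $sth = $dbh->prepare('INSERT INTO bench (id, val) VALUES (?, ?)');
$sth->execute($_, "row $_") for 1 .. 10_000;
$dbh->commit;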
There is a bulk import facility in SQLite (the .import command of the sqlite3 shell) that works with CSV data. I have never used it, but it might be quicker. I don't know where your actual data comes from.
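For what it's worth, a minimal sketch of driving that import from Perl by piping dot-commands to the sqlite3 shell; the file and table names are invented, and it assumes the target table already exists:

#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical names, for illustration only.
my $db    = 'bench.db';
my $csv   = 'rows.csv';
my $table = 'bench';

# .import is a feature of the sqlite3 shell, not SQL, so it has to go
# through the command-line tool rather than DBI.
open my $sqlite, '|-', "sqlite3 $db" or die "cannot run sqlite3: $!";
print $sqlite ".mode csv\n";
print $sqlite ".import $csv $table\n";
close $sqlite or die "sqlite3 exited with status $?";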
It turned out to be even more peculiar:
dwalin@sol:~$ /tmp/perl-5.8.9/bin/perl /tmp/test.pl 100 1000 100
Benchmark: timing 100 iterations of new, old...
new: 25 wallclock secs (24.35 usr + 0.15 sys = 24.50 CPU) @ 4.08/s (n=100)
old: 9 wallclock secs ( 8.99 usr + 0.06 sys = 9.05 CPU) @ 11.05/s (n=100)
dwalin@sol:~$ /tmp/perl-5.10.1/bin/perl /tmp/test.pl 100 1000 100
Benchmark: timing 100 iterations of new, old...
new: 34 wallclock secs (33.95 usr + 0.17 sys = 34.12 CPU) @ 2.93/s (n=100)
old: 12 wallclock secs (11.28 usr + 0.04 sys = 11.32 CPU) @ 8.83/s (n=100)
dwalin@sol:~$ /tmp/perl-5.14.1/bin/perl /tmp/test.pl 100 1000 100
Benchmark: timing 100 iterations of new, old...
new: 23 wallclock secs (22.74 usr + 0.09 sys = 22.83 CPU) @ 4.38/s (n=100)
old: 10 wallclock secs (10.60 usr + 0.03 sys = 10.63 CPU) @ 9.41/s (n=100)
Curiouser and curiouser, said Alice... It seems that Perl has indeed improved in 5.14 over 5.10 - but before that, it was even faster in 5.8. Now that's a dilemma for me: opt for Modern Perl quirks and syntactic sugar I haven't been able to get used to yet, or go with good ole 5.8 just for sheer speed... 5.8.9 is 17% faster than 5.14.1 with the 'old' routine, and that's nothing to sneeze at, especially in my situation - I need to squeeze every reasonable percent out of the DB insertion code... That's one tough decision.
And I'm afraid no, I can't use the bulk import facility in SQLite. The real-world application I'm working on is significantly more complicated than the test I'm running here. There are several tables with mixed types of data; INSERT statements are intertwined with DELETEs and SELECTs, and all of it is wrapped in BEGIN IMMEDIATE TRANSACTION/COMMIT because of concurrency. I tried the external application way, but it sucked - sqlite had to parse a lot of similar statements as if they were different, and that brought it to its knees.
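Roughly, the shape of that workload is something like the following sketch; the table and column names here are made up, not the real schema. prepare_cached keeps the statements from being re-parsed on every call, which is exactly what hurt the external-application approach.

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# All names here are hypothetical; the real schema is more involved.
my $dbh = DBI->connect('dbi:SQLite:dbname=app.db', '', '',
                       { RaiseError => 1, AutoCommit => 1 });
$dbh->sqlite_busy_timeout(5_000);   # wait up to 5s if another writer holds the lock

sub store_batch {
    my (@rows) = @_;

    # Take the write lock up front so other processes can't sneak in
    # between the SELECT, DELETE and INSERT below.
    $dbh->do('BEGIN IMMEDIATE TRANSACTION');

    my $sel = $dbh->prepare_cached('SELECT id FROM items WHERE key = ?');
    my $del = $dbh->prepare_cached('DELETE FROM items WHERE id = ?');
    my $ins = $dbh->prepare_cached('INSERT INTO items (key, value) VALUES (?, ?)');

    for my $row (@rows) {
        $sel->execute($row->{key});
        if (my ($id) = $sel->fetchrow_array) {
            $sel->finish;
            $del->execute($id);      # replace the stale record
        }
        $ins->execute($row->{key}, $row->{value});
    }

    $dbh->do('COMMIT');
}

store_batch({ key => 'a', value => 1 }, { key => 'b', value => 2 });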
Regards,
Alex.
I suspect that there are some problems with IMMEDIATE.
http://www.sqlite.org/lang_transaction.html
I am not a DB guy and I don't know the "best" answer, if there even is one. I mean, what if process A reads the DB and then wants to update some record, but process B has read the same record and wants to update it too? Whose update "wins"? Maybe this is a complicated thing where, if process A had known about process B's update in advance, it would have made a different decision? In other words, what process A will do is "nonsense" given what process B just did. If this is a single field in a record, maybe a password, I can see it: last guy wins and there is really not a problem. If each update is "atomic" and self-consistent, then maybe this IMMEDIATE keyword is not necessary? I defer to the DB gurus here.
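Not a guru either, but for what it's worth, here is a minimal sketch (made-up table and column names) of the read-modify-write pattern IMMEDIATE is usually used for: the write lock is taken at BEGIN, before the read, so two processes doing read-then-update are serialized up front. With a plain deferred BEGIN, both can take shared read locks and then collide when each tries to upgrade to a write lock, and one of them gets SQLITE_BUSY.

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Hypothetical table and column names, for illustration only.
my $dbh = DBI->connect('dbi:SQLite:dbname=app.db', '', '',
                       { RaiseError => 1, AutoCommit => 1 });
$dbh->sqlite_busy_timeout(5_000);

# The write lock is acquired here, before the SELECT, so a competing
# writer blocks (or gets SQLITE_BUSY) at its own BEGIN IMMEDIATE instead
# of after it has already read the old value.
$dbh->do('BEGIN IMMEDIATE TRANSACTION');
my ($count) = $dbh->selectrow_array(
    'SELECT hits FROM counters WHERE name = ?', undef, 'page');
$dbh->do('UPDATE counters SET hits = ? WHERE name = ?',
         undef, $count + 1, 'page');
$dbh->do('COMMIT');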