Well, there are some problems with your benchmark. I'm going to ignore CSV since it's obviously the slowest.
First, your method of inserting into the MySQL DB is the SLOWEST of the three options:
Benchmark: timing 500 iterations of dbh_insert, dbh_insert_values, sth_insert...
dbh_insert: 28 wallclock secs ( 5.99 usr + 1.68 sys = 7.67 CPU) @ 65.19/s (n=500)
dbh_insert_values: 27 wallclock secs ( 6.02 usr + 1.82 sys = 7.84 CPU) @ 63.78/s (n=500)
sth_insert: 59 wallclock secs (28.48 usr + 2.41 sys = 30.89 CPU) @ 16.19/s (n=500)
I used your database format with one small change: an index on ID, which slows down the inserts.
For those curious, here are the benchmark stats for the three insert formats without an index on ID:
Benchmark: timing 500 iterations of dbh_insert, dbh_insert_values, sth_insert...
dbh_insert: 20 wallclock secs ( 5.30 usr + 1.41 sys = 6.71 CPU) @ 74.52/s (n=500)
dbh_insert_values: 23 wallclock secs ( 5.62 usr + 1.45 sys = 7.07 CPU) @ 70.72/s (n=500)
sth_insert: 55 wallclock secs (27.53 usr + 2.50 sys = 30.03 CPU) @ 16.65/s (n=500)
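For reference, the three insert styles correspond roughly to the following DBI calls. This is a sketch, not my exact benchmark code: the connection parameters, the table name `bench`, and the sample row are placeholders you would swap for your own.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Placeholder connection -- adjust DSN, user, and password for your setup.
my $dbh = DBI->connect('DBI:mysql:test', 'user', 'pass', { RaiseError => 1 });

my %row = (ID => 1, name => 'Pete', address => '1 Main St',
           city => 'Anytown', state => 'TX', zip => '12345');

# Variant 1: $dbh->do with bind values -- the statement is re-prepared
# on every call.
sub dbh_insert {
    $dbh->do('INSERT INTO bench (ID, name, address, city, state, zip)
              VALUES (?, ?, ?, ?, ?, ?)',
             undef, @row{qw(ID name address city state zip)});
}

# Variant 2: $dbh->do with the values quoted and interpolated directly
# into the SQL string.
sub dbh_insert_values {
    my $sql = sprintf(
        'INSERT INTO bench (ID, name, address, city, state, zip) VALUES (%s)',
        join(',', map { $dbh->quote($_) } @row{qw(ID name address city state zip)})
    );
    $dbh->do($sql);
}

# Variant 3: prepare once up front, then execute per row.
my $sth = $dbh->prepare('INSERT INTO bench (ID, name, address, city, state, zip)
                         VALUES (?, ?, ?, ?, ?, ?)');
sub sth_insert {
    $sth->execute(@row{qw(ID name address city state zip)});
}
```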
The second problem is the import subroutines themselves.
Your import routine is skewed in favor of the flatfile, since it simulates one mass import rather than periodic imports.
Your code:
sub insert_flatfile {
    # Note: files opened in > mode again to prevent disk space issues
    open(FF, ">$flatfile") || die "Unable to open '$flatfile': $!";
    foreach (@rows) {
        print FF join('|', ($_->{ID}, $_->{name}, $_->{address},
                            $_->{city}, $_->{state}, $_->{zip})), "\n";
    }
    close(FF);
}
Now, to simulate a production environment it should look more like this:
use Fcntl qw(:flock);    # needed for the LOCK_EX / LOCK_UN constants

sub insert_flatfile {
    # Note: file opened in > mode first to truncate it and prevent
    # disk space issues
    open(FF, ">$flatfile") || die "Unable to open '$flatfile': $!";
    close(FF);
    foreach (@rows) {
        open(FF, ">>$flatfile") || die "Unable to open '$flatfile': $!";
        flock(FF, LOCK_EX);
        print FF join('|', ($_->{ID}, $_->{name}, $_->{address},
                            $_->{city}, $_->{state}, $_->{zip})), "\n";
        flock(FF, LOCK_UN);
        close(FF);
    }
}
Note: There is still a problem with this routine. In a production environment you would have to read the flatfile (or a separate counter file) to find the next ID to store, while MySQL could simply use AUTO_INCREMENT on the ID column. But for this benchmark's needs that is overkill.
Note 2: If you are simulating a mass import of data, MySQL can be greatly sped up with a multi-row insert:
insert into table (blah,blah2,blah3) values ('vbla1','vbla2','vbla3'),
    ('vbla1b','vbla2b','vbla3b'),
    etc.
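A quick sketch of building such a multi-row statement in Perl. Everything here is illustrative: `quote_val` is a crude stand-in for DBI's `$dbh->quote()`, which you should use in real code, and the table and column names are made up.

```perl
use strict;
use warnings;

# Crude stand-in for $dbh->quote(): escape single quotes and wrap.
# Use the real DBI method in production.
sub quote_val { my $v = shift; $v =~ s/'/\\'/g; return "'$v'" }

# Collapse many rows into one multi-row INSERT statement.
sub multi_row_insert_sql {
    my ($table, $cols, $rows) = @_;
    my $col_list = join(',', @$cols);
    my @tuples = map {
        '(' . join(',', map { quote_val($_) } @$_) . ')'
    } @$rows;
    return "INSERT INTO $table ($col_list) VALUES " . join(',', @tuples);
}

my $sql = multi_row_insert_sql(
    'people',
    [qw(name city)],
    [ ['Pete', 'Austin'], ["O'Brien", 'Dallas'] ],
);
print $sql, "\n";
```

One statement means one round trip and one parse, which is where most of the speedup comes from.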
Note 3: Depending on how the data is used (whether it needs to be instantly retrievable or not), INSERT DELAYED can be used, which makes the inserting APPEAR to be nearly instantaneous.
Here are my results for the benchmarks with my two changes, WITHOUT an index on ID:
Benchmark: timing 500 iterations of Flatfile_Insert, Mysql_Insert...
Flatfile_Insert: 8 wallclock secs ( 4.29 usr + 3.30 sys = 7.59 CPU) @ 65.88/s (n=500)
Mysql_Insert: 24 wallclock secs ( 5.32 usr + 1.30 sys = 6.62 CPU) @ 75.53/s (n=500)
========================================
Benchmark: timing 500 iterations of Flatfile_Select, Mysql_Select...
Flatfile_Select: 15 wallclock secs ( 7.99 usr + 5.81 sys = 13.80 CPU) @ 36.23/s (n=500)
Mysql_Select: 127 wallclock secs (35.89 usr + 4.20 sys = 40.09 CPU) @ 12.47/s (n=500)
========================================
Benchmark: timing 500 iterations of Flatfile_Update, Mysql_Update...
Flatfile_Update: 175 wallclock secs (146.59 usr + 24.77 sys = 171.36 CPU) @ 2.92/s (n=500)
Mysql_Update: 189 wallclock secs (10.51 usr + 1.92 sys = 12.43 CPU) @ 40.23/s (n=500)
========================================
We can only do 1 iteration of the delete because it necessarily destroys our datasource.
Benchmark: timing 1 iterations of Flatfile_Delete, Mysql_Delete...
Flatfile_Delete: 0 wallclock secs ( 0.17 usr + 0.01 sys = 0.18 CPU) @ 5.56/s (n=1)
(warning: too few iterations for a reliable count)
Mysql_Delete: 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
(warning: too few iterations for a reliable count)
And now with an index on ID:
Benchmark: timing 500 iterations of Flatfile_Insert, Mysql_Insert...
Flatfile_Insert: 8 wallclock secs ( 4.02 usr + 3.36 sys = 7.38 CPU) @ 67.75/s (n=500)
Mysql_Insert: 26 wallclock secs ( 5.74 usr + 1.93 sys = 7.67 CPU) @ 65.19/s (n=500)
========================================
Benchmark: timing 500 iterations of Flatfile_Select, Mysql_Select...
Flatfile_Select: 14 wallclock secs ( 8.25 usr + 5.39 sys = 13.64 CPU) @ 36.66/s (n=500)
Mysql_Select: 85 wallclock secs (34.61 usr + 5.33 sys = 39.94 CPU) @ 12.52/s (n=500)
========================================
Benchmark: timing 500 iterations of Flatfile_Update, Mysql_Update...
Flatfile_Update: 179 wallclock secs (147.56 usr + 25.98 sys = 173.54 CPU) @ 2.88/s (n=500)
Mysql_Update: 37 wallclock secs (10.73 usr + 2.33 sys = 13.06 CPU) @ 38.28/s (n=500)
========================================
We can only do 1 iteration of the delete because it necessarily destroys our datasource.
Benchmark: timing 1 iterations of Flatfile_Delete, Mysql_Delete...
Flatfile_Delete: 0 wallclock secs ( 0.13 usr + 0.04 sys = 0.17 CPU) @ 5.88/s (n=1)
(warning: too few iterations for a reliable count)
Mysql_Delete: 0 wallclock secs ( 0.01 usr + 0.01 sys = 0.02 CPU) @ 50.00/s (n=1)
(warning: too few iterations for a reliable count)
EDIT: Looking at it some more, I don't think the select routine is the best way of testing either.
Basically you're selecting each row once, just in a random order. A better way to simulate a production environment would be to crank up the iterations ($number_of_rows * 500, for example) and have the subroutine pick a random row to select: $a = int(rand($number_of_rows))
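A minimal sketch of that random-row pick. The row count and the prepared-handle comment are assumptions; wire it into your own select sub.

```perl
use strict;
use warnings;

my $number_of_rows = 1000;    # assumed size of the test data set

# Pick a random row index on each benchmark iteration instead of
# walking every row exactly once in shuffled order.
sub random_row_id {
    return int(rand($number_of_rows));    # 0 .. $number_of_rows - 1
}

# Inside the benchmarked sub you would then do something like
# (assuming a handle $sth prepared for "SELECT ... WHERE ID = ?"):
#   $sth->execute(random_row_id());
my $id = random_row_id();
print "selecting row $id\n";
```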
Pete
insert into pete values('red hair','near green eyes','overinflated ego');