Re: How to optimise "csv into mysql" using Text::CSV_XS and DBD::mysql
by Corion (Patriarch) on Aug 06, 2015 at 08:51 UTC
Thank you for the hint. However, I tried LOAD DATA prior to this solution, but as the input data is very dirty, I wanted more control over what is being imported. Maybe I can try to remove the garbage first and then use LOAD DATA.
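For reference, a minimal sketch of that approach: clean the file first, then bulk-load it. The file and table names here are hypothetical, and the LOCAL variant needs the mysql_local_infile option enabled on both client and server.

use strict;
use warnings;
use DBI;

# Hypothetical sketch: bulk-load a pre-cleaned CSV file.
# 'cleaned.csv' and 'some_table' are made-up names for illustration.
my $dbh = DBI->connect(
    'dbi:mysql:database=test;mysql_local_infile=1',
    'user', 'pass', { RaiseError => 1 },
);

$dbh->do(q{
    LOAD DATA LOCAL INFILE 'cleaned.csv'
    INTO TABLE some_table
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
    IGNORE 1 LINES
});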
Re: How to optimise "csv into mysql" using Text::CSV_XS and DBD::mysql
by GrandFather (Saint) on Aug 06, 2015 at 08:53 UTC
The first step is to run Devel::Profile over your code and see where the time is spent. If it's parsing the CSV, there may not be much you can do. If it's adding rows, you may find that inserting multiple rows at a time helps. But the important thing is to know where to focus your attention, and for that you need to profile the code and see where the bottlenecks are.
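A minimal sketch of the multi-row idea, assuming a three-column table named some_table and a hypothetical get_next_csv_row() standing in for the Text::CSV_XS read loop:

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:mysql:database=test', 'user', 'pass',
    { RaiseError => 1, AutoCommit => 0 });

my $batch_size = 500;
my @buffer;

# Build and run one INSERT covering all buffered rows, then empty the buffer.
sub flush_batch {
    my ($rows) = @_;
    return unless @$rows;
    my $row_ph = '(' . join(',', ('?') x 3) . ')';   # one (?,?,?) group per row
    my $sql    = 'INSERT INTO some_table (a, b, c) VALUES '
               . join(',', ($row_ph) x @$rows);
    $dbh->do($sql, undef, map { @$_ } @$rows);
    @$rows = ();
}

while (my $row = get_next_csv_row()) {   # stand-in for the CSV parsing loop
    push @buffer, $row;
    flush_batch(\@buffer) if @buffer >= $batch_size;
}
flush_batch(\@buffer);   # flush the final partial batch
$dbh->commit;

Batching trades a little memory for far fewer round trips; the batch size is a tuning knob, not a magic number.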
Premature optimization is the root of all job security
Ouch, overkill
The first step is to understand just what the code is doing. If taiko doesn't understand what the code is doing, Devel::Profile won't help.
In this case taiko is connecting to an SQL database without explicitly disabling autocommit. This is where the slowdown is occurring: every time they call execute on the statement handle, it executes the statement and then commits the change. I highly recommend they follow the advice given by Corion in this case, as using a bulk load tool will be even faster than disabling autocommit.
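For illustration, a sketch of what disabling autocommit looks like, with a periodic commit. The file name and table are placeholders for the poster's actual code.

use strict;
use warnings;
use DBI;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new({ binary => 1 });
open my $fh, '<', 'input.csv' or die $!;   # 'input.csv' is a placeholder

my $dbh = DBI->connect('dbi:mysql:database=test', 'user', 'pass',
    { RaiseError => 1, AutoCommit => 0 });  # no commit after every execute

my $sth = $dbh->prepare('INSERT INTO some_table (a, b) VALUES (?, ?)');

my $n = 0;
while (my $row = $csv->getline($fh)) {
    $sth->execute(@$row);
    $dbh->commit if ++$n % 1000 == 0;   # commit once per 1000 rows
}
$dbh->commit;                           # commit any remaining rows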
Hi SimonPratt,
Hm, I sort of disagree. It is very rarely (well, almost never) a bad idea to use a profiling tool when your code is running too slowly. It is well known that developers are notoriously bad at guessing where code is slow without a profiling tool.
Basically, if you double the speed of a procedure or function that takes only 10% of the overall execution time, your win will be about 5%, i.e. nothing substantial. And that happens very frequently. So it is really better to concentrate on the parts of the code that are genuinely slow. Perhaps it is not what you think. Perhaps it is a single line of code that is really poorly designed, and you won't know until you measure it.
So, even though I also have the general feeling, in the case in point, that the performance problem is probably with the database (autocommit, commit rate, transaction control, redo logs, and so on), you can't be sure until you have profiled your code. To me, the 30% improvement obtained by the original poster is quite small. In my experience, real performance breakthroughs are much larger (one to two orders of magnitude).
In short, if you don't have a performance problem, it is probably useless to optimize. But if you do have a performance problem, using a profiling tool (even a very limited one) is usually better than shooting in the dark.
I am working with a proprietary language for accessing a very large database. That language has no profiling tools. I was asked to improve the performance of our extraction programs. After having become pi**ed off at manually modifying programs to find out where they spent a lot of time, I wrote a little Perl program that modifies our program sources in that language to produce a detailed account of where the program is spending its time. I am really happy that I did that and wish I had done it earlier. Now I know immediately if and where I can improve those programs.
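The same idea can be sketched in Perl itself with Time::HiRes; run_suspect_section() is a made-up stand-in for the code under test:

use Time::HiRes qw(gettimeofday tv_interval);

my $t0 = [gettimeofday];
run_suspect_section();   # hypothetical stand-in for the suspect code
printf "section took %.3fs\n", tv_interval($t0);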
In brief, if you have a performance problem, I do not think that profiling the code can ever be a mistake.
Thanx for the hint; indeed, switching AutoCommit off increased the speed by roughly 30%.
Tried Devel::Profile, but it's somehow not running and gives the error
String found where operator expected at (eval 8)[C:/Strawberry/perl/site/lib/Devel/Profile.pm:133] line 2, near "croak 'usage: $io->getline()'"
(Do you need to predeclare croak?)
syntax error at (eval 8)[C:/Strawberry/perl/site/lib/Devel/Profile.pm:133] line 2, near "croak 'usage: $io->getline()'"
croak is a function exported by the Carp module. Try to add this to your code:
use Carp;
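For illustration only (fetch_line() is a made-up example): once Carp is loaded, croak reports the error from the caller's perspective, which is why module code tends to prefer it over die.

use Carp;

sub fetch_line {
    my ($io) = @_;
    # croak dies with the *caller's* file and line number, unlike die
    croak 'usage: $io->getline()' unless ref $io;
    return $io->getline;
}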
Re: How to optimise "csv into mysql" using Text::CSV_XS and DBD::mysql
by chacham (Prior) on Aug 06, 2015 at 13:13 UTC
my $query = qq{INSERT INTO some_table (id,$fieldList) VALUES (id,$field_placeholders)};
Side comment: dynamic SQL is not secure. Admittedly, this one is more interesting in that it actually uses placeholders, but building the placeholder string kind of defeats the purpose. That is, there is still the benefit that strings need not be escaped and there will be strong typing, but the statement is no more secure.
If you're just doing an import, MySQL supports it directly with LOAD DATA INFILE.
$sth = $dbh->prepare("something something ?, ?");
$sth->execute(@args);
–and–
$place = "?, ?";
$sth = $dbh->prepare("something something $place");
$sth->execute(@args);
Re: How to optimise "csv into mysql" using Text::CSV_XS and DBD::mysql
by sundialsvc4 (Abbot) on Aug 06, 2015 at 14:54 UTC
Why are you calling $sth->finish()?!
From DBI:
Indicate that no more data will be fetched from this statement handle before it is either executed again or destroyed. You almost certainly do not need to call this method.
Adding calls to finish after a loop that fetches all rows is a common mistake, don't do it, it can mask genuine problems like uncaught fetch errors.
When all the data has been fetched from a SELECT statement, the driver will automatically call finish for you. So you should not call it explicitly except when you know that you've not fetched all the data from a statement handle and the handle won't be destroyed soon.
... etcetera ... go read the page at the link provided ...
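For contrast, a small sketch of the one case the quoted docs do describe, abandoning a SELECT before fetching every row (table and column names hypothetical):

my $sth = $dbh->prepare('SELECT id, name FROM some_table');
$sth->execute;
while (my @row = $sth->fetchrow_array) {
    last if $row[1] eq 'target';   # stop before the result set is exhausted
}
$sth->finish;   # tell the driver we are deliberately abandoning the rest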
I notice that the examples in DBD::mysql, from which this code was probably more-or-less "lifted," themselves include calls to finish(), which I think should be removed from that documentation.
Your design is appropriate in that it prepare()s the handle, then repeatedly uses it to execute() inserts that use placeholders. But I will wager that finish() is then doing all the wrong things to it, and I would gamble that this is where your speed is actually going away.
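A sketch of that prepare-once, execute-many shape with finish() kept out of the loop ($csv and $fh stand in for the poster's Text::CSV_XS object and file handle):

my $sth = $dbh->prepare('INSERT INTO some_table (a, b) VALUES (?, ?)');
while (my $row = $csv->getline($fh)) {
    $sth->execute(@$row);   # no $sth->finish() here
}
# For INSERT statements there is nothing left to fetch, so nothing to
# finish; the handle can simply be reused or go out of scope.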
I have read the manual and got your point.
I will test whether it improves the performance of the scripts.
Thanx for your hint.
Edit: I removed the finish() from the while loop, but it didn't help much with performance.