in reply to Re: How to optimise " csv into mysql " using TEXT:CSV_XS and DBD::Mysql
in thread How to optimise " csv into mysql " using Text:CSV_XS and DBD::Mysql

Ouch, overkill

The first step is to understand just what the code is doing. If taiko doesn't understand what the code is doing, Devel::Profile won't help.

In this case taiko is connecting to an SQL database without explicitly disabling autocommit. This is where the slowdown is occurring: every time they call execute on the statement handle, it executes the statement and then commits the change. I highly recommend they follow the advice given by Corion in this case, as using a bulk load tool will be even faster than disabling autocommit.
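
For context, the bulk-load route on MySQL is LOAD DATA INFILE, which can be issued straight through DBI. A rough sketch follows; the file name, table layout and the LOCAL option are assumptions here, and loading local files has to be allowed on both the client (mysql_local_infile=1) and the server (local_infile setting):

    use strict;
    use warnings;
    use DBI;

    # mysql_local_infile=1 asks the client to permit LOAD DATA LOCAL INFILE;
    # the server must allow it as well.
    my $dbh = DBI->connect(
        'DBI:mysql:database=testdb;host=localhost;mysql_local_infile=1',
        'user', 'password',
        { RaiseError => 1 },
    );

    $dbh->do(q{
        LOAD DATA LOCAL INFILE 'data.csv'
        INTO TABLE my_table
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
        LINES TERMINATED BY '\n'
        IGNORE 1 LINES
    });

    $dbh->disconnect;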


Re^3: How to optimise " csv into mysql " using TEXT:CSV_XS and DBD::Mysql
by Laurent_R (Canon) on Aug 06, 2015 at 20:06 UTC
    Hi SimonPratt,

    Hm, I sort of disagree. It is very rarely (well, almost never) a bad idea to use a profiling tool when your code is running too slowly. Developers are notoriously bad at guessing where their code is slow without actually measuring it.

    And, basically, if you double the speed of a procedure or function that takes only 10% of the overall execution time, then your win will be about 5%, i.e. nothing substantial. And that happens very frequently. So it is really better to concentrate on the parts of the code that are really slow. Perhaps the slow part is not where you think it is. Perhaps it is a single line of code that is really poorly designed, and you won't know until you measure it.

    So, even though I also have the general feeling, in the case at hand, that the performance problem probably lies with the database (autocommit, commit rate, transaction control, redo logs, and so on), you can't be sure until you have profiled your code. To me, the 30% improvement obtained by the original poster is quite small. In my experience, real performance breakthroughs are much larger (one to two orders of magnitude).

    In short, if you don't have a performance problem, it is probably useless to optimize. But if you do have a performance problem, using a profiling tool (even a very limited one) is usually better than shooting in the dark.
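
    For what it's worth, with Devel::Profile (the module mentioned at the top of this thread) profiling is just a change to how the script is launched; if I remember correctly, it writes its per-subroutine timings to prof.out (the script name below is a placeholder):

        perl -d:Profile your_script.pl   # run once under the profiler
        cat prof.out                     # time and call count per subroutine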

    I am working with a proprietary language for accessing a very large database. That language has no profiling tools. I was asked to improve the performance of our extraction programs. After getting pi**ed off at manually modifying programs to find out where they spent their time, I wrote a little Perl program that modifies our sources in that language so as to produce a detailed account of where the time is going. I am really happy that I did it, and I wish I had done it earlier. Now I know immediately whether and where I can improve those programs.
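
    The probes such a tool inserts are nothing more than timestamps around suspect sections. Done by hand in Perl, just to illustrate the idea (the section names and workloads below are made up), it would look roughly like this:

        use strict;
        use warnings;
        use Time::HiRes qw(gettimeofday tv_interval);

        my %elapsed;    # cumulative wallclock time per instrumented section

        sub timed {
            my ($label, $code) = @_;
            my $t0 = [gettimeofday];
            my @result = $code->();
            $elapsed{$label} += tv_interval($t0);
            return @result;
        }

        # Hypothetical sections of a larger program
        timed('load_data', sub { my $s = 0; $s += $_ for 1 .. 200_000; $s });
        timed('transform', sub { join '-', map { $_ * 2 } 1 .. 50_000 });

        # Report the sections in descending order of time spent
        printf "%-12s %8.3f s\n", $_, $elapsed{$_}
            for sort { $elapsed{$b} <=> $elapsed{$a} } keys %elapsed;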

    In brief, if you have a performance problem, I do not think that profiling the code can ever be a mistake.

      Hi Laurent_R,

      I upvoted your response, because I can see where you're coming from and do agree that profiling tools have their place.

      I disagree, however, with the idea that profiling is never a bad idea. Someone who doesn't understand what the code is doing will get nothing more than a very general idea that their script spends most of its time doing y. They might then spend an inordinate amount of time 'fixing' y to run much more quickly, without ever realising that changes in other parts of the code could mean y is called less often (or factored out completely), potentially yielding much better performance gains than fixing y ever could.

      So yes, I agree that profiling is a very useful tool; however, it can be a double-edged sword and should be used carefully, and only once the developer understands what their code is doing.

        Hi SimonPratt,

        Although I have expressed some disagreement, I also upvoted both your posts, because I understand your point.

        I disagree, however, with the idea that profiling is never a bad idea.
        That's not what I said. Well, not quite. ;-) I said "very rarely (well almost never) a bad idea".

        But I fully agree with you that, in order to improve code performance, you need a deep understanding of your program, as well as of the programming language you are using and of the external resources you are relying on (files, database, network, etc.). If I see that, say, my program is spending a lot of time in subroutine Y, then I will certainly not only think about how I can make Y faster, but also about whether I can call Y less often (perhaps by caching some data), or replace Y altogether with something better. I might even change the whole program architecture and/or algorithm and re-factor the whole shebang; that of course comes at a higher cost (non-regression testing, etc.), but I have done it a few times.

        But before starting to do that, I need to ascertain that Y really is the guilty party. Maybe the problem is somewhere else.
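
        For the "call Y less often" idea above, the simplest form of caching even ships with Perl (the Memoize module). Here is a minimal sketch, where expensive_lookup stands in for the real Y; it only works because Y is deterministic:

            use strict;
            use warnings;
            use Memoize;

            # Stand-in for an expensive, deterministic routine "Y"
            sub expensive_lookup {
                my ($key) = @_;
                select(undef, undef, undef, 0.05);   # simulate 50 ms of work
                return uc $key;
            }

            memoize('expensive_lookup');   # repeat calls with the same argument hit a cache

            # The real work happens only once; the other 999 calls come from the cache
            my @results = map { expensive_lookup('foo') } 1 .. 1_000;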

Re^3: How to optimise " csv into mysql " using TEXT:CSV_XS and DBD::Mysql
by taiko (Sexton) on Aug 06, 2015 at 13:07 UTC
    Thanx for the hint; indeed, switching AutoCommit off increased the speed by roughly 30%.

      No worries :-)

      I would also suggest you look into that finish call. It seems odd to call it immediately after executing the statement each time. A more typical workflow would be to prepare a statement, execute it multiple times, finish the statement, commit (or rollback) the changes to the DB, then disconnect.

      I use DBI here, but only for Microsoft SQL Server, and I have never used the finish command, so I'm not really sure what it does.
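
      For what it's worth, a sketch of that workflow with DBD::mysql might look like the code below; the DSN, table, columns and sample rows are all invented here:

          use strict;
          use warnings;
          use DBI;

          my $dbh = DBI->connect('DBI:mysql:database=testdb;host=localhost',
                                 'user', 'password',
                                 { RaiseError => 1, AutoCommit => 0 });

          my $sth = $dbh->prepare('INSERT INTO my_table (name, qty) VALUES (?, ?)');

          # Execute the same prepared statement many times...
          $sth->execute(@$_) for ( ['apples', 3], ['pears', 5], ['plums', 7] );

          # ...then tidy up once at the end.
          $sth->finish;       # done with this statement handle
          $dbh->commit;       # one commit for the whole batch
          $dbh->disconnect;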