in reply to Re^8: Performance oddity when splitting a huge file into an AoA
in thread Performance oddity when splitting a huge file into an AoA

The zip worked fine.

I cannot make much sense of the statistics either. There is something weird going on. The splitting seems to be taking an inordinate amount of time.

Part of the problem is that with all three files calling the same subroutine, all the statistics get lumped together and averaged out, so you cannot see whether there are any significant differences between the first run and the other two.

To address that, I'd create copy-and-paste copies of the X() subroutine (say x1(), x2() and x3()), and call a different version for each file. That will break out the timings for each file and perhaps highlight run-to-run differences.
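A minimal sketch of that arrangement, assuming three input files with the hypothetical names file1.csv, file2.csv and file3.csv. The loop body is an assumption modelled on the split-and-push loop under discussion, not the actual X(); the only point is that each file's work lands on its own sub name and its own source lines in the profile.

```perl
use strict;
use warnings;

# Three deliberately identical copies, so a line-level profiler reports
# each file's statistics separately instead of averaging them together.
sub x1 {
    my $file = shift;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my @AoA;
    while ( my $line = <$fh> ) {
        my @line_arr = split ',', $line;
        push @AoA, \@line_arr;
    }
    close $fh;
    return scalar @AoA;    # number of rows read
}

sub x2 {
    my $file = shift;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my @AoA;
    while ( my $line = <$fh> ) {
        my @line_arr = split ',', $line;
        push @AoA, \@line_arr;
    }
    close $fh;
    return scalar @AoA;
}

sub x3 {
    my $file = shift;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my @AoA;
    while ( my $line = <$fh> ) {
        my @line_arr = split ',', $line;
        push @AoA, \@line_arr;
    }
    close $fh;
    return scalar @AoA;
}

# Hypothetical file names; substitute the real data files.
my @jobs = (
    [ 'file1.csv', \&x1 ],
    [ 'file2.csv', \&x2 ],
    [ 'file3.csv', \&x3 ],
);

for my $job (@jobs) {
    my ( $file, $code ) = @$job;
    next unless -e $file;    # skip missing files so the sketch runs anywhere
    printf "%s: %d rows\n", $file, $code->($file);
}
```

The duplication is ugly on purpose: a loop calling one shared sub would put the numbers for all three files back on the same lines.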

Also, from the numbers presented, it looks like end-of-loop overhead is getting lumped in with the last statement in the loop. To counter that, I'd stick a dummy statement at the bottom of the loop:

    my $dummy;
    sub x1 {
        open my $fh, '<', shift or die $!;
        my @AoA;
        while (my $line = <$fh>) {
            my @line_arr = split ',', $line;
            #push @AoA, \@line_arr;
            $dummy = 1;
        }
    }

Next time i'll just upload the nytprof.out file.

That would certainly be easier (I assume by "upload", you mean here to PM!). Especially for anyone attempting to follow along.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Re^10: Performance oddity when splitting a huge file into an AoA
by Xenofur (Monk) on May 09, 2009 at 10:07 UTC
    As the nytprof files are binary, I can't upload them here. Instead I've made an effort to make the next batch of benchmarks more useful: http://drop.io/perl_performance/asset/arrays-zip

    I've taken your suggestions and implemented them. Also, there are only two variants this time: one shows the performance issue, and the other has a tiny change which resolves it to an extent.

    Sidenote: Are you available on IRC? This stuff is very time-consuming and I'm way out of my depth here, so literally the only thing I can do is provide info in the hope that it helps others. With the delays inherent in this form of communication, I'm getting incredibly frustrated.

      Having spent a couple of hours flicking between the two sets of results in your latest zip, I am at a loss to explain or reproduce the problem. It doesn't make any sense at all to me why

      my @line_arr = split ',', $line;

      would take 8 times longer in one run than in another, assuming the same interpreter is being used for both runs.
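      For anyone who wants to check this on another machine without the original data, a rough sketch follows. It is not the code from the zip: the synthetic lines, the row and column counts, and the sub names make_lines() and time_split() are all assumptions made for illustration. It times the same split loop over several passes, with and without keeping the rows in an AoA, using wall-clock time from Time::HiRes.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Build synthetic CSV lines in memory so no input file is needed.
# Row and column counts are arbitrary assumptions, not the original data.
sub make_lines {
    my ( $rows, $cols ) = @_;
    my $line = join( ',', 1 .. $cols ) . "\n";
    return [ ($line) x $rows ];
}

# Time one pass of the split loop; $keep controls whether each row
# is pushed onto an AoA or simply discarded.
sub time_split {
    my ( $lines, $keep ) = @_;
    my @AoA;
    my $start = time();
    for my $line (@$lines) {
        my @line_arr = split ',', $line;
        push @AoA, \@line_arr if $keep;
    }
    my $elapsed = time() - $start;
    return ( $elapsed, scalar @AoA );
}

my $lines = make_lines( 20_000, 20 );
for my $pass ( 1 .. 3 ) {
    for my $keep ( 0, 1 ) {
        my ( $elapsed, $rows ) = time_split( $lines, $keep );
        printf "pass %d keep=%d rows=%d %.3fs\n",
            $pass, $keep, $rows, $elapsed;
    }
}
```

      If the per-pass times for the same setting differ by a large factor on one Perl build but not on another, that at least narrows down where to look.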

      As I cannot reproduce it, and nobody else has spoken up to say that they can, it would appear to be confined to your system. If you have a workaround, and are not concerned that this will adversely affect your other programs, then simply drop the issue.

      Quite frankly, I find the NYTProf output very difficult to use. Pretty, but otherwise essentially useless. But, once again, I seem to be in a minority here.


        You mean running this in 32-bit ActivePerl or Strawberry Perl doesn't do this on your system? And yes, the two results in the latest zip are from the same interpreter.

        As for NYTProf, I'm only using it because nothing else was suggested and because it is the most advanced of the profilers I know.

        Either way, thanks a lot for your time and assistance with this. :)
      You've had the change that seems to (inexplicably) resolve the performance issue for a week now. Does your frustration come from not being able to use it for some reason? Or are you, like me, impatient to find out what the heck this one is about? :)
        Exactly this. :)

        I could've walked away long ago, but I really want to fix this for good, one way or another. It's just a bit hard to justify 1-2 hours of expense whenever I sit down to think up ways to modify the code, muck around with different Perl installations, run the benchmarks, try to get it all into a presentable format and write up short descriptions of what I did. (Plus, I get to spend hours afterwards worrying and wondering whether I missed something.)