in reply to Re^6: Performance oddity when splitting a huge file into an AoA
in thread Performance oddity when splitting a huge file into an AoA

Hm. I would have taken a look, but when I try to unrar your archive, there are multiple copies of files, all with the same name and no path information, so each overwrites the last.

(I have to say that all those (2.25 MB of) HTML, PNG & CSS files seem like overkill as a way of presenting the same information that could be conveyed in three short text files. And with the latter, I could manipulate the numbers programmatically rather than having to constantly chase my tail around several dozen HTML pages. )
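
For instance, with plain-text output, pulling the numbers back out for further analysis is trivial. A minimal sketch, assuming a hypothetical report laid out as one "name seconds" pair per line:

    use strict; use warnings;

    ## Total a column of timings from a hypothetical plain-text report.
    open my $fh, '<', 'timings.txt' or die $!;
    my $total = 0;
    while( <$fh> ) {
        my( $name, $time ) = split;   ## whitespace-separated "name seconds" pairs
        $total += $time;
    }
    print "Total: $total seconds\n";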


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Re^8: Performance oddity when splitting a huge file into an AoA
by Xenofur (Monk) on May 07, 2009 at 15:20 UTC
    I have absolutely no idea what could cause this, as I use the completely standard rar.exe from the original producer to make these archives, with nothing special about them whatsoever. In case this works better on your system, here's a zip file: http://drop.io/perl_performance/asset/ap-vs-cw-vs-sb-zip

    Next time I'll just upload the nytprof.out file. I kinda assumed that, with what little information there is, looking at it normally would be all that's required. And as far as the format itself goes, ask the NYTProf guys; I only used their tools. ;) (I should mention you're supposed to start by opening the "index.html" file.)
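
    For reference, the reports were generated the standard way (assuming a stock Devel::NYTProf install; the script name here is just a placeholder):

        perl -d:NYTProf benchmark.pl   # profile the run; writes nytprof.out
        nytprofhtml                    # turn nytprof.out into the HTML report (start at index.html)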

      The zip worked fine.

      I cannot make much sense of the statistics either. There is something weird going on: the splitting seems to be taking an inordinate amount of time.

      Part of the problem is that with all three files calling the same subroutine, all the statistics get lumped together and averaged out, so you cannot see whether there are any significant differences between the first run and the other two.

      To address that, I'd create copy & paste copies of the X() subroutine (say x1(), x2() & x3()), and call a different version for each file. That will break out the timings for each file and perhaps highlight run-to-run differences (there's a call sketch after the code below).

      Also, from the numbers presented, it looks like the end-of-loop overhead is getting lumped in with the last statement in the loop. To counter that, I'd stick a dummy statement at the bottom of the loop:

      my $dummy;
      sub x1 {
          open my $fh, '<', shift or die $!;
          my @AoA;
          while( my $line = <$fh> ) {
              my @line_arr = split ',', $line;
              #push @AoA, \@line_arr;
              $dummy = 1;   ## dummy last statement absorbs the end-of-loop overhead
          }
      }
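
      And a minimal sketch of how the three copies might be driven (x2() and x3() being verbatim copies of x1(); the file names are placeholders):

          ## Run each input file through its own copy of the routine, so NYTProf
          ## reports the three runs separately instead of averaging them together.
          x1( 'first.csv'  );
          x2( 'second.csv' );
          x3( 'third.csv'  );
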
      Next time I'll just upload the nytprof.out file.

      That would certainly be easier (I assume by "upload", you mean here to PM!). Especially for anyone attempting to follow along.


        As the nytprof files are binary, I can't upload them here. Instead I've made an effort to make the next batch of benchmarks more useful: http://drop.io/perl_performance/asset/arrays-zip

        I've taken your suggestions and implemented them. Also, there are only two variants this time: one shows the performance issue, and the other has a tiny change which resolves the issue to an extent.
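
        (For anyone following along without downloading the archive, the comparison is presumably of this general shape. This is a hypothetical sketch based on the code upthread, not the actual benchmark files.)

            use strict; use warnings;

            ## Hypothetical variant A: split into a named lexical, then push a reference.
            sub variant_a {
                open my $fh, '<', shift or die $!;
                my @AoA;
                while( my $line = <$fh> ) {
                    my @line_arr = split ',', $line;
                    push @AoA, \@line_arr;
                }
                return \@AoA;
            }

            ## Hypothetical variant B: push an anonymous array straight from split.
            sub variant_b {
                open my $fh, '<', shift or die $!;
                my @AoA;
                while( my $line = <$fh> ) {
                    push @AoA, [ split ',', $line ];
                }
                return \@AoA;
            }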

        Sidenote: Are you available on IRC? This stuff is very time-consuming and I'm way out of my depth here, so literally the only thing I can do is provide info in the hopes it helps others; with the delays involved in this form of communication, I'm getting incredibly frustrated.