in reply to Re^3: Performance oddity when splitting a huge file into an AoA
in thread Performance oddity when splitting a huge file into an AoA

The -V of my ASPerl:
d:\Web-Dev\arrays>perl -V
Summary of my perl5 (revision 5 version 10 subversion 0) configuration:
  Platform:
    osname=MSWin32, osvers=5.00, archname=MSWin32-x86-multi-thread

Characteristics of this binary (from libperl):
  Compile-time options: MULTIPLICITY PERL_DONT_CREATE_GVSV
                        PERL_IMPLICIT_CONTEXT PERL_IMPLICIT_SYS
                        PERL_MALLOC_WRAP PL_OP_SLAB_ALLOC USE_ITHREADS
                        USE_LARGE_FILES USE_PERLIO USE_SITECUSTOMIZE
  Locally applied patches:
    ActivePerl Build 1004 [287188]
    33741 avoids segfaults invoking S_raise_signal() (on Linux)
    33763 Win32 process ids can have more than 16 bits
    32809 Load 'loadable object' with non-default file extension
    32728 64-bit fix for Time::Local
  Built under MSWin32
  Compiled at Sep 3 2008 13:16:37
  @INC:
    C:/Perl/site/lib
    C:/Perl/lib
    .

I'm not sure how I can look closer here. Suggestions as to what tests I can run are welcome. Meanwhile, here are snapshots of both, done with Procmon and NYTProf: http://drop.io/perl_performance

Re^5: Performance oddity when splitting a huge file into an AoA
by BrowserUk (Patriarch) on May 06, 2009 at 08:27 UTC
    Suggestions as to what tests I can run are welcome.

    The profiling you've done doesn't get into enough detail in the critical areas.

    The first thing I would try is isolating whether the extra time is spent reading from the file or shuffling memory. To that end, I'd see what happens to the timings if I just read and split the data but didn't store it:

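    #! perl -slw
    #use 5.010;
    use strict;
    use Time::HiRes qw[ time ];

    # Read and split every line of the named file, discarding the result.
    sub x {
        open my $fh, '<', shift or die $!;
        # my @AoA;
        my $dummy = [ split ',' ] while <$fh>;
        my $records = $.;    # capture the count first: close() resets $.
        close $fh;
        return $records;
    }

    # Alternate between junk2.dat and junk1.dat, timing each pass.
    for ( 1 .. 5 ) {
        my $start = time;
        printf "Records: %d in %.3f seconds\n",
            x( sprintf 'junk%d.dat', 1 + ( $_ & 1 ) ), time() - $start;
    }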

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Alright, I broke the script up a bit and ran 3 different benchmarks in ActivePerl, Cygwin and Strawberry Perl. Here are the results: http://drop.io/perl_performance/asset/ap-vs-cw-vs-sb-rar

      It's really weird. When the script pushes the data into the AoA, the split step itself takes a long time. When it doesn't push, the splits run fast.
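
A minimal sketch of that comparison, for anyone who wants to reproduce it without the original files. The `split_lines` helper and the synthetic in-memory data are assumptions made for illustration; they are not the script or the junk*.dat files used in this thread, and the timings will differ from one build and machine to another.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw( time );

# Hypothetical helper: split every line on commas and, if $keep is true,
# push the resulting array ref onto an AoA. Returns the number of rows kept.
sub split_lines {
    my ( $lines, $keep ) = @_;
    my @AoA;
    for my $line (@$lines) {
        my $fields = [ split /,/, $line ];
        push @AoA, $fields if $keep;
    }
    return scalar @AoA;
}

# Synthetic stand-in data (an assumption): 100,000 lines of 10 fields each.
my @lines = map { join ',', 1 .. 10 } 1 .. 100_000;

# Time the same loop twice: once discarding the splits, once storing them.
for my $keep ( 0, 1 ) {
    my $start = time;
    my $kept  = split_lines( \@lines, $keep );
    printf "%-10s kept %6d rows in %.3f seconds\n",
        ( $keep ? 'split+push' : 'split only' ), $kept, time() - $start;
}
```

Running the same script under each Perl build and comparing the two lines of output shows whether the gap comes from `split` itself or from keeping the split results alive in memory.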

        Hm. I would have taken a look, but when I try to unrar your archive, there are multiple copies of files all with the same name and no path information, so each overwrites the last.

        (I have to say that all those (2.25 MB of) HTML, PNG and CSS files seem like overkill as a way of presenting information that could be conveyed in three short text files. And with the latter, I could manipulate the numbers programmatically rather than having to constantly chase my tail around several dozen HTML pages.)

        (Feel free to ignore since I just started reading this thread.) The link requires too much work (possibly allowing the drop.io domain to run JavaScript, followed by taking a quiz) to see the results.