Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks...

I am a mere Novice, but having great fun learning this language. I am currently processing some large text files....

Pseudo Code reads

    while count < nRecords {
        read record from file
        add record to list
        increment count
    }
    sort list
    dump list to new file
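A minimal runnable Perl version of that pseudocode might look like the following; the file names and the tiny sample input are illustrative, not from the original post:

```perl
use strict;
use warnings;

my $in  = 'records.txt';          # hypothetical input file
my $out = 'records_sorted.txt';   # hypothetical output file

# Create a tiny sample input so the example is self-contained.
open my $mk, '>', $in or die "Cannot write $in: $!";
print {$mk} "banana\napple\ncherry\n";
close $mk;

# while count < nRecords: read record, add to list, increment count
open my $fh, '<', $in or die "Cannot open $in: $!";
my @list;
while (my $record = <$fh>) {
    push @list, $record;          # the entire file ends up in memory
}
close $fh;

my @sorted = sort @list;          # default lexical (string) sort

open my $ofh, '>', $out or die "Cannot open $out: $!";
print {$ofh} @sorted;
close $ofh;
```

Note that this holds every record in `@list` at once, which is exactly why memory use grows with the record count.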
I notice a slowdown in adding records once the main list reaches 1.5 million records; the processing then speeds up again around 2.3 million, and seems quite happy until 7 million, when my server runs out of memory.

Is the performance a Perl thing, or Windows trying to scavenge memory prior to using its swap file?

Replies are listed 'Best First'.
Re: Large Array Performance
by shmem (Chancellor) on Jun 16, 2007 at 10:08 UTC
    Pseudo Code reads ...

    You could post real code here (between <code></code> tags), we might have hints for you.

    I guess the slowdown is due to OS latency in handing out more memory to perl, since there is nothing in the language that would inherently slow down allocation between 1.5 and 2.3 million records.

    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
Re: Large Array Performance
by GrandFather (Saint) on Jun 16, 2007 at 10:16 UTC

    I can't help with a reason for the change in performance, but you may achieve better overall performance by sorting blocks of a million or so records into temporary files, then merging those files to generate a single large sorted file.
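    A sketch of that chunk-and-merge approach, using only core modules; the chunk size is a parameter you would tune, and all names here are illustrative:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Phase 1: read the input in chunks, sort each chunk in memory,
# and write each sorted chunk to its own temporary file.
sub sort_chunks {
    my ($in_fh, $chunk_size) = @_;
    my @tmp_names;
    while (1) {
        my @chunk;
        while (@chunk < $chunk_size and defined(my $rec = <$in_fh>)) {
            push @chunk, $rec;
        }
        last unless @chunk;
        my ($tfh, $tname) = tempfile(UNLINK => 0);
        print {$tfh} sort @chunk;
        close $tfh;
        push @tmp_names, $tname;
    }
    return @tmp_names;
}

# Phase 2: merge the sorted chunk files, keeping only one current
# line per file in memory at any time.
sub merge_sorted {
    my ($out_fh, @names) = @_;
    my @fhs   = map { open my $fh, '<', $_ or die $!; $fh } @names;
    my @heads = map { scalar <$_> } @fhs;
    while (grep { defined } @heads) {
        my $min;                         # index of smallest head line
        for my $i (0 .. $#heads) {
            next unless defined $heads[$i];
            $min = $i if !defined $min or $heads[$i] lt $heads[$min];
        }
        print {$out_fh} $heads[$min];
        $heads[$min] = scalar readline $fhs[$min];
    }
}
```

    With a chunk size of a million, memory use stays roughly constant no matter how many records the input holds.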


    DWIM is Perl's answer to Gödel
      and a module such as Sort::External can do all of this transparently.
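      Going by the module's documented synopsis, usage looks roughly like this (Sort::External is a CPAN module, not in core, and the sample data and memory threshold are illustrative):

```perl
use strict;
use warnings;
use Sort::External;   # CPAN module, not in core

# Feed records in; the module spills to disk past the threshold.
my $sortex = Sort::External->new( mem_threshold => 1024 ** 2 * 16 );
$sortex->feed( "$_\n" ) for qw(pear apple mango);
$sortex->finish;

# Fetch records back in sorted order.
my @sorted;
while ( defined( my $rec = $sortex->fetch ) ) {
    push @sorted, $rec;
}
print @sorted;
```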
Re: Large Array Performance
by swampyankee (Parson) on Jun 16, 2007 at 17:55 UTC

    I'm thinking that the performance issues you're seeing are, at least partly, the result of a less-than-optimal design. Have you considered something that removes the need to keep several million records in memory, such as a tied hash, a tied array, a tied file, or a database?
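    One core-module way to act on the tied-file idea is Tie::File, which presents a file's lines as a Perl array while keeping only a small cache in memory; the file name and contents below are illustrative:

```perl
use strict;
use warnings;
use Tie::File;   # core module

my $file = 'records.txt';   # illustrative file name

# Create a tiny sample file so the example is self-contained.
open my $mk, '>', $file or die "Cannot write $file: $!";
print {$mk} "alpha\nbeta\ngamma\n";
close $mk;

# The file's lines now behave like an array, but Tie::File does not
# slurp the whole file into memory.
tie my @records, 'Tie::File', $file or die "Cannot tie $file: $!";

my $count = @records;    # line count
my $first = $records[0]; # random access; record separator is stripped
untie @records;
```

    Note that assigning `sort @records` back into a tied array would still pull every record through memory, so for the sorting step itself a database or an external sort is the better fit.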

    emc

    Any New York City or Connecticut area jobs? I'm currently unemployed.

    There are some enterprises in which a careful disorderliness is the true method.

    —Herman Melville