in reply to Faster push and shift

If merely reading the file, WITHOUT doing any significant processing, already takes 10 minutes, it must be a tremendously huge file, and maybe you should rethink your algorithm. However, it is not clear from your code what you want to do: you populate @a and @b, but never use them anywhere.

There is one more thing that puzzles me: usually, input/output time dominates processing time, unless you do a lot of processing per input record. You keep your arrays small (@a and @b never grow larger than 6 elements), so the array operations shouldn't affect the run time significantly. The regexp is also simple enough that I don't think its processing time could be 15 times the time needed to read a record. I would conclude that the time is not being lost in your code, but to be sure, you could insert timer calls in your loop.
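To make the "timer calls" suggestion concrete, here is a minimal sketch using Time::HiRes. The original code and data are not shown in this thread, so the in-memory sample records, the regexp, and the window size of 6 are all assumptions chosen only for illustration.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical stand-in for the real input; the OP's file and regexp
# are not shown in this thread.
my $data = join '', map { "rec$_ " . ($_ * 2) . "\n" } 1 .. 100_000;

my (@a, @b);
my %elapsed = (read => 0, regex => 0, arrays => 0);
my $count   = 0;

# Read from an in-memory filehandle so the sketch needs no external file.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

while (1) {
    my $t0   = time;
    my $line = <$fh>;
    my $t1   = time;
    $elapsed{read} += $t1 - $t0;          # time spent fetching a record
    last unless defined $line;

    my ($key, $val) = $line =~ /^(\w+)\s+(\d+)/;   # assumed pattern
    my $t2 = time;
    $elapsed{regex} += $t2 - $t1;         # time spent in the regexp

    push @a, $key;
    push @b, $val;
    shift @a if @a > 6;                   # keep both windows at 6 elements
    shift @b if @b > 6;
    $elapsed{arrays} += time - $t2;       # time spent on push/shift

    $count++;
}
close $fh;

printf "%-7s %.3f s\n", $_, $elapsed{$_} for qw(read regex arrays);
print "records: $count\n";
```

Keep in mind that the timer calls themselves cost something per record, so the totals are only useful relative to each other, not as absolute figures.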

-- 
Ronald Fischer <ynnor@mm.st>

Re^2: Faster push and shift
by BrowserUk (Patriarch) on Feb 16, 2012 at 12:07 UTC

    Actually, each of those active lines costs a substantial amount of time. This is just 1M records, matching the OP's data as simply as possible:

    c:\test>junk91 junk.dat    Bare loop ## baseline
    0.322973012924194
    0.358 0.078 0 0

    c:\test>junk91 junk.dat    ## +400% Add back: regex;
    1.40799999237061
    1.466 0.046 0 0

    c:\test>junk91 junk.dat    ## +80% Add back: regex; first push;
    1.67599987983704
    1.731 0.046 0 0

    c:\test>junk91 junk.dat    ## +400% Add back: regex; first push; second push;
    2.94299983978271
    2.932 0.124 0 0

    c:\test>junk91 junk.dat    ## +150% Add back: regex; first push; second push; shifts
    3.2759997844696
    3.369 0.015 0 0
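The junk91 script itself is not shown, so the following is only a sketch of the same "add the operations back one at a time" approach. The sample data, the regexp, and the sub names are all hypothetical; each stage repeats the previous one plus one more operation, and the wall-clock time of each pass is reported.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical sample data held in memory; the OP's real records are not shown.
my @lines = map { "rec$_ " . ($_ * 2) } 1 .. 200_000;
my $RE    = qr/^(\w+)\s+(\d+)/;    # assumed pattern

sub bare_loop {
    my $n = 0;
    for my $line (@lines) { $n++ }
    return $n;
}

sub with_regex {
    my $n = 0;
    for my $line (@lines) {
        my ($k, $v) = $line =~ $RE;
        $n++;
    }
    return $n;
}

sub with_pushes {
    my (@a, @b);
    my $n = 0;
    for my $line (@lines) {
        my ($k, $v) = $line =~ $RE;
        push @a, $k;
        push @b, $v;
        $n++;
    }
    return $n;
}

sub with_shifts {
    my (@a, @b);
    my $n = 0;
    for my $line (@lines) {
        my ($k, $v) = $line =~ $RE;
        push @a, $k;
        push @b, $v;
        shift @a if @a > 6;    # keep the windows at 6 elements
        shift @b if @b > 6;
        $n++;
    }
    return $n;
}

my @stages = (
    [ 'bare loop'         => \&bare_loop   ],
    [ '+ regex'           => \&with_regex  ],
    [ '+ regex, pushes'   => \&with_pushes ],
    [ '+ regex, pushes, shifts' => \&with_shifts ],
);

my (%seconds, $previous);
for my $stage (@stages) {
    my ($name, $code) = @$stage;
    my $start = time;
    $code->();
    my $took = time - $start;
    $seconds{$name} = $took;

    printf "%-26s %.3f s", $name, $took;
    printf "  (%+.3f s vs previous stage)", $took - $previous
        if defined $previous;
    print "\n";
    $previous = $took;
}
```

A single pass per stage is noisy; for anything beyond a rough impression it is probably worth repeating each stage several times and comparing the best or median time.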

    The explanation is that no matter how little time something takes, if you do it a million times, it adds up.

    In the OP's case, where he must be processing somewhere in the region of 2 or 3 billion records, it adds up big.
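As a rough worked example of "it adds up", the two wall-clock figures from the 1M-record runs above (bare loop and full loop) can be scaled to 2 and 3 billion records. This assumes the per-record cost stays constant and that the OP's machine behaves like the one used for the timings, neither of which is confirmed in this thread; the variable and sub names are made up for the sketch.

```perl
use strict;
use warnings;

# Wall-clock figures taken from the 1M-record runs quoted above.
my $records_timed = 1_000_000;
my $bare_loop_s   = 0.322973012924194;   # bare loop, baseline
my $full_loop_s   = 3.2759997844696;     # regex + both pushes + shifts

# Extra cost per record attributable to the regex, pushes and shifts.
my $extra_per_record = ($full_loop_s - $bare_loop_s) / $records_timed;

# Assumption: the cost scales linearly with the number of records.
sub extra_seconds {
    my ($records) = @_;
    return $records * $extra_per_record;
}

printf "extra cost per record: %.2f microseconds\n", $extra_per_record * 1e6;
for my $records (2e9, 3e9) {
    printf "%.0e records: about %.0f minutes of added processing\n",
        $records, extra_seconds($records) / 60;
}
```

Under those assumptions, roughly 3 microseconds per record turns into something on the order of one and a half to two and a half hours of added processing.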


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      Indeed ...

      I also find it interesting that reading the data does not take that much time compared to the other operations. I wouldn't have expected this, even taking buffering into account.

      -- 
      Ronald Fischer <ynnor@mm.st>

        Quite obviously, BrowserUk routinely processes gigantic datasets in the course of his work day. He is quite the expert on what are, to many of us, edge cases. Upvoted.