Re: Faster push and shift

If reading the file WITHOUT doing significant processing already takes 10 minutes, this must be a tremendously huge file, and maybe you should rethink your algorithm altogether. However, it is not clear from your code what you want to do. You are populating @a and @b, but never use them anywhere.

There is one more thing that puzzles me: usually, input/output time dominates processing time, unless you do a lot of processing per input record. You keep your arrays small (@a and @b never grow larger than 6 elements), so the array operations shouldn't affect the run time significantly. The regexp is also simple enough that I don't think its processing time could be 15 times the time needed to read a record. I would conclude that the time is not being lost in your code, but to be sure, you could insert timer calls in your loop.
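To illustrate what I mean by timer calls, here is a minimal sketch using Time::HiRes. Since the original code is not shown here, the record format, the regexp, and the in-memory test data are all assumptions of mine; only the "small window via push and shift" pattern is taken from the description above.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);    # replaces time() with a sub-second version

# Hypothetical stand-in for the real input: 100_000 short records in memory.
my $data = join '', map { "rec$_ " . ($_ * 7) . "\n" } 1 .. 100_000;
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my (@a, @b);
my %spent = (regex => 0, arrays => 0);
my $start = time;

while (my $line = <$fh>) {
    my $t0 = time;
    my ($name, $value) = $line =~ /^(\S+)\s+(\d+)/;    # assumed record format
    my $t1 = time;

    push @a, $name;
    push @b, $value;
    shift @a if @a > 6;    # keep both windows at 6 elements or fewer
    shift @b if @b > 6;
    my $t2 = time;

    $spent{regex}  += $t1 - $t0;
    $spent{arrays} += $t2 - $t1;
}
close $fh;

my $total = time - $start;
# Whatever is left is reading plus the overhead of the timer calls themselves.
$spent{read} = $total - $spent{regex} - $spent{arrays};

printf "%-6s %.3f s\n", $_, $spent{$_} for qw(read regex arrays);
printf "%-6s %.3f s\n", 'total', $total;
```

Keep in mind that the timer calls are not free, so with very cheap loop bodies the measurement itself can distort the picture. Removing one statement at a time and timing the whole run is a coarser but less intrusive alternative.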

--
Ronald Fischer <ynnor@mm.st>

Re^2: Faster push and shift by BrowserUk (Patriarch) on Feb 16, 2012 at 12:07 UTC
Actually, each of those active lines costs substantially. This is just 1M records matching the OP's data as simply as possible:

```
c:\test>junk91 junk.dat
Bare loop ## baseline
0.322973012924194
0.358 0.078 0 0

c:\test>junk91 junk.dat ## +400%
Add back: regex;
1.40799999237061
1.466 0.046 0 0

c:\test>junk91 junk.dat ## +80%
Add back: regex; first push;
1.67599987983704
1.731 0.046 0 0

c:\test>junk91 junk.dat ## +400%
Add back: regex; first push; second push;
2.94299983978271
2.932 0.124 0 0

c:\test>junk91 junk.dat ## +150%
Add back: regex; first push; second push; shifts
3.2759997844696
3.369 0.015 0 0
```

The explanation is that no matter how little time something takes, if you do it a million times, it adds up. In the OP's case, where he must be processing somewhere in the region of 2 or 3 billion records, it adds up big.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
The start of some sanity?
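As a rough illustration of "it adds up", the per-million timings above can be scaled to the 2 or 3 billion records guessed at in that post. This is only a sketch: it assumes the cost per record stays constant at that scale (no caching, swapping or disk effects), and the `extrapolate` helper is a name made up for this example.

```perl
use strict;
use warnings;

# Elapsed seconds per 1M records, copied from the first and last runs above.
my %per_million = (
    'bare loop'                => 0.322973012924194,
    'regex, pushes and shifts' => 3.2759997844696,
);

# Linear extrapolation: assumes a constant cost per record (an assumption).
sub extrapolate {
    my ($secs_per_million, $records) = @_;
    return $secs_per_million / 1e6 * $records;
}

for my $records (2e9, 3e9) {
    for my $stage (sort keys %per_million) {
        my $secs = extrapolate($per_million{$stage}, $records);
        printf "%-25s %.0e records: %6.0f s (%5.1f min)\n",
            $stage, $records, $secs, $secs / 60;
    }
}
```

Under that linearity assumption, a few microseconds per record turns into hours once the record count reaches the billions.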
Re^3: Faster push and shift by rovf (Priest) on Feb 16, 2012 at 12:29 UTC
Indeed ... And I find it interesting that reading the data does not take that much time compared to the other operations. I wouldn't have expected this, even if we take buffering into account.

--
Ronald Fischer <ynnor@mm.st>
Re^4: Faster push and shift by locked_user sundialsvc4 (Abbot) on Feb 16, 2012 at 13:53 UTC
Quite obviously, BrowserUk very routinely processes gigantic datasets during the course of his work day. He is quite the expert on what are, to many of us, edge cases. Upvoted.