baxy77bax has asked for the wisdom of the Perl Monks concerning the following question:

Hi

Quick question: is there a faster alternative to the push and shift functions in the following code?

my @a = ();
my @b = ();
while (<IN>) {
    chomp;
    $cnt++;
    /\d+\t(\d+)/;
    if (@a < 7) { push(@a, $1); }
    push(@b, $1) if $1 > 3;
    if (@b > 7) { shift(@a); shift(@b); }
    print $a[0] - $b[-1] . "\n";
}
The reason I'm asking is that if I just iterate through file IN like this:
while (<IN>) { $cnt++ }
it takes 10 minutes, but if I complicate it with the two push and shift calls plus the regex, the runtime increases to 150 minutes.

cheers

baxy

Update:

OK, so here are more details, though I think they are irrelevant:

file size: 70GB
file format: tsv

1	34
2	6
3	78

simple while-loop time:
real    10m51.378s
user    9m22.610s
sys     0m13.120s

while loop with regex + push and shift + one subtraction (I forgot to mention that):
real    157m13.245s
user    152m8.520s
sys     1m26.660s

Re: Faster push and shift
by moritz (Cardinal) on Feb 16, 2012 at 10:05 UTC

    I could imagine that the regex match is much slower than the push and shift. Use Devel::NYTProf to find out.

    Also it is wrong to use $1 without checking if the regex succeeded.

    (And no, I don't think there are much faster options. shift and push are appropriately optimized. But if you tell us what your input data looks like, and what output you want, maybe there is a faster way to achieve the same thing with a different algorithm.)
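
    For example, a guarded version of the match might look like the sketch below (the open call and the warning are additions, not the OP's code); Devel::NYTProf is then typically run as perl -d:NYTProf script.pl followed by nytprofhtml:

    #!/usr/bin/perl
    # Sketch only: use $1 only after a successful match.
    use strict;
    use warnings;

    open my $in, '<', $ARGV[0] or die "open: $!";

    my ( @a, @b );
    while ( my $line = <$in> ) {
        if ( $line =~ /\d+\t(\d+)/ ) {    # capture succeeded, $1 is safe
            my $val = $1;
            push @a, $val if @a < 7;
            push @b, $val if $val > 3;
            if ( @b > 7 ) { shift @a; shift @b; }
        }
        else {
            warn "skipping unparsable line: $line";
        }
    }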

Re: Faster push and shift
by BrowserUk (Patriarch) on Feb 16, 2012 at 12:43 UTC

    Try this. It should complete much more quickly unless all the required records are at the beginning of the file:

    #! perl -slw
    use strict;
    use Data::Dump qw[ pp ];
    use Time::HiRes qw[ time ];
    use File::ReadBackwards;

    tie *FH, 'File::ReadBackwards', $ARGV[0] or die $!;

    my $start = time;
    my( $last, @a, @b );

    while( <FH> ) {
        /\d+\t(\d+)/;
        if( $1 > 3 ) {
            unshift @b, $1;
            unshift @a, $last;
            last if @b == 7;
        }
        $last = $1;
    }

    print time() - $start;
    print join ' ', times;
    pp \@a, \@b;

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      I don't think your solution is equivalent to the original program. Your idea is to stop reading the file as soon as @b "has enough elements", i.e. you are interested in the first set of occurrences of "suitable elements" in the file.

      The OP, however, throws out elements from @b (he is basically treating @b as a queue, pushing in at one end and shifting off the other), so he is interested in the last set of occurrences. That's why he always has to process the whole file.
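
      To illustrate that queue behaviour with a tiny self-contained sketch (made-up numbers, not the OP's data): after the whole stream has been consumed, @b holds only the last seven qualifying values, which is why a forward pass cannot stop early.

      use strict;
      use warnings;

      my @b;
      for my $val ( 1 .. 20 ) {        # stand-in for the second column of each record
          push @b, $val if $val > 3;   # same filter as the OP
          shift @b if @b > 7;          # keep only the most recent seven
      }
      print "@b\n";                    # prints: 14 15 16 17 18 19 20 -- the *last* matches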

      -- 
      Ronald Fischer <ynnor@mm.st>
        That's why: use File::ReadBackwards;

      What's the difference here, BrowserUK? (Really, that's quite the sincere question.) Kindly enlighten us all: what is the key change from the OP that will make it faster, and how much faster do you predict it now will be? ... And under what governing assumptions?

        how much faster do you predict it now will be?

        Doing it the OP's way on 1 million records takes 3.3 seconds:

        c:\test>junk91 junk.dat
        3.25699996948242
        ([1, 3, 4, 4, 1, 3, 2], [4, 4, 4, 4, 4, 4, 4])

        Same file, doing it my way, produces the same results in half a millisecond:

        c:\test>junk91-2 junk.dat
        0.000479936599731445
        ([1, 3, 4, 4, 1, 3, 2], [4, 4, 4, 4, 4, 4, 4])

        I make that 6,600 times faster. The OP's mileage may vary.



Re: Faster push and shift
by rovf (Priest) on Feb 16, 2012 at 11:24 UTC
    If reading the file WITHOUT doing any significant processing already takes 10 minutes, this must be a tremendously huge file, and maybe you should rethink your algorithm altogether. However, from your code it is not clear what you are trying to do: you populate @a and @b, but never use them anywhere.

    There is one more thing which puzzles me: usually, input/output time dominates processing time, unless you do a lot of processing per input record. You keep your arrays small (@a and @b never grow beyond seven elements or so), so the operations on the arrays shouldn't impact the run-time significantly. The regexp is also simple enough that I don't think matching it could take 15 times as long as reading a record. I would conclude that the processing time is not lost in your code, but to be sure, you could insert timer calls in your loop, for example as in the sketch below.
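
    One way to do that (a sketch only, using Time::HiRes; the split into a "regex" bucket and an "array ops" bucket, and the variable names, are my own choices) might be:

    use strict;
    use warnings;
    use Time::HiRes qw( time );

    open my $in, '<', $ARGV[0] or die "open: $!";

    my ( $t_regex, $t_arrays ) = ( 0, 0 );
    my ( @a, @b );
    while ( my $line = <$in> ) {
        my $t0 = time;
        my ($val) = $line =~ /\d+\t(\d+)/;
        my $t1 = time;
        $t_regex += $t1 - $t0;

        next unless defined $val;
        push @a, $val if @a < 7;
        push @b, $val if $val > 3;
        if ( @b > 7 ) { shift @a; shift @b; }
        $t_arrays += time() - $t1;
    }
    printf "regex: %.2fs  array ops: %.2fs\n", $t_regex, $t_arrays;
    # The timer calls themselves cost something, so over billions of records
    # you would only run this on a sample of the file.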


      Actually, each of those active lines costs substantially. This is just 1M records, matching the OP's data, processed as simply as possible:

      c:\test>junk91 junk.dat
      Bare loop                                           ## baseline
      0.322973012924194
      0.358 0.078 0 0

      c:\test>junk91 junk.dat                             ## +400%
      Add back: regex;
      1.40799999237061
      1.466 0.046 0 0

      c:\test>junk91 junk.dat                             ## +80%
      Add back: regex; first push;
      1.67599987983704
      1.731 0.046 0 0

      c:\test>junk91 junk.dat                             ## +400%
      Add back: regex; first push; second push;
      2.94299983978271
      2.932 0.124 0 0

      c:\test>junk91 junk.dat                             ## +150%
      Add back: regex; first push; second push; shifts
      3.2759997844696
      3.369 0.015 0 0

      The explanation is that no matter how little time something takes, if you do it a million times, it adds up.

      In the OP's case, where he must be processing somewhere in the region of 2 or 3 billion records, it adds up big.
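
      (junk91 itself isn't posted in the thread; purely as an illustration, a rough reconstruction of that kind of measurement, commenting the loop body back in one line at a time and printing the Time::HiRes elapsed time plus times, could look like this:)

      use strict;
      use warnings;
      use Time::HiRes qw( time );

      open my $in, '<', $ARGV[0] or die "open: $!";

      my $start = time;
      my ( @a, @b );
      while ( <$in> ) {
          # Comment these back in one at a time to see what each adds:
          /\d+\t(\d+)/ or next;
          push @a, $1 if @a < 7;
          push @b, $1 if $1 > 3;
          if ( @b > 7 ) { shift @a; shift @b; }
      }
      print time() - $start, "\n";          # wall-clock seconds
      print join( ' ', times ), "\n";       # user/sys CPU, as in the output above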



        Indeed ...

        And I find it interesting that reading the data does not take that much time compared to the other operations. I wouldn't have expected this, even taking buffering into account.

Re: Faster push and shift
by RichardK (Parson) on Feb 16, 2012 at 13:05 UTC

    You could drop the chomp, as you don't need it; your regex doesn't care about the line ending. As BrowserUk points out, doing anything millions of times adds up, so dropping it might reduce your runtime a little.

    Also, as you don't use the first value, you might be able to write your regex as just /\t(\d+)/, assuming that your file is completely regular and doesn't have any odd lines or comments in it ;)
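
    Putting those two suggestions together, the trimmed loop might look something like this (a sketch only, assuming every line really is two tab-separated integers; the guard before the print is an addition, since the OP prints unconditionally):

    use strict;
    use warnings;

    open my $in, '<', $ARGV[0] or die "open: $!";

    my ( @a, @b );
    while ( <$in> ) {
        /\t(\d+)/ or next;           # no chomp needed; \d+ stops before the newline
        push @a, $1 if @a < 7;
        push @b, $1 if $1 > 3;
        if ( @b > 7 ) { shift @a; shift @b; }
        print $a[0] - $b[-1], "\n" if @a && @b;   # guard added to avoid undef warnings
    }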