swissknife has asked for the wisdom of the Perl Monks concerning the following question:

Hello Dear Monks

In my script I am running a grep against each element of an array. The array now contains around 8000 elements, and processing each element takes so long that the script times out on the remote system from which I execute it. I am wondering if there is a way to improve the performance. Below is the snippet of the code in question:

my $path = "/tmp/testpatch";
opendir DIR, $path or die $!;
my @tempfiles = readdir DIR; # this array has the list of all the files, previously processed and newly added (8000 files)
closedir DIR;
foreach my $strfile (@tempfiles) {
    if (!grep /$strfile/, @arraytocompare) # this array has the list of files which were processed in the past
    {
        push (@newarray, $strfile); # this array collects all the new files which I need to process now
    }
}

Is there a faster grep, or will I have to live with it?

Replies are listed 'Best First'.
Re: improve performance
by Corion (Patriarch) on Jun 08, 2015 at 09:45 UTC

    Whenever you want a fast lookup, try to use a hash. This will only work if you want exact matches on whole elements rather than substring matches. Also note that if $strfile contains regex metacharacters (like *, ., + or []), your code will not work as you might expect.

    Using a hash would result in:

    my %lookup = map { $_ => 1 } @arraytocompare;
    foreach my $strfile (@tempfiles) {
        if ( $lookup{ $strfile } ) {
            push (@newarray, $strfile);
        }
    }

    But I really, really doubt that searching through 8000 array entries will slow your program down that much. Are you certain that this is where your performance bottleneck is?

      Thanks Corion. I added a few prints and executed the script using the -d option, which clearly shows that this is where the performance bottleneck is.

      You said that if $strfile contains regex metacharacters my code will not work as I might have thought. The file names have a "." before the extension. Does that caveat still apply if I use a hash?

        You may want to take a look at Devel::NYTProf for profiling your code.

      I updated the code with your suggestion, which is really faster, but it does not give the same result as the grep: @newarray comes out empty when it should not be. Did you consider the NOT operator (!) in my original code?

        No, I did not consider it, but you can consider it in your code.
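        For completeness, a minimal sketch (using hypothetical file names) of the hash lookup with the negation from the original code restored:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the arrays in the thread
my @arraytocompare = qw(old1.txt old2.txt);                    # previously processed
my @tempfiles      = qw(old1.txt old2.txt new1.txt new2.txt);  # current directory listing

# Build the lookup hash once, then keep only files NOT already processed
my %lookup   = map { $_ => 1 } @arraytocompare;
my @newarray = grep { !$lookup{$_} } @tempfiles;

print "@newarray\n";    # new1.txt new2.txt
```

        The hash is built once, and each membership test is constant time, so the whole filter is linear in the number of files instead of quadratic.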

Re: improve performance
by pme (Monsignor) on Jun 08, 2015 at 09:53 UTC
    You can transform @arraytocompare into a hash as you can see below. Hash key lookup is very efficient.
    my %hashtocompare;
    $hashtocompare{$_}++ for (@arraytocompare);
    foreach my $strfile (@tempfiles) {
        push @newarray, $strfile unless exists $hashtocompare{$strfile};
    }
Re: improve performance
by GotToBTru (Prior) on Jun 08, 2015 at 16:34 UTC

    Consider if you can use the none function from List::Util for operations like (! grep ). grep will check the entire list to find all the values that match, but in this case, once you have found a match you might as well stop. Some of the List::Util functions, like first or any or none, will short circuit.

    In this example, the variable $j shows how many times the loop body is executed.

    use strict;
    use warnings;
    use List::Util qw(none);

    my @list = 1..100;
    my $j = 0;

    if (! grep { $j++; $_ > 10 } @list) {
        print 'Didn\'t find any values above 10 ... ';
    } else {
        print 'Found some values above 10 ... ';
    }
    print "but I had to look at $j values to be sure.\n";

    $j = 0;
    if (none { $j++; $_ > 10 } @list) {
        print 'Didn\'t find any values above 10 ... ';
    } else {
        print 'Found some values above 10 ... ';
    }
    print "but I had to look at $j values to be sure.\n";

    Output:

    Found some values above 10 ... but I had to look at 100 values to be sure.
    Found some values above 10 ... but I had to look at 11 values to be sure.

    Update: forgot to copy the code that actually produces that output!

    Dum Spiro Spero
Re: improve performance
by Anonymous Monk on Jun 08, 2015 at 09:46 UTC

    I suspect that matching $strfile as a regex is not the best way to go about this, since it sounds like you are looking for exact matches, e.g. grep {$_ eq $strfile} @arraytocompare

    If you know the filenames are unique (usually a fairly safe bet), you can use a hash instead of @arraytocompare, or convert it to one with something like my %tocompare = map {$_=>1} @arraytocompare;, and then test against that via if (!$tocompare{$strfile}) ....
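    To see why the eq comparison matters, here is a small sketch (with made-up file names) of how an interpolated regex can produce false positives, both via the "." metacharacter and via unanchored substring matches:

```perl
use strict;
use warnings;

my @arraytocompare = ('dataXtxt', 'mydata.txt');   # made-up names
my $strfile        = 'data.txt';

# Regex match: '.' matches any character, and the pattern is unanchored,
# so it "finds" both elements even though neither equals $strfile
my $regex_hits = grep { /$strfile/ } @arraytocompare;        # 2 false positives

# Exact string comparison finds nothing, which is correct here
my $exact_hits = grep { $_ eq $strfile } @arraytocompare;    # 0

print "regex: $regex_hits, exact: $exact_hits\n";   # regex: 2, exact: 0
```

    Quoting the pattern with \Q...\E (quotemeta) fixes the metacharacter problem but not the substring problem; eq, or a hash lookup, avoids both.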

Re: improve performance
by marioroy (Prior) on Jun 12, 2015 at 13:23 UTC

    Update: Added simulation.

    Update: Changed to ! exists ...

    Looping and grep'ing for each temp file is likely expensive. This populates @newarray with new files only.

    # this hash has the list (keys) of all the files previously processed
    my %processed = map { $_ => 1 } @arraytocompare;

    my $path = "/tmp/testpatch";
    opendir DIR, $path or die $!;

    # this array gets all the new files which I need to process now
    my @newarray = map { ! exists $processed{$_} ? $_ : () } readdir DIR;

    closedir DIR;

    The above is simulated below and outputs 8000 taking just a fraction of a second to complete.

    # this hash has the list (keys) of all the files previously processed
    my %processed = map { $_ => 1 } 100001 .. 108000;

    # this array gets all the new files which I need to process now
    my @newarray = map { ! exists $processed{$_} ? $_ : () } 108001 .. 116000;

    print scalar @newarray, "\n";
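    The same filter can also be written with grep, which some find more readable than map with a ternary; a stylistic sketch using the same numbers as the simulation above:

```perl
use strict;
use warnings;

# hash of previously processed "files" (same simulated data as above)
my %processed = map { $_ => 1 } 100001 .. 108000;

# grep keeps only the elements for which the block is true
my @newarray = grep { !exists $processed{$_} } 108001 .. 116000;

print scalar @newarray, "\n";    # 8000
```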