Which one is best for optimization

luckypower has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Which one is best for optimization by moritz (Cardinal) on Sep 29, 2008 at 09:20 UTC
Why don't you just try it? Such questions usually can't be answered in general, because the performance characteristics depend on the nature and size of your data. So use Benchmark and Devel::Size (or similar modules) to find out what's best for you. (Update: with `cmp` you compare with string semantics, are you sure you want that?)	[reply] [d/l]
Re: Which one is best for optimization by grinder (Bishop) on Sep 29, 2008 at 09:42 UTC
Solution 1 requires only an iterator to keep track of where you are in the hash. You will visit each entry once, and once only. Solution 2 requires sorting the list, so at a minimum you are going to visit each entry anyway, and the sort algorithm that takes O(n) time in the general case has yet to be invented. This will take more time, no matter what, and worse, it may potentially all for naught if there are no files that are larger than your limit. The only sane approach is solution 1. • another intruder with the mooring in the heart of the Perl	[reply]
Re^2: Which one is best for optimization by salva (Canon) on Sep 29, 2008 at 09:50 UTC
and the sort algorithm that takes O(n) time in the general case has yet to be invented Well, but for this particular case, sorting numbers, it has already been invented!, it is called radix sort and there are at least two Perl implementations available from CPAN: Sort::Radix (pure perl) and Sort::Key::Radix (XS).	[reply]
Re^3: Which one is best for optimization by grinder (Bishop) on Sep 29, 2008 at 15:34 UTC
The whole point is that you don't need to sort! It's a net loss whichever way you cut it. • another intruder with the mooring in the heart of the Perl	[reply]
Re: Which one is best for optimization by ysth (Canon) on Sep 29, 2008 at 09:48 UTC
When you say "Delete the file", do you mean actually delete a file, or delete the file from your hash? If the former, use unlink, not delete. In Perl, "break" is called last. But it doesn't seem like you'd want to break out of your foreach loop. -- Online Fortune Cookie Search Office Space merchandise	[reply]
Re: Which one is best for optimization by salva (Canon) on Sep 29, 2008 at 09:41 UTC
Sorting is an expensive operation with O(NlogN) complexity while iterating over the data is just O(N). If you were sorting without a custom comparison block, the sort solution could be faster for small data sets, but as this is not the case... and anyway, you can `grep`: `for my $file (grep { $hash{$_}{Size} > $sizeLimit } @path) { delete_file($file); }` [download]	[reply] [d/l] [select]
Re: Which one is best for optimization by mscharrer (Hermit) on Sep 29, 2008 at 09:45 UTC
I would try to avoid self written loops and use Perl builtins like ~~`map`~~ `grep` for it if has to be efficient: (untested) `#my @files2del = map { ($hash{$_}->{Size} > $sizelimit ? $_ : () } key +s %hash; my @files2del = grep { $hash{$_}->{Size} > $sizelimit } keys %hash; delete @hash{@files2del}; # deletes paths from hash unlink @files2del; # deletes files from disc` [download] Also: Like moritz already pointed out: `cmp` isn't what you want but `<=>`. Keep your hash key names all lowercase as long you don't have a good reason to do otherwise. Update: Like salsa wrote (while I wrote my post): the use of `grep` instead of `map` is better in this case. See perlfunc for both. Update 2: Thanks moritz, I always appreciate all feedback. In this case you were just replying faster than I was updating. :-)	[reply] [d/l] [select]
Re^2: Which one is best for optimization by moritz (Cardinal) on Sep 29, 2008 at 09:48 UTC
`my @files2del = map { ($hash{$_}->{Size} > $sizelimit ? $_ : () } keys + %hash;` [download] The perlish way to write such a map involves grep ;-) `my @files2del = grep { ($hash{$_}->{Size} > $sizelimit } keys %hash;`	[reply] [d/l] [select]
Re: Which one is best for optimization by Anonymous Monk on Sep 29, 2008 at 09:48 UTC
See Perl.com: Optimizing Your Perl. "Best" can vary by input, easiest to just test it, but 2 definitively uses more memory :)	[reply]
Re: Which one is best for optimization by indragoby (Initiate) on Sep 29, 2008 at 09:33 UTC
Hi, I hope that you could make use of the module Benchmark. which helps you for the same.	[reply]
Re: Which one is best for optimization by Anonymous Monk on Sep 29, 2008 at 09:40 UTC
$ perldoc -f break No documentation for perl function `break' found $ perldoc -q break No documentation for perl FAQ keyword `break' found [download]	[reply] [d/l]