luckypower has asked for the wisdom of the Perl Monks concerning the following question:

hellow,

i want to optimize my perl script.

in my script i have one hash of hashes
%hash{$path}= { Date => $date, Size => $size };

now i want to delete files($path) which size > sizelimit.

solution 1)
-traverse the hash
-compare the size with sizelimit
-Delete the file

solution 2)
-get the array of keys by sorting the hash accorting to size.
-put one condition and break the loop.
e.g
my @sort = sort {$hash{$a}->{"Size"} cmp $hash{$b}->{"Size"}} keys %ha +sh; for each $file (@path) { if ($hash{$file}->{"Size"} > $sizelimit) { delete $file } else { break; } }
so which one is best optimizaion(both in memory & speed)??

Replies are listed 'Best First'.
Re: Which one is best for optimization
by moritz (Cardinal) on Sep 29, 2008 at 09:20 UTC
    Why don't you just try it? Such questions usually can't be answered in general, because the performance characteristics depend on the nature and size of your data.

    So use Benchmark and Devel::Size (or similar modules) to find out what's best for you.

    (Update: with cmp you compare with string semantics, are you sure you want that?)

Re: Which one is best for optimization
by grinder (Bishop) on Sep 29, 2008 at 09:42 UTC

    Solution 1 requires only an iterator to keep track of where you are in the hash. You will visit each entry once, and once only.

    Solution 2 requires sorting the list, so at a minimum you are going to visit each entry anyway, and the sort algorithm that takes O(n) time in the general case has yet to be invented. This will take more time, no matter what, and worse, it may potentially all for naught if there are no files that are larger than your limit.

    The only sane approach is solution 1.

    • another intruder with the mooring in the heart of the Perl

      and the sort algorithm that takes O(n) time in the general case has yet to be invented
      Well, but for this particular case, sorting numbers, it has already been invented!, it is called radix sort and there are at least two Perl implementations available from CPAN: Sort::Radix (pure perl) and Sort::Key::Radix (XS).

        The whole point is that you don't need to sort! It's a net loss whichever way you cut it.

        • another intruder with the mooring in the heart of the Perl

Re: Which one is best for optimization
by ysth (Canon) on Sep 29, 2008 at 09:48 UTC
Re: Which one is best for optimization
by salva (Canon) on Sep 29, 2008 at 09:41 UTC
    Sorting is an expensive operation with O(NlogN) complexity while iterating over the data is just O(N).

    If you were sorting without a custom comparison block, the sort solution could be faster for small data sets, but as this is not the case... and anyway, you can grep:

    for my $file (grep { $hash{$_}{Size} > $sizeLimit } @path) { delete_file($file); }
Re: Which one is best for optimization
by mscharrer (Hermit) on Sep 29, 2008 at 09:45 UTC
    I would try to avoid self written loops and use Perl builtins like map grep for it if has to be efficient:

    (untested)

    #my @files2del = map { ($hash{$_}->{Size} > $sizelimit ? $_ : () } key +s %hash; my @files2del = grep { $hash{$_}->{Size} > $sizelimit } keys %hash; delete @hash{@files2del}; # deletes paths from hash unlink @files2del; # deletes files from disc
    Also:
    Like moritz already pointed out: cmp isn't what you want but <=>.
    Keep your hash key names all lowercase as long you don't have a good reason to do otherwise.

    Update: Like salsa wrote (while I wrote my post): the use of grep instead of map is better in this case. See perlfunc for both.
    Update 2: Thanks moritz, I always appreciate all feedback. In this case you were just replying faster than I was updating. :-)

      my @files2del = map { ($hash{$_}->{Size} > $sizelimit ? $_ : () } keys + %hash;

      The perlish way to write such a map involves grep ;-) my @files2del = grep { ($hash{$_}->{Size} > $sizelimit  } keys %hash;

Re: Which one is best for optimization
by Anonymous Monk on Sep 29, 2008 at 09:48 UTC
Re: Which one is best for optimization
by indragoby (Initiate) on Sep 29, 2008 at 09:33 UTC
    Hi,

    I hope that you could make use of the module Benchmark. which helps you for the same.
Re: Which one is best for optimization
by Anonymous Monk on Sep 29, 2008 at 09:40 UTC
    $ perldoc -f break No documentation for perl function `break' found $ perldoc -q break No documentation for perl FAQ keyword `break' found