Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am sorting an array of file names using Perl's built-in sort function:

opendir( DIR, '.' ) || die "Can't open '.': $!";
my @filenames = readdir( DIR );
closedir( DIR );
@filenames = sort { $a cmp $b } @filenames;

I want to optimize this piece of code. Is there some way I can do this?

Code tags added by Arunbear

Replies are listed 'Best First'.
Re: how to do effective sort in perl
by Fletch (Bishop) on May 03, 2006 at 13:28 UTC

    Well, aside from calling my @filenames = sort readdir( DIR ) in one step (by default sort effectively uses the comparator you've shown, and combining the calls copies the results around one less time), what more do you think could be optimized? It doesn't get much more bare-bones than that. If you're not interested in every filename that readdir is going to return, you might stick a grep in between (see the sketch below), but there's not really much more to be pared down.
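
    For instance, a minimal sketch combining both suggestions; the dot-file filter in the grep is just an illustrative assumption, not something the original poster asked for:

    opendir( DIR, '.' ) || die "Can't opendir '.': $!";
    # Sort and filter in one pass; the grep drops '.', '..' and other dotfiles.
    my @filenames = sort grep { !/^\./ } readdir( DIR );
    closedir( DIR );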

Re: how to do effective sort in perl
by Polonius (Friar) on May 03, 2006 at 15:15 UTC

    If it's an efficient sort algorithm you're after, you should note that "In 5.7, the quicksort implementation was replaced with a stable mergesort algorithm whose worst-case behavior is O(NlogN)." (perldoc). According to Numerical Recipes, "For the basic task of sorting N elements, the best algorithms require on the order of several times N log2 N operations." That means that mergesort is just about as efficient as you can get.
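
    If you ever need to pick the algorithm explicitly, the sort pragma (available since Perl 5.8) lets you request the stable mergesort or the old quicksort; a small sketch, purely illustrative:

    use sort 'stable';        # insist on stable behaviour
    # use sort '_quicksort';  # or ask for the pre-5.8 quicksort (the leading
    #                         # underscore marks it as subject to change)
    my @sorted = sort @filenames;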

    Incidentally, Numerical Recipes is an astonishingly good book. Knuth may be the definitive reference, and NAG may have many algorithms fine-tuned to specific cases, but NR is exceptionally good value for money. You can even download free source code in FORTRAN or C. If you're new to Perl, I'd see the lack of a Perl version as a good thing: I think it's a useful exercise in learning a language to try translating into it.

    Update: Of course, all of this is overkill for the number of items likely to be found in any sensible directory!

    Polonius
      Of course, all of this is overkill for the number of items likely to be found in any sensible directory!

      heh. I've actually had to scold some of my co-workers, telling them they really should not be putting 300,000 files in a single directory, and there are easy enough ways to avoid this... Even 50,000 seems like too much to me (but then, I learned programming on a pdp 11-10, and my sense of scale may be a bit old-fashioned).
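
      One of those easy ways, sketched here as an illustration (the two-level bucket layout and the md5_hex choice are just assumptions, not anything from this thread):

      use Digest::MD5 qw(md5_hex);
      use File::Path  qw(mkpath);
      use File::Copy  qw(move);

      # Spread files across subdirectories keyed on the first two hex digits
      # of a digest of the name, so that no single directory grows too large.
      sub bucketed_path {
          my ($file) = @_;
          my $bucket = substr( md5_hex($file), 0, 2 );
          mkpath($bucket) unless -d $bucket;
          return "$bucket/$file";
      }

      move( $_, bucketed_path($_) ) for @filenames;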

        heh. I've actually had to scold some of my co-workers, telling them they really should not be putting 300,000 files in a single directory, and there are easy enough ways to avoid this...

        Anything more than a screenful seems excessive to me! But then, I grew up on a Data General S-250, IIRC (Think PDP 11-70 - that sorta era, but less successful.)

        Polonius

        Been a while since I have referred anyone to this, but it is a pretty good introductory discussion <shamelessplug>(and response)</shamelessplug> on why some directory structures slow down after a certain number of documents are included. Not necessarily useful for those Old System Administrators¹.

        1 - OSA's never die, they just put down roots.

        --MidLifeXis

Re: how to do effective sort in perl
by BrowserUk (Patriarch) on May 03, 2006 at 14:51 UTC

    On my OS I'd do

    my @files = glob '*';

    as glob always produces a sorted list on my system. I'm not sure whether that is true on other systems.
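
    If you want that sorting behaviour to be explicit (or switched off), the core File::Glob module exposes it; a small sketch, treating the details as illustrative rather than definitive:

    use File::Glob qw(bsd_glob GLOB_NOSORT);

    my @sorted   = bsd_glob('*');                # sorted, like plain glob
    my @unsorted = bsd_glob('*', GLOB_NOSORT);   # skip the sort entirely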


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      On some OSs that doesn't include hidden files:

      $ perl -le 'opendir my $fd, "."; my @files = readdir $fd; closedir $fd; print scalar @files'
      48
      $ perl -le 'my @files = <*>; print scalar @files'
      7
      $ _

      But including them is not difficult:

      $ perl -le 'my @files = (<*>,<.*>); print scalar @files'
      48
      $ _

      It seems that readdir is faster, though:

      use Benchmark qw(:all);
      cmpthese(20000, {
          readdir => sub { opendir my $fd, "."; my @files = readdir $fd; closedir $fd; },
          glob    => sub { my @files = (<*>,<.*>); },
      });
      __END__
                   Rate    glob readdir
      glob       3170/s      --    -75%
      readdir   12739/s    302%      --

      Update: Of course glob is slower, since on this computer it also sorts the results.
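
      For a fairer comparison with the original poster's task, the readdir case should sort too; a sketch of that variation (the iteration count is just carried over from above, and the numbers will of course differ per machine):

      use Benchmark qw(:all);
      cmpthese(20000, {
          readdir_sorted => sub {
              opendir my $fd, ".";
              my @files = sort readdir $fd;   # add the sort the OP wants
              closedir $fd;
          },
          glob => sub { my @files = (<*>, <.*>); },   # glob sorts internally
      });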

      --
      David Serrano

Re: how to do effective sort in perl
by blazar (Canon) on May 03, 2006 at 13:37 UTC

    The { $a cmp $b } block is already optimized away: sort with no explicit comparison does exactly the same as what you've shown, so just remove the block (as in the line below). Oh, and BTW: you probably already knew, but "premature optimization is the root of all evil".
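
    In other words, the last line of your snippet can simply become:

    @filenames = sort @filenames;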

Re: how to do effective sort in perl
by duff (Parson) on May 03, 2006 at 14:44 UTC

    You didn't say what you wanted to optimize, so here's another take on it:

    my @filenames = do { opendir my $d, "." or die; sort readdir $d };
    Note that this only works on modern perls (the lexical directory handle used here isn't available before 5.6).
Re: how to do effective sort in perl
by QM (Parson) on May 03, 2006 at 19:35 UTC
    As others have alluded, there are several paths to optimization.

    Do you want it to run faster?

    Do you want to use less code?

    Do you want it to be more readable, maintainable, portable, or more robust?

    glob or similar will be less code. What you've given (other than the lack of code tags) is almost as good as you can do for balancing all of the criteria above.

    Incidentally, IIRC Perl has builtin optimizations for the following sort subs:

    { $a cmp $b }    # default
    { $b cmp $a }
    { $a <=> $b }
    { $b <=> $a }
    These are the most common; perl recognizes these blocks, so using any one of them should be indistinguishable (speed-wise) from the default no-sort-sub case.
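
    A quick way to convince yourself, reusing the Benchmark idiom from above (the iteration target and the sample list are arbitrary choices for illustration):

    use Benchmark qw(:all);
    my @names = map { "file$_" } 1 .. 500;
    cmpthese(-2, {
        plain   => sub { my @s = sort @names },
        block   => sub { my @s = sort { $a cmp $b } @names },   # recognized and optimized
        reverse => sub { my @s = sort { $b cmp $a } @names },   # also recognized
    });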

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of

Re: how to do effective sort in perl
by smokemachine (Hermit) on May 03, 2006 at 20:59 UTC
    perl -e '@filenames = sort <*>'