in reply to List non-matching files

You might want to populate the hash via:

@f{@ARGV}= (1)x@ARGV;

especially on Perls older than 5.6.1 (which includes tilly's patch), because the map version can be noticeably slow when excluding a large number of files.
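As a rough sketch of how the slice fits into an exclusion filter: the file names and the @exclude / @candidates arrays below are made up for illustration (@exclude stands in for @ARGV), so treat this as one possible arrangement rather than the code from the original node.

```perl
use strict;
use warnings;

# Hypothetical data: names to exclude (standing in for @ARGV) and a
# candidate list (standing in for a directory listing).
my @exclude    = qw(a.html b.html);
my @candidates = qw(a.html b.html c.html d.html);

# Hash slice: assign a list of 1s, one per key, in a single statement.
my %f;
@f{@exclude} = (1) x @exclude;

# Keep only the candidates that are not in the exclusion hash.
my @kept = grep { !$f{$_} } @candidates;

print "@kept\n";    # prints "c.html d.html"
```

The slice never builds an intermediate key/value list; it only needs a list of 1s as long as the key list.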

        - tye (but my friends call me "Tye")

RE (tilly) 2: List non-matching files
by tilly (Archbishop) on Aug 19, 2000 at 01:12 UTC
    Well since you bring it up... :-)

    tilly's patch indeed. One. Very definitely only. But I do have a contribution to the core that apparently will get in over serious (and legitimate) aesthetic complaints by Sarathy:

    --- pp_ctl.c.bak Tue Mar 21 00:42:26 2000
    +++ pp_ctl.c Tue Apr 25 04:03:36 2000
    @@ -736,6 +736,8 @@
         if (diff > PL_markstack_ptr[-1] - PL_markstack_ptr[-2]) {
             shift = diff - (PL_markstack_ptr[-1] - PL_markstack_ptr[-2]);
             count = (SP - PL_stack_base) - PL_markstack_ptr[-1] + 2;
    +        if (shift < count)
    +            shift = count; /* Avoid shifting too often */
             EXTEND(SP,shift);
             src = SP;
    Not much, eh? But this has the effect of making a 1-2 map of n things (one that returns two elements for each input element) go from being O(n^2) to O(n). Specifically, without the patch it is very slow to do this:
    %hash = map {$_, 1} 1..100000;
    and after the patch this is fast. :-)

    Any 1-1 map (eg Schwartzian sort) is fast both before and after. It only matters when you get back more elements than you put in.
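To make the 1-1 versus 1-2 distinction concrete, here is a small sketch. The word list is invented for illustration; the first pipeline is an ordinary Schwartzian transform, and the second is the kind of map that returns two items per input.

```perl
use strict;
use warnings;

my @words = qw(pear fig banana kiwi);

# One-to-one: each map stage returns exactly one item per input
# (a Schwartzian transform that sorts by length, then by name).
my @sorted = map  { $_->[1] }
             sort { $a->[0] <=> $b->[0] or $a->[1] cmp $b->[1] }
             map  { [ length($_), $_ ] } @words;

# One-to-two: each input yields a key and a value, so the result
# list is twice as long as the input.
my @pairs = map { ($_, 1) } @words;
my %seen  = @pairs;

print "@sorted\n";             # prints "fig kiwi pear banana"
print scalar(@pairs), "\n";    # prints "8"
```

Only the second map returns more elements than it was given, which is the case the patch speeds up.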

RE: RE: List non-matching files
by fundflow (Chaplain) on Aug 19, 2000 at 00:17 UTC
    Can you explain this?

    Is for(@ARGV) { $f{$_}=1 }; better?
    (it has the advantage of being clearer for non-perl programmers)

      IMHO, yes. The for version doesn't build a (possibly huge) list of the form ("x",1,"y",1,...) and then populate the hash; it just populates the hash. I suspect my "slice" version is the fastest... and:

      use Benchmark;
      @arr=('file.html')x1024;
      timethese( -3, {
          'Slice' => '%x=();@x{@arr}=(1)x@arr',
          'For' => '%x=();for(@arr){$x{$_}=1}',
          'Map' => '%x=map{$_=>1}@arr'
      } );
      __END__
      Benchmark: running For, Map, Slice, each for at least 3 CPU seconds...
             For:  3 secs (3.13 usr + 0.00 sys = 3.13 CPU) @ 707.72/s (n=2218)
             Map:  3 secs (3.14 usr + 0.00 sys = 3.14 CPU) @ 96.98/s (n=305)
           Slice:  4 secs (3.14 usr + 0.00 sys = 3.14 CPU) @ 1074.72/s (n=3380)

      ...it is (for at least one case). So "slice" is 50% faster than "for" which is much faster than "map" (with tilly's patch, "map" will probably still be slower than the others, but not nearly that slow).

              - tye (but my friends call me "Tye")
        Cool, code was updated.

        BTW, I suspected that your benchmark was wrong, since you used the same key (file.html) for every element of the hash. A quick check showed that it doesn't matter much. Interesting.
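For anyone who wants to repeat that check, a minimal sketch of the same comparison with distinct keys follows. The generated file names, the hash names, and the iteration count of 200 are arbitrary assumptions. It uses code references and a positive count instead of the string snippets and -3 from the benchmark above. No timings are shown because they depend on the machine and the Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Hypothetical input: 1024 distinct file names rather than one repeated name.
my @arr = map( "file$_.html", 1 .. 1024 );

my ( %by_slice, %by_for, %by_map );

# A fixed, positive iteration count; 200 is an arbitrary choice.
my $results = timethese( 200, {
    Slice => sub { %by_slice = (); @by_slice{@arr} = (1) x @arr },
    For   => sub { %by_for = (); $by_for{$_} = 1 for @arr },
    Map   => sub { %by_map = map { $_ => 1 } @arr },
} );
```

With distinct keys each hash really ends up holding 1024 entries, so the insert cost is closer to what a real file list would incur.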

        This is your code with @arr replaced with @ARGV. The directory has 336 files. (Also, -3 didn't work for me here)
        >./benchnom *
        Benchmark: timing 5000 iterations of For, Map, Slice...
               For: 10 secs ( 9.94 usr  0.00 sys =  9.94 cpu)
               Map: 21 secs (21.26 usr  0.00 sys = 21.26 cpu)
             Slice:  8 secs ( 7.27 usr  0.00 sys =  7.27 cpu)
        38.57u 0.05s 0:38.88 99.3%
        > ls -l | wc -l
        336
        >perl -v
        This is perl, version 5.004_04 built for sun4-solaris
        Results were similar with perl 5.5.