RE: List non-matching files

RE (tilly) 2: List non-matching files by tilly (Archbishop) on Aug 19, 2000 at 01:12 UTC
Well since you bring it up... :-) tilly's patch indeed. One. Very definitely only. But I do have a contribution to the core that apparently will get in over serious (and legitimate) aesthetic complaints by Sarathy:

```
--- pp_ctl.c.bak Tue Mar 21 00:42:26 2000
+++ pp_ctl.c Tue Apr 25 04:03:36 2000
@@ -736,6 +736,8 @@
     if (diff > PL_markstack_ptr[-1] - PL_markstack_ptr[-2]) {
         shift = diff - (PL_markstack_ptr[-1] - PL_markstack_ptr[-2]);
         count = (SP - PL_stack_base) - PL_markstack_ptr[-1] + 2;
+        if (shift < count)
+            shift = count; /* Avoid shifting too often */
         EXTEND(SP,shift);
         src = SP;
```

Not much, eh? But this has the effect of making a 1-2 map of n things (one that returns two elements for each input) go from being O(n^2) to O(n). Specifically, without the patch it is very slow to do this: `%hash = map {$_, 1} 1..100000;` and after the patch this is fast. :-) Any 1-1 map (e.g. Schwartzian sort) is fast both before and after. It only matters when you get back more elements than you put in.
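To make the O(n^2) versus O(n) claim concrete, here is a toy model in Perl. It is only a sketch under stated assumptions, not the real `pp_ctl.c` logic: it assumes each input item frees one stack slot and yields two results, and that making room means moving every remaining input item. The function name `count_moves` and its `$patched` flag are made up for illustration.

```perl
use strict;
use warnings;

# Toy model (an assumption, not the actual pp_ctl.c code): a 1-2 map needs
# one extra slot per input item, and making room moves all remaining items.
sub count_moves {
    my ($n, $patched) = @_;
    my ($slack, $moves) = (0, 0);
    for my $i (0 .. $n - 1) {
        # Spare room left over from an earlier shift: nothing to move.
        if ($slack > 0) { $slack--; next; }
        my $count = $n - $i;    # items that have to be moved
        my $shift = 1;          # room needed right now
        # The idea of the patch: never shift by less than $count.
        $shift = $count if $patched && $shift < $count;
        $moves += $count;
        $slack += $shift - 1;
    }
    return $moves;
}

printf "n=%d unpatched=%d patched=%d\n", $_, count_moves($_, 0), count_moves($_, 1)
    for 1_000, 10_000;
```

In this model the unpatched count is n(n+1)/2 moves, while the patched count is n, because the first shift already leaves enough room for all later items.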
RE: RE: List non-matching files by fundflow (Chaplain) on Aug 19, 2000 at 00:17 UTC
Can you explain this? Is `for(@ARGV) { $f{$_}=1 };` better? (It has the advantage of being clearer for non-Perl programmers.)
RE: RE: RE: List non-matching files by tye (Sage) on Aug 19, 2000 at 00:30 UTC
IMHO, yes. The `for` version doesn't build a (possibly huge) list of the form ("x",1,"y",1,...) and then populate the hash; it just populates the hash. I suspect my "slice" version is the fastest... and:

```
use Benchmark;
@arr=('file.html')x1024;
timethese( -3, {
    'Slice' => '%x=();@x{@arr}=(1)x@arr',
    'For' => '%x=();for(@arr){$x{$_}=1}',
    'Map' => '%x=map{$_=>1}@arr'
} );
__END__
Benchmark: running For, Map, Slice, each for at least 3 CPU seconds...
       For:  3 secs ( 3.13 usr +  0.00 sys =  3.13 CPU) @ 707.72/s (n=2218)
       Map:  3 secs ( 3.14 usr +  0.00 sys =  3.14 CPU) @ 96.98/s (n=305)
     Slice:  4 secs ( 3.14 usr +  0.00 sys =  3.14 CPU) @ 1074.72/s (n=3380)
```

...it is (for at least one case). So "slice" is 50% faster than "for", which is much faster than "map" (with tilly's patch, "map" will probably still be slower than the others, but not nearly that slow).

- tye (but my friends call me "Tye")
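For readers less used to Perl, the three approaches compared above can be written out side by side. This is a minimal sketch: the file names and the `as_string` helper are made up for illustration, and it only shows that the three forms build the same hash, not how fast they are.

```perl
use strict;
use warnings;

my @files = qw(a.html b.html c.html);    # hypothetical file names

# For: assign one key at a time; no intermediate list is built.
my %by_for;
$by_for{$_} = 1 for @files;

# Slice: assign every key in a single hash-slice operation.
my %by_slice;
@by_slice{@files} = (1) x @files;

# Map: build the flat list (key, 1, key, 1, ...) first, then assign it.
my %by_map = map { $_ => 1 } @files;

# Illustrative helper: canonical string form of a hash, for comparison.
sub as_string {
    my ($h) = @_;
    return join ',', map { $_ . '=' . $h->{$_} } sort keys %$h;
}

print as_string(\%by_for),   "\n";
print as_string(\%by_slice), "\n";
print as_string(\%by_map),   "\n";
```

All three lines print `a.html=1,b.html=1,c.html=1`; the difference between the forms is in how much intermediate data each one creates, which is what the benchmark measures.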
RE: RE: RE: RE: List non-matching files by fundflow (Chaplain) on Aug 19, 2000 at 00:50 UTC
Cool, code was updated. BTW, I suspected that your benchmark was wrong, since you used the same string (file.html) for every element, so the hash only ever holds one key. A quick check showed that it doesn't matter much. Interesting. This is your code with @arr replaced with @ARGV. The directory has 336 files. (Also, -3 didn't work for me here.)

```
>./benchnom *
Benchmark: timing 5000 iterations of For, Map, Slice...
       For: 10 secs ( 9.94 usr  0.00 sys =  9.94 cpu)
       Map: 21 secs (21.26 usr  0.00 sys = 21.26 cpu)
     Slice:  8 secs ( 7.27 usr  0.00 sys =  7.27 cpu)
38.57u 0.05s 0:38.88 99.3%
> ls -l | wc -l
336
>perl -v
This is perl, version 5.004_04 built for sun4-solaris
```

Results were similar for perl 5.5.
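A variant of the benchmark with distinct keys can be sketched as follows. The generated names are a hypothetical stand-in for a directory listing of 336 files, the iteration count of 1000 is an arbitrary choice, and code references are used instead of strings so the script runs under `use strict`. Timings will differ by machine and Perl version, so no output is shown.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Hypothetical stand-in for "*" in a directory of 336 files: distinct names.
my @names = map { sprintf 'file%03d.html', $_ } 1 .. 336;
my %x;

# A fixed positive iteration count, as in the run above; adjust as needed.
my $results = timethese( 1000, {
    'Slice' => sub { %x = (); @x{@names} = (1) x @names },
    'For'   => sub { %x = (); $x{$_} = 1 for @names },
    'Map'   => sub { %x = map { $_ => 1 } @names },
} );
```

With distinct names every run ends with 336 keys in `%x`, so the cost of inserting new keys is part of what gets timed.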