RE: List non-matching files
by tye (Sage) on Aug 19, 2000 at 00:07 UTC
I'd use:
@f{@ARGV} = (1) x @ARGV;
especially until Perl 5.6.1 (which includes tilly's patch, below), because the map could be noticeably slow if excluding a large number of files.
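Here is roughly how that slice sits in a nom-style script (a minimal sketch of my own; the directory-reading part is assumed, not taken from the posted nom):
#!/usr/bin/perl -w
use strict;

# Build the exclusion set with one slice assignment -- no
# temporary ("x",1,"y",1,...) list is ever constructed.
my %f;
@f{@ARGV} = (1) x @ARGV;

# Print everything in the current directory that was NOT
# named on the command line.
opendir DIR, '.' or die "opendir: $!";
print "$_\n" for grep { -f $_ && !$f{$_} } readdir DIR;
closedir DIR;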
-
tye
(but my friends call me "Tye")
--- pp_ctl.c.bak	Tue Mar 21 00:42:26 2000
+++ pp_ctl.c	Tue Apr 25 04:03:36 2000
@@ -736,6 +736,8 @@
 	if (diff > PL_markstack_ptr[-1] - PL_markstack_ptr[-2]) {
 	    shift = diff - (PL_markstack_ptr[-1] - PL_markstack_ptr[-2]);
 	    count = (SP - PL_stack_base) - PL_markstack_ptr[-1] + 2;
+	    if (shift < count)
+		shift = count;	/* Avoid shifting too often */
 	    EXTEND(SP,shift);
 	    src = SP;
Not much, eh? But this has the effect of making a 1-2 map of n things go from being O(n^2) to O(n). Specifically, without the patch it is very slow to do this:
%hash = map {$_, 1} 1..100000;
and after the patch this is fast. :-)
Any 1-1 map (e.g. a Schwartzian sort) is fast both before and after. It only matters when you get back more elements than you put in.
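If you want to see the quadratic behavior for yourself, something along these lines (my own quick sketch, not part of the patch submission) should show the 1-2 map's time growing much faster than linearly on a pre-patch perl:
use strict;
use Benchmark qw(timeit timestr);

# Double n a couple of times.  Before the patch the 1-2 map's
# time should roughly quadruple at each step; after it, merely double.
for my $n (10_000, 20_000, 40_000) {
    my $t = timeit(1, sub { my %hash = map { ($_, 1) } 1 .. $n });
    print "n=$n: ", timestr($t), "\n";
}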
Can you explain this?
Is for(@ARGV) { $f{$_}=1 }; better?
(It has the advantage of being clearer for non-Perl programmers.)
IMHO, yes. The for version doesn't build a
(possibly huge) list of the form ("x",1,"y",1,...) and
then populate the hash; it just populates the hash.
I suspect my "slice" version is the fastest... and:
use Benchmark;
@arr = ('file.html') x 1024;
timethese( -3, {
    'Slice' => '%x=(); @x{@arr} = (1) x @arr',
    'For'   => '%x=(); for (@arr) { $x{$_} = 1 }',
    'Map'   => '%x = map { $_ => 1 } @arr',
} );
__END__
Benchmark: running For, Map, Slice, each for at least 3 CPU seconds...
For: 3 secs (3.13 usr + 0.00 sys = 3.13 CPU) @ 707.72/s (n=2218)
Map: 3 secs (3.14 usr + 0.00 sys = 3.14 CPU) @ 96.98/s (n=305)
Slice: 4 secs (3.14 usr + 0.00 sys = 3.14 CPU) @ 1074.72/s (n=3380)
...it is (for at least one case). So "slice" is about 50% faster than "for", which in turn is much faster than "map" (with tilly's patch, "map" will probably still be the slowest of the three, but not nearly that slow).
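(A side note of mine, not benchmarked above: if you only ever test membership with exists, you can drop the value list from the slice entirely; whether that is measurably faster here is just my guess.)
@arr = ('file.html', 'index.html');
@x{@arr} = ();                  # keys are created; values are all undef
print "seen\n" if exists $x{'file.html'};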
-
tye
(but my friends call me "Tye")
RE: List non-matching files
by eak (Monk) on Aug 19, 2000 at 23:00 UTC
In the spirit of "the right tool for the job", I think this can be done a lot faster by just using 'find'. Take a look at this:
find . -maxdepth 1 -type f -not -name '*.html' -exec rm -rf {} \;
--eric
Well, not really.
Think about:
> nom *.html *.jpg `list-directories`
Got the picture?
In general, pure Perl solutions tend to be faster than find for all of the reasons that Perl usually beats shell scripting. (You don't have to keep launching processes.) In this case it comes down to launching one rm and passing it a lot of filenames vs. launching an rm per file. Guess which I think is faster?
However, find has one huge advantage: it is one of the few ways to get around the limits on listing large numbers of files in shell scripts (the argument-length limit on the command line). The nom script given doesn't do that.
A second advantage is that while find has a more complex API, it is also more flexible... :-)
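Of course, reading the directory from Perl sidesteps that same limitation without find; a rough sketch (my own, not the posted nom):
#!/usr/bin/perl -w
use strict;

# Get the file list via readdir instead of a shell glob, so the
# kernel's argument-length limit never enters the picture.
opendir DIR, '.' or die "opendir: $!";
my @doomed = grep { -f $_ && !/\.html$/ } readdir DIR;
closedir DIR;

my $n = unlink @doomed;
warn "only removed $n of ", scalar(@doomed), " files\n" if $n != @doomed;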
find . -maxdepth 1 -type f -not -name '*.html' -print | xargs rm -rf
The above only launches three processes (well, actually a few more if xargs decides there are too many files), and since it's I/O bound, I doubt a Perl-based solution would be significantly faster (personally, my wager is that it would be slower).
However, I agree that shell scripting would in general be slower than Perl, because of process creation. But I don't think this case counts.
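For comparison, the single-process Perl equivalent of that pipeline would look something like this (my sketch; I have not raced it against find):
perl -e 'opendir D, "." or die $!; unlink grep { -f $_ && !/\.html$/ } readdir D'
Like the find version, it only touches the top-level directory, since readdir does not recurse.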
Ciao,
Gryn :)