RE: List non-matching files
by tye (Sage) on Aug 19, 2000 at 00:07 UTC
I'd use:
@f{@ARGV} = (1) x @ARGV;
especially until Perl 5.6.1 (which includes tilly's patch, below), because the map could be noticeably slow if excluding a large number of files.
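Here is roughly how that slice sits in a nom-style script (a minimal sketch of my own; the directory-reading part is assumed, not taken from the posted nom):
#!/usr/bin/perl -w
use strict;

# Build the exclusion set with one slice assignment -- no
# temporary ("x",1,"y",1,...) list is ever constructed.
my %f;
@f{@ARGV} = (1) x @ARGV;

# Print everything in the current directory that was NOT
# named on the command line.
opendir DIR, '.' or die "opendir: $!";
print "$_\n" for grep { -f $_ && !$f{$_} } readdir DIR;
closedir DIR;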
-
tye
(but my friends call me "Tye")
--- pp_ctl.c.bak	Tue Mar 21 00:42:26 2000
+++ pp_ctl.c	Tue Apr 25 04:03:36 2000
@@ -736,6 +736,8 @@
 	if (diff > PL_markstack_ptr[-1] - PL_markstack_ptr[-2]) {
 	    shift = diff - (PL_markstack_ptr[-1] - PL_markstack_ptr[-2]);
 	    count = (SP - PL_stack_base) - PL_markstack_ptr[-1] + 2;
+	    if (shift < count)
+		shift = count;	/* Avoid shifting too often */
 	    EXTEND(SP,shift);
 	    src = SP;
Not much, eh? But this has the effect of making a 1-2 map of n things go from being O(n^2) to O(n). Specifically, without the patch it is very slow to do this:
%hash = map {$_, 1} 1..100000;
and after the patch this is fast. :-)
Any 1-1 map (e.g. a Schwartzian sort) is fast both before and after. It only matters when you get back more elements than you put in.
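If you want to see the quadratic behavior for yourself, something along these lines (my own quick sketch, not part of the patch submission) should show the 1-2 map's time growing much faster than linearly on a pre-patch perl:
use strict;
use Benchmark qw(timeit timestr);

# Double n a couple of times.  Before the patch the 1-2 map's
# time should roughly quadruple at each step; after it, merely double.
for my $n (10_000, 20_000, 40_000) {
    my $t = timeit(1, sub { my %hash = map { ($_, 1) } 1 .. $n });
    print "n=$n: ", timestr($t), "\n";
}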
Can you explain this?
Is for(@ARGV) { $f{$_}=1 }; better?
(It has the advantage of being clearer for non-Perl programmers.)
IMHO, yes. The for version doesn't build a
(possibly huge) list of the form ("x",1,"y",1,...) and
then populate the hash; it just populates the hash.
I suspect my "slice" version is the fastest... and:
use Benchmark;
@arr = ('file.html') x 1024;
timethese( -3, {
    'Slice' => '%x=(); @x{@arr} = (1) x @arr',
    'For'   => '%x=(); for (@arr) { $x{$_} = 1 }',
    'Map'   => '%x = map { $_ => 1 } @arr',
} );
__END__
Benchmark: running For, Map, Slice, each for at least 3 CPU seconds...
For: 3 secs (3.13 usr + 0.00 sys = 3.13 CPU) @ 707.72/s (n=2218)
Map: 3 secs (3.14 usr + 0.00 sys = 3.14 CPU) @ 96.98/s (n=305)
Slice: 4 secs (3.14 usr + 0.00 sys = 3.14 CPU) @ 1074.72/s (n=3380)
...it is (for at least one case). So "slice" is about 50% faster than "for", which in turn is much faster than "map" (with tilly's patch, "map" will probably still be the slowest of the three, but not nearly that slow).
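(A side note of mine, not benchmarked above: if you only ever test membership with exists, you can drop the value list from the slice entirely; whether that is measurably faster here is just my guess.)
@arr = ('file.html', 'index.html');
@x{@arr} = ();                  # keys are created; values are all undef
print "seen\n" if exists $x{'file.html'};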
-
tye
(but my friends call me "Tye")
RE: List non-matching files
by eak (Monk) on Aug 19, 2000 at 23:00 UTC
In the spirit of "the right tool for the job", I think this can be done a lot faster by just using 'find'. Take a look at this:
find . -maxdepth 1 -type f -not -name '*.html' -exec rm -rf {} \;
--eric
Well, not really.
Think about:
> nom *.html *.jpg `list-directories`
Got the picture?
In general, pure Perl solutions tend to be faster than find for all of the reasons that Perl usually beats shell scripting. (You don't have to keep launching processes.) In this case it comes down to launching one rm and passing it a lot of filenames vs. launching an rm per file. Guess which I think is faster?
However, find has one huge advantage: it is one of the few ways to get around the limits on listing large numbers of files in shell scripts (the argument-length limit on the command line). The nom script given doesn't do that.
A second advantage is that while find has a more complex API, it is also more flexible... :-)
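Of course, reading the directory from Perl sidesteps that same limitation without find; a rough sketch (my own, not the posted nom):
#!/usr/bin/perl -w
use strict;

# Get the file list via readdir instead of a shell glob, so the
# kernel's argument-length limit never enters the picture.
opendir DIR, '.' or die "opendir: $!";
my @doomed = grep { -f $_ && !/\.html$/ } readdir DIR;
closedir DIR;

my $n = unlink @doomed;
warn "only removed $n of ", scalar(@doomed), " files\n" if $n != @doomed;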
find . -maxdepth 1 -type f -not -name '*.html' -print | xargs rm -rf
The above only launches three processes (well, actually a few more if xargs decides there are too many files), and since it's I/O bound, I doubt a Perl-based solution would be significantly faster (personally, my wager is that it would be slower).
However, I agree that shell scripting would in general be slower than Perl, because of process creation. But I don't think this case counts.
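For comparison, the single-process Perl equivalent of that pipeline would look something like this (my sketch; I have not raced it against find):
perl -e 'opendir D, "." or die $!; unlink grep { -f $_ && !/\.html$/ } readdir D'
Like the find version, it only touches the top-level directory, since readdir does not recurse.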
Ciao,
Gryn :)