This simple script lists the files in the current directory, except the ones given on the command line.

I use it daily when converting files from one format to another. I took it from 'Unix Power Tools' a long time ago. The original used /bin/sh, but Perl is so much simpler...

Example: rm `nom *.html` (leaves only .html files)


Updated after tye's suggestion. (thanks)
#!/usr/bin/perl
@f{@ARGV} = (1) x @ARGV;
for (<*>) {
    print "$_\n" unless $f{$_};
}

RE: List non-matching files
by tye (Sage) on Aug 19, 2000 at 00:07 UTC

    You might want to populate the hash via:

    @f{@ARGV}= (1)x@ARGV;

    especially until Perl 5.6.1 (which includes tilly's patch) because the map could be noticeably slow if excluding a large number of files.
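
    For comparison, the map-based way of building the same lookup hash (presumably what the script used before the update) would be along these lines:

    # map builds an intermediate list ("a", 1, "b", 1, ...) first,
    # then assigns the whole thing to the hash
    %f = map { $_ => 1 } @ARGV;

    # the slice form assigns the keys directly, with no intermediate list
    @f{@ARGV} = (1) x @ARGV;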

            - tye (but my friends call me "Tye")
      Well since you bring it up... :-)

      tilly's patch indeed. One. Very definitely only. But I do have a contribution to the core that apparently will get in over serious (and legitimate) aesthetic complaints by Sarathy:

      --- pp_ctl.c.bak    Tue Mar 21 00:42:26 2000
      +++ pp_ctl.c        Tue Apr 25 04:03:36 2000
      @@ -736,6 +736,8 @@
           if (diff > PL_markstack_ptr[-1] - PL_markstack_ptr[-2]) {
               shift = diff - (PL_markstack_ptr[-1] - PL_markstack_ptr[-2]);
               count = (SP - PL_stack_base) - PL_markstack_ptr[-1] + 2;
      +        if (shift < count)
      +            shift = count;  /* Avoid shifting too often */
               EXTEND(SP,shift);
               src = SP;
      Not much, eh? But it has the effect of taking a 1-2 map of n things from O(n^2) to O(n). Specifically, without the patch it is very slow to do this:
      %hash = map {$_, 1} 1..100000;
      and after the patch this is fast. :-)

      Any 1-1 map (e.g. a Schwartzian sort) is fast both before and after. The slowdown only matters when you get back more elements than you put in.
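
      For reference, a typical 1-1 map is the Schwartzian transform; a sketch that sorts filenames by size (the @files array is just for illustration):

      # each map stage returns exactly one element per input element,
      # so this pattern is fast both with and without the patch
      @sorted = map  { $_->[1] }                # pull the filename back out
                sort { $a->[0] <=> $b->[0] }    # sort on the cached size
                map  { [ -s $_, $_ ] }          # cache each file's size once
                @files;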

      Can you explain this?

      Is for(@ARGV) { $f{$_}=1 }; better?
      (It has the advantage of being clearer to non-Perl programmers.)
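
      (For context, the whole script with that change would presumably be:)

      #!/usr/bin/perl
      # same behaviour as nom above, hash built with a plain loop
      for (@ARGV) { $f{$_} = 1 }
      for (<*>) {
          print "$_\n" unless $f{$_};
      }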

        IMHO, yes. The for version doesn't build a (possibly huge) list of the form ("x",1,"y",1,...) and then populate the hash; it just populates the hash. I suspect my "slice" version is the fastest... and:

        use Benchmark;
        @arr = ('file.html') x 1024;
        timethese( -3, {
            'Slice' => '%x=();@x{@arr}=(1)x@arr',
            'For'   => '%x=();for(@arr){$x{$_}=1}',
            'Map'   => '%x=map{$_=>1}@arr'
        } );
        __END__
        Benchmark: running For, Map, Slice, each for at least 3 CPU seconds...
               For:  3 secs ( 3.13 usr +  0.00 sys =  3.13 CPU) @  707.72/s (n=2218)
               Map:  3 secs ( 3.14 usr +  0.00 sys =  3.14 CPU) @   96.98/s (n=305)
             Slice:  4 secs ( 3.14 usr +  0.00 sys =  3.14 CPU) @ 1074.72/s (n=3380)

        ...it is (for at least one case). So "slice" is about 50% faster than "for", which is much faster than "map" (with tilly's patch, "map" will probably still be slower than the others, but not nearly that slow).

                - tye (but my friends call me "Tye")
RE: List non-matching files
by eak (Monk) on Aug 19, 2000 at 23:00 UTC
    In the spirit of the right tool for the right job, I think this can be done a lot faster by just using 'find'. Take a look at this:
    find . -maxdepth 1 -type f -not -name '*.html' -exec rm -rf {} \;
    --eric
      Well, not really.

      Think about:

      > nom *.html *.jpg `list-directories`

      Got the picture?

      In general, pure Perl solutions tend to be faster than find for all of the reasons that Perl usually beats shell scripting: you don't have to keep launching processes. In this case it comes down to launching one rm and passing it a lot of filenames versus launching an rm per file. Guess which I think is faster?

      However, find has one huge advantage: it is one of the few ways to get around the limit on how many filenames you can pass on a command line in shell scripts. The nom script given here doesn't do that (a pure-Perl way around it is sketched below).

      A second advantage is that while find has a more complex API, it is also more flexible... :-)
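
      As for the command-line limit, a pure-Perl way around it is to do the unlinking in the script itself instead of handing the list back to the shell. A rough sketch that mirrors the rm `nom *.html` usage above (the %keep name is just for illustration):

      #!/usr/bin/perl
      # delete every file in the current directory that is NOT listed on
      # the command line, without ever building a shell command line
      @keep{@ARGV} = (1) x @ARGV;
      for (<*>) {
          next if $keep{$_};
          unlink $_ or warn "unlink $_: $!\n";
      }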

        find . -maxdepth 1 -type f -not -name '*.html' -print | xargs rm -rf

        The above only launches three processes (well, actually a few more if xargs decides there are too many files), and since it's I/O bound, I doubt a Perl-based solution would be significantly faster (and personally, my wager is that it would be slower).

        However, I agree that shell scripting is in general slower than Perl because of process creation; I just don't think this case counts.

        Ciao,
        Gryn :)