Useful addition to Perl?

by tilly (Archbishop)
on Mar 04, 2004 at 20:54 UTC

Here is a random thought that struck me last night. I've mulled it over for a bit, and it doesn't seem worse than it originally did, so I'm putting it up for discussion to see if other people agree.

I tend to use Perl as a grep on steroids. All that you have to do is:

perl -ne 'print if _____' somefile
and it works like grep, except that the condition can be written in Perl. I use this when I want to use a complex regular expression and I can't be bothered to remember how grep's differ from Perl's. I also use this when I want a condition that I can readily write in code.

That is fine, except that GNU grep does something that I really like. If you type:

grep -r foo bar
then it will recursively search through bar and grep for "foo" in each file. No need to mess around with find. No need to mess around with File::Find. Just a simple -r and It Just Works.

Who else would find it convenient if Perl, when it was invoked with -r, would take what is to become your @ARGV and recursively walk the directory tree to expand it out? That way my "grep on steroids" would trivially have the same feature that I have come to know and love in GNU's version of grep.

I'd appreciate a similar optional feature built into glob in some way...

Replies are listed 'Best First'.
Re: Useful addition to Perl?
by hossman (Prior) on Mar 04, 2004 at 22:13 UTC

    I don't know about mucking with the internals of glob, but I'm with perrin: given the current push to move stuff out of the core, this sounds like a really straightforward module to just include with -M ...

    # Recurse.pm
    use strict;
    use warnings;
    use File::Find;

    BEGIN {
        my @my_argv = ();
        foreach my $arg (@ARGV) {
            find({ "wanted"   => sub { push @my_argv, $_; 1; },
                   "no_chdir" => 1 }, $arg);
        }
        @ARGV = @my_argv;
    }
    1;

    (Anyone who wants to package that up, put it on CPAN, and deal with potential bug reports / feature requests; has my blessings. But please post a reply so others know it's available)
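One thing to watch in the module above: File::Find calls the wanted callback for directories as well as files, so directory names land in @ARGV alongside the files. Below is a minimal sketch of a variant that keeps only plain files. The name expand_args is made up for illustration, and it is written as a plain function so it can be tried outside a BEGIN block.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: expand each argument into the plain files beneath it.
# An argument that is already a plain file passes through unchanged.
sub expand_args {
    my @args = @_;
    my @files;
    for my $arg (@args) {
        find({
            wanted   => sub { push @files, $_ if -f $_ },
            no_chdir => 1,    # keep $_ as the full path, as in the original
        }, $arg);
    }
    return @files;
}

# In a Recurse.pm this would run at compile time, for example:
#   BEGIN { @ARGV = expand_args(@ARGV) }
```

This still builds the whole list up front, so it does nothing for the "huge @ARGV" concern raised elsewhere in the thread.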

      Personally, I think it should be named simply 'R.pm', since its primary use is directly on the command line:

      perl -MR -ne 'print if _____'

      I would normally say that one-letter modules are a bad thing, but the special-purpose use of this one makes it an exception, IMHO.

      ----
      : () { :|:& };:

      Note: All code is untested, unless otherwise stated

        Actually, I started this, myself, a while ago... only (in order to avoid certain baddnesses of blowing up @ARGV to impossibly stupidly large proportions) it went a little more like this:
        package r;
        use strict;
        use File::Spec;

        tie @ARGV, 'r::Tie::RecursiveARGVArray', @ARGV;

        sub import { }

        package r::Tie::RecursiveARGVArray;
        use Tie::Array;
        use base 'Tie::StdArray';

        sub TIEARRAY {
            my ($classname,@init) = @_;
            bless [@init], $classname;
        }

        sub FETCH {
            # magic here to explode directory contents if -d
        }

        # etc
        That way @ARGV never actually got enormous... it just added items to the front as while (<>) implicitly shifted entries off it.

        You can tell by the way that it starts that, actually,

        perl -mr -e ...
        was sufficient (who's got time for the shift key, anyway?). Too bad I never finished... coulda been a neat CPAN contribution... oh, well. Maybe someday, if no one runs off from reading these posts and implements it before I have time to finish it.
        ------------ :Wq Not an editor command: Wq
Re: Useful addition to Perl?
by perrin (Chancellor) on Mar 04, 2004 at 22:00 UTC
    Maybe you could just do it as a module, so you would say this:

    perl -MRecurse -ne 'print if _____' somedir/

Re: Useful addition to Perl?
by tachyon (Chancellor) on Mar 05, 2004 at 00:50 UTC

    I install a few handy dandy widgets in my path. These include re and its inverted cousin re! so I have less typing! re! has 3 extra ! chars to invert the matches. Here is re.

    [root@devel3 log]# cat /usr/bin/re
    #!/usr/bin/perl
    die "
    Usage re [RE] <optional filename/dirname>
    Full Perl grep on STDIN or filename
    Recursive if filename is a dir
    " unless @ARGV >= 1;

    my $re = qr/$ARGV[0]/;

    if ( $ARGV[1] ) {
        if ( -d $ARGV[1] ) {
            # we have a dir so let's recurse
            require File::Find;
            File::Find::find( \&grep_file, $ARGV[1] );
        }
        else {
            open F, $ARGV[1] or die "Can't read $ARGV[1] $!\n";
            do{print if m/$re/} while <F>;
            close F;
        }
    }
    else {
        do{print if m/$re/} while <STDIN>;
    }

    sub grep_file {
        return unless -f $_;
        open F, $_ or die "Can't read $File::Find::name $!\n";
        my $matches = '';
        do{$matches .= $_ if m/$re/} while <F>;
        print "$File::Find::name\n$matches" if $matches;
        close F;
    }

    [root@devel3 log]# re pri.. /usr/bin/re
    do{print if m/$re/} while <F>;
    do{print if m/$re/} while <STDIN>;
    print "$File::Find::name\n$matches" if $matches;

    [root@devel3 log]# cat /usr/bin/re | re [A-Z]+IN
    Full Perl grep on STDIN or filename
    do{print if m/$re/} while <STDIN>;

    [root@devel3 log]# cat /usr/bin/re | re "\b[A-Z]+\b"
    Usage re [RE] <optional filename/dirname>
    Full Perl grep on STDIN or filename
    " unless @ARGV >= 1;
    my $re = qr/$ARGV[0]/;
    if ( $ARGV[1] ) {
    [snip]

    [root@devel3 log]# re perl /devel/www/modperl
    /devel/www/modperl/logout.pl
    #!/usr/bin/perl -w
    /devel/www/modperl/search.pl
    #!/usr/bin/perl -w
    /devel/www/modperl/error.pl
    #!/usr/bin/perl -w
    [snip]
    [root@devel3 log]#

    YMMV

    cheers

    tachyon

Re: Useful addition to Perl?
by vladb (Vicar) on Mar 04, 2004 at 22:05 UTC
    This may not always work for large number of files or huge directory trees, but I often use something like
    perl -ne 'print if _____' `find .`
    Or if you have a specific directory in mind
    perl -ne 'print if _____' `find mydirectory`


    _____________________
    "We've all heard that a million monkeys banging on a million typewriters will eventually reproduce
    the entire works of Shakespeare. Now, thanks to the Internet, we know this is not true."

    Robert Wilensky, University of California

      Unfortunately, that's not portable.

        it's more portable than you think; see perl power tools

        ~Particle *accelerates*

Re: Useful addition to Perl?
by BrowserUk (Patriarch) on Mar 05, 2004 at 08:52 UTC

    I really like this idea. I'd also like to see a -g option for non-globbing platforms that would glob @ARGV, for those of us who use systems that do not do this by default. Actually, as I've recently discovered, it would also be useful on systems whose shells do glob by default: it would be a way of alleviating the "list too long" problem.

    I have a module called g.pm that does this for me currently using -Mg, and I like the idea enough that if you or someone else makes a module that does this, I'll be adding it to my system as r.pm.
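Neither g.pm nor G.pm is reproduced in this thread, so the following is only a guess at the general shape of such a module: a minimal sketch that expands wildcard arguments with File::Glob's bsd_glob, which (unlike the core glob) does not split its pattern on whitespace. The name glob_args is hypothetical.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);

# Hypothetical helper: expand wildcard arguments the way a Unix shell would.
# An argument that matches nothing is kept as-is, so a misspelt file name
# still reaches open() and produces the usual error message.
sub glob_args {
    my @args = @_;
    return map {
        my @matches = bsd_glob($_);
        @matches ? @matches : ($_);
    } @args;
}

# In a g.pm this could run when the module is loaded with -Mg, for example:
#   sub import { @ARGV = glob_args(@ARGV) }
```

Because the expansion happens inside perl rather than in the shell, the pattern itself is the only thing passed on the command line, which is what sidesteps the "list too long" limit.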

    A compromise solution to putting this in the core might be to have the command-line option processing in the perl executable attempt a "use X", where X is any unknown command-line option it encounters. If the "use r;" (or g, etc.) failed, it would then report the "unknown option" in the normal way. Then we could use commands like

    perl -grple ' next unless /..../' \*.log

    One possible problem with implementing this as a module (using File::Find or similar) is that @ARGV can end up containing a huge list on large or deeply nested subtrees. It would be nice to find a way of processing @ARGV such that each new level of subtree was only expanded when needed. It's difficult to explain what I mean, but for example:

    1. @ARGV = '*';

      So this gets globbed @ARGV = glob @ARGV;

    2. Now @ARGV = ( file1, file2, file3, dir1, dir2 );

      Perl enters the normal <> processing loop and processes the three files, but when it encounters the first directory, that directory is then globbed, with any directories that result being unshifted onto @ARGV before any files.

    3. So you get @ARGV = ( dir1/file1, dir1/file2, dir1/sub1, dir1/sub2, dir2 );

      And the process repeats, working its way through the subtree, processing files as they are encountered and descending into directories as it goes, until @ARGV is empty.
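The three steps above can be sketched as a small iterator that keeps a work queue and only reads a directory when that directory reaches the front. This is an illustration under stated assumptions, not a hook into <> itself: the name lazy_files is made up, and symlink loops are not guarded against.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical iterator: returns a closure that hands back one plain file
# per call, expanding a directory only when it reaches the front of the queue.
sub lazy_files {
    my @queue = @_;
    return sub {
        while (@queue) {
            my $path = shift @queue;
            if (-d $path) {
                my $dh;
                unless (opendir($dh, $path)) {
                    warn "Can't read directory $path: $!\n";
                    next;
                }
                my @entries = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
                closedir $dh;
                my @paths = map { File::Spec->catfile($path, $_) } sort @entries;
                # Files first, then subdirectories, all ahead of whatever was
                # already waiting: the ordering shown in step 3.
                unshift @queue, (grep { !-d $_ } @paths), (grep { -d $_ } @paths);
                next;
            }
            return $path if -f $path;    # anything else is silently skipped
        }
        return;    # queue exhausted
    };
}

# Usage:
#   my $next = lazy_files(@ARGV);
#   while (defined(my $file = $next->())) { ... }
```

At any moment the queue holds only the unexpanded siblings along the current path, not the whole tree.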

    That's probably not well thought through, but the idea is there. I guess one advantage of sticking with the -Mr syntax would be that you could add additional options like -Mr=d for depth first or -Mr=b for breadth first, etc.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
      I have a module called g.pm that does this for me currently using -Mg, ...

      Something like my G.pm (US mirror) ? ;-)

      Jenda
      Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.
         -- Rick Osborne

      Edit by castaway: Closed small tag in signature

        Yes. almost exactly like your G.pm:) Thanks.

        Actually, I have your G.pm on my other machine, but when I set this one up, I couldn't remember where I got it -- my portable was dead at the time with a flaky motherboard connection. Then I came across a description of something called "Wild.pm" which I cut and paste but then got fed up with typing -mWild and renamed it to g.pm.

        So yes, probably very similar, and I definitely stole the name from you--I'd gotten used to it. Thanks:)

        Now you've reminded me of where I got it from in the first place, I'll probably grab yours again (and rename it g.pm:) (I just tried but your site seems to be off the air at this moment).


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "Think for yourself!" - Abigail
      Big ++.

      I obviously agree very much. In fact, some of this stuff (like avoiding exploding @ARGV) is in my discussions of r.pm (in this thread and in its own thread, elsewhere). The nice thing about the way that -n or -p are actually processed (which is with while (<>) { ... }) is that they actually shift @ARGV. Thus, by tying @ARGV and only exploding directory contents as they are fetched, you can do a highly efficient "perl -mr -ne ...".

      I'm also a big fan of the glob thing... I even considered doing that automatically in r.pm, if $^O =~ /MSWin32/. I decided, though, to fight one battle at a time.

      Last of all, I totally love the idea of treating otherwise unrecognized switches as uses. ...Well, love the idea of it, and the cuteness... but of course, you run out of switches *real* fast, and actually, they're mostly already used up. Of course, if you want to get more into the perlrun mindset that gives us the -s option to perl, perhaps we would only activate this behavior with another special switch... or maybe a switch that leads a group of "external" switch modules... -X seems to be available. That would give you something like

      perl -Xrg -ple '...' *
      as a shorthand for
      perl -Mr -Mg -ple '...' *
      But, anyway... now I feel that I have clearly strayed off into hyperspace =D
      ------------ :Wq Not an editor command: Wq
Re: Useful addition to Perl?
by dragonchild (Archbishop) on Mar 04, 2004 at 21:20 UTC
    I would certainly find it convenient, were I to use Perlgrep. I tend to use find . -type f | xargs grep foo and it works in the 80% case.

    Building it into glob, however ... that would be VERY useful, in the average case.

    ------
    We are the carpenters and bricklayers of the Information Age.

    Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

Re: Useful addition to Perl?
by Abigail-II (Bishop) on Mar 04, 2004 at 21:51 UTC
    I don't think a proposal to add an option to Perl as you propose stands any chance unless you've shown it's in demand. How to prove demand? Make a module that gives you this behaviour. It shouldn't be too hard to make a module with a modified glob that does a recursive expand.
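A recursive glob along these lines could be prototyped with core modules only. What follows is a rough sketch, not an existing function: rglob is a hypothetical name, and directory names that themselves contain glob metacharacters are not escaped.

```perl
use strict;
use warnings;
use File::Find;
use File::Glob qw(bsd_glob);

# Hypothetical rglob: apply a glob pattern in $root and in every directory
# beneath it, returning all matches.
sub rglob {
    my ($pattern, $root) = @_;
    $root = '.' unless defined $root;
    my @dirs;
    find({
        wanted   => sub { push @dirs, $_ if -d $_ },
        no_chdir => 1,    # $_ is the full path of each entry
    }, $root);
    return map { bsd_glob("$_/$pattern") } sort @dirs;
}

# Usage:
#   my @scripts = rglob('*.pl', 'somedir');
```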

    As for the recursive behaviour of @ARGV, be aware that @ARGV (or rather <ARGV>) is already highly magical. Ever tried passing 'who |' as an argument? Do you think the added complexity to <ARGV> outweighs the benefits?

    Abigail

      I don't understand why you claim that @ARGV is highly magical.

      The example that you gave describes what I would expect considering the fact that the entries in @ARGV are passed straight to open, and that is magical. Too magical. I fully agree with the thread that started at Two-arg open() considered dangerous.

      As for the recursive behaviour, yes, I think that the benefits of an optional recursive preprocessing of @ARGV outweigh the costs. Of course you couldn't rely on that if you were being paranoid, because of the security problems with trusting Perl's open. But we both knew that anyway.
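To make the 'who |' point concrete: <> hands each @ARGV entry to the two-argument form of open, so an argument ending in a pipe is run as a command. Here is a minimal sketch of a grep loop that opens its arguments with the three-argument form instead, so every argument is treated as a literal file name. The name grep_files is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines of the named files for which
# $test->($line) is true. Three-argument open never interprets "who |"
# or ">file" as anything but a file name.
sub grep_files {
    my ($test, @files) = @_;
    my @hits;
    for my $file (@files) {
        my $fh;
        unless (open($fh, '<', $file)) {
            warn "Can't open $file: $!\n";
            next;
        }
        while (my $line = <$fh>) {
            push @hits, $line if $test->($line);
        }
        close $fh;
    }
    return @hits;
}

# Usage:
#   print grep_files(sub { $_[0] =~ /foo/ }, @ARGV);
```

Perl 5.22 and later also provide the <<>> operator, which applies the same three-argument behaviour to the implicit loop.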

      Umm.. this is Perl, we throw everything in that appears even slightly useful and say "TIMTOWTDI." Evaluating features on the basis of consistency is not the Perl way, it's the Python way.

      This conversation ended by the goodwin-wall law.

        Umm.. this is Perl, we throw everything in that appears even slightly useful and say "TIMTOWTDI."
        Who is we? Are you a perl5 porter? Ever read that list? Abigail has a point (just because you use perl doesn't mean you know anything about how decisions are made).
