limzz has asked for the wisdom of the Perl Monks concerning the following question:

EDIT: In the if statement, <TERM> was supposed to say $y. And yes davido, I've got strict and warnings on; this is just a small snippet. Thanks for the heads-up, though.

Hi all, so I'm a newbie to Perl and Unix; I've been teaching myself for a couple of weeks. Up until now I've been able to find everything on my own, but I can't seem to find this with Google or elsewhere. For the script I'm currently writing, I need to find a string in a file path and, if it's not found, search again case-insensitively. I don't make it case-insensitive in the first place because that makes it take a lot longer, and I'm looping this many times. The following is basically what I have now:

    $x = `gzgrep $y <PATH>`;
    if ( !`gzgrep $y <PATH>` ) {
        $x = `gzgrep -i $y <PATH>`;
    }

The if statement works, but I'm sure there's a way to do that without grepping twice; I just can't figure out the syntax. The other thing I would like to do is this: if the case-insensitive grep finds a match, set $y to whatever was matched (so if $y = aaa and it finds aaA, I want $y to equal aaA). I think I know how to do this with regular expressions, but I feel there must be an easier way, and I would like to learn. Thanks in advance, and sorry if this is trivial.
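A minimal sketch of one way to answer both questions in a single pass, assuming the text to search (the gzgrep output or the decompressed file) is already in a string: collect every case-insensitive occurrence of $y, prefer an exact-case one, and otherwise adopt the first spelling actually found. The sample text is hypothetical, and \Q...\E assumes $y is a literal string rather than a regex.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $y    = 'aaa';
# Hypothetical input standing in for the gzgrep output or file contents.
my $text = "no match here\nbut this line has aaA\n";

# Every case-insensitive occurrence of $y, in the order it appears.
my @hits = $text =~ /(\Q$y\E)/gi;

# Prefer an exact-case hit; otherwise take the first differently-cased one.
my ($pick) = grep { $_ eq $y } @hits;
$pick = $hits[0] unless defined $pick;

if ( defined $pick ) {
    print $pick eq $y ? "exact match\n" : "matched as '$pick'\n";
    $y = $pick;    # adopt the case that was actually found
}
else {
    print "no match at all\n";
}
```

Because the search is case-insensitive from the start, this trades the two gzgrep runs for one pass plus a cheap string comparison.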

Re: Perl/Unix question; returning grep as a boolean and returning a match
by davido (Cardinal) on Jun 03, 2011 at 19:24 UTC

    Since $x contains the results of the first gzgrep, you can rely on its result instead of gzgrep-ing twice, as in:

        $x = `gzgrep $y <PATH>`;
        if ( not $x ) {
            $x = `gzgrep -i $y <PATH>`;
        }

    Or use a logical short-circuit '||' operator like this:

        $x = `gzgrep $y <PATH>` || `gzgrep -i $y <PATH>`;

    This works because Perl evaluates a logical OR from left to right and stops as soon as it finds a true value. In other words, $x is assigned the output of the first gzgrep if that output is non-empty; if it is empty, the second gzgrep runs and $x is assigned its output instead.
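    To see the short-circuit without gzgrep, here is a small sketch in which two hypothetical subs stand in for the two backtick commands and record whether they ran. Backticks return an empty string when the command prints nothing, and the empty string is false in Perl, which is what lets || fall through to the second command.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-ins for the two gzgrep calls; each records that it ran.
my @calls;
sub exact_search       { push @calls, 'exact';       return $_[0] }
sub insensitive_search { push @calls, 'insensitive'; return "aaA line\n" }

# First search returns output (true): the right-hand side never runs.
my $x = exact_search("aaa line\n") || insensitive_search();
my @first_calls = @calls;
print "x=$x", "calls=@first_calls\n";

# First search returns an empty string (false): the fallback runs.
@calls = ();
my $x2 = exact_search('') || insensitive_search();
print "x2=$x2", "calls=@calls\n";
```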

    By the way: stop everything you're doing and put the following at the top of your script:

        use strict;
        use warnings;

    Then convert your variables to lexicals, as in "my $x;". You'll thank yourself later on. Hopefully you're not so far into something that this would create a huge amount of work for you. Doing this will require that you pass values into functions through parameters instead of global osmosis, and return values through 'return' or through parameter modification (where necessary). It may break what you're already working on to the point that you need to rewrite it. But if what you're working on is potentially going to grow into a larger project, it's worth refactoring to use lexicals instead of globals where possible.
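    A short sketch of the two styles mentioned above under strict, with hypothetical sub names: one sub takes its inputs from @_ and hands back a result with return; the other changes the caller's variable through @_, whose elements are aliases to the arguments that were passed in.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Style 1: take arguments from @_ and hand the result back with return.
sub count_exact {
    my ( $pattern, @lines ) = @_;
    return scalar grep { /\Q$pattern\E/ } @lines;
}

# Style 2: modify the caller's variable through @_. Elements of @_ are
# aliases, so assigning to $_[0] changes the variable that was passed in.
sub adopt_case {
    my ( $target, $text ) = @_;
    return 0 unless $text =~ /(\Q$target\E)/i;
    $_[0] = $1;
    return 1;
}

my @lines = ( "aaA here\n", "nothing\n" );
my $y     = 'aaa';

my $n = count_exact( $y, @lines );    # 0: no line has the exact case
adopt_case( $y, $lines[0] );          # $y now holds 'aaA'
print "exact lines: $n, y is now $y\n";
```

Returning a value is usually the clearer choice; aliasing through @_ is worth knowing about but easy to trigger by accident.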


    Dave

Re: Perl/Unix question; returning grep as a boolean and returning a match
by jwkrahn (Abbot) on Jun 04, 2011 at 00:13 UTC

    Based on your comments, this may work (UNTESTED):

        my $y     = 'aaa';
        my $unzip = 'zcat';
        my $dir   = '<PATH>';

        opendir my $DH, $dir or die "Cannot open directory '$dir' because: $!";

        # Skip '.', '..' and anything else that is not a plain file.
        foreach my $path ( grep { -f $_ } map "$dir/$_", readdir $DH ) {
            my ( $exact, $not_exact );
            open my $PIPE, '-|', $unzip, $path
                or die "Cannot open pipe from '$unzip' because: $!";
            while ( <$PIPE> ) {
                if ( /$y/ ) {
                    $exact = 1;
                    last;    # stopping early may make close() report a status from zcat
                }
                if ( /($y)/i ) {
                    $not_exact = $1;
                }
            }
            close $PIPE
                or warn $! ? "Error closing '$unzip' pipe: $!"
                           : "Exit status $? from '$unzip'";
            if ( ! $exact && defined $not_exact ) {
                $y = $not_exact;
            }
        }
        closedir $DH;
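    The inner while loop above can be pulled into a sub and exercised against an in-memory filehandle (open on a scalar reference), which makes the single-pass logic easy to check without any .gz files. The sub name scan_handle and the sample data are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper mirroring the inner while loop above: reads a
# filehandle once and reports whether $y matched exactly and, if not,
# the last case-insensitive spelling it saw (undef if none).
sub scan_handle {
    my ( $fh, $y ) = @_;
    my $not_exact;
    while ( my $line = <$fh> ) {
        return ( 1, undef ) if $line =~ /$y/;
        $not_exact = $1 if $line =~ /($y)/i;
    }
    return ( 0, $not_exact );
}

my $data = "foo AAA bar\nbaz aaA qux\n";
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
my ( $exact, $not_exact ) = scan_handle( $fh, 'aaa' );
close $fh;

print $exact ? "exact match\n" : "closest: " . ( $not_exact // 'none' ) . "\n";
```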
Re: Perl/Unix question; returning grep as a boolean and returning a match
by graff (Chancellor) on Jun 04, 2011 at 00:40 UTC
    You said:
    I need to find a string in a file path, and if it's not found search again, but insensitive to cases. I don't make it case insensitive in the first place because that makes it take a lot longer, and I'm looping this many times.

    That, along with the fact that you're using gzgrep (to find stuff in presumably large compressed files), raises a variety of warning signals for me. First, case-insensitive search vs. case-sensitive shouldn't have such a big impact on the run-time of gzgrep itself, though it might have a major impact on your perl script, depending on how many matches you get one way vs. the other way.

    Because you're using backticks, perl has to allocate memory for all the output, so if there's a lot of it, your script is likely to slow down a lot. jwkrahn's suggestion above avoids that problem, and also does what needs to be done in a single pass over the input, which is bound to be faster than going over the input twice.
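    If you'd rather not depend on an external zcat at all, the core modules IO::Compress::Gzip and IO::Uncompress::Gunzip can do the same streaming in Perl. This sketch swaps those modules in for the pipe; it first writes a small temporary .gz file (hypothetical data) so it is self-contained, then reads it back one line at a time, so only the current line is held in memory instead of the whole output as with backticks.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

# Build a small .gz file so the example is self-contained.
my ( undef, $path ) = tempfile( SUFFIX => '.gz', UNLINK => 1 );
my $sample = join '', map { "line $_\n" } 1 .. 1000;
$sample .= "the aaA we want\n";
gzip \$sample => $path or die "gzip failed: $GzipError\n";

# Stream the decompressed lines; stop at the first exact-case match.
my $z = IO::Uncompress::Gunzip->new($path)
    or die "gunzip failed: $GunzipError\n";
my ( $exact, $not_exact );
while ( defined( my $line = $z->getline() ) ) {
    if ( $line =~ /aaa/ ) { $exact = 1; last }
    $not_exact = $1 if $line =~ /(aaa)/i;
}
$z->close();

print $exact ? "exact\n" : "case-insensitive: $not_exact\n";
```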

    As for "looping this many times", I'm not sure what you mean by that. If it means going over the same input many times to look for different patterns, you can adapt jwkrahn's approach to combine multiple matches in a single pass over the file, either using a more elaborate regex (with capturing parens so you can tell what matched), or extending the sequence of if/elsif tests applied to each input line. Either way is likely to be faster than reading the same input over and over for each condition.
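    A sketch of the "more elaborate regex" option: build one case-insensitive alternation from a hypothetical list of terms, then record which term matched, and in which spelling, on a single pass. It assumes the terms are plain lowercase strings (they are quoted with quotemeta and looked up with lc).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical lowercase terms that would otherwise need one pass each.
my @terms = qw(aaa bbb ccc);

# One alternation; longest terms first so a shorter term cannot shadow a
# longer one sharing its prefix. quotemeta makes each term literal.
my $alt = join '|', map { quotemeta($_) } sort { length $b <=> length $a } @terms;
my $re  = qr/($alt)/i;

my $data = "xx AAA yy\nzz bbb\nnothing\nCcC and aaa\n";
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

my %seen;    # lowercased term => spellings actually found, in order
while ( my $line = <$fh> ) {
    while ( $line =~ /$re/g ) {
        push @{ $seen{ lc $1 } }, $1;
    }
}
close $fh;

for my $term (@terms) {
    my $found = $seen{$term} ? join( ', ', @{ $seen{$term} } ) : 'not found';
    print "$term: $found\n";
}
```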

Re: Perl/Unix question; returning grep as a boolean and returning a match
by Sandy (Curate) on Jun 03, 2011 at 19:22 UTC
    UPDATE: New code that only uses ONE gzgrep

    Hi,

    I am not sure I understand you correctly, but if I do, this should do what you want:

        #!/usr/bin/perl
        use strict;
        use warnings;

        my $y = 'Eclipse';
        my $p = 'myfile.gz';

        # $x will contain the line(s) that contain $y
        my $x = `gzgrep $y $p`;
        unless ( $x ) {
            print "could not find exact case, but...\n";
            $x = `gzgrep -i $y $p`;
        }
        if ( $x ) {
            print "found:\n$x\n";
        }
        else {
            print "nothing found\n";
        }
    Note that backticks (unlike system) return the command's output, i.e. what it would normally write to the terminal, rather than its exit status (which is still available afterwards in $?).

    The exception is error messages. Unless you explicitly redirect stderr to stdout (for example with 2>&1), anything written to stderr is still displayed on the screen and is not captured in $x.
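    A small sketch of both points, assuming a Unix shell: a tiny child command, run with the current perl ($^X), writes to both streams and exits with status 3. Plain backticks capture only its stdout, the exit status shows up in $?, and adding 2>&1 pulls stderr into the captured string as well.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Child command: prints 'out' on STDOUT, 'err' on STDERR, exits with 3.
my $cmd = qq{"$^X" -e "print 'out'; print STDERR 'err'; exit 3"};

# Plain backticks capture only STDOUT; 'err' still goes to the terminal.
my $stdout_only = `$cmd`;
my $status      = $? >> 8;    # the exit code is in $?, not in the output

# Redirecting STDERR into STDOUT (2>&1) captures both streams.
my $both = `$cmd 2>&1`;

print "stdout only: [$stdout_only], exit status $status\n";
print "with 2>&1:   [$both]\n";
```

The order of 'out' and 'err' in the combined string depends on the child's output buffering, so don't rely on it.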

    Only one gzgrep!

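        #!/usr/bin/perl
        use strict;
        use warnings;

        my $y = 'Eclipse';
        my $p = 'myfile.gz';

        # One case-insensitive gzgrep; $z holds all of its output as one string.
        my $z = `gzgrep -i $y $p`;

        # Split the output into lines (keeping newlines) and keep only the
        # lines that also match with exact case.
        my @x = grep { /$y/ } split /^/, $z;

        if ( @x ) {
            print "found:\n", @x;
        }
        else {
            print "could not find exact case, but...\n";
            if ( $z ) {
                print "found:\n$z\n";
            }
            else {
                print "found nothing\n";
            }
        }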