quasimojo321 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am pretty much a newbie to perl. Here is my question...
I understand that `grep xyz anyfile` can be expressed as
while <anyfile> { if (/xyz/) { print $_; } }
But how would I approximate grep -v in perl?
When I use (!/xyz/) I am getting some strange results: every line of the file is displayed, and the one line that does not match "xyz" is displayed twice at the end of the output. I should also mention that I am using a foreach $i (@list) where @list = qw (leroy brown). I am sure that I am not using the $_ variable correctly, but I don't know exactly what I am doing wrong.
SCRIPT:
    #!/usr/bin/perl -w
    @list = qw (leroy brown);
    $log="/home/psmith/logfile";
    open IN (<$log);
    while <IN> {
        foreach $i (@list){
            if (!/$i/) {
                print $_;
            }
        }
    }

OUTPUT:
    the quick brown fox jumped over the lazy dogs
    leroy black the badest man in the whole damn town
    live long and prosper
    live long and prosper

Any help would be appreciated!

Thanks


Edit 2001-07-09 ar0n -- New title

Re: Unix 'grep -v' equivalency in Perl (was: Perl Regex Question)
by runrig (Abbot) on Jul 09, 2001 at 22:54 UTC
    use next to skip the print if you match (also look into the qr operator in perlop to make this more efficient):
    INPUT: while (<IN>) {
        for my $match (@list) {
            next INPUT if /$match/;
        }
        print;
    }
    Or join the words in your list into a single alternation (if they contain no metacharacters):
    my $re = join "|", @list;
    while ... {
        print unless /$re/;
    }
    Update: Also, get in the habit of using strict. It may seem like a hassle at first, but it makes things more maintainable in the long run.
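A minimal sketch of the two suggestions above, combining the labeled next with patterns precompiled by qr. The sub name lines_not_matching and the sample lines are made up for illustration, and the sketch filters an array where the thread's script reads from a filehandle.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines that match none of the words.
sub lines_not_matching {
    my ($words, $lines) = @_;

    # Compile each pattern once, outside the loop over the lines.
    my @patterns = map { qr/$_/ } @$words;

    my @kept;
    LINE: for my $line (@$lines) {
        for my $re (@patterns) {
            next LINE if $line =~ $re;   # skip this line on the first match
        }
        push @kept, $line;               # reached only if nothing matched
    }
    return @kept;
}

my @sample = (
    "the quick brown fox\n",
    "leroy black\n",
    "live long and prosper\n",
);
print lines_not_matching([qw(leroy brown)], \@sample);
```

Each line is pushed at most once, because the label sends control straight to the next line as soon as any pattern matches.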
      To be more idiomatic, you should ensure that your @list doesn't contain anything that will get the regex bent out of shape. A simple application of quotemeta will help set things straight:
      my $re = join ("|", map {quotemeta} @list);
      while (...) {
          next if /$re/;
          print;
      }
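To see what quotemeta buys you, here is a small sketch with words that contain regex metacharacters. The sub name literal_alternation and the sample words are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: build one pattern that matches the words literally.
sub literal_alternation {
    my @words  = @_;
    my $joined = join "|", map { quotemeta } @words;
    return qr/$joined/;
}

my $re = literal_alternation("a.b", "c++");

# "c++" is matched as plain text instead of being read as quantifiers.
print "literal match\n" if "uses c++ here" =~ $re;

# The dot in "a.b" no longer matches an arbitrary character.
print "no false match\n" unless "axb" =~ $re;
```

Without the escaping, "c++" would not even compile as a pattern, and "a.b" would also match strings such as "axb".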
      I'm not a huge fan of the 'next LABEL' command. It's too much like 'goto', which is one of those things that shouldn't be shown in public.
        I don't think the purported similarity to goto is a good reason to not use Perl's loop control facilities. OK, the construct is abusable (what construct is not?), but when used appropriately I think that it clarifies things immensely.

        FWIW the following article on Loop Exits and Structured Programming helped shape my thinking on this.

Re: Unix 'grep -v' equivalency in Perl (was: Perl Regex Question)
by arturo (Vicar) on Jul 09, 2001 at 23:48 UTC

    In addition to the other advice, Perl makes possible neat idioms that I find are quite readable (adding ! to a regex match, while sometimes useful, can help you get lost in a sea of symbols):

    while (<IN>) {
        # print if /$pattern/;     # normal "grep"
        print unless /$pattern/;   # grep -v
    }

    $_ is the "default" argument to print, as well as to the pattern match. The "grep -v" line is equivalent to:

    print $_ unless $_ =~ /$pattern/;
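The same default-variable idiom works with Perl's built-in grep function when the lines are already in a list. This is only a sketch; the pattern and the sample lines are assumed.

```perl
use strict;
use warnings;

my $pattern = qr/brown/;                       # assumed sample pattern
my @lines   = ("brown fox\n", "lazy dogs\n");  # assumed sample input

# Inside the block, $_ is each element in turn, just as in the while loop.
my @matching     = grep {  /$pattern/ } @lines;   # like grep
my @not_matching = grep { !/$pattern/ } @lines;   # like grep -v

print @not_matching;
```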

    HTH

    perl -e 'print "How sweet does a rose smell? "; chomp ($n = <STDIN>); +$rose = "smells sweet to degree $n"; *other_name = *rose; print "$oth +er_name\n"'
Re: Unix 'grep -v' equivalency in Perl (was: Perl Regex Question)
by dsb (Chaplain) on Jul 09, 2001 at 22:59 UTC
    First of all, you are not opening the file right. Try:
    open(IN,$log) || die $!, "\n";
    Always have an '|| die' when you are opening a file. If the file fails to open and you continue anyway, you'll only get complaints later about working on a closed filehandle, and your script won't do anything useful.

    Second of all, you seem to be using '$_' correctly, so I wouldn't worry about that.

    However in order to get a non-match you would need to use the expression:

    if ( $_ !~ m/$i/ ) { print; }
    The key there being the '!~' operator.

    Update: I've been advised to not use the '|| die' notation and use instead 'or die'. As far as I can see this has to do with the trouble that could be caused by combining this notation with a function call that does not use parentheses.

    The higher precedence of '||' causes Perl to see the call differently than if the 'or die' notation were used (when making a function call sans parentheses). Example:

    open FH, "filename" or die $!;
    # reads as it's meant to: open(FH, "filename") or die $!;

    open FH, "filename" || die $!;
    # reads as: open FH, ("filename" || die $!);
    # which is not good, since the only time the expression is treated as
    # false is when the filename is itself a false value (0, "", or an
    # undefined scalar). An unrecognized file name is not evaluated as
    # false, so the die serves no purpose and the script continues to run.
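The precedence difference can be observed directly. The sketch below is an illustration under assumptions: the path is made up and presumed not to exist, the helper dies is hypothetical, and it uses the three-argument form of open with a lexical filehandle instead of the bareword form shown above.

```perl
use strict;
use warnings;

# Assumed path that does not exist, used only to force open to fail.
my $missing = "/no/such/dir/no-such-file";

# Hypothetical helper: returns 1 if the code dies, 0 otherwise.
sub dies {
    my ($code) = @_;
    return eval { $code->(); 1 } ? 0 : 1;
}

# 'or' binds loosely: (open ...) or die
my $with_or = dies(sub { open my $fh, '<', $missing or die "open failed: $!" });

# '||' binds tightly: open ..., ($missing || die); the die is never reached
my $with_pipes = dies(sub { open my $fh, '<', $missing || die "open failed: $!" });

print "or died: $with_or, || died: $with_pipes\n";
```

With a path that really is missing, only the 'or' version dies; the '||' version fails silently because the non-empty file name is already true.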

    Amel - f.k.a. - kel

      Just a brief addition to this. You'll get more information by leaving off the "\n".

      open(IN,$log) || die $!, "\n";
      #or
      #open(IN,$log) || die "$!\n";

      Gives the message:

      No such file or directory

      While

      open(IN,$log) || die $!;

      gives

      No such file or directory at ./die line 5.

      This behavior is documented under die in perldoc perlfunc (perldoc -f die).
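A quick way to compare the two forms without a failing open is to catch the message with eval. This is a sketch; message_from is a hypothetical helper and the message text is simply copied from the output above.

```perl
use strict;
use warnings;

# Hypothetical helper: capture the message die would have printed.
sub message_from {
    my ($text) = @_;
    eval { die $text };
    return $@;
}

my $with_newline    = message_from("No such file or directory\n");
my $without_newline = message_from("No such file or directory");

print $with_newline;      # the message only
print $without_newline;   # the message, then " at FILE line N."
```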

      After Compline,
      Zaxo

      Yet another brief addition: the message in $! tells you the cause of the error, but not its source. Zaxo already told you that leaving out the newline makes die report which line of code generated the error.

      However, that's not immediately informative. The command probably uses a variable to generate the syscall, so in general it's a good idea to include that value in the message:

      open( INPUT, $filename ) or die "Could not open $filename: $!";
      # or
      rename( $old, $new ) or die "Could not rename $old to $new: $!";

      As a third step, you can use Carp to print out the chain of callers. This is useful when your code is distributed over more than one file.
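A minimal sketch of the Carp step, assuming a made-up helper named open_for_reading and a path that does not exist. croak is used here; Carp's confess is the variant that always prints the full chain of callers.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper: open a file for reading, or report the failure
# together with the file name and the system error.
sub open_for_reading {
    my ($filename) = @_;
    open my $fh, '<', $filename
        or croak "Could not open $filename: $!";
    return $fh;
}

# Assumed missing path; eval keeps the demonstration from exiting.
my $ok = eval { open_for_reading("/no/such/dir/no-such-file"); 1 };
print "Caught: $@" unless $ok;
```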

Re: Unix 'grep -v' equivalency in Perl (was: Perl Regex Question)
by Anonymous Monk on Jul 09, 2001 at 23:36 UTC
    You get what you have written:
    -print if line doesn't match the first
    -OR print if line doesn't match the second.

    That means the first line is printed because it doesn't match leroy, and the second is printed because it doesn't match brown. The third is printed for both reasons, so it appears twice. I think you want a nice flag that is set at every line and reset at every match.
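One way to read the flag suggestion, as a sketch only: the sub name filter_with_flag is made up, and the sample lines are taken from the output in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: decide once per line, so nothing is kept twice.
sub filter_with_flag {
    my ($words, $lines) = @_;
    my @kept;
    for my $line (@$lines) {
        my $wanted = 1;                        # set the flag for every line
        for my $word (@$words) {
            $wanted = 0 if $line =~ /$word/;   # reset it on any match
        }
        push @kept, $line if $wanted;          # act after all words are checked
    }
    return @kept;
}

my @sample = (
    "the quick brown fox jumped over the lazy dogs\n",
    "leroy black the badest man in the whole damn town\n",
    "live long and prosper\n",
);
print filter_with_flag([qw(leroy brown)], \@sample);
```

The print decision moves out of the inner loop, which is what stops the non-matching line from showing up once per word.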
      #!/usr/bin/perl -w
      use strict;

      my @avoid= map qr/\Q$_\E/i, qw(leroy brown);
      my $log= "/home/psmith/logfile";
      open( IN, "< $log" ) or die "Can't read $log: $!\n";
      my $line;
      while( $line= <IN> ) {
          print $line unless grep $line =~ $_, @avoid;
      }

      If @avoid is quite large, then having grep always match against all of them might be worth avoiding. Maybe one day grep will be optimized for this case. (: Until then:

      while( $line= <IN> ) {
          for( @avoid ) {
              if( $line =~ $_ ) {
                  print $line;
                  last;
              }
          }
      }
      Update: except for that being backwards. Try:
      while( $line= <IN> ) {
          for( @avoid ) {
              if( $line =~ $_ ) {
                  $line= "";
                  last;
              }
          }
          print $line;
      }
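A different way to get the same early exit, not the one used in the post above, is the any function from List::Util, which stops at the first pattern that matches. This sketch assumes a List::Util recent enough to provide any (version 1.33 or later); should_skip is a hypothetical name.

```perl
use strict;
use warnings;
use List::Util qw(any);

my @avoid = map { qr/\Q$_\E/i } qw(leroy brown);

# Hypothetical helper: true if the line matches any pattern in @avoid.
sub should_skip {
    my ($line) = @_;
    return any { $line =~ $_ } @avoid;
}

for my $line ("Leroy black\n", "live long and prosper\n") {
    print $line unless should_skip($line);
}
```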

              - tye (but my friends call me "Tye")
        Your second code prints if in @avoid. I hope it's Ironic(TM).
Re: Unix 'grep -v' equivalency in Perl (was: Perl Regex Question)
by quasimojo321 (Initiate) on Jul 10, 2001 at 01:19 UTC
    Thank you all for your help!
    I'll be trying each of these options independently and running them with timex to see which gets the best results.
    Both @list and the db logfile I will be running this script against are extremely large (hence my usage of perl vs. grep, as one of you so astutely pointed out!). Again, thank you for all the help!