McMahon has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks...

It might just be me being tired, but I have this code...

my @strings;
my $stringFile = <STDIN>;
chomp $stringFile;
if ($stringFile eq "") {
    print "No file of exclude strings specified.\n";
}
else {
    open (STRINGFILE, "$stringFile") or die "Couldn't open $stringFile:$!\n";
    @strings = <STRINGFILE>;
}

#BLAH BLAH BLAH

foreach my $file1(@file1only) {
    foreach my $string(@strings) {
        chomp $string;
        unless ($file1 =~ $string) {
            print OUT $file1;
            last;
        }
    }
}

If <STRINGFILE> has something in it, everything works fine. But if STRINGFILE isn't specified, @strings is null. I did not expect ($file1 =~ $string) to always be true if the right side of the expression is null.

My instinct is to handle the cases (STRINGFILE exists; STRINGFILE does not exist) with subroutines, but that seems like an awful lot of code to solve a really simple problem.

So any suggestions would be welcome...

Replies are listed 'Best First'.
Re: Help with null string behavior in regex?
by hv (Prior) on May 03, 2004 at 22:03 UTC

    Your logic in the final double loop seems strange:

    LINE: for each line of the file {
        PATTERN: for each exclude string {
            if it matches the exclude string
                skip to the next PATTERN
            else
                print the line
                skip to the next LINE
        }
    }

    When there are no exclude strings (either because the file was not specified, or because it was empty), the code never enters the inner loop and the line is never printed: I suspect (though it isn't entirely clear) that this is the behaviour you are seeing.

    I'd expect instead logic something more like this:

    LINE: for each line of the file {
        PATTERN: for each exclude string {
            if we must exclude this line
                skip to the next LINE
        }
        we weren't excluded by any pattern, so print the line
    }

    If this is the logic you really wanted, you can achieve it with code something like:

    LINE: foreach my $line (@file1only) {
        foreach my $exclude (@strings) {
            chomp $exclude;
            next LINE if $line =~ $exclude;
        }
        # we weren't excluded by any of the patterns
        print OUT $line;
    }
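
    A self-contained sketch of the same labelled-loop idea, for anyone who wants to try it in isolation. It assumes the exclude strings are literal text rather than regexes (hence the \Q...\E quoting); the helper name filter_lines and the sample data are made up for illustration. Empty exclude strings are skipped because an empty pattern has a special meaning in Perl (it reuses the last successfully matched pattern).

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines that contain none of the exclude strings.
sub filter_lines {
    my ($lines, $excludes) = @_;
    my @kept;
    LINE: foreach my $line (@$lines) {
        foreach my $exclude (@$excludes) {
            next unless length $exclude;          # ignore empty exclude strings
            next LINE if $line =~ /\Q$exclude\E/; # excluded: move on to the next line
        }
        # no exclude string matched, so keep the line
        push @kept, $line;
    }
    return @kept;
}

# Sample data (an assumption, not taken from the original post)
my @file1only = ("alpha.txt\n", "beta.log\n", "gamma.txt\n");

print filter_lines(\@file1only, [".log"]);  # drops beta.log
print filter_lines(\@file1only, []);        # no excludes: every line is kept
```

    With an empty exclude list the inner loop body never runs, so every line falls through to the push, which is the behaviour the original code was missing.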

    Hugo

      This makes a lot of sense, thanks hv!
      I hadn't seen that "LINE:" syntax before, but I know I've *wanted* it several times in the past.
      One other thing, if you're going to go that route, would be to consider consolidating all the exclude strings into one pattern. Something like this.
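      my $re;
      if (@strings) {
          chomp @strings;
          $re = join "|", map {quotemeta} @strings;
      }
      else {
          $re = undef;
      }
      foreach my $line (@file1only) {
          # match against $line explicitly; a bare /$re/ would test $_ instead
          print OUT $line unless defined($re) && $line =~ /$re/;
      }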
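
      As a rough, standalone illustration of the single-pattern approach: the sketch below builds one alternation out of literal strings and compiles it once. The helper name build_exclude_re and the sample strings are hypothetical. It returns undef when there is nothing to exclude, so the caller never matches against an empty pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: builds one compiled alternation from literal exclude
# strings, or returns undef when there is nothing to exclude.
sub build_exclude_re {
    my @strings = grep { length } @_;   # drop empty strings
    return unless @strings;             # undef in scalar context
    my $alternation = join "|", map { quotemeta } @strings;
    return qr/$alternation/;
}

my $re = build_exclude_re("foo", "a.b");
for my $line ("foo bar\n", "axb\n", "a.b\n", "baz\n") {
    # "axb" survives because quotemeta escaped the dot in "a.b"
    print $line unless defined $re && $line =~ $re;
}
```

      The quotemeta call is what makes "a.b" match only a literal dot; without it the dot would match any character.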
Re: Help with null string behavior in regex?
by diotalevi (Canon) on May 03, 2004 at 21:40 UTC

    That isn't right. It isn't that $file1 =~ $string is always true; it is that you never get to the inner guts of the for() if there is nothing in @strings. It is also true that ... =~ // matches. Maybe you had a blank line in your input.

    I've included a slightly different and to my mind nicer version of your code. You may find it helpful.

    my $pattern_file = <STDIN>;
    chomp $pattern_file;
    open my $patterns, "<", $pattern_file
        or die "Can't open '$pattern_file' for reading: $!";
    my @patterns = <$patterns>;
    chomp @patterns;
    @patterns = grep length(), @patterns;
    @patterns = map qr/$_/, @patterns;

    ...

    foreach my $file ( ... ) {
        # All patterns must match
        foreach my $pattern ( @patterns ) {
            next if $file =~ $pattern;
            print OUT "'$file' !~ /$pattern/\n";
            last;
        }
    }

      I see what this is doing, but I'm going to the shed with the Camel before I'll be comfortable maintaining it.

      BTW, the kluge I used to solve this was
      if ($stringFile eq "") {
          print "No file of exclude strings specified\n";
          @strings = "xyzzy";
      }

      which is effective, but it makes me feel all dirty. =)

        I hoped that what I was doing was clear enough. The real improvements were doing a chomp() on the whole list at once, only once, removing any empty elements, then pre-compiling all the patterns.

        I also changed your for ( ... ) { unless ( ... ) { ...; last } } because it read strangely for me. unless() is like that, sometimes it just reads strangely. Since the intent of that code was to be an "everything matches or I print the file name" I just skipped right to the next pattern if there was a match and made the exception stand out.
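
        The three clean-up steps described above (chomp the whole list once, drop empty entries, pre-compile) can be tried on their own with something like the sketch below. The helper name compile_patterns and the sample input are assumptions for illustration, not part of the code posted earlier.

```perl
use strict;
use warnings;

# Hypothetical helper: turns raw lines read from a pattern file into
# compiled regexes, dropping blank lines along the way.
sub compile_patterns {
    my @raw = @_;
    chomp @raw;                           # strip newlines from the whole list at once
    my @nonempty = grep { length } @raw;  # discard empty entries
    return map { qr/$_/ } @nonempty;      # compile each pattern exactly once
}

# Sample input, standing in for lines read from a pattern file
my @patterns = compile_patterns("foo\n", "\n", "ba[rz]\n");

for my $name ("foo.txt", "baz.txt", "qux.txt") {
    my $hits = grep { $name =~ $_ } @patterns;   # count of patterns that match
    print "$name matched $hits pattern(s)\n";
}
```

        Dropping the blank entry before compiling means no empty pattern ever reaches the match operator.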

Re: Help with null string behavior in regex?
by Belgarion (Chaplain) on May 03, 2004 at 21:34 UTC

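    If a variable is undef it will evaluate to an empty string when used in a regex. An empty pattern on the right side will match any string on the left (unless an earlier match succeeded, in which case Perl reuses that last successful pattern instead).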

    For example:

    print "Match 1\n" if "teststring" =~ "";
    print "Match 2\n" if "teststring" =~ undef;
    __OUTPUT__
    Match 1
    Match 2