mwunderlich has asked for the wisdom of the Perl Monks concerning the following question:

Hi guys,

My first posting here, so please forgive me any blunders that I might ignorantly commit.

I am trying to parse a text file line by line using the lookaround mechanism from the RegEx functions. In essence, I want to match any line that contains a search term, but I want to exclude those instances in which that search term is preceded or followed by certain other terms. So, here is the code that I thought would work (just the relevant bits):
open (MYFILE, 'RegExTest.txt');

#Init variables
$searchTerm = "searchForThis";
$pre = "NotBefore";
$post = "NotAfter";

#Loop through the file
while (<MYFILE>) {
    $line = $_;
    if($line =~ m/(?<!$pre)$searchTerm?!$post/) {print "Match found: " . $line};
}
close(MYFILE);
However, this isn't working as expected. If I remove the lookahead and lookbehind clauses from the regex, it does work. Any ideas why?

Cheers,

Martin

Replies are listed 'Best First'.
Re: Using lookaround with variables
by ikegami (Patriarch) on Dec 16, 2008 at 17:51 UTC
    /(?<!$pre)$searchTerm?!$post/
    should be
    /(?<!$pre)$searchTerm(?!$post)/
    Actually, if $pre, $searchTerm and $post are text rather than regexp patterns, it should be
    /(?<!\Q$pre\E)\Q$searchTerm\E(?!\Q$post\E)/
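For readers who want to try the corrected pattern, here is a minimal sketch, assuming all three variables hold literal text. The helper name matches_unguarded and the sample lines are made up for illustration; they are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: true when $line contains $term at a position where it
# is not preceded by $pre and not followed by $post. All three arguments are
# treated as literal text, hence the \Q...\E around each.
sub matches_unguarded {
    my ($line, $pre, $term, $post) = @_;
    return $line =~ /(?<!\Q$pre\E)\Q$term\E(?!\Q$post\E)/ ? 1 : 0;
}

# Made-up sample lines.
my @samples = (
    "xx searchForThis yy",
    "NotBeforesearchForThis",
    "searchForThisNotAfter",
);

for my $line (@samples) {
    printf "%-26s %s\n", $line,
        matches_unguarded($line, 'NotBefore', 'searchForThis', 'NotAfter')
            ? 'match' : 'no match';
}
```

Note that a line counts as a match as soon as one occurrence of the search term passes both checks, even if another occurrence on the same line is guarded.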
      Actually, if $pre, $searchTerm and $post are text rather than regexp patterns, ...

      Actually, if $pre contains anything that would involve a variable-length match, it just won't work with the look-behind operator:

      perl -e '$pre="x+"; $_="xxFindme"; print "found" if(/(?<=$pre)Findme/)'
      # dies with the message:
      Variable length lookbehind not implemented in regex; marked by <-- HERE in m/(?<=x+)Findme <-- HERE / at -e line 1.
      That is, the regex quantifiers "*", "?", "+" and "{n,m}" cannot be used as such when doing a look-behind (though they will work with look-ahead). If the OP's $pre might ever contain one of these things, putting \Q...\E around it in the regex -- treating them as literals -- will at least make sure the script doesn't crash. -- update: and as ikegami explains below, there's a better way to handle this

      (I'm sure ikegami knows this already, but it might not have been clear to mwunderlich or other readers.)

        While you're right that very few patterns work in a lookbehind, it's wrong to add \Q..\E based on the possibility that the variable might contain patterns that aren't allowed in lookbehinds. Whether \Q..\E should be used or not is strictly based on whether $pre contains a pattern or literal text. Your advice would break the pattern /a.b/, for example.

        If you want to catch invalid patterns, you need eval BLOCK, not \Q..\E.
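
A rough sketch of the eval BLOCK approach, assuming $pre is meant to be a pattern rather than literal text. The helper name compile_lookbehind is hypothetical, and the exact error text depends on the perl version.

```perl
use strict;
use warnings;

# Hypothetical helper: try to compile a user-supplied pattern inside a
# look-behind. Returns the compiled regex (undef if perl rejected it)
# together with the error text from the failed compilation.
sub compile_lookbehind {
    my ($pre, $term) = @_;
    my $re = eval { qr/(?<=$pre)\Q$term\E/ };
    return ($re, $@);
}

for my $pre ('a.b', 'x+') {
    my ($re, $err) = compile_lookbehind($pre, 'Findme');
    if (defined $re) {
        print "'$pre' is usable in a look-behind\n";
    }
    else {
        print "'$pre' was rejected: $err";
    }
}
```

The pattern a.b keeps its meaning as a pattern here, while x+ is caught at compile time instead of killing the script.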

Re: Using lookaround with variables
by GrandFather (Saint) on Dec 16, 2008 at 23:10 UTC

    Nothing to do with your question (more an initiation rite), but there are a few things you could tidy up in your code and save yourself some bother in the future:

    First, always use strictures (use strict; use warnings;).

    Next, use the three parameter version of open, use lexical file handles and check the result:

    open my $inFile, '<', 'RegExTest.txt'
        or die "Failed to open RegExTest.txt: $!\n";

    Perl's payment curve coincides with its learning curve.
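
Putting those suggestions together with the loop from the original post might look like the sketch below. To keep it runnable without a file on disk, it reads from an in-memory string standing in for RegExTest.txt; that stand-in and its sample lines are assumptions for illustration only.

```perl
use strict;
use warnings;

# Stand-in for RegExTest.txt: opening a reference to a scalar reads from
# that string instead of a file on disk. The sample lines are made up.
my $contents = "keep searchForThis here\n"
             . "NotBeforesearchForThis\n"
             . "searchForThisNotAfter\n";

my ($pre, $searchTerm, $post) = ('NotBefore', 'searchForThis', 'NotAfter');

# Three-argument open, lexical file handle, result checked.
open my $inFile, '<', \$contents
    or die "Failed to open in-memory file: $!\n";

my @hits;
while (my $line = <$inFile>) {
    push @hits, $line
        if $line =~ /(?<!\Q$pre\E)\Q$searchTerm\E(?!\Q$post\E)/;
}
close $inFile;

print "Match found: $_" for @hits;
```

With a real file, replace \$contents with the file name, as in the open line above.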
      Thanks a lot everyone for all the hints. Just to clarify, the variables would contain plain text, not a pattern. I have now tried the following two expressions, but neither of them works:
      if( $line =~ m/(?<!\Q$pre\E)\Q$searchTerm\E(?!\Q$post\E)/ ) {print $line};
      if( $line =~ m/(?<!$pre)$searchTerm(?!$post)/ ) {print $line};
      Strangely enough, if I put in the contents of the variables $pre and $post directly into the RegEx, then it does work. Any ideas why that might be?

      Cheers,

      Martin
        OK, so investigating this further, it seems that the error is not actually caused by the RegEx, but rather by the code that follows it, which I find even stranger.

        The following works:
        while (<MYFILE>) {
            $line = $_;
            if( $line =~ m/(?<!$pre)$searchTerm(?!$post)/ ) {print $line};
        }
        The following doesn't work:
        while (<MYFILE>) {
            $line = $_;
            if( $line =~ m/(?<!$pre)$searchTerm(?!$post)/ ) {print $line};
            if ($pre = "" and $post = "") {
                print "Case 1";
            } elsif ($pre = "") {
                print "Case 2";
            } elsif ($post = "") {
                print "Case 3";
            } else {
                print "Case 4";
            }
            print "\n";
        }
        Any ideas why??

        Cheers,

        Martin
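
One possible explanation, offered as a guess rather than a confirmed diagnosis: inside those if conditions, = assigns rather than compares, so $pre and $post may be overwritten with empty strings during the first pass through the loop. An empty negative lookahead, (?!), can never succeed, so later lines would stop matching. String comparison in Perl is spelled eq. A minimal sketch, where classify is a made-up helper:

```perl
use strict;
use warnings;

# Hypothetical helper: classify the two guard strings using string
# comparison (eq), which leaves both values untouched.
sub classify {
    my ($pre, $post) = @_;
    return 'Case 1' if $pre eq '' and $post eq '';
    return 'Case 2' if $pre eq '';
    return 'Case 3' if $post eq '';
    return 'Case 4';
}

my ($pre, $post) = ('NotBefore', 'NotAfter');
print classify($pre, $post), "\n";
print "guards afterwards: '$pre' and '$post'\n";

# For contrast: "=" stores the empty string and returns it, and the empty
# string is false, so a test written with "=" never takes its branch.
my $copy   = $pre;
my $result = ($copy = '');
print "copy after assignment: '$copy'; as a condition: ",
    ($result ? 'true' : 'false'), "\n";
```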