in reply to Another Look behind

The essence of the variable-width negative look-behind hack is this: some part of the string that is needed for a subsequent match is matched first and then "consumed" by (*SKIP), which prevents backtracking, at the point where the match is forced to fail. (The \K operator already provides very nice variable-width positive look-behind.)
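As an aside on the \K remark, here is a minimal sketch of what "variable-width positive look-behind" means in practice. The sample string and variable names are made up for illustration. A true look-behind such as (?<= and \s+) is not accepted because its width is unbounded, whereas \K simply requires everything before it to match and then leaves it out of the matched text.

```perl
use strict;
use warnings;
use 5.010;    # \K needs Perl 5.10 or later

# Hypothetical sample text; the run of whitespace has no fixed width.
my $text = "J. Pure and \t  Appl. Phys.";

# Everything before \K must match, but is excluded from the matched
# text, so the substitution replaces only 'Appl'.
(my $changed = $text) =~ s{ and \s+ \K Appl }{APPL}xms;

say $changed;
```

The 'and' and the whitespace are required for the match but survive the substitution untouched, which is the behaviour one would want from a positive look-behind.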

In the example below, the part that is "consumed" consists of some whitespace plus the entire 'Appl. Phys.' piece. If only a single whitespace character were guaranteed always to be present before 'Appl. Phys.', and this character were required for the subsequent match, it would have been enough to consume only this character. That seemed too fragile to me: more whitespace can easily creep in. Also, I have captured nothing because I don't understand just what you want from these captures: e.g., capturing the meta-quoted 'Appl. Phys.' is pointless because it's never going to change. What did you really want from these captures? (I'm also using non-capturing, atomic (?>pattern) groups here rather than simple non-capturing (?:pattern) groups because I think it makes reasoning about this trick a little easier.)

c:\@Work\Perl>perl -wMstrict -le
"use 5.010;
 ;;
 my $j = quotemeta 'Appl. Phys.';
 ;;
 LINE:
 for my $l (
   'R.N. Raox, J. Pure and Appl. Phys.',
   'R.N. Raox, J. Pure & Appl. Phys.',
   'R.N. Raox, J. Pure or Appl. Phys.',
   'R.N. Raox, J. Pure and or Appl. Phys.',
   'N.E. One, Fly Fishing and Appl. Phys.',
   'N.E. One, Fly Fishing or Appl. Phys.',
   ) {
   next LINE unless $l =~ m{ (?> (?> and | &) \s+ $j (*SKIP)(*F))? \s+ $j }xms;
   print qq{match: '$l'};
   }
"
match: 'R.N. Raox, J. Pure or Appl. Phys.'
match: 'R.N. Raox, J. Pure and or Appl. Phys.'
match: 'N.E. One, Fly Fishing or Appl. Phys.'
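To see what (*SKIP) contributes, and why consuming only a single whitespace character is fragile, the pattern above can be compared with two deliberately weakened variants. This is only a sketch: the hash keys, the variant patterns and the shortened sample strings are my own and are not part of the original example.

```perl
use strict;
use warnings;
use 5.010;

my $j = quotemeta 'Appl. Phys.';

# Hypothetical variants, for comparison only.
my %rx = (
    # the pattern from the one-liner above
    full      => qr{ (?> (?> and | &) \s+ $j (*SKIP)(*F))? \s+ $j }xms,
    # same, but without (*SKIP): the engine may retry further along
    no_skip   => qr{ (?> (?> and | &) \s+ $j (*F))?        \s+ $j }xms,
    # consume just one whitespace character instead of the whole piece
    one_space => qr{ (?> (?> and | &) \s     (*SKIP)(*F))? \s  $j }xms,
);

my @lines = (
    'J. Pure and Appl. Phys.',     # one space after 'and'
    'J. Pure and  Appl. Phys.',    # two spaces after 'and'
    'J. Pure or Appl. Phys.',      # the only line that should match
);

my %got;
for my $name (sort keys %rx) {
    for my $l (@lines) {
        $got{$name}{$l} = $l =~ $rx{$name} ? 'match' : 'no match';
        printf "%-9s %-8s '%s'\n", $name, $got{$name}{$l}, $l;
    }
}
```

Without (*SKIP), the forced failure only rejects the attempt that starts at 'and'; a later attempt starting at the whitespace in front of 'Appl. Phys.' still succeeds. The single-character variant rejects the one-space line but lets the two-space line through, because the second space is still available for the final match.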

Update: See the usual suspects: perlre (esp. Special Backtracking Control Verbs), perlretut, perlrequick.


Give a man a fish:  <%-(-(-(-<

Re^2: Another Look behind
by dominic01 (Sexton) on Mar 13, 2015 at 16:54 UTC

    First, your solution works perfectly for me. Next, you are right that some of my "capturing" doesn't make any sense; I was just testing a few things by capturing it.

    Next, I don't understand how to use the variable-width negative look-behind with (*SKIP)(*FAIL) when I have a huge regex. For example:

    $Line =~ /^(Some_RegEx) (Some_RegEx) (Some_RegEx)(?<!foo.*|some text) +$Jrnl (Some_RegEx)$/;
    I'd appreciate any pointers in this regard.

      I don't have a good idea of your difficulties, but here's a generalized approach. I can't provide a working example at the moment, so the following is untested handwaving.

      The (*SKIP)(*FAIL) variable-width negative look-back hack works by messing up a match of something you need to have for an overall match. For a large regex, I tend to take the approach of factoring regex elements:

      # stuff we want to capture
      my $capture_this = qr{ ... }xms;
      my $capture_too  = qr{ ... }xms;
      my $capture_also = qr{ ... }xms;
      my $capture_more = qr{ ... }xms;

      # stuff we want to cause match failure if before certain other stuff
      my $avoid_this = qr{ ... }xms;
      my $avoid_too  = qr{ ... }xms;
      my $avoid_also = qr{ ... }xms;
      my $negatory   = qr{ (?> [aeiou]+ | f[eio]e? | $avoid_too) }xms;

      # stuff we need for an overall match, may or may not be captured
      my $needed_for_match = qr{ ... }xms;
      my $needed_too       = qr{ ... }xms;

      my $string = get_stringy_stuff();

      my ($this, $too, $also, $yet_another) = $string =~ m{
          \A
          ($capture_this) ($capture_too) ...
          (?> (?> $avoid_this | $avoid_too | $avoid_also | etc) $needed_for_match (*SKIP)(*F))?
          $needed_for_match    # needed for overall match
          ($capture_also) ...
          (?> $negatory $needed_too (*SKIP)(*FAIL))?
          ($needed_too)        # needed for overall match, also captured
          \z
          }xms;

      do_something_with($this, $too, $also, $yet_another);
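For what it's worth, here is one concrete, runnable instance of the factored skeleton above. It is a sketch under assumptions, not the method for your actual data: the citation layout (author, comma, free text, journal, volume), every variable name and all the sample lines are invented for illustration.

```perl
use strict;
use warnings;
use 5.010;

# Hypothetical building blocks; names and layout are assumptions.
my $author  = qr{ [^,]+ }xms;                 # captured
my $lead_in = qr{ .*? }xms;                   # captured: text before the journal
my $volume  = qr{ \d+ }xms;                   # captured
my $avoid   = qr{ (?> \b and \b | & ) }xms;   # must not precede the journal
my $journal = quotemeta 'Appl. Phys.';        # needed for an overall match

my $citation = qr{
    \A
    ($author) , \s+
    ($lead_in)
    (?> $avoid \s+ $journal (*SKIP)(*F))?     # force failure after and/&
    \s+ ($journal)                            # needed for match, also captured
    \s+ ($volume)
    \z
}xms;

for my $line (
    'R.N. Raox, J. Pure and Appl. Phys. 12',
    'R.N. Raox, J. Pure or Appl. Phys. 12',
    'N.E. One, Fly Fishing & Appl. Phys. 7',
) {
    if (my ($who, $before, $jrnl, $vol) = $line =~ $citation) {
        say "match: who='$who' before='$before' journal='$jrnl' vol='$vol'";
    }
    else {
        say "no match: '$line'";
    }
}
```

Because the whole pattern is anchored with \A, once (*SKIP)(*F) fires there is no later starting position that can succeed, so the line is rejected outright. The lazy $lead_in grows one character at a time, so the poisoned group gets its chance at 'and' or '&' before the plain journal match is tried from the whitespace that follows.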



        This is a perfect explanation. Thank you very much.