The essence of the variable-width negative look-behind hack is that some part of the string that a subsequent match would need is matched and then "consumed" by (*SKIP) when the match is forced to fail with (*F). Backtracking into (*SKIP) abandons the current match attempt, and the next attempt starts at the position where (*SKIP) was reached, so the consumed text can never be reused by a later match. (The \K operator already provides very nice variable-width positive look-behind.)
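A minimal standalone sketch of both idioms side by side. The string of currency amounts and the 'USD' prefix are made up purely for illustration: \K gives the positive case (digits preceded by 'USD' plus any amount of whitespace), and (*SKIP)(*F) gives the negative case (digits NOT so preceded).

```perl
use strict;
use warnings;
use 5.010;  # for say and the (*SKIP)/(*F) verbs

# Illustrative data only: amounts with and without a variable-width prefix.
my $s = 'cost USD   42, fee 7, tip USD 3';

# Positive variable-width look-behind via \K: "USD" and the whitespace
# are matched, but \K drops them from $&, so only the digits are returned.
my @after_usd = $s =~ m{ USD \s+ \K \d+ }xmsg;

# Negative variable-width look-behind via (*SKIP)(*F): if "USD" plus
# whitespace plus digits can be matched, skip past all of it and fail;
# otherwise the second alternative matches bare digits.
my @not_after_usd = $s =~ m{ USD \s+ \d+ (*SKIP)(*F) | \d+ }xmsg;

say "after USD: @after_usd";
say "not after USD: @not_after_usd";
```

This prints `after USD: 42 3` and `not after USD: 7`: the 42 and the 3 are never even tried by the bare \d+ alternative because (*SKIP) moved the start position past them.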

In the example below, the part that is "consumed" consists of the 'and' or '&', the whitespace that follows it, and the entire 'Appl. Phys.' piece. If exactly one whitespace character were guaranteed always to be present before 'Appl. Phys.' and that character were required for the subsequent match, it would have been enough to consume only that character, but this seemed too fragile to me: more whitespace can easily creep in. Also, I have captured nothing because I don't understand just what you want from these captures: e.g., capturing the meta-quoted 'Appl. Phys.' is pointless because it's never going to change. What did you really want from these captures? (I'm also using non-capturing, atomic (?>pattern) groups here rather than simple non-capturing (?:pattern) groups because I think it makes reasoning about this trick a little easier.)
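To make the whitespace argument concrete, here is a sketch comparing three patterns on a few made-up test strings: the pattern used below, the same pattern with plain (?:...) groups, and a hypothetical "fragile" variant (not something proposed above, just an illustration) that consumes only one whitespace character after 'and'/'&'.

```perl
use strict;
use warnings;
use 5.010;  # for say and the (*SKIP)/(*F) verbs

my $j = quotemeta 'Appl. Phys.';

# Pattern as used in the post: "and"/"&", ALL following whitespace and
# the whole 'Appl. Phys.' piece are consumed before (*SKIP)(*F).
my $robust = qr{ (?> (?> and | &) \s+ $j (*SKIP)(*F))? \s+ $j }xms;

# Same pattern with plain (?:...) groups instead of atomic (?>...) groups.
my $plain = qr{ (?: (?: and | &) \s+ $j (*SKIP)(*F))? \s+ $j }xms;

# Hypothetical fragile variant: consumes only ONE whitespace character,
# so any extra whitespace is left over for the trailing \s+ $j to use.
my $fragile = qr{ (?> (?> and | &) \s (*SKIP)(*F))? \s+ $j }xms;

my %result;
for my $l (
    'J. Pure and Appl. Phys.',    # one space
    'J. Pure and   Appl. Phys.',  # three spaces
    'J. Pure &  Appl. Phys.',     # two spaces
    'J. Pure or Appl. Phys.',
) {
    # 1 = match, 0 = no match, in the order: robust, plain, fragile
    $result{$l} = join ' ', map { ($l =~ $_) ? 1 : 0 } $robust, $plain, $fragile;
    say "$result{$l}  '$l'";
}
```

On these inputs the robust and plain columns agree, while the fragile column reports false matches for the 'and' and '&' lines that carry extra whitespace: the first space is skipped, and the remaining whitespace is enough for \s+ $j to match.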

    c:\@Work\Perl>perl -wMstrict -le
    "use 5.010;
    ;;
    my $j = quotemeta 'Appl. Phys.';
    ;;
    LINE:
    for my $l (
      'R.N. Raox, J. Pure and Appl. Phys.',
      'R.N. Raox, J. Pure & Appl. Phys.',
      'R.N. Raox, J. Pure or Appl. Phys.',
      'R.N. Raox, J. Pure and or Appl. Phys.',
      'N.E. One, Fly Fishing and Appl. Phys.',
      'N.E. One, Fly Fishing or Appl. Phys.',
    ) {
      next LINE unless $l =~ m{ (?> (?> and | &) \s+ $j (*SKIP)(*F))? \s+ $j }xms;
      print qq{match: '$l'};
    }
    "
    match: 'R.N. Raox, J. Pure or Appl. Phys.'
    match: 'R.N. Raox, J. Pure and or Appl. Phys.'
    match: 'N.E. One, Fly Fishing or Appl. Phys.'
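The one-liner above depends on Windows cmd.exe quoting. A sketch of the same test as a standalone script, which should behave the same on any platform; the @matched array is added here only so the result can be checked.

```perl
use strict;
use warnings;
use 5.010;  # for say and the (*SKIP)/(*F) verbs

my $j = quotemeta 'Appl. Phys.';

my @matched;
LINE:
for my $l (
    'R.N. Raox, J. Pure and Appl. Phys.',
    'R.N. Raox, J. Pure & Appl. Phys.',
    'R.N. Raox, J. Pure or Appl. Phys.',
    'R.N. Raox, J. Pure and or Appl. Phys.',
    'N.E. One, Fly Fishing and Appl. Phys.',
    'N.E. One, Fly Fishing or Appl. Phys.',
) {
    # "and"/"&" + whitespace + 'Appl. Phys.' is consumed and rejected:
    # (*SKIP) moves the next attempt past it, (*F) forces the failure.
    next LINE unless $l =~ m{ (?> (?> and | &) \s+ $j (*SKIP)(*F))? \s+ $j }xms;
    push @matched, $l;
    say qq{match: '$l'};
}
```

It prints the same three `match:` lines as the console session.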

Update: See the usual suspects: perlre (esp. Special Backtracking Control Verbs), perlretut, perlrequick.


Give a man a fish:  <%-(-(-(-<


In reply to Re: Another Look behind by AnomalousMonk
in thread Another Look behind by dominic01
