christopherbarsuk has asked for the wisdom of the Perl Monks concerning the following question:

Given the string input and 'input' and 'tinput' and ' input' and 'input ' and input, I want to produce (through substitutive pattern matching) the string OUTPUT and 'input' and 'tinput' and ' OUTPUT' and 'OUTPUT ' and OUTPUT. That is, substitute occurrences of "input" with "OUTPUT" wherever "input" is surrounded by word boundaries (\b), except when it is immediately enclosed in single quotes on both sides.

My Perl version (5.004?) doesn't seem to support the negative lookbehind regex extension (?<!PATTERN), but it does support the negative lookahead extension (?!PATTERN). My sysadmin is unwilling to upgrade Perl, so I'm stuck with this version.

I've come up with s/\b$input_string\b(?!')/$output_string/g (where $input_string = "input" and $output_string = "OUTPUT"), but that produces OUTPUT and 'input' and 'tinput' and ' input' and 'OUTPUT ' and OUTPUT, which is not quite right.
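
To make the failure easier to see, here is a minimal sketch of the attempt above run against the sample string. The sub name lookahead_only and the variable $attempt_sample are made up for illustration, and the sketch assumes the search string begins and ends with word characters so that \b behaves as intended. It deliberately avoids qr// and lookbehind so that it should also run on an old Perl.

```perl
use strict;

# Hypothetical helper: applies the lookahead-only substitution from the post.
sub lookahead_only {
    my ($text, $input_string, $output_string) = @_;
    # \Q...\E quotes any regex metacharacters in the search string.
    # (?!') only inspects the character AFTER the word, never the one before.
    $text =~ s/\b\Q$input_string\E\b(?!')/$output_string/g;
    return $text;
}

my $attempt_sample = "input and 'input' and 'tinput' and ' input' and 'input ' and input";
print lookahead_only($attempt_sample, "input", "OUTPUT"), "\n";
```

Because the lookahead only checks the trailing quote, ' input' is skipped even though its leading side is a space rather than a quote. That is the one case a lookahead alone cannot distinguish from 'input'.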

I've tried a bunch of other code as well, none of which works correctly... Anybody have any suggestions?

Replies are listed 'Best First'.
Re: complex pattern matching
by tachyon (Chancellor) on Aug 18, 2001 at 11:40 UTC

    Here is a solution using /e for the substitution. Lookahead/lookbehind assertions will not work well here, so we treat 'input' as the special case of \binput\b that it is. If it is found, we effectively leave it alone by substituting it back in; if not, we sub in OUTPUT. The result is as desired.

    $_ = "input and 'input' and 'tinput' and ' input' and 'input ' and input";
    s/('input'|\binput\b)/($1 eq "'input'") ? $1 : "OUTPUT"/eg;
    print "Got\n$_\n";
    print "Want OUTPUT and 'input' and 'tinput' and ' OUTPUT' and 'OUTPUT ' and OUTPUT";

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

      Righteous, awesome and way cool. Many many thanks.
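
Since the original question used $input_string and $output_string rather than literals, tachyon's substitution might be generalised roughly as follows. This is only a sketch: the sub name replace_unquoted and the variable $general_sample are hypothetical, and it assumes the search string begins and ends with word characters so that \b still makes sense.

```perl
use strict;

# Hypothetical wrapper around the /e substitution shown in the reply above.
sub replace_unquoted {
    my ($text, $input_string, $output_string) = @_;
    my $quoted = "'" . $input_string . "'";
    # The quoted alternative comes first, so 'input' is consumed whole
    # and put back unchanged; any other whole-word match is replaced.
    $text =~ s/('\Q$input_string\E'|\b\Q$input_string\E\b)/$1 eq $quoted ? $1 : $output_string/eg;
    return $text;
}

my $general_sample = "input and 'input' and 'tinput' and ' input' and 'input ' and input";
print replace_unquoted($general_sample, "input", "OUTPUT"), "\n";
# prints: OUTPUT and 'input' and 'tinput' and ' OUTPUT' and 'OUTPUT ' and OUTPUT
```

The \Q...\E quoting keeps metacharacters in the search string from being read as regex syntax, and the approach needs neither lookbehind nor qr//.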
Re: complex pattern matching
by lemming (Priest) on Aug 18, 2001 at 12:46 UTC

    This one seems to work, but I dumped the \b operator for \w[^'\w]+. It works on
    input and input and 'input' and 'tinput' and notinput and ' input ' and 'input ' and input producing
    OUTPUT and OUTPUT and 'input' and 'tinput' and notinput and ' input ' and 'input ' and OUTPUT

    <code> s/((?:^|\w[^'\w]+))input((?:[^'\w]+\w|$))/$1OUTPUT$2/g; </code>

    Update: whoops, I missed a certain point: I thought that if input was contained in single quotes, it shouldn't be substituted. As far as I can tell tachyon's got it, though I did learn more about captures and clusters.
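
To illustrate the point made in the update, here is a small sketch that runs lemming's substitution against the string from the original question. The variable names are made up for the example, and the replacement is written with ${1} and ${2} purely for readability.

```perl
use strict;

my $asker_string = "input and 'input' and 'tinput' and ' input' and 'input ' and input";

# lemming's pattern: "input" must be preceded by start-of-string or by a word
# character plus at least one non-quote, non-word character, and followed by
# the mirror image of that (or end-of-string).
(my $lemming_result = $asker_string)
    =~ s/((?:^|\w[^'\w]+))input((?:[^'\w]+\w|$))/${1}OUTPUT${2}/g;

print "$lemming_result\n";
# prints: OUTPUT and 'input' and 'tinput' and ' input' and 'input ' and OUTPUT
```

Because a quote on either side blocks the match, ' input' and 'input ' are left alone, which is not what the question asked for.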