tommyw has asked for the wisdom of the Perl Monks concerning the following question:

We've just upgraded from 5.4 to 5.6.1 and tripped over what appears to be a bug: specifically, ".*?\322" in the pattern of a regular expression doesn't match (in fact, the same happens with any high-bit character).
perl -w -e '$_="Hello there\n"; s/t/\322/; print $_; s/^.*?\322/yes/; print $_'
Hello Òhere
Hello Òhere
Clearly, the correct output for the second line is "yeshere". I have determined that "(.*?)\322", "[^\322]*" and ".*\322" all work (although the last obviously means something slightly different). Since we've got this problem in existing code, I'd prefer to find a patch, or a mechanism for finding the problem areas, rather than go through all of our code checking by eye. Any ideas where to start?
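One possible mechanism for finding the problem areas is a small heuristic scanner. The sketch below is only an assumption about what "problem area" means: it flags a lazy quantifier (`*?`, `+?`, `??`, `}?`) followed directly by a high-bit character written as an octal escape, a hex escape, or a raw byte. The name `is_suspect` and the sample lines are hypothetical, and the check will miss patterns that pull the character in through a variable.

```perl
use strict;
use warnings;

# Heuristic only: lazy quantifier immediately followed by a high-bit character.
my $suspect = qr{
    [*+?\}] \?                        # lazy quantifier: *? +? ?? or {n,m}?
    (?: \\ [23][0-7][0-7]             # octal escape \200 to \377
      | \\ x [89A-Fa-f][0-9A-Fa-f]    # hex escape \x80 to \xFF
      | [\x80-\xFF]                   # literal high-bit byte
    )
}x;

# Hypothetical helper: true if a line of source looks affected.
sub is_suspect {
    my ($line) = @_;
    return $line =~ $suspect ? 1 : 0;
}

# Sample lines taken from the patterns discussed in this thread.
my @samples = (
    's/^.*?\322/yes/;',      # the failing form
    's/^(.*?)\322/yes/;',    # reported to work
    's/^.*\322/yes/;',       # reported to work
    's/^.*?\xD2/yes/;',      # same character, hex escape
);

for my $sample (@samples) {
    printf "%-8s %s\n", (is_suspect($sample) ? 'SUSPECT' : 'ok'), $sample;
}
```

To scan real files, the loop over `@samples` could be swapped for `while (<>) { print "$ARGV:$.: $_" if is_suspect($_) }` and the script run with the source files as arguments.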

Edit Masem 2001-09-14 - Code tags in examples

Replies are listed 'Best First'.
Re: High bit bug in RE?
by japhy (Canon) on Sep 14, 2001 at 17:45 UTC
    Got it! I had to tell C to compare the characters as unsigned char instead of char.
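To see why signed versus unsigned `char` would matter only for high-bit characters, here is a small illustration in Perl rather than C. It is not the actual change to the Perl source, which isn't shown in this thread; it only uses `unpack` with the `c` (signed char) and `C` (unsigned char) templates to show how the same byte reads under each interpretation.

```perl
use strict;
use warnings;

# Show one byte as a signed and as an unsigned char value.
sub as_signed   { return unpack 'c', $_[0] }
sub as_unsigned { return unpack 'C', $_[0] }

for my $byte ("\177", "\200", "\322") {
    printf "octal \\%03o: signed %4d, unsigned %3d\n",
        as_unsigned($byte), as_signed($byte), as_unsigned($byte);
}
```

Bytes up to \177 have the same value either way, while \200 and above turn negative when read as signed, so a comparison that mixes the two interpretations can go wrong for exactly those characters.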

    _____________________________________________________
    Jeff[japhy]Pinyan: Perl, regex, and perl hacker.
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

      Glad to hear it. But, um... does that help me, or just the next version of Perl?
        You could recompile Perl yourself once you make the adjustment to the source code. Or you could change the way the regex is written.
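As a sketch of the second option, the two rewrites below are the ones reported in the original question as working on 5.6.1: wrapping the lazy part in a capturing group, and using a negated character class. The helper name `rewrite` and the variant labels are hypothetical.

```perl
use strict;
use warnings;

# Apply one pattern to a copy of the input and return the result.
sub rewrite {
    my ($re, $input) = @_;
    (my $out = $input) =~ s/$re/yes/;
    return $out;
}

my $input = "Hello \322here\n";

# Alternatives to /^.*?\322/ mentioned in this thread.
my @variants = (
    [ 'capturing group' => qr/^(.*?)\322/   ],
    [ 'negated class'   => qr/^[^\322]*\322/ ],
);

for my $variant (@variants) {
    my ($name, $re) = @$variant;
    print "$name: ", rewrite($re, $input);
}
```

The negated class has the same meaning as the lazy match here because the target is a single character; for multi-character targets it would not be a drop-in replacement.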

        _____________________________________________________
        Jeff[japhy]Pinyan: Perl, regex, and perl hacker.
        s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

Re: High bit bug in RE?
by Zaxo (Archbishop) on Sep 14, 2001 at 13:01 UTC

    That should be \0322. The leading zero makes it an octal number. \322 matches a previously captured match.

    Update: Hmmm... I dropped the ball on regex numbers. This seems to do what you want:

    $ perl -w -e '$_="Hello there\n"; s/t/\322/; print; s/^.*\322/yes/; print'
    Hello Òhere
    yeshere
    The minimal-match '?' seems to be the problem. Enclosing .*? in parens, as (.*?), also works.

    After Compline,
    Zaxo

      I'm afraid that that doesn't make any difference; neither does using \xD2 instead. There's a section in the perlre documentation which implies \322 only means a previously captured match if I've got 322 brackets (although I'm possibly misreading it).
      Further, using $a=chr(210); s/.*?$a/yes/; doesn't work either.
      However, all the problems go away if I use \177 (i.e. 127 decimal) or lower. So there's a definite inconsistency. :-(
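A quick way to confirm where the boundary lies on a given perl is to sweep every byte value through the same lazy match. This is a minimal sketch; the helper `lazy_match_ok` and the test string are made up for illustration, and `\Q...\E` is used so that characters that are regex metacharacters are matched literally.

```perl
use strict;
use warnings;

# True if s/^.*?<char>/yes/ replaces everything up to and including
# the first occurrence of the character with the given code.
sub lazy_match_ok {
    my ($code) = @_;
    my $char = chr $code;
    my $text = "Hello ${char}here";
    my $expected = 'yes' . substr($text, index($text, $char) + 1);
    (my $got = $text) =~ s/^.*?\Q$char\E/yes/;
    return $got eq $expected;
}

my @failed = grep { !lazy_match_ok($_) } 0 .. 255;

print @failed
    ? "lazy match failed for byte values: @failed\n"
    : "lazy match worked for all 256 byte values\n";
```

On a perl without the bug this should report no failures; on an affected build the list would be expected to start at 128, matching the \177 observation above.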
        I'm trying to fix this right now. As for why putting the .*? in parens worked, that's because of a failure to optimize that I've now fixed.

        _____________________________________________________
        Jeff[japhy]Pinyan: Perl, regex, and perl hacker.
        s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;