Intrepid has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks of Great Perl Wisdom,

I came across a strange regex construct recently and it troubled me deeply. So I bethought myself to ask on perlmonks if this has the blessings of the Wall and the Pumpkings or if this is simply hogwash :-).

if ($varstring =~ /arbITRary CharS ($|--$)/) {
    # do whatever
}
Now this use of ($|--$) is what is bothering me. It seems to me that "positional anchors", if that is what we could call "$" used this way, should only be usable as such, not as metacharacters in the other sense. I tried to test this in a simple one-liner, but I got strange, inconclusive results, so now I'm asking.

Thanks,
  Soren

Replies are listed 'Best First'.
Re: Weird? regex Question
by ariels (Curate) on Jul 25, 2001 at 09:24 UTC
    How can you tell what ($|--$) is really trying to match? use re 'debug'!

    #!/usr/local/bin/perl -w
    use strict;
    use re 'debug';
    "xyz" =~ /arbITRary CharS ($|--$)/;
    When run (actually, when compiled), you get this output:
    compiling RE `arbITRary CharS ($|--$)'
    size 17 first at 1
       1: EXACT <arbITRary CharS >(7)
       7: OPEN1(9)
       9:   BRANCH(11)
      10:     EOL(15)
      11:   BRANCH(15)
      12:     EXACT <-->(14)
      14:     EOL(15)
      15: CLOSE1(17)
      17: END(0)
    anchored `arbITRary CharS ' at 0 (checking anchored) minlen 16 
    
    OPEN1(9) corresponds to the opening (capturing) bracket. It's followed by one of two BRANCHes: EOL (given by $) or EXACT <--> and an EOL (given by --$).

    So Perl can tell you exactly what it thinks you mean by your regular expression.

    use re 'debug' also prints out information about the match, but in this case we don't care about it...

Re: Weird? regex Question
by HyperZonk (Friar) on Jul 25, 2001 at 04:22 UTC
    Update: And to think, I actually even tested the code! As ChemBoy said yesterday (I must not have been listening), "Test both cases before speaking." Fortunately, someone pointed out the error of my ways. The answer given below is entirely wrong ... at least, the substitution part of it, it would seem. Here's me saying "I really haven't figured it out yet." If I do before someone else gets in, I'll post my results here for everyone to see, along with my shame, below.

    RIGHT ANSWER:
    Heh heh <blush> and it was so easy, too ... In an RE, | separates alternative matches. Strangely, you can capture either the zero-width end-of-line assertion or a '--' at the end of the line, just as shown. This RE matches the literal text 'arbITRary CharS ' (including its trailing space) followed either directly by the end of the line or by '--' and then the end of the line; either case is a match, and either the null string or the dash-dash is captured in $1.
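
A minimal sketch of the two alternatives in action. The sample strings and the @results array are made up for illustration; only the pattern itself comes from the question.

```perl
use strict;
use warnings;

# Hypothetical sample strings; only the pattern comes from the thread.
my @samples = (
    "arbITRary CharS ",       # string ends right after the space
    "arbITRary CharS --",     # string ends with the two dashes
    "arbITRary CharS -- x",   # more text follows, so neither branch fits
);

my @results;
for my $varstring (@samples) {
    if ($varstring =~ /arbITRary CharS ($|--$)/) {
        # $1 holds whichever alternative matched
        push @results, "matched, \$1 = '$1'";
    }
    else {
        push @results, "no match";
    }
}
print "$_\n" for @results;
```

The first string should match with an empty $1, the second with '--' in $1, and the third should not match at all.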

    WRONG ANSWER:
    I will admit right up front that the RE looks very strange to me, also ... perhaps japhy or a similar REMaster can sort out exactly what this is supposed to do. However, $ is only a "positional anchor" when used as the last character in the regex. Otherwise, it indicates variable substitution in the normal double-quote method. Thus, the first thing the parentheses capture is the character matching whatever $| is set to ($| is the autoflush flag, BTW), followed by two dashes at the end of the line. On my system, $| is normally set to '' (empty string), but it could be set to 1 if autoflush has been enabled using the typical Perl idiom $|++;. Again, I have no idea why you might want to do such a thing, except as an Obfu, of course!
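
One way to check the interpolation theory above, as a rough sketch: perlop notes that $(, $) and $| are not interpolated inside a pattern because they look like end-of-string tests, so changing $| should make no difference to the match. The test string and the @captures array are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical test string that ends in the two dashes.
my $varstring = "arbITRary CharS --";

my @captures;
for my $autoflush (0, 1) {
    local $| = $autoflush;    # toggle the autoflush flag for this pass
    # If $| were interpolated, the pattern would change between passes.
    if ($varstring =~ /arbITRary CharS ($|--$)/) {
        push @captures, $1;
    }
}
print "captured '$_'\n" for @captures;
```

Both passes should capture '--', which fits the BRANCH/EOL reading that use re 'debug' reports rather than variable substitution.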


    -HZ