hsmyers has asked for the wisdom of the Perl Monks concerning the following question:

While slimming down some code in a parser, I ran across what I thought was a good solution only to create a nasty debugging situation. Here's a code snippet:
sub dis { shift =~ /.?.?(.)/; }
The idea was to strip off the optional first two characters and return the last character. For instance, dis('QNP') returns 'P' and so on. To my surprise, this worked, or at least seemed to at first. It turns out that it works in some cases, but in others there are strange side effects that leave you looking at once-working code that perl now claims is defective. Luckily for me, I typically change only one thing at a time before regression testing, so it was fairly easy to spot this one. Here is the patched code:
sub dis { shift =~ /.?.?(.)/; $1; }
The reason I labeled this node a puzzle is that I've no clue as to what was wrong or why this fixes it! So I ask the more knowledgeable out there, what is going on here?

--hsm

"Never try to teach a pig to sing...it wastes your time and it annoys the pig."

Replies are listed 'Best First'.
•Re: Parse Puzzle
by merlyn (Sage) on Mar 30, 2003 at 06:26 UTC
    It won't work when you use the value in a scalar context, where the match returns its success status instead of the first memory (the first capture).

    And you still have a problem with your "fix". If the regex doesn't match (say the string holds only a single newline, or is empty), you'll get the previous $1, left over from some earlier successful match, which will likely be random garbage.

    Perhaps what you really wanted was this:

    sub dis { (shift =~ /.?.?(.)/)[0]; }
    which will return the first, second, or third non-newline character, or an undef if the regex doesn't match.
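
A minimal sketch of the stale-$1 problem described above, next to the list-slice version. The names dis_patched and dis_slice and the "leftover" driver string are made up for illustration; only the two sub bodies come from the thread.

```perl
use strict;
use warnings;

# Patched version from the question: returns $1 whether or not the match succeeded.
sub dis_patched { shift =~ /.?.?(.)/; $1; }

# List-slice version: first capture on success, undef on failure.
sub dis_slice { (shift =~ /.?.?(.)/)[0]; }

# Hypothetical caller state: an unrelated, successful match sets $1 to 'left'.
"leftover" =~ /(left)/;

# The empty string cannot match, so $1 is not updated inside the sub.
my $patched = dis_patched("");   # stale 'left' from the caller's match
my $slice   = dis_slice("");     # undef

print "patched: $patched\n";
print "slice:   ", (defined $slice ? $slice : 'undef'), "\n";
```

The slice forces list context on the match, so a failed match yields an empty list and the slice of that is undef in scalar context.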

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

Re: Parse Puzzle
by diotalevi (Canon) on Mar 30, 2003 at 06:27 UTC

    I'll bet you're running into the problem detailed at Zen and the Art of Match Variables. Change your code to use the return value of the expression. This just adds a check and returns an empty list on failure.

    sub dis { return shift =~ /.?.?(.)/ ? $1 : (); }
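
As a rough check of how that version behaves in each context, here is a small sketch. The variable names are illustrative; the sub is the one shown above.

```perl
use strict;
use warnings;

sub dis { return shift =~ /.?.?(.)/ ? $1 : (); }

my $scalar = dis("QNP");   # scalar context, match succeeds: 'P'
my ($list) = dis("QNP");   # list context, match succeeds: 'P'
my @none   = dis("");      # list context, match fails: empty list
my $undef  = dis("");      # scalar context, match fails: undef

print "scalar: $scalar, list: $list, count on failure: ", scalar(@none), "\n";
```

Because the match is only used as the condition of ?:, the caller's context applies to $1 or () rather than to the match itself.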
Re: Parse Puzzle
by runrig (Abbot) on Mar 30, 2003 at 06:26 UTC
    Capturing parens will return a list of the matched results. In scalar context, a list returns the length of the list. (Update: that was wrong; as merlyn and perlfunc say, in scalar context the m// operator returns true or false.) So my $ch = dis("ABC") will set $ch to "1" (true), but my ($ch) = dis("ABC") will set $ch to "C". Returning $1 in the subroutine explicitly returns the correct thing no matter the context, as long as the match succeeds. Some monks will argue that even with this simple regex, you should handle the case where the argument does not match the regex (e.g. the string contains only newlines, or does not contain even one character).
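
The context difference can be seen with the original one-liner. This is only a sketch; the variable names are illustrative, and the comparison line is an assumed example of the kind of caller that would break.

```perl
use strict;
use warnings;

# Original version from the question: the match is the last statement,
# so it is evaluated in whatever context the caller supplies.
sub dis { shift =~ /.?.?(.)/; }

my $ch_scalar = dis("ABC");   # scalar context: success flag, 1
my ($ch_list) = dis("ABC");   # list context: the capture, 'C'

# eq puts its operands in scalar context, so this compares 1 to 'C'.
my $verdict = dis("ABC") eq 'C' ? "equal" : "not equal";

print "scalar: $ch_scalar, list: $ch_list, comparison: $verdict\n";
```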
      In scalar context, a list is the length of the list.
      No. No. There's no such thing as a list in a scalar context. So there can't be a "rule" like this.

      An operator has a "scalar definition" and a "list definition". The scalar definition for regex match is a boolean indicator. See my other post.

      There are no "scalar definitions" which return a list. Period.
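
One way to see the per-operator behavior is to return three different expressions from subs and call each in scalar context. This is a sketch with made-up sub names, not anything from the thread.

```perl
use strict;
use warnings;

# Each sub returns a different kind of expression; the caller's scalar
# context picks that operator's scalar behavior.
sub via_comma { return (4, 5, 6); }                       # comma operator
sub via_array { my @a = (4, 5, 6); return @a; }           # array
sub via_match { return "QNP" =~ /.?.?(.)/; }              # regex match

my $comma = via_comma();   # last operand: 6
my $count = via_array();   # element count: 3
my $match = via_match();   # success flag: 1

print "comma: $comma, array: $count, match: $match\n";
```

The same three values would come back as (4, 5, 6), (4, 5, 6) and ('P') in list context, so no single "list in scalar context" rule covers all three.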

      -- Randal L. Schwartz, Perl hacker
      Be sure to read my standard disclaimer if this is a reply.