wind has asked for the wisdom of the Perl Monks concerning the following question:

Greetings Monks,

I recently observed a slight difference in the way that perl interpolates variables in strings versus regular expressions. The difference arises when a regex interpolates a variable followed by braces. Rather than treating the variable as a hash, it treats it as a scalar followed by a repetition quantifier. Observe this constructed example:
my $string = 'aaaaa';
my $var = 'a';
my %var = (5 => 'hash interpolated');

print "String: $var{5}\n";
print "Regex: ", ($string =~ /$var{5}/ ? 'scalar interpolated' : 'hash interpolated'), "\n";
print "Regex (force hash): ", ($string =~ /${var{5}}/ ? 'scalar' : 'hash'), "\n";
print "Regex (force scalar): ", ($string =~ /${var}{5}/ ? 'scalar' : 'hash'), "\n";

# Output is:
# String: hash interpolated
# Regex: scalar interpolated
# Regex (force hash): hash
# Regex (force scalar): scalar
Now, I only observed this in the process of helping someone on another forum with their malformed code. In that example the best way to fix their issue was /(?:\Q$var\E){5}/.
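
To illustrate why grouping and quoting matters in that fix, here is a minimal sketch. The value 'a.' is a hypothetical one, picked because it is longer than one character and contains a regex metacharacter; it is not taken from the other forum's code.

```perl
use strict;
use warnings;

# Hypothetical sample value: two characters, one of them a metacharacter.
my $var = 'a.';

# Interpolates to the pattern a.{2}, so the quantifier binds only to the dot.
my $plain = ('axx' =~ /^${var}{2}$/) ? 1 : 0;

# Grouped and quoted: two literal copies of "a." are required.
my $grouped_hit  = ('a.a.' =~ /^(?:\Q$var\E){2}$/) ? 1 : 0;
my $grouped_miss = ('axax' =~ /^(?:\Q$var\E){2}$/) ? 1 : 0;

print "plain: $plain, grouped hit: $grouped_hit, grouped miss: $grouped_miss\n";
```

Without the (?:...) group the quantifier applies only to the last atom of the interpolated text, and without \Q...\E the dot would match any character.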

However, I've continued to search for documentation about this peculiarity to no avail. Are you monks familiar with this slight difference in string versus regex interpolation? Are there other differences that may not be common knowledge because they are obscure enough to rarely come up?

Any insights would be appreciated.
- Miller

Replies are listed 'Best First'.
Re: Interpolation differences between Strings and Regular Expressions (weight)
by tye (Sage) on Jun 13, 2007 at 02:32 UTC

    You've run into probably the most vague area of Perl parsing DWIMery. It is the only place in the Perl source code that mentions "weigh". How to interpret such things in a regex is determined by weighing several different criteria so there is no easy explanation as to what Perl will choose. For example, I'm disappointed that in a regex Perl chooses to interpret [$] as the start of a character class followed by the contents of the $] variable. I think it got that DWIM aspect just wrong.

    But in your case, I think Perl got the DWIM correct. The string case is fairly straight-forward and mostly just greedy parsing so it always pulls in the trailing {...} to make a hash deref unless you do something to tell it not to.

    For the regex case, {5} looks more like a quantifier than like a hash key because a hard-coded number as a hash key is a bit unlikely, though this isn't a slam-dunk winner.

    - tye        

      mostly just greedy parsing so it always pulls in the trailing {...} to make a hash deref

      Similarly, if the lexer in the regexp engine doesn't find a closing curly, the opening curly automatically loses its meta aspect...

      print "a{5" =~ /a{5/

      ... prints 1. This could be the source of annoying errors if you're not careful. The explanation I received was that, in terms of costs and benefits, keeping enough context during the parse to be able to report the error would be too much overhead. Or something like that, I'm a little hazy on the details by now.

      Nor can I recall having been bitten by this behaviour, so the decision as it stands was probably correct.

      • another intruder with the mooring in the heart of the Perl

        Actually, it was purely a decision to maintain some backward compatibility with earlier regex implementations that didn't treat curlies as metacharacters. In retrospect, probably a mistake, but traditional regex syntax is full of such mistakes, where rather than having a consistent backslashing rule, you have to put metacharacters where they don't make sense to match them literally, such as the infamous []x-] character class, which cannot be written in any other order, because the literal hyphen must come either first or last, and the right bracket may only come first, so the hyphen must come last.
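
A small sketch of that character class in Perl, to show which characters it accepts. The candidate list is made up for illustration.

```perl
use strict;
use warnings;

# In []x-] the "]" is literal because it comes first, and the "-" is
# literal because it comes last, so the class holds "]", "x" and "-".
my @candidates = (']', 'x', '-', 'y', '[');
my @matched    = grep { /^[]x-]$/ } @candidates;

print "matched: @matched\n";
```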
Re: Interpolation differences between Strings and Regular Expressions
by naikonta (Curate) on Jun 13, 2007 at 02:37 UTC
    I've continued to search for documentation about this peculiarity to no avail
    From perldata:
    Within search patterns (which also undergo double-quotish substitution) there is an unfortunate ambiguity: Is /$foo[bar]/ to be interpreted as /${foo}[bar]/ (where [bar] is a character class for the regular expression) or as /${foo[bar]}/ (where [bar] is the subscript to array @foo)? If @foo doesn't otherwise exist, then it's obviously a character class. If @foo exists, Perl takes a good guess about [bar], and is almost always right. If it does guess wrong, or if you're just plain paranoid, you can force the correct interpretation with curly braces as above.
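
The two forced interpretations from that passage can be sketched as follows. The scalar $foo and the array @foo are hypothetical sample data, and a numeric subscript stands in for the placeholder bar.

```perl
use strict;
use warnings;

my $foo = 'x';                # hypothetical scalar
my @foo = ('zero', 'one');    # hypothetical array with the same name

# ${foo}[bar]: the scalar $foo followed by the character class [bar].
my $class_hit = ('xa' =~ /^${foo}[bar]$/) ? 1 : 0;

# ${foo[1]}: element 1 of @foo, interpolated as a whole.
my $elem_hit = ('one' =~ /^${foo[1]}$/) ? 1 : 0;

print "character class: $class_hit, array element: $elem_hit\n";
```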

    Open source softwares? Share and enjoy. Make profit from them if you can. Yet, share and enjoy!

Re: Interpolation differences between Strings and Regular Expressions
by aquarium (Curate) on Jun 13, 2007 at 06:13 UTC
    If you're using integers only, then perhaps an array is a better choice to access the elements. In any case, to disambiguate, quote the key: $hash{"5"}
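
A minimal sketch of the quoted-key form, reusing the variable names from the original question. It assumes that a quoted key no longer looks like a {n} quantifier to the parser, so the braces are read as a hash subscript; the hash value is made up for illustration.

```perl
use strict;
use warnings;

my $var = 'a';
my %var = (5 => 'hash value');    # hypothetical value

# Bare {5} reads as a quantifier: the pattern is a{5}.
my $as_quantifier = ('aaaaa' =~ /^$var{5}$/) ? 1 : 0;

# Quoted key: assumed to be parsed as the hash element $var{"5"}.
my $as_hash_key = ('hash value' =~ /^$var{"5"}$/) ? 1 : 0;

print "quantifier: $as_quantifier, hash key: $as_hash_key\n";
```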
    the hardest line to type correctly is: stty erase ^H