Hue-Bond has asked for the wisdom of the Perl Monks concerning the following question:

I've just read section 5.9.2.1 "When backslashes happen" of "Programming Perl" and I'm having trouble understanding it. I think.

If I understand correctly, interpolation in regexes happens in two phases: the first is a normal double-quotish interpolation (if the regex delimiters allow it) and the second is done by the regex engine. The first phase defers most backslash escapes to the second, for example so that \t doesn't become a real tab that the engine would then discard as whitespace when the /x modifier is specified (among other reasons, I suppose). The engine doesn't understand \u, \U, \l, \L, \E and \Q, and it takes them literally.
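A minimal sketch of the /x point above, assuming a reasonably modern perl 5. The names $tab, $deferred and $early are illustrative only. The first pattern leaves \t for the regex engine; the second pastes a real tab into the pattern during the double-quotish pass, and /x then throws it away as whitespace.

```perl
use strict;
use warnings;

my $var = 'text';
my $tab = "\t";    # a real TAB character, not the two characters \ and t

# Pass 1 (double-quotish) interpolates $var and applies \u, giving "Text".
# The \t is left as an escape for pass 2, where the regex engine reads it
# as "match a TAB". Under /x the literal space is ignored; \t is not.
my $deferred = qr/^\u$var \t\z/x;

# Here pass 1 inserts a real TAB from $tab. Under /x the engine treats it
# as insignificant whitespace, so the pattern behaves like ^Text\z.
my $early = qr/^\u$var$tab\z/x;

print "deferred \\t vs 'Text\\t': ", ("Text\t" =~ $deferred ? "matches" : "no match"), "\n";
print "real TAB vs 'Text\\t':    ", ("Text\t" =~ $early    ? "matches" : "no match"), "\n";
print "real TAB vs 'Text':       ", ("Text"   =~ $early    ? "matches" : "no match"), "\n";
```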

So m'\u$var\t'; matches a string containing a backslash, the character 'u', a dollar sign, the string 'var' and a tab. Is this right?
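One way to test this reading empirically is a small sketch like the one below (the variable names are illustrative). With single-quote delimiters nothing is interpolated, so the engine itself has to make sense of \u and $, and $ is a regex metacharacter (the end-of-line anchor), not a dollar sign. With warnings on, perl should also report the unrecognized \u escape on STDERR.

```perl
use strict;
use warnings;

# The text described above: backslash, 'u', '$', 'var', TAB.
my $literal = '\\u$var' . "\t";

# With single-quote delimiters nothing is interpolated, so the regex engine
# receives \u$var\t verbatim. It warns about \u and treats it as a plain
# 'u', and it reads $ as the end-of-line anchor rather than a dollar sign.
my $single = qr'\u$var\t';

# To match those characters literally, escape them for the regex engine.
my $escaped = qr/\\u\$var\t/;

print "qr'...': ", ($literal =~ $single  ? "matches" : "no match"), "\n";
print "escaped: ", ($literal =~ $escaped ? "matches" : "no match"), "\n";
```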

And m/\u$var\t/; matches whatever the value of $var is (with its first character capitalized) followed by a tab. And the tab is expanded by the regex engine, not by the quote interpolation, isn't it?
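A quick way to see which pass handled what is to stringify a compiled regex, since that shows the pattern text the engine received. This is a sketch assuming a reasonably modern perl 5; the exact wrapper around the pattern (for example the (?^:...) prefix) varies between perl versions.

```perl
use strict;
use warnings;

my $var = 'text';
my $re  = qr/\u$var\t/;

# Stringifying the compiled regex shows what the engine was given:
# \u has already been applied ("Text"), while \t is still an escape
# sequence that the regex engine itself turns into "match a TAB".
print "engine saw: $re\n";

print "matches\n" if "Text\t" =~ $re;
```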

But then:

    $ perl -Mstrict -w
    print my $var = '\utext';
    "Text" =~ m/$var/ && print "matches\n";
    __END__
    Unrecognized escape \u passed through in regex; marked by <-- HERE in m/\u <-- HERE text/ at - line 2.
    \utext$

Why isn't the \u inside $var processed in this regex? Maybe because Perl won't do a double-quotish interpolation twice? (One pass to turn '$var' into '\utext' and another to turn '\utext' into 'Text'.)

--
David Serrano

Re: \U, \E and friends interpolation in regex operators
by davido (Cardinal) on Dec 14, 2005 at 03:58 UTC

    The problem you're seeing is that \u is meaningless in the context of a regular expression. It is meaningful in a quoted string (double-quotish interpolation), but you're not allowing that to happen: the literal string '\utext' is finding its way to the regular expression engine, and no double-quotish interpolation occurs at the stage where you would need it. Your sense is right; double-quotish interpolation isn't recursive, it happens only once. You could change '\utext' to "\utext" so that the interpolation occurs when $var is assigned. Or you could pre-process $var before passing it to the RE engine.
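    Both options can be sketched as follows. The pre-processing step is a hypothetical, deliberately minimal helper that only expands \u; \U, \L, \l, \Q and \E would each need their own rule, so treat it as an illustration rather than a general solution.

```perl
use strict;
use warnings;

# Option 1: double quotes, so \u is applied when $var is assigned.
my $var = "\utext";    # already "Text"
print "double-quoted: ", ("Text" =~ m/$var/ ? "matches" : "no match"), "\n";

# Option 2: keep the single-quoted string and expand \u by hand before the
# match. This hypothetical pre-processing step handles only \u followed by
# one character; the replacement side is double-quotish, so \u$1 works.
my $raw = '\utext';
(my $cooked = $raw) =~ s/\\u(.)/\u$1/g;
print "pre-processed: ", ("Text" =~ m/$cooked/ ? "matches" : "no match"), "\n";
```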

    Perl does interpolate $var into m//, but it doesn't dive into multiple levels to see whether anything within the value of $var might be interpolated in turn. Consider this example (which clearly doesn't perform multi-level interpolation):

        $var1 = 'test';
        $var2 = '$var1';
        $string = 'test';
        print "Match\n" if $string =~ m/$var2/;

    This seems a little ridiculous, and it's not surprising that the match fails. You're asking for a similar interpolation to occur, and it won't.
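    A sketch of why that match fails, using the same variables under strict: after the single interpolation pass the pattern is the text $var1, and the regex engine reads its $ as an end-of-line anchor, so nothing is looked up a second time. Wrapping the variable in \Q...\E (an addition for illustration) makes the same text match literally.

```perl
use strict;
use warnings;

my $var1   = 'test';
my $var2   = '$var1';    # the five characters $, v, a, r, 1
my $string = 'test';

# m/$var2/ interpolates $var2 once, giving the pattern $var1. The engine
# does not interpolate again; it reads $ as an end-of-line anchor followed
# by the text "var1", which can never match.
print "Match\n" if $string =~ m/$var2/;

# With \Q...\E the same text is quoted and matched literally instead.
print "Literal match\n" if '$var1' =~ m/\Q$var2\E/;
```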


    Dave

Re: \U, \E and friends interpolation in regex operators
by l.frankline (Hermit) on Dec 14, 2005 at 04:15 UTC

    Hi,

    When you store escape sequences like \U, \E, \u and so on in a scalar variable, use double quotes instead of single quotes, because the regex engine doesn't recognise those escapes. If you write $var = "\utext", the escape is processed when the variable is assigned. Here is an example; try this:

        print my $var = "\utext";
        "Text" =~ m/$var/ && print "matches\n";

    Regards,
    Franklin

    Don't put off till tomorrow what you can do today.