in reply to Change utility; code optimization

Greetings all,

I've discovered something interesting. When running the above, I can enter "\n" in the change from field and the script will correctly interpret it to mean a newline character. When I enter "\n" in the change to field, the script incorrectly interprets it to mean the literal two characters "\n" rather than a newline.

Looking at my code, I have a line near the end of the script:
$file_contents =~ s/$changefrom/$changeto/gi;

Can anyone explain why $changefrom is interpreted, but $changeto is not? Thanks.

-Gryphon.

Re: Re: Change utility; code optimization
by chipmunk (Parson) on Feb 13, 2001 at 00:16 UTC
    This is a good question. The reason is that, on the regex side, the two-character sequence "\n" is an escape sequence that means "match a literal newline".

    After the value of $changefrom is interpolated, the regular expression engine compiles the regex, sees "\n", and compiles it to match a newline. (If $changefrom instead contained a literal newline, the result would be the same, because a literal newline in a regex matches a literal newline.)

    On the replacement side, however, once the value of $changeto is interpolated, there is no second pass over the string to turn "\n" into a newline.
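
    To make the asymmetry concrete, here is a minimal sketch. The sample text and the single-quoted values are assumptions standing in for what a user would type into the two fields:

    ```perl
    use strict;
    use warnings;

    # Both values hold two characters: a backslash and the letter "n".
    my $changefrom = '\n';
    my $changeto   = '\n';

    my $file_contents = "one\ntwo\n";   # sample input with real newlines

    # Pattern side: the regex engine compiles \n to "match a newline".
    # Replacement side: the two characters are inserted unchanged.
    (my $result = $file_contents) =~ s/$changefrom/$changeto/gi;

    # $result now holds one\ntwo\n with literal backslashes and no newlines.
    ```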

    One solution is to code the extra pass over $changeto yourself, as in:

    $changeto =~ s/\\n/\n/g;
    $file_contents =~ s/$changefrom/$changeto/gi;
    Please note that this simple example will also change \\n to \<newline>, because it does not check whether the backslash is itself escaped.

    Of course, you then have to consider which escape sequences you want to allow. \t and \r? How about \040, \x20, or \cD? That's up to you. :)
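
    If you settle on a small, fixed set of escapes, one way to sketch the extra pass is a lookup table. The helper name expand_escapes and the table %escape_for are made up for illustration, and the set of escapes is only an assumption about what you might want to allow:

    ```perl
    use strict;
    use warnings;

    # Hypothetical table of the escapes we choose to support.
    my %escape_for = (
        n    => "\n",
        t    => "\t",
        r    => "\r",
        '\\' => '\\',
    );

    # Hypothetical helper: expands the supported escapes in a single pass.
    sub expand_escapes {
        my ($text) = @_;
        # Consume a backslash together with the character after it, so that
        # \\n becomes backslash + "n" rather than backslash + newline.
        # Unknown escapes are left exactly as they were typed.
        $text =~ s/\\(.)/exists $escape_for{$1} ? $escape_for{$1} : "\\$1"/ges;
        return $text;
    }
    ```

    You would then call $changeto = expand_escapes($changeto); before the substitution. Because each backslash is consumed together with the character that follows it, this avoids the \\n problem of the one-line version.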

      Greetings chipmunk,

      Thanks for your help. Very good information. I have one follow-up question: Is there a way to allow all escape sequences by default? In other words, to have $changefrom and $changeto interpreted in exactly the same way? That way, a user could run the script and change "\n" to "\t" just by typing those sequences.

      Gryphon.

        I can't think of a good quick and easy way to interpolate the escape sequences.

        Here's a simple but unsafe way:

        s/$changefrom/qq{"$changeto"}/giee;
        The right-hand side is evaluated twice, so $changeto gets interpolated and then its value gets interpolated. This is unsafe because $changeto could contain $ or @, giving access to variables in your program, or double quotes, which would either cause a syntax error or allow the evaluation of arbitrary code. In a script where you know the value of $changeto, it's a useful idiom.
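
        A short sketch of both the benefit and the risk. The variable $secret and the sample values are hypothetical, chosen only to show what an unexpected $ in the input can reach:

        ```perl
        use strict;
        use warnings;

        our $secret = 'hunter2';      # hypothetical variable in your program

        my $text       = "a,b";
        my $changefrom = ',';

        # Intended use: the user typed backslash + "t".
        my $changeto = '\t';
        (my $ok = $text) =~ s/$changefrom/qq{"$changeto"}/giee;
        # $ok is now "a", a real tab, then "b".

        # Unintended use: the user typed a variable name.
        $changeto = '$secret';
        (my $leak = $text) =~ s/$changefrom/qq{"$changeto"}/giee;
        # $leak is now "ahunter2b": the program's variable was interpolated.
        ```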

        Here's a safer way:

        $changeto =~ s,(?<!\\)((?:\\\\)*)([\$\@\"]),$1\\$2,g;
        s/$changefrom/qq{"$changeto"}/giee;
        The first line puts a backslash before each $, @, and " which is not already escaped. A character is already escaped if it's preceded by an odd number of backslashes. I think that those three characters are the only ones that need to be escaped. However, I could be overlooking something, which I hope someone will point out if I am.
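
        Here is the same two-step approach run against a hostile value, as a sketch. Again, $secret and the sample strings are hypothetical:

        ```perl
        use strict;
        use warnings;

        our $secret = 'hunter2';      # hypothetical variable in your program

        my $text       = "a,b";
        my $changefrom = ',';
        my $changeto   = '$secret\t'; # user input: a variable name plus \t

        # Step 1: put a backslash before each unescaped $, @ and ".
        $changeto =~ s,(?<!\\)((?:\\\\)*)([\$\@\"]),$1\\$2,g;
        # $changeto is now \$secret\t

        # Step 2: double evaluation expands \t but leaves $secret as text.
        (my $result = $text) =~ s/$changefrom/qq{"$changeto"}/giee;
        # $result is now "a$secret", a real tab, then "b".
        ```

        One case this does not seem to cover: a value that ends in an odd number of backslashes would, as far as I can tell, escape the closing quote and make the second evaluation fail, so you may want to reject or fix up such input first.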

        The safest approach is to turn on taint-checking, and carefully untaint $changeto to make sure it contains a safe replacement string.
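
        A sketch of what the untainting step could look like. The helper name untaint_replacement and the whitelist are assumptions; you would adjust the allowed characters and escapes to your own needs. Taint mode itself is turned on with the -T switch when perl is started:

        ```perl
        use strict;
        use warnings;

        # Hypothetical helper: accepts only plain characters and the escapes
        # \n, \t, \r and \\. Returns the cleaned value, or undef if rejected.
        sub untaint_replacement {
            my ($value) = @_;
            if ($value =~ /\A((?:[A-Za-z0-9 _.,:;-]|\\[ntr\\])*)\z/) {
                return $1;   # under -T, a captured group is untainted
            }
            return;          # anything else (such as $, @ or ") is refused
        }
        ```

        The script would then die or ask for new input whenever untaint_replacement($changeto) returns undef, and use the returned value otherwise.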