PerlMonks  

$1 in variable regex replacement string

by dvergin (Monsignor)
on Feb 12, 2003 at 17:58 UTC (id://234769)

dvergin has asked for the wisdom of the Perl Monks concerning the following question:

I pulled a late-nighter and with the morning comes brain block. Please save an old man a few brain cells... How do I get the following code to DWIM?
#!/usr/bin/perl -w
use strict;

my $str  = 'abcadefaghi';
my $pat  = '(a.)';
my $repl = '$1 ';

#$str =~ s/$pat/$1 /g;    # Working non-dynamic demo
$str =~ s/$pat/$repl/g;

print "$str\n";
# prints: '$1 c$1 ef$1 hi'
Obviously the '$1' is being taken as a literal when it is included in the variable replacement string. And the regex lobe in my brain has gone numb.

$repl could be any legal regex replacement string. ((?{...}) constructions will be filtered out before the regex is invoked, even though these come from a trusted source.)

(Coffee... must have coffee...)

------------------------------------------------------------
"Perl is a mess and that's good because the
problem space is also a mess.
" - Larry Wall

Replies are listed 'Best First'.
Re: $1 in variable regex replacement string
by Enlil (Parson) on Feb 12, 2003 at 18:19 UTC
    This works.
    #!/usr/bin/perl -w
    use strict;

    my $str  = 'abcadefaghi';
    my $pat  = '(a.)';
    my $repl = sub {"$1 "};

    $str =~ s/$pat/&$repl/eg;

    print "$str\n";
    # prints: 'ab cad efag hi'

    -enlil

Re: $1 in variable regex replacement string
by tadman (Prior) on Feb 12, 2003 at 18:06 UTC
    One workaround is to tweak it slightly, giving you this:
    my $repl = '$1 ';
    $repl =~ s/\"/\\"/g;
    $repl = qq{"$repl"};
    $str =~ s/$pat/$repl/eeg;
    Note that this involves an eval so you should be very, very careful what you feed it.

      For example you would not want it passed:

      $repl = '\";`hacked`;\"';

      You can make this a whole lot safer (maybe even totally safe) with a suitable sanitization of $repl

      sub munge_string {
          my ( $str, $pat, $repl ) = @_;
          # make $repl safe to eval
          $repl =~ tr/\0//d;
          $repl =~ s/([^A-Za-z0-9\$])/\\$1/g;
          $repl = '"' . $repl . '"';
          $str =~ s/$pat/$repl/eeg;
          return $str;
      }

      cheers

      tachyon

      s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

        Drawing on dvergin's idea, what about this?
        sub safeswitch {
            my ($repl) = @_;    # copy: editing $_[0] would change the caller's $repl
            my @P = (undef,$1,$2,$3,$4,$5,$6,$7,$8,$9);
            $repl =~ s/\$(\d)/$P[$1]/g;
            $repl;
        }

        my $str = "abcdefghijafjafjkagjakg";
        my $pat = '(a.)';
        my $repl = '$1 ';
        $str =~ s/$pat/safeswitch($repl)/eg;
        print $str,$/;
        The advantage here is that you don't end up re-evaluating the function code each time, just the function call.
      This is the sort of thing I was hoping to avoid.

      Is there any concise way to "be very, very careful" about what would get fed to this snippet? The current source *is* trustable... but I've learned that these things get generalized in future iterations and I want to put in the paranoia now if I go this route.

      Applying my 2 cents 4 years too late. I came here tonight looking because I had the same problem as the original poster, and after some thinking on the responses I came up with something I didn't see in the other answers (hopefully I read them all, there were a lot).

      anyhoo,...

      ApplyRegExRename('witch(\d\d\d)\.jpg','witch_%02d.jpg');
      # param 1 = regex expression for find
      # param 2 = sprintf expression for replacement
      sub ApplyRegExRename    # supports up to 5 caught matches within the pattern
      {
          my $regex = shift;
          my $repl  = shift;
          ...
          my $file2 = $file;
          $file2 =~ s/$regex/sprintf($repl,$1,$2,$3,$4,$5)/e;
          print "renaming: $file to $file2\n";
          ...
      }

      Works good enough for my purposes.

      Keep in mind you can be tricky with sprintf expression to reorder how your $1-n vars are consumed as well.
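The reordering mentioned above relies on sprintf's explicit argument indexes (`%2$s` means "the second argument"). A minimal sketch follows; the file name, pattern and format are made up for illustration and are not from the post above.

```perl
use strict;
use warnings;

my $name = 'witch_2003_007.jpg';      # hypothetical file name
my $pat  = '(\w+?)_(\d{4})_(\d+)\.jpg';
my $fmt  = '%2$s-%1$s-%3$03d.jpg';    # year first, then name, then counter

# /e evaluates the sprintf call; the format string itself is never eval'ed.
(my $new = $name) =~ s/$pat/sprintf($fmt, $1, $2, $3)/e;

print "$new\n";                       # 2003-witch-007.jpg
```

Note the single quotes around the format: in double quotes Perl would try to interpolate `$s` and `$03d`.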

Re: $1 in variable regex replacement string
by CountZero (Bishop) on Feb 12, 2003 at 20:02 UTC

    The reason it doesn't work is (of course) found in the Camel Book:
    The right side of the substitution (between the second and third slashes) is mostly a funny kind of double-quoted string, which is why you can interpolate variables there, including backreference variables. (p.41)

    As you know you only get one level of interpolation in Perl, i.e. your $repl gets replaced by whatever its value is at that moment and then interpolation stops and hands the regex to the regex-engine, which will find a literal $1 and not the value inside the backreference variable $1 you wished for.
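To make the "one level of interpolation" point concrete, here is a small sketch using the strings from the original question. The variable names `$once` and `$inline` are only for this demo.

```perl
use strict;
use warnings;

my $str  = 'abcadefaghi';
my $repl = '$1 ';    # single quotes: a literal dollar sign followed by 1

# The replacement side is interpolated once: $repl becomes the text '$1 ',
# and that text is not scanned a second time for variables.
(my $once = $str) =~ s/(a.)/$repl/g;

# Written inline, $1 is part of the replacement itself and is interpolated.
(my $inline = $str) =~ s/(a.)/$1 /g;

print "$once\n";      # $1 c$1 ef$1 hi
print "$inline\n";    # ab cad efag hi
```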

    The only solution is through the /e modifier as already pointed out.

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

      So I pose roughly the same question here that I posed to tadman -- if I go the /e or /ee route, what are the things that $repl might contain that could potentially do horrible things?

      If I filter out '(?{...})' constructs, is that enough? If so, this solution is do-able.

        Really bad things could happen even outside '(?{...})' constructs.

        For example (this is something totally innocent, but you get the idea):

        use strict;
        my $str = 'abcadefaghi';
        my $pat = qr/(a.)/;
        my $repl = 'system dir ';
        $str =~ s/$pat/$repl/eeg;

        Of course you could try to filter out all system, exec and backticks, but that is only solving a small part of the possible problems as anything inside the $repl-variable gets run as a perl-program.
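A harmless way to see that /ee really runs the replacement as code: the sketch below only bumps a counter. `$side_effect` is a hypothetical variable invented for this demo, and the contents of `$repl` stand in for something that arrived from outside.

```perl
use strict;
use warnings;

our $side_effect = 0;    # hypothetical flag, only for this demo

my $str  = 'abcadefaghi';
my $pat  = '(a.)';
my $repl = '$main::side_effect++; "X"';    # pretend this came from outside

# /ee: the first evaluation yields the contents of $repl,
# the second one runs those contents as Perl code.
$str =~ s/$pat/$repl/eeg;

print "$str\n";            # XcXefXhi
print "$side_effect\n";    # 3
```

Anything that can increment a counter here could just as well unlink files, which is why filtering a handful of keywords is not enough.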

        CountZero

        "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re: $1 in variable regex replacement string
by bart (Canon) on Feb 12, 2003 at 21:22 UTC
    This question pops up a lot, and time and time again, people misunderstand the question and reply as Joey the Saint did. See, for example, this recent thread in comp.lang.moderated. The standpoint of M.J.Dominus is especially interesting. It brings up a point I have wondered about a lot myself: since Perl does interpolation very well, why is it not available as a user-accessible function?

    Now, the solutions proposed there are pretty much the same poorish solutions as proposed here: escape the quotes, put quotes around it, and eval. Or search for /\$\w+/ and replace it with the value you want it to have, perhaps using symbolic references (!).

    A potentially better, and likely safer, solution might be to treat the replacement as a template, and use one of the several templating modules to process it, and incorporate the result that comes out of that, into the substitution.

Re: $1 in variable regex replacement string
by dada (Chaplain) on Feb 13, 2003 at 10:03 UTC
    Update: sorry, I missed jsprat's previous answer, so this post is a poor dupe :-)

    this may not be properly what you're looking for, but Dominus's first Quiz of the Week was very similar.

    the goal was to:

    Write a subroutine, 'subst', which gets a string argument, $s. It should search $s and replace any occurrences of "$1" with the current value of $1, any occurrences of "$2" with the current value of $2, and so on.

    For example, if $1, $2, and $3 happen to be "dogs", "fish" and "carrots", then

            subst('$2, $1 and $3')
    should return
            "fish, dogs, and carrots"
    you can find the original question along with MJD's solution and discussion here.
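For readers who cannot follow the link, here is one possible shape for such a `subst`. This is a sketch written for this thread, not MJD's solution; it only handles `$1` through `$9`, so `$10` would be read as `$1` followed by a `0`.

```perl
use strict;
use warnings;

sub subst {
    my ($template) = @_;

    # Save the caller's captures first: the s/// below overwrites $1.
    my @cap = (undef, $1, $2, $3, $4, $5, $6, $7, $8, $9);

    $template =~ s/\$([1-9])/defined $cap[$1] ? $cap[$1] : ''/ge;
    return $template;
}

# The capture variables are still visible inside subst after this match.
'dogs fish carrots' =~ /(\w+) (\w+) (\w+)/ or die "no match";
print subst('$2, $1 and $3'), "\n";    # fish, dogs and carrots
```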

    hope this helps (or entertains at least :-).

    oh, by the way: Dominus++ for the marvelous QOTW!

    cheers,
    Aldo

    King of Laziness, Wizard of Impatience, Lord of Hubris

Re: $1 in variable regex replacement string
by jryan (Vicar) on Feb 13, 2003 at 07:41 UTC

    Here is a safe, fast, eval-less method that doesn't use s/// :-)

    sub munge {
        my ($str, $pat, $repl) = @_;
        while ($str =~ /$pat/g) {
            my $temp = $repl;
            # Replace each literal "$N" in the template with capture N.
            # $$_ is a symbolic reference to $1, $2, ... (needs no strict 'refs').
            for (1..$#+) {
                while ( (my $x = index $temp, "\$$_") >= 0) {
                    substr ($temp, $x, length($_)+1) = $$_;
                }
            }
            my $pos = pos $str;
            my $offset = $+[0]-$-[0];    # length of the whole match
            substr($str, $pos-$offset, $offset) = $temp;
            pos($str) = $pos - $offset + length($temp);
        }
        return $str;
    }

    # Here's an example:
    print "#";
    print munge('aaaababbaabbbbabab', '(a)(b+)', ' $2*$1 ');
    print "#";
Re: $1 in variable regex replacement string
by Joey The Saint (Novice) on Feb 12, 2003 at 18:43 UTC

    Maybe I haven't had enough coffee today either, but is this the problem?

    #!/usr/bin/perl
    $foo = "hello";
    $f2 = '$foo';
    $f3 = "$foo";
    print "$f2\n";
    print "$f3\n";

    Running that causes this to appear:

    $foo
    hello

    The single quotes inhibit processing $1 as a variable; it's treated as a literal string.

    Or maybe I just don't know what you want it to do? :-)

    -J.

      That is not the problem. The problem is to get the RHS of the regex to interpolate the value captured in $1 into the replacement string and do the substitution.

      On first impressions the logic for wanting to do this appears dubious, especially in the simple case given. It might just make sense if you wanted to use $1, $2, etc simultaneously. I can remember wanting to do this some time back for reasons that now totally elude me. This was followed by the realisation that there are many ways to tackle any given problem and this way has dubious value IMHO. There are two basic solutions I know of - the q("$1blah") method with /ee and the sub{"$1blah"} method - both noted above.
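Side by side, the two methods look roughly like this. It is a sketch on the string from the original question; the variable names are invented for the comparison.

```perl
use strict;
use warnings;

my $pat = '(a.)';

# Method 1: the replacement is Perl source for a double-quoted string,
# which the second /e evaluates. Only reasonable for trusted input.
my $quoted = '"$1 "';
(my $via_ee = 'abcadefaghi') =~ s/$pat/$quoted/eeg;

# Method 2: the replacement is a code reference. Nothing is eval'ed as a
# string; the caller hands over real code instead of text.
my $code = sub { "$1 " };
(my $via_sub = 'abcadefaghi') =~ s/$pat/$code->()/eg;

print "$via_ee\n";     # ab cad efag hi
print "$via_sub\n";    # ab cad efag hi
```

The second form changes the calling convention (a sub instead of a string), which may or may not suit the caller.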

      cheers

      tachyon

      s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

        tachyon wrote:
        there are many ways to tackle the problem and this way has dubious value IMHO

        You will have an easy time convincing me of this. As I continue to muddle around with this, my current approach does have the feel of a mis-guided solution strategy. Problem is, this is the only entry door I can see at present and once I get in the room it seems to be full of only less-than-desirable options.

        So let me back off a bit and state the problem in a more general way.

        Given:
          $str (some arbitrary string)
          $pat (a string to be used as a regex pattern)
          $repl (a string to be used as a regex RHS and
                which may contain $1, $2...)

        Can I / How can I perform the indicated regex substitution on $str?

        At this point in this project I *do* have control over the inputs and even the 'API' for the code that calls this routine. But I am hard put to imagine another way of specifying and requesting this kind of functionality.

        But try me... I'm open to suggestions. I would dearly love to find an elegant way of doing this.
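One eval-free way to meet the "given $str, $pat, $repl" statement above, as a sketch under a stated assumption: the replacement may contain only `$1`, `$2`, ... (or `${1}`), and everything else is literal text. `substitute_template` is a hypothetical name, not an existing function.

```perl
use strict;
use warnings;

sub substitute_template {
    my ($str, $pat, $repl) = @_;
    my $re   = qr/$pat/;    # compile the pattern once
    my $out  = '';
    my $last = 0;           # end offset of the previous match

    while ($str =~ /$re/g) {
        my ($start, $end) = ($-[0], $+[0]);

        # Copy the captures out of @- and @+ before any other regex runs.
        my @cap = map {
            defined $-[$_] ? substr($str, $-[$_], $+[$_] - $-[$_]) : ''
        } 0 .. $#+;

        # Expand $N and ${N} in a single pass, so captured text that
        # happens to contain '$2' is not expanded again.
        (my $text = $repl) =~ s/\$(?:\{([1-9]\d*)\}|([1-9]\d*))/
            my $n = defined $1 ? $1 : $2;
            defined $cap[$n] ? $cap[$n] : '';
        /ge;

        $out .= substr($str, $last, $start - $last) . $text;
        $last = $end;
    }
    return $out . substr($str, $last);
}

print substitute_template('abcadefaghi', '(a.)', '$1 '), "\n";    # ab cad efag hi
```

Because the output is built separately from the input, the number of capture groups is not limited to a fixed list.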

      To Joey The Saint -- sadly, I do not think the answer lies in the direction you suggest.

      The problem is that $pat and $repl come as shown to the routine in question from elsewhere. (The code above is, of course, a reduced demo to focus on the problem.) The contents of $1, $2, etc. (if any) are not known until $pat is invoked here, so I do not see an obvious and practical way to *pre*-load the regex vars in $repl using double-quotish behavior. The substitution must be done *after* $repl comes to this routine.

      I did travel a little way down the following path:

      #!/usr/bin/perl -w
      use strict;

      my $str  = 'abcadefaghi';
      my $pat  = '(a.)';
      my $repl = '\u$1 ';

      # $r is a per-match copy, so $repl keeps its '$1' for the next match.
      $str =~ s/$pat/
          my @save = (undef, $1, $2, $3, $4, $5, $6);
          my $r = $repl;
          $r =~ s{\$$_}{$save[$_]}g for 1..6;
          $r;
      /eg;

      print "$str\n";
      ...and that works for the '$1'-type vars without opening up the eval danger. The problem is that this approach then leaves me with having to write more similar code to handle all other special chars (e.g. '\u' as shown) with similar hand-coded replacements. Safe but yucky.
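To show what "more similar code" might amount to, here is a sketch that handles `\u` and `\l` directly in front of a `$N`, and nothing else. `expand_replacement`, `apply_case` and `%case_op` are hypothetical names; `\U...\E` and the other escapes would each need their own entry.

```perl
use strict;
use warnings;

# Each supported escape maps to the function that implements it.
my %case_op = (
    '\u' => sub { ucfirst $_[0] },
    '\l' => sub { lcfirst $_[0] },
);

sub apply_case {
    my ($op, $text) = @_;
    $text = '' unless defined $text;
    return defined $op ? $case_op{$op}->($text) : $text;
}

sub expand_replacement {
    my ($repl, @cap) = @_;    # @cap holds the caller's $1, $2, ...
    $repl =~ s/(\\[ul])?\$([1-9])/apply_case($1, $cap[$2 - 1])/ge;
    return $repl;
}

my $str  = 'abcadefaghi';
my $pat  = '(a.)';
my $repl = '\u$1 ';
$str =~ s/$pat/expand_replacement($repl, $1, $2, $3)/ge;

print "$str\n";    # Ab cAd efAg hi
```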

Node Type: perlquestion [id://234769]
Approved by Corion
Front-paged by VSarkiss