tonyz has asked for the wisdom of the Perl Monks concerning the following question:

The code below builds a hash in which each key is a regex search pattern and its value is the replacement. I don't know why backreferencing isn't working.

Where the search string (key) is: <CM>HTMLINSERT:<img src=([^>]+)></CM>

The replacement (key's value) is: <figure><graphic url=$1/></figure></p>

Given the following input from TEXTFILE: The quick <CM>HTMLINSERT:<img src="grab me!"></CM> brown fox

My OUTFILE reads: The quick <figure><graphic url=$1/></figure> brown fox

And what I want is: The quick <figure><graphic url="grab me!"/></figure> brown fox

open FSR, "ultimate_fsr.txt" or die "Couldn't open file: $!";
my %fsr_hash = ( );
my $search = undef;
my $replace = undef;
while (<FSR>) {
    ($search, $replace) = ($_=~m/"(.*?)" "(((\\")|[^"])*)"/);
    if ($search ne '' && $replace ne '') {
        $fsr_hash{$search} = $replace;
    }
}
open TEXTFILE, "test_file.txt" or die "Couldn't open file: $!";
open OUTFILE, ">output.txt" or die "Couldn't open file: $!";
while (my $line = <TEXTFILE>) {
    foreach my $key (keys %fsr_hash) {
        $line =~ s/$key/$fsr_hash{$key}/g;
    }
    print OUTFILE "$line";
}

Thanks for your kind attention!

Replies are listed 'Best First'.
Re: backreferencing fails in a search and replace with a hash
by ikegami (Patriarch) on Jul 31, 2008 at 20:04 UTC

    Interpolation only happens once. $fsr_hash{$key} gets interpolated, but the $1 inside $fsr_hash{$key}'s value does not. Imagine the following if things worked as you expected:

    $a = '$a'; print "$a";
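
A minimal sketch of the same effect inside s///, using made-up variable names rather than the OP's data: the replacement side is interpolated once, so a $1 that arrives inside a variable's value stays literal, while a $1 written directly in the replacement is filled in.

```perl
use strict;
use warnings;

my $text    = 'img src="grab me!"';
my $pattern = 'src=("[^"]*")';
my $stored  = 'url=$1';    # single quotes: a literal dollar sign followed by 1

# $stored is interpolated once; the $1 inside its value is left alone.
(my $from_variable = $text) =~ s/$pattern/$stored/;

# Here $1 is part of the replacement itself, so it is interpolated.
(my $from_literal = $text) =~ s/$pattern/url=$1/;

print "$from_variable\n";    # prints: img url=$1
print "$from_literal\n";     # prints: img url="grab me!"
```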

    You're asking us to help you get Perl to execute arbitrary data, which is never a wise move. A better alternative is to use a template. String::Interpolate would require the least amount of changes.
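
One way to read "use a template" without evaluating anything is sketched below. This is not String::Interpolate; it is a hand-rolled placeholder expansion, and the helper names expand_template and apply_rule are hypothetical. It assumes the only placeholders in the replacement are of the form $1, $2, and so on, and it fills them in from the captures of each match.

```perl
use strict;
use warnings;

# Hypothetical helper: replace $1, $2, ... in a template with the given
# captures. The template is treated as data and is never run as Perl code.
sub expand_template {
    my ($template, @captures) = @_;
    $template =~ s{\$(\d+)}{
        my $value = $1 >= 1 ? $captures[$1 - 1] : undef;
        defined $value ? $value : ''
    }ge;
    return $template;
}

# Hypothetical helper: apply one search/template pair to every match in $text.
sub apply_rule {
    my ($text, $search, $template) = @_;
    my ($out, $pos) = ('', 0);
    while ($text =~ /$search/g) {
        my ($start, $end) = ($-[0], $+[0]);
        # Rebuild the capture list from the match offsets in @- and @+.
        my @captures = map {
            defined $-[$_] ? substr($text, $-[$_], $+[$_] - $-[$_]) : undef
        } 1 .. $#-;
        $out .= substr($text, $pos, $start - $pos)
              . expand_template($template, @captures);
        $pos = $end;
    }
    return $out . substr($text, $pos);
}

my $line    = 'The quick <CM>HTMLINSERT:<img src="grab me!"></CM> brown fox';
my $search  = '<CM>HTMLINSERT:<img src=([^>]+)></CM>';
my $replace = '<figure><graphic url=$1/></figure>';

print apply_rule($line, $search, $replace), "\n";
# prints: The quick <figure><graphic url="grab me!"/></figure> brown fox
```

The trade-off is that the replacement strings can use numbered captures only, which seems to be all the OP's rule file needs.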

    See reinterpolation of regexp (and strings?) for an essentially identical question.

Re: backreferencing fails in a search and replace with a hash
by injunjoel (Priest) on Jul 31, 2008 at 20:04 UTC
Re: backreferencing fails in a search and replace with a hash
by almut (Canon) on Jul 31, 2008 at 21:06 UTC

    In this case, you could also use s///ee (double eval).

    my $line = 'The quick <CM>HTMLINSERT:<img src="grab me!"></CM> brown fox';
    my $key = '<CM>HTMLINSERT:<img src=([^>]+)></CM>';
    my %fsr_hash;
    $fsr_hash{$key} = '"<figure><graphic url=$1/></figure></p>"';
    $line =~ s/$key/$fsr_hash{$key}/gee;
    print "$line\n";

    ___

    $ ./701504.pl
    The quick <figure><graphic url="grab me!"/></figure></p> brown fox

    Unless I've overlooked something (which I'm sure someone would point out :), the usual worries about possibly executing arbitrary Perl code do not apply here, because whatever user input is in $1, despite the double eval it won't be evaluated — i.e. something like ... src=@{[...some evil code...]} ... in the input, will just produce

    The quick <figure><graphic url=@{[...some evil code...]}/></figure></p> brown fox

    (Of course, allowing arbitrary HTML code to be included could also be problematic... but that's another issue...)
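
The claim above can be checked with a small sketch. The input line here is invented for the test: the captured text contains @{[ 1+1 ]}, which would become 2 if it were evaluated. With /ee the first evaluation yields the stored string, and the second evaluates that string as a double-quoted Perl expression, which interpolates $1 but does not re-scan the text that $1 contributes.

```perl
use strict;
use warnings;

# Invented input: the src value would turn into 2 if it were ever evaluated.
my $line     = 'The quick <CM>HTMLINSERT:<img src=@{[ 1+1 ]}></CM> brown fox';
my $key      = '<CM>HTMLINSERT:<img src=([^>]+)></CM>';
my $template = '"<figure><graphic url=$1/></figure>"';

# First eval: $template yields its string value.
# Second eval: that string runs as Perl code, a double-quoted string using $1.
$line =~ s/$key/$template/gee;

print "$line\n";
# prints: The quick <figure><graphic url=@{[ 1+1 ]}/></figure> brown fox
```

Note that this only covers the matched input. The replacement strings themselves are executed as code, so the file they are read from has to be trusted.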

Re: backreferencing fails in a search and replace with a hash
by moritz (Cardinal) on Jul 31, 2008 at 20:05 UTC
    You need String::Interpolate (or something similar) and the /e modifier:
    use String::Interpolate qw(safe_interpolate);
    ...
    $line =~ s/$key/safe_interpolate($fsr_hash{$key})/e;

    (Untested). If you are sure that there are no "evil" components in the replace string you can use eval to do the work of String::Interpolate.

      moritz, I've just tested your suggestion, and it works. Thank you very much! (And thanks to the other Monks also for pointing me in the right direction!)
Re: backreferencing fails in a search and replace with a hash
by toolic (Bishop) on Jul 31, 2008 at 20:28 UTC
    Unrelated to your problem, but here is a suggestion.

    Populating your hash could also be done this way, with fewer variables and lines of code:

    my $re = qr/"(.*?)" "(((\\")|[^"])*)"/;
    my %fsr_hash;
    while (<FSR>) {
        if (/$re/) { $fsr_hash{$1} = $2 }
    }

    Update: Never mind the following drivel... the OP wants the extra parens so that they are interpolated. I'll leave the YAPE::Regex::Explain example alone, since it's not hurting anyone.

    It seems as though you have too many capturing parentheses in your regex. You are potentially capturing 4 things, but you are only using 2.

    use warnings;
    use strict;
    use YAPE::Regex::Explain;
    my $re = '"(.*?)" "(((\\")|[^"])*)"';
    print YAPE::Regex::Explain->new($re)->explain;

    outputs: