Handling scalars as regexs within a substitution. (Take 2)

vroom has asked for the wisdom of the Perl Monks concerning the following question:

Thanks for the suggestions in response to Having scalars handled as regexs inside a substitution... unfortunately none of them worked the way I had hoped they would. Fortunately I've given you some sample code to dissect.

I'm working on some code that will allow arbitrary textual substitutions on anything that is outside of a given set of blocks, i.e. outside of <CODE>.*?</CODE> or outside of HTML tags. The first reason I want this is to split up long words in the chatterbox without breaking URLs.

So here's some test code I've been playing with.

#!/usr/bin/perl

my $string="realllylongstringthatrefusestoend".
" <A HREF=\"http://perlmonks.org/images/blah/blah/blah\">\n";
print splitter($string,"<.*?>","\S{18}","$1 ");
sub splitter{
    my($string,$spliton,$find,$replace)=@_;
    my @array=split(/$spliton/,$string);
    my $i=0;
    my @splitters;
    my $str;
    while($string=~/($spliton)/g){
        push @splitters,$1;
    }
    for(@array){

        #none of these work
        #s/$find/$replace/eeg;
        #s/$find/$1 /g;
        #eval '$string' . " =~ s/$find/$replace/";

        #this works
        s/(\S{18})/$1 /g;
        $str.=$array[$i];
        $str.=$splitters[$i];
        $i++;
    }
    $str;
}
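
The second pass that collects the delimiters is not strictly needed: when the pattern given to split contains capturing parentheses, the captured delimiters are returned in the result list as well. Here is a minimal sketch of that idea, assuming the delimiter pattern has no capture groups of its own. The name split_keep and the hard-coded 18-character rule are made up for this illustration and are not part of the code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: apply a fixed substitution only to the text
# that lies outside the delimiters matched by $spliton.
sub split_keep {
    my ($string, $spliton) = @_;

    # Capturing parentheses make split return the delimiters too,
    # so the list alternates: text, delimiter, text, delimiter, ...
    # Assumes $spliton contains no capture groups of its own.
    my @parts = split /($spliton)/, $string;

    for my $i (0 .. $#parts) {
        next if $i % 2;                  # odd slots hold delimiters
        $parts[$i] =~ s/(\S{18})/$1 /g;  # break up long words
    }
    return join '', @parts;
}

my $string = "realllylongstringthatrefusestoend"
           . " <A HREF=\"http://perlmonks.org/images/blah/blah/blah\">\n";
print split_keep($string, '<.*?>');
```

With the test string from the question, the long word gets a space after its first 18 characters while the URL inside the tag is left alone.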

Replies are listed 'Best First'.
Re: Handling scalars as regexs within a substitution. (Take 2) by ZZamboni (Curate) on May 22, 2000 at 20:20 UTC
Ok, here it goes, some points I noticed:

You were not giving any parentheses in your $find argument, so the $1 was not matching anything. I think this was the main thing that kept your attempt with eval from working.

You were using double quotes for your "$1 " parameter, which made Perl interpolate it at call time, so your subroutine never saw the $1, only a space.

I tried enclosing the eval block in braces, but it does not work. I'm still a little bit puzzled about that.

I'm sure the for loop can be done without the indexing, but I'm also sure you just did that as a quick hack, so I'm not going to try to correct it :-)

So the code below works:

print splitter($string,"<.*?>",'(\S{18})','$1 ');
sub splitter{
    my($string,$spliton,$find,$replace)=@_;
    my @array=split(/$spliton/,$string);
    my $i=0;
    my @splitters;
    my $str;
    while($string=~/($spliton)/g){
        push @splitters,$1;
    }
    my $a;
    for (@array){
        eval "s/$find/$replace/g; ";
        die "$@" if $@;
        $str.=$array[$i];
        $str.=$splitters[$i];
        $i++;
    }
    $str;
}

I still think that eval'ing regular expressions in a quote block may be dangerous. But as long as you control the values of the expressions, it should be ok.

Hope this helps,

--ZZamboni
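
The two quoting problems described in this reply can be seen in isolation. Here is a small sketch that compares what a subroutine would actually receive under each quoting style. The variable names are invented for this illustration.

```perl
#!/usr/bin/perl
use strict;
no warnings;   # the double-quoted forms below are deliberately wrong

# Double quotes process backslash escapes and interpolate variables
# at the call site, before the subroutine ever sees the arguments.
my $find_dq    = "\S{18}";    # unrecognized escape: the backslash is dropped
my $replace_dq = "$1 ";       # $1 is empty here, so only a space is left

# Single quotes pass the text through untouched.
my $find_sq    = '(\S{18})';  # parentheses added so that $1 gets set
my $replace_sq = '$1 ';

print "double-quoted find:    [$find_dq]\n";
print "double-quoted replace: [$replace_dq]\n";
print "single-quoted find:    [$find_sq]\n";
print "single-quoted replace: [$replace_sq]\n";
```

Running it shows that the double-quoted pattern arrives as S{18} and the double-quoted replacement as a lone space, which matches the behaviour described above.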
RE: Handling scalars as regexs within a substitution. (Take 2) by ZZamboni (Curate) on May 22, 2000 at 22:09 UTC
Ha! Got another solution, using what chromatic previously suggested. I discovered that it is working, but it is ignoring the space in the '$1 ', so the string is not modified. However, if the replacement string is specified as '$1." "', because the /e modifier evaluates it as a Perl expression, it correctly puts the space after the value of $1. So here's another version that works:

#!/usr/bin/perl

my $string="realllylongstringthatrefusestoend <A HREF=\"http://perlmonks.org/images/blah/blah/blah\">\n";
print splitter($string,"<.*?>",'(\S{18})','$1." "');
sub splitter{
    my($string,$spliton,$find,$replace)=@_;
    my @array=split(/$spliton/,$string);
    my $i=0;
    my @splitters;
    my $str;
    while($string=~/($spliton)/g){
        push @splitters,$1;
    }
    for (@array){
        s/$find/$replace/eeg;
        $str.=$array[$i];
        $str.=$splitters[$i];
        $i++;
    }
    $str;
}

I believe this has the same security problems as evaluating with double quotes, because it allows the execution of arbitrary perl code.

--ZZamboni
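
The difference between the two replacement strings under /ee can be checked directly. The sketch below also shows one possible way around the security concern when the caller is trusted Perl code rather than user input: pass a code reference and use a single /e. That alternative is an assumption of this example and was not proposed in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $word = "realllylongstringthatrefusestoend";
my $find = '(\S{18})';

# Under /ee the replacement is first evaluated to a string, and that
# string is then evaluated again as Perl code.
my $plain  = '$1 ';      # as code: just $1, the trailing space is ignored
my $concat = '$1." "';   # as code: $1 concatenated with a space

(my $unchanged = $word) =~ s/$find/$plain/eeg;
(my $spaced    = $word) =~ s/$find/$concat/eeg;

# Alternative (an assumption of this sketch, not from the thread):
# a code ref as the replacement, which needs only a single /e and
# never evaluates a string as code.
my $callback = sub { "$_[0] " };
(my $via_code = $word) =~ s/$find/$callback->($1)/ge;

print "plain:    [$unchanged]\n";
print "concat:   [$spaced]\n";
print "code ref: [$via_code]\n";
```

The first result is identical to the input, while the second and third both insert a space after the first 18 characters.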
Re: Handling scalars as regexs within a substitution. (Take 2) by Anonymous Monk on Oct 24, 2001 at 18:07 UTC
Just looking through the archives to find a better solution than the one that I hacked up, but I couldn't find it! So, I'll offer my CrapCode to the monastery or something:

# FUNCTION: sub_with_str
# Safely do a pattern-matching substitution from two strings (i.e. the pattern
# itself and the substitution string - complete with $1's, etc - are passed in
# as strings). Note that no pattern validity check is done on $old_patt; check
# validity before calling this function. Return the new substituted string.
sub sub_with_str {
    my ($string, $old_patt, $new_patt) = @_;
    my @matches = ( $string =~ m/$old_patt/o );
    for (my $i=1; $i <= @matches; $i++) {
        # Find a dollar sign that is not preceded by an escape character and
        # $i (a number). For example, $1foo${2}bar\$3 will match on $1 and
        # ${2}, but not on $3. Substitute all occurrences with their actual
        # match, which was found above and put in the @matches array.
        my $patt_part = '';
        while ($new_patt =~ s/(.?)(?:\A|(?<=[^\\]))\$(?:$i|\{$i\})(.)/$2/) {
            $patt_part .= $1 . $matches[$i-1];
        }
        $new_patt = $patt_part . $new_patt;
    }
    # Get rid of any other $n (found as explained above) since they weren't
    # found as matches.
    $new_patt =~ s/(?:\A|(?<=[^\\]))\$(?:\d+|\{\d+\})//go;
    eval { $string =~ s/$old_patt/$new_patt/o };
    return $string;
}
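
The idea in this reply is to expand $1-style placeholders in a replacement string by hand instead of handing the string to eval. The same idea can also be sketched as a single-pass expansion inside the substitution. What follows is a simplified alternative, not the poster's code. It assumes that placeholders take the form $N or ${N}, that a backslash escapes the character after it, that captures which did not take part in the match expand to an empty string, and that every match should be replaced. The names subst_template, expand and _capture are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Look up capture number $n in the list; $caps->[0] holds $1.
sub _capture {
    my ($caps, $n) = @_;
    return '' if $n < 1 || $n > @$caps;
    my $v = $caps->[$n - 1];
    return defined $v ? $v : '';
}

# Expand $N and ${N} in $template. A backslash-escaped character is
# copied through literally, so \$1 yields the text $1.
sub expand {
    my ($template, $caps) = @_;
    $template =~ s/\\(.)|\$(?:(\d+)|\{(\d+)\})/defined $1 ? $1 : _capture($caps, defined $2 ? $2 : $3)/ge;
    return $template;
}

# Replace every match of $pattern in $string with the expanded template.
sub subst_template {
    my ($string, $pattern, $template) = @_;
    my $orig = $string;   # offsets in @- and @+ refer to the original text
    $string =~ s{$pattern}{
        my @caps = map {
            defined $-[$_] ? substr($orig, $-[$_], $+[$_] - $-[$_]) : undef
        } 1 .. $#+;
        expand($template, \@caps);
    }ge;
    return $string;
}

print subst_template("realllylongstringthatrefusestoend", '(\S{18})', '$1 '), "\n";
print subst_template("2000-05-22", '(\d+)-(\d+)-(\d+)', '${3}/$2/$1'), "\n";
```

Because the template is only ever treated as data, a replacement string containing Perl code is inserted as plain text rather than executed.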