vroom has asked for the wisdom of the Perl Monks concerning the following question:

Thanks for the suggestions in response to "Having scalars handled as regexs inside a substitution". Unfortunately, none of them worked the way I had hoped they would. This time I've included some sample code for you to dissect.

I'm working on some code that will allow arbitrary textual substitutions on anything that is outside of a given set of blocks, i.e. outside of <CODE>.*?</CODE> or outside of HTML tags. The first reason I want this is to split up long words in the chatterbox without breaking URLs.

So here's some test code I've been playing with.

#!/usr/bin/perl
my $string="realllylongstringthatrefusestoend".
    " <A HREF=\"http://perlmonks.org/images/blah/blah/blah\">\n";
print splitter($string,"<.*?>","\S{18}","$1 ");

sub splitter{
    my($string,$spliton,$find,$replace)=@_;
    my @array=split(/$spliton/,$string);
    my $i=0;
    my @splitters;
    my $str;
    while($string=~/($spliton)/g){
        push @splitters,$1;
    }
    for(@array){
        #none of these work
        #s/$find/$replace/eeg;
        #s/$find/$1 /g;
        #eval '$string' . " =~ s/$find/$replace/";
        #this works
        s/(\S{18})/$1 /g;
        $str.=$array[$i];
        $str.=$splitters[$i];
        $i++;
    }
    $str;
}

Replies are listed 'Best First'.
Re: Handling scalars as regexs within a substitution. (Take 2)
by ZZamboni (Curate) on May 22, 2000 at 20:20 UTC
    Ok, here it goes, some points I noticed:
    • Your $find argument contained no parentheses, so there was no capture group and $1 was never set. I think this was the main thing that kept your attempt with eval from working.
    • You were using double quotes for your "$1 " parameter, so Perl interpolated it at call time and your subroutine never saw the $1, only a space.
    • I tried enclosing the eval block in braces, but it does not work. I'm still a little bit puzzled about that.
    • I'm sure the for loop can be done without the indexing, but I'm also sure you just did that as a quick hack, so I'm not going to try to correct it :-)
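The quoting point in the list above is easy to check in isolation. This is only a minimal sketch; the variable names are made up for illustration. It also shows a side effect that, as far as I can tell, bites the original call as well: in a double-quoted string \S is not a recognised escape, so the backslash is dropped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Double quotes interpolate immediately. No match has run yet, so $1 is
# undef and "$1 " collapses to a single space before any sub sees it.
my ($double_repl, $double_find);
{
    no warnings;    # silence "uninitialized" and "unrecognized escape"
    $double_repl = "$1 ";
    $double_find = "\S{18}";    # \S is not a string escape: becomes S{18}
}

# Single quotes pass the characters through untouched.
my $single_repl = '$1 ';
my $single_find = '(\S{18})';

printf "double: repl=[%s] find=[%s]\n", $double_repl, $double_find;
printf "single: repl=[%s] find=[%s]\n", $single_repl, $single_find;
```

So with double quotes the subroutine receives a pattern that looks for eighteen capital S characters and a replacement that is just a space.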
    So the code below works:
    print splitter($string,"<.*?>",'(\S{18})','$1 ');

    sub splitter{
        my($string,$spliton,$find,$replace)=@_;
        my @array=split(/$spliton/,$string);
        my $i=0;
        my @splitters;
        my $str;
        while($string=~/($spliton)/g){
            push @splitters,$1;
        }
        my $a;
        for (@array){
            eval "s/$find/$replace/g; ";
            die "$@" if $@;
            $str.=$array[$i];
            $str.=$splitters[$i];
            $i++;
        }
        $str;
    }
    I still think that eval'ing regular expressions built up in a quoted string may be dangerous. But as long as you control the values of the expressions, it should be OK.
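To make the danger concrete, here is a small sketch of what can go wrong when the replacement comes from somewhere you do not control. The "hostile" value and the $side_effect variable are invented for the demonstration; a harmless assignment stands in for whatever an attacker would really run.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our $side_effect = 0;

my $text = 'abc';
my $find = '(a)';

# A hostile "replacement": the slash ends the substitution early, the rest
# is ordinary Perl, and the trailing # comments out the leftover "/g;".
my $replace = 'x/; $main::side_effect = 1; #';

# The string handed to eval is:
#   $text =~ s/(a)/x/; $main::side_effect = 1; #/g;
eval "\$text =~ s/$find/$replace/g;";
die $@ if $@;

print "text=$text side_effect=$side_effect\n";
```

The substitution still happens, but so does the extra statement, which is why the values need to be under your control.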

    Hope this helps,

    --ZZamboni

RE: Handling scalars as regexs within a substitution. (Take 2)
by ZZamboni (Curate) on May 22, 2000 at 22:09 UTC
    Ha! Got another solution, using what chromatic previously suggested. I discovered that it does work, but it ignores the space in '$1 ', so the string comes back unmodified. However, if the replacement string is specified as '$1." "', then because the /e modifier evaluates it as a Perl expression, it correctly puts the space after the value of $1.

    So here's another version that works:

    #!/usr/bin/perl
    my $string="realllylongstringthatrefusestoend <A HREF=\"http://perlmonks.org/images/blah/blah/blah\">\n";
    print splitter($string,"<.*?>",'(\S{18})','$1." "');

    sub splitter{
        my($string,$spliton,$find,$replace)=@_;
        my @array=split(/$spliton/,$string);
        my $i=0;
        my @splitters;
        my $str;
        while($string=~/($spliton)/g){
            push @splitters,$1;
        }
        for (@array){
            s/$find/$replace/eeg;
            $str.=$array[$i];
            $str.=$splitters[$i];
            $i++;
        }
        $str;
    }
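For anyone wondering why the space gets lost, it may help to see what each level of /e does to the same replacement string. This is a sketch with a made-up helper, try_replacement, and a three-character pattern so the output stays short.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $find = '(\S{3})';

# Hypothetical helper: run the substitution on 'abcdef' with zero, one or
# two levels of /e and return the result.
sub try_replacement {
    my ($replace, $levels) = @_;
    my $text = 'abcdef';
    if    ($levels == 0) { $text =~ s/$find/$replace/g }
    elsif ($levels == 1) { $text =~ s/$find/$replace/eg }
    else                 { $text =~ s/$find/$replace/eeg }
    return $text;
}

for my $case (['$1 ', 0], ['$1 ', 1], ['$1 ', 2], ['$1." "', 2]) {
    my ($replace, $levels) = @$case;
    printf "%-8s with %d x /e gives [%s]\n",
        $replace, $levels, try_replacement($replace, $levels);
}
```

With no /e or a single /e the text of $replace is inserted literally, giving [$1 $1 ]. With /ee the string '$1 ' is itself evaluated as Perl, and the value of that expression is just $1, so the result is [abcdef] again. Only '$1." "' evaluates to the capture plus a space, giving [abc def ].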
    I believe this has the same security problems as evaluating with double quotes, because it allows the execution of arbitrary Perl code.
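If the caller can pass something other than plain strings, one way around the eval problem is to take the patterns as qr// objects and the replacement as a code ref. This is a different technique from the versions above, not a drop-in fix, and splitter_cb is a name made up for this sketch. It also uses capturing parentheses in split, which makes split return the delimiters too, so no index bookkeeping is needed. It assumes $spliton contains no capture groups of its own, since those would add extra fields to the list.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical variant of splitter(): $spliton and $find are qr// objects,
# $replace is a code ref that receives $1 and returns the replacement.
sub splitter_cb {
    my ($string, $spliton, $find, $replace) = @_;
    my $out      = '';
    my $is_block = 0;
    # Capturing parens make split return the delimiters as well, so the
    # list alternates: outside text, block, outside text, block, ...
    for my $piece (split /($spliton)/, $string) {
        $piece =~ s/$find/$replace->($1)/ge unless $is_block;
        $out .= $piece;
        $is_block = !$is_block;
    }
    return $out;
}

my $string = qq{realllylongstringthatrefusestoend <A HREF="http://perlmonks.org/images/blah/blah/blah">\n};
print splitter_cb($string, qr/<.*?>/, qr/(\S{18})/, sub { "$_[0] " });
```

The /e here only runs the call to the code ref that is written in the source, so nothing taken from the arguments is ever compiled as Perl.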

    --ZZamboni

Re: Handling scalars as regexs within a substitution. (Take 2)
by Anonymous Monk on Oct 24, 2001 at 18:07 UTC
    Just looking through the archives to find a better solution than the one that I hacked up, but I couldn't find it!
    So, I'll offer my CrapCode to the monastery or something
    # FUNCTION: sub_with_str
    # Safely do a pattern-matching substitution from two strings (i.e. the pattern
    # itself and the substitution string, complete with $1's etc., are passed in
    # as strings). Note that no pattern validity check is done on $old_patt; check
    # validity before calling this function. Returns the new substituted string.
    sub sub_with_str {
        my ($string, $old_patt, $new_patt) = @_;
    
        # No /o here: it would compile the first $old_patt once and keep
        # reusing it on later calls that pass a different pattern.
        my @matches = ( $string =~ m/$old_patt/ );
        for (my $i=1; $i <= @matches; $i++) {
            # Find a dollar sign that is not preceded by an escape character and
            # is followed by $i (a number). For example, $1foo${2}bar\$3 will
            # match on $1 and ${2}, but not on $3. Substitute all occurrences
            # with their actual match, which was found above and put in the
            # @matches array.
            my $patt_part = '';
            while ($new_patt =~ s/(.*?)(?:\A|(?<=[^\\]))\$(?:$i|\{$i\})(.*)/$2/) {
                $patt_part .= $1 . $matches[$i-1];
            }
            $new_patt = $patt_part . $new_patt;
        }
    
        # Get rid of any other $n (found as explained above) since they weren't
        # found as matches.
        $new_patt =~ s/(?:\A|(?<=[^\\]))\$(?:\d+|\{\d+\})//g;
    
        eval { $string =~ s/$old_patt/$new_patt/ };
    
        return $string;
    }
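For comparison, here is a sketch of another way to get the same effect without eval: expand the $1 and ${1} placeholders by hand for each match, reading the captures from the @- and @+ offset arrays. The names sub_with_template, expand_template and _capture are invented for this example, and unlike the function above it replaces every match and turns \$ into a literal dollar sign. Treat it as an illustration under those assumptions rather than a replacement.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Matches \$ (escaped dollar), $N, or ${N}. Capture 1 is the escaped
# dollar; captures 2 and 3 hold the group number.
my $placeholder = qr/\\(\$) | \$(\d+) | \$\{(\d+)\}/x;

# Return capture number $n from the list, or '' if it does not exist.
sub _capture {
    my ($captures, $n) = @_;
    my $value = $n >= 1 ? $captures->[$n - 1] : undef;
    return defined $value ? $value : '';
}

# Fill a template such as '$2 ${1}' from a list of captured strings.
sub expand_template {
    my ($template, @captures) = @_;
    (my $out = $template) =~ s/$placeholder/
        defined $1 ? $1
                   : _capture(\@captures, defined $2 ? $2 : $3)
    /ge;
    return $out;
}

sub sub_with_template {
    my ($string, $pattern, $template) = @_;
    my $original = $string;    # offsets in @- and @+ refer to this copy
    $string =~ s/$pattern/
        expand_template(
            $template,
            map { defined $-[$_]
                    ? substr($original, $-[$_], $+[$_] - $-[$_])
                    : undef } 1 .. $#+
        )
    /ge;
    return $string;
}

print sub_with_template('hello world', '(\w+) (\w+)', '$2 ${1}'), "\n";
```

The only /e code that runs is what is written in the source; the template is treated purely as data, so a stray slash or semicolon in it ends up in the output instead of being executed.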