yacoubean has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to comment out all the cfmail blocks in my ColdFusion site, and something weird is happening. Here's my code:
$outdata =~ s{<cfmail}{<!--- <cfmail}g;
$outdata =~ s{</cfmail>}{</cfmail> --->}g;
Here's the expected result:
<!--- <cfmail to="#to_address#"> ...... </cfmail> --->
But this is what I'm getting:
<!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!-- +- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- < +!--- <!--- <cfmail to="#to_address#"> ..... </cfmail> ---> ---> ---> ---> ---> ---> ---> --->
I can't for the life of me figure out why its repeating those comment tags over and over again.

Re: Unexpected results from a regex replacement
by jimbojones (Friar) on Nov 10, 2004 at 22:06 UTC
    Hi

    Can you post a full code snippet with __DATA__ tags that show this effect? My test on 5.6.1 doesn't show the behavior that you're seeing.

    foreach my $outdata ( <DATA> ) {
        print "Init: $outdata\n";
        $outdata =~ s{<cfmail}{<!--- <cfmail}g;
        $outdata =~ s{</cfmail>}{</cfmail> --->}g;
        print "Final: $outdata\n\n";
    }
    __DATA__
    <cfmail to="#to_address#">
    </cfmail>
    The results were

    Init: <cfmail to="#to_address#">
    Final: <!--- <cfmail to="#to_address#">
    Init: </cfmail>
    Final: </cfmail> --->
    So it's hard to see where your issue is coming from. My only guess is that your regex code sits inside a loop that runs 25 times, once for each repeated comment opener.

    - j
      Ok, here's the entire sub:
      sub find_replace {
          my $filename = shift;
          open (my $infile, "$dir/$filename") or die "Can't open file: $!";
          my $outdata = "";
          while (<$infile>) {
              $outdata .= $_;
              $outdata =~ s{../CLRIS/}{}g;
              $outdata =~ s{../../menu/}{}g;
              $outdata =~ s{../../Images}{Images}g;
              $outdata =~ s{<cfmail}{<!--- <cfmail}g;
              $outdata =~ s{</cfmail>}{</cfmail> --->}g;
          }
          close $infile;
          open (my $outfile, "+>$dir/$filename") or die "Can't open file: $!";
          print $outfile "$outdata";
          close $outfile;
      }
      The first three replacements work correctly.

      P.S. I gave the before/after data in my first post. I'm using ActivePerl 5.8.4.810.
        The problem is that on every pass through the loop you append the new line to $outdata and then run your replacements against the entire accumulated text, not just the newly added line. To fix it, either run your replacements once, outside the loop, or run them only on the new line before appending it. Actually, I would suggest another approach without the loop.
        sub find_replace {
            my $filename = shift;
            open (my $infile, "$dir/$filename") or die "Can't open file: $!";
            local $/ = undef;
            my $outdata = <$infile>;
            close $infile;
            $outdata =~ s{../CLRIS/}{}g;
            $outdata =~ s{../../menu/}{}g;
            $outdata =~ s{../../Images}{Images}g;
            $outdata =~ s{<cfmail}{<!--- <cfmail}g;
            $outdata =~ s{</cfmail>}{</cfmail> --->}g;
            open (my $outfile, "+>$dir/$filename") or die "Can't open file: $!";
            print $outfile "$outdata";
            close $outfile;
        }
        Oh, while I'm looking at it, I suspect you don't actually mean s{../CLRIS/}{}g;. It seems much more likely that you mean s{\.\./CLRIS/}{}g;
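To make the escaping point concrete: an unescaped "." matches any character except a newline, so s{../CLRIS/}{}g will also strip prefixes that are not literally "../". The following is a minimal sketch; the sub names strip_loose and strip_escaped and the sample paths are made up for illustration and do not come from the original site.

```perl
use strict;
use warnings;

# Unescaped dots: ".." matches ANY two characters before "/CLRIS/".
sub strip_loose {
    my ($path) = @_;
    $path =~ s{../CLRIS/}{}g;
    return $path;
}

# Escaped dots: only a literal "../CLRIS/" is removed.
sub strip_escaped {
    my ($path) = @_;
    $path =~ s{\.\./CLRIS/}{}g;
    return $path;
}

# Hypothetical sample strings, invented for this example.
for my $path ('../CLRIS/page.cfm', 'ab/CLRIS/page.cfm') {
    printf "%-20s loose: %-10s escaped: %s\n",
        $path, strip_loose($path), strip_escaped($path);
}
```

Both versions turn "../CLRIS/page.cfm" into "page.cfm", but only the unescaped pattern also rewrites "ab/CLRIS/page.cfm", which is probably not what was intended.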
Re: Unexpected results from a regex replacement
by ikegami (Patriarch) on Nov 10, 2004 at 22:07 UTC

    The problem is outside what you showed us here, I bet. What you showed works fine (as seen below). Could you provide more of the code?

    $outdata = <<'__EOI__';
    <cfmail to="#to_address#">
    ......
    </cfmail>
    __EOI__
    $outdata =~ s{<cfmail}{<!--- <cfmail}g;
    $outdata =~ s{</cfmail>}{</cfmail> --->}g;
    print($outdata);
    __END__
    output
    ======
    <!--- <cfmail to="#to_address#">
    ......
    </cfmail> --->
Re: Unexpected results from a regex replacement
by bgreenlee (Friar) on Nov 10, 2004 at 22:08 UTC

    Because your regex just looks for <cfmail, which still occurs in the commented-out version. My guess is that it is repeated because you've run it multiple times. Try this (untested):

    $outdata =~ s{(?<!<!--- )<cfmail}{<!--- <cfmail}g;
    $outdata =~ s{</cfmail>(?! --->)}{</cfmail> --->}g;

    That will only replace cfmail tags that aren't preceded by a comment delimiter, and closing cfmail tags that aren't followed by a comment delimiter.
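As a rough check of that claim, the sketch below wraps the two substitutions in a sub (the name comment_cfmail is made up for this example) and applies it twice to the same text. Because of the negative lookbehind and lookahead, the second pass should find nothing left to replace.

```perl
use strict;
use warnings;

# Wrap cfmail tags in ColdFusion comments, skipping tags that already have
# "<!--- " directly in front of them or " --->" directly after them.
sub comment_cfmail {
    my ($text) = @_;
    $text =~ s{(?<!<!--- )<cfmail}{<!--- <cfmail}g;
    $text =~ s{</cfmail>(?! --->)}{</cfmail> --->}g;
    return $text;
}

my $once  = comment_cfmail(qq{<cfmail to="#to_address#">\n</cfmail>\n});
my $twice = comment_cfmail($once);

print $once;
print $once eq $twice
    ? "second pass changed nothing\n"
    : "second pass changed the text\n";
```

Note that the lookarounds only recognise a delimiter separated from the tag by exactly one space, which is the form these substitutions themselves produce.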

    -b

      Good thought, but that is not the case. I am deleting the comment tags after every run.

      Here is the original code copied/pasted directly from one of my pages:
      <cfmail to="#to_address#" .... </cfmail>
      Now, I just ran the script again and got (again, copied directly from the page after the script ran):
      <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <!--- <cfmail to="#to_address#" ..... </cfmail> ---> ---> ---> ---> ---> ---> ---> --->
      Ok, that's odd. Even though bgreenlee was wrong in saying that I wasn't clearing out the comment tags from previous runs, his code still did the trick. It doesn't make sense to me why
      $outdata =~ s{(?<!<!--- )<cfmail}{<!--- <cfmail}g;
      $outdata =~ s{</cfmail>(?! --->)}{</cfmail> --->}g;
      works better than
      $outdata =~ s{<cfmail}{<!--- <cfmail}g;
      $outdata =~ s{</cfmail>}{</cfmail> --->}g;
      but it does, so I'm not going to complain. :)
        The problem is correctly identified in Re^3: Unexpected results from a regex replacement (++). You are running the regexp on the $outdata every time you add a line to it.

        The reason the above regexp works is that it doesn't look for just any "<cfmail"; it looks for a "<cfmail" that isn't immediately preceded by a "<!--- " comment opener. Consider the following:

        my $outdata_v1 = "";
        my $outdata_v2 = "";
        my $data_offset = tell DATA;
        my $line_count = 1;
        print "First Regexp solution\n";
        print "-"x20, "\n";
        while ( <DATA> ) {
            $outdata_v1 .= $_;
            print "outdata for read of line $line_count before:\n$outdata_v1\n";
            $outdata_v1 =~ s{<cfmail}{<!--- <cfmail}g;
            $outdata_v1 =~ s{</cfmail>}{</cfmail> --->}g;
            print "outdata for read of line $line_count after:\n$outdata_v1\n";
            $line_count++;
        }
        #-- reset it all, start again with the better regexp.
        seek( DATA, $data_offset, 0);
        $line_count = 1;
        print "Second Regexp solution\n";
        print "-"x20, "\n";
        while ( <DATA> ){
            $outdata_v2 .= $_;
            print "outdata for read of line $line_count before:\n$outdata_v2\n";
            $outdata_v2 =~ s{(?<!<!--- )<cfmail}{<!--- <cfmail}g;
            $outdata_v2 =~ s{</cfmail>(?! --->)}{</cfmail> --->}g;
            print "outdata for read of line $line_count after:\n$outdata_v2\n";
            $line_count++;
        }
        __DATA__
        <cfmail to="#to_address#">
        </cfmail>
        <cfmail to="#to_address_2#">
        The output is:
        First Regexp solution
        --------------------
        outdata for read of line 1 before:
        <cfmail to="#to_address#">
        outdata for read of line 1 after:
        <!--- <cfmail to="#to_address#">
        outdata for read of line 2 before:
        <!--- <cfmail to="#to_address#">
        </cfmail>
        outdata for read of line 2 after:
        <!--- <!--- <cfmail to="#to_address#">
        </cfmail> --->
        outdata for read of line 3 before:
        <!--- <!--- <cfmail to="#to_address#">
        </cfmail> --->
        <cfmail to="#to_address_2#">
        outdata for read of line 3 after:
        <!--- <!--- <!--- <cfmail to="#to_address#">
        </cfmail> ---> --->
        <!--- <cfmail to="#to_address_2#">
        Second Regexp solution
        --------------------
        outdata for read of line 1 before:
        <cfmail to="#to_address#">
        outdata for read of line 1 after:
        <!--- <cfmail to="#to_address#">
        outdata for read of line 2 before:
        <!--- <cfmail to="#to_address#">
        </cfmail>
        outdata for read of line 2 after:
        <!--- <cfmail to="#to_address#">
        </cfmail> --->
        outdata for read of line 3 before:
        <!--- <cfmail to="#to_address#">
        </cfmail> --->
        <cfmail to="#to_address_2#">
        outdata for read of line 3 after:
        <!--- <cfmail to="#to_address#">
        </cfmail> --->
        <!--- <cfmail to="#to_address_2#">
        You can see that your original regexp (as Eimi Metamorphoumai correctly pointed out) is applied to everything read so far each time a new line is read, so every cfmail tag already in $outdata gains another comment delimiter on every pass. The second regexp solution does not add a new delimiter each time, since it is constructed to match only cfmail tags that are not already preceded (or, for closing tags, followed) by a comment delimiter.
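The same reasoning predicts how many delimiters pile up: a tag gains one delimiter on the pass that reads it and one more on every later pass, so the count equals the number of lines from the tag to the end of the file. Here is a small sketch of that arithmetic, using a made-up five-line file and a hypothetical sub name wrap_in_loop.

```perl
use strict;
use warnings;

# Reproduce the accumulate-then-substitute loop on an in-memory list of lines.
sub wrap_in_loop {
    my @lines   = @_;
    my $outdata = '';
    for my $line (@lines) {
        $outdata .= $line;
        $outdata =~ s{<cfmail}{<!--- <cfmail}g;      # re-scans earlier lines too
        $outdata =~ s{</cfmail>}{</cfmail> --->}g;
    }
    return $outdata;
}

# Hypothetical file: opening tag on line 2 of 5, closing tag on line 4 of 5.
my @file = (
    "<p>before</p>\n",
    "<cfmail to=\"x\">\n",
    "body\n",
    "</cfmail>\n",
    "<p>after</p>\n",
);
my $result = wrap_in_loop(@file);

# Count delimiters with the "= () =" list-assignment idiom.
my $openers = () = $result =~ /<!---/g;
my $closers = () = $result =~ /--->/g;
print "openers: $openers, closers: $closers\n";   # openers: 4, closers: 2
```

If that holds for the original file, the 25 openers and 8 closers in the question would be consistent with an opening tag 25 lines from the end of the file and a closing tag 8 lines from the end, though the thread does not show the file itself.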

Re: Unexpected results from a regex replacement
by dimar (Curate) on Nov 10, 2004 at 22:28 UTC

    Those spurious comment tags may be the result of some other part of your code, perhaps a loop. If you wanted to, you could comment out all the cfmail tags in your file with a single RegEx if you turn on the 's' flag at the end of your RegEx.

    The RegEx in the following sample code is a tad longer, but it comments out all the cfmail tags, even those that are not 'neatly typed in' but are still valid CFML. It needs no loop because the whole file is read into a single string and the 'g' flag replaces every match; the 's' flag that we tack on to the end lets '.' match newlines, so a cfmail block that spans several lines is still matched as one unit.

    ...
    before ...
    ...
    ### begin_: file metadata
    ### desc : comment out cfmail tags in a coldfusion file
    ### begin_: init perl
    use strict;
    use warnings;
    my $sTest = join '',<DATA>;
    ###
    ### begin_: do the replacement
    $sTest =~ s{(<\s*cfmail[^>]*>.*?\s*/\s*cfmail\s*>)}
               {<!---\n$1\n--->}gs;
    print $sTest;
    __DATA__
    <cfparam name="foo" value="fee" />
    <cfmail to="#to_address#">
    yadda yadda yadda
    </cfmail>
    <cfoutput>
    the following cfmail tag is messy, but still valid cold-fusion CFML
    </cfoutput>
    < cfmail to="#to_address#" >
    yadda yadda yadda
    </ cfmail >
    ...
    after ...
    ...
    <cfparam name="foo" value="fee" />
    <!---
    <cfmail to="#to_address#">
    yadda yadda yadda
    </cfmail>
    --->
    <cfoutput>
    the following cfmail tag is messy, but still valid cold-fusion
    </cfoutput>
    <!---
    < cfmail to="#to_address#" >
    yadda yadda yadda
    </ cfmail >
    --->
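To isolate what the 's' flag contributes, the sketch below counts matches of a simplified pattern (not the full one from the post above) against a block whose opening and closing tags sit on different lines, once without the flag and once with it.

```perl
use strict;
use warnings;

# A cfmail block that spans three lines.
my $text = qq{<cfmail to="#to_address#">\nyadda\n</cfmail>\n};

# Without /s, "." does not match "\n", so ".*?" cannot cross line boundaries.
my $without_s = () = $text =~ m{<cfmail[^>]*>.*?</cfmail>}g;

# With /s, "." also matches "\n", so the whole block is matched.
my $with_s = () = $text =~ m{<cfmail[^>]*>.*?</cfmail>}gs;

print "matches without /s: $without_s, with /s: $with_s\n";
```

The 'g' flag is what makes a single substitution handle every block in the string; 's' only changes what "." is allowed to match.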