batkins has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks.

I'm trying to write a regular expression that will replace all single quotes (') not contained in <%...%> tags with another string. I've been trying for some time now and feel like a moron for not being able to get it. Here's what I've come up with:

s/(%>.*)'(.*<%)/${1}test$2/g;

It even seems to work in test cases, but when I apply it to this file:

"Avert thine eyes, lest ye be stricken with coolness." (from the original batkins web site, dated April 8, 2002 )<br><br>
<a href="http://batkins.com/forum/YaBB.pl">Enter the batkins forum</a> - it's the place to be.
<br><br>
batkins.com is best viewed from the Hubble Telescope.
<br><br>
<hr>
<font size="+1"><b><u>the batkins weblog - it's nature's candy</u></b></font>
<br><br>
<%
# news
my @news = split(/%%/, slurp("$DAT/news"));
my $bit;
my $i;
for($i = 0; $i < 10; $i++) {
    $bit = "<b>$news[$i]</b><br>" . $news[$i + 1] . "<p>";
    print $bit;
    $i++;
}
%>
<a href="/main/news.pl">View the rest of the weblog (<% print scalar(@news) - $i; %> more entries)</a>
<p>
<font face="-2"><a href="/main/all.pl">the batkins directory</a></font>
It doesn't do anything. Any idea why?

Thanks for your help, and let me know if you need any more info.

Re: silly regex question
by chromatic (Archbishop) on Jan 19, 2003 at 00:39 UTC

    It doesn't work because there's no end token (%>) before any of the single quotes in your file, so the (%>.*) half of the pattern never gets a chance to match.
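    A minimal sketch of that point, using two made-up sample lines (not from the thread): the original substitution leaves the first line alone because no %> precedes its quote, and only fires when a quote sits between a %> and a later <%.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample lines, not taken from the thread.
my @samples = (
    q{it's the place to be <% code %>},            # no %> before the quote
    q{<% code %> it's the place to be <% more %>}, # quote between %> and <%
);

my @results;
for my $line (@samples) {
    # The substitution from the original question, applied to a copy.
    (my $copy = $line) =~ s/(%>.*)'(.*<%)/${1}test$2/g;
    push @results, $copy;
    my $status = $copy eq $line ? 'unchanged: ' : 'changed:   ';
    print "$status$copy\n";
}
```

    Only the second line changes (to "<% code %> ittests the place to be <% more %>"), which matches what happens with the real file: none of its quotes follow a %>.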

    I'm not sure there's a way to do this within a single, maintainable regular expression. I'd rather inch through the file, keeping a flag that says whether I'm inside an embedded code sequence, and replace single quotes only while I'm not.
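    A rough sketch of that flag-based approach, assuming the only markers that matter are the literal <% and %>, and that the replacement string ('test' here) is whatever you pass in. The function name replace_outside_tags is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: walk the text token by token, tracking whether we are
# inside a <% ... %> block, and replace single quotes only when we are not.
sub replace_outside_tags {
    my ($text, $replacement) = @_;
    my $out     = '';
    my $in_code = 0;    # flag: true between <% and %>

    # Each pass consumes one token: a tag marker, a quote, a run of ordinary
    # characters, or a lone '<' or '%' that is not part of a marker.
    while ($text =~ /\G(<%|%>|'|[^<%']+|.)/gs) {
        my $token = $1;
        if    ($token eq '<%') { $in_code = 1 }
        elsif ($token eq '%>') { $in_code = 0 }
        elsif ($token eq "'" && !$in_code) { $token = $replacement }
        $out .= $token;
    }
    return $out;
}

my $result = replace_outside_tags(q{it's <% print 'x'; %> nature's}, 'test');
print "$result\n";    # ittests <% print 'x'; %> naturetests
```

    Keeping the state in an explicit flag makes edge cases easy to decide: here an unclosed <% leaves the flag set, so later quotes are left alone, whereas Abigail-II's reply removes them.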

Re: silly regex question
by pg (Canon) on Jan 19, 2003 at 02:28 UTC

    I asked a similar question a while ago, and lots of fellow monks helped me at the time. I learned a lot (and their names are much more familiar to me now than they were back then ;-)

    With some slight changes to some of those solutions, you will get what you want.

    I don't want to steal their clever solutions and make them look like mine, so I'll just give you the link to that thread, where you can see those solutions and their authors.

Re: silly regex question
by bart (Canon) on Jan 19, 2003 at 10:24 UTC
    i'm trying to write a regular expression that will replace all single quotes (') not contained in <%...%> tags with a separate string.

    The recipe I've used for years now: match the skip sequence or the string you want to match, in that order. If you found a skip sequence, replace the matched text with itself; otherwise, replace it with whatever you want.

    A conversion to simple code:

    s/(<%.*?%>)|(')/$1 || 'test'/seg;
    I didn't have to put parens around the second pattern in this simple case, but they capture the matched text, which helps if you want to match more than a single literal string and need to know what you matched: it'll be in $2.
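    To illustrate the $2 point, here is a hedged variation that still skips <%...%> blocks but replaces two different characters outside them. The %replacement_for hash and its HTML-entity values are made up for the example; only the skip-or-match structure comes from the recipe above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed mapping for illustration: which text replaces each matched character.
my %replacement_for = ("'" => '&#39;', '"' => '&quot;');

my $text = qq{say "hi" it's <% print "x" . 'y'; %> done\n};

# Alternative 1 ($1) is the skip sequence, kept as-is.
# Alternative 2 ($2) tells us which character we hit, so we can look it up.
(my $out = $text) =~ s/(<%.*?%>)|(['"])/defined $1 ? $1 : $replacement_for{$2}/seg;
print $out;
```

    Using defined $1 instead of $1 || ... is slightly more defensive: it keeps working even if a skipped sequence could ever be a false value such as "0".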
Re: silly regex question
by Abigail-II (Bishop) on Jan 19, 2003 at 15:27 UTC
    #!/usr/bin/perl

    use strict;
    use warnings;

    while (<DATA>) {
        s/(<%(?:[^%]+|%[^>])*%>)|'/$1 || ""/eg;
        print;
    }

    __DATA__
    Leave this line alone.
    Remove this quote ' and these two quotes '' from this line.
    Leave <% the first quote ' %> alone, but remove the second ' one.
    Remove from <% an open but not closed ' tag.
    ' Leading quote?
    <% Multiple ' tags %> ' on one ' <% line ''' work %> too ' !

    Running this gives:

    Leave this line alone.
    Remove this quote  and these two quotes  from this line.
    Leave <% the first quote ' %> alone, but remove the second  one.
    Remove from <% an open but not closed  tag.
     Leading quote?
    <% Multiple ' tags %>  on one  <% line ''' work %> too  !
Re: silly regex question
by I0 (Priest) on Jan 19, 2003 at 06:38 UTC
$_ = <<'ENDHERE';    # quoted terminator: don't interpolate $DAT, @news, $i, ...
"Avert thine eyes, lest ye be stricken with coolness." (from the original batkins web site, dated April 8, 2002 )<br><br>
<a href="http://batkins.com/forum/YaBB.pl">Enter the batkins forum</a> - it's the place to be.
<br><br>
batkins.com is best viewed from the Hubble Telescope.
<br><br>
<hr>
<font size="+1"><b><u>the batkins weblog - it's nature's candy</u></b></font>
<br><br>
<%
# news
my @news = split(/%%/, slurp("$DAT/news"));
my $bit;
my $i;
for($i = 0; $i < 10; $i++) {
    $bit = "<b>$news[$i]</b><br>" . $news[$i + 1] . "<p>";
    print $bit;
    $i++;
}
%>
<a href="/main/news.pl">View the rest of the weblog (<% print scalar(@news) - $i; %> more entries)</a>
<p>
<font face="-2"><a href="/main/all.pl">the batkins directory</a></font>
ENDHERE

# A quote outside a tag matches the first alternative ($1 undefined -> 'test');
# a whole <%...%> block matches the second and is put back unchanged.
s/'|(<%.*?%>)/$1||'test'/egs;
print;
Re: silly regex question
by Anonymous Monk on Jan 19, 2003 at 03:41 UTC
    You need the /s modifier so that . matches newlines (see perldoc perlre). But it won't match anyway, because there isn't a %> sequence before the first single quote. This version also accepts the start of the string in place of %>, and the end of the string in place of <%:
    s/
        (
            (?:       # non-capturing parens
                ^|%>  # start of string or %>
            )
            .*
        )
        '
        (
            .*
            (?:<%|$)
        )
    /$1 SOMESTRING $2/gsx;   # add s and x modifiers
    I'm not at all sure this is a good or robust solution however, and I'm looking forward to the other replies.
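    Two small checks on this reply, using made-up input. The first shows what /s changes. The second shows one weakness of the greedy .*: the first .* grabs as much as it can, so each run of the substitution replaces only the last single quote in the string, whether or not that quote is inside a tag.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# 1. '.' does not match "\n" unless /s is given.
my $two_lines = "a\nb";
my $plain  = ($two_lines =~ /a.b/)  ? 1 : 0;
my $with_s = ($two_lines =~ /a.b/s) ? 1 : 0;
print "without /s: $plain, with /s: $with_s\n";

# 2. The posted pattern on a hypothetical line with three quotes.
my $text = "it's <% q' %> x'y";
(my $out = $text) =~ s/
    ( (?: ^|%> ) .* )    # start of string or %>, then greedy .*
    '
    ( .* (?:<%|$) )      # greedy .* up to <% or end of string
/$1 SOMESTRING $2/gsx;
print "$out\n";
```

    Only the quote in "x'y" is replaced; the quote in "it's" survives, which is why the alternation recipes in the other replies are more dependable.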
Re: silly regex question
by batkins (Chaplain) on Jan 19, 2003 at 04:05 UTC
    Thanks, guys. I figured it out. I would explain what I did, but the solution actually has nothing to do with this and would be no fun to explain.

    Thanks again.

    -bill