silly regex question

batkins has asked for the wisdom of the Perl Monks concerning the following question:

hi monks.

i'm trying to write a regular expression that will replace all single quotes (') not contained in <%...%> tags with a separate string. i've been trying for some time now and feel like a moron for not being able to get it. here's what i've come up with:

s/(%>.*)'(.*<%)/${1}test$2/g;

it even seems to work in test cases. but when i apply it to this file:

"Avert thine eyes, lest ye be stricken with coolness." (from the original batkins web site, dated April 8, 2002
)<br><br>
<a href="http://batkins.com/forum/YaBB.pl">Enter the batkins forum</a> - it's the place to be.
<br><br>
batkins.com is best viewed from the Hubble Telescope.
<br><br>
<hr>
<font size="+1"><b><u>the batkins weblog - it's nature's candy</u></b></font>
<br><br>
<%
    # news
    my @news = split(/%%/, slurp("$DAT/news"));
    my $bit;
    my $i;
    for($i = 0; $i < 10; $i++)
    {
        $bit = "<b>$news[$i]</b><br>" . $news[$i + 1] . "<p>";
        print $bit;
        $i++;
    }

%>
<a href="/main/news.pl">View the rest of the weblog (<% print scalar(@news) - $i; %> more entries)</a>
<p>
<font face="-2"><a href="/main/all.pl">the batkins directory</a></font>

it doesn't do anything. any idea why?

thanks for your help and let me know if you need any more info.

Re: silly regex question by chromatic (Archbishop) on Jan 19, 2003 at 00:39 UTC
It doesn't work because there's no end token (`%>`) before any of the single quotes. I'm not sure there's a way to do this within a single, maintainable regular expression. I'd rather inch through the file, keeping a flag to signify whether or not I'm in an embedded code sequence, and replace single quotes only while I'm not.
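A minimal sketch of the flag-based scan described above, assuming `<%` and `%>` never nest and that an unclosed `<%` hides everything after it. The sub name `replace_outside_code` and the replacement string `test` are illustrative choices, not code from the thread.

```perl
use strict;
use warnings;

# replace_outside_code is a hypothetical helper: it walks the text piece by
# piece, tracks whether it is inside a <% ... %> block, and replaces single
# quotes only while outside one.
sub replace_outside_code {
    my ($text, $replacement) = @_;
    my $in_code = 0;    # the flag: true between <% and %>
    my $out     = '';

    # A capturing group in split's pattern keeps the delimiters in the list.
    for my $piece (split /(<%|%>)/, $text) {
        my $chunk = $piece;    # work on a copy of the loop variable
        if ($chunk eq '<%') {
            $in_code = 1;      # entering embedded code
        }
        elsif ($chunk eq '%>') {
            $in_code = 0;      # leaving embedded code
        }
        elsif (!$in_code) {
            $chunk =~ s/'/$replacement/g;    # plain HTML: replace every quote
        }
        $out .= $chunk;
    }
    return $out;
}

print replace_outside_code(qq{it's <% print 'code'; %> and it's done\n}, 'test');
```

Splitting on the delimiters keeps the state handling in ordinary Perl, so rules such as "an unclosed tag runs to the end" are easy to change without touching a regex.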
Re: silly regex question by pg (Canon) on Jan 19, 2003 at 02:28 UTC
I asked a similar question a while ago, and lots of fellow monks helped me at the time. I learned a lot (now their names are much more familiar to me than they were back then ;-). If you make some slight changes to some of those solutions, you will get what you want. I don't want to steal their smart solutions and make them look like mine, so I'll just give you the link to that thread, where you can see those solutions and their authors.
Re: silly regex question by bart (Canon) on Jan 19, 2003 at 10:24 UTC
"i'm trying to write a regular expression that will replace all single quotes (') not contained in <%...%> tags with a separate string."

The recipe I've used for years now: match the skip sequence or the string you want to match, in that order. If you found a skip sequence, replace the matched string by itself; otherwise, replace it by whatever you want. A conversion to simple code:

`s/(<%.*?%>)|(')/$1 || 'test'/seg;`

I didn't have to put parens around the second pattern in this simple case, but it's the way to capture the matched text in case you want to match more than a single literal string and you want to know what you matched: it'll be in $2.
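To try the one-liner above on text shaped like the original file, here is a minimal runnable sketch; the wrapper sub `protect_tags` and the sample text are illustrative, not from the thread. Because of /s, `.*?` can cross newlines, so a `<% ... %>` block spanning several lines is skipped as a whole.

```perl
use strict;
use warnings;

# protect_tags is a hypothetical wrapper around the substitution above.
# /e evaluates the replacement as Perl code: a matched <% ... %> tag is put
# back unchanged ($1), while a bare quote leaves $1 undefined, so || picks 'test'.
sub protect_tags {
    my ($text) = @_;
    $text =~ s/(<%.*?%>)|(')/$1 || 'test'/seg;
    return $text;
}

# Single-quoted here-doc, so nothing in the sample is interpolated.
my $page = <<'END';
it's here
<%
  print 'inside';
%>
nature's candy
END

print protect_tags($page);
```

An unclosed `<%` is not treated as a tag here: `.*?%>` cannot find its end, so quotes after it are still replaced.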
Re: silly regex question by Abigail-II (Bishop) on Jan 19, 2003 at 15:27 UTC
#!/usr/bin/perl

use strict;
use warnings;

while (<DATA>) {
    s/(<%(?:[^%]+|%[^>])*%>)|'/$1 || ""/eg;
    print;
}

__DATA__
Leave this line alone.
Remove this quote ' and these two quotes '' from this line.
Leave <% the first quote ' %> alone, but remove the second ' one.
Remove from <% an open but not closed ' tag.
' Leading quote?
<% Multiple ' tags %> ' on one ' <% line ''' work %> too ' !

Running this gives:

Leave this line alone.
Remove this quote  and these two quotes  from this line.
Leave <% the first quote ' %> alone, but remove the second  one.
Remove from <% an open but not closed  tag.
 Leading quote?
<% Multiple ' tags %>  on one  <% line ''' work %> too  !
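The loop above reads one line at a time, so a `<% ... %>` block that spans several lines, like the one in the original file, is never seen whole, and quotes inside it would be removed. A minimal variation, assuming the whole input fits in memory, slurps the text first and applies the same substitution; the sub name `strip_quotes` and the sample text are illustrative.

```perl
use strict;
use warnings;

# strip_quotes is a hypothetical wrapper around the same pattern.
# [^%]+ already matches newlines, so no /s flag is needed once the
# whole text is in one string.
sub strip_quotes {
    my ($text) = @_;
    $text =~ s/(<%(?:[^%]+|%[^>])*%>)|'/$1 || ""/eg;
    return $text;
}

my $page = <<'END';
it's here
<% print 'inside';
   print 'still inside'; %>
nature's candy
END

# Slurp in one read: with $/ undefined, <$fh> returns the whole input.
open my $fh, '<', \$page or die "cannot open in-memory file: $!";
my $text = do { local $/; <$fh> };
close $fh;

print strip_quotes($text);
```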
Re: silly regex question by I0 (Priest) on Jan 19, 2003 at 06:38 UTC
$_ = <<'ENDHERE';
"Avert thine eyes, lest ye be stricken with coolness." (from the original batkins web site, dated April 8, 2002
)<br><br>
<a href="http://batkins.com/forum/YaBB.pl">Enter the batkins forum</a> - it's the place to be.
<br><br>
batkins.com is best viewed from the Hubble Telescope.
<br><br>
<hr>
<font size="+1"><b><u>the batkins weblog - it's nature's candy</u></b></font>
<br><br>
<%
    # news
    my @news = split(/%%/, slurp("$DAT/news"));
    my $bit;
    my $i;
    for($i = 0; $i < 10; $i++)
    {
        $bit = "<b>$news[$i]</b><br>" . $news[$i + 1] . "<p>";
        print $bit;
        $i++;
    }

%>
<a href="/main/news.pl">View the rest of the weblog (<% print scalar(@news) - $i; %> more entries)</a>
<p>
<font face="-2"><a href="/main/all.pl">the batkins directory</a></font>
ENDHERE
s/'|(<%.*?%>)/$1||'test'/egs;
print;
Re: silly regex question by Anonymous Monk on Jan 19, 2003 at 03:41 UTC
You need the /s modifier so that . matches newlines. See perldoc perlre. But it won't match anyway, because there isn't a %> sequence before the first single quote. This version optionally matches the start of the string instead of %>, and the end instead of <%:

s/
   (
      (?:        # non-capturing parens
         ^|%>    # start of string or %>
      )
      .*
   )
   '
   (.*
      (?:<%|$)
   )
/$1 SOMESTRING $2/gsx;   # add s and x modifiers

I'm not at all sure this is a good or robust solution, however, and I'm looking forward to the other replies.
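A tiny illustration of what /s changes, since both the original regex and the version above rely on `.*` crossing the line breaks in the file. The sample string is made up for the demonstration.

```perl
use strict;
use warnings;

my $text = "%>first line\nsecond ' line<%";

# Without /s, '.' does not match "\n", so %>.*' cannot reach the quote
# on the second line.
my $without_s = ($text =~ /%>.*'/) ? 1 : 0;

# With /s, '.' also matches "\n" and the same pattern succeeds.
my $with_s = ($text =~ /%>.*'/s) ? 1 : 0;

print "without /s: $without_s, with /s: $with_s\n";
```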
Re: silly regex question by batkins (Chaplain) on Jan 19, 2003 at 04:05 UTC
thanks, guys. i figured it out. i would explain what i did, but the solution actually has nothing to do with this and would be no fun to explain. thanks again. -bill