in reply to Re: 'one-liner' help
in thread 'one-liner' help

Wouldn't <br><br><\sbody> match <br><br></body> since \s matches any non-whitespace? Granted, it would also match any <?body> where ? is any non-whitespace character, but that should not happen often in an HTML file I write. So the question is: why does the substitution not match the tag/text, cut it, and replace it?

Here is the script:
perl -pe 's#.*(<div class="Content.*)</div></div></body>#$1#sgi' -i.bak test.html

Here is the test html file:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
<title>Test</title>
<meta name="generator" content="BBEdit 6.5.2">
</head>
<body>
<div class="Header">
<img src="images/xlogo.gif" alt="" width="612" height="108" border="2" align="middle">
</div>
<div class="Navigation">
<div class="navbox"><a class="nav" href="OSXTips2.html">Home</a>&nbsp;</div>
<div class="navbox"><a class="nav">Tips:&nbsp;</a><br>
&nbsp;<a class="nav2" href="bash_shell.html"> &#8226; Bash</a>&nbsp;<br>
&nbsp;<a class="nav2" href="beta_tools.html"> &#8226; Beta Tools</a>&nbsp;<br>
&nbsp;<a class="nav2" href="http://www.savagetranscendental.com/OSX.html.htm"> &#8226; Color LS</a>&nbsp;<br>
&nbsp;<a class="nav2" href="spl.html"> &#8226; Lost Password</a>&nbsp;<br>
&nbsp;<a class="nav2" href="ppp.html"> &#8226; PPP Setup</a>&nbsp;<br>
&nbsp;<a class="nav2" href="spl.html"> &#8226; Password Lock</a>&nbsp;<br>
&nbsp;<a class="nav2" href="TCSH.html"> &#8226; TCSH Setup</a>&nbsp;<br>
&nbsp;<a class="nav2" href="gimp.html"> &#8226; The Gimp</a>&nbsp;<br>
&nbsp;<a class="nav2" href="vnc.html"> &#8226; VNC &amp; Xfree86</a>&nbsp;<br>
</div>
<div class="navbox"><a class="nav" href="construction.html">Links:&nbsp;</a><br>
&nbsp;<a class="nav2" href="http://www.darwinfo.org/"> &#8226; Darwinfo</a>&nbsp;<br>
&nbsp;<a class="nav2" href="downloads.html"> &#8226; Downloads</a>&nbsp;<br>
&nbsp;<a class="nav2" href="construction.html"> &#8226; Dev. Tools</a>&nbsp;<br>
&nbsp;<a class="nav2" href="http:/www.apple.com/macosx/"> &#8226; Mac OSX</a>&nbsp;<br>
&nbsp;<a class="nav2" href="http://www.savagetranscendental.com/OSX.html"> &#8226; More OSX Tips</a>&nbsp;<br>
&nbsp;<a class="nav2" href="http://www.osxfaq.com/"> &#8226; OSX Faq</a>&nbsp;<br>
</div>
<div class="navbox"><a class="nav" href="construction.html">PDFs:&nbsp;</a><br>
&nbsp;<a class="nav2" href="osxpdf.html"> &#8226; OSX</a>&nbsp;<br>
&nbsp;<a class="nav2" href="netpdf.html"> &#8226; Networking</a>&nbsp;<br>
&nbsp;<a class="nav2" href="unixpdf.html"> &#8226; Unix Tips</a>&nbsp;<br>
</div>
</div>
<div class="Content">
11.03.02
<p><img src="images/consbar.gif" width="464" height="41"></p>
<p><b><font size="2">?</font></b>Still working on this page.
If anyone has links for me to add, please email me at the address below.</p>
<p>Thanx,</p>
<p>SA</p>
<p>11.03.02</p>
<p><a href="http://member.bcentral.com/cgi-bin/fc/fastcounter-login?2164123"><img src="http://fastcounter.bcentral.com/fastcounter?2164123+4328253" alt="fastcounter" border="0" width="90" height="16"></a><font size="2"><br>
</font><a href="http://www.bcentral.com/fastcounter/"><font face="Arial, helvetica" size="1">FastCounter by bCentral</font></a></p>
<div class="box">
<B>[<U> <a href="http://www.apple.com" title="Apple">Apple</a></U> ] [<U> <a href="http://www.apple.com/developer" title="Apple Developer">AppleDeveloper</a></U> ] [<U><a href="downloads.html" title="Downloads"> Downloads</a></U> ]</B>&nbsp;&nbsp;&nbsp;&nbsp;<img src="images/emailp.gif" alt="email" width="44" height="51"> &nbsp;&nbsp;
<span style="font-size: 14pt; ">Send all mail To:<span style="mso-spacerun: yes"></span></span><a href="mailto:t"> </a><BR>
</div>
</div>
</body>
</html>


Thanks.
SA
:)

Re: Re: Re: 'one-liner' help
by dws (Chancellor) on Apr 16, 2003 at 18:47 UTC
    Wouldn't <br><br><\sbody> match <br><br></body> since \s matches any non-whitespace?

    \s matches whitespace.
    \S matches non-whitespace.

    Better to quote the backslash, if necessary, to match it explicitly.

      My bad. I got my \s and \S mixed up. But the script still does not work as stated above.

      Any ideas?

      Thanks.
      SA
      :)
        Any ideas?

        As it happens, yes. Note that m#</div></div># will not match

        </div>
        </div>

        because there's (vertical) whitespace between the two tags. You need \s* in your regex at points where whitespace is expected to appear between tags. And you need to slurp the entire file into memory first, since doing this line-by-line won't work.

        I've not tried it, but perlrun notes that -00 will force Perl into "paragraph" mode. Given your HTML, that might be sufficient. Otherwise, investigate the other options in perlrun.