blk97tt has asked for the wisdom of the Perl Monks concerning the following question:

Hi guys/gals. I am somewhat new to Perl and need your help. I am trying to filter HTML files through a proxy, insert some code at the top of each page, and then push the result to the user's browser. I am using a filtering proxy named Privoxy for this. Privoxy uses Perl to match/substitute patterns.

The best/cleanest way I have come up with so far is to search for "<body>" and insert <body>"my stuff"</body> before it. I am using a "forward lookup" for this: (.*(?=<body)), assigning the match to $1 and then replacing it with $1<body>"my stuff"</body>.

This works for about 90% of the pages. It won't work on pages that have no <body>. It also fails on pages like www.gmail.com: Gmail's page starts with a bunch of scripts and only eventually has a <body> tag. I don't understand why (.*(?=<body)) won't match this pattern. Is there a problem with my logic? Thanks in advance. Sal

Re: Search and Replace Question
by data64 (Chaplain) on Dec 19, 2004 at 04:36 UTC

    Privoxy does not actually use Perl. It uses Perl Compatible Regular Expressions (PCRE), which means it supports mostly the same regular expression syntax as Perl. Privoxy lets you apply regular expressions to the HTML page before it reaches the browser.

    For example, I use the following to filter out background images on some sites:

    s|<body ([^>]*)background=[^\s>]+([^>]*)>|<body $1 $2>|ig
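For context, a substitution like the one above normally sits in a Privoxy filter file under a FILTER: heading and is switched on from an actions file. A minimal sketch of that layout follows; the filter name `no-body-background` and the site pattern `.example.com` are made up for illustration, and the file names are only the conventional ones for user additions.

```
# user.filter (filter name and description are hypothetical)
FILTER: no-body-background Strip background images from the body tag
s|<body ([^>]*)background=[^\s>]+([^>]*)>|<body $1 $2>|ig

# user.action (apply the filter only to the listed sites)
{+filter{no-body-background}}
.example.com
```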

    I am not completely sure what exactly you are trying to solve, but if you replace <body> with <body>"my stuff"</body>, then all your HTML page will show is "my stuff". If you really do want to do that, I am not sure a lookahead (your "forward lookup") is needed. Untested code ahead:

    s|<body>|<body>"my stuff"</body>|ig
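As a rough illustration of the same idea in plain Perl (outside Privoxy), the sketch below inserts a block directly after the opening body tag instead of adding a second body element. This is only one possible reading of the goal: the sample page, the inserted markup, and the helper name `insert_after_body` are all assumptions for the example. It tolerates attributes and mixed case in the tag and needs no lookahead.

```perl
use strict;
use warnings;

# Hypothetical helper: insert $extra right after the first opening
# <body ...> tag. Input without a body tag is returned unchanged.
sub insert_after_body {
    my ($html, $extra) = @_;
    # Capture the whole opening tag (any attributes, any letter case)
    # and put it back, followed by the extra markup.
    (my $out = $html) =~ s|(<body\b[^>]*>)|$1$extra|i;
    return $out;
}

# Made-up sample page that starts with a script, as the question describes.
my $page = join "\n",
    '<html>',
    '<head><script>var x = 1;</script></head>',
    '<BODY bgcolor="white">',
    '<p>Hello</p>',
    '</body>',
    '</html>';

print insert_after_body($page, '<div>my stuff</div>'), "\n";
```

A page with no body tag passes through untouched, which matches the observation in the question that such pages are not modified.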