Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi there, I am very new to Perl and I am trying to get a search script running. So far it works as expected, but one thing is really bugging me: the search results are sadly full of un-encoded ampersands... yuk. The rest of the site is fully valid XHTML 1.0 Strict, so this throws a proverbial spanner in the works. I have managed to kill off some of the ampersands, but there are still a couple of elusive ones that my Perl knowledge just can't grapple with. This is the output that I get:
<div><strong>Search results (8 found, 8 shown)</strong><p>
1 - <a href="fcp.pl?words=water+and+fountain&wt=ew&bl=or&amp;d=/link</a><br>
2 - <a href="fcp.pl?words=water+and+fountain&wt=ew&bl=or&amp;d=/link</a><br>
3 - <a href="fcp.pl?words=water+and+fountain&wt=ew&bl=or&amp;d=/link</a><br>
4 - <a href="fcp.pl?words=water+and+fountain&wt=ew&bl=or&amp;d=/link</a><br>
5 - <a href="fcp.pl?words=water+and+fountain&wt=ew&bl=or&amp;d=/link</a><br>
</div>
As you can see, there is one encoded ampersand in each link but two un-encoded ones. The script that is making these ugly little varmints looks like this (this is just a snippet):
if (exists $form{'words'}) {
    $search_words = $form{'words'};
    if ($form{'wt'} eq 'be') { $search_wb = ' checked'; $search_ew = ''; }
    else { $search_wb = ''; $search_ew = ' checked'; }
    if ($form{'bl'} eq 'an') { $search_bAND = ' checked'; $search_bOR = ''; $search_bPHR = ''; }
    elsif ($form{'bl'} eq 'ph') { $search_bAND = ''; $search_bOR = ''; $search_bPHR = ' checked'; }
    else { $search_bAND = ''; $search_bOR = ' checked'; $search_bPHR = ''; }
    $wl = lc $search_words;
    $wl =~ tr/a-z0-9/ /c;
    $wl =~ s/(\A\s+)|(\s+\Z)//g;
    @words = split /\s+/, $wl;
    if ($wl eq '' || $#words < 0) {
        $extra = "<font color=red>Please enter some words in the search box.</font><br>";
        @words = ();
    }
    else {
        $title = join ' ', 'Search results for', @words;
    }
    $search_q = $ENV{'QUERY_STRING'};
    $search_q =~ s/\&amp;pg=\d+//;
}
else {
    $search_words = '';
    $search_wb = ' checked';
    $search_ew = '';
    $search_bAND = '';
    $search_bOR = ' checked';
    $search_bPHR = '';
etc., etc. As far as I can tell, the problem is the "$search_bAND" thing. So how do I change "AND" to & ? Any thoughts or suggestions are much appreciated, thanks. Humble regards, Nathan.

20041009 Edit by castaway: Changed title from 'New to perl'

Re: Parsing un-encoded ampersand in XHTML
by astroboy (Chaplain) on Oct 08, 2004 at 10:29 UTC
    I'm not quite clear what you're trying to do, but it could be that you're trying to deal with link encodings. If that's the case, have a look at URI::Escape.
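    A minimal sketch of what link encoding can look like here, assuming the links are built from a list of key/value pairs: each key and value is percent-encoded for the URL first, and only then are the pairs joined with &amp; so the whole string is safe inside an XHTML attribute. The helper names url_escape and build_query are hypothetical. url_escape is a hand-rolled stand-in for uri_escape from URI::Escape (a CPAN module, not core Perl), and it assumes byte strings rather than wide characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Percent-encode one query-string component. Hypothetical stand-in for
# URI::Escape's uri_escape(): escapes everything outside the RFC 3986
# "unreserved" set. Assumes byte strings (no characters above 255).
sub url_escape {
    my ($value) = @_;
    $value =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $value;
}

# Build an href-ready query string from a flat list of key/value pairs
# (a list rather than a hash so the order is preserved). The pairs are
# joined with "&amp;" so the result validates inside an XHTML attribute.
sub build_query {
    my @pairs = @_;
    my @parts;
    while (my ($key, $value) = splice @pairs, 0, 2) {
        push @parts, url_escape($key) . '=' . url_escape($value);
    }
    return join '&amp;', @parts;
}

my $query = build_query(words => 'water and fountain', wt => 'ew', bl => 'or');
print qq{<a href="fcp.pl?$query">next page</a>\n};
```

    Run as-is, it prints <a href="fcp.pl?words=water%20and%20fountain&amp;wt=ew&amp;bl=or">next page</a>. The browser turns each &amp; back into & before it sends the request, so the CGI script still receives an ordinary query string.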
Re: Parsing un-encoded ampersand in XHTML
by borisz (Canon) on Oct 08, 2004 at 10:19 UTC
    Your snippet is the wrong part, sorry; we need to see another part of the script before we can suggest something.
    Boris
      Sorry, as I said, I'm new... like, today ;) Here is the whole thing. I'm still working on the validation of tags etc., but the code is untouched other than the encoded ampersands.

      astroboy! You may have pointed me to the right place, but it all went over my head, sorry. I hope the rest of the code helps. For all I know the code may be way heavier than it needs to be, but it works fine; it just won't validate as XHTML 1.0 Strict. Again, thanks for your time! Nathan.

      Janitored by Arunbear - added readmore tags

        Wow, your script should learn about
        use strict; use warnings;
        and lexical vars. You should use code and readmore tags when posting to PM. Back to the question: in your code, replace the lines
        $search_q = $ENV{'QUERY_STRING'};
        $search_q =~ s/\&amp;pg=\d+//;
        with
        $search_q = $ENV{'QUERY_STRING'};
        $search_q =~ s/\&amp;pg=\d+//;
        $search_q =~ s/&(?!amp;)/&amp;/g;
        But it is just a guess.
        Boris
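        To see what the extra substitution does, here is a small standalone sketch that applies it to one of the links from the question. The sub name encode_bare_amps is hypothetical; the regex is the one suggested above. The (?!amp;) part is a zero-width negative lookahead: it matches an & only when it is not already followed by amp;, so an existing &amp; is not turned into &amp;amp;.

```perl
use strict;
use warnings;

# Encode every "&" that does not already start "&amp;".
# (?!amp;) only looks ahead; it does not consume the "amp;" text.
sub encode_bare_amps {
    my ($s) = @_;
    $s =~ s/&(?!amp;)/&amp;/g;
    return $s;
}

# Sample taken from the links in the question.
my $link = 'fcp.pl?words=water+and+fountain&wt=ew&bl=or&amp;d=/link';
print encode_bare_amps($link), "\n";
```

        It prints fcp.pl?words=water+and+fountain&amp;wt=ew&amp;bl=or&amp;d=/link, and running it a second time on its own output changes nothing.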
        For the benefit of anyone reading this thread, here is a perltidy'd version of the script:


Re: Parsing un-encoded ampersand in XHTML
by Anonymous Monk on Oct 08, 2004 at 23:28 UTC
    Your problem seems to be that you are new to HTML too.

    There are two uses of & in HTML. First, the one you know about: between > and <, in the page text. There it encodes foreign characters, and & is itself a foreign character that must be encoded. Now the second use: between < and >, inside a tag. There it separates arguments to CGI scripts, and there it should not be encoded.

    Because other newbies confuse these two uses too, CGI scripts can be written to accept ; instead of & as the argument separator. Then you can use ; in your links as well (the case between < and >), and there is nothing left to encode.
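    A minimal sketch of the semicolon idea on the server side, assuming the script parses the query string into %form itself (that part of the script is not shown in the thread, so parse_query and url_unescape are hypothetical helpers). It splits on either & or ;, so old-style and semicolon-style links both keep working. Duplicate keys simply overwrite each other in this sketch.

```perl
use strict;
use warnings;

# Decode one query-string component: "+" means space, %XX is a byte.
sub url_unescape {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $s;
}

# Split on "&" or ";" so that a link such as
#   fcp.pl?words=water+and+fountain;wt=ew;bl=or
# needs no HTML escaping inside the href at all.
sub parse_query {
    my ($query) = @_;
    my %form;
    for my $pair (split /[&;]/, $query) {
        next if $pair eq '';                     # skip empty segments like "a=1;;b=2"
        my ($key, $value) = split /=/, $pair, 2; # value may itself contain "="
        $value = '' unless defined $value;
        $form{ url_unescape($key) } = url_unescape($value);
    }
    return %form;
}

my %form = parse_query('words=water+and+fountain;wt=ew;bl=or');
print "$_ = $form{$_}\n" for sort keys %form;
```

    It prints "bl = or", "words = water and fountain" and "wt = ew" on three lines. If the script uses CGI.pm instead of its own parser, as far as I know its param() parsing already accepts ; as a separator.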

Re: Parsing un-encoded ampersand in XHTML
by TedPride (Priest) on Oct 09, 2004 at 00:21 UTC
    As I see it, the problem is that some of the & are not encoded properly, right? Just change every &amp; to & and then change every & to &amp;. Problem solved.
    $text =~ s/&amp;/&/gi;
    $text =~ s/&/&amp;/g;
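    The two lines above work for query strings, but they also re-encode any other entity in the text (&lt; becomes &amp;lt;). The sketch below compares them with a variant that leaves anything already shaped like an entity or character reference alone. Both sub names are hypothetical, and the second regex is one way of writing the idea, not a check that the entity name actually exists.

```perl
use strict;
use warnings;

# The two-step approach: decode every "&amp;", then encode every "&".
sub normalize_two_step {
    my ($text) = @_;
    $text =~ s/&amp;/&/gi;
    $text =~ s/&/&amp;/g;
    return $text;
}

# Variant: encode only ampersands that do not begin a named entity
# (&lt;), a decimal reference (&#38;) or a hex reference (&#x26;).
sub encode_bare_amps_keep_entities {
    my ($text) = @_;
    $text =~ s/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)/&amp;/g;
    return $text;
}

my $sample = 'a.pl?x=1&y=2&amp;z=3 &lt;b&gt;';
print normalize_two_step($sample), "\n";
print encode_bare_amps_keep_entities($sample), "\n";
```

    With this sample, the first line printed is a.pl?x=1&amp;y=2&amp;z=3 &amp;lt;b&amp;gt; and the second is a.pl?x=1&amp;y=2&amp;z=3 &lt;b&gt;.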
      OK, let's see now...

      This: $search_bAND
      prints one of these: &

      I want this:
      &amp;. So as I see it, AND (or bAND) means &, and this is the problem. Like I said, I am new to Perl (not HTML). If I write $search_b&amp; or $search_&amp;, the code breaks. Is there a way to change just the $search_bAND line without breaking the rest of the script?
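      For anyone tripped up by the same idea: in the snippet from the question, $search_bAND, $search_bOR and $search_bPHR are ordinary variable names, and AND is part of the name, not an & character. The sketch below copies only the bl branch of that snippet into a hypothetical sub, boolean_checked, to show that these variables only ever hold ' checked' or ''. Judging from the output in the question, the bare & more likely come from $search_q, which is copied straight from $ENV{'QUERY_STRING'}; that is the line Boris's suggestion targets.

```perl
use strict;
use warnings;

# Mirrors the "bl" branch of the question's snippet. "bAND" is just
# part of a variable name; each variable holds ' checked' or ''.
sub boolean_checked {
    my ($bl) = @_;
    $bl = '' unless defined $bl;
    my ($search_bAND, $search_bOR, $search_bPHR) = ('', '', '');
    if    ($bl eq 'an') { $search_bAND = ' checked' }
    elsif ($bl eq 'ph') { $search_bPHR = ' checked' }
    else                { $search_bOR  = ' checked' }   # default, as in the snippet
    return ($search_bAND, $search_bOR, $search_bPHR);
}

for my $bl (qw(an or ph)) {
    my ($and, $or, $phr) = boolean_checked($bl);
    print "bl=$bl: bAND='$and' bOR='$or' bPHR='$phr'\n";
}
```

      None of the printed values contains an &. One related XHTML detail: XHTML 1.0 does not allow minimized attributes, so if these values end up inside <input> tags, the page will probably only validate with ' checked="checked"' instead of ' checked'.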