Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have written some simple code to extract lines from a webpage description. Please look at my code. Does it make any sense? Anyway, I can't run it, so please help me troubleshoot. I'm still a beginner and quite *dumb* at this. Feel free to give your comments and suggestions. Thanks in advance. Looking forward to a reply a.s.a.p.
    #!/usr/bin/perl -w
    require WWW::Search;
    use CGI;

    $q = new CGI;
    $word = $q->param('query');
    chomp($word);
    my $search = new WWW::Search ('AltaVista');
    $search->maximum_to_retrieve(10);
    $search->native_query (WWW::Search::escape_query($word));
    &do_print;

    #Subroutine to print search results
    sub do_print {
        print $q->header;
        print $q->start_html("Web Search");
        print $q->h1({-align=>"center"},'Web Concordance Search Results');
        print $q->h3({-align=>"center"},"for search term '$word'");
        print $q->h4({-align=>"center"},"Producing output....\n");
        print $q->hr;
        while ( $results = $search->next_result()) {
            $n++;
            print $q->a({href=>$results->url}, $results->url);
            $urlresult = $result->url;
            $result->description = @desc;
            $desc = "@desc";
            #To strip HTML tags crudely
            $desc = ~s(<[^>*>)()g;
            @splittext = split(/$word/,$desc);
            #To extract concordance lines from text
            for (my $i=1; $i < @splittext; $i++) {
                my $before = substr((' 'x10).$splittext[$i-1],-20,20);
                my $after = substr($splittext[$i].' 'x10,0,20);
                print p($before, strong($word), $after,"\n"),
            }
            print $q->br;
            print $results->title,"\n";
            print $q->br;
            print $results->description,"\n";
            print $results->change_date,"\n";
            print $q->hr;
            if ($n == 0) { print "<P>Results not found";}
        }
        print qq{<P><A HREF="http://mogana/index.htm">Search Again!</A>};
        print $q->end_html;
        1;
    }
Rgds,

Edit kudra, 2001-10-24 Changed title

  • Comment on Please review simple code to extract lines from a webpage description

Replies are listed 'Best First'.
Re: Need help desperately on codes
by dvergin (Monsignor) on Oct 23, 2001 at 09:00 UTC
    Well, hmmm... There are a couple levels of this problem to deal with. First, if you "use strict", a number of slip-ups appear in your code. I lost track of them, but here is a tidied-up version:
    use strict;
    use CGI;
    use WWW::Search;

    my $q = new CGI;
    my $search = new WWW::Search ('AltaVista');
    $search->maximum_to_retrieve(10);
    #my $word = $q->param('query');
    my $word = 'penguin';   # for testing
    chomp($word);
    $search->native_query (WWW::Search::escape_query($word));
    do_print($q, $search);

    sub do_print {
        my ($q, $search) = @_;
        print $q->header;
        print $q->start_html("Web Search");
        print $q->h1({-align=>"center"},'Web Concordance Search Results');
        print $q->h3({-align=>"center"},"for search term '$word'");
        print $q->h4({-align=>"center"},"Producing output....\n");
        print $q->hr;
        my $n = 0;
        while ( my $result = $search->next_result() ) {
            $n++;
            print $q->a({href=>$result->url}, $result->url);
            my $urlresult = $result->url;
            #my $result->description = @desc;
            my @desc = $result->description;
            my $desc = "@desc";
            #To strip HTML tags crudely
            $desc = ~s(<[^>]*>)()g;
            my @splittext = split(/$word/,$desc);
            #To extract concordance lines from text
            for (my $i=1; $i < @splittext; $i++) {
                my $before = substr((' 'x10).$splittext[$i-1],-20,20);
                my $after = substr($splittext[$i].' 'x10,0,20);
                print p($before, strong($word), $after,"\n"),
            }
            print $q->br, $result->title, "\n";
            print $q->br, $result->description,"\n";
            print $result->change_date,"\n";
            print $q->hr;
        }
        if ($n == 0) {
            print "<P>Results not found";
        }
        print qq{<P><A HREF="http://mogana/index.htm">Search Again!</A>};
        print $q->end_html;
    }
    This runs (which is nice). Unfortunately it does not return any results for display. Since there were no results to process, I left your un-Perlish 'for' loop and some of the processing in that region of the code alone for now.
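
    For the part of the loop that was left alone, here is a minimal sketch of how the concordance extraction could look once there are results to feed it. It is not the original poster's method: it swaps the split-based 'for' loop for a global regex match and reads the match offsets from @- and @+. The helper name concordance_lines, the [word] bracket output format, and the sample sentence are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): return
# keyword-in-context lines for $word within $text.
sub concordance_lines {
    my ($text, $word, $width) = @_;
    $width = 20 unless defined $width;

    # Strip HTML tags crudely. Note '=~' is one operator, no space inside.
    $text =~ s/<[^>]*>//g;

    my @lines;
    # \Q...\E keeps any regex metacharacters in $word literal.
    while ($text =~ /\Q$word\E/g) {
        my $start  = $-[0];    # offset where this match begins
        my $end    = $+[0];    # offset just past this match
        my $from   = $start > $width ? $start - $width : 0;
        my $before = substr($text, $from, $start - $from);
        my $after  = substr($text, $end, $width);
        # Right-justify the left context so the keywords line up.
        push @lines, sprintf("%*s[%s]%s", $width, $before, $word, $after);
    }
    return @lines;
}

# Sample input invented for this sketch.
print "$_\n" for concordance_lines(
    'The <b>penguin</b> is a bird; a penguin cannot fly.', 'penguin', 12);
```

    In the CGI script each returned line would still need to go through $q->p(...) or similar, since plain p() and strong() are not imported by a bare "use CGI".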

    So we go through the docs looking for a simple example to try and end up with:

    use strict;
    use WWW::Search;

    my $Search = new WWW::Search();
    $Search->native_query(WWW::Search::escape_query('penguin'));
    while (my $Result = $Search->next_result()) {
        print $Result->url, "\n";
    }
    Which also runs and returns no results. Drat!

    Curiously, in all the docs for this module, there seems to be no example I can find of what constitutes a proper argument for WWW::Search::escape_query( . . . ). I tried several things and nothing seemed to work. And I can't find anything here on PerlMonks or on the web.
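
    On the question of what escape_query expects: the sketch below assumes it simply takes the raw query string as a user would type it, percent-encodes every character that is not a space or an ASCII alphanumeric, and turns spaces into '+'. That is an assumption about the module's behaviour, not something confirmed in this thread. The function sketch_escape_query is a hypothetical stand-in, not the module's own code, written only so the idea can be tried without WWW::Search installed.

```perl
use strict;
use warnings;

# Hypothetical stand-in, NOT WWW::Search's implementation: mimics the
# behaviour escape_query is assumed to have.
sub sketch_escape_query {
    my $text = join ' ', @_;
    # Percent-encode everything except spaces and ASCII alphanumerics.
    $text =~ s/([^ A-Za-z0-9])/sprintf('%%%02X', ord($1))/ge;
    # Spaces become '+'.
    $text =~ s/ /+/g;
    return $text;
}

print sketch_escape_query($_), "\n"
    for 'penguin', 'FreeBSD Security', '+hi +mom';
```

    If that assumption holds, a plain word such as 'penguin' passes through unchanged, so the argument itself is unlikely to be what produces the empty result set.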

    So there we are... until someone else comes along who has some experience with this module, the script runs fine and appears quite harmless! ;-)

Re: Need help desperately on codes
by growlf (Pilgrim) on Oct 23, 2001 at 16:30 UTC
    Wow. I spent an hour on this - and found nothing. Even after looking in the module itself and trying all the examples in all the pages on perldoc.com and the ActiveState POD references.

    I used this code to test a theory though:
    #!/usr/bin/perl
    use WWW::Search;

    my $oSearch = new WWW::Search('AltaVista');
    my $sQuery = WWW::Search::escape_query("FreeBSD Security");
    $oSearch->native_query( $sQuery, { search_url=>"http://altavista.com/sites/search/web" });
    print "QRY:".$sQuery."\n";
    print "MAX:".$oSearch->maximum_to_retrieve(100)."\n";
    my $response = $oSearch->response();
    if ($response->is_success) {
        print "Results:\n";
        @results = $oSearch->results();
        foreach $result (@results) {
            print $result->url(), "\n";
        };
    } else {
        print "error: " . $response->as_string() . "\n";
    }
    The escape_query works fine (tested with several different engines, even; it DOES change), but no matter what I did (even specifying the search url, which I lifted directly from the AltaVista search form), it retrieved no results but DID get a response.

    One other thing of note: if I gave it a bogus search_url, I still got an 'OK' response and no errors.
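
    One possible explanation for the 'OK' on a bogus URL, offered as an assumption rather than something verified here: if native_query is lazy and no request goes out until results() or next_result() is called, then a response() checked before results() predates any real request. The sketch below only fixes the call order. The helper fetch_then_check and the hash it returns are hypothetical; it relies solely on results() and response() from the script above, plus is_success and status_line on the response object.

```perl
use strict;
use warnings;

# Hypothetical helper: accepts any object offering results() and
# response(), as the WWW::Search object in the test script does.
sub fetch_then_check {
    my ($search) = @_;

    # Ask for the results FIRST, so that (under the laziness
    # assumption) a request has actually been attempted ...
    my @results = $search->results();

    # ... and only then inspect the response it produced.
    my $response = $search->response();

    return {
        ok      => ($response && $response->is_success) ? 1 : 0,
        status  => $response ? $response->status_line : 'no response object',
        results => \@results,
    };
}
```

    With the real module this would be called as fetch_then_check($oSearch) right after native_query. It does not explain why the result list itself stays empty.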