Search engine result limiting (was: help!!)

bobbyboy has asked for the wisdom of the Perl Monks concerning the following question:

I have a search engine that is working fine, but I want to add a couple of features to it. First, a heading shown when a search has found a match, something like "Here are your Search Results". I would also like to number the results and limit the matches to 10 per page.

I would be very grateful if someone could help me!

Here is the code

#!/usr/bin/perl

require "get_form_data.pl";
&get_form_data();

$search_term = $FORM{'search'};

my @match;

opendir(DIR, ".");
while($file = readdir(DIR))
{
    next if($file !~ /\.html$/);    # escape the dot and anchor at the end of the name
    open(FILE, $file);
    $found_match = 0;
    $title = "";
    while(<FILE>)
    {
        if(/$search_term/i)
        {
            $found_match = 1;
        }
        if((/<TITLE>/) || ($found_title))
        {
            if((/<\/TITLE>/) && (/<TITLE>/))
            {
                chomp;    # removes only a trailing newline; chop removes any last character
                $title = $_;
                $title =~ s/<TITLE>//g;
                $title =~ s/<\/TITLE>//g;
            }
            else
            {
                if($found_title == 1)
                {
                    $title = $_;
                    $found_title = 2
                }
                elsif($found_title == 2)
                {
                    $found_title = 0;
                }
                else
                {
                    $found_title = 1;
                }
            }
        }
    }
    
    if($found_match)
    {

        push @match, qq
   (
        <HTML>\n<BODY BGCOLOR=\"#000099\"link=\"#FFFF00\">\n
        <font face=\"Verdana\"><A HREF="$file">$title</A></font><p>\n
        
        );


    }
    
    close(FILE);
}

closedir(DIR);

# now output
print "Content-type: text/html\n\n";
if (@match) {
  for (@match) {
    print;
  }
}
else {
  print "<HTML>\n<BODY BGCOLOR=\"#000099\"TEXT=\"#FFFFFF\">\n";
  print "<H3>\n";
  print "<font face=\"Verdana\">Sorry!!!</font>\n";
  print "</H3>\n";
  print "<font face=\"Verdana\">Sorry, we were unable to match your query<p>Please use the <STRONG>FEEDBACK</STRONG> button</font>\n";
  print "<font face=\"Verdana\">to go to the form and let us know what you are looking for.</font>\n\n";
}

exit;

2001-04-05 Edit by Corion: Added formatting and CODE tags, changed title


Replies are listed 'Best First'.
Re: Search engine result limiting (was: help!!) by arturo (Vicar) on Apr 05, 2001 at 17:06 UTC
If you want to display one title for a search that's found a match and a different one for a search that fails, then your script should first do the search, THEN generate the HTML for the page. That's in fact what you're doing, so it would be as simple as adding a title into the string that you `push` onto the `@match` array.

However, you should change what you're doing: notice that you're adding a mini-HTML document for each match you find. That's bad HTML, and it's also unnecessary. Just add the information about each document to the `@match` array during the search, and when the search is done, check that the array has at least one item (`if (@match) { }` is idiomatic Perl for this test). If it does, generate a standard HTTP header and your "we found one" HTML header, and then print out the match results. If you don't find any matches, print the "we're sorry, we couldn't find any matches" message.

Here are a few more general suggestions about your code. The idea is to tell you about some tools and strategies that will make your life easier if you learn how to use them. Strictly speaking, most of them are not necessary to get this script to work the way you want it to, but some of them are very important.

First, you appear to be using some sort of custom form-parsing routine. See "use CGI or die;" for why this is not a good idea. The standard (i.e. already installed) CGI module is well tested and secure, and heavily documented both here and just about everywhere Perl is spoken. Use it to get at your form data; if you want, you can also use it to generate HTML.
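To make the "collect during the search, print once at the end" advice concrete, here is a minimal sketch. It assumes each entry in `@match` is a hash reference with `file` and `title` keys; the names `render_results` and `escape_html` are made up for illustration and are not part of the original script or of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in HTML.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build ONE complete page from a list of { file => ..., title => ... } hashes.
sub render_results {
    my (@matches) = @_;
    my $html = qq{<HTML>\n<BODY BGCOLOR="#000099" TEXT="#FFFFFF" LINK="#FFFF00">\n};
    if (@matches) {
        # <OL> numbers the results for us.
        $html .= "<H3>Here are your Search Results</H3>\n<OL>\n";
        for my $m (@matches) {
            $html .= sprintf qq{<LI><A HREF="%s">%s</A></LI>\n},
                escape_html($m->{file}), escape_html($m->{title});
        }
        $html .= "</OL>\n";
    }
    else {
        $html .= "<H3>Sorry, we were unable to match your query</H3>\n";
    }
    return $html . "</BODY>\n</HTML>\n";
}
```

In the search loop this would pair with something like `push @match, { file => $file, title => $title };`, and at the end a single `print "Content-type: text/html\n\n", render_results(@match);`.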
If you want to extract a string in between HTML tags (your title-snagging routine), there's a much cleaner way to do it, assuming the `<title>` opening and closing tags are on the same line (if they might not be, consider using HTML::Parser, which handles such things nicely):

    my $page_title = "unknown";    # default value
    if (/<title>(.*?)<\/title>/i) {    # /i so that <TITLE> matches too
        # set the value of $page_title to be whatever was
        # found in between the <title> and </title> tags
        $page_title = $1;
    }

I noted why this is not perfect, but I just wanted to introduce you to the notion of using backreferences to capture data from a regular expression.

Then of course, there is the standard (and it's standard for good reasons) exhortation to `use strict;` in all your scripts and to turn on warnings, especially while developing. It catches errors you will make, shows you where you're making certain assumptions that you shouldn't be making, and so forth.

HTH

Philosophy can be made out of anything. Or less -- Jerry A. Fodor
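For titles that may span several lines, one option that needs no extra modules is to read the whole file into a string and let the regex cross newlines with `/s`. This is only a sketch and not a substitute for a real HTML parser; `extract_title` is a hypothetical name, and it assumes `$html` holds the complete file contents.

```perl
use strict;
use warnings;

# Return the text between <title> and </title>, or "unknown" if absent.
# Assumes $html is the whole file, not a single line.
sub extract_title {
    my ($html) = @_;
    my $title = "unknown";
    # /i ignores case, /s lets . match newlines, (.*?) stops at the first </title>
    if ($html =~ m{<title[^>]*>(.*?)</title>}is) {
        $title = $1;
        $title =~ s/\s+/ /g;         # collapse newlines and runs of spaces
        $title =~ s/^\s+|\s+$//g;    # trim both ends
    }
    return $title;
}

# Reading a whole file at once (hypothetical $file):
# my $html = do { local $/; open my $fh, '<', $file or die "$file: $!"; <$fh> };
```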
Re: Search engine result limiting (was: help!!) by Hero Zzyzzx (Curate) on Apr 05, 2001 at 17:01 UTC
I haven't given your code more than a cursory glance. Sorry. A few things popped out.

On the search results limiting thing, the quick and dirty way would be to use a `for` loop over just the ten results you want to show. You could also easily set up "next ten results" and "previous ten results" buttons with a `for` loop and form parameters.

I was recently in your boat (being new to Perl), and I'm sure you've probably heard it, but you should learn CGI.pm. It makes handling form parameters far easier and more reliable. Because you're using some custom module for form parsing, your complete code isn't here.

Another trick I love for debugging is to `use CGI::Carp qw/fatalsToBrowser/;` at the top of your scripts. It will sometimes give you good info without having to go to the error_log.

Get the O'Reilly "Learning Perl" (the llama book). It's excellent. It will help make you a far better Perl coder. Thanks merlyn!
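A rough sketch of the `for` loop plus form parameter idea, assuming the page number arrives in a hypothetical `page` form parameter and that `@match` already holds every result. The function name `page_of_results` and the keys it returns are invented for this example.

```perl
use strict;
use warnings;

# Return the results for one page, each with its overall number, plus the
# neighbouring page numbers (undef when there is none). Pages count from 1.
sub page_of_results {
    my ($matches, $page, $per_page) = @_;
    $per_page ||= 10;
    my $total     = @$matches;
    my $last_page = $total ? int(($total + $per_page - 1) / $per_page) : 1;

    # Treat missing or non-numeric input as page 1; clamp to the last page.
    $page = 1 unless defined $page && $page =~ /^\d+$/ && $page >= 1;
    $page = $last_page if $page > $last_page;

    my $first = ($page - 1) * $per_page;
    my $last  = $first + $per_page - 1;
    $last = $total - 1 if $last > $total - 1;

    my @items;
    for my $i ($first .. $last) {
        push @items, { number => $i + 1, match => $matches->[$i] };
    }
    return {
        items     => \@items,
        prev_page => $page > 1          ? $page - 1 : undef,
        next_page => $page < $last_page ? $page + 1 : undef,
    };
}
```

The "next" and "previous" buttons would then be links or forms that resubmit the same search term with `page` set to `next_page` or `prev_page`.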
Re: Search engine result limiting (was: help!!) by merlyn (Sage) on Apr 05, 2001 at 16:53 UTC
I have an example of paging through the results of a search in one of my earliest WT columns.

-- Randal L. Schwartz, Perl hacker
Re: help!! by djw (Vicar) on Apr 05, 2001 at 16:43 UTC
I (--)'d you because I am unable to read your post clearly. Please review the Help section of the site. There you will find the section "Submitting Code and Escaping Characters". And please, review your post before submitting.

Thanks, djw