Shrink Your CGI Searches with CGI::Search

A common use of CGI is to search some kind of database and display the results to the user. Personally, I got really sick of seeing almost the exact same code used over and over again with only slight differences in the regex used to search a flat-file. Further, very few of the current scripts at the place I work are even using cgi-lib.pl, much less HTML::Template (or any other templating system), strict, CGI, or any other CGI techniques developed within the last 5-10 years or so. I shudder to think what would happen if we attempted to run mod_perl.

I wanted to abstract away the process of writing a flat-text search. I also wanted to ease the upgrade path to a real database system, so the solution should be easy to switch over to a DBI interface later with little change to older scripts. A templating system (HTML::Template) and user-input validation are built in, so there is no excuse for skipping them. CGI::Search is the result.

It should be noted that CGI::Search is designed to make an easy job easy. It probably won't do a good job of making hard jobs possible. If your searches are like this:

field1 = text1 OR field2 = text2 OR field3 = text3

Or like this:

field1 = text1 AND field2 = text2 AND field3 = text3

Then you shouldn't have a problem with CGI::Search. But if you need to do this:

field1 = text1 AND field2 = text2 OR field3 = text3

Then you'll probably need a fully customized script. Searching with some sort of SQL subset might be implemented someday, but it isn't right now. You could run multiple instances of CGI::Search and combine the results, but it would probably be easier to write a custom CGI.
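As a rough illustration of the combining approach, here is a minimal sketch that computes (field1 AND field2) OR field3 as the union of two result sets. The two arrays are hypothetical stand-ins for the array-of-hashes returned by an AND search on field1/field2 and a separate search on field3, and the sketch assumes num1 uniquely identifies a record so that duplicates can be dropped.

```perl
use strict;
use warnings;

# Hypothetical result sets shaped like the array-of-hashes result() returns
# in list context. 'num1' is assumed to identify each record uniquely.
my @and_results = (    # stands in for: field1 = text1 AND field2 = text2
    { num1 => 1, email => 'a@example.com' },
);
my @or_results = (     # stands in for: field3 = text3
    { num1 => 1, email => 'a@example.com' },
    { num1 => 7, email => 'b@example.com' },
);

# (field1 AND field2) OR field3 is the union of both sets;
# %seen drops records that appear in both.
my %seen;
my @combined = grep { !$seen{ $_->{num1} }++ } @and_results, @or_results;

print scalar(@combined), " records\n";    # prints "2 records"
```

The deduplication key is the weak point: if your flat file has no unique field, you would need to compare whole records instead.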

With that out of the way, let's get to some code. Start with the basics:

use strict;
use warnings;
use CGI qw(:standard);
use CGI::Search qw(:validators);

my $TMPL_FILE    = '/path/to/template';
my $DB_FILE      = '/path/to/flat_file';
my $DB_SEPERATOR = '\|';

The :validators import tag brings in a set of default validation subs: INTEGER, WORD, BLOB, EMAIL, and NONE. You can also write your own validation subs; that is covered later.

Each validator is passed the value to validate and returns a three-element list. The first element is a boolean specifying whether the data passed validation. The second element is the validated data in untainted form (except for the NONE validator, which doesn't really validate anything; it just returns the data). The third element is a string that can be used to report an error if the data failed to validate. If validation succeeded, the third element can be any string ("Passed" is generally used, but should not be relied upon).
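To make the three-element convention concrete, here is a minimal sketch of an INTEGER-style validator. The name integer_validator and its exact pattern are assumptions for illustration, not the module's actual INTEGER code; the point is that the untainted value comes from the regex capture $1.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an INTEGER-style validator (not CGI::Search's code).
# Returns (passed?, untainted value or undef, message).
sub integer_validator {
    my ($value) = @_;
    if ( defined $value && $value =~ /\A(-?\d+)\z/ ) {
        return ( 1, $1, 'Passed' );    # $1 is the untainted capture
    }
    my $shown = defined $value ? $value : '(undef)';
    return ( 0, undef, "'$shown' is not an integer" );
}

my ( $ok, $clean, $msg ) = integer_validator('42');
print "ok=$ok clean=$clean msg=$msg\n";    # prints "ok=1 clean=42 msg=Passed"

( $ok, $clean, $msg ) = integer_validator('4x2');
print "ok=$ok msg=$msg\n";                 # prints "ok=0 msg='4x2' is not an integer"
```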

$DB_SEPERATOR is passed to split as a regex, so any special regex characters must be escaped, and the string must be in single quotes so that the backslash is preserved.
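A small demonstration of why the escaping matters, using a hypothetical pipe-separated line. In double quotes, "\|" would collapse to a plain |, which is why single quotes are used; quotemeta is a standard way to build the escaped form from a raw separator.

```perl
use strict;
use warnings;

my $line = '42|hello|user@example.com';

# Single quotes keep the backslash, so split receives the regex \| (a literal pipe).
my $escaped = '\|';
my @good = split /$escaped/, $line;

# quotemeta builds the same escaped string from the raw separator.
my $quoted = quotemeta '|';    # gives '\|'

# Unescaped, '|' is an empty alternation that matches everywhere,
# so split breaks the line into single characters.
my $raw = '|';
my @bad = split /$raw/, $line;

print scalar(@good), " fields vs ", scalar(@bad), " pieces\n";
```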

Next, we need to describe how the fields in the database are set up. This is done with an array-of-arrays:

my @DB_FIELDS = (
    [ 'num1',   \&INTEGER,     1 ], 
    [ 'text1',  \&WORD,        0 ], 
    [ 'email',  \&EMAIL,       1 ], 
);

The first element is the name of the field. This must match the name of the corresponding search option (see below). The second element is a reference to a validation subroutine to check the field against. Note that the validation sub will be run on every value of that field in the database. The third element is a boolean specifying whether the field is required. If it evaluates to false, CGI::Search will simply ignore the field when it is blank. If it evaluates to true, CGI::Search will throw an error if that field is blank in any database entry.
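Here is a minimal sketch of how a field description like @DB_FIELDS could be applied to one split database line, including the required flag. The check_record helper and the two inline validators are hypothetical; CGI::Search's own internals may work differently.

```perl
use strict;
use warnings;

# Hypothetical validators following the (passed, untainted, message) convention.
my $int  = sub { $_[0] =~ /\A(\d+)\z/ ? ( 1, $1, 'Passed' ) : ( 0, undef, "'$_[0]' is not an integer" ) };
my $word = sub { $_[0] =~ /\A(\w+)\z/ ? ( 1, $1, 'Passed' ) : ( 0, undef, "'$_[0]' is not a word" ) };

my @DB_FIELDS = (
    [ 'num1',  $int,  1 ],    # required
    [ 'text1', $word, 0 ],    # optional: skipped when blank
);

# Sketch: check one split record against the field description.
# Returns (\%record, undef) on success or (undef, $error) on failure.
sub check_record {
    my ( $fields, @values ) = @_;
    my %record;
    for my $i ( 0 .. $#$fields ) {
        my ( $name, $validator, $required ) = @{ $fields->[$i] };
        my $value = defined $values[$i] ? $values[$i] : '';
        if ( $value eq '' ) {
            return ( undef, "required field '$name' is blank" ) if $required;
            next;    # optional and blank: ignore it
        }
        my ( $ok, $clean, $msg ) = $validator->($value);
        return ( undef, "field '$name': $msg" ) unless $ok;
        $record{$name} = $clean;
    }
    return ( \%record, undef );
}

my ( $rec, $err ) = check_record( \@DB_FIELDS, split /\|/, '17|' );
print "num1 = $rec->{num1}\n" unless $err;    # prints "num1 = 17"

my ( undef, $err2 ) = check_record( \@DB_FIELDS, split /\|/, '|abc' );
print "error: $err2\n";    # prints "error: required field 'num1' is blank"
```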

Search options are given in a hash-of-arrays:

my %SEARCH  = (
    num1  => [ param("num1"),   \&INTEGER ], 
    email => [ param("email"),  \&EMAIL   ], 
);

Notice that the names of the search options are the same as the names of the fields in the database. The first element of each array contains the data being searched for; here, all the search terms are taken from user input via CGI. The second element is a reference to a validation sub.

Paging is not fully implemented yet, but the API is defined and (hopefully) won't change. This portion should be mostly self-documenting:

my $RESULTS_PER_PAGE  = param('RESULTS_PER_PAGE') || 0;  # 0 means infinite
my $MAX_RESULTS       = 0;   # Also infinite
my $PAGE_NUMBER       = param('PAGE') || 0;  # Page numbering starts at 0

Note that each of these values is automatically verified with the INTEGER sub.
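Paging is not finished yet, but the arithmetic behind these options is simple. This hypothetical page_slice helper shows one way to turn a per-page count (0 meaning unlimited) and a zero-based page number into a slice of matched results. It ignores max_results and is not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: return the slice of @$matches shown on one page.
# $per_page == 0 means "no limit"; pages are numbered from 0.
sub page_slice {
    my ( $matches, $per_page, $page ) = @_;
    return @$matches if $per_page == 0;
    my $first = $page * $per_page;
    return () if $first > $#$matches;    # page is past the end
    my $last = $first + $per_page - 1;
    $last = $#$matches if $last > $#$matches;
    return @{$matches}[ $first .. $last ];
}

my @matches = ( 'a' .. 'g' );    # 7 hypothetical results
print join( ',', page_slice( \@matches, 3, 0 ) ), "\n";    # prints a,b,c
print join( ',', page_slice( \@matches, 3, 2 ) ), "\n";    # prints g
print join( ',', page_slice( \@matches, 0, 5 ) ), "\n";    # prints a,b,c,d,e,f,g
```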

We're finally ready to initialize CGI::Search:

my $search  = CGI::Search->new(
    script_name      => $ENV{SCRIPT_NAME}, # Script location on web site (for paging)
    template         => $TMPL_FILE,        # Path to HTML::Template file

    # Database options
    db_file          => $DB_FILE,          # Path to database file to search
    db_seperator     => $DB_SEPERATOR,     # Database field separator
    db_fields        => \@DB_FIELDS,       # Reference to the database fields description

    # Paging options
    results_per_page => $RESULTS_PER_PAGE,
    max_results      => $MAX_RESULTS,
    page_number      => $PAGE_NUMBER,

    search_fields    => \%SEARCH,          # Reference to search fields
);

We can get our search results like this:

my @data  = $search->result(1) or die "Error: " . $search->errstr;  # list context
my $tmpl  = $search->result(1) or die "Error: " . $search->errstr;  # scalar context

In list context, result returns your search results as an array-of-hashes, which can be put directly into an HTML::Template TMPL_LOOP. In scalar context, it returns an HTML::Template object with the results already filled in.
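Returning different things depending on context is done with Perl's built-in wantarray. This is a minimal sketch using a hypothetical demo_result function; a plain hash reference stands in for the HTML::Template object so the example runs without extra modules.

```perl
use strict;
use warnings;

# Hypothetical context-sensitive function (not CGI::Search's code).
sub demo_result {
    my @rows = (
        { num1 => 1, email => 'a@example.com' },
        { num1 => 2, email => 'b@example.com' },
    );
    return @rows if wantarray;       # list context: array-of-hashes
    return +{ results => \@rows };   # scalar context: stand-in for a template object
}

my @data = demo_result();    # two hash references
my $tmpl = demo_result();    # one hash reference wrapping the rows
print scalar(@data), " rows; template holds ", scalar @{ $tmpl->{results} }, "\n";
```

One consequence worth knowing: in `my @data = ... or die`, the list assignment is evaluated in boolean context, where it yields the number of elements on its right-hand side, so an empty result list also triggers the die.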

The parameter passed to result determines whether we're doing an AND search or an OR search. If the parameter evaluates to true, a match on any one of the search fields is enough to return the entire database entry (an OR search). If it is false, all of the search fields must match for the entry to be returned (an AND search).
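Here is a minimal sketch of the AND/OR distinction over in-memory records. It assumes each search term matches by exact string equality, which is a simplification (the real module may compare differently); the records, terms, and the matches helper are all hypothetical.

```perl
use strict;
use warnings;

my @records = (
    { num1 => 1, email => 'a@example.com' },
    { num1 => 2, email => 'b@example.com' },
    { num1 => 1, email => 'c@example.com' },
);
my %terms = ( num1 => 1, email => 'c@example.com' );

# $any true: OR search (one matching field is enough).
# $any false: AND search (every search field must match).
sub matches {
    my ( $record, $terms, $any ) = @_;
    my @hits = grep { defined $record->{$_} && $record->{$_} eq $terms->{$_} } keys %$terms;
    return $any ? @hits > 0 : @hits == keys %$terms;
}

my @or_hits  = grep { matches( $_, \%terms, 1 ) } @records;
my @and_hits = grep { matches( $_, \%terms, 0 ) } @records;
print scalar(@or_hits), " OR matches, ", scalar(@and_hits), " AND match\n";
```

With these terms, the OR search returns the first and third records, while the AND search returns only the third.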

result can also take an optional second parameter, a hash reference that overrides the search options passed to new:

my %new_search = (
    num1  => [ param("num1"),        \&INTEGER ], 
    email => [ param("other_email"), \&EMAIL   ], 
);

my $new_tmpl = $search->result(1, \%new_search);

Templates

The templates you use run under HTML::Template and must contain certain entries in order to work with CGI::Search. The results are shown inside a TMPL_LOOP, and you must check for errors both in the overall search and in each individual database entry. Here is an example:

<TMPL_UNLESS NAME="error">
    <!-- This will show if there were no problems with the overall search -->
    <h1>Search Results</h1>

    <TMPL_UNLESS NAME="results">
        <!-- Shows up if there were no results -->
        <p>No results were found for your search.</p>
    </TMPL_UNLESS>

    <TMPL_LOOP NAME="results">
        <!-- Now we iterate through each of the results -->
        <TMPL_UNLESS NAME="error">
            <!-- 
                Shows up if there wasn't an error with a specific 
                entry in the database 
            -->

            <!-- 
                Each of the field names lines up with a name you 
                specified in the search description.  You only 
                need to include the fields you wish to display to 
                the user.
            -->
            
            <p><TMPL_VAR NAME="num1"></p>
            <p><TMPL_VAR NAME="email"></p>
        <TMPL_ELSE>
            <!-- 
                Shows up if there was an error with a specific entry in 
                the database.  The template variable "error" holds a 
                specific error message.
            -->
            <p>Error in database: <TMPL_VAR NAME="error"></p>
        </TMPL_UNLESS>
    </TMPL_LOOP>

    <!-- For pagination feature, which isn't yet implemented -->
    <p>
    <TMPL_IF NAME="prev"> <!-- If there is a previous page -->
        <a href="<TMPL_VAR NAME="prev">">Previous</a> 
    </TMPL_IF>
    <TMPL_IF NAME="next"> <!-- If there is a next page -->
        <a href="<TMPL_VAR NAME="next">">Next</a>
    </TMPL_IF>
    </p>
<TMPL_ELSE>
    <!-- Errors in the overall search (you couldn't open the database, for instance) -->
    <h1><TMPL_VAR NAME="error"></h1> <!-- a short error message -->
    <p><TMPL_VAR NAME="errstr"></p>  <!-- a more descriptive error message -->
</TMPL_UNLESS>

By default, HTML::Template is called with the 'die_on_bad_params => 0' option, so database fields that are matched but that you don't display won't kill the entire process.

Custom Validators

As stated above, a validator takes the data to validate and returns a three-element list: a boolean specifying whether the data validated, the data in untainted form (or undef if it failed to validate), and a string containing an error message if the data didn't validate. Here is a (stupid) example:

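my $custom_validator = sub 
{
    # Matches anything at all (/s lets . match newlines too),
    # so $1 is an untainted copy of the whole input
    if($_[0] =~ /\A(.*)\z/s) {
        return (1, $1, "Passed");
    }
    else {
        return (0, undef, "$_[0] is not valid");
    }
};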

Note that the above validator will blindly validate and untaint any data you check against it, so it shouldn't be used in a real program.
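For contrast, here is a sketch of a stricter custom validator of the kind that could be used in a real program. The five-digit ZIP-code rule is a hypothetical example; it anchors with \A and \z so a trailing newline is rejected, and uses [0-9] rather than \d so non-ASCII digits are not accepted.

```perl
use strict;
use warnings;

# Hypothetical custom validator: accepts only a five-digit ZIP code.
# Follows the same (passed, untainted value, message) convention.
my $zip_validator = sub {
    my ($value) = @_;
    if ( defined $value && $value =~ /\A([0-9]{5})\z/ ) {
        return ( 1, $1, 'Passed' );
    }
    my $shown = defined $value ? $value : '(undef)';
    return ( 0, undef, "'$shown' is not a five-digit ZIP code" );
};

for my $input ( '12345', '1234', "12345\n" ) {
    my ( $ok, $clean, $msg ) = $zip_validator->($input);
    my $report = $ok ? "ok: $clean" : "rejected: $msg";
    print "$report\n";
}
```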

After defining it, you can pass it into your search terms or database fields description like any other validator:

my @DB_FIELDS = (
    [ 'num1',    $custom_validator,  1 ], 
    [ 'email',   \&EMAIL,            0 ], 
    ...
);

Future Directions

Getting paging implemented is the current priority. It should help both memory usage and speed, since results that won't be shown don't have to be loaded into memory, and the search can stop once it has found enough results to fill the requested page. This is already being developed.
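The early-stop idea can be sketched like this. The in-memory flat file, the page settings, and the match rule (second field equals 'alpha') are all hypothetical, and this is not the in-progress CGI::Search code; the loop simply stops reading once it has enough matches to fill the requested page.

```perl
use strict;
use warnings;

# Hypothetical flat file held in memory; a real script would open $DB_FILE.
my $flat = "1|alpha\n2|beta\n3|alpha\n4|alpha\n5|alpha\n";
open my $fh, '<', \$flat or die "Cannot open in-memory file: $!";

my ( $per_page, $page ) = ( 2, 0 );        # first page, two results per page
my $needed = ( $page + 1 ) * $per_page;    # matches needed to fill this page

my @matches;
while ( my $line = <$fh> ) {
    chomp $line;
    my ( $num, $word ) = split /\|/, $line;
    next unless $word eq 'alpha';
    push @matches, $num;
    last if @matches >= $needed;           # stop reading once the page is full
}
my $lines_read = $.;                       # capture before close resets $.
close $fh;

my @shown = @matches[ $page * $per_page .. $#matches ];
print "page $page: @shown (stopped after line $lines_read)\n";
```

Only three of the five lines are read here, which is where the memory and speed savings would come from on a large file.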

Supporting more complex searches (like mixing AND and OR) is less of a priority, but is definitely on the TODO list. I am not currently working on it, but I encourage anyone who is interested to send patches. The simplest solution might be to fall back on DBD::CSV, but that approach might have limitations with regard to validating the fields.

CGI::Search can drastically reduce the size of the code base on your web site. Though I don't think it will work well for all searches, it will be good enough in many cases, and it encourages good coding practices along the way.

Update: Got rid of duplicate title at top (oops)

Update: Got rid of .sig
