Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello,
I have a simple Perl search script that searches a database and displays the results.
The database is a flat text file, pipe (|) delimited. The information from a search box on a web page is sent to the Perl search script, which then displays the results from the database in the format of the template.
It works, but it has one shortcoming: it is not exact enough.
Example: I search for the word 'ABBA'. Along with that word, it also displays other words that merely contain 'ABBA' inside them:
'Barabba'
'Black Sabbath'
'Cabballero'

That is, I just want to improve the filtering of the search results so that the script shows only exact-word matches.
I am new to Perl. Could someone improve this feature, please? Just correct my code (with the corrected line marked in red). Thank you.

#!/usr/bin/perl
##########################################################
##########################################################
#%FORM = parse_cgi();
print "Content-type: text/html\n\n";
(my $head, my $tmp, my $foot) = get_html($HTML_template);
$qs=$ENV{'QUERY_STRING'};

##read db
my @data = read_file($CSV_file);
chomp $data[0];
my @fields= split('\|', shift @data);
$base_length = @data;
error("You have bad file!") if !@fields;
error("Database is clear!") if $base_length<1;

if($qs =~m/header=([^\&\Z]*)/){push @header,$1;}
if($ID_use && $qs =~m/show=([^\&\Z]*)/){
    @data = search($1,$ID_field_name);
}
else{
    @conditions=split(/&/,$qs);
    my $a=0;
    foreach (@conditions){
        ($name, $value) = split(/=/, $_);
        if($name eq 'search'){
            $FORM{search} = $value;
            @data = search($value, $header[0]);
        }
        elsif($name eq 'header'){}
        elsif($_=~/([^=<>!]+)!=([^=<>!]+)/){@data = search($value, $1, "!=");}
        elsif($_=~/([^=<>!]+)=([^=<>!]+)/){@data = search($value, $1);}
        $a++;
    }
}

my $result;
##matched data
foreach(@data){
    chomp;
    @line = split('\|', $_);
    $a=0;
    %INSERT=();
    foreach(@fields){$INSERT{$_} = $line[$a++];}
    $result.=get_record($tmp)
}
%INSERT=();
$INSERT{'#_matches'} = @data;
$INSERT{'#_total'} = $base_length;
$result = $no_matches_found."<br>" unless @data;
print get_record($head), $result, get_record($foot);
undef $result;
undef $head;
undef $foot;
exit;
#########################################################
sub search{
    my $word=shift;
    my $field=shift;
    my $action=shift;
    $word=~tr/+/ /;
    $word=~s/%([\dA-Fa-f][\dA-Fa-f])/pack ("C",hex($1))/eg;
    my $position=-1;
    my $a=0;
    if($field){
        $field=~tr/+/ /;
        $field=~s/%([\dA-Fa-f][\dA-Fa-f])/pack ("C",hex($1))/eg;
        foreach(@fields){$position=$a if $_ eq $field; $a++;}
    }
    my %match;
    $word =~s/ +/ /g;
    my @new_data=();
    my @keys= split(" ", $word);
    if($action eq '!='){for(0..@data-1){$match{$_} = 1;}}
    foreach $key (@keys){
        $a=0;
        foreach $record (@data){
            @line = split('\|', $record);
            if($field && $position>-1){
                if($action eq '!='){ $match{$a} = 0 if $line[$position]=~m/\Q$key/i; }
                else{$match{$a} = 1 if $line[$position]=~m/\Q$key/i;}
            }
            else{
                foreach(@line){if ($_=~m/\Q$key/i){$match{$a} = 1; last;}}
            }
            $a++;
        }
    }
    $a=0;
    my $b=0;
    foreach(@data){
        $new_data[$b++] = $_ if $match{$a};
        $a++;
    }
    return @new_data;
}
sub get_record{
    my $text = $_[0];
    $text =~ s{<<(.*?)>>}{exists($INSERT{$1}) ? $INSERT{$1} : ""}gsex;
    return $text;
}
sub get_html{
    my @txt = read_file($_[0]);
    my $txt;
    foreach(@txt){$txt.=$_;}
    $txt=~/(.*)<template>(.*)<\/template>(.*)/s;
    error("Template-tag not found!") if !$1 or !$2;
    return ($1,$2,$3);
}
sub read_file{
    open(F, $_[0]) || error("Can't open file $_[0]!");
    my @data = <F>;
    close F;
    return @data;
}
sub error{
    print "<html><head><title>Error</title>$style</head><body><br><br><br><font color=red><h3>$_[0]</h3></font></body></html>";
    exit;
}
##########################################################

janitored by ybiC: Balanced <readmore> tags around longish codeblock, as per Monastery convention

20040928 Edit by castaway: Changed title from 'tenerif'

Replies are listed 'Best First'.
Re: search for exact word only
by Roy Johnson (Monsignor) on Sep 27, 2004 at 14:15 UTC
    What you want to do is match on a word rather than any part of a word. So mark word boundaries in your match. Your old code here:
    if($action eq '!='){ $match{$a} = 0 if $line[$position]=~m/\Q$key/i; }
    else{$match{$a} = 1 if $line[$position]=~m/\Q$key/i;}
    Can be rewritten like so:
    $match{$a} = ($line[$position] =~ m/\b\Q$key\E\b/i) ? 1 : 0;
    $match{$a} = !$match{$a} if $action eq '!=';
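
To see the effect of the word boundaries in isolation, here is a minimal standalone sketch run against the names from the question. The helper name whole_word_matches and the extra sample entry 'Abba Gold' are made up for illustration and are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): returns the entries
# of @$names that contain $key as a whole word, ignoring case.
sub whole_word_matches {
    my ($key, $names) = @_;
    # \Q...\E quotes any regex metacharacters in the user's search word;
    # \b on each side requires a word boundary around it.
    return grep { /\b\Q$key\E\b/i } @$names;
}

# Names from the question, plus 'Abba Gold' as an invented extra sample.
my @names = ('ABBA', 'Barabba', 'Black Sabbath', 'Cabballero', 'Abba Gold');
my @hits  = whole_word_matches('ABBA', \@names);

print "$_\n" for @hits;   # prints the two entries where ABBA is a whole word
```

One caveat with this approach: \b only marks a boundary between a word character and a non-word character, so a search key that begins or ends with punctuation may not match the way you expect.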

    Caution: Contents may have been coded under pressure.
      Please let me modify my problem a little.
      There are a number of names which consist of 2 or 3 words.

      For example:

      'Ministry Of Sound' or
      'Oliver Shanti & Friends'
      The user remembers only the first word exactly ('Ministry') and has typed the other words incorrectly.
      I want this first 'exact' word to be enough to find the full name, 'Ministry Of Sound'. Even if some additional items are shown besides the one the user searched for, the user can still choose the right item.
      The main thing is to prevent matches on fragments of a word, like 'nistry' or 'inis'.

      I have no access to Perl modules; I use shared hosting. It's not clear to me where these lines were meant to be inserted:
      use strict;
      use warnings;
      use CGI::Carp::FatalsToBrowser();

        The solution I provided solves the problem you describe here. It matches on word boundaries; it does not require the entire string to match. My solution did not include strict, warnings, or Carp, but strict and warnings are pragmas and do not require external modules. The warnings pragma is not available in Perl versions before 5.6, so in that case you should rely on the -w option instead.
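
As a rough illustration of that point, the sketch below keeps a name when any single word of the query appears in it as a whole word, so one correctly remembered word is enough. The helper name any_word_matches, the misspelled query, and the sample entry 'Administry' are assumptions made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: keep a name if ANY word of the query occurs in it
# as a whole word (case-insensitive).
sub any_word_matches {
    my ($query, $names) = @_;
    my @keys = split ' ', $query;
    return grep {
        my $name = $_;
        # Count how many query words match this name on word boundaries.
        scalar grep { $name =~ /\b\Q$_\E\b/i } @keys;
    } @$names;
}

# 'Administry' is an invented entry to show that fragments do not match.
my @names = ('Ministry Of Sound', 'Oliver Shanti & Friends', 'Administry');

# Only the first word of this query is spelled correctly.
my @found = any_word_matches('Ministry Off Sownd', \@names);

# A fragment of a word should find nothing.
my @none = any_word_matches('nistry', \@names);

print "found: $_\n" for @found;
print "fragment matches: ", scalar(@none), "\n";
```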

        Caution: Contents may have been coded under pressure.
        I have no access to Perl modules, i use shared hosting.

        I have used various web hosting providers, but I've always had the CGI module. I'm pretty sure it comes as standard with all Perl distributions, so it would be very unlikely that you don't already have it. If you genuinely don't have it, you should give your hosting provider a good kicking.

        By the way - the monk who advised using CGI::Carp made a small typo, which would give you an error. The line should be:

        use CGI::Carp qw(fatalsToBrowser);

        and this (along with use strict) should go straight after the #! line at the start of your file:

        #!/usr/bin/perl -w
        use strict;
        use CGI;
        use CGI::Carp qw(fatalsToBrowser);

        Hope this helps.


        s^^unp(;75N=&9I<V@`ack(u,^;s|\(.+\`|"$`$'\"$&\"\)"|ee;/m.+h/&&print$&
Re: search for exact word only
by muntfish (Chaplain) on Sep 27, 2004 at 14:19 UTC

    I haven't read through all your code (there's rather a lot of it) but in general if you want to search for an exact word, rather than any substring, you can use \b in a regular expression. For example, the following regexp:

    /\bABBA\b/i

    would match ABBA but not any of your other examples.

    I hope this helps. Sorry if I have misunderstood the question.


    s^^unp(;75N=&9I<V@`ack(u,^;s|\(.+\`|"$`$'\"$&\"\)"|ee;/m.+h/&&print$&
Re: search for exact word only
by Grygonos (Chaplain) on Sep 27, 2004 at 15:36 UTC

    Your question has already been adequately answered. However, I feel the need to interject: please use CGI;. Any attempt to roll your own query string parser is (usually) fraught with peril. It doesn't make you a "lesser" developer to use the CGI module; it means you're smart enough to use what has been tested EXTENSIVELY and what many monks rely on. There is nothing wrong at all with plopping open the CGI.pm file and trying to grok it, if you really wanna understand what's up with it.

    Also

    use strict;
    use warnings;
    use CGI::Carp::FatalsToBrowser();
    Check up on the CGI::Carp deal. strict and warnings will help prevent silly mistakes, and make your code more maintainable (IMHO).
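
As a small illustration of the kind of silly mistake strict catches, the sketch below compiles a snippet containing a misspelled variable name inside a string eval, so the compile-time error can be captured and shown instead of killing the script. The variable names are invented for the example.

```perl
use strict;
use warnings;

# Compile a tiny snippet under strict. The misspelled $coutn is never
# declared, so compilation fails and eval returns undef with $@ set.
my $ok = eval q{
    use strict;
    my $count = 0;
    $coutn = $count + 1;   # typo: should be $count
    1;
};

my $strict_error = $ok ? '' : $@;
print "strict caught: $strict_error" if $strict_error;
```

Without use strict, the same typo would silently create a new global variable and the bug would only show up later as a wrong result.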

    Again, I'm not trying to sound like an evangelist here, but I made the same mistakes when I started Perl, and I wish someone had taken the time to tell me these things.

      Small typo here:

      use CGI::Carp::FatalsToBrowser();

      should be:

      use CGI::Carp qw(fatalsToBrowser);

      s^^unp(;75N=&9I<V@`ack(u,^;s|\(.+\`|"$`$'\"$&\"\)"|ee;/m.+h/&&print$&
Re: search for exact word only
by TedPride (Priest) on Sep 27, 2004 at 21:54 UTC
    My script searches a copy of the original text that has had all non-alphabetic characters converted to spaces (except ' in words like isn't, which just disappears), all runs of spaces reduced to one space, and all letters converted to lowercase. A space has also been added to the start and end of each section. The search terms are then converted using the same method, and searches are performed for exact phrase, all keywords, and any keywords, depending of course on the number of search results at each level and the number of keywords submitted.
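
A minimal sketch of the normalization described above might look like the following. It is not the actual script: the helper names normalize and contains_phrase are hypothetical, only the exact-phrase level is shown, and it assumes plain ASCII letters.

```perl
use strict;
use warnings;

# Hypothetical helper: build the searchable copy of a piece of text.
# Assumes ASCII text; anything other than a-z is treated as a separator.
sub normalize {
    my ($text) = @_;
    $text = lc $text;
    $text =~ s/(?<=[a-z])'(?=[a-z])//g;  # apostrophes inside words vanish: isn't -> isnt
    $text =~ s/[^a-z]+/ /g;              # every run of non-letters becomes one space
    $text =~ s/^ | $//g;                 # trim, then pad with exactly one space each side
    return " $text ";
}

# Hypothetical helper: exact-phrase search on the normalized copies.
# Because both strings are padded with spaces, a plain substring test
# with index() only succeeds on whole words.
sub contains_phrase {
    my ($text, $phrase) = @_;
    return index(normalize($text), normalize($phrase)) >= 0;
}

for my $name ('ABBA - Gold', 'Barabba', 'Black Sabbath') {
    printf "%-14s %s\n", $name, contains_phrase($name, 'ABBA') ? 'match' : 'no match';
}
```

The padding spaces are what make index() behave like a whole-word match: ' abba ' cannot be found inside ' barabba '.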

    This method seems to work well unless you want to return the section of text matched, in which case the copy can't be resized by having characters removed. You should certainly have a copy, though, since exact match searches are much faster than case-insensitive searches, and all text to be searched should be in one file rather than many.