in reply to Re: Re: Faster Search Engine
in thread Faster Search Engine

Here is a really basic search application for you. The script prompts you for a search string, but this could easily be CGI input. Note that quotemeta escapes every character other than letters, digits and underscore with a \ which 1) makes the string safe to use in the grep regex and 2) helps thwart hackers. *Do not interpolate a user-supplied string into a regex without quotemeta.* The script then greps out all the lines that contain that string and stores them in an array. The /i makes the search case insensitive. It looks for the literal string only and does not understand boolean logic.
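
To see what the escaping changes, here is a minimal sketch. The input string and the three sample lines are made up for illustration and are not taken from links.db.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical user input containing a regex metacharacter.
my $input = 'a.b';

# quotemeta puts a backslash in front of every non-word character.
my $safe = quotemeta $input;    # a\.b

# Made-up sample lines standing in for the database.
my @lines = ("a.b links\n", "aXb links\n", "A.B LINKS\n");

# Unescaped, the dot matches any character, so all three lines match.
my @loose = grep { /$input/i } @lines;

# Escaped, only a literal dot matches, so the "aXb" line drops out.
my @exact = grep { /$safe/i } @lines;

print "escaped pattern: $safe\n";
print "unescaped matches: ", scalar(@loose), "\n";
print "escaped matches: ", scalar(@exact), "\n";
```

The same thing applies to input like `.*` or `(`, which would otherwise match everything or break the regex outright.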

Using your 70kb 'links.db' text file as the data and searching for 'PlanetGimmick', which is the last entry in the file, it takes 0 seconds to run (time() only counts whole seconds). If you ramp up and do the search 10000 times so that it runs long enough to get a valid time, it takes 161 seconds, or 16.1 milliseconds per search. This is on an old PII 233 MHz, 64MB RAM, Win95, Perl 5.6 system (my laptop). I expect this is fast enough for most practical purposes. Once you have the matching lines in an array you can do whatever processing you want on them. The advantage is that you only process those lines that have matched your search criteria.
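
A repeated timing run along those lines could be sketched as follows. This is not the exact code behind the figures above: it uses Time::HiRes for sub-second timing (an assumption for this example; it ships with later Perls but was a CPAN install on 5.6) and a made-up in-memory array in place of links.db, so the numbers it prints will differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);    # sub-second time(); an assumption, see above

# Made-up stand-in data: 500 lines with the target on the last line.
my @db_file = map { "http://example.com/site$_|Site $_\n" } 1 .. 499;
push @db_file, "http://example.com/pg|PlanetGimmick\n";

my $find  = quotemeta 'PlanetGimmick';
my $loops = 10_000;

my $start = time();
my @lines;
for (1 .. $loops) {
    @lines = grep { /$find/i } @db_file;
}
my $elapsed = time() - $start;

# Average cost of one search in milliseconds.
my $per_search_ms = 1000 * $elapsed / $loops;

printf "%d searches took %.2f seconds, %.3f ms each\n",
    $loops, $elapsed, $per_search_ms;
print "Matched: @lines";
```

Only the grep is inside the loop here, so the file read is not counted in the per-search figure.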

#!/usr/bin/perl -wT
use strict;

# clean up the environment for CGI use
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};
$ENV{'PATH'} = '/bin:'; # you may need more path info

my $db_file = 'c:/links.db';

print "Find what? ";
chomp(my $find = <>);

# this escapes regex metachars and makes it safe
# to interpolate $find into the regex in our grep.
$find = quotemeta $find;

# this untaints $find - we have made it safe above
# using the quotemeta, this satisfies -T taint mode
$find =~ m/^(.*)$/;
$find = $1;

my $start = time();

open (FILE, "<$db_file") or die "Oops can't read $db_file Perl says $!\n";
my @db_file = <FILE>; # get the whole database into an array in RAM
close FILE;

# do the search
my @lines = grep {/$find/i}@db_file;

my $time = time() - $start;
print "Search took $time seconds\n";

if (@lines) {
    print "Found\n@lines\n";
}
else {
    print "No match\n";
}

I expect this should solve your problem, as it is plenty fast enough. It should scale linearly, i.e. twice as big a file == twice as long to search. The scaling will break down when your file becomes too large to hold in main memory as an array and the operating system resorts to using swap space on the disk as virtual RAM. If you get this big, send me some options in the IPO, OK!
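
If the file ever does outgrow RAM, one common way around it (not something the script above does) is to read the file a line at a time and keep only the matching lines. A minimal sketch, where the search_file name and the throwaway temp file contents are made up for the example:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the lines of $file that contain $find,
# holding only one line of the file in memory at a time.
sub search_file {
    my ($file, $find) = @_;
    my $pattern = quotemeta $find;
    my @matches;
    open my $fh, '<', $file or die "Can't read $file: $!\n";
    while (my $line = <$fh>) {
        push @matches, $line if $line =~ /$pattern/i;
    }
    close $fh;
    return @matches;
}

# Build a small throwaway file so the sketch runs on its own.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print $tmp_fh "http://example.com/a|Alpha\n";
print $tmp_fh "http://example.com/pg|PlanetGimmick\n";
close $tmp_fh;

my @found = search_file($tmp_name, 'planetgimmick');
print "Found\n@found" if @found;
```

The search is still linear in the size of the file, but memory use now grows with the number of matches rather than with the whole database.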

cheers

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: Re: Re: Re: Faster Search Engine
by drewboy (Sexton) on Jul 22, 2001 at 21:19 UTC
    thanks, but i'm a little overwhelmed with everything that you wrote(!). i don't run my site on my own server, if that's what you're presuming (just in case).

    how do i implement your code on my site (running on unix -- at dreamhost)? does that mean i have to replace my search.cgi file? or is your script for the purpose of sort of caching the results on my system so that my current search.cgi will perform faster?

    sorry for sounding stupid, i am not that good with perl/cgi. please tell me exactly what to do with the script that you offered. thanks for taking your time to help me out!!

    drewboy

      Yes, this is a program that you can currently run on your home computer (you'll need Perl; see New Monks to get it). Ultimately any search program will need to run on the server. It is not configured as a CGI at the moment but could easily be. What sort of results do you want the search to return? Domain name with links to that domain, or something else? If you cannot get scripts and modules installed on the server, let me know whether they have the modules 'CGI.pm' and 'HTML::Template' installed. Ask the systems administrator.

      cheers

      tachyon

      s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

        i went ahead right after i read your prior, prior message and downloaded activeperl. i dunno if i did the right thing. anyway i installed it along with other required files for windows 98 but that's where i'm stuck. i dunno how to run a perl script using activeperl. i tried to but all i got was an odd ms-dos window that pops up for .001 of a second and disappears in the blink of an eye. i did all the file associations necessary but maybe not(?).

        what i had in mind for your script was to somehow speed up my current search.cgi file. i already paid a bunch of money for it so i would like to stick to it as much as possible. the mysql version (aptly called links mysql 2.0) is incredibly out of my league at the moment (costs $400!). i'm just trying to find a way to avoid having to worry about my search engine's performance in the future as my database or 'flat file' builds up. i would also want it to have an edge over other search engines that use this software, in that it is faster and more efficient in bringing up search results. what i've noticed with other sites is that they can be extremely slow.

        as i've mentioned before i read at the gossamer threads (the company that made links 2.0) forum about using the grep function or perl core dump or whatnot to speed up the search. i believe that you also used grep in the script that you made for me (appreciate it very much). i wonder if i could use this same idea for my current search.cgi file.

        thanks tachyon!!!!

        p.s. -- can you give me a link that gives a good explanation of grep?

        drewboy
        c",)