in reply to Re: Re: Faster Search Engine
in thread Faster Search Engine
Here is a really basic search application for you. The script prompts you for a search string, but this could just as easily be CGI input. Note that quotemeta escapes every non-word character with a \, which 1) makes the string safe to use in the grep regex and 2) helps thwart hackers, since nothing the user types can act as a regex metacharacter. *Do not interpolate a user supplied string into a regex without the quotemeta.* The script then greps out all the lines that contain that string and stores them in an array. The /i makes the search case insensitive. It looks for a literal match of the search string only and will not understand boolean logic.
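To make the quotemeta point concrete, here is a minimal sketch, separate from the search script itself. The sample lines and the find_literal helper are made up for illustration and are not taken from links.db.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data, not taken from links.db
my @lines = (
    "C++ tutorials|http://example.com/cpp\n",
    "C tutorials|http://example.com/c\n",
    "Perl (regex) notes|http://example.com/perl\n",
);

# Hypothetical helper: return the lines that contain $find as a
# literal, case insensitive substring.
sub find_literal {
    my ($find, @data) = @_;
    my $safe = quotemeta $find;    # "c++" becomes "c\+\+"
    return grep { /$safe/i } @data;
}

my @hits = find_literal('c++', @lines);
print "quotemeta('c++') gives ", quotemeta('c++'), "\n";
print "Matched: $_" for @hits;

# Without quotemeta, an unbalanced paren is a fatal regex error.
# The eval catches it so the demo can carry on.
my $raw = '(regex';
my $compiled = eval { my @x = grep { /$raw/i } @lines; 1 };
print "Raw '$raw' failed to compile as a regex\n" unless $compiled;

# The escaped version of the same string matches literally.
my @paren_hits = find_literal($raw, @lines);
print "Escaped it matched: $_" for @paren_hits;
```

The only part that matters for the search script is the quotemeta call inside find_literal. The rest is scaffolding to show the difference.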
Using your 70 KB 'links.db' text file as the data and searching for 'PlanetGimmick', which is the last entry in the file, it takes 0 seconds to run (time() only counts whole seconds, so a single search is too quick to register). If you ramp up and do the search 10000 times, so that it runs long enough to get a valid time, it takes 161 seconds, or 16.1 milliseconds per search. This is on an old PII 233 MHz, 64 MB RAM, Win95, Perl 5.6 system (my laptop). I expect this is fast enough for most practical purposes. Once you have the matching lines in an array you can do whatever processing you want to them. The advantage is that you only process those lines that have matched your search criteria.
    #!/usr/bin/perl -wT
    use strict;

    # clean up the environment for CGI use
    delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};
    $ENV{'PATH'} = '/bin:/usr/bin';    # you may need more path info

    my $db_file = 'c:/links.db';

    print "Find what? ";
    chomp(my $find = <STDIN>);

    # this escapes regex metachars and makes it safe
    # to interpolate $find into the regex in our grep.
    $find = quotemeta $find;

    # this untaints $find - we have made it safe above
    # using the quotemeta, this satisfies -T taint mode
    $find =~ m/^(.*)$/;
    $find = $1;

    my $start = time();

    open(my $fh, '<', $db_file)
        or die "Oops can't read $db_file Perl says $!\n";
    my @db_file = <$fh>;    # get the whole database into an array in RAM
    close $fh;

    # do the search
    my @lines = grep { /$find/i } @db_file;

    my $time = time() - $start;
    print "Search took $time seconds\n";

    if (@lines) {
        print "Found\n@lines\n";
    }
    else {
        print "No match\n";
    }
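If you want to repeat the timing yourself, here is a rough sketch of the "run it many times and divide" approach described above. It uses Time::HiRes for sub-second resolution, which is an assumption about your install: it ships with Perl from 5.8 on, but on 5.6 you would need to get it from CPAN. The in-memory @db array is made-up stand-in data, not the real links.db, so the numbers it prints will not match mine.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);    # replaces time() with a fractional-second version

# Made-up stand-in for the database: 1000 filler lines plus the target
# as the last entry, to mimic searching for the final line of the file.
my @db = map { "Site $_|http://example.com/$_\n" } 1 .. 1000;
push @db, "PlanetGimmick|http://example.com/pg\n";

my $find = quotemeta 'PlanetGimmick';
my $runs = 1000;             # raise this if the total time is still too small

my @lines;
my $start = time();
for my $run (1 .. $runs) {
    @lines = grep { /$find/i } @db;
}
my $elapsed = time() - $start;

printf "%d searches took %.3f seconds, about %.3f ms each\n",
    $runs, $elapsed, 1000 * $elapsed / $runs;
```

Swap the @db setup for a read of your own file to get figures for your data and hardware.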
I expect this should solve your problem as it is plenty fast enough. It should scale in a linear fashion, i.e. twice as big a file == twice as long for the search. The scaling will break down when your file becomes larger than can be stored in main memory in an array and the operating system resorts to using swap space on the disk as virtual RAM. If you get this big, send me some options in the IPO, OK!
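If the file ever does outgrow RAM, one possible way around it (a different technique from the slurp-and-grep in the script, and only a sketch) is to read the file a line at a time so that only the matching lines are held in memory. The search_file name and the temp file used for the demo are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: scan $path one line at a time and return the
# lines that contain $find as a literal, case insensitive substring.
sub search_file {
    my ($path, $find) = @_;
    my $safe = quotemeta $find;
    open(my $fh, '<', $path) or die "Can't read $path: $!\n";
    my @hits;
    while (my $line = <$fh>) {
        push @hits, $line if $line =~ /$safe/i;
    }
    close $fh;
    return @hits;
}

# Demo against a small throwaway file with made-up contents.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print $tmp_fh "Alpha|http://example.com/a\n",
              "PlanetGimmick|http://example.com/pg\n";
close $tmp_fh;

my @found = search_file($tmp_path, 'planetgimmick');
print "Found:\n@found";
```

The search is still linear in the size of the file, but memory use now depends on the number of matches rather than on the size of the database.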
cheers
tachyon
s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print