I recently wrote a small search engine for a site.
It consists of two scripts: one that reads
HTML pages to create an index (some db-files),
and a CGI script that consults this index.
When the user enters a word, the CGI script looks it up in a hash
table to get the corresponding ID.
Then, it searches for that ID in another hash to find
files containing that word.
Something like this:
$id = $Words_db{$word};
foreach $i (keys %Index_db) {
    if ($i == $id) {
        @fileId = split( /:/, $Index_db{$i});
        foreach $fId (@fileId) {
            # ...
        }
    }
}
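Since %Index_db is keyed by ID, I assume the same file list could also be reached with a direct lookup instead of the loop over the keys. A minimal sketch, where the in-memory hashes and their sample data are made-up stand-ins for the real db-files, and files_for_word is just an illustrative name:

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-ins for the tied db-file hashes.
my %Words_db = ( man => 1, maniac => 2 );
my %Index_db = ( 1 => '10:11', 2 => '11:12' );

# Return the file IDs recorded for an exact word, or an empty list
# when the word (or its ID) is not in the index.
sub files_for_word {
    my ($word) = @_;
    my $id = $Words_db{$word};
    return () unless defined $id && exists $Index_db{$id};
    return split /:/, $Index_db{$id};
}

print join( ',', files_for_word('man') ), "\n";    # prints "10,11"
```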
It works just fine, thanks to hash tables.
Now I'd like to let users search with only a piece of a word
(e.g. "man" would match both "man" and "maniac").
In that case, I'd have to modify the code. Something like:
my $piece;
foreach (keys %Words_db) {
    if ( ... ) {
        # $piece is a substring of $_
        ...
    } else {
        # $piece does not occur in $_
        ...
    }
}
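To make the idea concrete, here is a rough sketch of that loop using Perl's built-in index() for the substring test. The in-memory hashes and their sample data are made-up stand-ins for the real db-files, and files_for_piece is just an illustrative name. It visits every key of %Words_db on each query, which is exactly the cost that worries me:

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-ins for the tied db-file hashes.
my %Words_db = ( man => 1, maniac => 2, woman => 3, dog => 4 );
my %Index_db = ( 1 => '10:11', 2 => '11:12', 3 => '13', 4 => '14' );

# Return the sorted, de-duplicated file IDs for every indexed word
# that contains $piece as a substring.
sub files_for_piece {
    my ($piece) = @_;
    my %seen;
    for my $word ( keys %Words_db ) {
        # index() returns -1 when $piece does not occur in $word
        next if index( $word, $piece ) < 0;
        my $id = $Words_db{$word};
        next unless exists $Index_db{$id};
        $seen{$_} = 1 for split /:/, $Index_db{$id};
    }
    return sort { $a <=> $b } keys %seen;
}

print join( ',', files_for_piece('man') ), "\n";    # prints "10,11,12,13"
```

Note that a plain substring test also matches "woman" for the piece "man"; requiring index() to return 0 would restrict the match to prefixes.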
I haven't tried that on the real index, because it seems too inefficient.
I'd be glad to hear your suggestions. Thank you.
Larsen