in reply to make a web site searchable
Note that my approach is probably overkill for your situation; it is meant for large data sets (I tested it on 250MB of text and it works surprisingly well).
I found that the best way is to set up an inverted index of all the terms, plus a second index that records the position of each word within each document.
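To make the idea concrete, here is a minimal in-memory sketch in Python of an inverted index that also keeps word positions. It is only an illustration: the names `tokenize` and `build_index` and the sample documents are made up for this example, and a real setup on a large data set would keep these structures in database tables rather than in a dict.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each term to {doc_id: [positions of the term in that document]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index

# Hypothetical sample data, just for illustration.
docs = {
    1: "the quick brown fox",
    2: "the lazy dog and the fox",
}
index = build_index(docs)
```

Looking up `index["fox"]` tells you both which documents contain the word and where it occurs in each, which is what a proximity ranking needs.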
I then use an algorithm that gives a bonus when the search words appear close to each other in a document. This proximity-search algorithm is described at http://citeseer.nj.nec.com/cachedpage/550719/1 .
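As a rough illustration of a proximity bonus (not the algorithm from the linked paper, just one simple way to do it), the Python sketch below takes the sorted position lists of two search words in one document, finds the smallest gap between them, and turns it into a score. The function names and the `1 / distance` scoring are assumptions made for this example.

```python
def min_distance(positions_a, positions_b):
    """Smallest gap between any position in a and any position in b.

    Both lists must be sorted in ascending order.
    """
    i = j = 0
    best = float("inf")
    while i < len(positions_a) and j < len(positions_b):
        best = min(best, abs(positions_a[i] - positions_b[j]))
        # Advance whichever pointer sits on the smaller position.
        if positions_a[i] < positions_b[j]:
            i += 1
        else:
            j += 1
    return best

def proximity_bonus(positions_a, positions_b):
    """Hypothetical scoring: adjacent words score 1.0, distant words less."""
    d = min_distance(positions_a, positions_b)
    if d == float("inf"):
        return 0.0  # at least one word is missing from the document
    return 1.0 / max(d, 1)

adjacent = proximity_bonus([1], [2])
spread = proximity_bonus([0, 10], [3, 12])
```

The bonus would then be added to whatever base score a document already gets for containing the search words.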
Also, to improve the inverted index, words are indexed by their stem (a stemming algorithm can be found at http://www.ldc.usb.ve/~vdaniel/porter.pm ).
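The point of stemming is that different forms of a word end up under one index key. The Python sketch below shows the effect with `toy_stem`, a deliberately crude suffix stripper invented for this example; it is not the Porter algorithm from the link above, which handles far more cases.

```python
# Hypothetical suffix list; a real stemmer has many more rules.
SUFFIXES = ("ing", "ed", "es", "s")

def toy_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

forms = ["search", "searches", "searched", "searching"]
stems = {toy_stem(w) for w in forms}
```

All four forms collapse to the same stem, so a query for "searching" also finds documents that only contain "searched". The query terms have to be stemmed the same way as the indexed words.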
I have also implemented an algorithm similar to Google's PageRank (a good description of it is at http://citeseer.nj.nec.com/cachedpage/368196/1 ), so that the popularity of a page is taken into account when returning results.
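For anyone who has not seen it, the basic PageRank iteration fits in a few lines. This Python sketch is the textbook power-iteration version with a damping factor of 0.85, not necessarily what I or Google actually run; the `links` graph is made-up sample data, and pages with no outgoing links are assumed to spread their rank evenly over all pages.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank over every page.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# Hypothetical link graph: d links to a, but nothing links to d.
links = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
ranks = pagerank(links)
```

Page `a` ends up with a higher rank than `d` because two pages link to it and none link to `d`. The rank can then be blended into the search score so that popular pages float upwards.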
I use MySQL for all the storage / indexes.