I am so impressed with this nifty module. It is awesome for finding "similar" documents, the way the old Excite for Web Servers search engine did (the "More Like This One" button).
What is unusual about this search technique is that results become more accurate as the query gets larger: you can input the entire text of a document, and the search engine returns a list of documents like it.
I modified it so that the command-line list of results shows a document ID in addition to the filename. I can then enter a document ID as the query, which in effect supplies all the terms in that document as the new query. The results are awesomely accurate.
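To make the idea concrete, here is a minimal sketch of vector-space "more like this" searching. It is written in Python rather than Perl and is not the module's actual code: the function names, the sample documents, and the use of raw term counts with no stop-word list or stemming are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vectors(docs):
    """Map each document ID to a term-frequency vector (a Counter)."""
    return {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(vectors, query_vector, exclude=None):
    """Rank documents by similarity to query_vector, best match first."""
    scored = [
        (cosine(query_vector, vec), doc_id)
        for doc_id, vec in vectors.items()
        if doc_id != exclude
    ]
    # Drop documents that share no terms with the query.
    return sorted((pair for pair in scored if pair[0] > 0), reverse=True)


def more_like_this(vectors, doc_id):
    """Use a stored document's own vector as the query."""
    return search(vectors, vectors[doc_id], exclude=doc_id)


# Hypothetical sample documents, keyed by document ID.
docs = {
    1: "perl modules for text search and indexing",
    2: "searching text with perl vector space modules",
    3: "recipes for baking sourdough bread at home",
}
vectors = build_vectors(docs)
```

Calling `more_like_this(vectors, 1)` ranks document 2 ahead of document 3, because documents 1 and 2 share three terms while documents 1 and 3 share only "for". A typed query goes through the same path: `search(vectors, Counter(tokenize("baking bread")))` returns only document 3. A longer query supplies more terms to match on, which is one plausible reason whole-document queries work so well.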
I'd love to have a web interface for this module and give it a try on a real site. I guess the first big obstacle is turning the module into a daemon, so that once all the vectors are created they can "hang around" instead of being recreated each time the search engine is used. Has anybody done any work in that regard?
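One possible shape for such a daemon, sketched in Python rather than Perl: build the index once at startup, then keep a small HTTP server running that answers every query from memory. Everything here is an assumption for illustration, including the `/search?q=...` URL, the function names, and the toy word-overlap index that stands in for the real vector-building step.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse


def build_index(docs):
    """Stand-in for the expensive vector-building step; meant to run once."""
    return {doc_id: set(text.lower().split()) for doc_id, text in docs.items()}


def make_search(index):
    """Return a query function that closes over the in-memory index."""
    def search_fn(query):
        terms = set(query.lower().split())
        hits = [(len(terms & words), doc_id) for doc_id, words in index.items()]
        # Best overlap first; documents sharing no terms are dropped.
        return [doc_id for score, doc_id in sorted(hits, reverse=True) if score > 0]
    return search_fn


def make_server(search_fn, host="127.0.0.1", port=0):
    """Return an HTTP server that answers GET /search?q=... via search_fn.

    port=0 asks the operating system for any free port.
    """
    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            url = urlparse(self.path)
            if url.path != "/search":
                self.send_error(404)
                return
            query = parse_qs(url.query).get("q", [""])[0]
            body = json.dumps(search_fn(query)).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # suppress per-request logging in this sketch

    return ThreadingHTTPServer((host, port), Handler)
```

To run it as a long-lived process, build the index once and then call `make_server(make_search(index), port=8080).serve_forever()` (the port number is arbitrary). A web front end would then only need to issue requests to `/search` and format the returned list of document IDs.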