My first thought is that if you're only demoing it in the first instance, it might be easier to use a free site-search facility, like the one offered by Google. OK, so you have to suffer some branding, but it's very easy to roll out for prototype purposes.
I definitely wouldn't try to write something from scratch: the rules that make a search work are surprisingly complex. It might look like a simple =~ m/$searchword/, but any search engine has plenty of things to consider, like:
- proximity: when searching for more than one word, you might want to rank a result higher where the words are close together
- stemming: when searching for "rain", you might want to find results about "raining", "rainy"
- weighting: when searching for a word, how do you tell apart documents where the term is only mentioned once, possibly as a cross-reference ('for more information about rain, see "All About Rain"'), from the much more relevant and useful document "All About Rain"?
- ....
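To make the weighting point concrete, here is a minimal sketch in core Perl. The two documents and the score() helper are made up for illustration, and the scoring (term count divided by document length) is only one crude option, not how any particular search engine does it. A bare regex match says both documents "contain rain"; counting occurrences at least ranks the dedicated document first.

```perl
use strict;
use warnings;

# Hypothetical sample documents; titles and text are invented for illustration.
my %docs = (
    'All About Rain' => 'Rain forms when droplets merge. Heavy rain falls fast. Rain feeds rivers.',
    'Garden Tips'    => 'Water the beds weekly. For more information about rain, see All About Rain.',
);

# Score a document by how often the term occurs relative to its length.
# This is a crude term-frequency weight, assumed here purely as an example.
sub score {
    my ($term, $text) = @_;
    my @words = map { lc $_ } ($text =~ /\w+/g);
    return 0 unless @words;
    my $wanted = lc $term;
    my $hits   = grep { $_ eq $wanted } @words;   # grep in scalar context counts matches
    return $hits / scalar @words;
}

# Both documents match a plain regex, so the regex alone cannot rank them.
my @matching = grep { $docs{$_} =~ /rain/i } keys %docs;

# Rank by score, highest first.
my %score  = map { $_ => score('rain', $docs{$_}) } keys %docs;
my @ranked = sort { $score{$b} <=> $score{$a} } keys %docs;

printf "%-16s %.3f\n", $_, $score{$_} for @ranked;
```

Note that this still does nothing about proximity or stemming; each of those needs its own handling on top.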
For a good, free, open implementation of searching, consider ht://Dig, which isn't in Perl, but is GPL'd and widely used in academic environments. It can index an HTML site and search its index fast and effectively.
If you're really keen to do it yourself in Perl (it's an interesting project to do), there are plenty of modules on CPAN which might be useful, particularly under Text:: (e.g. Text::Query), and possibly Lingua:: (e.g. Lingua::Stem). Since you've got the site source in XML, generating an index of the meaningful content should be much easier.
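If you do go down the do-it-yourself route, the core data structure is usually an inverted index: a map from each word to the pages it appears on. The following is a rough sketch using only core Perl. The page names and text are hypothetical, and it assumes the text has already been pulled out of your XML; tokens(), build_index() and search() are names I've invented here, not anything from CPAN.

```perl
use strict;
use warnings;

# Hypothetical page texts, assumed already extracted from the XML source.
my %pages = (
    'weather.xml' => 'It is raining here and the rain will continue',
    'garden.xml'  => 'Water the garden when there is no rain',
    'about.xml'   => 'Contact details and site history',
);

# Split text into lowercase word tokens.
sub tokens {
    my ($text) = @_;
    return map { lc $_ } ($text =~ /\w+/g);
}

# Build the inverted index: word => { page => occurrence count }.
sub build_index {
    my ($pages) = @_;
    my %index;
    for my $page (keys %$pages) {
        $index{$_}{$page}++ for tokens($pages->{$page});
    }
    return \%index;
}

# Return the pages containing every query word, highest total count first.
# Call in list context.
sub search {
    my ($index, $query) = @_;
    my %uniq;
    my @words = grep { !$uniq{$_}++ } tokens($query);   # drop repeated query words
    return () unless @words;
    my (%total, %seen);
    for my $word (@words) {
        my $postings = $index->{$word} || {};
        for my $page (keys %$postings) {
            $total{$page} += $postings->{$page};
            $seen{$page}++;
        }
    }
    my @hits = grep { $seen{$_} == @words } keys %total;
    return sort { $total{$b} <=> $total{$a} || $a cmp $b } @hits;
}

my $index = build_index(\%pages);
print join(', ', search($index, 'rain garden')), "\n";
```

This matches exact words only, so a search for "rain" will not find a page that only says "raining". That is the gap a stemmer such as Lingua::Stem is meant to fill: you would stem the words both when indexing and when querying.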
A
it's raining here