Hi all,
I need to provide search facilities for websites, served from the sites themselves. I have found a number of inverted-index modules on CPAN that can do this, but although they’ll index text and HTML, they don’t handle MS Word files, which is a requirement for me. So I figure that if I can strip the text out of the proprietary formats first, I’ll be able to come up with a viable solution.
Unfortunately, because the sites will invariably be hosted on UNIX, I won’t be able to use Windows-specific Perl modules. Any ideas on alternatives?