vsailas has asked for the wisdom of the Perl Monks concerning the following question:

I want to run searches over the HTML files generated by pod2html.
I am trying to implement this with an index so that searches are faster.

Please advise on how to go about it.
Thanks in advance.

Re: Search on html files
by tachyon-II (Chaplain) on Nov 29, 2007 at 10:03 UTC

    While writing your own search engine might well be fun, I would recommend using an open source one like swish-e to do the heavy lifting of generating the index. It has a nice set of Perl bindings and works very well.

Re: Search on html files
by GrandFather (Saint) on Nov 29, 2007 at 09:50 UTC

    First off, you need to define the problem a little better. What exactly do you want to index? Every word in the document? Every word in headings? Some collection of keywords (chosen in advance, or derived from the documents in some fashion)?

    When you have built your index, what do you want to do with it? Knowing that will, to some extent, dictate the data structures you need to store the index as you build it.
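As a rough illustration of the "every word in the document" option, here is a minimal core-Perl sketch of an inverted index: a hash mapping each lowercased word to a hash of file name => occurrence count, which makes "which files mention X?" a single lookup and lets you rank by hit count. The file names and sample text are made up, and the text is assumed to have had its HTML already stripped by whatever parser you choose.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Full-text inverted index: word => { file => occurrences }.
my %index;

# Index every word of one document. $text is assumed to be plain text
# already (HTML tags removed beforehand).
sub index_document {
    my ($file, $text) = @_;
    $index{ lc $1 }{$file}++ while $text =~ /(\w+)/g;
}

# Files containing $word, most occurrences first (ties broken by name).
sub lookup {
    my ($word) = @_;
    my $hits = $index{ lc $word } or return;
    return sort { $hits->{$b} <=> $hits->{$a} || $a cmp $b } keys %$hits;
}

# Hypothetical sample documents.
index_document('DBI.html',      'connect returns a database handle');
index_document('MyApp/DB.html', 'Call connect once; connect is cached');

print join(', ', lookup('CONNECT')), "\n";   # MyApp/DB.html, DBI.html
```

Indexing every word this way also stores "a", "is" and the like; a stop-word list, or indexing only headings or keywords, keeps the index much smaller.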

    Once you have sorted out some of that stuff, you can start thinking about coding. At that point I'd take a good look at some of the HTML modules; HTML::TreeBuilder is a good starting point for this sort of task.
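A minimal HTML::TreeBuilder sketch of pulling candidate function names out of one pod2html page. It assumes, which may not match your docs, that functions are documented under =head2 to =head4 or =item entries, which pod2html renders as h2 to h4 and dt elements. The sub function_names_from_html and the sample page are hypothetical. HTML::TreeBuilder comes from the HTML-Tree distribution on CPAN, not the core.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;   # CPAN (HTML-Tree distribution), not core

# Return candidate function names found in one pod2html page.
# Assumption: functions appear as <h2>-<h4> headings or <dt> items.
sub function_names_from_html {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @names;
    for my $node ($tree->look_down(_tag => qr/^(?:h[2-4]|dt)$/)) {
        my $text = $node->as_text;
        # Keep only the leading identifier: "connect($dsn)" gives "connect".
        push @names, $1 if $text =~ /^\s*([A-Za-z_]\w*(?:::\w+)*)/;
    }
    $tree->delete;   # harmless, and needed on old HTML::Tree versions
    return @names;
}

# Hypothetical pod2html-style page.
my $page = <<'HTML';
<html><body>
<h1>NAME</h1><p>MyApp::DB - database helpers</p>
<h2>connect($dsn)</h2><p>Opens a handle.</p>
<dl><dt>fetch_customers</dt><dd>Reads the customers table.</dd></dl>
</body></html>
HTML

print "$_\n" for function_names_from_html($page);   # connect, fetch_customers
```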


    Perl is environmentally friendly - it saves trees
      Sorry for being brief.
      Basically, my collection of HTML files is the pod2html output for about 300 modules, and it is updated monthly.
      All I want to do is search by function name and, say, by the database tables those functions access. I have not added any meta tags.
      If possible, I would like to build a small index to speed up searching.
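For a requirement this narrow, a keyword-only index stays small. Here is a core-Perl sketch (File::Find, Storable) that walks the pod2html output, records function names and table names per file, and stores the result so it can be rebuilt monthly. The extraction rules are assumptions: a function name is taken as the leading identifier of h2, h3 or dt text, and a table name as the word after an upper-case FROM, JOIN, INTO or UPDATE. The script name podsearch.pl, the index file name and the command-line interface are all hypothetical. Stripping tags with regexes is crude; HTML::TreeBuilder would be more robust.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use Storable qw(nstore retrieve);

# Pull the two kinds of search keys out of one pod2html page.
# Assumptions: function names lead <h2>/<h3>/<dt> text; table names
# follow an upper-case SQL keyword somewhere in the page text.
sub keys_for_page {
    my ($html) = @_;
    my (%funcs, %tables);
    while ($html =~ m{<(h[23]|dt)\b[^>]*>(.*?)</\1>}gis) {
        (my $text = $2) =~ s/<[^>]+>//g;          # strip nested <a> etc.
        $funcs{ lc $1 } = 1 if $text =~ /^\s*([A-Za-z_]\w*)/;
    }
    (my $plain = $html) =~ s/<[^>]+>//g;
    $tables{ lc $1 } = 1
        while $plain =~ /\b(?:FROM|JOIN|INTO|UPDATE)\s+(\w+)/g;
    return ([sort keys %funcs], [sort keys %tables]);
}

# Walk $html_dir and build { func  => { name => { file => 1 } },
#                            table => { name => { file => 1 } } }.
sub build_index {
    my ($html_dir) = @_;
    my %index = (func => {}, table => {});
    find(sub {
        return unless -f $_ && /\.html?$/i;
        open my $fh, '<', $_ or die "Can't read $File::Find::name: $!";
        my $html = do { local $/; <$fh> };
        my ($funcs, $tables) = keys_for_page($html);
        $index{func}{$_}{$File::Find::name}  = 1 for @$funcs;
        $index{table}{$_}{$File::Find::name} = 1 for @$tables;
    }, $html_dir);
    return \%index;
}

# Files whose documentation mentions $name as a $type ('func' or 'table').
sub search {
    my ($index, $type, $name) = @_;
    my $hits = $index->{$type}{ lc $name } or return;
    return sort keys %$hits;
}

# Hypothetical CLI: "build DIR" rebuilds, "func NAME"/"table NAME" query.
my $index_file = 'pod_search.idx';
if (@ARGV == 2 && $ARGV[0] eq 'build') {
    nstore(build_index($ARGV[1]), $index_file);
}
elsif (@ARGV == 2) {
    print "$_\n" for search(retrieve($index_file), @ARGV);
}
```

With those hypothetical names, a monthly cron job could run `perl podsearch.pl build /path/to/html`, and a search would be `perl podsearch.pl table customers` or `perl podsearch.pl func connect`. Only the stored hash is loaded at query time, so the HTML is never re-parsed during a search.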