in reply to Search on html files
First off, you need to define the problem a little better. What exactly do you want to index? Every word in the document? Every word in the headings? Some collection of keywords (chosen on the fly or predetermined in some fashion)?
When you have built your index, what do you want to do with it? Knowing that will dictate to some extent the data structures you need to store the index as you create it.
Once you have sorted out some of that stuff then you can start thinking about coding. At that point I'd take a good look at some of the HTML modules - HTML::TreeBuilder is a good starting point for this sort of task.
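To make that concrete, here is a minimal sketch that assumes you settle on indexing only the words in headings and want to look up documents by word. The index_headings helper, the file names and the inline HTML are made up for illustration; the index is a plain hash of the form word => { document => count }, which is just one possible layout and should change if your answers to the questions above differ.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: add the words found in the headings of one HTML
# document to an inverted index of the form  word => { doc name => count }.
sub index_headings {
    my ($index, $name, $html) = @_;

    my $tree = HTML::TreeBuilder->new_from_content($html);

    # look_down accepts a regex for the tag name, so this matches h1..h6.
    for my $heading ($tree->look_down(_tag => qr/^h[1-6]$/)) {
        # Lower-case the heading text and split it into words.
        for my $word (lc($heading->as_text) =~ /(\w+)/g) {
            $index->{$word}{$name}++;
        }
    }

    $tree->delete;    # free the parse tree explicitly
    return $index;
}

# Made-up sample documents; in practice you would read these from files.
my %index;
index_headings(\%index, 'a.html',
    '<html><body><h1>Perl Search</h1><p>ignored text</p></body></html>');
index_headings(\%index, 'b.html',
    '<html><body><h2>Search tips</h2></body></html>');

# List the documents whose headings mention "search".
print join(', ', sort keys %{ $index{search} || {} }), "\n";
```

If you decide to index every word in the document instead, calling as_text on the whole tree rather than on each heading would be the obvious variation.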
Re^2: Search on html files
by vsailas (Beadle) on Nov 29, 2007 at 11:52 UTC