in reply to Script generated site index db
The first option is the roll-your-own solution which, while perhaps the most involved, is guaranteed to be the most customised to your needs :-) For this, I would advise using an indexing module such as my own Local::SiteRobot or WWW::SimpleRobot (whose author has updated it based on submitted patches). If you follow this path, I would also advise doing a bit of research and a code audit of some of the existing search and indexing solutions. For example, flat-file text storage simply won't scale; the better option, employed by most other scripts of this nature, is tied-hash or DBM file storage. For content extraction and meta-tag following, existing modules such as HTML::TokeParser will shorten both your development time and your programmer migraines immensely.
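To make the storage and parsing advice above a little more concrete, here is a minimal sketch of an inverted index (word to list of URLs) built with HTML::TokeParser and a tied DBM hash. It assumes the HTML-Parser distribution (which provides HTML::TokeParser and HTML::Entities) is installed. The helpers `extract_words`, `index_page` and `search`, the `site_index` file name and the sample pages are all hypothetical, and the pages are inlined rather than fetched by a robot module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use SDBM_File;                             # DBM implementation that ships with Perl
use File::Spec;
use File::Temp qw(tempdir);
use HTML::Entities qw(decode_entities);    # part of the HTML-Parser distribution
use HTML::TokeParser;

# Hypothetical helper: collect the unique, lower-cased words of an HTML page
# from its visible text (skipping <script>/<style> bodies) plus the content
# of <meta name="keywords"> and <meta name="description"> tags.
sub extract_words {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "Cannot parse HTML";
    my ($skip, @text) = (0);
    while (my $tok = $p->get_token) {
        my ($type, $tag) = @$tok;
        if ($type eq 'S') {
            $skip = 1 if $tag eq 'script' or $tag eq 'style';
            my $attr = $tok->[2];
            if ($tag eq 'meta'
                and ($attr->{name} // '') =~ /^(?:keywords|description)$/i) {
                push @text, $attr->{content} // '';
            }
        }
        elsif ($type eq 'E') {
            $skip = 0 if $tag eq 'script' or $tag eq 'style';
        }
        elsif ($type eq 'T' and not $skip) {
            push @text, decode_entities($tok->[1]);    # "&amp;" becomes "&"
        }
    }
    my $all = join ' ', @text;
    my %seen;
    return grep { !$seen{$_}++ } map { lc $_ } $all =~ /(\w+)/g;
}

# Hypothetical helper: add one page to the index. Each word maps to a
# space-separated, sorted list of the URLs that contain it.
sub index_page {
    my ($index, $url, $html) = @_;
    for my $word (extract_words($html)) {
        my %urls = map { ($_ => 1) } split ' ', ($index->{$word} // '');
        $urls{$url} = 1;
        $index->{$word} = join ' ', sort keys %urls;
    }
}

# Hypothetical helper: return the URLs containing $word (case-insensitive).
sub search {
    my ($index, $word) = @_;
    return split ' ', ($index->{lc $word} // '');
}

# Sample pages; a real indexer would get these from a robot module.
my %pages = (
    '/about.html' => '<html><head><title>About us</title>'
        . '<meta name="keywords" content="perl, indexing"></head>'
        . '<body><p>We write Perl &amp; more.</p>'
        . '<script>var x = 1;</script></body></html>',
    '/news.html' => '<html><head><title>News</title></head>'
        . '<body><p>Perl 5.6.1 released</p></body></html>',
);

# Tie the index to an on-disk DBM file (in a temporary directory for this demo),
# so lookups never require loading the whole index into memory.
my $file = File::Spec->catfile(tempdir(CLEANUP => 1), 'site_index');
tie my %index, 'SDBM_File', $file, O_RDWR | O_CREAT, 0644
    or die "Cannot tie $file: $!";

index_page(\%index, $_, $pages{$_}) for sort keys %pages;
print "perl: ", join(', ', search(\%index, 'Perl')), "\n";
# prints "perl: /about.html, /news.html"
```

SDBM_File is used here only because it ships with Perl; it limits each key/value pair to roughly 1 KB, so the URL list for a common word would eventually overflow it on a real site. DB_File (Berkeley DB) has no such limit and would be the better choice for anything beyond a small site.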
The other way is to explore some of the existing solutions. One of the better options I found when I was looking into this issue was the Perlfect Search script, which offers HTTP and file-system indexing, PDF text extraction, meta-tag following, ranking support and template-based output. I haven't yet had a chance to set it up on one of my development boxes, and while a preliminary code review turned up a couple of quirky style and indexing issues, the package looks fairly solid.
Good luck ... And feel free to /msg me if you have any questions.
perl -e 's&&rob@cowsnet.com.au&&&split/[@.]/&&s&.com.&_&&&print'
Re: Re: Script generated site index db
by S_Shrum (Pilgrim) on Mar 19, 2002 at 10:12 UTC
by fireartist (Chaplain) on Mar 19, 2002 at 14:21 UTC