Re: Templated Web Sites and Search Engines

Swish-E supports retrieval of docs over HTTP rather than from the filesystem. Take a look at the Spidering section of the manual.

In fact, the spidering is implemented with LWP. I haven't looked at it beyond noticing that, but I do remember thinking at the time that it seemed a bit kludgy to have to invoke a Perl script to spider the site. Presumably there's quite a bit of interaction between the main C source, the system, and the Perl program that could be avoided by having an actual HTTP implementation inline.
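
For a feel of what the spidering step involves, here is a rough sketch in Python. To be clear, this is not the Perl/LWP script that ships with Swish-E, and none of the names in it (`spider`, `http_fetch`, `LinkParser`) come from Swish-E; they are made up for illustration. It assumes a plain breadth-first crawl that stays on one host and hands back the raw HTML for an indexer to consume.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def http_fetch(url):
    """Default fetcher (hypothetical helper): plain HTTP GET, decoded leniently."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def spider(start_url, fetch=http_fetch, max_pages=100):
    """Breadth-first crawl restricted to the start URL's host.

    Returns a dict mapping each fetched URL to its HTML.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to load
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

The `fetch` argument is there so the crawl logic can be exercised against canned pages without touching the network; a real spider would also want robots.txt handling and politeness delays, which this sketch leaves out.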

Replies are listed 'Best First'.
Re: Re: Templated Web Sites and Search Engines by Maclir (Curate) on May 01, 2001 at 09:50 UTC
Doh!!! A big ++ to btrott. Of course, as I read your reply, it triggered a memory that there was an HTTP section in the config file for swish-e. Thank you for reminding me to RTFM.
Re: Re: Templated Web Sites and Search Engines by DrZaius (Monk) on May 01, 2001 at 18:58 UTC
So we meet again :) Anywho, what if I have a template-driven website where those templates can take on a few million or so different permutations? For example, a well-used message board. I have designed at least one website that had one search function for one type of content and another for the message board. Depending on your index, this could take a few days to index. Are there any search engines that allow you to macro in how to find the data and how to write the URL for it?
Re: Re: Re: Templated Web Sites and Search Engines by btrott (Parson) on May 01, 2001 at 23:06 UTC
For this type of need you may have the best luck rolling your own engine, probably using something like Search::InvertedIndex. This gives you the maximum in flexibility: you can customize exactly what gets indexed (e.g. just the content and title of the message board posts), and you (presumably) get to decide which URL is associated with each of the index entries.
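
To make the roll-your-own idea concrete, here is a minimal sketch of an inverted index in Python. It does not use Search::InvertedIndex or mirror its API; the class, the post fields (`id`, `title`, `body`) and the URL pattern are all assumptions made up for the example. It indexes only the title and body of each post and builds the result URL from the post id, so the search never has to crawl the templated pages at all.

```python
import re
from collections import defaultdict

# Hypothetical URL pattern for a message-board post (an assumption,
# not something from the original thread).
URL_TEMPLATE = "/board/view?post_id={post_id}"


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Maps each term to the set of keys whose text contains it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, key, text):
        for term in tokenize(text):
            self.postings[term].add(key)

    def search(self, query):
        """Return the keys that contain every term in the query (AND search)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


def index_posts(posts):
    """Index only the title and body of each post, keyed by post id."""
    index = InvertedIndex()
    for post in posts:
        index.add(post["id"], post["title"] + " " + post["body"])
    return index


def search_urls(index, query):
    """Turn matching post ids into URLs using the template above."""
    return [URL_TEMPLATE.format(post_id=pid) for pid in sorted(index.search(query))]
```

In practice the posts would come straight out of the board's database, which is what avoids the multi-day crawl: each post is indexed once when it is written rather than rediscovered through every template permutation.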


	PerlMonks