Maclir has asked for the wisdom of the Perl Monks concerning the following question:
The site I manage has been using conventional (hand-crafted) web pages since its inception, and we now have over 300 pages. We have a site search function using the popular Swish-e tool. This is a C program, kicked off by a cron job each night, that scans each file in the server document tree and builds the search indexes. When a person searches our site, they are given (hopefully) a list of pages, identified by document title - that is, the text between the <title> and </title> tags.
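Roughly, the nightly setup looks like this (the paths, schedule, and config contents here are simplified for illustration, not our exact setup):

```
# /etc/swish-e/site.conf (minimal): index the HTML under the doc tree
IndexDir  /var/www/htdocs
IndexOnly .html .htm

# crontab entry: rebuild the indexes at 02:30 each night
30 2 * * * /usr/local/bin/swish-e -c /etc/swish-e/site.conf
```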
Now, since we are about to use Embperl::Object to make the site far easier to manage, each page contains only the guts of the page as HTML, with standard Embperl files supplying the page headers and so on. Any browser (or spider) getting pages through our server is delivered the complete HTML code, with titles, body content and so on. No problem there. But swish-e, which runs the index generation outside of the web server, sees only the "raw" files. Hence, even though it indexes all the searchable text, there are no title tags in any of the content files.
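To illustrate, a stripped-down base template looks something like this (the file name and the title handling are simplified, not our real template):

```
[# _base.epl : wraps every request under Embperl::Object.
   The <title> lives here, so the raw content files that
   swish-e reads from disk never contain one. #]
<html>
<head><title>[+ $title || 'Our Site' +]</title></head>
<body>
[- Execute ('*') -]  [# run the content file that was requested #]
</body>
</html>
```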
Have other people faced this problem? Is there a version of swish-e - or something similar - that can be scheduled on a regular basis but indexes documents retrieved through the web server itself? I am sure this could be done with LWP (a rough sketch of what I mean follows below), but I don't want to reinvent the wheel . . .
How do those sites with large content management systems provide this search capability?
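For what it is worth, here is the sort of LWP approach I have in mind - an untested sketch, with the site root, URL list, and spool directory made up for illustration. The idea is to fetch each page through the web server, so the delivered HTML (titles included) lands in a spool directory, and then to point swish-e's IndexDir at the spool instead of the raw document tree:

```perl
#!/usr/bin/perl -w
use strict;
use LWP::UserAgent;
use URI;

my $base  = 'http://www.example.com/';    # hypothetical site root
my $spool = '/var/spool/swish-rendered';  # hypothetical spool dir

my $ua = LWP::UserAgent->new;
$ua->agent('site-indexer/0.1');

# In practice the list could come from a find(1) over the document
# tree or a crawl; a fixed list keeps the sketch short.
my @pages = qw(index.html about.html contact.html);

for my $page (@pages) {
    my $url = URI->new_abs($page, $base);
    my $res = $ua->get($url);
    unless ($res->is_success) {
        warn "skipping $url: ", $res->status_line, "\n";
        next;
    }
    (my $file = $page) =~ s{/}{_}g;       # flatten the path for the spool
    open my $fh, '>', "$spool/$file" or die "open $spool/$file: $!";
    print $fh $res->content;              # rendered HTML, <title> included
    close $fh;
}
```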
Replies are listed 'Best First'.

Re: Templated Web Sites and Search Engines
by btrott (Parson) on May 01, 2001 at 08:58 UTC
    by Maclir (Curate) on May 01, 2001 at 09:50 UTC
    by DrZaius (Monk) on May 01, 2001 at 18:58 UTC
    by btrott (Parson) on May 01, 2001 at 23:06 UTC

Re: Templated Web Sites and Search Engines
by asiufy (Monk) on May 02, 2001 at 07:18 UTC